Fortunately, Twitter provides a web API and people started to implement it in different languages, like Mike Verdone (@sixohsix) and his great Python Twitter Tools.
So here come twitter-archiver and twitter-follow, Python programs added to Python Twitter Tools to archive any public timeline of tweets in a simple text format and to view the list of following/followers of a user.
Why not use twitter-log?
Python Twitter Tools already includes twitter-log to back up tweets, but it does not display them the way I want (compact, one per line) and it cannot save to a file and resume, so every run has to retrieve all tweets again. It also does not include retweets, and I want them (include_rts=1).
Authentication?
By default, the script does not authenticate to Twitter, so you can only browse public timelines. Also note that the API rate limit is lower when you are not authenticated.
If you want to archive a protected timeline you have access to, just pass the -o (--oauth) parameter and the script will guide you through the OAuth process. This is only asked on first run: tokens are saved in ~/.twitter-archiver_oauth and reused next time.
Installation
You can find twitter-archiver and twitter-follow in my cloned repository of Python Twitter Tools on GitHub.
You can either install the programs as explained in the README and then use twitter-archiver, or run it directly with:
$ python -uc 'from twitter import archiver; archiver.main()' [<args...>]
Tip: you can use the PYTHONPATH environment variable to point at the Python Twitter Tools (parent) directory.
Usage
$ twitter-archiver
USAGE
    twitter-archiver [options] <-|user> [<user> ...]

DESCRIPTION
    Archive tweets of users, sorted by date from oldest to newest, in
    the following format: <id> <date> <<screen_name>> <tweet_text>
    Date format is: YYYY-MM-DD HH:MM:SS TZ. Tweet <id> is used to
    resume archiving on next run. Archive file name is the user name.
    Provide "-" instead of users to read users from standard input.

OPTIONS
 -o --oauth            authenticate to Twitter using OAuth (default no)
 -s --save-dir <path>  directory to save archives (default: current dir)
 -a --api-rate         see current API rate limit status
 -t --timeline <file>  archive own timeline into given file name (requires
                       OAuth, max 800 statuses).

AUTHENTICATION
    Authenticate to Twitter using OAuth to archive tweets of private profiles
    and have higher API rate limits. OAuth authentication tokens are stored
    in ~/.twitter-archiver_oauth.
Examples
Give a Twitter user to retrieve all tweets and print them:
$ twitter-archiver stalkr_
* Archiving stalkr_ tweets in ./stalkr_
Browsing stalkr_ timeline, new tweets: 200
[...]
Browsing timeline, new tweets: 186
Total tweets for stalkr_: 982 (982 new)
It saves tweets in a format you can grep:
$ grep -i winbuilder stalkr_
89264285980696576 2011-07-08 11:27:18 CEST <stalkr_> RT @irongeek_adc: Dual booting Winbuilder/Win7PE SE and Backtrack 5 on a USB flash drive with XBOOT http://t.co/SdcxMIu
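Because every line follows the same layout, the archive is also easy to read from a program. Here is a minimal sketch in Python, assuming one tweet per line in the documented format (<id> <date> <<screen_name>> <tweet_text>); the names LINE_RE, Tweet and parse_line are illustrative and not part of Python Twitter Tools.

```python
import re
from typing import NamedTuple, Optional

# One archive line: <id> <date> <<screen_name>> <tweet_text>
# with the date written as YYYY-MM-DD HH:MM:SS TZ.
LINE_RE = re.compile(
    r"^(?P<id>\d+) "
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) "
    r"<(?P<screen_name>[^>]+)> "
    r"(?P<text>.*)$"
)


class Tweet(NamedTuple):
    id: int
    date: str
    screen_name: str
    text: str


def parse_line(line: str) -> Optional[Tweet]:
    """Return a Tweet for a well-formed archive line, else None."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return Tweet(
        int(m.group("id")),
        m.group("date"),
        m.group("screen_name"),
        m.group("text"),
    )
```

The date is kept as a string on purpose: timezone abbreviations such as CEST are ambiguous to parse, and the numeric id is already enough to order tweets.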
See your current unauthenticated API rate limit status:
$ twitter-archiver --api-rate
Remaining API requests: 139/150 (hourly limit)
Next reset in 3508s (Mon Jul 11 02:18:07 2011)
Run it again and it automatically resumes archiving:
$ twitter-archiver stalkr_
* Archiving stalkr_ tweets in ./stalkr_
Browsing stalkr_ timeline, new tweets: 16
Total tweets for stalkr_: 998 (16 new)
You cannot archive tweets of a protected timeline:
$ twitter-archiver protected_user
* Archiving protected_user tweets in ./protected_user
Fail: 401 Unauthorized (tweets of that user are protected)
Total tweets for protected_user: 0 (0 new)
Authenticate with OAuth to archive protected timelines you follow:
$ twitter-archiver --oauth protected_user
* Archiving protected_user tweets in ./protected_user
Browsing timeline, new tweets: 134
Total tweets for protected_user: 134 (134 new)
You can also see that your API rate limit is higher when authenticated:
$ twitter-archiver --oauth --api-rate
Remaining API requests: 343/350 (hourly limit)
Next reset in 752s (Mon Jul 11 01:38:22 2011)
And of course, you can archive as many users as you want:
$ twitter-archiver --oauth stalkr_ Ivanlef0u [etc.]
* Archiving stalkr_ tweets in ./stalkr_
[...]
* Archiving Ivanlef0u tweets in ./Ivanlef0u
[...]
Total: X tweets (Y new) for 2 users
The script automatically retries after a failure, and automatically waits for the reset when the API rate limit is reached.
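The retry-and-wait behaviour can be sketched as a small wrapper around any request. This is only an illustration of the idea, not the code used by twitter-archiver: the exception classes RateLimited and TransientError, the function call_with_retry and its parameters are all hypothetical.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the seconds left until the limit resets."""

    def __init__(self, reset_in):
        super().__init__("rate limit reached, reset in %ds" % reset_in)
        self.reset_in = reset_in


class TransientError(Exception):
    """Hypothetical error for a failure worth retrying (timeout, 5xx, ...)."""


def call_with_retry(fetch, max_retries=3, retry_delay=5, sleep=time.sleep):
    """Call fetch() until it succeeds.

    On RateLimited, wait for the reset (this does not count as a retry).
    On TransientError, retry up to max_retries times, then re-raise.
    """
    failures = 0
    while True:
        try:
            return fetch()
        except RateLimited as e:
            sleep(e.reset_in)
        except TransientError:
            failures += 1
            if failures > max_retries:
                raise
            sleep(retry_delay)
```

Passing sleep as a parameter is a design choice that keeps the wrapper easy to test without actually waiting.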
Simple list of following/followers
In the same spirit as the archiver, I wanted a simple script to retrieve:
- the list of users a particular user follows: the following page
- the list of users that follow a particular user: the followers page
Twitter kindly provides the friends/ids and followers/ids APIs for that, but they return a list of user ids and not user names. One has to use the users/lookup API to resolve user ids into screen names, no more than 100 user ids at a time.
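The 100-ids-at-a-time constraint boils down to splitting the id list into batches. A minimal sketch, assuming a hypothetical lookup callable that stands in for one users/lookup request (it takes a list of ids and returns a dict mapping id to screen name); chunks and resolve_screen_names are illustrative names, not functions from Python Twitter Tools.

```python
def chunks(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def resolve_screen_names(user_ids, lookup, batch_size=100):
    """Resolve user ids to screen names, at most batch_size ids per request.

    `lookup` is a hypothetical stand-in for a users/lookup request:
    it takes a list of ids and returns a dict {id: screen_name}.
    """
    names = {}
    for batch in chunks(user_ids, batch_size):
        names.update(lookup(batch))
    # Keep the original order; ids the lookup did not return are skipped.
    return [names[uid] for uid in user_ids if uid in names]
```

With 1022 followers this means 11 lookup requests, which matches the "Resolving user ids to screen names: 100/1022" style of progress output.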
So along with twitter-archiver also comes twitter-follow, also with OAuth support:
$ twitter-follow
USAGE
    twitter-follow [options] <user>

DESCRIPTION
    Display all following/followers of a user, one user per line.

OPTIONS
 -o --oauth      authenticate to Twitter using OAuth (default no)
 -r --followers  display followers of the given user (default)
 -g --following  display users the given user is following
 -a --api-rate   see your current API rate limit status

AUTHENTICATION
    Authenticate to Twitter using OAuth to see following/followers of private
    profiles and have higher API rate limits. OAuth authentication tokens
    are stored in the file .twitter-follow_oauth in your home directory.
Usage example:
$ twitter-follow --oauth --followers stalkr_
Browsing followers, new: 1022
Resolving user ids to screen names: 100/1022
[...]
Resolving user ids to screen names: 1022/1022
0vercl0k
[...]
Total followers for stalkr_: 1022
$ twitter-follow --oauth --following stalkr_
Browsing following, new: 314
Resolving user ids to screen names: 100/314
[...]
Resolving user ids to screen names: 314/314
Ivanlef0u
[...]
Total users stalkr_ is following: 314
Pulling pieces together
You are now able to:
- archive all your tweets
- archive tweets of users you follow (following)
- rebuild your own timeline (something the Twitter API does not let you do, because it is limited to the last 800 statuses)
- archive tweets of users who follow you (followers)
- do that regularly with a cron script
- grep tweets!
As an example, here is the script I use to build my personal twitter database:
#!/bin/bash
# Your username + directory to work in
ME=stalkr_
DIR=~/twitter-db

error() { echo "Error: $@" >&2; exit 1; }
cd "$DIR" || error "unable to cd to $DIR"
S=$(date '+%s')

# Save following/followers
twitter-follow -o -g "$ME" > following.lst
twitter-follow -o -r "$ME" > followers.lst

# Archive all tweets of self + following/followers + others
mkdir -p all || error "failed mkdir all"
{
  echo "$ME"
  cat following.lst followers.lst others.lst 2>/dev/null
} | twitter-archiver -o -s "$PWD/all" -- -

# Build subsets of following/followers using symlinks
rm -rf following followers
mkdir following followers || error "failed mkdir following followers"
while read N; do
  [ -f "all/$N" ] && ln -s "../all/$N" "following/$N"
done < following.lst
while read N; do
  [ -f "all/$N" ] && ln -s "../all/$N" "followers/$N"
done < followers.lst

# Rebuild timeline by sorting all following tweets
find following -not -type d -print0 | xargs -0 cat | sort -n > timeline

# Execution time
D=$(( $(date '+%s') - S ))
[ -x "$(which duration)" ] && D=$(duration $D) || D="${D}s"
echo "Total time: $D"

# Cron to update twitter-db
25 3 * * * cd /home/stalkr/twitter-db; flock -nox lock bash -c './db.sh > log 2>&1'
# Ensure you already performed OAuth authentication before:
# twitter-archiver -o -a; twitter-follow -o -a
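The timeline step of the script (concatenate every archive and sort numerically on the leading tweet id) can also be written as a streaming merge. A minimal sketch in Python, assuming each archive file is already sorted by id, holds one tweet per line, and ends every line with a newline; merge_archives and tweet_id are illustrative names.

```python
import heapq


def tweet_id(line):
    """Numeric tweet id: the first space-separated field of an archive line."""
    return int(line.split(" ", 1)[0])


def merge_archives(paths, out_path):
    """Merge archive files, each already sorted by id, into one timeline."""
    files = [open(p, encoding="utf-8") for p in paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # heapq.merge reads the inputs lazily, so no file is loaded whole.
            out.writelines(heapq.merge(*files, key=tweet_id))
    finally:
        for f in files:
            f.close()
```

This keeps one file handle open per archive, so with a very large number of followed users the open-file limit of the system may need raising, or the plain sort approach may be simpler.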
Data liberation for Twitter!
Hi mate, thank you for this nice post!
I am looking for the best way to save/archive all my tweets (mine and people I follow) to a log file continuously (using an interval) avoiding duplicates.
When I use:
"twitter friends -d -t -r -R 30 --format default >> twitter.log"
Duplicate entries are included since the same tweets are logged more than once.
I tried your db.sh script (https://github.com/StalkR/misc/blob/master/twitter/db.sh) which is mentioned above too, but when the followers/following users are many it takes hours to create the timeline since the hourly limit for requests is reached and actually the file is created each time from scratch.
So "tail -f" doesn't help to check and maybe then grep the latest tweets.
Any ideas/suggestions? Thanks in advance!
Hi mate,
It is slow because it has to make at least 1 request per user (to check whether there are new tweets) and the request limit is low, unfortunately.
The timeline is rebuilt from scratch, but don't worry: this is just a concatenation and sort by time of all the tweet files created by twitter-archiver, and those are not recreated every time. They resume automatically and only new tweets are saved.
The reason it is rebuilt from scratch is that you might have new followers/following, so new tweets with an older date need to be inserted in the timeline. Similarly, you may have removed followers/following and their tweets need to be removed from the timeline.
If you ignore these cases, then yes, you can build an append-only timeline that you can "tail -f". The algorithm can be as simple as appending from the last known tweet id.
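A rough sketch of that append-only idea in Python, assuming one tweet per line with the numeric id as first field, and deliberately ignoring added or removed followers/following as discussed. The names last_id and append_new_tweets are illustrative, not part of the tools.

```python
import os


def tweet_id(line):
    """Numeric tweet id: the first space-separated field of an archive line."""
    return int(line.split(" ", 1)[0])


def last_id(timeline_path):
    """Id of the last tweet in the timeline, or 0 if it is missing or empty."""
    last = 0
    if os.path.exists(timeline_path):
        with open(timeline_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    last = tweet_id(line)
    return last


def append_new_tweets(archive_paths, timeline_path):
    """Append tweets newer than the timeline's last id; return how many."""
    since = last_id(timeline_path)
    new_lines = []
    for path in archive_paths:
        with open(path, encoding="utf-8") as f:
            new_lines.extend(
                line for line in f if line.strip() and tweet_id(line) > since
            )
    new_lines.sort(key=tweet_id)
    with open(timeline_path, "a", encoding="utf-8") as out:
        out.writelines(new_lines)
    return len(new_lines)
```

Running it after each twitter-archiver pass only ever adds lines at the end of the timeline, so "tail -f" keeps working. Reading the whole timeline to find the last id is the simplest option; seeking from the end of the file would be faster on a large timeline.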
Let me know if you need help doing that,
Cheers.