Twitter is as much a delight for tweeple (people using Twitter to tweet) as it is for data scientists. The APIs and the documentation are well updated and easy to use. Let us get started with the APIs.
Twitter has one of the easiest yet most powerful sets of APIs of any social network. These APIs have been used by Twitter itself and by data scientists to understand the dynamics of the Twitter world. Twitter APIs make use of four different objects, namely:
@MarsCuriosity is one such popular nonhuman Twitter handle, with over 2 million followers!
The preceding objects from the Twitter APIs are explained at length on the website https://dev.twitter.com/. We urge readers to go through it to understand the objects and APIs even better.
Twitter has libraries available in all major programming languages/platforms. We will be making use of twitteR, an R client library for Twitter's APIs.
Twitter Best Practices
Twitter has a set of best practices and a list of dos and don'ts specified clearly on its developer site, https://dev.twitter.com/, which covers security/authentication, privacy, and more. Since Twitter supports a huge customer base with high availability, it also tracks the usage of its APIs to keep its systems healthy. There is a defined rate limit on the number of times its APIs can be queried. Kindly go through the best practices and be a #gooddeveloper!
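As a rough illustration of respecting the rate limit, the sketch below checks whether any calls remain for a given resource before querying it. The helper name `has_quota` is hypothetical, and the column layout (`resource`, `remaining`) is an assumption about what `twitteR::getCurRateLimitInfo()` returns in an authenticated session; the mock values are made up purely so that the example runs offline.

```r
# Hypothetical helper: decide whether enough calls remain for a resource.
# Assumes `rate_info` is a data frame with `resource` and `remaining` columns,
# the layout twitteR::getCurRateLimitInfo() is believed to return.
has_quota <- function(rate_info, resource, needed = 1) {
  row <- rate_info[rate_info$resource == resource, , drop = FALSE]
  if (nrow(row) == 0) {
    return(FALSE)  # unknown resource: be conservative
  }
  as.numeric(row$remaining[1]) >= needed
}

# Mock data standing in for a live response (values invented for illustration).
mock_info <- data.frame(
  resource  = c("/statuses/user_timeline", "/users/show/:id"),
  remaining = c("180", "0"),
  stringsAsFactors = FALSE
)

has_quota(mock_info, "/statuses/user_timeline")
has_quota(mock_info, "/users/show/:id")

# In a live, authenticated session one might instead write:
# rate_info <- twitteR::getCurRateLimitInfo()
```

Checking the remaining quota before a batch of requests is one simple way to avoid hitting the limit midway through an extraction.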
Now that we have enough background about Twitter and its API objects, let us get our hands dirty. The first step when starting to use the APIs is to inform Twitter about your application. Twitter uses the standard Open Authorization (OAuth) protocol for authorizing a third-party app. OAuth uses an application's consumer key, consumer secret, access token, and access token secret to allow it to use the APIs and data of the connected service.
The following quick steps will set us up for the game:
TwitterAnalysis_rmre. For the callback URL, use http://127.0.0.1:1410 to point back to your local system. You may choose any other port number as well.
The Twitter application page
Congratulations, your app is created and registered with Twitter. But before we can use it, there's one more piece to it. We need to create access tokens, and to do that we perform the following steps.
Application keys and access tokens
We will be using the same application for this as well as in the coming chapter. Make a note of the consumer key, consumer secret, access token and access secret; we will need these in our application.
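Since these four values act like passwords, one option is to keep them out of scripts altogether. The following is a minimal sketch, not part of the original walkthrough, that reads them from environment variables using base R only. The function name `read_twitter_credentials` and the variable names (`TWITTER_CONSUMER_KEY` and so on) are assumptions chosen for illustration.

```r
# Hypothetical helper: read the four OAuth credentials from environment
# variables instead of hard-coding them in a script.
# The environment variable names below are assumptions, not a Twitter convention.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  creds <- Sys.getenv(unname(vars), unset = "")
  names(creds) <- names(vars)

  # Fail early, naming the missing values, rather than at authentication time.
  missing <- names(creds)[creds == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}
```

The variables could be set in a `.Renviron` file or in the shell before starting R, after which `creds <- read_twitter_credentials()` returns a named list such as `creds$consumer_key`.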
Now that we have everything ready at Twitter's end, let us set things up at R's end as well. Before we start playing with the data from Twitter, the first step would be to connect and authenticate ourselves through the app we just created using R.
We will make use of the twitteR library for R by Jeff Gentry. This library, or client, allows us to use Twitter's web APIs through R. We will use the method setup_twitter_oauth() to connect to Twitter using our app's credentials (keys and access tokens). Kindly replace XXXX in the following code with your access keys/tokens generated in the previous step:
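> # load library
> library(twitteR)
> # set credentials
> consumerSecret = "XXXXXXXXXXXXX"
> consumerKey = "XXXXXXXXXXXXXXXXXXXXXXXXXx"
> # connect to Twitter using the app's credentials
> setup_twitter_oauth(consumerKey, consumerSecret)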
Upon executing the preceding snippet of code, it will prompt you to use a local file to cache credentials or not. For now, we will say No to it:
This will open up your browser and ask you to log in using your Twitter credentials and authorize this app, as shown in the following screenshot:
Authorize app to fetch data
Once authorized, the browser will be redirected to the callback URL we mentioned when we created the app on Twitter. You may use a more informative URL for the user as well.
Congratulations, you are now connected to the ocean of tweets.
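The connection step can also be wrapped in a small function. The sketch below is only an illustration: `connect_to_twitter` is a hypothetical name, and it assumes the twitteR package is installed. As far as the twitteR documentation describes, supplying only the consumer key and secret leads to the browser-based authorization described above, while also supplying the access token and access secret authenticates directly.

```r
# Hypothetical wrapper around twitteR::setup_twitter_oauth().
# Requires the twitteR package and valid credentials to actually connect.
connect_to_twitter <- function(consumer_key, consumer_secret,
                               access_token = NULL, access_secret = NULL) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("Package 'twitteR' is not installed")
  }
  # With only the consumer key/secret, the browser-based flow is used;
  # with both access values as well, they are passed on directly.
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret,
                               access_token, access_secret)
}

# Usage (not run here because it needs real credentials):
# connect_to_twitter("XXXX", "XXXX")                  # browser-based flow
# connect_to_twitter("XXXX", "XXXX", "XXXX", "XXXX")  # direct authentication
```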
Now that we are connected to Twitter using R, it's time to extract some of the latest tweets and analyze what we get. To extract tweets, we will use the handle for Twitter's account 001 (Twitter's founder and first user), Jack Dorsey, @jack. The following snippet of code extracts his latest 300 tweets:
> twitterUser <- getUser("jack")
> # extract jack's tweets
> tweets <- userTimeline(twitterUser, n = 300)
> tweets
The output contains text combined with unprintable characters and URLs due to Twitter's content-rich data. We will look at the metadata of a tweet in a bit, but before that, the extracted information looks like this:
Sample tweets
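Because the raw text mixes URLs and unprintable characters, a small cleaning step is often useful before any analysis. The following is a minimal sketch using base R only; the helper name `clean_tweet_text` and the sample string are invented for illustration, and dropping every non-ASCII character is a deliberately blunt choice that also removes emoji and accented letters.

```r
# Hypothetical helper: strip URLs and non-ASCII characters from tweet text.
clean_tweet_text <- function(text) {
  # Drop http/https links.
  text <- gsub("https?://[^[:space:]]+", "", text)
  # Drop anything that cannot be represented in ASCII.
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = "")
  # Collapse repeated whitespace and trim the ends.
  trimws(gsub("[[:space:]]+", " ", text))
}

# Made-up sample text standing in for a real tweet.
sample_text <- "Hello  world https://t.co/abc123 \u2764"
clean_tweet_text(sample_text)

# With real data, the list of tweets can first be converted to a data frame:
# tweets_df <- twitteR::twListToDF(tweets)
# tweets_df$clean_text <- clean_tweet_text(tweets_df$text)
```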
To see the attributes and functions available to analyze and manipulate each tweet, use the getClass method as follows:
> # get tweet attributes
> tweets[[1]]$getClass()
>
> # get retweets count
> tweets[[1]]$retweetCount
>
> # get favourite count
> tweets[[1]]$favoriteCount
The following output will be generated: