Twitter API

From Q

Overview of the Twitter API

This page explains how to use R with the Twitter API to extract information and insights from Twitter accounts, including tweets, followers, impressions, and more. The data obtained through the Twitter API can be used to build analyses and charts that track your social activity, and to mine content from public pages.

The basic process required to access and utilize the Twitter API is:

  1. Create an app for your Twitter Page (this is needed to obtain the authentication token used in all API calls).
  2. Generate an OAuth token.
  3. Create an R Output that calls the Twitter API and generates the data frames used in the analysis.
  4. Create charts and analysis in Q.

App Setup and Configuration

A Twitter app must be set up to generate the Access Token and Access Token Secret, which are needed to create the authentication token used for the API calls.

  1. Log in to the Twitter Developers site at https://dev.twitter.com/apps/ using your Twitter login ID and password.
  2. Click the "Create New App" button.
  3. Enter an App Name, Description and Website URL.
  4. Click the "Create your Twitter application" button.
  5. From your Application Settings dashboard, select the "Keys and Access Tokens" tab.
  6. Click the "Create my access token" button.

Required R Packages

The "twitteR" and "ROAuth" packages must be installed first; both can be obtained from CRAN or from GitHub.

install.packages("twitteR")  # one-time installation
install.packages("ROAuth")
library("twitteR")           # load the packages in each R session
library("ROAuth")
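Calling install.packages() on every run reinstalls packages that are already present. A minimal sketch in base R that reports which of the required packages are not yet installed; the helper name missing_packages is hypothetical, not part of twitteR or ROAuth.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (rather than raising an error)
  # when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}
```

A script could then run `to_install <- missing_packages(c("twitteR", "ROAuth"))` and call `install.packages(to_install)` only when `length(to_install) > 0`.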

Authentication token

To use direct authentication, enter the following R code, inserting the values shown on your app dashboard ("Keys and Access Tokens" tab), to store your authentication parameters:

consumer_key <- '[your_consumer_key]'
consumer_secret <- '[your_consumer_secret]'
access_token <- '[your_access_token]'
access_secret <- '[your_access_secret]'

Then authorize the session by passing these parameters to setup_twitter_oauth():

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
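Hard-coding keys in a script makes them easy to leak when the script or project is shared. A minimal sketch, assuming the four values have been stored in environment variables (the names TWITTER_CONSUMER_KEY and so on are hypothetical, for example set in a ~/.Renviron file), that reads them with Sys.getenv() and stops if any is missing.

```r
# Hypothetical environment variable names; adjust to match your own setup.
read_twitter_credentials <- function(
    vars = c(consumer_key    = "TWITTER_CONSUMER_KEY",
             consumer_secret = "TWITTER_CONSUMER_SECRET",
             access_token    = "TWITTER_ACCESS_TOKEN",
             access_secret   = "TWITTER_ACCESS_SECRET")) {
  # Sys.getenv() returns the `unset` value ("") for variables that are not set
  values <- Sys.getenv(unname(vars), unset = "")
  names(values) <- names(vars)
  empty <- names(values)[values == ""]
  if (length(empty) > 0) {
    stop("Missing credentials: ", paste(empty, collapse = ", "))
  }
  as.list(values)
}
```

The result could then be used as `creds <- read_twitter_credentials()` followed by `setup_twitter_oauth(creds$consumer_key, creds$consumer_secret, creds$access_token, creds$access_secret)`.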

Twitter API Calls

For a full list of the Twitter APIs, see the Twitter Developer Documentation page.

Search API

The searchTwitter API searches Twitter for Tweets that match a supplied search string.

search.string <- 'Hillary Clinton'
no.of.tweets <- 25
tweets <- searchTwitter(search.string, n = no.of.tweets, lang = "en")  # English-language Tweets only

Various search parameters can be specified in the API call, including language, date range, location/geocode and the maximum number of Tweets to return. The search results are returned as an R list, which can then be converted into an R data frame:

search.df <- twListToDF(tweets)
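Search results often contain retweets and repeated texts. A minimal sketch of one way to clean them, assuming the data frame has the text, isRetweet, screenName and created columns that twListToDF() produces for status objects (check names(search.df) first). The helper clean_search_results and the mock data below are hypothetical.

```r
# Hypothetical helper: drop retweets and duplicate texts, newest Tweets first.
clean_search_results <- function(df) {
  df <- df[!df$isRetweet, , drop = FALSE]          # keep original Tweets only
  df <- df[!duplicated(df$text), , drop = FALSE]   # keep first copy of each text
  df <- df[order(df$created, decreasing = TRUE),
           c("screenName", "created", "text")]
  rownames(df) <- NULL
  df
}

# Small mock data frame standing in for twListToDF(tweets)
search.df <- data.frame(
  text       = c("first", "RT @a: first", "second", "second"),
  screenName = c("a", "b", "c", "d"),
  created    = as.POSIXct(c("2016-10-01 10:00", "2016-10-01 11:00",
                            "2016-10-02 09:00", "2016-10-02 09:30"), tz = "UTC"),
  isRetweet  = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
clean_search_results(search.df)
```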

Tweets

Tweet data is extracted using the userTimeline API. The following script extracts the last 1,000 Tweets from the Twitter page @realDonaldTrump.

tweets.user <- userTimeline("realDonaldTrump", n = 1000, maxID=NULL, sinceID=NULL, includeRts=TRUE)
tweets.user <- twListToDF(tweets.user) #converts returned Tweets to data frame
tweets.user

An R data frame containing the Tweet details is returned.
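Once the timeline is in a data frame, ordinary R operations can summarise it. A minimal sketch that ranks Tweets by a simple engagement score, assuming the retweetCount, favoriteCount, created and text columns produced by twListToDF(); the helper top_tweets, the score definition (retweets plus favorites) and the mock data are hypothetical.

```r
# Hypothetical helper: the n Tweets with the highest retweets + favorites.
top_tweets <- function(df, n = 5) {
  df$engagement <- df$retweetCount + df$favoriteCount
  df <- df[order(df$engagement, decreasing = TRUE), ]
  head(df[, c("created", "engagement", "text")], n)
}

# Mock stand-in for the tweets.user data frame
timeline <- data.frame(
  text          = c("a", "b", "c"),
  created       = as.POSIXct(c("2016-10-01", "2016-10-02", "2016-10-03"), tz = "UTC"),
  retweetCount  = c(10, 50, 5),
  favoriteCount = c(20, 100, 1),
  stringsAsFactors = FALSE
)
top_tweets(timeline, n = 2)
```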

Followers

A Twitter user's followers can be extracted using the getFollowers API. The following R script extracts a list of followers from the Q Twitter page (@qstatistics).

user <- getUser("qstatistics")        # get user data
user.df <- user$toDataFrame()         # convert the user's profile data to a data frame
followers <- user$getFollowers()      # get this user's followers
followers <- twListToDF(followers)    # convert follower data to a data frame
followers

The followers are returned as an R data frame.
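A common next step is to see which followers have the largest audiences of their own. A minimal sketch, assuming the screenName and followersCount columns that twListToDF() produces for user objects; the helper top_followers and the mock data are hypothetical.

```r
# Hypothetical helper: the n followers with the most followers themselves.
top_followers <- function(df, n = 10) {
  df <- df[order(df$followersCount, decreasing = TRUE),
           c("screenName", "followersCount")]
  rownames(df) <- NULL
  head(df, n)
}

# Mock stand-in for the followers data frame
followers.df <- data.frame(
  screenName     = c("user_a", "user_b", "user_c"),
  followersCount = c(120, 4500, 37),
  stringsAsFactors = FALSE
)
top_followers(followers.df, n = 2)
```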

Using Twitter API with Q

Creating an R output

To create an R Output in Q, select Create > R Output, then enter the following R code to generate a data frame using the Twitter userTimeline API. Note that the consumer_key, consumer_secret, access_token and access_secret values are stored in separate R Outputs so that all other R Outputs can reference them.

The following example extracts the most recent 3,000 Tweets from the @realDonaldTrump Twitter page.

library("twitteR")
library("ROAuth")

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

tweets.user <- userTimeline("realDonaldTrump", n = 3000, maxID=NULL, sinceID=NULL, includeRts=TRUE)
tweets.user <- twListToDF(tweets.user) #converts Tweet data to data frame
tweets.user$txtdate <- as.Date(tweets.user$created) #adds a Date column holding the day each Tweet was created
tweets.user

The following data frame is generated in Q.

[Figure: Q tweets example]
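Unless a tz argument is given, the calendar day that as.Date() assigns to a POSIXct time depends on R's defaults (UTC in many R versions), so Tweets posted in the evening in a US time zone can be counted against the following day. A minimal sketch, assuming the created column holds POSIXct times and that daily counts should follow US Eastern time; the choice of time zone is an assumption, not part of the original example.

```r
# Assumed time zone for the daily boundary; change to suit the account.
local_tz <- "America/New_York"

# Two Tweets recorded in UTC, both on the evening of 1 October in New York
created <- as.POSIXct(c("2016-10-01 23:30:00", "2016-10-02 03:30:00"), tz = "UTC")

# An explicit tz makes the day boundary unambiguous
local_dates <- as.Date(created, tz = local_tz)   # both 2016-10-01
utc_dates   <- as.Date(created, tz = "UTC")      # 2016-10-01 and 2016-10-02
```

In the Q example, the same idea would read `tweets.user$txtdate <- as.Date(tweets.user$created, tz = "America/New_York")`.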


Data manipulation

The Tweet data can be aggregated by the date field to generate a new data frame that can be charted to display the number of Tweets made each day. The following code counts the number of Tweets for each date:

aggdata <- aggregate(tweets.user$id, by=list(tweets.user$txtdate), FUN=length)
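To make the shape of aggdata concrete, here is a small sketch with mock data (not real Tweets). aggregate() names the grouping column Group.1 and the count column x, which is why the charting code refers to aggdata$Group.1 and aggdata$x. Days with no Tweets are simply absent from aggdata; the last step, an optional addition not in the original workflow, fills them in with zero counts.

```r
# Mock stand-in for tweets.user: three Tweets, none on 2 October
tweets.user <- data.frame(
  id      = c("101", "102", "103"),
  txtdate = as.Date(c("2016-10-01", "2016-10-01", "2016-10-03")),
  stringsAsFactors = FALSE
)

# Same call as in the article: count ids per date
aggdata <- aggregate(tweets.user$id, by = list(tweets.user$txtdate), FUN = length)
# aggdata has columns Group.1 (Date) and x (count): 2016-10-01 -> 2, 2016-10-03 -> 1

# Optional: add missing days with a count of zero
all_days <- data.frame(Group.1 = seq(min(aggdata$Group.1), max(aggdata$Group.1), by = "day"))
aggdata_full <- merge(all_days, aggdata, by = "Group.1", all.x = TRUE)
aggdata_full$x[is.na(aggdata_full$x)] <- 0L
```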



Charting

Using the R plotly package, a column chart of the aggregated data can be generated in Q to view the number of Tweets made per day on the @realDonaldTrump Twitter account.

library(plotly)

plot_ly(
  y = aggdata$x,
  x = aggdata$Group.1,
  type = 'bar',
  name = 'Link'
) %>%
  layout(title = "Daily Tweets",
         xaxis = list(title = ""),
         yaxis = list(title = "# of Tweets"),
         annotations = list(x = aggdata$Group.1,
                            y = aggdata$x,
                            text = "",
                            xanchor = 'center',
                            yanchor = 'bottom',
                            showarrow = FALSE))


[Figure: Daily Tweets chart]


API Limits

Rate Limits

Twitter imposes API rate limits on a per-user (per access token) basis, measured over 15-minute windows.

  • Application-only authentication rate limit: 15 calls per 15-minute window
  • Search API rate limit: 180 calls per 15-minute window

More details regarding Twitter API rate limiting can be found on the Twitter API Rate Limiting page.
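One simple way to stay under these limits is to space calls evenly across the window: 180 search calls per 15 minutes works out to one call every 900 / 180 = 5 seconds. A minimal sketch of that arithmetic plus a hypothetical wrapper, paced_lapply, that sleeps between calls; the `sleep` argument exists only so the pacing can be tested without waiting.

```r
# Seconds between calls so that `calls_per_window` calls fit evenly
# into one rate-limit window (15 minutes by default).
min_call_interval <- function(calls_per_window, window_minutes = 15) {
  window_minutes * 60 / calls_per_window
}

# Hypothetical wrapper: apply fun() to each element of `args`,
# pausing between calls so the stated rate is not exceeded.
paced_lapply <- function(args, fun, calls_per_window, sleep = Sys.sleep) {
  wait <- min_call_interval(calls_per_window)
  results <- vector("list", length(args))
  for (i in seq_along(args)) {
    if (i > 1) sleep(wait)
    results[[i]] <- fun(args[[i]])
  }
  results
}
```

For example, `paced_lapply(c("rstats", "dataviz"), function(q) searchTwitter(q, n = 100), calls_per_window = 180)`. Note that a single twitteR function call may issue more than one request to Twitter (for instance when n exceeds what one request returns), so the real request rate can be higher than the number of wrapper calls.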

Displayr and Q (R Server) Limits

When making API calls from within Q or Displayr, the R server has a timeout limit of approximately 2 minutes. Under normal circumstances, the Twitter API returns approximately 1,500 records within this time. The number of records returned varies with the type of API call (for example, searching Twitter versus querying the Tweets of a specific user). It is therefore good practice to split larger data sets across multiple API calls to stay within the R server timeout.
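One way to split a larger pull is by date. A minimal sketch of a hypothetical helper, date_windows, that divides a date range into consecutive windows formatted as "YYYY-MM-DD" strings, the format used by searchTwitter's since and until arguments. It assumes since is inclusive and until exclusive, so adjacent windows do not overlap.

```r
# Hypothetical helper: split [from, to) into consecutive windows of `days` days.
date_windows <- function(from, to, days = 1) {
  from <- as.Date(from)
  to   <- as.Date(to)
  stopifnot(to > from)
  starts <- seq(from, to - 1, by = days)   # numeric `by` is in days for Dates
  ends   <- starts + days
  ends[ends > to] <- to                    # clip the final window
  data.frame(since = format(starts, "%Y-%m-%d"),
             until = format(ends, "%Y-%m-%d"),
             stringsAsFactors = FALSE)
}

w <- date_windows("2016-10-01", "2016-10-08", days = 3)
```

Each row could then drive a separate call, for example `searchTwitter(search.string, n = 500, since = w$since[1], until = w$until[1])`, placed in its own R Output so that no single call runs into the timeout.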

See also