Twitter Data Retrieval
TASK
- Fetch 100 tweets by a user.
- Retrieve tweet_id, tweet_text, created_at time.
- Store the retrieved data in a CSV file.
CONSTRAINTS
- Object Oriented Programming.
- Try using magic methods to make the UserTweet object iterable.
STEP-01
To fetch data from Twitter, access tokens need to be generated first.
Go to https://apps.twitter.com/ and create a new app.
STEP-02
Fill in the required details and click on Create Twitter Application.
STEP-03
Select the application you have created. Take the consumer key, then click on generate access token. Add all the generated tokens to a separate file, config.py. These tokens are used to access Twitter through tweepy, the Python API client for Twitter.
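A config.py for this step could look roughly like the sketch below. The four constant names are my own choice, not something tweepy requires, and the values are placeholders to be replaced with the tokens shown on the application page.

```python
# config.py: keep this file out of version control, it holds secrets.
# Constant names are illustrative assumptions; values are placeholders.
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```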
STEP-04
Authenticate using tweepy.
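A minimal sketch of the authentication step, assuming tweepy's OAuth 1a handler (tweepy.OAuthHandler; newer tweepy releases also expose it as OAuth1UserHandler). The helper name build_api and its parameter names are hypothetical, chosen only for this example.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Return an authenticated tweepy.API client (hypothetical helper)."""
    # Third-party package (pip install tweepy). Imported inside the function
    # so this sketch can be loaded even where tweepy is not installed.
    import tweepy

    # The handler is created with the consumer key/secret of the app,
    # then given the access token/secret generated for it.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)
```

The four arguments would be the values stored in config.py.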
THE CODE
- Tweets were to be fetched based on the Twitter handle. Since I was using Tweepy, I went through the API documentation to figure out which method fetches the user timeline and whether it has any limitations, given that the task was to fetch 100 tweets.
- I came across the user_timeline() method, which returns tweets from the given Twitter handle. There is a small catch to notice here: it does not return all the tweets by the user. By default it returns only 20.
- To fetch more than 20 tweets, the count parameter needs to be passed along with the Twitter handle.
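The call described in the bullets above can be sketched as follows. The wrapper fetch_timeline is a hypothetical name; api is assumed to be an authenticated tweepy.API object, and screen_name and count are passed as keyword arguments to user_timeline(). A single request returns at most 200 tweets, so 100 fits in one call.

```python
TWEET_COUNT = 100  # number of tweets the task asks for


def fetch_timeline(api, handle, count=TWEET_COUNT):
    """Return up to `count` recent tweets for `handle` (hypothetical helper).

    `api` is assumed to be an authenticated tweepy.API instance.
    Without `count`, user_timeline() returns only 20 tweets.
    """
    return api.user_timeline(screen_name=handle, count=count)
```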
The get_tweets() method stores the tweets fetched from the user’s timeline, based on the handle and TWEET_COUNT. It iterates over the returned tweet_obj.
Out of the complete data structure, all I needed was tweet_id, created_at and the tweet text, so I fetched those and stored them in a named tuple.
These fields were to be stored for each tweet, so I appended each tuple to a list.
Finally, self._tweets stores the 100 tweets.
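A rough sketch of how get_tweets() might look, not necessarily the exact code from the post. It assumes api is an authenticated tweepy.API object and that each returned status exposes id, created_at and text attributes; the Tweet field names and the constructor signature are my own assumptions.

```python
from collections import namedtuple

TWEET_COUNT = 100

# One record per tweet; the field names mirror the task description.
Tweet = namedtuple("Tweet", ["tweet_id", "created_at", "tweet_text"])


class UserTweet:
    def __init__(self, api, handle):
        self._api = api        # assumed: authenticated tweepy.API object
        self._handle = handle
        self._tweets = self.get_tweets()

    def get_tweets(self):
        """Fetch the timeline and keep only the three required fields."""
        tweets = []
        tweet_obj = self._api.user_timeline(
            screen_name=self._handle, count=TWEET_COUNT
        )
        for status in tweet_obj:
            # Build a named tuple from the individual attributes
            # instead of storing the whole status object.
            tweets.append(Tweet(status.id, status.created_at, status.text))
        return tweets
```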
The last requirement was to implement the magic methods __len__() and __getitem__() to make the UserTweet object iterable.
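Here is a minimal sketch of those two magic methods, and of how they help with the CSV requirement. To keep the example self-contained, this simplified UserTweet takes an already fetched list of tweets instead of calling Twitter; that constructor and the write_csv helper are assumptions made for illustration.

```python
import csv
from collections import namedtuple

Tweet = namedtuple("Tweet", ["tweet_id", "created_at", "tweet_text"])


class UserTweet:
    def __init__(self, tweets):
        # Simplified for the example: tweets are passed in directly.
        self._tweets = list(tweets)

    def __len__(self):
        return len(self._tweets)

    def __getitem__(self, position):
        # Delegating to the list gives indexing and slicing. Python also
        # falls back to __getitem__ for iteration, calling it with
        # 0, 1, 2, ... until IndexError is raised.
        return self._tweets[position]


def write_csv(user_tweets, out_file):
    """Write a header row and one row per tweet to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(Tweet._fields)
    writer.writerows(user_tweets)  # works because UserTweet is iterable
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, for example open("tweets.csv", "w", newline="").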
Throughout, it was a good recap of concepts.
LEARNINGS
- Good use case to apply object oriented programming concepts.
- Implementation of magic methods.
- Used named tuples within a list for storing tweets. Named tuples help in accessing elements by name instead of position.
- Recap of how named tuples are initialised. Wrong initialisation led to storing tweet objects instead of named tuples with the required values. It took multiple iterations to fix this.
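To illustrate the last two points, here is a small sketch of named tuple initialisation. The exact mistake is not shown in this post, so the "wrong" variant below is only one plausible way to end up with raw tweet objects in the list. The status object is a stand-in with made-up values, not real Twitter data.

```python
from collections import namedtuple
from types import SimpleNamespace

Tweet = namedtuple("Tweet", ["tweet_id", "created_at", "tweet_text"])

# Stand-in for a tweepy status object (hypothetical values).
status = SimpleNamespace(id=42, created_at="2018-01-01 10:00:00", text="hello")

# Plausible mistake: the raw object itself ends up in the list.
wrong = [status]

# Intended: build the named tuple from the individual attributes.
right = [
    Tweet(tweet_id=status.id, created_at=status.created_at, tweet_text=status.text)
]

first = right[0]
print(first.tweet_text)  # access by name
print(first[2])          # same value, accessed by position
```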
POSSIBLE EXTENSIONS
- Data cleaning
- Sentiment analysis
- Similarity between timelines
- Fetch more data points, analyse the data and plot graphs. For example, customer feedback on services and products can be observed.
- And many more.
Here’s the complete code.