The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people’s dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because “they’re good dogs, Brent.” WeRateDogs has over 4 million followers and has received international media coverage.
There are many insights we can get from this dataset on WeRateDogs, but first we have to do a final check and save our new dataframes to their
- pupper: Puppy, a small doggo
- puppo: A transitional phase between pupper and doggo. Easily understood as the dog equivalent of a teenager.
- doggo: Dog, usually older.
- floofer: Very fluffy dog or a dog with excess fur. Comical amounts of fur on a dog will certainly earn any dog this name.
- multiple_stages: Either multiple dogs are featured in a tweet or a single dog has multiple stages. For example, a single dog can be both a pupper and a floofer.
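The multiple_stages category can be derived from per-stage columns. Here is a minimal sketch. It assumes one column per stage (`doggo`, `floofer`, `pupper`, `puppo`) holding either the stage name or the string `'None'`. The helper names `stage_of` and `add_dog_stage` are hypothetical and are not taken from the original notebook.

```python
import pandas as pd

# Assumption: one column per stage, holding the stage name or the string 'None'
STAGES = ['doggo', 'floofer', 'pupper', 'puppo']

def stage_of(row):
    """Return the single stage, 'multiple_stages', or None if no stage is set."""
    found = [s for s in STAGES if row[s] != 'None']
    if not found:
        return None
    return found[0] if len(found) == 1 else 'multiple_stages'

def add_dog_stage(df):
    """Return a copy of df with a combined 'dog_stage' column (hypothetical helper)."""
    return df.assign(dog_stage=df[STAGES].apply(stage_of, axis=1))

# Toy rows for illustration only
example = pd.DataFrame({
    'doggo':   ['doggo', 'None',   'None', 'doggo'],
    'floofer': ['None',  'None',   'None', 'None'],
    'pupper':  ['None',  'pupper', 'None', 'pupper'],
    'puppo':   ['None',  'None',   'None', 'None'],
})
print(add_dog_stage(example)['dog_stage'].tolist())
```

A row-wise `apply` is slower than a vectorised approach. However, it keeps the single-stage, multiple-stage and no-stage cases easy to read.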
Analysing and Visualizing Data
```python
from PIL import Image  # Library for opening and manipulating images
import pandas as pd    # Library for working with datasets
import numpy as np     # Library for numerical calculations

# Open the header image and crop it to the (left, upper, right, lower) pixel box
Image.open('images/dog_jump.jpg').crop((100, 300, 1200, 1000))
```
From the bar chart, we can see that pupper is the most common dog type in our dataset, with 203 entries and a 66.6% share. This is nearly 3x as many as doggo in 2nd place, with 73 entries and 23.9%. The least common dog type is floofer, with only 7 entries (2.3% of the total), followed by puppo with 22 entries (7.2%).
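The shares above can be checked with a short sketch that uses the counts quoted in the text. It assumes that multiple_stages rows are excluded from the bar chart, since the four counts sum to 305. With the raw column, `value_counts(normalize=True)` would produce the same fractions.

```python
import pandas as pd

# Counts quoted above; multiple_stages is assumed to be excluded from this chart
stage_counts = pd.Series({'pupper': 203, 'doggo': 73, 'puppo': 22, 'floofer': 7})

# Percentage share of all staged dogs, rounded to one decimal place
share = (stage_counts / stage_counts.sum() * 100).round(1)

# How many times more puppers than doggos there are
pupper_vs_doggo = stage_counts['pupper'] / stage_counts['doggo']

print(share)
print(f"pupper/doggo: {pupper_vs_doggo:.2f}x")
```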
2. Most Popular Dog Types by Favourite Counts
The most popular dog type by favourite count is puppo, in 1st place with 20,418. Interestingly, puppo also placed 3rd among the most common dog types. 2nd place goes to doggo with 18,399. Dogs with multiple_stages enjoyed a strong showing with 15,653 favourites, which puts them in 3rd place. They are followed by floofer and pupper in 4th and 5th place, with 11,738 and 6,493 respectively.
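A ranking like this one is typically produced by grouping on the dog stage and averaging. The following minimal sketch uses toy values only. It assumes that the figures are per-stage means, which the text does not state, and that the columns are named `dog_stage`, `favorite_count` and `retweet_count`.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions
tweets = pd.DataFrame({
    'dog_stage': ['puppo', 'puppo', 'doggo', 'doggo', 'pupper', 'pupper'],
    'favorite_count': [21000, 19000, 18000, 19000, 6000, 7000],
    'retweet_count': [5000, 6000, 6500, 6000, 2000, 2100],
})

# Mean engagement per stage, ranked by favourites (highest first)
by_stage = (tweets.groupby('dog_stage')[['favorite_count', 'retweet_count']]
                  .mean()
                  .sort_values('favorite_count', ascending=False))
print(by_stage)
```

If the chart instead showed totals, the same pipeline would work with `.sum()` in place of `.mean()`.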
3. Most Popular Dog Types by Retweet Counts
The ranking by retweet count shows doggo and puppo switching places, while the remaining positions stay the same as before. Doggo takes 1st place with 6,313, puppo is 2nd with 5,624, and multiple_stages and floofer are 3rd and 4th with 4,786 and 4,200. The last place is pupper with just 2,021. Interestingly, pupper finished last in both favourite and retweet counts despite being the most common dog type. This chart also shows a much closer spread than the last chart in absolute terms: retweets range from 2,021 to 6,313, compared with 6,493 to 20,418 for favourites.
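The following sketch ranks both metrics side by side, using the figures quoted in sections 2 and 3. It confirms that only doggo and puppo swap places, and it computes each metric's absolute spread. The sketch assumes that the 4th-place dog type is floofer. This is consistent with 11,738 + 4,200 = 15,938, the floofer total quoted in the ratio discussion.

```python
import pandas as pd

# Figures quoted in sections 2 and 3 (4th place assumed to be floofer)
engagement = pd.DataFrame(
    {'favourites': [20418, 18399, 15653, 11738, 6493],
     'retweets':   [5624, 6313, 4786, 4200, 2021]},
    index=['puppo', 'doggo', 'multiple_stages', 'floofer', 'pupper'],
)

# Rank 1 = highest value in each column
ranks = engagement.rank(ascending=False).astype(int)

# Absolute spread (max - min) of each metric
spread = engagement.max() - engagement.min()

print(ranks)
print(spread)
```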
Our table sorted by ratio lets us draw a conclusion. The most common dog type, pupper, is surprisingly the least popular. Puppo, which has the second-fewest entries at just 22, has the highest combined favourite and retweet count.
Floofer has the strongest ratio of popularity to entries. It gained 15,938 total retweets and favourites from just 7 entries, or about 2,277 per entry.
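The ratio is combined engagement divided by the number of entries, so it is a plain ratio rather than a percentage. The following hypothetical reconstruction of the table uses the figures quoted above. It assumes that floofer is the stage with 11,738 favourites and 4,200 retweets. It leaves out multiple_stages because the text does not give that stage's entry count.

```python
import pandas as pd

# Hypothetical reconstruction from the quoted figures (multiple_stages omitted)
table = pd.DataFrame(
    {'entries':    [22, 73, 7, 203],
     'favourites': [20418, 18399, 11738, 6493],
     'retweets':   [5624, 6313, 4200, 2021]},
    index=['puppo', 'doggo', 'floofer', 'pupper'],
)

table['total'] = table['favourites'] + table['retweets']
# Combined engagement per entry
table['ratio'] = (table['total'] / table['entries']).round(1)

ranked = table.sort_values('ratio', ascending=False)
print(ranked)
```

Sorting by `total` instead puts puppo first and pupper last, which matches the conclusion in the text.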
4. Followers v Date
We had some strange outliers that we missed earlier. We can remove them, for example by using np.percentile to drop values above a certain percentile. This graph shows an overall downward trend in followers. From 01/16 to 04/16, we observed a sharp decline from 630k to 580k before the count levelled out. Followers increased slightly from 09/16 to 12/16 and then resumed a slight downward trend.
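Percentile-based filtering can be sketched as follows. The helper `drop_above_percentile`, the `followers` column name and the toy values are hypothetical and for illustration only. The cutoff comes from `np.percentile`, which interpolates linearly by default.

```python
import numpy as np
import pandas as pd

def drop_above_percentile(df, column, pct=99):
    """Keep only rows whose `column` value is at or below the pct-th percentile."""
    cutoff = np.percentile(df[column], pct)
    return df[df[column] <= cutoff]

# Toy follower counts with one obvious outlier (illustrative values only)
followers = pd.DataFrame({'followers': [630_000, 620_000, 600_000, 580_000, 5_000_000]})

cleaned = drop_above_percentile(followers, 'followers', pct=80)
print(cleaned)
```

On small samples, a high percentile such as 99 may sit above every value and drop nothing. The threshold therefore needs to be chosen with the data size in mind.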
Here, we can see that retweets generally increase in the 2nd half of the year in both 2016 and 2017. There are a few outliers, with spikes of over 70,000 retweets in July 2016. These could be viral tweets. We see more signs of them between December 2016 and February 2017, then again in the 2nd half of 2017.
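Spikes like these can be isolated with a simple threshold and a monthly peak. The following minimal sketch uses toy data. The column names `timestamp` and `retweet_count` are assumptions, and the 70,000 threshold comes from the paragraph above.

```python
import pandas as pd

# Toy data; 'timestamp' and 'retweet_count' are assumed column names
tweets = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-06-30', '2016-07-15', '2016-07-20', '2016-12-01']),
    'retweet_count': [3_000, 75_000, 4_000, 9_000],
})

# Flag possible viral tweets above the 70,000-retweet level
viral = tweets[tweets['retweet_count'] > 70_000]

# Peak retweets per calendar month makes isolated spikes easy to spot
monthly_peak = (tweets.groupby(tweets['timestamp'].dt.to_period('M'))['retweet_count']
                      .max())

print(viral)
print(monthly_peak)
```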
5. Retweets v Month
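```python
# Table of retweet_count by month, sorted from most to fewest retweets
(pd.read_csv('tables/retweet_month.csv')
   .sort_values('retweet_count', ascending=False)
   .set_index('Unnamed: 0')      # the saved, unnamed index column becomes the index
   .rename_axis(None, axis=0))   # drop the leftover index name
```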
April has the fewest retweets in 2016, with only 74,331, followed by May with 106,893. The best-performing month is June, with 303,113 retweets. After falling in July, our bar chart shows an upward trend. This recovery continues until December, which reaches an impressive 254,200 retweets, an increase of around 68k over July.
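A month-by-month table like retweet_month.csv can be built by grouping on the month name. The following sketch uses toy rows only and assumes the columns `timestamp` and `retweet_count`. It filters to 2016 because the paragraph above discusses that year.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumptions
tweets = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-04-02', '2016-04-20', '2016-06-05',
                                 '2016-06-28', '2016-07-10']),
    'retweet_count': [1_000, 1_500, 4_000, 6_000, 3_000],
})

# Keep 2016 only, sum retweets per month, and rank months from most to fewest
in_2016 = tweets[tweets['timestamp'].dt.year == 2016]
retweet_month = (in_2016.groupby(in_2016['timestamp'].dt.month_name())['retweet_count']
                        .sum()
                        .sort_values(ascending=False))
print(retweet_month)
```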
6. Retweets v Time of Day
```python
# Table of retweet_count by time of day, sorted from most to fewest retweets
(pd.read_csv('tables/retweets_v_time_of_day_table.csv')
   .sort_values('retweet_count', ascending=False)
   .set_index('time_of_day'))
```
The most retweets come overnight. This is interesting, but it could be explained by users being active in other time zones around the globe. It might be between midnight and 06:00 for WeRateDogs, but it could be daytime in the countries its followers retweeted from. This data is less useful than it seems because we don't have data on individual countries.
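Time-of-day buckets like these can be built with `pd.cut` on the tweet hour. The following minimal sketch uses toy data. The four bucket boundaries are a hypothetical choice, because the notebook's actual bins are not shown. The hours are in whatever time zone the timestamps use, which is exactly the limitation described above.

```python
import pandas as pd

# Toy tweet hours (0-23) and retweets, for illustration only
tweets = pd.DataFrame({'hour': [1, 3, 9, 14, 20, 23],
                       'retweet_count': [5000, 4000, 1500, 2000, 2500, 1000]})

# Hypothetical buckets: [0, 6) overnight, [6, 12) morning, [12, 18) afternoon, [18, 24) evening
tweets['time_of_day'] = pd.cut(tweets['hour'], bins=[0, 6, 12, 18, 24], right=False,
                               labels=['overnight', 'morning', 'afternoon', 'evening'])

retweets_v_time = (tweets.groupby('time_of_day', observed=True)['retweet_count']
                         .sum()
                         .sort_values(ascending=False))
print(retweets_v_time)
```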
7. Most Popular Dog Names
From our findings, we can see that the most popular dog names are Tucker, Oliver, Cooper, Charlie and Penny, each appearing 10 times. Interestingly, four of the top five names appear to be male. Still, female names such as Penny, Lucy and Lola also show strong results.
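Because five names tie at the top, a count should keep ties instead of cutting at a fixed length. The following sketch uses toy names. It treats the `'None'` placeholder and lowercase words such as "a" as parsing artefacts rather than real names, which is an assumption about this archive. `nlargest(..., keep='all')` retains every name tied at the cutoff.

```python
import pandas as pd

# Toy names; 'None' and lowercase words are assumed to be parsing artefacts
names = pd.Series(['Tucker', 'Penny', 'None', 'a', 'Tucker', 'Penny',
                   'Oliver', 'Oliver', 'Lola', 'the'])

# Drop the placeholder and anything not starting with a capital letter
real_names = names[(names != 'None') & names.str[0].str.isupper()]

# keep='all' keeps every name tied with the n-th largest count
top = real_names.value_counts().nlargest(2, keep='all')
print(top)
```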
8. Favourites vs Retweets
Favourite and retweet counts appear to have a strong, almost linear correlation. This suggests a strong possibility that users favourite and retweet the posts they see at the same time, and that they rarely perform one action without the other.
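The strength of this relationship can be quantified with a Pearson correlation coefficient, where values close to 1 mean the two counts rise together almost linearly. The following sketch uses toy values and assumes the column names `favorite_count` and `retweet_count`.

```python
import pandas as pd

# Toy engagement figures, for illustration only
tweets = pd.DataFrame({'favorite_count': [3000, 9000, 15000, 30000, 60000],
                       'retweet_count':  [1000, 2900, 5200, 9800, 21000]})

# Pearson correlation (the default method of Series.corr)
r = tweets['favorite_count'].corr(tweets['retweet_count'])
print(f"Pearson r = {r:.3f}")
```

A high r shows only that the two counts move together. It does not show whether individual users perform both actions on the same tweet.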
9. Favourites vs Followers
Here, we can see favourites comfortably outnumbering retweets. However, the two series still rise and fall together over time. This suggests that users who retweet a tweet probably also like it, but that they are more likely to favourite than to retweet.
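Both observations can be checked from monthly totals. A favourites-per-retweet ratio consistently above 1 means favouriting is the more common action, and matching month-on-month directions mean the series move together. The following minimal sketch uses toy monthly figures, and its column names are assumptions.

```python
import pandas as pd

# Toy monthly totals, for illustration only
monthly = pd.DataFrame(
    {'favorite_count': [90_000, 120_000, 150_000],
     'retweet_count':  [30_000, 38_000, 52_000]},
    index=pd.period_range('2016-10', periods=3, freq='M'),
)

# Favourites per retweet in each month
monthly['fav_per_retweet'] = (monthly['favorite_count'] / monthly['retweet_count']).round(2)

# True where a series grew versus the previous month
direction = monthly[['favorite_count', 'retweet_count']].diff().dropna().gt(0)

print(monthly)
print(direction)
```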
From our boxplot, we can see that puppo has the most variance but also the highest number of retweets, while doggo has the widest-ranging outliers. For favourites, the pattern remains largely the same: puppo leads in favourite counts, and floofer also performs strongly. In both plots, pupper has a very tight grouping, which indicates consistency in terms of retweets and favourites.
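The "tight grouping" seen in a boxplot corresponds to a small interquartile range (IQR), which is the height of the box. The following sketch computes the IQR per dog stage on toy data. It assumes the columns `dog_stage` and `retweet_count`.

```python
import pandas as pd

# Toy retweet counts per stage, for illustration only
tweets = pd.DataFrame({
    'dog_stage': ['pupper'] * 4 + ['puppo'] * 4,
    'retweet_count': [1900, 2000, 2100, 2200, 1000, 4000, 8000, 12000],
})

grouped = tweets.groupby('dog_stage')['retweet_count']

# IQR = 75th percentile minus 25th percentile (linear interpolation by default)
iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
print(iqr.sort_values())
```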