Introduction

The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people’s dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because “they’re good dogs, Brent.” WeRateDogs has over 4 million followers and has received international media coverage.

There are many insights we can get from this dataset on WeRateDogs but first, we have to do a final check and save our new dataframes to their master variables.

Dictionary:

  1. pupper: Puppy, a small doggo and usually younger.
  2. puppo: A transitional phase between pupper and doggo. Easily understood as the dog equivalent of a teenager.
  3. doggo: Dog, usually older.
  4. floofer: Very fluffy dog or a dog with excess fur. Comical amounts of fur on a dog will certainly earn any dog this name.
  5. multiple_stages: This means either multiple dogs have been entered in a tweet or a single dog has multiple stages. For example, a single dog can be both a pupper and a floofer.

Analysing and Visualizing Data

from PIL import Image # Library for importing images
import pandas as pd # Library for working with datasets
import numpy as np # Library for calculations
Image.open('images/dog_jump.jpg').crop((100, 300, 1200, 1000))  # Crop the image
dog_type         count
pupper             203
doggo               62
puppo               22
multiple_stages     11
floofer              7

From the bar chart, we can see that pupper is the most common dog type in our dataset with 203 entries, a 66.6% share. This is more than 3x as many as doggo in 2nd place with 62 entries (20.3%). The least common dog type is floofer with only 7 entries, or 2.3% of the total, followed by multiple_stages with 11 entries (3.6%) and puppo with 22 entries (7.2%).
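The shares quoted for each dog type can be checked with a few lines of pandas. This is a minimal sketch that re-enters the counts from the table by hand; the variable names (counts, share, summary) are illustrative and not taken from the original notebook.

```python
import pandas as pd

# Counts copied by hand from the dog_type table in this post.
counts = pd.Series(
    {"pupper": 203, "doggo": 62, "puppo": 22, "multiple_stages": 11, "floofer": 7},
    name="count",
)

# Share of each dog type as a percentage of all entries that have a stage.
share = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"count": counts, "share_pct": share})
print(summary)
```

On a tweet-level dataframe, the same result would normally come from value_counts(normalize=True) on the dog_type column.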

dog_type         favorite_count
puppo                     20418
doggo                     18399
multiple_stages           15653
floofer                   11738
pupper                     6493

By favourite count, the most popular dog type is puppo with 20,418. Interestingly, puppo only placed 3rd among the most common dog types. Doggo is 2nd with 18,399. Dogs with multiple_stages enjoyed a strong showing in 3rd place with 15,653 favourites, followed by floofer and pupper in 4th and 5th place with 11,738 and 6,493 respectively.

dog_type         retweet_count
doggo                     6313
puppo                     5624
multiple_stages           4786
floofer                   4200
pupper                    2021

By retweet count, puppo and doggo switch places while the remaining positions stay the same as last time. Doggo takes 1st place with 6,313 and puppo 2nd with 5,624, followed by multiple_stages and floofer in 3rd and 4th with 4,786 and 4,200. Last place is pupper with just 2,021. Interestingly, pupper has finished last in both favourite and retweet counts despite being the most common dog type. This chart shows a much closer spread than the last chart.

dog_type         retweet_count  favorite_count  total  entries  ratio
floofer                   4200           11738  15938        7   2276
multiple_stages           4786           15653  20439       11   1858
puppo                     5624           20418  26042       22   1183
doggo                     6313           18399  24712       62    398
pupper                    2021            6493   8514      203     41

Sorting the table by ratio, we can conclude that pupper, the most common dog type, is surprisingly the least popular, while puppo has the highest total with the third-fewest entries at just 22. Floofer has the strongest ratio of popularity to entries at 2,276, gaining a combined 15,938 retweets and favourites from just 7 entries.
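The total and ratio columns are not defined in the post. The sketch below assumes total is retweet_count plus favorite_count and ratio is total divided by entries, rounded down. That assumption reproduces every value in the table, but it is inferred from the numbers rather than taken from the original notebook.

```python
import pandas as pd

# Figures copied by hand from the tables in this post.
table = pd.DataFrame(
    {
        "retweet_count": [4200, 4786, 5624, 6313, 2021],
        "favorite_count": [11738, 15653, 20418, 18399, 6493],
        "entries": [7, 11, 22, 62, 203],
    },
    index=pd.Index(
        ["floofer", "multiple_stages", "puppo", "doggo", "pupper"],
        name="dog_type",
    ),
)

# Assumed definitions: combined engagement, then engagement per entry (floored).
table["total"] = table["retweet_count"] + table["favorite_count"]
table["ratio"] = table["total"] // table["entries"]

table = table.sort_values("ratio", ascending=False)
print(table)
```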

4. Followers v Date

       followers_count
count     1.986000e+03
mean      8.940962e+06
std       6.865258e+01
min       8.940931e+06
25%       8.940937e+06
50%       8.940943e+06
75%       8.940973e+06
max       8.941633e+06

We had some strange outliers we missed earlier, which we can handle either by capping them with .clip or by using np.percentile to drop values above a certain percentile. This graph shows an overall downward trend in terms of followers. From 01/16 to 04/16 we observed a sharp decline from 630k to 580k before levelling out. Followers increased slightly from 09/16 to 12/16 before continuing a slight downward trend again.
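The two outlier options behave differently: .clip caps extreme values at a bound and keeps every row, while filtering against an np.percentile threshold drops the rows. The sketch below is illustrative only. The five values are borrowed from the summary table rather than the real series, and the 75th percentile cut-off is an assumption, not the threshold used in the original analysis.

```python
import numpy as np
import pandas as pd

# Illustrative values only (min, quartiles and max from the summary table).
followers = pd.Series(
    [8_940_931, 8_940_937, 8_940_943, 8_940_973, 8_941_633],
    name="followers_count",
)

# Assumed cut-off: the 75th percentile. Choose the percentile to suit the data.
upper = np.percentile(followers, 75)

# Option 1: cap values above the bound; the number of rows is unchanged.
clipped = followers.clip(upper=upper)

# Option 2: drop rows above the bound entirely.
trimmed = followers[followers <= upper]

print(clipped.max(), len(clipped), len(trimmed))
```

Capping keeps the timeline complete for plotting, whereas dropping rows leaves gaps at the dates of the outliers.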

Here, we can see retweets generally increase in the 2nd half of the year in both 2016 and 2017. We can see a few outliers, with a spike of over 70,000 retweets in July 2016. This could be a viral tweet. We see more signs of them between December 2016 and February 2017, then again in the 2nd half of 2017.

5. Retweets v Month

# Dataframe of retweet_count by month
pd.read_csv('tables/retweet_month.csv').sort_values('retweet_count', ascending=False).set_index('Unnamed: 0').rename_axis(None, axis = 0)
           retweet_count
June              303113
December          254200
October           235565
November          218451
January           218206
September         192606
July              186798
August            183950
March             167290
February          152670
May               106893
April              74331

April shows the fewest retweets in 2016 with only 74,331, followed by May with 106,893. The best performing month is June with 303,113 retweets. Our bar chart shows an upward trend after falling in July. This recovery continues until December, which shows an impressive 254,200. This indicates an increase of around 67k retweets compared to July.
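The post loads the monthly table from a CSV without showing how it was built. One way to aggregate it from tweet-level data is sketched below, under stated assumptions: the tweets dataframe, its column names and its values are hypothetical stand-ins for the cleaned archive.

```python
import pandas as pd

# Hypothetical tweet-level data standing in for the cleaned archive.
tweets = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2016-04-03", "2016-04-20", "2016-06-11", "2016-12-25"]
        ),
        "retweet_count": [100, 250, 900, 400],
    }
)

# Sum retweets per calendar month name, largest first.
by_month = (
    tweets.groupby(tweets["timestamp"].dt.month_name())["retweet_count"]
    .sum()
    .sort_values(ascending=False)
)
print(by_month)
```

Grouping by month name pools the same month across years, so a per-year view would need the year added to the grouping key.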

6. Retweets v Time of Day

# Dataframe of retweet_count by time of day
pd.read_csv('tables/retweets_v_time_of_day_table.csv').sort_values('retweet_count', ascending=False).set_index('time_of_day')

time_of_day  retweet_count
Overnight          2080926
Evening            1299954
Afternoon          1243851
Morning              56899

The most retweets come overnight, which is interesting but can be explained by the fact that users could be active in other timezones around the globe. It might be between midnight and 06:00 for WeRateDogs, but it could be daytime in the countries its followers retweeted from. This data is less useful than it could be, since we don't have data on individual countries.
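The time_of_day labels could be derived from the posting hour with pd.cut. In this sketch only the Overnight window (midnight to 06:00) comes from the text. The other three boundaries are assumptions chosen to split the rest of the day evenly, and the hours series is made up for illustration.

```python
import pandas as pd

# Made-up posting hours (0-23) for illustration.
hours = pd.Series([1, 5, 6, 11, 12, 17, 18, 23], name="hour")

# Left-closed bins: [0, 6), [6, 12), [12, 18), [18, 24).
# Only the Overnight window is stated in the post; the rest are assumed.
time_of_day = pd.cut(
    hours,
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["Overnight", "Morning", "Afternoon", "Evening"],
)
print(pd.concat([hours, time_of_day.rename("time_of_day")], axis=1))
```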

name     count
Cooper      10
Oliver      10
Charlie     10
Tucker       9
Penny        9
Lucy         9
Winston      8
Sadie        8
Daisy        7
Lola         7

From our findings, we can see the most popular dog names are Cooper, Oliver and Charlie, each with 10 counts, followed by Tucker, Penny and Lucy with 9 each. Four of the top six names seem to be male, which is also an interesting observation, but there are female names such as Penny, Lucy and Lola with strong results.

8. Favourites vs Retweets

Favourite and retweet counts show a strong, nearly linear positive correlation, which suggests users often favourite and retweet the posts they see at the same time. Users also appear highly unlikely to perform either action separately.
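The strength of this relationship can be quantified with a Pearson correlation coefficient. The sketch below uses the five per-dog-type figures from the tables earlier in this post, because the tweet-level data is not reproduced here. With only five aggregated points it illustrates the method rather than replicating the scatter plot.

```python
import pandas as pd

# Per-dog-type figures copied by hand from the earlier tables.
by_type = pd.DataFrame(
    {
        "retweet_count": [6313, 5624, 4786, 4200, 2021],
        "favorite_count": [18399, 20418, 15653, 11738, 6493],
    },
    index=["doggo", "puppo", "multiple_stages", "floofer", "pupper"],
)

# Series.corr defaults to the Pearson coefficient.
r = by_type["retweet_count"].corr(by_type["favorite_count"])
print(f"Pearson r = {r:.2f}")
```

On the full dataframe, the same call on the tweet-level retweet and favourite columns would give the figure that corresponds to the plot.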

9. Favourites vs Followers

Here, we can see favourites comfortably outperforming retweets. However, the two move together: retweets and favourites rise and fall at the same points over time. This indicates users probably retweet and then like the tweets they see, but are more likely to favourite than retweet.

From our boxplot, we can see puppo has the most variance and also the highest number of retweets, while doggo has the widest outliers. For favourites, the pattern remains the same except doggo beats puppo in favourite counts, with floofer also performing strongly. In both plots, pupper has a very tight grouping, which indicates consistency in terms of retweets and favourites.
