Table of Contents
- Introduction
- Load Libraries
- Gather the Data
- Assess the Data
- Clean the Data
- Analysis & Visualization
- Conclusions
Introduction
This is a data wrangling project that focuses mainly on fixing data quality and tidiness issues using Python. The dataset I am wrangling is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people’s dogs with a humorous comment about the dog.
These ratings almost always have a denominator of 10. The numerators are almost always greater than 10, because “they’re good dogs Brent.” The tweet archive used in this project contains basic tweet data (tweet ID, timestamp, text, etc.) for all 2356 of their tweets as they stood on August 1, 2017.
Load Libraries
In [1]:
# Load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import json
import os
import requests
import string
import tweepy
from IPython.display import Image, HTML
%matplotlib inline
Gather the Data
I will obtain data from three sources: a manually downloaded CSV file, a programmatically downloaded TSV file, and data queried from the Twitter API.
Twitter Archive
In [2]:
archive = pd.read_csv('twitter_archive_enhanced.csv')
Image Predictions
In [52]:
# Make directory if it doesn't already exist
folder_name = 'image_predictions'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)
In [54]:
# Get data
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
In [56]:
# Create file
with open(os.path.join(folder_name, url.split('/')[-1]), mode='wb') as file:
    file.write(response.content)
In [3]:
# The file name matches the last segment of the download URL
predictions = pd.read_csv('image_predictions/image-predictions.tsv', sep='\t')
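The download steps above can be folded into one helper so the file name is derived from the URL in a single place. This is a minimal sketch rather than the notebook's original code: `filename_from_url` and `save_bytes` are hypothetical names, and the bytes are assumed to come from `requests.get(url).content` after a call to `response.raise_for_status()`.

```python
import os


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. 'image-predictions.tsv'."""
    return url.rstrip('/').split('/')[-1]


def save_bytes(content, url, folder_name):
    """Write raw bytes to folder_name/<file name taken from url> and return the path."""
    # exist_ok=True means no error is raised if the folder is already there
    os.makedirs(folder_name, exist_ok=True)
    path = os.path.join(folder_name, filename_from_url(url))
    with open(path, mode='wb') as file:
        file.write(content)
    return path
```

In the notebook this could be called as `path = save_bytes(response.content, url, folder_name)`, with the returned path passed to `pd.read_csv(path, sep='\t')` so the file name never has to be retyped.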
API Data
In [4]:
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_secret = 'HIDDEN'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
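Instead of pasting the keys into the notebook and masking them afterwards, one option is to read them from environment variables. The sketch below is only an illustration: `load_credentials` and the four variable names are hypothetical, not something the original notebook defines.

```python
import os


def load_credentials(env=os.environ):
    """Collect the four Twitter credentials from a mapping of environment variables.

    The variable names below are hypothetical; use whatever names you export.
    """
    names = ['TWITTER_CONSUMER_KEY', 'TWITTER_CONSUMER_SECRET',
             'TWITTER_ACCESS_TOKEN', 'TWITTER_ACCESS_SECRET']
    # Treat unset and empty variables the same way
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: env[name] for name in names}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `auth.set_access_token` in place of the hard-coded strings.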
In [5]:
# Get tweet info
tweet = api.get_status(archive.tweet_id[2000], tweet_mode='extended')
In [6]:
# Get json info
info = tweet._json
info
Out[6]:
{'created_at': 'Thu Dec 03 18:52:12 +0000 2015', 'id': 672488522314567680, 'id_str': '672488522314567680', 'full_text': 'This is Jackie. She was all ready to go out, but her friends just cancelled on her. 10/10 hang in there Jackie https://t.co/rVfi6CCidK', 'truncated': False, 'display_text_range': [0, 134], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 672488519928037376, 'id_str': '672488519928037376', 'indices': [111, 134], 'media_url': 'http://pbs.twimg.com/media/CVUovvHWwAAD-nu.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CVUovvHWwAAD-nu.jpg', 'url': 'https://t.co/rVfi6CCidK', 'display_url': 'pic.twitter.com/rVfi6CCidK', 'expanded_url': 'https://twitter.com/dog_rates/status/672488522314567680/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 304, 'h': 411, 'resize': 'fit'}, 'small': {'w': 304, 'h': 411, 'resize': 'fit'}, 'medium': {'w': 304, 'h': 411, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 672488519928037376, 'id_str': '672488519928037376', 'indices': [111, 134], 'media_url': 'http://pbs.twimg.com/media/CVUovvHWwAAD-nu.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CVUovvHWwAAD-nu.jpg', 'url': 'https://t.co/rVfi6CCidK', 'display_url': 'pic.twitter.com/rVfi6CCidK', 'expanded_url': 'https://twitter.com/dog_rates/status/672488522314567680/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 304, 'h': 411, 'resize': 'fit'}, 'small': {'w': 304, 'h': 411, 'resize': 'fit'}, 'medium': {'w': 304, 'h': 411, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs™?️\u200d?', 'screen_name': 
'dog_rates', 'location': '????? ↴ DM YOUR DOGS', 'description': 'Your Only Source for Pawfessional Dog Ratings STORE: @ShopWeRateDogs | IG, FB & SC: WeRateDogs | MOBILE APP: @GoodDogsGame Business: [email protected]', 'url': 'https://t.co/N7sNNHAEXS', 'entities': {'url': {'urls': [{'url': 'https://t.co/N7sNNHAEXS', 'expanded_url': 'http://weratedogs.com', 'display_url': 'weratedogs.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 6984325, 'friends_count': 9, 'listed_count': 4521, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 134498, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 7179, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/948761950363664385/Fpr2Oz35_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/948761950363664385/Fpr2Oz35_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1525830435', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': True, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 460, 'favorite_count': 1151, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}
In [88]:
info['retweet_count']
Out[88]:
460
In [89]:
info['favorite_count']
Out[89]:
1151
In [86]:
info['user']['followers_count']
Out[86]:
6982890
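The three lookups above can be gathered into one small function that returns a flat record per tweet. This is a sketch under assumptions: `extract_counts` is a hypothetical helper, and `sample` is a trimmed-down stand-in for the dictionary shown in Out[6], not a real API response.

```python
def extract_counts(info):
    """Pick the fields of interest out of one tweet's JSON dictionary."""
    return {
        'tweet_id': info['id'],
        'retweet_count': info['retweet_count'],
        'favorite_count': info['favorite_count'],
        # followers_count lives one level down, inside the 'user' dictionary
        'followers_count': info['user']['followers_count'],
    }


# Trimmed-down stand-in for the dictionary shown in Out[6]
sample = {
    'id': 672488522314567680,
    'retweet_count': 460,
    'favorite_count': 1151,
    'user': {'followers_count': 6984325},
}

rows = [extract_counts(sample)]
```

A list of such records can be handed straight to `pd.DataFrame(rows)` to get one row per tweet.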
In [19]:
print(datetime.datetime.now().time())
10:38:26.842978
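A timestamp like the one printed above gives a reference point for judging how long a rate-limited collection run may pause. As a rough sketch, the helper below estimates an upper bound on the sleeping time. `estimate_wait_minutes` is a hypothetical name, and the defaults of 900 requests per 15-minute window are assumptions that should be checked against the limits for your own API access.

```python
import math


def estimate_wait_minutes(n_requests, requests_per_window=900, window_minutes=15):
    """Estimate an upper bound on minutes spent sleeping on rate limits.

    requests_per_window and window_minutes are assumed defaults, not values
    taken from the notebook.
    """
    windows_needed = math.ceil(n_requests / requests_per_window)
    # The first window starts immediately, so only the later ones cost a wait
    return max(windows_needed - 1, 0) * window_minutes
```

Under these assumed limits, `estimate_wait_minutes(2356)` returns 30 for the 2356 tweet IDs in the archive.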
In [7]:
# Make file if it doesn't already exist
file_name = 'tweet_json.txt'
if not os.path.isfile(file_name):
    open(file_name, 'w').close()
In [5]:
tweet_ids = archive.tweet_id
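When many tweets are collected one request at a time, one option for storing them is to append each tweet's JSON as its own line, so that a partial run still leaves a readable file. The following sketch assumes that layout (often called JSON Lines). `append_json_line` and `read_json_lines` are hypothetical helpers, not part of the original notebook.

```python
import json


def append_json_line(path, record):
    """Append one record to a file as a single line of JSON."""
    with open(path, 'a', encoding='utf-8') as file:
        file.write(json.dumps(record) + '\n')


def read_json_lines(path):
    """Read a file with one JSON object per line back into a list of dictionaries."""
    with open(path, encoding='utf-8') as file:
        # Skip blank lines so a trailing newline does not cause a parse error
        return [json.loads(line) for line in file if line.strip()]
```

Inside a collection loop, `append_json_line(file_name, tweet._json)` would add one tweet per call, and the cost of each call does not grow with the number of tweets already stored.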
In [25]:
tweet_errors = {}
tweet_count = 1
data = []
for tweet_id in tweet_ids:
    try:
        # Print id counter
        print(tweet_count)
        # Collect tweet info
        tweet = api.get_status(tweet_id, tweet_mode='extended')
        info = tweet._json
        # Add to the list and rewrite the file with everything collected so far
        data.append(info)
        with open(file_name, 'w') as file:
            json.dump(data, file)
        # Print timer info to estimate time until wake-up
        print(datetime.datetime.now().time())
        # Add one to the tweet count for further printing
        tweet_count += 1
    except Exception as e:
        # Print exception info and record the error message in tweet_errors
        # (storing `info` here would save the previous tweet's data instead)
        print(str(tweet_id) + ": " + str(e))
        tweet_errors[str(tweet_count - 1) + "_" + str(tweet_id)] = str(e)
1
14:35:25.741490
2
14:35:25.878125
...
20
888202515573088257: [{'code': 144, 'message': 'No status found with that ID.'}]
20
14:35:28.609364
...
95
873697596434513921: [{'code': 144, 'message': 'No status found with that ID.'}]
95
14:35:41.639174
...
117
869988702071779329: [{'code': 144, 'message': 'No status found with that ID.'}]
117
14:35:46.107286
...
130
866816280283807744: [{'code': 144, 'message': 'No status found with that ID.'}]
130
14:35:48.924806
...
152
861769973181624320: [{'code': 144, 'message': 'No status found with that ID.'}]
152
14:35:53.878626
...
243
845459076796616705: [{'code': 144, 'message': 'No status found with that ID.'}]
243
14:36:17.390100
...
255
842892208864923648: [{'code': 144, 'message': 'No status found with that ID.'}]
255
14:36:20.995518
...
292
837012587749474308: [{'code': 144, 'message': 'No status found with that ID.'}]
292
14:36:31.924916
...
375
827228250799742977: [{'code': 144, 'message': 'No status found with that ID.'}]
375
14:36:59.144056
...
558
802247111496568832: [{'code': 144, 'message': 'No status found with that ID.'}]
558
14:38:12.517983
...
775
775096608509886464: [{'code': 144, 'message': 'No status found with that ID.'}]
775
14:40:04.707869
...
1434
14:48:10.292214
1435
14:48:11.187821 1436 14:48:12.070964 1437 14:48:12.956116 1438 14:48:13.847741 1439 14:48:14.702476 1440 14:48:15.594093 1441 14:48:16.519639 1442 14:48:17.435202 1443 14:48:18.364233 1444 14:48:19.282296 1445 14:48:20.162977 1446 14:48:21.057617 1447 14:48:21.950735 1448 14:48:22.824419 1449 14:48:23.701075 1450 14:48:24.593714 1451 14:48:25.470874 1452 14:48:26.365996 1453 14:48:27.229687 1454 14:48:28.093388 1455 14:48:28.993992 1456 14:48:29.856686 1457 14:48:30.728869 1458 14:48:31.649940 1459 14:48:32.562552 1460 14:48:33.423270 1461 14:48:34.316900 1462 14:48:35.182104 1463 14:48:36.051294 1464 14:48:36.921977 1465 14:48:37.794151 1466 14:48:38.661831 1467 14:48:39.566442 1468 14:48:40.449082 1469 14:48:41.333233 1470 14:48:42.247788 1471 14:48:43.169336 1472 14:48:44.066956 1473 14:48:44.978568 1474 14:48:45.886155 1475 14:48:46.788761 1476 14:48:47.661932 1477 14:48:48.537601 1478 14:48:49.412776 1479 14:48:50.283952 1480 14:48:51.208490 1481 14:48:52.092642 1482 14:48:53.005202 1483 14:48:53.953687 1484 14:48:54.893188 1485 14:48:55.797290 1486 14:48:56.696886 1487 14:48:57.618432 1488 14:48:58.505575 1489 14:48:59.407199 1490 14:49:00.310794 1491 14:49:01.203420 1492 14:49:02.109514 1493 14:49:03.010610 1494 14:49:03.941636 1495 14:49:04.841735 1496 14:49:05.730359 1497 14:49:06.637944 1498 14:49:07.526578 1499 14:49:08.416716 1500 14:49:09.339259 1501 14:49:10.253814 1502 14:49:11.148927 1503 14:49:12.045036 1504 14:49:12.948140 1505 14:49:13.871682 1506 14:49:14.789240 1507 14:49:15.675881 1508 14:49:16.596429 1509 14:49:17.517486 1510 14:49:18.439038 1511 14:49:19.329686 1512 14:49:20.247264 1513 14:49:21.161335 1514 14:49:22.086386 1515 14:49:22.997476 1516 14:49:23.929984 1517 14:49:24.848055 1518 14:49:25.744659 1519 14:49:26.662217 1520 14:49:27.563327 1521 14:49:28.486377 1522 14:49:29.405919 1523 14:49:30.317986 1524 14:49:31.222086 1525 14:49:32.189529 1526 14:49:33.097102 1527 14:49:34.020653 1528 14:49:34.941193 1529 14:49:35.896659 1530 
14:49:36.805242 1531 14:49:37.737265 1532 14:49:38.660323 1533 14:49:39.587860 1534
Rate limit reached. Sleeping for: 248
14:49:40.540325 1535 14:53:54.487509 1536 14:53:55.409562 1537 14:53:56.340075 1538 14:53:57.260637 1539 14:53:58.168224 1540 14:53:59.100731 1541 14:54:00.020778 1542 14:54:00.957285 1543 14:54:01.889815 1544 14:54:02.804872 1545 14:54:03.724939 1546 14:54:04.671409 1547 14:54:05.602440 1548 14:54:06.506045 1549 14:54:07.453031 1550 14:54:08.385055 1551 14:54:09.349007 1552 14:54:10.306448 1553 14:54:11.275868 1554 14:54:12.244290 1555 14:54:13.172828 1556 14:54:14.102858 1557 14:54:15.042347 1558 14:54:16.045675 1559 14:54:17.036039 1560 14:54:18.018426 1561 14:54:19.081598 1562 14:54:20.029581 1563 14:54:21.019933 1564 14:54:21.986881 1565 14:54:23.001674 1566 14:54:23.957633 1567 14:54:24.965444 1568 14:54:25.958803 1569 14:54:27.170084 1570 14:54:28.096618 1571 14:54:29.043591 1572 14:54:30.019006 1573 14:54:31.065730 1574 14:54:31.993259 1575 14:54:32.958700 1576 14:54:33.874253 1577 14:54:34.794813 1578 14:54:35.795139 1579 14:54:36.803466 1580 14:54:37.876612 1581 14:54:38.805634 1582 14:54:39.740160 1583 14:54:40.693611 1584 14:54:41.636105 1585 14:54:42.687305 1586 14:54:43.627806 1587 14:54:44.588265 1588 14:54:45.524772 1589 14:54:46.467263 1590 14:54:47.433198 1591 14:54:48.373694 1592 14:54:49.303210 1593 14:54:50.266671 1594 14:54:51.203692 1595 14:54:52.147686 1596 14:54:53.091164 1597 14:54:54.060583 1598 14:54:55.024018 1599 14:54:56.105642 1600 14:54:57.070568 1601 14:54:58.388069 1602 14:54:59.337542 1603 14:55:00.311443 1604 14:55:01.271418 1605 14:55:02.224893 1606 14:55:03.169368 1607 14:55:04.124836 1608 14:55:05.084786 1609 14:55:06.066172 1610 14:55:07.082982 1611 14:55:08.055886 1612 14:55:09.011344 1613 14:55:09.964813 1614 14:55:10.932228 1615 14:55:11.893657 1616 14:55:12.838132 1617 14:55:13.766650 1618 14:55:14.701152 1619 14:55:15.663579 1620 14:55:16.634982 1621 14:55:17.597410 1622 14:55:18.569810 1623 14:55:19.549192 1624 14:55:20.514612 1625 14:55:21.480031 1626 14:55:22.462405 1627 14:55:23.410869 1628 14:55:24.353350 1629 
14:55:25.304807 1630 14:55:26.277207 1631 14:55:27.216695 1632 14:55:28.230984 1633 14:55:29.169475 1634 14:55:30.114948 1635 14:55:31.064410 1636 14:55:32.001904 1637 14:55:32.948374 1638 14:55:33.903820 1639 14:55:34.871233 1640 14:55:35.816706 1641 14:55:36.811048 1642 14:55:37.759512 1643 14:55:38.706980 1644 14:55:39.723263 1645 14:55:40.685690 1646 14:55:41.658091 1647 14:55:42.618523 1648 14:55:43.573969 1649 14:55:44.532407 1650 14:55:45.480872 1651 14:55:46.434323 1652 14:55:47.378799 1653 14:55:48.355189 1654 14:55:49.313626 1655 14:55:50.282038 1656 14:55:51.278374 1657 14:55:52.223847 1658 14:55:53.168322 1659 14:55:54.129752 1660 14:55:55.090184 1661 14:55:56.038649 1662 14:55:57.002074 1663 14:55:57.996416 1664 14:55:58.982779 1665 14:55:59.934236 1666 14:56:00.899655 1667 14:56:01.867069 1668 14:56:02.883352 1669 14:56:04.079155 1670 14:56:05.093444 1671 14:56:06.092773 1672 14:56:07.067189 1673 14:56:08.036598 1674 14:56:09.012508 1675 14:56:09.994402 1676 14:56:10.993247 1677 14:56:11.975143 1678 14:56:12.995416 1679 14:56:13.996753 1680 14:56:15.012050 1681 14:56:16.036814 1682 14:56:17.047134 1683 14:56:18.076406 1684 14:56:19.092698 1685 14:56:20.100015 1686 14:56:21.155710 1687 14:56:22.222868 1688 14:56:23.215729 1689 14:56:24.201095 1690 14:56:25.193957 1691 14:56:26.241179 1692 14:56:27.308840 1693 14:56:28.305679 1694 14:56:29.295549 1695 14:56:30.271446 1696 14:56:31.302711 1697 14:56:32.327477 1698 14:56:33.328823 1699 14:56:34.342115 1700 14:56:35.348929 1701 14:56:36.395143 1702 14:56:37.433393 1703 14:56:38.435725 1704 14:56:39.408640 1705 14:56:40.386531 1706 14:56:41.494595 1707 14:56:42.520864 1708 14:56:43.526178 1709 14:56:44.522031 1710 14:56:45.540319 1711 14:56:46.549633 1712 14:56:47.539502 1713 14:56:48.579745 1714 14:56:49.624951 1715 14:56:50.651727 1716 14:56:51.679495 1717 14:56:52.695307 1718 14:56:53.687654 1719 14:56:54.671036 1720 14:56:55.676865 1721 14:56:56.705631 1722 14:56:57.705473 1723 14:56:58.688844 1724 
14:56:59.664256 1725 14:57:00.677547 1726 14:57:01.778120 1727 14:57:02.796914 1728 14:57:03.853102 1729 14:57:04.852441 1730 14:57:05.866749 1731 14:57:06.878560 1732 14:57:07.878897 1733 14:57:08.878729 1734 14:57:09.880566 1735 14:57:11.029025 1736 14:57:12.055281 1737 14:57:13.108991 1738 14:57:14.136254 1739 14:57:15.151549 1740 14:57:16.164361 1741 14:57:17.203606 1742 14:57:18.214902 1743 14:57:19.248152 1744 14:57:20.267951 1745 14:57:21.318650 1746 14:57:22.392779 1747 14:57:23.391624 1748 14:57:24.393945 1749 14:57:25.425703 1750 14:57:26.520798 1751 14:57:27.576481 1752 14:57:28.605751 1753 14:57:29.641500 1754 14:57:30.697690 1755 14:57:31.749384 1756 14:57:32.786134 1757 14:57:33.881216 1758 14:57:34.928931 1759 14:57:36.329187 1760 14:57:37.354951 1761 14:57:38.369764 1762 14:57:39.375077 1763 14:57:40.392872 1764 14:57:41.426624 1765 14:57:42.474836 1766 14:57:43.511087 1767 14:57:44.532860 1768 14:57:45.578584 1769 14:57:46.632282 1770 14:57:47.677512 1771 14:57:48.707767 1772 14:57:49.737536 1773 14:57:50.777273 1774 14:57:51.831969 1775 14:57:52.866720 1776 14:57:53.878531 1777 14:57:54.911770 1778 14:57:55.949510 1779 14:57:57.070534 1780 14:57:58.156153 1781 14:57:59.190903 1782 14:58:00.228650 1783 14:58:01.298805 1784 14:58:02.330564 1785 14:58:03.397733 1786 14:58:04.429986 1787 14:58:05.446783 1788 14:58:06.516933 1789 14:58:07.549698 1790 14:58:08.589917 1791 14:58:09.663563 1792 14:58:10.734216 1793 14:58:11.896121 1794 14:58:12.948824 1795 14:58:14.034931 1796 14:58:15.090121 1797 14:58:16.201680 1798 14:58:17.246410 1799 14:58:18.289139 1800 14:58:19.334356 1801 14:58:20.376580 1802 14:58:22.460058 1803 14:58:23.487822 1804 14:58:24.536041 1805 14:58:25.595724 1806 14:58:26.657885 1807 14:58:27.770428 1808 14:58:28.820621 1809 14:58:29.937155 1810 14:58:31.001825 1811 14:58:32.060041 1812 14:58:33.125697 1813 14:58:34.199864 1814 14:58:35.236094 1815 14:58:36.280816 1816 14:58:37.341496 1817 14:58:38.396200 1818 14:58:39.448398 1819 
14:58:40.528544 1820 14:58:41.680968 1821 14:58:42.753114 1822 14:58:43.831242 1823 14:58:44.927837 1824 14:58:46.057322 1825 14:58:47.181327 1826 14:58:48.245510 1827 14:58:49.325634 1828 14:58:50.398282 1829 14:58:51.460443 1830 14:58:52.549544 1831 14:58:53.625184 1832 14:58:54.685350 1833 14:58:55.734556 1834 14:58:56.789757 1835 14:58:57.836474 1836 14:58:58.865745 1837 14:58:59.901491 1838 14:59:00.969656 1839 14:59:02.010379 1840 14:59:03.079061 1841 14:59:04.144213 1842 14:59:05.217860 1843 14:59:06.308450 1844 14:59:07.369623 1845 14:59:08.422830 1846 14:59:09.486501 1847 14:59:10.536715 1848 14:59:11.692635 1849 14:59:12.760298 1850 14:59:13.801515 1851 14:59:14.840264 1852 14:59:15.973236 1853 14:59:17.034914 1854 14:59:18.077654 1855 14:59:19.158763 1856 14:59:20.220942 1857 14:59:21.273151 1858 14:59:22.355270 1859 14:59:23.424420 1860 14:59:24.496063 1861 14:59:25.553752 1862 14:59:26.636876 1863 14:59:27.701547 1864 14:59:28.738775 1865 14:59:29.793977 1866 14:59:30.892043 1867 14:59:31.980653 1868 14:59:33.063264 1869 14:59:34.121446 1870 14:59:35.196089 1871 14:59:36.567941 1872 14:59:37.631108 1873 14:59:38.718213 1874 14:59:39.793349 1875 14:59:40.896422 1876 14:59:41.961639 1877 14:59:43.058234 1878 14:59:44.143392 1879 14:59:45.212546 1880 14:59:46.299147 1881 14:59:47.396225 1882 14:59:48.470378 1883 14:59:49.548511 1884 14:59:50.608196 1885 14:59:51.705788 1886 14:59:52.799863 1887 14:59:53.876006 1888 14:59:54.954639 1889 14:59:56.075652 1890 14:59:57.173260 1891 14:59:58.261374 1892 14:59:59.348494 1893 15:00:00.478499 1894 15:00:01.600511 1895 15:00:02.676154 1896 15:00:03.740832 1897 15:00:04.793523 1898 15:00:05.870666 1899 15:00:06.953277 1900 15:00:08.014450 1901 15:00:09.132482 1902 15:00:10.187662 1903 15:00:11.538079 1904 15:00:12.628680 1905 15:00:13.700331 1906 15:00:14.817849 1907 15:00:15.927892 1908 15:00:16.999541 1909 15:00:18.086152 1910 15:00:19.160292 1911 15:00:20.244404 1912 15:00:21.423278 1913 15:00:22.614120 1914 
15:00:23.719166 1915 15:00:24.808769 1916 15:00:26.244458 1917 15:00:27.311615 1918 15:00:28.379279 1919 15:00:29.479348 1920 15:00:30.556492 1921 15:00:31.666525 1922 15:00:32.758616 1923 15:00:33.845226 1924 15:00:34.937823 1925 15:00:36.041872 1926 15:00:37.141447 1927 15:00:38.224068 1928 15:00:39.319172 1929 15:00:40.421283 1930 15:00:41.521856 1931 15:00:42.588017 1932 15:00:43.701559 1933 15:00:44.791174 1934 15:00:45.874279 1935 15:00:46.995294 1936 15:00:48.095373 1937 15:00:49.217374 1938 15:00:50.286516 1939 15:00:51.434477 1940 15:00:52.538557 1941 15:00:53.621167 1942 15:00:54.696828 1943 15:00:55.788974 1944 15:00:56.882052 1945 15:00:57.969660 1946 15:00:59.065729 1947 15:01:00.178287 1948 15:01:01.292823 1949 15:01:02.487146 1950 15:01:03.560792 1951 15:01:04.634921 1952 15:01:05.741479 1953 15:01:06.844542 1954 15:01:07.925675 1955 15:01:09.048693 1956 15:01:10.163235 1957 15:01:11.284272 1958 15:01:12.380860 1959 15:01:13.498882 1960 15:01:14.607435 1961 15:01:15.701535 1962 15:01:16.826536 1963 15:01:17.928121 1964 15:01:19.053156 1965 15:01:20.149696 1966 15:01:21.309595 1967 15:01:22.461033 1968 15:01:23.571581 1969 15:01:24.705075 1970 15:01:25.848040 1971 15:01:26.974533 1972 15:01:28.066646 1973 15:01:29.168712 1974 15:01:30.262788 1975 15:01:31.393775 1976 15:01:32.513299 1977 15:01:33.658264 1978 15:01:34.775278 1979 15:01:35.870381 1980 15:01:37.024814 1981 15:01:38.134363 1982 15:01:39.299755 1983 15:01:40.461166 1984 15:01:42.601979 1985 15:01:43.722499 1986 15:01:44.831050 1987 15:01:45.940599 1988 15:01:47.082075 1989 15:01:48.215551 1990 15:01:49.322592 1991 15:01:50.427661 1992 15:01:51.561651 1993 15:01:52.683742 1994 15:01:53.822204 1995 15:01:54.916300 1996 15:01:56.051793 1997 15:01:57.186778 1998 15:01:58.304309 1999 15:01:59.418341 2000 15:02:00.530893 2001 15:02:01.655393 2002 15:02:02.887141 2003 15:02:04.007168 2004 15:02:05.129180 2005 15:02:06.269144 2006 15:02:07.473434 2007 15:02:08.585491 2008 15:02:09.730430 2009 
15:02:10.828505 2010 15:02:11.953519 2011 15:02:13.092475 2012 15:02:14.295293 2013 15:02:15.413818 2014 15:02:16.542811 2015 15:02:17.696749 2016 15:02:18.828240 2017 15:02:19.943269 2018 15:02:21.080230 2019 15:02:22.250618 2020 15:02:23.376619 2021 15:02:24.522076 2022 15:02:25.666608 2023 15:02:26.821521 2024 15:02:27.961486 2025 15:02:29.070532 2026 15:02:30.185059 2027 15:02:31.337530 2028 15:02:32.583210 2029 15:02:33.699227 2030 15:02:34.825732 2031 15:02:35.953727 2032 15:02:37.116139 2033 15:02:38.234667 2034 15:02:39.350198 2035 15:02:40.491172 2036 15:02:41.678505 2037 15:02:42.838418 2038 15:02:43.961430 2039 15:02:45.137323 2040 15:02:46.380012 2041 15:02:47.526453 2042 15:02:48.642482 2043 15:02:49.758520 2044 15:02:50.885517 2045 15:02:52.042929 2046 15:02:53.197361 2047 15:02:54.324359 2048 15:02:55.478294 2049 15:02:56.630728 2050 15:02:57.811096 2051 15:02:58.984475 2052 15:03:00.137413 2053 15:03:01.285850 2054 15:03:02.449246 2055 15:03:03.593197 2056 15:03:04.770063 2057 15:03:05.928484 2058 15:03:07.307807 2059 15:03:08.449755 2060 15:03:09.622138 2061 15:03:10.777073 2062 15:03:11.943461 2063 15:03:13.147264 2064 15:03:14.307198 2065 15:03:15.436180 2066 15:03:16.610557 2067 15:03:17.747528 2068 15:03:18.887986 2069 15:03:20.056377 2070 15:03:21.258678 2071 15:03:22.471448 2072 15:03:23.624884 2073 15:03:24.838659 2074 15:03:25.988102 2075 15:03:27.155980 2076 15:03:28.310423 2077 15:03:29.444919 2078 15:03:30.606824 2079 15:03:31.780202 2080 15:03:32.925646 2081 15:03:34.094546 2082 15:03:35.225533 2083 15:03:36.428339 2084 15:03:37.610180 2085 15:03:38.774583 2086 15:03:39.947470 2087 15:03:41.093921 2088 15:03:42.593447 2089 15:03:43.748877 2090 15:03:44.903789 2091 15:03:46.116065 2092 15:03:47.382197 2093 15:03:48.547101 2094 15:03:49.695536 2095 15:03:50.868423 2096 15:03:52.037802 2097 15:03:53.179273 2098 15:03:54.380063 2099 15:03:55.516550 2100 15:03:56.664493 2101 15:03:57.858806 2102 15:03:59.035671 2103 15:04:00.191615 2104 
15:04:01.345540 2105 15:04:02.562805 2106 15:04:03.774577 2107 15:04:04.947441 2108 15:04:06.140786 2109 15:04:07.322649 2110 15:04:08.496029 2111 15:04:09.678383 2112 15:04:10.851843 2113 15:04:12.037693 2114 15:04:13.247965 2115 15:04:14.452756 2116 15:04:15.672523 2117 15:04:17.003492 2118 15:04:18.192818 2119 15:04:19.391139 2120 15:04:20.589946 2121 15:04:21.760838 2122 15:04:22.926240 2123 15:04:24.109088 2124 15:04:25.287454 2125 15:04:26.466327 2126 15:04:27.657657 2127 15:04:28.833524 2128 15:04:30.012899 2129 15:04:31.203740 2130 15:04:32.390577 2131 15:04:33.587412 2132 15:04:34.778252 2133 15:04:36.005488 2134 15:04:37.179371 2135 15:04:38.345277 2136 15:04:39.500202 2137 15:04:40.675083 2138 15:04:41.891844 2139 15:04:43.078682 2140 15:04:44.242096 2141 15:04:45.400514 2142 15:04:46.617791 2143 15:04:47.899379 2144 15:04:49.167011 2145 15:04:50.357828 2146 15:04:51.907222 2147 15:04:53.092074 2148 15:04:54.295372 2149 15:04:55.456291 2150 15:04:56.631171 2151 15:04:57.837946 2152 15:04:59.013330 2153 15:05:00.207150 2154 15:05:01.402964 2155 15:05:02.706993 2156 15:05:03.970625 2157 15:05:05.165440 2158 15:05:06.407121 2159 15:05:07.609931 2160 15:05:08.817734 2161 15:05:10.006556 2162 15:05:11.268687 2163 15:05:12.514368 2164 15:05:13.747072 2165 15:05:14.923939 2166 15:05:16.097822 2167 15:05:17.354498 2168 15:05:18.559338 2169 15:05:19.732716 2170 15:05:20.949994 2171 15:05:22.155791 2172 15:05:23.375550 2173 15:05:24.569889 2174 15:05:25.753251 2175 15:05:26.998932 2176 15:05:28.216231 2177 15:05:29.412553 2178 15:05:30.618847 2179 15:05:31.926374 2180 15:05:33.130668 2181 15:05:34.319994 2182 15:05:35.499346 2183 15:05:36.746035 2184 15:05:37.958310 2185 15:05:39.186046 2186 15:05:40.389840 2187 15:05:41.743330 2188 15:05:42.934166 2189 15:05:44.149933 2190 15:05:45.358732 2191 15:05:46.621357 2192 15:05:47.934373 2193 15:05:49.167587 2194 15:05:50.364904 2195 15:05:51.651490 2196 15:05:52.867258 2197 15:05:54.082026 2198 15:05:55.289833 2199 
15:05:56.474183 2200 15:05:57.676484 2201 15:05:58.956568 2202 15:06:00.140918 2203 15:06:01.373139 2204 15:06:02.593402 2205 15:06:03.806180 2206 15:06:05.012465 2207 15:06:06.254177 2208 15:06:07.460457 2209 15:06:08.675220 2210 15:06:09.917408 2211 15:06:11.133185 2212 15:06:12.357921 2213 15:06:13.574691 2214 15:06:14.796948 2215 15:06:16.005244 2216 15:06:17.362133 2217 15:06:18.565914 2218 15:06:19.790661 2219 15:06:20.995441 2220 15:06:22.250601 2221 15:06:23.476839 2222 15:06:24.667174 2223 15:06:25.852519 2224 15:06:27.054326 2225 15:06:28.271580 2226 15:06:29.519254 2227 15:06:30.745517 2228 15:06:31.982783 2229 15:06:33.206542 2230 15:06:34.439247 2231 15:06:35.662986 2232 15:06:36.918150 2233 15:06:38.150886 2234 15:06:39.376693 2235 15:06:40.612404 2236 15:06:41.894494 2237 15:06:43.090306 2238 15:06:44.320029 2239 15:06:45.534298 2240 15:06:46.749566 2241 15:06:47.993760 2242 15:06:49.232469 2243 15:06:50.457699 2244 15:06:51.682949 2245 15:06:52.940587 2246 15:06:54.162347 2247 15:06:55.391577 2248 15:06:56.643737 2249 15:06:57.896414 2250 15:06:59.160550 2251 15:07:00.374314 2252 15:07:01.628971 2253 15:07:02.904081 2254 15:07:04.150770 2255 15:07:05.406932 2256 15:07:06.638654 2257 15:07:07.912276 2258 15:07:09.161471 2259 15:07:10.407162 2260 15:07:11.656328 2261 15:07:12.904004 2262 15:07:14.135237 2263 15:07:15.385441 2264 15:07:16.635100 2265 15:07:17.889266 2266 15:07:19.127966 2267 15:07:20.352209 2268 15:07:21.578951 2269 15:07:22.863542 2270 15:07:24.116708 2271 15:07:25.360900 2272 15:07:26.595104 2273 15:07:27.875705 2274 15:07:29.084499 2275 15:07:30.301257 2276 15:07:31.602799 2277 15:07:32.824049 2278 15:07:34.042813 2279 15:07:35.319422 2280 15:07:36.587033 2281 15:07:37.851172 2282 15:07:39.154700 2283 15:07:40.400410 2284 15:07:41.645587 2285 15:07:42.980029 2286 15:07:44.218729 2287 15:07:45.442971 2288 15:07:46.705618 2289 15:07:47.986709 2290 15:07:49.239360 2291 15:07:50.499005 2292 15:07:52.125701 2293 15:07:53.411768 2294 
15:07:54.656466 2295 15:07:55.925097 2296 15:07:57.261046 2297 15:07:58.541139 2298 15:07:59.776871 2299 15:08:01.072933 2300 15:08:02.365981 2301 15:08:03.602686 2302 15:08:04.843380 2303 15:08:06.094551 2304 15:08:07.357187 2305 15:08:08.600862 2306 15:08:09.878457 2307 15:08:11.128645 2308 15:08:12.385825 2309 15:08:13.626542 2310 15:08:14.876714 2311 15:08:16.129365 2312 15:08:17.574022 2313 15:08:18.854116 2314 15:08:20.101781 2315 15:08:21.372911 2316 15:08:22.669458 2317 15:08:23.942056 2318 15:08:25.199212 2319 15:08:26.556605 2320 15:08:27.909503 2321 15:08:29.152192 2322 15:08:30.406365 2323 15:08:31.699414 2324 15:08:32.996476 2325 15:08:34.241189 2326 15:08:35.494344 2327 15:08:36.866700 2328 15:08:38.117365 2329 15:08:39.367034 2330 15:08:40.654618 2331 15:08:42.009010 2332 15:08:43.362406 2333 15:08:44.613578 2334 15:08:45.889181 2335 15:08:47.242100 2336 15:08:48.493754 2337 15:08:49.735467 2338 15:08:51.015046 2339 15:08:52.300127 2340 15:08:53.583695 2341 15:08:54.852318 2342 15:08:56.170793 2343 15:08:57.514215 2344 15:08:58.847182 2345 15:09:00.147223
In [ ]:
data = {}
data['tweets'] = []
tweet_errors = {}
tweet_count = 1
for tweet_id in tweet_ids:
    try:
        # Print id counter
        print(tweet_count)
        # Collect tweet info
        tweet = api.get_status(tweet_id, tweet_mode='extended')
        info = tweet._json
        # Collect specific data
        retweet_count = info['retweet_count']
        favorite_count = info['favorite_count']
        followers_count = info['user']['followers_count']
        # Append to data dict
        data['tweets'].append({
            'tweet_id': tweet_id,
            'retweet_count': retweet_count,
            'favorite_count': favorite_count,
            'followers_count': followers_count
        })
        # Print timer info to estimate time until wake-up
        print(datetime.datetime.now().time())
        # Add one to the tweet count for further printing
        tweet_count += 1
    except Exception as e:
        # Print exception info and record the error message in tweet_errors.
        # (Storing `info` here would save the previous tweet's data, or raise
        # a NameError if the very first request failed.)
        print(str(tweet_count) + "_" + str(tweet_id) + ": " + str(e))
        tweet_errors[str(tweet_count) + "_" + str(tweet_id)] = str(e)
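The loop above mixes fetching, printing, and error bookkeeping in one place. As a rough sketch of the same idea, the version below separates the bookkeeping from the API call so it can be tried offline. Both `collect_tweets` and `fetch` are hypothetical names introduced here; `fetch` stands in for something like `lambda i: api.get_status(i, tweet_mode='extended')._json`, and the sample data is made up.

```python
def collect_tweets(tweet_ids, fetch):
    """Fetch each tweet once and return (records, errors).

    `fetch` is a hypothetical callable that takes a tweet id and
    returns the tweet as a dict, or raises if the tweet is unavailable.
    """
    records, errors = [], {}
    for position, tweet_id in enumerate(tweet_ids):
        try:
            info = fetch(tweet_id)
            records.append({
                'tweet_id': tweet_id,
                'retweet_count': info['retweet_count'],
                'favorite_count': info['favorite_count'],
                'followers_count': info['user']['followers_count'],
            })
        except Exception as e:
            # Key by list position and id; store the message, not stale data
            errors[f"{position}_{tweet_id}"] = str(e)
    return records, errors


# Offline demonstration with made-up data (not real API output)
def fake_fetch(tweet_id):
    if tweet_id == 2:
        raise LookupError("No status found with that ID.")
    return {'retweet_count': 10 * tweet_id, 'favorite_count': 5,
            'user': {'followers_count': 100}}


records, errors = collect_tweets([1, 2, 3], fake_fetch)
print(len(records), errors)
```

Because a failed request never touches `records`, the two containers stay consistent even when the first tweet in the list is the one that fails.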
In [5]:
# Extract data from file
df_list = []
with open('tweet_json.txt') as json_file:
    data = json.load(json_file)
    for tweet in data:
        df_list.append({'tweet_id': tweet['id'],
                        'retweet_count': tweet['retweet_count'],
                        'favorite_count': tweet['favorite_count'],
                        'followers_count': tweet['user']['followers_count']})
In [6]:
# Create DataFrame from list of dictionaries
api_data = pd.DataFrame(df_list, columns=['tweet_id', 'retweet_count', 'favorite_count', 'followers_count'])
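The extraction cell reads `tweet_json.txt` with a single `json.load` call, which implies the file holds one JSON array of full tweet objects. The notebook does not show how that file was written, so the sketch below is only an assumption about its shape: it writes a made-up two-tweet sample to a temporary file and reads it back using the same four fields. `flatten_tweet` and the sample file name are names introduced here for illustration.

```python
import json
import os
import tempfile


def flatten_tweet(tweet):
    """Pick the four fields used in this project from one tweet dict."""
    return {
        'tweet_id': tweet['id'],
        'retweet_count': tweet['retweet_count'],
        'favorite_count': tweet['favorite_count'],
        'followers_count': tweet['user']['followers_count'],
    }


# Made-up sample standing in for the contents of tweet_json.txt
sample = [
    {'id': 1, 'retweet_count': 3, 'favorite_count': 7,
     'user': {'followers_count': 50}},
    {'id': 2, 'retweet_count': 0, 'favorite_count': 1,
     'user': {'followers_count': 50}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tweet_json_sample.txt')
    # Write the whole list as one JSON array ...
    with open(path, 'w') as f:
        json.dump(sample, f)
    # ... so that a single json.load call can read it back
    with open(path) as f:
        rows = [flatten_tweet(t) for t in json.load(f)]

print(rows[0])
```

Passing `rows` to `pd.DataFrame` would then give a table with the same four columns as `api_data`.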
In [32]:
tweet_errors.keys()
Out[32]:
dict_keys(['19_888202515573088257', '94_873697596434513921', '116_869988702071779329', '129_866816280283807744', '151_861769973181624320', '242_845459076796616705', '254_842892208864923648', '291_837012587749474308', '374_827228250799742977', '557_802247111496568832', '774_775096608509886464'])
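Each key joins a counter and the tweet ID with an underscore. A small sketch of splitting the keys back into integers, so the failed IDs can later be compared against the archive; the list is copied from the output above, and `parsed` and `failed_ids` are names introduced here for illustration.

```python
error_keys = [
    '19_888202515573088257', '94_873697596434513921',
    '116_869988702071779329', '129_866816280283807744',
    '151_861769973181624320', '242_845459076796616705',
    '254_842892208864923648', '291_837012587749474308',
    '374_827228250799742977', '557_802247111496568832',
    '774_775096608509886464',
]

# Split each key once on the underscore and convert both parts to int
parsed = [tuple(int(part) for part in key.split('_', 1)) for key in error_keys]
failed_ids = [tweet_id for _, tweet_id in parsed]

print(len(failed_ids), parsed[0])
```

Keeping the IDs as Python integers avoids the precision loss that appears when large tweet IDs are stored as floats.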
Assess the Data
archive table
In [68]:
archive
Out[68]:
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | None | None | None | None |
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | None | None | None | None |
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | None | None | None | None |
5 | 891087950875897856 | NaN | NaN | 2017-07-29 00:08:17 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a majestic great white breaching … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891087950… | 13 | 10 | None | None | None | None | None |
6 | 890971913173991426 | NaN | NaN | 2017-07-28 16:27:12 +0000 | <a href=”http://twitter.com/download/iphone” r… | Meet Jax. He enjoys ice cream so much he gets … | NaN | NaN | NaN | https://gofundme.com/ydvmve-surgery-for-jax,ht… | 13 | 10 | Jax | None | None | None | None |
7 | 890729181411237888 | NaN | NaN | 2017-07-28 00:22:40 +0000 | <a href=”http://twitter.com/download/iphone” r… | When you watch your owner call another dog a g… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890729181… | 13 | 10 | None | None | None | None | None |
8 | 890609185150312448 | NaN | NaN | 2017-07-27 16:25:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zoey. She doesn’t want to be one of th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890609185… | 13 | 10 | Zoey | None | None | None | None |
9 | 890240255349198849 | NaN | NaN | 2017-07-26 15:59:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Cassie. She is a college pup. Studying… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890240255… | 14 | 10 | Cassie | doggo | None | None | None |
10 | 890006608113172480 | NaN | NaN | 2017-07-26 00:31:25 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Koda. He is a South Australian decksha… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890006608… | 13 | 10 | Koda | None | None | None | None |
11 | 889880896479866881 | NaN | NaN | 2017-07-25 16:11:53 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Bruno. He is a service shark. Only get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889880896… | 13 | 10 | Bruno | None | None | None | None |
12 | 889665388333682689 | NaN | NaN | 2017-07-25 01:55:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here’s a puppo that seems to be on the fence a… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889665388… | 13 | 10 | None | None | None | None | puppo |
13 | 889638837579907072 | NaN | NaN | 2017-07-25 00:10:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ted. He does his best. Sometimes that’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889638837… | 12 | 10 | Ted | None | None | None | None |
14 | 889531135344209921 | NaN | NaN | 2017-07-24 17:02:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Stuart. He’s sporting his favorite fan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889531135… | 13 | 10 | Stuart | None | None | None | puppo |
15 | 889278841981685760 | NaN | NaN | 2017-07-24 00:19:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Oliver. You’re witnessing one of his m… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889278841… | 13 | 10 | Oliver | None | None | None | None |
16 | 888917238123831296 | NaN | NaN | 2017-07-23 00:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jim. He found a fren. Taught him how t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888917238… | 12 | 10 | Jim | None | None | None | None |
17 | 888804989199671297 | NaN | NaN | 2017-07-22 16:56:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zeke. He has a new stick. Very proud o… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888804989… | 13 | 10 | Zeke | None | None | None | None |
18 | 888554962724278272 | NaN | NaN | 2017-07-22 00:23:06 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ralphus. He’s powering up. Attempting … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888554962… | 13 | 10 | Ralphus | None | None | None | None |
19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | RT @dog_rates: This is Canela. She attempted s… | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
20 | 888078434458587136 | NaN | NaN | 2017-07-20 16:49:33 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Gerald. He was just told he didn’t get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888078434… | 12 | 10 | Gerald | None | None | None | None |
21 | 887705289381826560 | NaN | NaN | 2017-07-19 16:06:48 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jeffrey. He has a monopoly on the pool… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887705289… | 13 | 10 | Jeffrey | None | None | None | None |
22 | 887517139158093824 | NaN | NaN | 2017-07-19 03:39:09 +0000 | <a href=”http://twitter.com/download/iphone” r… | I’ve yet to rate a Venezuelan Hover Wiener. Th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887517139… | 14 | 10 | such | None | None | None | None |
23 | 887473957103951883 | NaN | NaN | 2017-07-19 00:47:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Canela. She attempted some fancy porch… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
24 | 887343217045368832 | NaN | NaN | 2017-07-18 16:08:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | You may not have known you needed to see this … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887343217… | 13 | 10 | None | None | None | None | None |
25 | 887101392804085760 | NaN | NaN | 2017-07-18 00:07:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | This… is a Jubilant Antarctic House Bear. We… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887101392… | 12 | 10 | None | None | None | None | None |
26 | 886983233522544640 | NaN | NaN | 2017-07-17 16:17:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Maya. She’s very shy. Rarely leaves he… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886983233… | 13 | 10 | Maya | None | None | None | None |
27 | 886736880519319552 | NaN | NaN | 2017-07-16 23:58:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Mingus. He’s a wonderful father to his… | NaN | NaN | NaN | https://www.gofundme.com/mingusneedsus,https:/… | 13 | 10 | Mingus | None | None | None | None |
28 | 886680336477933568 | NaN | NaN | 2017-07-16 20:14:00 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Derek. He’s late for a dog meeting. 13… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886680336… | 13 | 10 | Derek | None | None | None | None |
29 | 886366144734445568 | NaN | NaN | 2017-07-15 23:25:31 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Roscoe. Another pupper fallen victim t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886366144… | 12 | 10 | Roscoe | None | None | pupper | None |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
2326 | 666411507551481857 | NaN | NaN | 2015-11-17 00:24:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is quite the dog. Gets really excited whe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666411507… | 2 | 10 | quite | None | None | None | None |
2327 | 666407126856765440 | NaN | NaN | 2015-11-17 00:06:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a southern Vesuvius bumblegruff. Can d… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666407126… | 7 | 10 | a | None | None | None | None |
2328 | 666396247373291520 | NaN | NaN | 2015-11-16 23:23:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh goodness. A super rare northeast Qdoba kang… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666396247… | 9 | 10 | None | None | None | None | None |
2329 | 666373753744588802 | NaN | NaN | 2015-11-16 21:54:18 +0000 | <a href=”http://twitter.com/download/iphone” r… | Those are sunglasses and a jean jacket. 11/10 … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666373753… | 11 | 10 | None | None | None | None | None |
2330 | 666362758909284353 | NaN | NaN | 2015-11-16 21:10:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Unique dog here. Very small. Lives in containe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666362758… | 6 | 10 | None | None | None | None | None |
2331 | 666353288456101888 | NaN | NaN | 2015-11-16 20:32:58 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a mixed Asiago from the Galápagos… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666353288… | 8 | 10 | None | None | None | None | None |
2332 | 666345417576210432 | NaN | NaN | 2015-11-16 20:01:42 +0000 | <a href=”http://twitter.com/download/iphone” r… | Look at this jokester thinking seat belt laws … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666345417… | 10 | 10 | None | None | None | None | None |
2333 | 666337882303524864 | NaN | NaN | 2015-11-16 19:31:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an extremely rare horned Parthenon. No… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666337882… | 9 | 10 | an | None | None | None | None |
2334 | 666293911632134144 | NaN | NaN | 2015-11-16 16:37:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a funny dog. Weird toes. Won’t come do… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666293911… | 3 | 10 | a | None | None | None | None |
2335 | 666287406224695296 | NaN | NaN | 2015-11-16 16:11:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an Albanian 3 1/2 legged Episcopalian… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666287406… | 1 | 2 | an | None | None | None | None |
2336 | 666273097616637952 | NaN | NaN | 2015-11-16 15:14:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can take selfies 11/10 https://t.co/ws2AMaNwPW | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666273097… | 11 | 10 | None | None | None | None | None |
2337 | 666268910803644416 | NaN | NaN | 2015-11-16 14:57:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Very concerned about fellow dog trapped in com… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666268910… | 10 | 10 | None | None | None | None | None |
2338 | 666104133288665088 | NaN | NaN | 2015-11-16 04:02:55 +0000 | <a href=”http://twitter.com/download/iphone” r… | Not familiar with this breed. No tail (weird)…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666104133… | 1 | 10 | None | None | None | None | None |
2339 | 666102155909144576 | NaN | NaN | 2015-11-16 03:55:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh my. Here you are seeing an Adobe Setter giv… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666102155… | 11 | 10 | None | None | None | None | None |
2340 | 666099513787052032 | NaN | NaN | 2015-11-16 03:44:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can stand on stump for what seems like a while… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666099513… | 8 | 10 | None | None | None | None | None |
2341 | 666094000022159362 | NaN | NaN | 2015-11-16 03:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This appears to be a Mongolian Presbyterian mi… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666094000… | 9 | 10 | None | None | None | None | None |
2342 | 666082916733198337 | NaN | NaN | 2015-11-16 02:38:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a well-established sunblockerspan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666082916… | 6 | 10 | None | None | None | None | None |
2343 | 666073100786774016 | NaN | NaN | 2015-11-16 01:59:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Let’s hope this flight isn’t Malaysian (lol). … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666073100… | 10 | 10 | None | None | None | None | None |
2344 | 666071193221509120 | NaN | NaN | 2015-11-16 01:52:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a northern speckled Rhododendron…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666071193… | 9 | 10 | None | None | None | None | None |
2345 | 666063827256086533 | NaN | NaN | 2015-11-16 01:22:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is the happiest dog you will ever see. Ve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666063827… | 10 | 10 | the | None | None | None | None |
2346 | 666058600524156928 | NaN | NaN | 2015-11-16 01:01:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is the Rand Paul of retrievers folks! He’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666058600… | 8 | 10 | the | None | None | None | None |
2347 | 666057090499244032 | NaN | NaN | 2015-11-16 00:55:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | My oh my. This is a rare blond Canadian terrie… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666057090… | 9 | 10 | a | None | None | None | None |
2348 | 666055525042405380 | NaN | NaN | 2015-11-16 00:49:46 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a Siberian heavily armored polar bear … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666055525… | 10 | 10 | a | None | None | None | None |
2349 | 666051853826850816 | NaN | NaN | 2015-11-16 00:35:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an odd dog. Hard on the outside but lo… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666051853… | 2 | 10 | an | None | None | None | None |
2350 | 666050758794694657 | NaN | NaN | 2015-11-16 00:30:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a truly beautiful English Wilson Staff… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666050758… | 10 | 10 | a | None | None | None | None |
2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a 1949 1st generation vulpix. Enj… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248… | 5 | 10 | None | None | None | None | None |
2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a purebred Piers Morgan. Loves to Netf… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226… | 6 | 10 | a | None | None | None | None |
2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a very happy pup. Big fan of well-main… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412… | 9 | 10 | a | None | None | None | None |
2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a western brown Mitsubishi terrier. Up… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285… | 7 | 10 | a | None | None | None | None |
2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a Japanese Irish Setter. Lost eye… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888… | 8 | 10 | None | None | None | None | None |
2356 rows × 17 columns
In [69]:
archive.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
In [29]:
# Count number of not 'None' values in columns 'doggo' to 'puppo'
(archive.loc[:,'doggo':'puppo'] != 'None').sum()
Out[29]:
doggo       97
floofer     10
pupper     257
puppo       30
dtype: int64
In [31]:
# Count number of cells of `text` with doggo, floofer, pupper, and puppo
for column in archive.columns[-4:]:
    print(column, archive.text.str.contains(column).sum())
doggo 98
floofer 4
pupper 272
puppo 37
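The column counts and the text counts disagree (for example 257 versus 272 for pupper). A minimal sketch of how the disagreeing rows could be listed follows. The frame `toy` and the helper `find_mismatches` are hypothetical stand-ins, and the toy rows are made up for illustration rather than taken from the archive.

```python
import pandas as pd

# Toy stand-in for `archive`; these rows are invented for illustration only
toy = pd.DataFrame({
    'text': ['Meet a pupper. 12/10',
             'Such a good doggo 13/10',
             'Two puppers here 11/10'],
    'pupper': ['pupper', 'None', 'None'],
    'doggo': ['None', 'doggo', 'None'],
})

def find_mismatches(df, column):
    """Return rows whose text mentions `column` while the column holds 'None'."""
    in_text = df['text'].str.contains(column, regex=False)
    in_column = df[column] != 'None'
    return df[in_text & ~in_column]

mismatched_pupper = find_mismatches(toy, 'pupper')
print(mismatched_pupper)
```

Run against the real `archive`, the same helper would show which tweets account for the difference between the two sets of counts.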
In [38]:
# Check if name is always captured
archive[['text', 'name']].sample(10)
Out[38]:
index | text | name |
---|---|---|
1128 | This is Stefan. He’s a downright remarkable pu… | Stefan |
2172 | Just got home from college. Dis my dog. She do… | None |
935 | This is Scout. Her batteries are low. 12/10 pr… | Scout |
518 | This is Pavlov. His floatation device has fail… | Pavlov |
1132 | When you’re way too slow for the “down low” po… | None |
1891 | These two pups are masters of camouflage. Very… | None |
684 | Atlas is back and this time he’s got doggles. … | None |
2269 | This a Norwegian Pewterschmidt named Tickles. … | None |
1583 | Army of water dogs here. None of them know whe… | None |
904 | This is Corey. He’s a Portobello Corgicool. Tr… | Corey |
In [39]:
# Identify example of missing name
archive.text[2269]
Out[39]:
'This a Norwegian Pewterschmidt named Tickles. Ears for days. 12/10 I care deeply for Tickles https://t.co/0aDF62KVP7'
In [19]:
# Identify example of two names
archive.text[2232]
These two dogs are Bo & Smittens. Smittens is trying out a new deodorant and wanted Bo to smell it. 10/10 true pals https://t.co/4pw1QQ6udh
In [79]:
archive.name.value_counts()
Out[79]:
None         745
a             55
Charlie       12
Oliver        11
Cooper        11
Lucy          11
Lola          10
Tucker        10
Penny         10
Bo             9
Winston        9
Sadie          8
the            8
an             7
Toby           7
Daisy          7
Bailey         7
Buddy          7
Jax            6
Scout          6
Bella          6
Oscar          6
Jack           6
Rusty          6
Stanley        6
Milo           6
Leo            6
Dave           6
Koda           6
Gus            5
            ...
Taco           1
Bert           1
Alexander      1
Rorie          1
Shikha         1
Snoop          1
old            1
Deacon         1
Grady          1
Yoda           1
Duchess        1
Ivar           1
Kathmandu      1
Sid            1
Dobby          1
Brudge         1
Sandra         1
Genevieve      1
Lillie         1
Dewey          1
Tedrick        1
Leonard        1
Bobby          1
Mookie         1
O              1
Rooney         1
Dook           1
Rinna          1
Kendall        1
Alfy           1
Name: name, Length: 957, dtype: int64
In [120]:
archive.rating_numerator.describe()
Out[120]:
count    2356.000000
mean       13.126486
std        45.876648
min         0.000000
25%        10.000000
50%        11.000000
75%        12.000000
max      1776.000000
Name: rating_numerator, dtype: float64
In [121]:
archive.rating_denominator.describe()
Out[121]:
count    2356.000000
mean       10.455433
std         6.745237
min         0.000000
25%        10.000000
50%        10.000000
75%        10.000000
max       170.000000
Name: rating_denominator, dtype: float64
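The summary statistics show a denominator as low as 0 and as high as 170, and a numerator as high as 1776, so some ratings deserve a closer look. A minimal sketch of how such rows could be pulled out is below. The toy frame and the numerator cutoff of 20 are assumptions for illustration, not values used elsewhere in this notebook.

```python
import pandas as pd

# Toy stand-in for `archive`; values are illustrative, not copied from the real file
toy = pd.DataFrame({
    'tweet_id': [1, 2, 3, 4],
    'rating_numerator': [13, 1, 1776, 12],
    'rating_denominator': [10, 2, 10, 0],
})

# Rows whose denominator is not the usual 10
odd_denominator = toy[toy['rating_denominator'] != 10]

# Rows whose numerator is unusually large (the cutoff of 20 is an arbitrary assumption)
large_numerator = toy[toy['rating_numerator'] > 20]

print(odd_denominator)
print(large_numerator)
```

Reading the `text` of the rows returned from the real `archive` would show whether the rating was extracted incorrectly or is genuinely unusual.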
predictions table
In [58]:
predictions
Out[58]:
index | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
5 | 666050758794694657 | https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg | 1 | Bernese_mountain_dog | 0.651137 | True | English_springer | 0.263788 | True | Greater_Swiss_Mountain_dog | 0.016199 | True |
6 | 666051853826850816 | https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg | 1 | box_turtle | 0.933012 | False | mud_turtle | 0.045885 | False | terrapin | 0.017885 | False |
7 | 666055525042405380 | https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg | 1 | chow | 0.692517 | True | Tibetan_mastiff | 0.058279 | True | fur_coat | 0.054449 | False |
8 | 666057090499244032 | https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg | 1 | shopping_cart | 0.962465 | False | shopping_basket | 0.014594 | False | golden_retriever | 0.007959 | True |
9 | 666058600524156928 | https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg | 1 | miniature_poodle | 0.201493 | True | komondor | 0.192305 | True | soft-coated_wheaten_terrier | 0.082086 | True |
10 | 666063827256086533 | https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg | 1 | golden_retriever | 0.775930 | True | Tibetan_mastiff | 0.093718 | True | Labrador_retriever | 0.072427 | True |
11 | 666071193221509120 | https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg | 1 | Gordon_setter | 0.503672 | True | Yorkshire_terrier | 0.174201 | True | Pekinese | 0.109454 | True |
12 | 666073100786774016 | https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg | 1 | Walker_hound | 0.260857 | True | English_foxhound | 0.175382 | True | Ibizan_hound | 0.097471 | True |
13 | 666082916733198337 | https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg | 1 | pug | 0.489814 | True | bull_mastiff | 0.404722 | True | French_bulldog | 0.048960 | True |
14 | 666094000022159362 | https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg | 1 | bloodhound | 0.195217 | True | German_shepherd | 0.078260 | True | malinois | 0.075628 | True |
15 | 666099513787052032 | https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg | 1 | Lhasa | 0.582330 | True | Shih-Tzu | 0.166192 | True | Dandie_Dinmont | 0.089688 | True |
16 | 666102155909144576 | https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg | 1 | English_setter | 0.298617 | True | Newfoundland | 0.149842 | True | borzoi | 0.133649 | True |
17 | 666104133288665088 | https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg | 1 | hen | 0.965932 | False | cock | 0.033919 | False | partridge | 0.000052 | False |
18 | 666268910803644416 | https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg | 1 | desktop_computer | 0.086502 | False | desk | 0.085547 | False | bookcase | 0.079480 | False |
19 | 666273097616637952 | https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg | 1 | Italian_greyhound | 0.176053 | True | toy_terrier | 0.111884 | True | basenji | 0.111152 | True |
20 | 666287406224695296 | https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg | 1 | Maltese_dog | 0.857531 | True | toy_poodle | 0.063064 | True | miniature_poodle | 0.025581 | True |
21 | 666293911632134144 | https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg | 1 | three-toed_sloth | 0.914671 | False | otter | 0.015250 | False | great_grey_owl | 0.013207 | False |
22 | 666337882303524864 | https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg | 1 | ox | 0.416669 | False | Newfoundland | 0.278407 | True | groenendael | 0.102643 | True |
23 | 666345417576210432 | https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg | 1 | golden_retriever | 0.858744 | True | Chesapeake_Bay_retriever | 0.054787 | True | Labrador_retriever | 0.014241 | True |
24 | 666353288456101888 | https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg | 1 | malamute | 0.336874 | True | Siberian_husky | 0.147655 | True | Eskimo_dog | 0.093412 | True |
25 | 666362758909284353 | https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg | 1 | guinea_pig | 0.996496 | False | skunk | 0.002402 | False | hamster | 0.000461 | False |
26 | 666373753744588802 | https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg | 1 | soft-coated_wheaten_terrier | 0.326467 | True | Afghan_hound | 0.259551 | True | briard | 0.206803 | True |
27 | 666396247373291520 | https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg | 1 | Chihuahua | 0.978108 | True | toy_terrier | 0.009397 | True | papillon | 0.004577 | True |
28 | 666407126856765440 | https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg | 1 | black-and-tan_coonhound | 0.529139 | True | bloodhound | 0.244220 | True | flat-coated_retriever | 0.173810 | True |
29 | 666411507551481857 | https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg | 1 | coho | 0.404640 | False | barracouta | 0.271485 | False | gar | 0.189945 | False |
… | … | … | … | … | … | … | … | … | … | … | … | … |
2045 | 886366144734445568 | https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg | 1 | French_bulldog | 0.999201 | True | Chihuahua | 0.000361 | True | Boston_bull | 0.000076 | True |
2046 | 886680336477933568 | https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg | 1 | convertible | 0.738995 | False | sports_car | 0.139952 | False | car_wheel | 0.044173 | False |
2047 | 886736880519319552 | https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg | 1 | kuvasz | 0.309706 | True | Great_Pyrenees | 0.186136 | True | Dandie_Dinmont | 0.086346 | True |
2048 | 886983233522544640 | https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg | 2 | Chihuahua | 0.793469 | True | toy_terrier | 0.143528 | True | can_opener | 0.032253 | False |
2049 | 887101392804085760 | https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg | 1 | Samoyed | 0.733942 | True | Eskimo_dog | 0.035029 | True | Staffordshire_bullterrier | 0.029705 | True |
2050 | 887343217045368832 | https://pbs.twimg.com/ext_tw_video_thumb/88734… | 1 | Mexican_hairless | 0.330741 | True | sea_lion | 0.275645 | False | Weimaraner | 0.134203 | True |
2051 | 887473957103951883 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2052 | 887517139158093824 | https://pbs.twimg.com/ext_tw_video_thumb/88751… | 1 | limousine | 0.130432 | False | tow_truck | 0.029175 | False | shopping_cart | 0.026321 | False |
2053 | 887705289381826560 | https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg | 1 | basset | 0.821664 | True | redbone | 0.087582 | True | Weimaraner | 0.026236 | True |
2054 | 888078434458587136 | https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg | 1 | French_bulldog | 0.995026 | True | pug | 0.000932 | True | bull_mastiff | 0.000903 | True |
2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2056 | 888554962724278272 | https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg | 3 | Siberian_husky | 0.700377 | True | Eskimo_dog | 0.166511 | True | malamute | 0.111411 | True |
2057 | 888804989199671297 | https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg | 1 | golden_retriever | 0.469760 | True | Labrador_retriever | 0.184172 | True | English_setter | 0.073482 | True |
2058 | 888917238123831296 | https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg | 1 | golden_retriever | 0.714719 | True | Tibetan_mastiff | 0.120184 | True | Labrador_retriever | 0.105506 | True |
2059 | 889278841981685760 | https://pbs.twimg.com/ext_tw_video_thumb/88927… | 1 | whippet | 0.626152 | True | borzoi | 0.194742 | True | Saluki | 0.027351 | True |
2060 | 889531135344209921 | https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg | 1 | golden_retriever | 0.953442 | True | Labrador_retriever | 0.013834 | True | redbone | 0.007958 | True |
2061 | 889638837579907072 | https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg | 1 | French_bulldog | 0.991650 | True | boxer | 0.002129 | True | Staffordshire_bullterrier | 0.001498 | True |
2062 | 889665388333682689 | https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg | 1 | Pembroke | 0.966327 | True | Cardigan | 0.027356 | True | basenji | 0.004633 | True |
2063 | 889880896479866881 | https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg | 1 | French_bulldog | 0.377417 | True | Labrador_retriever | 0.151317 | True | muzzle | 0.082981 | False |
2064 | 890006608113172480 | https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg | 1 | Samoyed | 0.957979 | True | Pomeranian | 0.013884 | True | chow | 0.008167 | True |
2065 | 890240255349198849 | https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg | 1 | Pembroke | 0.511319 | True | Cardigan | 0.451038 | True | Chihuahua | 0.029248 | True |
2066 | 890609185150312448 | https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg | 1 | Irish_terrier | 0.487574 | True | Irish_setter | 0.193054 | True | Chesapeake_Bay_retriever | 0.118184 | True |
2067 | 890729181411237888 | https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg | 2 | Pomeranian | 0.566142 | True | Eskimo_dog | 0.178406 | True | Pembroke | 0.076507 | True |
2068 | 890971913173991426 | https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg | 1 | Appenzeller | 0.341703 | True | Border_collie | 0.199287 | True | ice_lolly | 0.193548 | False |
2069 | 891087950875897856 | https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg | 1 | Chesapeake_Bay_retriever | 0.425595 | True | Irish_terrier | 0.116317 | True | Indian_elephant | 0.076902 | False |
2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
Does this give me all of the images I want? Could I just use the IDs to subset archive?
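One way to explore that question is a left merge with an indicator column, which shows which archive tweets have a matching row in predictions. This is a minimal sketch on toy frames; the IDs are placeholders and the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for `archive` and `predictions`; the IDs are placeholders
toy_archive = pd.DataFrame({'tweet_id': [101, 102, 103, 104]})
toy_predictions = pd.DataFrame({'tweet_id': [101, 103]})

# indicator=True adds a `_merge` column: 'both' or 'left_only' for a left merge
merged = toy_archive.merge(toy_predictions, on='tweet_id', how='left', indicator=True)

# Archive tweets with an image prediction, and those without one
archive_with_images = merged[merged['_merge'] == 'both']
missing_images = merged[merged['_merge'] == 'left_only']

print(len(archive_with_images), len(missing_images))
```

Matching on ID alone would not remove retweets: the retweet 888202515573088257 (archive index 19) also appears in predictions at index 2055.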
In [59]:
predictions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
api_data table
In [31]:
api_data
Out[31]:
index | tweet_id | retweet_count | favorite_count | followers_count |
---|---|---|---|---|
0 | 892420643555336193 | 8559 | 38695 | 6989325 |
1 | 892177421306343426 | 6293 | 33168 | 6989325 |
2 | 891815181378084864 | 4176 | 24966 | 6989325 |
3 | 891689557279858688 | 8683 | 42080 | 6989325 |
4 | 891327558926688256 | 9451 | 40228 | 6989325 |
5 | 891087950875897856 | 3127 | 20175 | 6989325 |
6 | 890971913173991426 | 2082 | 11820 | 6989325 |
7 | 890729181411237888 | 18984 | 65374 | 6989325 |
8 | 890609185150312448 | 4281 | 27728 | 6989325 |
9 | 890240255349198849 | 7453 | 31873 | 6989325 |
10 | 890006608113172480 | 7367 | 30589 | 6989325 |
11 | 889880896479866881 | 4993 | 27727 | 6989325 |
12 | 889665388333682689 | 10110 | 48021 | 6989325 |
13 | 889638837579907072 | 4567 | 27117 | 6989325 |
14 | 889531135344209921 | 2243 | 15066 | 6989325 |
15 | 889278841981685760 | 5452 | 25246 | 6989325 |
16 | 888917238123831296 | 4516 | 29016 | 6989325 |
17 | 888804989199671297 | 4365 | 25537 | 6989325 |
18 | 888554962724278272 | 3597 | 19860 | 6989326 |
19 | 888078434458587136 | 3511 | 21707 | 6989326 |
20 | 887705289381826560 | 5417 | 30117 | 6989326 |
21 | 887517139158093824 | 11719 | 46132 | 6989326 |
22 | 887473957103951883 | 18314 | 68960 | 6989326 |
23 | 887343217045368832 | 10451 | 33602 | 6989326 |
24 | 887101392804085760 | 5988 | 30459 | 6989326 |
25 | 886983233522544640 | 7809 | 35074 | 6989326 |
26 | 886736880519319552 | 3314 | 12043 | 6989326 |
27 | 886680336477933568 | 4489 | 22367 | 6989326 |
28 | 886366144734445568 | 3209 | 21150 | 6989326 |
29 | 886267009285017600 | 4 | 116 | 6989326 |
… | … | … | … | … |
2315 | 666411507551481857 | 328 | 448 | 6989510 |
2316 | 666407126856765440 | 41 | 110 | 6989510 |
2317 | 666396247373291520 | 86 | 166 | 6989510 |
2318 | 666373753744588802 | 93 | 189 | 6989510 |
2319 | 666362758909284353 | 574 | 779 | 6989510 |
2320 | 666353288456101888 | 73 | 221 | 6989510 |
2321 | 666345417576210432 | 139 | 298 | 6989510 |
2322 | 666337882303524864 | 92 | 199 | 6989510 |
2323 | 666293911632134144 | 357 | 509 | 6989510 |
2324 | 666287406224695296 | 66 | 148 | 6989510 |
2325 | 666273097616637952 | 76 | 175 | 6989511 |
2326 | 666268910803644416 | 35 | 104 | 6989510 |
2327 | 666104133288665088 | 6637 | 14353 | 6989510 |
2328 | 666102155909144576 | 13 | 80 | 6989510 |
2329 | 666099513787052032 | 68 | 156 | 6989510 |
2330 | 666094000022159362 | 74 | 164 | 6989510 |
2331 | 666082916733198337 | 45 | 119 | 6989510 |
2332 | 666073100786774016 | 164 | 322 | 6989510 |
2333 | 666071193221509120 | 62 | 148 | 6989510 |
2334 | 666063827256086533 | 220 | 476 | 6989510 |
2335 | 666058600524156928 | 57 | 112 | 6989510 |
2336 | 666057090499244032 | 142 | 298 | 6989510 |
2337 | 666055525042405380 | 252 | 434 | 6989510 |
2338 | 666051853826850816 | 853 | 1224 | 6989510 |
2339 | 666050758794694657 | 58 | 132 | 6989511 |
2340 | 666049248165822465 | 41 | 109 | 6989511 |
2341 | 666044226329800704 | 141 | 298 | 6989511 |
2342 | 666033412701032449 | 45 | 125 | 6989511 |
2343 | 666029285002620928 | 47 | 129 | 6989510 |
2344 | 666020888022790149 | 517 | 2560 | 6989510 |
2345 rows × 4 columns
In [33]:
api_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2345 entries, 0 to 2344
Data columns (total 4 columns):
tweet_id           2345 non-null int64
retweet_count      2345 non-null int64
favorite_count     2345 non-null int64
followers_count    2345 non-null int64
dtypes: int64(4)
memory usage: 73.4 KB
Data Inclusion Criteria
We are expected to use the following criteria to select the required data:
- Do not include retweets
- Only tweets that have images
An additional comment was made in student discussions: the archive also has reply tweets, which in general contain upgraded or downgraded ratings of a dog. This means that in some cases there are two observations/ratings for the same dog. As a result, I decided to include only original ratings and so added one more criterion:
- Do not include replies
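The three criteria above can be expressed as boolean masks. The sketch below is one possible way to combine them, shown on toy frames with placeholder values; it assumes a retweet or reply is recognizable by a non-null retweeted_status_id or in_reply_to_status_id, and that a tweet has an image if its ID appears in predictions.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for `archive` and `predictions`; all values are placeholders
toy_archive = pd.DataFrame({
    'tweet_id': [1, 2, 3, 4],
    'retweeted_status_id': [np.nan, 8.87e17, np.nan, np.nan],
    'in_reply_to_status_id': [np.nan, np.nan, 6.7e17, np.nan],
})
toy_predictions = pd.DataFrame({'tweet_id': [1, 2, 3]})

is_original = toy_archive['retweeted_status_id'].isna()      # not a retweet
is_not_reply = toy_archive['in_reply_to_status_id'].isna()   # not a reply
has_image = toy_archive['tweet_id'].isin(toy_predictions['tweet_id'])

selected = toy_archive[is_original & is_not_reply & has_image]
print(selected)
```

Keeping each criterion as its own named mask makes it easy to count how many tweets each rule removes.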
Findings
Quality
archive table
- Retweets are included in the dataset
- Replies are included in the dataset
- Erroneous datatypes (tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, doggo, floofer, pupper, and puppo columns)
- Missing info in expanded_urls
- Nulls represented as “None” (str) for name, doggo, floofer, pupper, and puppo columns
- Missing counts for doggo, floofer, pupper and puppo
- Missing names identified from text in name, e.g. index 1852 – Reggie
- Some names identified are not names
- text column includes both text and short version of link
- Second name missing if two are mentioned, e.g. index 2232 – Bo & Smittens
- Some extracted values for rating_numerator and rating_denominator seem to be in error
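The datatype issues listed above could be handled roughly as follows. This is a minimal sketch on two rows copied from the archive table; storing tweet_id as a string and passing `utc=True` are assumptions for illustration rather than steps shown in this notebook.

```python
import pandas as pd

# Two rows copied from the archive table (only the relevant columns)
toy = pd.DataFrame({
    'tweet_id': [888804989199671297, 888554962724278272],
    'timestamp': ['2017-07-22 16:56:37 +0000', '2017-07-22 00:23:06 +0000'],
})

# IDs are labels, not quantities, so store them as strings
toy['tweet_id'] = toy['tweet_id'].astype(str)

# Parse the timestamp strings into timezone-aware datetimes
toy['timestamp'] = pd.to_datetime(toy['timestamp'], utc=True)

print(toy.dtypes)
```

The same two calls would apply to the ID and timestamp columns of the other tables.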
predictions table
- Erroneous datatype (tweet_id)
- The lower number of entries means that some posts don’t have images
api_data table
- Erroneous datatype (tweet_id)
- Retweet and favorite information is not available for all tweets and cannot be retrieved
Tidiness
archive table
- There are multiple columns containing the same type of data, e.g. doggo, floofer, pupper and puppo all contain dog types
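One possible way to tidy the four dog-type columns is to collapse them into a single column. The sketch below assumes missing values are already stored as NaN rather than the string “None”; the column name `stage`, the helper `combine_stages`, and the toy rows are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cleaned archive; rows are invented for illustration
toy = pd.DataFrame({
    'tweet_id': ['1', '2', '3'],
    'doggo': ['doggo', np.nan, np.nan],
    'floofer': [np.nan, np.nan, np.nan],
    'pupper': [np.nan, 'pupper', np.nan],
    'puppo': [np.nan, np.nan, np.nan],
})

stage_columns = ['doggo', 'floofer', 'pupper', 'puppo']

def combine_stages(row):
    """Join every non-missing dog type in the row; return NaN if there is none."""
    found = [value for value in row if pd.notna(value)]
    return ', '.join(found) if found else np.nan

toy['stage'] = toy[stage_columns].apply(combine_stages, axis=1)
toy = toy.drop(columns=stage_columns)
print(toy)
```

Joining with a comma keeps tweets that mention more than one dog type instead of silently picking one.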
predictions table
- There are multiple columns containing the same type of data, e.g. p1, p2, p3 all contain dog breed predictions
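A possible way to reduce the three prediction groups to one is to keep the highest-ranked prediction that is actually a dog. The sketch below uses three rows copied from the predictions table (indexes 0, 8 and 6); the helper `first_dog_prediction` and the output column names `breed` and `confidence` are hypothetical.

```python
import numpy as np
import pandas as pd

# Three rows copied from the predictions table (jpg_url and img_num omitted)
toy = pd.DataFrame({
    'tweet_id': [666020888022790149, 666057090499244032, 666051853826850816],
    'p1': ['Welsh_springer_spaniel', 'shopping_cart', 'box_turtle'],
    'p1_conf': [0.465074, 0.962465, 0.933012],
    'p1_dog': [True, False, False],
    'p2': ['collie', 'shopping_basket', 'mud_turtle'],
    'p2_conf': [0.156665, 0.014594, 0.045885],
    'p2_dog': [True, False, False],
    'p3': ['Shetland_sheepdog', 'golden_retriever', 'terrapin'],
    'p3_conf': [0.061428, 0.007959, 0.017885],
    'p3_dog': [True, True, False],
})

def first_dog_prediction(row):
    """Return the first of p1, p2, p3 flagged as a dog, or NaN if none is."""
    for n in (1, 2, 3):
        if row[f'p{n}_dog']:
            return pd.Series({'breed': row[f'p{n}'], 'confidence': row[f'p{n}_conf']})
    return pd.Series({'breed': np.nan, 'confidence': np.nan})

breeds = toy.apply(first_dog_prediction, axis=1)
tidy = pd.concat([toy[['tweet_id']], breeds], axis=1)
print(tidy)
```

The second row shows the trade-off: the only dog prediction for the shopping-cart image has a confidence below 1%, so a minimum confidence threshold may be worth considering.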
api_data table
- This data is separate from the other tweet data
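Joining the API counts onto the archive by tweet_id would address this. The sketch below is a minimal example, assuming a left merge is wanted so that tweets without API data are kept. The first two IDs and their counts are copied from the api_data table; the third ID is a placeholder.

```python
import pandas as pd

# The first two IDs come from the api_data table; 111 is a placeholder with no API data
toy_archive = pd.DataFrame({
    'tweet_id': [892420643555336193, 892177421306343426, 111],
})
toy_api = pd.DataFrame({
    'tweet_id': [892420643555336193, 892177421306343426],
    'retweet_count': [8559, 6293],
    'favorite_count': [38695, 33168],
})

# A left merge keeps every archive row; missing counts become NaN
combined = toy_archive.merge(toy_api, on='tweet_id', how='left')
print(combined)
```

Using `how='inner'` instead would drop the tweets whose retweet and favorite information could not be retrieved.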
Clean the Data
In [7]:
# Make copies to preserve the original datasets
archive_clean = archive.copy()
predictions_clean = predictions.copy()
api_data_clean = api_data.copy()
Missing Data
There are four areas of missing data identified:
- Missing info in expanded_urls
- Missing counts for doggo, floofer, pupper and puppo
- Missing names identified from text in name, e.g. index 1852 – Reggie
- Second name missing if two are mentioned, e.g. index 2232 – Bo & Smittens
I am not concerned about tracking down the missing url information because I don’t plan to analyze it.
Missing counts for doggo, floofer, pupper and puppo in archive table
The issue of Nulls represented as “None” (str) for the doggo, floofer, pupper, and puppo columns can also be addressed here.
Define
Use a for loop and .str.contains() to check whether text contains each column header. If it does, store the dog type; if not, store NaN.
Code
In [8]:
dog_types = list(archive_clean.iloc[:,-4:])
dog_types
Out[8]:
['doggo', 'floofer', 'pupper', 'puppo']
In [9]:
def find_dog_type(df, dog_type):
    # Build one entry per row: the dog type if it appears in the text, otherwise NaN
    dog_list = []
    for row in df['text']:
        if dog_type in row:
            dog_list.append(dog_type)
        else:
            dog_list.append(np.nan)
    return dog_list
In [10]:
for dog_type in dog_types:
    archive_clean[dog_type] = find_dog_type(archive, dog_type)
Test
In [78]:
# Check non-null data counts for columns archive_clean[dog_types].info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2356 entries, 0 to 2355 Data columns (total 4 columns): doggo 98 non-null object floofer 4 non-null object pupper 272 non-null object puppo 37 non-null object dtypes: object(4) memory usage: 73.7+ KB
In [45]:
# Compare to counts from text
for dog_type in dog_types:
    print(dog_type, archive_clean.text.str.contains(dog_type).sum())
doggo 98
floofer 4
pupper 272
puppo 37
The counts found in the text strings match the counts found in the columns.
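The same re-identification can also be written without looping over rows. The sketch below is a minimal vectorized version on a made-up three-row frame (the `toy` data is hypothetical, not taken from the archive). It assumes that a plain substring match, as in find_dog_type, is the intended behavior.

```python
import pandas as pd

# Hypothetical sample rows standing in for archive_clean['text']
toy = pd.DataFrame({"text": [
    "This is a happy doggo. 12/10",
    "Here is a pupper meeting a doggo. 11/10",
    "No stage mentioned here. 10/10",
]})

dog_types = ["doggo", "floofer", "pupper", "puppo"]

for dog_type in dog_types:
    # regex=False gives a plain substring test, like `dog_type in row`
    found = toy["text"].str.contains(dog_type, regex=False)
    # Keep the label where the text mentions it; other rows become missing
    toy[dog_type] = pd.Series(dog_type, index=toy.index).where(found)

print(toy[dog_types].notna().sum())
```

The non-null count per column plays the same role as the .info() check in the Test step.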
Missing names identified from text in name in archive table
The issues of some identified names not being names, and of nulls represented as “None” (str) for name, can also be addressed here.
This would also be the place to address the second name missing when two are mentioned; however, I decided that this was too difficult for me to do.
Define
Create a function to identify pet names and re-populate the name column.
Code
Pet names are capitalized, usually no more than 10 characters but at least 2, and typically found before the first period. They typically include only letters and apostrophes, and certain words are not usually used as pet names.
In [11]:
def find_names(df):
    # Capitalized words that appear before the first period but are not names
    other_words = ["This", "Xbox", "Oh", "Christmas", "Up", "Pupper", "Doggo", "Puppo", "Floofer"]
    allowed_chars = string.ascii_letters + "'"
    name_list = []
    for row in df['text']:
        # Find first "."
        first_period = row.find(".")
        # If no period is found, assume there is no name
        if first_period == -1:
            name_list.append(np.nan)
            continue
        # Find word before period
        word_before = row[:first_period].rsplit(' ', 1)[-1]
        # Exclusion criteria: not capitalized, more than 10 letters, fewer than 2 letters,
        # contains one of other_words, or has characters other than letters and apostrophes
        if (word_before != word_before.title()
                or len(word_before) > 10
                or len(word_before) < 2
                or any(word in word_before for word in other_words)
                or any(c not in allowed_chars for c in word_before)):
            name_list.append(np.nan)
        else:
            name_list.append(word_before)
    return name_list
In [12]:
name_list = find_names(archive_clean)
archive_clean.name = name_list
Test
In [101]:
# View names and NaNs
archive_clean.name.head(10)
Out[101]:
0     Phineas
1       Tilly
2      Archie
3       Darla
4    Franklin
5         NaN
6         Jax
7         NaN
8        Zoey
9      Cassie
Name: name, dtype: object
In [157]:
# Check value counts for unexpected names
archive_clean.name.value_counts()
Out[157]:
Charlie 14 Oliver 12 Cooper 11 Tucker 10 Lola 10 Lucy 10 Winston 9 Penny 9 Daisy 8 Bailey 7 Buddy 7 Bo 7 Toby 6 Scout 6 Sadie 6 Bella 6 Rusty 6 Stanley 6 Leo 6 Dave 6 Milo 6 Koda 6 Loki 5 Louis 5 Sophie 5 Jax 5 Gus 5 Larry 5 Ruby 5 Oscar 5 .. Jeffri 1 Lilah 1 Fwed 1 Bert 1 Rorie 1 Rinna 1 Pubert 1 Rooney 1 Kathmandu 1 Pumpkin 1 Darby 1 Apollo 1 Gustav 1 Banjo 1 Jeremy 1 Yoda 1 Duchess 1 Ivar 1 Hemry 1 Mookie 1 Dobby 1 Brudge 1 Sandra 1 Genevieve 1 Grady 1 Lillie 1 Tedrick 1 Leonard 1 Striker 1 Alfy 1 Name: name, Length: 961, dtype: int64
In [135]:
# Visually compare sample of results
archive_clean[['text', 'name']].sample(10)
Out[135]:
index | text | name |
---|---|---|
2188 | This is Jeremy. He hasn’t grown into his skin … | Jeremy |
312 | Meet Lola. Her hobbies include being precious … | Lola |
2130 | This is Wally. He’s a Flaccid Mitochondria. Go… | Wally |
971 | Meet Lilah. She agreed on one quick pic. Now s… | Lilah |
2353 | Here is a very happy pup. Big fan of well-main… | NaN |
2280 | This is Fwed. He is a Canadian Asian Taylormad… | Fwed |
942 | This is Grizzie. She’s a semi-submerged Bahrai… | Grizzie |
732 | Idk why this keeps happening. We only rate dog… | NaN |
2178 | Super rare dog right here guys. Doesn’t bark. … | NaN |
322 | This is Sunshine. She doesn’t believe in perso… | Sunshine |
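For the second-name case that was left unaddressed (two dogs mentioned, as in the Bo & Smittens example), one possible starting point is a regular expression that looks for two capitalized words joined by "&" or "and". This is only a sketch: the pattern, the function name find_name_pair and the sample sentences are assumptions made for illustration, and the pattern will also match capitalized pairs that are not pet names.

```python
import re

# Hypothetical pattern: two capitalized words joined by "&" or "and"
PAIR = re.compile(r"\b([A-Z][a-z']+) (?:&|and) ([A-Z][a-z']+)\b")


def find_name_pair(text):
    """Return (first, second) if the text mentions a name pair, else None."""
    match = PAIR.search(text)
    return match.groups() if match else None


# Made-up sentences in the style of the tweets
print(find_name_pair("This is Bo & Smittens. They love each other."))
print(find_name_pair("This is Reggie. He hugs everyone he meets."))
```

A result of None would leave the existing single-name logic in charge of that row.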
Tidy Data
The next step is to address tidiness issues. Three were identified:
- There are multiple columns containing the same type of data in the archive table, e.g. doggo, floofer, pupper, puppo
- There are multiple columns containing the same type of data in the predictions table, e.g. p1, p2, p3 all contain dog breed predictions
- The tweet data in the api_data table is separate from the other tweet data
Multiple columns containing the same type of data in the archive table
There is a small amount of overlap, but I would rather the posts be classified once.
Define
Create a column called dog_type and merge all data in order of puppo, pupper, floofer, doggo using .fillna(). Drop the redundant columns.
Code
In [13]:
archive_clean['dog_type'] = archive_clean.puppo.fillna(archive_clean.pupper.fillna(archive_clean.floofer.fillna(archive_clean.doggo)))
In [14]:
archive_clean.drop(['doggo', 'floofer', 'pupper', 'puppo'], axis=1, inplace=True)
Test
In [146]:
# Confirm NaNs remain
archive_clean.dog_type.head(10)
Out[146]:
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9    doggo
Name: dog_type, dtype: object
In [147]:
# Check dog_type counts
archive_clean.dog_type.value_counts()
Out[147]:
pupper     272
doggo       86
puppo       37
floofer      4
Name: dog_type, dtype: int64
The original count was:
- doggo 98
- floofer 4
- pupper 272
- puppo 37
So only 12 counts were lost, all from doggo (tweets that also mention a higher-priority type), which seems acceptable to me.
In [139]:
# Confirm column drop
archive_clean.columns
Out[139]:
Index(['tweet_id', 'in_reply_to_status_id', 'in_reply_to_user_id', 'timestamp', 'source', 'text', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp', 'expanded_urls', 'rating_numerator', 'rating_denominator', 'name', 'dog_type'], dtype='object')
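To make the priority order of the chained .fillna() concrete, here is a small sketch on a made-up frame (the `toy` rows are hypothetical). It applies the same puppo, pupper, floofer, doggo order and also flags rows where more than one type is present, which is where counts such as the 12 doggo rows are lost.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; dtype=object keeps every column able to hold strings
toy = pd.DataFrame({
    "doggo":   ["doggo", np.nan, "doggo", np.nan],
    "floofer": [np.nan, np.nan, np.nan, np.nan],
    "pupper":  [np.nan, "pupper", "pupper", np.nan],
    "puppo":   [np.nan, np.nan, np.nan, np.nan],
}, dtype=object)

order = ["puppo", "pupper", "floofer", "doggo"]

# Start from the highest-priority column and fill gaps from the next ones
dog_type = toy[order[0]]
for col in order[1:]:
    dog_type = dog_type.fillna(toy[col])

# Rows with more than one type are the ones that lose a label
overlap = toy[order].notna().sum(axis=1) > 1

print(dog_type.tolist())
print(overlap.tolist())
```

In the third row both doggo and pupper are present, and pupper wins because it comes earlier in the order.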
Multiple columns containing the same type of data in the predictions table
Define
Change column names for ease of use with pd.wide_to_long. Use pd.wide_to_long to
- melt p1_conf, p2_conf, p3_conf to a confidence column
- melt p1, p2, p3 to a prediction column
- melt p1_dog, p2_dog, p3_dog to a dog column.
Code
In [15]:
# Change column names
col_names = ['tweet_id', 'jpg_url', 'img_num',
             'prediction_1', 'confidence_1', 'dog_1',
             'prediction_2', 'confidence_2', 'dog_2',
             'prediction_3', 'confidence_3', 'dog_3']
predictions_clean.columns = col_names
In [16]:
# Convert wide to long
predictions_clean = pd.wide_to_long(predictions_clean,
                                    stubnames=['prediction', 'confidence', 'dog'],
                                    i=['tweet_id', 'jpg_url', 'img_num'],
                                    j='prediction_order',
                                    sep='_')\
    .reset_index()
Test
In [189]:
# Visual inspection
predictions_clean.head(9)
Out[189]:
index | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | 1 | Welsh_springer_spaniel | 0.465074 | True |
1 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | 2 | collie | 0.156665 | True |
2 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | 3 | Shetland_sheepdog | 0.061428 | True |
3 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | 1 | redbone | 0.506826 | True |
4 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | 2 | miniature_pinscher | 0.074192 | True |
5 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | 3 | Rhodesian_ridgeback | 0.072010 | True |
6 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | 1 | German_shepherd | 0.596461 | True |
7 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | 2 | malinois | 0.138584 | True |
8 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | 3 | bloodhound | 0.116197 | True |
In [190]:
predictions_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6225 entries, 0 to 6224
Data columns (total 7 columns):
tweet_id            6225 non-null int64
jpg_url             6225 non-null object
img_num             6225 non-null int64
prediction_order    6225 non-null object
prediction          6225 non-null object
confidence          6225 non-null float64
dog                 6225 non-null bool
dtypes: bool(1), float64(1), int64(2), object(3)
memory usage: 298.0+ KB
In [191]:
# Compare count to original counts
6225/2075
Out[191]:
3.0
Given that there are three predictions for each, it is expected that the length would increase by three times. This is what has occurred.
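The reshaping can be illustrated on a tiny frame. This sketch uses made-up values and only two predictions per tweet (the `wide` frame is hypothetical), but the call follows the same stubnames / i / j / sep pattern as above, and the row count grows by the number of predictions per tweet.

```python
import pandas as pd

# Hypothetical wide table: one row per tweet, two predictions each
wide = pd.DataFrame({
    "tweet_id": [1, 2],
    "prediction_1": ["collie", "redbone"],
    "confidence_1": [0.6, 0.5],
    "prediction_2": ["malinois", "beagle"],
    "confidence_2": [0.2, 0.1],
})

# Columns named <stub>_<number> are stacked into one column per stub
long = pd.wide_to_long(wide,
                       stubnames=["prediction", "confidence"],
                       i="tweet_id",
                       j="prediction_order",
                       sep="_").reset_index()

print(long.sort_values(["tweet_id", "prediction_order"]))

# Every tweet should now have exactly one row per prediction
print(long.groupby("tweet_id").size())
```

The per-tweet size check is a stricter version of dividing the new length by the old one.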
Tweet data in the api_data table is separate from the other tweet data
Define
Merge the data from api_data
with the archive
table
Code
In [17]:
archive_clean = pd.merge(left=archive_clean, right=api_data_clean, how='left', on='tweet_id')
Test
In [194]:
archive_clean.head()
Out[194]:
index | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | dog_type | retweet_count | favorite_count | followers_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | NaN | 8561.0 | 38696.0 | 6984446.0 |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | NaN | 6295.0 | 33171.0 | 6984446.0 |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | NaN | 4176.0 | 24967.0 | 6984446.0 |
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | NaN | 8683.0 | 42087.0 | 6984446.0 |
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | NaN | 9453.0 | 40235.0 | 6984446.0 |
In [82]:
archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          1531 non-null object
dog_type                      399 non-null object
retweet_count                 2345 non-null float64
favorite_count                2345 non-null float64
followers_count               2345 non-null float64
dtypes: float64(7), int64(3), object(7)
memory usage: 331.3+ KB
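The left merge keeps every archive row, so tweets with no API data end up with NaN counts, which is also why the count columns become floats. The sketch below shows this on made-up frames (`left` and `right` are hypothetical) and uses the indicator option of pd.merge to list the unmatched ids. Converting to the nullable Int64 dtype is an optional extra, not something the notebook does.

```python
import pandas as pd

# Hypothetical stand-ins for archive_clean and api_data_clean
left = pd.DataFrame({"tweet_id": [1, 2, 3], "text": ["a", "b", "c"]})
right = pd.DataFrame({"tweet_id": [1, 3], "retweet_count": [10, 30]})

# indicator=True adds a _merge column saying where each row came from
merged = pd.merge(left, right, how="left", on="tweet_id", indicator=True)

# Rows present only in the left table have no API data
missing = merged.loc[merged["_merge"] == "left_only", "tweet_id"]
print(missing.tolist())

# The NaN forces the integer counts to float64
print(merged["retweet_count"].dtype)

# Optional: nullable integers keep whole numbers and still allow missing values
counts = merged["retweet_count"].astype("Int64")
print(counts)
```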
Data Quality
Some posts don’t have images
Define
Remove any tweet ids in the archive table that aren’t in the predictions table.
Code
In [18]:
# Confirm the number to be removed
no_image = (~archive_clean.tweet_id.isin(list(predictions_clean.tweet_id)))
no_image.sum()
Out[18]:
281
In [19]:
# Remove non-shared tweet_id's
archive_clean = archive_clean[~no_image]
Test
In [65]:
# Confirm no tweet_id's without images
(~archive_clean.tweet_id.isin(list(predictions_clean.tweet_id))).sum()
Out[65]:
0
In [86]:
# Confirm new archive_clean counts
archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2075 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2075 non-null int64
in_reply_to_status_id         23 non-null float64
in_reply_to_user_id           23 non-null float64
timestamp                     2075 non-null object
source                        2075 non-null object
text                          2075 non-null object
retweeted_status_id           81 non-null float64
retweeted_status_user_id      81 non-null float64
retweeted_status_timestamp    81 non-null object
expanded_urls                 2075 non-null object
rating_numerator              2075 non-null int64
rating_denominator            2075 non-null int64
name                          1422 non-null object
dog_type                      338 non-null object
retweet_count                 2069 non-null float64
favorite_count                2069 non-null float64
followers_count               2069 non-null float64
dtypes: float64(7), int64(3), object(7)
memory usage: 291.8+ KB
Replies and retweets are included in archive table
Define
- Identify rows that have info for in_reply_to_status_id or retweeted_status_id and remove from archive_clean.
- Remove redundant columns (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp).
- Remove non-shared ids from predictions_clean
Code
In [20]:
# Check rows to remove for replies
replies = (~archive_clean.in_reply_to_status_id.isnull())
replies.sum()
Out[20]:
23
In [21]:
# Remove replies
archive_clean = archive_clean[~replies]
In [22]:
# Check rows to remove for retweets
retweets = (~archive_clean.retweeted_status_user_id.isnull())
retweets.sum()
Out[22]:
81
In [23]:
# Remove retweets
archive_clean = archive_clean[~retweets]
In [24]:
archive_clean.drop(['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp'], axis=1, inplace=True)
In [25]:
# Identify tweet_ids in predictions not in archive
not_shared = (~predictions_clean.tweet_id.isin(list(archive_clean.tweet_id)))
not_shared.sum()
Out[25]:
312
This makes sense because it is 3 times 104 (the number of rows that were removed from archive_clean).
In [26]:
predictions_clean = predictions_clean[~not_shared]
Test
In [95]:
# Confirm new archive_clean counts
archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 2355
Data columns (total 12 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null int64
rating_denominator    1971 non-null int64
name                  1367 non-null object
dog_type              322 non-null object
retweet_count         1971 non-null float64
favorite_count        1971 non-null float64
followers_count       1971 non-null float64
dtypes: float64(3), int64(3), object(6)
memory usage: 200.2+ KB
Note: removing all of the replies and retweets also removed all rows that didn’t have the api_data information.
In [83]:
2075 - (23 + 81)
Out[83]:
1971
The expected number of rows was removed, and the redundant columns were dropped.
In [98]:
# Confirm no predictions_clean tweet_id's are missing from archive_clean
(~predictions_clean.tweet_id.isin(list(archive_clean.tweet_id))).sum()
Out[98]:
0
text column in archive contains both text and short link
Define
Create a function to remove links and apply it to archive_clean.text.
Code
In [27]:
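def remove_link(x):
    http_pos = x.find("http")
    # If no link, retain the text unchanged
    if http_pos == -1:
        return x
    # Remove everything from the link to the end, plus the whitespace before it
    # (slicing to http_pos - 1 would misbehave if the text started with the link)
    return x[:http_pos].rstrip()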
In [28]:
archive_clean.text = archive_clean.text.apply(remove_link)
Test
In [118]:
# Print full text to check endings
for row in archive_clean.text[:5]:
    print(row)
This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10
This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10
This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10
This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us
This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek
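An alternative to a custom function is a regular expression applied with .str.replace(). The sketch below removes one or more trailing links together with the whitespace around them. The sample strings and the t.co address are made up, and the pattern only targets links at the end of the text, which is an assumption about where the short links appear.

```python
import pandas as pd

# Hypothetical tweet texts; the link is a placeholder, not a real tweet URL
text = pd.Series([
    "This is Phineas. 13/10 https://t.co/xxxxxxxxxx",
    "Two links 12/10 https://t.co/aaaaaaaaaa https://t.co/bbbbbbbbbb",
    "No link here 12/10",
])

# One or more "http(s)://..." tokens at the end, with surrounding whitespace
trailing_links = r"(?:\s*https?://\S+)+\s*$"
cleaned = text.str.replace(trailing_links, "", regex=True)

for row in cleaned:
    print(row)
```

Unlike slicing at the first "http", this leaves any link that appears in the middle of a sentence untouched.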
Values for rating_numerator are incorrect
Define
Create a function that identifies the value before the last / in the text and uses this in the rating_numerator column. Manually correct any ratings that are not covered by the function.
Code
In [42]:
def find_numerator(x):
    # Ratings are associated with the last "/"
    slash = x.rfind("/")
    # Don't need to check for missing because original set only includes tweets with ratings
    # Most ratings are two digits, but if not, preceded by " ", "(" or "..."
    try:
        # Check for decimal
        if x[slash - 2] == ".":
            numerator = x[slash - 4:slash].strip()
            if numerator[0] == ".":
                # Leading dots come from an ellipsis, not a decimal point
                numerator = numerator.strip(".")
        else:
            numerator = x[slash - 2:slash].strip().strip("(")
        return float(numerator)
    # Manage strange formatting
    except ValueError:
        return np.nan
In [43]:
archive_clean.rating_numerator = archive_clean.text.apply(find_numerator)
In [44]:
# Identify strange formatting
missing_numerator = list(archive_clean[archive_clean.rating_numerator.isnull()].index)
missing_numerator
Out[44]:
[2216, 2246]
In [42]:
# Check full text for each
for index in missing_numerator:
    print(index, archive_clean.text[index])
2216 This is Spark. He's nervous. Other dog hasn't moved in a while. Won't come when called. Doesn't fetch well 8/10&1/10
2246 This is Tedrick. He lives on the edge. Needs someone to hit the gas tho. Other than that he's a baller. 10&2/10
One contains two ratings and one is a humorous expression related to the picture. I’m going to go with 8 and 10.
In [45]:
archive_clean.at[missing_numerator[0], 'rating_numerator'] = 8
archive_clean.at[missing_numerator[1], 'rating_numerator'] = 10
Test
In [46]:
# Check all values are filled
archive_clean.rating_numerator.isnull().sum()
Out[46]:
0
In [85]:
# Check range of values
archive_clean.rating_numerator.describe()
Out[85]:
count    1971.000000
mean       10.893709
std         5.103397
min         0.000000
25%        10.000000
50%        11.000000
75%        12.000000
max        99.000000
Name: rating_numerator, dtype: float64
Values seem more in line with expectations (most over 10 but not many 15 and over).
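Ratings can also be pulled out with a regular expression, which makes tweets with more than one rating easy to spot. The following is a sketch on made-up text (the sample strings and the pattern are assumptions, and a pattern like this also matches non-rating fractions such as 24/7). It uses .str.extractall() and then takes either the first or the last match per tweet.

```python
import pandas as pd

# Hypothetical tweet texts in the style of the archive
text = pd.Series([
    "This is Phineas. He's a mystical boy. 13/10",
    "Sample with a decimal rating 9.75/10 good dog",
    "Doesn't fetch well 8/10&1/10",
])

# numerator (optionally with a decimal part) / denominator
pattern = r"(?P<numerator>\d+(?:\.\d+)?)/(?P<denominator>\d+)"

# One row per match, indexed by (tweet position, match number)
ratings = text.str.extractall(pattern)

first = ratings.groupby(level=0).first().astype(float)
last = ratings.groupby(level=0).last().astype(float)
n_ratings = ratings.groupby(level=0).size()

print(last)
print(n_ratings[n_ratings > 1])
```

Tweets where n_ratings is greater than 1 are the ones that would still need a manual decision, as with the two cases above.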
Values for rating_denominator are incorrect
Define
Create a function that identifies the value after the last / in the text and uses this in the rating_denominator column.
Code
In [48]:
def find_denominator(x):
    # Ratings are associated with the last "/"
    slash = x.rfind("/")
    # Don't need to check for missing because original set only includes tweets with ratings
    # Expect denominator to be two digits
    try:
        denominator = x[slash + 1:slash + 3]
        return float(denominator)
    # Manage strange formatting
    except ValueError:
        return np.nan
In [49]:
archive_clean.rating_denominator = archive_clean.text.apply(find_denominator)
Test
In [86]:
# Check all values are filled
archive_clean.rating_denominator.isnull().sum()
Out[86]:
0
In [87]:
# Check range of values
archive_clean.rating_denominator.describe()
Out[87]:
count    1971.000000
mean       10.203957
std         3.483537
min         7.000000
25%        10.000000
50%        10.000000
75%        10.000000
max        90.000000
Name: rating_denominator, dtype: float64
Most denominators are expected to be 10.
Erroneous datatypes
- By melting the predictions table, an additional erroneous data type was created in the prediction_order column.
- With the collapse of the columns in the archive table to a single dog_type column, an additional erroneous data type was created in that column.
Define
archive_clean table:
- tweet_id: change to str
- timestamp: change to datetime
- dog_type: change to categorical
predictions_clean table:
- tweet_id: change to str
- prediction_order: change to categorical
Code
In [50]:
# Change tweet_id's
archive_clean.tweet_id = archive_clean.tweet_id.astype(str)
predictions_clean.tweet_id = predictions_clean.tweet_id.astype(str)
In [51]:
# Change timestamp
archive_clean.timestamp = pd.to_datetime(archive_clean.timestamp)
In [52]:
# Change dog_type and prediction order
archive_clean.dog_type = archive_clean.dog_type.astype("category")
predictions_clean.prediction_order = predictions_clean.prediction_order.astype("category")
Test
In [116]:
archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 2355
Data columns (total 12 columns):
tweet_id              1971 non-null object
timestamp             1971 non-null datetime64[ns]
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null float64
rating_denominator    1971 non-null float64
name                  1367 non-null object
dog_type              322 non-null category
retweet_count         1971 non-null float64
favorite_count        1971 non-null float64
followers_count       1971 non-null float64
dtypes: category(1), datetime64[ns](1), float64(5), object(5)
memory usage: 266.9+ KB
In [117]:
predictions_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5913 entries, 0 to 6224
Data columns (total 7 columns):
tweet_id            5913 non-null object
jpg_url             5913 non-null object
img_num             5913 non-null int64
prediction_order    5913 non-null category
prediction          5913 non-null object
confidence          5913 non-null float64
dog                 5913 non-null bool
dtypes: bool(1), category(1), float64(1), int64(1), object(3)
memory usage: 288.8+ KB
Save Cleaned Data
In [118]:
archive_clean.to_csv('twitter_archive_master.csv', index=False)
predictions_clean.to_csv('predictions_master.csv', index=False)
Analyze and Visualize
In [54]:
archive = pd.read_csv('twitter_archive_master.csv')
predictions = pd.read_csv('predictions_master.csv')
In [55]:
archive.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1971 entries, 0 to 1970
Data columns (total 12 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null float64
rating_denominator    1971 non-null float64
name                  1367 non-null object
dog_type              322 non-null object
retweet_count         1971 non-null float64
favorite_count        1971 non-null float64
followers_count       1971 non-null float64
dtypes: float64(5), int64(1), object(6)
memory usage: 184.9+ KB
In [56]:
predictions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5913 entries, 0 to 5912
Data columns (total 7 columns):
tweet_id            5913 non-null int64
jpg_url             5913 non-null object
img_num             5913 non-null int64
prediction_order    5913 non-null int64
prediction          5913 non-null object
confidence          5913 non-null float64
dog                 5913 non-null bool
dtypes: bool(1), float64(1), int64(3), object(2)
memory usage: 283.0+ KB
All of the types have been lost with the conversion to and from csv, so I need to re-apply them.
In [57]:
# Change types
archive.tweet_id = archive.tweet_id.astype(str)
predictions.tweet_id = predictions.tweet_id.astype(str)
archive.dog_type = archive.dog_type.astype("category")
predictions.prediction_order = predictions.prediction_order.astype("category")
archive.timestamp = pd.to_datetime(archive.timestamp)
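The types can also be restored while reading the file, since a csv stores only text. This is a sketch of that option using the dtype and parse_dates arguments of pd.read_csv. To keep it self-contained it reads a small made-up csv from memory rather than the saved files, and it includes only three of the columns.

```python
import io

import pandas as pd

# Hypothetical two-row csv with a subset of the archive columns
csv_text = (
    "tweet_id,timestamp,dog_type\n"
    "892420643555336193,2017-08-01 16:23:56,doggo\n"
    "891815181378084864,2017-07-31 00:18:03,\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"tweet_id": str, "dog_type": "category"},  # ids stay text, types become categories
    parse_dates=["timestamp"],                        # parse the timestamp column as datetime
)

print(df.dtypes)
```

Reading tweet_id as a string from the start also avoids any round trip through an integer type.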
In [125]:
pd.plotting.scatter_matrix(archive.iloc[:, 1:], figsize=(15, 15));

Retweet Counts
In [58]:
archive.retweet_count.describe()
Out[58]:
count     1971.000000
mean      2725.781329
std       4699.394366
min         13.000000
25%        608.500000
50%       1323.000000
75%       3127.500000
max      77141.000000
Name: retweet_count, dtype: float64
In [59]:
def set_my_palette():
    # Reset seaborn defaults, then apply the custom palette defined in my_palette
    sns.set()
    current_palette = sns.color_palette(my_palette)
    sns.set_palette(current_palette)
In [60]:
my_palette = ['#66b3ff', '#00cc99', '#ff6666', '#ffff66', '#8c66ff', '#66ffd9']
set_my_palette()
archive.retweet_count.hist();

In [61]:
archive[archive.retweet_count <= 20000].retweet_count.hist();

In [62]:
archive[archive.retweet_count <= 2500].retweet_count.hist();

Favorites Count
In [63]:
archive.favorite_count.describe()
Out[63]:
count      1971.000000
mean       8880.210046
std       12590.668395
min          80.000000
25%        1938.500000
50%        4040.000000
75%       11164.500000
max      143023.000000
Name: favorite_count, dtype: float64
In [64]:
archive.favorite_count.hist();

In [65]:
archive[archive.favorite_count <= 40000].favorite_count.hist();

In [66]:
archive[archive.favorite_count <= 5000].favorite_count.hist();

Most Popular Names
In [67]:
archive.name.value_counts().head(10)
Out[67]:
Charlie    13
Oliver     11
Cooper     10
Lucy        9
Tucker      9
Daisy       8
Penny       8
Winston     8
Lola        7
Stanley     6
Name: name, dtype: int64
Over Time
Followers
In [68]:
plt.subplots(figsize=(15, 9))
plt.plot(archive.timestamp, archive.followers_count);

In [69]:
archive.followers_count.describe()
Out[69]:
count    1.971000e+03
mean     6.989401e+06
std      8.672666e+01
min      6.988654e+06
25%      6.989352e+06
50%      6.989386e+06
75%      6.989467e+06
max      6.989511e+06
Name: followers_count, dtype: float64
There are strange spikes that don’t seem to make sense. Because the counts correct back to their original values, I can subset the data to remove the spikes, keeping only counts above 6989200.
In [70]:
follower_count = archive.query('followers_count > 6989200')
In [71]:
sns.set_context("talk")
plt.subplots(figsize=(12, 8))
plt.plot(follower_count.timestamp, follower_count.followers_count)
plt.ylim(6989200, 6989800)
plt.title('Are We In Trouble?\n', fontsize=18, weight='bold')
plt.xlabel('\nDate (YYYY-MM)', weight='bold')
plt.ylabel('Number of Followers\n', weight='bold')
plt.savefig('in-trouble.png')

Retweets
In [72]:
sns.set_context()
plt.subplots(figsize=(15, 9))
plt.plot(archive.timestamp, archive.retweet_count);

In [73]:
# Sum retweets per week, then drop the final week
weekly_retweet = archive.groupby(pd.Grouper(key='timestamp', freq='W'))['retweet_count'].sum()\
    .reset_index().sort_values('timestamp')[:-1]
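How the weekly grouping assigns tweets to bins can be seen on a few made-up rows (the `toy` frame is hypothetical). With the "W" frequency, pd.Grouper labels each bin by the Sunday that ends the week, and slicing off the last row drops the most recent week, which is typically incomplete.

```python
import pandas as pd

# Hypothetical tweets: two in the first week, one in each of the next two
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2017-07-03", "2017-07-05", "2017-07-12", "2017-07-18"]),
    "retweet_count": [10, 20, 30, 40],
})

weekly = (toy.groupby(pd.Grouper(key="timestamp", freq="W"))["retweet_count"]
             .sum()
             .reset_index())
print(weekly)

# Drop the final week, as in the cell above
complete_weeks = weekly[:-1]
print(complete_weeks)
```

The first bin is labeled 2017-07-09 and holds the sum of the two tweets from that week.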
In [74]:
plt.subplots(figsize=(15, 9))
plt.plot(weekly_retweet.timestamp, weekly_retweet.retweet_count);

Favorites
In [75]:
plt.subplots(figsize=(15, 9))
plt.plot(archive.timestamp, archive.favorite_count);

In [76]:
# Sum favorites per week, then drop the final week
weekly_favorite = archive.groupby(pd.Grouper(key='timestamp', freq='W'))['favorite_count'].sum()\
    .reset_index().sort_values('timestamp')[:-1]
In [77]:
plt.subplots(figsize=(15, 9))
plt.plot(weekly_favorite.timestamp, weekly_favorite.favorite_count);

In [78]:
sns.set_context("talk")
plt.subplots(figsize=(14, 9))
plt.plot(weekly_retweet.timestamp, weekly_retweet.retweet_count, label="Weekly Retweets")
plt.plot(weekly_favorite.timestamp, weekly_favorite.favorite_count, label="Weekly Favorites")
plt.title('Their Love Increases\n', fontsize=18, weight='bold')
plt.xlabel('\nDate (YYYY-MM)', weight='bold')
plt.ylabel('Count\n', weight='bold')
plt.legend()
plt.savefig('love-increases.png')

Dog Types
In [79]:
dog_counts = archive.groupby('dog_type')['tweet_id'].count()
dog_counts
Out[79]:
dog_type
doggo       66
floofer      3
pupper     225
puppo       28
Name: tweet_id, dtype: int64
In [80]:
sns.set_context("talk")
plt.subplots(figsize=(12, 6))
plt.bar([1, 2, 3, 4], dog_counts, tick_label=['doggo', 'floofer', 'pupper', 'puppo'])
plt.title('Favorite Dogs?\n', fontsize=18, weight='bold')
plt.xlabel('\nDog Type', weight='bold')
plt.ylabel('Count\n', weight='bold')
plt.savefig('favorite-dogs.png')

In [81]:
# Set outlier style
flierprops = dict(marker='o', alpha=0.5, markeredgewidth=1)
plt.subplots(figsize=(14, 8))
# Left panel: retweets by dog type
plt.subplot(121)
sns.boxplot(x=archive.dog_type, y=archive.retweet_count, flierprops=flierprops, linewidth=1.5)
plt.title('Retweets\n', fontsize=18, weight='bold')
plt.xlabel('\nDog Type', weight='bold')
plt.ylabel('Count\n', weight='bold')
# Right panel: favorites by dog type
plt.subplot(122)
sns.boxplot(x=archive.dog_type, y=archive.favorite_count, flierprops=flierprops, linewidth=1.5)
plt.title('Favorites\n', fontsize=18, weight='bold')
plt.xlabel('\nDog Type', weight='bold')
plt.ylabel('')
plt.savefig('boxplot.png')

Highest Rated
Retweet
In [85]:
# Get index labels of the five most retweeted tweets
ind = archive.retweet_count.nlargest(5).index
# Get details (label-based lookup, since ind holds index labels)
high_retweet = archive[['tweet_id', 'text', 'name', 'retweet_count', 'favorite_count',
                        'rating_numerator', 'rating_denominator', 'dog_type']].loc[ind]
high_retweet
Out[85]:
index | tweet_id | text | name | retweet_count | favorite_count | rating_numerator | rating_denominator | dog_type |
---|---|---|---|---|---|---|---|---|
769 | 744234799360020481 | Here’s a doggo realizing you can stand in a po… | NaN | 77141.0 | 127911.0 | 13.0 | 10.0 | doggo |
397 | 807106840509214720 | This is Stephan. He just wants to help. 13/10 … | Stephan | 60929.0 | 122715.0 | 13.0 | 10.0 | NaN |
804 | 739238157791694849 | Here’s a doggo blowing bubbles. It’s downright… | NaN | 50727.0 | 73148.0 | 13.0 | 10.0 | doggo |
306 | 822872901745569793 | Here’s a super supportive puppo participating … | NaN | 48967.0 | 143023.0 | 13.0 | 10.0 | puppo |
58 | 879415818425184262 | This is Duddles. He did an attempt. 13/10 some… | Duddles | 44476.0 | 105712.0 | 13.0 | 10.0 | NaN |
Two have names, three do not. All are rated 13/10. Two doggos, one puppo.
In [34]:
high_retweet.describe()
Out[34]:
statistic | retweet_count | favorite_count | rating_numerator | rating_denominator |
---|---|---|---|---|
count | 5.000000 | 5.000000 | 5.0 | 5.0 |
mean | 56448.000000 | 114501.800000 | 13.0 | 10.0 |
std | 13041.315079 | 26683.887961 | 0.0 | 0.0 |
min | 44476.000000 | 73148.000000 | 13.0 | 10.0 |
25% | 48967.000000 | 105712.000000 | 13.0 | 10.0 |
50% | 50727.000000 | 122715.000000 | 13.0 | 10.0 |
75% | 60929.000000 | 127911.000000 | 13.0 | 10.0 |
max | 77141.000000 | 143023.000000 | 13.0 | 10.0 |
Get image urls from predictions.
In [56]:
image = predictions[predictions.tweet_id == '744234799360020481']['jpg_url']
dups = image.duplicated()
image = image[~dups]
image.values[0]
Out[56]:
'https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg'
In [58]:
url_list = []
for tweet_id in high_retweet.tweet_id:
    # Each tweet has one row per prediction, all with the same image url
    image = predictions[predictions.tweet_id == tweet_id]['jpg_url']
    dups = image.duplicated()
    image = image[~dups]
    image_url = image.values[0]
    url_list.append(image_url)
url_list
Out[58]:
['https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg',
 'https://pbs.twimg.com/ext_tw_video_thumb/807106774843039744/pu/img/8XZg1xW35Xp2J6JW.jpg',
 'https://pbs.twimg.com/ext_tw_video_thumb/739238016737267712/pu/img/-tLpyiuIzD5zR1et.jpg',
 'https://pbs.twimg.com/media/C2tugXLXgAArJO4.jpg',
 'https://pbs.twimg.com/ext_tw_video_thumb/879415784908390401/pu/img/cX7XI1TnUsseGET5.jpg']
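Because every tweet repeats the same jpg_url on each of its prediction rows, the lookup can also be done once for all tweets. The sketch below builds a tweet_id to url mapping with drop_duplicates() and set_index(). The frames and the example.com addresses are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format predictions: three rows per tweet, same url on each
predictions = pd.DataFrame({
    "tweet_id": ["1", "1", "1", "2", "2", "2"],
    "jpg_url": ["https://example.com/a.jpg"] * 3 + ["https://example.com/b.jpg"] * 3,
    "prediction_order": [1, 2, 3, 1, 2, 3],
})

# Hypothetical ids of the top tweets, in ranking order
top_ids = ["2", "1"]

# Keep one row per tweet and index the urls by tweet_id
url_by_id = predictions.drop_duplicates("tweet_id").set_index("tweet_id")["jpg_url"]

# Look the urls up in ranking order
url_list = url_by_id.loc[top_ids].tolist()
print(url_list)
```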
In [144]:
Image(url= url_list[0], width=150, height=150)
Out[144]:

In [75]:
print(high_retweet.text.loc[ind[0]])
Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad)
In [68]:
Image(url= url_list[1], width=250, height=250)
Out[68]:

In [76]:
print(high_retweet.text.loc[ind[1]])
This is Stephan. He just wants to help. 13/10 such a good boy
In [69]:
Image(url= url_list[2], width=300, height=300)
Out[69]:

In [77]:
print(high_retweet.text.loc[ind[2]])
Here's a doggo blowing bubbles. It's downright legendary. 13/10 would watch on repeat forever (vid by Kent Duryee)
In [78]:
Image(url= url_list[3], width=300, height=300)
Out[78]:

In [79]:
print(high_retweet.text.loc[ind[3]])
Here's a super supportive puppo participating in the Toronto #WomensMarch today. 13/10
In [80]:
Image(url= url_list[4], width=300, height=300)
Out[80]:

In [87]:
print(high_retweet.text.loc[ind[4]])
This is Duddles. He did an attempt. 13/10 someone help him (vid by Georgia Felici)
Almost all of the highest retweets have videos.
Favorite
In [88]:
# Get index labels of the five most favorited tweets
ind = archive.favorite_count.nlargest(5).index
# Get details (label-based lookup, since ind holds index labels)
high_favorite = archive[['tweet_id', 'text', 'name', 'retweet_count', 'favorite_count',
                         'rating_numerator', 'rating_denominator', 'dog_type']].loc[ind]
high_favorite
Out[88]:
index | tweet_id | text | name | retweet_count | favorite_count | rating_numerator | rating_denominator | dog_type |
---|---|---|---|---|---|---|---|---|
306 | 822872901745569793 | Here’s a super supportive puppo participating … | NaN | 48967.0 | 143023.0 | 13.0 | 10.0 | puppo |
769 | 744234799360020481 | Here’s a doggo realizing you can stand in a po… | NaN | 77141.0 | 127911.0 | 13.0 | 10.0 | doggo |
108 | 866450705531457537 | This is Jamesy. He gives a kiss to every other… | Jamesy | 36296.0 | 124101.0 | 13.0 | 10.0 | pupper |
397 | 807106840509214720 | This is Stephan. He just wants to help. 13/10 … | Stephan | 60929.0 | 122715.0 | 13.0 | 10.0 | NaN |
58 | 879415818425184262 | This is Duddles. He did an attempt. 13/10 some… | Duddles | 44476.0 | 105712.0 | 13.0 | 10.0 | NaN |
In [89]:
high_favorite.tweet_id.isin(high_retweet.tweet_id)
Out[89]:
306     True
769     True
108    False
397     True
58      True
Name: tweet_id, dtype: bool
Only one tweet (index 108) is not shared with the most retweeted list.
In [91]:
image = predictions[predictions.tweet_id == high_favorite.tweet_id.loc[108]]['jpg_url']
dups = image.duplicated()
image = image[~dups]
image_url = image.values[0]
In [92]:
Image(url= image_url, width=300, height=300)
Out[92]:

In [96]:
print(high_favorite.text.loc[108])
This is Jamesy. He gives a kiss to every other pupper he sees on his walk. 13/10 such passion, much tender
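The overlap check in this subsection can be run in both directions to see which tweet is unique to each list. This is a minimal sketch with hypothetical ids (`top_favorite_ids` and `top_retweet_ids` are stand-ins, not the real `tweet_id` values):

```python
import pandas as pd

# Hypothetical ids standing in for the two top-five tables.
top_favorite_ids = pd.Series([101, 102, 103, 104, 105], name="tweet_id")
top_retweet_ids = pd.Series([101, 102, 104, 105, 106], name="tweet_id")

# True where a top-favorite id also appears among the top retweets.
shared_mask = top_favorite_ids.isin(top_retweet_ids)

# Ids that appear in only one of the two lists.
only_favorite = top_favorite_ids[~shared_mask].tolist()
only_retweet = top_retweet_ids[~top_retweet_ids.isin(top_favorite_ids)].tolist()

print(shared_mask.sum(), only_favorite, only_retweet)
```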
Rating to Retweet or Favorite
In [82]:
sns.set_context()
plt.scatter(archive.rating_numerator, archive.retweet_count);

Keep ratings of 17 or below and log-transform the retweet and favorite counts.
In [83]:
ratings_df = archive.query('rating_numerator <= 17').copy()
ratings_df.retweet_count = np.log10(ratings_df.retweet_count)
ratings_df.favorite_count = np.log10(ratings_df.favorite_count)
In [84]:
sns.set_context("talk")
plt.figure(figsize=(14, 8))

# Left panel: retweets (log10) against rating numerator
plt.subplot(121)
sns.regplot(x='rating_numerator', y='retweet_count', data=ratings_df,
            fit_reg=False, x_jitter=0.25,
            scatter_kws={'alpha': 0.2, 's': 30}, color=my_palette[1])
plt.title('Retweets\n', fontsize=18, weight='bold')
plt.xlabel('\nNumerator', weight='bold')
plt.ylabel('Count (log10)', weight='bold')

# Right panel: favorites (log10) against rating numerator
plt.subplot(122)
sns.regplot(x='rating_numerator', y='favorite_count', data=ratings_df,
            fit_reg=False, x_jitter=0.25,
            scatter_kws={'alpha': 0.2, 's': 30}, color=my_palette[-2])
plt.title('Favorites\n', fontsize=18, weight='bold')
plt.xlabel('\nNumerator', weight='bold')
plt.ylabel('', weight='bold')

plt.savefig('ratings.png')
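One caveat with the log transform: `np.log10` returns `-inf` for a count of zero, which can distort axis limits. Whether the archive contains zero counts is not shown in this section, so the following is only a defensive sketch on a made-up `counts` series that masks non-positive values first.

```python
import numpy as np
import pandas as pd

# Made-up counts, including a zero and a missing value.
counts = pd.Series([0, 10, 1000, np.nan], name="retweet_count")

# Replace non-positive counts with NaN so log10 never sees zero;
# missing values stay missing.
safe_log = np.log10(counts.where(counts > 0))
print(safe_log)
```

Plotting functions generally skip the resulting NaN values, so such tweets would simply be left off the scatter plot.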

Predictions
How Confident
In [156]:
confidence = predictions.groupby('prediction_order')['confidence']
In [150]:
confidence.mean()
Out[150]:
prediction_order
1    0.594558
2    0.134585
3    0.060166
Name: confidence, dtype: float64
In [151]:
confidence.median()
Out[151]:
prediction_order
1    0.587764
2    0.117397
3    0.049444
Name: confidence, dtype: float64
In [152]:
confidence.std()
Out[152]:
prediction_order
1    0.272126
2    0.101053
3    0.050942
Name: confidence, dtype: float64
In [153]:
confidence.mean() - confidence.std()
Out[153]:
prediction_order
1    0.322431
2    0.033532
3    0.009224
Name: confidence, dtype: float64
In [154]:
confidence.mean() + confidence.std()
Out[154]:
prediction_order
1    0.866684
2    0.235638
3    0.111107
Name: confidence, dtype: float64
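The separate mean, median and standard deviation cells above could also be collected into one summary table with `agg`. The sketch below runs on a small invented `toy_predictions` frame (its confidences are assumptions, not values from the real table) and adds the one-standard-deviation band as two extra columns.

```python
import pandas as pd

# Toy predictions table; confidences are invented for illustration.
toy_predictions = pd.DataFrame({
    "prediction_order": [1, 1, 2, 2, 3, 3],
    "confidence": [0.9, 0.5, 0.2, 0.1, 0.06, 0.02],
})

# One row per prediction order, one column per statistic.
summary = (
    toy_predictions
    .groupby("prediction_order")["confidence"]
    .agg(["mean", "median", "std"])
)

# One-standard-deviation band around the mean.
summary["lower"] = summary["mean"] - summary["std"]
summary["upper"] = summary["mean"] + summary["std"]
print(summary.round(3))
```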
In [92]:
# Note: `height` is called `size` in seaborn versions before 0.9
(sns.FacetGrid(predictions, col="prediction_order", hue="prediction_order",
               palette=my_palette[3:], height=4)
    .map(plt.hist, "confidence")
    .set_titles("Prediction {col_name}\n", weight='bold', fontsize=14)
    .set_axis_labels("\nConfidence Rating", "Count\n"));
plt.savefig('confidence.png')

Samples
In [189]:
# Draw five random tweets and keep only their top prediction
samples = predictions.query('prediction_order == 1').sample(5)
samples
Out[189]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
5106 | 829141528400556032 | https://pbs.twimg.com/media/C4GzztSWAAA_qi4.jpg | 2 | 1 | golden_retriever | 0.573140 | True |
5763 | 881536004380872706 | https://pbs.twimg.com/ext_tw_video_thumb/88153… | 1 | 1 | Samoyed | 0.281463 | True |
5112 | 829449946868879360 | https://pbs.twimg.com/media/C4LMUf8WYAkWz4I.jpg | 1 | 1 | Labrador_retriever | 0.315163 | True |
1233 | 674014384960745472 | https://pbs.twimg.com/media/CVqUgTIUAAUA8Jr.jpg | 1 | 1 | Pembroke | 0.742320 | True |
753 | 670733412878163972 | https://pbs.twimg.com/media/CU7seitWwAArlVy.jpg | 1 | 1 | dhole | 0.350416 | False |
First
In [190]:
Image(url=samples.jpg_url.iloc[0], width=300, height=300)
Out[190]:

In [191]:
predictions[predictions.tweet_id == samples.tweet_id.iloc[0]]
Out[191]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
5106 | 829141528400556032 | https://pbs.twimg.com/media/C4GzztSWAAA_qi4.jpg | 2 | 1 | golden_retriever | 0.573140 | True |
5107 | 829141528400556032 | https://pbs.twimg.com/media/C4GzztSWAAA_qi4.jpg | 2 | 2 | cocker_spaniel | 0.111159 | True |
5108 | 829141528400556032 | https://pbs.twimg.com/media/C4GzztSWAAA_qi4.jpg | 2 | 3 | gibbon | 0.094127 | False |
Spot on!
Second
In [192]:
Image(url=samples.jpg_url.iloc[1], width=300, height=300)
Out[192]:

In [193]:
predictions[predictions.tweet_id == samples.tweet_id.iloc[1]]
Out[193]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
5763 | 881536004380872706 | https://pbs.twimg.com/ext_tw_video_thumb/88153… | 1 | 1 | Samoyed | 0.281463 | True |
5764 | 881536004380872706 | https://pbs.twimg.com/ext_tw_video_thumb/88153… | 1 | 2 | Angora | 0.272066 | False |
5765 | 881536004380872706 | https://pbs.twimg.com/ext_tw_video_thumb/88153… | 1 | 3 | Persian_cat | 0.114854 | False |
Nice! And that’s from a picture of a pup’s behind!
Third
In [194]:
Image(url=samples.jpg_url.iloc[2], width=300, height=300)
Out[194]:

In [195]:
predictions[predictions.tweet_id == samples.tweet_id.iloc[2]]
Out[195]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
5112 | 829449946868879360 | https://pbs.twimg.com/media/C4LMUf8WYAkWz4I.jpg | 1 | 1 | Labrador_retriever | 0.315163 | True |
5113 | 829449946868879360 | https://pbs.twimg.com/media/C4LMUf8WYAkWz4I.jpg | 1 | 2 | golden_retriever | 0.153210 | True |
5114 | 829449946868879360 | https://pbs.twimg.com/media/C4LMUf8WYAkWz4I.jpg | 1 | 3 | Pekinese | 0.132791 | True |
Damn, even with a hat!
Fourth
In [196]:
Image(url=samples.jpg_url.iloc[3], width=300, height=300)
Out[196]:

In [197]:
predictions[predictions.tweet_id == samples.tweet_id.iloc[3]]
Out[197]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
1233 | 674014384960745472 | https://pbs.twimg.com/media/CVqUgTIUAAUA8Jr.jpg | 1 | 1 | Pembroke | 0.742320 | True |
1234 | 674014384960745472 | https://pbs.twimg.com/media/CVqUgTIUAAUA8Jr.jpg | 1 | 2 | Cardigan | 0.084937 | True |
1235 | 674014384960745472 | https://pbs.twimg.com/media/CVqUgTIUAAUA8Jr.jpg | 1 | 3 | Eskimo_dog | 0.068321 | True |
Not just corgi, PEMBROKE corgi.
Fifth
In [198]:
Image(url=samples.jpg_url.iloc[4], width=300, height=300)
Out[198]:

In [199]:
predictions[predictions.tweet_id == samples.tweet_id.iloc[4]]
Out[199]:
| | tweet_id | jpg_url | img_num | prediction_order | prediction | confidence | dog |
---|---|---|---|---|---|---|---|
753 | 670733412878163972 | https://pbs.twimg.com/media/CU7seitWwAArlVy.jpg | 1 | 1 | dhole | 0.350416 | False |
754 | 670733412878163972 | https://pbs.twimg.com/media/CU7seitWwAArlVy.jpg | 1 | 2 | hare | 0.236661 | False |
755 | 670733412878163972 | https://pbs.twimg.com/media/CU7seitWwAArlVy.jpg | 1 | 3 | wood_rabbit | 0.091133 | False |
Squirrel…
So not quite right in the end, but definitely in the right vicinity.
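The five spot checks above repeat the same lookup, so it could be wrapped in a small helper. In this sketch, `predictions_for` is a hypothetical function name and `toy_predictions` is a made-up frame with rounded values loosely modelled on the fourth and fifth samples. The last lines show how the share of top predictions flagged as dogs might be computed.

```python
import pandas as pd

# Toy table with three ranked predictions per tweet (illustrative values only).
toy_predictions = pd.DataFrame({
    "tweet_id": [1, 1, 1, 2, 2, 2],
    "prediction_order": [1, 2, 3, 1, 2, 3],
    "prediction": ["Pembroke", "Cardigan", "Eskimo_dog",
                   "dhole", "hare", "wood_rabbit"],
    "confidence": [0.74, 0.08, 0.07, 0.35, 0.24, 0.09],
    "dog": [True, True, True, False, False, False],
})


def predictions_for(df, tweet_id):
    """Return all predictions for one tweet, best guess first."""
    rows = df[df.tweet_id == tweet_id]
    return rows.sort_values("prediction_order")


print(predictions_for(toy_predictions, 2))

# Share of tweets whose top prediction is flagged as a dog breed.
top = toy_predictions.query("prediction_order == 1")
share_dog = top.dog.mean()
print(share_dog)
```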