Table of Contents
- Part I – Probability
- Part II – A/B Test
- Part III – Regression
Introduction
A/B tests are very commonly performed by data analysts and data scientists. For this project, we will be working to understand the results of an A/B test run by an e-commerce website. Our goal is to work through this notebook to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision.
Part I – Probability
To get started, let’s import our libraries.
In [1]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

random.seed(42)
Question 1. Now, read in the ab_data.csv data. Store it in df.
a. Read in the dataset and take a look at the top few rows here:
In [2]:
df = pd.read_csv('ab_data.csv')
df.head()
Out[2]:
| | user_id | timestamp | group | landing_page | converted |
---|---|---|---|---|---|
0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |
b. Use the below cell to find the number of rows in the dataset.
In [3]:
df.shape[0]
Out[3]:
294478
c. The number of unique users in the dataset.
In [4]:
df.user_id.nunique()
Out[4]:
290584
d. The proportion of users converted.
In [5]:
df.converted.mean()*100
Out[5]:
11.96591935560551
e. The number of times the new_page and treatment don't line up.
The new_page and treatment values don't line up whenever a row pairs one of them with the other group's page. First, let's check the unique values of each column:
In [6]:
df.group.unique(), df.landing_page.unique()
Out[6]:
(array(['control', 'treatment'], dtype=object), array(['old_page', 'new_page'], dtype=object))
Accordingly, there are two unique values for group (treatment and control) and two for landing_page (old_page and new_page). Hence, we need to count the rows where treatment was paired with old_page or control was paired with new_page:
In [7]:
treat_old = df.query("group == 'treatment' and landing_page == 'old_page'").shape[0]
control_new = df.query("group == 'control' and landing_page == 'new_page'").shape[0]
misalignment = treat_old + control_new
misalignment
Out[7]:
3893
f. Do any of the rows have missing values?
In [8]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
user_id         294478 non-null int64
timestamp       294478 non-null object
group           294478 non-null object
landing_page    294478 non-null object
converted       294478 non-null int64
dtypes: int64(2), object(3)
memory usage: 11.2+ MB
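The info() output shows 294478 non-null entries in every column, so there are no missing values. As a more direct cross-check, a minimal sketch that counts nulls per column (all zeros are expected):

# Count missing values in each column of df
df.isnull().sum()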
Question 2. For the rows where treatment is not aligned with new_page or control is not aligned with old_page, we cannot be sure if this row truly received the new or old page.
a. Using the answer from the previous exercise, we will create a new dataset that meets these specifications, then store the new dataframe in df2.
In [9]:
df.drop(df.query("group == 'treatment' and landing_page == 'old_page'").index, inplace=True)
df.drop(df.query("group == 'control' and landing_page == 'new_page'").index, inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 290585 entries, 0 to 294477
Data columns (total 5 columns):
user_id         290585 non-null int64
timestamp       290585 non-null object
group           290585 non-null object
landing_page    290585 non-null object
converted       290585 non-null int64
dtypes: int64(2), object(3)
memory usage: 13.3+ MB
In [10]:
df.to_csv('ab_data2.csv', index=False)
In [11]:
df2 = pd.read_csv('ab_data2.csv')
We can confirm that all the mismatched rows were removed if the following returns 0:
In [12]:
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]
Out[12]:
0
Question 3. Use df2 and the cells below to answer questions for the classroom.
a. How many unique user_ids are in df2?
In [13]:
df2.user_id.nunique()
Out[13]:
290584
b. There is one user_id repeated in df2. What is it?
In [14]:
df2[df2.user_id.duplicated(keep=False)].user_id
Out[14]:
1876    773192
2862    773192
Name: user_id, dtype: int64
c. What is the row information for the repeat user_id?
In [15]:
df2[df2.user_id.duplicated(keep=False)]
Out[15]:
| | user_id | timestamp | group | landing_page | converted |
---|---|---|---|---|---|
1876 | 773192 | 2017-01-09 05:37:58.781806 | treatment | new_page | 0 |
2862 | 773192 | 2017-01-14 02:55:59.590927 | treatment | new_page | 0 |
d. Remove one of the rows with a duplicate user_id, but keep your dataframe as df2.
In [16]:
df2.drop_duplicates('user_id', inplace=True)
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 290584 entries, 0 to 290584
Data columns (total 5 columns):
user_id         290584 non-null int64
timestamp       290584 non-null object
group           290584 non-null object
landing_page    290584 non-null object
converted       290584 non-null int64
dtypes: int64(2), object(3)
memory usage: 13.3+ MB
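As a quick sanity check, a one-line sketch confirming that no duplicated user_id remains (0 is expected):

# Number of user_ids that still appear more than once after the drop
df2.user_id.duplicated().sum()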
Question 4. Use df2 in the cells below to answer questions related to the classroom.
a. What is the probability of an individual converting regardless of the page they receive?
In [17]:
df2.converted.mean()
Out[17]:
0.11959708724499628
b. Given that an individual was in the control group, what is the probability they converted?
In [18]:
df2.query("group == 'control'").converted.mean()
Out[18]:
0.1203863045004612
c. Given that an individual was in the treatment group, what is the probability they converted?
In [19]:
df2.query("group == 'treatment'").converted.mean()
Out[19]:
0.11880806551510564
d. What is the probability that an individual received the new page?
In [20]:
df2.query("landing_page == 'new_page'").shape[0] / df2.landing_page.shape[0]
Out[20]:
0.5000619442226688
e. Use the results in the previous two portions of this question to suggest if you think there is evidence that one page leads to more conversions?
From the above, the conversion rates of the two groups are almost identical, at roughly 12% each. Hence, there is no concrete evidence that either page necessarily leads to more conversions.
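For a side-by-side view of the two rates, a small sketch that groups df2 by group (rates is a hypothetical variable name):

# Conversion rate per group, and the observed treatment-minus-control gap
rates = df2.groupby('group')['converted'].mean()
rates, rates['treatment'] - rates['control']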
Part II – A/B Test
Notice that because of the time stamp associated with each event, you could technically run a hypothesis test continuously as each observation was observed.
However, then the hard question is do you stop as soon as one page is considered significantly better than another or does it need to happen consistently for a certain amount of time? How long do you run to render a decision that neither page is better than another?
These questions are the difficult parts associated with A/B tests in general.
Question 1. For now, consider you need to make the decision just based on all the data provided. If you want to assume that the old page is better unless the new page proves to be definitely better at a Type I error rate of 5%, what should your null and alternative hypotheses be?
$$H_0: p_{old} - p_{new} \geq 0$$
$$H_1: p_{old} - p_{new} < 0$$
$p_{old}$ and $p_{new}$ are the conversion rates for the old and new pages, respectively.
Question 2. Assume under the null hypothesis, $p_{new}$ and $p_{old}$ both have “true” success rates equal to the converted success rate regardless of page – that is, $p_{new}$ and $p_{old}$ are equal. Furthermore, assume they are equal to the converted rate in ab_data.csv regardless of the page.
Use a sample size for each page equal to the ones in ab_data.csv.
Build the sampling distribution for the difference in converted between the two pages over 10,000 iterations, calculating an estimate from the null in each iteration.
Use the cells below to provide the necessary parts of this simulation.
a. What is the convert rate for $p_{new}$ under the null?
In [21]:
p_new = df2.converted.mean()
p_new
Out[21]:
0.11959708724499628
b. What is the convert rate for $p_{old}$ under the null?
In [22]:
p_old = df2.converted.mean()
p_old
Out[22]:
0.11959708724499628
c. What is $n_{new}$?
In [23]:
n_new = df2.query("group == 'treatment'").shape[0]
n_new
Out[23]:
145310
d. What is $n_{old}$?
In [24]:
n_old = df2.query("group == 'control'").shape[0]
n_old
Out[24]:
145274
e. Simulate $n_{new}$ transactions with a convert rate of $p_{new}$ under the null. Store these $n_{new}$ 1's and 0's in new_page_converted.
In [25]:
# p gives the probabilities of drawing [0, 1]; the conversion rate p_new is the probability of a 1
new_page_converted = np.random.choice([0, 1], size=n_new, p=[1 - p_new, p_new])
f. Simulate $n_{old}$ transactions with a convert rate of $p_{old}$ under the null. Store these $n_{old}$ 1's and 0's in old_page_converted.
In [26]:
# As above: probability 1 - p_old of a 0 and p_old of a 1
old_page_converted = np.random.choice([0, 1], size=n_old, p=[1 - p_old, p_old])
g. Find $p_{new}$ - $p_{old}$ for your simulated values from part (e) and (f).
In [27]:
p_diff = new_page_converted.mean() - old_page_converted.mean()
p_diff
Out[27]:
0.001536831987102194
Subtracting the two simulated arrays element-wise raises an error because their sizes differ ($n_{new} \neq n_{old}$); hence, the difference of their means is calculated instead.
h. Simulate 10,000 $p_{new}$ - $p_{old}$ values using this same process similarly to the one you calculated in parts a. through g. above. Store all 10,000 values in p_diffs.
In [28]:
p_diffs = []
for _ in range(10000):
    new_page_converted = np.random.choice([0, 1], size=n_new, p=[1 - p_new, p_new]).mean()
    old_page_converted = np.random.choice([0, 1], size=n_old, p=[1 - p_old, p_old]).mean()
    p_diffs.append(new_page_converted - old_page_converted)
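The loop above works, but as a sketch of a faster alternative, the same null distribution could be drawn in a single step with NumPy's binomial sampler (new_rates, old_rates and p_diffs_alt are hypothetical names):

# Draw 10,000 simulated conversion counts per page and convert them to rates
new_rates = np.random.binomial(n_new, p_new, 10000) / n_new
old_rates = np.random.binomial(n_old, p_old, 10000) / n_old
p_diffs_alt = new_rates - old_rates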
i. Plot a histogram of the p_diffs. Does this plot look like what you expected? Use the matching problem in the classroom to assure you fully understand what was computed here.
In [29]:
plt.hist(p_diffs);
plt.ylabel('# of Simulations')
plt.xlabel('p_diffs')
plt.title('Plot of 10,000 Simulated p_diffs');

j. What proportion of the p_diffs are greater than the actual difference observed in ab_data.csv?
First, we need to convert p_diffs into a NumPy array:
In [30]:
p_diffs = np.array(p_diffs)
p_diffs
Out[30]:
array([-0.00028686, 0.00142686, 0.00234903, ..., 0.00115874, 0.0010417 , -0.00054158])
Next, we compute the actual difference observed in the dataset:
In [31]:
act_diffs = df2.query('group == "treatment"').converted.mean() - df2.query('group == "control"').converted.mean()
act_diffs
Out[31]:
-0.0015782389853555567
Finally, we compute the proportion of p_diffs greater than act_diffs:
In [32]:
(p_diffs > act_diffs).mean()
Out[32]:
0.9003
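To see where the observed difference falls within the simulated null distribution, a small sketch that redraws the histogram with a vertical line at act_diffs:

# Null distribution of differences, with the observed difference marked in red
plt.hist(p_diffs)
plt.axvline(act_diffs, color='red', linewidth=2)
plt.xlabel('p_diffs')
plt.ylabel('# of Simulations');

The proportion of simulated differences to the right of that line is the 0.9003 computed above.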
k. In words, explain what you just computed in part j. What is this value called in scientific studies? What does this value mean in terms of whether or not there is a difference between the new and old pages?
In the previous part, we calculated the p-value, which is the probability of observing our statistic (or one more extreme) if the null hypothesis is true.
A large p-value indicates that the observed statistic is consistent with the null hypothesis; hence, there is no statistical evidence to reject the null hypothesis, which states that the old page converts at the same rate as, or better than, the new page.
l. We could also use a built-in to achieve similar results. Though using the built-in might be easier to code, the above portions are a walkthrough of the ideas that are critical to correctly thinking about statistical significance. Let n_old and n_new refer to the number of rows associated with the old page and new page, respectively.
In [33]:
import statsmodels.api as sm

convert_old = df2.query('group == "control"').converted.sum()
convert_new = df2.query('group == "treatment"').converted.sum()
n_old = df2.query("landing_page == 'old_page'").shape[0]
n_new = df2.query("landing_page == 'new_page'").shape[0]
m. Now use stats.proportions_ztest to compute your test statistic and p-value.
In [34]:
z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
z_score, p_value
Out[34]:
(1.3109241984234394, 0.9050583127590245)
Next, we import norm from scipy.stats to assess the significance of our z-score:
In [35]:
from scipy.stats import norm

norm.cdf(z_score)
Out[35]:
0.9050583127590245
Next, we check the critical value for a 95% confidence level:
In [36]:
norm.ppf(1-(0.05/2))
Out[36]:
1.959963984540054
n. What do the z-score and p-value computed in the previous question mean for the conversion rates of the old and new pages? Do they agree with the findings in parts j. and k.?
The results above give a z-score of 1.31. Since this value does not exceed the critical value of 1.96, there is no statistical evidence to reject the null hypothesis. Furthermore, the p-value (0.905) is very close to the one obtained in parts j. and k. (0.9003), which likewise fails to reject the null hypothesis.
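For intuition, a minimal sketch of how the pooled two-proportion z-statistic behind proportions_ztest could be reproduced by hand, reusing convert_old, convert_new, n_old and n_new from above (p_pooled and se_pooled are hypothetical names):

# Sample conversion rates for each page
p_old_hat = convert_old / n_old
p_new_hat = convert_new / n_new
# Pooled conversion rate under the null hypothesis
p_pooled = (convert_old + convert_new) / (n_old + n_new)
# Standard error of the difference between the two sample proportions
se_pooled = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
# z-statistic for (old rate - new rate), the same ordering passed to proportions_ztest
(p_old_hat - p_new_hat) / se_pooled

This should land close to the 1.31 reported by the built-in.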
Part III – A regression approach
- In the final part, you will see that the result achieved in the previous A/B test can also be achieved by performing regression.
a. Since each row is either a conversion or no conversion, what type of regression should you be performing in this case?
We are studying rows that are either conversions or non-conversions, so the model must predict a probability between 0 and 1. Accordingly, logistic regression may be used.
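For intuition, logistic regression models the log-odds of conversion as a linear function of the predictors, and the sigmoid (inverse-logit) function maps that linear combination back to a probability between 0 and 1; a minimal sketch (sigmoid is a hypothetical helper):

def sigmoid(x):
    # Maps any real-valued log-odds x to a probability strictly between 0 and 1
    return 1 / (1 + np.exp(-x))

sigmoid(np.array([-2.0, 0.0, 2.0]))  # roughly 0.12, 0.50, 0.88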
b. The goal is to use statsmodels to fit the regression model you specified in part a. to see if there is a significant difference in conversion based on which page a customer receives. However, first we need to create a column for the intercept, and create a dummy variable column for which page each user received.
We will add an intercept column, as well as an ab_page column, which is 1 when an individual receives the treatment and 0 if control.
In [37]:
df2['intercept'] = 1
df2[['ab_page2', 'ab_page']] = pd.get_dummies(df2['group'])
df2 = df2.drop('ab_page2', axis=1)
df2.head()
Out[37]:
| | user_id | timestamp | group | landing_page | converted | intercept | ab_page |
---|---|---|---|---|---|---|---|
0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 | 1 | 0 |
1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 | 1 | 0 |
2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 | 1 | 1 |
3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 | 1 | 1 |
4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 | 1 | 0 |
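An equivalent, slightly more direct way to build the treatment dummy, shown only as a sketch (ab_page_alt is a hypothetical name), is to compare the group column to 'treatment':

# 1 for treatment rows, 0 for control rows; should match the ab_page column above
ab_page_alt = (df2['group'] == 'treatment').astype(int)
(ab_page_alt == df2['ab_page']).all()  # expect True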
c. Use statsmodels to import the regression model. Instantiate the model, and fit the model using the two columns created in part b. to predict whether or not an individual converts.
In [38]:
log_mod = sm.Logit(df2['converted'], df2[['intercept', 'ab_page']])
d. Provide the summary of your model below, and use it as necessary to answer the following questions.
In [39]:
results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366118
         Iterations 6
Out[39]:
Dep. Variable: | converted | No. Observations: | 290584 |
---|---|---|---|
Model: | Logit | Df Residuals: | 290582 |
Method: | MLE | Df Model: | 1 |
Date: | Sat, 24 Aug 2019 | Pseudo R-squ.: | 8.077e-06 |
Time: | 18:26:59 | Log-Likelihood: | -1.0639e+05 |
converged: | True | LL-Null: | -1.0639e+05 |
Covariance Type: | nonrobust | LLR p-value: | 0.1899 |
| | coef | std err | z | P>|z| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
intercept | -1.9888 | 0.008 | -246.669 | 0.000 | -2.005 | -1.973 |
ab_page | -0.0150 | 0.011 | -1.311 | 0.190 | -0.037 | 0.007 |
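Since logistic-regression coefficients are on the log-odds scale, exponentiating them gives odds ratios that are easier to read; a minimal sketch using the fitted results object above:

# exp(-0.0150) is about 0.985, i.e. roughly 1.5% lower conversion odds
# for the treatment (new page) group relative to control
np.exp(results.params)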
e. What is the p-value associated with ab_page? Why does it differ from the value you found in Part II?
The p-value associated with ab_page is 0.19, which is considerably lower than the value of approximately 0.9 found in Part II. The reason for the difference is that the null and alternative hypotheses differ between the two exercises.

Part II (one-sided test):
$$H_0: p_{old} - p_{new} \geq 0$$
$$H_1: p_{old} - p_{new} < 0$$

Regression (two-sided test):
$$H_0: p_{old} = p_{new}$$
$$H_1: p_{old} \neq p_{new}$$

$p_{old}$ and $p_{new}$ are the conversion rates for the old and new pages, respectively.
Because the regression tests a two-sided alternative, its p-value counts deviations in either direction, whereas the one-sided test in Part II only counts evidence that the new page is better; this is why the two values differ.
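As a rough numeric check of that relationship, a one-line sketch reusing the p_value from the z-test in Part II; a two-sided p-value is approximately twice the smaller one-sided tail:

# 2 * min(0.905, 1 - 0.905) is about 0.19, matching the regression output
2 * min(p_value, 1 - p_value)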
f. Now, we shall consider other factors that might influence whether or not an individual converts. The points below discuss why it is a good idea to consider adding other factors to the regression model, along with the disadvantages of adding additional terms:
- Allows for a more sophisticated model to distinguish other factors which may contribute to the outcome.
- May be used to identify outliers.
- May produce inaccurate or unstable results when the added terms are highly correlated with existing predictors or with each other (multicollinearity).
g. Now, along with testing whether the conversion rate changes for different pages, we will also add an effect based on which country a user lives in. First, we need to read in the countries.csv dataset and merge the two datasets on the appropriate rows.
Does it appear that country had an impact on conversion? We will provide the statistical output as well as a written response to answer this question.
In [40]:
countries_df = pd.read_csv('./countries.csv')
df_new = countries_df.set_index('user_id').join(df2.set_index('user_id'), how='inner')
df_new.head()
Out[40]:
| | country | timestamp | group | landing_page | converted | intercept | ab_page |
---|---|---|---|---|---|---|---|
user_id | |||||||
834778 | UK | 2017-01-14 23:08:43.304998 | control | old_page | 0 | 1 | 0 |
928468 | US | 2017-01-23 14:44:16.387854 | treatment | new_page | 0 | 1 | 1 |
822059 | UK | 2017-01-16 14:04:14.719771 | treatment | new_page | 1 | 1 | 1 |
711597 | UK | 2017-01-22 03:14:24.763511 | control | old_page | 0 | 1 | 0 |
710616 | UK | 2017-01-16 13:14:44.000513 | treatment | new_page | 0 | 1 | 1 |
Check the unique values in the country column:
In [41]:
df_new.country.unique()
Out[41]:
array(['UK', 'US', 'CA'], dtype=object)
Since country has three unique values, we only need to include two dummy columns (the omitted CA level serves as the baseline):
In [42]:
df_new[['UK', 'US']] = pd.get_dummies(df_new['country'])[['UK', 'US']]
df_new.head()
Out[42]:
| | country | timestamp | group | landing_page | converted | intercept | ab_page | UK | US |
---|---|---|---|---|---|---|---|---|---|
user_id | |||||||||
834778 | UK | 2017-01-14 23:08:43.304998 | control | old_page | 0 | 1 | 0 | 1 | 0 |
928468 | US | 2017-01-23 14:44:16.387854 | treatment | new_page | 0 | 1 | 1 | 0 | 1 |
822059 | UK | 2017-01-16 14:04:14.719771 | treatment | new_page | 1 | 1 | 1 | 1 | 0 |
711597 | UK | 2017-01-22 03:14:24.763511 | control | old_page | 0 | 1 | 0 | 1 | 0 |
710616 | UK | 2017-01-16 13:14:44.000513 | treatment | new_page | 0 | 1 | 1 | 1 | 0 |
Computing the statistical output:
In [43]:
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'UK', 'US']])
In [44]:
results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366116
         Iterations 6
Out[44]:
Dep. Variable: | converted | No. Observations: | 290584 |
---|---|---|---|
Model: | Logit | Df Residuals: | 290581 |
Method: | MLE | Df Model: | 2 |
Date: | Sat, 24 Aug 2019 | Pseudo R-squ.: | 1.521e-05 |
Time: | 18:27:00 | Log-Likelihood: | -1.0639e+05 |
converged: | True | LL-Null: | -1.0639e+05 |
Covariance Type: | nonrobust | LLR p-value: | 0.1984 |
| | coef | std err | z | P>|z| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
intercept | -2.0375 | 0.026 | -78.364 | 0.000 | -2.088 | -1.987 |
UK | 0.0507 | 0.028 | 1.786 | 0.074 | -0.005 | 0.106 |
US | 0.0408 | 0.027 | 1.518 | 0.129 | -0.012 | 0.093 |
According to the statistical output, the p-values for both country dummies are larger than 0.05; hence, there is no statistical evidence that country has a significant impact on conversion.
h. Though we have now looked at the individual factors of country and page on conversion, we would now like to look at an interaction between page and country to see if there are significant effects on conversion.
We will create the necessary additional columns, fit the new model, and then provide the summary results and conclusions.
The ab_page column was already created in part b.; hence, we can fit a model similar to the previous one while also including that column:
In [45]:
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'UK', 'US', 'ab_page']])
In [46]:
results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366113
         Iterations 6
Out[46]:
Dep. Variable: | converted | No. Observations: | 290584 |
---|---|---|---|
Model: | Logit | Df Residuals: | 290580 |
Method: | MLE | Df Model: | 3 |
Date: | Sat, 24 Aug 2019 | Pseudo R-squ.: | 2.323e-05 |
Time: | 18:27:01 | Log-Likelihood: | -1.0639e+05 |
converged: | True | LL-Null: | -1.0639e+05 |
Covariance Type: | nonrobust | LLR p-value: | 0.1760 |
| | coef | std err | z | P>|z| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
intercept | -2.0300 | 0.027 | -76.249 | 0.000 | -2.082 | -1.978 |
UK | 0.0506 | 0.028 | 1.784 | 0.074 | -0.005 | 0.106 |
US | 0.0408 | 0.027 | 1.516 | 0.130 | -0.012 | 0.093 |
ab_page | -0.0149 | 0.011 | -1.307 | 0.191 | -0.037 | 0.007 |
According to the results above, even after adding ab_page alongside the country terms, there is no statistical evidence of an impact on conversion, since all p-values exceed 0.05.
In [47]:
from subprocess import call
call(['python', '-m', 'nbconvert', 'Project_2_-_Analyze_Experiment_Results.ipynb'])
Out[47]:
4294967295
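A closing note on part h: the model above includes page and country only as additive terms. If one wanted to test a literal page-by-country interaction, explicit product columns could be added and the model refit; a minimal sketch (UK_page, US_page and interaction_mod are hypothetical names):

# Hypothetical interaction columns: country dummy multiplied by the page indicator
df_new['UK_page'] = df_new['UK'] * df_new['ab_page']
df_new['US_page'] = df_new['US'] * df_new['ab_page']

interaction_mod = sm.Logit(df_new['converted'],
                           df_new[['intercept', 'ab_page', 'UK', 'US', 'UK_page', 'US_page']])
interaction_mod.fit().summary()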