Table of Contents

  • Part I – Probability
  • Part II – A/B Test
  • Part III – Regression


A/B tests are very commonly performed by data analysts and data scientists. For this project, we will be working to understand the results of an A/B test run by an e-commerce website. Our goal is to work through this notebook to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision.

Part I – Probability

To get started, let’s import our libraries.

In [1]:

import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline


Question 1. Now, read in the ab_data.csv data. Store it in df.

  a. Read in the dataset and take a look at the top few rows here:

In [2]:

df = pd.read_csv('ab_data.csv')
df.head()


   user_id                   timestamp      group landing_page  converted
0   851104  2017-01-21 22:11:48.556739    control     old_page          0
1   804228  2017-01-12 08:01:45.159739    control     old_page          0
2   661590  2017-01-11 16:55:06.154213  treatment     new_page          0
3   853541  2017-01-08 18:28:03.143765  treatment     new_page          0
4   864975  2017-01-21 01:52:26.210827    control     old_page          1

b. Use the below cell to find the number of rows in the dataset.

In [3]:



c. The number of unique users in the dataset.

In [4]:



d. The proportion of users converted.

In [5]:
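Parts b through d can each be answered with a single pandas call. The sketch below uses a small made-up frame in place of ab_data.csv; the frame name `toy` and its values are illustrative assumptions, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for ab_data.csv; the values are made up.
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4],
    "converted": [0, 1, 0, 0, 1],
})

n_rows = toy.shape[0]                      # b. number of rows
n_unique_users = toy["user_id"].nunique()  # c. number of unique users
prop_converted = toy["converted"].mean()   # d. share of rows with converted == 1

print(n_rows, n_unique_users, prop_converted)
```

Because converted holds only 0s and 1s, its mean is the proportion of converted rows.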



e. The number of times the new_page and treatment don't line up.

The new_page and treatment values don't line up when either one is paired with the wrong counterpart. First, let's check the unique values in each column:

In [6]:

df.group.unique(), df.landing_page.unique()


(array(['control', 'treatment'], dtype=object),
 array(['old_page', 'new_page'], dtype=object))

Accordingly, group takes two unique values (control and treatment) and landing_page takes two (old_page and new_page). Hence, we need to count the rows where treatment is paired with old_page or control is paired with new_page, as follows:

In [7]:

treat_old = df.query("group == 'treatment' and landing_page == 'old_page'").shape[0]
control_new = df.query("group == 'control' and landing_page == 'new_page'").shape[0]
misalignment = treat_old + control_new


f. Do any of the rows have missing values?

In [8]:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
user_id         294478 non-null int64
timestamp       294478 non-null object
group           294478 non-null object
landing_page    294478 non-null object
converted       294478 non-null int64
dtypes: int64(2), object(3)
memory usage: 11.2+ MB

Question 2. For the rows where treatment is not aligned with new_page or control is not aligned with old_page, we cannot be sure if this row truly received the new or old page.

A. Using the answer from the previous exercise, we will create a new dataset that meets the specifications, then store our new dataframe in df2.

In [9]:

df.drop(df.query("group == 'treatment' and landing_page == 'old_page'").index, inplace=True)
df.drop(df.query("group == 'control' and landing_page == 'new_page'").index, inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 290585 entries, 0 to 294477
Data columns (total 5 columns):
user_id         290585 non-null int64
timestamp       290585 non-null object
group           290585 non-null object
landing_page    290585 non-null object
converted       290585 non-null int64
dtypes: int64(2), object(3)
memory usage: 13.3+ MB

In [10]:

df.to_csv('ab_data2.csv', index=False)

In [11]:

df2 = pd.read_csv('ab_data2.csv')

We can confirm that all mismatched rows were removed if the following returns 0:

In [12]:

df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]



Question 3. Use df2 and the cells below to answer questions for the classroom.

a. How many unique user_ids are in df2?

In [13]:



b. There is one user_id repeated in df2. What is it?

In [14]:



1876    773192
2862    773192
Name: user_id, dtype: int64
c. What is the row information for the repeat user_id?

In [15]:



      user_id                   timestamp      group landing_page  converted
1876   773192  2017-01-09 05:37:58.781806  treatment     new_page          0
2862   773192  2017-01-14 02:55:59.590927  treatment     new_page          0

d. Remove one of the rows with a duplicate user_id, but keep your dataframe as df2.

In [16]:

df2.drop_duplicates('user_id', inplace=True)
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 290584 entries, 0 to 290584
Data columns (total 5 columns):
user_id         290584 non-null int64
timestamp       290584 non-null object
group           290584 non-null object
landing_page    290584 non-null object
converted       290584 non-null int64
dtypes: int64(2), object(3)
memory usage: 13.3+ MB
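
The duplicate lookup and removal in parts b through d can be sketched on a toy frame. The frame below is a made-up assumption used only for illustration. `duplicated(keep=False)` flags every row that shares a user_id with another row, and `drop_duplicates` keeps the first occurrence by default.

```python
import pandas as pd

# Hypothetical data: user_id 11 appears twice.
toy = pd.DataFrame({
    "user_id": [10, 11, 11, 12],
    "converted": [0, 0, 0, 1],
})

# All rows whose user_id occurs more than once.
dupes = toy[toy["user_id"].duplicated(keep=False)]

# Keep only the first row for each user_id.
deduped = toy.drop_duplicates("user_id")

print(dupes)
print(deduped.shape[0])
```
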
a. What is the probability of an individual converting regardless of the page they receive?

In [17]:



b. Given that an individual was in the control group, what is the probability they converted?

In [18]:

df2.query("group == 'control'").converted.mean()


c. Given that an individual was in the treatment group, what is the probability they converted?

In [19]:

df2.query("group == 'treatment'").converted.mean()


d. What is the probability that an individual received the new page?

In [20]:

df2.query("landing_page == 'new_page'").shape[0] / df2.landing_page.shape[0]


e. Use the results in the previous two portions of this question to suggest if you think there is evidence that one page leads to more conversions?

From the results above, the conversion rates of the two groups are almost identical, at roughly 12% each. Hence, there is no clear evidence that either page leads to more conversions.
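
As a compact alternative to the separate queries in parts b and c, the per-group rates and their difference can be read off a single groupby. This is a sketch on made-up data; the frame `toy` is an assumption, not the notebook's df2.

```python
import pandas as pd

# Hypothetical data with the same column names as df2.
toy = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [1, 0, 0, 0, 0, 0, 1, 1],
})

# Conversion rate per group, indexed by group name.
rates = toy.groupby("group")["converted"].mean()

# Treatment rate minus control rate.
observed_diff = rates["treatment"] - rates["control"]

print(rates)
print(observed_diff)
```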

Part II – A/B Test

Notice that because of the time stamp associated with each event, you could technically run a hypothesis test continuously as each observation was observed.

However, then the hard question is do you stop as soon as one page is considered significantly better than another or does it need to happen consistently for a certain amount of time? How long do you run to render a decision that neither page is better than another?

These questions are the difficult parts associated with A/B tests in general.

Question 1. For now, consider you need to make the decision just based on all the data provided. If you want to assume that the old page is better unless the new page proves to be definitely better at a Type I error rate of 5%, what should your null and alternative hypotheses be?


$H_0: p_{old} - p_{new} \geq 0$
$H_1: p_{old} - p_{new} < 0$

$p_{old}$ and $p_{new}$ are the converted rates for the old and new pages respectively.

Question 2. Assume under the null hypothesis, $p_{new}$ and $p_{old}$ both have “true” success rates equal to the converted success rate regardless of page – that is $p_{new}$ and $p_{old}$ are equal. Furthermore, assume they are equal to the converted rate in ab_data.csv regardless of the page.

Use a sample size for each page equal to the ones in ab_data.csv.

Perform the sampling distribution for the difference in converted between the two pages over 10,000 iterations of calculating an estimate from the null.

Use the cells below to provide the necessary parts of this simulation.

a. What is the convert rate for $p_{new}$ under the null?

In [21]:

p_new = df2.converted.mean()


b. What is the convert rate for $p_{old}$ under the null? 

In [22]:

p_old = df2.converted.mean()


c. What is $n_{new}$?

In [23]:

n_new = df2.query("group == 'treatment'").shape[0]


d. What is $n_{old}$?

In [24]:

n_old = df2.query("group == 'control'").shape[0]


e. Simulate $n_{new}$ transactions with a convert rate of $p_{new}$ under the null. Store these $n_{new}$ 1's and 0's in new_page_converted.

In [25]:

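# p lists the probabilities of [0, 1] in that order, so the conversion rate goes second
new_page_converted = np.random.choice([0, 1], size = n_new, p = [1 - p_new, p_new])

f. Simulate $n_{old}$ transactions with a convert rate of $p_{old}$ under the null. Store these $n_{old}$ 1's and 0's in old_page_converted.

In [26]:

# p lists the probabilities of [0, 1] in that order, so the conversion rate goes second
old_page_converted = np.random.choice([0, 1], size = n_old, p = [1 - p_old, p_old])
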
g. Find $p_{new}$ - $p_{old}$ for your simulated values from part (e) and (f).

In [27]:

p_diff = new_page_converted.mean() - old_page_converted.mean()




Subtracting the two simulated arrays element-wise raises an error because their sizes differ ($n_{new} \neq n_{old}$); hence, the difference of their means is calculated instead.

h. Simulate 10,000 $p_{new}$ - $p_{old}$ values using the same process you used in parts a. through g. above. Store all 10,000 values in p_diffs.

In [28]:

p_diffs = []

for _ in range(10000):
    # p lists the probabilities of [0, 1] in that order, so the conversion rate goes second
    new_page_converted = np.random.choice([0, 1], size = n_new, p = [1 - p_new, p_new]).mean()
    old_page_converted = np.random.choice([0, 1], size = n_old, p = [1 - p_old, p_old]).mean()
    p_diffs.append(new_page_converted - old_page_converted)
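
The loop above draws every individual 0/1 outcome. A faster sketch with the same null distribution draws the number of conversions per page directly from a binomial distribution. The sample sizes and rate below are placeholder assumptions, not the values from df2, and the names `p_null` and `p_diffs_vec` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder values; substitute the notebook's p_new, n_new and n_old.
p_null = 0.12
n_new, n_old = 145_000, 145_000
n_sims = 10_000

# Each draw is the number of conversions among n users; dividing by n gives a rate.
new_rates = rng.binomial(n_new, p_null, size=n_sims) / n_new
old_rates = rng.binomial(n_old, p_null, size=n_sims) / n_old
p_diffs_vec = new_rates - old_rates

# Theoretical standard deviation of the difference under the null.
expected_sd = np.sqrt(p_null * (1 - p_null) * (1 / n_new + 1 / n_old))

print(p_diffs_vec.mean(), p_diffs_vec.std(), expected_sd)
```

The simulated differences should centre on zero with a spread close to `expected_sd`, which is a useful sanity check on the histogram in part i.
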
i. Plot a histogram of the p_diffs. Does this plot look like what you expected? Use the matching problem in the classroom to ensure you fully understand what was computed here.

In [29]:

plt.hist(p_diffs)
plt.ylabel('# of Simulations')
plt.title('Plot of 10,000 Simulated p_diffs');

j. What proportion of the p_diffs are greater than the actual difference observed in ab_data.csv?

First, we need to convert p_diffs into a numpy array as follows:

In [30]:

p_diffs = np.array(p_diffs)


array([-0.00028686,  0.00142686,  0.00234903, ...,  0.00115874,
        0.0010417 , -0.00054158])

Next, we need to compute the actual difference observed in the csv dataset as follows:

In [31]:

act_diffs = df2.query('group == "treatment"').converted.mean() - df2.query('group == "control"').converted.mean()



Finally, we can compute the proportion of p_diffs greater than act_diffs:

In [32]:

(p_diffs > act_diffs).mean()
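
To make the comparison concrete, the sketch below repeats it on stand-in values. The null differences are drawn from a normal distribution and the observed difference is made up; both are assumptions for illustration, not results from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the simulated null differences (hypothetical spread).
null_diffs = rng.normal(loc=0.0, scale=0.0012, size=10_000)

# Hypothetical observed difference (treatment rate minus control rate).
observed_diff = -0.0016

# Comparing element-wise gives booleans; their mean is the share of
# simulated differences that exceed the observed one.
p_value = (null_diffs > observed_diff).mean()

print(p_value)
```

A negative observed difference sits in the lower part of the null distribution, so most simulated values exceed it and the resulting proportion is large.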


k. In words, explain what you just computed in part j. What is this value called in scientific studies? What does this value mean in terms of whether or not there is a difference between the new and old pages?

In the previous part, we were calculating the p-value which is the probability of getting our statistic or a more extreme value if the null is true. 

A large p-value means that the observed difference is well within the range of values expected under the null hypothesis; hence, there is no statistical evidence to reject the null hypothesis, which states that the old page converts at least as well as the new page.

l. We could also use a built-in to achieve similar results. Though using the built-in might be easier to code, the above portions are a walkthrough of the ideas that are critical to correctly thinking about statistical significance. Let n_old and n_new refer to the number of rows associated with the old page and new pages, respectively.

In [33]:

import statsmodels.api as sm

convert_old = df2.query('group == "control"').converted.sum()
convert_new = df2.query('group == "treatment"').converted.sum()
n_old = df2.query("landing_page == 'old_page'").shape[0]
n_new = df2.query("landing_page == 'new_page'").shape[0]
m. Now use stats.proportions_ztest to compute your test statistic and p-value. Here is a helpful link on using the built-in.

In [34]:

z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
z_score, p_value


(1.3109241984234394, 0.9050583127590245)
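
For intuition, a two-proportion z-test of this kind can be written out by hand. The sketch below assumes the pooled standard error that proportions_ztest is understood to use by default; the function name and the counts are hypothetical and are not taken from df2.

```python
import math
from statistics import NormalDist


def two_proportion_ztest_smaller(count1, nobs1, count2, nobs2):
    """One-sided z-test with alternative p1 < p2, using a pooled standard error."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    # Pooled conversion rate under the null that both rates are equal.
    pooled = (count1 + count2) / (nobs1 + nobs2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / se
    # For the alternative p1 < p2, the p-value is the lower-tail probability of z.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Made-up counts: the old page is listed first, as in the call above.
z_demo, p_demo = two_proportion_ztest_smaller(17_500, 145_000, 17_250, 145_000)
print(z_demo, p_demo)
```

With the old page listed first, a positive z means the old page converted at a higher rate in the sample, which pushes the one-sided p-value towards 1.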

Next, we import the norm function to compute the significance of our z-score.

In [35]:

from scipy.stats import norm




Next, we check the critical value at the 95% confidence level.

In [36]:



n. What do the z-score and p-value computed in the previous question mean for the conversion rates of the old and new pages? Do they agree with the findings in parts j. and k.?

The results above give a z-score of 1.31. The critical value quoted for a 95% confidence level, 1.96, is the two-sided one; for this one-sided test (alternative $p_{old} < p_{new}$) the null hypothesis would be rejected only for a z-score below about -1.645. A z-score of 1.31 falls outside either rejection region, so there is no statistical evidence to reject the null hypothesis. Furthermore, the p-value of about 0.905 agrees with the findings in j. and k.: it is far above the 5% Type I error rate, so this test also fails to reject the null hypothesis.
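
The critical values discussed above can be checked numerically. The sketch below uses the standard library's `statistics.NormalDist` as a stand-in for scipy.stats.norm (a substitution made here so the example needs no third-party packages); the z-score is the one reported in part m.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z_score = 1.3109241984234394  # value reported by proportions_ztest above

# Lower-tail probability of the z-score, i.e. the one-sided p-value.
one_sided_p = std_normal.cdf(z_score)

# Two-sided critical value at the 5% level.
crit_two_sided = std_normal.inv_cdf(0.975)

# Lower-tail critical value for a one-sided test at the 5% level.
crit_lower_tail = std_normal.inv_cdf(0.05)

print(one_sided_p, crit_two_sided, crit_lower_tail)
```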

Part III – A regression approach

  1. In the final part, you will see that the result achieved in the previous A/B test can also be achieved by performing regression.

a. Since each row is either a conversion or no conversion, what type of regression should you be performing in this case?

Each row records either a conversion or no conversion, so the model needs to predict a probability between 0 and 1. Accordingly, logistic regression may be used.
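
As a brief illustration of why the output stays between 0 and 1: logistic regression models the log-odds as a linear function of the predictors and maps it back to a probability with the logistic (sigmoid) function. The helper names below are hypothetical, and 0.12 is only the approximate conversion rate quoted in Part I.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def sigmoid(x):
    """Inverse of logit: maps any real number to a value in (0, 1)."""
    return 1 / (1 + math.exp(-x))


log_odds = logit(0.12)          # negative, because 0.12 is below 0.5
recovered = sigmoid(log_odds)   # maps the log-odds back to 0.12

print(log_odds, recovered)
```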

b. The goal is to use statsmodels to fit the regression model you specified in part a. to see if there is a significant difference in conversion based on which page a customer receives. However, first we need to create a column for the intercept, and create a dummy variable column for which page each user received.

We will add an intercept column, as well as an ab_page column, which is 1 when an individual receives the treatment and 0 if control.

In [37]:

df2['intercept'] = 1
df2[['ab_page2', 'ab_page']] = pd.get_dummies(df2['group'])
df2 = df2.drop('ab_page2', axis = 1)


   user_id                   timestamp      group landing_page  converted  intercept  ab_page
0   851104  2017-01-21 22:11:48.556739    control     old_page          0          1        0
1   804228  2017-01-12 08:01:45.159739    control     old_page          0          1        0
2   661590  2017-01-11 16:55:06.154213  treatment     new_page          0          1        1
3   853541  2017-01-08 18:28:03.143765  treatment     new_page          0          1        1
4   864975  2017-01-21 01:52:26.210827    control     old_page          1          1        0

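An equivalent way to build the same two columns, sketched on a made-up frame, is to compare group against 'treatment' directly. This avoids relying on the column order returned by get_dummies and always yields integer 0/1 values. The frame `toy` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical rows with the same group labels as df2.
toy = pd.DataFrame({"group": ["control", "treatment", "treatment", "control"]})

# Constant column so the model can fit an intercept.
toy["intercept"] = 1

# 1 when the user is in the treatment group, 0 when in control.
toy["ab_page"] = (toy["group"] == "treatment").astype(int)

print(toy)
```
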
c. Use statsmodels to import the regression model. Instantiate the model, and fit the model using the two columns created in part b. to predict whether or not an individual converts.

In [38]:

log_mod = sm.Logit(df2['converted'], df2[['intercept', 'ab_page']])
d. Provide the summary of your model below, and use it as necessary to answer the following questions.

In [39]:

results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366118
         Iterations 6


Dep. Variable:      converted           No. Observations:   290584
Model:              Logit               Df Residuals:       290582
Method:             MLE                 Df Model:           1
Date:               Sat, 24 Aug 2019    Pseudo R-squ.:      8.077e-06
Covariance Type:    nonrobust           LLR p-value:        0.1899

coef    std err    z    P>|z|    [0.025    0.975]

e. What is the p-value associated with ab_page? Why does it differ from the value you found in Part II?

The p-value associated with ab_page was 0.19, considerably lower than the value of approximately 0.9 found in Part II. The two values differ because the null and alternative hypotheses differ between the two exercises. Part II used a one-sided test:

$H_0: p_{old} - p_{new} \geq 0$
$H_1: p_{old} - p_{new} < 0$

whereas the regression uses a two-sided test:

$H_0: p_{old} = p_{new}$
$H_1: p_{old} \neq p_{new}$

$p_{old}$ and $p_{new}$ are the converted rates for the old and new pages respectively.

Because the two-sided test counts extreme differences in either direction, its p-value is twice the smaller tail of the one-sided test: $2 \times (1 - 0.905) \approx 0.19$, which matches the regression output.
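
The relationship between the one-sided and two-sided p-values can be verified from the z-score reported in Part II. This sketch uses the standard library's `statistics.NormalDist` rather than scipy, purely to stay dependency-free.

```python
from statistics import NormalDist

z_score = 1.3109241984234394  # z-score reported in Part II

# One-sided p-value for the alternative p_old < p_new (lower tail).
one_sided_p = NormalDist().cdf(z_score)

# Two-sided p-value: twice the smaller of the two tails.
two_sided_p = 2 * min(one_sided_p, 1 - one_sided_p)

print(one_sided_p, two_sided_p)
```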

f. Now, we shall consider other things that might influence whether or not an individual converts. The section below discusses why it is a good idea to consider other factors to add into your regression model, along with the disadvantages of adding additional terms.

  • Allows for a more sophisticated model to distinguish other factors which may contribute to the outcome.
  • May be used to identify outliers.
  • May produce inaccurate results due to correlated errors.
g. Now along with testing if the conversion rate changes for different pages, we will also add an effect based on which country a user lives in. First, we will need to read in the countries.csv dataset and merge together both datasets on the appropriate rows. Here are the docs for joining tables.

Does it appear that country had an impact on conversion? We will provide the statistical output as well as a written response to answer this question.

In [40]:

countries_df = pd.read_csv('./countries.csv')
df_new = countries_df.set_index('user_id').join(df2.set_index('user_id'), how='inner')


        country                   timestamp      group landing_page  converted  intercept  ab_page
user_id
834778       UK  2017-01-14 23:08:43.304998    control     old_page          0          1        0
928468       US  2017-01-23 14:44:16.387854  treatment     new_page          0          1        1
822059       UK  2017-01-16 14:04:14.719771  treatment     new_page          1          1        1
711597       UK  2017-01-22 03:14:24.763511    control     old_page          0          1        0
710616       UK  2017-01-16 13:14:44.000513  treatment     new_page          0          1        1
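
The inner join on user_id keeps only the users present in both tables. A minimal sketch on two made-up frames (the names `countries` and `pages` and their contents are assumptions for illustration):

```python
import pandas as pd

# Hypothetical inputs: user 1 appears only in countries, user 4 only in pages.
countries = pd.DataFrame({"user_id": [1, 2, 3], "country": ["UK", "US", "CA"]})
pages = pd.DataFrame({"user_id": [2, 3, 4], "converted": [0, 1, 0]})

# Index both frames by user_id, then keep only ids found in both.
joined = countries.set_index("user_id").join(pages.set_index("user_id"), how="inner")

print(joined)
```

Comparing the row count of the joined frame with that of df2 is a quick way to confirm that no users were lost in the merge.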

Check the unique values in the country column:

In [41]:

df_new.country.unique()


array(['UK', 'US', 'CA'], dtype=object)

Since country takes three values, two dummy columns are enough; the third country (CA) serves as the baseline.

In [42]:

df_new[['UK', 'US']] = pd.get_dummies(df_new['country'])[['UK','US']]


        country                   timestamp      group landing_page  converted  intercept  ab_page  UK  US
user_id
834778       UK  2017-01-14 23:08:43.304998    control     old_page          0          1        0   1   0
928468       US  2017-01-23 14:44:16.387854  treatment     new_page          0          1        1   0   1
822059       UK  2017-01-16 14:04:14.719771  treatment     new_page          1          1        1   1   0
711597       UK  2017-01-22 03:14:24.763511    control     old_page          0          1        0   1   0
710616       UK  2017-01-16 13:14:44.000513  treatment     new_page          0          1        1   1   0

Computing the statistical output:

In [43]:

log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'UK', 'US']])

In [44]:

results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366116
         Iterations 6


Dep. Variable:      converted           No. Observations:   290584
Model:              Logit               Df Residuals:       290581
Method:             MLE                 Df Model:           2
Date:               Sat, 24 Aug 2019    Pseudo R-squ.:      1.521e-05
Covariance Type:    nonrobust           LLR p-value:        0.1984

coef    std err    z    P>|z|    [0.025    0.975]

According to our statistical output, the p-values for both country terms are larger than 0.05; hence, there is no statistical evidence that country has a significant impact on conversion.

h. Though you have now looked at the individual factors of country and page on conversion, we would now like to look at an interaction between page and country to see if there are significant effects on conversion.

We will create the necessary additional columns, and fit the new model then provide the summary results, and conclusions based on the results.
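
For reference, an interaction between page and country is usually encoded as the product of the two dummy columns. The sketch below shows this on a made-up frame; the column names `UK_ab_page` and `US_ab_page` are hypothetical and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical dummy columns with the same names as in df_new.
toy = pd.DataFrame({
    "ab_page": [0, 1, 1, 0, 1],
    "UK":      [1, 1, 0, 0, 0],
    "US":      [0, 0, 1, 0, 0],
})

# The product is 1 only for users who are in that country AND saw the new page.
toy["UK_ab_page"] = toy["UK"] * toy["ab_page"]
toy["US_ab_page"] = toy["US"] * toy["ab_page"]

print(toy)
```

Columns built this way would be added to the predictor list passed to sm.Logit alongside intercept, ab_page, UK and US.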

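The ab_page column was already created in part b; hence, the model can be built as in the previous part with ab_page added. Note that this model includes page and country as separate main effects only; it does not contain page-by-country interaction terms.

In [45]: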

log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'UK', 'US', 'ab_page']])

In [46]:

results = log_mod.fit()
results.summary()
Optimization terminated successfully.
         Current function value: 0.366113
         Iterations 6


Dep. Variable:      converted           No. Observations:   290584
Model:              Logit               Df Residuals:       290580
Method:             MLE                 Df Model:           3
Date:               Sat, 24 Aug 2019    Pseudo R-squ.:      2.323e-05
Covariance Type:    nonrobust           LLR p-value:        0.1760

coef    std err    z    P>|z|    [0.025    0.975]

According to the results above, even after adding ab_page there does not seem to be any statistical evidence of an impact on conversion, since all p-values exceed 0.05.

In [47]:

from subprocess import call
call(['python', '-m', 'nbconvert', 'Project_2_-_Analyze_Experiment_Results.ipynb'])


