Below is a dataset with single bets made by gambling customers. Our task is to aggregate them into sessions, group sessions into length groups and calculate the average margin by session length group. For example:

Session LengthAvg Margin (GGR divided by Turnover)
< 15minX
15 – 30 minY
30 – 45 minZ
…..

Description of the dataset:

  1. Player_id – unique player identifier
  2. Game_End_Timestamp – indicator when bet was placed (aggregated to full minute)
  3. Turnover – amount customer staked in that minute
  4. GGR – gross gaming revenue (turnover – wins), casino revenue in that minute

Plan of Action

This dataset consists of log entries that contain a player_id, timestamp, and betting information. Our first step would be to group all of the actions that a user takes within a given session. A session can be defined as any continuous entry of logs from a specific user, where the gap between the previous timestamp isn’t more than x minutes. From closer inspection, a good starting point to define x = 30 minutes.

Next, we need to extract a list of unique values from the player_id column, then need to loop through the list of player ids. For each one, we need to select a list of timestamps corresponding to that specific id, ordered by ascending timestamps.

To find when a session ends we need to filter each timestamp, then compare it to the next ordered timestamp in the list. This will help us to find out if the next one is < or > 30 minutes difference. If diff is < x, label it as the same session using identifier n. If diff > x, then label it as a new session using the new unique identifier n + 1 and repeat throughout the dataframe.

Generally, this method is not something recommended to be done locally, especially if this task will be repeated regularly. It is good practice to add a session identifier to the data when it is generated instead of processing it offline since the data size could be much larger.

1. Import Libraries

2. Gather Data

player_idgame_end_timestampturnoverggr
05558229332020-10-07 00:00:00+00:003.782.78
15428726702020-10-07 00:00:00+00:003.603.32
25379884772020-10-07 00:00:00+00:000.40-0.20
35592844212020-10-07 00:00:00+00:000.200.10
45507716572020-10-07 00:00:00+00:0055.0050.00
Imported CSV .head

3. Encoding Sessions

We can split sessions by gaps over x amount of minutes. As a starting point, we will use 30 minutes as a session splitter to get lengths. We can use this to create a new column called “session” where we can append a number to each session for each player id.

player_idgame_end_timestampturnoverggr
05558229332020-10-07 00:00:00+00:003.782.78
4065558229332020-10-07 00:01:00+00:007.186.18
6195558229332020-10-07 00:02:00+00:0014.2314.23
7655558229332020-10-07 00:03:00+00:004.113.11
8975558229332020-10-07 00:04:00+00:006.491.49
11225558229332020-10-07 00:05:00+00:001.00-13.00
14315558229332020-10-07 00:06:00+00:007.387.38
15385558229332020-10-07 00:07:00+00:001.00-1.00
18105558229332020-10-07 00:08:00+00:008.568.56
20795558229332020-10-07 00:09:00+00:007.485.48
player_idturnoverggrsession
game_end_timestamp
2020-10-07 00:00:00+00:005558229333.782.781
2020-10-07 00:00:00+00:005428726703.603.321
2020-10-07 00:00:00+00:005379884770.40-0.201
2020-10-07 00:00:00+00:005592844210.200.101
2020-10-07 00:00:00+00:0055077165755.0050.001

Below are each of the unique session labels we have created. We can see the total number of sessions for an individual user is 64. The session edges are determined by the minimum of 30-minutes inactivity we look for between sessions, if we increase this number we can expect fewer sessions to be created.

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64])

4. Calculating Session Lengths

turnover ggr time session_minutes
player_id session game_end_timestamp
2257174 1 2020-10-07 00:00:00+00:00 86.20 25.18 0 days 00:13:00 13.00
2 2020-10-07 00:00:00+00:00 50.90 -135.20 0 days 00:18:00 18.00
3 2020-10-07 00:00:00+00:00 1459.80 60.11 0 days 03:51:00 231.00
4 2020-10-07 00:00:00+00:00 1513.06 327.89 0 days 00:11:00 11.00
2020-10-08 00:00:00+00:00 1513.06 327.89 0 days 04:23:00 263.00

5. Grouping session periods

We can find the maximum session length, then divide by 15 (bin/group size in minutes) to split each session length into groups. We need to make sure to use .ceil to round up since the result be not be a whole number. The result helps determine the max length of the np.arange array. This helps us create an array of intervals that we can use as bin edges.

turnover ggr time session_minutes bins
player_id session game_end_timestamp
2257174 1 2020-10-07 00:00:00+00:00 86.20 25.18 0 days 00:13:00 13.00 (0, 15]
2 2020-10-07 00:00:00+00:00 50.90 -135.20 0 days 00:18:00 18.00 (15, 30]
3 2020-10-07 00:00:00+00:00 1459.80 60.11 0 days 03:51:00 231.00 (225, 240]
4 2020-10-07 00:00:00+00:00 1513.06 327.89 0 days 00:11:00 11.00 (0, 15]
2020-10-08 00:00:00+00:00 1513.06 327.89 0 days 04:23:00 263.00 (255, 270]

6. Calculating Average Margin

Avg Margin (GGR divided by Turnover) Total Sessions
Session Length (Mins)
(0, 15] 1.85% 5277
(15, 30] 2.85% 2524
(30, 45] 2.99% 1898
(45, 60] 0.90% 1571
(60, 75] 2.53% 1144

7. Report

Observation

Session lengths between 45-60 minutes have the highest average margins with 105.67 GBP. This is followed by sessions of 60–75 minutes, but the margin drops down to less than half at 49.58 GBP. Sessions of 15-30 minutes rank 6th of all the session lengths, but 0-15 minute sessions (2.42 GBP) are nowhere to be seen in the top 10. This could indicate spending may increase with session length but further analysis would have to be done. Currently, we can see no clear correlation between total sessions and average margin. There is a clear negative correlation between session length and total sessions since the majority of players or sessions generally last under 30 minutes.

Moving forward, this data could also be used to identify patterns of:

  • Uncontrolled spending
  • Significant time spent gambling
  • Compare times of day session takes place

This may find customers who may need help with gambling addictions or who are struggling financially.

References

Pandas

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html#pandas.read_pickle
https://stackoverflow.com/questions/64433889/dataframe-new-column-to-split-sessions-by-time-difference-pandas
https://stackoverflow.com/questions/32138627/calculating-time-elapsed-on-a-groupby-object
https://www.akashmittal.com/valueerror-truth-value-series-ambiguous/
https://pandas.pydata.org/docs/reference/api/pandas.Index.get_level_values.html
https://stackoverflow.com/questions/60462300/filter-by-a-bin-in-pandas
https://stackoverflow.com/questions/18429491/pandas-groupby-columns-with-nan-missing-values
https://stackoverflow.com/questions/41929772/time-difference-within-group-by-objects-in-python-pandas
https://stackoverflow.com/questions/62665950/cumulative-sum-of-timedelta-column-based-on-boolean-condition
https://stackoverflow.com/questions/33151463/how-to-bin-time-in-a-pandas-dataframe

Numpy

https://numpy.org/doc/stable/reference/generated/numpy.logical_and.html
https://numpy.org/doc/stable/reference/generated/numpy.ceil.html
https://numpy.org/doc/stable/reference/generated/numpy.arange.html

Plotly

https://stackoverflow.com/questions/58515938/typeerror-object-of-type-interval-is-not-json-serializable-in-plotly-python

Domain Information

https://corporatefinanceinstitute.com/resources/knowledge/finance/gross-gaming-revenue-ggr/
https://blog.revolut.com/a/turnover-vs-revenue/

Other

https://thisdavej.com/copy-table-in-excel-and-paste-as-a-markdown-table/
https://www.rbs.co.uk/life-moments/Struggling-financially/manage-your-gambling.html

Author

Write A Comment