# Thread: Strange probability problem

1. ## Strange probability problem

Hi

I am facing a probability problem and can't work out how it is structured.

Problem:

ROW 1 - Every month a set of new customers joins a website. The first row of the table pasted below shows this with sample data. These customers are new in the sense that they are registering with the site for the first time.

ROW 2 - Reordering customers: customers who, after joining the site, like what they have seen and wish to reorder. Every month we apply a simple probability (shown at the top of the table pasted below) to calculate the reordering customers; as an extreme case we have used 100%.
Clear Constraints - Any customer who joined in any previous month can reorder in a given month. Because of this constraint, the calculation uses a cumulative total of all previous months.

ROW 3 - Silver Level - Every customer who has reordered in three consecutive months (month n, month n+1 and month n+2 in ROW 2) is upgraded to a Silver Award, which carries additional benefits.

Clear Constraints
- the customer needs to have ordered consecutively for three months in ROW 2 only then does he get upgraded to Silver

UNCLEAR DOUBTFUL Constraints
If any person has moved to SILVER Row (Row3) then should he be removed from calculation from the New Customers (Row 1) ?????. So that the customer count below will not affect the top and maybe prevent double counting
If a customer has upgraded to a silver level as in he has moved to ROW 3 from ROW 2 then this customer who has moved to Silver should not appear in the rows above (obviously because he has moved to a upgraded level) how to show this!!

If a customer has not upgraded to silver level then he should have a chance to stay in that level and maybe order in any of the future months.

| Probability of ordering next month | 100% | 100% | 100% | 100% |
|---|---|---|---|---|
| Row 1: New customers coming in every month | 3 | 3 | 3 | 3 |
| Row 2: Reordering customers | | 3 | 6 | 9 |
| Row 3: Silver | ?? | | | |
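The Row 2 rule described above (a reorder probability applied to the cumulative total of customers from all previous months) can be sketched in a few lines. This is only an illustration of that rule as stated; the function name `reordering_customers` is made up here, and the first month is assumed to have no reorders because nobody joined earlier.

```python
from itertools import accumulate

def reordering_customers(new_customers, reorder_prob):
    """Expected reorders per month: reorder_prob times everyone who joined in earlier months."""
    # Running total of joiners up to and including each month.
    cumulative = list(accumulate(new_customers))
    # Month 1 has no earlier joiners; month k draws on the total through month k-1.
    earlier = [0] + cumulative[:-1]
    return [reorder_prob * count for count in earlier]

new_customers = [3, 3, 3, 3]
print(reordering_customers(new_customers, 1.0))  # [0.0, 3.0, 6.0, 9.0]
```

With 100% the result matches the 3, 6, 9 in the sample table; a lower probability simply scales each entry.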

Questions - How to calculate Silver customers, and how to calculate the retention rate, i.e. the number of customers retained at the end of the year.

Please help me figure out how to solve this issue.........

Thanks in advance

2. ## Re: Strange probability problem

UNCLEAR DOUBTFUL Constraints
1. If any person has moved to SILVER Row (Row3) then should he be removed from calculation from the New Customers (Row 1) ?????. So that the customer count below will not affect the top and maybe prevent double counting
2. If a customer has upgraded to a silver level as in he has moved to ROW 3 from ROW 2 then this customer who has moved to Silver should not appear in the rows above (obviously because he has moved to a upgraded level) how to show this!!
3. If a customer has not upgraded to silver level then he should have a chance to stay in that level and maybe order in any of the future months.
1. Why would a silver customer ever be counted in the new customers? Surely you are using the website data to count the number of new accounts created; silver customers can't possibly have created a new account in the month before they get their silver status.
2. What do you mean how to show this? What is there to show, you said it was obvious.
3. If a customer makes an order and then does not make an order the next month then you can label them as "inactive" and make a new row for them.

I think it would help if you split up row 2 into customers who have made an order in the last month and customers who have made an order in both of the last 2 months.
How will you decide how many silver customers last until the end of the year? Maybe you can assume that once they reach silver level their probability of not making an order the next month is constant regardless of how many months they have been silver. (So the chance of a customer who has made an order in each of the last 3 months making an order in their 4th month is the same as the chance that someone who has made an order in each of the last 11 months will make an order in their 12th month.)
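To make the constant-probability assumption concrete, here is a minimal sketch. The value `keep_prob = 0.8` is purely hypothetical and the function name is invented for this example; the only point is that, under the assumption, the chance of ordering in every one of the next k months is the same probability multiplied by itself k times.

```python
def prob_still_ordering(keep_prob, months):
    """Chance that a silver customer orders in each of the next `months` months,
    assuming the same keep_prob every month regardless of history."""
    return keep_prob ** months

keep_prob = 0.8  # hypothetical monthly probability of ordering again
for k in (1, 3, 9):
    print(k, round(prob_still_ordering(keep_prob, k), 4))
```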

3. ## Re: Strange probability problem

Hi Shakarri - I appreciate your prompt reply. Let me redefine the problem: I'm having trouble calculating re-ordering in each subsequent month from the INPUT of new customers who come in every month.
The company has basically 3 awards: Silver, Gold and Platinum. For each award there is a simple business rule: starting from their current level, a customer needs to order 3 months CONSECUTIVELY to achieve the next award.

Every new customer that orders with the company there is a probability that few from them will reorder the very next month and some could reorder at any point in future thereby staying alive in that level.

But if they have ordered in three consecutive months, they move up a level (Silver, for example). The problem is how to use a probability equation to predict which customers will have ordered three months in a row. And once customers have moved up a level, how do we remove them from the lower level, whose series keeps continuing, so that we don't double count?

How to calculate this with 10 new customers coming every month for a year (120 customers at the end of the year). If so how many customers reach silver, how many reach gold, how many reach platinum....assuming any probability say 25% for reorder the very next month.....
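One way to get a first number for Silver is to treat each customer as independently reordering in each later month with the same probability, and to ask how likely a streak of three consecutive reorders is in the months they have left. The sketch below does that under exactly those assumptions (which are a simplification, not the site's real behaviour); the names `prob_streak` and `expected_silver` are invented here, the joining month is assumed not to count towards the streak, and Gold and Platinum are not covered.

```python
def prob_streak(months, p, run=3):
    """Probability of at least one streak of `run` consecutive ordering months
    within `months` months, ordering independently each month with probability p."""
    # state[j] = probability the current streak has length j and no full streak has occurred yet
    state = [1.0] + [0.0] * (run - 1)
    reached = 0.0
    for _ in range(months):
        nxt = [0.0] * run
        for j, mass in enumerate(state):
            nxt[0] += mass * (1 - p)      # a missed month resets the streak
            if j + 1 == run:
                reached += mass * p       # streak completed: Silver reached
            else:
                nxt[j + 1] += mass * p    # streak grows by one month
        state = nxt
    return reached

def expected_silver(new_per_month, total_months, p):
    """Expected number of customers reaching Silver by the end of the period."""
    # A cohort joining in month m has total_months - m later months in which to reorder.
    return sum(new_per_month * prob_streak(total_months - m, p)
               for m in range(1, total_months + 1))

print(expected_silver(10, 12, 0.25))
```

As a sanity check, with a 100% reorder probability every cohort that has at least three months left reaches Silver, so 9 of the 12 cohorts (90 customers) qualify.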

Hope it was able to shed some more light..

4. ## Re: Strange probability problem

First give a name to each of the groups
N the number of new accounts created that month
F the number of "frozen" accounts, accounts which have been created but never bought something
I the number of inactive customers
A the number of customers who have ordered in the last month
B the number of customers who have ordered twice in a row
S the number of silver customers

Suppose you examine the numbers in each group on the first day of every month (or every 4th week, however you prefer), and suppose that at the start of month 1, before anyone has a chance to move between levels, you have these numbers of customers in each group: $F_0, I_0, A_0, B_0, S_0$, as well as N customers who joined between the start of month 0 and the start of month 1.

Now define the probability of moving between levels when we move from one month to the next
Probability of going from N to A is p1
Probability of going from N to F is 1-p1
Probability of going from A to B is p2
Probability of going from B to S is p3
Probability of going from A to I is 1-p2
Probability of going from B to I is 1-p3
Probability of going from I to A is p4

This model is just for silver customers, you can apply the same thing for gold and platinum too once you know how it works for silver.

I am not sure how you should deal with accounts going from F to A. I can imagine that accounts that haven't bought anything for a month after creating an account on the website might come back, but if they have been gone for a year then they most likely won't come back. If accounts more than 2 months old very rarely come back, you could have a group F for accounts which haven't bought anything for a month after account creation, with a probability p5 of making a purchase next month, and a group G for accounts which haven't bought anything for 2 months or more after account creation, with a probability of 0 of making a purchase next month. You can add more groups if the probability of group G making a purchase isn't close to 0; the calculation will get a little longer but not more complicated.

If you start with A0 people in group A then after a month you would expect $A_0+N p_1 - A_0 p_2 + I_0 p_4$ customers to be in group A.
To make sure you understand what is going on can you make these equations for the other groups B, F, S and I. Then we will be able to estimate the values of all those probabilities using any data you have from the website.

Some more questions about your website: will you be assuming that you get the same number of new customers every month or will it vary?
If someone reaches a level, Silver for example, and then does not buy anything will their silver status be removed?
To get gold do they need to purchase something 6 months consecutively, or could they buy something for 3 months and get silver, then stop for 2 months, and then come back to buy something for another 3 months to get gold?

5. ## Re: Strange probability problem

Hi Shakarri, Thanks again for your fantastic suggestion - it was bulls eye but we still have some questions.

But first, answering your questions that you asked

The no of customers who come every month will vary
When someone reaches a silver level and does not buy at all then his status remains same.....
To get Gold - Yes, you are right: one can buy for 3 consecutive months and get silver, then stop for 2 months, and then come back to buy something for another 3 consecutive months to get gold.

We have some questions regarding "A" ROW - there is a NEGATIVE sign in the formula that you have suggested which removes P2 (which is B) from "A" Row which indicates that ROW A contains only those NEW customers who have reordered with probability P1 and stay alive and show some activity in the ROW for all future periods. Is our understanding correct.

If our understanding is correct then "B" ROW contains only those NEW CUSTOMERS who ordered TWICE in "A" ROW consecutively...which means that "B" ROW does not contain any element of "ROW A".

For F and I - instead of considering just the immediately prior month - we have considered the sum product of all previous months with the probabilities, that is (prob x * month 1 + prob x+1 * month 2 ...) and so on, where prob x, prob x+1 are in descending order like 10%, 9%, ... dipping to near 0%. This follows YOUR CORRECT SUGGESTION that if a group hasn't bought in any of the previous periods then they shouldn't be awarded a greater chance to be seen active in current months.

We have an excel file which we like to send to you.. Do help us with your email. (our email is girishbwl@gmail.com)

Thanks again

6. ## Re: Strange probability problem

Originally Posted by girishbwl
We have some questions regarding "A" ROW - there is a NEGATIVE sign in the formula that you have suggested which removes P2 (which is B) from "A" Row which indicates that ROW A contains only those NEW customers who have reordered with probability P1 and stay alive and show some activity in the ROW for all future periods. Is our understanding correct.
That is correct, the negative sign is the people leaving group A going to group B and the positive signs are people being added to group A from the new or inactive users.

If our understanding is correct then "B" ROW contains only those NEW CUSTOMERS who ordered TWICE in "A" ROW consecutively...which means that "B" ROW does not contain any element of "ROW A".
Yes, people can't be in A and B at the same time.
When you extend this approach for gold and platinum as well as silver you will need to add more groups. Just as there are groups A and B which come between new members and silver members you will need two groups that come between silver and gold.

For F and I - instead of considering just the immediately prior month - we have considered the sum product of all previous months with the probabilities, that is (prob x * month 1 + prob x+1 * month 2 ...) and so on, where prob x, prob x+1 are in descending order like 10%, 9%, ... dipping to near 0%. This follows YOUR CORRECT SUGGESTION that if a group hasn't bought in any of the previous periods then they shouldn't be awarded a greater chance to be seen active in current months.
While your idea models reality more closely, it will not work with the idea I had for calculating the number of silver/gold/platinum users at the end of the year. I was going to use Markov chains to calculate the probability that new users and existing users will end up as silver/gold/platinum after a year. For this to work all the users need to be put into groups, and all the users in each group are considered to be the same. More precisely, everyone in the same group is assumed to have the same probability of moving to every other group. If we put inactive customers into 1 group then we can't account for them having different histories because they are assumed to all be the same. The only way around this is to have several inactive groups, like $I_1$ = the customers who have had no activity for a month, $I_3$ = the customers who have had no activity for 3 months, $I_5$ = the customers who have had no activity for 5 months, up until a number of months where you assume the person never returns to buy something. This is what I was trying to do by having groups F and G; like I said, if you add more groups the calculation will be more accurate but it will take longer.
When you include gold and platinum as well you will have to have 10 groups (from new users with 0 months up to users with 9 consecutive months [or 3x 3 consecutive months]) and you will need at least 1 group for users who have never bought anything and users who did buy something but then went inactive. To avoid having a huge number of groups I suggest that you only pick a small number of groups for different types of F and I.
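A minimal sketch of the "several inactive groups" idea as a Markov chain, for one customer who has just placed a first order. Everything specific here is an assumption made for illustration: the state names `I1`, `I2` and `GONE`, the return probabilities `r1` and `r2`, the numeric values, and the choice to treat Silver as permanent. Splitting the inactive customers into separate states is what lets the chance of returning fall with time while every state still has one fixed set of probabilities.

```python
STATES = ["A", "B", "S", "I1", "I2", "GONE"]

def build_matrix(p2, p3, r1, r2):
    """Rows are the current state, columns the next state, in STATES order."""
    return [
        [0.0, p2, 0.0, 1 - p2, 0.0, 0.0],  # A: order again -> B, else 1 month inactive
        [0.0, 0.0, p3, 1 - p3, 0.0, 0.0],  # B: third consecutive order -> S
        [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],    # S: assumed to keep Silver status
        [r1, 0.0, 0.0, 0.0, 1 - r1, 0.0],  # I1: inactive for 1 month
        [r2, 0.0, 0.0, 0.0, 0.0, 1 - r2],  # I2: inactive for 2 months
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],    # GONE: assumed never to return
    ]

def step(dist, matrix):
    """Advance a probability distribution over STATES by one month."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(months, start, matrix):
    dist = [1.0 if s == start else 0.0 for s in STATES]
    for _ in range(months):
        dist = step(dist, matrix)
    return dict(zip(STATES, dist))

# Hypothetical probabilities, only to show the mechanics.
matrix = build_matrix(p2=0.3, p3=0.4, r1=0.2, r2=0.05)
print(distribution_after(12, "A", matrix))
```

Extending this to Gold and Platinum means adding the extra in-between states described above; the `step` function stays the same.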

7. ## Re: Strange probability problem

Hi Shakarri - thanks again for your support and advice. Yes, you are right, Markov chains will model this accurately.

But i have some questions on what you have said

More precisely, everyone in the same group is assumed to have the same probability of moving to every other group. If we put inactive customers into 1 group then we can't account for them having different histories because they are assumed to all be the same.
What does the above statement mean? Does it mean that for any group you have used a fixed probability across all months? (We have incremented the probability every month by a cumulative factor of 1%, which gives an increasing probability each month, say 25%, 25.5%, 26.6% and so on.) Help us understand why using an incremental probability for a particular group, which would reflect reality more closely, prevents us from using a Markov chain model.

Secondly, we wanted to make some sense of the total or final figures of customers - so this is how the numbers are coming..
New Customers

N - Total no of New Customers at the end of the period 7052
A - Total No of customers who Reordered with us at the end of the period 8012
B - Total No of customers who Reordered Consecutively for two months with us at the end of the period 1438
F - Total no of "frozen" accounts, accounts which have been created but never bought something for the entire period 4621
I - Total no of inactive customers (from Both A and B) who bought something in beginning but didn’t buy anything later 4151

Out of 7052 customers if i remove 4621 customers who are frozen the remaining set of ACTIVE customers are 2431 ? is this correct

If the above is correct, then 2431 customers reordered ("A") 8012 times. Does this mean that each customer reordered 2.39 times, i.e. for 2.39 months (at least 2 times, rounding off)?

If the above understanding is correct, then for group "B" 1438 customers out of 2431 customers ordered two months consecutively? But if I take the average I get a strange number, 0.59. How can this be true, and what sense should we make of this number? Also, is our understanding correct that if the PERIOD is, say, 12 months, then the "12 month" figure for "B" reflects the 10th and 11th months?

Thanks again for your guidance...

8. ## Re: Strange probability problem

Originally Posted by girishbwl
what does the above statement - does it mean that in any group over time you have used a fixed probability across all months (because we have incremented probability every month by a factor or 1% cumulative which gives us increasing probability every month say 25%, 25.5%,26.6%.....and so on for each month). Help us understand why putting a incremental probability for a particular group which will closely reflect reality but will prevent from modeling a Markov chain model.
When using Markov chains there is no way to increase the probability for different people in the same group; everyone in the group has the same constant probability.

Secondly, we wanted to make some sense of the total or final figures of customers - so this is how the numbers are coming..
New Customers

N - Total no of New Customers at the end of the period 7052
A - Total No of customers who Reordered with us at the end of the period 8012
B - Total No of customers who Reordered Consecutively for two months with us at the end of the period 1438
F - Total no of "frozen" accounts, accounts which have been created but never bought something for the entire period 4621
I - Total no of inactive customers (from Both A and B) who bought something in beginning but didn’t buy anything later 4151

Out of 7052 customers if i remove 4621 customers who are frozen the remaining set of ACTIVE customers are 2431 ? is this correct

If the above is correct, then 2431 customers reordered ("A") 8012 times. Does this mean that each customer reordered 2.39 times, i.e. for 2.39 months (at least 2 times, rounding off)?
A is not the number of customers who reordered by the end of the period. A is the number of customers who have placed one or more orders in the last month, these could be new customers or inactive customers who became active again. You used the word Reordered but I think you meant ordered. Customers who Reordered (ordered 2 or more months consecutively) would be in B or S.

9. ## Re: Strange probability problem

Hi Shakarri - thanks for the prompt response -

Ok now i understand now about the Markov Chain. Fair enough i will keep that model aside.

I want to draw your attention to your statement

A is not the number of customers who reordered by the end of the period. A is the number of customers who have placed one or more orders in the last month, these could be new customers or inactive customers who became active again. You used the word Reordered but I think you meant ordered. Customers who Reordered (ordered 2 or more months consecutively) would be in B or S.
I agree that A is the no of customers who "ORDERED" and B is the no of customers who Reordered....my mistake. But you said specifically that A is the no of customers who have placed one or more orders in the last month. But if you see in the formula you have suggested you have added the previous month quantity into this month (ofcourse with some probability) - which essentially means that the previous month is cumulatively added with some % into this month...which means the previous month or last month contains some percentage of people from all succeeding months.....so in essence i have carried some percentage of all PAST into FUTURE....Is my understanding correct here

Secondly rewriting the total and final figures correctly with your suggestion on what A and B and S is ...can you help us understand the following questions

Secondly, we wanted to make some sense of the total or final figures of customers - so this is how the numbers are coming..
New Customers

N - Total no of New Customers at the end of the period 7052
A - Total No of customers who placed one or more orders in the last month calculated here as a total till the end of the period 8012
B - Total No of customers who Reordered Consecutively for two months with us at the end of the period 1438
F - Total no of "frozen" accounts, accounts which have been created but never bought something for the entire period 4621
I - Total no of inactive customers (from Both A and B) who bought something in beginning but didn’t buy anything later 4151

1. Out of 7052 customers if i remove 4621 customers who are FROZEN the remaining set of ACTIVE customers are 2431 ? is this correct

2. If the above is correct, then 2431 customers reordered ("A") 8012 times. Does this mean that each customer reordered 2.39 times, i.e. for 2.39 months (at least 2 times, rounding off)?

Thanks again.

10. ## Re: Strange probability problem

Ok now i understand now about the Markov Chain. Fair enough i will keep that model aside.
Try not to be too picky about how exact the model is, otherwise you will probably find yourself trying to account for too many variables and you won't be able to get any answer.

Originally Posted by girishbwl
I agree that A is the no of customers who "ORDERED" and B is the no of customers who Reordered....my mistake. But you said specifically that A is the no of customers who have placed one or more orders in the last month. But if you see in the formula you have suggested you have added the previous month quantity into this month (ofcourse with some probability) - which essentially means that the previous month is cumulatively added with some % into this month...which means the previous month or last month contains some percentage of people from all succeeding months.....so in essence i have carried some percentage of all PAST into FUTURE....Is my understanding correct here
Sorry, in my formula earlier I did not include the customers going from A to I so the formula should be
$A_0+N p_1 - A_0 p_2 + I_0 p_4 - A_0 p_6$

Where $p_6$ is the probability of going from group A to being inactive. Since people in group A have to either go to group B or go to group I, we can see that $p_6 = 1 - p_2$. Then the formula becomes

$A_0+N p_1 - A_0 p_2 + I_0 p_4 - A_0 (1-p_2)$

$=A_0+N p_1 - A_0 p_2 + I_0 p_4 - A_0 + A_0 p_2$

$=N p_1 + I_0 p_4$
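The same bookkeeping can be written for every group at once. The sketch below is one possible reading of the transition list given earlier in the thread, not a confirmed model: it assumes frozen accounts stay frozen (the F to A question was left open), that silver customers stay silver, and that inactive customers who do not return stay inactive. The function name `next_month` and the numbers in the example are made up.

```python
def next_month(state, n_new, p1, p2, p3, p4):
    """Expected group sizes one month later, given current sizes and n_new new accounts."""
    F, I, A, B, S = (state[k] for k in ("F", "I", "A", "B", "S"))
    return {
        "A": n_new * p1 + I * p4,                      # new buyers plus returning inactives
        "B": A * p2,                                   # ordered a second month in a row
        "S": S + B * p3,                               # third consecutive month earns Silver
        "I": I * (1 - p4) + A * (1 - p2) + B * (1 - p3),  # streak broken or still inactive
        "F": F + n_new * (1 - p1),                     # signed up but did not buy
    }

# Hypothetical starting numbers and probabilities.
state = {"F": 0.0, "I": 100.0, "A": 50.0, "B": 20.0, "S": 5.0}
after = next_month(state, n_new=40, p1=0.5, p2=0.4, p3=0.5, p4=0.1)
print(after)
# Nobody is lost or double counted: the total grows by exactly the new accounts.
print(sum(after.values()), sum(state.values()) + 40)
```

Checking that the totals agree each month is a quick way to catch the double counting the original question was worried about.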

Secondly rewriting the total and final figures correctly with your suggestion on what A and B and S is ...can you help us understand the following questions

Secondly, we wanted to make some sense of the total or final figures of customers - so this is how the numbers are coming..
New Customers

N - Total no of New Customers at the end of the period 7052
A - Total No of customers who placed one or more orders in the last month calculated here as a total till the end of the period 8012
B - Total No of customers who Reordered Consecutively for two months with us at the end of the period 1438
F - Total no of "frozen" accounts, accounts which have been created but never bought something for the entire period 4621
I - Total no of inactive customers (from Both A and B) who bought something in beginning but didn’t buy anything later 4151

1. Out of 7052 customers if i remove 4621 customers who are FROZEN the remaining set of ACTIVE customers are 2431 ? is this correct

2. If the above is correct, then 2431 customers reordered ("A") 8012 times. Does this mean that each customer reordered 2.39 times, i.e. for 2.39 months (at least 2 times, rounding off)?

Thanks again.
Part 1 is correct, F is 4621 and the number of new customers who go to A is 2431. This indicates that p1= 2431/7052= 0.345. You might want to look at other months to get a bigger sample size for p1 and taking samples from different months will help account for fluctuations in the year (e.g. Sales might increase around Christmas)

In part 2, if you are just interested in how many people will be Silver, Gold and Platinum members after a year then it won't help to know how many times new customers buy something in their first month. But if you want to eventually model how many orders will be made in a year then keep the average number of orders (2.39) for future reference.

Edit: Is 8012 the number of orders just made by new customers? If you want to get useful averages then I suggest you get the number of orders made by each group separately.
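For reference, the ratios discussed in this post can be recomputed directly from the totals quoted in the thread. This is plain arithmetic on those figures with no extra data; the variable names are just labels chosen for the example.

```python
new_customers = 7052   # N
frozen = 4621          # F
orders_in_a = 8012     # A, as quoted
two_in_a_row = 1438    # B, as quoted

active = new_customers - frozen
p1 = active / new_customers

print(active)                            # 2431
print(round(p1, 3))                      # 0.345
print(round(orders_in_a / active, 2))    # 3.3
print(round(two_in_a_row / active, 2))   # 0.59
```

Note that 8012 / 2431 comes out at about 3.30 rather than the 2.39 mentioned above, so it may be worth rechecking which totals went into that average.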

11. ## Re: Strange probability problem

Hi - Just working on the full sheet with many groups as suggested. Your suggestions are completely right; we are seeing how the complexity builds up. Will revert shortly with the summary. Thanks.