# Identify group based on sales

• Oct 15th 2012, 06:46 AM
henrygd
Identify group based on sales
Hi all,

I have a pool of data which contains a list of stores and their sales quantity for 4 sizes, namely S, M, L, XL.

ie:
Store Size Quantity
2 S 5483
2 M 6444
2 L 5883
2 XL 3165
7 S 4931
7 M 5963
7 L 4915
7 XL 2469
12 S 115
12 M 149
12 L 94
12 XL 65

For each store, I need to label them to 1 of the 4 size group. Which means, store 2 belong to size group S because they are selling more of the S size, store 7 belongs to size group XL because they are selling more of size XL etc for all stores.

Please note that each store did not receive the same quantity of each size, and hence I didn't not analyze based on their sales quantity but rather their sales % contribution of each size instead... and the % variance of each size compare to the sales % contribution of all stores combined...
(Please find attached excel file for better understanding...)

And I got stuck here...

Can some please give me advice if I am going to the right direction? If yes, how should I proceed further.

Henry.
• Oct 15th 2012, 08:54 PM
chiro
Re: Identify group based on sales
Hey henrygd.

What are you ultimately trying to do? Are you trying to make comparisons between relative sales quantities with regards to existing stock in each store for each size of clothing? What is the goal here?
• Oct 16th 2012, 07:17 AM
henrygd
Re: Identify group based on sales

What I am trying to find out is;
Based on the store sales qty, identity which store should be identified as a "SMALL" store, which store should be identify as "LARGE" store.
A "SMALL" store means the store is selling more of the small size compare to the average. Same logic goes to "LARGE" store.

I have a solution in mind but I am not sure if it is the correct way to solve this.

I will take the % contribution of Size XS of all stores and calculate the average and the average deviation. Then store with % contribution for size XS higher than the average + 1*sd will be a "SMALL" store, and store with % contribution for size XS higher than the average + 2*sd will be a "XS" store...

Then do the same for size XL...
• Oct 16th 2012, 06:28 PM
chiro
Re: Identify group based on sales
One thing to point out is that your data may not be normally distributed.

However you can still do the kinds of things by specifying quantile information for what is a "XS" store and other categories.

What you do is you get your sample and get what is called an empirical distribution. You then say select the regions for "XS" and so on. For example "XS" might be top 10% (so this is the 0.9-1.0 range of cumulative probability), the next might be 20% (so 0.7-0.9 cumulative probability), next might be top 30% (so 0.4-0.7 cumulative probability) and so on.

In a normal distribution you do the same thing as above except because the Normal is symmetric, it means that you get simplifications. So when you use a normal distribution +- 1 standard deviation is roughly 66%, +- 2 is roughly 95%, +- 3 is roughly 99.7% and so on.

But this assume normality and you can't always assume this, but what you can do is use exactly the same idea but for the distribution represented by your data and not by an assumption.

Also Remember that +- includes not only data that are 2 s.d.'s in the positive direction, but also 2 s.d's in the negative: this is why I proposed breaking it in regions from the top, but if you had a reason for doing the above then just ignore that comment.

Excel should have a way to calculate the cumulative probability of an observation given a sample distribution (what I call empirical above).
• Oct 23rd 2012, 07:30 AM
henrygd
Re: Identify group based on sales