# Thread: gambling problem/ standard deviation

1. Originally Posted by CrazyAsian
Can someone explain why you multiply the estimate of SD by the square root of the sample size?

We have to start at the beginning, and along the way we'll see why there are two ways to calculate the SD.

The total of the actual payoffs is a sum of independent random variables $\displaystyle \sum_i R_i.$ The standard deviation $\displaystyle S$ of this sum is the square root of the variance $\displaystyle V$ of the sum. The variance of a sum of independent random variables is the sum of the variances $\displaystyle V_i$ of each $\displaystyle R_i ,$ so

$\displaystyle V = \sum_i V_i .$

There are two ways to calculate the $\displaystyle V_i .$ The most accurate is to use the known pot sizes $\displaystyle P_i$ and the win probabilities $\displaystyle w_i$ to calculate the actual variance. Since $\displaystyle w_i P_i$ is the mean of the payoff $\displaystyle R_i ,$ the actual variance is

$\displaystyle V_i = w_i (P_i - w_i P_i)^2 + (1-w_i)(0 - w_i P_i)^2 = w_i (1-w_i) P_i^2 .$

Then the standard deviation of $\displaystyle \sum_i R_i$ is

$\displaystyle S = \sqrt{V} = \sqrt{\sum_i V_i} .$

The 95% confidence interval around the expected total payoff $\displaystyle E = \sum_i w_i P_i$ is

$\displaystyle [ E- 1.96 S, E+ 1.96 S ].$
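As a rough illustration of this first method, here is a minimal Python sketch. The pot sizes and win probabilities are made-up numbers, and the function name `exact_interval` is just a label chosen for this example; only the formulas for $\displaystyle V_i ,$ $\displaystyle S$ and the interval come from the post.

```python
import math


def exact_interval(pots, win_probs, z=1.96):
    """Return (E, S, (low, high)) using the known pots P_i and win probabilities w_i."""
    if len(pots) != len(win_probs):
        raise ValueError("pots and win_probs must have the same length")

    # Expected total payoff: E = sum_i w_i * P_i
    expected = sum(w * p for w, p in zip(win_probs, pots))

    # Actual variance of each payoff:
    # V_i = w_i (P_i - w_i P_i)^2 + (1 - w_i)(0 - w_i P_i)^2
    variance = sum(
        w * (p - w * p) ** 2 + (1 - w) * (0 - w * p) ** 2
        for w, p in zip(win_probs, pots)
    )

    # S = sqrt(sum_i V_i); no sqrt(n) factor appears here
    sd = math.sqrt(variance)
    return expected, sd, (expected - z * sd, expected + z * sd)


# Hypothetical data, for illustration only
pots = [100.0, 200.0, 50.0, 400.0]
win_probs = [0.5, 0.25, 0.8, 0.5]

E, S, (low, high) = exact_interval(pots, win_probs)
print(f"E = {E:.2f}, S = {S:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

Each hand contributes its own $\displaystyle V_i$ to the sum, so large pots dominate $\displaystyle S$ no matter how many small ones are played.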

There is no multiplication by $\displaystyle \sqrt{n} ,$ the square root of the sample size $\displaystyle n .$ Where did that go? Well, there is a second way to get the variance $\displaystyle V ,$ and that is to estimate it from the observed payoffs, ignoring the $\displaystyle w_i .$ We assume the random variables $\displaystyle R_i$ all have the same variance $\displaystyle v'$ and estimate that as

$\displaystyle v' = \sum_i (R_i - M)^2 / (n-1)$

where $\displaystyle M = \sum_i R_i /n$ is the mean actual payoff.

Then the variance $\displaystyle V$ is estimated as

$\displaystyle V' = \sum_i v' = n v' .$

$\displaystyle V'$ is only an estimate of the actual variance $\displaystyle V .$ It is not exact because it ignores the known $\displaystyle w_i$ and treats every $\displaystyle R_i$ as if it had the same variance.

The estimated standard deviation of $\displaystyle \sum R_i$ is

$\displaystyle S' = \sqrt{V'} = \sqrt{n v'} = \sqrt{v'}\sqrt{n} = s' \sqrt{n},$

where

$\displaystyle s' = \sqrt{v'} = \sqrt{\sum_i (R_i - M)^2 / (n-1)}$

is the estimated common standard deviation of the $\displaystyle R_i .$

Then the 95% confidence interval is

$\displaystyle [ E- 1.96 S', E+ 1.96 S' ] = [ E- 1.96 s' \sqrt{n}, E+ 1.96 s' \sqrt{n} ],$

which is the familiar one.
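The second method can be sketched the same way. This is only an illustration: the payoffs and the value passed as `expected_total` are invented, and `estimated_interval` is a hypothetical name. It computes $\displaystyle s'$ with the $\displaystyle n-1$ divisor and then scales by $\displaystyle \sqrt{n}$ to get $\displaystyle S' .$

```python
import math


def estimated_interval(payoffs, expected_total, z=1.96):
    """Return (s', S', (low, high)) estimated from the observed payoffs R_i."""
    n = len(payoffs)
    if n < 2:
        raise ValueError("need at least two payoffs to estimate a variance")

    # M = mean actual payoff
    mean = sum(payoffs) / n

    # v' = sum_i (R_i - M)^2 / (n - 1), the estimated common variance
    v_est = sum((r - mean) ** 2 for r in payoffs) / (n - 1)

    # s' = sqrt(v') is the per-payoff SD; S' = s' * sqrt(n) is the SD of the total
    s_est = math.sqrt(v_est)
    total_sd = s_est * math.sqrt(n)

    low = expected_total - z * total_sd
    high = expected_total + z * total_sd
    return s_est, total_sd, (low, high)


# Hypothetical observed payoffs and expected total, for illustration only
payoffs = [100.0, 0.0, 50.0, 0.0]
s_est, S_est, (low_est, high_est) = estimated_interval(payoffs, expected_total=340.0)
print(f"s' = {s_est:.2f}, S' = {S_est:.2f}, "
      f"95% interval = [{low_est:.2f}, {high_est:.2f}]")
```

The factor $\displaystyle \sqrt{n}$ shows up only because one common $\displaystyle v'$ is added $\displaystyle n$ times; with the exact $\displaystyle V_i$ the sum is taken term by term instead.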
