# Math Help - Monte Carlo Simulation

1. ## Monte Carlo Simulation

Hi

I am currently working for a financial analyst, using a modelling program that uses Monte Carlo simulations (the Latin Hypercube Sampling method) to predict the annual cost of a set of potential events. I don't have much formal education or training in advanced probability/statistics, so I need some help.

Background –

The model is set up like so:
• Enter the data set of potential events
• Specify a set of ranges of potential cost per event (for example, level 1 might be between $0 and $100k per event, level 2 between $100k and $1m, level 3 between $1m and $5m, etc.) – this is the impact I.
• Specify a set of ranges of potential number of events per unit time, generally one year (for example, level 1 between 0 and 0.5 events per year, level 2 between 0.5 and 1 events per year, level 3 between 1 and 4 events per year, etc.) – this is the frequency F.
• Select a particular range in both categories for each event. For example, the first event could have Impact at level 2 and Frequency at level 3. [This would give a broad estimate of 2.5 occurrences per year costing $550,000 each time, for a total expected cost of $1.375m per year, assuming the likelihood of taking a particular value of I or F within a range is normally distributed around the midpoint of that range.]
• The cost per event and the number of events per year are each assigned a distribution curve. In this case, the Impact uses a Gumbel distribution while the Frequency uses a Poisson distribution. These curves are used to randomly generate values within the specified ranges in the MC simulation.
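A minimal sketch of how one event's annual cost could be generated under this set-up. The post does not say how the program fits its curves to the ranges, so the following are assumptions: the rate is drawn uniformly within the chosen frequency range, the number of occurrences is Poisson with that rate, and each occurrence's cost is a Gumbel variate with location at the impact-range midpoint and scale equal to one sixth of the range width, truncated to the impact range. All function names are hypothetical, not the program's.

```python
import math
import random


def truncated_gumbel(rng, lo, hi):
    """Gumbel (maximum) variate restricted to [lo, hi] via inverse-CDF sampling.

    Hypothetical parameterisation: location at the range midpoint,
    scale equal to one sixth of the range width.
    """
    mu = lo + 0.5 * (hi - lo)
    beta = (hi - lo) / 6.0

    def cdf(x):
        return math.exp(-math.exp(-(x - mu) / beta))

    a, b = cdf(lo), cdf(hi)
    u = a + (b - a) * rng.random()             # uniform on [F(lo), F(hi))
    return mu - beta * math.log(-math.log(u))  # inverse Gumbel CDF


def poisson(rng, lam):
    """Poisson count by Knuth's multiplication method (fine for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_event_year(rng, impact_range, freq_range):
    """Total cost of one event type over one simulated year."""
    rate = rng.uniform(*freq_range)  # assumption: rate uniform within its level
    count = poisson(rng, rate)       # number of occurrences this year
    return sum(truncated_gumbel(rng, *impact_range) for _ in range(count))
```

For the example event (Impact level 2, Frequency level 3), `simulate_event_year(random.Random(1), (100_000, 1_000_000), (1, 4))` returns one simulated year's cost; averaging many such draws estimates that event's expected annual cost.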

Running the simulation –

As well as a sample size (50,000 simulations, which is enough to guarantee convergence with the LHS method), I specify a confidence level for each simulation. The best way I can describe this confidence level is as the probability that events of this severity won't be seen in a given year. For example, running simulations at a CL of 0.95 (95%) gives the cost of these events in the worst year in twenty; a CL of 0.9993 (99.93%) gives the cost in the seventh-worst year in ten thousand, and so on.
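Read this way, the confidence level is a quantile of the simulated annual costs. A small sketch, assuming a simple rounding rule for how many simulated years may exceed the returned value (the program may interpolate differently): a CL of 0.9993 over 10,000 simulated years returns the cost exceeded in 7 of them, and a CL of 0.95 over 20 years returns the cost exceeded in 1 of them. The function name `value_at_cl` is hypothetical.

```python
def value_at_cl(annual_costs, cl):
    """Cost at confidence level `cl` among simulated annual costs.

    Assumed rule: round((1 - cl) * n) simulated years are allowed to be worse,
    and the next-worst year is returned. The real program may interpolate.
    """
    ordered = sorted(annual_costs)
    n = len(ordered)
    worse = round((1.0 - cl) * n)  # years exceeding the returned value
    return ordered[min(max(n - worse - 1, 0), n - 1)]
```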

My question is: what confidence level should I use to simulate the ‘average year’? I believe it is somewhere in the region of 55–65%, and some basic practical testing suggests a figure of around 59.7% (although this cannot be relied upon because the sample size is extremely limited). Can anyone give me a more accurate number and, if possible, some form of proof?

There is no correlation between events; they are independent.
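Because the events are independent, a simulated year's total is just the sum of the per-event costs, and the ‘average year’ confidence level can be read straight off the simulation: it is the fraction of simulated years whose total is at or below the mean total. The sketch below uses hypothetical exponential occurrence costs and Poisson occurrence counts purely as a stand-in for the real model (which uses Gumbel costs); with the program's own annual totals, only `cl_of_average_year` would be needed.

```python
import random
import statistics


def cl_of_average_year(annual_totals):
    """Empirical CDF at the mean: share of simulated years costing no more than average."""
    mean = statistics.fmean(annual_totals)
    return sum(t <= mean for t in annual_totals) / len(annual_totals)


def simulate_portfolio(rng, events, years):
    """Hypothetical stand-in for the modelling program.

    events: list of (rate_per_year, mean_cost_per_occurrence) for independent
    events, with rate_per_year > 0. Occurrences follow a Poisson process
    (exponential gaps within one year) and each occurrence has an exponential
    cost; the real model uses Gumbel costs instead.
    """
    totals = []
    for _ in range(years):
        total = 0.0
        for rate, mean_cost in events:
            t = rng.expovariate(rate)  # time of first occurrence
            while t < 1.0:
                total += rng.expovariate(1.0 / mean_cost)
                t += rng.expovariate(rate)
        totals.append(total)
    return totals


rng = random.Random(42)
totals = simulate_portfolio(rng, [(2.5, 550_000.0), (0.25, 3_000_000.0)], 20_000)
print(f"average-year confidence level ~ {cl_of_average_year(totals):.3f}")
```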

Thanks.

[Note: the basic testing works as follows:
- run a simulation at a confidence level of X = 50% for a selection of approximately 350 events (each event returns an annual cost x(n), for n = 1, …, 350)
- find an expected value y(n) for each event by multiplying the midpoints of the applicable I and F ranges, as described above: $y(n) = \left[I_{\min} + \tfrac{1}{2}(I_{\max} - I_{\min})\right] \times \left[F_{\min} + \tfrac{1}{2}(F_{\max} - F_{\min})\right]$
- compute the ratio x(n)/y(n). Using X = 50% gives x(n) < y(n); I'm not sure this is necessarily true, but it holds for all 350 of my samples. The average of the ratios x(n)/y(n) is approximately 0.837
- divide the original confidence level X by the average of these ratios, in this case 0.837. This gives the predicted confidence level at which the simulation should match the expected results; my tests give a best estimate of 0.5 / 0.837 ≈ 59.7%.]
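The heuristic in the note can be written out as follows. This is a sketch with hypothetical function names; it reproduces the midpoint estimate from the model description (2.5 × $550,000 = $1.375m) and the 0.5 / 0.837 ≈ 59.7% step.

```python
def midpoint(lo, hi):
    """Midpoint of a range: lo + (hi - lo) / 2."""
    return lo + 0.5 * (hi - lo)


def midpoint_expected_cost(i_min, i_max, f_min, f_max):
    """y(n): impact midpoint times frequency midpoint."""
    return midpoint(i_min, i_max) * midpoint(f_min, f_max)


def heuristic_cl(x_cl, simulated, expected):
    """Divide the trial confidence level X by the average ratio x(n) / y(n)."""
    ratios = [x / y for x, y in zip(simulated, expected)]
    return x_cl / (sum(ratios) / len(ratios))


# Example event from the model description: Impact level 2, Frequency level 3.
print(midpoint_expected_cost(100_000, 1_000_000, 1, 4))  # 1375000.0
# With an average ratio of 0.837, as in the note:
print(round(heuristic_cl(0.5, [0.837], [1.0]), 3))  # 0.597
```

The final division treats the simulated cost as roughly proportional to the confidence level, so it gives a rough estimate rather than an exact answer.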

2. Is there a particular reason why you want to use Latin hypercube sampling? I have tried a zillion ways of simulating, and I would recommend simple Monte Carlo with antithetic variates to speed up convergence. If you need more help, feel free to PM me.
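To make the comparison concrete, here is a sketch of three ways of generating the uniform numbers that drive the sampling: plain Monte Carlo, antithetic variates (each u paired with 1 - u), and a one-dimensional Latin hypercube (one point per equal-width stratum). The Gumbel cost parameters `mu` and `beta` are assumed for illustration only, and the function names are hypothetical.

```python
import math
import random
import statistics


def gumbel_cost(u, mu=550_000.0, beta=150_000.0):
    """Inverse Gumbel CDF; mu and beta are illustrative assumptions."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep log(-log(u)) finite
    return mu - beta * math.log(-math.log(u))


def plain_uniforms(rng, n):
    return [rng.random() for _ in range(n)]


def antithetic_uniforms(rng, n):
    """n // 2 draws plus their mirror images 1 - u (use an even n)."""
    half = [rng.random() for _ in range(n // 2)]
    return half + [1.0 - u for u in half]


def lhs_uniforms(rng, n):
    """1-D Latin hypercube: one draw inside each of n equal strata, shuffled."""
    points = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(points)
    return points


def spread_of_estimates(sampler, seed=0, n=1000, repeats=200):
    """Standard deviation of the estimated mean cost across repeated runs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        us = sampler(rng, n)
        estimates.append(sum(gumbel_cost(u) for u in us) / len(us))
    return statistics.stdev(estimates)


for name, sampler in [("plain", plain_uniforms),
                      ("antithetic", antithetic_uniforms),
                      ("latin hypercube", lhs_uniforms)]:
    print(f"{name:>16}: {spread_of_estimates(sampler):,.0f}")
```

The smaller the printed spread, the faster that scheme converges for this cost function. Antithetic pairs help here because, for a monotone function g, g(u) and g(1 - u) are negatively correlated, so their errors partly cancel.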

Originally Posted by CC189

3. Originally Posted by zigzag20
Is there a particular reason as to why you want to use latin hypercube? I have run 1 zillion ways of simulating and I would recommend using simple monte carlo with antithetic variates to expedite convergence. If you need more help, feel free to pm me.

The only reason I am using the LH method is that it's what the original programmer chose! I am not a programmer, so I couldn't rewrite it even if my company allowed me to.

I'll give a little more detail:

I have been given a set of data and asked to find out "what these events will cost in an average year" but I don't know what confidence level to run the simulations at.

A graph of the confidence level (x-axis) against the probability that the annual results will match that confidence level (y-axis) should give a skewed normal distribution. The 50% confidence level gives the median year, but this doesn't represent the average year: costs have a lower bound of 0 but only a theoretical upper bound that could be hundreds of times higher than the average, so the long upper tail pulls the mean cost above the median.

The mean (which gives the target confidence level) is found at the confidence level C such that, on the graph I described, the line x = C divides the area under the curve into two equal parts. If the curve is denoted f(x), this can be written as

$$\int_{0}^{C} f(x)\,dx = \int_{C}^{100} f(x)\,dx$$
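One way to check the skewness argument is a case with a closed form. Assume, purely for illustration, that the annual cost were lognormal, $X = e^{\mu + \sigma Z}$ with $Z$ standard normal (the model itself uses Poisson counts and Gumbel costs, not a lognormal). Then $E[X] = e^{\mu + \sigma^2/2}$, so the confidence level of the average year is $P(X \le E[X]) = P(Z \le \sigma/2) = \Phi(\sigma/2)$, which is 50% only when $\sigma = 0$ and grows as the cost becomes more skewed:

```python
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cl_of_mean(sigma):
    """Confidence level at which a lognormal(mu, sigma) cost equals its mean."""
    return normal_cdf(sigma / 2.0)


for sigma in (0.0, 0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: CL of average year = {lognormal_cl_of_mean(sigma):.3f}")
```

Under this assumption the ‘average-year’ confidence level is not a universal constant: it depends on how skewed the portfolio's annual cost is, and so can differ from one data set to another.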

4. When I ran simulations, I avoided Latin hypercube sampling most of the time. Let me rethink your problem and I'll get back to you. It is not that hard.