Is there a particular reason you want to use Latin hypercube sampling? I have run simulations a zillion different ways, and I would recommend simple Monte Carlo with antithetic variates to speed up convergence. If you need more help, feel free to PM me.
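For anyone unfamiliar with antithetic variates, here is a minimal sketch of the idea: each uniform draw u is paired with 1 - u, and the two function values are averaged, which tends to reduce variance when the function is monotone. The function name `antithetic_mean` and the test integrand (the mean of exp(U) for U uniform on [0, 1], which equals e - 1) are chosen purely for illustration and have nothing to do with the cost model itself.

```python
import math
import random

def antithetic_mean(f, n_pairs, seed=0):
    """Estimate E[f(U)], U ~ Uniform(0, 1), using antithetic pairs (u, 1 - u)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        # Average the draw with its mirror image; for monotone f the two
        # values are negatively correlated, which lowers the variance.
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs

# Illustrative check only: the exact value of E[exp(U)] is e - 1.
estimate = antithetic_mean(math.exp, 10_000)
print(estimate, math.e - 1)
```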

Hi

I am currently working for a financial analyst using a modelling program which uses Monte Carlo simulations (Latin Hypercube Sampling method) to predict the annual cost of a set of potential events. I don’t have much formal education or training in advanced probability/statistics so I need some help.

Background –

The model is set up like so:

- Enter the data set of potential events
- Specify a set of ranges of potential cost per event (for example, level 1 might be between $0 and $100k per event, level 2 between $100k and $1m, level 3 between $1m and $5m, etc.) – this is the impact I.
- Specify a set of ranges of potential number of events per unit time, generally one year (for example, level 1 between 0 and 0.5 events per year, level 2 between 0.5 and 1 events per year, level 3 between 1 and 4 e.p.y., etc.) – this is the frequency F.
- Select a particular range for both these categories for each event, for example the first event could have Impact at level 2 and Frequency at level 3 [This would give you a broad estimate of 2.5 occurrences per year costing $550,000 each time, leaving a total expected cost of $1.375m per year, assuming the likelihood of taking a particular I or F within a range is normally distributed around the midpoint of that range]
- The cost per event and events per year are each assigned a distribution curve. In this case, the Impact uses a Gumbel distribution while the Frequency uses a Poisson distribution. This is used to randomly generate values within the specified range in the MC simulation.
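The bracketed midpoint estimate in the list above can be reproduced with a few lines of arithmetic. This is only a sketch of that rough calculation, not of what the modelling program does internally; the function names and the tuple layout for the ranges are made up for illustration.

```python
def midpoint(lo, hi):
    """Midpoint of a range, written as min + half the width."""
    return lo + 0.5 * (hi - lo)

def expected_annual_cost(impact_range, frequency_range):
    """Rough expected cost per year: midpoint impact times midpoint frequency."""
    return midpoint(*impact_range) * midpoint(*frequency_range)

# Ranges taken from the example: Impact level 2 and Frequency level 3.
impact_level_2 = (100_000, 1_000_000)   # dollars per event
frequency_level_3 = (1.0, 4.0)          # events per year

estimate = expected_annual_cost(impact_level_2, frequency_level_3)
print(estimate)  # 1375000.0, i.e. 2.5 events/year x $550,000
```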

Running the simulation –

As well as a sample size (50,000 simulations, as this is enough to guarantee convergence using the LHS method), I specify a confidence level for each simulation. The best way I can describe this confidence level is as the probability that events of this severity won’t be seen in a given year. For example, running simulations at a CL of 0.95, or 95%, will give the costs of these events for the worst year in twenty; using a CL of 99.93%, or 0.9993, will give the costs of these events for the seventh worst year in ten thousand, and so on.

My question is, what confidence level do I use to simulate the ‘average year’? I believe it to be somewhere in the region of 55-65%, and some basic practical testing does tend to indicate a figure of around 59.7% (although this cannot be relied upon due to the extremely limited sample size). Can anyone give me a more accurate number, and if possible, some form of proof?
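To make the question concrete, here is a rough sketch of how I understand the confidence level: as a quantile level of the simulated annual cost for one event. It uses plain Monte Carlo rather than LHS, a Poisson count of events per year, and an untruncated Gumbel cost per event. The parameter values (rate, loc, scale) and all function names are hypothetical, and I don't know how the actual program keeps values inside the selected range, so treat this as an illustration only.

```python
import bisect
import math
import random

def sample_poisson(rng, lam):
    # Knuth's multiplication method; adequate for small rates.
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_gumbel(rng, loc, scale):
    # Inverse-CDF draw from a Gumbel (maximum) distribution.
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))

def simulate_annual_costs(n_years, rate, loc, scale, seed=0):
    # One simulated year = Poisson number of events, each with a Gumbel cost.
    rng = random.Random(seed)
    costs = []
    for _ in range(n_years):
        n_events = sample_poisson(rng, rate)
        costs.append(sum(sample_gumbel(rng, loc, scale) for _ in range(n_events)))
    return sorted(costs)

def quantile(sorted_costs, level):
    # Annual cost that a fraction `level` of simulated years do not exceed.
    index = min(len(sorted_costs) - 1, int(level * len(sorted_costs)))
    return sorted_costs[index]

def level_of(sorted_costs, value):
    # Inverse lookup: fraction of simulated years with cost <= value.
    return bisect.bisect_right(sorted_costs, value) / len(sorted_costs)

# Hypothetical parameters, chosen only so the sketch runs.
costs = simulate_annual_costs(20_000, rate=2.5, loc=450_000.0, scale=150_000.0)
mean_cost = sum(costs) / len(costs)
print("cost at CL 0.95:", quantile(costs, 0.95))
print("CL at which the mean cost sits:", level_of(costs, mean_cost))
```

The last line is the quantity I am asking about: the level at which the simulated quantile equals the average annual cost. In this sketch it depends on the chosen parameters.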

There is no correlation between events, they are independent.

Thanks.

[note: the basic testing is based upon the following –

- run a simulation at confidence level of X = 50% for a selection of approx 350 events (each event returns an annual cost x(n) for n = [1, 350])

- find an expected value y(n) for each event by multiplying the midpoints of the applicable I and F ranges as described above: y(n) = { I(min) + ½ [I(max) – I(min)] } x { F(min) + ½ [F(max) – F(min)] }

- find x(n) as a proportion of y(n). Using X = 50% means that x(n) < y(n) – not sure if this is necessarily true, but it is for all of my 350 samples. The average of the ratios x(n) / y(n) is approximately 0.837

- Divide the original confidence level X by the average of these ratios, in this case 0.837. This gives the predicted confidence level needed to see the simulation match the expected results, for which my tests gave a best estimate of 59.7%]
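The arithmetic of the test described in the note can be written out as below. This only reproduces the scaling step as I described it; whether dividing a confidence level by a cost ratio is actually valid is exactly what I am unsure about. The function name is made up, and the single ratio passed in stands in for the average over my roughly 350 events.

```python
def predicted_confidence_level(base_level, ratios):
    """Scale the base confidence level by the mean of the x(n) / y(n) ratios."""
    mean_ratio = sum(ratios) / len(ratios)
    return base_level / mean_ratio

# X = 50% and an average ratio of 0.837, as in the note above.
predicted = predicted_confidence_level(0.50, [0.837])
print(round(predicted, 3))  # 0.597
```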