Thread: [SOLVED] Am I Generating Random Numbers Correctly?

1. [SOLVED] Am I Generating Random Numbers Correctly?

Hi,

I'm not sure what level to pitch this at; I thought "High School" was probably appropriate, given that I have not done any proper maths for years! I am a programmer rather than a mathematician, so please forgive me for lapsing into computer-nerd notation rather than using correct maths notation.

Ok. What I have done is as follows:

I have a set of discrete samples of a measured probability distribution (i.e. it can't be generated by a simple equation such as the Gaussian). Let's call the array of input values x (e.g. vehicle speed, or whatever) and the associated array of probabilities P. There are N values in each array, such that P[i] is the normalised probability of the value x[i] turning up.

I wanted to write a random number generator that generated values which, over time, have a probability distribution that matches the one I have measured.

Pseudo-code (my apologies, I don't know how to write this the way a mathematician would!). This represents the sentiment of what I did, not literally what I did:
Code:
BEGIN FUNCTION RandDist():
    LET M = a very large integer (e.g. 0x7FFFFFFF), the number of samples to generate.

    LET xmin be the minimum value of x
    LET xmax be the maximum value of x

    LET random(a, b) be a function which returns a different random number
        each time it is called, between a and b inclusive, with a flat
        distribution.

    LET P(x) be a function which returns the measured probability of x (or
        of the nearest measured value to x) turning up.

    LET r[] be the results array.

    LOOP M times:
        WHILE true:
            LET v = random(xmin, xmax)
            LET w = random(0, 1)

            IF w < P(v):
                break out of while loop
        END WHILE

        append v to r
    END LOOP

    RETURN r

END FUNCTION
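A minimal Python sketch of the RandDist pseudocode above, for anyone who wants to run it. Assumptions not in the original: the measured x values are passed in sorted ascending as `xs` with matching probabilities `ps`, the helper names `make_nearest_lookup` and `rand_dist` are hypothetical, and Python's `random.Random` stands in for `random(a, b)`. One deliberate tweak: w is drawn from [0, max P] rather than [0, 1]. Because that only rescales every acceptance probability by the same constant, the accepted distribution is unchanged, but far fewer candidates are rejected when the individual probabilities are small.

```python
import bisect
import random


def make_nearest_lookup(xs, ps):
    """Return P(v): the measured probability at the x value nearest to v.

    Assumes xs is sorted ascending and ps[i] is the probability for xs[i].
    """
    def prob(v):
        j = bisect.bisect_left(xs, v)        # first index with xs[j] >= v
        if j == 0:
            return ps[0]
        if j == len(xs):
            return ps[-1]
        # v lies between xs[j-1] and xs[j]; use whichever is closer
        return ps[j] if xs[j] - v < v - xs[j - 1] else ps[j - 1]
    return prob


def rand_dist(xs, ps, count, rng=None):
    """Accept-reject sampling following the RandDist pseudocode (hypothetical name)."""
    if rng is None:
        rng = random.Random()
    prob = make_nearest_lookup(xs, ps)
    xmin, xmax = xs[0], xs[-1]
    pmax = max(ps)                           # tweak: draw w from [0, pmax]
    results = []
    for _ in range(count):
        while True:
            v = rng.uniform(xmin, xmax)      # candidate, flat on [xmin, xmax]
            w = rng.uniform(0.0, pmax)       # second flat draw, same generator
            if w < prob(v):                  # keep v with probability P(v)/pmax
                break
        results.append(v)
    return results


# Example: 1000 draws from a three-point measured distribution
samples = rand_dist([0.0, 1.0, 2.0], [0.2, 0.5, 0.3], 1000, random.Random(1))
```

Because v is drawn as a continuous value and P is looked up at the nearest measured point, the accepted values have a density proportional to P(nearest x). Each x[i] therefore ends up weighted by P[i] times the width of the interval of points nearest to it, so with evenly spaced x the two end points receive half the weight they should. Drawing v as a uniformly random index into x, rather than a continuous value between xmin and xmax, removes that bias.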
My question is:

Do you think this is sensible? I know random number generators can have problems returning truly uncorrelated values. I am also concerned about using the same generator to provide the values for both v and w.

A penny for your thoughts ...

TIA,
Mike.

2. This might help (though it depends on your application).
It shows how discrete multivariate data can be used to generate more observations while preserving correlation.

PDF available through
http://agecon.lib.umn.edu/

Journal of Agricultural and Applied Economics, 32(2) (August 2000): 299-315
© 2000 Southern Agricultural Economics Association
"An Applied Procedure for Estimating and Simulating Multivariate Empirical (MVE) Probability Distributions in Farm-Level Risk Assessment and Policy Analysis"

3. A simpler method is this:
Suppose the probability associated with the ith member is p(i). Compute the cumulative probability f(i) = p(1) + p(2) + ... + p(i), with f(0) = 0. Generate a Uniform(0, 1) number; if it falls between f(i-1) and f(i), select the ith unit. This ensures that the probability of selecting the ith unit is exactly p(i).

This is called the cumulative frequency method, and it is widely used in sample surveys to select units with probability proportional to size.
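A short Python sketch of the cumulative frequency method described above. It works because U is uniform on [0, 1): the chance that it lands in the interval [f(i-1), f(i)) is f(i) - f(i-1) = p(i). The names `make_sampler`, `xs` and `ps` are hypothetical. The lookup is a binary search over the cumulative sums, so each draw costs O(log N) and, unlike accept-reject, no draw is ever wasted.

```python
import bisect
import itertools
import random


def make_sampler(xs, ps, rng=None):
    """Cumulative frequency method: return a function that draws one x value.

    xs are the measured values and ps their probabilities (hypothetical names).
    """
    if rng is None:
        rng = random.Random()
    cumulative = list(itertools.accumulate(ps))   # f(1), f(2), ..., f(N)
    total = cumulative[-1]                         # equals 1 if ps is normalised

    def sample():
        u = rng.random() * total                   # uniform on [0, total)
        # index of the first cumulative value greater than u, so u lies in
        # that value's interval; zero-probability entries have empty intervals
        i = bisect.bisect_right(cumulative, u)
        return xs[min(i, len(xs) - 1)]             # guard against float round-off
    return sample


# Example: draws of vehicle speed from a three-point measured distribution
sample = make_sampler([30, 50, 70], [0.2, 0.5, 0.3], random.Random(42))
draws = [sample() for _ in range(10)]
```

Python's standard library `random.choices(xs, weights=ps, k=...)` implements essentially this same cumulative-sum-plus-bisection approach, so in practice that call is usually enough.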

Best of luck

4. Good post.