I would recommend you look at the EM algorithm for fitting distributions:
I've got some data on an epidemic in various locations: the total number of agents and the number killed by the infection after 1 year. This gives me a distribution of the percentages of each population killed by the infection (all the percentage values are relatively small).
I wrote a mathematical ODE model for the disease spread within a population with 3 free parameters:
p1 - probability of getting infected externally from the environment
p2 - probability of infecting a new agent once at least one is already sick
p3 - once an agent dies, it is replaced with a new one; p3 is the probability that the new agent is already infected.
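Since the model turns out to be stochastic (see the follow-up below), it helps to have a concrete simulator in mind. The sketch below is a hypothetical toy version, not the asker's actual model: it assumes one step per day, that `p1` and `p2` are per-step infection probabilities for each healthy agent, and it adds a made-up per-step death probability `death_prob` that the original description does not mention.

```python
import random


def simulate_outbreak(p1, p2, p3, n_agents=200, n_steps=365,
                      death_prob=0.01, rng=None):
    """Run one toy outbreak and return deaths as a fraction of n_agents.

    Hypothetical assumptions, not taken from the original model:
    - one step is one day, so 365 steps is roughly one year;
    - every healthy agent is infected from the environment with probability
      p1 per step, and additionally with probability p2 per step while at
      least one agent is sick;
    - an infected agent dies with probability death_prob per step.
    """
    if rng is None:
        rng = random.Random()
    infected = [False] * n_agents  # nobody is sick at the start
    deaths = 0
    for _ in range(n_steps):
        any_sick = any(infected)  # snapshot taken at the start of the step
        for i in range(n_agents):
            if infected[i]:
                if rng.random() < death_prob:
                    deaths += 1
                    # p3: the replacement agent may already be infected
                    infected[i] = rng.random() < p3
            elif rng.random() < p1 or (any_sick and rng.random() < p2):
                infected[i] = True
    return deaths / n_agents
```

Running it repeatedly with the same parameters gives a distribution of simulated death percentages, e.g. `[100 * simulate_outbreak(0.0005, 0.001, 0.1) for _ in range(100)]`. Because replacements can also die, the returned value can in principle exceed 1.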
Now I need to choose values for p1, p2 and p3 so that the data generated by the model are distributed as closely as possible to the original distribution.
The trouble is that I have never done anything like this before and have very little experience with any sort of statistics.
How should I define the original data distribution: as a list of percentages of killed agents, or somehow as a continuous function?
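A plain list of percentages is usually enough; the standard way to turn it into a function is the empirical cumulative distribution function (ECDF), where F(x) is the fraction of observations less than or equal to x. A minimal sketch using only the standard library (the sample values in the usage note are made up):

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F, where F(x) is the fraction of observations <= x."""
    data = sorted(sample)
    n = len(data)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(data, x) / n

    return cdf
```

For example, with the hypothetical percentages `[0.8, 1.2, 2.5, 0.4, 1.9]`, `empirical_cdf(...)(1.2)` is 0.6, because three of the five values are at most 1.2. No smoothing or distributional assumption is needed.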
Then should I choose values for p1, p2, p3 by trial and error, and run the simulation multiple times to generate distributions of data as well?
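Trial and error can be made systematic with a grid search: for every combination of candidate values, run the stochastic model many times, compare the simulated sample with the observed one, and keep the closest combination. The sketch below assumes a hypothetical callable `simulate(p1, p2, p3, rng)` that returns one simulated percentage, and uses a deliberately simple distance (difference in means plus difference in standard deviations), similar in spirit to the method of simulated moments. Any other sample distance could be swapped in.

```python
import itertools
import random
import statistics


def summary_distance(observed, simulated):
    """Distance between two samples based on their means and standard deviations."""
    return (abs(statistics.mean(observed) - statistics.mean(simulated))
            + abs(statistics.stdev(observed) - statistics.stdev(simulated)))


def grid_search(observed, simulate, grid, n_runs=100, seed=0):
    """Try every parameter combination in `grid`; return (distance, params) of the best.

    `simulate(p1, p2, p3, rng)` is a hypothetical callable that runs the
    stochastic model once and returns one simulated percentage.
    """
    rng = random.Random(seed)
    best = None
    for params in itertools.product(*grid):
        # A stochastic model gives a different answer on every run, so build
        # a whole simulated sample per parameter set and compare samples.
        simulated = [simulate(*params, rng) for _ in range(n_runs)]
        d = summary_distance(observed, simulated)
        if best is None or d < best[0]:
            best = (d, params)
    return best
```

The cost grows quickly: 10 candidate values per parameter means 1000 combinations, each requiring `n_runs` simulations, so a coarse grid followed by a finer one around the best point is a common compromise.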
Lastly, is there a proper way of comparing the simulated data with the original set? I've seen something about distance functions somewhere; what would be the best way of implementing this?
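One common distance between two samples is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest vertical gap between their empirical CDFs. It is 0 for samples with identical ECDFs and 1 for samples that do not overlap at all. A minimal pure-Python sketch (`scipy.stats.ks_2samp` computes the same statistic together with a p-value, if SciPy is available):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every copy of x in both samples so ties are handled correctly.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d
```

A smaller value means the simulated percentages are distributed more like the real ones, so this function can serve as the quantity to minimise when choosing p1, p2 and p3.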
Thanks for any advice!
Thanks for this reply. I've seen lots of pages like this, but my problem is that I'm unsure how to apply any of the techniques I read about to my model.
Please note that I made a mistake in the description: it's not a deterministic ODE model but rather a stochastic Monte Carlo model, so all I can do is run the model with some guessed parameters and somehow compare the resulting sets of data.
I'm really starting to doubt I'll be able to get anywhere with this in the time I have left...
Is there a way to compare two sets of data if I have, for example, 100 real data points and generate 100 data points using my simulation? I need some idea of how to implement it.
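With 100 real and 100 simulated points, a permutation test is one way to ask whether the difference between the two sets is larger than chance alone would produce: pool the points, shuffle them repeatedly into two groups of the original sizes, and count how often the shuffled groups differ at least as much as the real split. This sketch uses the absolute difference in means as the test statistic, which only checks location; a shape-sensitive statistic such as the KS statistic could be substituted. This is a swapped-in technique, not something from the thread.

```python
import random
import statistics


def permutation_test(real, simulated, n_perm=5000, seed=0):
    """Approximate two-sided p-value for 'both samples come from the same
    distribution', using |mean(real) - mean(simulated)| as the statistic."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(real) - statistics.mean(simulated))
    pooled = list(real) + list(simulated)
    n = len(real)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of points into two groups
        diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator keeps the p-value away from exactly 0.
    return (extreme + 1) / (n_perm + 1)
```

A small p-value (say below 0.05) suggests the current p1, p2, p3 do not reproduce the real data; a large one only means the test found no evidence of a difference.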
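If you have an existing parametric distribution family in mind, fit it by maximum likelihood; the EM algorithm is the standard way to do this when the family has hidden variables, such as a mixture of several distributions.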
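As an illustration of EM, here is a minimal sketch that fits a two-component one-dimensional Gaussian mixture, assuming (hypothetically) that the death percentages come from two groups of locations. The E-step computes each point's probability of belonging to component 1; the M-step re-estimates weights, means and standard deviations from those probabilities. For a single standard family with no hidden variables, plain maximum likelihood is simpler.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gaussian_mixture(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (needs at least 2 points).
    Returns (w, mu1, sigma1, mu2, sigma2), where w is component 1's weight."""
    data = sorted(data)
    n = len(data)
    half = n // 2
    # Crude starting point: split the sorted data into a lower and upper half.
    w = 0.5
    mu1 = sum(data[:half]) / half
    mu2 = sum(data[half:]) / (n - half)
    sigma1 = sigma2 = (data[-1] - data[0]) / 4 or 1.0
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, sigma1)
            b = (1 - w) * normal_pdf(x, mu2, sigma2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted re-estimation of all parameters.
        r1 = sum(resp)
        r2 = n - r1
        if r1 < 1e-9 or r2 < 1e-9:
            break  # one component has vanished; stop iterating
        w = r1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / r2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / r1
        var2 = sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / r2
        sigma1 = max(math.sqrt(var1), 1e-6)  # floor avoids division by zero
        sigma2 = max(math.sqrt(var2), 1e-6)
    return w, mu1, sigma1, mu2, sigma2
```

Libraries such as scikit-learn's `GaussianMixture` implement the same idea with more safeguards and any number of components.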
If you want a non-parametric fit of your random variable to your data points, take a look at the re-sampling techniques used in statistical inference.
To do this, construct an empirical distribution (a histogram over bins that you choose), then write a program that randomly picks one of the bins according to that distribution and, from the chosen bin, picks a value inside it, usually with equal chance for every value in the bin.
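The procedure above can be sketched as follows: build a histogram with equal-width bins (the number of bins is an arbitrary choice here), pick a bin with probability proportional to its count, then draw uniformly inside that bin. This amounts to sampling from a smoothed version of the empirical distribution.

```python
import random


def histogram_sampler(data, n_bins=10, rng=None):
    """Return a function that draws new values from the histogram of `data`."""
    if rng is None:
        rng = random.Random()
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / width), n_bins - 1)  # x == hi goes into the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(n_bins)]  # left edge of each bin

    def draw():
        # 1) choose a bin with probability proportional to its count
        k = rng.choices(range(n_bins), weights=counts)[0]
        # 2) choose a value uniformly within that bin
        return rng.uniform(edges[k], edges[k] + width)

    return draw
```

For example, `draw = histogram_sampler(observed_percentages)` followed by `[draw() for _ in range(1000)]` produces a resampled data set with roughly the same shape as the original; empty bins are never chosen.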