Hey laban1.
Can you outline your sample properties? What is your (assumed) distribution? What are you using for outlier detection (Cook's distance, or something else)?
Hi!
Sometimes I have a lot of zeroes, sometimes not.
It's for practical sampling with a lot of figures, so I'd rather have an approach that is easy enough than a perfect but complicated solution.
The instruments sometimes give unexpected values that are quite far off.
How do I go about this?
Do I have to solve the following?
a) Upper bound: P(X > cutoff) = 0.003 (roughly the same level as the 3-standard-deviation rule for a normal distribution).
Is there a faster or better rule of thumb here?
It's from a practical point of view, so it doesn't have to be perfect.
b) How do I set the lower bound? That seems even trickier!
c) When I find an outlier with a rule I have set up, what do I do with it? Do I remove it from my sample set completely as soon as I discover it, or do I keep it if it is within the expected level of probability?
Hope you can help!
Thanks!
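For a) and b), the exact Poisson tail probabilities can be used directly instead of a normal approximation. The sketch below is a minimal, stdlib-only illustration that assumes λ is known (or already estimated) for the area being screened. The helper names and the per-tail level of 0.0015 (splitting the 0.003 between the two tails; the normal 3 sd rule leaves about 0.00135 in each tail) are illustrative choices, not a standard. It also shows why the lower bound is tricky: for small λ even a count of 0 has a probability above the chosen level, so no low count can be flagged at all. Flagged values are marked rather than deleted, which leaves the decision in c) open.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def upper_cutoff(lam: float, alpha: float) -> int:
    """Smallest c with P(X > c) <= alpha; counts above c are flagged."""
    c, cdf = 0, poisson_pmf(0, lam)
    while 1.0 - cdf > alpha:
        c += 1
        cdf += poisson_pmf(c, lam)
    return c

def lower_cutoff(lam: float, alpha: float) -> int:
    """Largest c with P(X < c) <= alpha; counts below c are flagged.
    Returns 0 when P(X = 0) > alpha, i.e. no low count can be flagged."""
    c, below = 0, 0.0  # below holds P(X < c)
    while below + poisson_pmf(c, lam) <= alpha:
        below += poisson_pmf(c, lam)
        c += 1
    return c

def flag_counts(counts, lam, alpha_each=0.0015):
    """Return (count, is_flagged) pairs; nothing is removed from the data."""
    lo, hi = lower_cutoff(lam, alpha_each), upper_cutoff(lam, alpha_each)
    return [(y, y < lo or y > hi) for y in counts]

print(lower_cutoff(3, 0.0015), upper_cutoff(3, 0.0015))  # 0 9
```

With λ = 3, counts above 9 are flagged and no lower bound exists. With λ = 12 the same code gives a lower cutoff of 3, so counts of 0, 1 and 2 would be flagged.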
What is your distribution (assumed)?
- Poisson (as in the title). Lambda = np is fairly stable within each of the studied areas,
but Lambda can vary from very small to very large between areas.
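Because λ differs so much between areas, a single absolute threshold cannot work; each area needs its own λ. A minimal sketch, assuming the data are a list of (area, count) pairs (the area names below are hypothetical): estimate λ per area as the mean count, then convert each reading to a Pearson residual (count - λ) / √λ, which puts all areas on one rough scale. Two caveats: outliers inflate the mean they are judged against, and for small λ the residuals are skewed, so exact Poisson cutoffs are safer there.

```python
import math
from collections import defaultdict

def lambda_per_area(records):
    """records: list of (area, count) pairs -> {area: mean count}."""
    totals, n = defaultdict(int), defaultdict(int)
    for area, count in records:
        totals[area] += count
        n[area] += 1
    return {area: totals[area] / n[area] for area in totals}

def pearson_residuals(records, lambdas):
    """(area, count, residual); residual is None where lambda is 0 (an all-zero area)."""
    out = []
    for area, count in records:
        lam = lambdas[area]
        r = (count - lam) / math.sqrt(lam) if lam > 0 else None
        out.append((area, count, r))
    return out

# Hypothetical readings from two areas with very different flow levels
records = [("A", 2), ("A", 4), ("A", 3), ("B", 110), ("B", 95), ("B", 105)]
lambdas = lambda_per_area(records)
for area, count, r in pearson_residuals(records, lambdas):
    print(area, count, None if r is None else round(r, 2))
```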
What are you using for outlier detection (Cook's distance, or something else)?
Cook's distance? I don't know. I suppose it could be a multiple of the standard deviation? Hence my question.
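A multiple of the standard deviation is a natural starting point, and for a Poisson distribution the standard deviation is √λ, so the "3 sd rule" becomes: flag counts above λ + 3√λ. The sketch below (stdlib only, function names hypothetical) checks how often a genuine Poisson count would exceed that cutoff. For λ = 3 the cutoff is about 8.2 and the exceedance probability is about 0.0038, roughly three times the 0.00135 the rule gives for a normal distribution, so with small λ the rule raises more false alarms than intended.

```python
import math

def poisson_sf(c: int, lam: float) -> float:
    """P(X > c) for X ~ Poisson(lam), lam > 0, c >= 0."""
    cdf = sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(c + 1))
    return 1.0 - cdf

def k_sd_exceedance(lam: float, k: float = 3.0) -> float:
    """Probability that a Poisson count exceeds lam + k*sqrt(lam)."""
    cutoff = lam + k * math.sqrt(lam)
    # Counts are integers, so "x > cutoff" is the same as "x > floor(cutoff)"
    return poisson_sf(math.floor(cutoff), lam)

normal_tail = 0.5 * math.erfc(3 / math.sqrt(2))  # one-sided normal tail beyond 3 sd
for lam in (0.5, 3, 12, 100):
    print(f"lambda={lam}: Poisson {k_sd_exceedance(lam):.5f} vs normal {normal_tail:.5f}")
```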
There is a property in Poisson modelling called over-dispersion: the observed variance is larger than the mean, whereas a Poisson distribution has a variance equal to its mean.
I think you should look into it and use your favourite software package, such as R or SAS, to estimate the over-dispersion coefficient and incorporate it into your analysis.
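Before reaching for R or SAS, a rough over-dispersion check for a single area can be done by hand. A minimal sketch, assuming a constant λ within the area and no covariates: the Pearson chi-square statistic divided by its degrees of freedom, which for an intercept-only model reduces to the sample variance divided by the mean. Values near 1 are consistent with a Poisson distribution; values clearly above 1 point to over-dispersion. One simple way to incorporate the estimate φ (an illustrative quasi-Poisson style adjustment, not the only option) is to widen the standard deviation from √λ to √(φλ).

```python
import math

def dispersion_estimate(counts):
    """Pearson chi-square / (n - 1) for an intercept-only Poisson model.
    Requires at least two counts and a positive mean."""
    n = len(counts)
    ybar = sum(counts) / n
    chi2 = sum((y - ybar) ** 2 / ybar for y in counts)
    return chi2 / (n - 1)

counts = [0, 6, 0, 6]   # hypothetical, clearly more spread out than a Poisson sample
phi = dispersion_estimate(counts)
print(phi)              # 4.0

lam = 3.0
cutoff = lam + 3 * math.sqrt(phi * lam)   # over-dispersion-adjusted 3 sd cutoff
```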
You could throw away the 0s if you have good enough justification for doing so; if not, look at over-dispersion and consider related methods that account for this skewed behaviour in the Poisson.
Overdispersion (Wikipedia)
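Before throwing away zeros, it may help to check whether there are more of them than a Poisson distribution would produce. Under a Poisson(λ) model the expected share of zeros is e^(-λ). A minimal sketch, assuming one area with a constant λ estimated by the sample mean, compares that with the observed share. A large excess of zeros suggests a separate process generating them; zero-inflated Poisson models are one standard way to handle this.

```python
import math

def zero_excess(counts):
    """Return (observed share of zeros, share expected under Poisson(sample mean))."""
    n = len(counts)
    lam_hat = sum(counts) / n
    observed = sum(1 for y in counts if y == 0) / n
    expected = math.exp(-lam_hat)
    return observed, expected

counts = [0, 0, 0, 0, 4, 5, 6, 5]   # hypothetical readings
obs, exp_share = zero_excess(counts)
print(f"observed {obs:.2f}, expected {exp_share:.3f}")  # observed 0.50, expected 0.082
```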
I can't tell you what to do with the data in terms of throwing out outliers or censoring data, but I do know that for your kind of problem, an over-dispersion analysis is a good place to start.
You should first decide whether the outliers should stay or be thrown out.
To do this, you need to work out whether or not they are representative of the data and of the question you are trying to answer.
From the post just before yours, it appears chiro is asking for the context of the data. The more information you provide about the data, the better our understanding of it will become. The question "what do you need to know?" is difficult to answer from this end. We have no access to the data. We have no feel for what the data looks like, where it came from, how it is being evaluated, the circumstances of how it is obtained, potential expected causes for outliers (and possible methods for detecting them), potential dependencies that could be examined to rule out outliers, the level of accuracy you want in your findings, etc.
Did you read up on Cook's distance as chiro suggested? You could also look at Identifying outliers. Check whether any of those methods seem suitable for your model. The section below titled "Working with Outliers" might also be of interest. You asked whether you should delete outlier data; that practice is frowned upon.
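Cook's distance is less intimidating in the simplest setting: a single area with a constant λ. A minimal sketch, assuming an intercept-only Poisson model (fitted mean equal to the sample mean, every point with leverage h = 1/n, one parameter, dispersion 1 unless an estimate is supplied). The usual GLM form D_i = r_i² h / (φ p (1 - h)²), with r_i the Pearson residual (y_i - ȳ)/√ȳ, then reduces to a scaled squared Pearson residual, so here it ranks points exactly like the k-sd rule. The 4/n threshold used below is one commonly quoted rule of thumb, not a fixed standard. With covariates (e.g. area or time), a GLM fitted in R or SAS would be needed instead.

```python
import math

def cooks_distance_intercept_only(counts, phi=1.0):
    """Cook's distance for each count under an intercept-only Poisson model.
    phi is the dispersion (1.0 for a pure Poisson model)."""
    n = len(counts)
    mu = sum(counts) / n      # fitted mean
    h = 1.0 / n               # leverage: identical for every point in this model
    p = 1                     # number of model parameters
    distances = []
    for y in counts:
        r = (y - mu) / math.sqrt(mu)                     # Pearson residual
        distances.append(r ** 2 * h / (phi * p * (1 - h) ** 2))
    return distances

counts = [3, 3, 3, 3, 8]   # hypothetical readings
d = cooks_distance_intercept_only(counts)
flagged = [y for y, di in zip(counts, d) if di > 4 / len(counts)]
print([round(di, 4) for di in d], flagged)
```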
Thanks for your reply!
Sorry for the late reply.
It all seems pretty advanced for me, Cook's distance and all!
Even choosing a method, as you point out, is not straightforward.
I'm considering extending my measuring time: instead of getting Lambda = 3 for one hour, I could sum over 4 hours and get Lambda = 12, which would be approximately normal, so I could apply the "3 std rule".
I have quite a lot of data, and I expect the process to be fairly stable (Lambda stable) over time.
Would that be an option to consider?
Pros and cons?
Thanks!
PS. A simplified description of the data:
I'm measuring water flow during the period of least expected flow, one hour between 03:00 and 04:00 at night.
I have different flow meters, and I expect there are calibration problems as well.
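The trade-off in the 4-hour idea can be illustrated with a small sketch, assuming the hourly counts are independent and share the same λ (which may not hold if flow changes during the night). The sum of four such counts is Poisson with mean 4λ, and its skewness 1/√(4λ) is half that of a single hour (about 0.29 instead of 0.58 for λ = 3), so the normal approximation improves. The cost is resolution: in the hypothetical readings below, a single high hour is flagged by the hourly 3 sd rule but disappears inside its 4-hour block. The `aggregate` and `above_k_sd` helpers are illustrative names.

```python
import math

def aggregate(hourly, block=4):
    """Sum consecutive counts in blocks of `block`; a trailing partial block is dropped."""
    return [sum(hourly[i:i + block]) for i in range(0, len(hourly) - block + 1, block)]

def above_k_sd(count, lam, k=3.0):
    """Simple k-sd rule for a Poisson count: flag if count > lam + k*sqrt(lam)."""
    return count > lam + k * math.sqrt(lam)

lam_hour = 3.0
hourly = [3, 2, 4, 11, 3, 5, 2, 3]                          # hypothetical readings, one spike
hour_flags = [above_k_sd(y, lam_hour) for y in hourly]      # only the 11 is flagged
blocks = aggregate(hourly)                                  # [20, 13]
block_flags = [above_k_sd(b, 4 * lam_hour) for b in blocks] # cutoff 12 + 3*sqrt(12) ≈ 22.4
print(hour_flags, blocks, block_flags)
```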
You may not necessarily get normality by using a higher rate.
You should look at asymptotic results, particularly with regard to the deviance statistic (which is approximately chi-square distributed), if you have a big enough sample.
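A minimal sketch of the deviance check for a single area, assuming an intercept-only Poisson model with the fitted mean equal to the sample mean. If the model fits, the deviance is approximately chi-square with n - 1 degrees of freedom, so a ratio D/(n - 1) well above 1 suggests lack of fit or over-dispersion. The chi-square approximation is generally considered unreliable when many counts are very small, which matters for low-λ areas. If SciPy is available, `scipy.stats.chi2.sf(D, df)` gives the corresponding p-value; the code below sticks to the standard library.

```python
import math

def poisson_deviance(counts):
    """Deviance and degrees of freedom for an intercept-only Poisson model.
    Uses the convention y*log(y/mu) = 0 when y = 0."""
    n = len(counts)
    mu = sum(counts) / n
    d = 0.0
    for y in counts:
        term = y * math.log(y / mu) if y > 0 else 0.0
        d += 2.0 * (term - (y - mu))
    return d, n - 1

counts = [2, 3, 4, 3, 2, 15]   # hypothetical readings with one very high value
dev, df = poisson_deviance(counts)
print(f"deviance {dev:.2f} on {df} df, ratio {dev / df:.2f}")
```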