how to determine if a signal in several data streams "tends to occur together"?
I have six strings of data; each string consists of bits obtained from a device run over the same 12 hour period (the six experiments were run on six devices simultaneously). Most of the data are 0's, but occasionally (about 5% of the time) there's a 1. I want to test the hypothesis that the strings are synchronized, such that if a "1" occurs in one string, it's more likely to occur in the same position in the other strings (or at least close - within a couple of positions). Basically the idea is to test whether the positions of 1's in the strings are completely independent of each other, or if there's a significant correlation among them such that a "1" in one string is indicative of a higher probability of a 1 in the other strings. What statistical test should I use?
Re: how to determine if a signal in several data streams "tends to occur together"?
I don't know of a standard test specifically designed for this purpose, but here is an idea that may work for you. For each position, count how many of the six strings have a 1 there, then tally the number of positions which contain 0 ones, 1 one, 2 ones, ..., 6 ones. If the strings are independent (and each has the same probability p of a 1), then the number of ones at a position will follow a Binomial(n=6, p) distribution, where p is the fraction of ones. Specifically, if L is the length of the strings and $N_i$ is the number of positions with i ones, then the expected count is
$$E[N_i] = L \binom{6}{i} p^i (1-p)^{6-i}$$
You can then use a chi-square goodness-of-fit test to compare the expected counts from this formula with the actual counts from your experiment.
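The counting step and the expected counts from the formula above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names (`ones_per_position`, `expected_counts`, `chi_square_statistic`) and the tiny toy streams are made up for the example, and p is assumed to be estimated as the overall fraction of ones across all streams.

```python
from math import comb

def ones_per_position(streams):
    """counts[k] = number of positions at which exactly k streams have a 1."""
    n = len(streams)
    counts = [0] * (n + 1)
    for column in zip(*streams):      # one column = the same position in every stream
        counts[sum(column)] += 1
    return counts

def expected_counts(streams):
    """Expected counts L * C(n, k) * p^k * (1 - p)^(n - k) under independence."""
    n = len(streams)
    length = len(streams[0])
    # Assumption: p is the pooled fraction of ones over all streams.
    p = sum(sum(s) for s in streams) / (n * length)
    return [length * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over bins with a nonzero expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Toy data (hypothetical, three short streams just to show the mechanics).
toy = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
observed = ones_per_position(toy)
expected = expected_counts(toy)
statistic = chi_square_statistic(observed, expected)
```

With real data the streams would each hold the full 12-hour bit sequence, and the statistic would be compared against a chi-square distribution after any sparse bins have been pooled.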
Re: how to determine if a signal in several data streams "tends to occur together"?
| number of 1's among the 6 channels | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| observed fraction (in 200 trials) | 0.750 | 0.205 | 0.040 | 0.005 | 0 | 0 | 0 |
| expected fraction (binom.dist) | 0.735 | 0.232 | 0.031 | 0.002 | 0 | 0 | 0 |
Did I do it correctly? The two rows look very similar, and a chi squared test (using only categories 0-3, since 4-6 have zeros in the observed row) gives a p-value of 0.999726516, i.e. almost exactly as expected. Does that look right?
thanks!
Re: how to determine if a signal in several data streams "tends to occur together"?
I think you want to multiply the numbers in your table by 200 to get the observed and expected counts before computing the chi squared statistic; the test deals with counts, not proportions. You should also combine the counts for number of ones = 2, 3, 4, 5, and 6 into a single category (2 or greater) before computing the statistic, since the expected numbers are so low in those categories. The usual rule of thumb for a chi square test is that the expected number in each bin should be at least 5.
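As a rough check of this advice, the calculation can be redone in Python from the proportions posted in the table. This is a sketch under stated assumptions, not a definitive analysis: p is estimated from the observed counts (it comes out to 0.05), and the thread does not say which degrees of freedom were used, so both df = 2 (bins minus one) and df = 1 (one more subtracted because p was estimated from the data) are shown. The closed-form tail probabilities used here are exact for those two cases only.

```python
from math import comb, erfc, exp, sqrt

n_trials = 200      # number of positions, as stated in the table
n_channels = 6

# Observed proportions for 0..6 ones, copied from the table in the thread.
observed_prop = [0.750, 0.205, 0.040, 0.005, 0.0, 0.0, 0.0]
observed = [round(f * n_trials) for f in observed_prop]   # counts, not proportions

# Assumption: p is estimated as the overall fraction of ones.
p = sum(k * c for k, c in enumerate(observed)) / (n_channels * n_trials)

expected = [
    n_trials * comb(n_channels, k) * p**k * (1 - p) ** (n_channels - k)
    for k in range(n_channels + 1)
]

# Pool the sparse categories into three bins: 0 ones, 1 one, 2 or more ones.
obs_binned = observed[:2] + [sum(observed[2:])]
exp_binned = expected[:2] + [sum(expected[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_binned, exp_binned))

# Upper-tail probabilities of the chi-square distribution:
p_value_df2 = exp(-chi2 / 2)        # exact for 2 degrees of freedom
p_value_df1 = erfc(sqrt(chi2 / 2))  # exact for 1 degree of freedom

print(obs_binned, [round(e, 1) for e in exp_binned])
print(round(chi2, 3), round(p_value_df2, 3), round(p_value_df1, 3))
```

With these numbers the pooled "2 or more" bin has an expected count of about 6.6, which satisfies the at-least-5 rule of thumb, and neither choice of degrees of freedom gives a p-value below the usual 0.05 threshold.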
Re: how to determine if a signal in several data streams "tends to occur together"?
Right, makes perfect sense. I did it, and the p-value is lower but still not significant (~0.5 or so). Thank you!!