# Thread: Observational vs. Experimental Data and How to Control Covariants.

1. ## Observational vs. Experimental Data and How to Control Covariants.

Hello, all. I have a dataset on gun registration in the US. Its variables are population, area, urban (I'm guessing that's the number of cities or something), poverty, gun registration, and homicides. Here is the actual dataset:

Code:
     pop  area urban poverty gunreg homicides
AL  4089  52.4    60    19.0      1       410
AR  2372  53.2    54    18.4      1       240
CA 30380 163.7    93    14.2      1      3710
CT  3291   5.5    79     5.8      1       170
DC   598   0.1   100    19.2      1       489
FL 13277  65.8    85    14.1      1      1300
HI  1135  10.9    89    10.0      1        44
IA  2795  56.3    61    10.1      1        62
IL 11543  57.9    85    13.3      1      1270
MA  5996  10.6    84    10.2      1       200
MD  4860  12.4    81     9.3      1       540
MI  9368  96.8    70    13.9      1      1020
MN  4432  86.9    71    12.0      1       100
MO  5158  69.7    53    13.6      1       550
NC  6737  53.8    50    13.2      1       730
ND   635  70.7    53    13.5      1        11
NJ  7760   8.7    89     9.0      1       350
NY 18058  54.5    84    14.1      1      2550
OH 10939  44.8    74    11.8      1       760
PA 11961  46.1    69    10.8      1       740
RI  1004   1.5    86     8.2      1        38
SC  3560  32.0    55    16.5      1       350
TN  4953  42.1    61    16.9      1       470
TX 17349 268.6    80    16.8      1      2660
UT  1770  84.9    87     9.8      1        43
WA  5018  71.3    76    26.2      1       220
AK   570 656.4    68    11.2      0        56
AZ  3750 114.0    88    14.2      0       290
CO  3377 104.1    82    12.1      0       155
DE   680   2.5    73     8.1      0        32
GA  6623  59.4    63    16.0      0       720
ID  1039  83.6    57    13.7      0        21
IN  5610  36.4    65    14.1      0       380
KS  2495  82.3    69    11.1      0       150
KY  3713  40.4    52    17.4      0       260
LA  4252  51.8    68    22.0      0       760
ME  1235  35.4    45    12.5      0        23
MS  2592  48.4    47    23.8      0       370
MT   808 147.0    53    15.8      0        29
NE  1593  77.4    66    10.9      0        43
NH  1105   9.4    51     7.1      0        32
NM  1548 121.6    73    20.9      0       160
NV  1284 110.6    88    10.7      0       135
OK  3175  69.9    68    15.8      0       220
OR  2922  98.4    71    11.3      0       120
SD   703  77.1    50    13.5      0         9
VA  6286  42.8    69    10.6      0       550
VT   567   9.6    32     7.1      0        24
WI  4955  65.5    66     9.2      0       240
WV  1801  24.2    36    17.2      0       135
WY   460  97.8    65    10.6      0        20
Using this data (it's observational rather than from a controlled experiment, so it takes a little extra work to find relationships), how would you go about figuring out which factors actually contribute to the number of homicides? I feel a little overwhelmed, and after transforming so much data (with basic least squares, etc.), I'm starting to confuse myself. Can anyone lend me a hand or point me in the right direction?

2. ## Re: Observational vs. Experimental Data and How to Control Covariants.

Hey oldwarplanes.

For your problem, you need to outline exactly what kinds of relationships you want to find.

Do you want to test whether specific kinds of relationships exist, or do you want to use algorithms to try to find relationships that you haven't pre-decided?

These two approaches require completely different kinds of analysis.
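
To make the first approach concrete: a pre-specified question could be whether the homicide rate differs between states with gunreg = 1 and gunreg = 0. Here is a minimal Python sketch using eight rows copied from your table. It assumes pop is in thousands of residents (the table doesn't say), and the function names are just made up for illustration.

```python
from statistics import mean

# (state, pop, gunreg, homicides), copied from the table in the first post.
# Assumption: pop is in thousands of residents; the thread does not confirm this.
rows = [
    ("AL", 4089, 1, 410),
    ("CT", 3291, 1, 170),
    ("IA", 2795, 1, 62),
    ("MD", 4860, 1, 540),
    ("AZ", 3750, 0, 290),
    ("GA", 6623, 0, 720),
    ("KS", 2495, 0, 150),
    ("VT", 567, 0, 24),
]


def rate_per_100k(pop_thousands, homicides):
    """Homicides per 100,000 residents, given population in thousands."""
    return homicides / pop_thousands * 100.0


def mean_rate_by_group(rows):
    """Average homicide rate for gunreg = 0 and gunreg = 1 states."""
    groups = {0: [], 1: []}
    for _state, pop, gunreg, homicides in rows:
        groups[gunreg].append(rate_per_100k(pop, homicides))
    return {g: mean(rates) for g, rates in groups.items()}


group_means = mean_rate_by_group(rows)
print(group_means)
```

Keep in mind that with observational data a raw difference in group means says nothing about cause. The two groups may also differ in urban and poverty, which is why those would need to go into a regression as covariates.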

3. ## Re: Observational vs. Experimental Data and How to Control Covariants.

I guess I'm looking to find relationships that I haven't pre-decided.

4. ## Re: Observational vs. Experimental Data and How to Control Covariants.

In that case you will need to look into data mining.

One suggestion I have for this is Rattle, which has a GUI and allows you to perform a variety of data mining tasks.

Rattle is an add-on library/code-base for the free open source package R.

The R Project for Statistical Computing

Togaware: Rattle: A Graphical User Interface for Data Mining using R
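
If you want a quick look before installing anything, a very simple exploratory screen is to correlate each variable with the homicide rate and see which ones stand out. The sketch below is plain Python, not Rattle, and uses ten rows copied from your table. It assumes pop is in thousands, and `pearson` and `screen` are illustrative names of my own.

```python
from math import sqrt

# Columns as they appear in the table in the first post.
COLUMNS = ("pop", "area", "urban", "poverty", "gunreg", "homicides")

# Ten rows copied from the table (a subset, to keep the example short).
DATA = {
    "AL": (4089, 52.4, 60, 19.0, 1, 410),
    "CA": (30380, 163.7, 93, 14.2, 1, 3710),
    "IA": (2795, 56.3, 61, 10.1, 1, 62),
    "MD": (4860, 12.4, 81, 9.3, 1, 540),
    "NY": (18058, 54.5, 84, 14.1, 1, 2550),
    "GA": (6623, 59.4, 63, 16.0, 0, 720),
    "LA": (4252, 51.8, 68, 22.0, 0, 760),
    "ME": (1235, 35.4, 45, 12.5, 0, 23),
    "VT": (567, 9.6, 32, 7.1, 0, 24),
    "WY": (460, 97.8, 65, 10.6, 0, 20),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def screen(data):
    """Correlate each covariate with homicides per 100k residents.

    Assumption: pop is in thousands, so rate = homicides / pop * 100.
    """
    rows = list(data.values())
    rates = [row[5] / row[0] * 100.0 for row in rows]
    result = {}
    for i, name in enumerate(COLUMNS[:5]):  # every column except homicides
        result[name] = pearson([row[i] for row in rows], rates)
    return result


correlations = screen(DATA)
# Print the covariates ordered by strength of association, strongest first.
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:8s} {r:+.2f}")
```

This only flags pairwise linear associations. It does not control for the other variables, so treat anything it turns up as a lead to follow up with a proper model, not as a finding.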