Correlation with multiple variables
I learned in stats class how to do a standard Pearson correlation. I have a data set that I now want to analyze that adds another element and I am not sure what to do.
I have baseball stats that include three pieces of data for a set of baseball games:
1. Number of Home Runs hit
2. Wind speed
3. Wind direction.
I want to see if a strong wind blowing to the outfield makes a difference in how many home runs are hit. The problem is that sometimes the wind only blows 1 mph. Sometimes 20 mph. I also have a bit more detailed direction data such as N, NNE, NE, ENE, E, etc. So it isn't just "wind blowing out" or "wind blowing in". It is reasonable to assume that there are more home runs hit when the wind is blowing out to N than when it is blowing to ENE.
Where do I start?
Re: Correlation with multiple variables
You can run a linear regression of the number of home runs hit (1) on a constant, the wind speed (2), and the wind direction (3). However, a strong wind blowing to N has the opposite effect of a wind blowing to S. Therefore, you might reduce the data to a two-variable case:
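Convert the wind direction to an angle phi. Let phi=0 for N (measuring clockwise on the 16-point compass, NNE is 22.5 degrees, NE is 45 degrees, and so on), and compute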
wind speed to N = (wind speed)*cos(phi)
Now you can compute the Pearson correlation of (i) the wind speed to N and (ii) the number of home runs hit. Alternatively, you can run a linear regression of (ii) on (i) and a constant to do a test of significance.
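Here is a rough sketch of that transformation and the correlation in Python, in case it helps. The names (COMPASS, wind_to_north, pearson) and the six games are made up purely for illustration, and it assumes your direction column records where the wind is blowing TO, on a 16-point compass with N pointing toward the outfield.

```python
import math

# 16-point compass, clockwise from N in 22.5-degree steps (assumed encoding).
COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def wind_to_north(speed_mph, direction):
    """Wind speed to N = (wind speed) * cos(phi), with phi = 0 for N."""
    phi = math.radians(COMPASS.index(direction) * 22.5)
    return speed_mph * math.cos(phi)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up games, NOT real data: (home runs, wind speed in mph, direction wind blows to)
games = [(4, 20, "N"), (3, 12, "NNE"), (2, 5, "ENE"),
         (2, 1, "E"), (1, 10, "SSW"), (0, 18, "S")]

x = [wind_to_north(speed, direction) for _, speed, direction in games]
y = [home_runs for home_runs, _, _ in games]

r = pearson(x, y)
print("Pearson r between wind speed to N and home runs:", round(r, 3))
```

A 20 mph wind to N counts as +20, the same wind to S counts as -20, and a wind straight to E or W counts as roughly 0, so both speed and direction end up in one variable. For the significance test, a stats package would be the usual route (for example scipy.stats.linregress reports a p-value for the slope), but that part is not shown here.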