# Thread: What is an n-dimensional variable in statistics

1. ## What is an n-dimensional variable in statistics

I've been dumped into a situation where I need to know some serious statistics without having much of a background at all in statistics.

My understanding of a "variable" in statistics is that it's something that you're measuring. For example, height. In this way, you can think of a variable as an axis, or a dimension. So if you're measuring two variables -- say, height and weight -- you can think of that as two dimensions (x and y on a graph).

So in the process of studying a paper that throws around quite a lot of statistics that I can't follow very well, I came across this definition in the context of analyzing high-dimensional data (m = number of records, n = number of dimensions in the dataset):

"Frequency: The frequency function of D_mxn for the n-dimensional variable x may be defined by equation (1), where h is the size of the intervals (or bins) within which the frequency is being measured.

Equation 1: f(x) = delta / (m h)

where delta is the number of records d_i contained in the same bin that contains x."

I'm trying to understand what this really means -- not just in a rote mathematical way, but really understand what's going on. There are obviously some huge defects in my understanding. Based on what I know of statistical variables above, a "variable" should really correspond to one dimension in high-dimensional data. That kind of renders the concept of an "n-dimensional variable" meaningless, so I'm getting something wrong.

--What is an n-dimensional variable?
--Intuitively, what is this definition of frequency trying to express?

Thanks very much for your time.

2. ## Re: What is an n-dimensional variable in statistics

Hey RemedialInMath.

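An n-dimensional variable is just n single variables bundled together, one per dimension, so each record in your dataset is one observation of it. An n-dimensional random variable is likewise a vector of n individual random variables (they don't have to be independent of each other).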

It's the same idea as a non-random function of many variables like f(x,y,z) = x^2 + y + z, except that the variables are now random variables.

In probability, when you have an n-dimensional random variable, you have what is called a joint distribution. It tells you P(A = a, B = b, C = c, etc.) at each particular point, so you can think of it as assigning a probability (or a probability density, in the continuous case) to every combination of values of all of the random variables.

Before, in univariate situations, you only had P(X = x) or P(Y = y). Now you have more variables to account for, which means you need more sums or integrals (one per variable) if you want the probability over a region.
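To make the joint distribution concrete, here is a minimal sketch for the discrete two-variable case. The data and the helper names (`joint_pmf`, `marginal_x`, `prob_region`) are made up for illustration; it estimates P(X = x, Y = y) from records and then sums over pairs to get a marginal and a region probability.

```python
from collections import Counter

# Hypothetical toy data: each record is one observation of the pair (X, Y).
records = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0), (1, 1), (0, 1)]

def joint_pmf(data):
    """Empirical joint distribution: P(X = x, Y = y) for every observed pair."""
    counts = Counter(data)
    total = len(data)
    return {pair: c / total for pair, c in counts.items()}

def marginal_x(pmf, x):
    """P(X = x), obtained by summing the joint distribution over all y."""
    return sum(p for (xi, _), p in pmf.items() if xi == x)

def prob_region(pmf, predicate):
    """Probability of a region: sum the joint pmf over the pairs inside it."""
    return sum(p for pair, p in pmf.items() if predicate(*pair))

pmf = joint_pmf(records)
p_x1 = marginal_x(pmf, 1)                               # P(X = 1)
p_sum_ge_1 = prob_region(pmf, lambda x, y: x + y >= 1)  # P(X + Y >= 1)
```

In the continuous case the sums become integrals over the region, but the idea is the same.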

It has the same interpretation probability-wise as the univariate case, but since you now have multiple variables, you also have relationships between the variables, which give rise to covariance, correlation and other things.
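As a rough illustration of covariance and correlation, here is a small sketch using the height and weight example from the question. The numbers are invented, and the functions compute the population versions (dividing by the number of records), which is an assumption on my part rather than anything the paper specifies.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical measurements: heights in cm, weights in kg.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [50.0, 58.0, 66.0, 80.0, 86.0]

cov_hw = covariance(heights, weights)    # positive: taller goes with heavier
corr_hw = correlation(heights, weights)  # close to 1 for this toy data
```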

In terms of the random variables themselves, the joint distribution is defined across all combinations of values of the random variables, but you can still calculate expectations of things like X + Y or X - Y or X^2 + Y^2, just as in the univariate case, by using E[g(X,Y)] instead of E[g(X)].
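A quick sketch of E[g(X,Y)] in the discrete case: weight g(x, y) by the joint probability of each pair and add everything up. The joint distribution below is a hypothetical one chosen only so the arithmetic is easy to follow.

```python
# Hypothetical joint pmf over pairs (x, y); the probabilities sum to 1.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.375}

def expectation(pmf, g):
    """E[g(X, Y)] = sum over all pairs of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_sum = expectation(joint, lambda x, y: x + y)         # E[X + Y]
e_sq = expectation(joint, lambda x, y: x**2 + y**2)    # E[X^2 + Y^2]
e_x = expectation(joint, lambda x, y: x)               # E[X]
e_y = expectation(joint, lambda x, y: y)               # E[Y]
```

Here E[X + Y] comes out equal to E[X] + E[Y], as linearity of expectation says it should.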

But that's all it really is.
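On the frequency definition you quoted: as I read it, it is a histogram. You chop the n-dimensional space into bins of size h, count how many records (delta) land in the same bin as x, and scale that count by m and h. Below is a minimal sketch under my own assumptions: bins are an axis-aligned grid with the same width h in every dimension, and the count is divided by m * h exactly as in the quoted Equation 1. For a proper density over n-dimensional cubes you would normally divide by the cell volume h^n instead, so treat the scaling as the paper's convention, not something I can confirm from the quote. The names `bin_index` and `frequency` are made up.

```python
import math

def bin_index(point, h):
    """Grid cell (one integer per dimension) that contains an n-dimensional point."""
    return tuple(math.floor(coord / h) for coord in point)

def frequency(x, records, h):
    """f(x) = delta / (m * h), following the quoted Equation 1 literally."""
    m = len(records)
    target = bin_index(x, h)
    # delta: number of records d_i that fall in the same bin as x
    delta = sum(1 for d in records if bin_index(d, h) == target)
    return delta / (m * h)

# Hypothetical dataset: m = 5 records, n = 2 dimensions.
records = [(1.2, 3.4), (1.7, 3.9), (1.1, 3.1), (5.0, 0.5), (1.9, 2.5)]

f_dense = frequency((1.5, 3.5), records, h=1.0)  # 3 of the 5 records share this bin
f_empty = frequency((9.0, 9.0), records, h=1.0)  # no records in this bin
```

So intuitively, f(x) is large where the records are crowded together around x and zero where there are none.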

3. ## Re: What is an n-dimensional variable in statistics

OK, that pretty much makes sense. Thanks.