I've been thrown into a situation where I need to understand some serious statistics without much of a background in the subject.
My understanding of a "variable" in statistics is that it's something that you're measuring. For example, height. In this way, you can think of a variable as an axis, or a dimension. So if you're measuring two variables -- say, height and weight -- you can think of that as two dimensions (x and y on a graph).
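A minimal sketch of this picture in Python, using made-up height and weight values (hypothetical, for illustration only). The dataset is a table with one row per record and one column per variable. Each column is one axis, and each row is one point with a coordinate on every axis.

```python
# Hypothetical data for illustration: m = 4 records, n = 2 variables.
# Each inner tuple is one record: (height in cm, weight in kg).
D = [
    (170.0, 65.0),
    (182.0, 80.0),
    (158.0, 52.0),
    (175.0, 72.0),
]

m = len(D)       # number of records (rows)
n = len(D[0])    # number of variables, i.e. dimensions (columns)

heights = [record[0] for record in D]  # one variable = one axis (a column)
first_point = D[0]                     # one record = one point (x, y)

print(m, n)          # 4 2
print(first_point)   # (170.0, 65.0)
```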
While studying a paper that uses a lot of statistics I can't follow very well, I came across this definition in the context of analyzing high-dimensional data (m = number of records, n = number of dimensions in the dataset):
"Frequency: The frequency function of D_{m×n} for the n-dimensional variable x may be defined by equation (1), where h is the size of the intervals (or bins) within which the frequency is being measured.
Equation 1: f(x) = δ / (m·h)
where δ is the number of records d_i contained in the same bin that contains x."
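To make the definition concrete, here is a minimal Python sketch of equation (1) as quoted. It assumes the bins form a regular grid of cubes with side h anchored at the origin (the quote does not say how the bins are placed), and the data and the names `bin_index` and `frequency` are hypothetical, for illustration only. The sketch divides by m·h exactly as the quote does; for cubic bins in n dimensions the more common normalization is the bin volume h^n, so it may be worth checking how the paper defines h.

```python
import math

def bin_index(point, h):
    """Return the bin (grid cell) that contains `point`.

    Assumption: bins are axis-aligned cubes of side h anchored at the
    origin, so each coordinate c is mapped to floor(c / h).
    """
    return tuple(math.floor(c / h) for c in point)

def frequency(x, D, h):
    """Compute f(x) = delta / (m * h) as in the quoted equation (1).

    delta = number of records in D that share x's bin,
    m     = total number of records in D.
    """
    m = len(D)
    target = bin_index(x, h)
    delta = sum(1 for d in D if bin_index(d, h) == target)
    return delta / (m * h)

# Hypothetical 1-dimensional data (n = 1), m = 5 records.
D1 = [(0.2,), (0.4,), (0.7,), (1.3,), (2.5,)]
print(frequency((0.5,), D1, h=1.0))  # 3 of 5 records in [0, 1) -> 0.6

# Hypothetical 2-dimensional data (n = 2), m = 5 records.
D2 = [(0.1, 0.2), (0.3, 0.9), (0.8, 0.4), (1.5, 0.5), (1.2, 1.7)]
print(frequency((1.9, 0.1), D2, h=1.0))  # 1 of 5 records in that cell -> 0.2
```

Dividing by m turns the raw count δ into the fraction of all records that share x's bin, and dividing by h turns that fraction into a density per unit of bin width. In one dimension, multiplying each bin's value by h and summing over all bins gives 1.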
I'm trying to understand what this really means -- not just in a rote mathematical way, but really understand what's going on. There are obviously some huge defects in my understanding. Based on what I know of statistical variables above, a "variable" should really correspond to one dimension in high-dimensional data. That kind of renders the concept of an "n-dimensional variable" meaningless, so I'm getting something wrong.
--What is an n-dimensional variable?
--Intuitively, what is this definition of frequency trying to express?
Thanks very much for your time.

