Math Help - Probability in binomial distributions

  1. #1
    Newbie
    Joined
    May 2009
    Posts
    1

    Probability in binomial distributions

    Hi,

    just to make it clear, this is about some very basic stuff. I am trying to learn statistics (using R), and I won't understand concepts until I've understood what they mean "in real life", so to speak.

    Ok, this is the scenario (taken from the book I'm using to learn this, H. Baayen's 'Analyzing linguistic data, a practical introduction to statistics using R', 2008, p.45ff.):

    We have a corpus of English with 18.6 million words. The most frequent word in this corpus is 'the', with 1.1 million instances. The relative frequency of 'the' is therefore 1.1/18.6 ≈ 0.06. So this is p, right? Meaning that if I randomly selected one word from this corpus, the probability of getting 'the' would be 0.06. So far so good.

    Now, Baayen says 'let the NUMBER OF TRIALS(n) denote the size of the corpus. Each token in the corpus is regarded as a trial which can result either in a success or in a failure' (p.46). I don't really understand why that is. If we assume independence, then every time we pick a word, the word goes back in, so the number of trials won't equal the size of the corpus. If we didn't put the word back in, then sure, the number of trials would automatically match the number of words.
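    One possible reading of the quoted setup, sketched in Python rather than R: every token position is one trial that independently comes out as 'the' (success) with probability p, so the number of trials is simply the number of tokens. The corpus size of 100,000 and the fixed seed below are assumptions chosen only to keep the sketch fast; they are not from the book.

```python
import random

def count_successes(n_tokens, p, seed=0):
    """Treat each token as one Bernoulli trial and count the successes."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tokens) if rng.random() < p)

n_tokens = 100_000   # hypothetical small corpus, not the 18.6 million-word one
p = 0.06             # probability that any single token is 'the'
count = count_successes(n_tokens, p)

# The count of 'the' is the binomial random variable; n_tokens is the number of trials.
print(count, count / n_tokens)
```

    Under this reading nothing is drawn from the finished corpus at all, so the question of putting words back does not come up.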

    At any rate, now it's time to look at the probability (of something) given a binomial distribution. So this probability for 'the' is dbinom(Random Variable,n,p), which is dbinom(1100000,18600000,0.06). Now, this is simply a function that R has - I don't care how it calculates it. The output is 0.0004.
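    As a rough cross-check, the binomial probability of exactly k successes in n trials can be computed outside R. The sketch below is Python, not R's actual implementation; the function name binom_pmf is made up for illustration, and it works on the log scale because the factorials involved are far too large to compute directly. It evaluates the probability once with the unrounded p = 1.1/18.6 and once with the rounded p = 0.06.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

n = 18_600_000   # corpus size = number of trials
k = 1_100_000    # exact number of 'the' tokens we ask about

pmf_exact_p = binom_pmf(k, n, 1.1 / 18.6)   # unrounded relative frequency
pmf_rounded_p = binom_pmf(k, n, 0.06)       # rounded relative frequency

print(pmf_exact_p)
print(pmf_rounded_p)
```

    With the unrounded p the result is roughly 0.0004, which matches the quoted output. With p rounded to 0.06 it comes out many orders of magnitude smaller, because 0.06 * 18,600,000 = 1,116,000 is far from 1,100,000 relative to the spread of the distribution. So the quoted 0.0004 appears to rely on the unrounded relative frequency.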

    I don't understand what 0.0004 is the probability _of_, in plain English. Could someone _explain_ with some analogy (or using the word corpus case) what 0.0004 is the probability of?

    Thanks!

  2. #2
    Master Of Puppets
    pickslides
    Joined
    Sep 2008
    From
    Melbourne
    Posts
    5,236
    Thanks
    28
    Quote: Originally posted by tjodrik
    dbinom(1100000,18600000,0.06). Now, this is simply a function that R has - I don't care how it calculates it. The output is 0.0004.
    I do not use R, but I think the problem is in the function dbinom(). I have a feeling that the first parameter of 1,100,000 is incorrect. I would think the first parameter would be the number of times you expect the word 'the' to appear.

    In other words, given that the word 'the' has a probability of 0.06, the chance that it will occur exactly 1,100,000 times in 18,600,000 words is 0.0004.

    This first parameter can vary: change it to find the probability of any other number of occurrences of 'the'.
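    To make the role of the first parameter concrete, here is a small Python sketch with a toy "corpus" of only 10 words (an assumption for illustration; the thread's corpus has 18.6 million). It lists the probability of seeing exactly k occurrences of 'the' for every possible k, using the same p = 0.06.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.06   # toy corpus of 10 words, same p as in the thread
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(probs):
    print(k, round(prob, 6))

total = sum(probs)   # the probabilities over all possible k add up to 1
print(total)
```

    Each line answers the question "what is the chance of exactly k occurrences?", which is the same kind of question dbinom(1100000,18600000,p) answers for the full corpus.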

    Quote: Originally posted by tjodrik
    So this probability for 'the' is dbinom(Random Variable,n,p), which is dbinom(1100000,18600000,0.06). Now, this is simply a function that R has - I don't care how it calculates it. The output is 0.0004.
    Here is the main problem: if you looked at the binomial theorem, this would all make much more sense.
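    For reference, the standard binomial probability formula that a function like dbinom(k,n,p) evaluates can be written as follows, where k is the number of successes, n the number of trials and p the success probability (symbols chosen here to match the dbinom arguments above). The second line is where the binomial theorem comes in: it guarantees that the probabilities over all k sum to 1.

```latex
\[
P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n
\]
\[
\sum_{k=0}^{n} \binom{n}{k}\, p^{k} (1-p)^{n-k} = \bigl(p + (1-p)\bigr)^{n} = 1
\]
```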