# Thread: Statistics - Fiction/Non-fiction books

1. ## Statistics - Fiction/Non-fiction books

Would you expect word-length in general to differ in fiction and non-fiction books?

Take two books, of different authors, one fiction, one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median, modal word-length in each and standard deviation. Make it clear how you have done the various calculations without presenting detailed arithmetic.

In the light of the figures found, comment on the initial question. No formal inferential work is needed, just an informed view from the figures found.

2. Originally Posted by Natasha1

Would you expect word-length in general to differ in fiction and non-fiction books?

Take two books, of different authors, one fiction, one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median, modal word-length in each and standard deviation. Make it clear how you have done the various calculations without presenting detailed arithmetic.

In the light of the figures found, comment on the initial question. No formal inferential work is needed, just an informed view from the figures found.
I imagine it would also depend on the authors used in the samples. For instance, Frank Herbert would probably have an inordinate number of "$5" words, whereas Madeleine L'Engle wouldn't.

-Dan

3. Originally Posted by topsquark

I imagine it would also depend on the authors used in the samples. For instance, Frank Herbert would probably have an inordinate number of "$5" words, whereas Madeleine L'Engle wouldn't.

-Dan
I think it would be reasonable to say that in a fiction book there are about 100 000 words and in a non-fiction book maybe 80 000 words. Any suggestions?

4. Originally Posted by Natasha1

Would you expect word-length in general to differ in fiction and non-fiction books?
Yes, but it depends on the authors as well.

Take two books, of different authors, one fiction, one non-fiction. Choose a reasonable sample size of words from each, and find the mean, median, modal word-length in each and standard deviation. Make it clear how you have done the various calculations without presenting detailed arithmetic.
When you choose the books, avoid non-fiction rich in mathematical, chemical
or similar notation, as it will make the sampling more difficult and ambiguous.

I would suggest that the fiction be "Moby Dick", as it's almost traditional
by now to use this as a literary reference text (as in testing the "Bible Codes"
claims).

You will need to consider how big a sample you will need. This will depend
on the spread of word lengths in the texts (the SD of word lengths), and
on the resolution you wish to achieve in your test (that is, whether you wish
to detect a difference in mean word lengths of 1, 0.1, 0.01 ... letters with
high probability).

A rough order-of-magnitude estimate that I made indicates you may be looking
at sample sizes of about 1500 or more if you want to detect differences in
the mean word length of about 0.1 letters.
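One way to arrive at a figure of this order is sketched below. It is not necessarily the calculation used above: it assumes an SD of word length of about 2 letters (a guess, to be replaced by a value from a small pilot sample) and asks how many words are needed for roughly two standard errors of the sample mean to equal the chosen resolution. The function name `sample_size_for_margin` is made up for illustration.

```python
import math

def sample_size_for_margin(sd, margin, z=1.96):
    """Number of words needed so that z standard errors of the
    sample mean (z * sd / sqrt(n)) are no larger than `margin`."""
    return math.ceil((z * sd / margin) ** 2)

# Assumed SD of 2 letters; this is a guess, not a measured value.
assumed_sd = 2.0
for margin in (1, 0.1, 0.01):
    n = sample_size_for_margin(assumed_sd, margin)
    print(f"resolution {margin} letters -> about {n} words")
```

With these assumptions a resolution of 0.1 letters needs roughly 1500 words from a book. This treats each book's mean separately; the standard error of the difference between two means is larger, so comparing the two books at the same resolution would need a larger sample from each.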

You will need to devise a sampling frame that selects words fairly (method
of deciding which words to include in your sample). How you do this will
depend on the facilities you have available (computer text with software
that can randomly sample the texts, or doing it by hand with paper copies
of the books).
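If the texts are available as plain computer text, the sampling and the summary figures could be done along these lines. This is only a sketch under stated assumptions: a "word" is taken to be a run of letters (apostrophes inside a word are allowed but not counted as letters), words are drawn at random without replacement, and the helper names `sample_word_lengths` and `summarise` are invented for the example.

```python
import random
import re
import statistics

def sample_word_lengths(text, n, seed=0):
    """Draw n words at random, without replacement, and return their lengths."""
    # Assumption: a word is a run of letters, possibly with internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    chosen = rng.sample(words, min(n, len(words)))
    # Count letters only, so "don't" has length 4.
    return [len(w.replace("'", "")) for w in chosen]

def summarise(lengths):
    """Mean, median, mode(s) and sample standard deviation of word lengths."""
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "modes": statistics.multimode(lengths),  # a list, in case of ties
        "sd": statistics.stdev(lengths),
    }

# Stand-in text; in practice, read each book from a plain-text file.
text = "the quick brown fox jumps over the lazy dog"
print(summarise(sample_word_lengths(text, 9)))
```

Running the same two functions on each book, with the same sample size, gives two sets of figures that can be set side by side. Doing it by hand with paper copies would need a different frame, for example picking random page, line and word numbers.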

In the light of the figures found, comment on the initial question. No formal inferential work is needed, just an informed view from the figures found.
That's about all I can think of for this at present.

RonL