# Categorical data and sample size

• May 5th 2011, 11:31 PM
TheRobster
Hello all,

Suppose I am taking a survey sample of how many people own a cat. Obviously there are only two categorical outcomes: yes or no. However, suppose I want to expand my survey to determine which cat names are most popular. Again, this is categorical, with 'no cat' being one option and many other possible options for names.

My question is therefore: how do you calculate the standard error in cases like this, and how do you determine an appropriate sample size?

For instance, in the first example suppose my population is effectively infinite (many millions of people, a whole country's population). Since I'm just looking at a yes/no answer, I probably don't need that many random samples to get a fairly accurate answer. However, in the second example (cat names), because there are so many more possibilities, I expect my minimum sample size would have to be much larger than for the simple yes/no survey. But how do I determine how large it would need to be in order to be representative of the larger population?
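For the yes/no case, a common textbook approach uses the standard error of a sample proportion, SE = sqrt(p(1 - p)/n), and solves z * SE <= E for n, giving n = z^2 p(1 - p)/E^2. The sketch below assumes simple random sampling, the normal approximation, and no finite-population correction (reasonable when the population is effectively infinite). The function names are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n independent responses."""
    return math.sqrt(p * (1 - p) / n)


def sample_size_for_proportion(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n such that z * sqrt(p * (1 - p) / n) <= margin.

    p = 0.5 is the worst case (it maximises p * (1 - p)), so it is a safe
    default when nothing is known about the true proportion in advance.
    """
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


# Yes/no cat ownership: margin of +/- 5 percentage points at 95% confidence.
print(sample_size_for_proportion(0.05))  # 385
```

The same bound applies to any single category's proportion (say, "cat named Tom"), because p(1 - p) never exceeds 0.25. What grows with the number of categories is the difficulty of estimating all of them simultaneously, or of observing rare ones at all.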

Regards
Rob
• May 6th 2011, 03:04 AM
SpringFan25
Standard error of what? The proportion of people whose cat's name is (x)?
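If the target is the proportion for one name, each category can be treated as its own yes/no question ("is the answer x?"), so its standard error is sqrt(p_x(1 - p_x)/n). A minimal sketch, assuming independent responses; the function name and the response data are hypothetical and made up purely for illustration:

```python
import math
from collections import Counter


def category_estimates(responses):
    """Return {category: (proportion, standard_error)} for a list of survey answers.

    Each category is treated as a separate yes/no variable, so its standard
    error is sqrt(p * (1 - p) / n) with n the total number of responses.
    """
    n = len(responses)
    counts = Counter(responses)
    return {
        cat: (k / n, math.sqrt((k / n) * (1 - k / n) / n))
        for cat, k in counts.items()
    }


# Hypothetical answers from 100 respondents.
responses = ["no cat"] * 60 + ["Tom"] * 15 + ["Felix"] * 15 + ["Luna"] * 10
for name, (p, se) in category_estimates(responses).items():
    print(f"{name:8s} p = {p:.2f}  SE = {se:.3f}")
```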

My thinking:

If your trouble is that there are infinitely many categories, you can make the set finite by having "other" as a category, along with other categories of your choice.
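Collapsing to a fixed set of categories can be sketched as a simple relabelling: every answer not in the chosen list becomes "other". This assumes the categories are picked before looking at the data; the function name, category list, and answers below are all hypothetical.

```python
from collections import Counter


def collapse_to_other(responses, chosen, other_label="other"):
    """Map every answer not in `chosen` to `other_label`, giving a finite category set."""
    chosen = set(chosen)
    return [r if r in chosen else other_label for r in responses]


# Hypothetical categories chosen before the survey, and hypothetical answers.
chosen = ["no cat", "Tom", "Felix"]
answers = ["no cat", "Tom", "Mittens", "Felix", "Zorro", "no cat"]
print(Counter(collapse_to_other(answers, chosen)))
```

Each of the resulting categories, including "other", can then be treated as an ordinary proportion.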

If you want to work out how many people you need to ask to have a good chance of picking up every name in use... this can't be determined (other than by judgment) without making assumptions about the distribution of names.
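To illustrate the kind of assumption needed: if respondents are independent and a particular name is held by a fraction p of the population, then all n respondents miss it with probability (1 - p)^n. Requiring at least one sighting with probability c gives n >= ln(1 - c)/ln(1 - p). The value of p for the rarest name is exactly the assumed piece of the distribution; the function name and the 1-in-1,000 figure are hypothetical.

```python
import math


def respondents_needed(p_rarest: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p_rarest) ** n >= confidence.

    Assumes independent respondents (random sampling from an effectively
    infinite population) and that the rarest name of interest is held by a
    fraction p_rarest of the population. That fraction must be assumed;
    it is not something the survey itself can tell you in advance.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_rarest))


# Hypothetical: the rarest name we care about belongs to 1 in 1,000 people.
print(respondents_needed(0.001))
```

Catching every name is harder still: with k names each held by at least a fraction p_rarest, a union bound gives P(missing at least one) <= k(1 - p_rarest)^n.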