Improve the representation of the data

I have this graph (please click on it):

Attachment 24663

From this graph I have 5 variables.

1) Country (various countries)

2) Gender (Male and female) - categorical

3) Education Level (Low, Mid and High) - categorical

4) Average Life Span - numerical

5) Standard Deviation - numerical

I am wondering what would be a better representation of these data, i.e. of the relationship between these 5 variables. What other graph could I use to show the 'main' message of the graph more simply?

I thought about doing a matrix plot, but plotting categorical variables against each other isn't ideal. Then I thought of doing a histogram, split by the categories, but it would be massive as there are so many countries. I don't think taking the average in each category would be ideal either, because then you lose the 'country' variable.
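
To make the concern about averaging concrete, here is a minimal sketch in Python. The country names and life span values are made-up placeholders (not read from the attachment), and the record layout is only an assumption about how the data might be stored.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (country, gender, education level, average life span).
# Placeholder values for illustration only.
records = [
    ("A", "Male", "Low", 70.0),
    ("A", "Male", "High", 76.0),
    ("B", "Male", "Low", 64.0),
    ("B", "Male", "High", 78.0),
]

# Group the life spans by (gender, education), ignoring the country.
groups = defaultdict(list)
for country, gender, education, life_span in records:
    groups[(gender, education)].append(life_span)

# One average per category combination; the country no longer appears.
averaged = {key: mean(values) for key, values in groups.items()}

for key, value in averaged.items():
    print(key, value)
```

In this toy example the gap between Low and High is 6 years in country A and 14 years in country B, but the averaged table only shows a single gap of 10 years, so the between-country differences are no longer visible.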

What suggestions do you have?

Re: Improve the representation of the data

Hi lpd.

One recommendation I have is to run a Principal Components Analysis (PCA) and then generate a PCA plot.

This plot gives an idea of how dependent or independent the variables are, and it works even if you have many variables (not just the maximum of three that a normal plot can show). It's not the same as a normal plot or scatterplot, but it makes it easy to ascertain whether there is a relationship between different variables, and what the nature of the dependency is in terms of its magnitude.
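
As a rough sketch of what this could look like, here is a small PCA in Python with NumPy. It assumes one row per country/gender/education combination and assumes the two categorical variables are encoded as numbers, since PCA only handles numeric columns. The encoding, the `pca` helper and all the values are hypothetical placeholders, not taken from the attachment.

```python
import numpy as np

# Hypothetical rows:
# [is_female, education (0=Low, 1=Mid, 2=High), average life span, standard deviation]
# Placeholder numbers for illustration only.
X = np.array([
    [0, 0, 70.0, 12.0],
    [0, 1, 73.0, 11.0],
    [0, 2, 76.0, 10.0],
    [1, 0, 76.0, 11.0],
    [1, 1, 79.0, 10.0],
    [1, 2, 82.0, 9.0],
])

def pca(data, n_components=2):
    """Return (scores, loadings, explained variance ratio) using an SVD."""
    # Standardise each column so variables on different scales are comparable.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # coordinates of each row
    loadings = vt[:n_components].T                    # weight of each variable
    explained = (s ** 2) / np.sum(s ** 2)             # share of variance per component
    return scores, loadings, explained[:n_components]

scores, loadings, explained = pca(X)
print(scores)
print(loadings)
print(explained)
```

The first two columns of `scores` are what would go on the axes of a PCA plot, and `loadings` shows how strongly each original variable contributes to each component. Treating Low/Mid/High as 0/1/2 is a simplification that assumes the levels are evenly spaced.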