Improve the representation of the data
I have this graph.
It shows 5 variables:
1) Country (many countries) - categorical
2) Gender (Male and female) - categorical
3) Education Level (Low, Mid and High) - categorical
4) Average Life Span - numerical
5) Standard Deviation - numerical
I am wondering what would be a better representation of these data, i.e. one that shows the relationship between these 5 variables. What other graph could I use to show the 'main' message of the graph more simply?
I thought about doing a matrix plot, but plotting categorical variables against each other isn't ideal. Then I thought of doing a histogram split by the categories, but it would be massive because there are so many countries. I don't think taking the average within each category would be ideal either, because then you lose the 'country' variable.
What suggestions do you have?
Re: Improve the representation of the data
One recommendation I have is to run a Principal Component Analysis (PCA) and then generate a PCA plot.
This plot gives an idea of how dependent or independent the variables are, and it works even when you have many variables (not just the maximum of three that a normal plot or scatterplot can show). It makes it really easy to see whether there is a relationship between different variables, and how strong that dependency is.
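Here is a minimal sketch of that idea in Python with NumPy, assuming the data can be arranged as one row per (country, gender, education level) group. The numbers below are random placeholders, not the real values from the graph. PCA works on numerical data, so this sketch one-hot encodes gender and education level as 0/1 columns (a common workaround, not the only option), keeps country only as a row label, and standardizes every column so that life span and standard deviation are on comparable scales. The `pca` helper is hypothetical and computes the components from an SVD of the standardized matrix.

```python
import numpy as np

# Hypothetical placeholder data: one row per (country, gender, education) group.
# The values are random; replace them with the real numbers from the graph.
rng = np.random.default_rng(0)
countries = [f"Country {i}" for i in range(20)]
genders = ["Male", "Female"]
levels = ["Low", "Mid", "High"]

records = [
    (country, g, e, rng.normal(75, 5), rng.normal(10, 2))
    for country in countries
    for g in genders
    for e in levels
]


def one_hot(labels, categories):
    """Turn a list of category labels into 0/1 indicator columns."""
    return np.array([[1.0 if lab == cat else 0.0 for cat in categories] for lab in labels])


# Country is only a row label here (each point in the PCA plot is one group);
# one-hot encoding many countries would dominate the components.
X = np.column_stack([
    one_hot([r[1] for r in records], genders),
    one_hot([r[2] for r in records], levels),
    [r[3] for r in records],  # average life span
    [r[4] for r in records],  # standard deviation
])
columns = genders + levels + ["life_span", "std_dev"]


def pca(X, n_components=2):
    """PCA via SVD of the standardized data matrix.

    Returns (scores, loadings, explained_variance_ratio) for the first
    n_components principal components.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # center and scale each column
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    scores = U[:, :n_components] * S[:n_components]  # coordinates of each row
    loadings = Vt[:n_components].T  # contribution of each variable
    return scores, loadings, explained[:n_components]


scores, loadings, explained = pca(X, n_components=2)
print("explained variance ratio:", np.round(explained, 3))
for name, (pc1, pc2) in zip(columns, loadings):
    print(f"{name:>10}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

To get the actual PCA plot, scatter `scores[:, 0]` against `scores[:, 1]` and colour the points by gender or education level; drawing the `loadings` as arrows on the same axes gives a biplot. Variables whose arrows point in similar directions are positively related, and longer arrows contribute more to those components. Since the data mix categorical and numerical variables, a method designed for that case, such as factor analysis of mixed data (FAMD), may also be worth a look.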