Math Help - Can it be justified to remove anomalous data points to give a better representation?

  1. #1
    Member
    Joined
    Nov 2006
    Posts
    126

    Can it be justified to remove anomalous data points to give a better representation?

    Hey!

    OK, so we carried out an oral glucose tolerance test (OGTT), in which we had to fast for more than 6 hours and then drink a large glucose load (50 g). We then used a prick test to measure the blood glucose concentration every 30 minutes for 2 hours. Generally speaking, most people's blood glucose rose and peaked at around 30 or 60 minutes. I then calculated the rate of increase (from the concentration at 0 minutes to the maximum concentration) and the rate of decrease (from the maximum concentration to the concentration 30 minutes later) and produced the table below:

    Code:
    rate of increase	rate of decrease
    0.07222223	0.02666667
    0.09333333	0.090000
    0.2466667	0.040000
    0.1416667	0.003333333
    0.2766667	0.020000
    0.2633333	0.05666667
    0.3566667	0.150000
    0.210000	0.060000
    0.310000	0.010000
    0.290000	0.01666667
    0.2433333	0.050000
    0.1766667	0.1133333
    0.290000	0.02333333
    Plotted, this gives the graph in the attachment. (The y-axis is the rate of decrease in blood glucose.)

    As you can see, there is a nice cluster of negatively correlated values! HOWEVER, around the plot there are some pretty anomalous values.
    I suspect the values to the LEFT of the plot come from individuals who did not like the taste of the glucose drink and therefore didn't finish it; otherwise their livers would have to be considered super-human (a very unlikely explanation!).
    The extreme value to the RIGHT of the graph is anomalous but biologically explainable.

    The problem that these anomalous values cause is that they significantly affect the conclusions that can be drawn from the results.
    Correlation analysis of the full data set shows no significant correlation between the two variables. However, if I remove the 3 most extreme data points (those furthest from the imagined line of best fit), then the correlation is significant.
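
    For anyone who wants to reproduce the full-data check outside SPSS, here is a minimal Python sketch. It assumes Pearson's r, a two-sided test at the 5% level, and that the first column of the table is the rate of increase; `pearson_r`, `t_statistic` and `T_CRIT_DF11` are names made up for illustration. The critical value 2.201 is the standard table value for 11 degrees of freedom (13 points minus 2), so it only applies to the full set of 13 points.

```python
import math

# (rate of increase, rate of decrease) pairs copied from the table above.
# The column order is an assumption based on the description in the post.
DATA = [
    (0.07222223, 0.02666667),
    (0.09333333, 0.090000),
    (0.2466667, 0.040000),
    (0.1416667, 0.003333333),
    (0.2766667, 0.020000),
    (0.2633333, 0.05666667),
    (0.3566667, 0.150000),
    (0.210000, 0.060000),
    (0.310000, 0.010000),
    (0.290000, 0.01666667),
    (0.2433333, 0.050000),
    (0.1766667, 0.1133333),
    (0.290000, 0.02333333),
]


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Two-sided 5% critical value of Student's t for 11 degrees of freedom
# (standard table value; valid only for n = 13).
T_CRIT_DF11 = 2.201

r_full = pearson_r(DATA)
t_full = t_statistic(r_full, len(DATA))
significant = abs(t_full) > T_CRIT_DF11

print(f"n = {len(DATA)}, r = {r_full:+.3f}, t = {t_full:+.3f}")
print("significant at 5%:", significant)
```

    A |t| below the critical value corresponds to the "no significant correlation" result on the full data set.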

    This is for a lab report which forms a HUGE percentage of my final grade at university so it is important that I can produce some good results.

    So what I'm asking is:

    1) Is there a way to identify anomalous data?

    2) Would it be scientifically acceptable to remove this data? If so, what reason could I give for removing it?

    NOTE: I am using Excel, SPSS and GraphPad Prism for all the statistical analysis.

    Please help, I'm desperate!!!
    Attached thumbnail: justrates.jpg

  2. #2
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4
    Quote Originally Posted by anthmoo: [post #1, quoted in full]
    What you present here is pretty close to what you should put in your report.

    Report the conclusion from the raw data, explain why you think the outliers should be pruned, then present the results after pruning and say why these are to be preferred to the results from the raw data.
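
    One hedged way to back this up with numbers is a leave-one-out check: recompute the correlation with each point deleted in turn and report how much r moves. This is a sketch of a common influence diagnostic, not the method used in the thread; `leave_one_out` and `EXCLUDED` are hypothetical names, and the exclusion list is only a placeholder to be replaced by points that have a documented experimental reason for removal.

```python
import math

# (rate of increase, rate of decrease) pairs from the original post.
# The column order is an assumption based on the description there.
DATA = [
    (0.07222223, 0.02666667),
    (0.09333333, 0.090000),
    (0.2466667, 0.040000),
    (0.1416667, 0.003333333),
    (0.2766667, 0.020000),
    (0.2633333, 0.05666667),
    (0.3566667, 0.150000),
    (0.210000, 0.060000),
    (0.310000, 0.010000),
    (0.290000, 0.01666667),
    (0.2433333, 0.050000),
    (0.1766667, 0.1133333),
    (0.290000, 0.02333333),
]


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(pairs):
    """Return (index, r without that point, change in r) for every point."""
    r_all = pearson_r(pairs)
    rows = []
    for i in range(len(pairs)):
        rest = pairs[:i] + pairs[i + 1:]
        r_i = pearson_r(rest)
        rows.append((i, r_i, r_i - r_all))
    return rows


influence = leave_one_out(DATA)
# The point whose removal changes r the most (0-based row index first).
most_influential = max(influence, key=lambda row: abs(row[2]))

# Placeholder exclusion list: here just the single most influential row.
# In a real report this should hold rows with a stated reason for removal.
EXCLUDED = {most_influential[0]}
pruned = [p for i, p in enumerate(DATA) if i not in EXCLUDED]

print(f"raw:    n = {len(DATA)}, r = {pearson_r(DATA):+.3f}")
print(f"pruned: n = {len(pruned)}, r = {pearson_r(pruned):+.3f}, "
      f"excluded rows = {sorted(EXCLUDED)}")
for i, r_i, delta in influence:
    print(f"row {i:2d}: r without it = {r_i:+.3f} (change {delta:+.3f})")
```

    Printing the raw and pruned figures side by side keeps both conclusions visible, and the per-row changes show which points drive the difference.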

    I doubt it is important that you produce "good results"; what matters is a good report. Reporting what actually happened is much better science than trimming the data to fit the desired conclusions. The alternative is to encourage scientific fraud.

    CB

  3. #3
    Member
    Joined
    Nov 2006
    Posts
    126
    Thanks man! I remember you from here a few years back, when I was doing my A Levels in Further Maths and Maths. You were a great help then and you still are. Thank you, you do great work here!
