# Thread: Chemist needs help in maths - Data analysis

1. ## Chemist needs help in maths - Data analysis

Hi!
Congratulations on this forum, it's great!

I'm posting this in the basic area because I think it's basic for you, even though it isn't for me. I'm doing a chemistry project and I need some help with data analysis, if possible. I think I can get more help from a mathematician than from a chemist. I'm studying a property of a solution (the CMC, critical micellar concentration), and to treat the data I plot concentration (m) vs. specific conductivity (k), like this:

Then with a linear fit I get two straight lines, and their intersection is the CMC:

The problem begins when the variation is very smooth, like this:

Now, I've found this paper where they did a "complicated" (at least for me) data treatment and got a derivative-like result:

On page 138, in Results and Discussion, they said:
The problem of estimating the derivatives of the regression curve has been resolved by implementing the local polynomial regression estimator in the programming language Matlab, using the quartic kernel and plug-in optimal AMSE bandwidth. The calculation of $h_{\mathrm{opt}}(x_0)$ was achieved by estimating $\sigma_\varepsilon^2$ using a local polynomial regression, with a pilot bandwidth; the Parzen-Rosenblatt kernel estimator for $f(x_0)$ was also obtained by using a pilot bandwidth, and a parametric regression and a local polynomial regression were used for $m^{(p+1)}(x_0)$. To avoid the problem of choosing an initial value $h_0$, an iterative approach was taken starting with a large $h_0$. This initial pilot bandwidth produces another bandwidth, and this last one produces another, and so on. The iteration is continued to convergence.

paper:

This is Chinese to me. I've never worked in Matlab, so it sounds very difficult for me to replicate this method; I only know the basics of math. But I would like to give it a try. I've computed the first and the second derivative (dk/dm) and it didn't work. Is it difficult to work with Matlab and this plug-in? Is it doable for a non-expert? Do you have any suggestion for a method to analyze my data?

Thank you!

PS - Sorry for my bad English!

2. I would put this question in the University Math Help\Advanced Applied Math or University Math Help\Other Advanced Topics section. Just a heads-up.

I have a couple of questions: presumably you're trying to find the CMC, or the intersection of these two fitted lines. Do you have any bounds on the location? For example, do you always know that this many data points correspond to the first line, and the next however many data points correspond to the second line? Or perhaps do you always know that the CMC occurs after a particular value of m or before a particular value of m?

Or here's another question: do you always know that the function is concave down?

Here's one algorithm:

1. Take clusters of 5 contiguous data points (or so). Fit parabolas to them.
2. Take the second derivatives of each of the fitted polynomials (this equals twice the coefficient of the $x^{2}$ term).
3. Find the most negative of these second derivative values. The location of the corresponding cluster is your CMC estimate.
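The three steps above can be sketched in Python with NumPy. This is only a minimal illustration, not a tested analysis tool: the function names, the window size of 5, and the synthetic data (two straight lines meeting at m = 4) are assumptions made up for the example, and the curve is assumed to be concave down.

```python
import numpy as np

def local_second_derivatives(m, k, window=5):
    """Fit a parabola to each run of `window` contiguous points.

    Returns the centre concentration of each window and the second
    derivative of its fitted parabola (twice the x^2 coefficient).
    """
    m = np.asarray(m, dtype=float)
    k = np.asarray(k, dtype=float)
    centres, curvatures = [], []
    for start in range(len(m) - window + 1):
        xs = m[start:start + window]
        ys = k[start:start + window]
        x0 = xs.mean()
        # Centre x before fitting to keep the least-squares problem well conditioned.
        a, b, c = np.polyfit(xs - x0, ys, 2)  # a*x^2 + b*x + c
        centres.append(x0)
        curvatures.append(2.0 * a)
    return np.array(centres), np.array(curvatures)

def estimate_cmc(m, k, window=5):
    """Return the window centre where the fitted second derivative is most negative."""
    centres, curvatures = local_second_derivatives(m, k, window)
    return centres[np.argmin(curvatures)]

# Hypothetical, noise-free data: slope 2 below m = 4, slope 0.5 above it.
m = np.linspace(0.0, 10.0, 41)
k = np.where(m < 4.0, 2.0 * m, 8.0 + 0.5 * (m - 4.0))
print("estimated CMC:", estimate_cmc(m, k))
```

With noisy data a larger window trades resolution for stability, so it is worth trying a few window sizes and checking that the estimate does not jump around.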

Here's another algorithm:

1. Rotate all of your data points through an angle $\theta$ such that the y-component of the first data point is equal to the y component of the last data point.
2. Run a peak-finding routine to find the maximum value of the rotated data points. Keep track of the index of the data point corresponding to this maximum.
3. Rotate all the data points through an angle $-\theta$ to get back to your starting point. The x and y component of the indexed data point is fairly close to the CMC. You'd have to analyze the possible error here, because of the resolution of your data points.
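A rough Python sketch of the rotation idea follows. It assumes a concave-down curve, and it adds one step that is not in the description above: both axes are rescaled to [0, 1] first, because concentration and conductivity have different units and an unscaled rotation would be dominated by whichever axis has the larger numbers. The function name and the synthetic data are hypothetical. Since the peak is tracked by index, the rotation back is just a lookup in the original arrays.

```python
import numpy as np

def cmc_by_rotation(m, k):
    """Index and coordinates of the point farthest above the first-to-last chord."""
    m = np.asarray(m, dtype=float)
    k = np.asarray(k, dtype=float)
    # Assumption (not in the original steps): rescale both axes to [0, 1].
    # This requires m[-1] != m[0] and k[-1] != k[0].
    x = (m - m[0]) / (m[-1] - m[0])
    y = (k - k[0]) / (k[-1] - k[0])
    # Angle that brings the last point down to the same height as the first.
    theta = -np.arctan2(y[-1] - y[0], x[-1] - x[0])
    # y-component after rotating every point through theta.
    y_rot = x * np.sin(theta) + y * np.cos(theta)
    idx = int(np.argmax(y_rot))  # peak of the rotated data
    return idx, m[idx], k[idx]

# Hypothetical, noise-free data: slope 2 below m = 4, slope 0.5 above it.
m = np.linspace(0.0, 10.0, 41)
k = np.where(m < 4.0, 2.0 * m, 8.0 + 0.5 * (m - 4.0))
idx, m_cmc, k_cmc = cmc_by_rotation(m, k)
print("index:", idx, "m:", m_cmc, "k:", k_cmc)
```

The answer can only land on a measured point, so the resolution is limited by the spacing of the concentrations, as noted in step 3.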

3. Acbeet,
Thank you for your attention! Hmm, maybe it's a good idea to move it, but only a mod can do that, right?

Answering your questions: no, I don't have any bounds on the location; the CMC could be anywhere. Before I get the results I can make an estimate to define the range of concentration (m) I'll study, but sometimes I get it wrong. And yes, here the function is always concave. In general it might not be, but in the particular case I'm studying it is!

As I said, I have some problems with math; I only work well in Excel. Can you give me an example or post a link to a tutorial for the first algorithm?

I found a paper today where they plot (∂K/∂m) vs. the square root of m, but I think that with these very smooth curvatures it is also difficult to get the CMC with a small error.

Here's an example of my data: Exemplo.xlsx.zip. If you could give me an example it would be great!!!

Thank you!

4. Thank you for your answers. Excel might be powerful enough to do what we're talking about. I would probably use LabVIEW if it were up to me. You might need to find a CS or engineering buddy to code this up for you, if Excel can't do what we're talking about.

Can you please give me another example that's really easy to check? I'm thinking of, say, the first example in the OP. If you could please give me the actual data for an example like that, it would be most helpful.

5. Yeah..sure, here it is: example 2 .xlsx.zip

Yeah, I'm trying to find a buddy at my university from engineering, maths or CS, but they are all doing their final exams, so it's difficult for them to find free time to help me. I need to present this project in 2/3 weeks and I would like to impress my teacher! Let's see what I can learn here and do alone.

6. Taking a step back for a second, I'd say that the second example is highly problematic. Are you sure that the curve-fitting method exhibited in the first example is well-suited to getting useful information? The curves you have there don't seem to exhibit the same behavior as the initial ones. It looks like some other process is screening the phenomenon of interest.

If I understand correctly, you are saying that the fit in the file "example 2", sheet "dados", is more problematic? No. What we want here is the intersection, and this plot has two straight lines; with their equations we easily get the intersection with low error.

In the file "exemplo" we have a curve: the variation of K at the CMC is much smoother, and there it's very difficult to get the "intersection", which in reality isn't an intersection but a "great variation" or something like that, right?

Or maybe I didn't understand what you are saying. If so, sorry.

8. Originally Posted by Scheele
If I understand correctly, you are saying that the fit in the file "example 2", sheet "dados", is more problematic? No. What we want here is the intersection, and this plot has two straight lines; with their equations we easily get the intersection with low error.

In the file "exemplo" we have a curve: the variation of K at the CMC is much smoother, and there it's very difficult to get the "intersection", which in reality isn't an intersection but a "great variation" or something like that, right?

Or maybe I didn't understand what you are saying. If so, sorry.
He is saying that the error in the answer will be so large that for all practical purposes it will be useless. You are attempting to find the position of the peak in the second derivative, which may not exist, so you would just be looking at noise (and in the end numerical derivative estimators are noise amplifiers).
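To make the noise-amplification point concrete, here is a small worked estimate. It assumes evenly spaced concentrations with step $h$ and independent measurement noise of variance $\sigma^2$ on each $k_i$; both are assumptions for illustration, not properties of the posted data. The central second-difference estimate and its noise variance are

$$k''(m_i) \approx \frac{k_{i+1} - 2k_i + k_{i-1}}{h^2}, \qquad \operatorname{Var}\!\left[\frac{k_{i+1} - 2k_i + k_{i-1}}{h^2}\right] = \frac{(1 + 4 + 1)\,\sigma^2}{h^4} = \frac{6\sigma^2}{h^4}.$$

So halving the spacing between points multiplies the noise variance of the second-derivative estimate by 16, while the true second derivative stays the same size.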

CB

9. If the CMC is defined in terms of these two lines intersecting, then I'm saying that it will be very difficult (and error-prone) to fit those two lines to a smooth curve such as the third graphic in the OP.

There are ways around the error-prone nature of derivative-based algorithms. Line or curve-fitting is robust with respect to noise, because you're summing. So, for example, you could fit a parabola to the entire curve and find its second derivative, and maximize that (of course, the second derivative of a parabola opening up or down is a constant, so that's an easy problem). Then, since you did the curve-fitting first, the noise problem is alleviated before you take a derivative.
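One way to lean on fitting rather than differencing is sketched below. Note that this is a different technique from the whole-curve parabola mentioned above: it fits two least-squares lines for every possible split of the data, keeps the split with the smallest total squared residual, and intersects the two lines. The function name, the `min_points` parameter, and the synthetic noisy data are all assumptions made for illustration, and the sketch only makes sense when two straight segments are a reasonable model of the curve.

```python
import numpy as np

def two_line_cmc(m, k, min_points=3):
    """Scan all splits, fit a line to each side, and intersect the best pair."""
    m = np.asarray(m, dtype=float)
    k = np.asarray(k, dtype=float)
    best = None
    for split in range(min_points, len(m) - min_points + 1):
        left_m, left_k = m[:split], k[:split]
        right_m, right_k = m[split:], k[split:]
        s1, i1 = np.polyfit(left_m, left_k, 1)    # slope, intercept of left line
        s2, i2 = np.polyfit(right_m, right_k, 1)  # slope, intercept of right line
        sse = (np.sum((left_k - (s1 * left_m + i1)) ** 2)
               + np.sum((right_k - (s2 * right_m + i2)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, s1, i1, s2, i2)
    _, s1, i1, s2, i2 = best
    if np.isclose(s1, s2):
        raise ValueError("fitted lines are parallel; no usable intersection")
    m_cmc = (i2 - i1) / (s1 - s2)  # solve s1*m + i1 = s2*m + i2
    return m_cmc, s1 * m_cmc + i1

# Hypothetical data: slope 2 below m = 4, slope 0.5 above it, plus small noise.
rng = np.random.default_rng(0)
m = np.linspace(0.0, 10.0, 41)
k = np.where(m < 4.0, 2.0 * m, 8.0 + 0.5 * (m - 4.0)) + rng.normal(0.0, 0.05, m.size)
m_cmc, k_cmc = two_line_cmc(m, k)
print("estimated CMC:", m_cmc, "conductivity there:", k_cmc)
```

Because every line is fitted to many points, the noise is averaged before anything is intersected. On a very smooth curve the best split may be poorly defined, which is the difficulty raised earlier in the thread.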