Spectroscopy: Multiple Linear Regression Problem

• Jun 7th 2010, 07:49 PM
BornAHorn
Hello All,

This is my first post on this forum and I would appreciate any feedback you all can provide. I am a graduate student in Texas. The problem below can be solved with very expensive proprietary software that, frankly, we cannot afford. I have tried to solve the problem in MATLAB but have so far been unsuccessful.

Here is an abbreviated description of my problem:

I am trying to calculate a coefficient "vector" with one value per time point. Each spectrum (which has a large number of data points) is multiplied by it point by point, and the products are summed to give a single number that should approach that spectrum's known final value. The same coefficient file should predict the final values of many spectra with an R-squared of about 0.95 (or lower, if the spectra do not yield an easy solution).
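The R-squared target can be checked with a few lines of code. This is a minimal sketch, assuming Python with NumPy in place of MATLAB; the predicted values are made up for illustration and are not from the poster's data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Actual oil masses from the post; the predictions are hypothetical.
actual = [0.4, 0.5, 0.6]
predicted = [0.41, 0.49, 0.60]
score = r_squared(actual, predicted)
meets_target = score >= 0.95
print(f"R^2 = {score:.3f}")
```

On the spectra used to fit the coefficients, R-squared can be inflated (with fewer spectra than time points it can reach exactly 1), so it is more informative when computed on spectra held out of the fit.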

An example that might help:
Time     Spectrum 1   Spectrum 2   Spectrum 3
0        58.1213      68.1243      75.0123
1        58.0124      68.0457      74.9856
2        57.9854      68.0012      74.9754
...      ...          ...          ...
10000    2.1242       1.2144       0.2121

Spectrum 1 contains 0.4 grams of oil, spectrum 2 contains 0.5 grams, and spectrum 3 contains 0.6 grams. The spectra are not linear and have many peaks and valleys (they are NMR spectra, to be more specific).

The calculated coefficient file provides one value for each time point. It is found by matching each spectrum's predicted final number to its actual value (i.e. the grams of oil) with the highest possible confidence of fit. A sample coefficient file:
Time     Coefficient value
0        0.00005
1        0.00005
2        0.00004
...      ...
10000    -0.0001

The result for Spectrum 1 is then calculated as:
Time     Coefficient x Data point = Result
0        0.00005     x 58.1213    = 0.0029
1        0.00005     x 58.0124    = 0.0029
2        0.00004     x 57.9854    = 0.0023
...      ...         x ...        = ...
10000    -0.0001     x 2.1242     = -0.0002

The sum of all the results for this spectrum should equal 0.4 grams of oil.
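The calculation in the tables is an element-wise product followed by a sum, i.e. a dot product between the coefficient vector and the spectrum. A minimal sketch, assuming Python with NumPy rather than the poster's MATLAB setup, and using only the rows actually shown in the post: the elided rows are not filled in, so this partial sum is nowhere near 0.4. Stacking the spectra as rows of a matrix turns the whole calculation into one matrix-vector product.

```python
import numpy as np

# Only the time points listed in the post (0, 1, 2 and 10000).
coefficients = np.array([0.00005, 0.00005, 0.00004, -0.0001])
spectrum_1 = np.array([58.1213, 58.0124, 57.9854, 2.1242])

# Element-wise products: the "Result" column of the table.
products = coefficients * spectrum_1
# Predicted oil mass = sum of the products = dot product.
predicted_mass = products.sum()
print(np.round(products, 4))
print(predicted_mass)

# All three spectra at once: one spectrum per row, same four time points.
A = np.array([
    [58.1213, 58.0124, 57.9854, 2.1242],
    [68.1243, 68.0457, 68.0012, 1.2144],
    [75.0123, 74.9856, 74.9754, 0.2121],
])
predicted_all = A @ coefficients   # one predicted mass per spectrum
print(predicted_all)
```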

I am not sure whether this can be solved by multiple linear regression or by matrix algebra. I have some experience in MATLAB (and some other math programs) and have no problem downloading recommended software. I can provide full-length example data (and results) if anyone is interested.

Thanks,
Patrick
• Jun 8th 2010, 02:06 AM
CaptainBlack

Let A be a matrix the rows of which are your spectra, let Y be the column vector of the oil mass corresponding to the rows of A, and let X denote the required weight vector.

So:

AX=Y.

Now if there are more rows than columns (more spectra than time points), MATLAB will give a least-squares fit with:

X = A\Y

(left division)
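A minimal sketch of the overdetermined case in Python/NumPy, on synthetic data made up for illustration (the sizes, the weights `x_true` and the noise level are all assumptions). `np.linalg.lstsq` returns the least-squares solution; for a full-rank A this is the same solution MATLAB's `A\Y` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_points = 50, 5                  # more rows than columns
A = rng.normal(size=(n_spectra, n_points))   # hypothetical spectra, one per row
x_true = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
# Hypothetical "oil masses" with a little measurement noise.
Y = A @ x_true + 0.01 * rng.normal(size=n_spectra)

# Least-squares weight vector (NumPy counterpart of X = A\Y here).
X, residuals, rank, singular_values = np.linalg.lstsq(A, Y, rcond=None)
print(X)
```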

If there are fewer rows than columns, then in general X is underdetermined and a perfect but non-unique fit can be obtained, but I don't think that is what you require.
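The non-uniqueness can be illustrated in Python/NumPy. In this sketch the spectra are synthetic random numbers (an assumption; only the three oil masses come from the post). The pseudoinverse gives the minimum-norm exact solution, and adding any null-space vector of A gives a different X that fits just as perfectly. In MATLAB the analogous call would be `pinv(A)*Y`; `A\Y` is not guaranteed to return the minimum-norm solution in this case.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 10))    # 3 hypothetical spectra, 10 time points each
Y = np.array([0.4, 0.5, 0.6])   # oil masses from the post

# Minimum-norm exact solution via the pseudoinverse.
X_min = np.linalg.pinv(A) @ Y

# The last rows of Vt span the null space of A (A has rank 3 here),
# so adding any multiple of them leaves A @ X unchanged.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]
X_other = X_min + 10.0 * null_vec

print(np.allclose(A @ X_min, Y), np.allclose(A @ X_other, Y))  # True True
```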

I suspect that there are some extra constraints on X that you have not told us about.

CB
• Jun 8th 2010, 12:09 PM
GeoC
It seems like you need to use singular value decomposition (SVD) to find the principal components of the time-varying dataset. Two sets of vectors will emerge: one represents the component spectra and the other the time-dependent concentrations of the components. You should be able to plot the eigenvalues in order of decreasing magnitude and see a break in the trend line. This tells you the number of components in the sample.
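A minimal sketch of the SVD idea in Python/NumPy, on a synthetic two-component dataset (peak positions, widths, concentration profiles and noise level are all made up for illustration). Rows of `D` are time points and columns are spectral channels. The singular values (square roots of the eigenvalues of D^T D) drop sharply after the second one. Instead of plotting them, the sketch uses a crude threshold against the noise floor, which is an assumption rather than a standard rule.

```python
import numpy as np

rng = np.random.default_rng(2)
n_times, n_channels = 40, 200
channels = np.arange(n_channels)

# Two hypothetical pure-component spectra (Gaussian peaks).
spectra = np.vstack([np.exp(-((channels - 60) / 8.0) ** 2),
                     np.exp(-((channels - 140) / 12.0) ** 2)])
# Their hypothetical time-dependent concentrations.
t = np.linspace(0.0, 1.0, n_times)
conc = np.column_stack([np.exp(-3 * t), 1 - np.exp(-3 * t)])
D = conc @ spectra + 1e-3 * rng.normal(size=(n_times, n_channels))

# Columns of U span the time profiles and rows of Vt span the spectra
# (as orthogonal combinations, not the pure components themselves).
U, s, Vt = np.linalg.svd(D, full_matrices=False)
print(np.round(s[:5], 3))

# Crude "break" detection: count singular values far above the noise floor.
n_components = int(np.sum(s > 10 * s[-1]))
print(n_components)
```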

If two or more components change in the sample with an identical time profile throughout the entire time period, then one of the component vectors will actually be a linear combination of the covarying components.
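The covariance point can be checked numerically. In this synthetic sketch (Python/NumPy; all spectra and time profiles are made up), components b and c share one time profile, so D has rank 2 rather than 3. The combined spectrum b + 0.5c lies in the span of the first two right singular vectors, while b alone does not, so SVD cannot separate b from c.

```python
import numpy as np

channels = np.arange(200)
spec_a = np.exp(-((channels - 50) / 6.0) ** 2)
spec_b = np.exp(-((channels - 120) / 6.0) ** 2)
spec_c = np.exp(-((channels - 170) / 6.0) ** 2)

t = np.linspace(0.0, 1.0, 30)
profile_1 = np.exp(-2 * t)
profile_2 = t ** 2
# Components b and c follow the identical time profile (profile_2).
D = (np.outer(profile_1, spec_a)
     + np.outer(profile_2, spec_b)
     + np.outer(profile_2, 0.5 * spec_c))

rank = np.linalg.matrix_rank(D)
print(rank)  # 2, not 3

_, s, Vt = np.linalg.svd(D, full_matrices=False)
basis = Vt[:2]  # the two component vectors SVD can recover

def in_span(v):
    """True if v lies in the span of the first two right singular vectors."""
    return np.allclose(basis.T @ (basis @ v), v)

print(in_span(spec_b + 0.5 * spec_c), in_span(spec_b))  # True False
```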

MATLAB is very well equipped to do this. Look up principal component analysis (PCA). Alternatively, there is a paper published in Science around 1995 by Coulston et al. that used this approach to look at the changing oxidation states of vanadium phosphate catalysts during butane oxidation.