# Thread: Matlab - code optimization problem

1. ## Matlab - code optimization problem

Hi there,

I have a problem optimizing my code. I am not a programmer, so I can't really see how to make the computation faster, and it will take days to complete everything I need from it. I searched the web, and as I understand it, multithreading is enabled by default in newer MATLAB versions, but I still only get about 27% load on the C2Q that is running it. I also used the profiler, but it gave me nothing useful. I also tried the Parallel Computing Toolbox, but when I set
Code:
parfor b=1:n1
then this error shows up: "??? Error: File: pohpriem.m Line: 33 Column: 9
The variable d in a parfor cannot be classified."

Can someone show me how to do this correctly, or suggest other optimizations the code should have?

Here is the code:
Code:
function d=pohpriem(x, y)
matlabpool open local 2
[m1,n1]=size(x);
[m2, n2]=size(y);
c=m1-1;
i=0;
j=0;
for b=1:n1

    for l=0:c;
        for k=l+1:m1-1;
            A=regexp(y{k},'\.','split');
            D1=[str2num(A{1}) str2num(A{2}) str2num(A{3})];
            sD1= datenum(D1);
            B=regexp(y{k+1},'\.','split');
            D2=[str2num(B{1}) str2num(B{2}) str2num(B{3})];
            sD2=datenum(D2);

            if sD1 < sD2;
                break;
            else
                i=i+1;
                j=j+1;
                z(j,i)=log(x(m1-l,b))-log(x(m1-k,b));
            end
        end
    end
    [m3 n3]=size(z);
    for a=1:n3;
        l(a)=z(a,a);
        d(b,a)=l(a);
    end
    z=0;
    i=0;
    j=0;
end
d=d';

Tom.

2. Without knowing exactly what your code does and the format of the inputs, it is difficult to give any advice.

I would consider getting rid of as many for loops as possible by using MATLAB's built-in functions. This can be quite a pain, and it looks like "y" is a cell array, which makes things a little more interesting. I would try using "cellfun", which has the potential to cut down on the for loops you need. An example is:

Code:
function A = mhfExample(y)
%use default values here as example
if nargin < 1;y = {'02.12.2009' '04.05.2009' '08.09.2009'};end

A = cellfun(@(x)strDate2Serial(x),y);
%A = A';%transpose results so they display as column vector
end

function serialDate = strDate2Serial(strdt)
tmp = regexp(strdt,'\.','split');
tmp = [str2double(tmp{1}) str2double(tmp{2}) str2double(tmp{3})];
serialDate = datenum(tmp);
end
which in your code corresponds (closely) to the block:
Code:
for k=l+1:m1-1;
    A=regexp(y{k},'\.','split');
    D1=[str2num(A{1}) str2num(A{2}) str2num(A{3})];
    sD1= datenum(D1);
I would need to see some of your sample data before I can say whether this is faster or not. My function "strDate2Serial" could probably be written a little more nicely as well; I will have to think about how to optimise it some more (there must be an easier way to convert to double using regexp).

Regards Elbarto
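The date-parsing step can also be done for the whole cell array at once, without a per-element loop or cellfun. A minimal sketch, assuming every entry of y is a day.month.year string like the samples in this thread: the strings are joined with spaces and sscanf reads day, month and year repeatedly. Note that datenum called with a three-element vector expects [year month day], so the [day month year] vectors built in the first post's loop are read in the wrong order; passing year, month and day as separate arguments avoids that. MATLAB's datenum also accepts a cell array of strings plus a format such as 'dd.mm.yyyy', which may be simpler, although how that format handles one-digit days like '1.1.2000' is not verified here.

```matlab
% Toy input in the thread's d.m.yyyy format (sample values, not real data)
y = {'02.12.2009'; '04.05.2009'; '8.9.2009'};

% Join all strings with spaces, then let sscanf read day, month, year repeatedly
v = sscanf(sprintf('%s ', y{:}), '%d.%d.%d');
v = reshape(v, 3, []).';        % one row per date: [day month year]

% datenum(Y, M, D): the argument order is year, month, day
serial = datenum(v(:, 3), v(:, 2), v(:, 1));
```

Here serial is a column vector with one serial date number per string, so comparing two rows (for example serial(k) ~= serial(k+1)) no longer needs any string handling inside the loops.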

3. (In reply to jalko's first post)
What is the purpose of z? If it is a local variable, you are dynamically resizing it within the innermost loop. If possible, compute its final dimensions before entering any loop and initialise it with zeros(dim1,dim2).

Also, you appear to use only the diagonal elements of z (in fact, as far as I can tell, those are the only ones you compute). If so, compute only the diagonal elements and store them in a vector rather than a matrix.

CB
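CaptainBlack's two suggestions (preallocate, and keep a vector instead of a mostly empty diagonal matrix) combine naturally with a write counter. A minimal sketch on a hypothetical column of values that all fall on the same day: the pair count n*(n-1)/2 is known before the loop, so z is allocated once, and p always points at the last slot written.

```matlab
% Toy values that all fall on one day (hypothetical sample)
v = [1; 2; 4; 8];
lv = log(v);                    % take the logs once
n = numel(v);

r = n * (n - 1) / 2;            % number of pairs, known before looping
z = zeros(r, 1);                % preallocated vector instead of a growing diagonal matrix
p = 0;                          % index of the last slot written
for i1 = 1:n-1
    for i2 = i1+1:n
        p = p + 1;              % move to the next free slot
        z(p) = lv(i2) - lv(i1); % equals log(v(i2)/v(i1))
    end
end
```

After the loops p equals r, so every slot of z has been filled exactly once and no zeros from resizing are mixed in with genuine results.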

4. Sorry for the missing explanation: the code computes log differences for every pair of values in each column of matrix x. Vector y holds the dates for the constraint that both values of a pair must lie within the same day.

The data looks like this:
Code:
Date(vector y)| Value(matrix x)
1.1.2000| 1, 2
1.1.2000| 2, 3
1.1.2000| 5, 4
...
1.1.2009| 3, 5
2.1.2009| 8, 6
2.1.2009| 4, 7
...
Elbarto, I don't fully understand your code yet and will have to read the help to implement it, but the "datenum" function from your other advice solved my first problem, so I hope you are right again.

(In reply to CaptainBlack)
If I use, for example, data such as
Code:
y,x
1.1.2000,1
1.1.2000,2
2.1.2000,3
2.1.2000,4
3.1.2000,5
3.1.2000,6
then every outcome of loops "l" and "k" (here b=1) is stored on the diagonal of matrix z, as you wrote (1OU -> first outcome, 2OU -> second, 3OU -> third):
Code:
1OU, 0, 0
0, 2OU, 0
0, 0, 3OU
I did it this way because I need to store the answer for every value of loop "l" and the corresponding values of "k". Since I couldn't make it store the value in the first zero position of a vector, I used a matrix instead. I could use z(l+1,k), but with z(i,j) I easily know where each value is in matrix z and can tell genuine zero results apart from the zeros MATLAB puts in the empty places of the matrix. Maybe there is a way to write to the first zero position of a vector so that I can use a vector instead of the matrix, but I don't know how to do it.

I can compute the dimensions of "z" beforehand, thanks for the tip. But as I understand it, I would first need to evaluate all the date restrictions, and maybe create another dummy variable to count the combinations for every day correctly and sum them to get the dimensions of z, so I am not sure this will speed things up. But I will try.
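Sizing z does not need a pass over all pairs: with the dates sorted, as in the sample data, the rows of each day are contiguous, so it is enough to count the rows per day and add up n*(n-1)/2 for each day (the same value as nchoosek(n,2)). A minimal sketch, assuming the dates have already been converted to serial day numbers; dayNum and nCols below are hypothetical example values.

```matlab
% Hypothetical day numbers (e.g. output of datenum), one per row of x, sorted ascending
dayNum = [1; 1; 2; 2; 2; 3];
nCols = 2;                                   % number of columns in x (assumed)

starts = [1; find(diff(dayNum) ~= 0) + 1];   % first row of each day
counts = diff([starts; numel(dayNum) + 1]);  % number of rows on each day
r = sum(counts .* (counts - 1) / 2);         % nchoosek(counts(i), 2), summed over days

z = zeros(r, nCols);                         % allocate once, before any loop
```

This touches each row only once, so it should cost little next to the pair computation itself. If the dates were not sorted, [u, ia, g] = unique(dayNum) followed by accumarray(g, 1) should give the same per-day counts.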

Thanks for helping me out with this.
-------------------
I solved the problem of writing to the next free position in the vector; shame on me that I did not see it before. I kept looking for a function to do that, but just setting
Code:
i=i+1
z(i)=..
was enough.
But I would still like to make it faster, if that's possible.

5. I changed the code like this:
Code:
function d=pohpriem(x, y)
matlabpool open local 4;
[m1,n1]=size(x);
[m2, n2]=size(y);
c=m1-1;
%i=0;
%j=0;
p=0;
por=zeros(m2, 1)
parfor i=1:m2
    A=regexp(y{i},'\.','split');
    D1=[str2num(A{1}) str2num(A{2}) str2num(A{3})];
    sD1= datenum(D1);
    por(i)=sD1
    %B=regexp(y{k+1},'\.','split');
    %D2=[str2num(B{1}) str2num(B{2}) str2num(B{3})];
    %sD2=datenum(D2);

end
for b=1:n1

    for l=0:c;
        for k=l+1:m1-1;
            if por(k) < por(k+1);
                break;

            else
                p=p+1
                z(p,b)=log(x(m1-l,b))-log(x(m1-k,b));
            end
        end
    end
    p=0;

end

d=z;

It's quicker now, but I would still like a faster way, if possible. When I try to use parfor for loop "b", for example, I still get the error I mentioned in my first post (The variable in a parfor cannot be classified.). Can you show me some more ways to optimize this code?
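The parfor error is most likely about how z is indexed. Inside a parfor loop, an output array has to be "sliced": indexed by the loop variable (here b) in one subscript, with the other subscripts fixed, as in z(:, b). In z(p,b), p is a counter that depends on earlier steps, so MATLAB cannot classify z (the d(b,a) assignment in the first post probably fails for a similar reason). A minimal sketch of one way around it, on hypothetical toy data and assuming the dates are already numeric and sorted: each iteration fills a private column vector col and writes it back with a single z(:, b) = col. Separately, matlabpool was removed in R2015a; on newer releases parpool takes its place.

```matlab
% Toy data (hypothetical): sorted day numbers and a two-column x
dayNum = [1; 1; 2; 2; 2; 3];
x = [1 10; 2 20; 4 40; 8 80; 16 160; 32 320];

lx = log(x);                                  % logs once, outside the loop
starts = [1; find(diff(dayNum) ~= 0) + 1];    % first row of each day
counts = diff([starts; numel(dayNum) + 1]);   % rows per day
r = sum(counts .* (counts - 1) / 2);          % pairs per column
nCols = size(x, 2);

z = zeros(r, nCols);
parfor b = 1:nCols
    lcol = lx(:, b);              % sliced input: only column b is needed here
    col = zeros(r, 1);            % temporary, private to this iteration
    p = 0;                        % temporary write counter, reset every iteration
    for g = 1:numel(starts)
        rows = starts(g):starts(g) + counts(g) - 1;
        for i1 = rows
            for i2 = rows(rows > i1)
                p = p + 1;
                col(p) = lcol(i2) - lcol(i1);   % later row minus earlier row
            end
        end
    end
    z(:, b) = col;                % sliced output: indexed only by the loop variable
end
```

If no worker pool is available, parfor runs the iterations serially on the client, so the sketch still produces the same z; the speed-up only appears when workers are running.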

6. jalko,
can you please confirm whether I understand this correctly:

1) find all dates within 1 day of each other
2) compute ALL the combinations of differences in values for all these dates. If this is the case, I am unsure why there are 2 columns in the x vector. If you can explain this line, that would be great:
Code:
z(j,i)=log(x(m1-l,b))-log(x(m1-k,b));
Also, if you can explain simply what the output should contain (e.g. just the value differences for each date), that would be useful. It looks like you have an interesting little problem here.

I'm not sure using cellfun will be any faster, as I haven't benchmarked it yet; it would be interesting to see with a large data set. To understand my previous post, look into "anonymous functions". They are quite useful: you can usually get by without them, but if you're going to be doing more MATLAB, they make life easier in certain applications.

Elbarto
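The line asked about, z(j,i)=log(x(m1-l,b))-log(x(m1-k,b)), is a difference of two logs, which equals the log of the ratio of the two values. One consequence for speed, sketched below on a hypothetical column with the thread's reverse row indexing: log() can be applied to all of x once before the loops, instead of twice per pair, without changing the result.

```matlab
% Toy column (hypothetical); m1 rows, reverse indexing as in the thread's code
x = [1; 2; 4; 8];
m1 = size(x, 1);
b = 1; l = 0; k = 2;

direct  = log(x(m1-l, b)) - log(x(m1-k, b));  % the line from the thread
lx = log(x);                                  % computed once, before any loop
hoisted = lx(m1-l, b) - lx(m1-k, b);          % same value, no log() inside the loop
ratio   = log(x(m1-l, b) / x(m1-k, b));       % equivalent form: log of the ratio
```

With millions of pairs this removes two log() calls per pair; the remaining work inside the loop is a single subtraction.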

7. 1) and 2): yes.
z had two dimensions because I have two "for" loops: one for the starting date and a second one for all remaining combinations within the current day with the value from the first loop. Because, as was pointed out here, this caused a lot of problems, I changed the code to this:
Code:
function d=pohpriem(x, y)
matlabpool close
matlabpool open local 4;
[m1,n1]=size(x);
[m2, n2]=size(y);
c=m1-1;
r=0;
p=0;
q=0;
por=zeros(m2, 1);
parfor i=1:m2
    A=regexp(y{i},'\.','split');
    D1=[str2num(A{1}) str2num(A{2}) str2num(A{3})];
    sD1= datenum(D1);
    por(i)=sD1
end

for j=1:m2-1;
    q=q+1;
    if por(j)~=por(j+1);

        h=nchoosek(q, 2);
        r=r+h;
        q=0;
    end
end
h=nchoosek(q+1, 2);
r=r+h
z=zeros(r,n1);
for b=1:n1

    for l=0:c;
        for k=l+1:m1-1;
            if por(k) < por(k+1);
                break;

            else
                p=p+1;
                z(p,b)=log(x(m1-l,b))-log(x(m1-k,b));
            end
        end
    end
    p=0;

end

d=z;

My output is the values of all these combinations of differences, and because I have millions of observations, it takes quite a lot of time to compute. I optimized the constraint-checking process (a bit); now I want to use parfor for loop "b" or "l", but I still get the error that it cannot be used because of the way "z" is used.
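Since every pair lies inside one day and the same pairs are used for every column of x, the pair loops can be replaced by one vectorized step per day that handles all columns at once, which may make parfor unnecessary. A minimal sketch of that idea on hypothetical toy data, assuming the dates are already converted to sorted serial day numbers. It follows the intent described in post 4 (log differences of every pair within a day, later row minus earlier row) rather than the exact row order of the loop above. triu(ones(n),1) builds an n-by-n matrix, so a day with a very large number of observations would need correspondingly more memory.

```matlab
% Toy data (hypothetical): sorted day numbers and a two-column x
dayNum = [1; 1; 2; 2; 2; 3];
x = [1 10; 2 20; 4 40; 8 80; 16 160; 32 320];

lx = log(x);                                  % logs of every value, computed once
starts = [1; find(diff(dayNum) ~= 0) + 1];    % first row of each day
counts = diff([starts; numel(dayNum) + 1]);   % rows per day
r = sum(counts .* (counts - 1) / 2);          % total pairs per column

z = zeros(r, size(x, 2));
p = 0;                                        % rows of z filled so far
for g = 1:numel(starts)                       % loop over days only, not over pairs
    n = counts(g);
    if n < 2
        continue;                             % a single observation gives no pair
    end
    [I, J] = find(triu(ones(n), 1));          % all within-day index pairs with I < J
    offset = starts(g) - 1;                   % row offset of this day in x
    % one vectorized step handles every pair and every column of x at once
    z(p+1:p+numel(I), :) = lx(offset + J, :) - lx(offset + I, :);
    p = p + numel(I);
end
```

The only remaining loop runs once per day, and each column of z corresponds to the same column of x.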

8. Let me sleep on that one, jalko; I will have a look tomorrow. I have no experience with parallel computing, so I'm not the best one to answer that question, but I have a few ideas I would like to try. I think I can at least cut down the use of for loops, so I will have a go tomorrow and post a quick example to check that I understand correctly what you are trying to do.

Captain Black seems to have a lot of MATLAB experience, so he might be more useful on the high-performance side of things.

Regards Elbarto

9. (In reply to elbarto)
No problem, sleep well mate.

10. Interesting: I noticed that when the value of p is not echoed to the command window (i.e. when p=p+1 ends with a semicolon), the computation is dramatically faster.