# Thread: Algorithm Analysis and Summations

1. ## Algorithm Analysis and Summations

Hi there,

This is for an assignment, but I'm having trouble with some basic examples.
I need to calculate the actual number of basic operations from the following algorithm and the big-o complexity:

Code:
for i = 1..n
for j = 1..i
for k = 1..i
x[j][i]++;
A pretty simple little algorithm.

My basic idea is to set up a summation like this:

$\sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=1}^{i} 1$

I guess first up, does this summation look correct, or am I waaay off. (I'm taking an algorithm course, but my discrete math is very fuzzy).

(I'll have follow ups on how to properly execute the summation, because I'm having some trouble with that as well. But no use continuing that if my summation is wrong to begin with.)

thanks.

2. Originally Posted by mc72
Hi there,

This is for an assignment, but I'm having trouble with some basic examples.
I need to calculate the actual number of basic operations from the following algorithm and the big-o complexity:

Code:
for i = 1..n
for j = 1..i
for k = 1..i
x[j][i]++;
A pretty simple little algorithm.

My basic idea is to set up a summation like this:

$\sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=1}^{i} 1$

I guess first up, does this summation look correct, or am I waaay off. (I'm taking an algorithm course, but my discrete math is very fuzzy).

(I'll have follow ups on how to properly execute the summation, because I'm having some trouble with that as well. But no use continuing that if my summation is wrong to begin with.)

thanks.
Well that will give you the operations count for the snippet (ignoring loop overheads), but are you sure the snippet is what is intended (it is an extravagant way of incrementing x[j][i] by i)

CB

3. They are just excercises to learn the fundementals of calculating basic operations so they aren't designed to be efficient algorithms.

So you are saying based on that snippet, that summation is correct?

Thanks.

4. Your summation is correct but do you know how to evaluate it? There is a nice formula for this sum. Can you at least see the pattern that develops?

In fact if you were to put a code like that through a C compiler there is some chance it would completely optimise the inner loop away into x[j][i] += i.

5. I think it evaluates down to

$\sum_{i=1}^{n}{i}^2$

which would get down to

$\approx\frac{1}{3}n^3$ ?

6. Yes, you have the right thing to sum, and you have the order of n and the O constant right.
The exact formula for the sum of the first n squares is 1/6(n)(n+1)(2n+1), which you can prove by induction.
I've been a professional computer programmer for a while but was not taught how to do algorithm analysis at university as I studied maths rather than computer science. I'm sure you are learning a really useful skill here, but please be aware that theoretical and actual results for an algorithm can be very different in practice. Simple algorithms that are not theoretically best often are best in practice because the compiler can optimise them better, and because special features of the CPU (e.g. branch prediction, cache) will usually work better with a simple algorithm.

7. Originally Posted by alunw
Simple algorithms that are not theoretically best often are best in practice because the compiler can optimise them better, and because special features of the CPU (e.g. branch prediction, cache) will usually work better with a simple algorithm.
In the most demanding applications you do in fact not let the complier have the last word, but the processor specific libaries are hand tuned.

CB

8. ## Tuning code.

Originally Posted by CaptainBlack
In the most demanding applications you do in fact not let the complier have the last word, but the processor specific libaries are hand tuned.

CB
Unfortunately I have never worked on such applications, though I recently experimented (not as part of my day job) with the latest Intel MKL (math kernel library) and found that it could multiply large matrices about 10* faster than my best effort could - and my code was already close to the performance of an older version of MKL and much better than a naive multiplication algorithm, so I can certainly vouch for the fact that a library hand-tuned by experts can have outstanding performance. Most modern compilers can profile code so that after a test run the compiler can optimise the code further using the information in the profile, and compilers that can parallelize algorithms based on hints provided by the programmer (using OpenMP) are becoming mainstream, though probably only a small minority of programmers both need and are capable of using these features well. Except occasionally for some small functions it must be very difficult these days to write assembly language that will out-perform well optimised and carefully written C or C++ code.

9. Originally Posted by alunw
Unfortunately I have never worked on such applications, though I recently experimented (not as part of my day job) with the latest Intel MKL (math kernel library) and found that it could multiply large matrices about 10* faster than my best effort could - and my code was already close to the performance of an older version of MKL and much better than a naive multiplication algorithm, so I can certainly vouch for the fact that a library hand-tuned by experts can have outstanding performance. Most modern compilers can profile code so that after a test run the compiler can optimise the code further using the information in the profile, and compilers that can parallelize algorithms based on hints provided by the programmer (using OpenMP) are becoming mainstream, though probably only a small minority of programmers both need and are capable of using these features well. Except occasionally for some small functions it must be very difficult these days to write assembly language that will out-perform well optimised and carefully written C or C++ code.
It can be easier than you might think, using the assember output from the compiler as a starting point for hand optimisation would alow hand tuning to proceed, and then any improvement will trump the compiler (assuming the optimiser has not already scrambled the code to such an extent that it is completely incomprehensible).

But none of this negates the value of algorithm analysis, since generally at best we are talking about changing the proportionality constant not the order of growth or the complexity class of the time complexity, and it is the rate of growth that is often the killer.

(as an aside; you can often trade time for space complexity and I'm not sure the compiler optimisation would allways be aware of such a possibility; well actually I am sure that the compiler will not be aware of the possibility of short cutting an algorithm by storing large look-up tables and similar techniques)

CB

10. Originally Posted by CaptainBlack
It can be easier than you might think, using the assember output from the compiler as a starting point for hand optimisation would alow hand tuning to proceed, and then any improvement will trump the compiler (assuming the optimiser has not already scrambled the code to such an extent that it is completely incomprehensible).
I think you would only want to do this for code that would be unlikely to change in future, as otherwise it would be a big maintenance headache.

I agree with you about look-up tables - they are often extremely useful for speeding up a program, for example they are useful to avoid doing divisions to calculate remainders when you are calculating in a finite field.

I also agree with you about the importance of analysing algorithms, and that rate of growth can be a killer - all too often programs are tested on small data-sets and have good performance, but turn out to be slow on real-life data. Even so, if you know that are certain array will always be small you would be justified in using an algorithm which does not perform well on large ones, though probably you should put a comment to warn future programmers in case something changes. But one must also consider other factors : for example, what if the faster algorithm is less numerically stable? Also you have to be sure your analysis does not neglect something that turns out to be important : a while ago I wrote some software to do computation on finitely presented groups using rewriting systems. These are supposed to be able to put an arbitrary word into a canonical form in time that is linear in the length of the word. But I found I was able to produce certain pathological examples where even quite short words would require billions of operations to do this, so that word reduction could take an hour or more. When I investigated more closely it turned out that almost all the execution time was taken up moving elements of the word around, something which the usual analysis neglects. Naturally when I went to the trouble of changing the implementation so that this copying was not necessary any more the performance improved hugely and was almost linear in the length of the word even in the pathological cases.