Hello there fellow coder!
This is my explanation of “Prefix Sums” a powerful and versatile algorithm technique.
It can be used for optimizations, parallelization or enabling other algorithms that are essential for all kinds of complex problems.
So it’s a very useful concept to understand, to know where and how to use it. For this post you better be familiar with Big O notation as I use it to talk about time complexity.
Trust me, it’s much easier than it sounds: given a list of numbers A, create a new list P of numbers on which each element is the sum of all the numbers from A prior to its index.
If you are a visual person, this is what it is (A is input and P is the output):
If you are a mathematical person, this is what it is:
Given A = [a<sub>0</sub>, a<sub>1</sub>, a<sub>2</sub>, ... , a<sub>n-1</sub>], of size <i>n > 0</i>
Prefix Sums is
P = [a<sub>0</sub>, (a<sub>0</sub> + a<sub>1</sub>), (a<sub>0</sub>+a<sub>1</sub>+a<sub>2</sub>), ... , (a<sub>0</sub>+a<sub>1</sub>+ ... + a<sub>n-1</sub>)]
Or in other words:
P[x] = P[x-1] + A[x]
Like I said, easy huh?
There are many ways of implementing this technique but I like this one because is quite readable:
As you can see, its time complexity is linear, O(n). Now that the concept is clear, let’s use it.
Imagine that you have to calculate many many times the sum of parts of an array of numbers. It could be for displaying a heat map or the average stock price. Our example just calculates the sum of all sub-arrays/slices of 5 elements from an initial big array.
The most straight forward way to do it is by repeatedly do the iterative sum as many time as needed. Maybe like so:
function sumArrayLinear(A, start, end) from the previous Gist can be represented as:
sumArrayLinear(A,x,y): A<sub>x</sub> + A<sub>x+1</sub> + ... A<sub>y-1</sub> + A<sub>y</sub>
And we can apply it like so:
sumArrayLinear([2,3,7,5],1,3): A<sub>1</sub> + A<sub>2</sub> + A<sub>3</sub> = 3 + 7 + 5 = 15
The time complexity of our example is O(n x m). Which is not bad.
But now you have another tool available for your disposal: Prefix Sums.
Let’s put it to use.
We calculate the Prefix Sums array with the numbers array. Then we define a new function which takes the Prefix Sums array as follows:
sumPrefixSumArray(P,x,y): P<sub>y</sub> - P<sub>x-1</sub>
Which basically reads as: The sum of the interval is the sum at the end (y) minus the sum of everything before the start of the interval (x-1). So with just one operation we obtain the result. Its time complexity is constant, O(1). Let’s try with our example.
The Prefix Sum of
A = [2,3,7,5] is
P = [2,5,12,17] .
sumPrefixSumArray(P,1,3): P<sub>3</sub> - P<sub>1-1</sub> = 17 - 2 = 15
Which is the same result as before, therefore:
A<sub>x</sub> + A<sub>x+1</sub> + ... + A<sub>y-1</sub> + A<sub>y</sub> = P<sub>y</sub> - P<sub>x-1</sub>
We can rewrite our code to use our new function:
The total time complexity is O(n) + m x O(1) which is simply O(n+m). And that’s pretty good! Of course, it all depends on the values of m & n. For our example, the first approach needs roughly 140 operations to obtain the resulting array while the seconds needs only 61.
There you have it!
The Prefix Sums technique for all sorts of other more complex problems. A powerful simple idea.
Was I clear? Spotted any problems or bugs? Do you know other examples where Prefix Sums can be used? Speak up on the comments bellow.