Spoiler Alert! This article reveals the answer to Project Euler problem #10.


Try it!

A few months ago I worked through the first 10 problems of Project Euler. I did OK up to problem #10. The question involved finding all the prime numbers up to 2,000,001. There were earlier questions that involved finding primes, and I cobbled together some javascript with the help of underscore.js that got the job done – albiet very slowly. My homespun solution was just too durn slow for problem #10. I turned to the internet, and it provided me a blazing fast solution. See below.

var ten = function sieve(max) {
    var D = [],
        primes = [],
        sum = 0;
    for (var q=2; q < max; q++) {
    if (D[q]) {
      for (var i=0; i < D[q].length; i++) {
        var p = D[q][i];
               
        if (D[p+q])
           D[p+q].push(p);
        else
           D[p+q]=[p];
      }
      delete D[q];
    } else {
            sum += q;
      primes.push(q);
      if (q*q<max)
         D[q*q]=[q];
    }
  }
  return sum;
 }(2000001);

So yeah, I cheated. At the time, I had no idea how this code worked, I just knew that it did. Since then, I’ve occasionally revisited this code to take another shot at figuring out what this mysterious D array is doing with its Ps and Qs. However, trace after trace and many breakpoints later I was still scratching my head.

But just yesterday I stumbled across Python Tutor. This site is simply amazing. These geniuses have developed an interactive online code editor that spits out pictures based on what your code is doing. You even get little playhead controls to step through each line! This wizardry finally allowed me to scratch the itch that’s been bugging me for a while.

I returned to the site where I ‘borrowed’ the JavaScript algorithm for the Sieve of E., and found the Python equivalent. The syntax is a bit different, but it essentially doing the same thing. The code looks like this:

def eratosthenes(maxnum):
   D = {}  # map composite integers to primes witnessing their compositeness
   q = 2   # first integer to test for primality
   while q <= maxnum:
       if q not in D:
           yield q       # not marked composite, must be prime
           D[q*q] = [q]  # first multiple of q not already marked
       else:
           for p in D[q]:  # move each witness to its next multiple
               D.setdefault(p+q,[]).append(p)
           del D[q]        # no longer need D[q], free memory
       q += 1

for p in eratosthenes(19): print p

The comments were helpful in determining that it was doing the same thing as the JS algorithm, but when I plugged this code into Python Tutor, WHAMMO! It hit me like a ton of bricks. D is a dictionary!!! D is where non-prime numbers are stored and created from prime factors.

So we iterate over all the numbers from 1 to whatever, with q as the iterator index. For each value of q, check the dictionary (D). If that value does not exist in D, then the number is prime. For each prime number, find the square and store it in the dictionary for later. So when q = 2, the first entry in D is 4. Then when q = 3, stick 9 in the dictionary. Also, for each dictionary entry created by squaring a prime, attach that prime as a child object.

When q is not prime, examine the dictionary entry. Take the child, add it to the dictionary value, and create a new entry in the dictionary for the sum. So when q = 4, its child value is 2. Take 2 + 4 and get 6. Create a dictionary entry for 6, and store the original prime factor (2) as its child. Then delete the previous dictionary entry (4). Keep this up until q is larger than your target value (2000001).

This part took me the longest to figure out, and it makes a lot more sense by skipping the deletion part. In the Python Tutor example, comment out the line del D[q]. When running the visualizer, the dictionary keeps growing and it is much easier to see how each entry is created, and why each entry is composite rather than prime. For each square value, its root is added over and over again until q reaches its limit. Deleting dictionary entries is not absolutely necessary with low values of q. It’s a memory saver that keeps D from getting out of hand (and crashing Chrome when trying this in JavaScript).

I know I did a sub-par job of explaining this, but seriously, that’s what Python Tutor is for. Try it, you’ll like it.