**Cooking with Python, Part 2**

Pages: 1, **2**

## Recipe 18.10: Computing Prime Numbers

*Credit: David Eppstein, Tim Peters, Alex Martelli, Wim
Stolker, Kazuo Moriwaka, Hallvard Furuseth, Pierre Denis, Tobias
Klausmann, David Lees, Raymond Hettinger*

### Problem

You need to compute an unbounded sequence of all primes, or the list of all primes that are less than a certain threshold.

### Solution

To compute an unbounded sequence, a generator is the natural Pythonic approach, and the Sieve of Eratosthenes, using a dictionary as the supporting data structure, is the natural algorithm to use:

```
import itertools
def eratosthenes( ):
'''Yields the sequence of prime numbers via the Sieve of Eratosthenes.'''
D = { } # map each composite integer to its first-found prime factor
for q in itertools.count(2): # q gets 2, 3, 4, 5, ... ad infinitum
p = D.pop(q, None)
if p is None:
# q not a key in D, so q is prime, therefore, yield it
yield q
# mark q squared as not-prime (with q as first-found prime factor)
D[q*q] = q
else:
# let x <- smallest (N*p)+q which wasn't yet known to be composite
# we just learned x is composite, with p first-found prime factor,
# since p is the first-found prime factor of q -- find and mark it
x = p + q
while x in D:
x += p
D[x] = p
```

### Discussion

To compute all primes up to a predefined threshold, rather than an unbounded sequence, it's reasonable to wonder if it's possible to use a faster way than good old Eratosthenes, even in the smart variant shown as the "Solution". Here is a function that uses a few speed-favoring tricks, such as a hard-coded tuple of the first few primes:

```
def primes_less_than(N):
# make `primes' a list of known primes < N
primes = [x for x in (2, 3, 5, 7, 11, 13) if x < N]
if N <= 17: return primes
# candidate primes are all odd numbers less than N and over 15,
# not divisible by the first few known primes, in descending order
candidates = [x for x in xrange((N-2)|1, 15, -2)
if x % 3 and x % 5 and x % 7 and x % 11 and x % 13]
# make `top' the biggest number that we must check for compositeness
top = int(N ** 0.5)
while (top+1)*(top+1) <= N:
top += 1
# main loop, weeding out non-primes among the remaining candidates
while True:
# get the smallest candidate: it must be a prime
p = candidates.pop( )
primes.append(p)
if p > top:
break
# remove all candidates which are divisible by the newfound prime
candidates = filter(p._ _rmod_ _, candidates)
# all remaining candidates are prime, add them (in ascending order)
candidates.reverse( )
primes.extend(candidates)
return primes
```

On a typical small task such as looping over all primes up to 8,192,
`eratosthenes`

(on an oldish 1.2 GHz Athlon PC, with
Python 2.4) takes 22 milliseconds, while
`primes_less_than`

takes 9.7; so, the slight
trickery and limitations of `primes_less_than`

can
pay for themselves quite handsomely if generating such primes is a
bottleneck in your program. Be aware, however, that
`eratosthenes`

scales better. If you need all primes
up to 199,999, `eratosthenes`

will deliver them in
0.88 seconds, while `primes_less_than`

takes 0.65.

Since `primes_less_than`

's little
speed-up tricks can help, it's natural to wonder
whether a perhaps simpler trick can be retrofitted into
`eratosthenes`

as well. For example, we might simply
avoid wasting work on a lot of *even* numbers,
concentrating on odd numbers only, beyond the initial
`2`

. In other words:

```
def erat2( ):
D = { }
yield 2
for q in itertools.islice(itertools.count(3), 0, None, 2):
p = D.pop(q, None)
if p is None:
D[q*q] = q
yield q
else:
x = p + q
while x in D or not (x&1):
x += p
D[x] = p
```

And indeed, `erat2`

takes 16 milliseconds, versus
`eratosthenes`

' 22, to get primes
up to 8,192; 0.49 seconds, versus
`eratosthenes`

' 0.88, to get primes
up to 199,999. In other words, `erat2`

scales just
as well as `eratosthenes`

and is always
approximately 25% faster. Incidentally, if you're
wondering whether it might be even faster to program at a slightly
lower level, with `q = 3`

to start, a ```
while
True
```

as the loop header, and a `q += 2`

at
the end of the loop, don't worry—the slightly
higher-level approach using
`itertools`

'
`count`

and `islice`

functions is
repeatedly approximately 4% faster. Other languages may impose a
performance penalty for programming with higher abstraction, Python
*rewards* you for doing that.

You might keep pushing the same idea yet further, avoiding multiples
of `3`

as well as even numbers, and so on. However,
it would be an exercise in diminishing returns: greater and greater
complication for smaller and smaller gain. It's
better to quit while we're ahead!

If you're into one liners, you might want to study the following:

```
def primes_oneliner(N):
aux = { }
return [aux.setdefault(p, p) for p in range(2, N)
if 0 not in [p%d for d in aux if p>=d+d]]
```

Be aware that one liners, even clever ones, are generally anything
but speed demons! primes_oneliner takes 2.9 seconds
to complete the same small task (computing primes up to 8,192) which,
`eratosthenes`

does in 22 milliseconds, and
`primes_less_than`

in 9.7--so,
you're slowing things down by 130 to 300 times just
for the fun of using a clever, opaque one liner, which is clearly not
a sensible tradeoff. Clever one liners can be instructive but should
almost never be used in production code, not just because
they're terse and make maintenance harder than
straightforward coding (which is by far the main reason), but also
because of the speed penalties they may entail.

While prime numbers, and number theory more generally, used to be considered purely theoretical problems, nowadays they have plenty of practical applications, starting with cryptography.

### See Also

To explore both number theory and its applications, the best book is
probably Kenneth Rosen, *Elementary Number Theory and Its
Applications* (Addison-Wesley); http://www.utm.edu/research/primes/ for more
information about prime numbers.

*
*

*
*

Related Reading Python Cookbook |

Return to Python DevCenter.