How to find prime numbers with Python
One of the first things a beginning programmer should try is code for testing the primality of a given number, or getting a list of primes. A prime is a number only divisible by itself and one. (Note that divisibility here means divisible without a remainder…)
Searching for big primes can get very subtle. This piece will only scratch the surface of primes, but it’s still a good challenge which sharpens our thinking. The code we will use will evolve through the article, so please read sequentially from the start.
There are two naive methods to find primes. First the Divisor Test which is useful for checking the primality of a given number, or small collection of numbers, second the Prime Sieve which is faster to generate lots of primes.
The Divisor Test
If we have a number, n, then the first immediate way to test if it is prime is to divide every number less than n into n and if none divide then answer Yes. If any divide then answer No. This is a candidate for using a loop. What about:
startnumber=7 startnumber*=1.0 prime = True for divisor in range(2,startnumber): if startnumber/divisor==int(startnumber/divisor): prime=False if prime==False: print startnumber," is not prime" else: print startnumber, "is prime"
When Python divides two numbers of type integer it returns an integer which is the result of the division rounded down. To get the answer in decimal we have to set the type of at least one of the numbers to a float type by multiplying by 1.0.
7.0 is prime
So that works for 7 but there is a lot of refining still to do. We can tighten up the loop as follows: Look at each number the loop tries as the divisor: 2,3,4,5,6… It is apparent that there’s not much point in dividing 7 by 6 since we know that 6 is too big to divide 7. In fact n-1 will only ever divide n when n=2. so what do we want the loop to count up to ? The square root of n. If there are no divisors below this value there aren’t any at all. This will make the code quicker, especially when we plug in some big numbers later.
startnumber=7 startnumber*=1.0 prime = True for divisor in range(2,startnumber**0.5+1): if startnumber/divisor==int(startnumber/divisor): prime=False if prime==False: print startnumber," is not prime" else: print startnumber, "is prime"
The +1 in the range statement is necessary because python would stop counting at int(n**0.5)-1 otherwise.
Time for more improvements. Let’s turn the code into a function so we can combine it with other code more easily.
def isprime(startnumber): startnumber*=1.0 prime = True for divisor in range(2,startnumber**0.5+1): if startnumber/divisor==int(startnumber/divisor): prime=False return prime
…this can be shortened too. What’s this business of creating the variable, “prime” ? It isn’t necessary. Like removing excess weight from a racing car we can chop out the padding and have:
def isprime(startnumber): startnumber*=1.0 for divisor in range(2,int(startnumber**0.5)+1): if startnumber/divisor==int(startnumber/divisor): return False return True
This is getting nicely streamlined. As soon as a divisor is found we break out of the loop, if no divisors are ever found we know n is prime so the function returns True once the loop has terminated. Now we can call this function from a loop and generate a list of primes by adding
for a in range(2,100): if isprime(a): print a
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
OK, fine. This will work in reasonable time for quite big numbers, which could be fun. To find all the primes under 10,000 took about 20 seconds on my pc. The Prime Sieve is faster for lots of primes so read on.
Looking again at what that loop is doing we see that it is using even divisors to test. Well if n isn’t divisible by 2 it cant be divisible by any even number so let’s stop it testing those.
def isprime(startnumber): startnumber*=1.0 if startnumber/2==int(startnumber/2) and startnumber!=2: return False for divisor in range(3,int(startnumber**0.5)+1,2): if startnumber/divisor==int(startnumber/divisor): return False return True
We had to add a new line testing solely for 2, but its outside the loop so won’t slow us down much. Also this function might return False for n == 2, so we’ve added another condition that stops this. Good, but there’s more.
The division operation inside the loop could be trimmed down. At the moment, within the loop, there are two division operations and an int(n) operation. All divisions in the function could be changed to single modulus operations, using Python’s % operator.
def isprime(startnumber): startnumber*=1.0 if startnumber%2==0 and startnumber!=2: return False for divisor in range(3,int(startnumber**0.5)+1,2): if startnumber%divisor==0: return False return True
Whatever code is inside the loop will be repeated lots of times. Thus trimming and speeding up this region of the code is essential to shorten runtime. It is fun to evolve code like this, don’t you think? This is more how a real programmer might work, happy the person who can write down perfect code immediately, for me it tends to evolve, as I have tried to show here in these simple examples.
Our final evolutionary step is as follows. We still have redundant calculations going on in our code, for instance although we only check odd numbers > 2 now we still check for divisibility by a lot of numbers that have prime factors in common, such as checking for divisibility by 5 which is good but later checking for 15 which is a wasted task. We start by observing that numbers of the form 6n+1 or 6n-1 are usually prime. In fact this formula generates all the primes > 3 as well as a few composites. Fewer composites than above though so its still an improvement. We check for divisibility separately by 2 then 3, then by 6n+1 and 6n-1 for all n until 6n+1>= root n
def isprime(n): n*=1.0 if n%2==0 and n!=2 or n%3==0 and n!=3: return False for b in range(1,int((n**0.5+1)/6.0+1)): if n%(6*b-1)==0: return False if n %(6*b+1)==0: return False return True
Note that the best way of testing speed is to use the time module in Python. The shortest code isn’t always the fastest either, this last code is quicker than our earlier attempts but slightly longer. If the program is only occupying a few bytes of memory (space complexity) then speed (time complexity) is clearly more important…
The Prime Sieve
If you want lots of primes then there is a faster way, though. Start with a collection of all the natural numbers and remove all the even ones. Then remove all multiples of 3, then 5 and so on through the earlier primes. After you have taken out all possible multiples what remains must be the primes, since they are not multiples. The code for this works quicker when you need a collection of the first n primes, for large n. Here’s a naive yet short Sieve of Eratosthenes in python. This one removes the composites from the list in situ until there are none left and only primes remain.
A flaw is that many numbers are checked more than once.
a=list(range(2,1000)) for x in a: n=x while x*n <= max(a): if x*n in a: a.remove(x*n) n+=1 print(a)
How can we eliminate the redundancy ? The next one i tried uses a slightly different approach, using two lists, one of composites which is populated in the first loop and one of the primes which are every integer not in the composites, populated in the second loop.
comp= for a in range(2,1000): if a not in comp: n=a while n*a <=1000: comp.append(n*a) n+=1 primes= for c in range(2,1000): if c not in comp: primes.append(c) print(primes)
but this next one is even better since it seems to avoid any repetition. For each starting factor a temporary list of composites is generated, these are then removed from the primes list, and then the logic moves to the next prime factor in the list and repeats the same process until it’s done.
a=range(2,1000) for x in a: comp= for y in a: comp.append(x*y) for z in comp: if z in a: a.remove(z) print(a)
aha ! but there was redundancy in that one too. The first unchecked composite is n*n, but we were counting n*2,n*3, n*5 etc up to n*n, this can be remedied. Also we need to avoid checking the numbers when they are too large, so here is our final version.
a=range(2,1000) for x in a: comp= for y in range(a.index(x),len(a)): q=x*a[y] if q<=max(a): comp.append(q) else: break for z in comp: a.remove(z)
A pretty picture
Finally, here is a plot of the first thousand primes with a vertical line for each prime and blank lines for composites. It gives you a loose idea of the density of primes. Looks a bit like a giant barcode, as Arka says.
The large primes used for encryption have hundreds of digits. With what we have done here it would probably take thousands of years of computer time to verify the primality of such numbers, for those you use clever math, see for example this wikipedia page. Brute force is usually the easiest to code, but as our skills develop we will try and use insight to work in harmony with the computer. Use the wetware inside your skull for the bits it is best for, and leave the silicon to its specialty.