After my last adventure in prime testing, I decided that I should implement the Sieve of Erastosthenes and give that a whirl. There's full code at the Haskell pastebin if you're morbidly curious. But most of it is the Newton-Rapheson approximate square root routines. The entire Sieve of Erastosthenes function is only 5 lines long:
listPrimes :: Integer -> [Integer] -> Integer -> [Integer] listPrimes n knownPrimes m | m > n = knownPrimes | not (any (== 0) (map (m `mod`) knownPrimes)) = listPrimes n (m:knownPrimes) (m+2) | otherwise = listPrimes n knownPrimes (m+2)
This generates all primes less than the number n, starting at the number m, and using the initial list of primes given in the "knownPrimes" list. m is assumed to be odd and greater than 2, and only odd numbers less than n are checked for primality. The full list of generated primes is returned.
For example, listPrimes 100  3 gives [97,89,83,79,73,71,67,61,59,53,47,43,41,3
The (map (m `mod`) knownPrimes)) bit is probably a little baffling to anyone without experience in a functional language. map is one of those things that is very common in the functional paradigm, but imperative programmers have rarely seen anything similiar. In functional languages, functions are first-class objects, which means you can use them in many of the same ways you use data values. In particular, you can create a temporary function, and pass it into a subroutine, just like you would with an int or pointer. The subroutine can then use the passed in function however it wants. (If you've ever used C's qsort(), you may have seen this in action.) map makes extensive use of the ability to pass in a function as a parameter to a function. Map takes two parameters: a function, and a list. It applies the function to each item in the list, thus computing a new value for each item. Then the new values are put into a list and returned. So in imperative syntax, what it's saying is: "for each item on the knownPrimes list, compute m mod list item, and return me a list of resulting values."
Now, you may ask, how does this help us? Well, the whole idea behind the Seive of Erastosthenes is that a number M is prime if and only if it is NOT evenly divisible by any prime number smaller than sqrt(M). So, if M mod (any item in knownPrimes) == 0, then M is evenly divisible by one of the primes on the list, and therefore M is not prime. And finally, the (any (== 0) ) thing is similiar in concept to map. It uses the given function ("equals 0"), upon each item in the list (each one an "M mod knownPrime" value) and sees if any of the values on the list are 0. If so, the number is NOT prime. So in order to see if the number IS prime, we need to not the value of the any "M mod knownPrime" == 0 computation. If the number M is prime, we add it to our list of primes with the "newNum : previousList" syntax, and go on the the next M. Otherwise, we just go on to the next M without adding anything to the list of knownPrimes.
The Sieve of Erastosthenes is supposed to be a very efficient way to generate prime numbers, as prime number generation algorithms go. So I was surprised at how slow it was when I tested the new code. You may remember that after a few elementary optimizations, the old, naive, "check possible prime against every odd number < sqrt(possible prime)" code ran pretty quickly on numbers less than 15 digits. The new code - which in theory is much smarter for generating all primes less than sqrt(possible prime), and then testing the possible prime only against those - is actually about an order of magnitude slower.
And I suppose that's not too surprising, if you think about it. "Check every number < sqrt(n)" has roughly a O(sqrt(n)) runtime. But generating the entire list of primes less than sqrt(n) with the Sieve of Erastosthenes means that we have to check every odd number from 3 to sqrt(n) against ln(n) items that are already on the prime list. So it turns out to be about O(sqrt(n) * ln(n)) runtime, and with much larger constant factors on every computation due to running down the list, cons'ing a new item onto the list occasionally, and so on. Here are the actual performance tests:
62615533 - 7 secs confirm non-prime. (Previously - before my finger came off the enter key)
15485863 - 2 secs confirm prime. (Previously - before my finger came off the enter key)
101364623 - 7 secs confirm non-prime. (Not previously tested.)
275604541 - 16 secs confirm prime. (Previously - maybe 200 ms.)
6092426891 - 5 mins confirm non-prime. (Not previously tested. Big jump in perceived time!)
5915587277 - 7 mins confirm prime. (Not previously tested.)
10967535067 - 10 mins confirm non-prime. (Previously - 4 seconds.)
10113352331 - 9 mins confirm prime. (Not previously tested.)
All in all, very interesting results.
Also, I'd like to say thanks to all the smart folks over at irc.freenode.net #haskell for their help, support and patience with my clueless n00b questions.