Monday, July 18, 2016

Functional Practice: Euler Problem #3 in F# - Prime Factors

I'm continuing my exploration of F# by looking at the Euler problems. I've got working (though not yet optimal) solutions for Euler Problem #1 and Euler Problem #2. Now it's time to move on to Euler Problem #3.

Here's the text for the problem:
The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?
At first look it doesn't seem too hard. There's some prime factoring... but the target number is 600 billion! That won't fit into a 32-bit integer. So we'll have to think about data types here.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

Calculating Factors
It's best to start with the sample number: 13195. So we'll find the prime factors of this. I've done this before with C# code, and it's not all that hard. But I wanted to try to figure it out with more of a "functional" style. I'm not sure how successful I was. But it works, and it performs well, so that's a start.

I knew that I would want to use a sequence for this. I took a couple stabs at how to get the factors out (just the evenly-divisible numbers -- we'll worry about primes in a bit).

Here's what I came up with after a little bit of experimentation (this shows the function from the script file and the output from the F# Interactive window):


This is a brute-force way of finding factors of a number. It starts with the number 2 and then iterates through all the values up to the target number divided by 2. If the number is evenly divisible, then it gets added to the sequence (that's the "yield i").

The initial idea of going up to the target number divided by 2 was to minimize calculations of non-prime factors. This was a good thought, but it turns out there's a much better way of doing this (which I tried a bit later -- we'll see that when it comes up).

When running this against our test number 13195, we get the following results:


We pipe the output to "Seq.toList" to force the evaluation of the sequence.

The output shows us all of the numbers that are evenly-divisible into 13195. And we see that the prime numbers that we're looking for are included: 5, 7, 13, 29. But we also have lots of other numbers (which aren't prime).

Prime Factors
So the next step was to figure out which of these numbers are prime numbers. To do that, I did some more experiments with the "factor" function that we have.


When we run the "factor" function against a number that is prime (meaning, it's only divisible by 1 and itself), we get an empty list back. We can see that "factor 29" (a prime) gives us an empty list, but "factor 1885" (not prime) gives us a list of values.

So this gave me the idea that I could run the initial output through a filter that runs "factor" against all the values. Then I could hang onto the ones that resulted in empty lists.

This is kind of a brute-force approach. But we need to start somewhere.


This takes our sequence of factors and runs it through a filter. If the sequence is empty when we run it through the "factor" function, then we know it's a prime number, and we'll hang onto it.

We can combine these functions with pipelining:


This gives us the list of prime factors for 13195. Success!

Performance Issues
I wasn't sure if this approach would work with our actual target number -- 13 thousand is much smaller than 600 billion. But I figured it was worth a try.

One change that we'll have to make: we need this to work with bigger integers. 600 billion will overflow the standard 32-bit integer.

This is pretty easy to change by just adding the "L" suffix to all of the integers. This specifies that it is a 64-bit integer (which is big enough for us).


After adding the "L" to the integers in our code, the F# type inference system takes over. We can see in the function signatures that we now have "int64" and "seq<int64>" specified.

And our code still works with our test values:


So far, so good. (You'll notice that we had to add "L" to our input value, and all of our output values are also "L"s.)

A Big Problem
The big problem came when I tried to run this against our target value. I kicked it off and let it run. The good news is that it didn't cause any memory overflows. In some of my earlier tests, I was using allocated lists instead of sequences, and these ran through the memory *very* quickly.

But I let this run for 3 hours, and didn't get any results back. It was still running; it didn't memory overflow, and it didn't stack overflow. I suspect that it would have completed eventually. But it obviously wasn't a good solution.

Better Factoring
So the problem is that I was generating (or trying to generate) a sequence based on billions of numbers. When I was doing some searching on how to identify prime numbers, I ran across the Sieve of Eratosthenes. This is something that I've used in the past when calculating prime numbers.

One thing that I remembered is that the Sieve of Eratosthenes uses a max value of the square root of the target number (not the target number divided by 2 like I was using here).

**WARNING WARNING WARNING**
***BAD MATH ALERT***
Update: This turns out to be bad math (yes, I know bad math coming from me is a bit of a surprise). Using the square root works in this particular case (actually both our sample case and our target case), but this does not hold true across numbers.

I have an alternative solution that performs even better (and with good math). Check it out: Getting Prime Factors in F# (with Good Math).
**END WARNING**

This makes a huge difference. The square root of our target number is 775,146. This changed our sequence of billions of items to a sequence of hundreds of thousands of items. This is a *huge* difference.

So, I tried updating the "factor" method to use the square root:


You'll notice that there are several casts here. "Sqrt" works with floats, so we have to cast our incoming value to float. Then we need to cast it back to int64 to work with the rest of our function. (I also needed to bring in the "System" namespace so that I could use the "Math" functions.)

This gave me a smaller list of factors for the test values:


Instead of 14 values (like we had originally), we now only have 7. And it still includes all of our primes since we're looking at the lower end of the list for those.

And this will now run practically against our target number:


As we can see, this is not that many values (only 7). Which means that running this through the filter to get the primes will also go very quickly:


This gives us the prime factors -- and also our answer. The value "6857" is the largest prime (and it's easy to verify that as the correct value by looking at solutions to Euler Problem #3).

Putting Things Together
Now that I had some working functions, I wanted to put them together. Here's where I ended up:


There are a couple things here. First, I made "factor" a local function. The first call "factor n" will do the initial pass of getting the factors. Then the second call inside "Seq.filter" will get us just the prime values.

Finally, there's a call to "Seq.max". This will get us the largest of the prime factors.

When we run this against our test number, we see that we get the expected output:


And when we run this against our target number, we see that we get the expected value based on what we saw above:


One other very important thing to note is how long this took to run. I used the "#time" command in F# Interactive to turn on timing. Notice that the whole process took 221 milliseconds. This is a very acceptable response (especially considering the 3 hour incomplete response from earlier).

Based on this time, I don't really feel a need to optimize this further (even though I'm sure there are ton of optimizations to be had here).

Wrap Up
Here's the final, working solution:



There is probably a much better way to calculate prime factors. And now that I have my working solution, I'll start looking online at other people's solutions.

I honestly didn't think I'd be able to come up with a solution that had acceptable performance without looking for more help. So, I'm a bit surprised that I got this working as well as I did.

I'm still having fun exploring and learning. I've veered a little on my learning path (I'll be writing about that a bit later this week). But I'm still moving forward with F#.

Happy Coding!

No comments:

Post a Comment