Thursday, July 21, 2016

Functional Practice: Euler Problem #5 - Faster with Math & Help from the Community

Rather than moving on to the next problem from Project Euler, we'll revisit Euler Problem #5. Last time, I did a brute-force attack of the problem. I got the right answer, but I knew there was a better way of approaching the problem: by using math.

Here's Euler Problem #5 (again):
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
It's possible to solve this problem using prime factors. And since I have a good way to find prime factors in F#, I figured it would be good to explore this approach.

[Update: Collected articles for the first 10 Euler Problems in F# are available here: Jeremy Explores Functional Programming.]

Manually Solving the Problem
We'll start by looking at the prime factors for the values 2 to 10. (We'll skip 1 because that divides evenly into everything):
  • 2 = 2
  • 3 = 3
  • 4 = 2 * 2
  • 5 = 5
  • 6 = 2 * 3
  • 7 = 7
  • 8 = 2 * 2 * 2
  • 9 = 3 * 3
  • 10 = 2 * 5
From here, we want to create a set of prime factors that covers all of these individual sets of prime factors. That looks like this:
  • 2 * 2 * 2 * 3 * 3 * 5 * 7
If we look at this set, we see that it covers all of the cases from our list. If we look at 8, we see that it's factors are three 2s; and we have three 2s in our result set. In addition, 10 consists of a 2 and a 5, and we see that we have (at least) one 2 and one 5 in our result set.

We can go through each of the numbers from 2 to 10 and find that the prime factors are represented in our result set.

And this is where the magic happens. If we multiply the numbers of our result set, we get...
  • 2520
Hey, that looks significant. In fact, it's the answer to the sample case of Euler Problem #5.

We can use this same technique to get the solution to the target case. We just need to figure out how to get the prime factors for the numbers 2 to 20 and then reduce the factors.

Getting the Prime Factors
It's pretty easy to get the prime factors of our values. First, we'll use the "primeFactors" function that we came up with earlier:


It's pretty easy to use a map with this function to get all of the prime factors for the values 2 to 20:


This gives us a list of lists. Now we just have to figure out how to reduce this list to a single collection.

We can't use a "concat" because that just appends the lists together. If we did that, then we would have too many values -- for example, we would have sixteen 2s instead of just the four 2s that we need.

And we can't use a "distinct" or "union" operation because that would eliminate values that we need -- it would give us just one 2.

I thought about a few ways that I might approach this problem. But I felt like I was doing things the hard way. Maybe I was missing something that would make this really easy.

The F# Community is Awesome
I decided to put send out a tweet with my problem:


And I had several people from the F# community jump on this (even though I didn't really provide any specification at all).

Huge thanks to everyone who responded including Cameron Presley (@pcameronpresley), Reid Evans (@ReidNEvans), Jack Fox (@foxyjackfox), Tea Driven Dev (@TeaDrivenDev -- sorry, I don't know your real name), and Llewellyn Pritchard (@leppie).

In the end, Tea Driven Dev provided the solution I needed (again, especially good since my "spec" was pretty lacking):


If nothing else, I felt better about not missing an "easy" solution. But as Jack Fox pointed out:


If you'd like to take a look at the Twitter interaction, you can find the thread starting here: https://twitter.com/jeremybytes/status/755758533648322560.

Breaking Down the Solution
There are quite a few steps in this solution. Let's take things one line at a time to try to understand what's going on. As a reminder, here's "whatIhave" (a list of lists):


Getting Lists with Single Numbers
So, we'll send this in to the first line of the function:


This combines several functions to filter the initial list. First it does a distinct on each of our lists:

[2] --> [2]
[2; 2] --> [2]
[3; 2] --> [3; 2]
[19] --> [19]
[5; 2; 2] --> [5; 2]

Then it looks for the lists that have a length of 1. This eliminates any lists with multiple values:

[2] --> [2]
[2; 2] --> [2]
[3; 2] --> [3; 2]
[19] --> [19]
[5; 2; 2] --> [5; 2]

If we look at the resulting set of lists, we see that each list only contains one number (although some have multiple occurrences of that one number).

We can safely do this because the list with multiple numbers are represented by lists with single numbers. For example, the number 20 has factors of [5; 2; 2] (one 5 and two 2s). These factors are represented by other numbers in our list as well. 5 has a factor of one 5, and 4 has factors of two 2s.

This is true of all of our items with multiple numbers; earlier items in the list will represent the factors with only one number lists. This may be a bit hard to follow, but it does work.

Grouping by Number
Now let's head to the next step:


This groups our lists according to the number they contain. "head" pulls out the first number of each list, but since all the lists only have one number, the result is that they are grouped by that number.

Each item is now a tuple with the number and the list of lists. For example, we have two lists that contain 3s. These are grouped together as (3, [[3]; [3; 3]]).

Sorting by Number
The next step is to sort them according to the number:


"fst" represents the first value in our tuple, and that's what we're sorting by. This is the same order that they were already in. But doing the explicit sorting is a good safety measure.

Getting the Longest Lists
The next step is to get the longest list in each grouping:


This has quite a few functions combined together, so let's look at what each one does.

The "map" will run this against each item in our overall list -- so it will run for each number.

The "snd" pulls out the second item of the tuple (which is our list of lists):

(2, [[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]])
becomes
[[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]]

Then we sort by descending length:

[[2]; [2; 2]; [2; 2; 2]; [2; 2; 2; 2]]
becomes
[[2; 2; 2; 2]; [2; 2; 2]; [2; 2]; [2]]

Then we take the "head" item of the list which leaves us with [2; 2; 2; 2] -- the collection with the most number of this digit.

Combining the Lists
The last step is to combine the lists using "concat":



This gives us the output that we're looking for.

A Function to Reduce Prime Factors
Now that we've seen all the steps, let's look at the entire function:


Now we can take our list of lists of factors and get a single list back.

Solving the Problem
Now that we have our reduced prime factor list, all we have to do is multiply the items together to get the solution:


For this we'll use a "fold". This uses an accumulator value ("1" in this case), and then runs a function against every item in our list. In this case "x" represent our list item and "a" represents the accumulator.

Update: As Thomas Leveseque points out in the comments, This can be shortened to simply "|> List.fold (*) 1", and we'll get the same result.

The result of this is that we multiply each item in our list. And we can see that the result is what is expected (based on our previous look at Euler Problem #5).

Troubleshooting
Now that we have the pieces working, let's start putting them together into a function that will take an arbitrary range as a parameter.

We'll start with a hard-coded list:


When we run this, we get the right answer. But look at the timing; it took 13 seconds to get there. Something doesn't seem right about that.

Let's run just the first two lines to see what happens:


Look at the first value that comes out: [-1; -1]. This is our problem. In our earlier tests, we were starting with "2" because "1" evenly divides into everything. But we didn't take that into consideration here.

We tried to run our "primeFactors" function against "1", and that caused a problem. In fact, if we were to walk through the function, we would find that it loops through all of the integers, overflows, and then works it's way back to -1. And that's why it takes 13 seconds.

It's easy enough to fix this. Just to be safe, we'll skip over everything that's less than 2:


The "skipWhile" will make sure that we don't have "0", "1", or any negative numbers in our range. And now we see that our function takes almost no time at all to complete.

The Final Function
Now let's put everything together:


This puts "primeFactors" and "reduceFactors" into our function as local functions. These functions are actually useful on their own, so it would probably make sense to keep these separate. But we'll put them together so we have everything in one place.

When we run this, we get the expected output:


And look at the timing; this took almost no time at all. Much better than waiting over 1 minute for our brute force approach.

Learnings
There were a couple of big take-aways from this particular process.

The F# Community is Awesome
The F# community is really awesome. I already knew that, but this just reinforced that. I was very excited to see so many people take interest in my problem. And I think the reason for that is 2-fold. First, there are people who are willing and able to help out folks who have questions. Second, there are people who like the challenge of solving a problem.

In this case, Euler Problem #5 is a well-defined problem, and it has many well-defined solutions. But it's still an interesting challenge. Once Reid Evans found that I was working on Euler #5, he sent over his solution: FunctionalKnox/Euler.

I have met so many people that are willing to help and share. I'm hoping that I'll be able to pass that along in the not-too-distant future.

Debugger vs. REPL
The other thing that I noticed here was the difference in how to troubleshoot an issue. I had read about (and heard people talk about) how in F# you don't really use the debugger. Instead, you use the REPL to investigate and find things.

And I experienced that first-hand with this problem. When I ran into the timing issue (why is it taking 13 seconds to calculate prime numbers?), I didn't fire up the debugger. Instead, I just ran a few lines of the code to see what the intermediate output is.

It makes more sense now. You don't need to inspect intermediate variables because values aren't changing -- there's no need for a "watch". Instead you're mostly dealing with discrete inputs and outputs, and many processes are several functions composed together. So you can run them in bits and pieces and to get observable results. That's pretty cool.

Wrap Up
I'm obviously still headed forward with F#. And I'll be exploring with more of the Euler problems. I've taken a peek at #6 and it looks pretty easy. We'll see how it goes when I dive into it.

We get better by getting help from those who are ahead of us on the path, and we make our community stronger by helping those who are behind us on the path.

Happy Coding!

4 comments:

  1. Nice write up, thanks! For what it's worth, (fun x a -> x * a) can be reduced to just (*) ;)

    ReplyDelete
  2. Rather than factorizing every number <= N, a little inspection shows that the problem is equivalent to

    fold * 1 (for each prime p <= N, the highest power of p also <= N)

    where you should have prime number generation to hand from problem #3

    ReplyDelete
    Replies
    1. Thanks, Steve. This is an interesting way to look at the problem. That's definitely what makes Project Euler such a good challenge. There are a lot of different approaches, and they vary on efficiency and speed. I'll have to try this method to see what I come up with.
      -Jeremy

      Delete