To help me get used to F# and relearn the ways of functional programming I've been working through Project Euler in a Jupyter IfSharp notebook and keeping my solutions on GitHub at https://github.com/smcl/ProjectEulerJupyter

After around 50 problems so far I've spotted a handful of patterns which had either a number of possible solutions or were a pain to type out (or copy/paste) each time I used them. I explored them each in a little more detail to find the most optimal implementation of each pattern. The reason I wrote this up is that even though the problems are pretty simple, some of the results were pretty surprising.

For each of the patterns I've got a simple definition, some solutions and a set of benchmark results in a table. In each results table I've **highlighted** the most optimal solution that *fully expands the result set *(so the lazily evaluated solutions that "complete" in a millisecond don't count) so that we can have a realistic idea of what the best solution is.

# Combinations of two lists

The first item is the simplest of the four problems - if we have two lists *foo* and *bar, *produce a list of pairs featuring all combinations of elements of *foo* with *bar. *So given something like ...

let foo = [ 'a'; 'b'; 'c' ] let bar = [ 1; 2; 3 ]

We expect to see a list like this ...

[ ('a', 1); ('a', 2); ('a', 3) ('b', 1); ('b', 2); ('b', 3) ('c', 1); ('c', 2); ('c', 3) ]

I came up with only three solutions - I don't feel like I have an ideal solution to this, just the least hacky variant of the first solution that popped up in my mind.

**pair_list**: The first solution, call *List.map* for every member of one list, then in the function argument call *List.map *again for every member of the second - flatten the resulting list using *List.concat*

let pair_list l1 l2 = List.map (fun x -> l2 |> List.map (fun y -> (x,y))) l1 |> List.concat

**pair_seq**: as above, but assume we can have sequences as input, so we can produce the (fairly large) output array lazily:

let pair_seq s1 s2 = Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1 |> Seq.concat

**pair_seq_expanded**: as above, expand fully to a *List* for an idea of how long it takes to operate on the whole output:

let pair_seq_expanded s1 s2 = Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1 |> Seq.concat |> Seq.toList

**pair_seq_for**: it's pretty clear that using a sequence is preferable here, especially if you need to work with 1000 x 1000 collections, so the final variant is a slight rewrite of the second, using a *for* loop and *yield*-ing the resulting tuple.

let pair_seq_for s1 s2 = [ for x in s1 do for y in s2 do yield (x,y) ]

To compare the performance of these three I've defined 100 and 1000 element lists/sequences and measured how long it takes to iterate through each sequence pair performing a simple operation (accumulating the difference between the pairs).

### time to create *n*-element collection of pairs in milliseconds

method | n=10000 | n=100000 | n=1000000 | n=10000000 |
---|---|---|---|---|

pair_list | 1.5096 | 14.7937 | 226.0501 | 2927.2477 |

pair_seq | 0.8690 | 0.8690 | 0.8846 | 0.9028 |

pair_seq_expanded | 3.3952 | 21.5028 | 184.3353 | 2264.2805 |

pair_seq_for | 3.2361 | 12.5183 | 180.1352 | 1997.3700 |

So thankfully the cleanest looking *pair_seq_for *solution is actually the fastest when we get to larger data sets. This isn't quite where the story ends though. There's a nice discussion here on Stack Overflow about a similar but slightly different problem - finding *n* element combinations of a single list - so for

let someList = [ 1; 2; 3 ]

... we wanted a function *combs (n:'a) (lst:'a list)* which would produce something like the below for *combs 2 someList*

[ ( 1, 2 ); ( 1, 3 ) ( 2, 1 ); ( 2, 3 ) ( 3, 1 ); ( 3, 2 ) ]

This is different from the problem I posed, but I've got a GitHub gist here where I've turned them all loose on the same set of data, and performed some simple measurements.

# Pairing elements of Collections with their indexes

In a couple of places I found myself wondering if F# collections had an equivalent of python's *enumerate* - which is a function which wraps a list and returns an index/element pair for each loop iteration:

letters = [ "a", "b", "c", "d" ] for i, c in enumerate(letters): print "%d: %s" % (i, c) # output: # 0: a # 1: b # 2: c # 3: d

It took a little while before I spotted *Array.mapi* so I ended up working through and measuring a handful of different ways first - some are obviously pretty poor (particularly those using recursion) but I left them in nonetheless:

**enumerate_by_for_seq** - using a *Seq* to generate the index and return a pair

let enumerate_by_for_seq (a:string []) = seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) }

**enumerate_by_for_seq_expanded** - as previous, but returning a *List* to fully expand the sequence

let enumerate_by_for_seq_expanded (a:string []) = seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) } |> Seq.toList

**enumerate_by_for_list** - iterating over each index using a for loop, returning a *(int * string) list*

let enumerate_by_for_list (a:string []) = [ for i in 0 .. (a.Length-1) -> (i, a.[i]) ]

**enumerate_by_for_array** - as above but returning *(int * string*) *[]*, note that this seems ridiculously similar, but differs surprisingly in performance (I discovered this by accident and included it in this experiment because of the difference!)

let enumerate_by_for_array (a:string []) = [| for i in 0 .. (a.Length-1) -> (i, a.[i]) |]

**enumerate_by_map** - generating a list of indexes and then using |> and *List.map* to create the index/element pair (i.e. the same as the first approach, but using *List*)

let enumerate_by_map (a:string []) = [0..(a.Length-1)] |> List.map (fun i -> (i, a.[i]))

**enumerate_by_recursion_array** - bonkers approach, abusing *Array.append* and recursing. Just don't do this...

let rec enumerate_by_recursion_array' i (a:string[]) = match i with | 0 -> [||] | _ -> Array.append [| (i, a.[i]) |] (enumerate_by_recursion_array' (i-1) (a.[1..])) let enumerate_by_recursion_array (a:string[]) = enumerate_by_recursion_array' (a.Length-1) a

**enumerate_by_****recursion_list** - *List* variant of the above. Don't do this either

let rec enumerate_by_recursion_list' i (a:string[]) = match i with | 0 -> [] | _ -> [ (i, a.[i]) ] @ (enumerate_by_recursion_list' (i-1) (a.[1..])) let enumerate_by_recursion_list (a:string[]) = enumerate_by_recursion_list' (a.Length-1) a

**enumerate_by_for_zip **- Using *Array.zip* - shortest solution, the best until I spotted *Array.mapi*

let enumerate_by_zip (a:string[]) = Array.zip a [|0..(a.Length-1)|]

**enumerate_by_for_mapi** - Probably the most "correct" solution, using *Array.mapi*

let enumerate_by_mapi (a:string[]) = Array.mapi (fun i x -> (i,x)) a

**enumerate_by_for_parallel_mapi** - As above but naively switching in *Array.Parallel.mapi *without any other changes

let enumerate_by_parallel_mapi (a:string[]) = Array.Parallel.mapi (fun i x -> (i,x)) a

### time taken to enumerate *n* element collection (milliseconds)

method | n=10000 | n=100000 | n=1000000 | n=10000000 |
---|---|---|---|---|

enumerate_by_for_seq | 0.3385 | 0.3496 | 0.3471 | 0.3540 |

enumerate_by_for_seq_expanded | 2.6177 | 18.8341 | 205.4403 | 3610.3913 |

enumerate_by_for_list | 1.3487 |
22.1703 | 248.5039 | 4200.8530 |

enumerate_by_for_array | 2.1619 | 12.8186 | 192.3148 | 3178.5893 |

enumerate_by_map | 2.0391 | 26.2468 | 287.2852 | 4179.3407 |

enumerate_by_recursion_array | 7760.3141 | n/a* | n/a* |
n/a* |

enumerate_by_recursion_list | 5368.5472 | n/a* |
n/a* |
n/a* |

enumerate_by_zip | 7.1136 | 9.4388 |
170.0941 | 1917.8617 |

enumerate_by_mapi | 2.6911 | 13.0303 | 116.5348 | 1268.8625 |

enumerate_by_parallel_mapi | 8.1293 | 17.7548 | 102.2350 |
1379.0431 |

* = this took *way* too long so I killed it

Obviously *Array.mapi* was the fastest overall. However it wasn't as much faster than *Array.zip *as I would have imagined, and I suspect that I'm doing something wrong with *Array.Parallel.mapi**.* Also interesting is that while the super-fast performance of the *enumerate_by_for_seq* method dissipates somewhat when fully evaluated, it is still faster than the equivalent *enumerate_by_for_list* version*.*

# Pandigitals

"Pandigital" numbers feature relatively frequently in Project Euler. An *n-*digit pandigital number will contain all digits from 0..*n *or 1..(n-1) once in some order. For example *41523* is a 1-5 pandigital, and 43210 is 0-4 pandigital. These numbers are mentioned in 32, 38, 41, 104, 118, 170 (and perhaps more) so a relatively efficient way to recognise them is pretty useful to have at hand.

Again there's a few ways we can do this - in each case I can think of we start with taking the string representation of the input number and splitting it up using *ToCharArray() *and with this we can do a number of different things.

**pandigital_strcmp** - sort array, map each element to string, sort, create string + compare to "123..n"

let pandigital_strcmp (n:int) = let sorted = new string (string(n).ToCharArray() |> Array.sort) sorted = pandigitalString

**pandigital_intcmp** - sort array, map each element to string, sort, create string, cast to int + compare to 123..n

let pandigital_intcmp (n:int) = let sorted = new string (string(n).ToCharArray() |> Array.sort) int(sorted) = pandigitalInt

**pandigital_arrcmp** - sort array, string, cast to int + compare to existing array [| '1'; '2'; .. n |]

let pandigital_arrcmp (n:int) = pandigitalArray = (string(n).ToCharArray() |> Array.sort)

**pandigital_set_difference** - convert to Set and compute difference from precalc'd set, pandigital if empty

let pandigital_set_difference (n:int) = string(n).ToCharArray() |> Set.ofArray |> Set.difference pandigitalSet |> Set.isEmpty

**pandigital_array_contains** - for each element in precalculated pandigital array, check it's present in array and use List.fold to ensure all true

let pandigital_array_contains (n:int) = let a = string(n).ToCharArray() pandigitalArray |> Array.map (fun c -> Array.contains c a) |> Array.fold (fun e acc -> e && acc) true

So I tested these against using the code to measure how quickly each method was in applying

// where panDigitalInt is the upper limit ("n" in the table) let testNumbers = [ 0 .. pandigitalInt ] let bench name f = let sw = Stopwatch.StartNew() let res = testNumbers |> List.filter f |> List.length sw.Stop() printfn "%s: %f ms" name sw.Elapsed.TotalMilliseconds

### time taken to filter pandigitals in [0..n__]__ in milliseconds

method | n=1234 | n=12345 | n=123456 | n=1234567 |
---|---|---|---|---|

pandigital_strcmp | 2.1081 | 11.2639 | 113.2086 | 1356.1985 |

pandigital_intcmp | 0.9716 | 9.7646 | 89.3238 | 947.0513 |

pandigital_arrcmp | 2.4441 | 6.1932 | 59.7014 | 618.0665 |

pandigital_set_difference | 2.5024 | 17.2115 | 199.2863 | 1986.9592 |

pandigital_array_contains | 0.9790 | 4.8161 | 50.447 | 565.6698 |

So it seems *Array.contains* wins overall. The* Set.difference* approach was pretty dismal which was disappointing - it came to me when I was out walking my dog and I rushed back to write it and benchmark it. I think *Set.ofArray* is perhaps a little slow, but I haven't done any investigation into it.

It's worth noting that you probably shouldn't do something like *[0..bigNumber] |> List.filter pandigital_array_contains* to start with - maybe it's worth approaching the problem from a different angle in some cases.

# Sorting a 3-element tuple

OK this only came up once and was part of a pretty poor solution I had for problem 39 (original, solution) - I generated thousands of tuple featuring 3 of integers and then tested whether they could represent the sides of right-angled triangles using Pythagoras' theorem. However since they were in no particular order I thought I needed identify the hypotenuse. I wrote this out long-form since there's only a handful of cases and solved the problem relatively quickly.**sort_nested_if**- use nested if statements to reduce the number of comparisons needed while introducing branches

let sort_nested_if (a,b,c) = if a >= b then if b >= c then (a,b,c) else (a,c,b) elif b >= a then if a >= c then (b,a,c) else (b,c,a) else if a >= b then (c,a,b) else (c,b,a)

**sort_flat_if** - have a separate *if* for each result at the top level

let sort_flat_if (a,b,c) = if a >= b && b >= c then (a,b,c) elif a >= b && b >= c then (a,c,b) elif b >= a && a >= c then (b,a,c) elif b >= a && c >= a then (b,c,a) elif (*c >= b &&*) a >= b then (c,a,b) else (*c >= b && b >= a then*) (c,b,a)

**sort_array** - create an array, use *Array.sort* and map the results back into a tuple when returning the result

let sort_array (a,b,c) = let sorted = Array.sort [| a;b;c |] (sorted.[0], sorted.[1], sorted.[2])

To test these I generated large arrays of size 4000, 40000 and 400000 3-element tuples and timed how long each method took to sort them.

### time taken to sort *n* 3-element tuples, in milliseconds

method | n=4000 | 40000 | 400000 |
---|---|---|---|

sort_nested_if | 1.2626 | 13.9014 | 193.3619 |

sort_flat_if | 1.7864 | 23.4633 | 258.2538 |

sort_array | 1.2424 | 11.9907 | 132.4312 |

OK now it's probably obvious why I didn't just bin this little experiment - **sort_array** is surprisingly the clear winner. I would have guessed that the overhead of building an array and calling *Array.sort* function on a list way smaller than you'd normally need to sort would be insane. But apparently it's not!

# Conclusion

It's surprising how many different ways you can write some relatively simple algorithms. Some of them are obviously pretty awful, like the recursive *enumerate* function (though I'm sure I can rearrange it to take advantage of tail call elimination) - and some were surprisingly performant, like the *sort_array* function in the final example. I've noticed that some other Project Euler people maintain a sort of library of functions they can re-use. Eventually I'd like to do something like this, but until it becomes unmanageable I'll just keep a Gist on GitHub: