## F# - some Project Euler patterns

To help me get used to F# and relearn the ways of functional programming I've been working through Project Euler in a Jupyter IfSharp notebook and keeping my solutions on GitHub at https://github.com/smcl/ProjectEulerJupyter After around 50 problems so far I've spotted a handful of patterns which had either a number of possible solutions or were a pain to type out (or copy/paste) each time I used them. I explored them each in a little more detail to find the most optimal implementation of each pattern. The reason I wrote this up is that even though the problems are pretty simple, some of the results were pretty surprising.

For each of the patterns I've got a simple definition, some solutions and a set of benchmark results in a table. In each results table I've highlighted the most optimal solution that fully expands the result set (so the lazily evaluated solutions that "complete" in a millisecond don't count) so that we can have a realistic idea of what the best solution is.

# Combinations of two lists

The first item is the simplest of the four problems - if we have two lists foo and bar, produce a list of pairs featuring all combinations of elements of foo with bar. So given something like ...

```let foo = [ 'a'; 'b'; 'c' ]
let bar = [ 1; 2; 3 ]
```

We expect to see a list like this ...

```[
('a', 1); ('a', 2); ('a', 3)
('b', 1); ('b', 2); ('b', 3)
('c', 1); ('c', 2); ('c', 3)
]```

I came up with only three solutions - I don't feel like I have an ideal solution to this, just the least hacky variant of the first solution that popped up in my mind.

pair_list: The first solution, call List.map for every member of one list, then in the function argument call List.map again for every member of the second - flatten the resulting list using List.concat

```let pair_list l1 l2 =
List.map (fun x -> l2 |> List.map (fun y -> (x,y))) l1
|> List.concat
```

pair_seq: as above, but assume we can have sequences as input, so we can produce the (fairly large) output array lazily:

```let pair_seq s1 s2 =
Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1
|> Seq.concat
```

pair_seq_expanded: as above, expand fully to a List for an idea of how long it takes to operate on the whole output:

```let pair_seq_expanded s1 s2 =
Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1
|> Seq.concat
|> Seq.toList
```

pair_seq_for: it's pretty clear that using a sequence is preferable here, especially if you need to work with 1000 x 1000 collections, so the final variant is a slight rewrite of the second, using a for loop and yield-ing the resulting tuple.

```let pair_seq_for s1 s2 =
[ for x in s1 do
for y in s2 do
yield (x,y) ]
```

To compare the performance of these three I've defined 100 and 1000 element lists/sequences and measured how long it takes to iterate through each sequence pair performing a simple operation (accumulating the difference between the pairs).

### time to create n-element collection of pairs in milliseconds

method n=10000 n=100000 n=1000000 n=10000000
pair_list 1.5096 14.7937 226.0501 2927.2477
pair_seq 0.8690 0.8690 0.8846 0.9028
pair_seq_expanded 3.3952 21.5028 184.3353 2264.2805
pair_seq_for 3.2361 12.5183 180.1352 1997.3700

So thankfully the cleanest looking pair_seq_for solution is actually the fastest when we get to larger data sets. This isn't quite where the story ends though. There's a nice discussion here on Stack Overflow about a similar but slightly different problem - finding n element combinations of a single list - so for

```let someList = [ 1; 2; 3 ]
```

... we wanted a function combs (n:'a) (lst:'a list) which would produce something like the below for combs 2 someList

```[
( 1, 2 ); ( 1, 3 )
( 2, 1 ); ( 2, 3 )
( 3, 1 ); ( 3, 2 )
]
```

This is different from the problem I posed, but I've got a GitHub gist here where I've turned them all loose on the same set of data, and performed some simple measurements.

# Pairing elements of Collections with their indexes

In a couple of places I found myself wondering if F# collections had an equivalent of python's enumerate - which is a function which wraps a list and returns an index/element pair for each loop iteration:

```letters = [ "a", "b", "c", "d" ]
for i, c in enumerate(letters):
print "%d: %s" % (i, c)

# output:
#     0: a
#     1: b
#     2: c
#     3: d
```

It took a little while before I spotted Array.mapi so I ended up working through and measuring a handful of different ways first - some are obviously pretty poor (particularly those using recursion) but I left them in nonetheless:

enumerate_by_for_seq - using a Seq to generate the index and return a pair

```let enumerate_by_for_seq (a:string []) =
seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) }
```

enumerate_by_for_seq_expanded - as previous, but returning a List to fully expand the sequence

```let enumerate_by_for_seq_expanded (a:string []) =
seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) }
|> Seq.toList
```

enumerate_by_for_list - iterating over each index using a for loop, returning a (int * string) list

```let enumerate_by_for_list (a:string []) =
[ for i in 0 .. (a.Length-1) -> (i, a.[i]) ]
```

enumerate_by_for_array - as above but returning (int * string[], note that this seems ridiculously similar, but differs surprisingly in performance (I discovered this by accident and included it in this experiment because of the difference!)

```let enumerate_by_for_array (a:string []) =
[| for i in 0 .. (a.Length-1) -> (i, a.[i]) |]
```

enumerate_by_map - generating a list of indexes and then using |> and List.map to create the index/element pair (i.e. the same as the first approach, but using List)

```let enumerate_by_map (a:string []) =
[0..(a.Length-1)]
|> List.map (fun i -> (i, a.[i]))
```

enumerate_by_recursion_array - bonkers approach, abusing Array.append and recursing. Just don't do this...

```let rec enumerate_by_recursion_array' i (a:string[]) =
match i with
| 0 -> [||]
| _ -> Array.append [| (i, a.[i]) |] (enumerate_by_recursion_array' (i-1) (a.[1..]))

let enumerate_by_recursion_array (a:string[]) =
enumerate_by_recursion_array' (a.Length-1) a
```

enumerate_by_recursion_list - List variant of the above. Don't do this either

```let rec enumerate_by_recursion_list' i (a:string[]) =
match i with
| 0 -> []
| _ -> [ (i, a.[i]) ] @ (enumerate_by_recursion_list' (i-1) (a.[1..]))

let enumerate_by_recursion_list (a:string[]) =
enumerate_by_recursion_list' (a.Length-1) a```

enumerate_by_for_zip - Using Array.zip - shortest solution, the best until I spotted Array.mapi

```let enumerate_by_zip (a:string[]) =
Array.zip a [|0..(a.Length-1)|]```

enumerate_by_for_mapi - Probably the most "correct" solution, using Array.mapi

```let enumerate_by_mapi (a:string[]) =
Array.mapi (fun i x -> (i,x)) a```

enumerate_by_for_parallel_mapi - As above but naively switching in Array.Parallel.mapi without any other changes

```let enumerate_by_parallel_mapi (a:string[]) =
Array.Parallel.mapi (fun i x -> (i,x)) a```

### time taken to enumerate n element collection (milliseconds)

method n=10000  n=100000 n=1000000 n=10000000
enumerate_by_for_seq 0.3385 0.3496 0.3471 0.3540
enumerate_by_for_seq_expanded 2.6177 18.8341 205.4403 3610.3913
enumerate_by_for_list 1.3487 22.1703 248.5039 4200.8530
enumerate_by_for_array 2.1619 12.8186 192.3148 3178.5893
enumerate_by_map 2.0391 26.2468 287.2852 4179.3407
enumerate_by_recursion_array 7760.3141 n/a*  n/a*
n/a*
enumerate_by_recursion_list 5368.5472 n/a*
n/a*
n/a*
enumerate_by_zip 7.1136 9.4388 170.0941 1917.8617
enumerate_by_mapi 2.6911 13.0303 116.5348 1268.8625
enumerate_by_parallel_mapi 8.1293 17.7548 102.2350 1379.0431

* = this took way too long so I killed it

Obviously Array.mapi was the fastest overall. However it wasn't as much faster than Array.zip as I would have imagined, and I suspect that I'm doing something wrong with Array.Parallel.mapi. Also interesting is that while the super-fast performance of the enumerate_by_for_seq method dissipates somewhat when fully evaluated, it is still faster than the equivalent enumerate_by_for_list version.

# Pandigitals

"Pandigital" numbers feature relatively frequently in Project Euler. An n-digit pandigital number will contain all digits from 0..or 1..(n-1) once in some order. For example 41523 is a 1-5 pandigital, and 43210 is 0-4 pandigital. These numbers are mentioned in 32, 38, 41, 104, 118, 170 (and perhaps more) so a relatively efficient way to recognise them is pretty useful to have at hand.

Again there's a few ways we can do this - in each case I can think of we start with taking the string representation of the input number and splitting it up using ToCharArray() and with this we can do a number of different things.

pandigital_strcmp - sort array, map each element to string, sort, create string + compare to "123..n"

```let pandigital_strcmp (n:int) =
let sorted = new string (string(n).ToCharArray() |> Array.sort)
sorted = pandigitalString
```

pandigital_intcmp - sort array, map each element to string, sort, create string, cast to int + compare to 123..n

```let pandigital_intcmp (n:int) =
let sorted = new string (string(n).ToCharArray() |> Array.sort)
int(sorted) = pandigitalInt
```

pandigital_arrcmp - sort array, string, cast to int + compare to existing array [| '1'; '2'; .. n |]

```let pandigital_arrcmp (n:int) =
pandigitalArray = (string(n).ToCharArray() |> Array.sort)```

pandigital_set_difference - convert to Set and compute difference from precalc'd set, pandigital if empty

```let pandigital_set_difference (n:int) =
string(n).ToCharArray()
|> Set.ofArray
|> Set.difference pandigitalSet
|> Set.isEmpty
```

pandigital_array_contains - for each element in precalculated pandigital array, check it's present in array and use List.fold to ensure all true

```let pandigital_array_contains (n:int) =
let a = string(n).ToCharArray()
pandigitalArray
|> Array.map (fun c -> Array.contains c a)
|> Array.fold (fun e acc -> e && acc) true```

So I tested these against using the code to measure how quickly each method was in applying

```// where panDigitalInt is the upper limit ("n" in the table)
let testNumbers = [ 0 .. pandigitalInt ]
let bench name f =
let sw = Stopwatch.StartNew()
let res = testNumbers |> List.filter f |> List.length
sw.Stop()
printfn "%s: %f ms" name sw.Elapsed.TotalMilliseconds```

### time taken to filter pandigitals in [0..n] in milliseconds

method n=1234 n=12345 n=123456 n=1234567
pandigital_strcmp 2.1081 11.2639 113.2086 1356.1985
pandigital_intcmp 0.9716 9.7646 89.3238 947.0513
pandigital_arrcmp 2.4441 6.1932 59.7014 618.0665
pandigital_set_difference 2.5024 17.2115 199.2863 1986.9592
pandigital_array_contains 0.9790 4.8161 50.447 565.6698

So it seems Array.contains wins overall. The Set.difference approach was pretty dismal which was disappointing - it came to me when I was out walking my dog and I rushed back to write it and benchmark it. I think Set.ofArray is perhaps a little slow, but I haven't done any investigation into it.

It's worth noting that you probably shouldn't do something like [0..bigNumber] |> List.filter pandigital_array_contains to start with - maybe it's worth approaching the problem from a different angle in some cases.

# Sorting a 3-element tuple

OK this only came up once and was part of a pretty poor solution I had for problem 39 (original, solution) - I generated thousands of tuple featuring 3 of integers and then tested whether they could represent the sides of right-angled triangles using Pythagoras' theorem. However since they were in no particular order I thought I needed identify the hypotenuse. I wrote this out long-form since there's only a handful of cases and solved the problem relatively quickly.

Regardless of whether this was a suitable solution for the problem, I was curious as to what approach works best for sorting these tiny collections of 3 elements.

sort_nested_if - use nested if statements to reduce the number of comparisons needed while introducing branches
```let sort_nested_if (a,b,c) =
if a >= b then
if b >= c then (a,b,c) else (a,c,b)
elif b >= a then
if a >= c then (b,a,c) else (b,c,a)
else
if a >= b then (c,a,b) else (c,b,a)
```

sort_flat_if - have a separate if for each result at the top level

```let sort_flat_if (a,b,c) =
if a >= b && b >= c then       (a,b,c)
elif a >= b && b >= c then     (a,c,b)
elif b >= a && a >= c then     (b,a,c)
elif b >= a && c >= a then     (b,c,a)
elif (*c >= b &&*) a >= b then (c,a,b)
else (*c >= b && b >= a then*) (c,b,a)
```

sort_array - create an array, use Array.sort and map the results back into a tuple when returning the result

```let sort_array (a,b,c) =
let sorted = Array.sort [| a;b;c |]
(sorted., sorted., sorted.)
```

To test these I generated large arrays of size 4000, 40000 and 400000 3-element tuples and timed how long each method took to sort them.

### time taken to sort n 3-element tuples, in milliseconds

method n=4000 40000 400000
sort_nested_if 1.2626 13.9014 193.3619
sort_flat_if 1.7864 23.4633 258.2538
sort_array 1.2424 11.9907 132.4312

OK now it's probably obvious why I didn't just bin this little experiment - sort_array is surprisingly the clear winner. I would have guessed that the overhead of building an array and calling Array.sort function on a list way smaller than you'd normally need to sort would be insane. But apparently it's not!

# Conclusion

It's surprising how many different ways you can write some relatively simple algorithms. Some of them are obviously pretty awful, like the recursive enumerate function (though I'm sure I can rearrange it to take advantage of tail call elimination) - and some were surprisingly performant, like the sort_array function in the final example. I've noticed that some other Project Euler people maintain a sort of library of functions they can re-use. Eventually I'd like to do something like this, but until it becomes unmanageable I'll just keep a Gist on GitHub:

0 responses