F# - some Project Euler patterns

To help me get used to F# and relearn the ways of functional programming I've been working through Project Euler in a Jupyter IfSharp notebook and keeping my solutions on GitHub at https://github.com/smcl/ProjectEulerJupyter

After around 50 problems so far I've spotted a handful of patterns which had either a number of possible solutions or were a pain to type out (or copy/paste) each time I used them. I explored them each in a little more detail to find the most optimal implementation of each pattern. The reason I wrote this up is that even though the problems are pretty simple, some of the results were pretty surprising.

For each of the patterns I've got a simple definition, some solutions and a set of benchmark results in a table. In each results table I've highlighted the most optimal solution that fully expands the result set (so the lazily evaluated solutions that "complete" in a millisecond don't count) so that we can have a realistic idea of what the best solution is.

Combinations of two lists 

The first item is the simplest of the four problems - if we have two lists foo and bar, produce a list of pairs featuring all combinations of elements of foo with bar. So given something like ...

let foo = [ 'a'; 'b'; 'c' ]
let bar = [ 1; 2; 3 ]

We expect to see a list like this ...

[ 
    ('a', 1); ('a', 2); ('a', 3)
    ('b', 1); ('b', 2); ('b', 3)
    ('c', 1); ('c', 2); ('c', 3)
]

I came up with only three solutions - I don't feel like I have an ideal solution to this, just the least hacky variant of the first solution that popped up in my mind.

pair_list: The first solution, call List.map for every member of one list, then in the function argument call List.map again for every member of the second - flatten the resulting list using List.concat

let pair_list l1 l2 =
    List.map (fun x -> l2 |> List.map (fun y -> (x,y))) l1
    |> List.concat    

pair_seq: as above, but assume we can have sequences as input, so we can produce the (fairly large) output array lazily:

let pair_seq s1 s2 =
    Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1
    |> Seq.concat

pair_seq_expanded: as above, expand fully to a List for an idea of how long it takes to operate on the whole output:

let pair_seq_expanded s1 s2 =
    Seq.map (fun x -> s2 |> Seq.map (fun y -> (x,y))) s1
    |> Seq.concat
    |> Seq.toList

pair_seq_for: it's pretty clear that using a sequence is preferable here, especially if you need to work with 1000 x 1000 collections, so the final variant is a slight rewrite of the second, using a for loop and yield-ing the resulting tuple.

let pair_seq_for s1 s2 = 
    [ for x in s1 do 
        for y in s2 do 
            yield (x,y) ]

To compare the performance of these three I've defined 100 and 1000 element lists/sequences and measured how long it takes to iterate through each sequence pair performing a simple operation (accumulating the difference between the pairs).

time to create n-element collection of pairs in milliseconds

method n=10000 n=100000 n=1000000 n=10000000
pair_list 1.5096 14.7937 226.0501 2927.2477
pair_seq 0.8690 0.8690 0.8846 0.9028
pair_seq_expanded 3.3952 21.5028 184.3353 2264.2805
pair_seq_for 3.2361 12.5183 180.1352 1997.3700

So thankfully the cleanest looking pair_seq_for solution is actually the fastest when we get to larger data sets. This isn't quite where the story ends though. There's a nice discussion here on Stack Overflow about a similar but slightly different problem - finding n element combinations of a single list - so for

let someList = [ 1; 2; 3 ]

... we wanted a function combs (n:'a) (lst:'a list) which would produce something like the below for combs 2 someList

[
    ( 1, 2 ); ( 1, 3 )
    ( 2, 1 ); ( 2, 3 )
    ( 3, 1 ); ( 3, 2 )
]

This is different from the problem I posed, but I've got a GitHub gist here where I've turned them all loose on the same set of data, and performed some simple measurements. 

Pairing elements of Collections with their indexes

In a couple of places I found myself wondering if F# collections had an equivalent of python's enumerate - which is a function which wraps a list and returns an index/element pair for each loop iteration:

letters = [ "a", "b", "c", "d" ]
for i, c in enumerate(letters):
    print "%d: %s" % (i, c)

# output:
#     0: a
#     1: b
#     2: c
#     3: d

It took a little while before I spotted Array.mapi so I ended up working through and measuring a handful of different ways first - some are obviously pretty poor (particularly those using recursion) but I left them in nonetheless:

enumerate_by_for_seq - using a Seq to generate the index and return a pair

let enumerate_by_for_seq (a:string []) =
    seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) }

enumerate_by_for_seq_expanded - as previous, but returning a List to fully expand the sequence

let enumerate_by_for_seq_expanded (a:string []) =
    seq { for i in 0 .. (a.Length-1) -> (i, a.[i]) }
    |> Seq.toList

enumerate_by_for_list - iterating over each index using a for loop, returning a (int * string) list

let enumerate_by_for_list (a:string []) =
    [ for i in 0 .. (a.Length-1) -> (i, a.[i]) ]

enumerate_by_for_array - as above but returning (int * string[], note that this seems ridiculously similar, but differs surprisingly in performance (I discovered this by accident and included it in this experiment because of the difference!)

let enumerate_by_for_array (a:string []) =
    [| for i in 0 .. (a.Length-1) -> (i, a.[i]) |]

enumerate_by_map - generating a list of indexes and then using |> and List.map to create the index/element pair (i.e. the same as the first approach, but using List)

let enumerate_by_map (a:string []) =
    [0..(a.Length-1)]
    |> List.map (fun i -> (i, a.[i]))

enumerate_by_recursion_array - bonkers approach, abusing Array.append and recursing. Just don't do this...

let rec enumerate_by_recursion_array' i (a:string[]) =
    match i with
    | 0 -> [||]
    | _ -> Array.append [| (i, a.[i]) |] (enumerate_by_recursion_array' (i-1) (a.[1..]))

let enumerate_by_recursion_array (a:string[]) =
    enumerate_by_recursion_array' (a.Length-1) a

enumerate_by_recursion_list - List variant of the above. Don't do this either

let rec enumerate_by_recursion_list' i (a:string[]) =
    match i with
    | 0 -> []
    | _ -> [ (i, a.[i]) ] @ (enumerate_by_recursion_list' (i-1) (a.[1..]))
    
let enumerate_by_recursion_list (a:string[]) =
    enumerate_by_recursion_list' (a.Length-1) a

enumerate_by_for_zip - Using Array.zip - shortest solution, the best until I spotted Array.mapi

let enumerate_by_zip (a:string[]) =
    Array.zip a [|0..(a.Length-1)|]

enumerate_by_for_mapi - Probably the most "correct" solution, using Array.mapi

let enumerate_by_mapi (a:string[]) =   
    Array.mapi (fun i x -> (i,x)) a

enumerate_by_for_parallel_mapi - As above but naively switching in Array.Parallel.mapi without any other changes

let enumerate_by_parallel_mapi (a:string[]) =
    Array.Parallel.mapi (fun i x -> (i,x)) a

time taken to enumerate n element collection (milliseconds)

method n=10000  n=100000 n=1000000 n=10000000
enumerate_by_for_seq 0.3385 0.3496 0.3471 0.3540
enumerate_by_for_seq_expanded 2.6177 18.8341 205.4403 3610.3913
enumerate_by_for_list 1.3487 22.1703 248.5039 4200.8530
enumerate_by_for_array 2.1619 12.8186 192.3148 3178.5893
enumerate_by_map 2.0391 26.2468 287.2852 4179.3407
enumerate_by_recursion_array 7760.3141 n/a*  n/a* 
n/a*
enumerate_by_recursion_list 5368.5472 n/a* 
n/a* 
n/a*
enumerate_by_zip 7.1136 9.4388 170.0941 1917.8617
enumerate_by_mapi 2.6911 13.0303 116.5348 1268.8625
enumerate_by_parallel_mapi 8.1293 17.7548 102.2350 1379.0431

* = this took way too long so I killed it

Obviously Array.mapi was the fastest overall. However it wasn't as much faster than Array.zip as I would have imagined, and I suspect that I'm doing something wrong with Array.Parallel.mapi. Also interesting is that while the super-fast performance of the enumerate_by_for_seq method dissipates somewhat when fully evaluated, it is still faster than the equivalent enumerate_by_for_list version.

Pandigitals

"Pandigital" numbers feature relatively frequently in Project Euler. An n-digit pandigital number will contain all digits from 0..or 1..(n-1) once in some order. For example 41523 is a 1-5 pandigital, and 43210 is 0-4 pandigital. These numbers are mentioned in 32, 38, 41, 104, 118, 170 (and perhaps more) so a relatively efficient way to recognise them is pretty useful to have at hand. 

Again there's a few ways we can do this - in each case I can think of we start with taking the string representation of the input number and splitting it up using ToCharArray() and with this we can do a number of different things.

pandigital_strcmp - sort array, map each element to string, sort, create string + compare to "123..n"

let pandigital_strcmp (n:int) = 
    let sorted = new string (string(n).ToCharArray() |> Array.sort)
    sorted = pandigitalString

pandigital_intcmp - sort array, map each element to string, sort, create string, cast to int + compare to 123..n

let pandigital_intcmp (n:int) = 
    let sorted = new string (string(n).ToCharArray() |> Array.sort)
    int(sorted) = pandigitalInt

pandigital_arrcmp - sort array, string, cast to int + compare to existing array [| '1'; '2'; .. n |]

let pandigital_arrcmp (n:int) = 
    pandigitalArray = (string(n).ToCharArray() |> Array.sort)

pandigital_set_difference - convert to Set and compute difference from precalc'd set, pandigital if empty

let pandigital_set_difference (n:int) = 
    string(n).ToCharArray()
    |> Set.ofArray
    |> Set.difference pandigitalSet
    |> Set.isEmpty

pandigital_array_contains - for each element in precalculated pandigital array, check it's present in array and use List.fold to ensure all true

let pandigital_array_contains (n:int) = 
    let a = string(n).ToCharArray()
    pandigitalArray 
    |> Array.map (fun c -> Array.contains c a) 
    |> Array.fold (fun e acc -> e && acc) true

So I tested these against using the code to measure how quickly each method was in applying 

// where panDigitalInt is the upper limit ("n" in the table)
let testNumbers = [ 0 .. pandigitalInt ]
let bench name f =
    let sw = Stopwatch.StartNew()
    let res = testNumbers |> List.filter f |> List.length
    sw.Stop()
    printfn "%s: %f ms" name sw.Elapsed.TotalMilliseconds

time taken to filter pandigitals in [0..n] in milliseconds

method n=1234 n=12345 n=123456 n=1234567
pandigital_strcmp 2.1081 11.2639 113.2086 1356.1985
pandigital_intcmp 0.9716 9.7646 89.3238 947.0513
pandigital_arrcmp 2.4441 6.1932 59.7014 618.0665
pandigital_set_difference 2.5024 17.2115 199.2863 1986.9592
pandigital_array_contains 0.9790 4.8161 50.447 565.6698

So it seems Array.contains wins overall. The Set.difference approach was pretty dismal which was disappointing - it came to me when I was out walking my dog and I rushed back to write it and benchmark it. I think Set.ofArray is perhaps a little slow, but I haven't done any investigation into it.

It's worth noting that you probably shouldn't do something like [0..bigNumber] |> List.filter pandigital_array_contains to start with - maybe it's worth approaching the problem from a different angle in some cases.

Sorting a 3-element tuple

OK this only came up once and was part of a pretty poor solution I had for problem 39 (original, solution) - I generated thousands of tuple featuring 3 of integers and then tested whether they could represent the sides of right-angled triangles using Pythagoras' theorem. However since they were in no particular order I thought I needed identify the hypotenuse. I wrote this out long-form since there's only a handful of cases and solved the problem relatively quickly.

Regardless of whether this was a suitable solution for the problem, I was curious as to what approach works best for sorting these tiny collections of 3 elements.

I had only three approaches:

sort_nested_if - use nested if statements to reduce the number of comparisons needed while introducing branches
let sort_nested_if (a,b,c) =
    if a >= b then
        if b >= c then (a,b,c) else (a,c,b)
    elif b >= a then
        if a >= c then (b,a,c) else (b,c,a)
    else
        if a >= b then (c,a,b) else (c,b,a)

sort_flat_if - have a separate if for each result at the top level

let sort_flat_if (a,b,c) =
    if a >= b && b >= c then       (a,b,c)
    elif a >= b && b >= c then     (a,c,b)
    elif b >= a && a >= c then     (b,a,c)
    elif b >= a && c >= a then     (b,c,a)
    elif (*c >= b &&*) a >= b then (c,a,b)
    else (*c >= b && b >= a then*) (c,b,a)

sort_array - create an array, use Array.sort and map the results back into a tuple when returning the result

let sort_array (a,b,c) =
    let sorted = Array.sort [| a;b;c |]
    (sorted.[0], sorted.[1], sorted.[2])

To test these I generated large arrays of size 4000, 40000 and 400000 3-element tuples and timed how long each method took to sort them.

time taken to sort n 3-element tuples, in milliseconds

method n=4000 40000 400000
sort_nested_if 1.2626 13.9014 193.3619
sort_flat_if 1.7864 23.4633 258.2538
sort_array 1.2424 11.9907 132.4312

OK now it's probably obvious why I didn't just bin this little experiment - sort_array is surprisingly the clear winner. I would have guessed that the overhead of building an array and calling Array.sort function on a list way smaller than you'd normally need to sort would be insane. But apparently it's not!

Conclusion

It's surprising how many different ways you can write some relatively simple algorithms. Some of them are obviously pretty awful, like the recursive enumerate function (though I'm sure I can rearrange it to take advantage of tail call elimination) - and some were surprisingly performant, like the sort_array function in the final example. I've noticed that some other Project Euler people maintain a sort of library of functions they can re-use. Eventually I'd like to do something like this, but until it becomes unmanageable I'll just keep a Gist on GitHub:

PDFCat

I wanted an easy way to read the Handbook of Applied Cryptography ebook @ http://cacr.uwaterloo.ca/hac which is split into a number of separate PDFs, so I created PDFCat (PDF + "cat" from the *nix command). It uses CherryPy and PyPDF2 as well as Bootstrap for layout (because I wanted an excuse to try Bootstrap) and Sortable for the nice draggable, sortable list to change the order of the PDFs in the final document. 

GPS Speedometer with Python and SL4A

On a very hungover 8 hour train journey back from a spontaneous weekend trip to Berlin I was mucking around SL4A with my Nexus 7 and somehow we ended up wanting to know how fast the train was going and figured we could use the GPS positions to do this. A challenge was issued - I had to write this before we reached Brno. We were just passed Usti nad Labem at this point giving me plenty of time in theory, but my dwindling battery life added a bit of drama to the proceedings. Anyway I managed:

Here's what the output looks like when I run it on my morning tram to work:

The journey this tram follows looks a little like this:

Reshma's flight - EK139 (but only between the Black Sea and Austria)

You can get the FlightRadar24 data as a JSON object for playing with at http://db8.flightradar24.com/zones/full_all.js

I altered my Earth program from a previous post to periodically poll this URL to plot the progress of my Girlfriend's Emirates flight from Dubai to Prague, and created a nice little gif of this. Sadly her flight wasn't on the feed at all except for this very short period between the Black Sea and Austria so the GIF is rather shirt and boring!

I should've made this

When checking my twitter feed an eerily incomprehensible series of tweets from Ryan McLeod (of Consolevania and VideoGaiden fame) popped up.

The reason they were familiar is that they use Markov Chains to generate sort of original looking tweets, and I had previously done something similar to recreate a gibberish talking Brian Cox bot. I had hoped that the  bot would use the word "billions" a lot like Brian does in "Wonders..." but he just ended up tweeting nonsense to Brian's wife so I canned him.

Anyway the service is quite cool and I wish I had taken my stupid Brian Cox not idea a bit further and made it myself but never mind! 

Check it out: http://yes.thatcan.be/my/next/tweet


hiding Foreign Policy's social media popup

Foreign Policy has an annoying lightbox which occasionally pops up, prompts you to sign into facebook and prevents you from viewing articles until you sign in.

If you create a new bookmark with the text below as the address you can click it to hide this popup and read the story without logging in:

javascript: (function(){document.getElementById("TB_window").style.display = "none";document.getElementById("TB_overlay").style.display = "none";}());

taps aff

Definitions

taps aff: /tæps æf/, interjection

The removing of one's shirt or other upper body garments, most often in the event of warm weather but also often for purposes of celebration or simple antagonism.

Also
 taps.af: /tæps dɒt eɪ ɛf/, website

Weather service at http://taps.af borne of a drunken evening's nonsense talk followed by Easter Sunday morning in bed learning to use CherryPy and Heroku. Given a location a one word verdict on the weather is decided: "AYE" (indicating warm weather) or "NAW" (denoting colder, more unpleasant weather)

Screenshots


dcraw and stdin

You may or may not have noticed that dcraw.c has a "-I" command line argument reserved for read_input_from_stdin, however it doesn't as yet do anything - when you attempt to use it you'll see that dcraw outputs "No files to process" and quits. I crafted a quick-n-dirty hack to allow you to use this functionality.

Just apply the patch to dcraw.c and you're all set:

$ cat _MG_2184.CR2 | dcraw -e -v -w -I
Writing data to /var/tmp/tmp.0.thumb.jpg ...

It's rather shoddy but it did what I wanted it to do. One happy/weird side effect of how I implemented it (writing stdin to a temp file and stuffing the filename into argv[]) is that as well as converting from stdin it will also take care of any RAW files supplied as a command line argument:

$ cat _MG_2184.CR2 | dcraw -e -v -w -I _MG_2041.CR2 
Writing data to /var/tmp/tmp.0.thumb.jpg ...
Writing data to _MG_2041.thumb.jpg ...

The reason I employed this hack rather than substituting stdin for the FILE* objects used by dcraw to open and read the files was that fseek() was used at various points, and it doesn't play nice with stdin.

Awkward Texter (SL4A/Python is fun)

Have you ever felt like there's not enough awkward moments in your life? Well I have the solution in 3 easy steps!

1. Download SL4A by doing scanning the QR code below

2. Download the Python libraries for SL4A:

3. Paste this badboy in and run

SL4A is my favourite App. Now I just need to figure out how to use the text-to-speech interface in voice conversations...