Jupyter Notebook - Running under IIS on Windows

I use Jupyter Notebooks quite a lot for personal development, and to do this I set up an instance on a DigitalOcean droplet. However I also wanted to do something similar for some tasks at work which can access resources only available in our internal network. Since we use Windows that means I need to do something a little different. I could just use Anaconda to install and load it, but I find it a little clunky, I don't like how it tries to manage everything for you, I wanted a slightly nicer url than https://localhost:8000 and more importantly I wanted it to just be there without having to start anything up. So I set out to get Jupyter Notebook running on IIS using the ASP.NET Core Module V2 on http://notebook.local.

Make sure Python, IIS and the ASP.NET Core Hosting Bundle are installed. Open up a Terminal (I'm using bash, Powershell is probably file but I don't like it), create an new directory where our notebook application will be served, setup a venv there, and run the activation script, upgrade pip so that it doesn't complain at us and then finally install Jupyter Notebook:

$ cd ~/source/repos
$ mkdir notebook
$ python -m venv venv
$ source venv/Scripts/activate
$ pip install --upgrade pip
$ pip install notebook
To start with let's just run it on its own to check that it works:

$ jupyter notebook
[I 20:39:42.322 NotebookApp] The port 8888 is already in use, trying another port.
[I 20:39:42.328 NotebookApp] Serving notebooks from local directory: C:\Users\sean\source\external
[I 20:39:42.329 NotebookApp] Jupyter Notebook 6.1.4 is running at:
[I 20:39:42.329 NotebookApp] http://localhost:8889/?token=e27aa8d7754e1bb078dfdf48e7c55032ac3551360389fd65
[I 20:39:42.329 NotebookApp]  or
[I 20:39:42.329 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 20:39:42.476 NotebookApp]

    To access the notebook, open this file in a browser:
    Or copy and paste one of these URLs:

So here you can see why I'm doing all this - http://localhost:8888 isn't quite as nice as https://notebook.local. Next let's create a logs folder where our stdout logging will live and notebooks folder where we'll keep our notebook files.

$ mkdir logs notebooks

Next we'll create a Python script called launch.py that will launch Jupyter:

import os, sys
from IPython.terminal.ipapp import launch_new_instance
from IPython.lib import passwd
import warnings

# set the password you want to use
notebook_password = "test"

sys.argv.append("--NotebookApp.port=" + os.environ["ASPNETCORE_PORT"])
sys.argv.append("--NotebookApp.password=" + passwd(notebook_password))


This script is kind of interesting, but we'll go into it later once we've got everything up and running. Next we will need to create a Web.config file that'll be used by IIS and ANCMV2 to call this script with the Python executable and libraries from our venv. So let's create the following:

<?xml version="1.0" encoding="utf-8"?>
      <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModuleV2" resourceType="Unspecified" />
    <aspNetCore processPath=".\venv\Scripts\python.exe" 
        <environmentVariable name="PYTHONPATH" value="." />

Next we'll set up the new Site in IIS - so open up IIS Manager (hit Start and start typing "Internet Information Services" - it'll appear in the suggestions) then:

  1. in the left-hand menu right click the top-level node, hit "Add Website..."
  2. in the dialog that appears call the site whatever you like
  3. in the "Physical Path" section click the "..." button, and choose the directory your notebooks live in
  4. in the "Binding" section type "notebook.local" in the "Host name" section

Now nearly everything's in place we just need to modify our local DNS to resolve "notebook.local" to so add the following line to C:/Windows/System32/drivers/etc/hosts (you'll need to be an Administrator to do so):    notebook.local

So now we have a Website bound to "notebook.local" running under IIS, which is configured to use ASP.NET Core Runtime Module V2 to launch a Python script that will run Jupyter Notebook. Let's fire up a browser and navigate to http://notebook.local and enter the password test:

It loads nicely so let's just create a simple Python 3 notebook to confirm everything works end-to-end:

Excellent. Ok so let's rewind a little bit and ask the question: why do we use a special script instead of having our Web.config file launch jupyter notebook directly? Well that would be ideal, but a bit of a hack required. Notice in launch.py that we grab the ASPNETCORE_PORT environment variable and use that as our port? Well that contains the port IIS wants to communicate with our app in - so we need to hook Jupyter up to listen on that port. While it is possible to call jupyter notebook --port 8080 to specify the port we want to use, it's actually not possible to use the ASPNETCORE_PORT variable in our Web.config. These environment variables are not expanded in our arguments="..." attribute, (see issue #117 in the aspnet/AspNetCoreModule repo on GitHub) - so we need to run a script that grabs this from the Environment and sets it up. It's a bit hacky, but it gets us where we need to be.

Fedora - Jupyter on a Linux remote using systemd

When you want to do some experimentation or put together a simple code-based presentation Jupyter notebooks are a powerful tool to have at your disposal. But if you use a number of devices over a few locations it can be useful to have a single instance hosted somewhere central (Linode, Digital Ocean, wherever) that you can access from any device wherever you are. There are a handful of ways that you can achieve this:

  1. log in to your remote machine, set Jupyter up and run jupyter notebook (perhaps in a tmux session) then log out - do this whenever your machine reboots
  2. as above but using an existing docker image
  3. spin up an Azure notebook
  4. ... or we could do something like #1 - but have it setup under a separate user and administered via a systemd service

All four of the above are fine for different reasons and use-cases but here I'll talk about how I put #4 together in a little Linode instance running Fedora 25 - it's relatively simple, you can control over the kernels installed, and it's another excuse to get a bit more in-depth with another Linux subsystem (systemd).


All you need is a Linux system which uses systemd (Fedora 15.0 or newer, Debian 8.x or newer, Ubuntu 15.04 or newer, for example) which you have sudoer level access on, and Python 3.x. It's probably pretty straight-forward to set this up on systems using the SysV init but I won't cover them here.

Install and Set Up Jupyter 

First thing we need to do is install Jupyter and set up the user context which the Jupyter will be run under - which is a user called "jupyter":

$ sudo python3 -m ensurepip
$ sudo pip install jupyter
$ sudo useradd jupyter
$ sudo passwd jupyter

Next we should switch to the new jupyter user, create the directory our notebooks will live in and generate the Jupyter config we'll mess around with:

$ su - jupyter
$ mkdir notebooks
$ jupyter notebook --generate-config

The last command will create a new file ~/.jupyter/jupyter_notebook_config.py which we'll do a little messing around with shortly, but before this we'll set up a password 

$ python3
Python 3.5.2 (default, Sep 14 2016, 11:28:32) 
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd
>>> passwd() # below I enter "password123"
Enter password: 
Verify password: 

This will be used to log in to the application when its running. Open up the ~/.jupyter/jupyter_notebook_config.py file in a text editor and add/modify the following lines (using the SHA1 hash returned by the above):

c.NotebookApp.port = 8888
c.NotebookApp.ip = ''
c.NotebookApp.password = 'sha1:2eff88aac285:385c87867bd18fe852ee1d56b1010d4beed96969'

Setting up a Jupyter systemd service

Now we want to create a new systemd service so we can make sure our Jupyter notebook runs on startup, handles logging nicely and has all the other bells-and-whistles afforded to us by systemd. This is surprisingly simple - we want to create a new file jupyter.service in /usr/lib/systemd/system - this will tie together our newly installed Jupyter software and our newly setup jupyter user - using your favourite text editor create it so it looks like the below:

$ sudo cat /usr/lib/systemd/system/jupyter.service

ExecStart=/usr/bin/jupyter notebook --no-browser


Now all that's left to do is cross our fingers, enable our services, kick them off and browse to our remote box and login with our password:

$ sudo systemctl daemon-reload
$ sudo systemctl enable jupyter
$ sudo systemctl start jupyter

And if you want you can stop here - bookmark your http://www.xxx.yyy.zzz:port address and you're all set!


This was initially just an experiment - an excuse to test out my ability to put together a systemd .service file and do something more with a mostly-idle linux server sitting in a facility somewhere in Amsterdam. However I have found that I really like using this setup. When I was first shown Jupyter (née IPython) I was unimpressed and didn't see the point. However over the last few days I've been working through Project Euler problems again while teaching myself F# (using the IfSharp kernel) and I have found that it lends itself very well to my problem solving workflow on Project Euler.

IronPython - Integer performance improvements

When I was puddling around in IronPython ahead of an upcoming project I spotted something interesting - when we want to deal with integers within the IronPython interpreter we frequently call a function in Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers called Int32ToObject:

    public static object Int32ToObject(Int32 value) {
        // caches improves pystone by ~5-10% on MS .Net 1.1, this is a very integer intense app
        // TODO: investigate if this still helps perf. There's evidence that it's harmful on
        // .NET 3.5 and 4.0

        if (value < MAX_CACHE && value >= MIN_CACHE) {
            return cache[value - MIN_CACHE];
        return (object)value;
For integers in the range -100 and 1,000 we maintain a cached array of objects already converted from Int32. This is called a lot even in relatively innocuous looking programs - to see this we can a simple Console.Out.WriteLine("Int32ToObject") in there and count the number of times it occurs in a program that simply prints "Hello, world!" we can see this:
    $ cat hello.py
    print "Hello, world!"
    $ ./bin/Debug/ipy.exe hello.py | grep -c "Int32ToObject"

The code itself specifically references the pystone benchmark (which I found here) in a comment suggesting that we could see a performance improvement on pystone with versions of .NET newer than 3.5 - which appears to be the minimum version later versions of IronPython supports.

I built the Release configuration of IronPython both before and after removing this cache functionality, and tested the default pystone benchmark on my work computer (a pretty hefty 8-core Xeon E3-1275 @ 3.6 GHz, with 32GB RAM) - the results are below where I ran the test 10 times and took the average. The values output by the benchmark are "pystones per second" - where one "pystone" is an iteration through the main loop inside the Proc0() function which performs a number of integer operations and function calls:

Before  After 
1. 238262 234663
2. 239115 234595
3. 245149 245931
4. 237845 302562
5. 228906 295027
6. 248294 275535
7. 258694 297271
8. 246650 282791
9. 235741 296104
10. 233604 274396
Average 241226 273887

So with the fix we see 32,661 more of these iterations per-second, which is roughly a 13.5% improvement. This makes sense - presumably casting each int to object has been improved so that it's nearly free, leaving the overhead being a simple function call.

IronPython - Efficient string concatenation

Previously I'd done a bit of fiddling around with Python, revisiting some string concatenation benchmarks from an old-ish article and trying to explain some unexpected results. After playing around a bit with IronPython I was curious whether it'd be faster or slower than CPython on windows.

I installed the latest versions of both IronPython (2.7.5) and CPython (2.7.12) into my Windows 10 VM and re-ran the same set of tests.

The most interesting thing I learned was that some changes to how memory was allocated for the new buffer caused the "naive" method to be on par with the optimised version. As it turns out, IronPython doesn't actually have this - so running stest.py we get the following:

    $ ipy64 stest.py 1
    method 1
    time 2406.60858154 ms
    output size  86 kb

    $ ipy64 stest.py 6
    method 6
    time 46.9284057617 ms
    output size  86 kb

IronPython is a totally different beast to CPython, so my previous method of debugging - taking the code and examining it with the dis module doesn't yield anything useful:

This is because it compiles it into a different set of instructions to be executed using the .NET CLR (it's important to note that it does not go directly to .NET IL, there's still a level of instructions above this similar to CPythons opcodes).

However since we're on Windows with .NET we do have Visual Studio - which is arguably easier than working through python bytecode instructions in a text editor. To begin with it's extremely simple to find out where exactly we spend most of our execution time using dotTrace by JetBrains:

So the program execution is split with roughly 50% spent in initialisation (ImportSite, again!) but that's not included in our benchmark, however the remaining 50% is spent in String.Concat() in mscorlib (source here) which is what we're interested in:

    [System.Security.SecuritySafeCritical]  // auto-generated
    public static String Concat(String str0, String str1) {
        Contract.Ensures(Contract.Result() != null);
        Contract.Ensures(Contract.Result().Length ==
            (str0 == null ? 0 : str0.Length) +
            (str1 == null ? 0 : str1.Length));

        if (IsNullOrEmpty(str0)) {
            if (IsNullOrEmpty(str1)) {
                return String.Empty;
            return str1;

        if (IsNullOrEmpty(str1)) {
            return str0;

        int str0Length = str0.Length;
        String result = FastAllocateString(str0Length + str1.Length);
        FillStringChecked(result, 0,          str0);
        FillStringChecked(result, str0Length, str1);
        return result;

This explains why things are so slow - when concatenating two strings there are no realloc-based tricks like CPython had - we allocate a new memory buffer every time, copy both strings into it, and let the garbage collector handle the old buffers.

Sadly it's pretty non-trivial for someone like me to implement a similar optimisation here - since we depend on the underlying string implementation in .NET we're stuck with how string concatenation was implemented there. I toyed with the idea of re-writing a hacky reimplementation of FastAllocateString as FastReallocateString specifically for the += operator (it's possible to do - we need to change PythonBinaryOperationBinder.BindDelegate() to handle Add and AddAssign differently) this would've involved getting stuck into the mscorlib sources in coreclr - and I'd rather stay in Python-land for the time being.

However since it's possible to access the .NET standard libraries from IronPython we can at least test how System.Text.StringBuilder performs, since it is designed to solve this very problem. So I setup the stest.py code I previously used, and re-ran them on my Windows 10 VM for both CPython 2.7.12 and IronPython 2.7.5. Just to quickly recap, here are the string concatenation methods I tested:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

Method 7: using System.Text.StringBuilder (IronPython only)

runtime (ms) concatenations per second
method 1 16.00 1,250,000
method 2 108.99 183,503
method 3 14.99 1,334,222
method 4 3.00 6,666,666
method 5 6.00 3,333,333
method 6 2.00 10,000,000

runtime (ms) concatenations per second
method 1 2517.44 7,944
method 2 3968.87 5,039
method 3 25.39 787,711
method 4 42.13 474,721
method 5 35.56 562,429
method 6 33.22 602,046
method 7 22.43 891,662


So in IronPython the most efficient way to join strings together is to hook into .NET's System.Text library and use StringBuilder, no surprises there. What was surprising was how much slower IronPython was than CPython. I'm curious if this is just a freak result or if IronPython is known to be pretty slow. I'll probably not attempt to speed up the AddAssign op in IronPython, however I would like to investigate why IronPython is so slow, and if there are any plans to improve things. In addition I was surprised that the realloc trick had little-to-no effect on CPython in Windows (i.e. method 1 was slow even on 2.7.12).

I am a little sick of this benchmark now - I might revisit it one final time to compare it across CPython, IronPython, PyPy and Pyjion to complete the picture, but only if I'm really bored :)

IronPython - tackling some unexpected Exceptions

After I listened to an episode of Talk Python To Me featuring Alex Earl from the IronPython project I learned that not only is IronPython not dead/dying, but it's actually seeing a bit of a resurgence recently. I grabbed the sources from the IronLanguages GitHub, setup my dev environment, opened it up and launched the IronPythonConsole project hoping to see the familiar python REPL.

However instead I saw that it had hit an exception:

I was frustrated at first, thinking I'd done something wrong, but realised that getting to the bottom of an Exception was a fine way to introduce yourself to a new codebase.

The exception itself is a ZipImportError with the text "not a Zip file" and is thrown in the constructor for zipimporter.

Python Confession #1: I had never heard of or used zipimporter before.

Since I'd never heard of the class before I had no idea why the IronPython runtime would be calling this and especially on something which didn't appear to exist. So it's time to dig through the call stack to see where this comes from:

It appears that PythonCommandLine.ImportSite kicks this process off so that's where I started looking:

    private void ImportSite() {
        if (Options.SkipImportSite)
        try {
            Importer.ImportModule(PythonContext.SharedContext, null, "site", false, -1);
        } catch (Exception e) {
            Console.Write(Language.FormatException(e), Style.Error);

It turns out that site is a special Python module which is imported by default when the interpreter starts (for all implementations - IronPython, JythonPyPy and good old vanilla CPython). It's responsible for doing some platform-specific module path setup.

Python Confession #2: I had never heard of the site module before.

So how does importing site cause us to execute some code relating to zipimporter? Searching through the call stack at the point of the Exception shows that all this seems to come from FindImporterForPath which takes every function in path_hooks and attempts to apply it to the path we're importing.

    /// Finds a user defined importer for the given path or returns null if no importer
    /// handles this path.
    private static object FindImporterForPath(CodeContext/*!*/ context, string dirname) {
        List pathHooks = PythonContext.GetContext(context).GetSystemStateValue("path_hooks") as List;

        foreach (object hook in (IEnumerable)pathHooks) {
            try {
               object handler = PythonCalls.Call(context, hook, dirname);

                if (handler != null) {
                    return handler;
            } catch (ImportException) {
                // we can't handle the path

So we call every path_hook with the module we're importing as an argument using PythonCalls.Call()The path_hooks themselves come from the sys module:


A list of callables that take a path argument to try to create a finder for the path. If a finder can be created, it is to be returned by the callable, else raise ImportError.

Python Confession #3: I had never heard of or used path_hooks before.

So what is in path_hooks? If I keep hitting continue on the Visual Studio debugger the Exception is caught, I reach the python REPL and can inspect what is in sys.path_hooks:

And there it is - zipimporter. Now we're approaching an explanation - when the IronPython interpreter is initialised it imports the site module which takes the everything in path_hooks and applies them to all the modules in our path - but since there are no .zip files anywhere in our path zipimporter (the only path hook) cannot find anything to operate on, so throws an exception which is normally caught and handled.

So this is normal behaviour - the exception is even expected, since path_hooks' documentation states that if a path_hook fails it raises an Exception.


OK nothing special has happened here since IronPython is behaving exactly as it should, however unexpected it was to me. That said, this is a very nice way to learn some new Python concepts, like:

  1. the zipimport module
  2. the site module
  3. sys.path_hooks 

I even have a half-working clone of zipimporter for tarballs called tgzimporter, but there's little need for such functionality as I suspect that even zipimporter is seldom used. 

It would've been easy to just keep hitting the "F5" key until I hit the REPL, but then I would likely have struggled to find a way to approach the source code and perhaps would've put it off indefinitely. Hopefully now I'll find some way to continue to improve my understanding and contribute to the IronPython project.

Thinkpad X250 - SMS and GPS via Python

When I was messing around trying to get the 4G modem working on my Thinkpad X250 for this post I ended up doing a little debugging by connecting to the modem over serial using cu:

    $ cu -h -l /dev/ttyACM0

It turns out that the Sierra EM7345 modem in my Thinkpad can be controlled through AT commands sent over serial. Over at zukota.com there's a guy who's done a seriously good job of experimenting with this modem and documenting what was previously not very well documented.  

For example if we wanted to use a SIM locked with the PIN "1234", then send the message "Hello, world!" to the Czech number 775123456, we would connect as above and enter the following:


    > Hello, world!^Z^M
    +CMGS: 40


Getting the GPS position involved issuing the XLCSLSR at command and parsing it's output. This is a little more complicated, since there's a slight delay in the response, and the response contains dozens of fields (I've included the request/response in its entirety below so you can see):

    +XLCSLSR: request id 2


    +XLCSLSR: 2, 49.195669 N, 16.606075 E, 119.996932, 48.743179, 39.616302,143, 100.997169,67,2016/05/10,18:38:22,0,1,75.45,2.28,-0.25,2.20,0.64,239919,239919.74,,,4.50,2.50,3.50,118.92,62.80,100.98,,,1,1896,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4,73.04,70.04,21.00,1,,,,21 ,61.03,283.14,41.00,1,,,,16 ,20.01,173.09,30.00,1,,,,10 ,17.01,100.05,32.00,1,,,,29 


I was delighted when I found this out, as I had recently discovered the pyserial library, and realised I could interact with the modem and explore/expose some of the modem's functionality via a Python library.


I rolled up my sleeves and wrote a library called em73xx (PyPI, GitHub) which will issue the necessary AT commands to send and receive SMS messages, and retrieve the current location using the modem's GPS functionality.

To install the library either retrieve from pypi using pip:

    $ pip install em73xx

... or you can clone the smcl/em73xx repo and install using setup.py:

    $ git clone https://github.com/smcl/py-em73xx
    $ cd py-em73xx
    $ python setup.py install

Once it's installed you can use em73xx by importing the Modem class and instantiating it with the path to the serial device on your system (probably /dev/ttyACM0, as below) and optionally specify the PIN for the sim inserted:

    from em73xx import Modem
    modem = Modem("/dev/ttyACM0", pin="1234")

To retrieve all the SMS messages on the SIM, and print them to stdout we can use the getSMS method:

    messages = modem.getSMS()
    for m in messages:
        print("from %s: % (m.sender))
        print("\t%s" % (m.message))

If we want to send a message we can use the sendSMS method:

    # request an 60 minute tram ticket in brno!
    modem.sendSMS("90206", "BRNO")

And finally if we want to retrieve the current GPS position, we can use the getGPS method - note that the modem can fail to get a GPS fix, and will return None if so:

    gps = modem.getGPS()
    if gps:


Ultimately I started this because I wanted to have a simple utility that integrated with xmobar which would inform me whether I had received a new SMS, and would allow me to reply to existing messages or compose an entirely new one. To achieve this I wrote xsms which is a utility that will either print to stdout the number of unread messages (for xmobar) or launch a GUI, depending on the command line switches used.

Again you can either retrieve xsms from pypi using pip:

    $ pip install xsms

... or clone the smcl/xsms repo from github and install using setup.py:

    $ git clone https://github.com/smcl/xsms
    $ cd xsms
    $ python setup.py install

Once xsms is installed you can either launch it standalone.

    $ python -m xsms --device=/dev/ttyACM0

And if you want to have a little indicator in your xmobar you can use something like the below (which takes advantage of the ability to specify the font via tags to easily get some icons from Font Awesome):

    -- assumes you have Font Awesome installed and used here:
    -- additionalFonts = ["xft:FontAwesome-10"],
    Run Com "/usr/bin/python" [ "-m", "xsms", "-d", "/dev/ttyACM0", "-p", "1234", "-r", "", "-u", " %d" ] "xsms" 600,

So when you add %sms% to your xmobarrc's template and restart you'll see something like this:

... and if you want to be able to click the icon to raise the GUI, you can surround it with an <action> which invokes xsms:

  template = "%StdinReader% }{ ... stuff ... <action=`python -m xsms -g -d /dev/ttyACM0 -p 1234`>%xsms%</action> ... "


This is an idea that sat half-finished in my ~/dev folder for about 6 months, and I'm really glad that I was able to take it from a single hacky one-shot script to two fully-fledged packages released on PyPI and ready to be used. There is still some additional functionality I'd like to add to em73xx, for example I only grab longitude/latitude from the GPS data and I'd like to be able to reset or power-cycle the modem in case of problems, however it's in pretty good shape overall.

As for xsms, it's the second Tkinter application I've put together, and while I'm finding it easier each time I write something (with fewer cut-paste lines from effbot.org's Tkinter docs) it's starting to be a little unwieldy. For example the ttk module allows you to add widgets to your application with a unified theme, but it's missing a multi-line text entry widget - which is something I use for inputting and displaying the SMS messages. This meant that I had to either decide to put off styling xsms or add some hacks to customise the Tkinter.Text widget that I ultimately used. In addition programatically constructing the UI using the grid() and pack() layout managers feels a little like creating a webpage laid out using nested <table> elements in the late 90's. Ultimately if I find myself writing a Python desktop app in future I'll spend a little more time investigating the frameworks available and weighing them up against using Tkinter, now that I'm broadly familiar with it.

Useful Links

Python - Efficient String Concatenation in Python (2016 edition)

After working through Philip Guo's excellent series of lectures on the CPython VM I found myself looking at the performance of string concatenation. Strings are immutable in python - so they cannot be modified in-place, you have to create a new string. This also affects string concatenation, so if we want to tack one string onto the end of another we might have some code like this:

    x = "hello"
    x += ", world!"

When the CPython interpreter is running the above code, it's doing something conceptually similar to the following:

  1. allocate a buffer for the string "hello" 6 bytes => len("hello") + 1, and copy the string into it
  2. make x reference this
  3. allocate a new buffer with length 14 bytes => len("hello") + len(", world") + 1
  4. copy "hello", then ", world" into this new buffer, and make x reference it
  5. decrement reference on the old string, possibly free it

Note even though Python is unlike C in that it doesn't rely on a trailing NULL to terminate strings (we store each string's length), it does by convention still null-terminate them - which is where the extra "+ 1" byte comes from.

Anyway, if we have to repeatedly join strings together we have to be careful how we do it, just in case we accidentally introduce some slow, inefficient code. This has been long understood in the python community, for instance here is an article from 2004 which explored different methods of concatenating strings, and compared their performance. 

I was curious if there was any difference on how things shaped up on modern hardware (the test used an old 400 MHz Celeron) and a newer version of Python (this used python 2.2.1, the latest version of python 2 is 2.7.12) - so I grabbed the source code and started playing around.

It needed a tiny bit of tweaking however - the timing module it uses doesn't seem exist in the standard library and current version in pypi isn't compatible. I modified it to use the time module, the source code is here on GitHub if you're interested: stest-new.py

As a quick recap, there are 6 methods of concatenating strings which we're putting under the microscope:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

The original tests were performed using Python 2.2.1, for comparison's sake I've re-run them on my computer just to see:

    $ for i in {1..6}; do python stest.py $i; done

The results for Python 2.2.1 are below:

runtime (ms)  concatenations per second 
Method 1  55.11 362,910
Method 2  74.67 267,852
Method 3  10.80 1,851,337
Method 4  6.21 3,220,611
Method 5  8.11 2,467,612
Method 6  5.42 3,694,808

So in Python 2.2.1 the list comprehension method was the fastest by a pretty clear margin. However when I re-ran using 2.7.12 things turned out very differently:

runtime (ms) concatenations per second
Method 1 1.7995 11,113,977
Method 2 90.1073 221,957
Method 3 3.9557 5,055,967
Method 4 2.1804 9,172,689
Method 5 4.8047 4,162,585
Method 6 1.4191 14,093,289

In the time since 2.2.1 the performance of the naïve string concatenation method has improved hugely, it's now it's the fastest method (note: this is using a Python interpreter I built, using the Python 2,7.12 package that comes with Debian it's actually the fastest). This is surprising, since I thought it was relatively well-established and understood that it was slow, or just not ... pythonic. I was curious exactly why Method 6 was now slower than Method 1, so I disassembled the functions using the dis module.

There were a number of SET_LINENO instructions in the 2.2.1 version which I've not shown - it makes the disassembly a little easier to read and the performance impact would have been negligible - when tracing is turned off (which it is) all this instruction did was set the current frame's f_lineno and continue executing the next instruction.

Disassembly - Method 1 

Python 2.2.1 Python 2.7.12
 6 LOAD_CONST               1 ('')
 9 STORE_FAST               0 (out_str)
15 SETUP_LOOP              37 (to 55)
18 LOAD_GLOBAL              1 (xrange)
21 LOAD_GLOBAL              2 (loop_count)
24 CALL_FUNCTION            1
27 GET_ITER            
31 FOR_ITER                20 (to 54)
34 STORE_FAST               1 (num)
40 LOAD_FAST                0 (out_str)
43 LOAD_FAST                1 (num)
47 INPLACE_ADD         
48 STORE_FAST               0 (out_str)
51 JUMP_ABSOLUTE           28
54 POP_BLOCK           
58 LOAD_FAST                0 (out_str)
61 RETURN_VALUE              
 0 LOAD_CONST               1 ('')
 3 STORE_FAST               0 (out_str)
 6 SETUP_LOOP              31 (to 40)
 9 LOAD_GLOBAL              0 (xrange)
12 LOAD_GLOBAL              1 (loop_count)
15 CALL_FUNCTION            1
18 GET_ITER            
19 FOR_ITER                17 (to 39)
22 STORE_FAST               1 (num)
25 LOAD_FAST                0 (out_str)
28 LOAD_FAST                1 (num)
32 INPLACE_ADD         
33 STORE_FAST               0 (out_str)
36 JUMP_ABSOLUTE           19
39 POP_BLOCK           
40 LOAD_FAST                0 (out_str)
43 RETURN_VALUE             
So pretty much identical, which I half-expected - even though we're looking at two versions of Python which were released 14 years apart the compiler doesn't seem to get a great deal of changes, so the generated bytecode shouldn't change much for simple code. 

Disassembly - Method 6

Python 2.2.1 Python 2.7.12
 6 LOAD_CONST               1 ('') 
 9 LOAD_ATTR                0 (join)
12 BUILD_LIST               0
15 DUP_TOP             
16 LOAD_ATTR                1 (append)
19 STORE_FAST               0 (_[1])
22 LOAD_GLOBAL              3 (xrange)
25 LOAD_GLOBAL              4 (loop_count)
28 CALL_FUNCTION            1
31 GET_ITER            
35 FOR_ITER                17 (to 55)
38 STORE_FAST               2 (num)
41 LOAD_FAST                0 (_[1])
44 LOAD_FAST                2 (num)
48 CALL_FUNCTION            1
51 POP_TOP             
52 JUMP_ABSOLUTE           32
55 DELETE_FAST              0 (_[1])
58 CALL_FUNCTION            1
61 STORE_FAST               1 (out_str)
67 LOAD_FAST                1 (out_str)
70 RETURN_VALUE        
 0 LOAD_CONST               1 ('')
 3 LOAD_ATTR                0 (join)
 6 BUILD_LIST               0
 9 LOAD_GLOBAL              1 (xrange)
12 LOAD_GLOBAL              2 (loop_count)
15 CALL_FUNCTION            1
18 GET_ITER            
19 FOR_ITER                13 (to 35)
22 STORE_FAST               0 (num)
25 LOAD_FAST                0 (num)
29 LIST_APPEND              2
32 JUMP_ABSOLUTE           19
35 CALL_FUNCTION            1
38 STORE_FAST               1 (out_str)

41 LOAD_FAST                1 (out_str)
44 RETURN_VALUE        

There are slightly more differences when comparing how 2.2.1 and 2.7.12 generate the bytecode for the list comprehension method. However, aside from a couple of quirks in the 2.2.1 version (i'm not sure why we call DUP_TOP on the list we created, and I have no idea what _[1] is) much of the bytecode is broadly the same - we produce a list of integers by applying CALL_FUNCTION to xrange with argument loop_count, and then iterate over the results, calling UNARY_CONVERT on each and assembling either a list or string using INPLACE_ADD or LIST_APPEND

Since the generated bytecode contains no substantial differences, if we want to understand why the naive concatenation method (which uses INPLACE_ADD) became super-fast over the last 14 years we'll need to inspect how the Python VM interprets this code.

Analysis - INPLACE_ADD in Python 2.2.1

To save some space and time I'll skip straight to the meat and bones of where the actual string concatenation occurs - which is in Objects/stringobject.c, in the string_concat() function. It's quite small, so I've incuded it below - but I've stripped some macros for easier reading:

static PyObject *
string_concat(register PyStringObject *a, register PyObject *bb)
        register unsigned int size;
        register PyStringObject *op;
        if (!PyString_Check(bb)) {
                             "cannot concatenate 'str' and '%.200s' objects",
                return NULL;
#define b ((PyStringObject *)bb)
        /* Optimize cases with empty left or right operand */
        if ((a->ob_size == 0 || b->ob_size == 0) &&
            PyString_CheckExact(a) && PyString_CheckExact(b)) {
                if (a->ob_size == 0) {
                        return bb;
                return (PyObject *)a;
        size = a->ob_size + b->ob_size;
        /* PyObject_NewVar is inlined */
        op = (PyStringObject *)
                PyObject_MALLOC(sizeof(PyStringObject) + size * sizeof(char));
        if (op == NULL)
                return PyErr_NoMemory();
        PyObject_INIT_VAR(op, &PyString_Type, size);
        memcpy(op->ob_sval, a->ob_sval, (int) a->ob_size);
        memcpy(op->ob_sval + a->ob_size, b->ob_sval, (int) b->ob_size);
        op->ob_sval[size] = '\0';
        return (PyObject *) op;
#undef b

This is pretty much exactly the algorithm I described in my first paragraph - after a couple of checks to ensure we're definitely  dealing with strings we malloc a new buffer and memcpy both strings into it.

Analysis - INPLACE_ADD in Python 2.7.12

Again I'll skip straight to where the concatenation occurs - which is in Python/ceval.c in the string_concatenate() function: 

static PyObject *
string_concatenate(PyObject *v, PyObject *w,
                   PyFrameObject *f, unsigned char *next_instr)
    /* This function implements 'variable += expr' when both arguments                         
       are strings. */
    Py_ssize_t v_len = PyString_GET_SIZE(v);
    Py_ssize_t w_len = PyString_GET_SIZE(w);
    Py_ssize_t new_len = v_len + w_len;
    if (new_len < 0) {
                        "strings are too large to concat");
        return NULL;

    if (v->ob_refcnt == 2) {
        /* In the common case, there are 2 references to the value                             
         * stored in 'variable' when the += is performed: one on the                           
         * value stack (in 'v') and one still stored in the                                    
         * 'variable'.  We try to delete the variable now to reduce                            
         * the refcnt to 1.                                                                    
        switch (*next_instr) {
        case STORE_FAST:
            int oparg = PEEKARG();
            PyObject **fastlocals = f->f_localsplus;
            if (GETLOCAL(oparg) == v)
                SETLOCAL(oparg, NULL);
        case STORE_DEREF:
            PyObject **freevars = (f->f_localsplus +
            PyObject *c = freevars[PEEKARG()];
            if (PyCell_GET(c) == v)
                PyCell_Set(c, NULL);
        case STORE_NAME:
            PyObject *names = f->f_code->co_names;
            PyObject *name = GETITEM(names, PEEKARG());
            PyObject *locals = f->f_locals;
            if (PyDict_CheckExact(locals) &&
                PyDict_GetItem(locals, name) == v) {
                if (PyDict_DelItem(locals, name) != 0) {

    if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
        /* Now we own the last reference to 'v', so we can resize it                           
         * in-place.                                                                           
        if (_PyString_Resize(&v, new_len) != 0) {
            /* XXX if _PyString_Resize() fails, 'v' has been                                   
             * deallocated so it cannot be put back into                                       
             * 'variable'.  The MemoryError is raised when there                               
             * is no value in 'variable', which might (very                                    
             * remotely) be a cause of incompatibilities.                                      
            return NULL;
        /* copy 'w' into the newly allocated area of 'v' */
        memcpy(PyString_AS_STRING(v) + v_len,
               PyString_AS_STRING(w), w_len);
        return v;
    else {
        /* When in-place resizing is not an option. */
        PyString_Concat(&v, w);
        return v;

This one is a little more complex because there are a couple of neat optimisations. 

Firstly, if the next instruction to be executed is one of a handful (STORE_FAST, STORE_DEREF, STORE_NAME) we save ourselves a few cycles by doing a little setup ahead of time.

However more importantly, if there's only a single reference to the destination string we attempt to resize it in-place using PyString_Resize() instead of allocating an entirely new buffer. We can check this is the case by forcing this condition to be false:

    //if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {                                  
    if (0) {
        /* Now we own the last reference to 'v', so we can resize it                           
         * in-place.                                                                           

If we build a Python 2.7.12 compiler without this resize optimisation, and retest:

runtime (ms) concatenations per second
Method 1 48.7949 409,878
Method 2 89.2184 224,168
Method 3 3.9204 5,101,535
Method 4 2.1489 9,307,231
Method 5 4.8215 4,148,073
Method 6 1.4203 14,081,697

We're right back where we started, with Method 1 being the second-slowest and the list comprehension used in Method 6 outperforming everyone else. 


In the intervening years since the original benchmark was performed, an optimisation has been introduced to improve how the CPython VM handles string concatenation. In the context of this specific benchmark it means that if you need to concatenate strings you can go with a naive, slightly more intuitive implementation and performance very close to the recommended, more-pythonic implementation. 

It's very likely that this is not applicable in all cases, I'm sure that if we have a number of objects allocated on the heap then it'll get slower very quickly (we won't be able to extend that buffer every time, and will need to find new locations) - however it's still an interesting and surprising result nonetheless.

Python - using a DualShock 4 with pygame

The Playstation 4 came out in 2013 and came with the DualShock 4 controller. I bought one in the midst of a particularly tough hangover, and when I received it two days later the part that most intrigued me was the controller. It's quite an amazing piece of equipment - I originally wanted to claim that the controller alone sported higher specs than the original PSX, and while the CPU is beefier (ARM Cortex M3 @ 160MHz vs MIPS R3K @ 33.8MHz) however the controller has a hundred or so Kb of RAM versus the PSX' 2MB.

That said the interfaces are pretty cool - in addition to the standard d-pad and X, O, triangle + square  buttons there's a couple of analog sticks, shoulder buttons (2 x analogue) and a handful of accelerometers for motion sensing. All in all a nice little controller, no wonder mine needs charged every couple of days.

I thought that it'd be nice to play with some of these sensors and write OpenGL code in python. So I put together a quick demo - a 3D cube which is controlled by the accelerometers. 

If we want to read events from the controller we can use pygame which can handle a lot of the fiddly parts (preventing the need for us to mess with something like evdev directly) - assuming we only have only one controller connected we can do this:

    import pygame
    controller = pygame.joystick.Joystick(0)

What we'll do is repeatedly look for events, and then save the data we've read back into a couple of dictionaries - this roughly looks like the below:

    axis = {}
    button = {}

# these are the identifiers for the PS4's accelerometers AXIS_X = 3 AXIS_Y = 4
# variables we'll store the rotations in, initialised to zero rot_x = 0.0 rot_y = 0.0 # main loop while True: # copy rot_x/rot_y into axis[] in case we don't read any axis[AXIS_X] = rot_x axis[AXIS_Y] = rot_y # retrieve any events ... for event in pygame.event.get(): if event.type == pygame.JOYAXISMOTION: axis[event.axis] = round(event.value,2) elif event.type == pygame.JOYBUTTONDOWN: button[event.button] = True elif event.type == pygame.JOYBUTTONUP: button[event.button] = False rot_x = axis[AXIS_X] rot_y = axis[AXIS_Y] # do something with this ...

It's important to note that we don't necessarily receive updated values each time we call pygame.event.get() - so we need to be careful about how we handle this.

Now that we've got some rotation values from the controller, we can use them to render and rotate a simple cube on screen:

    glRotatef(roty_scale * rot_y, 1, 0, 0)
    glRotatef(rotx_scale * rot_x, 0, 0, 1)
    for edge in edges:
        for vertex in edge:

However I could only see accelerometers for two axes - to perform the rotation through the third axis I wanted to use the L2 and R2 shoulder buttons. However these really are a funny beast. After a bit of experimenting I noticed that there's some extremely weird quirk. Joystick input using pygame can either return analogue ("axis") or boolean ("button") data - L2 and R2 buttons are unique in that they have both. The peculiar thing is how these values change based on how far the button is pressed - the below shows how the axis/button data changes as you press the shoulder button:

There's an ambiguity between the button being halfway and fully pressed. I'm not sure if this is a bug or just a missing feature in pygame. However since I was just using the buttons to control rotation, I was able to carefully select a scaling value (1.0 = 90 degrees) to apply to the value we read so that it's not possible to see this. Computer graphics has a long and storied history of such careful hacks to give the illusion of something working, so I was happy to let this slide!

I added a couple more things (shifting the cube around using the touchpad, lighting the cube depending on the buttons pressed) and the results are here this gist if anyone's curious.

Finally I wanted to use the lightbar - each DualShock 4 has an array of Red, Green and Blue LEDs which can be set independently to 256 different brightness levels. This is put to neat use in things like Grand Theft Auto - when the police are on you the lightbar flashes blue and red.

Sadly there doesn't appear to be any way to handle this through the pygame interface, and when I tried dropping to the lower-level evdev interface the leds() method didn't return anything.

I did however notice that a couple of devices appeared in the /sys/class/leds directory, so I could play around with them by writing 0 .. 255 values to these devices. Sadly they're not so easy to detect (the device name changes each time it's connected) so I have to find the device names using a slightly nasty regex:


XMonad - NetworkManager menu using xmobar and Tkinter (xnm)

I recently moved to using XMonad semi-fulltime, which is working out nicely most of the time. However the one sticking point is that when I try to work somewhere other than my flat or my office I had to drop to the command line and use nmcli to connect to wifi or to enable my builtin 4G modem.

This is less than ideal, there doesn't appear to by any simple NetworkManager popup/interface I could integrate easily with my xmobar setup - I wanted to have a little icon launch a UI that allowed me to the wifi network to connect to or switch on my 4G modem.

To address this I put together a little python app using Tkinter which updates network connectivity settings through NetworkManager's D-Bus API. The app is launched by clicking an area on Xmobar containing the dynnetwork widget beside a neat little old-school TTY icon:

Clicking this area will raise a menu like the below - listing WiFi and Modem interfaces

Above you can see that /dev/cdc-wdm0 is connected to my "Vodafone CZ" connection - there's a little chain icon beside it. Clicking on this connection would disconnect. Selecting one of the WiFi networks would have either connected automatically (if it was Open) or raised a popup asking for a password (if it was password-protected).

To achieve this you need to do a couple of simple things. Firstly ensure that the necessary dependencies are installed, and checkout the code

    $ sudo apt-get install python-tk python-networkmanager
    $ git clone https://github.com/smcl/xnm

Then ensure your xmobar template has the following line, which will display the dynnetwork info beside a TTY (assuming Font Awesome is installed as Additional Font #1 in xmobar):

<action=`/home/sean/.xmonad/xnm.py`>%dynnetwork% <fn=1></fn></action>

And that's it!

Raspberry Pi - Hacking a Canon DSLR shutter release (part 1)

In a previous post I created a simple timed shutter release from an Arduino, and used this to create a couple of time lapse videos. The drawback of this was that changing timing was a little tough - you had to update the source code and reflash the program to the Arduino.

Since I had a Raspberry Pi sitting around with similarly simple programmable GPIO pins I created a simple web frontend. Again hooking up the shutter release involved a pretty simple circuit based around a NPN transistor - I selected a random GPIO pin (pin 12):

The program uses the RPi.GPIO package to take the pictures, and the web frontend is a simple Bootstrap interface served by CherryPy. Currently all you can do is start/stop the timer, change the timing, take a picture ad-hoc.

I wrote this during winter and tried to capture a bit of snowmelt: