IronPython - Integer performance improvements

When I was puddling around in IronPython ahead of an upcoming project I spotted something interesting - when we want to deal with integers within the IronPython interpreter we frequently call a function in Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers called Int32ToObject:

    public static object Int32ToObject(Int32 value) {
        // caches improves pystone by ~5-10% on MS .Net 1.1, this is a very integer intense app
        // TODO: investigate if this still helps perf. There's evidence that it's harmful on
        // .NET 3.5 and 4.0

        if (value < MAX_CACHE && value >= MIN_CACHE) {
            return cache[value - MIN_CACHE];
        }
        return (object)value;
    }
For integers in the range -100 and 1,000 we maintain a cached array of objects already converted from Int32. This is called a lot even in relatively innocuous looking programs - to see this we can a simple Console.Out.WriteLine("Int32ToObject") in there and count the number of times it occurs in a program that simply prints "Hello, world!" we can see this:
    $ cat hello.py
    print "Hello, world!"
    $ ./bin/Debug/ipy.exe hello.py | grep -c "Int32ToObject"
    1317

The code itself specifically references the pystone benchmark (which I found here) in a comment suggesting that we could see a performance improvement on pystone with versions of .NET newer than 3.5 - which appears to be the minimum version later versions of IronPython supports.

I built the Release configuration of IronPython both before and after removing this cache functionality, and tested the default pystone benchmark on my work computer (a pretty hefty 8-core Xeon E3-1275 @ 3.6 GHz, with 32GB RAM) - the results are below where I ran the test 10 times and took the average. The values output by the benchmark are "pystones per second" - where one "pystone" is an iteration through the main loop inside the Proc0() function which performs a number of integer operations and function calls:


Before  After 
1. 238262 234663
2. 239115 234595
3. 245149 245931
4. 237845 302562
5. 228906 295027
6. 248294 275535
7. 258694 297271
8. 246650 282791
9. 235741 296104
10. 233604 274396
Average 241226 273887

So with the fix we see 32,661 more of these iterations per-second, which is roughly a 13.5% improvement. This makes sense - presumably casting each int to object has been improved so that it's nearly free, leaving the overhead being a simple function call.

IronPython - Efficient string concatenation

Previously I'd done a bit of fiddling around with Python, revisiting some string concatenation benchmarks from an old-ish article and trying to explain some unexpected results. After playing around a bit with IronPython I was curious whether it'd be faster or slower than CPython on windows.

I installed the latest versions of both IronPython (2.7.5) and CPython (2.7.12) into my Windows 10 VM and re-ran the same set of tests.

The most interesting thing I learned was that some changes to how memory was allocated for the new buffer caused the "naive" method to be on par with the optimised version. As it turns out, IronPython doesn't actually have this - so running stest.py we get the following:

    $ ipy64 stest.py 1
    method 1
    time 2406.60858154 ms
    output size  86 kb

    $ ipy64 stest.py 6
    method 6
    time 46.9284057617 ms
    output size  86 kb

IronPython is a totally different beast to CPython, so my previous method of debugging - taking the code and examining it with the dis module doesn't yield anything useful:

This is because it compiles it into a different set of instructions to be executed using the .NET CLR (it's important to note that it does not go directly to .NET IL, there's still a level of instructions above this similar to CPythons opcodes).

However since we're on Windows with .NET we do have Visual Studio - which is arguably easier than working through python bytecode instructions in a text editor. To begin with it's extremely simple to find out where exactly we spend most of our execution time using dotTrace by JetBrains:


So the program execution is split with roughly 50% spent in initialisation (ImportSite, again!) but that's not included in our benchmark, however the remaining 50% is spent in String.Concat() in mscorlib (source here) which is what we're interested in:

    [System.Security.SecuritySafeCritical]  // auto-generated
    public static String Concat(String str0, String str1) {
        Contract.Ensures(Contract.Result() != null);
        Contract.Ensures(Contract.Result().Length ==
            (str0 == null ? 0 : str0.Length) +
            (str1 == null ? 0 : str1.Length));
        Contract.EndContractBlock();

        if (IsNullOrEmpty(str0)) {
            if (IsNullOrEmpty(str1)) {
                return String.Empty;
            }
            return str1;
        }

        if (IsNullOrEmpty(str1)) {
            return str0;
        }

        int str0Length = str0.Length;
        
        String result = FastAllocateString(str0Length + str1.Length);
        
        FillStringChecked(result, 0,          str0);
        FillStringChecked(result, str0Length, str1);
        
        return result;
    }

This explains why things are so slow - when concatenating two strings there are no realloc-based tricks like CPython had - we allocate a new memory buffer every time, copy both strings into it, and let the garbage collector handle the old buffers.

Sadly it's pretty non-trivial for someone like me to implement a similar optimisation here - since we depend on the underlying string implementation in .NET we're stuck with how string concatenation was implemented there. I toyed with the idea of re-writing a hacky reimplementation of FastAllocateString as FastReallocateString specifically for the += operator (it's possible to do - we need to change PythonBinaryOperationBinder.BindDelegate() to handle Add and AddAssign differently) this would've involved getting stuck into the mscorlib sources in coreclr - and I'd rather stay in Python-land for the time being.

However since it's possible to access the .NET standard libraries from IronPython we can at least test how System.Text.StringBuilder performs, since it is designed to solve this very problem. So I setup the stest.py code I previously used, and re-ran them on my Windows 10 VM for both CPython 2.7.12 and IronPython 2.7.5. Just to quickly recap, here are the string concatenation methods I tested:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

Method 7: using System.Text.StringBuilder (IronPython only)

runtime (ms) concatenations per second
method 1 16.00 1,250,000
method 2 108.99 183,503
method 3 14.99 1,334,222
method 4 3.00 6,666,666
method 5 6.00 3,333,333
method 6 2.00 10,000,000

runtime (ms) concatenations per second
method 1 2517.44 7,944
method 2 3968.87 5,039
method 3 25.39 787,711
method 4 42.13 474,721
method 5 35.56 562,429
method 6 33.22 602,046
method 7 22.43 891,662

Conclusion

So in IronPython the most efficient way to join strings together is to hook into .NET's System.Text library and use StringBuilder, no surprises there. What was surprising was how much slower IronPython was than CPython. I'm curious if this is just a freak result or if IronPython is known to be pretty slow. I'll probably not attempt to speed up the AddAssign op in IronPython, however I would like to investigate why IronPython is so slow, and if there are any plans to improve things. In addition I was surprised that the realloc trick had little-to-no effect on CPython in Windows (i.e. method 1 was slow even on 2.7.12).

I am a little sick of this benchmark now - I might revisit it one final time to compare it across CPython, IronPython, PyPy and Pyjion to complete the picture, but only if I'm really bored :)

IronPython - tackling some unexpected Exceptions

After I listened to an episode of Talk Python To Me featuring Alex Earl from the IronPython project I learned that not only is IronPython not dead/dying, but it's actually seeing a bit of a resurgence recently. I grabbed the sources from the IronLanguages GitHub, setup my dev environment, opened it up and launched the IronPythonConsole project hoping to see the familiar python REPL.

However instead I saw that it had hit an exception:

I was frustrated at first, thinking I'd done something wrong, but realised that getting to the bottom of an Exception was a fine way to introduce yourself to a new codebase.

The exception itself is a ZipImportError with the text "not a Zip file" and is thrown in the constructor for zipimporter.

Python Confession #1: I had never heard of or used zipimporter before.

Since I'd never heard of the class before I had no idea why the IronPython runtime would be calling this and especially on something which didn't appear to exist. So it's time to dig through the call stack to see where this comes from:

It appears that PythonCommandLine.ImportSite kicks this process off so that's where I started looking:

    private void ImportSite() {
        if (Options.SkipImportSite)
            return;
    
        try {
            Importer.ImportModule(PythonContext.SharedContext, null, "site", false, -1);
        } catch (Exception e) {
            Console.Write(Language.FormatException(e), Style.Error);
        }
    }

It turns out that site is a special Python module which is imported by default when the interpreter starts (for all implementations - IronPython, JythonPyPy and good old vanilla CPython). It's responsible for doing some platform-specific module path setup.

Python Confession #2: I had never heard of the site module before.

So how does importing site cause us to execute some code relating to zipimporter? Searching through the call stack at the point of the Exception shows that all this seems to come from FindImporterForPath which takes every function in path_hooks and attempts to apply it to the path we're importing.

    /// 
    /// Finds a user defined importer for the given path or returns null if no importer
    /// handles this path.
    /// 
    private static object FindImporterForPath(CodeContext/*!*/ context, string dirname) {
        List pathHooks = PythonContext.GetContext(context).GetSystemStateValue("path_hooks") as List;

        foreach (object hook in (IEnumerable)pathHooks) {
            try {
               object handler = PythonCalls.Call(context, hook, dirname);

                if (handler != null) {
                    return handler;
                }
            } catch (ImportException) {
                // we can't handle the path
            }
        }

So we call every path_hook with the module we're importing as an argument using PythonCalls.Call()The path_hooks themselves come from the sys module:

    sys.path_hooks

A list of callables that take a path argument to try to create a finder for the path. If a finder can be created, it is to be returned by the callable, else raise ImportError.

Python Confession #3: I had never heard of or used path_hooks before.

So what is in path_hooks? If I keep hitting continue on the Visual Studio debugger the Exception is caught, I reach the python REPL and can inspect what is in sys.path_hooks:

And there it is - zipimporter. Now we're approaching an explanation - when the IronPython interpreter is initialised it imports the site module which takes the everything in path_hooks and applies them to all the modules in our path - but since there are no .zip files anywhere in our path zipimporter (the only path hook) cannot find anything to operate on, so throws an exception which is normally caught and handled.

So this is normal behaviour - the exception is even expected, since path_hooks' documentation states that if a path_hook fails it raises an Exception.

Conclusion

OK nothing special has happened here since IronPython is behaving exactly as it should, however unexpected it was to me. That said, this is a very nice way to learn some new Python concepts, like:

  1. the zipimport module
  2. the site module
  3. sys.path_hooks 

I even have a half-working clone of zipimporter for tarballs called tgzimporter, but there's little need for such functionality as I suspect that even zipimporter is seldom used. 

It would've been easy to just keep hitting the "F5" key until I hit the REPL, but then I would likely have struggled to find a way to approach the source code and perhaps would've put it off indefinitely. Hopefully now I'll find some way to continue to improve my understanding and contribute to the IronPython project.