When I was puddling around in IronPython ahead of an upcoming project, I spotted something interesting: whenever the IronPython interpreter needs to treat an integer as an object, it calls a function in Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers called Int32ToObject:
```csharp
public static object Int32ToObject(Int32 value) {
    // caches improves pystone by ~5-10% on MS .Net 1.1, this is a very integer intense app
    // TODO: investigate if this still helps perf. There's evidence that it's harmful on
    // .NET 3.5 and 4.0
    if (value < MAX_CACHE && value >= MIN_CACHE) {
        return cache[value - MIN_CACHE];
    }
    return (object)value;
}
```
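For context, the cache being indexed is just a pre-populated array of boxed integers covering a small range around zero. Here's a minimal sketch of what that setup looks like - the field names mirror those used in Int32ToObject, but the MIN_CACHE/MAX_CACHE bounds are assumptions for illustration, not values copied from the IronPython source:

```csharp
// Illustrative sketch of the boxed-int cache that Int32ToObject indexes into.
// The bounds below are assumed; the real values live in
// Microsoft.Scripting.Runtime.ScriptingRuntimeHelpers.
private const int MIN_CACHE = -100;   // assumed lower bound (inclusive)
private const int MAX_CACHE = 1000;   // assumed upper bound (exclusive)

private static readonly object[] cache = MakeCache();

private static object[] MakeCache() {
    var result = new object[MAX_CACHE - MIN_CACHE];
    for (int i = 0; i < result.Length; i++) {
        // Box each value once up front so Int32ToObject can hand out the same
        // object instance repeatedly instead of boxing on every call.
        result[i] = (object)(i + MIN_CACHE);
    }
    return result;
}
```

Even running a trivial script shows how hot this code path is: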
```console
$ cat hello.py
print "Hello, world!"
$ ./bin/Debug/ipy.exe hello.py | grep -c "Int32ToObject"
1317
```
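(For that count to work, something inside Int32ToObject has to write to the output for grep to match - for example, a hypothetical trace line like the one below, added only to a local Debug build. It is not part of the real IronPython source.)

```csharp
public static object Int32ToObject(Int32 value) {
    // Hypothetical instrumentation so that each call prints a line
    // we can count with grep -c in the shell session above.
    Console.WriteLine("Int32ToObject");
    if (value < MAX_CACHE && value >= MIN_CACHE) {
        return cache[value - MIN_CACHE];
    }
    return (object)value;
}
```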
The comment in the code specifically references the pystone benchmark (which I found here), and suggests that removing this cache could improve pystone performance on versions of .NET newer than 3.5 - which appears to be the minimum version that later releases of IronPython support.
I built the Release configuration of IronPython both before and after removing this cache, and ran the default pystone benchmark on my work computer (a pretty hefty 8-core Xeon E3-1275 @ 3.6 GHz with 32GB RAM). The results below are from 10 runs in each configuration, followed by the average. The value output by the benchmark is "pystones per second", where one "pystone" is an iteration through the main loop inside the Proc0() function, which performs a number of integer operations and function calls:
| Run | Before | After |
|---|---|---|
| 1. | 238262 | 234663 |
| 2. | 239115 | 234595 |
| 3. | 245149 | 245931 |
| 4. | 237845 | 302562 |
| 5. | 228906 | 295027 |
| 6. | 248294 | 275535 |
| 7. | 258694 | 297271 |
| 8. | 246650 | 282791 |
| 9. | 235741 | 296104 |
| 10. | 233604 | 274396 |
| Average | 241226 | 273887 |
So with the cache removed we see 32,661 more of these iterations per second, which is roughly a 13.5% improvement. This makes sense - presumably boxing an int (the plain (object)value cast) has become nearly free on modern .NET, leaving little overhead beyond a simple function call, so the cache's bounds check and array lookup no longer pay for themselves.
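As a rough stand-alone sanity check of that reasoning - entirely separate from IronPython, with cache bounds and an iteration count that are arbitrary choices of mine - a minimal micro-benchmark sketch comparing plain boxing against the cached-lookup approach might look like this:

```csharp
using System;
using System.Diagnostics;

// A minimal sketch comparing plain int boxing with a pre-boxed cache lookup.
// Numbers will vary by machine and .NET version; this is illustrative only.
class BoxingBenchmark {
    const int MIN_CACHE = -100;   // assumed bounds, mirroring the earlier sketch
    const int MAX_CACHE = 1000;

    static readonly object[] cache = BuildCache();

    static object[] BuildCache() {
        var result = new object[MAX_CACHE - MIN_CACHE];
        for (int i = 0; i < result.Length; i++)
            result[i] = (object)(i + MIN_CACHE);
        return result;
    }

    static object CachedBox(int value) =>
        value < MAX_CACHE && value >= MIN_CACHE ? cache[value - MIN_CACHE] : (object)value;

    static void Main() {
        const int iterations = 50_000_000;
        object sink = null;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink = (object)(i & 0xFF);   // box on every iteration
        Console.WriteLine($"plain boxing: {sw.ElapsedMilliseconds} ms");

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink = CachedBox(i & 0xFF);  // bounds check + array lookup instead
        Console.WriteLine($"cache lookup: {sw.ElapsedMilliseconds} ms");

        GC.KeepAlive(sink);              // keep the boxed results observable
    }
}
```

If boxing really is close to free on modern .NET, the two loops should come out close enough that the extra branch and array access in the cached version stop paying for themselves.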