Discover a Linux Utility - xssstate

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1, Part 2, Part 3

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

Today's utility is xssstate, which lets your check the status of X window system's screensaver. It's written by the suckless guys, who've created a number of very good tools, such as surf (a minimalist web browser) and dmenu (autocompleting program launcher), both of which I use regularly.

The utility itself is pretty simple, there are only four command line switches including -v, so this will be pretty short post. First we can check if the screensaver is currently enabled using -t switch:

    $ xssstate -s
    off

Obviously the screensaver is off, since I am actively using this computer - however if the screensaver was active it'd print "on" and if it was disabled altogether you'd see "disabled".

To check the time idle in milliseconds, use the -i switch:

    $ xssstate -i
    2
    $ sleep 5 && xssstate -i
    4947

And to get time in milliseconds until the screensaver activates, invoke it with -t:

    $ xssstate -t
    599998
    $ sleep 10 && xssstate -t
    590052

The way the utility does this is by using some functionality provided by a X11 library, wrapped in a handful of switch statements (they have their own neat little github-style source browser if you want to check out xssstate.c in its entirety):

    // ...
    info = XScreenSaverAllocInfo();
    XScreenSaverQueryInfo(dpy, DefaultRootWindow(dpy), info);

    if (showstate) {
    	switch(info->state) {
	case ScreenSaverOn:
		printf("on\n");
		break;
	case ScreenSaverOff:
		printf("off\n");
		break;
	case ScreenSaverDisabled:
		printf("disabled\n");
		break;
	}
    } else if (showtill) {
	switch(info->state) {
	case ScreenSaverOn:
		printf("0\n");
		break;
	case ScreenSaverOff:
		printf("%lu\n", info->til_or_since);
		break;
	case ScreenSaverDisabled:
		printf("-1\n");
		break;
	}
    } else if (showidle) {
	printf("%lu\n", info->idle);
    }
    // ...

When I do these articles I like to show some practical real-life usage of the utility - in this case I decided to add a little timer to my xmobar showing how long my computer had been idle. To this I added a Run Com entry to my xmobarrc:

    -- also stick %xssstate% into the template
    Run Com "xssstate" [ "-t" ] "xssstate" 10,

This ends up showing with something like the below - apologies for shaky-cam!

Download IMG_0014.MOV

IronPython - Efficient string concatenation

Previously I'd done a bit of fiddling around with Python, revisiting some string concatenation benchmarks from an old-ish article and trying to explain some unexpected results. After playing around a bit with IronPython I was curious whether it'd be faster or slower than CPython on windows.

I installed the latest versions of both IronPython (2.7.5) and CPython (2.7.12) into my Windows 10 VM and re-ran the same set of tests.

The most interesting thing I learned was that some changes to how memory was allocated for the new buffer caused the "naive" method to be on par with the optimised version. As it turns out, IronPython doesn't actually have this - so running stest.py we get the following:

    $ ipy64 stest.py 1
    method 1
    time 2406.60858154 ms
    output size  86 kb

    $ ipy64 stest.py 6
    method 6
    time 46.9284057617 ms
    output size  86 kb

IronPython is a totally different beast to CPython, so my previous method of debugging - taking the code and examining it with the dis module doesn't yield anything useful:

This is because it compiles it into a different set of instructions to be executed using the .NET CLR (it's important to note that it does not go directly to .NET IL, there's still a level of instructions above this similar to CPythons opcodes).

However since we're on Windows with .NET we do have Visual Studio - which is arguably easier than working through python bytecode instructions in a text editor. To begin with it's extremely simple to find out where exactly we spend most of our execution time using dotTrace by JetBrains:

So the program execution is split with roughly 50% spent in initialisation (ImportSite, again!) but that's not included in our benchmark, however the remaining 50% is spent in String.Concat() in mscorlib (source here) which is what we're interested in:

    [System.Security.SecuritySafeCritical]  // auto-generated
    public static String Concat(String str0, String str1) {
        Contract.Ensures(Contract.Result() != null);
        Contract.Ensures(Contract.Result().Length ==
            (str0 == null ? 0 : str0.Length) +
            (str1 == null ? 0 : str1.Length));
        Contract.EndContractBlock();

        if (IsNullOrEmpty(str0)) {
            if (IsNullOrEmpty(str1)) {
                return String.Empty;
            }
            return str1;
        }

        if (IsNullOrEmpty(str1)) {
            return str0;
        }

        int str0Length = str0.Length;
        
        String result = FastAllocateString(str0Length + str1.Length);
        
        FillStringChecked(result, 0,          str0);
        FillStringChecked(result, str0Length, str1);
        
        return result;
    }

This explains why things are so slow - when concatenating two strings there are no realloc-based tricks like CPython had - we allocate a new memory buffer every time, copy both strings into it, and let the garbage collector handle the old buffers.

Sadly it's pretty non-trivial for someone like me to implement a similar optimisation here - since we depend on the underlying string implementation in .NET we're stuck with how string concatenation was implemented there. I toyed with the idea of re-writing a hacky reimplementation of FastAllocateString as FastReallocateString specifically for the += operator (it's possible to do - we need to change PythonBinaryOperationBinder.BindDelegate() to handle Add and AddAssign differently) this would've involved getting stuck into the mscorlib sources in coreclr - and I'd rather stay in Python-land for the time being.

However since it's possible to access the .NET standard libraries from IronPython we can at least test how System.Text.StringBuilder performs, since it is designed to solve this very problem. So I setup the stest.py code I previously used, and re-ran them on my Windows 10 VM for both CPython 2.7.12 and IronPython 2.7.5. Just to quickly recap, here are the string concatenation methods I tested:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

Method 7: using System.Text.StringBuilder (IronPython only)

	runtime (ms)	concatenations per second
method 1	16.00	1,250,000
method 2	108.99	183,503
method 3	14.99	1,334,222
method 4	3.00	6,666,666
method 5	6.00	3,333,333
method 6	2.00	10,000,000

	runtime (ms)	concatenations per second
method 1	2517.44	7,944
method 2	3968.87	5,039
method 3	25.39	787,711
method 4	42.13	474,721
method 5	35.56	562,429
method 6	33.22	602,046
method 7	22.43	891,662

Conclusion

So in IronPython the most efficient way to join strings together is to hook into .NET's System.Text library and use StringBuilder, no surprises there. What was surprising was how much slower IronPython was than CPython. I'm curious if this is just a freak result or if IronPython is known to be pretty slow. I'll probably not attempt to speed up the AddAssign op in IronPython, however I would like to investigate why IronPython is so slow, and if there are any plans to improve things. In addition I was surprised that the realloc trick had little-to-no effect on CPython in Windows (i.e. method 1 was slow even on 2.7.12).

I am a little sick of this benchmark now - I might revisit it one final time to compare it across CPython, IronPython, PyPy and Pyjion to complete the picture, but only if I'm really bored :)

IronPython - tackling some unexpected Exceptions

After I listened to an episode of Talk Python To Me featuring Alex Earl from the IronPython project I learned that not only is IronPython not dead/dying, but it's actually seeing a bit of a resurgence recently. I grabbed the sources from the IronLanguages GitHub, setup my dev environment, opened it up and launched the IronPythonConsole project hoping to see the familiar python REPL.

However instead I saw that it had hit an exception:

I was frustrated at first, thinking I'd done something wrong, but realised that getting to the bottom of an Exception was a fine way to introduce yourself to a new codebase.

The exception itself is a ZipImportError with the text "not a Zip file" and is thrown in the constructor for zipimporter.

Python Confession #1: I had never heard of or used zipimporter before.

Since I'd never heard of the class before I had no idea why the IronPython runtime would be calling this and especially on something which didn't appear to exist. So it's time to dig through the call stack to see where this comes from:

It appears that PythonCommandLine.ImportSite kicks this process off so that's where I started looking:

    private void ImportSite() {
        if (Options.SkipImportSite)
            return;
    
        try {
            Importer.ImportModule(PythonContext.SharedContext, null, "site", false, -1);
        } catch (Exception e) {
            Console.Write(Language.FormatException(e), Style.Error);
        }
    }

It turns out that site is a special Python module which is imported by default when the interpreter starts (for all implementations - IronPython, Jython, PyPy and good old vanilla CPython). It's responsible for doing some platform-specific module path setup.

Python Confession #2: I had never heard of the site module before.

So how does importing site cause us to execute some code relating to zipimporter? Searching through the call stack at the point of the Exception shows that all this seems to come from FindImporterForPath which takes every function in path_hooks and attempts to apply it to the path we're importing.

    /// 
    /// Finds a user defined importer for the given path or returns null if no importer
    /// handles this path.
    /// 
    private static object FindImporterForPath(CodeContext/*!*/ context, string dirname) {
        List pathHooks = PythonContext.GetContext(context).GetSystemStateValue("path_hooks") as List;

        foreach (object hook in (IEnumerable)pathHooks) {
            try {
               object handler = PythonCalls.Call(context, hook, dirname);

                if (handler != null) {
                    return handler;
                }
            } catch (ImportException) {
                // we can't handle the path
            }
        }

So we call every path_hook with the module we're importing as an argument using PythonCalls.Call(). The path_hooks themselves come from the sys module:

sys.path_hooks

A list of callables that take a path argument to try to create a finder for the path. If a finder can be created, it is to be returned by the callable, else raise ImportError.

Python Confession #3: I had never heard of or used path_hooks before.

So what is in path_hooks? If I keep hitting continue on the Visual Studio debugger the Exception is caught, I reach the python REPL and can inspect what is in sys.path_hooks:

And there it is - zipimporter. Now we're approaching an explanation - when the IronPython interpreter is initialised it imports the site module which takes the everything in path_hooks and applies them to all the modules in our path - but since there are no .zip files anywhere in our path zipimporter (the only path hook) cannot find anything to operate on, so throws an exception which is normally caught and handled.

So this is normal behaviour - the exception is even expected, since path_hooks' documentation states that if a path_hook fails it raises an Exception.

Conclusion

OK nothing special has happened here since IronPython is behaving exactly as it should, however unexpected it was to me. That said, this is a very nice way to learn some new Python concepts, like:

the zipimport module
the site module
sys.path_hooks

I even have a half-working clone of zipimporter for tarballs called tgzimporter, but there's little need for such functionality as I suspect that even zipimporter is seldom used.

It would've been easy to just keep hitting the "F5" key until I hit the REPL, but then I would likely have struggled to find a way to approach the source code and perhaps would've put it off indefinitely. Hopefully now I'll find some way to continue to improve my understanding and contribute to the IronPython project.

Thinkpad X250 - SMS and GPS via Python

When I was messing around trying to get the 4G modem working on my Thinkpad X250 for this post I ended up doing a little debugging by connecting to the modem over serial using cu:

    $ cu -h -l /dev/ttyACM0
    Connected.
    AT^M
    OK

It turns out that the Sierra EM7345 modem in my Thinkpad can be controlled through AT commands sent over serial. Over at zukota.com there's a guy who's done a seriously good job of experimenting with this modem and documenting what was previously not very well documented.

For example if we wanted to use a SIM locked with the PIN "1234", then send the message "Hello, world!" to the Czech number 775123456, we would connect as above and enter the following:

    AT+CPIN="1234"^M

    OK
    AT+CMGS="+420775356278"^M
    > Hello, world!^Z^M
    +CMGS: 40

    OK

Getting the GPS position involved issuing the XLCSLSR at command and parsing it's output. This is a little more complicated, since there's a slight delay in the response, and the response contains dozens of fields (I've included the request/response in its entirety below so you can see):

    AT+XLCSLSR=1,1,,,,,,,,,,^M
    +XLCSLSR: request id 2

    OK

    +XLCSLSR: 2, 49.195669 N, 16.606075 E, 119.996932, 48.743179, 39.616302,143, 100.997169,67,2016/05/10,18:38:22,0,1,75.45,2.28,-0.25,2.20,0.64,239919,239919.74,,,4.50,2.50,3.50,118.92,62.80,100.98,,,1,1896,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4,73.04,70.04,21.00,1,,,,21 ,61.03,283.14,41.00,1,,,,16 ,20.01,173.09,30.00,1,,,,10 ,17.01,100.05,32.00,1,,,,29 

    OK

I was delighted when I found this out, as I had recently discovered the pyserial library, and realised I could interact with the modem and explore/expose some of the modem's functionality via a Python library.

em73xx

I rolled up my sleeves and wrote a library called em73xx (PyPI, GitHub) which will issue the necessary AT commands to send and receive SMS messages, and retrieve the current location using the modem's GPS functionality.

To install the library either retrieve from pypi using pip:

    $ pip install em73xx

... or you can clone the smcl/em73xx repo and install using setup.py:

    $ git clone https://github.com/smcl/py-em73xx
    $ cd py-em73xx
    $ python setup.py install

Once it's installed you can use em73xx by importing the Modem class and instantiating it with the path to the serial device on your system (probably /dev/ttyACM0, as below) and optionally specify the PIN for the sim inserted:

    from em73xx import Modem
    modem = Modem("/dev/ttyACM0", pin="1234")

To retrieve all the SMS messages on the SIM, and print them to stdout we can use the getSMS method:

    messages = modem.getSMS()
    for m in messages:
        print("from %s: % (m.sender))
        print("\t%s" % (m.message))

If we want to send a message we can use the sendSMS method:

    # request an 60 minute tram ticket in brno!
    modem.sendSMS("90206", "BRNO")

And finally if we want to retrieve the current GPS position, we can use the getGPS method - note that the modem can fail to get a GPS fix, and will return None if so:

    gps = modem.getGPS()
    if gps:
        print(gps.latitude)
        print(gps.longitude)

xsms

Ultimately I started this because I wanted to have a simple utility that integrated with xmobar which would inform me whether I had received a new SMS, and would allow me to reply to existing messages or compose an entirely new one. To achieve this I wrote xsms which is a utility that will either print to stdout the number of unread messages (for xmobar) or launch a GUI, depending on the command line switches used.

Again you can either retrieve xsms from pypi using pip:

    $ pip install xsms

... or clone the smcl/xsms repo from github and install using setup.py:

    $ git clone https://github.com/smcl/xsms
    $ cd xsms
    $ python setup.py install

Once xsms is installed you can either launch it standalone.

    $ python -m xsms --device=/dev/ttyACM0

And if you want to have a little indicator in your xmobar you can use something like the below (which takes advantage of the ability to specify the font via tags to easily get some icons from Font Awesome):

    -- assumes you have Font Awesome installed and used here:
    -- additionalFonts = ["xft:FontAwesome-10"],
    Run Com "/usr/bin/python" [ "-m", "xsms", "-d", "/dev/ttyACM0", "-p", "1234", "-r", "", "-u", " %d" ] "xsms" 600,

So when you add %sms% to your xmobarrc's template and restart you'll see something like this:

... and if you want to be able to click the icon to raise the GUI, you can surround it with an <action> which invokes xsms:

  template = "%StdinReader% }{ ... stuff ... <action=`python -m xsms -g -d /dev/ttyACM0 -p 1234`>%xsms%</action> ... "

Conclusion

This is an idea that sat half-finished in my ~/dev folder for about 6 months, and I'm really glad that I was able to take it from a single hacky one-shot script to two fully-fledged packages released on PyPI and ready to be used. There is still some additional functionality I'd like to add to em73xx, for example I only grab longitude/latitude from the GPS data and I'd like to be able to reset or power-cycle the modem in case of problems, however it's in pretty good shape overall.

As for xsms, it's the second Tkinter application I've put together, and while I'm finding it easier each time I write something (with fewer cut-paste lines from effbot.org's Tkinter docs) it's starting to be a little unwieldy. For example the ttk module allows you to add widgets to your application with a unified theme, but it's missing a multi-line text entry widget - which is something I use for inputting and displaying the SMS messages. This meant that I had to either decide to put off styling xsms or add some hacks to customise the Tkinter.Text widget that I ultimately used. In addition programatically constructing the UI using the grid() and pack() layout managers feels a little like creating a webpage laid out using nested <table> elements in the late 90's. Ultimately if I find myself writing a Python desktop app in future I'll spend a little more time investigating the frameworks available and weighing them up against using Tkinter, now that I'm broadly familiar with it.

Useful Links

Sierra EM73xx AT Command Reference (requires free registration)
Nice article from Zukota.com on the EM7345 AT commands
xsms on GitHub
py-em73xx on GitHub

Discover a Linux Utility - jjs

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1, Part 2

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

The random command I'm looking at this time is jjs - whose summary is simply "Invokes the Nashorn engine" which is a little vague and as it turns out is underselling things slightly. Nashorn is a Javascript engine written in Java for the Java VM with very neat integration and access to the range of libraries and functionality provided by the JDK.

While I'm not a Java or Javascript developer by trade I am surprised that I had never seen this pop up on Hacker News or lobste.rs before, and I'm sure many professional Java devs aren't particularly familiar with it either. I was even more surprised how quick it was to get productive (the man page suggests using the println function, which doesn't exist) since my hazy memories from using Java at university involved fiddling around with the CLASSPATH env variable and launching things in weird ways.

Entering jjs takes you to a REPL prompt where you can muck around with javascript to your heart's content while you get familiar - here you'll see what I mean about the example in the manpage, println should be print:

    $ jjs
    jjs> println("hello, world!")
    :1 ReferenceError: "println" is not defined
    jjs> print("hello, world!") 
    hello, world!
    jjs> function add(x, y) { return x + y }
    function add(x, y) { return x + y }
    jjs> add(10, 20)
    30
    jjs> function fib(n) { if (n < 1) { return 0 } else if (n <= 2) { return 1 } else { return fib (n - 1) + fib (n - 2)} }
    function fib(n) { if (n < 1) { return 0 } else if (n <= 2) { return 1 } else { return fib (n - 1) + fib (n - 2)} }
    jjs> fib(3)
    2
    jjs> fib(50)
    12586269025
    jjs>

What I really like is that there's a fuss-free interface to the entire JDK through an object conveniently called java:

    jjs> var str = new java.lang.String("Hello, world!")
    jjs> java.lang.System.out.println(str)
    Hello, world!
    jjs> var dict = new java.util.HashMap()
    jjs> dict.put("foo", "bar")
    null
    jjs> dict.put("baf", "baz")
    null
    jjs> dict.forEach(function(k,v) { print(k + v); }) 
    bafbaz
    foobar

That final line pretty much sums up why I think this is cool - having created an instance of a java.util.ashMap we iterate over it using its forEach method but we can give it a javascript lambda function as an argument to apply to each key/value pair. There's no denying that this is neat :)

I wanted to do a little measurement between the V8 compiler used by Chrome and Node.js but it turns out this has already been done: http://blog.nidi.guru/tech/java/2015/10/05/graphviz-with-java/

So Nashorn is actually a good deal slower than V8 (and that's with a cold VM, warmed up the difference is more stark) - which isn't a huge problem I suppose unless you're doing some really heavy lifting using Nashorn. I don't think many people are.

My previous "Discover ..." posts generally had to lay a bit of groundwork to introduce various OS concepts before the utility could be understood. However since Java and Javascript are both pretty commonplace there's no introduction needed, and the best way to show it off and understand what it's capable of is to write some code.

So I implemented four demo programs which are small enough to comprehend in a minute or so, explore the interesting relationship between js/JDK and demonstrates some relatively common use-cases for programming languages:

shell scripting
unix system utilities
visualisation
web services

Demo 1 - a simple shell script

I'd recently read a blog post at IBM developerWorks tracing this history of UNIX shells (csh, tcsh, bash, etc) and implementing the same task in each one, I figured that this was as good a task as any to start with. I implemented this as findexec.js below:

The code is a little less elegant than the bash version, we rely pretty heavily on java.io.File to accomplish some of the things built into bash, but realistically we're using the wrong tool for the job here.

    $ jjs findexec.js -- /usr/local/bin
    /usr/local/bin/fsv
    /usr/local/bin/n-m
    /usr/local/bin/behave
    /usr/local/bin/tzupdate
    /usr/local/bin/dotnet

Demo 2 - a unix-y utility

The next program I wrote was a unix-style system utility that reverses the order of its input, which is either

a list of lines read in from stdin (piped or input)
a list of files supplied as arguments, which are each written reversed

This was a little more fun to write - couple of interesting things here. First was ability to use generics - could just create new ArrayList without specifying the type. Second was polymorphism between a Java stdlib class BufferedReader and a javascript class I wrote MultipleFileReader which both happen to implement a method readLine() but which don't explicitly implement any common interface or abstract class.

I implemented this as filerev.js, which is a wee bit long for this blog but can be found at this Gist on GitHub. Below is a little snippet showing its usage:

    $ cat > foo << EOF
    > herp
    > derp
    > blorp
    > EOF
    $ jjs filerev.js -- foo
    blorp
    derp
    herp
    $ cat foo | jjs filerev.js 
    blorp
    derp
    herp

Demo 3 - a JavaFX utility

In the manpage I noticed the -fx which "launches the script as a JavaFX application". I hadn't used Java since university so I had no clue what JavaFX was, but it's apparently a set of libraries for writing graphical applications in Java, and mostly replaces Swing/AWT toolkits among other things.

After I read a bunch of documentation and puddled around a bit I decided that I wanted an application which can quickly produce little line graphs from input piped via stdin (and dynamically resizes according to the max/min values in the dataset).

This was a little trickier than the previous two examples but after finding the documentation on extending abstract classes (for the AnimationTimer) the rest was surprisingly straight forward. I created a file containing a a few repetitions of a sine wave, and piped it in to generate the visualisation below:

Download out-3.ogv

Again, the code is a Gist on GitHub as plotstdin.js along with the input data (and the code that generated it).

Demo 4 - a web service

For my final example I wanted to spin up a quick and simple web service, however this apparently not so straight-forward in Java. While the JDK has plenty of libraries available, there's no equivalent to SimpleHttpServer in Python or http.createServer() in Node.js - it seems that you need to use a third party package like Tomcat or Jetty. This presents a bit of a problem, since I want to create a little self-contained example without having to resort to resolve any dependencies using Gradle and Maven, which I'm unfamiliar with and would be tricky to work with using Nashorn.

However I found a nice project on GitHub which handles most of this called Nasven which wraps Maven and let's me easily add a dependency for Spark (lightweight web framework) and launch it.

I created a repo called thumbooo which will spin up a simple web service running on port 8080, with a single POST resource that produces < 100 x 100 pixel thumbnails for submitted images.

While Nasven handled the toughest part, I still ran into trouble since Nashorn was unable to determine which of the three ImageIO.write() functions it should call since the image resizing produced an object of type ToolkitImage and it expected a RenderedImage.

You can ran the following to start the server:

    $ git clone https://github.com/smcl/thumbooo
    Cloning into 'thumbooo'...
    remote: Counting objects: 42, done.
    remote: Compressing objects: 100% (37/37), done.
    remote: Total 42 (delta 18), reused 17 (delta 4), pack-reused 0
    Unpacking objects: 100% (42/42), done.
    Checking connectivity... done.
    $ cd thumbooo
    $ jjs -scripting nasven.js -- src

and in another xterm use cURL to POST an image to the /thumb endpoint:

    $ curl -L -o thumb.png --form "file=@lenna.png" http://localhost:8080/thumb
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  494k    0 32358  100  462k   638k  9353k --:--:-- --:--:-- --:--:-- 9447k

This resized the original 512x512 Lenna image ...

... to a 100x100 thumbnail:

Conclusion

I'm impressed with the integration between Javascript and Java that Nashorn provides, as demonstrated by the snippets above it's relatively flexible and easy to use (if a tad slow!). As a technical exercise it's very impressive, however I'm still not 100% sure what it was originally meant to be used for. Perhaps as a non-js/Java developer I'm missing something, but I feel like there are already better options available for individual hackers all the way to Enterprise-scale projects. I certainly had fun using it though!

Discover a Linux Utility - slabtop

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

Today's randomly selected utility is called slabtop - according to the man page it should "display kernel slab cache information in real time". I was pretty pleased about this one actually, I had no idea about slab allocators so it was good to have something to explore on a stormy day:

Download trim.E7474A60-2E53-4369-9F38-0F35C1EFB786.MOV

Put simply, Linux permits us to manage and allocate chunks of memory from a set of fixed-size buffers - called slabs - for different purposes. And slaptop gives us a view into the state of each slab cache: what they're called, how many elements are allocated in total and how full each slab is:

So for the highlighted row, we can know the following information:

- name ("kmalloc-32")

- number of slabs allocated (163)

- number of objects allocated (20007)

- percentage of slab space allocated (98%)

- total size of the slab space allocated (652K)

... and so on.

What is slab allocation

So what does all this mean and why do we need it? During the execution of some program we may have requested and released lots of differently sized objects from the heap. This could result in a pretty fragmented heap.

This could mean that allocating new buffers is either a little slow or unpredictable, since we need to search for an appropriately sized buffer before we can mark it as allocated and return it to the calling code. If any part of our application depends on a particular object being allocated quickly and predictably this is not ideal.

Slab allocation involves setting up a slab which can hold certain amount of objects of the same size and type. Since we're dealing with fixed-size objects allocating space for a new object is quick - we just need to scan through an array to find an unused slot, mark it as used, then calculate the address (start_addr + size * index) and return it.

We could roll our own implementation since it's pretty straightforward to implement - and that would be quite an interesting project for an evening or so. However if we're writing Linux kernel or module code there's already an existing set of functions which are well understood and battle-hardened. In addition these functions hide a lot of underlying complexity by abstracting a lot of the memory management (we can actually have numerous slabs, but Linux manages this for us).

How Linux supports slab allocation

To start with, if we want to create a new slab cache (a managed collection of slabs) we can use kmem_cache_create:

    struct kmem_cache *kmem_cache_create(
	    const char *,      // name of cache
	    size_t,            // size of each object
	    size_t,            // object alignment
	    unsigned long,     // any special flags
	    void (*)(void *)   // slab constructor
    );

To allocate a new object from this cache we can use kmem_cache_alloc:

    void *kmem_cache_alloc( 
    	struct kmem_cache *, // ptr to the cache 
    	gfp_t flags          // flags to use if we need new slab
    );

Then when we need to free any, kmem_cache_free which will mark the space as unused and make it available to the allocator:

    void kmem_cache_free(
    	struct kmem_cache *, // ptr to the cache
    	void *               // ptr to object to free
    );

And finally when we're done and we want to remove the cache entirely there is kmem_cache_destroy which will free and release all the slabs related to a given cache:

	void kmem_cache_destroy(
		struct kmem_cache *  // ptr to the cache
	);

How to use slab allocation

As an example of how this could be used we can think about it in the context of a simple kernel module. I'll roughly go over the individual parts, and then present a full-blown module to explore. Firstly we'll want to defined the type we want to create the slab cache for - imaginatively named slabthing - and use it to declare a variable s:

    typedef struct
    {
      char foo; 
      char bar; 
      char baz;
    } slabthing;

    slabthing *s;

Inside our modules init_module we can use kmem_cache_create - so that our slab cache will be called "slabdemo", it'll allocate objects with the required size. Since I don't really care much about performance (specifying an appropriate alignment could permit us to use different/faster load instructions) so we'll just request it to be single byte-aligned.

    struct kmem_cache *slabthing_cache;

    int init_module(void)
    {
      slabthing_cache = kmem_cache_create("slabdemo", sizeof(slabthing), 1, NULL, NULL);
      // other module stuff...
    }

If our module wants to allocate space for the variable s in our cache we can call kmem_cache_alloc:

    s = kmem_cache_alloc(slabthing_cache, NULL);

When we want to free the cache, in this case in our module cleanup code, we can call kmem_cache_free and kmem_cache_destroy. I don't think the free is necessary in this case, but I've just included it anyway:

    void cleanup_module(void) {
        kmem_cache_free(slabthing_cache, s);
        kmem_cache_destroy(slabthing_cache);
     }

To see this in action I've created a little demo module that creates a cache and then allocates 128 objects when it's loaded - then frees all the objects and destroys them when it's unloaded. You'll need to make sure you have the sources checked out for your current kernel inside /usr/src/linux-<version>:

    $ git clone http://github.com/smcl/slabdemo
    $ cd slabdemo
    $ make

If our module builds successfully we can load it using insmod and then fire up slabtop to see how things look:

    $ sudo insmod slabdemo.ko
    $ sudo slabtop

So there it is - our slabdemo cache is there, with 128 objects allocated in a single slab which can fit 240 objects in total (peculiar number!). If you check out slabtop you can easily see things like kmalloc-512, kmalloc-1024 and kmalloc-2048 which are probably slab caches used by kmalloc for allocating memory pools of 512, 1024 and 2048 bytes respectively. In any case, that's another issue for another day. Obviously this is a tool designed for kernel developers, or for sysadmins of particularly performance-critical sysadmins, but I'm glad I took the time to potter around with it.

Python - Efficient String Concatenation in Python (2016 edition)

After working through Philip Guo's excellent series of lectures on the CPython VM I found myself looking at the performance of string concatenation. Strings are immutable in python - so they cannot be modified in-place, you have to create a new string. This also affects string concatenation, so if we want to tack one string onto the end of another we might have some code like this:

    x = "hello"
    x += ", world!"

When the CPython interpreter is running the above code, it's doing something conceptually similar to the following:

allocate a buffer for the string "hello" 6 bytes => len("hello") + 1, and copy the string into it
make x reference this
allocate a new buffer with length 14 bytes => len("hello") + len(", world") + 1
copy "hello", then ", world" into this new buffer, and make x reference it
decrement reference on the old string, possibly free it

Note even though Python is unlike C in that it doesn't rely on a trailing NULL to terminate strings (we store each string's length), it does by convention still null-terminate them - which is where the extra "+ 1" byte comes from.

Anyway, if we have to repeatedly join strings together we have to be careful how we do it, just in case we accidentally introduce some slow, inefficient code. This has been long understood in the python community, for instance here is an article from 2004 which explored different methods of concatenating strings, and compared their performance.

I was curious if there was any difference on how things shaped up on modern hardware (the test used an old 400 MHz Celeron) and a newer version of Python (this used python 2.2.1, the latest version of python 2 is 2.7.12) - so I grabbed the source code and started playing around.

It needed a tiny bit of tweaking however - the timing module it uses doesn't seem exist in the standard library and current version in pypi isn't compatible. I modified it to use the time module, the source code is here on GitHub if you're interested: stest-new.py.

As a quick recap, there are 6 methods of concatenating strings which we're putting under the microscope:

Method 1: simple concatenation: s1 += s2

Method 2: concatenation using MutableString (s1 += s2, where s1 is a MutableString)

Method 3: appending to a long array of char

Method 4: building a list of strings, then calling "".join()

Method 5: writing to a cStringIO buffer in memory using write()

Method 6: same as Method 4 but using a list comprehension inline

The original tests were performed using Python 2.2.1, for comparison's sake I've re-run them on my computer just to see:

    $ for i in {1..6}; do python stest.py $i; done

The results for Python 2.2.1 are below:

	runtime (ms)	concatenations per second
Method 1	55.11	362,910
Method 2	74.67	267,852
Method 3	10.80	1,851,337
Method 4	6.21	3,220,611
Method 5	8.11	2,467,612
Method 6	5.42	3,694,808

So in Python 2.2.1 the list comprehension method was the fastest by a pretty clear margin. However when I re-ran using 2.7.12 things turned out very differently:

	runtime (ms)	concatenations per second
Method 1	1.7995	11,113,977
Method 2	90.1073	221,957
Method 3	3.9557	5,055,967
Method 4	2.1804	9,172,689
Method 5	4.8047	4,162,585
Method 6	1.4191	14,093,289

In the time since 2.2.1 the performance of the naïve string concatenation method has improved hugely, it's now it's the fastest method (note: this is using a Python interpreter I built, using the Python 2,7.12 package that comes with Debian it's actually the fastest). This is surprising, since I thought it was relatively well-established and understood that it was slow, or just not ... pythonic. I was curious exactly why Method 6 was now slower than Method 1, so I disassembled the functions using the dis module.

There were a number of SET_LINENO instructions in the 2.2.1 version which I've not shown - it makes the disassembly a little easier to read and the performance impact would have been negligible - when tracing is turned off (which it is) all this instruction did was set the current frame's f_lineno and continue executing the next instruction.

Disassembly - Method 1

Python 2.2.1	Python 2.7.12
6 LOAD_CONST 1 ('') 9 STORE_FAST 0 (out_str) 15 SETUP_LOOP 37 (to 55) 18 LOAD_GLOBAL 1 (xrange) 21 LOAD_GLOBAL 2 (loop_count) 24 CALL_FUNCTION 1 27 GET_ITER 31 FOR_ITER 20 (to 54) 34 STORE_FAST 1 (num) 40 LOAD_FAST 0 (out_str) 43 LOAD_FAST 1 (num) 46 UNARY_CONVERT 47 INPLACE_ADD 48 STORE_FAST 0 (out_str) 51 JUMP_ABSOLUTE 28 54 POP_BLOCK 58 LOAD_FAST 0 (out_str) 61 RETURN_VALUE	0 LOAD_CONST 1 ('') 3 STORE_FAST 0 (out_str) 6 SETUP_LOOP 31 (to 40) 9 LOAD_GLOBAL 0 (xrange) 12 LOAD_GLOBAL 1 (loop_count) 15 CALL_FUNCTION 1 18 GET_ITER 19 FOR_ITER 17 (to 39) 22 STORE_FAST 1 (num) 25 LOAD_FAST 0 (out_str) 28 LOAD_FAST 1 (num) 31 UNARY_CONVERT 32 INPLACE_ADD 33 STORE_FAST 0 (out_str) 36 JUMP_ABSOLUTE 19 39 POP_BLOCK 40 LOAD_FAST 0 (out_str) 43 RETURN_VALUE

Python 2.2.1

Python 2.7.12

 6 LOAD_CONST               1 ('')
 9 STORE_FAST               0 (out_str)
15 SETUP_LOOP              37 (to 55)
18 LOAD_GLOBAL              1 (xrange)
21 LOAD_GLOBAL              2 (loop_count)
24 CALL_FUNCTION            1
27 GET_ITER            
31 FOR_ITER                20 (to 54)
34 STORE_FAST               1 (num)
40 LOAD_FAST                0 (out_str)
43 LOAD_FAST                1 (num)
46 UNARY_CONVERT       
47 INPLACE_ADD         
48 STORE_FAST               0 (out_str)
51 JUMP_ABSOLUTE           28
54 POP_BLOCK           
58 LOAD_FAST                0 (out_str)
61 RETURN_VALUE

 0 LOAD_CONST               1 ('')
 3 STORE_FAST               0 (out_str)
 6 SETUP_LOOP              31 (to 40)
 9 LOAD_GLOBAL              0 (xrange)
12 LOAD_GLOBAL              1 (loop_count)
15 CALL_FUNCTION            1
18 GET_ITER            
19 FOR_ITER                17 (to 39)
22 STORE_FAST               1 (num)
25 LOAD_FAST                0 (out_str)
28 LOAD_FAST                1 (num)
31 UNARY_CONVERT       
32 INPLACE_ADD         
33 STORE_FAST               0 (out_str)
36 JUMP_ABSOLUTE           19
39 POP_BLOCK           
40 LOAD_FAST                0 (out_str)
43 RETURN_VALUE

So pretty much identical, which I half-expected - even though we're looking at two versions of Python which were released 14 years apart the compiler doesn't seem to get a great deal of changes, so the generated bytecode shouldn't change much for simple code.

Disassembly - Method 6

Python 2.2.1	Python 2.7.12
6 LOAD_CONST 1 ('') 9 LOAD_ATTR 0 (join) 12 BUILD_LIST 0 15 DUP_TOP 16 LOAD_ATTR 1 (append) 19 STORE_FAST 0 (_[1]) 22 LOAD_GLOBAL 3 (xrange) 25 LOAD_GLOBAL 4 (loop_count) 28 CALL_FUNCTION 1 31 GET_ITER 35 FOR_ITER 17 (to 55) 38 STORE_FAST 2 (num) 41 LOAD_FAST 0 (_[1]) 44 LOAD_FAST 2 (num) 47 UNARY_CONVERT 48 CALL_FUNCTION 1 51 POP_TOP 52 JUMP_ABSOLUTE 32 55 DELETE_FAST 0 (_[1]) 58 CALL_FUNCTION 1 61 STORE_FAST 1 (out_str) 67 LOAD_FAST 1 (out_str) 70 RETURN_VALUE	0 LOAD_CONST 1 ('') 3 LOAD_ATTR 0 (join) 6 BUILD_LIST 0 9 LOAD_GLOBAL 1 (xrange) 12 LOAD_GLOBAL 2 (loop_count) 15 CALL_FUNCTION 1 18 GET_ITER 19 FOR_ITER 13 (to 35) 22 STORE_FAST 0 (num) 25 LOAD_FAST 0 (num) 28 UNARY_CONVERT 29 LIST_APPEND 2 32 JUMP_ABSOLUTE 19 35 CALL_FUNCTION 1 38 STORE_FAST 1 (out_str) 41 LOAD_FAST 1 (out_str) 44 RETURN_VALUE

Python 2.2.1

Python 2.7.12

 6 LOAD_CONST               1 ('') 
 9 LOAD_ATTR                0 (join)
12 BUILD_LIST               0
15 DUP_TOP             
16 LOAD_ATTR                1 (append)
19 STORE_FAST               0 (_[1])
22 LOAD_GLOBAL              3 (xrange)
25 LOAD_GLOBAL              4 (loop_count)
28 CALL_FUNCTION            1
31 GET_ITER            
35 FOR_ITER                17 (to 55)
38 STORE_FAST               2 (num)
41 LOAD_FAST                0 (_[1])
44 LOAD_FAST                2 (num)
47 UNARY_CONVERT       
48 CALL_FUNCTION            1
51 POP_TOP             
52 JUMP_ABSOLUTE           32
55 DELETE_FAST              0 (_[1])
58 CALL_FUNCTION            1
61 STORE_FAST               1 (out_str)
67 LOAD_FAST                1 (out_str)
70 RETURN_VALUE

 0 LOAD_CONST               1 ('')
 3 LOAD_ATTR                0 (join)
 6 BUILD_LIST               0
 9 LOAD_GLOBAL              1 (xrange)
12 LOAD_GLOBAL              2 (loop_count)
15 CALL_FUNCTION            1
18 GET_ITER            
19 FOR_ITER                13 (to 35)
22 STORE_FAST               0 (num)
25 LOAD_FAST                0 (num)
28 UNARY_CONVERT       
29 LIST_APPEND              2
32 JUMP_ABSOLUTE           19
35 CALL_FUNCTION            1
38 STORE_FAST               1 (out_str)

41 LOAD_FAST                1 (out_str)
44 RETURN_VALUE

There are slightly more differences when comparing how 2.2.1 and 2.7.12 generate the bytecode for the list comprehension method. However, aside from a couple of quirks in the 2.2.1 version (i'm not sure why we call DUP_TOP on the list we created, and I have no idea what _[1] is) much of the bytecode is broadly the same - we produce a list of integers by applying CALL_FUNCTION to xrange with argument loop_count, and then iterate over the results, calling UNARY_CONVERT on each and assembling either a list or string using INPLACE_ADD or LIST_APPEND.

Since the generated bytecode contains no substantial differences, if we want to understand why the naive concatenation method (which uses INPLACE_ADD) became super-fast over the last 14 years we'll need to inspect how the Python VM interprets this code.

Analysis - INPLACE_ADD in Python 2.2.1

To save some space and time I'll skip straight to the meat and bones of where the actual string concatenation occurs - which is in Objects/stringobject.c, in the string_concat() function. It's quite small, so I've incuded it below - but I've stripped some macros for easier reading:

static PyObject *
string_concat(register PyStringObject *a, register PyObject *bb)
{
        register unsigned int size;
        register PyStringObject *op;
        if (!PyString_Check(bb)) {
                PyErr_Format(PyExc_TypeError,
                             "cannot concatenate 'str' and '%.200s' objects",
                             bb->ob_type->tp_name);
                return NULL;
        }
#define b ((PyStringObject *)bb)
        /* Optimize cases with empty left or right operand */
        if ((a->ob_size == 0 || b->ob_size == 0) &&
            PyString_CheckExact(a) && PyString_CheckExact(b)) {
                if (a->ob_size == 0) {
                        Py_INCREF(bb);
                        return bb;
                }
                Py_INCREF(a);
                return (PyObject *)a;
        }
        size = a->ob_size + b->ob_size;
        /* PyObject_NewVar is inlined */
        op = (PyStringObject *)
                PyObject_MALLOC(sizeof(PyStringObject) + size * sizeof(char));
        if (op == NULL)
                return PyErr_NoMemory();
        PyObject_INIT_VAR(op, &PyString_Type, size);
        memcpy(op->ob_sval, a->ob_sval, (int) a->ob_size);
        memcpy(op->ob_sval + a->ob_size, b->ob_sval, (int) b->ob_size);
        op->ob_sval[size] = '\0';
        return (PyObject *) op;
#undef b
}

This is pretty much exactly the algorithm I described in my first paragraph - after a couple of checks to ensure we're definitely dealing with strings we malloc a new buffer and memcpy both strings into it.

Analysis - INPLACE_ADD in Python 2.7.12

Again I'll skip straight to where the concatenation occurs - which is in Python/ceval.c in the string_concatenate() function:

static PyObject *
string_concatenate(PyObject *v, PyObject *w,
                   PyFrameObject *f, unsigned char *next_instr)
{
    /* This function implements 'variable += expr' when both arguments                         
       are strings. */
    Py_ssize_t v_len = PyString_GET_SIZE(v);
    Py_ssize_t w_len = PyString_GET_SIZE(w);
    Py_ssize_t new_len = v_len + w_len;
    if (new_len < 0) {
        PyErr_SetString(PyExc_OverflowError,
                        "strings are too large to concat");
        return NULL;
    }

    if (v->ob_refcnt == 2) {
        /* In the common case, there are 2 references to the value                             
         * stored in 'variable' when the += is performed: one on the                           
         * value stack (in 'v') and one still stored in the                                    
         * 'variable'.  We try to delete the variable now to reduce                            
         * the refcnt to 1.                                                                    
         */
        switch (*next_instr) {
        case STORE_FAST:
        {
            int oparg = PEEKARG();
            PyObject **fastlocals = f->f_localsplus;
            if (GETLOCAL(oparg) == v)
                SETLOCAL(oparg, NULL);
            break;
        }
        case STORE_DEREF:
        {
            PyObject **freevars = (f->f_localsplus +
                                   f->f_code->co_nlocals);
            PyObject *c = freevars[PEEKARG()];
            if (PyCell_GET(c) == v)
                PyCell_Set(c, NULL);
            break;
        }
        case STORE_NAME:
        {
            PyObject *names = f->f_code->co_names;
            PyObject *name = GETITEM(names, PEEKARG());
            PyObject *locals = f->f_locals;
            if (PyDict_CheckExact(locals) &&
                PyDict_GetItem(locals, name) == v) {
                if (PyDict_DelItem(locals, name) != 0) {
                    PyErr_Clear();
                }
            }
            break;
        }
        }
    }

    if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
        /* Now we own the last reference to 'v', so we can resize it                           
         * in-place.                                                                           
         */
        if (_PyString_Resize(&v, new_len) != 0) {
            /* XXX if _PyString_Resize() fails, 'v' has been                                   
             * deallocated so it cannot be put back into                                       
             * 'variable'.  The MemoryError is raised when there                               
             * is no value in 'variable', which might (very                                    
             * remotely) be a cause of incompatibilities.                                      
             */
            return NULL;
        }
        /* copy 'w' into the newly allocated area of 'v' */
        memcpy(PyString_AS_STRING(v) + v_len,
               PyString_AS_STRING(w), w_len);
        return v;
    }
    else {
        /* When in-place resizing is not an option. */
        PyString_Concat(&v, w);
        return v;
    }
}

This one is a little more complex because there are a couple of neat optimisations.

Firstly, if the next instruction to be executed is one of a handful (STORE_FAST, STORE_DEREF, STORE_NAME) we save ourselves a few cycles by doing a little setup ahead of time.

However more importantly, if there's only a single reference to the destination string we attempt to resize it in-place using PyString_Resize() instead of allocating an entirely new buffer. We can check this is the case by forcing this condition to be false:

    //if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {                                  
    if (0) {
        /* Now we own the last reference to 'v', so we can resize it                           
         * in-place.                                                                           
         */

If we build a Python 2.7.12 compiler without this resize optimisation, and retest:

	runtime (ms)	concatenations per second
Method 1	48.7949	409,878
Method 2	89.2184	224,168
Method 3	3.9204	5,101,535
Method 4	2.1489	9,307,231
Method 5	4.8215	4,148,073
Method 6	1.4203	14,081,697

We're right back where we started, with Method 1 being the second-slowest and the list comprehension used in Method 6 outperforming everyone else.

Conclusion

In the intervening years since the original benchmark was performed, an optimisation has been introduced to improve how the CPython VM handles string concatenation. In the context of this specific benchmark it means that if you need to concatenate strings you can go with a naive, slightly more intuitive implementation and performance very close to the recommended, more-pythonic implementation.

It's very likely that this is not applicable in all cases, I'm sure that if we have a number of objects allocated on the heap then it'll get slower very quickly (we won't be able to extend that buffer every time, and will need to find new locations) - however it's still an interesting and surprising result nonetheless.

Discover a Linux Utility - ischroot

When I open up dmenu and start typing out the program I want to open it provides some autocomplete suggestions based on the programs in my PATH.

I realised that there are hundreds of these, most of which I have never heard of before in my life. I realised that without a concerted effort I'd actually never end up learning anything about most of them - so I wrote a quick one liner to choose a random utility from my PATH and open up the man page:

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

How this works is basically splitting my PATH into new lines using sed, then listing the contents using one calling ls -1 and selecting one from the whole lot using shuf and attempting to open its manpage using man.

The first one I picked up was ...

Oh nice - ischroot. Well as the manpage says, this detects if we're currently running in a chroot. OK another quick mystery solved...

However that's doesn't exactly help us understand the whole picture. So let's start by asking "what is a chroot"? Well say we have a system that has a pretty straightforward set of folders in its root directory like the following

    $ echo $PATH
    /usr/bin:/bin
    $ ls /
    bin  boot  dev  etc  home  initrd.img  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  vmlinuz

So when we run a command like 'ls' it'll look for the utility in the two folders in our PATH (/bin and /usr/bin), then execute it on root directory of our file system. If we want however we can do a little trickery, and use the utility chroot so that when ls runs it sees an entirely different root altogether.

To try this we'll need to prepare our fake root directory:

    $ sudo mkdir /opt/fakeroot
    $ sudo debootstrap --arch i386 jessie /opt/fakeroot http://httpredir.debian.org/debian
    I: Retrieving Release 
    I: Retrieving Release.gpg 
    I: Checking Release signature
    I: Valid Release signature (key id 75DDC3C4A499F1A18CB5F3C8CBF8D6FD518E17E1)
    ... many lines later ...
    I: Configuring tasksel...
    I: Configuring tasksel-data...
    I: Configuring libc-bin...
    I: Configuring systemd...
    I: Base system installed successfully.
    $ ls /
    bin  boot  dev  etc  home  initrd.img  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  vmlinuz
    $ ls /opt/fakeroot/
    bin  boot  dev  etc  home  lib  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

OK so we now have a similar looking set of folders in /opt/fakeroot that we'd have in a freshly built Debian Jessie system - however we can run ls / using the chroot command so that it sees /opt/fakeroot as its root directory:

    $ sudo chroot /opt/fakeroot ls /
    bin  boot  dev	etc  home  lib	media  mnt  opt  proc  root  run  sbin	srv  sys  tmp  usr  var

OK nice, I can trick a process I've launched into thinking that any directory is root. However if I am launching a process like ls using chroot then surely I have no need for a tool like ischroot? Well the painful truth is that you may be running in a chroot "jail" without even knowing it. Some crafty (or security conscious) sysadmin may have set the system up so that, for example, when you connect via SSH your session is run inside a chroot. So assuming you've been entrusted with sudo priviledges, you can run ischroot and you'll be able to find out whether you are or not.

So just to replay the man page information, the return value of running ischroot should indicate whether we're running inside a chroot or not - the possible return values are:

value	meaning
0	running in a chroot
1	not running in a chroot
2	error occurred

So to test this out, let's run ischroot outside a chroot jail ...

    root@hanoi:/opt# ischroot
    root@hanoi:/opt# echo $?
    1

Cool, so that is exactly as expected. So if we switch over to our newly created chroot and run the same command...

    root@hanoi:/opt# sudo chroot /opt/fakeroot
    root@hanoi:/# ischroot
    root@hanoi:/# echo $?
    2

Ah, so instead of returning 1 we got 2 - indicating an error. After a bit of messing around it turns out we have just hit debian bug #685034 -" debianutils: ischroot fails to detect if it is running in a chroot". However we can determine if we definitely are not running in a chroot, but we have a bit of difficulty determining whether or not we are - a return value of 2 could be thought of as a "maybe". Since we don't like "maybe" in some cases, we can force ischroot to interpret "maybe" as "yes" using the -t flag:

    root@hanoi:/# ischroot -t
    root@hanoi:/# echo $?
    0

So not exactly a great start! However I am glad I learned about creating a chroot jail, even if ischroot isn't working as expected. Hopefully the next utility I pick up works a little better (and involves a smaller writeup).

Python - using a DualShock 4 with pygame

The Playstation 4 came out in 2013 and came with the DualShock 4 controller. I bought one in the midst of a particularly tough hangover, and when I received it two days later the part that most intrigued me was the controller. It's quite an amazing piece of equipment - I originally wanted to claim that the controller alone sported higher specs than the original PSX, and while the CPU is beefier (ARM Cortex M3 @ 160MHz vs MIPS R3K @ 33.8MHz) however the controller has a hundred or so Kb of RAM versus the PSX' 2MB.

That said the interfaces are pretty cool - in addition to the standard d-pad and X, O, triangle + square buttons there's a couple of analog sticks, shoulder buttons (2 x analogue) and a handful of accelerometers for motion sensing. All in all a nice little controller, no wonder mine needs charged every couple of days.

I thought that it'd be nice to play with some of these sensors and write OpenGL code in python. So I put together a quick demo - a 3D cube which is controlled by the accelerometers.

If we want to read events from the controller we can use pygame which can handle a lot of the fiddly parts (preventing the need for us to mess with something like evdev directly) - assuming we only have only one controller connected we can do this:

    import pygame
    controller = pygame.joystick.Joystick(0)
    controller.init()

What we'll do is repeatedly look for events, and then save the data we've read back into a couple of dictionaries - this roughly looks like the below:

    axis = {}
    button = {}

    # these are the identifiers for the PS4's accelerometers
    AXIS_X = 3
    AXIS_Y = 4

    # variables we'll store the rotations in, initialised to zero
    rot_x = 0.0
    rot_y = 0.0

    # main loop
    while True:

        # copy rot_x/rot_y into axis[] in case we don't read any
        axis[AXIS_X] = rot_x
        axis[AXIS_Y] = rot_y

        # retrieve any events ...
        for event in pygame.event.get():

            if event.type == pygame.JOYAXISMOTION:
                axis[event.axis] = round(event.value,2)
            elif event.type == pygame.JOYBUTTONDOWN:
                button[event.button] = True
            elif event.type == pygame.JOYBUTTONUP:
                button[event.button] = False

        rot_x = axis[AXIS_X]
        rot_y = axis[AXIS_Y]            

        # do something with this ...

It's important to note that we don't necessarily receive updated values each time we call pygame.event.get() - so we need to be careful about how we handle this.

Now that we've got some rotation values from the controller, we can use them to render and rotate a simple cube on screen:

    glPushMatrix()   
    glRotatef(roty_scale * rot_y, 1, 0, 0)
    glRotatef(rotx_scale * rot_x, 0, 0, 1)
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    glBegin(GL_LINES)
    for edge in edges:
        for vertex in edge:
            glColor3fv((1,1,1))
            glVertex3fv(vertices[vertex])
    glEnd()
    glPopMatrix()

Download IMG_7558.MOV

However I could only see accelerometers for two axes - to perform the rotation through the third axis I wanted to use the L2 and R2 shoulder buttons. However these really are a funny beast. After a bit of experimenting I noticed that there's some extremely weird quirk. Joystick input using pygame can either return analogue ("axis") or boolean ("button") data - L2 and R2 buttons are unique in that they have both. The peculiar thing is how these values change based on how far the button is pressed - the below shows how the axis/button data changes as you press the shoulder button:

There's an ambiguity between the button being halfway and fully pressed. I'm not sure if this is a bug or just a missing feature in pygame. However since I was just using the buttons to control rotation, I was able to carefully select a scaling value (1.0 = 90 degrees) to apply to the value we read so that it's not possible to see this. Computer graphics has a long and storied history of such careful hacks to give the illusion of something working, so I was happy to let this slide!

I added a couple more things (shifting the cube around using the touchpad, lighting the cube depending on the buttons pressed) and the results are here this gist if anyone's curious.

Finally I wanted to use the lightbar - each DualShock 4 has an array of Red, Green and Blue LEDs which can be set independently to 256 different brightness levels. This is put to neat use in things like Grand Theft Auto - when the police are on you the lightbar flashes blue and red.

Sadly there doesn't appear to be any way to handle this through the pygame interface, and when I tried dropping to the lower-level evdev interface the leds() method didn't return anything.

I did however notice that a couple of devices appeared in the /sys/class/leds directory, so I could play around with them by writing 0 .. 255 values to these devices. Sadly they're not so easy to detect (the device name changes each time it's connected) so I have to find the device names using a slightly nasty regex:

Download IMG_7645.MOV

XMonad - NetworkManager menu using xmobar and Tkinter (xnm)

I recently moved to using XMonad semi-fulltime, which is working out nicely most of the time. However the one sticking point is that when I try to work somewhere other than my flat or my office I had to drop to the command line and use nmcli to connect to wifi or to enable my builtin 4G modem.

This is less than ideal, there doesn't appear to by any simple NetworkManager popup/interface I could integrate easily with my xmobar setup - I wanted to have a little icon launch a UI that allowed me to the wifi network to connect to or switch on my 4G modem.

To address this I put together a little python app using Tkinter which updates network connectivity settings through NetworkManager's D-Bus API. The app is launched by clicking an area on Xmobar containing the dynnetwork widget beside a neat little old-school TTY icon:

Clicking this area will raise a menu like the below - listing WiFi and Modem interfaces

Above you can see that /dev/cdc-wdm0 is connected to my "Vodafone CZ" connection - there's a little chain icon beside it. Clicking on this connection would disconnect. Selecting one of the WiFi networks would have either connected automatically (if it was Open) or raised a popup asking for a password (if it was password-protected).

To achieve this you need to do a couple of simple things. Firstly ensure that the necessary dependencies are installed, and checkout the code

    $ sudo apt-get install python-tk python-networkmanager
    $ git clone https://github.com/smcl/xnm

Then ensure your xmobar template has the following line, which will display the dynnetwork info beside a TTY (assuming Font Awesome is installed as Additional Font #1 in xmobar):

<action=`/home/sean/.xmonad/xnm.py`>%dynnetwork% <fn=1></fn></action>

And that's it!