Discover a Linux Utility - slabtop

To learn more about Debian and Linux in general I'm selecting utilities at random from my PATH using the command below, learning what they do and writing a blog post about it. Previously: Part 1

    $ (for folder in `echo $PATH | sed "s/:/\\n/g"`; do ls -1 $folder; done; ) | shuf -n 1 | xargs man

Today's randomly selected utility is called slabtop - according to the man page it should "display kernel slab cache information in real time". I was pretty pleased about this one actually, I had no idea about slab allocators so it was good to have something to explore on a stormy day:

Put simply, Linux permits us to manage and allocate chunks of memory from a set of fixed-size buffers - called slabs - for different purposes. And slaptop gives us a view into the state of each slab cache: what they're called, how many elements are allocated in total and how full each slab is:

So for the highlighted row, we can know the following information:

- name ("kmalloc-32")

- number of slabs allocated (163)

- number of objects allocated (20007)

- percentage of slab space allocated (98%)

- total size of the slab space allocated (652K)

... and so on.

What is slab allocation

So what does all this mean and why do we need it? During the execution of some program we may have requested and released  lots of differently sized objects from the heap. This could result in a pretty fragmented heap.

This could mean that allocating new buffers is either a little slow or unpredictable, since we need to search for an appropriately sized buffer before we can mark it as allocated and return it to the calling code. If any part of our application depends on a particular object being allocated quickly and predictably this is not ideal. 

Slab allocation involves setting up a slab which can hold certain amount of objects of the same size and type. Since we're dealing with fixed-size objects allocating space for a new object is quick - we just need to scan through an array to find an unused slot, mark it as used, then calculate the address (start_addr + size * index) and return it.

We could roll our own implementation since it's pretty straightforward to implement - and that would be quite an interesting project for an evening or so. However if we're writing Linux kernel or module code there's already an existing set of functions which are well understood and battle-hardened. In addition these functions hide a lot of underlying complexity by abstracting a lot of the memory management (we can actually have numerous slabs, but Linux manages this for us).

How Linux supports slab allocation

To start with, if we want to create a new slab cache (a managed collection of slabs) we can use kmem_cache_create:

    struct kmem_cache *kmem_cache_create(
	    const char *,      // name of cache
	    size_t,            // size of each object
	    size_t,            // object alignment
	    unsigned long,     // any special flags
	    void (*)(void *)   // slab constructor
    );

To allocate a new object from this cache we can use kmem_cache_alloc:

    void *kmem_cache_alloc( 
    	struct kmem_cache *, // ptr to the cache 
    	gfp_t flags          // flags to use if we need new slab
    );

Then when we need to free any, kmem_cache_free which will mark the space as unused and make it available to the allocator:

    void kmem_cache_free(
    	struct kmem_cache *, // ptr to the cache
    	void *               // ptr to object to free
    );

And finally when we're done and we want to remove the cache entirely there is kmem_cache_destroy which will free and release all the slabs related to a given cache:

	void kmem_cache_destroy(
		struct kmem_cache *  // ptr to the cache
	);

How to use slab allocation

As an example of how this could be used we can think about it in the context of a simple kernel module. I'll roughly go over the individual parts, and then present a full-blown module to explore. Firstly we'll want to defined the type we want to create the slab cache for  - imaginatively named slabthing - and use it to declare a variable s:

    typedef struct
    {
      char foo; 
      char bar; 
      char baz;
    } slabthing;

    slabthing *s;

Inside our modules init_module we can use kmem_cache_create - so that our slab cache will be called "slabdemo", it'll allocate objects with the required size. Since I don't really care much about performance (specifying an appropriate alignment could permit us to use different/faster load instructions) so we'll just request it to be single byte-aligned.

    struct kmem_cache *slabthing_cache;

    int init_module(void)
    {
      slabthing_cache = kmem_cache_create("slabdemo", sizeof(slabthing), 1, NULL, NULL);
      // other module stuff...
    }

If our module wants to allocate space for the variable s in our cache we can call kmem_cache_alloc:

    s = kmem_cache_alloc(slabthing_cache, NULL);

When we want to free the cache, in this case in our module cleanup code, we can call kmem_cache_free and kmem_cache_destroy. I don't think the free is necessary in this case, but I've just included it anyway:

    void cleanup_module(void) {
        kmem_cache_free(slabthing_cache, s);
        kmem_cache_destroy(slabthing_cache);
     }

To see this in action I've created a little demo module that creates a cache and then allocates 128 objects when it's loaded - then frees all the objects and destroys them when it's unloaded. You'll need to make sure you have the sources checked out for your current kernel inside /usr/src/linux-<version>:

    $ git clone http://github.com/smcl/slabdemo
    $ cd slabdemo
    $ make 

If our module builds successfully we can load it using insmod and then fire up slabtop to see how things look:

    $ sudo insmod slabdemo.ko
    $ sudo slabtop

So there it is - our slabdemo cache is there, with 128 objects allocated in a single slab which can fit 240 objects in total (peculiar number!). If you check out slabtop you can easily see things like kmalloc-512, kmalloc-1024 and kmalloc-2048 which are probably slab caches used by kmalloc for allocating memory pools of 512, 1024 and 2048 bytes respectively. In any case, that's another issue for another day. Obviously this is a tool designed for kernel developers, or for sysadmins of particularly performance-critical sysadmins, but I'm glad I took the time to potter around with it.