ScaleConf 2013

I attended ScaleConf a few weeks back. It was a really great conference - perhaps the best I have ever attended. Kirstenbosch Gardens is a fantastic venue. I took every opportunity between talks and after lunch to walk around the gardens and just absorb the peacefulness of the place. The speakers in general gave interesting talks, and it is always interesting to hear from international speakers about the scaling issues they face. And the food - both snacks and lunch at Moyo - was fantastic.

Here are a few of the themes I took away with me: Scale horizontally, not vertically. Do this by decoupling your service into smaller independent systems and using message queuing (RabbitMQ and friends) to get data to the appropriate components. Use NoSQL/memcache/key-value stores where possible since relational DBs don't scale well. And measure everything so you know when stuff is broken.

I'd highly recommend attending next year if you get the opportunity.
Recent posts

Using MySQL as handle.net database back-end

I've recently been working on getting a CNRI handle (aka persistent identifier) server up and running. We wanted to use MySQL as our database back-end. The documentation didn't cover how to do this in sufficient detail, so here is what I did. PS: We are using a pretty old version of the handle.net software - v 6.2. Hopefully this configuration still applies to the newer versions.

Background

By default the handle.net software stores its data in a Java jdb database. In my opinion this is less than ideal since there are no tools to manipulate or even view your data. You can, I guess, write your own Java code to do this, but this is a pain. Everybody knows MySQL, so we wanted to use it as our back-end. The handle server does support SQL, but the documentation only provides a config example for PostgreSQL. (Strangely, it does provide a table layout for MySQL.) Google also didn't provide any useful answers. So, in case it is useful to anybody else, here is my configuration

Normalizing a MAC address string

Over the last few days, I have been spending some time working on my Python - reading the sections of Dive Into Python that I have never got around to and refactoring parts of some of my Python scripts to make better use of the features of the language and, ultimately, to make them more robust (i.e. usable by people other than me). The script I have started with is a simple one for registering hosts for DHCP access. Basically, it takes two command line arguments - a fully qualified hostname and a MAC address - and then does some validation, checks that neither address is already in use, normalizes the output to the correct format, constructs a properly formatted host stanza and appends it to the end of our ISC DHCP server's dhcpd.conf configuration file. I have made improvements to various parts of the code, but the changes I am most conflicted about are those I have made to the MAC address normalization function which works reliably and therefore probably isn't a good candidate for
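For illustration only, here is a minimal sketch of what a MAC address normalization function could look like. It is not the function from the script described above. The name normalize_mac, the accepted input separators (colon, hyphen, dot) and the choice of lowercase colon-separated output are all assumptions made for this example.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return mac as six lowercase, colon-separated hex pairs.

    Hypothetical helper: the input separators and the output format
    are assumptions, not taken from the original script.
    """
    # Drop the common separators (colon, hyphen, dot) and surrounding whitespace.
    digits = re.sub(r"[:.\-]", "", mac.strip()).lower()
    # What remains must be exactly 12 hexadecimal digits.
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"invalid MAC address: {mac!r}")
    # Regroup the digits into six two-character octets.
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

Raising ValueError on bad input, rather than returning None, lets the calling script stop before anything is appended to dhcpd.conf.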

A few days of relative fame

Here's what happens when somebody posts a few things you've written about (or in) Python to reddit.com:

A more efficient method of sorting a list of IP addresses in Python

As ben, okplus and tom pointed out in comments to my "Sorting a list of IP addresses in Python" blog post, my sort function is very inefficient. As implemented, each IP address gets converted to decimal a number of times, which is unnecessary and wasteful. Thinking about the bigger picture - using the function in an application - it doesn't make sense to convert the IP address to decimal in the sort function. Since my goal was to extend the work of the iplib module, the solution, I think, should involve using this module more efficiently, if possible. The iplib module represents an IPv4 address using its IPv4Address class. When initializing an instance of this class, the address gets converted to decimal and stored in the _ip_dec attribute, accessible by calling its get_dec() method. So converting the raw IP addresses to IPv4Address instances outside the sort function (which I would probably be doing anyway) and using the get_dec() method in the function should improve things.
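The same idea can be sketched with Python's standard library. This example swaps in the stdlib ipaddress module as a stand-in for iplib, so it is not the iplib-based code this post goes on to describe: ipaddress.IPv4Address plays the role of iplib's IPv4Address class, and int(address) plays the role of get_dec(). Each string is parsed once, outside the sort, and the sort key is computed once per element.

```python
import ipaddress

raw_ips = ["192.168.100.56", "192.168.0.3", "192.0.0.192", "8.0.0.255"]

# Parse each string exactly once, before sorting.
addresses = [ipaddress.IPv4Address(ip) for ip in raw_ips]

# key=int asks each address for its integer value once per element,
# instead of converting both operands again on every comparison.
sorted_addresses = sorted(addresses, key=int)

sorted_ips = [str(address) for address in sorted_addresses]
print(sorted_ips)
```

A key function is generally a better fit than a comparison function here because the conversion cost grows with the number of addresses rather than with the number of comparisons.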

Playing with Google AppEngine

I've just spent the last two days playing around with Google's AppEngine, and I like it. When my language of choice was Perl, I wrote plain ole CGI web applications (if you can call them that) using Perl's CGI module. In the Python world though, web frameworks and WSGI seem to be the way to do things, so I've known for a while that I needed to learn one of the many frameworks out there. I spent about a week getting my head around TurboGears and the concepts of ORM a year or so ago using the TurboGears book. After a week I had finished the book and had the sample app working but had no more time left to spend on the project. Obviously, not using it, I've long since forgotten all of the TurboGears-specific stuff. I had thought to give Django my next try, but AppEngine caught my eye in the meantime. AppEngine has a nice Getting Started Guide which I used to get a simple little application up and running on the local development web server included in the SDK

Sorting a list of IP addresses in Python

As I work a lot with network data, one of my favourite Python modules is iplib. It takes care of quite a few of the things I want to do with IP addresses but lacks a lot of the functionality of Perl's Net::Netmask, which I relied on extensively when Perl was my favourite language. One of iplib's missing features is a method for sorting a list of IP addresses, or at the very least, a method for comparing two addresses. Luckily this is easy enough to implement yourself in Python using a customised sort function. See the Sorting Mini-HOW TO for a well-written document on sorting in Python. Here is my attempt at a custom function for sorting IP addresses.

    import iplib

    ips = ["192.168.100.56", "192.168.0.3", "192.0.0.192", "8.0.0.255"]

    def ip_compare(x, y):
        """ Compare two IP addresses. """
        # Convert IP addresses to decimal for easy comparison
        dec_x = int(iplib.convert(x, "dec"))
        dec_y = int(ipl
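As a rough, self-contained sketch of the comparison-function approach in Python 3 (where sorted() no longer accepts a comparison function directly, so functools.cmp_to_key is needed): the helper ip_to_int below is hypothetical and stands in for the iplib conversion, so that the example runs with the standard library alone.

```python
from functools import cmp_to_key


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value.

    Hypothetical stand-in for the iplib conversion used in the post.
    """
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def ip_compare(x, y):
    """Return -1, 0 or 1 depending on how x orders relative to y."""
    dec_x = ip_to_int(x)
    dec_y = ip_to_int(y)
    return (dec_x > dec_y) - (dec_x < dec_y)


ips = ["192.168.100.56", "192.168.0.3", "192.0.0.192", "8.0.0.255"]

# cmp_to_key adapts the two-argument comparison for sorted().
sorted_ips = sorted(ips, key=cmp_to_key(ip_compare))
print(sorted_ips)
```

Note that comparing the strings directly would not work: "8.0.0.255" sorts after "192.168.0.3" lexically, which is why the addresses are converted to integers first.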