Skip to main content

Normalizing a MAC address string

Over the last few days, I have been spending some time working on my python - reading the sections of Diving into Python that I have never got around to and refactoring parts of some of my python scripts to make better use of the features of language and, ultimately, to make them more robust (i.e. usable by people other than me).

The script I have started with is a simple one for registering hosts for DHCP access. Basically, it takes two command line arguments - a fully qualified hostname and a MAC address - and then does some validation, checks that neither address is already in use, normalizes the output to the correct format, constructs a properly formatted host stanza and appends it to the end of our ISC DHCP servers dhcpd.conf configuration file.

I have made improvements to various parts of the code but the changes I am most conflicted about are those I have made to the MAC address normalization function which works reliably and therefore probably isn't a good candidate for refactoring but (to me anyway) looks inelegant - something which I think matters in an elegant language like python.

The normalization function takes as inpute a MAC address in one of three (string) formats - unix (00:11:22:33:aa:55), Windows(00-11-22-33-AA-55) and Cisco (0011.2233.aa55) - and the same MAC address in the unix format.

Since I am trying to move towards a more test-driven development approach, I started out by writing a very basic unit test to make sure my new function is at least as reliable as the old function. Here is the code on the unit test (test.py):


#!/usr/bin/env python
import unittest
import sys
sys.path.append(".")
import oldmac as mac
#import newmac as mac

class Test(unittest.TestCase):
""" Unit test/s for MAC address normalization function """
normalize_values = (
('00:11:22:33:aa:55', '00:11:22:33:aa:55'),
('0011.2233.aa55', '00:11:22:33:aa:55'),
('00-11-22-33-aa-55', '00:11:22:33:aa:55'),
('00:11:22:33:AA:55', '00:11:22:33:aa:55')
)

def testMacNormalize(self):
""" Normalize MAC addresses to lowercase unix format """
for addr, expected in self.normalize_values:
result = mac.normalize(addr)
self.assertEqual(expected, result)

if __name__ == "__main__":
unittest.main()

Here is my orginal normalization function (oldmac.py):

#!/usr/bin/env python
""" Old MAC normalization function """
import re

def normalize(m):
""" Normalize a MAC address to lower case unix style """
m = re.sub("[.:-]", "", m)
m = m.lower()
n = "%s:%s:%s:%s:%s:%s" % (m[0:2], m[2:4], m[4:6], m[6:8], m[8:10], m[10:])
return n

There are two things I don't like about this code:
  1. Use of a regular expression for something as simple as eliminating the delimiters from the string. There must be a simpler way to do this.
  2. And the bit I find inelegant - the construction of the normalised string, n. It looks ugly :)

First off, here is the results of running my unit tests against this old function:

$ ./test.py -v
Normalize MAC addresses to lowercase unix format ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
So, I set out to rewrite the normalize function more elegantly. Here is the result (newmac.py):

#!/usr/bin/env python
""" New MAC normalization function """

def normalize(addr):

# Determine which delimiter style out input is using
if "." in addr:
delimiter = "."
elif ":" in addr:
delimiter = ":"
elif "-" in addr:
delimiter = "-"

# Eliminate the delimiter
m = addr.replace(delimiter, "")

m = m.lower()

# Normalize
n= ":".join(["%s%s" % (m[i], m[i+1]) for i in range(0,12,2)])

return n

The differences between this version and old version are:
  1. I replaced the regular expression with a simple string.replace. Why use a (something-big) when a (something-small) will do.
  2. I replaced the normalization expression with does the same thing but using a list comprehension.

And my unit test runs against this version:

$./test.py -v
Normalize MAC addresses to lowercase unix format ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
So, what do you think? An improvement or not?

Have I over-engineered the normalization code by replacing something static, simple and fast with something more complex, less readable and probably slower?

Is there a better way of normalizing a MAC address string?

Note: I have just presented one aspect of the refactoring on this script - the MAC address normalization. This is the part I am most conflicted about because the old code worked fine but the new code looks "cooler" :) . The other improvements such as the more robust validation I have left out (perhaps for another blog post).

Comments

Popular posts from this blog

Sorting a list of IP addresses in Python

As I work a lot with network data, one of my favourite python modules is iplib . It takes care of quite a few of things I want to do with IP addresses but lacks a lot of functionality of perl's Net::Netmask which I relied on extensively when perl was my favourite language. One of the iplib missing features is a method for sorting a list of IP addresses, or at the very least, a method for comparing two addresses. Luckily this is easy enough to implement yourself in python using a customised sort function. See the Sorting Mini-HOW TO for a well written document on sorting in python. Here is my attempt at a custom function for sorting IP addresses. import iplib ips = ["192.168.100.56", "192.168.0.3", "192.0.0.192", "8.0.0.255"] def ip_compare(x, y): """ Compare two IP addresses. """ # Convert IP addresses to decimal for easy comparison dec_x = int(iplib.convert(x, "dec")) dec_y = int(ipl...

Recursive Descent Parsers and pyparsing

Yesterday while browsing the table of contents of the May 2008 issue of Python Magazine I came across a reference to the pyparsing module - a python module for writing recursive descent parsers using familiar python grammar. O'Reilly's Python DevCenter has an excellent introduction to using this module entitled Building Recursive Descent Parsers with Python . Well worth a read. It just so happens that I have a number of projects which are stalled because writing code to parse complexly structured data is not my strong point. I enjoy parsing up text line by line as much as the next guy but this recursive stuff I find tedious. The ISC DHCP configuration file is, in my opinion, a good example of parsing complexity. It's configuration directives can contain many optional directives, can be nested, and can be all on a single line or broken up move multiple lines. Writing the parser using pyparsing makes this much simpler. Here is a simple example of using pyparsing to parse...