Sunday, February 26, 2012

Python and byte order marks


I am working with some files that have a byte order mark (BOM) at the beginning. The BOM is the Unicode character U+FEFF, which UTF-8 encodes as the three bytes EF BB BF. When I load one of these files with the standard Python open command, those bytes show up as garbage characters at the start of the first line, before any text. There are two basic problems here, both easily solved.

First, the file is encoded as UTF-8, which is a Unicode encoding and not plain ASCII (though UTF-8 is backward compatible with ASCII: any pure-ASCII file is also valid UTF-8). Whether or not this file contains any non-ASCII characters, which UTF-8 represents as multi-byte sequences, it is more appropriate to decode it as UTF-8 when loading it.

Second, the byte order mark is showing up as part of the data. I could just assume it is there and skip the BOM bytes myself (three of them, EF BB BF, in UTF-8), but I would rather let Python handle it properly so I don't have to think about it. Writing special code for the first few bytes of the first line is annoying and ugly. Note that a BOM is not required for a UTF-8 file, but many applications, particularly on Windows, include it.
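Both problems can be handled by choosing the right codec when opening the file. Here is a minimal sketch, assuming Python 3 (where the built-in open accepts an encoding argument) and the standard "utf-8-sig" codec. The helper name read_lines and the temporary sample file are made up for illustration and are not from the files I was actually working with.

```python
import codecs
import os
import tempfile


def read_lines(path, encoding="utf-8-sig"):
    """Return the lines of a text file, decoded with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read().splitlines()


# Build a small sample file that starts with the UTF-8 BOM (bytes EF BB BF).
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + "first line\nsecond line\n".encode("utf-8"))

# Plain "utf-8" decodes the BOM into the character U+FEFF and leaves it
# at the start of the first line.
with_bom = read_lines(sample_path, encoding="utf-8")

# "utf-8-sig" decodes the same bytes but drops a leading BOM if one is present.
without_bom = read_lines(sample_path, encoding="utf-8-sig")

os.remove(sample_path)

print(repr(with_bom[0]))
print(repr(without_bom[0]))
```

The "utf-8-sig" codec also reads UTF-8 files that have no BOM, so the same call should work whether or not the application that wrote the file added one.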

Saturday, February 18, 2012

Connect to wifi at boot

Like all good nerds, I need computers in several places around the house, but I have little desire to run Cat 5 cables all over the place to provide internet connectivity. That's what wifi is for. In most Linux distributions, though, the wifi connection depends on the user login, because the keys are stored in the user's keyring. Consequently, there is no internet access via wifi until after the user logs in. With some simple configuration, though, it is possible to connect to wifi on startup and have that connection available even if the user has not yet logged in.