Steven email: sjh@svana.org
web: https://svana.org/sjh
Mon, 15 Jul 2013
My annual animosity toward ATO for their failure to provide open eTax - 21:31
I was happy to see that some more information has been released, thanks to Andrew Donnellan, about the underlying activity behind this lack of support.
Fri, 09 Dec 2011
My software works too well, change it back - 10:23
It appears the profiling and lower memory footprint work done by various gurus in the kde and gnome and similar camps has paid dividends, as there appears to be a pretty big drop in memory usage and memory leaks here and everything feels a bit faster, all of which is good news. Not that I have done any real testing, but perceived feel is relevant to some extent in a computing environment. The most amusing thing here I thought was my interpretation of how he asked the question, it sounded almost as if something was wrong. As if James was saying "my computer is not using enough memory, and is running too fast, fix it, make it as slow and hoggy as it used to be". I guess at least he was not about to request a change to a computing system that seems to constantly get slower and more user unfriendly with every major release.
Finally faster - 10:23
Back in November I mentioned this to Steve Walsh of Nerdvana, he told me they do colo, and would throw in new hardware (leasing arrangement) all for less per month than we are currently paying and colocated in a rather nice facility in Sydney. Martijn and I thought this sounded tops so signed up. Finally we shifted all the domains and config and data and everything across for the final time last night and we are now actively using the new server for all domains we host and everything else. The new machine is definitely a nice step up, now a Dual 3 GHz Xeon with Hyperthreading, 1 GB of RAM and two 250 GB SATA drives configured in RAID 1 for full redundancy. Damn this new machine is fast, operations that used to take a few minutes now happen in 2 or 3 seconds. Finally I can do a few things I have been holding off from doing on the old machine for a while, either for lack of disk space, lack of memory or incredibly high load caused by trying to do the things I had in mind. Heck I may even add some sort of comments thing to this diary (Jane reckons I need comments here). One of the other problems with the old machine was I had never gotten it to cleanly boot up into a kernel newer than 2.2.20pre2, which meant ancient firewalling, probably a few vulnerabilities, inability to try some new things that may have been interesting and a few other issues. The machine was also running Woody, so it is nice to have Sarge with a few even newer bits on the new machine. RIP calyx.svana.org, long live calyx.svana.org (we did not change the name, which was confusing once or twice while moving config over).
[15:46:41] 9 calyx sjh ~> sh -c 'cat /proc/cpuinfo ; free ; df ; uname -a' | egrep 'MHz|Mem|vg0-data|Linux'
cpu MHz : 3000.269
cpu MHz : 3000.269
cpu MHz : 3000.269
cpu MHz : 3000.269
Mem: 1036352 1001088 35264 0 68208 713860
/dev/mapper/vg0-data 235694888 8981204 214741076 5% /data
Linux calyx 2.6.14.3 #1 SMP Fri Nov 25 23:43:09 EST 2005 i686 GNU/Linux
[15:47:27] 10 calyx sjh ~>
Obscurity, P=NP etc, Hash Visualisation - 10:23
It is interesting to see some companies such as Kryptonite eventually reacted, while others seem intent on denying public information, or trying to shut down people who know about it. In computing it is a well known fact (although still ignored by too many people/companies) that security through obscurity will not work; public design and analysis by experts in the field however does work and should be used for things that need to be secure. Although one aspect that comes to mind here is that in the case of locks you may not want to make them impossible to open, as other attack vectors are then used. As the article mentions, crooks seem to prefer using a hammer (or maybe explosives) over opening the locks through lock exploits. There were some discussions about this in the car that were I think linked to by Schneier a few years back.
Next was an interesting wikipedia page linked to by kottke, a list of unsolved problems from a number of different fields. Those listed in Computing are familiar, however looking through the collected information on those in other fields is pretty fascinating. Mmmmmm wikipedia goodness.
Catching up on some LWN reading, I see the mention of a new OpenSSH version approaching, and in the list of new features is "Experimental SSH fingerprint visualisation" with a paper (pdf) linked. So I downloaded and had a read of the paper, largely to see what sort of images they generate. It is good to see some work on what is one of the biggest security weaknesses out there, the humans using secure systems.
Wed, 06 Apr 2011
Connection limiting in Apache2 - 16:01
Looking at the logs it was interesting to note the User-Agent was identical for each request even though it was coming from so many different ip addresses. So I had the situation of needing to limit connections to a certain type of file or an area on disk via apache so as not to have resource starvation and no download blow outs. Looking around for ways to do this in apache2 there were not a whole lot of options already implemented, some per ip connection limits in one module, some rate limiting in another module, but no way to limit connections to a given Directory, Vhost or Location immediately turned up. Fortunately a few different searches eventually turned up the libapache2-mod-bw package in Debian. As it says in the package description: "This module allows you to limit bandwidth usage on every virtual host or directory or to restrict the number of simultaneous connections."
This was the solution it seemed, so I read the documentation in the text file in the package, enabled it on the server and got it working. To get it working pay attention to the bit that says ExtendedStatus needs to be enabled before the LoadModule line. Then you can simply place it in a Directory section in your main config file for a given vhost. I configured it with the following section
    ForceBandWidthModule On
    BandWidthModule On
    <Directory "/on/disk/location">
      BandWidth "u:BLAHBLAH" 200
      BandWidth all 2000000
      MaxConnection "u:BLAHBLAH" 1
      MaxConnection all 10
    </Directory>
Which says if the user agent has the string "BLAHBLAH" in it anywhere limit to 200 bytes per second and later 1 connection allowed from that user agent to this directory. I thought it worth while to put in a limit on all connections to the directory of 10 just in case the user agent changes, so it will not starve the machine or max out the link. Initially I had the limit of 10 without limiting the user agent more and the DOS was simply using up all 10 and thus no one else could connect to and download these items.
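The log observation above (one User-Agent arriving from many addresses) can be checked with a few lines of Python. This is only a rough sketch: it assumes the logs are in Apache's "combined" format, the sample lines and addresses are made up for illustration, and the simple regular expression will miss lines containing escaped quotes.

```python
import re
from collections import defaultdict

# Assumed "combined" log layout:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def ips_per_agent(lines):
    """Map each User-Agent string to the set of client IPs that sent it."""
    agents = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ip, agent = m.groups()
            agents[agent].add(ip)
    return agents

# Made-up sample lines (documentation address ranges, not real clients).
sample = [
    '192.0.2.1 - - [06/Apr/2011:16:00:00 +1000] "GET /files/a.iso HTTP/1.1" 200 1024 "-" "BLAHBLAH/1.0"',
    '192.0.2.2 - - [06/Apr/2011:16:00:01 +1000] "GET /files/a.iso HTTP/1.1" 200 1024 "-" "BLAHBLAH/1.0"',
    '198.51.100.7 - - [06/Apr/2011:16:00:02 +1000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Print agents ordered by how many distinct addresses used them.
for agent, ips in sorted(ips_per_agent(sample).items(), key=lambda kv: -len(kv[1])):
    print(len(ips), agent)
```

An agent string with an unusually high count of distinct addresses is a candidate for the kind of "u:" rule shown in the config section.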
Fortunately so far this seems to be working and I can monitor it for a few days to see the resultant behaviour of the attack. Thanks to the module author this seems to work fairly well and was easier than writing a mechanism inside apache2 myself to limit the connections in the manner required.
Sat, 05 Feb 2011
New toy - A netbook with great battery life - 15:43
I have been reading about various netbooks for a while, and finally I realised I have a good laptop for the things I need a laptop for provided by work. However when travelling it is handy to have something less important and expensive, with better battery life. Everything else can be easily dealt with. The Samsung series of netbooks regularly had the best battery life mentioned in reviews, so looking at the models in stock at JB Hifi the NF 210 was claimed (by Samsung) to have 14 hours. Most Linux reviewers of it seemed to suggest 8 to 10 hours was the norm. So I headed over to buy one. For AUD $437 I got a 1 GB RAM, 250 GB hdd, Atom N445 dual thread (I think) netbook with a 1024x600 screen and a huge battery life with the 6 cell battery it came with. At lca I was able to leave the (rather minimalist) power adapter where I was staying and just take the netbook, it easily lasted the whole day open during all talks and using wireless the whole time plus some other usage. Gnome power battery status suggests 12 hours from 100% charge with the screen on minimum brightness, right now I am typing this outdoors with the screen at 50% the battery is at 50% and the report suggests 5 hours remaining. I installed a standard Debian Squeeze netinst install off a usb stick and downloaded an identical set of packages (almost) to those on my laptop, no need for a restricted environment as it is a fairly powerful computer anyway. Pretty much everything has worked well under Linux, the only slight complication was the need for a ppa samsung-backlight deb to control the backlight from the keyboard. The backlight seems to go dim on no use even when those options are not selected in gnome power manager so also something that could be investigated. Also Paulus bought one and had a few problems with it freezing due to the closed wireless firmware on resume from suspend it seemed. I have had one lockup (possibly related) but it has not been a problem. 
The wireless driver does need to be reloaded on resume from suspend before it works (easy to do) but that is something I may be keen to look into at some point. I should not be surprised though how easy it was to have a capable working Linux system, that is often the norm with hardware these days (especially with much better/broader driver support than any other operating system). For getting around the place and doing some light work (compiling, interpreters, emacs, web browser, etc) it is a capable system and not lacking. I am a happy purchaser, even though my first one had to be returned within three hours of purchase due to a failed hard disk, since then it has been excellent.
Fri, 16 Jul 2010
Today's strangely named Debian package - 16:25
This definitely sounded odd as the package name suggests it is some sort of perl instruction package. When I looked at the output of apt-cache show perlprimer I thought it even stranger. In the description is the following "open-source GUI application written in Perl that designs primers for standard Polymerase Chain Reaction (PCR), bisulphite PCR, real-time PCR (QPCR) and sequencing.". So this is in fact a genetic research related package, with the name perlprimer (it is admittedly written in perl). I know Debian packages tend to be named on a first in first named basis, however this definitely strikes me as deceptive/strange. Obviously all mad gene scientists are out there trying to hide their work with deceptive package names... or something.
Fri, 02 Jul 2010
USB key destruction, Game On! - 11:15
Wed, 24 Feb 2010
More on the google search mechanism - 16:32
I liked the story about dog and puppy searches and people looking for hot dogs, someone should make a t-shirt, Google: No longer boiling puppies since 2002. Or something. It is also interesting to think they use all incoming searches as some form of testing or control for other tests. The scale of the operation and being able to respond fast is still the most impressive thing about it I think. Also the internal best of search ideas conference and meetings sound like an interesting way to get ideas working. Also everything has to be backed up with results to show it improves things.
On a side note, Kate and Ruth are awesome, listening to their cover of the Dylan song Let Me Die In My Footsteps and I am reminded how good they are together singing.
Wed, 30 Sep 2009
Rockbox freezes on ogg files fix - 17:41
Happily now I have discovered this problem I have easily removed all the id3 and id3v2 tags from ogg files on the device with "find -name '*.ogg' -print0 | xargs -0 id3v2 -D" and hey presto I can now play all these files again easily. The ogg/vorbis tag remains intact, for some reason I had the add id3 tags option ticked in grip without restricting it to files ending in .mp3.
Mon, 17 Aug 2009
dc++ or new digital camera - 14:43
Fri, 07 Aug 2009
Tracking down disk accesses - 14:02
I started looking around to see how to track down the problem. I have used iostat in the past to give me some details about activity happening. However the problem with that is it does not tell you which process is doing things. Running top also is not any good as it does not identify io and it also does not show cumulative use/hits as iostat and similar tools do. There is a python program called iotop I have not tried yet, also there are files in /proc nowadays you can laboriously work your way through to track down some of this information. However while looking around and reading some of the man pages I discovered the existence of pidstat. This program is fantastic. It can display accumulated disk, vm, cpu and thread information on a per process basis. This is a program I have wished I had for years. So I ran pidstat -d 5 and watched to see what was writing to the disk so often. First I noticed the predictable kjournald. Rather than messing around trying to change commit interval for this I found there is a laptop-mode-tools package I should have had installed on my laptop. I have now installed it and enabled it to operate even when AC power is plugged in and now kjournald seems to be able to go for minutes at a time without needing to write to disk. Next I noticed xulrunner-stub was writing often and causing the disk to spin up now it was spun down due to laptop_mode. This is firefox (or iceweasel in the debian case). I found details suggesting firefox 3.0.1 onward had an option to decrease the save/fsync regularity and that 3.5 and up should be even better. I installed 3.5.1 from debian experimental and found another page with 28 good firefox tips, one of which actually told me which about:config option to change to decrease the save/sync interval. So the disk is not always spinning up or constantly accessing now, though there still appear to be a few culprits I could track down more information on in the pidstat output.
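For the "files in /proc" route mentioned above, here is a minimal Python sketch that lists the processes with the most bytes written. It is not how pidstat works internally, just one way to read the same kind of counters. It assumes a Linux kernel with per-task IO accounting enabled (so /proc/<pid>/io exists) and new enough to provide /proc/<pid>/comm; the function names are my own. Note the numbers are totals since each process started, not the per-interval rates pidstat -d prints.

```python
import os

def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters

def top_writers(limit=10):
    """Return (write_bytes, pid, command) tuples, biggest writers first."""
    if not os.path.isdir("/proc"):
        return []
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as f:
                io = parse_proc_io(f.read())
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except OSError:
            # The process exited, or we may not read its counters (not root).
            continue
        rows.append((io.get("write_bytes", 0), int(pid), comm))
    return sorted(rows, reverse=True)[:limit]

if __name__ == "__main__":
    for written, pid, comm in top_writers():
        print(f"{written:>14} {pid:>7} {comm}")
```

Run as root to see every process; as a normal user only your own processes will show up.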
Also I may want to play around with more proc settings such as /proc/sys/vm/dirty_expire_centisecs which can change how often pdflush sends stuff to disk and there are other suggestions around which if I think about may help too. Update: since first writing this I have also found a good Linux Journal article on what laptop mode does.
One of the reasons I am so excited about pidstat is it helps with my work a lot, if there is a problem with a mail server, or student login server or any number of other machines. Getting a readout of this information by process accumulated over time is really useful to work out what is causing issues and thus work on controlling and preventing problems.
Fri, 10 Jul 2009
An easy to use cheap DVB-T USB dongle - 11:35
Anyway I did not have time to go into town and buy a digital tuner to go with the tv, however the on campus computer shop has USB DVB-T dongles. One they had for sale (rather cheap at $68) is the Leadtek Winfast DTV Dongle Gold, which according to this page at the mythtv wiki works well on Linux. So I bought one, plugged it in and discovered that the driver is already in the 2.6.30 kernel I am running. The firmware linked to first on that page was buggy however, some forum posts suggested running the latest 4.95.0 firmware rather than 4.65.0 and it would work (it did). After playing around with a few tv programs I settled on simply using xine which can tune into all the channels I scanned for Canberra. I am happy to say this works a treat and only took about 5 minutes to make it work. Of course if I make it to swimming tonight I may try to come back past the shops in Civic and buy a tuner box for the tv in the lounge room so everyone can enjoy the tour at home. Though I do not watch tv much I have to say it is sort of exciting to have a tuner that only takes up a really small amount of space (the dongle, antenna and short usb cable extension are about the size of a small USB hard drive all up) in my laptop bag and works fine in most places like this.
Wed, 08 Jul 2009
Success with WPA2 - 17:58
Due to concerns that my iwlagn driver could be bad I upgraded my laptop kernel to the Debian sid 2.6.30 packages, I also then downloaded the latest wireless kernel drivers and installed them. Also the three programs mentioned, iw (new interface to wireless stack in Linux), crda and wireless-regdb. In the end I am not entirely convinced those things helped, many forum complaints for Ubuntu and other systems said network-manager had issues and to try wicd. My initial efforts with wicd failed. Eventually while reading some efforts someone else had made to work out what was happening on their system I saw someone using the rather simple iwlist tool to scan for the capabilities of the secure access points. When I did this I noticed the ANU-Secure access points all advertised the following.
    IE: IEEE 802.11i/WPA2 Version 1
        Group Cipher : CCMP
        Pairwise Ciphers (1) : CCMP
        Authentication Suites (1) : 802.1x
I had previously been trying TKIP and WPA2 when I tried wpa_supplicant alone without a manager on top. WPA2 and RSN are aliases for each other in this instance. Anyway with the new drivers and the sid wpa_supplicant I was able to get a wpa_supplicant.conf with the following to work on ANU-Secure.
    ctrl_interface=/var/run/wpa_supplicant
    ctrl_interface_group=root
    network={
        ssid="ANU-Secure"
        scan_ssid=0
        proto=RSN
        key_mgmt=WPA-EAP
        eap=PEAP
        pairwise=CCMP
        group=CCMP
        identity="u9999999"
        password="PASSWORD"
        phase1="peaplabel=0"
        phase2="auth=MSCHAPV2"
    #   priority=1
    }
Then I looked through the wicd templates for one that had the minimum needed and noticed the wicd PEAP-GTC template had the desired fields set. So now in wicd I can access ANU-Secure from the desktop with no problems. I really should test out older drivers and some other configurations, also try out network manager again I think. Works for now though, I can finally stop wasting so much time on this.
Thu, 02 Jul 2009
A regression for WPA2 - 18:20
I thought maybe there was some problem with my laptop hardware and maybe the iwl4965 chipset simply would not do it under Linux. However searching online suggested I should be able to make it do WPA2. Thinking maybe the Ubuntu people had done it right and Debian was missing something I tried booting a Jaunty live cd. I also discovered the rather neat feature of suspend to disk (hibernate) in that you can hibernate your computer, boot off a live cd, use it, reboot and have your existing session come right back up normally on the next boot. Anyway I booted up Jaunty and tried to authenticate, it still failed in a similar manner to my Debian installation. Out of curiosity, as I had heard of hardy working, I booted my laptop on a hardy live cd. So network manager and the iwlagn driver combined on either Debian sid or Ubuntu jaunty had failed to authenticate. Ubuntu hardy on the other hand, using an older version of network manager and the iwl4965 driver in the kernel, worked fine for WPA2 authentication and use on the ANU Secure wireless network. So now I need to find out where the regression has happened that means WPA2 is broken in more recent releases of the software (kernel drivers, wpa supplicant, network manager) on either Debian or Ubuntu.
Mon, 01 Jun 2009
An interesting languages comparison - 15:45
Back in 1999 and 2000 I put a pretty trivial example of a single problem being solved in multiple languages online. In this case scanning html for entities, largely because I was mildly interested in how different languages and the different implementations of them may solve the same problem and the time it would take. I say mildly interested because it is such a trivial example and because I did not put much effort in. (I was amazed a few weeks ago to get an email from someone rerunning these to see if recent Java implementations had caught up to C yet.) The person who wrote this speed, size and dependability post put a lot more effort in and actually was able to draw some interesting conclusions about languages and how they work and develop over time. For the geeks out there I recommend having a look.
Thu, 14 May 2009
More open source required in government - 12:36
Today Schneier had some information on breathalysers that due to court orders finally had the source made available for some analysis. This is not the same breath test system as used in the Florida case from what I can tell at a glance (this was a New Jersey case), however it definitely opens your eyes once more on how crap closed source software can be (and yes I admit lots of open source software can also be crap) and you will have no idea, and no way to fix it. Any software used in law enforcement in such a way that it could be so incorrect or wrong and yet still cause someone to lose their licence or gain a criminal record really should be opened up, at least to the agency/government/force using the software, if not open to all people.
Wed, 06 May 2009
Reminders and bugs - 17:16
As for the bugs thing, I think there is a bug in the Ubuntu 9.04 libnss-ldap, I found a problem where it was not reading something configured by the install from ldap.conf and I need to do a little bit more testing before submitting a bug report.
Tue, 21 Apr 2009
Another change to cache_timestamps for perl 5.10 - 11:28
It used to be something like this
    {
        my (%h1,%h2);
        sub wanted {
            $h1{$File::Find::name} = "someval";
        }
        find (\&wanted, "topdir");
    }
However when I changed to perl 5.10, though the assignment seemed to work (blosxom runs without -w or use strict enabled), if I tried to display %h1 inside wanted or tried to use it like a hash I got a weird error "Bizarre copy of HASH in refgen" at the line of code where I tried to use the variable as a hash. Looking at other uses of File::Find it seems everyone used anonymous subroutines from the call to find. I have changed the code to do the following.
    {
        my (%h1,%h2);
        find (sub { $h1{$File::Find::name} = "someval"; }, "topdir");
    }
And now the hashes are in scope and not some so called Bizarre copy any more. The code for the cache_timestamps plugin can be found here and details about cache_timestamps are in my comp/blosxom category.
Update: found some details, rather than searching for the error message I started searching for variable scope changes in 5.10. Found this page talking about state variables being available in 5.10 as my variables are not persistent across scope changes.
Fri, 19 Dec 2008
Thu, 27 Nov 2008
Easy Dell HSDPA SIM access - 12:15
Originally I had no intention to use it, and the laptop came with it specced for Vodafone usage. Recently however Telstra and Optus have both started offering prepaid wireless broadband. I was wondering how easy it would be to change the SIM to one of those networks. After all lsusb currently outputs
    Bus 002 Device 019: ID 413c:8138 Dell Computer Corp. Wireless 5520 Voda I Mobile Broadband (3G HSDPA) Minicard EAP-SIM Port
The book that came with the laptop has good instructions on how to pull it apart and access various parts of the hardware. So I had a glance at the WWAN instructions and was easily able to open it up and look at the device. However when I did this I discovered that the SIM was not attached to the device at all. At this point I googled more accurately for details about the location of the SIM in Dell laptops with HSDPA devices. It was at this point I discovered an article on the Register that said Dells are not tied to Vodafone and quite plainly pointed out to me that the location of the SIM is in the battery bay. And hey presto an easily accessed Vodafone SIM is indeed sitting right there, it should be no problem to put a Telstra or Optus SIM in on a prepaid plan. Telstra appears to have better coverage by far, their USB devices may or may not work with Linux, however I know from experience the Optus USB device does work with Linux. However I do not need either for this laptop, Optus offer a SIM only prepaid kit for AUD $30, Telstra do not mention offering it, however forums suggest you can walk into a Telstra shop and ask for a 3G prepaid kit and request that it be wireless broadband enabled for around AUD $30 also.
The other nice thing I would like to note from this experience is how good the book that came with the laptop from Dell is. That it has good detail about accessing most of the hardware in the laptop is very useful and means you are less likely to break things if you want to look inside.
Thu, 25 Sep 2008
Doing it backwards or unlink returning ENOSPC - 16:28
Due to the way snapshots work on ZFS there is a possibility you will get an ENOSPC returned when trying to unlink (rm) a file. This is of course completely reversed from the intuition most people will have: to free up space, remove some files. Out of curiosity I looked in the unlink man page on Linux and in the rm source code on Linux, at a cursory glance neither of them will deal with ENOSPC (unlink does not mention it as an error). Without testing, my guess is that in such a case unlink(2) would return EIO.
Tue, 08 Jul 2008
How to capture one image from a v4l2 device - 17:22
    gst-launch-0.10 v4l2src ! video/x-raw-yuv,width=640,height=480 ! ffmpegcolorspace ! pngenc ! filesink location=foo.png
As one command this captures the image at that resolution into a file foo.png. This is on my laptop, however I tested this with the QuickCam 9000 on my desktop with a resolution of 1600x1200 and it worked, the focus meant it took a while but it popped out a good image. Gstreamer really is cool, I still remember seeing Federico talk about GMF (Gnome Media Framework, which is what became GStreamer) at CALU in 1999 and being excited by it.
Fri, 13 Jun 2008
Interest in data from an email spike - 13:56
I can not find the department NTEU person to learn if there are any numbers on how many staff on campus are actually union members, nor can I get hold of the campus wide email system admin people, so I can not predict how much this hit storage and network load on the email systems campus wide. I could do some analysis on the department email server, though I am not sure if that would provide much insight. I suspect there are a fairly large number of union members on campus and they all will have received this email, as it is valid email and will have come in through the spam filters.
Thu, 29 May 2008
Some system config updates - 15:39
After lots of mucking around with fontconfig and other things trying to track down the issue, Tony suggested I look at the resolution for fonts in GNOME System -> Preferences -> Appearance :: Fonts :: Details wondering what my DPI for fonts was set to. His was set to 96, mine however was at 112. So I changed this and all of a sudden the font in gnome-terminal could look identical to my xterm fixed font. Rock on, something I should share with the world here in case it comes up for others. Getting the font size right in the terminal application is important as my brain is so used to a certain look there.
On another note I should probably stop bagging the nvidia setup as much as I have been, sure it is a pain I can not use xrandr commands to automatically do funky stuff in a scripted environment, however I can at least use the gooey tool nvidia-settings to do the stuff I want, even if it is not as nice as doing things automatically. Still it sure would be nice if nvidia opened up and allowed open source development with full specs to the hardware. If this laptop had been available with the Intel chipset I would have specced it with that for sure.
Wed, 28 May 2008
Yet another sign I may work with computers - 18:26
Mon, 26 May 2008
Mon, 19 May 2008
Little laptops that can - 18:15
All this in such a small package is mind boggling to pretty much anyone who has been around computers since 486 or earlier model chips powered most PCs. I doubt I will be getting any Heidelberg Scars now.
Thu, 08 May 2008
Move a little thing to python - 13:44
Sometime last year I realised that though the URL I was using on the ANU Internal Web still worked it seemed not to interface with the latest phone database for the uni, so it sometimes did not match people I knew worked on campus, other times it contained out of date numbers for people. However there were other important uses for my time so I did not bother looking too closely into updating it when most of the time the old results were still good enough. Finally this week Bob noticed there were no matches coming back, it seems the old interface no longer connected to the database correctly. Thus I opened the program and had a look at updating it. The old program used LWP to fetch the page with a GET request. The newer interface now on ANU Web works properly with a POST request. Also the result page is more complex to parse than the old one (more complex regular expressions, or maybe a small state machine needed). Still it did not look too hard to spend an hour or so fixing the old perl code up to get the new page and parse it properly for the desired results. However I hit a snag when for some reason LWP did not fetch the entire result from the web server that was returning the data in chunks. A tcpdump session showed it simply closed the request rather than fetching all the data. At this point I could have debugged the perl code and fixed it, after all there is no good reason LWP should not work. However I thought to myself, I have been keen to write python a bit for a while. Bob bought the Mark Lutz Programming Python book for my office and I read through about half of it. So why not rewrite the program in python. See how a perl hacker can transfer to using python at least for a small program. I am happy to say that the page fetching in python even made perl look complex, the code that did the job (and worked, doing a post request fine) was
    name = ' '.join(sys.argv[1:])
    params = urllib.urlencode({'stype': 'Staff Directory', 'button': 'Search', 'querytext': name})
    f = urllib.urlopen(searchuri, params)
    r = f.read()
Cool I thought, this is hell easy, what a fantastic language, I will forever give up my perl ways if everything is this easy and obvious. Obviously this was not going to last, I guess partly because my brain meshes with perl well after so many years, and I am used to perl associative arrays, classes, modules, and regular expressions. Anyway I now had my result from the search and all I had to do was parse it and extract a form that can be printed on a terminal nicely. First I tried using the python regular expression matching and needed to create some hideous regexp to match the data returned. I also discovered that when a search matches more than about 2 people the data is returned in a different format. Fortunately in this second case the format is really easy to match against with a regexp. Even though the regexp language is similar/identical to perl I was still getting my head around the documentation for all of what I was doing and could not at first construct a regexp that made sense to parse the first sort of data. So I decided to get an HTML parser and extract the data I wanted without the crap in the tags. My first attempt was to use the HTMLParser module, however I soon found that this threw an exception whenever I fed it the page from the uni with the matches in it. I tried except: pass in the hopes it would keep on going, however it stopped there and did not process the rest of the page. So I had to change to using the htmllib.HTMLParser which was almost identically easy to use and managed to process the entire page. Next I wanted to store the data until all matches were found, in perl this would be trivial using a multiple level hash or an array of hashes. Of course the most obvious way to do this in python now I think about it is using a list of dicts.
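The list of dicts idea can be sketched roughly as below. This is an illustration in current Python 3 rather than the code from the actual program: the add_record helper and the sample values are made up, only the field names (Name, Phone, Address) come from the post.

```python
# One dict per search match, collected in a plain list.
records = []

def add_record(records, name, phone, address):
    """Append one match as a dict and return the list (hypothetical helper)."""
    records.append({"Name": name, "Phone": phone, "Address": address})
    return records

# Made-up sample data for illustration only.
add_record(records, "A. Person", "x12345", "Building 1")
add_record(records, "B. Person", "x67890", "Building 2")

# len(records) is simply the number of matches, and each record
# can be pulled apart by field name when printing.
for rec in records:
    print(rec["Name"], rec["Phone"], rec["Address"])
```

With this layout there is nothing to initialise per key, each new match is just one append.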
However I had my brain stuck on using a multi level hash. I found this was most difficult in python as you need to initialise dict entries and can not simply assign arbitrarily into them when you need. I needed to use the following construct.
    if (D.has_key (key1) == 0):
        (D[key1]) = {}
    if ((D[key1]).has_key (key2) == 0):
        D[key1][key2] = ''
    s = D[key1][key2]
    D[key1][key2] = s + data

Which is obviously a bit more verbose than the perl vernacular of

    $H{key1}{key2} = $s;

I think it is a problem that dicts do not yet work this easily, however someone has assured me that future python releases will have dicts that can work as easily as a perl hacker would expect.

Anyway, rather than moving on to the list of dicts (obvious now that I have thought about it), I was still stuck on the idea of using a pair of keys to access some value, thus a tuple seemed the obvious key for still storing the data in a dict. However this meant that when I extract the values from the dict I can not simply use len on the dict, as it does not accurately reflect the number of records. Which of course was the perfect chance to go and learn how to use map and lambda in python; after all I use map in perl often, and it really is lovely to have functional capabilities in a language you program in. Using a number as one of the record keys I was then able to have constructs such as the following (after refactoring to a list of dicts I did not need the high = expression and modified the second expression slightly)
    high = max (map (lambda k: k[0], D.keys()))

and

    name, phone, address = map (lambda k: D[(i,k)],['Name', 'Phone', 'Address'])

The first to find the number of records from the numeric key and the second to extract the information I was interested in printing. The second especially is often used in perl to extract matches with a [0..N] or range(N) sort of thing when you get things with multiple function calls into a list. Such as the perl expression

    my @emails = map { $res->getvalue ($_,0); } (0..$res->ntuples-1);

The final problem I had was when printing the data, in perl and c I can do

    printf ("%-20s %-12s %46s", name, phone, address)

However in python the string formatting in print did not justify or cut off arguments as expected. Also string.rjust and string.ljust did not limit the size of strings if they were larger than the field size. So I needed to do the following.
    print "%s %s %s" % (name[0:30].ljust(30), \
                        phone.rjust(12), \
                        address[0:45].rjust(45))

That final concern is not really a problem, and it is arguably clearer as to what is going on than the printf formatting a c programmer is used to. Anyway if anyone who works at ANU wants to use this from a command line, or anyone wants to see it, I have it online for download/viewing. There may be a few places I can clean this up better, and the version online is stripped of comments.

I can understand how people like the way python works: the code really is almost like pseudo code in many ways, and most of the time it works the way you expect it to. It is a little hard to wrap my perl oriented brain around, however I expect that will not take long to work around. Also, to anyone complaining about whitespace formatting in python: IMO you are deranged, it really is not an issue needing to use whitespace for program layout.
Thu, 01 May 2008
Another Ubuntu annoyance - 22:03
Unfortunately in Ubuntu there is no way to disable this in grub, the uuid change is hard coded into update-grub in /usr/sbin. At least in Debian it is still optional. Anyway I had forgotten to modify update-grub to remove the uuid stuff and had installed a new kernel on a student server, then rebooted the machine and hey presto it did not come back online. If it were not for the need to run this server on Ubuntu, so that it is similar to the lab image and an easy environment for a student to duplicate at home, it would be so much easier to run Debian on it again. Of course to compound the issue this was a server I had to wait until after normal hours to take offline, so I was messing around with it after 7pm.
Mon, 28 Apr 2008
Update on deb package archive clearing. - 14:44
They all have a 100 Mbit (or better) link to the mirror, and it seems silly to have them using local disk storage once an entire successful apt run is finished. Andrew suggested the Dpkg::Post-Invoke rule could be used to run apt-get clean; my understanding upon reading the documentation last week was that this would run clean after every individual deb package as it was installed. I guess it is likely that when installing large numbers it may not be run until after the post-inst script, however without looking closely it appeared to me it may mess up install processes somehow. I may have gotten that intuition wrong, however as pointed out in the other online response it will not work for some use cases. It still seems the only current way to solve this is to add apt-get clean to cron (or of course write a patch for apt that allows an Apt::Install-Success::Post method or something). It is not really a huge problem for now, however as I said it is strangely different to dselect and my expected capabilities.
Wed, 23 Apr 2008
Keeping /var/cache/apt/archives empty. - 13:02
So I had a look at the apt.conf and apt-get documentation and /usr/share/doc/apt/examples/configure-index.gz, and a bit of a look around online, to see how to disable the cache. I thought it may be bad to completely disable the directory where packages sit, as apt places them there when it downloads them. However, with the partial directory being used for packages in transit, I wondered if that was where packages were kept during the install process.

Anyway I tried adding Dir::Cache::Archive ""; and Dir::Cache::pkgcache ""; to a new file /etc/apt/apt.conf.d/10pkgcache. This however did not change anything and packages were still left in the archive. Next I tried setting both items to /dev/null, which caused a bus error when running apt-get install.

I was kind of hoping there was some way to tell apt not to store files after it has run. dselect runs apt-get clean upon completion, but there appears to be no way to tell apt to run a post install hook that calls clean when finished (assuming the hook only runs when apt completed with no errors). The only way to do this appears to be to place apt-get clean in a crontab somewhere, which is a pain if you are short on disk space and so would like to get rid of packages as soon as installing is finished.

Interestingly /dev/null was also changed by what I tried above: it became a normal file, which caused some other processes depending on it to fail. Restarting udev did not recreate the device (even though the udev config said to recreate it as a char device with the correct permissions set); instead it reappeared as a normal file with the wrong permissions, so some other running process seems to have interfered with /dev/null creation. Anyway that was easily fixed with /bin/mknod, now if only the emptying of /var/cache/apt/archives were so easy without resorting to cron.
Sat, 19 Apr 2008
Participating the BarCamp way - 15:02
So when I have talked to people during the day, or when someone has given a presentation, I have looked for the link they placed on the Barcamp page and been able to go read some of their blog and see more of what they talk about. I probably should participate to the extent of adding myself to the wiki, after all I am here all day. However it is interesting to note Bob and I have both had the same sort of reaction to our involvement. The Unorganisers suggested we all sign up to some yahoogroup or something for more of the discussions leading up to hosting the event. As far as I know Bob did not join, and I did not either, too much effort involved to sign up to another mailing list. So I just had a look at adding my name and diary link to the BarCampCanberra page, and editing the wiki requires a login, so I decided not to bother. Sure, it makes perfect sense that to edit the page you need to go through some form of authentication to stop spammers and such from blowing the wiki apart. I simply can not overcome my web forum/online login apathy enough to sign in here, which is kind of strange, though I notice Bob has not done this either.
Reminder that other people exist - 14:33
It is a highly amusing presentation, he has been talking about many things we all know and recognise that his students seem to not understand or know about. He mentioned that the Comp Sci students he had the first year or so he ran the course no longer do the subject as they seem to think they do not need it, so all the students are marketing commerce students who do not live in Internet culture. Something that I am reminded of listening to this is that we often forget there are people dissimilar to ourselves out there. For example a somewhat elitist example I often have to remember is that most people in the population are not university educated, however living in Canberra and hanging out with people who generally are, and working at a university, I often forget that not everyone shares my background. Dr Dann is dealing with non Internet savvy people and trying to induct them, it is interesting to hear his experiences. Good talk.
Getting deeper into the materials - 13:31
The fact that people using the abbreviations on their badges is so prevalent today had me wondering if there would be a cool way to obfuscate this a little bit (so I admit I like geek in jokes). Alas the symbols on the table are not the same abbreviations as found on the real periodic table, so this is not quite as simple as I first hoped. My idea is that you select the list of elements to put on your badge and then arrange them in such a way as to create materials or more complex things made up of the elements bonded in specific ways. For example water is H2O (two hydrogen atoms bonded to one oxygen atom), so if you had a drop of water drawn on the bottom of your badge you would be indicating your geek interests included H and O (you could even use it as a way to indicate you do H more than O if you want to be exact about this).

The idea above falls apart a bit as the letters do not match the elements. However if you wanted to go ahead with this obfuscation you could simply use the elements in the same place on the table as those you select, try to choose various compounds from them, and then represent these compounds on your badge rather than the letters themselves. However no one would easily be able to work out what you mean, as they would need to know the chemical make up of the compounds you use, know where those elements are placed on the periodic table, and then have memorised the geek periodic table to the extent that they know what geek interests are in those positions.

This is however an unconference that focuses on cool geeky online apps to some extent, so you could fairly quickly extend the geek periodic table to enable translating from a selection of geek elements into a selection of real materials, with some symbol suggestions for the materials. People who want to use the obfuscation could use the tool (in both directions) to work out what is on a badge.
User interface discussions - 12:23
The presenter did have a definite point: when you consider where interfaces were at in 1968, why has there not been more research into different interfaces for different use cases and scenarios? It occurred to me that it is interesting to look at life possibly imitating art. In the Neal Stephenson book Snow Crash, most users interface to the virtual reality world via the real life interfaces there and also appear to access computers in reality via a VR environment. However the hard core hackers all still access the low level real code with a keyboard and VDU and a Unix style command line interface (not too surprising from Stephenson when you consider his brilliant essay In the Beginning ... was the Command Line). So there are likely to be real uses for the currently accepted interfaces all the time, however the use of alternative interfaces is likely to apply in more specific use case scenarios, and thus manufacturers, designers and researchers need to somehow align and market them in specific ways and inform the people who want that use of a better (if it really is better) way to use the technology.

An amusing aspect that came up for me (from a cycling background) was the question of why the UCI has banned recumbents in The Tour de France. The person asking the question has obviously drunk the kool-aid on offer from the HPV community on this issue, with their constant claims that recumbents are obviously faster and superior for all uses. The reality is that they simply can not climb as fast, thus any race with climbing (such as The Tour de France) will make them useless. The reasons they do not climb well are that they can not be made as light as a modern diamond frame road bike (which can easily be purchased at 6 kg ready to ride now) and that you can not get out of the saddle on a recumbent and really work more muscle groups; the limitation on muscle use restricts the ability to go hard up hills. 
Also when climbing with the rather limited motor available in a human body the aerodynamic advantages of a recumbent do not matter at such low speeds and can not overcome the advantages of low weight and more muscle groups. Thus Paul had some basis in suggesting that one reason computer interfaces have not advanced is that they are rather optimal for the purpose, though I strongly tend to agree more with the presenter that computer interfaces have a lot of room for improvement.
Barcamp thing - 10:28
So it will be interesting to see how the talks and other stuff go all day, there are a rather large number of people here so it is likely to work well. Right now there is a talk about Meraki on.
Thu, 10 Apr 2008
Not meant to own one - 15:51
On my return to Canberra I bought another one and all seemed fine. I tied it onto my phone and was able to slip it inside the leather phone cover so it stayed put and was out of the way. This was until last Wednesday morning, when I crashed and fractured my collar bone; my phone was in a back pocket of my cycle jersey. The phone has come out of the crash unscratched and working as well as it was previously, though the usb key has a bent pink metal cover and the back of the plastic bit where the chip contacts are is scratched a bit. After seeing APC tests in which the USB keys still often worked after much more severe torture than this, one would expect it would still work. Alas I plug the key into a usb slot and nothing happens; it is definitely dead, I tried it in multiple computers with a lot of wiggling around of the key.

So small pink usb key junkie that I am, I wandered over to the store today and they no longer have the 2GB key in pink. They rang the importer, who also no longer has them, only blue or black, which really is not as cool. Thus it appears I am simply not meant to permanently own a cool small pink usb key. I did however see a helmet in the Giro line up that is a rather cool pink, maybe I should get that to replace my broken helmet.
Mon, 25 Feb 2008
API design and error handling in code - 21:12
First, it is true that putting full error handling in code when using fairly standard libraries can take a lot of time and add complexity and ugliness. However I suggest there should be some way somewhere to find out if errors happened, largely so you can deal with them in situations where they are likely. Also understanding that library calls can fail, and what this means, is important for coders, even if they do not handle every failure. When marking assignments at uni I am keen to see that students have thought about error conditions and made a decision about what level of complexity to trade off against the likelihood of certain errors occurring. The above issue with assignments however does tend to involve students who are newer to programming than most free software hackers, so there are considerations in both directions there.

As for the other reason the above posts interest me, it is cool to see Cairo getting such props for great design again. Carl and co have done a stellar job with that library. As I continue reading the planet I can see more entries in the thread.
Thu, 21 Feb 2008
X and KDE out of sync - 17:59
In the last while the Xorg crew have been doing some great work to ensure X will generally run better with no config file around, working things out as it starts up and all that. However kde (at least the version in Kubuntu 7.10) has not caught up to the idea of querying the X server or working with it to that extent yet. I hope the newer kde releases are heading this way, also I should check out gnome and see if it handles this more cleanly. One thing I should note though is xrandr really is seriously cool. I found the thinkwiki xrandr page to be one of the best for describing cool stuff it can do.