sjh - mountain biking running linux vegan geek spice - mtb / vegan / running / linux / canberra / cycling / etc

Steven Hanley hackergotchi picture Steven
Hanley

About

email: sjh@svana.org

web: https://svana.org/sjh
twitter: https://twitter.com/sjhmtb
instagram: https://instagram.com/sjhmtb

Other online diaries:

Aaron Broughton,
Andrew Pollock,
Anthony Towns,
Chris Yeoh,
Martijn van Oosterhout,
Michael Davies,
Michael Still,
Tony Breeds,

Links:

Linux Weekly News,
XKCD,
Girl Genius,
Planet Linux Australia,
Bilbys,
CORC,

Canberra Weather: forecast, radar.

Subscribe: rss, rss2.0, atom

September
Mon Tue Wed Thu Fri Sat Sun
           
12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30            

2024
Months
Sep
Oct Nov Dec

Categories:

Archive by month:

Mon, 01 Jun 2009

An interesting languages comparison - 15:45
I got the link to this from Tony and it is interesting to see the results of these tests. The speed, size and dependability of programming languages uses code from the Computer Language Benchmarks Game to generate some information comparing many (72) different languages.

Back in 1999 and 2000 I put a pretty trivial example of a single problem being solved in multiple languages online. In this case scanning html for entities, largely because I was mildly interested in how different languages and the different implementations of them may solve the same problem and the time it would take. I say mildly interested because it is such a trivial example and because I did not put much effort in. (I was amazed a few weeks ago to get an email from someone rerunning these to see if recent Java implementations had caught up to c yet).

The person who wrote this speed, size and dependability post put a lot more effort in and actually was able to draw some interesting conclusions about languages and how they work and develop over time. For the geeks out there I recommend having a look.

[/comp/prog] link

Thu, 08 May 2008

Move a little thing to python - 13:44
At ANU there is an online (web page) searchable phone database for all ANU phone numbers. A few years ago (July 2002, according to the version control dates) I spent an hour or two writing a command line program in perl that queries this and prints the results. I find it much easier to use a command line application than open a tab in a web browser and find the appropriate page and enter a query when all I want is a simple bit of information back. I suspect most of the staff in this department are similar (Computer Science).

Sometime last year I realised that though the URL I was using on the ANU Internal Web still worked it seemed not to interface with the latest phone database for the uni so it sometimes did not match people I knew worked on campus, other times it contained out of date numbers for people. However there were other important uses for my time so I did not bother looking too closely into updating it when most of the time the old results were still good enough.

Finally this week Bob noticed there were no matches coming back, it seems the old interface no longer connected to the database correctly. Thus I opened the program and had a look at updating it. The old program used LWP to fetch the page with a GET request. The newer interface now on ANU Web works properly with a POST request. Also the result page is more complex to parse than the old one (more complex regular expressions, or maybe a small state machine needed). Still it did not look too hard to spend an hour or so fixing the old perl code up to get the new page and parse it properly for the desired results.

However I hit a snag when for some reason LWP did not fetch the entire result from the web server that was returning the data in chunks. A tcpdump session showed it simply closed the request rather then fetch all the data. At this point I could have debugged the perl code and fixed, after all there is no good reason LWP should not work. However I thought to myself, I have been keen to write python a bit for a while. Bob bought the Mark Lutz Programming Python book for my office and I read through about half of it. So why not rewrite the program in python. See how a perl hacker can transfer to using python at least for a small program.

I am happy to say that the page fetching in python even made perl look complex, the code that did the job (and worked, doing a post request fine) was

   name = ' '.join(sys.argv[1:])
   params = urllib.urlencode({'stype': 'Staff Directory', 'button': 'Search', 'querytext': name})
   f = urllib.urlopen(searchuri, params)
   r = f.read()

Cool I thought, this is hell easy, what a fantastic language, I will forever give up my perl ways if everything is this easy and obvious. Obviously this was not going to last, I guess partly because my brain meshes with perl well after so many years, and I am used to perl associative arrays, classes, modules, and regular expressions. Anyway I now had my result from the search and all I had to do was parse it and extract a form that can be printed on a terminal nicely.

First I tried using the python regular expression matching and needed to create some hideous regexp to match the data returned. I also discovered that when a search matches more than about 2 people the data is returned in a different format. Fortunately in this second case the format is really easy to match against with a regexp. Even though the regexp language is similar/identical to perl I was still getting my head around the documentation for all of what I was doing and could not at first construct a regexp that made sense to parse the first sort of data. So I decided to get a HTMLParser and extract the data I wanted without the crap in the tags.

My first attempt was to use the HTMLParser module, however I soon found that this threw an exception when ever I fed it the page from the uni with the matches in it. I tried except: pass in the hopes it would keep on going, however it stopped there and did not process the rest of the page. So I had to change to using the htmllib.HTMLParser which was almost identically easy to use and managed to process the entire page.

Next I wanted to store the data until all matches were found, in perl this would be trivial using a multiple level hash or an array of hashes. Of course the most obvious way to do this in python now I think about it is using a list of dicts. However I had my brain stuck on using a multi level hash. I found this was most difficult in python as you need to initialise dict entries and can not simply assign arbitrarily into them when you need. I needed to use the following construct.

if (D.has_key (key1) == 0):
   (D[key1]) = {}

if ((D[key1]).has_key (key2) == 0):
   D[key1][key2] = ''

s = D[key1][key2]
D[key1][key2] = s + data

Which is obviously a bit more verbose than the perl vernacular of $H{key1}{key2} = $s; I think that dicts do not yet work this easily is a problem, however someone has assured me that future python releases will have dicts that can work as easily as a perl hacker would expect. Anyway rather than next go on to the now obvious that I thought about it list of dicts I was still stuck on the idea of using a pair of keys to access some value, thus a tuple seemed obvious to store the data in a dict still. However this meant that when I extract the values from the dict I can not simply use len on the dict collection as it does not accurately reflect the number of records.

Which of course was the perfect chance to go and learn how to use map and lambda in python, after all I use map in perl often and it really is lovely to have functional capabilities in a language you program in. Using a number as one of the record keys I was then able to have constructs such as (after refactoring to list of dicts I did not need the high = expression and modified the second expression slightly)

high = max (map (lambda k: k[0], D.keys()))
and
name, phone, address = map (lambda k: D[(i,k)],['Name', 'Phone', 'Address'])

The first to find the number of records from the numeric key and the second to extract the information I was interested in printing. The second especially is often used in perl to extract matches with a [0..N] or range(N) sort of thing when you get things with multiple function calls into a list. Such as the perl expression

my @emails = map { $res->getvalue ($_,0); } (0..$res->ntuples-1);

The final problem I had was when printing the data, in perl and c I can do

printf ("%-20s %-12s %46s", name, phone, address)
However in python the string formatting in print did not justify or cut off arguments as expected. Also string.rjust and string.ljust did not limit the size of strings if they were larger than the field size. So I needed to do the following.

   print "%s %s %s" % (name[0:30].ljust(30), \
                       phone.rjust(12), \
                       address[0:45].rjust(45))

That final concern is not really a problem, and arguably clearer as to what is going on than using printf formatting as a c programmer is used to. Anyway if anyone who works at ANU wants to use this from a command line or anyone wants to see it I have it online for download/viewing. There may be a few places I can clean this up better, and the version online is stripped of comments. I can understand how people like the way python works, the code really is almost like pseudo code in many ways, it does most of the time work the way you expect it to, it is a little hard to wrap my perl oriented brain around, however that does not take long to work around I expect. Also anyone complaining about whitespace formatting in python, IMO you are deranged, it really is not an issue needing to use whitespace for program layout.

[/comp/prog] link


home, email, rss, rss2.0, atom