Programming

Life With Gerrit Code Review

Gerrit is a code review tool, based on Git, originally written for the Android project. It's the center of our workflow on the OpenStack project. In every other programming job I've had, every programmer in the company has had commit rights to the central repository, and code review has been an occasional ad hoc practice, mostly ignored in the name of expediency. On OpenStack, only Gerrit has commit rights to the main branches of the central repository, and if your code isn't reviewed, it doesn't get in. This is a very different way to work, and wanting to try it was part of the reason I decided to work on OpenStack.

On OpenStack Nova, we have Gerrit configured to require a +2 vote from two core reviewers, plus a +1 vote by our Jenkins automated test system. So, basically, you have to impress two senior programmers and one robot. Non-core-reviewers can and do review code as well, but they can only vote +1 not +2, and their approval isn't enough to get your code in.

On a nuts-and-bolts level, using Gerrit like this requires a ton of work by a talented infrastructure team, who configure the central Git/Gerrit server and the Jenkins server with all its automated test jobs. I'm not on that team, though, so that's all magic to me. The thing that every single programmer on the team has to deal with is a particular workflow in Git reflecting that changes go to Gerrit rather than directly being pushed to origin:master, plus using the Gerrit web interface to review code and to view feedback from others.

On the Git side, we use the git-review tool to simplify submitting local Git branches to Gerrit. Basically, a user has to follow a few steps, nicely documented in a wiki page, to setup his local repository to communicate with Gerrit. Some of these are things you'd already do with the average centralized Git setup, like configuring your username and email address and ssh keys and cloning the central repository to your local computer. Luckily, most of the extra setup is a one-time thing. The things that are different day-to-day is that we always develop in a local branch, never on master; that instead of using "git push" to send our local changes up to the server, we use "git review" to send them to Gerrit, and that most branches are a single commit rather than a series of commits, so if you need to make changes to appease a reviewer you do "git commit –amend" to modify the last commit, rather than a normal "git commit" to make a new one.

When you run "git review", it sticks an ugly line like "Change-Id: Ie4a59eeb7e3895f5d35471377c3bea462c690210″ at the end of your commit message. This is Gerrit's change id, and it's separate from the Git commit id because Git's id will change if you rebase or amend your commit (which we do often), while Gerrit's id needs to remain constant. A few seconds after you run "git review", your commit shows up in the Gerrit web interface, in our case at review.openstack.org (If it's a modification of an existing commit with the same Change-Id, then the new change shows up as a new Patch Set under the existing change.)

After code is submitted for review, you're at the mercy of two classes of reviewers. The human ones are busy with other reviews, writing their own code, attending meetings, playing Foosball, whatever. We have a large enough team on some OpenStack projects that you'll usually get your first human review within a few hours, but if it's a weekend or holiday then you might have to wait. And of course you don't just need one review, you need two +2 reviews from core reviewers, so if there's some back-and-forth about the quality of your change, it can take a few days to get everything through the system.

And then there are the bots. Bots are tireless and ready to review code 24/7/365, but we have lots of tests to run and finite hardware to run them on, so in practice it can sometimes take about an hour for your commit to get to the front of the queue and have the tests run. (This depends on how many other commits are going in; you get faster test runs during off-peak hours.) Our tests consist of a bunch of core OpenStack functional tests like starting virtual machines, plus running the project's entire Python unit test suite, plus a customized version of the pep8 tool to enforce the project's coding standards. (The latter is, in my opinion, the best thing ever because it means I never need to look at 250-column-wide OpenStack code that doesn't fit in my xterm, and because it frees up human reviewers from needing to deal with style so they can focus on substance.)

If all the automated tests pass and two core reviewers like your code enough to vote +2, then you're done, and your commit eventually gets into the project's master branch. That sometimes happens on the first try, for very simple changes. Usually, it takes a few revisions. You go over 80 columns in one place and get dinged by pep8, some unit test fails, or some reviewer thinks you have a bug in your logic or a typo in your commit message or need a comment to describe some hairy logic. In cases where it's clear what the problem is, the solution is pretty simple: you modify the code, run "git commit –amend" to modify your commit, and run "git review" to push the new patch set to Gerrit and hope they like it better this time. Sometimes spurious test failures happen, and you have to tell Gerrit to "recheck" or "reverify" your existing commit. Eventually, everyone is happy and your code gets in, and then you can delete your local Git branch and go work on something new.

How does this change your day-to-day routine as a programmer? First, it takes longer to get small changes in. Something small that might take 5 minutes to code and then 1 minute to commit and push now takes 10 minutes to code (because you're being more careful to avoid being dinged by a reviewer) and a couple of hours to get reviewed. Second, as with any project that has centralized test infrastructure, you spend a lot of time saying "but that test passed on my machine!" and then digging through logs, figuring out why it fails for the Jenkins bot. Third, you spend a lot of time reviewing other people's changes.

What do you get for all that effort? First, all the code in the project's master branch has been reviewed by at least two senior people besides its author, and has made it through a pretty comprehensive test suite. In short, obvious junk with syntax errors doesn't get in at all. Buggy code can of course still get in (you can't test for everything), but the bugs have to be more subtle. Second, because of the way we continually amend a single commit during the review process, each change in the master repository tends to be a "perfect patch", with mostly correct code and a mostly useful commit message, adding one feature or fixing one bug. When later reviewing project history, you don't usually need to look at the one commit that adds most of the feature, then two more later that day that fix bugs in it. (That, in turn, makes it easier to backport fixes to other branches, since they're more likely to be self-contained.) OpenStack's code isn't perfect, but it's a lot better than on any other project of similar size that I've worked on. Third, because you spend a lot of time reviewing other people's code, you accidentally learn about parts of the project that you haven't directly worked on yet.

Would I choose to use Gerrit on a new project? Yes, with a few reservations. First, you need a solid enough infrastructure team to set it up and maintain it well. If all commits are gated by Gerrit and Gerrit breaks, you need to fix it right now or integration of new changes will stop. Second, you need a big enough team that some code reviewers are just about always available (at least during the team's core business hours), to keep work from grinding to a halt while waiting for reviews. Third, you need people who are willing to constructively criticize other people's code, and people who are willing to accept constructive criticism. If everyone just says "+1″ or "+2″ all the time without really looking hard, then you're just wasting your time. Fourth, you need an out-of-band way to communicate when authors and reviewers have a major disagreement and they need to have a ten-minute interactive conversation rather than just slowly throwing one-liners back and forth in the code review tool. If you're in the same office, that's easy. OpenStack is a global multi-company team, so we use IRC, which works fine.

Programming

Comments (0)

Permalink

PyCon 2012

I'm back from PyCon in Silicon Valley. (If you don't know what PyCon is, it's 2000+ nerds talking about the Python programming language for a few days.) Here are some highlights I remember, in case you're looking for recommendations of videos to watch. (Of course, it's fantastic that they record everything and make it all available for free, here )

The introduction featuring dancing robots was impressive. I remember that just a few years ago it was hard to teach robots to walk, and now you (or your rich friend) can buy a robot that can be programmed to dance (among many other things) for a few thousand bucks. They ended up giving away one of the robots at the end, a pretty nice prize.

Paul Graham's keynote about ten really-hard-but-not-quite-impossible startup ideas was awesome, probably the best thing I saw at the conference.

Stormy Peters' keynote about freedom and privacy and stuff was okay; I agreed with her about pretty much everything, but I don't know if someone who didn't would actually have been swayed.

Guido van Rossum's keynote about "trolls" who whine about Python didn't do much for me.

Katie Cunningham's intro to hosting a web site in the cloud and using Fabric to maintain it was very good. I've never used Fabric but I plan to start.

Raymond Hettinger's talk about subclassing was predictably excellent. It was a very balanced talk about when it makes sense to subclass (because it lets you write less code) and when it doesn't, not the typical object oriented programming 101 nonsense where they explain how to subclass but not why it's not always the right thing to do. But since I pretty much already agreed with everything he said, I don't think I got anything out of it.

Larry Hastings' low-level stepping through Python was fun for a very select audience. Most people probably don't need that level of detail.

Paul McMillan's talk about security was very good, and useful for any web programmer who doesn't know this stuff.

Chris McDonough's talk about PDB (the Python debugger) was pretty good. I picked up a few tricks I didn't know, even though I've used PDB a bunch of times.

Moshe Zadka's talk about writing resilient programs that recover from crashes was great. He mentioned several techniques I hadn't even thought of, let alone used.

Geremy Condra's talk about security testing was good, but not quite as mind-blowing for me as his previous talk about timing attacks.

Glyph Lefkowitz's talk about low-level networking (and not Twisted) was great, though I already knew pretty much everything he covered so I probably should have been elsewhere.

The PyPy survey talk by Maciej Fijalkowski, Alex Gaynor, and Armin Rigo was actually more like two talks. Maciej and Alex gave kind of a general introduction to PyPy, then Armin talked about software transactional memory, making an analogy to garbage collection. I already follow PyPy so I didn't get too much out of this talk, but it's worth watching if you don't.

Benjamin Peterson's talk about the PyPy just-in-time compiler was more in-depth and thus more interesting to me. I'd heard most of the material before but it's complicated enough that I needed to hear it again.

Michael Bayer's talk about SQLAlchemy was okay. It was more advocacy of the right way to do a flexible database toolkit (make it flexible) than practical advice on how to use SQLAlchemy.

Both Geographic Information System talks I attended were excellent. Jason Scheirer talked about GIS and maps at a more theoretical level, covering the basic terminology and map projections. Zain Memon's talk was more practical, basically how to use a bunch of libraries to quickly create a map-oriented web site. I need to do some GIS stuff at work, so these two talks definitely justified the money my employer spent to send me to the conference.

Jacqueline Kazil and Dana Bauer's talk about mining public data released by local governments was really good. I wasn't at all familiar with most of the tools they used, so I'll need to watch the video and take notes.

Chris Lambacher's talk about cross-compiling Python and C extensions for embedded systems covered something else I need to do at work, so I was really hoping it would solve all my problems. Unfortunately, his slides showed a big mess of low-level hacks around tools that aren't really designed for cross-compiling, rather than the magic bullet I wanted (but didn't expect to find.) It's a hard problem, but maybe the emergence of ARM devices will make cross-compiling more mainstream and the tools will improve.

Erik Rose's talk about parsing MediaWiki files (which are too complicated to parse with simple regular expressions) gave us an entertaining survey of Python parsing libraries. He covered a handful in some detail, and has a survey of lots and lots more on his web site.

Ryan Kelly's talk about frozen standalone programs didn't really go much into running programs like py2exe, but rather dealt with secondary issues like auto-updaters, binary compatibility with old operating system versions, and code signing. It was interesting, but didn't solve my short-term need to make PyInstaller and PyGTK cooperate on 64-bit MacOS.

Brandon Rhodes' talk about linkers and virtual memory was really excellent, but unfortunately covered things I'd already learned, so I should have been somewhere else.

The lightning talks were a mixed bag. I enjoyed many of them, but certainly don't remember enough to review them all. I'd kind of like to attend a conference that was nothing but lightning talks, five minutes each on hundreds of subjects.

The poster session was good. There were about 30 posters, with their authors handy to talk, and about half of them covered subjects that interested me at least a bit.

The Expo Hall showed that the tech economy is doing fine, even if the larger economy is still kind of bleak. There were dozens of sponsors with booths, and at least half of them were desperately trying to hire people. If you're an out-of-work Python programmer, definitely go to PyCon.

Finally, I spent a day and a half at the sprints, hacking on PyPy. I fixed 3 or 4 minor bugs and learned a bit in the process. I would love to have spent another couple of days at the sprint, but work and family called. (I had actually planned to spend a couple of days playing tourist rather than coding, but changed my mind. So I'll have to go back and look at redwoods some other time.)

In some ways, this was the best PyCon I'd attended since the very first one in DC. The WiFi worked really well for me for a change, even though I only brought a cheap netbook. I managed to avoid attending any really bad talks. I got to write some useful code and talk to a lot of people and even play in a free poker tournament. And I caught not one but two door prize frisbees. (Unfortunately, I only won a couple of books, not a robot or an iPad, but maybe next year.)

Programming
Python

Comments (0)

Permalink

git bisect for the impatient

I was testing Slugathon last night and found a bug.  Unfortunately, it was a GUI bug so my unit tests didn't catch it.  Luckily, the bug was easy to reproduce (I could see it in about ten seconds from the start of a new game).  So here's how I used git bisect to find which commit had the bug.  (Note: I had a clean tree with no uncommitted changes at the start of the process.  If I had uncommitted changes I would have used git stash first.)


$ git bisect bad

You need to start by "git bisect start"

Do you want me to do it for you [Y/n]? Y

$ git checkout HEAD~10

$ ./setup.py build; sudo ./setup.py install; ./slugathon server

(Did not see the bug.)

$ git bisect good

Bisecting: 4 revisions left to test after this (roughly 2 steps)
[b1b94b6253ad506cb03199bea6ff52e76ebedfb9] Remove some obsolete TODO comments.

$ ./setup.py build; sudo ./setup.py install; ./slugathon server

(Did not see the bug.)

$ git bisect good

Bisecting: 2 revisions left to test after this (roughly 1 step)
[cb7b8c2553f48a6d2a8c8eca5cb538927720874c] Rename Player.legions to markerid_to_legion

$ ./setup.py build; sudo ./setup.py install; ./slugathon server

(Did not see the bug.)

$ git bisect good

Bisecting: 0 revisions left to test after this (roughly 1 step)
[d0a783c61eba6ba5f70b3021252a46f301293727] Update legion and marker counts after StartMusterPhase.

$ ./setup.py build; sudo ./setup.py install; ./slugathon server

(Did not see the bug.)

$ git bisect good

69efbfd5e19220e46f588944a8692140933cb303 is the first bad commit
commit 69efbfd5e19220e46f588944a8692140933cb303
Author: David Ripton <dripton@ripton.net>
Date:   Mon Dec 5 21:34:36 2011 -0500

Rename Player.markerids to markerids_left for clarity.

$ git bisect reset

And that's all; it found which commit had the bug. I did a git log -p on that commit to see exactly what I'd changed, found a suspicious part of the patch a few minutes later, fixed it, verified the bug was fixed, and committed and pushed the fix. Easy.

Of course you can do this without git bisect; it just saves you the effort of manually doing the binary search through the range of possibly-bad commits.

Programming

Comments (0)

Permalink

PyPy is still the fastest Python

PyPy 1.6 was just released, so I ran my Project Euler benchmark script again.

The script runs my pile of Project Euler solver programs across various implementations of Python available on my computer, and gives results only for the ones that finish successfully in less than a minute on all of them.

The contenders this time were PyPy 1.6, CPython 2.7.2, CPython 2.6.7 with Psyco, and Jython 2.5.1+. (Unladen Swallow appears to be dead, and IronPython is surprisingly painful to install on Linux, so I left them out.)

This time, 125 programs qualified, out of 197 working solvers. (Some programs require the gmpy library, which doesn't currently work on PyPy or Jython. And some use Python 2.7 syntax, which won't work with Psyco or Jython. And some are just too slow.)

Once again, PyPy is fastest, followed by Psyco, then CPython, then Jython. PyPy and Psyco are roughly twice as fast as CPython, and Jython is roughly twice as slow as CPython. (Though a lot of that is startup overhead; Jython gets more competitive for longer-lived programs.)

script pypy 1.6 jython 2.5 psyco 2.6 CPython 2.7
euler1.py 0.10 2.76 0.10 0.03
euler2.py 0.10 2.73 0.10 0.10
euler3.py 0.21 4.17 0.41 0.41
euler4.py 0.32 4.67 0.42 0.31
euler5.py 0.11 3.65 0.11 0.11
euler6.py 0.11 3.25 0.12 0.11
euler7.py 0.32 4.98 0.31 0.62
euler8.py 0.10 3.44 0.10 0.10
euler9.py 0.11 4.06 0.11 0.20
euler10.py 1.43 9.42 2.04 8.61
euler11.py 0.11 2.75 0.10 0.11
euler13.py 0.10 1.42 0.10 0.10
euler14.py 4.86 7.99 1.42 2.93
euler15.py 0.11 2.73 0.10 0.12
euler16.py 0.10 1.62 0.10 0.11
euler18.py 0.20 3.13 0.10 0.10
euler19.py 0.10 3.04 0.10 0.10
euler20.py 0.11 2.74 0.11 0.11
euler21.py 0.21 4.47 0.10 0.31
euler22.py 0.12 3.97 0.11 0.12
euler23.py 3.45 18.13 7.01 15.39
euler24.py 3.54 32.17 7.91 5.47
euler25.py 0.62 4.60 1.11 0.20
euler26.py 6.39 18.62 11.25 5.19
euler27.py 1.12 10.42 1.32 9.93
euler28.py 0.10 2.74 0.12 0.12
euler29.py 0.11 3.85 0.11 0.11
euler30.py 3.15 9.51 5.17 4.15
euler32.py 2.23 6.37 5.17 4.16
euler33.py 0.10 3.45 0.11 0.10
euler34.py 4.56 12.45 12.09 15.48
euler35.py 4.95 30.77 7.38 28.47
euler36.py 1.93 5.89 3.04 3.24
euler37.py 9.62 34.24 14.57 16.42
euler38.py 1.32 6.98 1.83 1.83
euler39.py 0.30 3.86 0.10 0.31
euler40.py 1.12 6.40 0.83 0.72
euler41.py 4.68 18.73 5.42 4.97
euler42.py 0.22 3.86 0.10 0.10
euler44.py 0.61 5.77 0.71 3.25
euler45.py 5.59 10.45 2.96 2.34
euler46.py 0.20 5.56 0.22 1.73
euler47.py 0.71 8.32 0.72 4.86
euler48.py 0.10 4.65 0.11 0.10
euler49.py 0.41 6.11 0.62 0.61
euler52.py 0.41 5.08 0.81 0.81
euler53.py 0.32 4.27 0.21 0.20
euler54.py 0.43 5.06 0.22 0.42
euler55.py 0.61 5.17 0.41 0.41
euler56.py 1.14 5.98 1.63 0.92
euler57.py 0.41 5.68 0.53 0.72
euler59.py 13.76 15.08 14.57 13.56
euler61.py 0.31 4.27 0.20 0.11
euler62.py 0.20 3.84 0.31 0.20
euler63.py 0.20 3.34 0.20 0.10
euler65.py 0.11 3.06 0.11 0.12
euler66.py 1.31 20.25 8.00 13.35
euler67.py 0.42 3.36 0.11 0.11
euler68.py 0.10 2.54 0.10 0.10
euler69.py 0.10 3.55 0.10 0.10
euler70.py 0.41 7.00 0.81 0.72
euler71.py 0.22 5.18 0.31 1.01
euler72.py 6.48 38.36 6.38 54.47
euler73.py 2.95 21.28 2.95 27.14
euler75.py 11.95 6.47 0.71 2.33
euler77.py 0.51 6.48 0.21 0.41
euler79.py 0.10 1.72 0.10 0.10
euler80.py 0.61 2.43 0.72 0.61
euler81.py 0.30 2.02 0.10 0.10
euler82.py 0.51 7.28 0.30 0.30
euler83.py 0.21 2.84 0.41 0.61
euler84.py 1.52 19.45 6.58 22.46
euler85.py 9.15 21.56 10.53 12.86
euler87.py 2.24 5.57 1.23 1.12
euler89.py 0.10 2.33 0.10 0.10
euler93.py 2.63 10.96 5.76 11.66
euler94.py 24.61 21.98 27.12 22.97
euler97.py 2.93 9.32 2.73 3.46
euler98.py 0.41 3.65 0.51 0.61
euler99.py 0.12 2.24 0.11 0.10
euler100.py 0.10 1.62 0.10 0.10
euler101.py 0.41 3.04 0.10 0.10
euler102.py 0.10 2.43 0.11 0.11
euler103.py 0.10 1.72 0.10 0.10
euler104.py 1.01 4.17 1.82 1.72
euler105.py 1.74 6.19 0.51 0.41
euler106.py 1.52 7.41 0.51 0.41
euler107.py 0.21 4.88 0.21 0.41
euler108.py 2.83 12.77 8.11 11.66
euler109.py 0.30 9.60 0.41 2.14
euler111.py 4.57 26.96 5.68 25.42
euler112.py 6.38 12.35 13.28 17.82
euler114.py 0.10 3.24 0.10 0.10
euler115.py 0.21 4.76 0.20 0.30
euler116.py 0.10 3.75 0.11 0.10
euler117.py 0.11 3.35 0.12 0.12
euler119.py 0.10 1.72 0.10 0.10
euler120.py 0.10 2.33 0.10 0.10
euler121.py 0.10 3.74 0.10 0.20
euler123.py 4.06 22.27 7.70 7.58
euler124.py 0.72 7.38 0.72 2.84
euler125.py 1.13 4.97 1.64 1.73
euler126.py 1.52 12.44 3.66 16.31
euler135.py 3.86 6.37 2.33 3.65
euler142.py 0.21 4.66 0.10 0.32
euler143.py 0.11 2.54 0.12 0.12
euler150.py 0.52 5.39 1.02 0.91
euler157.py 0.10 1.42 0.10 0.10
euler162.py 0.10 2.43 0.10 0.10
euler171.py 0.10 1.43 0.10 0.10
euler172.py 0.51 2.83 0.51 0.51
euler173.py 0.61 2.93 0.71 0.91
euler174.py 3.65 6.58 3.65 3.34
euler181.py 0.10 2.33 0.10 0.11
euler190.py 0.10 1.62 0.10 0.10
euler202.py 0.12 3.35 0.10 0.10
euler205.py 1.02 8.51 1.12 0.71
euler207.py 0.41 4.36 0.20 0.81
euler222.py 0.10 1.72 0.10 0.10
euler230.py 0.11 3.54 0.11 0.12
euler233.py 0.12 3.15 0.12 0.12
euler234.py 4.56 19.15 7.40 9.96
euler235.py 0.42 4.86 0.62 0.42
euler240.py 10.36 50.91 11.76 22.39
euler267.py 0.51 3.95 0.20 0.41
total 209.07 946.66 267.48 473.62
wins 82 1 57 55

Programming
Python

Comments (0)

Permalink

PyPy gets faster

In a previous post, I mentioned that PyPy was the fastest Python implementation for most of my Project Euler programs, but that it was very slow for a few of them.

This is no longer the case. The jit-generator branch was merged a few days ago, fixing a weakness with code that uses generators. And now PyPy is clearly the fastest Python implementation for this code, with both the most wins and the lowest overall time. Psyco is still pretty close. Both are a bit more than twice as fast as CPython.

I compared PyPy trunk, Unladen Swallow trunk, Jython 2.5.1+, CPython 2.6.6 with Psyco, and CPython 2.7.

PyPy is very strong across the board. Its worst result is on euler94, a Sudoku solver that heavily uses sets and copy.deepcopy.

Psyco still does very well, but it doesn't work on Python 2.7 yet and still doesn't work on amd64, so it feels more and more like a dead end.

Unladen Swallow hasn't had a commit since August. I suspect it's just resting, not dead, but it's falling behind PyPy in performance. Version 2.8 of LLVM has been released, but Unladen still requires version 2.7.

CPython is the baseline. I used 2.7, the latest Python 2. (Some of my Euler programs work in Python 3; some don't.)

Jython is by far the slowest. Its large startup overhead hurts on the easy problems, and the mature HotSpot JIT isn't enough for it to catch up on the harder ones. Jython does have the advantage of being free-threaded, but since this code was originally written for CPython it rarely uses multiple threads and so doesn't really benefit.

Here are the numbers.

script PyPy Unladen Jython Psyco CPython
euler1.py 0.01 0.12 3.65 0.10 0.02
euler2.py 0.10 0.10 3.94 0.10 0.10
euler3.py 0.23 0.91 6.40 0.51 0.40
euler4.py 0.31 0.84 6.49 0.41 0.42
euler5.py 0.12 0.12 5.38 0.12 0.12
euler6.py 0.12 0.11 5.50 0.12 0.11
euler7.py 0.33 0.52 7.29 0.12 0.73
euler8.py 0.13 0.12 4.77 0.11 0.11
euler9.py 0.11 0.32 6.67 0.10 0.21
euler10.py 2.34 6.02 11.81 1.93 12.02
euler11.py 0.10 0.10 4.28 0.10 0.11
euler13.py 0.10 0.10 3.36 0.10 0.10
euler14.py 3.88 2.76 8.46 1.96 3.07
euler15.py 0.11 0.11 4.00 0.10 0.10
euler16.py 0.11 0.12 3.58 0.11 0.13
euler18.py 0.10 0.10 3.45 0.10 0.10
euler19.py 0.13 0.11 3.06 0.12 0.10
euler20.py 0.10 0.10 2.45 0.10 0.10
euler21.py 0.10 0.21 4.16 0.10 0.21
euler22.py 0.10 0.10 3.95 0.10 0.10
euler23.py 3.84 21.33 15.36 4.45 12.03
euler24.py 5.97 5.66 31.25 6.97 5.46
euler25.py 0.71 0.91 3.44 0.91 0.20
euler26.py 9.59 10.84 20.61 9.74 3.54
euler27.py 1.33 7.08 12.35 1.32 11.33
euler28.py 0.11 0.12 4.37 0.12 0.11
euler29.py 0.22 0.12 6.49 0.12 0.22
euler30.py 3.95 4.45 10.72 5.37 4.26
euler32.py 2.83 3.54 9.20 4.95 3.34
euler33.py 0.11 0.12 5.26 0.11 0.10
euler34.py 4.35 12.33 13.86 12.67 16.08
euler35.py 5.59 18.19 30.04 6.47 25.47
euler36.py 1.83 3.24 7.39 2.93 2.44
euler37.py 10.92 12.44 39.17 14.57 17.50
euler38.py 2.24 1.31 9.91 2.14 2.34
euler39.py 0.21 0.42 5.79 0.10 0.41
euler40.py 1.12 1.63 7.30 0.42 1.13
euler41.py 4.25 3.94 20.63 5.46 4.76
euler42.py 0.11 0.21 4.98 0.12 0.11
euler44.py 0.91 9.41 7.48 1.02 3.75
euler45.py 8.01 2.73 12.06 2.73 2.12
euler46.py 0.52 1.13 6.98 0.31 1.14
euler47.py 1.44 2.44 8.03 0.73 3.65
euler48.py 0.10 0.20 5.49 0.21 0.10
euler49.py 0.62 1.74 6.72 0.82 1.02
euler50.py 2.23 52.50 58.27 6.11 56.61
euler52.py 0.61 0.81 5.36 0.82 0.71
euler53.py 0.22 0.42 5.98 0.21 0.42
euler54.py 0.41 0.32 7.59 0.32 0.32
euler55.py 0.82 0.73 6.39 0.52 0.73
euler56.py 0.92 1.22 5.98 1.03 1.43
euler57.py 0.52 0.72 6.09 0.53 0.82
euler58.py 6.67 34.19 44.30 7.60 53.89
euler59.py 11.42 8.19 17.06 12.81 14.35
euler61.py 0.31 0.21 5.38 0.11 0.10
euler62.py 0.42 0.32 6.49 0.53 0.22
euler63.py 0.21 0.21 6.89 0.20 0.11
euler65.py 0.10 0.10 3.85 0.10 0.10
euler66.py 1.92 7.68 23.86 7.18 12.33
euler67.py 0.11 0.12 4.77 0.11 0.12
euler68.py 0.11 0.12 4.16 0.10 0.11
euler69.py 0.11 0.21 4.96 0.11 0.11
euler70.py 0.73 0.61 7.18 1.02 1.22
euler71.py 0.21 1.44 6.48 0.32 0.91
euler72.py 7.23 30.18 37.07 8.42 49.23
euler73.py 9.50 22.54 21.23 3.35 24.59
euler75.py 2.12 2.22 5.56 0.62 2.43
euler77.py 0.41 0.30 4.86 0.20 0.41
euler79.py 0.10 0.10 3.13 0.10 0.10
euler80.py 0.91 0.71 3.94 0.72 0.62
euler81.py 0.10 0.20 3.54 0.10 0.10
euler82.py 0.71 0.30 8.70 0.30 0.20
euler83.py 0.20 0.81 4.04 0.41 0.61
euler84.py 2.65 15.13 23.56 4.66 19.91
euler85.py 7.98 7.99 22.23 10.41 13.85
euler87.py 1.93 1.74 7.28 1.03 1.23
euler89.py 0.11 0.11 5.17 0.11 0.10
euler93.py 3.54 5.56 12.72 6.58 12.33
euler94.py 34.69 29.33 24.57 26.20 20.42
euler97.py 3.64 4.95 10.61 2.63 3.13
euler98.py 0.41 0.61 3.84 0.51 0.61
euler99.py 0.10 0.10 2.12 0.10 0.10
euler100.py 0.10 0.10 2.43 0.10 0.10
euler101.py 0.20 0.10 4.76 0.10 0.10
euler102.py 0.10 0.10 3.44 0.10 0.10
euler103.py 0.10 0.12 2.94 0.10 0.10
euler105.py 0.61 0.41 7.90 0.42 0.41
euler106.py 0.71 0.51 8.82 0.41 0.51
euler107.py 0.31 0.62 7.19 0.31 0.31
euler108.py 3.66 19.55 17.52 8.86 10.62
euler109.py 0.41 1.23 11.61 0.62 2.36
euler111.py 2.64 17.94 25.92 6.19 17.00
euler112.py 4.57 12.55 12.85 12.45 14.97
euler114.py 0.11 0.21 5.17 0.10 0.11
euler115.py 0.31 0.42 5.58 0.21 0.41
euler116.py 0.11 0.12 4.86 0.11 0.12
euler117.py 0.11 0.12 4.76 0.12 0.11
euler119.py 0.10 0.11 3.74 0.10 0.10
euler120.py 0.11 0.11 3.56 0.12 0.11
euler121.py 0.20 0.30 4.87 0.11 0.32
euler123.py 5.59 5.42 24.12 7.08 7.48
euler124.py 1.24 1.84 9.22 0.62 2.77
euler125.py 1.32 1.52 6.28 1.22 1.11
euler126.py 4.06 15.59 13.75 3.85 17.70
euler128.py 9.50 21.56 26.49 12.63 14.76
euler135.py 3.34 5.36 7.48 2.22 3.34
euler142.py 0.20 0.71 5.57 0.22 0.51
euler143.py 0.11 0.11 4.46 0.11 0.10
euler157.py 0.12 0.12 3.97 0.11 0.12
euler162.py 0.10 0.10 3.44 0.10 0.10
euler171.py 0.10 0.10 3.74 0.11 0.11
euler173.py 0.93 1.12 6.17 0.61 1.53
euler174.py 4.45 4.16 8.08 3.74 3.23
euler181.py 0.10 0.10 3.85 0.11 0.10
euler190.py 0.10 0.10 2.43 0.10 0.10
euler193.py 0.10 0.10 2.12 0.10 0.10
euler202.py 0.10 0.10 2.74 0.10 0.10
euler205.py 0.92 0.72 10.16 1.03 0.72
euler207.py 0.21 0.83 6.29 0.32 1.43
euler222.py 0.10 0.10 3.44 0.10 0.10
euler233.py 0.10 0.10 3.35 0.10 0.10
euler234.py 4.46 9.00 19.12 7.71 7.48
euler235.py 0.20 0.20 5.47 0.32 0.22
euler240.py 14.57 14.52 48.67 12.14 22.65
total 250.37 509.72 1211.07 282.73 569.37
wins 68 33 0 59 51

Programming
Python

Comments (2)

Permalink

Three Pathological cases for the PyPy JIT

At PyCon, Maciej from the PyPy team asked for programs that the PyPy JIT did a really bad job on.

I have a directory full of over 200 little Python programs that attempt to solve Project Euler math problems. Only 186 of them get the right answer, and fewer of them run in a reasonable period of time. But this led me to concoct a quick benchmark comparing the Python implementations installed on my computer: CPython, CPython with Psyco, Unladen Swallow, and PyPy. (I haven't tested them with Jython or IronPython yet. Maybe someday.)

Basically, I just run all of euler*.py under four different Python implementations, record how long it took each implementation to run each program, and throw out the results for any programs that didn't work in all four versions of Python, or that didn't finish in 60 seconds or less on all of them.

Here are the results, for recent svn checkouts of PyPy and Unladen, CPython 2.6.4, and CPython 2.6.4 plus Psyco 1.6. (Don't get too obsessed with the exact results, since it's just a private benchmark and most of the code is unpublished. This is just background for the real information at the bottom of the post.)

script pypy unladen psyco CPython
euler1.py 0.66 0.60 0.11 0.61
euler2.py 0.10 0.10 0.10 0.40
euler3.py 0.21 0.51 0.31 0.41
euler4.py 0.20 0.51 0.20 0.31
euler5.py 0.10 0.10 0.10 0.10
euler6.py 0.10 0.10 0.10 0.10
euler7.py 0.11 0.20 0.10 0.31
euler8.py 0.10 0.10 0.10 0.10
euler9.py 0.10 0.20 0.10 0.20
euler10.py 2.23 3.98 1.74 7.73
euler11.py 0.11 0.10 0.11 0.10
euler13.py 0.10 0.10 0.10 0.10
euler14.py 44.79 2.64 1.33 2.96
euler15.py 0.10 0.10 0.10 0.10
euler16.py 0.10 0.10 0.11 0.10
euler18.py 0.10 0.10 0.10 0.10
euler19.py 0.10 0.10 0.10 0.10
euler20.py 0.10 0.10 0.10 0.10
euler21.py 0.10 0.20 0.10 0.20
euler22.py 0.10 0.10 0.10 0.10
euler23.py 4.55 20.96 4.08 10.50
euler24.py 6.77 4.18 7.88 4.65
euler25.py 0.61 0.81 0.82 0.81
euler26.py 8.47 10.10 9.09 10.41
euler27.py 3.06 6.37 1.23 8.64
euler28.py 0.10 0.10 0.10 0.10
euler29.py 0.10 0.10 0.10 0.10
euler30.py 3.04 3.55 5.62 3.85
euler32.py 2.96 3.34 4.07 3.57
euler33.py 0.10 0.11 0.10 0.10
euler34.py 7.08 11.96 11.12 12.96
euler35.py 5.97 18.01 9.91 25.38
euler36.py 1.83 2.93 3.24 2.83
euler37.py 11.23 10.83 17.89 12.13
euler38.py 1.92 1.22 1.63 1.42
euler39.py 0.10 0.30 0.10 0.20
euler40.py 0.81 0.91 0.51 0.71
euler41.py 4.18 3.54 4.86 4.05
euler42.py 0.10 0.10 0.10 0.10
euler43.py 36.69 27.91 37.50 29.19
euler44.py 1.02 7.98 0.72 4.50
euler45.py 6.95 2.54 2.36 2.45
euler46.py 0.21 0.91 0.21 1.23
euler47.py 0.71 1.92 0.61 2.97
euler48.py 0.10 0.21 0.21 0.10
euler49.py 0.31 1.02 0.72 0.72
euler51.py 22.68 27.90 49.11 31.86
euler52.py 0.71 0.81 0.91 1.11
euler53.py 0.20 0.20 0.20 0.31
euler54.py 0.30 0.20 0.21 0.21
euler56.py 0.83 0.71 0.92 0.62
euler57.py 0.41 0.51 0.52 0.72
euler58.py 6.57 37.76 6.77 58.48
euler59.py 12.85 6.88 9.81 15.98
euler61.py 0.10 0.20 0.31 0.21
euler62.py 0.31 0.20 0.41 0.20
euler63.py 0.21 0.10 0.20 0.20
euler64.py 10.93 38.67 44.00 59.01
euler65.py 0.10 0.10 0.11 0.11
euler66.py 2.43 8.49 9.82 12.13
euler67.py 0.11 0.11 0.10 0.11
euler68.py 0.10 0.10 0.10 0.10
euler69.py 0.10 0.11 0.10 0.11
euler70.py 0.41 0.73 1.03 0.61
euler71.py 0.32 1.53 0.31 1.31
euler72.py 7.85 32.59 9.77 56.56
euler73.py 9.19 27.32 2.87 25.50
euler75.py 5.15 2.22 0.61 2.32
euler77.py 0.31 0.32 0.23 0.51
euler79.py 0.20 0.20 0.10 0.10
euler80.py 0.71 0.71 0.71 0.61
euler81.py 0.10 0.20 0.10 0.10
euler82.py 0.71 0.31 0.30 0.20
euler83.py 0.20 0.61 0.41 0.51
euler84.py 2.93 14.24 4.76 17.89
euler85.py 6.67 7.20 11.16 11.84
euler87.py 1.63 1.32 0.41 0.82
euler89.py 0.10 0.10 0.10 0.10
euler93.py 3.13 5.26 5.31 11.75
euler94.py 34.40 26.40 26.30 27.80
euler97.py 3.44 4.56 2.35 3.34
euler98.py 0.30 0.61 0.51 0.51
euler99.py 0.10 0.10 0.10 0.10
euler100.py 0.10 0.10 0.10 0.10
euler101.py 0.20 0.10 0.10 0.10
euler102.py 0.10 0.10 0.10 0.10
euler103.py 0.10 0.10 0.10 0.10
euler105.py 0.61 0.30 0.81 0.30
euler106.py 0.61 0.51 0.81 0.31
euler107.py 0.20 0.41 0.20 0.20
euler108.py 3.77 13.74 7.14 9.99
euler109.py 0.31 1.22 0.31 1.64
euler111.py 3.27 15.82 4.19 15.28
euler112.py 5.96 12.45 11.42 13.50
euler114.py 0.12 0.21 0.10 0.11
euler115.py 0.22 0.41 0.30 0.62
euler116.py 0.11 0.11 0.10 0.11
euler117.py 0.10 0.10 0.10 0.11
euler119.py 0.10 0.10 0.10 0.10
euler120.py 0.10 0.10 0.10 0.10
euler121.py 0.10 0.20 0.10 0.20
euler123.py 5.87 5.26 8.78 7.05
euler124.py 0.82 1.83 0.71 2.43
euler125.py 0.93 1.42 1.33 1.32
euler135.py 28.39 5.22 2.65 4.19
euler142.py 0.20 0.81 0.11 0.41
euler157.py 0.10 0.10 0.10 0.10
euler162.py 0.10 0.10 0.10 0.10
euler171.py 0.12 0.11 0.10 0.10
euler173.py 0.92 1.42 1.02 1.01
euler174.py 35.72 4.75 2.96 3.27
euler181.py 0.10 0.10 0.11 0.10
euler190.py 0.10 0.10 0.10 0.10
euler193.py 0.10 0.10 0.10 0.10
euler202.py 0.10 0.10 0.10 0.10
euler205.py 1.83 0.72 2.35 0.71
euler207.py 0.11 0.53 0.32 0.42
euler222.py 0.10 0.10 0.10 0.10
euler233.py 0.10 0.10 0.11 0.10
euler234.py 5.77 6.62 7.38 7.48
euler235.py 0.20 0.20 0.42 0.20
euler240.py 16.17 14.96 17.38 19.59
total 409.3 492.26 393.54 593.8
wins 77 48 67 45

So all the JITs are faster than CPython for this set of programs, but none is twice as fast. PyPy has the most wins, but Psyco has the lowest total time.

While PyPy does very well overall, it's an order of magnitude slower than all the others on 3 of the programs: 14, 135, and 174. If PyPy fixed whatever made those ones slow, it would definitely be the overall leader.

Here's my euler14.py

#!/usr/bin/env python

"""Project Euler, problem 14

The following iterative sequence is defined for the set of positive integers:

n -> n / 2 (n is even)
n -> 3n + 1 (n is odd)

Which starting number, under one million, produces the longest chain?
"""

def next_num(num):
    if num & 1:
        return 3 * num + 1
    else:
        return num // 2

MAX_NUM = 1000000

lengths = {1: 0}

def series_length(num):
    global lengths
    if num in lengths:
        return lengths[num]
    else:
        num2 = next_num(num)
        result = 1 + series_length(num2)
        lengths[num] = result
        return result

def main():
    num_with_max_length = 1
    max_length = 0
    for ii in range(1, MAX_NUM):
        length = series_length(ii)
        if length > max_length:
            max_length = length
            num_with_max_length = ii
    print num_with_max_length, max_length

if __name__ == "__main__":
    main()

Why is 14 so slow? My guess would be that it's because it heavily mutates a global dictionary.

Here's euler135.py:

#!/usr/bin/env python

"""Project Euler, problem 135

Given the positive integers, x, y, and z, are consecutive terms of an
arithmetic progression, the least value of the positive integer, n, for which
the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27:

34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27

It turns out that n = 1155 is the least value which has exactly ten solutions.

How many values of n less than one million have exactly ten distinct solutions?
"""

"""
x**2 - y**2 - z**2 = n
y = z + d
x = z + 2 * d
d > 0
z > 0
(z + 2 d) ** 2 - (z + d) ** 2 - z ** 2 = n
(z**2 + 4dz + 4d**2) - (z**2 + 2dz + d**2) - z**2 = n
z**2 + 4dz + 4d**2 -z**2 - 2dz - d**2 - z**2 = n
-z**2 + 2dz + 3d**2 - n = 0
a = -1
b = 2d
c = 3d**2 - n
z = (-b +/- sqrt(b**2-4ac)) / 2a
z = (-2d +/- sqrt(4d**2+4(3d**2-n))) / -2
z = (-2d +/- sqrt(4d**2+12d**2-4n))) / -2
z = (-2d +/- sqrt(16d**2-4n))) / -2
z = (-2d +/- 2 sqrt(4d**2 - n)) / -2
z = (-d +/-  sqrt(4d**2 - n)) / -1
z = d +/- sqrt(4d**2 - n)
4d**2 - n >= 0
n <= 4d**2
4d**2 - n is a perfect square
"""

def main():
    limit = 1000000
    counts = limit * [0]
    for d in range(1, limit / 4 + 1):
        d24 = d ** 2 * 4
        if d24 < limit:
            start = 1
            counts[d24] += 1
        else:
            start = int((d24 - limit) ** 0.5)
        if start < 1:
            start = 1
        for root in xrange(start, 2 * d):
            n = d24 - root ** 2
            if n <= 0:
                break
            if n < limit:
                z = root + d
                y = z + d
                x = y + d
                #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                #assert x**2 - y**2 - z**2 == n
                counts[n] += 1
                if d > root:
                    z = d - root
                    y = z + d
                    x = y + d
                    #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                    #assert x**2 - y**2 - z**2 == n
                    counts[n] += 1
    total = 0
    assert counts[0] == 0
    for val in counts:
        if val == 10:
            total += 1
    print total

if __name__ == "__main__":
    main()

Why is PyPy so slow on this one? Well, maybe it's because all the work happens in one function, if PyPy doesn't JIT a function while it's still running. Or maybe it's the list with a million elements in it.

Finally, euler174.py:

#!/usr/bin/env python

"""Project Euler, problem 174

We shall define a square lamina to be a square outline with a square "hole" so
that the shape possesses vertical and horizontal symmetry.

Given eight tiles it is possible to form a lamina in only one way: 3x3 square
with a 1x1 hole in the middle. However, using thirty-two tiles it is possible
to form two distinct laminae.

If t represents the number of tiles used, we shall say that t = 8 is type L(1)
and t = 32 is type L(2).

Let N(n) be the number of t <= 1000000 such that t is type L(n); for example,
N(15) = 832.

What is sum(N(n)) for 1 <= n <= 10?
"""

from math import ceil
from collections import defaultdict

def gen_laminae(limit):
    """Yield laminae with up to limit squares, as tuples
    (outer, inner, squares)"""
    for outer in range(3, limit // 4 + 2):
        if outer & 1:
            min_min_inner = 1
        else:
            min_min_inner = 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if outer & 1 != min_inner & 1:
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            yield (outer, inner, squares)

def main():
    squares_to_count = defaultdict(int)
    for (outer, inner, squares) in gen_laminae(1000000):
        squares_to_count[squares] += 1
    histogram = defaultdict(int)
    for val in squares_to_count.values():
        if val <= 10:
            histogram[val] += 1
    print sum(histogram.values())

if __name__ == "__main__":
    main()

Perhaps this one is slow because it heavily uses generators and a defaultdict?

Anyway, I'm not claiming any of these are great programs. I know others have much better solutions to these problems. And this code is certainly not optimized for PyPy. They're just examples of small programs that were originally written for CPython, and do work correctly on PyPy, but happen to be much slower on PyPy than on CPython.

Again, note these are the exceptions not the rule. PyPy runs most pure-Python code great. Just not these three programs.

Math
Programming
Python

Comments (1)

Permalink

Moved Slugathon to github

I've been working on Slugathon on-and-off (but mostly off) since 2003. The bare minimum feature set that I consider necessary to do a serious alpha release is almost done. (Release early, release often; but if you release before you actually do anything useful you might just leave a bad first impression and scare off your potential users.) So I've been thinking about installers, and portability to MacOS and Windows, and whether my current hosting solution (svn and Trac provided for free to open source Python projects by WebFaction; thanks WebFaction) would cut it when I had users and possibly more contributors.

I like Trac, a lot. I now strongly prefer Git to Subversion. You can use Git and Trac together with the GitTracPlugin, but it's hard to find a host that has it installed. (SourceForge, for example, now offers both Git and Trac, but not together.) So I decided to go with github. They're the most popular Git host, their paid hosting business seems to be solid enough that they're unlikely to go away anytime soon, they offer free hosting for open source projects, and they have a wiki and an issue tracker.

So this morning I experimentally moved everything from Subversion+Trac to github. I've administered both Svn/Trac and Git/Trac installations at work, so I knew how to do this on a local machine where I had control, but doing things locally is different from doing things on a remote hosting service.

Here's what I did on github:

1. Signed up for an account. (I've poked around github plenty of times, and cloned other people's repositories there to my local box, but I never needed to register until I actually wanted to move my own repository there. Which is exactly the way it should work.)

2. Uploaded an ssh key. I already know ssh so it was just a matter of posting ~/.ssh/id_rsa.pub into a web form.

3. Re-cloned my Subversion repository to my local box, using git-svn, using the –no-metadata option to remove all the ugly "git-svn-id" blobs in the log. I'd done this before when we switched from Subversion to Git at work, but I didn't remember the exact syntax. Luckily it was right there on Github's help page so I didn't even need to check a Git man page.

4. Uploaded my new local Git repository to Github.

5. Realized that I'd forgotten to use the –authors-file option to convert the author metadata from Subversion format (bare username) to Git format (firstname lastname email). Oops, this was also on github's help page but I'd been in too much of a hurry. I poked around github until I found out how to delete a repository, deleted the one I'd just made, redid the git-svn clone with the missing option, and re-uploaded my repository.

6. Basically cut-and-pasted my wiki pages (there were only 8 of them) from Trac to Github's wiki. I did a little tweaking to convert formatting from Trac format to the Textile format that Github uses, but didn't obsess about getting every little format perfect. One thing that github's wiki doesn't support well is images; they let you link to external images, but not post attachments directly into the wiki. So for now the thumbnail screenshots are there, but they don't link to the larger versions.

7. Manually ported all my Trac tickets to github issues. I had 62 tickets, 30 closed and 32 open. Both Trac and the github issue tracker share the same #number format for referring to issues, and I mention ticket numbers inside commit messages, so I thought it was necessary to add all the already-closed issues for historical reasons. I also changed any references to Subversion commit numbers to Git commit ids inside the issues. This was an annoying data entry task, but with only 62 tickets it was faster to just do it by hand than to write some awesome Trac to github conversion script. (Which would require either cooperation from github, or advanced web scraping, since github is a rather fancy web site with JavaScript everywhere.)

8. Changed Slugathon's SourceForge page to point at github rather than WebFaction. (I'm keeping a SourceForge presence for the project because they offer useful things that Github does not, like project mailing lists. Also, if SourceForge ever integrates Git and Trac and I become dissatisfied with Github's new and fairly minimal issue tracker, I might move everything to SourceForge.)

9. Changed the main wiki page at WebFaction to point to github, too. (I will eventually ask WebFaction to delete the project, but first I want to make sure that the move was a good idea. And give Google enough time to cache the version of the page that shows the redirect to github, so people don't think the project just disappeared.)

10. Created a static web page using the instructions at pages.github.com. This feels somewhat redundant with the github wiki, but it gave me a place to park my images.

My conclusion so far: I still like Git a lot more than Subversion. I like github's wiki and issue tracker less than Trac, but they seem to be good enough.

Games
Programming
Python

Comments (0)

Permalink

Finding a missing print server

I have a TrendNet TE100-P1P print server. It's a little $30 gizmo with an Ethernet jack on one end and a parallel printer port on the other, so that you can turn a printer into a network printer without hanging it off a computer. Bought it because my new motherboard didn't have a parallel port. The other benefit is that now the printer works when my main desktop computer is off.

Anyway, we had a power failure, and the printer stopped working. No "lexmark" found in local DNS. No DHCP lease for the TrendNet gizmo.

The gizmo was still there, but I couldn't find it. There's no UI on the device, just the network jack. So I wrote a little script (below) to portscan my LAN for anything that answered on port 631. That found it.

Turns out that when the power goes out, the print server reconfigures its network settings from DHCP to fixed. It keeps its last network settings, but the DHCP / DNS server no longer has any record of them, so you can't find it unless you remember what its last IP was.

Here's the script. Because I had 700+ addresses to check, and it sometimes takes a few seconds for a failed socket connection to timeout, I used 250 threads to make it faster. (I initally tried using a separate thread for each IP, but I hit an OS limit at 255 threads, and decided to use a fixed-size pool and two shared work queues.)

#!/usr/bin/env python

"""Hunting for the TrendNet print server"""

import socket
import threading
import Queue

socket.setdefaulttimeout(5)

port = 631
num_threads = 250

class ConnectThread(threading.Thread):
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.setDaemon(True)

    def run(self):
        while True:
            ip = self.in_queue.get()
            try:
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.connect((ip, port))
            except socket.error:
                self.out_queue.put((ip, False))
            else:
                self.out_queue.put((ip, True))
            self.in_queue.task_done()

def main():
    job_queue = Queue.Queue()
    result_queue = Queue.Queue()
    count = 0
    for net in range(3):
        for addr in range(1, 255 + 1):
            ip = "192.168.%d.%d" % (net, addr)
            count += 1
            job_queue.put(ip)
    for unused in xrange(num_threads):
        ConnectThread(job_queue, result_queue).start()
    job_queue.join()

    for unused in xrange(count):
        ip, status = result_queue.get()
        if status:
            print ip

if __name__ == "__main__":
    main()

Linux
Programming
Python

Comments (0)

Permalink

Drawing rectangles in PyGTK: GDK vs. Cairo

Someone on the PyGTK mailing list just asked which is faster for drawing rectangles, GDK or Cairo.

I wasn't sure, so I wrote a silly little test program.  Note that it's not quite apples-to-apples as the Cairo rectangles have variable transparency while the GDK rectangles are opaque.

On my box, running Gentoo Linux and the latest stable versions of everything, the Cairo version draws 500 rectangles in about 0.01 to 0.03 seconds, while the GDK version takes about 0.13 to 0.14 seconds.  So Cairo is faster.

To run this, just cut-and-paste into an editor window, save the file as rectangles.py, and run "python rectangles.py"

#!/usr/bin/env python

"""GTK rectangle drawing speed test, GDK vs. Cairo

David Ripton 2008-12-03
MIT license
"""

import time
import random

import gtk

NUM_RECTS = 500

handle_id = None

def main():
    window = gtk.Window()
    window.connect("destroy", gtk.main_quit)
    window.set_default_size(800, 600)
    vbox = gtk.VBox()
    window.add(vbox)
    area = gtk.DrawingArea()
    vbox.pack_start(area)
    gdk_button = gtk.Button("GDK")
    gdk_button.connect("clicked", on_gdk_button_clicked, area)
    vbox.pack_start(gdk_button, expand=False)
    cairo_button = gtk.Button("Cairo")
    cairo_button.connect("clicked", on_cairo_button_clicked, area)
    vbox.pack_start(cairo_button, expand=False)
    window.show_all()
    gtk.main()

def on_gdk_button_clicked(button, area):
    global handle_id
    if handle_id is not None:
        area.disconnect(handle_id)
    handle_id = area.connect("expose-event", on_area_exposed_gdk)
    area.queue_draw()

def on_cairo_button_clicked(button, area):
    global handle_id
    if handle_id is not None:
        area.disconnect(handle_id)
    handle_id = area.connect("expose-event", on_area_exposed_cairo)
    area.queue_draw()

def on_area_exposed_gdk(area, event):
    t0 = time.time()
    width, height = area.window.get_size()
    colormap = area.get_colormap()
    gc = area.get_style().fg_gc[gtk.STATE_NORMAL]
    for ii in xrange(NUM_RECTS):
        r = random.randrange(0, 65535 + 1)
        g = random.randrange(0, 65535 + 1)
        b = random.randrange(0, 65535 + 1)
        gc.foreground = colormap.alloc_color(r, g, b)
        x = random.randrange(0, width)
        y = random.randrange(0, height)
        w = random.randrange(0, width - x)
        h = random.randrange(0, height - y)
        area.window.draw_rectangle(gc, True, x, y, w, h)
    t1 = time.time()
    print "gdk drew %d rectangles in %f seconds" % (NUM_RECTS, t1-t0)

def on_area_exposed_cairo(area, event):
    t0 = time.time()
    cr = area.window.cairo_create()
    width, height = area.window.get_size()
    for ii in xrange(NUM_RECTS):
        r = random.random()
        g = random.random()
        b = random.random()
        a = random.random()
        cr.set_source_rgba(r, g, b, a)
        x = random.randrange(0, width)
        y = random.randrange(0, height)
        w = random.randrange(0, width - x)
        h = random.randrange(0, height - y)
        cr.rectangle(x, y, w, h)
        cr.fill()
    t1 = time.time()
    print "cairo drew %d rectangles in %f seconds" % (NUM_RECTS, t1-t0)

if __name__ == "__main__":
    main()

screen shot of the cairo rectangles


Rant: It is way too hard to post code with Wordpress. First I tried the "code" button, but all my indentation was destroyed (bad for any code, fatal for Python). Then I tried the "b-quote" button; same effect. Then I switched to HTML mode and hand-inserted a "pre" tag, which preserved the indentation. But Wordpress then proceded to vandalize my code with "smart" quotes. (Have you ever seen anything with "smart" in its name that actually was?) Luckily it was simple to find and install the wpuntexturize plugin, a few lines of PHP that eradicate Moron Quotes. But why the hell are they there in the first place, let alone enabled by default, let alone enabled by default inside a "pre" tag?

Programming
Python
Rant

Comments (0)

Permalink

AmpChat WxPython support

Stephen Waterbury added a WxPython client to my AmpChat example program. (Mentioned earlier in this post.)

His Mercurial repository is here.

This is still just example code, but now it's an example of multiple things:

  • how to use Twisted AMP
  • how to use Twisted with PyGTK (trivial because gtk2reactor rocks)
  • how to make WxPython's mainloop run in parallel with Twisted's using threads
  • how to write the same simple GUI program in both PyGTK and WxPython
  • how to use Mercurial to contribute to a project without commit rights in the original repo

Anyone want to contribute a PyQT or Tk version?

Programming
Python

Comments (0)

Permalink