Programming

Three Pathological cases for the PyPy JIT

At PyCon, Maciej from the PyPy team asked for programs that the PyPy JIT did a really bad job on.

I have a directory full of over 200 little Python programs that attempt to solve Project Euler math problems. Only 186 of them get the right answer, and fewer of them run in a reasonable period of time. But this led me to concoct a quick benchmark comparing the Python implementations installed on my computer: CPython, CPython with Psyco, Unladen Swallow, and PyPy. (I haven't tested them with Jython or IronPython yet. Maybe someday.)

Basically, I just run all of euler*.py under four different Python implementations, record how long it took each implementation to run each program, and throw out the results for any programs that didn't work in all four versions of Python, or that didn't finish in 60 seconds or less on all of them.

Here are the results, for recent svn checkouts of PyPy and Unladen, CPython 2.6.4, and CPython 2.6.4 plus Psyco 1.6. (Don't get too obsessed with the exact results, since it's just a private benchmark and most of the code is unpublished. This is just background for the real information at the bottom of the post.)

script pypy unladen psyco CPython
euler1.py 0.66 0.60 0.11 0.61
euler2.py 0.10 0.10 0.10 0.40
euler3.py 0.21 0.51 0.31 0.41
euler4.py 0.20 0.51 0.20 0.31
euler5.py 0.10 0.10 0.10 0.10
euler6.py 0.10 0.10 0.10 0.10
euler7.py 0.11 0.20 0.10 0.31
euler8.py 0.10 0.10 0.10 0.10
euler9.py 0.10 0.20 0.10 0.20
euler10.py 2.23 3.98 1.74 7.73
euler11.py 0.11 0.10 0.11 0.10
euler13.py 0.10 0.10 0.10 0.10
euler14.py 44.79 2.64 1.33 2.96
euler15.py 0.10 0.10 0.10 0.10
euler16.py 0.10 0.10 0.11 0.10
euler18.py 0.10 0.10 0.10 0.10
euler19.py 0.10 0.10 0.10 0.10
euler20.py 0.10 0.10 0.10 0.10
euler21.py 0.10 0.20 0.10 0.20
euler22.py 0.10 0.10 0.10 0.10
euler23.py 4.55 20.96 4.08 10.50
euler24.py 6.77 4.18 7.88 4.65
euler25.py 0.61 0.81 0.82 0.81
euler26.py 8.47 10.10 9.09 10.41
euler27.py 3.06 6.37 1.23 8.64
euler28.py 0.10 0.10 0.10 0.10
euler29.py 0.10 0.10 0.10 0.10
euler30.py 3.04 3.55 5.62 3.85
euler32.py 2.96 3.34 4.07 3.57
euler33.py 0.10 0.11 0.10 0.10
euler34.py 7.08 11.96 11.12 12.96
euler35.py 5.97 18.01 9.91 25.38
euler36.py 1.83 2.93 3.24 2.83
euler37.py 11.23 10.83 17.89 12.13
euler38.py 1.92 1.22 1.63 1.42
euler39.py 0.10 0.30 0.10 0.20
euler40.py 0.81 0.91 0.51 0.71
euler41.py 4.18 3.54 4.86 4.05
euler42.py 0.10 0.10 0.10 0.10
euler43.py 36.69 27.91 37.50 29.19
euler44.py 1.02 7.98 0.72 4.50
euler45.py 6.95 2.54 2.36 2.45
euler46.py 0.21 0.91 0.21 1.23
euler47.py 0.71 1.92 0.61 2.97
euler48.py 0.10 0.21 0.21 0.10
euler49.py 0.31 1.02 0.72 0.72
euler51.py 22.68 27.90 49.11 31.86
euler52.py 0.71 0.81 0.91 1.11
euler53.py 0.20 0.20 0.20 0.31
euler54.py 0.30 0.20 0.21 0.21
euler56.py 0.83 0.71 0.92 0.62
euler57.py 0.41 0.51 0.52 0.72
euler58.py 6.57 37.76 6.77 58.48
euler59.py 12.85 6.88 9.81 15.98
euler61.py 0.10 0.20 0.31 0.21
euler62.py 0.31 0.20 0.41 0.20
euler63.py 0.21 0.10 0.20 0.20
euler64.py 10.93 38.67 44.00 59.01
euler65.py 0.10 0.10 0.11 0.11
euler66.py 2.43 8.49 9.82 12.13
euler67.py 0.11 0.11 0.10 0.11
euler68.py 0.10 0.10 0.10 0.10
euler69.py 0.10 0.11 0.10 0.11
euler70.py 0.41 0.73 1.03 0.61
euler71.py 0.32 1.53 0.31 1.31
euler72.py 7.85 32.59 9.77 56.56
euler73.py 9.19 27.32 2.87 25.50
euler75.py 5.15 2.22 0.61 2.32
euler77.py 0.31 0.32 0.23 0.51
euler79.py 0.20 0.20 0.10 0.10
euler80.py 0.71 0.71 0.71 0.61
euler81.py 0.10 0.20 0.10 0.10
euler82.py 0.71 0.31 0.30 0.20
euler83.py 0.20 0.61 0.41 0.51
euler84.py 2.93 14.24 4.76 17.89
euler85.py 6.67 7.20 11.16 11.84
euler87.py 1.63 1.32 0.41 0.82
euler89.py 0.10 0.10 0.10 0.10
euler93.py 3.13 5.26 5.31 11.75
euler94.py 34.40 26.40 26.30 27.80
euler97.py 3.44 4.56 2.35 3.34
euler98.py 0.30 0.61 0.51 0.51
euler99.py 0.10 0.10 0.10 0.10
euler100.py 0.10 0.10 0.10 0.10
euler101.py 0.20 0.10 0.10 0.10
euler102.py 0.10 0.10 0.10 0.10
euler103.py 0.10 0.10 0.10 0.10
euler105.py 0.61 0.30 0.81 0.30
euler106.py 0.61 0.51 0.81 0.31
euler107.py 0.20 0.41 0.20 0.20
euler108.py 3.77 13.74 7.14 9.99
euler109.py 0.31 1.22 0.31 1.64
euler111.py 3.27 15.82 4.19 15.28
euler112.py 5.96 12.45 11.42 13.50
euler114.py 0.12 0.21 0.10 0.11
euler115.py 0.22 0.41 0.30 0.62
euler116.py 0.11 0.11 0.10 0.11
euler117.py 0.10 0.10 0.10 0.11
euler119.py 0.10 0.10 0.10 0.10
euler120.py 0.10 0.10 0.10 0.10
euler121.py 0.10 0.20 0.10 0.20
euler123.py 5.87 5.26 8.78 7.05
euler124.py 0.82 1.83 0.71 2.43
euler125.py 0.93 1.42 1.33 1.32
euler135.py 28.39 5.22 2.65 4.19
euler142.py 0.20 0.81 0.11 0.41
euler157.py 0.10 0.10 0.10 0.10
euler162.py 0.10 0.10 0.10 0.10
euler171.py 0.12 0.11 0.10 0.10
euler173.py 0.92 1.42 1.02 1.01
euler174.py 35.72 4.75 2.96 3.27
euler181.py 0.10 0.10 0.11 0.10
euler190.py 0.10 0.10 0.10 0.10
euler193.py 0.10 0.10 0.10 0.10
euler202.py 0.10 0.10 0.10 0.10
euler205.py 1.83 0.72 2.35 0.71
euler207.py 0.11 0.53 0.32 0.42
euler222.py 0.10 0.10 0.10 0.10
euler233.py 0.10 0.10 0.11 0.10
euler234.py 5.77 6.62 7.38 7.48
euler235.py 0.20 0.20 0.42 0.20
euler240.py 16.17 14.96 17.38 19.59
total 409.3 492.26 393.54 593.8
wins 77 48 67 45

So all the JITs are faster than CPython for this set of programs, but none is twice as fast. PyPy has the most wins, but Psyco has the lowest total time.

While PyPy does very well overall, it's an order of magnitude slower than all the others on 3 of the programs: 14, 135, and 174. If PyPy fixed whatever made those ones slow, it would definitely be the overall leader.

Here's my euler14.py

#!/usr/bin/env python

"""Project Euler, problem 14

The following iterative sequence is defined for the set of positive integers:

n -> n / 2 (n is even)
n -> 3n + 1 (n is odd)

Which starting number, under one million, produces the longest chain?
"""

def next_num(num):
    if num & 1:
        return 3 * num + 1
    else:
        return num // 2

MAX_NUM = 1000000

lengths = {1: 0}

def series_length(num):
    global lengths
    if num in lengths:
        return lengths[num]
    else:
        num2 = next_num(num)
        result = 1 + series_length(num2)
        lengths[num] = result
        return result

def main():
    num_with_max_length = 1
    max_length = 0
    for ii in range(1, MAX_NUM):
        length = series_length(ii)
        if length > max_length:
            max_length = length
            num_with_max_length = ii
    print num_with_max_length, max_length

if __name__ == "__main__":
    main()

Why is 14 so slow? My guess would be that it's because it heavily mutates a global dictionary.

Here's euler135.py:

#!/usr/bin/env python

"""Project Euler, problem 135

Given the positive integers, x, y, and z, are consecutive terms of an
arithmetic progression, the least value of the positive integer, n, for which
the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27:

34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27

It turns out that n = 1155 is the least value which has exactly ten solutions.

How many values of n less than one million have exactly ten distinct solutions?
"""

"""
x**2 - y**2 - z**2 = n
y = z + d
x = z + 2 * d
d > 0
z > 0
(z + 2 d) ** 2 - (z + d) ** 2 - z ** 2 = n
(z**2 + 4dz + 4d**2) - (z**2 + 2dz + d**2) - z**2 = n
z**2 + 4dz + 4d**2 -z**2 - 2dz - d**2 - z**2 = n
-z**2 + 2dz + 3d**2 - n = 0
a = -1
b = 2d
c = 3d**2 - n
z = (-b +/- sqrt(b**2-4ac)) / 2a
z = (-2d +/- sqrt(4d**2+4(3d**2-n))) / -2
z = (-2d +/- sqrt(4d**2+12d**2-4n))) / -2
z = (-2d +/- sqrt(16d**2-4n))) / -2
z = (-2d +/- 2 sqrt(4d**2 - n)) / -2
z = (-d +/-  sqrt(4d**2 - n)) / -1
z = d +/- sqrt(4d**2 - n)
4d**2 - n >= 0
n <= 4d**2
4d**2 - n is a perfect square
"""

def main():
    limit = 1000000
    counts = limit * [0]
    for d in range(1, limit / 4 + 1):
        d24 = d ** 2 * 4
        if d24 < limit:
            start = 1
            counts[d24] += 1
        else:
            start = int((d24 - limit) ** 0.5)
        if start < 1:
            start = 1
        for root in xrange(start, 2 * d):
            n = d24 - root ** 2
            if n <= 0:
                break
            if n < limit:
                z = root + d
                y = z + d
                x = y + d
                #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                #assert x**2 - y**2 - z**2 == n
                counts[n] += 1
                if d > root:
                    z = d - root
                    y = z + d
                    x = y + d
                    #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                    #assert x**2 - y**2 - z**2 == n
                    counts[n] += 1
    total = 0
    assert counts[0] == 0
    for val in counts:
        if val == 10:
            total += 1
    print total

if __name__ == "__main__":
    main()

Why is PyPy so slow on this one? Well, maybe it's because all the work happens in one function, if PyPy doesn't JIT a function while it's still running. Or maybe it's the list with a million elements in it.

Finally, euler174.py:

#!/usr/bin/env python

"""Project Euler, problem 174

We shall define a square lamina to be a square outline with a square "hole" so
that the shape possesses vertical and horizontal symmetry.

Given eight tiles it is possible to form a lamina in only one way: 3x3 square
with a 1x1 hole in the middle. However, using thirty-two tiles it is possible
to form two distinct laminae.

If t represents the number of tiles used, we shall say that t = 8 is type L(1)
and t = 32 is type L(2).

Let N(n) be the number of t <= 1000000 such that t is type L(n); for example,
N(15) = 832.

What is sum(N(n)) for 1 <= n <= 10?
"""

from math import ceil
from collections import defaultdict

def gen_laminae(limit):
    """Yield laminae with up to limit squares, as tuples
    (outer, inner, squares)"""
    for outer in range(3, limit // 4 + 2):
        if outer & 1:
            min_min_inner = 1
        else:
            min_min_inner = 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if outer & 1 != min_inner & 1:
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            yield (outer, inner, squares)

def main():
    squares_to_count = defaultdict(int)
    for (outer, inner, squares) in gen_laminae(1000000):
        squares_to_count[squares] += 1
    histogram = defaultdict(int)
    for val in squares_to_count.values():
        if val <= 10:
            histogram[val] += 1
    print sum(histogram.values())

if __name__ == "__main__":
    main()

Perhaps this one is slow because it heavily uses generators and a defaultdict?

Anyway, I'm not claiming any of these are great programs. I know others have much better solutions to these problems. And this code is certainly not optimized for PyPy. They're just examples of small programs that were originally written for CPython, and do work correctly on PyPy, but happen to be much slower on PyPy than on CPython.

Again, note these are the exceptions not the rule. PyPy runs most pure-Python code great. Just not these three programs.

Math
Programming
Python

Comments (0)

Permalink

Moved Slugathon to github

I've been working on Slugathon on-and-off (but mostly off) since 2003. The bare minimum feature set that I consider necessary to do a serious alpha release is almost done. (Release early, release often; but if you release before you actually do anything useful you might just leave a bad first impression and scare off your potential users.) So I've been thinking about installers, and portability to MacOS and Windows, and whether my current hosting solution (svn and Trac provided for free to open source Python projects by WebFaction; thanks WebFaction) would cut it when I had users and possibly more contributors.

I like Trac, a lot. I now strongly prefer Git to Subversion. You can use Git and Trac together with the GitTracPlugin, but it's hard to find a host that has it installed. (SourceForge, for example, now offers both Git and Trac, but not together.) So I decided to go with github. They're the most popular Git host, their paid hosting business seems to be solid enough that they're unlikely to go away anytime soon, they offer free hosting for open source projects, and they have a wiki and an issue tracker.

So this morning I experimentally moved everything from Subversion+Trac to github. I've administered both Svn/Trac and Git/Trac installations at work, so I knew how to do this on a local machine where I had control, but doing things locally is different from doing things on a remote hosting service.

Here's what I did on github:

1. Signed up for an account. (I've poked around github plenty of times, and cloned other people's repositories there to my local box, but I never needed to register until I actually wanted to move my own repository there. Which is exactly the way it should work.)

2. Uploaded an ssh key. I already know ssh so it was just a matter of posting ~/.ssh/id_rsa.pub into a web form.

3. Re-cloned my Subversion repository to my local box, using git-svn, using the –no-metadata option to remove all the ugly "git-svn-id" blobs in the log. I'd done this before when we switched from Subversion to Git at work, but I didn't remember the exact syntax. Luckily it was right there on Github's help page so I didn't even need to check a Git man page.

4. Uploaded my new local Git repository to Github.

5. Realized that I'd forgotten to use the –authors-file option to convert the author metadata from Subversion format (bare username) to Git format (firstname lastname email). Oops, this was also on github's help page but I'd been in too much of a hurry. I poked around github until I found out how to delete a repository, deleted the one I'd just made, redid the git-svn clone with the missing option, and re-uploaded my repository.

6. Basically cut-and-pasted my wiki pages (there were only 8 of them) from Trac to Github's wiki. I did a little tweaking to convert formatting from Trac format to the Textile format that Github uses, but didn't obsess about getting every little format perfect. One thing that github's wiki doesn't support well is images; they let you link to external images, but not post attachments directly into the wiki. So for now the thumbnail screenshots are there, but they don't link to the larger versions.

7. Manually ported all my Trac tickets to github issues. I had 62 tickets, 30 closed and 32 open. Both Trac and the github issue tracker share the same #number format for referring to issues, and I mention ticket numbers inside commit messages, so I thought it was necessary to add all the already-closed issues for historical reasons. I also changed any references to Subversion commit numbers to Git commit ids inside the issues. This was an annoying data entry task, but with only 62 tickets it was faster to just do it by hand than to write some awesome Trac to github conversion script. (Which would require either cooperation from github, or advanced web scraping, since github is a rather fancy web site with JavaScript everywhere.)

8. Changed Slugathon's SourceForge page to point at github rather than WebFaction. (I'm keeping a SourceForge presence for the project because they offer useful things that Github does not, like project mailing lists. Also, if SourceForge ever integrates Git and Trac and I become dissatisfied with Github's new and fairly minimal issue tracker, I might move everything to SourceForge.)

9. Changed the main wiki page at WebFaction to point to github, too. (I will eventually ask WebFaction to delete the project, but first I want to make sure that the move was a good idea. And give Google enough time to cache the version of the page that shows the redirect to github, so people don't think the project just disappeared.)

10. Created a static web page using the instructions at pages.github.com. This feels somewhat redundant with the github wiki, but it gave me a place to park my images.

My conclusion so far: I still like Git a lot more than Subversion. I like github's wiki and issue tracker less than Trac, but they seem to be good enough.

Games
Programming
Python

Comments (0)

Permalink

Finding a missing print server

I have a TrendNet TE100-P1P print server. It's a little $30 gizmo with an Ethernet jack on one end and a parallel printer port on the other, so that you can turn a printer into a network printer without hanging it off a computer. Bought it because my new motherboard didn't have a parallel port. The other benefit is that now the printer works when my main desktop computer is off.

Anyway, we had a power failure, and the printer stopped working. No "lexmark" found in local DNS. No DHCP lease for the TrendNet gizmo.

The gizmo was still there, but I couldn't find it. There's no UI on the device, just the network jack. So I wrote a little script (below) to portscan my LAN for anything that answered on port 631. That found it.

Turns out that when the power goes out, the print server reconfigures its network settings from DHCP to fixed. It keeps its last network settings, but the DHCP / DNS server no longer has any record of them, so you can't find it unless you remember what its last IP was.

Here's the script. Because I had 700+ addresses to check, and it sometimes takes a few seconds for a failed socket connection to timeout, I used 250 threads to make it faster. (I initally tried using a separate thread for each IP, but I hit an OS limit at 255 threads, and decided to use a fixed-size pool and two shared work queues.)

#!/usr/bin/env python

"""Hunting for the TrendNet print server"""

import socket
import threading
import Queue

socket.setdefaulttimeout(5)

port = 631
num_threads = 250

class ConnectThread(threading.Thread):
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.setDaemon(True)

    def run(self):
        while True:
            ip = self.in_queue.get()
            try:
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.connect((ip, port))
            except socket.error:
                self.out_queue.put((ip, False))
            else:
                self.out_queue.put((ip, True))
            self.in_queue.task_done()

def main():
    job_queue = Queue.Queue()
    result_queue = Queue.Queue()
    count = 0
    for net in range(3):
        for addr in range(1, 255 + 1):
            ip = "192.168.%d.%d" % (net, addr)
            count += 1
            job_queue.put(ip)
    for unused in xrange(num_threads):
        ConnectThread(job_queue, result_queue).start()
    job_queue.join()

    for unused in xrange(count):
        ip, status = result_queue.get()
        if status:
            print ip

if __name__ == "__main__":
    main()

Linux
Programming
Python

Comments (0)

Permalink

Drawing rectangles in PyGTK: GDK vs. Cairo

Someone on the PyGTK mailing list just asked which is faster for drawing rectangles, GDK or Cairo.

I wasn't sure, so I wrote a silly little test program.  Note that it's not quite apples-to-apples as the Cairo rectangles have variable transparency while the GDK rectangles are opaque.

On my box, running Gentoo Linux and the latest stable versions of everything, the Cairo version draws 500 rectangles in about 0.01 to 0.03 seconds, while the GDK version takes about 0.13 to 0.14 seconds.  So Cairo is faster.

To run this, just cut-and-paste into an editor window, save the file as rectangles.py, and run "python rectangles.py"

#!/usr/bin/env python

"""GTK rectangle drawing speed test, GDK vs. Cairo

David Ripton 2008-12-03
MIT license
"""

import time
import random

import gtk

NUM_RECTS = 500

handle_id = None

def main():
    window = gtk.Window()
    window.connect("destroy", gtk.main_quit)
    window.set_default_size(800, 600)
    vbox = gtk.VBox()
    window.add(vbox)
    area = gtk.DrawingArea()
    vbox.pack_start(area)
    gdk_button = gtk.Button("GDK")
    gdk_button.connect("clicked", on_gdk_button_clicked, area)
    vbox.pack_start(gdk_button, expand=False)
    cairo_button = gtk.Button("Cairo")
    cairo_button.connect("clicked", on_cairo_button_clicked, area)
    vbox.pack_start(cairo_button, expand=False)
    window.show_all()
    gtk.main()

def on_gdk_button_clicked(button, area):
    global handle_id
    if handle_id is not None:
        area.disconnect(handle_id)
    handle_id = area.connect("expose-event", on_area_exposed_gdk)
    area.queue_draw()

def on_cairo_button_clicked(button, area):
    global handle_id
    if handle_id is not None:
        area.disconnect(handle_id)
    handle_id = area.connect("expose-event", on_area_exposed_cairo)
    area.queue_draw()

def on_area_exposed_gdk(area, event):
    t0 = time.time()
    width, height = area.window.get_size()
    colormap = area.get_colormap()
    gc = area.get_style().fg_gc[gtk.STATE_NORMAL]
    for ii in xrange(NUM_RECTS):
        r = random.randrange(0, 65535 + 1)
        g = random.randrange(0, 65535 + 1)
        b = random.randrange(0, 65535 + 1)
        gc.foreground = colormap.alloc_color(r, g, b)
        x = random.randrange(0, width)
        y = random.randrange(0, height)
        w = random.randrange(0, width - x)
        h = random.randrange(0, height - y)
        area.window.draw_rectangle(gc, True, x, y, w, h)
    t1 = time.time()
    print "gdk drew %d rectangles in %f seconds" % (NUM_RECTS, t1-t0)

def on_area_exposed_cairo(area, event):
    t0 = time.time()
    cr = area.window.cairo_create()
    width, height = area.window.get_size()
    for ii in xrange(NUM_RECTS):
        r = random.random()
        g = random.random()
        b = random.random()
        a = random.random()
        cr.set_source_rgba(r, g, b, a)
        x = random.randrange(0, width)
        y = random.randrange(0, height)
        w = random.randrange(0, width - x)
        h = random.randrange(0, height - y)
        cr.rectangle(x, y, w, h)
        cr.fill()
    t1 = time.time()
    print "cairo drew %d rectangles in %f seconds" % (NUM_RECTS, t1-t0)

if __name__ == "__main__":
    main()

screen shot of the cairo rectangles


Rant: It is way too hard to post code with Wordpress. First I tried the "code" button, but all my indentation was destroyed (bad for any code, fatal for Python). Then I tried the "b-quote" button; same effect. Then I switched to HTML mode and hand-inserted a "pre" tag, which preserved the indentation. But Wordpress then proceded to vandalize my code with "smart" quotes. (Have you ever seen anything with "smart" in its name that actually was?) Luckily it was simple to find and install the wpuntexturize plugin, a few lines of PHP that eradicate Moron Quotes. But why the hell are they there in the first place, let alone enabled by default, let alone enabled by default inside a "pre" tag?

Programming
Python
Rant

Comments (0)

Permalink

AmpChat WxPython support

Stephen Waterbury added a WxPython client to my AmpChat example program. (Mentioned earlier in this post.)

His Mercurial repository is here.

This is still just example code, but now it's an example of multiple things:

  • how to use Twisted AMP
  • how to use Twisted with PyGTK (trivial because gtk2reactor rocks)
  • how to make WxPython's mainloop run in parallel with Twisted's using threads
  • how to write the same simple GUI program in both PyGTK and WxPython
  • how to use Mercurial to contribute to a project without commit rights in the original repo

Anyone want to contribute a PyQT or Tk version?

Programming
Python

Comments (0)

Permalink

I'm no longer maintaining Colossus

I started Colossus in December 1997, almost 11 years ago. Yesterday I turned over leadership of the project to Clemens Katzer. That's a long time to maintain a software project, and it feels weird to me that I finally quit.

Colossus was my first large game project, my first large Java program, my first large GUI program, my first experience working on a complex game AI, and my first time managing a significant multi-programmer software project. Somehow it worked out pretty well, for the first few years. We had a pretty full-featured, stable Titan game, with somewhat dumb but working AI, and support for lots of cool variants. We had a rotating team of up to half a dozen developers at a time working on the game, and it was fun.

But then I added network play, and I didn't do a very good job. At the time there were no Java remote method libraries that had the features we needed, so I wrote everything myself with a simple string socket protocol, with lots of receiver threads. But I wasn't rigorous enough with thread synchronization, which meant walking the line between deadlock and corrupted game data. And gaining acceptable performance meant that a lot of logic that existed on the server side needed to be reproduced on the client side, but excessive coupling in the code base meant this couldn't be done cleanly and a lot of code got duplicated. I tried and tried to fix the mess, but couldn't put Humpty-Dumpty together again.

I finally decided to start over, in a better programming language (Python), with a better networking framework (Twisted), with a better overall design paradigm (game events flowing from the server to the client, using the Observer pattern to reduce coupling and allow complete reuse of the core game logic between the client and server) and with safe sane single-threaded code. Thus was born Slugathon. Unfortunately, as a new father I didn't have nearly as much free time as I did back when I started Colossus, and so Slugathon still isn't finished.

With Slugathon unfinished I felt obligated to keep maintaining Colossus, but I didn't really have my heart in it. I figured my job was to keep the project alive until a successor showed up to take it over. We had several guys on the team who were technically qualified to take over, but none seemed likely to put in the necessary amount of time on a consistent basis. Then Clemens joined, and not only submitted code for his pet features (the fun part that everyone wants to do) but started watching the bug tracker like a hawk, and talking to users about their bugs, and making special test builds so that users could test that their obscure hard-to-reproduce bugs were fixed, and adding documentation of which features went into which releases, and all the other not-fun crap that 90% of small volunteer open source projects can't find anyone to do. Around the same time, longtime contributor Peter Becker did a giant refactoring that reduced the amount of code duplication from disgusting to merely gross. And Clemens switched the server side from listener threads to NIO, reducing the total thread count from insane to merely scary.

Which meant I was no longer needed on Colossus. So I'm committing to releasing a stable, network-playable version of Slugathon by the end of 2008. And hoping that the Colossus team can continue cleaning up the mess I left them. Hopefully someday soon we'll have two stable networked Titan games instead of zero.

(By the way, I played a couple of four-human 2-AI networked Abyssal6 Colossus games this morning. It was great fun, even though both games had technical difficulties.)

Games
Programming
Python

Comments (0)

Permalink

Converting a svn working copy to a git-svn repo

I'm at Pycon, where we have wlan in the common areas but not in the hotel rooms (unless you pay $12/day extra). I'd like to be able to continue working on Slugathon in my room. I have a deep psychological need to check my changes in frequently.

For simple projects (one programmer, trunk only, no branches), svn is fine, until you lose your network connection to the svn server. For disconnected work on a project that lives in svn, you want git-svn. (Or maybe svk, but I don't know svk.)

Anyway, here's the recipe, starting from a working copy in ~/src/Slugathon

  1. cd ~/src/Slugathon
  2. svn status, to make sure I have no changes that aren't checked in
  3. svn info, to find the URL to the svn repository
  4. mkdir ~/src/Slugathon-git
  5. cd ~/src/Slugathon-git
  6. git svn init -t tags -b branches -T trunk URL (using URL from step 3)
  7. git svn fetch (this can be slow, for a large project, and you need net access)
  8. git log, to confirm that the history is right
  9. Ensure that the working copy has the expected files
  10. (optional) rm -rf ~/src/Slugathon; mv ~/src/Slugathon-git ~/src/Slugathon

Of course, this only helps if you know git and git-svn.  I know how to use them day-to-day, but I always have to lookup infrequent operations like this.

Programming

Comments (0)

Permalink

Twisted AMP chat example

I wrote a simple little PyGTK chat program a couple of months ago, to learn the Twisted AMP protocol.

Someone just asked about bidirectional AMP on the Twisted mailing list, which reminded me that I should publicize this example a bit.

It's in a Mercurial repository (browseable over the web if you don't want to bother with Mercurial) at
http://ripton.net/hg/ampchat

Programming
Python

Comments (1)

Permalink

"+" is valid in an email address, dammit

Dreamhost supports email addresses of the form base+whatever@domain.com

The mail goes to the same address as base@domain.com, but you can filter on the +whatever

It astounds me how many web pages refuse to accept an email address with a + in it as valid. It's valid. Really. I get the mail.

Parsing whether something is a valid email address is hard. (See the O'Reilly Mastering Regular Expressions book for a serious attempt.) If you're not willing to go to those lengths, Don't Try. You'll just make people with unusual-looking but valid email addresses mad.

If you really need the address to be valid, then send mail to it, and make the user do something to prove he received the mail.

If you don't, then trust the user. He knows more about his email address than you do.

Programming
Rant

Comments (0)

Permalink

SLOC counts for various version control systems

At work I'm currently using Git and Subversion and Mercurial for version control. Other people there are using Bazaar and CVS. Too many programs to learn! We need to eventually standardize on a primary third-generation distributed tool to replace Subversion, but it hasn't happened yet.

There are lots of opinions on which third-generation distributed version control system should replace Subversion. (I like Mercurial a lot, respect Git's power despite its ridiculous learning curve, and haven't used Bazaar enough to have a strong opinion about it, except that it used to be unusably slow.)

Here's one data point you won't see in all the comparisons: code size as measured by sloccount:

Name Version SLOC
CVS 1.12.12 136,873
Subversion 1.4.6 492,564
Mercurial 0.9.5 40,682
Git 1.5.4rc1 142,669
Bzr 1.1 125,637

Why is Subversion more than triple the size of any of its main competitors? C, two backends, language bindings, separate libraries for client and server, Apache / Webdav integration, separate admin tool, yadda yadda yadda. Do you really need all that?

Why is Mercurial less than a third the size of any of its main competitors? It's in Python, and there's definitely an aversion to feature creep, to the point where lots of features that would be core in other programs are extensions in Mercurial. (Note that the extensions that are bundled with the project are included in its SLOC number, while the ones you'd need to download separately are not.) What else? It is missing some features compared to, say, Git, but it's sure not missing two-thirds of them. Why is Bzr, which is also written in Python and has a roughly comparable feature set, three times as big?

How important is lean code? What features can the Mercurial team implement by the time Subversion merge tracking is done and stable? Is there actually anyone who can name all 143 programs that git put in my /usr/bin/ directory?

Linux
Programming

Comments (0)

Permalink