
PyPy 2.5 Speed Test

I saw there was a new version of PyPy available, and realized I hadn't benchmarked it in a while. So here's a quick test of CPython 2.7 versus PyPy 2.2.1 (the default on Ubuntu 14.04 LTS) and PyPy 2.5 (the latest and greatest, unless you're reading this post in the future).

The benchmark is simply running all of my Python Project Euler solutions on each of these Python versions, throwing away any that didn't work or took longer than 60 seconds on any of them, and then dumping all the runtimes. There is definitely some random noise in the numbers, but it should mostly even out over many programs. (There were 145 that made the time cut this time.)

I ran these on Ubuntu 14.04 on an Amazon EC2 c4.2xlarge instance. (That's a Haswell Xeon CPU running somewhere in the 2.9 – 3.2 GHz range depending on whether Turbo Boost kicks in, 4 or 8 cores depending on whether you count HyperThreading, and 15 GB of RAM.)

I'll give the results first in case you don't want to scroll past the big table: PyPy 2.5 177.35s, PyPy 2.2.1 208.11s, CPython 2.7 616.73s. So PyPy 2.5 is about 17% faster than PyPy 2.2.1 on this benchmark, a nice sign that PyPy keeps getting faster even though much of the low-hanging fruit was picked long ago. And PyPy is about 3.5 times as fast, or 250% faster, than CPython 2.7 on this benchmark.

I don't see any interesting cases where CPython significantly beats PyPy. We used to see cases like that, but as PyPy has matured it's become more consistently fast across a larger set of programs. I guess the closest thing to an interesting result is euler225.py, where CPython beats PyPy 23.74s to 27.75s. Not a huge difference, but worth a look. I may try to figure out what's going on and post a followup.

script PyPy 2.2.1 PyPy 2.5 CPython 2.7.6
euler1.py 0.31 0.10 0.31
euler2.py 0.31 0.10 0.10
euler3.py 0.20 0.10 0.10
euler4.py 0.10 0.10 0.10
euler5.py 0.10 0.10 0.10
euler6.py 0.10 0.10 0.10
euler7.py 0.10 0.10 0.20
euler8.py 0.10 0.10 0.10
euler9.py 0.10 0.10 0.10
euler10.py 0.99 0.71 3.20
euler11.py 0.10 0.10 0.10
euler13.py 0.10 0.10 0.10
euler14.py 1.50 0.90 1.50
euler15.py 0.10 0.10 0.10
euler16.py 0.10 0.10 0.10
euler18.py 0.10 0.10 0.10
euler19.py 0.10 0.10 0.10
euler20.py 0.10 0.10 0.10
euler21.py 0.10 0.10 0.10
euler22.py 0.10 0.10 0.10
euler23.py 0.60 0.60 4.91
euler24.py 1.60 1.00 2.11
euler25.py 0.20 0.10 0.10
euler26.py 1.71 1.10 1.91
euler27.py 1.10 0.60 4.71
euler28.py 0.10 0.10 0.10
euler29.py 0.10 0.10 0.10
euler30.py 0.70 0.30 2.21
euler32.py 0.60 0.50 1.30
euler33.py 0.10 0.10 0.10
euler34.py 1.50 1.00 4.91
euler35.py 2.31 1.20 10.42
euler36.py 0.30 0.30 0.90
euler37.py 2.41 1.70 5.01
euler38.py 0.40 0.30 0.50
euler39.py 0.10 0.10 0.10
euler40.py 0.20 0.20 0.30
euler41.py 1.70 0.60 1.60
euler42.py 0.10 0.10 0.10
euler43.py 5.91 4.31 11.52
euler44.py 0.30 0.30 1.10
euler45.py 0.40 0.40 0.60
euler46.py 0.20 0.10 0.40
euler47.py 0.40 0.30 1.20
euler48.py 0.10 0.10 0.10
euler49.py 0.20 0.10 0.20
euler50.py 0.90 0.70 27.36
euler51.py 4.41 2.61 9.72
euler52.py 0.20 0.10 0.30
euler53.py 0.10 0.10 0.10
euler54.py 0.20 0.20 0.10
euler55.py 0.30 0.10 0.10
euler56.py 0.20 0.10 0.30
euler57.py 0.10 0.10 0.30
euler58.py 2.61 2.41 28.96
euler59.py 2.21 1.20 9.72
euler61.py 0.20 0.10 0.10
euler62.py 0.20 0.10 0.10
euler63.py 0.10 0.10 0.10
euler64.py 3.11 2.91 38.38
euler65.py 0.10 0.10 0.10
euler66.py 0.70 0.70 5.71
euler67.py 0.10 0.10 0.10
euler68.py 0.10 0.10 0.10
euler69.py 0.10 0.10 0.10
euler70.py 0.20 0.20 0.30
euler71.py 0.10 0.10 0.40
euler72.py 3.51 2.91 21.84
euler73.py 1.90 1.71 16.23
euler75.py 7.12 0.60 1.00
euler77.py 0.20 0.30 0.20
euler79.py 0.10 0.10 0.10
euler80.py 0.10 0.10 0.20
euler81.py 0.10 0.10 0.10
euler82.py 0.20 0.20 0.20
euler83.py 0.20 0.10 0.40
euler84.py 0.90 0.80 13.83
euler85.py 0.90 1.00 5.11
euler87.py 0.20 0.30 0.40
euler88.py 3.31 1.40 2.21
euler89.py 0.10 0.10 0.10
euler92.py 13.93 7.72 46.80
euler93.py 2.11 1.20 4.01
euler94.py 4.31 3.41 5.71
euler95.py 11.62 13.33 53.92
euler97.py 0.20 0.20 0.80
euler98.py 0.20 0.20 0.20
euler99.py 0.10 0.10 0.10
euler100.py 0.10 0.10 0.10
euler101.py 0.20 0.20 0.10
euler102.py 0.10 0.10 0.10
euler103.py 0.10 0.10 0.10
euler104.py 0.30 0.30 0.60
euler105.py 0.80 0.20 0.20
euler106.py 0.80 0.20 0.20
euler107.py 0.20 0.20 0.10
euler108.py 0.50 0.50 2.81
euler109.py 0.30 0.20 1.30
euler111.py 0.80 0.30 7.02
euler112.py 2.01 0.80 4.81
euler114.py 0.10 0.10 0.10
euler115.py 0.10 0.10 0.20
euler116.py 0.10 0.10 0.10
euler117.py 0.10 0.10 0.10
euler119.py 0.10 0.10 0.10
euler120.py 0.10 0.10 0.10
euler121.py 0.10 0.10 0.10
euler122.py 13.63 13.73 18.34
euler123.py 1.10 1.11 2.21
euler124.py 0.40 0.40 0.90
euler125.py 0.20 0.30 0.40
euler126.py 0.70 0.80 5.01
euler135.py 0.20 0.20 0.60
euler136.py 8.12 6.31 36.97
euler142.py 0.10 0.10 0.10
euler143.py 0.10 0.10 0.10
euler147.py 0.10 0.10 0.30
euler149.py 9.22 7.62 17.34
euler150.py 0.10 0.10 0.20
euler162.py 0.10 0.10 0.10
euler171.py 0.10 0.10 0.10
euler172.py 0.30 0.30 0.20
euler173.py 0.10 0.10 0.30
euler174.py 0.50 0.70 0.80
euler181.py 0.10 0.10 0.10
euler188.py 0.20 0.10 0.10
euler190.py 0.10 0.10 0.10
euler202.py 0.10 0.10 0.10
euler205.py 0.40 0.40 0.30
euler206.py 4.51 4.21 15.63
euler207.py 0.20 0.20 0.20
euler222.py 0.10 0.10 0.10
euler225.py 27.75 27.75 23.74
euler227.py 0.10 0.10 0.60
euler230.py 0.10 0.10 0.10
euler233.py 0.10 0.10 0.10
euler234.py 1.30 1.00 2.51
euler235.py 0.10 0.10 0.10
euler240.py 3.41 1.91 7.22
euler267.py 0.20 0.30 0.20
euler286.py 9.02 8.12 54.39
euler345.py 15.13 21.54 27.76
euler346.py 1.00 0.80 1.80
euler347.py 6.41 6.92 20.04
euler371.py 0.10 0.10 0.10
total 208.11 177.35 616.73
wins 87 127 71

Programming
Python


Why is Go slower than Python for this parallel math code?

I was recently messing around with one of my old Project Euler solutions (specifically, problem 215) to test PyPy's new software transactional memory feature, and decided to port it to Go to see how the code compared.

The first question I had was how to do generators in Go. Python has the magic yield statement, so you can do things like this:

def gen_ways(width, blocks):
    """Generate all possible permutations of items in blocks that add up
    to width, as strings."""
    if width == 0:
        yield ""
    else:
        for block in blocks:
            if block <= width:
                for way in gen_ways(width - block, blocks):
                    yield str(block) + way

The Go equivalent is a function that returns an unbuffered channel, with a goroutine feeding the output into it. It's not quite as terse, but then it didn't require adding extra support to the language, either.

/* Generate all possible permutations of items in blocks that add up to width,
as strings. */
func gen_ways(width int, blocks []int) chan string {
    out := make(chan string)
    go func() {
        if width == 0 {
            out <- ""
        } else {
            for _, block := range blocks {
                if block <= width {
                    for way := range gen_ways(width-block, blocks) {
                        out <- strconv.Itoa(block) + way
                    }
                }
            }
        }
        close(out)
    }()
    return out
}

Longer, but it's very nice to be able to do this without an additional keyword. One gripe: the channel is logically output-only, but the code doesn't work unless I make it bidirectional. (Because the function is recursive?)
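On the output-only gripe: one pattern that does seem to work is declaring the return type as receive-only while keeping the channel bidirectional inside the function; the conversion to <-chan string happens automatically at the return statement, even when the function is recursive. A sketch (renamed to genWays to avoid clashing with the version above):

```go
package main

import (
	"fmt"
	"strconv"
)

// Same generator, but the signature promises a receive-only channel.
// Inside the function, out stays bidirectional so the goroutine can
// send on it and close it; the return converts it to <-chan string.
func genWays(width int, blocks []int) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		if width == 0 {
			out <- ""
			return
		}
		for _, block := range blocks {
			if block <= width {
				for way := range genWays(width-block, blocks) {
					out <- strconv.Itoa(block) + way
				}
			}
		}
	}()
	return out
}

func main() {
	count := 0
	for way := range genWays(9, []int{2, 3}) {
		fmt.Println(way)
		count++
	}
	fmt.Println(count, "ways") // 5 compositions of 9 from {2, 3}
}
```

Callers can still range over the result; they just can't accidentally send on it.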

The next function to port looked like this:

def build_mirrors(ways):
    mirrored = set()
    mirrors = set()
    for way in ways:
        rev = "".join(reversed(way))
        if way != rev:
            low, high = sorted([way, rev])
            mirrored.add(low)
            mirrors.add(high)
    return mirrored, mirrors

Go lacks a builtin set type. I considered just building my own with a map, but instead pulled in github.com/deckarep/golang-set, which worked okay. One annoyance was that I couldn't just declare a set and have a usable set; I had to call mapset.NewSet(). I can't really fault the author of the set package for that, though, since Go's builtin maps, arrays, and chans have the same problem: zero values that give runtime errors when you try to use them aren't very useful. So much for static typing helping find errors at compile time. Anyway, I ended up with this:

func build_mirrors(ways []string) (mirrored, mirrors mapset.Set) {
    mirrored = mapset.NewSet()
    mirrors = mapset.NewSet()
    for _, way := range ways {
        rev := reverse(way)
        if way < rev {
            mirrored.Add(way)
            mirrors.Add(rev)
        } else if rev < way {
            mirrored.Add(rev)
            mirrors.Add(way)
        }
    }
    return
}
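For comparison, here's roughly what the roll-your-own map-based set looks like. The names (StringSet, NewStringSet) are my own invention, not any standard API; the empty-struct values take no space, so membership is just key presence. The zero-value gripe applies here too: a nil StringSet panics on Add until you call make.

```go
package main

import "fmt"

// StringSet is a minimal set built on Go's builtin map type.
// struct{} values occupy no memory; the keys are the members.
type StringSet map[string]struct{}

func NewStringSet() StringSet { return make(StringSet) }

func (s StringSet) Add(v string)           { s[v] = struct{}{} }
func (s StringSet) Contains(v string) bool { _, ok := s[v]; return ok }

func main() {
	mirrors := NewStringSet()
	mirrors.Add("322")
	fmt.Println(mirrors.Contains("322")) // true
	fmt.Println(mirrors.Contains("223")) // false
}
```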

The next concept I had to translate was a group of processes or threads running in parallel, reading from a common input queue and writing to a common results queue. Here's the Python function:

def count_combos(in_queue, out_queue, compatible_ways):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, count the number of results, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            count += 1
        out_queue.put((height, prev_way, count))

and the blob of code that calls it:

in_queue = multiprocessing.Queue()
out_queue = multiprocessing.Queue()
cpus = multiprocessing.cpu_count()

half = (height - 1) // 2
for way in ways:
    if way not in mirrors:
        in_queue.put((half, way))
# sentinels
for unused in xrange(cpus):
    in_queue.put((None, None))
procs = []
for unused in xrange(cpus):
    proc = multiprocessing.Process(target=count_combos,
                                   args=(in_queue,
                                         out_queue,
                                         compatible_ways))
    proc.daemon = True
    proc.start()
    procs.append(proc)

I think the called code is fine, but the calling code is kind of ugly, because of the way I need to manually track the process objects to clean them up later. Also, it might have been better with more explicit sentinels than just lazily using None.

The Go version of the called code is similar, except that I had to set up some structs rather than just passing tuples around.

type (
    height_way struct {
        height int
        way    string
    }
    height_way_count struct {
        height int
        way    string
        count  uint64
    }
)

const SENTINEL_HEIGHT = -1

/* Read height_way structs from in_queue, call gen_combos
   on each, count the number of results, and put height_way_count
   structs on out_queue. */
func count_combos(in_queue <-chan height_way,
    out_queue chan<- height_way_count,
    compatible_ways map[string][]string) {
    for {
        hw := <-in_queue
        height := hw.height
        way := hw.way
        if height == SENTINEL_HEIGHT {
            return
        }
        var count uint64
        for _ = range gen_combos(height, way, compatible_ways) {
            count += 1
        }
        out_queue <- height_way_count{height, way, count}
    }
}

The Go version of the calling code is nicer, because goroutines are builtin syntax so you don't need to do manual process management.

    const QUEUE_SIZE = 9999

    in_queue := make(chan height_way, QUEUE_SIZE)
    out_queue := make(chan height_way_count, QUEUE_SIZE)

    half := (height - 1) / 2
    for _, way := range ways {
        if !mirrors.Contains(way) {
            in_queue <- height_way{half, way}
        }
    }
    for ii := 0; ii < maxprocs; ii++ {
        in_queue <- height_way{SENTINEL_HEIGHT, ""}
    }
    for ii := 0; ii < maxprocs; ii++ {
        go count_combos(in_queue, out_queue, compatible_ways)
    }

It was surprisingly tricky to make this work without deadlocking. At first I forgot to give the channels a buffer size, which made them unbuffered, so they blocked the moment I put data on them, since the goroutines that would receive the data weren't running yet. Unbuffered and buffered channels are fundamentally different beasts.
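A tiny illustration of that difference (the capacity of 2 is arbitrary):

```go
package main

import "fmt"

func main() {
	// Unbuffered: make(chan int). A send blocks until a receiver is
	// ready, so filling a queue before starting any workers deadlocks.
	// Buffered: sends succeed immediately while there is spare capacity.
	queue := make(chan int, 2)
	queue <- 1 // fine: buffer has room
	queue <- 2 // fine: buffer is now full
	fmt.Println(len(queue), cap(queue)) // 2 2
	// A third send would block right here until something received.
	fmt.Println(<-queue) // 1: buffered channels deliver in FIFO order
}
```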

Finally, when I got everything working, the Go program only ran on a single CPU core. I remembered seeing GOMAXPROCS in the docs, and running "GOMAXPROCS=4 ./euler215" worked. But when I tried setting it from inside the program with runtime.GOMAXPROCS(4), the setting changed but the program never actually used multiple cores. So runtime.GOMAXPROCS is an attractive nuisance; it's documented, and looks like it should work, but (at least in Go 1.2.1 on Ubuntu) doesn't really. (This may be issue 1492, but that bug is listed as fixed.)

Anyway, once the program gave the correct answer on the trivial problem size (width 9, height 3), it was time to run it with the full problem size (width 32, height 10). On my (slow) 3 GHz Phenom II quad-core, CPython 2.7 takes 21:13 and PyPy 2.3.1 takes 7:06, so I was figuring the Go version should finish in a couple of minutes.

Nope. Half an hour later, I was wondering what was wrong. An hour later, I was still wondering. The program did finish eventually, and gave the right answer (once I changed a few counters from int to uint64 to keep them from rolling over), but it took 81 minutes. Almost 4 times as slow as CPython, and almost 12 times as slow as PyPy.

So I instrumented the Go version for profiling, following the directions from the Go blog. With profiling hooked in, I ran "time GOMAXPROCS=4 ./euler215 -cpuprofile prof.out -width 25 -height 10" so it would finish fairly quickly, then ran "go tool pprof ./euler215 prof.out" to enter the interactive profiler and typed "top25" to see:

(pprof) top25
Total: 2883 samples
     226   7.8%   7.8%      226   7.8% scanblock
     221   7.7%  15.5%      221   7.7% etext
     218   7.6%  23.1%      739  25.6% runtime.mallocgc
     195   6.8%  29.8%      195   6.8% runtime.xchg
     185   6.4%  36.2%      191   6.6% runtime.settype_flush
     181   6.3%  42.5%     1631  56.6% main.func·002
     173   6.0%  48.5%      173   6.0% sweepspan
     146   5.1%  53.6%      146   5.1% runtime.casp
     112   3.9%  57.5%      208   7.2% runtime.markallocated
     108   3.7%  61.2%      770  26.7% cnew
      98   3.4%  64.6%       98   3.4% markonly
      95   3.3%  67.9%      692  24.0% runtime.growslice
      92   3.2%  71.1%       92   3.2% runtime.memclr
      70   2.4%  73.5%      330  11.4% runtime.makeslice
      53   1.8%  75.4%      101   3.5% runtime.unlock
      51   1.8%  77.1%      166   5.8% runtime.chansend
      45   1.6%  78.7%      122   4.2% runtime.lock
      43   1.5%  80.2%      159   5.5% runtime.chanrecv
      35   1.2%  81.4%      597  20.7% growslice1
      31   1.1%  82.5%       31   1.1% runtime.memmove
      29   1.0%  83.5%       29   1.0% schedule
      25   0.9%  84.4%       25   0.9% park0
      24   0.8%  85.2%       24   0.8% flushptrbuf
      23   0.8%  86.0%       43   1.5% runtime.mapaccess1_faststr
      17   0.6%  86.6%       17   0.6% dequeue

Meh. It's spending a lot of time making slices. Growing slices. Clearing memory. Allocating memory. Sending and receiving on channels. Basically, all down inside the Go runtime, not in my code. Not a whole lot of useful information on what to fix. Maybe I'm doing something incorrect with maps that's causing excessive memory churn. Or maybe Go maps are just too slow.
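One suspect behind all that growslice and mallocgc time: slices that grow element by element from zero capacity, which forces repeated reallocation and copying. When the final size is known, handing append a slice with the right capacity up front avoids that, but the length-vs-capacity distinction in make matters. A sketch of the pitfall (the values are just illustrative):

```go
package main

import "fmt"

func main() {
	parts := []string{"2", "2", "3", "2"}

	// make([]string, n) gives LENGTH n: n empty strings that append
	// will add after, doubling the apparent size of the result.
	padded := make([]string, 4)
	padded = append(padded, parts...)
	fmt.Println(len(padded)) // 8: four "" plus the four real elements

	// make([]string, 0, n) gives length 0 and CAPACITY n: append
	// fills it in place with no reallocation.
	exact := make([]string, 0, 4)
	exact = append(exact, parts...)
	fmt.Println(len(exact), cap(exact)) // 4 4
}
```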

Anyway, I'll dump the full code here, in case anyone wants to see the full example. Here's the Python program:

#!/usr/bin/env python

"""Project Euler, problem 215

Consider the problem of building a wall out of 2x1 and 3x1 bricks
(horizontal × vertical dimensions) such that, for extra strength, the gaps
between horizontally adjacent bricks never line up in consecutive layers, i.e.
never form a "running crack".

For example, the following 9x3 wall is not acceptable due to the running
crack shown in red:

3222
2232
333

There are eight ways of forming a crack-free 9x3 wall, written W(9,3) = 8.

Calculate W(32,10).
"""

import sys
import multiprocessing

try:
    import psyco
    psyco.full()
except ImportError:
    pass


sys.setrecursionlimit(100)


def gen_ways(width, blocks):
    """Generate all possible permutations of items in blocks that add up
    to width, as strings."""
    if width == 0:
        yield ""
    else:
        for block in blocks:
            if block <= width:
                for way in gen_ways(width - block, blocks):
                    yield str(block) + way


def build_mirrors(ways):
    mirrored = set()
    mirrors = set()
    for way in ways:
        rev = "".join(reversed(way))
        if way != rev:
            low, high = sorted([way, rev])
            mirrored.add(low)
            mirrors.add(high)
    return mirrored, mirrors


def find_cracks(way):
    """Return the set of indexes where cracks occur in way"""
    result = set()
    total = 0
    for ch in way[:-1]:
        total += int(ch)
        result.add(total)
    return result


def crack_free(tup1, tup2, cracks):
    """Return True iff tup1 and tup2 can be adjacent without making a
    crack."""
    return not cracks[tup1].intersection(cracks[tup2])


def find_compatible_ways(way, ways, cracks):
    """Return a list of crack-free adjacent ways for way"""
    result = []
    for way2 in ways:
        if crack_free(way, way2, cracks):
            result.append(way2)
    return result


def build_compatible_ways(ways):
    cracks = {}
    for way in ways:
        cracks[way] = find_cracks(way)
    print "done generating %d cracks" % len(cracks)
    compatible_ways = {}
    compatible_ways[()] = ways
    for way in ways:
        compatible_ways[way] = find_compatible_ways(way, ways, cracks)
    return compatible_ways


def gen_combos(height, prev_way, compatible_ways):
    """Generate all ways to make a crack-free wall of size (width, height),
    as height-lists of width-strings."""
    if height == 0:
        return
    elif height == 1:
        for way in compatible_ways[prev_way]:
            yield [way]
    else:
        for way in compatible_ways[prev_way]:
            for combo in gen_combos(height - 1, way, compatible_ways):
                yield [way] + combo


def count_combos(in_queue, out_queue, compatible_ways):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, count the number of results, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            count += 1
        out_queue.put((height, prev_way, count))


def count_combos_memo(in_queue, out_queue, compatible_ways, memo):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, chain the result of memo to the last result in the combo
    to get the total count, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            last = combo[-1]
            count += memo[last]
        out_queue.put((height, prev_way, count))


def W(width, height):
    """Return the number of ways to make a crack-free wall of size
    (width, height)."""
    ways = sorted(gen_ways(width, [2, 3]))
    print "done generating %d ways" % len(ways)

    mirrored, mirrors = build_mirrors(ways)
    print "have %d mirror images " % (len(mirrored))

    compatible_ways = build_compatible_ways(ways)
    print "done generating %d compatible_ways" % sum(map(
        len, compatible_ways.itervalues()))

    in_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()

    cpus = multiprocessing.cpu_count()

    half = (height - 1) // 2
    for way in ways:
        if way not in mirrors:
            in_queue.put((half, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = multiprocessing.Process(target=count_combos,
                                       args=(in_queue,
                                             out_queue,
                                             compatible_ways))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    half_memo = {}

    num_ways = len(ways) - len(mirrors)
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        half_memo[prev_way] = count
        if prev_way in mirrored:
            half_memo["".join(reversed(prev_way))] = count
        print "(%d/%d) %s mirrored=%d count=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count)

    for proc in procs:
        proc.join()

    rest = (height - 1) - half

    for way in ways:
        if way not in mirrors:
            in_queue.put((rest, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = multiprocessing.Process(target=count_combos_memo, args=(
                                       in_queue, out_queue, compatible_ways,
                                       half_memo))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    total = 0
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        if prev_way in mirrored:
            count *= 2
        total += count
        print "(%d/%d) %s mirrored=%d count=%d total=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count, total)
    for proc in procs:
        proc.join()
    return total


def main():
    try:
        width = int(sys.argv[1])
    except IndexError:
        width = 32
    try:
        height = int(sys.argv[2])
    except IndexError:
        height = 10
    print W(width, height)


if __name__ == "__main__":
    main()

And here's the Go version:

/* Project Euler, problem 215

Consider the problem of building a wall out of 2x1 and 3x1 bricks
(horizontal × vertical dimensions) such that, for extra strength, the gaps
between horizontally adjacent bricks never line up in consecutive layers, i.e.
never form a "running crack".

For example, the following 9x3 wall is not acceptable due to the running
crack shown in red:

3222
2232
333

There are eight ways of forming a crack-free 9x3 wall, written W(9,3) = 8.

Calculate W(32,10).
*/

package main

import (
    "fmt"
    "github.com/deckarep/golang-set"
    "os"
    "runtime"
    "strconv"
    "flag"
    "log"
    "runtime/pprof"
)

type (
    height_way struct {
        height int
        way    string
    }
    height_way_count struct {
        height int
        way    string
        count  uint64
    }
)

const SENTINEL_HEIGHT = -1
const QUEUE_SIZE = 9999

/* Generate all possible permutations of items in blocks that add up to width,
as strings. */
func gen_ways(width int, blocks []int) chan string {
    out := make(chan string)
    go func() {
        if width == 0 {
            out <- ""
        } else {
            for _, block := range blocks {
                if block <= width {
                    for way := range gen_ways(width-block, blocks) {
                        out <- strconv.Itoa(block) + way
                    }
                }
            }
        }
        close(out)
    }()
    return out
}

func reverse(str string) (result string) {
    for _, ch := range str {
        result = string(ch) + result
    }
    return
}

func build_mirrors(ways []string) (mirrored, mirrors mapset.Set) {
    mirrored = mapset.NewSet()
    mirrors = mapset.NewSet()
    for _, way := range ways {
        rev := reverse(way)
        if way < rev {
            mirrored.Add(way)
            mirrors.Add(rev)
        } else if rev < way {
            mirrored.Add(rev)
            mirrors.Add(way)
        }
    }
    return
}

/* Return the set of indexes where cracks occur in way */
func find_cracks(way string) (result mapset.Set) {
    result = mapset.NewSet()
    total := 0
    for ii := 0; ii < len(way)-1; ii++ {
        str1 := way[ii : ii+1]
        num, _ := strconv.Atoi(str1)
        total += num
        result.Add(total)
    }
    return result
}

/* Return True iff tup1 and tup2 can be adjacent without making a crack. */
func crack_free(tup1 string, tup2 string, cracks map[string]mapset.Set) bool {
    return cracks[tup1].Intersect(cracks[tup2]).Cardinality() == 0
}

/* Return a list of crack-free adjacent ways for way */
func find_compatible_ways(way string,
    ways []string,
    cracks map[string]mapset.Set) (result []string) {
    for _, way2 := range ways {
        if crack_free(way, way2, cracks) {
            result = append(result, way2)
        }
    }
    return result
}

func build_compatible_ways(ways []string) map[string][]string {
    cracks := make(map[string]mapset.Set)
    for _, way := range ways {
        cracks[way] = find_cracks(way)
    }
    fmt.Printf("done generating %d cracks\n", len(cracks))
    compatible_ways := make(map[string][]string)
    compatible_ways[""] = ways
    for _, way := range ways {
        compatible_ways[way] = find_compatible_ways(way, ways, cracks)
    }
    return compatible_ways
}

/* Generate all ways to make a crack-free wall of size (width, height),
   as height-arrays of width-strings. */
func gen_combos(height int,
    prev_way string,
    compatible_ways map[string][]string) chan []string {
    out := make(chan []string)
    go func() {
        if height == 0 {
            close(out) // close even in the base case, or a caller ranging over the channel would block forever
            return
        } else if height == 1 {
            for _, way := range compatible_ways[prev_way] {
                wayarray := []string{way}
                out <- wayarray
            }
        } else {
            for _, way := range compatible_ways[prev_way] {
                for combo := range gen_combos(height-1, way, compatible_ways) {
                    wayarray := make([]string, height)
                    wayarray = append(wayarray, way)
                    for _, x := range combo {
                        wayarray = append(wayarray, x)
                    }
                    out <- wayarray
                }
            }
        }
        close(out)
    }()
    return out
}

/* Read height_way structs from in_queue, call gen_combos
   on each, count the number of results, and put height_way_count
   structs on out_queue. */
func count_combos(in_queue <-chan height_way,
    out_queue chan<- height_way_count,
    compatible_ways map[string][]string) {
    for {
        hw := <-in_queue
        height := hw.height
        way := hw.way
        if height == SENTINEL_HEIGHT {
            return
        }
        var count uint64
        for _ = range gen_combos(height, way, compatible_ways) {
            count += 1
        }
        out_queue <- height_way_count{height, way, count}
    }
}

/* Read tuples of (height, prev_way) from in_queue, call gen_combos
   on each, chain the result of memo to the last result in the combo
   to get the total count, and put the count on out_queue. */
func count_combos_memo(in_queue <-chan height_way,
    out_queue chan<- height_way_count,
    compatible_ways map[string][]string,
    memo map[string]uint64) {
    for {
        hw := <-in_queue
        height := hw.height
        way := hw.way
        if height == SENTINEL_HEIGHT {
            return
        }
        var count uint64
        for combo := range gen_combos(height, way, compatible_ways) {
            last := combo[len(combo)-1]
            count += memo[last]
        }
        out_queue <- height_way_count{height, way, count}
    }
}

/* Return the number of ways to make a crack-free wall of size
   (width, height). */
func W(width, height int) uint64 {
    var ways []string
    for way := range gen_ways(width, []int{2, 3}) {
        ways = append(ways, way)
    }
    fmt.Printf("done generating %d ways\n", len(ways))

    mirrored, mirrors := build_mirrors(ways)
    fmt.Printf("have %d mirror images\n", mirrored.Cardinality())

    compatible_ways := build_compatible_ways(ways)
    var total uint64
    for _, way := range compatible_ways {
        total += uint64(len(way))
    }
    fmt.Printf("done generating %d compatible_ways\n", total)

    in_queue := make(chan height_way, QUEUE_SIZE)
    out_queue := make(chan height_way_count, QUEUE_SIZE)

    half := (height - 1) / 2
    for _, way := range ways {
        if !mirrors.Contains(way) {
            in_queue <- height_way{half, way}
        }
    }
    maxprocs := runtime.NumCPU()
    // XXX This doesn't seem to work reliably in Go 1.2.1; set GOMAXPROCS
    // environment variable instead of doing it after the program starts.
    // https://code.google.com/p/go/issues/detail?id=1492 ?
    runtime.GOMAXPROCS(maxprocs)
    for ii := 0; ii < maxprocs; ii++ {
        in_queue <- height_way{SENTINEL_HEIGHT, ""}
    }
    for ii := 0; ii < maxprocs; ii++ {
        go count_combos(in_queue, out_queue, compatible_ways)
    }

    half_memo := make(map[string]uint64)

    num_ways := len(ways) - mirrors.Cardinality()
    fmt.Println("num_ways", num_ways)
    for ii := 0; ii < num_ways; ii++ {
        hwc := <-out_queue
        prev_way := hwc.way
        count := hwc.count
        half_memo[prev_way] = count
        if mirrored.Contains(prev_way) {
            half_memo[reverse(prev_way)] = count
        }
        fmt.Printf("(%d/%d) %s mirrored=%v count=%d\n", ii+1, num_ways,
            prev_way, mirrored.Contains(prev_way), count)
    }

    rest := (height - 1) - half

    in_queue2 := make(chan height_way, QUEUE_SIZE)
    out_queue2 := make(chan height_way_count, QUEUE_SIZE)

    for _, way := range ways {
        if !mirrors.Contains(way) {
            in_queue2 <- height_way{rest, way}
        }
    }
    for ii := 0; ii < maxprocs; ii++ {
        in_queue2 <- height_way{SENTINEL_HEIGHT, ""}
    }
    for ii := 0; ii < maxprocs; ii++ {
        go count_combos_memo(in_queue2, out_queue2, compatible_ways, half_memo)
    }

    var total2 uint64
    for ii := 0; ii < num_ways; ii++ {
        hwc := <-out_queue2
        prev_way := hwc.way
        count := hwc.count
        if mirrored.Contains(prev_way) {
            count *= 2
        }
        total2 += uint64(count)
        fmt.Printf("(%d/%d) %s mirrored=%v count=%d total2=%d\n", ii+1,
            num_ways, prev_way, mirrored.Contains(prev_way), count, total2)
    }
    return total2
}

func main() {
    var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
    var width = flag.Int("width", 32, "width in blocks")
    var height = flag.Int("height", 10, "height in blocks")
    flag.Parse()

    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }

    fmt.Println(W(*width, *height))
}

Programming
Python


A Quick Test of PyPy-STM

The Software Transactional Memory branch of PyPy has been under development for a while now, and the first binary release was made available earlier this month. (See the announcement.) So I went digging through my collection of Project Euler solutions, looking for a good candidate to test with pypy-stm. Basically, I needed a CPU-bound program that used multiple threads, and didn't depend on C libraries that weren't available in PyPy.

I found one, my solution to Project Euler Problem 215. It used the multiprocessing module to run on multiple CPU cores (without tripping over Python's global interpreter lock and ending up with single-core performance). But multiprocessing uses almost the same interface as threading, so it just took a few simple tweaks to switch it over.

Here's the multiprocessing version:

#!/usr/bin/env python

"""Project Euler, problem 215

Consider the problem of building a wall out of 2x1 and 3x1 bricks
(horizontal x vertical dimensions) such that, for extra strength, the gaps
between horizontally adjacent bricks never line up in consecutive layers, i.e.
never form a "running crack".

For example, the following 9x3 wall is not acceptable due to the running
crack shown in red:

3222
2232
333

There are eight ways of forming a crack-free 9x3 wall, written W(9,3) = 8.

Calculate W(32,10).
"""

import sys
import multiprocessing


sys.setrecursionlimit(100)


def gen_ways(width, blocks):
    """Generate all possible permutations of items in blocks that add up
    to width, as strings."""
    if width == 0:
        yield ""
    else:
        for block in blocks:
            if block <= width:
                for way in gen_ways(width - block, blocks):
                    yield str(block) + way


def build_mirrors(ways):
    mirrored = set()
    mirrors = set()
    for way in ways:
        rev = "".join(reversed(way))
        if way != rev:
            low, high = sorted([way, rev])
            mirrored.add(low)
            mirrors.add(high)
    return mirrored, mirrors


def find_cracks(way):
    """Return the set of indexes where cracks occur in way"""
    result = set()
    total = 0
    for ch in way[:-1]:
        total += int(ch)
        result.add(total)
    return result


def crack_free(tup1, tup2, cracks):
    """Return True iff tup1 and tup2 can be adjacent without making a
    crack."""
    return not cracks[tup1].intersection(cracks[tup2])


def find_compatible_ways(way, ways, cracks):
    """Return a list of crack-free adjacent ways for way"""
    result = []
    for way2 in ways:
        if crack_free(way, way2, cracks):
            result.append(way2)
    return result


def build_compatible_ways(ways):
    cracks = {}
    for way in ways:
        cracks[way] = find_cracks(way)
    print "done generating %d cracks" % len(cracks)
    compatible_ways = {}
    compatible_ways[()] = ways
    for way in ways:
        compatible_ways[way] = find_compatible_ways(way, ways, cracks)
    return compatible_ways


def gen_combos(height, prev_way, compatible_ways):
    """Generate all ways to make a crack-free wall of size (width, height),
    as height-lists of width-strings."""
    if height == 0:
        return
    elif height == 1:
        for way in compatible_ways[prev_way]:
            yield [way]
    else:
        for way in compatible_ways[prev_way]:
            for combo in gen_combos(height - 1, way, compatible_ways):
                yield [way] + combo


def count_combos(in_queue, out_queue, compatible_ways):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, count the number of results, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            count += 1
        out_queue.put((height, prev_way, count))


def count_combos_memo(in_queue, out_queue, compatible_ways, memo):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, chain the result of memo to the last result in the combo
    to get the total count, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            last = combo[-1]
            count += memo[last]
        out_queue.put((height, prev_way, count))


def W(width, height):
    """Return the number of ways to make a crack-free wall of size
    (width, height)."""
    ways = sorted(gen_ways(width, [2, 3]))
    print "done generating %d ways" % len(ways)

    mirrored, mirrors = build_mirrors(ways)
    print "have %d mirror images" % len(mirrored)

    compatible_ways = build_compatible_ways(ways)
    print "done generating %d compatible_ways" % sum(map(
        len, compatible_ways.itervalues()))

    in_queue = multiprocessing.Queue()
    out_queue = multiprocessing.Queue()

    cpus = multiprocessing.cpu_count()

    half = (height - 1) // 2
    for way in ways:
        if way not in mirrors:
            in_queue.put((half, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = multiprocessing.Process(target=count_combos,
                                       args=(in_queue,
                                             out_queue,
                                             compatible_ways))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    half_memo = {}

    num_ways = len(ways) - len(mirrors)
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        half_memo[prev_way] = count
        if prev_way in mirrored:
            half_memo["".join(reversed(prev_way))] = count
        print "(%d/%d) %s mirrored=%d count=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count)

    for proc in procs:
        proc.join()

    rest = (height - 1) - half

    for way in ways:
        if way not in mirrors:
            in_queue.put((rest, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = multiprocessing.Process(target=count_combos_memo, args=(
                                       in_queue, out_queue, compatible_ways,
                                       half_memo))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    total = 0
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        if prev_way in mirrored:
            count *= 2
        total += count
        print "(%d/%d) %s mirrored=%d count=%d total=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count, total)
    for proc in procs:
        proc.join()
    return total


def main():
    try:
        width = int(sys.argv[1])
    except IndexError:
        width = 32
    try:
        height = int(sys.argv[2])
    except IndexError:
        height = 10
    print W(width, height)


if __name__ == "__main__":
    main()

and here's the threaded version:

#!/usr/bin/env python

"""Project Euler, problem 215

Consider the problem of building a wall out of 2x1 and 3x1 bricks
(horizontal x vertical dimensions) such that, for extra strength, the gaps
between horizontally adjacent bricks never line up in consecutive layers, i.e.
never form a "running crack".

For example, the following 9x3 wall is not acceptable due to the running
crack shown in red:

3222
2232
333

There are eight ways of forming a crack-free 9x3 wall, written W(9,3) = 8.

Calculate W(32,10).
"""

import sys
import multiprocessing
import threading
import Queue


sys.setrecursionlimit(100)


def gen_ways(width, blocks):
    """Generate all possible permutations of items in blocks that add up
    to width, as strings."""
    if width == 0:
        yield ""
    else:
        for block in blocks:
            if block <= width:
                for way in gen_ways(width - block, blocks):
                    yield str(block) + way


def build_mirrors(ways):
    mirrored = set()
    mirrors = set()
    for way in ways:
        rev = "".join(reversed(way))
        if way != rev:
            low, high = sorted([way, rev])
            mirrored.add(low)
            mirrors.add(high)
    return mirrored, mirrors


def find_cracks(way):
    """Return the set of indexes where cracks occur in way"""
    result = set()
    total = 0
    for ch in way[:-1]:
        total += int(ch)
        result.add(total)
    return result


def crack_free(tup1, tup2, cracks):
    """Return True iff tup1 and tup2 can be adjacent without making a
    crack."""
    return not cracks[tup1].intersection(cracks[tup2])


def find_compatible_ways(way, ways, cracks):
    """Return a list of crack-free adjacent ways for way"""
    result = []
    for way2 in ways:
        if crack_free(way, way2, cracks):
            result.append(way2)
    return result


def build_compatible_ways(ways):
    cracks = {}
    for way in ways:
        cracks[way] = find_cracks(way)
    print "done generating %d cracks" % len(cracks)
    compatible_ways = {}
    compatible_ways[()] = ways
    for way in ways:
        compatible_ways[way] = find_compatible_ways(way, ways, cracks)
    return compatible_ways


def gen_combos(height, prev_way, compatible_ways):
    """Generate all ways to make a crack-free wall of size (width, height),
    as height-lists of width-strings."""
    if height == 0:
        return
    elif height == 1:
        for way in compatible_ways[prev_way]:
            yield [way]
    else:
        for way in compatible_ways[prev_way]:
            for combo in gen_combos(height - 1, way, compatible_ways):
                yield [way] + combo


def count_combos(in_queue, out_queue, compatible_ways):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, count the number of results, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            count += 1
        out_queue.put((height, prev_way, count))


def count_combos_memo(in_queue, out_queue, compatible_ways, memo):
    """Read tuples of (height, prev_way) from in_queue, call gen_combos
    on each, chain the result of memo to the last result in the combo
    to get the total count, and put the count on out_queue."""
    while True:
        (height, prev_way) = in_queue.get()
        if height is None:
            return
        count = 0
        for combo in gen_combos(height, prev_way, compatible_ways):
            last = combo[-1]
            count += memo[last]
        out_queue.put((height, prev_way, count))


def W(width, height, cpus=None):
    """Return the number of ways to make a crack-free wall of size
    (width, height)."""
    if cpus is None:
        cpus = multiprocessing.cpu_count()
    ways = sorted(gen_ways(width, [2, 3]))
    print "done generating %d ways" % len(ways)

    mirrored, mirrors = build_mirrors(ways)
    print "have %d mirror images" % len(mirrored)

    compatible_ways = build_compatible_ways(ways)
    print "done generating %d compatible_ways" % sum(map(
        len, compatible_ways.itervalues()))

    in_queue = Queue.Queue()
    out_queue = Queue.Queue()

    half = (height - 1) // 2
    for way in ways:
        if way not in mirrors:
            in_queue.put((half, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = threading.Thread(target=count_combos, args=(in_queue,
                                out_queue, compatible_ways))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    half_memo = {}

    num_ways = len(ways) - len(mirrors)
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        half_memo[prev_way] = count
        if prev_way in mirrored:
            half_memo["".join(reversed(prev_way))] = count
        print "(%d/%d) %s mirrored=%d count=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count)

    for proc in procs:
        proc.join()

    rest = (height - 1) - half

    for way in ways:
        if way not in mirrors:
            in_queue.put((rest, way))
    # sentinels
    for unused in xrange(cpus):
        in_queue.put((None, None))
    procs = []
    for unused in xrange(cpus):
        proc = threading.Thread(target=count_combos_memo, args=(
                                in_queue, out_queue, compatible_ways,
                                half_memo))
        proc.daemon = True
        proc.start()
        procs.append(proc)

    total = 0
    for ii in xrange(num_ways):
        (unused, prev_way, count) = out_queue.get()
        if prev_way in mirrored:
            count *= 2
        total += count
        print "(%d/%d) %s mirrored=%d count=%d total=%d" % (
            ii + 1, num_ways, prev_way, prev_way in mirrored, count, total)
    for proc in procs:
        proc.join()
    return total


def main():
    try:
        width = int(sys.argv[1])
    except IndexError:
        width = 32
    try:
        height = int(sys.argv[2])
    except IndexError:
        height = 10
    try:
        cpus = int(sys.argv[3])
    except IndexError:
        cpus = multiprocessing.cpu_count()
    print W(width, height, cpus)


if __name__ == "__main__":
    main()
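
Before looking at timings, it's worth sanity-checking the counting logic itself. The following compact single-process sketch uses the same row-compatibility relation, but as a straightforward layer-by-layer dynamic program rather than the generate-and-memoize strategy the parallel versions use (and it's written to run on modern Python as well). It reproduces the W(9,3) = 8 example from the problem statement:

```python
def gen_ways(width, blocks=(2, 3)):
    """All orderings of 2- and 3-unit bricks that sum to width, as strings."""
    if width == 0:
        yield ""
        return
    for block in blocks:
        if block <= width:
            for rest in gen_ways(width - block, blocks):
                yield str(block) + rest

def cracks(way):
    """Positions of the internal gaps between bricks in one row."""
    out, total = set(), 0
    for ch in way[:-1]:
        total += int(ch)
        out.add(total)
    return out

def W(width, height):
    ways = list(gen_ways(width))
    crack = dict((w, cracks(w)) for w in ways)
    # A row may sit directly on another iff their crack sets are disjoint.
    compat = dict((w, [v for v in ways if not crack[w] & crack[v]])
                  for w in ways)
    counts = dict((w, 1) for w in ways)   # every single row is a valid wall
    for _ in range(height - 1):           # add one layer at a time
        counts = dict((w, sum(counts[p] for p in compat[w])) for w in ways)
    return sum(counts.values())

print(W(9, 3))   # 8, matching the problem statement
```

For width 32 the ways table grows to a few thousand rows, so the parallel versions above add the mirror-image and split-the-height tricks on top of this same compatibility relation.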

Anyway, on my old Phenom II 3.0 GHz quad-core CPU, under Kubuntu Linux 14.04, performance looks like this:

Python Elapsed time (mm:ss)
CPython 2.7.6 21:13
PyPy 2.3.1 7:06
pypy-stm 2.3r2 11:27

So, basically, PyPy is almost 3 times as fast as CPython, and pypy-stm adds enough overhead to be about 60% slower than PyPy. (Note that since this version already uses multiprocessing to run on multiple CPUs, STM is pretty much all overhead and no benefit here.)

The more interesting result is for the threaded version. On this quad-core machine we'd expect it to be about 4 times as slow as the multiprocessing version on CPython and vanilla PyPy, since the GIL serializes the threads, and we'd hope it would be less than 4 times as slow on pypy-stm, showing a benefit from transactional memory letting single-process multi-threaded code avoid the GIL.
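That expectation comes straight from the GIL: under CPython and vanilla PyPy, pure-Python threads run one at a time, so a CPU-bound job split across threads computes the same answer with essentially no speedup. A minimal illustration (the workload and thread count here are arbitrary stand-ins):

```python
import threading

def count_odds(lo, hi, out, idx):
    # Pure-Python CPU-bound loop; the GIL lets only one such loop run at a time.
    out[idx] = sum(n & 1 for n in range(lo, hi))

def parallel_count(total, nthreads):
    out = [0] * nthreads
    step = total // nthreads
    threads = []
    for i in range(nthreads):
        hi = total if i == nthreads - 1 else (i + 1) * step
        t = threading.Thread(target=count_odds, args=(i * step, hi, out, i))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sum(out)

print(parallel_count(1000000, 4))   # 500000: correct, but no faster than 1 thread
```

On a GIL-free runtime like pypy-stm, the hope is that the same code really does use all four cores, with transactions standing in for the lock.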

Unfortunately, I guess this program uses a bit too much memory for the current version of pypy-stm, as it consistently segfaults about 3 minutes in. Armin's blog post (above) warned that this might happen due to a bug in LLVM. Oh well.

Fortunately, my program takes command-line options that can be used to vary the size of the problem. So, instead of building a wall that's 32 units wide and 10 units high, let's build one that's only 18 units wide and 8 units high. (I picked those values experimentally, trying to find the biggest numbers that let the program complete successfully many times in a row on pypy-stm.)

First, the multiprocessing version:

Python Elapsed time (seconds)
CPython 2.7.6 0.052
PyPy 2.3.1 0.167
pypy-stm 2.3r2 3.28

So, due to JIT startup overhead, CPython is actually faster than PyPy here. pypy-stm's overhead looks really bad on the smaller problem size.

Then, the more interesting result, the threaded version:

Python Elapsed time (seconds)
CPython 2.7.6 0.044
PyPy 2.3.1 0.176
pypy-stm 2.3r2 1.21

Threads are actually slightly faster than multiprocessing on CPython here (and about the same on vanilla PyPy), I guess because the overhead of forking subprocesses exceeded the benefit of using multiple CPUs on such a small problem size. Note that pypy-stm is significantly faster on the threaded version than the multiprocessing version, at least on its best run. (All results shown are best of 3 runs.)

One interesting thing is that pypy-stm's performance was highly variable: over the 3 runs, I saw times ranging from 1.21s to over 3s. It appears there's an element of luck in whether the transactions collide; when they do, the code takes longer to finish (but still gives the correct result).

In conclusion, pypy-stm is still highly experimental, will crash if you give it a program that uses too much memory, and isn't fast enough yet to be helpful in this benchmark. Even though Python programmers love to whine about the GIL, the multiprocessing module already gives a pretty nice way around it, so the bar for pypy-stm to be practical in production (as opposed to a nice theoretical result) is pretty high. Still, seeing transactional memory work at all to parallelize threads in Python is very impressive. (Haskell has had a nice STM implementation for a while, but then Haskell doesn't allow side effects in most functions so it's a lot easier to parallelize than a language like Python.)

I'll be donating to the next phase of the pypy-stm effort, and looking forward to better results in the future.

Python

Comments (0)

Permalink

PyCon 2012

I'm back from PyCon in Silicon Valley. (If you don't know what PyCon is, it's 2000+ nerds talking about the Python programming language for a few days.) Here are some highlights I remember, in case you're looking for recommendations of videos to watch. (Of course, it's fantastic that they record everything and make it all available for free here.)

The introduction featuring dancing robots was impressive. I remember that just a few years ago it was hard to teach robots to walk, and now you (or your rich friend) can buy a robot that can be programmed to dance (among many other things) for a few thousand bucks. They ended up giving away one of the robots at the end, a pretty nice prize.

Paul Graham's keynote about ten really-hard-but-not-quite-impossible startup ideas was awesome, probably the best thing I saw at the conference.

Stormy Peters' keynote about freedom and privacy and stuff was okay; I agreed with her about pretty much everything, but I don't know if someone who didn't would actually have been swayed.

Guido van Rossum's keynote about "trolls" who whine about Python didn't do much for me.

Katie Cunningham's intro to hosting a web site in the cloud and using Fabric to maintain it was very good. I've never used Fabric but I plan to start.

Raymond Hettinger's talk about subclassing was predictably excellent. It was a very balanced talk about when it makes sense to subclass (because it lets you write less code) and when it doesn't, not the typical object-oriented-programming-101 nonsense that explains how to subclass but never when not to. But since I pretty much already agreed with everything he said, I don't think I got anything out of it.

Larry Hastings' low-level stepping through Python was fun for a very select audience. Most people probably don't need that level of detail.

Paul McMillan's talk about security was very good, and useful for any web programmer who doesn't know this stuff.

Chris McDonough's talk about PDB (the Python debugger) was pretty good. I picked up a few tricks I didn't know, even though I've used PDB a bunch of times.

Moshe Zadka's talk about writing resilient programs that recover from crashes was great. He mentioned several techniques I hadn't even thought of, let alone used.

Geremy Condra's talk about security testing was good, but not quite as mind-blowing for me as his previous talk about timing attacks.

Glyph Lefkowitz's talk about low-level networking (and not Twisted) was great, though I already knew pretty much everything he covered so I probably should have been elsewhere.

The PyPy survey talk by Maciej Fijalkowski, Alex Gaynor, and Armin Rigo was actually more like two talks. Maciej and Alex gave kind of a general introduction to PyPy, then Armin talked about software transactional memory, making an analogy to garbage collection. I already follow PyPy so I didn't get too much out of this talk, but it's worth watching if you don't.

Benjamin Peterson's talk about the PyPy just-in-time compiler was more in-depth and thus more interesting to me. I'd heard most of the material before but it's complicated enough that I needed to hear it again.

Michael Bayer's talk about SQLAlchemy was okay. It was more advocacy of the right way to do a flexible database toolkit (make it flexible) than practical advice on how to use SQLAlchemy.

Both Geographic Information System talks I attended were excellent. Jason Scheirer talked about GIS and maps at a more theoretical level, covering the basic terminology and map projections. Zain Memon's talk was more practical, basically how to use a bunch of libraries to quickly create a map-oriented web site. I need to do some GIS stuff at work, so these two talks definitely justified the money my employer spent to send me to the conference.

Jacqueline Kazil and Dana Bauer's talk about mining public data released by local governments was really good. I wasn't at all familiar with most of the tools they used, so I'll need to watch the video and take notes.

Chris Lambacher's talk about cross-compiling Python and C extensions for embedded systems covered something else I need to do at work, so I was really hoping it would solve all my problems. Unfortunately, his slides showed a big mess of low-level hacks around tools that aren't really designed for cross-compiling, rather than the magic bullet I wanted (but didn't expect to find.) It's a hard problem, but maybe the emergence of ARM devices will make cross-compiling more mainstream and the tools will improve.

Erik Rose's talk about parsing MediaWiki files (which are too complicated to parse with simple regular expressions) gave us an entertaining survey of Python parsing libraries. He covered a handful in some detail, and has a survey of lots and lots more on his web site.

Ryan Kelly's talk about frozen standalone programs didn't really go much into the freezing tools themselves (like py2exe), but rather dealt with secondary issues like auto-updaters, binary compatibility with old operating system versions, and code signing. It was interesting, but didn't solve my short-term need to make PyInstaller and PyGTK cooperate on 64-bit MacOS.

Brandon Rhodes' talk about linkers and virtual memory was really excellent, but unfortunately covered things I'd already learned, so I should have been somewhere else.

The lightning talks were a mixed bag. I enjoyed many of them, but certainly don't remember enough to review them all. I'd kind of like to attend a conference that was nothing but lightning talks, five minutes each on hundreds of subjects.

The poster session was good. There were about 30 posters, with their authors handy to talk, and about half of them covered subjects that interested me at least a bit.

The Expo Hall showed that the tech economy is doing fine, even if the larger economy is still kind of bleak. There were dozens of sponsors with booths, and at least half of them were desperately trying to hire people. If you're an out-of-work Python programmer, definitely go to PyCon.

Finally, I spent a day and a half at the sprints, hacking on PyPy. I fixed 3 or 4 minor bugs and learned a bit in the process. I would love to have spent another couple of days at the sprint, but work and family called. (I had actually planned to spend a couple of days playing tourist rather than coding, but changed my mind. So I'll have to go back and look at redwoods some other time.)

In some ways, this was the best PyCon I'd attended since the very first one in DC. The WiFi worked really well for me for a change, even though I only brought a cheap netbook. I managed to avoid attending any really bad talks. I got to write some useful code and talk to a lot of people and even play in a free poker tournament. And I caught not one but two door prize frisbees. (Unfortunately, I only won a couple of books, not a robot or an iPad, but maybe next year.)

Programming
Python

Comments (0)

Permalink

PyPy 1.8 speed

I reran my Project Euler benchmark with PyPy 1.8, PyPy 1.7, Psyco 1.6, and CPython 2.7.2. (On Gentoo Linux x86.)

Total runtime for 127 small programs was 184.18s for PyPy 1.8, 185.55s for PyPy 1.7, 343.51s for Psyco, and 533.09s for CPython.

Or in relative terms, PyPy 1.8 was less than 1% faster than 1.7. Both versions of PyPy were almost twice as fast as Psyco, and almost three times as fast as CPython.

Full results follow:

script PyPy 1.7 PyPy 1.8 Psyco 1.6 CPython 2.7.2
euler1.py 0.34 0.31 0.23 0.10
euler2.py 0.10 0.10 0.10 0.10
euler3.py 0.21 0.21 0.41 0.41
euler4.py 0.20 0.21 0.21 0.20
euler5.py 0.10 0.10 0.10 0.10
euler6.py 0.10 0.10 0.10 0.10
euler7.py 0.10 0.21 0.10 0.31
euler8.py 0.10 0.10 0.11 0.11
euler9.py 0.10 0.10 0.10 0.20
euler10.py 1.43 1.21 1.72 7.09
euler11.py 0.10 0.10 0.10 0.10
euler13.py 0.10 0.10 0.10 0.10
euler14.py 3.44 3.64 1.42 3.03
euler15.py 0.10 0.10 0.10 0.11
euler16.py 0.10 0.10 0.10 0.10
euler18.py 0.20 0.20 0.10 0.10
euler19.py 0.10 0.10 0.10 0.10
euler20.py 0.10 0.10 0.10 0.10
euler21.py 0.10 0.10 0.10 0.20
euler22.py 0.10 0.10 0.10 0.10
euler23.py 2.93 3.03 4.85 12.33
euler24.py 2.12 2.73 7.48 5.36
euler25.py 0.51 0.51 0.91 0.20
euler26.py 6.27 7.07 9.50 4.25
euler27.py 0.91 0.91 1.21 8.99
euler28.py 0.10 0.10 0.10 0.10
euler29.py 0.10 0.10 0.10 0.10
euler30.py 0.61 0.51 4.75 3.23
euler32.py 2.12 1.62 4.85 4.04
euler33.py 0.10 0.10 0.10 0.10
euler34.py 2.43 2.53 11.12 14.15
euler35.py 3.84 4.04 6.37 26.71
euler36.py 1.93 2.14 2.95 2.63
euler37.py 4.39 4.08 14.90 15.31
euler38.py 0.72 0.61 1.33 1.63
euler39.py 0.31 0.31 0.21 0.41
euler40.py 0.61 0.41 0.41 0.61
euler41.py 2.55 3.77 5.00 4.48
euler42.py 0.10 0.10 0.10 0.10
euler43.py 10.00 10.82 38.13 36.13
euler44.py 0.61 0.51 0.82 3.37
euler45.py 4.80 3.98 2.86 2.35
euler46.py 0.21 0.31 0.31 1.12
euler47.py 0.61 0.61 0.61 3.45
euler48.py 0.21 0.20 0.10 0.10
euler49.py 0.31 0.31 0.41 0.71
euler51.py 6.78 7.08 42.35 33.36
euler52.py 0.41 0.31 0.91 1.01
euler53.py 0.20 0.31 0.20 0.31
euler54.py 0.31 0.41 0.31 0.20
euler55.py 0.81 0.71 0.41 0.41
euler56.py 0.41 0.41 0.91 0.81
euler57.py 0.41 0.41 0.41 0.81
euler59.py 6.06 2.93 16.37 15.66
euler61.py 0.30 0.20 0.10 0.22
euler62.py 0.20 0.31 0.20 0.30
euler63.py 0.20 0.31 0.20 0.10
euler65.py 0.10 0.10 0.11 0.10
euler66.py 1.42 1.31 8.80 15.67
euler67.py 0.20 0.10 0.10 0.11
euler68.py 0.10 0.10 0.10 0.11
euler69.py 0.20 0.10 0.10 0.20
euler70.py 0.51 0.41 0.91 0.71
euler71.py 0.30 0.20 0.30 1.11
euler72.py 6.07 6.27 7.98 58.81
euler73.py 2.63 2.33 2.73 29.21
euler75.py 6.23 9.58 0.51 2.23
euler77.py 0.51 0.41 0.31 0.41
euler79.py 0.10 0.10 0.10 0.10
euler80.py 0.71 0.71 0.71 0.51
euler81.py 0.41 0.10 0.10 0.10
euler82.py 0.41 0.20 0.31 0.31
euler83.py 0.20 0.20 0.41 0.61
euler84.py 1.42 1.42 4.85 20.72
euler85.py 6.27 1.72 10.21 12.63
euler87.py 1.11 0.61 0.51 0.91
euler89.py 0.10 0.10 0.10 0.10
euler93.py 2.86 2.86 6.12 12.96
euler94.py 30.64 31.95 28.95 23.43
euler97.py 3.13 3.34 3.13 3.94
euler98.py 0.30 0.31 0.51 0.61
euler99.py 0.10 0.10 0.10 0.10
euler100.py 0.10 0.10 0.10 0.10
euler101.py 0.41 0.30 0.10 0.10
euler102.py 0.10 0.10 0.10 0.10
euler103.py 0.10 0.10 0.10 0.10
euler104.py 1.01 0.81 2.12 2.02
euler105.py 1.42 1.32 0.41 0.41
euler106.py 1.32 1.62 0.51 0.41
euler107.py 0.20 0.51 0.20 0.41
euler108.py 3.13 3.34 7.99 11.02
euler109.py 0.41 0.30 0.41 2.02
euler111.py 3.13 3.34 4.95 20.11
euler112.py 2.43 2.93 13.34 15.46
euler114.py 0.20 0.11 0.10 0.10
euler115.py 0.20 0.10 0.20 0.41
euler116.py 0.10 0.11 0.10 0.10
euler117.py 0.10 0.10 0.10 0.10
euler119.py 0.10 0.10 0.10 0.10
euler120.py 0.10 0.10 0.10 0.10
euler121.py 0.10 0.10 0.10 0.21
euler123.py 4.96 5.56 8.40 7.80
euler124.py 0.71 0.71 0.61 2.63
euler125.py 1.32 1.52 1.42 1.32
euler126.py 1.73 1.52 3.94 18.40
euler135.py 3.34 2.93 2.53 3.94
euler142.py 0.22 0.21 0.21 0.34
euler143.py 0.10 0.10 0.10 0.10
euler150.py 0.62 0.51 0.93 0.82
euler162.py 0.10 0.10 0.10 0.10
euler171.py 0.10 0.10 0.10 0.10
euler172.py 0.51 0.41 0.51 0.51
euler173.py 0.61 0.71 0.61 0.91
euler174.py 3.34 3.13 3.23 3.23
euler181.py 0.10 0.10 0.11 0.11
euler188.py 0.72 0.21 0.10 0.21
euler190.py 0.10 0.10 0.10 0.10
euler202.py 0.10 0.10 0.10 0.10
euler205.py 0.61 0.51 1.32 0.71
euler207.py 0.41 0.51 0.20 0.91
euler222.py 0.10 0.10 0.10 0.10
euler230.py 0.10 0.10 0.10 0.11
euler233.py 0.10 0.10 0.10 0.10
euler234.py 3.44 4.04 8.40 9.40
euler235.py 0.20 0.10 0.31 0.30
euler240.py 7.90 7.98 13.14 23.24
euler267.py 0.61 0.51 0.20 0.51
total 185.55 184.18 343.51 533.09
wins 73 75 66 49

Python

Comments (0)

Permalink

PyPy is still the fastest Python

PyPy 1.6 was just released, so I ran my Project Euler benchmark script again.

The script runs my pile of Project Euler solver programs across various implementations of Python available on my computer, and gives results only for the ones that finish successfully in less than a minute on all of them.
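The harness itself isn't shown here, but the idea is simple: run every script under every interpreter with a timeout, and keep only the scripts that exit successfully everywhere. A rough sketch of that logic (the interpreter names are placeholders, and this uses the modern subprocess API rather than whatever the original script did):

```python
import subprocess
import time

INTERPRETERS = ["python2", "pypy"]   # placeholder interpreter commands
TIMEOUT = 60.0                       # drop anything slower than a minute

def time_script(interp, script):
    """Elapsed wall-clock seconds, or None on failure or timeout."""
    start = time.time()
    try:
        subprocess.run([interp, script], check=True,
                       capture_output=True, timeout=TIMEOUT)
    except (OSError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired):
        return None
    return time.time() - start

def benchmark(scripts):
    """Map each script to its per-interpreter times, keeping only the
    scripts that succeed under every interpreter."""
    rows = {}
    for script in scripts:
        times = [time_script(interp, script) for interp in INTERPRETERS]
        if all(t is not None for t in times):
            rows[script] = times
    return rows
```

Wall-clock timing like this includes interpreter startup, which is why the 0.10s floor shows up all over the tables below.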

The contenders this time were PyPy 1.6, CPython 2.7.2, CPython 2.6.7 with Psyco, and Jython 2.5.1+. (Unladen Swallow appears to be dead, and IronPython is surprisingly painful to install on Linux, so I left them out.)

This time, 125 programs qualified, out of 197 working solvers. (Some programs require the gmpy library, which doesn't currently work on PyPy or Jython. And some use Python 2.7 syntax, which won't work with Psyco or Jython. And some are just too slow.)

Once again, PyPy is fastest, followed by Psyco, then CPython, then Jython. PyPy and Psyco are roughly twice as fast as CPython, and Jython is roughly twice as slow as CPython. (Though a lot of that is startup overhead; Jython gets more competitive for longer-lived programs.)

script PyPy 1.6 Jython 2.5 Psyco 2.6 CPython 2.7
euler1.py 0.10 2.76 0.10 0.03
euler2.py 0.10 2.73 0.10 0.10
euler3.py 0.21 4.17 0.41 0.41
euler4.py 0.32 4.67 0.42 0.31
euler5.py 0.11 3.65 0.11 0.11
euler6.py 0.11 3.25 0.12 0.11
euler7.py 0.32 4.98 0.31 0.62
euler8.py 0.10 3.44 0.10 0.10
euler9.py 0.11 4.06 0.11 0.20
euler10.py 1.43 9.42 2.04 8.61
euler11.py 0.11 2.75 0.10 0.11
euler13.py 0.10 1.42 0.10 0.10
euler14.py 4.86 7.99 1.42 2.93
euler15.py 0.11 2.73 0.10 0.12
euler16.py 0.10 1.62 0.10 0.11
euler18.py 0.20 3.13 0.10 0.10
euler19.py 0.10 3.04 0.10 0.10
euler20.py 0.11 2.74 0.11 0.11
euler21.py 0.21 4.47 0.10 0.31
euler22.py 0.12 3.97 0.11 0.12
euler23.py 3.45 18.13 7.01 15.39
euler24.py 3.54 32.17 7.91 5.47
euler25.py 0.62 4.60 1.11 0.20
euler26.py 6.39 18.62 11.25 5.19
euler27.py 1.12 10.42 1.32 9.93
euler28.py 0.10 2.74 0.12 0.12
euler29.py 0.11 3.85 0.11 0.11
euler30.py 3.15 9.51 5.17 4.15
euler32.py 2.23 6.37 5.17 4.16
euler33.py 0.10 3.45 0.11 0.10
euler34.py 4.56 12.45 12.09 15.48
euler35.py 4.95 30.77 7.38 28.47
euler36.py 1.93 5.89 3.04 3.24
euler37.py 9.62 34.24 14.57 16.42
euler38.py 1.32 6.98 1.83 1.83
euler39.py 0.30 3.86 0.10 0.31
euler40.py 1.12 6.40 0.83 0.72
euler41.py 4.68 18.73 5.42 4.97
euler42.py 0.22 3.86 0.10 0.10
euler44.py 0.61 5.77 0.71 3.25
euler45.py 5.59 10.45 2.96 2.34
euler46.py 0.20 5.56 0.22 1.73
euler47.py 0.71 8.32 0.72 4.86
euler48.py 0.10 4.65 0.11 0.10
euler49.py 0.41 6.11 0.62 0.61
euler52.py 0.41 5.08 0.81 0.81
euler53.py 0.32 4.27 0.21 0.20
euler54.py 0.43 5.06 0.22 0.42
euler55.py 0.61 5.17 0.41 0.41
euler56.py 1.14 5.98 1.63 0.92
euler57.py 0.41 5.68 0.53 0.72
euler59.py 13.76 15.08 14.57 13.56
euler61.py 0.31 4.27 0.20 0.11
euler62.py 0.20 3.84 0.31 0.20
euler63.py 0.20 3.34 0.20 0.10
euler65.py 0.11 3.06 0.11 0.12
euler66.py 1.31 20.25 8.00 13.35
euler67.py 0.42 3.36 0.11 0.11
euler68.py 0.10 2.54 0.10 0.10
euler69.py 0.10 3.55 0.10 0.10
euler70.py 0.41 7.00 0.81 0.72
euler71.py 0.22 5.18 0.31 1.01
euler72.py 6.48 38.36 6.38 54.47
euler73.py 2.95 21.28 2.95 27.14
euler75.py 11.95 6.47 0.71 2.33
euler77.py 0.51 6.48 0.21 0.41
euler79.py 0.10 1.72 0.10 0.10
euler80.py 0.61 2.43 0.72 0.61
euler81.py 0.30 2.02 0.10 0.10
euler82.py 0.51 7.28 0.30 0.30
euler83.py 0.21 2.84 0.41 0.61
euler84.py 1.52 19.45 6.58 22.46
euler85.py 9.15 21.56 10.53 12.86
euler87.py 2.24 5.57 1.23 1.12
euler89.py 0.10 2.33 0.10 0.10
euler93.py 2.63 10.96 5.76 11.66
euler94.py 24.61 21.98 27.12 22.97
euler97.py 2.93 9.32 2.73 3.46
euler98.py 0.41 3.65 0.51 0.61
euler99.py 0.12 2.24 0.11 0.10
euler100.py 0.10 1.62 0.10 0.10
euler101.py 0.41 3.04 0.10 0.10
euler102.py 0.10 2.43 0.11 0.11
euler103.py 0.10 1.72 0.10 0.10
euler104.py 1.01 4.17 1.82 1.72
euler105.py 1.74 6.19 0.51 0.41
euler106.py 1.52 7.41 0.51 0.41
euler107.py 0.21 4.88 0.21 0.41
euler108.py 2.83 12.77 8.11 11.66
euler109.py 0.30 9.60 0.41 2.14
euler111.py 4.57 26.96 5.68 25.42
euler112.py 6.38 12.35 13.28 17.82
euler114.py 0.10 3.24 0.10 0.10
euler115.py 0.21 4.76 0.20 0.30
euler116.py 0.10 3.75 0.11 0.10
euler117.py 0.11 3.35 0.12 0.12
euler119.py 0.10 1.72 0.10 0.10
euler120.py 0.10 2.33 0.10 0.10
euler121.py 0.10 3.74 0.10 0.20
euler123.py 4.06 22.27 7.70 7.58
euler124.py 0.72 7.38 0.72 2.84
euler125.py 1.13 4.97 1.64 1.73
euler126.py 1.52 12.44 3.66 16.31
euler135.py 3.86 6.37 2.33 3.65
euler142.py 0.21 4.66 0.10 0.32
euler143.py 0.11 2.54 0.12 0.12
euler150.py 0.52 5.39 1.02 0.91
euler157.py 0.10 1.42 0.10 0.10
euler162.py 0.10 2.43 0.10 0.10
euler171.py 0.10 1.43 0.10 0.10
euler172.py 0.51 2.83 0.51 0.51
euler173.py 0.61 2.93 0.71 0.91
euler174.py 3.65 6.58 3.65 3.34
euler181.py 0.10 2.33 0.10 0.11
euler190.py 0.10 1.62 0.10 0.10
euler202.py 0.12 3.35 0.10 0.10
euler205.py 1.02 8.51 1.12 0.71
euler207.py 0.41 4.36 0.20 0.81
euler222.py 0.10 1.72 0.10 0.10
euler230.py 0.11 3.54 0.11 0.12
euler233.py 0.12 3.15 0.12 0.12
euler234.py 4.56 19.15 7.40 9.96
euler235.py 0.42 4.86 0.62 0.42
euler240.py 10.36 50.91 11.76 22.39
euler267.py 0.51 3.95 0.20 0.41
total 209.07 946.66 267.48 473.62
wins 82 1 57 55

Programming
Python


PyPy gets faster

In a previous post, I mentioned that PyPy was the fastest Python implementation for most of my Project Euler programs, but that it was very slow for a few of them.

This is no longer the case. The jit-generator branch was merged a few days ago, fixing a weakness with code that uses generators. And now PyPy is clearly the fastest Python implementation for this code, with both the most wins and the lowest overall time. Psyco is still pretty close. Both are a bit more than twice as fast as CPython.
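
The kind of code that branch helps can be illustrated with a tiny generator-heavy loop. This is a hypothetical microbenchmark of my own, not one of the Euler programs:

```python
# Hypothetical microbenchmark: sum squares by pulling values out of a
# generator, so every loop iteration resumes a suspended generator frame.
# This is the pattern the jit-generator branch is meant to speed up.
def squares(n):
    for i in range(n):
        yield i * i

def sum_squares(n):
    total = 0
    for sq in squares(n):   # each iteration crosses the generator boundary
        total += sq
    return total
```

Before the merge, PyPy reportedly lost time crossing that generator boundary on every iteration; now the JIT can trace through it.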

I compared PyPy trunk, Unladen Swallow trunk, Jython 2.5.1+, CPython 2.6.6 with Psyco, and CPython 2.7.

PyPy is very strong across the board. Its worst result is on euler94, a Sudoku solver that heavily uses sets and copy.deepcopy.

Psyco still does very well, but it doesn't work on Python 2.7 yet and still doesn't work on amd64, so it feels more and more like a dead end.

Unladen Swallow hasn't had a commit since August. I suspect it's just resting, not dead, but it's falling behind PyPy in performance. Version 2.8 of LLVM has been released, but Unladen still requires version 2.7.

CPython is the baseline. I used 2.7, the latest Python 2. (Some of my Euler programs work in Python 3; some don't.)

Jython is by far the slowest. Its large startup overhead hurts on the easy problems, and the mature HotSpot JIT isn't enough for it to catch up on the harder ones. Jython does have the advantage of being free-threaded, but since this code was originally written for CPython it rarely uses multiple threads and so doesn't really benefit.

Here are the numbers.

script PyPy Unladen Jython Psyco CPython
euler1.py 0.01 0.12 3.65 0.10 0.02
euler2.py 0.10 0.10 3.94 0.10 0.10
euler3.py 0.23 0.91 6.40 0.51 0.40
euler4.py 0.31 0.84 6.49 0.41 0.42
euler5.py 0.12 0.12 5.38 0.12 0.12
euler6.py 0.12 0.11 5.50 0.12 0.11
euler7.py 0.33 0.52 7.29 0.12 0.73
euler8.py 0.13 0.12 4.77 0.11 0.11
euler9.py 0.11 0.32 6.67 0.10 0.21
euler10.py 2.34 6.02 11.81 1.93 12.02
euler11.py 0.10 0.10 4.28 0.10 0.11
euler13.py 0.10 0.10 3.36 0.10 0.10
euler14.py 3.88 2.76 8.46 1.96 3.07
euler15.py 0.11 0.11 4.00 0.10 0.10
euler16.py 0.11 0.12 3.58 0.11 0.13
euler18.py 0.10 0.10 3.45 0.10 0.10
euler19.py 0.13 0.11 3.06 0.12 0.10
euler20.py 0.10 0.10 2.45 0.10 0.10
euler21.py 0.10 0.21 4.16 0.10 0.21
euler22.py 0.10 0.10 3.95 0.10 0.10
euler23.py 3.84 21.33 15.36 4.45 12.03
euler24.py 5.97 5.66 31.25 6.97 5.46
euler25.py 0.71 0.91 3.44 0.91 0.20
euler26.py 9.59 10.84 20.61 9.74 3.54
euler27.py 1.33 7.08 12.35 1.32 11.33
euler28.py 0.11 0.12 4.37 0.12 0.11
euler29.py 0.22 0.12 6.49 0.12 0.22
euler30.py 3.95 4.45 10.72 5.37 4.26
euler32.py 2.83 3.54 9.20 4.95 3.34
euler33.py 0.11 0.12 5.26 0.11 0.10
euler34.py 4.35 12.33 13.86 12.67 16.08
euler35.py 5.59 18.19 30.04 6.47 25.47
euler36.py 1.83 3.24 7.39 2.93 2.44
euler37.py 10.92 12.44 39.17 14.57 17.50
euler38.py 2.24 1.31 9.91 2.14 2.34
euler39.py 0.21 0.42 5.79 0.10 0.41
euler40.py 1.12 1.63 7.30 0.42 1.13
euler41.py 4.25 3.94 20.63 5.46 4.76
euler42.py 0.11 0.21 4.98 0.12 0.11
euler44.py 0.91 9.41 7.48 1.02 3.75
euler45.py 8.01 2.73 12.06 2.73 2.12
euler46.py 0.52 1.13 6.98 0.31 1.14
euler47.py 1.44 2.44 8.03 0.73 3.65
euler48.py 0.10 0.20 5.49 0.21 0.10
euler49.py 0.62 1.74 6.72 0.82 1.02
euler50.py 2.23 52.50 58.27 6.11 56.61
euler52.py 0.61 0.81 5.36 0.82 0.71
euler53.py 0.22 0.42 5.98 0.21 0.42
euler54.py 0.41 0.32 7.59 0.32 0.32
euler55.py 0.82 0.73 6.39 0.52 0.73
euler56.py 0.92 1.22 5.98 1.03 1.43
euler57.py 0.52 0.72 6.09 0.53 0.82
euler58.py 6.67 34.19 44.30 7.60 53.89
euler59.py 11.42 8.19 17.06 12.81 14.35
euler61.py 0.31 0.21 5.38 0.11 0.10
euler62.py 0.42 0.32 6.49 0.53 0.22
euler63.py 0.21 0.21 6.89 0.20 0.11
euler65.py 0.10 0.10 3.85 0.10 0.10
euler66.py 1.92 7.68 23.86 7.18 12.33
euler67.py 0.11 0.12 4.77 0.11 0.12
euler68.py 0.11 0.12 4.16 0.10 0.11
euler69.py 0.11 0.21 4.96 0.11 0.11
euler70.py 0.73 0.61 7.18 1.02 1.22
euler71.py 0.21 1.44 6.48 0.32 0.91
euler72.py 7.23 30.18 37.07 8.42 49.23
euler73.py 9.50 22.54 21.23 3.35 24.59
euler75.py 2.12 2.22 5.56 0.62 2.43
euler77.py 0.41 0.30 4.86 0.20 0.41
euler79.py 0.10 0.10 3.13 0.10 0.10
euler80.py 0.91 0.71 3.94 0.72 0.62
euler81.py 0.10 0.20 3.54 0.10 0.10
euler82.py 0.71 0.30 8.70 0.30 0.20
euler83.py 0.20 0.81 4.04 0.41 0.61
euler84.py 2.65 15.13 23.56 4.66 19.91
euler85.py 7.98 7.99 22.23 10.41 13.85
euler87.py 1.93 1.74 7.28 1.03 1.23
euler89.py 0.11 0.11 5.17 0.11 0.10
euler93.py 3.54 5.56 12.72 6.58 12.33
euler94.py 34.69 29.33 24.57 26.20 20.42
euler97.py 3.64 4.95 10.61 2.63 3.13
euler98.py 0.41 0.61 3.84 0.51 0.61
euler99.py 0.10 0.10 2.12 0.10 0.10
euler100.py 0.10 0.10 2.43 0.10 0.10
euler101.py 0.20 0.10 4.76 0.10 0.10
euler102.py 0.10 0.10 3.44 0.10 0.10
euler103.py 0.10 0.12 2.94 0.10 0.10
euler105.py 0.61 0.41 7.90 0.42 0.41
euler106.py 0.71 0.51 8.82 0.41 0.51
euler107.py 0.31 0.62 7.19 0.31 0.31
euler108.py 3.66 19.55 17.52 8.86 10.62
euler109.py 0.41 1.23 11.61 0.62 2.36
euler111.py 2.64 17.94 25.92 6.19 17.00
euler112.py 4.57 12.55 12.85 12.45 14.97
euler114.py 0.11 0.21 5.17 0.10 0.11
euler115.py 0.31 0.42 5.58 0.21 0.41
euler116.py 0.11 0.12 4.86 0.11 0.12
euler117.py 0.11 0.12 4.76 0.12 0.11
euler119.py 0.10 0.11 3.74 0.10 0.10
euler120.py 0.11 0.11 3.56 0.12 0.11
euler121.py 0.20 0.30 4.87 0.11 0.32
euler123.py 5.59 5.42 24.12 7.08 7.48
euler124.py 1.24 1.84 9.22 0.62 2.77
euler125.py 1.32 1.52 6.28 1.22 1.11
euler126.py 4.06 15.59 13.75 3.85 17.70
euler128.py 9.50 21.56 26.49 12.63 14.76
euler135.py 3.34 5.36 7.48 2.22 3.34
euler142.py 0.20 0.71 5.57 0.22 0.51
euler143.py 0.11 0.11 4.46 0.11 0.10
euler157.py 0.12 0.12 3.97 0.11 0.12
euler162.py 0.10 0.10 3.44 0.10 0.10
euler171.py 0.10 0.10 3.74 0.11 0.11
euler173.py 0.93 1.12 6.17 0.61 1.53
euler174.py 4.45 4.16 8.08 3.74 3.23
euler181.py 0.10 0.10 3.85 0.11 0.10
euler190.py 0.10 0.10 2.43 0.10 0.10
euler193.py 0.10 0.10 2.12 0.10 0.10
euler202.py 0.10 0.10 2.74 0.10 0.10
euler205.py 0.92 0.72 10.16 1.03 0.72
euler207.py 0.21 0.83 6.29 0.32 1.43
euler222.py 0.10 0.10 3.44 0.10 0.10
euler233.py 0.10 0.10 3.35 0.10 0.10
euler234.py 4.46 9.00 19.12 7.71 7.48
euler235.py 0.20 0.20 5.47 0.32 0.22
euler240.py 14.57 14.52 48.67 12.14 22.65
total 250.37 509.72 1211.07 282.73 569.37
wins 68 33 0 59 51

Programming
Python


Three Pathological cases for the PyPy JIT

At PyCon, Maciej from the PyPy team asked for programs that the PyPy JIT did a really bad job on.

I have a directory full of over 200 little Python programs that attempt to solve Project Euler math problems. Only 186 of them get the right answer, and fewer still run in a reasonable amount of time. But this led me to concoct a quick benchmark comparing the Python implementations installed on my computer: CPython, CPython with Psyco, Unladen Swallow, and PyPy. (I haven't tested them with Jython or IronPython yet. Maybe someday.)

Basically, I just run all of euler*.py under the four Python implementations, record how long each implementation takes to run each program, and throw out the results for any programs that didn't work in all four versions of Python, or that didn't finish in 60 seconds or less on all of them.
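
That harness can be sketched like this. The real script is unpublished, so the interpreter names and timing approach here are my assumptions:

```python
# Hypothetical benchmark harness: run each euler*.py under several
# interpreters, record wall-clock time, and keep a script only if every
# interpreter runs it successfully within the 60-second budget.
import glob
import os
import subprocess
import time

TIMEOUT = 60.0

def time_one(interpreter, script):
    """Return wall-clock runtime in seconds, or None if the script fails
    or exceeds TIMEOUT under this interpreter."""
    start = time.time()
    with open(os.devnull, "wb") as devnull:
        proc = subprocess.Popen([interpreter, script],
                                stdout=devnull, stderr=devnull)
        while proc.poll() is None:          # poll instead of blocking wait
            if time.time() - start > TIMEOUT:
                proc.kill()
                proc.wait()
                return None
            time.sleep(0.05)
    if proc.returncode != 0:                # wrong answer / crash
        return None
    return time.time() - start

def benchmark(interpreters):
    """Map script name -> list of runtimes (one per interpreter),
    dropping any script that failed or timed out under any of them."""
    results = {}
    for script in sorted(glob.glob("euler*.py")):
        times = [time_one(interp, script) for interp in interpreters]
        if all(t is not None for t in times):
            results[script] = times
    return results
```

Note that this measures whole-process wall-clock time, so it includes interpreter startup; that matters for implementations with big startup costs.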

Here are the results, for recent svn checkouts of PyPy and Unladen, CPython 2.6.4, and CPython 2.6.4 plus Psyco 1.6. (Don't get too obsessed with the exact results, since it's just a private benchmark and most of the code is unpublished. This is just background for the real information at the bottom of the post.)

script pypy unladen psyco CPython
euler1.py 0.66 0.60 0.11 0.61
euler2.py 0.10 0.10 0.10 0.40
euler3.py 0.21 0.51 0.31 0.41
euler4.py 0.20 0.51 0.20 0.31
euler5.py 0.10 0.10 0.10 0.10
euler6.py 0.10 0.10 0.10 0.10
euler7.py 0.11 0.20 0.10 0.31
euler8.py 0.10 0.10 0.10 0.10
euler9.py 0.10 0.20 0.10 0.20
euler10.py 2.23 3.98 1.74 7.73
euler11.py 0.11 0.10 0.11 0.10
euler13.py 0.10 0.10 0.10 0.10
euler14.py 44.79 2.64 1.33 2.96
euler15.py 0.10 0.10 0.10 0.10
euler16.py 0.10 0.10 0.11 0.10
euler18.py 0.10 0.10 0.10 0.10
euler19.py 0.10 0.10 0.10 0.10
euler20.py 0.10 0.10 0.10 0.10
euler21.py 0.10 0.20 0.10 0.20
euler22.py 0.10 0.10 0.10 0.10
euler23.py 4.55 20.96 4.08 10.50
euler24.py 6.77 4.18 7.88 4.65
euler25.py 0.61 0.81 0.82 0.81
euler26.py 8.47 10.10 9.09 10.41
euler27.py 3.06 6.37 1.23 8.64
euler28.py 0.10 0.10 0.10 0.10
euler29.py 0.10 0.10 0.10 0.10
euler30.py 3.04 3.55 5.62 3.85
euler32.py 2.96 3.34 4.07 3.57
euler33.py 0.10 0.11 0.10 0.10
euler34.py 7.08 11.96 11.12 12.96
euler35.py 5.97 18.01 9.91 25.38
euler36.py 1.83 2.93 3.24 2.83
euler37.py 11.23 10.83 17.89 12.13
euler38.py 1.92 1.22 1.63 1.42
euler39.py 0.10 0.30 0.10 0.20
euler40.py 0.81 0.91 0.51 0.71
euler41.py 4.18 3.54 4.86 4.05
euler42.py 0.10 0.10 0.10 0.10
euler43.py 36.69 27.91 37.50 29.19
euler44.py 1.02 7.98 0.72 4.50
euler45.py 6.95 2.54 2.36 2.45
euler46.py 0.21 0.91 0.21 1.23
euler47.py 0.71 1.92 0.61 2.97
euler48.py 0.10 0.21 0.21 0.10
euler49.py 0.31 1.02 0.72 0.72
euler51.py 22.68 27.90 49.11 31.86
euler52.py 0.71 0.81 0.91 1.11
euler53.py 0.20 0.20 0.20 0.31
euler54.py 0.30 0.20 0.21 0.21
euler56.py 0.83 0.71 0.92 0.62
euler57.py 0.41 0.51 0.52 0.72
euler58.py 6.57 37.76 6.77 58.48
euler59.py 12.85 6.88 9.81 15.98
euler61.py 0.10 0.20 0.31 0.21
euler62.py 0.31 0.20 0.41 0.20
euler63.py 0.21 0.10 0.20 0.20
euler64.py 10.93 38.67 44.00 59.01
euler65.py 0.10 0.10 0.11 0.11
euler66.py 2.43 8.49 9.82 12.13
euler67.py 0.11 0.11 0.10 0.11
euler68.py 0.10 0.10 0.10 0.10
euler69.py 0.10 0.11 0.10 0.11
euler70.py 0.41 0.73 1.03 0.61
euler71.py 0.32 1.53 0.31 1.31
euler72.py 7.85 32.59 9.77 56.56
euler73.py 9.19 27.32 2.87 25.50
euler75.py 5.15 2.22 0.61 2.32
euler77.py 0.31 0.32 0.23 0.51
euler79.py 0.20 0.20 0.10 0.10
euler80.py 0.71 0.71 0.71 0.61
euler81.py 0.10 0.20 0.10 0.10
euler82.py 0.71 0.31 0.30 0.20
euler83.py 0.20 0.61 0.41 0.51
euler84.py 2.93 14.24 4.76 17.89
euler85.py 6.67 7.20 11.16 11.84
euler87.py 1.63 1.32 0.41 0.82
euler89.py 0.10 0.10 0.10 0.10
euler93.py 3.13 5.26 5.31 11.75
euler94.py 34.40 26.40 26.30 27.80
euler97.py 3.44 4.56 2.35 3.34
euler98.py 0.30 0.61 0.51 0.51
euler99.py 0.10 0.10 0.10 0.10
euler100.py 0.10 0.10 0.10 0.10
euler101.py 0.20 0.10 0.10 0.10
euler102.py 0.10 0.10 0.10 0.10
euler103.py 0.10 0.10 0.10 0.10
euler105.py 0.61 0.30 0.81 0.30
euler106.py 0.61 0.51 0.81 0.31
euler107.py 0.20 0.41 0.20 0.20
euler108.py 3.77 13.74 7.14 9.99
euler109.py 0.31 1.22 0.31 1.64
euler111.py 3.27 15.82 4.19 15.28
euler112.py 5.96 12.45 11.42 13.50
euler114.py 0.12 0.21 0.10 0.11
euler115.py 0.22 0.41 0.30 0.62
euler116.py 0.11 0.11 0.10 0.11
euler117.py 0.10 0.10 0.10 0.11
euler119.py 0.10 0.10 0.10 0.10
euler120.py 0.10 0.10 0.10 0.10
euler121.py 0.10 0.20 0.10 0.20
euler123.py 5.87 5.26 8.78 7.05
euler124.py 0.82 1.83 0.71 2.43
euler125.py 0.93 1.42 1.33 1.32
euler135.py 28.39 5.22 2.65 4.19
euler142.py 0.20 0.81 0.11 0.41
euler157.py 0.10 0.10 0.10 0.10
euler162.py 0.10 0.10 0.10 0.10
euler171.py 0.12 0.11 0.10 0.10
euler173.py 0.92 1.42 1.02 1.01
euler174.py 35.72 4.75 2.96 3.27
euler181.py 0.10 0.10 0.11 0.10
euler190.py 0.10 0.10 0.10 0.10
euler193.py 0.10 0.10 0.10 0.10
euler202.py 0.10 0.10 0.10 0.10
euler205.py 1.83 0.72 2.35 0.71
euler207.py 0.11 0.53 0.32 0.42
euler222.py 0.10 0.10 0.10 0.10
euler233.py 0.10 0.10 0.11 0.10
euler234.py 5.77 6.62 7.38 7.48
euler235.py 0.20 0.20 0.42 0.20
euler240.py 16.17 14.96 17.38 19.59
total 409.30 492.26 393.54 593.80
wins 77 48 67 45

So all the JITs are faster than CPython for this set of programs, but none is twice as fast. PyPy has the most wins, but Psyco has the lowest total time.

While PyPy does very well overall, it's many times slower than all the others on three of the programs: 14, 135, and 174. If PyPy fixed whatever makes those three slow, it would definitely be the overall leader.

Here's my euler14.py:

#!/usr/bin/env python

"""Project Euler, problem 14

The following iterative sequence is defined for the set of positive integers:

n -> n / 2 (n is even)
n -> 3n + 1 (n is odd)

Which starting number, under one million, produces the longest chain?
"""

def next_num(num):
    if num & 1:
        return 3 * num + 1
    else:
        return num // 2

MAX_NUM = 1000000

lengths = {1: 0}

def series_length(num):
    global lengths
    if num in lengths:
        return lengths[num]
    else:
        num2 = next_num(num)
        result = 1 + series_length(num2)
        lengths[num] = result
        return result

def main():
    num_with_max_length = 1
    max_length = 0
    for ii in range(1, MAX_NUM):
        length = series_length(ii)
        if length > max_length:
            max_length = length
            num_with_max_length = ii
    print num_with_max_length, max_length

if __name__ == "__main__":
    main()

Why is 14 so slow? My guess would be that it's because it heavily mutates a global dictionary.
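
If the global dict (or the deep recursion) is what trips up PyPy, here's one hypothetical rework to test that theory. This is not the code I benchmarked; it keeps the cache in a local variable and walks each chain iteratively:

```python
# Hypothetical variant of euler14: the same memoized Collatz-length
# search, but with a local cache instead of a mutated global dict, and
# an explicit chain list instead of recursion.
def longest_collatz(limit):
    lengths = {1: 0}            # local cache: number -> chain length
    best_num, best_len = 1, 0
    for start in range(1, limit):
        chain = []
        num = start
        while num not in lengths:       # walk until we hit a cached value
            chain.append(num)
            num = 3 * num + 1 if num & 1 else num // 2
        length = lengths[num]
        for n in reversed(chain):       # fill in the cache on the way back
            length += 1
            lengths[n] = length
        if lengths[start] > best_len:
            best_num, best_len = start, lengths[start]
    return best_num, best_len
```

If this version is fast under PyPy and the original isn't, that would point at the global-dict mutation or the recursion rather than the arithmetic.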

Here's euler135.py:

#!/usr/bin/env python

"""Project Euler, problem 135

Given the positive integers, x, y, and z, are consecutive terms of an
arithmetic progression, the least value of the positive integer, n, for which
the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27:

34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27

It turns out that n = 1155 is the least value which has exactly ten solutions.

How many values of n less than one million have exactly ten distinct solutions?
"""

"""
x**2 - y**2 - z**2 = n
y = z + d
x = z + 2 * d
d > 0
z > 0
(z + 2 d) ** 2 - (z + d) ** 2 - z ** 2 = n
(z**2 + 4dz + 4d**2) - (z**2 + 2dz + d**2) - z**2 = n
z**2 + 4dz + 4d**2 -z**2 - 2dz - d**2 - z**2 = n
-z**2 + 2dz + 3d**2 - n = 0
a = -1
b = 2d
c = 3d**2 - n
z = (-b +/- sqrt(b**2-4ac)) / 2a
z = (-2d +/- sqrt(4d**2+4(3d**2-n))) / -2
z = (-2d +/- sqrt(4d**2+12d**2-4n)) / -2
z = (-2d +/- sqrt(16d**2-4n)) / -2
z = (-2d +/- 2 sqrt(4d**2 - n)) / -2
z = (-d +/-  sqrt(4d**2 - n)) / -1
z = d +/- sqrt(4d**2 - n)
4d**2 - n >= 0
n <= 4d**2
4d**2 - n is a perfect square
"""

def main():
    limit = 1000000
    counts = limit * [0]
    for d in range(1, limit / 4 + 1):
        d24 = d ** 2 * 4
        if d24 < limit:
            start = 1
            counts[d24] += 1
        else:
            start = int((d24 - limit) ** 0.5)
        if start < 1:
            start = 1
        for root in xrange(start, 2 * d):
            n = d24 - root ** 2
            if n <= 0:
                break
            if n < limit:
                z = root + d
                y = z + d
                x = y + d
                #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                #assert x**2 - y**2 - z**2 == n
                counts[n] += 1
                if d > root:
                    z = d - root
                    y = z + d
                    x = y + d
                    #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z
                    #assert x**2 - y**2 - z**2 == n
                    counts[n] += 1
    total = 0
    assert counts[0] == 0
    for val in counts:
        if val == 10:
            total += 1
    print total

if __name__ == "__main__":
    main()

Why is PyPy so slow on this one? Well, maybe it's that all the work happens in one long-running function, which would hurt if PyPy doesn't JIT a function while it's still running. Or maybe it's the list with a million elements in it.
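
Both guesses are cheap to test. Here's a hypothetical restructuring (not what I benchmarked) that hoists the hot inner loop into its own small function and swaps the million-element list of boxed Python ints for an array.array of machine ints:

```python
# Hypothetical rework of euler135's main loop: same algorithm, but the
# counts table is an unboxed array.array and the inner loop is a
# separate function.
from array import array

def solution_counts(limit):
    """Return counts[n] = number of solutions for each n < limit."""
    counts = array('l', [0]) * limit    # like limit * [0], but machine ints
    for d in range(1, limit // 4 + 1):
        d24 = 4 * d * d
        if d24 < limit:
            start = 1
            counts[d24] += 1            # the root == 0 solution, z == d
        else:
            start = max(1, int((d24 - limit) ** 0.5))
        add_solutions(counts, d, d24, start, limit)
    return counts

def add_solutions(counts, d, d24, start, limit):
    # For root in [start, 2d), n = 4d**2 - root**2 is positive; each root
    # gives the solution z = d + root, plus z = d - root when positive.
    for root in range(start, 2 * d):
        n = d24 - root * root
        if n < limit:
            counts[n] += 1
            if d > root:
                counts[n] += 1
```

If the array version is no faster, the boxed-int theory is out and the single-big-function theory looks more likely.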

Finally, euler174.py:

#!/usr/bin/env python

"""Project Euler, problem 174

We shall define a square lamina to be a square outline with a square "hole" so
that the shape possesses vertical and horizontal symmetry.

Given eight tiles it is possible to form a lamina in only one way: 3x3 square
with a 1x1 hole in the middle. However, using thirty-two tiles it is possible
to form two distinct laminae.

If t represents the number of tiles used, we shall say that t = 8 is type L(1)
and t = 32 is type L(2).

Let N(n) be the number of t <= 1000000 such that t is type L(n); for example,
N(15) = 832.

What is sum(N(n)) for 1 <= n <= 10?
"""

from math import ceil
from collections import defaultdict

def gen_laminae(limit):
    """Yield laminae with up to limit squares, as tuples 
    (outer, inner, squares)"""
    for outer in range(3, limit // 4 + 2):
        if outer & 1:
            min_min_inner = 1
        else:
            min_min_inner = 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if outer & 1 != min_inner & 1:
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            yield (outer, inner, squares)

def main():
    squares_to_count = defaultdict(int)
    for (outer, inner, squares) in gen_laminae(1000000):
        squares_to_count[squares] += 1
    histogram = defaultdict(int)
    for val in squares_to_count.values():
        if val <= 10:
            histogram[val] += 1
    print sum(histogram.values())

if __name__ == "__main__":
    main()

Perhaps this one is slow because it heavily uses generators and a defaultdict?
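
One hypothetical way to test that guess (again, not the code I benchmarked): inline the generator into a plain nested loop and use an ordinary dict instead of defaultdict:

```python
# Hypothetical de-generatored euler174: the same enumeration of laminae,
# but with gen_laminae() folded into one plain loop and a regular dict.
from math import ceil

def lamina_counts(limit):
    """Map t -> number of distinct laminae using exactly t tiles (t <= limit)."""
    counts = {}
    for outer in range(3, limit // 4 + 2):
        min_min_inner = 1 if outer & 1 else 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if outer & 1 != min_inner & 1:   # keep inner the same parity as outer
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            counts[squares] = counts.get(squares, 0) + 1
    return counts
```

If this runs fast under PyPy while the generator version doesn't, the generator overhead theory looks right.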

Anyway, I'm not claiming any of these are great programs. I know others have much better solutions to these problems. And this code is certainly not optimized for PyPy. They're just examples of small programs that were originally written for CPython, and do work correctly on PyPy, but happen to be much slower on PyPy than on CPython.

Again, note these are the exceptions not the rule. PyPy runs most pure-Python code great. Just not these three programs.

Math
Programming
Python


PyCon 2010

PyCon was in Atlanta this year, which got me away from the record snow in the DC area.  Yay for winter conferences being in warm places.  (Nothing against Chicago, but Chicago conferences should be in the summer.)

Attendance was about 1100, a little more than last year, but nothing like the crazy growth PyCon saw when the economy was strong. I think that Python is still growing but travel and recruiting budgets are down.

The best talk I attended was Raymond Hettinger's.  It was about using the right container classes to solve computationally expensive problems.  He pointed out that every ordered dict implementation except the one he just added to Python 3.1 had O(n) deletes due to using a list for storing the order, while his had O(1) deletes because he used a linked list with a dict index.  I hung my head in shame because I've written odict twice (once at a former employer, again from scratch for Slugathon) and done it "wrong" both times.
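
The data structure he described can be sketched in a few lines. This is my hypothetical minimal version, not Raymond's actual code: each key maps to a node in a circular doubly linked list that records insertion order, so a delete just unlinks one node instead of scanning a list.

```python
class MiniOrderedDict:
    """Toy ordered mapping with O(1) deletes: a dict of key -> node, plus
    a circular doubly linked list of [key, value, prev, next] nodes
    holding the insertion order."""

    def __init__(self):
        self._map = {}
        self._root = root = []              # sentinel node
        root[:] = [None, None, root, root]  # [key, value, prev, next]

    def __setitem__(self, key, value):
        if key in self._map:
            self._map[key][1] = value
            return
        root = self._root
        last = root[2]
        node = [key, value, last, root]     # link new node before the sentinel
        last[3] = root[2] = node
        self._map[key] = node

    def __getitem__(self, key):
        return self._map[key][1]

    def __delitem__(self, key):
        node = self._map.pop(key)           # O(1): unlink, no list scan
        node[2][3] = node[3]
        node[3][2] = node[2]

    def keys(self):
        node = self._root[3]
        while node is not self._root:
            yield node[0]
            node = node[3]
```

As I understood the talk, this is essentially what his Python 3.1 odict does, versus the O(n) list.remove() that the naive list-of-keys implementations (mine included) end up doing.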

The most useful thing I attended was the Twisted Open Space, because my employer really wants IPv6 in Twisted, and I've submitted a patch but not had it accepted because of backward-compatibility concerns.  Actually talking to most of the core Twisted team at once in person really helped clarify what we need to do to break the logjam and get the patch moving forward again.  That ten minutes probably justified the cost of sending me to PyCon this year.  (And I wish I could go to the sprints and maybe actually get this change into Twisted this week, but I can't.  Next year I really want to go to at least a couple of sprint days.)

Rackspace Cloud was there as a sponsor with a booth, which reminded me that I should have blogged about my experience using Rackspace Cloud.  Basically, they stood up a small virtual Ubuntu server (they have other choices too) for me in about 5 minutes, for about $12 per month (plus bandwidth; more money for more memory), and it just stinking worked.  I've since turned it off because Slugathon isn't done yet so I don't really need a dedicated game server yet, but I'll definitely be back when it is.  The only negative is that there's no way to set up a cap at which the server turns itself off, so if you get Slashdotted or DoS attacked you may get a high bandwidth bill.

Negatives:

Airline travel sucks.  You already know this.

Guido's keynote was just taking questions via Twitter, and the signal-to-noise ratio was awful.  (Yes, I'm an old Angry Unix Guy.  Get off my lawn and take your Twitter and your Facebook and your iPhone with you.)  If Guido doesn't want to do a real keynote, that's fine; why not let someone else have the slot?

I'm no longer in favor of invited talks, because one of them was just content-free pattern metababble that never would have been accepted if the speaker had had to do a proposal.  (But, in fairness, all the other invited talks I attended were excellent.)

There were several talks that I really enjoyed and thought were great fun, but where I didn't really learn anything.  So they validated my existing opinions but didn't stretch my brain at all.  (Larry Hastings actually pointed out before his talk about micro-optimizations that it was just nerd porn and that it would be entertaining but nobody would learn much.  I applaud him for his honesty.)  I guess that's natural when you've been doing something for a long time and have attended the same conference a bunch of times.  I need to attend fewer talks and spend more time just talking to people.

The board game social wasn't as awesome as last year because there wasn't a huge pile of games to pick from.  (I think last year a game store donated some games, and you can't expect that to happen every year.)  So I will bring at least one game next year.

My 7-year-old laptop lost its WiFi connection whenever things got crowded.  The networking people do their best, but 1000 laptops crammed into a small area means the newer stronger ruder WiFi cards will crowd out the older weaker more-polite ones, no matter how many access points there are.  I plugged into a switch when I could, and lived without WiFi when I couldn't.  My laptop is heavy and has poor battery life anyway, so I think it's time to retire it in favor of a netbook.

Python


Moved Slugathon to github

I've been working on Slugathon on-and-off (but mostly off) since 2003. The bare minimum feature set that I consider necessary to do a serious alpha release is almost done. (Release early, release often; but if you release before you actually do anything useful you might just leave a bad first impression and scare off your potential users.) So I've been thinking about installers, and portability to MacOS and Windows, and whether my current hosting solution (svn and Trac provided for free to open source Python projects by WebFaction; thanks WebFaction) would cut it when I had users and possibly more contributors.

I like Trac, a lot. I now strongly prefer Git to Subversion. You can use Git and Trac together with the GitTracPlugin, but it's hard to find a host that has it installed. (SourceForge, for example, now offers both Git and Trac, but not together.) So I decided to go with github. They're the most popular Git host, their paid hosting business seems to be solid enough that they're unlikely to go away anytime soon, they offer free hosting for open source projects, and they have a wiki and an issue tracker.

So this morning I experimentally moved everything from Subversion+Trac to github. I've administered both Svn/Trac and Git/Trac installations at work, so I knew how to do this on a local machine where I had control, but doing things locally is different from doing things on a remote hosting service.

Here's what I did on github:

1. Signed up for an account. (I've poked around github plenty of times, and cloned other people's repositories there to my local box, but I never needed to register until I actually wanted to move my own repository there. Which is exactly the way it should work.)

2. Uploaded an ssh key. I already know ssh so it was just a matter of posting ~/.ssh/id_rsa.pub into a web form.

3. Re-cloned my Subversion repository to my local box with git-svn, using the --no-metadata option to remove all the ugly "git-svn-id" blobs in the log. I'd done this before when we switched from Subversion to Git at work, but I didn't remember the exact syntax. Luckily it was right there on Github's help page so I didn't even need to check a Git man page.

4. Uploaded my new local Git repository to Github.

5. Realized that I'd forgotten to use the --authors-file option to convert the author metadata from Subversion format (bare username) to Git format (firstname lastname email). Oops, this was also on github's help page but I'd been in too much of a hurry. I poked around github until I found out how to delete a repository, deleted the one I'd just made, redid the git-svn clone with the missing option, and re-uploaded my repository.

6. Basically cut-and-pasted my wiki pages (there were only 8 of them) from Trac to Github's wiki. I did a little tweaking to convert formatting from Trac format to the Textile format that Github uses, but didn't obsess about getting every little format perfect. One thing that github's wiki doesn't support well is images; they let you link to external images, but not post attachments directly into the wiki. So for now the thumbnail screenshots are there, but they don't link to the larger versions.

7. Manually ported all my Trac tickets to github issues. I had 62 tickets, 30 closed and 32 open. Both Trac and the github issue tracker share the same #number format for referring to issues, and I mention ticket numbers inside commit messages, so I thought it was necessary to add all the already-closed issues for historical reasons. I also changed any references to Subversion commit numbers to Git commit ids inside the issues. This was an annoying data entry task, but with only 62 tickets it was faster to just do it by hand than to write some awesome Trac to github conversion script. (Which would require either cooperation from github, or advanced web scraping, since github is a rather fancy web site with JavaScript everywhere.)

8. Changed Slugathon's SourceForge page to point at github rather than WebFaction. (I'm keeping a SourceForge presence for the project because they offer useful things that Github does not, like project mailing lists. Also, if SourceForge ever integrates Git and Trac and I become dissatisfied with Github's new and fairly minimal issue tracker, I might move everything to SourceForge.)

9. Changed the main wiki page at WebFaction to point to github, too. (I will eventually ask WebFaction to delete the project, but first I want to make sure that the move was a good idea. And give Google enough time to cache the version of the page that shows the redirect to github, so people don't think the project just disappeared.)

10. Created a static web page using the instructions at pages.github.com. This feels somewhat redundant with the github wiki, but it gave me a place to park my images.

My conclusion so far: I still like Git a lot more than Subversion. I like github's wiki and issue tracker less than Trac, but they seem to be good enough.

Games
Programming
Python
