Postmortems

Jordan
Transmission Developer
Posts: 2312
Joined: Sat May 26, 2007 3:39 pm
Location: Titania's Room

Postmortems

Post by Jordan »

This is an experiment to document how some bugs make it into Transmission releases. Maybe by doing this, future releases can make fun new mistakes instead of the same old ones. Or maybe not. Who knows?

Postmortem: Transmission 1.80

Post by Jordan »

http://trac.transmissionbt.com/ticket/2781

The biggest bug was in the new feature of announcing to one tracker per tier instead of one tracker per torrent. getaddrinfo() is a blocking call, so the extra calls to it in the same thread as peer IO caused libtransmission to choke at regular intervals. This was exacerbated by having dozens of trackers in a torrent with several unresolvable hostnames.

This behavior existed in the beta releases, but was not reported by testers.

What to do better next time:
(1) Test Transmission with torrents whose tracker lists are long and polluted.
(2) Test Transmission when using a very slow DNS server.
(3) Make it easier for beta users to report bugs. Thousands of beta testers and nobody saw this?

http://trac.transmissionbt.com/ticket/2777

The second most visible bug in 1.80 was that some magnet links didn't parse. It was a stupid bug with an easy fix -- tr_hex_to_sha1() didn't understand uppercase A-F.
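To illustrate the kind of fix involved, here is a minimal sketch of a case-insensitive hex decoder. It is not Transmission's actual tr_hex_to_sha1(); the names hex_nibble() and hex_to_sha1() and the return-code convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA1_LEN 20

/* Hypothetical helper: map one hex digit to 0..15, or -1 if it is not hex.
 * Accepting 'A'-'F' as well as 'a'-'f' is the whole point of the sketch. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical decoder: turn exactly 40 hex characters into 20 bytes.
 * Returns 0 on success, -1 if the input is too short, too long, or
 * contains a non-hex character. */
static int hex_to_sha1(const char *hex, uint8_t out[SHA1_LEN])
{
    assert(hex != NULL && out != NULL);

    for (size_t i = 0; i < SHA1_LEN; ++i) {
        /* Check the high nibble first so we never read past a short string. */
        int hi = hex_nibble(hex[2 * i]);
        if (hi < 0) return -1;
        int lo = hex_nibble(hex[2 * i + 1]);
        if (lo < 0) return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }

    /* Reject trailing garbage after the 40th character. */
    return hex[2 * SHA1_LEN] == '\0' ? 0 : -1;
}
```

Feeding the same hash in lowercase, uppercase, and mixed case, plus a few truncated and malformed strings, is the sort of test pile that would have caught this before release.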

What to do better next time:
(4) When accepting new forms of input from outside, pile on the test cases.

Postmortem: Transmission 1.81

Post by Jordan »

http://trac.transmissionbt.com/ticket/2793

1.81 attempted to fix #2781 by using libevent's evdns mechanism to resolve announce hostnames without blocking, and hacking that resolved name into the URL that we pass to libcurl. This ugly hack doesn't solve lookups from redirects, but on the whole seems to work very well.

The problem was that the Host: header implemented for #2781 didn't include the original port number, which drove at least one tracker crazy.

What to do better next time:
(5) When implementing behavior from a well-defined spec, read the spec. :)
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.

Postmortem: Transmission 1.82

Post by Jordan »

http://trac.transmissionbt.com/ticket/2783

1.82 added the Host: port back in for #2793, but some hosts (such as update.transmissionbt.com) didn't like port 80 being explicit. Is this the server's fault? Maybe so, but if it breaks on our server it probably breaks elsewhere too.
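A rough sketch of the rule that falls out of #2793 and this bug together: include the port in the Host: header unless it is the default one. This is not the code that shipped; format_host_header() is a hypothetical helper, and it assumes plain http announces, where the default port is 80.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a "Host:" header line into buf.
 * The port is omitted when it is 80 (the http default, assumed here)
 * and included otherwise.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_host_header(char *buf, size_t buflen,
                              const char *host, int port)
{
    assert(buf != NULL && host != NULL);

    int n = (port == 80)
        ? snprintf(buf, buflen, "Host: %s", host)
        : snprintf(buf, buflen, "Host: %s:%d", host, port);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

With this, a tracker at tracker.example.com:6969 gets "Host: tracker.example.com:6969", while a server on port 80 gets the bare hostname.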

What to do better next time:
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.

http://trac.transmissionbt.com/ticket/2792

1.82 continued to freeze on DNS lookups because we didn't handle the case of hostnames that couldn't be resolved -- we passed those URLs to libcurl unchanged. This was fixed by (1) immediately failing any web tasks whose hostnames were unresolvable, and (2) caching DNS failures as well as successes.
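The idea of caching failures alongside successes can be sketched roughly as follows. This is not libtransmission's implementation: the fixed-size table, the names dns_cache_store() and dns_cache_lookup(), the TTL parameter, and the crude eviction are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SIZE 64
#define HOST_MAX   256
#define ADDR_MAX   64

/* One cached lookup result. A failed lookup is stored too, with resolved == false. */
struct dns_entry {
    bool   used;
    bool   resolved;
    time_t expires;
    char   host[HOST_MAX];
    char   addr[ADDR_MAX];
};

enum dns_status { DNS_MISS, DNS_HIT_OK, DNS_HIT_FAILED };

static struct dns_entry cache[CACHE_SIZE];

/* Record a result. Pass addr == NULL to remember that the lookup failed. */
static void dns_cache_store(const char *host, const char *addr,
                            time_t now, int ttl_secs)
{
    assert(host != NULL);
    struct dns_entry *slot = NULL;

    for (size_t i = 0; i < CACHE_SIZE; ++i) {
        struct dns_entry *e = &cache[i];
        if (e->used && strcmp(e->host, host) == 0) { slot = e; break; }
        if (slot == NULL && (!e->used || e->expires <= now)) slot = e;
    }
    if (slot == NULL) slot = &cache[0]; /* table full: crude eviction */

    slot->used = true;
    slot->resolved = (addr != NULL);
    slot->expires = now + ttl_secs;
    snprintf(slot->host, sizeof slot->host, "%s", host);
    snprintf(slot->addr, sizeof slot->addr, "%s", addr ? addr : "");
}

/* DNS_HIT_FAILED lets the caller fail the web task immediately
 * instead of handing the unresolved URL to the HTTP layer. */
static enum dns_status dns_cache_lookup(const char *host, time_t now,
                                        const char **addr_out)
{
    assert(host != NULL);
    for (size_t i = 0; i < CACHE_SIZE; ++i) {
        const struct dns_entry *e = &cache[i];
        if (e->used && e->expires > now && strcmp(e->host, host) == 0) {
            if (!e->resolved) return DNS_HIT_FAILED;
            if (addr_out != NULL) *addr_out = e->addr;
            return DNS_HIT_OK;
        }
    }
    return DNS_MISS;
}
```

A shorter TTL for failures than for successes seems like a reasonable choice, so a tracker that comes back online is retried fairly soon while a dead hostname still can't stall every announce cycle.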

What to do better next time:
(1) Test Transmission with torrents whose tracker lists are long and polluted.
(2) Test Transmission when using a very slow DNS server.
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.
kovalev
Posts: 24
Joined: Fri Oct 16, 2009 1:09 pm

Re: Postmortems

Post by kovalev »

I couldn't agree more with (3) - stricter, easier-to-find bug reporting rules would be less puzzling to ordinary users and would benefit the development team as well.