Postmortems
This is an experiment to document how some bugs make it into Transmission releases. Maybe by doing this, future releases can make fun new mistakes instead of the same old ones. Or maybe not. Who knows?
Postmortem: Transmission 1.80
http://trac.transmissionbt.com/ticket/2781
The biggest bug was in the new feature of announcing to one tracker per tier instead of one tracker per torrent. getaddrinfo() is a blocking call, so the extra calls to it in the same thread as peer IO caused libtransmission to choke at regular intervals. This was made worse by torrents with dozens of trackers, several of which had unresolvable hostnames.
This behavior existed in the beta releases, but no testers reported it.
What to do better next time:
(1) Test Transmission with torrents whose tracker lists are long and polluted.
(2) Test Transmission when using a very slow DNS server.
(3) Make it easier for beta users to report bugs. Thousands of beta testers and nobody saw this?
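To illustrate why the blocking call hurts, here is a sketch of one general way to keep getaddrinfo() off the peer-IO thread: run it on a worker thread and collect the result later. This is not what Transmission did (1.81 switched to libevent's evdns instead), and struct dns_job and the dns_job_* functions are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical lookup request, owned by the caller. */
struct dns_job {
    char host[256];
    char port[16];
    int err;                  /* getaddrinfo() return code; 0 on success */
    struct addrinfo *result;  /* caller must freeaddrinfo() on success */
    pthread_t thread;
};

/* Runs on the worker thread, so only this thread ever blocks in getaddrinfo(). */
static void *dns_job_run(void *arg)
{
    struct dns_job *job = arg;
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    job->result = NULL;
    job->err = getaddrinfo(job->host, job->port, &hints, &job->result);
    return NULL;
}

/* Called from the IO thread: starts the lookup and returns immediately.
 * Returns 0 if the worker thread was started. */
int dns_job_start(struct dns_job *job, const char *host, const char *port)
{
    memset(job, 0, sizeof *job);
    strncpy(job->host, host, sizeof job->host - 1);
    strncpy(job->port, port, sizeof job->port - 1);
    return pthread_create(&job->thread, NULL, dns_job_run, job);
}

/* Joins the worker and returns the getaddrinfo() code. A real event loop
 * would call this only after the worker signals completion (for example
 * through a pipe the loop watches), so the IO thread never waits here. */
int dns_job_finish(struct dns_job *job)
{
    pthread_join(job->thread, NULL);
    return job->err;
}
```

The design choice is simply "whoever blocks, it isn't the thread moving peer data"; a slow or dead DNS server then delays one announce instead of stalling every transfer.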
http://trac.transmissionbt.com/ticket/2777
The second most visible bug in 1.80 was that some magnet links didn't parse. It was a stupid bug with an easy fix: tr_hex_to_sha1() didn't understand uppercase A-F.
What to do better next time:
(4) When accepting new forms of input from outside, pile on the test cases.
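Lesson (4) in practice: a minimal sketch of a hex decoder that accepts both letter cases. hex_to_sha1_sketch() is a hypothetical stand-in, not the real tr_hex_to_sha1() signature; it assumes the input holds the 40-character hex form of a SHA-1 info hash, as found in many magnet links.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps one hex digit to its value, accepting both 'a'-'f' and 'A'-'F'.
 * Returns -1 for anything that is not a hex digit (including '\0'). */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10; /* the case #2777 missed */
    return -1;
}

/* Hypothetical stand-in for tr_hex_to_sha1(): decodes the first 40 hex
 * characters of hex into a 20-byte digest. Returns 0 on success, -1 on
 * malformed or short input. Each digit is checked before the next one is
 * read, so a short string never causes a read past its terminator. */
int hex_to_sha1_sketch(uint8_t out[20], const char *hex)
{
    for (size_t i = 0; i < 20; ++i) {
        int hi = hex_digit_value(hex[2 * i]);
        if (hi < 0)
            return -1;
        int lo = hex_digit_value(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}
```

The cases worth piling on are exactly the boring ones: all lowercase, all uppercase, mixed case, too short, and non-hex characters.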
Postmortem: Transmission 1.81
http://trac.transmissionbt.com/ticket/2793
1.81 attempted to fix #2781 by using libevent's evdns mechanism to resolve announce hostnames without blocking, then hacking the resolved address into the URL that we pass to libcurl. This ugly hack doesn't cover lookups triggered by redirects, but on the whole it seems to work very well.
The problem was that the Host: header implemented for #2781 didn't include the original port number, which drove at least one tracker crazy.
What to do better next time:
(5) When implementing behavior from a well-defined spec, read the spec.
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.
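The URL splicing described for 1.81 might look roughly like this. rewrite_url_for_ip() is a hypothetical helper, not Transmission's code, and it ignores IPv6 literals and user:password@ prefixes. It copies the Host: value straight from the original URL's authority, so a port such as :6969 survives; RFC 2616 section 14.23 defines Host as host [ ":" port ], taken from the original URI.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Given "scheme://host[:port][/path...]" and an address
 * that host already resolved to, writes:
 *   new_url  - the same URL with host replaced by ip (port and path kept)
 *   host_hdr - the original "host[:port]", for the Host: header
 * Returns 0 on success, -1 on unparseable input or a too-small buffer.
 * IPv6 literals and "user:password@" prefixes are not handled. */
int rewrite_url_for_ip(const char *url, const char *ip,
                       char *new_url, size_t new_url_len,
                       char *host_hdr, size_t host_hdr_len)
{
    const char *authority = strstr(url, "://");
    if (authority == NULL)
        return -1;
    authority += 3;

    /* The authority runs until the path, query, or fragment starts. */
    size_t auth_len = strcspn(authority, "/?#");
    if (auth_len == 0)
        return -1;

    /* Everything from the port's ':' (or from the path, if there is no
     * port) is copied unchanged after the new address. */
    const char *colon = memchr(authority, ':', auth_len);
    const char *tail = colon != NULL ? colon : authority + auth_len;

    int n = snprintf(new_url, new_url_len, "%.*s%s%s",
                     (int)(authority - url), url, ip, tail);
    if (n < 0 || (size_t)n >= new_url_len)
        return -1;

    /* Host: keeps the port exactly as the original URL spelled it. */
    n = snprintf(host_hdr, host_hdr_len, "%.*s", (int)auth_len, authority);
    if (n < 0 || (size_t)n >= host_hdr_len)
        return -1;
    return 0;
}
```

For example, http://tracker.example.com:6969/announce with address 192.0.2.1 becomes http://192.0.2.1:6969/announce, sent with Host: tracker.example.com:6969.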
Postmortem: Transmission 1.82
http://trac.transmissionbt.com/ticket/2783
1.82 added the Host: port back in for #2793, but some hosts (such as update.transmissionbt.com) didn't like port 80 being explicit. Is this the server's fault? Maybe so, but if it breaks on our server it probably breaks elsewhere too.
What to do better next time:
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.
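One way to satisfy both #2793 and this ticket is to append the port only when it differs from the scheme's default, since a Host: value without a port already implies the default (80 for http). A minimal sketch, assuming the URL has already been parsed into scheme, host, and a numeric port; build_host_header() is a hypothetical name, and only http and https are handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a Host: header value from an already-parsed
 * URL. The port is appended only when it differs from the scheme's default,
 * so http on port 80 yields "example.com", not "example.com:80".
 * Assumes scheme is "http" or "https".
 * Returns 0 on success, -1 if the output buffer is too small. */
int build_host_header(char *out, size_t out_len,
                      const char *scheme, const char *host, int port)
{
    int default_port = strcmp(scheme, "https") == 0 ? 443 : 80;
    int n;

    if (port == default_port)
        n = snprintf(out, out_len, "%s", host);
    else
        n = snprintf(out, out_len, "%s:%d", host, port);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```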
http://trac.transmissionbt.com/ticket/2792
1.82 continued to freeze on DNS lookups because we didn't handle hostnames that couldn't be resolved: we passed those URLs to libcurl unchanged. This was fixed by (1) immediately failing any web task whose hostname was unresolvable, and (2) caching DNS failures as well as successes.
What to do better next time:
(1) Test Transmission with torrents whose tracker lists are long and polluted.
(2) Test Transmission when using a very slow DNS server.
(6) Freeze for a few days before release. This would've shown up during even a two-day freeze.
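Fix (2), negative caching, can be sketched as a small cache in which a failed lookup is stored just like a successful one, with its own expiry time. Everything here (the dns_cache_* names, the 64-slot direct-mapped table where a collision simply overwrites the old entry, the caller-supplied TTL) is a hypothetical illustration, not libtransmission's actual cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DNS_CACHE_SLOTS 64

enum dns_cache_result { DNS_MISS, DNS_HIT, DNS_KNOWN_BAD };

/* One cached lookup. A failed lookup is stored too (ok == false). */
struct dns_cache_entry {
    char host[256];
    bool used;
    bool ok;          /* false marks a negative entry */
    char addr[64];    /* textual address, valid only when ok */
    time_t expires;
};

static struct dns_cache_entry dns_cache[DNS_CACHE_SLOTS];

/* FNV-1a hash of the hostname, reduced to a slot index. */
static size_t dns_slot(const char *host)
{
    uint32_t h = 2166136261u;
    for (; *host != '\0'; ++host)
        h = (h ^ (unsigned char)*host) * 16777619u;
    return h % DNS_CACHE_SLOTS;
}

/* Records a lookup outcome; addr is ignored (may be NULL) when ok is false. */
void dns_cache_store(const char *host, bool ok, const char *addr,
                     time_t now, int ttl_seconds)
{
    struct dns_cache_entry *e = &dns_cache[dns_slot(host)];
    memset(e, 0, sizeof *e);
    strncpy(e->host, host, sizeof e->host - 1);
    e->used = true;
    e->ok = ok;
    if (ok && addr != NULL)
        strncpy(e->addr, addr, sizeof e->addr - 1);
    e->expires = now + ttl_seconds;
}

/* DNS_HIT: address copied to addr. DNS_KNOWN_BAD: recently failed, so the
 * caller can fail the task at once. DNS_MISS: a real lookup is needed. */
enum dns_cache_result dns_cache_lookup(const char *host, time_t now,
                                       char *addr, size_t addr_len)
{
    const struct dns_cache_entry *e = &dns_cache[dns_slot(host)];
    if (!e->used || now >= e->expires || strcmp(e->host, host) != 0)
        return DNS_MISS;
    if (!e->ok)
        return DNS_KNOWN_BAD;
    if (addr_len > 0) {
        strncpy(addr, e->addr, addr_len - 1);
        addr[addr_len - 1] = '\0';
    }
    return DNS_HIT;
}
```

A DNS_KNOWN_BAD result lines up with fix (1): the web task can fail immediately instead of handing the raw hostname to libcurl again.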
Re: Postmortems
Couldn't agree more with (3). Strict, easy-to-find bug reporting rules would be less puzzling to ordinary users and would benefit the development team as well.