Proposal: Systemd Watchdog integration


Post by zertyz

Hello, Transmission maintainers and users.

I have read some discussions about previous attempts and suggestions to add watchdog support to transmission-daemon, and I agree that, back then, it didn't make sense for the daemon to be aware of hardware and userland software details.

However, in my view, today we have a different scenario:
1) transmission already supports systemd
2) systemd supports watchdogs -- both feeding the hardware watchdog the ticks it needs and collecting keep-alive reports from services.

I would therefore like to propose that transmission-daemon start supporting systemd's watchdog features, as described in this article - http://0pointer.de/blog/projects/watchdog.html

As you can see, the only requirement on the daemon side (possibly in daemon/daemon.c) is to add some systemd-specific instrumentation. The systemd service file also needs a few suitable options added to it -- for the watchdog and for service restarting.
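To give an idea of how small that instrumentation could be, here is a minimal sketch in C. It is not taken from the Transmission sources: the helper names watchdog_timeout_usec() and notify_systemd() are hypothetical, and instead of linking against libsystemd's sd_notify() it speaks the same notification protocol directly (a datagram sent to the socket named in $NOTIFY_SOCKET, with the timeout read from $WATCHDOG_USEC).

```c
/* Sketch only: hypothetical helpers, not actual Transmission code.
 * Speaks the sd_notify datagram protocol by hand, without libsystemd. */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns the watchdog timeout in microseconds as exported by systemd in
 * $WATCHDOG_USEC, or 0 if the variable is unset or not a plain number. */
static uint64_t watchdog_timeout_usec(void)
{
    const char *s = getenv("WATCHDOG_USEC");
    char *end = NULL;
    unsigned long long v;

    if (s == NULL || *s < '0' || *s > '9')
        return 0;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return 0;
    return (uint64_t)v;
}

/* Sends one state string (e.g. "WATCHDOG=1") to $NOTIFY_SOCKET.
 * Returns 1 if sent, 0 if no socket is configured, -1 on error. */
static int notify_systemd(const char *state)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    socklen_t addr_len;
    size_t path_len, state_len = strlen(state);
    ssize_t sent;
    int fd;

    if (path == NULL || *path == '\0')
        return 0; /* not started by systemd, or notifications disabled */

    path_len = strlen(path);
    if ((path[0] != '/' && path[0] != '@') || path_len >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, path_len);
    if (addr.sun_path[0] == '@')
        addr.sun_path[0] = '\0'; /* leading '@' marks an abstract socket */
    addr_len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + path_len);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    sent = sendto(fd, state, state_len, 0, (struct sockaddr *)&addr, addr_len);
    close(fd);
    return sent == (ssize_t)state_len ? 1 : -1;
}
```

Under these assumptions, the daemon's main loop would call notify_systemd("WATCHDOG=1") from a timer firing roughly every watchdog_timeout_usec() / 2 microseconds, and skip the whole thing when the timeout is 0. On the service file side, the options the article describes are WatchdogSec= and Restart= (for example Restart=on-failure); suitable values are left to the maintainers.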

The real-world scenario that makes this functionality useful happens to me from time to time: I use transmission to seed some heavily downloaded files, which pushes the system to the extreme and, unfortunately, sometimes confuses the kernel (or the hardware), causing transmission (and any other process that attempts an fsync operation) to block forever. Simply killing the process won't help: besides leaving a zombie that never leaves the process table, the machine then needs a forced restart, since a normal shutdown involves some fsync operations as well.

Here is the corresponding output of dmesg:
[49326.097003] INFO: task transmission-da:303 blocked for more than 120 seconds.
[49326.133151] Not tainted 3.12.28-2-ARCH #1
[49326.164561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[49326.198450] transmission-da D c05d4420 0 303 1 0x00000000
[49326.237052] [<c05d4420>] (__schedule+0x290/0x5ec) from [<c05d4888>] (io_schedule+0x94/0x100)
[49326.274114] [<c05d4888>] (io_schedule+0x94/0x100) from [<c00bd55c>] (sleep_on_page+0x8/0x10)
[49326.309871] [<c00bd55c>] (sleep_on_page+0x8/0x10) from [<c05d2d0c>] (__wait_on_bit+0x7c/0xb4)
[49326.347320] [<c05d2d0c>] (__wait_on_bit+0x7c/0xb4) from [<c00bd334>] (wait_on_page_bit+0xa0/0xb8)
[49326.384111] [<c00bd334>] (wait_on_page_bit+0xa0/0xb8) from [<c00bd484>] (filemap_fdatawait_range+0xe8/0x144)
[49326.421438] [<c00bd484>] (filemap_fdatawait_range+0xe8/0x144) from [<c00bef78>] (filemap_write_and_wait_range+0x54/0x74)
[49326.459987] [<c00bef78>] (filemap_write_and_wait_range+0x54/0x74) from [<c0180834>] (ext4_sync_file+0xa0/0x384)
[49326.497749] [<c0180834>] (ext4_sync_file+0xa0/0x384) from [<c013152c>] (vfs_fsync+0x3c/0x4c)
[49326.534220] [<c013152c>] (vfs_fsync+0x3c/0x4c) from [<c01316c0>] (do_fsync+0x28/0x50)
[49326.570430] [<c01316c0>] (do_fsync+0x28/0x50) from [<c000e0e0>] (ret_fast_syscall+0x0/0x30)

Integrating Transmission with systemd's watchdog functionality seems to be a clean solution for cases like this.

I hope this suggestion contributes, in some way, to the evolution of this wonderful software.

-- Luiz