transmission-daemon 2.30b4 profiling & performance

blacklion
Posts: 25
Joined: Wed Aug 04, 2010 7:35 pm

Post by blacklion »

I've done some ad-hoc profiling of 2.30b4 on FreeBSD 8.2. Please note that this is not a Linux system, but I think the conclusions apply to both.

System: Core 2 Duo E4500, 4 GB memory, 1 Gbit/s Intel NIC, 64-bit system, dedicated router for the ISP connection (so this system doesn't perform any NAT or firewalling). Nominally a 25 Mbit/s internet connection, but in practice it is faster. No other ``heavy'' processes on the system: only essential system processes like cron, plus transmission-daemon.

Torrents: 357 torrents, all 100% complete (seeding), 692.1 GiB according to ``transmission-remote --list''. Upload speed is between 2000 KiB/s and 4500 KiB/s, averaging about 3000 KiB/s. No limits are set in the config file.

transmission-daemon, compiled with standard options (no profiling, optimizations enabled), consumes 12-14% of CPU (of one core, so idle is ~185% out of 200%). It is very responsive when the upload speed is about 2000 KiB/s.

When the speed is higher than 2000 KiB/s, it becomes unresponsive: a simple ``transmission-remote --list'' can take 30 seconds or even end with a timeout or ``no answer'', Transmission Remote GUI loses its connection and cannot reconnect, etc. Please note that there is plenty of CPU available at this moment: one core is completely idle, and the other is loaded only to about 15%.

transmission-daemon has 4 threads, but only 1 of them does any real work. Two others don't consume CPU at all.

Profiling with gprof (gcc -pg -g) shows that most of the time (about 50%) is consumed by bandwidthPulse(); most of THAT time is consumed by reconnectPulse() and, further down the call tree, tr_bandwidthAllocate(). NB: there are NO bandwidth limits in the configuration file! makeNewPeerConnections() consumes a lot of CPU too.

Functions comparePeerCandidates() and addValToKey() spend a lot of time in themselves (excluding child calls and syscalls): 9% and 4.2% respectively.

OK. Now to syscall analysis with kdump. I ran kdump for 20 seconds and got these statistics by number of calls (top 20):

gettimeofday  87054
writev        31118
clock_gettime 16124
kevent         8378
sendto         7150
readv          4577
ioctl          4577
fstat          3888
pread          3883
recvfrom       3831
poll           2288
getpid         1384
madvise         951
fcntl           945
setsockopt      908
close           503
socket          461
connect         461
bind            436
stat            190
Please note: gettimeofday(), an expensive but ``unproductive'' call, is called most often, even though there is no need for bandwidth allocation at all.
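
As a rough check, the counts in the table can be turned into per-second rates by dividing by the 20-second window. This is only a sketch: the dictionary copies a subset of the table by hand, and the "time calls per writev" ratio is an illustrative figure, not something the profiler reported.

```python
# Syscall counts copied by hand from the 20-second trace (subset of the top 20).
TRACE_SECONDS = 20
counts = {
    "gettimeofday": 87054,
    "writev": 31118,
    "clock_gettime": 16124,
    "kevent": 8378,
    "pread": 3883,
    "poll": 2288,
}


def per_second(calls, seconds=TRACE_SECONDS):
    """Average number of calls per second over the trace window."""
    return calls / seconds


rates = {name: per_second(n) for name, n in counts.items()}

# How many clock queries were issued for each writev() call (illustrative ratio).
time_calls_per_writev = (
    counts["gettimeofday"] + counts["clock_gettime"]
) / counts["writev"]

for name, rate in rates.items():
    print(f"{name:14s} {rate:8.1f} calls/s")
print(f"time calls per writev: {time_calls_per_writev:.2f}")
```

The resulting rates are where figures such as "4000+ calls per second" for gettimeofday() come from.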

OK. Enough data. Now I have some questions for the authors :)

(1) Why is any RPC so sluggish when there is a lot of CPU available?
(2) Why are there 4000+ calls per second to gettimeofday(), an expensive syscall? The standard resolution of the system timer on FreeBSD is lower than that, and I think Linux doesn't use such a high-resolution timer for this call either. And, please note, this is when there is no need for bandwidth allocation at all.
(2a) Even with bandwidth allocation, do we need to call gettimeofday() SO often? What precision and guarantees do we want to achieve with this?
(3) There are another 800 calls/second to clock_gettime(), also a time-related syscall. WHY?
(4) What do the 3 other threads of transmission-daemon do? It seems that one is DHT and another is "web" (RPC?), but why is RPC so sluggish then?
(5) Why are there poll() calls when libevent uses kevent()? (Having 100 polls/s should not affect performance, but for completeness :))

Really, I'm satisfied with transmission-daemon's performance, as it looks like I will never have an Internet connection faster than my server can serve. But this GUI sluggishness is annoying.

Post by blacklion »

According to callgrind, bandwidthPulse() is at the very top too; most of its inclusive time comes from reconnectPulse() via tr_torrentNext().
makeNewPeerConnections() is also prominent, with its calls to getPeerCandidates(), qsort() and comparePeerCandidates().
Jordan
Transmission Developer
Posts: 2312
Joined: Sat May 26, 2007 3:39 pm
Location: Titania's Room

Post by Jordan »

https://trac.transmissionbt.com/changeset/12420/ resolves some of the bandwidth pulse / gettimeofday() issues, particularly when speed limits are disabled.

The reason the RPC interface becomes sluggish is probably that the libtransmission thread is blocking on disk IO. That issue is being tracked at https://trac.transmissionbt.com/ticket/1753

Post by blacklion »

I'll try this change. But I want to note that calling gettimeofday() / clock_gettime() so often is of no use even when there IS bandwidth management.

Humm...
(1) The RPC thread should not be blocked by I/O in another thread: FreeBSD has true kernel-supported threads, not a userland implementation. Or do they share one lock, which is held by the libtransmission thread while it is blocked in disk I/O?
(2) According to ktrace, pread() is used to read data from disk. In the 20-second span there are 3883 calls to it with a typical read size of 4096 bytes, which gives us about 776 KiB/s. Even if these are truly random reads, it doesn't look like a very heavy disk load.
(3) The reads are not truly random, really. According to ktrace, there are many chains of 10-20 calls to pread() on the same FD, with offsets increasing by 4096 (the read size). It means that such calls could be collapsed into one call for a 40-80 KiB read. Maybe it is worth increasing the read buffer to 64 KiB?
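
The figures in points (2) and (3) can be sketched as follows. This is a minimal illustration, not Transmission code: `coalesce()` and the sample trace are hypothetical, and the 64 KiB cap is simply the buffer size suggested above.

```python
BLOCK = 4096    # typical pread() size seen in the trace, in bytes
CALLS = 3883    # pread() calls in the trace window
WINDOW = 20     # trace length in seconds


def throughput_kib_per_s(calls=CALLS, size=BLOCK, seconds=WINDOW):
    """Average read throughput implied by the trace, in KiB/s."""
    return calls * size / seconds / 1024


def coalesce(reads, max_len=64 * 1024):
    """Merge consecutive (fd, offset, length) reads that are contiguous
    on the same descriptor, without exceeding max_len bytes per read."""
    merged = []
    for fd, offset, length in reads:
        if merged:
            last_fd, last_off, last_len = merged[-1]
            contiguous = fd == last_fd and offset == last_off + last_len
            if contiguous and last_len + length <= max_len:
                merged[-1] = (last_fd, last_off, last_len + length)
                continue
        merged.append((fd, offset, length))
    return merged


# Hypothetical trace: 16 sequential 4 KiB reads on fd 7, then one read on fd 9.
trace = [(7, i * BLOCK, BLOCK) for i in range(16)] + [(9, 123 * BLOCK, BLOCK)]
merged = coalesce(trace)

print(f"{throughput_kib_per_s():.1f} KiB/s")
print(f"{len(trace)} reads collapse into {len(merged)}")
```

Whether fewer, larger reads would actually help depends on the OS read-ahead behaviour, which this sketch does not model.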

Post by Jordan »

blacklion wrote: I'll try this change. But I want to note that calling gettimeofday() / clock_gettime() so often is of no use even when there IS bandwidth management.
Yes, I know. It's more a question of needing to refactor the code so that it's called less frequently. There are several cases where the value could be passed around from function to function so that the various functions don't need to call gettimeofday().
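
A minimal sketch of the "pass the value around" idea, written in Python rather than libtransmission's C. Everything here (`CountingClock`, `pulse_naive`, `pulse_shared`, the peer dictionaries) is hypothetical and only illustrates how reading the clock once per pulse reduces the number of time queries.

```python
import time


class CountingClock:
    """Wraps time.monotonic() and counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def now(self):
        self.calls += 1
        return time.monotonic()


def pulse_naive(clock, peers):
    """Every per-peer computation asks the clock itself: one query per peer."""
    return [clock.now() - peer["last_seen"] for peer in peers]


def pulse_shared(clock, peers):
    """Query the clock once per pulse and pass the value down."""
    now = clock.now()
    return [now - peer["last_seen"] for peer in peers]


peers = [{"last_seen": 0.0} for _ in range(100)]
naive_clock, shared_clock = CountingClock(), CountingClock()
pulse_naive(naive_clock, peers)
pulse_shared(shared_clock, peers)
print(naive_clock.calls, shared_clock.calls)
```

The trade-off is that every helper sees a timestamp that may be slightly stale within one pulse, which is usually acceptable for bandwidth accounting.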
blacklion wrote: The RPC thread should not be blocked by I/O in another thread: FreeBSD has true kernel-supported threads, not a userland implementation. Or do they share one lock, which is held by the libtransmission thread while it is blocked in disk I/O?
Correct, one lock held by libtransmission's thread while it's blocked in disk IO. It's a known problem, hence the ticket.
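
The effect can be reproduced in a few lines. This is a toy model under stated assumptions, not libtransmission's locking: a sleep stands in for the slow disk operation, and `rpc_wait()` is a hypothetical helper that measures how long a second thread waits for the shared lock.

```python
import threading
import time

IO_DELAY = 0.3  # stand-in for a slow disk operation, in seconds


def rpc_wait(hold_lock_during_io):
    """Return how long an 'RPC' caller waits to acquire the shared lock."""
    lock = threading.Lock()
    started = threading.Event()

    def io_worker():
        if hold_lock_during_io:
            with lock:              # lock held across the slow operation
                started.set()
                time.sleep(IO_DELAY)
        else:
            with lock:              # lock held only for quick bookkeeping
                pass
            started.set()
            time.sleep(IO_DELAY)    # slow operation runs without the lock

    worker = threading.Thread(target=io_worker)
    worker.start()
    started.wait()
    t0 = time.monotonic()
    with lock:                      # the RPC handler needs the same lock
        waited = time.monotonic() - t0
    worker.join()
    return waited


blocked = rpc_wait(hold_lock_during_io=True)
free = rpc_wait(hold_lock_during_io=False)
print(f"lock held during IO: waited {blocked:.2f}s")
print(f"lock released before IO: waited {free:.2f}s")
```

With kernel threads the second thread is free to run, but it still stalls for as long as the first one holds the lock, which matches the idle CPU seen during the RPC lag.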
blacklion wrote: (2) According to ktrace, pread() is used to read data from disk. In the 20-second span there are 3883 calls to it with a typical read size of 4096 bytes, which gives us about 776 KiB/s. Even if these are truly random reads, it doesn't look like a very heavy disk load.
If IO blocking isn't what's causing the RPC lag, you might want to investigate more closely... I don't know what else the problem could be and would be interested to hear more about that.
blacklion wrote: (3) The reads are not truly random, really. According to ktrace, there are many chains of 10-20 calls to pread() on the same FD, with offsets increasing by 4096 (the read size). It means that such calls could be collapsed into one call for a 40-80 KiB read. Maybe it is worth increasing the read buffer to 64 KiB?
Well, yes and no. On the one hand, this could be done by modifying tr_cacheReadBlock() to read more than one block at a time. However, this probably wouldn't give much of a win because we're already prefetching these blocks via posix_fadvise() or fcntl( fd, F_RDADVISE ). In my experience, reading isn't much of a bottleneck compared to writing.
radir
Posts: 18
Joined: Sat Apr 18, 2009 4:47 pm

Post by radir »

Hi,

First of all, a very interesting thread; thanks for the insight.

Just a note: Jordan, your assumption that prefetching works well via fadvise does not hold on FreeBSD, as it has no fadvise support.
rb07
Posts: 1400
Joined: Sun Aug 24, 2008 3:14 am

Post by rb07 »

Jordan wrote: If IO blocking isn't what's causing the RPC lag
It's worse than that: try changing a big torrent's location, which moves the file(s)... not only is RPC blocked until the move completes, the whole daemon stops sending and receiving.

A picture is worth a thousand words (when the disk transfer goes up, seeding stops):
[Attachment: screenshot gkrellm_S-HPMediaVault.png]

Post by Jordan »

rb07 wrote: It's worse than that: try changing a big torrent's location, which moves the file(s)... not only is RPC blocked until the move completes, the whole daemon stops sending and receiving.
Correct, one lock held by libtransmission's thread while it's blocked in disk IO. It's a known problem, hence the ticket.

Post by Jordan »

radir wrote: Just a note: Jordan, your assumption that prefetching works well via fadvise does not hold on FreeBSD, as it has no fadvise support.
Radir,

Are you saying that FreeBSD has no posix_fadvise() API call, or that FreeBSD doesn't have any similar calls (such as an fcntl + F_RDADVISE) that could be used instead? The code is in tr_prefetch() at https://trac.transmissionbt.com/browser ... mit.c#L242 and currently has #ifdefs for DARWIN and HAVE_POSIX_FADVISE, but the function could definitely be reworked to add support for other situations too.
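
For illustration only, the same "use the hint if the platform has it, otherwise do nothing" structure can be sketched in Python with a run-time check in place of the #ifdefs. The `prefetch()` helper below is hypothetical and is not tr_prefetch(); it only uses `os.posix_fadvise()`, which Python exposes on platforms that provide the call.

```python
import os
import tempfile


def prefetch(fd, offset, length):
    """Hint that [offset, offset + length) will be read soon.

    Returns True if a hint was issued, False if no hint API is available
    or the OS rejected the request.
    """
    if hasattr(os, "posix_fadvise"):    # plays the role of HAVE_POSIX_FADVISE
        try:
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            return True
        except OSError:
            return False                # hint rejected for this descriptor
    return False                        # no hint API here: silently skip


with tempfile.TemporaryFile() as f:
    f.write(b"\0" * 65536)
    f.flush()
    hinted = prefetch(f.fileno(), 0, 65536)

print("prefetch hint issued:", hinted)
```

On a platform without the call, the function degrades to a no-op, which is effectively the situation described for FreeBSD in this thread.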

Post by blacklion »

Jordan wrote: Are you saying that FreeBSD has no posix_fadvise() API call, or that FreeBSD doesn't have any similar calls (such as an fcntl + F_RDADVISE) that could be used instead? The code is in tr_prefetch() at https://trac.transmissionbt.com/browser ... mit.c#L242 and currently has #ifdefs for DARWIN and HAVE_POSIX_FADVISE, but the function could definitely be reworked to add support for other situations too.
FreeBSD doesn't have posix_fadvise(), and it doesn't have F_RDADVISE either. I'm trying to find out whether there is any third way to accomplish this on FreeBSD. It seems that there is no such way at all.

Post by radir »

The latest version (8.2, which blacklion used for his measurements) supports these fcntl() parameters, which may be useful:

F_READAHEAD
    Set or clear the read ahead amount for sequential access to the third argument, arg, which is rounded up to the nearest block size. A zero value in arg turns off read ahead.

F_RDAHEAD
    Equivalent to Darwin counterpart which sets read ahead amount of 128KB when the third argument, arg is non-zero. A zero value in arg turns off read ahead.
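
A hedged sketch of how such a command would be issued on a descriptor. It is written in Python for brevity; `set_readahead()` is a hypothetical helper, and it is an assumption that a given Python build exposes `F_READAHEAD` in its `fcntl` module at all, so the code looks the constant up at run time and does nothing when it is missing.

```python
import fcntl
import tempfile


def set_readahead(fd, nbytes):
    """Try to set the sequential read-ahead amount for fd.

    Returns True if the fcntl() call succeeded, False if the constant is
    not exposed on this platform or the kernel rejected the request.
    """
    cmd = getattr(fcntl, "F_READAHEAD", None)   # may be absent (assumption)
    if cmd is None:
        return False
    try:
        fcntl.fcntl(fd, cmd, nbytes)
        return True
    except OSError:
        return False


with tempfile.TemporaryFile() as f:
    applied = set_readahead(f.fileno(), 64 * 1024)

print("read-ahead set:", applied)
```

Note that this sets a per-descriptor read-ahead amount rather than prefetching a specific byte range, so it is not a drop-in equivalent of posix_fadvise().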

Post by blacklion »

radir wrote: The latest version (8.2, which blacklion used for his measurements) supports these fcntl() parameters, which may be useful:

F_READAHEAD
    Set or clear the read ahead amount for sequential access to the third argument, arg, which is rounded up to the nearest block size. A zero value in arg turns off read ahead.

F_RDAHEAD
    Equivalent to Darwin counterpart which sets read ahead amount of 128KB when the third argument, arg is non-zero. A zero value in arg turns off read ahead.
Yep, I've found it and will try tomorrow :)

Post by Jordan »

If it works out, you might consider cooking up a patch for fdlimit.c so that it could be used in a future release :)

Post by blacklion »

Yep, of course :)
But I haven't had time to play with it yet :(

Post by radir »

Hi blacklion,

[OFF]
This is very off topic, but since you also build Transmission from source, I am wondering whether you can build with --enable-utp, because I am getting this error message:

Making all in libutp
  CXX    utp.o
In file included from utp.cpp:78:
utp_config.h:8:2: warning: #warning implement this in libtransmission
utp.cpp: In member function 'byte PackedSockAddr::get_family() const':
utp.cpp:123: error: ISO C++ forbids declaration of 'type name' with no type
utp.cpp:123: error: ISO C++ forbids declaration of 'type name' with no type
utp.cpp:123: error: expected primary-expression before 'const'
utp.cpp:123: error: expected `)' before 'const'
utp.cpp:123: error: expected `)' before ';' token
utp.cpp:123: error: expected `)' before ';' token
utp.cpp:123: error: expected `)' before ';' token
*** Error code 1
[/OFF]