Porter wrote:
Jordan wrote:Under the approach you're describing, the steps would be:
* Transmission prefetches 15 blocks for torrent A, peer A1 because it knows it will be sending them soon.
* Transmission prefetches 15 blocks for torrent A, peer A2 because it knows it will be sending them soon.
* Transmission prefetches 15 blocks for torrent A, peer A3 because it knows it will be sending them soon.
* Repeat the previous steps for the remaining torrents
* User hits "verify local data" for torrent X
* Verify loads all of torrent X into the cache at the same priority as the prefetched blocks. If X is large, the prefetched blocks get evicted.
* The OS then has to hit the disk again to reload the previously prefetched blocks we're uploading in every torrent except X
* In exchange, some or all of torrent X is kept in the cache even if there are 0 downloading peers
The problem with this approach isn't the kernel, it's that we've told the kernel to give equal weight to blocks loaded by verify and blocks prefetched due to peer requests. The latter are ones we'll definitely need soon, and the former are ones we may or may not need.
Huh, it looks like we're talking about 2 different things?! And it looks like you're trying to implement another I/O caching layer, on top of the existing kernel one, with different rules.
When I talk about Transmission prefetching blocks, I'm referring to the use of POSIX_FADV_WILLNEED and F_RDADVISE to tell the OS about upcoming reads. We know in advance that we'll need these blocks because we can peek the request lists that the peers have sent us:
Code:
int
tr_prefetch (int fd UNUSED, off_t offset UNUSED, size_t count UNUSED)
{
#ifdef HAVE_POSIX_FADVISE
  /* Linux and other POSIX systems: hint that this byte range
     will be read soon, so the kernel can start readahead now. */
  return posix_fadvise (fd, offset, count, POSIX_FADV_WILLNEED);
#elif defined (SYS_DARWIN)
  /* macOS equivalent: issue an advisory read via fcntl(). */
  struct radvisory radv;
  radv.ra_offset = offset;
  radv.ra_count = count;
  return fcntl (fd, F_RDADVISE, &radv);
#else
  /* No prefetch hint available on this platform; the arguments
     go unused, hence the UNUSED markers above. */
  return 0;
#endif
}
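For context, here's a hedged sketch of what a call site might look like. The peer-request struct and the callback name are illustrative, not Transmission's actual code:
Code:
/* Illustrative only: when a peer's request is queued for upload,
 * hint the kernel so the read is warm by the time we send it.
 * struct peer_request and on_peer_request_queued are hypothetical;
 * off_t/size_t come from <sys/types.h>. */
struct peer_request { off_t offset; size_t length; };

static void
on_peer_request_queued (int fd, const struct peer_request * req)
{
    /* Best-effort hint; if it fails we simply read cold later. */
    tr_prefetch (fd, req->offset, req->length);
}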
Porter wrote:I'm afraid it just can't work well. How much memory are you controlling with "your" cache? Do you take care of all 6-7 GB that I have available? Of course not, at least not in an efficient way. You're actually killing gigabytes of cached content to protect a few blocks in "your" cache. Do you really think that's fair?
None of this is relevant to my question, which is about OS-level prefetching.
It's interesting, though, that you chose to "quote" me "twice" on something that I wasn't talking about.
Porter wrote:First of all, what do you mean when you say "transmission prefetches"? In all modern OSes it's the kernel that fetches blocks from disk, and if an application needs them, it asks the kernel via a system call to provide the block. The kernel then takes care of allocating memory, reading the block, doing readahead (prefetching the following blocks, anticipating they will be needed soon), caching the block(s) and eventually freeing them. The kernel also takes care to keep cached (in all available memory!) what's needed often, and to throw out what is not.
Yes.
Porter wrote:And it has the best view overall, because transmission is not the only app running on the OS, typically, right?
This is true to a point, but the app has a role. If it has specific knowledge about upcoming I/O, it can give hints to the OS to optimize for it. This is why things like posix_fadvise() exist.
This paragraph isn't relevant to the topic of prefetching, but as an aside, there's also a place for an app-level write cache in BitTorrent clients. The client has unique knowledge about what disk writes are upcoming because it knows (1) which blocks it's requested (2) from which peers and (3) how fast each peer is. A smart torrent client can lower the total number of disk writes with even a small in-memory write cache.
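To make that concrete, here's a minimal sketch of a write-coalescing cache, assuming blocks arrive in order within a piece. All names here are hypothetical, not Transmission's actual cache:
Code:
/* Hypothetical sketch: buffer incoming blocks per piece and flush
 * the whole piece with one pwrite() instead of one write per 16 KiB
 * block. Assumes in-order arrival; a real cache must also handle
 * gaps, memory pressure, and partial pieces at shutdown. */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct piece_buf
{
    off_t     file_offset;  /* where this piece lives in the file */
    size_t    piece_size;   /* total size of the piece */
    size_t    bytes_held;   /* contiguous bytes buffered so far */
    uint8_t * data;         /* piece_size bytes, allocated elsewhere */
};

/* returns 0 on success, -1 on write error */
static int
cache_add_block (int fd, struct piece_buf * p,
                 const uint8_t * block, size_t len)
{
    memcpy (p->data + p->bytes_held, block, len);
    p->bytes_held += len;

    if (p->bytes_held == p->piece_size)
    {
        /* one large sequential write instead of many small ones */
        ssize_t n = pwrite (fd, p->data, p->piece_size, p->file_offset);
        p->bytes_held = 0;
        return n == (ssize_t) p->piece_size ? 0 : -1;
    }

    return 0;
}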
Porter wrote:Lastly, the quote "it's that we've told the kernel to give equal weight to blocks loaded by verify and blocks prefetched due to peer requests" just doesn't make sense, because there is no such system call, at least not in Linux. Actually, the kernel starts by giving equal weight to every page of memory, but it soon starts to prefer the pages that are accessed often, using multiple LRU lists, hardware page-table bits (was the page accessed or not?) and various complicated clock algorithms.
This finally gets to the question I asked. Let's say ${OS} is using LRU. Transmission prefetches the blocks it knows it's going to need soon. Then "verify local data" loads a torrent larger than the cache, causing the prefetched blocks to fall off the end of the LRU list. The end result is we lose blocks we know we're going to need, to make room for a torrent that may or may not have any peers.
That's the question I'm asking, anyway. Ideally I'd like to see some testing of how this plays out in practice on different OSes and with different cache sizes.
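One mitigation worth including in that testing, sketched here under the assumption that HAVE_POSIX_FADVISE is set: have verify tell the kernel it's done with each chunk right after hashing it, so verify's pages become eviction candidates before the prefetched upload blocks do. POSIX_FADV_DONTNEED only drops clean pages, which is fine for verify's read-only workload. The helper below is hypothetical, not existing Transmission code:
Code:
/* Sketch: read a chunk for "verify local data", then advise the
 * kernel that we won't reread it, so these pages are evicted
 * ahead of the prefetched upload blocks. */
static ssize_t
verify_read (int fd, void * buf, size_t len, off_t offset)
{
    ssize_t n = pread (fd, buf, len, offset);

#ifdef HAVE_POSIX_FADVISE
    if (n > 0)
        posix_fadvise (fd, offset, n, POSIX_FADV_DONTNEED);
#endif

    return n;
}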
Porter wrote:That is stuff that you just CAN'T implement in user space. So it's better not to even start, because you can just make things worse, and actually, you are.
How can I help you understand that the extra I/O layer can only slow things down? Help me to help you.
I hope this response helps you to understand the question I'm asking.