request removal of moderation for bugdb (loses info)

Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

request removal of moderation for bugdb (loses info)

Post by Astara »

I've submitted 2 updates to an existing bug -- http://trac.transmissionbt.com/ticket/5553 -- and neither of them ever showed up in the report.

I've never had bug reports "dropped" after submission, but also true is that I've never had *bug reports* routed for moderation. Bug reports are usually factual statements of problems -- not a social or interpersonal forum or chat room.

Personally, I think it makes the project look bad. Closing a bug to the general public because it is security sensitive is one thing, but pre-censoring bug reports makes it look like you are trying to hide problems in the code by artificially keeping the bug count lower.

It's not like a bug database is a fertile ground for spamming: who would look at it? I.e. reading a bug-database isn't usually something people do for fun (some maybe, but it's hard to imagine it making any "top readers' choice" lists).

I tried to reply to a request for information from 'x190', who asked whether lowering the allowed client # to 800 helped avoid hitting the new, hard-coded 1024 file limit in 2.82.

I responded that my limit was already below that, @ 768 -- so it wasn't a solution to the problem (about 1/3 of my torrents stopped and would not restart).

I also noted that, looking in /proc at the process's open fd directory,
I saw (and still see) ~14,472 open file descriptors.

Of those, only 134 are sockets (which would correspond to peers, I think).

Almost all of the remainder were pointers to individual files.

It makes me wonder if there might not be an FD leak somewhere --
if I'm only talking to 130 clients, each of them can usually only ask
for 1 file at a time.

I also asked if having that many open files v. the hard coded limit of
1024 that was in the new version gave some idea of the extent or scope or
size of the problem -- in that if transmission is really using 14k file
descriptors, then limiting it to 1024 is likely to cause problems.

Why was the limit moved from a user-selectable choice to being hard
coded? That seems a bit dictatorial in choosing for users how they
can allocate their resources.

FWIW, transmission is also using a 13.3G virtual address space,
most of which is DATA (according to ps), with 7.3G of that
being "resident" (my cache size is set to 3G).

I also listed pertinent settings from my settings.json file --
but AFAIK, nothing sensitive in there either.

Anyway, hope this note fares better. It would really help if
the bug-database didn't "drop" bug-data.

I suppose I could open up a transmission pad on google for posting of
bugs so information won't get lost...

Thanks, and hoping the problem gets fixed.
cfpp2p
Posts: 290
Joined: Sat Aug 08, 2009 3:14 pm

Re: request removal of moderation for bugdb (loses info)

Post by cfpp2p »

I've never had bug reports "dropped" after submission, but also true is that I've never had *bug reports* routed for moderation. Bug reports are usually factual statements of problems -- not a social or interpersonal forum or chat room.
:(

here are a couple of related tickets that I thought of:
https://trac.transmissionbt.com/ticket/5161
https://trac.transmissionbt.com/ticket/5056
https://trac.transmissionbt.com/ticket/4164

I use this:
https://trac.transmissionbt.com/ticket/5161#comment:3

:( :o :shock: :?
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

Yikes!!!

I haven't changed the number of torrents nor the # of peers or any of the config vals for a long time. All I did this time was upgrade from 2.72->2.82.

Working around it for me was a matter of changing "fdlimit.c":

--- fdlimit.c.orig 2013-08-08 19:45:40.000000000 -0700
+++ fdlimit.c 2013-11-24 02:44:56.315577494 -0800
@@ -36,7 +36,7 @@

#include <sys/types.h>
#include <sys/stat.h>
-#include <sys/time.h> /* getrlimit */
+/*#include <sys/time.h> getrlimit */
#include <sys/resource.h> /* getrlimit */
#include <fcntl.h> /* O_LARGEFILE posix_fadvise */
#include <unistd.h> /* lseek (), write (), ftruncate (), pread (), pwrite (), etc */
@@ -500,7 +500,7 @@
int peerCount;
struct tr_fileset fileset;
};
-
+#define FD_SETSIZE 16384
static void
ensureSessionFdInfoExists (tr_session * session)
{
@@ -510,7 +510,7 @@
{
struct rlimit limit;
struct tr_fdInfo * i;
- const int FILE_CACHE_SIZE = 32;
+ const int FILE_CACHE_SIZE = 16384;

/* Create the local file cache */
i = tr_new0 (struct tr_fdInfo, 1);
@@ -518,7 +518,7 @@
session->fdInfo = i;

/* set the open-file limit to the largest safe size wrt FD_SETSIZE */
- if (!getrlimit (RLIMIT_NOFILE, &limit))
+/* if (!getrlimit (RLIMIT_NOFILE, &limit))
{
const int old_limit = (int) limit.rlim_cur;
const int new_limit = MIN (limit.rlim_max, FD_SETSIZE);
@@ -530,6 +530,7 @@
tr_logAddInfo ("Changed open file limit from %d to %d", old_limit, (int)limit.rlim_cur);
}
}
+*/
}
}
--------------------------------------------------------
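
For readers who don't have the source handy, here is a minimal standalone sketch of what the commented-out block appears to do, going only by the context lines in the diff above: it sets the soft open-file limit to the smaller of the hard limit and FD_SETSIZE. The function name is hypothetical, and printf stands in for tr_logAddInfo; this is not the actual Transmission code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/resource.h> /* getrlimit, setrlimit */
#include <sys/select.h>   /* FD_SETSIZE */

/* Hypothetical helper: set the soft RLIMIT_NOFILE to
 * min(hard limit, FD_SETSIZE), as the diff's context lines suggest.
 * Returns the resulting soft limit, or -1 on error. */
long clamp_nofile_to_fd_setsize(void)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_NOFILE, &limit) != 0)
        return -1;

    /* An unlimited or very large hard limit is capped at FD_SETSIZE. */
    rlim_t new_limit = limit.rlim_max;
    if (new_limit == RLIM_INFINITY || new_limit > (rlim_t)FD_SETSIZE)
        new_limit = (rlim_t)FD_SETSIZE;

    if (limit.rlim_cur != new_limit) {
        long old_limit = (long)limit.rlim_cur;

        limit.rlim_cur = new_limit;
        if (setrlimit(RLIMIT_NOFILE, &limit) != 0)
            return -1;
        printf("Changed open file limit from %ld to %ld\n",
               old_limit, (long)limit.rlim_cur);
    }
    return (long)limit.rlim_cur;
}
```

On glibc, FD_SETSIZE is 1024, which would explain the 1024 ceiling. Note also that redefining FD_SETSIZE after the system headers have been included does not enlarge fd_set on glibc, so a #define like the one in the patch probably only changes what the MIN() comparison sees.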

But it looks like that simply hid some other problem?

I think the code doesn't close files in torrents that are inactive. Given I have only 130 clients, even 10 files/client would only be 1300 files -- but I'm seeing over 10k being open.

Transmission has been running for multiple days now -- (~1200 cpu seconds), so it takes a while to reach the higher levels. It's using about twice the memory the previous version did.
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

x190 wrote:
I also noted, looking in /proc at the process's open fd directory and
saw (and still see) ~14,472 open file descriptors.
Should not be such a high number. Is your system getting the accounting right? If it was an obvious Transmission issue, I think there would be many more bug reports.

Just now thinking about µTP. Try disabling that.

cfpp2p, can you recall what you found to be the underlying cause of this issue for you?
-----
I disabled µTP yesterday after you posted (caught it early as my post crossed your post and noticed it almost immediately). Today, I note 15067 open FD's. Note, I didn't restart transmission, just used the remote interface to disable µTP.

Does transmission do any garbage cleaning to clean out old FD's, or does it
rely on the open file limit?

If it was set to 32 FD's, doesn't that mean that my ~130 "clients" would all be "thrashing" the FD cache? I.e. nothing would stay cached, as each client being serviced needed a different file. So my allowing 3GB for a file cache would have gone to waste? Maybe that's why I see the virtual size up as high as it is, as it isn't closing out older, unused FD's?
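
The thrashing concern can be checked with a toy simulation. This is only a sketch under two assumptions that are not confirmed by the source: that the FD cache evicts the least recently used entry, and that requests cycle round-robin over the files. The function name and MAX_SLOTS are made up for illustration.

```c
#include <stddef.h>

#define MAX_SLOTS 1024 /* arbitrary upper bound for this sketch */

/* Simulate an LRU cache with `slots` entries serving `accesses` requests
 * that cycle round-robin over `files` distinct file ids.
 * Returns the number of cache hits, or -1 on bad arguments. */
long lru_round_robin_hits(int slots, int files, long accesses)
{
    int id[MAX_SLOTS];         /* file id held in each slot */
    long last_used[MAX_SLOTS]; /* time of the slot's most recent use */
    int used = 0;
    long hits = 0;

    if (slots < 1 || slots > MAX_SLOTS || files < 1)
        return -1;

    for (long t = 0; t < accesses; t++) {
        int want = (int)(t % files);
        int found = -1;
        int victim = 0; /* index of the least recently used slot */

        for (int i = 0; i < used; i++) {
            if (id[i] == want) {
                found = i;
                break;
            }
            if (last_used[i] < last_used[victim])
                victim = i;
        }

        if (found >= 0) {
            hits++;
            last_used[found] = t;
        } else if (used < slots) {
            /* Cache not yet full: take the next free slot. */
            id[used] = want;
            last_used[used] = t;
            used++;
        } else {
            /* Cache full: evict the least recently used entry. */
            id[victim] = want;
            last_used[victim] = t;
        }
    }
    return hits;
}
```

Under those assumptions, lru_round_robin_hits(32, 130, 13000) returns 0: every request misses, because 129 other files are touched before any file comes around again. With 130 slots the same workload misses only on the first pass. Real peer traffic is not round-robin, so this is a worst case rather than a measurement.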

THOUGH -- the above said, that's addressing the case where that many FD's are allowed; it doesn't address why I would suddenly get a problem with my setup, w/ ~130 clients and ~340 possible streams to choose from.

In case it makes any difference or isn't clear, I'm only using the daemon, so GUI connections wouldn't enter into it.

I'm not sure any of my clients come in on µTP... sure doesn't look like it made
a dent in the traffic. I guess I'll re-enable it?
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

x190 wrote: 15067 open FD's
Have you done lsof on the process to see exactly what the majority of these fds are?
With the original code and only 130 sockets, total open fds should be well below 1
------
Actually, I looked in /proc/<pid> for pid= that of transmission.

Almost all of them except for ~ 130 were to individual files in torrents.
~130 were sockets; (the "fd" directory shows all open FD's as symlinks to the file they connect to).
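
The same count can be scripted rather than eyeballed. A minimal sketch, assuming Linux, where each entry in /proc/<pid>/fd is a symlink and sockets read as "socket:[inode]"; the struct and function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Tally of descriptor kinds found under /proc/<pid>/fd (Linux only). */
struct fd_tally {
    int sockets; /* links whose target starts with "socket:" */
    int others;  /* regular files, pipes, devices, and so on */
};

/* Walk /proc/<pid>/fd and classify each symlink by its target.
 * Returns 0 on success, -1 if the directory cannot be opened. */
int tally_fds(long pid, struct fd_tally *out)
{
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%ld/fd", pid);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    out->sockets = 0;
    out->others = 0;

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */

        char linkpath[PATH_MAX];
        char target[PATH_MAX];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, ent->d_name);

        ssize_t n = readlink(linkpath, target, sizeof target - 1);
        if (n < 0)
            continue; /* descriptor was closed while we were scanning */
        target[n] = '\0';

        if (strncmp(target, "socket:", 7) == 0)
            out->sockets++;
        else
            out->others++;
    }
    closedir(dir);
    return 0;
}
```

Calling tally_fds with the daemon's pid (as the same user, or as root) fills in the two counters. When a process scans itself, the directory handle opened by opendir shows up as one extra entry under others.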

Right now, looking again after a restart earlier today, it's at 136 sockets + 7456 files total.

I modified the number in fdlimit.c because 1/3rd of my torrents were stopped and wouldn't restart.

As for the cache -- it's lower now than it was about 6 months ago when I had it at 4G and none of the "too many fd's" problem.

So what problem should I be seeing that is related to curl? My max clients value was 768 -- well below the 800 you suggested. From the open sockets, I'd say I only have about 1/5-1/6th that limit.

Obviously I have the fdlimit set higher than needed -- but I kept trying to increase the fdlimit and it wasn't working (I didn't know at the time that it was hard coded to 1024, NOR that the file cache was hard-coded to 32 files).

The "too many fd's" problem happened after I upgraded to 2.82 -- no settings
were changed until I ran into the "couldn't save torrent" errors. Then I upped the fd's in fdlimit.


You think I should put it back at 32? :shock:

BTW -- wouldn't open µTP connections show up as sockets?
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

I know what I put in __circumvents__ the problem.

At the time, I didn't know its history.

To debug it, I'd probably want to instrument the cache.

I can't see why curl would have a limit of 1024 files unless it has a hard-coded limit as well -- which means it has a design flaw that could use fixing.

But before I debug this, I need to get time and motivation to look at it.

But given the code as it is, I don't see why the default code would have
run into problems -- unless there is a leak somewhere... in which case, for testing, one might want to run with even lower limits than the unaltered source would run with, i.e. an open cache of 4-8 and an FD limit around 128 or lower. Having such ridiculously small limits would (I would think) be more likely to trigger the problem than using the unaltered code... wouldn't you agree?
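
As a rough illustration of how a deliberately small FD limit brings the failure forward, here is a sketch that lowers the soft RLIMIT_NOFILE and opens /dev/null until open() fails. The function name and MAX_PROBE are hypothetical, and this exercises only the kernel limit, not Transmission's own cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_PROBE 1024 /* upper bound on descriptors this sketch will open */

/* Lower the soft RLIMIT_NOFILE to `soft_limit`, then open /dev/null
 * repeatedly until open() fails. Returns the number of successful opens
 * and stores the errno of the failing open() in *err (0 if none failed).
 * Everything opened here is closed, and the previous limit is restored.
 * Returns -1 if the limit cannot be read or changed. */
int opens_until_exhausted(rlim_t soft_limit, int *err)
{
    struct rlimit saved, lowered;
    int fds[MAX_PROBE];
    int count = 0;

    if (soft_limit >= MAX_PROBE || getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    lowered = saved;
    lowered.rlim_cur = soft_limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    *err = 0;
    while (count < MAX_PROBE) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            *err = errno; /* EMFILE once the per-process limit is hit */
            break;
        }
        fds[count++] = fd;
    }

    for (int i = 0; i < count; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &saved);
    return count;
}
```

With a soft limit of 64, the loop should stop with EMFILE after at most 64 opens (fewer, since stdin, stdout and stderr usually occupy three descriptors). For a whole daemon, running it under a lowered `ulimit -n` in the launching shell has a similar effect without touching the code.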

I'll have to see how my time goes, but lately I've been up to my ears fixing distro-changes in their new release (that breaks a lot of things)... sigh...
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

x190 wrote:
But given the code as it is, I don't see why the default code would have
run into problems
This is a rare problem experienced by, or reported by, only a very few over several versions.

Important Points:

• Run unaltered code, especially fdlimit code.
• Leave global peer connections at default (240) for testing.
• Set the write cache to a setting reasonably close to the default (4MB).
---
You say "for testing" -- but it looks like you are trying to minimize the chances of encountering the problem.

If you are testing, you want to maximize the chances of finding the problem, no?

To maximize chances, wouldn't you likely:

• Run altered code, especially fdlimit -- might want to reduce this well below normal so any problem in the *algorithm* is likely to be hit much sooner.
• Set global peer connections to ~ 75% of fdlimit (leave at least ~40-50 fd's open for cache management and random overhead)
• I'm not sure how the write cache would figure in, BUT, if it is a resource problem, using a larger number would seem to be more likely to cause a problem. But that said -- memory usage and #fd's are pretty much kept separately, so unless you are exhausting the memory on your machine, the fd's shouldn't be affected one way or the other.


It really depends on what behaviors you want to reproduce and how quickly.

Example: Had a bug in the linux-kernel audit code from sgi where the system would go into deadlock after about 3-5 days of continuous testing. My solution was to optimize the code and run it at full tilt so it would reproduce in under an hour. It wasn't the way the code would normally be run, but was a "stress test". If you want to find out what happens on the edge cases, you have to push things to the edge. Conservative testing tends to give conservative bugs-found counts. Depends on where you are aiming.

FWIW, it seems that having enough FD's in the file cache to service each active torrent would be a useful thing to lower the overhead of file opens...but that's really an aside...

Thanks for pointing me at the earlier instances of my problem... though
none of the base reasons (e.g. # clients) seemed to have been out of order.

Handles aren't used for keeping track of memory buffers, so I still don't see how
size of mem buffs should make a diff... is there something in the code that would relate them? (curious as to your reasoning on this one)...
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

ok... things been crazy here... so may be a bit... but will put it on Q ;-)...
Astara
Posts: 50
Joined: Sun Apr 18, 2010 8:36 pm

Re: request removal of moderation for bugdb (loses info)

Post by Astara »

Regarding the libcurl issue... I found this note dated 2007 saying that this problem had been worked around in libcurl:
Daniel Stenberg | 2 Jun 11:52 2007
Re: fd_set beyond 1024 problem
Daniel Stenberg <daniel <at> haxx.se>
2007-06-02 09:52:17 GMT
On Sat, 2 Jun 2007, Aleksandar Lazic wrote:
>>> It isn't clear to me what the portable way of raising this limit is for
>>> fd_set, but I feel a strong urge to fix this issue.
....
Thanks for the suggestions and help, but the fix for us was actually "easier"
than so: the last use of select() in libcurl was still present due to c-ares
returning information and providing an API that only fit select(). I modified
c-ares to allow a more "direct" approach that operates directly on the given
socket/file descriptor, and then I could just rip out the use of select() from
libcurl.

That's why we now require c-ares 1.4.0 for libcurl if you want to do asynch
name resolves. I hope to release that soon, and I intend to do it before I can
release libcurl 7.16.3. There's only a few remaining patches to go in and a
little more testing to be done. Unfortunately, I'm the maintainer of c-ares as
well so that work ends up on me too...

libcurl now uses poll() all over internally, and only those systems that don't
have a native poll() will use select() as then we emulate poll() by using
select(). In today's world, there won't be many systems that don't have poll.
I can mostly think of Windows before Vista and Mac OS X, and possibly we can
work things out on the Mac side one day (and they might even fix their poll
one day).
So in the intervening 7 years, did progress in this area go backwards?

I note on the linux man page for the select/pselect calls that use
of poll(2) and, especially, the linux specific "epoll" is suggested over
use of select/pselect, as both allow "sparse fd-set monitoring".
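
To make the "sparse fd-set" point concrete: poll(2) takes an array of struct pollfd, each carrying its descriptor as a plain int, so nothing in the interface is sized by FD_SETSIZE the way an fd_set bitmask is. A minimal sketch using a pipe; both function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Wait until `fd` is readable. Returns 1 if readable, 0 on timeout,
 * -1 on error. The descriptor number itself is not bounded by FD_SETSIZE. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    int rc = poll(&p, 1, timeout_ms);
    if (rc <= 0)
        return rc; /* 0 = timed out, -1 = error (errno is set) */
    return (p.revents & POLLIN) ? 1 : -1;
}

/* Small self-check: a fresh pipe is not readable until a byte is written.
 * Returns 1 if poll() behaved as expected, 0 if not, -1 if pipe() failed. */
int poll_pipe_demo(void)
{
    int ends[2];

    if (pipe(ends) != 0)
        return -1;

    int before = wait_readable(ends[0], 0); /* nothing written yet */
    int wrote = (write(ends[1], "x", 1) == 1);
    int after = wait_readable(ends[0], 100);

    close(ends[0]);
    close(ends[1]);
    return (before == 0 && wrote && after == 1) ? 1 : 0;
}
```

With select(), by contrast, passing a descriptor numbered at or above FD_SETSIZE to FD_SET is undefined behaviour, which is why a select()-based code path has to keep the process's descriptors under that bound.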

BTW, also tried to gather some cache stats.. maybe you could patch
the main transmission with something similar to the cache stat code
so stats could be tallied for the main line?

I'll post that and my odd stats ....