Filesystem charset

Altimit
Posts: 3
Joined: Thu Oct 23, 2008 10:34 am

Filesystem charset

Post by Altimit »

I use Transmission 1.34 on FreeBSD. The file system uses Cp1251 for file names, while the torrent file uses UTF-8 (Transmission itself works with UTF-8 too). Files with Russian names are downloaded with garbled names. The second problem is that Transmission cannot find an existing file on the file system that I want to seed.
Could you please add an option to choose which file system encoding Transmission should use? For example, BitTornado does this with "--filesystem_encoding cp1251". Or could you please point me to where in the source I could try to build this kind of function?
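
To illustrate the mismatch described above, here is a minimal Python sketch. It is not Transmission code: the helper name to_fs_bytes and the sample file name are made up for illustration. It shows what a UTF-8 name looks like when its bytes are written unchanged to a Cp1251 file system, and what a conversion step would produce instead.

```python
def to_fs_bytes(name: str, fs_encoding: str = "cp1251") -> bytes:
    """Encode a file name for the file system (hypothetical helper).

    Raises UnicodeEncodeError if the name contains characters that the
    target encoding cannot represent.
    """
    return name.encode(fs_encoding)


# Sample name, assumed for illustration only.
torrent_name = "Привет.txt"

# What the torrent file stores: the name as UTF-8 bytes.
utf8_bytes = torrent_name.encode("utf-8")

# Writing those bytes unchanged and viewing them as Cp1251 gives a garbled name.
garbled = utf8_bytes.decode("cp1251", errors="replace")

# Converting first gives bytes that a Cp1251 system displays correctly.
fs_bytes = to_fs_bytes(torrent_name)

print(garbled)
print(fs_bytes.decode("cp1251"))
```

The same conversion would also be needed in the other direction when looking up an existing file to seed, which is the second problem mentioned above.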
Jordan
Transmission Developer
Posts: 2312
Joined: Sat May 26, 2007 3:39 pm
Location: Titania's Room

Re: Filesystem charset

Post by Jordan »

The change would need to be in libtransmission/inout.c and libtransmission/fdlimit.c's fopen() and tr_mkdirp() calls, I think.
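
As a rough illustration of that idea (in Python rather than libtransmission's C, and with hypothetical helper names fs_path, fs_makedirs, fs_open and demo), every call that creates a directory or opens a file would convert the path first. The sketch assumes a POSIX system whose file system stores names as raw bytes, and Cp1251 as the chosen encoding.

```python
import os
import tempfile


def fs_path(path: str, fs_encoding: str) -> bytes:
    """Convert a text path to raw bytes in the file system's encoding."""
    return path.encode(fs_encoding)


def fs_makedirs(path: str, fs_encoding: str) -> None:
    """Create a directory tree, converting the path first."""
    os.makedirs(fs_path(path, fs_encoding), exist_ok=True)


def fs_open(path: str, flags: int, fs_encoding: str, mode: int = 0o644) -> int:
    """Open a file by its converted path and return a file descriptor."""
    return os.open(fs_path(path, fs_encoding), flags, mode)


def demo() -> list:
    """Create a directory and a file with Cp1251 names; return the raw listing."""
    with tempfile.TemporaryDirectory() as root:
        subdir = os.path.join(root, "Музыка")
        fs_makedirs(subdir, "cp1251")
        fd = fs_open(os.path.join(subdir, "песня.mp3"),
                     os.O_WRONLY | os.O_CREAT, "cp1251")
        os.close(fd)
        # Listing with a bytes path returns the names as raw bytes.
        return os.listdir(fs_path(subdir, "cp1251"))
```

Calling demo() returns the file name as the Cp1251 bytes that actually ended up on disk. The point of funnelling everything through one helper is that no code path can forget the conversion.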

Charset munging often seems like a black art to me. Thanks for offering to look into it. :clap:
dukzcry
Posts: 2
Joined: Tue Jul 28, 2009 8:30 pm

Re: Filesystem charset

Post by dukzcry »

Hello!
Could you say what has changed overall since then?
I've seen that the fopen() calls were replaced with custom open() ones.

I tried using iconv.h with a wrapper function to get this working, but got no further than creating and reading files with cp1251 (or any given) names. When verifying a torrent, it stalls at 99.9 percent and complains that it can't create the resume file, while the web interface shows "No such file" in the torrent status.

Sorry for this mess, of course; I'm not a programmer.
jch
Posts: 175
Joined: Wed May 13, 2009 12:08 am

Re: Filesystem charset

Post by jch »

Oh my... I've done quite a bit of internationalisation of software in a previous life, and, trust me, you don't want to open that particular can of worms.

Under Mac OS X, things are done reasonably, using an approach borrowed from Plan 9 from Bell Labs: all filenames are in UTF-8, whatever the user's locale and the particular filesystem.

Under traditional Unix, however, the common approach is to use the locale's encoding for file names, which means that on-disk files change names whenever you switch locales, and that sending filenames over the network usually breaks. That's the reason why we European Unix users learn to avoid our best-beloved diacritical characters in file names, and use plain ASCII. Which is not necessarily a workable solution for our Eastern friends, who use a different alphabet.

The one sane solution is the one advocated by the Linux community: give up on non-UTF-8 locales, and encourage all users to convert all of their filenames to UTF-8. This is helped somewhat by software such as Emacs, which is able to automatically detect non-UTF-8 file encodings even when the system locale is UTF-8, but it remains a difficult transition, and one that the BSD community (communities?) haven't attempted yet.
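
For what it's worth, the detection step in such a conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name to_utf8_name is made up, the caller has to know (or guess) the legacy encoding, and a legacy-encoded name that happens to be valid UTF-8 would be left untouched.

```python
def to_utf8_name(raw: bytes, legacy_encoding: str) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else re-encode it as UTF-8.

    Hypothetical helper: legacy_encoding is the encoding the name is
    assumed to be in when it does not decode as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")


# A name stored in Cp1251 is converted to UTF-8 ...
old_name = "файл.txt".encode("cp1251")
new_name = to_utf8_name(old_name, "cp1251")

# ... while a name that is already UTF-8 passes through unchanged.
already_ok = "файл.txt".encode("utf-8")
unchanged = to_utf8_name(already_ok, "cp1251")
```

A real migration would then rename each file from the old byte string to the new one, which is where the difficulty lies: every program that recorded the old name has to follow.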

--Juliusz
dukzcry
Posts: 2
Joined: Tue Jul 28, 2009 8:30 pm

Re: Filesystem charset

Post by dukzcry »

all filenames are in UTF-8, whatever the user's locale and the particular filesystem.
Giving needed charset to user is the BEST choice.
Especially in IM and mail software, where a lot of charsets are in common use. For example, one person may set up an IRC server and insist on UTF-8 as the default, while another, more conservative one will use a local charset.
mutt and irssi are examples of such programs. Neither is an example of minimalism or of the state of the art, but they are the most powerful solutions, and some 40% of their success is due to proper internationalisation.

Such discussions turn into wars, but you SHOULD build a multi-charset solution into your program if you want it to succeed. Of course, as far as the interface goes, GUI applications built with the widely used GTK or Qt toolkits have the edge over CLI ones, except that their one and only default codepage is UTF-8.

Plenty of years will have to pass before UTF-8, or even UTF-16 or something new, becomes the ONE and only encoding. Many people think they don't need such a wasteful codepage when they use no more than two alphabets in everyday life: Latin and their native one.

What about charsets in filesystems? Even the Windows ext3 driver can use a dozen codepages for filesystems, so why not make your application multi-charset capable? Especially if you're already including such shit as the iconv library.

P.S.: Don't forget about that precious group of people who will hate UTF-8 until their last days.
Sadly we can't totally ignore them :lol:
jch
Posts: 175
Joined: Wed May 13, 2009 12:08 am

Re: Filesystem charset

Post by jch »

Giving needed charset to user is the BEST choice.

Especially in IM and mail software
We were speaking about the encoding of file names in the filesystem, not about the contents of the files or the data that goes on the wire.

I expect to be able to save a file on a French machine, take my USB key with me to Poland, and have the filenames preserved. I don't expect French « ê » to become Polish « ę » just because my USB key crossed a border. (« ê » and « ę » have the same codepoint in the locale-specific 8-bit encodings of the two countries).
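
The collision is easy to reproduce. The short Python sketch below assumes the two locale encodings are ISO-8859-1 (Western European) and ISO-8859-2 (Central European); the encodings are not named above, so that choice is an assumption made for illustration.

```python
# One and the same byte, as it would sit in a file name on the USB key.
raw = b"\xea"

# How a French (ISO-8859-1) and a Polish (ISO-8859-2) locale would read it.
french = raw.decode("iso-8859-1")
polish = raw.decode("iso-8859-2")

# In UTF-8 the two letters get different byte sequences, so nothing collides.
french_utf8 = "ê".encode("utf-8")
polish_utf8 = "ę".encode("utf-8")

print(french, polish)
print(french_utf8, polish_utf8)
```

With UTF-8 file names the byte sequence itself says which letter was meant, so the name no longer depends on the locale of whoever reads it.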

A truly tragic example. In the Good Old Times of MS-DOS, filenames were encoded using the current codepage. Since Eastern-European CP-852 has more codepoints assigned than Western CP-850, some Polish floppies were unreadable under French MS-DOS, which would reject the « impossible » filenames.

The issue was first fixed in Plan 9, which pioneered the use of UTF-8 (originally UTF-1) in all locales. Microsoft followed suit, using UTF-16 in all locales on the NTFS and VFAT filesystems. Apple started using UTF-8 in HFS+, if memory serves. And we, in the Free Unix community, are trying to follow suit.

--Juliusz
starmoon
Posts: 11
Joined: Wed Oct 27, 2010 5:35 pm

Re: Filesystem charset

Post by starmoon »

I'm using GB2312/GBK charset on my local filesystem.

I highly recommend and request that Transmission add a local charset option in a future release.

Not everyone can convert their local filesystem to UTF-8. Other software would get confused by UTF-8.

Thanks.