Check torrent content against existing file library

BugMagnet
Posts: 1
Joined: Wed Apr 20, 2011 8:48 am

Post by BugMagnet »

I have amassed thousands of files over the years (1-2 TB of documentaries, commentaries and other educational media) which I try to make available for sharing over various p2p networks. I try to keep these organized by subject, author/speaker, etc.

Recently, others have made torrents of archives consisting of perhaps dozens of files. Since I seek files similar to what I already have, I often have many of the files included in the torrent.

So, what is the best way to avoid downloading duplicates of files I already have in my library?

My idea would be an indexing utility that builds a hash table of all my existing files. Then, when I add a torrent, the hash of each file within the torrent would be compared against the list of existing file hashes. The options would then be to mark matches 'do not download' or, alternatively, to ask the user to decide what to do. Second, for sharing purposes, the location of the existing file should be recorded so that it can be seeded from there.
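A rough sketch of the library side of that idea, in Python. The function names (`file_sha1`, `build_hash_index`) and the choice of SHA-1 are assumptions made for illustration; none of this is an existing Transmission feature.

```python
import hashlib
import os


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_index(library_root):
    """Map each content hash to the list of library paths with that content."""
    index = {}
    for folder, _subdirs, names in os.walk(library_root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                index.setdefault(file_sha1(path), []).append(path)
    return index
```

Because the index is keyed by content rather than by name, a file that has been renamed or moved into another subdirectory still maps to the same entry.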

It would also help a lot to maintain such links when incoming files are manually moved into the appropriate subdirectories. For example, 10 files in a torrent may be sorted into 5 different subdirectories to keep them organized for user access. Transmission should keep track of where they are so it can continue seeding the torrent.

Another thing I am not sure of is how file name changes might affect the process. I anticipate using 2, 3 or more p2p programs simultaneously, such as Transmission and eMule. It could be that an eD2k link was made for the same file but with a different filename. Is it possible to point to a single file to seed for both eD2k and torrents, even if its filename differs from the one in the torrent?
killemov
Posts: 573
Joined: Sat Jul 31, 2010 5:04 pm

Re: Check torrent content against existing file library

Post by killemov »

BugMagnet wrote: So, what is the best way to avoid downloading duplicates of files I already have in my library?

My idea would be an indexing utility that builds a hash table of all my existing files. Then, when I add a torrent, the hash of each file within the torrent would be compared against the list of existing file hashes. The options would then be to mark matches 'do not download' or, alternatively, to ask the user to decide what to do. Second, for sharing purposes, the location of the existing file should be recorded so that it can be seeded from there.

First, there is no hash per file. There is a hash per piece (block) and one hash for the torrent as a whole, and a piece can span the boundary between two files. So generating a hash table of all your files is almost useless.
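To see why a per-file hash table does not line up with what a torrent contains, here is a small Python sketch that maps each piece to the files it covers. It assumes the usual multi-file layout, in which the files are concatenated in order and cut into fixed-size pieces; the function name `files_per_piece` and the example sizes are made up for illustration.

```python
def files_per_piece(file_lengths, piece_length):
    """For each piece, list the indices of the files whose bytes it contains."""
    # Starting offset of every file within the concatenated torrent payload.
    starts = []
    offset = 0
    for length in file_lengths:
        starts.append(offset)
        offset += length
    total = offset

    result = []
    for piece_start in range(0, total, piece_length):
        piece_end = min(piece_start + piece_length, total)
        covered = [
            index
            for index, (start, length) in enumerate(zip(starts, file_lengths))
            if length > 0 and start < piece_end and start + length > piece_start
        ]
        result.append(covered)
    return result


# Three files of 100, 50 and 200 bytes, cut into 128-byte pieces.
print(files_per_piece([100, 50, 200], 128))  # [[0, 1], [1, 2], [2]]
```

In this example the first two pieces each mix bytes from two files, so their hashes cannot be checked against any single file on its own.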

Second, this has absolutely nothing to do with Transmission.

Third, you need (to write) a two-pass filtering system.
The first pass analyzes the file data within the actual torrent, searches the filesystem for matches, and persists the results. The system then places the torrent in the watched folder and waits for it to be picked up. The torrent is then paused, the matched files are deselected, and the torrent is restarted.
The second pass is activated when the torrent has finished downloading. The filesystem is searched again, but now with complete files. So match by file size, hash (this could be the only use for your hash table), ... and delete the downloaded duplicate file or ...
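The matching step of the second pass could look roughly like the Python sketch below. It filters candidates by file size first and then confirms with a byte-for-byte comparison via `filecmp.cmp`, which is swapped in here for the hash comparison because it needs no precomputed table. The function name `find_library_duplicate` and both path arguments are hypothetical.

```python
import filecmp
import os


def find_library_duplicate(downloaded_path, library_root):
    """Return the path of a library file identical to downloaded_path, or None."""
    wanted_size = os.path.getsize(downloaded_path)
    downloaded_real = os.path.realpath(downloaded_path)
    for folder, _subdirs, names in os.walk(library_root):
        for name in names:
            candidate = os.path.join(folder, name)
            if not os.path.isfile(candidate):
                continue  # skip broken links and special files
            if os.path.realpath(candidate) == downloaded_real:
                continue  # skip the downloaded file itself
            if os.path.getsize(candidate) != wanted_size:
                continue  # cheap filter: different size means different file
            if filecmp.cmp(downloaded_path, candidate, shallow=False):
                return candidate  # contents are identical
    return None
```

The size check is what keeps this affordable on a large library: only files of exactly the same length are ever read in full.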

Fourth, symbolic links are your friends. Instead of deleting, replace all the matched files (from both passes) with symbolic links to the files in your collection. This way you can keep seeding the new torrent with your old and new files. You "bridge" or merge your torrents and get maximum exposure for your entire collection.
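A minimal Python sketch of the replacement step, assuming a platform where unprivileged symbolic links work (Linux or Mac OS X; Windows may need extra privileges). The function name `replace_with_symlink` and the temporary `.symlink-tmp` suffix are made up for illustration.

```python
import os


def replace_with_symlink(duplicate_path, library_path):
    """Replace duplicate_path with a symbolic link pointing at library_path."""
    target = os.path.abspath(library_path)
    temp_link = duplicate_path + ".symlink-tmp"
    # Create the link under a temporary name first, then rename it over the
    # duplicate, so the torrent's path never points at nothing.
    os.symlink(target, temp_link)
    os.replace(temp_link, duplicate_path)
```

Only call this after confirming that the two files really are identical, since the duplicate's contents are gone once the link replaces it.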

Good luck with that.