
How do DHT torrent indexing sites scrape infoHashes efficiently?

Posted: Wed Feb 17, 2021 11:51 am
by LouisJMurphy
I am interested in how DHT torrent indexing sites work. I have a working infoHash scraper written with a Node.js library. At first I ran it behind NAT, but that was not efficient; after I moved it to a BSD server with a public IP, the results were much better. From many publications on this topic I have learned that the best solution is to run several virtual DHT nodes to scrape infoHashes faster. My code starts several DHT node instances, each with a unique node ID and its own port.

My nodejs code:

"use strict"
const DHT = require('bittorrent-dht')
const crypto = require('crypto');

let DHTnodeID = []
for (let i = 1; i <= 10; i++) {
  // Give each node a unique ID: the SHA-1 of a per-node string, stored under key i
  DHTnodeID.push({ [i]: crypto.createHash('sha1').update(`myDHTnodeLocal${i}`).digest('hex') })
}

let dhtOpt = {
  nodeId: '', // 160-bit DHT node ID (Buffer or hex string, default: randomly generated)
  //bootstrap: [], // bootstrap servers (default: router.bittorrent.com:6881, router.utorrent.com:6881, dht.transmissionbt.com:6881)
  host: false, // host of local peer, if specified then announces get added to local table (String, disabled by default)
  concurrency: 16, // k-rpc option to specify maximum concurrent UDP requests allowed (Number, 16 by default)
  //hash: Function, // custom hash function to use (Function, SHA1 by default),
  //krpc: krpc(), // optional k-rpc instance
  //timeBucketOutdated: 900000, // check buckets every 15min
  //maxAge: Infinity // optional setting for announced peers to time out
}

var dhtNodes = []
for (let i = 1; i <= DHTnodeID.length; i++) {
  dhtOpt.nodeId = DHTnodeID[i - 1][String(i)]
  dhtNodes.push(new DHT(dhtOpt))
}

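let port = 6881 // run 10 DHT nodes, one port each, starting at 6881
// listenFce, readyFce and announceFce are handler functions not shown here
for (let item of dhtNodes) {
  item.listen(port, listenFce)
  item.on('ready', readyFce)
  item.on('announce', announceFce)

  port++
}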
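The three handlers are not part of the snippet above. A minimal sketch of what they might look like is below; the bodies and the seenInfoHashes set are illustrative assumptions, and announceFce assumes the 'announce' event passes (peer, infoHash) with the infoHash as a Buffer or a hex string:

```javascript
// Hypothetical handler bodies: only the function names come from the snippet above.
const seenInfoHashes = new Set() // hex infoHashes collected by all local nodes

function listenFce () {
  console.log('DHT node is listening')
}

function readyFce () {
  console.log('DHT node is bootstrapped and ready')
}

// Assumption: infoHash arrives as a Buffer or a hex string.
// Returns true when the hash has not been seen by any local node before.
function announceFce (peer, infoHash) {
  const hex = Buffer.isBuffer(infoHash)
    ? infoHash.toString('hex')
    : String(infoHash).toLowerCase()
  const isNew = !seenInfoHashes.has(hex)
  seenInfoHashes.add(hex)
  return isNew
}
```

Because all ten nodes share one set, an infoHash announced to several of them is only counted once.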
Then I found a university research project that contains the following statement:

The most obvious approach to increasing throughput is using several DHT nodes instead of one. Using several ports on a single IP address was not considered a viable option due to IP-address based filtering against potential DoS attacks. Instead the indexer is designed to run on several hosts or on a multihomed host. Individual instances synchronize their indexing activity through a shared relational database that stores discovered infohashes and the current processing stage for each .torrent file.

By Aaron Grunthal - University of Applied Sciences Esslingen
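As I understand that statement, the shared table mainly stops several instances from processing the same infohash twice. A minimal sketch of the idea follows, with an in-memory Map standing in for the relational database; the class name InfoHashStore, the stage names and the instance IDs are my own assumptions and are not taken from the paper:

```javascript
// In-memory stand-in for the shared relational table described in the quote.
// A real deployment would use a database that every indexer instance can reach.
class InfoHashStore {
  constructor () {
    this.rows = new Map() // infoHash (hex) -> { stage, firstSeenBy }
  }

  // Record a discovery; returns true only for the first instance to report the hash.
  recordDiscovery (infoHash, instanceId) {
    if (this.rows.has(infoHash)) return false
    this.rows.set(infoHash, { stage: 'discovered', firstSeenBy: instanceId })
    return true
  }

  // Move a known hash to another processing stage (stage names are hypothetical).
  setStage (infoHash, stage) {
    const row = this.rows.get(infoHash)
    if (!row) throw new Error(`unknown infoHash: ${infoHash}`)
    row.stage = stage
  }

  getStage (infoHash) {
    const row = this.rows.get(infoHash)
    return row ? row.stage : undefined
  }
}

const store = new InfoHashStore()
const hash = 'a'.repeat(40)
console.log(store.recordDiscovery(hash, 'host-1')) // first report of this hash
console.log(store.recordDiscovery(hash, 'host-2')) // duplicate report from another instance
store.setStage(hash, 'metadata-fetched')
console.log(store.getStage(hash))
```

With a real database the same check would typically be a unique key on the infohash column, so that a duplicate insert from a second instance is rejected.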

If the above statement is true, does it mean that my 10 DHT node instances will be treated as a DoS attack, and can I be penalized somehow? If so, how do those websites (DHT torrent indexing sites) deal with this problem? Is it possible to run an efficient infoHash scraper on one server with one public IP? Obviously, the more instances I run, the more hashes I get, but the above statement worries me. Thank you very much in advance.