Configuration
| Key | Value type | Explanation | Default value |
|---|---|---|---|
| max_links_fetching | Integer | The maximum number of download links to scrape. | |
| max_links_download | Integer | The maximum number of files to download. | |
| max_filesize | Integer | The maximum file size in MiB; files larger than this are ignored. | |
| skip_resources | Integer | The number of links to skip during scraping, i.e. the offset. Useful for resuming an interrupted scraping run. | |
| timeout | Integer | The number of seconds after which a web request fails with a timeout. | |
| storage_directory | String | The path of the directory where downloaded files are saved. | |
| mongo_url | String | The address of the MongoDB server. | |
| mongo_port | Integer | The port on which the MongoDB server listens. | |
| collection_name | String | The name of the MongoDB collection in which the information is stored. Any name can be chosen. | |
| master_address | String | The address of the master server. | |
| master_port | Integer | The port on which the master server listens. | |
| delay_seconds | Integer | The number of seconds the master server waits before sending a message. Only useful for development. | |
| max_threads | Integer | The maximum number of threads each node uses to process files. | |
| nodes | Array of node objects | The node servers to use. | |
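The size and link limits above interact in a specific way. The following minimal Python sketch shows one possible reading of them: max_filesize is converted from MiB to bytes, and skip_resources is applied as an offset before max_links_fetching caps the number of links. The helper names is_too_large and select_links are hypothetical and not part of the project; the sketch assumes file sizes are known in bytes.

```python
MIB = 1024 * 1024  # 1 MiB = 1,048,576 bytes


def is_too_large(size_bytes: int, max_filesize: int) -> bool:
    """Return True if a file should be ignored (max_filesize is given in MiB).

    Only files strictly larger than the limit are ignored; a file of exactly
    max_filesize MiB is kept.
    """
    return size_bytes > max_filesize * MIB


def select_links(links: list[str], skip_resources: int, max_links_fetching: int) -> list[str]:
    """Skip the first skip_resources links (the offset), then keep at most max_links_fetching."""
    return links[skip_resources:skip_resources + max_links_fetching]


if __name__ == "__main__":
    links = [f"https://example.org/file{i}.json" for i in range(10)]
    # Resume after an interruption at link 3 and fetch at most 4 further links.
    print(select_links(links, skip_resources=3, max_links_fetching=4))
    # A 3 MiB file with max_filesize = 2 would be ignored.
    print(is_too_large(3 * MIB, max_filesize=2))
```

Slicing keeps the behavior well-defined even when skip_resources exceeds the number of available links: the result is simply an empty list.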
Node objects
Each node object is a JSON object with the following attributes; address and port are required:
| Key | Value type | Explanation | Example |
|---|---|---|---|
| address | String | The address of the node server. | |
| port | Integer | The port of the node server. | |
| uuid | String (UUID version 4) | Optional. A UUID that identifies the node. | |
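The rules for node objects (address and port required, uuid optional) can be checked when the configuration is loaded. Below is a minimal Python sketch of such a check. The Node dataclass and the parse_node function are hypothetical illustrations, not the project's own classes, and the sketch assumes the uuid is written as a string in the usual hyphenated form.

```python
from dataclasses import dataclass
from typing import Any, Optional
from uuid import UUID


@dataclass
class Node:
    """One entry of the "nodes" array (hypothetical model for illustration)."""
    address: str
    port: int
    uuid: Optional[UUID] = None


def parse_node(obj: dict[str, Any]) -> Node:
    """Validate a node object: address and port are required, uuid is optional."""
    for key in ("address", "port"):
        if key not in obj:
            raise ValueError(f"node object is missing required key {key!r}")
    if not isinstance(obj["address"], str):
        raise TypeError("address must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(obj["port"], int) or isinstance(obj["port"], bool):
        raise TypeError("port must be an integer")
    node_uuid = None
    if "uuid" in obj:
        node_uuid = UUID(obj["uuid"])  # raises ValueError for malformed strings
        if node_uuid.version != 4:
            raise ValueError("uuid must be a version 4 UUID")
    return Node(obj["address"], obj["port"], node_uuid)
```

For example, parse_node({"address": "jh1.some.domain", "port": 1234}) returns a Node without a UUID, while omitting port raises a ValueError.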
Example
```json
{
    "max_filesize": 2048,
    "timeout": 30,
    "storage_directory": "/mnt/jhound_data",
    "mongo_url": "jhound.informatik.uni-rostock.de",
    "collection_name": "repositories_uni",
    "master": {
        "address": "localhost",
        "port": 9876,
        "max_threads": 4
    },
    "nodes": [
        {
            "address": "jh1.some.domain",
            "port": 1234
        },
        {
            "address": "jh2.some.domain",
            "port": 1234
        }
    ]
}
```
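The following is a minimal Python sketch that reads a configuration like the example, using only the standard json module. Note that the example nests the master settings (address, port, max_threads) under a master object rather than using flat master_address/master_port keys. The sketch follows the example's layout. The functions load_config and node_endpoints are hypothetical helpers.

```python
import json
from typing import Any

# The example configuration, inlined so the sketch runs on its own.
EXAMPLE_CONFIG = """
{
    "max_filesize": 2048,
    "timeout": 30,
    "storage_directory": "/mnt/jhound_data",
    "mongo_url": "jhound.informatik.uni-rostock.de",
    "collection_name": "repositories_uni",
    "master": {"address": "localhost", "port": 9876, "max_threads": 4},
    "nodes": [
        {"address": "jh1.some.domain", "port": 1234},
        {"address": "jh2.some.domain", "port": 1234}
    ]
}
"""


def load_config(path: str) -> dict[str, Any]:
    """Read a JSON configuration file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def node_endpoints(config: dict[str, Any]) -> list[tuple[str, int]]:
    """Return the (address, port) pair of every entry in the "nodes" array."""
    return [(node["address"], node["port"]) for node in config.get("nodes", [])]


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    master = config["master"]  # master settings are nested in the example
    print(f"master: {master['address']}:{master['port']}, max_threads={master['max_threads']}")
    for address, port in node_endpoints(config):
        print(f"node: {address}:{port}")
```

In practice, load_config would be called with the path of the configuration file instead of parsing an inline string.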