helpers.classes package
Submodules
helpers.classes.arrayproperties module
class helpers.classes.arrayproperties.ArrayProperties
Bases: object
- add_object()
- add_simple()
- analysis()
- end_of_object(): Needed so that only the elements found directly at the array's own level are counted.
- get_amount_empty()
- get_amount_mixed()
- get_amount_nested()
- get_amount_object()
- get_amount_only_array()
- get_amount_simple()
- get_results()
- is_in_array()
- nest()
- un_nest()
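The end_of_object() docstring implies that ArrayProperties tracks nesting so that elements of nested arrays or objects are not counted as elements of the outer array. The sketch below illustrates that idea with a plain depth counter. The LevelCounter class and the count_top_level() helper are hypothetical; they borrow the names nest() and un_nest() only for readability and are not taken from the package's source.

```python
class LevelCounter:
    """Hypothetical sketch: count only the elements that sit directly
    in one array, ignoring elements of nested arrays and objects."""

    def __init__(self):
        self.depth = 0    # 0 means "directly inside the array being analyzed"
        self.simple = 0   # scalar elements at depth 0
        self.objects = 0  # object (dict) elements at depth 0
        self.nested = 0   # array (list) elements at depth 0

    def nest(self):
        # Entering a nested array or object: its elements are one level deeper.
        self.depth += 1

    def un_nest(self):
        # Leaving a nested array or object.
        self.depth -= 1

    def add(self, value):
        if self.depth != 0:
            return  # not at the array's own level, so not counted
        if isinstance(value, dict):
            self.objects += 1
        elif isinstance(value, list):
            self.nested += 1
        else:
            self.simple += 1


def count_top_level(array):
    """Return (simple, objects, nested) counts for the top level of array."""
    counter = LevelCounter()

    def walk(value):
        # Register the value at the current depth, then descend into it.
        counter.add(value)
        if isinstance(value, (list, dict)):
            children = value if isinstance(value, list) else value.values()
            counter.nest()
            for child in children:
                walk(child)
            counter.un_nest()

    for element in array:
        walk(element)
    return counter.simple, counter.objects, counter.nested
```

Only the depth check in add() decides what is counted, which is one plausible reading of "the very level of the array"; the real class also reports categories (empty, mixed, only-array) that this sketch leaves out.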
helpers.classes.config_keys module
class helpers.classes.config_keys.ConfigKeys(value)
Bases: enum.Enum
An enumeration.
- COLLECTION_NAME = 'collection_name'
- DELAY_SECONDS = 'delay_seconds'
- MASTER_ADDRESS = 'master_address'
- MASTER_PORT = 'master_port'
- MAX_FILESIZE = 'max_filesize'
- MAX_LINKS_DOWNLOAD = 'max_links_download'
- MAX_LINKS_FETCHING = 'max_links_fetching'
- MAX_THREADS = 'max_threads'
- SKIP_RESOURCES = 'skip_resources'
- STORAGE_DIRECTORY = 'storage_directory'
- TIMEOUT = 'timeout'
- UPDATE_NODE_INTERVAL = 'update_node_interval'
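The (value) in the ConfigKeys(value) signature is the standard enum.Enum call, which looks a member up by its value. A minimal sketch, assuming the configuration has been read into a plain dict whose keys are the string values listed above; the raw_config dict and its contents are hypothetical, and only three members are redefined so the block runs without the package.

```python
from enum import Enum


class ConfigKeys(Enum):
    # Subset of the documented members, redefined locally for this sketch.
    MAX_THREADS = 'max_threads'
    TIMEOUT = 'timeout'
    STORAGE_DIRECTORY = 'storage_directory'


# Hypothetical configuration as it might come out of a JSON or YAML file.
raw_config = {'max_threads': 8, 'timeout': 30, 'storage_directory': '/tmp/data'}

# ConfigKeys(key) performs a lookup by value; an unknown key raises ValueError.
parsed = {ConfigKeys(key): value for key, value in raw_config.items()}

max_threads = parsed[ConfigKeys.MAX_THREADS]
```

Keying the parsed dict by enum members instead of raw strings means a misspelled key fails at load time rather than silently returning nothing later.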
helpers.classes.errors module
class helpers.classes.errors.AnalysisErrors(value)
Bases: helpers.classes.errors.Errors
Errors that might occur during the data analysis.
- UNKNOWN = 'Unknown error'
class helpers.classes.errors.Errors(value)
Bases: enum.Enum
This is just a pseudo-abstract class.
class helpers.classes.errors.ScrapingErrors(value)
Bases: helpers.classes.errors.Errors
Errors that might occur while scraping or downloading a repository.
- CKAN = 'CKAN error'
- CONNECTION = 'Connection error'
- CONTENTTYPE = 'Content-Type'
- FILETOOLARGE = 'File too large'
- MIMETYPE = 'MimeType'
- PARSING = 'Parsing after download'
- PROTOCOL = 'Unsupported protocol'
- SSL = 'SSL'
- STATUSCODE = 'Bad status code'
- TIMEOUT = 'Timeout'
- UNKNOWN = 'Unknown error'
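Errors is documented as a pseudo-abstract class. A property of enum.Enum explains why this works: an enum can be subclassed only if it defines no members, so a member-less Errors can serve as the common parent of AnalysisErrors and ScrapingErrors. The sketch below redefines a few members locally to show the pattern; describe() is a hypothetical helper.

```python
from enum import Enum


class Errors(Enum):
    """Member-less base: enum.Enum only allows subclassing an enum that
    defines no members, so this acts as a pseudo-abstract parent."""


class AnalysisErrors(Errors):
    UNKNOWN = 'Unknown error'


class ScrapingErrors(Errors):
    TIMEOUT = 'Timeout'
    UNKNOWN = 'Unknown error'


def describe(error: Errors) -> str:
    # Accepts members of any subclass, since they are all instances of Errors.
    return f'{type(error).__name__}: {error.value}'
```

Because enum members compare by identity, AnalysisErrors.UNKNOWN and ScrapingErrors.UNKNOWN remain distinct even though both carry the value 'Unknown error'. Trying to subclass ScrapingErrors itself raises TypeError, since it already has members.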
helpers.classes.exceptions module
exception helpers.classes.exceptions.InvalidConfigurationError
Bases: Exception
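The page documents InvalidConfigurationError only through its base class. A minimal sketch of how such an exception might be raised while loading the configuration, assuming validation against a hypothetical KNOWN_KEYS set (in the package, the ConfigKeys values would be the natural source). validate_config() and its rules are illustrative, not the package's own checks.

```python
class InvalidConfigurationError(Exception):
    """Local stand-in for helpers.classes.exceptions.InvalidConfigurationError."""


# Hypothetical set of accepted keys, mirroring a few ConfigKeys values.
KNOWN_KEYS = {'max_threads', 'timeout', 'storage_directory'}


def validate_config(config: dict) -> dict:
    """Hypothetical check: reject unknown keys and non-positive thread counts."""
    unknown = set(config) - KNOWN_KEYS
    if unknown:
        raise InvalidConfigurationError(f'unknown keys: {sorted(unknown)}')
    if config.get('max_threads', 1) < 1:
        raise InvalidConfigurationError('max_threads must be at least 1')
    return config
```

Subclassing Exception directly lets callers catch configuration problems separately from I/O or parsing errors raised while reading the file.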
helpers.classes.filestatus module
helpers.classes.job module
class helpers.classes.job.AnalysisJob(descriptor, pid=None)
Bases: helpers.classes.job.Job
class helpers.classes.job.DownloadJob(descriptor, pid=None)
Bases: helpers.classes.job.Job
helpers.classes.node module
class helpers.classes.node.Node(address, port, uuid='39079a69-32b8-42e8-9d5b-7b0051507d42', enabled=True, storage_directory=None)
Bases: object
- add_file(file): Sets the UUID of the file the node currently works on.
- get_address(): Returns the address of the node.
- get_connection_tuple(): Returns a tuple containing the address and the port.
- get_files(): Returns the UUID of the file the node currently works on.
- get_port(): Returns the port number of the node.
- get_semaphore(): Returns the semaphore value of the node.
- get_storage_directory()
- get_uuid(): Returns the UUID of the node.
- is_enabled()
- knows_uuid()
- register_sent_uuid()
- remove_file(file): Removes a file from the file list. Returns True on success, otherwise False.
- toggle()
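The following is a simplified, hypothetical stand-in for Node that shows how get_connection_tuple() can feed the standard socket API and how remove_file() can report success as a boolean. MiniNode, open_connection() and the node_uuid parameter (renamed so it does not shadow the uuid module) are assumptions for illustration, not the package's implementation.

```python
import socket
import uuid


class MiniNode:
    """Hypothetical, simplified stand-in for helpers.classes.node.Node."""

    def __init__(self, address, port, node_uuid=None, enabled=True):
        self.address = address
        self.port = port
        # A fresh UUID is generated per instance, inside __init__.
        self.uuid = node_uuid or str(uuid.uuid4())
        self.enabled = enabled
        self.files = []

    def get_connection_tuple(self):
        # (host, port) is the address format socket functions expect.
        return self.address, self.port

    def toggle(self):
        self.enabled = not self.enabled

    def add_file(self, file):
        self.files.append(file)

    def remove_file(self, file):
        # True if the file was tracked and removed, False otherwise.
        try:
            self.files.remove(file)
        except ValueError:
            return False
        return True


def open_connection(node, timeout=5.0):
    # Not called here; shows how the tuple plugs into socket.create_connection.
    return socket.create_connection(node.get_connection_tuple(), timeout=timeout)
```

The sketch creates the UUID inside __init__ rather than in the default argument: a default such as str(uuid.uuid4()) is evaluated only once, when the function is defined, so every instance would share it. The concrete UUID shown in the documented signature may indicate such a default, though the page alone does not confirm it.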
helpers.classes.nodetype module
helpers.classes.repositorystatus module
helpers.classes.statisticsbuilder module
class helpers.classes.statisticsbuilder.StatisticsBuilder
Bases: object
class StatisticsBuilder.MultResProperties
Bases: object
Contains the variables used to count the occurrences of multiplicity keywords in schemas. For reasons explained in the paper, these variables have no meaning for JSON documents.
- compact(): Returns a tuple ready to use elsewhere in the scripts.
- get_allof()
- get_anyof()
- get_oneof()
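MultResProperties counts the multiplicity keywords allOf, anyOf and oneOf in schemas. A minimal sketch of such a count over a schema loaded into Python dicts and lists; count_multiplicity_keywords() is hypothetical and not the package's code.

```python
def count_multiplicity_keywords(schema):
    """Hypothetical sketch: count allOf/anyOf/oneOf occurrences anywhere
    in a JSON Schema loaded into dicts and lists."""
    counts = {'allOf': 0, 'anyOf': 0, 'oneOf': 0}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in counts:
                    counts[key] += 1
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    # Tuple order mirrors get_allof(), get_anyof(), get_oneof().
    return counts['allOf'], counts['anyOf'], counts['oneOf']
```

As a simplification, the walk does not distinguish keywords from property names, so a property literally named "oneOf" inside "properties" would also be counted.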
- StatisticsBuilder.NODE_TYPES = [<NodeType.STRING: 1>, <NodeType.NUMBER: 3>, <NodeType.INTEGER: 2>, <NodeType.BOOLEAN: 4>, <NodeType.ARRAY: 5>, <NodeType.OBJECT: 6>, <NodeType.NULL: 8>]
class StatisticsBuilder.RequiredProperties
Bases: object
Contains the variables used to count the number of required properties within a document.
- compact(): Returns a tuple ready to use elsewhere in the scripts.
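RequiredProperties counts required properties, and req_analyze_co() computes required/optional statistics. One way to obtain such counts from a schema is sketched below; count_required_optional() is hypothetical and assumes "optional" means a property declared under "properties" but absent from the sibling "required" array.

```python
def count_required_optional(schema):
    """Hypothetical sketch: count declared properties that are listed in
    "required" versus those that are not, across a whole JSON Schema."""
    required = optional = 0

    def walk(node):
        nonlocal required, optional
        if isinstance(node, dict):
            props = node.get('properties')
            if isinstance(props, dict):
                req = node.get('required', [])
                # Older drafts used a boolean "required"; treat it as no list.
                names = set(req) if isinstance(req, list) else set()
                for name in props:
                    if name in names:
                        required += 1
                    else:
                        optional += 1
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return required, optional
```

Like any purely structural walk, this does not tell keywords apart from property names, so a property literally called "properties" could be misread as a nested schema.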
class StatisticsBuilder.TypeCountProperties
Bases: object
Contains the variables used to count the occurrences of types.
- compact(): Returns a tuple ready to use elsewhere in the scripts.
- get_amount_of_abusive_booleans()
- get_amount_of_abusive_numbers()
- get_amount_of_arrays()
- get_amount_of_booleans()
- get_amount_of_empty_strings()
- get_amount_of_integers()
- get_amount_of_non_empty_strings()
- get_amount_of_nulls()
- get_amount_of_numbers()
- get_amount_of_objects()
- get_amount_of_strings()
- get_amounts()
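TypeCountProperties counts occurrences of types, with separate counters for empty and non-empty strings. A minimal sketch of such a count over a document parsed with json.loads; count_types() is hypothetical, counts only non-integer numbers under "number", and leaves out the "abusive" counters because this page does not define them.

```python
from collections import Counter


def count_types(document):
    """Hypothetical sketch: count JSON value types in a parsed document,
    including the root value itself."""
    counts = Counter()

    def walk(value):
        # bool must be tested before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            counts['boolean'] += 1
        elif isinstance(value, int):
            counts['integer'] += 1
        elif isinstance(value, float):
            counts['number'] += 1
        elif value is None:
            counts['null'] += 1
        elif isinstance(value, str):
            counts['empty_string' if value == '' else 'non_empty_string'] += 1
        elif isinstance(value, list):
            counts['array'] += 1
            for item in value:
                walk(item)
        elif isinstance(value, dict):
            counts['object'] += 1
            for item in value.values():
                walk(item)

    walk(document)
    return counts
```

Checking bool first matters: without it, true and false would be counted as integers, since isinstance(True, int) is True.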
- StatisticsBuilder.full_analysis(custom_objects)
- StatisticsBuilder.req_analyze_co(root_custom_objects): Computes the statistics related to required/optional property characteristics.
-
class