Some spiders, such as the transaction subgraph spiders, may crawl duplicate data. The project's built-in extractor can deduplicate transaction data. Run the following command in the console:
python extract.py deduplicate -i /path/of/raw/data -o /path/of/output/data
Parameter Description:
-i: the input directory of raw data.
-o: the output directory of deduplicated data.
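To illustrate what deduplication means here, the following is a minimal Python sketch and not the extractor's actual implementation. It assumes the raw data is stored as CSV files and that each transaction row carries a unique identifier in a column named `hash`; both the file format and the column name are assumptions, and `deduplicate_csv` is a hypothetical helper.

```python
import csv


def deduplicate_csv(in_path, out_path, key="hash"):
    """Copy rows from in_path to out_path, keeping the first row per key value.

    Hypothetical helper: the CSV layout and the `hash` key column are
    assumptions made for illustration only.
    """
    seen = set()
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] in seen:
                continue  # this transaction was already written
            seen.add(row[key])
            writer.writerow(row)
            kept += 1
    return kept
```

Keeping the set of seen keys in memory preserves the original row order and needs only one pass over the file, which may be sufficient for moderately sized crawls.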