Deduplicate

Some spiders, such as the transaction subgraph spiders, may crawl duplicate data. The project's built-in extractor can deduplicate transaction data. Run the following command in the console:

python extract.py deduplicate \
-i /path/of/raw/data \
-o /path/of/output/data

Parameter Description:

  • -i: the input directory of raw data.

  • -o: the output directory of deduplicated data.
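To make the idea of deduplication concrete, here is a minimal sketch of how duplicate transaction rows could be dropped from a CSV file. It is not the implementation behind extract.py: the function names deduplicate_rows and deduplicate_csv, the CSV layout, and the use of a "hash" column as the unique key are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def deduplicate_rows(rows, key_fields):
    """Yield each row once, keeping the first occurrence of every key.

    `key_fields` is a hypothetical list of column names (for example
    ["hash"]) that together identify a transaction.
    """
    seen = set()
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue  # this key was already emitted, so skip the duplicate
        seen.add(key)
        yield row


def deduplicate_csv(in_path, out_path, key_fields):
    """Copy a CSV file to out_path, dropping rows whose key fields repeat.

    Returns the number of rows written (excluding the header).
    """
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = list(deduplicate_rows(reader, key_fields))

    # Create the output directory if it does not exist yet.
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The sketch keeps the first occurrence of each key and preserves the original row order. Check the actual output of the deduplicate command to see which fields the extractor treats as the key.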
