Deduplicate

Some spiders, such as the transaction subgraph spiders, may crawl duplicate data. The project's built-in extractor deduplicates transaction data. Run the following command in the console:

python extract.py deduplicate \
-i /path/of/raw/data \
-o /path/of/output/data

Parameter Description:

  • -i: the input directory of raw data.

  • -o: the output directory of deduplicated data.
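To illustrate what deduplication over a directory of raw data can look like, here is a minimal sketch. It is not the project's implementation: it assumes the raw data are CSV files and treats two rows as duplicates only when every field matches. The function name `deduplicate_dir` is hypothetical.

```python
import csv
from pathlib import Path


def deduplicate_dir(in_dir: str, out_dir: str) -> int:
    """Copy each CSV file from in_dir to out_dir, dropping repeated rows.

    Hypothetical helper: assumes CSV input and exact-row duplicates.
    Returns the total number of rows dropped across all files.
    """
    src, dst = Path(in_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    dropped = 0
    for path in sorted(src.glob("*.csv")):
        seen = set()  # rows already written for this file
        with path.open(newline="") as fin, \
                (dst / path.name).open("w", newline="") as fout:
            writer = csv.writer(fout)
            for row in csv.reader(fin):
                key = tuple(row)
                if key in seen:
                    dropped += 1
                    continue
                seen.add(key)
                writer.writerow(row)  # first occurrence keeps its position
    return dropped
```

Calling `deduplicate_dir("/path/of/raw/data", "/path/of/output/data")` mirrors the roles of `-i` and `-o` above. The header row is kept because it is the first occurrence of that row in each file.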

