
Cache

The cache settings make it easier to restart spiders: once a cache directory is configured, every HTTP response is stored there, so the engine can serve repeated requests from the local cache instead of going over the network.

Note: If you want to collect the latest data, please delete the cache first!

You can configure the cache in settings.py like this:

HTTPCACHE_ENABLED = True  # enable the HTTP cache
HTTPCACHE_EXPIRATION_SECS = 0  # 0 means cached responses never expire
HTTPCACHE_DIR = '/path/to/cache'  # set your cache path here!
HTTPCACHE_IGNORE_HTTP_CODES = []  # cache responses regardless of status code
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
HTTPCACHE_GZIP = True  # compress cached data on disk
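
If you only want caching for one spider rather than the whole project, Scrapy's custom_settings class attribute accepts the same keys. A minimal sketch, where the spider name is hypothetical:

from scrapy import Spider

class ExampleSpider(Spider):
    # Hypothetical spider, for illustration only: custom_settings lets a
    # single spider opt into caching without editing settings.py.
    name = 'example'
    custom_settings = {
        'HTTPCACHE_ENABLED': True,
        'HTTPCACHE_DIR': '/path/to/cache',
    }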

If you use a relative path, the cache will appear under the .scrapy project directory.
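
When you do need the latest data (see the note above), simply remove the whole cache directory before restarting the spider. A minimal sketch, assuming the default relative layout under .scrapy:

import shutil

# Delete all cached responses; the next run fetches everything over the
# network again. The path assumes a relative HTTPCACHE_DIR, which Scrapy
# resolves under the project's .scrapy data directory.
shutil.rmtree('.scrapy/httpcache', ignore_errors=True)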
