Taking the transaction spider as an example, we discuss how to calculate transaction semantic vectors while crawling transactions in block order.
First, you need to define your own pipeline (see `MoTSPipeline`) that can process continuously synchronized block data:
```python
from BlockchainSpider.items import SyncItem, TransactionItem, TraceItem, \
    Token721TransferItem, Token20TransferItem, Token1155TransferItem


class MoTSPipeline:
    def __init__(self):
        # set up self.file and self.writer here (elided)
        ...

    def process_item(self, item, spider):
        if self.file is None:
            return item
        if not isinstance(item, SyncItem):
            return item

        # collect money transfer items;
        # the 'data' field in SyncItem is a dict,
        # where keys are parsed item class names
        # and values are the parsed items.
        # all items in a SyncItem are parsed from the same block
        txhash2edges = dict()
        transfer_type_names = [
            cls.__name__ for cls in [
                TransactionItem, TraceItem,
                Token721TransferItem,
                Token20TransferItem,
                Token1155TransferItem,
            ]
        ]
        for name in transfer_type_names:
            if not item['data'].get(name):
                continue
            for transfer_item in item['data'][name]:
                txhash = transfer_item['transaction_hash']
                if not txhash2edges.get(txhash):
                    txhash2edges[txhash] = list()
                txhash2edges[txhash].append({
                    'address_from': transfer_item['address_from'],
                    'address_to': transfer_item['address_to'],
                })

        # compute a motif vector for each transaction
        # (HighOrderMotifCounter is provided by the MoTS contrib module)
        txhashes, vecs = list(), list()
        for txhash, edges in txhash2edges.items():
            vec = HighOrderMotifCounter(motif_size=4).count(edges)
            txhashes.append(txhash)
            vecs.append(vec)

        # write one row per transaction:
        # the transaction hash followed by the 16 motif counts
        for txhash, vec in zip(txhashes, vecs):
            vec_list = [vec[i] for i in range(1, 16 + 1)]
            self.writer.writerow([txhash, *vec_list])
        return item
```
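The core of `process_item` is the grouping step: every money-transfer record in a block is collected into a per-transaction edge list, and each edge list is then handed to the motif counter. That step can be sketched in plain Python (the field names below mirror the pipeline code; the transfer records are hypothetical sample data, and the motif counting itself is omitted since `HighOrderMotifCounter` ships with the MoTS contrib module):

```python
from collections import defaultdict

# Hypothetical transfer records, shaped like the items the pipeline
# reads out of a SyncItem's 'data' field: each record carries its
# transaction hash plus the sender and receiver addresses.
transfers = [
    {'transaction_hash': '0xaaa', 'address_from': '0x1', 'address_to': '0x2'},
    {'transaction_hash': '0xaaa', 'address_from': '0x2', 'address_to': '0x3'},
    {'transaction_hash': '0xbbb', 'address_from': '0x4', 'address_to': '0x5'},
]

# Group transfers into per-transaction edge lists, as process_item does
# before handing each edge list to the motif counter.
txhash2edges = defaultdict(list)
for t in transfers:
    txhash2edges[t['transaction_hash']].append({
        'address_from': t['address_from'],
        'address_to': t['address_to'],
    })

for txhash, edges in txhash2edges.items():
    print(txhash, len(edges))
```

Because all items in a `SyncItem` come from the same block, this grouping yields one small transfer graph per transaction in that block.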
Next, enable the MoTS middleware in the Scrapy settings:
```python
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
    'contrib.mots.middlewares.MoTSMiddleware': 500,
}
```
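If you also want Scrapy to route items through the custom `MoTSPipeline` defined above, item pipelines are registered under Scrapy's `ITEM_PIPELINES` setting rather than `SPIDER_MIDDLEWARES`. A minimal sketch, assuming the pipeline lives at the hypothetical module path below (adjust it to wherever your `MoTSPipeline` is defined):

```python
# Register the custom item pipeline
# (hypothetical module path; adjust to your project layout)
ITEM_PIPELINES = {
    'contrib.mots.pipelines.MoTSPipeline': 300,
}
```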
Finally, the following command starts the transaction spider, which calculates and saves the semantic vector of each transaction while crawling transaction data: