# Customizing your workflow

In addition to the features provided by BlockchainSpider, you can customize the workflow of the spider by configuring the pipeline.

> **Note**: this page assumes that you already have an understanding of [item pipelines](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) in Scrapy, and the [transaction semantic representation technique](https://dl.acm.org/doi/abs/10.1145/3543507.3583537).

Taking the transaction spider as an example, we discuss how to compute transaction semantic vectors while crawling transactions in block order.
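As a rough intuition before diving into the pipeline: a transaction semantic vector summarizes the money-transfer graph of a transaction by counting small subgraph patterns (motifs). The sketch below counts one simple three-node motif in a toy transfer graph; it is an illustration only, not the actual MoTS counter, and the node names are made up.

```python
from itertools import permutations

# A toy directed transfer graph: each edge is (sender, receiver).
edges = [('a', 'b'), ('b', 'c'), ('a', 'c')]
edge_set = set(edges)
nodes = {n for e in edges for n in e}

# Count one simple 3-node motif: the "feed-forward" pattern
# x -> y, y -> z, x -> z (illustration only, not MoTS's counter).
count = 0
for x, y, z in permutations(nodes, 3):
    if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
        count += 1
print(count)  # 1
```

The real counter enumerates many such patterns at once, yielding one count per motif type, which together form the transaction's vector.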

**Firstly**, you need to define your own pipeline (see [MoTSPipeline](https://github.com/wuzhy1ng/BlockchainSpider/blob/master/contrib/mots/pipelines.py)) that can process continuously synchronized block data:

```python
from BlockchainSpider.items import SyncItem, TransactionItem, TraceItem, \
    Token721TransferItem, Token20TransferItem, Token1155TransferItem
# note: HighOrderMotifCounter is part of the MoTS contrib module
# linked above; import it from where that module lives in your project


class MoTSPipeline:
    def __init__(self):
        # set up self.file and self.writer here, e.g. open the output
        # file and wrap it in a csv.writer when the spider starts
        ...

    def process_item(self, item, spider):
        if self.file is None:
            return item
        if not isinstance(item, SyncItem):
            return item

        # collect money transfer items
        # the 'data' field in SyncItem is a dict,
        # where keys are parsed item class names,
        # and values are parsed items.
        # all the items in a SyncItem are parsed from the same block
        txhash2edges = dict()
        transfer_type_names = [
            cls.__name__ for cls in [
                TransactionItem, TraceItem,
                Token721TransferItem, Token20TransferItem,
                Token1155TransferItem,
            ]
        ]
        for name in transfer_type_names:
            if not item['data'].get(name):
                continue
            for transfer_item in item['data'][name]:
                txhash = transfer_item['transaction_hash']
                if not txhash2edges.get(txhash):
                    txhash2edges[txhash] = list()
                txhash2edges[txhash].append({
                    'address_from': transfer_item['address_from'],
                    'address_to': transfer_item['address_to'],
                })

        # compute the semantic vector of each transaction
        # by counting high-order motifs over its transfer edges
        txhash2vec = dict()
        for txhash, edges in txhash2edges.items():
            vec = HighOrderMotifCounter(motif_size=4).count(edges)
            txhash2vec[txhash] = vec

        # write one CSV row per transaction:
        # the transaction hash followed by its 16 motif counts
        for txhash, vec in txhash2vec.items():
            vec_list = [vec[i] for i in range(1, 16 + 1)]
            self.writer.writerow([txhash, *vec_list])
        return item
```
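To see what the grouping step of `process_item` produces in isolation, here is a minimal sketch with plain dicts standing in for the parsed items; the hashes and addresses below are made up for illustration:

```python
# Toy stand-in for the 'data' field of a SyncItem:
# keys are item class names, values are lists of parsed items.
sync_data = {
    'TransactionItem': [
        {'transaction_hash': '0xaaa', 'address_from': '0x1', 'address_to': '0x2'},
        {'transaction_hash': '0xbbb', 'address_from': '0x3', 'address_to': '0x4'},
    ],
    'Token20TransferItem': [
        {'transaction_hash': '0xaaa', 'address_from': '0x2', 'address_to': '0x5'},
    ],
}

# Group money-transfer edges by transaction hash,
# exactly as the pipeline does before counting motifs.
txhash2edges = dict()
for name in ('TransactionItem', 'Token20TransferItem'):
    for transfer_item in sync_data.get(name, []):
        txhash = transfer_item['transaction_hash']
        txhash2edges.setdefault(txhash, []).append({
            'address_from': transfer_item['address_from'],
            'address_to': transfer_item['address_to'],
        })

print(len(txhash2edges['0xaaa']))  # 2: a transaction and a token transfer
```

Note that edges from different item types (transactions, traces, token transfers) end up in the same per-transaction edge list, so the motif counter sees the full transfer graph of each transaction.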

**Next**, enable the pipeline in the settings:

```python
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'contrib.mots.pipelines.MoTSPipeline': 500,
}
```

**Finally**, the following command starts the transaction spider, which computes and saves the semantic vector of each transaction while crawling transaction data:

```bash
scrapy crawl trans.block.evm \
-a out=/path/to/output/data \
-a start_blk=19000000 -a end_blk=19001000 \
-a providers=https://freerpc.merkle.io \
-a enable=BlockchainSpider.middlewares.trans.TransactionReceiptMiddleware,BlockchainSpider.middlewares.trans.TraceMiddleware,BlockchainSpider.middlewares.trans.TokenTransferMiddleware
```
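Assuming the pipeline writes one CSV row per transaction, a hash followed by 16 motif counts (the layout sketched in `MoTSPipeline` above), the output can be loaded back as follows. The sample rows here are fabricated, and the exact file path in your output directory depends on how your pipeline opens its writer:

```python
import csv
import io

# A fabricated two-row sample in the assumed layout:
# transaction hash, then 16 motif counts.
sample = io.StringIO(
    '0xaaa,' + ','.join('0' for _ in range(16)) + '\n'
    '0xbbb,' + ','.join('1' for _ in range(16)) + '\n'
)

# Map each transaction hash to its semantic vector.
vectors = {}
for row in csv.reader(sample):
    txhash, counts = row[0], [float(x) for x in row[1:]]
    vectors[txhash] = counts

print(len(vectors['0xbbb']))  # 16-dimensional vector
```

To read a real output file, replace the `StringIO` sample with `open('/path/to/output/file.csv')`.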

