pocketflow/docs/batch.md

84 lines
2.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: default
title: "Batch"
parent: "Core Abstraction"
nav_order: 4
---
# Batch
**Batch** makes it easier to handle large inputs in one Node or **rerun** a Flow multiple times. Useful for:
- **Chunk-based** processing (e.g., large texts in parts).
- **Multi-file** processing.
- **Iterating** over lists of params (e.g., user queries, documents, URLs).
## 1. BatchNode
A **BatchNode** extends `Node` but changes `prep()` and `exec()`:
- **`prep(shared)`**: returns an **iterable** (list, generator, etc.).
- **`exec(shared, item)`**: called **once** per item in that iterable.
- **`post(shared, prep_res, exec_res_list)`**: receives a **list** of all `exec()` results, can combine or store them, and returns an **Action**.
### Example: Summarize a Large File
```python
class MapSummaries(BatchNode):
def prep(self, shared):
# Suppose we have a big file; chunk it
content = shared["data"].get("large_text.txt", "")
chunk_size = 10000
chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
return chunks
def exec(self, shared, chunk):
prompt = f"Summarize this chunk in 10 words: {chunk}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res_list):
combined = "\n".join(exec_res_list)
shared["summary"]["large_text.txt"] = combined
return "default"
map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)
```
---
## 2. BatchFlow
A **BatchFlow** runs a **Flow** multiple times, each with different `params`. Think of it as a loop that replays the Flow for each param set.
### Example: Summarize Many Files
```python
class SummarizeAllFiles(BatchFlow):
def prep(self, shared):
filenames = list(shared["data"].keys()) # e.g., ["file1.txt", "file2.txt", ...]
return [{"filename": fn} for fn in filenames]
# Suppose we have a per-file flow:
# load_file >> summarize >> reduce etc.
summarize_file = SummarizeFile(start=load_file)
summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)
```
**Under the hood**:
1. `prep(shared)` returns a list of param dicts (e.g., `[{filename: "file1.txt"}, {filename: "file2.txt"}, ...]`).
2. The BatchFlow **iterates** over them, sets params on the sub-Flow, and calls `flow.run(shared)` each time.
3. The Flow is run repeatedly, once per item.
### Nested or Multi-level Batches
You can nest a BatchFlow in another BatchFlow. For example:
- Outer batch: iterate over directories.
- Inner batch: summarize each file in a directory.
The **outer** BatchFlows `exec()` can return a list of directories; the **inner** BatchFlow then processes each file in those dirs.