| layout | title | parent | nav_order |
|---|---|---|---|
| default | Batch | Core Abstraction | 4 |
Batch
Batch makes it easier to handle large inputs in one Node or to rerun a Flow multiple times. It is useful for:
- Chunk-based processing (e.g., large texts in parts).
- Multi-file processing.
- Iterating over lists of params (e.g., user queries, documents, URLs).
1. BatchNode
A BatchNode extends Node but changes prep() and exec():
- prep(shared): returns an iterable (list, generator, etc.).
- exec(shared, item): called once per item in that iterable.
- post(shared, prep_res, exec_res_list): receives a list of all exec() results, can combine or store them, and returns an Action.
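The lifecycle above can be sketched without the framework. This is a minimal illustration only: `SketchBatchNode`, its `run()` method, and the `WordLengths` example are hypothetical stand-ins written for this page, not the library's actual implementation.

```python
class SketchBatchNode:
    """Hypothetical stand-in for BatchNode (an assumption, not the real class)."""

    def prep(self, shared):
        return []

    def exec(self, shared, item):
        return None

    def post(self, shared, prep_res, exec_res_list):
        return "default"

    def run(self, shared):
        # Materialize the iterable so post() can still see it after iteration.
        items = list(self.prep(shared))
        # exec() is called once per item; results are collected in order.
        exec_res_list = [self.exec(shared, item) for item in items]
        # post() sees every result at once and returns an Action.
        return self.post(shared, items, exec_res_list)


class WordLengths(SketchBatchNode):
    def prep(self, shared):
        return shared["words"]

    def exec(self, shared, word):
        return len(word)

    def post(self, shared, prep_res, exec_res_list):
        shared["lengths"] = dict(zip(prep_res, exec_res_list))
        return "default"


shared = {"words": ["batch", "node", "flow"]}
action = WordLengths().run(shared)
print(action, shared["lengths"])
```

The point of the sketch is the division of labor: prep() decides what the items are, exec() handles one item in isolation, and post() is the only place where all results are visible together.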
Example: Summarize a Large File
class MapSummaries(BatchNode):
    def prep(self, shared):
        # Suppose we have a big file; chunk it
        content = shared["data"].get("large_text.txt", "")
        chunk_size = 10000
        chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
        return chunks

    def exec(self, shared, chunk):
        # Called once per chunk; call_llm is assumed to be defined elsewhere
        prompt = f"Summarize this chunk in 10 words: {chunk}"
        summary = call_llm(prompt)
        return summary

    def post(self, shared, prep_res, exec_res_list):
        # exec_res_list holds one summary per chunk
        # Assumes shared["summary"] is an existing dict
        combined = "\n".join(exec_res_list)
        shared["summary"]["large_text.txt"] = combined
        return "default"

map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)
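The chunking expression used in prep() can be checked in isolation. The sketch below wraps it in a helper named `chunk_text`, which is a hypothetical name introduced here for illustration, and uses a tiny chunk size so the result is easy to read.

```python
def chunk_text(content, chunk_size):
    """Split content into consecutive slices of at most chunk_size characters."""
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


chunks = chunk_text("abcdefghij", 4)
print(chunks)  # ['abcd', 'efgh', 'ij']

# An empty input yields no chunks, so exec() would never be called
# and joining the (empty) result list gives an empty string.
empty_chunks = chunk_text("", 4)
combined = "\n".join(empty_chunks)
print(repr(combined))  # ''
```

The last chunk may be shorter than chunk_size, and a missing file (the `""` default in prep()) produces an empty summary rather than an error.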
2. BatchFlow
A BatchFlow runs a Flow multiple times, each with different params. Think of it as a loop that replays the Flow for each param set.
Example: Summarize Many Files
class SummarizeAllFiles(BatchFlow):
    def prep(self, shared):
        filenames = list(shared["data"].keys())  # e.g., ["file1.txt", "file2.txt", ...]
        return [{"filename": fn} for fn in filenames]

# Suppose we have a per-file flow:
# load_file >> summarize >> reduce etc.
summarize_file = SummarizeFile(start=load_file)
summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)
Under the hood:
- prep(shared) returns a list of param dicts (e.g., [{"filename": "file1.txt"}, {"filename": "file2.txt"}, ...]).
- The BatchFlow iterates over them, sets params on the sub-Flow, and calls flow.run(shared) each time.
- The Flow is run repeatedly, once per item.
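The loop described above might look roughly like the following. Everything here is an assumption made for illustration: `SketchFlow`, `SketchBatchFlow`, the `set_params()` method, and `fake_summarize` are hypothetical stand-ins, and the real classes may be organized differently.

```python
class SketchFlow:
    """Hypothetical stand-in for a Flow: holds params and runs a single step."""

    def __init__(self, step):
        self.step = step      # callable(shared, params) standing in for the node graph
        self.params = {}

    def set_params(self, params):
        self.params = params

    def run(self, shared):
        self.step(shared, self.params)


class SketchBatchFlow:
    """Hypothetical stand-in for BatchFlow."""

    def __init__(self, start):
        self.start = start

    def prep(self, shared):
        return []

    def run(self, shared):
        for params in self.prep(shared):   # one param dict per run
            self.start.set_params(params)  # set params on the sub-Flow
            self.start.run(shared)         # replay the sub-Flow


class SummarizeAll(SketchBatchFlow):
    def prep(self, shared):
        return [{"filename": fn} for fn in shared["data"]]


def fake_summarize(shared, params):
    # Placeholder "summary": the first 10 characters of the file content.
    fn = params["filename"]
    shared["summary"][fn] = shared["data"][fn][:10]


shared = {"data": {"file1.txt": "alpha beta gamma", "file2.txt": "delta"}, "summary": {}}
SummarizeAll(start=SketchFlow(fake_summarize)).run(shared)
print(shared["summary"])
```

Note that the shared store is the same object on every run; only the params change between iterations.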
Nested or Multi-level Batches
You can nest a BatchFlow in another BatchFlow. For example:
- Outer batch: iterate over directories.
- Inner batch: summarize each file in a directory.
The outer BatchFlow’s prep() can return a list of directories (as param dicts); the inner BatchFlow then processes each file in those directories.
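As a rough sketch of the two levels, the nesting can be pictured as two loops over param dicts. The function names (`outer_prep`, `inner_prep`, `run_nested`) are hypothetical, and merging the outer params into the inner ones is an assumption about how a nested setup would pass the directory down, not documented behavior.

```python
def outer_prep(shared):
    # Outer batch: one param dict per directory.
    return [{"directory": d} for d in shared["dirs"]]


def inner_prep(shared, params):
    # Inner batch: one param dict per file in the current directory.
    return [{"filename": f} for f in shared["dirs"][params["directory"]]]


def run_nested(shared, per_file):
    for dir_params in outer_prep(shared):
        for file_params in inner_prep(shared, dir_params):
            # Assumption: the per-file step sees both levels of params.
            per_file(shared, {**dir_params, **file_params})


def record_visit(shared, params):
    # Stand-in for the per-file flow: just record which file was reached.
    shared["visited"].append(f'{params["directory"]}/{params["filename"]}')


shared = {"dirs": {"docs": ["a.txt", "b.txt"], "notes": ["c.txt"]}, "visited": []}
run_nested(shared, record_visit)
print(shared["visited"])  # ['docs/a.txt', 'docs/b.txt', 'notes/c.txt']
```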