Batch

Batch makes it easier to handle large inputs in one Node or to rerun a Flow multiple times. It is useful for:

  • Chunk-based processing (e.g., large texts in parts).
  • Multi-file processing.
  • Iterating over lists of params (e.g., user queries, documents, URLs).

1. BatchNode

A BatchNode extends Node but changes how prep(), exec(), and post() behave:

  • prep(shared): returns an iterable (list, generator, etc.).
  • exec(shared, item): called once per item in that iterable.
  • post(shared, prep_res, exec_res_list): receives a list of all exec() results, can combine or store them, and returns an Action.
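The per-item contract above can be sketched in a few lines. This is a minimal illustration with stand-in Node and BatchNode classes written for this example, not the library's actual implementation; the _exec hook and the WordCounts node are hypothetical names.

```python
# Stand-in classes for illustration only (NOT the library's real code).
class Node:
    def prep(self, shared):
        return None

    def exec(self, shared, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return "default"

    def _exec(self, shared, prep_res):
        # Hypothetical hook: a plain Node calls exec() once.
        return self.exec(shared, prep_res)

    def run(self, shared):
        prep_res = self.prep(shared)
        exec_res = self._exec(shared, prep_res)
        return self.post(shared, prep_res, exec_res)


class BatchNode(Node):
    def _exec(self, shared, items):
        # Call exec() once per item and collect the results in order.
        return [self.exec(shared, item) for item in items]


class WordCounts(BatchNode):
    """Hypothetical node: counts the words in each line, then sums them."""

    def prep(self, shared):
        return shared["lines"]  # any iterable works here

    def exec(self, shared, line):
        return len(line.split())

    def post(self, shared, prep_res, exec_res_list):
        shared["total_words"] = sum(exec_res_list)
        return "default"


shared = {"lines": ["one two", "three four five"]}
action = WordCounts().run(shared)
print(shared["total_words"], action)
```

The only difference from a plain Node in this sketch is the loop in _exec: prep() and post() still run once, while exec() runs once per item.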

Example: Summarize a Large File

class MapSummaries(BatchNode):
    def prep(self, shared):
        # Suppose we have a big file; chunk it
        content = shared["data"].get("large_text.txt", "")
        chunk_size = 10000
        chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
        return chunks

    def exec(self, shared, chunk):
        prompt = f"Summarize this chunk in 10 words: {chunk}"
        summary = call_llm(prompt)
        return summary

    def post(self, shared, prep_res, exec_res_list):
        combined = "\n".join(exec_res_list)
        # Create the "summary" dict on first use to avoid a KeyError
        shared.setdefault("summary", {})["large_text.txt"] = combined
        return "default"

map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)

2. BatchFlow

A BatchFlow runs a Flow multiple times, each with different params. Think of it as a loop that replays the Flow for each param set.

Example: Summarize Many Files

class SummarizeAllFiles(BatchFlow):
    def prep(self, shared):
        filenames = list(shared["data"].keys())  # e.g., ["file1.txt", "file2.txt", ...]
        return [{"filename": fn} for fn in filenames]

# Suppose we have a per-file flow (SummarizeFile is assumed to be a Flow subclass
# and load_file its first node):
# load_file >> summarize >> reduce etc.
summarize_file = SummarizeFile(start=load_file)

summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)

Under the hood:

  1. prep(shared) returns a list of param dicts (e.g., [{"filename": "file1.txt"}, {"filename": "file2.txt"}, ...]).
  2. The BatchFlow iterates over them, sets params on the sub-Flow, and calls flow.run(shared) each time.
  3. The Flow is run repeatedly, once per item.
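The three steps above can be sketched as a plain loop. This is a simplified illustration with stand-in classes, not the library's actual implementation; set_params, SummarizeStep, and the truncation "summary" are assumptions made for this example.

```python
# Stand-in classes for illustration only (NOT the library's real code).
class Flow:
    def __init__(self, start):
        self.start = start  # here: any object with set_params() and run()
        self.params = {}

    def set_params(self, params):
        self.params = dict(params)

    def run(self, shared):
        # Pass the current params down to the start step, then run it.
        self.start.set_params(self.params)
        return self.start.run(shared)


class BatchFlow(Flow):
    def prep(self, shared):
        return []  # subclasses return a list of param dicts

    def run(self, shared):
        for item_params in self.prep(shared):
            # Set this item's params on the sub-Flow, then replay it.
            self.start.set_params({**self.params, **item_params})
            self.start.run(shared)


class SummarizeStep:
    """Hypothetical leaf step: 'summarizes' by keeping the first 20 chars."""

    def __init__(self):
        self.params = {}

    def set_params(self, params):
        self.params = dict(params)

    def run(self, shared):
        fn = self.params["filename"]
        shared.setdefault("summary", {})[fn] = shared["data"][fn][:20]


class SummarizeAllFiles(BatchFlow):
    def prep(self, shared):
        return [{"filename": fn} for fn in shared["data"]]


shared = {"data": {"a.txt": "alpha " * 10, "b.txt": "beta"}}
summarize_file = Flow(start=SummarizeStep())
SummarizeAllFiles(start=summarize_file).run(shared)
print(shared["summary"])
```

All runs share the same shared store; only the params differ between iterations.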

Nested or Multi-level Batches

You can nest a BatchFlow in another BatchFlow. For example:

  • Outer batch: iterate over directories.
  • Inner batch: summarize each file in a directory.

The outer BatchFlow's prep() can return a list of param dicts, one per directory; the inner BatchFlow then processes each file in those directories.
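A minimal sketch of two nested batch levels follows. It uses stand-in classes rather than the library itself, and it assumes that each level returns its param dicts from prep() (as in the SummarizeAllFiles example) and that outer params are merged into inner ones, so the innermost step sees both "directory" and "filename". The names Step, RecordFile, FileBatch, and DirectoryBatch are hypothetical.

```python
# Stand-in classes for illustration only (NOT the library's real code).
class Step:
    def __init__(self):
        self.params = {}

    def set_params(self, params):
        self.params = dict(params)

    def run(self, shared):
        raise NotImplementedError


class BatchFlow(Step):
    def __init__(self, start):
        super().__init__()
        self.start = start

    def prep(self, shared):
        return []

    def run(self, shared):
        for item_params in self.prep(shared):
            # Assumption: parent params are merged with the per-item params.
            self.start.set_params({**self.params, **item_params})
            self.start.run(shared)


class RecordFile(Step):
    """Hypothetical leaf step: records which (directory, file) it was given."""

    def run(self, shared):
        shared["seen"].append(
            (self.params["directory"], self.params["filename"])
        )


class FileBatch(BatchFlow):
    # Inner batch: one run per file in the current directory.
    def prep(self, shared):
        directory = self.params["directory"]
        return [{"filename": fn} for fn in shared["tree"][directory]]


class DirectoryBatch(BatchFlow):
    # Outer batch: one run per directory.
    def prep(self, shared):
        return [{"directory": d} for d in shared["tree"]]


shared = {"tree": {"docs": ["a.md", "b.md"], "src": ["main.py"]}, "seen": []}
DirectoryBatch(start=FileBatch(start=RecordFile())).run(shared)
print(shared["seen"])
```

The inner prep() reads the directory from its own params, which is why the outer level must set params before the inner level runs.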