From 86297ea2a8d7632b585538579ccd370209d9169a Mon Sep 17 00:00:00 2001
From: zachary62
Date: Tue, 31 Dec 2024 05:57:56 +0000
Subject: [PATCH] update doc

---
 docs/batch.md         | 20 +++++++++++++++++++-
 docs/communication.md |  2 +-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/batch.md b/docs/batch.md
index 24cafb0..b263845 100644
--- a/docs/batch.md
+++ b/docs/batch.md
@@ -86,4 +86,22 @@ You can nest a **BatchFlow** in another **BatchFlow**. For instance:
 - **Outer** batch: returns a list of diretory param dicts (e.g., `{"directory": "/pathA"}`, `{"directory": "/pathB"}`, ...).
 - **Inner** batch: returning a list of per-file param dicts.
 
-At each level, **BatchFlow** merges its own param dict with the parent’s. By the time you reach the **innermost** node, the final `params` is the merged result of **all** parents in the chain. This way, a nested structure can keep track of the entire context (e.g., directory + file name) at once.
\ No newline at end of file
+At each level, **BatchFlow** merges its own param dict with the parent’s. By the time you reach the **innermost** node, the final `params` is the merged result of **all** parents in the chain. This way, a nested structure can keep track of the entire context (e.g., directory + file name) at once.
+
+```python
+
+class FileBatchFlow(BatchFlow):
+    def prep(self, shared):
+        directory = self.params["directory"]
+        files = [f for f in os.listdir(directory) if f.endswith(".txt")]
+        return [{"filename": f} for f in files]
+
+class DirectoryBatchFlow(BatchFlow):
+    def prep(self, shared):
+        directories = [ "/path/to/dirA", "/path/to/dirB"]
+        return [{"directory": d} for d in directories]
+
+
+inner_flow = FileBatchFlow(start=MapSummaries())
+outer_flow = DirectoryBatchFlow(start=inner_flow)
+```
\ No newline at end of file
diff --git a/docs/communication.md b/docs/communication.md
index e3d1b1e..bd74345 100644
--- a/docs/communication.md
+++ b/docs/communication.md
@@ -17,7 +17,7 @@ If you know memory management, **Shared Store** is like a **heap** shared across
 
 ### Why Not Use Other Communication Models like Message Passing?
 
-**Message passing** works well for simple DAGs (e.g., for data pipelines), but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages becomes hard to maintain. A shared store keeps the design simple and easy.
+**Message passing** works well for simple DAGs, but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages becomes hard to maintain. A shared store keeps the design simple and easy.
 
 ---
 