From c3e8c7596838bb7797365f747793120dbfdd16f5 Mon Sep 17 00:00:00 2001
From: zachary62
Date: Sat, 28 Dec 2024 20:50:24 +0000
Subject: [PATCH] communication

---
 docs/communication.md | 140 +++++++++++++++---------------------
 docs/index.md | 4 +-
 docs/llm.md | 1 -
 docs/node.md | 2 +-
 4 files changed, 52 insertions(+), 95 deletions(-)

diff --git a/docs/communication.md b/docs/communication.md
index 6efe873..3462ed6 100644
--- a/docs/communication.md
+++ b/docs/communication.md
@@ -7,12 +7,17 @@ nav_order: 3
 
 # Communication
 
-In **Mini LLM Flow**, Nodes and Flows **communicate** with each other in two ways:
+Nodes and Flows **communicate** in two ways:
 
-1. **Shared Store** – A global data structure (often a Python dict) that every Node can read from and write to.
-2. **Params** – Small pieces of metadata or configuration, set on each Node or Flow, typically used to identify items or tweak behavior.
+1. **Shared Store** – A global data structure (often an in-memory dict) that all nodes can read from and write to. Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` store.
+2. **Params** – Each node and Flow has a `params` dict assigned by the **parent Flow**. Params mostly serve as identifiers, letting each node/flow know which task it’s assigned.
 
-This design avoids complex message-passing or data routing. It also lets you **nest** Flows easily without having to manage multiple channels.
+If you know memory management, the **Shared Store** is like a **heap** shared across function calls, while **Params** are like a **stack** assigned by parent function calls.
+
+### Why Not Message Passing?
+
+**Message passing** can work for simple DAGs (e.g., linear data pipelines), but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages becomes hard to maintain. A shared store keeps the design simpler and easier to debug.
 
 ---
 
@@ -20,15 +25,13 @@ This design avoids complex message-passing or data routing. It also lets you **n
 ### Overview
 
-A shared store is typically a Python dictionary, like:
+A shared store is typically an in-memory dictionary, like:
 
 ```python
-shared = {"data": {}, "summary": {}, "config": { ... }, ...}
+shared = {"data": {}, "summary": {}, "config": {...}, ...}
 ```
-Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` object. This makes it easy to:
-- Read data that another Node loaded, such as a text file or database record.
-- Write results for later Nodes to consume.
-- Maintain consistent state across the entire Flow.
+
+It can also contain local file handlers, DB connections, or a combination of these for persistence.
+We recommend deciding the data structure or DB schema in advance, based on your app's requirements.
 
 ### Example
 
@@ -39,13 +42,6 @@ class LoadData(Node):
         shared["data"]["my_file.txt"] = "Some text content"
         return None
 
-    def exec(self, shared, prep_res):
-        # Not doing anything special here
-        return None
-
-    def post(self, shared, prep_res, exec_res):
-        return "default"
-
 class Summarize(Node):
     def prep(self, shared):
         # We can read what LoadData wrote
@@ -60,30 +56,32 @@ class Summarize(Node):
     def post(self, shared, prep_res, exec_res):
         shared["summary"]["my_file.txt"] = exec_res
         return "default"
+
+load_data = LoadData()
+summarize = Summarize()
+load_data >> summarize
+flow = Flow(start=load_data)
+
+shared = {}
+flow.run(shared)
 ```
 
-Here,
+Here:
 - `LoadData` writes to `shared["data"]`.
 - `Summarize` reads from the same location.
 
-No special data-passing code—just the same `shared` object.
-
-### Why Not Message Passing?
-
-**Message-passing** can be great for simple DAGs, but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages can become complicated. A shared store keeps the design simpler and easier to debug.
+No special data-passing—just the same `shared` object.
 
 ---
 
 ## 2. Params
 
-**Params** let you store **per-Node** or **per-Flow** configuration that does **not** need to be in the global store. They are:
-- **Immutable** during a Node’s run cycle (i.e., don’t change mid-run).
-- **Set** via `set_params()`.
-- **Cleared** or updated each time you call the Flow or Node again.
+**Params** let you store **per-Node** or **per-Flow** config that doesn’t need to live in the global store. They are:
+- **Immutable** during a Node’s run cycle (i.e., they don’t change mid-`prep`, `exec`, or `post`).
+- **Set** via `set_params()`.
+  ⚠️ Set params only on the uppermost Flow; params set on child nodes are overwritten by the parent Flow. If you need to set child-node params, see [Batch](./batch.md).
+- **Cleared** and updated each time the parent Flow calls the node.
 
-Common examples:
-- **File names** to process.
-- **Model hyperparameters** for an LLM call.
-- **API credentials** or specialized flags.
+Typically, **Params** are identifiers (e.g., a file name or page number). Use them to fetch the task assigned to the node, or to write to a specific part of the shared store.
 
 ### Example
 
@@ -106,73 +104,33 @@ class SummarizeFile(Node):
 # 2) Set params
 node = SummarizeFile()
+
+# 3) Set Node params directly (for testing)
 node.set_params({"filename": "doc1.txt"})
-
-# 3) Run
 node.run(shared)
-```
 
-Because **params** are only for that Node, you don’t pollute the global `shared` with fields that might only matter to one operation.
+# 4) Create Flow
+flow = Flow(start=node)
+
+# 5) Set Flow params (overwrites node params)
+flow.set_params({"filename": "doc2.txt"})
+flow.run(shared)  # The node summarizes doc2, not doc1
+```
 
 ---
 
 ## 3. Shared Store vs. Params
 
-- **Shared Store**:
-  - Public, global.
+Think of the **Shared Store** like a heap and **Params** like a stack.
+
+- **Shared Store**:
+  - Public, global.
+  - You can design and populate it in advance, e.g., with the input to process.
   - Great for data results, large content, or anything multiple nodes need.
-  - Must be carefully structured (like designing a mini schema).
+  - Keep it tidy—structure it carefully (like a mini schema).
 
 - **Params**:
-  - Local, ephemeral config for a single node or flow execution.
-  - Perfect for small values such as filenames or numeric IDs.
-  - Does **not** persist across different nodes unless specifically copied into `shared`.
-
----
-
-## 4. Best Practices
-
-1. **Design a Clear `shared` Schema**
-   - Decide on keys upfront. Example: `shared["data"]` for raw data, `shared["summary"]` for results, etc.
-
-2. **Use Params for Identifiers / Config**
-   - If you need to pass a single ID or filename to a Node, **params** are usually best.
-
-3. **Don’t Overuse the Shared Store**
-   - Keep it tidy. If a piece of data only matters to one Node, consider using `params` or discarding it after usage.
-
-4. **Ensure `shared` Is Accessible**
-   - If you switch from an in-memory dict to a database or file-based approach, the Node code can remain the same as long as your `shared` interface is consistent.
-
----
-
-## Putting It All Together
-
-```python
-# Suppose you have a flow:
-load_data >> summarize_file
-my_flow = Flow(start=load_data)
-
-# Example usage:
-load_data.set_params({"path": "path/to/data/folder"})  # local param for load_data
-summarize_file.set_params({"filename": "my_text.txt"})  # local param for summarize_file
-
-# shared store
-shared = {
-    "data": {},
-    "summary": {}
-}
-
-my_flow.run(shared)
-# After run, shared["summary"]["my_text.txt"] might have the LLM summary
-```
-
-- `load_data` uses its param (`"path"`) to load some data into `shared["data"]`.
-- `summarize_file` uses its param (`"filename"`) to pick which file from `shared["data"]` to summarize.
-- They share results via `shared["summary"]`.
-
-That’s the **Mini LLM Flow** approach to communication:
-- **A single shared store** to handle large data or results for multiple Nodes.
-- **Per-node params** for minimal configuration and identification.
-
-Use these patterns to build powerful, modular LLM pipelines with minimal overhead.
+  - Local, ephemeral.
+  - Passed in by the parent Flow. Set them only on the uppermost Flow.
+  - Perfect for small values like filenames or numeric IDs.
+  - Do **not** persist across different nodes and are reset on each run.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index d0fdd95..7d2bcc4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -42,6 +42,6 @@ We model the LLM workflow as a **Nested Flow**:
   - Structured Output
   - Evaluation
 
-## Example Use Cases
+## Example Projects
 
-TODO
+- TODO
diff --git a/docs/llm.md b/docs/llm.md
index fa5fa98..e1612a9 100644
--- a/docs/llm.md
+++ b/docs/llm.md
@@ -62,7 +62,6 @@ def call_llm(prompt):
     return response
 ```
 
-
 ## Why Not Provide a Built-in LLM Wrapper?
 
 I believe it is a **bad practice** to provide LLM-specific implementations in a general framework:
 - **LLM APIs change frequently**. Hardcoding them makes maintenance a nightmare.
diff --git a/docs/node.md b/docs/node.md
index 27b235f..6784ac1 100644
--- a/docs/node.md
+++ b/docs/node.md
@@ -46,7 +46,7 @@ def process_after_fail(self, shared, prep_res, exc):
 
 By **default**, it just re-raises `exc`. But you can return a fallback result instead. That fallback result becomes the `exec_res` passed to `post()`.
 
-## Minimal Example
+## Example
 
 ```python
 class SummarizeFile(Node):
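The patched docs describe a shared store ("heap") that every node reads and writes, plus params ("stack") pushed down by the parent Flow. The sketch below exercises that pattern end to end. The names `Node`, `Flow`, `set_params`, and `>>` chaining come from the docs themselves, but this tiny `Node`/`Flow` implementation is an assumption written for illustration (the real library is richer — retries, action-based transitions, nesting), and the string-truncating `exec` is a stand-in for an LLM call:

```python
# Minimal stand-in Node/Flow classes (assumed, not the library's real code).
class Node:
    def __init__(self):
        self.params = {}
        self.successor = None

    def set_params(self, params):
        self.params = params

    def __rshift__(self, other):
        # `a >> b` wires b as a's successor, as in the docs' examples.
        self.successor = other
        return other

    def prep(self, shared):
        return None

    def exec(self, shared, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return "default"

    def run(self, shared):
        # All three steps receive the same `shared` store.
        prep_res = self.prep(shared)
        exec_res = self.exec(shared, prep_res)
        return self.post(shared, prep_res, exec_res)


class Flow:
    def __init__(self, start):
        self.start = start
        self.params = {}

    def set_params(self, params):
        self.params = params

    def run(self, shared):
        node = self.start
        while node is not None:
            # The parent Flow overwrites each child's params ("stack").
            node.set_params(dict(self.params))
            node.run(shared)
            node = node.successor


class LoadData(Node):
    def prep(self, shared):
        # Write into the shared store ("heap").
        shared.setdefault("data", {})["doc2.txt"] = "Some text content"


class SummarizeFile(Node):
    def prep(self, shared):
        # Params identify which item this node works on.
        return shared["data"].get(self.params["filename"], "")

    def exec(self, shared, prep_res):
        # Stand-in for an LLM call: truncate to 10 characters.
        return prep_res[:10]

    def post(self, shared, prep_res, exec_res):
        shared.setdefault("summary", {})[self.params["filename"]] = exec_res


load = LoadData()
summarize = SummarizeFile()
load >> summarize
flow = Flow(start=load)

# Params are set only on the uppermost Flow; it pushes them to children.
flow.set_params({"filename": "doc2.txt"})

shared = {}
flow.run(shared)
print(shared["summary"])  # {'doc2.txt': 'Some text '}
```

Note how `SummarizeFile` never receives data directly from `LoadData`: the two communicate only through `shared`, while `params` merely tells the summarizer which key to look at.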