communication
parent d2017a05e2
commit c3e8c75968
|
|
@ -7,12 +7,17 @@ nav_order: 3
|
|||
|
||||
# Communication
|
||||
|
||||
In **Mini LLM Flow**, Nodes and Flows **communicate** with each other in two ways:
|
||||
Nodes and Flows **communicate** in two ways:
|
||||
|
||||
1. **Shared Store** – A global data structure (often a Python dict) that every Node can read from and write to.
|
||||
2. **Params** – Small pieces of metadata or configuration, set on each Node or Flow, typically used to identify items or tweak behavior.
|
||||
1. **Shared Store** – A global data structure (often an in-memory dict) that all nodes can read from and write to. Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` store.
|
||||
2. **Params** – Each node and Flow has a `params` dict assigned by the **parent Flow**. Params mostly serve as identifiers, letting each node/flow know what task it’s assigned.
|
||||
|
||||
This design avoids complex message-passing or data routing. It also lets you **nest** Flows easily without having to manage multiple channels.
|
||||
If you know memory management, **Shared Store** is like a **heap** shared across function calls, while **Params** is like a **stack** assigned by parent function calls.
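
The heap/stack analogy can be sketched in plain Python. The `Node` class below is a hypothetical stand-in written only for this illustration, not the library's implementation; it mirrors just the `prep`/`exec`/`post` signatures, `set_params()`, and `run(shared)` described on this page. The truncation in `exec` is a placeholder for an LLM call.

```python
# Hypothetical stand-in for illustration only; not the library's real Node class.
class Node:
    def __init__(self):
        self.params = {}

    def set_params(self, params):
        self.params = params

    def prep(self, shared):
        return None

    def exec(self, shared, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return "default"

    def run(self, shared):
        prep_res = self.prep(shared)
        exec_res = self.exec(shared, prep_res)
        return self.post(shared, prep_res, exec_res)


class LoadData(Node):
    def prep(self, shared):
        # Write into the shared store ("heap"): visible to every later node.
        shared["data"][self.params["filename"]] = "Some text content"


class Summarize(Node):
    def prep(self, shared):
        # Params ("stack") say which item to work on; the data lives in shared.
        return shared["data"][self.params["filename"]]

    def exec(self, shared, prep_res):
        # Placeholder for an LLM call: truncate instead of summarizing.
        return prep_res[:9]

    def post(self, shared, prep_res, exec_res):
        shared["summary"][self.params["filename"]] = exec_res
        return "default"


shared = {"data": {}, "summary": {}}
for node in (LoadData(), Summarize()):
    node.set_params({"filename": "my_file.txt"})
    node.run(shared)
```

Both nodes receive the same `shared` dict, so `Summarize` finds what `LoadData` wrote without any routing code; the params only tell each node which key to use.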
|
||||
|
||||
|
||||
### Why Not Message Passing?
|
||||
|
||||
**Message passing** can work for simple DAGs (e.g., data pipelines), but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages becomes hard to maintain. A shared store keeps the design simpler and easier to debug.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -20,15 +25,13 @@ This design avoids complex message-passing or data routing. It also lets you **n
|
|||
|
||||
### Overview
|
||||
|
||||
A shared store is typically a Python dictionary, like:
|
||||
A shared store is typically an in-memory dictionary, like:
|
||||
```python
|
||||
shared = {"data": {}, "summary": {}, "config": {...}, ...}
|
||||
```python
|
||||
shared = {"data": {}, "summary": {}, "config": { ... }, ...}
|
||||
```
|
||||
|
||||
Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` object. This makes it easy to:
|
||||
- Read data that another Node loaded, such as a text file or database record.
|
||||
- Write results for later Nodes to consume.
|
||||
- Maintain consistent state across the entire Flow.
|
||||
It can also contain local file handlers, DB connections, or a combination for persistence.
|
||||
We recommend deciding the data structure or DB schema in advance based on your app requirements.
|
||||
|
||||
### Example
|
||||
|
||||
|
|
@ -39,13 +42,6 @@ class LoadData(Node):
|
|||
shared["data"]["my_file.txt"] = "Some text content"
|
||||
return None
|
||||
|
||||
def exec(self, shared, prep_res):
|
||||
# Not doing anything special here
|
||||
return None
|
||||
|
||||
def post(self, shared, prep_res, exec_res):
|
||||
return "default"
|
||||
|
||||
class Summarize(Node):
|
||||
def prep(self, shared):
|
||||
# We can read what LoadData wrote
|
||||
|
|
@ -60,30 +56,32 @@ class Summarize(Node):
|
|||
def post(self, shared, prep_res, exec_res):
|
||||
shared["summary"]["my_file.txt"] = exec_res
|
||||
return "default"
|
||||
|
||||
load_data = LoadData()
|
||||
summarize = Summarize()
|
||||
load_data >> summarize
|
||||
flow = Flow(start=load_data)
|
||||
|
||||
shared = {}
|
||||
flow.run(shared)
|
||||
```
|
||||
|
||||
Here,
|
||||
Here:
|
||||
- `LoadData` writes to `shared["data"]`.
|
||||
- `Summarize` reads from the same location.
|
||||
No special data-passing code—just the same `shared` object.
|
||||
|
||||
### Why Not Message Passing?
|
||||
|
||||
**Message-passing** can be great for simple DAGs, but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages can become complicated. A shared store keeps the design simpler and easier to debug.
|
||||
No special data-passing—just the same `shared` object.
|
||||
|
||||
---
|
||||
|
||||
## 2. Params
|
||||
|
||||
**Params** let you store **per-Node** or **per-Flow** configuration that does **not** need to be in the global store. They are:
|
||||
- **Immutable** during a Node’s run cycle (i.e., don’t change mid-run).
|
||||
- **Set** via `set_params()`.
|
||||
- **Cleared** or updated each time you call the Flow or Node again.
|
||||
**Params** let you store **per-Node** or **per-Flow** config that doesn’t need to live in the global store. They are:
|
||||
- **Immutable** during a Node’s run cycle (i.e., they don’t change mid-`prep`, `exec`, `post`).
|
||||
- **Set** via `set_params()`.
|
||||
⚠️ Only set the uppermost Flow params because others will be overwritten by the parent Flow. If you need to set child node params, see [Batch](./batch.md).
|
||||
- **Cleared** and updated each time a parent Flow calls it.
|
||||
|
||||
Common examples:
|
||||
- **File names** to process.
|
||||
- **Model hyperparameters** for an LLM call.
|
||||
- **API credentials** or specialized flags.
|
||||
Typically, **Params** are identifiers (e.g., file name, page number). Use them to fetch the task you assigned or write to a specific part of the shared store.
|
||||
|
||||
### Example
|
||||
|
||||
|
|
@ -106,73 +104,33 @@ class SummarizeFile(Node):
|
|||
|
||||
# 2) Set params
|
||||
node = SummarizeFile()
|
||||
|
||||
# 3) Set Node params directly (for testing)
|
||||
node.set_params({"filename": "doc1.txt"})
|
||||
|
||||
# 3) Run
|
||||
node.run(shared)
|
||||
```
|
||||
|
||||
Because **params** are only for that Node, you don’t pollute the global `shared` with fields that might only matter to one operation.
|
||||
# 4) Create Flow
|
||||
flow = Flow(start=node)
|
||||
|
||||
# 5) Set Flow params (overwrites node params)
|
||||
flow.set_params({"filename": "doc2.txt"})
|
||||
flow.run(shared) # The node summarizes doc2, not doc1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Shared Store vs. Params
|
||||
|
||||
- **Shared Store**:
|
||||
- Public, global.
|
||||
Think of the **Shared Store** like a heap and **Params** like a stack.
|
||||
|
||||
- **Shared Store**:
|
||||
- Public, global.
|
||||
- You can design and populate ahead, e.g., for the input to process.
|
||||
- Great for data results, large content, or anything multiple nodes need.
|
||||
- Must be carefully structured (like designing a mini schema).
|
||||
- Keep it tidy—structure it carefully (like a mini schema).
|
||||
|
||||
- **Params**:
|
||||
- Local, ephemeral config for a single node or flow execution.
|
||||
- Perfect for small values such as filenames or numeric IDs.
|
||||
- Does **not** persist across different nodes unless specifically copied into `shared`.
|
||||
|
||||
---
|
||||
|
||||
## 4. Best Practices
|
||||
|
||||
1. **Design a Clear `shared` Schema**
|
||||
- Decide on keys upfront. Example: `shared["data"]` for raw data, `shared["summary"]` for results, etc.
|
||||
|
||||
2. **Use Params for Identifiers / Config**
|
||||
- If you need to pass a single ID or filename to a Node, **params** are usually best.
|
||||
|
||||
3. **Don’t Overuse the Shared Store**
|
||||
- Keep it tidy. If a piece of data only matters to one Node, consider using `params` or discarding it after usage.
|
||||
|
||||
4. **Ensure `shared` Is Accessible**
|
||||
- If you switch from an in-memory dict to a database or file-based approach, the Node code can remain the same as long as your `shared` interface is consistent.
|
||||
|
||||
---
|
||||
|
||||
## Putting It All Together
|
||||
|
||||
```python
|
||||
# Suppose you have a flow:
|
||||
load_data >> summarize_file
|
||||
my_flow = Flow(start=load_data)
|
||||
|
||||
# Example usage:
|
||||
load_data.set_params({"path": "path/to/data/folder"}) # local param for load_data
|
||||
summarize_file.set_params({"filename": "my_text.txt"}) # local param for summarize_file
|
||||
|
||||
# shared store
|
||||
shared = {
|
||||
"data": {},
|
||||
"summary": {}
|
||||
}
|
||||
|
||||
my_flow.run(shared)
|
||||
# After run, shared["summary"]["my_text.txt"] might have the LLM summary
|
||||
```
|
||||
|
||||
- `load_data` uses its param (`"path"`) to load some data into `shared["data"]`.
|
||||
- `summarize_file` uses its param (`"filename"`) to pick which file from `shared["data"]` to summarize.
|
||||
- They share results via `shared["summary"]`.
|
||||
|
||||
That’s the **Mini LLM Flow** approach to communication:
|
||||
- **A single shared store** to handle large data or results for multiple Nodes.
|
||||
- **Per-node params** for minimal configuration and identification.
|
||||
|
||||
Use these patterns to build powerful, modular LLM pipelines with minimal overhead.
|
||||
- Local, ephemeral.
|
||||
- Passed in by parent Flows. You should only set it for the uppermost flow.
|
||||
- Perfect for small values like filenames or numeric IDs.
|
||||
- Do **not** persist across different nodes and are reset.
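
Why only the uppermost Flow's params should be set can be seen in a small sketch. The `Node` and `Flow` classes here are hypothetical stand-ins, and the way the parent copies its params onto each child is an assumption made for illustration; the real classes may differ in detail.

```python
# Hypothetical stand-ins for illustration only; the real classes may differ.
class Node:
    def __init__(self):
        self.params = {}
        self.successor = None

    def set_params(self, params):
        self.params = params

    def __rshift__(self, other):
        # Mirrors the `a >> b` chaining shown on this page.
        self.successor = other
        return other

    def run(self, shared):
        # Record which file this node was asked to handle.
        shared.setdefault("seen", []).append(self.params.get("filename"))


class Flow(Node):
    def __init__(self, start):
        super().__init__()
        self.start = start

    def run(self, shared):
        node = self.start
        while node is not None:
            # Assumed behavior: the parent hands its own params to each child,
            # replacing whatever was set on the child directly.
            node.set_params(dict(self.params))
            node.run(shared)
            node = node.successor


node = Node()
node.set_params({"filename": "doc1.txt"})  # set on the child: will be overwritten

flow = Flow(start=node)
flow.set_params({"filename": "doc2.txt"})  # set on the uppermost Flow: this wins

shared = {}
flow.run(shared)
```

After the run, the child node holds the Flow's params, so it processes `doc2.txt` rather than `doc1.txt`.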
|
||||
|
|
@ -42,6 +42,6 @@ We model the LLM workflow as a **Nested Flow**:
|
|||
- Structured Output
|
||||
- Evaluation
|
||||
|
||||
## Example Use Cases
|
||||
## Example Projects
|
||||
|
||||
TODO
|
||||
- TODO
|
||||
|
|
|
|||
|
|
@ -62,7 +62,6 @@ def call_llm(prompt):
|
|||
return response
|
||||
```
|
||||
|
||||
|
||||
## Why Not Provide a Built-in LLM Wrapper?
|
||||
I believe it is a **bad practice** to provide LLM-specific implementations in a general framework:
|
||||
- **LLM APIs change frequently**. Hardcoding them makes maintenance a nightmare.
|
||||
|
|
|
|||
|
|
@ -46,7 +46,7 @@ def process_after_fail(self, shared, prep_res, exc):
|
|||
|
||||
By **default**, it just re-raises `exc`. But you can return a fallback result instead. That fallback result becomes the `exec_res` passed to `post()`.
|
||||
|
||||
## Minimal Example
|
||||
## Example
|
||||
|
||||
```python
|
||||
class SummarizeFile(Node):
|
||||
|
|
|
|||