From c3e8c7596838bb7797365f747793120dbfdd16f5 Mon Sep 17 00:00:00 2001
From: zachary62
Date: Sat, 28 Dec 2024 20:50:24 +0000
Subject: [PATCH] communication

---
 docs/communication.md | 140 +++++++++++++++---------------------
 docs/index.md | 4 +-
 docs/llm.md | 1 -
 docs/node.md | 2 +-
 4 files changed, 52 insertions(+), 95 deletions(-)

diff --git a/docs/communication.md b/docs/communication.md
index 6efe873..3462ed6 100644
--- a/docs/communication.md
+++ b/docs/communication.md
@@ -7,12 +7,17 @@ nav_order: 3
 
 # Communication
 
-In **Mini LLM Flow**, Nodes and Flows **communicate** with each other in two ways:
+Nodes and Flows **communicate** in two ways:
 
-1. **Shared Store** – A global data structure (often a Python dict) that every Node can read from and write to.
-2. **Params** – Small pieces of metadata or configuration, set on each Node or Flow, typically used to identify items or tweak behavior.
+1. **Shared Store** – A global data structure (often an in-memory dict) that all nodes can read from and write to. Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` store.
+2. **Params** – Each node and Flow has a `params` dict assigned by the **parent Flow**. Params mostly serve as identifiers, letting each node/flow know which task it’s assigned.
 
-This design avoids complex message-passing or data routing. It also lets you **nest** Flows easily without having to manage multiple channels.
+If you know memory management, the **Shared Store** is like a **heap** shared across function calls, while **Params** are like a **stack** assigned by parent function calls.
+
+### Why Not Message Passing?
+
+**Message passing** can work for simple DAGs (e.g., linear data pipelines), but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages becomes hard to maintain. A shared store keeps the design simpler and easier to debug.
 
 ---
 
@@ -20,15 +25,13 @@ This design avoids complex message-passing or data routing. It also lets you **n
 ### Overview
 
-A shared store is typically a Python dictionary, like:
+A shared store is typically an in-memory dictionary, like:
 
 ```python
-shared = {"data": {}, "summary": {}, "config": { ... }, ...}
+shared = {"data": {}, "summary": {}, "config": {...}, ...}
 ```
-Every Node’s `prep()`, `exec()`, and `post()` methods receive the **same** `shared` object. This makes it easy to:
-- Read data that another Node loaded, such as a text file or database record.
-- Write results for later Nodes to consume.
-- Maintain consistent state across the entire Flow.
+
+It can also contain local file handlers, DB connections, or a combination of these for persistence.
+We recommend deciding the data structure or DB schema in advance, based on your app's requirements.
 
 ### Example
 
@@ -39,13 +42,6 @@ class LoadData(Node):
         shared["data"]["my_file.txt"] = "Some text content"
         return None
 
-    def exec(self, shared, prep_res):
-        # Not doing anything special here
-        return None
-
-    def post(self, shared, prep_res, exec_res):
-        return "default"
-
 class Summarize(Node):
     def prep(self, shared):
         # We can read what LoadData wrote
@@ -60,30 +56,32 @@ class Summarize(Node):
     def post(self, shared, prep_res, exec_res):
         shared["summary"]["my_file.txt"] = exec_res
         return "default"
+
+load_data = LoadData()
+summarize = Summarize()
+load_data >> summarize
+flow = Flow(start=load_data)
+
+shared = {}
+flow.run(shared)
 ```
 
-Here,
+Here:
 - `LoadData` writes to `shared["data"]`.
 - `Summarize` reads from the same location.
 
-No special data-passing code—just the same `shared` object.
-
-### Why Not Message Passing?
-
-**Message-passing** can be great for simple DAGs, but with **nested graphs** (Flows containing Flows, repeated or cyclic calls), routing messages can become complicated. A shared store keeps the design simpler and easier to debug.
+No special data-passing—just the same `shared` object.
 
 ---
 
 ## 2. Params
 
-**Params** let you store **per-Node** or **per-Flow** configuration that does **not** need to be in the global store. They are:
-- **Immutable** during a Node’s run cycle (i.e., don’t change mid-run).
-- **Set** via `set_params()`.
-- **Cleared** or updated each time you call the Flow or Node again.
+**Params** let you store **per-Node** or **per-Flow** config that doesn’t need to live in the global store. They are:
+- **Immutable** during a Node’s run cycle (i.e., they don’t change mid-`prep`, `exec`, or `post`).
+- **Set** via `set_params()`.
+  ⚠️ Set params only on the uppermost Flow; params set on child nodes are overwritten by the parent Flow. If you need to set child-node params, see [Batch](./batch.md).
+- **Cleared** and updated each time the parent Flow calls the node.
 
-Common examples:
-- **File names** to process.
-- **Model hyperparameters** for an LLM call.
-- **API credentials** or specialized flags.
+Typically, **Params** are identifiers (e.g., a file name or page number). Use them to fetch the task assigned to the node, or to write to a specific part of the shared store.
 
 ### Example
 
@@ -106,73 +104,33 @@ class SummarizeFile(Node):
 # 2) Set params
 node = SummarizeFile()
+
+# 3) Set Node params directly (for testing)
 node.set_params({"filename": "doc1.txt"})
-
-# 3) Run
 node.run(shared)
-```
 
-Because **params** are only for that Node, you don’t pollute the global `shared` with fields that might only matter to one operation.
+# 4) Create Flow
+flow = Flow(start=node)
+
+# 5) Set Flow params (overwrites node params)
+flow.set_params({"filename": "doc2.txt"})
+flow.run(shared)  # The node summarizes doc2, not doc1
+```
 
 ---
 
 ## 3. Shared Store vs. Params
 
-- **Shared Store**:
-  - Public, global.
+Think of the **Shared Store** like a heap and **Params** like a stack.
+
+- **Shared Store**:
+  - Public, global.
+  - You can design and populate it in advance, e.g., with the input to process.
   - Great for data results, large content, or anything multiple nodes need.
-  - Must be carefully structured (like designing a mini schema).
+  - Keep it tidy—structure it carefully (like a mini schema).
 
 - **Params**:
-  - Local, ephemeral config for a single node or flow execution.
-  - Perfect for small values such as filenames or numeric IDs.
-  - Does **not** persist across different nodes unless specifically copied into `shared`.
-
----
-
-## 4. Best Practices
-
-1. **Design a Clear `shared` Schema**
-   - Decide on keys upfront. Example: `shared["data"]` for raw data, `shared["summary"]` for results, etc.
-
-2. **Use Params for Identifiers / Config**
-   - If you need to pass a single ID or filename to a Node, **params** are usually best.
-
-3. **Don’t Overuse the Shared Store**
-   - Keep it tidy. If a piece of data only matters to one Node, consider using `params` or discarding it after usage.
-
-4. **Ensure `shared` Is Accessible**
-   - If you switch from an in-memory dict to a database or file-based approach, the Node code can remain the same as long as your `shared` interface is consistent.
-
----
-
-## Putting It All Together
-
-```python
-# Suppose you have a flow:
-load_data >> summarize_file
-my_flow = Flow(start=load_data)
-
-# Example usage:
-load_data.set_params({"path": "path/to/data/folder"})  # local param for load_data
-summarize_file.set_params({"filename": "my_text.txt"})  # local param for summarize_file
-
-# shared store
-shared = {
-    "data": {},
-    "summary": {}
-}
-
-my_flow.run(shared)
-# After run, shared["summary"]["my_text.txt"] might have the LLM summary
-```
-
-- `load_data` uses its param (`"path"`) to load some data into `shared["data"]`.
-- `summarize_file` uses its param (`"filename"`) to pick which file from `shared["data"]` to summarize.
-- They share results via `shared["summary"]`.
-
-That’s the **Mini LLM Flow** approach to communication:
-- **A single shared store** to handle large data or results for multiple Nodes.
-- **Per-node params** for minimal configuration and identification.
-
-Use these patterns to build powerful, modular LLM pipelines with minimal overhead.
+  - Local, ephemeral.
+  - Passed in by the parent Flow. Set them only on the uppermost Flow.
+  - Perfect for small values like filenames or numeric IDs.
+  - Do **not** persist across different nodes and are reset on each run.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index d0fdd95..7d2bcc4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -42,6 +42,6 @@ We model the LLM workflow as a **Nested Flow**:
   - Structured Output
   - Evaluation
 
-## Example Use Cases
+## Example Projects
 
-TODO
+- TODO
diff --git a/docs/llm.md b/docs/llm.md
index fa5fa98..e1612a9 100644
--- a/docs/llm.md
+++ b/docs/llm.md
@@ -62,7 +62,6 @@ def call_llm(prompt):
     return response
 ```
 
-
 ## Why Not Provide a Built-in LLM Wrapper?
 
 I believe it is a **bad practice** to provide LLM-specific implementations in a general framework:
 - **LLM APIs change frequently**. Hardcoding them makes maintenance a nightmare.
diff --git a/docs/node.md b/docs/node.md
index 27b235f..6784ac1 100644
--- a/docs/node.md
+++ b/docs/node.md
@@ -46,7 +46,7 @@ def process_after_fail(self, shared, prep_res, exc):
 
 By **default**, it just re-raises `exc`. But you can return a fallback result instead. That fallback result becomes the `exec_res` passed to `post()`.
 
-## Minimal Example
+## Example
 
 ```python
 class SummarizeFile(Node):
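The patched docs describe a shared store ("heap") that every node reads and writes, plus params ("stack") pushed down by the parent Flow. The sketch below exercises that pattern end to end. The names `Node`, `Flow`, `set_params`, and `>>` chaining come from the docs themselves, but this tiny `Node`/`Flow` implementation is an assumption written for illustration (the real library is richer — retries, action-based transitions, nesting), and the string-truncating `exec` is a stand-in for an LLM call:

```python
# Minimal stand-in Node/Flow classes (assumed, not the library's real code).
class Node:
    def __init__(self):
        self.params = {}
        self.successor = None

    def set_params(self, params):
        self.params = params

    def __rshift__(self, other):
        # `a >> b` wires b as a's successor, as in the docs' examples.
        self.successor = other
        return other

    def prep(self, shared):
        return None

    def exec(self, shared, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return "default"

    def run(self, shared):
        # All three steps receive the same `shared` store.
        prep_res = self.prep(shared)
        exec_res = self.exec(shared, prep_res)
        return self.post(shared, prep_res, exec_res)


class Flow:
    def __init__(self, start):
        self.start = start
        self.params = {}

    def set_params(self, params):
        self.params = params

    def run(self, shared):
        node = self.start
        while node is not None:
            # The parent Flow overwrites each child's params ("stack").
            node.set_params(dict(self.params))
            node.run(shared)
            node = node.successor


class LoadData(Node):
    def prep(self, shared):
        # Write into the shared store ("heap").
        shared.setdefault("data", {})["doc2.txt"] = "Some text content"


class SummarizeFile(Node):
    def prep(self, shared):
        # Params identify which item this node works on.
        return shared["data"].get(self.params["filename"], "")

    def exec(self, shared, prep_res):
        # Stand-in for an LLM call: truncate to 10 characters.
        return prep_res[:10]

    def post(self, shared, prep_res, exec_res):
        shared.setdefault("summary", {})[self.params["filename"]] = exec_res


load = LoadData()
summarize = SummarizeFile()
load >> summarize
flow = Flow(start=load)

# Params are set only on the uppermost Flow; it pushes them to children.
flow.set_params({"filename": "doc2.txt"})

shared = {}
flow.run(shared)
print(shared["summary"])  # {'doc2.txt': 'Some text '}
```

Note how `SummarizeFile` never receives data directly from `LoadData`: the two communicate only through `shared`, while `params` merely tells the summarizer which key to look at.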