structured output

parent 118293a641
commit 1d8377f6cd
A [100-line](https://github.com/zachary62/miniLLMFlow/blob/main/minillmflow/__init__.py) minimalist LLM framework for *Agents, Task Decomposition, RAG, etc*.
<div align="center">
  <img src="https://github.com/zachary62/miniLLMFlow/blob/main/assets/minillmflow.jpg?raw=true" width="400"/>
</div>
## Core Abstraction
We model the LLM workflow as a **Nested Directed Graph**:

- **Nodes** handle simple (LLM) tasks.
- Nodes connect through **Actions** (labeled edges) for *Agents*.
- **Batch** Nodes/Flows for data-intensive tasks.
- **Async** Nodes/Flows allow waiting or **Parallel** execution.
To learn more:
- [Node](./node.md)
- [Flow](./flow.md)
- [Communication](./communication.md)
- [(Advanced) Async](./async.md)
- [(Advanced) Parallel](./parallel.md)
## LLM Wrapper & Tools
**We DO NOT provide built-in LLM wrappers and tools!**
I believe it is a *bad practice* to provide low-level implementations in a general framework:
- **APIs change frequently.** Hardcoding them makes maintenance a nightmare.
- You may need **flexibility.** E.g., using fine-tuned LLMs or deploying local ones.
- You may need **optimizations.** E.g., prompt caching, request batching, response streaming...
We provide some simple example implementations:
- [LLM Wrapper](./llm.md)
- [Tool](./tool.md)
## Paradigm
Based on the core abstraction, we implement common high-level paradigms:
- [Structured Output](./structure.md)
- Task Decomposition
- RAG
- Chat Memory
- Map Reduce
- Agent
- Multi-Agent
- Evaluation
## Example Projects
- Coming soon ...
---
layout: default
title: "Paradigm"
nav_order: 4
has_children: true
---
---
layout: default
title: "Structured Output"
parent: "Paradigm"
nav_order: 1
---
# Structured Output
In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.
There are several approaches to achieve structured output:

- **Prompting** the LLM to strictly return a defined structure.
- Using LLMs that natively support **schema enforcement**.
- **Post-processing** the LLM’s response to extract structured content.

In practice, **prompting** is simple and reliable for modern LLMs.
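For instance, the prompting-plus-extraction approach can be sketched end to end. This is a minimal stand-alone sketch (not framework API); the canned `response` string stands in for a real LLM call, and the fence string is built programmatically only to keep this snippet renderable:

```python
import yaml  # PyYAML, as used in the examples below

FENCE = "`" * 3  # i.e., a markdown code fence

def extract_yaml(response: str) -> dict:
    """Pull the fenced YAML block out of an LLM response and parse it."""
    yaml_str = response.split(FENCE + "yaml")[1].split(FENCE)[0].strip()
    return yaml.safe_load(yaml_str)

# A canned response standing in for a real LLM call.
response = (
    "Here is the summary:\n"
    + FENCE + "yaml\n"
    "summary:\n"
    "  - Easy to use.\n"
    "  - Cost-effective.\n"
    + FENCE
)

result = extract_yaml(response)
assert result == {"summary": ["Easy to use.", "Cost-effective."]}
```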
## Example Use Cases
1. **Extracting Key Information**

   ```yaml
   product:
     name: Widget Pro
     price: 199.99
     description: |
       A high-quality widget designed for professionals.
       Recommended for advanced users.
   ```
2. **Summarizing Documents into Bullet Points**

   ```yaml
   summary:
     - This product is easy to use.
     - It is cost-effective.
     - Suitable for all skill levels.
   ```
3. **Generating Configuration Files**

   ```yaml
   server:
     host: 127.0.0.1
     port: 8080
     ssl: true
   ```
## Prompt Engineering
When prompting the LLM to produce **structured** output:

1. **Wrap** the structure in code fences (e.g., ```yaml).
2. **Validate** that all required fields exist (and retry if necessary).
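The validate-and-retry step can be sketched as a small loop. The `get_structured` helper below is hypothetical (not framework API), and `call_llm` is replaced by a stub that returns one malformed response before producing valid YAML:

```python
import yaml

def get_structured(prompt, call_llm, max_retries=3):
    """Call the LLM and re-ask until the response parses and has the required field."""
    fence = "`" * 3  # markdown code fence
    for _ in range(max_retries):
        response = call_llm(prompt)
        try:
            yaml_str = response.split(fence + "yaml")[1].split(fence)[0].strip()
            result = yaml.safe_load(yaml_str)
            assert "summary" in result                 # required field present
            assert isinstance(result["summary"], list)  # with the expected type
            return result
        except (IndexError, TypeError, AssertionError, yaml.YAMLError):
            continue  # malformed response: retry
    raise ValueError("no valid structured output after retries")

# Stub LLM: fails once, then answers correctly.
answers = iter([
    "oops, no YAML here",
    "`" * 3 + "yaml\nsummary:\n  - point one\n" + "`" * 3,
])
result = get_structured("summarize...", lambda prompt: next(answers))
assert result == {"summary": ["point one"]}
```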
### Example: Text Summarization
```python
class SummarizeNode(Node):
    def exec(self, prep_res):
        # Suppose `prep_res` is the text to summarize.
        prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points

{prep_res}

Now, output:
```yaml
summary:
  - bullet 1
  - bullet 2
  - bullet 3
```"""
        response = call_llm(prompt)
        yaml_str = response.split("```yaml")[1].split("```")[0].strip()

        import yaml
        structured_result = yaml.safe_load(yaml_str)

        assert "summary" in structured_result
        assert isinstance(structured_result["summary"], list)

        return structured_result
```
### Why YAML instead of JSON?
Current LLMs struggle with escaping. YAML is easier for strings, since they don’t always need quotes.
**In JSON**
```json
{
  "dialogue": "Alice said: \"Hello Bob.\nHow are you?\nI am good.\""
}
```
- Every double quote inside the string must be escaped with `\"`.
- Each newline in the dialogue must be represented as `\n`.
**In YAML**
```yaml
dialogue: |
  Alice said: "Hello Bob.
  How are you?
  I am good."
```
- No need to escape interior quotes; just place the entire text under a block literal (`|`).
- Newlines are naturally preserved without needing `\n`.
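To confirm the two forms encode the same string, one can parse both; this is a quick check using Python's `json` module and PyYAML (note that a `|` block literal keeps one trailing newline, which we strip before comparing):

```python
import json
import yaml

# The JSON form: every interior quote escaped, newlines written as \n.
json_text = r'{"dialogue": "Alice said: \"Hello Bob.\nHow are you?\nI am good.\""}'

# The YAML form: a block literal, no escaping needed.
yaml_text = (
    "dialogue: |\n"
    '  Alice said: "Hello Bob.\n'
    "  How are you?\n"
    '  I am good."\n'
)

j = json.loads(json_text)["dialogue"]
y = yaml.safe_load(yaml_text)["dialogue"].rstrip("\n")  # drop block-literal newline
assert j == y == 'Alice said: "Hello Bob.\nHow are you?\nI am good."'
```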