structured output

zachary62 2025-01-01 22:24:44 +00:00
parent 118293a641
commit 1d8377f6cd
4 changed files with 142 additions and 18 deletions

docs/index.md

@ -8,6 +8,12 @@ nav_order: 1
A [100-line](https://github.com/zachary62/miniLLMFlow/blob/main/minillmflow/__init__.py) minimalist LLM framework for *Agents, Task Decomposition, RAG, etc*.
<div align="center">
<img src="https://github.com/zachary62/miniLLMFlow/blob/main/assets/minillmflow.jpg?raw=true" width="400"/>
</div>
## Core Abstraction
We model the LLM workflow as a **Nested Directed Graph**:
- **Nodes** handle simple (LLM) tasks.
- Nodes connect through **Actions** (labeled edges) for *Agents*.
@ -16,12 +22,7 @@ We model the LLM workflow as a **Nested Directed Graph**:
- **Batch** Nodes/Flows for data-intensive tasks.
- **Async** Nodes/Flows allow waits or **Parallel** execution.
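In essence, the abstraction boils down to nodes plus labeled edges. Here is a self-contained toy sketch of the idea (`ToyNode` and `run_flow` are illustrative names, not the framework's API; see the guides below for the real one):

```python
# Toy illustration only; not the framework's actual API.
class ToyNode:
    """Runs one simple task, then follows a labeled edge (an Action)."""
    def __init__(self, task):
        self.task = task    # callable: shared state -> action label
        self.edges = {}     # action label -> next ToyNode

def run_flow(start, shared):
    """Walks the graph from `start` until no edge matches the action."""
    node = start
    while node is not None:
        action = node.task(shared)
        node = node.edges.get(action)
```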
To learn more:
- [Node](./node.md)
- [Flow](./flow.md)
- [Communication](./communication.md)
@ -29,20 +30,32 @@ We model the LLM workflow as a **Nested Directed Graph**:
- [(Advanced) Async](./async.md)
- [(Advanced) Parallel](./parallel.md)
## LLM Wrapper & Tools
**We DO NOT provide built-in LLM wrappers and tools!**
I believe it is a *bad practice* to provide low-level implementations in a general framework:
- **APIs change frequently.** Hardcoding them makes maintenance a nightmare.
- You may need **flexibility.** E.g., using fine-tuned LLMs or deploying local ones.
- You may need **optimizations.** E.g., prompt caching, request batching, response streaming...
We provide some simple example implementations:
- [LLM Wrapper](./llm.md)
- [Tool](./tool.md)
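As a reference point, a minimal wrapper sketch assuming the `openai` package (model choice and error handling are illustrative; see [LLM Wrapper](./llm.md) for the provided example):

```python
from openai import OpenAI

def call_llm(prompt):
    # A fresh client per call keeps the sketch short; reuse one in practice.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```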
## Paradigm
Based on the core abstraction, we implement common high-level paradigms:
- [Structured Output](./structure.md)
- Task Decomposition
- RAG
- Chat Memory
- Map Reduce
- Agent
- Multi-Agent
- Evaluation
## Example Projects
- Coming soon ...

docs/llm.md

@ -62,9 +62,3 @@ def call_llm(prompt):
    return response
```

docs/paradigm.md (new file, 6 lines)

@ -0,0 +1,6 @@
---
layout: default
title: "Paradigm"
nav_order: 4
has_children: true
---

docs/structure.md (new file, 111 lines)

@ -0,0 +1,111 @@
---
layout: default
title: "Structured Output"
parent: "Paradigm"
nav_order: 1
---
# Structured Output
In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.
There are several approaches to achieving structured output:
- **Prompting** the LLM to strictly return a defined structure.
- Using LLMs that natively support **schema enforcement**.
- **Post-processing** the LLM's response to extract structured content.
In practice, **Prompting** is simple and reliable for modern LLMs.
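For the schema-enforcement route, a hedged sketch using the `openai` package's structured-output support (the model name and schema here are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize the text as 3 bullets: ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "summary",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["summary"],
                "additionalProperties": False,
            },
        },
    },
)
# The response is constrained to the schema, so it should parse cleanly.
data = json.loads(resp.choices[0].message.content)
```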
## Example Use Cases
1. **Extracting Key Information**
```yaml
product:
  name: Widget Pro
  price: 199.99
  description: |
    A high-quality widget designed for professionals.
    Recommended for advanced users.
```
2. **Summarizing Documents into Bullet Points**
```yaml
summary:
  - This product is easy to use.
  - It is cost-effective.
  - Suitable for all skill levels.
```
3. **Generating Configuration Files**
```yaml
server:
  host: 127.0.0.1
  port: 8080
  ssl: true
```
## Prompt Engineering
When prompting the LLM to produce **structured** output:
1. **Wrap** the structure in code fences (e.g., ```yaml).
2. **Validate** that all required fields exist (and retry if necessary).
### Example: Text Summarization
```python
import yaml

class SummarizeNode(Node):
    def exec(self, prep_res):
        # Suppose `prep_res` is the text to summarize.
        prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points:

{prep_res}

Now, output:
```yaml
summary:
  - bullet 1
  - bullet 2
  - bullet 3
```"""
        response = call_llm(prompt)
        # Extract the YAML payload between the code fences.
        yaml_str = response.split("```yaml")[1].split("```")[0].strip()
        structured_result = yaml.safe_load(yaml_str)

        # Fail loudly if the required fields are missing.
        assert "summary" in structured_result
        assert isinstance(structured_result["summary"], list)
        return structured_result
```
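Step 2 above says to retry when validation fails; a minimal sketch of that loop (`call_llm_structured` and the retry count are illustrative):

```python
import yaml

def call_llm_structured(prompt, max_retries=3):
    """Retry until the response parses as YAML and passes validation."""
    for _ in range(max_retries):
        response = call_llm(prompt)
        try:
            yaml_str = response.split("```yaml")[1].split("```")[0].strip()
            result = yaml.safe_load(yaml_str)
            assert "summary" in result
            assert isinstance(result["summary"], list)
            return result
        except (IndexError, AssertionError, yaml.YAMLError):
            continue  # malformed output; ask the LLM again
    raise RuntimeError("No valid structured output after retries")
```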
### Why YAML instead of JSON?
Current LLMs struggle with escaping. YAML handles strings more gracefully since they don't always need quotes.
**In JSON**
```json
{
  "dialogue": "Alice said: \"Hello Bob.\nHow are you?\nI am good.\""
}
```
- Every double quote inside the string must be escaped with `\"`.
- Each newline in the dialogue must be represented as `\n`.
**In YAML**
```yaml
dialogue: |
  Alice said: "Hello Bob.
  How are you?
  I am good."
```
- No need to escape interior quotes; just place the entire text under a block literal (`|`).
- Newlines are naturally preserved without needing `\n`.
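A quick self-contained check that the two notations carry the same data (requires the `pyyaml` package):

```python
import json
import yaml

yaml_text = '''dialogue: |
  Alice said: "Hello Bob.
  How are you?
  I am good."
'''

data = yaml.safe_load(yaml_text)
# json.dumps re-introduces the \" and \n escaping that YAML let us avoid.
print(json.dumps(data))
```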