From a2447f0bb6d2920fb1dfe17e3616d59acf0412a1 Mon Sep 17 00:00:00 2001 From: zachary62 Date: Fri, 28 Feb 2025 18:24:16 -0500 Subject: [PATCH] update .cursorrules --- assets/.cursorrules => .cursorrules | 510 ++++++++++++++-------------- README.md | 2 +- 2 files changed, 258 insertions(+), 254 deletions(-) rename assets/.cursorrules => .cursorrules (86%) diff --git a/assets/.cursorrules b/.cursorrules similarity index 86% rename from assets/.cursorrules rename to .cursorrules index 3e4c5cc..97b2a8e 100644 --- a/assets/.cursorrules +++ b/.cursorrules @@ -1,155 +1,24 @@ - - -================================================ -File: docs/guide.md -================================================ ---- -layout: default -title: "Design Guidance" -parent: "Apps" -nav_order: 1 ---- - -# LLM System Design Guidance - - -## Example LLM Project File Structure - -``` -my_project/ -├── main.py -├── flow.py -├── utils/ -│ ├── __init__.py -│ ├── call_llm.py -│ └── search_web.py -├── tests/ -│ ├── __init__.py -│ ├── test_flow.py -│ └── test_nodes.py -├── requirements.txt -└── docs/ - └── design.md -``` - - -### `docs/` - -Store the documentation of the project. - -It should include a `design.md` file, which describes -- Project requirements -- Required utility functions -- High-level flow with a mermaid diagram -- Shared memory data structure -- For each node, discuss - - Node purpose and design (e.g., should it be a batch or async node?) - - How the data shall be read (for `prep`) and written (for `post`) - - How the data shall be processed (for `exec`) - -### `utils/` - -Houses functions for external API calls (e.g., LLMs, web searches, etc.). - -It’s recommended to dedicate one Python file per API call, with names like `call_llm.py` or `search_web.py`. Each file should include: - -- The function to call the API -- A main function to run that API call - -For instance, here’s a simplified `call_llm.py` example: - -```python -from openai import OpenAI - -def call_llm(prompt): - client = OpenAI(api_key="YOUR_API_KEY_HERE") - response = client.chat.completions.create( - model="gpt-4o", - messages=[{"role": "user", "content": prompt}] - ) - return response.choices[0].message.content - -def main(): - prompt = "Hello, how are you?" - print(call_llm(prompt)) - -if __name__ == "__main__": - main() -``` - -### `main.py` - -Serves as the project’s entry point. - -### `flow.py` - -Implements the application’s flow, starting with node followed by the flow structure. - - -### `tests/` - -Optionally contains all tests. Use `pytest` for testing flows, nodes, and utility functions. -For example, `test_call_llm.py` might look like: - -```python -from utils.call_llm import call_llm - -def test_call_llm(): - prompt = "Hello, how are you?" - assert call_llm(prompt) is not None -``` - -## System Design Steps - -1. **Project Requirements** - - Identify the project's core entities. - - Define each functional requirement and map out how these entities interact step by step. - -2. **Utility Functions** - - Determine the low-level utility functions you’ll need (e.g., for LLM calls, web searches, file handling). - - Implement these functions and write basic tests to confirm they work correctly. - -3. **Flow Design** - - Develop a high-level process flow that meets the project’s requirements. - - Specify which utility functions are used at each step. - - Identify possible decision points for *Node Actions* and data-intensive operations for *Batch* tasks. - - Illustrate the flow with a Mermaid diagram. - -4. 
**Data Structure**
   - Decide how to store and update state, whether in memory (for smaller applications) or a database (for larger or persistent needs).
   - Define data schemas or models that detail how information is stored, accessed, and updated.
-
-5. **Implementation**
   - Start coding with a simple, direct approach (avoid over-engineering at first).
   - For each node in your flow:
     - **prep**: Determine how data is accessed or retrieved.
     - **exec**: Outline the actual processing or logic needed.
     - **post**: Handle any final updates or data persistence tasks.
-
-6. **Optimization**
   - **Prompt Engineering**: Use clear and specific instructions with illustrative examples to reduce ambiguity.
   - **Task Decomposition**: Break large, complex tasks into manageable, logical steps.
-
-7. **Reliability**
   - **Structured Output**: Verify outputs conform to the required format. Consider increasing `max_retries` if needed.
   - **Test Cases**: Develop clear, reproducible tests for each part of the flow.
   - **Self-Evaluation**: Introduce an additional Node (powered by LLMs) to review outputs when the results are uncertain.
-
================================================
File: docs/agent.md
================================================
---
layout: default
title: "Agent"
-parent: "Paradigm"
+parent: "Design"
nav_order: 6
---

# Agent

-For many tasks, we need agents that take dynamic and recursive actions based on the inputs they receive.
-You can create these agents as **Nodes** connected by *Actions* in a directed graph using [Flow](./flow.md).
+Agent is a powerful design pattern in which a node can take dynamic actions based on the context it receives.
+To express an agent, create a Node (the agent) with [branching](./flow.md) to other nodes (Actions).

+> The core of building **performant** and **reliable** agents boils down to:
+>
+> 1. **Context Management:** Provide *clear, relevant context* so agents can understand the problem. E.g., rather than dumping an entire chat history or entire files, use a [Workflow](./decomp.md) that filters out irrelevant details and includes only the most relevant information.
+>
+> 2. **Action Space:** Define *a well-structured, unambiguous, and easy-to-use* set of actions. For instance, avoid creating overlapping actions like `read_databases` and `read_csvs`. Instead, unify data sources (e.g., move CSVs into a database) and design a single action. The action can be parameterized (e.g., string for search) or programmable (e.g., SQL queries).
+{: .best-practice }

### Example: Search Agent

@@ -234,8 +103,6 @@ flow = Flow(start=decide)
flow.run({"query": "Who won the Nobel Prize in Physics 2024?"})
```
-
-
================================================
File: docs/async.md
================================================

@@ -436,10 +303,8 @@ Nodes and Flows **communicate** in two ways:

If you know memory management, think of the **Shared Store** like a **heap** (shared by all function calls), and **Params** like a **stack** (assigned by the caller).

-> **Best Practice:** Use `Shared Store` for almost all cases. It's flexible and easy to manage. It separates data storage from data processing, making the code more readable and easier to maintain.
->
-> `Params` is more a syntax sugar for [Batch](./batch.md).
-{: .note }
+> Use `Shared Store` for almost all cases. It's flexible and easy to manage. It separates *Data Schema* from *Compute Logic*, making the code easier to maintain. `Params` is mostly syntactic sugar for [Batch](./batch.md).
+{: .best-practice } --- @@ -551,14 +416,21 @@ File: docs/decomp.md ================================================ --- layout: default -title: "Task Decomposition" -parent: "Paradigm" +title: "Workflow" +parent: "Design" nav_order: 2 --- -# Task Decomposition +# Workflow -Many real-world tasks are too complex for one LLM call. The solution is to decompose them into multiple calls as a [Flow](./flow.md) of Nodes. +Many real-world tasks are too complex for one LLM call. The solution is to decompose them into a [chain](./flow.md) of multiple Nodes. + + +> - You don't want to make each task **too coarse**, because it may be *too complex for one LLM call*. +> - You don't want to make each task **too granular**, because then *the LLM call doesn't have enough context* and results are *not consistent across nodes*. +> +> You usually need multiple *iterations* to find the *sweet spot*. If the task has too many *edge cases*, consider using [Agents](./agent.md). +{: .best-practice } ### Example: Article Writing @@ -932,6 +804,123 @@ flowchart LR +================================================ +File: docs/guide.md +================================================ +--- +layout: default +title: "Design Guidance" +parent: "Apps" +nav_order: 1 +--- + +# LLM System Design Guidance + + +## System Design Steps + +1. **Project Requirements** + - Identify the project's core entities, and provide a step-by-step user story. + - Define a list of both functional and non-functional requirements. + +2. **Utility Functions** + - Determine the utility functions on which this project depends (e.g., for LLM calls, web searches, file handling). + - Implement these functions and write basic tests to confirm they work correctly. + +> After this step, don't jump straight into building an LLM system. +> +> First, make sure you clearly understand the problem by manually solving it using some example inputs. +> +> It's always easier to first build a solid intuition about the problem and its solution, then focus on automating the process. +{: .warning } + +3. **Flow Design** + - Build a high-level design of the flow of nodes (for example, using a Mermaid diagram) to automate the solution. + - For each node in your flow, specify: + - **prep**: How data is accessed or retrieved. + - **exec**: The specific utility function to use (ideally one function per node). + - **post**: How data is updated or persisted. + - Identify potential design patterns, such as Batch, Agent, or RAG. + +4. **Data Structure** + - Decide how you will store and update state (in memory for smaller applications or in a database for larger, persistent needs). + - If it isn’t straightforward, define data schemas or models detailing how information is stored, accessed, and updated. + - As you finalize your data structure, you may need to refine your flow design. + +5. **Implementation** + - For each node, implement the **prep**, **exec**, and **post** functions based on the flow design. + - Start coding with a simple, direct approach (avoid over-engineering at first). + - Add logging throughout the code to facilitate debugging. + +6. **Optimization** + - **Prompt Engineering**: Use clear, specific instructions with illustrative examples to reduce ambiguity. + - **Task Decomposition**: Break large or complex tasks into manageable, logical steps. + +7. **Reliability** + - **Structured Output**: Ensure outputs conform to the required format. Consider increasing `max_retries` if needed. 
+ - **Test Cases**: Develop clear, reproducible tests for each part of the flow.
+ - **Self-Evaluation**: Introduce an additional node (powered by LLMs) to review outputs when results are uncertain.
+
+## Example LLM Project File Structure
+
+```
+my_project/
+├── main.py
+├── flow.py
+├── utils/
+│   ├── __init__.py
+│   ├── call_llm.py
+│   └── search_web.py
+├── requirements.txt
+└── docs/
+    └── design.md
+```
+
+### `docs/`
+
+Holds all project documentation. Include a `design.md` file covering:
+- Project requirements
+- Utility functions
+- High-level flow (with a Mermaid diagram)
+- Shared memory data structure
+- Node designs:
+  - Purpose and design (e.g., batch or async)
+  - Data read (prep) and write (post)
+  - Data processing (exec)
+
+### `utils/`
+
+Houses functions for external API calls (e.g., LLMs, web searches, etc.). It’s recommended to dedicate one Python file per API call, with names like `call_llm.py` or `search_web.py`. Each file should include:
+
+- The function to call the API
+- A main function to run that API call for testing
+
+For instance, here’s a simplified `call_llm.py` example:
+
+```python
+from openai import OpenAI
+
+def call_llm(prompt):
+    client = OpenAI(api_key="YOUR_API_KEY_HERE")
+    response = client.chat.completions.create(
+        model="gpt-4o",
+        messages=[{"role": "user", "content": prompt}]
+    )
+    return response.choices[0].message.content
+
+if __name__ == "__main__":
+    prompt = "Hello, how are you?"
+    print(call_llm(prompt))
+```
+
+### `main.py`
+
+Serves as the project’s entry point.
+
+### `flow.py`
+
+Implements the application’s flow, starting with the node definitions, followed by the flow structure.
+
================================================
File: docs/index.md
================================================
@@ -956,7 +945,7 @@ We model the LLM workflow as a **Nested Directed Graph**:

@@ -974,23 +963,21 @@ We model the LLM workflow as a **Nested Directed Graph**: - [(Advanced) Async](./async.md) - [(Advanced) Parallel](./parallel.md) -## Low-Level Details +## Utility Functions - [LLM Wrapper](./llm.md) - [Tool](./tool.md) - [Viz and Debug](./viz.md) - Chunking -> We do not provide built-in implementations. -> -> Example implementations are provided as reference. +> We do not provide built-in utility functions. Example implementations are provided as reference. {: .warning } -## High-Level Paradigm +## Design Patterns - [Structured Output](./structure.md) -- [Task Decomposition](./decomp.md) +- [Workflow](./decomp.md) - [Map Reduce](./mapreduce.md) - [RAG](./rag.md) - [Chat Memory](./memory.md) @@ -1012,7 +999,7 @@ File: docs/llm.md --- layout: default title: "LLM Wrapper" -parent: "Details" +parent: "Utility" nav_order: 1 --- @@ -1113,13 +1100,19 @@ File: docs/mapreduce.md --- layout: default title: "Map Reduce" -parent: "Paradigm" +parent: "Design" nav_order: 3 --- # Map Reduce -Process large inputs by splitting them into chunks using [BatchNode](./batch.md), then combining results. +MapReduce is a design pattern suitable when you have either: +- Large input data (e.g., multiple files to process), or +- Large output data (e.g., multiple forms to fill) + +and there is a logical way to break the task into smaller, ideally independent parts. +You first break down the task using [BatchNode](./batch.md) in the map phase, followed by aggregation in the reduce phase. + ### Example: Document Summarization @@ -1151,7 +1144,7 @@ File: docs/memory.md --- layout: default title: "Chat Memory" -parent: "Paradigm" +parent: "Design" nav_order: 5 --- @@ -1197,59 +1190,81 @@ We can: 2. Use [vector search](./tool.md) to retrieve relevant exchanges beyond the last 4. 
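The chat-memory example below (and the RAG example later) calls small vector-search helpers (`get_embedding`, `create_index`, and `search_index`) whose implementations are not shown in this excerpt. The sketch below is only an illustration of what such helpers might look like, assuming OpenAI embeddings and a FAISS index (both are assumptions, not part of this patch); substitute whatever embedding model and vector store you actually use.

```python
# Illustrative sketch only (assumed helpers, not part of this patch).
# Requires the `openai` and `faiss-cpu` packages; the embedding model name is an arbitrary choice.
import numpy as np
import faiss
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY_HERE")

def get_embedding(text):
    # Embed a single string and return a float32 vector
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding, dtype="float32")

def create_index(embeddings):
    # Build a flat L2 index over a list of vectors
    mat = np.array(embeddings, dtype="float32")
    index = faiss.IndexFlatL2(mat.shape[1])
    index.add(mat)
    return index

def search_index(index, query_embedding, top_k=5):
    # Return (indices, distances), each shaped (1, top_k)
    distances, indices = index.search(np.array([query_embedding], dtype="float32"), top_k)
    return indices, distances
```

The `(indices, distances)` return order and shape are chosen to match how the examples below index into the result (e.g., `idx[0][0]`).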
```python -class ChatWithMemory(Node): +################################ +# Node A: Retrieve user input & relevant messages +################################ +class ChatRetrieve(Node): def prep(self, s): - # Initialize shared dict s.setdefault("history", []) s.setdefault("memory_index", None) - user_input = input("You: ") - - # Retrieve relevant past if we have enough history and an index + return user_input + + def exec(self, user_input): + emb = get_embedding(user_input) relevant = [] - if len(s["history"]) > 8 and s["memory_index"]: - idx, _ = search_index(s["memory_index"], get_embedding(user_input), top_k=2) - relevant = [s["history"][i[0]] for i in idx] + if len(shared["history"]) > 8 and shared["memory_index"]: + idx, _ = search_index(shared["memory_index"], emb, top_k=2) + relevant = [shared["history"][i[0]] for i in idx] + return (user_input, relevant) - return {"user_input": user_input, "recent": s["history"][-8:], "relevant": relevant} + def post(self, s, p, r): + user_input, relevant = r + s["user_input"] = user_input + s["relevant"] = relevant + return "continue" - def exec(self, c): - messages = [{"role": "system", "content": "You are a helpful assistant."}] - # Include relevant history if any - if c["relevant"]: - messages.append({"role": "system", "content": f"Relevant: {c['relevant']}"}) - # Add recent history and the current user input - messages += c["recent"] + [{"role": "user", "content": c["user_input"]}] - return call_llm(messages) +################################ +# Node B: Call LLM, update history + index +################################ +class ChatReply(Node): + def prep(self, s): + user_input = s["user_input"] + recent = s["history"][-8:] + relevant = s.get("relevant", []) + return user_input, recent, relevant + + def exec(self, inputs): + user_input, recent, relevant = inputs + msgs = [{"role":"system","content":"You are a helpful assistant."}] + if relevant: + msgs.append({"role":"system","content":f"Relevant: {relevant}"}) + msgs.extend(recent) + msgs.append({"role":"user","content":user_input}) + ans = call_llm(msgs) + return ans def post(self, s, pre, ans): - # Update chat history - s["history"] += [ - {"role": "user", "content": pre["user_input"]}, - {"role": "assistant", "content": ans} - ] + user_input, _, _ = pre + s["history"].append({"role":"user","content":user_input}) + s["history"].append({"role":"assistant","content":ans}) - # When first reaching 8 messages, create index + # Manage memory index if len(s["history"]) == 8: - embeddings = [] + embs = [] for i in range(0, 8, 2): - e = s["history"][i]["content"] + " " + s["history"][i+1]["content"] - embeddings.append(get_embedding(e)) - s["memory_index"] = create_index(embeddings) - - # Embed older exchanges once we exceed 8 messages + text = s["history"][i]["content"] + " " + s["history"][i+1]["content"] + embs.append(get_embedding(text)) + s["memory_index"] = create_index(embs) elif len(s["history"]) > 8: - pair = s["history"][-10:-8] - embedding = get_embedding(pair[0]["content"] + " " + pair[1]["content"]) - s["memory_index"].add(np.array([embedding]).astype('float32')) - + text = s["history"][-2]["content"] + " " + s["history"][-1]["content"] + new_emb = np.array([get_embedding(text)]).astype('float32') + s["memory_index"].add(new_emb) + print(f"Assistant: {ans}") return "continue" -chat = ChatWithMemory() -chat - "continue" >> chat -flow = Flow(start=chat) -flow.run({}) +################################ +# Flow wiring +################################ +retrieve = ChatRetrieve() +reply = 
ChatReply() +retrieve - "continue" >> reply +reply - "continue" >> retrieve + +flow = Flow(start=retrieve) +shared = {} +flow.run(shared) ``` @@ -1259,7 +1274,7 @@ File: docs/multi_agent.md --- layout: default title: "(Advanced) Multi-Agents" -parent: "Paradigm" +parent: "Design" nav_order: 7 --- @@ -1268,6 +1283,8 @@ nav_order: 7 Multiple [Agents](./flow.md) can work together by handling subtasks and communicating the progress. Communication between agents is typically implemented using message queues in shared storage. +> Most of time, you don't need Multi-Agents. Start with a simple solution first. +{: .best-practice } ### Example Agent Communication: Message Queue @@ -1548,18 +1565,6 @@ print("Action returned:", action_result) # "default" print("Summary stored:", shared["summary"]) ``` - - -================================================ -File: docs/paradigm.md -================================================ ---- -layout: default -title: "Paradigm" -nav_order: 4 -has_children: true ---- - ================================================ File: docs/parallel.md ================================================ @@ -1577,6 +1582,14 @@ nav_order: 6 > Because of Python’s GIL, parallel nodes and flows can’t truly parallelize CPU-bound tasks (e.g., heavy numerical computations). However, they excel at overlapping I/O-bound work—like LLM calls, database queries, API requests, or file I/O. {: .warning } +> - **Ensure Tasks Are Independent**: If each item depends on the output of a previous item, **do not** parallelize. +> +> - **Beware of Rate Limits**: Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals). +> +> - **Consider Single-Node Batch APIs**: Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests and mitigates rate limits. +{: .best-practice } + + ## AsyncParallelBatchNode Like **AsyncBatchNode**, but run `exec_async()` in **parallel**: @@ -1613,33 +1626,13 @@ parallel_flow = SummarizeMultipleFiles(start=sub_flow) await parallel_flow.run_async(shared) ``` - -## Best Practices - -- **Ensure Tasks Are Independent**: If each item depends on the output of a previous item, **do not** parallelize. - -- **Beware of Rate Limits**: Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals). - -- **Consider Single-Node Batch APIs**: Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests and mitigates rate limits. - - -================================================ -File: docs/preparation.md -================================================ ---- -layout: default -title: "Details" -nav_order: 3 -has_children: true ---- - ================================================ File: docs/rag.md ================================================ --- layout: default title: "RAG" -parent: "Paradigm" +parent: "Design" nav_order: 4 --- @@ -1653,34 +1646,44 @@ Use [vector search](./tool.md) to find relevant context for LLM responses. 
```python class PrepareEmbeddings(Node): def prep(self, shared): - texts = shared["texts"] - embeddings = [get_embedding(text) for text in texts] - shared["search_index"] = create_index(embeddings) + return shared["texts"] + + def exec(self, texts): + # Embed each text chunk + embs = [get_embedding(t) for t in texts] + return embs + + def post(self, shared, prep_res, exec_res): + shared["search_index"] = create_index(exec_res) + # no action string means "default" class AnswerQuestion(Node): def prep(self, shared): question = input("Enter question: ") - query_embedding = get_embedding(question) - indices, _ = search_index(shared["search_index"], query_embedding, top_k=1) - relevant_text = shared["texts"][indices[0][0]] - return question, relevant_text + return question - def exec(self, inputs): - question, context = inputs - prompt = f"Question: {question}\nContext: {context}\nAnswer: " + def exec(self, question): + q_emb = get_embedding(question) + idx, _ = search_index(shared["search_index"], q_emb, top_k=1) + best_id = idx[0][0] + relevant_text = shared["texts"][best_id] + prompt = f"Question: {question}\nContext: {relevant_text}\nAnswer:" return call_llm(prompt) - def post(self, shared, prep_res, exec_res): - print(f"Answer: {exec_res}") + def post(self, shared, p, answer): + print("Answer:", answer) -# Connect nodes +############################################ +# Wire up the flow prep = PrepareEmbeddings() qa = AnswerQuestion() prep >> qa -# Create flow -qa_flow = Flow(start=prep) -qa_flow.run(shared) +flow = Flow(start=prep) + +# Example usage +shared = {"texts": ["I love apples", "Cats are great", "The sky is blue"]} +flow.run(shared) ``` ================================================ @@ -1689,7 +1692,7 @@ File: docs/structure.md --- layout: default title: "Structured Output" -parent: "Paradigm" +parent: "Design" nav_order: 1 --- @@ -1771,6 +1774,9 @@ summary: return structured_result ``` +> Besides using `assert` statements, another popular way to validate schemas is [Pydantic](https://github.com/pydantic/pydantic) +{: .note } + ### Why YAML instead of JSON? Current LLMs struggle with escaping. YAML is easier with strings since they don't always need quotes. @@ -1804,7 +1810,7 @@ File: docs/tool.md --- layout: default title: "Tool" -parent: "Details" +parent: "Utility" nav_order: 2 --- @@ -1814,7 +1820,6 @@ Similar to LLM wrappers, we **don't** provide built-in tools. Here, we recommend --- - ## 1. Embedding Calls ```python @@ -2025,7 +2030,7 @@ File: docs/viz.md --- layout: default title: "Viz and Debug" -parent: "Details" +parent: "Utility" nav_order: 3 --- @@ -2162,4 +2167,3 @@ data_science_flow.run({}) ``` The output would be: `Call stack: ['EvaluateModelNode', 'ModelFlow', 'DataScienceFlow']` - diff --git a/README.md b/README.md index a09aab6..71ff130 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,7 @@ For a new development paradigmn: **Build LLM Apps by Chatting with LLM agents, N - **For quick questions**: Use the [GPT assistant](https://chatgpt.com/g/g-677464af36588191b9eba4901946557b-pocket-flow-assistant) (note: it uses older models not ideal for coding). - **For one-time LLM task**: Create a [ChatGPT](https://help.openai.com/en/articles/10169521-using-projects-in-chatgpt) or [Claude](https://www.anthropic.com/news/projects) project; upload the [docs](docs) to project knowledge. - - **For LLM App development**: Use [Cursor AI](https://www.cursor.com/). 
Copy [.cursorrules](assets/.cursorrules) to your project root as **[Cursor Rules](https://docs.cursor.com/context/rules-for-ai)**. + - **For LLM App development**: Use [Cursor AI](https://www.cursor.com/). Copy [.cursorrules](.cursorrules) to your project root as **[Cursor Rules](https://docs.cursor.com/context/rules-for-ai)**.