# Design Doc: PocketFlow Code Generator

> Please DON'T remove notes for AI

## Requirements

> Notes for AI: Keep it simple and clear. If the requirements are abstract, write concrete user stories.
User Story: As a developer, I want an AI system that can take a LeetCode-style coding problem and automatically:
- Generate comprehensive test cases including edge cases
- Implement a solution function
- Test the implementation against the test cases
- When tests fail, intelligently decide whether to revise the test cases or the function
- Iterate until all tests pass
Sample Problem: Two Sum - Given an array of integers and a target, return indices of two numbers that add up to the target.
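For concreteness, one possible target artifact is sketched below. The `run_code` name matches the function contract used throughout the node designs; the body is just one solution the system might converge on, not a fixed part of the design:

```python
def run_code(nums: list[int], target: int) -> list[int]:
    """Return indices of the two numbers in nums that add up to target."""
    seen = {}  # value -> index of a value we have already passed
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []  # no pair found

assert run_code([2, 7, 11, 15], 9) == [0, 1]
```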
This is well-suited for AI because:
- ✅ Routine task: Test case generation follows patterns
- ✅ Creative task: Code generation from clear problem descriptions
- ✅ Clear decision criteria: Whether to revise tests vs implementation
## Flow Design

> Notes for AI:
> - Consider the design patterns of agent, map-reduce, rag, and workflow. Apply them if they fit.
> - Present a concise, high-level description of the workflow.

### Applicable Design Pattern:

- Workflow Pattern: Sequential steps of test generation → coding → testing
- Agent Pattern: Decision-making when tests fail, with structured output
  - Context: Test results, current test cases, and function code
  - Actions: Structured output to revise test cases and/or function

### Flow high-level Design:
- Generate Test Cases: Create comprehensive input/output test pairs from problem description
- Implement Function: Write a `run_code` function based on the problem and current test cases
- Run Tests: Execute the function against all test cases using batch processing
- Revise: Analyze failures and output structured revisions for test cases and/or function
- Loop back to Run Tests until all pass
```mermaid
flowchart TD
    start[Problem Input] --> generateTests[Generate Test Cases]
    generateTests --> implement[Implement Function]
    implement --> runTests[Run Tests - Batch]
    runTests --> decision{All Tests Pass?}
    decision -->|Yes| success[Success!]
    decision -->|No| revise[Revise]
    revise --> runTests
```
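As a rough sketch, the diagram above could be wired up with PocketFlow as follows, assuming the node classes defined under Node Design below and PocketFlow's `>>` / `- "action" >>` transition syntax:

```python
from pocketflow import Flow

# Node classes are sketched under Node Design below
generate_tests = GenerateTestCases()
implement = ImplementFunction()
run_tests = RunTests()
revise = Revise()

generate_tests >> implement >> run_tests
run_tests - "failure" >> revise  # some tests failed: decide what to revise
revise >> run_tests              # loop back until every test passes
# the "success" action has no successor, so the flow ends there

flow = Flow(start=generate_tests)
flow.run(shared)  # shared dict as defined under Shared Memory below
```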
## Utility Functions

> Notes for AI:
> - Understand the utility function definition thoroughly by reviewing the doc.
> - Include only the necessary utility functions, based on nodes in the flow.

1. Call LLM (`utils/call_llm.py`)
   - Input: prompt (str)
   - Output: response (str)
   - Used by all LLM-powered nodes for generating tests, code, and analysis
2. Execute Python Code (`utils/code_executor.py`)
   - Input: function_code (str), input (dict/list/any)
   - Output: output (any), error (str)
   - Used by the RunTests batch node to safely execute generated code against an individual input (a sketch follows this list)
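A minimal sketch of the executor utility, assuming a module-level function named `execute_python` (the name is illustrative). Note that plain `exec` provides no real sandboxing; a production version would likely add subprocess isolation and timeouts:

```python
# utils/code_executor.py (sketch)
import traceback

def execute_python(function_code: str, test_input):
    """Run generated run_code source against one test input.

    Returns (output, error); error is None when execution succeeds.
    """
    namespace = {}
    try:
        exec(function_code, namespace)  # defines run_code in the namespace
        func = namespace["run_code"]
        if isinstance(test_input, dict):
            output = func(**test_input)  # e.g. {"nums": [...], "target": 9}
        elif isinstance(test_input, (list, tuple)):
            output = func(*test_input)
        else:
            output = func(test_input)
        return output, None
    except Exception:
        return None, traceback.format_exc()
```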
## Node Design

### Shared Memory

> Notes for AI: Try to minimize data redundancy

The shared memory structure is organized as follows:
```python
shared = {
    "problem": "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.",
    "test_cases": [
        {"input": {"nums": [2, 7, 11, 15], "target": 9}, "expected": [0, 1]},
        # ... more test cases
    ],
    "function_code": "def run_code(nums, target): ...",
    "test_results": [
        # one result per test case; "error" is None when the run succeeded
        {"test_case": {...}, "passed": True, "error": None},
    ],
    "iteration_count": 0,
    "max_iterations": 5,
}
```
### Node Steps

> Notes for AI: Carefully decide whether to use Batch/Async Node/Flow.
- GenerateTestCases Node
  - Purpose: Create comprehensive test cases, including edge cases, from the problem description
  - Type: Regular Node
  - Steps:
    - prep: Read problem description from the shared store
    - exec: Call LLM to generate diverse test cases in a structured format
    - post: Store test cases directly in shared["test_cases"]
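A sketch of this node, assuming PocketFlow's `Node` base class and a `call_llm` helper as described under Utility Functions; the prompt wording is illustrative, and the code assumes the model replies with clean YAML:

```python
import yaml
from pocketflow import Node
from utils.call_llm import call_llm

class GenerateTestCases(Node):
    def prep(self, shared):
        return shared["problem"]

    def exec(self, problem):
        prompt = (
            f"Problem:\n{problem}\n\n"
            "Generate 5-7 test cases, including edge cases. Reply with YAML only: "
            "a `test_cases` list where each item has `input` (dict of kwargs "
            "for run_code) and `expected` (the correct output)."
        )
        return yaml.safe_load(call_llm(prompt))["test_cases"]

    def post(self, shared, prep_res, exec_res):
        shared["test_cases"] = exec_res  # stored directly, per the steps above
```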
- ImplementFunction Node
  - Purpose: Generate the `run_code` function based on the problem and current test cases
  - Type: Regular Node
  - Steps:
    - prep: Read problem description and test cases from the shared store
    - exec: Call LLM to implement the `run_code` function, with clean code output
    - post: Store function code directly in shared["function_code"]
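A sketch under the same assumptions; prompting for raw code keeps `post` trivial, though a robust version would also strip any markdown fences from the reply:

```python
from pocketflow import Node
from utils.call_llm import call_llm

class ImplementFunction(Node):
    def prep(self, shared):
        return shared["problem"], shared["test_cases"]

    def exec(self, inputs):
        problem, test_cases = inputs
        prompt = (
            f"Problem:\n{problem}\n\nTest cases:\n{test_cases}\n\n"
            "Implement a Python function named run_code that solves the problem "
            "and passes these test cases. Reply with raw Python code only."
        )
        return call_llm(prompt).strip()

    def post(self, shared, prep_res, exec_res):
        shared["function_code"] = exec_res
```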
- RunTests Node
  - Purpose: Execute the function against all test cases using batch processing
  - Type: Batch Node
  - Steps:
    - prep: Read function code from the shared store; return the list of test cases
    - exec: Use the code executor utility to run the function against one individual test case
    - post: Store all results in shared["test_results"]; return "success" if all pass, else "failure"
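A sketch assuming PocketFlow's `BatchNode`, where `exec` is called once per item that `prep` returns; `execute_python` is the illustrative executor sketched under Utility Functions:

```python
from pocketflow import BatchNode
from utils.code_executor import execute_python  # illustrative name, sketched above

class RunTests(BatchNode):
    def prep(self, shared):
        code = shared["function_code"]
        # pair the code with every test case; exec runs once per pair
        return [(code, tc) for tc in shared["test_cases"]]

    def exec(self, item):
        code, test_case = item
        output, error = execute_python(code, test_case["input"])
        passed = error is None and output == test_case["expected"]
        return {"test_case": test_case, "passed": passed,
                "output": output, "error": error}

    def post(self, shared, prep_res, exec_res_list):
        shared["test_results"] = exec_res_list
        return "success" if all(r["passed"] for r in exec_res_list) else "failure"
```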
- Revise Node (Agent with Structured Output)
  - Purpose: Analyze test failures and output structured revisions for the test cases and/or function
  - Type: Regular Node (Agent decision-making)
  - Steps:
    - prep: Read test results, test cases, function code, and iteration count from the shared store
    - exec: Call LLM to analyze failures and output structured YAML with revised test cases and/or function code
    - post: Update shared["test_cases"] and/or shared["function_code"] based on the structured output
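A sketch of the agent node; the YAML schema (`test_cases` / `function_code` keys) is an assumption about the structured-output format, and a full version would also stop revising once `iteration_count` reaches `max_iterations`:

```python
import yaml
from pocketflow import Node
from utils.call_llm import call_llm

class Revise(Node):
    def prep(self, shared):
        return (shared["test_results"], shared["test_cases"],
                shared["function_code"])

    def exec(self, inputs):
        results, test_cases, function_code = inputs
        failures = [r for r in results if not r["passed"]]
        prompt = (
            f"Function:\n{function_code}\n\nCurrent test cases:\n{test_cases}\n\n"
            f"Failing results:\n{failures}\n\n"
            "Decide whether the test cases or the function are wrong. Reply in "
            "YAML with optional keys `test_cases` (full revised list) and "
            "`function_code` (full revised source)."
        )
        return yaml.safe_load(call_llm(prompt))

    def post(self, shared, prep_res, exec_res):
        # apply whichever revisions the agent produced
        if exec_res.get("test_cases"):
            shared["test_cases"] = exec_res["test_cases"]
        if exec_res.get("function_code"):
            shared["function_code"] = exec_res["function_code"]
        shared["iteration_count"] += 1  # guard against endless revision loops
```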