# Design Doc: PocketFlow Code Generator
> Please DON'T remove notes for AI
## Requirements
> Notes for AI: Keep it simple and clear.
> If the requirements are abstract, write concrete user stories
**User Story**: As a developer, I want an AI system that can take a LeetCode-style coding problem and automatically:
1. Generate comprehensive test cases including edge cases
2. Implement a solution function
3. Test the implementation against the test cases
4. When tests fail, intelligently decide whether to revise the test cases or the function
5. Iterate until all tests pass
**Sample Problem**: Two Sum - Given an array of integers and a target, return indices of two numbers that add up to the target.
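For reference, one correct `run_code` for this sample problem (an illustration of the target output, not part of the pipeline itself) is the classic one-pass hash-map solution:

```python
def run_code(nums, target):
    # Map each seen value to its index; O(n) time, O(n) space.
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []
```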
This is well-suited for AI because:
- ✅ Routine task: Test case generation follows predictable patterns
- ✅ Creative task: Code generation from a clear problem description
- ✅ Clear decision criteria: Choosing whether to revise the tests or the implementation
## Flow Design
> Notes for AI:
> 1. Consider the design patterns of agent, map-reduce, rag, and workflow. Apply them if they fit.
> 2. Present a concise, high-level description of the workflow.
### Applicable Design Patterns:
1. **Workflow Pattern**: Sequential steps of test generation → coding → testing
2. **Agent Pattern**: Structured decision-making when tests fail
- *Context*: Test results, current test cases, and function code
- *Actions*: Structured output to revise test cases and/or function
### Flow High-level Design:
1. **Generate Test Cases**: Create comprehensive input/output test pairs from problem description
2. **Implement Function**: Write a `def run_code` function based on the problem and current test cases
3. **Run Tests**: Execute function against all test cases using batch processing
4. **Revise**: Analyze failures and output structured revisions for test cases and/or function
5. **Loop back to Run Tests** until all pass
```mermaid
flowchart TD
    start[Problem Input] --> generateTests[Generate Test Cases]
    generateTests --> implement[Implement Function]
    implement --> runTests[Run Tests - Batch]
    runTests --> decision{All Tests Pass?}
    decision -->|Yes| success[Success!]
    decision -->|No| revise[Revise]
    revise --> runTests
```
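A minimal sketch of how this flow could be wired with PocketFlow's transition operators, assuming the four node classes described under Node Design below (the `nodes` module path is an assumption about file layout):

```python
from pocketflow import Flow
from nodes import GenerateTestCases, ImplementFunction, RunTests, Revise  # assumed layout

def create_flow():
    generate_tests = GenerateTestCases()
    implement = ImplementFunction()
    run_tests = RunTests()
    revise = Revise()

    # Workflow pattern: sequential generation -> implementation -> testing
    generate_tests >> implement >> run_tests

    # Agent pattern: on failure, revise and loop back to testing
    run_tests - "failure" >> revise
    revise >> run_tests

    return Flow(start=generate_tests)
```

The flow ends when RunTests returns `"success"`, since no successor is wired for that action.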
## Utility Functions
> Notes for AI:
> 1. Understand the utility function definition thoroughly by reviewing the doc.
> 2. Include only the necessary utility functions, based on nodes in the flow.
1. **Call LLM** (`utils/call_llm.py`)
- *Input*: prompt (str)
- *Output*: response (str)
- Used by all LLM-powered nodes for generating tests, code, and analysis (both utilities are sketched after this list)
2. **Execute Python Code** (`utils/code_executor.py`)
- *Input*: function_code (str), test_case (dict)
- *Output*: test_result (dict with the test case, a passed flag, and error details)
- Used by RunTests batch node to safely execute generated code against individual test cases
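Minimal sketches of both utilities follow. The LLM wrapper assumes the OpenAI client; the model name is a placeholder rather than a confirmed project setting:

```python
# utils/call_llm.py
from openai import OpenAI

def call_llm(prompt: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The executor sketch assumes the generated code defines `run_code` and that test cases follow the shared-store schema below; the function name `execute_python` is also an assumption. Plain `exec` is not a true sandbox, so a production version would want subprocess isolation and timeouts:

```python
# utils/code_executor.py
def execute_python(function_code: str, test_case: dict) -> dict:
    namespace = {}
    try:
        exec(function_code, namespace)  # defines run_code in an isolated namespace
        output = namespace["run_code"](**test_case["input"])
        passed = output == test_case["expected"]
        error = None if passed else f"expected {test_case['expected']}, got {output}"
        return {"test_case": test_case, "passed": passed, "error": error}
    except Exception as e:
        return {"test_case": test_case, "passed": False, "error": str(e)}
```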
## Node Design
### Shared Memory
> Notes for AI: Try to minimize data redundancy
The shared memory structure is organized as follows:
```python
shared = {
    "problem": "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.",
    "test_cases": [
        {"input": {"nums": [2, 7, 11, 15], "target": 9}, "expected": [0, 1]},
        # ... more test cases
    ],
    "function_code": "def run_code(nums, target): ...",
    "test_results": [
        {"test_case": {...}, "passed": True/False, "error": "..."},
        # ... results for each test case
    ],
    "iteration_count": 0,
    "max_iterations": 5
}
```
### Node Steps
> Notes for AI: Carefully decide whether to use Batch/Async Node/Flow.

A combined code sketch of all four nodes follows the list.
1. **GenerateTestCases Node**
- *Purpose*: Create comprehensive test cases including edge cases from problem description
- *Type*: Regular Node
- *Steps*:
- *prep*: Read problem description from shared store
- *exec*: Call LLM to generate diverse test cases in structured format
- *post*: Store test cases directly in shared["test_cases"]
2. **ImplementFunction Node**
- *Purpose*: Generate a `def run_code` function based on the problem and current test cases
- *Type*: Regular Node
- *Steps*:
- *prep*: Read problem description and test cases from shared store
- *exec*: Call LLM to implement `def run_code` function with clean code output
- *post*: Store function code directly in shared["function_code"]
3. **RunTests Node**
- *Purpose*: Execute function against all test cases using batch processing
- *Type*: Batch Node
- *Steps*:
- *prep*: Read function code from shared store, return list of test cases
- *exec*: Use code executor utility to run function against each individual test case
- *post*: Store all results in shared["test_results"]; return "success" if every test passes, else "failure"
4. **Revise Node** (Agent with Structured Output)
- *Purpose*: Analyze test failures and output structured revisions for test cases and/or function
- *Type*: Regular Node (Agent decision-making)
- *Steps*:
- *prep*: Read test results, test cases, function code, iteration count from shared store
- *exec*: Call LLM to analyze failures and output structured YAML with revised test cases and/or function code
- *post*: Update shared["test_cases"] and/or shared["function_code"] based on structured output
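
A combined sketch of the four nodes under this design, assuming PocketFlow's `Node`/`BatchNode` base classes and the utility sketches above; prompts are abbreviated and the LLM output is assumed to be valid YAML (real code would validate it):

```python
import yaml
from pocketflow import Node, BatchNode
from utils.call_llm import call_llm
from utils.code_executor import execute_python  # assumed utility names

class GenerateTestCases(Node):
    def prep(self, shared):
        return shared["problem"]

    def exec(self, problem):
        # Prompt abbreviated; ask for a YAML list of {input, expected} pairs.
        response = call_llm(f"Generate diverse test cases (YAML list of input/expected) for:\n{problem}")
        return yaml.safe_load(response)

    def post(self, shared, prep_res, exec_res):
        shared["test_cases"] = exec_res

class ImplementFunction(Node):
    def prep(self, shared):
        return shared["problem"], shared["test_cases"]

    def exec(self, inputs):
        problem, test_cases = inputs
        return call_llm(f"Implement `def run_code` for:\n{problem}\nTest cases:\n{test_cases}")

    def post(self, shared, prep_res, exec_res):
        shared["function_code"] = exec_res

class RunTests(BatchNode):
    def prep(self, shared):
        # Pair the current function code with each test case for batch execution.
        code = shared["function_code"]
        return [(code, tc) for tc in shared["test_cases"]]

    def exec(self, item):
        code, test_case = item
        return execute_python(code, test_case)

    def post(self, shared, prep_res, exec_res_list):
        shared["test_results"] = exec_res_list
        return "success" if all(r["passed"] for r in exec_res_list) else "failure"

class Revise(Node):
    def prep(self, shared):
        return (shared["test_results"], shared["test_cases"],
                shared["function_code"], shared["iteration_count"])

    def exec(self, inputs):
        results, tests, code, iteration = inputs
        # Agent decision: ask for YAML with optional `test_cases` and/or `function_code` keys.
        response = call_llm(
            f"Failures:\n{results}\nTests:\n{tests}\nCode:\n{code}\n"
            "Return YAML with optional keys `test_cases` and `function_code` holding revisions.")
        return yaml.safe_load(response)

    def post(self, shared, prep_res, exec_res):
        if "test_cases" in exec_res:
            shared["test_cases"] = exec_res["test_cases"]
        if "function_code" in exec_res:
            shared["function_code"] = exec_res["function_code"]
        shared["iteration_count"] += 1  # a max_iterations guard could end the loop here
```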