pocketflow/cookbook/pocketflow-code-generator/README.md

5.7 KiB

PocketFlow Code Generator

An intelligent AI system that takes LeetCode-style coding problems and automatically generates comprehensive test cases, implements solutions, and iteratively improves them until all tests pass.

Features

  • Automatic Test Case Generation: Creates diverse test cases including edge cases
  • Intelligent Code Implementation: Generates run_code functions with proper algorithms
  • Iterative Improvement: Analyzes failures and decides whether to revise tests or code
  • Rich Debugging Output: Detailed progress tracking and validation

Getting Started

  1. Install required dependencies:
pip install -r requirements.txt
  1. Set up your Anthropic API key:

    export ANTHROPIC_API_KEY="your-api-key-here"
    

    Test your API key is working:

    python utils/call_llm.py
    
  2. Run the code generator with the default Two Sum problem:

python main.py
  1. Or provide your own problem:
python main.py "Reverse a linked list. Given the head of a singly linked list, reverse the list and return the reversed list."

How It Works

The system follows an intelligent workflow combining Agent and Workflow design patterns:

flowchart TD
    start[Problem Input] --> generateTests[Generate Test Cases]
    generateTests --> implement[Implement Function]
    implement --> runTests[Run Tests - Batch]
    runTests --> decision{All Tests Pass?}
    decision -->|Yes| success[Success!]
    decision -->|No| revise[Revise - Agent Decision]
    revise --> runTests
    decision -->|Max Iterations| maxIter[Max Iterations Reached]

The Process

  1. GenerateTestCases: Creates 5-7 comprehensive test cases from problem description
  2. ImplementFunction: Writes a run_code function based on problem and test cases
  3. RunTests: Executes function against all test cases using batch processing
  4. Revise: Analyzes failures and makes intelligent decisions to revise test cases and/or function code
  5. Loop: Continues until all tests pass or max iterations reached

Sample Output

Here's what you'll see when running the Two Sum example:

Starting PocketFlow Code Generator...

=== Generated 7 Test Cases ===
1. Basic case - solution at beginning
   input: {'nums': [2, 7, 11, 15], 'target': 9}
   expected: [0, 1]
2. Basic case - solution in middle
   input: {'nums': [3, 2, 4], 'target': 6}
   expected: [1, 2]
3. Edge case - minimum array size with duplicates
   input: {'nums': [3, 3], 'target': 6}
   expected: [0, 1]
4. Case with negative numbers
   input: {'nums': [-1, -2, -3, -4, -5], 'target': -8}
   expected: [2, 4]
5. Case with zero and negative target
   input: {'nums': [0, 4, 3, 0], 'target': 0}
   expected: [0, 3]
6. Case with solution at the end
   input: {'nums': [1, 2, 3, 4, 5, 6], 'target': 11}
   expected: [4, 5]
7. Larger array case
   input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
   expected: [2, 6]

=== Implemented Function ===
def run_code(nums, target):
    # Dictionary to store number -> index mapping
    num_to_index = {}
    
    # Iterate through the array
    for i, num in enumerate(nums):
        # Calculate what number we need to reach the target
        complement = target - num
        
        # Check if the complement exists in our map
        if complement in num_to_index:
            # Found the pair! Return indices
            return [num_to_index[complement], i]
        
        # Store current number and its index
        num_to_index[num] = i
    
    # Should never reach here given problem constraints
    return []

=== Test Results: 6/7 Passed ===
Failed tests:
1. Larger array case:
   error: Expected [2, 6], got [0, 7]
   expected: [2, 6]

=== Revisions (Iteration 1) ===
Revising test cases:
  Test 7: 'Larger array case' -> 'Larger array case'
    old input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
    new input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
    old expected: [2, 6]
    new expected: [0, 7]

=== Test Results: 7/7 Passed ===

Key Features

Intelligent Decision Making

The Revise node acts as an agent that analyzes test failures and decides whether to:

  • Fix test cases (if they have incorrect expected outputs)
  • Fix the function implementation (if the logic is wrong)
  • Or both

Structured Output with Validation

All LLM interactions use YAML format with:

  • Reasoning fields: Transparent decision-making process
  • Validation asserts: Ensures outputs match expected structure
  • Rich debugging: Comprehensive logging of all steps

Batch Processing

The RunTests node uses PocketFlow's BatchNode to efficiently test the function against all test cases in parallel.

Files

Design Patterns Used

  • Workflow: Sequential steps of test generation → coding → testing
  • Agent: Intelligent decision-making when tests fail
  • Batch: Efficient parallel test execution
  • Structured Output: YAML validation for reliable LLM outputs