# PocketFlow Code Generator
An intelligent AI system that takes LeetCode-style coding problems and automatically generates comprehensive test cases, implements solutions, and iteratively improves them until all tests pass.
- Check out the Substack Post Tutorial for more!
## Features
- Automatic Test Case Generation: Creates diverse test cases, including edge cases
- Intelligent Code Implementation: Generates `run_code` functions with proper algorithms
- Iterative Improvement: Analyzes failures and decides whether to revise tests or code
- Rich Debugging Output: Detailed progress tracking and validation
## Getting Started
1. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up your Anthropic API key:

   ```bash
   export ANTHROPIC_API_KEY="your-api-key-here"
   ```

3. Test that your API key is working:

   ```bash
   python utils/call_llm.py
   ```

4. Run the code generator with the default Two Sum problem:

   ```bash
   python main.py
   ```

   Or provide your own problem:

   ```bash
   python main.py "Reverse a linked list. Given the head of a singly linked list, reverse the list and return the reversed list."
   ```
## How It Works
The system follows an intelligent workflow combining Agent and Workflow design patterns:
```mermaid
flowchart TD
    start[Problem Input] --> generateTests[Generate Test Cases]
    generateTests --> implement[Implement Function]
    implement --> runTests[Run Tests - Batch]
    runTests --> decision{All Tests Pass?}
    decision -->|Yes| success[Success!]
    decision -->|No| revise[Revise - Agent Decision]
    revise --> runTests
    decision -->|Max Iterations| maxIter[Max Iterations Reached]
```
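In code, this branching maps onto PocketFlow's transition operators. The sketch below is illustrative only, assuming the node classes listed under The Process and a `create_flow` factory like the one in `flow.py`; the action strings ("failure", etc.) are placeholders for whatever `RunTests` actually returns:

```python
from pocketflow import Flow

from nodes import GenerateTestCases, ImplementFunction, RunTests, Revise


def create_flow():
    generate_tests = GenerateTestCases()
    implement = ImplementFunction()
    run_tests = RunTests()
    revise = Revise()

    # Default transitions: tests -> implementation -> execution
    generate_tests >> implement
    implement >> run_tests

    # Branch on the action string returned by RunTests.post()
    run_tests - "failure" >> revise  # some tests failed: let the agent revise
    revise >> run_tests              # re-run the full suite after revision
    # "success" and "max_iterations" have no outgoing edge, so the flow ends

    return Flow(start=generate_tests)
```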
### The Process
1. **GenerateTestCases**: Creates 5-7 comprehensive test cases from the problem description
2. **ImplementFunction**: Writes a `run_code` function based on the problem and test cases
3. **RunTests**: Executes the function against all test cases using batch processing
4. **Revise**: Analyzes failures and makes intelligent decisions to revise the test cases and/or the function code
5. **Loop**: Continues until all tests pass or the maximum number of iterations is reached (a minimal driver sketch follows this list)
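As a concrete driver, `main.py` boils down to seeding a shared store and running the flow. A minimal sketch, where the shared-store keys and the `max_iterations` default are assumptions for illustration:

```python
import sys

from flow import create_flow  # factory name assumed; see the sketch above

DEFAULT_PROBLEM = (
    "Two Sum: given an array of integers nums and an integer target, "
    "return the indices of the two numbers that add up to target."
)


def main():
    # Use the problem passed on the command line, else the default
    problem = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROBLEM

    # All nodes read from and write to this shared store
    shared = {
        "problem": problem,
        "iteration": 0,
        "max_iterations": 3,  # bound on the revise/re-run loop (assumed default)
    }

    create_flow().run(shared)


if __name__ == "__main__":
    main()
```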
## Sample Output
Here's what you'll see when running the Two Sum example:
```
Starting PocketFlow Code Generator...

=== Generated 7 Test Cases ===
1. Basic case - solution at beginning
   input: {'nums': [2, 7, 11, 15], 'target': 9}
   expected: [0, 1]
2. Basic case - solution in middle
   input: {'nums': [3, 2, 4], 'target': 6}
   expected: [1, 2]
3. Edge case - minimum array size with duplicates
   input: {'nums': [3, 3], 'target': 6}
   expected: [0, 1]
4. Case with negative numbers
   input: {'nums': [-1, -2, -3, -4, -5], 'target': -8}
   expected: [2, 4]
5. Case with zero and negative target
   input: {'nums': [0, 4, 3, 0], 'target': 0}
   expected: [0, 3]
6. Case with solution at the end
   input: {'nums': [1, 2, 3, 4, 5, 6], 'target': 11}
   expected: [4, 5]
7. Larger array case
   input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
   expected: [2, 6]

=== Implemented Function ===
def run_code(nums, target):
    # Dictionary to store number -> index mapping
    num_to_index = {}

    # Iterate through the array
    for i, num in enumerate(nums):
        # Calculate what number we need to reach the target
        complement = target - num

        # Check if the complement exists in our map
        if complement in num_to_index:
            # Found the pair! Return indices
            return [num_to_index[complement], i]

        # Store current number and its index
        num_to_index[num] = i

    # Should never reach here given problem constraints
    return []

=== Test Results: 6/7 Passed ===
Failed tests:
  1. Larger array case:
     error: Expected [2, 6], got [0, 7]
     expected: [2, 6]

=== Revisions (Iteration 1) ===
Revising test cases:
  Test 7: 'Larger array case' -> 'Larger array case'
    old input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
    new input: {'nums': [5, 75, 25, 45, 42, 2, 11, 9, 55, 12], 'target': 14}
    old expected: [2, 6]
    new expected: [0, 7]

=== Test Results: 7/7 Passed ===
```
## Key Features
### Intelligent Decision Making
The Revise node acts as an agent that analyzes test failures and decides whether to:
- Fix test cases (if they have incorrect expected outputs)
- Fix the function implementation (if the logic is wrong)
- Or both
### Structured Output with Validation
All LLM interactions use YAML format with:
- Reasoning fields: Transparent decision-making process
- Validation asserts: Ensure outputs match the expected structure (see the sketch below this list)
- Rich debugging: Comprehensive logging of all steps
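For instance, a Revise decision might come back as YAML and be checked before use. The exact schema lives in `nodes.py`; the field names below are hypothetical:

```python
import yaml

# Hypothetical Revise response; the real prompt and schema live in nodes.py
raw = """
reasoning: |
  Test 7 expects [2, 6], but nums[0] + nums[7] = 5 + 9 = 14,
  so the test's expected output is wrong, not the implementation.
revise_tests: true
revise_code: false
"""

decision = yaml.safe_load(raw)

# Validation asserts: fail fast if the LLM output drifts from the schema
assert isinstance(decision, dict)
assert "reasoning" in decision
assert isinstance(decision["revise_tests"], bool)
assert isinstance(decision["revise_code"], bool)
```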
### Batch Processing
The `RunTests` node uses PocketFlow's `BatchNode` to efficiently run the function against every test case in a single batched pass (a simplified sketch follows).
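In a `BatchNode`, `prep()` returns a list, `exec()` runs once per item, and `post()` sees all the results. Here is a simplified version of what `RunTests` might look like; the shared-store keys and the `run_one_test` helper are illustrative stand-ins for the real code in `nodes.py` and `utils/code_executor.py`:

```python
from pocketflow import BatchNode


def run_one_test(code, test_input, expected):
    # Minimal stand-in for utils/code_executor.py: exec the generated
    # source, then call run_code with the test's keyword arguments
    namespace = {}
    exec(code, namespace)
    output = namespace["run_code"](**test_input)
    return output == expected, output


class RunTests(BatchNode):
    def prep(self, shared):
        # Pair the current code with each test case so exec() is self-contained
        return [(shared["function_code"], tc) for tc in shared["test_cases"]]

    def exec(self, item):
        code, test_case = item  # exec() is called once per item from prep()
        passed, output = run_one_test(code, test_case["input"], test_case["expected"])
        return {"test_case": test_case, "passed": passed, "output": output}

    def post(self, shared, prep_res, exec_res_list):
        shared["test_results"] = exec_res_list
        failed = [r for r in exec_res_list if not r["passed"]]
        # The returned action drives the flowchart branch above
        return "success" if not failed else "failure"
```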
## Files

- `main.py`: Entry point with the sample Two Sum problem
- `flow.py`: Connects all nodes into the complete workflow
- `nodes.py`: Core logic nodes with validation and debugging
- `utils/call_llm.py`: Anthropic Claude API wrapper
- `utils/code_executor.py`: Safe Python code execution utility
- `doc/design.md`: Detailed system design documentation
## Design Patterns Used
- Workflow: Sequential steps of test generation → coding → testing
- Agent: Intelligent decision-making when tests fail
- Batch: Efficient parallel test execution
- Structured Output: YAML validation for reliable LLM outputs