diff --git a/cookbook/pocketflow-code-generator/doc/design.md b/cookbook/pocketflow-code-generator/doc/design.md
new file mode 100644
index 0000000..95cfd25
--- /dev/null
+++ b/cookbook/pocketflow-code-generator/doc/design.md
@@ -0,0 +1,131 @@
+# Design Doc: PocketFlow Code Generator
+
+> Please DON'T remove notes for AI
+
+## Requirements
+
+> Notes for AI: Keep it simple and clear.
+> If the requirements are abstract, write concrete user stories
+
+**User Story**: As a developer, I want an AI system that can take a LeetCode-style coding problem and automatically:
+
+1. Generate comprehensive test cases, including edge cases
+2. Implement a solution function
+3. Test the implementation against the test cases
+4. When tests fail, intelligently decide whether to revise the test cases or the function
+5. Iterate until all tests pass
+
+**Sample Problem**: Two Sum. Given an array of integers and a target, return the indices of the two numbers that add up to the target.
+
+This is well suited to AI because:
+
+- ✅ Routine task: test case generation follows predictable patterns
+- ✅ Creative task: code generation from a clear problem description
+- ✅ Clear decision criteria: whether to revise the tests or the implementation
+
+## Flow Design
+
+> Notes for AI:
+> 1. Consider the design patterns of agent, map-reduce, rag, and workflow. Apply them if they fit.
+> 2. Present a concise, high-level description of the workflow.
+
+### Applicable Design Patterns:
+
+1. **Workflow Pattern**: sequential steps of test generation → coding → testing
+2. **Agent Pattern**: decision-making with structured output when tests fail
+   - *Context*: test results, current test cases, and function code
+   - *Actions*: structured output that revises the test cases and/or the function
+
+### High-Level Flow Design:
+
+1. **Generate Test Cases**: create comprehensive input/output test pairs from the problem description
+2. **Implement Function**: write a `run_code` function based on the problem and the current test cases
+3. **Run Tests**: execute the function against all test cases using batch processing
+4. **Revise**: analyze failures and output structured revisions to the test cases and/or the function
+5. **Loop back to Run Tests** until all tests pass (or `max_iterations` is reached)
+
+```mermaid
+flowchart TD
+    start[Problem Input] --> generateTests[Generate Test Cases]
+    generateTests --> implement[Implement Function]
+    implement --> runTests[Run Tests - Batch]
+    runTests --> decision{All Tests Pass?}
+    decision -->|Yes| success[Success!]
+    decision -->|No| revise[Revise]
+    revise --> runTests
+```
+
+## Utility Functions
+
+> Notes for AI:
+> 1. Understand the utility function definition thoroughly by reviewing the doc.
+> 2. Include only the necessary utility functions, based on nodes in the flow.
+
+1. **Call LLM** (`utils/call_llm.py`)
+   - *Input*: prompt (str)
+   - *Output*: response (str)
+   - Used by all LLM-powered nodes for generating tests, code, and analysis
+
+2. **Execute Python Code** (`utils/code_executor.py`)
+   - *Input*: function_code (str), test_case (dict)
+   - *Output*: test_result (dict with the test case, a passed flag, and error details)
+   - Used by the RunTests batch node to safely execute generated code against individual test cases
+
+Hedged sketches of both utilities appear under Implementation Sketches at the end of this doc.
+
+## Node Design
+
+### Shared Memory
+
+> Notes for AI: Try to minimize data redundancy
+
+The shared memory structure is organized as follows:
+
+```python
+shared = {
+    "problem": "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.",
+    "test_cases": [
+        {"input": {"nums": [2, 7, 11, 15], "target": 9}, "expected": [0, 1]},
+        # ... more test cases
+    ],
+    "function_code": "def run_code(nums, target): ...",
+    "test_results": [
+        # one entry per test case, e.g.
+        # {"test_case": {...}, "passed": False, "error": "..."},
+    ],
+    "iteration_count": 0,
+    "max_iterations": 5,
+}
+```
+
+### Node Steps
+
+> Notes for AI: Carefully decide whether to use Batch/Async Node/Flow.
+
+1. **GenerateTestCases Node**
+   - *Purpose*: create comprehensive test cases, including edge cases, from the problem description
+   - *Type*: Regular Node
+   - *Steps*:
+     - *prep*: read the problem description from the shared store
+     - *exec*: call the LLM to generate diverse test cases in a structured format
+     - *post*: store the test cases in shared["test_cases"]
+
+2. **ImplementFunction Node**
+   - *Purpose*: generate a `run_code` function based on the problem and the current test cases
+   - *Type*: Regular Node
+   - *Steps*:
+     - *prep*: read the problem description and test cases from the shared store
+     - *exec*: call the LLM to implement `run_code`, returning clean code output
+     - *post*: store the function code in shared["function_code"]
+
+3. **RunTests Node** (sketched at the end of this doc)
+   - *Purpose*: execute the function against all test cases using batch processing
+   - *Type*: Batch Node
+   - *Steps*:
+     - *prep*: read the function code from the shared store and return the list of test cases
+     - *exec*: use the code executor utility to run the function against one test case
+     - *post*: store all results in shared["test_results"]; return "success" if all pass, else "failure"
+
+4. **Revise Node** (agent with structured output; example output format at the end of this doc)
+   - *Purpose*: analyze test failures and output structured revisions to the test cases and/or the function
+   - *Type*: Regular Node (agent decision-making)
+   - *Steps*:
+     - *prep*: read the test results, test cases, function code, and iteration count from the shared store
+     - *exec*: call the LLM to analyze the failures and output structured YAML with revised test cases and/or function code
+     - *post*: update shared["test_cases"] and/or shared["function_code"] from the structured output, increment shared["iteration_count"], and loop back to Run Tests (stopping once shared["max_iterations"] is reached)
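+
+## Implementation Sketches
+
+> Notes for AI: The sketches below are illustrative, under stated assumptions; adapt rather than copy verbatim.
+
+A minimal sketch of `utils/call_llm.py`, assuming the OpenAI v1 Python client and an `OPENAI_API_KEY` environment variable; the model name is a placeholder and any provider could be swapped in:
+
+```python
+# utils/call_llm.py -- minimal sketch, assuming the `openai` v1 client.
+from openai import OpenAI
+
+def call_llm(prompt: str) -> str:
+    client = OpenAI()  # reads OPENAI_API_KEY from the environment
+    response = client.chat.completions.create(
+        model="gpt-4o",  # placeholder model name
+        messages=[{"role": "user", "content": prompt}],
+    )
+    return response.choices[0].message.content
+```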
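+
+A sketch of `utils/code_executor.py`; the function name `execute_test_case` is an assumption, and calling `exec` on an untrusted string is illustrative only (a real sandbox would use a subprocess with a timeout):
+
+```python
+# utils/code_executor.py -- illustrative sketch, NOT a security sandbox.
+def execute_test_case(function_code: str, test_case: dict) -> dict:
+    namespace = {}
+    try:
+        exec(function_code, namespace)  # defines run_code in the namespace
+        actual = namespace["run_code"](**test_case["input"])
+        passed = actual == test_case["expected"]
+        error = None if passed else f"expected {test_case['expected']}, got {actual}"
+        return {"test_case": test_case, "passed": passed, "error": error}
+    except Exception as e:  # syntax errors, runtime errors, missing run_code
+        return {"test_case": test_case, "passed": False, "error": repr(e)}
+```
+
+Strict equality is a simplification; for Two Sum, where either index order is valid, a real comparator might normalize both sides with `sorted(...)` first.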
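+
+Assuming PocketFlow's `BatchNode` contract (`prep` returns an iterable, `exec` runs once per item, `post` receives the list of results) and its `node - "action" >> next_node` transition syntax, the RunTests node might look like this:
+
+```python
+from pocketflow import BatchNode, Flow
+from utils.code_executor import execute_test_case  # sketched above
+
+class RunTests(BatchNode):
+    def prep(self, shared):
+        # pair the current function code with every test case
+        return [(shared["function_code"], tc) for tc in shared["test_cases"]]
+
+    def exec(self, item):
+        function_code, test_case = item
+        return execute_test_case(function_code, test_case)
+
+    def post(self, shared, prep_res, exec_res_list):
+        shared["test_results"] = exec_res_list
+        return "success" if all(r["passed"] for r in exec_res_list) else "failure"
+
+# Hypothetical wiring, assuming the other three nodes are defined similarly:
+# generate_tests >> implement >> run_tests
+# run_tests - "failure" >> revise
+# revise >> run_tests                 # default action loops back to testing
+# flow = Flow(start=generate_tests)   # "success" has no successor, so the flow ends
+```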
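+
+For the Revise node, the exec step could ask the LLM to emit YAML shaped like the example below. The field names (`analysis`, `revise_function`, and so on) are assumptions for illustration; the design only fixes that the output may revise the test cases and/or the function:
+
+```yaml
+analysis: |
+  run_code returns the values instead of their indices; the test cases look correct.
+revise_function: true
+function_code: |
+  def run_code(nums, target):
+      seen = {}
+      for i, n in enumerate(nums):
+          if target - n in seen:
+              return [seen[target - n], i]
+          seen[n] = i
+revise_test_cases: false
+test_cases: []
+```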