diff --git a/cookbook/pocketflow-streamlit-hitl/docs/design.md b/cookbook/pocketflow-streamlit-hitl/docs/design.md new file mode 100644 index 0000000..f34ccac --- /dev/null +++ b/cookbook/pocketflow-streamlit-hitl/docs/design.md @@ -0,0 +1,105 @@ +# Design Doc: PocketFlow Streamlit Image Generation HITL + +> Human-in-the-Loop image generation application using PocketFlow and Streamlit + +## Requirements + +**User Story**: As a user, I want to: +1. Enter a text prompt describing an image I want to generate +2. Have the system generate an image based on my prompt using OpenAI's image generation API +3. Review the generated image in the web interface +4. Approve the image if I'm satisfied, OR regenerate with the same prompt if I want a different result +5. See the final approved image as the completed result + +**Technical Requirements**: +- Use OpenAI's image generation API (via responses.create with image_generation tool) +- Keep generated images in memory (base64 format) - no disk storage +- Provide clear approve/regenerate workflow +- Handle API errors gracefully with retries +- Maintain session state between generations + +## Flow Design + +### Applicable Design Pattern: + +**State Machine with Multiple Subflows**: Each state has its own user interface and workflow. Users interact with different UI elements in each state, and the app transitions to the next state based on user actions and feedback. + +### States & User Interface: + +1. **initial_input** - User sees text input field, enters prompt, clicks "Generate Image" button +2. **user_feedback** - User sees generated image, has "Approve" and "Regenerate" buttons +3. **final** - User sees final approved image and "Start Over" button + +### Flow High-level Design & Transitions: + +```mermaid +flowchart TD + Start([Start]) --> IS[initial_input] + IS --> GI[GenerateImage] + GI --> UF[user_feedback] + UF -->|Regenerate| GI + UF -->|Approve| F[final] + F --> IS + + %% Legend + classDef stateStyle fill:#e1f5fe,stroke:#01579b,stroke-width:2px + classDef nodeStyle fill:#fff3e0,stroke:#e65100,stroke-width:2px + + class IS,UF,F stateStyle + class GI nodeStyle +``` + +**Legend:** +- 🔷 **Blue rectangles**: User interface states (initial_input, user_feedback, final) +- 🔶 **Orange rectangles**: PocketFlow processing nodes (GenerateImage) + +## Utility Functions + +1. **Generate Image** (`utils/generate_image.py`) + - *Input*: prompt (str) + - *Output*: base64 image data (str) + - *Purpose*: Calls OpenAI's image generation API and returns base64 encoded image + - *Error Handling*: Includes retry logic for API failures + +## Node Design + +### Shared Memory + +**Using Streamlit Session State as Shared Store**: We use `st.session_state` directly as the shared store for PocketFlow, eliminating the need for separate data structures. + +The session state structure for the image generation workflow: + +```python +st.session_state = { + # User input and workflow state + "task_input": "user's text prompt for image generation", + "stage": "current workflow stage (initial_input/user_feedback/final)", + "error_message": "any error messages for user feedback", + + # Processing data + "input_used_by_process": "prompt used for generation", + "generated_image": "base64 encoded image data", + "final_result": "final approved image data", + + # Streamlit built-in keys (managed automatically) + # "_streamlit_*": various internal streamlit state +} +``` + +### Node Steps + +**Initial Input Flow Nodes:** + +1. **Image Generation Node** + - *Purpose*: Generate image using OpenAI API based on the prompt + - *Type*: Regular (with retries for API reliability) + - *Steps*: + - *prep*: Read "input_used_by_process" from st.session_state + - *exec*: Call generate_image utility with the prompt, return base64 image data + - *post*: Write base64 image data to "generated_image" in st.session_state + +**User Feedback Flow:** +- Reuses the same `GenerateImage` node when user clicks "Regenerate" + +**Final Flow:** +- No processing nodes needed - the `final` state simply displays the approved image from `generated_image` and provides UI for starting over diff --git a/cookbook/pocketflow-streamlit-hitl/utils/generate_image.py b/cookbook/pocketflow-streamlit-hitl/utils/generate_image.py new file mode 100644 index 0000000..cdf372a --- /dev/null +++ b/cookbook/pocketflow-streamlit-hitl/utils/generate_image.py @@ -0,0 +1,30 @@ +from openai import OpenAI +import os +import base64 + +def generate_image(prompt: str) -> str: + client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) + + response = client.images.generate( + model="gpt-image-1", + prompt=prompt, + n=1, + size="1024x1024" + ) + + image_b64 = response.data[0].b64_json + print(f"Generated image ({len(image_b64)} chars)") + return image_b64 + +if __name__ == "__main__": + test_prompt = "A gray tabby cat hugging an otter with an orange scarf" + print(f"Generating image for prompt: {test_prompt[:50]}...") + + image_base64 = generate_image(test_prompt) + print(f"Success! Generated {len(image_base64)} characters of base64 data") + + # Write image to local file for testing + image_bytes = base64.b64decode(image_base64) + with open("test_generated_image.png", "wb") as f: + f.write(image_bytes) + print("Test image saved as test_generated_image.png") \ No newline at end of file diff --git a/cookbook/pocketflow-streamlit-hitl/utils/test_generated_image.png b/cookbook/pocketflow-streamlit-hitl/utils/test_generated_image.png new file mode 100644 index 0000000..f0f4277 Binary files /dev/null and b/cookbook/pocketflow-streamlit-hitl/utils/test_generated_image.png differ