# Design Doc: FastAPI WebSocket Chat Interface

> Please DON'T remove notes for AI
## Requirements

> Notes for AI: Keep it simple and clear. If the requirements are abstract, write concrete user stories.
User Story: As a user, I want to interact with an AI chatbot through a web interface where:
- I can send messages and receive real-time streaming responses
- The connection stays persistent (WebSocket)
- I can see the AI response being typed out in real-time as the LLM generates it
- The interface is minimal and easy to use
Technical Requirements:
- FastAPI backend with WebSocket support
- Real-time bidirectional communication
- True LLM streaming integration using PocketFlow AsyncNode
- Simple HTML/JavaScript frontend
- Minimal dependencies
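
To keep dependencies minimal, the install list can stay very short. A sketch is below; package names are assumptions (in particular the PyPI name for PocketFlow) and versions are left unpinned:

```text
# requirements.txt (sketch)
fastapi             # web framework with WebSocket support
uvicorn[standard]   # ASGI server, includes websockets
openai              # streaming chat completions client
pocketflow          # AsyncNode / AsyncFlow (assumed package name)
```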
## Flow Design

> Notes for AI:
> - Consider the design patterns of agent, map-reduce, rag, and workflow. Apply them if they fit.
> - Present a concise, high-level description of the workflow.
### Applicable Design Pattern

Single Async Node Pattern: one PocketFlow AsyncNode handles the whole turn, calling the LLM with streaming enabled and forwarding each chunk to the WebSocket as it arrives.
### Flow High-level Design

PocketFlow AsyncFlow: just one async node.
- Streaming Chat Node: processes the user message, calls the LLM with real streaming, and sends chunks to the WebSocket immediately.

Integration: the FastAPI WebSocket endpoint runs the PocketFlow AsyncFlow for each incoming message (see the endpoint sketch after the diagram).
```mermaid
flowchart TD
    user((User Browser)) --> websocket(FastAPI WebSocket)
    websocket --> flow[Streaming Chat AsyncNode]
    flow --> websocket
    websocket --> user

    style user fill:#e1f5fe
    style websocket fill:#f3e5f5
    style flow fill:#e8f5e8,stroke:#4caf50,stroke-width:3px
```
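
To make the integration concrete, here is a minimal sketch of the endpoint. The module name `nodes`, the `chat_flow` object, and the `run_async(shared)` call are assumptions about how the AsyncFlow from the Node Design section is exposed:

```python
# main.py (sketch): FastAPI WebSocket endpoint that runs the AsyncFlow per message
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

from nodes import chat_flow  # assumed module exposing the AsyncFlow defined below

app = FastAPI()

@app.websocket("/ws")
async def chat(websocket: WebSocket):
    await websocket.accept()
    # One shared store per connection keeps conversation history scoped to this user
    shared = {"websocket": websocket, "user_message": "", "conversation_history": []}
    try:
        while True:
            shared["user_message"] = await websocket.receive_text()
            await chat_flow.run_async(shared)  # node streams chunks back over the socket
    except WebSocketDisconnect:
        pass  # client closed the connection
```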
## Utility Functions

> Notes for AI:
> - Understand the utility function definition thoroughly by reviewing the doc.
> - Include only the necessary utility functions, based on nodes in the flow.
- Stream LLM (`utils/stream_llm.py`)
  - Input: messages (list of chat history)
  - Output: generator yielding real-time response chunks from the OpenAI API
  - Used by the streaming chat node to receive LLM chunks as they are generated
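
A minimal sketch of this utility, assuming the `openai` v1 async client and a placeholder model name; an async generator is used so the AsyncNode can consume chunks without blocking the event loop:

```python
# utils/stream_llm.py (sketch): async generator yielding response chunks
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def stream_llm(messages):
    # stream=True makes the API return chunks as the model generates them
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whichever model fits
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:  # skip empty / role-only deltas
            yield content
```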
## Node Design

### Shared Store

> Notes for AI: Try to minimize data redundancy
The shared store structure is organized as follows:
```python
shared = {
    "websocket": None,           # WebSocket connection object
    "user_message": "",          # Current user message
    "conversation_history": []   # List of message history with roles
}
```
### Node Steps

> Notes for AI: Carefully decide whether to use Batch/Async Node/Flow.
- Streaming Chat Node
  - Purpose: process the user message, call the LLM with real streaming, and send chunks immediately via the WebSocket
  - Type: AsyncNode (for real-time streaming)
  - Steps (sketched in code after this list):
    - prep: read the user message and append it to the conversation history
    - exec_async: call the streaming LLM utility and forward each chunk to the WebSocket as it is received
    - post: update the conversation history with the complete assistant response
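
A minimal sketch of this node and the flow wiring, assuming PocketFlow's AsyncNode exposes `prep_async`/`exec_async`/`post_async` and that `AsyncFlow(start=...)` builds the flow; the JSON message shape (`type`/`content`) sent to the browser is also an assumption:

```python
# nodes.py (sketch)
from pocketflow import AsyncNode, AsyncFlow

from utils.stream_llm import stream_llm

class StreamingChatNode(AsyncNode):
    async def prep_async(self, shared):
        # Append the new user message to the running conversation history
        history = shared.setdefault("conversation_history", [])
        history.append({"role": "user", "content": shared["user_message"]})
        return shared["websocket"], history

    async def exec_async(self, prep_res):
        websocket, history = prep_res
        full_response = ""
        # Forward each chunk to the browser the moment the LLM produces it
        async for chunk in stream_llm(history):
            full_response += chunk
            await websocket.send_json({"type": "chunk", "content": chunk})
        await websocket.send_json({"type": "end"})  # tell the UI the reply is complete
        return full_response

    async def post_async(self, shared, prep_res, exec_res):
        # Store the full assistant reply so the next turn has complete context
        shared["conversation_history"].append({"role": "assistant", "content": exec_res})
        return "default"

chat_flow = AsyncFlow(start=StreamingChatNode())
```

Keeping the WebSocket send inside exec_async is what produces the typed-out effect: each chunk reaches the browser before the next one is generated, and post only runs once the full response is assembled.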