add chat with memory tutorial
Commit 154e37528b (parent a0a1a4cadd)
@@ -1,91 +1,105 @@
-# Retrieval Augmented Generation (RAG)
-
-This project demonstrates a simplified RAG system that retrieves relevant documents based on user queries.
-
-## Features
-
-- Simple vector-based document retrieval
-- Two-stage pipeline (offline indexing, online querying)
-- FAISS-powered similarity search
-
-## Getting Started
-
-1. Install the required dependencies:
-
-```bash
-pip install -r requirements.txt
-```
-
-2. Run the application with a sample query:
-
-```bash
-python main.py --"Large Language Model"
-```
-
-3. Or run without arguments to use the default query:
-
-```bash
-python main.py
-```
-
-## API Key
-
-By default, demo uses dummy embedding based on character frequencies. To use real OpenAI embedding:
-
-1. Edit nodes.py to replace the dummy `get_embedding` with `get_openai_embedding`:
-```python
-# Change this line:
-query_embedding = get_embedding(query)
-# To this:
-query_embedding = get_openai_embedding(query)
-
-# And also change this line:
-return get_embedding(text)
-# To this:
-return get_openai_embedding(text)
-```
-
-2. Make sure your OpenAI API key is set:
-```bash
-export OPENAI_API_KEY="your-api-key-here"
-```
-
-## How It Works
-
-The magic happens through a two-stage pipeline implemented with PocketFlow:
-
-```mermaid
-graph TD
-    subgraph OfflineFlow[Offline Document Indexing]
-        EmbedDocs[EmbedDocumentsNode] --> CreateIndex[CreateIndexNode]
-    end
-
-    subgraph OnlineFlow[Online Query Processing]
-        EmbedQuery[EmbedQueryNode] --> RetrieveDoc[RetrieveDocumentNode]
-    end
-```
-
-Here's what each part does:
-1. **EmbedDocumentsNode**: Converts documents into vector representations
-2. **CreateIndexNode**: Creates a searchable FAISS index from embeddings
-3. **EmbedQueryNode**: Converts user query into the same vector space
-4. **RetrieveDocumentNode**: Finds the most similar document using vector search
-
-## Example Output
-
-```
-✅ Created 5 document embeddings
-🔍 Creating search index...
-✅ Index created with 5 vectors
-🔍 Embedding query: Large Language Model
-🔎 Searching for relevant documents...
-📄 Retrieved document (index: 3, distance: 0.3296)
-📄 Most relevant text: "PocketFlow is a 100-line Large Language Model Framework."
-```
-
-## Files
-
-- [`main.py`](./main.py): Main entry point for running the RAG demonstration
-- [`flow.py`](./flow.py): Configures the flows that connect the nodes
-- [`nodes.py`](./nodes.py): Defines the nodes for document processing and retrieval
-- [`utils.py`](./utils.py): Utility functions including the embedding function
+# PocketFlow Chat with Memory
+
+A chat application with memory retrieval using PocketFlow. This example maintains a sliding window of recent conversations while retrieving relevant past conversations based on context.
+
+## Features
+
+- Maintains a window of 3 most recent conversation pairs
+- Archives older conversations with embeddings
+- Uses vector similarity to retrieve the most relevant past conversation
+- Combines recent context (3 pairs) with retrieved context (1 pair) for better responses
+
+## Run It
+
+1. Make sure your OpenAI API key is set:
+
+```bash
+export OPENAI_API_KEY="your-api-key-here"
+```
+
+2. Install requirements and run the application:
+
+```bash
+pip install -r requirements.txt
+python main.py
+```
+
+## How It Works
+
+```mermaid
+flowchart LR
+    Question[GetUserQuestionNode] -->|retrieve| Retrieve[RetrieveNode]
+    Retrieve -->|answer| Answer[AnswerNode]
+    Answer -->|question| Question
+    Answer -->|embed| Embed[EmbedNode]
+    Embed -->|question| Question
+```
+
+The chat application uses:
+- Four specialized nodes:
+  - `GetUserQuestionNode`: Handles interactive user input
+  - `RetrieveNode`: Finds relevant past conversations using vector similarity
+  - `AnswerNode`: Generates responses using both recent and retrieved context
+  - `EmbedNode`: Archives older conversations with embeddings
+- A sliding window approach that maintains only the 3 most recent conversation pairs in active context
+
+## Files
+
+- [`nodes.py`](./nodes.py): Four node implementations with clear separation of concerns
+- [`flow.py`](./flow.py): Chat flow structure definition
+- [`main.py`](./main.py): Entry point for running the demo
+- [`utils/`](./utils/): Utility functions for embeddings, LLM calls, and vector operations
+
+## Example Output
+
+```
+==================================================
+PocketFlow Chat with Memory
+==================================================
+This chat keeps your 3 most recent conversations
+and brings back relevant past conversations when helpful
+Type 'exit' to end the conversation
+==================================================
+Welcome to the interactive chat! Type 'exit' to end the conversation.
+
+You: remember, my cat name is Whiskers and she only drinks from running water taps
+
+Assistant: Got it! Whiskers only drinks from running water taps. If you have any questions or need advice related to her, feel free to ask!
+
+You: remember, I have a peanut allergy but I can eat almonds and cashews without problems
+
+Assistant: Understood! You have a peanut allergy but can safely eat almonds and cashews. If you need any assistance or information related to your dietary preferences or restrictions, just let me know!
+
+You: remember, my anniversary is June 17th and I always forget to buy a gift until the last minute
+
+Assistant: Got it! Your anniversary is on June 17th, and you often remember to buy a gift at the last minute. If you'd like, I can help you brainstorm gift ideas or set reminders as the date approaches!
+
+You: remember, I lived in Portugal for 3 years when I was a child and still speak some Portuguese
+
+Assistant: That's great to know! You lived in Portugal for three years during your childhood and still speak some Portuguese. If you ever want to practice or need help with anything related to Portuguese language or culture, feel free to ask!
+✅ Added conversation to index at position 0
+✅ Index now contains 1 conversations
+
+You: remember, my favorite dessert is tiramisu but only if it's made with real mascarpone cheese
+🔍 Finding relevant conversation for: remember, my favorite dessert ...
+📄 Retrieved conversation (distance: 0.5008)
+
+Assistant: Got it! Your favorite dessert is tiramisu, but only when it's made with real mascarpone cheese. If you ever need recommendations or recipes, just let me know!
+✅ Added conversation to index at position 1
+✅ Index now contains 2 conversations
+
+You: remember, I collect vintage mechanical watches and my most valuable one is a 1965 Omega Seamaster
+🔍 Finding relevant conversation for: remember, I collect vintage me...
+📄 Retrieved conversation (distance: 0.5374)
+
+Assistant: Got it! You collect vintage mechanical watches, and your most valuable piece is a 1965 Omega Seamaster. If you have questions about watches or need assistance with your collection, feel free to reach out!
+✅ Added conversation to index at position 2
+✅ Index now contains 3 conversations
+
+You: what's my cat name?
+🔍 Finding relevant conversation for: what's my cat name?...
+📄 Retrieved conversation (distance: 0.3643)
+
+Assistant: Your cat's name is Whiskers.
+✅ Added conversation to index at position 3
+✅ Index now contains 4 conversations
+```
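The "3 recent pairs + 1 retrieved pair" behaviour described above can be pictured with a small sketch. This is not code from the commit; it mirrors the logic of `AnswerNode.prep()` further down, and the helper name `build_context` is made up for illustration.

```python
# Illustrative only: how the LLM context is assembled each turn.
def build_context(messages, retrieved_pair):
    # Keep at most the 3 most recent user/assistant pairs (6 messages).
    recent_window = messages[-6:]

    context = []
    if retrieved_pair:
        # One relevant archived pair, bracketed by system notes.
        context.append({"role": "system",
                        "content": "The following is a relevant past conversation:"})
        context.extend(retrieved_pair)
        context.append({"role": "system",
                        "content": "Now continue the current conversation:"})

    # Total context sent to the LLM: up to 3 recent pairs + 1 retrieved pair.
    context.extend(recent_window)
    return context
```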
@@ -1,22 +1,33 @@
-from pocketflow import Flow
-from nodes import EmbedDocumentsNode, CreateIndexNode, EmbedQueryNode, RetrieveDocumentNode
-
-def get_offline_flow():
-    # Create offline flow for document indexing
-    embed_docs_node = EmbedDocumentsNode()
-    create_index_node = CreateIndexNode()
-    embed_docs_node >> create_index_node
-    offline_flow = Flow(start=embed_docs_node)
-    return offline_flow
-
-def get_online_flow():
-    # Create online flow for document retrieval
-    embed_query_node = EmbedQueryNode()
-    retrieve_doc_node = RetrieveDocumentNode()
-    embed_query_node >> retrieve_doc_node
-    online_flow = Flow(start=embed_query_node)
-    return online_flow
-
-# Initialize flows
-offline_flow = get_offline_flow()
-online_flow = get_online_flow()
+from pocketflow import Flow
+from nodes import GetUserQuestionNode, RetrieveNode, AnswerNode, EmbedNode
+
+def create_chat_flow():
+    # Create the nodes
+    question_node = GetUserQuestionNode()
+    retrieve_node = RetrieveNode()
+    answer_node = AnswerNode()
+    embed_node = EmbedNode()
+
+    # Connect the flow:
+    # 1. Start with getting a question
+    # 2. Retrieve relevant conversations
+    # 3. Generate an answer
+    # 4. Optionally embed old conversations
+    # 5. Loop back to get the next question
+
+    # Main flow path
+    question_node - "retrieve" >> retrieve_node
+    retrieve_node - "answer" >> answer_node
+
+    # When we need to embed old conversations
+    answer_node - "embed" >> embed_node
+
+    # Loop back for next question
+    answer_node - "question" >> question_node
+    embed_node - "question" >> question_node
+
+    # Create the flow starting with question node
+    return Flow(start=question_node)
+
+# Initialize the flow
+chat_flow = create_chat_flow()
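For readers new to PocketFlow: each `node - "action" >> other` line above keys a transition to the string returned by the upstream node's `post()` method. A rough routing summary, assembled from the node code later in this commit (the dict itself is illustrative, not part of the project):

```python
# Which post() action leads where in the chat flow above.
transitions = {
    ("GetUserQuestionNode", "retrieve"): "RetrieveNode",
    ("RetrieveNode", "answer"): "AnswerNode",
    ("AnswerNode", "question"): "GetUserQuestionNode",  # normal loop
    ("AnswerNode", "embed"): "EmbedNode",                # window exceeded 3 pairs
    ("EmbedNode", "question"): "GetUserQuestionNode",
}
# Returning None from post() ends the flow here, since no "default"
# transition is registered; that is how typing 'exit' stops the chat.
```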
@@ -1,55 +1,26 @@
-import sys
-from flow import offline_flow, online_flow
-
-def run_rag_demo():
-    """
-    Run a demonstration of the RAG system.
-
-    This function:
-    1. Indexes a set of sample documents (offline flow)
-    2. Takes a query from the command line
-    3. Retrieves the most relevant document (online flow)
-    """
-
-    # Sample texts - corpus of documents to search
-    texts = [
-        "The quick brown fox jumps over the lazy dog.",
-        "Machine learning is a subset of artificial intelligence.",
-        "Python is a popular programming language for data science.",
-        "PocketFlow is a 100-line Large Language Model Framework.",
-        "The weather is sunny and warm today.",
-    ]
-
-    print("=" * 50)
-    print("PocketFlow RAG Document Retrieval")
-    print("=" * 50)
-
-    # Default query
-    default_query = "Large Language Model"
-
-    # Get query from command line if provided with --
-    query = default_query
-    for arg in sys.argv[1:]:
-        if arg.startswith("--"):
-            query = arg[2:]
-            break
-
-    # Single shared store for both flows
-    shared = {
-        "texts": texts,
-        "embeddings": None,
-        "index": None,
-        "query": query,
-        "query_embedding": None,
-        "retrieved_document": None
-    }
-
-    # Initialize and run the offline flow (document indexing)
-    offline_flow.run(shared)
-
-    # Run the online flow to retrieve the most relevant document
-    online_flow.run(shared)
-
-if __name__ == "__main__":
-    run_rag_demo()
+from flow import chat_flow
+
+def run_chat_memory_demo():
+    """
+    Run an interactive chat interface with memory retrieval.
+
+    Features:
+    1. Maintains a window of the 3 most recent conversation pairs
+    2. Archives older conversations with embeddings
+    3. Retrieves 1 relevant past conversation when needed
+    4. Total context to LLM: 3 recent pairs + 1 retrieved pair
+    """
+    print("=" * 50)
+    print("PocketFlow Chat with Memory")
+    print("=" * 50)
+    print("This chat keeps your 3 most recent conversations")
+    print("and brings back relevant past conversations when helpful")
+    print("Type 'exit' to end the conversation")
+    print("=" * 50)
+
+    # Run the chat flow
+    chat_flow.run({})
+
+if __name__ == "__main__":
+    run_chat_memory_demo()
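`chat_flow.run({})` starts from an empty shared store; the nodes that follow populate it as the session goes on. Roughly, and only as an illustration based on the node code below (not a literal dump):

```python
shared = {
    "messages": [...],                # active window, at most 3 user/assistant pairs
    "retrieved_conversation": [...],  # the one archived pair pulled back in, or None
    "vector_index": ...,              # FAISS index, created lazily by EmbedNode
    "vector_items": [...],            # archived pairs, parallel to index positions
}
```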
@@ -1,95 +1,203 @@
-from pocketflow import Node, Flow, BatchNode
-import numpy as np
-import faiss
-from utils import get_embedding, get_openai_embedding
-
-# Nodes for the offline flow
-class EmbedDocumentsNode(BatchNode):
-    def prep(self, shared):
-        """Read texts from shared store and return as an iterable"""
-        return shared["texts"]
-
-    def exec(self, text):
-        """Embed a single text"""
-        return get_embedding(text)
-
-    def post(self, shared, prep_res, exec_res_list):
-        """Store embeddings in the shared store"""
-        embeddings = np.array(exec_res_list, dtype=np.float32)
-        shared["embeddings"] = embeddings
-        print(f"✅ Created {len(embeddings)} document embeddings")
-        return "default"
-
-class CreateIndexNode(Node):
-    def prep(self, shared):
-        """Get embeddings from shared store"""
-        return shared["embeddings"]
-
-    def exec(self, embeddings):
-        """Create FAISS index and add embeddings"""
-        print("🔍 Creating search index...")
-        dimension = embeddings.shape[1]
-
-        # Create a flat L2 index
-        index = faiss.IndexFlatL2(dimension)
-
-        # Add the embeddings to the index
-        index.add(embeddings)
-
-        return index
-
-    def post(self, shared, prep_res, exec_res):
-        """Store the index in shared store"""
-        shared["index"] = exec_res
-        print(f"✅ Index created with {exec_res.ntotal} vectors")
-        return "default"
-
-# Nodes for the online flow
-class EmbedQueryNode(Node):
-    def prep(self, shared):
-        """Get query from shared store"""
-        return shared["query"]
-
-    def exec(self, query):
-        """Embed the query"""
-        print(f"🔍 Embedding query: {query}")
-        query_embedding = get_embedding(query)
-        return np.array([query_embedding], dtype=np.float32)
-
-    def post(self, shared, prep_res, exec_res):
-        """Store query embedding in shared store"""
-        shared["query_embedding"] = exec_res
-        return "default"
-
-class RetrieveDocumentNode(Node):
-    def prep(self, shared):
-        """Get query embedding, index, and texts from shared store"""
-        return shared["query_embedding"], shared["index"], shared["texts"]
-
-    def exec(self, inputs):
-        """Search the index for similar documents"""
-        print("🔎 Searching for relevant documents...")
-        query_embedding, index, texts = inputs
-
-        # Search for the most similar document
-        distances, indices = index.search(query_embedding, k=1)
-
-        # Get the index of the most similar document
-        best_idx = indices[0][0]
-        distance = distances[0][0]
-
-        # Get the corresponding text
-        most_relevant_text = texts[best_idx]
-
-        return {
-            "text": most_relevant_text,
-            "index": best_idx,
-            "distance": distance
-        }
-
-    def post(self, shared, prep_res, exec_res):
-        """Store retrieved document in shared store"""
-        shared["retrieved_document"] = exec_res
-        print(f"📄 Retrieved document (index: {exec_res['index']}, distance: {exec_res['distance']:.4f})")
-        print(f"📄 Most relevant text: \"{exec_res['text']}\"")
-        return "default"
+from pocketflow import Node
+from utils.vector_index import create_index, add_vector, search_vectors
+from utils.call_llm import call_llm
+from utils.get_embedding import get_embedding
+
+class GetUserQuestionNode(Node):
+    def prep(self, shared):
+        """Initialize messages if first run"""
+        if "messages" not in shared:
+            shared["messages"] = []
+            print("Welcome to the interactive chat! Type 'exit' to end the conversation.")
+
+        return None
+
+    def exec(self, _):
+        """Get user input interactively"""
+        # Get interactive input from user
+        user_input = input("\nYou: ")
+
+        # Check if user wants to exit
+        if user_input.lower() == 'exit':
+            return None
+
+        return user_input
+
+    def post(self, shared, prep_res, exec_res):
+        # If exec_res is None, the user wants to exit
+        if exec_res is None:
+            print("\nGoodbye!")
+            return None  # End the conversation
+
+        # Add user message to current messages
+        shared["messages"].append({"role": "user", "content": exec_res})
+
+        return "retrieve"
+
+class AnswerNode(Node):
+    def prep(self, shared):
+        """Prepare context for the LLM"""
+        if not shared.get("messages"):
+            return None
+
+        # 1. Get the last 3 conversation pairs (or fewer if not available)
+        recent_messages = shared["messages"][-6:] if len(shared["messages"]) > 6 else shared["messages"]
+
+        # 2. Add the retrieved relevant conversation if available
+        context = []
+        if shared.get("retrieved_conversation"):
+            # Add a system message to indicate this is a relevant past conversation
+            context.append({
+                "role": "system",
+                "content": "The following is a relevant past conversation that may help with the current query:"
+            })
+            context.extend(shared["retrieved_conversation"])
+            context.append({
+                "role": "system",
+                "content": "Now continue the current conversation:"
+            })
+
+        # 3. Add the recent messages
+        context.extend(recent_messages)
+
+        return context
+
+    def exec(self, messages):
+        """Generate a response using the LLM"""
+        if messages is None:
+            return None
+
+        # Call LLM with the context
+        response = call_llm(messages)
+        return response
+
+    def post(self, shared, prep_res, exec_res):
+        """Process the LLM response"""
+        if prep_res is None or exec_res is None:
+            return None  # End the conversation
+
+        # Print the assistant's response
+        print(f"\nAssistant: {exec_res}")
+
+        # Add assistant message to history
+        shared["messages"].append({"role": "assistant", "content": exec_res})
+
+        # If we have more than 6 messages (3 conversation pairs), archive the oldest pair
+        if len(shared["messages"]) > 6:
+            return "embed"
+
+        # We only end if the user explicitly typed 'exit'
+        # Even if last_question is set, we continue in interactive mode
+        return "question"
+
+class EmbedNode(Node):
+    def prep(self, shared):
+        """Extract the oldest conversation pair for embedding"""
+        if len(shared["messages"]) <= 6:
+            return None
+
+        # Extract the oldest user-assistant pair
+        oldest_pair = shared["messages"][:2]
+        # Remove them from current messages
+        shared["messages"] = shared["messages"][2:]
+
+        return oldest_pair
+
+    def exec(self, conversation):
+        """Embed a conversation"""
+        if not conversation:
+            return None
+
+        # Combine user and assistant messages into a single text for embedding
+        user_msg = next((msg for msg in conversation if msg["role"] == "user"), {"content": ""})
+        assistant_msg = next((msg for msg in conversation if msg["role"] == "assistant"), {"content": ""})
+        combined = f"User: {user_msg['content']} Assistant: {assistant_msg['content']}"
+
+        # Generate embedding
+        embedding = get_embedding(combined)
+
+        return {
+            "conversation": conversation,
+            "embedding": embedding
+        }
+
+    def post(self, shared, prep_res, exec_res):
+        """Store the embedding and add to index"""
+        if not exec_res:
+            # If there's nothing to embed, just continue with the next question
+            return "question"
+
+        # Initialize vector index if not exist
+        if "vector_index" not in shared:
+            shared["vector_index"] = create_index()
+            shared["vector_items"] = []  # Track items separately
+
+        # Add the embedding to the index and store the conversation
+        position = add_vector(shared["vector_index"], exec_res["embedding"])
+        shared["vector_items"].append(exec_res["conversation"])
+
+        print(f"✅ Added conversation to index at position {position}")
+        print(f"✅ Index now contains {len(shared['vector_items'])} conversations")
+
+        # Continue with the next question
+        return "question"
+
+class RetrieveNode(Node):
+    def prep(self, shared):
+        """Get the current query for retrieval"""
+        if not shared.get("messages"):
+            return None
+
+        # Get the latest user message for searching
+        latest_user_msg = next((msg for msg in reversed(shared["messages"])
+                               if msg["role"] == "user"), {"content": ""})
+
+        # Check if we have a vector index with items
+        if ("vector_index" not in shared or
+            "vector_items" not in shared or
+            len(shared["vector_items"]) == 0):
+            return None
+
+        return {
+            "query": latest_user_msg["content"],
+            "vector_index": shared["vector_index"],
+            "vector_items": shared["vector_items"]
+        }
+
+    def exec(self, inputs):
+        """Find the most relevant past conversation"""
+        if not inputs:
+            return None
+
+        query = inputs["query"]
+        vector_index = inputs["vector_index"]
+        vector_items = inputs["vector_items"]
+
+        print(f"🔍 Finding relevant conversation for: {query[:30]}...")
+
+        # Create embedding for the query
+        query_embedding = get_embedding(query)
+
+        # Search for the most similar conversation
+        indices, distances = search_vectors(vector_index, query_embedding, k=1)
+
+        if not indices:
+            return None
+
+        # Get the corresponding conversation
+        conversation = vector_items[indices[0]]
+
+        return {
+            "conversation": conversation,
+            "distance": distances[0]
+        }
+
+    def post(self, shared, prep_res, exec_res):
+        """Store the retrieved conversation"""
+        if exec_res is not None:
+            shared["retrieved_conversation"] = exec_res["conversation"]
+            print(f"📄 Retrieved conversation (distance: {exec_res['distance']:.4f})")
+        else:
+            shared["retrieved_conversation"] = None
+
+        return "answer"
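The window arithmetic in `AnswerNode.post()` and `EmbedNode.prep()` is worth spelling out: 3 pairs means 6 messages, so after the assistant replies to the 4th user turn there are 8 messages, the "embed" transition fires, and the oldest pair is archived, bringing the window back to 6. A tiny standalone simulation of that bookkeeping (not part of the commit):

```python
# Simulate the sliding window: same checks as AnswerNode.post / EmbedNode.prep.
messages = []
for turn in range(1, 5):
    messages += [{"role": "user", "content": f"u{turn}"},
                 {"role": "assistant", "content": f"a{turn}"}]
    if len(messages) > 6:           # more than 3 pairs in the active window
        archived = messages[:2]     # oldest pair would go into the vector index
        messages = messages[2:]
    print(turn, len(messages))      # prints 2, 4, 6, then 6 again
```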
@@ -1,79 +0,0 @@
-import os
-import numpy as np
-from openai import OpenAI
-
-def get_embedding(text):
-    """
-    A simple embedding function that converts text to vector.
-
-    In a real application, you would use a proper embedding model like OpenAI,
-    Hugging Face, or other embedding services. For this example, we'll use a
-    simple approach based on character frequencies for demonstration purposes.
-    """
-    # Create a simple embedding (128-dimensional) based on character frequencies
-    # This is just for demonstration - not a real embedding algorithm!
-    embedding = np.zeros(128, dtype=np.float32)
-
-    # Generate a deterministic but distributed embedding based on character frequency
-    for i, char in enumerate(text):
-        # Use modulo to distribute values across the embedding dimensions
-        pos = ord(char) % 128
-        embedding[pos] += 1.0
-
-    # Normalize the embedding
-    norm = np.linalg.norm(embedding)
-    if norm > 0:
-        embedding = embedding / norm
-
-    return embedding
-
-def get_openai_embedding(text):
-    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY"))
-
-    response = client.embeddings.create(
-        model="text-embedding-ada-002",
-        input=text
-    )
-
-    # Extract the embedding vector from the response
-    embedding = response.data[0].embedding
-
-    # Convert to numpy array for consistency with other embedding functions
-    return np.array(embedding, dtype=np.float32)
-
-if __name__ == "__main__":
-    # Test the embedding function
-    text1 = "The quick brown fox jumps over the lazy dog."
-    text2 = "Python is a popular programming language for data science."
-
-    emb1 = get_embedding(text1)
-    emb2 = get_embedding(text2)
-
-    print(f"Embedding 1 shape: {emb1.shape}")
-    print(f"Embedding 2 shape: {emb2.shape}")
-
-    # Calculate similarity (dot product)
-    similarity = np.dot(emb1, emb2)
-    print(f"Similarity between texts: {similarity:.4f}")
-
-    # Compare with a different text
-    text3 = "Machine learning is a subset of artificial intelligence."
-    emb3 = get_embedding(text3)
-    similarity13 = np.dot(emb1, emb3)
-    similarity23 = np.dot(emb2, emb3)
-
-    print(f"Similarity between text1 and text3: {similarity13:.4f}")
-    print(f"Similarity between text2 and text3: {similarity23:.4f}")
-
-    # These simple comparisons should show higher similarity
-    # between related concepts (text2 and text3) than between
-    # unrelated texts (text1 and text3)
-
-    # Uncomment to test OpenAI embeddings (requires API key)
-    print("\nTesting OpenAI embeddings (requires API key):")
-    oai_emb1 = get_openai_embedding(text1)
-    oai_emb2 = get_openai_embedding(text2)
-    print(f"OpenAI Embedding 1 shape: {oai_emb1.shape}")
-    oai_similarity = np.dot(oai_emb1, oai_emb2)
-    print(f"OpenAI similarity between texts: {oai_similarity:.4f}")
@@ -0,0 +1 @@
+
@@ -0,0 +1,20 @@
+import os
+from openai import OpenAI
+
+def call_llm(messages):
+    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "your-api-key"))
+
+    response = client.chat.completions.create(
+        model="gpt-4o",
+        messages=messages,
+        temperature=0.7
+    )
+
+    return response.choices[0].message.content
+
+if __name__ == "__main__":
+    # Test the LLM call
+    messages = [{"role": "user", "content": "In a few words, what's the meaning of life?"}]
+    response = call_llm(messages)
+    print(f"Prompt: {messages[0]['content']}")
+    print(f"Response: {response}")
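`call_llm` takes the same chat-format message list that `AnswerNode` assembles (a list of `{"role": ..., "content": ...}` dicts); the model name and temperature are simply this example's defaults. An illustrative call, with made-up message content in the shape the chat flow produces:

```python
from utils.call_llm import call_llm

# Hypothetical context: one retrieved pair bracketed by system notes, then the query.
messages = [
    {"role": "system", "content": "The following is a relevant past conversation:"},
    {"role": "user", "content": "remember, my cat name is Whiskers"},
    {"role": "assistant", "content": "Got it! Whiskers it is."},
    {"role": "system", "content": "Now continue the current conversation:"},
    {"role": "user", "content": "what's my cat name?"},
]
print(call_llm(messages))
```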
@@ -0,0 +1,33 @@
+import os
+import numpy as np
+from openai import OpenAI
+
+def get_embedding(text):
+    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY"))
+
+    response = client.embeddings.create(
+        model="text-embedding-ada-002",
+        input=text
+    )
+
+    # Extract the embedding vector from the response
+    embedding = response.data[0].embedding
+
+    # Convert to numpy array for consistency with other embedding functions
+    return np.array(embedding, dtype=np.float32)
+
+if __name__ == "__main__":
+    # Test the embedding function
+    text1 = "The quick brown fox jumps over the lazy dog."
+    text2 = "Python is a popular programming language for data science."
+
+    emb1 = get_embedding(text1)
+    emb2 = get_embedding(text2)
+
+    print(f"Embedding 1 shape: {emb1.shape}")
+    print(f"Embedding 2 shape: {emb2.shape}")
+
+    # Calculate similarity (dot product)
+    similarity = np.dot(emb1, emb2)
+    print(f"Similarity between texts: {similarity:.4f}")
@@ -0,0 +1,65 @@
+import numpy as np
+import faiss
+
+def create_index(dimension=1536):
+    return faiss.IndexFlatL2(dimension)
+
+def add_vector(index, vector):
+    # Make sure the vector is a numpy array with the right shape for FAISS
+    vector = np.array(vector).reshape(1, -1).astype(np.float32)
+
+    # Add the vector to the index
+    index.add(vector)
+
+    # Return the position (index.ntotal is the total number of vectors in the index)
+    return index.ntotal - 1
+
+def search_vectors(index, query_vector, k=1):
+    """Search for the k most similar vectors to the query vector
+
+    Args:
+        index: The FAISS index
+        query_vector: The query vector (numpy array or list)
+        k: Number of results to return (default: 1)
+
+    Returns:
+        tuple: (indices, distances) where:
+        - indices is a list of positions in the index
+        - distances is a list of the corresponding distances
+    """
+    # Make sure we don't try to retrieve more vectors than exist in the index
+    k = min(k, index.ntotal)
+    if k == 0:
+        return [], []
+
+    # Make sure the query is a numpy array with the right shape for FAISS
+    query_vector = np.array(query_vector).reshape(1, -1).astype(np.float32)
+
+    # Search the index
+    distances, indices = index.search(query_vector, k)
+
+    return indices[0].tolist(), distances[0].tolist()
+
+# Example usage
+if __name__ == "__main__":
+    # Create a new index
+    index = create_index(dimension=3)
+
+    # Add some random vectors and track them separately
+    items = []
+    for i in range(5):
+        vector = np.random.random(3)
+        position = add_vector(index, vector)
+        items.append(f"Item {i}")
+        print(f"Added vector at position {position}")
+
+    print(f"Index contains {index.ntotal} vectors")
+
+    # Search for a similar vector
+    query = np.random.random(3)
+    indices, distances = search_vectors(index, query, k=2)
+
+    print("Query:", query)
+    print("Found indices:", indices)
+    print("Distances:", distances)
+    print("Retrieved items:", [items[idx] for idx in indices])
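One detail worth noting: `create_index()` defaults to `dimension=1536`, which matches the length of the `text-embedding-ada-002` vectors returned by the embedding utility above, and `IndexFlatL2` does exact brute-force L2 search, so no training step is needed. A quick sanity check one could run from the example's root directory, assuming the OpenAI key is set (not part of the commit):

```python
from utils.get_embedding import get_embedding
from utils.vector_index import create_index, add_vector, search_vectors

emb = get_embedding("hello")              # ada-002 returns a 1536-dim vector
index = create_index()                    # default dimension=1536 must match
position = add_vector(index, emb)
print(position)                           # 0, the first stored vector
print(search_vectors(index, emb, k=1))    # expect ([0], [0.0]): exact match
```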
@@ -0,0 +1,92 @@
+import numpy as np
+import faiss
+
+def create_index(dimension=128):
+    """Create a new vector index for fast similarity search
+
+    Args:
+        dimension: The dimensionality of the vectors to be indexed
+
+    Returns:
+        tuple: (index, items_list) where:
+        - index is the FAISS index for searching
+        - items_list is an empty list for storing the items
+    """
+    # Create a flat (exact, brute-force) index for storing vectors
+    index = faiss.IndexFlatL2(dimension)
+    # Initialize an empty list to store the actual items
+    items_list = []
+    return index, items_list
+
+def add_to_index(index, items_list, embedding, item):
+    """Add an item and its vector representation to the index
+
+    Args:
+        index: The FAISS index
+        items_list: The list of items corresponding to vectors in the index
+        embedding: The vector representation of the item (numpy array)
+        item: The actual item to store
+
+    Returns:
+        int: The position where the item was added
+    """
+    # Make sure the embedding is a numpy array with the right shape for FAISS
+    vector = np.array(embedding).reshape(1, -1).astype(np.float32)
+
+    # Add the vector to the index
+    index.add(vector)
+
+    # Store the item and return its position
+    items_list.append(item)
+    return len(items_list) - 1
+
+def search_index(index, items_list, query_embedding, k=1):
+    """Search for the k most similar items to the query vector
+
+    Args:
+        index: The FAISS index
+        items_list: The list of items corresponding to vectors in the index
+        query_embedding: The query vector (numpy array)
+        k: Number of results to return (default: 1)
+
+    Returns:
+        tuple: (found_items, distances) where:
+        - found_items is a list of the k most similar items
+        - distances is a list of the corresponding distances
+    """
+    # Make sure we don't try to retrieve more items than exist in the index
+    k = min(k, len(items_list))
+    if k == 0:
+        return [], []
+
+    # Make sure the query is a numpy array with the right shape for FAISS
+    query_vector = np.array(query_embedding).reshape(1, -1).astype(np.float32)
+
+    # Search the index
+    D, I = index.search(query_vector, k)
+
+    # Get the items corresponding to the found indices
+    found_items = [items_list[i] for i in I[0]]
+    distances = D[0].tolist()
+
+    return found_items, distances
+
+# Example usage
+if __name__ == "__main__":
+    # Create a new index
+    index, items = create_index(dimension=3)
+
+    # Add some random vectors and items
+    for i in range(5):
+        vector = np.random.random(3)
+        add_to_index(index, items, vector, f"Item {i}")
+
+    print(f"Added {len(items)} items to the index")
+
+    # Search for a similar vector
+    query = np.random.random(3)
+    found_items, distances = search_index(index, items, query, k=2)
+
+    print("Query:", query)
+    print("Found items:", found_items)
+    print("Distances:", distances)
@@ -39,6 +39,6 @@ The chat application uses:

 ## Files

-- `main.py`: Implementation of the ChatNode and chat flow
-- `utils.py`: Simple wrapper for calling the OpenAI API
+- [`main.py`](./main.py): Implementation of the ChatNode and chat flow
+- [`utils.py`](./utils.py): Simple wrapper for calling the OpenAI API
@@ -69,7 +69,7 @@ For comparison:
 - [GPT-o1 pro with thinking](https://chatgpt.com/share/67dcb1bf-ceb0-8000-823a-8ce894032e37): Correct answer after 1.5 min


-Below is an example of how Claude 3.7 Sonnet uses thinking mode to solve this complex problem, and get the correct result:
+Below is an example of how Claude 3.7 Sonnet (without native thinking) solves this complex problem and gets the correct result:

 ```
 🤔 Processing question: Break a stick, then break the longer piece again. What's the probability of forming a triangle?
@@ -140,6 +140,7 @@ Agentic Coding should be a collaboration between Human System Design and Agent I
 ```
 my_project/
 ├── main.py
+├── nodes.py
 ├── flow.py
 ├── utils/
 │   ├── __init__.py
@@ -154,13 +155,12 @@ my_project/
 - **`utils/`**: Contains all utility functions.
   - It's recommended to dedicate one Python file to each API call, for example `call_llm.py` or `search_web.py`.
   - Each file should also include a `main()` function to try that API call
-- **`flow.py`**: Implements the system's flow, starting with node definitions followed by the overall structure.
+- **`nodes.py`**: Contains all the node definitions.
   ```python
-  # flow.py
-  from pocketflow import Node, Flow
+  # nodes.py
+  from pocketflow import Node
   from utils.call_llm import call_llm

-  # Example with two nodes in a flow
   class GetQuestionNode(Node):
       def exec(self, _):
           # Get question directly from user input
@@ -184,21 +184,29 @@ my_project/
       def post(self, shared, prep_res, exec_res):
           # Store the answer in shared
           shared["answer"] = exec_res
+  ```
+- **`flow.py`**: Implements functions that create flows by importing node definitions and connecting them.
+  ```python
+  # flow.py
+  from pocketflow import Flow
+  from nodes import GetQuestionNode, AnswerNode

-  # Create nodes
-  get_question_node = GetQuestionNode()
-  answer_node = AnswerNode()
-
-  # Connect nodes in sequence
-  get_question_node >> answer_node
-
-  # Create flow starting with input node
-  qa_flow = Flow(start=get_question_node)
+  def create_qa_flow():
+      """Create and return a question-answering flow."""
+      # Create nodes
+      get_question_node = GetQuestionNode()
+      answer_node = AnswerNode()
+
+      # Connect nodes in sequence
+      get_question_node >> answer_node
+
+      # Create flow starting with input node
+      return Flow(start=get_question_node)
   ```
 - **`main.py`**: Serves as the project's entry point.
   ```python
   # main.py
-  from flow import qa_flow
+  from flow import create_qa_flow

   # Example main function
   # Please replace this with your own main function
@@ -208,6 +216,8 @@ my_project/
       "answer": None  # Will be populated by AnswerNode
   }

+  # Create the flow and run it
+  qa_flow = create_qa_flow()
   qa_flow.run(shared)
   print(f"Question: {shared['question']}")
   print(f"Answer: {shared['answer']}")