---
layout: default
title: "LLM Wrapper"
parent: "Utility Function"
nav_order: 1
---
# LLM Wrappers
We **don't** provide built-in LLM wrappers. Instead, please implement your own or check out libraries like [litellm](https://github.com/BerriAI/litellm).
Here, we provide some minimal example implementations:
1. OpenAI
```python
def call_llm(prompt):
    from openai import OpenAI
    client = OpenAI(api_key="YOUR_API_KEY_HERE")
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

# Example usage
call_llm("How are you?")
```
> Store the API key in an environment variable like OPENAI_API_KEY for security.
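For example, a minimal sketch of reading the key from the environment instead of hardcoding it:
```python
import os
from openai import OpenAI

# The OpenAI client also picks up OPENAI_API_KEY automatically if no key is passed
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```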
2. Claude (Anthropic)
```python
def call_llm(prompt):
    from anthropic import Anthropic
    client = Anthropic(api_key="YOUR_API_KEY_HERE")
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # substitute any current Claude model
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100
    )
    # response.content is a list of content blocks; return the text of the first one
    return response.content[0].text
```
3. Google (Generative AI Studio / PaLM API)
```python
def call_llm(prompt):
    import google.generativeai as genai
    genai.configure(api_key="YOUR_API_KEY_HERE")
    response = genai.generate_text(
        model="models/text-bison-001",
        prompt=prompt
    )
    return response.result
```
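Note that the PaLM text endpoint above has been deprecated in favor of the Gemini API. A minimal sketch using the same `google-generativeai` package (the model name here is an assumption; use whichever Gemini model you have access to):
```python
def call_llm(prompt):
    import google.generativeai as genai
    genai.configure(api_key="YOUR_API_KEY_HERE")
    # "gemini-1.5-flash" is an assumed model name; substitute your own
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(prompt)
    return response.text
```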
4. Azure (Azure OpenAI)
```python
def call_llm(prompt):
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint="https://<YOUR_RESOURCE_NAME>.openai.azure.com/",
        api_key="YOUR_API_KEY_HERE",
        api_version="2023-05-15"
    )
    r = client.chat.completions.create(
        model="<YOUR_DEPLOYMENT_NAME>",
        messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content
```
5. Ollama (Local LLM)
```python
def call_llm(prompt):
    from ollama import chat
    response = chat(
        model="llama2",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.message.content
```
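This assumes an Ollama server is running locally and the model has already been pulled (e.g. `ollama pull llama2`); otherwise the call will fail.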
## Improvements
Feel free to enhance your `call_llm` function as needed. Here are examples:
- Handle chat history:
```python
def call_llm(messages):
    from openai import OpenAI
    client = OpenAI(api_key="YOUR_API_KEY_HERE")
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return r.choices[0].message.content
```
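For example, you can accumulate the conversation and pass the full history on each call:
```python
# Hypothetical multi-turn usage: keep appending to the history list
history = [{"role": "user", "content": "What is PocketFlow?"}]
answer = call_llm(history)
history.append({"role": "assistant", "content": answer})
history.append({"role": "user", "content": "Summarize that in one sentence."})
print(call_llm(history))
```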
- Add in-memory caching:
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def call_llm(prompt):
    # Your implementation here
    pass
```
> ⚠️ Caching conflicts with Node retries: a retried call just returns the same cached result instead of making a fresh request.
>
> To address this, you could use cached results only on the first attempt and bypass the cache on retries, as shown below.
{: .warning }
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_call(prompt):
    pass  # Your implementation here

def call_llm(prompt, use_cache):
    if use_cache:
        return cached_call(prompt)
    # Call the underlying function directly, bypassing the cache
    return cached_call.__wrapped__(prompt)

class SummarizeNode(Node):
    def exec(self, text):
        # Only use the cache on the first attempt; retries get a fresh call
        return call_llm(f"Summarize: {text}", self.cur_retry == 0)
```
- Enable logging:
```python
def call_llm(prompt):
    import logging
    # Assumes logging has been configured, e.g. logging.basicConfig(level=logging.INFO)
    logging.info(f"Prompt: {prompt}")
    response = ...  # Your implementation here
    logging.info(f"Response: {response}")
    return response
```
## Why Not Provide Built-in LLM Wrappers?
I believe it is a **bad practice** to provide LLM-specific implementations in a general framework:
- **LLM APIs change frequently**. Hardcoding them makes maintenance a nightmare.
- You may need **flexibility** to switch vendors, use fine-tuned models, or deploy local LLMs.
- You may need **optimizations** like prompt caching, request batching, or response streaming.
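For instance, none of the minimal wrappers above support streaming. A sketch of a streaming variant using the OpenAI client (assuming `OPENAI_API_KEY` is set in the environment) might look like this:
```python
def call_llm_streaming(prompt):
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    chunks = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # print tokens as they arrive
        chunks.append(delta)
    return "".join(chunks)
```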