---
layout: default
title: "LLM Wrapper"
parent: "Utility Function"
nav_order: 1
---
# LLM Wrappers
We don't provide built-in LLM wrappers. Instead, please implement your own or check out libraries like litellm. Here are some minimal example implementations, followed by a litellm-based sketch:
- **OpenAI**

  ```python
  def call_llm(prompt):
      from openai import OpenAI
      client = OpenAI(api_key="YOUR_API_KEY_HERE")
      r = client.chat.completions.create(
          model="gpt-4o",
          messages=[{"role": "user", "content": prompt}]
      )
      return r.choices[0].message.content

  # Example usage
  call_llm("How are you?")
  ```

  > Store the API key in an environment variable like OPENAI_API_KEY for security.
  {: .best-practice }
- **Claude (Anthropic)**

  ```python
  def call_llm(prompt):
      from anthropic import Anthropic
      client = Anthropic(api_key="YOUR_API_KEY_HERE")
      response = client.messages.create(
          model="claude-2",
          messages=[{"role": "user", "content": prompt}],
          max_tokens=100
      )
      # The Messages API returns a list of content blocks; return the text of the first one
      return response.content[0].text
  ```
- **Google (Generative AI Studio / PaLM API)**

  ```python
  def call_llm(prompt):
      import google.generativeai as genai
      genai.configure(api_key="YOUR_API_KEY_HERE")
      response = genai.generate_text(
          model="models/text-bison-001",
          prompt=prompt
      )
      return response.result
  ```
- **Azure (Azure OpenAI)**

  ```python
  def call_llm(prompt):
      from openai import AzureOpenAI
      client = AzureOpenAI(
          azure_endpoint="https://<YOUR_RESOURCE_NAME>.openai.azure.com/",
          api_key="YOUR_API_KEY_HERE",
          api_version="2023-05-15"
      )
      r = client.chat.completions.create(
          model="<YOUR_DEPLOYMENT_NAME>",
          messages=[{"role": "user", "content": prompt}]
      )
      return r.choices[0].message.content
  ```
- **Ollama (Local LLM)**

  ```python
  def call_llm(prompt):
      from ollama import chat
      response = chat(
          model="llama2",
          messages=[{"role": "user", "content": prompt}]
      )
      return response.message.content
  ```
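If you prefer not to maintain provider-specific code, a unified client such as litellm can cover all of the above behind one interface. Below is a minimal sketch, assuming litellm is installed and the relevant key (e.g. OPENAI_API_KEY) is set in the environment, per the best-practice note above; the model name is just an example:

```python
def call_llm(prompt):
    from litellm import completion
    # litellm picks up provider keys (e.g. OPENAI_API_KEY) from the environment
    # and returns responses in the OpenAI-compatible format.
    response = completion(
        model="gpt-4o",  # swap in any provider/model litellm supports
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```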
## Improvements
Feel free to enhance your `call_llm` function as needed. Here are some examples:
- Handle chat history (see the usage sketch after this list):

  ```python
  def call_llm(messages):
      from openai import OpenAI
      client = OpenAI(api_key="YOUR_API_KEY_HERE")
      r = client.chat.completions.create(
          model="gpt-4o",
          messages=messages
      )
      return r.choices[0].message.content
  ```
- Add in-memory caching:

  ```python
  from functools import lru_cache

  @lru_cache(maxsize=1000)
  def call_llm(prompt):
      # Your implementation here
      pass
  ```
  > ⚠️ Caching conflicts with Node retries, as retries yield the same result.
  > To address this, you could use cached results only if not retried.
  {: .warning }
  ```python
  from functools import lru_cache

  @lru_cache(maxsize=1000)
  def cached_call(prompt):
      pass

  def call_llm(prompt, use_cache):
      if use_cache:
          return cached_call(prompt)
      # Call the underlying function directly
      return cached_call.__wrapped__(prompt)

  class SummarizeNode(Node):
      def exec(self, text):
          # Use the cache only on the first attempt; bypass it on retries
          return call_llm(f"Summarize: {text}", self.cur_retry==0)
  ```
- Enable logging:

  ```python
  def call_llm(prompt):
      import logging
      logging.info(f"Prompt: {prompt}")
      response = ... # Your implementation here
      logging.info(f"Response: {response}")
      return response
  ```
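For the chat-history variant above, usage might look like the following. This is only a sketch: the `ask` helper and the sample prompts are hypothetical, and `call_llm(messages)` refers to the history-aware version in the first bullet.

```python
# Hypothetical usage of the history-aware call_llm(messages) from above.
history = []

def ask(user_message):
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)  # pass the whole conversation so far
    history.append({"role": "assistant", "content": reply})
    return reply

ask("How are you?")
ask("Summarize what you just said in five words.")  # the model sees earlier turns
```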
## Why Not Provide Built-in LLM Wrappers?
I believe it is a bad practice to provide LLM-specific implementations in a general framework:
- LLM APIs change frequently. Hardcoding them makes maintenance a nightmare.
- You may need flexibility to switch vendors, use fine-tuned models, or deploy local LLMs.
- You may need optimizations like prompt caching, request batching, or response streaming.