calute.llms.openai#
OpenAI LLM provider implementation.
This module provides the OpenAI-specific implementation of the BaseLLM interface for integrating OpenAI’s GPT models into the Calute framework. It supports all OpenAI Chat Completion API features including streaming, function calling, and tool use.
The module supports: - GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, and GPT-4o models - Synchronous and asynchronous completion generation - Streaming responses with real-time function call detection - Tool/function calling with automatic argument accumulation - OpenAI-compatible API endpoints (Azure, local proxies, etc.) - Automatic model metadata fetching from /v1/models endpoint
Key features: - Automatic API key resolution from environment (OPENAI_API_KEY) - Support for custom base URLs (useful for Azure OpenAI or proxies) - Filtering of unsupported parameters (top_k, min_p, repetition_penalty) - Robust streaming with incremental tool call argument accumulation
- Typical usage example:
from calute.llms.openai import OpenAILLM from calute.llms.base import LLMConfig
# Using configuration object config = LLMConfig(
model=”gpt-4o”, temperature=0.7, max_tokens=2048, api_key=”sk-…”
) llm = OpenAILLM(config)
# Or using kwargs (defaults to gpt-4o-mini) llm = OpenAILLM(model=”gpt-4”, api_key=”sk-…”)
# Generate completion response = await llm.generate_completion(“Explain quantum computing”) content = llm.extract_content(response)
# With streaming stream_response = await llm.generate_completion(
“Write a poem”, stream=True
) for chunk in llm.stream_completion(stream_response):
- if chunk[“content”]:
print(chunk[“content”], end=””)
- class calute.llms.openai.OpenAILLM(config: calute.llms.base.LLMConfig | None = None, client: Any | None = None, async_client: Any | None = None, **kwargs)[source]#
Bases:
BaseLLMOpenAI LLM provider implementation using the official OpenAI Python client.
OpenAILLM provides a complete implementation of the BaseLLM interface for OpenAI’s GPT models. It wraps the official OpenAI Python client and handles all the complexity of API communication, streaming, and tool call parsing.
This class supports both standard OpenAI API and OpenAI-compatible endpoints (such as Azure OpenAI, LM Studio, or custom proxy servers) through the base_url configuration option.
The implementation automatically: - Resolves API keys from environment variables if not provided - Filters out unsupported parameters (top_k, min_p, repetition_penalty) - Accumulates streaming tool call arguments incrementally - Fetches model metadata from the /v1/models endpoint
- config#
LLMConfig instance containing provider configuration.
- client#
OpenAI client instance used for API communication.
Example
# Basic usage with environment variable API key llm = OpenAILLM(model=”gpt-4o”) response = await llm.generate_completion(“Hello!”) print(llm.extract_content(response))
# With explicit configuration config = LLMConfig(
model=”gpt-4-turbo”, temperature=0.5, max_tokens=4096, api_key=”sk-…”
) llm = OpenAILLM(config)
# Using custom base URL (e.g., Azure or local server) llm = OpenAILLM(
model=”gpt-4”, base_url=”https://my-resource.openai.azure.com/”, api_key=”azure-key”
)
# Inject custom client from openai import OpenAI custom_client = OpenAI(api_key=”sk-…”, timeout=120.0) llm = OpenAILLM(client=custom_client)
- async astream_completion(response: Any, agent: Any | None = None) AsyncIterator[dict[str, Any]][source]#
Async stream completion chunks with tool/function call detection.
Asynchronous version of stream_completion that works with async iterators. Iterates through an OpenAI async streaming response, yielding structured data for each chunk with automatic tool call accumulation.
This method is designed for use with AsyncOpenAI client or when the streaming response is an async iterator. It provides the same functionality as stream_completion but with proper async/await semantics.
- Parameters
response – OpenAI async streaming response (AsyncIterator). Should be the result of an async chat.completions.create() call with stream=True.
agent – Optional agent instance for function detection. Currently reserved for future use with agent-specific function filtering.
- Yields
Dictionary containing streaming chunk information with keys – - content: Text content in this specific chunk (or None) - buffered_content: All accumulated text content so far - function_calls: List of completed function calls (populated
only in final chunk when tool calls are present)
tool_calls: Accumulated tool call data indexed by position
streaming_tool_calls: Incremental tool call updates in this chunk
raw_chunk: The original OpenAI ChatCompletionChunk object
is_final: Boolean indicating if this is the last chunk
Example
stream = await async_llm.generate_completion(prompt, stream=True) async for chunk in llm.astream_completion(stream):
- if chunk[“content”]:
print(chunk[“content”], end=””)
- if chunk[“is_final”]:
print() # Newline at end
- async close() None[source]#
Close the OpenAI client and release resources.
Closes any open HTTP connections maintained by the OpenAI client. This method is called automatically when using the LLM provider as an async context manager.
The method safely checks if the client has a close method before calling it, ensuring compatibility with different client versions or custom client implementations.
- Side Effects:
Closes the underlying HTTP client/session
Releases any connection pool resources
Example
llm = OpenAILLM(model=”gpt-4”) try:
response = await llm.generate_completion(“Hello”)
- finally:
await llm.close()
# Or use as context manager (preferred): async with OpenAILLM(model=”gpt-4”) as llm:
response = await llm.generate_completion(“Hello”)
- extract_content(response: Any) str[source]#
Extract text content from an OpenAI ChatCompletion response.
Safely extracts the message content from the first choice in an OpenAI response. Handles both regular text responses and tool call responses (which have no text content).
The method checks for: 1. Presence of choices array 2. Message object in first choice 3. Content attribute on message 4. Tool calls (returns empty string if present without content)
- Parameters
response – OpenAI ChatCompletion response object. Should have a ‘choices’ attribute containing at least one choice with a ‘message’ attribute.
- Returns
The text content from the response message. Returns an empty string if the response is a tool call response, has no content, or if the response structure is unexpected.
Example
response = await llm.generate_completion(“Hello!”) content = llm.extract_content(response) print(content) # “Hello! How can I help you today?”
# Tool call response returns empty string response = await llm.generate_completion(prompt, tools=tools) content = llm.extract_content(response) # “” (tool calls present)
- extract_reasoning_content(response: Any) str[source]#
Extract reasoning/thinking content from an OpenAI response.
Extracts reasoning tokens from reasoning models (o1, o3, etc.) in non-streaming responses. These models may include internal chain-of-thought reasoning separately from the main content.
- Parameters
response – OpenAI ChatCompletion response object.
- Returns
The reasoning content string, or empty string if not present.
Example
response = await llm.generate_completion(“Solve this math problem”) reasoning = llm.extract_reasoning_content(response) if reasoning:
print(f”Model reasoning: {reasoning}”)
content = llm.extract_content(response) print(f”Answer: {content}”)
- fetch_model_info() dict[str, Any][source]#
Fetch model metadata from the OpenAI /v1/models endpoint.
Queries the OpenAI models API to retrieve information about the configured model. This information can include context window size, capabilities, and other metadata depending on the API endpoint.
The method searches through all available models returned by the API to find the one matching self.config.model, then extracts relevant metadata fields.
- Returns
- max_model_len: Maximum context length in tokens (if available)
metadata: Additional model metadata dictionary
Returns an empty dictionary if the model is not found or if the API call fails.
- Return type
Dictionary containing model metadata
Note
This method silently catches all exceptions to prevent initialization failures. Some OpenAI-compatible endpoints may not support the /v1/models endpoint or may return different metadata fields.
Example
llm = OpenAILLM(model=”gpt-4”) info = llm.fetch_model_info() if info.get(“max_model_len”):
print(f”Context window: {info[‘max_model_len’]} tokens”)
- async generate_completion(prompt: str | list[dict[str, str]], model: str | None = None, temperature: float | None = None, max_tokens: int | None = None, top_p: float | None = None, stop: list[str] | None = None, stream: bool | None = None, tools: list[dict] | None = None, **kwargs) Any[source]#
Generate a completion using the OpenAI Chat Completions API.
Sends a request to OpenAI’s chat completions endpoint and returns the response. Supports both simple string prompts and full message lists, with optional tool/function calling support.
The method automatically: - Converts string prompts to message format - Merges config defaults with override parameters - Filters out unsupported parameters (top_k, min_p, repetition_penalty) - Sets tool_choice to “auto” when tools are provided - Applies extra_params from config
- Parameters
prompt – The input to generate completion for. Can be either: - A string (converted to a single user message) - A list of message dicts with ‘role’ and ‘content’ keys
model – Model identifier override. If None, uses config.model.
temperature – Sampling temperature override (0.0-2.0). Higher values make output more random. If None, uses config.temperature.
max_tokens – Maximum tokens to generate. If None, uses config.max_tokens.
top_p – Nucleus sampling parameter (0.0-1.0). If None, uses config.top_p.
stop – List of stop sequences. When encountered, generation stops. If None, uses config.stop.
stream – Whether to stream the response. If True, returns a streaming iterator instead of a complete response. If None, uses config.stream.
tools – List of tool definitions for function calling. Each tool should be a dict with ‘type’ and ‘function’ keys following OpenAI’s tool schema. When provided, tool_choice is set to “auto”.
**kwargs – Additional OpenAI-specific parameters. Unsupported params (top_k, min_p, repetition_penalty) are automatically filtered out. Examples: response_format, seed, logprobs, user.
- Returns
OpenAI ChatCompletion response object. If stream=True, returns a streaming iterator that yields ChatCompletionChunk objects.
Note
This method uses the synchronous OpenAI client internally but is declared async for interface consistency with other providers. For true async, consider using AsyncOpenAI client.
Example
# Simple string prompt response = await llm.generate_completion(“What is 2+2?”)
# Message list with system prompt messages = [
{“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “Hello!”}
] response = await llm.generate_completion(messages)
# With tools tools = [{
“type”: “function”, “function”: {
“name”: “get_weather”, “description”: “Get weather for a location”, “parameters”: {…}
}
}] response = await llm.generate_completion(prompt, tools=tools)
- parse_tool_calls(raw_data: Any) list[dict[str, Any]][source]#
Parse tool/function calls from an OpenAI response message.
Extracts and standardizes tool call information from an OpenAI message object. Converts OpenAI’s tool call format to a simplified dictionary format used consistently across all Calute LLM providers.
This method is typically used to extract tool calls from non-streaming responses where the complete tool call data is available at once.
- Parameters
raw_data – OpenAI message object (typically response.choices[0].message) that may contain a ‘tool_calls’ attribute with a list of tool call objects.
- Returns
- id: Unique identifier for the tool call
name: Name of the function to call
arguments: JSON string of function arguments
Returns an empty list if no tool calls are present.
- Return type
List of standardized tool call dictionaries, each containing
Example
response = await llm.generate_completion(prompt, tools=tools) message = response.choices[0].message tool_calls = llm.parse_tool_calls(message) for call in tool_calls:
print(f”Call {call[‘id’]}: {call[‘name’]}”) args = json.loads(call[‘arguments’]) # Execute the function with args…
- async process_streaming_response(response: Any, callback: Callable[[str, Any], None]) str[source]#
Process a streaming response from OpenAI with a callback for each chunk.
Iterates through all chunks in a streaming response, extracts text content and reasoning tokens from each delta, accumulates the full response, and invokes the callback for each content chunk received.
This method is useful for displaying streaming output in real-time while also capturing the complete response. The callback receives both the individual chunk content and the raw chunk object for additional processing.
Note: Reasoning tokens (from o1/o3 models) are included in the callback via the raw chunk object. Access them via chunk.choices[0].delta.reasoning_content.
- Parameters
response – OpenAI streaming response iterator. Should be the result of a chat.completions.create() call with stream=True.
callback – Function called for each chunk with (content, raw_chunk). - content: The text content from this chunk’s delta - raw_chunk: The raw ChatCompletionChunk object
- Returns
The complete accumulated content from all chunks concatenated together.
Example
- def print_chunk(content: str, chunk: Any) -> None:
print(content, end=””, flush=True)
stream = await llm.generate_completion(“Tell me a story”, stream=True) full_text = await llm.process_streaming_response(stream, print_chunk) print() # Newline after streaming print(f”Total length: {len(full_text)}”)
- stream_completion(response: Any, agent: Any | None = None) Iterator[dict[str, Any]][source]#
Stream completion chunks with tool/function call detection and accumulation.
Iterates through an OpenAI streaming response, yielding structured data for each chunk. Automatically detects and accumulates tool calls across chunks, building complete function call information as it streams.
This method handles the complexity of OpenAI’s streaming format where tool call arguments are split across multiple chunks. It maintains internal accumulators and provides both incremental updates and accumulated totals in each yielded chunk.
- Parameters
response – OpenAI streaming response iterator from a chat.completions call with stream=True.
agent – Optional agent instance for function detection. Currently reserved for future use with agent-specific function filtering.
- Yields
Dictionary containing streaming chunk information with keys – - content: Text content in this specific chunk (or None) - buffered_content: All accumulated text content so far - function_calls: List of completed function calls (populated
only in final chunk when tool calls are present)
tool_calls: Accumulated tool call data indexed by position
streaming_tool_calls: Incremental tool call updates in this chunk
raw_chunk: The original OpenAI ChatCompletionChunk object
is_final: Boolean indicating if this is the last chunk
Example
stream = await llm.generate_completion(prompt, stream=True, tools=tools) for chunk in llm.stream_completion(stream):
# Print text content as it arrives if chunk[“content”]:
print(chunk[“content”], end=””)
# Check for completed function calls at end if chunk[“is_final”] and chunk[“function_calls”]:
- for call in chunk[“function_calls”]:
print(f”Function: {call[‘name’]}”) print(f”Args: {call[‘arguments’]}”)