calute.llms.anthropic

calute.llms.anthropic#

Anthropic Claude LLM provider implementation.

This module provides integration with Anthropic’s Claude API for the Calute framework. It implements the BaseLLM interface to offer synchronous and asynchronous completion generation, streaming responses, and function/tool call support for Claude models.

The implementation uses httpx for HTTP communication with the Anthropic API, providing efficient async operations and streaming support. It handles message format conversion from OpenAI-style to Anthropic’s expected format.

Features: - Support for all Claude 3.x and 4.x models (Opus, Sonnet, Haiku) - Streaming responses with Server-Sent Events (SSE) parsing - Tool/function call support with structured output parsing - Automatic message format conversion from OpenAI-style to Anthropic format - Context length metadata for supported models - Async context manager support for proper resource cleanup

Supported models include: - claude-3-opus-20240229 (200K context) - claude-3-sonnet-20240229 (200K context) - claude-3-haiku-20240307 (200K context) - claude-3-5-sonnet-20240620 (200K context) - claude-3-5-haiku-20241022 (200K context) - claude-opus-4-20250514 (200K context) - claude-sonnet-4-20250514 (200K context)

Typical usage example:

from calute.llms.anthropic import AnthropicLLM from calute.llms.base import LLMConfig

config = LLMConfig(: model=”claude-3-opus-20240229”, temperature=0.7, max_tokens=4096, api_key=”your-api-key”

)

async with AnthropicLLM(config) as llm:: response = await llm.generate_completion(“Hello, Claude!”) content = llm.extract_content(response) print(content)

Note

Requires the httpx library for HTTP communication. Install with: pip install httpx

class calute.llms.anthropic.AnthropicLLM(config: calute.llms.base.LLMConfig | None = None, version: str = '2023-06-01', **kwargs)[source]#

Bases: BaseLLM

Anthropic Claude LLM provider implementation.

AnthropicLLM provides integration with Anthropic’s Claude API, implementing the BaseLLM interface for seamless integration with the Calute framework. It supports all Claude 3.x and 4.x model variants with features including streaming responses, tool/function calling, and automatic message format conversion.

This implementation uses httpx for async HTTP communication and handles the conversion between OpenAI-style message formats and Anthropic’s expected format. System messages are automatically merged into the first user message as Anthropic requires.

config#: LLMConfig instance containing provider configuration.

version#: Anthropic API version string (e.g., “2023-06-01”).

client#: httpx.AsyncClient instance for making API requests.

Example

# Using with explicit config config = LLMConfig(

model=”claude-3-opus-20240229”, temperature=0.5, max_tokens=4096, api_key=”sk-ant-…”

) async with AnthropicLLM(config) as llm:

response = await llm.generate_completion(“Hello!”) print(llm.extract_content(response))

# Using with kwargs llm = AnthropicLLM(model=”claude-3-haiku-20240307”, api_key=”sk-ant-…”) response = await llm.generate_completion(“What is 2+2?”) await llm.close()

Note

The API key can be provided via config, kwargs, or the ANTHROPIC_API_KEY environment variable.

async astream_completion(response: Any, agent: Any | None = None) → AsyncIterator[dict[str, Any]][source]#

Async stream completion chunks with function call detection.

Asynchronous version of stream_completion() that processes streaming responses from the Anthropic API. Yields standardized chunk dictionaries containing accumulated content and detected function/tool calls.

This method is the preferred way to handle streaming responses in async contexts, providing the same unified interface as the synchronous stream_completion() method.

The method tracks: - Incremental text content from content_block_delta events - Accumulated content across the entire stream - Tool use blocks for function calling - Stream completion (message_stop event)

Parameters

response – An async iterator of streaming events from generate_completion(stream=True). Each event is either a dictionary or an object with a “type” attribute.
agent – Optional agent instance for advanced function detection. Currently reserved for future use.

Yields

dict –

Standardized chunk information with keys:

content: Text content in this chunk (str or None)
buffered_content: All text accumulated so far (str)
function_calls: List of detected function calls (list)
tool_calls: Raw tool call data (None for Anthropic)
raw_chunk: The original event data
is_final: True if this is the final chunk (bool)

Function call format in function_calls list:

{: “id”: “tool_use_id”, “name”: “function_name”, “arguments”: ‘{“arg”: “value”}’ # JSON string

}

Example

response = await llm.generate_completion(“Hello”, stream=True) async for chunk in llm.astream_completion(response):

if chunk[“content”]:
print(chunk[“content”], end=””, flush=True)

if chunk[“is_final”] and chunk[“function_calls”]:
print(“nFunction calls detected!”)

async close() → None[source]#

Close the HTTP client and release resources.

Properly closes the httpx.AsyncClient connection pool. This method should be called when done using the LLM provider to prevent resource leaks. It is called automatically when using the provider as an async context manager.

This method is safe to call multiple times - it checks for client existence before attempting to close.

Example

llm = AnthropicLLM(config) try:

response = await llm.generate_completion(“Hello”)

finally:: await llm.close()

# Or preferably, use context manager: async with AnthropicLLM(config) as llm:

response = await llm.generate_completion(“Hello”)

extract_content(response: Any) → str[source]#

Extract text content from an Anthropic API response.

Parses the response from the Anthropic Messages API and extracts all text content from the content blocks. Anthropic responses contain a list of content blocks, each with a type (usually “text” or “tool_use”). This method concatenates all text blocks into a single string.

Parameters

response –

The API response dictionary from generate_completion(). Expected structure: {

”content”: [
{“type”: “text”, “text”: “Hello!”}, {“type”: “tool_use”, “id”: “…”, “name”: “…”, “input”: {…}}

}

Returns

Concatenated text from all “text” type content blocks. Returns an empty string if: - response is not a dictionary - response has no “content” key - no text blocks are present

Example

>>> response = {
...     "content": [
...         {"type": "text", "text": "Hello, "},
...         {"type": "text", "text": "world!"}
...     ]
... }
>>> llm.extract_content(response)
"Hello, world!"

fetch_model_info() → dict[str, Any][source]#

Fetch model metadata from known Anthropic model specifications.

Since Anthropic does not provide a public API endpoint for querying model capabilities, this method uses a local mapping of known model context lengths (ANTHROPIC_CONTEXT_LENGTHS). The model is matched by prefix to handle versioned model names.

The context length information is useful for: - Token counting and context window management - Preventing context overflow errors - Optimizing prompt construction

Returns

max_model_len: Maximum context length in tokens (int)

Returns an empty dictionary if the model name doesn’t match any known prefixes.

Return type

Dictionary with model metadata. Contains

Example

>>> llm = AnthropicLLM(model="claude-3-opus-20240229")
>>> info = llm.fetch_model_info()
>>> print(info)  # {"max_model_len": 200000}

Note

This method is called automatically during client initialization via _auto_fetch_model_info(). The result is stored in config.model_metadata and config.max_model_len.

Generate a completion using the Anthropic Claude API.

Sends a prompt to the Claude API and returns the model’s response. Supports both simple string prompts and OpenAI-style message lists. When using message lists, the method automatically converts them to Anthropic’s expected format, handling system messages appropriately.

Parameters

prompt –
The input prompt. Can be either: - A string: Converted to a single user message - A list of message dicts with ‘role’ and ‘content’ keys,

following OpenAI’s chat format. Roles can be ‘user’, ‘assistant’, or ‘system’.
model – Model identifier to use, overriding config.model. Example: “claude-3-opus-20240229”
temperature – Sampling temperature (0.0-1.0), overriding config. Higher values produce more random outputs.
max_tokens – Maximum tokens to generate, overriding config. Claude requires this parameter (unlike some other providers).
top_p – Nucleus sampling parameter, overriding config. Only applied if different from the default 0.95.
stop – List of stop sequences that will halt generation, overriding config.stop. Called “stop_sequences” in Anthropic API.
stream – Whether to stream the response, overriding config.stream. If True, returns an async iterator of chunks.
**kwargs – Additional Anthropic-specific parameters passed directly to the API, such as: - tools: List of tool definitions for function calling - tool_choice: How to select tools - metadata: Request metadata

Returns

A dictionary containing the API response with:

id: Unique message ID
type: “message”
role: “assistant”
content: List of content blocks (text, tool_use)
model: Model used
stop_reason: Why generation stopped
usage: Token usage statistics

If stream is True: An AsyncIterator yielding SSE event dicts.

Return type

If stream is False

Raises

RuntimeError – If the API request fails due to network errors, authentication issues, or API errors.

Example

# Simple string prompt response = await llm.generate_completion(“What is Python?”)

# Message list with system prompt messages = [

{“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “Hello!”}

] response = await llm.generate_completion(messages)

# With streaming async for chunk in await llm.generate_completion(“Hi”, stream=True):

print(chunk)

parse_tool_calls(raw_data: Any) → list[dict[str, Any]][source]#

Parse tool/function calls from an Anthropic API response.

Extracts tool use blocks from an Anthropic response and converts them to a standardized format compatible with the Calute framework. This method handles Anthropic’s content block structure where tool calls are embedded as “tool_use” type blocks.

Parameters

raw_data –

The API response dictionary from generate_completion(). Expected to have a “content” key containing a list of blocks: {

”content”: [
{“type”: “text”, “text”: “…”}, {

”type”: “tool_use”, “id”: “toolu_123”, “name”: “get_weather”, “input”: {“location”: “NYC”}

}

]

}

Returns

id: The tool use ID from Anthropic
- name: The function/tool name
- arguments: JSON string of the input parameters

Returns an empty list if: - raw_data is not a dictionary - raw_data has no “content” key - no tool_use blocks are present

Return type

List of standardized tool call dictionaries, each containing

Example

>>> response = await llm.generate_completion(prompt, tools=[...])
>>> tool_calls = llm.parse_tool_calls(response)
>>> for call in tool_calls:
...     args = json.loads(call["arguments"])
...     result = execute_function(call["name"], args)

async process_streaming_response(response: Any, callback: Callable[[str, Any], None]) → str[source]#

Process a streaming response from the Anthropic API.

Iterates through streaming response events and accumulates text content. For each content delta event, the callback is invoked with the new text and the raw event data.

This method is useful for real-time display of streaming output while also collecting the complete response.

Parameters

response – An async iterator of streaming events from generate_completion(stream=True). Events are dictionaries with a “type” field indicating the event type.
callback – A function called for each text chunk received. Signature: callback(text: str, raw_chunk: dict) - text: The incremental text content - raw_chunk: The raw event dictionary

Returns

The complete accumulated text content from all content_block_delta events in the stream.

Example

def on_chunk(text, chunk):: print(text, end=””, flush=True)

response = await llm.generate_completion(“Tell me a story”, stream=True) full_text = await llm.process_streaming_response(response, on_chunk) print(f”nTotal: {len(full_text)} chars”)

stream_completion(response: Any, agent: Any | None = None) → Iterator[dict[str, Any]][source]#

Stream completion chunks with function call detection.

Processes a synchronous streaming response from Anthropic, yielding standardized chunk dictionaries that include accumulated content and detected function/tool calls. This method provides a unified interface for handling streaming responses across different providers.

The method tracks: - Incremental text content from content_block_delta events - Accumulated content across the entire stream - Tool use blocks for function calling - Stream completion (message_stop event)

Parameters

response – A synchronous iterator of streaming events from the Anthropic API. Each event is either a dictionary or an object with a “type” attribute.
agent – Optional agent instance for advanced function detection. Currently reserved for future use.

Yields

dict –

Standardized chunk information with keys:

content: Text content in this chunk (str or None)
buffered_content: All text accumulated so far (str)
function_calls: List of detected function calls (list)
tool_calls: Raw tool call data (None for Anthropic)
raw_chunk: The original event data
is_final: True if this is the final chunk (bool)

Function call format in function_calls list:

{: “id”: “tool_use_id”, “name”: “function_name”, “arguments”: ‘{“arg”: “value”}’ # JSON string

}

Example

for chunk in llm.stream_completion(response):

if chunk[“content”]:

print(chunk[“content”], end=””)

if chunk[“is_final”]:

for call in chunk[“function_calls”]:: print(f”Function: {call[‘name’]}”)

calute.llms.anthropic

Contents

calute.llms.anthropic#