calute.llms.gemini#
Google Gemini LLM provider implementation.
This module provides integration with Google’s Generative AI (Gemini) models through the google-generativeai Python SDK. It supports both streaming and non-streaming completions, multi-turn conversations, and function calling.
The module handles: - Authentication via API key (config, environment variables GEMINI_API_KEY
or GOOGLE_API_KEY)
Message format conversion from standard chat format to Gemini format
Streaming response processing with callback support
Automatic model metadata fetching (token limits)
Tool/function call parsing from Gemini responses
Supported models include: - gemini-pro (default) - gemini-pro-vision - gemini-1.5-pro - gemini-1.5-flash - Other models available through the Gemini API
- Typical usage example:
from calute.llms.gemini import GeminiLLM from calute.llms.base import LLMConfig
- config = LLMConfig(
model=”gemini-1.5-pro”, temperature=0.7, max_tokens=2048, api_key=”your-api-key”
)
- async with GeminiLLM(config) as llm:
response = await llm.generate_completion(“Explain quantum computing”) content = llm.extract_content(response) print(content)
Note
Requires the google-generativeai package to be installed: pip install google-generativeai
- class calute.llms.gemini.GeminiLLM(config: calute.llms.base.LLMConfig | None = None, client: Any | None = None, **kwargs)[source]#
Bases:
BaseLLMGoogle Gemini LLM provider implementation.
GeminiLLM provides a complete integration with Google’s Generative AI (Gemini) API, offering text generation capabilities with support for both single prompts and multi-turn conversations.
This implementation handles the conversion between the standardized Calute message format and Gemini’s expected input format, manages streaming responses, and supports function calling for agentic workflows.
- config#
LLMConfig instance containing provider configuration.
- client#
Google GenerativeModel client instance for API calls.
- genai#
Reference to the google.generativeai module for configuration and model access.
Example
# Basic usage with string prompt llm = GeminiLLM(model=”gemini-pro”, api_key=”your-key”) response = await llm.generate_completion(“What is the meaning of life?”) print(llm.extract_content(response))
# Using with chat-style messages messages = [
{“role”: “system”, “content”: “You are a helpful assistant.”}, {“role”: “user”, “content”: “Hello!”},
] response = await llm.generate_completion(messages)
# Streaming response response = await llm.generate_completion(“Tell me a story”, stream=True) for chunk in llm.stream_completion(response):
print(chunk[“content”], end=””, flush=True)
Note
The Gemini API key can be provided via: 1. The config.api_key parameter 2. The GEMINI_API_KEY environment variable 3. The GOOGLE_API_KEY environment variable
- async astream_completion(response: Any, agent: Any | None = None) AsyncIterator[dict[str, Any]][source]#
Asynchronously stream completion chunks with function call detection.
Processes an asynchronous streaming response from the Gemini API, yielding standardized chunk dictionaries compatible with the Calute agent framework. This is the async counterpart to stream_completion().
This method enables non-blocking streaming of responses, allowing other async operations to proceed while waiting for chunks.
- Parameters
response – An asynchronous streaming response iterator from Gemini’s async generate_content method.
agent – Optional agent instance for function call detection. Currently not used in this implementation but provided for interface compatibility.
- Yields
Dictionary containing streaming chunk information – - content (str | None): Text content in this chunk - buffered_content (str): Accumulated content so far - function_calls (list): Detected function calls (empty) - tool_calls (Any): Raw tool call data (None for Gemini) - raw_chunk (Any): The original Gemini chunk object - is_final (bool): Whether this is the final chunk
Example
response = await llm.generate_completion(“Hello”, stream=True) async for chunk in llm.astream_completion(response):
- if chunk[“content”]:
print(chunk[“content”], end=””, flush=True)
- extract_content(response: Any) str[source]#
Extract the text content from a Gemini API response.
Parses the GenerateContentResponse object to extract the generated text. Handles multiple response formats including direct text access and candidate-based structures.
The method attempts extraction in the following order: 1. Direct .text attribute (simplest case) 2. First candidate’s content parts (structured response)
- Parameters
response – A GenerateContentResponse object from the Gemini API. Can be either a complete response or a streaming chunk.
- Returns
The extracted text content as a string. Returns an empty string if no text content is found or the response structure is unexpected.
Example
response = await llm.generate_completion(“Hello”) text = llm.extract_content(response) print(text) # “Hello! How can I help you today?”
- fetch_model_info() dict[str, Any][source]#
Fetch model metadata from the Gemini API.
Retrieves information about the configured model from Google’s model registry, including token limits and capabilities. This information is used to optimize token usage and prevent context overflow errors.
The method is called automatically during client initialization via _auto_fetch_model_info() to populate config.max_model_len and config.model_metadata.
- Returns
- max_model_len (int | None): Maximum input tokens accepted
output_token_limit (int | None): Maximum output tokens
Returns an empty dictionary if the model info cannot be fetched (e.g., network error, invalid model name).
- Return type
A dictionary containing model metadata
Note
This method silently catches exceptions to prevent initialization failures when model info is unavailable.
- async generate_completion(prompt: str | list[dict[str, str]], model: str | None = None, temperature: float | None = None, max_tokens: int | None = None, top_p: float | None = None, stop: list[str] | None = None, stream: bool | None = None, **kwargs) Any[source]#
Generate a completion using the Google Gemini API.
Sends a prompt to the Gemini API and returns the generated response. Supports both single text prompts and chat-style message lists. When streaming is enabled, returns an iterator for processing chunks.
- Parameters
prompt –
The input for generation. Can be either: - A string containing the prompt text - A list of message dictionaries with ‘role’ and ‘content’ keys
(will be formatted using _format_messages_for_gemini)
model – Optional model override. If different from config.model, a new GenerativeModel client is created for this request.
temperature – Sampling temperature override (0.0 to 1.0). Higher values produce more random output.
max_tokens – Maximum number of tokens to generate in the response.
top_p – Nucleus sampling parameter override (0.0 to 1.0).
stop – List of sequences that will stop generation when encountered.
stream – Whether to stream the response. If True, returns a streaming response iterator instead of a complete response.
**kwargs – Additional Gemini-specific parameters passed directly to the generate_content method (e.g., safety_settings).
- Returns
- A GenerateContentResponse object containing
the complete generated text and metadata.
- If stream=True: A streaming response iterator that yields
chunks as they are generated.
- Return type
If stream=False
- Raises
RuntimeError – If the Gemini API request fails for any reason, wrapping the original exception with context.
Example
# Simple text completion response = await llm.generate_completion(“Explain photosynthesis”) text = llm.extract_content(response)
# Chat-style with messages messages = [
{“role”: “user”, “content”: “What’s 2+2?”}, {“role”: “assistant”, “content”: “4”}, {“role”: “user”, “content”: “And 3+3?”},
] response = await llm.generate_completion(messages)
# Streaming response response = await llm.generate_completion(“Write a poem”, stream=True) async for chunk in response:
print(chunk.text, end=””)
- parse_tool_calls(raw_data: Any) list[dict[str, Any]][source]#
Parse tool/function calls from Gemini response format.
Extracts function call information from a Gemini API response and converts it to the standardized Calute tool call format. Gemini function calls are embedded within content parts of response candidates.
This method enables agentic workflows where the model can request execution of predefined functions and receive their results.
- Parameters
raw_data – A GenerateContentResponse object from the Gemini API that may contain function call requests in its candidates.
- Returns
- id (str | None): Function call identifier (if available)
name (str): Name of the function to call
arguments (str): String representation of function arguments
Returns an empty list if no function calls are found.
- Return type
A list of tool call dictionaries, each containing
Example
response = await llm.generate_completion(prompt, tools=tools) tool_calls = llm.parse_tool_calls(response) for call in tool_calls:
result = execute_function(call[“name”], call[“arguments”])
- async process_streaming_response(response: Any, callback: Callable[[str, Any], None]) str[source]#
Process a streaming response from Gemini with callback support.
Iterates through the streaming response chunks from the Gemini API, extracting text content from each chunk and invoking the provided callback function. Accumulates and returns the complete response.
This method is useful for real-time display of generated content or for implementing progress indicators during long generations.
- Parameters
response – A streaming response iterator from Gemini’s generate_content method (called with stream=True).
callback – A function called for each chunk with two arguments: - content (str): The text content in the current chunk - chunk (Any): The raw chunk object from Gemini API
- Returns
The complete accumulated text content from all chunks concatenated together.
Example
- def on_chunk(content: str, chunk: Any) -> None:
print(content, end=””, flush=True)
response = await llm.generate_completion(“Tell a story”, stream=True) full_text = await llm.process_streaming_response(response, on_chunk) print(f”nTotal length: {len(full_text)}”)
- stream_completion(response: Any, agent: Any | None = None) Iterator[dict[str, Any]][source]#
Stream completion chunks with function call detection.
Processes a synchronous streaming response from the Gemini API, yielding standardized chunk dictionaries compatible with the Calute agent framework. Tracks accumulated content and provides metadata for each chunk.
This method is used internally by agents to process streaming responses while detecting potential function calls in the output.
- Parameters
response – A synchronous streaming response iterator from Gemini’s generate_content method (stream=True).
agent – Optional agent instance for function call detection. Currently not used in this implementation but provided for interface compatibility.
- Yields
Dictionary containing streaming chunk information – - content (str | None): Text content in this chunk - buffered_content (str): Accumulated content so far - function_calls (list): Detected function calls (empty) - tool_calls (Any): Raw tool call data (None for Gemini) - raw_chunk (Any): The original Gemini chunk object - is_final (bool): Whether this is the final chunk
Example
response = await llm.generate_completion(“Hello”, stream=True) for chunk in llm.stream_completion(response):
- if chunk[“content”]:
print(chunk[“content”], end=””)
- if chunk[“is_final”]:
print(“n— Generation complete —”)