calute.tools.web_tools

Web-related tools for fetching, scraping, and processing web content.

This module provides a collection of web tools that agents can use to interact with web resources. It includes:

  • Web scraping with content extraction and CSS selectors

  • A generic HTTP API client for RESTful interactions

  • RSS/Atom feed reading and parsing

  • URL analysis and validation

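Every tool provides a synchronous static_call. APIClient, RSSReader, and WebScraper also provide an asynchronous async_call. Most tools report failures through an error key in the returned dictionary, and APIClient and WebScraper accept a configurable timeout in seconds.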

Example

>>> from calute.tools.web_tools import WebScraper
>>> result = WebScraper.static_call("https://example.com", extract_links=True)
>>> print(result["title"])

class calute.tools.web_tools.APIClient(name, bases, namespace, /, **kwargs)

Bases: AgentBaseFn

Generic API client for making HTTP requests.

Provides a flexible HTTP client for interacting with REST APIs. Supports the common HTTP methods (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS), custom headers, query parameters, JSON payloads, and raw data bodies.

The client parses JSON responses automatically and handles network and HTTP errors, reporting failures through the error key of the returned dictionary instead of raising exceptions.

Inherits from AgentBaseFn for agent integration.

Example

>>> result = APIClient.static_call(
...     "https://api.example.com/data",
...     method="POST",
...     json_data={"key": "value"}
... )
>>> print(result["status_code"])

async static async_call(url: str, method: str = 'GET', headers: dict[str, str] | None = None, params: dict[str, Any] | None = None, json_data: dict[str, Any] | None = None, data: str | None = None, timeout: int = 30, **context_variables) → dict[str, Any]

Make HTTP API requests with various methods and payloads.

Sends an HTTP request to the specified URL with the given method and payload options. Automatically follows redirects and parses JSON responses when available.

Parameters
  • url – The API endpoint URL.

  • method – HTTP method (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS).

  • headers – Dictionary of request headers.

  • params – Dictionary of URL query parameters.

  • json_data – Dictionary to be sent as JSON in request body.

  • data – Raw string data for request body.

  • timeout – Request timeout in seconds (default: 30).

  • **context_variables – Additional context passed from the agent.
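To make the parameter list concrete, here is a minimal standard-library sketch of how these arguments could map onto an HTTP request. It is not APIClient's implementation. build_request is a hypothetical helper built on urllib.request that only constructs the request without sending it. It assumes json_data takes precedence over data when both are given, because that case is not documented here. Unlike APIClient, it raises ValueError for an unsupported method instead of returning an error dictionary.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any

# The methods listed in the parameter description above.
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"}


def build_request(
    url: str,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    params: dict[str, Any] | None = None,
    json_data: dict[str, Any] | None = None,
    data: str | None = None,
) -> urllib.request.Request:
    """Hypothetical helper: map APIClient-style arguments onto a urllib Request (not sent)."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported HTTP method: {method}")

    # Append query parameters, keeping any query string already in the URL.
    if params:
        separator = "&" if urllib.parse.urlsplit(url).query else "?"
        url = f"{url}{separator}{urllib.parse.urlencode(params)}"

    request_headers = dict(headers or {})
    body = None
    if json_data is not None:
        # Assumption: json_data wins if both json_data and data are supplied.
        body = json.dumps(json_data).encode("utf-8")
        request_headers.setdefault("Content-Type", "application/json")
    elif data is not None:
        body = data.encode("utf-8")

    return urllib.request.Request(url, data=body, headers=request_headers, method=method)


req = build_request("https://api.example.com/data", method="post",
                    params={"page": 2}, json_data={"key": "value"})
print(req.get_method(), req.full_url)
```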

Returns

Dictionary containing:

  • status_code: HTTP response status code

  • headers: Response headers as a dictionary

  • url: Final URL after redirects

  • json: Parsed JSON response (if applicable)

  • text: Response text, truncated to 10,000 characters (if the response is not JSON)

  • error: Error message if the request failed

Return type

dict[str, Any]

Note

This method does not raise on failure. If the HTTP method is invalid or the request fails, it returns a dictionary whose error key describes the problem.
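Because failures come back in the returned dictionary rather than as exceptions, callers should check for an error before reading the body. The sketch below uses a hypothetical describe_result helper to branch on the documented keys. It assumes error is absent or empty on success and that json is absent or None for non-JSON responses.

```python
from __future__ import annotations

from typing import Any


def describe_result(result: dict[str, Any]) -> str:
    """Summarize a dict shaped like the documented APIClient return value."""
    # Failures arrive in-band under "error" rather than as exceptions.
    if result.get("error"):
        return f"request failed: {result['error']}"

    status = result.get("status_code")
    body = result.get("json")
    if body is not None:
        # Parsed JSON is available; report its shape.
        kind = f"object with keys {sorted(body)}" if isinstance(body, dict) else type(body).__name__
        return f"HTTP {status}: JSON {kind}"

    # Otherwise fall back to the text body (truncated by the tool).
    text = result.get("text") or ""
    return f"HTTP {status}: text ({len(text)} chars)"


print(describe_result({"status_code": 200, "json": {"id": 1, "name": "demo"}}))
print(describe_result({"error": "Request timed out"}))
```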

static static_call(url: str, method: str = 'GET', headers: dict[str, str] | None = None, params: dict[str, Any] | None = None, json_data: dict[str, Any] | None = None, data: str | None = None, timeout: int = 30, **context_variables) → dict[str, Any]

Synchronous wrapper for API requests.

Provides a blocking interface to the async API client. Uses asyncio.run() to execute the async method.

Parameters
  • url – The API endpoint URL.

  • method – HTTP method (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS).

  • headers – Dictionary of request headers.

  • params – Dictionary of URL query parameters.

  • json_data – Dictionary to be sent as JSON in request body.

  • data – Raw string data for request body.

  • timeout – Request timeout in seconds.

  • **context_variables – Additional context passed from the agent.

Returns

Dictionary with response data and metadata.
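The wrapper pattern described above can be sketched with the standard library alone. fake_async_call is a hypothetical stand-in for async_call, so the example runs without network access. Because the wrapper uses asyncio.run(), it cannot be called from code that is already running inside an event loop, where asyncio.run() raises RuntimeError. Async code should await async_call directly.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def fake_async_call(url: str, timeout: int = 30) -> dict[str, Any]:
    """Hypothetical stand-in for an async_call coroutine; performs no network I/O."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"url": url, "status_code": 200}


def fake_static_call(url: str, timeout: int = 30) -> dict[str, Any]:
    """Blocking wrapper: drive the coroutine to completion with asyncio.run()."""
    coro = fake_async_call(url, timeout=timeout)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # never started if a loop was already running; avoids a warning
        raise


async def call_from_async_code(url: str) -> dict[str, Any]:
    """Inside a running loop the blocking wrapper fails, so await the coroutine."""
    try:
        return fake_static_call(url)
    except RuntimeError:
        # asyncio.run() cannot be called from a running event loop.
        return await fake_async_call(url)


print(fake_static_call("https://api.example.com/data"))
print(asyncio.run(call_from_async_code("https://api.example.com/data")))
```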

class calute.tools.web_tools.RSSReader(name, bases, namespace, /, **kwargs)

Bases: AgentBaseFn

RSS/Atom feed reader and parser.

Provides functionality for reading and parsing RSS and Atom feeds. Extracts feed metadata and individual entries with configurable content inclusion and item limits.

Uses the feedparser library for robust feed parsing that handles various feed formats and edge cases.

Inherits from AgentBaseFn for agent integration.

Example

>>> result = RSSReader.static_call(
...     "https://example.com/feed.xml",
...     max_items=5
... )
>>> for item in result["items"]:
...     print(item["title"])

async static async_call(feed_url: str, max_items: int = 10, include_content: bool = True, **context_variables) → dict[str, Any]

Read and parse RSS/Atom feeds.

Fetches and parses an RSS or Atom feed from the specified URL, extracting feed metadata and individual entries.

Parameters
  • feed_url – URL of the RSS/Atom feed.

  • max_items – Maximum number of items to return (default: 10).

  • include_content – Whether to include full article content.

  • **context_variables – Additional context passed from the agent.

Returns

Dictionary containing:

  • title: Feed title

  • description: Feed description

  • link: Feed website link

  • updated: Last update timestamp

  • items: List of feed entries, each with title, link, published, author, and tags, plus content when include_content is True

  • error: Error message if parsing failed

Return type

dict[str, Any]

Note

Requires the feedparser library. Install it with: pip install feedparser
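RSSReader relies on feedparser, which also handles Atom and many malformed feeds. To show how max_items and include_content shape the result, the sketch below swaps in xml.etree.ElementTree from the standard library instead. It handles only well-formed RSS 2.0. The parse_rss helper is hypothetical, and mapping each item's <description> to content is an assumption of this sketch.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Feed</title>
  <link>https://example.com/</link>
  <description>Demo channel</description>
  <item><title>First</title><link>https://example.com/1</link>
        <description>Body one</description></item>
  <item><title>Second</title><link>https://example.com/2</link>
        <description>Body two</description></item>
  <item><title>Third</title><link>https://example.com/3</link></item>
</channel></rss>"""


def parse_rss(xml_text: str, max_items: int = 10, include_content: bool = True) -> dict[str, Any]:
    """Parse plain RSS 2.0 into a dict loosely shaped like RSSReader's result."""
    try:
        channel = ET.fromstring(xml_text).find("channel")
    except ET.ParseError as exc:
        return {"error": f"Failed to parse feed: {exc}"}
    if channel is None:
        return {"error": "No <channel> element (Atom is not handled by this sketch)"}

    items = []
    for entry in channel.findall("item")[:max_items]:  # honour max_items
        item = {"title": entry.findtext("title", ""), "link": entry.findtext("link", "")}
        if include_content:
            item["content"] = entry.findtext("description", "")
        items.append(item)

    return {
        "title": channel.findtext("title", ""),
        "link": channel.findtext("link", ""),
        "description": channel.findtext("description", ""),
        "items": items,
    }


print([item["title"] for item in parse_rss(SAMPLE_RSS, max_items=2)["items"]])
```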

static static_call(feed_url: str, max_items: int = 10, include_content: bool = True, **context_variables) → dict[str, Any]

Synchronous wrapper for RSS reading.

Provides a blocking interface to the async RSS reader. Uses asyncio.run() to execute the async method.

Parameters
  • feed_url – URL of the RSS/Atom feed.

  • max_items – Maximum number of items to return.

  • include_content – Whether to include full article content.

  • **context_variables – Additional context passed from the agent.

Returns

Dictionary with parsed feed data and articles.

class calute.tools.web_tools.URLAnalyzer(name, bases, namespace, /, **kwargs)

Bases: AgentBaseFn

Analyze and extract information from URLs.

Provides URL parsing, validation, and optional availability checking. Extracts URL components like scheme, domain, path, and query parameters. Can also fetch and extract page metadata including Open Graph tags.

Inherits from AgentBaseFn for agent integration.

Example

>>> result = URLAnalyzer.static_call(
...     "https://example.com/path?query=value",
...     check_availability=True
... )
>>> print(result["domain"])

static static_call(url: str, check_availability: bool = False, extract_metadata: bool = True, **context_variables) → dict[str, Any]

Analyze URL structure and optionally check availability.

Parses the URL to extract its components and optionally checks if the URL is accessible and extracts page metadata.

Parameters
  • url – URL to analyze.

  • check_availability – Whether to check if URL is accessible.

  • extract_metadata – Whether to extract page metadata. Metadata comes from the fetched page, so this has an effect only when check_availability is True and the URL is reachable.

  • **context_variables – Additional context passed from the agent.
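The structural fields returned by this method correspond to the six components returned by urllib.parse.urlparse, with netloc reported as domain. A rough approximation of that part of the analysis looks like this. analyze_url is a hypothetical helper, and it does not perform the availability check. Its tld/domain_name/subdomain split is a naive dot split that mishandles multi-label suffixes such as co.uk. A public-suffix-aware library handles those correctly.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse


def analyze_url(url: str) -> dict[str, Any]:
    """Split a URL into components, loosely mirroring URLAnalyzer's documented keys."""
    parts = urlparse(url)
    info: dict[str, Any] = {
        "url": url,
        "scheme": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        "params": parts.params,
        "query": parts.query,
        "fragment": parts.fragment,
        # "Valid" here means: has both a scheme and a network location.
        "is_valid": bool(parts.scheme and parts.netloc),
    }
    # Naive split on dots: wrong for multi-label suffixes such as "co.uk".
    labels = (parts.hostname or "").split(".")
    if len(labels) >= 2:
        info["tld"] = labels[-1]
        info["domain_name"] = labels[-2]
        info["subdomain"] = ".".join(labels[:-2])
    return info


print(analyze_url("https://blog.example.com/path;v1?query=value#top"))
```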

Returns

Dictionary containing:

  • url: Original URL

  • scheme: URL scheme (http, https, etc.)

  • domain: Full domain/netloc

  • path: URL path

  • params: URL parameters

  • query: Query string

  • fragment: URL fragment

  • is_valid: Whether the URL has a valid scheme and domain

  • tld: Top-level domain (if extractable)

  • domain_name: Primary domain name

  • subdomain: Subdomain, if present

  • is_available: Whether the URL responds (if checked)

  • status_code: HTTP status code (if checked)

  • final_url: URL after redirects (if checked)

  • title: Page title (if metadata extracted)

  • open_graph: Open Graph tags (if present)

  • description: Meta description (if present)

Return type

dict[str, Any]
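When metadata extraction runs, the title, description, and Open Graph values come from the page's <head>. The sketch below uses the standard library's html.parser.HTMLParser. MetaExtractor is hypothetical. It keeps Open Graph keys under their full og: property names, which is a choice of this sketch and not necessarily how URLAnalyzer keys its open_graph dictionary.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title>, meta description, and Open Graph (og:*) properties."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description: str | None = None
        self.open_graph: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attr = {name: value or "" for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attr.get("property", "")
            if prop.startswith("og:"):
                # Keep the full property name, e.g. "og:title".
                self.open_graph[prop] = attr.get("content", "")
            elif attr.get("name", "").lower() == "description":
                self.description = attr.get("content", "")

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.title += data


page = """<html><head><title>Example Domain</title>
<meta name="description" content="A demo page">
<meta property="og:title" content="Example OG Title"/>
</head><body></body></html>"""
parser = MetaExtractor()
parser.feed(page)
parser.close()
print(parser.title, parser.description, parser.open_graph)
```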

class calute.tools.web_tools.WebScraper(name, bases, namespace, /, **kwargs)

Bases: AgentBaseFn

Advanced web scraper with content extraction.

Provides comprehensive web scraping capabilities including HTML parsing, CSS selector-based content extraction, link and image extraction, and metadata parsing. Supports both async and sync execution modes.

The scraper uses BeautifulSoup for HTML parsing and httpx for HTTP requests with automatic redirect following.

Inherits from AgentBaseFn for agent integration.

Example

>>> result = WebScraper.static_call(
...     "https://example.com",
...     selector="article",
...     extract_links=True
... )
>>> print(result["title"])

async static async_call(url: str, selector: str | None = None, extract_links: bool = False, extract_images: bool = False, clean_text: bool = True, timeout: int = 30, **context_variables) → dict[str, Any]

Scrape web content with advanced extraction options.

Fetches the specified URL and extracts content based on the provided options. Supports CSS selector-based extraction, link/image extraction, and automatic text cleaning.

Parameters
  • url – The URL to scrape.

  • selector – CSS selector for specific content (requires beautifulsoup4).

  • extract_links – Whether to extract all links from the page.

  • extract_images – Whether to extract all images from the page.

  • clean_text – Whether to clean and format extracted text.

  • timeout – Request timeout in seconds (default: 30).

  • **context_variables – Additional context passed from the agent.

Returns

Dictionary containing:

  • url: Final URL after redirects

  • status_code: HTTP response status code

  • title: Page title, if available

  • content / selected_content: Extracted text content

  • links: List of extracted links (if requested)

  • images: List of extracted images (if requested)

  • meta: Dictionary of meta tags

  • error: Error message if the request failed

Return type

dict[str, Any]

Note

Requires beautifulsoup4 for HTML parsing.
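WebScraper does its parsing with BeautifulSoup. The sketch below approximates the extract_links, extract_images, and clean_text options with the standard library's html.parser instead, and it does not support CSS selectors. LinkImageExtractor is hypothetical. Resolving relative URLs against the page URL with urljoin is an assumption of this sketch. So is treating "clean" as collapsing whitespace.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageExtractor(HTMLParser):
    """Collect absolute link/image URLs and visible text from an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[dict[str, str]] = []
        self._chunks: list[str] = []
        self._skip_depth = 0  # >0 while inside <script>/<style>

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attr = {name: value or "" for name, value in attrs}
        if tag == "a" and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attr["href"]))
        elif tag == "img" and attr.get("src"):
            self.images.append({"src": urljoin(self.base_url, attr["src"]),
                                "alt": attr.get("alt", "")})
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self, clean: bool = True) -> str:
        raw = " ".join(self._chunks)
        # "Cleaning" here only collapses whitespace runs and trims the ends.
        return re.sub(r"\s+", " ", raw).strip() if clean else raw


html = ('<h1>Hello</h1><p>See <a href="/docs">the docs</a> today</p>'
        '<img src="logo.png" alt="Logo"><script>var x = 1;</script>')
extractor = LinkImageExtractor("https://example.com/blog/")
extractor.feed(html)
extractor.close()
print(extractor.links, extractor.images, extractor.text())
```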

static static_call(url: str, selector: str | None = None, extract_links: bool = False, extract_images: bool = False, clean_text: bool = True, timeout: int = 30, **context_variables) → dict[str, Any]

Synchronous wrapper for async web scraping.

Provides a blocking interface to the async scraping functionality. Uses asyncio.run() to execute the async method.

Parameters
  • url – The URL to scrape.

  • selector – CSS selector for specific content.

  • extract_links – Whether to extract all links from the page.

  • extract_images – Whether to extract all images from the page.

  • clean_text – Whether to clean and format extracted text.

  • timeout – Request timeout in seconds.

  • **context_variables – Additional context passed from the agent.

Returns

Dictionary with scraped content and metadata.
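Each static_call blocks until its asyncio.run() completes, so scraping a list of pages from synchronous code runs the requests one after another. From async code, awaiting several async_call coroutines together lets them overlap. The sketch below caps the number of requests in flight with asyncio.Semaphore. fake_scrape is a hypothetical stand-in for WebScraper.async_call, so the example runs offline. The limit of 3 is an arbitrary illustrative choice.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def fake_scrape(url: str, timeout: int = 30) -> dict[str, Any]:
    """Hypothetical stand-in for WebScraper.async_call; performs no network I/O."""
    await asyncio.sleep(0.01)  # simulate waiting on the network
    return {"url": url, "status_code": 200}


async def scrape_many(urls: list[str], limit: int = 3) -> list[dict[str, Any]]:
    """Scrape URLs concurrently, with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def scrape_one(url: str) -> dict[str, Any]:
        async with semaphore:
            return await fake_scrape(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(scrape_one(u) for u in urls))


results = asyncio.run(scrape_many([f"https://example.com/page/{i}" for i in range(5)]))
print([r["url"] for r in results])
```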