
LLM Skills

WebAgents provides a harmonized interface for interacting with various Large Language Model (LLM) providers. Whether you are using Google Gemini, OpenAI, Anthropic Claude, or xAI Grok, the configuration patterns for tools and reasoning capabilities remain consistent.

Every LLM skill is a thin wrapper over a shared provider adapter (webagents/typescript/src/adapters/). Adapters own all provider-specific logic — request building, stream parsing, media support declarations — while skills focus on lifecycle, context, and billing integration. This architecture means adding a new provider requires only a new adapter; billing, media handling, and tool pricing work automatically.

Supported Providers

  • Google: Gemini models via shared googleAdapter.
  • OpenAI: GPT and o-series models via shared openaiAdapter.
  • Anthropic: Claude models via shared anthropicAdapter.
  • xAI (Grok): Grok models via shared xaiAdapter (OpenAI-compatible).
  • Fireworks: Open-weight models via shared fireworksAdapter (OpenAI-compatible).

Configuration

LLM skills are configured in your AGENT.md file under the skills section, or passed dynamically at runtime.

Basic Configuration

All LLM skills accept these common parameters:

skills:
  - google:
      model: "gemini-2.5-flash"
      temperature: 0.7
      max_tokens: 8192
      api_key: "${GOOGLE_API_KEY}" # Optional, uses env var by default

Harmonized "Thinking" Configuration

Enable internal reasoning (Chain of Thought) across supported models using a unified syntax. The framework automatically translates this to the provider's specific API parameters (e.g., thinking_config for Google, reasoning_effort for OpenAI o1, thinking block for Anthropic).

skills:
  - google:
      model: "gemini-2.5-flash"
      thinking:
        enabled: true
        budget_tokens: 4096  # Token budget for thinking
        effort: "medium"     # Alternative to budget: low (1k), medium (4k), high (8k)

Parameter      Type    Description
enabled        bool    Activates reasoning/thinking mode.
budget_tokens  int     Maximum number of tokens to allocate for thoughts.
effort         string  Abstract effort level: low, medium, high. Used if budget_tokens is not set.
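As a sketch of the translation described above, the harmonized thinking config could be mapped to provider-specific parameters roughly as follows. The type names (`ThinkingConfig`), the exact budget values behind each effort level, and the default effort are illustrative assumptions; only the provider parameter names (`thinking_config`, `reasoning_effort`, `thinking`) come from the text above.

```typescript
// Hypothetical translation of the harmonized thinking config into
// provider-specific request parameters. Type names and defaults are
// assumptions for illustration.
type Effort = 'low' | 'medium' | 'high';

interface ThinkingConfig {
  enabled: boolean;
  budget_tokens?: number;
  effort?: Effort;
}

// Effort levels documented above: low (1k), medium (4k), high (8k).
const EFFORT_BUDGETS: Record<Effort, number> = { low: 1024, medium: 4096, high: 8192 };

function resolveBudget(cfg: ThinkingConfig): number {
  // budget_tokens takes precedence; otherwise fall back to the effort level.
  return cfg.budget_tokens ?? EFFORT_BUDGETS[cfg.effort ?? 'medium'];
}

function toProviderParams(provider: string, cfg: ThinkingConfig): object {
  if (!cfg.enabled) return {};
  switch (provider) {
    case 'google':
      return { thinking_config: { thinking_budget: resolveBudget(cfg) } };
    case 'openai':
      return { reasoning_effort: cfg.effort ?? 'medium' };
    case 'anthropic':
      return { thinking: { type: 'enabled', budget_tokens: resolveBudget(cfg) } };
    default:
      return {};
  }
}
```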

Built-in Tools Configuration

Enable provider-specific built-in tools using harmonized names where possible.

skills:
  - google:
      tools:
        - web_search        # Maps to Google Search
        - code_execution    # Maps to Google Code Execution
  
  - openai:
      tools:
        - web_search        # Maps to OpenAI Web Search
        - code_interpreter  # Maps to OpenAI Code Interpreter
        
  - anthropic:
      tools:
        - computer_use      # Maps to Claude Computer Use
        - bash              # Maps to Bash tool
        - text_editor       # Maps to Text Editor tool

Common Tool Names:

  • web_search: General web search capability.
  • code_execution / code_interpreter: Python code execution sandbox.
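The "maps to" behavior above could be sketched as a per-provider alias table. The native identifiers below are assumptions for illustration (modeled on each provider's public tool naming), not the framework's actual mapping:

```typescript
// Hypothetical harmonized → provider-native tool name mapping.
// Native identifiers are illustrative assumptions.
const TOOL_ALIASES: Record<string, Record<string, string>> = {
  google: { web_search: 'google_search', code_execution: 'code_execution' },
  openai: { web_search: 'web_search_preview', code_interpreter: 'code_interpreter' },
  anthropic: {
    computer_use: 'computer_20250124',
    bash: 'bash_20250124',
    text_editor: 'text_editor_20250124',
  },
};

function resolveTool(provider: string, name: string): string {
  const native = TOOL_ALIASES[provider]?.[name];
  if (!native) throw new Error(`Tool "${name}" is not supported by provider "${provider}"`);
  return native;
}
```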

Shared Adapter Architecture

All LLM skills delegate provider-specific work to shared adapters defined in webagents/typescript/src/adapters/. Each adapter implements the LLMAdapter interface:

interface LLMAdapter {
  provider: string;
  mediaSupport: Record<string, 'base64' | 'url'>;
  buildRequest(params: AdapterRequestParams): AdapterRequest;
  parseStream(response: Response): AsyncGenerator<AdapterChunk>;
}
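A minimal adapter conforming to this interface might look like the sketch below. The shapes of AdapterRequestParams, AdapterRequest, and AdapterChunk are simplified assumptions (the real types live alongside the interface), and the response parameter is narrowed to just the `text()` method rather than a full fetch Response:

```typescript
// Simplified stand-ins for the real adapter types.
interface AdapterRequestParams { model: string; messages: { role: string; content: string }[]; maxTokens?: number; }
interface AdapterRequest { url: string; headers: Record<string, string>; body: string; }
interface AdapterChunk { delta?: string; done?: boolean; usage?: { input_tokens: number; output_tokens: number }; }

// Hypothetical adapter for an OpenAI-compatible endpoint.
const exampleAdapter = {
  provider: 'example',
  // Declares that images are accepted as URLs only; no other modalities.
  mediaSupport: { images: 'url' } as Record<string, 'base64' | 'url'>,

  buildRequest(params: AdapterRequestParams): AdapterRequest {
    return {
      url: 'https://api.example.com/v1/chat/completions',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: params.model, messages: params.messages, max_tokens: params.maxTokens }),
    };
  },

  // Toy newline-delimited parsing; real adapters handle SSE framing,
  // tool calls, and provider-specific usage reporting.
  async *parseStream(response: { text(): Promise<string> }): AsyncGenerator<AdapterChunk> {
    const text = await response.text();
    for (const line of text.split('\n').filter(Boolean)) {
      yield JSON.parse(line) as AdapterChunk;
    }
  },
};
```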

Media Support

Each adapter declares which content modalities it supports and how (base64 inline data vs URL reference). The MediaSkill reads these declarations to automatically convert content to the right format before an LLM call.

Provider    Images   Audio    Documents  Video
Google      base64   base64   base64     base64
OpenAI      url      base64   —          —
Anthropic   base64   —        base64     —
xAI         url      —        —          —
Fireworks   url      —        —          —
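A MediaSkill-style helper could consult these declarations before the call, as in the sketch below. The `planMedia` helper and the support maps are illustrative assumptions; only the declaration shape (`Record<string, 'base64' | 'url'>`) comes from the adapter interface above:

```typescript
type MediaFormat = 'base64' | 'url';

// Returns the wire format the provider declares for a modality,
// or null if the provider cannot accept that modality at all.
function planMedia(mediaSupport: Record<string, MediaFormat>, modality: string): MediaFormat | null {
  return mediaSupport[modality] ?? null;
}

// Example declarations, mirroring the table above.
const googleSupport: Record<string, MediaFormat> = {
  images: 'base64', audio: 'base64', documents: 'base64', video: 'base64',
};
const xaiSupport: Record<string, MediaFormat> = { images: 'url' };
```

A caller would then convert content to base64 inline data, rewrite it as a URL reference, or reject the attachment depending on the result.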

Context Integration

Every LLM skill sets two context fields that other skills (PaymentSkill, MediaSkill) depend on:

  • _llm_capabilities: Set before the LLM call with model name, pricing rates, and max output tokens. Used by PaymentSkill to lock funds.
  • _llm_usage: Set after the LLM call with actual token counts, model used, and is_byok flag. Used by PaymentSkill to settle charges.
// Before call
context.set('_llm_capabilities', {
  model: 'gemini-2.5-flash',
  pricing: { inputPer1kTokens: 0.00015, outputPer1kTokens: 0.0006 },
  maxOutputTokens: 8192,
});

// After call (from streaming done event)
context.set('_llm_usage', {
  model: 'gemini-2.5-flash',
  input_tokens: 1200,
  output_tokens: 450,
  is_byok: false,
});
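Settlement from these two fields can be sketched as below. The `settle` helper is an illustrative assumption; the field names match the snippets above:

```typescript
interface Pricing { inputPer1kTokens: number; outputPer1kTokens: number; }
interface Usage { input_tokens: number; output_tokens: number; is_byok: boolean; }

// Hypothetical settlement: token counts times per-1k rates.
function settle(pricing: Pricing, usage: Usage): number {
  // BYOK (bring-your-own-key) calls are assumed not to incur token charges.
  if (usage.is_byok) return 0;
  return (
    (usage.input_tokens / 1000) * pricing.inputPer1kTokens +
    (usage.output_tokens / 1000) * pricing.outputPer1kTokens
  );
}
```

For the example values above (1200 input and 450 output tokens at the gemini-2.5-flash rates shown), this yields 0.00018 + 0.00027 ≈ $0.00045.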

Developer Usage

When using the Python API directly:

from webagents.agents.skills.core.llm.google.skill import GoogleAISkill

config = {
    "model": "gemini-2.5-flash",
    "thinking": {
        "enabled": True,
        "effort": "high"
    }
}

skill = GoogleAISkill(config)
response = await skill.chat_completion(messages=[...])

Adding a New Provider

  1. Create an adapter in webagents/typescript/src/adapters/your-provider.ts implementing LLMAdapter.
  2. Register it in webagents/typescript/src/adapters/index.ts via getAdapter().
  3. Create a thin skill wrapper in webagents/typescript/src/skills/llm/your-provider/skill.ts that calls adapter.buildRequest() and adapter.parseStream(), and sets _llm_capabilities / _llm_usage on the context.

Billing, media resolution, and tool pricing will work automatically through the PaymentSkill and MediaSkill hooks.
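The step-3 skill wrapper can be sketched as follows. A Map stands in for the real context object, `send` abstracts the HTTP transport, and the placeholder pricing values are assumptions; only `buildRequest`, `parseStream`, `_llm_capabilities`, and `_llm_usage` come from the text above:

```typescript
interface StreamChunk { delta?: string; usage?: { input_tokens: number; output_tokens: number }; }

interface MinimalAdapter {
  buildRequest(p: { model: string; messages: object[] }): { url: string; body: string };
  parseStream(response: unknown): AsyncGenerator<StreamChunk>;
}

async function runSkill(
  adapter: MinimalAdapter,
  send: (req: { url: string; body: string }) => Promise<unknown>,
  context: Map<string, unknown>,
  model: string,
  messages: object[],
): Promise<string> {
  // Before the call: declare capabilities so PaymentSkill can lock funds.
  // Pricing values here are placeholders, not real rates.
  context.set('_llm_capabilities', {
    model,
    pricing: { inputPer1kTokens: 0.0001, outputPer1kTokens: 0.0004 },
    maxOutputTokens: 8192,
  });

  const response = await send(adapter.buildRequest({ model, messages }));

  // During streaming: accumulate text and capture final usage for settlement.
  let text = '';
  for await (const chunk of adapter.parseStream(response)) {
    if (chunk.delta) text += chunk.delta;
    if (chunk.usage) context.set('_llm_usage', { model, ...chunk.usage, is_byok: false });
  }
  return text;
}
```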
