LLM Skills
WebAgents provides a harmonized interface for interacting with various Large Language Model (LLM) providers. Whether you are using Google Gemini, OpenAI, Anthropic Claude, or xAI Grok, the configuration patterns for tools and reasoning capabilities remain consistent.
Every LLM skill is a thin wrapper over a shared provider adapter (webagents/typescript/src/adapters/). Adapters own all provider-specific logic — request building, stream parsing, media support declarations — while skills focus on lifecycle, context, and billing integration. This architecture means adding a new provider requires only a new adapter; billing, media handling, and tool pricing work automatically.
Supported Providers
- Google: Gemini models via shared `googleAdapter`.
- OpenAI: GPT and o-series models via shared `openaiAdapter`.
- Anthropic: Claude models via shared `anthropicAdapter`.
- xAI (Grok): Grok models via shared `xaiAdapter` (OpenAI-compatible).
- Fireworks: Open-weight models via shared `fireworksAdapter` (OpenAI-compatible).
Configuration
LLM skills are configured in your AGENT.md file under the `skills` section, or passed dynamically at runtime.
Basic Configuration
All LLM skills accept these common parameters:
```yaml
skills:
  - google:
      model: "gemini-2.5-flash"
      temperature: 0.7
      max_tokens: 8192
      api_key: "${GOOGLE_API_KEY}"  # Optional, uses env var by default
```

Harmonized "Thinking" Configuration
Enable internal reasoning (Chain of Thought) across supported models using a unified syntax. The framework automatically translates this to the provider's specific API parameters (e.g., thinking_config for Google, reasoning_effort for OpenAI o1, thinking block for Anthropic).
```yaml
skills:
  - google:
      model: "gemini-2.5-flash"
      thinking:
        enabled: true
        budget_tokens: 4096  # Token budget for thinking
        effort: "medium"     # Alternative to budget: low (1k), medium (4k), high (8k)
```

| Parameter | Type | Description |
|---|---|---|
| `enabled` | bool | Activates reasoning/thinking mode. |
| `budget_tokens` | int | Maximum number of tokens to allocate for thoughts. |
| `effort` | string | Abstract effort level: `low`, `medium`, or `high`. Used if `budget_tokens` is not set. |
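The effort-to-budget mapping and the per-provider translation can be sketched as below. This is an illustrative assumption, not the framework's actual implementation: the helper name `translateThinking`, the budget values (1024/4096/8192 for low/medium/high, following the "1k/4k/8k" note above), and the exact provider parameter shapes are sketched from public provider APIs and may differ from the real adapters.

```typescript
// Hypothetical sketch: map the harmonized `thinking` config to
// provider-specific request parameters.
type ThinkingConfig = {
  enabled: boolean;
  budget_tokens?: number;
  effort?: 'low' | 'medium' | 'high';
};

// Assumed budgets matching the doc's "low (1k), medium (4k), high (8k)" note.
const EFFORT_BUDGETS: Record<string, number> = { low: 1024, medium: 4096, high: 8192 };

function translateThinking(provider: string, cfg: ThinkingConfig): Record<string, unknown> {
  if (!cfg.enabled) return {};
  // effort is only consulted when budget_tokens is not set (see table above)
  const budget = cfg.budget_tokens ?? EFFORT_BUDGETS[cfg.effort ?? 'medium'];
  switch (provider) {
    case 'google':
      return { thinking_config: { thinking_budget: budget } };
    case 'openai':
      // o-series models take an abstract effort level rather than a token budget
      return { reasoning_effort: cfg.effort ?? 'medium' };
    case 'anthropic':
      return { thinking: { type: 'enabled', budget_tokens: budget } };
    default:
      return {};
  }
}
```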
Built-in Tools Configuration
Enable provider-specific built-in tools using harmonized names where possible.
```yaml
skills:
  - google:
      tools:
        - web_search        # Maps to Google Search
        - code_execution    # Maps to Google Code Execution
  - openai:
      tools:
        - web_search        # Maps to OpenAI Web Search
        - code_interpreter  # Maps to OpenAI Code Interpreter
  - anthropic:
      tools:
        - computer_use      # Maps to Claude Computer Use
        - bash              # Maps to Bash tool
        - text_editor       # Maps to Text Editor tool
```

Common Tool Names:
- `web_search`: General web search capability.
- `code_execution` / `code_interpreter`: Python code execution sandbox.
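One way the harmonized-name lookup could work is a per-provider map from harmonized names to native tool identifiers, as in this sketch. The map contents here are assumptions for illustration only; the native identifiers actually used by each provider's API (and by the framework's adapters) may differ.

```typescript
// Hypothetical lookup table: harmonized tool name -> native identifier.
// All native names below are illustrative placeholders, not verified values.
const TOOL_NAME_MAP: Record<string, Record<string, string>> = {
  google:    { web_search: 'google_search', code_execution: 'code_execution' },
  openai:    { web_search: 'web_search', code_interpreter: 'code_interpreter' },
  anthropic: { computer_use: 'computer_use', bash: 'bash', text_editor: 'text_editor' },
};

function resolveTool(provider: string, harmonizedName: string): string {
  const native = TOOL_NAME_MAP[provider]?.[harmonizedName];
  if (!native) {
    throw new Error(`Tool '${harmonizedName}' is not supported by provider '${provider}'`);
  }
  return native;
}
```

A lookup that fails loudly (rather than silently dropping the tool) makes configuration mistakes visible at agent startup instead of at call time.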
Shared Adapter Architecture
All LLM skills delegate provider-specific work to shared adapters defined in webagents/typescript/src/adapters/. Each adapter implements the LLMAdapter interface:
```typescript
interface LLMAdapter {
  provider: string;
  mediaSupport: Record<string, 'base64' | 'url'>;
  buildRequest(params: AdapterRequestParams): AdapterRequest;
  parseStream(response: Response): AsyncGenerator<AdapterChunk>;
}
```

Media Support
Each adapter declares which content modalities it supports and how (base64 inline data vs URL reference). The MediaSkill reads these declarations to automatically convert content to the right format before an LLM call.
| Provider | Images | Audio | Documents | Video |
|---|---|---|---|---|
| Google | base64 | base64 | base64 | base64 |
| OpenAI | url | base64 | — | — |
| Anthropic | base64 | — | base64 | — |
| xAI | url | — | — | — |
| Fireworks | url | — | — | — |
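The conversion decision described above can be sketched as a small helper. The `MediaPart` shape and the `deliveryMode` helper are illustrative assumptions; only the `mediaSupport` declaration itself comes from the `LLMAdapter` interface.

```typescript
// Minimal sketch: given an adapter's mediaSupport declaration, decide how
// a piece of content should be delivered before the LLM call.
type MediaMode = 'base64' | 'url';

// Hypothetical content shape for illustration.
interface MediaPart {
  modality: 'images' | 'audio' | 'documents' | 'video';
  url: string;
}

function deliveryMode(mediaSupport: Record<string, MediaMode>, part: MediaPart): MediaMode {
  const mode = mediaSupport[part.modality];
  if (!mode) {
    throw new Error(`Modality '${part.modality}' is not supported by this provider`);
  }
  // 'base64' -> fetch and inline the bytes; 'url' -> pass the reference through
  return mode;
}

// Example declaration matching the OpenAI row of the table above.
const openaiMedia: Record<string, MediaMode> = { images: 'url', audio: 'base64' };
```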
Context Integration
Every skill sets two context fields that other skills (PaymentSkill, MediaSkill) depend on:
- `_llm_capabilities`: Set before the LLM call with model name, pricing rates, and max output tokens. Used by PaymentSkill to lock funds.
- `_llm_usage`: Set after the LLM call with actual token counts, the model used, and the `is_byok` flag. Used by PaymentSkill to settle charges.
```typescript
// Before call
context.set('_llm_capabilities', {
  model: 'gemini-2.5-flash',
  pricing: { inputPer1kTokens: 0.00015, outputPer1kTokens: 0.0006 },
  maxOutputTokens: 8192,
});

// After call (from streaming done event)
context.set('_llm_usage', {
  model: 'gemini-2.5-flash',
  input_tokens: 1200,
  output_tokens: 450,
  is_byok: false,
});
```

Developer Usage
When using the Python API directly:
```python
from webagents.agents.skills.core.llm.google.skill import GoogleAISkill

config = {
    "model": "gemini-2.5-flash",
    "thinking": {
        "enabled": True,
        "effort": "high"
    }
}

skill = GoogleAISkill(config)
response = await skill.chat_completion(messages=[...])
```

Adding a New Provider
- Create an adapter in `webagents/typescript/src/adapters/your-provider.ts` implementing `LLMAdapter`.
- Register it in `webagents/typescript/src/adapters/index.ts` via `getAdapter()`.
- Create a thin skill wrapper in `webagents/typescript/src/skills/llm/your-provider/skill.ts` that calls `adapter.buildRequest()` and `adapter.parseStream()`, and sets `_llm_capabilities`/`_llm_usage` on the context.
Billing, media resolution, and tool pricing will work automatically through the PaymentSkill and MediaSkill hooks.
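As a rough starting point, a new OpenAI-compatible adapter might look like the sketch below. The request/chunk shapes, the endpoint URL, and the simplified stream parser are all assumptions for illustration; a real adapter must match the framework's actual `AdapterRequestParams`, `AdapterRequest`, and `AdapterChunk` types and parse the provider's SSE stream properly. The minimal `StreamResponse` type here stands in for the fetch `Response` used by the real interface.

```typescript
// Hypothetical minimal adapter for an OpenAI-compatible provider.
// All type shapes below are simplified stand-ins for the framework's types.
interface AdapterRequestParams { model: string; messages: unknown[]; apiKey: string; }
interface AdapterRequest { url: string; headers: Record<string, string>; body: string; }
interface AdapterChunk { text?: string; done?: boolean; }
type StreamResponse = { text(): Promise<string> };

const yourProviderAdapter = {
  provider: 'your-provider',
  // Declare media support so MediaSkill can convert content appropriately.
  mediaSupport: { images: 'url' } as Record<string, 'base64' | 'url'>,

  buildRequest(params: AdapterRequestParams): AdapterRequest {
    return {
      url: 'https://api.your-provider.example/v1/chat/completions', // placeholder URL
      headers: {
        'Authorization': `Bearer ${params.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model: params.model, messages: params.messages, stream: true }),
    };
  },

  async *parseStream(response: StreamResponse): AsyncGenerator<AdapterChunk> {
    // A real adapter would parse SSE lines incrementally; this sketch
    // just yields the whole body followed by a done marker.
    const body = await response.text();
    yield { text: body };
    yield { done: true };
  },
};
```

The skill wrapper then only needs to call `buildRequest`, stream through `parseStream`, and set `_llm_capabilities`/`_llm_usage` on the context as shown in the Context Integration section.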