StoreMediaSkill (Media Skill)
The StoreMediaSkill handles multi-modal content resolution, storage, and URL management across all LLM providers. It acts as the portal content boundary: tools produce raw UAMP content_items (base64 or temp CDN URLs), and this skill intercepts via hooks to upload them to /content and replace references with /api/content/UUID URLs.
How It Works
StoreMediaSkill uses three UAMP lifecycle hooks:
before_llm_call -- Content Resolution
Before an LLM call, StoreMediaSkill scans all messages for content URLs (e.g., /api/content/{uuid}). For each URL, it:
- Reads the adapter's mediaSupport declaration for the target provider.
- If the provider expects base64, resolves the URL to base64 data via the configured MediaResolver.
- If the provider expects url, passes the URL through (or converts it to an HMAC-signed URL for access control).
- Caches resolved content to avoid redundant fetches across turns.
Resolved content is stored on context._resolved_images for the adapter to use.
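The scan step can be pictured with a small sketch. It assumes messages carry plain-text content and that the provider's image mode has already been read from mediaSupport; the Message shape and the helper names findContentUrls and planResolution are hypothetical and not part of the skill's API.

```typescript
// Hypothetical shapes: the real message and mediaSupport types are not shown in this document.
type MediaMode = 'base64' | 'url';
interface Message { role: string; content: string }

// Matches /api/content/<uuid> references embedded in message text.
const CONTENT_URL = /\/api\/content\/[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}/gi;

// Collect each distinct content URL once, in order of first appearance.
function findContentUrls(messages: Message[]): string[] {
  const urls: string[] = [];
  for (const message of messages) {
    for (const url of message.content.match(CONTENT_URL) ?? []) {
      if (urls.indexOf(url) === -1) urls.push(url);
    }
  }
  return urls;
}

// Pair every URL with the mode the target provider declares (assumed to be known already).
function planResolution(urls: string[], mode: MediaMode): Array<{ url: string; mode: MediaMode }> {
  return urls.map((url) => ({ url, mode }));
}
```

Each planned entry would then be handed to the configured MediaResolver, with the result cached so that later turns do not fetch it again.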
after_tool -- Content Storage (Portal Boundary)
After a tool call returns a StructuredToolResult with content_items, StoreMediaSkill:
- Detects non-/content media (base64 data URIs, external temp CDN URLs).
- Downloads external URLs or extracts base64 data.
- Uploads via the configured MediaSaver to get a /api/content/UUID URL.
- Replaces the content_item's media field with the new URL.
- Appends /api/content/UUID URLs to tool_result.text so the LLM can see them for delegation.
This hook runs at priority 10, before the payment hook (priority 20).
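A rough sketch of the detection and rewrite steps follows. It assumes a content item exposes its media reference in a field named url, and that uploads have already produced a map from original reference to stored URL. The ContentItem shape and the helpers classifyMedia, parseDataUri, and applyStoredUrls are hypothetical names used only for illustration.

```typescript
// Hypothetical content item shape; the real UAMP type is not shown in this document.
interface ContentItem { type: string; url: string }

type MediaKind = 'stored' | 'data-uri' | 'external';

// Classify a media reference: already under /api/content, inline base64, or an external URL.
function classifyMedia(ref: string): MediaKind {
  if (ref.startsWith('/api/content/')) return 'stored';
  if (ref.startsWith('data:')) return 'data-uri';
  return 'external';
}

// Split a base64 data URI into its MIME type and payload; returns null for anything else.
function parseDataUri(uri: string): { mimeType: string; base64: string } | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(uri);
  return match ? { mimeType: match[1], base64: match[2] } : null;
}

// Swap uploaded references into the items and append the new URLs to the tool result text.
function applyStoredUrls(
  items: ContentItem[],
  text: string,
  stored: Map<string, string>, // original reference -> /api/content/UUID
): { items: ContentItem[]; text: string } {
  const appended: string[] = [];
  const rewritten = items.map((item) => {
    const replacement = stored.get(item.url);
    if (replacement === undefined) return item; // already stored or not uploaded
    appended.push(replacement);
    return { ...item, url: replacement };
  });
  const suffix = appended.length > 0 ? '\n' + appended.join('\n') : '';
  return { items: rewritten, text: text + suffix };
}
```

Keeping classification and rewriting as pure functions leaves the download and upload calls as the only side effects in the hook.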
after_llm_call -- Generated Media Saving
After an LLM call, StoreMediaSkill checks context._inline_images (populated by adapters when they encounter inline media in the provider response, e.g., Gemini's inlineData). For each generated image:
- Saves the base64 data to persistent storage via the configured MediaSaver.
- Stores the resulting content URLs on context._saved_media_urls.
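As an illustration only, the saving step might look like the sketch below. The InlineImage and HookContext shapes are assumptions (the document names the context fields but not their types), and saveGeneratedMedia is a hypothetical helper, not the skill's actual hook.

```typescript
interface SaveMeta { chatId?: string; agentId?: string; userId?: string }
interface MediaSaver {
  save(base64: string, mimeType: string, meta?: SaveMeta): Promise<string>;
}

// Hypothetical shapes for the adapter-populated context fields.
interface InlineImage { base64: string; mimeType: string }
interface HookContext {
  _inline_images?: InlineImage[];
  _saved_media_urls?: string[];
}

// Persist every inline image and record the resulting content URLs on the context.
async function saveGeneratedMedia(
  context: HookContext,
  saver: MediaSaver,
  meta?: SaveMeta,
): Promise<void> {
  const images = context._inline_images ?? [];
  if (images.length === 0) return; // nothing generated on this call
  const urls: string[] = [];
  for (const image of images) {
    urls.push(await saver.save(image.base64, image.mimeType, meta));
  }
  context._saved_media_urls = urls;
}
```

Saving sequentially keeps the URL order aligned with the order of the generated images; a real implementation could save in parallel if ordering is preserved another way.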
Architecture
Tools/LLM skills produce raw UAMP content_items (base64 or temp CDN URLs)
|
v
StoreMediaSkill (portal-specific)
- after_tool: uploads base64/temp URLs to /content
- Replaces content_items with /api/content/UUID
- Appends URLs to tool result text
|
v
LLM sees /api/content/UUID in context
Delegate passes /api/content/UUID in attachments
UI renders via content_items

Standalone (non-portal) agents: without StoreMediaSkill, content stays as base64 UAMP and everything still works.
Configuration
StoreMediaSkill requires two injectable dependencies:
import { StoreMediaSkill } from 'webagents/skills/media';

const mediaSkill = new StoreMediaSkill({
  resolver: myMediaResolver, // implements MediaResolver
  saver: myMediaSaver,       // implements MediaSaver
});

MediaResolver Interface
interface MediaResolver {
  resolve(url: string, mode: 'base64' | 'url', userId?: string): Promise<ResolvedMedia | null>;
}

The resolver fetches content from a URL and returns it as base64 data or a signed URL. The optional userId parameter enables access control checks.
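For local testing, a resolver could be backed by an in-memory map, as in the sketch below. ResolvedMedia is named in the interface but not defined in this document, so the shape used here is an assumption, and InMemoryMediaResolver is a hypothetical class. The url branch returns the input unchanged rather than producing a signed URL.

```typescript
type MediaMode = 'base64' | 'url';

// Hypothetical result shape; the document names ResolvedMedia without defining it.
interface ResolvedMedia { mode: MediaMode; mimeType: string; data: string }

interface MediaResolver {
  resolve(url: string, mode: MediaMode, userId?: string): Promise<ResolvedMedia | null>;
}

interface StoredEntry { mimeType: string; base64: string }

// Test double that resolves content from a map instead of real storage.
class InMemoryMediaResolver implements MediaResolver {
  private readonly store: Map<string, StoredEntry>;

  constructor(store: Map<string, StoredEntry>) {
    this.store = store;
  }

  async resolve(url: string, mode: MediaMode): Promise<ResolvedMedia | null> {
    const entry = this.store.get(url);
    if (!entry) return null; // unknown content resolves to null
    return mode === 'base64'
      ? { mode, mimeType: entry.mimeType, data: entry.base64 }
      : { mode, mimeType: entry.mimeType, data: url }; // no signing in this stub
  }
}
```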
MediaSaver Interface
interface MediaSaver {
  save(base64: string, mimeType: string, meta?: { chatId?: string; agentId?: string; userId?: string }): Promise<string>;
}

The saver persists base64 media data and returns a relative URL (/api/content/UUID) where the content can be accessed.
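A matching in-memory saver is sketched below for tests or standalone experiments. InMemoryMediaSaver is a hypothetical class, and the counter-based identifier is a deterministic stand-in for a real UUID, chosen so the example is reproducible.

```typescript
interface SaveMeta { chatId?: string; agentId?: string; userId?: string }

interface MediaSaver {
  save(base64: string, mimeType: string, meta?: SaveMeta): Promise<string>;
}

interface SavedRecord { base64: string; mimeType: string; meta: SaveMeta | undefined }

// Test double that keeps saved media in a map keyed by the returned URL.
class InMemoryMediaSaver implements MediaSaver {
  readonly saved = new Map<string, SavedRecord>();
  private counter = 0;

  async save(base64: string, mimeType: string, meta?: SaveMeta): Promise<string> {
    this.counter += 1;
    // Deterministic stand-in for a UUID; a real saver would generate a random one.
    const id = `00000000-0000-4000-8000-${String(this.counter).padStart(12, '0')}`;
    const url = `/api/content/${id}`;
    this.saved.set(url, { base64, mimeType, meta });
    return url;
  }
}
```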
Platform Integration
On the Robutler platform, StoreMediaSkill is automatically configured with:
- PortalMediaResolver: Resolves content URLs by reading from the platform's content storage. Enforces ownership checks (canAccessContent(contentId, userId)) to prevent unauthorized content access.
- PortalMediaSaver: Saves generated media to the platform's content storage system and returns a /api/content/{uuid} URL. Falls back to a generated UUID for ownerId when neither userId nor agentId is available (e.g. in anonymous or system contexts).
These are wired via PortalStoreMediaFactory in lib/agents/factories.ts.
Content Reference Model
- All portal content eventually gets a /api/content/UUID URL.
- /api/content/UUID URLs are the primary LLM-visible content reference in tool result text.
- The delegate tool accepts full /api/content/UUID URLs or bare UUIDs in its attachments parameter.
- [content:UUID] labels are not used. URLs serve as the reference mechanism.
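Accepting both reference forms suggests a normalization step along the following lines. This is a sketch under assumptions: normalizeAttachment is a hypothetical helper, "full URL" is taken to mean the relative /api/content/UUID path, and anything else (including [content:UUID] labels) is rejected.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i;
const CONTENT_PREFIX = '/api/content/';

// Normalize an attachment given as a content URL or a bare UUID to the URL form.
// Returns null for anything that is neither.
function normalizeAttachment(value: string): string | null {
  const trimmed = value.trim();
  const id = trimmed.startsWith(CONTENT_PREFIX)
    ? trimmed.slice(CONTENT_PREFIX.length)
    : trimmed;
  return UUID_PATTERN.test(id) ? CONTENT_PREFIX + id : null;
}
```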
Cross-Turn Visual Context
The platform preserves media content across conversation turns. When an LLM generates or receives an image, that content remains available to subsequent LLM calls in the same chat:
- Chat history reconstruction (chatHistoryToOpenAIMessages): When loading conversation history for a new LLM call, media content_items attached to both user and assistant messages are preserved and signed. This ensures the LLM can "see" images from prior turns.
- Content resolution (resolveContentMedia in uamp-proxy.ts): Before sending messages to the LLM provider, /api/content/UUID references in content_items are resolved to base64 inlineData (for Gemini) or signed URLs (for other providers). This applies to all message roles including tool results.
- Tool result media: When a tool call returns content_items (e.g., from generate_image or delegate), these are included as inline media parts in subsequent Gemini requests, allowing the LLM to reference generated images in its response.
This enables workflows like "generate a unicorn with Flux, then delegate to nano-banana to make it green": nano-banana receives the original image as visual context alongside the editing instruction.
Provider-Aware Resolution
The key value of StoreMediaSkill is that it adapts content format to match each provider's requirements:
| Provider | Image Format | Audio Format | Video Format | Document Format |
|---|---|---|---|---|
| Google Gemini | base64 inlineData | base64 inlineData | base64 inlineData | base64 inlineData (PDF, text/*, JSON, XML, code) |
| OpenAI | URL image_url | base64 input_audio | placeholder | base64 file part (PDF, DOCX, XLSX, PPTX, text/*, JSON, code) |
| Anthropic | base64 source.data | placeholder | placeholder | base64 document block (PDF, DOCX, XLSX, PPTX, text/*, CSV, HTML, MD) |
| xAI / Fireworks | URL image_url | -- | -- | -- |
Without StoreMediaSkill, each LLM skill would need its own content resolution logic. With it, all content handling is centralized and consistent.
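The table can be read as a lookup from provider and modality to a delivery mode, sketched below. The provider keys, the Delivery type, and deliveryFor are hypothetical names; treating the "--" cells as unsupported is an interpretation of the table, and the per-provider wire formats (inlineData, image_url, and so on) are left out.

```typescript
type Provider = 'gemini' | 'openai' | 'anthropic' | 'xai' | 'fireworks';
type Modality = 'image' | 'audio' | 'video' | 'document';
// 'placeholder' mirrors the table's "placeholder" cells; 'unsupported' stands in for "--".
type Delivery = 'base64' | 'url' | 'placeholder' | 'unsupported';

const MEDIA_DELIVERY: Record<Provider, Record<Modality, Delivery>> = {
  gemini: { image: 'base64', audio: 'base64', video: 'base64', document: 'base64' },
  openai: { image: 'url', audio: 'base64', video: 'placeholder', document: 'base64' },
  anthropic: { image: 'base64', audio: 'placeholder', video: 'placeholder', document: 'base64' },
  xai: { image: 'url', audio: 'unsupported', video: 'unsupported', document: 'unsupported' },
  fireworks: { image: 'url', audio: 'unsupported', video: 'unsupported', document: 'unsupported' },
};

// Look up how a given modality should be delivered to a given provider.
function deliveryFor(provider: Provider, modality: Modality): Delivery {
  return MEDIA_DELIVERY[provider][modality];
}
```

Typing the table as a Record over both unions means the compiler flags any provider or modality that is added without a corresponding entry.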
Caching
StoreMediaSkill caches resolved content in-memory keyed by (url, format) pairs. This avoids redundant disk reads when the same image appears in multiple conversation turns. The cache is scoped to the skill instance lifetime.
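A cache keyed by (url, format) can be sketched with a Map and a composite string key. ResolutionCache is a hypothetical class, and the key encoding is one possible choice rather than the skill's actual scheme; eviction is not modeled, which matches a cache that simply lives as long as its owner.

```typescript
type MediaMode = 'base64' | 'url';

// In-memory cache whose entries live as long as the owning instance.
class ResolutionCache<T> {
  private readonly entries = new Map<string, T>();

  // Join with a newline, which cannot appear in a URL, so distinct pairs never collide.
  private key(url: string, format: MediaMode): string {
    return `${format}\n${url}`;
  }

  get(url: string, format: MediaMode): T | undefined {
    return this.entries.get(this.key(url, format));
  }

  set(url: string, format: MediaMode, value: T): void {
    this.entries.set(this.key(url, format), value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because the format is part of the key, the same URL resolved once as base64 and once as a URL occupies two separate entries.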
Security
- Content IDOR Prevention: PortalMediaResolver always verifies that the requesting user owns the content before resolving it. Content IDs that fail the ownership check return null.
- No Raw URLs to Providers: When a provider requires base64, the content URL is never sent to the external LLM provider -- only the resolved base64 data is transmitted.
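The ownership rule can be illustrated as a wrapper around any resolver. This is a sketch, not PortalMediaResolver's implementation: withOwnershipCheck is a hypothetical helper, the ResolvedMedia shape and the boolean-returning signature of canAccessContent are assumptions, and denying requests that carry no userId is a conservative choice made for this example.

```typescript
type MediaMode = 'base64' | 'url';

// Hypothetical result shape; the document names ResolvedMedia without defining it.
interface ResolvedMedia { mode: MediaMode; mimeType: string; data: string }

interface MediaResolver {
  resolve(url: string, mode: MediaMode, userId?: string): Promise<ResolvedMedia | null>;
}

// canAccessContent(contentId, userId) is named in the document; its return type is assumed.
type AccessCheck = (contentId: string, userId: string) => Promise<boolean>;

// Wrap a resolver so content is only resolved after the ownership check passes.
function withOwnershipCheck(inner: MediaResolver, canAccessContent: AccessCheck): MediaResolver {
  return {
    async resolve(url, mode, userId) {
      const contentId = url.split('/').pop() ?? '';
      // Assumption: a missing userId is denied rather than treated as anonymous access.
      if (!userId || !(await canAccessContent(contentId, userId))) return null;
      return inner.resolve(url, mode, userId);
    },
  };
}
```

Returning null on a failed check means callers cannot distinguish "not found" from "not yours", which avoids confirming that a given content ID exists.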