Agentic Browser


Browser Extension

The Browser Extension is a WXT-based TypeScript extension that provides frontend browser automation. It runs independently of the main Python API server and ships with its own Flask backend for secure credential management. The extension features a sidepanel chat interface, a background service worker with 26+ browser automation tools, and content scripts for DOM manipulation.

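The extension architecture consists of three main components:

  • Sidepanel chat interface: the React UI where users enter commands and read responses
  • Background service worker: routes messages and hosts the browser automation tools
  • Content scripts: manipulate the DOM inside web pages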

For information about the main Python backend API that can also be used independently, see Python Backend API. For details on the agent intelligence system, see Agent Intelligence System.

High-Level Architecture

The extension implements a message-passing architecture where the React UI communicates with the background script, which orchestrates browser automation and external API calls.

Architecture Diagram

Key Design Decisions:

  • Dual Backend Model: The extension has its own Flask backend (backend_service.py) separate from the main FastAPI server. This handles OAuth token exchange securely without exposing client secrets to the extension code.
  • Message-Based Communication: All communication between UI and background uses browser.runtime.sendMessage() with typed message payloads.
  • Script Injection Pattern: DOM operations use browser.scripting.executeScript() rather than persistent content scripts for better security and isolation.
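As a minimal sketch of the typed message pattern, the router below dispatches on a discriminated union keyed by type. EXECUTE_AGENT_TOOL comes from this page; PING, the payload fields, the response shape, and routeMessage are hypothetical. The router avoids browser.* calls so it can run outside the extension, and the commented lines show roughly where it would attach to browser.runtime.onMessage.

```typescript
// Minimal sketch of a typed message router for a background service worker.
// ASSUMPTION: only EXECUTE_AGENT_TOOL is named in the docs; PING and the
// payload/response fields below are hypothetical.

type ToolMessage = {
  type: "EXECUTE_AGENT_TOOL";
  toolName: string;
  params: Record<string, unknown>;
};
type PingMessage = { type: "PING" }; // hypothetical health check
type ExtensionMessage = ToolMessage | PingMessage;

interface ExtensionResponse {
  success: boolean;
  data?: unknown;
  error?: string;
}

// Each message type maps to a handler that receives the narrowed payload.
type HandlerMap = {
  [K in ExtensionMessage["type"]]: (
    msg: Extract<ExtensionMessage, { type: K }>,
  ) => Promise<ExtensionResponse>;
};

async function routeMessage(
  msg: ExtensionMessage,
  handlers: HandlerMap,
): Promise<ExtensionResponse> {
  // Widen the handler type so it can be called with the union payload.
  const handler = handlers[msg.type] as
    | ((m: ExtensionMessage) => Promise<ExtensionResponse>)
    | undefined;
  if (!handler) {
    return { success: false, error: `Unknown message type: ${String(msg.type)}` };
  }
  try {
    return await handler(msg);
  } catch (err) {
    // Turn thrown errors into a structured response for the UI.
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Wiring inside background.ts would look roughly like this (not run here):
// browser.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//   routeMessage(msg, handlers).then(sendResponse);
//   return true; // keep the channel open for the async reply
// });
```

Returning one response shape from every handler means the sidepanel only has to handle a single success/error contract, whichever tool ran.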

Sources: extension/entrypoints/background.ts (lines 17-156), extension/backend_service.py (lines 1-229)

Component Directory Structure

The extension follows the WXT framework's entrypoint-based structure:

extension/
├── entrypoints/
│   ├── sidepanel/           # React UI components
│   │   ├── AgentExecutor.tsx
│   │   ├── UnifiedSettingsMenu.tsx
│   │   └── lib/
│   │       └── agent-map.ts
│   ├── utils/               # Shared utilities
│   │   ├── parseAgentCommand.ts
│   │   └── executeAgent.ts
│   ├── background.ts        # Service worker
│   └── content.ts           # Content script
├── backend_service.py       # Flask backend server
└── wxt.config.ts            # WXT configuration

| Component | File | Purpose |
|---|---|---|
| Sidepanel | sidepanel/AgentExecutor.tsx | Main chat interface (see Extension Architecture Overview) |
| Background | background.ts | Message routing and 26+ browser tools (see Background Script and Message Handling) |
| Content Script | content.ts | DOM manipulation (injected on demand) |
| Backend Service | backend_service.py | OAuth and Gemini API proxy (see Extension Backend Service) |

Sources: extension/entrypoints/background.ts (lines 1-50), extension/backend_service.py (lines 1-50)

Integration with Backend APIs

The extension integrates with two separate backend services:

Main Python API (FastAPI)

The extension can call the main FastAPI server for agent operations. The server's base URL is stored in browser.storage.local under the baseUrl key, and the UI builds HTTP requests against endpoints such as:

  • /api/genai/react - ReAct agent with tool use
  • /api/gmail/* - Gmail operations
  • /api/calendar/* - Calendar operations
  • /api/genai/youtube - YouTube analysis
  • /api/genai/website - Website analysis

The executeAgent() utility in extension/entrypoints/utils/executeAgent.ts (lines 17-127) handles request construction, credential injection, and response formatting. See Extension Architecture Overview for details on the command system and request flow.
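The request-construction step can be sketched as a pure function. This is a minimal illustration assuming a JSON POST body: the helper name buildAgentRequest and the google_access_token body field are hypothetical, while the baseUrl and googleUser shapes follow the Storage Schema table further down this page.

```typescript
// Hypothetical sketch of request construction for the FastAPI backend.
// ASSUMPTIONS: the google_access_token field name and the POST/JSON shape
// are illustrative, not taken from executeAgent.ts.

interface GoogleUser {
  token: string;
  email: string;
  name: string;
  picture: string;
}

interface StoredSettings {
  baseUrl?: string;
  googleUser?: GoogleUser;
}

interface AgentRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAgentRequest(
  settings: StoredSettings,
  endpoint: string, // expected to start with "/", e.g. "/api/genai/react"
  payload: Record<string, unknown>,
): AgentRequest {
  if (!settings.baseUrl) {
    throw new Error("baseUrl is not configured in browser.storage.local");
  }
  // Join base URL and endpoint without producing a double slash.
  const url = settings.baseUrl.replace(/\/+$/, "") + endpoint;
  // Inject the stored OAuth token so backend tools can act for the user.
  const body: Record<string, unknown> = { ...payload };
  if (settings.googleUser?.token) {
    body.google_access_token = settings.googleUser.token;
  }
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// In the extension the request would then be sent with:
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

Keeping construction separate from fetch() makes the credential-injection rule easy to test without a running backend.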

Extension Flask Backend

The extension includes a standalone Flask backend at extension/backend_service.py that provides:

  • OAuth Token Exchange: /exchange-code, /refresh-token for Google OAuth (lines 36-112)
  • GitHub OAuth: /github/exchange-code for GitHub authentication (lines 115-159)
  • Gemini Chat API: /chat endpoint proxying requests to Google Gemini API (lines 162-202)

This separation ensures OAuth client secrets never leak to the extension code. The Flask server must be started separately: python backend_service.py.
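A hedged sketch of the extension side of the token exchange shows why the split keeps secrets out of the bundle: the extension sends only the one-time authorization code, and the Flask process supplies GOOGLE_CLIENT_SECRET from its own environment. The request fields (code, redirect_uri), the response shape, the helper name exchangeCode, and the default Flask address http://localhost:5000 are assumptions; fetch is injected so the sketch runs outside a browser.

```typescript
// Hypothetical client for the extension's Flask token-exchange endpoint.
// ASSUMPTIONS: request fields (code, redirect_uri) and response fields are
// illustrative; check backend_service.py for the real contract.

interface TokenResponse {
  access_token: string;
  refresh_token?: string;
  expires_in?: number;
}

interface FetchInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface FetchResult {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Matches the subset of the global fetch() signature used below.
type FetchLike = (url: string, init: FetchInit) => Promise<FetchResult>;

async function exchangeCode(
  backendUrl: string,
  code: string,
  redirectUri: string,
  fetchImpl: FetchLike,
): Promise<TokenResponse> {
  const res = await fetchImpl(`${backendUrl.replace(/\/+$/, "")}/exchange-code`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Only the one-time authorization code leaves the extension; the
    // client secret is added by the Flask process from its environment.
    body: JSON.stringify({ code, redirect_uri: redirectUri }),
  });
  if (!res.ok) {
    throw new Error(`Token exchange failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as Partial<TokenResponse>;
  if (typeof data.access_token !== "string") {
    throw new Error("Token response is missing access_token");
  }
  return data as TokenResponse;
}

// In the extension: exchangeCode("http://localhost:5000", code, redirectUri, fetch)
```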

Environment Variables Required:

| Variable | Purpose | Used By |
|---|---|---|
| GOOGLE_CLIENT_SECRET | Google OAuth client secret | /exchange-code, /refresh-token |
| GITHUB_CLIENT_ID | GitHub OAuth app ID | /github/exchange-code |
| GITHUB_CLIENT_SECRET | GitHub OAuth app secret | /github/exchange-code |
| GEMINI_API_KEY | Google Gemini API key | /chat |

Sources: extension/backend_service.py (lines 1-229), extension/entrypoints/utils/executeAgent.ts (lines 17-127)

Browser Automation Capabilities

The extension provides 26+ browser automation tools implemented in the background script. These tools can be invoked via the EXECUTE_AGENT_TOOL message type. The tools fall into several categories:

DOM Interaction Tools

  • GET_PAGE_INFO: Extract page metadata (URL, title, media, forms)
  • EXTRACT_DOM: Build recursive DOM tree structure
  • CLICK: Click elements by selector
  • TYPE: Type text into inputs, textareas, or contenteditable elements
  • FILL_FORM: Fill multiple form fields and optionally submit
  • SELECT_DROPDOWN: Select dropdown options by value, text, or index
  • HOVER: Trigger hover events on elements
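SELECT_DROPDOWN accepts a value, visible text, or index. A minimal sketch of how that choice could be resolved before touching the DOM, assuming the options are first read into plain { value, text } records; findOptionIndex, the DropdownTarget shape, and the case-insensitive text match are hypothetical.

```typescript
// Hypothetical resolver for SELECT_DROPDOWN: returns the index of the option
// to select, or -1 when nothing matches.

interface OptionInfo {
  value: string;
  text: string;
}

type DropdownTarget =
  | { by: "value"; value: string }
  | { by: "text"; text: string }
  | { by: "index"; index: number };

function findOptionIndex(options: OptionInfo[], target: DropdownTarget): number {
  if (target.by === "value") {
    return options.findIndex((o) => o.value === target.value);
  }
  if (target.by === "text") {
    // Compare trimmed, case-insensitive labels so "green" matches " Green ".
    const wanted = target.text.trim().toLowerCase();
    return options.findIndex((o) => o.text.trim().toLowerCase() === wanted);
  }
  // Remaining case: select by position, rejecting out-of-range indices.
  const i = target.index;
  return Number.isInteger(i) && i >= 0 && i < options.length ? i : -1;
}

// Inside the injected function the result would be applied roughly as:
// select.selectedIndex = idx;
// select.dispatchEvent(new Event("change", { bubbles: true }));
```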

Element Query Tools

  • WAIT_FOR_ELEMENT: Poll until element exists/visible/hidden
  • FIND_ELEMENTS: Find all elements matching selector with metadata
  • GET_ELEMENT_TEXT: Extract text content from element
  • GET_ELEMENT_ATTRIBUTES: Get all attributes of an element
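WAIT_FOR_ELEMENT polls until an element exists, becomes visible, or becomes hidden. A generic polling loop with a timeout captures the idea; in the real tool the check would run a query inside the page. pollUntil, its default interval and timeout, and the visibility heuristic in the comments are hypothetical.

```typescript
// Hypothetical polling helper in the spirit of WAIT_FOR_ELEMENT.
// The check callback stands in for "does the selector match / is it visible".

interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntil(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 }: PollOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true; // condition met
    if (Date.now() >= deadline) return false; // give up after the timeout
    await sleep(intervalMs);
  }
}

// In-page, the three modes might map to checks like:
//   exists:  () => document.querySelector(sel) !== null
//   visible: () => { const el = document.querySelector(sel); return !!el && el.getClientRects().length > 0; }
//   hidden:  the negation of visible
```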

Navigation Tools

  • NAVIGATE: Navigate to URL and wait for page load
  • GO_BACK: Navigate backward in history
  • GO_FORWARD: Navigate forward in history
  • RELOAD_TAB: Reload current tab
  • SCROLL: Scroll page or to specific element
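NAVIGATE waits for the page to finish loading. One common way to do that with the tabs API is to listen on tabs.onUpdated until the target tab reports status "complete". The sketch takes the event source as a parameter so it can run outside the browser; waitForTabComplete and the 15-second timeout are hypothetical, and the implementation in background.ts may differ.

```typescript
// Hypothetical helper: resolve once the given tab reports status "complete".

interface TabChangeInfo {
  status?: string;
}

type UpdatedListener = (tabId: number, changeInfo: TabChangeInfo) => void;

// Minimal shape of browser.tabs.onUpdated used here.
interface TabUpdatedEvent {
  addListener(cb: UpdatedListener): void;
  removeListener(cb: UpdatedListener): void;
}

function waitForTabComplete(
  onUpdated: TabUpdatedEvent,
  tabId: number,
  timeoutMs = 15000,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      onUpdated.removeListener(listener);
      reject(new Error(`Tab ${tabId} did not finish loading within ${timeoutMs} ms`));
    }, timeoutMs);

    function listener(id: number, info: TabChangeInfo): void {
      // Ignore other tabs and intermediate "loading" updates.
      if (id !== tabId || info.status !== "complete") return;
      clearTimeout(timer);
      onUpdated.removeListener(listener);
      resolve();
    }

    onUpdated.addListener(listener);
  });
}

// Usage in the extension (not run here). Register before navigating so a
// fast load cannot complete before the listener exists:
// const loaded = waitForTabComplete(browser.tabs.onUpdated, tabId);
// await browser.tabs.update(tabId, { url });
// await loaded;
```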

Tab Management Tools

  • OPEN_TAB: Open new tab with URL
  • CLOSE_TAB: Close specified tab
  • SWITCH_TAB: Switch to tab by ID or direction (next/previous)
  • DUPLICATE_TAB: Duplicate current tab
  • GET_ALL_TABS: List all open tabs
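SWITCH_TAB accepts either a tab ID or a direction. The direction case reduces to index arithmetic over the window's tabs. A minimal sketch, assuming wrap-around at either end (the real tool may clamp instead); resolveSwitchTarget and the TabLike shape are hypothetical, though index and active mirror fields of the browser's Tab object.

```typescript
// Hypothetical resolver for SWITCH_TAB: returns the tab ID to activate.

interface TabLike {
  id: number;
  index: number; // position in the tab strip
  active: boolean;
}

type SwitchTarget = { tabId: number } | { direction: "next" | "previous" };

function resolveSwitchTarget(tabs: TabLike[], target: SwitchTarget): number | undefined {
  if ("tabId" in target) {
    return tabs.some((t) => t.id === target.tabId) ? target.tabId : undefined;
  }
  if (tabs.length === 0) return undefined;
  // Order by tab strip position, then step from the active tab.
  const ordered = [...tabs].sort((a, b) => a.index - b.index);
  const current = Math.max(0, ordered.findIndex((t) => t.active));
  const step = target.direction === "next" ? 1 : -1;
  // Adding length before the modulo keeps the result non-negative (wrap-around).
  const nextPos = (current + step + ordered.length) % ordered.length;
  return ordered[nextPos]?.id;
}

// The chosen ID would then be activated with browser.tabs.update(id, { active: true }).
```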

Data Extraction Tools

  • SCREENSHOT: Capture visible page area
  • GET_COOKIES: Retrieve session/auth cookies
  • SET_COOKIE: Set cookie value
  • GET_LOCAL_STORAGE: Read localStorage keys
  • SET_LOCAL_STORAGE: Write localStorage keys

Advanced Tools

  • EXECUTE_SCRIPT: Execute arbitrary JavaScript in page context

All tools are implemented in extension/entrypoints/background.ts (lines 893-1700) and use either browser.scripting.executeScript() for DOM operations or browser.tabs.* APIs for browser-level operations. See Browser Automation Tools for the detailed implementation of each tool.

Sources: extension/entrypoints/background.ts (lines 893-1026 and 1030-1700)

Communication Patterns

The extension implements two primary communication patterns:

Extension to Backend Communication

User commands flow from the UI to external APIs:

Architecture Diagram: Extension to Backend API Flow

The executeAgent() function in extension/entrypoints/utils/executeAgent.ts (lines 17-127) handles credential injection, URL extraction, and payload construction. See Extension Architecture Overview for the detailed request flow.
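URL extraction can be as simple as scanning the command text for the first http(s) link. The sketch below assumes a hypothetical "/agent-name rest of prompt" command syntax; the real grammar lives in parseAgentCommand.ts and may differ, and parseCommand, its regular expressions, and its return shape are illustrative only.

```typescript
// Hypothetical command parser: "/youtube summarize https://..." ->
// { agent: "youtube", prompt: "summarize https://...", url: "https://..." }

interface ParsedCommand {
  agent: string | null; // slash-command name, or null for free text
  prompt: string; // text with the command prefix removed
  url: string | null; // first http(s) URL found in the prompt, if any
}

const COMMAND_RE = /^\/([\w-]+)\s*/;
const URL_RE = /https?:\/\/[^\s<>"']+/;

function parseCommand(input: string): ParsedCommand {
  const text = input.trim();
  const cmd = COMMAND_RE.exec(text);
  const prefix = cmd?.[0] ?? "";
  const agent = cmd?.[1] ?? null;
  const prompt = text.slice(prefix.length);
  const rawUrl = URL_RE.exec(prompt)?.[0];
  // Heuristic: strip trailing punctuation that usually ends a sentence, not a URL.
  const url = rawUrl ? rawUrl.replace(/[.,;:!?)]+$/, "") : null;
  return { agent, prompt, url };
}
```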

Internal Message Passing

Background script handles messages from UI and content scripts:

Architecture Diagram: Internal Extension Message Flow

The background script's message router in extension/entrypoints/background.ts (lines 24-128) supports 8 message types. See Background Script and Message Handling for complete message type documentation.

Sources: extension/entrypoints/utils/executeAgent.ts (lines 17-127), extension/entrypoints/background.ts (lines 24-128 and 516-539)

Storage Schema

The extension uses browser.storage.local for persistent state. Key storage keys:

| Storage Key | Type | Purpose | Set By |
|---|---|---|---|
| chatHistory | ChatMessage[] | Conversation messages | AgentExecutor |
| googleUser | {token, email, name, picture} | Google OAuth state | Settings UI |
| baseUrl | string | Python backend URL | Settings UI |
| jportalId | string | JIIT portal username | Settings UI |
| jportalPass | string | JIIT portal password | Settings UI |
| jportalData | object | JIIT session data | Settings UI (after login) |
| tabsData | TabsData | Current tab information | background.ts |
| allTabsUrls | TabInfo[] | All open tabs | background.ts |
| activeTabUrl | TabInfo | Currently active tab | background.ts |
| totalTabs | number | Tab count | background.ts |
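Because every key in the table has a fixed type, the schema can be written down once and reused by typed getters and setters. A minimal sketch, assuming a stand-in for browser.storage.local whose get/set return promises (as they do in the WebExtension API); ChatMessage and TabInfo are simplified placeholders, tabsData is omitted because its shape is not documented here, and getSetting, setSetting, and createMemoryArea are hypothetical names.

```typescript
// Simplified placeholder types; the real definitions live in the extension source.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}
interface TabInfo {
  id: number;
  url: string;
  title: string;
}

// Storage keys and value types from the Storage Schema table (tabsData omitted).
interface StorageSchema {
  chatHistory: ChatMessage[];
  googleUser: { token: string; email: string; name: string; picture: string };
  baseUrl: string;
  jportalId: string;
  jportalPass: string;
  jportalData: Record<string, unknown>;
  allTabsUrls: TabInfo[];
  activeTabUrl: TabInfo;
  totalTabs: number;
}

// The subset of browser.storage.local this sketch relies on.
interface StorageAreaLike {
  get(keys: string[]): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

async function getSetting<K extends keyof StorageSchema>(
  area: StorageAreaLike,
  key: K,
): Promise<StorageSchema[K] | undefined> {
  const result = await area.get([key]);
  // Values are not validated at runtime; the cast trusts whoever wrote them.
  return result[key] as StorageSchema[K] | undefined;
}

async function setSetting<K extends keyof StorageSchema>(
  area: StorageAreaLike,
  key: K,
  value: StorageSchema[K],
): Promise<void> {
  await area.set({ [key]: value });
}

// Hypothetical in-memory stand-in for tests or non-browser environments.
function createMemoryArea(): StorageAreaLike {
  const data: Record<string, unknown> = {};
  return {
    async get(keys) {
      const out: Record<string, unknown> = {};
      for (const k of keys) if (k in data) out[k] = data[k];
      return out;
    },
    async set(items) {
      Object.assign(data, items);
    },
  };
}
```

With this in place, a typo such as getSetting(area, "baseURL") becomes a compile-time error instead of a silent undefined.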

Chat history persistence logic lives in extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 53-77).
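As an illustration only, a common approach is to load chatHistory once when the panel mounts and write it back whenever it changes. The sketch keeps that logic framework-agnostic; loadChatHistory, saveChatHistory, the Array.isArray guard, and the simplified ChatMessage type are assumptions, and the actual component presumably does this inside React effects.

```typescript
// Hypothetical load/save helpers for the chatHistory storage key.

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// The subset of browser.storage.local this sketch relies on.
interface StorageAreaLike {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

const CHAT_KEY = "chatHistory";

// Load saved messages; fall back to an empty history if nothing valid is stored.
async function loadChatHistory(area: StorageAreaLike): Promise<ChatMessage[]> {
  const stored = (await area.get(CHAT_KEY))[CHAT_KEY];
  return Array.isArray(stored) ? (stored as ChatMessage[]) : [];
}

// Persist the full history after each change (e.g. after a message is appended).
async function saveChatHistory(area: StorageAreaLike, history: ChatMessage[]): Promise<void> {
  await area.set({ [CHAT_KEY]: history });
}

// In a React component this would typically be wired as (not run here):
// useEffect(() => { loadChatHistory(browser.storage.local).then((h) => { setChatHistory(h); setLoaded(true); }); }, []);
// Guard with a "loaded" flag so the initial empty state does not overwrite saved history:
// useEffect(() => { if (loaded) saveChatHistory(browser.storage.local, chatHistory); }, [chatHistory, loaded]);
```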

Sources: extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 53-77), extension/Extension/entrypoints/utils/executeAgent.ts (lines 29-54), extension/Extension/entrypoints/background.ts (lines 867-882)

UI Rendering

The AgentExecutor component renders the chat interface in three main sections:

Empty State

Rendered by extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 429-434) when chatHistory.length === 0.

Chat Container

Rendered by extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 435-466); displays the conversation.

Composer

Rendered by extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 480-550); handles user input.

All styling is inline in extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 552-676) and uses a dark gradient theme with blur effects.

Sources: extension/Extension/entrypoints/sidepanel/AgentExecutor.tsx (lines 407-676)

Environment Configuration

The extension reads build-time configuration from environment variables, which Vite exposes to extension code through import.meta.env.
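Vite only exposes variables whose names carry an allowed prefix (VITE_ by default) on import.meta.env. The sketch below isolates the lookup in a plain function so the fallback logic can be tested; the variable name VITE_API_BASE_URL, the precedence order (stored setting, then env, then default), the http://localhost:8000 default, and resolveBaseUrl are all hypothetical, and line 38 of executeAgent.ts may read a different variable.

```typescript
// Hypothetical base-URL resolution combining a stored setting and Vite env.

type EnvLike = Record<string, string | undefined>;

// ASSUMED precedence: user setting first, then build-time env, then a default.
function resolveBaseUrl(
  env: EnvLike,
  storedBaseUrl?: string,
  fallback = "http://localhost:8000", // hypothetical default
): string {
  const candidate = storedBaseUrl?.trim() || env.VITE_API_BASE_URL?.trim() || fallback;
  return candidate.replace(/\/+$/, ""); // normalise away trailing slashes
}

// In extension code the env object would come from Vite, e.g.:
// const baseUrl = resolveBaseUrl(import.meta.env, stored.baseUrl);
```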

Lines 42-44 of extension/.gitignore exclude .env files from version control to protect secrets.

Sources: extension/Extension/entrypoints/utils/executeAgent.ts (line 38), extension/.gitignore (lines 42-44)