Skip to content

Browser Automation Deep Dive

The Use Case That Started It All

This is the browser automation extension that led to the discovery of the Fetch framework.

See the Origin Story for the full context.


The Problem

Building an AI browser automation agent that needs to:

  • Parse DOM structure
  • Detect user commands
  • Remember context across page navigations
  • Decide when to execute actions
  • Avoid infinite loops

How Fetch Solved It

The Agent Loop

1. SENSE (3D)
   - Chirp: Detect command urgency
   - Perch: Parse DOM structure
   - Wake: Remember previous actions

2. MEASURE (DRIFT)
   - Current state: Where are we?
   - Goal state: Where should we be?
   - Gap: DRIFT = Goal - Current

3. ACT (Fetch)
   - Calculate: Fetch = Chirp × |DRIFT| × Confidence
   - Decide: Execute/Confirm/Queue/Wait
   - Execute if threshold met

4. LOOP
   - Re-measure DRIFT
   - Continue until DRIFT ≈ 0

User command: "Search for cormorant foraging on Reddit"

Step 1: Initial State

javascript
// Dimension scores
chirp = 90;  // Clear "search" command detected
perch = 70;  // On Reddit homepage, search bar found
wake = 80;   // Remember we're on Reddit

// DRIFT calculation
methodology = 90;  // Goal: view search results
performance = 20;  // Current: on homepage
drift = 70;

// Fetch calculation
confidence = min(70, 80) / 100 = 0.70;
fetch = 90 × 70 × 0.70 = 4,410;

Decision: Execute (fetch > 1000) → Type query in search bar


Step 2: After Typing

javascript
// New state
perch = 75;  // Search box now has text
performance = 40;  // Partially toward goal

// Recalculate
drift = 50;
fetch = 90 × 50 × 0.70 = 3,150;

Decision: Execute → Click search button or press Enter


javascript
// New state
perch = 85;  // Results page loaded
performance = 90;  // Near goal
drift = 0;

// Recalculate
fetch = 90 × 0 × 0.75 = 0;

Decision: Wait (DRIFT ≈ 0, goal achieved)


Preventing Infinite Loops

The original problem: Agent kept navigating to Reddit 11 times.

Solution using Fetch:

javascript
const actionHistory = [];

function checkForLoop(action, url) {
  const hash = `${action}-${url}`;
  const recentActions = actionHistory.slice(-10);
  const repeatCount = recentActions.filter(h => h === hash).length;

  if (repeatCount >= 3) {
    // Same action repeated too many times
    return true;  // Loop detected
  }

  actionHistory.push(hash);
  return false;
}

// Before executing
if (checkForLoop(decision.action, currentUrl)) {
  console.log('Loop detected. DRIFT not decreasing. Stopping.');
  fetch = 0;  // Force stop
}

The Semantic Anchoring Pattern

From the browser extension's governance document:

Rule 1: Observable DOM Over Inferred State

javascript
// ❌ WRONG: Inferring state
const isLoginPage = url.includes('/auth');

// ✅ RIGHT: Observable semantics
const isLoginPage = document.querySelector('input[type="password"]') !== null;

This maps to PerchIQX (Space/Structure) - always anchor to observable structure.

Rule 2: Intent Preservation

Never override user's original goal during multi-step execution.

This maps to WakeIQX (Time/Memory) - remember the original intent.

Rule 3: Semantic Element Selection

Use meaningful attributes:

  • ID attributes (most reliable)
  • Name attributes
  • ARIA labels
  • Text content

This ensures high PerchIQX scores → higher confidence → safer execution.


Real Agent Code

javascript
async function executeStep() {
  // 1. SENSE: Get DOM
  const dom = await extractSemanticDOM();

  // 2. SENSE: Calculate dimension scores
  const chirp = detectCommandUrgency(userCommand);
  const perch = calculateStructureScore(dom);
  const wake = getContextScore(actionHistory);

  // 3. MEASURE: Calculate DRIFT
  const drift = assessGapToGoal(currentState, goalState);

  // 4. ACT: Calculate Fetch
  const confidence = Math.min(perch, wake) / 100;
  const fetch = chirp * Math.abs(drift) * confidence;

  // 5. DECIDE
  if (fetch > 1000 && confidence > 0.6) {
    await executeAction(aiDecision);
  } else if (fetch > 1000 && confidence <= 0.6) {
    await executeWithReview(aiDecision);
  } else if (fetch > 500) {
    const approved = await confirmWithUser(aiDecision);
    if (approved) await executeAction(aiDecision);
  } else {
    console.log('Waiting - insufficient Fetch score');
  }

  // 6. LOOP: Re-measure
  setTimeout(executeStep, 2000);
}

The GitHub Repository

The actual browser extension:

Semantic Fetch Intelligence Extension

Features:

  • Gemini 3.0 Pro integration
  • Semantic DOM extraction
  • Agentic loop (analyze → think → act)
  • Loop detection
  • Action validation

This is where Fetch was discovered.


Key Insights

1. The DRIFT Must Decrease

After each action, re-measure DRIFT. If it's not decreasing, you're stuck.

2. Confidence Gates Action

Even with high urgency (Chirp) and large gap (DRIFT), low confidence (Perch or Wake) blocks execution.

3. Multiplicative is Critical

javascript
fetch = chirp × drift × confidence

If any component is zero, Fetch is zero. This prevents:

  • Acting without urgency (Chirp = 0)
  • Acting when goal is met (DRIFT = 0)
  • Acting when not ready (Confidence = 0)

Lessons Learned

ProblemFetch Solution
Infinite loopsDRIFT should decrease; track history
Wrong element clickedHigh Perch score requires semantic selectors
Premature actionConfidence multiplier gates execution
Stuck on loading pageWake (context) should track "page loaded"
Ignoring user commandsChirp detects urgency; don't act without it

Try It Yourself

  1. Clone the repository
  2. Install dependencies
  3. Load in Chrome
  4. Try: "Search for artificial intelligence on Reddit"
  5. Watch the Fetch framework in action

View the complete extension →

← Back to Use Cases