How to Use Browser History SQLite Data for AI Agent Memory with Frequency Ranking

Matthew Diakonov


Open any agent memory repository on GitHub and you will find the same pattern: the user tells the agent things, the agent stores them, the agent retrieves them later. This is note-taking with extra steps.

The most valuable information about how you actually work is not what you say - it is what you do. Your browser history shows which documentation pages you return to every week. Your bookmarks reveal the resources you considered important enough to save. Your frequently visited pages expose your real workflow, not the one you think you have.

This data already exists. Every browser stores it in SQLite. No scraping, no instrumentation, no special setup required. Yet agent memory systems ignore all of it because they are built around explicit knowledge capture instead of implicit behavior observation.

Where Browsers Store Their SQLite Data

All three major desktop browsers use SQLite under the hood, and all three expose visit counts directly in the schema.

Chrome and Chromium-based browsers (Edge, Brave, Arc):

macOS:   ~/Library/Application Support/Google/Chrome/Default/History
Linux:   ~/.config/google-chrome/Default/History
Windows: C:\Users\<user>\AppData\Local\Google\Chrome\User Data\Default\History

Firefox:

macOS:   ~/Library/Application Support/Firefox/Profiles/<profile>.default/places.sqlite
Linux:   ~/.mozilla/firefox/<profile>.default/places.sqlite
Windows: C:\Users\<user>\AppData\Roaming\Mozilla\Firefox\Profiles\<profile>.default\places.sqlite

Safari (macOS only):

~/Library/Safari/History.db

These are plain SQLite files. You can open them with sqlite3 from the command line or any SQLite viewer. The catch is that Chrome and Firefox lock the file while the browser is running - copy the file to a temp location and query the copy instead.
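A minimal sketch of that copy-then-query pattern might look like this (the `copy_and_open` helper name is my own, and opening the copy read-only is just a safety choice):

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

def copy_and_open(db_path: str) -> sqlite3.Connection:
    """Copy a possibly-locked browser database and open the copy read-only."""
    tmp = Path(tempfile.mkdtemp()) / Path(db_path).name
    shutil.copy2(db_path, tmp)
    # mode=ro prevents accidental writes to the snapshot
    return sqlite3.connect(f"file:{tmp}?mode=ro", uri=True)
```

The `file:...?mode=ro` URI form keeps the snapshot immutable even if downstream code tries to write to it.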

Chrome History Schema

Chrome's History database is minimal and clean:

-- The urls table - core data for agent memory
CREATE TABLE urls (
    id              INTEGER PRIMARY KEY,
    url             LONGVARCHAR NOT NULL,
    title           LONGVARCHAR DEFAULT '',
    visit_count     INTEGER DEFAULT 0 NOT NULL,
    typed_count     INTEGER DEFAULT 0 NOT NULL,
    last_visit_time INTEGER NOT NULL,  -- microseconds since 1601-01-01 UTC
    hidden          INTEGER DEFAULT 0 NOT NULL
);

-- The visits table - each individual visit
CREATE TABLE visits (
    id               INTEGER PRIMARY KEY,
    url              INTEGER NOT NULL,  -- FK to urls.id
    visit_time       INTEGER NOT NULL,
    from_visit       INTEGER DEFAULT 0,
    transition       INTEGER DEFAULT 0 NOT NULL,
    segment_id       INTEGER DEFAULT 0,
    visit_duration   INTEGER DEFAULT 0 NOT NULL
);

The visit_count column in urls is what you want. Chrome increments it every time you navigate to that URL. typed_count is even more interesting - it counts how many times you typed the URL directly, which signals a much stronger intent than clicking a link.
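As a sketch of how you might surface that intent signal directly from a copy of Chrome's database (the `top_typed_urls` function name and ranking are my own assumptions, not part of any Chrome API):

```python
import sqlite3

def top_typed_urls(history_copy: str, limit: int = 10):
    """Rank pages by how often the user typed the URL directly -
    a stronger intent signal than visit_count alone."""
    conn = sqlite3.connect(history_copy)
    rows = conn.execute("""
        SELECT url, title, typed_count, visit_count
        FROM urls
        WHERE typed_count > 0
        ORDER BY typed_count DESC, visit_count DESC
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return rows
```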

Firefox places.sqlite Schema

Firefox uses a slightly more complex schema but exposes a frecency score - a built-in frequency + recency composite that Mozilla has been refining for years:

-- moz_places - Firefox's equivalent of Chrome's urls table
CREATE TABLE moz_places (
    id             INTEGER PRIMARY KEY,
    url            LONGVARCHAR NOT NULL,
    title          LONGVARCHAR,
    visit_count    INTEGER DEFAULT 0,
    hidden         INTEGER DEFAULT 0 NOT NULL,
    typed          INTEGER DEFAULT 0 NOT NULL,
    frecency       INTEGER DEFAULT -1 NOT NULL,  -- composite frequency/recency score
    last_visit_date INTEGER
);

-- moz_bookmarks - bookmarks and their folder structure
CREATE TABLE moz_bookmarks (
    id         INTEGER PRIMARY KEY,
    type       INTEGER,        -- 1 = bookmark, 2 = folder
    fk         INTEGER,        -- FK to moz_places.id
    parent     INTEGER,
    position   INTEGER,
    title      LONGVARCHAR,
    dateAdded  INTEGER,
    lastModified INTEGER
);

Firefox's frecency uses a decay algorithm that weights recent visits more than old ones, but still rewards frequent visits. A page with 100 visits spread over six months scores higher than a page visited once last week.
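Mozilla's actual frecency algorithm samples visits and applies bucketed bonuses, but the core decay idea can be sketched with a simple exponential half-life (the `decay_score` function and the 30-day half-life are my own simplification, not Firefox's implementation):

```python
import time

def decay_score(visit_times, half_life_days=30.0, now=None):
    """Each visit contributes a weight that halves every half_life_days,
    so many older visits can still outrank a single recent one."""
    now = now if now is not None else time.time()
    day = 86400.0
    return sum(0.5 ** ((now - t) / (half_life_days * day)) for t in visit_times)
```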

Safari History.db Schema

Safari's schema is the simplest:

CREATE TABLE history_items (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    url             TEXT NOT NULL UNIQUE,
    domain_expansion TEXT,
    visit_count     INTEGER,
    daily_visit_counts BLOB,   -- compact representation of per-day visits
    weekly_visit_counts BLOB,
    autocomplete_triggers BLOB,
    should_recompute_derived_visit_counts INTEGER,
    visit_count_score INTEGER  -- Safari's own frequency score
);

CREATE TABLE history_visits (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    history_item        INTEGER NOT NULL REFERENCES history_items(id),
    visit_time          REAL NOT NULL,  -- seconds since 2001-01-01 00:00:00 UTC
    title               TEXT,
    load_successful     BOOLEAN NOT NULL DEFAULT 1,
    http_non_get        BOOLEAN NOT NULL DEFAULT 0,
    synthesized         BOOLEAN NOT NULL DEFAULT 0,
    redirect_source     INTEGER,
    redirect_destination INTEGER,
    origin              INTEGER,
    generation          INTEGER
);

Safari stores dates as seconds since January 1, 2001 (Apple's Cocoa epoch) rather than the Unix epoch. Keep that in mind when converting timestamps.
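The conversion is a single offset: 2001-01-01 UTC is 978,307,200 seconds after the Unix epoch. A small helper (the function name is my own) might look like:

```python
from datetime import datetime, timezone

COCOA_EPOCH_OFFSET = 978_307_200  # seconds from 1970-01-01 to 2001-01-01 UTC

def safari_time_to_datetime(visit_time: float) -> datetime:
    """Convert Safari's Cocoa-epoch seconds into an aware UTC datetime."""
    return datetime.fromtimestamp(visit_time + COCOA_EPOCH_OFFSET, tz=timezone.utc)
```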

The Agent Memory SQLite Schema

Rather than querying each browser's native database every time, extract the relevant data into a unified agent memory store:

CREATE TABLE browser_pages (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    url             TEXT NOT NULL,
    domain          TEXT NOT NULL,
    title           TEXT,
    visit_count     INTEGER DEFAULT 0,
    typed_count     INTEGER DEFAULT 0,  -- direct URL entry = strong intent signal
    first_seen      INTEGER,            -- unix timestamp
    last_seen       INTEGER,            -- unix timestamp
    source_browser  TEXT,               -- 'chrome', 'firefox', 'safari'
    is_bookmarked   INTEGER DEFAULT 0,
    bookmark_title  TEXT,
    tags            TEXT                -- JSON array of inferred topic tags
);

CREATE INDEX idx_visit_count ON browser_pages(visit_count DESC);
CREATE INDEX idx_domain      ON browser_pages(domain);
CREATE INDEX idx_last_seen   ON browser_pages(last_seen DESC);

CREATE TABLE page_visits (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    page_id     INTEGER NOT NULL REFERENCES browser_pages(id),
    visited_at  INTEGER NOT NULL,       -- unix timestamp
    duration_s  INTEGER DEFAULT 0      -- seconds spent on page
);

Separating browser_pages from page_visits lets you answer two different questions: "what pages are central to this person's work" (query browser_pages by visit_count) and "when do they typically access specific resources" (query page_visits by time of day).
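The second question can be answered with a group-by over `page_visits`. A sketch against the schema above (the `visits_by_hour` helper name is my own, and hours are bucketed in UTC):

```python
import sqlite3

def visits_by_hour(db_path: str, url_pattern: str):
    """Bucket visits to matching pages by hour of day (UTC),
    most-visited hours first."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("""
        SELECT strftime('%H', v.visited_at, 'unixepoch') AS hour,
               COUNT(*) AS visits
        FROM page_visits v
        JOIN browser_pages p ON p.id = v.page_id
        WHERE p.url LIKE ?
        GROUP BY hour
        ORDER BY visits DESC
    """, (url_pattern,)).fetchall()
    conn.close()
    return rows
```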

Frequency-Based Ranking Queries

The core insight is simple: rank by access frequency, not recency. A page you visit fifty times over three months is more important to surface than a page you visited once yesterday.

Top pages by raw visit count:

SELECT
    domain,
    title,
    url,
    visit_count,
    typed_count,
    datetime(last_seen, 'unixepoch') AS last_seen
FROM browser_pages
ORDER BY visit_count DESC
LIMIT 50;

Weighted score combining frequency, typed intent, and bookmark status:

SELECT
    url,
    title,
    visit_count,
    typed_count,
    is_bookmarked,
    -- weight: visit_count + 3x typed entries + 10x bookmark bonus
    (visit_count + (typed_count * 3) + (is_bookmarked * 10)) AS importance_score
FROM browser_pages
WHERE last_seen > strftime('%s', 'now', '-90 days')
ORDER BY importance_score DESC
LIMIT 30;

Top domains (useful for understanding which tools and services the user relies on):

SELECT
    domain,
    COUNT(*) AS page_count,
    SUM(visit_count) AS total_visits,
    MAX(last_seen) AS most_recent_visit
FROM browser_pages
GROUP BY domain
ORDER BY total_visits DESC
LIMIT 20;

Recently revisited documentation pages - good candidates for proactive surfacing:

SELECT
    url,
    title,
    visit_count,
    datetime(last_seen, 'unixepoch') AS last_seen
FROM browser_pages
WHERE
    visit_count >= 5
    AND last_seen > strftime('%s', 'now', '-7 days')
    AND (
        url LIKE '%docs.%'
        OR url LIKE '%/docs/%'
        OR url LIKE '%/api/%'
        OR url LIKE '%stackoverflow.com%'
        OR url LIKE '%github.com%'
    )
ORDER BY visit_count DESC;

Ingesting Chrome History Into the Agent Store

Here is a minimal Python script that copies Chrome's history and ingests it into the agent memory database:

import sqlite3
import shutil
import os

CHROME_HISTORY = os.path.expanduser(
    "~/Library/Application Support/Google/Chrome/Default/History"
)
AGENT_MEMORY_DB = os.path.expanduser("~/.agent-memory/browser.db")

def ingest_chrome_history():
    # Chrome locks the db while running - work from a copy
    tmp = "/tmp/chrome_history_copy"
    shutil.copy2(CHROME_HISTORY, tmp)

    src = sqlite3.connect(tmp)
    dst = sqlite3.connect(AGENT_MEMORY_DB)

    dst.execute("""
        CREATE TABLE IF NOT EXISTS browser_pages (
            id              INTEGER PRIMARY KEY AUTOINCREMENT,
            url             TEXT NOT NULL UNIQUE,
            domain          TEXT NOT NULL,
            title           TEXT,
            visit_count     INTEGER DEFAULT 0,
            typed_count     INTEGER DEFAULT 0,
            last_seen       INTEGER,
            source_browser  TEXT,
            is_bookmarked   INTEGER DEFAULT 0
        )
    """)

    rows = src.execute("""
        SELECT
            url,
            title,
            visit_count,
            typed_count,
            -- convert Chrome microseconds to unix seconds
            (last_visit_time / 1000000) - 11644473600 AS last_seen_unix
        FROM urls
        WHERE visit_count > 0
        ORDER BY visit_count DESC
    """).fetchall()

    for url, title, visits, typed, last_seen in rows:
        domain = url.split("/")[2] if "//" in url else url
        dst.execute("""
            INSERT INTO browser_pages (url, domain, title, visit_count, typed_count, last_seen, source_browser)
            VALUES (?, ?, ?, ?, ?, ?, 'chrome')
            ON CONFLICT(url) DO UPDATE SET
                visit_count = MAX(visit_count, excluded.visit_count),
                typed_count = MAX(typed_count, excluded.typed_count),
                last_seen   = MAX(last_seen, excluded.last_seen),
                title       = COALESCE(excluded.title, title)
        """, (url, domain, title, visits, typed, last_seen))

    dst.commit()
    src.close()
    dst.close()
    os.remove(tmp)
    print(f"Ingested {len(rows)} URLs from Chrome history")

The ON CONFLICT ... DO UPDATE pattern lets you re-run ingestion without duplicating records. It keeps the higher visit count in case the agent's copy and the live browser diverge.
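The idempotence is easy to verify with an in-memory database (a stripped-down sketch of the same upsert, using only `url` and `visit_count`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE browser_pages (
        url TEXT PRIMARY KEY,
        visit_count INTEGER DEFAULT 0
    )
""")

def upsert(url: str, visits: int) -> None:
    # Re-running with the same or a lower count leaves the row unchanged
    conn.execute("""
        INSERT INTO browser_pages (url, visit_count) VALUES (?, ?)
        ON CONFLICT(url) DO UPDATE SET
            visit_count = MAX(visit_count, excluded.visit_count)
    """, (url, visits))

upsert("https://example.com", 5)
upsert("https://example.com", 3)   # stale count - ignored
upsert("https://example.com", 8)   # newer count - wins
```

Note that `ON CONFLICT(url)` requires a UNIQUE or PRIMARY KEY constraint on `url`, which the ingestion schema above declares.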

How an Agent Uses This Data

Once the data is in browser.db, the agent queries it at the start of a session to load context about what the user has been working on:

def get_user_work_context(db_path: str, days: int = 30) -> dict:
    conn = sqlite3.connect(db_path)

    # Top domains - reveals the tech stack and tools in use
    top_domains = conn.execute("""
        SELECT domain, SUM(visit_count) as visits
        FROM browser_pages
        WHERE last_seen > strftime('%s', 'now', '-' || ? || ' days')
        GROUP BY domain
        ORDER BY visits DESC
        LIMIT 10
    """, (days,)).fetchall()

    # High-frequency documentation - what the user is actively building
    key_docs = conn.execute("""
        SELECT url, title, visit_count
        FROM browser_pages
        WHERE visit_count >= 5
          AND (url LIKE '%docs.%' OR url LIKE '%/docs/%' OR url LIKE '%/api/%')
          AND last_seen > strftime('%s', 'now', '-' || ? || ' days')
        ORDER BY visit_count DESC
        LIMIT 20
    """, (days,)).fetchall()

    conn.close()

    return {
        "top_domains": top_domains,
        "key_documentation": key_docs,
    }

The agent calls this once per session, before the user types anything. It already knows which frameworks and APIs the user works with, which documentation pages they return to, and which tools they depend on. There is no onboarding step. There is no "tell me about your tech stack" prompt.
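How that context reaches the model is an implementation choice; one option is to render the dict into a short system-prompt preamble. A sketch (the `format_work_context` name and layout are my own):

```python
def format_work_context(ctx: dict) -> str:
    """Render get_user_work_context() output into a system-prompt preamble."""
    lines = ["User's most-visited domains:"]
    lines += [f"  {domain}: {visits} visits" for domain, visits in ctx["top_domains"]]
    lines.append("Documentation the user returns to:")
    lines += [f"  {title or url} ({n} visits)" for url, title, n in ctx["key_documentation"]]
    return "\n".join(lines)
```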

Frequency Over Recency

Most memory systems prioritize recent information because it is easy to implement - sort by timestamp, done. But frequency reveals importance in a way recency cannot.

Consider two scenarios. In the first, you open the React hooks documentation once to copy an example you saw on Twitter. In the second, you visit the same page forty times over three months because you keep needing to check the exact behavior of useEffect during a long refactor. Both show up equally in a recency-sorted query from last week. A frequency-sorted query puts the second page near the top of your agent's context and ignores the first one entirely.

This is what transforms agent memory from a recall system into something closer to an anticipation system. The agent does not wait for you to mention you are working in React. It already knows, because forty visits to the React docs is a stronger signal than any preference you could articulate.

Bookmarks add a second layer, and typed entries a third. A typed_count above zero is an even stronger signal - you considered the URL important enough to type it directly rather than clicking through from search or another page. Combining visit count, typed count, and bookmark status gives you a simple importance score that outperforms recency-based ranking for this purpose, without requiring embeddings, vector search, or LLM inference.

The data already exists. The schema is straightforward. The queries are fast. The remaining gap is that most agent memory systems never thought to look.

Fazm is an open source macOS AI agent, available on GitHub.
