Browser Agent

Overview

Magnitude can control a browser visually, using screenshots to observe the page and coordinate-based actions to interact with it. Common uses include:

Testing web UI behavior
Verifying interface changes
Web scraping and scripted browsing tasks

Setup

The browser agent requires Chromium.

Either run /browser-setup in Magnitude, or install Chromium manually with:

npx patchright install chromium

How it works

On each turn, the browser agent receives a fresh screenshot of the current page. It can then perform actions such as clicking, typing, scrolling, dragging, navigating, switching tabs, or evaluating JavaScript. After acting, it waits for the page to stabilize before continuing, so that results reflect a settled page state.
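
The observe, act, and wait cycle described above can be sketched roughly as follows. This is only an illustration, not Magnitude's actual implementation: `Action`, `FakePage`, `wait_until_stable`, and `run_agent` are hypothetical names, the fake page stands in for a real Chromium page so the example runs on its own, and treating "two identical consecutive screenshots" as stability is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One coordinate-based action (hypothetical shape, not Magnitude's schema)."""
    kind: str          # e.g. "click", "type", "scroll"
    x: int = 0
    y: int = 0
    text: str = ""


class FakePage:
    """Stand-in for a browser page so the sketch runs without Chromium."""

    def __init__(self) -> None:
        self.clicks: List[Tuple[int, int]] = []
        self._pending_frames = 0  # frames left until the page settles
        self._frame = 0

    def screenshot(self) -> str:
        # A real page would return image bytes; a string is enough here.
        if self._pending_frames > 0:
            self._pending_frames -= 1
            self._frame += 1
        return f"frame-{self._frame}-clicks-{len(self.clicks)}"

    def perform(self, action: Action) -> None:
        if action.kind == "click":
            self.clicks.append((action.x, action.y))
        self._pending_frames = 2  # pretend the page animates for two frames


def wait_until_stable(page: FakePage, max_polls: int = 10) -> str:
    """Poll until two consecutive screenshots match (assumed stability rule)."""
    previous = page.screenshot()
    for _ in range(max_polls):
        current = page.screenshot()
        if current == previous:
            return current
        previous = current
    return previous  # give up and use the latest screenshot


def run_agent(
    page: FakePage,
    decide: Callable[[str], Optional[Action]],
    max_turns: int = 5,
) -> int:
    """Run the loop: observe a screenshot, act, wait for the page to settle."""
    turns = 0
    screenshot = page.screenshot()
    while turns < max_turns:
        action = decide(screenshot)  # the model would choose the action here
        if action is None:           # None means the task is finished
            break
        page.perform(action)
        screenshot = wait_until_stable(page)
        turns += 1
    return turns


def click_once(screenshot: str) -> Optional[Action]:
    """Toy policy: click one fixed coordinate, then stop."""
    if "clicks-1" in screenshot:
        return None
    return Action("click", x=120, y=48)
```

Calling `run_agent(FakePage(), click_once)` performs a single click and then stops, because the settled screenshot shows the click has registered. In a real agent, `decide` would be the visually grounded model and the page would be a live browser tab.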

Supported models

The browser agent requires a visually grounded model. See Providers & Models for compatibility details.

Capabilities

Click
Double-click
Right-click
Type (including special keys)
Scroll
Drag
Navigate
Go back
Tab management
Screenshots
JavaScript evaluation
