Stop web scraping before it costs you

A lightweight beacon plus an edge model that fingerprints visitors and names the scrapers and AI crawlers hitting your content, including ones that never run JavaScript.

Start free Read the docs

The problem

The bots draining your content never run your scripts

Scrapers and AI crawlers fetch your pages all day, and the worst of them never run JavaScript, so client-only analytics and most bot tools never see them. You end up guessing which traffic is real and which is quietly draining your content into someone else's index or training set.

How it works

A beacon for browsers, a server report for the rest

The client beacon catches everything that runs JavaScript; a fire-and-forget origin report covers the pure crawlers that don't.

Drop the beacon on your pages

Add one async <script> tag pointing at https://api.formshield.dev/js/formshield.js with your publishable key and data-fs-mode="pageload". It auto-initializes from data-fs-* attributes, performs a signed handshake (POST /v1/handshake), and posts browser fingerprint and automation signals to POST /v1/collect on each pageview. No extra code.

Catch the non-JS crawlers server-side

Pure crawlers like GPTBot, ClaudeBot, PerplexityBot, and Bytespider fetch HTML without running scripts, so the beacon never sees them. Report each request from your origin worker or backend with POST /v1/report, passing the visitor's UA and IP. Fire it with ctx.waitUntil (fire-and-forget) so it adds zero latency and your page never depends on FormShield being up.
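
A minimal sketch of that origin report for a Cloudflare-style Worker, assuming the payload fields shown in the curl example on this page (ua, ip, hostname, path, action). The helper names (buildReport, sendReport, reportInBackground), the structural types, and the injected doFetch parameter are illustrative assumptions, not part of any FormShield SDK.

```typescript
// Minimal structural types so the sketch compiles outside a Workers project.
interface IncomingRequest {
  url: string;
  headers: { get(name: string): string | null };
}

interface BackgroundContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface ReportBody {
  ua: string;
  ip: string;
  hostname: string;
  path: string;
  action: string;
}

interface ReportInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Injected so the sketch can run without a network (hypothetical parameter).
type FetchLike = (url: string, init: ReportInit) => Promise<unknown>;

const REPORT_URL = "https://api.formshield.dev/v1/report";

// Build the /v1/report payload from the visitor's request.
function buildReport(request: IncomingRequest): ReportBody {
  const url = new URL(request.url);
  return {
    ua: request.headers.get("User-Agent") ?? "",
    ip: request.headers.get("CF-Connecting-IP") ?? "",
    hostname: url.hostname,
    path: url.pathname,
    action: "pageview",
  };
}

// Send the report and swallow every failure so an outage never reaches the page.
async function sendReport(body: ReportBody, apiKey: string, doFetch: FetchLike): Promise<void> {
  try {
    await doFetch(REPORT_URL, {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
  } catch {
    // Intentionally ignored: the response is never gated on this call.
  }
}

// Fire-and-forget: hand the promise to waitUntil and return immediately.
function reportInBackground(
  request: IncomingRequest,
  ctx: BackgroundContext,
  apiKey: string,
  doFetch: FetchLike,
): void {
  ctx.waitUntil(sendReport(buildReport(request), apiKey, doFetch));
}
```

Inside a Worker's fetch handler you would call reportInBackground(request, ctx, apiKey, (url, init) => fetch(url, init)) and then return your normal response. Where apiKey comes from (for example an environment binding) is left to your setup.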

Read named, verified verdicts in your logs

The edge model scores every hit, classifies the user agent, and checks IP reputation. It names the bot (bot_id like gptbot or googlebot, plus the operating company) and, for operators that publish IP ranges (Google, Microsoft, OpenAI, DuckDuckGo), verifies the request really came from them. A forged Googlebot from the wrong IP is flagged as spoofed. View score, decision, and reasons per observation in the dashboard Logs.

What you get

Named crawlers, verified by IP, scored on every hit

Declared agents are named, and crawlers claiming an operator that publishes IP ranges are verified or flagged as spoofed, all backed by automation tells and self-hosted IP reputation.

AI and search crawler identification

Declared agents get a bot:ai_crawler or bot:search_crawler reason plus the named operator. GPTBot, ClaudeBot, PerplexityBot, and Bytespider are recognized; verified benign search crawlers are credited toward allow while AI crawlers stay visible.

Spoof detection via IP verification

For operators that publish their ranges, a request whose UA claims a crawler but whose IP is out of range is flagged bot:spoofed and scored high. A real crawler is confirmed (bot:verified); a forged one is caught.
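
To make the idea concrete, here is a rough sketch of range-based verification, not FormShield's actual implementation. It handles IPv4 only, the function names and the "unverifiable" label are hypothetical, and the ranges in the usage note are documentation placeholders rather than any operator's real published list.

```typescript
type Verdict = "bot:verified" | "bot:spoofed" | "unverifiable";

// Convert a dotted-quad IPv4 address to an unsigned integer; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when ip falls inside the CIDR block, e.g. inCidr("192.0.2.10", "192.0.2.0/24").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) return false;
  const bitsText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null || bits > 32) return false;
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = Math.pow(2, 32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Compare the crawler a UA claims to be against that operator's published ranges.
function verifyCrawler(
  claimedBotId: string,
  ip: string,
  publishedRanges: Record<string, string[]>,
): Verdict {
  const ranges = publishedRanges[claimedBotId];
  if (!ranges) return "unverifiable"; // operator publishes no ranges we know of
  return ranges.some((range) => inCidr(ip, range)) ? "bot:verified" : "bot:spoofed";
}
```

With placeholder ranges such as { googlebot: ["192.0.2.0/24"] }, a request claiming googlebot from 192.0.2.10 would come back bot:verified, while the same claim from 203.0.113.42 would come back bot:spoofed.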

Automation and missing-token tells

The signed handshake token proves a real browser ran the beacon. Its absence (client_token_missing) plus webdriver and headless markers (automation_detected) push the score toward block on the client path. Server reports correctly skip the missing-client penalty.

IP reputation on every hit

FormShield combines UA classification with self-hosted IP intelligence: datacenter, VPN, proxy, residential-proxy, and scanner flags plus country and ASN. A human UA from a datacenter range, or a desktop UA on a mobile IP, raises a consistency flag.

The call

Report a crawler from your origin

A GPTBot request, genuine or spoofed, reported from your edge: fire-and-forget, scored server-side, with no decision returned to gate on.

curl -X POST https://api.formshield.dev/v1/report \
  -H "Authorization: Bearer fs_pub_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "ua": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ip": "203.0.113.42",
    "hostname": "example.com",
    "path": "/pricing",
    "action": "pageview"
  }'

{ "ok": true, "request_id": "rpt_a1b2c3d4e5f6" }

Get an API key Read the docs

FAQ

Common questions

How do I catch crawlers that never run JavaScript?

The JS beacon only sees clients that execute scripts, so pure crawlers slip past it. Report each request from your origin with POST /v1/report, passing the visitor's UA and IP from the incoming request (on Cloudflare, CF-Connecting-IP and User-Agent). FormShield classifies and scores it server-side, naming AI agents like GPTBot and ClaudeBot and verifying or flagging crawlers by IP range. Send it fire-and-forget with ctx.waitUntil so it adds zero latency.

Does /v1/report return a score or decision I can act on?

No. /v1/report returns only { ok: true, request_id }; scoring happens server-side, and the score, decision, and reasons are stored on the observation. View them in the dashboard Logs. Never gate your response on this call, and always wrap the fetch in try/catch so a FormShield outage can never break your page.

How is it priced?

The passive beacon is free up to 1M events per month, then metered. Deep analysis costs 4 credits per request. Billing is in credits, with raw request counts tracked separately; enterprise orgs can buy credit blocks.

Stop fighting spam by hand

One API call. IP, email, content & behavior signals in a single intelligence platform. Start free, no credit card required.

Start Free View Docs