webscraperapi.ai: The Web Toolkit Your AI Agents Won't Get Blocked On

Scrape any web page and optionally process it with an LLM. Supports JavaScript rendering, CSS selectors, and multiple output formats (raw HTML, clean HTML, markdown, metadata, links, emails).

Use this to fetch any web page as clean markdown, HTML, or structured data. Add a prompt to extract specific fields with an LLM instead of getting the full page.

1 credit per request. Add render_js=true for JavaScript-heavy pages (+4 credits). Add prompt for LLM extraction (+5 credits).
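The surcharges are listed as additions to the 1-credit base, so you can work out a request's cost before sending it. Below is a minimal sketch that assumes the costs simply add up, so a rendered request with a prompt costs 1 + 4 + 5 = 10 credits. `estimate_credits` is a hypothetical helper, not part of the API.

```shell
#!/bin/sh
# Hypothetical helper (not part of the API): estimates the credit cost of one
# /v1/scrape request, ASSUMING the documented surcharges add to the 1-credit base.
# $1: "true" if render_js=true is set; $2: "true" if a prompt is set.
estimate_credits() {
  local credits=1               # base cost per request
  if [ "$1" = "true" ]; then
    credits=$((credits + 4))    # JavaScript rendering surcharge
  fi
  if [ "$2" = "true" ]; then
    credits=$((credits + 5))    # LLM extraction surcharge
  fi
  echo "$credits"
}

estimate_credits false false   # prints 1
estimate_credits true true     # prints 10
```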

Quick example

Scrape a page and get clean markdown:

curl -G "https://api.webscraperapi.ai/v1/scrape" \
  -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"

Extract specific data with a prompt:

curl -G "https://api.webscraperapi.ai/v1/scrape" \
  -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://en.wikipedia.org/wiki/Anthropic" \
  --data-urlencode "prompt=When was Anthropic founded and who are the founders?"

Use output=markdown when feeding content to an LLM — it's clean and token-efficient. Use css_selector to narrow extraction to a specific part of the page (e.g., css_selector=article or css_selector=.product-details).
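The quick examples and the `css_selector` tip all take the same form: a GET to `/v1/scrape` with URL-encoded query parameters. Below is a minimal POSIX-shell sketch of a reusable wrapper. It assumes the key is in `WEBSCRAPERAPI_API_KEY`, as in the examples. `scrape` is a hypothetical helper name, not an official client.

```shell
#!/bin/sh
# Hypothetical wrapper (not an official client) for GET /v1/scrape.
# Usage: scrape <page-url> [key=value ...]
# Every key=value pair (output=markdown, css_selector=article, prompt=...) is
# URL-encoded by curl and appended to the query string because of -G.
scrape() {
  local target_url="$1"
  shift
  # Exit with a message if the API key is missing.
  : "${WEBSCRAPERAPI_API_KEY:?WEBSCRAPERAPI_API_KEY is not set}"
  # Rewrite the remaining arguments in place: each key=value becomes
  # "--data-urlencode key=value" (the for-list is expanded once, up front).
  local pair
  for pair in "$@"; do
    set -- "$@" --data-urlencode "$pair"
    shift
  done
  curl -sS -G "https://api.webscraperapi.ai/v1/scrape" \
    -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
    --data-urlencode "url=$target_url" \
    "$@"
}
```

For example, `scrape "https://example.com/blog/post" output=markdown css_selector=article` fetches only the article body as markdown. Letting curl do the encoding keeps prompts with spaces or question marks safe in the query string.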

Output formats

Value	What you get
`markdown`	Clean markdown, well suited to LLM input
`raw_html`	Original HTML as-is
`clean_html`	HTML with scripts and styles stripped
`html_head_metadata_json`	Page title, description, and Open Graph tags as JSON
`all_links`	All links on the page
`internal_links`	Links that point to the same site
`external_links`	Links that point to other sites
`email_addresses`	Email addresses found on the page
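A typo in `output` only surfaces once a request has been made. Below is a minimal sketch of a local pre-flight check. It accepts exactly the values enumerated for the `output` parameter in the API reference. `is_valid_output` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only for the `output` values
# enumerated in the API reference, so a typo fails locally instead of in a request.
is_valid_output() {
  case "$1" in
    raw_html|clean_html|markdown|html_head_metadata_json|email_addresses|internal_links|external_links|all_links)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

fmt="markdown"
if is_valid_output "$fmt"; then
  echo "ok: $fmt"
else
  echo "unsupported output value: $fmt" >&2
fi
```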

Full API reference

Query Parameters

url (string, required)

The URL to scrape.

Format: uri

Length: 1 <= length <= 2083

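prompt (optional)

A prompt for LLM processing. When set, an LLM extracts the requested information instead of returning the full page (+5 credits).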

css_selector (optional)

A CSS selector that narrows extraction to part of the page, e.g. `article` or `.product-details`.

llm (optional)

LLM model used for processing (e.g., `gpt-4o-mini`).

render_js (optional)

Whether to render JavaScript before extracting content. Enable it for JavaScript-heavy pages (+4 credits).

Default: false

output (optional)

Output format: `raw_html`, `clean_html`, `markdown`, `html_head_metadata_json`, `email_addresses`, `internal_links`, `external_links`, `all_links`.

timeout_ms (optional)

Request timeout in milliseconds.
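The client can partly check the documented `url` constraints (format `uri`, 1 <= length <= 2083). Below is a minimal sketch that checks only the length. This catches an over-long URL locally rather than waiting for what is presumably a validation error from the API. `valid_url_length` is a hypothetical helper, and it does not validate URI syntax.

```shell
#!/bin/sh
# Hypothetical client-side check mirroring the documented `url` constraint
# (1 <= length <= 2083 characters). It does NOT check URI syntax.
valid_url_length() {
  local len=${#1}
  [ "$len" -ge 1 ] && [ "$len" -le 2083 ]
}

# Usage:
# valid_url_length "$page_url" || echo "url must be 1-2083 characters" >&2
```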

Example request

curl -G "https://api.webscraperapi.ai/v1/scrape" \
  -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=http://example.com"

Response Body

200 `application/json`

null

422 `application/json` (validation error)

{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
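The API returns both the 200 and 422 responses as `application/json`, so a script should check the status code before trusting the body. Below is a minimal sketch using curl's `-o` and `-w '%{http_code}'` options. It assumes the key is in `WEBSCRAPERAPI_API_KEY` and assumes nothing about the shape of a successful body. `scrape_checked` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: fetches one page and checks the HTTP status before
# trusting the body. On 200 it prints the body to stdout; otherwise it prints
# the status and the error body (e.g. the 422 "detail" list) to stderr.
scrape_checked() {
  : "${WEBSCRAPERAPI_API_KEY:?WEBSCRAPERAPI_API_KEY is not set}"
  local body_file status
  body_file=$(mktemp) || return 1
  # -o writes the body to a file; -w prints only the status code to stdout.
  status=$(curl -sS -G "https://api.webscraperapi.ai/v1/scrape" \
    -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
    --data-urlencode "url=$1" \
    -o "$body_file" -w '%{http_code}')
  if [ "$status" = "200" ]; then
    cat "$body_file"
    rm -f "$body_file"
    return 0
  fi
  echo "scrape failed: HTTP $status" >&2
  cat "$body_file" >&2
  rm -f "$body_file"
  return 1
}
```

If curl cannot reach the server at all, `-w` reports `000`, which takes the same failure path.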