API Reference

Scrape

Scrape any web page and optionally process it with an LLM. Supports JavaScript rendering, CSS selectors, and multiple output formats (raw HTML, clean HTML, markdown, metadata, links, emails).

Use this to fetch any web page as clean markdown, HTML, or structured data. Add a prompt to extract specific fields with an LLM instead of getting the full page.

Each request costs 1 credit. Setting render_js=true for JavaScript-heavy pages adds 4 credits, and adding a prompt for LLM extraction adds 5 credits.
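The pricing above can be turned into a small estimator. This is only a sketch: the function name scrape_credits is hypothetical, and it assumes the two surcharges stack when render_js and prompt are used together, which the pricing note does not state explicitly.

```python
from typing import Optional


def scrape_credits(render_js: bool = False, prompt: Optional[str] = None) -> int:
    """Estimate the credit cost of one scrape request (hypothetical helper)."""
    credits = 1  # base cost per request
    if render_js:
        credits += 4  # surcharge for JavaScript rendering
    if prompt:
        credits += 5  # surcharge for LLM extraction
    # Assumption: the surcharges are additive when both options are set.
    return credits


print(scrape_credits())                                   # 1
print(scrape_credits(render_js=True))                     # 5
print(scrape_credits(render_js=True, prompt="Summarize")) # 10
```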

Quick example

Scrape a page and get clean markdown:

curl -G "https://api.webscraperapi.ai/v1/scrape" \
  -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"

Extract specific data with a prompt:

curl -G "https://api.webscraperapi.ai/v1/scrape" \
  -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://en.wikipedia.org/wiki/Anthropic" \
  --data-urlencode "prompt=When was Anthropic founded and who are the founders?"

Use output=markdown when feeding content to an LLM; it is clean and token-efficient. Use css_selector to narrow extraction to a specific part of the page (e.g., css_selector=article or css_selector=.product-details).
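For callers working from Python rather than curl, the same requests could be assembled roughly as follows. The helper names build_scrape_url and build_request are hypothetical; only the endpoint, the Bearer header, and the parameter names come from the examples above. The sketch builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.webscraperapi.ai/v1/scrape"


def build_scrape_url(url, **params):
    """Build the GET URL, URL-encoding each value like curl's --data-urlencode."""
    query = {"url": url}
    for name, value in params.items():
        if value is None:
            continue  # leave unset optional parameters out of the query
        if isinstance(value, bool):
            value = "true" if value else "false"  # e.g. render_js=true
        query[name] = value
    return BASE_URL + "?" + urlencode(query)


def build_request(api_key, url, **params):
    """Return an unsent urllib Request carrying the Bearer token header."""
    headers = {"Authorization": "Bearer " + api_key}
    return Request(build_scrape_url(url, **params), headers=headers)


# Narrow the scrape to the <article> element and ask for markdown.
print(build_scrape_url("https://news.ycombinator.com",
                       output="markdown", css_selector="article"))
```

Passing the resulting Request to urllib.request.urlopen would perform the call; the API key would typically come from the WEBSCRAPERAPI_API_KEY environment variable used in the curl examples.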

Output formats

Value                      What you get
markdown                   Clean markdown; great for LLMs
raw_html                   Original HTML as-is
clean_html                 HTML with scripts and styles stripped
html_head_metadata_json    Page title, description, Open Graph tags as JSON
links                      All links on the page
emails                     Email addresses found on the page

Full API reference

GET /v1/scrape

Query Parameters

url (string, required)

The URL to scrape.

Format: uri
Length: 1 <= length <= 2083

prompt (optional)

Optional prompt for LLM processing.

css_selector (optional)

Optional CSS selector to narrow content.

llm (optional)

LLM model for processing (e.g., 'gpt-4o-mini').

render_js (optional)

Whether to render JavaScript.

Default: false

output (optional)

Output format: 'raw_html', 'clean_html', 'markdown', 'html_head_metadata_json', 'email_addresses', 'internal_links', 'external_links', 'all_links'.

timeout_ms (optional)

Request timeout in milliseconds.
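The documented constraints can be checked on the client before spending a credit. The following is a minimal sketch, not the API's own validation: validate_scrape_params is a hypothetical helper, the output values are copied from the parameter list above, and the "uri" format check is approximated by requiring a URL scheme.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Values listed for the output parameter in the reference above.
OUTPUT_FORMATS = {
    "raw_html", "clean_html", "markdown", "html_head_metadata_json",
    "email_addresses", "internal_links", "external_links", "all_links",
}


def validate_scrape_params(url: str, output: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    errors = []
    if not 1 <= len(url) <= 2083:
        errors.append("url length must be between 1 and 2083 characters")
    # Assumption: treat "format: uri" as "has a scheme such as https".
    if not urlparse(url).scheme:
        errors.append("url must be a URI with a scheme")
    if output is not None and output not in OUTPUT_FORMATS:
        errors.append("unsupported output format: " + output)
    return errors


print(validate_scrape_params("https://example.com", output="markdown"))  # []
```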

Response Body

application/json

Example request:

curl -X GET "https://api.webscraperapi.ai/v1/scrape?url=http%3A%2F%2Fexample.com"

Example response bodies (a null body, then an error body):

null

{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
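An error body of the shape shown above can be flattened into readable messages. This is a sketch under assumptions: summarize_errors is a hypothetical helper, and the sample values in the usage line are invented for illustration, since the reference only gives "string" placeholders.

```python
import json
from typing import List


def summarize_errors(body: str) -> List[str]:
    """Turn a {"detail": [{"loc", "msg", "type"}]} body into one line per error."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return []  # e.g. a null body carries no error details
    lines = []
    for item in payload.get("detail", []):
        # loc is a list of path segments; join them into a dotted path.
        location = ".".join(str(part) for part in item.get("loc", []))
        lines.append("{}: {} ({})".format(
            location, item.get("msg", ""), item.get("type", "")))
    return lines


# Hypothetical error body for a request that omitted the url parameter.
sample = '{"detail": [{"loc": ["query", "url"], "msg": "field required", "type": "missing"}]}'
print(summarize_errors(sample))  # ['query.url: field required (missing)']
```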