---
name: webscraperapi
description: >-
  Scrape any web page and get clean, LLM-ready content. Use this whenever you
  need to fetch a URL, read a web page, extract data from a website, check what
  technologies a site uses, search Google, or verify an email address. Handles
  anti-bot protection, JavaScript rendering, proxies, and CAPTCHAs
  automatically. Use the prompt parameter to extract only the specific
  information you need: this returns a small, focused answer instead of the
  full page. Supports markdown, JSON, CSV, and HTML output. 500 free credits
  on signup.
---

# webscraperapi

> Version: 0.0.30+2026-03-03 | Generated: 2026-03-03

Web scraping API that handles anti-bot protection, JS rendering, proxies, and CAPTCHAs so you don't have to. Returns clean content ready for LLM consumption.

Base URL: `https://api.webscraperapi.ai:443`

## Quick start

Set the `Authorization` header on every request:

```
Authorization: Bearer WEBSCRAPERAPI_API_KEY
```

Get a key at https://api.webscraperapi.ai:443/dashboard (500 free credits on signup).

### Scrape a page as markdown

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

The response is the page content as clean markdown text, ready to pass to an LLM.

### Extract specific data with `prompt`

Instead of fetching the full page and parsing it yourself, use `prompt` to ask a question or request specific fields. The response contains only the extracted answer.

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://en.wikipedia.org/wiki/Anthropic" \
  --data-urlencode "prompt=When was Anthropic founded?"
```

This is the most efficient way to pull information from a web page: you save tokens by not processing the full page content yourself.
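In the examples above, curl's `-G --data-urlencode` percent-encodes parameter values for you. If you build the query string with another HTTP client, you must encode values yourself. The sketch below shows the equivalent encoding for ASCII input; `urlencode` is an illustrative helper, not part of the API.

```bash
#!/usr/bin/env bash
# Percent-encode a single query-parameter value (ASCII input only),
# mirroring what `curl --data-urlencode` does automatically.
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;        # unreserved characters: keep as-is
      *) out+=$(printf '%%%02X' "'$c") ;;  # everything else: %XX hex escape
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'When was Anthropic founded?'
# prints When%20was%20Anthropic%20founded%3F
```

The encoded value can then be appended directly, e.g. `.../v2/scrape?url=...&prompt=When%20was%20Anthropic%20founded%3F`. With curl, prefer `--data-urlencode` and skip the helper entirely.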
## Endpoint reference

All endpoints use GET with query parameters.

### Scrape

Scrape any web page. Returns clean content in the requested output format. Use the `prompt` parameter to extract only specific information via LLM; this returns a much smaller, focused response instead of the full page content. Supports JavaScript rendering, CSS selectors, and multiple output formats.

Credit cost: 1 credit (`render_js=true`: +4, `prompt`: +5)

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required | The URL to scrape. |
| `prompt` | string | none | Natural-language prompt that tells an LLM what to extract from the page. When set, only the extracted answer is returned instead of the full page content. Example: 'Extract the product name, price, and description as JSON'. Costs +5 credits. |
| `css_selector` | string | none | Optional CSS selector to narrow content. |
| `llm` | string | none | LLM model for processing (e.g., 'gpt-4o-mini'). |
| `render_js` | string | `false` | Whether to render JavaScript. |
| `output` | string | none | Output format: 'raw_html', 'clean_html', 'markdown', 'html_head_metadata_json', 'email_addresses', 'internal_links', 'external_links', 'all_links'. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Detect Technologies

Detect technologies used by a website, including e-commerce platforms, analytics tools (GA4, Google Ads), and tracking pixels (Facebook Pixel, GTM).
Credit cost: 2 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/detect_technologies" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required | The URL to analyze. |
| `technologies` | string[] | `technology,drupal,bigcommerce,prestashop,webflow,react,next_js,vue_js,nuxt_js,angular,jquery,web_server,ga4,google_ads,google_ads_audience_tracker,google_ads_conversion,google_ads_remarketing,google_ads_user_list,google_tag_manager,google_recaptcha,facebook_pixel,matomo_analytics,hotjar,mixpanel,plausible_analytics,microsoft_clarity,fullstory,cloudflare_cdn,aws_cloudfront,fastly,akamai,hubspot,marketo,salesforce_pardot,intercom,zendesk_chat,drift,crisp,onetrust,cookiebot,stripe,paypal,hcaptcha,segment,rudderstack` | Technologies to check for. Defaults to all supported checks; pass a subset (e.g., ga4, google_ads, google_tag_manager, facebook_pixel) to narrow the scan. |
| `output` | string | `markdown` | Output format. |
| `render_js` | boolean | `true` | Whether to render JavaScript (recommended). |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Ads

Search Google and retrieve paid advertisement results with title, URL, displayed URL, and description. Supports pagination and geographic targeting.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_ads" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=best web scraping api" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | required | Search query. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'de', 'co.uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'California,United States'). |
| `start_page` | integer | `1` | Starting page number. |
| `pages` | integer | `1` | Number of pages to scrape. |
| `render_js` | boolean | `false` | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Search

Search Google and retrieve structured organic results with title, URL, and description. Supports search types (images, news, videos), time filters, and geographic targeting.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_search" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=best web scraping api" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | required | Search query. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'de', 'co.uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'California,United States'). |
| `start_page` | integer | `1` | Starting page number. |
| `pages` | integer | `1` | Number of pages to scrape. |
| `limit` | string | none | Results per page (max 100). |
| `tbm` | string | none | Search type: 'isch' (images), 'nws' (news), 'vid' (videos), 'bks' (books). |
| `tbs` | string | none | Time/sort filters: 'qdr:d' (past day), 'qdr:w' (past week), 'li:1' (verbatim). |
| `render_js` | boolean | `false` | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Shopping

Search Google Shopping for product listings with prices, merchant info, and actual product URLs. Supports currency, language, and geographic targeting.
Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_shopping_search" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=wireless headphones" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | none | Search query for Google Shopping. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'cl', 'uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'Brazil', 'Germany'). |
| `language` | string | none | Results language in ISO 639-1 (e.g., 'en', 'es'). |
| `currency` | string | none | ISO 4217 currency code (e.g., 'USD', 'EUR'). |
| `prompt` | string | none | Optional prompt for LLM processing. |
| `render_js` | string | none | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### SERP Domain Frequencies

Analyze Google Shopping results to count how often each domain appears. Useful for competitive analysis and market research.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/serp_domain_frequencies" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=wireless headphones" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | none | Search query for Google Shopping. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'cl', 'uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'Brazil', 'Germany'). |
| `language` | string | none | Results language in ISO 639-1 (e.g., 'en', 'es'). |
| `currency` | string | none | ISO 4217 currency code (e.g., 'USD', 'EUR'). |
| `render_js` | string | none | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Verify Email Address

Verify the validity and deliverability of an email address. Returns status, sub-status, and detailed verification metadata.

Credit cost: 3 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/verify_email_address" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "email_address=test@example.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `email_address` | string | required | The email address to verify. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

## Credit costs

| Endpoint | Base | Extras |
|----------|------|--------|
| `/v2/scrape` | 1 | `render_js=true`: +4, `prompt`: +5 |
| `/v2/detect_technologies` | 2 | |
| `/v2/google_ads` | 5 | |
| `/v2/google_search` | 5 | |
| `/v2/google_shopping_search` | 5 | |
| `/v2/serp_domain_frequencies` | 5 | |
| `/v2/verify_email_address` | 3 | |

## When to use each output format

- **`output=markdown`** (default) — Best for LLM consumption. Returns clean text with structure preserved. Use this when you plan to pass the content to another LLM or need human-readable text.
- **`output=json`** — Best for programmatic processing. Use with Google Search, Google Shopping, Google Ads, and Verify Email endpoints when you need structured fields you can parse.
- **`output=csv`** — Best for tabular data. Use when you want to load results into a spreadsheet or data pipeline.
- **`output=raw_html`** / **`output=clean_html`** — Use only when you specifically need HTML structure (e.g., for CSS selector extraction or HTML analysis).

## Tips

- **Prefer `prompt` over full-page scraping** when you only need specific data. It costs +5 credits but saves significant tokens and post-processing work, because the response is just the extracted answer, not the entire page.
- **Skip `render_js`** unless the page is a single-page app (React, Vue, Angular) or loads content dynamically. It adds latency and costs +4 credits. Most content-heavy pages (articles, product pages, docs) work fine without it.
- **Use `css_selector`** to narrow the HTML before processing when you know exactly which element contains the data you need. This reduces noise and improves `prompt` extraction accuracy.
- **Send requests sequentially** — requests are throttled per API key, and sending many in parallel may trigger rate-limit errors.

## MCP Server

Connect webscraperapi to any MCP-compatible LLM client:

```bash
claude mcp add --transport http webscraperapi https://api.webscraperapi.ai:443/mcp
```

## Links

- Documentation: https://api.webscraperapi.ai:443/docs
- OpenAPI spec: https://api.webscraperapi.ai:443/openapi.json
- Dashboard & API keys: https://api.webscraperapi.ai:443/dashboard
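Because requests are throttled per API key (see the tips above), transient failures are worth retrying with backoff. A minimal sketch; the `retry` helper is illustrative and assumes a throttled request simply fails (non-zero curl exit when run with `--fail`), which is not a documented guarantee of the API.

```bash
#!/usr/bin/env bash
# retry CMD...: run CMD up to 3 times with exponential backoff (1s, then 2s).
retry() {
  local max=3 delay=1 attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $max attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Demo with a local command that fails once, then succeeds:
marker=$(mktemp -u)
flaky() { [ -e "$marker" ] || { touch "$marker"; return 1; }; }
retry flaky && echo "succeeded after retry"
rm -f "$marker"
# prints "succeeded after retry"
```

With the helper defined, a throttled scrape becomes e.g. `retry curl -sG --fail "https://api.webscraperapi.ai:443/v2/scrape" -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" --data-urlencode "url=https://news.ycombinator.com" --data-urlencode "output=markdown"`.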