---
name: webscraperapi
description: >-
  Scrape any web page and get clean, LLM-ready content. Use this whenever you
  need to fetch a URL, read a web page, extract data from a website, check what
  technologies a site uses, search Google, or verify an email address. Handles
  anti-bot protection, JavaScript rendering, proxies, and CAPTCHAs
  automatically. Use the prompt parameter to extract only the specific
  information you need: this returns a small, focused answer instead of the
  full page. Supports markdown, JSON, CSV, and HTML output. 500 free credits
  on signup.
---

# webscraperapi

> Version: 0.0.30+2026-03-03 | Generated: 2026-03-03

Web scraping API that handles anti-bot protection, JS rendering, proxies, and CAPTCHAs so you don't have to. Returns clean content ready for LLM consumption.

Base URL: `https://api.webscraperapi.ai:443`

## Quick start

Set the `Authorization` header on every request:

```
Authorization: Bearer WEBSCRAPERAPI_API_KEY
```

Get a key at https://api.webscraperapi.ai:443/dashboard (500 free credits on signup).

### Scrape a page as markdown

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

The response is the page content as clean markdown text, ready to pass to an LLM.

### Extract specific data with `prompt`

Instead of fetching the full page and parsing it yourself, use `prompt` to ask a question or request specific fields. The response contains only the extracted answer.

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://en.wikipedia.org/wiki/Anthropic" \
  --data-urlencode "prompt=When was Anthropic founded?"
```

This is the most efficient way to pull information from a web page: you save tokens by not processing the full page content yourself.
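In the examples above, curl's `-G --data-urlencode` percent-encodes parameter values for you. If you build the query string with another HTTP client, you must encode values yourself. The sketch below shows the equivalent encoding for ASCII input; `urlencode` is an illustrative helper, not part of the API.

```bash
#!/usr/bin/env bash
# Percent-encode a single query-parameter value (ASCII input only),
# mirroring what `curl --data-urlencode` does automatically.
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;        # unreserved characters: keep as-is
      *) out+=$(printf '%%%02X' "'$c") ;;  # everything else: %XX hex escape
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'When was Anthropic founded?'
# prints When%20was%20Anthropic%20founded%3F
```

The encoded value can then be appended directly, e.g. `.../v2/scrape?url=...&prompt=When%20was%20Anthropic%20founded%3F`. With curl, prefer `--data-urlencode` and skip the helper entirely.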
## Endpoint reference

All endpoints use GET with query parameters.

### Scrape

Scrape any web page. Returns clean content in the requested output format. Use the `prompt` parameter to extract only specific information via LLM; this returns a much smaller, focused response instead of the full page content. Supports JavaScript rendering, CSS selectors, and multiple output formats.

Credit cost: 1 credit (`render_js=true`: +4, `prompt`: +5)

```bash
curl -G "https://api.webscraperapi.ai:443/v2/scrape" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required | The URL to scrape. |
| `prompt` | string | none | Natural-language prompt that tells an LLM what to extract from the page. When set, only the extracted answer is returned instead of the full page content. Example: 'Extract the product name, price, and description as JSON'. Costs +5 credits. |
| `css_selector` | string | none | Optional CSS selector to narrow content. |
| `llm` | string | none | LLM model for processing (e.g., 'gpt-4o-mini'). |
| `render_js` | string | `false` | Whether to render JavaScript. |
| `output` | string | none | Output format: 'raw_html', 'clean_html', 'markdown', 'html_head_metadata_json', 'email_addresses', 'internal_links', 'external_links', 'all_links'. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Detect Technologies

Detect technologies used by a website, including e-commerce platforms, analytics tools (GA4, Google Ads), and tracking pixels (Facebook Pixel, GTM).
Credit cost: 2 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/detect_technologies" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "url=https://news.ycombinator.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | required | The URL to analyze. |
| `technologies` | string[] | `technology,drupal,bigcommerce,prestashop,webflow,react,next_js,vue_js,nuxt_js,angular,jquery,web_server,ga4,google_ads,google_ads_audience_tracker,google_ads_conversion,google_ads_remarketing,google_ads_user_list,google_tag_manager,google_recaptcha,facebook_pixel,matomo_analytics,hotjar,mixpanel,plausible_analytics,microsoft_clarity,fullstory,cloudflare_cdn,aws_cloudfront,fastly,akamai,hubspot,marketo,salesforce_pardot,intercom,zendesk_chat,drift,crisp,onetrust,cookiebot,stripe,paypal,hcaptcha,segment,rudderstack` | Technologies to check for. Defaults to all supported checks; pass a subset (e.g., ga4, google_ads, google_tag_manager, facebook_pixel) to narrow the scan. |
| `output` | string | `markdown` | Output format. |
| `render_js` | boolean | `true` | Whether to render JavaScript (recommended). |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Ads

Search Google and retrieve paid advertisement results with title, URL, displayed URL, and description. Supports pagination and geographic targeting.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_ads" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=best web scraping api" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | required | Search query. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'de', 'co.uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'California,United States'). |
| `start_page` | integer | `1` | Starting page number. |
| `pages` | integer | `1` | Number of pages to scrape. |
| `render_js` | boolean | `false` | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Search

Search Google and retrieve structured organic results with title, URL, and description. Supports search types (images, news, videos), time filters, and geographic targeting.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_search" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=best web scraping api" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | required | Search query. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'de', 'co.uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'California,United States'). |
| `start_page` | integer | `1` | Starting page number. |
| `pages` | integer | `1` | Number of pages to scrape. |
| `limit` | string | none | Results per page (max 100). |
| `tbm` | string | none | Search type: 'isch' (images), 'nws' (news), 'vid' (videos), 'bks' (books). |
| `tbs` | string | none | Time/sort filters: 'qdr:d' (past day), 'qdr:w' (past week), 'li:1' (verbatim). |
| `render_js` | boolean | `false` | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Google Shopping

Search Google Shopping for product listings with prices, merchant info, and actual product URLs. Supports currency, language, and geographic targeting.
Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/google_shopping_search" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=wireless headphones" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | none | Search query for Google Shopping. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'cl', 'uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'Brazil', 'Germany'). |
| `language` | string | none | Results language in ISO 639-1 (e.g., 'en', 'es'). |
| `currency` | string | none | ISO 4217 currency code (e.g., 'USD', 'EUR'). |
| `prompt` | string | none | Optional prompt for LLM processing. |
| `render_js` | string | none | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### SERP Domain Frequencies

Analyze Google Shopping results to count how often each domain appears. Useful for competitive analysis and market research.

Credit cost: 5 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/serp_domain_frequencies" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "query=wireless headphones" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | none | Search query for Google Shopping. |
| `domain` | string | none | Google domain TLD (e.g., 'com', 'cl', 'uk'). |
| `locale` | string | none | Interface language in IETF BCP 47 (e.g., 'en-US'). |
| `geo_location` | string | none | Geographic location (e.g., 'Brazil', 'Germany'). |
| `language` | string | none | Results language in ISO 639-1 (e.g., 'en', 'es'). |
| `currency` | string | none | ISO 4217 currency code (e.g., 'USD', 'EUR'). |
| `render_js` | string | none | Whether to render JavaScript. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

### Verify Email Address

Verify the validity and deliverability of an email address. Returns status, sub-status, and detailed verification metadata.

Credit cost: 3 credits

```bash
curl -G "https://api.webscraperapi.ai:443/v2/verify_email_address" \
  -H "Authorization: Bearer WEBSCRAPERAPI_API_KEY" \
  --data-urlencode "email_address=test@example.com" \
  --data-urlencode "output=markdown"
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `email_address` | string | required | The email address to verify. |
| `output` | string | `markdown` | Output format. |
| `timeout_ms` | string | none | Request timeout in milliseconds. |

## Credit costs

| Endpoint | Base | Extras |
|----------|------|--------|
| `/v2/scrape` | 1 | `render_js=true`: +4, `prompt`: +5 |
| `/v2/detect_technologies` | 2 | |
| `/v2/google_ads` | 5 | |
| `/v2/google_search` | 5 | |
| `/v2/google_shopping_search` | 5 | |
| `/v2/serp_domain_frequencies` | 5 | |
| `/v2/verify_email_address` | 3 | |

## When to use each output format

- **`output=markdown`** (default) — Best for LLM consumption. Returns clean text with structure preserved. Use this when you plan to pass the content to another LLM or need human-readable text.
- **`output=json`** — Best for programmatic processing. Use with Google Search, Google Shopping, Google Ads, and Verify Email endpoints when you need structured fields you can parse.
- **`output=csv`** — Best for tabular data. Use when you want to load results into a spreadsheet or data pipeline.
- **`output=raw_html`** / **`output=clean_html`** — Use only when you specifically need HTML structure (e.g., for CSS selector extraction or HTML analysis).

## Tips

- **Prefer `prompt` over full-page scraping** when you only need specific data. It costs +5 credits but saves significant tokens and post-processing work, because the response is just the extracted answer, not the entire page.
- **Skip `render_js`** unless the page is a single-page app (React, Vue, Angular) or loads content dynamically. It adds latency and costs +4 credits. Most content-heavy pages (articles, product pages, docs) work fine without it.
- **Use `css_selector`** to narrow the HTML before processing when you know exactly which element contains the data you need. This reduces noise and improves `prompt` extraction accuracy.
- **Send requests sequentially** — requests are throttled per API key, and sending many in parallel may trigger rate-limit errors.

## MCP Server

Connect webscraperapi to any MCP-compatible LLM client:

```bash
claude mcp add --transport http webscraperapi https://api.webscraperapi.ai:443/mcp
```

## Links

- Documentation: https://api.webscraperapi.ai:443/docs
- OpenAPI spec: https://api.webscraperapi.ai:443/openapi.json
- Dashboard & API keys: https://api.webscraperapi.ai:443/dashboard
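Because requests are throttled per API key (see the tips above), transient failures are worth retrying with backoff. A minimal sketch; the `retry` helper is illustrative and assumes a throttled request simply fails (non-zero curl exit when run with `--fail`), which is not a documented guarantee of the API.

```bash
#!/usr/bin/env bash
# retry CMD...: run CMD up to 3 times with exponential backoff (1s, then 2s).
retry() {
  local max=3 delay=1 attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $max attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Demo with a local command that fails once, then succeeds:
marker=$(mktemp -u)
flaky() { [ -e "$marker" ] || { touch "$marker"; return 1; }; }
retry flaky && echo "succeeded after retry"
rm -f "$marker"
# prints "succeeded after retry"
```

With the helper defined, a throttled scrape becomes e.g. `retry curl -sG --fail "https://api.webscraperapi.ai:443/v2/scrape" -H "Authorization: Bearer $WEBSCRAPERAPI_API_KEY" --data-urlencode "url=https://news.ycombinator.com" --data-urlencode "output=markdown"`.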