Basic Web Scraper

Basic Web Scraper API Documentation

Introduction

The Basic Web Scraper API extracts the raw HTML content of a web page, or only the elements you specify. This version performs direct web scraping without any LLM processing.

Authentication

All API requests require authentication using an API key. Include your key in the x-api-key header:

x-api-key: YOUR_API_KEY_HERE

Base URL

https://api.yetanotherapi.com/v1/llm-web-scrapper

Pricing

Each API call costs 1 credit.

Endpoint Details

Scrape Web Page

Extract raw content from a specified URL, optionally narrowed to specific elements with a CSS selector or XPath expression.

HTTP Method: POST
Endpoint: / (relative to the base URL)

Request Headers

Header         Value               Description
x-api-key      YOUR_API_KEY_HERE   Your unique API authentication key
Content-Type   application/json    Indicates that the request body is JSON
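A minimal Python sketch of assembling these two headers. The environment variable name SCRAPER_API_KEY is a hypothetical choice for this example, not something the API defines; the check simply avoids sending a request with an empty key.

```python
import os
from typing import Dict, Optional

# Hypothetical environment variable name used only in this sketch.
API_KEY_ENV = "SCRAPER_API_KEY"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the headers required on every request (x-api-key, Content-Type)."""
    key = api_key if api_key is not None else os.environ.get(API_KEY_ENV, "")
    if not key:
        # Fail locally rather than sending a request without a key.
        raise ValueError(f"no API key: pass api_key or set {API_KEY_ENV}")
    return {"x-api-key": key, "Content-Type": "application/json"}
```

Keeping the key in the environment rather than in source code is a common convention; passing api_key explicitly overrides it.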

Request Body Parameters

Parameter   Type      Required   Description
url         string    Yes        URL of the web page to scrape
selector    string    No         CSS selector or XPath expression targeting specific elements
use_llm     boolean   Yes        Must be set to false for basic scraping
webhook     string    No         Webhook URL that receives the response
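The parameter rules above can be captured in a small request-body builder. This is a sketch under two assumptions: that optional fields may simply be omitted rather than sent as null, and that a local http(s) check is a useful pre-filter. The service applies its own validation, which can still reject a URL with 400 "Invalid URL format". The function name build_payload is hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_payload(url: str, selector: Optional[str] = None,
                  webhook: Optional[str] = None) -> str:
    """Serialize a basic-scraping request body as a JSON string."""
    # Local sanity check only; the service performs its own URL validation.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    body = {"url": url, "use_llm": False}  # use_llm is required and false here
    # Optional fields are left out entirely rather than sent as null.
    if selector is not None:
        body["selector"] = selector
    if webhook is not None:
        body["webhook"] = webhook
    return json.dumps(body)
```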

Example Request

# Assumes your key is exported as API_KEY_HERE; double quotes let the shell expand it.
curl --location 'https://api.yetanotherapi.com/v1/llm-web-scrapper' \
--header "x-api-key: $API_KEY_HERE" \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://www.amazon.in/AMVR-Controller-Compatible-Accessories-Adjustable/dp/B0CJRK7B8J",
    "selector": "#productTitle",
    "use_llm": false,
    "webhook": "https://your-webhook.com/endpoint"
}'
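The same request can be sent from Python with only the standard library. This is a sketch, not an official client: build_request and scrape are hypothetical names, and the 30-second client timeout is an arbitrary margin over the service's documented 20-second limit.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.yetanotherapi.com/v1/llm-web-scrapper"


def build_request(api_key: str, target_url: str,
                  selector: Optional[str] = None,
                  webhook: Optional[str] = None) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {"url": target_url, "use_llm": False}
    if selector is not None:
        body["selector"] = selector
    if webhook is not None:
        body["webhook"] = webhook
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key: str, target_url: str,
           selector: Optional[str] = None) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(api_key, target_url, selector)
    # The service times out after 20 s; wait a little longer client-side
    # so the client does not give up first.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Separating build_request from scrape makes the request inspectable without network access. Non-2xx responses raise urllib.error.HTTPError, whose JSON error body can be read with err.read().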

Response Structure

{
    "request_id": "c03995eb-e117-4eca-85c8-e6d398a968d9",
    "scraped_content": {
        "html": "<div id='productTitle'>Product Name Here</div>",
        "text": "Product Name Here"
    }
}
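A short sketch of reading the fields shown above into a typed object. The ScrapeResult and parse_response names are illustrative; the field names come from the example response.

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    request_id: str
    html: str  # HTML returned by the service
    text: str  # text content of that HTML


def parse_response(raw: str) -> ScrapeResult:
    """Map a success body shaped like the example onto ScrapeResult."""
    doc = json.loads(raw)
    content = doc["scraped_content"]
    return ScrapeResult(doc["request_id"], content["html"], content["text"])


sample = """{
    "request_id": "c03995eb-e117-4eca-85c8-e6d398a968d9",
    "scraped_content": {
        "html": "<div id='productTitle'>Product Name Here</div>",
        "text": "Product Name Here"
    }
}"""
result = parse_response(sample)
print(result.text)  # Product Name Here
```

Keeping request_id alongside the content is useful when reporting an issue to support about a specific call.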

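Error Handling

Errors are returned as a JSON object containing an error message and the HTTP status code, for example: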

{
    "error": "Invalid URL format",
    "status_code": 400
}

Common Error Codes:

  • 400: Bad Request (invalid parameters)

  • 401: Unauthorized (invalid API key)

  • 403: Forbidden (blocked by target website)

  • 404: Page Not Found

  • 429: Too Many Requests
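One way to act on these codes is sketched below: convert an error body into an exception and retry only on 429. The retry policy, attempt count, and exponential backoff delays are assumptions, since this page does not document rate limits or retry guidance. Each retry is another API call, and may consume a credit under the pricing above. ScraperError, parse_error, and call_with_retry are hypothetical names.

```python
import json
import time
from typing import Callable

# Meanings taken from the error-code list above.
ERROR_MEANINGS = {
    400: "Bad Request (invalid parameters)",
    401: "Unauthorized (invalid API key)",
    403: "Forbidden (blocked by target website)",
    404: "Page Not Found",
    429: "Too Many Requests",
}


class ScraperError(Exception):
    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def parse_error(raw: str) -> ScraperError:
    """Turn a body like {"error": ..., "status_code": ...} into an exception."""
    body = json.loads(raw)
    code = int(body.get("status_code", 0))
    message = body.get("error") or ERROR_MEANINGS.get(code, "Unknown error")
    return ScraperError(code, message)


def call_with_retry(call: Callable[[], dict], max_attempts: int = 3,
                    base_delay: float = 1.0) -> dict:
    """Retry only on 429 with exponential backoff; raise other errors at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ScraperError as err:
            if err.status_code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable")
```

Errors such as 400, 401, or 403 are raised immediately, since repeating the same request is unlikely to change the outcome.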

Limitations

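  • JavaScript rendering is not supported, so content that a page builds client-side with JavaScript will not be captured

  • Some websites block automated access; such requests fail with 403 Forbidden

  • Maximum page size: 5 MB

  • Request timeout: 20 seconds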

Support

For technical support or to report issues, contact: hey@manojlk.work
