Web Scraper

Welcome to the Web Scraper API documentation. The API provides web scraping with optional LLM (large language model) processing, so you can extract and structure web content for your applications.

Features

Basic Web Scraping: Extract text content, links, images, and metadata from any webpage
Markdown Support: Get content in either plaintext or markdown format
LLM Processing: Use AI to structure and analyze scraped content
Caching: Improve performance with optional result caching
Webhook Notifications: Receive results asynchronously via webhooks
Rich Metadata: Extract meta tags, schema.org data, and structured content

Authentication

All API requests require an API key, which should be included in the x-api-key header:

--header 'x-api-key: your-api-key'
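For readers calling the API from Python rather than curl, the same header can be attached as follows. This is a minimal sketch that assumes the standard library's urllib.request as the HTTP client; `build_headers` is a hypothetical helper name, not part of the API.

```python
import urllib.request

# Base URL as listed in this documentation.
BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_headers(api_key: str) -> dict:
    """Return the headers used by the curl examples in this documentation."""
    if not api_key:
        raise ValueError("an API key is required for every request")
    return {"x-api-key": api_key, "Content-Type": "application/json"}


# Build (but do not send) a request object that carries the key.
request = urllib.request.Request(
    BASE_URL, headers=build_headers("your-api-key"), method="POST"
)
```

Keeping the key in one helper makes it easier to load it from an environment variable later instead of hard-coding it.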

Base URL

https://api.yetanotherapi.com/web-scrapper/

Available Endpoints

| Method | Endpoint | Description | Documentation |
| ------ | -------- | ----------- | ------------- |
| POST | / | Submit scraping request | Basic Scraping |
| POST | / | Submit LLM scraping request | LLM Scraping |
| GET | /{request_id} | Check request status and get results | Status Check |

Quick Start

Basic Scraping Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext"
}'

LLM Processing Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "use_llm": true,
    "prompt": "Extract main topics and summarize key points",
    "openai_key_id": "your-key-id"
}'
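The two curl requests above can be mirrored in Python. The sketch below only builds the request object; `build_scrape_request` is a hypothetical helper, and it assumes the LLM fields and `output_type` are used exactly as in the two examples (whether they can be combined is not stated here).

```python
import json
import urllib.request

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_scrape_request(api_key, url, output_type="plaintext",
                         prompt=None, openai_key_id=None):
    """Build a POST request: basic scraping, or LLM scraping when a prompt is given."""
    payload = {"url": url}
    if prompt is None:
        # Basic scraping request, as in the first curl example.
        payload["output_type"] = output_type
    else:
        # LLM processing request, as in the second curl example.
        payload.update(
            {"use_llm": True, "prompt": prompt, "openai_key_id": openai_key_id}
        )
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

To send a request, pass the returned object to `urllib.request.urlopen(...)` and decode the response body as JSON.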

Processing Modes

Synchronous Processing

Results are returned immediately if processing completes within 20 seconds
Best for simple pages and quick scraping tasks

Asynchronous Processing

Used for longer-running requests
Poll the status check endpoint for results
Webhook notifications are available
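A client has to tell the two modes apart. The sketch below assumes that a synchronous result arrives with HTTP 200 and an asynchronous request is acknowledged with HTTP 202, which matches the status codes listed under Error Handling; how the request ID is returned is not documented here, so it is taken as a plain argument. `status_url` and `next_step` are hypothetical helper names.

```python
from urllib.parse import quote

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def status_url(request_id: str) -> str:
    """URL of the GET /{request_id} status check endpoint."""
    # Percent-encode the ID so it cannot alter the path.
    return BASE_URL + quote(request_id, safe="")


def next_step(http_status: int) -> str:
    """Decide what to do after submitting a request (assumed mapping)."""
    if http_status == 200:
        return "done"   # results are in the response body
    if http_status == 202:
        return "poll"   # still processing: poll status_url() or wait for the webhook
    return "error"      # see the Error Handling section
```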

Common Use Cases

Content Aggregation
- Extract articles and blog posts
- Monitor news and updates
- Collect product information
Data Analysis
- Extract structured data
- Analyze web content
- Generate insights using LLM
Content Transformation
- Convert HTML to markdown
- Extract clean text content
- Generate structured JSON

Best Practices

Use Caching
- Enable use_cache for frequently accessed pages
- Cached results remain available for 15 days
- Caching reduces processing time and AI cost
Handle Asynchronous Processing
- Implement a webhook endpoint for notifications
- Use the status check endpoint with reasonable polling intervals (5-15 minutes)
- Handle timeout scenarios gracefully
LLM Processing
- Write clear, specific prompts
- Consider content length and complexity
- Test with sample content first
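The polling and timeout advice above can be sketched as a small loop. This is an illustration, not an official client: `poll_until_done` is a hypothetical helper, `fetch_status` is a caller-supplied function that performs the GET on the status check endpoint and returns `(http_status, body)`, and HTTP 202 is assumed to mean "still processing".

```python
import time


def poll_until_done(fetch_status, interval_seconds, timeout_seconds,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it stops returning 202 or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        http_status, body = fetch_status()
        if http_status != 202:
            # Finished (or failed): hand the result back to the caller.
            return http_status, body
        if clock() + interval_seconds > deadline:
            # Waiting another interval would exceed the overall timeout.
            raise TimeoutError("request still processing when the timeout elapsed")
        sleep(interval_seconds)
```

With the 5-15 minute interval suggested above, `interval_seconds` would be between 300 and 900. Catching `TimeoutError` gives the caller one place to handle the timeout case, for example by falling back to the webhook result.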

Error Handling

All API endpoints use standard HTTP response codes:

200: Success
202: Accepted (processing)
400: Bad request
401: Unauthorized
429: Rate limit exceeded
500: Server error

Error responses include detailed messages and codes:

{
    "error": "E001: Invalid request format"
}
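As a sketch of how a client might read such a response, the code below splits the error string into a code and a message. It assumes every error follows the "CODE: message" pattern of the single example above, which this page does not guarantee; `parse_error` and `STATUS_MEANINGS` are hypothetical names.

```python
import json

# Status codes exactly as listed in this section.
STATUS_MEANINGS = {
    200: "Success",
    202: "Accepted (processing)",
    400: "Bad request",
    401: "Unauthorized",
    429: "Rate limit exceeded",
    500: "Server error",
}


def parse_error(body: str):
    """Split a body like {"error": "E001: Invalid request format"} into (code, message)."""
    text = json.loads(body).get("error", "")
    code, separator, message = text.partition(": ")
    if not separator:
        # No "CODE: " prefix found: return the whole text as the message.
        return None, text
    return code, message
```

For example, `parse_error('{"error": "E001: Invalid request format"}')` returns `("E001", "Invalid request format")`.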

Support

For support requests or questions:

Email: [email protected]
API Support Portal: https://app.yetanotherapi.com

Changelog

See our changelog for API updates and changes.

