YetAnotherAPI Documentation

Web Scraper

Welcome to the Web Scraper API documentation. The API scrapes web pages and can optionally pass the result through an LLM (large language model), making it easy to extract and structure web content for your applications.

Features

  • Basic Web Scraping: Extract text content, links, images, and metadata from any webpage

  • Markdown Support: Get content in either plaintext or markdown format

  • LLM Processing: Use AI to structure and analyze scraped content

  • Caching: Improve performance with optional result caching

  • Webhook Notifications: Receive results asynchronously via webhooks

  • Rich Metadata: Extract meta tags, schema.org data, and structured content

Authentication

All API requests require an API key, which should be included in the x-api-key header:

--header 'x-api-key: your-api-key'

Base URL

https://api.yetanotherapi.com/web-scrapper/

Available Endpoints

Method  Endpoint       Description                           Documentation
------  -------------  ------------------------------------  --------------
POST    /              Submit scraping request               Basic Scraping
POST    /              Submit LLM scraping request           LLM Scraping
GET     /{request_id}  Check request status and get results  Status Check

Quick Start

Basic Scraping Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext"
}'
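
For callers who prefer a script over curl, the same request can be sketched in Python with only the standard library. The helper names `build_scrape_request` and `send` are illustrative and not part of the API, and the shape of the JSON response is not documented on this page, so the sketch simply decodes whatever comes back.

```python
import json
import urllib.request

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_scrape_request(api_key: str, url: str,
                         output_type: str = "plaintext") -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {"url": url, "output_type": output_type}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# result = send(build_scrape_request("your-api-key", "https://example.com"))
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network traffic happens.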

LLM Processing Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "use_llm": true,
    "prompt": "Extract main topics and summarize key points",
    "openai_key_id": "your-key-id"
}'
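
The LLM request body can also be assembled programmatically. The sketch below mirrors the four fields from the curl example; `build_llm_payload` is a hypothetical helper, and the empty-prompt check is a client-side assumption, since the API's own validation rules are not described here.

```python
import json


def build_llm_payload(url: str, prompt: str, openai_key_id: str) -> str:
    """Return the JSON body for an LLM scraping request."""
    if not prompt.strip():
        # Assumption: an empty prompt is rejected locally; the API's
        # behaviour for empty prompts is not documented on this page.
        raise ValueError("prompt must not be empty")
    payload = {
        "url": url,
        "use_llm": True,
        "prompt": prompt,
        "openai_key_id": openai_key_id,
    }
    return json.dumps(payload)


body = build_llm_payload(
    "https://example.com",
    "Extract main topics and summarize key points",
    "your-key-id",
)
```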

Processing Modes

Synchronous Processing

  • Results are returned immediately if processing completes within 20 seconds

  • Best for simple pages and quick scraping tasks

Asynchronous Processing

  • For longer-running requests

  • Poll the status check endpoint (GET /{request_id}) for results

  • Webhook notifications available
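
A polling loop for the asynchronous mode might look like the sketch below. It assumes the status check endpoint answers 202 while a request is still processing and 200 once results are ready; this page lists both codes but does not say so explicitly for that endpoint. The `fetch` callable is a placeholder for whatever HTTP client you use and must return a `(status_code, body)` pair.

```python
import time
import urllib.parse
from typing import Callable, Tuple

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def status_url(request_id: str) -> str:
    """URL of the status check endpoint, GET /{request_id}."""
    return BASE_URL + urllib.parse.quote(request_id, safe="")


def poll_until_done(
    fetch: Callable[[str], Tuple[int, dict]],
    request_id: str,
    interval_seconds: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url) until it stops returning 202, or give up."""
    url = status_url(request_id)
    for attempt in range(max_attempts):
        status_code, body = fetch(url)
        if status_code == 200:
            return body  # assumed: 200 means results are ready
        if status_code != 202:
            raise RuntimeError(f"unexpected status {status_code}: {body}")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(
        f"request {request_id} still processing after {max_attempts} checks"
    )
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of any particular HTTP library and lets it be exercised without real waiting.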

Common Use Cases

  1. Content Aggregation

    • Extract articles and blog posts

    • Monitor news and updates

    • Collect product information

  2. Data Analysis

    • Extract structured data

    • Analyze web content

    • Generate insights using LLM

  3. Content Transformation

    • Convert HTML to markdown

    • Extract clean text content

    • Generate structured JSON

Best Practices

  1. Use Caching

    • Enable use_cache for frequently accessed pages.

    • Cached results remain available for 15 days.

    • Reduces processing time and AI cost.

  2. Handle Asynchronous Processing

    • Implement a webhook endpoint for notifications

    • Use the status check endpoint with reasonable polling intervals (5-15 minutes)

    • Handle timeout scenarios gracefully

  3. LLM Processing

    • Write clear, specific prompts

    • Consider content length and complexity

    • Test with sample content first
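
Two of the practices above lend themselves to a short sketch: enabling `use_cache` on a request and spacing status checks within the suggested 5-15 minute range. Both helpers are hypothetical. The sketch assumes `use_cache` is a boolean field in the request body, which this page does not confirm, and the growth pattern of the waits is one possible choice rather than a recommendation from the API.

```python
import json
from typing import Iterator


def cached_scrape_payload(url: str, use_cache: bool = True) -> str:
    """Request body with caching enabled."""
    # Assumption: use_cache is a boolean field in the JSON body.
    return json.dumps({"url": url, "use_cache": use_cache})


def polling_schedule(total_minutes: float, first: float = 5.0,
                     step: float = 5.0, cap: float = 15.0) -> Iterator[float]:
    """Yield waits in minutes, growing from `first` up to `cap`,
    stopping before the total wait would exceed `total_minutes`."""
    elapsed = 0.0
    wait = first
    while elapsed + wait <= total_minutes:
        yield wait
        elapsed += wait
        wait = min(wait + step, cap)


# Waits for a one-hour budget, in minutes.
waits = list(polling_schedule(60))
```

Stopping once the budget is spent gives a natural place to treat the request as timed out and fall back to a retry or an alert.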

Error Handling

All API endpoints use standard HTTP response codes:

  • 200: Success

  • 202: Accepted (processing)

  • 400: Bad request

  • 401: Unauthorized

  • 429: Rate limit exceeded

  • 500: Server error

Error responses include detailed messages and codes:

{
    "error": "E001: Invalid request format"
}
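
On the client side, an error body like the one above can be split into its code and message. The sketch below assumes every error string follows the "CODE: message" pattern of the single example shown; that format is inferred, not documented, so the parser falls back to the status code's meaning when the pattern does not match. `ApiError` and `parse_error` are illustrative names.

```python
import json
from typing import NamedTuple, Optional

# Status codes and meanings as listed in this section.
STATUS_MEANINGS = {
    200: "Success",
    202: "Accepted (processing)",
    400: "Bad request",
    401: "Unauthorized",
    429: "Rate limit exceeded",
    500: "Server error",
}


class ApiError(NamedTuple):
    status: int
    code: Optional[str]
    message: str


def parse_error(status: int, body: str) -> ApiError:
    """Split e.g. "E001: Invalid request format" into code and message."""
    try:
        raw = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        raw = ""  # body was not a JSON object
    if not isinstance(raw, str):
        raw = str(raw)
    code, sep, message = raw.partition(": ")
    if not sep:
        # No "CODE: message" separator; use the raw text or a generic meaning.
        fallback = raw or STATUS_MEANINGS.get(status, "Unknown error")
        return ApiError(status, None, fallback)
    return ApiError(status, code, message)


err = parse_error(400, '{"error": "E001: Invalid request format"}')
```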

Support

For support requests or questions:

  1. Email: hey@manojlk.work

  2. API Support Portal: https://app.yetanotherapi.com

Changelog

See our changelog for API updates and changes.


Last updated 5 months ago
