YetAnotherAPI Documentation

LLM Web Scraper

Overview

The LLM Web Scraper API scrapes a web page and passes the scraped content to a language model, which extracts and structures the data according to the prompt you supply.

Base URL

POST https://api.yetanotherapi.com/web-scrapper/

Authentication

All requests require an API key passed in the x-api-key header.

Request Headers

| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| x-api-key | Yes | Your API authentication key |

Request Body

{
    "url": "https://example.com",
    "output_type": "plaintext", //optional
    "use_llm": true,
    "prompt": "Extract product details including name, price, and specifications",
    "openai_key_id": "752724", //optional but recommended
    "use_cache": false, //optional
    "webhook": "https://your-webhook-url.com" //optional
}
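
As a minimal sketch of how the headers and body above fit together, the following Python builds the POST request with the standard library only. The function name `build_scrape_request` and the placeholder key are illustrative assumptions, not part of the API; the endpoint, header names, and field names are taken from this page.

```python
import json
import urllib.request

API_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_scrape_request(api_key, url, prompt, output_type="plaintext",
                         openai_key_id=None, webhook=None):
    """Build (but do not send) a POST request for the LLM Web Scraper.

    Hypothetical helper: only the URL, headers, and JSON field names
    come from the documentation.
    """
    payload = {
        "url": url,
        "use_llm": True,  # documented as required for LLM processing
        "prompt": prompt,
        "output_type": output_type,
    }
    # Optional fields are included only when the caller supplies them.
    if openai_key_id is not None:
        payload["openai_key_id"] = openai_key_id
    if webhook is not None:
        payload["webhook"] = webhook

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access or a real key.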

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | The URL of the website to scrape |
| output_type | string | No | plaintext | Either "plaintext" or "markdown" |
| use_llm | boolean | Yes | - | Must be set to true for LLM processing |
| prompt | string | Yes | - | Instructions for the LLM about what to extract |
| openai_key_id | string | No | null | Optional ID of your registered OpenAI key |
| use_cache | boolean | No | false | If true, returns the cached result if available |
| webhook | string | No | null | URL to receive a webhook notification when processing completes |

Important Parameter Notes

  1. Cache Behavior
    • When use_cache is true, all parameters other than url are ignored
    • Returns the most recent cached result for the URL
    • Returns a 404 error if no cached result exists

  2. OpenAI Key ID
    • Optional parameter
    • If provided, uses the specified OpenAI key from your account
    • If not provided, uses your most recently added OpenAI key

  3. Webhook
    • Optional callback URL for asynchronous processing
    • Receives the full results when processing completes
    • Must be a publicly accessible HTTPS endpoint
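
The notes above can be reflected on the client side when assembling a payload. The sketch below is one possible approach: `build_payload` and `is_https_url` are hypothetical helpers, and sending only `url` and `use_cache` for a cache lookup is an assumption based on the statement that other parameters are ignored (the server may still expect the required fields). The HTTPS check covers only the URL scheme; whether the endpoint is publicly reachable cannot be verified locally.

```python
from urllib.parse import urlparse


def is_https_url(candidate):
    """Return True if candidate parses as an https:// URL with a host."""
    parsed = urlparse(candidate)
    return parsed.scheme == "https" and bool(parsed.netloc)


def build_payload(url, prompt=None, use_cache=False,
                  openai_key_id=None, webhook=None):
    """Assemble a request body that follows the parameter notes."""
    if use_cache:
        # Assumption: a cache lookup needs nothing besides the URL,
        # because all other parameters are documented as ignored.
        return {"url": url, "use_cache": True}
    if not prompt:
        raise ValueError("prompt is required when use_cache is false")

    payload = {"url": url, "use_llm": True, "prompt": prompt}
    # Omitting openai_key_id lets the service fall back to the most
    # recently added OpenAI key, as described in note 2.
    if openai_key_id is not None:
        payload["openai_key_id"] = openai_key_id
    if webhook is not None:
        if not is_https_url(webhook):
            raise ValueError("webhook must be an HTTPS URL")
        payload["webhook"] = webhook
    return payload
```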

Responses

Immediate Success Response (HTTP 200)

When processing completes within 20 seconds:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "completed",
    "timestamp": 1635545600,
    "content": {
        "text": "Extracted text content...",
        "meta": {
            "title": "Page Title",
            "description": "Meta description..."
        },
        "links": [
            {
                "text": "Link text",
                "url": "https://example.com/link",
                "type": "internal"
            }
        ],
        "images": [
            {
                "url": "https://example.com/image.jpg",
                "alt": "Image description"
            }
        ]
    },
    "llm_output": {
        // Structured JSON based on prompt
    }
}

Processing Response (HTTP 202)

When processing takes longer than 20 seconds:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "message": "Processing your request. Please check status later."
}

Error Response (HTTP 4XX/5XX)

{
    "error": "ERROR_CODE: Error message"
}
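
The three response shapes above can be told apart by HTTP status code and the `status` field. The following sketch shows one way a client might classify them; `interpret_response` is a hypothetical helper, and it assumes the response body has already been decoded from JSON into a dict.

```python
def interpret_response(status_code, body):
    """Classify a decoded response as completed, processing, or error.

    Returns a (kind, detail) tuple:
      ("completed", llm_output)   for HTTP 200 with status "completed"
      ("processing", request_id)  for HTTP 202
      ("error", error_string)     for anything else
    """
    if status_code == 200 and body.get("status") == "completed":
        return "completed", body.get("llm_output")
    if status_code == 202:
        # Keep the request_id so the result can be looked up later.
        return "processing", body.get("request_id")
    return "error", body.get("error")
```

For a 202 result, the caller would hold on to the returned `request_id` and either check the status later or wait for the webhook notification, if one was configured.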

Error Codes

| Code | Description | HTTP Status |
|---|---|---|
| E001 | Invalid request format | 400 |
| E003 | Invalid URL format | 400 |
| E004 | Authentication error | 401 |
| E008 | Content processing failed | 500 |
| E009 | Validation error | 400 |
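
Because the error response carries the code and message in a single string, a client may want to split them apart. The sketch below assumes the string literally follows the "ERROR_CODE: Error message" pattern with codes shaped like `E` plus three digits (for example, "E003: Invalid URL format"); the exact message text the server returns is not documented here. Treating only 5XX codes as retryable is a design choice of this example, not documented behavior.

```python
import re

# Codes, descriptions, and HTTP statuses as listed in the table above.
ERROR_CODES = {
    "E001": ("Invalid request format", 400),
    "E003": ("Invalid URL format", 400),
    "E004": ("Authentication error", 401),
    "E008": ("Content processing failed", 500),
    "E009": ("Validation error", 400),
}


def parse_error(error_string):
    """Split "E003: message" into ("E003", "message").

    Returns (None, original_text) when no leading code is recognized.
    """
    code, sep, message = error_string.partition(":")
    code = code.strip()
    if sep and re.fullmatch(r"E\d{3}", code):
        return code, message.strip()
    return None, error_string.strip()


def is_retryable(code):
    """Assumption: only known server-side (5XX) errors are worth retrying."""
    entry = ERROR_CODES.get(code)
    return entry is not None and entry[1] >= 500
```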

Webhook Integration

If you provide a webhook URL, it receives a POST request with the complete results when processing finishes:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "url": "https://example.com",
    "timestamp": 1635545600,
    "content": {
        // Scraped content
    },
    "llm_output": {
        // LLM processed data
    }
}
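
A receiving endpoint for this payload could look like the following sketch, which uses only Python's standard library. The names `parse_webhook_body`, `WebhookHandler`, and `serve` are illustrative; the sketch assumes the POST body is JSON shaped like the example above, and replying with HTTP 200 is an assumption, since the expected acknowledgement is not documented. The server here speaks plain HTTP, so it would need to sit behind a TLS-terminating proxy to satisfy the public HTTPS requirement.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw):
    """Decode a webhook POST body into (request_id, status, llm_output)."""
    data = json.loads(raw.decode("utf-8"))
    return data["request_id"], data["status"], data.get("llm_output")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            request_id, status, llm_output = parse_webhook_body(
                self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing field, or a non-object body.
            self.send_response(400)
            self.end_headers()
            return
        print(f"{request_id}: {status}")  # replace with real handling
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Start a blocking local server for testing the handler."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; nothing runs on import, so the parsing function can be exercised on its own.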

Last updated 5 months ago


Manage multiple keys through your yetanotherapi dashboard.