YetAnotherAPI Documentation

Metadata Scraper

This API endpoint allows you to extract content from websites in either plaintext or markdown format.

Endpoint

POST https://api.yetanotherapi.com/web-scrapper/

Headers

| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| x-api-key | Yes | Your API authentication key |

Request Body

You don't need to request metadata explicitly; it is scraped by default. Make the API call with the payload below (the webhook field is optional and can be omitted).

{
    "url": "https://example.com",
    "output_type": "plaintext",
    "use_cache": false,
    "webhook": "https://your-webhook-url.com"
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the website to scrape |
| output_type | string | No | Either "plaintext" (default) or "markdown" |
| use_cache | boolean | No | If true, returns the cached result if available. Default: false |
| webhook | string | No | URL to receive a webhook notification when processing is complete |
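
As a rough illustration of the parameters above, the request body could be assembled in Python as follows. `build_payload` is a hypothetical helper, not part of the API, and the client-side checks (http/https only, the two documented output types) are assumptions layered on top of the table; the server remains the authority on what it accepts.

```python
import json
from urllib.parse import urlparse

# The two output types listed in the parameter table.
ALLOWED_OUTPUT_TYPES = {"plaintext", "markdown"}


def build_payload(url, output_type="plaintext", use_cache=False, webhook=None):
    """Build the JSON request body (hypothetical helper, not part of the API)."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are worth sending.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if output_type not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {sorted(ALLOWED_OUTPUT_TYPES)}")

    payload = {"url": url, "output_type": output_type, "use_cache": bool(use_cache)}
    # webhook is optional, so it is left out entirely when not supplied.
    if webhook is not None:
        payload["webhook"] = webhook
    return payload


if __name__ == "__main__":
    print(json.dumps(build_payload("https://example.com", output_type="markdown"), indent=4))
```

Leaving out `webhook` when it is not set keeps the body free of null values, which the page does not say the API tolerates.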

Response

Immediate Response (HTTP 200)

If processing completes within 20 seconds, you'll receive the full result:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "completed",
    "timestamp": 1635545600,
    "text_content": "Extracted text content...",
    "meta": {
        "title": "Page Title",
        "description": "Meta description..."
    },
    "links": [
        {
            "text": "Link text",
            "url": "https://example.com/link",
            "type": "internal"
        }
    ],
    "images": [
        {
            "url": "https://example.com/image.jpg",
            "alt": "Image description"
        }
    ]
}

Processing Response (HTTP 202)

If processing takes longer than 20 seconds:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "message": "Processing your request. Please check status later."
}

Error Response (HTTP 4XX/5XX)

{
    "error": "ERROR_CODE: Error message"
}

Common error codes:

  • E001: Invalid request format

  • E003: Invalid URL format

  • E006: Storage service error

  • E008: Content processing failed
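
The three response shapes above can be told apart by HTTP status and body. The sketch below is one possible way to do that; `parse_error` and `interpret_response` are hypothetical names, and splitting the error string on its first colon assumes the "ERROR_CODE: Error message" layout shown above always holds.

```python
# Error codes listed on this page; the API may return others.
KNOWN_ERROR_CODES = {
    "E001": "Invalid request format",
    "E003": "Invalid URL format",
    "E006": "Storage service error",
    "E008": "Content processing failed",
}


def parse_error(error_text):
    """Split an "ERROR_CODE: Error message" string into (code, message)."""
    code, sep, message = error_text.partition(":")
    if not sep:
        # No colon found: treat the whole string as the message.
        return None, error_text.strip()
    return code.strip(), message.strip()


def interpret_response(http_status, body):
    """Classify a decoded JSON response as completed, processing, or error."""
    if http_status == 200 and body.get("status") == "completed":
        return "completed", body
    if http_status == 202 and body.get("status") == "processing":
        # Keep the request_id so the result can be fetched later.
        return "processing", body["request_id"]
    return "error", parse_error(body.get("error", ""))


if __name__ == "__main__":
    kind, detail = interpret_response(400, {"error": "E003: Invalid URL format"})
    print(kind, detail, detail[0] in KNOWN_ERROR_CODES)
```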

Response Fields

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for the request |
| url | string | The URL that was scraped |
| status | string | Status of the request ("completed" or "processing") |
| timestamp | number | Unix timestamp of when the request was processed |
| text_content | string | The extracted text content (if output_type=plaintext) |
| meta | object | Metadata from the page |
| links | array | Array of links found on the page |
| images | array | Array of images found on the page |
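
To show how these fields might be consumed, here is a minimal sketch that pulls the metadata out of a completed response. `summarize` is a hypothetical helper; it assumes `meta` carries `title` and `description` as in the sample response, and that link `type` values other than "internal" may exist even though this page shows only that one.

```python
from datetime import datetime, timezone


def summarize(result):
    """Condense a completed response into a few metadata fields."""
    meta = result.get("meta", {})
    links = result.get("links", [])
    images = result.get("images", [])
    return {
        "title": meta.get("title"),
        "description": meta.get("description"),
        # timestamp is a Unix timestamp; render it as ISO 8601 in UTC.
        "processed_at": datetime.fromtimestamp(
            result["timestamp"], tz=timezone.utc
        ).isoformat(),
        "internal_links": [l["url"] for l in links if l.get("type") == "internal"],
        "images_missing_alt": [i["url"] for i in images if not i.get("alt")],
    }


if __name__ == "__main__":
    sample = {
        "timestamp": 1635545600,
        "meta": {"title": "Page Title", "description": "Meta description..."},
        "links": [{"text": "Link text", "url": "https://example.com/link", "type": "internal"}],
        "images": [{"url": "https://example.com/image.jpg", "alt": "Image description"}],
    }
    print(summarize(sample))
```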

Example Curl Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext",
    "use_cache": false
}'
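
For reference, a rough Python equivalent of the curl call, using only the standard library. `build_request` and `scrape` are hypothetical names, and the 30-second timeout is an assumption chosen to sit above the 20-second window mentioned earlier. `urlopen` raises `HTTPError` on 4XX/5XX, so the sketch reads the error body from the exception, assuming it is JSON as documented.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_request(api_key, payload):
    """Create the POST request with the two required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def scrape(api_key, payload, timeout=30):
    """Send the request and return (http_status, decoded_json_body)."""
    request = build_request(api_key, payload)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 4XX/5XX: the body should hold {"error": "ERROR_CODE: Error message"}.
        return err.code, json.loads(err.read().decode("utf-8"))
```

A call might look like `scrape("your-api-key", {"url": "https://example.com", "output_type": "plaintext", "use_cache": False})`.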

Notes

  • The API supports both synchronous and asynchronous processing

  • For pages requiring longer processing time, use the status check endpoint to poll for results

  • Use webhooks for automatic notification when processing completes

  • Cached results are available for 15 days
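
The polling approach mentioned in the notes could be sketched as below. The status check endpoint is not described on this page, so the sketch takes a caller-supplied `fetch_status` function (hypothetical) that returns the decoded JSON for a `request_id`; the interval and attempt count are arbitrary assumptions.

```python
import time


def wait_for_result(request_id, fetch_status, interval=5.0, max_attempts=12,
                    sleep=time.sleep):
    """Poll until the status is no longer "processing", then return the body.

    fetch_status is a caller-supplied function (hypothetical) wrapping the
    status check endpoint; it takes a request_id and returns a dict.
    """
    for attempt in range(max_attempts):
        body = fetch_status(request_id)
        if body.get("status") != "processing":
            return body
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"request {request_id} still processing after {max_attempts} attempts"
    )
```

Where a publicly reachable URL is available, the webhook option avoids this loop entirely.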

