Metadata Scraper

This API endpoint extracts a web page's content in either plaintext or markdown format, along with the page's metadata.

Endpoint

POST https://api.yetanotherapi.com/web-scrapper/

Headers

Content-Type (required): Must be application/json
x-api-key (required): Your API authentication key

Request Body

You don't need to request metadata explicitly; it is scraped by default. Send a payload like the following (webhook is optional):

{
    "url": "https://example.com",
    "output_type": "plaintext",
    "use_cache": false,
    "webhook": "https://your-webhook-url.com"
}
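Because webhook is optional, a client may prefer to include it only when it has a value. The Python sketch below assembles the request body that way; build_payload is a hypothetical helper name, not part of any official SDK.

```python
import json
from typing import Optional


def build_payload(
    url: str,
    output_type: str = "plaintext",
    use_cache: bool = False,
    webhook: Optional[str] = None,
) -> str:
    """Serialize a request body (hypothetical helper), omitting webhook when unset."""
    body = {"url": url, "output_type": output_type, "use_cache": use_cache}
    if webhook is not None:
        # Only send webhook when the caller wants a completion notification
        body["webhook"] = webhook
    return json.dumps(body)


print(build_payload("https://example.com"))
# {"url": "https://example.com", "output_type": "plaintext", "use_cache": false}
```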

Parameters

url (string, required): The URL of the website to scrape
output_type (string, optional): Either "plaintext" (default) or "markdown"
use_cache (boolean, optional): If true, returns a cached result when one is available. Default: false
webhook (string, optional): URL to receive a webhook notification when processing is complete
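Catching obvious mistakes on the client side can save a round trip. The Python sketch below checks a request body against the parameter list above; validate_params is hypothetical, and the rule that url must be an absolute http(s) URL is an assumption, since the server's exact criteria for E003 (Invalid URL format) aren't documented here.

```python
from urllib.parse import urlparse

OUTPUT_TYPES = ("plaintext", "markdown")


def validate_params(params: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    url = params.get("url")
    if not isinstance(url, str):
        problems.append("url is required and must be a string")
    else:
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs can be scraped
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")
    # Optional fields fall back to the documented defaults when absent
    if params.get("output_type", "plaintext") not in OUTPUT_TYPES:
        problems.append("output_type must be 'plaintext' or 'markdown'")
    if not isinstance(params.get("use_cache", False), bool):
        problems.append("use_cache must be a boolean")
    webhook = params.get("webhook")
    if webhook is not None and not isinstance(webhook, str):
        problems.append("webhook must be a string URL")
    return problems
```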

Response

Immediate Response (HTTP 200)

If processing completes within 20 seconds, you'll receive the full result:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "completed",
    "timestamp": 1635545600,
    "text_content": "Extracted text content...",
    "meta": {
        "title": "Page Title",
        "description": "Meta description..."
    },
    "links": [
        {
            "text": "Link text",
            "url": "https://example.com/link",
            "type": "internal"
        }
    ],
    "images": [
        {
            "url": "https://example.com/image.jpg",
            "alt": "Image description"
        }
    ]
}
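The completed response nests the page metadata under meta and returns links and images as lists. A minimal Python sketch for unpacking it follows; ScrapeResult, parse_completed, and internal_links are hypothetical names, and the sketch reads only text_content, since the field that carries markdown output isn't shown on this page.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScrapeResult:
    request_id: str
    url: str
    text_content: str
    title: str = ""
    description: str = ""
    links: list = field(default_factory=list)
    images: list = field(default_factory=list)


def parse_completed(body: str) -> ScrapeResult:
    """Turn a completed (HTTP 200) response body into a ScrapeResult."""
    data = json.loads(body)
    if data.get("status") != "completed":
        raise ValueError(f"expected a completed result, got {data.get('status')!r}")
    meta = data.get("meta", {})
    return ScrapeResult(
        request_id=data["request_id"],
        url=data["url"],
        # Documented for output_type=plaintext only
        text_content=data.get("text_content", ""),
        title=meta.get("title", ""),
        description=meta.get("description", ""),
        links=data.get("links", []),
        images=data.get("images", []),
    )


def internal_links(result: ScrapeResult) -> list[str]:
    """Collect the URLs of links whose type is "internal"."""
    return [link["url"] for link in result.links if link.get("type") == "internal"]
```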

Processing Response (HTTP 202)

If processing takes longer than 20 seconds:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "message": "Processing your request. Please check status later."
}

Error Response (HTTP 4XX/5XX)

{
    "error": "ERROR_CODE: Error message"
}
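A client has to branch on the HTTP status to tell these three outcomes apart. This Python sketch shows one way to do it; interpret_response and the shape of the dictionary it returns are hypothetical, and the fallback for error bodies that aren't JSON (for example, an HTML page from a proxy) is an assumption rather than documented behavior.

```python
import json


def interpret_response(status_code: int, body: str) -> dict:
    """Classify a scraper response by its HTTP status code."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        # Assumption: treat any non-object body as carrying no fields
        data = {}
    if status_code == 200:
        return {"state": "completed", "result": data}
    if status_code == 202:
        # Not finished within 20 seconds: keep request_id to retrieve the result later
        return {"state": "processing", "request_id": data.get("request_id")}
    return {
        "state": "error",
        "status_code": status_code,
        "error": data.get("error", f"HTTP {status_code}"),
    }
```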

Common error codes:

E001: Invalid request format
E003: Invalid URL format
E006: Storage service error
E008: Content processing failed
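Error bodies pack the code and the message into one string. The Python sketch below splits them apart and looks up the codes listed above; parse_error, describe, and the fallback text are hypothetical, and any code not in this list (the numbering has gaps) is reported as undocumented rather than guessed at.

```python
KNOWN_CODES = {
    "E001": "Invalid request format",
    "E003": "Invalid URL format",
    "E006": "Storage service error",
    "E008": "Content processing failed",
}


def parse_error(error: str) -> tuple[str, str]:
    """Split an "ERROR_CODE: Error message" string into (code, message)."""
    code, sep, message = error.partition(":")
    if not sep:
        # No code prefix present: return the whole string as the message
        return "", error.strip()
    return code.strip(), message.strip()


def describe(code: str) -> str:
    """Map a code to its documented meaning, if this page lists it."""
    return KNOWN_CODES.get(code, "Undocumented error code")
```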

Response Fields

request_id (string): Unique identifier for the request
url (string): The URL that was scraped
status (string): Status of the request ("completed" or "processing")
timestamp (number): Unix timestamp of when the request was processed
text_content (string): The extracted text content (if output_type=plaintext)
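The timestamp field is a Unix time in seconds (the example value 1635545600 falls in October 2021), so it converts directly with Python's standard library. processed_at is a hypothetical helper name.

```python
from datetime import datetime, timezone


def processed_at(response: dict) -> datetime:
    """Convert the Unix `timestamp` field into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(response["timestamp"], tz=timezone.utc)


print(processed_at({"timestamp": 1635545600}).isoformat())
# 2021-10-29T22:13:20+00:00
```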

Example Curl Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext",
    "use_cache": false
}'
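For Python callers, the curl example above translates roughly as follows, using only the standard library's urllib. build_request and scrape are hypothetical helper names, and the 30-second timeout is an arbitrary choice, set above the 20-second synchronous window so the client doesn't give up before the API can answer with HTTP 202.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_request(api_key: str, url: str, output_type: str = "plaintext",
                  use_cache: bool = False) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"url": url, "output_type": output_type, "use_cache": use_cache})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def scrape(api_key: str, url: str, timeout: float = 30.0) -> tuple[int, dict]:
    """Send the request and return (HTTP status, decoded JSON body)."""
    try:
        with urllib.request.urlopen(build_request(api_key, url), timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # 4XX/5XX responses are expected to carry an {"error": "..."} body
        raw = err.read()
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, {"error": raw.decode("utf-8", "replace")}
```

Usage: `status, body = scrape("your-api-key", "https://example.com")`, then branch on status (200 for a completed result, 202 for one still processing).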

Notes

Requests that finish within 20 seconds are answered synchronously (HTTP 200); longer ones continue asynchronously (HTTP 202)
For pages that need longer processing, poll the status check endpoint with the returned request_id
Use a webhook to be notified automatically when processing completes
Cached results are retained for 15 days
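The polling pattern from these notes might look like this in Python. The status check endpoint's URL and response shape aren't documented on this page, so the sketch takes a caller-supplied fetch_status function (hypothetical) that is assumed to return a dict with a status field, as the responses above do; the 5-second interval and 300-second limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_result(
    request_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 5.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the request is no longer "processing" or max_wait elapses.

    fetch_status is a placeholder for a call to the status check endpoint.
    Any status other than "processing" (including errors) is returned as-is.
    """
    waited = 0.0
    while True:
        response = fetch_status(request_id)
        if response.get("status") != "processing":
            return response
        if waited >= max_wait:
            raise TimeoutError(f"request {request_id} still processing after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting sleep keeps the loop testable without real delays; a webhook avoids polling altogether.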


Last updated 7 months ago