YetAnotherAPI Documentation

Status Check

Use this endpoint to check the status of a previously submitted scraping request and, once the request has completed, to retrieve its results.

Endpoint

GET https://api.yetanotherapi.com/web-scrapper/{request_id}

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-api-key | Yes | Your API authentication key |

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | The request ID returned from the scraper API |

Response

Success Response (HTTP 200)

For completed requests:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "completed",
    "timestamp": 1635545600,
    "content": {
        "text": "Extracted text content...",
        "markdown": "# Extracted markdown content...", // If markdown was requested
        "meta": {
            "title": "Page Title",
            "description": "Meta description..."
        },
        "links": [
            {
                "text": "Link text",
                "url": "https://example.com/link",
                "type": "internal"
            }
        ],
        "images": [
            {
                "url": "https://example.com/image.jpg",
                "alt": "Image description"
            }
        ]
    },
    "llm_output": { // Only present if LLM processing was requested
        // Structured JSON output based on the prompt
    }
}
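
As an illustration of how a client might read a completed response, the sketch below pulls the documented fields out of a decoded body. The function name `summarize_result` and the shape of its return value are hypothetical; only the field names come from the example above. Because `markdown` and `llm_output` are optional, they are read with `.get()` and default to `None`.

```python
def summarize_result(body: dict) -> dict:
    """Collect the commonly used fields from a completed status response.

    Hypothetical helper: field names follow the documented example,
    the returned structure is an assumption made for illustration.
    """
    if body.get("status") != "completed":
        raise ValueError("content is only available when status is 'completed'")

    content = body.get("content", {})
    return {
        "title": content.get("meta", {}).get("title"),
        "text": content.get("text", ""),
        # Present only if markdown was requested.
        "markdown": content.get("markdown"),
        "link_urls": [link["url"] for link in content.get("links", [])],
        "image_urls": [image["url"] for image in content.get("images", [])],
        # Present only if LLM processing was requested.
        "llm_output": body.get("llm_output"),
    }
```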

Processing Response (HTTP 200)

For requests still processing:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "timestamp": 1635545600
}

Error Response (HTTP 4XX/5XX)

{
    "error": "ERROR_CODE: Error message"
}

Common error codes:

  • E001: Invalid request format

  • E002: Request ID not found

  • E006: Storage service error
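
Since the `error` field combines a code and a message in one string, a client may want to split them before branching on the code. The following is a minimal sketch that assumes the `"ERROR_CODE: Error message"` layout shown above always holds; `ApiError` and `parse_error` are hypothetical names, not part of the API.

```python
from typing import NamedTuple


class ApiError(NamedTuple):
    code: str
    message: str


def parse_error(body: dict) -> ApiError:
    """Split an "ERROR_CODE: Error message" string into its two parts."""
    raw = str(body.get("error", ""))
    # Split on the first colon only, so colons inside the message survive.
    code, separator, message = raw.partition(":")
    if not separator:
        # No code prefix found: keep the whole string as the message.
        return ApiError(code="", message=raw.strip())
    return ApiError(code=code.strip(), message=message.strip())
```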

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| url | string | The URL that was scraped |
| status | string | Current status of the request |
| timestamp | number | Unix timestamp of last status update |
| content | object | Contains extracted content if status is "completed" |
| llm_output | object | Present only if LLM processing was requested |

Possible Status Values

| Status | Description |
| --- | --- |
| received | Request has been received but not yet processed |
| processing | Request is currently being processed |
| completed | Processing has completed successfully |
| failed | Processing failed with an error |
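
One way to use these values in client code is to separate the statuses that mean "keep waiting" from those that mean "stop polling". The grouping below is an interpretation of the descriptions above rather than something the API states explicitly, and `is_terminal` is a hypothetical helper name.

```python
# Assumed grouping, based on the status descriptions above.
PENDING_STATUSES = frozenset({"received", "processing"})
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def is_terminal(status: str) -> bool:
    """Return True once no further status change is expected."""
    if status in TERMINAL_STATUSES:
        return True
    if status in PENDING_STATUSES:
        return False
    # Fail loudly on values this sketch does not know about.
    raise ValueError(f"unknown status: {status!r}")
```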

Example Curl Request

curl --location 'https://api.yetanotherapi.com/web-scrapper/550e8400-e29b-41d4-a716-446655440000' \
--header 'x-api-key: your-api-key'
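
For callers working in Python, the same request can be sketched with the standard library alone. This is an illustration, not an official client: the helper names `build_status_request` and `fetch_status` are hypothetical, and the 30-second timeout is an assumed value.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.yetanotherapi.com/web-scrapper"


def build_status_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for a status check (hypothetical helper)."""
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(request_id, safe='')}"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_status(request_id: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4XX/5XX responses.
    """
    request = build_status_request(request_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_status("550e8400-e29b-41d4-a716-446655440000", "your-api-key")` would perform the same call as the curl command above.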

Error Handling

If processing failed, the response will include error details:

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "failed",
    "timestamp": 1635545600,
    "error": "E008: Content processing failed",
    "error_trace": "Detailed error information" // Only in development environment
}

Notes

  • Wait at least 5 minutes between polling requests

  • Results remain available for 15 hours after completion
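
The polling guidance above could be applied with a loop along these lines. It is a sketch under stated assumptions: `poll_until_done` is a hypothetical helper, `fetch` is any caller-supplied function that returns the decoded status response, and the default of 12 attempts (about one hour) is an arbitrary choice, not a documented limit.

```python
import time
from typing import Callable

MIN_POLL_SECONDS = 5 * 60  # minimum interval stated in the notes above


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = MIN_POLL_SECONDS,
    max_attempts: int = 12,  # assumed cap, not part of the API
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the status is "completed" or "failed"."""
    if interval < MIN_POLL_SECONDS:
        raise ValueError("interval must be at least 5 minutes")

    for attempt in range(max_attempts):
        body = fetch()
        if body.get("status") in ("completed", "failed"):
            return body
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)

    raise TimeoutError(f"request still pending after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without waiting in real time. Callers should still check whether the returned status is "failed" before reading `content`.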


Last updated 5 months ago
