Webhook Notification

When you provide a webhook URL in your scraping request, our system will automatically send the results to your specified endpoint once processing is complete.

Webhook Configuration

Add the webhook URL to your scraping request:

{
    "url": "https://example.com",
    "webhook": "https://your-webhook-url.com/endpoint"
}
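
As a rough sketch, the request body above could be assembled in Python before it is sent. The helper name build_scrape_request is hypothetical, and the check that both fields are absolute http(s) URLs is an assumption made here, not a documented requirement. The API endpoint that receives this body is not covered in this section, so the sketch stops at producing the JSON.

```python
import json
from urllib.parse import urlparse


def build_scrape_request(url: str, webhook: str) -> str:
    """Return the JSON body for a scraping request with a webhook (hypothetical helper)."""
    for name, value in (("url", url), ("webhook", webhook)):
        parts = urlparse(value)
        # Assumption: both fields should be absolute http(s) URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{name} must be an absolute http(s) URL: {value!r}")
    return json.dumps({"url": url, "webhook": webhook}, indent=4)


body = build_scrape_request(
    "https://example.com",
    "https://your-webhook-url.com/endpoint",
)
print(body)
```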

Webhook Request Details

Headers

Header          Value
Content-Type    application/json
User-Agent      Web-Scraper-Webhook/1.0

Payload Structure

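The webhook is delivered as a POST request whose body has the following JSON structure. The // comments are annotations for this page only, since JSON itself does not allow comments: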

{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "url": "https://example.com",
    "timestamp": 1635545600,
    "content": {
        "txt": "Extracted text content...",
        "markdown": "# Markdown content...", // If markdown was requested
        "meta": {
            "title": "Page Title",
            "description": "Meta description...",
            "og:image": "https://example.com/image.jpg",
            // ... other meta tags
        },
        "schema": [
            // Array of schema.org structured data
            {
                "type": "Article",
                "properties": {
                    "headline": "Article Title",
                    "datePublished": "2023-01-01"
                }
            }
        ],
        "images": [
            {
                "url": "https://example.com/image.jpg",
                "alt": "Image description",
                "title": "Image title"
            }
        ],
        "links": [
            {
                "text": "Link text",
                "url": "https://example.com/link",
                "type": "internal",
                "location": 0
            }
        ]
    },
    "llm_output": { // Only present if LLM processing was requested
        // Structured JSON based on the provided prompt
    }
}
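
A receiver might read the payload along these lines. This is only a sketch: the function name summarize_payload is hypothetical, and it assumes request_id and status are always present while markdown and llm_output may be missing, as the annotations above suggest.

```python
import json
from typing import Any


def summarize_payload(raw: bytes) -> dict[str, Any]:
    """Pick out commonly used fields from a webhook body (hypothetical helper)."""
    payload = json.loads(raw)
    content = payload.get("content", {})
    return {
        # Assumption: these two fields are always present.
        "request_id": payload["request_id"],
        "status": payload["status"],
        "title": content.get("meta", {}).get("title"),
        # Only present if markdown was requested.
        "markdown": content.get("markdown"),
        # Only present if LLM processing was requested.
        "llm_output": payload.get("llm_output"),
        "internal_links": [
            link["url"]
            for link in content.get("links", [])
            if link.get("type") == "internal"
        ],
    }


sample = json.dumps({
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "url": "https://example.com",
    "timestamp": 1635545600,
    "content": {
        "txt": "Extracted text content...",
        "meta": {"title": "Page Title"},
        "links": [
            {"text": "Link text", "url": "https://example.com/link",
             "type": "internal", "location": 0}
        ],
    },
}).encode("utf-8")

summary = summarize_payload(sample)
print(summary["title"], summary["internal_links"])
```

Using .get() for the optional fields keeps the receiver from failing when markdown or LLM processing was not requested.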

Webhook Behavior

Retry Policy

  • Maximum retries: 3 attempts

  • Retry interval: Exponential backoff starting at 5 seconds

  • Timeout: 10 seconds per attempt
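
To make the retry timing concrete, the sketch below computes a possible delay schedule. The doubling factor is an assumption: the policy above only states that backoff is exponential and starts at 5 seconds, so the real multiplier may differ. The function name retry_delays is hypothetical.

```python
def retry_delays(
    max_retries: int = 3,
    base_seconds: float = 5.0,
    factor: float = 2.0,  # Assumption: the delay doubles on each retry.
) -> list[float]:
    """Return the wait before each retry under exponential backoff."""
    return [base_seconds * factor ** i for i in range(max_retries)]


print(retry_delays())  # [5.0, 10.0, 20.0]
```

Under these assumptions, an endpoint that is briefly unavailable has roughly half a minute of retries in which to recover.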

Success Criteria

  • Any HTTP 2XX response is considered successful

  • Any other response code will trigger a retry

  • After all retry attempts are exhausted, the webhook status will be marked as failed
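
The success rules above can be simulated as follows. This is an illustration of the described behavior, not the service's actual implementation. It assumes one initial attempt followed by up to three retries, and it omits the backoff waits for brevity. The names is_success and deliver, and the "delivered" result string, are hypothetical; "failed" matches the status named above.

```python
from typing import Callable


def is_success(status_code: int) -> bool:
    """Any HTTP 2XX response counts as a successful delivery."""
    return 200 <= status_code <= 299


def deliver(send: Callable[[], int], max_retries: int = 3) -> str:
    """Call send() until it returns a 2XX code or the retries run out."""
    # Assumption: one initial attempt plus up to max_retries retries.
    for _ in range(1 + max_retries):
        if is_success(send()):
            return "delivered"
    return "failed"


responses = iter([500, 503, 204])
print(deliver(lambda: next(responses)))  # delivered
```

Because a non-2XX response triggers another attempt, an endpoint should return a 2XX code as soon as it has accepted the payload.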

Error Handling

If the webhook delivery fails, the status can be checked via the status endpoint:

{
    "webhook_status": "failed",
    "webhook_error": "Failed to deliver webhook notification",
    "webhook_timestamp": 1635545600
}
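
A client could interpret that status response roughly as shown below. The sketch assumes webhook_timestamp is Unix time in seconds, which the sample value suggests but this section does not state, and the function name describe_webhook_failure is hypothetical. Fetching the response from the status endpoint is left out because its URL is not given here.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def describe_webhook_failure(status: dict[str, Any]) -> Optional[str]:
    """Return a readable message if webhook delivery failed, else None."""
    if status.get("webhook_status") != "failed":
        return None
    # Assumption: webhook_timestamp is Unix time in seconds (UTC).
    when = datetime.fromtimestamp(status["webhook_timestamp"], tz=timezone.utc)
    error = status.get("webhook_error", "unknown error")
    return f"{error} at {when.isoformat()}"


message = describe_webhook_failure({
    "webhook_status": "failed",
    "webhook_error": "Failed to deliver webhook notification",
    "webhook_timestamp": 1635545600,
})
print(message)
```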

Testing Webhooks

For development and testing, we recommend:

  1. Using tools like webhook.site for initial testing

  2. Setting up a local tunnel using ngrok for development

  3. Implementing a test endpoint that logs webhook payloads
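
For the third option, a test endpoint that logs webhook payloads could look like the sketch below, which uses only the Python standard library. The class name WebhookLogger, the helper make_server, and port 8000 are illustrative choices, and the server is meant for local development only.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("webhook-test")


class WebhookLogger(BaseHTTPRequestHandler):
    """Log each JSON payload posted to the endpoint and answer with 2XX."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw)
        except ValueError:  # covers invalid JSON and invalid UTF-8
            payload = None
        if not isinstance(payload, dict):
            # A non-2XX response will trigger a retry from the sender.
            self.send_response(400)
            self.end_headers()
            return
        log.info("webhook %s:\n%s", payload.get("request_id"),
                 json.dumps(payload, indent=2))
        self.send_response(204)  # 2XX marks the delivery as successful
        self.end_headers()

    def log_message(self, format: str, *args: object) -> None:
        """Silence the default per-request access log."""


def make_server(port: int = 8000) -> HTTPServer:
    """Create a server bound to localhost (hypothetical helper)."""
    return HTTPServer(("127.0.0.1", port), WebhookLogger)


# To run locally: make_server().serve_forever()
```

Combined with a tunnel such as ngrok, the public tunnel URL can be passed as the webhook value in a scraping request so that payloads reach this local server.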
