# Webpage links Scrapper

This API endpoint allows you to extract content from websites in either plaintext or markdown format.

### Endpoint

```
POST https://api.yetanotherapi.com/web-scrapper/
```

### Headers

| Header       | Required | Description                 |
| ------------ | -------- | --------------------------- |
| Content-Type | Yes      | Must be `application/json`  |
| x-api-key    | Yes      | Your API authentication key |

### Request Body

***You don't need to request links explicitly; they are scraped by default. Just make the API call with the payload below.***

```json
{
    "url": "https://example.com",
    "output_type": "markdown",
    "use_cache": false,
    "webhook": "https://your-webhook-url.com"
}
```

#### Parameters

| Parameter    | Type    | Required | Description                                                     |
| ------------ | ------- | -------- | --------------------------------------------------------------- |
| url          | string  | Yes      | The URL of the website to scrape                                |
| output\_type | string  | No       | Either "plaintext" (default) or "markdown"                      |
| use\_cache   | boolean | No       | If true, returns cached result if available. Default: false     |
| webhook      | string  | No       | URL to receive webhook notification when processing is complete |
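
The parameter table can be turned into a small payload builder. This is a minimal sketch, not part of the API: `build_payload` is a hypothetical helper, and the client-side URL check assumes the service expects an absolute `http(s)` URL.

```python
import json
from urllib.parse import urlparse

# Values taken from the parameter table above.
ALLOWED_OUTPUT_TYPES = ("plaintext", "markdown")


def build_payload(url, output_type="plaintext", use_cache=False, webhook=None):
    """Return the request body as a dict (hypothetical helper, not part of the API)."""
    parsed = urlparse(url)
    # Assumption: the API expects an absolute http(s) URL; this check is client-side only.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if output_type not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {ALLOWED_OUTPUT_TYPES}")
    if not isinstance(use_cache, bool):
        raise TypeError("use_cache must be a boolean")

    payload = {"url": url, "output_type": output_type, "use_cache": use_cache}
    # Optional field: include it only when the caller supplies a webhook URL.
    if webhook is not None:
        payload["webhook"] = webhook
    return payload


print(json.dumps(build_payload("https://example.com", output_type="markdown"), indent=4))
```

Leaving `webhook` out of the dict, rather than sending `null`, mirrors the table, which lists it as not required.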

### Response

#### Immediate Response (HTTP 200)

If processing completes within 20 seconds, you'll receive the full result:

```json
{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "completed",
    "timestamp": 1635545600,
    "text_content": "Extracted text content...",
    "meta": {
        "title": "Page Title",
        "description": "Meta description..."
    },
    "links": [
        {
            "text": "Link text",
            "url": "https://example.com/link",
            "type": "internal"
        }
    ],
    "images": [
        {
            "url": "https://example.com/image.jpg",
            "alt": "Image description"
        }
    ]
}
```

#### Processing Response (HTTP 202)

If processing takes longer than 20 seconds:

```json
{
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "message": "Processing your request. Please check status later."
}
```
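
A client has to tell the two success responses apart. The sketch below shows one way to do it, assuming the caller already has the HTTP status code and the raw JSON body; `interpret_response` is a hypothetical helper name.

```python
import json


def interpret_response(status_code, body_text):
    """Map an HTTP status and JSON body to ("completed", data) or ("processing", request_id)."""
    data = json.loads(body_text)
    if status_code == 200:
        # Full result: text_content, meta, links and images are present.
        return "completed", data
    if status_code == 202:
        # Still running: keep the request_id so the result can be looked up later.
        return "processing", data["request_id"]
    raise RuntimeError(f"unexpected HTTP status {status_code}: {data.get('error', data)}")


sample_202 = json.dumps({
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com",
    "status": "processing",
    "message": "Processing your request. Please check status later.",
})
state, request_id = interpret_response(202, sample_202)
print(state, request_id)
```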

#### Error Response (HTTP 4XX/5XX)

```json
{
    "error": "ERROR_CODE: Error message"
}
```

Common error codes:

* E001: Invalid request format
* E003: Invalid URL format
* E006: Storage service error
* E008: Content processing failed
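
Because the error arrives as a single `"ERROR_CODE: Error message"` string, a client may want to split it into its two parts. The following is a rough sketch: `parse_error` is a hypothetical helper, and it assumes the code never contains a colon.

```python
# Descriptions copied from the list above; the API may return other codes as well.
KNOWN_ERROR_CODES = {
    "E001": "Invalid request format",
    "E003": "Invalid URL format",
    "E006": "Storage service error",
    "E008": "Content processing failed",
}


def parse_error(body):
    """Split {"error": "CODE: message"} into a (code, message) tuple."""
    raw = body.get("error", "")
    # Split on the first colon only, so colons inside the message are preserved.
    code, separator, message = raw.partition(":")
    if not separator:
        # No colon at all: treat the whole string as the message.
        return None, raw.strip()
    return code.strip(), message.strip()


code, message = parse_error({"error": "E003: Invalid URL format"})
print(code, message, code in KNOWN_ERROR_CODES)
```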

### Response Fields

| Field         | Type   | Description                                            |
| ------------- | ------ | ------------------------------------------------------ |
| request\_id   | string | Unique identifier for the request                      |
| url           | string | The URL that was scraped                               |
| status        | string | Status of the request ("completed" or "processing")    |
| timestamp     | number | Unix timestamp of when the request was processed       |
| text\_content | string | The extracted text content (if output\_type=plaintext) |
| meta          | object | Metadata from the page                                 |
| links         | array  | Array of links found on the page                       |
| images        | array  | Array of images found on the page                      |
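
As an illustration of consuming the `links` array, the sketch below groups link URLs by their `type` field. Only `"internal"` appears in the sample response above; the `"external"` value, the `"unknown"` fallback, and the helper name `group_links_by_type` are assumptions made for the example.

```python
from collections import defaultdict


def group_links_by_type(response):
    """Return {type: [url, ...]} built from the response's "links" array."""
    groups = defaultdict(list)
    for link in response.get("links", []):
        # Assumption: links without a "type" key are filed under "unknown".
        groups[link.get("type", "unknown")].append(link["url"])
    return dict(groups)


sample = {
    "links": [
        {"text": "Link text", "url": "https://example.com/link", "type": "internal"},
        # "external" is an assumed value; the documentation only shows "internal".
        {"text": "Elsewhere", "url": "https://example.org/", "type": "external"},
    ],
    "images": [{"url": "https://example.com/image.jpg", "alt": "Image description"}],
}
print(group_links_by_type(sample))
```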

### Example cURL Request

```bash
curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext",
    "use_cache": false
}'
```
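
The same request can be sketched in Python with only the standard library. `build_request` and `send` are hypothetical helper names, and the 30-second timeout is an assumption chosen to sit above the 20-second window described in the Response section.

```python
import json
import urllib.request

ENDPOINT = "https://api.yetanotherapi.com/web-scrapper/"


def build_request(api_key, payload):
    """Prepare the POST request with the two required headers."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status_code, parsed_json).

    urllib raises urllib.error.HTTPError for 4XX/5XX responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


request = build_request(
    "your-api-key",
    {"url": "https://example.com", "output_type": "plaintext", "use_cache": False},
)
# status, body = send(request)  # uncomment to perform the real call
```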

### Notes

* The API supports both synchronous and asynchronous processing
* For pages requiring longer processing time, use the status check endpoint to poll for results
* Use webhooks for automatic notification when processing completes
* Cached results are available for 15 days


