# Web Scraper

Welcome to the Web Scraper API documentation. The API provides web scraping with optional LLM (large language model) processing, so you can extract and structure web content for your applications.

### Features

* **Basic Web Scraping**: Extract text content, links, images, and metadata from any webpage
* **Markdown Support**: Get content in either plaintext or markdown format
* **LLM Processing**: Use AI to structure and analyze scraped content
* **Caching**: Improve performance with optional result caching
* **Webhook Notifications**: Receive results asynchronously via webhooks
* **Rich Metadata**: Extract meta tags, schema.org data, and structured content

### Authentication

All API requests require an API key, which should be included in the `x-api-key` header:

```bash
--header 'x-api-key: your-api-key'
```

### Base URL

```
https://api.yetanotherapi.com/web-scrapper/
```

### Available Endpoints

| Method | Endpoint        | Description                          | Documentation  |
| ------ | --------------- | ------------------------------------ | -------------- |
| POST   | `/`             | Submit scraping request              | Basic Scraping |
| POST   | `/`             | Submit LLM scraping request          | LLM Scraping   |
| GET    | `/{request_id}` | Check request status and get results | Status Check   |

### Quick Start

#### Basic Scraping Request

```bash
curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "output_type": "plaintext"
}'
```
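For callers working outside a shell, the same request can be sketched in Python using only the standard library. The names `build_scrape_request` and `send` are illustrative and not part of the API, and the 30-second timeout is an assumption. The response body's shape is not described on this page, so it is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def build_scrape_request(
    api_key: str, url: str, output_type: str = "plaintext"
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {"url": url, "output_type": output_type}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    ASSUMPTION: the timeout value is illustrative; the response schema
    is not documented here, so the raw decoded JSON is returned.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_scrape_request("your-api-key", "https://example.com"))` would issue the request. Keeping construction separate from sending makes the payload easy to inspect or test without network access.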

#### LLM Processing Request

```bash
curl --location 'https://api.yetanotherapi.com/web-scrapper/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: your-api-key' \
--data '{
    "url": "https://example.com",
    "use_llm": true,
    "prompt": "Extract main topics and summarize key points",
    "openai_key_id": "your-key-id"
}'
```
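The LLM request body can also be assembled programmatically. The sketch below uses only the four fields shown in the curl example. The helper name `build_llm_payload` and the client-side check for an empty prompt are assumptions added for illustration; the API's own validation rules are not described on this page.

```python
import json


def build_llm_payload(url: str, prompt: str, openai_key_id: str) -> str:
    """Return the JSON body for an LLM scraping request.

    ASSUMPTION: rejecting an empty prompt locally is a convenience check,
    not documented API behaviour.
    """
    cleaned_prompt = prompt.strip()
    if not cleaned_prompt:
        raise ValueError("prompt must not be empty")
    payload = {
        "url": url,
        "use_llm": True,
        "prompt": cleaned_prompt,
        "openai_key_id": openai_key_id,
    }
    return json.dumps(payload)
```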

### Processing Modes

#### Synchronous Processing

* Results are returned immediately if processing completes within 20 seconds
* Best for simple pages and quick scraping tasks

#### Asynchronous Processing

* For longer-running requests
* Poll the status check endpoint for results
* Webhook notifications are available
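A client usually needs to tell the two modes apart when the submit call returns. The sketch below assumes that a `200` response carries the finished result and a `202` response means the request was accepted for asynchronous processing, in line with the status codes listed under Error Handling. The `request_id` field name in the `202` body is an assumption inferred from the `/{request_id}` endpoint path, and `Submission` and `interpret_submission` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    done: bool                  # True when results came back synchronously
    result: Optional[dict]      # decoded body for a completed request
    request_id: Optional[str]   # id to poll with when still processing


def interpret_submission(status_code: int, body: dict) -> Submission:
    """Classify the response to a submit call as complete or pending."""
    if status_code == 200:
        return Submission(done=True, result=body, request_id=None)
    if status_code == 202:
        # ASSUMPTION: the pending response exposes its id as "request_id".
        return Submission(done=False, result=None, request_id=body.get("request_id"))
    raise RuntimeError(f"unexpected status {status_code}: {body.get('error', body)}")
```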

### Common Use Cases

1. **Content Aggregation**
   * Extract articles and blog posts
   * Monitor news and updates
   * Collect product information
2. **Data Analysis**
   * Extract structured data
   * Analyze web content
   * Generate insights using LLM
3. **Content Transformation**
   * Convert HTML to markdown
   * Extract clean text content
   * Generate structured JSON

### Best Practices

1. **Use Caching**
   * Enable `use_cache` for frequently accessed pages
   * Cached results remain available for 15 days
   * Caching reduces processing time and AI cost
2. **Handle Asynchronous Processing**
   * Implement a webhook endpoint for notifications
   * Use the status check endpoint with reasonable polling intervals (5-15 minutes)
   * Handle timeout scenarios gracefully
3. **LLM Processing**
   * Write clear, specific prompts
   * Consider content length and complexity
   * Test with sample content first
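The polling advice above can be sketched as a bounded loop. This is a minimal illustration, not a reference client: it assumes the status endpoint answers `202` while processing and `200` once results are ready, which this page does not state explicitly. The default interval of 300 seconds is the lower end of the suggested 5-15 minute range. `poll_until_done`, `status_url`, and the injected `fetch_status` callable are hypothetical names.

```python
import time
from typing import Callable, Tuple

BASE_URL = "https://api.yetanotherapi.com/web-scrapper/"


def status_url(request_id: str) -> str:
    """URL for GET /{request_id} under the documented base URL."""
    return f"{BASE_URL}{request_id}"


def poll_until_done(
    fetch_status: Callable[[str], Tuple[int, dict]],
    request_id: str,
    interval_seconds: float = 300.0,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the status endpoint until results arrive or attempts run out.

    `fetch_status` takes a URL and returns (status_code, decoded_body);
    injecting it keeps the loop independent of any HTTP library.
    """
    for attempt in range(max_attempts):
        status_code, body = fetch_status(status_url(request_id))
        if status_code == 200:
            # ASSUMPTION: 200 from the status endpoint means results are ready.
            return body
        if status_code != 202:
            raise RuntimeError(f"status check failed with {status_code}: {body}")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"request {request_id} still processing after {max_attempts} checks")
```

Capping the number of attempts gives the caller a clear timeout path to handle, rather than polling indefinitely.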

### Error Handling

All API endpoints use standard HTTP response codes:

* 200: Success
* 202: Accepted (processing)
* 400: Bad request
* 401: Unauthorized
* 429: Rate limit exceeded
* 500: Server error

Error responses include an error code and a descriptive message:

```json
{
    "error": "E001: Invalid request format"
}
```
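To act on errors programmatically, the code and message can be split out of the `error` string. The sketch below assumes every error follows the `E<digits>: message` pattern of the example above, which this page does not guarantee, so it falls back to returning the raw string with no code. `ApiError` and `parse_error` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ApiError(NamedTuple):
    code: Optional[str]   # e.g. "E001", or None if no code could be parsed
    message: str


def parse_error(body: dict) -> ApiError:
    """Split an error body like {"error": "E001: ..."} into code and message."""
    raw = str(body.get("error", ""))
    code, separator, message = raw.partition(": ")
    # ASSUMPTION: codes look like "E" followed by digits, as in the example.
    if separator and code.startswith("E") and code[1:].isdigit():
        return ApiError(code, message)
    return ApiError(None, raw)
```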

### Support

For support requests or questions:

1. Email: <hey@manojlk.work>
2. API Support Portal: <https://app.yetanotherapi.com>

### Changelog

See our changelog for API updates and changes.


