# Document parser

## Document Parser API Documentation

Base URL: `https://api.yetanotherapi.com`

### Overview

The Document Parser API extracts text content from PDF files, Word documents, images, and plain text files. Documents that finish processing quickly return their text directly in the response; longer jobs continue asynchronously, with an optional webhook notification on completion.

### Authentication

Every API request must include your API key in the `x-api-key` header:

```
x-api-key: YOUR_API_KEY
```

### API Endpoints

#### Submit Document for Processing

Submit a document for text extraction.

**Endpoint:** `POST /documents/parse`

**Headers:**

```
Content-Type: application/json
x-api-key: YOUR_API_KEY
```

**Request Body:**

```json
{
    "url": "string",          // Required: URL of the document
    "type": "string",         // Optional: Document type (default: "pdf")
    "output": "string",       // Optional: Output format (default: "plain")
    "webhook": "string"       // Optional: Webhook URL for completion notification
}
```

**Supported File Types:**

* `pdf`: PDF documents
* `doc`: Microsoft Word documents (.doc)
* `docx`: Microsoft Word documents (.docx)
* `jpg`/`jpeg`: JPEG images
* `png`: PNG images
* `txt`: Plain text files

**Output Formats:**

* `plain`: Plain text (default)
* `markdown`: Formatted markdown text
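
The request fields and allowed values above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the function name `build_parse_payload` is hypothetical, and requiring an absolute `http`/`https` URL is an assumption, since the documentation only says "URL of the document".

```python
from urllib.parse import urlparse

# Values copied from the "Supported File Types" and "Output Formats" lists.
SUPPORTED_TYPES = {"pdf", "doc", "docx", "jpg", "jpeg", "png", "txt"}
OUTPUT_FORMATS = {"plain", "markdown"}


def _is_http_url(value):
    # Assumption: the service expects absolute http(s) URLs.
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_parse_payload(url, doc_type="pdf", output="plain", webhook=None):
    """Return the JSON-ready request body, raising ValueError on unlisted values."""
    if not _is_http_url(url):
        raise ValueError(f"url must be an absolute http(s) URL: {url!r}")
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {doc_type!r}")
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output!r}")

    payload = {"url": url, "type": doc_type, "output": output}
    if webhook is not None:
        if not _is_http_url(webhook):
            raise ValueError(f"webhook must be an absolute http(s) URL: {webhook!r}")
        payload["webhook"] = webhook  # optional field, omitted when not set
    return payload
```

Validating locally mirrors the `400-001` to `400-005` error cases, so obviously malformed requests can be caught without a round trip.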

**Response:**

* Synchronous response (processing completes in under 20 seconds):

```json
{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "string"         // Extracted text content
}
```

* Asynchronous response (processing takes longer than 20 seconds):

```json
{
    "requestId": "string",
    "status": "PROCESSING",
    "message": "Processing in progress. Please check status endpoint."
}
```

* Error Response:

```json
{
    "error": "string",
    "details": ["string"]    // Array of error details
}
```

**Status Codes:**

* 200: Success (processing completed)
* 202: Accepted (processing continues asynchronously)
* 400: Bad Request (invalid input)
* 401: Unauthorized (invalid API key)
* 429: Too Many Requests (rate limit exceeded)
* 500: Internal Server Error
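
A client has to distinguish the three response shapes listed above. The sketch below shows one way to do that from a status code and a raw JSON body; `interpret_parse_response` and `ParseApiError` are hypothetical names, and treating every response other than a 200 `COMPLETED` or a 202 `PROCESSING` as an error is an assumption about how the service behaves.

```python
import json


class ParseApiError(Exception):
    """Raised for responses that are neither COMPLETED nor PROCESSING."""

    def __init__(self, status_code, error, details):
        super().__init__(f"HTTP {status_code}: {error}")
        self.status_code = status_code
        self.error = error
        self.details = details


def interpret_parse_response(status_code, body_text):
    """Map a status code and JSON body onto a small result dict, or raise."""
    body = json.loads(body_text) if body_text else {}

    # 200: the extracted text is in the "data" field.
    if status_code == 200 and body.get("status") == "COMPLETED":
        return {"done": True, "request_id": body["requestId"], "text": body["data"]}

    # 202: processing continues; keep the requestId for later.
    if status_code == 202 and body.get("status") == "PROCESSING":
        return {"done": False, "request_id": body["requestId"], "text": None}

    # Anything else is assumed to carry the documented error body.
    raise ParseApiError(
        status_code,
        body.get("error", "unknown error"),
        body.get("details", []),
    )
```

Callers can branch on `result["done"]` and store `request_id` when the job is still running.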

### Example Usage

#### cURL Example

```bash
curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown",
    "webhook": "https://your-webhook-url.com/callback"
}'
```
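
For comparison, the same request can be built with Python's standard library. This is a sketch under stated assumptions rather than an official client: `build_parse_request` and `send` are hypothetical helper names, and the 30-second client timeout is an arbitrary choice, not a documented limit.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.yetanotherapi.com"


def build_parse_request(api_key, document_url, doc_type="pdf", output="plain", webhook=None):
    """Build (but do not send) the POST /documents/parse request."""
    body = {"url": document_url, "type": doc_type, "output": output}
    if webhook:
        body["webhook"] = webhook
    return urllib.request.Request(
        f"{BASE_URL}/documents/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status_code, parsed_json_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the error body is still readable.
        return err.code, json.loads(err.read())
```

A call equivalent to the cURL command would be `send(build_parse_request("YOUR_API_KEY", "https://example.com/document.pdf", output="markdown", webhook="https://your-webhook-url.com/callback"))`.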

### Error Codes and Descriptions

| Error Code | Description             |
| ---------- | ----------------------- |
| 400-001    | Invalid file type       |
| 400-002    | Invalid URL format      |
| 400-003    | Invalid webhook URL     |
| 400-004    | Missing required field  |
| 400-005    | Invalid output format   |
| 401-001    | Invalid API key         |
| 429-001    | Rate limit exceeded     |
| 500-001    | Processing error        |
| 500-002    | Storage error           |
| 500-003    | Webhook delivery failed |
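
The table above can be kept as a lookup so that client logs show a readable description. The sketch below is illustrative only: `describe_error` is a hypothetical helper, the documentation does not say which response field carries these codes, and reading the prefix before the hyphen as the HTTP status is an assumption based on the code format.

```python
# Codes and descriptions copied from the table above.
ERROR_DESCRIPTIONS = {
    "400-001": "Invalid file type",
    "400-002": "Invalid URL format",
    "400-003": "Invalid webhook URL",
    "400-004": "Missing required field",
    "400-005": "Invalid output format",
    "401-001": "Invalid API key",
    "429-001": "Rate limit exceeded",
    "500-001": "Processing error",
    "500-002": "Storage error",
    "500-003": "Webhook delivery failed",
}


def describe_error(code):
    """Return the description and an inferred HTTP status for an error code."""
    # Assumption: the part before the hyphen mirrors the HTTP status code.
    http_status = int(code.split("-", 1)[0])
    return {
        "code": code,
        "http_status": http_status,
        "description": ERROR_DESCRIPTIONS.get(code, "Unknown error code"),
        "is_client_error": 400 <= http_status < 500,
    }
```

Separating 4xx from 5xx codes lets a caller decide whether to fix the request or report a service-side failure.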

### Notes

1. Processing time varies based on document size and complexity
2. Files are stored temporarily and deleted after 7 days
3. Webhook endpoints should respond within 30 seconds
4. All timestamps are in Unix epoch format
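
Note 3 suggests that a webhook endpoint should acknowledge quickly and defer any slow work. The following is a minimal sketch of that pattern using only the standard library. The notification payload is not documented on this page, so it is treated as opaque bytes; the names `accept_notification`, `WebhookHandler`, and `serve`, the port, and the choice of 200/400 response codes are all assumptions.

```python
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Notifications wait here until a background worker picks them up.
pending = queue.Queue()


def accept_notification(raw_body, work_queue):
    """Queue the raw notification and return the HTTP status to reply with."""
    if not raw_body:
        return 400  # nothing to process
    work_queue.put(raw_body)  # defer parsing and handling to a worker
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = accept_notification(self.rfile.read(length), pending)
        # Reply immediately, well inside the 30-second window.
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

A separate worker thread or process would read from `pending` and do the slower work, such as storing the extracted text.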

