Document Parser API Documentation

Base URL: https://api.yetanotherapi.com

Overview

The Document Parser API extracts text content from PDF files, Word documents, images, and plain text files. Requests that finish within about 20 seconds return the extracted text directly in the response; longer jobs continue asynchronously, with an optional webhook notification on completion.

Authentication

All API requests require an API key sent in the x-api-key header:

x-api-key: YOUR_API_KEY

API Endpoints

Submit Document for Processing

Submit a document for text extraction.

Endpoint: POST /documents/parse

Headers:

Content-Type: application/json
x-api-key: YOUR_API_KEY

Request Body:

{
    "url": "string",          // Required: URL of the document
    "type": "string",         // Optional: Document type (default: "pdf")
    "output": "string",       // Optional: Output format (default: "plain")
    "webhook": "string"       // Optional: Webhook URL for completion notification
}

Supported File Types:

  • pdf: PDF documents

  • doc: Microsoft Word documents (.doc)

  • docx: Microsoft Word documents (.docx)

  • jpg/jpeg: JPEG images

  • png: PNG images

  • txt: Plain text files

Output Formats:

  • plain: Plain text (default)

  • markdown: Formatted markdown text
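The request fields and allowed values above can be checked on the client before anything is sent. The following is a minimal sketch, not part of the API: the helper name build_parse_request is hypothetical, "jpg/jpeg" is assumed to mean two accepted values, and the http(s) check on URLs is an assumption about what the service treats as a valid URL.

```python
from urllib.parse import urlparse

# Values copied from the "Supported File Types" and "Output Formats" lists.
# Assumption: "jpg/jpeg" means both "jpg" and "jpeg" are accepted.
SUPPORTED_TYPES = {"pdf", "doc", "docx", "jpg", "jpeg", "png", "txt"}
OUTPUT_FORMATS = {"plain", "markdown"}


def build_parse_request(url, doc_type="pdf", output="plain", webhook=None):
    """Return a request body for POST /documents/parse, or raise ValueError.

    Hypothetical client-side helper; the API performs its own validation.
    """
    if not url:
        raise ValueError("url is required")
    for name, value in (("url", url), ("webhook", webhook)):
        if value is None:
            continue
        parsed = urlparse(value)
        # Assumption: only absolute http(s) URLs are considered valid.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} must be an absolute http(s) URL")
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {doc_type!r}")
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output!r}")

    body = {"url": url, "type": doc_type, "output": output}
    if webhook is not None:
        body["webhook"] = webhook  # optional field, omitted when unused
    return body
```

Catching these problems locally avoids a round trip that would likely end in one of the 400-series errors.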

Response:

  • Quick Processing (< 20 seconds):

{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "string"         // Extracted text content
}

  • Async Processing (> 20 seconds):

{
    "requestId": "string",
    "status": "PROCESSING",
    "message": "Processing in progress. Please check status endpoint."
}

  • Error Response:

{
    "error": "string",
    "details": ["string"]    // Array of error details
}
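A client has to tell the three response shapes above apart. One way to do that, sketched here under the assumption that the body has already been decoded from JSON, is to branch on the "error" key and the "status" value. The function name interpret_parse_response and the request ID "abc123" are made up for illustration.

```python
import json


def interpret_parse_response(body):
    """Classify a decoded response body from POST /documents/parse.

    Returns a (kind, payload) tuple where kind is "completed",
    "processing", or "error".
    """
    if "error" in body:
        return "error", {"error": body["error"], "details": body.get("details", [])}
    status = body.get("status")
    if status == "COMPLETED":
        return "completed", {"requestId": body["requestId"], "text": body["data"]}
    if status == "PROCESSING":
        return "processing", {"requestId": body["requestId"], "message": body.get("message", "")}
    raise ValueError(f"unrecognized response shape: {body!r}")


# Hand-written body shaped like the async response above (not real API output).
sample = json.loads(
    '{"requestId": "abc123", "status": "PROCESSING", '
    '"message": "Processing in progress. Please check status endpoint."}'
)
kind, payload = interpret_parse_response(sample)
print(kind, payload["requestId"])  # prints: processing abc123
```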

Status Codes:

  • 200: Success (processing completed)

  • 202: Accepted (processing continues asynchronously)

  • 400: Bad Request (invalid input)

  • 401: Unauthorized (invalid API key)

  • 429: Too Many Requests (rate limit exceeded)

  • 500: Internal Server Error
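The status codes can drive a simple dispatch on the client side. In the sketch below the action names are hypothetical labels, and treating 429 and 500 as retryable is an assumption rather than documented behavior. The 429 entry is included because the error code table lists 429-001.

```python
# Status codes from the list above, plus 429 from the error code table.
# The action names are hypothetical labels chosen for this sketch.
STATUS_ACTIONS = {
    200: "use_result",        # body carries the extracted text in "data"
    202: "await_completion",  # processing continues asynchronously
    400: "fix_request",
    401: "check_api_key",
    429: "retry_later",       # assumption: back off, then retry
    500: "retry_later",       # assumption: server errors may be transient
}


def next_action(status_code):
    """Return a suggested client action for an HTTP status code."""
    return STATUS_ACTIONS.get(status_code, "unexpected_status")


print(next_action(202))  # prints: await_completion
print(next_action(418))  # prints: unexpected_status
```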

Example Usage

cURL Example

curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown",
    "webhook": "https://your-webhook-url.com/callback"
}'
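For reference, the same request can be sketched in Python using only the standard library. The function names make_parse_request and submit, and the 60-second timeout, are choices made for this example and are not part of the API. Building the request is kept separate from sending it so the former can be inspected without network access.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.yetanotherapi.com"


def make_parse_request(api_key, document_url, doc_type="pdf", output="plain", webhook=None):
    """Build (but do not send) the POST /documents/parse request."""
    payload = {"url": document_url, "type": doc_type, "output": output}
    if webhook:
        payload["webhook"] = webhook
    return urllib.request.Request(
        f"{BASE_URL}/documents/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=60):
    """Send the request and return (status_code, decoded_json_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the body is assumed to hold the error JSON.
        return err.code, json.loads(err.read().decode("utf-8"))
```

A call such as submit(make_parse_request("YOUR_API_KEY", "https://example.com/document.pdf", output="markdown")) would then return the status code together with one of the documented response bodies.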

Error Codes and Descriptions

Error Code    Description

400-001       Invalid file type
400-002       Invalid URL format
400-003       Invalid webhook URL
400-004       Missing required field
400-005       Invalid output format
401-001       Invalid API key
429-001       Rate limit exceeded
500-001       Processing error
500-002       Storage error
500-003       Webhook delivery failed
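The codes in the table above can be kept in a small lookup so that client logs show a readable description. This is only a sketch: describe_error is a hypothetical helper, reading the three-digit prefix as the HTTP status is an inference from the numbering, and this section does not say where in the error response the code appears.

```python
# Codes and descriptions copied from the error code table.
ERROR_CODES = {
    "400-001": "Invalid file type",
    "400-002": "Invalid URL format",
    "400-003": "Invalid webhook URL",
    "400-004": "Missing required field",
    "400-005": "Invalid output format",
    "401-001": "Invalid API key",
    "429-001": "Rate limit exceeded",
    "500-001": "Processing error",
    "500-002": "Storage error",
    "500-003": "Webhook delivery failed",
}


def describe_error(code):
    """Return (http_status, description) for a code such as "400-003"."""
    if code not in ERROR_CODES:
        raise KeyError(f"unknown error code: {code}")
    # Assumption: the part before the dash mirrors the HTTP status code.
    http_status = int(code.split("-")[0])
    return http_status, ERROR_CODES[code]


print(describe_error("429-001"))  # prints: (429, 'Rate limit exceeded')
```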

Notes

  1. Processing time varies based on document size and complexity

  2. Files are stored temporarily and deleted after 7 days

  3. Webhook endpoints should respond within 30 seconds

  4. All timestamps are in Unix epoch format
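Because webhook endpoints should respond within 30 seconds, one common pattern is to acknowledge the notification immediately and do any slow work afterwards. The sketch below uses Python's standard http.server module. The notification payload is not documented in this section, so the handler only parses it as JSON and queues it. The names accept_notification and WebhookHandler, and port 8080, are illustrative assumptions.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Notifications are queued so the HTTP response goes out before any slow work.
pending = queue.Queue()


def accept_notification(raw):
    """Queue a notification body and return the HTTP status to send back."""
    try:
        pending.put(json.loads(raw or b"{}"))
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = accept_notification(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def worker():
    """Process queued notifications outside the request/response cycle."""
    while True:
        notification = pending.get()
        # Payload fields are not documented here, so this sketch only prints it.
        print("received notification:", notification)
        pending.task_done()


def serve(port=8080):
    """Start the background worker and block serving webhook calls."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. The queue keeps response time independent of how long downstream processing takes.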
