Document Parser API Documentation

Base URL: https://api.yetanotherapi.com

Overview

The Document Parser API extracts text content from PDF files, Word documents, images, and plain text files. Requests that finish within about 20 seconds return the extracted text directly in the response; longer jobs continue asynchronously, and an optional webhook can be notified on completion.

Authentication

All API requests require an API key, sent in the x-api-key header:

x-api-key: YOUR_API_KEY

API Endpoints

Submit Document for Processing

Submit a document for text extraction.

Endpoint: POST /documents/parse

Headers:

Content-Type: application/json
x-api-key: YOUR_API_KEY

Request Body:

{
    "url": "string",          // Required: URL of the document
    "type": "string",         // Optional: Document type (default: "pdf")
    "output": "string",       // Optional: Output format (default: "plain")
    "webhook": "string"       // Optional: Webhook URL for completion notification
}

Supported File Types:

pdf: PDF documents
doc: Microsoft Word documents (.doc)
docx: Microsoft Word documents (.docx)
jpg/jpeg: JPEG images
png: PNG images
txt: Plain text files

Output Formats:

plain: Plain text (default)
markdown: Formatted markdown text
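
The request fields, file types, and output formats above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API itself: the function name build_parse_payload is hypothetical, treating "jpg" and "jpeg" as two separate accepted values is an assumption, and so is the restriction to http/https URLs.

```python
from typing import Optional
from urllib.parse import urlparse

# Values copied from the "Supported File Types" and "Output Formats" lists.
# Accepting both "jpg" and "jpeg" as distinct strings is an assumption.
SUPPORTED_TYPES = {"pdf", "doc", "docx", "jpg", "jpeg", "png", "txt"}
OUTPUT_FORMATS = {"plain", "markdown"}


def _is_http_url(value: str) -> bool:
    """Return True for absolute http/https URLs (an assumed restriction)."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_parse_payload(
    url: str,
    doc_type: str = "pdf",
    output: str = "plain",
    webhook: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /documents/parse, raising ValueError on bad input."""
    if not _is_http_url(url):
        raise ValueError(f"Invalid document URL: {url!r}")
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {doc_type!r}")
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {output!r}")

    payload = {"url": url, "type": doc_type, "output": output}
    if webhook is not None:
        if not _is_http_url(webhook):
            raise ValueError(f"Invalid webhook URL: {webhook!r}")
        # The webhook field is optional, so it is only included when provided.
        payload["webhook"] = webhook
    return payload
```

Validating locally mirrors the 400-series errors the service reports, so a malformed request can be caught before it uses up a call.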

Response:

Quick Processing (< 20 seconds):

{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "string"         // Extracted text content
}

Async Processing (> 20 seconds):

{
    "requestId": "string",
    "status": "PROCESSING",
    "message": "Processing in progress. Please check status endpoint."
}

Error Response:

{
    "error": "string",
    "details": ["string"]    // Array of error details
}

Status Codes:

200: Success (processing completed)
202: Accepted (processing continues asynchronously)
400: Bad Request (invalid input)
401: Unauthorized (invalid API key)
429: Too Many Requests (rate limit exceeded)
500: Internal Server Error
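
A client needs to treat the 200, 202, and error responses differently. The sketch below shows one way to fold the three documented response shapes into a single result object. ParseResult, interpret_response, and the local "ERROR" status label are illustrative names, not part of the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParseResult:
    request_id: Optional[str]
    status: str                      # "COMPLETED", "PROCESSING", or the local label "ERROR"
    text: Optional[str] = None       # extracted text, present only when completed
    error: Optional[str] = None
    details: List[str] = field(default_factory=list)


def interpret_response(status_code: int, body: dict) -> ParseResult:
    """Map an HTTP status code and decoded JSON body onto a ParseResult."""
    if status_code == 200:
        # Quick processing: the extracted text is in the "data" field.
        return ParseResult(
            request_id=body.get("requestId"),
            status=body.get("status", "COMPLETED"),
            text=body.get("data"),
        )
    if status_code == 202:
        # Async processing: no text yet, keep the requestId for later.
        return ParseResult(
            request_id=body.get("requestId"),
            status=body.get("status", "PROCESSING"),
        )
    # Any other status is treated as an error response with "error" and "details".
    return ParseResult(
        request_id=None,
        status="ERROR",
        error=body.get("error", f"HTTP {status_code}"),
        details=list(body.get("details", [])),
    )
```

For example, interpret_response(202, {"requestId": "abc", "status": "PROCESSING"}) yields a result with no text, which signals the caller to wait for the webhook or check the job later.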

Example Usage

cURL Example

curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown",
    "webhook": "https://your-webhook-url.com/callback"
}'

Error Codes and Descriptions

Error Code    Description
400-001       Invalid file type
400-002       Invalid URL format
400-003       Invalid webhook URL
400-004       Missing required field
400-005       Invalid output format
401-001       Invalid API key
429-001       Rate limit exceeded
500-001       Processing error
500-002       Storage error
500-003       Webhook delivery failed
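
Each error code appears to combine an HTTP status with a sequence number. A small lookup like the hypothetical sketch below can turn a code into its status and description. The documentation does not say where the code appears in the error response, so locating it is left to the caller; describe_error is an illustrative name.

```python
from typing import Tuple

# Codes and descriptions copied from the error code table.
ERROR_CODES = {
    "400-001": "Invalid file type",
    "400-002": "Invalid URL format",
    "400-003": "Invalid webhook URL",
    "400-004": "Missing required field",
    "400-005": "Invalid output format",
    "401-001": "Invalid API key",
    "429-001": "Rate limit exceeded",
    "500-001": "Processing error",
    "500-002": "Storage error",
    "500-003": "Webhook delivery failed",
}


def describe_error(code: str) -> Tuple[int, str]:
    """Split a code such as "429-001" into (HTTP status, description)."""
    # Assumes the part before the hyphen is the HTTP status, as the table suggests.
    http_status = int(code.split("-", 1)[0])
    return http_status, ERROR_CODES.get(code, "Unknown error code")
```

Codes missing from the table still yield their HTTP status, paired with a generic description.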

Notes

Processing time varies based on document size and complexity
Files are stored temporarily and deleted after 7 days
Webhook endpoints should respond within 30 seconds
All timestamps are in Unix epoch format
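
Unix epoch timestamps and the 7-day retention period can be turned into readable UTC times with a short helper. The sketch below assumes timestamps are in seconds (not milliseconds) and that the retention period is counted from the moment a file is stored. Neither is stated above, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Retention period taken from the notes: files are deleted after 7 days.
RETENTION = timedelta(days=7)


def epoch_to_utc(timestamp: float) -> datetime:
    """Convert a Unix epoch timestamp (assumed to be in seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def deletion_time(stored_at: float) -> datetime:
    """Estimate when a file stored at the given epoch time would be deleted."""
    return epoch_to_utc(stored_at) + RETENTION
```

If the service turns out to report milliseconds, divide the value by 1000 before calling epoch_to_utc.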


