PDF Parser

PDF Document Processing Guide

The Document Parser API extracts text content from PDF files.

Endpoint

POST /documents/parse

PDF-Specific Configuration

{
    "url": "string",          // URL of the PDF document
    "type": "pdf",           // Specify "pdf" for PDF processing
    "output": "plain|markdown", // Output format: "plain" or "markdown"
    "webhook": "string"      // Optional webhook URL
}

Supported PDF Features

Single and multi-page PDFs
Text-based PDFs
Scanned PDFs (using OCR)
Password-protected PDFs are not supported
Maximum file size: 50 MB

Example Request

curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown"
}'
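
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `build_parse_request` and `parse_pdf` and the `timeout` value are assumptions, while the endpoint, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
API_URL = "https://api.yetanotherapi.com/documents/parse"


def build_parse_request(api_key, pdf_url, output="markdown", webhook=None):
    """Build (but do not send) a POST request for PDF parsing.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if output not in ("plain", "markdown"):
        raise ValueError("output must be 'plain' or 'markdown'")
    payload = {"url": pdf_url, "type": "pdf", "output": output}
    if webhook is not None:
        # The webhook field is optional, so it is only sent when provided.
        payload["webhook"] = webhook
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_pdf(api_key, pdf_url, output="markdown", timeout=60):
    """Send the request and return the decoded JSON response body."""
    request = build_parse_request(api_key, pdf_url, output)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.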

Response Format

The extracted text is returned in the data field, with each page's content separated by a page break marker:

{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "Page 1 content\n\n=== Page Break ===\n\nPage 2 content"
}
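
To work with individual pages, the data string can be split on the marker. The sketch below assumes the marker text is exactly `=== Page Break ===`, as in the example response; the helper names `split_pages` and `pages_from_response` are hypothetical. Statuses other than COMPLETED are not described in this guide, so the code simply rejects them.

```python
PAGE_BREAK_MARKER = "=== Page Break ==="  # copied from the example response


def split_pages(data):
    """Split the extracted text into a list of per-page strings."""
    if not data:
        return []
    # Strip the blank lines that surround each marker.
    return [page.strip() for page in data.split(PAGE_BREAK_MARKER)]


def pages_from_response(response):
    """Return the page list from a decoded response dictionary."""
    if response.get("status") != "COMPLETED":
        raise ValueError("unexpected status: %r" % response.get("status"))
    return split_pages(response.get("data", ""))
```

Note that a page whose own text happens to contain the marker string would be split incorrectly; this sketch does not guard against that case.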

PDF-Specific Limitations

Forms and fillable fields are processed as static text
Complex layouts may affect text ordering
Headers and footers are included in the extracted text
Images within PDFs are not processed
PDF versions supported: 1.0 to 2.0
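
Some of these constraints can be checked on the client before a document is submitted. The following is a rough pre-flight sketch, not part of the API: it reads the version from the `%PDF-x.y` file header and compares the file size with the 50 MB limit stated earlier. The function names are hypothetical, and interpreting 50 MB as 50 * 1024 * 1024 bytes is an assumption, since the guide does not say how the limit is measured.

```python
import re

# Assumption: the documented "50MB" limit is read as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024

# A PDF file starts with a header such as "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(first_bytes):
    """Return (major, minor) parsed from the file header, or None."""
    # Some readers tolerate a few leading bytes, so search the first 1024.
    match = _HEADER.search(first_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def preflight_problems(first_bytes, size_in_bytes):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if size_in_bytes > MAX_BYTES:
        problems.append("file exceeds the 50 MB limit")
    version = pdf_version(first_bytes)
    if version is None:
        problems.append("no PDF header found")
    elif not ((1, 0) <= version <= (2, 0)):
        problems.append("unsupported PDF version %d.%d" % version)
    return problems
```

The header version is only a hint, because a PDF can declare a different version inside the document itself, so a clean result here does not guarantee that the parser will accept the file.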


