PDF Parser

PDF Document Processing Guide

The Document Parser API extracts text content from PDF files.

Endpoint

POST /documents/parse

PDF-Specific Configuration

{
    "url": "string",            // URL of the PDF document
    "type": "pdf",              // Set to "pdf" for PDF processing
    "output": "plain|markdown", // Output format: "plain" or "markdown"
    "webhook": "string"         // Optional webhook URL
}
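A minimal Python sketch of assembling this request body, assuming "output" accepts exactly the two values listed and that "webhook" is simply left out when unused. The helper name build_pdf_parse_body is hypothetical and not part of the API.

```python
import json
from typing import Optional

# Output formats listed in the configuration above.
ALLOWED_OUTPUTS = {"plain", "markdown"}


def build_pdf_parse_body(url: str, output: str,
                         webhook: Optional[str] = None) -> str:
    """Return the JSON body for POST /documents/parse (hypothetical helper)."""
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(
            f"output must be one of {sorted(ALLOWED_OUTPUTS)}, got {output!r}")
    body = {"url": url, "type": "pdf", "output": output}
    # Assumption: the optional webhook field is omitted when not used.
    if webhook is not None:
        body["webhook"] = webhook
    return json.dumps(body)
```

For example, build_pdf_parse_body("https://example.com/document.pdf", "markdown") produces the same body as the curl request on this page.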

Supported PDF Features

  • Single-page and multi-page PDFs

  • Text-based PDFs

  • Scanned PDFs (processed with OCR)

  • Password-protected PDFs: not supported

  • Maximum file size: 50 MB
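The API reads the PDF from a URL, so it can be worth checking a file locally before publishing it. The sketch below is a minimal Python pre-check against the constraints listed above. It assumes "50 MB" means 50 * 1024 * 1024 bytes (the page does not say whether megabytes are decimal or binary) and treats an /Encrypt entry anywhere in the file as a rough sign of encryption; that heuristic also flags files that are encrypted but open without a password. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: "50 MB" is read as 50 MiB; use 50_000_000 if decimal megabytes are meant.
MAX_PDF_BYTES = 50 * 1024 * 1024


def precheck_pdf(data: bytes) -> list[str]:
    """Return a list of problems found in raw PDF bytes (empty list = looks OK)."""
    problems = []
    if len(data) > MAX_PDF_BYTES:
        problems.append(f"file is {len(data)} bytes, over the {MAX_PDF_BYTES}-byte limit")
    # A PDF file normally begins with a header such as "%PDF-1.7".
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header; not a PDF file?")
    # Heuristic: encrypted PDFs carry an /Encrypt entry in the trailer.
    if b"/Encrypt" in data:
        problems.append("file appears to be encrypted (password protection is not supported)")
    return problems


def precheck_pdf_file(path: str) -> list[str]:
    """Read a local file and run precheck_pdf on its contents."""
    return precheck_pdf(Path(path).read_bytes())
```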

Example Request

curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown"
}'
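For comparison, the same request from Python, as a minimal sketch using only the standard library's urllib.request. The function names, the default output of "markdown" (copied from the curl example), and the 120-second timeout are illustrative assumptions; the page does not document request timeouts or whether the parsed result is returned in the same response when no webhook is set.

```python
import json
import urllib.request

API_URL = "https://api.yetanotherapi.com/documents/parse"


def make_parse_request(api_key: str, pdf_url: str,
                       output: str = "markdown") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = json.dumps({"url": pdf_url, "type": "pdf", "output": output}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_pdf(api_key: str, pdf_url: str, output: str = "markdown") -> dict:
    """Send the request and decode the JSON response; raises HTTPError on 4xx/5xx."""
    req = make_parse_request(api_key, pdf_url, output)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: result = parse_pdf("YOUR_API_KEY", "https://example.com/document.pdf").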

Response Format

The extracted text is returned as a single string in the "data" field, with each page's content separated by a page break marker:

{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "Page 1 content\n\n=== Page Break ===\n\nPage 2 content"
}
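A minimal Python sketch for turning the "data" string back into one string per page. It assumes the marker is exactly "\n\n=== Page Break ===\n\n" as shown in the example, and, because only "COMPLETED" appears on this page, it treats any other status as not ready. If a page's own text happened to contain the marker, it would be split incorrectly. The function name is hypothetical.

```python
# Marker copied from the example response; assumed to be exact.
PAGE_BREAK = "\n\n=== Page Break ===\n\n"


def split_pages(response: dict) -> list[str]:
    """Split the data string of a completed parse response into per-page text."""
    if response.get("status") != "COMPLETED":
        raise RuntimeError(
            f"request {response.get('requestId')} is not complete: {response.get('status')}")
    return response["data"].split(PAGE_BREAK)
```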

PDF-Specific Limitations

  1. Forms and fillable fields are processed as static text

  2. Complex layouts (for example, multi-column pages) may cause text to be extracted out of reading order

  3. Headers and footers are included in the extracted text

  4. Images within PDFs are not processed

  5. PDF versions supported: 1.0 to 2.0
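Because headers and footers are kept, a running title or page number can appear once per page in the output. Where that noise matters, one possible post-processing step is to drop the first and last non-empty line of each page when it is identical on every page, ignoring digits so that "Page 1" and "Page 2" match. This is a heuristic sketch, not an API feature: it can remove real content that happens to repeat, and it misses multi-line headers. It expects the per-page list obtained by splitting "data" on the page break marker; the function names are hypothetical.

```python
import re


def _normalize(line: str) -> str:
    # Compare lines with digits masked, so "Page 3" and "Page 4" count as equal.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_edges(pages: list[str]) -> list[str]:
    """Drop the first/last non-empty line of each page if it repeats on every page."""
    if len(pages) < 2:
        return pages  # nothing to compare against
    lines = [page.splitlines() for page in pages]
    firsts, lasts = [], []
    for page_lines in lines:
        nonblank = [i for i, ln in enumerate(page_lines) if ln.strip()]
        if not nonblank:
            return pages  # a blank page: leave everything untouched
        firsts.append(nonblank[0])
        lasts.append(nonblank[-1])
    same_first = len({_normalize(pl[i]) for pl, i in zip(lines, firsts)}) == 1
    same_last = len({_normalize(pl[i]) for pl, i in zip(lines, lasts)}) == 1
    cleaned = []
    for page_lines, first, last in zip(lines, firsts, lasts):
        drop = set()
        if same_first:
            drop.add(first)
        if same_last:
            drop.add(last)
        kept = [ln for i, ln in enumerate(page_lines) if i not in drop]
        cleaned.append("\n".join(kept).strip("\n"))
    return cleaned
```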
