YetAnotherAPI Documentation

PDF Parser

PDF Document Processing Guide

The Document Parser API extracts text content from PDF files.

Endpoint

POST /documents/parse

PDF-Specific Configuration

{
    "url": "string",            // URL of the PDF document
    "type": "pdf",              // Specify "pdf" for PDF processing
    "output": "plain|markdown", // Output format: "plain" or "markdown"
    "webhook": "string"         // Optional webhook URL
}
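As a minimal sketch, the request body above could be assembled and validated in Python before it is sent. The helper name build_parse_payload is hypothetical; only the field names and the two output values come from the configuration above. The configuration does not say whether "output" may be omitted, so this sketch always requires it.

```python
from typing import Optional

# The two output values listed in the configuration above.
ALLOWED_OUTPUTS = ("plain", "markdown")


def build_parse_payload(url: str, output: str,
                        webhook: Optional[str] = None) -> dict:
    """Build the JSON body for POST /documents/parse (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(f"output must be one of {ALLOWED_OUTPUTS}, got {output!r}")
    payload = {"url": url, "type": "pdf", "output": output}
    if webhook:
        # webhook is optional, so it is included only when provided
        payload["webhook"] = webhook
    return payload
```

Rejecting an unknown output value on the client side avoids spending a request on a body the configuration above does not describe.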

Supported PDF Features

  • Single and multi-page PDFs

  • Text-based PDFs

  • Scanned PDFs (using OCR)

  • Password-protected PDFs are not supported

  • Maximum file size: 50 MB
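If a local copy of the PDF is available, the size limit and the password restriction listed above can be checked before submitting its URL. The following is a rough sketch under stated assumptions: the function name preflight_issues is hypothetical, the limit is interpreted as 50 MiB (the list does not say whether MB means 10^6 or 2^20 bytes), and the search for "/Encrypt" is only a heuristic for encrypted files that can produce false positives.

```python
# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes.
MAX_PDF_BYTES = 50 * 1024 * 1024


def preflight_issues(pdf_bytes: bytes) -> list:
    """Return a list of likely problems with a local PDF (hypothetical helper)."""
    issues = []
    if len(pdf_bytes) > MAX_PDF_BYTES:
        issues.append("file exceeds the 50 MB limit")
    # Heuristic: encrypted PDFs carry an /Encrypt entry in their trailer.
    if b"/Encrypt" in pdf_bytes:
        issues.append("file may be password-protected (found /Encrypt)")
    return issues
```

An empty list means neither check fired; it does not guarantee that the API will accept the file.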

Example Request

curl --location 'https://api.yetanotherapi.com/documents/parse' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf",
    "type": "pdf",
    "output": "markdown"
}'
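The same request can be sketched in Python using only the standard library. The function names make_parse_request and parse_pdf and the 30-second timeout are assumptions made for illustration; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.yetanotherapi.com/documents/parse"


def make_parse_request(api_key: str, pdf_url: str,
                       output: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"url": pdf_url, "type": "pdf", "output": output})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_pdf(api_key: str, pdf_url: str, output: str = "markdown",
              timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (timeout is an assumption)."""
    req = make_parse_request(api_key, pdf_url, output)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it lets the headers and body be inspected or logged without making a network call.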

Response Format

The extracted text is returned in the "data" field. Each page's content is separated by a "=== Page Break ===" marker:

{
    "requestId": "string",
    "status": "COMPLETED",
    "data": "Page 1 content\n\n=== Page Break ===\n\nPage 2 content"
}
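Because pages are joined with a marker, a client will usually want to split "data" back into a list of pages. A minimal sketch follows, assuming the marker text is exactly as shown in the example response; the function name split_pages is hypothetical, and only the "COMPLETED" status is handled because no other status values are shown here.

```python
# Marker text as it appears in the example response.
PAGE_BREAK = "=== Page Break ==="


def split_pages(response: dict) -> list:
    """Split the "data" field of a completed response into per-page strings."""
    if response.get("status") != "COMPLETED":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    # Strip the blank lines that surround each marker.
    return [page.strip("\n") for page in response["data"].split(PAGE_BREAK)]
```

A page whose own text contains the marker string would be split incorrectly, so this approach is only as reliable as the marker is unique.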

PDF-Specific Limitations

  1. Forms and fillable fields are processed as static text

  2. Complex layouts may affect text ordering

  3. Headers and footers are included in the extracted text

  4. Images within PDFs are not processed

  5. PDF versions supported: 1.0 to 2.0
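The supported version range can be checked against a file's header when a local copy (or just its first bytes) is available. This is a sketch under assumptions: the function names are hypothetical, and only the "%PDF-x.y" header is read, whereas a PDF may also declare a newer version in its document catalog, which this code does not inspect.

```python
import re
from typing import Optional, Tuple

# A PDF file starts with a header such as "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(head: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header, or None if no header is found."""
    match = _HEADER.search(head[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def version_supported(head: bytes) -> bool:
    """True if the header version falls within the documented 1.0 to 2.0 range."""
    version = pdf_version(head)
    return version is not None and (1, 0) <= version <= (2, 0)
```

Comparing versions as integer tuples avoids the pitfalls of comparing them as floats or strings.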


Last updated 5 months ago
