Web Scraper
Welcome to the Web Scraper API documentation. Our API provides web scraping with optional LLM (large language model) processing, making it easy to extract and structure web content for your applications.
Features
Basic Web Scraping: Extract text content, links, images, and metadata from any webpage
Markdown Support: Get content in either plaintext or markdown format
LLM Processing: Use AI to structure and analyze scraped content
Caching: Improve performance with optional result caching
Webhook Notifications: Receive results asynchronously via webhooks
Rich Metadata: Extract meta tags, schema.org data, and structured content
Authentication
All API requests require an API key, which must be sent in the x-api-key header.
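A minimal sketch of attaching the key with Python's standard library. The base URL below is a placeholder (this page does not state it) and YOUR_API_KEY stands in for a real key; no request is actually sent.

```python
import urllib.request

# Placeholder only: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_request(api_key: str, path: str = "/") -> urllib.request.Request:
    """Return a Request object that carries the API key in the x-api-key header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"x-api-key": api_key},
    )


req = build_request("YOUR_API_KEY")
# urllib stores header names in capitalized form ("X-api-key");
# HTTP header names are case-insensitive, so this is equivalent on the wire.
```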
Base URL
Available Endpoints
Method  Endpoint       Description                           Reference
POST    /              Submit scraping request               Basic Scraping
POST    /              Submit LLM scraping request           LLM Scraping
GET     /{request_id}  Check request status and get results  Status Check
Quick Start
Basic Scraping Request
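As a rough sketch, a basic scraping request might be built as follows. The base URL and the url body field are assumptions made for illustration; check the Basic Scraping reference for the actual request schema.

```python
import json
import urllib.request

# Placeholder only: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_scrape_request(api_key: str, page_url: str) -> urllib.request.Request:
    """Build a POST / request asking the API to scrape page_url."""
    # "url" is a hypothetical field name, not confirmed by this page.
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_scrape_request("YOUR_API_KEY", "https://example.com/article")
# To send it: urllib.request.urlopen(req, timeout=30)
```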
LLM Processing Request
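An LLM request presumably carries a prompt alongside the target page. The sketch below assumes hypothetical url and prompt body fields and a placeholder base URL; none of these names are confirmed by this page, so treat it as a shape to adapt rather than the actual schema.

```python
import json
import urllib.request

# Placeholder only: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_llm_request(api_key: str, page_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST / request asking the API to scrape page_url and apply prompt."""
    # "url" and "prompt" are hypothetical field names used for illustration.
    payload = {"url": page_url, "prompt": prompt}
    return urllib.request.Request(
        BASE_URL + "/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


llm_req = build_llm_request(
    "YOUR_API_KEY",
    "https://example.com/product",
    "Extract the product name and price as JSON.",
)
```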
Processing Modes
Synchronous Processing
Results returned immediately if processing completes within 20 seconds
Best for simple pages and quick scraping tasks
Asynchronous Processing
For longer running requests
Status check endpoint for polling results
Webhook notifications available
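A client can tell the two modes apart from the status code of the submit response. The sketch below assumes that a synchronous completion returns 200 and that a request handed off for asynchronous processing returns 202, which matches the response codes listed under Error Handling; the returned action names are illustrative.

```python
def next_action(status_code: int) -> str:
    """Map the status code of a submit response to the client's next step."""
    if status_code == 200:
        # Processing finished within the synchronous window; results are in the body.
        return "use_results"
    if status_code == 202:
        # Still processing: poll the status endpoint or wait for the webhook.
        return "poll_or_wait_for_webhook"
    # Anything else is treated as an error by this sketch.
    return "handle_error"
```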
Common Use Cases
Content Aggregation
Extract articles and blog posts
Monitor news and updates
Collect product information
Data Analysis
Extract structured data
Analyze web content
Generate insights using LLM
Content Transformation
Convert HTML to markdown
Extract clean text content
Generate structured JSON
Best Practices
Use Caching
Enable use_cache for frequently accessed pages. Cached results are available for 15 days.
Caching reduces processing time and AI cost.
Handle Asynchronous Processing
Implement webhook endpoint for notifications
Use status check endpoint with reasonable polling intervals (5-15 minutes)
Handle timeout scenarios gracefully
LLM Processing
Write clear, specific prompts
Consider content length and complexity
Test with sample content first
Error Handling
All API endpoints use standard HTTP response codes:
200: Success
202: Accepted (processing)
400: Bad request
401: Unauthorized
429: Rate limit exceeded
500: Server error
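One way a client might act on these codes is sketched below. Treating 429 and 500 as retryable and 400 and 401 as caller errors is a common client convention, not something this page specifies, and the category names are illustrative.

```python
# Codes this sketch considers worth retrying after a delay (an assumption).
RETRYABLE = {429, 500}


def classify_status(status_code: int) -> str:
    """Group the documented response codes into client-side actions."""
    if status_code == 200:
        return "success"
    if status_code == 202:
        return "processing"
    if status_code in RETRYABLE:
        return "retry_later"
    if status_code in (400, 401):
        # Bad request or missing/invalid API key: retrying unchanged will not help.
        return "fix_request"
    return "unexpected"
```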
Error responses include a detailed message and an error code.
Support
For support requests or questions:
Email: hey@manojlk.work
API Support Portal: https://app.yetanotherapi.com
Changelog
See our changelog for API updates and changes.