YetAnotherAPI Documentation

LLM Web Scraper: Best Practices

1. Prompt Engineering

Clear and Specific Instructions

// ❌ Bad
{
    "prompt": "Get product information"
}

// ✅ Good
{
    "prompt": "Extract the product's name, current price, original price (if on sale), available sizes, and color options. For prices, include the currency symbol."
}
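
If you build prompts in code, listing every field explicitly keeps them as specific as the good example. A minimal sketch in Python: `build_prompt` is a hypothetical helper (not part of YetAnotherAPI), and only the `prompt` field of the request body comes from the examples above.

```python
import json


def build_prompt(fields: dict[str, str]) -> str:
    """Build an extraction prompt that names every field and its rule.

    Hypothetical helper for illustration; not part of YetAnotherAPI.
    `fields` maps a field name to a short description of what to extract.
    """
    parts = [f"{name} ({rule})" for name, rule in fields.items()]
    return "Extract the following fields: " + "; ".join(parts) + "."


# Request body using the "prompt" field shown in the examples above.
body = {
    "prompt": build_prompt({
        "name": "the product's name",
        "current_price": "include the currency symbol",
        "original_price": "only if the product is on sale",
        "sizes": "all available sizes",
        "colors": "all color options",
    })
}
print(json.dumps(body, indent=4))
```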

Define Expected Format

// ❌ Bad
{
    "prompt": "What are the key features of this product?"
}

// ✅ Good
{
    "prompt": "List the product's key features as an array of strings, with each feature being a concise single sentence."
}
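
Stating the format also lets your client verify it before using the data. A minimal sketch, assuming the parsed response holds the list under a `features` key (a hypothetical name; use whichever key your prompt requests):

```python
def check_features(data: dict) -> list[str]:
    """Return the feature list if it matches the requested format.

    Raises ValueError unless data["features"] is a list of non-empty strings.
    The "features" key is a hypothetical name tied to your prompt wording.
    """
    features = data.get("features")
    if not isinstance(features, list):
        raise ValueError("features must be an array")
    for item in features:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid feature entry: {item!r}")
    return features


# Sample data for illustration only.
print(check_features({"features": ["Water-resistant design.", "Week-long battery life."]}))
```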

Include Validation Rules

// ❌ Bad
{
    "prompt": "Get the product price and rating"
}

// ✅ Good
{
    "prompt": "Extract the product price (as a number without currency symbol) and rating (must be between 0 and 5, with one decimal place)"
}
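
The model is not guaranteed to follow the rules in a prompt, so it can help to re-check them on the client. A sketch assuming hypothetical `price` and `rating` keys in the parsed response:

```python
def validate_price_and_rating(data: dict) -> list[str]:
    """Re-check the prompt's rules; return a list of problems (empty if valid).

    Assumes the hypothetical keys "price" and "rating" in the parsed response.
    """
    errors = []
    price = data.get("price")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        errors.append("price must be a number without a currency symbol")
    rating = data.get("rating")
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        errors.append("rating must be a number")
    else:
        if not 0 <= rating <= 5:
            errors.append("rating must be between 0 and 5")
        # A value with more than one decimal place changes when rounded to one.
        if round(rating, 1) != rating:
            errors.append("rating must have at most one decimal place")
    return errors


print(validate_price_and_rating({"price": "$19.99", "rating": 4.5}))
```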

2. Data Structuring

Clear Hierarchy

// ❌ Bad
{
    "prompt": "Get all prices from the page"
}

// ✅ Good
{
    "prompt": "Extract pricing information in this structure: base_price (number), additional_options (array of objects with name and price), discounts (array of objects with description and amount)"
}
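
Mirroring the requested hierarchy in typed objects makes structural drift fail fast. A sketch using Python dataclasses; the class and function names are hypothetical, while the field names follow the good prompt:

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    price: float


@dataclass
class Discount:
    description: str
    amount: float


@dataclass
class Pricing:
    base_price: float
    additional_options: list[Option]
    discounts: list[Discount]


def parse_pricing(data: dict) -> Pricing:
    """Map the structure requested in the prompt onto typed objects.

    Hypothetical helper. A missing field raises KeyError here, close to
    the API call, instead of failing later in application code.
    """
    return Pricing(
        base_price=float(data["base_price"]),
        additional_options=[
            Option(o["name"], float(o["price"])) for o in data["additional_options"]
        ],
        discounts=[
            Discount(d["description"], float(d["amount"])) for d in data["discounts"]
        ],
    )


# Sample data for illustration only.
sample = {
    "base_price": 49.0,
    "additional_options": [{"name": "Gift wrap", "price": 5}],
    "discounts": [{"description": "Spring sale", "amount": 10}],
}
pricing = parse_pricing(sample)
print(pricing)
```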

Handle Missing Data

// ❌ Bad
{
    "prompt": "Get the author's name and bio"
}

// ✅ Good
{
    "prompt": "Extract the author's details with these rules: name (string, use 'Anonymous' if not found), bio (string, use null if not present), role (string, use 'Contributor' if not specified)"
}
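
Applying the same defaults on the client is a safety net in case the model omits a field anyway. A minimal sketch; `normalize_author` is hypothetical and mirrors the rules in the good prompt:

```python
# Defaults copied from the good prompt above.
AUTHOR_DEFAULTS = {"name": "Anonymous", "bio": None, "role": "Contributor"}


def normalize_author(data: dict) -> dict:
    """Fill missing author fields with the prompt's defaults (hypothetical helper).

    Missing keys, None, and empty strings all fall back to the default.
    """
    result = {}
    for key, default in AUTHOR_DEFAULTS.items():
        value = data.get(key)
        result[key] = value if value not in (None, "") else default
    return result


print(normalize_author({"name": "Ada", "role": ""}))
```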

3. Performance Optimization

Focused Extraction

// ❌ Bad
{
    "prompt": "Get everything from the page"
}

// ✅ Good
{
    "prompt": "Extract only the technical specifications table, converting it to a JSON object with spec_name as keys and spec_value as values"
}
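
A focused prompt also yields a response that is easy to check. This sketch assumes the specifications arrive as the flat JSON object the good prompt describes and rejects nested values, which would suggest that more than the table was extracted; the function name and sample data are hypothetical:

```python
import json


def check_spec_table(specs: object) -> dict:
    """Confirm the response is a flat spec_name -> spec_value object.

    Hypothetical helper. A nested object or array suggests the model
    returned more than the specifications table the prompt asked for.
    """
    if not isinstance(specs, dict):
        raise ValueError("expected a JSON object")
    for name, value in specs.items():
        # JSON scalars parse to str, int, float, bool (an int subclass), or None.
        if value is not None and not isinstance(value, (str, int, float)):
            raise ValueError(f"spec {name!r} has a non-scalar value")
    return specs


# Sample data for illustration only.
specs = check_spec_table(json.loads('{"Weight": "1.2 kg", "Cores": 8}'))
print(specs)
```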

Batch Processing

// ❌ Bad: Multiple separate requests
{
    "prompt": "Get product prices"
}
{
    "prompt": "Get product features"
}

// ✅ Good: Single comprehensive request
{
    "prompt": "Extract all product information in a single structured response: prices (object), features (array), specifications (object)"
}
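
When the extraction goals are known up front, they can be folded into one request body programmatically, so you send one request instead of several. A minimal sketch; `combine_prompt` is a hypothetical helper, and the generated prompt reproduces the good example above:

```python
import json


def combine_prompt(sections: dict[str, str]) -> dict:
    """Fold several extraction goals into a single request body.

    Hypothetical helper. `sections` maps each top-level key to the JSON
    type it should hold in the structured response.
    """
    layout = ", ".join(f"{key} ({kind})" for key, kind in sections.items())
    return {
        "prompt": "Extract all product information in a single structured "
        f"response: {layout}"
    }


body = combine_prompt({"prices": "object", "features": "array", "specifications": "object"})
print(json.dumps(body, indent=4))
```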


