Skip to content

OpenRouter Batch API Investigation

Date: 2025-10-11 Status: Investigation Complete Conclusion: No native batch API currently available


Investigation Summary

OpenRouter API Capabilities

Current Features: - Real-time completion API (/v1/chat/completions) - Multi-model routing (400+ models) - OpenAI-compatible interface - Rate limiting per model/provider - FREE tier models available (Qwen, Llama, Gemini)

NOT Available: - ❌ Native batch processing API (like OpenAI Batch API) - ❌ Asynchronous job submission - ❌ 50% batch discount pricing - ❌ Batch result polling/download

Comparison with OpenAI Batch API

Feature OpenAI Batch API OpenRouter Alternative
Batch submission ✅ Upload JSONL ❌ No batch endpoint Streaming + checkpointing
Async processing ✅ 24h window ❌ Real-time only Parallel requests
Cost savings ✅ 50% discount ❌ No discount Use FREE models
Result polling ✅ Status checks ❌ N/A Event-driven monitoring
Resume capability ✅ Built-in ❌ Must implement Custom checkpointing

1. Parallel Streaming with Checkpointing (Implemented)

What we have: - scripts/extract_openrouter.py - Streaming extraction with progress tracking - src/events.py - Event bus for monitoring - io_utils/jit_cache.py - Graceful error handling

How it works:

# Process specimens in parallel (controlled concurrency)
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(extract_specimen, img): img for img in images}

    # Checkpoint after each completion
    for future in as_completed(futures):
        result = future.result()
        save_result(result)  # Immediate write to raw.jsonl

Benefits: - ✅ Automatic checkpointing (resume from any point) - ✅ Real-time monitoring - ✅ Works with FREE models - ✅ Event emission for dashboards

2. Rate-Limited Batch Processing (Easy to implement)

Create a simple batch processor:

def process_batch(images, batch_size=100, delay=1.0):
    """Process images in batches with rate limiting."""
    for i in range(0, len(images), batch_size):
        batch = images[i:i + batch_size]

        # Process batch
        for img in batch:
            result = extract_specimen(img)
            save_result(result)
            time.sleep(delay)  # Rate limiting

        # Checkpoint between batches
        save_checkpoint(i + len(batch))

Benefits: - ✅ Controlled rate limiting - ✅ Natural checkpointing - ✅ Lower API pressure - ✅ Simple to implement

3. Multi-Key Rotation (For high-volume)

Rotate between multiple API keys:

class MultiKeyManager:
    def __init__(self, keys: List[str]):
        self.keys = keys
        self.current_idx = 0

    def get_next_key(self):
        key = self.keys[self.current_idx]
        self.current_idx = (self.current_idx + 1) % len(self.keys)
        return key

Benefits: - ✅ Higher effective rate limits - ✅ Automatic failover - ✅ Cost distribution


Cost Comparison: OpenRouter vs OpenAI Batch

OpenRouter FREE Models (Current Strategy)

  • Cost: $0.00
  • Models: Qwen 2.5 VL 72B, Llama 3.2 Vision, Gemini Flash
  • Rate limits: Varies by model (typically generous)
  • Quality: Comparable to paid models

OpenAI Batch API (If we switched)

  • Cost: ~$0.075/specimen with 50% discount (GPT-4o-mini)
  • For 2,885 specimens: ~$216
  • Processing time: 12-24 hours
  • Quality: Excellent

OpenRouter Paid Models (If we upgraded)

  • Cost: $0.0036/specimen (Qwen paid tier)
  • For 2,885 specimens: ~$10.40
  • Processing time: Real-time (~4-6 hours)
  • Quality: Same as FREE tier, faster/more stable

Recommendation: Stick with OpenRouter FREE models. The $0 cost outweighs the lack of batch API, especially with our robust checkpointing.


Implementation Recommendations

For Remaining 2,230 Specimens

Option A: Parallel Streaming (Recommended)

uv run python scripts/extract_openrouter.py \
    --input ~/.persistent_cache \
    --output full_dataset_processing/resume_run_$(date +%Y%m%d) \
    --model qwen-vl-72b-free \
    --offset 549 \
    --limit 2230

Features: - Uses existing infrastructure - Automatic checkpointing - Event-driven monitoring - Graceful error handling

Option B: Rate-Limited Batches

# Process in 500-specimen batches
for i in 549 1049 1549 2049 2549; do
    uv run python scripts/extract_openrouter.py \
        --input ~/.persistent_cache \
        --output full_dataset_processing/batch_$(date +%Y%m%d_%H%M) \
        --model qwen-vl-72b-free \
        --offset $i \
        --limit 500

    # Wait between batches
    sleep 300  # 5 minute cooldown
done

Features: - Natural rate limiting - Easy to pause/resume - Lower API pressure


Monitoring During Large Runs

Use Existing Monitoring Tools

Terminal Monitor:

./scripts/tmux-monitor.sh full_dataset_processing/resume_run_20251011

Web Dashboard:

uv run python scripts/web_monitor.py --port 5001 &
open http://127.0.0.1:5001

Features: - Real-time progress tracking - Success/failure rates - Field quality metrics - Event stream monitoring - Throughput calculation


Conclusion

Batch API Status: ❌ Not available on OpenRouter

Recommended Strategy: 1. ✅ Use parallel streaming with checkpointing (already implemented) 2. ✅ Leverage FREE models ($0 cost) 3. ✅ Implement rate limiting to avoid API issues 4. ✅ Use existing monitoring infrastructure 5. ✅ Resume capability via offset parameter

Cost Savings: - OpenRouter FREE: $0 - vs OpenAI Batch: ~$216 - Savings: $216 💰

Quality: - FREE models perform comparably to paid alternatives - 549 specimens showed 64% completeness (acceptable for curation) - Review system enables manual improvement

Next Steps: 1. Re-extract remaining 2,230 specimens using parallel streaming 2. Use JIT caching to prevent /tmp failures 3. Monitor with unified dashboard 4. Review and curate results with new review system


Last Updated: 2025-10-11 Investigator: Claude Code (Overnight Session) Recommendation: Proceed with current infrastructure, no batch API needed

[AAFC]: Agriculture and Agri-Food Canada [GBIF]: Global Biodiversity Information Facility [DwC]: Darwin Core [OCR]: Optical Character Recognition [API]: Application Programming Interface [CSV]: Comma-Separated Values [IPT]: Integrated Publishing Toolkit [TDWG]: Taxonomic Databases Working Group