# Cloud API Setup Guide - 7 Vision APIs
Complete setup instructions for all supported cloud vision APIs for herbarium OCR.
## 🎯 API Overview & Strategy

### Cost-Optimized Cascade Strategy

- Budget APIs ($1-1.50/1000): Azure → Google → AWS Textract
- Premium APIs ($2.50-15/1000): Gemini → GPT-4o → Claude
- Ultra-Premium ($50/1000): GPT-4 Vision (emergency only)
### Platform Recommendations
- macOS: Apple Vision (free, 95% accuracy) + premium APIs for difficult cases
- Windows: Azure primary + cascade fallback for comprehensive coverage
- Linux: Google Vision primary + multi-cloud fallback
## 1. 🔵 Microsoft Azure Computer Vision (RECOMMENDED FOR WINDOWS)

### Why Azure First?
- Lowest cost: $1.00/1000 images
- Windows integration: Best Microsoft ecosystem support
- Handwriting detection: Good for herbarium labels
- Enterprise support: Institutional billing available
### Setup Steps

- Create Azure Account: https://azure.microsoft.com/en-us/free/
- Create a Computer Vision resource in the Azure portal
- Get Subscription Key:
    - Go to your Computer Vision resource
    - Copy Key 1 and the Endpoint URL
- Configure Environment (see the sketch below):
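A minimal `.env` sketch: the subscription-key variable name matches the quick-setup section below, while the endpoint variable name is an assumption for illustration.

```bash
# Key 1 from the resource (variable name matches the quick-setup section below)
AZURE_COMPUTER_VISION_SUBSCRIPTION_KEY=your-azure-key
# Endpoint URL from the resource overview (variable name is an assumption)
AZURE_COMPUTER_VISION_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
```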
### Test Setup
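A quick smoke test, reusing the `check-deps` and `test-api` commands shown in the troubleshooting section below:

```bash
# Verify the Azure credentials are detected
python cli.py check-deps --engines azure --verbose
# Run a single sample image through Azure
python cli.py test-api --engine azure --sample-image test.jpg
```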
## 2. 🟢 Google Vision API

### Why Google Vision?
- Proven reliability: Most tested cloud OCR
- Good accuracy: 85% on herbarium specimens
- Reasonable cost: $1.50/1000 images
- Document detection: Specialized for text extraction
### Setup Steps

- Create Google Cloud Project: https://console.cloud.google.com/
- Enable the Vision API for the project
- Create a Service Account and download its JSON key
- Configure Environment (see the sketch below):
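A minimal `.env` sketch, using the variable name from the quick-setup section below; point it at wherever you saved the service-account JSON:

```bash
# Path to the service-account JSON key downloaded above
GOOGLE_APPLICATION_CREDENTIALS=.google-credentials.json
```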
### Test Setup
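The same smoke test as for Azure, with the `google` engine:

```bash
python cli.py check-deps --engines google --verbose
python cli.py test-api --engine google --sample-image test.jpg
```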
## 3. 🟠 AWS Textract

### Why AWS Textract?
- Document analysis: Excellent for structured forms
- Table extraction: Handles herbarium data sheets
- AWS integration: Good for existing AWS infrastructure
- Same cost as Google: $1.50/1000 images
### Setup Steps

- Create AWS Account: https://aws.amazon.com/
- Create an IAM user with Textract permissions
- Configure the AWS CLI or environment variables (see the sketch below):
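A minimal `.env` sketch: the key variable names match the quick-setup section below, and the region variable is an assumption for illustration.

```bash
# IAM user credentials (variable names match the quick-setup section below)
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
# Region is an assumption; use the region where Textract is enabled for you
AWS_DEFAULT_REGION=us-east-1
```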
### Test Setup
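The same smoke test, with the `textract` engine name from the quick-setup section:

```bash
python cli.py check-deps --engines textract --verbose
python cli.py test-api --engine textract --sample-image test.jpg
```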
## 4. 🟡 Google Gemini Vision

### Why Gemini?
- Latest AI: Google's newest multimodal model
- Scientific reasoning: Good botanical context understanding
- Moderate cost: $2.50/1000 images
- High accuracy: ~90% on complex specimens
### Setup Steps

- Get Gemini API Key: https://aistudio.google.com/app/apikey
- Configure Environment (see the sketch below)
- Enable Safety Settings (optional)
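A minimal `.env` sketch, using the variable name from the quick-setup section below (note this is the AI Studio key, not a Google Cloud key):

```bash
# Gemini key from Google AI Studio (not Google Cloud)
GOOGLE_API_KEY=your-gemini-key
```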
### Test Setup
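The same smoke test, with the `gemini` engine:

```bash
python cli.py check-deps --engines gemini --verbose
python cli.py test-api --engine gemini --sample-image test.jpg
```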
## 5. 🔴 OpenAI GPT-4o Vision

### Why GPT-4o?
- Speed: Faster than GPT-4 Vision
- Cost-effective: $2.50/1000 vs $50/1000 for GPT-4
- High accuracy: 95% on herbarium specimens
- Botanical context: Excellent understanding of scientific terms
### Setup Steps

- Create OpenAI Account: https://platform.openai.com/
- Generate API Key: https://platform.openai.com/api-keys
- Configure Environment (see the sketch below)
- Set the model in `config/config.local.toml`
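A minimal `.env` sketch, using the variable name from the quick-setup section below; the model selection itself lives in `config/config.local.toml`, whose exact key names are project-specific and not assumed here:

```bash
# OpenAI key (variable name matches the quick-setup section below)
OPENAI_API_KEY=your-openai-key
```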
### Test Setup
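The same smoke test, with the `gpt4o` engine name from the quick-setup section:

```bash
python cli.py check-deps --engines gpt4o --verbose
python cli.py test-api --engine gpt4o --sample-image test.jpg
```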
## 6. 🟣 Anthropic Claude Vision

### Why Claude Vision?
- Highest accuracy: 98% on herbarium specimens
- Botanical expertise: Excellent scientific reasoning
- Context understanding: Handles complex layouts
- Premium pricing: $15/1000 images
### Setup Steps

- Create Anthropic Account: https://console.anthropic.com/
- Generate API Key: in your dashboard
- Configure Environment (see the sketch below)
- Enable Botanical Context
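A minimal `.env` sketch, using the variable name from the quick-setup section below:

```bash
# Anthropic key for Claude Vision
ANTHROPIC_API_KEY=your-claude-key
```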
### Test Setup
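The same smoke test, with the `claude` engine:

```bash
python cli.py check-deps --engines claude --verbose
python cli.py test-api --engine claude --sample-image test.jpg
```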
## 7. 🔴 OpenAI GPT-4 Vision (EMERGENCY FALLBACK)

### Why GPT-4 Vision Last?

- Ultra-premium: $50/1000 images (50x more than Azure)
- High accuracy: 95%, but not worth the cost premium
- Emergency only: use when all other APIs fail
### Setup Steps

Same as GPT-4o, but select the GPT-4 Vision engine instead; see the sketch below.
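A hedged sketch: the same `OPENAI_API_KEY` is reused, and only the engine selection changes (the `gpt` engine name comes from the dependency check in the quick-setup section below):

```bash
# Reuses OPENAI_API_KEY from the GPT-4o setup; only the engine differs
python cli.py check-deps --engines gpt --verbose
```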
## 🚀 Quick Setup Commands

### Complete Windows Setup (All APIs)

```bash
# 1. Install dependencies
uv sync

# 2. Copy Windows configuration
cp config/config.windows.toml config/config.local.toml

# 3. Add all API keys to .env
cat >> .env << EOF
AZURE_COMPUTER_VISION_SUBSCRIPTION_KEY=your-azure-key
GOOGLE_APPLICATION_CREDENTIALS=.google-credentials.json
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
GOOGLE_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-claude-key
EOF

# 4. Test all APIs
python cli.py check-deps --engines azure,google,textract,gemini,gpt4o,claude,gpt
```
### Budget-Only Setup (Minimum Cost)

```bash
# Set up only the budget APIs: Azure + Google
python cli.py process --engines azure,google --input photos/ --output results/
# Expected cost: ~$1-1.50 per 1000 specimens
```
### Premium Setup (Maximum Accuracy)

```bash
# Set up the premium cascade: Azure → Google → Claude
python cli.py process --engines azure,google,claude --input photos/ --output results/
# Expected cost: ~$1-15 per 1000 specimens (adaptive)
```
## 💰 Cost Management

### Budget Controls

```bash
# Set daily and weekly spending limits
python cli.py process --input photos/ --output results/ \
    --max-daily-cost 50 --max-weekly-cost 200
```
### Cost Monitoring

```bash
# Track spending by API
python cli.py stats --db results/app.db --show-api-costs

# Generate cost report
python cli.py report --db results/app.db --format cost-breakdown
```
### Cost Optimization Tips

- Start with Azure/Google for 80-85% accuracy at $1-1.50/1000
- Use premium APIs selectively, for low-confidence cases only
- Process in batches to manage daily spending
- Review confidence thresholds to optimize API usage
- Manual review is often cheaper than ultra-premium APIs
### ROI Comparison

Costs for processing 1000 specimens:

| Strategy | API cost | Manual review | Total | Savings |
|----------|----------|---------------|-------|---------|
| Manual transcription (40 hours @ $40/hour) | $0 | $1600 | $1600 | baseline |
| Azure primary | $1.00 | ~$200 | ~$201 | 87% |
| Google primary | $1.50 | ~$150 | ~$151.50 | 91% |
| Claude premium | $15.00 | ~$50 | ~$65 | 96% |
| Mixed strategy (optimal) | $3-8 | ~$100 | ~$103-108 | 93-94% |
## 🔧 Troubleshooting

### Common Issues

**API Authentication Failures**

```bash
# Check all environment variables
python cli.py check-deps --engines all --verbose

# Test individual APIs
python cli.py test-api --engine azure --sample-image test.jpg
```
**Cost Overruns**

```bash
# Check current spending
python cli.py stats --db results/app.db --show-costs

# Reset daily limits
python cli.py config --set daily_cost_limit 25.00
```
**Poor Results from Budget APIs**

```bash
# Try the next tier up
python cli.py process --engines google,gemini --input photos/ --output results/

# Or focus on specific problem cases
python cli.py process --input photos/ --output results/ \
    --filter "confidence < 0.80" --engine claude
```
### API-Specific Issues

- Azure: Ensure the correct region in the endpoint URL
- Google: Service account JSON must be valid and accessible
- AWS: Check IAM permissions for Textract access
- Gemini: API key must be from Google AI Studio, not Google Cloud
- OpenAI: Ensure sufficient credit balance in account
- Claude: Verify API key is for Claude 3.5, not older models
## 📊 Performance Expectations

### Accuracy by API Type
- Budget APIs: 80-85% accuracy (Azure, Google, AWS)
- Premium APIs: 90-95% accuracy (Gemini, GPT-4o)
- Ultra-Premium: 95-98% accuracy (Claude, GPT-4)
### Speed by API
- Fastest: Google Vision (~0.5s per image)
- Fast: Azure, AWS Textract (~1s per image)
- Medium: Gemini, GPT-4o (~2-3s per image)
- Slower: Claude, GPT-4 (~3-5s per image)
### Reliability by Provider
- Most Reliable: Google (Vision & Gemini)
- Enterprise Grade: Microsoft Azure, AWS
- Premium Quality: Anthropic Claude
- Versatile: OpenAI (GPT-4o & GPT-4)
**Next Step:** Choose your APIs based on budget and accuracy needs, then run your first batch test!

```bash
# Recommended first test (50 specimens)
python cli.py process --input test_batch/ --output test_results/ \
    --config config/config.windows.toml --max-cost 5.00
```