
Installation

Get started with Herbarium DWC Extraction in minutes.


Requirements

  • Python: 3.11 or higher
  • Disk space: ~1GB for dependencies, ~5GB for image cache
  • Memory: 4GB minimum (8GB recommended for large batches)
  • OS: macOS (recommended), Linux, Windows
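The Python version and disk space requirements above can be checked before installing. The following is a minimal sketch, not part of the project: the function name `check_requirements` and the 6 GB threshold (1 GB for dependencies plus 5 GB for the image cache) are assumptions made for illustration. Memory is not checked because the standard library has no portable way to query it.

```python
import shutil
import sys


def check_requirements(min_python=(3, 11), min_free_gb=6.0, path="."):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: thresholds mirror the Requirements list, not
    anything enforced by the project itself.
    """
    problems = []

    # Compare only (major, minor) against the required version.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")

    # Free space on the filesystem that holds `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"{min_free_gb:.0f} GB free disk recommended, found {free_gb:.1f} GB"
        )

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Run it from the directory where you plan to clone the repository so the disk check looks at the right filesystem.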

Quick Install

# Clone repository
git clone https://github.com/devvyn/aafc-herbarium-dwc-extraction-2025.git
cd aafc-herbarium-dwc-extraction-2025

# Install dependencies
./bootstrap.sh

# Verify installation
python cli.py --help

Platform-Specific Setup

macOS

Apple Vision works out of the box: it is free and requires no API keys.

# Check available engines
python cli.py check-deps

# Expected output:
# ✓ Apple Vision - Available (FREE)
# ✓ Python environment - OK

Linux/Windows

Requires cloud API keys for vision extraction.

  1. Copy environment template:

    cp .env.example .env
    

  2. Add API keys to .env:

    # OpenAI (for GPT-4o-mini extraction)
    OPENAI_API_KEY="your-key-here"
    
    # OpenRouter (for FREE models - recommended)
    OPENROUTER_API_KEY="your-key-here"
    
    # Optional: Other providers
    # ANTHROPIC_API_KEY=""
    # GOOGLE_API_KEY=""
    

  3. Get API keys:

    • OpenAI API
    • OpenRouter - FREE tier available
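Before running an extraction, it can help to confirm that the `.env` file holds real keys rather than the template placeholders. The sketch below is illustrative only: `configured_keys` is a hypothetical helper, it assumes simple `KEY="value"` lines like those in the template above, and it treats `your-key-here` as unset. It does not validate keys against any provider.

```python
from pathlib import Path


def configured_keys(env_text, names=("OPENAI_API_KEY", "OPENROUTER_API_KEY")):
    """Return the subset of `names` that have a non-placeholder value.

    Hypothetical helper: parses plain KEY=value lines only and ignores
    comments, blank lines, and anything without an equals sign.
    """
    found = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop surrounding quotes
        if key in names and value and value != "your-key-here":
            found.add(key)
    return found


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file found; copy .env.example first.")
    else:
        keys = configured_keys(env_file.read_text())
        print("Configured keys:", ", ".join(sorted(keys)) or "none")
```

On Linux or Windows, an empty result means no cloud provider is configured yet.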

Development Setup

For contributors and developers:

# Install with dev dependencies
uv sync

# Run tests
pytest

# Run linter
ruff check . --fix

# Build documentation
mkdocs serve

Docker Installation (Optional)

# Build image
docker build -t herbarium-dwc .

# Run extraction (paths are quoted so directories containing spaces still mount)
docker run -v "$(pwd)/photos:/photos" \
           -v "$(pwd)/results:/results" \
           -e OPENAI_API_KEY="$OPENAI_API_KEY" \
           herbarium-dwc \
           python cli.py process --input /photos --output /results

Verification

Test your installation:

# Check all dependencies
python cli.py check-deps

# Run test extraction (if you have sample images)
python cli.py process --input test_images/ --output test_results/ --limit 1

Troubleshooting

Common Issues

1. uv command not found

Install uv package manager:

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Python version mismatch

Ensure Python 3.11+:

python --version  # Should be 3.11 or higher

# If not, install via:
# macOS: brew install python@3.11
# Ubuntu: sudo apt install python3.11
# Windows: Download from python.org

3. Apple Vision not available

Apple Vision only works on macOS. On other platforms, use cloud APIs.

4. Out of memory errors

Reduce batch size:

python cli.py process --input photos/ --output results/ --batch-size 10
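To see why a smaller batch size lowers peak memory, consider a generic batching loop: only one batch of images needs to be held at a time, so the batch size caps how much is loaded at once. This page does not show how the tool batches internally, so the sketch below is an assumption-laden illustration; `chunked` and the file names are hypothetical.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most `batch_size` items.

    Hypothetical helper illustrating batching in general; it is not
    taken from the project's code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `batch_size` items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


if __name__ == "__main__":
    # Hypothetical file names standing in for a photo directory.
    images = [f"photos/specimen_{i:03d}.jpg" for i in range(25)]
    for number, batch in enumerate(chunked(images, 10), start=1):
        print(f"batch {number}: {len(batch)} images")
```

With 25 items and a batch size of 10, the loop produces batches of 10, 10, and 5 items.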


Next Steps

After installation, you can:

  • Run your first extraction - Use the test extraction command under Verification above
  • Explore sample code - Check examples/ directory in the repository
  • Review documentation - See the GitHub README for complete usage guide
