Installation
Get started with Herbarium DWC Extraction in minutes.
Requirements
- Python: 3.11 or higher
- Disk space: ~1GB for dependencies, ~5GB for image cache
- Memory: 4GB minimum (8GB recommended for large batches)
- OS: macOS (recommended), Linux, Windows
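To confirm the Python requirement before running bootstrap.sh, a check like the one below can help. It is a minimal sketch, not part of the repository: the function name check_python_version is hypothetical, and it assumes the interpreter is on PATH as python3 unless you pass another name (for example python).

```shell
# Hypothetical helper (not shipped with the project): verify Python >= 3.11.
# Assumption: the interpreter is "python3" unless another name is passed.
check_python_version() {
  local py="${1:-python3}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "error: $py not found on PATH" >&2
    return 2
  fi
  # sys.version_info compares as a tuple, so (3, 12, 1) >= (3, 11) is True.
  if "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)'; then
    echo "ok: $("$py" --version 2>&1)"
  else
    echo "error: $("$py" --version 2>&1) is older than 3.11" >&2
    return 1
  fi
}

# Usage: check_python_version          (checks python3)
#        check_python_version python   (checks a differently named interpreter)
```

A return code of 2 means the interpreter was not found at all, and 1 means it was found but is too old.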
Quick Install
# Clone repository
git clone https://github.com/devvyn/aafc-herbarium-dwc-extraction-2025.git
cd aafc-herbarium-dwc-extraction-2025
# Install dependencies
./bootstrap.sh
# Verify installation
python cli.py --help
Platform-Specific Setup
macOS (Recommended)
✅ The Apple Vision API works out of the box (FREE, no API keys required)
# Check available engines
python cli.py check-deps
# Expected output:
# ✓ Apple Vision - Available (FREE)
# ✓ Python environment - OK
Linux/Windows
Apple Vision is not available on these platforms, so vision extraction requires cloud API keys.
- Copy the environment template to a new .env file.
- Add your API keys to .env.
- Get API keys from:
  - OpenAI API
  - OpenRouter - FREE tier available
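Before running an extraction, it can be useful to confirm that .env actually defines a key. The helper below is a hypothetical sketch: it assumes simple KEY=value lines (optionally prefixed with export), and OPENAI_API_KEY is used only because that name appears in the Docker example. The variable names other providers such as OpenRouter expect are not confirmed here.

```shell
# Hypothetical helper: succeed if FILE defines a non-empty value for KEY.
# Assumes plain KEY=value lines; commented-out lines and empty values
# (KEY= or KEY="") do not count as set.
env_has_key() {
  local file="$1" key="$2"
  [ -f "$file" ] || return 1
  # After "KEY=", allow one optional opening quote, then require at least
  # one character that is not a quote or whitespace.
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=[\"']?[^\"'[:space:]]" "$file"
}

# Usage: env_has_key .env OPENAI_API_KEY && echo "OpenAI key is set"
```

This only checks that a value is present; it does not verify the key with the provider.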
Development Setup
For contributors and developers:
# Install with dev dependencies
uv sync
# Run tests
pytest
# Run linter
ruff check . --fix
# Serve documentation locally with live reload (mkdocs build writes a static site)
mkdocs serve
Docker Installation (Optional)
# Build image
docker build -t herbarium-dwc .
# Run extraction
docker run -v "$(pwd)/photos:/photos" \
-v "$(pwd)/results:/results" \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
herbarium-dwc \
python cli.py process --input /photos --output /results
Verification
Test your installation:
# Check all dependencies
python cli.py check-deps
# Run test extraction (if you have sample images)
python cli.py process --input test_images/ --output test_results/ --limit 1
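The verification steps can also be chained in a small wrapper that stops at the first failing command. This is a hypothetical convenience function, not part of the project; it simply runs each command string with sh -c and reports which one failed.

```shell
# Hypothetical wrapper: run each command string in order, stop at the
# first failure, and report it on stderr.
run_checks() {
  local cmd
  for cmd in "$@"; do
    echo "==> $cmd"
    if ! sh -c "$cmd"; then
      echo "FAILED: $cmd" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Usage: run_checks "python cli.py --help" "python cli.py check-deps"
```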
Troubleshooting
Common Issues
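1. uv command not found
Install the uv package manager:
# macOS/Linux (official installer script):
curl -LsSf https://astral.sh/uv/install.sh | sh
# Any platform with pip:
pip install uv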
2. Python version mismatch
Ensure Python 3.11+:
python --version # Should be 3.11 or higher
# If not, install via:
# macOS: brew install python@3.11
# Ubuntu: sudo apt install python3.11
# Windows: Download from python.org
3. Apple Vision not available
Apple Vision only works on macOS. On other platforms, use cloud APIs.
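4. Out of memory errors
Reduce the batch size by processing fewer images per run. The --limit option used in the Verification command restricts how many images a run processes.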
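If a single run over a large folder exhausts memory, one workaround is to split the input into smaller folders and process them one at a time. The sketch below is hypothetical: make_chunks and the chunk_NNN folder names are illustrative, it links rather than copies files, and it assumes python cli.py process reads every file in its --input folder and follows symbolic links. Links to host paths will not resolve inside the Docker container, so this suits a local install.

```shell
# Hypothetical helper: spread the files in SRC across DEST/chunk_000,
# DEST/chunk_001, ... with at most SIZE files each, using symlinks so no
# image data is copied. Prints the number of files linked.
make_chunks() {
  local src="$1" dest="$2" size="$3" abs_src chunk f
  local i=0
  [ "$size" -gt 0 ] 2>/dev/null || { echo "size must be a positive integer" >&2; return 1; }
  abs_src=$(cd "$src" && pwd) || return 1
  mkdir -p "$dest" || return 1
  for f in "$abs_src"/*; do
    [ -f "$f" ] || continue  # skip subdirectories and an unmatched glob
    chunk=$(printf '%s/chunk_%03d' "$dest" $(( i / size )))
    mkdir -p "$chunk"
    ln -s "$f" "$chunk/"
    i=$(( i + 1 ))
  done
  echo "$i"
}

# Usage (assumed workflow):
#   make_chunks photos photo_chunks 50
#   for d in photo_chunks/chunk_*; do
#     python cli.py process --input "$d" --output "results/$(basename "$d")"
#   done
```

Linking keeps disk usage flat; copying with cp instead would work where symlinks are not followed, at the cost of duplicating the images.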
Next Steps
After installation, you can:
- Run your first extraction - Adapt the process command from Verification to your own image folder
- Explore sample code - Check the examples/ directory in the repository
- Review documentation - See the GitHub README for the complete usage guide