Quick Start Guide for Successor

What You're Inheriting

2,800 herbarium specimen photos already captured and processed
Complete OCR toolkit for extracting label text from specimen images
Review workflows for correcting and validating extracted data
SharePoint integration for institutional data handoff

Day 1: Get Running

Check the processed data: Look in output/occurrence.csv for extracted specimen records
Review flagged items: Run python review_web.py to open the web interface and correct low-confidence results
Export to SharePoint: Run export scripts to transfer data to institutional systems
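
As a first look at the processed data, a short script can count the records in output/occurrence.csv and show which columns have the most empty cells. This is a minimal sketch, not part of the toolkit: the function name summarize_occurrences is hypothetical, and it assumes the file is a UTF-8 CSV with a header row.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_occurrences(csv_path):
    """Return (row_count, blanks); blanks maps column name -> number of empty cells.

    Hypothetical helper: assumes a UTF-8 CSV with a header row.
    """
    blanks = Counter()
    row_count = 0
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            row_count += 1
            for column, value in row.items():
                if column is None:
                    # Extra cells beyond the header row; skip them.
                    continue
                # A short row yields None for missing cells; treat as blank.
                if value is None or not value.strip():
                    blanks[column] += 1
    return row_count, dict(blanks)


if __name__ == "__main__":
    path = Path("output/occurrence.csv")  # path taken from this guide
    if path.exists():
        count, blanks = summarize_occurrences(path)
        print(f"{count} records")
        for column, empty in sorted(blanks.items(), key=lambda item: -item[1]):
            print(f"{column}: {empty} empty")
```

Columns with many empty cells are likely the best places to start when filling gaps by hand.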

Your Main Tasks

Quality control: Review OCR results and make corrections
Photography: Continue photographing remaining specimens (if any)
Data entry: Fill gaps in extracted information
Institutional delivery: Regular exports to SharePoint and institutional databases

Key Commands

# Process new photos
python cli.py process --input photos/ --output results/

# Review and correct results
python review_web.py --db results/candidates.db --images photos/

# Export a versioned Darwin Core bundle
python cli.py export --output results/ --version 1.1.0

# Check processing status
sqlite3 results/app.db "SELECT status, COUNT(*) FROM processing_state GROUP BY status;"
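
If the sqlite3 command-line tool is not installed, the same status check can be run from Python's standard library. This is a sketch under the assumption that results/app.db contains the processing_state table with a status column, as the query above implies; the function name status_counts is hypothetical.

```python
import sqlite3
from pathlib import Path


def status_counts(db_path):
    """Return {status: count} from the processing_state table.

    Hypothetical helper: table and column names are taken from the
    sqlite3 command in this guide.
    """
    connection = sqlite3.connect(str(db_path))
    try:
        rows = connection.execute(
            "SELECT status, COUNT(*) FROM processing_state GROUP BY status"
        ).fetchall()
    finally:
        connection.close()
    return dict(rows)


if __name__ == "__main__":
    db = Path("results/app.db")
    # Check first: sqlite3.connect would otherwise create an empty file.
    if db.exists():
        for status, count in status_counts(db).items():
            print(f"{status}: {count}")
```

The existence check matters because sqlite3.connect silently creates a missing database file, after which the query fails with "no such table".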

Where Everything Lives

Photos: input/ directory
Results: output/ directory
Spreadsheets: output/*.csv (these are the files exported to SharePoint)
Documentation: docs/ directory
Configuration: config/ directory
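
Before starting work on a fresh checkout or a new machine, it can help to confirm that the directories listed above are present. The following is a minimal sketch, assuming the four directories sit directly under the project root; missing_dirs and EXPECTED_DIRS are hypothetical names, not part of the toolkit.

```python
from pathlib import Path

# Directory names taken from the list in this guide.
EXPECTED_DIRS = ("input", "output", "docs", "config")


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected directory names that are absent under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    absent = missing_dirs(".")
    if absent:
        print("Missing directories:", ", ".join(absent))
    else:
        print("All expected directories are present.")
```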

When You Need Help

Check docs/troubleshooting.md for common issues
Review docs/user_guide.md for detailed workflows
Find contact information in HANDOVER_PRIORITIES.md

Network Setup

On herbarium network: Full access to SharePoint and email
Offline: Can still process photos and generate spreadsheets
Sync later: Upload results when back on network

Start here: Review the results for the 2,800 already-processed photos first, then continue with any remaining specimens.

[AAFC]: Agriculture and Agri-Food Canada
[GBIF]: Global Biodiversity Information Facility
[DwC]: Darwin Core
[OCR]: Optical Character Recognition
[API]: Application Programming Interface
[CSV]: Comma-Separated Values
[IPT]: Integrated Publishing Toolkit
[TDWG]: Taxonomic Databases Working Group