Quick Start Guide for Successor
What You're Inheriting
- 2,800 herbarium specimen photos already captured and processed
- Complete OCR toolkit for extracting label text from specimen images
- Review workflows for correcting and validating extracted data
- SharePoint integration for institutional data handoff
Day 1: Get Running
- Check the processed data: look in `output/occurrence.csv` for extracted specimen records
- Review flagged items: use the web interface (`python review_web.py`) to correct low-confidence results
- Export to SharePoint: run the export scripts to transfer data to institutional systems
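For a first look at the processed data, a short script can report how many records were extracted and which columns have the most blank cells. This is a minimal sketch, assuming `output/occurrence.csv` is a UTF-8 CSV with a header row; the function name `summarize_occurrences` is hypothetical and not part of the toolkit.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_occurrences(csv_path):
    """Return (record_count, blank_counts) for a CSV file with a header row.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    blank_counts = Counter()
    record_count = 0
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            record_count += 1
            for column, value in row.items():
                # Skip overflow cells (key None); count missing or
                # whitespace-only cells as blank.
                if column is not None and not (value or "").strip():
                    blank_counts[column] += 1
    return record_count, dict(blank_counts)


if __name__ == "__main__":
    path = Path("output/occurrence.csv")  # location taken from this guide
    if path.exists():
        total, blanks = summarize_occurrences(path)
        print(f"{total} records")
        for column, count in sorted(blanks.items(), key=lambda item: -item[1]):
            print(f"{column}: {count} blank")
    else:
        print(f"{path} not found; run this from the project root")
```

Columns with many blanks are a reasonable place to start the data-entry work, since they show where extraction left gaps.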
Your Main Tasks
- Quality control: Review OCR results and make corrections
- Photography: Continue photographing remaining specimens (if any)
- Data entry: Fill gaps in extracted information
- Institutional delivery: Regular exports to SharePoint and institutional databases
Key Commands
# Process new photos
python cli.py process --input photos/ --output results/
# Review and correct results
python review_web.py --db results/candidates.db --images photos/
# Export a versioned Darwin Core bundle
python cli.py export --output results/ --version 1.1.0
# Check processing status
sqlite3 results/app.db "SELECT status, COUNT(*) FROM processing_state GROUP BY status;"
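If the `sqlite3` command-line tool is not installed, the same status check can be run from Python's standard library. This is a sketch that assumes only what the command above shows: a `processing_state` table with a `status` column in `results/app.db`. The helper name `status_counts` is hypothetical.

```python
import sqlite3
from pathlib import Path


def status_counts(db_path):
    """Return {status: count} from the processing_state table.

    Hypothetical helper; it runs the same query as the sqlite3 command.
    """
    connection = sqlite3.connect(str(db_path))
    try:
        rows = connection.execute(
            "SELECT status, COUNT(*) FROM processing_state GROUP BY status"
        ).fetchall()
    finally:
        connection.close()
    return dict(rows)


if __name__ == "__main__":
    db_path = Path("results/app.db")  # path taken from the command in this guide
    if not db_path.exists():
        # Checked first because sqlite3.connect would create an empty file.
        print(f"{db_path} not found")
    else:
        try:
            for status, count in status_counts(db_path).items():
                print(f"{status}: {count}")
        except sqlite3.OperationalError as error:
            print(f"Could not read processing_state: {error}")
```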
Where Everything Lives
- Photos: `input/` directory
- Results: `output/` directory
- Spreadsheets: export to SharePoint via `output/*.csv`
- Documentation: `docs/` directory
- Configuration: `config/` directory
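Before starting work on a fresh checkout or a new machine, it can help to confirm that the directories listed above are present. The following is a minimal sketch; the helper name `missing_directories` is hypothetical, and the directory names are taken from the list above.

```python
from pathlib import Path

# Directory names as listed in this guide.
EXPECTED_DIRS = ("input", "output", "docs", "config")


def missing_directories(project_root="."):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_directories()
    if missing:
        print("Missing directories: " + ", ".join(missing))
    else:
        print("All expected directories are present")
```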
When You Need Help
- Check `docs/troubleshooting.md` for common issues
- Review `docs/user_guide.md` for detailed workflows
- Contact information is in `HANDOVER_PRIORITIES.md`
Network Setup
- On herbarium network: Full access to SharePoint and email
- Offline: Can still process photos and generate spreadsheets
- Sync later: Upload results when back on network
Start here: Review the extracted results for the 2,800 existing photos first, then continue with any remaining specimens.