Handover Priorities - 2 Months Remaining
Current Status
- 2,800 photos captured from pressed plant specimens
- Physical access: Herbarium for specimens/equipment, camera, photography box
- Network access: Work network for email, SharePoint, Microsoft services
- Original task: Digitize specimens + create spreadsheet of labels/DAS numbers
- Goal: Maximize tools and workflows for successor
Immediate Priorities (Next 2 Months)
Phase 1: Process Existing Photos (Week 1-2)
Objective: Get maximum value from the 2,800 photos already captured
- Bulk OCR Processing
  - Run toolkit on all 2,800 images using fastest engines (Vision/Tesseract)
  - Generate initial CSV with extracted labels and confidence scores
  - Flag low-confidence results for manual review
- Quick Quality Assessment
  - Sample 100-200 results to assess OCR accuracy
  - Identify common failure patterns
  - Document preprocessing needs (lighting, contrast, rotation)
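The flagging and sampling steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the toolkit's actual interface: the column names (`image`, `label_text`, `confidence`), the 0.80 threshold, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import random

def split_by_confidence(rows, threshold=0.80):
    """Separate OCR rows into accepted and needs-review lists.

    The threshold is an assumed starting point; tune it after the
    first quality assessment.
    """
    accepted, review = [], []
    for row in rows:
        # A missing or unparsable confidence is treated as low confidence.
        try:
            conf = float(row.get("confidence"))
        except (TypeError, ValueError):
            conf = 0.0
        (accepted if conf >= threshold else review).append(row)
    return accepted, review

def qa_sample(rows, size=150, seed=0):
    """Draw a reproducible random sample for manual accuracy checks."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))

# Made-up rows standing in for the real OCR output CSV.
SAMPLE_CSV = """image,label_text,confidence
IMG_0001.jpg,Carex aquatilis,0.93
IMG_0002.jpg,C?rex sp.,0.41
IMG_0003.jpg,Poa pratensis,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
accepted, review = split_by_confidence(rows)
print(len(accepted), "accepted;", len(review), "flagged for review")
```

Fixing the random seed means the same 100-200 rows can be re-drawn later, which makes the accuracy assessment repeatable for the successor.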
Phase 2: Streamline Review Workflow (Week 3-4)
Objective: Create efficient review process for successor
- Optimize Review Interface
  - Set up web review interface for side-by-side image/text comparison
  - Configure batch approval workflows
  - Create shortcuts for common corrections (collector names, locations)
- Institutional Integration
  - Export review-ready spreadsheets to SharePoint
  - Set up import/export workflows with Microsoft services
  - Document network connectivity requirements
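One way to implement the correction shortcuts mentioned above is a per-field lookup table of known OCR misreads. The sketch below is hypothetical: the field names (`collector`, `location`) and every table entry are invented for illustration, and the real table would be built up from the failure patterns found during review.

```python
# Hypothetical lookup tables: lowercase, whitespace-collapsed misread -> canonical form.
CORRECTIONS = {
    "collector": {
        "j. smlth": "J. Smith",
        "j smith": "J. Smith",
    },
    "location": {
        "0ttawa": "Ottawa",
        "ottawa, ont": "Ottawa, Ontario",
    },
}

def apply_corrections(record, corrections=CORRECTIONS):
    """Return a copy of the record with known misreads replaced."""
    fixed = dict(record)
    for field, table in corrections.items():
        value = fixed.get(field)
        if value is None:
            continue
        # Normalize case and spacing before looking up the misread.
        key = " ".join(value.split()).lower()
        # Values with no table entry are left exactly as they were.
        fixed[field] = table.get(key, value)
    return fixed

example = apply_corrections({"collector": "J.  Smlth", "location": "0ttawa"})
print(example)
```

Keeping the table in a plain data structure means the successor can extend it without touching the review code.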
Phase 3: Knowledge Transfer Package (Week 5-6)
Objective: Complete handover documentation
- Successor Onboarding Guide
  - Step-by-step processing workflow
  - Camera setup and photography best practices
  - Common troubleshooting scenarios
  - Network setup and SharePoint integration
- Institutional Deliverables
  - Final processed dataset from the 2,800 photos
  - Quality metrics and accuracy assessment
  - Recommended workflow for remaining specimens
  - Cost/time estimates for future work
Phase 4: Future-Proofing (Week 7-8)
Objective: Set up successor for long-term success
- Automation Setup
  - Configure batch processing scripts
  - Set up scheduled exports to institutional systems
  - Document maintenance and updates
- Expansion Planning
  - Recommend hardware/software for scaling
  - Document integration with GBIF and other databases
  - Create roadmap for remaining collection digitization
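For the GBIF integration notes, a useful first step is showing how the review spreadsheet's columns could map onto Darwin Core terms. The sketch below is an assumption-heavy illustration: the spreadsheet column names on the left are hypothetical, and treating the DAS number as `catalogNumber` is a guess that should be confirmed with the collection's data manager. The Darwin Core term names on the right are standard.

```python
import csv
import io

# Hypothetical spreadsheet column -> Darwin Core term.
COLUMN_TO_DWC = {
    "das_number": "catalogNumber",  # assumed mapping; confirm before publishing
    "species": "scientificName",
    "collector": "recordedBy",
    "location": "locality",
    "date": "eventDate",
}

def to_darwin_core(rows):
    """Rename spreadsheet columns to Darwin Core terms, one record per row."""
    records = []
    for row in rows:
        # Pressed herbarium sheets are preserved specimens.
        record = {"basisOfRecord": "PreservedSpecimen"}
        for column, term in COLUMN_TO_DWC.items():
            record[term] = (row.get(column) or "").strip()
        records.append(record)
    return records

def write_csv(records):
    """Serialize records to CSV text with Darwin Core headers."""
    fieldnames = ["basisOfRecord", *COLUMN_TO_DWC.values()]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Made-up row for illustration only.
records = to_darwin_core([{"das_number": " 12345 ", "species": "Carex aquatilis"}])
csv_text = write_csv(records)
print(csv_text)
```

Columns missing from a row come through as empty strings, so incomplete records stay visible in the export rather than being dropped.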
Key Deliverables for Successor
Technical Package
- Fully processed dataset from 2,800 photos
- Working OCR + review workflow
- SharePoint integration scripts
- Installation and setup documentation
Institutional Package
- Spreadsheet template with standardized fields
- Quality control procedures
- Network setup and security documentation
- Cost/benefit analysis for continued digitization
Knowledge Package
- Photography best practices guide
- Common specimen types and OCR challenges
- Workflow optimization lessons learned
- Contact information for technical support
Success Metrics
- Data: All 2,800 photos processed with quality scores
- Workflow: Successor can process 50+ specimens/day
- Integration: Seamless handoff to SharePoint/institutional systems
- Sustainability: Clear path for remaining collection digitization
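The throughput target also gives a quick way to produce time estimates. As a rough sketch, assuming a steady rate and ignoring setup, review backlog, and days off, a batch the size of the current one (2,800 specimens) at the minimum target of 50 per day works out to 56 working days. The function name and default are illustrative.

```python
import math

def working_days(specimens, per_day=50):
    """Estimate working days needed at a steady daily rate, rounded up."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return math.ceil(specimens / per_day)

days = working_days(2800)
print(days)  # 56
```

The same function can be applied to the remaining collection once its size is known.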
Risk Mitigation
- Time constraints: Focus on core workflow before advanced features
- Technical complexity: Emphasize documentation and training over optimization
- Institutional continuity: Package everything for offline operation in case network access is limited
This document prioritizes practical deliverables over technical development, maximizing impact for institutional continuity and successor productivity.
- AAFC: Agriculture and Agri-Food Canada
- GBIF: Global Biodiversity Information Facility
- DwC: Darwin Core
- OCR: Optical Character Recognition
- API: Application Programming Interface
- CSV: Comma-Separated Values
- IPT: Integrated Publishing Toolkit
- TDWG: Taxonomic Databases Working Group