Development guide¶

General guidelines¶

The roadmap is the single source for open tasks, priorities, and timelines. Review it before starting work or filing a pull request to avoid duplication.

Run ./bootstrap.sh before development to install dependencies, copy .env.example, and execute linting/tests.

Keep preprocessing, OCR, mapping, QC, import, and export phases decoupled.
Prefer configuration-driven behavior and avoid hard-coded values.
Document new processing phases with reproducible examples.

Pair Programming with AI Agents (Encouraged)¶

This project promotes collaborative development between humans and AI agents. Agents should act as active programming partners, not just code generators:

AI Agent Partnership Guidelines¶

Question assumptions about problem-solving approaches
Balance technical implementation with practical usability
Regularly suggest hands-on testing with real data
Keep focus on end-user workflows and institutional needs
Identify gaps between code functionality and real-world usage
Propose concrete testing protocols and validation approaches
Create actionable human work lists for tasks requiring domain expertise

Practical Development Mindset¶

Build → Test on Real Data → Iterate (not just build → build → build)
Ask "Does this solve the actual problem?" before adding complexity
Prioritize user workflows over technical elegance
Document what humans need to do alongside what code can do
Bridge the gap between development environment and production usage

This collaborative approach ensures technical solutions actually serve institutional and research needs.

Feature Implementation Workflow¶

This project uses the .specify framework for structured feature development. Follow this complete workflow for new features:

Phase 1: Specification & Clarification¶

# 1. Create feature specification
/specify <feature description>
# Creates feature branch, initializes spec.md with structured requirements

# 2. Resolve ambiguities
/clarify
# Interactive Q&A to clarify missing decisions and update spec

Example: /specify Add batch OCR processing with progress tracking and error recovery

Phase 2: Planning & Task Decomposition¶

# 3. Generate technical plan
/plan
# Creates detailed technical design in plan.md

# 4. Generate actionable tasks
/tasks
# Creates dependency-ordered tasks.md for implementation

# 5. Analyze design consistency
/analyze
# Cross-validates spec, plan, and tasks for consistency

Phase 3: Implementation & Deployment¶

# 6. Execute implementation
/implement
# Processes each task sequentially, writes code and tests

# 7. Deploy feature
/deploy
# Handles deployment with dependency resolution

Workflow Benefits¶

Structured Requirements: Clear specifications prevent scope creep
Design Validation: Cross-artifact analysis catches inconsistencies early
Dependency Management: Task ordering prevents implementation blockers
Quality Gates: Built-in validation at each phase
Traceability: Requirements → Design → Implementation → Deployment

Integration with Testing¶

The workflow integrates with our testing standards: - Specifications include testable acceptance criteria - Plans define testing strategies - Implementation phase includes test creation - Each task completion triggers validation

For advanced patterns and coordination with the meta-project system, see the .specify/ directory documentation.

Retroactive Specifications¶

This project has undergone systematic retroactive specification analysis to document major features developed before formal specification processes were established. Key findings:

Major Features Analyzed: - Apple Vision OCR Integration (v0.3.0) - 95% accuracy breakthrough - Modern UI/UX System (Current) - Complete interface transformation - Darwin Core Archive Export (v0.2.0) - Institutional compliance system

Key Lessons Learned: - Research-driven development produces superior outcomes - Architecture decisions need explicit documentation - Performance requirements should be specified upfront - Migration strategies must be planned for breaking changes

Specification Checkpoints: Going forward, specifications are required for: - Features requiring >3 days development effort - Architecture changes affecting >2 modules - New external dependencies or critical libraries - Data format or schema changes

See .specify/retro-specs/ for complete retroactive analysis and future specification strategy.

Testing and linting¶

Run the full test suite and linter before committing changes.

ruff check .
pytest

These checks help maintain a consistent code style and verify that new contributions do not introduce regressions.

Release Process¶

This project follows semantic versioning and Keep a Changelog format for all releases.

Creating a Release¶

Update version numbers:

# Update version in pyproject.toml
# Update version in CHANGELOG.md with new section

Update CHANGELOG.md:
Add new version section with date: ## [X.Y.Z] - YYYY-MM-DD
Move items from [Unreleased] to the new version section
Follow Keep a Changelog format
Add version comparison link at bottom of file

Create and push the release:

git add .
git commit -m "🚀 Release vX.Y.Z: Brief description"
git tag vX.Y.Z
git push origin main
git push origin vX.Y.Z

Update comparison links:
Update [Unreleased] link to compare from new version
Add new version comparison link

Example format:

[Unreleased]: https://github.com/devvyn/aafc-herbarium-dwc-extraction-2025/compare/vX.Y.Z...HEAD
[X.Y.Z]: https://github.com/devvyn/aafc-herbarium-dwc-extraction-2025/compare/vX.Y.W...vX.Y.Z

Critical Requirements¶

Always create git tags for releases - this enables changelog comparison links
Use semantic versioning: v0.1.0, v0.2.0, v1.0.0, etc.
Follow Keep a Changelog format - agents must maintain this structure
Update pyproject.toml version to match changelog and tag
Test the comparison links - they should work on GitHub

Changelog Format Reference¶

## [Unreleased]

## [1.0.0] - 2025-01-15
### Added
- New feature descriptions

### Changed
- Modified functionality descriptions

### Fixed
- Bug fix descriptions

[Unreleased]: https://github.com/repo/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/repo/compare/v0.9.0...v1.0.0

[AAFC]: Agriculture and Agri-Food Canada [GBIF]: Global Biodiversity Information Facility [DwC]: Darwin Core [OCR]: Optical Character Recognition [API]: Application Programming Interface [CSV]: Comma-Separated Values [IPT]: Integrated Publishing Toolkit [TDWG]: Taxonomic Databases Working Group