
Simple Trial Run Guide

Because the S3 URLs currently fail SSL certificate verification, these are the easiest ways to run a trial:

Option 1: Use Local Images (if available)

If you have any herbarium images locally (JPG, PNG files):

# Create directory and copy images (repeat the cp with *.png if you have PNG files)
mkdir -p trial_images
cp /path/to/your/images/*.jpg trial_images/

# Process with Apple Vision
python cli.py process --input trial_images/ --output trial_results/ --engine vision

# Launch review interface
python review_web.py --db trial_results/candidates.db --images trial_images/
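
Before processing, it can help to confirm that the input directory actually contains images. The following is a minimal Python sketch, not part of the project: `list_images` is a hypothetical helper, and the extension set is an assumption based on the JPG and PNG formats mentioned above.

```python
from pathlib import Path

# Assumed extensions; adjust to match the files in your collection.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return image files in `directory`, sorted by name (non-recursive)."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    images = list_images("trial_images")
    print(f"Found {len(images)} image(s) in trial_images/")
```

If the count is zero, check the source path in the `cp` command before running the processing step.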

Option 2: Test with Empty Database (Review Interface Only)

You can test the review interface even without processing:

# Create empty placeholder database files and the image directory the interface expects
mkdir -p trial_results trial_images
touch trial_results/app.db
touch trial_results/candidates.db

# Test the web interface (will show empty state)
python review_web.py --db trial_results/candidates.db --images trial_images/
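
A zero-byte file, as created by `touch`, opens as a valid but empty SQLite database. This assumes `candidates.db` is a SQLite file, which the guide does not state. The sketch below uses Python's standard `sqlite3` module to list the tables in a database file; `list_tables` is a hypothetical helper, not part of the project.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("trial_results/candidates.db")` on the placeholder file should return an empty list. In that case the review interface has no schema to read, so what it displays depends on how `review_web.py` handles a missing table.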

Option 3: Download from S3 with curl, Then Use the CLI

To work around the SSL issue, download the images outside Python:

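# Download images directly using curl (avoids Python's SSL handling)
mkdir -p trial_images

# Example using standard S3 URLs (replace bucket-name and the paths with your own)
# -f makes curl exit with an error on HTTP failures instead of saving the error page
curl -f -o trial_images/specimen_001.jpg "https://s3.amazonaws.com/bucket-name/path/to/image1.jpg"
curl -f -o trial_images/specimen_002.jpg "https://s3.amazonaws.com/bucket-name/path/to/image2.jpg"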

# Then process normally
python cli.py process --input trial_images/ --output trial_results/ --engine vision
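
A downloaded file is not guaranteed to be an image: a misdirected or failed request can leave an error document saved under a `.jpg` name. The following is a minimal Python sketch for checking the downloads before processing, assuming the files are meant to be JPEGs (which start with the bytes `FF D8 FF`). Both function names are hypothetical and not part of the project.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of every JPEG file


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG signature."""
    with open(path, "rb") as handle:
        return handle.read(len(JPEG_MAGIC)) == JPEG_MAGIC


def find_suspect_downloads(directory):
    """Return .jpg files in `directory` that do not look like JPEGs."""
    return sorted(
        p for p in Path(directory).glob("*.jpg") if not looks_like_jpeg(p)
    )
```

Any path returned by `find_suspect_downloads("trial_images")` is worth opening in a text editor; S3 error responses are typically short XML documents.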

Option 4: Process the Full Collection

If you have access to the directory holding your 2,800 specimens:

# Process full collection directly
python cli.py process --input /path/to/2800_specimens/ --output production_results/ --engine vision

# This will:
# - Process all images with Apple Vision OCR
# - Create production_results/app.db with all data
# - Take ~4 hours for 2,800 specimens
# - Be ready for immediate curator review
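
The ~4 hour figure implies a per-specimen rate that can be used to size smaller runs. The following is a small worked example in Python, assuming processing time scales linearly with image count (an assumption; the guide gives only the single 2,800-specimen estimate). The function name is hypothetical.

```python
REFERENCE_SPECIMENS = 2800   # collection size quoted in this guide
REFERENCE_HOURS = 4.0        # approximate runtime quoted in this guide

# Implied average processing time per specimen, in seconds.
SECONDS_PER_SPECIMEN = REFERENCE_HOURS * 3600 / REFERENCE_SPECIMENS


def estimate_hours(specimen_count):
    """Estimate runtime in hours, assuming linear scaling."""
    return specimen_count * SECONDS_PER_SPECIMEN / 3600


if __name__ == "__main__":
    print(f"~{SECONDS_PER_SPECIMEN:.1f} s per specimen")
    for count in (10, 100, 2800):
        print(f"{count:>5} specimens: ~{estimate_hours(count):.2f} h")
```

Under this assumption the rate works out to roughly 5 seconds per specimen, so a 100-image trial would take about 8 to 9 minutes.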

Testing the Review Workflow

Once you have processed data:

# Launch web interface
python review_web.py --db production_results/candidates.db --images /path/to/images/

# Open browser to: http://localhost:5000
# Features available:
# - Side-by-side image and extracted data
# - Edit Darwin Core fields
# - Approve/reject specimens
# - Export approved data

Most Practical Approach

To have data ready for testing tomorrow:

  1. Skip the trial: go directly to full processing if you have image access
  2. Use Option 4 (full-collection processing) with your 2,800 specimens
  3. Let it run overnight (about 4 hours of processing)
  4. Open the review interface tomorrow morning

This gives you real production data for curator testing rather than a small sample.
