Notes from the lab.
Methodology, results, and open questions from the Living Models team. Published openly, ahead of peer review.
Regulatory Landscape for AI-Assisted Variety Registration in the EU
Where AI-assisted variety characterisation fits, and where it doesn't yet, in EU distinctness, uniformity and stability (DUS) testing and variety registration requirements.
Embedding Drift in Longitudinal Breeding Programs
When a breeding program's population shifts across selection cycles, model embeddings drift. How we detect, quantify, and respond to that drift without retraining from scratch.
Cross-Species Transfer Learning: Rice Models Predict Sorghum Traits
How embeddings from a model fine-tuned on rice transfer to sorghum drought tolerance prediction, and what the geometry of that transfer reveals.
Scaling Pre-Training to 500M Sequences: Infrastructure Lessons
What breaks when you double your training corpus — pipeline throughput, checkpoint strategy, and data quality at scale.
Building Research Collaborations Between AI Startups and Genomics Labs
Bridging the gap between a computational startup and a wet-lab research group requires more than a data-sharing agreement. Lessons from our first 18 months.
Fine-Tuning a Genomic Foundation Model When You Have 200 Samples
Most breeding programs have only 200–500 phenotyped lines, far fewer samples than typical ML applications assume. Techniques for stable fine-tuning in this regime.
Why Pangenome Training Changes Everything for Crop Diversity
Single-reference training misses 15–40% of a species' gene space. Our pangenome-aware training strategy and why it matters for landraces.
The State of Open Plant Sequence Datasets in 2025
A curated map of publicly available plant genomic datasets — coverage, licensing, assembly quality, and suitability for foundation model training.
Predicting Disease Resistance in Tomato Without Field Inoculation
Sequence-based prediction of Fusarium wilt and late blight resistance using our fine-tuning framework — without a greenhouse booking.
Integrating Genomic Predictions into a Seed Company's Selection Pipeline
How a mid-size European cereal breeder connected our prediction API to their existing marker-assisted selection workflow — reducing evaluation time by 60%.
From SNP Arrays to Breeding Value: What Embedding Geometry Tells Us
Trait-associated SNPs self-organize into interpretable subspaces in our 512-dimensional embedding space — without any labeled data.
Modeling Yield Stability Under Three Climate Scenarios
Projected variety performance under the RCP 4.5, 6.0, and 8.5 warming pathways, giving breeders a 10-year outlook from sequence data.
Benchmarking Genomic Models on Crop Trait Data: A Practical Guide
The evaluation suite we built against GRIN, SeedNet, and internal phenotype datasets — and an invitation for feedback from other labs.
The Hidden Problem in Plant Genomic Training Data
80% of publicly available plant sequence data contains assembly artifacts or mislabeled cultivars. How we filter before a single training step.
Predicting Drought Tolerance from Sequence Data Alone
How our genomic embeddings correlate with curated phenotype records across 12,000 maize accessions — and where the model still fails.
Why Plant Science Needs Its Own Foundation Model
The same shift that LLMs caused in NLP is overdue in plant genomics — and sequence-only pre-training on 200M+ plant reads changes what's possible.