
Notes from the lab.

Methodology, results, and open questions from the Living Models team. Published openly — pre-peer-review.

Regulatory

Regulatory Landscape for AI-Assisted Variety Registration in the EU

Where AI-assisted variety characterisation fits — and doesn't yet fit — in EU DUS testing and variety registration requirements.

Cyril Veran

Monitoring

Embedding Drift in Longitudinal Breeding Programs

When a breeding program shifts over selection cycles, model embeddings drift. How we detect, quantify, and respond without retraining from scratch.

Cyril Veran

Transfer Learning

Cross-Species Transfer Learning: Rice Models Predict Sorghum Traits

How embeddings from a rice-fine-tuned model transfer to sorghum drought tolerance prediction — and what the geometry of that transfer reveals.

Cyril Veran

Infrastructure

Scaling Pre-Training to 500M Sequences: Infrastructure Lessons

What breaks when you double your training corpus — pipeline throughput, checkpoint strategy, and data quality at scale.

Cyril Veran

Research

Building Research Collaborations Between AI Startups and Genomics Labs

Bridging the gap between a computational startup and a wet-lab research group requires more than a data-sharing agreement. Lessons from our first 18 months.

Cyril Veran

Fine-tuning

Fine-Tuning a Genomic Foundation Model When You Have 200 Samples

Most breeding programs have 200–500 phenotyped lines — far fewer than typical ML use cases. Techniques for stable fine-tuning in this regime.

Cyril Veran

Pangenome

Why Pangenome Training Changes Everything for Crop Diversity

Single-reference training misses 15–40% of a species' gene space. Our pangenome-aware training strategy and why it matters for landraces.

Cyril Veran

Open Data

The State of Open Plant Sequence Datasets in 2025

A curated map of publicly available plant genomic datasets — coverage, licensing, assembly quality, and suitability for foundation model training.

Cyril Veran

Disease Resistance

Predicting Disease Resistance in Tomato Without Field Inoculation

Sequence-based prediction of Fusarium wilt and late blight resistance using our fine-tuning framework — without a greenhouse booking.

Cyril Veran

API

Integrating Genomic Predictions into a Seed Company's Selection Pipeline

How a mid-size European cereal breeder connected our prediction API to their existing marker-assisted selection workflow — reducing evaluation time by 60%.

Cyril Veran

Embeddings

From SNP Arrays to Breeding Value: What Embedding Geometry Tells Us

Trait-associated SNPs self-organize into interpretable subspaces in our 512-dimensional embedding space — without any labeled data.

Cyril Veran

Climate

Modeling Yield Stability Under Three Climate Scenarios

Variety performance under RCP 4.5, 6.0, and 8.5 warming pathways — giving breeders a 10-year horizon view from sequence data.

Cyril Veran

Benchmarks

Benchmarking Genomic Models on Crop Trait Data: A Practical Guide

The evaluation suite we built against GRIN, SeedNet, and internal phenotype datasets — and an invitation for feedback from other labs.

Cyril Veran

Data Quality

The Hidden Problem in Plant Genomic Training Data

80% of publicly available plant sequence data contains assembly artifacts or mislabeled cultivars. How we filter before a single training step.

Cyril Veran

Drought

Predicting Drought Tolerance from Sequence Data Alone

How our genomic embeddings correlate with curated phenotype records across 12,000 maize accessions — and where the model still fails.

Cyril Veran

Foundation Models

Why Plant Science Needs Its Own Foundation Model

The same shift that LLMs caused in NLP is overdue in plant genomics — and sequence-only pre-training on 200M+ plant reads changes what's possible.

Cyril Veran