Foundation models for crop genomics.
Living Models trains large-scale AI models on plant sequence data to predict phenotype from genotype, cutting years off trait selection cycles for seed companies and breeding programs.
From sequence to prediction in three steps
Submit your genotype data
Upload VCF, FASTA, or SNP array CSV via our REST API or Python SDK. We accept standard plant genomics formats with no preprocessing required.
Our model computes genomic embeddings
The foundation model generates a 512-dimensional representation encoding population history, evolutionary context, and variant-level signal.
Receive trait predictions
Get a drought tolerance index, a yield stability score, and pathogen resistance flags, each with a confidence interval and a statement of known limitations.
import livingmodels as lm
result = lm.predict(vcf_path='sample.vcf', traits='all')
print(result.drought_tolerance_index) # 0.847 (CI: 0.81–0.88)
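Because each prediction comes with a confidence interval, candidate lines can be ranked conservatively. The sketch below is a minimal illustration and not part of the Living Models SDK: the `Prediction` class, its field names, and the sample values are hypothetical stand-ins for whatever the API actually returns. It ranks lines by the lower bound of the interval rather than the point estimate, so a high estimate with a wide interval does not jump the queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for one line's trait prediction (not the SDK's type)."""
    line_id: str
    estimate: float  # point estimate, e.g. a drought tolerance index
    ci_low: float    # lower bound of the confidence interval
    ci_high: float   # upper bound of the confidence interval


def shortlist(predictions, top_n):
    """Return the top_n predictions ranked by lower confidence bound, highest first."""
    ranked = sorted(predictions, key=lambda p: p.ci_low, reverse=True)
    return ranked[:top_n]


# Made-up values for illustration only.
candidates = [
    Prediction("line_A", 0.85, 0.81, 0.88),
    Prediction("line_B", 0.90, 0.60, 0.97),  # high estimate, wide interval
    Prediction("line_C", 0.80, 0.78, 0.83),
]
selected = shortlist(candidates, top_n=2)
print([p.line_id for p in selected])
```

Ranking on the lower bound is one possible design choice; a program that prefers to explore uncertain material could sort on `estimate` or `ci_high` instead.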
Foundation model pre-trained on plant biology
Our model was trained from the ground up for plant genomics, with intron-aware tokenization, multi-species joint training, and variant-level embeddings validated against GWAS hits across five crops.
- 47 crop species + 120 wild relative taxa in training corpus
- Ensembl Plants + NCBI SRA + internal curation pipeline
- Wheat, maize, tomato, soybean, sorghum coverage
- k-mer tokenizer tuned for transposons and tandem repeats
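For readers unfamiliar with k-mer tokenization, the sketch below shows the general idea: a DNA sequence is split into overlapping substrings of length k, and each distinct substring gets an integer id. This is a generic illustration, not the Living Models tokenizer; the function names, `k=6` default, and `stride` parameter are assumptions made for the example.

```python
def kmer_tokenize(sequence, k=6, stride=1):
    """Split a DNA sequence into overlapping k-mers.

    Generic illustration only; k and stride are arbitrary choices here,
    not the settings of any production tokenizer.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


# A short tandem repeat: the same k-mers recur, so the vocabulary stays small.
tokens = kmer_tokenize("ACGACGACGACG", k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

On the repeat above, ten tokens map to only three distinct ids, which hints at why repetitive regions such as tandem repeats need special care when tuning a tokenizer.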
Fine-tuning for your breeding program
Few-shot fine-tuning on as few as 200 phenotyped lines. Adapter layers preserve pre-training signal while capturing your program's specific genetic background and environment.
- Works with as few as 200 phenotyped accessions
- Adapter architecture preserves pre-trained representations
- Runs on your proprietary data with confidentiality guarantees
- Outputs carry confidence intervals, not bare point estimates
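One common way adapter layers preserve pre-trained representations is a residual bottleneck whose up-projection starts at zero, so the adapter is an identity function until training moves it. The sketch below illustrates that pattern in plain Python. It is an assumption about the general technique, not a description of the Living Models architecture; the class name, dimensions, and initialization scale are all hypothetical, and the training loop is omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up @ relu(W_down @ h).

    Hypothetical illustration. Only w_down and w_up would be trained;
    the model that produces `hidden` stays frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: small random weights, shape (bottleneck, dim).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection: zeros, shape (dim, bottleneck), so the adapter
        # initially returns its input unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, hidden):
        reduced = [max(0.0, v) for v in matvec(self.w_down, hidden)]  # ReLU
        update = matvec(self.w_up, reduced)
        return [h + u for h, u in zip(hidden, update)]


# Stand-in for a frozen pre-trained representation (toy size).
frozen_embedding = [0.5, -1.0, 2.0, 0.0]
adapter = BottleneckAdapter(dim=4, bottleneck=2)
print(adapter(frozen_embedding) == frozen_embedding)
```

Because the adapter has far fewer parameters than the base model, it can in principle be fitted on a small phenotyped panel without overwriting what pre-training learned.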
Predict what breeders actually measure
Six validated prediction targets covering the traits that matter most in modern crop development programs.
Probability-weighted index for survival under water deficit, scored against a panel of 3,400 characterized maize and wheat accessions. Returns a point estimate with a confidence interval.
Cross-environment consistency score derived from embedding-space proximity to varieties with documented multi-location trial stability. Validated across three climate zones.
Resistance scores for six common pathogens including Fusarium, late blight, and powdery mildew.
Predicted days-to-heading distribution under standard and extended photoperiod regimes. Useful for matching variety flowering time to target environment growing season.
Genotypic nitrogen use efficiency signal estimated from embedding-space proximity to phenotyped high-NUE accessions. Particularly relevant for low-input breeding programs.
Predicted performance under high-temperature episodes during critical growth windows.
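Several of the targets above are described in terms of embedding-space proximity to a reference panel. The sketch below shows one simple way such a score could be computed: the mean cosine similarity between a query embedding and each reference embedding. The choice of cosine similarity and of a plain mean are assumptions for illustration; the text does not say which distance measure or aggregation Living Models uses, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def proximity_score(query, references):
    """Mean cosine similarity between a query embedding and a reference panel."""
    if not references:
        raise ValueError("reference panel is empty")
    total = sum(cosine_similarity(query, ref) for ref in references)
    return total / len(references)


# Toy 3-dimensional vectors; the embeddings described earlier are 512-dimensional.
reference_panel = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(proximity_score([1.0, 0.0, 0.0], reference_panel))  # 0.5
```

A score like this only says how close a genotype sits to phenotyped accessions in embedding space, so it is best read alongside the reported confidence interval.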
From the breeding programs using it
We submitted 800 candidate lines and received ranked drought tolerance predictions within hours. We brought 40 into the first field entry season — the shortlist quality was better than anything we'd achieved with marker-assisted selection alone.
We pre-screened 1,200 tomato lines for Fusarium and late blight resistance — without a single inoculation trial. Living Models predicted the resistance profile from sequence alone. The correlation with our inoculation data was compelling enough to change our pipeline.
Ready to compress six seasons into one?
We onboard research teams on a case-by-case basis. Bring your VCF file to a 30-minute technical call.