One of the underexplored questions in plant genomic modelling is how much trait-specific signal can be transferred between related species without any additional labeled data. We had a rice drought tolerance model — fine-tuned on several thousand phenotyped accessions from a collaborating programme — and a set of sorghum accessions with no fine-tuning data available. What happened when we ran sorghum sequences through the rice model was instructive enough that we think it warrants a dedicated writeup.
The Setting: Why Rice to Sorghum
Rice is a C3 grass and sorghum a C4 grass, so they are not the most obvious pairing, but they share substantial regions of conserved synteny around drought-response loci. The genomic regions associated with osmotic stress tolerance, root angle, and stomatal regulation show enough sequence conservation that our tokeniser maps homologous regions to overlapping token distributions. This is not guaranteed: transfer between, say, rice and cassava would face a much wider evolutionary distance and the synteny argument would not hold in the same way.
The practical motivation was a request from a collaborator working with West African sorghum breeders. They had sequence data on a collection of landraces and improved varieties but no phenotype records — historical phenotyping had been done on a different set of lines and was not genotyped at the same markers. They wanted to know whether our model could give them any useful ranking of drought tolerance candidates before they committed to a field evaluation season.
Direct Transfer: What the Numbers Looked Like
We ran the sorghum accessions through the rice drought tolerance model without modification — no adapter layers, no domain shift correction, just raw embedding extraction followed by the same regression head trained on rice phenotype data. The Pearson correlation between predicted and observed drought tolerance scores on the subset of sorghum accessions that did have validation phenotype records was 0.41.
That number is below the threshold we would consider acceptable for a production recommendation: our standard for a fine-tuned model on its target species is 0.6 or above. But 0.41 without a single sorghum-labeled example is substantially above what a random baseline produces (near zero), and it suggests the embeddings are capturing something real about the biological variation that underlies drought response across grass crops. The top quartile of predicted drought-tolerant accessions overlapped with the validated top quartile at a rate of about 58%, against the 25% expected from a random ranking.
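As a minimal sketch of how these two metrics can be computed: the function names and the synthetic scores below are illustrative assumptions, not the pipeline or data behind the numbers above.

```python
import numpy as np


def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed scores."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.corrcoef(pred, obs)[0, 1])


def top_quantile_overlap(pred, obs, fraction=0.25):
    """Share of the predicted top `fraction` that is also in the observed top `fraction`.

    Higher score is assumed to mean more drought tolerant. Ties are broken
    by sort order, which is adequate for a sketch.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    k = max(1, int(round(fraction * len(pred))))
    top_pred = set(np.argsort(pred)[-k:].tolist())
    top_obs = set(np.argsort(obs)[-k:].tolist())
    return len(top_pred & top_obs) / k


# Synthetic stand-in data: noisy predictions of a hidden "true" score.
rng = np.random.default_rng(0)
observed = rng.normal(size=200)
predicted = 0.5 * observed + rng.normal(size=200)

print("Pearson r:", round(pearson_r(predicted, observed), 3))
print("Top-quartile overlap:", round(top_quantile_overlap(predicted, observed), 3))
```

Because the predicted and observed top sets have the same size, this overlap is both the precision and the recall of the shortlist; a random ranking would score about 0.25.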
For the collaborator's practical use case, which is reducing a large collection to a manageable candidate shortlist before field evaluation, a model that can identify the top quartile with 58% precision using zero labeled sorghum data is useful. It is not good enough to replace field phenotyping, but it reduces the number of lines that need to go to the field.
What Transfers and What Does Not
We did a follow-up analysis to understand which regions of the sorghum genome were driving the predictions — which token positions contributed most to the model's output. The high-attribution regions corresponded closely to known syntenic drought loci shared between rice and sorghum, which gives us some confidence that the model is not just responding to superficial sequence composition differences between drought-tolerant and susceptible accessions.
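The attribution method is not specified above, so the following is only one common way to ask which token positions drive a prediction: occlusion, where each window of tokens is masked in turn and the drop in the model's score is recorded. The `score_fn`, the mask token, and the toy scorer are all hypothetical stand-ins for a real model.

```python
from typing import Callable, List, Sequence


def occlusion_attribution(
    score_fn: Callable[[List[int]], float],
    tokens: Sequence[int],
    mask_token: int,
    window: int = 1,
) -> List[float]:
    """Score drop caused by masking each consecutive window of tokens.

    A large positive value means the window contributed strongly to the
    prediction. Returns one value per window, in sequence order.
    """
    baseline = score_fn(list(tokens))
    attributions = []
    for start in range(0, len(tokens), window):
        occluded = list(tokens)
        for i in range(start, min(start + window, len(tokens))):
            occluded[i] = mask_token
        attributions.append(baseline - score_fn(occluded))
    return attributions


# Toy scorer (an assumption for illustration): each unmasked position adds a
# fixed weight, so position 2 plays the role of a high-attribution locus.
MASK = -1
POSITION_WEIGHTS = [0.0, 0.1, 2.0, 0.0]


def toy_score(tokens: List[int]) -> float:
    return sum(w for t, w in zip(tokens, POSITION_WEIGHTS) if t != MASK)


example_tokens = [5, 6, 7, 8]
per_position = occlusion_attribution(toy_score, example_tokens, MASK)
print(per_position)
```

In practice the per-window values would then be compared against annotated syntenic intervals to see whether high-attribution windows fall inside known drought loci.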
Conversely, the model performed poorly on sorghum-specific traits where there is no rice analog. Stay-green — a sorghum post-flowering drought tolerance mechanism with no direct equivalent in rice — was predicted no better than random. This is expected: the rice model has no signal for stay-green because nothing in its training objective or fine-tuning data was related to that trait. The embedding space simply does not encode it.
The embedding geometry analysis also revealed an interesting pattern: the rice model's drought-tolerance subspace is organised along a gradient that corresponds reasonably well to osmotic stress response in both species, but the sorghum accessions show a wider spread along a secondary axis that has no strong rice analog. That secondary axis appears to correlate weakly with root architecture features that are more variable in sorghum than in the rice genotypes we trained on. It is not noise, since it organises consistently across accessions, but it is signal the rice model does not have a label for.
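One simple way to look for such a secondary axis, sketched here under assumptions: treat some known direction (for example the regression head's weight vector) as the primary axis, project it out of the centred embeddings, and take the leading principal component of what remains. The function name and synthetic embeddings are illustrative; the analysis described above may have used a different procedure.

```python
import numpy as np


def secondary_axis(embeddings, primary_direction):
    """Leading axis of variation orthogonal to a given primary direction.

    Returns the unit axis and the fraction of residual variance it explains.
    """
    u = np.asarray(primary_direction, dtype=float)
    u = u / np.linalg.norm(u)
    centered = embeddings - embeddings.mean(axis=0)
    # Remove each embedding's component along the primary direction.
    residual = centered - np.outer(centered @ u, u)
    # Rows of vt are principal axes of the residual, ordered by variance.
    _, singular_values, vt = np.linalg.svd(residual, full_matrices=False)
    explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
    return vt[0], float(explained)


# Synthetic embeddings: strong spread on dimension 0 (the "primary" gradient),
# moderate spread on dimension 1, little elsewhere.
rng = np.random.default_rng(1)
scales = np.array([3.0, 2.0, 0.1, 0.1, 0.1])
toy_embeddings = rng.normal(size=(200, 5)) * scales
primary = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

axis, explained = secondary_axis(toy_embeddings, primary)
print("secondary axis:", np.round(axis, 3))
print("residual variance explained:", round(explained, 3))
```

Scores along the recovered axis can then be correlated with any available trait measurements, such as root architecture features, to check whether the axis is biologically meaningful.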
Lightweight Adaptation: A Few Labels Go a Long Way
We ran a second experiment: how many sorghum-labeled examples are needed to substantially improve on the direct transfer baseline? We used a subset of the validated sorghum phenotype records as a few-shot fine-tuning dataset, holding out the rest for evaluation, and tracked performance as a function of labeled set size.
With 50 labeled sorghum accessions and a lightweight adapter layer (we froze the base model and trained only the adapter and regression head), the Pearson correlation improved from 0.41 to 0.57. With 150 accessions it reached 0.64, exceeding our production threshold. The improvement curve flattened after about 200 accessions: adding more labeled data beyond that gave diminishing returns in this setting. This is broadly consistent with what we see in within-species fine-tuning on small datasets, but the starting point is higher because pre-training and cross-species transfer have already done much of the heavy lifting.
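The shape of this experiment can be sketched with a deliberately simplified stand-in: embeddings from the frozen base model are treated as fixed features, and a closed-form ridge regression plays the role of the trainable adapter and regression head. The ridge head, the function names, and the synthetic data are assumptions for illustration and will not reproduce the figures above.

```python
import numpy as np


def fit_ridge_head(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def learning_curve(embeddings, y, sizes, n_holdout, seed=0):
    """Held-out Pearson r as a function of the number of labeled examples.

    The hold-out set is fixed once; each training set is a prefix of the
    remaining shuffled pool, so larger sets contain the smaller ones.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    holdout, pool = order[:n_holdout], order[n_holdout:]
    curve = {}
    for n in sizes:
        if n > len(pool):
            raise ValueError(f"requested {n} labels but only {len(pool)} available")
        train = pool[:n]
        w, b = fit_ridge_head(embeddings[train], y[train])
        pred = embeddings[holdout] @ w + b
        curve[n] = float(np.corrcoef(pred, y[holdout])[0, 1])
    return curve


# Synthetic frozen embeddings and phenotype scores (illustrative only).
rng = np.random.default_rng(42)
frozen_embeddings = rng.normal(size=(400, 16))
true_weights = rng.normal(size=16)
phenotype = frozen_embeddings @ true_weights + rng.normal(scale=1.0, size=400)

curve = learning_curve(frozen_embeddings, phenotype, sizes=[50, 150, 200], n_holdout=150)
for n, r in curve.items():
    print(f"{n:>4} labeled accessions -> held-out Pearson r = {r:.3f}")
```

Keeping the hold-out set fixed across label budgets makes the points on the curve directly comparable, which matters when deciding where returns start to diminish.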
The practical implication for breeders working with orphan or under-phenotyped crops is significant. If a related well-phenotyped species exists in our model library, a targeted phenotyping campaign of 100–200 accessions may be enough to produce a useful prediction model rather than the several thousand accessions that would be required for from-scratch training.
Limits and Honest Caveats
We want to be careful not to overstate what cross-species transfer buys. The rice-to-sorghum case is probably close to a best-case scenario within the grass family. We have run the same experiment for rice to wheat — a more distant pairing in terms of ploidy and genome size — and the direct transfer correlation is substantially lower, around 0.28, with a larger improvement needed from fine-tuning before the model is useful.
There is also a data provenance issue we have not fully resolved. The rice training data was collected under controlled conditions in some programmes and field conditions in others. The sorghum validation data came from field trials under variable environments. The mismatch between training and validation environments likely depresses our correlation estimates below what we would see if both sets had been collected under comparable conditions. Correcting for this properly would require a joint experimental design, which we could not impose retrospectively.
Where This Points
The most useful near-term application we see for cross-species transfer is not replacing fine-tuning but replacing the cold-start problem. For a new crop or a new breeding programme, the question is not "can we build a great model with zero data" but "can we get a good-enough model fast enough to be useful while we accumulate the labeled data for a better one." The answer, for phylogenetically adjacent species, appears to be yes — with honest expectations about precision at the extremes.
We are currently cataloguing which species pairings in our model library show meaningful transfer and which do not, with the goal of publishing a transfer compatibility map alongside our next model release. That map will not tell breeders everything they need to know, but it will tell them whether it is worth trying a cross-species baseline before investing in a phenotyping campaign.