
Moving away from the genome-scan methods used for human GWAS, which are ultimately inappropriate for the short, highly polymorphic genomes of RNA viruses, the group has demonstrated the potential of multi-class machine learning algorithms for inferring the functional genetic changes associated with phenotypic change (e.g. a virus crossing the species barrier).
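To make the idea of linking alignment sites to a multi-class phenotype concrete, the sketch below ranks the columns of a toy alignment by their mutual information with a host label. This is not the group's algorithm: mutual information is a simpler stand-in for the feature-selection step of a multi-class classifier, and the sequences, host labels and function names (`site_host_information`, `rank_sites`) are all hypothetical, invented for illustration.

```python
from collections import Counter
from math import log2


def site_host_information(sequences, hosts):
    """Mutual information (in bits) between each alignment column and the host label.

    Assumes `sequences` are aligned strings of equal length and `hosts`
    holds one class label per sequence.
    """
    n = len(sequences)
    host_counts = Counter(hosts)
    scores = []
    for pos in range(len(sequences[0])):
        column = [seq[pos] for seq in sequences]
        base_counts = Counter(column)
        joint_counts = Counter(zip(column, hosts))
        mi = 0.0
        for (base, host), count in joint_counts.items():
            p_joint = count / n
            p_base = base_counts[base] / n
            p_host = host_counts[host] / n
            mi += p_joint * log2(p_joint / (p_base * p_host))
        scores.append(mi)
    return scores


def rank_sites(sequences, hosts, top=2):
    """Return the indices of the `top` columns most associated with host."""
    scores = site_host_information(sequences, hosts)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top]


# Toy data, invented for illustration only (not real viral sequences).
toy_sequences = ["AAGT", "AAGT", "CAGA", "CAGT", "GTGA", "GTGT"]
toy_hosts = ["human", "human", "civet", "civet", "bat", "bat"]

toy_scores = site_host_information(toy_sequences, toy_hosts)
toy_top_sites = rank_sites(toy_sequences, toy_hosts, top=2)
```

In this toy alignment the first column separates all three host classes, the third column is invariant and carries no information, and the remaining columns fall in between. A real analysis would need a proper classifier, cross-validation and a correction for shared ancestry among the sequences, none of which this sketch attempts.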

Atome figure 3
The first two principal components of the PCA undertaken using (A) SARS coronavirus complete spike protein nucleotide sequences, and (B) nucleotides selected by the RFA. Viral groups, defined by host species and season, are represented by ellipses of different colours: human patient samples from 2002/2003 collected in the early, mid and late epidemic phases are labelled HP03E (green), HP03M (purple) and HP03L (yellow); 2004 human samples are labelled HP04 (black); palm civet samples collected in 2003 and 2004 are labelled PC03 (blue) and PC04 (red); bat samples are labelled BT (magenta).

These genotype-to-phenotype (GP) methods uncover features and insights that could prove relevant to understanding viral transmission across host species. They:

  • Show that even distantly related viruses within a viral family share highly conserved genetic signatures of host specificity;
  • Reinforce how fitness landscapes of host adaptation are shaped by host phylogeny;
  • Highlight the evolutionary trajectories of RNA viruses in rapid expansion and under great evolutionary pressure.

These methods can serve as rigorous tools for assessing emergence potential, particularly in scenarios where rapid host classification of newly emerging viruses may be more important than identifying putative functional sites.