License

Copyright 2025

Maximum likelihood phylogeny of the clustered IAV dataset. The curated database for segment N was clustered using MMseqs2 at a 0.95 sequence identity threshold. Representative sequences were aligned using MAFFT. Maximum likelihood trees were inferred with IQ-TREE and midpoint-rooted for visualization.


Tree visualization

The IAV phylogenetic trees are visualized using Taxonium, a tool for exploring large phylogenetic trees.

Methods for Influenza A database build

To assemble the sequence database, Influenza A virus nucleotide sequences were retrieved from the NCBI Entrez databases by querying the Influenza A virus taxonomic ID through the E-utilities API with an in-house Python tool. A BLAST search against a curated IAV reference set covering the eight viral segments was then performed to:

  1. identify the closest reference for each sequence;
  2. validate the segment number.
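
As a minimal sketch of this step, the closest reference per query can be picked by highest bit score. It assumes the BLAST results were saved in the standard 12-column tabular format (-outfmt 6) and that reference identifiers end in a segment suffix such as "|seg4". The naming scheme, the helper names and the example rows are all hypothetical and are not taken from the actual database or the in-house tool.

```python
import csv
import io


def closest_references(blast_tabular: str) -> dict:
    """Return {query_id: (reference_id, bitscore)}, keeping the best-scoring hit."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def segment_of(reference_id: str) -> int:
    """Hypothetical convention: reference IDs end in '|seg<number>'."""
    return int(reference_id.rsplit("|seg", 1)[1])


# Made-up rows for illustration only.
example = (
    "q1\trefA|seg4\t98.5\t1700\t25\t0\t1\t1700\t1\t1700\t0.0\t3000\n"
    "q1\trefB|seg4\t91.0\t1700\t150\t2\t1\t1700\t1\t1700\t0.0\t2400\n"
    "q2\trefC|seg6\t97.2\t1400\t39\t0\t1\t1400\t1\t1400\t0.0\t2500\n"
)
hits = closest_references(example)
segments = {query: segment_of(ref) for query, (ref, _) in hits.items()}
```

Here the segment inferred from the best hit would be compared with the segment annotated in the sequence record; how mismatches were handled is not described above.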

Alignments for each segment were obtained as follows:

  1. Nextalign was used to obtain sub-alignments including all the sequences that shared the closest reference in the IAV reference set;
  2. the reference alignment of the IAV reference set was used to guide the insertion of gaps in individual sub-alignments, ensuring consistency with the corresponding reference sequence;
  3. sub-alignments were concatenated to obtain the full alignment per segment.
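
The gap-insertion and concatenation steps above can be sketched as follows. This assumes that every sequence in a sub-alignment is already in the coordinates of its ungapped reference (same length as the reference, insertions stripped), so that the gapped reference row from the reference alignment tells us where to add gap columns. The function names and toy sequences are illustrative, not the pipeline's own.

```python
def project_onto_reference_alignment(gapped_reference: str, aligned_seq: str) -> str:
    """Insert a gap wherever the reference row has one in the reference alignment."""
    ungapped_len = len(gapped_reference) - gapped_reference.count("-")
    if len(aligned_seq) != ungapped_len:
        raise ValueError("sequence is not in the coordinates of this reference")
    residues = iter(aligned_seq)
    return "".join("-" if col == "-" else next(residues) for col in gapped_reference)


def merge_subalignments(reference_alignment: dict, subalignments: dict) -> dict:
    """Project each sub-alignment onto the shared columns and pool the rows."""
    merged = {}
    for ref_id, members in subalignments.items():
        gapped_ref = reference_alignment[ref_id]
        for seq_id, seq in members.items():
            merged[seq_id] = project_onto_reference_alignment(gapped_ref, seq)
    return merged


# Toy data: two references aligned to each other, three sequences in two sub-alignments.
reference_alignment = {"refA": "ATG-CCA", "refB": "ATGACC-"}
subalignments = {
    "refA": {"s1": "ATGCCA", "s2": "AT-CCA"},
    "refB": {"s3": "ATGACT"},
}
full_alignment = merge_subalignments(reference_alignment, subalignments)
# full_alignment == {"s1": "ATG-CCA", "s2": "AT--CCA", "s3": "ATGACT-"}
```

All rows end up with the width of the reference alignment, which is what allows the sub-alignments to be stacked into a single alignment per segment.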

Nucleotide alignments were subsequently trimmed to the coding sequences (CDS) of the IAV proteins and translated into amino acid alignments.
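
A minimal sketch of the trimming and translation step, assuming the CDS boundaries are known as alignment column indices and that the trimmed region is in frame. Gap-only codons are kept as a single gap so the amino acid alignment stays column-consistent, and codons that are partially gapped or ambiguous are written as X. The coordinates, function names and these conventions are assumptions for illustration; spliced products would need their exon ranges joined before translation.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def trim_to_cds(aligned_seq: str, cds_start: int, cds_end: int) -> str:
    """Keep alignment columns [cds_start, cds_end), 0-based and end-exclusive."""
    return aligned_seq[cds_start:cds_end]


def translate_aligned(cds: str) -> str:
    """Translate codon by codon, mapping '---' to '-' and unknown codons to 'X'."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        protein.append("-" if codon == "---" else CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Toy aligned row with two flanking non-coding columns on each side.
row = "GGATGAAA---CTGTAACC"
cds = trim_to_cds(row, 2, 17)       # "ATGAAA---CTGTAA"
protein = translate_aligned(cds)    # "MK-L*"
```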

Several curation steps were applied during this process:

  • removing non-IAV genomic sequences;
  • removing sequences with non-significant BLAST hits;
  • removing sequences shorter than a pre-established length threshold for each segment;
  • removing sequences that could not be aligned to the reference with Nextalign.
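
The curation filters above can be sketched as a single pass that records why a sequence is rejected. The E-value cutoff and the per-segment minimum lengths below are placeholders chosen for illustration, since the actual thresholds are not stated here. Treating "no hit against the reference set" as the non-IAV criterion is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the real values are not given in the text.
MAX_EVALUE = 1e-10
MIN_LENGTH = {4: 1500, 6: 1200}


@dataclass
class Record:
    seq_id: str
    sequence: str
    segment: Optional[int]        # None when BLAST found no IAV reference
    best_evalue: Optional[float]  # E-value of the best BLAST hit, if any
    nextalign_ok: bool            # whether Nextalign produced an alignment


def rejection_reason(rec: Record) -> Optional[str]:
    """Return the first failed curation check, or None if the record is kept."""
    if rec.segment is None or rec.best_evalue is None:
        return "no hit against the IAV reference set"
    if rec.best_evalue > MAX_EVALUE:
        return "non-significant BLAST hit"
    if len(rec.sequence) < MIN_LENGTH.get(rec.segment, 0):
        return "shorter than the segment length threshold"
    if not rec.nextalign_ok:
        return "could not be aligned with Nextalign"
    return None


# Made-up records for illustration only.
records = [
    Record("ok", "A" * 1600, 4, 0.0, True),
    Record("short", "A" * 900, 4, 0.0, True),
    Record("weak", "A" * 1600, 4, 0.5, True),
    Record("nohit", "A" * 1600, None, None, False),
]
kept = [rec.seq_id for rec in records if rejection_reason(rec) is None]
```

Keeping the reason alongside each rejected record makes it straightforward to report how many sequences each filter removed.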

For visualization, the IAV sequence database was clustered, and phylogenetic trees were constructed. Clustering was performed on the nucleotide sequences using MMseqs2 with a 95% sequence identity threshold (--cluster-mode 2 --cov-mode 1 --min-seq-id 0.95). Representative sequences for each cluster and segment were aligned using MAFFT (v7.453) with default parameters. Maximum likelihood trees were inferred with IQ-TREE (v2.1.2) using the best-fit model determined by the Bayesian Information Criterion (BIC) and were midpoint-rooted for visualization purposes.
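
For orientation, the clustering, alignment and tree inference could be driven from Python roughly as below. Only the three MMseqs2 options, the MAFFT defaults and the BIC-based model choice come from the description above. The use of the easy-cluster workflow, the iqtree2 executable name, the -m MFP model-selection option and all file names are assumptions, and midpoint rooting is not shown.

```python
import subprocess


def build_tree_commands(fasta: str, prefix: str) -> dict:
    """Assemble one command line per step; nothing is executed here."""
    reps = f"{prefix}_clu_rep_seq.fasta"   # representatives written by easy-cluster
    aln = f"{prefix}_aln.fasta"            # hypothetical alignment file name
    return {
        "cluster": [
            "mmseqs", "easy-cluster", fasta, f"{prefix}_clu", f"{prefix}_tmp",
            "--min-seq-id", "0.95", "--cov-mode", "1", "--cluster-mode", "2",
        ],
        # MAFFT with default parameters writes the alignment to standard output.
        "align": ["mafft", reps],
        # ModelFinder picks the substitution model; BIC is its default criterion.
        "tree": ["iqtree2", "-s", aln, "-m", "MFP", "--prefix", f"{prefix}_tree"],
    }


def run_pipeline(fasta: str, prefix: str) -> None:
    """Run the three steps in order (requires the tools to be installed)."""
    commands = build_tree_commands(fasta, prefix)
    subprocess.run(commands["cluster"], check=True)
    with open(f"{prefix}_aln.fasta", "w") as handle:
        subprocess.run(commands["align"], stdout=handle, check=True)
    subprocess.run(commands["tree"], check=True)


commands = build_tree_commands("segment4.fasta", "segment4")
```

Calling run_pipeline("segment4.fasta", "segment4") would execute the steps for one segment; the same call would be repeated for each of the eight segments.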

Adaptation mutations

Our database of mammalian adaptations is maintained by Daniel Goldhill.

Sources

Taxonium Sanderson, T. (2022). Taxonium, a web-based tool for exploring large phylogenetic trees. eLife, 11:e82392. https://doi.org/10.7554/eLife.82392.

MAFFT Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772–780. https://doi.org/10.1093/molbev/mst010.

MMseqs2 Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35, 1026–1028. https://doi.org/10.1038/nbt.3988.

IQ-TREE Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268–274. https://doi.org/10.1093/molbev/msu300

Nextalign Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A. (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773. https://doi.org/10.21105/joss.03773.