How to use the app.
Include text with status, version and updates.
Copyright 2025
Maximum likelihood phylogeny of the clustered IAV dataset. The curated database for segment N was clustered using MMseqs2 with a 0.95 identity threshold. Representative sequences were aligned using MAFFT. Maximum likelihood trees were inferred with IQ-TREE and midpoint-rooted for visualization.
The IAV phylogenetic trees are visualized using Taxonium, a tool for exploring large phylogenetic trees.
To assemble the sequence database, Influenza A virus nucleotide sequences were retrieved from the NCBI Entrez databases by Influenza A virus taxonomic ID, using an in-house Python tool that accesses the E-utilities API. A BLAST search against a curated IAV reference set, including the eight viral segments, was performed to
Alignments for each segment were obtained as follows:
Nucleotide alignments were subsequently trimmed to the coding sequences (CDS) of the IAV proteins and translated into amino acid alignments.
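The trimming and translation step can be illustrated with a minimal Python sketch. It is not the tool used to build the database: the function name `trim_and_translate`, the 0-based half-open CDS coordinates, and the choice to render partially gapped or ambiguous codons as `X` are all assumptions made for illustration. It also assumes the alignment is codon-aligned within the CDS and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def trim_and_translate(aligned_seq, cds_start, cds_end):
    """Trim one aligned nucleotide sequence to the CDS and translate it.

    cds_start and cds_end are alignment columns (0-based, half-open);
    this coordinate convention is an assumption of the sketch.
    """
    cds = aligned_seq[cds_start:cds_end].upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon == "---":
            protein.append("-")  # a fully gapped codon stays a gap
        else:
            # Ambiguous or partially gapped codons become X.
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)
```

Applying the function to every row of a nucleotide alignment with the same CDS coordinates yields an amino acid alignment with one column per codon.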
Several curation steps were introduced during the previous process, including:
For visualization, the IAV sequence database was clustered, and phylogenetic trees were constructed. Clustering was performed on the nucleotide sequences using MMseqs2 with a 95% sequence identity threshold (--cluster-mode 2 --cov-mode 1 --min-seq-id 0.95). Representative sequences for each cluster and segment were aligned using MAFFT (v7.453) with default parameters. Maximum likelihood trees were inferred with IQ-TREE (v2.1.2) using the best-fit model determined by the Bayesian Information Criterion (BIC) and were midpoint-rooted for visualization purposes.
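As a rough sketch of how the clustering, alignment, and tree inference steps could be chained for one segment, the Python snippet below only builds the command lines; it does not run them. Several details are assumptions rather than the documented workflow: the use of the `mmseqs easy-cluster` workflow, the `iqtree2` executable name, the `-m MFP` model selection option (which, to our understanding, selects by BIC by default), and all file names. Midpoint rooting is not covered.

```python
def build_pipeline_commands(segment_fasta, prefix, tmp_dir="tmp"):
    """Return (cluster, align, tree) argument lists for one segment.

    File names and the choice of subcommands are illustrative
    assumptions, not the documented pipeline.
    """
    # Clustering flags are taken from the text above.
    cluster = [
        "mmseqs", "easy-cluster", segment_fasta, prefix, tmp_dir,
        "--cluster-mode", "2", "--cov-mode", "1", "--min-seq-id", "0.95",
    ]
    # easy-cluster writes cluster representatives to <prefix>_rep_seq.fasta.
    representatives = f"{prefix}_rep_seq.fasta"
    # MAFFT with default parameters; it writes the alignment to stdout,
    # so the caller is expected to redirect it to <prefix>_aln.fasta.
    align = ["mafft", representatives]
    # ModelFinder followed by maximum likelihood tree inference.
    tree = ["iqtree2", "-s", f"{prefix}_aln.fasta", "-m", "MFP"]
    return cluster, align, tree
```

Each list can be passed to `subprocess.run`, redirecting the MAFFT output to the alignment file that the tree command expects.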
Our database of mammalian adaptations is maintained by Daniel Goldhill.
Taxonium Sanderson, T. (2022). Taxonium, a web-based tool for exploring large phylogenetic trees. eLife, 11:e82392. https://doi.org/10.7554/eLife.82392.
MAFFT Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16. PMID: 23329690; PMCID: PMC3603318.
MMseqs2 Steinegger M and Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35, 1026–1028 (2017). DOI: 10.1038/nbt.3988.
IQ-TREE Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268–274. https://doi.org/10.1093/molbev/msu300
Nextalign Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A. (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773. https://doi.org/10.21105/joss.03773.