DiMA: sequence diversity dynamics analyser for viruses.

Tharanga S.; Ünlü ES.; Hu Y.; Sjaugi MF.; Çelik MA.; Hekimoğlu H.; Miotto O.; Öncel MM.; Khan AM.

Sequence diversity is one of the major challenges in the design of diagnostic, prophylactic, and therapeutic interventions against viruses. DiMA is a novel, big data-ready tool designed to facilitate the dissection of sequence diversity dynamics for viruses. Several features distinguish DiMA from other diversity analysis tools. DiMA provides a quantitative overview of sequence (DNA/RNA/protein) diversity using Shannon's entropy corrected for size bias, applied via a user-defined k-mer sliding window to an input alignment file; each k-mer position is then dissected into diversity motifs. The motifs are defined based on the probability of distinct sequences at a given k-mer alignment position: the index is the predominant sequence, and all other sequences are (total) variants of the index. The total variants are sub-classified into the major (most common) variant, minor variants (occurring more than once and with an incidence lower than the major), and unique (singleton) variants. DiMA allows user-defined sequence metadata enrichment for analyses of the motifs. The application of DiMA was demonstrated on alignment data of the relatively conserved Spike protein (2,106,985 sequences) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the relatively highly diverse pol gene (2637 sequences) of the human immunodeficiency virus-1 (HIV-1). The tool is publicly available as a web server (https://dima.bezmialem.edu.tr), as a Python library (via PyPI), and as a command line client (via GitHub).
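
The k-mer entropy and motif classification described in the abstract can be illustrated with a minimal Python sketch. This is not DiMA's implementation or API: the function names `kmer_columns`, `shannon_entropy`, and `classify_motifs` are hypothetical, the entropy shown is plain Shannon entropy without the size-bias correction, and the handling of ties (a variant tied with the major at a count above one is placed among the minor variants) is an assumption made here for illustration.

```python
import math
from collections import Counter


def kmer_columns(alignment, k):
    """Yield (start, kmers) for every sliding-window position of an alignment.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format; DiMA itself reads an alignment file).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    for start in range(length - k + 1):
        yield start, [seq[start:start + k] for seq in alignment]


def shannon_entropy(kmers):
    """Plain Shannon entropy (bits) of the k-mers at one position.

    Assumption: the size-bias correction mentioned in the abstract is omitted.
    """
    total = len(kmers)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(kmers).values())


def classify_motifs(kmers):
    """Split the distinct k-mers at one position into diversity motifs.

    Each motif is reported as (sequence, count). Ties are broken by first
    appearance, which is an assumption of this sketch.
    """
    ranked = Counter(kmers).most_common()  # descending by count
    index, variants = ranked[0], ranked[1:]
    major = variants[0] if variants else None
    rest = variants[1:]
    return {
        "index": index,                               # predominant sequence
        "major": major,                               # most common variant
        "minor": [v for v in rest if v[1] > 1],       # seen more than once
        "unique": [v for v in rest if v[1] == 1],     # singletons
    }


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    toy = ["MKTA"] * 4 + ["MKSA"] * 3 + ["MRTA"] * 2 + ["QKTA"]
    for start, kmers in kmer_columns(toy, k=3):
        print(start, round(shannon_entropy(kmers), 3), classify_motifs(kmers))
```

A fully conserved position yields an entropy of zero and no variants, while a position with many distinct k-mers yields a higher entropy and a longer list of minor and unique motifs.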

Original publication

DOI

10.1093/bib/bbae607

Type

Journal article

Journal

Briefings in Bioinformatics

Publication Date

11/2024

Volume

26

Addresses

Centre for Bioinformatics, School of Data Sciences, Perdana University, MAEPS Building, Jalan MAEPS Perdana, Serdang, Kuala Lumpur 50490, Malaysia.

Keywords

Humans, Viruses, HIV-1, Computational Biology, Algorithms, Software, Genetic Variation, COVID-19, SARS-CoV-2