ViruSurf: an integrated database to investigate viral sequences

Authored by
Arif Canakoglu, Pietro Pinoli, Anna Bernasconi, Tommaso Alfonsi, Damianos P. Melidis, Stefano Ceri
Abstract

ViruSurf, available at gmql.eu/virusurf/, is a large public database of viral sequences with integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes nucleotide and amino acid variants computed from the original sequences. A GISAID-specific ViruSurf database, available at gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from all four sources, but ViruSurf also contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described along their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes each sequence in terms of its annotations and variants. The web interface makes complex search queries simple to express: arbitrary queries can freely combine conditions on attributes from the four dimensions and extract the resulting sequences. Several example queries on the database confirm, and possibly improve, results from recent research papers; results can be recomputed over time and on selected populations. Effective search over large, curated sequence data may enable faster responses to future threats from new viruses.
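The abstract describes sequences characterized along four dimensions and queries that freely combine conditions across them. The Python sketch below illustrates that idea over an in-memory list of records. It is an assumption-laden illustration only: the record layout, attribute names, placeholder accessions and the `search` helper are all hypothetical and do not reflect ViruSurf's actual schema, API or web interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical record layout: attributes grouped by the four dimensions named
# in the abstract. Field names are illustrative, not ViruSurf's real schema.
@dataclass
class SequenceRecord:
    accession: str
    biological: Dict[str, str] = field(default_factory=dict)        # e.g. species, host
    technological: Dict[str, str] = field(default_factory=dict)     # e.g. sequencing technology
    organizational: Dict[str, str] = field(default_factory=dict)    # e.g. source database
    analytical: Dict[str, List[str]] = field(default_factory=dict)  # e.g. amino acid variants


# A condition is any predicate over a single record.
Condition = Callable[[SequenceRecord], bool]


def search(records: List[SequenceRecord], *conditions: Condition) -> List[str]:
    """Return accessions of records that satisfy every condition (logical AND)."""
    return [r.accession for r in records if all(c(r) for c in conditions)]


# Toy records with placeholder accessions (hypothetical data).
records = [
    SequenceRecord("SEQ1", {"species": "SARS-CoV-2", "host": "Homo sapiens"},
                   {"technology": "Illumina"}, {"source": "GenBank"},
                   {"aa_variants": ["Spike_D614G"]}),
    SequenceRecord("SEQ2", {"species": "SARS-CoV-2", "host": "Homo sapiens"},
                   {"technology": "Nanopore"}, {"source": "COG-UK"},
                   {"aa_variants": []}),
    SequenceRecord("SEQ3", {"species": "MERS-CoV", "host": "Camelus dromedarius"},
                   {"technology": "Illumina"}, {"source": "RefSeq"},
                   {"aa_variants": []}),
]

# One condition each from the biological, technological and analytical dimensions.
hits = search(
    records,
    lambda r: r.biological.get("species") == "SARS-CoV-2",
    lambda r: r.technological.get("technology") == "Illumina",
    lambda r: "Spike_D614G" in r.analytical.get("aa_variants", []),
)
print(hits)  # ['SEQ1']
```

Passing each condition as a separate predicate keeps the conjunction open-ended: any number of conditions from any dimension can be combined without changing `search` itself, and calling it with no conditions returns every record.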

Organizational unit(s)
L3S Research Center
External organization(s)
Politecnico di Milano
Type
Article
Journal
Nucleic Acids Research
Volume
49
Pages
D817-D824
ISSN
0301-5610
Publication date
8 January 2021
Publication status
Published
Peer-reviewed
Yes
ASJC Scopus subject areas
Genetics
Sustainable Development Goals
SDG 3 – Good Health and Well-being
Electronic version(s)
https://doi.org/10.1093/nar/gkaa846 (Access: Open)