CovRadar Help Page

CovRadar is a tool for molecular surveillance of the SARS-CoV-2 spike protein. The spike protein contains the receptor binding domain (RBD), which is a target for antibodies and vaccines.

The CovRadar webservice offers several analysis tools which are described in the following sections. The blue info buttons above the generated plots can provide additional explanations.

Please refer to CovRadar's application note for further information: CovRadar: Continuously tracking and filtering SARS-CoV-2 mutations for molecular surveillance

CovRadar Workflow Chart

This is a simplified overview of CovRadar's input, analytical pipeline, and web service. The analytical pipeline is described in detail in the Supplementary Methods of CovRadar's preprint.

Mutation Frequency by Location

The interactive map shows the spatial distribution of selected mutations within a given time span. If mutations are present, the plot displays a central data point for each country. For German sequences only, the zip codes of the sequencing labs (not the locations of the patients) are available and are used to provide a more precise spatial distribution within Germany.

Clicking on a data point triggers the subplots to show additional information about the mutation distribution and the time course of the mutation frequencies for a specific region.

Option | Description | Note
Mutation(s) of Concern | Predefined amino acid mutation selectable from a dropdown list | e.g. E484K, N501Y
Map Display Mode | [n-th most frequent mutation / Absolute frequencies]
n-th Most Frequent Mutation | e.g. 1 for the most frequent mutation, 2 for the 2nd most frequent mutation | Only available if display mode "n-th most frequent mutation" is selected
Method of Visualization | [Frequency / MOC Proportion / Increase] | 'MOC Proportion' and 'Increase' are only selectable if display mode is set to 'Absolute frequencies'
Number of Days | e.g. 30 means that the map shows the data aggregated over the last 30 days | The set interval is visible below the map
Y-axis Frequency Plot | Switches between logarithmic and linear y-axis scale of the Time Course of Mutation Frequency plot in the lower right corner
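
The "Number of Days" aggregation can be pictured as a simple date-window filter followed by a per-region count. The following Python sketch is for illustration only and is not CovRadar's actual implementation; the record layout (region, sampling date, set of mutations) and the function name are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

def mutation_frequency_by_region(records, mutation, end_date, days):
    """Share of sequences per region that carry `mutation` in the last `days` days.

    `records` is a hypothetical list of (region, sampling_date, mutations) tuples.
    """
    start_date = end_date - timedelta(days=days)
    total = Counter()
    hits = Counter()
    for region, sampled, mutations in records:
        # Keep only sequences sampled inside the window (start_date, end_date].
        if start_date < sampled <= end_date:
            total[region] += 1
            if mutation in mutations:
                hits[region] += 1
    return {region: hits[region] / total[region] for region in total}

records = [
    ("Germany", date(2021, 5, 20), {"E484K"}),
    ("Germany", date(2021, 5, 25), {"N501Y"}),
    ("France", date(2021, 5, 28), {"E484K", "N501Y"}),
    ("France", date(2021, 3, 1), {"E484K"}),  # outside the 30-day window
]
freq = mutation_frequency_by_region(records, "E484K", date(2021, 5, 31), 30)
print(freq)  # {'Germany': 0.5, 'France': 1.0}
```

Regions without any sequence in the window do not appear in the result, which mirrors the map showing a data point only where mutations are present.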

Data Distribution

The data distribution plot provides an overview of the available genome sequences per calendar week and country.

This makes it possible to identify over- or underrepresented countries in the dataset (and thus in the "Global" option of the other plots).

Option | Description | Note
Y-axis | [weekly sequences / accumulated sequences]
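
The two y-axis modes differ only in whether the weekly counts are summed up over time. A minimal Python sketch, assuming that "calendar week" means the ISO week of the sampling date (the help page does not state this) and using a hypothetical function name:

```python
from collections import Counter
from datetime import date
from itertools import accumulate

def weekly_and_accumulated(sampling_dates):
    """Return (weeks, weekly counts, accumulated counts) by ISO calendar week."""
    weekly = Counter()
    for d in sampling_dates:
        iso = d.isocalendar()
        weekly[(iso[0], iso[1])] += 1  # key: (ISO year, ISO week)
    weeks = sorted(weekly)
    counts = [weekly[w] for w in weeks]
    return weeks, counts, list(accumulate(counts))

dates = [date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 12), date(2021, 1, 20)]
weeks, counts, totals = weekly_and_accumulated(dates)
print(weeks)   # [(2021, 1), (2021, 2), (2021, 3)]
print(counts)  # [2, 1, 1]
print(totals)  # [2, 3, 4]
```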

Alternative Allele Frequency Plot

In the alternative allele frequency plot, each block represents a nucleotide in the multiple sequence alignment (MSA) of spike sequences. The coordinates on the y-axis are related to the MSA position.

The color shows the alternative allele frequency of the selected sequences compared to the selected reference sequence. The default reference is Wuhan-Hu-1 (NC_045512.2).

The mouse tooltip shows the MSA position, spike position, and codon position, the frequency and coverage, and, if selected, the MOC at this position.

Positions followed by a dot and an integer indicate insertions in the MSA, which correspond to gaps in the aligned Wuhan-Hu-1 sequence. Coverage is the number of sequences used to calculate the frequency at this position.

Note that degenerate bases for which it cannot be clearly determined whether they are an alternative allele or not are excluded.
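
The relationship between frequency, coverage, and excluded bases can be sketched in a few lines of Python. This is a simplified illustration, not CovRadar's pipeline: it drops every ambiguous base (the tool only excludes those that cannot be decided), it counts gaps as alleles, and the function name is an assumption.

```python
UNAMBIGUOUS = set("ACGT-")

def alt_allele_frequencies(reference, sequences):
    """Per-column (frequency, coverage) of non-reference alleles in an alignment.

    All strings are assumed to be rows of the same MSA with equal length.
    """
    result = []
    for column, ref_base in enumerate(reference):
        # Skip bases such as N or R that cannot be called unambiguously.
        observed = [s[column] for s in sequences if s[column] in UNAMBIGUOUS]
        coverage = len(observed)
        alt = sum(1 for base in observed if base != ref_base)
        result.append((alt / coverage if coverage else 0.0, coverage))
    return result

reference = "ACGT"
sequences = ["ACGT", "ATGT", "ANGA", "ACGT"]
freqs = alt_allele_frequencies(reference, sequences)
# Column 2 has coverage 3 because the N is excluded; one of three bases differs.
print(freqs[1])  # (0.3333333333333333, 3)
print(freqs[3])  # (0.25, 4)
```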

Consensus Sequence Options

Option | Description | Note
Calendar Week | Consensus sequence built from sequences of this calendar week | Wuhan-Hu-1 represents the reference sequence NC_045512.2
Country | Consensus sequence built from sequences of this country
  • Note: The number of sequences the consensus sequence is built from is displayed above the plot.

  • Changes to the options need to be confirmed by pressing the Apply Filters button, which triggers a recalculation of the plot.

Sequence Options

Option | Description | Note
Country | Only sequences from these countries are used for calculating the allele frequencies | To include all available countries, set this option to 'Global'
Origins | Only sequences from this database are used for calculating the allele frequencies
Host | Only sequences from these hosts are used for calculating the allele frequencies
Time Interval | Only sequences sampled in this time interval are used for calculating the allele frequencies
Lineages | Only sequences classified as the selected Pango lineage(s) are used for calculating the allele frequencies | Parent lineages contain all sublineages (e.g. B.1.617.2 is already considered when B.1 is selected). Using the "all" selector also considers unclassified sequences.
  • Note: The number of considered sequences is denoted above the plot.

  • Changes to the options need to be confirmed by pressing the Apply Filters button, which induces a recalculation of the plot.
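
The parent/sublineage rule of the Lineages filter can be approximated by comparing lineage names. The Python sketch below is an assumption about how such a filter could work, not CovRadar's code; in particular it does not resolve Pango aliases (for example, AY.4 is an alias for a B.1.617.2 sublineage), which would require an alias table.

```python
def matches_lineage(lineage, selected):
    """True if `lineage` equals `selected` or is one of its named sublineages.

    `lineage` is None for unclassified sequences (hypothetical convention).
    """
    if selected == "all":
        return True  # "all" also keeps unclassified sequences
    if lineage is None:
        return False
    # Appending the dot prevents B.1 from matching unrelated names such as B.11.
    return lineage == selected or lineage.startswith(selected + ".")

print(matches_lineage("B.1.617.2", "B.1"))  # True
print(matches_lineage("B.11", "B.1"))       # False
print(matches_lineage(None, "all"))         # True
```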

Annotation Options

Option | Description | Note
Domains | A specific physical region or amino acid sequence in a protein which is associated with a particular function or corresponding segment of DNA | e.g. RBD (Receptor Binding Domain)
Concerning Amino Acid Mutations | The selection of these amino acid changes marks the corresponding nucleotide positions in the plot. | e.g. E484K or common_delta. common_delta highlights nucleotide positions of mutations that are common in the VOC Delta (see outbreak.info).
Custom Codons | Highlights the corresponding nucleotide positions in the plot | e.g. 215;217-219
  • Note: These options don't have an influence on the allele frequency calculation. They only highlight specific regions within the heatmap.

  • The MOC information becomes visible by hovering over the respective nucleotide position while the domain identifiers are displayed on the right.
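
A Custom Codons entry such as 215;217-219 can be read as a semicolon-separated list of single codons and codon ranges. The following Python sketch shows one way to expand such an entry into the highlighted nucleotide positions; the function names are hypothetical and the parsing rules are inferred from the example above.

```python
def parse_custom_codons(spec):
    """Expand a spec such as '215;217-219' into a sorted list of codon numbers."""
    codons = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            codons.update(range(start, end + 1))  # ranges are inclusive
        else:
            codons.add(int(part))
    return sorted(codons)

def codon_to_nucleotides(codon):
    """1-based spike nucleotide positions covered by a 1-based codon number."""
    first = 3 * (codon - 1) + 1
    return [first, first + 1, first + 2]

print(parse_custom_codons("215;217-219"))  # [215, 217, 218, 219]
print(codon_to_nucleotides(215))           # [643, 644, 645]
```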

Nucleotide and Amino Acid Distribution

The nucleotide distribution plot shows the base distribution for each calendar week for the selected spike position and country.

The amino acid distribution plot is connected to the nucleotide distribution plot and shows the corresponding amino acid distribution for each calendar week with the same selection as for the nucleotide distribution.

Note that not every nucleotide change results in an amino acid change. For degenerate bases, only unambiguous amino acid changes are displayed; otherwise, they are treated as reference. Insertions cannot be shown yet. The colors of the two plots are unrelated.
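
The handling of degenerate bases can be illustrated by expanding each IUPAC code into its possible bases and checking whether all resulting codons encode the same amino acid. This Python sketch uses the standard genetic code and is an illustration under these assumptions, not CovRadar's implementation; gaps are not handled and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering of the three bases.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_codon(codon):
    """Return the amino acid if every expansion of the codon agrees, else None."""
    options = [IUPAC[base] for base in codon.upper()]
    amino_acids = {CODON_TABLE["".join(c)] for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else None

def displayed_amino_acid(codon, reference_codon):
    """Fall back to the reference amino acid when the codon is ambiguous."""
    amino_acid = translate_codon(codon)
    return amino_acid if amino_acid is not None else translate_codon(reference_codon)

print(translate_codon("GGN"))              # G (all four expansions encode glycine)
print(translate_codon("RAA"))              # None (AAA is K, GAA is E)
print(displayed_amino_acid("RAA", "GAA"))  # E
```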

Option | Description | Note
Y-axis | [count / frequencies] | Change between absolute and relative number of sequences per calendar week.
Country | Only sequences from this country are used for calculating the nucleotide and amino acid distributions | To include all available countries, set this option to 'Global'
Origins | Only sequences from this database are used for calculating the nucleotide and amino acid distributions
Host | Only sequences from these hosts are used for calculating the nucleotide and amino acid distributions
Lineages | Only sequences classified as the selected Pango lineage(s) are used for calculating the nucleotide and amino acid distributions | Parent lineages contain all sublineages (e.g. B.1.617.2 is already considered when B.1 is selected). Using the "all" selector also considers unclassified sequences.
Spike Position | 1-based nucleotide position within the spike protein based on reference Wuhan-Hu-1 (NC_045512.2)
  • Note: If the user is interested in certain MSA positions or codons, the position converter in the right-hand menu can be used to calculate the corresponding spike position.

Consensus Sequence Table

The table reports changes in consensus compared to Wuhan-Hu-1 (NC_045512.2) for each calendar week.

The table header shows the position with respect to the Multiple Sequence Alignment (msa) as well as with respect to Wuhan-Hu-1 (ref).

The row starting with "Wuhan-Hu-1" contains the corresponding reference bases with associated amino acid translations.

If the consensus base matches the reference base, it is marked with a dot. Deletions are marked by "-". Next to the calendar week is the sequence coverage. For each position the number of sequences supporting that base and the translated amino acid is provided.

Option | Description | Note
Country | Only sequences from this country are used for calculating the consensus sequences | To include all available countries, set this option to 'Global'
Spike Position Range | 1-based nucleotide position range within the spike protein based on reference Wuhan-Hu-1 (NC_045512.2)
Calendar Week Interval | Interval of calendar weeks for which the consensus sequences are generated
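
The notation of a table row (dot for a match, "-" for a deletion, and the number of supporting sequences) can be sketched as follows. The majority rule used here to pick the consensus base is an assumption, as is the function name; the amino acid translation shown in the real table is omitted for brevity.

```python
from collections import Counter

def consensus_row(reference, sequences):
    """Compare the most frequent base of each alignment column with the reference.

    Returns one (symbol, support) pair per column: '.' marks a match,
    otherwise the consensus base itself is shown ('-' for a deletion).
    """
    row = []
    for column, ref_base in enumerate(reference):
        counts = Counter(s[column] for s in sequences)
        base, support = counts.most_common(1)[0]
        row.append(("." if base == ref_base else base, support))
    return row

reference = "ACGT"
week_sequences = ["AC-T", "AT-T", "ATGT"]
row = consensus_row(reference, week_sequences)
print(row)  # [('.', 3), ('T', 2), ('-', 2), ('.', 3)]
```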

Browser Compatibility

Operating System | Firefox | Chrome | Edge | Safari
Windows 10 | 90 | 90 | 90 | n/a
macOS Monterey | 90 | 96 | 90 | 15
Ubuntu 20.04 | 93 | 96 | n/a | n/a

Note that these are the browser versions we specifically used for testing. Older versions will likely also work. Mobile browsers and Internet Explorer are generally not supported.

FAQ

What is the difference between nucleotide, codon and amino acid?

A nucleotide is a building block of DNA and RNA. DNA contains the four nucleobases A (adenine), G (guanine), C (cytosine), and T (thymine); RNA contains A, G, C, and U (uracil). A codon is a sequence of three nucleotides that is translated into an amino acid. Which codon translates to which amino acid can be looked up in the codon sun. Proteins are built from amino acids.

Why is T used instead of U even though SARS-CoV-2 is an RNA virus?

SARS-CoV-2 is an RNA virus. However, sequencing takes place after conversion to double-stranded complementary DNA, which is why the sequencer outputs T instead of U.

What is the difference between mutation, variant, strain and lineage?

A mutation is a change in the genetic sequence that is caused by errors in replication. Such nucleotide changes can in turn lead to changes in amino acids. Amino acids form proteins, and these have a variety of functions in the organism. For example, SARS-CoV-2 uses the spike protein to bind to the human host cell. Altering this protein can, for example, improve (but also worsen) its binding ability.

Variants are viruses that share a specific set of mutations. Strains are variants that lead to a change in viral properties, e.g. enhanced transmissibility. Depending on their severity, SARS-CoV-2 variants can be classified by WHO as VOC (variants of concern), VOI (variants of interest) and VUM (variants under monitoring). Lineages are usually used as a synonym for Pango nomenclature (e.g., B.1.1.7). They refer to a branch in the phylogenetic tree and help to classify the SARS-CoV-2 variants in an evolutionary context.

Where can I find Delta, Omicron, or other VOC and VOI?

With CovRadar, we are focusing on mutation information, independent of lineage assignment, to rapidly track SARS-CoV-2 evolution. There is a time delay before a new lineage is defined and especially with most sequences being Delta at the moment, it becomes increasingly important to monitor the pandemic at the mutation level.

With CovRadar you can investigate, for example, MOI S:L452R and MOC "Erik" S:E484K, two antibody escape mutations, or MOI "Nelly" S:N501Y, a mutation associated with better host receptor binding (Harvey et al.). For a good overview of which mutations are common in which VOCs, visit outbreak.info.

What is the difference between MSA, spike, genome and codon position?

MSA position is the position in the multiple sequence alignment calculated by CovRadar.

Spike position is the position within the gene that encodes the spike protein. The sequence refers to Wuhan-Hu-1 (NC_045512.2). Insertions are not numbered consecutively as in the MSA, but are indicated with ".X". For example, 100.2 describes the position of the second inserted nucleotide after the 100th position in the spike.
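
The ".X" notation can be derived from the aligned reference row of the MSA: every gap in the reference is an inserted column and receives the number of the preceding reference base plus a running index. The Python sketch below illustrates this under the assumption that the aligned reference covers only the spike gene; the function name is hypothetical.

```python
def spike_position_labels(aligned_reference):
    """Label every MSA column with its spike position.

    Gaps ('-') in the aligned reference are insertions and are labelled
    '<previous position>.<n>', e.g. '100.2' for the second inserted base.
    """
    labels = []
    position = 0   # number of reference bases seen so far (1-based position)
    insertion = 0  # running index within the current insertion
    for base in aligned_reference:
        if base == "-":
            insertion += 1
            labels.append(f"{position}.{insertion}")
        else:
            position += 1
            insertion = 0
            labels.append(str(position))
    return labels

labels = spike_position_labels("AT--GC")
print(labels)  # ['1', '2', '2.1', '2.2', '3', '4']
```

The MSA position of a column is simply its 1-based index in this list, so MSA position 4 corresponds to spike position 2.2 in the example.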

Genome position is the position within the full viral genome Wuhan-Hu-1 (NC_045512.2). Insertions are indicated as in the spike position.

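Codon position is the spike position converted to codon coordinates, i.e. the number of the codon (three consecutive nucleotides) that contains the nucleotide.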

You can use our Position Converter (found on the right side of each app) to convert between positions.
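
For positions without insertions, the conversions amount to simple arithmetic. The Python sketch below is not the Position Converter's code; it assumes that the spike gene starts at genome position 21563, a value taken from the GenBank annotation of NC_045512.2 rather than from this help page, and the function names are hypothetical.

```python
SPIKE_START = 21563  # assumed first nucleotide of the S gene in NC_045512.2

def spike_to_genome(spike_position):
    """1-based spike nucleotide position -> 1-based genome position."""
    return SPIKE_START + spike_position - 1

def genome_to_spike(genome_position):
    """1-based genome position -> 1-based spike nucleotide position."""
    return genome_position - SPIKE_START + 1

def spike_to_codon(spike_position):
    """1-based spike nucleotide position -> 1-based codon number."""
    return (spike_position - 1) // 3 + 1

# E484K is commonly reported at genome position 23012.
spike_position = genome_to_spike(23012)
print(spike_position)                  # 1450
print(spike_to_codon(spike_position))  # 484
print(spike_to_genome(1450))           # 23012
```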