Coronavirus genome – What do we know about the genes of the COVID-19 virus?

The recent coronavirus outbreak brought genomics to the center of the world’s attention: the genome of SARS-CoV-2, the virus that causes COVID-19, influences how the virus is transmitted and how severe the resulting disease is, and it directly determines which drug and vaccine strategies are possible. Fortunately, modern genome sequencing technology allowed a fast response, both in sequencing the virus and in tracking the pandemic.

Genome of the coronavirus

Genome length: 29–30 thousand nucleotides (over twice as long as the influenza genome)

Number of separate genes: 10, including 1 long gene (ORF1ab) and 9 shorter genes encoding structural and accessory proteins

Number of proteins: 29 proteins, including 16 non-structural proteins

Genome type: positive-sense single-stranded RNA (like dengue virus, but unlike influenza, whose RNA genome is negative-sense)
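Figures such as genome length can be checked directly on a downloaded genome. A minimal sketch, assuming the genome is saved as FASTA text (for example the NCBI reference record NC_045512.2); `read_fasta` and `genome_summary` are hypothetical helper names, not a library API. Sequence databases store RNA virus genomes in DNA letters, so T appears in place of U.

```python
from __future__ import annotations

from collections import Counter


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, upper-casing the sequence."""
    records: dict[str, str] = {}
    header: str | None = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def genome_summary(seq: str) -> dict[str, float]:
    """Length, GC fraction and number of ambiguous (non-ACGT) positions."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")  # databases store RNA genomes with T, not U
    gc = counts["G"] + counts["C"]
    return {
        "length": len(seq),
        "gc_fraction": gc / acgt if acgt else 0.0,
        "ambiguous": len(seq) - acgt,
    }


# Toy record; a real genome file would be read with open(...).read()
records = read_fasta(">toy example\nATGCGT\nNNAT\n")
print(genome_summary(records["toy example"]))
```

Run on the NC_045512.2 reference, `genome_summary` should report a length of 29,903 nucleotides, within the 29–30 thousand range given above.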

Two main genes of SARS-CoV-2

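ORF1ab gene – Encodes two large polyproteins (pp1a and pp1ab) that are cut into 16 non-structural proteins, which direct the infection once the virus has entered the cell. These proteins block antiviral pathways, shield viral replication from the rest of the cell inside membrane vesicles, replicate the viral genome, and help the virus evade detection by the host.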

Spike gene – Encodes the spike protein, the virus’s main tool for invading cells. The spike protein sits on the outer surface of the coronavirus and binds to the human ACE2 receptor, which is common on the surface of cells in the human lungs.
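Both genes are open reading frames (ORFs): stretches that begin at a start codon (ATG) and run in steps of three bases to a stop codon, after which the protein is read off codon by codon. A minimal sketch of both steps, assuming the standard genetic code (NCBI table 1) and scanning only the three forward frames, since the annotated genes of this positive-sense genome lie on that strand. `find_orfs` and `translate` are hypothetical helpers, not a library API. One caution: ORF1ab is read through a programmed -1 ribosomal frameshift, so a naive scan like this one will not recover ORF1ab as a single ORF.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' = stop, 'X' = unknown codon)."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """ATG...stop ORFs on the forward strand as 0-based (start, end-exclusive), longest first.

    min_codons counts codons including the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


toy = "CCATGAAATTTTAGCC"
for start, end in find_orfs(toy, min_codons=1):
    print(start, end, translate(toy[start:end]))
```

Applied to the S gene of a reference genome, `translate` should return the 1,273-residue spike protein followed by `*`; the default `min_codons=100` simply filters out the many short, chance ORFs a genome contains.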

Comparison of SARS-CoV-2 to SARS

The SARS virus (SARS-CoV) has a genome of similar length (29–30 thousand nucleotides) and analogous genes. The genomes of SARS-CoV and SARS-CoV-2 are about 82% identical. The largest differences lie in the part of the spike gene that encodes the receptor-binding region, and in two accessory open reading frames, ORF3b and ORF8.
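A figure like "82% identical" only makes sense after the two genomes have been aligned (for example with a multiple sequence aligner such as MAFFT), and it depends on how gaps are counted. A minimal sketch, assuming the sequences are already aligned with gaps written as `-`; the function names are hypothetical, and the convention here (a gap against a base counts as a mismatch, columns gapped in both are skipped) is one choice among several, so it will not necessarily reproduce the published number. The sliding-window version is one way to see where differences concentrate, such as the spike region.

```python
from __future__ import annotations


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity of two pre-aligned sequences: matching columns / columns not gapped in both."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def windowed_identity(
    aln_a: str, aln_b: str, window: int = 500, step: int = 100
) -> list[tuple[int, float]]:
    """Percent identity in sliding windows along the alignment as (window start, identity)."""
    return [
        (start, percent_identity(aln_a[start:start + window], aln_b[start:start + window]))
        for start in range(0, max(len(aln_a) - window, 0) + 1, step)
    ]


print(percent_identity("ACGT-A", "ACCTTA"))
print(windowed_identity("AAAATTTT", "AAAAGGGG", window=4, step=4))
```

Plotting the windowed values along a real SARS-CoV/SARS-CoV-2 alignment would show dips in the more divergent regions.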

Similarity at the whole-genome level can also be illustrated by trinucleotide frequencies, as demonstrated by V. Mallawaarachchi.
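A trinucleotide frequency profile counts every overlapping three-letter word (AAA, AAC, and so on up to TTT; 64 in total) along a genome and normalizes the counts. The sketch below is not the cited analysis: it assumes plain ACGT sequences, skips windows containing ambiguous bases, and uses cosine similarity between profiles as a hypothetical choice of comparison metric. Function names are illustrative only.

```python
from __future__ import annotations

import math
from itertools import product

# All 64 trinucleotides in a fixed order: AAA, AAC, ..., TTT
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping trinucleotide; windows with non-ACGT letters are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        word = seq[i:i + 3]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return {word: (n / total if total else 0.0) for word, n in counts.items()}


def cosine_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine of the angle between two 64-dimensional frequency profiles."""
    dot = sum(p[w] * q[w] for w in TRINUCLEOTIDES)
    norm_p = math.sqrt(sum(p[w] ** 2 for w in TRINUCLEOTIDES))
    norm_q = math.sqrt(sum(q[w] ** 2 for w in TRINUCLEOTIDES))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


profile = trinucleotide_frequencies("ACGTNACGT")
print(profile["ACG"], profile["CGT"])
```

Because frequency profiles of real genomes are all positive vectors, cosine similarity tends to be high even for unrelated genomes, so the per-trinucleotide differences are often more informative than the single score.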

Access to coronavirus genome sequences

GISAID collects sequenced strains from all over the world: “Genetic epidemiology of hCoV-19”

NCBI collects publicly available sequences in GenBank and raw sequencing reads in its Sequence Read Archive (SRA): “SARS-CoV-2 Sequence Read Archive”

bioRxiv and medRxiv host preprints analyzing COVID-19 data: “SARS-CoV-2 preprints”
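GenBank records can also be downloaded programmatically through NCBI's E-utilities. A minimal sketch using only the Python standard library and the `efetch` endpoint with `db=nuccore`, `rettype=fasta` and `retmode=text`; the accession NC_045512.2 (the RefSeq reference genome of the Wuhan-Hu-1 isolate) serves as an example, and the helper names are hypothetical. NCBI limits how often clients may call E-utilities, so bulk downloads should follow its usage guidelines. GISAID data requires a registered account and is not covered here.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an E-utilities efetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one record from NCBI (requires network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def fasta_sequence(text: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA text."""
    return "".join(
        line.strip() for line in text.splitlines() if line and not line.startswith(">")
    )
```

For example, `len(fasta_sequence(fetch_fasta("NC_045512.2")))` should return the reference genome length of 29,903 nucleotides.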

