During protein synthesis, ribosomes, which are nanomachines located in the cell, use mRNA molecules as templates for synthesizing polypeptide chains. We have a very good understanding of the DNA sequence that makes up human chromosomes. We also know how to convert this sequence into an mRNA sequence. However, knowledge of the nucleotide sequence alone is insufficient to determine the regions of the genome that encode proteins.
As early as 1961, F.H.C. Crick and his colleagues proposed that nucleic acid sequences could be divided into codons of three nucleotides. Depending on where the reading frame begins, these codons can be read in three different frames.
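To make the idea of three reading frames concrete, here is a minimal Python sketch. The function name `codons_in_frame` and the example sequence are made up for illustration and do not come from the article.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete three-nucleotide codons, starting at offset frame."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    # Stop at len(seq) - 2 so that a trailing incomplete codon is dropped.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


mrna = "GAUGGCCUAAG"  # hypothetical sequence, for illustration only
for frame in range(3):
    print(frame, codons_in_frame(mrna, frame))
```

The same eleven nucleotides give three different codon lists, and only the frame starting at offset 1 contains AUG.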
Analysis of the sequences revealed that some frames contain a sequence including the translation start codon (usually AUG), followed by codons determining the amino acids and the translation stop codon. This part of the reading frame is known as an open reading frame (ORF).
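A simple ORF search along these lines can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by any particular annotation tool: the function name `find_orfs` is hypothetical, only the three standard stop codons (UAA, UAG, UGA) are considered, and only the first start codon of each stretch is reported.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three standard stop codons


def find_orfs(seq, start_codon="AUG"):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == start_codon:
                start = i  # first start codon of a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in the same frame
    return orfs


mrna = "GAUGGCCUAAG"  # hypothetical sequence, for illustration only
for start, end, frame in find_orfs(mrna):
    print(frame, mrna[start:end])
```

A search like this only reports candidate sequences. It says nothing about whether a ribosome actually translates them.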
When analysing genome sequences, such sequences can be found everywhere, even in regions not used for polypeptide synthesis. Therefore, the presence of an ORF does not prove polypeptide synthesis. However, the presence of an ORF enables us to predict sequences that can be used as templates for protein synthesis.
Until relatively recently, it was believed that, in eukaryotic cells, one mRNA molecule encodes one polypeptide. The term 'coding DNA sequence' (CDS) was introduced to describe the encoding region. However, this understanding has changed as new experimental methods have come into widespread use.
The ribosome profiling method can be used to determine the RNA sequences that are 'protected' by the ribosome. Identifying these sequences allows us to experimentally determine which sequences the ribosome translates. It has been found that, during the synthesis of certain proteins, a shift in the reading frame, called a ribosomal frameshift, occurs. In such cases, two different reading frames are used to produce a single polypeptide chain.
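The effect of a frameshift on decoding can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name `decode_with_frameshift`, the example sequence, and the shift position are invented, and real frameshift sites have to be determined experimentally.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three standard stop codons


def decode_with_frameshift(seq, start, shift_at=None, shift=-1):
    """List the codons read from `start`, changing frame at most once.

    shift_at: index where the next codon of the original frame would begin
              (hypothetical input); None means no frameshift.
    shift:    -1 or +1 nucleotides applied at that point.
    """
    if shift not in (-1, 1):
        raise ValueError("shift must be -1 or +1")
    if shift_at is not None and (shift_at <= start or (shift_at - start) % 3 != 0):
        raise ValueError("shift_at must lie on a codon boundary after start")
    codons = []
    i = start
    shifted = False
    while True:
        if not shifted and i == shift_at:
            i += shift  # the ribosome slips back or forward by one nucleotide
            shifted = True
        codon = seq[i:i + 3]
        if len(codon) < 3 or codon in STOP_CODONS:
            break  # end of sequence or stop codon reached
        codons.append(codon)
        i += 3
    return codons


mrna = "AUGGCCUAAGGUGA"  # hypothetical sequence, for illustration only
print(decode_with_frameshift(mrna, 0))              # original frame only
print(decode_with_frameshift(mrna, 0, shift_at=6))  # -1 shift after the second codon
```

Without the shift, decoding ends at the UAA stop codon in the original frame. With the -1 shift, the ribosome bypasses that stop codon and continues in a second frame, so one polypeptide is built from codons in two frames.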
This mechanism is widespread in viruses but also occurs in cellular genes. Currently, it is not possible to annotate frameshifts when analysing genomes. To address this, pseudo-introns are introduced that do not actually occur in the genome. This results in an inaccurate and oversimplified representation of the genetic information and the molecular composition of the cells.
Therefore, there is a real need for a new term that allows for a more accurate description of the sequences used for polypeptide chain synthesis.
The term 'Translon' was proposed by 127 scientists to describe the sequences used for polypeptide chain synthesis. Translon is an abbreviation of the term 'translated region' and was first suggested by Suresh C. Goel in 1973. At that time, the term was not adopted.
However, Translon is compatible with the already-in-use terms Intron (“intragenic region”, or an intermediate sequence located within a gene that is removed during mRNA maturation) and Exon (“expressed region”, or the part of a gene that remains in the mature mRNA).
'Translon' is a general term referring to any region that is decoded by the ribosome. This region can be a minimal sequence, consisting only of the start and stop codons. Alternatively, it can be a long sequence, and it can shift from one reading frame to another. Unlike existing terms, 'translon' is defined directly by the process it describes.
Most ORFs are not translons because these sequences are not used in protein synthesis. Similarly, not all translons are ORFs, as they do not necessarily start with the AUG codon. However, all CDSs are translons, since protein synthesis occurs from these sequences.
The introduction of this new term should reduce confusion surrounding the definition of regions used for polypeptide chain synthesis, allowing for more accurate annotation of translation regions in the genome.
Read the full article on the Nature Methods journal's homepage.
Read more about the University of Tartu's eukaryotic protein synthesis research group.