RNA Splicing

RNA splicing removes introns from a primary transcript and joins the exons. An intron usually carries a clear splicing signal (e.g., in the β-globin gene). In some cases (e.g., the Tau gene), a splicing signal may be masked by a regulatory protein, resulting in alternative splicing. In rare cases (e.g., HIV genes), a pre-mRNA contains several ambiguous splicing signals, which give rise to several alternatively spliced mRNAs.

Splicing signal

Most introns start with the sequence GU and end with the sequence AG (in the 5' to 3' direction). These are referred to as the splice donor site and the splice acceptor site, respectively. However, the sequences at these two sites are not sufficient to signal the presence of an intron. Another important sequence is the branch site, located 20-50 bases upstream of the acceptor site. The consensus sequence of the branch site is "CU(A/G)A(C/U)", where the A at the fourth position (the branch point) is conserved in all genes.

In over 60% of cases, the exon ends with (A/C)AG immediately upstream of the donor site, and the next exon begins with G immediately downstream of the acceptor site.


Figure 5-A-4. The consensus sequence for splicing. Pu = A or G; Py = C or U.
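
The signals described above can be turned into a rough pattern search. The following Python sketch is illustrative only: the function name `find_candidate_introns`, the toy sequence, and the choice to measure the 20-50 base distance from the branch-point A to the AG of the acceptor site are assumptions, not part of the original text. Real splice-site recognition depends on far more context than these three motifs.

```python
import re

# Branch-site consensus from the text: CU(A/G)A(C/U).
# The A at the fourth position is the conserved branch point.
BRANCH_SITE = re.compile(r"CU[AG]A[CU]")


def find_candidate_introns(rna, min_gap=20, max_gap=50):
    """Return (intron_start, intron_end, branch_a_index) tuples.

    A candidate starts with GU, ends with AG, and contains a branch site
    whose A lies min_gap..max_gap bases upstream of the AG (an assumed
    way of measuring the distance). intron_end is exclusive.
    """
    rna = rna.upper().replace("T", "U")
    donors = [m.start() for m in re.finditer("GU", rna)]
    acceptors = [m.start() for m in re.finditer("AG", rna)]
    hits = []
    for donor in donors:
        for acceptor in acceptors:
            if acceptor <= donor + 2:
                continue  # the AG must lie downstream of the GU
            # Search for a branch site strictly between the GU and the AG.
            for b in BRANCH_SITE.finditer(rna, donor + 2, acceptor):
                branch_a = b.start() + 3
                if min_gap <= acceptor - branch_a <= max_gap:
                    hits.append((donor, acceptor + 2, branch_a))
    return hits


# Toy pre-mRNA (hypothetical, not a real gene): exon, intron, exon.
toy = "AUGCAG" + "GU" + "C" * 10 + "CUAAC" + "U" * 25 + "AG" + "GCCC"
for start, end, branch_a in find_candidate_introns(toy):
    print("candidate intron:", toy[start:end], "branch A at index", branch_a)
```

A sequence that has GU and AG but no branch site in the expected window yields no candidates, which mirrors the point that the donor and acceptor sequences alone are not sufficient.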

Splicing mechanism

The detailed splicing mechanism is quite complex. In short, it involves five snRNAs and their associated proteins. These ribonucleoproteins assemble into a large (60S) complex called the spliceosome. Then, in a two-step enzymatic reaction, the intron is removed and the two neighboring exons are joined together (see Alberts et al.). The branch point A residue plays a critical role in this reaction.


Figure 5-A-5. Schematic drawing of spliceosome formation during RNA splicing. U1, U2, U4, U5 and U6 denote snRNAs and their associated proteins. The U3 snRNA is not involved in RNA splicing, but is involved in the processing of pre-rRNA.

β-globin gene

Expression of the β-globin gene illustrates the typical splicing process. This gene contains two introns and three exons. Interestingly, the codon for the 30th amino acid, AGG, is interrupted by an intron. As a result, the first two nucleotides (AG) are in one exon and the third nucleotide (G) is in the next exon.


Figure 5-A-6. Expression of the human β-globin gene. U5 and U3 represent untranslated regions at the 5' and 3' end, respectively. Note that the mature β-globin protein does not contain the initiating methionine for protein synthesis.
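
The split codon can be illustrated with a toy example. The sequence below is invented for illustration and is not the real β-globin sequence; the helper names `splice` and `split_into_codons` are likewise hypothetical. The sketch only shows that removing an intron placed between the second and third nucleotide of a codon restores the intact codon AGG.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])  # exon segment before this intron
        pos = end
    pieces.append(pre_mrna[pos:])  # final exon segment
    return "".join(pieces)


def split_into_codons(mrna):
    """Split a coding sequence into consecutive triplets."""
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


# Toy sequence (hypothetical). Uppercase = exon, lowercase = intron.
upstream_exon = "GCC" + "AG"            # one full codon, then the first two bases of AGG
intron = "guaaguccuaac" + "u" * 24 + "ag"
downstream_exon = "G" + "CUG"           # third base of AGG, then one full codon

pre_mrna = upstream_exon + intron + downstream_exon
intron_start = len(upstream_exon)
intron_end = intron_start + len(intron)

mature = splice(pre_mrna, [(intron_start, intron_end)])
codons = split_into_codons(mature)
print(mature, codons)
```

In the standard genetic code, AGG encodes arginine, so the reading frame is preserved only if the intron is removed precisely at its boundaries.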

Tau gene MAPT

The Tau protein has six isoforms produced from a single gene through alternative RNA splicing (Figure 5-A-7). They differ in the number of inserts in the N-terminal half and the number of repeats in the C-terminal half. The number of inserts may be 0, 1 or 2, depending on whether exons 2 and/or 3 are included during RNA splicing. The number of repeats may be either 3 or 4. The 4-repeat (4R) Tau includes the second repeat, which is encoded by exon 10.

The repeat region is the microtubule binding domain. The 4R Tau binds to, assembles, and stabilizes microtubules more effectively than 3R Tau. In a healthy adult brain, the levels of 4R and 3R Tau proteins are approximately equal. Disruption of this balance can lead to neurodegenerative diseases such as Alzheimer's disease. The underlying mechanism is explained in this book.


Figure 5-A-7. The gene, mRNA and protein isoforms of Tau. In the Tau genomic structure (top panel), the black boxes represent constitutive exons, and the gray and empty boxes represent alternatively spliced exons. The middle panel shows the Tau mRNAs in adult human brain. Six mRNAs are generated by alternative splicing of exons 2, 3 and 10, which is indicated by alternative lines linking these exons. The lower panel shows the six isoforms of Tau in adult human brain. Gray boxes represent the N-terminal inserts (coded by exons 2 and 3) or repeats (coded by exons 9, 10, 11 and 12). The second repeat, coded by exon 10, is highlighted by a dark box. An isoform is commonly designated as xNyR, where x is the number of inserts and y is the number of repeats. [Source: Liu and Gong, 2008]
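
The xNyR naming can be made concrete by enumerating the combinations. This is a minimal sketch, assuming that a single insert (1N) corresponds to exon 2 alone, i.e. that exon 3 is included only together with exon 2; the text above does not state this explicitly. The names `INSERT_EXONS`, `REPEAT_EXONS` and `tau_isoforms` are hypothetical.

```python
from itertools import product

# Number of N-terminal inserts -> alternative exons included.
# Assumption (not stated in the text): exon 3 is only included along with exon 2.
INSERT_EXONS = {0: (), 1: (2,), 2: (2, 3)}

# Number of repeats -> alternative exon included (exon 10 encodes the second repeat).
REPEAT_EXONS = {3: (), 4: (10,)}


def tau_isoforms():
    """Map each xNyR isoform name to the alternative exons its mRNA includes."""
    isoforms = {}
    for (n, n_exons), (r, r_exons) in product(INSERT_EXONS.items(), REPEAT_EXONS.items()):
        isoforms[f"{n}N{r}R"] = n_exons + r_exons
    return isoforms


for name, exons in tau_isoforms().items():
    print(name, "includes alternative exons:", exons or "none")
```

Three insert states combined with two repeat states give the six isoforms described above.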

HIV-1 genome

The HIV-1 genome contains nine genes: gag, pol, vif, vpr, vpu, env, nef, rev and tat. Their protein products are all derived from a single primary transcript. This is achieved by three mechanisms: (i) alternative splicing, (ii) leaky scanning of the initiation codon, and (iii) ribosomal frameshifting.


Figure 5-A-8. Alternative splicing of the HIV-1 primary transcript. (i) is unspliced, (ii) to (iv) are singly spliced, and (v) and (vi) are doubly spliced. The resulting mRNAs (i), (iv) and (vi) are bicistronic. The star "*" indicates the location of the initiation codon (AUG).

The HIV genome contains several ambiguous splicing signals, resulting in several alternatively spliced mRNAs. They can be divided into three groups: (I) unspliced, (II) singly spliced, and (III) doubly spliced. As shown in the figure above, mRNAs (i), (iv) and (vi) are bicistronic (each encodes two proteins): mRNA (i) encodes the gag and pol proteins, mRNA (iv) encodes vpu and env, and mRNA (vi) encodes rev and nef.

Protein synthesis starts from the initiation codon (AUG) and ends with one of three stop codons. In HIV, mRNAs (iv) and (vi) each have two initiation codons, but the first is sometimes skipped (leaky scanning) so that the second protein can be synthesized. mRNA (i) has only one initiation codon; synthesis of its second protein (pol) is due to ribosomal frameshifting.
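
Leaky scanning can be sketched on a toy bicistronic mRNA. The sequence below is invented and much shorter than any real HIV mRNA, and the helper names `start_sites` and `orf_from` are hypothetical. The sketch only shows that starting at the first or the second AUG yields two different open reading frames; it does not model frameshifting.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def start_sites(mrna):
    """Return the index of every AUG in the mRNA, in 5' to 3' order."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def orf_from(mrna, start):
    """Read codons in frame from `start` until the first stop codon (not included)."""
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


# Toy bicistronic mRNA (hypothetical): two short ORFs, each with its own AUG.
toy_mrna = "GG" + "AUGGCUUAA" + "C" + "AUGCCCGGGUGA"

first_aug, second_aug = start_sites(toy_mrna)
print("ribosome starts at first AUG: ", orf_from(toy_mrna, first_aug))
print("first AUG skipped (leaky scan):", orf_from(toy_mrna, second_aug))
```

Which AUG a given ribosome uses is decided probabilistically in the cell; the sketch simply enumerates both outcomes.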