Molecular Biology: From DNA to RNA to Protein
6 Transcription in Prokaryotes
6.1 Introduction
In the preceding sections, we have discussed the replication of the cell’s DNA and the mechanisms by which the integrity of the genetic information is carefully maintained.
What do cells do with this information? How does the sequence in DNA control what happens in a cell?
If DNA is a giant instruction book containing all of the cell’s “knowledge” that is copied and passed down from generation to generation, what are the instructions for? How do cells use these instructions to make what they need?
Genes must be expressed
The “instructions” within DNA direct the cell to synthesize proteins. We saw in Chapter 2 that thousands of proteins carry out the functions of a cell and determine what the cell does.
Recall that proteins are polymers, or chains, of many amino acid building blocks. The linear order (primary structure) of amino acids differs between proteins and determines the way each protein will fold and its final shape. Therefore the “instructions” within the sequence of bases in DNA (that is, its sequence of A, T, C, G nucleotides) should correspond to the amino acid sequence of the protein to be made.
The sections of DNA that carry these instructions are called genes. The information transfer requires an intermediary: DNA is first transcribed into RNA, and the RNA is then translated to make protein (Figure 6.1).
All living cells express the genetic information in this way, from DNA to RNA to protein, a process so fundamental that it is referred to as the central dogma of molecular biology.
It will be beneficial to revisit the section “Basics of Gene Expression” from the chapter “Themes from Intro Bio”.
Gene Expression Is Regulated
Consider that all of the cells in a multicellular organism have arisen by division from a single fertilized egg and, therefore, all have the same DNA. Division of that original fertilized egg produces, in the case of humans, over a trillion cells by the time a baby is produced from that egg (that’s a lot of DNA replication!).
Yet, we also know that a baby is not a giant ball of a trillion identical cells, but has the many different kinds of cells that make up tissues like skin and muscle and bone and nerves.
How did cells that have identical DNA turn out so different? The answer lies in the REGULATION of GENE EXPRESSION which is the process by which the information in DNA is used.
Although all the cells in a baby have the same DNA, not all genes are transcribed at the same time within a cell and different cell types will transcribe different sets of genes!
In this chapter, we will learn first about how the cell converts DNA to RNA. In subsequent units, we will learn how transcription can be controlled.
Learning Objectives
Level 1 and Level 2 (Knowledge and Comprehension)
- Explain what is meant by functional RNA
- Draw, describe, or identify key features within a transcription unit.
- Explain the role of Sigma factor in bacterial transcription.
- Describe the three steps of transcription initiation that occur before the elongation phase begins.
- Explain the molecular mechanism behind the transition from the closed complex to the open complex during initiation (isomerization).
- Understand the meaning of promoter consensus sequences.
- Understand the different associations between RNA polymerase and DNA during transcription stages.
- Explain how termination of bacterial transcription occurs
- Distinguish between the 2 types of terminator sequences
- Explain the differences and similarities between RNA polymerase and DNA polymerase.
⊕ Level Up (Application, Analysis, Synthesis)
- When given an illustration that shows a portion of a gene undergoing transcription, with the template and coding strands labeled, and a DNA sequence, you should be able to:
- Indicate the direction in which RNA polymerase moves as it transcribes this gene.
- Write the polarity and sequence of the RNA transcript from the DNA sequence given.
- Identify the location of the promoter for the gene.
- Identify elements of a gene, when given a gene sequence from a database.
- Predict how mutations in promoter sequences and in genes coding for sigma factors would affect transcription.
6.2 Basics of Transcription
All the RNA in a cell is made by the process of transcription.
Are all RNAs in a cell protein-coding?
No! Cells produce many kinds of RNA! Some of the RNAs in the cell are functional, or non-coding, RNAs (ncRNA). (Figure 6.2)
They are “functional” because the RNA itself carries out important functions within the cell. These can be enzymatic, structural, or involved in the control of gene expression.
They are “non-coding” because they do not carry any code or instructions to make proteins.
The protein-coding RNAs are called messenger RNAs or mRNA for short.
Below is a chart depicting the kinds of RNAs found in Prokaryotic and Eukaryotic Cells and their functions.
Type of RNA | Found in | Function of RNA |
mRNA (messenger RNA) | Prokaryotes and Eukaryotes | Has the code for making proteins |
tRNA (transfer RNA) | Prokaryotes and Eukaryotes | Reads the code on mRNA during translation, and carries the appropriate amino acids into the ribosome for inclusion in the new protein. |
rRNA (ribosomal RNA) | Prokaryotes and Eukaryotes | Forms the core of ribosomes and catalyzes protein synthesis |
miRNA (microRNA) | Eukaryotes | Regulates gene expression |
lncRNA (long non-coding RNA) | Eukaryotes | Diverse functions, still being discovered! |
siRNA (short interfering RNA) | Eukaryotes | Provides protection from viruses |
In our discussion of Transcription and Translation, we will start by refining our idea of a gene as a “Transcription Unit” that makes RNA, recognizing that in many cases it ends there.
When the ultimate goal is protein, both transcription and translation are needed. We will return to the roles of RNA in the regulation of gene expression later, but for the remainder of the chapter, we focus on protein-coding genes and therefore on messenger RNA (mRNA).
The diagram below shows the central dogma in prokaryotes and eukaryotes. See if you can spot the differences between them as it relates to transcription by comparing the diagrams below!
Location:
Most mRNA transcripts in prokaryotes emerge from transcription ready to use! Transcription and translation are coupled, with the association of ribosomes with mRNA and the translation of a polypeptide beginning even before the transcript is finished. This is because these cells have no nucleus.
Eukaryotic transcripts must exit the nucleus before they encounter the ribosomes in the cytoplasm.
mRNA type:
Eukaryotic transcripts have to undergo additional processing by trimming, splicing, or both!
In contrast to eukaryotes, some bacterial genes are part of operons whose mRNAs encode multiple polypeptides.
Chromatin:
DNA in bacteria is virtually ‘naked’ in the cytoplasm while eukaryotic DNA is wrapped up in chromatin proteins in a nucleus.
RNA Synthesis Overview
Since we are coming straight from the unit on replication, it is useful to think about the two processes by comparing the two polymers (DNA vs. RNA). We can use that fundamental information to build a basic outline of transcription and its rules, as we did for replication.
Building an RNA strand is very similar to building a DNA strand. This is not surprising, knowing that DNA and RNA are very similar molecules. Transcription is catalyzed by an enzyme called RNA Polymerase. RNA Polymerase Enzymes (RNAPs) are found in all cells ranging from bacteria to humans.
There are several different kinds of RNA polymerases in eukaryotic cells, while in prokaryotes, a single type of RNA polymerase is responsible for all transcription.
RNA synthesis: The substrates
Like DNA polymerases, RNA polymerases synthesize new strands only in the 5′ to 3′ direction, but because they are making RNA, they use ribonucleotides (ribonucleoside triphosphates, NTPs) rather than deoxyribonucleotides. (Figure 6.4)
Ribonucleotides are joined in the same way as deoxyribonucleotides, i.e., the 3′-OH of the last nucleotide on the growing chain is joined to the 5′ phosphate on the incoming nucleotide to make a phosphodiester bond.
One important difference between DNA polymerases and RNA polymerases is that the latter do not require a primer to start making RNA.
RNA polymerases synthesize RNA in the 5′-3′ direction but do not need a primer.
RNA synthesis: The Template
Unlike DNA replication, however, only short sections of the genome (the genes) are transcribed. Different genes may be copied into RNA at different times in the cell’s life cycle. Further, our cells need billions of copies of proteins which will be made using these instructions.
Once RNA polymerases are in the right place to start copying DNA, the DNA unwinds; however, only one of the two strands is used as a template. This is easier to understand when we consider that even though DNA is double-stranded, only one of the two strands carries the instructions in the correct order for making protein! That strand is referred to as the CODING strand, the SENSE strand, or the plus (+) strand.
Remember, the goal of transcribing protein-coding genes is to generate an mRNA that carries the instructions accurately, such that the 5′ end of the code (DNA) is also represented at the 5′ end of the mRNA and, eventually, at the N-terminus of the protein when it is translated.
To do so the strand complementary to the coding strand must be used as TEMPLATE during transcription. The RNA is synthesized complementary to and antiparallel with the TEMPLATE strand of DNA.
The RNA transcript will therefore have the same sequence as the coding strand of the DNA, except that it has uracil (U) in place of thymine (T).
(Figure 6.5)
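The relationship between the coding strand, the template strand, and the transcript can be checked with a few lines of Python. This is a minimal sketch: the function names and the nine-base sequence are made up for illustration, and both DNA strands are written 5′ to 3′, as in the text.

```python
# Base-pairing rules: DNA with DNA, and DNA template with incoming RNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand (written 5'->3') for a coding strand (5'->3')."""
    # The strands are antiparallel, so reverse before complementing.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding_5to3))


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand (5'->3')."""
    # The template is read 3'->5' while the RNA grows 5'->3', so walk the
    # template string backwards and pair each DNA base with its RNA partner.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3))


coding = "ATGGCATTC"  # hypothetical coding-strand sequence, illustration only
template = template_from_coding(coding)
rna = transcribe(template)

print("coding   5'-" + coding + "-3'")
print("template 5'-" + template + "-3'")
print("RNA      5'-" + rna + "-3'")
print(rna == coding.replace("T", "U"))  # True: same as coding strand, U for T
```

Writing both strands 5′ to 3′ keeps the code consistent with the convention used for sequences in this chapter; the reversal inside each function is what captures the antiparallel arrangement.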
A DNA molecule contains many genes, and not all use the same physical strand for a template. The determination of sense and antisense is gene-specific!
(This is very important to understand!)
The overall process highlighting these aspects is depicted in the diagram below. Note that RNA polymerase moves along the DNA unwinding the DNA in front of it and the synthesized RNA transcript (which is single-stranded) is displaced. This allows the sections of DNA behind the polymerase to become double-stranded and another RNA polymerase can bind to the DNA to start transcription.
Thus multiple RNA polymerases can simultaneously transcribe a single gene leading to many copies of RNA transcript.
6.3 Genes are Transcription Units
As mentioned above, unlike in replication, only sections of DNA (genes) are transcribed into RNA.
How do RNA polymerases find these sections? Once they find these sections how do they determine which strand to use as the template?
How do RNA polymerases know where to start and stop?
Patterns in the DNA sequence indicate where RNA polymerase should start and end transcription. These sequences are recognized by the RNA polymerase or by proteins that help RNA polymerase determine where it should bind the DNA to start transcription and end transcription.
Promoter – The Start Signals
The promoter is the specific nucleotide sequence in the DNA of a gene to which RNA polymerase binds. This association positions the enzyme to start transcribing RNA at the appropriate place.
The expression of the gene is dependent on the binding of RNA polymerase to the promoter sequence to begin transcription. If the RNA polymerase and its helper proteins do not bind at the promoter, the gene cannot be transcribed and it will, therefore, not be expressed.
A promoter is always situated upstream of the starting site of RNA synthesis, or “before” the gene it controls. Therefore, the coding strand of a gene will always have the promoter sequence in the correct orientation (5′-3′).
Thus, on the coding strand, the promoter sequence lies before the gene.
Transcription Start Site:
Once the RNA polymerase has bound to the promoter, the first nucleotide that will be transcribed into RNA is the Transcription Start Site (TSS). It is given the number +1.
Note: the promoter is NOT part of the mRNA!
Coding Region (Open Reading Frame):
All nucleotides added after the TSS form the open reading frame or RNA-coding region.
KEY CONVENTION
The coding strand of DNA and the RNA transcript have the same sequence (except for T in DNA and U in RNA) in the same 5′-3′ direction. Therefore, by convention, gene, promoter, and regulatory sequences in DNA are written as they appear on the coding strand in the 5′-3′ direction! The open reading frame (ORF) of a gene is usually represented as an arrow indicating the direction in which the sense strand is read.
Terminator Sequences:
The DNA sequence that indicates the endpoint of transcription, where the RNA polymerase should stop adding nucleotides and dissociate from the template is known as a terminator sequence.
Terminators are usually part of the RNA-coding sequence; transcription stops only after the terminator has been copied into RNA.
Upstream and Downstream:
Upstream refers to sequences that lie in the direction opposite to transcription, that is, before the transcription start site; these nucleotides are assigned negative numbers. Nucleotides downstream of the transcription start site are assigned positive numbers.
There is no nucleotide numbered 0.
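Because the numbering skips zero, converting between a position in a sequence file and a TSS-relative coordinate is a common source of off-by-one errors. The sketch below shows one way to do the conversion; the function names and the choice of 0-based string indices are assumptions made for this illustration.

```python
def position_relative_to_tss(index: int, tss_index: int) -> int:
    """Convert a 0-based index on the coding strand to a TSS-relative position.

    The TSS itself is +1 and the base just upstream of it is -1.
    """
    offset = index - tss_index
    # Downstream bases (including the TSS) shift up by one to skip zero.
    return offset + 1 if offset >= 0 else offset


def index_from_position(position: int, tss_index: int) -> int:
    """Inverse conversion: TSS-relative position back to a 0-based index."""
    if position == 0:
        raise ValueError("there is no nucleotide numbered 0")
    return tss_index + position - 1 if position > 0 else tss_index + position


tss = 40  # hypothetical: the TSS is the 41st base of the written sequence
print(position_relative_to_tss(40, tss))  # 1   (the TSS)
print(position_relative_to_tss(39, tss))  # -1  (first upstream base)
print(index_from_position(-10, tss))      # 30
```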
When DNA sequences are written out, often the sequence of only one of the two strands is listed.
Molecular biologists typically write the sequence of the nontemplate strand (coding strand) because it will be the same as the sequence of the RNA transcribed from the template strand.
Much of the gene structure just described is broadly similar between eukaryotes and prokaryotes. These common elements largely result from the shared ancestry of cellular life over roughly 3.8 billion years of evolution.
Key differences in gene structure between eukaryotes and prokaryotes reflect their divergent transcription and translation machinery. These same patterns also help scientists identify the beginning, middle, and end of a gene when genomes are sequenced.
Key Takeaways
- During Transcription, only certain sections of the DNA are transcribed at any one time.
- RNA is transcribed from a single strand of DNA. Within a gene, only one of the two DNA strands—the template strand—is usually copied into RNA.
- Ribonucleoside triphosphates are used as the substrates in RNA synthesis.
- The transcribed RNA molecule is antiparallel and complementary to the DNA template strand. Transcription is always in the 5′→3′ direction, meaning that the RNA molecule grows at the 3′ end.
- The transcription unit contains all of the sequences that are necessary for both making the RNA as well as regulating it.
- Upstream of the start of the gene are Promoter sequences that are crucial for the binding of RNA polymerase to DNA.
- Terminator sequences signal the end of the gene; these are transcribed and found in the RNA.
6.4 Transcription in Bacteria
As in many of the earlier discussions, studies in bacteria (E. coli) gave us insight into the basic biochemical processes of transcription. We begin by discussing what we know about transcription in this simpler system, and then we will compare it with eukaryotes.
Prokaryotic RNA Polymerase
The bacterial RNAP core enzyme contains five major subunits (α2ββ′ω: two copies of alpha, plus beta, beta prime, and omega).
The core enzyme associates with a sixth subunit called the sigma factor (σ). The sigma factor helps the core enzyme bind to promoter sequences to initiate transcription. The sigma factor itself is not needed for any of the enzymatic activity of RNA synthesis and dissociates from the core enzyme once transcription has begun.
Together, the σ subunit and core polymerase make up what is termed the RNA polymerase holoenzyme.
The first sigma factor to be identified was sigma70 (σ70), so named because its molecular weight is 70 kDa. σ70 is the most common sigma factor in bacteria and is responsible for transcribing “housekeeping genes”, genes whose products are continuously needed for the regular functioning of cells.
How We Know
Biochemical assays combined with protein purification techniques of the kind you saw in Chapter 2 led to the identification of Bacterial RNA polymerase subunits.
It was found that RNA polymerase activity was associated with two protein species. A core polymerase (with subunit structure α2ββ′) can transcribe DNA into RNA inefficiently and nonspecifically. When the sigma subunit, σ70, is added, it can bind to the core forming a holoenzyme (α2ββ′σ) that is capable of specific engagement with duplex DNA at the beginning of genes (promoters) as well as efficient initiation of transcription. (1)
REQUIRED WATCHING Lecture Video 3 for the experiments that led to the identification and role of the sigma factor: L211 Introduction to Transcription (3): RNA polymerase
Before you continue you should
- Watch any Lecture videos that cover the material above if you have time.
- Do the embedded exercises.
- Complete associated problems.
The role of the sigma factor is to help RNA polymerase find ‘authentic’ genes, that is, those with promoter sequences. What are those sequences?
Bacterial Promoters
Because the same RNA polymerase has to bind to many different promoters, it would be predicted that promoters would have some similarities in their sequences. Scientists examined many genes and their surrounding sequences and as expected, common sequence patterns were seen to be present in many promoters.
The positions of these nucleotides relative to the transcription start site are also similar in most promoters.
This common sequence pattern is called a CONSENSUS SEQUENCE. It is important to understand that each nucleotide in a consensus sequence is simply the one that appeared at that position in the majority (consensus) of promoters examined and does not mean that the entire consensus sequence is found in all promoters.
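The idea of a consensus sequence can be made concrete by taking the most common base at each position of a set of aligned sites. The sketch below does this for five made-up hexamers; they are invented for illustration and are not taken from real promoters. Note that the consensus that comes out does not match any one of the input sequences exactly.

```python
from collections import Counter


def consensus(aligned_sites):
    """Return the most common base at each column of equal-length sequences."""
    if len({len(site) for site in aligned_sites}) != 1:
        raise ValueError("all sites must have the same length")
    result = []
    # zip(*sites) walks through the alignment one column at a time.
    for column in zip(*aligned_sites):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Invented hexamers, illustration only (not real promoter data).
toy_sites = ["TATGAT", "TACAAT", "GATAAT", "TATATT", "TTTAAT"]

print(consensus(toy_sites))               # TATAAT
print(consensus(toy_sites) in toy_sites)  # False
```

This simple majority vote ignores ties and does not record how strongly each position is conserved, which a fuller analysis would need to consider.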
Two common sequences act as bacterial promoters:
1) -10 sequence (Pribnow box): a 6 bp region located roughly (although not exactly) 10 bp upstream of the start site. The consensus sequence at this position is 5′-TATAAT-3′ (on the non-template or coding strand).
2) -35 sequence: a 6 bp region located roughly (although not exactly) 35 bp upstream of the start of transcription. The consensus sequence at this position is 5′-TTGACA-3′ (on the non-template or coding strand).
The distance between the two is 17-19 bp. While the exact sequence of this spacer is not important, its length is, with a separation of 17 nucleotides being optimal.
The figure below shows the most common prokaryotic promoter: the σ70 promoter, so called because it is recognized and bound by the σ70 sigma factor.
Notice the relationship between the various individual promoters and the consensus sequence. In general, those promoters with more matches to the consensus sequence are stronger.
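One rough way to compare promoters is to count how many positions match the two consensus hexamers. The sketch below scans a coding-strand sequence for the best-scoring pair of hexamers separated by 17-19 bp. The scoring scheme (one point per matching base), the function names, and the two test sequences are assumptions for illustration; real promoter strength depends on more than this count.

```python
MINUS_35 = "TTGACA"  # consensus hexamers from the text, coding strand 5'->3'
MINUS_10 = "TATAAT"


def matches(site: str, consensus: str) -> int:
    """Count positions where a site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter_match(coding_5to3: str, spacers=range(17, 20)):
    """Return (score, start of -35 hexamer, spacer length) for the best pair.

    The score runs from 0 to 12. Returns None if the sequence is too short.
    """
    best = None
    for spacer in spacers:
        span = 6 + spacer + 6
        for start in range(len(coding_5to3) - span + 1):
            box35 = coding_5to3[start:start + 6]
            box10 = coding_5to3[start + 6 + spacer:start + 12 + spacer]
            score = matches(box35, MINUS_35) + matches(box10, MINUS_10)
            if best is None or score > best[0]:
                best = (score, start, spacer)
    return best


# Hypothetical sequences: a perfect match and one with three mismatches.
strong = "CC" + "TTGACA" + "G" * 17 + "TATAAT" + "CCCCCC"
weak = "CC" + "TTGCCA" + "G" * 17 + "TATGTT" + "CCCCCC"

print(best_promoter_match(strong))  # (12, 2, 17)
print(best_promoter_match(weak))
```

A lower score here corresponds to fewer matches to the consensus, which, as described above, generally goes with a weaker promoter.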
What does it mean to be a stronger (or weaker) promoter? First, keep in mind that the expression of any given gene is not automatic, or 100%. At any point in time, many of a cell’s genes will be near 0% or shut off. Even genes that are turned on are transcribed at different rates.
Genes with strong promoters are transcribed frequently—as often as every 2 seconds in E. coli. The RNA polymerase holoenzyme (with sigma) is more likely to recognize the site, dock properly, open up the double helix, and begin transcribing.
In contrast, genes with very weak promoters are transcribed about once in 10 minutes. RNA polymerase can potentially recognize weaker promoters, but it is less likely to do so and instead tends to pass them by as just another unimportant stretch of DNA.
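To get a feel for how large this difference is, the two initiation rates can be compared directly. In the sketch below, the two intervals come from the text, while the 20-minute observation window is an arbitrary value chosen for illustration.

```python
def expected_transcripts(interval_s: float, window_s: float) -> float:
    """Number of initiations in a time window, assuming a steady rate."""
    return window_s / interval_s


strong_interval_s = 2        # about one initiation every 2 seconds (from the text)
weak_interval_s = 10 * 60    # about one initiation every 10 minutes (from the text)
window_s = 20 * 60           # assumed observation window of 20 minutes

fold_difference = weak_interval_s / strong_interval_s

print(expected_transcripts(strong_interval_s, window_s))  # 600.0
print(expected_transcripts(weak_interval_s, window_s))    # 2.0
print(fold_difference)                                    # 300.0
```

Under these assumptions, a strong promoter gives rise to about 300 times as many transcripts as a very weak one over the same period.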
We can now add promoter details to our Transcription Unit diagram from before.
Alternative Sigma Factors
Bacteria have multiple different sigma factors, each of which can associate with the catalytic core of RNAP and direct it to the correct DNA locations, where RNAP can then initiate transcription.
6.5 Steps in Transcription
Like DNA replication, transcription proceeds in three distinct and conceptually similar phases: Initiation, Elongation, and Termination.
6.5.1 Transcriptional Initiation
The events that comprise transcriptional initiation can be conceptually broken down into:
1) Recognition of the promoter, 2) Unwinding of the dsDNA and creation of the transcription bubble, and 3) Initial synthesis and escape of RNA polymerase from the promoter.
Below is a summary of the events:
a. The RNA polymerase holoenzyme, with the help of the sigma factor, binds to the promoter to form a closed complex; at this stage, there is no unwinding of DNA, and the DNA remains double-stranded (closed).
b. The closed complex is then converted to an “open” complex by the separation of the DNA strands. The unwound section of promoter DNA is known as the ‘transcription bubble’ and is about 12-14 base pairs long. The conversion of the closed complex to the open complex also requires the presence of the σ subunit.
Role of sigma factor
- Initial specific binding to the promoter by sigma factors of the holoenzyme sets in motion conformational changes that result in the separation of the two strands of DNA and expose a portion of the template strand.
- The sigma protein consists of different domains, with portions that interact with the −35 element and the −10 element.
- The structure of sigma is such that there is a recognition pocket for one nucleotide, the adenine at position −11 (A−11) of the duplex DNA. This base gets flipped into its recognition pocket, and this is thought to be the key event in the initiation of promoter melting and the formation of the transcription bubble.
- Once the transcription bubble has formed, transcription initiates.
c. Abortive initiation: Once the open complex has formed, the DNA template can begin to be copied, and the core polymerase adds nucleotides complementary to one strand of the DNA. The polymerase adds several nucleotides while still bound to the promoter, and without moving along the DNA template. Initially, short pieces of RNA a few nucleotides long may be made and released, without the polymerase leaving the promoter.
This is due in part to the contacts that the sigma factor still has with the promoter.
d. Promoter Escape: After several abortive initiation attempts, the polymerase synthesizes an RNA molecule from 9 to 12 nucleotides in length, which allows the polymerase to transition to the elongation stage.
The σ subunit also dissociates from the core enzyme, which ‘breaks free’ or ‘escapes’ into the gene.
6.5.2 Transcriptional Elongation
The core polymerase can move along the template, unwinding the DNA ahead of it to maintain a transcription bubble of 12-15 base pairs and synthesizing RNA complementary to one of the strands of the DNA.
RNA polymerases are by themselves processive (with no need for additional apparatus) adding hundreds or thousands of bases to the growing RNA at about 20-50 nucleotides per second.
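These rates translate into transcription times that are easy to estimate. The sketch below uses the 20-50 nucleotides per second range given above; the gene length of 1,000 nucleotides is a hypothetical value chosen for illustration, and the calculation ignores any pausing by the polymerase.

```python
def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to transcribe a region at a constant elongation rate."""
    return length_nt / rate_nt_per_s


gene_length_nt = 1000          # hypothetical gene length, illustration only
slow_rate, fast_rate = 20, 50  # nucleotides per second, range from the text

longest = transcription_time_s(gene_length_nt, slow_rate)
shortest = transcription_time_s(gene_length_nt, fast_rate)

print(f"{shortest:.0f}-{longest:.0f} seconds")  # 20-50 seconds
```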
Unlike DNA polymerases, RNA polymerase has no elaborate mechanism for proofreading, although research has shown that it is capable of a type of proofreading in the course of transcription.
When RNA polymerase incorporates a nucleotide that does not match the DNA template, it backtracks and cleaves the last two nucleotides.
6.5.3 Transcription Termination
As mentioned earlier, a sequence of nucleotides called the terminator is the signal to the RNA polymerase to stop transcription and dissociate from the template.
Some terminator sequences, known as intrinsic terminators, allow termination by RNA polymerase without the help of any additional factors, while others, called Rho-dependent terminators, require the assistance of a protein factor called rho (ρ).
How does the sequence of the terminator cause the RNA polymerase to stop adding nucleotides and release the transcript?
To understand this, it is useful to know that the terminator sequence precedes the last nucleotide of the transcript. In other words, the terminator is part of the end of the sequence that is transcribed.
Intrinsic terminators
Intrinsic terminators have two common features. First, they contain inverted repeats, which are sequences of nucleotides on the same strand that are inverted and complementary. When transcribed into RNA, the two halves of the repeat can base-pair with each other to form a hairpin structure.
Second, adjacent to the inverted repeat is a stretch of 7-9 adenines on the template strand of DNA (the coding strand will have a stretch of T’s). When transcribed, these become U’s in the RNA.
As RNA polymerase reaches and transcribes the terminator region the RNA will have a stem-loop structure. The secondary structure formed by the folding of the end of the RNA into the hairpin causes the RNA polymerase to pause!
Meanwhile, the run of U’s at the end of the hairpin permits the RNA-DNA hybrid in this region to come apart, because the base-pairing between the A’s in the DNA template and the U’s in the RNA is relatively weak.
This allows the transcript to be released from the DNA template and the RNA polymerase.
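The two features of an intrinsic terminator, an inverted repeat followed by a run of U’s, can be searched for in a transcript with a simple scan. The sketch below is deliberately simplified: it requires a perfect stem of fixed length, allows only Watson-Crick pairs, and uses loop sizes and a test sequence that are assumptions made for illustration, so it would miss many real terminators.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the sequence that would pair with seq in an antiparallel stem."""
    return "".join(RNA_PAIR[base] for base in reversed(seq))


def find_intrinsic_terminators(rna, stem=5, loop_sizes=range(3, 9), min_u=7):
    """Return (stem start index, loop length) for each hairpin plus U-run found."""
    hits = []
    for i in range(len(rna)):
        left = rna[i:i + stem]
        for loop in loop_sizes:
            right_start = i + stem + loop
            right = rna[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran off the end of the transcript
            if right != reverse_complement_rna(left):
                continue  # the two arms cannot form a perfect stem
            tail = rna[right_start + stem:right_start + stem + min_u]
            if tail == "U" * min_u:
                hits.append((i, loop))
    return hits


# Hypothetical transcript end: 5-base stem, 4-base loop, then eight U's.
with_terminator = "AAGC" + "GCCGC" + "UUCG" + "GCGGC" + "UUUUUUUU"
without_u_run = "AAGC" + "GCCGC" + "UUCG" + "GCGGC" + "ACGACGAC"

print(find_intrinsic_terminators(with_terminator))  # [(4, 4)]
print(find_intrinsic_terminators(without_u_run))    # []
```

The second sequence forms the same hairpin but lacks the U-run, so it is not reported, reflecting the point above that both features are characteristic of an intrinsic terminator.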
Rho-dependent termination
Transcription termination factor Rho is an essential protein in E. coli first identified for its role in transcription termination at Rho-dependent terminators and is estimated to terminate ~20% of E. coli transcripts. The rho gene is highly conserved and nearly ubiquitous in bacteria.
Rho is a helicase and consists of a hexamer of six identical monomers arranged in an open circle. The protein can separate the transcript from the template it is paired with.
As in intrinsic termination, rho-dependent termination requires the formation of a hairpin structure in the RNA that causes pausing of the RNA polymerase.
Meanwhile, rho binds to a region of the transcript called the rho utilization site (rut), a cytidine-rich and poorly structured RNA sequence, and moves along the RNA until it reaches the paused RNA polymerase.
It then acts on the RNA-DNA hybrid, releasing the transcript from the template.
Remember to:
- Watch the Lecture videos that cover the material above. This will help to clarify or reinforce certain concepts if they are unclear.
References and Attributions
This chapter contains material taken from the following CC-licensed content. Changes include rewording, removing paragraphs or replacing them with original material, and combining material from the sources.
1. Bergtrom, Gerald, “Cell and Molecular Biology 4e: What We Know and How We Found Out” (2020). Cell and Molecular Biology 4e: What We Know and How We Found Out – All Versions. 13.
https://dc.uwm.edu/biosci_facbooks_bergtrom/13
2. Works contributed to LibreTexts by Kevin Ahern and Indira Rajagopal. LibreTexts content is licensed by CC BY-NC-SA 3.0. The entire textbook is available for free from the authors at http://biochem.science.oregonstate.edu/content/biochemistry-free-and-easy
3. Flatt, P.M. (2019) Biochemistry – Defining Life at the Molecular Level. Published by Western Oregon University, Monmouth, OR (CC BY-NC-SA). Available at: https://wou.edu/chemistry/courses/online-chemistry-textbooks/ch450-and-ch451-biochemistry-defining-life-at-the-molecular-level/?preview_id=4919&preview_nonce=cca8f0ce36&preview=true