research on dna structure

NEWS AND VIEWS
09 October 2019

The structure of DNA

Georgina Ferry

Georgina Ferry is a science writer based in Oxford, UK. A revised edition of her biography Dorothy Crowfoot Hodgkin has just been published by Bloomsbury Reader.


On 25 April 1953, James Watson and Francis Crick announced¹ in Nature that they “wish to suggest” a structure for DNA. In an article of just over a page, with one diagram (Fig. 1), they transformed the future of biology and gave the world an icon: the double helix. Recognizing at once that their structure suggested a “possible copying mechanism for the genetic material”, they kick-started a process that, over the following decade, would lead to the cracking of the genetic code and, 50 years later, to the complete sequence of the human genome.


Nature 575, 35–36 (2019)

doi: https://doi.org/10.1038/d41586-019-02554-z

1. Watson, J. D. & Crick, F. H. C. Nature 171, 737–738 (1953).
2. Avery, O. T., MacLeod, C. M. & McCarty, M. J. Exp. Med. 79, 137–158 (1944).
3. Hershey, A. D. & Chase, M. J. Gen. Physiol. 36, 39–56 (1952).
4. Pauling, L., Corey, R. B. & Branson, H. R. Proc. Natl Acad. Sci. USA 37, 205–211 (1951).
5. Cochran, W., Crick, F. H. & Vand, V. Acta Crystallogr. 5, 581–586 (1952).
6. Vischer, E. & Chargaff, E. J. Biol. Chem. 176, 703–714 (1948).
7. Wilkins, M. H. F., Stokes, A. R. & Wilson, H. R. Nature 171, 738–740 (1953).
8. Franklin, R. E. & Gosling, R. G. Nature 171, 740–741 (1953).
9. Olby, R. Nature 421, 402–405 (2003).
10. Brenner, S. Proc. Natl Acad. Sci. USA 43, 687–694 (1957).
11. Crick, F. H. C. Symp. Soc. Exp. Biol. 12, 138–163 (1958).
12. Meselson, M. & Stahl, F. W. Proc. Natl Acad. Sci. USA 44, 671–682 (1958).
13. Lehman, I. R., Bessman, M. J., Simms, E. S. & Kornberg, A. J. Biol. Chem. 233, 163–170 (1958).
14. Nirenberg, M. W. & Matthaei, J. H. Proc. Natl Acad. Sci. USA 47, 1588–1602 (1961).
15. Crick, F. H. C., Barnett, L., Brenner, S. & Watts-Tobin, R. J. Nature 192, 1227–1232 (1961).
16. Sanger, F., Nicklen, S. & Coulson, A. R. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977).
17. Crick, F. H. C. The Astonishing Hypothesis: The Scientific Search for the Soul (Simon & Schuster, 1994).

Nature PastCast: The other DNA papers

Host: Kerri Smith

This is the Nature PastCast, each month raiding Nature’s archive and looking at key moments in science. In this show, we’re going back to the 1950s.

Music: I’ve Got the World on a String by Ella Fitzgerald

Voice of Nature: John Howe

From the Editorial and Publishing Offices of Nature, Macmillan and Co., St Martin’s Street, London. Nature, April 25th 1953.

Page 734, Microsomal particles of normal cow’s milk. Page 737, Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid, J. D. Watson and F. H. C. Crick.

Raymond Gosling

Walking into the lab and seeing this double helix, of course, it looked familiar because all of the data on the dimensions was the stuff that we got from our X-ray diffraction patterns. So, it looked right and it was sheer elegance.

I’m Raymond Gosling, co-author of one of the papers in Nature, 1953, April, on the structure of DNA.

Melinda Baldwin

My name is Melinda Baldwin. I’m a historian of science at the American Academy of Arts and Sciences in Cambridge, Massachusetts. I think a lot of people don’t necessarily know that there were three DNA papers instead of just the one, and I think the big reason that the Watson and Crick paper became the one that we do remember is because that’s the one where the structure of DNA was published, and I think as a consequence the second two papers have really fallen out a bit of consciousness. The Franklin and Gosling paper was primarily about crystallographic work.

Page 740, Rosalind E. Franklin and R. G. Gosling, King’s College London, Molecular Configuration in Sodium Thymonucleate.

Georgina Ferry

I’m Georgina Ferry. I’m a science writer and author. At the time, X-ray crystallography of large molecules – the sort of molecules that you get in living bodies – was still a very, very small field. It had really started in the 1930s. Everybody was interested in the structure of proteins back in the 30s because nobody thought that DNA could possibly be complicated enough to be the molecule of life. That wasn’t really discovered until the mid-40s and then, obviously, it became very important to study its structure.

The only time I could get at the X-ray set in King’s, the only one that existed, was in the basement of the chemistry department, and that was below the level of the Thames and I was only allowed to play with it in the evenings.

What you need is an X-ray source, which in those days would have been an X-ray tube. I mean it was a form of technology that was available from the 19th century but it’s a tube full of gas that you run an electric current through and it emits X-rays, and then in order to study your molecule, the thing you’re interested in, you have to crystallise it. You surround that, in the early days, with photographic film so that when the X-rays come in, they hit the atoms in the crystal and they’re diffracted out and they make spots on the photographic film.

I needed lots of fibres. One would produce the diffraction pattern so weak that you’d never see it, so I wound 35 fibres round a paperclip and then pushed the clip open a bit to make the fibres taut.

Sodium thymonucleate fibres give two distinct types of X-ray diagram. The first corresponds to a crystalline form, structure A. At higher humidities, a different structure, structure B, appears.

And the best structure B pattern we ever got is photo 51, which I took and was called 51 because that was the 51st photograph that we’d taken, Rosalind and I, in our efforts to sort out this A and B difference.

It’s a really beautiful photo. It’s very crisp, it’s very clean, it’s got this really neat ‘X’ shape, and apparently if you know something about crystallography, this photo just screams helix.

What is puzzling, I think is still puzzling, is why they didn’t pursue that photograph once they had it.

Now, Rosalind was absolutely determined that there was so much information in structure A’s diffraction pattern that was what she wanted to do and therefore put this photo 51 on one side and said we’ll come back to that. I only wish I’d been able to plug the value of looking at structure B as well as structure A.

Ella Fitzgerald – I’ve Got the World on a String

So, Rosalind Franklin was working with Maurice Wilkins but the two of them had a pretty bad working relationship. Apparently, Franklin thought that she was being brought to King’s College London as an independent investigator who would be in charge of her own research. Wilkins thought that she was being brought in as an assistant, and eventually the relationship grew so fraught that Franklin stopped showing him her data, and she was planning on moving to Birkbeck College. Somehow, Wilkins got a copy of photo 51.

I took it down the corridor and gave it to him because it had reached the stage now when Rosalind was going to leave, so she suggested that I go down the corridor and give this beautiful structure B pattern, this photo 51, to Maurice. Maurice couldn’t believe it when I offered it to him. He couldn’t believe that I hadn’t stolen it from her desk. He didn’t think that she could ever offer him something as interesting as this. He’d only had it for two or three days when Watson chipped up.

He showed it to James Watson when James came down to visit him and to chat a little bit about DNA.

Who of course knew what a helical diffraction pattern would look like because Crick had two years previously published a theoretical paper of what the diffraction pattern of a helix would look like.

Watson’s got this great passage in The Double Helix where he said my pulse sped up and my heart began to race because he looked at this photo and realised immediately that DNA was helical and that he knew what size the turns had to be. So, this photo contained all of the information that he needed to build the model that he and Crick ended up being famous for.

We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has two helical chains, each coiled round the same axis.

So, it was pretty out of order for Watson and Crick to start working on DNA because they knew full well that Maurice Wilkins was working on it at King’s and subsequently Rosalind Franklin joined him there and she was also working on it. But it was King’s’ problem, and there was very much a sort of unspoken gentleman’s agreement – it would be understood that a particular group or lab was working on one problem and you wouldn’t then go and do that one.

You didn’t go to work on another man’s problem, especially if he’d got a whole team working on it.

In the Watson and Crick paper, it’s not credited. Watson and Crick say they were stimulated by a general knowledge of the unpublished results of Wilkins and Franklin.

We have been stimulated by a knowledge of the general nature of the unpublished experimental results and ideas of Dr Wilkins, Dr Franklin and…

But they don’t cite photo 51 specifically and then Franklin and Gosling, in their paper, say this photo clearly supports the model that Watson and Crick had put forth.

Rosalind’s reaction was, I think, typical of Rosalind. She wasn’t furious or didn’t use the word ‘scooped’. What she actually said was we all stand on each other’s shoulders. We had this second-, third-prize feeling that we were within a millimetre or two of the right answer ourselves.

So, Watson and Crick had their paper ready to go. They had the structure solved. They wanted to publish it in Nature. Apparently, John Randall, the uber-head of the King’s College London Laboratory, was a member of The Athenaeum, the British social club in London, and so was L. J. F. Brimble, then one of the co-editors of Nature. So, apparently, Brimble approached Randall to say well, we’ve got this paper under consideration, don’t you want the King’s work represented as well? And I think Watson and Crick and Wilkins had already agreed that they would publish two papers side-by-side. Wilkins sort of knew that his work was going to be outshone by Watson and Crick, but he certainly wanted it published. And then apparently after the two of them had agreed to publish the two papers together, Rosalind Franklin said, well, I want a paper on the crystallographic work that Ray Gosling and I did in there as well, and so it was really by conversation between the editors and the heads of the laboratories that the editors agreed to print these papers as quickly as possible. So, famously, the three DNA papers were not peer-reviewed. I think that was quite typical for the Brimble-and-Gale editorship, that they placed a lot of trust in particular laboratory heads and particular friends in the British scientific community and so if Lawrence Bragg said that something was good and important, they were going to print it.

There wasn’t a huge fuss made, even within science, about the DNA structure until probably the early 60s when the code began to be cracked because obviously – as Watson and Crick famously said –

Voice of Nature: John Howe

It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.

But the actual code wasn’t cracked until the early 60s, and that was when the power of this discovery really started to make a big difference.

Elsewhere in Nature, Page 757, Appointments vacant. Physicists wanted for fundamental research on felt and applied research of the felt-making industry, The British Hat and Allied Felt-makers Research Association, Manchester.

Page 716, Department of Scientific and Industrial Research UK, The gross expenditure of the department was £5.5 million as against £5 million in the previous year.

The climbing of Mount Everest and the coronation of the Queen and all these things came together so that ’53 in that lab was seen as an almost miraculous time.

Everywhere you looked you could see that it fitted a double helix. It was uncanny. It just screamed at you. I’ve often asked how long would it have been before we as a group saw that and I really don’t know the answer to that. It was a stroke of genius on his part.

Nature. Annual subscription £6. Payable in advance. Postage paid to any part of the world.

Kerri Smith

The Nature PastCast was produced by me, Kerri Smith, with contributions from Raymond Gosling, writer Georgina Ferry and historian Melinda Baldwin. In episode two of this twelve-part series on the history of science, we’re heading back to the 1980s.




9.1: The Structure of DNA


In the 1950s, Francis Crick and James Watson worked together at the University of Cambridge, England, to determine the structure of DNA. Other scientists, such as Linus Pauling and Maurice Wilkins, were also actively exploring this field. Pauling had discovered the secondary structure of proteins using X-ray crystallography. X-ray crystallography is a method for investigating molecular structure by observing the patterns formed by X-rays shot through a crystal of the substance. The patterns give important information about the structure of the molecule of interest. In Wilkins’ lab, researcher Rosalind Franklin was using X-ray crystallography to understand the structure of DNA. Watson and Crick were able to piece together the puzzle of the DNA molecule using Franklin's data (Figure 9.1.1). Watson and Crick also had key pieces of information available from other researchers, such as Chargaff’s rules. Chargaff had shown that of the four kinds of monomers (nucleotides) present in a DNA molecule, two types were always present in equal amounts and the remaining two types were also always present in equal amounts. This meant they were always paired in some way. In 1962, James Watson, Francis Crick, and Maurice Wilkins were awarded the Nobel Prize in Physiology or Medicine for their work in determining the structure of DNA.

Photo in part A shows James Watson, Francis Crick, and Maclyn McCarty. The X-ray diffraction pattern in part B is symmetrical, with dots in an X-shape.

Now let’s consider the structure of the two types of nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The building blocks of DNA are nucleotides, which are made up of three parts: a deoxyribose (5-carbon sugar), a phosphate group, and a nitrogenous base (Figure 9.1.2). There are four types of nitrogenous bases in DNA. Adenine (A) and guanine (G) are double-ringed purines, and cytosine (C) and thymine (T) are smaller, single-ringed pyrimidines. The nucleotide is named according to the nitrogenous base it contains.

The phosphate group of one nucleotide bonds covalently with the sugar molecule of the next nucleotide, and so on, forming a long polymer of nucleotide monomers. The sugar–phosphate groups line up in a “backbone” for each single strand of DNA, and the nucleotide bases stick out from this backbone. The carbon atoms of the five-carbon sugar are numbered clockwise from the oxygen as 1', 2', 3', 4', and 5' (1' is read as “one prime”). The phosphate group is attached to the 5' carbon of one nucleotide and the 3' carbon of the next nucleotide. In its natural state, each DNA molecule is actually composed of two single strands held together along their length with hydrogen bonds between the bases.

Watson and Crick proposed that the DNA is made up of two strands that are twisted around each other to form a right-handed helix, called a double helix. Base-pairing takes place between a purine and pyrimidine: namely, A pairs with T, and G pairs with C. In other words, adenine and thymine are complementary base pairs, and cytosine and guanine are also complementary base pairs. This is the basis for Chargaff’s rule; because of their complementarity, there is as much adenine as thymine in a DNA molecule and as much guanine as cytosine. Adenine and thymine are connected by two hydrogen bonds, and cytosine and guanine are connected by three hydrogen bonds. The two strands are anti-parallel in nature; that is, one strand will have the 3' carbon of the sugar in the “upward” position, whereas the other strand will have the 5' carbon in the upward position. The diameter of the DNA double helix is uniform throughout because a purine (two rings) always pairs with a pyrimidine (one ring) and their combined lengths are always equal (Figure 9.1.3).
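The pairing rules described above can be illustrated with a short Python sketch. It is not part of the original text: the function names and the example sequence are assumptions chosen purely for illustration. The sketch builds the complementary, anti-parallel partner of a strand, counts hydrogen bonds using the two-per-A–T and three-per-G–C figures given above, and checks that the resulting double-stranded molecule satisfies Chargaff’s rule.

```python
from collections import Counter

# Pairing rules from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds contributed by the base pair each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand read 5'->3'.

    The strands are anti-parallel, so the complement is reversed
    to keep the conventional 5'->3' reading direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


def duplex_composition(strand: str) -> Counter:
    """Count each base across both strands of the double helix."""
    return Counter(strand + reverse_complement(strand))


if __name__ == "__main__":
    top = "ATGCGTAC"  # hypothetical 5'->3' sequence
    print("partner strand:", reverse_complement(top))
    print("hydrogen bonds:", hydrogen_bonds(top))
    counts = duplex_composition(top)
    # Chargaff's rule: A equals T and G equals C in the duplex.
    print("A == T:", counts["A"] == counts["T"])
    print("G == C:", counts["G"] == counts["C"])
```

Note that the equalities hold for the double-stranded molecule as a whole; a single strand taken alone need not contain equal amounts of A and T.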

Part A shows an illustration of a DNA double helix, which has a sugar phosphate backbone on the outside and nitrogenous base pairs on the inside. Part B shows base-pairing between thymine and adenine, which form two hydrogen bonds, and between guanine and cytosine, which form three hydrogen bonds.

The Structure of RNA

There is a second nucleic acid in all cells called ribonucleic acid, or RNA. Like DNA, RNA is a polymer of nucleotides. Each of the nucleotides in RNA is made up of a nitrogenous base, a five-carbon sugar, and a phosphate group. In the case of RNA, the five-carbon sugar is ribose, not deoxyribose. Ribose has a hydroxyl group at the 2' carbon, unlike deoxyribose, which has only a hydrogen atom (Figure 9.1.4).

A figure showing the structure of ribose and deoxyribose sugars. In ribose, the OH at the 2' position is highlighted in red. In deoxyribose, the H at the 2' position is highlighted in red.

RNA nucleotides contain the nitrogenous bases adenine, cytosine, and guanine. However, they do not contain thymine, which is instead replaced by uracil, symbolized by a “U.” RNA exists as a single-stranded molecule rather than a double-stranded helix. Molecular biologists have named several kinds of RNA on the basis of their function. These include messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA)—molecules that are involved in the production of proteins from the DNA code.
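To make the thymine/uracil difference concrete, the following Python sketch writes the RNA-alphabet equivalent of a DNA sequence by substituting U for T. It only illustrates the difference in base alphabet and does not model how cells actually synthesize RNA; the function name and the example sequence are hypothetical.

```python
DNA_BASES = set("ACGT")


def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (U replaces T)."""
    unknown = set(dna) - DNA_BASES
    if unknown:
        # Reject anything that is not one of the four DNA bases.
        raise ValueError(f"not DNA bases: {sorted(unknown)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    print(dna_to_rna("ATGCGTAC"))  # hypothetical sequence
```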

How DNA Is Arranged in the Cell

DNA is a working molecule; it must be replicated when a cell is ready to divide, and it must be “read” to produce the molecules, such as proteins, to carry out the functions of the cell. For this reason, the DNA is protected and packaged in very specific ways. In addition, DNA molecules can be very long. Stretched end-to-end, the DNA molecules in a single human cell would come to a length of about 2 meters. Thus, the DNA for a cell must be packaged in a very ordered way to fit and function within a structure (the cell) that is not visible to the naked eye. The chromosomes of prokaryotes are much simpler than those of eukaryotes in many of their features (Figure 9.1.5). Most prokaryotes contain a single, circular chromosome that is found in an area in the cytoplasm called the nucleoid.

Illustration shows a eukaryotic cell, which has a membrane-bound nucleus containing chromatin and a nucleolus, and a prokaryotic cell, which has DNA contained in an area of the cytoplasm called the nucleoid. The prokaryotic cell is much smaller than the eukaryotic cell.

The size of the genome in one of the most well-studied prokaryotes, Escherichia coli, is 4.6 million base pairs, which would extend a distance of about 1.6 mm if stretched out. So how does this fit inside a small bacterial cell? The DNA is twisted beyond the double helix in what is known as supercoiling. Some proteins are known to be involved in the supercoiling; other proteins and enzymes help in maintaining the supercoiled structure.
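The 1.6 mm figure can be checked with a little arithmetic. The Python sketch below assumes a rise of about 0.34 nm per base pair (the commonly quoted value for B-form DNA) and a cell length of roughly 2 µm; neither number comes from the text above, and both are marked as assumptions in the code.

```python
RISE_PER_BP_NM = 0.34         # assumption: approximate rise per base pair, B-form DNA
E_COLI_GENOME_BP = 4.6e6      # genome size quoted in the text
E_COLI_CELL_LENGTH_UM = 2.0   # assumption: rough length of an E. coli cell


def contour_length_mm(base_pairs: float) -> float:
    """Length of fully stretched DNA in millimetres."""
    length_nm = base_pairs * RISE_PER_BP_NM
    return length_nm * 1e-6  # 1 mm = 1e6 nm


if __name__ == "__main__":
    length_mm = contour_length_mm(E_COLI_GENOME_BP)
    cell_mm = E_COLI_CELL_LENGTH_UM * 1e-3  # 1 mm = 1e3 micrometres
    print(f"stretched genome: {length_mm:.2f} mm")
    print(f"roughly {length_mm / cell_mm:.0f} times the cell length")
```

Under these assumptions the stretched chromosome is several hundred times longer than the cell that contains it, which is why supercoiling and other packaging are needed.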

Eukaryotes, whose chromosomes each consist of a linear DNA molecule, employ a different type of packing strategy to fit their DNA inside the nucleus (Figure 9.1.6). At the most basic level, DNA is wrapped around proteins known as histones to form structures called nucleosomes. The DNA is wrapped tightly around the histone core. This nucleosome is linked to the next one by a short strand of DNA that is free of histones. This is also known as the “beads on a string” structure; the nucleosomes are the “beads” and the short lengths of DNA between them are the “string.” The nucleosomes, with their DNA coiled around them, stack compactly onto each other to form a 30-nm–wide fiber. This fiber is further coiled into a thicker and more compact structure. At the metaphase stage of mitosis, when the chromosomes are lined up in the center of the cell, the chromosomes are at their most compacted. They are approximately 700 nm in width, and are found in association with scaffold proteins.

In interphase, the phase of the cell cycle between mitoses at which the chromosomes are decondensed, eukaryotic chromosomes have two distinct regions that can be distinguished by staining. There is a tightly packaged region that stains darkly, and a less dense region. The darkly staining regions usually contain genes that are not active, and are found in the regions of the centromere and telomeres. The lightly staining regions usually contain genes that are active, with DNA packaged around nucleosomes but not further compacted.

Illustration shows levels of organization of eukaryotic chromosomes, starting with the DNA double helix, which wraps around histone proteins. The entire DNA molecule wraps around many clusters of histone proteins, forming a structure that looks like beads on a string. The chromatin is further condensed by wrapping around a protein core. The result is a compact chromosome, shown in duplicated form.

CONCEPT IN ACTION

Watch this animation of DNA packaging.

The model of the double-helix structure of DNA was proposed by Watson and Crick. The DNA molecule is a polymer of nucleotides. Each nucleotide is composed of a nitrogenous base, a five-carbon sugar (deoxyribose), and a phosphate group. There are four nitrogenous bases in DNA, two purines (adenine and guanine) and two pyrimidines (cytosine and thymine). A DNA molecule is composed of two strands. Each strand is composed of nucleotides bonded together covalently between the phosphate group of one and the deoxyribose sugar of the next. From this backbone extend the bases. The bases of one strand bond to the bases of the second strand with hydrogen bonds. Adenine always bonds with thymine, and cytosine always bonds with guanine. The bonding causes the two strands to spiral around each other in a shape called a double helix. Ribonucleic acid (RNA) is a second nucleic acid found in cells. RNA is a single-stranded polymer of nucleotides. It also differs from DNA in that it contains the sugar ribose, rather than deoxyribose, and the base uracil rather than thymine. Various RNA molecules function in the process of forming proteins from the genetic code in DNA.

Prokaryotes contain a single, double-stranded circular chromosome. Eukaryotes contain double-stranded linear DNA molecules packaged into chromosomes. The DNA helix is wrapped around proteins to form nucleosomes. The protein coils are further coiled, and during mitosis and meiosis, the chromosomes become even more greatly coiled to facilitate their movement. Chromosomes have two distinct regions which can be distinguished by staining, reflecting different degrees of packaging and determined by whether the DNA in a region is being expressed (euchromatin) or not (heterochromatin).

Contributors and Attributions

Samantha Fowler (Clayton State University), Rebecca Roush (Sandhills Community College), James Wise (Hampton University). Original content by OpenStax (CC BY 4.0; Access for free at https://cnx.org/contents/b3c1e1d2-83...4-e119a8aafbdd ).

DNA structure and function

Affiliations.

1 MRC Laboratory of Molecular Biology, Cambridge, UK.
2 Department of Biochemistry, University of Cambridge, UK.
3 Jacobs University Bremen, Germany.
PMID: 25903461
DOI: 10.1111/febs.13307

The proposal of a double-helical structure for DNA over 60 years ago provided an eminently satisfying explanation for the heritability of genetic information. But why is DNA, and not RNA, now the dominant biological information store? We argue that, in addition to its coding function, the ability of DNA, unlike RNA, to adopt a B-DNA structure confers advantages both for information accessibility and for packaging. The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling under torsional stress. We review recent evidence suggesting that DNA supercoiling, particularly that generated by DNA translocases, is a major driver of gene regulation and patterns of chromosomal gene organization, and in its guise as a promoter of DNA packaging enables DNA to act as an energy store to facilitate the passage of translocating enzymes such as RNA polymerase.

Keywords: A-DNA; B-DNA; DNA as an energy store; DNA backbone conformation; DNA elasticity; DNA information; DNA structure; DNA topology; alternative DNA structures; genome organisation.

Publication types

Historical Article
Chromatin Assembly and Disassembly
DNA / chemistry*
DNA / metabolism
DNA, Superhelical / chemistry
DNA, Superhelical / metabolism
Energy Metabolism
Genetic Phenomena*
Genetics / history*
History, 20th Century
History, 21st Century
Nucleic Acid Conformation
DNA, Superhelical

Grants and funding

MC_U105178783/MRC_/Medical Research Council/United Kingdom

Rosalind Franklin’s Overlooked Role in the Discovery of DNA’s Structure

By: Sarah Pruitt

Published: March 25, 2024

It’s one of the most famous moments in the history of science: On February 28, 1953, Cambridge University molecular biologists James Watson and Francis Crick determined that the structure of deoxyribonucleic acid, or DNA—the molecule carrying the genetic code unique to any individual—was a double helix polymer, a spiral consisting of two strands of DNA wound around one another.

Nearly 10 years later, Watson and Crick, along with biophysicist Maurice Wilkins, received the 1962 Nobel Prize in Physiology or Medicine for uncovering what they called the “secret of life.” Yet one person whose work was vital to the discovery of DNA’s structure was missing from the award ceremony. Rosalind Franklin was a chemist and X-ray crystallographer who studied DNA at King’s College London from 1951 to 1953, and her unpublished data paved the way for Watson and Crick’s breakthrough.

An Unflattering Portrayal in Watson's Account

Franklin, who died of ovarian cancer in 1958 at the age of 37, was ineligible to receive the Nobel, which is not given posthumously. Yet debate over her role in the discovery of DNA’s structure and her failure to be recognized for it began simmering after the publication of Watson’s bestselling book The Double Helix: A Personal Account of the Discovery of the Structure of DNA in 1968 and its highly unflattering portrait of Franklin.

“Watson portrayed Franklin as this kind of evil figure—a schoolmarmish, shrewish person,” says Nathaniel Comfort, a historian of medicine at Johns Hopkins University who is working on a biography of the famed molecular biologist. Watson also related in his book that he and Crick had gained access to Franklin’s data without her knowledge, including the now-famous Photograph 51, an X-ray image of DNA that immediately convinced Watson that the molecule’s structure must be a helix.

Watson’s treatment of Franklin in The Double Helix provoked a robust backlash among those who viewed her as a victim of betrayal, sexism and misogyny, including Franklin’s friend Anne Sayre, who published a biography of Franklin in 1975. Comfort argues that this view also obscures the more complicated truth of Franklin’s contributions. As he and Matthew Cobb argued in a 2023 article in Nature, a reconsideration of the available evidence suggests that Franklin should be recognized not as a martyr, but as an equal contributor to solving the double helix structure of DNA.

Rosalind Franklin: Expert Crystallographer

Rosalind Elsie Franklin. (Credit: Universal History Archive/Getty Images)

In 1951, Franklin joined a team of biophysicists led by John Randall at King’s College who were using X-ray crystallography to study DNA. The molecule had been discovered in 1869, but its structure and function weren’t yet understood. After learning X-ray crystallography at a government-run lab in France, she was already an expert in the scientific technique, which involves beaming X-rays at crystalline structures and taking photographs of the patterns created by atoms in the structures diffracting the X-rays. By measuring the sizes, angles and intensities of the patterns, researchers can create a 3-D picture of the crystalline structure.

From the beginning, Franklin famously clashed with Wilkins, who was Randall's deputy, and the two began working largely separately from one another. Wilkins had previously identified two forms of DNA appearing in the X-ray images; Franklin discovered that by adjusting the level of humidity in the specimen chamber, she could convert the crystalline, relatively dry “A” form of DNA into the wetter, paracrystalline “B” form. She shared these key insights into DNA at a seminar in November 1951, which Watson attended.

“Her notes for that lecture are very detailed,” Comfort says, adding that Franklin initially assumed both the A and B forms had a helical structure. “She describes DNA as a big helix, describes the two forms and lays out their differences…and [explains] how the structure switches from A to B depending on the relative humidity in the sample chamber.”

Franklin’s ‘Photograph 51’

Despite capturing clear evidence of the B form’s double helical structure—most notably in what became known as Photograph 51, taken in May 1952—Franklin chose to focus on the drier A form of DNA, which produced a much sharper, more detailed image than the B form. This focus pointed her away from the idea of a helix, because the A form did not appear to be helical.

“For a chemist and an X-ray crystallographer, she was doing the [form] that made the most sense,” Comfort says. “She wasn't a biologist, and so she didn't appreciate that in a living cell, the more hydrated B form was going to be much more present, because a cell is a very wet place.”

In February 1953, Wilkins showed Photograph 51 of the B form of DNA to his friend Watson at Cambridge, who along with Crick was attempting to determine the molecule’s structure mainly through building and analyzing physical models. Wilkins received the image from Raymond Gosling, who worked for both Wilkins and Franklin and had taken the photo with Franklin.

Watson later claimed that seeing Photograph 51 immediately convinced him that a DNA helix must exist. “The instant I saw the picture my mouth fell open and my pulse began to race,” he wrote in The Double Helix. Soon after that, Crick’s supervisor passed along a report on Franklin’s unpublished results, which he had received during a visit to the King’s College lab in December 1952. By late February 1953, Watson and Crick had constructed their model of the DNA double helix, which they formally announced in a landmark paper in Nature that April.

To Comfort, Watson’s version of events doesn’t ring entirely true when it comes to Photograph 51 and its importance. “Watson talks [in The Double Helix] about realizing only then that there was an A and a B form…but Franklin talked about that at the end of 1951, and she and Wilkins talked about it openly,” Comfort says. “I think he was writing it as though the photograph was the magic key because it made a good discovery narrative that allowed him to boil down and communicate an enormously complex, highly technical kind of science.”

Franklin’s Understanding of DNA’s Structure

Comfort also discounts the idea that Franklin, an expert crystallographer, did not understand the significance of the X-ray diffraction image she and Gosling had taken of DNA’s B form 10 months earlier. “She was way too good for that,” he says.

In fact, Franklin was simply more focused on the A form of DNA at the time, and was also in the process of leaving King’s College behind for a new job at Birkbeck College, also in London. Before she left, however, Franklin started a new laboratory notebook, with notes on the B form of DNA.

By late February 1953, Franklin’s notes reveal that she had not only accepted that DNA had a helical structure, probably with two strands; she had also recognized that the component nucleotides, or bases, on each strand were related in a way that made the strands complementary, allowing the molecule to easily replicate. “Franklin’s colleague Aaron Klug analyzed her research notes and said that Franklin was ‘two steps away’ from the double helix,” Comfort says. “Given a couple more months, she surely would have had it.”

Both Wilkins and Franklin (with Gosling) published separate papers in the same April 1953 issue of Nature, largely supporting Watson and Crick’s model of DNA’s structure. The earliest presentation of the double helix that June was signed by authors of all three papers, suggesting, as Comfort and Cobb point out in their article, that the discovery of DNA’s structure was seen at the time as a joint effort, not just the triumph of Watson and Crick.

Taking Full Measure of Franklin’s Contributions

Over the next five years, Franklin led a team of researchers studying ribonucleic acid, or RNA, in viruses such as polio and the tobacco mosaic virus (TMV). Diagnosed with ovarian cancer in 1956, Franklin continued her work until days before her death in April 1958. Franklin also remained in regular contact with Watson and Crick after she left King’s College, even becoming good friends with Crick and his wife, Odile.

Franklin’s unjust exclusion from the Nobel Prize, combined with Watson’s decidedly sexist portrayal in The Double Helix, led many to see her as a victim of chauvinism and betrayal. A more complicated view of events reveals a scientist who was an equal contributor to the discovery of DNA’s structure, as well as a trailblazer in the all-important field of virology.

“Franklin had an incredible series of insights into how the RNA is packed within the protein shell of TMV,” Comfort says. “She was widely recognized and seen as being at the top of her field.”

Francis Crick

The Discovery of the Double Helix, 1951-1953

The discovery in 1953 of the double helix, the twisted-ladder structure of deoxyribonucleic acid (DNA), by James Watson and Francis Crick marked a milestone in the history of science and gave rise to modern molecular biology, which is largely concerned with understanding how genes control the chemical processes within cells. In short order, their discovery yielded ground-breaking insights into the genetic code and protein synthesis. During the 1970s and 1980s, it helped to produce new and powerful scientific techniques, specifically recombinant DNA research, genetic engineering, rapid gene sequencing, and monoclonal antibodies, techniques on which today's multi-billion dollar biotechnology industry is founded. Major current advances in science, namely genetic fingerprinting and modern forensics, the mapping of the human genome, and the promise, yet unfulfilled, of gene therapy, all have their origins in Watson and Crick's inspired work. The double helix has not only reshaped biology, it has become a cultural icon, represented in sculpture, visual art, jewelry, and toys.

Researchers working on DNA in the early 1950s used the term "gene" to mean the smallest unit of genetic information, but they did not know what a gene actually looked like structurally and chemically, or how it was copied, with very few errors, generation after generation. In 1944, Oswald Avery had shown that DNA was the "transforming principle," the carrier of hereditary information, in pneumococcal bacteria. Nevertheless, many scientists continued to believe that DNA had a structure too uniform and simple to store genetic information for making complex living organisms. The genetic material, they reasoned, must consist of proteins, much more diverse and intricate molecules known to perform a multitude of biological functions in the cell.

Crick and Watson recognized, at an early stage in their careers, that gaining a detailed knowledge of the three-dimensional configuration of the gene was the central problem in molecular biology. Without such knowledge, heredity and reproduction could not be understood. They seized on this problem during their very first encounter, in the summer of 1951, and pursued it with single-minded focus over the course of the next eighteen months. This meant taking on the arduous intellectual task of immersing themselves in all the fields of science involved: genetics, biochemistry, chemistry, physical chemistry, and X-ray crystallography. Drawing on the experimental results of others (they conducted no DNA experiments of their own), taking advantage of their complementary scientific backgrounds in physics and X-ray crystallography (Crick) and viral and bacterial genetics (Watson), and relying on their brilliant intuition, persistence, and luck, the two showed that DNA had a structure sufficiently complex and yet elegantly simple enough to be the master molecule of life.

Other researchers had made important but seemingly unconnected findings about the composition of DNA; it fell to Watson and Crick to unify these disparate findings into a coherent theory of genetic transfer. The organic chemist Alexander Todd had determined that the backbone of the DNA molecule contained repeating phosphate and deoxyribose sugar groups. The biochemist Erwin Chargaff had found that while the amount of DNA and of its four types of bases--the purine bases adenine (A) and guanine (G), and the pyrimidine bases cytosine (C) and thymine (T)--varied widely from species to species, A and T always appeared in ratios of one-to-one, as did G and C. Maurice Wilkins and Rosalind Franklin had obtained high-resolution X-ray images of DNA fibers that suggested a helical, corkscrew-like shape. Linus Pauling, then the world's leading physical chemist, had recently discovered the single-stranded alpha helix, the structure found in many proteins, prompting biologists to think of helical forms. Moreover, he had pioneered the method of model building in chemistry by which Watson and Crick were to uncover the structure of DNA. Indeed, Crick and Watson feared that they would be upstaged by Pauling, who proposed his own model of DNA in February 1953, although his three-stranded helical structure quickly proved erroneous.

The time, then, was ripe for their discovery. After several failed attempts at model building, including their own ill-fated three-stranded version and one in which the bases were paired like with like (adenine with adenine, etc.), they achieved their break-through. Jerry Donohue, a visiting physical chemist from the United States who shared Watson and Crick's office for the year, pointed out that the configuration for the rings of carbon, nitrogen, hydrogen, and oxygen (the elements of all four bases) in thymine and guanine given in most textbooks of chemistry was incorrect. On February 28, 1953, Watson, acting on Donohue's advice, put the two bases into their correct form in cardboard models by moving a hydrogen atom from a position where it bonded with oxygen to a neighboring position where it bonded with nitrogen. While shifting around the cardboard cut-outs of the accurate molecules on his office table, Watson realized in a stroke of inspiration that A, when joined with T, very nearly resembled a combination of C and G, and that each pair could hold together by forming hydrogen bonds. If A always paired with T, and likewise C with G, then not only were Chargaff's rules (that in DNA, the amount of A equals that of T, and C that of G) accounted for, but the pairs could be neatly fitted between the two helical sugar-phosphate backbones of DNA, the outside rails of the ladder. The bases connected to the two backbones at right angles while the backbones retained their regular shape as they wound around a common axis, all of which were structural features demanded by the X-ray evidence. Similarly, the complementary pairing of the bases was compatible with the fact, also established by the X-ray diffraction pattern, that the backbones ran in opposite direction to each other, one up, the other down.

Watson and Crick published their findings in a one-page paper, with the understated title "A Structure for Deoxyribose Nucleic Acid," in the British scientific weekly Nature on April 25, 1953, illustrated with a schematic drawing of the double helix by Crick's wife, Odile. A coin toss decided the order in which they were named as authors. Foremost among the "novel features" of "considerable biological interest" they described was the pairing of the bases on the inside of the two DNA backbones: A=T and C=G. The pairing rule immediately suggested a copying mechanism for DNA: given the sequence of the bases in one strand, that of the other was automatically determined, which meant that when the two chains separated, each served as a template for a complementary new chain. Watson and Crick developed their ideas about genetic replication in a second article in Nature, published on May 30, 1953.
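The copying mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, and real replication involves enzymes, primers, and proofreading that are not modelled here. The sketch shows that each separated chain determines its partner, and that a duplex built this way satisfies Chargaff's rules.

```python
from collections import Counter
from typing import List, Tuple

# Watson-Crick pairing rule: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template: str) -> str:
    """Return the partner strand, written in the same reading direction.

    The two backbones run in opposite directions, so the partner is the
    complement of the template read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex: Tuple[str, str]) -> List[Tuple[str, str]]:
    """Separate the two chains and use each as a template for a new chain."""
    strand_1, strand_2 = duplex
    return [
        (strand_1, complementary_strand(strand_1)),
        (complementary_strand(strand_2), strand_2),
    ]


def base_counts(duplex: Tuple[str, str]) -> Counter:
    """Count each base across both strands of a duplex."""
    return Counter(duplex[0] + duplex[1])


top = "ATGCGTTAC"  # hypothetical sequence, not taken from any real genome
parent = (top, complementary_strand(top))
daughters = replicate(parent)
counts = base_counts(parent)

print(daughters[0] == parent and daughters[1] == parent)
print(counts["A"] == counts["T"], counts["C"] == counts["G"])
```

Both daughter duplexes come out identical to the parent, and the A/T and C/G counts match because every base on one strand has its partner on the other.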

The two had shown that in DNA, form is function: the double-stranded molecule could both produce exact copies of itself and carry genetic instructions. During the following years, Crick elaborated on the implications of the double-helical model, advancing the hypothesis, revolutionary then but widely-accepted since, that the sequence of the bases in DNA forms a code by which genetic information can be stored and transmitted.

Although recognized today as one of the seminal scientific papers of the twentieth century, Watson and Crick's original article in Nature was not frequently cited at first. Its true significance became apparent, and its circulation widened, only towards the end of the 1950s, when the structure of DNA they had proposed was shown to provide a mechanism for controlling protein synthesis, and when their conclusions were confirmed in the laboratory by Matthew Meselson, Arthur Kornberg, and others.

Crick himself immediately understood the significance of his and Watson's discovery. As Watson recalled, after their conceptual breakthrough on February 28, 1953, Crick declared to the assembled lunch patrons at The Eagle that they had "found the secret of life." Crick himself had no memory of such an announcement, but did recall telling his wife that evening "that we seemed to have made a big discovery." He revealed that "years later she told me that she hadn't believed a word of it." As he recounted her words, "You were always coming home and saying things like that, so naturally I thought nothing of it."

Retrospective accounts of the discovery of the structure of DNA have continued to elicit a measure of controversy. Crick was incensed at Watson's depiction of their collaboration in The Double Helix (1968), castigating the book as a betrayal of their friendship, an intrusion into his privacy, and a distortion of his motives. He waged an unsuccessful campaign to prevent its publication. He eventually became reconciled to Watson's bestseller, concluding that if it presented an unfavorable portrait of a scientist, it was of Watson, not of himself.

A more enduring controversy has been generated by Watson and Crick's use of Rosalind Franklin's crystallographic evidence of the structure of DNA, which was shown to them, without her knowledge, by her estranged colleague, Maurice Wilkins, and by Max Perutz. Her evidence demonstrated that the two sugar-phosphate backbones lay on the outside of the molecule, confirmed Watson and Crick's conjecture that the backbones formed a double helix, and revealed to Crick that they were antiparallel. Franklin's superb experimental work thus proved crucial in Watson and Crick's discovery. Yet, they gave her scant acknowledgment. Even so, Franklin bore no resentment towards them. She had presented her findings at a public seminar to which she had invited the two. She soon left DNA research to study tobacco mosaic virus. She became friends with both Watson and Crick, and spent her last period of remission from ovarian cancer in Crick's house (Franklin died in 1958). Crick believed that he and Watson used her evidence appropriately, while admitting that their patronizing attitude towards her, so apparent in The Double Helix, reflected contemporary conventions of gender in science.

What is DNA and how does it impact health?

DNA is a biological molecule that contains the instructions an organism needs to function, develop, and reproduce. It is present in all forms of life on earth and contains each organism’s genetic code.

Virtually every cell in the body contains deoxyribonucleic acid (DNA). It is the genetic code that makes each person unique. DNA carries the instructions for the development, growth, reproduction, and functioning of all life.

Differences in the genetic code are why one person has blue eyes rather than brown, why birds only have two wings, or why giraffes have long necks. Differences or mutations in the genetic code can also lead to being susceptible to certain diseases.

Not only do nearly all cells in the body contain DNA, but the DNA in a single cell would span over 6.5 feet (ft) long if unraveled and stretched end-to-end.
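The length figure above can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: the genome size (about 3.1 billion base pairs per haploid set) and the spacing between base pairs in B-DNA (about 0.34 nanometers) are approximate values assumed here and are not stated in the article.

```python
# Assumed approximate values (not given in the article):
BASE_PAIRS_PER_HAPLOID_GENOME = 3.1e9  # human haploid genome, in base pairs
RISE_PER_BASE_PAIR_NM = 0.34           # axial distance between base pairs in B-DNA
METERS_PER_FOOT = 0.3048


def dna_length_m(base_pairs: float, rise_nm: float = RISE_PER_BASE_PAIR_NM) -> float:
    """Length of a stretched-out double helix in meters."""
    return base_pairs * rise_nm * 1e-9


# Most human cells are diploid: they carry two copies of the genome.
diploid_bp = 2 * BASE_PAIRS_PER_HAPLOID_GENOME
length_m = dna_length_m(diploid_bp)
length_ft = length_m / METERS_PER_FOOT

print(f"{length_m:.2f} m, {length_ft:.1f} ft")
```

Under these assumptions the estimate comes out at roughly 2.1 meters, or just under 7 feet, which is consistent with the "over 6.5 feet" figure.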

This article will break down the basics of DNA, what it is made of, how it works, and how it impacts health.

What is DNA?

In short, DNA is a long molecule that contains each person’s unique genetic code. It holds the instructions for building the proteins essential for the body’s function.

DNA instructions pass from parent to child, with roughly half of a child’s DNA originating from the father and half from the mother.

How is DNA structured?

DNA is a two-stranded molecule that appears twisted, giving it a unique shape referred to as the double helix.

Each of the two strands is a long sequence of nucleotides. These are the individual units of DNA and they are made of:

a phosphate molecule
a sugar molecule called deoxyribose, containing five carbons
a nitrogen-containing region

There are four types of nitrogen-containing regions, called bases:

adenine (A)
cytosine (C)
guanine (G)
thymine (T)

The order of these four bases forms the genetic code, which is the instructions for life.

The bases of the two strands of DNA are stuck together to create a ladder-like shape. Within the ladder, A sticks to T, and G sticks to C to create the “rungs.” The length of the ladder forms through the sugar and phosphate groups.
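As a minimal sketch of the ladder picture, the Python snippet below lines up two strands position by position and checks that every rung is an allowed pair. The function name `rungs` and the example sequences are hypothetical, chosen only for illustration; the second strand is written so that its bases sit directly opposite their partners.

```python
# Allowed rungs of the ladder: A sticks to T, and G sticks to C.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def rungs(strand_a: str, strand_b: str) -> list:
    """Pair up bases position by position and check that each rung is valid."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be the same length")
    ladder = list(zip(strand_a, strand_b))
    for position, rung in enumerate(ladder):
        if rung not in PAIRS:
            raise ValueError(f"bases {rung} at position {position} do not pair")
    return ladder


# Hypothetical example: strand_b is aligned base-for-base against strand_a.
ladder = rungs("GATTACA", "CTAATGT")
for left, right in ladder:
    print(f"{left}-{right}")
```

A mismatched pair such as A opposite A raises an error, which mirrors the point that only A-T and G-C rungs fit the ladder.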

What is a gene?

Each length of DNA that codes for a specific protein is called a gene. For instance, one gene codes for the protein insulin, the hormone that helps control levels of sugar in the blood. Humans have an estimated 20,000 to 25,000 protein-coding genes, although estimates vary.

It’s believed that only about 1% of DNA is made up of protein-coding genes. Scientists know less about the function of the remaining 99% of DNA but believe it to be involved in regulating transcription and translation.

Chromosome 1 is the largest and contains around 2,800 genes. The smallest is chromosome 22 with around 750 genes.

How does DNA work?

Most DNA is found in the nuclei of cells, and some is found in mitochondria, which are the powerhouses of the cells.

Because humans have so much DNA and the nuclei are so small, DNA needs to be packaged incredibly neatly.

Strands of DNA loop, coil, and wrap around proteins called histones. In this coiled state, the DNA is called chromatin.

Chromatin condenses further through a supercoiling process and packages into structures called chromosomes. These chromosomes form the familiar “X” shape.

Each chromosome contains one DNA molecule. Humans have 23 pairs of chromosomes or 46 chromosomes in total. Other species have different numbers. For example, fruit flies have 8 chromosomes, while pigeons have 80 chromosomes.

Protein creation

For genes to create a protein, there are two main steps:

Transcription: The DNA code is copied into messenger RNA (mRNA). RNA is a copy of DNA, but it is normally single-stranded. Another difference is that RNA does not contain the base thymine (T). In RNA, uracil (U) replaces thymine (T).
Translation: The mRNA sequence is translated into a chain of amino acids with the help of transfer RNA (tRNA).

mRNA provides information on a particular amino acid via three-letter sections called codons. Each codon codes for a specific amino acid or building block of a protein. For instance, the codon GUG codes for the amino acid valine.
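The two steps can be sketched in Python as follows. This is a simplified illustration, not a model of the cellular machinery: the codon table is deliberately partial (a handful of the 64 codons), the gene sequence is hypothetical, and the function names are assumptions made for this example. Transcription is represented by rewriting the coding strand with U in place of T.

```python
# Partial codon table: only a few of the 64 codons, chosen for illustration.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "GUG": "Val",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence: the coding strand with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        # Raises KeyError for codons missing from the partial table above.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


gene = "ATGGTGTTTGGCAAATAA"  # hypothetical coding sequence
mrna = transcribe(gene)
peptide = translate(mrna)

print(mrna)
print("-".join(peptide))
```

Here the second codon, GUG, is read as valine, matching the example in the text, and reading stops at the UAA codon.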

There are 20 possible amino acids.

Telomeres are regions of repeated nucleotides at the end of chromosomes.

They protect the ends of the chromosome from being damaged or fusing with other chromosomes.

Scientists liken them to the plastic tips on shoelaces that stop them from becoming frayed.

As a person gets older, this protective region steadily becomes smaller. Each time a cell divides and DNA is replicated, the telomere becomes shorter.

How does DNA affect health?

In all people, DNA accumulates damage over time, which contributes to aging.

Sometimes, however, a person’s DNA sequence may change randomly. This is called a mutation. Certain mutations in a person’s genetic code can lead them to develop a variety of diseases or conditions.

Alternatively, a person can inherit a gene that may cause problems with their health. Environmental factors can influence how these mutated genes manifest.

Damage to the structure of DNA can occur in various ways. This includes when:

the bases connect in the wrong order after replication
a base pair is missing
there is an extra base pair
there is a malfunction in DNA replication or recombination
there is exposure to environmental factors such as radiation or heavy metals
there is a mutation in the process of repairing damaged DNA
there is a change in the number or structure of chromosomes
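Three of the changes listed above (a wrong base, a missing base, and an extra base) can be illustrated on a short sequence. The Python sketch below is only a toy example: the sequence and function names are hypothetical, and it operates on one strand for simplicity. It shows why a missing or extra base is often more disruptive than a swapped one, because it shifts every three-letter group that follows.

```python
def substitute(seq: str, index: int, base: str) -> str:
    """Replace the base at `index` with a different base."""
    return seq[:index] + base + seq[index + 1:]


def delete(seq: str, index: int) -> str:
    """Remove the base at `index`."""
    return seq[:index] + seq[index + 1:]


def insert(seq: str, index: int, base: str) -> str:
    """Add an extra base before position `index`."""
    return seq[:index] + base + seq[index:]


def codons(seq: str) -> list:
    """Split a sequence into complete three-letter groups."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


original = "ATGGTGTTTGGC"  # hypothetical sequence

print(codons(original))                       # unchanged reading frame
print(codons(substitute(original, 4, "A")))   # one group altered
print(codons(delete(original, 4)))            # every later group shifted
print(codons(insert(original, 4, "C")))       # every later group shifted
```

The substitution changes a single three-letter group, while the deletion and insertion change the second group and all groups after it.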

Diseases or health conditions can result from damage in only one gene, such as cystic fibrosis, or damage in several parts of a person’s DNA, such as cancer. Other examples include:

Down’s syndrome
autoimmune conditions
chronic inflammatory conditions
neurodegenerative diseases like Huntington’s disease

Frequently asked questions

Here are a few common questions about DNA.

Who discovered DNA?

The discovery of DNA is credited to Swiss scientist Friedrich Miescher, who first isolated DNA from human pus cells in the late 1860s.

What are the different types of DNA?

There are many types of DNA, each of which varies depending on its specific structure. The most common is B-DNA, but some other types found in the genome include A-DNA, H-DNA, and Z-DNA.

What is DNA replication?

DNA replication is a process that occurs when DNA in the cells copies itself. This helps ensure that each new cell has its own complete genome during cell division.

Can genetic diseases be cured?

Doctors can only treat the symptoms of conditions caused by a genetic mutation. However, researchers are continuously working to develop gene therapy types that may help stop a disease from progressing. The U.S. Food and Drug Administration (FDA) has approved some gene therapy drugs, while others are undergoing clinical trials.

DNA is a molecule found in most cells holding each person’s unique genetic code. It is responsible for coding proteins, which are essential to the growth and development of cells.

Chromosomes are tightly coiled strands of DNA. Genes are sections of DNA that code individual proteins. DNA also carries important genetic information necessary for the survival and function of all life forms on earth.

Put another way, DNA is the master plan for life on earth and gives all living organisms their unique genetic code. When something in this plan malfunctions, diseases and health problems can occur.

Last medically reviewed on August 2, 2022

The Human Genome Project

The Human Genome Project (HGP) is one of the greatest scientific feats in history. The project was a voyage of biological discovery led by an international group of researchers looking to comprehensively study all of the DNA (known as a genome) of a select set of organisms. Launched in October 1990 and completed in April 2003, the Human Genome Project’s signature accomplishment – generating the first sequence of the human genome – provided fundamental information about the human blueprint, which has since accelerated the study of human biology and improved the practice of medicine.

Learn more about the Human Genome Project below.

A virtual exhibit exploring the 1990 letter writing campaign to oppose the HGP.

A virtual discussion with the leaders of the five genome-sequencing centers that provides the untold story on how they got the HGP across the finish line in 2003.

A fact sheet detailing how the project began and how it shaped the future of research and technology.

Human Genome Project Timeline of Events | NHGRI

An interactive timeline listing key moments from the history of the project.

A downloadable poster containing major scientific landmarks before and throughout the project.

Prominent scientists involved in the project reflect on the lessons learned.

Commentary in the journal Nature written by NHGRI leaders discussing the legacies of the project.

Lecture-oriented slides telling the story of the project by a front-line participant.

MIT Technology Review


Google DeepMind’s new AlphaFold can model a much larger slice of biological life

AlphaFold 3 can predict how DNA, RNA, and other molecules interact, further cementing its leading role in drug discovery and research. Who will benefit?

By James O'Donnell

Google DeepMind has released an improved version of its biology prediction tool, AlphaFold, that can predict the structures not only of proteins but of nearly all the elements of biological life.

It’s a development that could help accelerate drug discovery and other scientific research. The tool is currently being used to experiment with identifying everything from resilient crops to new vaccines.

While the previous model, released in 2020, amazed the research community with its ability to predict protein structures, researchers have been clamoring for the tool to handle more than just proteins.

Now, DeepMind says, AlphaFold 3 can predict the structures of DNA, RNA, and molecules like ligands, which are essential to drug discovery. DeepMind says the tool provides a more nuanced and dynamic portrait of molecule interactions than anything previously available.

“Biology is a dynamic system,” DeepMind CEO Demis Hassabis told reporters on a call. “Properties of biology emerge through the interactions between different molecules in the cell, and you can think about AlphaFold 3 as our first big sort of step toward [modeling] that.”

AlphaFold 2 helped us better map the human heart, model antimicrobial resistance, and identify the eggs of extinct birds, but we don’t yet know what advances AlphaFold 3 will bring.

Mohammed AlQuraishi, an assistant professor of systems biology at Columbia University who is unaffiliated with DeepMind, thinks the new version of the model will be even better for drug discovery. “The AlphaFold 2 system only knew about amino acids, so it was of very limited utility for biopharma,” he says. “But now, the system can in principle predict where a drug binds a protein.”

Isomorphic Labs, a drug discovery spinoff of DeepMind, is already using the model for exactly that purpose, collaborating with pharmaceutical companies to try to develop new treatments for diseases, according to DeepMind.

AlQuraishi says the release marks a big leap forward. But there are caveats.

“It makes the system much more general, and in particular for drug discovery purposes (in early-stage research), it’s far more useful now than AlphaFold 2,” he says. But as with most models, the impact of AlphaFold will depend on how accurate its predictions are. For some uses, AlphaFold 3 has double the success rate of similar leading models like RoseTTAFold. But for others, like protein-RNA interactions, AlQuraishi says it’s still very inaccurate.

DeepMind says that depending on the interaction being modeled, accuracy can range from 40% to over 80%, and the model will let researchers know how confident it is in its prediction. With less accurate predictions, researchers have to use AlphaFold merely as a starting point before pursuing other methods. Regardless of these ranges in accuracy, if researchers are trying to take the first steps toward answering a question like which enzymes have the potential to break down the plastic in water bottles, it’s vastly more efficient to use a tool like AlphaFold than experimental techniques such as x-ray crystallography.

A revamped model

AlphaFold 3’s larger library of molecules and higher level of complexity required improvements to the underlying model architecture. So DeepMind turned to diffusion techniques, which AI researchers have been steadily improving in recent years and which now power image and video generators like OpenAI’s DALL-E 2 and Sora. Diffusion works by training a model to start with a noisy image and then reduce that noise bit by bit until an accurate prediction emerges. That method allows AlphaFold 3 to handle a much larger set of inputs.
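The idea of starting from noise and removing it step by step can be illustrated with a toy sketch. This is not AlphaFold 3's method: a real diffusion model learns its denoising step from data, whereas the stand-in below is simply told the answer. The function names, the four-point `target` shape, the step size and the noise schedule are all assumptions made up for illustration.

```python
import math
import random


def rms_distance(points, target):
    """Root-mean-square distance between matching 3D points."""
    total = sum(math.dist(p, t) ** 2 for p, t in zip(points, target))
    return math.sqrt(total / len(target))


def refine_from_noise(target, steps=50, seed=0):
    """Start from a random cloud and nudge it toward `target` step by step.

    ASSUMPTION: the "denoiser" here cheats by reading `target` directly.
    A trained model would have to predict the correction instead.
    """
    rng = random.Random(seed)
    # Initial state: a broad random cloud, one 3D point per target point.
    points = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    errors = [rms_distance(points, target)]
    for step in range(steps):
        # The injected noise shrinks linearly and reaches zero on the last step.
        noise = 0.1 * (1.0 - (step + 1) / steps)
        for p, t in zip(points, target):
            for k in range(3):
                p[k] += 0.2 * (t[k] - p[k]) + rng.gauss(0.0, noise)
        errors.append(rms_distance(points, target))
    return points, errors


# Hypothetical "structure": four points at the corners of a square.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]
points, errors = refine_from_noise(target)
print(f"error before: {errors[0]:.2f}, after: {errors[-1]:.4f}")
```

The printed error drops from the scale of the initial cloud to a small residual, which is the behaviour the paragraph describes: a noisy starting point refined over many small steps.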

That marked “a big evolution from the previous model,” says John Jumper, director at Google DeepMind. “It really simplified the whole process of getting all these different atoms to work together.”

It also presented new risks. As the AlphaFold 3 paper details, the use of diffusion techniques made it possible for the model to hallucinate, or generate structures that look plausible but in reality could not exist. Researchers reduced that risk by adding more training data to the areas most prone to hallucination, though that doesn’t eliminate the problem completely.

Restricted access

Part of AlphaFold 3’s impact will depend on how DeepMind divvies up access to the model. For AlphaFold 2, the company released the open-source code, allowing researchers to look under the hood to gain a better understanding of how it worked. It was also available for all purposes, including commercial use by drugmakers. For AlphaFold 3, Hassabis said, there are no current plans to release the full code. The company is instead releasing a public interface for the model called the AlphaFold Server, which imposes limitations on which molecules can be experimented with and can only be used for noncommercial purposes. DeepMind says the interface will lower the technical barrier and broaden the use of the tool to biologists who are less knowledgeable about this technology.


AlphaFold 3 predicts the structure and interactions of all of life’s molecules

May 08, 2024


Introducing AlphaFold 3, a new AI model developed by Google DeepMind and Isomorphic Labs. By accurately predicting the structure of proteins, DNA, RNA, ligands and more, and how they interact, we hope it will transform our understanding of the biological world and drug discovery.


Inside every plant, animal and human cell are billions of molecular machines. They’re made up of proteins, DNA and other molecules, but no single piece works on its own. Only by seeing how they interact together, across millions of types of combinations, can we start to truly understand life’s processes.

In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction we have doubled prediction accuracy.

We hope AlphaFold 3 will help transform our understanding of the biological world and drug discovery. Scientists can access the majority of its capabilities, for free, through our newly launched AlphaFold Server, an easy-to-use research tool. To build on AlphaFold 3’s potential for drug design, Isomorphic Labs is already collaborating with pharmaceutical companies to apply it to real-world drug design challenges and, ultimately, develop new life-changing treatments for patients.

Our new model builds on the foundations of AlphaFold 2, which in 2020 made a fundamental breakthrough in protein structure prediction. So far, millions of researchers globally have used AlphaFold 2 to make discoveries in areas including malaria vaccines, cancer treatments and enzyme design. AlphaFold has been cited more than 20,000 times and its scientific impact recognized through many prizes, most recently the Breakthrough Prize in Life Sciences. AlphaFold 3 takes us beyond proteins to a broad spectrum of biomolecules. This leap could unlock more transformative science, from developing biorenewable materials and more resilient crops, to accelerating drug design and genomics research.

7PNM - Spike protein of a common cold virus (Coronavirus OC43): AlphaFold 3’s structural prediction for a spike protein (blue) of a cold virus as it interacts with antibodies (turquoise) and simple sugars (yellow), accurately matches the true structure (gray). The animation shows the protein interacting with an antibody, then a sugar. Advancing our knowledge of such immune-system processes helps better understand coronaviruses, including COVID-19, raising possibilities for improved treatments.

How AlphaFold 3 reveals life’s molecules

Given an input list of molecules, AlphaFold 3 generates their joint 3D structure, revealing how they all fit together. It models large biomolecules such as proteins, DNA and RNA, as well as small molecules, also known as ligands, a category encompassing many drugs. Furthermore, AlphaFold 3 can model chemical modifications to these molecules, which control the healthy functioning of cells and, when disrupted, can lead to disease.

AlphaFold 3’s capabilities come from its next-generation architecture and training that now covers all of life’s molecules. At the core of the model is an improved version of our Evoformer module — a deep learning architecture that underpinned AlphaFold 2’s incredible performance. After processing the inputs, AlphaFold 3 assembles its predictions using a diffusion network, akin to those found in AI image generators. The diffusion process starts with a cloud of atoms, and over many steps converges on its final, most accurate molecular structure.

AlphaFold 3’s predictions of molecular interactions surpass the accuracy of all existing systems. As a single model that computes entire molecular complexes in a holistic way, it’s uniquely able to unify scientific insights.

7R6R - DNA binding protein: AlphaFold 3’s prediction for a molecular complex featuring a protein (blue) bound to a double helix of DNA (pink) is a near-perfect match to the true molecular structure discovered through painstaking experiments (gray).

Leading drug discovery at Isomorphic Labs

AlphaFold 3 creates capabilities for drug design with predictions for molecules commonly used in drugs, such as ligands and antibodies, that bind to proteins to change how they interact in human health and disease.

AlphaFold 3 achieves unprecedented accuracy in predicting drug-like interactions, including the binding of proteins with ligands and antibodies with their target proteins. AlphaFold 3 is 50% more accurate than the best traditional methods on the PoseBusters benchmark without needing the input of any structural information, making AlphaFold 3 the first AI system to surpass physics-based tools for biomolecular structure prediction. The ability to predict antibody-protein binding is critical to understanding aspects of the human immune response and the design of new antibodies — a growing class of therapeutics.

Using AlphaFold 3 in combination with a complementary suite of in-house AI models, Isomorphic Labs is working on drug design for internal projects as well as with pharmaceutical partners. Isomorphic Labs is using AlphaFold 3 to accelerate and improve the success of drug design — by helping understand how to approach new disease targets, and developing novel ways to pursue existing ones that were previously out of reach.

AlphaFold Server: A free and easy-to-use research tool

8AW3 - RNA modifying protein: AlphaFold 3’s prediction for a molecular complex featuring a protein (blue), a strand of RNA (purple), and two ions (yellow) closely matches the true structure (gray). This complex is involved with the creation of other proteins — a cellular process fundamental to life and health.

Google DeepMind’s newly launched AlphaFold Server is the most accurate tool in the world for predicting how proteins interact with other molecules throughout the cell. It is a free platform that scientists around the world can use for non-commercial research. With just a few clicks, biologists can harness the power of AlphaFold 3 to model structures composed of proteins, DNA, RNA and a selection of ligands, ions and chemical modifications.

AlphaFold Server helps scientists make novel hypotheses to test in the lab, speeding up workflows and enabling further innovation. Our platform gives researchers an accessible way to generate predictions, regardless of their access to computational resources or their expertise in machine learning.

Experimental protein-structure determination can take about the length of a PhD and cost hundreds of thousands of dollars. Our previous model, AlphaFold 2, has been used to predict hundreds of millions of structures, which would have taken hundreds of millions of researcher-years at the current rate of experimental structural biology.


Sharing the power of AlphaFold 3 responsibly

With each AlphaFold release, we’ve sought to understand the broad impact of the technology, working together with the research and safety community. We take a science-led approach and have conducted extensive assessments to mitigate potential risks and share the widespread benefits to biology and humanity.

Building on the external consultations we carried out for AlphaFold 2, we’ve now engaged with more than 50 domain experts, in addition to specialist third parties, across biosecurity, research and industry, to understand the capabilities of successive AlphaFold models and any potential risks. We also participated in community-wide forums and discussions ahead of AlphaFold 3’s launch.

AlphaFold Server reflects our ongoing commitment to share the benefits of AlphaFold, including our free database of 200 million protein structures. We’ll also be expanding our free AlphaFold education online course with EMBL-EBI and partnerships with organizations in the Global South to equip scientists with the tools they need to accelerate adoption and research, including on underfunded areas such as neglected diseases and food security. We’ll continue to work with the scientific community and policy makers to develop and deploy AI technologies responsibly.

Opening up the future of AI-powered cell biology

7BBV - Enzyme: AlphaFold 3’s prediction for a molecular complex featuring an enzyme protein (blue), an ion (yellow sphere) and simple sugars (yellow), along with the true structure (gray). This enzyme is found in a soil-borne fungus (Verticillium dahliae) that damages a wide range of plants. Insights into how this enzyme interacts with plant cells could help researchers develop healthier, more resilient crops.

AlphaFold 3 brings the biological world into high definition. It allows scientists to see cellular systems in all their complexity, across structures, interactions and modifications. This new window on the molecules of life reveals how they’re all connected and helps understand how those connections affect biological functions — such as the actions of drugs, the production of hormones and the health-preserving process of DNA repair.

The impacts of AlphaFold 3 and our free AlphaFold Server will be realized through how they empower scientists to accelerate discovery across open questions in biology and new lines of research. We’re just beginning to tap into AlphaFold 3’s potential and can’t wait to see what the future holds.



Understanding biochemistry: structure and function of nucleic acids

Steve Minchin

School of Biosciences, University of Birmingham, Birmingham, United Kingdom

Julia Lodge

Nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), carry genetic information which is read in cells to make the RNA and proteins by which living things function. The well-known structure of the DNA double helix allows this information to be copied and passed on to the next generation. In this article we summarise the structure and function of nucleic acids. The article includes a historical perspective and summarises some of the early work which led to our understanding of this important molecule and how it functions; many of these pioneering scientists were awarded Nobel Prizes for their work. We explain the structure of the DNA molecule, how it is packaged into chromosomes and how it is replicated prior to cell division. We look at how the concept of the gene has developed since the term was first coined and how DNA is copied into RNA (transcription) and translated into protein (translation).

The structure of deoxyribonucleic acid

Deoxyribonucleic acid (DNA) is one of the most important molecules in living cells. It encodes the instruction manual for life. The genome is the complete set of DNA molecules within the organism, so in humans this is the DNA present in the 23 pairs of chromosomes in the nucleus plus the relatively small mitochondrial genome. Humans have a diploid genome, inheriting one set of chromosomes from each parent. A complete and functioning diploid genome is required for normal development and to maintain life.

Discovery and chemical characterisation of DNA

DNA was discovered in 1869 by a Swiss biochemist, Friedrich Miescher. He wanted to determine the chemical composition of leucocytes (white blood cells); his source of leucocytes was pus from fresh surgical bandages. Although initially interested in all the components of the cell, Miescher quickly focussed on the nucleus because he observed that when treated with acid, a precipitate was formed which he called ‘nuclein’. Almost all molecular bioscience graduates will have repeated a form of this experiment in laboratory classes where DNA is isolated from cells. Miescher, Richard Altmann and Albrecht Kossel further characterised ‘nuclein’ and the name was changed to nucleic acid by Altmann. Kossel went on to show that nucleic acid contained purine and pyrimidine bases, a sugar and phosphate. Work in the 1930s from many scientists further characterised nucleic acids, including the identification of the four bases and the presence of deoxyribose, hence the name deoxyribonucleic acid (DNA). Erwin Chargaff later found that DNA molecules from a particular species always contained the same amount of the bases cytosine (C) and guanine (G) and the same amount of adenine (A) and thymine (T). So, for example, the human genome contains approximately 20% C, 20% G, 30% A and 30% T.
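Chargaff's observation is easy to check computationally. The sketch below counts bases and tests whether A matches T and C matches G within a tolerance. The function names, the tolerance value and the example sequence are assumptions for illustration only; the sequence is artificial and stands in for the pooled bases of both strands of a duplex.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a DNA sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def obeys_chargaff(composition, tolerance=1.0):
    """True if %A is close to %T and %C is close to %G (in percentage points)."""
    return (abs(composition["A"] - composition["T"]) <= tolerance
            and abs(composition["C"] - composition["G"]) <= tolerance)


# Artificial sequence built to match the 30/30/20/20 example in the text.
example = "A" * 30 + "T" * 30 + "G" * 20 + "C" * 20
composition = base_composition(example)
print(composition)
print(obeys_chargaff(composition))
```

Note that the rule describes double-stranded DNA as a whole; a single strand taken on its own need not satisfy it.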

DNA is a polymer made of monomeric units called nucleotides (Figure 1A). A nucleotide comprises a 5-carbon sugar (deoxyribose), a nitrogenous base and one or more phosphate groups. The building blocks for DNA synthesis contain three phosphate groups; two are lost during synthesis, so the DNA strand contains one phosphate group per nucleotide.


Figure 1. (A) A nucleotide (guanosine triphosphate). The nitrogenous base (guanine in this example) is linked to the 1′ carbon of the deoxyribose and the phosphate groups are linked to the 5′ carbon. A nucleoside is a base linked to a sugar. A nucleotide is a nucleoside with one or more phosphate groups. (B) A DNA strand containing four nucleotides with the nitrogenous bases thymine (T), cytosine (C), adenine (A) and guanine (G) respectively. The 3′ carbon of one nucleotide is linked to the 5′ carbon of the next via a phosphodiester bond. The 5′ end is at the top and the 3′ end at the bottom.

There are four different bases in DNA: the double-ring purine bases, adenine and guanine, and the single-ring pyrimidine bases, cytosine and thymine (Figure 1B). The carbons of the deoxyribose are numbered 1′ to 5′. Within each monomer the phosphate is linked to the 5′ carbon of deoxyribose and the nitrogenous base is linked to the 1′ carbon by an N-glycosidic bond. The phosphate group is acidic, hence the name nucleic acid.

In the DNA chain (Figure 1B), the phosphate residue forms a link between the 3′-hydroxyl of one deoxyribose and the 5′-hydroxyl of the next. This linkage is called a phosphodiester bond. DNA strands have a ‘sense of direction’. The deoxyribose at the top of the diagram in Figure 1B is not linked to another deoxyribose; it terminates with a 5′ phosphate group. At the other end the chain terminates with a 3′ hydroxyl.

DNA is the genetic material

Although many scientists, including Miescher, had observed that prior to cell division the amount of nucleic acid increased, it was not believed to be the genetic material until the work of Frederick Griffith, Oswald Avery, Colin MacLeod and Maclyn McCarty. In 1928, Griffith showed that living cells could be transformed by extracts from heat-killed cells and that this transformation had the potential to permanently change the genetic makeup of the recipient cell. Griffith was working with two strains of the bacterium Streptococcus pneumoniae. The encapsulated so-called S strain is virulent, whereas the non-capsulated R strain is nonvirulent. If the S strain is injected subcutaneously into mice, the mice die, whereas if either live R strain or heat-killed S strain is injected, the mice live. However, if a mixture of live R strain and heat-killed S strain is injected into a mouse, the mouse dies, and live S strain can be isolated from its blood. So, in the Griffith experiment a component of the heat-killed S strain is transforming the R strain. In 1944, Avery, MacLeod and McCarty went on to show that it was DNA that could transform the avirulent bacterium. They isolated a crude DNA extract from the S strain, destroyed any protein, lipid, carbohydrate and ribonucleic acid (RNA) component, and showed that this purified DNA could still transform the R strain. However, when the purified DNA was treated with DNase, an enzyme that degrades DNA, transformation was lost.

Alfred Hershey and Martha Chase confirmed that DNA was the genetic material. They used a virus that infects bacteria, called a bacteriophage. The bacteriophage contains a protein capsid surrounding a DNA molecule. They showed that when bacteriophage T2 infects Escherichia coli, it is the phage DNA, not protein, that enters the bacterial cell.

Determining the structure of DNA

Once it had been shown that DNA was the genetic material, there was a race to determine the three-dimensional structure of the DNA molecule. At King’s College London, Rosalind Franklin and Maurice Wilkins, having obtained data using X-ray diffraction, had proposed that DNA had a helical structure and Franklin had obtained a particularly good X-ray diffraction pattern. In Cambridge, James Watson and Francis Crick used model building together with data from a variety of sources including Franklin’s X-ray diffraction pattern and Chargaff’s base composition data to work out the now well-known double helix structure of DNA. Their work was published in Nature in 1953. The Watson–Crick structure is shown in Figure 2A.


Figure 2. (A) The DNA double helix, with the sugar–phosphate backbone on the outside and the nitrogenous bases in the middle. (B) An A:T and a G:C base pair with the C1′ of the deoxyribose indicated by the arrow. Note that the C1′ of the deoxyribose is in the same position in all base pairs. In this figure, the atoms on the upper edge of the base pair face into the major groove and those on the lower edge face into the minor groove. The hydrogen bonds between the base pairs are indicated by the dotted line.

DNA is a two-stranded helical structure in which the two strands run in opposite directions. In Figure 2A, one strand is running 5′ to 3′ top to bottom, whereas the other strand is running 3′ to 5′ top to bottom. The helix is right-handed, which means that if you are looking down the axis, the helix turns clockwise as it gets further away from you. The two chains interact via hydrogen bonds between pairs of bases, with adenine always pairing with thymine, and guanine always pairing with cytosine. The Watson–Crick structure therefore accounts for and explains the Chargaff data, which showed that there was always an equal amount of C and G and of A and T. The regular nature of the double helix comes about because the distance between the 1′ carbon of the deoxyribose on one strand and the 1′ carbon of the opposite deoxyribose is always the same irrespective of the base pair (Figure 2B). The 1′ carbons of the deoxyriboses of opposing nucleotides do not lie directly opposite each other across the helical axis. This means that the two sugar–phosphate backbones are not equally spaced around the helix, resulting in major and minor grooves.
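The two rules in this paragraph, fixed base pairing and opposite strand direction, are enough to derive one strand from the other. A minimal sketch follows; the function name and the example sequence are assumptions chosen for illustration.

```python
# Watson-Crick pairing: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'.

    The input is read 5' to 3'. Because the partner runs in the opposite
    direction, the paired bases are emitted in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


top = "ATGCGTTA"                   # hypothetical 8-base strand, 5' to 3'
bottom = reverse_complement(top)   # its partner, also written 5' to 3'

# Print the duplex the way Figure 2A is drawn: strands antiparallel,
# paired bases aligned in the same column.
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")

# Across the duplex as a whole, A equals T and G equals C (Chargaff's data).
duplex = top + bottom
print(duplex.count("A") == duplex.count("T"),
      duplex.count("G") == duplex.count("C"))
```

Applying `reverse_complement` twice returns the original strand, which reflects the fact that each strand fully specifies the other.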

The diameter of the helix is 2 nm. Adjacent bases are separated by 0.34 nm (0.34 × 10⁻⁹ m) and related by a rotation of 36°, which results in the helical structure repeating every 10 residues. DNA molecules are normally very long and the sequence of bases along the DNA chain is not restricted. For example, the genome of the bacterium E. coli is a single circular chromosome which contains 4.6 million base pairs (4.6 × 10⁶ bp); it is therefore 1.6 mm long (4.6 × 10⁶ × 0.34 × 10⁻⁹ m). The human genome is made up of 24 distinct chromosomes (chromosomes 1–22 and the X and Y chromosomes) present in the nucleus, plus mitochondrial DNA. The nuclear chromosomes vary in size from approximately 50 × 10⁶ to 250 × 10⁶ bp; the mitochondrial DNA is 17 × 10³ bp. The total length of a haploid human genome is 3 × 10⁹ bp. Within a single human diploid cell, which contains 23 chromosome pairs, there is 2 m of DNA. Based on the assumption that humans contain 3 trillion cells with a nucleus, if all the DNA from a single human individual was put end to end, it would reach to the sun and back approximately 20 times.
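The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the numbers quoted above, with one exception: the mean Earth–Sun distance of about 1.496 × 10¹¹ m is not given in the text and is supplied here as an assumption. The variable and function names are likewise illustrative.

```python
RISE_PER_BP_M = 0.34e-9    # distance between adjacent base pairs, in metres
TWIST_PER_BP_DEG = 36.0    # rotation between adjacent base pairs, in degrees


def contour_length_m(base_pairs):
    """Length of a DNA molecule in metres: base pairs times rise per base pair."""
    return base_pairs * RISE_PER_BP_M


# One full turn is 360 degrees, so the helix repeats every 360 / 36 base pairs.
bp_per_turn = 360.0 / TWIST_PER_BP_DEG

# E. coli chromosome: 4.6 million base pairs, converted to millimetres.
ecoli_mm = contour_length_m(4.6e6) * 1e3

# Human diploid cell: two copies of the 3 x 10^9 bp haploid genome.
human_diploid_m = contour_length_m(2 * 3e9)

# Whole-body estimate, using the text's figure of 3 trillion nucleated cells.
NUCLEATED_CELLS = 3e12
SUN_DISTANCE_M = 1.496e11  # ASSUMPTION: mean Earth-Sun distance, not from the text
round_trips = NUCLEATED_CELLS * human_diploid_m / (2 * SUN_DISTANCE_M)

print(f"base pairs per turn: {bp_per_turn:.0f}")
print(f"E. coli chromosome:  {ecoli_mm:.2f} mm")
print(f"human diploid cell:  {human_diploid_m:.2f} m")
print(f"Sun round trips:     {round_trips:.1f}")
```

The results agree with the rounded values in the text: about 1.6 mm for E. coli, about 2 m per human cell and roughly 20 round trips to the sun.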

Another important class of nucleic acids is RNA; the roles of RNA molecules in the cell will be discussed below. Chemically, RNA is similar to DNA: it is a chain of similar monomers. The building blocks are nucleotides containing the 5-carbon sugar ribose, a phosphate and a nitrogenous base. The phosphate is attached to the 5′ carbon of the ribose and the nitrogenous base to the 1′ carbon (Figure 3). RNA contains four bases: adenine, guanine, cytosine and uracil. RNA is more labile (easily broken down) than DNA and most RNA molecules do not form stable secondary structures, although some notable exceptions will be discussed below. The properties of RNA make it ideal as a genetic messenger during protein synthesis. The idea of this genetic messenger, mRNA, was proposed by François Jacob and Jacques Monod.


Figure 3. An RNA strand containing the four nucleotides with the nitrogenous bases adenine (A), cytosine (C), guanine (G) and uracil (U) respectively. The 3′ carbon of the ribose of one nucleotide is linked to the 5′ carbon of the next via a phosphodiester bond. The 5′ end is on the left and the 3′ end on the right.

Packaging of DNA into eukaryotic cells

DNA has to be highly condensed to fit into the bacterial cell or eukaryotic nucleus. In eukaryotes, histone proteins are used to condense the DNA into chromatin. The basic unit of chromatin is the nucleosome, which contains DNA wrapped almost two times around the histone octamer (comprising two copies each of the histone proteins H2A, H2B, H3 and H4) (Figure 4). Further levels of compaction are required to fit the DNA into the nucleus (Figure 4). The nucleosomes are folded upon themselves to form the 30-nm fibre, which is then folded again to form the 300-nm fibre. During mitosis further compaction can occur, forming the chromatid, which is 700 nm in diameter.


Figure 4. Histone proteins (H2A, H2B, H3 and H4) associate to form a histone octamer. Approximately 147 bp of DNA wraps around the histone octamer to form a nucleosome, generating a ‘beads on a string’ structure. The nucleosomes, together with histone H1, condense into the 30-nm fibre, and there is further condensation to form the 300-nm fibre. During mitosis there is further compaction (not shown).

Processes such as DNA replication and transcription need to occur in the chromatin environment, and the level of compaction acts as a barrier to proteins that need to interact with DNA. Therefore, chromatin structure plays an important role in processes such as regulation of gene expression in eukaryotes. DNA and the histone proteins can be chemically modified. These are called epigenetic modifications because they do not change the DNA sequence; however, they can be passed on during cell division and to subsequent generations, a process known as epigenetic inheritance. Because these epigenetic modifications can alter the chromatin structure, they regulate gene transcription and can affect the phenotype. Epigenetics plays key roles in many processes, including development, cancer, behaviour and addiction. This will be discussed further later in this article.

Nuclear organisation plays an important role in many biological processes including regulation of gene transcription. In recent years the development of several techniques, including microscopy, has allowed us to gain an understanding of the way the genome is organised in 3D. Individual chromosomes are not randomly spaced within the nucleus; each chromosome has a distinct territory. Actively transcribed regions from different chromosomes are often close to each other and near the interior of the nucleus, whereas inactive genes are at the periphery or near a special area called the nucleolus, where ribosomal RNA is transcribed.

DNA replication

Whenever a cell divides there is a need to synthesise two copies of each chromosome present within the cell. For example, in a human, prior to cell division, all 23 pairs of chromosomes need to be replicated to form 46 pairs, so that following cell division each daughter cell has a full complement (23 pairs) of chromosomes. The structure of DNA gives us a clue to how it is replicated. This was eloquently postulated by Watson and Crick in their 1953 paper: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material”. Each strand can act as a template for the synthesis of the complementary strand, so the replication machinery would ‘unzip’ the double helix and read along the two existing ‘parent’ strands, synthesising a complementary new ‘daughter’ strand on each, with A opposite T, C opposite G etc. This is described as semi-conservative, since each ‘new’ double-stranded DNA molecule has one original ‘parent’ strand and one newly made ‘daughter’ strand.
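The copying mechanism described here can be sketched in a few lines of Python. This is only an illustration of the pairing rule, not a model of the replication machinery: the function names and the example sequence are hypothetical, and each strand is written 5′ to 3′.

```python
# Watson-Crick base-pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(strand):
    """Return the strand that pairs with `strand`, written 5'->3'.

    The two strands of a duplex are antiparallel, so the complement
    is read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand))

def replicate(duplex):
    """'Unzip' a duplex and build a new daughter strand on each parent strand."""
    parent_top, parent_bottom = duplex
    daughter_for_top = complement_strand(parent_top)
    daughter_for_bottom = complement_strand(parent_bottom)
    # Each daughter duplex keeps exactly one parental strand (semi-conservative).
    return [(parent_top, daughter_for_top), (daughter_for_bottom, parent_bottom)]

top = "ATGCCGTTA"  # hypothetical sequence, for illustration only
parent = (top, complement_strand(top))
daughters = replicate(parent)
print(daughters)
```

Both daughter duplexes carry the same sequence as the parent, and each contains one strand taken directly from it.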

The evidence that DNA replication is semi-conservative came from an elegant experiment completed by Matthew Meselson and Franklin Stahl. They labelled the parental DNA with a heavy isotope of nitrogen (¹⁵N) by growing bacteria in a growth medium that contained ¹⁵NH₄Cl. They then grew the bacteria in a medium that contained ¹⁴NH₄Cl, in conditions such that any newly synthesised DNA would contain ¹⁴N. If DNA replication is semi-conservative, after one round of DNA replication each cell would have a DNA molecule that contains one ‘old’ parental strand labelled with ¹⁵N and one ‘new’ daughter strand labelled with ¹⁴N. This was tested by analysing the density of the DNA using density-gradient centrifugation. As predicted, they observed that the new daughter DNA molecules had a density consistent with containing both ¹⁵N and ¹⁴N, with one strand carrying ¹⁵N and the other ¹⁴N.
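The prediction tested in this experiment can be followed with a small simulation. The sketch below is a simplified bookkeeping model (the function names are hypothetical): each DNA molecule is recorded only as the isotope label of its two strands, and the second generation is an extrapolation of the same rule rather than something stated in the text above.

```python
from collections import Counter

def replicate_population(molecules, new_label="14N"):
    """Semi-conservative replication: every duplex gives two duplexes,
    each keeping one old strand and gaining one newly made strand."""
    next_generation = []
    for strand_a, strand_b in molecules:
        next_generation.append((strand_a, new_label))
        next_generation.append((strand_b, new_label))
    return next_generation

def density_bands(molecules):
    """Classify each duplex by how many of its strands carry 15N."""
    def band(duplex):
        heavy_strands = duplex.count("15N")
        return {2: "heavy", 1: "hybrid", 0: "light"}[heavy_strands]
    return Counter(band(duplex) for duplex in molecules)

generation_0 = [("15N", "15N")]                    # grown in 15N medium
generation_1 = replicate_population(generation_0)  # one round in 14N medium
generation_2 = replicate_population(generation_1)  # a second round in 14N medium

for name, population in [("generation 0", generation_0),
                         ("generation 1", generation_1),
                         ("generation 2", generation_2)]:
    print(name, dict(density_bands(population)))
```

After one round every molecule is a hybrid, which is the intermediate density described above; under this model a second round would give equal numbers of hybrid and light molecules.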

DNA polymerase and DNA synthesis

The enzyme DNA polymerase is responsible for DNA synthesis. DNA polymerase is a template-driven enzyme, so it will use the parental DNA strand as a template. It cannot synthesise DNA in the absence of a template. In addition, it will only add nucleotides on to the 3′ end of an existing nucleic acid chain. The building blocks for DNA synthesis are deoxynucleoside triphosphates (dATP, dTTP, dCTP and dGTP). During DNA synthesis, the base within the incoming deoxynucleoside triphosphate pairs with the complementary base on the template strand, and a phosphodiester bond is formed between the 5′ phosphate on the incoming nucleotide and the free 3′ hydroxyl on the existing nucleic acid chain; pyrophosphate is released (Figure 5).

(A) DNA polymerase binds the template DNA and the new strand. The next nucleotide to be added to the 3′ end of the growing chain will contain guanine (G), which is complementary to the C on the template strand. DNA polymerase catalyses the formation of a phosphodiester bond. (B) The chemical reaction during the formation of a phosphodiester bond, showing the addition of a nucleotide containing guanine and the release of pyrophosphate.

Pyrophosphate comprises the two phosphate residues within the deoxynucleoside triphosphate building block that are not incorporated into the DNA chain. DNA polymerase synthesises DNA in the 5′ to 3′ direction, because it can only add nucleotides on to the 3′ end of the chain. DNA polymerase has proofreading activity, so after the phosphodiester bond has been formed, the base pairing is checked and if a nucleotide with an incorrect base has been added, DNA polymerase will remove the nucleotide using a 3′ to 5′ exonuclease activity. Exonucleases are enzymes that can remove nucleotides from the ends of a DNA molecule; 3′ to 5′ exonucleases remove nucleotides from the 3′ end of a DNA molecule and therefore can remove the last nucleotide that was added during DNA replication. This is analogous to using the delete key to remove a letter that you have typed incorrectly before adding the correct one and continuing typing.
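The ‘delete key’ analogy can be made concrete with a toy model of extension and proofreading. Everything here is an assumption made for illustration: the function name, the error rate (set far higher than any real polymerase so that corrections actually occur in a short example), and the use of a short DNA chain as the existing strand to extend.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesise(template_3to5, existing_chain, error_rate=0.2, seed=1):
    """Extend `existing_chain` (written 5'->3') along a template read 3'->5'.

    A wrong base is sometimes added; the proofreading step then removes
    the last nucleotide from the 3' end and synthesis tries again.
    """
    rng = random.Random(seed)
    new_strand = list(existing_chain)
    corrections = 0
    while len(new_strand) < len(template_3to5):
        correct = PAIR[template_3to5[len(new_strand)]]
        if rng.random() < error_rate:
            added = rng.choice([b for b in "ACGT" if b != correct])
        else:
            added = correct
        new_strand.append(added)   # nucleotide joined to the 3' end
        if added != correct:       # base pairing is checked after addition
            new_strand.pop()       # 3'->5' exonuclease removes the mismatch
            corrections += 1
    return "".join(new_strand), corrections

template = "TACGGCATTGCA"          # hypothetical template, read 3'->5'
product, corrections = synthesise(template, "ATG")
print(product, "corrections made:", corrections)
```

Because every mismatch is removed before synthesis continues, the product is always fully complementary to the template in this model, whatever the number of corrections.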

DNA polymerase requires a short double-stranded region with a free 3′ hydroxyl in order to start making a copy of the template; this ensures that DNA is synthesised in a controlled way. Initiation of DNA synthesis uses a small RNA primer (8–12 bases) made by the enzyme primase. DNA polymerase will then extend from the primer, copying the template and synthesising the daughter DNA strand. This means that when DNA synthesis first starts each DNA molecule actually contains a small piece of RNA at its 5′ end. This RNA will ultimately be replaced with DNA; how this is done is discussed below.

The origin of replication and the replisome

A large multiprotein complex, called the replisome, is responsible for DNA replication. In prokaryotes, two replisomes form at a specific point on the chromosome called the Origin of Replication (ori). The DNA in this region will be opened up (‘unzipped’) so that the replication machinery can gain access to single-stranded parental DNA, which will act as template for synthesis of the new daughter strands. The two replisomes then travel in opposite directions around the circular prokaryotic chromosome, each replisome forming a replication fork. A schematic representation of one replication fork is shown in Figure 6.

A single replication fork showing the leading and lagging strands. The leading strand is synthesised continuously, reading the template 3′ to 5′ and synthesising DNA in the 5′ to 3′ direction. The lagging strand is synthesised discontinuously, in short Okazaki fragments (approximately 1000–2000 bases in prokaryotes and 100–200 bases in eukaryotes).

The replication fork

Within the replication fork, on the so-called leading strand, DNA polymerase moves 3′ to 5′ with respect to the template and synthesises DNA continuously in the 5′ to 3′ direction, moving in the same direction as the replication fork. Although the lagging strand as a whole grows in the 3′ to 5′ direction, it is actually synthesised discontinuously in small segments called Okazaki fragments, each of which is synthesised 5′ to 3′ (Figure 6). Each Okazaki fragment is started with an RNA primer and is synthesised in the opposite direction to the movement of the replication fork. In prokaryotes, Okazaki fragments are 1000–2000 bases in length. In Figure 6 you will see that the DNA polymerase synthesising the Okazaki fragment will eventually reach the primer for the previous Okazaki fragment. When this happens the primer for the previous fragment is removed by a DNA polymerase using 5′ to 3′ exonuclease activity. DNA polymerase then replaces the missing nucleotides by adding them to the 3′ end of the last Okazaki fragment. When all the primer has been removed, there will be two DNA strands adjacent to each other but not joined by a phosphodiester bond; these two strands are joined together by the enzyme DNA ligase.
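To get a feel for how much work discontinuous synthesis involves, the bookkeeping can be sketched as follows. The function is hypothetical and the defaults are assumptions: a fragment length of 1500 bases (the middle of the prokaryotic range quoted above) and a primer of 10 bases (within the 8–12 base range given earlier).

```python
import math

def lagging_strand_summary(template_length, fragment_length=1500, primer_length=10):
    """Estimate the steps needed to copy a stretch of lagging-strand template.

    Assumes every Okazaki fragment has the same length and its own RNA
    primer; real fragments vary in size.
    """
    fragments = math.ceil(template_length / fragment_length)
    return {
        "okazaki_fragments": fragments,
        "rna_primers": fragments,                          # one primer per fragment
        "rna_bases_replaced_with_dna": fragments * primer_length,
        "nicks_sealed_by_ligase": fragments - 1,           # joins between neighbours
    }

summary = lagging_strand_summary(150_000)  # hypothetical 150 kb stretch
for step, count in summary.items():
    print(step, count)
```

Even for this modest stretch of template the model needs 100 primers, 1000 RNA bases to be replaced with DNA and 99 ligation events, which shows why primer removal and DNA ligase are essential parts of lagging strand synthesis.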

The replisome contains a number of other important proteins required for DNA replication. The double-stranded DNA needs to be separated, ‘unzipped’, by a helicase to generate the single-stranded DNA templates for DNA polymerase. As the replication fork moves along the helical DNA, the coils in the DNA in front of the fork become compressed, so the DNA is described as being overwound; a topoisomerase is required to ‘relax’ it by removing the overwinding. Single-stranded binding proteins (SSBs) bind the lagging strand template to stabilise and protect the single-stranded DNA.

The two replication forks that form at the ori will move in opposite directions around the circular prokaryotic genome until they reach the terminator sequence, ter, which is on the opposite side of the genome compared with the ori, i.e. it is at 6 o’clock compared with 12 o’clock. This results in the complete replication of the genome. Once DNA replication has been completed a post-replication DNA repair process will correct errors that were not corrected by the proofreading activity of DNA polymerase. The fidelity of DNA replication is extremely high, resulting in an error rate of 1 mistake per 10⁹–10¹⁰ nucleotides added.
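A short calculation shows what this error rate means in practice. The genome size used below (4.6 million base pairs, roughly that of a typical E. coli laboratory strain) is an assumption introduced for the example, as is the function name.

```python
def expected_errors(nucleotides_copied, errors_per_nucleotide):
    """Expected number of mistakes when copying a given number of nucleotides."""
    return nucleotides_copied * errors_per_nucleotide

genome_size = 4_600_000  # assumed genome size in base pairs (illustrative)

best_case = expected_errors(genome_size, 1e-10)   # 1 mistake per 10^10
worst_case = expected_errors(genome_size, 1e-9)   # 1 mistake per 10^9

print(f"expected mistakes per genome copy: {best_case:.5f} to {worst_case:.5f}")
print(f"roughly one mistake every {1 / worst_case:.0f} to {1 / best_case:.0f} replications")
```

Under these assumptions a genome of this size would be copied hundreds or thousands of times, on average, before a single mistake is left in place.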

DNA replication in eukaryotes

DNA replication is essentially the same in eukaryotes and prokaryotes. In both cases two replisomes form at an ori and generate two replication forks moving in opposite directions away from the origin. In each replication fork there are leading and lagging strands. There are two major differences. The first is that, due to the larger genome size, each chromosome has multiple origins of replication, so there will be a large number of replication forks on each chromosome.

The second difference is that, with the exception of mitochondrial DNA, eukaryotic chromosomes are linear, and this creates a problem for lagging strand synthesis. Replication of a linear chromosome results in shortening of one 5′ end of each daughter DNA molecule. This is because when the primer required for the last Okazaki fragment is removed, DNA polymerase cannot fill the gap (Figure 7A). Repeated rounds of DNA replication result in shorter and shorter DNA molecules. If this were not corrected, eukaryotes would have become extinct as their chromosomes got shorter with each generation. Eukaryotes have a mechanism to preserve the ends of chromosomes where it counts, that is, in the gametes. The terminal ends of chromosomes, telomeres, contain a highly repeated sequence; for example, in humans the sequence TTAGGG is repeated in tandem from 100 to over 1000 times. Repeated rounds of DNA replication will result in the shortening of these telomeric sequences; that is, the number of repeats will fall. Telomerase, an RNA-containing enzyme, can add additional copies of the repeat sequence to the 3′ end, replacing those lost during DNA replication (see Figure 7).

( A ) Following DNA replication and removal of the primer for the last Okazaki fragment of the lagging strand, there will be a region at the 3′ end that is not base paired, called a 3′ overhang. ( B ) Telomerase binds and uses the RNA it contains to act as a template to extend the 3′ overhang. This extends the 3′ end sufficiently for a new RNA primer to bind and the final Okazaki fragment to be made.

This actually extends the 3′ end of the telomere rather than the 5′ end that is initially lost during DNA replication. The RNA sequence within telomerase is complementary to the 3′ telomeric sequence and so can bind and act as a template for synthesis of a short DNA sequence. Telomerase then moves along the newly synthesised strand and the process is repeated. Multiple rounds of elongation and translocation ultimately result in the 3′ end being extended so that it is long enough to act as template for synthesis of another Okazaki fragment, hence extending both strands of the telomere. Only germ cells and a few other actively dividing cells (e.g. haematopoietic cells) have sufficient levels of telomerase activity to counteract the loss of repeat sequences during DNA replication. At birth, telomeres are over 10000 base pairs in length and there are enough repeats to allow DNA replication and somatic cell division during the lifetime of the organism. If telomeres become too short this will trigger programmed cell death (a process called apoptosis). The lack of telomerase activity in somatic cells limits the number of cell divisions that can occur, and this is a ‘problem’ that needs to be overcome by cancer cells. Telomerase activity is reactivated in most cancers, allowing these cells to divide indefinitely, and therefore this activity is a potential target for cancer therapies.
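The two ideas in this section, extension of the 3′ end by telomerase and a limit on somatic cell divisions, can be sketched as follows. The starting length of 10000 base pairs comes from the text; the loss of 100 base pairs per division and the critical length of 4000 base pairs are purely illustrative assumptions, and the function names are hypothetical.

```python
REPEAT = "TTAGGG"  # human telomeric repeat

def telomerase_extend(three_prime_end, rounds):
    """Add one repeat per round of elongation and translocation."""
    return three_prime_end + REPEAT * rounds

def divisions_until_critical(start_bp, loss_per_division_bp, critical_bp):
    """Count the divisions possible before the telomere would drop below
    the critical length, assuming no telomerase activity."""
    length = start_bp
    divisions = 0
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions

overhang = "GGTTAGGG"  # hypothetical 3' overhang
print(telomerase_extend(overhang, rounds=2))

# Illustrative numbers only: 100 bp lost per division, 4000 bp critical length.
print(divisions_until_critical(10_000, 100, 4_000))
```

With these made-up parameters a somatic cell lineage could divide 60 times; changing either assumption changes the answer, which is the point of the model rather than the specific number.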

An understanding of DNA synthesis is central to many experimental approaches in molecular biosciences. It allows us to determine DNA sequences, including that of the human genome, to analyse environmental samples to better understand the living world around us and to analyse minute biological samples from crime scenes to identify offenders. It is also exploited in medicine; for example, several drugs used to treat HIV infection or exposure are nucleoside analogues that inhibit DNA synthesis. Many chemotherapy agents used to treat cancer target DNA replication.

The genetic code and the concept of a gene

As we have seen in the previous two sections, the genetic material in a cell is made of DNA and can be copied and passed on to progeny through DNA replication, allowing for inheritance of the information that it carries. A large proportion of the information in the DNA is first transcribed into mRNA and then translated into proteins. However, there are some RNAs that are never translated into proteins and these have important functions too. Phrases like ‘it is in my genes’ or ‘in my DNA’ are used in common speech to describe something that is an important part of who someone is.

The term gene was coined in the early 1900s to describe the basic unit of heredity. Genes were thought of as distinct loci arranged linearly on chromosomes. Breeding experiments with the fruit fly Drosophila supported this view and showed that if two genes are close together on a chromosome they are more likely to be inherited together. The observation that mutations in genes could give rise to altered phenotypes led to the ‘one gene, one polypeptide’ hypothesis. Once it became clear that genes were made of DNA, what is referred to as the central dogma of molecular biology was formulated. This describes a two-step process in which the genes on the DNA are transcribed into RNA and then translated into a sequence of amino acids that makes up a protein. The information flow is from DNA to RNA and then to protein (Figure 8).

The arrows represent steps where DNA or RNA is being used as a template to direct the synthesis of another polymer, either RNA or protein.

However, there are exceptions to this. The first is that some viruses have RNA genomes and in some cases these are reverse transcribed into DNA before the genes can be expressed. The retrovirus HIV is an example of this. The other exception is that not all functional RNAs are translated into proteins (see non-coding RNAs below).

The genetic code

The genetic code is the set of rules used by living cells to translate the information encoded within genetic material into proteins. When DNA and RNA were first discovered, the relative simplicity of nucleic acids led many scientists to doubt that they carried the genetic information. DNA only has four different kinds of bases; the question was how it could code for 20 amino acids. If there were a 1:1 correlation between bases and amino acids, DNA could only encode four amino acids. Pairs of bases would give 16 possible combinations, which is still not enough. However, if you consider a triplet code you have 64 possibilities, which is more than enough. This is the code that we are familiar with, where each codon, a sequence of three nucleotides, specifies a particular amino acid. This triplet code still did not seem logical because now you have far more codons than you need. There are some other important questions about the genetic code too: are the spare codons used? Is the code overlapping? And is it continuous or are there spacers indicating the end of each codon?
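The counting argument can be checked directly by enumerating every possible code word of a given length. This is a minimal sketch; the function name is hypothetical.

```python
from itertools import product

BASES = "ACGU"
AMINO_ACIDS = 20

def code_words(length):
    """All possible words of `length` bases, e.g. every triplet codon."""
    return ["".join(word) for word in product(BASES, repeat=length)]

for length in (1, 2, 3):
    count = len(code_words(length))   # equals 4 ** length
    verdict = "enough" if count >= AMINO_ACIDS else "not enough"
    print(f"{length} base(s) per word: {count} combinations, {verdict} for {AMINO_ACIDS} amino acids")
```

Three bases per word is the shortest length that gives at least 20 combinations, and it leaves 44 combinations to spare.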

Table 1 shows the genetic code as we now understand it. It is written as RNA with a U rather than a T because it is RNA that cells translate into amino acids. The code is said to be redundant or degenerate because a single amino acid is often coded for by more than one codon. In most cases it is the third nucleotide in the codon that differs; this is often referred to as the degenerate position.
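The redundancy of the code can be counted from the standard codon table. The compact encoding below is one convenient way to write that table in Python (bases ordered U, C, A, G at each position, with `*` marking a stop codon); it is a representation chosen for this sketch, not a reproduction of Table 1.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# How many codons specify each amino acid (or stop)?
codons_per_residue = Counter(CODON_TABLE.values())
print(codons_per_residue.most_common())

# First-two-base 'families' in which the third base makes no difference.
fourfold_families = [
    a + b
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
]
print(fourfold_families)
```

Only methionine (M) and tryptophan (W) have a single codon, and in 8 of the 16 families the third base can be any of the four without changing the amino acid.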

Evidence for the triplet code

The experiments that allowed scientists to decipher the genetic code were carried out long before we were able to determine the sequence of DNA. While it was possible at that time to determine the proportions of each different amino acid in a protein, it was not yet possible to work out the order in which they occurred. Francis Crick and Sydney Brenner answered some key questions with an experiment using mutants of a bacteriophage, a virus that infects bacteria. The normal or wild-type phage will infect E. coli and grow. Crick and Brenner investigated mutants that would not grow on some strains of E. coli.

Mutations that are insertions or deletions cause what are called frameshifts. Inserting a single adenine base into the DNA sequence changes not only the amino acid at the position of the insertion but all subsequent amino acids translated from that sequence (compare Figures 9A and B); the reading frame has been shifted by one base and this results in a protein that is non-functional. However, if you insert three nucleotides you often get a wild-type or near wild-type phenotype. This is because you have inserted a whole triplet codon: you will get one or two amino acids that were not in the original sequence but the reading frame is not shifted (Figure 9C) and the rest of the sequence is normal.

( A ) Wild-type sequence, ( B ) a single base insertion (shown in red) causes a frameshift so all subsequent amino acids are different from the wild-type, ( C ) insertion of three base pairs (shown in red) causes two incorrect amino acids to be incorporated into the protein but there is no frameshift so the rest of the protein has the wild-type sequence.

Crick and Brenner were looking for what they called suppressor mutations that would rescue the mutant and allow it to grow normally. They showed that their suppressor mutants did not simply reverse the original mutation; they often added or subtracted one or more bases. They worked out that if you insert or delete one, two or four nucleotides then you see a mutant phenotype. However, if you insert or delete three nucleotides, this has little or no effect. This was strong supporting evidence for a triplet code. This is also evidence for a redundant code where the same amino acid can be coded in more than one way. If the code were non-redundant there would be 20 codons that code for amino acids and 44 that are ‘nonsense’ codons. In this case inserting three nucleotides would be most likely to introduce a nonsense codon and not restore the wild-type. Crick and Brenner proposed correctly that the genetic code is read from a fixed starting point and the bases are read in groups of three.
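The effect of insertions and of a suppressor mutation on the reading frame can be reproduced with a short translation routine. The mRNA sequence and the positions of the mutations are invented for illustration; the codon table is the standard genetic code in a compact encoding chosen for this sketch.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Read codons from a fixed starting point (position 0) until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

wild_type = "AUGGCCAAGUUCCGUACUUAA"                    # hypothetical message
plus_one = wild_type[:3] + "A" + wild_type[3:]         # insert one base
plus_three = wild_type[:4] + "AGA" + wild_type[4:]     # insert three bases
suppressed = plus_one[:9] + plus_one[10:]              # +1, then delete one base nearby

for name, seq in [("wild type", wild_type), ("+1", plus_one),
                  ("+3", plus_three), ("+1 then -1", suppressed)]:
    print(f"{name:>10}: {translate(seq)}")
```

The single insertion changes every residue after the start, whereas the triple insertion and the insertion followed by a nearby deletion alter only a short stretch before the wild-type sequence (FRT) resumes.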

Cracking the code

At about the same time, Marshall Nirenberg and Heinrich Matthaei, working in the United States, had developed a cell-free system which could synthesise proteins in a test tube when provided with an RNA molecule. They showed that when provided with an artificial RNA chain composed only of uracil (polyuracil) the system made a polypeptide composed entirely of phenylalanine residues. They now had a tool that they could use to crack the genetic code. RNA composed of cytosine (C) residues directed the synthesis of polyproline and RNA composed of adenine (A) residues made polylysine. Experiments with combinations of nucleotides demonstrated that, for example, if you make RNA from A and C you produce proteins containing only six amino acids: asparagine, glutamine, histidine, lysine, proline and threonine. There are eight possible triplet codons that can be made from A and C; two of these (CCC and AAA) we already know encode proline and lysine. The remaining four amino acids must be encoded by the other combinations of A and C. This of course provides additional evidence for the redundancy of the genetic code.
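The reasoning about RNA made only from A and C can be verified by listing the eight possible codons and looking each one up in the standard genetic code. The compact table encoding is a convenience chosen for this sketch.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Every triplet that can occur in an RNA built only from A and C.
ac_codons = ["".join(triplet) for triplet in product("AC", repeat=3)]
encoded = {codon: CODON_TABLE[codon] for codon in ac_codons}

for codon, residue in encoded.items():
    print(codon, residue)

print("distinct amino acids:", sorted(set(encoded.values())))
```

Eight codons give only six amino acids (H, K, N, P, Q and T in one-letter code), because threonine and proline are each specified by two of the codons.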

These experiments using RNA molecules composed of random combinations of two or three bases were not enough to fully crack the genetic code. The use of chemically synthesised RNA molecules of known repeating sequence added some more important information. For example, a synthetic RNA of alternating A and C residues (ACACACACAC…) can be read as two alternating codons, CAC and ACA. It encodes a protein of alternating histidine and threonine residues.
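A repeating two-base polymer is informative because it gives the same alternating pair of codons wherever reading begins. The sketch below translates a repeating message that reads as CAC and ACA codons in each of the three possible reading frames, using the standard genetic code in a compact encoding chosen for this example.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_frame(mrna, frame):
    """Translate every complete codon starting at offset `frame` (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(frame, len(mrna) - 2, 3)
    )

message = "AC" * 9   # repeating copolymer, 18 bases long
frames = {frame: translate_frame(message, frame) for frame in range(3)}
for frame, peptide in frames.items():
    print(f"frame {frame}: {peptide}")
```

Every frame gives a strictly alternating chain of threonine (T) and histidine (H); the frame only decides which of the two comes first.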

In the last section, we will discuss how tRNAs and ribosomes decode the genetic code and synthesise proteins. The final detail of the genetic code was determined by a technique using ribosome-bound tRNAs. Pieces of RNA as short as a single codon will bind to ribosomes, and if amino acids attached to tRNA are added, those carrying the matching anticodon will associate with the complementary RNA. If you then filter the solution you trap only the tRNAs that are bound to the ribosome; these are the ones specified by the codon in your RNA.

Start and stop codons

Of the 64 possible codons, 61 encode amino acids. The three remaining codons, UAA, UAG and UGA, do not code for an amino acid and are sometimes called nonsense codons. They are stop codons: when the ribosome encounters one of these, protein synthesis stops. The AUG codon encodes the amino acid methionine but it is also the most common start codon. As you will see in the last section, the first residue in eukaryotic proteins is always a methionine and in prokaryotes it is a modified amino acid, N-formylmethionine.

Expanding the genetic code

Nature uses a small set of amino acids to make proteins. However, if we were able to engineer cells that could use a wider range of building blocks with different physical and chemical properties, it would be possible to make novel materials, some of which could have useful therapeutic properties; this is one of the aims of synthetic biology. To do this successfully we need to reprogramme the genetic code and to engineer the translation machinery (see later section) to use these new combinations. Some progress has been made, for example in using both the UAG stop codon and the AGG codon for arginine to code for amino acids not normally found in proteins.

Current concept of the gene

Once the genetic code was cracked it was clear that a gene is a sequence of bases on a DNA molecule that codes for a sequence of amino acids in a polypeptide chain or for an RNA molecule with a specific function. The availability of DNA sequences (see ‘Recombinant DNA Technology and DNA Sequencing’ in this issue of Essays in Biochemistry ) of individual genes made it possible to look for patterns characteristic of genes. A gene that codes for a protein has a start codon followed by a series of codons that encode the amino acid sequence and then a stop codon; this is called an open reading frame.
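The definition of an open reading frame lends itself to a simple search: scan each of the three reading frames for a start codon and read on until the first stop codon. The sketch below searches one strand only and uses an invented sequence; the function name and the `min_codons` threshold are hypothetical, and real gene-finding tools apply many further criteria.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) positions of open reading frames on one strand.

    `end` is exclusive and includes the stop codon, so dna[start:end]
    runs from the start codon to the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs

sequence = "CCATGGCCAAGTTCTAAGG"  # hypothetical DNA, coding strand 5'->3'
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

For this sequence the search finds a single open reading frame, ATGGCCAAGTTCTAA, beginning at position 2.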

Whole genome sequencing has provided biological data on an unprecedented scale. The need to analyse sequence data has led to the development of the field of bioinformatics: the analysis of these data to answer biological questions. One key concept used in bioinformatics is that of homology. Two organisms that have a common ancestor are said to be homologous and the same can be said of a structure or of a gene. For example, limbs with five digits (the pentadactyl limb) are found not only in humans and other mammals but also in birds, reptiles and amphibians. The limbs are homologous, and this is evidence of a common evolutionary ancestor of all of these groups of animals. The same is true of genes. All vertebrates have red blood cells that contain haemoglobin; adult human haemoglobin is made from two α and two β globin molecules. The DNA sequences of the genes that encode globin molecules in vertebrates are all similar to each other, and you can estimate how long ago two animals shared a common ancestor by looking at how similar their globin genes are. This principle can also be used to find genes in a new piece of DNA sequence; if there is a section of sequence that is similar to a known gene then it is likely to encode a homologous gene.
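The simplest measure of how similar two sequences are is the percentage of positions at which they are identical. The sketch below assumes the sequences are already aligned and of equal length, and the three short sequences are invented for illustration; they are not real globin gene sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical aligned fragments from three species.
species_a = "ATGGTGCACCTGACT"
species_b = "ATGGTGCATCTGACT"   # one difference from species_a
species_c = "ATGGTTCATTTGTCT"   # four differences from species_a

print(f"A vs B: {percent_identity(species_a, species_b):.1f}% identical")
print(f"A vs C: {percent_identity(species_a, species_c):.1f}% identical")
```

On this measure species A and B would be inferred to share a more recent common ancestor than A and C. Real analyses first align the sequences, allowing for insertions and deletions, and use statistical models of sequence change rather than raw identity.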

A gene is more than just the sequence that encodes the protein; it also includes sequences involved in regulation of gene expression, such as promoter sequences that define where transcription starts and are the sites where proteins involved in transcription bind to the DNA. In bacteria, almost all genes are a single uninterrupted sequence of DNA. In eukaryotes the situation is more complicated because the coding region is usually interrupted by introns. The primary transcript is referred to as precursor mRNA or pre-mRNA; this contains both exons and introns. The introns are removed when the pre-mRNA is processed before it leaves the nucleus (Figure 10), leaving the exons, which are spliced together to make the mature mRNA. Eukaryotic mRNAs have a 5′ cap, which is a methylated guanosine nucleotide added to the 5′ end of the mRNA by an unusual 5′ to 5′ linkage; this is important in initiating translation. At the 3′ end is the poly A tail, a chain of between 100 and 250 adenine residues added to the mRNA to increase its stability. Analysis of the human genome sequence suggests that there are approximately 20000–25000 protein-coding genes; however, there are far more different proteins. This is because many genes are capable of encoding several variants of a protein. Alternative splicing allows for different combinations of exons to be included in the mature mRNA and genes can also have several alternative promoters and alternative poly A sites. It is thought that 95% of human genes are alternatively spliced.

The DNA includes an untranslated region at both the 5′ and 3′ ends as well as introns and exons. The codon where translation starts (green) and the stop codon (red) are shown. The DNA is transcribed into pre-mRNA, which is processed by addition of the 5′ cap, splicing out of the introns and addition of the poly A tail. The mature mRNA is exported from the nucleus into the cytoplasm.
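The processing steps in this figure can be imitated on a toy transcript. In the sketch below the pre-mRNA sequence, the exon coordinates and the function names are all invented; exons are written in upper case and introns in lower case purely so they can be told apart, the cap is represented by the text label `m7G-`, and the tail length of 150 is an assumed value within the 100–250 range quoted above.

```python
from itertools import combinations

def splice(pre_mrna, exons):
    """Join the listed exons, given as (start, end) with end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def process(pre_mrna, exons, poly_a_length=150):
    """Add a 5' cap label, remove introns and add a poly A tail."""
    return "m7G-" + splice(pre_mrna, exons) + "A" * poly_a_length

def isoforms(pre_mrna, exons):
    """Spliced variants that keep the first and last exon and any subset
    of the internal exons, a simplified view of alternative splicing."""
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    variants = []
    for n in range(len(internal) + 1):
        for kept in combinations(internal, n):
            variants.append(splice(pre_mrna, [first, *kept, last]))
    return variants

pre_mrna = "AUGGCC" + "guaaguuuucag" + "AAGUUC" + "guaagcccacag" + "CGUUAA"
exons = [(0, 6), (18, 24), (36, 42)]

print(splice(pre_mrna, exons))
print(isoforms(pre_mrna, exons))
print(len(process(pre_mrna, exons)))
```

With a single internal exon there are two possible isoforms, one including it and one skipping it; each additional optional exon doubles the number in this simplified model.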

Non-coding RNAs

Only approximately 1.2% of the human genome codes for protein. However, if you compare the genomes of the human, the mouse and the dog you can see that much more of the genome has been under what is called negative selection since the species diverged. Negative selection means that mutations which are disadvantageous are selected against. This suggests that more than just the protein-coding regions affect the fitness of the organism carrying the DNA. Some of these regions are DNA sequences that are important in controlling gene expression (next section). However, systematic screens are revealing large numbers of RNA transcripts that are processed but do not encode proteins. The best known are transfer RNAs and ribosomal RNAs, both of which, as we will see in a later section, are fundamental to protein synthesis. However, we are beginning to understand that there are other non-coding RNAs that carry out important cellular processes.

Two types of non-coding RNA, small interfering RNAs (siRNAs) and microRNAs (miRNAs), have a role in reducing gene expression after the mRNA has been transcribed from the DNA. They work by targeting a protein complex called RISC to specific mRNAs, which it then degrades. Expression of the gene is specifically silenced or reduced and the phenotypic effect of this can then be observed. Another group of non-coding RNAs plays an important role in increasing the stability and correct folding of ribosomal RNAs. This process takes place in a compartment within the nucleus called the nucleolus; the RNAs are called small nucleolar RNAs (snoRNAs). These are mostly generated from intron RNA after it has been spliced out of the precursor mRNA and they function in association with proteins.

Modern concept of a gene

The modern concept of the gene has to take into account all of the complexity of mRNA processing including alternative splicing, regulatory sequences and polyadenylation sites as well as the plethora of non-coding RNAs. A definition of a gene that takes these factors into account would be that a gene codes for one or more transcripts that can function as an RNA or can be translated into one or more proteins.

Transcription

We have seen that a gene can encode either an RNA product or a protein sequence. The production of both requires the gene to be transcribed into RNA, either because the RNA is the final product or because the RNA will need to act as template for protein synthesis. The basic mechanism of RNA synthesis is very similar in prokaryotes and eukaryotes, being catalysed by the enzyme RNA polymerase. However, of the processes discussed in this article, transcription as a whole is arguably the one that differs most between prokaryotes and eukaryotes. One difference is that in eukaryotes the whole process needs to occur in a chromatin context, so access to the DNA template is limited. Regulation of gene expression is a major facilitator of cell differentiation, homoeostasis and speciation. Different cell types turn on transcription of different genes, giving rise to their differentiated phenotypes. If we look at mammals as an example of speciation, they all have roughly the same gene content; it is how transcription is regulated that has changed as mammals have evolved. For example, if you compare humans and mice, the important changes to the human and mouse genome sequences that have occurred since they diverged from a common ancestor are predominantly in the sequences that control transcription rather than in protein-coding sequences.

RNA polymerase

DNA-dependent RNA polymerases are responsible for transcription of DNA into RNA. Like DNA polymerase, RNA polymerase requires a DNA template and nucleoside triphosphate precursors. Unlike DNA polymerase, RNA polymerase does not require a primer. During RNA synthesis, the base within the incoming nucleoside triphosphate pairs with the base on the DNA template, a phosphodiester bond is formed and pyrophosphate is released. RNA polymerase synthesises RNA in the 5′ to 3′ direction, because it can only add nucleotides on to the 3′ end of the chain. During transcription only one DNA strand is transcribed into RNA.
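The pairing rule used during transcription can be written as a simple lookup. This is a sketch of the base-pairing logic only; the function name and the template sequence are hypothetical, and the template is written in the 3′ to 5′ direction in which it is read so that the RNA comes out 5′ to 3′.

```python
# Base inserted into the RNA opposite each base of the DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3to5):
    """Build an RNA 5'->3' while reading the DNA template strand 3'->5'.

    Each new nucleotide is added to the 3' end of the growing chain.
    """
    rna = []
    for base in template_3to5:
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)

template = "TACCGGTTCAAGGCATGAATT"   # hypothetical template strand, 3'->5'
rna = transcribe(template)
print(rna)

# The RNA matches the non-template (coding) strand, with U in place of T.
coding_strand = rna.replace("U", "T")
print(coding_strand)
```

Only the template strand is read; the other DNA strand has the same sequence as the RNA apart from the substitution of T for U, which is why gene sequences are usually quoted from that strand.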

Gene transcription

When a gene is transcribed, RNA polymerase binds upstream from the start of the gene and unwinds almost two turns of the DNA helix to form a transcription bubble. It then adds nucleotides on to the growing RNA chain; the last 12 nucleotides to be added to the RNA chain base pair with the DNA template, forming a DNA–RNA heteroduplex (Figure 11).

As each nucleotide is added to the growing chain, the transcription bubble and the heteroduplex move with respect to the DNA template. So, as RNA polymerase synthesises RNA, the DNA template is unwound in front of the site of synthesis and rewound once RNA polymerase has passed through. Once RNA polymerase has transcribed the gene, transcription will terminate. For some genes, transcription termination is signalled by a particular sequence within the DNA, a terminator sequence, which RNA polymerase recognises. In some cases, RNA polymerase requires the help of other protein factors to recognise the terminator sequence. Finally, many eukaryotic genes do not contain a specific terminator sequence; instead, termination of transcription is linked to other events, for example cleavage of the RNA prior to addition of the polyA tail. Termination of transcription leads to the dissociation of RNA polymerase from the DNA template and release of the RNA product. In prokaryotes, mRNA does not need processing before it can be translated; in fact, as will be discussed below, mRNA is translated as it is being made. However, the initial transcript in eukaryotes does need to be processed to produce a functional mRNA that can be exported to the cytoplasm for translation.

Control of transcription in prokaryotes

At many genes in prokaryotes, RNA polymerase can bind to the gene and initiate transcription without other protein factors. However, for most prokaryotic genes, the binding of RNA polymerase to the gene is controlled by transcription factors to ensure the correct genes are transcribed at the correct level within the cell. Upstream from the transcription start there will be a ‘promoter’ which contains specific DNA sequences that are recognised by RNA polymerase and transcription factors. Each gene will have a different promoter sequence and can be controlled by different transcription factors. A good example of this type of promoter is the promoter that controls the lac operon in E. coli (Figure 12). Transcription factors that up-regulate transcription are called activators and those that down-regulate transcription are called repressors. In this example, RNA polymerase on its own can bind the promoter and drive low levels of transcription. If the repressor binds, it stops all transcription, overriding both RNA polymerase and the activator. In the absence of the repressor, if the activator is present then it can drive high levels of transcription.

Figure 12. The different binding sites for transcription factors are shown on the DNA: ABS, activator binding site; RBS, repressor binding site. The left-hand panel indicates the presence of lactose and/or glucose in the environment; the right-hand panel indicates transcription levels.

The lac operon contains the genes required to use lactose and needs to be controlled in response to glucose and lactose concentrations. A repressor protein is responsible for responding to lactose concentration and an activator is responsible for responding to glucose (Figure 12).

In the absence of lactose, the lac operon is kept in an off state by the repressor protein binding to the promoter and stopping transcription. If lactose is present in the cell, it binds to the repressor and stops the repressor binding the promoter, so RNA polymerase can bind and drive low levels of transcription. If the cell is starved of glucose, the activator is turned on; it binds the promoter and helps RNA polymerase to initiate transcription, resulting in high rates of transcription.

In the examples above, RNA polymerase on its own drives low levels of transcription. This might not be the case for all promoters: at some promoters RNA polymerase may not be able to bind and drive transcription without an activator protein. At other promoters RNA polymerase on its own will be able to drive high levels of transcription and a repressor protein would be needed to turn off transcription.
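The lac operon behaviour described above amounts to a small truth table, which can be written out as a sketch. The function name `lac_transcription` and the three string labels are assumptions made for this illustration; real regulation is graded rather than a three-state switch.

```python
def lac_transcription(lactose: bool, glucose: bool) -> str:
    """Return 'off', 'low' or 'high' for the simplified lac operon model.

    Illustrative only: the three discrete levels are a simplification
    of the description in the text.
    """
    # The repressor binds the promoter unless lactose is present to bind it.
    repressor_bound = not lactose
    # The activator is turned on only when the cell is starved of glucose.
    activator_bound = not glucose

    if repressor_bound:
        # The repressor overrides both RNA polymerase and the activator.
        return "off"
    if activator_bound:
        return "high"
    # RNA polymerase alone drives low levels of transcription.
    return "low"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose}, glucose={glucose}:",
              lac_transcription(lactose, glucose))
```

Checking the repressor first encodes the rule that a bound repressor stops all transcription regardless of the activator.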

Control of transcription in eukaryotes

Control of transcription in eukaryotes has to occur on a chromosome which is condensed into chromatin (Figure 13). In addition, transcription requires the assembly of a large multiprotein complex at the gene. This complex will contain RNA polymerase and several other general transcription factors (GTFs). The core promoter is a region that overlaps the transcription start and is the binding site for RNA polymerase and the GTFs. In addition, there will be further control sequences, enhancers, that can be just upstream or several thousand base pairs away from the core promoter. In the absence of activator proteins, the chromatin structure will stop RNA polymerase and the GTFs binding to the core promoter. Here histone proteins act as generic repressors of transcription. For transcription to be turned on, activators bind the enhancers and recruit co-activators, which open up the chromatin structure and ensure the core promoter is not blocked by histone proteins. The activators and co-activators then assemble RNA polymerase and the GTFs at the core promoter and drive transcription initiation. Transcription factors also ensure that the chromatin structure across the whole gene is in a conformation that is suitable for transcription.

Figure 13. (A) When a gene is in a silent state, the surrounding DNA will be in condensed chromatin and the histones will carry epigenetic modifications which facilitate gene repression (red spheres). (B) A gene that is being transcribed will have activators bound to enhancer sequences; the activators recruit co-activators that acetylate the histones and add other epigenetic modifications that facilitate gene transcription (green spheres). The activators and co-activators will recruit RNA polymerase and the GTFs to the core promoter.

Repressors are not normally required to block assembly of the transcription complex at the core promoter; however, they are important in the regulatory patterns needed in complex multicellular organisms. Eukaryotes have repressor proteins which can block the action of a specific activator and ensure the activator is only active when required. Repressors can work in a number of ways, including binding to DNA and blocking the binding of the activator to the DNA, stopping the activator interacting with other proteins required for transcription, or binding to the activator and keeping it in the cytoplasm.

Epigenetics

As discussed above, transcription initiation in eukaryotes requires the opening up of the chromatin structure. This is facilitated by co-activator proteins that can move the position of the nucleosomes (Figure 4) relative to the DNA and hence make certain regions of the DNA more accessible. They can also add chemical tags to both the histone proteins and DNA (Figure 13). These epigenetic modifications can affect whether a gene or genomic region is available for transcription or is transcriptionally silenced. Histones are acetylated by enzymes that transfer an acetyl group from acetyl-coenzyme A to lysine residues in the histone protein. This is linked to activation of transcription because it reduces the positive charge on histones and therefore reduces their affinity for the negatively charged DNA. Acetylation can also act as a tag that is recognised by other proteins that drive gene transcription. This modification is described as epigenetic because it affects gene expression rather than the genetic code itself. Conversely, some repressor proteins recruit co-repressors that deacetylate histones, increasing their affinity for DNA, causing the chromatin to become highly condensed and leading to transcriptional silencing. Methylation of lysine residues is another epigenetic tag; a single lysine residue can have one, two or three methyl groups added. Unlike acetylation, methylation of lysine residues does not change the positive charge. The consequences of histone methylation are more complex because, depending on which lysine residue is methylated and the level of methylation, the tag may mark that region of the genome for transcription activation or repression.

DNA methylation is another important epigenetic modification, which leads to transcriptional silencing of the genomic region that has been methylated. During differentiation in the developing embryo, whole regions of the genome will be methylated and therefore transcriptionally silenced. The DNA methylation patterns are maintained during cell division and so are passed on to future generations of that cell.

Analysing transcription on a global scale

For many years individual scientists would study the transcriptional regulation of their ‘favourite’ gene and so we gained an understanding of how individual genes were regulated in response to different developmental or environmental signals, for example the control of the lac operon in response to lactose and glucose. In the last 15 years, many techniques have been developed to allow us to study the transcriptional control of all the genes within a cell. Using techniques such as ‘RNA-Seq’, we can isolate the total RNA from a cell and use high-throughput sequencing to catalogue the level of transcription of all genes. In the case of eukaryotes this will also show how the transcripts have been spliced. It is also possible to analyse the binding of transcription factors and study epigenetic changes within histone proteins across the genome using techniques such as ChIP-Seq. So, by combining techniques such as RNA-Seq and ChIP-Seq we can determine when and where a protein factor is bound to DNA, study epigenetic changes in a particular cell type and see the consequences in terms of gene transcription. In combination these techniques give a detailed picture of the factors that affect transcription; this has been used, for example, to look at differences between cancer cells and normal cells from the same patient.
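To give a flavour of how RNA-Seq read counts are turned into comparable transcription levels, the sketch below computes transcripts per million (TPM), one commonly used normalisation. The article does not specify any particular method, so the choice of TPM, the function name `tpm`, and the gene names, counts and lengths are all assumptions made for illustration.

```python
def tpm(read_counts: dict, gene_lengths_bp: dict) -> dict:
    """Convert raw read counts into transcripts per million (TPM).

    Both arguments map a gene name to a number; the gene names used
    below are hypothetical. Assumes at least one gene has reads.
    """
    # Longer genes collect more reads, so first divide by gene length.
    rates = {gene: read_counts[gene] / gene_lengths_bp[gene]
             for gene in read_counts}
    total = sum(rates.values())
    # Rescale so that the values for one sample add up to one million.
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


counts = {"geneA": 100, "geneB": 300, "geneC": 50}       # made-up read counts
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}   # made-up lengths (bp)
for gene, value in tpm(counts, lengths).items():
    print(gene, round(value))
```

In this made-up sample the three genes have very different raw counts but the same reads per base, so they receive the same TPM value.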

Transcription and disease

Transcription factors and promoters play major roles in health and disease; below are just a few examples.

The transcription factor p53 is a tumour suppressor protein. It guards against cancer, and some human cancers have mutations that knock out p53 function.
The drug Tamoxifen, used in the treatment of breast cancer, binds the oestrogen receptor and inhibits its function. The oestrogen receptor is a transcription factor that turns on the transcription of genes in response to oestrogen.
Rett Syndrome is a neurodevelopmental disorder that affects approximately 1 in 15,000 female births. It is due to mutations in a transcription factor that would normally repress transcription of specific genes; the mutations lead to inappropriate transcription of these genes.
Cocaine use results in changes in the expression of many genes, which can include epigenetic changes within genes involved in cognition and brain function. These epigenetic changes can be inherited and there is evidence that cocaine use by a father can result in epigenetic changes that make male, but not female, offspring cocaine resistant.

Translation of RNA into proteins

The key player in protein synthesis is the ribosome, a complex structure composed of RNA and proteins. The ribosome provides a framework that ensures that the mRNA and tRNA are correctly positioned, enabling the deciphering of the genetic code. There are many other proteins that are important in protein synthesis; some of these are part of the ribosome and some are again correctly positioned by the framework of the ribosome. As we will see, the large subunit ribosomal RNA is a ribozyme: an RNA molecule with catalytic properties similar to those of enzymes. Ribosomal RNA can form a peptide bond between two amino acids.

Transfer RNA

The other nucleic acid needed for protein synthesis is tRNA. The tRNA molecule is single stranded and folds up into a characteristic structure by base pairing (Figure 14). tRNAs act as adaptor molecules: each has an anticodon for a specific mRNA codon and each carries the amino acid specified by that codon. The anticodon has a complementary sequence to the codon on the mRNA.
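The complementarity between codon and anticodon can be illustrated with a short sketch. The helper name `anticodon_for` is an assumption made here, and the code deliberately ignores wobble pairing and the modified bases found in real tRNAs; it returns only the perfectly complementary sequence.

```python
# Base pairing between two RNA strands.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the perfectly complementary anticodon, written 5'->3'.

    Hypothetical helper: wobble pairing and modified bases are ignored.
    """
    # The two strands pair antiparallel, so complement each base and
    # read the result in reverse to keep the 5'->3' convention.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


print(anticodon_for("AUG"))  # CAU
print(anticodon_for("UUC"))  # GAA
```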

Figure 14. (A) Tertiary structure of the phenylalanine tRNA from yeast showing the anticodon (grey) and the acceptor stem (violet) with the nucleotides CCA at the 3′ OH end (yellow). Image modified from ‘TRNA-Phe yeast’, Yikrazuul (licensed under CC BY-SA 3.0). (B) Clover leaf representation of the secondary structure of tRNA.

The enzymes which attach amino acids to tRNAs are called aminoacyl tRNA synthetases; they recognise a specific amino acid and the corresponding tRNA. The reaction also requires ATP and is carried out in two steps:

amino acid + ATP → aminoacyl-AMP + PPi
aminoacyl-AMP + tRNA → aminoacyl-tRNA + AMP

In the first step the enzyme hydrolyses ATP, releasing pyrophosphate (PPi), and in the second it attaches the amino acid to the 3′ hydroxyl of the tRNA. Aminoacyl tRNA synthetase enzymes are highly specific: they recognise specific amino acids and will only attach them to the correct tRNA. This ensures correct coupling of amino acids and tRNA molecules, which is just as important in ensuring the fidelity of protein synthesis as the matching of the anticodon to the codon by the ribosome. In addition, this step is said to activate the aminoacyl tRNA, as it not only produces the correct substrate for the ribosome but also provides much of the energy required for peptide bond formation during protein synthesis.

Structure of the ribosome

All living things contain ribosomes. The ribosomes in bacteria are slightly smaller than those found in eukaryotic cells (Table 2) but the overall structure and the way in which they work are essentially the same. The 2009 Nobel Prize for Chemistry was awarded to three scientists, Ada Yonath, Thomas Steitz and Venkatraman Ramakrishnan, who used X-ray crystallography to solve the three-dimensional structure of the bacterial ribosome. The ribosome is composed of two subunits: the small subunit, which reads the messenger RNA, and the large subunit, which forms the bonds between amino acids, adding them to the growing polypeptide chain. There are three important binding sites for tRNAs in the ribosome, which lie at the interface between the two subunits and are only formed when the two subunits come together. These sites, shown in Figure 15, are referred to as the acceptor or aminoacyl (A) site, the peptidyl (P) site, where the peptide bond between amino acids is formed, and the exit (E) site, from which spent tRNAs leave the ribosome.

Figure 15. In (A) the new tRNA is delivered to the ribosome by elongation factor EF-Tu (purple). In (B) the amino acid on the incoming tRNA is brought close to the amino acid on the tRNA in the peptidyl site to facilitate peptide bond formation (bright green). Adapted from Goodsell 2010, licensed under CC-BY-4.0.

In addition to the ribosome, the mRNA and tRNA, there are a number of small proteins that are not part of the structure of the ribosome but are required for protein synthesis: initiation factors, elongation factors and termination factors. The importance of these factors is illustrated by the inherited condition Vanishing White Matter Disease (VWM). This serious neurodegenerative disease, which results in lesions in the white matter of the brain, is due to mutations in one of the initiation factors.

Protein synthesis

During protein synthesis the ribosome brings together the amino acid charged tRNA and the mRNA; the codon and anticodon are matched and the amino acids are joined together in the correct sequence. There are three phases to this process: initiation, where the ribosome assembles on the mRNA; elongation, where the triplet code is read and amino acids are added to the growing peptide chain; and termination, where protein synthesis stops.

A complex of proteins called the cap-binding complex binds to the 5′ cap of the mRNA (Figure 10) in the nucleus. The mRNA is then exported to the cytoplasm, where it recruits initiation factors, a tRNA charged with methionine and the small (40S) ribosomal subunit. The small subunit scans along the 5′ untranslated region of the mRNA until it encounters the first AUG start codon (Figure 16A). This is recognised by the anticodon of the initiator tRNA, and the large subunit then docks to give the translation complex. The 80S ribosome, with the methionine-charged tRNA at the P site, is now ready to accept the next tRNA (Figure 16B).
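The scanning step can be pictured as a search from the 5′ end of the mRNA for the first AUG. The sketch below is a deliberately simplified illustration: the function name `scan_for_start` and the example sequence are assumptions, and it ignores the initiation factors and any influence of the sequence surrounding the AUG.

```python
def scan_for_start(mrna: str):
    """Scan from the 5' end and split the mRNA at the first AUG.

    Returns (five_prime_utr, rest_starting_at_AUG), or None if the
    sequence contains no AUG. Hypothetical helper for illustration.
    """
    mrna = mrna.upper()
    position = mrna.find("AUG")  # first AUG encountered from the 5' end
    if position == -1:
        return None
    return mrna[:position], mrna[position:]


utr, coding = scan_for_start("GGCACCAUGGCUUUCUAAGG")  # made-up sequence
print("5' UTR:", utr)            # GGCACC
print("from start codon:", coding)
```

The position of this first AUG also fixes the reading frame used for the rest of the message.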

Figure 16. (A) During initiation, the mRNA recruits a tRNA charged with a methionine and the small ribosomal subunit, (B) the large subunit then docks to give the translation complex, (C) a tRNA with an amino acid attached enters the A site, (D) the peptide bond is formed between the amino acid in the P site and the one in the A site. The effect is that the growing peptide chain is transferred to the incoming aminoacyl tRNA in the A site, leaving an empty tRNA in the P site. (E) Finally, everything moves along the mRNA by one codon in a process called translocation, so the peptidyl tRNA with the growing peptide chain attached moves to the P site and the spent tRNA to the E site, from where it leaves the ribosome. (F) When a stop codon is in the A site, a termination or release factor enters the A site, (G) the peptide is released from the ribosome and (H) the two subunits of the ribosome disassociate and are recycled.

With initiation complete, the mRNA is in the correct reading frame with the A site empty and the next codon exposed. In the elongation phase an aminoacyl tRNA, one charged with an amino acid, is brought to the ribosome in a complex with an elongation factor and enters the A site. If the anticodon it carries is complementary to the exposed codon, it is correctly positioned in the acceptor site and GTP is hydrolysed on the elongation factor (Figure 16C). A peptide bond (Figure 17) is then formed between the C terminus of the amino acid in the P site and the N terminus of the amino acid in the A site; this reaction is catalysed in the peptidyl transfer centre of the large subunit of the ribosome. The effect is that the growing peptide chain is transferred to the incoming aminoacyl tRNA in the A site, leaving an empty or spent tRNA in the P site (Figure 16D). Finally, the peptidyl tRNA with the growing peptide chain attached moves to the P site. This step is called translocation, with the energy being provided by hydrolysis of GTP by the elongation factor EF-G. The spent tRNA moves to the exit site, from where it can leave the ribosome. The mRNA moves so that the next codon is exposed in the A site (Figure 16E), ready to accept a new aminoacyl-tRNA charged with another amino acid. During the elongation phase the ribosome cycles through this process, adding amino acids to the growing peptide chain until a stop codon is exposed in the A site. The new protein emerges from the ribosome through an exit tunnel in the large subunit.

Figure 17. (A) Amino acids consist of a carbon atom with an amine group (the N terminus), a carboxylic acid group (the C terminus) and a variable R group. One of the simplest R groups is a methyl group, giving the amino acid alanine. (B) When two amino acids are joined together, a peptide bond is formed between the N terminus of one amino acid and the C terminus of another. This is a condensation reaction releasing one molecule of water.

Termination

The stop codon is not decoded by being recognised by an anticodon on a tRNA. Instead it is detected by proteins called termination or release factors. In eukaryotes, a single release factor (eRF1), which recognises all three stop codons, enters the A site (Figure 16F). The ester bond linking the peptide chain to the tRNA in the P site is broken and the peptide is released from the ribosome (Figure 16G). The two subunits of the ribosome disassociate and are recycled (Figure 16H).
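The three phases described above (start at the first AUG, read one codon at a time, stop at a stop codon) can be put together in a toy translation routine. This is a sketch under stated assumptions: it uses the standard genetic code written as a compact 64-letter string, the function name `translate` and the example mRNA are made up for illustration, and nothing about ribosomes, tRNAs or factors is modelled.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order U, C, A, G
# (first base varying slowest); '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes; hypothetical helper for illustration.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")           # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # termination: stop codon, no tRNA pairs
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGCACCAUGGCUUUCUAAGG"))  # MAF
```

The stop codon has no amino acid in the table, mirroring the fact that it is recognised by a release factor and not by a tRNA.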

The structure and function of ribosomes are highly conserved, with a large core of structurally conserved proteins and rRNAs found in both eukaryotic and prokaryotic ribosomes. However, there are some differences both in the rRNAs and in some of the additional proteins involved in translation (Table 2). The elongation phase is highly conserved but there are important differences in how protein synthesis is initiated. Bacterial mRNAs have a specific sequence called the ribosome binding site or Shine–Dalgarno sequence. To ensure that the mRNA is correctly positioned in the ribosome, the Shine–Dalgarno sequence binds to a complementary sequence of the 16S rRNA in the small subunit. In bacteria the initiator tRNA is charged with a modified amino acid, N-formylmethionine.

Differences between the structure of bacterial and eukaryotic ribosomes can be exploited by antibiotics, which are selective in that they affect protein synthesis in bacteria but not in mammalian cells. Macrolide antibiotics such as erythromycin block the exit tunnel in the large subunit of bacterial ribosomes and halt protein synthesis. The exit tunnel in eukaryotic ribosomes is slightly narrower, which means that eukaryotic ribosomes are not affected. Streptomycin, an important antibiotic in the treatment of tuberculosis, binds to the 16S rRNA of bacterial ribosomes. This distorts the structure of the decoding site and results in misreading of the mRNA.

Polyribosome

Protein synthesis can proceed very quickly, particularly in rapidly growing cells or those that are differentiating. In bacteria, between 15 and 20 new peptide bonds can be formed per second. In eukaryotes it is slower, more like five peptide bonds per second. A small human protein like insulin would take only 10 seconds to make, whereas the largest human protein, titin, which is found in human muscle cells, takes about an hour and a half per molecule. One of the mechanisms that ensures that protein synthesis is carried out efficiently is the polyribosome. As soon as one ribosome has started translation, another ribosome binds to initiate synthesis of another protein copy. This gives rise to polyribosomes or polysomes, which can be seen by electron microscopy. Recent cryo-EM images show that ribosomes can be arranged very closely on the mRNA with the mRNA entry and exit channels aligned to allow the smooth passage of mRNA between them (Figure 18). Sometimes these polyribosomes can form circular structures so that, as soon as the ribosome has finished synthesis of one polypeptide, it can rebind the same mRNA molecule and start synthesis of another copy of the protein.
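The timings quoted above follow from simple arithmetic: the number of peptide bonds divided by the rate of bond formation. The sketch below makes that explicit. The function name `synthesis_time_s` is an assumption, and the chain lengths used (51 residues for insulin and 27,000 for titin) are illustrative values chosen to be consistent with the times quoted, not figures taken from this article.

```python
def synthesis_time_s(n_residues: int, bonds_per_second: float) -> float:
    """Seconds needed to join n_residues amino acids at a constant rate.

    Hypothetical helper: ignores initiation, termination and pausing.
    """
    # A chain of n residues contains n - 1 peptide bonds.
    return (n_residues - 1) / bonds_per_second


EUKARYOTIC_RATE = 5  # peptide bonds per second, as quoted in the text

# Assumed chain lengths, for illustration only.
insulin_s = synthesis_time_s(51, EUKARYOTIC_RATE)
titin_s = synthesis_time_s(27_000, EUKARYOTIC_RATE)

print(f"insulin: {insulin_s:.0f} s")
print(f"titin:   {titin_s / 3600:.1f} h")
```

With several ribosomes on one mRNA, each chain still takes this long, but finished copies are released more often, which is the efficiency gain a polyribosome provides.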

Figure 18. Cryo-electron micrograph reconstruction of a eukaryotic polyribosome. Reprinted from Myasnikov 2014 by permission.

Closing remarks

The study of nucleic acids, from their first identification as the genetic material, is littered with landmarks in molecular biosciences, many of them marked with Nobel Prizes. Since Watson and Crick proposed their structure of DNA, our knowledge about DNA and how it works has expanded almost exponentially. The topics introduced in this article are covered in all bioscience programmes; understanding them is key to all areas of biosciences, from evolution and animal diversity to health and disease. Recent developments in the techniques that we can use to study DNA, often in living cells, mean that new and exciting advances in our understanding of the way nucleic acids work are occurring all the time. Given the scope of this article we have barely scratched the surface of the topic; however, the reader can find more detail in the articles in the bibliography below, and even more from a few minutes searching on the internet.

Competing interests.

The authors declare that there are no competing interests associated with the manuscript.

Review articles

Afonina Z.A. and Shirokov V.A. (2018) Three dimensional organization of polyribosomes–a modern approach. Biochemistry (Moscow) 83, S48–S55 10.1134/S0006297918140055
Gerstein M.B., Bruce C., Rozowsky J.S., Zheng D., Du J., Korbel J.O. et al. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17, 669–681 10.1101/gr.6339607
Kruglyak L. and Stern D.L. (2007) An embarrassment of switches. Science 317, 758–759 10.1126/science.1146921
Minchin S.D. and Busby S.J.W. (2013) Transcription factors. In Brenner’s Encyclopedia of Genetics (Maloy S. and Hughes K., eds), Elsevier, U.S.A.
Roberts M. (2019) Recombinant DNA technology and DNA sequencing. Essays Biochem. 63, 10.1042/EBC20180039
Roeder R.G. (2003) The eukaryotic transcriptional machinery: complexities and mechanisms unforeseen. Nat. Med. 9, 1239–1244 10.1038/nm938

Historical perspectives

Dahm R. (2008) Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Hum. Genet. 122, 565–581 10.1007/s00439-007-0433-0
McCarty M. (2003) Discovering genes are made of DNA. Nature 421, 406 10.1038/nature01398
Maddox B. (2003) The double helix and the “wronged heroine”. Nature 421, 407–408
Kemp M. (2003) The Mona Lisa of modern science. Nature 421, 416–420 10.1038/nature01403

Original research papers

Crick F.H.C., Barnett L., Brenner S. and Watts-Tobin R.J. (1961) General nature of the genetic code for proteins. Nature 192, 1227–1232 10.1038/1921227a0
Franklin R.E. and Gosling R.G. (1953) Molecular configuration in sodium thymonucleate. Nature 171, 740–741 10.1038/171740a0
Meselson M. and Stahl F.W. (1958) The replication of DNA in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 44, 671–682 10.1073/pnas.44.7.671
Watson J. and Crick F. (1953) Molecular structure of nucleic acid. A structure for deoxyribose nucleic acid. Nature 171, 737–738 10.1038/171737a0

Citations for figures

Goodsell D. (2010) Molecule of the month: ribosome. https://pdb101.rcsb.org/motm/121
Myasnikov A.G. (2014) The molecular structure of the left-handed supra-molecular helix of eukaryotic polyribosomes. Nat. Commun. 5, 5294 10.1038/ncomms6294
Yikrazuul X.X. (2010) tRNA-Phe yeast. https://commons.wikimedia.org/wiki/File:TRNA-Phe_yeast_1ehz.png

May 14, 2024

Revolutionary Genetics Research Shows RNA May Rule Our Genome

Scientists have recently discovered thousands of active RNA molecules that can control the human body

By Philip Ball

Thomas Gingeras did not intend to upend basic ideas about how the human body works. In 2012 the geneticist, now at Cold Spring Harbor Laboratory in New York State, was one of a few hundred colleagues who were simply trying to put together a compendium of human DNA functions. Their project was called ENCODE, for the Encyclopedia of DNA Elements. About a decade earlier almost all of the three billion DNA building blocks that make up the human genome had been identified. Gingeras and the other ENCODE scientists were trying to figure out what all that DNA did.

The assumption made by most biologists at that time was that most of it didn’t do much. The early genome mappers estimated that perhaps 1 to 2 percent of our DNA consisted of genes as classically defined: stretches of the genome that coded for proteins, the workhorses of the human body that carry oxygen to different organs, build heart muscles and brain cells, and do just about everything else people need to stay alive. Making proteins was thought to be the genome’s primary job. Genes do this by putting manufacturing instructions into messenger molecules called mRNAs, which in turn travel to a cell’s protein-making machinery. As for the rest of the genome’s DNA? The “protein-coding regions,” Gingeras says, were supposedly “surrounded by oceans of biologically functionless sequences.” In other words, it was mostly junk DNA.

So it came as rather a shock when, in several 2012 papers in Nature, he and the rest of the ENCODE team reported that at one time or another, at least 75 percent of the genome gets transcribed into RNAs. The ENCODE work, using techniques that could map RNA activity happening along genome sections, had begun in 2003 and came up with preliminary results in 2007. But not until five years later did the extent of all this transcription become clear. If only 1 to 2 percent of this RNA was encoding proteins, what was the rest for? Some of it, scientists knew, carried out crucial tasks such as turning genes on or off; a lot of the other functions had yet to be pinned down. Still, no one had imagined that three quarters of our DNA turns into RNA, let alone that so much of it could do anything useful.

Some biologists greeted this announcement with skepticism bordering on outrage. The ENCODE team was accused of hyping its findings; some critics argued that most of this RNA was made accidentally because the RNA-making enzyme that travels along the genome is rather indiscriminate about which bits of DNA it reads.

Now it looks like ENCODE was basically right. Dozens of other research groups, scoping out activity along the human genome, also have found that much of our DNA is churning out “noncoding” RNA. It doesn’t encode proteins, as mRNA does, but engages with other molecules to conduct some biochemical task. By 2020 the ENCODE project said it had identified around 37,600 noncoding genes, that is, DNA stretches with instructions for RNA molecules that do not code for proteins. That is almost twice as many as there are protein-coding genes. Other tallies vary widely, from around 18,000 to close to 96,000. There are still doubters, but there are also enthusiastic biologists such as Jeanne Lawrence and Lisa Hall of the University of Massachusetts Chan Medical School. In a 2024 commentary for the journal Science, the duo described these findings as part of an “RNA revolution.”

What makes these discoveries revolutionary is what all this noncoding RNA—abbreviated as ncRNA—does. Much of it indeed seems involved in gene regulation: not simply turning them off or on but also fine-tuning their activity. So although some genes hold the blueprint for proteins, ncRNA can control the activity of those genes and thus ultimately determine whether their proteins are made. This is a far cry from the basic narrative of biology that has held sway since the discovery of the DNA double helix some 70 years ago, which was all about DNA leading to proteins. “It appears that we may have fundamentally misunderstood the nature of genetic programming,” wrote molecular biologists Kevin Morris of Queensland University of Technology and John Mattick of the University of New South Wales in Australia in a 2014 article.

Another important discovery is that some ncRNAs appear to play a role in disease, for example, by regulating the cell processes involved in some forms of cancer. So researchers are investigating whether it is possible to develop drugs that target such ncRNAs or, conversely, to use ncRNAs themselves as drugs. If a gene codes for a protein that helps a cancer cell grow, for example, an ncRNA that shuts down the gene might help treat the cancer.

A few noncoding RNAs had been known for many decades, but those seemed to have some role in protein manufacture. For instance, only a few years after Francis Crick, James Watson and several of their colleagues deduced the structure of DNA, researchers found that some RNA, called transfer RNA, grabs onto amino acids that eventually get strung together into proteins.

In the 1990s, however, scientists realized ncRNA could do things quite unrelated to protein construction. These new roles came to light from efforts to understand the process of X-inactivation, wherein one of the two X chromosomes carried by females is silenced, all 1,000 or so of its genes (in humans) being turned off. This process seemed to be controlled by a gene called XIST. But attempts to find the corresponding XIST protein consistently failed.

The reason, it turned out, was that the gene did not work through a protein but instead did so by producing a long noncoding (lnc) RNA molecule. Such RNAs are typically longer than about 200 nucleotides, which are the chemical building blocks of DNA and RNA. Using a microscopy technique called fluorescence in situ hybridization, Lawrence and her colleagues showed that this RNA wraps itself around one X chromosome (selected at random in each cell) to induce persistent changes that silence the genes. “This was the first evidence of a lncRNA that does something,” Lawrence says, “and it was totally surprising.”

XIST isn’t that unusual in generating an ncRNA, though. In the early 2000s it became clear that transcription of noncoding DNA sequences is widespread. For example, in 2002 a team at biotech company Affymetrix in Santa Clara, Calif., led by Gingeras, who was working there at the time, reported that much more of human chromosomes 21 and 22 gets transcribed than just the protein-coding regions.

It was only after ENCODE published its results in 2012, however, that ncRNA became impossible to ignore. Part of the antipathy toward those findings, says Peter Stadler, a bioinformatics expert at Leipzig University in Germany, is that they seemed like an unwanted and unneeded complication. “The biological community figured we already knew how the cell works, and so the discovery of [ncRNAs] was more of an annoyance,” he says. What’s more, it showed that simpler organisms were not always a reliable guide to human biology: there is far less ncRNA in bacteria, studies of which had long shaped thinking about how genes are regulated.

But now there is no turning back the tide: many thousands of human lncRNAs have been reported, and Mattick suspects the real number is greater than 500,000. Yet only a few of these have been shown to have specific functions, and how many of them really do remains an open question. “I personally don’t think all of those RNAs have an individual role,” Lawrence says. Some, though, may act in groups to regulate other molecules.

How lncRNAs perform such regulation is also still a matter of debate. One idea is that they help to form so-called condensates: dense fluid blobs containing a range of different regulatory molecules. Condensates are thought to hold all the relevant players in one place long enough for them to do their job collectively. Another idea is that lncRNAs affect the structure of chromatin—the combination of DNA and proteins that makes up chromosome fibers in the cell nucleus. How chromatin is structured determines which of its genes are accessible and can be transcribed; if parts of chromatin are too tightly packed, the enzyme machinery of transcription can’t reach it. “Some lncRNAs appear to be involved with chromatin-modifying complexes,” says Marcel Dinger, a genomics researcher at the University of Sydney.

Lawrence and Hall suspect that lncRNAs could supply scaffolds for organizing other molecules, for example, by holding some of the many hundreds of RNA-binding proteins in functional assemblies. One lncRNA called NEAT1, which is involved in the formation of small compartments in the nucleus called paraspeckles, has been shown capable of binding up to 60 of these proteins. Or such RNA scaffolding could arrange chromatin itself into particular structures and thereby affect gene regulation. Such RNA scaffolding could have regularly repeating modules and thus repetitive sequences, a feature that has long been regarded as a hallmark of junk DNA but lately is appearing to be not so junky after all. This view of lncRNA as scaffolding is supported by a 2024 report of repeat-rich ncRNAs in mouse brain cells that persist for at least two years. The research, by Sara Zocher of the German Center for Neurodegenerative Diseases in Dresden and her co-workers, found these ncRNAs seem to be needed to keep parts of chromatin in a compact and silent state.

These lncRNAs are just one branch of the noncoding RNA family, and biologists keep discovering others that appear to have different functions and different ways of affecting what happens to a cell, and thus the entire human body.

Some of these RNAs are not long at all but surprisingly short. Their story began in the 1980s, when Victor Ambros, working as a postdoctoral researcher in the laboratory of biologist Robert Horvitz at the Massachusetts Institute of Technology, was studying a gene denoted lin-4 in the worm Caenorhabditis elegans. Mutations of lin-4 caused developmental defects in which “the cells repeated whole developmental programs that they should have transitioned beyond,” says Ambros, now at the University of Massachusetts Medical School. It seemed that lin-4 might be a kind of “master regulator” controlling the timing of different stages of development.

Graphic presents 2 views of how DNA works. The traditional view is unidirectional: DNA to RNA to protein. The emerging view includes ncRNA, which may double back to regulate DNA transcription.

Jen Christiansen; Source: John Mattick, UNSW Sydney (consultant)

“We thought lin-4 would be a protein-coding gene,” Ambros says. To figure out what role this putative protein plays, Ambros and his colleagues cloned the C. elegans gene and looked at its product—and found that the effects of the gene may not be mediated by any protein but by the gene’s RNA product alone. This molecule looked ridiculously short: just 22 nucleotides long, a mere scrap of a molecule for such big developmental effects.

This was the first known microRNA (miRNA). At first “we thought this might be a peculiar characteristic of C. elegans,” Ambros says. But in 2000 Gary Ruvkun, another former postdoc in the Horvitz lab, and his co-workers found that another of these miRNA genes in C. elegans, called let-7, appears in essentially identical form in many other organisms, including vertebrates, mollusks and insects. This implies that it is a very ancient gene and “must have been around for 600 million to 700 million years” before these diverse lineages went their separate ways, Ambros says. If miRNAs are so ancient, “there had to be others out there.”

Indeed, there are. Today more than 2,000 miRNAs have been identified in the human genome, generally with regulatory roles. One of the main ways miRNAs work is by interfering with the translation of a gene’s mRNA transcript into its corresponding protein. Typically the miRNA comes from a longer molecule, perhaps around 70 nucleotides long, known as pre-miRNA. This molecule is seized by an enzyme called Dicer, which chops it into smaller fragments. These pieces, now miRNAs, move to a class of proteins called Argonautes, components of a protein assembly called the RNA-induced silencing complex (RISC). The miRNAs guide the RISC to an mRNA, and this either stops the mRNA from being translated into a protein or leads to its degradation, which has the same effect. This regulatory action of miRNAs guides processes ranging from the determination of cell “fate” (the specialized cell types they become) to cell death and management of the cell cycle.

Key insights into how such small RNAs can regulate other RNA emerged from studies in C. elegans in 1998 by molecular biologists Andrew Fire, Craig Mello and their co-workers, for which Fire and Mello were awarded the 2006 Nobel Prize in Physiology or Medicine. They learned that RISC is guided by slightly different RNA strands named small interfering (si) RNA. The process ends with the mRNA being snipped in half, a process called RNA interference.

MiRNAs do pose a puzzle, however. A given miRNA typically has a sequence that matches up with lots of mRNAs. How, then, is there any selectivity about which genes they silence? One possibility is that miRNAs work in gangs, with several miRNAs joining forces to regulate a given gene. The different combinations, rather than individual snippets, are what match specific genes and their miRNAs.
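The selectivity puzzle can be made concrete with a small sketch. It assumes, as a simplification that the article itself does not spell out, that targeting is driven mainly by a short "seed" at positions 2 to 8 of the miRNA, so a match needs only seven complementary bases. All sequences and names below are invented for illustration and are not real miRNAs or transcripts.

```python
# Toy illustration only: every sequence here is made up.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would pair antiparallel with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna):
    """mRNA motif complementary to miRNA positions 2-8 (the assumed seed)."""
    return reverse_complement(mirna[1:8])


def find_sites(mirna, mrna):
    """0-based start positions in `mrna` where the seed site occurs."""
    site = seed_site(mirna)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna.startswith(site, i)]


# Hypothetical 22-nucleotide miRNA and three hypothetical mRNA fragments.
mirna = "UACGGAUCCAGUUGACCUAGCA"
mrnas = {
    "mRNA_a": "AUGGCCGAUCCGUAAUAA",
    "mRNA_b": "GAUCCGUCCCAAAGAUCCGU",
    "mRNA_c": "AUGCCCGGGAAAUUU",
}

hits = {name: find_sites(mirna, seq) for name, seq in mrnas.items()}
for name, positions in hits.items():
    print(name, positions)
```

Because the matched motif is only seven bases long, two of the three unrelated toy transcripts carry a site, which is the lack of selectivity the paragraph above describes. A combinatorial model would require sites for several different miRNAs in the same transcript before calling it a target.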

Why would miRNA gene regulation work in this complicated way? Ambros suspects it might allow for “evolutionary fluidity”: the many ways in which different miRNAs can work together, and the number of possible targets each of them can have, offer a lot of flexibility in how genes are regulated and thus in what traits might result. That gives an organism many evolutionary options, so that it is more able to adapt to changing circumstances.

One class of small RNAs regulates gene expression by directly interfering with transcription in the cell nucleus, triggering mRNA degradation. These PIWI-interacting (pi) RNAs work in conjunction with a class of proteins called PIWI Argonautes. PiRNAs operate in germline cells (gametes), where they combat “selfish” DNA sequences called transposons or “jumping genes”: sequences that can insert copies of themselves throughout the genome in a disruptive way. Thus, piRNAs are “a part of the genome’s immune system,” says Julius Brennecke of the Institute of Molecular Biotechnology of the Austrian Academy of Sciences. If the piRNA system is artificially shut down, “the gametes’ genomes are completely shredded, and the organism is completely sterile,” he says.

Still other types of ncRNAs, called small nucleolar RNAs, work within cell compartments called nucleoli to help modify the RNA in ribosomes—a cell’s protein-making factories—as well as transfer RNA and mRNA. These are all ways to regulate gene expression. Then there are circular RNAs: mRNA molecules (particularly in neurons) that get stitched into a circular form before they are moved beyond the nucleus into the cytoplasm. It’s not clear how many circular RNAs are important—some might just be transcriptional “noise”—but there is some evidence that at least some of them have regulatory functions.

In addition, there are vault RNAs that help to transport other molecules within and between cells, “small Cajal-body-specific RNAs” that modify other ncRNAs involved in RNA processing, and more. The proliferation of ncRNA varieties lends strength to Mattick’s claim that RNA, not DNA, is “the computational engine of the cell.”

If ncRNAs indeed power the way a cell processes genetic information, it is possible they can be used in medicine. Disease is often the result of a cell doing the wrong thing because it gets the wrong regulatory instructions: cells that lose proper control of their cycle of growth and division can become tumors, for example. Currently medical efforts to target ncRNAs and alter their regulatory effects often use RNA strings called antisense oligonucleotides (ASOs). These strands of nucleic acid have sequences that are complementary to the target RNA, so they will pair up with and disable it. ASOs have been around since the late 1970s. But it has been hard to make them clinically useful because they get degraded quickly in cells and have a tendency to bind to the wrong targets, with potentially drastic consequences.
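As a rough sketch of what "complementary to the target RNA" means in practice, the snippet below derives an antisense strand for a target and counts mismatches against a near-identical site. The sequences and the function names `design_aso` and `mismatches` are hypothetical, and real ASO design involves chemistry and specificity screening that this toy model ignores.

```python
# Toy illustration only: sequences are invented, and only simple
# Watson-Crick pairing (A-U, G-C) is modeled.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_aso(target):
    """Antisense strand: the reverse complement of the target RNA."""
    return "".join(PAIR[base] for base in reversed(target))


def mismatches(aso, site):
    """Count positions where `aso` fails to pair with an equally long site.

    The two strands pair antiparallel, so the ASO is compared against
    the site read in reverse.
    """
    if len(aso) != len(site):
        raise ValueError("ASO and site must have the same length")
    return sum(PAIR[a] != s for a, s in zip(aso, reversed(site)))


target = "GGCAUUCGAUCCAGUA"      # hypothetical ncRNA segment
off_target = "GGCAUUCGAUCCAGCA"  # hypothetical site differing by one base

aso = design_aso(target)
print(aso)                           # UACUGGAUCGAAUGCC
print(mismatches(aso, target))       # 0
print(mismatches(aso, off_target))   # 1
```

A site that differs from the intended target by a single base still pairs at 15 of 16 positions here, which gives a feel for why binding to the wrong targets is a practical concern.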

Some ASOs, however, are being developed to disable lncRNAs that are associated with cancers such as lung cancer and acute myeloid leukemia. Other lncRNAs might act as drugs themselves. One known as MEG3 has been found, preliminarily, to act as a tumor suppressor. Small synthetic molecules, which are easier than ASOs to fine-tune and deliver into the body as pharmaceuticals, are also being explored for binding to lncRNAs or otherwise inhibiting their interactions with proteins. Getting these approaches to work, however, has not been easy. “As far as I am aware, no lncRNA target or therapeutic has entered clinical development,” Gingeras says.

Targeting the smaller regulatory RNAs such as miRNAs might prove more clinically amenable. Because miRNAs typically hit many targets, they can do many things at once. For example, miRNAs in families denoted miR-15a and miR-16-1 act as tumor suppressors by targeting several genes that themselves suppress cell death (apoptosis, a defense against cancer) and are being explored for cancer therapies.

Yet a problem with using small RNAs as drugs is that they elicit an immune response. Precisely because the immune system aims to protect against viral RNA, it usually recognizes and attacks any “nonself” RNA. One strategy for protecting therapeutic RNA from immune assault and degradation is to chemically modify its backbone so that it forms a nonnatural “locked” ring structure that the degrading enzymes can’t easily recognize.

Some short ASOs that target RNAs are already approved for clinical use, such as the drugs inotersen to treat amyloidosis and golodirsen for Duchenne muscular dystrophy. Researchers are also exploring antisense RNAs fewer than 21 nucleotides long that target natural regulatory miRNAs because it is only beyond that length that an RNA tends to trigger an immune reaction.

These are early days for RNA-based medicine, precisely because the significance of ncRNA itself in human biology is still relatively new and imperfectly understood. The more we appreciate its pervasive nature, the more we can expect to see RNA being used to control and improve our well-being. Nils Walter of the Center for RNA Biomedicine at the University of Michigan wrote in an article early in 2024 that the burgeoning promise of RNA therapeutics “only makes the need for deciphering ncRNA function more urgent.” Succeeding in this goal, he adds, “would finally fulfill the promise of the Human Genome Project.”

Despite this potential of noncoding RNA in medicine, the debate continues about how much of it truly matters for our cells. Geneticists Chris Ponting of the University of Edinburgh and Wilfried Haerty of the Earlham Institute in Norwich, England, are among the skeptics. In a 2022 article they argued that most lncRNAs are just “transcriptional noise,” accidentally transcribed from random bits of DNA. “Relatively few human lncRNAs ... contribute centrally to human development, physiology, or behavior,” they wrote.

Brennecke advises caution about current high estimates of the number of noncoding genes. Although he agrees that such genes “have been underappreciated for a long time,” he says we should not leap to assuming that all lncRNAs have functions. Many of them are transcribed only at low levels, which is what one would expect if indeed they were just random noise. Geneticist Adrian Bird of the University of Edinburgh points out that the abundance of the vast majority of ncRNAs seems to be well below one molecule per cell. “It is difficult to see how essential functions can be exerted by an ncRNA if it is absent in most cells,” he says.

But Gingeras counters that this low expression rate might reflect the very tissue-specific roles of ncRNAs. Some, he says, are expressed more in one part of a tissue than in another, suggesting that expression levels in each cell are sensitive to signals coming from surrounding tissues. Lawrence points out that, despite the low expression levels, there are often shared patterns of expression across cells of a particular type, making it harder to argue that the transcription is simply random. And Hall doubts that cells are really so prone to “bad housekeeping” that they will habitually churn out lots of useless RNA. Lawrence and Hall’s suggestion that some lncRNAs have collective effects on chromatin structure would mean that no individual one of them is needed at high expression levels and that their precise sequence doesn’t matter too much.

That lack of specificity in sequence and binding targets, Dinger says, means that a mutation of a nucleotide in an ncRNA typically won’t have the same negative impact on its function as it tends to in a protein-coding DNA sequence. So it would not be surprising to see quite a lot of sequence variation. Dinger argues that it makes more sense to assume that “genetically encoded molecules are potentially functional until shown otherwise, rather than junk unless proven functional.” Some in the ENCODE team now agree that not all of the 75 percent or so of human genome transcription might be functionally significant. But many researchers make the point that surely many more of the noncoding molecules do meaningful things than was suspected before.

Demonstrating functional roles for lncRNAs is often tricky. In part, Gingeras says, this may be because lncRNA might not be the biochemically active molecule in a given process: it might be snipped up into short RNAs that actually do the work. But because long and short RNAs tend to be characterized via different techniques, researchers may end up searching for the wrong thing. What’s more, long RNAs are often cut up into fragments and then spliced back together again in various combinations, the exact order often depending on the condition of the host cell.

At its roots, the controversy over noncoding RNA is partly about what qualifies a molecule as “functional.” Should the criterion be based on whether the sequence is maintained between different species? Or whether deleting the molecule from an organism’s repertoire leads to some observable change in a trait? Or simply whether it can be shown to be involved in some biochemical process in the cell? If repetitive RNA acts collectively as a chromosome “scaffold” or if miRNAs act in a kind of regulatory swarm, can any individual one of them really be considered to have a “function”?

Gingeras says he is perplexed by ongoing claims that ncRNAs are merely noise or junk, as evidence is mounting that they do many things. “It is puzzling why there is such an effort to persuade colleagues to move from a sense of interest and curiosity in the ncRNA field to a more dubious and critical one,” he says.

Perhaps the arguments are so intense because they undercut the way we think our biology works. Ever since the epochal discovery about DNA’s double helix and how it encodes information, the bedrock idea of molecular biology has been that there are precisely encoded instructions that program specific molecules for particular tasks. But ncRNAs seem to point to a fuzzier, more collective, logic to life. It is a logic that is harder to discern and harder to understand. But if scientists can learn to live with the fuzziness, this view of life may turn out to be more complete.

Philip Ball is a science writer and former Nature editor based in London. His most recent book is How Life Works (University of Chicago Press, 2023).

Scientific American Magazine Vol 330 Issue 6

Will Knight

Google DeepMind’s Groundbreaking AI for Protein Structure Can Now Model DNA

Google spent much of the past year hustling to build its Gemini chatbot to counter ChatGPT, pitching it as a multifunctional AI assistant that can help with work tasks or the digital chores of personal life. More quietly, the company has been working to enhance a more specialized artificial intelligence tool that is already a must-have for some scientists.

AlphaFold, software developed by Google’s DeepMind AI unit to predict the 3D structure of proteins, has received a significant upgrade. It can now model other molecules of biological importance, including DNA, and the interactions between antibodies produced by the immune system and the molecules of disease organisms. DeepMind added those new capabilities to AlphaFold 3 in part through borrowing techniques from AI image generators.

“This is a big advance for us,” Demis Hassabis, CEO of Google DeepMind, told WIRED ahead of Wednesday’s publication of a paper on AlphaFold 3 in the science journal Nature. “This is exactly what you need for drug discovery: You need to see how a small molecule is going to bind to a drug, how strongly, and also what else it might bind to.”

AlphaFold 3 can model large molecules such as DNA and RNA, which carry genetic code, but also much smaller entities, including metal ions. It can predict with high accuracy how these different molecules will interact with one another, Google’s research paper claims.

The software was developed by Google DeepMind and Isomorphic Labs, a sibling company under parent Alphabet working on AI for biotech that is also led by Hassabis. In January, Isomorphic Labs announced that it would work with Eli Lilly and Novartis on drug development.

AlphaFold 3 will be made available via the cloud for outside researchers to access for free, but DeepMind is not releasing the software as open source the way it did for earlier versions of AlphaFold. John Jumper, who leads the Google DeepMind team working on the software, says it could help provide a deeper understanding of how proteins interact and work with DNA inside the body. “How do proteins respond to DNA damage; how do they find, repair it?” Jumper says. “We can start to answer these questions.”

Understanding protein structures used to require painstaking work using electron microscopes and a technique called x-ray crystallography. Several years ago, academic research groups began testing whether deep learning, the technique at the heart of many recent AI advances, could predict the shape of proteins simply from their constituent amino acids, by learning from structures that had been experimentally verified.

In 2018, Google DeepMind revealed it was working on AI software called AlphaFold to accurately predict the shape of proteins. In 2020, AlphaFold 2 produced results accurate enough to set off a storm of excitement in molecular biology. A year later, the company released an open source version of AlphaFold for anyone to use, along with 350,000 predicted protein structures, including for almost every protein known to exist in the human body. In 2022 the company released more than 2 million protein structures.

The latest AlphaFold’s ability to model different proteins was improved in part through an algorithm called a diffusion model that helps AI image generators like Dall-E and Midjourney create weird and sometimes photo-real imagery. The diffusion model inside AlphaFold 3 sharpens the molecular structures the software generates. The diffusion model is able to generate plausible protein structures based on patterns it picked up from analyzing a collection of verified protein structures, much as an image generator learns from real photographs how to render realistic-looking snapshots.

AlphaFold 3 is not perfect, though, and offers a color-coded confidence scale for its predictions. Areas of a protein structure colored blue indicate high confidence, while red areas show less certainty.

David Baker, a professor at the University of Washington who leads a group working on techniques for protein design, has competed with AlphaFold. In 2021, before DeepMind open sourced its creation, his team released an independent protein-structure prediction tool inspired by AlphaFold. His own lab recently released a diffusion model to help model a wider range of molecular structures, but he concedes that AlphaFold 3 is more capable. “The structure prediction performance of AlphaFold 3 is very impressive,” Baker says.

Baker adds that it is a shame that the source code for AlphaFold 3 has not been released to the scientific community.

Hassabis, who leads all of Alphabet’s AI initiatives, has long taken a special interest in the potential for AI to accelerate scientific research. But he says the latest techniques being developed for AlphaFold, a highly specialized AI system, could prove useful for building more general systems that aim to exceed human capabilities on many dimensions.

If AI programs like Google’s Gemini become a lot more capable over the next decade, he says, “you could imagine them using things like AlphaFold as tools, to achieve some other goal.”

COMMENTS

The structure of DNA
The discovery of the helical structure of double-stranded DNA settled the matter — and changed biology forever. ... but also housed the Medical Research Council's Unit for Research on the ...
DNA
DNA, organic chemical of complex molecular structure found in all prokaryotic and eukaryotic cells. It codes genetic information for the transmission of inherited traits. The structure of DNA was described in 1953, leading to further understanding of DNA replication and hereditary control of cellular activities.
Discovery of the structure of DNA (article)
The components of DNA. From the work of biochemist Phoebus Levene and others, scientists in Watson and Crick's time knew that DNA was composed of subunits called nucleotides 1 . A nucleotide is made up of a sugar (deoxyribose), a phosphate group, and one of four nitrogenous bases: adenine (A), thymine (T), guanine (G) or cytosine (C).
The Structure and Function of DNA
The Structure and Function of DNA. Biologists in the 1940s had difficulty in accepting DNA as the genetic material because of the apparent simplicity of its chemistry. DNA was known to be a long polymer composed of only four types of subunits, which resemble one another chemically. Early in the 1950s, DNA was first examined by x-ray diffraction ...
Biochemistry, DNA Structure
The remarkable structure of deoxyribonucleic acid (DNA), from the nucleotide up to the chromosome, plays a crucial role in its biological function. The ability of DNA to function as the material through which genetic information is stored and transmitted is a direct result of its elegant structure. In their seminal 1953 paper, Watson and Crick unveiled two aspects of DNA structure: pairing the ...
DNA
Properties Chemical structure of DNA; hydrogen bonds shown as dotted lines. Each end of the double helix has an exposed 5' phosphate on one strand and an exposed 3′ hydroxyl group (—OH) on the other.. DNA is a long polymer made from repeating units called nucleotides. The structure of DNA is dynamic along its length, being capable of coiling into tight loops and other shapes.
DNA Research
DNA Research is an internationally peer-reviewed journal which aims at publishing papers of highest quality in broad aspects of DNA and genome-related research ... GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level . Resource Articles: Genomes Explored
9.1: The Structure of DNA
Now let's consider the structure of the two types of nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The building blocks of DNA are nucleotides, which are made up of three parts: a deoxyribose (5-carbon sugar), a phosphate group, and a nitrogenous base (Figure 9.1.2 9.1. 2 ). There are four types of nitrogenous bases in ...
DNA function & structure (with diagram) (article)
DNA structure and function. DNA is the information molecule. It stores instructions for making other large molecules, called proteins. These instructions are stored inside each of your cells, distributed among 46 long structures called chromosomes. These chromosomes are made up of thousands of shorter segments of DNA, called genes.
DNA structure and function
The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling ...
Rosalind Franklin's Overlooked Role in the Discovery of DNA's Structure
English chemist and X-ray crystallographer Rosalind Elsie Franklin poses for a portrait, circa 1955. In 1951, Franklin joined a team of biophysicists led by John Randall at King's College who ...
Human Molecular Genetics and Genomics
The focus of genomics research has recently moved beyond analyzing DNA variation to studying patterns of gene expression in individual cells, a step that has been driven by new methods for single ...
DNA Studies: Latest Spectroscopic and Structural Approaches
Despite the B-DNA form, it is mostly known to be the "ideal" DNA structure, other conformations are commonly present in physiological conditions. The double helix assumes conformations in response to environmental ... An extensive amount of research related to DNA analysis for the detection and sensing has been reported for microfluidic ...
The Discovery of the Double Helix, 1951-1953
The discovery in 1953 of the double helix, the twisted-ladder structure of deoxyribonucleic acid (DNA), by James Watson and Francis Crick marked a milestone in the history of science and gave rise to modern molecular biology, which is largely concerned with understanding how genes control the chemical processes within cells.
(PDF) DNA structure and function
A) Structures of A-DNA and B-DNA. Note the difference in groove width and the relative displacements of the base pairs from the central axis. Reproduced with permission from Arnott [12].
DNA explained: Structure, function, and impact on health
There are many types of DNA, each of which varies depending on its specific structure. The most common is B-DNA, but some other types found in the genome include A-DNA, H-DNA, and Z-DNA. What is ...
(PDF) The Structures of DNA and RNA
Chapter 6: The Structures of DNA and RNA. The discovery that DNA is the prime genetic molecule, carrying all the hereditary information within chromosomes, immediately focused attention on its ...
Deoxyribonucleic Acid (DNA)
Deoxyribonucleic acid (abbreviated DNA) is the molecule that carries genetic information for the development and functioning of an organism. DNA is made of two linked strands that wind around each other to resemble a twisted ladder — a shape known as a double helix. Each strand has a backbone made of alternating sugar (deoxyribose) and ...
The Human Genome Project
The Human Genome Project (HGP) is one of the greatest scientific feats in history. The project was a voyage of biological discovery led by an international group of researchers looking to comprehensively study all of the DNA (known as a genome) of a select set of organisms. Launched in October 1990 and completed in ...
Google DeepMind's new AlphaFold can model a much larger slice of
AlphaFold 3 can predict how DNA, RNA, and other molecules interact, further cementing its leading role in drug discovery and research. Who will benefit? Google DeepMind has released an improved ...
AlphaFold 3 predicts the structure and interactions of all of life's
Google DeepMind's newly launched AlphaFold Server is the most accurate tool in the world for predicting how proteins interact with other molecules throughout the cell. It is a free platform that scientists around the world can use for non-commercial research. With just a few clicks, biologists can harness the power of AlphaFold 3 to model structures composed of proteins, DNA, RNA and a ...
Understanding biochemistry: structure and function of nucleic acids
The structure of DNA. (A) A nucleotide (guanosine triphosphate). The nitrogenous base (guanine in this example) is linked to the 1′ carbon of the deoxyribose and the phosphate groups are linked to the 5′ carbon. A nucleoside is a base linked to a sugar. A nucleotide is a nucleoside with one or more phosphate groups.
Revolutionary Genetics Research Shows RNA May Rule Our Genome
For instance, only a few years after Francis Crick, James Watson and several of their colleagues deduced the structure of DNA, researchers found that some RNA, called transfer RNA, grabs onto ...
Google DeepMind's Groundbreaking AI for Protein Structure Can Now Model DNA
Move over, chatbots. This upgraded AI can model antibodies, DNA, and molecules from disease organisms. This next generation of AlphaFold, from Google DeepMind, is poised to significantly advance ...
(PDF) Basic Molecular Biology: Basic Science
The basic elements that compose DNA are five atoms: carbon, nitrogen, oxygen, phosphorus, and hydrogen. A nucleoside is the combination of these atoms into two structures, a five-carbon sugar molecule called deoxyribose, which is responsible for the name of DNA, and a phosphate group. Together, these two structures form the supporting backbone ...
Full article: Genetic diversity and population structure of Canarian
To determine whether Canarian chicken breed varieties represented their own population structure or whether they were just the result of colour pattern gene segregation, a genetic structure study was carried out applying the Bayesian model-based method developed by Pritchard et al. (2004) and implemented in the STRUCTURE v 2.1 program ...