Scientists have a powerful new tool for controlling the coronavirus: Its own genetic code.

The United States, home to the world’s biggest outbreak, has failed to tap the technique’s full potential.

Imagine a place where politicians and public health experts use every tool at their disposal to contain the coronavirus.

Welcome to the fictional town of Scienceville.

Here, when a citizen catches covid-19, the entire community gets tested. The tests reveal everyone who is infected, even those who show no symptoms. But the tests alone can’t say how people get sick.
Fortunately, in Scienceville, officials also sequence the genomes of each virus sample they collect. It’s a simple but powerful extra step that reveals which sub-strains of the coronavirus are circulating in the town.

[Graphic: sub-strains A, B and C]

At this bar, the regulars have the same sub-strain of coronavirus as the bartender, suggesting the business is a likely hot spot. The town closes the bar to slow the virus’s spread.
An infected plumber turns up with the same sub-strain as her customers. Contact tracers also learn the plumber had chosen not to wear a mask, so the town enacts a mask mandate.
Sequencing can also rule out chains of transmission that otherwise seem likely. Three people at a school are sick, but all have different sub-strains of the virus. This suggests the virus was brought into the school from outside, but did not spread within it. The school can safely stay open.
Scienceville is a fantasy. But in some countries, researchers and governments are racing to make it a reality — for this pandemic and the ones that follow.

The six British patients seemed to have little in common besides this: Each was dealing with kidney failure, and each had tested positive for the coronavirus.

They were among scores of virus-stricken people showing up at Addenbrooke’s Hospital in Cambridge in the early weeks of April. Had they lived in the United States instead of the United Kingdom, the link that allowed the contagion to spread among them might have slipped by unnoticed.

But the U.K. had done something in the early days of the pandemic that the United States and many other nations had not. It funded a national push to repeatedly decode the coronavirus genome as it made its way across the country. The process reveals tiny, otherwise invisible changes in the virus’s genetic code, leaving a fingerprint that gives scientists valuable glimpses into how the disease is spreading. It’s a cutting-edge technique that was not widely available in previous global pandemics but that researchers think can help hasten the end of this one.

[The code: How genetic science helped expose a secret coronavirus outbreak]

Experts cite this practice, known as “genomic epidemiology,” as one more tool the United States has failed to fully employ in the fight against the virus. Though it first sequenced the 3 billion-base-pair human genome 20 years ago and spends more on basic biomedical research than almost any other nation, the United States has yet to muster the kind of well-funded and comprehensive national effort that could produce a more precise accounting of how the coronavirus is infiltrating communities around the country.

In the case of the six British patients, sequencing revealed they had been infected by almost identical sub-strains of SARS-CoV-2, the virus that causes covid-19. Epidemiologists soon determined that all six had visited the same outpatient dialysis clinic on the same day of the week. Many had ridden in the same small transport van that regularly brought patients for treatment from across the surrounding area.

Officials promptly put in place new safety measures, including mandatory masks and intense cleaning of the van and the chairs at the dialysis clinic.

“And, you know, we’ve had no further cases,” said Estée Török, an infectious-disease expert at the University of Cambridge who helped decipher the outbreak. Studying the virus’s genome “helps to highlight cryptic or hidden transmission. That’s the real power of it — you can detect outbreaks and act while they’re happening.”


Estée Török, an infectious-disease expert at the University of Cambridge, and her colleagues have sequenced and catalogued thousands of viral genomes since the spring. (Photos by Anastasia Taylor-Lind for The Washington Post)

Already, the United Kingdom has sequenced at least 72,529 coronavirus genomes, nearly as many as the rest of the world combined. By contrast, U.S. labs have produced fewer than half as many sequences as their British counterparts, based on data from the GISAID Initiative, a global database of coronavirus genomes. That’s despite the fact that the United States is battling an epidemic that’s far larger.

Scientists say the technique could help officials trace and smother outbreaks, and could even shed light on the origins of the nation’s most high-profile coronavirus cluster: the outbreak at the White House.

About nine months after infections began surfacing in the United States, the country has sequenced just 0.4 percent of its more than 7 million coronavirus cases — a proportion surpassed by 40 other countries. By contrast, the U.K., which has conducted genetic analysis for roughly 12 percent of its outbreak, ranks 6th. Even greater percentages of cases have been sequenced in Australia, Taiwan, New Zealand and Iceland.
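The percentages above can be turned into rough absolute numbers. The short Python sketch below is only a back-of-the-envelope check. It assumes the rounded figures quoted in this story (7 million U.S. cases, 0.4 percent sequenced, 72,529 U.K. genomes, roughly 12 percent coverage), and the function names are invented for illustration.

```python
def implied_sequence_count(cases: int, percent_sequenced: float) -> float:
    """Estimate how many genomes a given coverage percentage implies."""
    return cases * percent_sequenced / 100


def implied_case_count(sequences: int, percent_sequenced: float) -> float:
    """Estimate how many cases a sequence count implies at a given coverage."""
    return sequences / (percent_sequenced / 100)


# Rounded inputs taken from the article; the true values are "more than"
# 7 million cases and "roughly" 12 percent, so the results are approximate.
us_sequences = implied_sequence_count(7_000_000, 0.4)
uk_cases = implied_case_count(72_529, 12)

print(f"U.S. genomes implied: about {us_sequences:,.0f}")
print(f"U.K. cases implied: about {uk_cases:,.0f}")
```

Under those assumptions, the U.S. figure comes to about 28,000 genomes, which fits the statement that U.S. labs have produced fewer than half as many sequences as the U.K.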

While U.K. scientists, backed by their government, began work early to create a centralized tracking system that could benefit researchers around the country, the U.S. effort has remained more diffuse and disorganized, conducted largely by an ad hoc group of scientists scattered across the country. They often have found themselves hamstrung by a fragmented health care system, a fractured pandemic response, lack of national coordination and a shortage of federal dollars.

The U.S. effort is “patchy and largely passive,” the National Academies of Sciences, Engineering, and Medicine found in a July report, citing inconsistent funding that raises “concerns of sustainability.”

Duncan MacCannell, who as chief scientist at the Centers for Disease Control and Prevention’s Office of Advanced Molecular Detection has been racing to try to organize the U.S. effort, called sequencing “an incredibly powerful tool in understanding this virus and in responding to the pandemic.” But he acknowledged the country got off to a slow start.

As coronavirus cases began to spike across the United States, he said in an email, only about a third of the nation’s state, county and local public health departments were in a position to implement some level of routine genomic sequencing. Though the CDC launched an initiative in May to coordinate those efforts and ramp up capacity, the undertaking has only recently begun to hand out funding.

“[It] just comes down to political will,” said Bronwyn MacInnis, a genomic epidemiologist at the Broad Institute in Boston who has been involved in the U.S. effort. “If it was a priority and we had political will and coordination, and we had a framework for activating together quickly, we could be — not that it’s a competition — but we could be right there with the U.K., or leading.”

A genetic bar code

The reason that the coronavirus was able to infect humans in the first place is the reason we can use genetics to track it: Viruses mutate. With SARS-CoV-2, this occurs roughly every second transmission between people, or about every two weeks. Inside one of our cells, as the virus replicates, it occasionally makes a tiny mistake somewhere amid its nearly 30,000 genetic bases. From then on, whenever it copies itself, it is copying a slightly different genetic code.

The vast majority of these mutations have no effect on the function or behavior of the virus. But they leave a clue, a tiny genetic marker that can be traced. That marker can be used as a bar code, locating that particular virus on the family tree of all of its relatives and allowing it, and its offspring, to be tracked.
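To make the bar-code idea concrete, here is a minimal Python sketch. It is an illustration, not the method any lab in this story uses: the 12-letter "genomes," the sample names and the one-difference threshold are all made up, and real analyses work on aligned sequences of nearly 30,000 bases and build full family trees. The sketch counts the positions at which two sequences differ, then groups samples whose codes are nearly identical.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def group_by_similarity(samples: dict[str, str], max_differences: int = 1) -> list[set[str]]:
    """Link samples that differ at no more than max_differences positions."""
    parent = {name: name for name in samples}

    def find(name: str) -> str:
        # Follow parent links to the representative of this sample's group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(samples, 2):
        if count_differences(samples[a], samples[b]) <= max_differences:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical toy data, loosely echoing the Scienceville bar example.
samples = {
    "bartender": "ACGTACGTACGT",
    "regular_1": "ACGTACGTACGT",  # identical to the bartender
    "regular_2": "ACGTACGAACGT",  # one mutation away from the bartender
    "student": "ACCTACGTTCGA",    # three mutations away from the bartender
}

for group in group_by_similarity(samples):
    print(sorted(group))
```

With this toy data, the bartender and both regulars land in one group and the student stands alone, which is the kind of link or non-link the sequencing described here is meant to reveal. The threshold is a judgment call: set it too loose and unrelated cases merge, too tight and a single new mutation splits a real cluster.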

“It’s like if a word is misspelled in chapter three of ‘One Hundred Years of Solitude,’ every time someone photocopies that, it doesn’t hurt the meaning, but you can tell it’s that copy,” said Jeremy Kamil, a virologist at the Louisiana State University Health Sciences Center-Shreveport who is using sequencing to try to understand outbreaks in the state.

[Massive genetic study shows coronavirus mutating amid rapid U.S. spread]

That means sequencing has the potential to help scientists decipher not only how a virus is spreading in a particular location, but how it got there in the first place.

“It is almost like a passport that gets stamped whenever a virus jumps between countries,” said Cheryl Bennett of the GISAID Initiative, whose global database contains the largest number of coronavirus genomes, currently numbering more than 142,000.


Bronwyn MacInnis, top, a genomic epidemiologist, and Pardis Sabeti, a computational biologist, both work on the sequencing of the virus at the Broad Institute in Boston, along with a team pictured here, from left to right, Stephen Schaffner, Chris Tomkins-Tinch, Amber Carter, MacInnis, Jacob Lemieux, Kim Lagerborg, Steve Reilly, Sabeti, Lydia Krasilnikova, Bennett Shaw, Christine Loreth, Gordon Adams, Matt Bauer, Melissa Rudy, Erica Normandin, Katie Siddle and Kat Deruff. (Photos by Brianna Soukup for The Washington Post)

Merging genetics with epidemiology is like switching from a black-and-white world to one rich with color, said Paraic Kenny, a cancer researcher-turned-coronavirus geneticist at the Gundersen Health System in La Crosse, Wis. Diagnostic tests can answer a single, binary question about covid-19: infected or not.

“But once you start seeing in color, there are all kinds of implications,” Kenny said. It’s possible to know who shares similar sub-strains of the virus, and where those strains likely originated. The shades and nuances of an outbreak become much clearer.

This makes viral sequencing a key source of information for contact tracing, which is essential to curbing disease spread before cities and states get overwhelmed. And unlike people, viral genomes don’t lie or forget where they’ve been.

Yet for genomic epidemiology to work, scientists say, scale and speed matter. Since one of the most important pieces of information that a genome can reveal is its connection to other cases, the technique becomes more valuable as more samples have been sequenced. The process must also happen quickly enough for health officials to act on the results.

In the Netherlands, for instance, sequences can be produced just one day after the initial coronavirus test, said Marion Koopmans of the Erasmus University Medical Center in Rotterdam, who began working with health authorities to set up sequencing there in January.

“What we’ve shown is that you can do it quickly, and then you can feed back the results to clinicians and infection control teams, and you can really investigate outbreaks in real time and put in interventions that help to control them,” said Török, the Cambridge researcher. “There’s no point in doing this weeks or months or years later.”

An ‘awe-inspiring’ effort

The value of sequencing has been clear since it helped fight another deadly disease: Ebola.

In 2014, Pardis Sabeti, a computational biologist at the Broad Institute, used sequencing to reveal that the devastating Ebola outbreak in West Africa began with a single spillover from an animal to a human in Guinea. From there, it spread from person to person across the region. The findings strengthened the health response to the epidemic by documenting how contact with infected people, rather than bats and other animal hosts, was the most important mode of transmission.

After that outbreak, Sabeti became an evangelist for genomic surveillance of infectious diseases in the United States. She got a grant from the CDC to train employees at state and city public health departments in sequencing and analysis. She urged U.S. officials to establish regional sequencing centers and build more specialized laboratories for working with dangerous pathogens.

“We were pitching everywhere: ‘This is a race against time,’ ” Sabeti said.

[Genetic data show how a single superspreading event sent coronavirus across Massachusetts — and the nation ]

Her vision never came to full fruition in the United States. But on the other side of the Atlantic, scientists began work this winter to turn the same idea into a reality.

Like the United States, Britain already sequences the genomes of tuberculosis cases across the country, and it has used the approach to help monitor and crack down on outbreaks of foodborne illness.

“When covid came along,” said University of Cambridge microbiologist and public health professor Sharon Peacock, “several of us thought this is a really important opportunity to develop a national capability.”

Peacock and her colleagues quickly put together a proposal. The British government, in collaboration with the research nonprofit Wellcome Sanger Institute, agreed to put roughly $25 million into the effort.

The grant started on April 1, and the consortium built a network of 17 sequencing centers and hundreds of researchers across the country, dedicated to uploading as many covid-19 genomes as possible into a central database. Peacock said the current funding will allow researchers to sequence at least 200,000 coronavirus genomes, if necessary, from all over Great Britain.


Sharon Peacock, a University of Cambridge microbiologist and public health professor, directs a consortium of sequencing centers and researchers across the United Kingdom. At her lab, researchers regularly extract and sequence the genomes of viruses and bacteria, such as Clostridium difficile, seen here, a serious infection that is resistant to antibiotics. (Photos by Anastasia Taylor-Lind for The Washington Post)

The effort has already paid dividends.

In one instance, researchers discovered that coronavirus was brought into the United Kingdom on at least 1,300 separate occasions — obliterating the notion that a single carrier had triggered the nation’s initial outbreak and showing just how easily the virus slipped through airports and across borders in the early days of the pandemic. The analysis also found that the vast majority of those early cases arrived from Spain and other European countries, rather than from China, where the pandemic began.

The effort has yielded more than just clues about the past, as the case of the six dialysis patients in Cambridge shows. Those behind it argue that it could serve as a key tool to help stamp out the virus’s ongoing spread.

The Broad Institute’s MacInnis called the British sequencing effort “awe-inspiring.”

“They’ll be able to understand . . . where introductions are coming from, how the virus is spreading within the U.K., and how to target interventions to stop it,” she said. “It will serve as the gold standard for years.”

The initiative’s power comes from both its scale and coordination, MacInnis said. Having sequenced an ever-growing portion of British cases, scientists there have a much more granular understanding of the virus’s family tree.

“It’s not a panacea,” Peacock is quick to acknowledge. The nation must continue to sample more of its infections — and more quickly — for the approach to realize its full potential in aiding disease detectives, enhancing infection control and monitoring for meaningful mutations of the virus.

She added that the effort is less useful amid the harrowing spike of cases the U.K. has experienced recently or the virtually unchecked outbreaks that have plagued the United States.

“Sequencing is not going to help if nobody washes their hands and nobody social distances. It has to be part of a package of measures,” she said. “If an outbreak is completely out of control, sequencing doesn’t add very much.”

Still, British health authorities will be better able to identify unique mutations that characterize the outbreaks in certain communities — a hospital, a nursing home, a grocery store — and use that information to control transmission, MacInnis said. They’ll be able to identify “invisible links” between cases that might be impossible to pinpoint with traditional contact tracing, and, through close coordination with public health authorities, stop emerging outbreaks in their tracks.

Unless the United States improves, “we just won’t be able to do that,” she said. “We won’t know where infections are coming from within the country, let alone how they’re being introduced when we open our borders.”

And it will be harder to decipher specific outbreaks, such as the high-profile coronavirus cluster within the White House, which has sickened dozens of people, including the president, the first lady and multiple administration officials.

Anthony S. Fauci, the government’s top infectious-disease expert, on Friday described as a “superspreader event” the packed, mostly maskless Rose Garden ceremony weeks ago to announce the nomination of Amy Coney Barrett to the Supreme Court. If samples from every infected person at the gathering were sequenced, it would be possible to infer whether a single infected guest spread the virus among other attendees, or whether the virus was introduced by multiple people into the White House orbit.

Sequencing also could potentially indicate where the virus that reached the White House originated, which in turn could suggest how protections for the president failed. And if closely related viruses are found outside the White House (for instance, in D.C. neighborhoods or in Barrett’s community in South Bend, Ind.), it could shed light on whether the event sparked other outbreaks.

For now, those genetic mysteries remain unsolved.


The Accident and Emergency Department at Addenbrooke's Hospital in Cambridge, where researchers used sequencing to identify an outbreak among dialysis patients. (Photos by Anastasia Taylor-Lind for The Washington Post)

‘We should be doing this’

The list of U.S. stumbles during the pandemic is a long one.

The nation has failed to ensure adequate testing. Contact tracing has been spotty and unable to keep up with the tidal wave of infections. Some states and localities have reopened their economies despite worsening case numbers, with troubling results; some schools and universities have repeated these same mistakes. All the while, government leaders have sent mixed messages about the importance of wearing masks.

The sequencing effort has struggled for many of the same reasons: It has largely been left up to states and individual researchers, rather than being part of a coordinated and well-funded national program.

Francis Collins, who led the effort to sequence the human genome and now heads the National Institutes of Health, has written about the power of the technology, combined with other more traditional techniques, to “put more targeted public health measures in place to slow and eventually stop [the coronavirus’s] deadly spread.”

In an interview, Collins agreed that the U.S. effort has been “a bit ad hoc, as opposed to a coordinated effort.” But he noted that the severity and scale of the pandemic have put incredible pressure on researchers and public officials to focus on finding a vaccine and treatments for the disease above all else.

“I think if we were convinced this was the highest possible priority to help end this pandemic, maybe there would be a more organized effort,” Collins said.

State-level statistics underscore the disparities: While some states, such as California, Washington and Texas, have sequenced thousands of viruses, an enormous data desert remains throughout much of the central and southern United States.

As of Oct. 12, based on data from the GISAID Initiative, Mississippi and Vermont had each published only 4 sequences. Kansas had published 3, Montana and West Virginia had 2 each, and North Dakota had 0.

[Map: Number of sequences sent to the GISAID Initiative from each state. Source: GISAID Initiative.]

With such poor sampling, experts say, these states have essentially no ability to use sequencing to aid in contact tracing, or to better understand the origins and development of their respective outbreaks.

The tools are already in place in much of the country, where sequencing machines are prevalent and the cost of sequencing the short coronavirus genome can be as low as $10, according to Francis deSouza, the CEO of Illumina, which dominates the market for sequencing machines. (That figure refers only to sequencing and does not include other costs, such as for collecting and testing coronavirus samples, or getting results to patients.)

“We’ve got the sequencing horsepower,” said James Musser, a researcher at Houston Methodist Hospital who has presided over what appears to be the U.S.’s largest single effort. “I think we should be doing this very extensively in the nation writ large, and I think it’s terrible that we are not.”

In the absence of federal support and coordination, the most successful sequencing efforts have occurred in a scattering of cities where the necessary resources already happened to be in place.

Vaughn Cooper, an evolutionary biologist at the University of Pittsburgh who runs his own private laboratory, set up a system that could turn a positive coronavirus swab into a sequence in less than 72 hours. Sabeti and her colleagues used sequencing to provide a detailed look at how a superspreading event unfolded in Boston — tracing strains of the virus from a conference of biotech executives all the way into the city’s homeless shelters.


Vaughn Cooper, an evolutionary biologist at the University of Pittsburgh, set up a system that could turn a swab from a positive coronavirus test into a sequence in less than 72 hours. (Photos by Amber Ford for The Washington Post)

Michigan was especially well positioned to become a leader in viral genomics. The state’s public health department already hosts a center for rapid sequencing of tuberculosis, which was set up by the CDC in 2017. Since transitioning to study SARS-CoV-2, the lab has added some 2,000 genome sequences to the GISAID Initiative database, about 6 percent of the country’s total.

The sequencing has allowed Michigan to better trace the movement of the virus, said molecular microbiologist Heather Blankenship of the state’s Department of Health and Human Services. “I think as we move forward in the response to covid, it’s only going to get better and more real-time.”

At the CDC, MacCannell has been working to stand up a public-private consortium of researchers and labs to share genetic information about the coronavirus in a centralized, systematic way. He said the group now includes 130 institutions, from small public health labs to nationwide diagnostic firms such as LabCorp and Quest Diagnostics.

“We are just starting to ramp up,” said MacCannell, adding that he expects the United States to soon produce tens of thousands more sequences.

But many researchers said coordination among medical providers, scientists and public health officials is made more difficult by the U.S. health system, with its patchwork of public and private insurance and assortment of disconnected hospitals. Uninsured or underinsured people might not seek care for their illness, meaning virus samples from the most vulnerable people are not collected. Overwhelmed hospitals and cash-strapped public health departments often don’t have the resources to prepare samples and get them to labs where they can be analyzed. Even the variety of record-keeping systems used by hospitals makes it difficult to share data and distribute samples for sequencing.

“Centralized data management and processing for a national sequencing program isn’t as straightforward in the United States as it is in a lot of other countries,” MacCannell said.


Heather Blankenship, a molecular biologist at the Michigan Department of Health and Human Services Environmental Laboratory, sitting here in front of a biosafety cabinet, uses a sequencer to track the virus's genome in her lab, peppered with social distancing markers. (Photos by Brittany Greeson for The Washington Post)

Even some participants said the current effort is nowhere near as broad as is needed.

“It’s still kind of like a volunteer fire department,” said Tom Friedrich, a virologist at the University of Wisconsin-Madison and a member of the consortium. “Labs that already have the interest and capacity are sequencing, but that leaves other places lacking in coverage.”

Some of the biggest gaps are in places where outbreaks are most out of control, noted Friedrich’s University of Wisconsin colleague Dave O’Connor. “It is sort of like a street only being illuminated where there happen to be streetlights,” he said. “You can’t know anything about the areas that are dark.”

Quest Diagnostics itself proves the point: The firm has run 15.7 million coronavirus diagnostic tests in the United States. But while it has the capacity to perform sequencing and is “validating” a test that would include it, for now the company will continue to focus on testing “until sequencing can be deployed in a high volume, cost-effective way,” said Kimberly Gorode, a Quest Diagnostics spokesperson. Samples currently are not kept for later sequencing, she added; most are thrown away.

Meanwhile, as the U.S. epidemic continues to surge, sequencing has become less feasible in states with huge volumes of cases. Health authorities are overwhelmed. Test results can still take days to be delivered. Contact tracing remains unwieldy and burdensome.

But many scientists say the technique will be even more important as societies try to extinguish the flames of the pandemic, helping officials target interventions and track mutations that might help the virus evade treatments, including antiviral drugs such as remdesivir and, eventually, vaccines.

“Once we start using those drugs, then it will be really important to do surveillance,” said Stuart Ray, the vice chair of medicine for data integrity and analytics at Johns Hopkins Medicine.

Without federal funding, scientists say, even those who see the value of genomic epidemiology will be forced to give up on it.

In Pittsburgh, Cooper has reduced his efforts to small-scale local sequencing, rather than trying to fully chart the city’s outbreak. Work at MacInnis’s lab nearly came to a standstill this summer as she filled out applications for NIH grants. A recent award from the CDC has allowed her to fill in some of the resulting data gaps. But she doesn’t expect to hear back from NIH until next year.

“This is not easy work. And there are a lot of dimensions that make it really challenging to implement at scale,” she said. But, she added, “I think we’ve learned a lot from covid on how the data can really be game changing.”

About this story

Editing by Trish Wilson. Graphics editing by Monica Ulmanu. Additional graphics work by Aaron Steckelberg. Photo editing by Olivier Laurent. Design and development by Audrey Valbuena. Additional development by Harry Stevens and Matthew Callahan. Design editing by Matthew Callahan. Copy editing by Emily Codik. Scienceville inspired by Paraic Kenny.

Brady Dennis is a Pulitzer Prize-winning national reporter for The Washington Post, focusing on the environment and public health. He previously spent years covering the nation’s economy.
Chris Mooney is a Pulitzer Prize-winning reporter covering climate change, energy, and the environment. He has reported from the 2015 Paris climate negotiations, the Northwest Passage, and the Greenland ice sheet, among other locations, and has written four books about science, politics and climate change.
Sarah Kaplan is a climate reporter covering humanity's response to a warming world. She previously reported on Earth science and the universe.
Harry Stevens is a graphics reporter at The Washington Post. He was part of a team at The Post that won the 2020 Pulitzer Prize for Explanatory Reporting for the series “2C: Beyond the Limit.”