Background

Cluster Tracker is a tool for following viral evolution. It uses phylogenetic methods to identify new introductions of the virus from outside a focal region, which can give epidemiologists early warning when a new viral variant of interest has spread into their region.

Cluster Tracker analyzes genomic surveillance data to track how SARS-CoV-2 is mutating and spreading, and flags samples in a community that epidemiologists may want to investigate further or act on. Related samples that descend from a common viral ancestor and share a common regional introduction are presented in a table below a map. The map is a heat map of geogenomic transmission dynamics that draws attention to the most likely origins of variants introduced into a region.

As new sequences of the evolving virus continually become available, Cluster Tracker processes these data to highlight variants that are moving between regions and share a common evolutionary descent. UCSC's Ultrafast Sample placement on Existing tRee (UShER) tool allows scientists to quickly place these new variants in the context of a growing evolutionary history of earlier versions of the virus. Rather than recalculating an entirely new phylogenetic tree from all the data, UShER takes the list of mutations in each new sample and uses an existing mutation-annotated tree to place the new virus, which is where its speed comes from. This approach allows datasets to be compared quickly and aids in assigning new lineage names, as exemplified by the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) group's move to using UShER in 2022 to name lineages.
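The placement idea can be illustrated with a small toy example. This is only a sketch under simplifying assumptions, not UShER's implementation: the Node class, the place_sample function, and the tiny tree are hypothetical, each branch is annotated with a set of mutation names, back-mutations are ignored, and the "best" node is simply the one whose accumulated mutations differ least from the new sample.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a toy mutation-annotated tree."""
    name: str
    mutations: frozenset = frozenset()  # mutations on the branch leading to this node
    children: list = field(default_factory=list)


def place_sample(root, sample_mutations):
    """Return (node, cost) for the node whose accumulated mutations
    differ least from the sample (a simplified parsimony-style score)."""
    sample = frozenset(sample_mutations)
    best_node, best_cost = None, None
    # Walk the existing tree once; no tree is rebuilt.
    stack = [(root, frozenset(root.mutations))]
    while stack:
        node, inherited = stack.pop()
        # Mutations present in only one of the two sets count as mismatches.
        cost = len(inherited ^ sample)
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
        for child in node.children:
            stack.append((child, inherited | child.mutations))
    return best_node, best_cost


# Toy tree (assumed for illustration only).
a1 = Node("A1", frozenset({"A23403G"}))
a2 = Node("A2", frozenset({"G28881A"}))
a = Node("A", frozenset({"C241T"}), [a1, a2])
b = Node("B", frozenset({"C8782T"}))
root = Node("root", frozenset(), [a, b])

new_sample = {"C241T", "A23403G", "C14408T"}
node, cost = place_sample(root, new_sample)
print(node.name, cost)
```

In this toy tree the new sample lands on node A1 with one unexplained mutation, which would become a new branch below A1. The point of the sketch is that only the sample's mutation list and the existing annotated tree are consulted.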

To understand the impact of the many evolving lineages of the virus, researchers have focused on variants that affect the virus's ability to infect human cells. They found that changes to the SARS-CoV-2 spike (S) protein, which binds to the human ACE2 receptor, can have a major impact on the virus's ability to spread. The ACE2 receptor is found on the surface of cells in many tissues in the body, including the lungs, heart, and blood vessels, and the viral spike protein uses this interaction to gain entry into the host cell and initiate replication. The spike protein is made up of three identical subunits and binds to ACE2 through a region called the receptor-binding domain (RBD).

Image of the structure of the spike-ACE2 complex and a close-up view of the RBD interface from figure 2 of the paper Obermeyer F et al., Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. Science. 2022 Jun 17;376(6599):1327-1332. doi: 10.1126/science.abm1208

Mutations in the spike RBD can change SARS-CoV-2's ability to infect human cells, potentially making those versions of the virus more contagious and/or more resistant to immunity from vaccines. Lineages carrying such mutations may be designated "variants of concern" or "variants of interest," and when new lineages are spotted with more RBD mutations, scientists have been able to hypothesize which lineages are likely to spread more successfully. Such a story unfolded with the lineages BQ and XBB in the fall of 2022.

Epidemiologists can use Cluster Tracker to search for these RBD-mutant lineages in their region. When they search for lineages known to be variants of concern, Cluster Tracker relies on matUtils, a set of utilities for working with mutation-annotated trees, to extract spatiotemporal information and label the most likely origins of introductions of those lineages into their region. matUtils infers introductions by comparing tree branching between samples, generating a regional index that helps determine whether a newly obtained sequence descended from a virus introduced from outside the region. A heat map of regions highlights where more introductions are coming from. In the table, a calculated growth score gives more weight to the most recent samples.

Clicking the "View Cluster" link in the table opens the genomic samples on a phylogenetic tree in Taxonium, within their related lineage branches, where options to color or search by additional metadata terms help users make further connections. In these ways, epidemiologists can use Cluster Tracker to spot clusters of concerning lineages in their region and ask themselves, "What clusters make me want to dig deeper?" With a specimen ID, epidemiologists can search the table to discover linked cases clustered by sequence, and thereby generate and prioritize new hypotheses about how SARS-CoV-2 might be spreading in their region.
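To make the idea of a regional index more concrete, here is a minimal toy sketch. It is not the matUtils algorithm: the index used here (the fraction of a node's descendant samples collected in the focal region), the 0.5 threshold, the function names, and the small tree are all assumptions chosen for illustration. Each in-region sample is traced back toward the root for as long as its ancestors still look "in region," and samples that stop at the same ancestor are grouped into one cluster, meaning one inferred introduction.

```python
from collections import defaultdict

# Hypothetical toy tree: child -> parent. Leaves are samples.
PARENT = {
    "n1": "root", "n2": "root", "s6": "root",
    "s1": "n1", "s2": "n1", "s3": "n1",
    "s4": "n2", "s5": "n2", "s7": "n2",
}
# Hypothetical sampling region for each leaf.
REGION = {
    "s1": "CA", "s2": "CA", "s3": "NV",
    "s4": "NV", "s5": "NV", "s6": "NV", "s7": "CA",
}


def regional_index(parent, region, focal):
    """Assumed index: fraction of each node's descendant leaves from the focal region."""
    inside = defaultdict(int)
    total = defaultdict(int)
    for leaf, leaf_region in region.items():
        node = leaf
        # Credit the leaf to itself and to every ancestor up to the root.
        while node is not None:
            total[node] += 1
            inside[node] += int(leaf_region == focal)
            node = parent.get(node)
    return {node: inside[node] / total[node] for node in total}


def find_introductions(parent, region, focal, threshold=0.5):
    """Group focal-region samples by the oldest ancestor that still looks in-region."""
    index = regional_index(parent, region, focal)
    clusters = defaultdict(list)
    for leaf, leaf_region in region.items():
        if leaf_region != focal:
            continue
        node = leaf
        # Climb while the parent is still mostly in-region.
        while parent.get(node) is not None and index[parent[node]] >= threshold:
            node = parent[node]
        clusters[node].append(leaf)
    return dict(clusters)


print(find_introductions(PARENT, REGION, "CA"))
```

In this toy data, samples s1 and s2 trace back to the same mostly in-region ancestor and form one cluster, while s7 sits among out-of-region samples and is treated as a separate, single-sample introduction. A real analysis works on a far larger tree and accounts for sampling dates and branch lengths, which this sketch leaves out.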

It is crucial to understand that the available sequencing data represent only a fraction of the cases actually circulating, and that constructing phylogenetic trees has its own limitations. Despite these limitations, Cluster Tracker can still serve as a valuable tool for following the evolution of the virus. The famous quote from the movie Jurassic Park, "Life finds a way," holds true for a deadly virus like SARS-CoV-2. Just as life has a tendency to break free, expand to new places, and adapt to new challenges, this virus can mutate and evolve in unexpected ways. Through genomic surveillance tools such as Cluster Tracker, epidemiologists can gain insight into the virus's movements and identify early on when new, potentially dangerous strains have emerged in their community. In short, Cluster Tracker can help us stay one step ahead of the virus as it continues to evolve and adapt.

View links to Relevant Papers and GitHub on our Resources page.