About BRC Analytics
What are these resources?
The four resources are Galaxy (est. 2005), HyPhy (est. 2000), the UCSC Genome Browser (est. 2000), and TACC (est. 2001).
Galaxy
The Galaxy team is a community that develops software infrastructure for deploying Galaxy instances. Three major Galaxy instances operate in the US (https://usegalaxy.org), Europe (https://usegalaxy.eu), and Australia (https://usegalaxy.org.au), alongside many regional instances.
Galaxy is an application that allows users to run a wide variety of command-line, web-based, or interactive tools on any type of compatible data. Galaxy can be accessed either through a web browser or programmatically via its application programming interface (API). A Galaxy instance can be configured to manage local or remote computational resources and to schedule tool runs on any modern computational infrastructure, including local hardware, conventional clusters, and commercial or public clouds.
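As a rough illustration of programmatic access, the sketch below builds an HTTP request for a Galaxy instance's API using only the Python standard library. The helper names `build_request` and `fetch_json` are hypothetical, and the `version` endpoint and `x-api-key` header reflect our understanding of the Galaxy API rather than anything stated on this page, so check them against the API documentation of the instance you use.

```python
import json
import urllib.request
from urllib.parse import urljoin


def build_request(base_url, endpoint, api_key=None):
    """Build a GET request for a Galaxy API endpoint (hypothetical helper).

    Nothing is sent over the network here; the request is only constructed.
    """
    # Normalise slashes so "https://host" and "https://host/" behave the same.
    url = urljoin(base_url.rstrip("/") + "/", "api/" + endpoint.lstrip("/"))
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if api_key:
        # Assumption: the instance accepts the key in an "x-api-key" header.
        request.add_header("x-api-key", api_key)
    return request


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is printed here, so the example runs without a network connection.
print(build_request("https://usegalaxy.org", "version").full_url)
# prints "https://usegalaxy.org/api/version"
```

With network access, `fetch_json(build_request("https://usegalaxy.org", "version"))` would be one way to query the instance. Client libraries such as BioBlend wrap calls like this in a higher-level interface.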
The Galaxy team also operates and maintains the Galaxy ToolShed, a growing repository of >8,500 analysis tools available for use in Galaxy. The ToolShed is closely aligned with the BioConda and BioContainers communities, which package bioinformatics tools and have become their standard distribution channels. The Galaxy software ecosystem also includes scheduling components, tool development utilities, training infrastructure, and many other features (e.g., Planemo, Pulsar, TPV).
The main public site, https://usegalaxy.org, is an example of a Galaxy instance. It supports tens of thousands of active users running hundreds of thousands of jobs per month and manages over 4 petabytes of user data.
The Galaxy Training Network (GTN) contains a comprehensive collection of tutorials covering all aspects of Galaxy, from basic functionality to advanced analyses. It is a widely used, community-curated resource.
UCSC Genome Browser
The UCSC Genome Browser, maintained by the University of California, Santa Cruz (UCSC), is a widely used and highly regarded online tool for visualizing and exploring genomic information. It is one of the most widely used sources of genomic data in the world, with more than 150,000 monthly users spread over 200 countries and the majority of usage coming from outside the United States. The Browser team also generates and distributes multiple alignments for the genomes hosted on the site.
HyPhy
Key tools for the analysis of pathogen evolution and dynamics are contained within HyPhy (Hypothesis testing using Phylogenies), a mature (>20 years) open-source platform for comparative sequence analysis that focuses on studying the evolutionary process, especially selection, recombination, and evolutionary rates. Our system will leverage HyPhy from within the Galaxy environment.
HyPhy provides a domain-specific scripting language that enables complex model definition, fitting, and hypothesis testing. Its analytical tools for comparative evolutionary analyses of pathogen sequence data have been used extensively over the past decade to investigate viral and bacterial pathogens. Over the past five years, our “one-click” web application Datamonkey processed ~300,000 submissions, equivalent to ~$2.5M in value based on Amazon EC2 pricing (compute time only); these are typically compute-intensive jobs investigating the impact of natural selection and recombination on pathogens. The service has processed well over 30,000 coronavirus-related jobs (including SARS-CoV-2) since 2020 and handles ~2,200 analyses/week.
The HyPhy software platform has also found extensive and consistent use in the field of pathogen evolutionary analysis, with over 7,000 citations to the platform and its attendant statistical methods.
TACC
The Texas Advanced Computing Center (TACC), located at the University of Texas at Austin, is an advanced computing research hub. It offers an extensive array of advanced computing resources and support services to researchers across the United States. TACC's overarching objective is to use cutting-edge computing technologies to enable discoveries that advance science and society.
The center supports diverse areas, including high-performance computing, scientific visualization, data analysis, and storage systems, as well as software development, research initiatives, and the creation of user-friendly portal interfaces.
TACC is a center of excellence in computational sciences within the US. Its resources and services are made accessible to the wider academic community through its participation in the National Science Foundation's (NSF) ACCESS-CI project.
Collaborative strategy
The research team responsible for implementing this work is headed by five PIs with complementary research expertise and a long history of collaboration. Dr. Nekrutenko (Penn State) is the original co-developer of the Galaxy Project and its PI for the past 15 years. Dr. Pond (Temple) is a computational evolutionary biologist whose core expertise is methodology development (including the HyPhy package) and who has two decades of experience in viral and pathogen evolution (HIV, IAV, and more recently SARS-CoV-2). Dr. Schatz (Johns Hopkins) is a computational biologist who developed widely used algorithms for de novo genome assembly and variant detection and pioneered the use of computational clouds in the life sciences. Dr. Haeussler (UCSC) is a software engineer and the PI of the Browser Project. Dr. John Fonner is a biomedical engineer who serves as the Director of Special Projects at TACC. Importantly, the team also includes highly experienced and skilled software developers and engineers who have worked on the project for many years. In the past three years alone, this group has published over 40 papers on pathogen genomics, including recent high-profile publications on SARS-CoV-2 genomics in Cell, Nature Biotechnology, Nature Reviews Genetics, Nature Genetics, and others. A high-level overview of dependencies among these components is shown below.