BRC Analytics

Roadmap

As BRC Analytics develops, we will utilize existing APIs and design new approaches for external data access, integrate Galaxy with hundreds of tools, provide access to Jupyter and RStudio for ad hoc analytics, offer custom tools and ObservableHQ-based dashboards, and include interactive tutorials for users of all skill levels.
The Data integration component will use existing APIs and design new approaches for accessing external data. The Data analysis component will consist of a globally accessible Galaxy instance deployed on ACCESS-CI/TACC resources. It will integrate hundreds of tools corresponding to each of the colored modules. In addition, it will offer direct access to notebook environments such as Jupyter and RStudio for ad hoc analytics. A custom tools service will be offered to meet the needs of users requesting missing or new components. We will provide a number of templates for deploying serverless ObservableHQ-based dashboards, which can be used to create rich visual representations of analytical results ranging from simple reports to nationwide pathogen surveillance efforts. Finally, the Training component will cover all aspects of the system's functionality and will include hundreds of interactive tutorials for users of all skill levels, from computationally naive experimentalists to system engineers.

Development Plan

Develop data component

Data harmonization and ingest

The list of all 785 genomes originally available in VEuPathDB will be harmonized: for each genome, we will identify the latest official release listed at NCBI. The UCSC Genome Browser team will then ingest the data to create a browser instance for each genome. Each instance will initially contain annotations provided by NCBI. Next, we will make a best effort to transfer any additional annotations not found at NCBI from the VEuPathDB database to each browser. In particular, we will work to maximize the amount of information available on gene pages.
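At its core, harmonization means picking the most recent NCBI assembly version for each genome. A minimal sketch, assuming the candidate assembly accessions for each genome have already been retrieved from NCBI (retrieval is not shown), parses the version suffix of each accession and keeps the highest one. GenBank (GCA_) and RefSeq (GCF_) accessions are kept separate. The function name `latest_versions` and the sample accessions are hypothetical placeholders, not real release histories.

```python
import re
from typing import Dict, Iterable, Tuple

# NCBI assembly accessions have the form PREFIX_NNNNNNNNN.VERSION, where the
# prefix is GCA (GenBank) or GCF (RefSeq) and VERSION is an integer.
ACCESSION_RE = re.compile(r"^(GC[AF]_\d{9})\.(\d+)$")


def latest_versions(accessions: Iterable[str]) -> Dict[str, str]:
    """Map each versionless accession to its highest-versioned form."""
    best: Dict[str, Tuple[int, str]] = {}
    for acc in accessions:
        match = ACCESSION_RE.match(acc)
        if match is None:
            raise ValueError(f"not an NCBI assembly accession: {acc!r}")
        base, version = match.group(1), int(match.group(2))
        # Compare versions numerically so that .10 beats .9.
        if base not in best or version > best[base][0]:
            best[base] = (version, acc)
    return {base: acc for base, (_, acc) in best.items()}


if __name__ == "__main__":
    # Hypothetical placeholder accessions.
    candidates = ["GCA_000000001.2", "GCA_000000001.10", "GCF_000000001.1"]
    print(latest_versions(candidates))
    # {'GCA_000000001': 'GCA_000000001.10', 'GCF_000000001': 'GCF_000000001.1'}
```

Comparing versions as integers rather than strings matters once a version reaches 10, because "10" sorts before "9" as text.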

Search component implementation

A search component will be developed that allows users to run custom queries on all data. It will provide the functionality previously offered by VEuPathDB’s “search strategy” component.
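VEuPathDB search strategies let users chain searches and combine their results with boolean operations. A minimal sketch of that idea, assuming each search step returns a set of gene IDs and that steps are combined with intersect, union, or minus, represents a strategy as a small expression tree that is evaluated recursively. The classes `Search` and `Combine`, the function `evaluate`, the search names, and the gene IDs are all hypothetical and are not part of any existing BRC Analytics or VEuPathDB API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Union

GeneSet = FrozenSet[str]


@dataclass(frozen=True)
class Search:
    """Leaf step: a named query together with the gene IDs it returned."""
    name: str
    results: GeneSet


@dataclass(frozen=True)
class Combine:
    """Inner step: two sub-strategies joined by a set operation."""
    op: str  # one of the keys of OPS below
    left: "Step"
    right: "Step"


Step = Union[Search, Combine]

# Boolean operations for combining two result sets.
OPS: Dict[str, Callable[[GeneSet, GeneSet], GeneSet]] = {
    "intersect": lambda a, b: a & b,
    "union": lambda a, b: a | b,
    "minus": lambda a, b: a - b,
}


def evaluate(step: Step) -> GeneSet:
    """Recursively compute the gene IDs produced by a strategy tree."""
    if isinstance(step, Search):
        return step.results
    if step.op not in OPS:
        raise ValueError(f"unknown operation: {step.op!r}")
    return OPS[step.op](evaluate(step.left), evaluate(step.right))


if __name__ == "__main__":
    # Hypothetical searches and gene IDs.
    expressed = Search("expressed in liver stage", frozenset({"g1", "g2", "g3"}))
    secreted = Search("predicted signal peptide", frozenset({"g2", "g3", "g4"}))
    conserved = Search("has human ortholog", frozenset({"g3"}))
    # (expressed INTERSECT secreted) MINUS conserved
    strategy = Combine("minus", Combine("intersect", expressed, secreted), conserved)
    print(sorted(evaluate(strategy)))  # ['g2']
```

Because every node is itself a valid strategy, any intermediate step can be evaluated on its own by passing that subtree to `evaluate`.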

Develop analysis component

Develop best practices for common analyses

Develop and deploy robust analysis workflows for (1) transcriptomics (bulk and single-cell), (2) variant analysis, (3) genome assembly, (4) genome annotation, (5) regulation (ChIP-seq and related), and other areas as appropriate. This will be done in close collaboration with the research community, which will guide us based on current needs and research trends.

Ensure tight integration between data and analysis components

To increase the usability of BRC Analytics, a substantial amount of engineering will be devoted to making the interplay between the data and analysis components as seamless as possible. For example, selecting a genome during the search phase will automatically pre-fill the analysis step with the necessary reference data for that species, such as read-mapper indices, SnpEff databases, and other artifacts.
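The genome-to-analysis hand-off can be pictured as a lookup followed by a merge: the selected assembly's precomputed reference artifacts are filled into the workflow parameters unless the user has already supplied a value. This is a minimal sketch under assumed names; the `ReferenceArtifacts` schema, the `REGISTRY` contents, the parameter keys, the file paths, and the placeholder accession are hypothetical and do not describe how BRC Analytics actually stores or passes reference data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReferenceArtifacts:
    """Precomputed reference data for one assembly (hypothetical schema)."""
    fasta: str
    read_mapper_index: Optional[str] = None
    snpeff_db: Optional[str] = None


# Hypothetical registry keyed by a placeholder assembly accession; the paths
# are invented for illustration.
REGISTRY: Dict[str, ReferenceArtifacts] = {
    "GCA_000000001.1": ReferenceArtifacts(
        fasta="refs/GCA_000000001.1/genome.fa",
        read_mapper_index="refs/GCA_000000001.1/index/",
        snpeff_db="refs/GCA_000000001.1/snpeff/",
    ),
}


def prefill(accession: str, params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of params with reference inputs for `accession` added.

    Keys the user has already set are left unchanged.
    """
    refs = REGISTRY.get(accession)
    if refs is None:
        raise KeyError(f"no reference artifacts registered for {accession}")
    defaults: Dict[str, Optional[str]] = {
        "reference_fasta": refs.fasta,
        "read_mapper_index": refs.read_mapper_index,
        "snpeff_db": refs.snpeff_db,
    }
    filled = dict(params)  # copy so the caller's dict is not mutated
    for key, value in defaults.items():
        # Skip artifacts that were never built for this assembly.
        if value is not None:
            filled.setdefault(key, value)
    return filled


if __name__ == "__main__":
    user_params = {"reads": "sample_R1.fastq.gz", "snpeff_db": "custom/snpeff/"}
    print(prefill("GCA_000000001.1", user_params))
```

Using `setdefault` means user-supplied values always take precedence over the defaults, so a researcher could still point a workflow at a custom SnpEff database.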

Develop training component

Training and outreach activities are essential to our efforts. To reflect their importance, we will provide tutorials, workshops and training materials, and the infrastructure needed to enable training events worldwide. Our training will include step-by-step interactive tutorials, accessible directly from the Galaxy interface, for learning the available features, as well as a service for reserving and monitoring the computational resources needed to run live and online workshops anywhere in the world. A globally distributed yearly event, known as Smörgasbord, is dedicated to community-suggested topics and regularly attracts thousands of online participants. To achieve these goals, we will leverage the highly successful Galaxy Training Network (GTN).