New Software and Project Highlights
DrugScreenExplorer online platform
The DrugScreenExplorer online platform was created as an interactive data visualisation and exploration tool for the analysis of multiple types of data. Detailed information about this platform is available in the Technical Report describing the data structure and methods for operating on screening data (PDF).
Motivated by the continuous progress in sequencing technology, Varbench was designed as a generalised workflow and open-source software package to meet the need for robust, standardised application of existing methods in liquid biopsy data analysis. Detailed information about Varbench is available in the document Open-source software for the standardized application of existing methods in liquid biopsy data analysis, supported by online catalogue of existing methods and data resources (PDF).
The open-source software CAUSEXPR was developed to identify causal genetic mutations and pathways in mitochondrial disorders by integrating genotype and gene expression data. CAUSEXPR is described in detail in the report Open-source software CAUSEXPR, to prioritize likely causal mutations from genotype and gene expression (PDF).
The TCGAbrowser is an open-source web portal for mining and analysing tumour data from The Cancer Genome Atlas (TCGA). It integrates multiple omic and clinical variables, with the goal of predicting treatment outcome and supporting clinical decision-making. The software is described in detail in the document Open-source software supporting the molecular tumour characterization pipeline and predicting perturbed pathways (PDF).
Updated version of Renjin
Renjin is an interpreter for the R language written in Java. Its compatibility with the GNU R interpreter, which serves as the reference implementation of the R language, has been improved to allow the use of bioinformatic workflows and Bioconductor packages. An open library of benchmarks based on real use cases was also created and will be used to measure performance improvements; it is available at http://github.com/bedatadriven/renjin-benchmarks. The software is described in detail in the document Open-source software implementing an updated version of Renjin that supports the major Bioconductor packages; a code library of benchmarks; and a technical report describing the most significant performance bottlenecks (PDF).
Approaches for outlier detection in patient ‘omics data
Robust methods for outlier detection have been reviewed and extended to identify outliers in patient ‘omics data. Outliers may correspond to experimental errors, but also to special medical cases that deserve further analysis. Detecting outliers is expected to improve prognosis, while providing deeper insight into the biology of the disease. Read more in the Technical report on approaches to outlier detection in patient ‘omics data (PDF).
Causal stability ranking for high-dimensional genotype data
Improving treatment decisions and options based on expertise in sequencing technologies and molecular data analysis is now possible through a pipeline developed, employed and maintained by the ETH technology platform NEXUS to support the Molecular Tumor Board of the University Hospitals in Zurich and Basel. Detailed information is available in the document Open-source software that implements causal stability ranking for high-dimensional genotype data (PDF).
New disease entities
Existing and novel methods developed in WP3 (Methods for Genetic Disorders) were applied to identify potentially pathogenic variants in patients with suspected mitochondrial disorders. The Technical report on new disease entities that were identified using novel statistical methods (PDF) describes the methods in detail.
Database of solved cases
A database of genome-wide genotypes of solved and unsolved patients with suspected mitochondrial disorders, suitable for automated benchmarking, was established at TUM-MED in collaboration with TUM. Read more (PDF).
Definitions and best practice guidelines
A common set of criteria and terminology in somatic mutation calling can be found in the technical report “Uncertainty in Somatic Mutation Calling”, accompanied by the open-source software package VARMISS. Different benchmarks performed by different groups in the project will therefore be comparable. The Technical report with definitions and best practice guidelines; and software package for simulation of experimental design and choice of validation sites in cancer genomics is available as (PDF).
Network centrality metrics for Elastic-Net regularized models
glmSparseNet is an R package that generalizes sparse regression models to cases where the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet builds on the glmnet R package and uses centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of a node's associated edges, and can promote either hubs or orphan genes in the solution. All the glmnet distribution families are supported, namely “gaussian”, “poisson”, “binomial”, “multinomial”, “cox”, and “mgaussian”. (glmSparseNet).
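The idea of using node degree as a penalty weight can be illustrated with a minimal sketch in plain Python. It is not glmSparseNet code: the function names, the toy network, the degree scaling and the `floor` parameter are all assumptions made for illustration, and the package's own heuristics may differ. The sketch computes the weighted degree of each gene, turns it into a per-feature penalty weight that favours either hubs or orphans, and shows how such a weight scales the shrinkage threshold in a lasso-type update (in glmnet, per-feature weights of this kind are passed through the `penalty.factor` argument).

```python
from collections import defaultdict


def node_degrees(edges, nodes):
    """Weighted degree of each node: the sum of the weights of its edges."""
    degree = defaultdict(float)
    for u, v, weight in edges:
        degree[u] += weight
        degree[v] += weight
    # Nodes without any edge get degree 0.0.
    return {n: degree[n] for n in nodes}


def penalty_weights(degrees, promote="hubs", floor=0.05):
    """Map degrees to per-feature penalty weights in [floor, 1].

    Hypothetical scaling, not glmSparseNet's own heuristic: degrees are
    normalised by the maximum degree. With promote="hubs", well-connected
    nodes get a small penalty; with promote="orphans", poorly connected
    nodes get a small penalty.
    """
    max_degree = max(degrees.values()) or 1.0  # avoid division by zero
    weights = {}
    for node, d in degrees.items():
        scaled = d / max_degree
        raw = 1.0 - scaled if promote == "hubs" else scaled
        weights[node] = max(raw, floor)
    return weights


def soft_threshold(z, threshold):
    """Lasso shrinkage operator used in coordinate descent."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


# Toy protein-protein interaction network (made-up edges, unit weights).
genes = ["TP53", "MDM2", "BRCA1", "GENE_X"]
edges = [("TP53", "MDM2", 1.0), ("TP53", "BRCA1", 1.0)]

degrees = node_degrees(edges, genes)
hub_weights = penalty_weights(degrees, promote="hubs")
orphan_weights = penalty_weights(degrees, promote="orphans")

# The same unpenalised coefficient estimate (0.4) shrinks differently per
# gene, because the effective threshold is lambda * penalty weight.
lam = 0.5
shrunk = {g: soft_threshold(0.4, lam * hub_weights[g]) for g in genes}
```

With the hub-promoting weights, the best-connected gene in the toy network keeps most of its coefficient, while the unconnected gene is shrunk to zero; swapping in `orphan_weights` reverses that preference.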