CSAMA 2016 – Statistical Data Analysis for Genome Biology

CSAMA 2016 (14th edition)
Statistical Data Analysis for Genome Biology
Bressanone-Brixen, Italy (South Tyrol Alps)
July 10-15, 2016

Lecturers:


  • Simon Anders, Institute for Molecular Medicine, Helsinki
  • Jennifer Bryan, University of British Columbia, Vancouver
  • Vincent J. Carey, Channing Laboratory, Harvard Medical School
  • Wolfgang Huber, European Molecular Biology Laboratory (EMBL), Heidelberg
  • Michael Love, Dana-Farber Cancer Institute and the Harvard School of Public Health
  • Martin Morgan, Roswell Park Cancer Institute, Buffalo, New York
  • Charlotte Soneson, University of Zurich
  • Levi Waldron, CUNY School of Public Health at Hunter College, New York

Teaching Assistants:

  • Simone Bell, EMBL, Heidelberg
  • Alejandro Reyes, EMBL, Heidelberg
  • Mike L. Smith, EMBL, Heidelberg

The one-week intensive course Statistical Data Analysis for Genome Biology teaches statistical and computational analysis of multi-omics studies in biology and biomedicine. The morning lectures cover the underlying theory and the state of the art; the afternoon labs are practical hands-on exercises based on the R / Bioconductor environment. The course covers the primary analysis of high-throughput sequencing-based assays in functional genomics as well as integrative methods, including efficient operations on genomic intervals, statistical testing, linear models, machine learning, bioinformatic annotation and visualization. By the end of the course, you should be able to run analysis workflows on your own (multi-)omic data, adapt and combine different tools, and make informed and scientifically sound choices about analysis strategies.

Topics include:

  • Introduction to Bioconductor
  • Elements of statistics: hypothesis testing, multiple testing, regression, regularization, clustering and classification, parallelization and performance (machine learning), visualization
  • Reproducible research and R authoring with markdown and knitr
  • RNA-Seq data analysis and differential expression
  • New workflows for RNA-Seq
  • Computing with sequences and genomic intervals
  • End-to-end RNA-Seq workflow
  • Experimental design, batch effects and confounding
  • Working with annotation – genes, genomic features and variants
  • Visualization, the grammar of graphics and ggplot2
  • Use of Git and GitHub with R, RStudio, and R Markdown
  • Gene set enrichment analysis

The course consists of

  • 20 morning lectures of 45 minutes each, Monday to Friday, 8:30h – 12:00h
  • 4 practical computer tutorials in the afternoons (14:00h – 17:00h) on Monday, Tuesday, Thursday and Friday

Visit the course’s website at: http://www.huber.embl.de/csama