Using ROOT in the field of genome sequencing
Description
ROOT is a data processing framework born at CERN, at the heart of high-energy physics (HEP) research. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as I/O, a C++ interpreter, a GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer its components. We believe additional layering formalisms will benefit ROOT and its users.
ROOT has scientific uses beyond high-energy physics. Several studies have shown promising applications of the ROOT I/O system in genome sequencing. This project aims to extend the capabilities already developed in GeneROOT and to better understand the requirements of the field.
Expected results
- Reproduce the results of previous comparisons against ROOT master
- Investigate and compare the latest compression strategies used by Samtools for conversions to BAM with those of RAM (ROOT Alignment Maps)
- Explore ROOT’s RNTuple format to efficiently store RAM maps, in place of the previously used TTree
- Investigate different ROOT file splitting techniques
- Produce a comparison report
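
As a rough illustration of what a compression comparison might look like, here is a minimal Python sketch that uses only the standard library. It is not part of GeneROOT and it calls neither Samtools nor ROOT. The helper names `synthetic_sam_records` and `compression_report` are hypothetical, and the input is synthetic, SAM-like text rather than real sequencing data. BAM stores its data in BGZF blocks, which are DEFLATE-based, so `zlib` stands in for that side here; `lzma` and `bz2` are included only as alternative codecs to compare against.

```python
import bz2
import lzma
import random
import zlib


def synthetic_sam_records(n_records, read_length=50, seed=0):
    """Build n_records tab-separated lines loosely shaped like SAM alignments.

    Hypothetical helper: the records are random and only mimic the layout.
    """
    rng = random.Random(seed)
    lines = []
    position = 1
    for i in range(n_records):
        # Increasing positions, as in a coordinate-sorted alignment file.
        position += rng.randint(1, 20)
        sequence = "".join(rng.choice("ACGT") for _ in range(read_length))
        quality = "".join(rng.choice("FGHI") for _ in range(read_length))
        fields = [f"read{i}", "0", "chr1", str(position), "60",
                  f"{read_length}M", "*", "0", "0", sequence, quality]
        lines.append("\t".join(fields))
    return ("\n".join(lines) + "\n").encode("ascii")


def compression_report(payload):
    """Return {codec_name: compressed_size_in_bytes} for a few stdlib codecs."""
    codecs = {
        "zlib": lambda data: zlib.compress(data, 6),
        "bz2": lambda data: bz2.compress(data, 9),
        "lzma": lambda data: lzma.compress(data),
    }
    return {name: len(compress(payload)) for name, compress in codecs.items()}


payload = synthetic_sam_records(2000)
report = compression_report(payload)

# Print codecs from smallest to largest compressed size.
for name, size in sorted(report.items(), key=lambda item: item[1]):
    print(f"{name:5s} {size:8d} bytes  ratio {len(payload) / size:.2f}")
```

A real comparison for this project would run on actual alignment files and would also measure read and write throughput, since file size alone does not capture the trade-offs between codecs.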
Requirements
- C++ and Python programming
- Familiarity with Git
- Knowledge of ROOT and/or the BAM file format is a plus
Links
Mentors
- Martin Vasilev - Uni Plovdiv
- Jonas Rembser - CERN
- Fons Rademakers - CERN
Additional Information
- Difficulty level (low / medium / high): medium
- Duration: 350 hours
- Mentor availability: June-November