Go-HEP/groot - Create a pure-Go dataframe package to access ROOT TTrees with groot
Description
Go-HEP is a set of libraries and applications allowing High Energy Physicists (HEP) to write efficient analysis code in the Go programming language.
Go brings the fast edit-compile-run cycle that users of interpreted languages know and the runtime efficiency that users of compiled languages expect.
Go-HEP provides the needed HEP-oriented packages on top of this concurrency-enabled language.
The Go-HEP project currently provides limited read access to ROOT files, the binary format that all LHC experiments use to store data, via its groot package.
groot can also create ROOT files containing histograms and graphs.
However, Go-HEP is missing a pure-Go library to present ROOT TTrees as dataframes. The Gonum organization is in the process of developing a dataframe package, built off the Apache Arrow project.
Tasks
The proposed project aims to implement a thin API on top of the groot/rtree package that exposes any ROOT TTree as a dataframe, and in doing so to give Gonum feedback on its proposed dframe package.
We propose the following steps:
- extract a ROOT TTree's schema and convert it into an Apache Arrow schema
- create a dataframe off an Arrow schema and connect it to the low-level data representation of a TTree
- implement convenience functions to:
  - create a 1-dim histogram off a dataframe
  - create a 2-dim histogram off a dataframe
  - create 2-dim scatters off a dataframe
- implement the ability to add user columns to a dataframe, based off a user-provided function depending on other columns from the dataframe
- implement the ability to easily plot quantities or columns from a dataframe
- implement the ability to save a dataframe as a ROOT TTree
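
To make two of the steps above more concrete (user columns computed from other columns, and a 1-dim histogram built from a column), here is a minimal sketch that uses only the Go standard library. It is not the groot, Arrow, or Gonum dframe API: the Frame type and its AddColumn, Derive, and Hist1D methods are hypothetical names invented for this illustration, and plain float64 slices stand in for Arrow arrays or TTree branches.

```go
package main

import (
	"fmt"
	"math"
)

// Frame is a hypothetical, minimal columnar table: one float64 slice per
// column. It stands in for a real dataframe backed by Arrow arrays.
type Frame struct {
	names []string
	cols  map[string][]float64
	nrows int
}

// NewFrame returns an empty frame.
func NewFrame() *Frame { return &Frame{cols: make(map[string][]float64)} }

// AddColumn attaches a column; all columns must have the same length.
func (f *Frame) AddColumn(name string, data []float64) error {
	if _, dup := f.cols[name]; dup {
		return fmt.Errorf("frame: duplicate column %q", name)
	}
	if len(f.names) > 0 && len(data) != f.nrows {
		return fmt.Errorf("frame: column %q has %d rows, want %d", name, len(data), f.nrows)
	}
	f.names = append(f.names, name)
	f.cols[name] = data
	f.nrows = len(data)
	return nil
}

// Derive adds a user column computed row by row from existing columns.
// fn receives the values of the input columns for the current row.
func (f *Frame) Derive(name string, fn func(args []float64) float64, inputs ...string) error {
	srcs := make([][]float64, len(inputs))
	for i, in := range inputs {
		col, ok := f.cols[in]
		if !ok {
			return fmt.Errorf("frame: unknown column %q", in)
		}
		srcs[i] = col
	}
	out := make([]float64, f.nrows)
	args := make([]float64, len(inputs))
	for row := 0; row < f.nrows; row++ {
		for i, src := range srcs {
			args[i] = src[row]
		}
		out[row] = fn(args)
	}
	return f.AddColumn(name, out)
}

// Hist1D counts the entries of a column in nbins equal-width bins over
// [xmin, xmax). Values outside the range and NaNs are skipped.
func (f *Frame) Hist1D(name string, nbins int, xmin, xmax float64) ([]int, error) {
	col, ok := f.cols[name]
	if !ok {
		return nil, fmt.Errorf("frame: unknown column %q", name)
	}
	if nbins <= 0 || !(xmax > xmin) {
		return nil, fmt.Errorf("frame: invalid binning (%d bins over [%g, %g))", nbins, xmin, xmax)
	}
	counts := make([]int, nbins)
	width := (xmax - xmin) / float64(nbins)
	for _, v := range col {
		if math.IsNaN(v) || v < xmin || v >= xmax {
			continue
		}
		bin := int((v - xmin) / width)
		if bin >= nbins { // guard against floating-point rounding at the upper edge
			bin = nbins - 1
		}
		counts[bin]++
	}
	return counts, nil
}

func main() {
	f := NewFrame()
	if err := f.AddColumn("px", []float64{3, 0, 6, 1}); err != nil {
		panic(err)
	}
	if err := f.AddColumn("py", []float64{4, 1, 8, 1}); err != nil {
		panic(err)
	}
	// User column: transverse momentum computed from px and py.
	pt := func(a []float64) float64 { return math.Hypot(a[0], a[1]) }
	if err := f.Derive("pt", pt, "px", "py"); err != nil {
		panic(err)
	}
	h, err := f.Hist1D("pt", 2, 0, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // prints [2 1]
}
```

In an actual implementation the columns would presumably be read lazily from the TTree rather than held as in-memory slices, and the histogram would more likely be filled into a Go-HEP histogram type than into a slice of counts; the sketch only illustrates the shape such a convenience API could take.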
Expected results
A package that can expose a ROOT TTree as a dataframe, usable for various High Energy Physics analyses or data science tasks.
Requirements
- Go
- git