Precision Recovery in Lossy-Compressed Floating Point Data for High Energy Physics
Description
ATLAS is one of the particle physics experiments at the Large Hadron Collider (LHC) at CERN. With the planned upgrade of the LHC (the so-called High-Luminosity phase), which will allow even more detailed exploration of the fundamental particles and forces of nature, the recorded data rate is expected to be up to ten times greater than it is today. One way of addressing this storage challenge is data compression. The traditional approach relies on lossless compression algorithms such as zstd and zlib. To reduce the storage footprint further, methods involving lossy compression are being investigated. One option in High Energy Physics is the reduction of floating-point precision, since the stored precision may be higher than the detector resolution. However, when reading the data back, physicists may be interested in restoring the precision of the floating-point numbers. This is impossible in the strict sense, as the process of removing bits is irreversible. Nevertheless, given that the data volume is large and that many variables are correlated and follow specific distributions, a machine learning approach can be considered to recover the lossy-compressed floating-point data.
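To make the precision-reduction idea concrete, here is a minimal sketch (in Python with NumPy; it is an illustration, not taken from the ATLAS software) that zeroes the low mantissa bits of IEEE-754 float32 values. The number of retained bits, `keep_bits`, is a free parameter of the illustration:

```python
import numpy as np

def truncate_mantissa(values: np.ndarray, keep_bits: int) -> np.ndarray:
    """Zero out the lowest (23 - keep_bits) mantissa bits of float32 values.

    Sign and exponent are kept, so the dynamic range is unchanged, but the
    relative precision drops to roughly 2**-keep_bits. The operation is
    irreversible: the dropped bits cannot be read back.
    """
    bits = values.astype(np.float32).view(np.uint32)
    drop = 23 - keep_bits                              # float32 has a 23-bit mantissa
    mask = np.uint32(0xFFFFFFFF & ~((1 << drop) - 1))  # keep only the high bits
    return (bits & mask).view(np.float32)

x = np.array([3.14159265], dtype=np.float32)
print(truncate_mantissa(x, keep_bits=10))  # ~3.140625 with 10 mantissa bits kept
```

Truncation is used here for simplicity; rounding to the nearest representable value instead would roughly halve the maximum error at the same bit budget.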
Task ideas
- Perform lossy compression of a data sample from the ATLAS experiment
- Investigate ML techniques for data recovery, prediction, and upscaling (a minimal sketch follows this list)
- Integrate the chosen technique into the HEP workflow
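One possible starting point for the recovery task, offered purely as a sketch and not as a prescribed ATLAS method, is to train a regressor that predicts the truncation residual of one variable from other, correlated variables that survive compression. All variable names, distributions, and the choice of scikit-learn's GradientBoostingRegressor below are invented for illustration; real inputs would come from a lossy-compressed ATLAS sample:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def truncate_mantissa(values, keep_bits=10):
    # Same precision reduction as in the earlier sketch.
    bits = values.astype(np.float32).view(np.uint32)
    mask = np.uint32(0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1))
    return (bits & mask).view(np.float32)

rng = np.random.default_rng(42)

# Synthetic stand-ins for two correlated event variables (e.g. a transverse
# momentum and an energy of the same object).
pt = rng.exponential(scale=50.0, size=20_000).astype(np.float32)
energy = (pt * np.cosh(rng.normal(0.0, 1.0, size=pt.size))).astype(np.float32)

pt_lossy = truncate_mantissa(pt)
energy_lossy = truncate_mantissa(energy)

# Learn the residual (true - truncated) of pt from the surviving information.
X = np.column_stack([pt_lossy, energy_lossy])
y = (pt - pt_lossy).astype(np.float64)
split = 16_000
model = GradientBoostingRegressor().fit(X[:split], y[:split])

pt_recovered = pt_lossy[split:] + model.predict(X[split:]).astype(np.float32)
```

Other candidate techniques (e.g. autoencoders or normalizing flows that learn the joint distribution of the variables) would slot into the same place as the regressor here.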
Expected results
- Implementation of an ML-based procedure to restore the precision of lossy-compressed floating-point numbers in ATLAS data
- Evaluation of the method’s performance (decompression accuracy) and of its applicability in the HEP workflow; one possible accuracy metric is sketched below
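For the accuracy evaluation, one possible (suggested, not mandated) metric is the per-element relative error before and after recovery, summarized by its mean and a high percentile:

```python
import numpy as np

def accuracy_report(truth, lossy, recovered):
    """Print relative-error summaries before and after ML-based recovery."""
    for name, approx in (("lossy", lossy), ("recovered", recovered)):
        rel = np.abs(approx - truth) / np.maximum(np.abs(truth), np.float32(1e-12))
        print(f"{name:>9}: mean |rel err| = {rel.mean():.3e}, "
              f"95th percentile = {np.percentile(rel, 95):.3e}")

# Continuing the previous sketch:
# accuracy_report(pt[split:], pt_lossy[split:], pt_recovered)
```

A successful method should shrink both numbers for "recovered" relative to "lossy" without distorting the physical distributions of the variables.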
Requirements
- C++, Python, Machine Learning
Mentors
- Maciej Szymański - ANL
- Peter Van Gemmeren - ANL
Additional Information
- Difficulty level: medium
- Duration: 350 hours
- Mentor availability: July-September