OneOligo: Using oneAPI to Accelerate DNA Data Storage
In OligoArchive, a European Commission-funded Future and Emerging Technologies initiative, we are working on transforming DNA, the biological building block of life, into a digital building block for long-term data archival. One of the key steps in retrieving digital data stored in DNA is clustering billions of strings by edit distance. Because edit distance is computationally expensive, this step has become a critical bottleneck in the DNA data retrieval pipeline. In this talk, we will present project OneOligo, our scalable, hardware-accelerated solution for DNA read clustering. We will first provide an overview of the DNA data storage pipeline. Then, we will present OneJoin, a string-similarity join algorithm that combines algorithmic advances in low-distortion embedding with the cross-architecture programming ability offered by DPC++ to scale up clustering across CPUs and GPUs.
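To make the cost of the clustering step concrete, the following is a minimal sketch of the textbook dynamic-programming edit distance (Levenshtein distance) in plain C++. It is not OneJoin and not code from the OneOligo project; the function name edit_distance is a hypothetical one chosen for illustration. It only shows why comparing every pair of reads directly is expensive: each comparison takes time proportional to the product of the two string lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: textbook edit (Levenshtein) distance using two rows
// of the dynamic-programming table. The name edit_distance is hypothetical
// and is not taken from the OneOligo or OneJoin code.
// Time: O(|a| * |b|). Memory: O(|b|).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Distance from the empty prefix of a to each prefix of b: j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        // Distance from the first i characters of a to the empty string.
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Substitution costs 1 unless the characters already match.
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            // Take the cheapest of substitution, deletion, and insertion.
            curr[j] = std::min({subst, prev[j] + 1, curr[j - 1] + 1});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

For example, edit_distance("ACGT", "AGT") is 1 (one deletion). Running this quadratic routine on every pair among billions of reads is infeasible, which is the motivation for first mapping strings into a space where candidate pairs can be found more cheaply and for spreading the remaining work across CPUs and GPUs.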