Efficient Genomic Compression using a Hierarchy of Block Codes and Novel Alignment Algorithm

Yuval Cassuto, Department of ECE, Technion - Israel Institute of Technology


The sequencing of genomic (DNA) data is an emerging field with vast applications in modern biological and medical research. The exploding amount of generated data requires the use of powerful compression tools. In this talk, I will present a new compression scheme for genomic data given as fragments called “reads”; the scheme has extremely low encoding complexity compared to state-of-the-art tools. The reads are assumed to be similar to segments of a reference sequence that is available to the decoder only. Our multi-layered code construction therefore carries the information needed both for aligning the (coded) read within the reference and for reconstructing the read from a similar reference segment. We first present the scheme for the case of only substitution errors between the reads and the reference, and then extend it to support reads with a single deletion and multiple substitutions. A central tool in this extension is an alignment algorithm that uses a new distance metric, which is shown analytically to improve alignment performance over existing distance metrics.

Bio: Yuval Cassuto is an Associate Professor at the Viterbi Department of Electrical and Computer Engineering, Technion – Israel Institute of Technology. His research interests lie at the intersection of the theoretical information sciences and the engineering of practical computing and storage systems. He has served on the technical program committees of leading conferences in both theory and systems. During 2010-2011 he was a Scientist at EPFL, the Swiss Federal Institute of Technology in Lausanne. From 2008 to 2010 he was a Research Staff Member at Hitachi Global Storage Technologies, San Jose Research Center. In 2018-2019 he held a Visiting Professor position at Western Digital Research and a Visiting Scholar position at UC Berkeley. He received the B.Sc. degree in Electrical Engineering, summa cum laude, from the Technion in 2001, and the M.S. and Ph.D. degrees in Electrical Engineering from the California Institute of Technology in 2004 and 2008, respectively. From 2000 to 2002, he was with Qualcomm, Israel R&D Center, where he worked on modeling, design, and analysis in wireless communications. Dr. Cassuto won the Best Student Paper Award in data storage from the IEEE Communications Society in 2010 as a student and in 2019 as an adviser. He also won faculty awards from Qualcomm, Intel, and Western Digital. As an undergraduate student, he won the 2001 Texas Instruments DSP and Analog Challenge $100,000 prize.
