CMU-CS-20-131
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-20-131

Coding for Synchronization Errors

Amirbehshad Shahrasbi

Ph.D. Thesis

September 2020

CMU-CS-20-131.pdf


Keywords: Synchronization, Error-Correction for Synchronization Errors, Coding for Insertions and Deletions, Synchronization Strings, List Decoding for Insertions and Deletions, Interactive Coding for Synchronization Errors, Edit Distance Computation

Coding theory is the study of algorithms and techniques that facilitate reliable information transmission over noisy mediums, most notably through combinatorial objects called error-correcting codes. Following the inspiring works of Shannon and Hamming, a sophisticated and extensive body of research on error-correcting codes has led to a deep and detailed theoretical understanding as well as practical implementations that have helped fuel the digital revolution. Error-correcting codes can be found in essentially all modern communication, computation, and data storage systems. While being remarkably successful in understanding the theoretical limits and trade-offs of reliable communication under errors and erasures, the coding theory literature significantly lags behind when it comes to overcoming errors that concern the timing of communications. In particular, the study of correcting synchronization errors, i.e., symbol insertions and deletions, while initially introduced by Levenshtein in the 60s, has significantly fallen behind our highly sophisticated knowledge of codes for Hamming-type errors.

This thesis investigates coding against synchronization errors under a variety of models and attempts to understand trade-offs between different qualities of interest in respective coding schemes such as rate, distance, and algorithmic qualities of the code. Most of the presented results rely on synchronization strings, simple yet powerful pseudorandom objects introduced in this work that have proven to be very effective solutions for coping with synchronization errors in various settings.

Through indexing with strings that satisfy certain pseudo-random properties, we provide synchronization codes that achieve near-optimal rate-distance trade-off. We further attempt to provide constructions that enable fast encoding/decoding procedures. We study the same problem under the list-decoding regime, where the decoder is expected to provide a short list of codewords that is guaranteed to contain the sent message. We will also try to better understand the fundamental limits of list-decoding for synchronization errors such as the list-decoding capacity or maximal error resilience for list-decodable synchronization codes. This thesis furthermore studies synchronization strings and other related pseudo-random string properties as combinatorial objects that are of independent interest. Such combinatorial objects will be used to extend some of our techniques to alternative communication problems such as coding from block transposition errors or coding for interactive communication.

297 pages

Thesis Committee:
Bernard Haeupler (Chair)
Venkatesan Guruswami
Rashmi Vinayak
Mahdu Sudan (Harvard University)
Sergey Yekhanin (Microsoft Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu