
CMUCS03134
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMUCS03134
Uniquely Decodable ngram Embeddings
Leonid Kontorovich
April 2003
CMUCS03134.ps
CMUCS03134.pdf
Keywords: ngram, embedding, finite state automata,
strings
We define the family of ngram embeddings from strings over a finite
alphabet into the semimodule N^K. We classify all x in N^K
that are valid images of strings under such embeddings, as well as
all x whose inverse image consists of exactly 1 string (we
call such x uniquely decodable). We prove
that for a fixed alphabet, the set of all strings whose image is
uniquely decodable is a regular language.
11 pages
