Resumen:
|
Juha Kärkkäinen String analysis is nowadays a popular research field, both for
its direct practical applications and the way it can be integrated into other fields,
which can themselves be highly benefitted from its ...[+]
Juha Kärkkäinen String analysis is nowadays a popular research field, both for
its direct practical applications and the way it can be integrated into other fields,
which can themselves be highly benefitted from its advances. There are three basic
queries over strings that are substantial for their study (specially when analyzing
substrings of the original text) as a lot of different operations can be reformulated
as some expression of them.
Those queries are access, which returns the symbol stored in a given position
(trivial for standard representations), rank, which counts the number of occurrences
of a symbol until a given position, and select, used to find the position of
the string in which a certain occurrence of the symbol takes place.
The tricky part comes when we want to perform those queries efficiently over
large texts, as it is usual in string analysis, and at the same time preventing the
space necessary to skyrocket. Fortunately, it is possible to infer the results for
arbitrary alphabets from those obtained on binary texts, and when the alphabet
only comprises two symbols we can come upon structures that allow very fast
queries while still being space-efficient.
In this project I will review and try to optimize some of those structures (compact
representations of the original binary strings), from the most intuitive (and
spatially more expensive) to the succinct structures that, although more complex,
approach the theoretical lower bound. For all of them a method for performing
rank and select operations in constant time will be given.
I will finish with the study of some applications of these structures on different
problems or algorithms, as an example to illustrate how the effort put into this
topic is indeed indirectly beneficial for many others.
[-]
|