Publications

Detailed Information

Space Efficient Encodings for Bit-strings, Range queries and Related Problems

Cited 0 time in Web of Science Cited 0 time in Scopus
Authors

조승범

Advisor
Srinivasa Rao Satti
Major
공과대학 전기·컴퓨터공학부
Issue Date
2016-02
Publisher
서울대학교 대학원
Keywords
Space-efficient data structuresuccinct data structureencoding modelindexing modelbitvector\rank{} query\select{} querynearest larger neighbor problemrange queriesnext/previous larger queryrange topk query
Description
학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. Srinivasa Rao Satti.
Abstract
In this thesis, we design and implement various space efficient data structures. Most of these structures use spaces close to the information-theoretic lower bound while supporting the queries efficiently.
In particular, this thesis is concerned with the data structures for four problems: (i) supporting \rank{} and \select{} queries on compressed bit strings, (ii) nearest larger neighbor problem, (iii) simultaneous encodings for range and next/previous larger/smaller value queries, and (iv) range \topk{} queries on two-dimensional arrays.

We first consider practical implementations of \emph{compressed} bitvectors, which support \rank{} and \select{} operations on a given bit-string, while storing the bit-string in compressed form~\cite{DBLP:conf/dcc/JoJORS14}. Our approach relies on \emph{variable-to-fixed} encodings
of the bit-string, an approach that has not yet been considered systematically for practical encodings of bitvectors. We show that this approach leads to fast practical implementations with low \emph{redundancy} (i.e., the space used by the bitvector in addition to the compressed representation of the bit-string), and is a flexible and promising solution to the problem of supporting
\rank{} and \select{} on moderately compressible bit-strings, such as those encountered in real-world applications.

Next, we propose space-efficient data structures for the nearest larger neighbor problem~\cite{IWOCA2014,walcom-JoRS15}. Given a sequence of $n$ elements from a total order, and a position in the sequence, the nearest larger neighbor (\NLV{}) query returns the position of the
element which is closest to the query position, and is larger than the element at the query position. The
problem of finding all nearest larger neighbors has attracted interest due to its applications for
parenthesis matching and in computational geometry~\cite{AsanoBK09,AsanoK13,BerkmanSV93}.
We consider a data structure version of this problem, which is to preprocess a given sequence of elements to construct a data structure that can answer \NLN{} queries efficiently. For one-dimensional arrays, we give time-space tradeoffs for the problem on \textit{indexing model}. For two-dimensional arrays, we give an optimal encoding with constant query on \textit{encoding model}.

We also propose space-efficient encodings which support various range queries, and previous and next smaller/larger value queries~\cite{cocoonJS15}. Given a sequence of $n$ elements from a total order, we obtain a $4.088n + o(n)$-bit encoding that supports all these queries where $n$ is the length
of input array. For the case when we need to support all these queries in constant time, we give an encoding that takes $4.585n + o(n)$ bits. This improves the $5.08n+o(n)$-bit encoding obtained by encoding the colored $2d$-Min and $2d$-Max heaps proposed by Fischer~\cite{Fischer11}.
We extend the original DFUDS~\cite{BDMRRR05} encoding of the colored $2d$-Min and $2d$-Max heap that supports the queries in constant time. Then, we combine the extended DFUDS of $2d$-Min heap and $2d$-Max heap using the Min-Max encoding of Gawrychowski and Nicholson~\cite{Gawry14}
with some modifications. We also obtain encodings that take lesser space and support a subset of these queries.

Finally, we consider the various encodings that support range \topk{} queries on a two-dimensional array containing elements from a total order. For an $m \times n$ array, we first propose an optimal encoding for answering one-sided \topk{} queries, whose query range is restricted to $[1 \dots m][1 \dots a]$, for $1 \le a \le n$. Next, we propose an encoding for the general \topk{} queries that takes
$m^2\lg{{(k+1)n \choose n}} + m\lg{m}+o(n)$ bits. This generalizes the \topk{} encoding of Gawrychowski and Nicholson~\cite{Gawry14}.
Language
English
URI
https://hdl.handle.net/10371/119143
Files in This Item:
Appears in Collections:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share