Publications

Detailed Information

Space Efficient Encodings for Bit-strings, Range queries and Related Problems

DC Field Value Language
dc.contributor.advisorSrinivasa Rao Satti-
dc.contributor.author조승범-
dc.date.accessioned2017-07-13T07:12:34Z-
dc.date.available2017-07-13T07:12:34Z-
dc.date.issued2016-02-
dc.identifier.other000000132092-
dc.identifier.urihttps://hdl.handle.net/10371/119143-
dc.description학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. Srinivasa Rao Satti.-
dc.description.abstractIn this thesis, we design and implement various space efficient data structures. Most of these structures use spaces close to the information-theoretic lower bound while supporting the queries efficiently.
In particular, this thesis is concerned with the data structures for four problems: (i) supporting \rank{} and \select{} queries on compressed bit strings, (ii) nearest larger neighbor problem, (iii) simultaneous encodings for range and next/previous larger/smaller value queries, and (iv) range \topk{} queries on two-dimensional arrays.

We first consider practical implementations of \emph{compressed} bitvectors, which support \rank{} and \select{} operations on a given bit-string, while storing the bit-string in compressed form~\cite{DBLP:conf/dcc/JoJORS14}. Our approach relies on \emph{variable-to-fixed} encodings
of the bit-string, an approach that has not yet been considered systematically for practical encodings of bitvectors. We show that this approach leads to fast practical implementations with low \emph{redundancy} (i.e., the space used by the bitvector in addition to the compressed representation of the bit-string), and is a flexible and promising solution to the problem of supporting
\rank{} and \select{} on moderately compressible bit-strings, such as those encountered in real-world applications.

Next, we propose space-efficient data structures for the nearest larger neighbor problem~\cite{IWOCA2014,walcom-JoRS15}. Given a sequence of $n$ elements from a total order, and a position in the sequence, the nearest larger neighbor (\NLV{}) query returns the position of the
element which is closest to the query position, and is larger than the element at the query position. The
problem of finding all nearest larger neighbors has attracted interest due to its applications for
parenthesis matching and in computational geometry~\cite{AsanoBK09,AsanoK13,BerkmanSV93}.
We consider a data structure version of this problem, which is to preprocess a given sequence of elements to construct a data structure that can answer \NLN{} queries efficiently. For one-dimensional arrays, we give time-space tradeoffs for the problem on \textit{indexing model}. For two-dimensional arrays, we give an optimal encoding with constant query on \textit{encoding model}.

We also propose space-efficient encodings which support various range queries, and previous and next smaller/larger value queries~\cite{cocoonJS15}. Given a sequence of $n$ elements from a total order, we obtain a $4.088n + o(n)$-bit encoding that supports all these queries where $n$ is the length
of input array. For the case when we need to support all these queries in constant time, we give an encoding that takes $4.585n + o(n)$ bits. This improves the $5.08n+o(n)$-bit encoding obtained by encoding the colored $2d$-Min and $2d$-Max heaps proposed by Fischer~\cite{Fischer11}.
We extend the original DFUDS~\cite{BDMRRR05} encoding of the colored $2d$-Min and $2d$-Max heap that supports the queries in constant time. Then, we combine the extended DFUDS of $2d$-Min heap and $2d$-Max heap using the Min-Max encoding of Gawrychowski and Nicholson~\cite{Gawry14}
with some modifications. We also obtain encodings that take lesser space and support a subset of these queries.

Finally, we consider the various encodings that support range \topk{} queries on a two-dimensional array containing elements from a total order. For an $m \times n$ array, we first propose an optimal encoding for answering one-sided \topk{} queries, whose query range is restricted to $[1 \dots m][1 \dots a]$, for $1 \le a \le n$. Next, we propose an encoding for the general \topk{} queries that takes
$m^2\lg{{(k+1)n \choose n}} + m\lg{m}+o(n)$ bits. This generalizes the \topk{} encoding of Gawrychowski and Nicholson~\cite{Gawry14}.
-
dc.description.tableofcontentsChapter 1 Introduction 1
1.1 Computational model 2
1.1.1 Encoding and indexing models 2
1.2 Contribution of the thesis 3
1.3 Organization of the thesis 5

Chapter 2 Preliminaries 7

Chapter 3 Compressed bit vectors based on variable-to-fixed encodings 10
3.1 Introduction 10
3.2 Bit-vectors using V2F coding 14
3.3 V2F compression algorithms for bit-strings 16
3.3.1 Tunstall code 16
3.3.2 Enumerative codes 19
3.3.3 LZW algorithm 23
3.3.4 Empirical evaluation of the compressors 23
3.4 Practical implementation of bitvectors based on V2F compression. 26
3.4.1 Testing Methodology 29
3.4.2 Results of Empirical Evaluation 33
3.5 Future works 35

Chapter 4 Space Efficient Data Structures for Nearest Larger Neighbor 39
4.1 Introduction 39
4.2 Indexing NLV queries on 1D arrays 43
4.3 Encoding NLN queries on2D binary arrays 44
4.4 Encoding NLN queries for general 2D arrays 50
4.4.1 2D NLN in the encoding model–distinct case 50
4.4.2 2D NLN in the encoding model–general case 53
4.5 Open problems 63

Chapter 5 Simultaneous encodings for range and next/previous larger/smaller value queries 64
5.1 Introduction 64
5.2 Preliminaries 67
5.2.1 2d-Min heap 69
5.2.2 Encoding range min-max queries 72
5.3 Extended DFUDS for colored 2d-Min heap 75
5.4 Encoding colored 2d-Min and 2d-Max heaps 80
5.4.1 Combined data structure for DCMin(A) and DCMax(A) 82
5.4.2 Encoding colored 2d-Min and 2d-Max heaps using less space 88
5.5 Open problems 89

Chapter 6 Encoding Two-dimensional range Top-k queries 90
6.1 Introduction 90
6.2 Encoding one-sided range Top-k queries on 2D array 92
6.3 Encoding general range Top-k queries on 2D array 95
6.4 Open problems 99

Chapter 7 Conculsion 100

Bibliography 103

요약 112
-
dc.formatapplication/pdf-
dc.format.extent3655125 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisher서울대학교 대학원-
dc.subjectSpace-efficient data structure-
dc.subjectsuccinct data structure-
dc.subjectencoding model-
dc.subjectindexing model-
dc.subjectbitvector-
dc.subject\rank{} query-
dc.subject\select{} query-
dc.subjectnearest larger neighbor problem-
dc.subjectrange queries-
dc.subjectnext/previous larger query-
dc.subjectrange topk query-
dc.subject.ddc621-
dc.titleSpace Efficient Encodings for Bit-strings, Range queries and Related Problems-
dc.typeThesis-
dc.description.degreeDoctor-
dc.citation.pagesxi, 114-
dc.contributor.affiliation공과대학 전기·컴퓨터공학부-
dc.date.awarded2016-02-
Appears in Collections:
Files in This Item:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share