Publications

Detailed Information

Fast and scalable method for distributed Boolean tensor factorization

Cited 8 time in Web of Science Cited 13 time in Scopus
Authors

Park, Namyong; Oh, Sejoon; Kang, U.

Issue Date
2019-08
Publisher
Springer Verlag
Citation
VLDB Journal, Vol.28 No.4, pp.549-574
Abstract
How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although several BTF algorithms exist, they do not scale up for large-scale Boolean tensors. In this paper, we propose DBTF, a distributed method for Boolean CP (DBTF-CP) and Tucker (DBTF-TK) factorizations running on the Apache Spark framework. By distributed data generation with minimal network transfer, exploiting the characteristics of Boolean operations, and with careful partitioning, DBTF successfully tackles the high computational costs and minimizes the intermediate data. Experimental results show that DBTF-CP decomposes up to 16(3)-32(3) x larger tensors than existing methods in 82-180 x less time, and DBTF- TK decomposes up to 8(3)-16(3) x larger tensors than existing methods in 86-129 x less time. Furthermore, both DBTF- CP and DBTF- TK exhibit near- linear scalability in terms of tensor dimensionality, density, rank, and machines.
ISSN
1066-8888
URI
https://hdl.handle.net/10371/179761
DOI
https://doi.org/10.1007/s00778-019-00538-z
Files in This Item:
There are no files associated with this item.
Appears in Collections:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share