b-Bit Minwise Hashing in Practice

Li, Ping; Shrivastava, Anshumali; König, Arnd Christian
Minwise hashing is a standard technique in the context of search for
approximating set similarities. The recent work [26, 32] demonstrated
a potential use of b-bit minwise hashing [23, 24] for efficient
search and learning on massive, high-dimensional, binary
data (which are typical for many applications in Web search and
text mining). In this paper, we focus on a number of critical issues
that must be addressed before one can apply b-bit minwise
hashing to the volumes of data often used in industrial applications.