b-Bit Minwise Hashing in Practice

Li, Ping, Anshumali Shrivastava, and Arnd Christian König. b-Bit Minwise Hashing in Practice. Cornell University Preprint 1813:37986, 2013, available at http://hdl.handle.net/1813/37986.
Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications.
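For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of b-bit minwise hashing, not the authors' implementation. It makes several assumptions that are not in the text above: set elements are non-negative integers, random permutations are approximated by linear hash functions modulo a Mersenne prime, and the resemblance estimator uses the sparse-data approximation in which two unrelated b-bit values collide with probability about 1/2^b. All function names are hypothetical.

```python
import random

# Assumption: a large prime modulus stands in for the universe size.
MERSENNE_PRIME = (1 << 61) - 1


def make_hash_params(k, seed=0):
    """Draw k (a, c) pairs defining h(x) = (a*x + c) mod p.

    These linear hashes only approximate random permutations.
    """
    rng = random.Random(seed)
    return [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(k)
    ]


def bbit_signature(items, params, b):
    """Keep only the lowest b bits of each of the k minimum hash values."""
    mask = (1 << b) - 1
    signature = []
    for a, c in params:
        min_hash = min((a * x + c) % MERSENNE_PRIME for x in items)
        signature.append(min_hash & mask)
    return signature


def estimate_resemblance(sig1, sig2, b):
    """Estimate resemblance (Jaccard similarity) from two b-bit signatures.

    Assumption: sparse sets, so the match probability is modeled as
    P_b = 1/2^b + (1 - 1/2^b) * R, which is then solved for R.
    """
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    p_hat = matches / len(sig1)
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)
```

A caller would build one parameter list with `make_hash_params`, compute a signature per set with `bbit_signature`, and compare signatures with `estimate_resemblance`. Storing b bits per hash instead of a full 64-bit value is what makes the signatures compact; the cost is that more hash functions are needed for the same accuracy, and the estimate can fall slightly below zero for dissimilar sets.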