Fast Near Neighbor Search in High-Dimensional Binary Data

Fast Near Neighbor Search in High-Dimensional Binary Data
Shrivastava, Anshumali; Li, Ping
Numerous applications in search, databases, machine learning,
and computer vision, can benefit from efficient algorithms for near
neighbor search. This paper proposes a simple framework for fast near
neighbor search in high-dimensional binary data, which are common in
practice (e.g., text). We develop a very simple and effective strategy for
sub-linear time near neighbor search, by creating hash tables directly
using the bits generated by b-bit minwise hashing. The advantages of
our method are demonstrated through thorough comparisons with two
strong baselines: spectral hashing and sign (1-bit) random projections.