Li, Ping

Boosting Models for Edit, Imputation and Prediction of Multiple Response Outcomes
Li, Ping; Abowd, John M.
In this paper, we propose a statistical framework that generalizes the classical logit model to predict multiple responses (i.e., multi-label classification). We develop an effective implementation based on boosting and trees. For the NCRN seminar, we present an application to editing and imputation of the multiple-response race and ethnicity coding in the American Community Survey.

Fast Near Neighbor Search in High-Dimensional Binary Data
Shrivastava, Anshumali; Li, Ping
Numerous applications in search, databases, machine learning,
and computer vision can benefit from efficient algorithms for near
neighbor search. This paper proposes a simple framework for fast near
neighbor search in high-dimensional binary data, which are common in
practice (e.g., text). We develop a very simple and effective strategy for
sub-linear time near neighbor search, by creating hash tables directly

b-Bit Minwise Hashing in Practice
Li, Ping; Shrivastava, Anshumali; König, Arnd Christian
Minwise hashing is a standard technique in the context of search for
approximating set similarities. The recent work [26, 32] demonstrated
a potential use of b-bit minwise hashing [23, 24] for efficient
search and learning on massive, high-dimensional, binary
data (which are typical for many applications in Web search and
text mining). In this paper, we focus on a number of critical is-