Theory and Practice of Algorithms for analysis of People and Data on the Web

Final Report Summary - EVALUATE (Theory and Practice of Algorithms for analysis of People and Data on the Web)

The project pursued three main scientific goals:
(1) understanding the learning-theoretic and social-choice-theoretic aspects of learning to rank from preferences, (2) studying problems related to Correlation Clustering with prior trust information, and (3) improving our understanding of the FJLT (Fast Johnson-Lindenstrauss Transform).
The expected impact of this work was a better understanding of the theoretical foundations of some of the most important building blocks of big data, a field that is rapidly affecting many aspects of the lives of billions of people.

For objective (1), the PI and his team made progress in several directions. In [1,2], new active learning algorithms over pairwise preferences were designed, giving rise to a new technique for active learning called “Smooth Relative Regret Approximation” (SRRA), which is of independent interest in learning theory. In [7,9], the PI and coauthors designed online learning algorithms for a learner whose action set consists of all permutations, in an environment incurring a loss that is linear in a standard representation of the permutation. The algorithms reconcile almost optimal regret bounds with bounds on computational cost. In [8], the PI and his collaborators defined a natural extension of a problem called “bandit stochastic optimization” to a pairwise setting, in which the binary feedback from choosing a pair of actions is the outcome of a duel between the two actions.
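The pairwise feedback described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each action has a hidden utility in [0, 1] and that the probability of winning a duel is a linear function of the utility difference; the names `duel` and `estimate_win_rate` and the choice of link function are hypothetical and are not taken from [8].

```python
import random


def duel(utilities, i, j, rng):
    """Return 1 if action i wins a duel against action j, else 0.

    Assumption (illustrative only): utilities lie in [0, 1] and
    P(i beats j) = (1 + u_i - u_j) / 2.
    """
    p_i_wins = (1.0 + utilities[i] - utilities[j]) / 2.0
    return 1 if rng.random() < p_i_wins else 0


def estimate_win_rate(utilities, i, j, rounds, seed=0):
    """Play the same pair repeatedly and return the empirical win rate of i."""
    rng = random.Random(seed)
    wins = sum(duel(utilities, i, j, rng) for _ in range(rounds))
    return wins / rounds
```

The learner in this sketch never observes the utilities themselves, only the binary outcome of each duel, which is the defining feature of the pairwise setting.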

For objective (2), progress was made on the following fronts. The new active learning techniques mentioned above (cf. [1]) also turned out to be useful for devising learning algorithms in a setting where similarity labels are provided for pairs of objects. The labels are noisy, and possibly adversarial. In [5], a technique known as trace-norm minimization was used to solve a clustering problem known as “planted partitioning” (a stochastic case of correlation clustering). Unlike previous results, our result used a notion of adaptivity. In [6], we developed a new analysis of an algorithm known as k-means++ for a problem related to correlation clustering, and showed that the problem is more difficult than previously assumed.
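For readers unfamiliar with correlation clustering, the following sketch shows the standard disagreement count that such problems seek to minimize: a pair labeled similar but placed in different clusters, or labeled dissimilar but placed in the same cluster, counts as one disagreement. The function name and data layout are illustrative assumptions, not the formulation used in [5] or [6].

```python
def disagreements(similar, clustering):
    """Count the pairs whose similarity label disagrees with the clustering.

    similar:    dict mapping a pair (u, v) to True (similar) or False (dissimilar)
    clustering: dict mapping each object to its cluster identifier
    """
    cost = 0
    for (u, v), is_similar in similar.items():
        same_cluster = clustering[u] == clustering[v]
        if is_similar != same_cluster:
            cost += 1
    return cost
```

With noisy labels, no clustering may reach zero disagreements, which is why the objective is stated as a minimization rather than as a consistency check.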

For objective (3), we showed in [4] a strong connection between FJLT and so-called RIP (Restricted Isometry Property) matrices by defining a carefully constructed random walk in the unitary group. Studying the properties of this random walk, defined by repeatedly flipping signs at random and applying a Fourier transform, raised further open problems. In [10], the PI showed new lower bounds for Fourier transform computation, which is a core step in FJLT.
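A single step of a walk of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a naive O(n^2) unitary discrete Fourier transform rather than a fast one, flips each sign independently with probability 1/2, and the function names are hypothetical. The exact construction analyzed in [4] may differ.

```python
import cmath
import math
import random


def unitary_dft(x):
    """Naive discrete Fourier transform, scaled by 1/sqrt(n) so it is unitary."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def random_sign_flip(x, rng):
    """Negate each coordinate independently with probability 1/2."""
    return [v if rng.random() < 0.5 else -v for v in x]


def walk(x, steps, seed=0):
    """Apply `steps` rounds of (random sign flip, then unitary DFT) to x."""
    rng = random.Random(seed)
    for _ in range(steps):
        x = unitary_dft(random_sign_flip(x, rng))
    return x


def norm(x):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))
```

Both operations are unitary, so every step preserves the Euclidean norm of the vector. A vector concentrated on one coordinate is spread evenly over all coordinates after a single step, which gives some intuition for why such transforms are useful as a preprocessing step before sampling.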