# The Role of Dimension in Metric Embedding

## Final Report Summary - DIMENSION (The Role of Dimension in Metric Embedding)

This project deals with the theory of *low-distortion embedding* of finite metric spaces. Metric embedding is a useful and versatile algorithmic tool, in particular for approximating combinatorial optimization problems, and in any application area that requires classifying and organizing data whose features carry some geometry.
A typical problem in this field is understanding how faithfully a metric space can be represented as a subset of a normed space. Normed spaces are very natural candidates for embedding questions, since they carry additional structure that may be exploited in algorithms and data structures.
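To make "how faithfully" precise: the distortion of an injective map is the product of its worst expansion and its worst contraction over all pairs of points, so it does not change when the map is rescaled. The following is a minimal Python sketch, assuming a finite metric given as a distance function `d` and a candidate map `f` into Euclidean space. `distortion`, `d` and `f` are hypothetical names used only for illustration.

```python
import itertools
import math
from typing import Callable, Hashable, Iterable, Sequence


def distortion(points: Iterable[Hashable],
               d: Callable[[Hashable, Hashable], float],
               f: Callable[[Hashable], Sequence[float]]) -> float:
    """Distortion of f from the finite metric (points, d) into Euclidean space.

    Computed as (max expansion) * (max contraction) over all pairs,
    which makes the value invariant under rescaling f.
    """
    expansion = 0.0
    contraction = 0.0
    for x, y in itertools.combinations(list(points), 2):
        original = d(x, y)
        embedded = math.dist(f(x), f(y))
        if embedded == 0:
            return math.inf  # two points collapse: distortion is unbounded
        expansion = max(expansion, embedded / original)
        contraction = max(contraction, original / embedded)
    return expansion * contraction
```

For example, placing the four vertices of the 4-cycle (with its shortest-path metric) on the corners of a unit square gives distortion $\sqrt{2}$, because the two "diagonal" pairs at distance 2 are contracted to $\sqrt{2}$.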

An important parameter of an embedding is the *dimension* of the host normed space. The dimension plays a major role in applications, because it determines the size of the representation, and because the running time of many algorithms depends inherently on it.
In this research project, I am mainly working on developing a better understanding of the dimension required in different embedding settings, focusing on doubling metrics.

At the beginning of the project I published a paper with Assaf Naor titled "Assouad's Theorem with Dimension Independent of the Snowflaking". The main result of this paper strengthens a classical theorem of Assouad from 1983: we show that the $(1-\epsilon)$-snowflake of any doubling metric embeds into Euclidean space with distortion $\tilde{O}(1/\epsilon)$ using only $O(1)$ dimensions (in Assouad's result, and in all of its subsequent improvements, the dimension depends on $\epsilon$).
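For reference, here is the statement in formulas. The symbols $\lambda$ (doubling constant), $f$ and $k$ are notation introduced here for illustration. For $0<\epsilon<1$, the $(1-\epsilon)$-snowflake of a metric space $(X,d)$ is $(X,d^{1-\epsilon})$. It is again a metric, because $t\mapsto t^{1-\epsilon}$ is increasing and subadditive on $[0,\infty)$. A map $f:X\to\mathbb{R}^k$ embeds the snowflake with distortion $D$ if, after rescaling $f$,

$$
d(x,y)^{1-\epsilon}\;\le\;\|f(x)-f(y)\|_2\;\le\;D\,d(x,y)^{1-\epsilon}\qquad\text{for all } x,y\in X .
$$

In this notation, Assouad's theorem gives a dimension $k=k(\lambda,\epsilon)$. The result above keeps $D=\tilde{O}(1/\epsilon)$ while $k$ depends only on $\lambda$.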

One of the main goals of the research program is to understand whether the dependence on $\epsilon$ can be improved, and more generally to obtain distortion $O(1/\sqrt{\epsilon})$, which would translate to distortion $O(\sqrt{\log n})$ for arbitrary doubling metrics on $n$ points. These are Problems \ref{pr:1} and \ref{pr:4} in the project. The first step described there is to study the embedding of one of the "bad examples" for low-dimensional embedding: the so-called Laakso graph.
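The Laakso graph is built recursively. In one common presentation, level 1 is a path of four edges whose middle two edges are doubled into a 4-cycle. Level $i+1$ then replaces every edge of level $i$ by a scaled copy of level 1. The sketch below assumes that convention and uses edge length $4^{-i}$, so the two endpoints always stay at distance 1. The number of vertices grows like $6^i$, so $i$ is of order $\log n$. `laakso` and `shortest_path_length` are hypothetical helper names.

```python
import heapq
import itertools
from collections import defaultdict

# Level-1 gadget on local labels: 0 and 5 are the endpoints,
# and 1-2-4-3-1 is the 4-cycle in the middle of the path.
GADGET = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]


def laakso(level):
    """Return (weighted_edges, s, t) for the Laakso graph of the given level."""
    ids = itertools.count()
    s, t = next(ids), next(ids)
    edges = [(s, t)]
    for _ in range(level):
        refined = []
        for u, v in edges:
            # Replace edge (u, v) by a copy of the gadget with fresh inner vertices.
            local = {0: u, 5: v}
            for i in range(1, 5):
                local[i] = next(ids)
            refined.extend((local[a], local[b]) for a, b in GADGET)
        edges = refined
    length = 4.0 ** -level  # each level multiplies the s-t path length by 4
    return [(u, v, length) for u, v in edges], s, t


def shortest_path_length(weighted_edges, source, target):
    """Dijkstra on an undirected weighted graph."""
    adj = defaultdict(list)
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == target:
            return du
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```

At level 2 this gives 30 vertices and 36 edges, and the endpoints `s` and `t` remain at distance 1.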
I found a satisfying answer to this question, which appears in the paper titled "Low Dimensional Embedding of Doubling Metrics". There I showed that the best distortion achievable for embedding this graph into Euclidean space, $O(\sqrt{\log n})$, can also be obtained in constant dimension (in particular, dimension 3). In that paper I also presented several improvements and simplifications of low-distortion embeddings of doubling metrics and their snowflakes into $\ell_p$. In particular, I gave a simple proof, with the best possible dependence on the parameters, that doubling metrics embed into $\ell_\infty$ with arbitrarily small distortion $1+\delta$, where the host dimension is (asymptotically) the best possible.
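The reason the host dimension is the real question for $\ell_\infty$ is that every $n$-point metric already embeds isometrically into $\ell_\infty^n$ through the classical Fréchet embedding $x\mapsto(d(x,p))_{p\in X}$. The sketch below shows only this trivial $n$-dimensional baseline. It is not the low-dimensional $1+\delta$ construction from the paper, and the helper names are hypothetical.

```python
def frechet_embedding(points, d):
    """Map each x to the vector (d(x, p) for p in points) in l_infinity^n."""
    return {x: tuple(d(x, p) for p in points) for x in points}


def linf(u, v):
    """l_infinity distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))
```

Each coordinate is 1-Lipschitz by the triangle inequality, and the coordinate indexed by $y$ recovers $d(x,y)$ exactly. The map is therefore an isometry, but it uses one coordinate per point.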

One of the main open problems in metric embedding is whether there exists a dimension reduction in Euclidean space tailored to doubling metrics, that is, one providing constant distortion using constant dimension. This is Problem \ref{pr:3} in the project, and it is very challenging. In an attempt to attack this problem, Lee-Ad Gottlieb, Yair Bartal and I show in the paper "On the Impossibility of Dimension Reduction for Doubling Subsets of $\ell_p$" that its counterpart in $\ell_p$ for any $p>2$ is impossible. That is, there is no dimensionality reduction for doubling subsets of $\ell_p$ when $p>2$. (The same result was obtained concurrently, using completely different techniques, by Lafforgue and Naor.)

I also worked on other related problems in metric embedding. For instance, in the paper "Cops, Robbers, and Threatening Skeletons: Padded Decomposition for Minor-Free Graphs", my co-authors and I prove an improved decomposition theorem for minor-free graphs, which in turn implies improved embeddings of such graphs into $\ell_1$. In another paper, "Light Spanners", we show that every metric has a $t$-spanner (a graph that preserves distances up to a factor of $t$) with few edges and small total weight (sum of edge weights). Such spanners are useful in distributed settings, network design, and routing.
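The simplest way to build a $t$-spanner is the classical greedy algorithm of Althöfer et al. It scans edges by increasing weight and keeps an edge only if the spanner built so far does not already connect its endpoints within $t$ times the edge weight. Every discarded edge is then approximated within factor $t$, so all distances are. The sketch below implements that textbook algorithm, not the construction from "Light Spanners". It assumes the input is a list of weighted edges `(u, v, w)`, for instance all pairs of a finite metric. The function names are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def _bounded_distance(adj, source, target, bound):
    """Dijkstra from source, ignoring paths longer than bound; inf if none."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == target:
            return du
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = du + w
            if nd <= bound and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def greedy_spanner(weighted_edges, t):
    """Classical greedy t-spanner: returns the kept edges (u, v, w)."""
    adj = defaultdict(list)
    kept = []
    for w, u, v in sorted((w, u, v) for u, v, w in weighted_edges):
        # Keep the edge only if the current spanner has no path of length <= t*w.
        if _bounded_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On the four corners of a unit square with Euclidean weights, `t = 1.5` keeps only the four sides. With `t = 1`, all six edges are kept.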

In a sequence of works with my student Arnold Filtser and my colleague Michael Elkin, we study relaxed notions of embedding that allow improved distortion and dimension for some of the pairs. In particular, in our paper "Terminal Embeddings" we assume the input consists of a metric and a subset of $k$ important points called terminals. We then seek a low-dimensional embedding that preserves well the distances from terminals to all other points, where both the distortion and the dimension depend only on $k$ (regardless of the input size). We show the applicability of this notion in various algorithmic settings, e.g. approximation algorithms and online algorithms. In our paper "Prioritized Metric Structures and Embedding" we generalize the notion of terminal embedding: a priority ranking over the points is given, and one desires an embedding whose guarantees scale with the ranking of the points. We extend these ideas beyond embedding, to related algorithmic tasks such as approximate distance oracles and compact routing schemes with prioritized stretch and label-size guarantees. Roughly speaking, the stretch corresponds to distortion, and the label size to dimension. With Arnold Filtser and Yair Bartal we continue to study these new notions, and in our paper "On Notions of Distortion and an Almost Minimum Spanning Tree with Constant Average Distortion" we show that they bear some similarity to previously defined notions of scaling distortion. We also construct spanners with prioritized distortion, and as a corollary obtain that every graph contains a spanning tree whose weight is arbitrarily close to that of the minimum spanning tree and which also has constant average distortion. All these results interplay nicely with the main theme of the proposal: understanding under what circumstances one can obtain low-dimensional, low-distortion representations of data.
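The terminal guarantee can be made concrete by measuring distortion only over pairs that contain at least one terminal. The sketch below assumes the usual expansion-times-contraction definition of distortion, restricted to those pairs. `terminal_distortion` is a hypothetical helper, not code from "Terminal Embeddings".

```python
import itertools
import math


def terminal_distortion(points, terminals, d, f):
    """Distortion of f into Euclidean space, over pairs with at least one terminal."""
    terminal_set = set(terminals)
    expansion = contraction = 0.0
    for x, y in itertools.combinations(list(points), 2):
        if x not in terminal_set and y not in terminal_set:
            continue  # pairs of two non-terminals carry no guarantee
        original = d(x, y)
        embedded = math.dist(f(x), f(y))
        if embedded == 0:
            return math.inf  # a terminal pair collapses
        expansion = max(expansion, embedded / original)
        contraction = max(contraction, original / embedded)
    return expansion * contraction
```

On the 4-cycle with the single terminal 0, the one-dimensional map $x\mapsto d(0,x)$ is exact on every terminal pair. It still sends the non-terminals 1 and 3 to the same point, so its ordinary distortion is unbounded, and adding 1 as a second terminal exposes that collapse.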

With Michael Elkin I had several works on succinct data structures for representing metrics. Our paper "On Efficient Distributed Construction of Near Optimal Routing Schemes" constructs, in a distributed manner, a routing scheme that routes messages along almost-shortest paths while requiring little storage at every vertex. In our paper "Hopsets with Constant Hopbound, and Applications to Approximate Shortest Paths" we study hopsets, where one wants to augment the input graph with a small number of edges so that almost-shortest paths are maintained by paths consisting of few edges. We also had a paper, "Distributed Strong Diameter Network Decomposition", on network decomposition, which is a basic partitioning framework useful in many distributed tasks. Very recently we published "Efficient Algorithms for Constructing Very Sparse Spanners and Emulators", in which we improve the state of the art for near-linear-time algorithms for spanners.
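In the usual formulation, a $(1+\epsilon,\beta)$-hopset is a set $H$ of extra edges, each weighted by the true distance between its endpoints, such that $d_G(u,v)\le d^{(\beta)}_{G\cup H}(u,v)\le(1+\epsilon)\,d_G(u,v)$. Here $d^{(\beta)}$ denotes the shortest length of a path with at most $\beta$ edges. The sketch below only measures $\beta$-hop distances, by running $\beta$ rounds of Bellman-Ford. It does not construct a hopset. The function name and the small example are hypothetical.

```python
import math


def hop_bounded_distances(n, weighted_edges, source, hops):
    """Shortest distances from source using paths of at most `hops` edges.

    Each round relaxes edges only from the previous round's distances,
    so after r rounds dist[v] is the best path with at most r edges.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(hops):
        new = dist[:]
        for u, v, w in weighted_edges:  # undirected: relax both directions
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
            if dist[v] + w < new[u]:
                new[u] = dist[v] + w
        dist = new
    return dist
```

On the unit-weight path $0-1-2-3-4$, vertex 4 is unreachable within 2 hops. After adding the two edges $(0,2)$ and $(2,4)$ of weight 2, every vertex is reached at its exact distance within 2 hops.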

In a recent paper with Alex Andoni and Assaf Naor, "Impossibility of Sketching of the 3D Transportation Metric with Quadratic Cost", we study transportation metrics. These metrics play a vital role in vision and image processing, and have numerous other applications. We provide both lower and upper bounds on the embeddability of such metrics into Euclidean space and $L_1$, and also give lower bounds on low-dimensional representations of such metrics, known as sketches. Again, this work lies within the main theme of the proposal: understanding the dimensionality of various metrics.
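For two sets of $m$ unit-mass points in $\mathbb{R}^3$, the transportation cost with quadratic cost is the minimum, over all bijections between the sets, of the sum of squared Euclidean distances. For uniform measures of equal size an optimal plan can be taken to be a permutation, by Birkhoff's theorem. The sketch below assumes that setting and enumerates permutations by brute force, which is exponential in $m$ and meant only to illustrate the definition. `quadratic_transport_cost` is a hypothetical name.

```python
import itertools
import math


def quadratic_transport_cost(A, B):
    """Min over bijections A -> B of the sum of squared Euclidean distances."""
    if len(A) != len(B):
        raise ValueError("this sketch assumes equally many unit-mass points")
    best = math.inf
    for perm in itertools.permutations(range(len(B))):
        cost = sum(
            sum((x - y) ** 2 for x, y in zip(a, B[j]))  # squared distance
            for a, j in zip(A, perm)
        )
        best = min(best, cost)
    return best
```

With weights $1/m$ on each point, the square root of `quadratic_transport_cost(A, B) / m` is the Wasserstein-2 distance between the two uniform measures.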

In a submitted paper with Yair Bartal and Nova Fandina, we study refined notions of distortion for dimension reduction in Euclidean space. We prove that the classical Johnson-Lindenstrauss (JL) dimension reduction lemma has bounded moments of distortion even when the target dimension is much smaller than logarithmic in the number of points, and we show relations to other dimension reduction objectives studied in the literature, such as Stress and Energy. We also exhibit some algorithmic applications of our techniques.
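The sketch below shows a JL-style projection together with a moment of distortion. It assumes a Gaussian random matrix scaled by $1/\sqrt{k}$, which preserves squared lengths in expectation. As one common definition in this line of work, it also assumes the $\ell_q$-distortion $\big(\mathbb{E}_{\text{pairs}}[\mathrm{dist}(u,v)^q]\big)^{1/q}$, where $\mathrm{dist}(u,v)=\max\{\text{expansion},\text{contraction}\}$ and pairs are drawn uniformly. `jl_project` and `lq_distortion` are hypothetical names.

```python
import itertools
import math
import random


def jl_project(points, k, seed=0):
    """Project points (equal-length tuples) to R^k with a scaled Gaussian matrix."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Entries N(0, 1/k): squared lengths are preserved in expectation.
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(dim)] for _ in range(k)]
    return [tuple(sum(g * x for g, x in zip(row, p)) for row in G) for p in points]


def lq_distortion(points, images, q):
    """(mean over pairs of dist(u, v)**q)**(1/q), dist = max(expansion, contraction)."""
    total, count = 0.0, 0
    for i, j in itertools.combinations(range(len(points)), 2):
        # Assumes distinct images (true with probability 1 for distinct points).
        ratio = math.dist(images[i], images[j]) / math.dist(points[i], points[j])
        total += max(ratio, 1.0 / ratio) ** q
        count += 1
    return (total / count) ** (1.0 / q)
```

Since every pair contributes a value of at least 1, the $\ell_q$-distortion is always at least 1, and it equals 1 exactly for an isometry.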