Welcome to the Simons Workshop on Learning, Algorithm Design and Beyond Worst-Case Analysis November 14-18, 2016 Organizers: Nir Ailon, Nina Balcan, Avrim Blum, Ravi Kumar, Kevin Leyton-Brown, and Tim Roughgarden This workshop on Learning, Algorithm Design and

Beyond Worst-Case Analysis is part of the Semester Program on Algorithms and Uncertainty (Fall 2016) at the Simons Institute for the Theory of Computing Some notes:

No food/drinks in auditorium (sorry) Wireless: CalVisitor or eduroam Time for short impromptu talks (email/talk to me) Reception after the talks today Optional social events each evenings: Mon: microbrew pub Tue: game night Wed: trip to SF Thu: microbrew pub

Fri: Viks Chaat House A Brief Tour of Algorithm Analysis Beyond the Worst Case Avrim Blum Carnegie Mellon University Whats the issue? A problem (like network flow, TSP, graph coloring, SAT, clustering,) consists of a family of instances Traditional goal in TCS: algorithm that performs well

on all instances in the family, i.e., Worst-Case Analysis Great if we can do it Worst-Case Analysis Dont need to worry about instance structure, can use for reductions (e.g., network flow, LP), etc. We feel we have cracked the problem if we have an alg that does well on all instances in the family. Great if we can do it

Worst-Case Analysis Dont need to worry about instance structure, can use for reductions (e.g., network flow, LP), etc. We feel we have cracked the problem if we have an alg that does well on all instances in the family. But what if we cant? Average-Case Analysis Look at random instances Less pessimistic

Can often give some insight into the problem Worst case Average case [Beardwood et al 1959]: tight analysis of length of optimal TSP tour for random points in unit square [Karp 1977]: asymptotically optimal heuristic for avg case Average-Case Analysis Look at random instances On the other hand, random typical

vs (e.g., images) Also random instances often very specific E.g., random 3-colorable graph Two nodes of same color share neighbors in common Two nodes of different color share neighbors in common

1990s: Intermediate models Issue of worst-case (great if you can do well but for many problems you cant) vs average case (random typical) led to flurry of work in 1990s on intermediate models Averag e Case Worst

Case Focus: NP-hard problems that are hard to approximate well in worst-case, online problems with high competitive ratios Semi-random input model [FOCS 1990] Semi-random input model [Feige-Kilian 98] Algs for -coloring, independent set in semi-random model for very low ( best possible) noise rate .

Make model more adversarial: Noise only adds edges Adversary can see all coin flips in advance I.e., start with sparse random graph (between colors or from to ) and adversary adds to it Bisection: stochastic block model + monotone changes See Aravindans talk for recent results in graph partitioning See Uris talk for model where randomness replaced with expansion

On-line algorithms Algorithm faced with request sequence . Must make decisions without knowing future. Competitive ratio: Paging: Best possible deterministic ratio is , achieved by both good algorithms (LRU) and bad algorithms (FIFO,) Explain what makes LRU good? Bypass this lower bound?

On-line algorithms Access-graph model [Borodin-Irani-Raghavan-Schieber 91] Program graph on pages. Accesses restricted to walks on this graph Analyze competitive ratio of LRU as a function of the graph LRU never much worse than FIFO and can be much better, and design interesting near-optimal algorithm On-line algorithms

Access-graph model [Borodin-Irani-Raghavan-Schieber 91] Loose-competitiveness model [Young 91] For any request sequence, for almost all , ratio ... (or cost is small in absolute terms) Markov-paging model [Karlin-Phillips-Raghavan 92] Diffuse-adversary model [Koutsoupias-Papadimitriou 94] Request sequence drawn from distribution D belonging to known class of distributions 1990s Learning theory Combining expert advice [Littlestone-Warmuth]

[Cesa-Bianchi et al] Given a collection of actions (which way to drive to work today), which one should you take? In hindsight, should have followed path A on day 1, path B on day 2, path A on day 3, Just try to compete with best single action (path). Motivationally, if world was iid, best strategy would be a single action (Model is worst-case, but competing wrt best strategy

in class that includes optimal for iid setting) Robot s R Us 32 min 2000s: Understand observed success Smoothed analysis model [Spielman-Teng 01] Every entry perturbed with small Gaussian noise Simplex Algorithm for LP runs in poly expected time:

More adversarial than semi-random model 2000s: Understand observed success Smoothed analysis model [Spielman-Teng 01] 2-OPT TSP heuristic [Englert-Rglin-Vcking 07] Lloyds -means algorithm [Arthur-Manthey-Rglin 09] (analysis is for time to reach a local optimum) Late 2000s+: models of nice inputs Replace randomness with deterministic conditions. Goals:

Understand reasonable inputs What properties of an instance make it easy/hard to find a global optimum or desired solution? Can we use implicit assumptions in formulations? Explain success of common heuristics Late 2000s+: models of nice inputs -means clustering [Ostrovsky-Rabani-Schulman-Swamy 06] Cost of optimal -means clustering decreases with . Often choose based on knee in the curve. x

x x x x If then natural heuristic finds If for constant can get PTAS. [Awasthi-B-Sheffet10] Late 2000s+: models of nice inputs

Approximation Stability [Balcan-B-Gupta 09] Perturbation Resilience [Bilu-Linial 10] Idea: identify assumptions that (a) are natural based on how input was collected, or (b) we will want to assume anyway because of how the output is being used. Observation: In many problems, true goal is to match a reference/ground-truth/desired solution. Objective just a proxy. Late 2000s+: models of nice inputs Given a set of news articles, cluster them how a

user would. Given a set of images, cluster by who is in them. Given a picture, correctly segment into objects. Observation: In many problems, true goal is to match a reference/ground-truth/desired solution. Objective just a proxy. Late 2000s+: models of nice inputs Here, the objective function (like -means score) is a measurable proxy for an unmeasurable goal of accuracy.

Obviously yes if one has a c-approx alg, but what if you dont? [Balcan-B-Gupta 09] -Approximation Stability: Suppose that all -approxs to are -close to reference solution. Can one use this to get close? OPT OPT Desired solution

Approximation-Stability [Balcan-B-Gupta 09]: for -median, -means, min-sum clustering, answer is yes even for . [Voevodski-Balcan-Roglin-Teng-Xia 10]: superfast algos (small number of distance queries) with this guarantee. Apply to biological sequence clustering. [Awasthi-Balcan-B-Sheffet-Vempala 10]: Nash equilibria OPT OPT Desired solution

Perturbation-Resilience [Bilu-Linial 10] Choice of edge weights, locations often somewhat heuristic If small changes to the input affected the optimal solution a lot, then if OPT is good (close to the desired solution) its just by luck Perturbation-Resilience [Bilu-Linial 10] Choice of edge weights, locations often somewhat heuristic Input is -perturbation resilient if changing input values by a factor doesnt change optimal solution (at all).

Perturbation-Resilience [Bilu-Linial 10] Generally harder than approximation-stability for the same level of parameter. While an optimal solution in a perturbed instance is also an approx-optimal solution in the original, the other direction need not be true. Many/most results need stability above approx bound In some cases (Nash equilibria) the two notions coincide [Balcan-Braverman 10]

Perturbation-Resilience [Bilu-Linial 10] Lots of results on: Max-cut [Bilu-Linial 10] [Bilu-Daniely-Linial-Saks 13] [Makarychev-Makarychev-Vijayaraghavan 14] Nash equilibria [Balcan-Braverman 10]

TSP [Mihalak-Schongens-Sramek-Widmayer 11] -median/means [Awasthi-B-Sheffet 12] [Balcan-Liang 12] [Makarychev-Makarychev 16] -center (incl asymmetric) [Balcan-Haghtalab-White 16] Talk today

Minimum multiway cut [Makarychev-MakarychevVijayaraghavan 14] Related: [Kushagra-Samadi-BenDavid 16] [Ashtiani-Kushagra-BenDavid 16] Average-case for Unknown Distribution Concern with usual average-case analysis: random typical vs How about average-case over the distribution of typical instances for your given application? Input is a collection of random samples of instances from your application

Then try to use them to learn the best algorithm (from some given family) for your applications distribution Average-case for Typical Examples Sorting and problems in computational geometry [Ailon-Chazelle-Clarkson-Liu-Mulzer-Seshadhri 11] Assume product distribution on items within input, compete with information-theoretic optimum Revenue-maximizing auctions [Elkind 07] [Cole-Roughgarden 14] [Morgenstern-Roughgarden 15] [Devanur-Huang-Psamos 16], General algorithmic problems [Gupta-Roughgarden

16] [Balcan-Nagarajan-Vitercik-White 16] No prior assumption on distribution of instances Compete with best algorithm from a given natural family A Will see talks on these this week Coming Up Theoretical talks on algorithms and models Empirical talks on approaches in practice/experiments Application talks on properties of real-world inputs in various domains Enjoy!