This course surveys decentralization in the context of optimization and learning, with a focus on modern directions in large-scale machine learning. We begin with an overview of decentralization in optimization, presenting classical techniques alongside recent communication-efficient methods, where optimization is carried out through structured local updates and fixed communication protocols; in particular, we cover stochastic and adaptive methods. We then develop fundamental limits for distributed computation, highlighting oracle and communication lower bounds, and present principled and automated analysis frameworks for deriving worst-case performance guarantees and guiding the design of efficient methods. An important theme is non-convex optimization, with applications in decentralized and federated settings, where stochastic gradient descent (SGD) and its adaptive variants are analyzed under heterogeneity, asynchrony, and partial participation. We study convergence guarantees, variance-reduced methods, and adaptive gradient strategies, connecting them to the practice of federated learning across devices with limited resources. The last part of the course is devoted to robustness and resiliency, addressing Byzantine failures, adversarial models, and privacy-preserving mechanisms critical for federated and edge learning.
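As a concrete illustration of the "structured local updates with a fixed communication protocol" pattern mentioned above, the following Python sketch runs local SGD with periodic averaging and random partial participation on a toy least-squares problem. It is a minimal sketch under assumed placeholder choices (synthetic client data, step size, number of local steps, participation scheme), not any specific method covered in the course.

```python
# Minimal sketch: local SGD with periodic averaging and partial participation
# on a toy heterogeneous least-squares problem. All data and hyperparameters
# below are hypothetical placeholders chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 10, 5
# Each client holds its own synthetic (A_i, b_i) least-squares data (heterogeneity).
A = [rng.normal(size=(20, dim)) for _ in range(num_clients)]
x_true = rng.normal(size=dim)
b = [A_i @ x_true + 0.1 * rng.normal(size=20) for A_i in A]

def local_sgd_round(x_global, client, local_steps=5, lr=0.01):
    """One client's contribution: a few stochastic gradient steps from the global model."""
    x = x_global.copy()
    A_i, b_i = A[client], b[client]
    for _ in range(local_steps):
        j = rng.integers(len(b_i))                 # sample one local data point
        grad = (A_i[j] @ x - b_i[j]) * A_i[j]      # stochastic gradient of 0.5*(a^T x - b)^2
        x -= lr * grad
    return x

x = np.zeros(dim)
for _ in range(200):
    # Partial participation: only a random subset of clients communicates each round.
    participants = rng.choice(num_clients, size=4, replace=False)
    updates = [local_sgd_round(x, c) for c in participants]
    x = np.mean(updates, axis=0)                   # server averages the returned local models

print("distance to x_true:", np.linalg.norm(x - x_true))
```

The key design choice this pattern trades on is communication efficiency: each client performs several cheap local steps before a single averaging step, rather than communicating after every gradient evaluation.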
Outline (tentative)
Decentralization in optimization and learning: an overview of the state-of-the-art
Fundamental limits in distributed computations and optimization