Mathematical Optimization for Data Science Group, Saarland University

Welcome to the homepage of the

Mathematical Optimization for Data Science Group

Department of Mathematics and Computer Science, Saarland University, Germany

MOP Research Seminar

Research Seminar on Mathematical Optimization

Meeting:	Thursdays 15.00 – 16.00 o'clock
Where:	Virtual meeting via Zoom
Who:	MOP Group and students or researchers with interest in Mathematical Optimization
Language:	English

News

Description

Organization

News

Upcoming Sessions:

Date	Topic	Comment
05.06.2025	TBA.	reading session
12.06.2025	TBA.	reading session
19.06.2024	Public Holiday	no session
26.06.2025	TBA.	reading session

Past Sessions:

Date	Topic	Comment
29.05.2024	Public Holiday	no session
22.05.2025	Elesedy, Zaidi: Provably Strict Generalisation Benefit for Equivariant Models, 2021.	reading session
15.05.2025	Iutzeler, Pauwels, Vaiter: Derivatives of Stochastic Gradient Descent, 2024.	reading session
08.05.2025	Louzi: Stochastic Gradient Descent Revisited. 2024.	reading session
01.05.2024	Public Holiday	no session
24.04.2025	Optimization with Nonlinear Equality Constraints: Gauss-Newton-Type Penalty and Augmented Lagrangian Methods.	Talk by Lahcen El Bourkhissi
17.04.2025	Grohs, Hosseini: eps-subgradient algorithms for locally lipschitz functions on Riemannian manifolds, 2016.	reading session
10.04.2025	Polytope conditioning matters: a study of Frank-Wolfe over product domains.	Talk by Gabriele Iommazzo
03.04.2025	Cai, Luo: Optimal convergence rate of inertial gradient system with flat geometries and perturbations, 2023.	reading session
27.03.2025	Kornowski, Padmanabhan, Wang, Zhang, Sra: First-Order Methods for Linearly Constrained Bilevel Optimization, 2024.	reading session
20.03.2025	Differentiation of inertial methods for optimizing smooth parametric function.	Talk by Jean-Jacques Godeme
13.03.2025	Training Deep Learning Models with Norm-Constrained LMOs.	Talk by Tony Silveti-Falls
06.03.2025	An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsize.	Talk by Antonio Orvieto
27.02.2025	Kobler, Effland, Kunisch, Pock: Total Deep Variation: A Stable Regularizer for Inverse Problems, 2020.	reading session
20.02.2025	Fenchel Duality on Riemannian Manifolds.	Talk by Ronny Bergmann
13.02.2025	Fort, Pag&eactue;s: Asymptotic behavior of a Markovian stochastic algorithm with constant step, 1999.	reading session
06.02.2025	Bianchi, Hachem, Salim: Constant step stochastic approximations involving differential inclusions: stability, long-run convergence and applications, 2019.	reading session
30.01.2024	Optimization of Sums of Bivariate Functions.	Talk by Nils Müller
23.01.2025	Lewis, Lopez-Acedo, Nicolae: Basic convex analysis in metric spaces with bounded curvature, 2023.	reading session
16.01.2025	Elesedy: Group Symmetry in PAC Learning, 2022.	reading session
21.12.2024 to 12.01.2025	Winter Break	no session
19.12.2024	Hu, Xiao, Liu, Toh: A constraint dissolving approach for nonsmooth optimization over the Stiefel manifold, 2023.	reading session
12.12.2024	Ahn, Cutkosky: Adam with model exponential moving average is effective for nonconvex optimization, 2024.	reading session
05.12.2024	Domke, Grower, Garrigos: Provable convergence guarantees for black-box variational inference, 2023.	reading session
28.11.2024	Chen, Lasserre, Magron and Pauwels: Semialgebraic Optimization for Lipschitz Constants of ReLU Networks, 2020.	reading session
21.11.2024	Doikov, Mishchenko, Nesterov: Super-Universal Regularized Newton Method, 2022.	reading session
07.11.2024	Plug-and-Play Methods for Inverse Problems: Stability and Convergence of Solutions with Decreasing Noise Levels.	Talk by Andrea Ebner
31.10.2024	Higher-Order Methods for Composite Optimization with Application.	Talk by Yassine Nabou
24.10.2024	Aspman, Kungurstev, Seraji: Riemannian Stochastic Approximation for Minimizing Tame Nonsmooth Objective Functions, 2023.	reading session
17.10.2024	Barakat, Bianchi: Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization, 2021.	reading session
10.10.2024	Attouch, Chbani, Fadili, Riahi: First-order optimization algorithms via inertial systems with Hessian driven damping, 2019.	reading session
03.10.2024	Public Holiday	no session
26.09.2024	Khanh, Mordukhovich, Phat, Tran: Globally convergent coderivative-based generalized Newton methods in nonsmooth optimization, 2023.	reading session
19.09.2024	Lopez, Martin-Marquez, Wang: Resolvents of Set-Valued Monotone Vector Fields in Hadamard Manifolds, 2010.	reading session
12.09.2024	Variational Learning is Effective for Large Deep Networks. In this talk, I present extensive evidence against the common belief that variational Bayesian learning is ineffective for large neural networks. I demonstrate that a direct optimization of the variational objective with an Improved Variational Online Newton method (IVON) can consistently match or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. The talk concludes with several new use cases of variational learning where we improve fine-tuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data.	Talk by Thomas Möllenhoff
18.07. to 05.09.2022	Summer Break	no session
11.07.2024	Liu, Liu, Yao, Zeng, Zhang: Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy, 2024.	reading session
04.07.2024	Projected gradient descent accumulates at bouligand stationary point.	Talk by Guillaume Olikier
27.06.2024	Proximal Methods and Dynamical systems for Optimization and Variational Inequalities	Talk by Oday Hazaimah
20.06.2024	Marcotte, Gribonval, Peyre: Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows, 2023.	reading session
13.06.2024	Azizian, Iutzeler, Malick, Mertikopoulos: The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities, 2021.	reading session
06.06.2024	Mixed-integer linearity in nonlinear optimization: a trust region approach.	Talk by Alberto de Marchi
30.05.2023	Public Holiday	no session
23.05.2024	Liu, Pan, Wu, Yang: An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization, 2024.	reading session
16.05.2024	Dynamic Bilevel Learning with Inexact Line Search.	Talk by Mohammad Sadegh Salehi
09.05.2023	Public Holiday	no session
02.05.2024	Accelerated first-order optimization under smooth nonlinear constraints.	Talk by Michael Muehlebach
25.04.2024	Jin, Jiang, Mokhtari: Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search, 2024.	reading session
18.04.2024	Bilevel Optimization for Traffic Mitigation in Optimal Transport Networks. Global infrastructure robustness and local transport efficiency are critical requirements for transportation networks. However, since passengers often travel greedily to maximize their benefits and trigger traffic jams, transportation performance can be heavily disrupted. In this talk, we present a dynamical system that effectively routes passengers along their shortest paths while also strategically tuning the network edge weights to reduce congestion. As a result, we enforce both global and local optimality of transport. We show how the system is effective on synthetic networks and real-world data. Our findings on the international European highways suggest that thoughtfully devised routing schemes might help to lower car-produced carbon emissions.	Talk by Alessandro Lonardi
11.04.2024	Gunasekar, Woodworth, Srebro: Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent, 2021.	reading session
04.04.2023	cancelled	no session
28.03.2024	Ye, Lin, Chang, Zhang: Towards explicit superlinear convergence rate for SR1, 2019.	reading session
21.03.2024	Monzio, Compagnoni, Biggio, Orvieto, Kersting, Proske, Lucchi: An SDE for Modeling SAM: Theory and Insights, 2023.	reading session
14.03.2024	Universal Gradient Methods for Stochastic Convex Optimization. We propose a new rule for adjusting the step size in the Stochastic Gradient method (SGD). This rule is related to that of the AdaGrad method but there are some significant differences. Most importantly, instead of using the norms of stochastic gradients, we use a stochastic approximation of the Bregman distance of the objective function. The resulting algorithm turns out to be the first universal method for Stochastic Optimization in the sense that it automatically adjusts not only to the oracle's noise level but also to the level of smoothness of the objective function. More specifically, our method has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class of convex functions including both nonsmooth functions and those with Lipschitz continuous gradient. We also show how to use our approach for constructing an accelerated version of the Universal SGD with even better efficiency estimates.	Talk by Anton Rodomanov
07.03.2024	Liu, Grimmer: Gauges and Accelerated Optimization over Smooth and/or Strongly Convex Sets, 2023.	reading session
29.02.2024	Tseng, Yun: A coordinate gradient descent method for nonsmooth separable minimization, 2007.	reading session
22.02.2024	Scalable convex relaxations for manifold-valued variational problems We derive a moment relaxation for large-scale nonsmooth optimization problems with graphical structure and spherical constraints. In contrast to classical moment relaxations for global polynomial optimization that suffer from the curse of dimensionality we exploit the partially separable structure of the optimization problem to reduce the dimensionality of the search space. Leveraging optimal transport and Kantorovich-Rubinstein duality we decouple the problem and derive a tractable dual subspace approximation of the infinite-dimensional problem using spherical harmonics. This allows us to tackle polynomial optimization problems with spherical constraints and geodesic coupling terms. We show that the duality gap vanishes in the limit by proving that a Lipschitz continuous dual multiplier on a unit sphere can be approximated as closely as desired in terms of a Lipschitz continuous polynomial. The formulation is applied to sphere-valued imaging problems with total variation regularization and graph-based SLAM. In imaging tasks our approach achieves small duality gaps for a moderate degree. In graph-based SLAM our approach often finds solutions which after refinement with a local method are near the ground truth solution.	Talk by Robin Kenis
15.02.2024	Orvieto: An Accelerated Lyapunov Function for Polyak's Heavy-Ball on Convex Quadratics, 2023.	reading session
08.02.2024	Nonsmooth Nonconvex Stochastic Heavy-Ball Motivated by the conspicuous use of momentum based algorithms in deep learning, we study a nonsmooth nonconvex stochastic heavy ball method and show its convergence. Our approach relies on semialgebraic assumptions, commonly met in practical situations, which allow to combine a conservative calculus with nonsmooth ODE methods. In particular, we can justify the use of subgradient sampling in practical implementations that employ backpropagation or implicit differentiation. Additionally, we provide general conditions for the sample distribution to ensure the convergence of the objective function. As for the stochastic subgradient method, our analysis highlights that subgradient sampling can make the stochastic heavy ball method converge to artificial critical points. We address this concern showing that these artifacts are almost surely avoided when initializations are randomized.	Talk by Tam Le
01.02.2024	Lugosi, Neu: Generalisation Bounds via Convex Analysis, 2022.	reading session
25.01.2024	Chambolle, Contreras: Accelerated Bregman Primal-Dual methods applied to Optimal Transport and Wasserstein Barycenter problems, 2022.	reading session
18.01.2024	Zhang, Qu, Wright: From Symmetry to Geometry: Tractable Nonconvex Problems, 2021.	reading session
11.01.2024	Absil, Hosseini: A Collection of Nonsmooth Riemannian Optimization Problems, 2019.	reading session
23.12.2023 to 07.01.2024	Winter Break	no session
20.12.2023	Gutman, Pena: Perturbed Fenchel duality and first-order methods, 2023.	reading session
13.12.2023	Haddouche, Guedj: Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation, 2023.	reading session
07.12.2023	Damian, Nichani, Lee: Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability, 2022.	reading session
30.11.2023	Master Thesis Defense: Near-optimal Closed-loop Method via Lyapunov Damping for Convex Optimization.	Talk by Severin Maier
23.11.2023	Amit, Epstein, Moran, Meir: Integral Probability Metrics PAC-Bayes Bounds, 2022.	reading session
16.11.2023	Alimisis, Orvieto, Becigneul, Lucchi: Momentum Improves Optimization on Riemannian Manifold , 2021.	reading session
09.11.2023	MOP Retreat	no session
02.11.2023	cancelled	no session
26.10.2023	Cockayne, Oates, Ipsen, Girolami: A Bayesian Conjugate-Gradient Method, 2018.	reading session
19.10.2023	Ablin, Peyre: Fast and accurate optimization on the orthogonal manifold without retraction, 2022.	reading session
12.10.2023	Jiang, Mokhtari: Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization, 2023.	reading session
05.10.2023	Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction.	Talk by Xiaowen Jiang
28.09.2023	Grimmer: Provably Faster Gradient Descent via Long Steps, 2023.	reading session
21.09.2023	Bolte, Pauwels, Vaiter: One-step differentiation of iterative algorithms, 2023.	reading session
14.09.2023	Ruszczynski: Convergence of a Stochastic Subgradient Method with Averaging for Nonsmooth Nonconvex Constrained Optimization, 2020.	reading session
03.08. to 07.09.2022	Summer Break	no session
27.07.2023	Wibisono: Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem, 2018.	reading session
20.07.2023	Molinari, Massias, Rosasco, Villa: Iterative regularization for low complexity regularizers, Part II, 2022.	Talk by Cesare Molinari
13.07.2023	Chizat, Bach: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018.	reading session
06.07.2023	Molinari, Massias, Rosasco, Villa: Iterative regularization for low complexity regularizers, 2022.	Talk by Cesare Molinari
29.06.2023	Scieur, d'Aspremont, Bach: Regularized Nonlinear Acceleration, 2019.	reading session
22.06.2023	Regularized Renyi divergence minimization through Bregman proximal gradient algorithms We study the variational inference problem of minimizing a regularized Rényi divergence over an exponential family, and propose a relaxed moment-matching algorithm, which includes a proximal-like step. Using the information-geometric link between Bregman divergences and the Kullback-Leibler divergence, this algorithm is shown to be equivalent to a Bregman proximal gradient algorithm. This novel perspective allows us to exploit the geometry of our approximate model while using stochastic black-box updates. We use this point of view to prove strong convergence guarantees including monotonic decrease of the objective, convergence to a stationary point or to the minimizer, and geometric convergence rates. These new theoretical insights lead to a versatile, robust, and competitive method, as illustrated by numerical experiments.	Talk by Thomas Guilmeau
15.06.2023	Fadili, Malick, Peyre: Sensitivity Analysis for Mirror-Stratifiable Convex Functions, 2017.	reading session
08.06.2023	Public Holiday	no session
01.06.2023	Scieur, Bertrand, Gidel, Pedregosa: The Curse of Unrolling: Rate of Differentiating Through Optimization, 2022.	reading session
25.05.2023	Dung, Vu: A stochastic variance reduction algorithm with Bregman distances for structured composite problems, 2021.	reading session
18.05.2023	Public Holiday	no session
11.05.2023	Du, Zhai, Poczos, Singh: Gradient Descent Provable Optimizes Over-parametrized Neural Networks, 2019.	reading session
04.05.2023	Revisiting Gradient Clipping Gradient clipping is a popular modification to standard (stochastic) gradient descent, at every iteration limiting the gradient norm to a certain value c>0. It is widely used for example for stabilizing the training of deep learning models (Goodfellow et al., 2016), or for enforcing differential privacy (Abadi et al., 2016). Despite the popularity and the simplicity of the clipping mechanism, its convergence guarantees often require specific values of c and strong noise assumptions. In this talk, we discuss convergence guarantees that show precise dependence on arbitrary clipping thresholds c and show that our guarantees are tight with both deterministic and stochastic gradients.	Talk by Sebastian Stich
27.04.2023	Poon, Peyre: Smooth Bilevel Programming for Sparse Regularization, 2021.	reading session
20.04.2023	Attouch, Bot, Csetnek: Fast optimization via inertial dynamics with closed-loop damping, 2022.	reading session
13.04.2023	Calatroni, Garrigos, Rosasco, Villa: Accelerated Iterative Regularization via Dual Diagonal Descent, 2021.	reading session
06.04.2023	Behling, Bello-Cruz, Iusem, Liu, Santos: A successive centralized circumcenter reflection method for the convex feasibility problem, 2023.	reading session
30.03.2023	Scieur, Pedregosa: Universal Average-Case Optimality of Polyak Momentum, 2020.	reading session
23.03.2023	Yang, Milzarek, Wen, Zhang: A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization, 2023.	reading session
16.03.2023	Kawaguchi: On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers, 2021.	reading session
09.03.2023	Zhang, Sra: Towards Riemannian Accelerated Gradient Methods, 2018.	reading session
02.03.2023	Cartis, Gould, Toint: On the complexity of steepest descent, Newton's and Regularized Newton's methods for non-convex unconstrained optimization, 2010.	reading session
23.03.2023	Drusvyatskiy, Ioffe, Lewis: Transversality and alternating projections for nonconvex sets, 2016.	reading session
16.02.2023	Davis, Drusvyatskiy: Graphical Convergence of Subgradients in Nonconvex Optimization and Learning, 2018.	reading session
09.02.2023	Charisopoulos, Davis: A superlinearly convergent subgradient method for sharp semismooth problems, 2022.	reading session
02.02.2023	Continuous Newton-like Methods featuring Inertia and Variable Mass	Talk by Camille Castera
26.01.2023	Finlay, Calder, Abbasi, Oberman: Lipschitz regularized Deep Neural Networks generalize and are adversarially robust, 2019.	reading session
12.01.2023	Attouch, Ioan Bot, Nguyen: Fast convex optimization via time scale and averaging of the steepest descent, 2022.	reading session
22.12.2021 to 08.01.2022	Winter Break	no session
15.12.2022	PAC-Bayesian Learning of Optimization Algorithms We apply the PAC-Bayes theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-bounds) and explicit trade-off between a high probability of convergence and a high convergence speed. Even in the limit case, where convergence is guaranteed, our learned optimization algorithms provably outperform related algorithms based on a (deterministic) worst-case analysis. Our results rely on PAC-Bayes bounds for general, unbounded loss-functions based on exponential families. By generalizing existing ideas, we reformulate the learning procedure into a one-dimensional minimization problem and study the possibility to find a global minimum, which enables the algorithmic realization of the learning procedure. As a proof-of-concept, we learn hyperparameters of standard optimization algorithms to empirically underline our theory.	Talk by Michael Sucker
08.12.2022	SAM as an Optimal Relaxation of Bayes	Talk by Thomas Möllenhoff
01.12.2022	Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We consider such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity analysis and parameter learning optmization problems. We show that under partial smoothness and other mild assumptions, Automatic Differentiation (AD) of the sequence generated by proximal splitting algorithms converges to the derivative of the solution mapping. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of the Reverse Mode AD and moreover provide faster convergence theoretically. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems and demonstrate the working of FPAD on prototypical practical image denoising problem by learning the regularization term.	Talk by Sheheryar Mehmood
24.11.2022	Inertial Quasi-Newton Methods for Monotone Inclusion: Efficient Resolvent Calculus and Primal-Dual Methods We introduce an inertial quasi-Newton Forward-Backward Splitting Algorithm to solve a class of monotone inclusion problems. While the inertial step is computationally cheap, in general, the bottleneck is the evaluation of the resolvent operator. A change of the metric makes its computation hard even for (otherwise in the standard metric) simple operators. In order to fully exploit the advantage of adapting the metric, we develop a new efficient resolvent calculus for a low-rank perturbed standard metric, which accounts exactly for quasi-Newton metrics. Moreover, we prove the convergence of our algorithms, including linear convergence rates in case one of the two considered operators is strongly monotone. Beyond the general monotone inclusion setup, we instantiate a novel inertial quasi-Newton Primal-Dual Hybrid Gradient Method for solving saddle point problems. The favourable performance of our inertial quasi-Newton PDHG method is demonstrated on several numerical experiments in image processing.	Talk by Shida Wang
17.11.2022	Bolte, Combettes, Pauwels: The Iterates of the Frank-Wolfe Algorithm May Not Converge, 2022.	reading session
10.11.2022	Kunstner, Kumar, Schmidt: Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent, 2021.	reading session
03.11.2022	Mei, Bai, Montanari: The Landscape of Empirical Risk for Non-convex Losses, 2017.	reading session
27.10.2022	Aberdam, Beck: An Accelerated Coordinate Gradient Descent Algorithm for Non-separable Composite Optimization, 2021.	reading session
20.10.2022	Zhang, Sra: First-order Methods for Geodesically Convex Optimization, 2016.	reading session
13.10.2022	Du, Jin, Lee, Jordan: Gradient Descent Can Take Exponential Time to Escape Saddle Points, 2017.	reading session
06.10.2022	Subgradient sampling for nonsmooth nonconvex minimization We focus on a stochastic minimization problem in the nonsmooth and nonconvex setting which applies for instance to the training of deep learning models. A popular way in the machine learning community to deal with this problem is to use stochastic gradient descent (SGD). This method combines both subgradient sampling and backpropagation, which is an efficient implementation of the chain-rule formula on nonsmooth compositions. Due to the incompatibility of these operations in the nonsmooth world, the SGD can generate spurious critical points in the optimization landscape which does not guarantee the convergence of the iterates to the critical set. We will explain in this talk how the model of Conservative Gradients is compatible with subgradient sampling and backpropagation, allowing to obtain convergence results for nonsmooth SGD. By means of definable geometry, we will emphasize that functions used in machine learning are locally endown with geometric properties of piecewise affine functions. In this setting, chain-ruling nonsmooth functions, and sampling subgradients output conservative gradients, but also, spurious critical points are hardly attained when performing SGD in practice.	Talk by Tam Le
29.09.2022	Lyu, Li: Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2020.	reading session
22.07.2022	An SDE perspective on stochastic convex optimization We analyze the global and local behavior of gradient-like flows under stochastic errors towards the aim of solving convex optimization problems with noisy gradient input. We first study the unconstrained differentiable convex case, using a stochastic differential equation where the drift term is minus the gradient of the objective function and the diffusion term is either bounded or square-integrable. In this context, under Lipschitz continuity of the gradient, our first main result shows almost sure convergence of the objective and the trajectory process towards a minimizer of the objective function. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex, strongly convex, and (local) Lojasiewicz case. The latter, which involves local analysis, is challenging and requires non-trivial arguments from measure theory. Then, we extend our study to the constrained case and more generally to certain nonsmooth situations. We show that several of our results have natural extensions obtained by replacing the gradient of the objective function by a cocoercive monotone operator. This makes it possible to obtain similar convergence results for optimization problems with an additively "smooth + non-smooth" convex structure. Finally, we consider another extension of our results to non-smooth optimization which is based on the Moreau envelope.	Talk by Rodrigo Maulen Soto
15.09.2022	Bianchi, Hachem, Schechtman: Convergence of constant step stochastic gradient descent for non-smooth non-convex functions, 2020.	reading session
08.09.2022	Nonsmooth Implicit Differentiation for Machine-Learning and Optimization	Talk by Tony Silveti-Falls
01.09.2022	Jin, Koppel, Rajawat, Mokhtari: Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood, 2022.	reading session
28.07. to 25.08.2022	Summer Break	no session
21.07.2022	Luo, Chen: From differential equation solvers to accelerated first-order methods for convex optimization, 2022.	reading session
14.07.2022	Benaim, Hofbauer, Sorin: Stochastic Approximations and Differential Inclusions, 2005.	reading session
07.07.2022	Davis, Drusvyatskiy, Kakade, Lee: Stochastic Subgradient Method Converges on Tame Functions, 2020.	reading session
30.06.2022	Savarino, Schnörr: Continuous-Domain Assignment Flows, 2019.	reading session
23.06.2022	Lewis, Malick: Alternating Projections on Manifolds, 2008.	reading session
16.06.2022	Public Holiday	no session
09.06.2022	Ahn, Sra: From Proximal Point Method to Nesterov's Acceleration, 2020.	reading session
02.06.2022	Luo, Chen: From differential equation solvers to accelerated first-order methods for convex optimization, 2022.	reading session
26.05.2022	Public Holiday	no session
19.05.2022	Drori, Taylor: Efficient first-order methods for convex minimization: a constructive approach, 2020.	reading session
12.05.2022	Li, Pong: Calculus of the exponent of Kurdyka-Lojasiewicz inequality and its applications to linear convergence of first-order methods, 2021.	reading session
05.05.2022	Solving 0-1 ILPs with Binary Decision Diagrams We present a Lagrange decomposition method for solving 0-1 integer linear programs occurring in structured prediction. We propose a sequential and a massively min-marginal averaging schemes for solving the Lagrangean dual and a perturbation technique for decoding primal solutions. For representing subproblems we use binary decision diagrams (BDDs), which support efficient computation of subproblem solutions and update steps. We present experimental results on combinatorial problems from MAP inference for Markov Random Fields, quadratic assignment and cell tracking for developmental biology. Our highly parallel GPU implementation improves, comes close to, or outperform some state-of-the-art specialized heuristics while being problem agnostic.	Talk by Paul Swoboda
28.04.2022	Bredies, Chenchene, Lorenz, Naldi: Degenerate Preconditioned Proximal Point algorithms, 2021.	reading session
21.04.2022	Frecon, Salzo, Pontil, Gasso: Bregman Neural Networks, 2022.	reading session
14.04.2022	Pedregosa, Scieur: Average-Case Acceleration Through Spectral Density Estimation, 2020.	reading session
07.04.2022	Li, Voroninski: Sparse Signal Recovery from Quadratic Measurements via Convex Programming, 2012.	reading session
31.03.2022	Master Thesis Defense	Talk by Michael Sucker
24.03.2022	Lewis, Tian: The structure of conservative gradient fields, 2021.	reading session
17.03.2022	Progress of current research	Talk by Sheheryar Mehmood
10.03.2022	Progress of current research	Talk by Rodrigo Maulen
03.03.2022	Progress of current research	Talk by Jean-Jacques Godeme
24.02.2022	Raskutti, Mukherjee: The information geometry of mirror descent, 2013.	Reading Session
17.02.2022	Progress of current research	Talk by Oskar Adolfson
10.02.2022	Muehlebach, Jordan: Continuous-time Lower Bounds for Gradient-based Algorithms, 2020.	reading session
03.02.2022	An Inertial Newton Algorithm for Deep Learning	Talk by Camille Castera
27.01.2022	Alvarez, Attouch, Bolte, Redont: A second-order gradient-like dissipative dynamical system with Hessian-driven damping, 2002.	Reading Session
20.01.2022	Progress of current research	Talk by Shida Wang
13.01.2022	Apidopoulos, Ginatta, Villa: Convergence rates for the Heavy-Ball continuous dynamics for non-convex optimization, under Polyak-Łojasiewicz conditioning, 2021.	Reading Session
23.12.2021 to 06.01.2022	Break	no session
16.12.2021	Calatroni, Chambolle: Backtracking Strategies for Accelerated Descent Methods with Smooth Composite Objectives, 2019.	reading session
09.12.2021	Sahin, Eftekhari, Alacaoglu, Latorre, Cevher: An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints, 2019.	reading session
02.12.2021	Muehlebach, Jordan: On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems, 2020.	reading session
25.11.2021	Andersen, Dahl, Vandenberghe: Logarithmic barriers for sparse matrix cones, 2013.	reading session
18.11.2021	Nesterov: Implementable tensor methods in unconstrained convex optimization, 2021.	reading session
11.11.2021	Wang, Fang, Liu: Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values, 2016.	reading session
04.11.2021	Wilson, Recht, Jordan: A Lyapunov Analysis of Momentum Methods in Optimization, 2016.	reading session
28.10.2021	Connor, Vandenberghe: On the equivalence of the primal-dual hybrid gradient method and Douglas-Rachford splitting, 2020.	reading session
21.10.2021	Bilevel Learning for Inverse Problems Variational regularization techniques are dominant in the field of inverse problems. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. This issue can be approached by machine learning where we estimate these parameters from data. This is known as "Bilevel Learning" and has been successfully applied to many tasks, some as small-dimensional as learning a regularization parameter, others as high-dimensional as learning a sampling pattern in MRI. While mathematically appealing this strategy leads to a nested optimization problem which is computationally difficult to handle. In this talk we discuss several applications of bilevel learning for imaging as well as new computational approaches. There are quite a few open problems in this relatively recent field of study, some of which I will highlight along the way.	Talk by Matthias Ehrhardt
14.10.2021	Fadili, Malick, Peyre: Sensitivity Analysis for Mirror-Stratifiable Convex Functions, 2017.	reading session
07.10.2021	cancelled	no session
30.09.2021	Gower, Schmidt, Bach, Richtarik: Variance-reduced methods for machine learning, 2020.	reading session
23.09.2021	Sabach, Teboulle: Faster Lagrangian-Based Methods in Convex Optimization, 2020.	reading session
16.09.2021	Proximal Algorithms for Scalar and Vector Optimization Problems This presentation will go through several techniques for solving both scalar and vector optimization problems with various objective functions. On scalar version of optimization problems, we intend to discuss how third-order differential equations, or equations with an order higher than three, can contribute to development of fast optimization methods. The first problem to consider is to minimize the real-valued, convex, and continuously differentiable function defined on a Hilbert space. The problem will then be altered to minimize non- smooth and non-convex functions. We show that the convergence rate of the continuous dynamical system and the proximal algorithm obtained from discretization of it is the same. At the end of this route, we want to analyze the convergence of trajectories towards optimal solutions in both continuous and discrete versions. In the case of vector optimization problems, we aim to present a model of Alternating Direction Method of Multipliers (ADMM) which is a simple and powerful proximal algorithm, while the problem is minimization of sum of two vector-valued functions with a linear constraint respect to variables. This research is vital since there is a lack of literature on proximal techniques for tackling vector optimization problems.	Talk by Maede Ramazannejad
09.09.2021	Teboulle, Vaisbourd: Novel proximal gradient methods for nonnegative matrix factorization with sparsity constraints, SIIMS 13(1):381-421, 2020.	reading session
08.2021	Break	no session
29.07.2021	Reading: Chatterji, Diakonikolas, Jordan, Bartlett: Langevin Monte Carlo without smoothness, 2020.	reading session
22.07.2021	Pokutta, Spiegel, Zimmer: Deep Neural Network Training with Frank-Wolfe, 2020.	reading session
15.07.2021	Drori, Teboulle: Performance of first-order methods for smooth convex minimization: a novel approach, 2020.	reading session
08.07.2021	Lin, Jin, Jordan: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems, 2020.	reading session
01.07.2021	Nonconvex Min-max Optimization in Machine Learning: Algorithms and Applications. Abstract: Min-max Optimization has broad applications in machine learning, e.g., Adversarial Robustness, AUC Maximization, Generative Adversarial Networks, etc. However, modern machine learning models are intrinsically nonconvex (e.g., deep learning, matrix factorization), which brings tremendous challenge for min-max optimization. In this talk, I will talk about my work on provably efficient algorithms for min-max optimization in the presence of non-convexity. This talk will showcase a few results. First, I will talk about nonconvex-concave min-max optimization, with applications in deep AUC optimization. We designed polynomial-time algorithms to converge to stationary point, as well as maximal AUC value, under different assumptions. Second, I will talk about nonconvex-nonconcave min-max optimization, with applications in training Generative Adversarial Networks. The key result is that under the Minty Variational Inequality (MVI) condition, we design polynomial-time algorithms to find stationary points.	Talk by Mingrui Liu
24.06.2021	Stochastic trust-region method with adaptive sample sizes for finite-sum minimization problems. In this talk, we present SIRTR (Stochastic Inexact Restoration Trust-Region method) for solving finite-sum minimization problems. At each iteration, SIRTR approximates both function and gradient by sampling. The function sample size is computed using a deterministic rule inspired by the inexact restoration method, whereas the gradient sample size can be smaller than the sample size employed in function approximations. Notably, our approach may allow the decrease of the sample sizes at some iterations. We show that SIRTR eventually reaches full precision in evaluating the objective function and we provide a worst-case complexity result on the number of iterations required to achieve full precision. Numerical results on nonconvex binary classification problems confirm that SIRTR is able to provide accurate approximations way before the maximum sample size is reached and without requiring a problem-dependent tuning of the parameters involved.	Talk by Simone Rebegoldi
17.06.2021	Jiang, Vandenberghe: "Bregman primal-dual first-order method and application to sparse semidefinite programming", 2020.	reading session
10.06.2021	Salimans, Zhang, Radford, Metaxas: "Improving GANs Using Optimal Transport", ICLR 2018.	reading session
03.06.2021	Public Holiday	no session
27.05.2021	Cuturi, Teboul, Vert: "Differentiable Ranks and Sorting using Optimal Transport." NeurIPS, 2019.	reading session
20.05.2021	Adaptive Acceleration for First-order Methods. Abstract: First-order operator splitting methods are ubiquitous among many fields through science and engineering, such as inverse problems, image processing, statistics, data science and machine learning, to name a few. In this talk, through the fixed-point sequence, I will first discuss a geometry property of first-order methods when applying to solve non-smooth optimization problems. Then I will discuss the limitation of current widely used "inertial acceleration" technique, and propose a trajectory following adaptive acceleration algorithm. Global convergence is established for the proposed acceleration scheme based on the perturbation of fixed-point iteration. Locally, connections between the acceleration scheme and the well studied "vector extrapolation technique" in the field of numerical analysis will be discussed, followed by acceleration guarantees of the proposed acceleration scheme. Numeric experiments on various first-order methods are provided to demonstrate the advantage of the proposed adaptive acceleration scheme.	Talk by Jingwei Liang
13.05.2021	Public Holiday	no session
06.05.2021	Rockafellar : "Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming." SIAM Journal on Control, 1974.	reading session
29.04.2021	Attouch, Bolte, Svaiter: "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods." Mathematical Programming, 2013.	reading session
22.04.2021	Implicit differentiation for fast hyperparameter selection in non-smooth convex learning. Abstract: Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approximately. Results on regression and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.	Talk by Quentin Bertrand
15.04.2021	Chen, Rubanova, Bettencourt, Duvenaud: "Neural ordinary differential equations." NeurIPS, 2018.	reading session
08.04.2021	Bach: "Submodular functions: from discrete to continuous domains", Mathematical Programming, 2019.	reading session
01.04.2021	Learning to optimize with unrolled algorithms Abstract: When solving multiple optimization problems sharing the same underlying structure, using iterative algorithms designed for worst case scenario can be considered as inefficient. When one aim at having good solution in average, it is possible to improve the performances by learning the weights of a neural networked designed to mimic an unfolded optimization algorithm. However, the reason why learning the weights of such a network would accelerate the problem resolution is not always clear. In this talk, I will first review how one can design unrolled algorithms to solve the linear regression with l1 or TV regularization, with a particular focus on the choice of parametrization and loss. Then, I will discuss the reasons why such procedure can lead to better results compared to classical optimization, with a particular focus on the choice of step sizes.	Talk by Thomas Moreau
25.03.2021	Wright: "Coordinate Descent Algorithms", Mathematical Programming 151:3-34, 2015.	reading session
18.03.2021	Dragomir, d'Aspremont, Bolte: "Quartic first-order methods for low rank minimization." Journal of Optimization Theory and Applications, 2021	reading session
11.03.2021	Monsaingeon, Vorotnikov: "The Schrödinger problem on the non-commutative Fisher-Rao space." Calculus of Variations and Partial Differential Equations, 2021.	reading session
04.03.2021	Kolmogorov: "Recursive frank-wolfe algorithms", 2020.	reading session
25.02.2021	Bolte, Pauwels: "A mathematical model for automatic differentiation in machine learning". NeurIPS, 2020.	reading session
18.02.2021	Dragomir, Taylor, d'Aspremont, and Bolte: "Optimal complexity and certification of Bregman first-order methods". arXiv preprint, 2019.	reading session
11.02.2021	Kolmogorov: "Convergent tree-reweighted message passing for energy minimization". IEEE transactions on pattern analysis and machine intelligence, 2006.	reading session
04.02.2021	Dalalyan: "Theoretical guarantees for approximate sampling from smooth and log-concave densities". Journal of the Royal Statistical Society, 2017.	reading session
28.01.2021	Djolonga, Krause: "Differentiable Learning of Submodular Models". Advances in Neural Information Processing Systems, 2017.	reading session
21.01.2021	"A Stochastic Bregman Primal-Dual Splitting Algorithm for Composite Optimization".	Talk by Tony Silveti-Falls
14.01.2021	Van den Berg, Friedlander: "Sparse optimization with least-squares constraints". SIAM Journal on Optimization, 2011.	reading session
17.12.2020	Stella, Themelis, Patrinos: "Forward-backward quasi-Newton methods for nonsmooth optimization problems". Computational Optimization and Applications 67:443-487, 2017.	reading session
10.12.2020	"Lifting the Convex Conjugate in Lagrangian Relaxations".	Talk by Emanuel Laude
03.12.2020	Dlask, Werner: "A Class of Linear Programs Solvable by Coordinate-wise Minimization". 2020.	reading session
02.12.2020	S. Wang: "Numerical scheme for Wasserstein distance on Manifold". Master's Thesis Mathematics, University of Bonn, 2020.	Talk by Shida Wang
26.11.2020	Attouch, Chbani, Fadili, Riahi: "First-order optimization algorithms via inertial systems with Hessian driven damping". Mathematical Programming, 2020.	reading session
19.11.2020	Ablin, Peyré, Moreau: "Super-efficiency of automatic differentiation for functions defined as a minimum". ICML, 2020.	reading session
12.11.2020	Mukkamala, Ochs: "Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms". NeurIPS, 2019.	Talk by W. Bender
05.11.2020	Ghadimi, Lan: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. 2013.	reading session
29.10.2020	Swoboda, Kolmogorov: "MAP inference via Block-Coordinate Frank-Wolfe Algorithm". CVPR, 2018.	reading session
22.10.2020	Ahookhosh, Hien, Gillis, Patrinos: A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems, 2020.	reading session
15.10.2020	Pilanci, Ergen: "Neural Networks are Convex Regularizers". 2020.	reading session

Description

This is a research seminar about various topics in Mathematical Optimization that is organized by the MOP Group in form of reading sessions. We discuss important papers around mathematical optimization including (but not limited to) the following topics:

Non-smooth Analysis and Optimization
Convex or Non-convex Optimization
Parametric Optimization and Parameter Learning
Bilevel Optimization
Dynamic Programming and Discrete Optimization
Applications in Machine Learning, Computer Vision, Image Processing, Statistics, ...

Key for a convenient and beneficial reading session will be the active participation and discussion.

Organization

Format:

The reading session is (until further notice) organized via Zoom Meetings. Usually, one paper is discussed in a session. Everyone who wants to participate in the reading session should have read the paper. Before the meeting, a leader is assigned to the paper, who presents it. He or she is not supposed to be the expert of the topic.

Following are some guidelines:

The leader guides the reading by going through the paper step by step.
Everyone can add comments or start a discussion at any time.
Explicitly say ''I haven't understood this or that part.''
Discuss the 'big picture' of the paper and the context.
Discuss details of common interest and postpone specific details to individual discussions.

The goal is to answer the following questions:

What is the key contribution?
What do we learn from the paper?
What's the limitations of their approach?
What are the open questions?

Registration:

Please send an eMail to Peter Ochs, confirm your interest in this research seminar, and note whether you would like to join as guest or regular member (listed below). As a regular member, you should be more than willing to join the session regularly. You will have the permission to suggest papers directly in the spreadsheet and to vote for the paper of the next reading session.

Paper Collection:

The papers that are discussed in the reading session are organized in the following

google spreadsheet.

Access to the spreadsheet and the zoom link is provided by eMail after registration.

Paper Selection:

Every week, each regular members has in total three votes for papers in the section 'Upcoming Reading', which may be distributed to different papers. Deadline for voting is on Friday 2 pm for the upcoming week. (Votes cannot be accumulated.)