Welcome to the homepage of the

Mathematical Optimization for Data Science Group

Department of Mathematics and Computer Science, Saarland University, Germany

Learning to Optimize

Typically, optimization algorithms are derived analytically by considering a certain class of problems for which one can perform a worst-case analysis to prove convergence and derive corresponding convergence rates. This approach provides a theoretically guaranteed performance of the algorithm for any case in question. However, this comes at a cost: The worst-case analysis often forbids to use statistical properties of the problem, and the analytical tractability severly limits the design choices. As a result, many classical optimization algorithms are robust and easy to implement, yet their actual performance might prove to be unsatisfactory in many cases. On the other hand, the recently growing field of Learning-to-Optimize leverages machine learning techniques to develop optimization methods. Especially, this alleviates the bounds of analytical tractability and allows for automation of their design in a data-driven manner. As a result, learned algorithms are often able to outperform classical ones by several order of magnitude by utilizing more of the statistical properties that prevail in the given distribution of problems. However, also this comes at a cost: By explicitly going beyond analtic features, one looses the typical convergence guarantees for the learned algorithm, which would render its application at least questionable.

One way to mitigate this problem is by means of generalization results: Since the algorithm is actively trained on a given dataset, it typically solves the corresponding problems very efficiently. Thus, by proving that its performance generalizes to the whole data-generating distribution ("in-distribution generalization"), one gets an algorithm that is specifically tailored to these kind of problems and has a guaranteed performance on them. Nevertheless, the learned algorithm still might fail on problems stemming from different distributions ("out-of-distribution generalization"), that is, it might not be applicable to whole classes of problems without any further ado. Therefore, the worlds of Learning-to-Optimize and classical optimization have to be brought closer together to achieve reliability and speed at the same time.

Our Contributions:

[SO23] M. Sucker, P. Ochs:
PAC-Bayesian Learning of Optimization Algorithms. [pdf]
In F. Ruiz, J. Dy, J-W. van (Eds.): International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, Vol. 206, 8145-8164, , 2023.

We present a theoretical framework for learning optimization algorithms based on the empirical risk. In this framework, we measure the performance of the algorithm by means of the loss of the last iterate, and the algorithm is assumed to be parametric, that is, it has hyperparameters that have to be chosen by the user to optimize its performance. In doing so, we provide PAC-Bayesian generalization guarantees, that is, we are able to learn a distribution over these hyperparameters in a data-driven way, which guarantees that, with high-probability, the performance of the algorithm on the training data generalizes to the whole data-generating distribution (that defines the class of optimization problems).


[SFO24] M. Sucker, J. Fadili, P. Ochs:
Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation. [pdf]
Technical Report, ArXiv e-prints, arXiv:2404.03290, 2024.

The PAC-Bayesian approach to learning has received considerable attention in recent years. Typically, this approach is built upon a so-called prior distribution, and it was reported many times that the achieved performance and the given guarantees rely heavily on the actual choice of this distribution. We also discovered this problem in our conference paper [SO23] and subsequent experiments. However, based on our theoretical constraints, we actively have to construct a suitable distribution. In doing so, we report several (new and crucial) design choices for Learning-to-Optimize, for example, a very specific loss-function for training. To validate the procedure and our theoretical generalization guarantees, we conduct four experiments, ranging from strongly convex and smooth problems up to non-smooth and non-convex problems. Taken together, we present a complete framework for learning optimization algorithms with PAC-Bayesian guarantees.



MOP Group
©2017-2024
The author is not
responsible for
the content of
external pages.