(image credits)

  Bokun Wang

I am a Ph.D. student at Texas A&M University. I got my bachelor's degree in Computer
Science from University of Electronic Science and Technology of China (UESTC) in 2018.
My current research interest is continuous optimization.

Selected Manuscripts

    ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization
    Bokun Wang and Tianbao Yang
    Preprint, 2023. [arxiv]
    + abstract
    This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), reinforcement learning, and learning to rank. To better solve these problems, we introduce a unified family of efficient single-loop primal-dual block-coordinate proximal algorithms, dubbed ALEXR. This algorithm leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions of involved functions, which not only improve the best rates in previous works on smooth cFCCO problems but also expand the realm of cFCCO for solving more challenging non-smooth problems such as the dual form of GDRO. Finally, we present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.
    Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications
    Bokun Wang and Tianbao Yang
    In Proc. of the 39th International Conference on Machine Learning (ICML), 2022.
    [paper] [arxiv (with updates)] [bib] [code] [poster] [slides]
    + abstract
    This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, deep latent variable models, etc., which deserves finer analysis. Yet, existing algorithms and analyses are restricted in one or other aspects. The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. Our key result is the improved oracle complexity with the parallel speed-up by using the moving-average based estimator with mini-batching. Our theoretical analysis also exhibits new insights for improving the practical implementation by sampling the batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory.
    IntSGD: Adaptive Floatless Compression of Stochastic Gradients
    Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, and Peter Richtárik
    In Proc. of the 10th International Conference on Learning Representations (ICLR), 2022 (Spotlight).
    [paper] [arxiv] [bib] [code] [poster] [slides]
    + abstract
    We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors with a number known to every device and then rounding to integers. In contrast to the prior work on integer compression for SwitchML by Sapio et al. (2021), our IntSGD method is provably convergent and computationally cheaper as it estimates the scaling of vectors adaptively. Our theory shows that the iteration complexity of IntSGD matches that of SGD up to constant factors for both convex and non-convex, smooth and non-smooth functions, with and without overparameterization. Moreover, our algorithm can also be tailored for the popular all-reduce primitive and shows promising empirical performance.


    King Abdullah University of Science and Technology (KAUST)
    Remote research intern advised by Prof. Peter Richtárik, September 2020 - August 2021.

    Department of Mathematics, University of California Davis (UC Davis)
    Research assistant advised by Prof. Shiqian Ma, July 2019 - September 2019.


    Teaching assistant, ECS 32B: Introduction to Data Structures, University of California Davis.
    Teaching assistant, ECS 154A: Computer Architecture, University of California Davis.