(image credits)

  Bokun Wang

I am a Ph.D. student at Texas A&M University. I got my bachelor's degree in Computer
Science from University of Electronic Science and Technology of China (UESTC) in 2018.
My current research interests are stochastic and/or distributed optimization.

Selected Papers

    Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications
    Bokun Wang and Tianbao Yang
    In Proc. of the 39th International Conference on Machine Learning (ICML), 2022.
    [paper] [arxiv] [bib] [code] [poster] [slides]
    + abstract
    This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, deep latent variable models, etc., which deserves finer analysis. Yet, existing algorithms and analyses are restricted in one or other aspects. The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. Our key result is the improved oracle complexity with the parallel speed-up by using the moving-average based estimator with mini-batching. Our theoretical analysis also exhibits new insights for improving the practical implementation by sampling the batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory.
    IntSGD: Adaptive Floatless Compression of Stochastic Gradients
    Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, and Peter Richtárik
    In Proc. of the 10th International Conference on Learning Representations (ICLR), 2022 (Spotlight).
    [paper] [arxiv] [bib] [code] [poster] [slides]
    + abstract
    We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors with a number known to every device and then rounding to integers. In contrast to the prior work on integer compression for SwitchML by Sapio et al. (2021), our IntSGD method is provably convergent and computationally cheaper as it estimates the scaling of vectors adaptively. Our theory shows that the iteration complexity of IntSGD matches that of SGD up to constant factors for both convex and non-convex, smooth and non-smooth functions, with and without overparameterization. Moreover, our algorithm can also be tailored for the popular all-reduce primitive and shows promising empirical performance.


    Optimization and Machine Learning Lab, King Abdullah University of Science and Technology (KAUST)
    Research intern advised by Prof. Peter Richtárik, September 2020 - August 2021.

    Department of Mathematics, University of California Davis (UC Davis)
    Research assistant advised by Prof. Shiqian Ma, July 2019 - September 2019.


    Spotlight Presentation, 10th International Conference on Learning Representations (ICLR), 2022.
    Tang Lixin Scholarship, University of Electronic Science and Technology of China (UESTC), 2017.
    National Scholarship, Ministry of Education, China, 2016.


    Teaching assistant, ECS 32B: Introduction to Data Structures, University of California Davis.
    Teaching assistant, ECS 154A: Computer Architecture, University of California Davis.
    Teaching assistant, ECS 271: Machine Learning & Discovery, University of California Davis.
    Teaching assistant, ECS 170: Introduction to Artificial Intelligence, University of California Davis.