
  Bokun Wang

I am a Ph.D. student in the Department of Computer Science & Engineering (CSE) at Texas A&M University, supervised by Prof. Tianbao Yang. Before that, I received my master's degree in Computer Science from the University of California, Davis, and my bachelor's degree in Computer Science from the University of Electronic Science and Technology of China (UESTC) in 2018. My research focuses on stochastic optimization for machine learning and AI, with an emphasis on learning from imbalanced data, self-supervised learning, and distributionally robust optimization.


News

Research Overview

Manuscripts

    On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
    Bokun Wang, Yunwen Lei, Yiming Ying, and Tianbao Yang
    Preprint, 2024. [arxiv]
    Abstract:
    We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data point, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which can recover the InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct a generalization error analysis to reveal the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning and derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. [A toy sketch of the in-batch InfoNCE approximation of the partition function appears after this list.]
    ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization
    Bokun Wang and Tianbao Yang
    Preprint, 2023. [arxiv]
    Abstract:
    This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), reinforcement learning, and learning to rank. To better solve these problems, we introduce a unified family of efficient single-loop primal-dual block-coordinate proximal algorithms, dubbed ALEXR. The algorithm leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both the convex and strongly convex cases under smoothness and non-smoothness conditions on the involved functions, which not only improve the best rates in previous works on smooth cFCCO problems but also expand the realm of cFCCO to more challenging non-smooth problems such as the dual form of GDRO. Finally, we present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.
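
The first manuscript above notes that the InfoNCE-based contrastive loss is recovered as a special case of Monte Carlo integration of the partition function. Below is a minimal NumPy sketch of that baseline view only, not the estimator proposed in the paper: the unit normalization, the temperature value, and all names are illustrative assumptions.

    import numpy as np

    def infonce_loss(anchors, candidates, temperature=0.1):
        """Standard InfoNCE loss on a mini-batch of paired embeddings.

        The softmax denominator plays the role of a Monte Carlo estimate
        (up to a constant) of the partition-function integral for each
        anchor, formed from the in-batch candidates.
        """
        # Cosine similarities between every anchor and every candidate.
        anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
        logits = anchors @ candidates.T / temperature            # shape (B, B)

        # Stable log-sum-exp over the batch: log of the estimated partition.
        m = logits.max(axis=1, keepdims=True)
        log_partition = (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()

        # The matching (anchor, positive) pairs sit on the diagonal.
        return float(np.mean(log_partition - np.diag(logits)))

    rng = np.random.default_rng(0)
    image_emb = rng.normal(size=(8, 32))    # e.g., image encoder outputs
    text_emb = rng.normal(size=(8, 32))     # e.g., paired text encoder outputs
    print(infonce_loss(image_emb, text_emb))

Roughly, the softmax denominator above is the in-batch approximation whose Monte Carlo integration error the paper analyzes; the proposed non-parametric MIS weights are beyond this sketch.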

Selected Publications [Full List]

    Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications
    Bokun Wang and Tianbao Yang
    In Proc. of the 39th International Conference on Machine Learning (ICML), 2022.
    [paper] [arxiv (with updates)] [bib] [code] [poster] [slides]
    Abstract:
    This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, deep latent variable models, etc., all of which deserve finer analysis. Yet, existing algorithms and analyses are restricted in one aspect or another. The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. Our key result is an improved oracle complexity with parallel speed-up, obtained by using a moving-average-based estimator with mini-batching. Our theoretical analysis also yields new insights for improving the practical implementation, namely sampling batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory. [A toy sketch of the moving-average estimator appears after this list.]
    IntSGD: Adaptive Floatless Compression of Stochastic Gradients
    Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, and Peter Richtárik
    In Proc. of the 10th International Conference on Learning Representations (ICLR), 2022 (Spotlight).
    [paper] [arxiv] [bib] [code] [poster] [slides]
    Abstract:
    We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors by a number known to every device and then rounding to integers. In contrast to the prior work on integer compression for SwitchML by Sapio et al. (2021), our IntSGD method is provably convergent and computationally cheaper, as it estimates the scaling of vectors adaptively. Our theory shows that the iteration complexity of IntSGD matches that of SGD up to constant factors for both convex and non-convex, smooth and non-smooth functions, with and without overparameterization. Moreover, our algorithm can also be tailored to the popular all-reduce primitive and shows promising empirical performance. [A toy sketch of the integer encode/decode round trip appears after this list.]
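
The FCCO paper above describes tracking each inner function value with a moving-average estimator computed from mini-batches. The following toy NumPy sketch illustrates that idea on a synthetic instance and is not the paper's algorithm or experiments: the quadratic outer function, the noisy linear inner functions, and all step sizes are assumptions chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 10
    A = rng.normal(size=(n, d))            # rows a_i define the toy inner maps

    def inner_sample(w, idx, noise=0.5):
        """Noisy stochastic estimates of the inner values g_i(w) = a_i^T w."""
        return A[idx] @ w + noise * rng.normal(size=len(idx))

    def outer_grad(u):
        """Derivative of the toy outer function f_i(u) = u^2."""
        return 2.0 * u

    # Compositional objective: (1/n) * sum_i f_i(g_i(w)) = (1/n) * sum_i (a_i^T w)^2.
    w = rng.normal(size=d)
    u = np.zeros(n)                        # moving-average trackers of g_i(w)
    gamma, eta, batch = 0.1, 0.01, 8

    for _ in range(2000):
        idx = rng.choice(n, size=batch, replace=False)
        g_hat = inner_sample(w, idx)

        # Moving-average update: u_i tracks g_i(w), which controls the bias of
        # plugging a noisy inner estimate into the nonlinear outer function.
        u[idx] = (1 - gamma) * u[idx] + gamma * g_hat

        # Gradient estimate: average of grad g_i(w) * f_i'(u_i); here grad g_i(w) = a_i.
        grad = (A[idx] * outer_grad(u[idx])[:, None]).mean(axis=0)
        w -= eta * grad

    print("final objective:", np.mean((A @ w) ** 2))   # small when w is near the minimizer 0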
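
The IntSGD abstract above describes multiplying float vectors by a number known to every device and rounding so that only integers are communicated. The toy sketch below shows one such encode/decode round trip; it is not the paper's operator: the fixed scale alpha and the particular unbiased randomized rounding are placeholder assumptions, whereas IntSGD sets the scale adaptively.

    import numpy as np

    rng = np.random.default_rng(0)

    def int_compress(x, alpha):
        """Encode a float vector as integers: scale by 1/alpha, then round
        randomly so the encoding is unbiased (E[decode(encode(x))] = x)."""
        scaled = x / alpha
        low = np.floor(scaled)
        round_up = rng.random(x.shape) < (scaled - low)   # prob = fractional part
        return (low + round_up).astype(np.int64)

    def int_decompress(ints, alpha):
        """Decode by scaling back with the same alpha known to every device."""
        return alpha * ints.astype(np.float64)

    # Aggregate integer-encoded gradients from several workers.
    alpha = 1e-3                                     # shared scale (fixed here)
    grads = [rng.normal(size=5) for _ in range(4)]   # four workers' gradients
    encoded = [int_compress(g, alpha) for g in grads]         # only ints are sent
    mean_grad = int_decompress(np.sum(encoded, axis=0), alpha) / len(grads)

    print("exact mean  :", np.mean(grads, axis=0))
    print("decoded mean:", mean_grad)

In an actual distributed run, the integer encodings could be summed with an all-reduce before decoding, which is the setting mentioned in the abstract.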

Experience

    King Abdullah University of Science and Technology (KAUST)
    Remote research intern advised by Prof. Peter Richtárik, September 2020 - August 2021.

    Department of Mathematics, University of California Davis (UC Davis)
    Research assistant advised by Prof. Shiqian Ma, July 2019 - September 2019.

Teaching

    Guest Lecturer, CSCE 689: Optimization for Machine Learning (Fall 2023, instructed by Prof. Tianbao Yang), Texas A&M University.
    Teaching Assistant, ECS 32B: Introduction to Data Structures (Winter 2020), University of California Davis.
    Teaching Assistant, ECS 154A: Computer Architecture (Fall 2019), University of California Davis.
    Teaching Assistant, ECS 170: Introduction to Artificial Intelligence (Spring 2019), University of California Davis.
    Teaching Assistant, ECS 271: Machine Learning and Data Discovery (Winter 2019), University of California Davis.