Bokun Wang
I am a Ph.D. student in the Department of Computer Science & Engineering (CSE) at Texas A&M University, supervised by Prof. Tianbao Yang. Before that, I received my master's degree in Computer Science from the University of California, Davis, and my bachelor's degree in Computer Science from the University of Electronic Science and Technology of China (UESTC) in 2018. My research focuses on stochastic optimization for machine learning and AI, with an emphasis on learning from imbalanced data, self-supervised learning, and distributionally robust optimization.
News
Research Overview
Manuscripts
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
  Bokun Wang, Yunwen Lei, Yiming Ying, and Tianbao Yang
  Preprint, 2024. [arxiv]
  Abstract: We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data point, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which recovers the InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct a generalization error analysis that reveals the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning, and derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm.
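  To give a flavor of the Monte Carlo view described above, here is a minimal PyTorch sketch of an importance-weighted contrastive objective. The function name, weighting scheme, and temperature are illustrative assumptions rather than the paper's exact method; with uniform (zero) log-weights the estimate of each anchor's partition function reduces to the standard InfoNCE denominator.

  ```python
  # Minimal sketch (not the paper's exact objective): estimating each anchor's
  # partition function by importance-weighted Monte Carlo over in-batch samples.
  import torch

  def mis_style_contrastive_loss(img_emb, txt_emb, log_weights=None, tau=0.07):
      """img_emb, txt_emb: (n, d) L2-normalized embeddings of n matched pairs.
      log_weights: (n, n) log importance weights for the Monte Carlo estimate
                   of each anchor's partition function (zeros -> InfoNCE)."""
      logits = img_emb @ txt_emb.t() / tau          # (n, n) similarity scores
      if log_weights is None:
          log_weights = torch.zeros_like(logits)    # uniform weights: InfoNCE case
      # log of the Monte Carlo estimate of the partition function per anchor
      log_Z = torch.logsumexp(logits + log_weights, dim=1)
      # negative log-likelihood of the matched pair under the estimated model
      return (log_Z - logits.diagonal()).mean()
  ```

  Changing how the (log-)weights are chosen is what distinguishes different Monte Carlo estimators of the partition function in this view.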
- ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization
  Bokun Wang and Tianbao Yang
  Preprint, 2023. [arxiv]
  Abstract: This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), reinforcement learning, and learning to rank. To better solve these problems, we introduce a unified family of efficient single-loop primal-dual block-coordinate proximal algorithms, dubbed ALEXR. ALEXR leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both the convex and strongly convex cases, under smoothness and non-smoothness conditions on the involved functions; these rates not only improve upon the best known rates from previous work on smooth cFCCO problems but also expand the realm of cFCCO to more challenging non-smooth problems such as the dual form of GDRO. Finally, we present lower complexity bounds demonstrating that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.
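  For readers unfamiliar with the problem class, the LaTeX snippet below sketches the cFCCO objective and the min-max reformulation that primal-dual methods of this kind typically exploit, assuming each f_i is convex with Fenchel conjugate f_i^*. It illustrates the problem structure only; it is not the paper's exact formulation or algorithm.

  ```latex
  % cFCCO objective and the saddle-point reformulation via Fenchel conjugates
  % f_i^* that primal-dual block-coordinate methods typically work with
  % (illustrative sketch under a convexity assumption on each f_i).
  \begin{align*}
    \min_{\mathbf{w}}\; F(\mathbf{w})
      &= \frac{1}{n}\sum_{i=1}^{n} f_i\big(g_i(\mathbf{w})\big),\\
    \min_{\mathbf{w}}\; \max_{\mathbf{y}_1,\dots,\mathbf{y}_n}\;
      &\frac{1}{n}\sum_{i=1}^{n}\Big[\langle \mathbf{y}_i,\, g_i(\mathbf{w})\rangle
        - f_i^{*}(\mathbf{y}_i)\Big].
  \end{align*}
  % A single-loop method can then alternate a stochastic mirror-ascent step on a
  % sampled block of dual variables y_i with a stochastic proximal gradient step
  % on the primal variable w.
  ```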
Publications
- Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications
  Bokun Wang and Tianbao Yang
  In Proc. of the 39th International Conference on Machine Learning (ICML), 2022.
  [paper] [arxiv (with updates)] [bib] [code] [poster] [slides]
  Abstract: This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, and deep latent variable models, all of which deserve finer analysis. Yet, existing algorithms and analyses are restricted in one aspect or another. The contribution of this paper is a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. Our key result is an improved oracle complexity, with parallel speed-up, obtained by using a moving-average-based estimator with mini-batching. Our theoretical analysis also yields new insights for improving the practical implementation by sampling batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory.
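  As a rough illustration of the moving-average estimator mentioned above, here is a short NumPy sketch of one stochastic update for an FCCO objective F(w) = (1/n) * sum_i f(g_i(w)) with scalar inner values. The function names, sampling helpers, and step sizes are hypothetical placeholders; this is not the paper's exact algorithm or its analyzed variant.

  ```python
  # Illustrative sketch: maintain a moving-average estimate u[i] of each inner
  # value g_i(w), and plug u[i] into the chain rule in place of the exact g_i(w).
  import numpy as np

  def fcco_step(w, u, sample_outer, est_inner, grad_inner, grad_outer,
                gamma=0.9, lr=0.01, batch_size=32):
      """One stochastic update for F(w) = (1/n) * sum_i f(g_i(w)), g_i scalar.

      u            : array of moving-average estimates u[i] of g_i(w)
      sample_outer : sample_outer(b) -> mini-batch of b outer indices i (hypothetical)
      est_inner    : est_inner(i, w) -> mini-batch estimate of g_i(w) (hypothetical)
      grad_inner   : grad_inner(i, w) -> estimate of the gradient of g_i at w
      grad_outer   : grad_outer(u_i) -> derivative of the outer function f at u_i
      """
      idx = sample_outer(batch_size)                 # outer mini-batch of indices
      grad = np.zeros_like(w)
      for i in idx:
          u[i] = (1.0 - gamma) * u[i] + gamma * est_inner(i, w)  # moving average
          grad += grad_outer(u[i]) * grad_inner(i, w)            # chain rule with u[i]
      w = w - lr * grad / len(idx)                   # SGD-style primal update
      return w, u
  ```

  The moving average is what keeps the inner-value estimates stable when only small inner mini-batches are sampled at each step.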
- IntSGD: Adaptive Floatless Compression of Stochastic Gradients
  Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, and Peter Richtárik
  In Proc. of the 10th International Conference on Learning Representations (ICLR), 2022 (Spotlight).
  [paper] [arxiv] [bib] [code] [poster] [slides]
  Abstract: We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors with a number known to every device and then rounding to integers. In contrast to the prior work on integer compression for SwitchML by Sapio et al. (2021), our IntSGD method is provably convergent and computationally cheaper as it estimates the scaling of vectors adaptively. Our theory shows that the iteration complexity of IntSGD matches that of SGD up to constant factors for both convex and non-convex, smooth and non-smooth functions, with and without overparameterization. Moreover, our algorithm can also be tailored for the popular all-reduce primitive and shows promising empirical performance.
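  Below is a minimal NumPy sketch of the "floatless" rounding idea described in the abstract: scale by a factor known to every worker, round randomly to integers, and communicate only integers. The adaptive choice of the scale in IntSGD is omitted; `alpha` here is a fixed, purely illustrative parameter.

  ```python
  # Sketch of unbiased integer rounding with a scale shared by all workers.
  import numpy as np

  def int_compress(grad, alpha, rng=None):
      """Scale by a shared factor `alpha`, then round each entry randomly to a
      neighboring integer so the rounding is unbiased: E[out] = alpha * grad."""
      rng = np.random.default_rng() if rng is None else rng
      scaled = alpha * grad
      low = np.floor(scaled)
      return (low + (rng.random(grad.shape) < (scaled - low))).astype(np.int64)

  def int_decompress(ints, alpha):
      """Both ends know `alpha`, so only integers ever need to be communicated."""
      return ints.astype(np.float64) / alpha

  # Usage: integer vectors from all workers can be summed with an integer
  # all-reduce, then divided once by alpha (and the number of workers) to
  # recover an approximately averaged gradient.
  g = np.array([0.31, -1.7, 2.05])
  q = int_compress(g, alpha=1000.0)
  print(int_decompress(q, alpha=1000.0))   # approximately recovers g
  ```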
Experience
- King Abdullah University of Science and Technology (KAUST)
  Remote research intern advised by Prof. Peter Richtárik, September 2020 - August 2021.
- Department of Mathematics, University of California, Davis (UC Davis)
  Research assistant advised by Prof. Shiqian Ma, July 2019 - September 2019.
Teaching
- Guest lecturer, CSCE 689: Optimization for Machine Learning (Fall 2023, instructed by Prof. Tianbao Yang), Texas A&M University.
- Teaching assistant, ECS 32B: Introduction to Data Structures (Winter 2020), University of California, Davis.
- Teaching assistant, ECS 154A: Computer Architecture (Fall 2019), University of California, Davis.
- Teaching assistant, ECS 170: Introduction to Artificial Intelligence (Spring 2019), University of California, Davis.
- Teaching assistant, ECS 271: Machine Learning and Data Discovery (Winter 2019), University of California, Davis.