Publications
Publications by category, in reverse chronological order. Generated by jekyll-scholar.
2024
- [ICLR] Energy-Based Automated Model Evaluation. Ru Peng, Heming Zou, Haobo Wang, and 3 more authors. In The Twelfth International Conference on Learning Representations, 2024.
The conventional evaluation protocols for machine learning models rely heavily on a labeled, i.i.d.-assumed testing dataset, which is not often present in real-world applications. Automated Model Evaluation (AutoEval) offers an alternative to this traditional workflow by forming a proximal prediction pipeline of the testing performance without the presence of ground-truth labels. Despite its recent successes, AutoEval frameworks still suffer from an overconfidence issue and substantial storage and computational costs. In that regard, we propose a novel measure, Meta-Distribution Energy (MDE), that allows the AutoEval framework to be both more efficient and effective. The core of MDE is to establish a meta-distribution statistic on the information (energy) associated with individual samples, and then offer a smoother representation enabled by energy-based learning. We further provide theoretical insights by connecting MDE with the classification loss. We provide extensive experiments across modalities, datasets, and architectural backbones to validate MDE's validity, together with its superiority compared with prior approaches. We also prove MDE's versatility by showing its seamless integration with large-scale models and easy adaptation to learning scenarios with noisy or imbalanced labels. (An illustrative code sketch of the energy statistic follows the BibTeX entry below.)
@inproceedings{pengEnergybasedAutomatedModel2024,
  title     = {Energy-Based {{Automated Model Evaluation}}},
  booktitle = {The {{Twelfth International Conference}} on {{Learning Representations}}},
  author    = {Peng, Ru and Zou, Heming and Wang, Haobo and Zeng, Yawen and Huang, Zenan and Zhao, Junbo},
  year      = {2024},
  urldate   = {2024-01-19},
  copyright = {All rights reserved},
  langid    = {english}
}
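To make the energy idea concrete, here is a minimal, hypothetical sketch of how a meta-distribution energy statistic could be computed from a classifier's logits on an unlabeled set. The function names (sample_energy, meta_distribution_energy) and the choice of a simple mean as the meta-statistic are illustrative assumptions, not the authors' released code; the exact estimator and its mapping onto accuracy are in the paper.

# Illustrative sketch (not the official implementation): per-sample free energy
# E(x) = -T * logsumexp(f(x) / T) and a scalar summary over an unlabeled set.
import numpy as np
from scipy.special import logsumexp

def sample_energy(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Free energy of each sample from its logits; shape (N, C) -> (N,)."""
    return -temperature * logsumexp(logits / temperature, axis=1)

def meta_distribution_energy(logits: np.ndarray, temperature: float = 1.0) -> float:
    """One statistic summarizing the energy distribution of a test set.
    A plain mean is used here purely for illustration."""
    return float(sample_energy(logits, temperature).mean())

In an AutoEval-style workflow, such a statistic would typically be computed on many synthetic meta-sets with known accuracy (e.g., corrupted copies of a validation set), a regressor would be fit from statistic to accuracy, and that regressor would then predict performance on a genuinely unlabeled target set.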
- [AAAI] MCA: Moment Channel Attention Networks. Yangbo Jiang, Zhiwei Jiang, Le Han, and 2 more authors. Proceedings of the AAAI Conference on Artificial Intelligence, Mar 2024.
Channel attention mechanisms endeavor to recalibrate channel weights to enhance the representation ability of networks. However, mainstream methods often rely solely on global average pooling as the feature squeezer, which significantly limits the overall potential of models. In this paper, we investigate the statistical moments of feature maps within a neural network. Our findings highlight the critical role of high-order moments in enhancing model capacity. Consequently, we introduce a flexible and comprehensive mechanism termed Extensive Moment Aggregation (EMA) to capture the global spatial context. Building upon this mechanism, we propose the Moment Channel Attention (MCA) framework, which efficiently incorporates multiple levels of moment-based information while minimizing additional computation costs through our Cross Moment Convolution (CMC) module. The CMC module uses a channel-wise convolution layer to capture multi-order moment information as well as cross-channel features. The MCA block is designed to be lightweight and easily integrated into a variety of neural network architectures. Experimental results on classical image classification, object detection, and instance segmentation tasks demonstrate that our proposed method achieves state-of-the-art results, outperforming existing channel attention methods. (An illustrative sketch of moment-based channel attention follows the BibTeX entry below.)
@article{jiangMCAMomentChannel2024,
  title      = {{{MCA}}: {{Moment Channel Attention Networks}}},
  shorttitle = {{{MCA}}},
  author     = {Jiang, Yangbo and Jiang, Zhiwei and Han, Le and Huang, Zenan and Zheng, Nenggan},
  year       = {2024},
  month      = mar,
  journal    = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume     = {38},
  number     = {3},
  pages      = {2579--2588},
  issn       = {2374-3468},
  doi        = {10.1609/aaai.v38i3.28035},
  urldate    = {2024-05-07},
  copyright  = {Copyright (c) 2024 Association for the Advancement of Artificial Intelligence},
  langid     = {english}
}
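As a rough illustration of moment-based channel attention, the following hypothetical PyTorch module squeezes each channel into three moments (mean, variance, skewness), fuses them with a lightweight 1D convolution over the channel axis, and gates the input feature map. The module name, kernel size, and the specific set of moments are assumptions for illustration; the official MCA/CMC design is described in the paper.

# Hypothetical sketch of moment-based channel attention (not the official MCA code).
import torch
import torch.nn as nn

class MomentChannelAttention(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # One small 1D conv fuses the stacked moments across channels.
        self.conv = nn.Conv1d(3, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        feat = x.flatten(2)                                # (B, C, H*W)
        mean = feat.mean(dim=2)
        var = feat.var(dim=2, unbiased=False)
        skew = ((feat - mean.unsqueeze(2)) ** 3).mean(dim=2) / (var + 1e-5).pow(1.5)
        moments = torch.stack([mean, var, skew], dim=1)    # (B, 3, C)
        weights = self.gate(self.conv(moments))            # (B, 1, C)
        return x * weights.squeeze(1).unsqueeze(-1).unsqueeze(-1)

The only learnable parameters here are those of the small Conv1d, in line with the paper's emphasis on adding minimal extra computation.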
- [ICML] Unbiased Multi-Label Learning from Crowdsourced Annotations. Mingxuan Xia, Zenan Huang, Runze Wu, and 4 more authors. In The Forty-first International Conference on Machine Learning, 2024.
This work studies the novel Crowdsourced Multi-Label Learning (CMLL) problem, where each instance is related to multiple true labels but the model only receives unreliable labels from different annotators. Although a few Crowdsourced Multi-Label Inference (CMLI) methods have addressed learning with multiple labels under crowdsourcing, they pay more attention to directly identifying true labels given crowdsourced ones and lack theoretical guarantees for the learned multi-label predictor. In this paper, by excavating the generation process of crowdsourced labels, we establish the first unbiased risk estimator for CMLL based on the crowdsourced transition matrices. To facilitate transition matrix estimation, we upgrade our unbiased risk estimator by aggregating crowdsourced labels and transition matrices from all annotators while guaranteeing its theoretical characteristics. Integrating with the unbiased risk estimator, we further propose a decoupled autoencoder framework to exploit label correlations and boost performance. We also provide a generalization error bound to ensure the convergence of the empirical risk estimator. Experiments on various CMLL scenarios demonstrate the effectiveness of our proposed method. (An illustrative loss-correction sketch follows the BibTeX entry below.)
@inproceedings{xiaUnbiasedMultiLabelLearning2024,
  title     = {Unbiased Multi-Label Learning from Crowdsourced Annotations},
  booktitle = {The {{Forty-first International Conference}} on {{Machine Learning}}},
  author    = {Xia, Mingxuan and Huang, Zenan and Wu, Runze and Lyu, Gengyu and Zhao, Junbo and Chen, Gang and Wang, Haobo},
  year      = {2024}
}
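For intuition about what a risk estimator built on transition matrices looks like, here is a generic backward loss-correction sketch for a single binary label and one annotator's known 2x2 transition matrix. This is the classic correction idea, not the paper's estimator (which aggregates multiple labels and annotators); the function name and tensor shapes are illustrative assumptions.

# Hypothetical sketch: backward loss correction with a known transition matrix
# T, where T[i, j] = P(annotator reports j | true label is i).
import torch
import torch.nn.functional as F

def backward_corrected_loss(logits: torch.Tensor,        # (B, 2) binary-class logits
                            noisy_labels: torch.Tensor,  # (B,) long tensor in {0, 1}
                            T: torch.Tensor) -> torch.Tensor:  # (2, 2)
    log_probs = F.log_softmax(logits, dim=1)              # (B, 2)
    loss_matrix = -log_probs                               # loss if the true label were 0 or 1
    # Backward correction: apply T^{-1} to each per-sample loss vector ...
    corrected = loss_matrix @ torch.linalg.inv(T).T        # (B, 2)
    # ... then pick the entry of the observed noisy label.
    return corrected.gather(1, noisy_labels.view(-1, 1)).mean()

In expectation over the annotator's noise, the corrected loss equals the loss on the clean label, which is what makes the resulting risk estimator unbiased.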
- [AAAI] A Separation and Alignment Framework for Black-box Domain Adaptation. Mingxuan Xia, Junbo Zhao, Gengyu Lyu, and 4 more authors. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
Black-box domain adaptation (BDA) aims to learn a classifier on an unsupervised target domain while assuming only access to black-box predictors trained on unseen source data. Although a few BDA approaches have demonstrated promise by manipulating the transferred labels, they largely overlook the rich underlying structure in the target domain. To address this problem, we introduce a novel separation and alignment framework for BDA. Firstly, we locate the well-adapted samples via loss ranking and a flexible confidence-thresholding procedure. Then, we introduce a novel graph contrastive learning objective that aligns under-adapted samples to their local neighbors and well-adapted samples. Lastly, adaptation is achieved by a nearest-centroid-augmented objective that exploits the clustering effect in the feature space. Extensive experiments demonstrate that our proposed method outperforms the best baselines on benchmark datasets, e.g., improving the averaged per-class accuracy by 4.1% on the VisDA dataset. The source code is available at: https://github.com/MingxuanXia/SEAL. (An illustrative sketch of the separation step follows the BibTeX entry below.)
@inproceedings{xiaSeparationAlignmentFramework2024,
  title     = {A {{Separation}} and {{Alignment Framework}} for {{Black-box Domain Adaptation}}},
  booktitle = {Proceedings of the {{AAAI Conference}} on {{Artificial Intelligence}}},
  author    = {Xia, Mingxuan and Zhao, Junbo and Lyu, Gengyu and Huang, Zenan and Hu, Tianlei and Chen, Gang and Wang, Haobo},
  year      = {2024},
  copyright = {All rights reserved}
}
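A simplified sketch of the separation step described above: target samples are ranked by a loss proxy computed against the black-box predictor's own pseudo-labels, and only low-loss, high-confidence samples are treated as well-adapted. The thresholds, the loss proxy, and the function name are illustrative assumptions rather than the released SEAL code.

# Hypothetical sketch of the "separation" step for black-box domain adaptation.
import numpy as np

def separate(probs: np.ndarray, keep_ratio: float = 0.5, conf_thresh: float = 0.9):
    """probs: (N, C) soft predictions returned by the black-box source model."""
    pseudo = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    # Cross-entropy against the model's own pseudo-label as a loss proxy.
    loss = -np.log(probs[np.arange(len(probs)), pseudo] + 1e-12)
    ranked = np.argsort(loss)                         # ascending: smallest loss first
    candidates = ranked[: int(keep_ratio * len(probs))]
    well_adapted = candidates[confidence[candidates] >= conf_thresh]
    under_adapted = np.setdiff1d(np.arange(len(probs)), well_adapted)
    return well_adapted, under_adapted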
2023
- [IEEE TIP] Discriminative Radial Domain Adaptation. Zenan Huang, Jun Wen, Siheng Chen, and 2 more authors. IEEE Transactions on Image Processing, 2023.
Domain adaptation methods reduce domain shift typically by learning domain-invariant features. Most existing methods are built on distribution matching, e.g., adversarial domain adaptation, which tends to corrupt feature discriminability. In this paper, we propose Discriminative Radial Domain Adaptation (DRDA), which bridges source and target domains via a shared radial structure. It is motivated by the observation that as the model is trained to be progressively discriminative, features of different categories expand outwards in different directions, forming a radial structure. We show that transferring such an inherently discriminative structure enhances feature transferability and discriminability simultaneously. Specifically, we represent each domain with a global anchor and each category with a local anchor to form a radial structure, and reduce domain shift via structure matching. Structure matching consists of two parts, namely an isometric transformation to align the structure globally and local refinement to match each category. To enhance the discriminability of the structure, we further encourage samples to cluster close to their corresponding local anchors based on optimal-transport assignment. Experimenting extensively on multiple benchmarks, our method is shown to consistently outperform state-of-the-art approaches on varied tasks, including typical unsupervised domain adaptation, multi-source domain adaptation, domain-agnostic learning, and domain generalization. (An illustrative anchor-based sketch follows the BibTeX entry below.)
@article{huangDiscriminativeRadialDomain2023,
  ids       = {huangDiscriminativeRadialDomain2023a},
  title     = {Discriminative {{Radial Domain Adaptation}}},
  author    = {Huang, Zenan and Wen, Jun and Chen, Siheng and Zhu, Linchao and Zheng, Nenggan},
  year      = {2023},
  journal   = {IEEE Transactions on Image Processing},
  pages     = {1--1},
  issn      = {1941-0042},
  doi       = {10.1109/TIP.2023.3235583},
  copyright = {All rights reserved}
}
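To illustrate the radial structure, the following toy sketch builds a global anchor per domain (the mean feature) and a local anchor per category (the class mean), then matches the per-category offsets between source and target. The isometric transformation and the optimal-transport assignment of the actual method are omitted, every class is assumed to appear in the batch, and all names are illustrative.

# Hypothetical sketch: global/local anchors of a radial structure and a naive
# structure-matching loss between two domains.
import torch

def radial_anchors(features: torch.Tensor, labels: torch.Tensor, num_classes: int):
    global_anchor = features.mean(dim=0)                                   # (D,)
    local_anchors = torch.stack(
        [features[labels == c].mean(dim=0) for c in range(num_classes)])   # (C, D)
    return global_anchor, local_anchors

def structure_matching_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    g_s, l_s = radial_anchors(src_feats, src_labels, num_classes)
    g_t, l_t = radial_anchors(tgt_feats, tgt_pseudo, num_classes)
    # Match each category's offset from its own domain's global anchor.
    return ((l_s - g_s) - (l_t - g_t)).pow(2).sum(dim=1).mean()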
- [ICCV] iDAG: Invariant DAG Searching for Domain Generalization. Zenan Huang, Haobo Wang, Junbo Zhao, and 1 more author. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
Existing machine learning (ML) models are often fragile in open environments because the data distribution frequently shifts. To address this problem, domain generalization (DG) aims to explore underlying invariant patterns for stable prediction across domains. In this work, we first characterize that this failure of conventional ML models in DG is attributable to an inadequate identification of causal structures. We further propose a novel invariant Directed Acyclic Graph (dubbed iDAG) searching framework that attains an invariant graphical relation as a proxy to the causal structure of the intrinsic data-generating process. To enable tractable computation, iDAG solves a constrained optimization objective built on a set of representative class-conditional prototypes. Additionally, we integrate a hierarchical contrastive learning module, which poses a strong clustering effect, for enhanced prototypes as well as more stable prediction. Extensive experiments on synthetic and real-world benchmarks demonstrate that iDAG outperforms the state-of-the-art approaches, verifying the superiority of causal structure identification for DG. The code of iDAG is available at https://github.com/lccurious/iDAG. (An illustrative acyclicity-constraint sketch follows the BibTeX entry below.)
@inproceedings{huangIDAGInvariantDAG2023,
  title      = {{{iDAG}}: {{Invariant DAG Searching}} for {{Domain Generalization}}},
  shorttitle = {{{iDAG}}},
  booktitle  = {Proceedings of the {{IEEE}}/{{CVF International Conference}} on {{Computer Vision}}},
  author     = {Huang, Zenan and Wang, Haobo and Zhao, Junbo and Zheng, Nenggan},
  year       = {2023},
  pages      = {19169--19179},
  urldate    = {2023-11-28},
  copyright  = {All rights reserved},
  langid     = {english}
}
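iDAG casts the DAG search as constrained optimization. A standard way to make the acyclicity constraint differentiable is the NOTEARS-style penalty h(W) = tr(exp(W ∘ W)) - d, sketched below on a learnable adjacency matrix; this is the textbook penalty and is not claimed to be the paper's exact formulation.

# Hypothetical sketch: a differentiable acyclicity penalty that is zero
# if and only if the weighted graph W is a DAG.
import torch

def acyclicity_penalty(W: torch.Tensor) -> torch.Tensor:
    d = W.shape[0]
    return torch.trace(torch.matrix_exp(W * W)) - d

# Typical usage inside an augmented-Lagrangian loop (fit_loss is a placeholder):
#   loss = fit_loss(prototypes, W) + lam * acyclicity_penalty(W) \
#          + 0.5 * rho * acyclicity_penalty(W) ** 2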
- [IJCAI] Latent Processes Identification From Multi-View Time Series. Zenan Huang, Haobo Wang, Junbo Zhao, and 1 more author. In the Thirty-Second International Joint Conference on Artificial Intelligence, Aug 2023.
Understanding the dynamics of time series data typically requires identifying the unique latent factors of data generation, a.k.a. latent processes identification. Driven by the independence assumption, existing works have made great progress in handling single-view data. However, extending them to multi-view time series data is a nontrivial problem because of two main challenges: (i) the complex data structure, such as temporal dependency, can violate the independence assumption; (ii) the factors from different views generally overlap and are hard to aggregate into a complete set. In this work, we propose a novel framework, MuLTI, that employs contrastive learning to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables through an optimal transport formulation. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series. The code is available at https://github.com/lccurious/MuLTI. (An illustrative optimal-transport alignment sketch follows the BibTeX entry below.)
@inproceedings{huangLatentProcessesIdentification2023,
  title     = {Latent {{Processes Identification From Multi-View Time Series}}},
  booktitle = {Thirty-{{Second International Joint Conference}} on {{Artificial Intelligence}}},
  author    = {Huang, Zenan and Wang, Haobo and Zhao, Junbo and Zheng, Nenggan},
  year      = {2023},
  month     = aug,
  volume    = {4},
  pages     = {3848--3856},
  issn      = {1045-0823},
  doi       = {10.24963/ijcai.2023/428},
  urldate   = {2023-08-27},
  copyright = {All rights reserved},
  langid    = {english}
}
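The permutation mechanism that merges overlapped latent variables can be approximated with entropic optimal transport. Below is a plain Sinkhorn sketch that produces a soft permutation between the latent dimensions of two views from a cost matrix; the choice of negative absolute correlation as the cost, the uniform marginals, and the hyperparameters are illustrative assumptions rather than MuLTI's exact formulation.

# Hypothetical sketch: soft alignment of latent dimensions across two views
# via Sinkhorn iterations on a correlation-based cost matrix.
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.05, n_iters: int = 100) -> torch.Tensor:
    n, m = cost.shape
    K = torch.exp(-cost / eps)             # Gibbs kernel
    a = torch.full((n,), 1.0 / n)          # uniform source marginal
    b = torch.full((m,), 1.0 / m)          # uniform target marginal
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return torch.diag(u) @ K @ torch.diag(v)   # soft permutation / transport plan

def correlation_cost(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """z1, z2: (N, d) latent codes from two views; cost = -|corr| per dimension pair."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    return -((z1.T @ z2) / z1.shape[0]).abs()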
2021
- [IEEE NER] Improving Movement-Related Cortical Potential Detection at the EEG Source Domain. Chenyang Li, Haonan Guan, Zenan Huang, and 3 more authors. In 2021 10th International IEEE/EMBS Conference on Neural Engineering (NER), May 2021.
@inproceedings{liImprovingMovementRelatedCortical2021,
  title     = {Improving {{Movement-Related Cortical Potential Detection}} at the {{EEG Source Domain}}},
  booktitle = {2021 10th {{International IEEE}}/{{EMBS Conference}} on {{Neural Engineering}} ({{NER}})},
  author    = {Li, Chenyang and Guan, Haonan and Huang, Zenan and Chen, Weidong and Li, Jianhua and Zhang, Shaomin},
  year      = {2021},
  month     = may,
  pages     = {214--217},
  issn      = {1948-3554},
  doi       = {10.1109/NER49283.2021.9441169},
  urldate   = {2023-12-21},
  copyright = {All rights reserved}
}
- Representation of Drosophila Larval Behaviors by Muscle Activity Patterns. Jinrun Zhou, Zenan Huang, Xinhang Li, and 7 more authors. bioRxiv preprint, Nov 2021.
@misc{zhouRepresentationDrosophilaLarval2021,
  title         = {Representation of {{Drosophila}} Larval Behaviors by Muscle Activity Patterns},
  author        = {Zhou, Jinrun and Huang, Zenan and Li, Xinhang and Song, Zhiying and Sun, Yixuan and Ping, Junyu and Chen, Xiaopeng and Fei, Peng and Zheng, Nenggan and Gong, Zhefeng},
  year          = {2021},
  month         = nov,
  pages         = {2021.11.26.470133},
  publisher     = {bioRxiv},
  doi           = {10.1101/2021.11.26.470133},
  urldate       = {2022-09-27},
  archiveprefix = {bioRxiv},
  primaryclass  = {New Results},
  copyright     = {{\copyright} 2021, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/},
  langid        = {english}
}
2020
- Interventional Domain Adaptation. Jun Wen, Changjian Shui, Kun Kuang, and 4 more authors. arXiv preprint arXiv:2011.03737, Nov 2020.
@misc{wenInterventionalDomainAdaptation2020,
  title         = {Interventional {{Domain Adaptation}}},
  author        = {Wen, Jun and Shui, Changjian and Kuang, Kun and Yuan, Junsong and Huang, Zenan and Gong, Zhefeng and Zheng, Nenggan},
  year          = {2020},
  month         = nov,
  number        = {arXiv:2011.03737},
  eprint        = {2011.03737},
  primaryclass  = {cs, stat},
  publisher     = {arXiv},
  doi           = {10.48550/arXiv.2011.03737},
  urldate       = {2022-10-27},
  archiveprefix = {arXiv},
  copyright     = {All rights reserved}
}