Ruiyang Xu   徐瑞阳

Senior Undergraduate in CS

Shanghai Jiao Tong University (SJTU)

Minhang District, Shanghai, China

About

I am a senior CS student at Shanghai Jiao Tong University, advised by Prof. Xie Chen.

My research focuses on Multimodal Large Language Models (MLLMs), specifically multimodal reasoning and understanding. As a Research Intern at Alibaba Qwen, I have contributed to Qwen3-Omni and Qwen3.5-Omni, focusing on detailed captioning and agents.

In my spare time, I listen to Britpop and J-rock.

News

  • 2026.03 Released Qwen3.5-Omni, featuring superior performance in audio-visual detailed captioning.
  • 2026.03 Check out Omni-Cloze, a novel benchmark dedicated to multimodal detailed captioning.
  • 2026.01 Omni-Captioner accepted at ICLR 2026.
  • 2026.01 Join our Audio Reasoning Challenge at Interspeech 2026. Check the project page.
  • 2025.09 Qwen3-Omni released. Check the blog.
  • 2025.09 MMAR benchmark accepted at NeurIPS 2025.
  • 2025.05 SLAM-Omni accepted to Findings of ACL 2025.

Selected Publications

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents
Ziyang Ma, Ruiyang Xu, Yinghao Ma, Chao-Han Huck Yang, Bohan Li, Jaeyeon Kim, Jin Xu, Jinyu Li, Carlos Busso, Kai Yu, Eng Siong Chng, Xie Chen
arXiv
2026.02
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Ziyang Ma*, Ruiyang Xu*, Zhenghao Xing*, Yunfei Chu, Yuxuan Wang, Jinzheng He, Jin Xu, Pheng-Ann Heng, Kai Yu, Junyang Lin, Eng Siong Chng, Xie Chen
ICLR 2026

* Equal contribution

2025.10
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, Ruibin Yuan, Zhisheng Zheng, Ziya Zhou, Haina Zhu, Wei Xue, Emmanouil Benetos, Kai Yu, Eng-Siong Chng, Xie Chen
NeurIPS 2025
2025.05
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen
Findings of ACL 2025
2025.07
SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing
Ziyang Ma, Guanrou Yang, Wenxi Chen, Zhifu Gao, Yexing Du, Xiquan Li, Zhisheng Zheng, Haina Zhu, Jianheng Zhuo, Zheshu Song, Ruiyang Xu, Tianrui Wang, Yifan Yang, Yanqiao Zhu, Zhikang Niu, Liumeng Xue, Yinghao Ma, Ruibin Yuan, Shiliang Zhang, Kai Yu, Eng Siong Chng, Xie Chen
IEEE JSTSP
2026.01

Education

Shanghai Jiao Tong University
B.S. in Computer Science
2022 — Present