Ruiyang Xu 徐瑞阳
Senior Undergraduate in CS
Shanghai Jiao Tong University (SJTU)
Minhang District, Shanghai, China
xry2022 [at] sjtu.edu.cn
About
I am a senior CS student at Shanghai Jiao Tong University, advised by Prof. Xie Chen.
My research focuses on Multimodal Large Language Models (MLLMs), specifically multimodal reasoning and understanding. As a Research Intern at Alibaba Qwen, I have contributed to Qwen3-Omni and Qwen3.5-Omni, focusing on detailed captioning and agents.
In my spare time, I listen to Britpop and J-rock.
News
- 2026.03 Released Qwen3.5-Omni, featuring superior performance in audio-visual detailed captioning.
- 2026.03 Check out Omni-Cloze, a novel benchmark dedicated to multimodal detailed captioning.
- 2026.01 Omni-Captioner is accepted at ICLR 2026.
- 2026.01 Join our Audio Reasoning Challenge at Interspeech 2026. Check the project page.
- 2025.09 Qwen3-Omni is released. Check the blog.
- 2025.09 MMAR benchmark is accepted at NeurIPS 2025.
- 2025.05 SLAM-Omni is accepted to Findings of ACL 2025.
Selected Publications
The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents
2026.02
arXiv
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
2025.10
ICLR 2026
* Equal contribution
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
2025.05
NeurIPS 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
2025.07
Findings of ACL 2025
Education
Shanghai Jiao Tong University
B.S. in Computer Science
2022 — Present