Senior Undergraduate in Computer Science @ SJTU
Research: Multimodal Learning
Email: xry2022 [at] sjtu.edu.cn
I am a senior CS student at Shanghai Jiao Tong University, advised by Prof. Xie Chen.
My research focuses on fine-grained perception in multimodal LLMs, particularly through audio-visual synergy. I am currently a Research Intern at Alibaba Qwen, where I have contributed to the development of Qwen3-Omni, working on audio-visual captioning and agents.
In my spare time, I listen to Britpop and J-rock.
* Equal contribution