About

Research Scientist, Meta AI

Hello! I am a research scientist at Meta, building multi-modal understanding and generation models.

Previously, I was a senior ML scientist at Picsart. I received my Ph.D. in Computer Science from Leibniz University Hannover, advised by Prof. Michael Yang and Prof. Bodo Rosenhahn.

News

  • Nov 2025: We release TUNA, a multimodal understanding and generation model!
  • Jun 2025: Our GenAI solution for Ads is highlighted at Cannes Lions 2025.
  • Mar 2025: Paper on compositional image generation accepted to IJCV.
  • Feb 2025: Paper on virtual try-on accepted to CVPR.

Selected Publications

Generative Models

  • TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
    Zhiheng Liu, Weiming Ren, Haozhe Liu, Zijian Zhou, Shoufa Chen, Haonan Qiu, Xiaoke Huang, Zhaochong An, Fanny Yang, Aditya Patel, Viktar Atliha, Tony Ng, Xiao Han, Chuyan Zhu, Chenyang Zhang, Ding Liu, Juan-Manuel Perez-Rua, Sen He, Jürgen Schmidhuber, Wenhu Chen, Ping Luo, Wei Liu, Tao Xiang, Jonas Schult, Yuren Cong. Preprint, 2025.
  • HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
    Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua. Preprint, 2025.
  • Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
    Haozhe Liu, Ding Liu, Mingchen Zhuge, Zijian Zhou, Tian Xie, Sen He, Yukang Yang, Shuming Liu, Yuren Cong, Jiadong Guo, Hongyu Xu, Ke Xu, Kam-Woh Ng, Juan C. Pérez, Juan-Manuel Pérez-Rúa, Tao Xiang, Wei Liu, Shikun Liu, Jürgen Schmidhuber. Preprint, 2025.
  • Scaling Zero-Shot Reference-to-Video Generation
    Zijian Zhou, Shikun Liu, Haozhe Liu, Haonan Qiu, Zhaochong An, Weiming Ren, Zhiheng Liu, Xiaoke Huang, Kam Woh Ng, Tian Xie, Xiao Han, Yuren Cong, Hang Li, Chuyan Zhu, Aditya Patel, Tao Xiang, Sen He. Preprint, 2025.
  • Attribute-Centric Compositional Text-to-Image Generation
    Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang. IJCV, 2025.
  • Learning Flow Fields in Attention for Controllable Person Image Generation
    Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Perez-Rua, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He. CVPR, 2025.
  • FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
    Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He. ICLR, 2024.
  • GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
    Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua. CVPR, 2024.

Scene Understanding

  • SPAN: Learning Similarity between Scene Graphs and Images with Transformers
    Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang. PAMI, 2025.
  • RelTR: Relation Transformer for Scene Graph Generation
    Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang. PAMI, 2023.
  • Spatial-temporal Transformer for Dynamic Scene Graph Generation
    Yuren Cong, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn, Michael Ying Yang. ICCV, 2021.
  • NODIS: Neural Ordinary Differential Scene Understanding
    Yuren Cong, Hanno Ackermann, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn. ECCV, 2020.

Embodied AI

  • Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change
    Mariia Khan, Yue Qiu, Yuren Cong, Bodo Rosenhahn, David Suter, Jumana Abu-Khalaf. IROS, 2024.
  • WorldAfford: Affordance Grounding Based on Natural Language Instructions
    Changmao Chen, Yuren Cong, Zhen Kan. ICTAI, 2024.

Contact