Yuren Cong


I am now working as an AI research scientist at Meta, focusing on image and video generative models. Previously, I was a Ph.D. student at the Institute for Information Processing at Leibniz University Hanover, working on scene understanding and generative models. I was advised by Prof. Bodo Rosenhahn and Prof. Michael Ying Yang.

My research interests lie in computer vision and graphics:

  • Image / Video Understanding
  • Image / Video Generation
  • Multimodal Learning
  • Embodied AI

Please feel free to contact me by email with any questions or to discuss collaboration!

Scholar / CV / Mail / Twitter / GitHub / LinkedIn

Publications

Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Perez-Rua, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He
Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Hugging Face
@article{zhou2024learning,
      title={Learning Flow Fields in Attention for Controllable Person Image Generation},
      author={Zhou, Zijian and Liu, Shikun and Han, Xiao and Liu, Haozhe and Ng, Kam Woh and Xie, Tian and Cong, Yuren and Li, Hang and Xu, Mengmeng and P{\'e}rez-R{\'u}a, Juan-Manuel and others},
      journal={arXiv preprint arXiv:2412.08486},
      year={2024}
    }
Attribute-Centric Compositional Text-to-Image Generation
Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang
International Journal of Computer Vision (IJCV), 2025
@article{cong2023attribute,
      title={Attribute-centric compositional text-to-image generation},
      author={Cong, Yuren and Min, Martin Renqiang and Li, Li Erran and Rosenhahn, Bodo and Yang, Michael Ying},
      journal={International Journal of Computer Vision},
      year={2025}
    }
Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change
Mariia Khan, Yue Qiu, Yuren Cong, Bodo Rosenhahn, David Suter, Jumana Abu-Khalaf
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
@inproceedings{khan2024indoor,
      title={Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change},
      author={Khan, Mariia and Qiu, Yue and Cong, Yuren and Rosenhahn, Bodo and Suter, David and Abu-Khalaf, Jumana},
      booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      pages={9777--9783},
      year={2024},
      organization={IEEE}
    }
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Project Page
@misc{chen2023gentron,
      title={GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation},
      author={Shoufa Chen and Mengmeng Xu and Jiawei Ren and Yuren Cong and Sen He and Yanping Xie and Animesh Sinha and Ping Luo and Tao Xiang and Juan-Manuel Perez-Rua},
      year={2023},
      eprint={2312.04557},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
International Conference on Learning Representations (ICLR), 2024
Project Page / Code / Video
@article{cong2023flatten,
      title={FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing},
      author={Cong, Yuren and Xu, Mengmeng and Simon, Christian and Chen, Shoufa and Ren, Jiawei and Xie, Yanping and Perez-Rua, Juan-Manuel and Rosenhahn, Bodo and Xiang, Tao and He, Sen},
      journal={arXiv preprint arXiv:2310.05922},
      year={2023}
    }
Learning Similarity between Scene Graphs and Images with Transformers
Yuren Cong, Wentong Liao, Jiawei Ren, Bodo Rosenhahn, Michael Ying Yang
arXiv.org (under review), 2023
Project Page / Code
@article{cong2023learning,
      title={Learning Similarity between Scene Graphs and Images with Transformers},
      author={Cong, Yuren and Liao, Wentong and Rosenhahn, Bodo and Yang, Michael Ying},
      journal={arXiv preprint arXiv:2304.00590},
      year={2023}
    }
SSGVS: Semantic Scene Graph-to-Video Synthesis
Yuren Cong, Jinhui Yi, Bodo Rosenhahn, Michael Ying Yang
Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023
Supplemental / Code
@InProceedings{Cong_2023_CVPR,
        author    = {Cong, Yuren and Yi, Jinhui and Rosenhahn, Bodo and Yang, Michael Ying},
        title     = {SSGVS: Semantic Scene Graph-to-Video Synthesis},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
        month     = {June},
        year      = {2023},
        pages     = {2555--2565}
    }
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong, Michael Ying Yang, Bodo Rosenhahn
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2023
Code / Colab
@article{cong2023reltr,
      title={{RelTR}: Relation Transformer for Scene Graph Generation},
      author={Cong, Yuren and Yang, Michael Ying and Rosenhahn, Bodo},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2023},
      publisher={IEEE}
    }
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn, Michael Ying Yang
Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Supplemental / Code / Video
@InProceedings{Cong_2021_ICCV,
        author    = {Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
        title     = {Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
        booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
        month     = {October},
        year      = {2021},
        pages     = {16372--16382}
    }
NODIS: Neural Ordinary Differential Scene Understanding
Yuren Cong, Hanno Ackermann, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
Proc. of the European Conference on Computer Vision (ECCV), 2020
Code / Video
@InProceedings{cong2020nodis,
      title={{NODIS}: Neural Ordinary Differential Scene Understanding},
      author={Cong, Yuren and Ackermann, Hanno and Liao, Wentong and Yang, Michael Ying and Rosenhahn, Bodo},
      booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XX 16},
      pages={636--653},
      year={2020},
      organization={Springer}
    }