Hi, I'm Xiaofeng Li (Sheldon)

HKUST(GZ) · Image Caption Evaluation · Generation Quality Evaluation

I research how to reliably evaluate generative models — from image caption metrics to VLM-as-judge approaches.
GitHub

About Me

I am a researcher at HKUST(GZ) working on the evaluation of generative models. My current focus is on Image Caption Evaluation and Generation Quality Evaluation: how to measure, reliably and comprehensively, what makes generated content good.

🔬 Current Research

Image Caption Evaluation. I recently reproduced PGV3's CapsBench and am running systematic ablation experiments to identify its failure modes. Existing caption evaluation metrics correlate poorly with human judgment on fine-grained aspects like hallucination, style, and factual consistency. My goal is to design a more comprehensive and accurate caption evaluation framework.

Generation Quality Evaluation. I curate awesome-generation-model-evaluation, a reading list tracking 20+ papers (2025–2026) across benchmarks, metrics, VLM-based judges, and domain-specific evaluation. Contributions are welcome!

Publications

  • IP-Bench: Benchmark for Image Protection Methods in Image-to-Video Generation Scenarios
    Xiaofeng Li, Leyi Sheng, Zhen Sun, Zongmin Zhang, Jiaheng Wei, Xinlei He · arXiv 2026
    [paper] [code]

  • Programming guide for solving constraint satisfaction problems with tensor networks
    [paper] [code]