Hi, I'm Xiaofeng Li (Sheldon)
HKUST(GZ) · Image Caption Evaluation · Generation Quality Evaluation


About Me
I am a researcher at HKUST(GZ) working on the evaluation of generative models. My current focus is Image Caption Evaluation and Generation Quality Evaluation: how to measure, reliably and comprehensively, what makes generated content good.
🔬 Current Research
Image Caption Evaluation. Existing caption evaluation metrics correlate poorly with human judgment on fine-grained aspects such as hallucination, style, and factual consistency. I recently reproduced PGV3's CapsBench and am running systematic ablation experiments to identify its failure modes. My goal is to design a more comprehensive and accurate caption evaluation framework.
Generation Quality Evaluation. I curate awesome-generation-model-evaluation, a survey tracking 20+ papers (2025–2026) across benchmarks, metrics, VLM-based judges, and domain-specific evaluation. Contributions welcome!