When assessing image generation and editing models, running multiple models across multiple benchmarks efficiently and in a single pass remains a persistent challenge. Conventional workflows often ...