Founded by former OpenAI scientist Andrew Carr and former Google creative director Jonathan Jarvis, Cartwheel is bridging the gap between 2D vision and 3D execution.
NL2Repo-Bench provides 104 Python library generation tasks where an agent receives a natural-language spec and must build a complete, installable repo from scratch. Evaluation is execution-based: the ...
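The text cuts off before it describes the evaluation protocol. The block below is therefore only a minimal sketch of what an execution-based check could look like. It assumes each generated repo is first installed with `pip install .` and then scored by running a held-out test command. The names `run`, `evaluate_repo`, `TaskResult`, and `pass_rate` are hypothetical, and so is the binary pass/fail scoring. None of them come from NL2Repo-Bench.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TaskResult:
    # Hypothetical per-task record; the real benchmark may score differently.
    task_id: str
    installed: bool
    tests_passed: bool


def run(cmd: list[str], cwd: Path, timeout: int = 600) -> bool:
    """Run a command in cwd and report whether it exited with status 0."""
    try:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable command counts as a failure.
        return False
    return proc.returncode == 0


def evaluate_repo(task_id: str, repo_dir: Path, test_cmd: list[str]) -> TaskResult:
    # Assumed step 1: the generated repo must install cleanly from its own metadata.
    installed = run([sys.executable, "-m", "pip", "install", "."], repo_dir)
    # Assumed step 2: run the held-out tests only if installation succeeded.
    tests_passed = installed and run(test_cmd, repo_dir)
    return TaskResult(task_id, installed, tests_passed)


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks whose tests passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.tests_passed for r in results) / len(results)
```

As a usage example, `evaluate_repo("task-001", Path("out/task-001"), [sys.executable, "-m", "pytest", "-q", "tests_hidden"])` would install one generated repo and then run an assumed hidden test directory. The task ID and both paths are illustrative. Gating the tests on a successful install mirrors the requirement that the repo be installable, not just importable from its source tree.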