Abstract: Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models. However, these models typically require extensive document pre-training ...
Add working directories to your OpenCode session — inspired by Claude Code's /add-dir command. When you need an agent to read, edit, or search files outside the current project, this plugin grants ...
Abstract: Semantic segmentation of remote sensing (RS) images is a challenging task due to characteristics such as diversity, complexity, and massiveness. Current research ...
Microsoft unveiled MAI-Image-2, a new AI model that achieved third place on the Arena.ai text-to-image generation leaderboard. PCWorld reports that the model excels at creating photorealistic images ...
VS Code 1.112 agents can now read image files from disk. The image carousel can open generated or selected images in chat. My PoC used three leaderboard screenshots to summarize model trade-offs.