UI2Code^N: A Visual Language Model for Test-time Scalable Interactive UI-to-code Generation

Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang. No Venue, 2025

[Code] [Paper]
Tags: Compositional Generalization, Fine Tuning, Has Code, Image Text Integration, Interdisciplinary Approaches, LLM For Code, Model Architecture, Multimodal Semantic Representation, Productivity Enhancement, Reinforcement Learning, Visual Contextualization

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
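
To make the interactive paradigm concrete, the sketch below shows one plausible shape of a multi-turn UI-to-code loop with test-time scaling: generate code from a target screenshot, render it, and feed the rendering back for polishing over several turns. This is not the authors' implementation; the `generate`, `render`, and `polish` callables and the `max_turns` budget are hypothetical stand-ins that would wrap the released UI2Code^N model and a headless-browser screenshotter in practice.

```python
# Minimal sketch of an interactive UI-to-code loop with test-time scaling.
# All helpers are hypothetical placeholders, not the UI2Code^N API.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    code: str          # HTML/CSS produced at this turn
    screenshot: bytes  # rendering of that code, used as visual feedback


def interactive_ui_to_code(
    target_screenshot: bytes,
    generate: Callable[[bytes], str],            # target screenshot -> initial code
    render: Callable[[str], bytes],              # code -> screenshot of the rendered page
    polish: Callable[[bytes, bytes, str], str],  # (target, current shot, code) -> revised code
    max_turns: int = 4,                          # test-time scaling knob: more turns, more compute
) -> list[Turn]:
    """Generate code for a target UI, then iteratively polish it using visual feedback."""
    code = generate(target_screenshot)
    history = [Turn(code, render(code))]
    for _ in range(max_turns - 1):
        current = history[-1]
        revised = polish(target_screenshot, current.screenshot, current.code)
        if revised == current.code:  # model deems the rendering close enough; stop early
            break
        history.append(Turn(revised, render(revised)))
    return history


if __name__ == "__main__":
    # Trivial stubs so the skeleton runs end to end; swap in real model/render calls.
    demo = interactive_ui_to_code(
        target_screenshot=b"target-png-bytes",
        generate=lambda shot: "<html><body><h1>draft</h1></body></html>",
        render=lambda code: f"render-of-{len(code)}-chars".encode(),
        polish=lambda target, shot, code: code,  # no-op polish ends the loop after one turn
        max_turns=4,
    )
    print(f"finished after {len(demo)} turn(s)")
```

In this framing, `max_turns` is the test-time scaling axis: spending more polishing turns (and thus more inference compute) gives the model more chances to correct visual discrepancies against the target UI.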

Similar Work