Unlocking The Conversion Of Web Screenshots Into HTML Code With The WebSight Dataset

Hugo Laurençon, Léo Tronchon, Victor Sanh. No Venue 2024

Tags: Datasets, Efficiency

Using vision-language models (VLMs) in web development is a promising way to increase efficiency and unlock no-code solutions. Given a screenshot or a sketch of a UI, a VLM could generate the code that reproduces it, for instance in HTML. Despite advances in VLMs across many tasks, the specific challenge of converting a screenshot into the corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset of 2 million pairs of HTML code and the corresponding screenshots. We fine-tune a foundational VLM on our dataset and show that it is proficient at converting webpage screenshots into functional HTML code. To accelerate research in this area, we open-source WebSight.
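The abstract describes training data made of screenshot/HTML pairs that are used to fine-tune a VLM. The following is a minimal, hypothetical sketch of turning one such pair into a supervised fine-tuning example. It adds a crude tag-balance check that stands in for "functional HTML". The record fields (`image_path`, `html`), the prompt text and all function names are assumptions made for illustration. They are not the WebSight schema or the authors' training code. The check is also stricter than the HTML specification, because it rejects valid pages that omit optional end tags such as `</p>`.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical record layout for one screenshot/HTML pair; the real
# WebSight schema may differ.
@dataclass
class ScreenshotHtmlPair:
    image_path: str  # path to the rendered screenshot (assumed field)
    html: str        # HTML source that produced the screenshot (assumed field)

# Void elements never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class _TagBalanceChecker(HTMLParser):
    """Crude check that every non-void opening tag is closed, in order."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing forms such as <br/> open nothing

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack or self.stack[-1] != tag:
            self.ok = False  # mismatched or stray closing tag
        else:
            self.stack.pop()

def is_balanced_html(html: str) -> bool:
    checker = _TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.ok and not checker.stack

# Hypothetical instruction text; the prompt used in the paper is not given here.
PROMPT = "Generate the HTML code that reproduces this webpage screenshot."

def to_sft_example(pair: ScreenshotHtmlPair) -> dict:
    """Turn one pair into an (image, prompt, target) training example."""
    if not is_balanced_html(pair.html):
        raise ValueError("target HTML has unbalanced tags")
    return {"image": pair.image_path, "prompt": PROMPT, "target": pair.html.strip()}

example = to_sft_example(
    ScreenshotHtmlPair("shot_0001.png", "<html><body><h1>Hi</h1><br></body></html>\n")
)
print(example["target"])
```

Filtering targets before training is one design choice among many. A real pipeline would more likely render the HTML and compare the result with the screenshot, but that requires a browser engine and is out of scope for a self-contained sketch.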

https://huggingface.co/discussions/paper/65f3a30f2b4e85e2e8898b53

Similar Work