
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark And Empirical Study

Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024 – 52 citations

[Code] [Paper]
Compositional Generalization Evaluation Few Shot Has Code Interdisciplinary Approaches Model Architecture Multimodal Semantic Representation Prompting

Large language models (LLMs) are becoming attractive as few-shot reasoners for natural language (NL) tasks. However, their capability to process structured data such as tables remains under-explored. While tables can be serialized as input for LLMs, there is a lack of comprehensive studies on whether LLMs genuinely comprehend such data. In this paper, we try to understand this by designing a benchmark that evaluates the structural understanding capabilities of LLMs through seven distinct tasks, e.g., cell lookup, row retrieval, and size detection. Specifically, we perform a series of evaluations on the most advanced recent LLMs, GPT-3.5 and GPT-4, and observe that performance varies with different input choices, including table input format, content order, role prompting, and partition marks. Drawing on the insights gained from the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using the internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (↑2.31%), HybridQA (↑2.13%), SQA (↑2.72%), Feverous (↑0.84%), and ToTTo (↑5.68%). We believe that our open-source benchmark and proposed prompting methods can serve as a simple yet generic choice for future research. The code and data of this paper are temporarily released at https://anonymous.4open.science/r/StructuredLLM-76F3/README.md and will later be replaced with an official release at https://github.com/microsoft/TableProvider.
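To make the two ideas in the abstract concrete, below is a minimal sketch (not the authors' released code) of how a table might be serialized with partition marks and a role prompt, and how a self-augmentation step could first ask the model for critical values and ranges before posing the actual question. The names serialize_table and call_llm are hypothetical placeholders, not part of the paper's codebase.

```python
# Minimal sketch of (1) serializing a table with explicit partition marks and a
# role prompt, and (2) "self-augmentation": a first LLM call extracts critical
# values/ranges, and its output is appended to the prompt of a second,
# task-solving call. `call_llm` stands in for any chat-completion API.

from typing import Callable, List


def serialize_table(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a table in a markdown-like format with '|' partition marks."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def self_augmented_prompt(table: str, question: str,
                          call_llm: Callable[[str], str]) -> str:
    role = "You are an expert in reading and reasoning over tables.\n"
    # Step 1: ask the model to surface critical values / ranges from the table.
    aug = call_llm(role + f"Table:\n{table}\n\n"
                   "Identify the critical values and ranges in this table.")
    # Step 2: feed the self-generated hints back in with the actual question.
    return (role + f"Table:\n{table}\n\nKey facts:\n{aug}\n\n"
            f"Question: {question}\nAnswer:")


if __name__ == "__main__":
    table = serialize_table(["City", "Population"],
                            [["Oslo", "709,000"], ["Bergen", "286,000"]])
    fake_llm = lambda prompt: "Populations range from 286,000 to 709,000."
    print(self_augmented_prompt(table, "Which city is larger?", fake_llm))
```

In practice the second prompt would be sent to the model as well; the sketch only builds the prompts so it runs without any API dependency.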

Similar Work