Publications by Tag

Tags group related research papers. Each tag below lists its associated works.

AAAI (87) ACL (370) Agentic (222) Applications (1023) ASRU (32) CIKM (19) COLING (3) Conversational Agents (469) CVPR (280) Datasets (1692) Diffusion Processes (448) EACL (13) Efficiency (1547) Emergent Abilities (350) EMNLP (285) Evaluation Frameworks (483) Fine Tuning (1097) Formal Verification (51) Human AI Collaboration (482) ICASSP (31) ICCV (124) ICLR (8) ICRA (8) IJCAI (17) In Context Learning (160) Instruction Following (334) Interpretability (219) Interspeech (52) KDD (44) LLM For Code (370) Memory & Long Context (237) Model Architecture (1091) NAACL (22) Neural Machine Translation (263) NEURIPS (12) Prompting (508) Question Answering (746) RAG (128) RECSYS (70) Reinforcement Learning (749) Retrieval Systems (237) Scalability (1018) Security (215) SIGIR (56) SLT (12) Survey Paper (313) TACL (31) Training Techniques (399) Uncategorized (54) Vision Language (1357)

Tags

— A —

AAAI 87 papers

Explicit Visual Prompts For Visual Object Tracking (2024) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Shi et al.
BEST: BERT Pre-training For Sign Language Recognition With Coupling Tokenization (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Zhao et al.
Interpretable Long-form Legal Question Answering With Retrieval-augmented Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
Social Biases Through The Text-to-image Generation Lens (2023) • Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society • 80 citations
Ranjita Naik, Besmira Nushi
VadCLIP: Adapting Vision-language Models For Weakly Supervised Video Anomaly Detection (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 84 citations
Wu et al.
Graph Of Thoughts: Solving Elaborate Problems With Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 234 citations
Besta et al.
MemoryBank: Enhancing Large Language Models With Long-term Memory (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 77 citations
Zhong et al.
Enhancing Job Recommendation Through LLM-based Generative Adversarial Networks (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 49 citations
Du et al.
NavGPT: Explicit Reasoning In Vision-and-language Navigation With Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 104 citations
Gengze Zhou, Yicong Hong, Qi Wu
Detecting And Preventing Hallucinations In Large Vision Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 93 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
ExpeL: LLM Agents Are Experiential Learners (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhao et al.
Mitigating The Learning Bias Towards Repetition By Self-contrastive Training For Open-ended Generation (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Jian Guan, Minlie Huang
Bad Actor, Good Advisor: Exploring The Role Of Large Language Models In Fake News Detection (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 104 citations
Hu et al.
BLIVA: A Simple Multimodal LLM For Better Handling Of Text-rich Visual Questions (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Hu et al.
AudioGPT: Understanding And Generating Speech, Music, Sound, And Talking Head (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Huang et al.
Dc-former: Diverse And Compact Transformer For Person Re-identification (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 53 citations
Li et al.
Graphix-T5: Mixing Pre-trained Transformers With Graph-aware Layers For Text-to-SQL Parsing (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 51 citations
Li et al.
FlexKBQA: A Flexible LLM-powered Framework For Few-shot Knowledge Base Question Answering (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Li et al.
RESDSQL: Decoupling Schema Linking And Skeleton Parsing For Text-to-SQL (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 96 citations
Li et al.
Visual Adversarial Examples Jailbreak Aligned Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 73 citations
Qi et al.
Supporting Human-AI Collaboration In Auditing LLMs With LLMs (2023) • AIES '23: AAAI/ACM Conference on AI, Ethics, and Society • 47 citations
Rastogi et al.
MotionGPT: Finetuned LLMs Are General-purpose Motion Generators (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 50 citations
Zhang et al.
CodeAttack: Code-based Adversarial Attacks For Pre-trained Programming Language Models (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Akshita Jha, Chandan K. Reddy
Fact: Factor-tuning For Lightweight Adaptation On Vision Transformer (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Shibo Jie, Zhi-Hong Deng
Repair Is Nearly Generation: Multilingual Program Repair With LLMs (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 68 citations
Joshi et al.
End-to-end Transformer Based Model For Image Captioning (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 107 citations
Yiyu Wang, Jungang Xu, Yingfei Sun
Are Transformers Effective For Time Series Forecasting? (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 1544 citations
Zeng et al.
Revisiting Classifier: Transferring Vision-language Models For Video Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Wenhao Wu, Zhun Sun, Wanli Ouyang
Dptext-detr: Towards Better Scene Text Detection With Dynamic Points In Transformer (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Ye et al.
Laneformer: Object-aware Row-column Transformers For Lane Detection (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 54 citations
Han et al.
Tailor Versatile Multi-modal Learning For Multi-label Emotion Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhang et al.
Graph-enhanced Multi-task Learning Of Multi-level Transition Dynamics For Session-based Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 95 citations
Huang et al.
Transtailor: Pruning The Pre-trained Model For Improved Transfer Learning (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 47 citations
Liu et al.
Persistent Anti-Muslim Bias In Large Language Models (2021) • AIES '21: AAAI/ACM Conference on AI, Ethics, and Society • 266 citations
Abubakar Abid, Maheen Farooqi, James Zou
Pale Transformer: A General Vision Transformer Backbone With Pale-shaped Attention (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 48 citations
Wu et al.
Partial Is Better Than All: Revisiting Fine-tuning Strategy For Few-shot Learning (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 133 citations
Shen et al.
Bidirectional Machine Reading Comprehension For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 185 citations
Chen et al.
Learning Modality-specific Representations With Self-supervised Multi-task Learning For Multimodal Sentiment Analysis (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 555 citations
Yu et al.
Evo-vit: Slow-fast Token Evolution For Dynamic Vision Transformer (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 131 citations
Xu et al.
Scheduled Sampling In Vision-language Pretraining With Decoupled Encoder-decoder Network (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Li et al.
Entity Structure Within And Throughout: Modeling Mention Dependencies For Document-level Relation Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 167 citations
Xu et al.
Similarity Reasoning And Filtration For Image-text Matching (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 319 citations
Diao et al.
Peco: Perceptual Codebook For BERT Pre-training Of Vision Transformers (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 83 citations
Dong et al.
BROS: A Pre-trained Language Model Focusing On Text And Layout For Better Key Information Extraction From Documents (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 105 citations
Hong et al.
Knowledge-enhanced Hierarchical Graph Transformer Network For Multi-behavior Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 179 citations
Xia et al.
Transfg: A Transformer Architecture For Fine-grained Recognition (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 315 citations
He et al.
Trocr: Transformer-based Optical Character Recognition With Pre-trained Models (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 247 citations
Li et al.
Nested Hierarchical Transformer: Towards Accurate, Data-efficient And Interpretable Visual Understanding (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 122 citations
Zhang et al.
Pretrained Transformers As Universal Computation Engines (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 71 citations
Lu et al.
Visualmrc: Machine Reading Comprehension On Document Images (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
Informer: Beyond Efficient Transformer For Long Sequence Time-series Forecasting (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 3884 citations
Zhou et al.
Interpretable Rumor Detection In Microblogs By Attending To User Interactions (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 176 citations
Khoo et al.
Contrastive Triple Extraction With Generative Transformer (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Ye et al.
Rethinking Generalization Of Neural Models: A Named Entity Recognition Case Study (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Fu et al.
Spatial-temporal Multi-cue Network For Continuous Sign Language Recognition (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 184 citations
Zhou et al.
Show, Recall, And Tell: Image Captioning With Recall Mechanism (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Wang et al.
Latent Opinions Transfer Network For Target-oriented Opinion Words Extraction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 90 citations
Wu et al.
Expressing Objects Just Like Words: Recurrent Visual Embedding For Image-text Matching (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Tianlang Chen, Jiebo Luo
Guiding Attention In Sequence-to-sequence Models For Dialogue Act Prediction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Colombo et al.
Location-aware Graph Convolutional Networks For Video Question Answering (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 169 citations
Huang et al.
Evaluating Commonsense In Pre-trained Language Models (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 50 citations
Zhou et al.
Activitynet-qa: A Dataset For Understanding Complex Web Videos Via Question Answering (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 199 citations
Yu et al.
Knowing What, How And Why: A Near Complete Solution For Aspect-based Sentiment Analysis (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 255 citations
Peng et al.
Generating Persona Consistent Dialogues By Exploiting Natural Language Inference (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Song et al.
Select, Answer And Explain: Interpretable Multi-hop Reading Comprehension Over Multiple Documents (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 54 citations
Tu et al.
Unsupervised Neural Machine Translation With SMT As Posterior Regularization (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 56 citations
Ren et al.
HAS-QA: Hierarchical Answer Spans Model For Open-domain Question Answering (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 43 citations
Pang et al.
TANDA: Transfer And Adapt Pre-trained Transformer Models For Answer Sentence Selection (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Siddhant Garg, Thuy Vu, Alessandro Moschitti
End-to-end Knowledge-routed Relational Dialogue System For Automatic Diagnosis (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 154 citations
Xu et al.
Context-aware Self-attention Networks (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Yang et al.
Q-BERT: Hessian Based Ultra Low Precision Quantization Of BERT (2019) • AAAI 2020 • 52 citations
Shen et al.
Non-autoregressive Machine Translation With Auxiliary Regularization (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 153 citations
Wang et al.
Towards Non-saturating Recurrent Units For Modelling Long-term Dependencies (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Chandar et al.
Deep Short Text Classification With Knowledge Powered Attention (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 132 citations
Chen et al.
Temporal Deformable Convolutional Encoder-decoder Networks For Video Captioning (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 99 citations
Chen et al.
Movie Question Answering: Remembering The Textual Cues For Layered Visual Contents (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 44 citations
Wang et al.
Complex Sequential Question Answering: Towards Learning To Converse Over Linked Question Answer Pairs With A Knowledge Graph (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 172 citations
Saha et al.
Character-level Language Modeling With Deeper Self-attention (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 146 citations
Al-Rfou et al.
Asynchronous Bidirectional Decoding For Neural Machine Translation (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 115 citations
Zhang et al.
Joint Training For Neural Machine Translation Models With Monolingual Data (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 99 citations
Zhang et al.
Improving Variational Encoder-decoders In Dialogue Generation (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 91 citations
Shen et al.
Learning To Extract Coherent Summary Via Deep Reinforcement Learning (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 130 citations
Yuxiang Wu, Baotian Hu
Medical Exam Question Answering With Large-scale Reading Comprehension (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Zhang et al.
Efficient Large-scale Multi-modal Classification (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 145 citations
Kiela et al.
Table-to-text: Describing Table Region With Natural Language (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Bao et al.
A Knowledge-grounded Neural Conversation Model (2017) • Proceedings of the AAAI Conference on Artificial Intelligence • 234 citations
Ghazvininejad et al.
Empower Sequence Labeling With Task-aware Neural Language Model (2017) • Proceedings of the AAAI Conference on Artificial Intelligence • 126 citations
Liu et al.

ACL 370 papers

BGE M3-embedding: Multi-lingual, Multi-functionality, Multi-granularity Text Embeddings Through Self-knowledge Distillation (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 221 citations
Chen et al.
On LLMs-driven Synthetic Data Generation, Curation, And Evaluation: A Survey (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 45 citations
Long et al.
Paecter: Patent-level Representation Learning Using Citation-informed Transformers (2024) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Ghosh et al.
BioMistral: A Collection Of Open-source Pretrained Large Language Models For Medical Domains (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 108 citations
Labrak et al.
Benchmarking Retrieval-augmented Generation For Medicine (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 119 citations
Xiong et al.
LLM+P: Empowering Large Language Models With Optimal Planning Proficiency (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 61 citations
Liu et al.
Information Screening Whilst Exploiting! Multimodal Relation Extraction With Feature Denoising And Multimodal Topic Modeling (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Wu et al.
Membership Inference Attacks Against Language Models Via Neighbourhood Comparison (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 46 citations
Mattern et al.
Cross2stra: Unpaired Cross-lingual Image Captioning With Cross-lingual Cross-modal Structure-pivoted Alignment (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 41 citations
Wu et al.
Few-shot Fine-tuning Vs. In-context Learning: A Fair Comparison And Evaluation (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 54 citations
Mosbach et al.
Language Model Behavior: A Comprehensive Survey (2023) • Computational Linguistics • 53 citations
Tyler A. Chang, Benjamin K. Bergen
Revisiting Relation Extraction In The Era Of Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Somin Wadhwa, Silvio Amir, Byron C. Wallace
Can Large Language Models Be An Alternative To Human Evaluations? (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 161 citations
Cheng-Han Chiang, Hung-Yi Lee
RoleLLM: Benchmarking, Eliciting, And Enhancing Role-playing Abilities Of Large Language Models (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 51 citations
Wang et al.
Increasing Diversity While Maintaining Accuracy: Text Data Generation With Large Language Models And Human Interventions (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
John Joon Young Chung, Ece Kamar, Saleema Amershi
Aom: Detecting Aspect-oriented Information For Multimodal Aspect-based Sentiment Analysis (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 41 citations
Zhou et al.
Scene Graph As Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation With Visual Scene Hallucination (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Fei et al.
Reasoning Implicit Sentiment With Chain-of-thought Prompting (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 75 citations
Fei et al.
From Pretraining Data To Language Models To Downstream Tasks: Tracking The Trails Of Political Biases Leading To Unfair NLP Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Feng et al.
Medagents: Large Language Models As Collaborators For Zero-shot Medical Reasoning (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 53 citations
Tang et al.
Aligning Instruction Tasks Unlocks Large Language Models As Zero-shot Relation Extractors (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 44 citations
Kai Zhang, Bernal Jiménez Gutiérrez, Yu Su
Distilling Step-by-step! Outperforming Larger Language Models With Less Training Data And Smaller Model Sizes (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 160 citations
Hsieh et al.
Evaluating Embedding APIs For Information Retrieval (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 73 citations
Kamalloo et al.
In-context Retrieval-augmented Language Models (2023) • Transactions of the Association for Computational Linguistics • 201 citations
Ram et al.
Siren's Song In The AI Ocean: A Survey On Hallucination In Large Language Models (2023) • Computational Linguistics • 55 citations
Zhang et al.
A Systematic Study And Comprehensive Evaluation Of ChatGPT On Benchmark Datasets (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 69 citations
Laskar et al.
Towards Attribute-entangled Controllable Text Generation: A Pilot Study Of Blessing Generation (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 199 citations
Huang et al.
Discovering Language Model Behaviors With Model-written Evaluations (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 47 citations
Perez et al.
Effective Token Graph Modeling Using A Novel Labeling Strategy For Structured Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 71 citations
Shi et al.
Vit5: Pretrained Text-to-text Transformer For Vietnamese Language Generation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop • 51 citations
Phan et al.
Vision-language Pre-training For Multimodal Aspect-based Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 103 citations
Yan Ling, Jianfei Yu, Rui Xia
A Two-stream AMR-enhanced Model For Document-level Event Argument Extraction (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 47 citations
Xu et al.
Controllable Natural Language Generation With Contrastive Prefixes (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 56 citations
Qian et al.
A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models For African News Translation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Adelani et al.
Promptsource: An Integrated Development Environment And Repository For Natural Language Prompts (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 148 citations
Bach et al.
Graph Pre-training For AMR Parsing And Generation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Xuefeng Bai, Yulong Chen, Yue Zhang
Lilt: A Simple Yet Effective Language-independent Layout Transformer For Structured Document Understanding (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 93 citations
Jiapeng Wang, Lianwen Jin, Kai Ding
Generating Data To Mitigate Spurious Correlations In Natural Language Inference Datasets (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Wu et al.
THE-X: Privacy-preserving Transformer Inference With Homomorphic Encryption (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 47 citations
Chen et al.
Factpegasus: Factuality-aware Pre-training And Fine-tuning For Abstractive Summarization (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 48 citations
David Wan, Mohit Bansal
Promda: Prompt-based Data Augmentation For Low-resource NLU Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Wang et al.
The Moral Integrity Corpus: A Benchmark For Ethical Dialogue Systems (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Ziems et al.
M-SENA: An Integrated Platform For Multimodal Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 58 citations
Mao et al.
Enabling Multimodal Generation On CLIP Via Vision-language Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 52 citations
Dai et al.
Why Can GPT Learn In-context? Language Models Implicitly Perform Gradient Descent As Meta-optimizers (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 73 citations
Dai et al.
MISC: A Mixed Strategy-aware Model Integrating COMET For Emotional Support Conversation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Tu et al.
On The Origin Of Hallucinations In Conversational Models: Is It The Datasets Or The Models? (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 101 citations
Dziri et al.
One Embedder, Any Task: Instruction-finetuned Text Embeddings (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 68 citations
Su et al.
Dynatask: A Framework For Creating Dynamic AI Benchmark Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 40 citations
Thrush et al.
Document-level Relation Extraction With Adaptive Focal Loss And Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 101 citations
Tan et al.
RARR: Researching And Revising What Language Models Say, Using Language Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Gao et al.
An Information-theoretic Approach To Prompt Engineering Without Ground Truth Labels (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 95 citations
Sorensen et al.
Linkbert: Pretraining Language Models With Document Links (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 163 citations
Michihiro Yasunaga, Jure Leskovec, Percy Liang
Unified Structure Generation For Universal Information Extraction (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 345 citations
Lu et al.
Why Does Surprisal From Larger Transformer-based Language Models Provide A Poorer Fit To Human Reading Times? (2022) • Transactions of the Association for Computational Linguistics • 51 citations
Byung-Doh Oh, William Schuler
Debiased Contrastive Learning Of Unsupervised Sentence Representations (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Zhou et al.
Mucgec: A Multi-reference Multi-source Evaluation Dataset For Chinese Grammatical Error Correction (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 51 citations
Zhang et al.
Incorporating Dynamic Semantics Into Pre-trained Language Model For Aspect-based Sentiment Analysis (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 49 citations
Zhang et al.
Subgraph Retrieval Enhanced Model For Multi-hop Knowledge Base Question Answering (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 76 citations
Zhang et al.
Diffusionbert: Improving Generative Masked Language Models With Diffusion Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 41 citations
He et al.
Reclip: A Strong Zero-shot Baseline For Referring Expression Comprehension (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Subramanian et al.
LEWIS: Levenshtein Editing For Unsupervised Text Style Transfer (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 43 citations
Machel Reid, Victor Zhong
Simcls: A Simple Framework For Contrastive Learning Of Abstractive Summarization (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 203 citations
Yixin Liu, Pengfei Liu
Dexperts: Decoding-time Controlled Text Generation With Experts And Anti-experts (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 127 citations
Liu et al.
Chinesebert: Chinese Pretraining Enhanced By Glyph And Pinyin Information (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 176 citations
Sun et al.
Topic-to-essay Generation With Comprehensive Knowledge Enhancement (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 49 citations
Zhiyue Liu, Jiahai Wang, Zhenghong Li
Consert: A Contrastive Framework For Self-supervised Sentence Representation Transfer (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 466 citations
Yan et al.
Continual Learning For Text Classification With Information Disentanglement Based Regularization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 59 citations
Huang et al.
Are NLP Models Really Able To Solve Simple Math Word Problems? (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 66 citations
Arkil Patel, Satwik Bhattamishra, Navin Goyal
Psyqa: A Chinese Dataset For Generating Long Counseling Text For Mental Health Support (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 47 citations
Sun et al.
Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 90 citations
Pramanick et al.
TABBIE: Pretrained Representations Of Tabular Data (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 110 citations
Iida et al.
Document-level Event Argument Extraction By Conditional Generation (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 222 citations
Sha Li, Heng Ji, Jiawei Han
Towards Enhancing Fine-grained Details For Image Matting (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Chang Liu, Henghui Ding, Xudong Jiang
Unified Pre-training For Program Understanding And Generation (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 520 citations
Ahmad et al.
Arat5: Text-to-text Transformers For Arabic Language Generation (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 56 citations
El Moatez Billah Nagoudi, Abdelrahim Elmadany, Muhammad Abdul-Mageed
Cross-lingual Abstractive Summarization With Limited Parallel Resources (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 43 citations
Yu Bai, Yang Gao, Heyan Huang
Syntax-bert: Improving Pre-trained Transformers With Syntax Trees (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 40 citations
Bai et al.
Polyjuice: Generating Counterfactuals For Explaining, Evaluating, And Improving Models (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 60 citations
Wu et al.
CLINE: Contrastive Learning With Semantic Negative Examples For Natural Language Understanding (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 85 citations
Wang et al.
Musicbert: Symbolic Music Understanding With Large-scale Pre-training (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 79 citations
Zeng et al.
ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis And Rating Prediction (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 63 citations
Bu et al.
Out-of-scope Intent Detection With Self-supervision And Discriminative Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 46 citations
Zhan et al.
MELM: Data Augmentation With Masked Entity Language Modeling For Low-resource NER (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 50 citations
Zhou et al.
Improving Faithfulness In Abstractive Summarization With Contrast Candidate Generation And Selection (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 74 citations
Chen et al.
Dialogsum: A Real-life Scenario Dialogue Summarization Dataset (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 121 citations
Chen et al.
Industry Scale Semi-supervised Learning For Natural Language Understanding (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers • 67 citations
Chen et al.
Structure-aware Abstractive Conversation Summarization Via Discourse And Action Graphs (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 68 citations
Jiaao Chen, Diyi Yang
Semantic And Syntactic Enhanced Aspect Sentiment Triplet Extraction (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 60 citations
Chen et al.
Shadowgnn: Graph Projection Neural Network For Text-to-sql Parser (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Chen et al.
An Empirical Survey Of The Effectiveness Of Debiasing Techniques For Pre-trained Language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 45 citations
Nicholas Meade, Elinor Poole-Dayan, Siva Reddy
Bitfit: Simple Parameter-efficient Fine-tuning For Transformer-based Masked Language-models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 286 citations
Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
Measuring And Improving Consistency In Pretrained Language Models (2021) • Transactions of the Association for Computational Linguistics • 82 citations
Elazar et al.
Factual Probing Is [MASK]: Learning Vs. Learning To Recall (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 230 citations
Zexuan Zhong, Dan Friedman, Danqi Chen
Learning Domain Adaptation With Model Calibration For Surgical Report Generation In Robotic Surgery (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 164 citations
Xu et al.
Directed Acyclic Graph Network For Conversational Emotion Recognition (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 245 citations
Shen et al.
Planning With Learned Entity Prompts For Abstractive Summarization (2021) • Transactions of the Association for Computational Linguistics • 92 citations
Narayan et al.
VLM: Task-agnostic Video-language Model Pre-training For Video Understanding (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 56 citations
Xu et al.
Template-based Named Entity Recognition Using BART (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 283 citations
Cui et al.
Parameter-efficient Multi-task Fine-tuning For Transformers Via Shared Hypernetworks (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 54 citations
Mahabadi et al.
Indicbart: A Pre-trained Model For Indic Natural Language Generation (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 51 citations
Dabre et al.
Knowledge Neurons In Pretrained Transformers (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 76 citations
Dai et al.
Does Syntax Matter? A Strong Baseline For Aspect-based Sentiment Analysis With Roberta (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 155 citations
Dai et al.
Lower Perplexity Is Not Always Human-like (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 48 citations
Kuribayashi et al.
Ultra-fine Entity Typing With Weak Supervision From A Masked Language Model (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Hongliang Dai, Yangqiu Song, Haixun Wang
Increasing Faithfulness In Knowledge-grounded Dialogue With Controllable Features (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 69 citations
Rashkin et al.
Stacked Acoustic-and-textual Encoding: Integrating The Pre-trained Models Into Speech Translation Encoders (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 66 citations
Xu et al.
Few-shot Intent Classification And Slot Filling With Retrieved Examples (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 46 citations
Yu et al.
Document-level Event Extraction Via Heterogeneous Graph-based Interaction Model With A Tracker (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 85 citations
Xu et al.
A Dataset Of Information-seeking Questions And Answers Anchored In Research Papers (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 86 citations
Dasigi et al.
Quality At A Glance: An Audit Of Web-crawled Multilingual Datasets (2021) • Transactions of the Association for Computational Linguistics • 155 citations
Kreutzer et al.
Adaptsum: Towards Low-resource Domain Adaptation For Abstractive Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 61 citations
Tiezheng Yu, Zihan Liu, Pascale Fung
Docnli: A Large-scale Dataset For Document-level Natural Language Inference (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 51 citations
Wenpeng Yin, Dragomir Radev, Caiming Xiong
Few-nerd: A Few-shot Named Entity Recognition Dataset (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 184 citations
Ding et al.
GLM: General Language Model Pretraining With Autoregressive Blank Infilling (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 635 citations
Du et al.
Towards Interpreting And Mitigating Shortcut Learning Behavior Of NLU Models (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 63 citations
Du et al.
A Survey Of Data Augmentation Approaches For NLP (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 127 citations
Feng et al.
Position Bias Mitigation: A Knowledge-aware Graph Model For Emotion Cause Extraction (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 47 citations
Yan et al.
Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) • Transactions of the Association for Computational Linguistics • 63 citations
Hendricks et al.
Gender Bias In Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 74 citations
Savoldi et al.
Learnda: Learnable Knowledge-guided Data Augmentation For Event Causality Identification (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 52 citations
Zuo et al.
Jointgt: Graph-text Joint Representation Learning For Text Generation From Knowledge Graphs (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 77 citations
Ke et al.
Meta-learning Adversarial Domain Adaptation Network For Few-shot Text Classification (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 54 citations
Han et al.
Learning Shared Semantic Space For Speech-to-text Translation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 57 citations
Han et al.
A Unified Generative Framework For Aspect-based Sentiment Analysis (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 256 citations
Yan et al.
Xl-sum: Large-scale Multilingual Abstractive Summarization For 44 Languages (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 179 citations
Hasan et al.
On The Effectiveness Of Adapter-based Tuning For Pretrained Language Model Adaptation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 112 citations
He et al.
Structurallm: Structural Pre-training For Form Understanding (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 61 citations
Li et al.
Symbolic Knowledge Distillation: From General Language Models To Commonsense Models (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 54 citations
West et al.
Bob: BERT Over BERT For Training Persona-based Dialogue Models From Limited Personalized Data (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 91 citations
Song et al.
Experts, Errors, And Context: A Large-scale Study Of Human Evaluation For Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 91 citations
Freitag et al.
Contrastive Learning For Many-to-many Multilingual Neural Machine Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 134 citations
Pan et al.
Dynaeval: Unifying Turn And Dialogue Level Evaluation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Zhang et al.
Towards Robustness Of Text-to-sql Models Against Synonym Substitution (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 58 citations
Gan et al.
Self-supervised Text-to-sql Learning With Header Alignment Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Donggyu Kim, Seanie Lee
Topic-driven And Knowledge-aware Transformer For Dialogue Emotion Detection (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 94 citations
Zhu et al.
QA-GNN: Reasoning With Language Models And Knowledge Graphs For Question Answering (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 259 citations
Yasunaga et al.
One2set: Generating Diverse Keyphrases As A Set (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 60 citations
Ye et al.
Robustness Gym: Unifying The NLP Evaluation Landscape (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations • 48 citations
Goel et al.
Revcore: Review-augmented Conversational Recommendation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 64 citations
Lu et al.
TIMEDIAL: Temporal Commonsense Reasoning In Dialog (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Qin et al.
TAT-QA: A Question Answering Benchmark On A Hybrid Of Tabular And Textual Content In Finance (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 126 citations
Zhu et al.
Cutting Down On Prompts And Parameters: Simple Few-shot Learning With Language Models (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 42 citations
Logan et al.
Long Text Generation By Modeling Sentence-level And Discourse-level Coherence (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 47 citations
Guan et al.
Textflint: Unified Multilingual Robustness Evaluation Toolkit For Natural Language Processing (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 79 citations
Gui et al.
Explaining Black Box Predictions And Unveiling Data Artifacts Through Influence Functions (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 86 citations
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
CDL: Curriculum Dual Learning For Emotion-controllable Response Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Lei Shen, Yang Feng
Don't Stop Pretraining: Adapt Language Models To Domains And Tasks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 740 citations
Gururangan et al.
Encoder-decoder Models Can Benefit From Pre-trained Masked Language Models In Grammatical Error Correction (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 120 citations
Kaneko et al.
Mt5: A Massively Multilingual Pre-trained Text-to-text Transformer (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 618 citations
Xue et al.
Template-based Question Generation From Retrieved Sentences For Improved Unsupervised Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 68 citations
Fabbri et al.
Summeval: Re-evaluating Summarization Evaluation (2020) • Transactions of the Association for Computational Linguistics • 47 citations
Fabbri et al.
Tabert: Pretraining For Joint Understanding Of Textual And Tabular Data (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 380 citations
Yin et al.
A Contextual Hierarchical Attention Network With Adaptive Objective For Dialogue State Tracking (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Shan et al.
Neural Data-to-text Generation Via Jointly Learning The Segmentation And Correspondence (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 45 citations
Shen et al.
Parallel Data Augmentation For Formality Style Transfer (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 67 citations
Yi Zhang, Tao Ge, Xu Sun
Pretrained Transformers Improve Out-of-distribution Robustness (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 104 citations
Hendrycks et al.
Realformer: Transformer Likes Residual Attention (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
He et al.
A Knowledge-enhanced Pretraining Model For Commonsense Story Generation (2020) • Transactions of the Association for Computational Linguistics • 231 citations
Guan et al.
Dynamic Knowledge Routing Network For Target-guided Open-domain Conversation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 87 citations
Qin et al.
BLEURT: Learning Robust Metrics For Text Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Thibault Sellam, Dipanjan Das, Ankur P. Parikh
SAFER: A Structure-free Approach For Certified Robustness To Adversarial Word Substitutions (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 84 citations
Mao Ye, Chengyue Gong, Qiang Liu
Kdconv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 101 citations
Zhou et al.
Injecting Numerical Reasoning Skills Into Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Mor Geva, Ankit Gupta, Jonathan Berant
Opiniondigest: A Simple Framework For Opinion Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 62 citations
Suhara et al.
Sparse, Dense, And Attentional Representations For Text Retrieval (2020) • Transactions of the Association for Computational Linguistics • 154 citations
Luan et al.
Compressing Large-scale Transformer-based Models: A Case Study On BERT (2020) • Transactions of the Association for Computational Linguistics • 102 citations
Ganesh et al.
Making Pre-trained Language Models Better Few-shot Learners (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 556 citations
Tianyu Gao, Adam Fisch, Danqi Chen
Paraphrase Augmented Task-oriented Dialog Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 77 citations
Gao et al.
SUPERT: Towards New Frontiers In Unsupervised Evaluation Metrics For Multi-document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 107 citations
Yang Gao, Wei Zhao, Steffen Eger
Code Summarization With Structure-induced Transformer (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
Hongqiu Wu, Hai Zhao, Min Zhang
Deebert: Dynamic Early Exiting For Accelerating BERT Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 265 citations
Xin et al.
On The Limitations Of Cross-lingual Encoders As Exposed By Reference-free Machine Translation Evaluation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Zhao et al.
Demographics Should Not Be The Reason Of Toxicity: Mitigating Discrimination In Text Classifications With Instance Weighting (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Zhang et al.
Reasoning With Latent Structure Refinement For Document-level Relation Extraction (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 280 citations
Nan et al.
Robust Encodings: A Framework For Combating Adversarial Typos (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 89 citations
Jones et al.
ARBERT & MARBERT: Deep Bidirectional Transformers For Arabic (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 217 citations
Muhammad Abdul-Mageed, Abdelrahim Elmadany, El Moatez Billah Nagoudi
Improved Natural Language Generation Via Loss Truncation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 72 citations
Daniel Kang, Tatsunori Hashimoto
Efficient Second-order Treecrf For Neural Dependency Parsing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 88 citations
Yu Zhang, Zhenghua Li, Min Zhang
Neural CRF Model For Sentence Alignment In Text Simplification (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 97 citations
Jiang et al.
Knowledge Graph Based Synthetic Corpus Generation For Knowledge-enhanced Language Model Pre-training (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 49 citations
Agarwal et al.
A Transformer-based Approach For Source Code Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 256 citations
Ahmad et al.
Generate, Delete And Rewrite: A Three-stage Framework For Improving Persona Consistency Of Dialogue Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Song et al.
Unsupervised Domain Clusters In Pretrained Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 48 citations
Roee Aharoni, Yoav Goldberg
Stereoset: Measuring Stereotypical Bias In Pretrained Language Models (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 224 citations
Moin Nadeem, Anna Bethke, Siva Reddy
Perturbed Masking: Parameter-free Probing For Analyzing And Interpreting BERT (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 154 citations
Wu et al.
Logic-guided Data Augmentation And Regularization For Consistent Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Akari Asai, Hannaneh Hajishirzi
Towards Conversational Recommendation Over Multi-type Dialogs (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 151 citations
Liu et al.
Toxic Language Detection In Social Media For Brazilian Portuguese: New Dataset And Multilingual Analysis (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing • 45 citations
Leite et al.
Language (technology) Is Power: A Critical Survey Of "bias" In NLP (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 68 citations
Blodgett et al.
Beyond Accuracy: Behavioral Testing Of NLP Models With Checklist (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 51 citations
Ribeiro et al.
Balancing Training For Multilingual Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 73 citations
Xinyi Wang, Yulia Tsvetkov, Graham Neubig
Photon: A Robust Cross-domain Text-to-sql System (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 42 citations
Zeng et al.
Conditional Augmentation For Aspect Term Extraction Via Masked Sequence-to-sequence Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 99 citations
Li et al.
Minilmv2: Multi-head Self-attention Relation Distillation For Compressing Pretrained Transformers (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 149 citations
Wang et al.
Heterogeneous Graph Neural Networks For Extractive Document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 270 citations
Wang et al.
Multi-domain Dialogue Acts And Response Co-generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Syntactic Data Augmentation Increases Robustness To Inference Heuristics (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 143 citations
Min et al.
Jiant: A Software Toolkit For Research On General-purpose Text Understanding Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 64 citations
Pruksachatkun et al.
Data Manipulation: Towards Effective Instance Learning For Neural Dialogue Generation Via Learning To Augment And Reweight (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 56 citations
Cai et al.
AMR Parsing Via Graph-sequence Iterative Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Deng Cai, Wai Lam
Expertise Style Transfer: A New Task Towards Better Communication Between Experts And Laymen (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 50 citations
Cao et al.
Simultaneous Translation Policies: From Fixed To Adaptive (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Zheng et al.
Does Multi-encoder Help? A Case Study On Context-aware Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 60 citations
Li et al.
Social Biases In NLP Models As Barriers For Persons With Disabilities (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 249 citations
Hutchinson et al.
Exclusive Hierarchical Decoding For Deep Keyphrase Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 62 citations
Chen et al.
Logical Natural Language Generation From Open-domain Tables (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 114 citations
Chen et al.
Efficient Content-based Sparse Attention With Routing Transformers (2020) • Transactions of the Association for Computational Linguistics • 266 citations
Roy et al.
Leveraging Graph To Improve Abstractive Multi-document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 113 citations
Li et al.
Document Modeling With Graph Attention Networks For Multi-grained Machine Reading Comprehension (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Zheng et al.
How Good Is Your Tokenizer? On The Monolingual Performance Of Multilingual Language Models (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 50 citations
Rust et al.
Multilingual Denoising Pre-training For Neural Machine Translation (2020) • Transactions of the Association for Computational Linguistics • 897 citations
Liu et al.
On Faithfulness And Factuality In Abstractive Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Maynez et al.
Rikinet: Reading Wikipedia Pages For Natural Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Liu et al.
SPECTER: Document-level Representation Learning Using Citation-informed Transformers (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Cohan et al.
Span-convert: Few-shot Span Extraction For Dialog With Pretrained Conversational Representations (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 43 citations
Coope et al.
You Impress Me: Dialogue Generation Via Mutual Persona Perception (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Liu et al.
Knowledge Graph-augmented Abstractive Summarization With Semantic-driven Cloze Reward (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 148 citations
Luyang Huang, Lingfei Wu, Lu Wang
Mutual: A Dataset For Multi-turn Dialogue Reasoning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 110 citations
Cui et al.
A Systematic Assessment Of Syntactic Generalization In Neural Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 120 citations
Hu et al.
Iterative Edit-based Unsupervised Sentence Simplification (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Kumar et al.
Document-level Event Role Filler Extraction Using Multi-granularity Contextualized Encoding (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 98 citations
Xinya Du, Claire Cardie
Coach: A Coarse-to-fine Approach For Cross-domain Slot Filling (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 99 citations
Liu et al.
As Good As New. How To Successfully Recycle English GPT-2 To Make Models For Other Languages (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 40 citations
Wietse de Vries, Malvina Nissim
Designing Precise And Robust Dialogue Response Evaluators (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
Extractive Summarization As Text Matching (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 394 citations
Zhong et al.
Enabling Language Models To Fill In The Blanks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 41 citations
Chris Donahue, Mina Lee, Percy Liang
Structure-grounded Pretraining For Text-to-SQL (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 56 citations
Deng et al.
Low-resource Deep Entity Resolution With Transfer And Active Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 120 citations
Kasai et al.
BIGPATENT: A Large-scale Dataset For Abstractive And Coherent Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 155 citations
Eva Sharma, Chen Li, Lu Wang
A Novel Bi-directional Interrelated Model For Joint Intent Detection And Slot Filling (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 188 citations
E et al.
Learning From Dialogue After Deployment: Feed Yourself, Chatbot! (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 66 citations
Hancock et al.
Negated And Misprimed Probes For Pretrained Language Models: Birds Can Talk, But Cannot Fly (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 69 citations
Nora Kassner, Hinrich Schütze
Learning To Ask Unanswerable Questions For Machine Reading Comprehension (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Zhu et al.
Editnts: An Neural Programmer-interpreter Model For Sentence Simplification Through Explicit Editing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 139 citations
Dong et al.
Visually Grounded Neural Syntax Acquisition (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 68 citations
Shi et al.
Domain Adaptive Dialog Generation Via Meta Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 134 citations
Kun Qian, Zhou Yu
Multimodal Transformer For Unaligned Multimodal Language Sequences (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1063 citations
Tsai et al.
Emotion-cause Pair Extraction: A New Task To Emotion Analysis In Texts (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 249 citations
Rui Xia, Zixiang Ding
Exbert: A Visual Analysis Tool To Explore Learned Representations In Transformers Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 75 citations
Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann
Masked Language Model Scoring (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 145 citations
Salazar et al.
When A Good Translation Is Wrong In Context: Context-aware Machine Translation Improves On Deixis, Ellipsis, And Lexical Cohesion (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 48 citations
Elena Voita, Rico Sennrich, Ivan Titov
Robust Neural Machine Translation With Doubly Adversarial Inputs (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 233 citations
Yong Cheng, Lu Jiang, Wolfgang Macherey
Transferable Multi-domain State Generator For Task-oriented Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 404 citations
Wu et al.
A Multiscale Visualization Of Attention In The Transformer Model (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 295 citations
Jesse Vig
Camembert: A Tasty French Language Model (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 336 citations
Martin et al.
Analyzing The Structure Of Attention In A Transformer Language Model (2019) • Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 123 citations
Jesse Vig, Yonatan Belinkov
Improving The Similarity Measure Of Determinantal Point Processes For Extractive Multi-document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 61 citations
Cho et al.
Choosing Transfer Languages For Cross-lingual Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 148 citations
Lin et al.
Towards Generating Long And Coherent Text With Multi-level Latent Variable Models (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 46 citations
Shen et al.
BERT Rediscovers The Classical NLP Pipeline (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1210 citations
Ian Tenney, Dipanjan Das, Ellie Pavlick
CONAN -- Counter Narratives Through Nichesourcing: A Multilingual Dataset Of Responses To Fight Online Hate Speech (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 78 citations
Chung et al.
Coupling Retrieval And Meta-learning For Context-dependent Semantic Parsing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Guo et al.
End-to-end Bias Mitigation By Modelling Biases In Corpora (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 134 citations
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson
Strategies For Structuring Story Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 186 citations
Angela Fan, Mike Lewis, Yann Dauphin
Style Transformer: Unpaired Text Style Transfer Without Disentangled Latent Representation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 68 citations
Dai et al.
Transformer-xl: Attentive Language Models Beyond A Fixed-length Context (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1694 citations
Dai et al.
Nlprolog: Reasoning With Weak Unification For Question Answering In Natural Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Weber et al.
Towards Scalable And Reliable Capsule Networks For Challenging NLP Applications (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 136 citations
Zhao et al.
Personalizing Dialogue Agents Via Meta-learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 177 citations
Lin et al.
Energy And Policy Considerations For Deep Learning In NLP (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 423 citations
Emma Strubell, Ananya Ganesh, Andrew McCallum
A Surprisingly Robust Trick For Winograd Schema Challenge (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Kocijan et al.
Rewarding Smatch: Transition-based AMR Parsing With Reinforcement Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 60 citations
Naseem et al.
Unsupervised Cross-lingual Representation Learning At Scale (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 538 citations
Conneau et al.
Exploiting Entity BIO Tag Embeddings And Multi-task Learning For Relation Extraction With Imbalanced Data (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Ye et al.
Olmpics -- On What Language Model Pre-training Captures (2019) • Transactions of the Association for Computational Linguistics • 55 citations
Talmor et al.
Hellaswag: Can A Machine Really Finish Your Sentence? (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 572 citations
Zellers et al.
SMART: Robust And Efficient Fine-tuning For Pre-trained Natural Language Models Through Principled Regularized Optimization (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Jiang et al.
Exact Hard Monotonic Attention For Character-level Transduction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 54 citations
Shijie Wu, Ryan Cotterell
Effective Cross-lingual Transfer Of Neural Machine Translation Models Without Shared Vocabularies (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Yunsu Kim, Yingbo Gao, Hermann Ney
ERNIE: Enhanced Language Representation With Informative Entities (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1348 citations
Zhang et al.
HIBERT: Document Level Pre-training Of Hierarchical Bidirectional Transformers For Document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 161 citations
Xingxing Zhang, Furu Wei, Ming Zhou
Fine-tuning Pre-trained Transformer Language Models To Distantly Supervised Relation Extraction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 58 citations
Christoph Alt, Marc Hübner, Leonhard Hennig
Incremental Transformer With Deliberation Decoder For Document Grounded Conversations (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Li et al.
Simple And Effective Curriculum Pointer-generator Networks For Reading Comprehension Over Long Narratives (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Tay et al.
Simple And Effective Text Matching With Richer Alignment Features (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 171 citations
Yang et al.
On The Cross-lingual Transferability Of Monolingual Representations (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
GCDT: A Global Context Enhanced Deep Transition Architecture For Sequence Labeling (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Liu et al.
Ccmatrix: Mining Billions Of High-quality Parallel Sentences On The Web (2019) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 61 citations
Schwenk et al.
Generalized Data Augmentation For Low-resource Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 109 citations
Xia et al.
Constrained Decoding For Neural NLG From Compositional Representations In Task-oriented Dialogue (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Balakrishnan et al.
Revisiting Joint Modeling Of Cross-document Entity And Event Coreference Resolution (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 104 citations
Barhom et al.
Human Vs. Muppet: A Conservative Estimate Of Human Performance On The GLUE Benchmark (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 52 citations
Nikita Nangia, Samuel R. Bowman
Revisiting Low-resource Neural Machine Translation: A Case Study (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 151 citations
Rico Sennrich, Biao Zhang
Multi-task Deep Neural Networks For Natural Language Understanding (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1026 citations
Liu et al.
GEAR: Graph-based Evidence Aggregating And Reasoning For Fact Verification (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 196 citations
Zhou et al.
COMET: Commonsense Transformers For Automatic Knowledge Graph Construction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 329 citations
Bosselut et al.
Make Up Your Mind! Adversarial Generation Of Inconsistent Natural Language Explanations (2019) • Short Paper at ACL 2020 • 55 citations
Camburu et al.
Emerging Cross-lingual Structure In Pretrained Language Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 46 citations
Wu et al.
Sentence Centrality Revisited For Unsupervised Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 175 citations
Hao Zheng, Mirella Lapata
Reinforced Dynamic Reasoning For Conversational Question Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 40 citations
Pan et al.
Does It Make Sense? And Why? A Pilot Study For Sense Making And Explanation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 105 citations
Wang et al.
Multi-hop Reading Comprehension Through Question Decomposition And Rescoring (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 182 citations
Min et al.
Weakly-supervised Spatio-temporally Grounding Natural Sentence In Video (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 101 citations
Chen et al.
Answering While Summarizing: Multi-task Learning For Multi-hop QA With Evidence Extraction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Nishida et al.
Self-supervised Learning For Contextualized Extractive Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 61 citations
Wang et al.
Learning Deep Transformer Models For Machine Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 356 citations
Wang et al.
Hierarchical Transformers For Multi-document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 133 citations
Yang Liu, Mirella Lapata
A Hierarchical Reinforced Sequence Operation Method For Unsupervised Text Style Transfer (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 53 citations
Wu et al.
Extracting Multiple-relations In One-pass With Pre-trained Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Semantically Conditioned Dialog Response Generation Via Hierarchical Disentangled Self-attention (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 128 citations
Chen et al.
Is Attention Interpretable? (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 115 citations
Sofia Serrano, Noah A. Smith
Docred: A Large-scale Document-level Relation Extraction Dataset (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 479 citations
Yao et al.
Towards Complex Text-to-SQL In Cross-domain Database With Intermediate Representation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 348 citations
Guo et al.
Predictive Biases In Natural Language Processing Models: A Conceptual Framework And Overview (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 88 citations
Deven Shah, H. Andrew Schwartz, Dirk Hovy
Are You Looking? Grounding To Multiple Modalities In Vision-and-language Navigation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Hu et al.
Explain Yourself! Leveraging Language Models For Commonsense Reasoning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Rajani et al.
Improving Question Answering Over Incomplete KBs With Knowledge-aware Reader (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 118 citations
Xiong et al.
Conversing By Reading: Contentful Neural Conversation With On-demand Machine Reading (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 97 citations
Qin et al.
Improved Zero-shot Neural Machine Translation Via Ignoring Spurious Correlations (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Gu et al.
Convlab: Multi-domain End-to-end Dialog System Platform (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 87 citations
Lee et al.
Towards Explainable NLP: A Generative Explanation Framework For Text Classification (2018) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 141 citations
Hui Liu, Qingyu Yin, William Yang Wang
Deep Communicating Agents For Abstractive Summarization (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 62 citations
Celikyilmaz et al.
Universal Language Model Fine-tuning For Text Classification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1950 citations
Jeremy Howard, Sebastian Ruder
A Bi-model Based RNN Semantic Frame Parsing Model For Intent Detection And Slot Filling (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 202 citations
Yu Wang, Yilin Shen, Hongxia Jin
Adventure: Adversarial Training For Textual Entailment With Knowledge-guided Examples (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 64 citations
Kang et al.
Improving Text-to-SQL Evaluation Methodology (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 213 citations
Finegan-Dollak et al.
Multi-passage Machine Reading Comprehension With Cross-passage Answer Verification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 102 citations
Wang et al.
Fast Abstractive Summarization With Reinforce-selected Sentence Rewriting (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 604 citations
Yen-Chun Chen, Mohit Bansal
Deep Dyna-q: Integrating Planning For Task-completion Dialogue Policy Learning (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 177 citations
Peng et al.
Robust Machine Comprehension Models Via Adversarial Training (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 99 citations
Yicheng Wang, Mohit Bansal
Hierarchical Neural Story Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1076 citations
Angela Fan, Mike Lewis, Yann Dauphin
Mem2seq: Effectively Incorporating Knowledge Bases Into End-to-end Task-oriented Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Andrea Madotto, Chien-Sheng Wu, Pascale Fung
Sarcasm Analysis Using Conversation Context (2018) • Computational Linguistics • 77 citations
Debanjan Ghosh, Alexander R. Fabbri, Smaranda Muresan
Query And Output: Generating Words By Querying Distributed Word Representations For Paraphrase Generation (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 67 citations
Ma et al.
A Large-scale Corpus For Conversation Disentanglement (2018) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Kummerfeld et al.
Event2mind: Commonsense Inference On Events, Intents, And Reactions (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 40 citations
Rashkin et al.
An End-to-end Approach For Handling Unknown Slot Values In Dialogue State Tracking (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 148 citations
Puyang Xu, Qi Hu
A Unified Model For Extractive And Abstractive Summarization Using Inconsistency Loss (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 265 citations
Hsu et al.
Autoencoder As Assistant Supervisor: Improving Text Representation For Chinese Social Media Text Summarization (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 43 citations
Ma et al.
Structvae: Tree-structured Latent Variable Models For Semi-supervised Semantic Parsing (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 98 citations
Yin et al.
A Graph-to-sequence Model For AMR-to-text Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 197 citations
Song et al.
Adversarial Example Generation With Syntactically Controlled Paraphrase Networks (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 82 citations
Iyyer et al.
Context-aware Neural Machine Translation Learns Anaphora Resolution (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 300 citations
Voita et al.
Tracking State Changes In Procedural Text: A Challenge Dataset And Models For Process Paragraph Comprehension (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 109 citations
Mishra et al.
Towards Robust Neural Machine Translation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 178 citations
Cheng et al.
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 115 citations
Khandelwal et al.
Transformation Networks For Target-oriented Sentiment Classification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 491 citations
Li et al.
Neural Coreference Resolution With Deep Biaffine Attention By Joint Mention Detection And Mention Clustering (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 62 citations
Zhang et al.
Baseline Needs More Love: On Simple Word-embedding-based Models And Associated Pooling Mechanisms (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 319 citations
Shen et al.
Harvesting Paragraph-level Question-answer Pairs From Wikipedia (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 169 citations
Xinya Du, Claire Cardie
Sentiment Adaptive End-to-end Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Weiyan Shi, Zhou Yu
Conversations Gone Awry: Detecting Early Signs Of Conversational Failure (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 174 citations
Zhang et al.
What Do Neural Machine Translation Models Learn About Morphology? (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 276 citations
Belinkov et al.
FOIL It! Find One Mismatch Between Image And Language Caption (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Shekhar et al.
Opennmt: Open-source Toolkit For Neural Machine Translation (2017) • Proceedings of ACL 2017, System Demonstrations • 287 citations
Klein et al.
Cross-sentence N-ary Relation Extraction With Graph LSTMs (2017) • Transactions of the Association for Computational Linguistics (TACL) 2017 Vol 5 • 133 citations
Peng et al.
Learning To Ask: Neural Question Generation For Reading Comprehension (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 630 citations
Xinya Du, Junru Shao, Claire Cardie
Latent Variable Dialogue Models And Their Diversity (2017) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 50 citations
Kris Cao, Stephen Clark
Colors In Context: A Pragmatic Neural Model For Grounded Language Understanding (2017) • Transactions of the Association for Computational Linguistics • 104 citations
Monroe et al.
Learning A Neural Semantic Parser From User Feedback (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 54 citations
Iyer et al.
Evaluating Discourse Phenomena In Neural Machine Translation (2017) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 119 citations
Bawden et al.
Neural Natural Language Inference Models Enhanced With External Knowledge (2017) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 271 citations
Chen et al.
Affect-lm: A Neural Language Model For Customizable Affective Text Generation (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Ghosh et al.
Learning Discourse-level Diversity For Neural Dialog Models Using Conditional Variational Autoencoders (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 704 citations
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
Get To The Point: Summarization With Pointer-generator Networks (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 3757 citations
Abigail See, Peter J. Liu, Christopher D. Manning
Deep Neural Machine Translation With Linear Associative Unit (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Wang et al.
Question Answering Through Transfer Learning From Large Fine-grained Supervision Data (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 114 citations
Sewon Min, Minjoon Seo, Hannaneh Hajishirzi
Learning Distributed Representations Of Sentences From Unlabelled Data (2016) • Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 117 citations
Felix Hill, Kyunghyun Cho, Anna Korhonen
Using The Output Embedding To Improve Language Models (2016) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 94 citations
Ofir Press, Lior Wolf
Tree-to-sequence Attentional Neural Machine Translation (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 250 citations
Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka
A Fast Unified Model For Parsing And Sentence Understanding (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 229 citations
Bowman et al.
Improving Coreference Resolution By Learning Entity-level Distributed Representations (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 51 citations
Kevin Clark, Christopher D. Manning
A Persona-based Neural Conversation Model (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 889 citations
Li et al.
Incremental Parsing With Minimal Features Using Bi-directional LSTM (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 80 citations
James Cross, Liang Huang
The LAMBADA Dataset: Word Prediction Requiring A Broad Discourse Context (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 99 citations
Paperno et al.
Simple And Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations (2016) • Transactions of the Association for Computational Linguistics • 583 citations
Eliyahu Kiperwasser, Yoav Goldberg
Enhanced LSTM For Natural Language Inference (2016) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1178 citations
Chen et al.
Text Understanding With The Attention Sum Reader Network (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Kadlec et al.
Incorporating Copying Mechanism In Sequence-to-sequence Learning (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1419 citations
Gu et al.

Agentic 222 papers

4kagent: Agentic Any Image To 4K Super-resolution (2025) • No Venue
Zuo et al.
A Survey On Latent Reasoning (2025) • No Venue
Zhu et al.
Scaling Latent Reasoning Via Looped Language Models (2025) • No Venue
Zhu et al.
Multiagentbench: Evaluating The Collaboration And Competition Of LLM Agents (2025) • No Venue
Zhu et al.
Oagents: An Empirical Study Of Building Effective Agents (2025) • No Venue
Zhu et al.
Hephaestus: Improving Fundamental Agent Capabilities Of Large Language Models Through Continual Pre-training (2025) • No Venue
Zhuang et al.
Safearena: Evaluating The Safety Of Autonomous Web Agents (2025) • No Venue
Tur et al.
Co-evolving LLM Coder And Unit Tester Via Reinforcement Learning (2025) • No Venue
Wang et al.
Efficient Agents: Building Effective Agents While Reducing Cost (2025) • No Venue
Wang et al.
Game-tars: Pretrained Foundation Models For Scalable Generalist Multimodal Game Agents (2025) • No Venue
Wang et al.
Geovista: Web-augmented Agentic Visual Reasoning For Geolocalization (2025) • No Venue
Wang et al.
Internvl3.5: Advancing Open-source Multimodal Models In Versatility, Reasoning, And Efficiency (2025) • No Venue
Wang et al.
Mcp-bench: Benchmarking Tool-using LLM Agents With Complex Real-world Tasks Via MCP Servers (2025) • No Venue
Wang et al.
The Landscape Of Agentic Reinforcement Learning For Llms: A Survey (2025) • No Venue
Zhang et al.
Othink-r1: Intrinsic Fast/slow Thinking Mode Switching For Over-reasoning Mitigation (2025) • No Venue
Zhang et al.
Skywork-r1v4: Toward Agentic Multimodal Intelligence Through Interleaved Thinking With Images And Deepresearch (2025) • No Venue
Zhang et al.
Pyvision: Agentic Vision With Dynamic Tooling (2025) • No Venue
Zhao et al.
Stronger Together: On-policy Reinforcement Learning For Collaborative Llms (2025) • No Venue
Zhao et al.
Aionopedia: An LLM Agent Orchestrating Multimodal Learning For Ionic Liquid Discovery (2025) • No Venue
Yin et al.
Aworld: Orchestrating The Training Recipe For Agentic AI (2025) • No Venue
Yu et al.
Demystifying Reinforcement Learning In Agentic Reasoning (2025) • No Venue
Yu et al.
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks (2025) • No Venue
Zhang et al.
Agentic Context Engineering: Evolving Contexts For Self-improving Language Models (2025) • No Venue
Zhang et al.
Deepanalyze: Agentic Large Language Models For Autonomous Data Science (2025) • No Venue
Zhang et al.
Complex Logical Instruction Generation (2025) • No Venue
Zhang et al.
Omniear: Benchmarking Agent Reasoning In Embodied Tasks (2025) • No Venue
Wang et al.
Finevision: Open Data Is All You Need (2025) • No Venue
Wiedmann et al.
UI-TARS-2 Technical Report: Advancing GUI Agent With Multi-turn Reinforcement Learning (2025) • No Venue
Wang et al.
From AI For Science To Agentic Science: A Survey On Autonomous Scientific Discovery (2025) • No Venue
Wei et al.
Widesearch: Benchmarking Agentic Broad Info-seeking (2025) • No Venue
Wong et al.
Webdancer: Towards Autonomous Information Seeking Agency (2025) • No Venue
Wu et al.
Agentgym-rl: Training LLM Agents For Long-horizon Decision Making Through Multi-turn Reinforcement Learning (2025) • No Venue
Xi et al.
LIMI: Less Is More For Agency (2025) • No Venue
Xiao et al.
Aworld: Dynamic Multi-agent System With Stable Maneuvering For Robust GAIA Problem Solving (2025) • No Venue
Xie et al.
Scaling Computer-use Grounding Via User Interface Decomposition And Synthesis (2025) • No Venue
Xie et al.
Flag-trader: Fusion Llm-agent With Gradient-based Reinforcement Learning For Financial Trading (2025) • No Venue
Xiong et al.
Deepphy: Benchmarking Agentic Vlms On Physical Reasoning (2025) • No Venue
Xu et al.
Filmagent: A Multi-agent Framework For End-to-end Film Automation In Virtual 3D Spaces (2025) • No Venue
Xu et al.
Ravine: Reality-aligned Evaluation For Agentic Search (2025) • No Venue
Xu et al.
TOUCAN: Synthesizing 1.5M Tool-agentic Data From Real-world MCP Environments (2025) • No Venue
Xu et al.
Oceangym: A Benchmark Environment For Underwater Embodied Agents (2025) • No Venue
Xue et al.
General Agentic Memory Via Deep Research (2025) • No Venue
Yan et al.
Captionqa: Is Your Caption As Useful As The Image Itself? (2025) • No Venue
Yang et al.
Embodiedbench: Comprehensive Benchmarking Multi-modal Large Language Models For Vision-driven Embodied Agents (2025) • No Venue
Yang et al.
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling (2025) • No Venue
Yang et al.
Magma: A Foundation Model For Multimodal AI Agents (2025) • No Venue
Yang et al.
Vlaser: Vision-language-action Model With Synergistic Embodied Reasoning (2025) • No Venue
Yang et al.
From What To Why: A Multi-agent System For Evidence-based Chemical Reaction Condition Reasoning (2025) • No Venue
Yang et al.
Flashadventure: A Benchmark For GUI Agents Solving Full Story Arcs In Diverse Adventure Games (2025) • No Venue
Ahn et al.
Surfer 2: The Next Generation Of Cross-platform Computer Use Agents (2025) • No Venue
Andreux et al.
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds (2025) • No Venue
Anokhin et al.
What Does It Take To Be A Good AI Research Agent? Studying The Role Of Ideation Diversity (2025) • No Venue
Audran-Reiss et al.
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents (2025) • No Venue
Badertdinov et al.
Qwen3-vl Technical Report (2025) • No Venue
Bai et al.
Small Language Models Are The Future Of Agentic AI (2025) • No Venue
Belcak et al.
Enhancing Vision-language Model Training With Reinforcement Learning In Synthetic Worlds For Real-world Success (2025) • No Venue
Bredis et al.
Video Action Differencing (2025) • No Venue
Burgess et al.
Training-free Group Relative Policy Optimization (2025) • No Venue
Cai et al.
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis (2025) • No Venue
Chen et al.
A^2FM: An Adaptive Agent Foundation Model For Tool-aware Hybrid Reasoning (2025) • No Venue
Chen et al.
ERA: Transforming Vlms Into Embodied Agents Via Embodied Prior Learning And Online Reinforcement Learning (2025) • No Venue
Chen et al.
Evolve The Method, Not The Prompts: Evolutionary Synthesis Of Jailbreak Attacks On Llms (2025) • No Venue
Chen et al.
Geometrically-constrained Agent For Spatial Reasoning (2025) • No Venue
Chen et al.
Spacetools: Tool-augmented Spatial Reasoning Via Double Interactive RL (2025) • No Venue
Chen et al.
P1: Mastering Physics Olympiads With Reinforcement Learning (2025) • No Venue
Chen et al.
Ui-ins: Enhancing GUI Grounding With Multi-perspective Instruction-as-reasoning (2025) • No Venue
Chen et al.
Stockbench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? (2025) • No Venue
Chen et al.
The Era Of Agentic Organization: Learning To Organize With Language Models (2025) • No Venue
Chi et al.
Deepresearchgym: A Free, Transparent, And Reproducible Evaluation Sandbox For Deep Research (2025) • No Venue
Coelho et al.
Gemini 2.5: Pushing The Frontier With Advanced Reasoning, Multimodality, Long Context, And Next Generation Agentic Capabilities (2025) • No Venue
Comanici et al.
The Danger Of Overthinking: Examining The Reasoning-action Dilemma In Agentic Tasks (2025) • No Venue
Cuadron et al.
Defeating Prompt Injections By Design (2025) • No Venue
Debenedetti et al.
Toolscope: An Agentic Framework For Vision-guided And Long-horizon Tool Use (2025) • No Venue
Mengjie Deng, Guanting Dong, Zhicheng Dou
Supervised Reinforcement Learning: From Expert Trajectories To Step-wise Reasoning (2025) • No Venue
Deng et al.
Arm-thinker: Reinforcing Multimodal Generative Reward Models With Agentic Tool Use And Visual Reasoning (2025) • No Venue
Ding et al.
Agentic Entropy-balanced Policy Optimization (2025) • No Venue
Dong et al.
SSRL: Self-search Reinforcement Learning (2025) • No Venue
Fan et al.
Cognitive Kernel-pro: A Framework For Deep Research Agents And Agent Foundation Models Training (2025) • No Venue
Fang et al.
A Comprehensive Survey Of Self-evolving AI Agents: A New Paradigm Bridging Foundation Models And Lifelong Agentic Systems (2025) • No Venue
Fang et al.
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action (2025) • No Venue
Fang et al.
Towards General Agentic Intelligence Via Environment Scaling (2025) • No Venue
Fang et al.
Grounding Computer Use Agents On Human Demonstrations (2025) • No Venue
Feizi et al.
Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) • No Venue
Gao et al.
Agentscope 1.0: A Developer-centric Framework For Building Agentic Applications (2025) • No Venue
Gao et al.
A Survey Of Self-evolving Agents: On Path To Artificial Super Intelligence (2025) • No Venue
Gao et al.
Mind2web 2: Evaluating Agentic Search With Agent-as-a-judge (2025) • No Venue
Gou et al.
Ui-venus Technical Report: Building High-performance UI Agents With RFT (2025) • No Venue
Gu et al.
Textarena (2025) • No Venue
Guertler et al.
Costaast: Cost-sensitive Toolpath Agent For Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Deep Researcher With Test-time Diffusion (2025) • No Venue
Han et al.
Deepeyesv2: Toward Agentic Multimodal Model (2025) • No Venue
Hong et al.
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team (2025) • No Venue
Hosain et al.
Paperdebugger: A Plugin-based Multi-agent System For In-editor Academic Writing, Review, And Editing (2025) • No Venue
Hou et al.
A Survey Of Scientific Large Language Models: From Data Foundations To Agent Frontiers (2025) • No Venue
Hu et al.
Building A Foundational Guardrail For General Agentic Systems Via Synthetic Data (2025) • No Venue
Huang et al.
BIRD-INTERACT: Re-imagining Text-to-sql Evaluation For Large Language Models Via Lens Of Dynamic Interactions (2025) • No Venue
Huo et al.
Tree Search For LLM Agent Reinforcement Learning (2025) • No Venue
Ji et al.
Verltool: Towards Holistic Agentic Reinforcement Learning With Tool Use (2025) • No Venue
Jiang et al.
ACON: Optimizing Context Compression For Long-horizon LLM Agents (2025) • No Venue
Kang et al.
Theoremexplainagent: Towards Multimodal Explanations For LLM Theorem Understanding (2025) • No Venue
Ku et al.
Deepagent: A General Reasoning Agent With Scalable Toolsets (2025) • No Venue
Li et al.
Chain-of-agents: End-to-end Agent Foundation Models Via Multi-agent Distillation And Agentic RL (2025) • No Venue
Li et al.
In-the-flow Agentic System Optimization For Effective Planning And Tool Use (2025) • No Venue
Li et al.
Perception, Reason, Think, And Plan: A Survey On Large Multimodal Reasoning Models (2025) • No Venue
Li et al.
Search-o1: Agentic Search-enhanced Large Reasoning Models (2025) • No Venue
Li et al.
Towards Agentic RAG With Deep Reasoning: A Survey Of Rag-reasoning Systems In Llms (2025) • No Venue
Li et al.
Websailor-v2: Bridging The Chasm To Proprietary Agents Via Synthetic Data And Scalable Reinforcement Learning (2025) • No Venue
Li et al.
Embrace-3k: Embodied Reasoning And Action In Complex Environments (2025) • No Venue
Lin et al.
Vcode: A Multimodal Coding Benchmark With SVG As Symbolic Visual Representation (2025) • No Venue
Lin et al.
Advances And Challenges In Foundation Agents: From Brain-inspired Intelligence To Evolutionary, Collaborative, And Safe Systems (2025) • No Venue
Liu et al.
Docreward: A Document Reward Model For Structuring And Stylizing (2025) • No Venue
Liu et al.
GEM: A Gym For Agentic Llms (2025) • No Venue
Liu et al.
Visual-rft: Visual Reinforcement Fine-tuning (2025) • No Venue
Liu et al.
Scalecua: Scaling Open-source Computer Use Agents With Cross-platform Data (2025) • No Venue
Liu et al.
Webexplorer: Explore And Evolve For Training Long-horizon Web Agents (2025) • No Venue
Liu et al.
VITA-E: Natural Embodied Interaction With Concurrent Seeing, Hearing, Speaking, And Acting (2025) • No Venue
Liu et al.
VISTA: A Test-time Self-improving Video Generation Agent (2025) • No Venue
Long et al.
Ultrahorizon: Benchmarking Agent Capabilities In Ultra Long-horizon Scenarios (2025) • No Venue
Luo et al.
Build The Web For Agents, Not Agents For The Web (2025) • No Venue
Lù et al.
R-wom: Retrieval-augmented World Model For Computer-use Agents (2025) • No Venue
Mei et al.
Paper2agent: Reimagining Research Papers As Interactive And Reliable AI Agents (2025) • No Venue
Miao et al.
Livemcpbench: Can Agents Navigate An Ocean Of MCP Tools? (2025) • No Venue
Mo et al.
Large Language Models Think Too Fast To Explore Effectively (2025) • No Venue
Lan Pan, Hanbo Xie, Robert C. Wilson
Agentic Reward Modeling: Integrating Human Preferences With Verifiable Correctness Signals For Reliable Reward Systems (2025) • No Venue
Peng et al.
SWE-QA: Can Language Models Answer Repository-level Code Questions? (2025) • No Venue
Peng et al.
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities (2025) • No Venue
Qi et al.
Agentic Knowledgeable Self-awareness (2025) • No Venue
Qiao et al.
Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025) • No Venue
Qin et al.
Thinking Beyond Tokens: From Brain-inspired Intelligence To Cognitive Foundations For Artificial General Intelligence And Its Societal Impact (2025) • No Venue
Qureshi et al.
Simworld: An Open-ended Realistic Simulator For Autonomous Agents In Physical And Social Worlds (2025) • No Venue
Ren et al.
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning And Reasoning Modes (2025) • No Venue
Research et al.
Self-generated In-context Examples Improve LLM Agents For Sequential Decision-making Tasks (2025) • No Venue
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Llms Are Greedy Agents: Effects Of RL Fine-tuning On Decision-making Abilities (2025) • No Venue
Schmied et al.
Agentrxiv: Towards Collaborative Autonomous Research (2025) • No Venue
Samuel Schmidgall, Michael Moor
Rstar2-agent: Agentic Reasoning Technical Report (2025) • No Venue
Shang et al.
Deep Research: A Systematic Survey (2025) • No Venue
Shi et al.
Taskcraft: Automated Generation Of Agentic Tasks (2025) • No Venue
Shi et al.
Fathom-deepresearch: Unlocking Long Horizon Information Retrieval And Synthesis For Slms (2025) • No Venue
Shreyas Singh, Kunal Singh, Pradeep Moturi
Agentic Reasoning And Tool Integration For Llms Via Reinforcement Learning (2025) • No Venue
Singh et al.
MADD: Multi-agent Drug Discovery Orchestra (2025) • No Venue
Solovev et al.
Agent Data Protocol: Unifying Datasets For Diverse, Effective Fine-tuning Of LLM Agents (2025) • No Venue
Song et al.
Toolorchestra: Elevating Intelligence Via Efficient Model And Tool Orchestration (2025) • No Venue
Su et al.
Learn-by-interact: A Data-centric Framework For Self-adaptive Agents In Realistic Environments (2025) • No Venue
Su et al.
Scaling Agents Via Continual Pre-training (2025) • No Venue
Su et al.
Os-sentinel: Towards Safety-enhanced Mobile GUI Agents Via Hybrid Validation In Realistic Workflows (2025) • No Venue
Sun et al.
Scienceboard: Evaluating Multimodal Autonomous Agents In Realistic Scientific Workflows (2025) • No Venue
Sun et al.
Seagent: Self-evolving Computer Use Agent With Autonomous Learning From Experience (2025) • No Venue
Sun et al.
Hiersearch: A Hierarchical Enterprise Deep Search Framework Integrating Local And Web Searches (2025) • No Venue
Tan et al.
Agent KB: Leveraging Cross-domain Experience For Agentic Problem Solving (2025) • No Venue
Tang et al.
Webshaper: Agentically Data Synthesizing Via Information-seeking Formalization (2025) • No Venue
Tao et al.
Gemini Robotics: Bringing AI Into The Physical World (2025) • No Venue
Team et al.
Inferix: A Block-diffusion Based Next-generation Inference Engine For World Simulation (2025) • No Venue
Team et al.
GLM-4.5: Agentic, Reasoning, And Coding (ARC) Foundation Models (2025) • No Venue
Team et al.
Nex-n1: Agentic Models Trained Via A Unified Ecosystem For Large-scale Environment Construction (2025) • No Venue
Team et al.
PAN: A World Model For General, Interactable, And Long-horizon World Simulation (2025) • No Venue
Team et al.
Tongyi Deepresearch Technical Report (2025) • No Venue
Team et al.
Open Multimodal Retrieval-augmented Factual Image Generation (2025) • No Venue
Tian et al.
Appworld: A Controllable World Of Apps And People For Benchmarking Interactive Coding Agents (2024) • No Venue
Trivedi et al.
Agent Workflow Memory (2024) • No Venue
Wang et al.
Agent S: An Open Agentic Framework That Uses Computers Like A Human (2024) • No Venue
Agashe et al.
Arigraph: Learning Knowledge Graph World Models With Episodic Memory For LLM Agents (2024) • No Venue
Anokhin et al.
Genie: Generative Interactive Environments (2024) • No Venue
Bruce et al.
Web Agents With World Models: Learning And Leveraging Environment Dynamics In Web Navigation (2024) • No Venue
Chae et al.
Internet Of Agents: Weaving A Web Of Heterogeneous Agents For Collaborative Intelligence (2024) • No Venue
Chen et al.
MLLM As Retriever: Interactively Learning Multimodal Retrieval For Embodied Agents (2024) • No Venue
Yue et al.
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded (2024) • No Venue
Zheng et al.
LEGENT: Open Platform For Embodied Agents (2024) • No Venue
Cheng et al.
ALPINE: Unveiling The Planning Capability Of Autoregressive Learning In Language Models (2024) • No Venue
Wang et al.
Mobile-agent: Autonomous Multi-modal Mobile Device Agent With Visual Perception (2024) • No Venue
Wang et al.
LAVE: Llm-powered Agent Assistance And Language Augmentation For Video Editing (2024) • No Venue
Wang et al.
Videoagent: Long-form Video Understanding With Large Language Model As Agent (2024) • No Venue
Wang et al.
Opendevin: An Open Platform For AI Software Developers As Generalist Agents (2024) • No Venue
Wang et al.
Agent-as-a-judge: Evaluate Agents With Agents (2024) • No Venue
Zhuge et al.
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) • No Venue
Xi et al.
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery (2024) • No Venue
Lu et al.
Generative World Explorer (2024) • No Venue
Lu et al.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents (2024) • No Venue
Wu et al.
Agentinstruct: Toward Generative Teaching With Agentic Flows (2024) • No Venue
Mitra et al.
Dynasaur: Large Language Agents Beyond Predefined Actions (2024) • No Venue
Nguyen et al.
Webrl: Training LLM Web Agents Via Self-evolving Online Curriculum Reinforcement Learning (2024) • No Venue
Qi et al.
Benchmarking Agentic Workflow Generation (2024) • No Venue
Qiao et al.
A Review Of Large Language Models And Autonomous Agents In Chemistry (2024) • Chemical Science • 79 citations
Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White
Agent AI: Surveying The Horizons Of Multimodal Interaction (2024) • Arxiv • 40 citations
Durante et al.
Stream Of Search (sos): Learning To Search In Language (2024) • No Venue
Gandhi et al.
Empowering Biomedical Discovery With AI Agents (2024) • Cell • 129 citations
Gao et al.
Protagents: Protein Discovery Via Large Language Model Multi-agent Collaborations Combining Physics And Machine Learning (2024) • Digital Discovery • 49 citations
A. Ghafarollahi, M. J. Buehler
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2024) • No Venue
Grosnit et al.
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use (2024) • No Venue
Hu et al.
Automated Design Of Agentic Systems (2024) • No Venue
Shengran Hu, Cong Lu, Jeff Clune
Pokéllmon: A Human-parity Agent For Pokémon Battles With Large Language Models (2024) • No Venue
Sihao Hu, Tiansheng Huang, Ling Liu
Genmac: Compositional Text-to-video Generation With Multi-agent Collaboration (2024) • No Venue
Huang et al.
Omniact: A Dataset And Benchmark For Enabling Multimodal Generalist Autonomous Agents For Desktop And Web (2024) • No Venue
Kapoor et al.
Husky: A Unified, Open-source Language Agent For Multi-step Reasoning (2024) • No Venue
Kim et al.
Androidlab: Training And Systematic Benchmarking Of Android Autonomous Agents (2024) • No Venue
Xu et al.
Revealing The Barriers Of Language Agents In Planning (2024) • No Venue
Xie et al.
Agentbench: Evaluating Llms As Agents (2023) • No Venue
Liu et al.
Mechagents: Large Language Model Multi-agent Collaborations Can Solve Mechanics Problems, Generate New Data, And Integrate Knowledge (2023) • Extreme Mechanics Letters • 47 citations
Bo Ni, Markus J. Buehler
Embodiedgpt: Vision-language Pre-training Via Embodied Chain Of Thought (2023) • Arxiv • 41 citations
Mu et al.
JARVIS-1: Open-world Multi-task Agents With Memory-augmented Multimodal Language Models (2023) • No Venue
Wang et al.
Agenttuning: Enabling Generalized Agent Abilities For Llms (2023) • No Venue
Zeng et al.
Navgpt: Explicit Reasoning In Vision-and-language Navigation With Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 104 citations
Gengze Zhou, Yicong Hong, Qi Wu
Expel: LLM Agents Are Experiential Learners (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhao et al.
Retroformer: Retrospective Large Language Agents With Policy Gradient Optimization (2023) • No Venue
Yao et al.
Deception Abilities Emerged In Large Language Models (2023) • Proceedings of the National Academy of Sciences • 45 citations
Thilo Hagendorff
Agents: An Open-source Framework For Autonomous Language Agents (2023) • No Venue
Zhou et al.
Cogagent: A Visual Language Model For GUI Agents (2023) • No Venue
Hong et al.
Octopus: Embodied Vision-language Programmer From Environmental Feedback (2023) • No Venue
Yang et al.
Large AI Model Empowered Multimodal Semantic Communications (2023) • IEEE Wireless Communications • 40 citations
Jiang et al.
Modelscope-agent: Building Your Customizable Agent System With Open-source Large Language Models (2023) • No Venue
Li et al.
Lemur: Harmonizing Natural Language And Code For Language Agents (2023) • No Venue
Xu et al.
Dspy: Compiling Declarative Language Model Calls Into Self-improving Pipelines (2023) • No Venue
Khattab et al.
Webarena: A Realistic Web Environment For Building Autonomous Agents (2023) • No Venue
Zhou et al.
Hugginggpt: Solving AI Tasks With Chatgpt And Its Friends In Hugging Face (2023) • Arxiv • 264 citations
Shen et al.
A Generalist Agent (2022) • Transactions on Machine Learning Research 11/2022 https://openreview.net/forum?id=1ikK0kHjvj • 60 citations
Reed et al.
Language Models As Agent Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 44 citations
Jacob Andreas
Think Global, Act Local: Dual-scale Graph Transformer For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Chen et al.
Minedojo: Building Open-ended Embodied Agents With Internet-scale Knowledge (2022) • Arxiv • 59 citations
Fan et al.
Hierarchical Cross-modal Agent For Robotics Vision-and-language Navigation (2021) • 2021 IEEE International Conference on Robotics and Automation (ICRA) • 45 citations
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
Alignment Of Language Agents (2021) • Arxiv • 41 citations
Kenton et al.
Automated Rationale Generation: A Technique For Explainable AI And Its Effects On Human Perceptions (2019) • Arxiv • 44 citations
Ehsan et al.
The Regretful Agent: Heuristic-aided Navigation Through Progress Estimation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 168 citations
Ma et al.
Proactive Human-machine Conversation With Explicit Conversation Goals (2019) • Arxiv • 41 citations
Wu et al.

Applications 1023 papers

Enhancing Human-like Responses In Large Language Models (2025) • No Venue
Ethem Yağız Çalık, Talha Rüzgar Akkuş
Falcon-h1: A Family Of Hybrid-head Language Models Redefining Efficiency And Performance (2025) • No Venue
Zuo et al.
Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories (2025) • No Venue
Zhu et al.
Embeddinggemma: Powerful And Lightweight Text Representations (2025) • No Venue
Vera et al.
Declip: Decoupled Learning For Open-vocabulary Dense Perception (2025) • No Venue
Wang et al.
Coser: Coordinating Llm-based Persona Simulation Of Established Roles (2025) • No Venue
Wang et al.
MIRIX: Multi-agent Memory System For Llm-based Agents (2025) • No Venue
Yu Wang, Xi Chen
Landscape Of Thoughts: Visualizing The Reasoning Process Of Large Language Models (2025) • No Venue
Zhou et al.
The Landscape Of Agentic Reinforcement Learning For Llms: A Survey (2025) • No Venue
Zhang et al.
Latent Sketchpad: Sketching Visual Thoughts To Elicit Multimodal Reasoning In Mllms (2025) • No Venue
Zhang et al.
Minimax-speech: Intrinsic Zero-shot Text-to-speech With A Learnable Speaker Encoder (2025) • No Venue
Zhang et al.
Metamind: Modeling Human Social Thoughts With Metacognitive Multi-agent Systems (2025) • No Venue
Zhang et al.
Prompt Orchestration Markup Language (2025) • No Venue
Zhang et al.
T2r-bench: A Benchmark For Generating Article-level Reports From Real World Industrial Tables (2025) • No Venue
Zhang et al.
UFO2: The Desktop Agentos (2025) • No Venue
Zhang et al.
What, How, Where, And How Well? A Survey On Test-time Scaling In Large Language Models (2025) • No Venue
Zhang et al.
Chemdfm-r: An Chemical Reasoner LLM Enhanced With Atomized Chemical Knowledge (2025) • No Venue
Zhao et al.
R1-omni: Explainable Omni-multimodal Emotion Recognition With Reinforcing Learning (2025) • No Venue
Jiaxing Zhao, Xihan Wei, Liefeng Bo
Vbench-2.0: Advancing Video Generation Benchmark Suite For Intrinsic Faithfulness (2025) • No Venue
Zheng et al.
Aionopedia: An LLM Agent Orchestrating Multimodal Learning For Ionic Liquid Discovery (2025) • No Venue
Yin et al.
Aligning Multimodal LLM With Human Preference: A Survey (2025) • No Venue
Yu et al.
Discrete Diffusion In Large Language And Multimodal Models: A Survey (2025) • No Venue
Runpeng Yu, Qi Li, Xinchao Wang
The Stochastic Parrot On Llm's Shoulder: A Summative Assessment Of Physical Concept Understanding (2025) • No Venue
Yu et al.
Sa2va: Marrying SAM2 With Llava For Dense Grounded Understanding Of Images And Videos (2025) • No Venue
Yuan et al.
Gliner2: An Efficient Multi-task Information Extraction System With Schema-driven Interface (2025) • No Venue
Zaratiana et al.
Designlab: Designing Slides Through Iterative Detection And Correction (2025) • No Venue
Yun et al.
Internlm-xcomposer2.5-reward: A Simple Yet Effective Multi-modal Reward Model (2025) • No Venue
Zang et al.
API Agents Vs. GUI Agents: Divergence And Convergence (2025) • No Venue
Zhang et al.
Agentic Context Engineering: Evolving Contexts For Self-improving Language Models (2025) • No Venue
Zhang et al.
BANG: Dividing 3D Assets Via Generative Exploded Dynamics (2025) • No Venue
Zhang et al.
Multimodal Chain-of-thought Reasoning: A Comprehensive Survey (2025) • No Venue
Wang et al.
ODYSSEY: Open-world Quadrupeds Exploration And Manipulation For Long-horizon Tasks (2025) • No Venue
Wang et al.
Opencua: Open Foundations For Computer-use Agents (2025) • No Venue
Wang et al.
Skywork-vl Reward: An Effective Reward Model For Multimodal Understanding And Reasoning (2025) • No Venue
Wang et al.
RLVER: Reinforcement Learning With Verifiable Emotion Rewards For Empathetic Agents (2025) • No Venue
Wang et al.
Pixnerd: Pixel Neural Field Diffusion (2025) • No Venue
Wang et al.
Revolutionizing Reinforcement Learning Framework For Diffusion Large Language Models (2025) • No Venue
Wang et al.
Transpixar: Advancing Text-to-video Generation With Transparency (2025) • No Venue
Wang et al.
Unified Multimodal Chain-of-thought Reward Model Through Reinforcement Fine-tuning (2025) • No Venue
Wang et al.
Worldgen: From Text To Traversable And Interactive 3D Worlds (2025) • No Venue
Wang et al.
Llama-3.1-foundationai-securityllm-8b-instruct Technical Report (2025) • No Venue
Weerawardhena et al.
From AI For Science To Agentic Science: A Survey On Autonomous Scientific Discovery (2025) • No Venue
Wei et al.
3D Scene Generation: A Survey (2025) • No Venue
Wen et al.
Less-to-more Generalization: Unlocking More Controllability By In-context Generation (2025) • No Venue
Wu et al.
The Bitter Lesson Learned From 2,000+ Multilingual Benchmarks (2025) • No Venue
Wu et al.
Generate, But Verify: Reducing Hallucination In Vision-language Models With Retrospective Resampling (2025) • No Venue
Wu et al.
Shifting Long-context Llms Research From Input To Output (2025) • No Venue
Wu et al.
Sitemb-v1.5: Improved Context-aware Dense Retrieval For Semantic Association And Long Story Comprehension (2025) • No Venue
Wu et al.
Surrogate Signals From Format And Length: Reinforcement Learning For Solving Mathematical Problems Without Ground Truth Answers (2025) • No Venue
Xin et al.
Naturelm: Deciphering The Language Of Nature For Scientific Discovery (2025) • No Venue
Xia et al.
Dreamomni2: Multimodal Instruction-based Editing And Generation (2025) • No Venue
Xia et al.
Filmagent: A Multi-agent Framework For End-to-end Film Automation In Virtual 3D Spaces (2025) • No Venue
Xu et al.
Easyedit2: An Easy-to-use Steering Framework For Editing Large Language Models (2025) • No Venue
Xu et al.
Easysteer: A Unified Framework For High-performance And Extensible LLM Steering (2025) • No Venue
Xu et al.
Scalable Chain Of Thoughts Via Elastic Reasoning (2025) • No Venue
Xu et al.
Step-audio-editx Technical Report (2025) • No Venue
Yan et al.
Learning On The Job: An Experience-driven Self-evolving Agent For Long-horizon Tasks (2025) • No Venue
Yang et al.
From Code Foundation Models To Agents And Applications: A Practical Guide To Code Intelligence (2025) • No Venue
Yang et al.
Gencompositor: Generative Video Compositing With Diffusion Transformer (2025) • No Venue
Yang et al.
Hscodecomp: A Realistic And Expert-level Benchmark For Deep Search Agents In Hierarchical Rule Application (2025) • No Venue
Yang et al.
Medical World Model: Generative Simulation Of Tumor Evolution For Treatment Planning (2025) • No Venue
Yang et al.
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search (2025) • No Venue
Yang et al.
Twinmarket: A Scalable Behavioral And Social Simulation For Financial Markets (2025) • No Venue
Yang et al.
Recgpt Technical Report (2025) • No Venue
Yi et al.
Survey On Evaluation Of Llm-based Agents (2025) • No Venue
Yehudai et al.
Omnivinci: Enhancing Architecture And Data For Omni-modal Understanding LLM (2025) • No Venue
Ye et al.
Primitiveanything: Human-crafted 3D Primitive Assembly Generation With Auto-regressive Transformer (2025) • No Venue
Ye et al.
Wan: Open And Advanced Large-scale Video Generative Models (2025) • No Venue
Wanteam et al.
FS-DAG: Few Shot Domain Adapting Graph Networks For Visually Rich Document Understanding (2025) • No Venue
Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
Sadeed: Advancing Arabic Diacritization Through Small Language Model (2025) • No Venue
Aldallal et al.
LFM2 Technical Report (2025) • No Venue
Amini et al.
Kandinsky 5.0: A Family Of Foundation Models For Image And Video Generation (2025) • No Venue
Arkhipkin et al.
Recammaster: Camera-controlled Generative Rendering From A Single Video (2025) • No Venue
Bai et al.
Clinical Knowledge In Llms Does Not Translate To Human Interactions (2025) • No Venue
Bean et al.
Small Language Models Are The Future Of Agentic AI (2025) • No Venue
Belcak et al.
Video-as-prompt: Unified Semantic Control For Video Generation (2025) • No Venue
Bian et al.
Video Action Differencing (2025) • No Venue
Burgess et al.
Scaling Spatial Intelligence With Multimodal Foundation Models (2025) • No Venue
Cai et al.
Multi-domain Explainability Of Preferences (2025) • No Venue
Nitay Calderon, Liat Ein-Dor, Roi Reichart
Reconstructing 4D Spatial Intelligence: A Survey (2025) • No Venue
Cao et al.
A3: Android Agent Arena For Mobile GUI Agents (2025) • No Venue
Chai et al.
Autopr: Let's Automate Your Academic Promotion! (2025) • No Venue
Chen et al.
Comp: Continual Multimodal Pre-training For Vision Foundation Models (2025) • No Venue
Chen et al.
Spacetools: Tool-augmented Spatial Reasoning Via Double Interactive RL (2025) • No Venue
Chen et al.
Opengpt-4o-image: A Comprehensive Dataset For Advanced Image Generation And Editing (2025) • No Venue
Chen et al.
Sana-sprint: One-step Diffusion With Continuous-time Consistency Distillation (2025) • No Venue
Chen et al.
Animegamer: Infinite Anime Life Simulation With Next Game State Prediction (2025) • No Venue
Cheng et al.
Video-as-answer: Predict And Generate Next Video Event With Joint-grpo (2025) • No Venue
Cheng et al.
Beyond RAG: Task-aware KV Cache Compression For Comprehensive Knowledge Reasoning (2025) • No Venue
Corallo et al.
Meshcoder: Llm-powered Structured Mesh Code Generation From Point Clouds (2025) • No Venue
Dai et al.
Swe-bench Pro: Can AI Agents Solve Long-horizon Software Engineering Tasks? (2025) • No Venue
Deng et al.
Self-improvement In Multimodal Large Language Models: A Survey (2025) • No Venue
Deng et al.
Kling-avatar: Grounding Multimodal Instructions For Cascaded Long-duration Avatar Animation Synthesis (2025) • No Venue
Ding et al.
Motionsight: Boosting Fine-grained Motion Understanding In Multimodal Llms (2025) • No Venue
Du et al.
Cognitive Kernel-pro: A Framework For Deep Research Agents And Agent Foundation Models Training (2025) • No Venue
Fang et al.
A Multi-modal AI Copilot For Single-cell Analysis With Instruction Following (2025) • No Venue
Fang et al.
Nemotron-flash: Towards Latency-optimal Hybrid Small Language Models (2025) • No Venue
Fu et al.
Sliderspace: Decomposing The Visual Capabilities Of Diffusion Models (2025) • No Venue
Gandikota et al.
A Survey Of Self-evolving Agents: On Path To Artificial Super Intelligence (2025) • No Venue
Gao et al.
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts (2025) • No Venue
Ge et al.
Training Long-context, Multi-turn Software Engineering Agents With Reinforcement Learning (2025) • No Venue
Golubev et al.
Lment: A Suite For Analyzing Knowledge In Language Models From Pretraining Data To Representations (2025) • No Venue
Gottesman et al.
Mineworld: A Real-time And Open-source Interactive World Model On Minecraft (2025) • No Venue
Guo et al.
Audiostory: Generating Long-form Narrative Audio With Large Language Models (2025) • No Venue
Guo et al.
Seed1.5-vl Technical Report (2025) • No Venue
Guo et al.
Charting And Navigating Hugging Face's Model Atlas (2025) • No Venue
Horwitz et al.
Dynaguard: A Dynamic Guardrail Model With User-defined Policies (2025) • No Venue
Hoover et al.
Hunyuancustom: A Multimodal-driven Architecture For Customized Video Generation (2025) • No Venue
Hu et al.
Live Avatar: Streaming Real-time Audio-driven Avatar Generation With Infinite Length (2025) • No Venue
Huang et al.
On The Trustworthiness Of Generative Foundation Models: Guideline, Assessment, And Perspective (2025) • No Venue
Huang et al.
BIRD-INTERACT: Re-imagining Text-to-sql Evaluation For Large Language Models Via Lens Of Dynamic Interactions (2025) • No Venue
Huo et al.
Expect The Unexpected: Failsafe Long Context QA For Finance (2025) • No Venue
Kamble et al.
Gigaevo: An Open Source Optimization Framework Powered By Llms And Evolution Algorithms (2025) • No Venue
Khrulkov et al.
Distillm-2: A Contrastive Approach Boosts The Distillation Of Llms (2025) • No Venue
Ko et al.
Streamdit: Real-time Streaming Text-to-video Generation (2025) • No Venue
Kodaira et al.
Cadrille: Multi-modal CAD Reconstruction With Online Reinforcement Learning (2025) • No Venue
Kolodiazhnyi et al.
From Scores To Skills: A Cognitive Diagnosis Framework For Evaluating Financial Large Language Models (2025) • No Venue
Kuang et al.
Gemini Embedding: Generalizable Embeddings From Gemini (2025) • No Venue
Lee et al.
Can One Domain Help Others? A Data-centric Study On Multi-domain Reasoning Via Reinforcement Learning (2025) • No Venue
Li et al.
4D Langsplat: 4D Language Gaussian Splatting Via Multimodal Large Language Models (2025) • No Venue
Li et al.
Deepagent: A General Reasoning Agent With Scalable Toolsets (2025) • No Venue
Li et al.
Droplet3d: Commonsense Priors From Videos Facilitate 3D Generation (2025) • No Venue
Li et al.
Drafterbench: Benchmarking Large Language Models For Tasks Automation In Civil Engineering (2025) • No Venue
Yinsheng Li, Zhen Dong, Yi Shao
If-vidcap: Can Video Caption Models Follow Instructions? (2025) • No Venue
Li et al.
Langsplatv2: High-dimensional 3D Language Gaussian Splatting With 450+ FPS (2025) • No Venue
Li et al.
Sos1: O1 And R1-like Reasoning Llms Are Sum-of-square Solvers (2025) • No Venue
Li et al.
SWE-SQL: Illuminating LLM Pathways To Solve User SQL Issues In Real-world Applications (2025) • No Venue
Li et al.
The Tool Decathlon: Benchmarking Language Agents For Diverse, Realistic, And Long-horizon Task Execution (2025) • No Venue
Li et al.
Triposg: High-fidelity 3D Shape Synthesis Using Large-scale Rectified Flow Models (2025) • No Venue
Li et al.
Colorbench: Can Vlms See And Understand The Colorful World? A Comprehensive Benchmark For Color Perception, Reasoning, And Robustness (2025) • No Venue
Liang et al.
Autoregressive Adversarial Post-training For Real-time Interactive Video Generation (2025) • No Venue
Lin et al.
Computer-use Agents As Judges For Generative User Interface (2025) • No Venue
Lin et al.
Omnihuman-1: Rethinking The Scaling-up Of One-stage Conditioned Human Animation Models (2025) • No Venue
Lin et al.
Uniworld: High-resolution Semantic Encoders For Unified Visual Understanding And Generation (2025) • No Venue
Lin et al.
Towards Understanding Camera Motions In Any Video (2025) • No Venue
Lin et al.
Critique-coder: Enhancing Coder Models By Critique Reinforcement Learning (2025) • No Venue
Ruan et al.
Spatiallm: Training Large Language Models For Structured Indoor Modeling (2025) • No Venue
Mao et al.
Beyond Distillation: Pushing The Limits Of Medical LLM Reasoning With Minimalist Rule-based RL (2025) • No Venue
Liu et al.
Advances And Challenges In Foundation Agents: From Brain-inspired Intelligence To Evolutionary, Collaborative, And Safe Systems (2025) • No Venue
Liu et al.
Longemotion: Measuring Emotional Intelligence Of Large Language Models In Long-context Interaction (2025) • No Venue
Liu et al.
Can World Simulators Reason? Gen-vire: A Generative Visual Reasoning Benchmark (2025) • No Venue
Liu et al.
A Comprehensive Survey On Long Context Language Modeling (2025) • No Venue
Liu et al.
Efficient Inference For Large Reasoning Models: A Survey (2025) • No Venue
Liu et al.
Part I: Tricks Or Traps? A Deep Dive Into RL For LLM Reasoning (2025) • No Venue
Liu et al.
Medsam3: Delving Into Segment Anything With Medical Concepts (2025) • No Venue
Liu et al.
METAGENE-1: Metagenomic Foundation Model For Pandemic Monitoring (2025) • No Venue
Liu et al.
Othink-mr1: Stimulating Multimodal Generalized Reasoning Capabilities Via Dynamic Reinforcement Learning (2025) • No Venue
Liu et al.
Phantom: Subject-consistent Video Generation Via Cross-modal Alignment (2025) • No Venue
Liu et al.
Songgen: A Single Stage Auto-regressive Transformer For Text-to-song Generation (2025) • No Venue
Liu et al.
Step1x-edit: A Practical Framework For General Image Editing (2025) • No Venue
Liu et al.
Webexplorer: Explore And Evolve For Training Long-horizon Web Agents (2025) • No Venue
Liu et al.
Seeing, Listening, Remembering, And Reasoning: A Multimodal Agent With Long-term Memory (2025) • No Venue
Long et al.
Webgen-agent: Enhancing Interactive Website Generation With Multi-level Feedback And Step-level Reinforcement Learning (2025) • No Venue
Lu et al.
Atoken: A Unified Tokenizer For Vision (2025) • No Venue
Lu et al.
Bizfinbench: A Business-driven Real-world Financial Benchmark For Evaluating Llms (2025) • No Venue
Lu et al.
Dreamactor-m1: Holistic, Expressive And Robust Human Image Animation With Hybrid Guidance (2025) • No Venue
Luo et al.
Large Language Model Agent: A Survey On Methodology, Applications And Challenges (2025) • No Venue
Luo et al.
Mcp-universe: Benchmarking Large Language Models With Real-world Model Context Protocol Servers (2025) • No Venue
Luo et al.
Open Captchaworld: A Comprehensive Web-based Platform For Testing And Benchmarking Multimodal LLM Agents (2025) • No Venue
Luo et al.
Technologies On Effectiveness And Efficiency: A Survey Of State Spaces Models (2025) • No Venue
Lv et al.
Agentrewardbench: Evaluating Automatic Evaluations Of Web Agent Trajectories (2025) • No Venue
Lù et al.
Calligrapher: Freestyle Text Image Customization (2025) • No Venue
Ma et al.
TCIA: A Task-centric Instruction Augmentation Method For Instruction Finetuning (2025) • No Venue
Ma et al.
SQL-R1: Training Natural Language To SQL Reasoning Model By Reinforcement Learning (2025) • No Venue
Ma et al.
Yume: An Interactive World Generation Model (2025) • No Venue
Mao et al.
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems (2025) • No Venue
Meghwani et al.
Discrete Audio Tokens: More Than A Survey! (2025) • No Venue
Mousavi et al.
Dreamo: A Unified Framework For Image Customization (2025) • No Venue
Mou et al.
AION-1: Omnimodal Foundation Model For Astronomical Sciences (2025) • No Venue
Parker et al.
Sweeval: Do Llms Really Swear? A Safety Benchmark For Testing Limits For Enterprise Use (2025) • No Venue
Patel et al.
Optimizing Multilingual Text-to-speech With Accents & Emotions (2025) • No Venue
Pawar et al.
Multifinben: A Multilingual, Multimodal, And Difficulty-aware Benchmark For Financial LLM Evaluation (2025) • No Venue
Peng et al.
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification To Improve Trustworthy QA (2025) • No Venue
Pletenev et al.
Bookworld: From Novels To Interactive Agent Societies For Creative Story Generation (2025) • No Venue
Ran et al.
Simworld: An Open-ended Realistic Simulator For Autonomous Agents In Physical And Social Worlds (2025) • No Venue
Ren et al.
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning And Reasoning Modes (2025) • No Venue
Research et al.
Quickvideo: Real-time Long Video Understanding With System Algorithm Co-design (2025) • No Venue
Schneider et al.
Seaweed-7b: Cost-effective Training Of Video Generation Foundation Model (2025) • No Venue
Seawead et al.
Seedream 4.0: Toward Next-generation Multimodal Image Generation (2025) • No Venue
Seedream et al.
When Punctuation Matters: A Large-scale Comparison Of Prompt Robustness Methods For Llms (2025) • No Venue
Seleznyov et al.
Yourbench: Easy Custom Evaluation Sets For Everyone (2025) • No Venue
Shashidhar et al.
Solving Inequality Proofs With Large Language Models (2025) • No Venue
Sheng et al.
Mme-videoocr: Evaluating Ocr-based Capabilities Of Multimodal Llms In Video Scenarios (2025) • No Venue
Shi et al.
Voila: Voice-language Foundation Models For Real-time Autonomous Interaction And Voice Role-play (2025) • No Venue
Shi et al.
Towards Trustworthy GUI Agents: A Survey (2025) • No Venue
Shi et al.
MADD: Multi-agent Drug Discovery Orchestra (2025) • No Venue
Solovev et al.
Vf-eval: Evaluating Multimodal Llms For Generating Feedback On AIGC Videos (2025) • No Venue
Song et al.
Expanding RL With Verifiable Rewards Across Diverse Domains (2025) • No Venue
Su et al.
Thinking With Images For Multimodal Reasoning: Foundations, Methods, And Future Frontiers (2025) • No Venue
Su et al.
Januscoder: Towards A Foundational Visual-programmatic Interface For Code Intelligence (2025) • No Venue
Sun et al.
Understanding Generative AI Capabilities In Everyday Image Editing Tasks (2025) • No Venue
Taesiri et al.
Lego-puzzles: How Good Are Mllms At Multi-step Spatial Reasoning? (2025) • No Venue
Tang et al.
Baichuan-m2: Scaling Medical Capability With Large Verifier System (2025) • No Venue
Team et al.
Gemini Robotics: Bringing AI Into The Physical World (2025) • No Venue
Team et al.
Cube: A Roblox View Of 3D Intelligence (2025) • No Venue
Team et al.
Minicpm4: Ultra-efficient Llms On End Devices (2025) • No Venue
Team et al.
Mimo: Unlocking The Reasoning Potential Of Language Model -- From Pretraining To Posttraining (2025) • No Venue
Team et al.
Robobrain 2.0 Technical Report (2025) • No Venue
Team et al.
Audiox: Diffusion Transformer For Anything-to-audio Generation (2025) • No Venue
Tian et al.
Reflections From The 2024 Large Language Model (LLM) Hackathon For Applications In Materials Science And Chemistry (2024) • No Venue
Zimmermann et al.
From Rags To Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information For Factual Queries (2024) • No Venue
Wadhwa et al.
Nl-eye: Abductive NLI For Images (2024) • No Venue
Ventura et al.
Tnt-llm: Text Mining At Scale With Large Language Models (2024) • No Venue
Wan et al.
Edify Image: High-quality Image Generation With Pixel Space Laplacian Diffusion Models (2024) • No Venue
Nvidia et al.
Llm-detectaive: A Tool For Fine-grained Machine-generated Text Detection (2024) • No Venue
Abassy et al.
Alignment Studio: Aligning Large Language Models To Particular Contextual Regulations (2024) • No Venue
Achintalwar et al.
Agent S: An Open Agentic Framework That Uses Computers Like A Human (2024) • No Venue
Agashe et al.
Evolutionary Optimization Of Model Merging Recipes (2024) • No Venue
Akiba et al.
Automated Unit Test Improvement Using Large Language Models At Meta (2024) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 53 citations
Alshahwan et al.
Arigraph: Learning Knowledge Graph World Models With Episodic Memory For LLM Agents (2024) • No Venue
Anokhin et al.
Stable Flow: Vital Layers For Training-free Image Editing (2024) • No Venue
Avrahami et al.
Skywork-math: Data Scaling Laws For Mathematical Reasoning In Large Language Models -- The Story Goes On (2024) • No Venue
Zeng et al.
Flowmind: Automatic Workflow Generation With Llms (2024) • No Venue
Zeng et al.
Syncammaster: Synchronizing Multi-camera Video Generation From Diverse Viewpoints (2024) • No Venue
Bai et al.
Seed-music: A Unified Framework For High Quality And Controlled Music Generation (2024) • No Venue
Bai et al.
Seven Failure Points When Engineering A Retrieval Augmented Generation System (2024) • Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI • 57 citations
Barnett et al.
Lumiere: A Space-time Diffusion Model For Video Generation (2024) • No Venue
Bar-Tal et al.
Genomic Language Models: Opportunities And Challenges (2024) • Trends in Genetics • 43 citations
Benegas et al.
SUTRA: Scalable Multilingual Language Model Architecture (2024) • No Venue
Bendale et al.
MUMU: Bootstrapping Multimodal Image Generation From Text-to-image Data (2024) • No Venue
William Berman, Alexander Peysakhovich
Taking The Next Step With Generative Artificial Intelligence: The Transformative Role Of Multimodal Large Language Models In Science Education (2024) • Learning and Individual Differences • 47 citations
Bewersdorff et al.
Speculative Streaming: Fast LLM Inference Without Auxiliary Models (2024) • No Venue
Bhendawade et al.
INDUS: Effective And Efficient Language Models For Scientific Applications (2024) • No Venue
Bhattacharjee et al.
Make It Count: Text-to-image Generation With An Accurate Number Of Objects (2024) • No Venue
Binyamin et al.
Intelligent Clinical Documentation: Harnessing Generative AI For Patient-centric Clinical Note Generation (2024) • International Journal of Innovative Science and Research Technology (IJISRT) • 999 citations
Anjanava Biswas, Wrick Talukdar
Biomedlm: A 2.7B Parameter Language Model Trained On Biomedical Text (2024) • No Venue
Bolton et al.
Windows Agent Arena: Evaluating Multi-modal OS Agents At Scale (2024) • No Venue
Bonatti et al.
An Introduction To Vision-language Modeling (2024) • No Venue
Bordes et al.
Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) • No Venue
Bu et al.
Uni-smart: Universal Science Multimodal Analysis And Research Transformer (2024) • No Venue
Cai et al.
Survey On Large Language Model-enhanced Reinforcement Learning: Concept, Taxonomy, And Methods (2024) • IEEE Transactions on Neural Networks and Learning Systems • 50 citations
Cao et al.
Language-based Game Theory In The Age Of Artificial Intelligence (2024) • Journal of The Royal Society Interface • 66 citations
Capraro et al.
XTTS: A Massively Multilingual Zero-shot Text-to-speech Model (2024) • Interspeech 2024 • 49 citations
Casanova et al.
Edgefusion: On-device Text-to-image Generation (2024) • No Venue
Castells et al.
At The Dawn Of Generative AI Era: A Tutorial-cum-survey On New Frontiers In 6G Wireless Intelligence (2024) • IEEE Open Journal of the Communications Society • 57 citations
Abdulkadir Celik, Ahmed M. Eltawil
Chatmusician: Understanding And Generating Music Intrinsically With LLM (2024) • No Venue
Yuan et al.
Scaling Synthetic Data Creation With 1,000,000,000 Personas (2024) • No Venue
Chan et al.
3dtopia-xl: Scaling High-quality 3D Asset Generation Via Primitive Diffusion (2024) • No Venue
Chen et al.
Agentpoison: Red-teaming LLM Agents Via Poisoning Memory Or Knowledge Bases (2024) • No Venue
Chen et al.
BGE M3-embedding: Multi-lingual, Multi-functionality, Multi-granularity Text Embeddings Through Self-knowledge Distillation (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 221 citations
Chen et al.
Contrastive Localized Language-image Pre-training (2024) • No Venue
Chen et al.
Mindsearch: Mimicking Human Minds Elicits Deep AI Searcher (2024) • No Venue
Chen et al.
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (2024) • No Venue
Chen et al.
Mega-bench: Scaling Multimodal Evaluation To Over 500 Real-world Tasks (2024) • No Venue
Chen et al.
Octopus V2: On-device Language Model For Super Agent (2024) • No Venue
Wei Chen, Zhiyuan Li
Region-aware Text-to-image Generation Via Hard Binding And Soft Refinement (2024) • No Venue
Chen et al.
Spatialvlm: Endowing Vision-language Models With Spatial Reasoning Capabilities (2024) • No Venue
Chen et al.
Textgrad: Automatic "differentiation" Via Text (2024) • No Venue
Yuksekgonul et al.
Mora: Enabling Generalist Video Generation Via A Multi-agent Framework (2024) • No Venue
Yuan et al.
From MOOC To MAIC: Reshaping Online Teaching And Learning Through Llm-driven Agents (2024) • No Venue
Yu et al.
Videgothink: Assessing Egocentric Video Understanding Capabilities For Embodied AI (2024) • No Venue
Cheng et al.
On Domain-specific Post-training For Multimodal Large Language Models (2024) • No Venue
Cheng et al.
The Browsergym Ecosystem For Web Agent Research (2024) • No Venue
Chezelles et al.
Harnessing Large Language Models For Text-rich Sequential Recommendation (2024) • WWW '24: The ACM Web Conference 2024 • 45 citations
Zheng et al.
Beyond Fine-tuning: Unleashing The Potential Of Continuous Pretraining For Clinical Llms (2024) • No Venue
Christophe et al.
Visionllama: A Unified Llama Interface For Vision Tasks (2024) • No Venue
Chu et al.
Are Vision-language Models Truly Understanding Multi-vision Sensor? (2024) • No Venue
Chung et al.
VLOGGER: Multimodal Diffusion For Embodied Avatar Synthesis (2024) • No Venue
Corona et al.
Towards A Personal Health Large Language Model (2024) • No Venue
Cosentino et al.
Large Legal Fictions: Profiling Legal Hallucinations In Large Language Models (2024) • Journal of Legal Analysis • 108 citations
Dahl et al.
Self-recognition In Language Models (2024) • No Venue
Davidson et al.
ORGANA: A Robotic Assistant For Automated Chemistry Experimentation And Characterization (2024) • Matter • 52 citations
Darvish et al.
Security And Privacy Challenges Of Large Language Models: A Survey (2024) • ACM Computing Surveys • 104 citations
Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
Sam2long: Enhancing SAM 2 For Long Video Segmentation With A Training-free Memory Tree (2024) • No Venue
Ding et al.
Vintern-1b: An Efficient Multimodal Large Language Model For Vietnamese (2024) • No Venue
Doan et al.
A Scoping Review Of Chatgpt Research In Accounting And Finance (2024) • International Journal of Accounting Information Systems • 41 citations
Mengming Michael Dong, Theophanis C. Stratopoulos, Victor Xiaoqi Wang
Git: Towards Generalist Vision Transformer Through Universal Language Interface (2024) • No Venue
Wang et al.
Autotrain: No-code Training For State-of-the-art Models (2024) • No Venue
Abhishek Thakur
Spreadsheetllm: Encoding Spreadsheets For Large Language Models (2024) • No Venue
Tian et al.
Mobile-agent: Autonomous Multi-modal Mobile Device Agent With Visual Perception (2024) • No Venue
Wang et al.
Grutopia: Dream General Robots In A City At Scale (2024) • No Venue
Wang et al.
LAVE: Llm-powered Agent Assistance And Language Augmentation For Video Editing (2024) • No Venue
Wang et al.
Litesearch: Efficacious Tree Search For LLM (2024) • No Venue
Wang et al.
MOSAIC: A Modular System For Assistive And Interactive Cooking (2024) • No Venue
Wang et al.
Utilizing Local Hierarchy With Adversarial Training For Hierarchical Text Classification (2024) • ACM Computing Surveys • 58 citations
Zihan Wang, Peiyi Wang, Houfeng Wang
Deep Learning For Cross-domain Data Fusion In Urban Computing: Taxonomy, Advances, And Outlook (2024) • Information Fusion • 53 citations
Zou et al.
Hunyuan-large: An Open-source Moe Model With 52 Billion Activated Parameters By Tencent (2024) • No Venue
Sun et al.
LAMBDA: A Large Model Based Data Agent (2024) • No Venue
Sun et al.
Weaver: Foundation Models For Creative Writing (2024) • No Venue
Wang et al.
Videollamb: Long-context Video Understanding With Recurrent Memory Bridges (2024) • No Venue
Wang et al.
Virtuwander: Enhancing Multi-modal Interaction For Virtual Tour Guidance Through Large Language Models (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
Diasynth -- Synthetic Dialogue Generation Framework (2024) • No Venue
Suresh et al.
Meta-prompting: Enhancing Language Models With Task-agnostic Scaffolding (2024) • No Venue
Mirac Suzgun, Adam Tauman Kalai
Videogamebunny: Towards Vision Assistants For Video Games (2024) • No Venue
Mohammad Reza Taesiri, Cor-Paul Bezemer
Resonance Rope: Improving Context Length Generalization Of Large Language Models (2024) • No Venue
Wang et al.
TIP-I2V: A Million-scale Real Text And Image Prompt Dataset For Image-to-video Generation (2024) • No Venue
Wenhao Wang, Yi Yang
A Framework For Human Evaluation Of Large Language Models In Healthcare Derived From Literature Review (2024) • npj Digital Medicine • 131 citations
Tam et al.
Knowledge Mechanisms In Large Language Models: A Survey And Perspective (2024) • No Venue
Wang et al.
Omnieval: An Omnidirectional And Automatic RAG Evaluation Benchmark In Financial Domain (2024) • No Venue
Wang et al.
Phased Consistency Model (2024) • No Venue
Wang et al.
Ominicontrol: Minimal And Universal Control For Diffusion Transformer (2024) • No Venue
Tan et al.
Large Language Models For Data Annotation And Synthesis: A Survey (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 78 citations
Tan et al.
Secrets Of RLHF In Large Language Models Part II: Reward Modeling (2024) • No Venue
Wang et al.
Capabilities Of Gemini Models In Medicine (2024) • No Venue
Saab et al.
Hybridrag: Integrating Knowledge Graphs And Vector Retrieval Augmented Generation For Efficient Information Extraction (2024) • Proceedings of the 5th ACM International Conference on AI in Finance • 55 citations
Sarmah et al.
Fast High-resolution Image Synthesis With Latent Adversarial Diffusion Distillation (2024) • No Venue
Sauer et al.
Prithvi Wxc: Foundation Model For Weather And Climate (2024) • No Venue
Schmude et al.
Generative Artificial Intelligence: A Systematic Review And Applications (2024) • Multimedia Tools and Applications • 189 citations
Sengar et al.
A Multimodal Automated Interpretability Agent (2024) • No Venue
Shaham et al.
Polynomial Composition Activations: Unleashing The Dynamics Of Large Language Models (2024) • No Venue
Zhuo et al.
Tag-llm: Repurposing General-purpose Llms For Specialized Domains (2024) • No Venue
Shen et al.
Mamba-in-mamba: Centralized Mamba-cross-scan In Tokenized Mamba Model For Hyperspectral Image Classification (2024) • Neurocomputing • 60 citations
Zhou et al.
VSSD: Vision Mamba With Non-causal State Space Duality (2024) • No Venue
Shi et al.
Large-scale Text-to-image Model With Inpainting Is A Zero-shot Subject-driven Image Generator (2024) • No Venue
Shin et al.
Rethinking Interpretability In The Era Of Large Language Models (2024) • No Venue
Singh et al.
Funaudiollm: Voice Understanding And Generation Foundation Models For Natural Interaction Between Humans And Llms (2024) • No Venue
Tongyi Speechteam
Best Practices And Lessons Learned On Synthetic Data For Language Models (2024) • No Venue
Liu et al.
Generative Photomontage (2024) • No Venue
Liu et al.
Lipo: Listwise Preference Optimization Through Learning-to-rank (2024) • No Venue
Liu et al.
Llms + Persona-plug = Personalized Llms (2024) • No Venue
Liu et al.
Magicquill: An Intelligent Interactive Image Editing System (2024) • No Venue
Liu et al.
POINTS1.5: Building A Vision-language Model Towards Real World Applications (2024) • No Venue
Liu et al.
Sora: A Review On Background, Technology, Limitations, And Opportunities Of Large Vision Models (2024) • No Venue
Liu et al.
Skywork-reward: Bag Of Tricks For Reward Modeling In Llms (2024) • No Venue
Liu et al.
Teach Multimodal Llms To Comprehend Electrocardiographic Images (2024) • No Venue
Liu et al.
Multimodal Healthcare AI: Identifying And Designing Clinically Relevant Vision-language Applications For Radiology (2024) • Proceedings of the CHI Conference on Human Factors in Computing Systems • 50 citations
Yildirim et al.
Generation Of Asset Administration Shell With Large Language Model Agents: Toward Semantic Interoperability In Digital Twins In The Context Of Industry 4.0 (2024) • IEEE Access • 47 citations
Xia et al.
Omg-llava: Bridging Image-level, Object-level, Pixel-level Reasoning And Understanding (2024) • No Venue
Zhang et al.
Lora Land: 310 Fine-tuned Llms That Rival GPT-4, A Technical Report (2024) • No Venue
Zhao et al.
O1-coder: An O1 Replication For Coding (2024) • No Venue
Zhang et al.
Mme-realworld: Could Your Multimodal LLM Challenge High-resolution Real-world Scenarios That Are Difficult For Humans? (2024) • No Venue
Zhang et al.
OCR Hinders RAG: Evaluating The Cascading Impact Of OCR On Retrieval-augmented Generation (2024) • No Venue
Zhang et al.
Personalization Of Large Language Models: A Survey (2024) • No Venue
Zhang et al.
Simulating Classroom Education With Llm-empowered Agents (2024) • No Venue
Zhang et al.
Sageattention2 Technical Report: Accurate 4 Bit Attention For Plug-and-play Inference Acceleration (2024) • No Venue
Zhang et al.
On Llms-driven Synthetic Data Generation, Curation, And Evaluation: A Survey (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 45 citations
Long et al.
Segment Anything Model For Medical Image Segmentation: Current Applications And Future Directions (2024) • Computers in Biology and Medicine • 179 citations
Yichi Zhang, Zhenrong Shen, Rushi Jiao
Deepseek-vl: Towards Real-world Vision-language Understanding (2024) • No Venue
Lu et al.
From GPT-4 To Gemini And Beyond: Assessing The Landscape Of Mllms On Generalizability, Trustworthiness And Causality Through Four Modalities (2024) • No Venue
Lu et al.
Omniparser For Pure Vision Based GUI Agent (2024) • No Venue
Lu et al.
Retrieval-augmented Generation For Ai-generated Content: A Survey (2024) • arXiv • 72 citations
Zhao et al.
Semievol: Semi-supervised Fine-tuning For LLM Adaptation (2024) • No Venue
Luo et al.
Robustft: Robust Supervised Fine-tuning For Large Language Models Under Noisy Response (2024) • No Venue
Luo et al.
Improve Mathematical Reasoning In Language Models By Automated Process Supervision (2024) • No Venue
Luo et al.
A Survey To Recent Progress Towards Understanding In-context Learning (2024) • Frontiers of Computer Science • 40 citations
Mao et al.
Tablebench: A Comprehensive And Complex Benchmark For Table Question Answering (2024) • No Venue
Wu et al.
Aria Everyday Activities Dataset (2024) • No Venue
Lv et al.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents (2024) • No Venue
Wu et al.
Fiva: Fine-grained Visual Attribute Dataset For Text-to-image Diffusion Models (2024) • No Venue
Wu et al.
Openmedlm: Prompt Engineering Can Out-perform Fine-tuning In Medical Question-answering With Open-source Large Language Models (2024) • Scientific Reports • 53 citations
Maharjan et al.
Towards World Simulator: Crafting Physical Commonsense-based Benchmark For Video Generation (2024) • No Venue
Meng et al.
Bimedix2: Bio-medical Expert LMM For Diverse Medical Modalities (2024) • No Venue
Mullappilly et al.
Generative AI In EU Law: Liability, Privacy, Intellectual Property, And Cybersecurity (2024) • Arxiv • 45 citations
Novelli et al.
Aurora-m: The First Open Source Multilingual Language Model Red-teamed According To The U.S. Executive Order (2024) • No Venue
Nakamura et al.
Preference Tuning With Human Feedback On Language, Speech, And Vision Tasks: A Survey (2024) • No Venue
Winata et al.
GUI Agents: A Survey (2024) • No Venue
Nguyen et al.
Swiftedit: Lightning Fast Text-guided Image Editing Via One-step Diffusion (2024) • No Venue
Nguyen et al.
DITTO: Diffusion Inference-time T-optimization For Music Generation (2024) • No Venue
Novack et al.
Integrating Large Language Models Into A Tri-modal Architecture For Automated Depression Classification (2024) • No Venue
Santosh V. Patapati
Bielik 7B V0.1: A Polish Language Model -- Development, Insights, And Evaluation (2024) • No Venue
Ociepa et al.
Relik: Retrieve And Link, Fast And Accurate Entity Linking And Relation Extraction On An Academic Budget (2024) • No Venue
Orlando et al.
IOPO: Empowering Llms With Complex Instruction Following Via Input-output Preference Optimization (2024) • No Venue
Zhang et al.
Multi-dimensional Insights: Benchmarking Real-world Personalization In Large Multimodal Models (2024) • No Venue
Zhang et al.
Survey Of Cultural Awareness In Language Models: Text And Beyond (2024) • No Venue
Pawar et al.
Llmtimesmapreduce: Simplified Long-sequence Processing Using Large Language Models (2024) • No Venue
Zhou et al.
Internlm-xcomposer2.5-omnilive: A Comprehensive Multimodal System For Long-term Streaming Video And Audio Interactions (2024) • No Venue
Zhang et al.
Personalized Visual Instruction Tuning (2024) • No Venue
Pi et al.
Movie Gen: A Cast Of Media Foundation Models (2024) • No Venue
Polyak et al.
Sambanova SN40L: Scaling The AI Memory Wall With Dataflow And Composition Of Experts (2024) • No Venue
Prabhakar et al.
Large Language Models Meet NLP: A Survey (2024) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Qin et al.
Bringing Objects To Life: 4D Generation From 3D Objects (2024) • No Venue
Rahamim et al.
Editable Scene Simulation For Autonomous Driving Via Collaborative Llm-agents (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Wei et al.
SAM 2: Segment Anything In Images And Videos (2024) • No Venue
Ravi et al.
Omniedit: Building Image Editing Generalist Models Through Specialist Supervision (2024) • No Venue
Wei et al.
An Interactive Agent Foundation Model (2024) • No Venue
Durante et al.
Llm-based Policy Generation For Intent-based Management Of Applications (2024) • 2023 19th International Conference on Network and Service Management (CNSM) • 46 citations
Dzeparoska et al.
Build-a-scene: Interactive 3D Layout Control For Diffusion-based Image Generation (2024) • No Venue
Abdelrahman Eldesokey, Peter Wonka
Mmfactory: A Universal Solution Search Engine For Vision-language Tasks (2024) • No Venue
Wan-Cyuan Fan, Tanzila Rahman, Leonid Sigal
A Survey On RAG Meeting Llms: Towards Retrieval-augmented Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 281 citations
Fan et al.
Mmbench-video: A Long-form Multi-shot Benchmark For Holistic Video Understanding (2024) • No Venue
Fang et al.
Chemllm: A Chemical Large Language Model (2024) • No Venue
Zhang et al.
Towards Fast Multilingual LLM Inference: Speculative Decoding And Specialized Drafters (2024) • No Venue
Yi et al.
Enhancing Video-language Representations With Structural Spatio-temporal Alignment (2024) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 49 citations
Fei et al.
FLUX That Plays Music (2024) • No Venue
Fei et al.
VITA: Towards Open-source Interactive Omni Multimodal LLM (2024) • No Venue
Fu et al.
Efficient LLM Scheduling By Learning To Rank (2024) • No Venue
Fu et al.
Mme-survey: A Comprehensive Survey On Evaluation Of Multimodal Llms (2024) • No Venue
Fu et al.
Large Language Models And Games: A Survey And Roadmap (2024) • IEEE Transactions on Games • 57 citations
Gallotta et al.
Similarity Is Not All You Need: Endowing Retrieval Augmented Generation With Multi Layered Thoughts (2024) • No Venue
Gan et al.
Empowering Biomedical Discovery With AI Agents (2024) • Cell • 129 citations
Gao et al.
Generative AI For Visualization: State Of The Art And Future Directions (2024) • Visual Informatics • 68 citations
Ye et al.
Differential Transformer (2024) • No Venue
Ye et al.
Minicpm-v: A GPT-4V Level MLLM On Your Phone (2024) • No Venue
Yao et al.
Kvasir-vqa: A Text-image Pair GI Tract Dataset (2024) • No Venue
Gautam et al.
Protagents: Protein Discovery Via Large Language Model Multi-agent Collaborations Combining Physics And Machine Learning (2024) • Digital Discovery • 49 citations
A. Ghafarollahi, M. J. Buehler
Patchscope: A Unifying Framework For Inspecting Hidden Representations Of Language Models (2024) • No Venue
Ghandeharioun et al.
Paecter: Patent-level Representation Learning Using Citation-informed Transformers (2024) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Ghosh et al.
Chatglm: A Family Of Large Language Models From GLM-130B To GLM-4 All Tools (2024) • No Venue
Glm et al.
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information? (2024) • No Venue
Gong et al.
Navigating The Digital World As Humans Do: Universal Visual Grounding For GUI Agents (2024) • No Venue
Gou et al.
Recent Advances In Generative AI And Large Language Models: Current Status, Challenges, And Perspectives (2024) • IEEE Transactions on Artificial Intelligence • 59 citations
Desta Haileselassie Hagos, Rick Battle, Danda B. Rawat
Flex3d: Feed-forward 3D Generation With Flexible Reconstruction Model And Input View Curation (2024) • No Venue
Han et al.
Parameter-efficient Fine-tuning For Large Models: A Comprehensive Survey (2024) • Arxiv • 81 citations
Han et al.
Seed-story: Multimodal Long Story Generation With Large Language Model (2024) • No Venue
Yang et al.
Video As The New Language For Real-world Decision Making (2024) • No Venue
Yang et al.
Exploring Chatgpt And Its Impact On Society (2024) • AI and Ethics • 44 citations
Md. Asraful Haque, Shuai Li
Vision-language Models For Medical Report Generation And Visual Question Answering: A Review (2024) • Frontiers in Artificial Intelligence • 86 citations
Iryna Hartsock, Ghulam Rasool
Mambavision: A Hybrid Mamba-transformer Vision Backbone (2024) • No Venue
Ali Hatamizadeh, Jan Kautz
Webvoyager: Building An End-to-end Web Agent With Large Multimodal Models (2024) • No Venue
He et al.
Denoising Vision Transformers (2024) • No Venue
Yang et al.
A Survey Of Recent Methods For Addressing AI Fairness And Bias In Biomedicine (2024) • Journal of Biomedical Informatics • 44 citations
Yang et al.
Qwen2 Technical Report (2024) • No Venue
Yang et al.
Cogvlm2: Visual Language Models For Image And Video Understanding (2024) • No Venue
Hong et al.
Sampart3d: Segment Any Part In 3D Objects (2024) • No Venue
Yang et al.
Evaluating And Aligning Codellms On Human Preference (2024) • No Venue
Yang et al.
Do Large Language Models Latently Perform Multi-hop Reasoning? (2024) • No Venue
Yang et al.
Minicpm: Unveiling The Potential Of Small Language Models With Scalable Training Strategies (2024) • No Venue
Hu et al.
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use (2024) • No Venue
Hu et al.
Adapting Visual-language Models For Generalizable Anomaly Detection In Medical Images (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Huang et al.
Mv-adapter: Multi-view Consistent Image Generation Made Easy (2024) • No Venue
Huang et al.
Autocoderover: Autonomous Program Improvement (2024) • Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 54 citations
Zhang et al.
Qwen2.5-coder Technical Report (2024) • No Venue
Hui et al.
Large Language Models For Uavs: Current State And Pathways To The Future (2024) • IEEE Open Journal of Vehicular Technology • 46 citations
Shumaila Javaid, Nasir Saeed, Bin He
Dsbench: How Far Are Data Science Agents To Becoming Data Science Experts? (2024) • No Venue
Jing et al.
Accessing GPT-4 Level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine With Llama-3 8B (2024) • No Venue
Zhang et al.
VARCO-VISION: Expanding Frontiers In Korean Vision-language Models (2024) • No Venue
Ju et al.
MEDIC: Towards A Comprehensive Framework For Evaluating Llms In Clinical Applications (2024) • No Venue
Kanithi et al.
Omniact: A Dataset And Benchmark For Enabling Multimodal Generalist Autonomous Agents For Desktop And Web (2024) • No Venue
Kapoor et al.
Beyondscene: Higher-resolution Human-centric Scene Generation With Pretrained Diffusion (2024) • No Venue
Kim et al.
A Survey On Integration Of Large Language Models With Intelligent Robots (2024) • Intelligent Service Robotics • 46 citations
Kim et al.
Videoicl: Confidence-based Iterative In-context Learning For Out-of-distribution Video Understanding (2024) • No Venue
Kim et al.
Careless Whisper: Speech-to-text Hallucination Harms (2024) • The 2024 ACM Conference on Fairness, Accountability, and Transparency • 47 citations
Koenecke et al.
Biomistral: A Collection Of Open-source Pretrained Large Language Models For Medical Domains (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 108 citations
Labrak et al.
An Artificial Intelligence (AI) Workflow For Catalyst Design And Optimization (2024) • Industrial & Engineering Chemistry Research • 48 citations
Lai et al.
Pllava: Parameter-free Llava Extension From Images To Videos For Video Dense Captioning (2024) • No Venue
Xu et al.
The Opportunities And Risks Of Large Language Models In Mental Health (2024) • JMIR Mental Health • 80 citations
Lawrence et al.
Closing The Gap Between Open-source And Commercial Large Language Models For Medical Evidence Summarization (2024) • npj Digital Medicine • 45 citations
Zhang et al.
LLM2LLM: Boosting Llms With Novel Iterative Data Enhancement (2024) • No Venue
Lee et al.
Streammultidiffusion: Real-time Interactive Generation With Region-based Semantic Control (2024) • No Venue
Lee et al.
Mathemyths: Leveraging Large Language Models To Teach Mathematical Language Through Child-ai Co-creative Storytelling (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 47 citations
Zhang et al.
Ootdiffusion: Outfitting Fusion Based Latent Diffusion For Controllable Virtual Try-on (2024) • No Venue
Xu et al.
Beyond A*: Better Planning With Transformers Via Search Dynamics Bootstrapping (2024) • No Venue
Lehnert et al.
Materials Science In The Era Of Large Language Models: A Perspective (2024) • Digital Discovery • 49 citations
Ge Lei, Ronan Docherty, Samuel J. Cooper
Songcreator: Lyrics-based Universal Song Generation (2024) • No Venue
Lei et al.
Llava-next-interleave: Tackling Multi-image, Video, And 3D In Large Multimodal Models (2024) • No Venue
Li et al.
Baichuan-omni Technical Report (2024) • No Venue
Li et al.
Codes: Towards Building Open-source Language Models For Text-to-sql (2024) • Proceedings of the ACM on Management of Data • 44 citations
Li et al.
From Generation To Judgment: Opportunities And Challenges Of Llm-as-a-judge (2024) • No Venue
Li et al.
Euclid: Supercharging Multimodal Llms With Synthetic High-fidelity Visual Descriptions (2024) • No Venue
Zhang et al.
TPI-LLM: Serving 70b-scale Llms Efficiently On Low-resource Edge Devices (2024) • No Venue
Li et al.
Svdquant: Absorbing Outliers By Low-rank Components For 4-bit Diffusion Models (2024) • No Venue
Li et al.
Structrag: Boosting Knowledge Intensive Reasoning Of Llms Via Inference-time Hybrid Information Structurization (2024) • No Venue
Li et al.
AI For Social Science And Social Science Of AI: A Survey (2024) • Information Processing & Management • 71 citations
Xu et al.
Chatglm-math: Improving Math Problem-solving In Large Language Models With A Self-critique Pipeline (2024) • No Venue
Xu et al.
Agenttrek: Agent Trajectory Synthesis Via Guiding Replay With Web Tutorials (2024) • No Venue
Xu et al.
CLAY: A Controllable Large-scale Generative Model For Creating High-quality 3D Assets (2024) • ACM Transactions on Graphics • 49 citations
Zhang et al.
A Comprehensive Study Of Knowledge Editing For Large Language Models (2024) • No Venue
Zhang et al.
Document Parsing Unveiled: Techniques, Challenges, And Prospects For Structured Information Extraction (2024) • No Venue
Zhang et al.
Controllable Text Generation For Large Language Models: A Survey (2024) • No Venue
Liang et al.
Foundation Models For Time Series Analysis: A Tutorial And Survey (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 100 citations
Liang et al.
GS-LRM: Large Reconstruction Model For 3D Gaussian Splatting (2024) • No Venue
Zhang et al.
Critic-v: VLM Critics Help Catch VLM Errors In Multimodal Reasoning (2024) • No Venue
Zhang et al.
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once On Gemma 2 (2024) • No Venue
Lieberum et al.
Osworld: Benchmarking Multimodal Agents For Open-ended Tasks In Real Computer Environments (2024) • No Venue
Xie et al.
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance (2024) • No Venue
Lin et al.
Open-sora Plan: Open-source Large Video Generation Model (2024) • No Venue
Lin et al.
A Natural-language-based Approach To Intelligent Data Retrieval And Representation For Cloud BIM (2024) • Computer-Aided Civil and Infrastructure Engineering • 160 citations
Lin et al.
Advancing Building Energy Modeling With Large Language Models: Exploration And Case Studies (2024) • Energy and Buildings • 42 citations
Liang Zhang, Zhelun Chen, Vitaly Ford
STIV: Scalable Text And Image Conditioned Video Generation (2024) • No Venue
Lin et al.
Travelplanner: A Benchmark For Real-world Planning With Language Agents (2024) • No Venue
Xie et al.
Open-finllms: Open Multimodal Large Language Models For Financial Applications (2024) • No Venue
Xie et al.
Generative AI: Implications And Applications For Education (2023) • Arxiv • 59 citations
Olga et al.
Large Language Models In Medicine: The Potentials And Pitfalls (2023) • Annals of Internal Medicine • 212 citations
Omiye et al.
The Segment Anything Model (SAM) For Remote Sensing Applications: From Zero To One Shot (2023) • International Journal of Applied Earth Observation and Geoinformation • 227 citations
Osco et al.
Large Language Models Can Infer Psychological Dispositions Of Social Media Users (2023) • PNAS Nexus • 45 citations
Heinrich Peters, Sandra Matz
LMDX: Language Model-based Document Information Extraction And Localization (2023) • No Venue
Perot et al.
Chatgpt Prompt Patterns For Improving Code Quality, Refactoring, Requirements Elicitation, And Software Design (2023) • Generative AI for Effective Software Development • 147 citations
White et al.
A Survey On Few-shot Class-incremental Learning (2023) • Neural Networks • 131 citations
Tian et al.
Is Chatgpt The Ultimate Programming Assistant -- How Far Is It? (2023) • Arxiv • 107 citations
Tian et al.
Kosmos-2: Grounding Multimodal Large Language Models To The World (2023) • No Venue
Peng et al.
Automatically Correcting Large Language Models: Surveying The Landscape Of Diverse Self-correction Strategies (2023) • Transactions of the Association for Computational Linguistics • 42 citations
Pan et al.
Natural Language Generation And Understanding Of Big Code For Ai-assisted Programming: A Review (2023) • Entropy • 90 citations
Wong et al.
Generative Agents: Interactive Simulacra Of Human Behavior (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 941 citations
Park et al.
Enabling Resource-efficient Aiot System With Cross-level Optimization: A Survey (2023) • IEEE Communications Surveys & Tutorials • 44 citations
Liu et al.
A Comprehensive Evaluation Of Chatgpt's Zero-shot Text-to-sql Capability (2023) • Arxiv • 58 citations
Liu et al.
Deid-gpt: Zero-shot Medical Text De-identification By GPT-4 (2023) • Arxiv • 89 citations
Liu et al.
Make LLM A Testing Expert: Bringing Human-like Interaction To Mobile GUI Testing Via Functionality-aware Decisions (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 57 citations
Liu et al.
Git-mol: A Multi-modal Large Language Model For Molecular Science With Graph, Image, And Text (2023) • Computers in Biology and Medicine • 48 citations
Liu et al.
Is Chatgpt A Good Recommender? A Preliminary Study (2023) • Arxiv • 43 citations
Liu et al.
Llava-plus: Learning To Use Tools For Creating Multimodal Agents (2023) • No Venue
Liu et al.
Summary Of Chatgpt-related Research And Perspective Towards The Future Of Large Language Models (2023) • Meta-Radiology • 582 citations
Liu et al.
3D-GPT: Procedural 3D Modeling With Large Language Models (2023) • No Venue
Sun et al.
When MOE Meets Llms: Parameter Efficient Fine-tuning For Multi-task Medical Applications (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 49 citations
Liu et al.
Wavjourney: Compositional Audio Creation With Large Language Models (2023) • No Venue
Liu et al.
Generative Ai-enabled Vehicular Networks: Fundamentals, Framework, And Case Study (2023) • IEEE Network • 55 citations
Zhang et al.
Loss Functions And Metrics In Deep Learning (2023) • Artificial Intelligence Review • 40 citations
Terven et al.
Luminate: Structured Generation And Exploration Of Design Space With Large Language Models For Human-ai Co-creation (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 79 citations
Suh et al.
Towards Autonomous System: Flexible Modular Production System Enhanced With Large Language Model Agents (2023) • 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA) • 57 citations
Xia et al.
Alpha-clip: A CLIP Model Focusing On Wherever You Want (2023) • No Venue
Sun et al.
The Rise And Potential Of Large Language Model Based Agents: A Survey (2023) • Science China Information Sciences • 183 citations
Xi et al.
Decoding Chatgpt: A Taxonomy Of Existing Research, Current Challenges, And Possible Future Directions (2023) • Journal of King Saud University - Computer and Information Sciences • 122 citations
Sohail et al.
Kosmos-2.5: A Multimodal Literate Model (2023) • No Venue
Lv et al.
Chatcad+: Towards A Universal And Reliable Interactive CAD Using Llms (2023) • IEEE Transactions on Medical Imaging • 41 citations
Zhao et al.
Evaluating The Social Impact Of Generative AI Systems In Systems And Society (2023) • Arxiv • 41 citations
Solaiman et al.
Uni-controlnet: All-in-one Control To Text-to-image Diffusion Models (2023) • Arxiv • 65 citations
Zhao et al.
Translating Radiology Reports Into Plain Language Using Chatgpt And GPT-4 With Prompt Learning: Promising Results, Limitations, And Potential (2023) • Visual Computing for Industry, Biomedicine, and Art • 264 citations
Lyu et al.
Beyond Chatbots: Explorellm For Structured Thoughts And Personalized Model Responses (2023) • No Venue
Ma et al.
A Survey On Semantic Processing Techniques (2023) • Information Fusion • 44 citations
Mao et al.
Unveiling Security, Privacy, And Ethical Concerns Of Chatgpt (2023) • Journal of Information and Intelligence • 165 citations
Xiaodong Wu, Ran Duan, Jianbing Ni
Tidybot: Personalized Robot Assistance With Large Language Models (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 65 citations
Wu et al.
GPT Has Become Financially Literate: Insights From Financial Literacy Tests Of GPT And A Preliminary Test Of How People Use It As A Source Of Advice (2023) • Finance Research Letters • 58 citations
Paweł Niszczota, Sami Abbas
Capabilities Of GPT-4 On Medical Challenge Problems (2023) • Arxiv • 474 citations
Nori et al.
Distilling Large Language Models For Matching Patients To Clinical Trials (2023) • Journal of the American Medical Informatics Association • 44 citations
Nievas et al.
Social Biases Through The Text-to-image Generation Lens (2023) • Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society • 80 citations
Ranjita Naik, Besmira Nushi
Large Language Models In Healthcare And Medical Domain: A Review (2023) • Informatics • 192 citations
Zabir Al Nazi, Wei Peng
Deepfakes, Misinformation, And Disinformation In The Era Of Frontier AI, Generative AI, And Large AI Models (2023) • 2023 International Conference on Computer and Applications (ICCA) • 69 citations
Shoaib et al.
Adaptive Ensemble Learning: Boosting Model Performance Through Intelligent Feature Fusion In Deep Neural Networks (2023) • Arxiv • 40 citations
Neelesh Mungoli
From Google Gemini To Openai Q* (q-star): A Survey Of Reshaping The Generative Artificial Intelligence (AI) Research Landscape (2023) • Arxiv • 59 citations
McIntosh et al.
Towards Geospatial Foundation Models Via Continual Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 57 citations
Mendieta et al.
Ai-generated Content (AIGC): A Survey (2023) • Arxiv • 81 citations
Wu et al.
Llm-assisted Knowledge Graph Engineering: Experiments With Chatgpt (2023) • Informatik aktuell • 40 citations
Meyer et al.
Chatgpt Or Human? Detect And Explain. Explaining Decisions Of Machine Learning Model For Detecting Short Chatgpt-generated Text (2023) • Arxiv • 81 citations
Sandra Mitrović, Davide Andreoletti, Omran Ayoub
Med-flamingo: A Multimodal Medical Few-shot Learner (2023) • No Venue
Moor et al.
Dreamix: Video Diffusion Models Are General Video Editors (2023) • Arxiv • 43 citations
Molad et al.
Verbs In Action: Improving Verb Understanding In Video-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Momeni et al.
Levels Of AGI For Operationalizing Progress On The Path To AGI (2023) • No Venue
Morris et al.
Pmc-llama: Towards Building Open-source Language Models For Medicine (2023) • Journal of the American Medical Informatics Association • 179 citations
Wu et al.
LAVIE: High-quality Video Generation With Cascaded Latent Diffusion Models (2023) • No Venue
Wang et al.
Towards Human-bot Collaborative Software Architecting With Chatgpt (2023) • Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering • 128 citations
Ahmad et al.
Interpolating Between Images With Diffusion Models (2023) • No Venue
Clinton J. Wang, Polina Golland
Emotional Intelligence Of Large Language Models (2023) • Journal of Pacific Rim Psychology • 77 citations
Wang et al.
Spellburst: A Node-based Interface For Exploratory Creative Coding With Natural Language Prompts (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 45 citations
Angert et al.
Chatgpt In Drug Discovery: A Case Study On Anti-cocaine Addiction Drug Development With Chatbots (2023) • Journal of Chemical Information and Modeling • 42 citations
Rui Wang, Hongsong Feng, Guo-Wei Wei
Decodingtrust: A Comprehensive Assessment Of Trustworthiness In GPT Models (2023) • Arxiv • 58 citations
Wang et al.
Codet5+: Open Code Large Language Models For Code Understanding And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 215 citations
Wang et al.
Large Language Models Streamline Automated Machine Learning For Clinical Studies (2023) • Nature Communications • 74 citations
Arasteh et al.
Kandinsky 3.0 Technical Report (2023) • No Venue
Arkhipkin et al.
Adaptive Shells For Efficient Neural Radiance Field Rendering (2023) • No Venue
Wang et al.
Chatcad: Interactive Computer-aided Diagnosis On Medical Image Using Large Language Models (2023) • Communications Engineering • 88 citations
Wang et al.
Foundational Models Defining A New Era In Vision: A Survey And Outlook (2023) • Arxiv • 66 citations
Awais et al.
Chatgpt: Applications, Opportunities, And Threats (2023) • 2023 Systems and Information Engineering Design Symposium (SIEDS) • 162 citations
Bahrini et al.
Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Bai et al.
Dreamdiffusion: Generating High-quality Images From Brain EEG Signals (2023) • No Venue
Bai et al.
Qwen Technical Report (2023) • No Venue
Bai et al.
A Multitask, Multilingual, Multimodal Evaluation Of Chatgpt On Reasoning, Hallucination, And Interactivity (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 466 citations
Bang et al.
Advancements In Generative AI: A Comprehensive Review Of Gans, GPT, Autoencoders, Diffusion Model, And Transformers (2023) • IEEE Access • 185 citations
Bengesi et al.
Align Your Latents: High-resolution Video Synthesis With Latent Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 438 citations
Blattmann et al.
Emergent Autonomous Scientific Research Capabilities Of Large Language Models (2023) • Arxiv • 73 citations
Daniil A. Boiko, Robert MacKnight, Gabe Gomes
GPT As Knowledge Worker: A Zero-shot Evaluation Of (AI)CPA Capabilities (2023) • SSRN Electronic Journal • 54 citations
Bommarito et al.
Medbert.de: A Comprehensive German BERT Model For The Medical Domain (2023) • Expert Systems with Applications • 44 citations
Bressem et al.
Prompt Engineering For Healthcare: Methodologies And Applications (2023) • Arxiv • 115 citations
Wang et al.
Chatunitest: A Framework For Llm-based Test Generation (2023) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 56 citations
Chen et al.
Texfusion: Synthesizing 3D Textures With Text-guided Image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 67 citations
Cao et al.
A Comprehensive Survey Of Ai-generated Content (AIGC): A History Of Generative AI From GAN To Chatgpt (2023) • Arxiv • 385 citations
Cao et al.
On The Possibilities Of Ai-generated Text Detection (2023) • Arxiv • 50 citations
Chakraborty et al.
One-for-all: Generalized Lora For Parameter-efficient Fine-tuning (2023) • No Venue
Chavan et al.
Benchmarking Large Language Models For Biomedical Natural Language Processing Applications And Recommendations (2023) • Nature Communications • 41 citations
Chen et al.
Automatic Root Cause Analysis Via Large Language Models For Cloud Incidents (2023) • EuroSys '24: Nineteenth European Conference on Computer Systems • 77 citations
Chen et al.
Large-scale Automatic Audiobook Creation (2023) • No Venue
Walsh et al.
Exploring Data Augmentation For Code Generation Tasks (2023) • ACM SIGKDD Explorations Newsletter • 118 citations
Pinzhen Chen, Gerasimos Lampouras
Gptutor: A Chatgpt-powered Programming Tool For Code Explanation (2023) • Communications in Computer and Information Science • 68 citations
Chen et al.
Llava-interactive: An All-in-one Demo For Image Chat, Segmentation, Generation And Editing (2023) • No Venue
Chen et al.
Pixart-α: Fast Training Of Diffusion Transformer For Photorealistic Text-to-image Synthesis (2023) • No Venue
Chen et al.
When Do You Need Chain-of-thought Prompting For Chatgpt? (2023) • World Wide Web • 190 citations
Chen et al.
Chatgpt Empowered Long-step Robot Control In Various Environments: A Case Application (2023) • IEEE Access • 75 citations
Wake et al.
Internvid: A Large-scale Video-text Dataset For Multimodal Understanding And Generation (2023) • No Venue
Wang et al.
"Kelly Is A Warm Person, Joseph Is A Role Model": Gender Biases In Llm-generated Reference Letters (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 76 citations
Wan et al.
Memorybank: Enhancing Large Language Models With Long-term Memory (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 77 citations
Zhong et al.
A GPT-4 Reticular Chemist For Guiding MOF Discovery (2023) • Angewandte Chemie International Edition • 117 citations
Zheng et al.
Parameter-efficient Transfer Learning For Remote Sensing Image-text Retrieval (2023) • IEEE Transactions on Geoscience and Remote Sensing • 60 citations
Yuan Yuan, Yang Zhan, Zhitong Xiong
A Survey On Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, And Recommendations (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 203 citations
Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi
Chatgpt For Robotics: Design Principles And Model Abilities (2023) • Arxiv • 90 citations
Vemprala et al.
Large Language Models For Business Process Management: Opportunities And Challenges (2023) • Lecture Notes in Business Information Processing • 75 citations
Maxim Vidgof, Stefan Bachhofner, Jan Mendling
A Survey Of Techniques For Optimizing Transformer Inference (2023) • Journal of Systems Architecture • 90 citations
Chitty-Venkata et al.
Agieval: A Human-centric Benchmark For Evaluating Foundation Models (2023) • Arxiv • 60 citations
Zhong et al.
On The Robustness Of Chatgpt: An Adversarial And Out-of-distribution Perspective (2023) • Arxiv • 90 citations
Wang et al.
Lmsys-chat-1m: A Large-scale Real-world LLM Conversation Dataset (2023) • No Venue
Zheng et al.
Musicagent: An AI Agent For Music Understanding And Generation With Large Language Models (2023) • No Venue
Yu et al.
Simple And Controllable Music Generation (2023) • No Venue
Copet et al.
Nl2spec: Interactively Translating Unstructured Natural Language To Temporal Logics With Large Language Models (2023) • Lecture Notes in Computer Science • 61 citations
Cosler et al.
A Survey On Multimodal Large Language Models For Autonomous Driving (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) • 217 citations
Cui et al.
Large Language Models For Compiler Optimization (2023) • No Venue
Cummins et al.
Chatgpt-4 Outperforms Experts And Crowd Workers In Annotating Political Twitter Messages With Zero-shot Learning (2023) • Arxiv • 152 citations
Petter Törnberg
Speechx: Neural Codec Language Model As A Versatile Speech Transformer (2023) • No Venue
Wang et al.
K2: A Foundation Language Model For Geoscience Knowledge Understanding And Utilization (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Deng et al.
LIDA: A Tool For Automatic Generation Of Grammar-agnostic Visualizations And Infographics Using Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) • 69 citations
Victor Dibia
General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Societal Implications And Responsible Governance (2023) • Information Fusion • 46 citations
Triguero et al.
Enhancing Job Recommendation Through Llm-based Generative Adversarial Networks (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 49 citations
Du et al.
Ethical Chatgpt: Concerns, Challenges, And Commandments (2023) • Electronics • 54 citations
Zhou et al.
Semantic Anomaly Detection With Large Language Models (2023) • Autonomous Robots • 63 citations
Elhafsi et al.
Decoding The Threat Landscape: Chatgpt, Fraudgpt, And Wormgpt In Social Engineering Attacks (2023) • International Journal of Scientific Research in Computer Science, Engineering and Information Technology • 43 citations
Polra Victor Falade
A Bibliometric Review Of Large Language Models Research From 2017 To 2023 (2023) • ACM Transactions on Intelligent Systems and Technology • 95 citations
Fan et al.
Large Language Models For Software Engineering: Survey And Open Problems (2023) • 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) • 244 citations
Fan et al.
Can Ai-generated Text Be Reliably Detected? (2023) • Arxiv • 144 citations
Sadasivan et al.
Generative Pre-trained Transformer: A Comprehensive Review On Enabling Technologies, Potential Applications, Emerging Challenges, And Future Directions (2023) • IEEE Access • 375 citations
Yenduri et al.
A Review Of The Trends And Challenges In Adopting Natural Language Processing Methods For Education Feedback Analysis (2023) • IEEE Access • 207 citations
Shaik et al.
Transforming Sentiment Analysis In The Financial Domain With Chatgpt (2023) • Machine Learning with Applications • 124 citations
Fatouros et al.
Fedmultimodal: A Benchmark For Multimodal Federated Learning (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Feng et al.
Let's Have A Chat! A Conversation With Chatgpt: Technology, Applications, And Limitations (2023) • Artificial Intelligence and Applications • 93 citations
Sakib Shahriar, Kadhim Hayawi
Attentionviz: A Global View Of Transformer Attention (2023) • IEEE Transactions on Visualization and Computer Graphics • 44 citations
Yeh et al.
Should Chatgpt Be Biased? Challenges And Risks Of Bias In Large Language Models (2023) • First Monday • 176 citations
Emilio Ferrara
Foundation Models In Robotics: Applications, Challenges, And The Future (2023) • The International Journal of Robotics Research • 89 citations
Firoozi et al.
Large Language Models In Education: Vision And Opportunities (2023) • 2023 IEEE International Conference on Big Data (BigData) • 75 citations
Gan et al.
On The Origin Of Llms: An Evolutionary Tree And Graph For 15,821 Large Language Models (2023) • No Venue
Sarah Gao, Andrew Kean Gao
G-llava: Solving Geometric Problem With Multi-modal Large Language Model (2023) • No Venue
Gao et al.
Assistgpt: A General Multi-modal Assistant That Can Plan, Execute, Inspect, And Learn (2023) • No Venue
Gao et al.
Funasr: A Fundamental End-to-end Speech Recognition Toolkit (2023) • INTERSPEECH 2023 • 44 citations
Gao et al.
Gemini: A Family Of Highly Capable Multimodal Models (2023) • Arxiv • 758 citations
Team et al.
Chatgpt: Vision And Challenges (2023) • Internet of Things and Cyber-Physical Systems • 204 citations
Sukhpal Singh Gill, Rupinder Kaur
Chatgpt Outperforms Crowd-workers For Text-annotation Tasks (2023) • Arxiv • 68 citations
Fabrizio Gilardi, Meysam Alizadeh, Maël Kubli
Imagebind: One Embedding Space To Bind Them All (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 552 citations
Girdhar et al.
Audiopalm: A Large Language Model That Can Speak And Listen (2023) • No Venue
Rubenstein et al.
Make-it-3d: High-fidelity 3D Creation From A Single Image With Diffusion Prior (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 149 citations
Tang et al.
PIPPA: A Partially Synthetic Conversational Dataset (2023) • No Venue
Tear Gosling, Alpin Dale, Yinhe Zheng
Chatgpt Is Not All You Need. A State Of The Art Review Of Large Generative AI Models (2023) • Arxiv • 238 citations
Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
Not What You've Signed Up For: Compromising Real-world Llm-integrated Applications With Indirect Prompt Injection (2023) • CCS '23: ACM SIGSAC Conference on Computer and Communications Security • 178 citations
Greshake et al.
Large Language Models Can Accomplish Business Process Management Tasks (2023) • Lecture Notes in Business Information Processing • 57 citations
Grohs et al.
Text With Knowledge Graph Augmented Transformer For Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Gu et al.
A Survey On Large Language Model (LLM) Security And Privacy: The Good, The Bad, And The Ugly (2023) • High-Confidence Computing • 594 citations
Yao et al.
Hallucinations In Large Multilingual Translation Models (2023) • Transactions of the Association for Computational Linguistics • 77 citations
Guerreiro et al.
Enhancing Dyadic Relations With Homogeneous Graphs For Multimodal Recommendation (2023) • Frontiers in Artificial Intelligence and Applications • 45 citations
Zhou et al.
Can Generative Pre-trained Transformers (GPT) Pass Assessments In Higher Education Programming Courses? (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 83 citations
Savelka et al.
A Complete Survey On Generative AI (AIGC): Is Chatgpt From GPT-4 To GPT-5 All You Need? (2023) • Arxiv • 101 citations
Zhang et al.
Regulating Chatgpt And Other Large Generative AI Models (2023) • 2023 ACM Conference on Fairness Accountability and Transparency • 353 citations
Philipp Hacker, Andreas Engel, Marco Mauer
Using Sequences Of Life-events To Predict Human Lives (2023) • Nature Computational Science • 55 citations
Savcisens et al.
Medalpaca -- An Open-source Collection Of Medical Conversational AI Models And Training Data (2023) • Arxiv • 102 citations
Han et al.
Svdiff: Compact Parameter Space For Diffusion Fine-tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Han et al.
A Comprehensive Survey On Segment Anything Model For Vision And Beyond (2023) • Arxiv • 45 citations
Zhang et al.
Leveraging Large Language Models For Sequential Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 88 citations
Harte et al.
The Political Ideology Of Conversational AI: Converging Evidence On Chatgpt's Pro-environmental, Left-libertarian Orientation (2023) • SSRN Electronic Journal • 85 citations
Jochen Hartmann, Jasper Schwenzow, Maximilian Witte
Fastervit: Fast Vision Transformers With Hierarchical Attention (2023) • No Venue
Hatamizadeh et al.
GPT Models In Construction Industry: Opportunities, Limitations, And A Use Case Validation (2023) • Developments in the Built Environment • 81 citations
Saka et al.
Sketch-a-shape: Zero-shot Sketch-to-3d Shape Generation (2023) • No Venue
Sanghi et al.
A Survey On Uncertainty Quantification Methods For Deep Learning (2023) • Arxiv • 50 citations
He et al.
On The Challenges And Perspectives Of Foundation Models For Medical Image Analysis (2023) • Medical Image Analysis • 165 citations
Shaoting Zhang, Dimitris Metaxas
Biomedclip: A Multimodal Biomedical Foundation Model Pretrained From Fifteen Million Scientific Image-text Pairs (2023) • Arxiv • 87 citations
Zhang et al.
Biomedgpt: A Generalist Vision-language Foundation Model For Diverse Biomedical Tasks (2023) • Arxiv • 49 citations
Zhang et al.
Appagent: Multimodal Agents As Smartphone Users (2023) • No Venue
Zhang et al.
Foundation Models And Fair Use (2023) • SSRN Electronic Journal • 64 citations
Henderson et al.
Dreamface: Progressive Generation Of Animatable 3D Faces Under Text Guidance (2023) • ACM Transactions on Graphics • 54 citations
Zhang et al.
Graspgpt: Leveraging Semantic Knowledge From A Large Language Model For Task-oriented Grasping (2023) • IEEE Robotics and Automation Letters • 67 citations
Tang et al.
Personality Traits In Large Language Models (2023) • No Venue
Safdari et al.
Tool Documentation Enables Zero-shot Tool-usage With Large Language Models (2023) • No Venue
Hsieh et al.
BLIVA: A Simple Multimodal LLM For Better Handling Of Text-rich Visual Questions (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Hu et al.
Improving User Controlled Table-to-text Generation Robustness (2023) • Journal of the American Medical Informatics Association • 187 citations
Hu et al.
Zero-shot Information Extraction From Radiological Reports Using Chatgpt (2023) • International Journal of Medical Informatics • 67 citations
Hu et al.
FABRIC: Personalizing Diffusion Models With Iterative Feedback (2023) • No Venue
Rütte et al.
Harnessing The Power Of Llms In Practice: A Survey On Chatgpt And Beyond (2023) • ACM Transactions on Knowledge Discovery from Data • 303 citations
Yang et al.
MM-REACT: Prompting Chatgpt For Multimodal Reasoning And Action (2023) • Arxiv • 78 citations
Yang et al.
Lorahub: Efficient Cross-task Generalization Via Dynamic Lora Composition (2023) • No Venue
Huang et al.
Chatgpt For Shaping The Future Of Dentistry: The Potential Of Multi-modal Large Language Model (2023) • International Journal of Oral Science • 221 citations
Huang et al.
Large Language Models Cannot Self-correct Reasoning Yet (2023) • No Venue
Huang et al.
Make-an-audio: Text-to-audio Generation With Prompt-enhanced Diffusion Models (2023) • Arxiv • 46 citations
Huang et al.
A Survey On Automated Program Repair Techniques (2023) • Artificial Intelligence Review • 62 citations
Huang et al.
Neuroprompts: An Adaptive Framework To Optimize Prompts For Text-to-image Generation (2023) • No Venue
Shachar Rosenman, Vasudev Lal, Phillip Howard
Genassist: Making Image Generation Accessible (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 46 citations
Mina Huh, Yi-Hao Peng, Amy Pavel
Med-halt: Medical Domain Hallucination Test For Large Language Models (2023) • Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) • 54 citations
Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu
Flatformer: Flattened Window Attention For Efficient Point Cloud Transformer (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 83 citations
Liu et al.
Facilitating Self-guided Mental Health Interventions Through Human-language Model Interaction: A Case Study Of Cognitive Restructuring (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Sharma et al.
Universalner: Targeted Distillation From Large Language Models For Open Named Entity Recognition (2023) • No Venue
Zhou et al.
Testing The Reliability Of Chatgpt For Text Annotation And Classification: A Cautionary Remark (2023) • Arxiv • 80 citations
Michael V. Reiss
Give Us The Facts: Enhancing Large Language Models With Knowledge Graphs For Fact-aware Language Modeling (2023) • IEEE Transactions on Knowledge and Data Engineering • 96 citations
Yang et al.
Large Language Models As Optimizers (2023) • No Venue
Yang et al.
Implicit Neural Representation For Cooperative Low-light Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 144 citations
Yang et al.
The Dawn Of Lmms: Preliminary Explorations With Gpt-4v(ision) (2023) • Arxiv • 160 citations
Yang et al.
Exploring The Limits Of Chatgpt For Query Or Aspect-based Text Summarization (2023) • Arxiv • 89 citations
Yang et al.
Fingpt: Open-source Financial Large Language Models (2023) • SSRN Electronic Journal • 143 citations
Hongyang Yang, Xiao-Yang Liu, Christina Dan Wang
The Programmer's Assistant: Conversational Interaction With A Large Language Model For Software Development (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 177 citations
Ross et al.
Fusecap: Leveraging Large Language Models For Enriched Fused Image Captions (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) • 47 citations
Rotstein et al.
Magicapture: High-resolution Multi-concept Portrait Customization (2023) • No Venue
Junha Hyung, Jaeyo Shin, Jaegul Choo
Starvector: Generating Scalable Vector Graphics Code From Images (2023) • No Venue
Rodriguez et al.
A Comprehensive Survey On Applications Of Transformers For Deep Learning Tasks (2023) • Expert Systems with Applications • 259 citations
Islam et al.
14 Examples Of How Llms Can Transform Materials Science And Chemistry: A Reflection On A Large Language Model Hackathon (2023) • Digital Discovery • 141 citations
Jablonka et al.
A Study On The Implementation Of Generative AI Services Using An Enterprise Data-based LLM Application Architecture (2023) • Advances in Artificial Intelligence and Machine Learning • 53 citations
Cheonsu Jeong
Llmlingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jiang et al.
Meta-transformer: A Unified Framework For Multimodal Learning (2023) • No Venue
Zhang et al.
Code Llama: Open Foundation Models For Code (2023) • Arxiv • 377 citations
Rozière et al.
Matching Patients To Clinical Trials With Large Language Models (2023) • Nature Communications • 111 citations
Jin et al.
Large Models For Time Series And Spatio-temporal Data: A Survey And Outlook (2023) • IEEE Transactions on Knowledge and Data Engineering • 54 citations
Jin et al.
Huatuogpt, Towards Taming Language Model To Be A Doctor (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 128 citations
Zhang et al.
Make-a-character: High Quality Text-to-3d Character Generation Within Minutes (2023) • No Venue
Ren et al.
Videodirectorgpt: Consistent Multi-scene Video Generation Via Llm-guided Planning (2023) • No Venue
Lin et al.
Large Language Models For Education: Grading Open-ended Questions Using Chatgpt (2023) • SBES 2023: XXXVII Brazilian Symposium on Software Engineering • 46 citations
Pinto et al.
Deepmatcher: A Deep Transformer-based Network For Robust And Accurate Local Feature Matching (2023) • Expert Systems with Applications • 51 citations
Xie et al.
Boxdiff: Text-to-image Synthesis With Training-free Box-constrained Diffusion (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 101 citations
Xie et al.
AI Transparency In The Age Of Llms: A Human-centered Research Roadmap (2023) • Harvard Data Science Review • 75 citations
Q. Vera Liao, Jennifer Wortman Vaughan
GPT-4 Enhanced Multimodal Grounding For Autonomous Driving: Leveraging Cross-modal Attention With Large Language Models (2023) • Communications in Transportation Research • 57 citations
Liao et al.
Investigating The Use Of Chatgpt For The Scheduling Of Construction Projects (2023) • Buildings • 174 citations
Samuel A. Prieto, Eyob T. Mengiste, Borja García de Soto
Performance Of Chatgpt On The US Fundamentals Of Engineering Exam: Comprehensive Assessment Of Proficiency And Potential Implications For Professional Environmental Engineering Practice (2023) • Computers and Education: Artificial Intelligence • 91 citations
Vinay Pursnani, Yusuf Sermet, Ibrahim Demir
CCT5: A Code-change-oriented Pre-trained Model (2023) • Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 42 citations
Lin et al.
AWQ: Activation-aware Weight Quantization For LLM Compression And Acceleration (2023) • GetMobile: Mobile Computing and Communications • 64 citations
Lin et al.
MM-VID: Advancing Video Understanding With Gpt-4v(ision) (2023) • No Venue
Lin et al.
Cancergpt: Few-shot Drug Pair Synergy Prediction Using Large Pre-trained Language Models (2023) • npj Digital Medicine • 72 citations
Li et al.
BLIP-2: Bootstrapping Language-image Pre-training With Frozen Image Encoders And Large Language Models (2023) • Arxiv • 65 citations
Li et al.
Autonomous GIS: The Next-generation Ai-powered GIS (2023) • International Journal of Digital Earth • 99 citations
Zhenlong Li, Huan Ning
Chatdoctor: A Medical Chat Model Fine-tuned On A Large Language Model Meta-ai (llama) Using Medical Domain Knowledge (2023) • Cureus • 256 citations
Li et al.
FLM-101B: An Open LLM And How To Train It With $100K Budget (2023) • No Venue
Li et al.
Large Multimodal Models: Notes On CVPR 2023 Tutorial (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 157 citations
Chunyuan Li
Modelscope-agent: Building Your Customizable Agent System With Open-source Large Language Models (2023) • No Venue
Li et al.
Multi-step Jailbreaking Privacy Attacks On Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 117 citations
Li et al.
Photomaker: Customizing Realistic Human Photos Via Stacked ID Embedding (2023) • No Venue
Li et al.
Large AI Models In Health Informatics: Applications, Challenges, And The Future (2023) • IEEE Journal of Biomedical and Health Informatics • 159 citations
Qiu et al.
Videogen: A Reference-guided Latent Diffusion Approach For High Definition Text-to-video Generation (2023) • No Venue
Li et al.
Towards Tracing Code Provenance With Code Watermarking (2023) • IEEE Internet of Things Journal • 62 citations
Li et al.
Videochat: Chat-centric Video Understanding (2023) • Arxiv • 90 citations
Li et al.
Vision-language Models In Remote Sensing: Current Progress And Future Trends (2023) • IEEE Geoscience and Remote Sensing Magazine • 80 citations
Li et al.
Ufogen: You Forward Once Large Scale Text-to-image Generation Via Diffusion Gans (2023) • No Venue
Xu et al.
Mental-llm: Leveraging Large Language Models For Mental Health Prediction Via Online Text Data (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 119 citations
Xu et al.
Drivegpt4: Interpretable End-to-end Autonomous Driving Via Large Language Model (2023) • IEEE Robotics and Automation Letters • 202 citations
Xu et al.
Demystifying CLIP Data (2023) • No Venue
Xu et al.
Autodroid: Llm-powered Task Automation In Android (2023) • ACM MobiCom '24: 30th Annual International Conference on Mobile Computing and Networking • 40 citations
Wen et al.
Augmenting Low-resource Text Classification With Graph-grounded Pre-training And Prompting (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Zhihao Wen, Yuan Fang
Challenges And Applications Of Large Language Models (2023) • No Venue
Kaddour et al.
Magicbrush: A Manually Annotated Dataset For Instruction-guided Image Editing (2023) • No Venue
Zhang et al.
LARP: Language-agent Role Play For Open-world Games (2023) • No Venue
Yan et al.
Scaling Up Gans For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 313 citations
Kang et al.
Evaluating GPT-4 And Chatgpt On Japanese Medical Licensing Examinations (2023) • Arxiv • 50 citations
Kasai et al.
Chatgpt For Programming Numerical Methods (2023) • Journal of Machine Learning for Modeling and Computing • 81 citations
Ali Kashefi, Tapan Mukerji
X-adapter: Adding Universal Compatibility Of Plugins For Upgraded Diffusion Model (2023) • No Venue
Ran et al.
Exploring The Potential Of Large Language Models To Generate Formative Programming Feedback (2023) • 2023 IEEE Frontiers in Education Conference (FIE) • 48 citations
Natalie Kiesler, Dominic Lohr, Hieke Keuning
Mindfuldiary: Harnessing Large Language Model To Support Psychiatric Patients' Journaling (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 66 citations
Kim et al.
Collaborative Score Distillation For Consistent Visual Synthesis (2023) • No Venue
Kim et al.
Evallm: Interactive Evaluation Of Large Language Model Prompts On User-defined Criteria (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Kim et al.
Prospect: Prompt Spectrum For Attribute-aware Personalization Of Diffusion Models (2023) • ACM Transactions on Graphics • 76 citations
Zhang et al.
A Survey Of Learning-based Automated Program Repair (2023) • ACM Transactions on Software Engineering and Methodology • 77 citations
Zhang et al.
Nemo Guardrails: A Toolkit For Controllable And Safe LLM Applications With Programmable Rails (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 53 citations
Rebedea et al.
A Transformer-based Representation-learning Model With Unified Processing Of Multimodal Input For Clinical Diagnostics (2023) • Nature Biomedical Engineering • 223 citations
Zhou et al.
Sasha: Creative Goal-oriented Reasoning In Smart Homes With Large Language Models (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 44 citations
King et al.
The Potential And Pitfalls Of Using A Large Language Model Such As Chatgpt Or GPT-4 As A Clinical Assistant (2023) • Journal of the American Medical Informatics Association • 45 citations
Zhang et al.
Freestyle Layout-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Xue et al.
Vision Language Models In Autonomous Driving: A Survey And Outlook (2023) • IEEE Transactions on Intelligent Vehicles • 46 citations
Zhou et al.
Personalize Segment Anything Model With One Shot (2023) • Arxiv • 65 citations
Zhang et al.
Motiongpt: Finetuned Llms Are General-purpose Motion Generators (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 50 citations
Zhang et al.
Text-visual Prompting For Efficient 2D Temporal Video Grounding (2023) • Arxiv • 73 citations
Zhang et al.
Chatgpt Beyond English: Towards A Comprehensive Evaluation Of Large Language Models In Multilingual Learning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 91 citations
Lai et al.
Large Language Models In Law: A Survey (2023) • AI Open • 53 citations
Lai et al.
Trafficgpt: Viewing, Processing And Interacting With Traffic Foundation Models (2023) • Transport Policy • 87 citations
Zhang et al.
Unifying The Perspectives Of NLP And Software Engineering: A Survey On Language Models For Code (2023) • No Venue
Zhang et al.
Evaluation Of Chatgpt For Nlp-based Mental Health Applications (2023) • Arxiv • 54 citations
Bishal Lamichhane
Redefining Qualitative Analysis In The AI Era: Utilizing Chatgpt For Efficient Thematic Analysis (2023) • Arxiv • 58 citations
Zhang et al.
Chatgpt And Other Large Language Models As Evolutionary Engines For Online Interactive Collaborative Game Design (2023) • GECCO '23: Genetic and Evolutionary Computation Conference • 42 citations
Pier Luca Lanzi, Daniele Loiacono
Voicebox: Text-guided Multilingual Universal Speech Generation At Scale (2023) • Arxiv • 44 citations
Le et al.
Applying Large Language Models And Chain-of-thought For Automatic Scoring (2023) • Computers and Education: Artificial Intelligence • 77 citations
Lee et al.
Hierspeech++: Bridging The Gap Between Semantic And Acoustic Representation Of Speech By Hierarchical Variational Inference For Zero-shot Speech Synthesis (2023) • No Venue
Lee et al.
VISAR: A Human-ai Argumentative Writing Assistant With Visual Programming And Rapid Draft Prototyping (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 67 citations
Zhang et al.
Opportunities And Challenges For Chatgpt And Large Language Models In Biomedicine And Health (2023) • Briefings in Bioinformatics • 233 citations
Tian et al.
Reasoning With Language Model Prompting: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 71 citations
Qiao et al.
Clip-event: Connecting Text And Images With Event Structures (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Li et al.
Dit: Self-supervised Pre-training For Document Image Transformer (2022) • MM '22: The 30th ACM International Conference on Multimedia • 118 citations
Li et al.
Cross-modal Clinical Graph Transformer For Ophthalmic Report Generation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Transformers In Time Series: A Survey (2022) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 557 citations
Wen et al.
Diffusion Models: A Comprehensive Survey Of Methods And Applications (2022) • Arxiv • 147 citations
Yang et al.
Multi-behavior Hypergraph-enhanced Transformer For Sequential Recommendation (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 130 citations
Yang et al.
Robots Enact Malignant Stereotypes (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 43 citations
Hundt et al.
Artificial Intelligence For The Metaverse: A Survey (2022) • Engineering Applications of Artificial Intelligence • 601 citations
Huynh-The et al.
Deep Unsupervised Domain Adaptation: A Review Of Recent Advances And Perspectives (2022) • APSIPA Transactions on Signal and Information Processing • 225 citations
Liu et al.
Opal: Multimodal Image Generation For News Illustration (2022) • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology • 87 citations
Vivian Liu, Han Qiao, Lydia Chilton
Clip-mesh: Generating Textured Meshes From Text Using Pretrained Image-text Models (2022) • SIGGRAPH Asia 2022 Conference Papers • 162 citations
Khalid et al.
A Systematic Evaluation Of Large Language Models Of Code (2022) • MAPS '22: 6th ACM SIGPLAN International Symposium on Machine Programming • 362 citations
Xu et al.
COLD Decoding: Energy-based Constrained Text Generation With Langevin Dynamics (2022) • Arxiv • 43 citations
Qin et al.
Transformers In Time-series Analysis: A Tutorial (2022) • Circuits, Systems, and Signal Processing • 192 citations
Ahmed et al.
Text And Code Embeddings By Contrastive Pre-training (2022) • Arxiv • 146 citations
Neelakantan et al.
MTEB: Massive Text Embedding Benchmark (2022) • Arxiv • 57 citations
Muennighoff et al.
Out Of One, Many: Using Language Models To Simulate Human Samples (2022) • Political Analysis • 346 citations
Argyle et al.
Fairness-aware Adversarial Perturbation Towards Bias Mitigation For Deployed Deep Models (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Wang et al.
A Survey On Generative Diffusion Model (2022) • Arxiv • 66 citations
Cao et al.
Socratic Models: Composing Zero-shot Multimodal Reasoning With Language (2022) • Arxiv • 168 citations
Zeng et al.
Regression Transformer: Concurrent Sequence Regression And Generation For Molecular Language Modeling (2022) • Nature Machine Intelligence • 97 citations
Jannis Born, Matteo Manica
Audiolm: A Language Modeling Approach To Audio Generation (2022) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 252 citations
Borsos et al.
Tweetnlp: Cutting-edge Natural Language Processing For Social Media (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 95 citations
Camacho-Collados et al.
Aspanformer: Detector-free Image Matching With Adaptive Span Transformer (2022) • Lecture Notes in Computer Science • 157 citations
Chen et al.
Make-a-video: Text-to-video Generation Without Text-video Data (2022) • Arxiv • 308 citations
Singer et al.
Promptchainer: Chaining Large Language Model Prompts Through Visual Programming (2022) • CHI '22: CHI Conference on Human Factors in Computing Systems • 124 citations
Wu et al.
Large Language Models Meet Nl2code: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Zan et al.
Pretrained Domain-specific Language Model For General Information Retrieval Tasks In The AEC Domain (2022) • Computers in Industry • 82 citations
Zheng et al.
Learning From Flowsheets: A Generative Transformer Model For Autocompletion Of Flowsheets (2022) • Computers & Chemical Engineering • 42 citations
Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann
Integrated Multimodal Artificial Intelligence Framework For Healthcare Applications (2022) • npj Digital Medicine • 202 citations
Soenksen et al.
M-SENA: An Integrated Platform For Multimodal Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 58 citations
Mao et al.
Using Pre-trained Models To Boost Code Review Automation (2022) • Proceedings of the 44th International Conference on Software Engineering • 131 citations
Tufano et al.
Biogpt: Generative Pre-trained Transformer For Biomedical Text Generation And Mining (2022) • Briefings in Bioinformatics • 700 citations
Luo et al.
A Survey On In-context Learning (2022) • Arxiv • 240 citations
Dong et al.
AI And 6G Into The Metaverse: Fundamentals, Challenges And Future Research Trends (2022) • IEEE Open Journal of the Communications Society • 122 citations
Zawish et al.
Investigating Explainability Of Generative AI For Code Through Scenario-based Design (2022) • 27th International Conference on Intelligent User Interfaces • 176 citations
Sun et al.
A Taxonomy Of Prompt Modifiers For Text-to-image Generation (2022) • Behaviour & Information Technology • 128 citations
Jonas Oppenlaender
Ai-driven Development Is Here: Should You Worry? (2022) • IEEE Software • 55 citations
Neil Ernst, Gabriele Bavota
Deep Learning-aided 6G Wireless Networks: A Comprehensive Survey Of Revolutionary PHY Architectures (2022) • IEEE Open Journal of the Communications Society • 96 citations
Ozpoyraz et al.
Abinet++: Autonomous, Bidirectional And Iterative Language Modeling For Scene Text Spotting (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 57 citations
Fang et al.
Transformers In Medical Imaging: A Survey (2022) • Medical Image Analysis • 773 citations
Shamshad et al.
A Comprehensive Survey Of Few-shot Learning: Evolution, Applications, Challenges, And Opportunities (2022) • ACM Computing Surveys • 448 citations
Song et al.
An Image Is Worth One Word: Personalizing Text-to-image Generation Using Textual Inversion (2022) • Arxiv • 460 citations
Gal et al.
Are Deepfakes Concerning? Analyzing Conversations Of Deepfakes On Reddit And Exploring Societal Implications (2022) • CHI Conference on Human Factors in Computing Systems • 61 citations
Gamage et al.
How To Keep Text Private? A Systematic Review Of Deep Learning Methods For Privacy-preserving Natural Language Processing (2022) • Artificial Intelligence Review • 57 citations
Samuel Sousa, Roman Kern
Language Model Classifier Aligns Better With Physician Word Sensitivity Than Xgboost On Readmission Prediction (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Yang et al.
Lampost: Design And Evaluation Of An Ai-assisted Email Writing Prototype For Adults With Dyslexia (2022) • Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility • 55 citations
Goodman et al.
Webshop: Towards Scalable Real-world Web Interaction With Grounded Language Agents (2022) • Arxiv • 46 citations
Yao et al.
Motiondiffuse: Text-driven Human Motion Generation With Diffusion Model (2022) • Arxiv • 109 citations
Zhang et al.
Artificial Intelligence In Government: Concepts, Standards, And A Unified Framework (2022) • Government Information Quarterly • 48 citations
Straub et al.
Coditt5: Pretraining For Source Code And Natural Language Editing (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 68 citations
Zhang et al.
Speciesist Bias In AI -- How AI Applications Perpetuate Discrimination And Unfair Outcomes Against Animals (2022) • AI and Ethics • 62 citations
Hagendorff et al.
A Survey Of Controllable Text Generation Using Transformer-based Pre-trained Language Models (2022) • ACM Computing Surveys • 153 citations
Zhang et al.
Tailor Versatile Multi-modal Learning For Multi-label Emotion Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhang et al.
Twhin-bert: A Socially-enriched Pre-trained Language Model For Multilingual Tweet Representations At Twitter (2022) • KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 43 citations
Zhang et al.
Automatic Generation Of Programming Exercises And Code Explanations Using Large Language Models (2022) • ICER 2022: ACM Conference on International Computing Education Research • 329 citations
Sarsa et al.
"I Think This Is The Most Disruptive Technology": Exploring Sentiments Of Chatgpt Early Adopters Using Twitter Data (2022) • Arxiv • 204 citations
Haque et al.
Anyface: Free-style Text-to-face Synthesis And Manipulation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Sun et al.
A Contrastive Framework For Neural Text Generation (2022) • Arxiv • 83 citations
Su et al.
Prompt-to-prompt Image Editing With Cross Attention Control (2022) • Arxiv • 360 citations
Hertz et al.
A Primer On Contrastive Pretraining In Language Processing: Methods, Lessons Learned And Perspectives (2021) • ACM Computing Surveys • 55 citations
Nils Rethmeier, Isabelle Augenstein
Lightningdot: Pre-training Visual-semantic Embeddings For Real-time Image-text Retrieval (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 79 citations
Sun et al.
Generating Syntactically Controlled Paraphrases Without Using Annotated Parallel Pairs (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 52 citations
Kuan-Hao Huang, Kai-Wei Chang
Graph-enhanced Multi-task Learning Of Multi-level Transition Dynamics For Session-based Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 95 citations
Huang et al.
M6: A Chinese Multimodal Pretrainer (2021) • Arxiv • 48 citations
Lin et al.
TVT: Transferable Vision Transformer For Unsupervised Domain Adaptation (2021) • 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) • 113 citations
Yang et al.
Prompt Programming For Large Language Models: Beyond The Few-shot Paradigm (2021) • CHI '21: CHI Conference on Human Factors in Computing Systems • 517 citations
Laria Reynolds, Kyle McDonell
MOMENTA: A Multimodal Framework For Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 100 citations
Pramanick et al.
High-resolution Image Synthesis With Latent Diffusion Models (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 10513 citations
Rombach et al.
Layoutparser: A Unified Toolkit For Deep Learning Based Document Image Analysis (2021) • Lecture Notes in Computer Science • 98 citations
Shen et al.
Accelerating Recommendation System Training By Leveraging Popular Choices (2021) • Proceedings of the VLDB Endowment • 47 citations
Adnan et al.
Variational Transformer Networks For Layout Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Diego Martin Arroyo, Janis Postels, Federico Tombari
AI Chains: Transparent And Controllable Human-ai Interaction By Chaining Large Language Model Prompts (2021) • CHI '22: CHI Conference on Human Factors in Computing Systems • 326 citations
Tongshuang Wu, Michael Terry, Carrie J. Cai
Geometry Attention Transformer With Position-aware Lstms For Image Captioning (2021) • Expert Systems with Applications • 59 citations
Chi Wang, Yulin Shen, Luping Ji
Syntax-bert: Improving Pre-trained Transformers With Syntax Trees (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 40 citations
Bai et al.
Polyjuice: Generating Counterfactuals For Explaining, Evaluating, And Improving Models (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 60 citations
Wu et al.
Pre-training For Low Resource Speech-to-intent Applications (2021) • ACM Computing Surveys • 105 citations
Pu Wang, Hugo van Hamme
PAN++: Towards Efficient And Accurate End-to-end Spotting Of Arbitrarily-shaped Text (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 123 citations
Wang et al.
Screen2words: Automatic Mobile UI Summarization With Multimodal Learning (2021) • The 34th Annual ACM Symposium on User Interface Software and Technology • 78 citations
Wang et al.
Pseudo-relevance Feedback For Multiple Representation Dense Retrieval (2021) • Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval • 47 citations
Wang et al.
Meta-stylespeech: Multi-speaker Adaptive Text-to-speech Generation (2021) • Arxiv • 45 citations
Min et al.
Data Augmentation In Natural Language Processing: A Novel Text Generation Approach For Long And Short Text Classifiers (2021) • International Journal of Machine Learning and Cybernetics • 127 citations
Bayer et al.
Documentation Matters: Human-centered AI System To Assist Data Science Code Documentation In Computational Notebooks (2021) • ACM Transactions on Computer-Human Interaction • 73 citations
Wang et al.
Pimnet: A Parallel, Iterative And Mimicking Network For Scene Text Recognition (2021) • MM '21: ACM Multimedia Conference • 69 citations
Qiao et al.
Text2gestures: A Transformer-based Network For Generating Emotive Body Gestures For Virtual Agents (2021) • 2021 IEEE Virtual Reality and 3D User Interfaces (VR) • 109 citations
Bhattacharya et al.
Musicbert: Symbolic Music Understanding With Large-scale Pre-training (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 79 citations
Zeng et al.
On The Opportunities And Risks Of Foundation Models (2021) • Arxiv • 2055 citations
Bommasani et al.
Wav2clip: Learning Robust Audio Representations From CLIP (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
Wu et al.
Paragraph-level Rationale Extraction Through Regularization: A Case Study On European Court Of Human Rights Cases (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 62 citations
Chalkidis et al.
An Exploration Of Self-supervised Pretrained Representations For End-to-end Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 44 citations
Chang et al.
Zero-shot Cross-lingual Transfer Of Neural Machine Translation With Multilingual Pretrained Encoders (2021) • Lecture Notes in Computer Science • 51 citations
Chen et al.
Topic Modelling Meets Deep Neural Networks: A Survey (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 108 citations
Zhao et al.
Kaleido-bert: Vision-language Pre-training On Fashion Domain (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Zhuge et al.
Improving Biomedical Pretrained Language Models With Knowledge (2021) • Proceedings of the 20th Workshop on Biomedical Language Processing • 40 citations
Yuan et al.
Bartscore: Evaluating Generated Text As Text Generation (2021) • Arxiv • 318 citations
Weizhe Yuan, Graham Neubig, Pengfei Liu
Fairfil: Contrastive Neural Debiasing Method For Pretrained Text Encoders (2021) • Arxiv • 50 citations
Cheng et al.
Backdoor Pre-trained Models Can Transfer To All (2021) • Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security • 70 citations
Shen et al.
Data Augmentation Approaches In Natural Language Processing: A Survey (2021) • AI Open • 290 citations
Bohan Li, Yutai Hou, Wanxiang Che
Contextual Transformer Networks For Visual Recognition (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 534 citations
Li et al.
Deepxml: A Deep Extreme Multi-label Learning Framework Applied To Short Text Documents (2021) • Proceedings of the 14th ACM International Conference on Web Search and Data Mining • 60 citations
Dahiya et al.
Fine-tuning Large Neural Language Models For Biomedical Natural Language Processing (2021) • Patterns • 100 citations
Tinn et al.
BOLD: Dataset And Metrics For Measuring Biases In Open-ended Language Generation (2021) • FAccT '21: 2021 ACM Conference on Fairness, Accountability, and Transparency • 125 citations
Dhamala et al.
What Changes Can Large-scale Language Models Bring? Intensive Study On Hyperclova: Billions-scale Korean Generative Pretrained Transformers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Kim et al.
One Chatbot Per Person: Creating Personalized Chatbots Based On Implicit User Profiles (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 55 citations
Ma et al.
Word Alignment By Fine-tuning Embeddings On Parallel Corpora (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 118 citations
Zi-Yi Dou, Graham Neubig
A Survey Of Data Augmentation Approaches For NLP (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 127 citations
Feng et al.
Plan-then-generate: Controlled Data-to-text Generation Via Planning (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 57 citations
Su et al.
Image Captioning For Effective Use Of Language Models In Knowledge-based Visual Question Answering (2021) • Expert Systems with Applications • 48 citations
Salaberria et al.
Matscibert: A Materials Domain Language Model For Text Mining And Information Extraction (2021) • npj Computational Materials • 200 citations
Gupta et al.
Audioclip: Extending CLIP To Image, Text And Audio (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 210 citations
Guzhov et al.
Learning Shared Semantic Space For Speech-to-text Translation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 57 citations
Han et al.
Societal Biases In Language Generation: Progress And Challenges (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 102 citations
Sheng et al.
Improving Sequence-to-sequence Pre-training Via Sequence Span Rewriting (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 201 citations
Zhou et al.
SATAR: A Self-supervised Approach To Twitter Account Representation Learning And Its Application In Bot Detection (2021) • CIKM '21: The 30th ACM International Conference on Information and Knowledge Management • 60 citations
Feng et al.
Multi-task Pre-training For Plug-and-play Task-oriented Dialogue System (2021) • Arxiv • 57 citations
Su et al.
Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies (2021) • Behavior Research Methods • 65 citations
Flemotomos et al.
Open Aspect Target Sentiment Classification With Natural Language Prompts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Seoh et al.
Utnet: A Hybrid Transformer Architecture For Medical Image Segmentation (2021) • Lecture Notes in Computer Science • 356 citations
Yunhe Gao, Mu Zhou, Dimitris Metaxas
Recommending Metamodel Concepts During Modeling Activities With Pre-trained Language Models (2021) • Software and Systems Modeling • 58 citations
Martin Weyssow, Houari Sahraoui, Eugene Syriani
Climatebert: A Pretrained Language Model For Climate-related Text (2021) • SSRN Electronic Journal • 59 citations
Webersinke et al.
Transformers In Vision: A Survey (2021) • ACM Computing Surveys • 2262 citations
Khan et al.
Deep Transfer Learning & Beyond: Transformer Language Models In Information Systems Research (2021) • ACM Computing Surveys • 42 citations
Ross Gruetzemacher, David Paradice
A Recurrent Vision-and-language BERT For Navigation (2020) • Arxiv • 40 citations
Hong et al.
A Survey On Visual Transformer (2020) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 3049 citations
Han et al.
Point Transformer (2020) • IEEE Access • 241 citations
Zhao et al.
Towards Controllable Biases In Language Generation (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 84 citations
Sheng et al.
GREEK-BERT: The Greeks Visiting Sesame Street (2020) • 11th Hellenic Conference on Artificial Intelligence • 79 citations
Koutsikakis et al.
Informer: Beyond Efficient Transformer For Long Sequence Time-series Forecasting (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 3884 citations
Zhou et al.
Tweepfake: About Detecting Deepfake Tweets (2020) • PLOS ONE • 153 citations
Fagni et al.
Codebert: A Pre-trained Model For Programming And Natural Languages (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 2031 citations
Feng et al.
HOLMES: Health Online Model Ensemble Serving For Deep Learning Models In Intensive Care Units (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 72 citations
Hong et al.
Human-centric Spatio-temporal Video Grounding With Visual Transformers (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 75 citations
Tang et al.
Measuring And Reducing Gendered Correlations In Pre-trained Models (2020) • Arxiv • 106 citations
Webster et al.
Context-aware Attentive Knowledge Tracing (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 432 citations
Aritra Ghosh, Neil Heffernan, Andrew S. Lan
Opiniondigest: A Simple Framework For Opinion Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 62 citations
Suhara et al.
Adversarial Training For Aspect-based Sentiment Analysis With BERT (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 41 citations
Akbar Karimi, Leonardo Rossi, Andrea Prati
CLUE: A Chinese Language Understanding Evaluation Benchmark (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 235 citations
Xu et al.
Fashionbert: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 114 citations
Gao et al.
Paraphrase Augmented Task-oriented Dialog Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 77 citations
Gao et al.
Confronting Abusive Language Online: A Survey From The Ethical And Human Rights Perspective (2020) • Journal of Artificial Intelligence Research • 53 citations
Svetlana Kiritchenko, Isar Nejadgholi, Kathleen C. Fraser
Deebert: Dynamic Early Exiting For Accelerating BERT Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 265 citations
Xin et al.
Can Adversarial Weight Perturbations Inject Neural Backdoors? (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 51 citations
Garg et al.
Empowering Things With Intelligence: A Survey Of The Progress, Challenges, And Opportunities In Artificial Intelligence Of Things (2020) • IEEE Internet of Things Journal • 571 citations
Jing Zhang, Dacheng Tao
TOD-BERT: Pre-trained Natural Language Understanding For Task-oriented Dialogue (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 203 citations
Wu et al.
Uncertainty-aware Self-training For Text Classification With Few Labels (2020) • Arxiv • 41 citations
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder For Long-form Document Matching (2020) • CIKM '20: The 29th ACM International Conference on Information and Knowledge Management • 53 citations
Yang et al.
Phobert: Pre-trained Language Models For Vietnamese (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 255 citations
Dat Quoc Nguyen, Anh Tuan Nguyen
Bertweet: A Pre-trained Language Model For English Tweets (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 432 citations
Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
Survey On Deep Multi-modal Data Analytics: Collaboration, Rivalry And Fusion (2020) • ACM Transactions on Multimedia Computing, Communications, and Applications • 99 citations
Yang Wang
Biomegatron: Larger Biomedical Domain Language Model (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 69 citations
Shin et al.
Heterogeneous Graph Neural Networks For Extractive Document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 270 citations
Wang et al.
Deep Entity Matching With Pre-trained Language Models (2020) • Proceedings of the VLDB Endowment • 246 citations
Li et al.
Learning Context-aware Task Reasoning For Efficient Meta-reinforcement Learning (2020) • Proceedings of the ACM on Programming Languages • 60 citations
Haozhe Wang, Jiale Zhou, Xuming He
Artificial Intelligence In The Battle Against Coronavirus (COVID-19): A Survey And Future Research Directions (2020) • Arxiv • 165 citations
Nguyen et al.
Learning Agile Robotic Locomotion Skills By Imitating Animals (2020) • Arxiv • 41 citations
Peng et al.
Towards Automatic Face-to-face Translation (2020) • Proceedings of the 27th ACM International Conference on Multimedia • 121 citations
R et al.
Snippext: Semi-supervised Opinion Mining With Augmented Data (2020) • Proceedings of The Web Conference 2020 • 50 citations
Miao et al.
KGPT: Knowledge-grounded Pre-training For Data-to-text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 111 citations
Chen et al.
Fine-grained Visual Textual Alignment For Cross-modal Retrieval Using Transformer Encoders (2020) • ACM Transactions on Multimedia Computing, Communications, and Applications • 118 citations
Messina et al.
Few-shot Natural Language Generation For Task-oriented Dialog (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 157 citations
Peng et al.
Continual Learning For Natural Language Generation In Task-oriented Dialog Systems (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 53 citations
Mi et al.
ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning For Session-based Recommendation (2020) • Fourteenth ACM Conference on Recommender Systems • 49 citations
Fei Mi, Xiaoyu Lin, Boi Faltings
Deep Learning For Source Code Modeling And Generation: Models, Applications And Challenges (2020) • ACM Computing Surveys • 108 citations
Triet H. M. Le, Hao Chen, M. Ali Babar
Robustscanner: Dynamically Enhancing Positional Clues For Robust Text Recognition (2020) • Lecture Notes in Computer Science • 172 citations
Yue et al.
Unsupervised Translation Of Programming Languages (2020) • Arxiv • 62 citations
Lachaux et al.
Query Resolution For Conversational Search With Limited Supervision (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 106 citations
Voskarides et al.
MM-COVID: A Multilingual And Multimodal Data Repository For Combating COVID-19 Disinformation (2020) • Arxiv • 51 citations
Li et al.
SPECTER: Document-level Representation Learning Using Citation-informed Transformers (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Cohan et al.
Improving Aspect-level Sentiment Analysis With Aspect Extraction (2020) • Neural Computing and Applications • 42 citations
Majumder et al.
Pymt5: Multi-mode Translation Of Natural Language And Python Code With Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 64 citations
Clement et al.
XTREME: A Massively Multilingual Multi-task Benchmark For Evaluating Cross-lingual Generalization (2020) • Arxiv • 299 citations
Hu et al.
Hiertrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism In Mobile-edge-cloud Computing (2020) • IEEE Open Journal of the Communications Society • 65 citations
Liu et al.
Cost-effective Selection Of Pretraining Data: A Case Study Of Pretraining BERT On Social Media (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 46 citations
Dai et al.
Attentional Feature Fusion (2020) • 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) • 990 citations
Dai et al.
A Survey Of Knowledge-enhanced Text Generation (2020) • ACM Computing Surveys • 220 citations
Yu et al.
Underspecification Presents Challenges For Credibility In Modern Machine Learning (2020) • Arxiv • 428 citations
D'Amour et al.
Inductive Entity Representations From Text Via Link Prediction (2020) • Proceedings of the Web Conference 2021 • 70 citations
Daniel Daza, Michael Cochez, Paul Groth
Generating Accurate Assert Statements For Unit Test Cases Using Pretrained Transformers (2020) • AST '22: IEEE/ACM 3rd International Conference on Automation of Software Test • 66 citations
Tufano et al.
Mixup-transformer: Dynamic Data Augmentation For NLP Tasks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 72 citations
Sun et al.
Personality Trait Detection Using Bagged SVM Over BERT Word Embedding Ensembles (2020) • Proceedings of the Fourth Widening Natural Language Processing Workshop (2020) • 52 citations
Kazameini et al.
DMD: A Large-scale Multi-modal Driver Monitoring Dataset For Attention And Alertness Analysis (2020) • Lecture Notes in Computer Science • 90 citations
Ortega et al.
Med7: A Transferable Clinical Natural Language Processing Model For Electronic Health Records (2020) • Arxiv • 143 citations
Kormilitzin et al.
Gector -- Grammatical Error Correction: Tag, Not Rewrite (2020) • Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications • 103 citations
Omelianchuk et al.
Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) • Arxiv • 47 citations
Dodds et al.
Enabling Language Models To Fill In The Blanks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 41 citations
Chris Donahue, Mina Lee, Percy Liang
Low-resource Deep Entity Resolution With Transfer And Active Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 120 citations
Kasai et al.
Discrete Flows: Invertible Generative Models Of Discrete Data (2019) • Arxiv • 40 citations
Tran et al.
Neural Metric Learning For Fast End-to-end Relation Extraction (2019) • Arxiv • 40 citations
Tung Tran, Ramakanth Kavuluru
TWEETQA: A Social Media Focused Question Answering Dataset (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 64 citations
Xiong et al.
Domain Adaptive Dialog Generation Via Meta Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 134 citations
Kun Qian, Zhou Yu
Large-batch Training For LSTM And Beyond (2019) • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis • 74 citations
You et al.
Emotion-cause Pair Extraction: A New Task To Emotion Analysis In Texts (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 249 citations
Rui Xia, Zixiang Ding
K-BERT: Enabling Language Representation With Knowledge Graph (2019) • Arxiv • 84 citations
Liu et al.
Controllable Sentence Simplification (2019) • Arxiv • 52 citations
Martin et al.
A Generalized Framework Of Sequence Generation With Application To Undirected Sequence Models (2019) • Arxiv • 46 citations
Mansimov et al.
Mixture Content Selection For Diverse Sequence Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi
On Learning Meaningful Code Changes Via Neural Machine Translation (2019) • 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) • 202 citations
Tufano et al.
Nemo: A Toolkit For Building AI Applications Using Neural Modules (2019) • Arxiv • 175 citations
Kuchaiev et al.
Meta-learning With Dynamic-memory-based Prototypical Network For Few-shot Event Detection (2019) • Proceedings of the 13th International Conference on Web Search and Data Mining • 69 citations
Deng et al.
Expressing Visual Relationships Via Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Tan et al.
A Backdoor Attack Against Lstm-based Text Classification Systems (2019) • IEEE Access • 268 citations
Jiazhu Dai, Chuanshuai Chen
Generative Teaching Networks: Accelerating Neural Architecture Search By Learning To Generate Synthetic Training Data (2019) • Arxiv • 48 citations
Such et al.
Learning To Generate Questions By Learning What Not To Generate (2019) • WWW '19: The Web Conference • 48 citations
Liu et al.
Tripping Through Time: Efficient Localization Of Activities In Videos (2019) • Arxiv • 41 citations
Hahn et al.
Text Readability Assessment For Second Language Learners (2019) • Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications • 138 citations
Menglin Xia, Ekaterina Kochmar, Ted Briscoe
Cross-lingual Alignment Of Contextual Word Embeddings, With Applications To Zero-shot Dependency Parsing (2019) • Proceedings of the 2019 Conference of the North • 41 citations
Schuster et al.
Generating Sentiment-preserving Fake Online Reviews Using Neural Language Models And Their Human- And Machine-based Detection (2019) • Advances in Intelligent Systems and Computing • 51 citations
Adelani et al.
Modeling Noisiness To Recognize Named Entities Using Multitask Neural Networks On Social Media (2019) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 50 citations
Aguilar et al.
Eleatt-rnn: Adding Attentiveness To Neurons In Recurrent Neural Networks (2019) • IEEE Transactions on Image Processing • 103 citations
Zhang et al.
Simple And Effective Text Matching With Richer Alignment Features (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 171 citations
Yang et al.
Robust Sequence-to-sequence Acoustic Modeling With Stepwise Monotonic Attention For Neural TTS (2019) • Interspeech 2019 • 79 citations
Mutian He, Yan Deng, Lei He
Jointly Measuring Diversity And Quality In Text Generation Models (2019) • Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation • 42 citations
Ehsan Montahaei, Danial Alihosseini, Mahdieh Soleymani Baghshah
Revisiting Joint Modeling Of Cross-document Entity And Event Coreference Resolution (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 104 citations
Barhom et al.
Evolutionary Neural Automl For Deep Learning (2019) • Proceedings of the Genetic and Evolutionary Computation Conference • 106 citations
Liang et al.
Target-guided Open-domain Conversation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 109 citations
Tang et al.
A Comparative Study On Transformer Vs RNN In Speech Applications (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 448 citations
Karita et al.
Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019) • Interspeech 2019 • 155 citations
Kannan et al.
Multi-style Generative Reading Comprehension (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 76 citations
Nishida et al.
Defending Against Neural Fake News (2019) • Arxiv • 89 citations
Zellers et al.
Analyzing Information Leakage Of Updates To Natural Language Models (2019) • CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security • 45 citations
Zanella-Béguelin et al.
Doc2edag: An End-to-end Document-level Framework For Chinese Financial Event Extraction (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 170 citations
Zheng et al.
Integrating Multimodal Information In Large Pretrained Transformers (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 421 citations
Rahman et al.
Shape Robust Text Detection With Progressive Scale Expansion Network (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 798 citations
Wang et al.
Sentence Embedding Alignment For Lifelong Relation Extraction (2019) • Proceedings of the 2019 Conference of the North • 110 citations
Wang et al.
Adaptive Embedding Gate For Attention-based Scene Text Recognition (2019) • Neurocomputing • 41 citations
Chen et al.
Complementary Fusion Of Multi-features And Multi-modalities In Sentiment Analysis (2019) • Arxiv • 53 citations
Chen et al.
Pre-training Of Graph Augmented Transformers For Medication Recommendation (2019) • Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) • 168 citations
Shang et al.
Training Neural Response Selection For Task-oriented Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 83 citations
Henderson et al.
Layoutlm: Pre-training Of Text And Layout For Document Image Understanding (2019) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 478 citations
Xu et al.
Abstract Meaning Representation For Multi-document Summarization (2018) • Arxiv • 78 citations
Kexin Liao, Logan Lebanoff, Fei Liu
Learning A Text-video Embedding From Incomplete And Heterogeneous Data (2018) • Arxiv • 174 citations
Antoine Miech, Ivan Laptev, Josef Sivic
Word2vec Applied To Recommendation: Hyperparameters Matter (2018) • Arxiv • 44 citations
Hugo Caselles-Dupré, Florian Lesaint, Jimena Royo-Letelier
Tf-ranking: Scalable Tensorflow Library For Learning-to-rank (2018) • KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 82 citations
Pasumarthi et al.
Adversarial Text Generation Via Feature-mover's Distance (2018) • Arxiv • 75 citations
Chen et al.
Polisis: Automated Analysis And Presentation Of Privacy Policies Using Deep Learning (2018) • Arxiv • 174 citations
Harkous et al.
Grow And Prune Compact, Fast, And Accurate Lstms (2018) • IEEE Transactions on Computers • 67 citations
Xiaoliang Dai, Hongxu Yin, Niraj K. Jha
Video Description: A Survey Of Methods, Datasets And Evaluation Metrics (2018) • ACM Computing Surveys • 138 citations
Aafaq et al.
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale (2018) • Arxiv • 201 citations
Du et al.
Interpretation Of Natural Language Rules In Conversational Machine Reading (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 141 citations
Saeidi et al.
Tell, Draw, And Repeat: Generating And Modifying Images Based On Continual Linguistic Instruction (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
El-Nouby et al.
Polite Dialogue Generation Without Parallel Data (2018) • Transactions of the Association for Computational Linguistics • 166 citations
Tong Niu, Mohit Bansal
Generation Of Synthetic Electronic Medical Record Text (2018) • 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) • 41 citations
Guan et al.
Code2seq: Generating Sequences From Structured Representations Of Code (2018) • Arxiv • 62 citations
Alon et al.
Natural Language Processing For Ehr-based Computational Phenotyping (2018) • IEEE/ACM Transactions on Computational Biology and Bioinformatics • 198 citations
Zeng et al.
Decoupled Novel Object Captioner (2018) • Proceedings of the 26th ACM international conference on Multimedia • 70 citations
Wu et al.
Alternating Multi-bit Quantization For Recurrent Neural Networks (2018) • Arxiv • 60 citations
Xu et al.
Densely Connected Bidirectional LSTM With Applications To Sentence Classification (2018) • Lecture Notes in Computer Science • 64 citations
Ding et al.
Model Cards For Model Reporting (2018) • Proceedings of the Conference on Fairness, Accountability, and Transparency • 1289 citations
Mitchell et al.
Tensor Comprehensions: Framework-agnostic High-performance Machine Learning Abstractions (2018) • Arxiv • 251 citations
Vasilache et al.
SHAPED: Shared-private Encoder-decoder For Text Style Adaptation (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 51 citations
Ye Zhang, Nan Ding, Radu Soricut
Textbugger: Generating Adversarial Text Against Real-world Applications (2018) • Proceedings 2019 Network and Distributed System Security Symposium • 275 citations
Li et al.
Deep-fsmn For Large Vocabulary Continuous Speech Recognition (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 120 citations
Zhang et al.
Dpp-net: Device-aware Progressive Search For Pareto-optimal Neural Architectures (2018) • Lecture Notes in Computer Science • 187 citations
Dong et al.
Semantic Image Synthesis Via Adversarial Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 264 citations
Dong et al.
What Do We Need To Build Explainable AI Systems For The Medical Domain? (2017) • Arxiv • 626 citations
Holzinger et al.
Survey Of The State Of The Art In Natural Language Generation: Core Tasks, Applications And Evaluation (2017) • Journal of Artificial Intelligence Research • 347 citations
Albert Gatt, Emiel Krahmer
Long Text Generation Via Adversarial Training With Leaked Information (2017) • Arxiv • 161 citations
Guo et al.
Deep Learning Based Recommender System: A Survey And New Perspectives (2017) • Arxiv • 655 citations
Zhang et al.
Detecting Oriented Text In Natural Images By Linking Segments (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 800 citations
Baoguang Shi, Xiang Bai, Serge Belongie
A Knowledge-grounded Neural Conversation Model (2017) • Proceedings of the AAAI Conference on Artificial Intelligence • 234 citations
Ghazvininejad et al.
Neural Networks For Text Correction And Completion In Keyboard Decoding (2017) • Arxiv • 56 citations
Shaona Ghosh, Per Ola Kristensson
Knowledge Adaptation: Teaching To Adapt (2017) • Arxiv • 41 citations
Sebastian Ruder, Parsa Ghaffari, John G. Breslin
Unsupervised Learning Of Sentence Embeddings Using Compositional N-gram Features (2017) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 688 citations
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
Fast-slow Recurrent Neural Networks (2017) • Arxiv • 41 citations
Asier Mujika, Florian Meier, Angelika Steger
Semeval-2017 Task 1: Semantic Textual Similarity - Multilingual And Cross-lingual Focused Evaluation (2017) • Arxiv • 306 citations
Cer et al.
Steering Output Style And Topic In Neural Response Generation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 66 citations
Wang et al.
Learning Distributed Representations Of Sentences From Unlabelled Data (2016) • Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 117 citations
Felix Hill, Kyunghyun Cho, Anna Korhonen
Neural Text Generation From Structured Data With Application To The Biography Domain (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 63 citations
Remi Lebret, David Grangier, Michael Auli
An Empirical Evaluation Of Doc2vec With Practical Insights Into Document Embedding Generation (2016) • Proceedings of the 1st Workshop on Representation Learning for NLP • 111 citations
Jey Han Lau, Timothy Baldwin
Language As A Latent Variable: Discrete Generative Models For Sentence Compression (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 186 citations
Yishu Miao, Phil Blunsom
Multiresolution Recurrent Neural Networks: An Application To Dialogue Response Generation (2016) • Arxiv • 73 citations
Serban et al.
Interpretable Semantic Textual Similarity: Finding And Explaining Differences Between Sentences (2016) • Knowledge-Based Systems • 45 citations
Lopez-Gazpio et al.
Learning End-to-end Goal-oriented Dialog (2016) • Arxiv • 75 citations
Antoine Bordes, Y-Lan Boureau, Jason Weston
Emoji2vec: Learning Emoji Representations From Their Description (2016) • Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media • 246 citations
Eisner et al.
Stackgan: Text To Photo-realistic Image Synthesis With Stacked Generative Adversarial Networks (2016) • Arxiv • 227 citations
Zhang et al.
One-shot Generalization In Deep Generative Models (2016) • Arxiv • 75 citations
Rezende et al.
Neuro-symbolic Program Synthesis (2016) • Arxiv • 105 citations
Parisotto et al.
Generative Deep Neural Networks For Dialogue: A Short Review (2016) • Arxiv • 67 citations
Serban et al.
Ask The GRU: Multi-task Learning For Deep Text Recommendations (2016) • RecSys '16: Tenth ACM Conference on Recommender Systems • 192 citations
Trapit Bansal, David Belanger, Andrew McCallum
Topicrnn: A Recurrent Neural Network With Long-range Semantic Dependency (2016) • Arxiv • 129 citations
Dieng et al.
RNN Approaches To Text Normalization: A Challenge (2016) • Arxiv • 55 citations
Richard Sproat, Navdeep Jaitly
Towards Sub-word Level Compositions For Sentiment Analysis Of Hindi-english Code Mixed Text (2016) • Arxiv • 128 citations
Prabhu et al.
Diverse Beam Search: Decoding Diverse Solutions From Neural Sequence Models (2016) • Arxiv • 67 citations
Vijayakumar et al.
Personalized Speech Recognition On Mobile Devices (2016) • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
McGraw et al.
Hierarchical Neural Language Models For Joint Representation Of Streaming Documents And Their Content (2016) • WWW '15: 24th International World Wide Web Conference • 46 citations
Djuric et al.
Contextual LSTM (CLSTM) Models For Large Scale NLP Tasks (2016) • Arxiv • 190 citations
Ghosh et al.

ASRU 32 papers #

Scaling Analysis Of Interleaved Speech-text Language Models (2025) • No Venue
Maimon et al.
Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) • No Venue
Bu et al.
On Decoder-only Architecture For Speech-to-text And Large Language Model Integration (2023) • 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 52 citations
Wu et al.
The Chime-7 DASR Challenge: Distant Meeting Transcription With Multiple Devices In Diverse Scenarios (2023) • 7th International Workshop on Speech Processing in Everyday Environments (CHiME 2023) • 45 citations
Cornell et al.
Mslam: Massively Multilingual Joint Pre-training For Speech And Text (2022) • Arxiv • 59 citations
Bapna et al.
Audiolm: A Language Modeling Approach To Audio Generation (2022) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 252 citations
Borsos et al.
Branchformer: Parallel Mlp-attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022) • Arxiv • 40 citations
Peng et al.
Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 68 citations
Maxime Burchi, Valentin Vielzeuf
Speechstew: Simply Mix All Available Speech Recognition Data To Train One Large Neural Network (2021) • Arxiv • 75 citations
Chan et al.
An Exploration Of Self-supervised Pretrained Representations For End-to-end Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 44 citations
Chang et al.
Adapting GPT, GPT-2 And BERT Language Models For Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 56 citations
Xianrui Zheng, Chao Zhang, Philip C. Woodland
W2v-bert: Combining Contrastive Learning And Masked Language Modeling For Self-supervised Speech Pre-training (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 204 citations
Chung et al.
Scaling End-to-end Models For Large-scale Multilingual ASR (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 46 citations
Li et al.
Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020) • Arxiv • 46 citations
Zhang et al.
A Density Ratio Approach To Language Model Fusion In End-to-end Automatic Speech Recognition (2020) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 64 citations
Erik McDermott, Hasim Sak, Ehsan Variani
Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 46 citations
Inaguma et al.
Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019) • Arxiv • 58 citations
Ren et al.
Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 174 citations
Li et al.
Topic-aware Pointer-generator Networks For Summarizing Spoken Conversations (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 87 citations
Liu et al.
A Comparative Study On Transformer Vs RNN In Speech Applications (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 448 citations
Karita et al.
Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019) • Interspeech 2019 • 155 citations
Kannan et al.
End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019) • Arxiv • 165 citations
Synnaeve et al.
Multilingual End-to-end Speech Translation (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 78 citations
Inaguma et al.
From Senones To Chenones: Tied Context-dependent Graphemes For Hybrid Speech Recognition (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 72 citations
Le et al.
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 66 citations
Wang et al.
Hierarchical Transformers For Long Document Classification (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Pappagari et al.
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale (2018) • Arxiv • 201 citations
Du et al.
Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With Rnn-transducer (2018) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
Exploring Neural Transducers For End-to-end Speech Recognition (2017) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 233 citations
Battenberg et al.
Direct Acoustics-to-word Models For English Conversational Speech Recognition (2017) • Interspeech 2017 • 114 citations
Audhkhasi et al.
Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 89 citations
He et al.
An Analysis Of Incorporating An External Language Model Into A Sequence-to-sequence Model (2017) • ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 138 citations
Kannan et al.

— C —

CIKM 19 papers #

Hallucination Detection: Robustly Discerning Reliable Answers In Large Language Models (2024) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 59 citations
Chen et al.
Clip-count: Towards Text-guided Zero-shot Object Counting (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 50 citations
Ruixiang Jiang, Lingbo Liu, Changwen Chen
Xuanyuan 2.0: A Large Chinese Financial Chat Model With Hundreds Of Billions Parameters (2023) • Proceedings of the 32nd ACM International Conference on Information and Knowledge Management • 57 citations
Xuanyu Zhang, Qing Yang, Dongliang Xu
Adamct: Adaptive Mixture Of Cnn-transformer For Sequential Recommendation (2022) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 43 citations
Jiang et al.
Corpusbrain: Pre-train A Generative Retrieval Model For Knowledge-intensive Language Tasks (2022) • Proceedings of the 31st ACM International Conference on Information & Knowledge Management • 52 citations
Chen et al.
Contrastive Learning With Bidirectional Transformers For Sequential Recommendation (2022) • CIKM '22: The 31st ACM International Conference on Information and Knowledge Management • 44 citations
Du et al.
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 52 citations
Zhang et al.
Counterfactual Reasoning For Out-of-distribution Multimodal Sentiment Analysis (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 50 citations
Sun et al.
Complex Temporal Question Answering On Knowledge Graphs (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 64 citations
Jia et al.
Integrating Pattern- And Fact-based Fake News Detection Via Model Preference Learning (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 51 citations
Sheng et al.
Enhancing Knowledge Tracing Via Adversarial Training (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 77 citations
Guo et al.
SATAR: A Self-supervised Approach To Twitter Account Representation Learning And Its Application In Bot Detection (2021) • CIKM '21: The 30th ACM International Conference on Information and Knowledge Management • 60 citations
Feng et al.
Lightweight Self-attentive Sequential Recommendation (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 104 citations
Li et al.
Speaker-aware BERT For Multi-turn Response Selection In Retrieval-based Chatbots (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 148 citations
Gu et al.
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder For Long-form Document Matching (2020) • CIKM '20: The 29th ACM International Conference on Information and Knowledge Management • 53 citations
Yang et al.
Enriching Pre-trained Language Model With Entity Information For Relation Classification (2019) • CIKM '19: The 28th ACM International Conference on Information and Knowledge Management • 272 citations
Shanchan Wu, Yifan He
How Does BERT Answer Questions? A Layer-wise Analysis Of Transformer Representations (2019) • CIKM '19: The 28th ACM International Conference on Information and Knowledge Management • 63 citations
Aken et al.
Attentive History Selection For Conversational Question Answering (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 44 citations
Qu et al.
A Hybrid Retrieval-generation Neural Conversation Model (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 65 citations
Yang et al.

COLING 3 papers #

Automatically Identifying Words That Can Serve As Labels For Few-shot Text Classification (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 148 citations
Timo Schick, Helmut Schmid, Hinrich Schütze
On Adversarial Examples For Character-level Neural Machine Translation (2018) • COLING 2018 • 157 citations
Javid Ebrahimi, Daniel Lowd, Dejing Dou
Modeling Multi-turn Conversation With Deep Utterance Aggregation (2018) • COLING 2018 pages 3740-3752 • 93 citations
Zhang et al.

Conversational Agents 469 papers #

Enhancing Human-like Responses In Large Language Models (2025) • No Venue
Ethem Yağız Çalık, Talha Rüzgar Akkuş
Scaling Latent Reasoning Via Looped Language Models (2025) • No Venue
Zhu et al.
Interactiveomni: A Unified Omni-modal Model For Audio-visual Multi-turn Dialogue (2025) • No Venue
Tong et al.
Metamind: Modeling Human Social Thoughts With Metacognitive Multi-agent Systems (2025) • No Venue
Zhang et al.
Sentient Agent As A Judge: Evaluating Higher-order Social Cognition In Large Language Models (2025) • No Venue
Zhang et al.
Livemcp-101: Stress Testing And Diagnosing Mcp-enabled Agents On Challenging Queries (2025) • No Venue
Yin et al.
API Agents Vs. GUI Agents: Divergence And Convergence (2025) • No Venue
Zhang et al.
Agent Learning Via Early Experience (2025) • No Venue
Zhang et al.
The Alignment Waltz: Jointly Training Agents To Collaborate For Safety (2025) • No Venue
Zhang et al.
O-mem: Omni Memory System For Personalized, Long Horizon, Self-evolving Agents (2025) • No Venue
Wang et al.
Roboomni: Proactive Robot Manipulation In Omni-modal Context (2025) • No Venue
Wang et al.
Voiceassistant-eval: Benchmarking AI Assistants Across Listening, Speaking, And Viewing (2025) • No Venue
Wang et al.
Mocha: Towards Movie-grade Talking Character Synthesis (2025) • No Venue
Wei et al.
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization (2025) • No Venue
Wu et al.
Qwen2.5-omni Technical Report (2025) • No Venue
Xu et al.
Qwen3-omni Technical Report (2025) • No Venue
Xu et al.
Step-audio-editx Technical Report (2025) • No Venue
Yan et al.
Too Good To Be Bad: On The Failure Of Llms To Role-play Villains (2025) • No Venue
Yi et al.
Black-box On-policy Distillation Of Large Language Models (2025) • No Venue
Ye et al.
Agentfold: Long-horizon Web Agents With Proactive Context Management (2025) • No Venue
Ye et al.
Game-time: Evaluating Temporal Dynamics In Spoken Language Models (2025) • No Venue
Chang et al.
Minmo: A Multimodal Large Language Model For Seamless Voice Interaction (2025) • No Venue
Chen et al.
Persona Vectors: Monitoring And Controlling Character Traits In Language Models (2025) • No Venue
Chen et al.
Mem0: Building Production-ready AI Agents With Scalable Long-term Memory (2025) • No Venue
Chhikara et al.
STITCH: Simultaneous Thinking And Talking With Chunked Reasoning For Spoken Language Models (2025) • No Venue
Chiang et al.
SHANKS: Simultaneous Hearing And Thinking For Spoken Language Models (2025) • No Venue
Chiang et al.
Interactcomp: Evaluating Search Agents With Ambiguous Queries (2025) • No Venue
Deng et al.
Llama-omni2: Llm-based Real-time Spoken Chatbot With Autoregressive Streaming Speech Synthesis (2025) • No Venue
Fang et al.
Robix: A Unified Model For Robot Interaction, Reasoning And Planning (2025) • No Venue
Fang et al.
Reactive Transformer (rxt) -- Stateful Real-time Processing For Event-driven Reactive Language Models (2025) • No Venue
Adam Filipek
VITA-1.5: Towards Gpt-4o Level Real-time Vision And Speech Interaction (2025) • No Venue
Fu et al.
Fasta*: Fast-slow Toolpath Agent With Subroutine Mining For Efficient Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Dynaguard: A Dynamic Guardrail Model With User-defined Policies (2025) • No Venue
Hoover et al.
Adaptive Multi-agent Response Refinement In Conversational Systems (2025) • No Venue
Jeong et al.
SDPO: Segment-level Direct Preference Optimization For Social Agents (2025) • No Venue
Kong et al.
Embodied Agents Meet Personalization: Exploring Memory Utilization For Personalized Assistance (2025) • No Venue
Kwon et al.
Infiguiagent: A Multimodal Generalist GUI Agent With Native Reasoning And Reflection (2025) • No Venue
Liu et al.
Pc-agent: A Hierarchical Multi-agent Collaboration Framework For Complex Task Automation On PC (2025) • No Venue
Liu et al.
Taking Notes Brings Focus? Towards Multi-turn Multimodal Dialogue Learning (2025) • No Venue
Liu et al.
Voxtral (2025) • No Venue
Liu et al.
Agentrewardbench: Evaluating Automatic Evaluations Of Web Agent Trajectories (2025) • No Venue
Lù et al.
C3: A Bilingual Benchmark For Spoken Dialogue Models Exploring Challenges In Complex Conversations (2025) • No Venue
Chengqian Ma, Wei Tao, Yiwen Guo
Paper2agent: Reimagining Research Papers As Interactive And Reliable AI Agents (2025) • No Venue
Miao et al.
Effective Red-teaming Of Policy-adherent Agents (2025) • No Venue
Nakash et al.
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities (2025) • No Venue
Qi et al.
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction (2025) • No Venue
Qian et al.
Userbench: An Interactive Gym Environment For User-centric Agents (2025) • No Venue
Qian et al.
Nile-chat: Egyptian Language Models For Arabic And Latin Scripts (2025) • No Venue
Shang et al.
Voila: Voice-language Foundation Models For Real-time Autonomous Interaction And Voice Role-play (2025) • No Venue
Shi et al.
Llmvox: Autoregressive Streaming Text-to-speech Model For Any LLM (2025) • No Venue
Shikhar et al.
The Leaderboard Illusion (2025) • No Venue
Singh et al.
Personafeedback: A Large-scale Human-annotated Benchmark For Personalization (2025) • No Venue
Tao et al.
Towards Conversational Diagnostic AI (2024) • No Venue
Tu et al.
Meltemi: The First Open Large Language Model For Greek (2024) • No Venue
Voukoutis et al.
Fusechat: Knowledge Fusion Of Chat Models (2024) • No Venue
Wan et al.
Agent Workflow Memory (2024) • No Venue
Wang et al.
Gpt-4o System Card (2024) • No Venue
OpenAI et al.
Alignment Studio: Aligning Large Language Models To Particular Contextual Regulations (2024) • No Venue
Achintalwar et al.
Yi: Open Foundation Models By 01.AI (2024) • No Venue
Ai et al.
Anygpt: Unified Multimodal LLM With Discrete Sequence Modeling (2024) • No Venue
Zhan et al.
Homogenization Effects Of Large Language Models On Human Creative Ideation (2024) • C&C '24: Creativity and Cognition • 77 citations
Barrett R. Anderson, Jash Hemant Shah, Max Kreminski
Minigpt4-video: Advancing Multimodal Llms For Video Understanding With Interleaved Visual-textual Tokens (2024) • No Venue
Ataallah et al.
Iris: An Ai-driven Virtual Tutor For Computer Science Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 41 citations
Patrick Bassner, Eduard Frankford, Stephan Krusche
3dgraphllm: Combining Semantic Graphs And Large Language Models For 3D Scene Understanding (2024) • No Venue
Tatiana Zemskova, Dmitry Yudin
Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) • No Venue
Bu et al.
EMOVA: Empowering Language Models To See, Hear And Speak With Vivid Emotions (2024) • No Venue
Chen et al.
Videollm-online: Online Video Large Language Model For Streaming Video (2024) • No Venue
Chen et al.
CORAL: Benchmarking Multi-turn Conversational Retrieval-augmentation Generation (2024) • No Venue
Cheng et al.
Exploring Large Language Model Based Intelligent Agents: Definitions, Methods, And Prospects (2024) • Internet Research • 214 citations
Cheng et al.
Chatbot Arena: An Open Platform For Evaluating Llms By Human Preference (2024) • No Venue
Chiang et al.
Can Chatgpt Evaluate Research Quality? (2024) • Journal of Data and Information Science • 40 citations
Mike Thelwall
Scaling Instructable Agents Across Many Simulated Worlds (2024) • No Venue
Team et al.
The Mamba In The Llama: Distilling And Accelerating Hybrid Models (2024) • No Venue
Wang et al.
Lami: Large Language Models For Multi-modal Human-robot Interaction (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
MOSAIC: A Modular System For Assistive And Interactive Cooking (2024) • No Venue
Wang et al.
Virtuwander: Enhancing Multi-modal Interaction For Virtual Tour Guidance Through Large Language Models (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
Diasynth -- Synthetic Dialogue Generation Framework (2024) • No Venue
Suresh et al.
Sotopia-π: Interactive Learning Of Socially Intelligent Language Agents (2024) • No Venue
Wang et al.
Generative Echo Chamber? Effects Of Llm-powered Search Systems On Diverse Information Seeking (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 52 citations
Nikhil Sharma, Q. Vera Liao, Ziang Xiao
Canttalkaboutthis: Aligning Language Models To Stay On Topic In Dialogues (2024) • No Venue
Sreedhar et al.
Simulating Classroom Education With Llm-empowered Agents (2024) • No Venue
Zhang et al.
Large Language Models Are Superpositions Of All Characters: Attaining Arbitrary Role-play Via Self-alignment (2024) • No Venue
Lu et al.
Blending Is All You Need: Cheaper, Better Alternative To Trillion-parameters LLM (2024) • No Venue
Lu et al.
Weblinx: Real-world Website Navigation With Multi-turn Dialogue (2024) • No Venue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Language Model Can Listen While Speaking (2024) • No Venue
Ma et al.
Evaluating Very Long-term Conversational Memory Of LLM Agents (2024) • No Venue
Maharana et al.
Exploring The Capabilities And Limitations Of Large Language Models In The Electric Energy Sector (2024) • Joule • 69 citations
Majumder et al.
Wildchat: 1M Chatgpt Interaction Logs In The Wild (2024) • No Venue
Zhao et al.
Realm: Reference Resolution As Language Modeling (2024) • No Venue
Moniz et al.
H2o-danube3 Technical Report (2024) • No Venue
Pfeiffer et al.
Personalized Visual Instruction Tuning (2024) • No Venue
Pi et al.
Llama-omni: Seamless Speech Interaction With Large Language Models (2024) • No Venue
Fang et al.
Chemllm: A Chemical Large Language Model (2024) • No Venue
Zhang et al.
Pingpong: A Benchmark For Role-playing Language Models With User Emulation And Multi-model Evaluation (2024) • No Venue
Ilya Gusev
Exploring Chatgpt And Its Impact On Society (2024) • AI and Ethics • 44 citations
Md. Asraful Haque, Shuai Li
Webvoyager: Building An End-to-end Web Agent With Large Multimodal Models (2024) • No Venue
He et al.
Distilling An End-to-end Voice Assistant Without Instruction Training Data (2024) • No Venue
Held et al.
Chatdiet: Empowering Personalized Nutrition-oriented Food Recommender Chatbots Through An Llm-augmented Framework (2024) • Smart Health • 59 citations
Yang et al.
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use (2024) • No Venue
Hu et al.
Understanding Large-language Model (llm)-powered Human-robot Interaction (2024) • HRI '24: ACM/IEEE International Conference on Human-Robot Interaction • 75 citations
Callie Y. Kim, Christine P. Lee, Bilge Mutlu
THEANINE: Revisiting Memory Management In Long-term Conversations With Timeline-augmented Response Generation (2024) • No Venue
Kim et al.
Autowebglm: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent (2024) • No Venue
Lai et al.
Thanos: Enhancing Conversational Agents With Skill-of-mind-infused Large Language Model (2024) • No Venue
Lee et al.
Stark: Social Long-term Multi-modal Conversation With Persona Commonsense Knowledge (2024) • No Venue
Lee et al.
Mathemyths: Leveraging Large Language Models To Teach Mathematical Language Through Child-ai Co-creative Storytelling (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 47 citations
Zhang et al.
Hunyuan-dit: A Powerful Multi-resolution Diffusion Transformer With Fine-grained Chinese Understanding (2024) • No Venue
Li et al.
More Agents Is All You Need (2024) • No Venue
Li et al.
Wildbench: Benchmarking Llms With Challenging Tasks From Real Users In The Wild (2024) • No Venue
Lin et al.
Travelplanner: A Benchmark For Real-world Planning With Language Agents (2024) • No Venue
Xie et al.
Mini-omni2: Towards Open-source Gpt-4o With Vision, Speech And Duplex Capabilities (2024) • No Venue
Zhifei Xie, Changqiao Wu
Generative AI: Implications And Applications For Education (2023) • Arxiv • 59 citations
Olga et al.
Chatgpt, Can You Generate Solutions For My Coding Exercises? An Evaluation On Its Effectiveness In An Undergraduate Java Programming Course (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 61 citations
Ouh et al.
Large Language Models Can Infer Psychological Dispositions Of Social Media Users (2023) • PNAS Nexus • 45 citations
Heinrich Peters, Sandra Matz
Learning Gain Differences Between Chatgpt And Human Tutor Generated Algebra Hints (2023) • Arxiv • 71 citations
Zachary A. Pardos, Shreya Bhandari
Generative Agents: Interactive Simulacra Of Human Behavior (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 941 citations
Park et al.
Evaluating The Logical Reasoning Ability Of Chatgpt And GPT-4 (2023) • Arxiv • 102 citations
Liu et al.
Visual Instruction Tuning (2023) • Arxiv • 659 citations
Liu et al.
Make LLM A Testing Expert: Bringing Human-like Interaction To Mobile GUI Testing Via Functionality-aware Decisions (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 57 citations
Liu et al.
Leveraging Large Language Models To Power Chatbots For Collecting User Self-reported Data (2023) • Proceedings of the ACM on Human-Computer Interaction • 50 citations
Wei et al.
Keep The Conversation Going: Fixing 162 Out Of 337 Bugs For $0.42 Each Using Chatgpt (2023) • ISSTA 2024 Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 40 citations
Chunqiu Steven Xia, Lingming Zhang
Cognitive Architectures For Language Agents (2023) • Arxiv • 53 citations
Sumers et al.
Sensecape: Enabling Multilevel Exploration And Sensemaking With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 89 citations
Suh et al.
Do Large Language Models Show Decision Heuristics Similar To Humans? A Case Study Using GPT-3.5 (2023) • Journal of Experimental Psychology: General • 46 citations
Suri et al.
Chatgpt: More Than A Weapon Of Mass Deception, Ethical Challenges And Responses From The Human-centered Artificial Intelligence (HCAI) Perspective (2023) • International Journal of Human–Computer Interaction • 63 citations
Sison et al.
Chatgpt And A New Academic Reality: Artificial Intelligence-written Research Papers And The Ethics Of The Large Language Models In Scholarly Publishing (2023) • Journal of the Association for Information Science and Technology • 591 citations
Lund et al.
Chatanything: Facetime Chat With Llm-enhanced Personas (2023) • No Venue
Zhao et al.
Decoding Chatgpt: A Taxonomy Of Existing Research, Current Challenges, And Possible Future Directions (2023) • Journal of King Saud University - Computer and Information Sciences • 122 citations
Sohail et al.
Bioinspiredllm: Conversational Large Language Model For The Mechanics Of Biological And Bio-inspired Materials (2023) • Advanced Science • 67 citations
Rachel K. Luu, Markus J. Buehler
Translating Radiology Reports Into Plain Language Using Chatgpt And GPT-4 With Prompt Learning: Promising Results, Limitations, And Potential (2023) • Visual Computing for Industry, Biomedicine, and Art • 264 citations
Lyu et al.
A Transformer-based Model With Self-distillation For Multimodal Emotion Recognition In Conversations (2023) • IEEE Transactions on Multimedia • 71 citations
Ma et al.
Beyond Chatbots: Explorellm For Structured Thoughts And Personalized Model Responses (2023) • No Venue
Ma et al.
Tidybot: Personalized Robot Assistance With Large Language Models (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 65 citations
Wu et al.
GPT Has Become Financially Literate: Insights From Financial Literacy Tests Of GPT And A Preliminary Test Of How People Use It As A Source Of Advice (2023) • Finance Research Letters • 58 citations
Paweł Niszczota, Sami Abbas
Using An LLM To Help With Code Understanding (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 186 citations
Nam et al.
Chatgpt Or Grammarly? Evaluating Chatgpt On Grammatical Error Correction Benchmark (2023) • Arxiv • 48 citations
Wu et al.
Llm-assisted Knowledge Graph Engineering: Experiments With Chatgpt (2023) • Informatik aktuell • 40 citations
Meyer et al.
Reflexion: Language Agents With Verbal Reinforcement Learning (2023) • Arxiv • 247 citations
Shinn et al.
Chatgpt Or Human? Detect And Explain. Explaining Decisions Of Machine Learning Model For Detecting Short Chatgpt-generated Text (2023) • Arxiv • 81 citations
Sandra Mitrović, Davide Andreoletti, Omran Ayoub
Video-chatgpt: Towards Detailed Video Understanding Via Large Vision And Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 224 citations
Maaz et al.
Chat2vis: Generating Data Visualisations Via Natural Language Using Chatgpt, Codex And GPT-3 Large Language Models (2023) • IEEE Access • 165 citations
Paula Maddigan, Teo Susnjak
Next-gpt: Any-to-any Multimodal LLM (2023) • No Venue
Wu et al.
Roco: Dialectic Multi-robot Collaboration With Large Language Models (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 78 citations
Zhao Mandi, Shreeya Jain, Shuran Song
Assessing Cross-cultural Alignment Between Chatgpt And Human Societies: An Empirical Study (2023) • Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) • 96 citations
Cao et al.
Towards Human-bot Collaborative Software Architecting With Chatgpt (2023) • Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering • 128 citations
Ahmad et al.
Chatgpt In Drug Discovery: A Case Study On Anti-cocaine Addiction Drug Development With Chatbots (2023) • Journal of Chemical Information and Modeling • 42 citations
Rui Wang, Hongsong Feng, Guo-Wei Wei
Large Language Models Streamline Automated Machine Learning For Clinical Studies (2023) • Nature Communications • 74 citations
Arasteh et al.
Chatcad: Interactive Computer-aided Diagnosis On Medical Image Using Large Language Models (2023) • Communications Engineering • 88 citations
Wang et al.
Chatgpt: Applications, Opportunities, And Threats (2023) • 2023 Systems and Information Engineering Design Symposium (SIEDS) • 162 citations
Bahrini et al.
Qwen Technical Report (2023) • No Venue
Bai et al.
Multimodal Llms For Health Grounded In Individual-specific Data (2023) • Lecture Notes in Computer Science • 44 citations
Belyaeva et al.
Chip-chat: Challenges And Opportunities In Conversational Hardware Design (2023) • 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD) • 147 citations
Blocklove et al.
A Categorical Archive Of Chatgpt Failures (2023) • Arxiv • 395 citations
Ali Borji
How Is Chatgpt's Behavior Changing Over Time? (2023) • No Venue
Lingjiao Chen, Matei Zaharia, James Zou
Gptutor: A Chatgpt-powered Programming Tool For Code Explanation (2023) • Communications in Computer and Information Science • 68 citations
Chen et al.
Llava-interactive: An All-in-one Demo For Image Chat, Segmentation, Generation And Editing (2023) • No Venue
Chen et al.
Soulchat: Improving Llms' Empathy, Listening, And Comfort Abilities Through Fine-tuning With Multi-turn Empathy Conversations (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 43 citations
Chen et al.
When Do You Need Chain-of-thought Prompting For Chatgpt? (2023) • World Wide Web • 190 citations
Chen et al.
Chatgpt Empowered Long-step Robot Control In Various Environments: A Case Application (2023) • IEEE Access • 75 citations
Wake et al.
Memorybank: Enhancing Large Language Models With Long-term Memory (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 77 citations
Zhong et al.
Shepherd: A Critic For Language Model Generation (2023) • No Venue
Wang et al.
Rethinking The Evaluation For Conversational Recommendation In The Era Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 45 citations
Wang et al.
Enhancing STEM Learning With Chatgpt And Bing Chat As Objects To Think With: A Case Study (2023) • Eurasia Journal of Mathematics, Science and Technology Education • 86 citations
Marco Antonio Rodrigues Vasconcelos, Renato P. Dos Santos
Chatclimate: Grounding Conversational AI In Climate Science (2023) • Communications Earth & Environment • 87 citations
Vaghefi et al.
Auggpt: Leveraging Chatgpt For Text Data Augmentation (2023) • Arxiv • 98 citations
Dai et al.
Uncovering Chatgpt's Capabilities In Recommender Systems (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 116 citations
Dai et al.
Zephyr: Direct Distillation Of LM Alignment (2023) • Arxiv • 51 citations
Tunstall et al.
Masterkey: Automated Jailbreak Across Multiple Large Language Model Chatbots (2023) • Network and Distributed System Security Symposium • 65 citations
Deng et al.
Qlora: Efficient Finetuning Of Quantized Llms (2023) • No Venue
Dettmers et al.
Toxicity In Chatgpt: Analyzing Persona-assigned Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 92 citations
Deshpande et al.
Enhancing Chat Language Models By Scaling High-quality Instructional Conversations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 60 citations
Ding et al.
Llama 2: Open Foundation And Fine-tuned Chat Models (2023) • No Venue
Touvron et al.
Robot-enabled Construction Assembly With Automated Sequence Planning Based On Chatgpt: Robogpt (2023) • Buildings • 63 citations
You et al.
Judging Llm-as-a-judge With Mt-bench And Chatbot Arena (2023) • No Venue
Zheng et al.
Ethical Chatgpt: Concerns, Challenges, And Commandments (2023) • Electronics • 54 citations
Zhou et al.
Role-play With Large Language Models (2023) • Nature • 248 citations
Murray Shanahan, Kyle McDonell, Laria Reynolds
Let's Have A Chat! A Conversation With Chatgpt: Technology, Applications, And Limitations (2023) • Artificial Intelligence and Applications • 93 citations
Sakib Shahriar, Kadhim Hayawi
Jais And Jais-chat: Arabic-centric Foundation And Instruction-tuned Open Generative Large Language Models (2023) • No Venue
Sengupta et al.
Exploring The Feasibility Of Chatgpt For Event Extraction (2023) • Arxiv • 55 citations
Gao et al.
Improved Trust In Human-robot Collaboration With Chatgpt (2023) • IEEE Access • 164 citations
Yang Ye, Hengxu You, Jing Du
Transformative Effects Of Chatgpt On Modern Education: Emerging Era Of AI Chatbots (2023) • Internet of Things and Cyber-Physical Systems • 372 citations
Gill et al.
Chatgpt: Vision And Challenges (2023) • Internet of Things and Cyber-Physical Systems • 204 citations
Sukhpal Singh Gill, Rupinder Kaur
Audiopalm: A Large Language Model That Can Speak And Listen (2023) • No Venue
Rubenstein et al.
PIPPA: A Partially Synthetic Conversational Dataset (2023) • No Venue
Tear Gosling, Alpin Dale, Yinhe Zheng
Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen To Human-level Creativity (2023) • Journal of Creativity • 153 citations
Jennifer Haase, Paul H. P. Hanel
Medalpaca -- An Open-source Collection Of Medical Conversational AI Models And Training Data (2023) • Arxiv • 102 citations
Han et al.
Platform-independent And Curriculum-oriented Intelligent Assistant For Higher Education (2023) • International Journal of Educational Technology in Higher Education • 80 citations
Sajja et al.
The Political Ideology Of Conversational AI: Converging Evidence On Chatgpt's Pro-environmental, Left-libertarian Orientation (2023) • SSRN Electronic Journal • 85 citations
Jochen Hartmann, Jasper Schwenzow, Maximilian Witte
Annollm: Making Large Language Models To Be Better Crowdsourced Annotators (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track) • 46 citations
He et al.
Stay On Topic With Classifier-free Guidance (2023) • No Venue
Sanchez et al.
Appagent: Multimodal Agents As Smartphone Users (2023) • No Venue
Zhang et al.
Agents: An Open-source Framework For Autonomous Language Agents (2023) • No Venue
Zhou et al.
Metagpt: Meta Programming For A Multi-agent Collaborative Framework (2023) • Arxiv • 124 citations
Hong et al.
Cogagent: A Visual Language Model For GUI Agents (2023) • No Venue
Hong et al.
Personality Traits In Large Language Models (2023) • No Venue
Safdari et al.
Opportunities And Challenges Of Chatgpt For Design Knowledge Management (2023) • Procedia CIRP • 76 citations
Hu et al.
Zero-shot Information Extraction From Radiological Reports Using Chatgpt (2023) • International Journal of Medical Informatics • 67 citations
Hu et al.
Audiogpt: Understanding And Generating Speech, Music, Sound, And Talking Head (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Huang et al.
Is Chatgpt Better Than Human Annotators? Potential And Limitations Of Chatgpt In Explaining Implicit Hate Speech (2023) • Companion Proceedings of the ACM Web Conference 2023 • 178 citations
Fan Huang, Haewoon Kwak, Jisun An
Chatgpt For Shaping The Future Of Dentistry: The Potential Of Multi-modal Large Language Model (2023) • International Journal of Oral Science • 221 citations
Huang et al.
Character-llm: A Trainable Agent For Role-playing (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 62 citations
Shao et al.
Facilitating Self-guided Mental Health Interventions Through Human-language Model Interaction: A Case Study Of Cognitive Restructuring (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Sharma et al.
Testing The Reliability Of Chatgpt For Text Annotation And Classification: A Cautionary Remark (2023) • Arxiv • 80 citations
Michael V. Reiss
Exploring The Limits Of Chatgpt For Query Or Aspect-based Text Summarization (2023) • Arxiv • 89 citations
Yang et al.
The Programmer's Assistant: Conversational Interaction With A Large Language Model For Software Development (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 177 citations
Ross et al.
Perception, Performance, And Detectability Of Conversational Artificial Intelligence Across 32 University Courses (2023) • Scientific Reports • 105 citations
Ibrahim et al.
Consistency Analysis Of Chatgpt (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Myeongjun Erik Jang, Thomas Lukasiewicz
Personallm: Investigating The Ability Of Large Language Models To Express Personality Traits (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 40 citations
Jiang et al.
Is Chatgpt Fair For Recommendation? Evaluating Fairness In Large Language Model Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 94 citations
Zhang et al.
Teach AI How To Code: Using Large Language Models As Teachable Agents For Programming Education (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 53 citations
Jin et al.
Is Stack Overflow Obsolete? An Empirical Study Of The Characteristics Of Chatgpt Answers To Stack Overflow Questions (2023) • Arxiv • 49 citations
Kabir et al.
Extracting Accurate Materials Data From Research Papers With Conversational Language Models And Prompt Engineering (2023) • Nature Communications • 212 citations
Maciej P. Polak, Dane Morgan
Chatbots Put To The Test In Math And Logic Problems: A Preliminary Comparison And Assessment Of Chatgpt-3.5, Chatgpt-4, And Google Bard (2023) • AI • 58 citations
Vagelis Plevris, George Papazafeiropoulos, Alejandro Jiménez Rios
Can GPT-4 Replicate Empirical Software Engineering Research? (2023) • NEJM AI • 101 citations
Liang et al.
Doctorglm: Fine-tuning Your Chinese Doctor Is Not A Herculean Task (2023) • Arxiv • 70 citations
Xiong et al.
Investigating The Use Of Chatgpt For The Scheduling Of Construction Projects (2023) • Buildings • 174 citations
Samuel A. Prieto, Eyob T. Mengiste, Borja García de Soto
Llm-eval: Unified Multi-dimensional Automatic Evaluation For Open-domain Conversations With Large Language Models (2023) • Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023) • 45 citations
Yen-Ting Lin, Yun-Nung Chen
Learning To Model The World With Language (2023) • No Venue
Lin et al.
CAMEL: Communicative Agents For "mind" Exploration Of Large Language Model Society (2023) • Arxiv • 87 citations
Li et al.
Chatdoctor: A Medical Chat Model Fine-tuned On A Large Language Model Meta-ai (llama) Using Medical Domain Knowledge (2023) • Cureus • 256 citations
Li et al.
Chatgpt As An Attack Tool: Stealthy Textual Backdoor Attack Via Blackbox Generative Model Trigger (2023) • Reliability Engineering & System Safety • 75 citations
Li et al.
Llava-med: Training A Large Language-and-vision Assistant For Biomedicine In One Day (2023) • Arxiv • 216 citations
Li et al.
Revisiting K-nn For Fine-tuning Pre-trained Language Models (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 61 citations
Li et al.
Modelscope-agent: Building Your Customizable Agent System With Open-source Large Language Models (2023) • No Venue
Li et al.
Table-gpt: Table-tuned GPT For Diverse Table Tasks (2023) • No Venue
Li et al.
Videochat: Chat-centric Video Understanding (2023) • Arxiv • 90 citations
Li et al.
Chatdev: Communicative Agents For Software Development (2023) • Arxiv • 65 citations
Qian et al.
Chatgpt Vs. Google: A Comparative Study Of Search Performance And User Experience (2023) • SSRN Electronic Journal • 46 citations
Ruiyun Xu, Yue Feng, Hailiang Chen
Lemur: Harmonizing Natural Language And Code For Language Agents (2023) • No Venue
Xu et al.
Baize: An Open-source Chat Model With Parameter-efficient Tuning On Self-chat Data (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 94 citations
Xu et al.
A Survey Of GPT-3 Family Large Language Models Including Chatgpt And GPT-4 (2023) • Natural Language Processing Journal • 201 citations
Katikapalli Subramanyam Kalyan
LARP: Language-agent Role Play For Open-world Games (2023) • No Venue
Yan et al.
"It's A Fair Game", Or Is It? Examining How Users Navigate Disclosure Risks And Benefits When Using Llm-based Conversational Agents (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 46 citations
Zhang et al.
Chatgpt For Programming Numerical Methods (2023) • Journal of Machine Learning for Modeling and Computing • 81 citations
Ali Kashefi, Tapan Mukerji
Gptaraeval: A Comprehensive Evaluation Of Chatgpt On Arabic NLP (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 41 citations
Khondaker et al.
Mindfuldiary: Harnessing Large Language Model To Support Psychiatric Patients' Journaling (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 66 citations
Kim et al.
Is Chatgpt A General-purpose Natural Language Processing Task Solver? (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 410 citations
Qin et al.
Sasha: Creative Goal-oriented Reasoning In Smart Homes With Large Language Models (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 44 citations
King et al.
Chatgpt: Jack Of All Trades, Master Of None (2023) • Information Fusion • 468 citations
Kocoń et al.
Glamm: Pixel Grounding Large Multimodal Model (2023) • No Venue
Rasheed et al.
Chatgpt: Beginning Of An End Of Manual Linguistic Data Annotation? Use Case Of Automatic Genre Identification (2023) • Arxiv • 64 citations
Taja Kuzman, Igor Mozetič, Nikola Ljubešić
Chinese Intermediate English Learners Outdid Chatgpt In Deep Cohesion: Evidence From English Narrative Writing (2023) • System • 50 citations
Zhou et al.
Xuanyuan 2.0: A Large Chinese Financial Chat Model With Hundreds Of Billions Parameters (2023) • Proceedings of the 32nd ACM International Conference on Information and Knowledge Management • 57 citations
Xuanyu Zhang, Qing Yang, Dongliang Xu
Hugginggpt: Solving AI Tasks With Chatgpt And Its Friends In Hugging Face (2023) • Arxiv • 264 citations
Shen et al.
VISAR: A Human-ai Argumentative Writing Assistant With Visual Programming And Rapid Draft Prototyping (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 67 citations
Zhang et al.
User-centric Conversational Recommendation With Multi-aspect User Modeling (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Li et al.
In Conversation With Artificial Intelligence: Aligning Language Models With Human Values (2022) • Philosophy & Technology • 77 citations
Atoosa Kasirzadeh, Iason Gabriel
Prosocialdialog: A Prosocial Backbone For Conversational Agents (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 42 citations
Kim et al.
Empathetic Conversational Systems: A Review Of Current Advances, Gaps, And Opportunities (2022) • IEEE Transactions on Affective Computing • 44 citations
Aravind Sesagiri Raamkumar, Yinping Yang
Training A Helpful And Harmless Assistant With Reinforcement Learning From Human Feedback (2022) • Arxiv • 346 citations
Bai et al.
A Model-agnostic Data Manipulation Method For Persona-based Dialogue Generation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 40 citations
Cao et al.
Impacts Of Personal Characteristics On User Trust In Conversational Recommender Systems (2022) • CHI Conference on Human Factors in Computing Systems • 43 citations
Wanling Cai, Yucheng Jin, Li Chen
Blenderbot 3: A Deployed Conversational Agent That Continually Learns To Responsibly Engage (2022) • Arxiv • 98 citations
Shuster et al.
UX Research On Conversational Human-ai Interaction: A Literature Review Of The ACM Digital Library (2022) • CHI Conference on Human Factors in Computing Systems • 70 citations
Zheng et al.
A Unified Multi-task Learning Framework For Multi-goal Conversational Recommender Systems (2022) • ACM Transactions on Information Systems • 56 citations
Deng et al.
The Moral Integrity Corpus: A Benchmark For Ethical Dialogue Systems (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Ziems et al.
MISC: A Mixed Strategy-aware Model Integrating COMET For Emotional Support Conversation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Tu et al.
Large Language Models And The Reverse Turing Test (2022) • Neural Computation • 99 citations
Terrence Sejnowski
Evaluating Mixed-initiative Conversational Search Systems Via User Simulation (2022) • Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining • 42 citations
Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani
Control Globally, Understand Locally: A Global-to-local Hierarchical Graph Network For Emotional Support Conversation (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 42 citations
Peng et al.
Improving Alignment Of Dialogue Agents Via Targeted Human Judgements (2022) • Arxiv • 130 citations
Glaese et al.
How Would Stance Detection Techniques Evolve After The Launch Of Chatgpt? (2022) • Arxiv • 66 citations
Zhang et al.
Storybuddy: A Human-ai Collaborative Chatbot For Parent-child Interactive Storytelling With Flexible Parental Involvement (2022) • CHI Conference on Human Factors in Computing Systems • 127 citations
Zhang et al.
"I Think This Is The Most Disruptive Technology": Exploring Sentiments Of Chatgpt Early Adopters Using Twitter Data (2022) • Arxiv • 204 citations
Haque et al.
Chatgpt: The End Of Online Exam Integrity? (2022) • Arxiv • 349 citations
Teo Susnjak
Coreference-aware Dialogue Summarization (2021) • Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue • 46 citations
Zhengyuan Liu, Ke Shi, Nancy F. Chen
A Survey On Spoken Language Understanding: Recent Advances And New Frontiers (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 68 citations
Qin et al.
Towards Enhancing Fine-grained Details For Image Matting (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Chang Liu, Henghui Ding, Xudong Jiang
Eliciting And Analysing Users' Envisioned Dialogues With Perfect Voice Assistants (2021) • Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems • 64 citations
Völkel et al.
Natural Language Understanding For Argumentative Dialogue Systems In The Opinion Building Domain (2021) • Knowledge-Based Systems • 41 citations
Abro et al.
Building And Evaluating Open-domain Dialogue Corpora With Clarifying Questions (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Aliannejadi et al.
Retrieval Augmentation Reduces Hallucination In Conversation (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 370 citations
Shuster et al.
Software-based Dialogue Systems: Survey, Taxonomy And Challenges (2021) • ACM Computing Surveys • 45 citations
Quim Motger, Xavier Franch, Jordi Marco
Just Say No: Analyzing The Stance Of Neural Dialogue Generation In Offensive Contexts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Baheti et al.
Text2gestures: A Transformer-based Network For Generating Emotive Body Gestures For Virtual Agents (2021) • 2021 IEEE Virtual Reality and 3D User Interfaces (VR) • 109 citations
Bhattacharya et al.
Crslab: An Open-source Toolkit For Building Conversational Recommender System (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 44 citations
Zhou et al.
Out-of-scope Intent Detection With Self-supervision And Discriminative Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 46 citations
Zhan et al.
Structure-aware Abstractive Conversation Summarization Via Discourse And Action Graphs (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 68 citations
Jiaao Chen, Diyi Yang
Graph Based Network With Contextualized Representations Of Turns In Dialogue (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 40 citations
Bongseok Lee, Yong Suk Choi
A Short Survey Of Pre-trained Language Models For Conversational AI-A Newage In NLP (2021) • ACSW '20: Australasian Computer Science Week 2020 • 48 citations
Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang
Directed Acyclic Graph Network For Conversational Emotion Recognition (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 245 citations
Shen et al.
Few-shot Bot: Prompt-based Learning For Dialogue Systems (2021) • Arxiv • 45 citations
Madotto et al.
Beyond Goldfish Memory: Long-term Open-domain Conversation (2021) • Arxiv • 40 citations
Jing Xu, Arthur Szlam, Jason Weston
Increasing Faithfulness In Knowledge-grounded Dialogue With Controllable Features (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 69 citations
Rashkin et al.
Transferable Dialogue Systems And User Simulators (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Tseng et al.
One Chatbot Per Person: Creating Personalized Chatbots Based On Implicit User Profiles (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 55 citations
Ma et al.
Text Is NOT Enough: Integrating Visual Impressions Into Open-domain Dialogue Generation (2021) • ACM Computing Surveys • 151 citations
Shen et al.
Alignment Of Language Agents (2021) • Arxiv • 41 citations
Kenton et al.
GALAXY: A Generative Pre-trained Model For Task-oriented Dialog With Semi-supervised Learning And Explicit Policy Injection (2021) • Arxiv • 45 citations
He et al.
Multi-task Pre-training For Plug-and-play Task-oriented Dialogue System (2021) • Arxiv • 57 citations
Su et al.
Bob: BERT Over BERT For Training Persona-based Dialogue Models From Limited Personalized Data (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 91 citations
Song et al.
Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies (2021) • Behavior Research Methods • 65 citations
Flemotomos et al.
Revcore: Review-augmented Conversational Recommendation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 64 citations
Lu et al.
Neural Path Hunter: Reducing Hallucination In Dialogue Systems Via Path Grounding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Dziri et al.
CDL: Curriculum Dual Learning For Emotion-controllable Response Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Lei Shen, Yang Feng
Conceptual Metaphors Impact Perceptions Of Human-ai Collaboration (2020) • Proceedings of the ACM on Human-Computer Interaction • 124 citations
Khadpe et al.
A Comparison Of LSTM And BERT For Small Corpus (2020) • Arxiv • 66 citations
Aysu Ezen-Can
Will I Sound Like Me? Improving Persona Consistency In Dialogues Through Pragmatic Self-consciousness (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Hyunwoo Kim, Byeongchang Kim, Gunhee Kim
If I Hear You Correctly: Building And Evaluating Interview Chatbots With Active Listening Skills (2020) • Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems • 95 citations
Xiao et al.
Dynamic Knowledge Routing Network For Target-guided Open-domain Conversation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 87 citations
Qin et al.
Speaker-aware BERT For Multi-turn Response Selection In Retrieval-based Chatbots (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 148 citations
Gu et al.
Beyond Domain Apis: Task-oriented Conversational Modeling With Unstructured Knowledge Access (2020) • Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue • 55 citations
Kim et al.
INSPIRED: Toward Sociable Recommendation Dialog Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 95 citations
Hayati et al.
Trippy: A Triple Copy Strategy For Value Independent Neural Dialog State Tracking (2020) • Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue • 181 citations
Heck et al.
Discern: Discourse-aware Entailment Reasoning Network For Conversational Machine Reading (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Gao et al.
From Machine Reading Comprehension To Dialogue State Tracking: Bridging The Gap (2020) • Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI • 49 citations
Gao et al.
A Co-interactive Transformer For Joint Slot Filling And Intent Detection (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 89 citations
Qin et al.
Sequential Latent Knowledge Selection For Knowledge-grounded Dialogue (2020) • Arxiv • 111 citations
Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
Evaluating Conversational Recommender Systems Via User Simulation (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 73 citations
Shuo Zhang, Krisztian Balog
Imitating Interactive Intelligence (2020) • Arxiv • 43 citations
Abramson et al.
Discriminative Nearest Neighbor Few-shot Intent Detection By Transferring Natural Language Inference (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 65 citations
Zhang et al.
Towards A Human-like Open-domain Chatbot (2020) • Arxiv • 266 citations
Adiwardana et al.
CPM: A Large-scale Generative Chinese Pre-trained Language Model (2020) • AI Open • 59 citations
Zhang et al.
Generate, Delete And Rewrite: A Three-stage Framework For Improving Persona Consistency Of Dialogue Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Song et al.
TOD-BERT: Pre-trained Natural Language Understanding For Task-oriented Dialogue (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 203 citations
Wu et al.
Towards Conversational Recommendation Over Multi-type Dialogs (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 151 citations
Liu et al.
Graphdialog: Integrating Graph Knowledge Into End-to-end Task-oriented Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Shiquan Yang, Rui Zhang, Sarah Erfani
HHH: An Online Medical Chatbot System Based On Knowledge Graph And Hierarchical Bi-directional Attention (2020) • Proceedings of the Australasian Computer Science Week Multiconference • 47 citations
Qiming Bao, Lin Ni, Jiamou Liu
Improving Multi-turn Response Selection Models With Complementary Last-utterance Selection By Instance Weighting (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 290 citations
Zhou et al.
A Survey On Conversational Recommender Systems (2020) • ACM Computing Surveys • 313 citations
Jannach et al.
Joint Contextual Modeling For ASR Correction And Language Understanding (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 42 citations
Weng et al.
Chatbot Interaction With Artificial Intelligence: Human Data Augmentation With T5 And Language Transformer Ensemble For Text Classification (2020) • Journal of Ambient Intelligence and Humanized Computing • 54 citations
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
VD-BERT: A Unified Vision And Dialog Transformer With BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 43 citations
Wang et al.
Multi-domain Dialogue Acts And Response Co-generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Response Selection For Multi-party Conversations With Dynamic Topic Tracking (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Weishi Wang, Shafiq Joty, Steven C. H. Hoi
A Large-scale Chinese Short-text Conversation Dataset (2020) • Lecture Notes in Computer Science • 99 citations
Wang et al.
Low-resource Knowledge-grounded Dialogue Generation (2020) • Arxiv • 84 citations
Zhao et al.
DIET: Lightweight Language Understanding For Dialogue Systems (2020) • Arxiv • 112 citations
Bunk et al.
Data Manipulation: Towards Effective Instance Learning For Neural Dialogue Generation Via Learning To Augment And Reweight (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 56 citations
Cai et al.
SOLOIST: Building Task Bots At Scale With Transfer Learning And Machine Teaching (2020) • Arxiv • 99 citations
Peng et al.
Mitigating Gender Bias For Neural Dialogue Generation With Adversarial Learning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 52 citations
Liu et al.
Low-resource Domain Adaptation For Compositional Task-oriented Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 64 citations
Chen et al.
Multi-stage Influence Function (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 119 citations
Chen et al.
Mintl: Minimalist Transfer Learning For Task-oriented Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 131 citations
Lin et al.
Dialoglue: A Natural Language Understanding Benchmark For Task-oriented Dialogue (2020) • Arxiv • 97 citations
Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur
Bridging Text And Video: A Universal Multimodal Transformer For Video-audio Scene-aware Dialog (2020) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 52 citations
Li et al.
Zero-resource Knowledge-grounded Dialogue Generation (2020) • Arxiv • 50 citations
Li et al.
Query Resolution For Conversational Search With Limited Supervision (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 106 citations
Voskarides et al.
A Taxonomy Of Empathetic Response Intents In Human Social Conversations (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 81 citations
Anuradha Welivita, Pearl Pu
NL4DV: A Toolkit For Generating Analytic Specifications For Data Visualization From Natural Language Queries (2020) • IEEE Transactions on Visualization and Computer Graphics • 164 citations
Arpit Narechania, Arjun Srinivasan, John Stasko
Guiding Attention In Sequence-to-sequence Models For Dialogue Act Prediction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Colombo et al.
Span-convert: Few-shot Span Extraction For Dialog With Pretrained Conversational Representations (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 43 citations
Coope et al.
You Impress Me: Dialogue Generation Via Mutual Persona Perception (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Liu et al.
Analysing The Effect Of Clarifying Questions On Document Ranking In Conversational Search (2020) • Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval • 45 citations
Krasakis et al.
Mutual: A Dataset For Multi-turn Dialogue Reasoning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 110 citations
Cui et al.
Coco: Controllable Counterfactuals For Evaluating Dialogue State Trackers (2020) • Arxiv • 41 citations
Li et al.
GRADE: Automatic Graph-enhanced Coherence Metric For Evaluating Open-domain Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 67 citations
Huang et al.
Knowledge-grounded Dialogue Generation With Pre-trained Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Zhao et al.
AGIF: An Adaptive Graph-interactive Framework For Joint Multiple Intent Detection And Slot Filling (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 108 citations
Qin et al.
Recipes For Safety In Open-domain Chatbots (2020) • Arxiv • 98 citations
Xu et al.
Recipes For Building An Open-domain Chatbot (2020) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 170 citations
Roller et al.
Coach: A Coarse-to-fine Approach For Cross-domain Slot Filling (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 99 citations
Liu et al.
Towards Robustifying NLI Models Against Lexical Dataset Biases (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 143 citations
Xiang Zhou, Mohit Bansal
Designing Precise And Robust Dialogue Response Evaluators (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
Scaling Multi-domain Dialogue State Tracking Via Query Reformulation (2019) • Proceedings of the 2019 Conference of the North • 44 citations
Rastogi et al.
Learning From Dialogue After Deployment: Feed Yourself, Chatbot! (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 66 citations
Hancock et al.
Recommendation As A Communication Game: Self-supervised Bot-play For Goal-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 80 citations
Kang et al.
Cosql: A Conversational Text-to-sql Challenge Towards Cross-domain Natural Language Interfaces To Databases (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 122 citations
Yu et al.
The Second Conversational Intelligence Challenge (convai2) (2019) • The Springer Series on Challenges in Machine Learning • 361 citations
Dinan et al.
Transferable Multi-domain State Generator For Task-oriented Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 404 citations
Wu et al.
Exploiting Persona Information For Diverse Generation Of Conversational Responses (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 126 citations
Song et al.
Generating Persona Consistent Dialogues By Exploiting Natural Language Inference (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Song et al.
A Simple But Effective Method To Incorporate Multi-turn Context With BERT For Conversational Machine Comprehension (2019) • Proceedings of the First Workshop on NLP for Conversational AI • 42 citations
Ohsugi et al.
Learning To Generate Questions By Learning What Not To Generate (2019) • WWW '19: The Web Conference • 48 citations
Liu et al.
Challenges In Building Intelligent Open-domain Dialog Systems (2019) • Arxiv • 44 citations
Minlie Huang, Xiaoyan Zhu, Jianfeng Gao
Personalizing Dialogue Agents Via Meta-learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 177 citations
Lin et al.
What Makes A Good Conversation? Challenges In Designing Truly Conversational Agents (2019) • Arxiv • 134 citations
Clark et al.
Improving Multi-turn Dialogue Modelling With Utterance Rewriter (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 106 citations
Su et al.
Approximating Interactive Human Evaluation With Self-play For Open-domain Dialog Systems (2019) • Arxiv • 51 citations
Ghandeharioun et al.
Recosa: Detecting The Relevant Contexts With Self-attention For Multi-turn Dialogue Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 121 citations
Zhang et al.
Dialogpt: Large-scale Generative Pre-training For Conversational Response Generation (2019) • Arxiv • 103 citations
Zhang et al.
Asking Clarifying Questions In Open-domain Information-seeking Conversations (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 175 citations
Aliannejadi et al.
What Makes A Good Conversation? How Controllable Attributes Affect Human Judgments (2019) • Proceedings of the 2019 Conference of the North • 224 citations
See et al.
Incremental Transformer With Deliberation Decoder For Document Grounded Conversations (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Li et al.
Vision-and-dialog Navigation (2019) • Arxiv • 118 citations
Thomason et al.
ACUTE-EVAL: Improved Dialogue Evaluation With Optimized Questions And Multi-turn Comparisons (2019) • Arxiv • 79 citations
Margaret Li, Jason Weston, Stephen Roller
A Stack-propagation Framework With Token-level Intent Detection For Spoken Language Understanding (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 302 citations
Qin et al.
End-to-end Knowledge-routed Relational Dialogue System For Automatic Diagnosis (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 154 citations
Xu et al.
Constrained Decoding For Neural NLG From Compositional Representations In Task-oriented Dialogue (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Balakrishnan et al.
Zero-shot Cross-lingual Dialogue Systems With Transferable Latent Variables (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 76 citations
Liu et al.
Way Off-policy Batch Deep Reinforcement Learning Of Implicit Human Preferences In Dialog (2019) • Arxiv • 131 citations
Jaques et al.
Multi-task Learning With Language Modeling For Question Generation (2019) • Arxiv • 58 citations
Wenjie Zhou, Minghua Zhang, Yunfang Wu
Structuring Latent Spaces For Stylized Response Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Gao et al.
Target-guided Open-domain Conversation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 109 citations
Tang et al.
GECOR: An End-to-end Generative Ellipsis And Co-reference Resolution Model For Task-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 49 citations
Quan et al.
Jointly Optimizing Diversity And Relevance In Neural Response Generation (2019) • Proceedings of the 2019 Conference of the North • 90 citations
Gao et al.
Hello, It's GPT-2 -- How Can I Help You? Towards The Use Of Pretrained Language Models For Task-oriented Dialogue Systems (2019) • Proceedings of the 3rd Workshop on Neural Generation and Translation • 127 citations
Paweł Budzianowski, Ivan Vulić
A Discrete CVAE For Response Generation On Short-text Conversation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Gao et al.
Personalized Dialogue Generation With Diversified Traits (2019) • Arxiv • 89 citations
Zheng et al.
Multimodal Transformer Networks For End-to-end Video-grounded Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 51 citations
Le et al.
Meta-learning For Low-resource Natural Language Generation In Task-oriented Dialogue Systems (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 84 citations
Mi et al.
Proactive Human-machine Conversation With Explicit Conversation Goals (2019) • Arxiv • 41 citations
Wu et al.
An End-to-end Conversational Style Matching Agent (2019) • Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents • 43 citations
Hoegen et al.
A Hybrid Retrieval-generation Neural Conversation Model (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 65 citations
Yang et al.
Semantically Conditioned Dialog Response Generation Via Hierarchical Disentangled Self-attention (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 128 citations
Chen et al.
Higru: Hierarchical Gated Recurrent Units For Utterance-level Emotion Recognition (2019) • Arxiv • 70 citations
Jiao et al.
Training Neural Response Selection For Task-oriented Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 83 citations
Henderson et al.
Convert: Efficient And Accurate Conversational Representations From Transformers (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Henderson et al.
Conversing By Reading: Contentful Neural Conversation With On-demand Machine Reading (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 97 citations
Qin et al.
Entity-consistent End-to-end Task-oriented Dialogue System With KB Retriever (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Qin et al.
Convlab: Multi-domain End-to-end Dialog System Platform (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 87 citations
Lee et al.
Speaker-follower Models For Vision-and-language Navigation (2018) • Arxiv • 244 citations
Fried et al.
Building A Conversational Agent Overnight With Dialogue Self-play (2018) • Arxiv • 161 citations
Shah et al.
Conversational AI: The Science Behind The Alexa Prize (2018) • Alexa Prize Proceedings, https://developer.amazon.com/alexaprize/proceedings (accessed 2018-01-01) • 201 citations
Ram et al.
Response Ranking With Deep Matching Networks And External Knowledge In Information-seeking Conversation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 153 citations
Yang et al.
Zero-shot Dialog Generation With Cross-domain Latent Actions (2018) • Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue • 52 citations
Tiancheng Zhao, Maxine Eskenazi
Complex Sequential Question Answering: Towards Learning To Converse Over Linked Question Answer Pairs With A Knowledge Graph (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 172 citations
Saha et al.
Deep Dyna-q: Integrating Planning For Task-completion Dialogue Policy Learning (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 177 citations
Peng et al.
Unsupervised Discrete Sentence Representation Learning For Interpretable Neural Dialog Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 136 citations
Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi
Retrieve And Refine: Improved Sequence Generation Models For Dialogue (2018) • Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI • 176 citations
Jason Weston, Emily Dinan, Alexander H. Miller
Dialog-context Aware End-to-end Speech Recognition (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 40 citations
Suyoun Kim, Florian Metze
User Modeling For Task Oriented Dialogues (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 45 citations
Gur et al.
Interpretation Of Natural Language Rules In Conversational Machine Reading (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 141 citations
Saeidi et al.
An End-to-end Approach For Handling Unknown Slot Values In Dialogue State Tracking (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 148 citations
Puyang Xu, Qi Hu
Polite Dialogue Generation Without Parallel Data (2018) • Transactions of the Association for Computational Linguistics • 166 citations
Tong Niu, Mohit Bansal
Flowqa: Grasping Flow In History For Conversational Machine Comprehension (2018) • Arxiv • 63 citations
Hsin-Yuan Huang, Eunsol Choi, Wen-Tau Yih
Training Millions Of Personalized Dialogue Agents (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 205 citations
Mazaré et al.
Multiwoz -- A Large-scale Multi-domain Wizard-of-oz Dataset For Task-oriented Dialogue Modelling (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 312 citations
Budzianowski et al.
Wizard Of Wikipedia: Knowledge-powered Conversational Agents (2018) • Arxiv • 479 citations
Dinan et al.
Advancing The State Of The Art In Open Domain Dialog Systems Through The Alexa Prize (2018) • Arxiv • 61 citations
Khatri et al.
Towards Exploiting Background Knowledge For Building Conversation Systems (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 158 citations
Moghe et al.
Towards Deep Conversational Recommendations (2018) • Arxiv • 123 citations
Li et al.
Modeling Multi-turn Conversation With Deep Utterance Aggregation (2018) • COLING 2018, pages 3740-3752 • 93 citations
Zhang et al.
Microsoft Dialogue Challenge: Building End-to-end Task-completion Dialogue Systems (2018) • Arxiv • 60 citations
Li et al.
From Eliza To Xiaoice: Challenges And Opportunities With Social Chatbots (2018) • Frontiers of Information Technology & Electronic Engineering • 678 citations
Heung-Yeung Shum, Xiaodong He, Di Li
A Hierarchical Latent Structure For Variational Conversation Modeling (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 102 citations
Yookoon Park, Jaemin Cho, Gunhee Kim
Evorus: A Crowd-powered Conversational Assistant Built To Automate Itself Over Time (2018) • CHI '18: CHI Conference on Human Factors in Computing Systems • 48 citations
Ting-Hao 'Kenneth' Huang, Joseph Chee Chang, Jeffrey P. Bigham
Generating Informative And Diverse Conversational Responses Via Adversarial Information Maximization (2018) • Arxiv • 181 citations
Zhang et al.
Generating More Interesting Responses In Neural Conversation Models With Distributional Constraints (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 90 citations
Baheti et al.
Chatpainter: Improving Text To Image Generation Using Dialogue (2018) • Arxiv • 78 citations
Sharma et al.
Extending Neural Generative Conversational Model Using External Knowledge Sources (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 69 citations
Prasanna Parthasarathi, Joelle Pineau
Conversations Gone Awry: Detecting Early Signs Of Conversational Failure (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 174 citations
Zhang et al.
Adversarial Over-sensitivity And Over-stability Strategies For Dialogue Models (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 61 citations
Tong Niu, Mohit Bansal
Global-locally Self-attentive Dialogue State Tracker (2018) • Arxiv • 70 citations
Victor Zhong, Caiming Xiong, Richard Socher
End-to-end Optimization Of Goal-driven And Visually Grounded Dialogue Systems (2017) • Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence • 88 citations
Strub et al.
RUBER: An Unsupervised Method For Automatic Evaluation Of Open-domain Dialog Systems (2017) • Arxiv • 118 citations
Tao et al.
Hierarchical Recurrent Attention Network For Response Generation (2017) • Arxiv • 116 citations
Xing et al.
Flexible End-to-end Dialogue System For Knowledge Grounded Conversation (2017) • Arxiv • 88 citations
Zhu et al.
Latent Variable Dialogue Models And Their Diversity (2017) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 50 citations
Kris Cao, Stephen Clark
Colors In Context: A Pragmatic Neural Model For Grounded Language Understanding (2017) • Transactions of the Association for Computational Linguistics • 104 citations
Monroe et al.
End-to-end Task-completion Neural Dialogue Systems (2017) • Arxiv • 58 citations
Li et al.
Personalization In Goal-oriented Dialog (2017) • Arxiv • 63 citations
Chaitanya K. Joshi, Fei Mi, Boi Faltings
Learning A Neural Semantic Parser From User Feedback (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 54 citations
Iyer et al.
Deal Or No Deal? End-to-end Learning For Negotiation Dialogues (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Lewis et al.
Key-value Retrieval Networks For Task-oriented Dialogue (2017) • Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue • 51 citations
Mihail Eric, Christopher D. Manning
Neural Response Generation With Dynamic Vocabularies (2017) • Arxiv • 50 citations
Wu et al.
A Knowledge-grounded Neural Conversation Model (2017) • Proceedings of the AAAI Conference on Artificial Intelligence • 234 citations
Ghazvininejad et al.
Affect-lm: A Neural Language Model For Customizable Affective Text Generation (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Ghosh et al.
Just ASK: Building An Architecture For Extensible Self-service Spoken Language Understanding (2017) • Arxiv • 56 citations
Kumar et al.
Attentive Memory Networks: Efficient Machine Reading For Conversational Search (2017) • Proceedings of the 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR17), Tokyo, Japan, August 11, 2017 • 40 citations
Tom Kenter, Maarten de Rijke
Best Of Both Worlds: Transferring Knowledge From Discriminative Learning To A Generative Visual Dialog Model (2017) • Arxiv • 86 citations
Lu et al.
Rasa: Open Source Language Understanding And Dialogue Management (2017) • Arxiv • 139 citations
Bocklisch et al.
Learning Discourse-level Diversity For Neural Dialog Models Using Conditional Variational Autoencoders (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 704 citations
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
Image-grounded Conversations: Multimodal Context For Natural Question And Response Generation (2017) • Arxiv • 117 citations
Mostafazadeh et al.
Parlai: A Dialog Research Software Platform (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 108 citations
Miller et al.
Steering Output Style And Topic In Neural Response Generation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 66 citations
Wang et al.
Adversarial Evaluation Of Dialogue Models (2017) • Arxiv • 66 citations
Anjuli Kannan, Oriol Vinyals
Generating High-quality And Informative Conversation Responses With Sequence-to-sequence Models (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 187 citations
Shao et al.
Deep Reinforcement Learning For Dialogue Generation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1034 citations
Li et al.
Multiresolution Recurrent Neural Networks: An Application To Dialogue Response Generation (2016) • Arxiv • 73 citations
Serban et al.
Learning End-to-end Goal-oriented Dialog (2016) • Arxiv • 75 citations
Antoine Bordes, Y-Lan Boureau, Jason Weston
A Network-based End-to-end Trainable Task-oriented Dialogue System (2016) • Arxiv • 170 citations
Wen et al.
Learning Through Dialogue Interactions By Asking Questions (2016) • Arxiv • 72 citations
Li et al.
Conversational Contextual Cues: The Case Of Personalization And History For Response Ranking (2016) • Arxiv • 57 citations
Al-Rfou et al.
Dialogue Learning With Human-in-the-loop (2016) • Arxiv • 46 citations
Li et al.
A Persona-based Neural Conversation Model (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 889 citations
Li et al.
End-to-end Lstm-based Dialog Control Optimized With Supervised And Reinforcement Learning (2016) • Arxiv • 122 citations
Jason D. Williams, Geoffrey Zweig
Dialog-based Language Learning (2016) • Arxiv • 69 citations
Jason Weston
Conditional Generation And Snapshot Learning In Neural Dialogue Systems (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 73 citations
Wen et al.
Topic Aware Neural Response Generation (2016) • Arxiv • 317 citations
Xing et al.

CVPR 280 papers

Eyes Wide Shut? Exploring The Visual Shortcomings Of Multimodal Llms (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Tong et al.
Videocrafter2: Overcoming Data Limitations For High-quality Video Diffusion Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Chen et al.
Shvit: Single-head Vision Transformer With Memory Efficient Macro Design (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Seokju Yun, Youngmin Ro
Boosting Continual Learning Of Vision-language Models Via Mixture-of-experts Adapters (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Yu et al.
Coconut: Modernizing COCO Segmentation (2024) • No Venue
Deng et al.
Instancediffusion: Instance-level Control For Image Generation (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Wang et al.
Vit-comer: Vision Transformer With Convolutional Multi-scale Feature Interaction For Dense Predictions (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 80 citations
Xia et al.
Expandable Subspace Ensemble For Pre-trained Model-based Class-incremental Learning (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Zhou et al.
SNIFFER: Multimodal Large Language Model For Explainable Out-of-context Misinformation Detection (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Qi et al.
Editable Scene Simulation For Autonomous Driving Via Collaborative Llm-agents (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Wei et al.
Onetracker: Unifying Visual Object Tracking With Foundation Models And Efficient Tuning (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 71 citations
Hong et al.
Salience DETR: Enhancing Detection Transformer With Hierarchical Salience Filtering Refinement (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 59 citations
Hou et al.
Prompting Large Language Models With Rationale Heuristics For Knowledge-based Visual Question Answering (2024) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 153 citations
Hu et al.
Adapting Visual-language Models For Generalizable Anomaly Detection In Medical Images (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Huang et al.
Omg-seg: Is One Model Good Enough For All Segmentation? (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Promptkd: Unsupervised Prompt Distillation For Vision-language Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
Towards Universal Fake Image Detectors That Generalize Across Generative Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 175 citations
Utkarsh Ojha, Yuheng Li, Yong Jae Lee
Toward Verifiable And Reproducible Human Evaluation For Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Otani et al.
Efficientvit: Memory Efficient Vision Transformer With Cascaded Group Attention (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 522 citations
Liu et al.
Revisiting Temporal Modeling For Clip-based Image-to-video Knowledge Transferring (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Liu et al.
Videomae V2: Scaling Video Masked Autoencoders With Dual Masking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 318 citations
Wang et al.
Visual Language Pretrained Multiple Instance Zero-shot Transfer For Histopathology Images (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 79 citations
Lu et al.
Visual Prompt Multi-modal Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Zhu et al.
Geolayoutlm: Geometric Pre-training For Visual Information Extraction (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Luo et al.
Diffuse, Attend, And Segment: Unsupervised Zero-shot Segmentation Using Stable Diffusion (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Tian et al.
HOICLIP: Efficient Knowledge Transfer For HOI Detection With Vision-language Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Ning et al.
Diversity Is Definitely Needed: Improving Model-agnostic Zero-shot Classification Via Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 41 citations
Shipard et al.
CORA: Adapting CLIP For Open-vocabulary Detection With Region Prompting And Anchor Pre-matching (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 103 citations
Wu et al.
Meshgpt: Generating Triangle Meshes With Decoder-only Transformers (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Siddiqui et al.
Referring Multi-object Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 66 citations
Wu et al.
Learning To Exploit Temporal Structure For Biomedical Vision-language Processing (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Bannur et al.
Align Your Latents: High-resolution Video Synthesis With Latent Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 438 citations
Blattmann et al.
Clip2scene: Towards Label-efficient 3D Scene Understanding By CLIP (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Chen et al.
LL3DA: Visual Interactive Instruction Tuning For Omni-3d Understanding, Reasoning, And Planning (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Chen et al.
Object-aware Distillation Pyramid For Open-vocabulary Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Wang et al.
CVT-SLR: Contrastive Visual-textual Transformation For Sign Language Recognition With Variational Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Zheng et al.
CLIP The Gap: A Single Domain Generalization Approach For Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Vidit Vidit, Martin Engilberge, Mathieu Salzmann
Selective Structured State-spaces For Long-form Video Understanding (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Wang et al.
Turning A CLIP Model Into A Scene Text Detector (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Yu et al.
RLHF-V: Towards Trustworthy Mllms Via Behavior Alignment From Fine-grained Correctional Human Feedback (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Yu et al.
A Pilot Study Of Query-free Adversarial Attack Against Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 44 citations
Haomin Zhuang, Yihua Zhang, Sijia Liu
Lmdrive: Closed-loop End-to-end Driving With Large Language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 76 citations
Shao et al.
Detecting And Grounding Multi-modal Media Manipulation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Rui Shao, Tianxing Wu, Ziwei Liu
Depgraph: Towards Any Structural Pruning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Fang et al.
GALIP: Generative Adversarial Clips For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Tao et al.
Vita-clip: Video And Text Adaptive CLIP Via Multimodal Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Wasim et al.
Joint Visual Grounding And Tracking With Natural Language Specification (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 88 citations
Zhou et al.
Deepsolo++: Let Transformer Decoder With Explicit Points Solo For Multilingual Text Spotting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Ye et al.
Imagebind: One Embedding Space To Bind Them All (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 552 citations
Girdhar et al.
Detclipv2: Scalable Open-vocabulary Object Detection Pre-training Via Word-region Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Yao et al.
Visual-language Prompt Tuning With Knowledge-guided Context Optimization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Hantao Yao, Rui Zhang, Changsheng Xu
Text With Knowledge Graph Augmented Transformer For Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Gu et al.
Hallusionbench: An Advanced Diagnostic Suite For Entangled Language Hallucination And Visual Illusion In Large Vision-language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Guan et al.
Align And Attend: Multimodal Summarization With Dual Contrastive Losses (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
He et al.
CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 93 citations
Sain et al.
Vid2seq: Large-scale Pretraining Of A Visual Language Model For Dense Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 179 citations
Yang et al.
Diversity-aware Meta Visual Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Huang et al.
OPERA: Alleviating Hallucination In Multi-modal Large Language Models Via Over-trust Penalty And Retrospection-allocation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Huang et al.
Vtimellm: Empower LLM To Grasp Video Moments (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Huang et al.
Flatformer: Flattened Window Attention For Efficient Point Cloud Transformer (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 83 citations
Liu et al.
Winclip: Zero-/few-shot Anomaly Classification And Segmentation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 223 citations
Jeong et al.
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models For Domain Generalized Semantic Segmentation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Wei et al.
Timechat: A Time-sensitive Multimodal Large Language Model For Long Video Understanding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Ren et al.
Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 204 citations
Ding Jiang, Mang Ye
Hallucination Augmented Contrastive Learning For Multimodal Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jiang et al.
Pixellm: Pixel Reasoning With Large Multimodal Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Ren et al.
Chat-univi: Unified Visual Representation Empowers Large Language Models With Image And Video Understanding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Jin et al.
Video-text As Game Players: Hierarchical Banzhaf Interaction For Cross-modal Representation Learning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jin et al.
Universal Instance Perception As Object Discovery And Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Yan et al.
Zero-shot Everything Sketch-based Image Retrieval, And In Explainable Style (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Lin et al.
Crowdclip: Unsupervised Crowd Counting Via Vision-language Model (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 58 citations
Liang et al.
Multimodality Helps Unimodality: Cross-modal Few-shot Learning With Multimodal Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 97 citations
Lin et al.
Efficient Domain Adaptation For Speech Foundation Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Li et al.
Decoupled Multimodal Distilling For Emotion Recognition (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 138 citations
Yong Li, Yuanzhi Wang, Zhen Cui
GLIGEN: Open-set Grounded Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 364 citations
Li et al.
Manipllm: Embodied Multimodal Large Language Model For Object-centric Robotic Manipulation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
Lite DETR: An Interleaved Multi-scale Encoder For Efficient DETR (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 95 citations
Li et al.
Seed-bench: Benchmarking Multimodal Llms With Generative Comprehension (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 92 citations
Li et al.
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 65 citations
Xu et al.
Scaling Up Gans For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 313 citations
Kang et al.
VILA: Learning Image Aesthetics From User Comments With Vision-language Pretraining (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Ke et al.
Prompt, Generate, Then Cache: Cascade Of Foundation Models Makes Strong Few-shot Learners (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 126 citations
Zhang et al.
Freestyle Layout-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Xue et al.
Geochat: Grounded Large Vision-language Model For Remote Sensing (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Kuckreja et al.
Transferable Adversarial Attacks On Vision Transformers With Token Gradient Regularization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Zhang et al.
Open-vocabulary Panoptic Segmentation With Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 266 citations
Xu et al.
Filtering, Distillation, And Hard Negatives For Vision-language Pre-training (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Radenovic et al.
Mitigating Object Hallucinations In Large Vision-language Models Through Visual Contrastive Decoding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 78 citations
Leng et al.
Compositional Temporal Grounding With Structured Variational Cross-graph Correspondence Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Li et al.
Clip-event: Connecting Text And Images With Event Structures (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Li et al.
Cross-modal Clinical Graph Transformer For Ophthalmic Report Generation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Envedit: Environment Editing For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Jialu Li, Hao Tan, Mohit Bansal
Invariant Grounding For Video Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 97 citations
Li et al.
Visual-language Navigation Pretraining Via Prompt-based Environmental Self-exploration (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liang et al.
Proposalclip: Unsupervised Open-category Object Proposal Generation Via Exploiting CLIP Cues (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Shi et al.
Vision Transformers Are Parameter-efficient Audio-visual Learners (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Lin et al.
ADAPT: Vision-language Navigation With Modality-aligned Action Prompts (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Lin et al.
Transrac: Encoding Multi-scale Temporal Correlation With Transformers For Repetitive Action Counting (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Hu et al.
Pushing The Limits Of Simple Pipelines For Few-shot Learning: External Data And Fine-tuning Make A Difference (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 148 citations
Hu et al.
Multi-view Transformer For 3D Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 88 citations
Huang et al.
Swintextspotter: Scene Text Spotting Via Better Synergy Between Text Detection And Text Recognition (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 128 citations
Huang et al.
Safe Self-refinement For Transformer-based Domain Adaptation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 96 citations
Sun et al.
Language As Queries For Referring Video Object Segmentation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 123 citations
Wu et al.
Tubedetr: Spatio-temporal Video Grounding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Yang et al.
Vision-language Pre-training With Triple Contrastive Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Yang et al.
Xmp-font: Self-supervised Cross-modality Pre-training For Few-shot Font Generation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Liu et al.
UMT: Unified Multi-modal Transformers For Joint Video Moment Retrieval And Highlight Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Liu et al.
Towards Transferable Unrestricted Adversarial Examples With Minimum Changes (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Fangcheng Liu, Chao Zhang, Hongyang Zhang
Partslip: Low-shot Part Segmentation For 3D Point Clouds Via Pretrained Image-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liu et al.
Pseudo-q: Generating Pseudo Language Queries For Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Jiang et al.
DRT: A Lightweight Single Image Deraining Recursive Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 60 citations
Yuanchu Liang, Saeed Anwar, Yang Liu
MSTR: Multi-scale Transformer For End-to-end Human-object Interaction Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Kim et al.
Conditional Prompt Learning For Vision-language Models (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 1126 citations
Zhou et al.
Towards Weakly-supervised Text Spotting Using A Multi-task Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Kittenplon et al.
ULIP: Learning A Unified Representation Of Language, Images, And Point Clouds For 3D Understanding (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 169 citations
Xue et al.
Beyond A Pre-trained Object Detector: Cross-modal Textual And Visual Context For Image Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Chia-Wen Kuo, Zsolt Kira
Groupvit: Semantic Segmentation Emerges From Text Supervision (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 352 citations
Xu et al.
Improving Mispronunciation Detection With Wav2vec2-based Momentum Pseudo-labeling For Accentedness And Intelligibility Assessment (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Yang et al.
Training-free Transformer Architecture Search (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Zhou et al.
Multimodal Token Fusion For Vision Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 173 citations
Wang et al.
Fairness-aware Adversarial Perturbation Towards Bias Mitigation For Deployed Deep Models (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Wang et al.
Counterfactual Cycle-consistent Learning For Instruction Following And Generation In Vision-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Wang et al.
Mult: An End-to-end Multitask Learning Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Bhattacharjee et al.
Revisiting The "video" In Video-language Understanding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 106 citations
Buch et al.
LASP: Text-to-text Optimization For Language-aware Soft Prompting Of Vision & Language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Adrian Bulat, Georgios Tzimiropoulos
Vision Transformer Slimming: Multi-dimension Searching In Continuous Optimization Space (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Chavan et al.
Activating More Pixels In Image Super-resolution Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 742 citations
Chen et al.
Gatehub: Gated History Unit With Background Suppression For Online Action Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Chen et al.
Think Global, Act Local: Dual-scale Graph Transformer For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Chen et al.
A Simple Multi-modality Transfer Learning Baseline For Sign Language Translation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 120 citations
Chen et al.
Bidirectional Cross-modal Knowledge Exploration For Video Recognition With Pre-trained Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Wu et al.
Winoground: Probing Vision And Language Models For Visio-linguistic Compositionality (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Thrush et al.
Task Adaptive Parameter Sharing For Multi-task Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Wallingford et al.
X-trans2cap: Cross-modal Knowledge Transfer Using Transformer For 3D Dense Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Yuan et al.
Vista: Vision And Scene Text Aggregation For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Cheng et al.
Tableformer: Table Structure Understanding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Nassar et al.
Clip-art: Contrastive Pre-training For Fine-grained Art Classification (2022) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 88 citations
Marcos V. Conde, Kerem Turgutlu
I2mvformer: Large Language Model Generated Multi-view Document Supervision For Zero-shot Image Classification (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Naeem et al.
Task Residual For Tuning Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yu et al.
MAGVIT: Masked Generative Video Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 93 citations
Yu et al.
EMOCA: Emotion Driven Monocular Face Capture And Animation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Radek Danecek, Michael J. Black, Timo Bolkart
Learning-by-narrating: Narrative Pre-training For Zero-shot Dialogue Comprehension (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Zhao et al.
Stripformer: Strip Transformer For Fast Image Deblurring (2022) • Lecture Notes in Computer Science • 155 citations
Tsai et al.
Mukea: Multimodal Knowledge Extraction And Accumulation For Knowledge-based Visual Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Ding et al.
Language-bridged Spatial-temporal Interaction For Referring Video Object Segmentation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Ding et al.
Are Multimodal Transformers Robust To Missing Modality? (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ma et al.
Teaching Structured Vision&language Concepts To Vision&language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Doveh et al.
Learning To Prompt For Open-vocabulary Object Detection With Vision-language Model (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 253 citations
Du et al.
Castling-vit: Compressing Self-attention Via Switching Towards Linear-angular Attention At Vision Transformer Inference (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
You et al.
A Text Attention Network For Spatial Deformation Robust Scene Text Image Super-resolution (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 86 citations
Jianqi Ma, Zhetong Liang, Lei Zhang
Unifying Vision, Text, And Layout For Universal Document Processing (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Tang et al.
3D-SPS: Single-stage 3D Visual Grounding Via Referred Point Progressive Selection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Luo et al.
BSRT: Improving Burst Super-resolution With Swin Transformer And Flow-guided Deformable Alignment (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 51 citations
Luo et al.
An Empirical Study Of End-to-end Video-language Transformers With Masked Visual Modeling (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Fu et al.
End-to-end Generative Pretraining For Multimodal Video Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Seo et al.
MIST: Multi-modal Iterative Spatial-temporal Transformer For Long-form Video Question Answering (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Gao et al.
Vision Transformer With Deformable Attention (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 649 citations
Xia et al.
Shifting More Attention To Visual Backbone: Query-modulated Refinement Networks For End-to-end Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 78 citations
Ye et al.
Bridging Video-text Retrieval With Multiple Choice Questions (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ge et al.
Deepsolo: Let Transformer Decoder With Explicit Points Solo For Text Spotting (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Ye et al.
COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 61 citations
Lu et al.
Prompt Distribution Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 188 citations
Lu et al.
Language Model Classifier Aligns Better With Physician Word Sensitivity Than Xgboost On Readmission Prediction (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Yang et al.
Future Transformer For Long-term Action Anticipation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Gong et al.
X-pool: Cross-modal Language-video Attention For Text-video Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 170 citations
Gorti et al.
Few-shot Object Detection With Fully Cross-transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 154 citations
Han et al.
Temporal Alignment Networks For Long-term Video (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Tengda Han, Weidi Xie, Andrew Zisserman
Neighborhood Attention Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 311 citations
Hassani et al.
Anyface: Free-style Text-to-face Synthesis And Manipulation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Sun et al.
NLX-GPT: A Model For Natural Language Explanations In Vision And Vision-language Tasks (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
Generalized Decoding For Pixel, Image, And Language (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Zou et al.
Bridging The Gap Between Learning In Discrete And Continuous Environments For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Hong et al.
Relation-aware Instance Refinement For Weakly Supervised Visual Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liu et al.
Context-aware Biaffine Localizing Network For Temporal Sentence Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 128 citations
Liu et al.
Locate Then Segment: A Strong Pipeline For Referring Image Segmentation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Jing et al.
Look Before You Leap: Learning Landmark Features For One-stage Visual Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 111 citations
Huang et al.
Seeing Out Of The Box: End-to-end Pre-training For Vision-language Representation Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Huang et al.
Learning Salient Boundary Feature For Anchor-free Temporal Action Localization (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 248 citations
Lin et al.
Swinbert: End-to-end Transformers With Sparse Attention For Video Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 208 citations
Lin et al.
LAVT: Language-aware Vision Transformer For Referring Image Segmentation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 260 citations
Yang et al.
Causal Attention For Vision-language Tasks (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Yang et al.
High-resolution Image Synthesis With Latent Diffusion Models (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 10513 citations
Rombach et al.
Zero-shot Adversarial Quantization (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yuang Liu, Wei Zhang, Jun Wang
Open-domain, Content-based, Multi-modal Fact-checking Of Out-of-context Images Via Online Resources (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
Variational Transformer Networks For Layout Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Diego Martin Arroyo, Janis Postels, Federico Tombari
Towards More Flexible And Accurate Object Tracking With Natural Language: Algorithms And Benchmark (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 198 citations
Wang et al.
Transformer Meets Tracker: Exploiting Temporal Context For Robust Visual Tracking (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 707 citations
Wang et al.
Less Is More: Clipbert For Video-and-language Learning Via Sparse Sampling (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 513 citations
Lei et al.
Dual Attention Suppression Attack: Generate Adversarial Camouflage In Physical World (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 172 citations
Wang et al.
Latr: Layout-aware Transformer For Scene-text VQA (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 67 citations
Biten et al.
Everything At Once -- Multi-modal Fusion Transformer For Video Retrieval (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Shvetsova et al.
End-to-end Referring Video Object Segmentation With Multimodal Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Adam Botach, Evgenii Zheltonozhskii, Chaim Baskin
Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Miech et al.
Human-like Controllable Image Captioning With Verb-specific Semantic Roles (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Chen et al.
Mobile-former: Bridging Mobilenet And Transformer (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 514 citations
Chen et al.
Visualgpt: Data-efficient Adaptation Of Pretrained Language Models For Image Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 133 citations
Chen et al.
Kaleido-bert: Vision-language Pre-training On Fashion Domain (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Zhuge et al.
Style-aware Normalized Loss For Improving Arbitrary Style Transfer (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Cheng et al.
Diverse Part Discovery: Occluded Person Re-identification With Part-aware Transformer (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Li et al.
Towards Corruption-agnostic Robust Domain Adaptation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Xu et al.
Point-bert: Pre-training 3D Point Cloud Transformers With Masked Point Modeling (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 565 citations
Yu et al.
Cswin Transformer: A General Vision Transformer Backbone With Cross-shaped Windows (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 957 citations
Dong et al.
An Empirical Study Of Training End-to-end Vision-and-language Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Dou et al.
Dytox: Transformers For Continual Learning With Dynamic Token Expansion (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 256 citations
Douillard et al.
Layerwise Optimization By Gradient Decomposition For Continual Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Tang et al.
Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Salvador et al.
Object-region Video Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Herzig et al.
Vinvl: Revisiting Visual Representations In Vision-language Models (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 640 citations
Zhang et al.
Open-book Video Captioning With Retrieve-copy-generate Network (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Zhang et al.
Delving Deep Into The Generalization Of Vision Transformers Under Distribution Shifts (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Zhang et al.
Improving Sequence-to-sequence Pre-training Via Sequence Span Rewriting (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 201 citations
Zhou et al.
Faceformer: Speech-driven 3D Facial Animation With Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 187 citations
Fan et al.
Cross-modal Contrastive Learning For Text-to-image Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 297 citations
Zhang et al.
Styleswin: Transformer-based GAN For High-resolution Image Generation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 202 citations
Zhang et al.
VX2TEXT: End-to-end Learning Of Video-based Text Generation From Multimodal Inputs (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Lin et al.
Transnas-bench-101: Improving Transferability And Generalizability Of Cross-task Neural Architecture Search (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Duan et al.
Normalized And Geometry-aware Self-attention Network For Image Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 246 citations
Guo et al.
Graph Structured Network For Image-text Matching (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 260 citations
Liu et al.
Roses Are Red, Violets Are Blue... But Should Vqa Expect Them To? (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Kervadec et al.
More Grounded Image Captioning By Distilling Image-text Matching Model (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Zhou et al.
Towards Learning A Generic Agent For Vision-and-language Navigation Via Pre-training (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 221 citations
Hao et al.
Creating Something From Nothing: Unsupervised Knowledge Distillation For Cross-modal Hashing (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Hu et al.
Show, Edit And Tell: A Framework For Editing Image Captions (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Fawaz Sammani, Luke Melas-Kyriazi
Where Does It Exist: Spatio-temporal Video Grounding For Multi-form Sentences (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Zhang et al.
MTL-NAS: Task-agnostic Neural Architecture Search Towards General-purpose Multi-task Learning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Gao et al.
Multi-task Collaborative Network For Joint Referring Expression Comprehension And Segmentation (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 268 citations
Luo et al.
In Defense Of Grid Features For Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 351 citations
Jiang et al.
Actbert: Learning Global-local Video-text Representations (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 392 citations
Linchao Zhu, Yi Yang
LSQ+: Improving Low-bit Quantization Through Learnable Offsets And Better Initialization (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 190 citations
Bhalgat et al.
On The General Value Of Evidence, And Bilingual Scene-text Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Wang et al.
Topological Planning With Transformers For Vision-and-language Navigation (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Chen et al.
Adversarial Robustness: From Self-supervised Pre-training To Fine-tuning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 167 citations
Chen et al.
Cops-ref: A New Dataset And Task On Compositional Referring Expression Comprehension (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Chen et al.
Counterfactual Samples Synthesizing For Robust Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Chen et al.
IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 391 citations
Chen et al.
Fine-grained Video-text Retrieval With Hierarchical Graph Reasoning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 306 citations
Chen et al.
Graph-structured Referring Expression Reasoning In The Wild (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 96 citations
Sibei Yang, Guanbin Li, Yizhou Yu
Mask Textspotter V3: Segmentation Proposal Network For Robust Scene Text Spotting (2020) • Lecture Notes in Computer Science • 191 citations
Liao et al.
VIOLIN: A Large-scale Dataset For Video-and-language Inference (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Liu et al.
UP-DETR: Unsupervised Pre-training For Object Detection With Transformers (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Dai et al.
Object Relational Graph With Teacher-recommended Learning For Video Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 310 citations
Zhang et al.
Vision-dialog Navigation By Exploring Cross-modal Memory (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Zhu et al.
Transform And Tell: Entity-aware News Image Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Alasdair Tran, Alexander Mathews, Lexing Xie
Mirrorgan: Learning Text-to-image Generation By Redescription (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 589 citations
Qiao et al.
Polysemous Visual-semantic Embedding For Cross-modal Retrieval (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 243 citations
Yale Song, Mohammad Soleymani
Deep Modular Co-attention Networks For Visual Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 862 citations
Yu et al.
Information Maximizing Visual Question Generation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Tactical Rewind: Self-correction Via Backtracking In Vision-and-language Navigation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Ke et al.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 545 citations
Marino et al.
The Regretful Agent: Heuristic-aided Navigation Through Progress Estimation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 168 citations
Ma et al.
Heterogeneous Memory Enhanced Multimodal Attention Model For Video Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 264 citations
Fan et al.
Meshed-memory Transformer For Image Captioning (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 861 citations
Cornia et al.
Spatio-temporal Dynamics And Semantic Attribute Enriched Visual Encoding For Video Captioning (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Aafaq et al.
Object-driven Text-to-image Synthesis Via Adversarial Training (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 309 citations
Li et al.
Grounding Human-to-vehicle Advice For Self-driving Vehicles (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Kim et al.
Improving Referring Expression Grounding With Cross-modal Attention-guided Erasing (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 189 citations
Liu et al.
DM-GAN: Dynamic Memory Generative Adversarial Networks For Text-to-image Synthesis (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 549 citations
Zhu et al.
Clevr-ref+: Diagnosing Visual Reasoning With Referring Expressions (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Liu et al.
Shape Robust Text Detection With Progressive Scale Expansion Network (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 798 citations
Wang et al.
Iterative Answer Prediction With Pointer-augmented Multimodal Transformers For Textvqa (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Hu et al.
GQA: A New Dataset For Real-world Visual Reasoning And Compositional Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Drew A. Hudson, Christopher D. Manning
Neural Baby Talk (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 442 citations
Lu et al.
Regularizing Rnns For Caption Generation By Reconstructing The Past With The Present (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 90 citations
Chen et al.
Visual Question Reasoning On General Dependency Tree (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 42 citations
Cao et al.
An End-to-end Textspotter With Explicit Alignment And Attention (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 237 citations
He et al.
Mattnet: Modular Attention Network For Referring Expression Comprehension (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 788 citations
Yu et al.
End-to-end Dense Video Captioning With Masked Transformer (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Zhou et al.
Jointly Localizing And Describing Events For Dense Video Captioning (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 168 citations
Li et al.
FOTS: Fast Oriented Text Spotting With A Unified Network (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 563 citations
Liu et al.
Context-aware Captions From Context-agnostic Supervision (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Vedantam et al.
Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 289 citations
Gu et al.
TGIF-QA: Toward Spatio-temporal Reasoning In Visual Question Answering (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 435 citations
Jang et al.
Detecting Oriented Text In Natural Images By Linking Segments (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 800 citations
Baoguang Shi, Xiang Bai, Serge Belongie
Incorporating Copying Mechanism In Image Captioning For Learning Novel Objects (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Yao et al.
Don't Just Assume; Look And Answer: Overcoming Priors For Visual Question Answering (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 40 citations
Agrawal et al.
Fooling Vision And Language Models Despite Localization And Attention Mechanism (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Xu et al.
Skeleton Key: Image Captioning By Skeleton-attribute Decomposition (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Wang et al.
End-to-end Concept Word Detection For Video Captioning, Retrieval, And Question Answering (2016) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 146 citations
Yu et al.
TGIF: A New Dataset And Benchmark On Animated GIF Description (2016) • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 197 citations
Li et al.
Learning Deep Representations Of Fine-grained Visual Descriptions (2016) • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 769 citations
Reed et al.

— D —

Datasets 1692 papers

DRIVE: Data Curation Best Practices For Reinforcement Learning With Verifiable Reward In Competitive Code Generation (2025) • No Venue
Zhu et al.
Multiagentbench: Evaluating The Collaboration And Competition Of LLM Agents (2025) • No Venue
Zhu et al.
Longwriter-v: Enabling Ultra-long And High-fidelity Generation In Vision-language Models (2025) • No Venue
Tu et al.
Time Blindness: Why Video-language Models Can't See What Humans Can? (2025) • No Venue
Upadhyay et al.
Drivel-ology: Challenging Llms With Interpreting Nonsense With Depth (2025) • No Venue
Wang et al.
CODESYNC: Synchronizing Large Language Models With Dynamic Code Evolution At Scale (2025) • No Venue
Wang et al.
Chain-of-retrieval Augmented Generation (2025) • No Venue
Wang et al.
Cinemaster: A 3d-aware And Controllable Framework For Cinematic Text-to-video Generation (2025) • No Venue
Wang et al.
Cmphysbench: A Benchmark For Evaluating Large Language Models In Condensed Matter Physics (2025) • No Venue
Wang et al.
Critique Fine-tuning: Learning To Critique Is More Effective Than Learning To Imitate (2025) • No Venue
Yubo Wang, Xiang Yue, Wenhu Chen
Coser: Coordinating Llm-based Persona Simulation Of Established Roles (2025) • No Venue
Wang et al.
Fantasyportrait: Enhancing Multi-character Portrait Animation With Expression-augmented Diffusion Transformers (2025) • No Venue
Wang et al.
Fostering Video Reasoning Via Next-event Prediction (2025) • No Venue
Wang et al.
GPT-IMAGE-EDIT-1.5M: A Million-scale, Gpt-generated Image Dataset (2025) • No Venue
Wang et al.
Internsvg: Towards Unified SVG Tasks With Multimodal Large Language Models (2025) • No Venue
Wang et al.
F2LLM Technical Report: Matching SOTA Embedding Performance With 6 Million Open-source Data (2025) • No Venue
Zhang et al.
Megamath: Pushing The Limits Of Open Math Corpora (2025) • No Venue
Zhou et al.
Neural-driven Image Editing (2025) • No Venue
Zhou et al.
Omniworld: A Multi-domain And Multi-modal Dataset For 4D World Modeling (2025) • No Venue
Zhou et al.
Roborefer: Towards Spatial Referring With Reasoning In Vision-language Models For Robotics (2025) • No Venue
Zhou et al.
Phi-ground Tech Report: Advancing Perception In GUI Grounding (2025) • No Venue
Zhang et al.
GKG-LLM: A Unified Framework For Generalized Knowledge Graph Construction (2025) • No Venue
Zhang et al.
Qwen3 Embedding: Advancing Text Embedding And Reranking Through Foundation Models (2025) • No Venue
Zhang et al.
Speakervid-5m: A Large-scale High-quality Dataset For Audio-visual Dyadic Interactive Human Generation (2025) • No Venue
Zhang et al.
Videollama 3: Frontier Multimodal Foundation Models For Image And Video Understanding (2025) • No Venue
Zhang et al.
Unified Multimodal Understanding And Generation Models: Advances, Challenges, And Opportunities (2025) • No Venue
Zhang et al.
Babel: Open Multilingual Large Language Models Serving Over 90% Of Global Speakers (2025) • No Venue
Zhao et al.
Omnialign-v: Towards Enhanced Alignment Of Mllms With Human Preference (2025) • No Venue
Zhao et al.
Lex-art: Rethinking Text Generation Via Scalable High-quality Data Synthesis (2025) • No Venue
Zhao et al.
R1-omni: Explainable Omni-multimodal Emotion Recognition With Reinforcing Learning (2025) • No Venue
Jiaxing Zhao, Xihan Wei, Liefeng Bo
Promptcot 2.0: Scaling Prompt Synthesis For Large Language Model Reasoning (2025) • No Venue
Zhao et al.
One Token To Fool Llm-as-a-judge (2025) • No Venue
Zhao et al.
SAIL-VL2 Technical Report (2025) • No Venue
Yin et al.
Aligning Multimodal LLM With Human Preference: A Survey (2025) • No Venue
Yu et al.
Demystifying Reinforcement Learning In Agentic Reasoning (2025) • No Venue
Yu et al.
How Far Are Vlms From Visual Spatial Intelligence? A Benchmark-driven Perspective (2025) • No Venue
Yu et al.
Vrbench: A Benchmark For Multi-step Reasoning In Long Narrative Videos (2025) • No Venue
Yu et al.
Unicorn: Text-only Data Synthesis For Vision Language Model Training (2025) • No Venue
Yu et al.
Z1: Efficient Test-time Scaling With Code (2025) • No Venue
Yu et al.
Agent-r: Training Language Model Agents To Reflect Via Iterative Self-training (2025) • No Venue
Yuan et al.
Sa2va: Marrying SAM2 With Llava For Dense Grounded Understanding Of Images And Videos (2025) • No Venue
Yuan et al.
Refeed: Multi-dimensional Summarization Refinement With Reflective Reasoning On Feedback (2025) • No Venue
Yun et al.
Multi-swe-bench: A Multilingual Benchmark For Issue Resolving (2025) • No Venue
Zan et al.
Aralingbench A Human-annotated Benchmark For Evaluating Arabic Linguistic Capabilities Of Large Language Models (2025) • No Venue
Zbib et al.
A Vision-language-action-critic Model For Robotic Real-world Reinforcement Learning (2025) • No Venue
Zhai et al.
Skywork-swe: Unveiling Data Scaling Laws For Software Engineering In Llms (2025) • No Venue
Zeng et al.
2.5 Years In Class: A Multimodal Textbook For Vision-language Pretraining (2025) • No Venue
Zhang et al.
Bee: A High-quality Corpus And Full-stack Suite To Unlock Advanced Fully Open Mllms (2025) • No Venue
Zhang et al.
Basereward: A Strong Baseline For Multimodal Reward Model (2025) • No Venue
Zhang et al.
Autoenv: Automated Environments For Measuring Cross-environment Agent Learning (2025) • No Venue
Zhang et al.
Domain2vec: Vectorizing Datasets To Find The Optimal Data Mixture Without Training (2025) • No Venue
Zhang et al.
Mathcoder-vl: Bridging Vision And Code For Enhanced Multimodal Mathematical Reasoning (2025) • No Venue
Wang et al.
Multishotmaster: A Controllable Multi-shot Video Generation Framework (2025) • No Venue
Wang et al.
Mr-align: Meta-reasoning Informed Factuality Alignment For Large Reasoning Models (2025) • No Venue
Wang et al.
Opencua: Open Foundations For Computer-use Agents (2025) • No Venue
Wang et al.
Skywork-vl Reward: An Effective Reward Model For Multimodal Understanding And Reasoning (2025) • No Venue
Wang et al.
Pref-grpo: Pairwise Preference Reward-based GRPO For Stable Text-to-image Reinforcement Learning (2025) • No Venue
Wang et al.
Roboomni: Proactive Robot Manipulation In Omni-modal Context (2025) • No Venue
Wang et al.
Scaling Pre-training To One Hundred Billion Data For Vision Language Models (2025) • No Venue
Wang et al.
Textatlas5m: A Large-scale Dataset For Dense Text Image Generation (2025) • No Venue
Wang et al.
Finevision: Open Data Is All You Need (2025) • No Venue
Wiedmann et al.
Vision-zero: Scalable VLM Self-improvement Via Strategic Gamified Self-play (2025) • No Venue
Wang et al.
Video-thinker: Sparking "thinking With Videos" Via Reinforcement Learning (2025) • No Venue
Wang et al.
Worldpm: Scaling Human Preference Modeling (2025) • No Venue
Wang et al.
Mocha: Towards Movie-grade Talking Character Synthesis (2025) • No Venue
Wei et al.
Rank1: Test-time Compute For Reranking In Information Retrieval (2025) • No Venue
Weller et al.
Seq Vs Seq: An Open Suite Of Paired Encoders And Decoders (2025) • No Venue
Weller et al.
3D Scene Generation: A Survey (2025) • No Venue
Wen et al.
Spot The Fake: Large Multimodal Model-based Synthetic Image Detection With Artifact Explanation (2025) • No Venue
Wen et al.
Widesearch: Benchmarking Agentic Broad Info-seeking (2025) • No Venue
Wong et al.
Lightgen: Efficient Image Generation Through Knowledge Distillation And Direct Preference Optimization (2025) • No Venue
Wu et al.
Less-to-more Generalization: Unlocking More Controllability By In-context Generation (2025) • No Venue
Wu et al.
Any2caption: Interpreting Any Condition To Caption For Controllable Video Generation (2025) • No Venue
Wu et al.
Omnigen2: Exploration To Advanced Multimodal Generation (2025) • No Venue
Wu et al.
Reasoning Or Memorization? Unreliable Results Of Reinforcement Learning Due To Data Contamination (2025) • No Venue
Wu et al.
Qwen-image Technical Report (2025) • No Venue
Wu et al.
Spatial-mllm: Boosting MLLM Capabilities In Visual-based Spatial Intelligence (2025) • No Venue
Wu et al.
Writingbench: A Comprehensive Benchmark For Generative Writing (2025) • No Venue
Wu et al.
BMMR: A Large-scale Bilingual Multimodal Multi-discipline Reasoning Dataset (2025) • No Venue
Xi et al.
Dense Retrievers Can Fail On Simple Queries: Revealing The Granularity Dilemma Of Embeddings (2025) • No Venue
Xu et al.
Leetcodedataset: A Temporal Dataset For Robust Evaluation And Efficient Training Of Code Llms (2025) • No Venue
Xia et al.
Open Data Synthesis For Deep Research (2025) • No Venue
Xia et al.
Retrieval-augmented Large Language Models For Financial Time Series Forecasting (2025) • No Venue
Xiao et al.
MIEB: Massive Image Embedding Benchmark (2025) • No Venue
Xiao et al.
Ui-genie: A Self-improving Approach For Iteratively Boosting Mllm-based Mobile GUI Agents (2025) • No Venue
Xiao et al.
Are Vlms Ready For Autonomous Driving? An Empirical Study From The Reliability, Data, And Metric Perspectives (2025) • No Venue
Xie et al.
Llms Can Get "brain Rot"! (2025) • No Venue
Xing et al.
Jodi: Unification Of Visual Generation And Understanding Via Joint Modeling (2025) • No Venue
Xu et al.
Kodcode: A Diverse, Challenging, And Verifiable Synthetic Dataset For Coding (2025) • No Venue
Xu et al.
Mind The Gap: Bridging Thought Leap For Improved Chain-of-thought Tuning (2025) • No Venue
Xu et al.
Visulogic: A Benchmark For Evaluating Visual Reasoning In Multi-modal Large Language Models (2025) • No Venue
Xu et al.
TOUCAN: Synthesizing 1.5M Tool-agentic Data From Real-world MCP Environments (2025) • No Venue
Xu et al.
Audio-flan: A Preliminary Release (2025) • No Venue
Xue et al.
Withanyone: Towards Controllable And ID Consistent Image Generation (2025) • No Venue
Xu et al.
Oceangym: A Benchmark Environment For Underwater Embodied Agents (2025) • No Venue
Xue et al.
Gpt-imgeval: A Comprehensive Benchmark For Diagnosing Gpt4o In Image Generation (2025) • No Venue
Yan et al.
Egolife: Towards Egocentric Life Assistant (2025) • No Venue
Yang et al.
Magma: A Foundation Model For Multimodal AI Agents (2025) • No Venue
Yang et al.
Steering Vision-language-action Models As Anti-exploration: A Test-time Scaling Approach (2025) • No Venue
Yang et al.
Table-r1: Inference-time Scaling For Table Reasoning (2025) • No Venue
Yang et al.
Too Good To Be Bad: On The Failure Of Llms To Role-play Villains (2025) • No Venue
Yi et al.
Through-the-mask: Mask-based Motion Trajectories For Image-to-video Generation (2025) • No Venue
Yariv et al.
Echo-4o: Harnessing The Power Of Gpt-4o Synthetic Images For Improved Image Generation (2025) • No Venue
Ye et al.
Seeing From Another Perspective: Evaluating Multi-view Understanding In Mllms (2025) • No Venue
Yeh et al.
Primitiveanything: Human-crafted 3D Primitive Assembly Generation With Auto-regressive Transformer (2025) • No Venue
Ye et al.
Shapellm-omni: A Native Multimodal LLM For 3D Generation And Understanding (2025) • No Venue
Ye et al.
Phi-4-mini Technical Report: Compact Yet Powerful Multimodal Language Models Via Mixture-of-loras (2025) • No Venue
Abouelenin et al.
Emergent Misalignment Via In-context Learning: Narrow In-context Examples Can Produce Broadly Misaligned Llms (2025) • No Venue
Afonin et al.
Language Models' Factuality Depends On The Language Of Inquiry (2025) • No Venue
Aggarwal et al.
Essential-web V1.0: 24T Tokens Of Organized Web Data (2025) • No Venue
Ai et al.
Sadeed: Advancing Arabic Diacritization Through Small Language Model (2025) • No Venue
Aldallal et al.
Atla Selene Mini: A General Purpose Evaluation Model (2025) • No Venue
Alexandru et al.
Smollm2: When Smol Goes Big -- Data-centric Training Of A Small Language Model (2025) • No Venue
Allal et al.
Amo-bench: Large Language Models Still Struggle In High School Math Competitions (2025) • No Venue
An et al.
Llava-onevision-1.5: Fully Open Framework For Democratized Multimodal Training (2025) • No Venue
An et al.
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds (2025) • No Venue
Anokhin et al.
Tabstar: A Foundation Tabular Model With Semantically Target-aware Representations (2025) • No Venue
Alan Arazi, Eilam Shapira, Roi Reichart
Towards Best Practices For Open Datasets For LLM Training (2025) • No Venue
Baack et al.
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents (2025) • No Venue
Badertdinov et al.
Eurobert: Scaling Multilingual Encoders For European Languages (2025) • No Venue
Boizard et al.
A Data-centric Framework For Addressing Phonetic And Prosodic Challenges In Russian Speech Generative Models (2025) • No Venue
Borodin et al.
Video Action Differencing (2025) • No Venue
Burgess et al.
Microvqa: A Multimodal Reasoning Benchmark For Microscopy-based Scientific Research (2025) • No Venue
Burgess et al.
Crowdsource, Crawl, Or Generate? Creating SEA-VL, A Multicultural Vision-language Dataset For Southeast Asia (2025) • No Venue
Cahyawijaya et al.
MORSE-500: A Programmatically Controllable Video Benchmark To Stress-test Multimodal Reasoning (2025) • No Venue
Cai et al.
Web-shepherd: Advancing Prms For Reinforcing Web Agents (2025) • No Venue
Chae et al.
Webscale-rl: Automated Data Pipeline For Scaling RL Data To Pretraining Levels (2025) • No Venue
Cen et al.
A3: Android Agent Arena For Mobile GUI Agents (2025) • No Venue
Chai et al.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-mesh Representation, And Evaluation Metrics (2025) • No Venue
Chae-Yeon et al.
Game-time: Evaluating Temporal Dynamics In Spoken Language Models (2025) • No Venue
Chang et al.
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning (2025) • No Venue
Chen et al.
Blip3-o: A Family Of Fully Open Unified Multimodal Models-architecture, Training And Dataset (2025) • No Venue
Chen et al.
Code2video: A Code-centric Paradigm For Educational Video Generation (2025) • No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
Halumem: Evaluating Hallucinations In Memory Systems Of Agents (2025) • No Venue
Chen et al.
FINEREASON: Evaluating And Improving Llms' Deliberate Reasoning Through Reflective Puzzle Solving (2025) • No Venue
Chen et al.
Fusionaudio-1.2m: Towards Fine-grained Audio Captioning With Multimodal Contextual Fusion (2025) • No Venue
Chen et al.
MIG: Automatic Data Selection For Instruction Tuning By Maximizing Information Gain In Semantic Space (2025) • No Venue
Chen et al.
Moca: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings (2025) • No Venue
Chen et al.
Opengpt-4o-image: A Comprehensive Dataset For Advanced Image Generation And Editing (2025) • No Venue
Chen et al.
Paper2web: Let's Make Your Paper Alive! (2025) • No Venue
Chen et al.
Sharegpt-4o-image: Aligning Multimodal Models With Gpt-4o-level Image Generation (2025) • No Venue
Chen et al.
Xverify: Efficient Answer Verifier For Reasoning Model Evaluations (2025) • No Venue
Chen et al.
Ui-ins: Enhancing GUI Grounding With Multi-perspective Instruction-as-reasoning (2025) • No Venue
Chen et al.
Videovista-culturallingo: 360° Horizons-bridging Cultures, Languages, And Domains In Video Comprehension (2025) • No Venue
Chen et al.
Multimodal Evaluation Of Russian-language Architectures (2025) • No Venue
Chervyakov et al.
System Prompt Optimization With Meta-learning (2025) • No Venue
Yumin Choi, Jinheon Baek, Sung Ju Hwang
Instruction-guided Lesion Segmentation For Chest X-rays With Automatically Generated Large-scale Dataset (2025) • No Venue
Choi et al.
WEAVE: Unleashing And Benchmarking The In-context Interleaved Comprehension And Generation (2025) • No Venue
Chow et al.
Overview Of The TREC 2021 Deep Learning Track (2025) • Arxiv • 58 citations
Craswell et al.
This Time Is Different: An Observability Perspective On Time Series Foundation Models (2025) • No Venue
Cohen et al.
Reinforcement Learning For Reasoning In Small Llms: What Works And What Doesn't (2025) • No Venue
Quy-Anh Dang, Chris Ngo
Meshcoder: Llm-powered Structured Mesh Code Generation From Point Clouds (2025) • No Venue
Dai et al.
Toolscope: An Agentic Framework For Vision-guided And Long-horizon Tool Use (2025) • No Venue
Mengjie Deng, Guanting Dong, Zhicheng Dou
Self-improvement In Multimodal Large Language Models: A Survey (2025) • No Venue
Deng et al.
CLIMB: Clustering-based Iterative Data Mixture Bootstrapping For Language Model Pre-training (2025) • No Venue
Diao et al.
Mmdocir: Benchmarking Multi-modal Retrieval For Long Documents (2025) • No Venue
Dong et al.
Motionsight: Boosting Fine-grained Motion Understanding In Multimodal Llms (2025) • No Venue
Du et al.
Megascience: Pushing The Frontiers Of Post-training Datasets For Science Reasoning (2025) • No Venue
Run-Ze Fan, Zengzhi Wang, Pengfei Liu
Missing Premise Exacerbates Overthinking: Are Reasoning Models Losing Critical Thinking Skill? (2025) • No Venue
Fan et al.
Flux-reason-6m & Prism-bench: A Million-scale Text-to-image Reasoning Dataset And Comprehensive Benchmark (2025) • No Venue
Fang et al.
Got: Unleashing Reasoning Capability Of Multimodal Large Language Model For Visual Generation And Editing (2025) • No Venue
Fang et al.
Grounding Computer Use Agents On Human Demonstrations (2025) • No Venue
Feizi et al.
Can Mllms Guide Me Home? A Benchmark Study On Fine-grained Visual Reasoning From Transit Maps (2025) • No Venue
Feng et al.
WILDCHAT-50M: A Deep Dive Into The Role Of Synthetic Data In Post-training (2025) • No Venue
Benjamin Feuer, Chinmay Hegde
Video-r1: Reinforcing Video Reasoning In Mllms (2025) • No Venue
Feng et al.
Listener-rewarded Thinking In Vlms For Image Preferences (2025) • No Venue
Gambashidze et al.
Cognitive Behaviors That Enable Self-improving Reasoners, Or, Four Habits Of Highly Effective Stars (2025) • No Venue
Gandhi et al.
A Strategic Coordination Framework Of Small Llms Matches Large Llms In Data Synthesis (2025) • No Venue
Gao et al.
R&B: Domain Regrouping And Data Mixture Balancing For Efficient Foundation Model Training (2025) • No Venue
Ge et al.
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts (2025) • No Venue
Ge et al.
Audio Flamingo 2: An Audio-language Model With Long-audio Understanding And Expert Reasoning Abilities (2025) • No Venue
Ghosh et al.
Lment: A Suite For Analyzing Knowledge In Language Models From Pretraining Data To Representations (2025) • No Venue
Gottesman et al.
Openthoughts: Data Recipes For Reasoning Models (2025) • No Venue
Guha et al.
ACADREASON: Exploring The Limits Of Reasoning Models With Academic Research Problems (2025) • No Venue
Gui et al.
Swe-factory: Your Automated Factory For Issue Resolution Training Data And Evaluation Benchmarks (2025) • No Venue
Guo et al.
Beyond The Last Answer: Your Reasoning Trace Uncovers More Than You Think (2025) • No Venue
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
Mesatask: Towards Task-driven Tabletop Scene Generation Via 3D Spatial Reasoning (2025) • No Venue
Hao et al.
MAGA: Massive Genre-audience Reformulation To Pretraining Corpus Expansion (2025) • No Venue
Xintong Hao, Ke Shen, Chenggang Li
Unireditbench: A Unified Reasoning-based Image Editing Benchmark (2025) • No Venue
Han et al.
Learnings From Scaling Visual Tokenizers For Reconstruction And Generation (2025) • No Venue
Hansen-Estruch et al.
Pasa: An LLM Agent For Comprehensive Academic Paper Search (2025) • No Venue
He et al.
Hardtests: Synthesizing High-quality Test Cases For LLM Coding (2025) • No Venue
He et al.
Videossr: Video Self-supervised Reinforcement Learning (2025) • No Venue
He et al.
CASS: Nvidia To AMD Transpilation With Data, Models, And Benchmark (2025) • No Venue
Heakl et al.
Mutarjim: Advancing Bidirectional Arabic-english Translation With A Small Language Model (2025) • No Venue
Hennara et al.
Wasm: A Pipeline For Constructing Structured Arabic Interleaved Multimodal Corpora (2025) • No Venue
Hennara et al.
Charting And Navigating Hugging Face's Model Atlas (2025) • No Venue
Horwitz et al.
Quest: Incentivizing Llms To Generate Difficult Problems (2025) • No Venue
Hu et al.
Finsearchcomp: Towards A Realistic, Expert-level Evaluation Of Financial Search And Reasoning (2025) • No Venue
Hu et al.
A Survey Of Scientific Large Language Models: From Data Foundations To Agent Frontiers (2025) • No Venue
Hu et al.
Video-mmmu: Evaluating Knowledge Acquisition From Multi-discipline Professional Videos (2025) • No Venue
Hu et al.
Benchmax: A Comprehensive Multilingual Evaluation Suite For Large Language Models (2025) • No Venue
Huang et al.
Loong: Synthesize Long Chain-of-thoughts At Scale Through Verifiers (2025) • No Venue
Huang et al.
Vision-r1: Incentivizing Reasoning Capability In Multimodal Large Language Models (2025) • No Venue
Huang et al.
Vistadpo: Video Hierarchical Spatial-temporal Direct Preference Optimization For Large Video Models (2025) • No Venue
Huang et al.
Sentinel: SOTA Model To Protect Against Prompt Injections (2025) • No Venue
Dror Ivry, Oran Nahum
The African Languages Lab: A Collaborative Approach To Advancing Low-resource African NLP (2025) • No Venue
Issaka et al.
Ambik: Dataset Of Ambiguous Tasks In Kitchen Environment (2025) • No Venue
Ivanova et al.
Reasoning Model Is Stubborn: Diagnosing Instruction Overriding In Reasoning Models (2025) • No Venue
Jang et al.
Adaptive Multi-agent Response Refinement In Conversational Systems (2025) • No Venue
Jeong et al.
CSVQA: A Chinese Multimodal Benchmark For Evaluating STEM Reasoning Capabilities Of Vlms (2025) • No Venue
Jian et al.
Omnispatial: Towards Comprehensive Spatial Reasoning Benchmark For Vision Language Models (2025) • No Venue
Jia et al.
Visualwebinstruct: Scaling Up Multimodal Instruction Data Through Web Search (2025) • No Venue
Jia et al.
Rynnvla-001: Using Human Demonstrations To Improve Robot Manipulation (2025) • No Venue
Jiang et al.
Omni-reward: Towards Generalist Omni-modal Reward Modeling With Free-form Preferences (2025) • No Venue
Jin et al.
Expect The Unexpected: Failsafe Long Context QA For Finance (2025) • No Venue
Kamble et al.
The Common Pile V0.1: An 8TB Dataset Of Public Domain And Openly Licensed Text (2025) • No Venue
Kandpal et al.
First Try Matters: Revisiting The Role Of Reflection In Reasoning Models (2025) • No Venue
Kang et al.
LEGION: Learning To Ground And Explain For Synthetic Image Detection (2025) • No Venue
Kang et al.
Robot-r1: Reinforcement Learning For Enhanced Embodied Reasoning In Robotics (2025) • No Venue
Kim et al.
Mol-llama: Towards General Understanding Of Molecules In Large Molecular Language Model (2025) • No Venue
Dongki Kim, Wonbin Lee, Sung Ju Hwang
From Scores To Skills: A Cognitive Diagnosis Framework For Evaluating Financial Large Language Models (2025) • No Venue
Kuang et al.
Nohumansrequired: Autonomous High-quality Image Editing Triplet Mining (2025) • No Venue
Kuprashevich et al.
Opensir: Open-ended Self-improving Reasoner (2025) • No Venue
Kwan et al.
Mini-o3: Scaling Up Reasoning Patterns And Interaction Turns For Visual Search (2025) • No Venue
Lai et al.
Rethinking Reward Models For Multi-domain Test-time Scaling (2025) • No Venue
Lee et al.
Stream3r: Scalable Sequential 3D Reconstruction With Causal Transformer (2025) • No Venue
Lan et al.
MMR1: Enhancing Multimodal Reasoning With Variance-aware Sampling And Open Resources (2025) • No Venue
Leng et al.
Miromind-m1: An Open-source Advancement In Mathematical Reasoning Via Context-aware Multi-stage Policy Optimization (2025) • No Venue
Li et al.
IGGT: Instance-grounded Geometry Transformer For Semantic 3D Reconstruction (2025) • No Venue
Li et al.
Droplet3d: Commonsense Priors From Videos Facilitate 3D Generation (2025) • No Venue
Li et al.
Drafterbench: Benchmarking Large Language Models For Tasks Automation In Civil Engineering (2025) • No Venue
Yinsheng Li, Zhen Dong, Yi Shao
Migician: Revealing The Magic Of Free-form Multi-image Grounding In Multimodal Large Language Models (2025) • No Venue
Li et al.
Ovo-bench: How Far Is Your Video-llms From Real-world Online Video Understanding? (2025) • No Venue
Li et al.
Sos1: O1 And R1-like Reasoning Llms Are Sum-of-square Solvers (2025) • No Venue
Li et al.
SWE-SQL: Illuminating LLM Pathways To Solve User SQL Issues In Real-world Applications (2025) • No Venue
Li et al.
Temporal Preference Optimization For Long-form Video Understanding (2025) • No Venue
Li et al.
Truth In The Few: High-value Data Selection For Efficient Multi-modal Reasoning (2025) • No Venue
Li et al.
Zebra-cot: A Dataset For Interleaved Vision Language Reasoning (2025) • No Venue
Li et al.
Describe Anything: Detailed Localized Image And Video Captioning (2025) • No Venue
Lian et al.
Modomodo: Multi-domain Data Mixtures For Multimodal LLM Reinforcement Learning (2025) • No Venue
Liang et al.
URECA: Unique Region Caption Anything (2025) • No Venue
Lim et al.
Embrace-3k: Embodied Reasoning And Action In Complex Environments (2025) • No Venue
Lin et al.
Partcrafter: Structured 3D Mesh Generation Via Compositional Latent Diffusion Transformers (2025) • No Venue
Lin et al.
Ost-bench: Evaluating The Capabilities Of Mllms In Online Spatio-temporal Scene Understanding (2025) • No Venue
Lin et al.
Towards Understanding Camera Motions In Any Video (2025) • No Venue
Lin et al.
Beyond Distillation: Pushing The Limits Of Medical LLM Reasoning With Minimalist Rule-based RL (2025) • No Venue
Liu et al.
Llm-powered GUI Agents In Phone Automation: Surveying Progress And Prospects (2025) • No Venue
Liu et al.
Langscene-x: Reconstruct Generalizable 3D Language-embedded Scenes With Trimap Video Diffusion (2025) • No Venue
Liu et al.
Shotbench: Expert-level Cinematic Understanding In Vision-language Models (2025) • No Venue
Liu et al.
Part I: Tricks Or Traps? A Deep Dive Into RL For LLM Reasoning (2025) • No Venue
Liu et al.
Pairwise RM: Perform Best-of-n Sampling With Knockout Tournament (2025) • No Venue
Liu et al.
Quadmix: Quality-diversity Balanced Data Selection For Efficient LLM Pretraining (2025) • No Venue
Liu et al.
Points-reader: Distillation-free Adaptation Of Vision-language Models For Document Conversion (2025) • No Venue
Liu et al.
Rstar-coder: Scaling Competitive Code Reasoning With A Large-scale Verified Dataset (2025) • No Venue
Liu et al.
Scalecua: Scaling Open-source Computer Use Agents With Cross-platform Data (2025) • No Venue
Liu et al.
Skywork-reward-v2: Scaling Preference Data Curation Via Human-ai Synergy (2025) • No Venue
Liu et al.
Taking Notes Brings Focus? Towards Multi-turn Multimodal Dialogue Learning (2025) • No Venue
Liu et al.
Synlogic: Synthesizing Verifiable Reasoning Data At Scale For Learning Logical Reasoning And Beyond (2025) • No Venue
Liu et al.
Unimoe-audio: Unified Speech And Music Generation With Dynamic-capacity Moe (2025) • No Venue
Liu et al.
BIOMEDICA: An Open Biomedical Image-caption Archive, Dataset, And Vision-language Models Derived From Scientific Literature (2025) • No Venue
Lozano et al.
Elv-halluc: Benchmarking Semantic Aggregation Hallucinations In Long Video Understanding (2025) • No Venue
Lu et al.
Av-reasoner: Improving And Benchmarking Clue-grounded Audio-visual Counting For Mllms (2025) • No Venue
Lu et al.
Finmme: Benchmark Dataset For Financial Multi-modal Reasoning Evaluation (2025) • No Venue
Luo et al.
URSA: Understanding And Verifying Chain-of-thought Reasoning In Multimodal Mathematics (2025) • No Venue
Luo et al.
C3: A Bilingual Benchmark For Spoken Dialogue Models Exploring Challenges In Complex Conversations (2025) • No Venue
Chengqian Ma, Wei Tao, Yiwen Guo
General-reasoner: Advancing LLM Reasoning Across All Domains (2025) • No Venue
Ma et al.
Beyondweb: Lessons From Scaling Synthetic Data For Trillion-scale Pretraining (2025) • No Venue
Maini et al.
Wikivideo: Article Generation From Multiple Videos (2025) • No Venue
Martin et al.
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems (2025) • No Venue
Meghwani et al.
Swe-lancer: Can Frontier Llms Earn $1 Million From Real-world Freelance Software Engineering? (2025) • No Venue
Miserendino et al.
Synthdetoxm: Modern Llms Are Few-shot Parallel Detoxification Data Annotators (2025) • No Venue
Moskovskiy et al.
Do Generative Video Models Learn Physical Principles From Watching Videos? (2025) • No Venue
Motamed et al.
Smoldocling: An Ultra-compact Vision-language Model For End-to-end Multi-modal Document Conversion (2025) • No Venue
Nassar et al.
Annotation-efficient Universal Honesty Alignment (2025) • No Venue
Ni et al.
Viscoder2: Building Multi-language Visualization Coding Agents (2025) • No Venue
Ni et al.
Viscoder: Fine-tuning Llms For Executable Python Visualization Code Generation (2025) • No Venue
Ni et al.
Does Understanding Inform Generation In Unified Multimodal Models? From Analysis To Path Forward (2025) • No Venue
Niu et al.
Benchmarking Llms' Swarm Intelligence (2025) • No Venue
Ruan et al.
Large Language Models Meet Extreme Multi-label Classification: Scaling And Multi-modal Framework (2025) • No Venue
Ortego et al.
Paper2poster: Towards Multimodal Poster Automation From Scientific Papers (2025) • No Venue
Pang et al.
Mathfusion: Enhancing Mathematic Problem-solving Of LLM Through Instruction Fusion (2025) • No Venue
Pei et al.
Fineweb2: One Pipeline To Scale Them All -- Adapting Pre-training Data Processing To Every Language (2025) • No Venue
Penedo et al.
Multifinben: A Multilingual, Multimodal, And Difficulty-aware Benchmark For Financial LLM Evaluation (2025) • No Venue
Peng et al.
Plutus: Benchmarking Large Language Models In Low-resource Greek Finance (2025) • No Venue
Peng et al.
Humanity's Last Exam (2025) • No Venue
Phan et al.
An Open Recipe: Adapting Language-specific Llms To A Reasoning Model In One Day Via Model Merging (2025) • No Venue
Pipatanakul et al.
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification To Improve Trustworthy QA (2025) • No Venue
Pletenev et al.
THOUGHTTERMINATOR: Benchmarking, Calibrating, And Mitigating Overthinking In Reasoning Models (2025) • No Venue
Pu et al.
Generating Physically Stable And Buildable LEGO Designs From Text (2025) • No Venue
Pun et al.
Sofar: Language-grounded Orientation Bridges Spatial Reasoning And Object Manipulation (2025) • No Venue
Qi et al.
Pico-banana-400k: A Large-scale Dataset For Text-guided Image Editing (2025) • No Venue
Qian et al.
Fino1: On The Transferability Of Reasoning Enhanced Llms To Finance (2025) • No Venue
Qian et al.
V-thinker: Interactive Thinking With Images (2025) • No Venue
Qiao et al.
We-math 2.0: A Versatile Mathbook System For Incentivizing Visual Mathematical Reasoning (2025) • No Venue
Qiao et al.
Animeshooter: A Multi-shot Animation Dataset For Reference-guided Video Generation (2025) • No Venue
Qiu et al.
Phybench: Holistic Evaluation Of Physical Perception And Reasoning In Large Language Models (2025) • No Venue
Qiu et al.
How Well Does Gpt-4o Understand Vision? Evaluating Multimodal Foundation Models On Standard Computer Vision Tasks (2025) • No Venue
Ramachandran et al.
Videomathqa: Benchmarking Mathematical Reasoning Via Multimodal Understanding In Videos (2025) • No Venue
Rasheed et al.
Anycap Project: A Unified Framework, Dataset, And Benchmark For Controllable Omni-modal Captioning (2025) • No Venue
Ren et al.
Zerobench: An Impossible Visual Benchmark For Contemporary Large Multimodal Models (2025) • No Venue
Roberts et al.
When Models Lie, We Learn: Multilingual Span-level Hallucination Detection With Psiloqa (2025) • No Venue
Rykov et al.
Dota-rag: Dynamic Of Thought Aggregation RAG (2025) • No Venue
Ruangtanusak et al.
Through The Looking Glass: Common Sense Consistency Evaluation Of Weird Images (2025) • No Venue
Rykov et al.
Aligning Text, Images, And 3D Structure Token-by-token (2025) • No Venue
Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari
Geopolitical Biases In Llms: What Are The "good" And The "bad" Countries According To Contemporary Language Models (2025) • No Venue
Salnikov et al.
ABC: Achieving Better Control Of Multimodal Embeddings Using Vlms (2025) • No Venue
Benjamin Schneider, Florian Kerschbaum, Wenhu Chen
Emonet-voice: A Fine-grained, Expert-verified Benchmark For Speech Emotion Detection (2025) • No Venue
Schuhmann et al.
Seedream 4.0: Toward Next-generation Multimodal Image Generation (2025) • No Venue
Seedream et al.
Reasonir: Training Retrievers For Reasoning Tasks (2025) • No Venue
Shao et al.
Solving Inequality Proofs With Large Language Models (2025) • No Venue
Sheng et al.
Phyx: Does Your Model Have The "wits" For Physical Reasoning? (2025) • No Venue
Shen et al.
Mathcanvas: Intrinsic Visual Chain-of-thought For Multimodal Mathematical Reasoning (2025) • No Venue
Shi et al.
Smolvla: A Vision-language-action Model For Affordable And Efficient Robotics (2025) • No Venue
Shukor et al.
Predictive Data Selection: The Data That Predicts Is The Data That Teaches (2025) • No Venue
Shum et al.
Dinov3 (2025) • No Venue
Siméoni et al.
Pushing On Multilingual Reasoning Models With Language-mixed Chain-of-thought (2025) • No Venue
Son et al.
Agent Data Protocol: Unifying Datasets For Diverse, Effective Fine-tuning Of LLM Agents (2025) • No Venue
Song et al.
DMM: Building A Versatile Image Generation Model Via Distillation-based Model Merging (2025) • No Venue
Song et al.
Makeanything: Harnessing Diffusion Transformers For Multi-domain Procedural Sequence Generation (2025) • No Venue
Yiren Song, Cheng Liu, Mike Zheng Shou
Alchemist: Turning Public Text-to-image Data Into Generative Gold (2025) • No Venue
Startsev et al.
Video-lmm Post-training: A Deep Dive Into Video Reasoning With Large Multimodal Models (2025) • No Venue
Tang et al.
Reasonmed: A 370K Multi-agent Generated Dataset For Advancing Medical Reasoning (2025) • No Venue
Sun et al.
Intrex: A Dataset For Modeling Engagement In Educational Conversations (2025) • No Venue
Tan et al.
Large Language Models For Data Synthesis (2025) • No Venue
Yihong Tang, Menglin Kong, Lijun Sun
Lingshu: A Generalist Foundation Model For Unified Multimodal Medical Understanding And Reasoning (2025) • No Venue
Team et al.
Personafeedback: A Large-scale Human-annotated Benchmark For Personalization (2025) • No Venue
Tao et al.
COIG-P: A High-quality And Large-scale Chinese Preference Dataset For Alignment With Human Values (2025) • No Venue
Team et al.
Minicpm4: Ultra-efficient Llms On End Devices (2025) • No Venue
Team et al.
Fixing Data That Hurts Performance: Cascading Llms To Relabel Hard Negatives For Robust Information Retrieval (2025) • No Venue
Thakur et al.
Audiox: Diffusion Transformer For Anything-to-audio Generation (2025) • No Venue
Tian et al.
MMMR: Benchmarking Massive Multi-modal Reasoning Tasks (2025) • No Venue
Tie et al.
Openmathinstruct-1: A 1.8 Million Math Instruction Tuning Dataset (2024) • No Venue
Toshniwal et al.
No "zero-shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (2024) • No Venue
Udandarao et al.
Replacing Judges With Juries: Evaluating LLM Generations With A Panel Of Diverse Models (2024) • No Venue
Verga et al.
One Missing Piece In Vision And Language: A Survey On Comics Understanding (2024) • No Venue
Vivoli et al.
Meltemi: The First Open Large Language Model For Greek (2024) • No Venue
Voukoutis et al.
Qwen2.5 Technical Report (2024) • No Venue
Qwen et al.
Maya: An Instruction Finetuned Multilingual Multimodal Model (2024) • No Venue
Alam et al.
Understanding Alignment In Multimodal Llms: A Comprehensive Study (2024) • No Venue
Amirloo et al.
Skyeyegpt: Unifying Remote Sensing Vision-language Tasks Via Instruction Tuning With Large Language Model (2024) • ISPRS Journal of Photogrammetry and Remote Sensing • 51 citations
Yang Zhan, Zhitong Xiong, Yuan Yuan
Anygpt: Unified Multimodal LLM With Discrete Sequence Modeling (2024) • No Venue
Zhan et al.
Perplexed By Perplexity: Perplexity-based Data Pruning With Small Reference Models (2024) • No Venue
Ankner et al.
Chronos: Learning The Language Of Time Series (2024) • No Venue
Ansari et al.
Scenescript: Reconstructing Scenes With An Autoregressive Structured Language Model (2024) • No Venue
Avetisyan et al.
MINT-1T: Scaling Open-source Multimodal Data By 10x: A Multimodal Dataset With One Trillion Tokens (2024) • No Venue
Awadalla et al.
BLIP3-KALE: Knowledge Augmented Large-scale Dense Captions (2024) • No Venue
Awadalla et al.
Revisiting In-context Learning With Long Context Language Models (2024) • No Venue
Baek et al.
Screenai: A Vision-language Model For UI And Infographics Understanding (2024) • No Venue
Baechler et al.
Longwriter: Unleashing 10,000+ Word Generation From Long Context Llms (2024) • No Venue
Bai et al.
Fintral: A Family Of GPT-4 Level Multimodal Financial Large Language Models (2024) • No Venue
Bhatia et al.
INDUS: Effective And Efficient Language Models For Scientific Applications (2024) • No Venue
Bhattacharjee et al.
Visual Riddles: A Commonsense And World Knowledge Challenge For Large Vision And Language Models (2024) • No Venue
Bitton-Guetta et al.
Merlin: A Vision Language Foundation Model For 3D Computed Tomography (2024) • Arxiv • 45 citations
Blankemeier et al.
3dgraphllm: Combining Semantic Graphs And Large Language Models For 3D Scene Understanding (2024) • No Venue
Tatiana Zemskova, Dmitry Yudin
Long Code Arena: A Set Of Benchmarks For Long-context Code Models (2024) • No Venue
Bogomolov et al.
Transformers Meet Neural Algorithmic Reasoners (2024) • No Venue
Bounsi et al.
On The Compositional Generalization Of Multimodal Llms For Medical Imaging (2024) • No Venue
Cai et al.
Matryoshka Multimodal Models (2024) • No Venue
Cai et al.
Edgefusion: On-device Text-to-image Generation (2024) • No Venue
Castells et al.
PERSONA: A Reproducible Testbed For Pluralistic Alignment (2024) • No Venue
Castricato et al.
Swe-bench-java: A Github Issue Resolving Benchmark For Java (2024) • No Venue
Zan et al.
Pangea: A Fully Open Multilingual Multimodal LLM For 39 Languages (2024) • No Venue
Yue et al.
Getting It Right: Improving Spatial Consistency In Text-to-image Models (2024) • No Venue
Chatterjee et al.
Tx-llm: A Large Language Model For Therapeutics (2024) • No Venue
Chaves et al.
Premise Order Matters In Reasoning With Large Language Models (2024) • No Venue
Chen et al.
Chexagent: Towards A Foundation Model For Chest X-ray Interpretation (2024) • No Venue
Chen et al.
Compcap: Improving Multimodal Large Language Models With Composite Captions (2024) • No Venue
Chen et al.
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (2024) • No Venue
Chen et al.
Hallucination Detection: Robustly Discerning Reliable Answers In Large Language Models (2024) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 59 citations
Chen et al.
How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites (2024) • No Venue
Chen et al.
Language Models Are Hidden Reasoners: Unlocking Latent Reasoning Capabilities Via Self-rewarding (2024) • No Venue
Chen et al.
Interleaved Scene Graph For Interleaved Text-and-image Generation Assessment (2024) • No Venue
Chen et al.
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (2024) • No Venue
Chen et al.
Motionllm: Understanding Human Behaviors From Human Motions And Videos (2024) • No Venue
Chen et al.
MS MARCO Web Search: A Large-scale Information-rich Web Dataset With Millions Of Real Click Labels (2024) • No Venue
Chen et al.
Panda-70m: Captioning 70M Videos With Multiple Cross-modality Teachers (2024) • No Venue
Chen et al.
Reverse Thinking Makes Llms Stronger Reasoners (2024) • No Venue
Chen et al.
Self-play Fine-tuning Converts Weak Language Models To Strong Language Models (2024) • No Venue
Chen et al.
Visionts: Visual Masked Autoencoders Are Free-lunch Zero-shot Time Series Forecasters (2024) • No Venue
Chen et al.
Mmmu-pro: A More Robust Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Yue et al.
Unist: A Prompt-empowered Universal Model For Urban Spatio-temporal Prediction (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 51 citations
Yuan et al.
Videorefer Suite: Advancing Spatial-temporal Object Understanding With Video LLM (2024) • No Venue
Yuan et al.
Chronomagic-bench: A Benchmark For Metamorphic Evaluation Of Text-to-time-lapse Video Generation (2024) • No Venue
Yuan et al.
Open-vocabulary SAM: Segment And Recognize Twenty-thousand Classes Interactively (2024) • No Venue
Yuan et al.
Magictime: Time-lapse Video Generation Models As Metamorphic Simulators (2024) • No Venue
Yuan et al.
MMAU: A Holistic Benchmark Of Agent Capabilities Across Diverse Domains (2024) • No Venue
Yin et al.
M3docrag: Multi-modal Retrieval Is What You Need For Multi-page Multi-document Understanding (2024) • No Venue
Cho et al.
M-longdoc: A Benchmark For Multimodal Super-long Document Understanding And A Retrieval-aware Tuning Framework (2024) • No Venue
Chia et al.
A Flexible Large Language Models Guardrail Development Methodology Applied To Off-topic Prompt Detection (2024) • No Venue
Gabriel Chua, Shing Yee Chan, Shaun Khoo
Heavy Labels Out! Dataset Distillation With Label Space Lightening (2024) • No Venue
Yu et al.
Toto: Time Series Optimized Transformer For Observability (2024) • No Venue
Cohen et al.
Saullm-54b & Saullm-141b: Scaling Up Domain Adaptation For The Legal Domain (2024) • No Venue
Colombo et al.
Towards A Personal Health Large Language Model (2024) • No Venue
Cosentino et al.
NVLM: Open Frontier-class Multimodal Llms (2024) • No Venue
Dai et al.
Molmo And Pixmo: Open Weights And Open Data For State-of-the-art Multimodal Models (2024) • No Venue
Deitke et al.
Coconut: Modernizing COCO Segmentation (2024) • No Venue
Deng et al.
Mapeval: A Map-based Evaluation Of Geo-spatial Reasoning In Foundation Models (2024) • No Venue
Dihan et al.
Ferret-ui: Grounded Mobile UI Understanding With Multimodal Llms (2024) • No Venue
You et al.
Unleashing Reasoning Capability Of Llms Via Scalable Question Synthesis From Scratch (2024) • No Venue
Ding et al.
Megapairs: Massive Data Synthesis For Universal Multimodal Retrieval (2024) • No Venue
Zhou et al.
Vintern-1b: An Efficient Multimodal Large Language Model For Vietnamese (2024) • No Venue
Doan et al.
Baichuanseed: Sharing The Potential Of Extensive Data Collection And Deduplication By Introducing A Competitive Large Language Model Baseline (2024) • No Venue
Dong et al.
Charxiv: Charting Gaps In Realistic Chart Understanding In Multimodal Llms (2024) • No Venue
Wang et al.
Git: Towards Generalist Vision Transformer Through Universal Language Interface (2024) • No Venue
Wang et al.
Multilingual E5 Text Embeddings: A Technical Report (2024) • No Venue
Wang et al.
Mtu-bench: A Multi-granularity Tool-use Benchmark For Large Language Models (2024) • No Venue
Wang et al.
Octo: An Open-source Generalist Robot Policy (2024) • No Venue
Team et al.
Mmlu-pro: A More Robust And Challenging Multi-task Language Understanding Benchmark (2024) • No Venue
Wang et al.
Helpsteer2-preference: Complementing Ratings With Preferences (2024) • No Venue
Wang et al.
Grutopia: Dream General Robots In A City At Scale (2024) • No Venue
Wang et al.
Lift: Leveraging Human Feedback For Text-to-video Model Alignment (2024) • No Venue
Wang et al.
How Do Your Code Llms Perform? Empowering Code Instruction Tuning With High-quality Data (2024) • No Venue
Wang et al.
Litesearch: Efficacious Tree Search For LLM (2024) • No Venue
Wang et al.
Structlm: Towards Building Generalist Models For Structured Knowledge Grounding (2024) • No Venue
Zhuang et al.
EVA-CLIP-18B: Scaling CLIP To 18 Billion Parameters (2024) • No Venue
Sun et al.
LAMBDA: A Large Model Based Data Agent (2024) • No Venue
Sun et al.
T2v-compbench: A Comprehensive Benchmark For Compositional Text-to-video Generation (2024) • No Venue
Sun et al.
Parrot: Multilingual Visual Instruction Tuning (2024) • No Venue
Sun et al.
Planetarium: A Rigorous Benchmark For Translating Text To Structured Planning Languages (2024) • No Venue
Zuo et al.
Video-star: Self-training Enables Video Instruction Tuning With Any Supervision (2024) • No Venue
Zohar et al.
Llava-3d: A Simple Yet Effective Pathway To Empowering Lmms With 3d-awareness (2024) • No Venue
Zhu et al.
Yolov9: Learning What You Want To Learn Using Programmable Gradient Information (2024) • No Venue
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
Diasynth -- Synthetic Dialogue Generation Framework (2024) • No Venue
Suresh et al.
Videogamebunny: Towards Vision Assistants For Video Games (2024) • No Venue
Mohammad Reza Taesiri, Cor-Paul Bezemer
TIP-I2V: A Million-scale Real Text And Image Prompt Dataset For Image-to-video Generation (2024) • No Venue
Wenhao Wang, Yi Yang
Judgebench: A Benchmark For Evaluating Llm-based Judges (2024) • No Venue
Tan et al.
Omnieval: An Omnidirectional And Automatic RAG Evaluation Benchmark In Financial Domain (2024) • No Venue
Wang et al.
PIN: A Knowledge-intensive Dataset For Paired And Interleaved Multimodal Documents (2024) • No Venue
Wang et al.
Textsquare: Scaling Up Text-centric Visual Instruction Tuning (2024) • No Venue
Tang et al.
Ominicontrol: Minimal And Universal Control For Diffusion Transformer (2024) • No Venue
Tan et al.
Ref-avs: Refer And Segment Objects In Audio-visual Scenes (2024) • No Venue
Wang et al.
Grandmaster-level Chess Without Search (2024) • No Venue
Ruoss et al.
Atlas-chat: Adapting Large Language Models For Low-resource Moroccan Arabic Dialect (2024) • No Venue
Shang et al.
MMAU: A Massive Multi-task Audio Understanding And Reasoning Benchmark (2024) • No Venue
Sakshi et al.
Blended RAG: Improving RAG (retriever-augmented Generation) Accuracy With Semantic Search And Hybrid Query-based Retrievers (2024) • 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) • 47 citations
Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki
Truth Or Mirage? Towards End-to-end Factuality Evaluation With LLM-OASIS (2024) • No Venue
Scirè et al.
Livexiv -- A Multi-modal Live Benchmark Based On Arxiv Papers Content (2024) • No Venue
Shabtay et al.
Synth²: Boosting Visual-language Models With Synthetic Captions And Image Embeddings (2024) • No Venue
Sharifzadeh et al.
Jetmoe: Reaching Llama2 Performance With 0.1M Dollars (2024) • No Venue
Shen et al.
Aya Model: An Instruction Finetuned Open-access Multilingual Language Model (2024) • No Venue
Üstün et al.
PERL: Parameter Efficient Reinforcement Learning From Human Feedback (2024) • No Venue
Sidahmed et al.
Can Large Language Models Understand Context? (2024) • No Venue
Zhu et al.
Aya Dataset: An Open-access Collection For Multilingual Instruction Tuning (2024) • No Venue
Singh et al.
MARVEL-40M+: Multi-level Visual Elaboration For High-fidelity Text-to-3d Content Creation (2024) • No Venue
Sinha et al.
Global MMLU: Understanding And Addressing Cultural And Linguistic Biases In Multilingual Evaluation (2024) • No Venue
Singh et al.
A Large Encoder-decoder Family Of Foundation Models For Chemical Language (2024) • No Venue
Soares et al.
The Russian-focused Embedders' Exploration: Rumteb Benchmark And Russian Embedding Model Design (2024) • No Venue
Snegirev et al.
Dolma: An Open Corpus Of Three Trillion Tokens For Language Model Pretraining Research (2024) • No Venue
Soldaini et al.
Both Text And Images Leaked! A Systematic Analysis Of Multimodal LLM Data Contamination (2024) • No Venue
Song et al.
Moviellm: Enhancing Long Video Understanding With Ai-generated Movies (2024) • No Venue
Song et al.
To Cot Or Not To Cot? Chain-of-thought Helps Mainly On Math And Symbolic Reasoning (2024) • No Venue
Sprague et al.
Canttalkaboutthis: Aligning Language Models To Stay On Topic In Dialogues (2024) • No Venue
Sreedhar et al.
Aligning Teacher With Student Preferences For Tailored Training Data Generation (2024) • No Venue
Liu et al.
Best Practices And Lessons Learned On Synthetic Data For Language Models (2024) • No Venue
Liu et al.
Apigen: Automated Pipeline For Generating Verifiable And Diverse Function-calling Datasets (2024) • No Venue
Liu et al.
DDK: Distilling Domain Knowledge For Efficient Large Language Models (2024) • No Venue
Liu et al.
Glyph-byt5-v2: A Strong Aesthetic Baseline For Accurate Multilingual Visual Text Rendering (2024) • No Venue
Liu et al.
Harnessing Webpage Uis For Text-rich Visual Understanding (2024) • No Venue
Liu et al.
Longgenbench: Long-context Generation Benchmark (2024) • No Venue
Liu et al.
MIA-DPO: Multi-image Augmented Direct Preference Optimization For Large Vision-language Models (2024) • No Venue
Liu et al.
POINTS1.5: Building A Vision-language Model Towards Real World Applications (2024) • No Venue
Liu et al.
POINTS: Improving Your Vision-language Model With Affordable Strategies (2024) • No Venue
Liu et al.
Cambrian-1: A Fully Open, Vision-centric Exploration Of Multimodal Llms (2024) • No Venue
Tong et al.
Skywork-reward: Bag Of Tricks For Reward Modeling In Llms (2024) • No Venue
Liu et al.
Teach Multimodal Llms To Comprehend Electrocardiographic Images (2024) • No Venue
Liu et al.
Spatial-temporal Large Language Model For Traffic Prediction (2024) • 2024 25th IEEE International Conference on Mobile Data Management (MDM) • 56 citations
Liu et al.
World Model On Million-length Video And Language With Ringattention (2024) • No Venue
Liu et al.
RULE: Reliable Multimodal RAG For Factuality In Medical Vision Language Models (2024) • No Venue
Xia et al.
Video Instruction Tuning With Synthetic Data (2024) • No Venue
Zhang et al.
MAVIS: Mathematical Visual Instruction Tuning (2024) • No Venue
Zhang et al.
Mme-realworld: Could Your Multimodal LLM Challenge High-resolution Real-world Scenarios That Are Difficult For Humans? (2024) • No Venue
Zhang et al.
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) • No Venue
Xi et al.
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model (2024) • No Venue
Zhang et al.
SPAR: Personalized Content-based Recommendation Via Long Engagement Attention (2024) • No Venue
Zhang et al.
Seacrowd: A Multilingual Multimodal Data Hub And Benchmark Suite For Southeast Asian Languages (2024) • No Venue
Lovenia et al.
Starcoder 2 And The Stack V2: The Next Generation (2024) • No Venue
Lozhkov et al.
Large Language Models Are Superpositions Of All Characters: Attaining Arbitrary Role-play Via Self-alignment (2024) • No Venue
Lu et al.
Generative World Explorer (2024) • No Venue
Lu et al.
Mathverse: Does Your Multi-modal LLM Truly See The Diagrams In Visual Math Problems? (2024) • No Venue
Zhang et al.
Omniparser For Pure Vision Based GUI Agent (2024) • No Venue
Lu et al.
Mathcoder2: Better Math Reasoning From Continued Pretraining On Model-translated Mathematical Code (2024) • No Venue
Lu et al.
Robustft: Robust Supervised Fine-tuning For Large Language Models Under Noisy Response (2024) • No Venue
Luo et al.
Mmevol: Empowering Multimodal Large Language Models With Evol-instruct (2024) • No Venue
Luo et al.
Weblinx: Real-world Website Navigation With Multi-turn Dialogue (2024) • No Venue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Reft: Reasoning With Reinforced Fine-tuning (2024) • No Venue
Luong et al.
Aria Everyday Activities Dataset (2024) • No Venue
Lv et al.
Diffsensei: Bridging Multi-modal Llms And Diffusion Models For Customized Manga Generation (2024) • No Venue
Wu et al.
Plot2code: A Comprehensive Benchmark For Evaluating Multi-modal Large Language Models In Code Generation From Scientific Plots (2024) • No Venue
Wu et al.
Foundation Models For Music: A Survey (2024) • No Venue
Ma et al.
Futga: Towards Fine-grained Music Understanding Through Temporally-enhanced Generative Augmentation (2024) • No Venue
Wu et al.
Fiva: Fine-grained Visual Attribute Dataset For Text-to-image Diffusion Models (2024) • No Venue
Wu et al.
Wildchat: 1M Chatgpt Interaction Logs In The Wild (2024) • No Venue
Zhao et al.
Eurollm: Multilingual Language Models For Europe (2024) • No Venue
Martins et al.
Improving Text-to-image Consistency Via Automatic Prompt Optimization (2024) • No Venue
Mañas et al.
Openelm: An Efficient Language Model Family With Open-source Training And Inference Framework (2024) • No Venue
Mehta et al.
Videoglamm: A Large Multimodal Model For Pixel-level Visual Grounding In Videos (2024) • No Venue
Munasinghe et al.
A Pointer Network-based Approach For Joint Extraction And Detection Of Multi-label Multi-class Intents (2024) • No Venue
Mullick et al.
Yesbut: A High-quality Annotated Multimodal Dataset For Evaluating Satire Comprehension Capability Of Vision-language Models (2024) • No Venue
Nandy et al.
Openvid-1m: A Large-scale High-quality Dataset For Text-to-video Generation (2024) • No Venue
Nan et al.
Preference Tuning With Human Feedback On Language, Speech, And Vision Tasks: A Survey (2024) • No Venue
Winata et al.
A Survey Of Small Language Models (2024) • No Venue
Nguyen et al.
User-llm: Efficient LLM Contextualization With User Embeddings (2024) • No Venue
Ning et al.
Xland-100b: A Large-scale Multi-task Dataset For In-context Reinforcement Learning (2024) • No Venue
Nikulin et al.
Llms Know More Than They Show: On The Intrinsic Representation Of LLM Hallucinations (2024) • No Venue
Orgad et al.
Omnidocbench: Benchmarking Diverse PDF Document Parsing With Comprehensive Annotations (2024) • No Venue
Ouyang et al.
Worldcuisines: A Massive-scale Benchmark For Multilingual And Multicultural Visual Question Answering On Global Cuisines (2024) • No Venue
Winata et al.
Training Software Engineering Agents And Verifiers With Swe-gym (2024) • No Venue
Pan et al.
Llmlingua-2: Data Distillation For Efficient And Faithful Task-agnostic Prompt Compression (2024) • No Venue
Pan et al.
IOPO: Empowering Llms With Complex Instruction Following Via Input-output Preference Optimization (2024) • No Venue
Zhang et al.
Datadreamer: A Tool For Synthetic Data Generation And Reproducible LLM Workflows (2024) • No Venue
Ajay Patel, Colin Raffel, Chris Callison-Burch
Survey Of Cultural Awareness In Language Models: Text And Beyond (2024) • No Venue
Pawar et al.
Dreambench++: A Human-aligned Benchmark For Personalized Image Generation (2024) • No Venue
Peng et al.
Large Language Model Confidence Estimation Via Black-box Access (2024) • No Venue
Pedapati et al.
The Fineweb Datasets: Decanting The Web For The Finest Text Data At Scale (2024) • No Venue
Penedo et al.
Livebench: A Challenging, Contamination-free LLM Benchmark (2024) • No Venue
White et al.
A Toolbox For Surfacing Health Equity Harms And Biases In Large Language Models (2024) • Nature Medicine • 46 citations
Pfohl et al.
We-math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (2024) • No Venue
Qiao et al.
Evaluating D-MERIT Of Partial-annotation On Information Retrieval (2024) • No Venue
Rassin et al.
Adapting Safe-for-work Classifier For Malaysian Language Text: Enhancing Alignment In Llm-ops Framework (2024) • No Venue
Razak et al.
Redpajama: An Open Dataset For Training Large Language Models (2024) • No Venue
Weber et al.
VISTA: Enhancing Long-duration And High-resolution Video Understanding By Video Spatiotemporal Augmentation (2024) • No Venue
Ren et al.
Omniedit: Building Image Editing Generalist Models Through Specialist Supervision (2024) • No Venue
Wei et al.
Urbench: A Comprehensive Benchmark For Evaluating Large Multimodal Models In Multi-view Urban Scenarios (2024) • No Venue
Zhou et al.
Paint By Inpaint: Learning To Add Image Objects By Removing Them First (2024) • No Venue
Wasserman et al.
RLHF Workflow: From Reward Modeling To Online RLHF (2024) • No Venue
Dong et al.
Toward General Instruction-following Alignment For Retrieval-augmented Generation (2024) • No Venue
Dong et al.
CLEAR: Character Unlearning In Textual And Visual Modalities (2024) • No Venue
Dontsov et al.
Hyperclova X Technical Report (2024) • No Venue
Yoo et al.
An Interactive Agent Foundation Model (2024) • No Venue
Durante et al.
Learning To Move Like Professional Counter-strike Players (2024) • No Venue
Durst et al.
Processbench: Identifying Process Errors In Mathematical Reasoning (2024) • No Venue
Zheng et al.
Chemllm: A Chemical Large Language Model (2024) • No Venue
Zhang et al.
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Zhang et al.
Croissantllm: A Truly Bilingual French-english Language Model (2024) • No Venue
Faysse et al.
Test Of Time: A Benchmark For Evaluating Llms On Temporal Reasoning (2024) • No Venue
Fatemi et al.
Enhancing Video-language Representations With Structural Spatio-temporal Alignment (2024) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 49 citations
Fei et al.
Openfedllm: Training Large Language Models On Decentralized Private Data Via Federated Learning (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 47 citations
Ye et al.
RAG Foundry: A Framework For Enhancing Llms For Retrieval Augmented Generation (2024) • No Venue
Fleischer et al.
Video-mme: The First-ever Comprehensive Evaluation Benchmark Of Multi-modal Llms In Video Analysis (2024) • No Venue
Fu et al.
LOKI: A Comprehensive Synthetic Data Detection Benchmark Using Large Multimodal Models (2024) • No Venue
Ye et al.
Mm-ego: Towards Building Egocentric Multimodal Llms (2024) • No Venue
Ye et al.
Omni-math: A Universal Olympiad Level Mathematic Benchmark For Large Language Models (2024) • No Venue
Gao et al.
Dreamreward: Text-to-3d Generation With Human Preference (2024) • No Venue
Ye et al.
Longins: A Challenging Long-context Instruction-based Exam For Llms (2024) • No Venue
Gavin et al.
Kvasir-vqa: A Text-image Pair GI Tract Dataset (2024) • No Venue
Gautam et al.
Are We Done With MMLU? (2024) • No Venue
Gema et al.
Socially Aware Synthetic Data Generation For Suicidal Ideation Detection Using Large Language Models (2024) • IEEE Access • 40 citations
Hamideh Ghanadian, Isar Nejadgholi, Hussein Al Osman
Learn Your Reference Model For Real Good Alignment (2024) • No Venue
Gorbatovski et al.
Zamba: A Compact 7B SSM Hybrid Model (2024) • No Venue
Glorioso et al.
Mulberry: Empowering MLLM With O1-like Reasoning And Reflection Via Collective Monte Carlo Tree Search (2024) • No Venue
Yao et al.
Atomovideo: High Fidelity Image-to-video Generation (2024) • No Venue
Gong et al.
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information? (2024) • No Venue
Gong et al.
Navigating The Digital World As Humans Do: Universal Visual Grounding For GUI Agents (2024) • No Venue
Gou et al.
Olmo: Accelerating The Science Of Language Models (2024) • No Venue
Groeneveld et al.
Sam2point: Segment Any 3D As Videos In Zero-shot And Promptable Manners (2024) • No Venue
Guo et al.
Mammoth-vl: Eliciting Multimodal Reasoning With Instruction Tuning At Scale (2024) • No Venue
Guo et al.
Direct Language Model Alignment From Online AI Feedback (2024) • No Venue
Guo et al.
Infimm-webmath-40b: Advancing Multimodal Pre-training For Enhanced Mathematical Reasoning (2024) • No Venue
Han et al.
Vision-language Models For Medical Report Generation And Visual Question Answering: A Review (2024) • Frontiers in Artificial Intelligence • 86 citations
Iryna Hartsock, Ghulam Rasool
Data Mixture Inference: What Do BPE Tokenizers Reveal About Their Training Data? (2024) • No Venue
Hayase et al.
Distill Visual Chart Reasoning Ability From Llms To Mllms (2024) • No Venue
He et al.
Cameractrl: Enabling Camera Control For Text-to-video Generation (2024) • No Venue
He et al.
Mmworld: Towards Multi-discipline Multi-faceted World Model Evaluation In Videos (2024) • No Venue
He et al.
UCFE: A User-centric Financial Expertise Benchmark For Large Language Models (2024) • No Venue
Yang et al.
Vript: A Video Is Worth Thousands Of Words (2024) • No Venue
Yang et al.
Thinking In Space: How Multimodal Large Language Models See, Remember, And Recall Spaces (2024) • No Venue
Yang et al.
CRAG -- Comprehensive RAG Benchmark (2024) • No Venue
Yang et al.
Sampart3d: Segment Any Part In 3D Objects (2024) • No Venue
Yang et al.
3D-GRAND: A Million-scale Dataset For 3d-llms With Better Grounding And Less Hallucination (2024) • No Venue
Yang et al.
Mplug-docowl 1.5: Unified Structure Learning For Ocr-free Document Understanding (2024) • No Venue
Hu et al.
Compression Represents Intelligence Linearly (2024) • No Venue
Huang et al.
Can Knowledge Editing Really Correct Hallucinations? (2024) • No Venue
Huang et al.
How Good Are Low-bit Quantized Llama3 Models? An Empirical Study (2024) • No Venue
Huang et al.
RU-AI: A Large Multimodal Dataset For Machine-generated Content Detection (2024) • Arxiv • 2046 citations
Huang et al.
Simple And Scalable Strategies To Continually Pre-train Large Language Models (2024) • No Venue
Ibrahim et al.
Gitchameleon: Unmasking The Version-switching Capabilities Of Code Generation Models (2024) • No Venue
Islah et al.
Improving Medical Reasoning Through Retrieval And Self-reflection With Retrieval-augmented Large Language Models (2024) • Bioinformatics • 50 citations
Jeong et al.
LEOPARD: A Vision Language Model For Text-rich Multi-image Tasks (2024) • No Venue
Jia et al.
Many-shot In-context Learning In Multimodal Foundation Models (2024) • No Venue
Jiang et al.
SOLAMI: Social Vision-language-action Modeling For Immersive Interaction With 3D Autonomous Characters (2024) • No Venue
Jiang et al.
RATIONALYST: Pre-training Process-supervision For Improving Reasoning (2024) • No Venue
Jiang et al.
Dsbench: How Far Are Data Science Agents To Becoming Data Science Experts? (2024) • No Venue
Jing et al.
Accessing GPT-4 Level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine With Llama-3 8B (2024) • No Venue
Zhang et al.
VARCO-VISION: Expanding Frontiers In Korean Vision-language Models (2024) • No Venue
Ju et al.
Omniact: A Dataset And Benchmark For Enabling Multimodal Generalist Autonomous Agents For Desktop And Web (2024) • No Venue
Kapoor et al.
Vineppo: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment (2024) • No Venue
Kazemnejad et al.
ATHAR: A High-quality And Diverse Dataset For Classical Arabic To English Translation (2024) • No Venue
Mohammed Khalil, Mohammed Sabry
Sdpo: Don't Use Your Data All At Once (2024) • No Venue
Kim et al.
Evaluating Language Models As Synthetic Data Generators (2024) • No Venue
Kim et al.
Husky: A Unified, Open-source Language Agent For Multi-step Reasoning (2024) • No Venue
Kim et al.
Xgen-mm (BLIP-3): A Family Of Open Large Multimodal Models (2024) • No Venue
Xue et al.
Longvila: Scaling Long-context Visual Language Models For Long Videos (2024) • No Venue
Xue et al.
Fact, Fetch, And Reason: A Unified Evaluation Of Retrieval-augmented Generation (2024) • No Venue
Krishna et al.
Harvesting Textual And Structured Data From The HAL Publication Repository (2024) • No Venue
Kulumba et al.
Biomistral: A Collection Of Open-source Pretrained Large Language Models For Medical Domains (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 108 citations
Labrak et al.
TÜLU 3: Pushing Frontiers In Open Language Model Post-training (2024) • No Venue
Lambert et al.
Rewardbench: Evaluating Reward Models For Language Modeling (2024) • No Venue
Lambert et al.
Building And Better Understanding Vision-language Models: Insights And Future Directions (2024) • No Venue
Laurençon et al.
Unlocking The Conversion Of Web Screenshots Into HTML Code With The Websight Dataset (2024) • No Venue
Hugo Laurençon, Léo Tronchon, Victor Sanh
What Matters When Building Vision-language Models? (2024) • No Venue
Laurençon et al.
Closing The Gap Between Open-source And Commercial Large Language Models For Medical Evidence Summarization (2024) • npj Digital Medicine • 45 citations
Zhang et al.
Thanos: Enhancing Conversational Agents With Skill-of-mind-infused Large Language Model (2024) • No Venue
Lee et al.
Meteor: Mamba-based Traversal Of Rationale For Large Language And Vision Models (2024) • No Venue
Lee et al.
Stark: Social Long-term Multi-modal Conversation With Persona Commonsense Knowledge (2024) • No Venue
Lee et al.
A Careful Examination Of Large Language Model Performance On Grade School Arithmetic (2024) • No Venue
Zhang et al.
Ootdiffusion: Outfitting Fusion Based Latent Diffusion For Controllable Virtual Try-on (2024) • No Venue
Xu et al.
Stronger Models Are NOT Stronger Teachers For Instruction Tuning (2024) • No Venue
Xu et al.
Slowfast-llava: A Strong Training-free Baseline For Video Large Language Models (2024) • No Venue
Xu et al.
Long-context Llms Struggle With Long In-context Learning (2024) • No Venue
Li et al.
Llava-next-interleave: Tackling Multi-image, Video, And 3D In Large Multimodal Models (2024) • No Venue
Li et al.
LAION-SG: An Enhanced Large-scale Dataset For Training Complex Image-text Models With Structural Annotations (2024) • No Venue
Li et al.
Direct Preference Knowledge Distillation For Large Language Models (2024) • No Venue
Li et al.
Datacomp-lm: In Search Of The Next Generation Of Training Sets For Language Models (2024) • No Venue
Li et al.
Codes: Towards Building Open-source Language Models For Text-to-sql (2024) • Proceedings of the ACM on Management of Data • 44 citations
Li et al.
Dotamath: Decomposition Of Thought With Code Assistance And Self-correction For Mathematical Reasoning (2024) • No Venue
Li et al.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-language Model And A Comprehensive Multimodal Dataset Towards General Medical AI (2024) • No Venue
Li et al.
Androidlab: Training And Systematic Benchmarking Of Android Autonomous Agents (2024) • No Venue
Xu et al.
Omnicorpus: A Unified Multimodal Corpus Of 10 Billion-level Images Interleaved With Text (2024) • No Venue
Li et al.
Omnibench: Towards The Future Of Universal Omni-language Models (2024) • No Venue
Li et al.
Scaling (down) CLIP: A Comprehensive Analysis Of Data, Architecture, And Training Strategies (2024) • No Venue
Zichao Li, Cihang Xie, Ekin Dogus Cubuk
Synthetic Data (almost) From Scratch: Generalized Instruction Tuning For Language Models (2024) • No Venue
Li et al.
Urbangpt: Spatio-temporal Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 69 citations
Li et al.
Wolf: Captioning Everything With A World Summarization Framework (2024) • No Venue
Li et al.
Chatglm-math: Improving Math Problem-solving In Large Language Models With A Self-critique Pipeline (2024) • No Venue
Xu et al.
Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned Llms With Nothing (2024) • No Venue
Xu et al.
Contrastive Preference Optimization: Pushing The Boundaries Of LLM Performance In Machine Translation (2024) • No Venue
Xu et al.
Earthgpt: A Universal Multi-modal Large Language Model For Multi-sensor Image Comprehension In Remote Sensing Domain (2024) • IEEE Transactions on Geoscience and Remote Sensing • 78 citations
Zhang et al.
MMIE: Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-language Models (2024) • No Venue
Xia et al.
Document Parsing Unveiled: Techniques, Challenges, And Prospects For Structured Information Extraction (2024) • No Venue
Zhang et al.
Benchmarking Retrieval-augmented Generation For Medicine (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 119 citations
Xiong et al.
I-SHEEP: Self-alignment Of LLM From Scratch Through An Iterative Self-enhancement Paradigm (2024) • No Venue
Liang et al.
Large Motion Video Autoencoding With Cross-modal Video VAE (2024) • No Venue
Xing et al.
HARE: Human Priors, A Key To Small Language Model Efficiency (2024) • No Venue
Zhang et al.
Showui: One Vision-language-action Model For GUI Visual Agent (2024) • No Venue
Lin et al.
Medtrinity-25m: A Large-scale Multimodal Dataset With Multigranular Annotations For Medicine (2024) • No Venue
Xie et al.
A Preliminary Study Of O1 In Medicine: Are We Closer To An AI Doctor? (2024) • No Venue
Xie et al.
Open-finllms: Open Multimodal Large Language Models For Financial Applications (2024) • No Venue
Xie et al.
The Finben: An Holistic Financial Benchmark For Large Language Models (2024) • No Venue
Xie et al.
Fine-tuning Or Retrieval? Comparing Knowledge Injection In Llms (2023) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 50 citations
Ovadia et al.
Chartgpt: Leveraging Llms To Generate Charts From Abstract Natural Language (2023) • IEEE Transactions on Visualization and Computer Graphics • 48 citations
Tian et al.
The Refinedweb Dataset For Falcon LLM: Outperforming Curated Corpora With Web Data, And Web Data Only (2023) • No Venue
Penedo et al.
Evaluating The Logical Reasoning Ability Of Chatgpt And GPT-4 (2023) • Arxiv • 102 citations
Liu et al.
Agentbench: Evaluating Llms As Agents (2023) • No Venue
Liu et al.
A Comprehensive Evaluation Of Chatgpt's Zero-shot Text-to-sql Capability (2023) • Arxiv • 58 citations
Liu et al.
Revisiting Temporal Modeling For Clip-based Image-to-video Knowledge Transferring (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Liu et al.
Multi-task Recommendations With Reinforcement Learning (2023) • IEEE Transactions on Image Processing • 53 citations
Liu et al.
Tinygsm: Achieving >80% On Gsm8k With Small Language Models (2023) • No Venue
Liu et al.
CLIP As RNN: Segment Countless Visual Concepts Without Training Endeavor (2023) • No Venue
Sun et al.
C-pack: Packed Resources For General Chinese Embeddings (2023) • Arxiv • 69 citations
Xiao et al.
Inconsistent Matters: A Knowledge-guided Dual-consistency Network For Multi-modal Rumor Detection (2023) • IEEE Transactions on Knowledge and Data Engineering • 47 citations
Sun et al.
The Flan Collection: Designing Data And Methods For Effective Instruction Tuning (2023) • Arxiv • 109 citations
Longpre et al.
Interpretable Long-form Legal Question Answering With Retrieval-augmented Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
Visual Language Pretrained Multiple Instance Zero-shot Transfer For Histopathology Images (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 79 citations
Lu et al.
Unified-io 2: Scaling Autoregressive Multimodal Models With Vision, Language, Audio, And Action (2023) • No Venue
Lu et al.
Llama-reviewer: Advancing Code Review Automation With Large Language Models Through Parameter-efficient Fine-tuning (2023) • 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE) • 71 citations
Lu et al.
Level Generation Through Large Language Models (2023) • FDG 2023: Foundations of Digital Games 2023 • 65 citations
Todd et al.
Can Chatgpt Reproduce Human-generated Labels? A Study Of Social Computing Tasks (2023) • Arxiv • 62 citations
Zhu et al.
Taiyi: A Bilingual Fine-tuned Large Language Model For Diverse Biomedical Tasks (2023) • Journal of the American Medical Informatics Association • 41 citations
Luo et al.
Llms For Knowledge Graph Construction And Reasoning: Recent Capabilities And Future Opportunities (2023) • World Wide Web • 130 citations
Zhu et al.
CORAL: Expert-curated Medical Oncology Reports To Advance Language Model Inference (2023) • NEJM AI • 42 citations
Sushil et al.
Fingpt: Large Generative Models For A Small Language (2023) • No Venue
Luukkonen et al.
A Transformer-based Model With Self-distillation For Multimodal Emotion Recognition In Conversations (2023) • IEEE Transactions on Multimedia • 71 citations
Ma et al.
Auto-avsr: Audio-visual Speech Recognition With Automatic Labels (2023) • ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 91 citations
Ma et al.
Text-to-sticker: Style Tailoring Latent Diffusion Models For Human Expression (2023) • No Venue
Sinha et al.
Tidybot: Personalized Robot Assistance With Large Language Models (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 65 citations
Wu et al.
3d-vista: Pre-trained Transformer For 3D Vision And Text Alignment (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Zhu et al.
Can Generalist Foundation Models Outcompete Special-purpose Tuning? Case Study In Medicine (2023) • Arxiv • 157 citations
Nori et al.
Capabilities Of GPT-4 On Medical Challenge Problems (2023) • Arxiv • 474 citations
Nori et al.
A Comprehensive Overview Of Large Language Models (2023) • ACM Transactions on Intelligent Systems and Technology • 152 citations
Naveed et al.
Culturax: A Cleaned, Enormous, And Multilingual Dataset For Large Language Models In 167 Languages (2023) • No Venue
Nguyen et al.
Hyenadna: Long-range Genomic Sequence Modeling At Single Nucleotide Resolution (2023) • Arxiv • 140 citations
Nguyen et al.
Chatgpt Or Grammarly? Evaluating Chatgpt On Grammatical Error Correction Benchmark (2023) • Arxiv • 48 citations
Wu et al.
Llasm: Large Language And Speech Model (2023) • No Venue
Shu et al.
Towards Geospatial Foundation Models Via Continual Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 57 citations
Mendieta et al.
Text2kgbench: A Benchmark For Ontology-driven Knowledge Graph Generation From Text (2023) • Lecture Notes in Computer Science • 51 citations
Mihindukulasooriya et al.
Embodiedgpt: Vision-language Pre-training Via Embodied Chain Of Thought (2023) • Arxiv • 41 citations
Mu et al.
Pmc-llama: Towards Building Open-source Language Models For Medicine (2023) • Journal of the American Medical Informatics Association • 179 citations
Wu et al.
Video-chatgpt: Towards Detailed Video Understanding Via Large Vision And Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 224 citations
Maaz et al.
Q-instruct: Improving Low-level Visual Abilities For Multi-modality Foundation Models (2023) • No Venue
Wu et al.
Enhancing CLIP With GPT-4: Harnessing Visual Descriptions As Prompts (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 45 citations
Maniparambil et al.
Mathcoder: Seamless Code Integration In Llms For Enhanced Mathematical Reasoning (2023) • No Venue
Wang et al.
Tinystories: How Small Can Language Models Be And Still Speak Coherent English? (2023) • No Venue
Ronen Eldan, Yuanzhi Li
From Sparse To Dense: GPT-4 Summarization With Chain Of Density Prompting (2023) • No Venue
Adams et al.
Is Chatgpt A Good NLG Evaluator? A Preliminary Study (2023) • Proceedings of the 4th New Frontiers in Summarization Workshop • 178 citations
Wang et al.
MEGA: Multilingual Evaluation Of Generative AI (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 76 citations
Ahuja et al.
Instructuie: Multi-task Instruction Tuning For Unified Information Extraction (2023) • Arxiv • 46 citations
Wang et al.
Docllm: A Layout-aware Generative Language Model For Multimodal Document Understanding (2023) • No Venue
Wang et al.
Improving Text Embeddings With Large Language Models (2023) • No Venue
Wang et al.
Cross-modal Contrastive Learning For Multimodal Fake News Detection (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 58 citations
Wang et al.
Large Language Models Streamline Automated Machine Learning For Clinical Studies (2023) • Nature Communications • 74 citations
Arasteh et al.
Openflamingo: An Open-source Framework For Training Large Autoregressive Vision-language Models (2023) • No Venue
Awadalla et al.
Rasa: Relation And Sensitivity Aware Representation Learning For Text-based Person Search (2023) • Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence • 73 citations
Bai et al.
Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Bai et al.
Learning To Exploit Temporal Structure For Biomedical Vision-language Processing (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Bannur et al.
Codekgc: Code Language Model For Generative Knowledge Graph Construction (2023) • ACM Transactions on Asian and Low-Resource Language Information Processing • 40 citations
Bi et al.
Nougat: Neural Optical Understanding For Academic Documents (2023) • No Venue
Blecher et al.
Spanish Pre-trained BERT Model And Evaluation Data (2023) • Arxiv • 332 citations
Cañete et al.
Multilora: Democratizing Lora For Better Multi-task Learning (2023) • No Venue
Wang et al.
On The Possibilities Of Ai-generated Text Detection (2023) • Arxiv • 50 citations
Chakraborty et al.
Alpagasus: Training A Better Alpaca With Fewer Data (2023) • No Venue
Chen et al.
Driving With Llms: Fusing Object-level Vector Modality For Explainable Autonomous Driving (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 110 citations
Chen et al.
Clip2scene: Towards Label-efficient 3D Scene Understanding By CLIP (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Chen et al.
Diversevul: A New Vulnerable Source Code Dataset For Deep Learning Based Vulnerability Detection (2023) • Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses • 136 citations
Chen et al.
Hdformer: High-order Directed Transformer For 3D Human Pose Estimation (2023) • Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23} • 44 citations
Chen et al.
Modelscope Text-to-video Technical Report (2023) • Arxiv • 46 citations
Wang et al.
Plan-and-solve Prompting: Improving Zero-shot Chain-of-thought Reasoning By Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 149 citations
Wang et al.
Internvid: A Large-scale Video-text Dataset For Multimodal Understanding And Generation (2023) • No Venue
Wang et al.
CVT-SLR: Contrastive Visual-textual Transformation For Sign Language Recognition With Variational Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Zheng et al.
GPT-RE: In-context Learning For Relation Extraction Using Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 110 citations
Wan et al.
Selformer: Molecular Representation Learning Via SELFIES Language Models (2023) • Machine Learning: Science and Technology • 43 citations
Yüksel et al.
Adapointr: Diverse Point Cloud Completion With Adaptive Geometry-aware Transformers (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 79 citations
Yu et al.
Scaling Relationship On Learning Mathematical Reasoning With Large Language Models (2023) • No Venue
Yuan et al.
Recmind: Large Language Model Powered Agent For Recommendation (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 43 citations
Wang et al.
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts (2023) • No Venue
Veen et al.
SAM On Medical Images: A Comprehensive Study On Three Prompt Modes (2023) • Arxiv • 53 citations
Cheng et al.
Rolellm: Benchmarking, Eliciting, And Enhancing Role-playing Abilities Of Large Language Models (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 51 citations
Wang et al.
A Picture Is Worth More Than 77 Text Tokens: Evaluating Clip-style Models On Dense Captions (2023) • No Venue
Urbanek et al.
Selective Structured State-spaces For Long-form Video Understanding (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Wang et al.
Agieval: A Human-centric Benchmark For Evaluating Foundation Models (2023) • Arxiv • 60 citations
Zhong et al.
On The Robustness Of Chatgpt: An Adversarial And Out-of-distribution Perspective (2023) • Arxiv • 90 citations
Wang et al.
Increasing Diversity While Maintaining Accuracy: Text Data Generation With Large Language Models And Human Interventions (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
John Joon Young Chung, Ece Kamar, Saleema Amershi
Scaling Robot Learning With Semantically Imagined Experience (2023) • Robotics: Science and Systems XIX • 57 citations
Yu et al.
Lmsys-chat-1m: A Large-scale Real-world LLM Conversation Dataset (2023) • No Venue
Zheng et al.
The Chime-7 DASR Challenge: Distant Meeting Transcription With Multiple Devices In Diverse Scenarios (2023) • 7th International Workshop on Speech Processing in Everyday Environments (CHiME 2023) • 45 citations
Cornell et al.
Efficient And Effective Text Encoding For Chinese Llama And Alpaca (2023) • Arxiv • 71 citations
Yiming Cui, Ziqing Yang, Xin Yao
Vision Grid Transformer For Document Layout Analysis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Da et al.
A Survey On Multimodal Large Language Models For Autonomous Driving (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) • 217 citations
Cui et al.
Auggpt: Leveraging Chatgpt For Text Data Augmentation (2023) • Arxiv • 98 citations
Dai et al.
The State Of Human-centered NLP Technology For Fact-checking (2023) • Information Processing & Management • 55 citations
Das et al.
A Decoder-only Foundation Model For Time-series Forecasting (2023) • Arxiv • 41 citations
Das et al.
Multi-modal Self-supervised Learning For Recommendation (2023) • Proceedings of the ACM Web Conference 2023 • 157 citations
Wei et al.
K2: A Foundation Language Model For Geoscience Knowledge Understanding And Utilization (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Deng et al.
Enhancing Chat Language Models By Scaling High-quality Instructional Conversations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 60 citations
Ding et al.
Misrobærta: Transformers Versus Misinformation (2023) • Mathematics • 41 citations
Ciprian-Octavian Truică, Elena-Simona Apostol
Lp-musiccaps: Llm-based Pseudo Music Captioning (2023) • No Venue
Doh et al.
Ferret: Refer And Ground Anything Anywhere At Any Granularity (2023) • Arxiv • 43 citations
You et al.
Enhancing Job Recommendation Through Llm-based Generative Adversarial Networks (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 49 citations
Du et al.
DNABERT-2: Efficient Foundation Model And Benchmark For Multi-species Genome (2023) • Arxiv • 139 citations
Zhou et al.
Lmdrive: Closed-loop End-to-end Driving With Large Language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 76 citations
Shao et al.
Pdftriage: Question Answering Over Long, Structured Documents (2023) • No Venue
Saad-Falcon et al.
Detecting And Grounding Multi-modal Media Manipulation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Rui Shao, Tianxing Wu, Ziwei Liu
GPT-3.5, GPT-4, Or BARD? Evaluating Llms Reasoning Ability In Zero-shot Setting And Performance Boosting Through Prompts (2023) • Natural Language Processing Journal • 69 citations
Espejel et al.
Unified Pre-training With Pseudo Texts For Text-to-image Person Re-identification (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 48 citations
Shao et al.
A Multi-task Multi-stage Transitional Training Framework For Neural Chat Translation (2023) • Proceedings of the 2023 ACM International Conference on Multimedia Retrieval • 48 citations
Zhou et al.
Fedmultimodal: A Benchmark For Multimodal Federated Learning (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Feng et al.
Semeval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (multiconer 2) (2023) • Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) • 43 citations
Fetahu et al.
Multiconer V2: A Large Multilingual Dataset For Fine-grained And Noisy Named Entity Recognition (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Fetahu et al.
Medalign: A Clinician-generated Dataset For Instruction Following With Electronic Medical Records (2023) • No Venue
Fleming et al.
Mathematical Capabilities Of Chatgpt (2023) • NeurIPS 2023 Datasets and Benchmarks • 293 citations
Frieder et al.
Chatgpt For Vulnerability Detection, Classification, And Repair: How Far Are We? (2023) • 2023 30th Asia-Pacific Software Engineering Conference (APSEC) • 56 citations
Fu et al.
Datacomp: In Search Of The Next Generation Of Multimodal Datasets (2023) • Arxiv • 72 citations
Gadre et al.
Bias And Fairness In Large Language Models: A Survey (2023) • Computational Linguistics • 255 citations
Gallegos et al.
Distil-whisper: Robust Knowledge Distillation Via Large-scale Pseudo Labelling (2023) • No Venue
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
A Comprehensive Capability Analysis Of GPT-3 And GPT-3.5 Series Models (2023) • Arxiv • 181 citations
Ye et al.
On The Origin Of Llms: An Evolutionary Tree And Graph For 15,821 Large Language Models (2023) • No Venue
Sarah Gao, Andrew Kean Gao
G-llava: Solving Geometric Problem With Multi-modal Large Language Model (2023) • No Venue
Gao et al.
Funasr: A Fundamental End-to-end Speech Recognition Toolkit (2023) • INTERSPEECH 2023 • 44 citations
Gao et al.
Large Language Models Are Versatile Decomposers: Decompose Evidence And Questions For Table-based Reasoning (2023) • SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 43 citations
Ye et al.
A Picture Is Worth A Thousand Words: Principled Recaptioning Improves Image Generation (2023) • No Venue
Segalis et al.
Flacuna: Unleashing The Problem Solving Power Of Vicuna Using FLAN Fine-tuning (2023) • No Venue
Ghosal et al.
Can Chatgpt Replace Traditional KBQA Models? An In-depth Analysis Of The Question Answering Performance Of The GPT LLM Family (2023) • Lecture Notes in Computer Science • 66 citations
Tan et al.
Adding Conditional Control To Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 2580 citations
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Medagents: Large Language Models As Collaborators For Zero-shot Medical Reasoning (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 53 citations
Tang et al.
PIPPA: A Partially Synthetic Conversational Dataset (2023) • No Venue
Tear Gosling, Alpin Dale, Yinhe Zheng
Detecting And Preventing Hallucinations In Large Vision Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 93 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
Text With Knowledge Graph Augmented Transformer For Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Gu et al.
Editing Large Language Models: Problems, Methods, And Opportunities (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 55 citations
Yao et al.
Legalbench: A Collaboratively Built Benchmark For Measuring Legal Reasoning In Large Language Models (2023) • SSRN Electronic Journal • 77 citations
Guha et al.
Verigen: A Large Language Model For Verilog Code Generation (2023) • ACM Transactions on Design Automation of Electronic Systems • 129 citations
Thakur et al.
Connecting Large Language Models With Evolutionary Algorithms Yields Powerful Prompt Optimizers (2023) • No Venue
Guo et al.
PPTC Benchmark: Evaluating Large Language Models For Powerpoint Task Completion (2023) • No Venue
Guo et al.
Chatie: Zero-shot Information Extraction Via Chatting With Chatgpt (2023) • Arxiv • 141 citations
Wei et al.
Medalpaca -- An Open-source Collection Of Medical Conversational AI Models And Training Data (2023) • Arxiv • 102 citations
Han et al.
Stylegan-t: Unlocking The Power Of Gans For Fast Large-scale Text-to-image Synthesis (2023) • Arxiv • 59 citations
Sauer et al.
Leveraging Large Language Models For Sequential Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 88 citations
Harte et al.
Annollm: Making Large Language Models To Be Better Crowdsourced Annotators (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track) • 46 citations
He et al.
Align And Attend: Multimodal Summarization With Dual Contrastive Losses (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
He et al.
Reinforcement Learning-based Counter-misinformation Response Generation: A Case Study Of COVID-19 Vaccine Misinformation (2023) • Proceedings of the ACM Web Conference 2023 • 44 citations
Bing He, Mustaque Ahamad, Srijan Kumar
A Survey On Uncertainty Quantification Methods For Deep Learning (2023) • Arxiv • 50 citations
He et al.
From Words To Watts: Benchmarking The Energy Costs Of Large Language Model Inference (2023) • 2023 IEEE High Performance Extreme Computing Conference (HPEC) • 104 citations
Samsi et al.
Biomedclip: A Multimodal Biomedical Foundation Model Pretrained From Fifteen Million Scientific Image-text Pairs (2023) • Arxiv • 87 citations
Zhang et al.
Copiloting The Copilots: Fusing Large Language Models With Completion Engines For Automated Program Repair (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 75 citations
Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
LRM: Large Reconstruction Model For Single Image To 3D (2023) • No Venue
Hong et al.
Large Language Models Are Zero-shot Rankers For Recommender Systems (2023) • Lecture Notes in Computer Science • 155 citations
Hou et al.
Bad Actor, Good Advisor: Exploring The Role Of Large Language Models In Fake News Detection (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 104 citations
Hu et al.
RSGPT: A Remote Sensing Vision Language Model And Benchmark (2023) • ISPRS Journal of Photogrammetry and Remote Sensing • 46 citations
Hu et al.
Llm-adapters: An Adapter Family For Parameter-efficient Fine-tuning Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 128 citations
Hu et al.
Vid2seq: Large-scale Pretraining Of A Visual Language Model For Dense Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 179 citations
Yang et al.
Towards Interpretable Mental Health Analysis With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 84 citations
Yang et al.
How To Do Things With Deep Learning Code (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 41 citations
Minh Hua, Rita Raley
Swin3d: A Pretrained Transformer Backbone For 3D Indoor Scene Understanding (2023) • Computational Visual Media • 47 citations
Yang et al.
C-eval: A Multi-level Multi-discipline Chinese Evaluation Suite For Foundation Models (2023) • Arxiv • 89 citations
Huang et al.
Segment And Caption Anything (2023) • No Venue
Huang et al.
Diversity-aware Meta Visual Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Huang et al.
Make-an-audio: Text-to-audio Generation With Prompt-enhanced Diffusion Models (2023) • Arxiv • 46 citations
Huang et al.
Med-halt: Medical Domain Hallucination Test For Large Language Models (2023) • Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL) • 54 citations
Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu
GPQA: A Graduate-level Google-proof Q&A Benchmark (2023) • No Venue
Rein et al.
Universalner: Targeted Distillation From Large Language Models For Open Named Entity Recognition (2023) • No Venue
Zhou et al.
Exploring The Limits Of Chatgpt For Query Or Aspect-based Text Summarization (2023) • Arxiv • 89 citations
Yang et al.
Fusecap: Leveraging Large Language Models For Enriched Fused Image Captions (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) • 47 citations
Rotstein et al.
Starvector: Generating Scalable Vector Graphics Code From Images (2023) • No Venue
Rodriguez et al.
Quilt-1m: One Million Image-text Pairs For Histopathology (2023) • Arxiv • 52 citations
Ikezogwo et al.
From Image To Language: A Critical Analysis Of Visual Question Answering (VQA) Approaches, Challenges, And Opportunities (2023) • Information Fusion • 58 citations
Ishmam et al.
Conceptfusion: Open-set Multimodal 3D Mapping (2023) • Robotics: Science and Systems XIX • 142 citations
Jatavallabhula et al.
Camels In A Changing Climate: Enhancing LM Adaptation With Tulu 2 (2023) • No Venue
Ivison et al.
A Comprehensive Evaluation Of Large Language Models On Benchmark Biomedical Text Processing Tasks (2023) • Computers in Biology and Medicine • 61 citations
Jahan et al.
Llmlingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jiang et al.
Llm-blender: Ensembling Large Language Models With Pairwise Ranking And Generative Fusion (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
Active Retrieval Augmented Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 216 citations
Jiang et al.
Clip-count: Towards Text-guided Zero-shot Object Counting (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 50 citations
Ruixiang Jiang, Lingbo Liu, Changwen Chen
Pixellm: Pixel Reasoning With Large Multimodal Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Ren et al.
Large Language Models As Zero-shot Human Models For Human-robot Interaction (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 52 citations
Bowen Zhang, Harold Soh
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (2023) • Arxiv • 111 citations
Zhang et al.
Sabiá: Portuguese Large Language Models (2023) • Lecture Notes in Computer Science • 46 citations
Pires et al.
Video-llava: Learning United Visual Representation By Alignment Before Projection (2023) • No Venue
Lin et al.
DIN-SQL: Decomposed In-context Learning Of Text-to-sql With Self-correction (2023) • Arxiv • 53 citations
Mohammadreza Pourreza, Davood Rafiei
Univtg: Towards Unified Video-language Temporal Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Lin et al.
Crowdclip: Unsupervised Crowd Counting Via Vision-language Model (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 58 citations
Liang et al.
Segment Everything Everywhere All At Once (2023) • Arxiv • 151 citations
Zou et al.
PMC-CLIP: Contrastive Language-image Pre-training Using Biomedical Documents (2023) • Lecture Notes in Computer Science • 110 citations
Lin et al.
Text Is All You Need: Learning Language Representations For Sequential Recommendation (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 103 citations
Li et al.
Teach Llms To Personalize -- An Approach Inspired By Writing Education (2023) • No Venue
Li et al.
Revisiting K-nn For Fine-tuning Pre-trained Language Models (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 61 citations
Li et al.
Seed-bench: Benchmarking Multimodal Llms With Generative Comprehension (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 92 citations
Li et al.
Skcoder: A Sketch-based Approach For Automatic Code Generation (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 42 citations
Li et al.
Synthetic Data Generation With Large Language Models For Text Classification: Potential And Limitations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 70 citations
Li et al.
Videochat: Chat-centric Video Understanding (2023) • Arxiv • 90 citations
Li et al.
Unsafe Diffusion: On The Generation Of Unsafe Images And Hateful Memes From Text-to-image Models (2023) • Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security • 42 citations
Qu et al.
Llmrec: Large Language Models With Graph Augmentation For Recommendation (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 134 citations
Wei et al.
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 65 citations
Xu et al.
Pointllm: Empowering Large Language Models To Understand Point Clouds (2023) • Lecture Notes in Computer Science • 45 citations
Xu et al.
Multi: Efficient Video-and-language Understanding With Text-guided Multiway-sampler And Multiple Choice Modeling (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 50 citations
Xu et al.
Imagereward: Learning And Evaluating Human Preferences For Text-to-image Generation (2023) • Arxiv • 99 citations
Xu et al.
Demystifying CLIP Data (2023) • No Venue
Xu et al.
Knowledge-enhanced Visual-language Pre-training On Chest Radiology Images (2023) • Nature Communications • 134 citations
Zhang et al.
Learning Disentangled Semantic Spaces Of Explanations Via Invertible Neural Networks (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 79 citations
Yingji Zhang, Danilo S. Carvalho, André Freitas
Magicbrush: A Manually Annotated Dataset For Instruction-guided Image Editing (2023) • No Venue
Zhang et al.
Scaling Up Gans For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 313 citations
Kang et al.
SMART-LLM: Smart Multi-agent Robot Task Planning Using Large Language Models (2023) • 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 79 citations
Shyam Sundar Kannan, Vishnunandan L. N. Venkatesh, Byung-Cheol Min
Language-driven Representation Learning For Robotics (2023) • Robotics: Science and Systems XIX • 47 citations
Karamcheti et al.
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior (2023) • No Venue
Khandelwal et al.
Gptaraeval: A Comprehensive Evaluation Of Chatgpt On Arabic NLP (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 41 citations
Khondaker et al.
Prometheus: Inducing Fine-grained Evaluation Capability In Language Models (2023) • No Venue
Kim et al.
A Survey Of Learning-based Automated Program Repair (2023) • ACM Transactions on Software Engineering and Methodology • 77 citations
Zhang et al.
Is Chatgpt A General-purpose Natural Language Processing Task Solver? (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 410 citations
Qin et al.
The Troubling Emergence Of Hallucination In Large Language Models -- An Extensive Definition, Quantification, And Prescriptive Remediations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 56 citations
Rawte et al.
Pick-a-pic: An Open Dataset Of User Preferences For Text-to-image Generation (2023) • Arxiv • 41 citations
Kirstain et al.
Gender Bias And Stereotypes In Large Language Models (2023) • CI '23: Collective Intelligence Conference • 216 citations
Hadas Kotek, Rikker Dockum, David Q. Sun
RS5M And Georsclip: A Large Scale Vision-language Dataset And A Large Vision-language Model For Remote Sensing (2023) • IEEE Transactions on Geoscience and Remote Sensing • 43 citations
Zhang et al.
Vision Language Models In Autonomous Driving: A Survey And Outlook (2023) • IEEE Transactions on Intelligent Vehicles • 46 citations
Zhou et al.
Personalize Segment Anything Model With One Shot (2023) • Arxiv • 65 citations
Zhang et al.
Sentiment Analysis In The Era Of Large Language Models: A Reality Check (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 161 citations
Zhang et al.
MADLAD-400: A Multilingual And Document-level Large Audited Dataset (2023) • No Venue
Kudugunta et al.
Geochat: Grounded Large Vision-language Model For Remote Sensing (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Kuckreja et al.
Summit: Iterative Text Summarization Via Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Haopeng Zhang, Xiao Liu, Jiawei Zhang
Glamm: Pixel Grounding Large Multimodal Model (2023) • No Venue
Rasheed et al.
Chatgpt: Beginning Of An End Of Manual Linguistic Data Annotation? Use Case Of Automatic Genre Identification (2023) • Arxiv • 64 citations
Taja Kuzman, Igor Mozetič, Nikola Ljubešić
Vision-language Models For Vision Tasks: A Survey (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 403 citations
Zhang et al.
LISA: Reasoning Segmentation Via Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Lai et al.
Chatgpt Beyond English: Towards A Comprehensive Evaluation Of Large Language Models In Multilingual Learning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 91 citations
Lai et al.
Evaluation Of Chatgpt For Nlp-based Mental Health Applications (2023) • Arxiv • 54 citations
Bishal Lamichhane
A Systematic Study And Comprehensive Evaluation Of Chatgpt On Benchmark Datasets (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 69 citations
Laskar et al.
OBELICS: An Open Web-scale Filtered Dataset Of Interleaved Image-text Documents (2023) • No Venue
Laurençon et al.
The Bigscience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset (2023) • Arxiv • 65 citations
Laurençon et al.
Platypus: Quick, Cheap, And Powerful Refinement Of Llms (2023) • No Venue
Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis (2023) • No Venue
Qin et al.
In Chatgpt We Trust? Measuring And Characterizing The Reliability Of Chatgpt (2023) • Arxiv • 71 citations
Shen et al.
Video-llama: An Instruction-tuned Audio-visual Language Model For Video Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 367 citations
Hang Zhang, Xin Li, Lidong Bing
Filter-enhanced MLP Is All You Need For Sequential Recommendation (2022) • Proceedings of the ACM Web Conference 2022 • 283 citations
Zhou et al.
Few-shot Class-incremental Learning By Sampling Multi-phase Tasks (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 108 citations
Zhou et al.
ELEVATER: A Benchmark And Toolkit For Evaluating Language-augmented Visual Models (2022) • Arxiv • 64 citations
Li et al.
Compositional Temporal Grounding With Structured Variational Cross-graph Correspondence Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Li et al.
Automating Code Review Activities By Large-scale Pre-training (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 129 citations
Li et al.
BLIP: Bootstrapping Language-image Pre-training For Unified Vision-language Understanding And Generation (2022) • Arxiv • 850 citations
Li et al.
CLMLF: A Contrastive Learning And Multi-layer Fusion Method For Multimodal Sentiment Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 85 citations
Li et al.
Envedit: Environment Editing For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Jialu Li, Hao Tan, Mohit Bansal
User-centric Conversational Recommendation With Multi-aspect User Modeling (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Li et al.
Detect Rumors In Microblog Posts For Low-resource Domains Via Adversarial Contrastive Learning (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 41 citations
Lin et al.
Data Cards: Purposeful And Transparent Dataset Documentation For Responsible AI (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 140 citations
Mahima Pushkarna, Andrew Zaldivar, Oddur Kjartansson
Visual-language Navigation Pretraining Via Prompt-based Environmental Self-exploration (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liang et al.
Proposalclip: Unsupervised Open-category Object Proposal Generation Via Exploiting CLIP Cues (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Shi et al.
Egocentric Video-language Pretraining (2022) • Arxiv • 45 citations
Lin et al.
RASAT: Integrating Relational Structures Into Pretrained Seq2seq Model For Text-to-sql (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 60 citations
Qi et al.
Transrac: Encoding Multi-scale Temporal Correlation With Transformers For Repetitive Action Counting (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Hu et al.
Large Language Models Can Self-improve (2022) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 96 citations
Huang et al.
Swintextspotter: Scene Text Spotting Via Better Synergy Between Text Detection And Text Recognition (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 128 citations
Huang et al.
POLITICS: Pretraining With Same-story Article Comparison For Ideology Prediction And Stance Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 40 citations
Liu et al.
Tganet: Text-guided Attention For Improved Polyp Segmentation (2022) • Lecture Notes in Computer Science • 141 citations
Tomar et al.
Discovering Language Model Behaviors With Model-written Evaluations (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 47 citations
Perez et al.
A Prompting-based Approach For Adversarial Example Generation And Robustness Enhancement (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 108 citations
Yang et al.
Zero-shot Video Question Answering Via Frozen Bidirectional Language Models (2022) • Arxiv • 64 citations
Yang et al.
Robots Enact Malignant Stereotypes (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 43 citations
Hundt et al.
Chinese CLIP: Contrastive Vision-language Pretraining In Chinese (2022) • Arxiv • 51 citations
Yang et al.
Scaling Up Models And Data With t5x And seqio (2022) • Arxiv • 47 citations
Roberts et al.
Entity-enhanced Adaptive Reconstruction Network For Weakly Supervised Referring Expression Grounding (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 48 citations
Liu et al.
Codefill: Multi-token Code Completion By Jointly Learning From Structure And Naming Sequences (2022) • ICSE '22: 44th International Conference on Software Engineering • 67 citations
Maliheh Izadi, Roberta Gismondi, Georgios Gousios
Effective Token Graph Modeling Using A Novel Labeling Strategy For Structured Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 71 citations
Shi et al.
On The Importance Of Building High-quality Training Datasets For Neural Code Search (2022) • Proceedings of the 44th International Conference on Software Engineering • 61 citations
Sun et al.
UMT: Unified Multi-modal Transformers For Joint Video Moment Retrieval And Highlight Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Liu et al.
Reducing The Vision And Language Bias For Temporal Sentence Grounding (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 45 citations
Daizong Liu, Xiaoye Qu, Wei Hu
Asymmetric Cross-scale Alignment For Text-based Person Search (2022) • IEEE Transactions on Multimedia • 54 citations
Ji et al.
Partslip: Low-shot Part Segmentation For 3D Point Clouds Via Pretrained Image-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liu et al.
Pseudo-q: Generating Pseudo Language Queries For Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Jiang et al.
Automatic Text Summarization Methods: A Comprehensive Review (2022) • SN Computer Science • 68 citations
Divakar Yadav, Jalpa Desai, Arun Kumar Yadav
GL-RG: Global-local Representation Granularity For Video Captioning (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 52 citations
Yan et al.
Chart-to-text: A Large-scale Benchmark For Chart Summarization (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 70 citations
Kantharaj et al.
Local-global Context Aware Transformer For Language-guided Video Segmentation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 72 citations
Liang et al.
Fantastic Questions And Where To Find Them: Fairytaleqa -- An Authentic Dataset For Narrative Comprehension (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Xu et al.
Prosocialdialog: A Prosocial Backbone For Conversational Agents (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 42 citations
Kim et al.
Perturbation Augmentation For Fairer NLP (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 50 citations
Qian et al.
Clip-vip: Adapting Pre-trained Image-text Model To Video-language Representation Alignment (2022) • Arxiv • 53 citations
Xue et al.
ULIP: Learning A Unified Representation Of Language, Images, And Point Clouds For 3D Understanding (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 169 citations
Xue et al.
Leveraging Language Foundation Models For Human Mobility Forecasting (2022) • Proceedings of the 30th International Conference on Advances in Geographic Information Systems • 48 citations
Hao Xue, Bhanu Prakash Voutharoja, Flora D. Salim
An Empirical Survey On Long Document Summarization: Datasets, Models And Metrics (2022) • ACM Computing Surveys • 69 citations
Koh et al.
A Two-stream Amr-enhanced Model For Document-level Event Argument Extraction (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 47 citations
Xu et al.
Beyond A Pre-trained Object Detector: Cross-modal Textual And Visual Context For Image Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Chia-Wen Kuo, Zsolt Kira
Multi-task Learning With Multi-query Transformer For Dense Prediction (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 45 citations
Xu et al.
Groupvit: Semantic Segmentation Emerges From Text Supervision (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 352 citations
Xu et al.
Learn From Structural Scope: Improving Aspect-level Sentiment Analysis With Hybrid Graph Convolutional Networks (2022) • Neurocomputing • 45 citations
Xu et al.
PEER: A Comprehensive And Multi-task Benchmark For Protein Sequence Understanding (2022) • Arxiv • 58 citations
Xu et al.
Multihiertt: Numerical Reasoning Over Multi Hierarchical Tabular And Textual Data (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Zhao et al.
Coauthor: Designing A Human-ai Collaborative Writing Dataset For Exploring Language Model Capabilities (2022) • CHI '22: CHI Conference on Human Factors in Computing Systems • 223 citations
Mina Lee, Percy Liang, Qian Yang
Improving Mispronunciation Detection With Wav2vec2-based Momentum Pseudo-labeling For Accentedness And Intelligibility Assessment (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Yang et al.
Empathetic Conversational Systems: A Review Of Current Advances, Gaps, And Opportunities (2022) • IEEE Transactions on Affective Computing • 44 citations
Aravind Sesagiri Raamkumar, Yinping Yang
Benchmarking Large Language Models For Automated Verilog RTL Code Generation (2022) • 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 112 citations
Thakur et al.
Progen2: Exploring The Boundaries Of Protein Language Models (2022) • Cell Systems • 286 citations
Nijkamp et al.
NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task (2022) • Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP) • 48 citations
Abdul-Mageed et al.
A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models For African News Translation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Adelani et al.
Learning Audio-video Modalities From Image Captions (2022) • Lecture Notes in Computer Science • 48 citations
Nagrani et al.
Large Language Models Are Few-shot Clinical Information Extractors (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 212 citations
Agrawal et al.
Chemberta-2: Towards Chemical Foundation Models (2022) • Arxiv • 120 citations
Ahmad et al.
MTEB: Massive Text Embedding Benchmark (2022) • Arxiv • 57 citations
Muennighoff et al.
USB: A Unified Semi-supervised Learning Benchmark For Classification (2022) • Arxiv • 42 citations
Wang et al.
Towards Data-efficient Detection Transformers (2022) • Lecture Notes in Computer Science • 52 citations
Wang et al.
Self-consistency Improves Chain Of Thought Reasoning In Language Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 365 citations
Wang et al.
Scene Text Recognition With Permuted Autoregressive Sequence Models (2022) • Lecture Notes in Computer Science • 183 citations
Darwin Bautista, Rowel Atienza
Text Embeddings By Weakly-supervised Contrastive Pre-training (2022) • Arxiv • 107 citations
Wang et al.
Simkgc: Simple Contrastive Knowledge Graph Completion With Pre-trained Language Models (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 151 citations
Wang et al.
Refined: An Efficient Zero-shot-capable Approach To End-to-end Entity Linking (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track • 53 citations
Ayoola et al.
Multimae: Multi-modal Multi-task Masked Autoencoders (2022) • Lecture Notes in Computer Science • 186 citations
Bachmann et al.
Promptsource: An Integrated Development Environment And Repository For Natural Language Prompts (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 148 citations
Bach et al.
Training A Helpful And Harmless Assistant With Reinforcement Learning From Human Feedback (2022) • Arxiv • 346 citations
Bai et al.
Medclip: Contrastive Learning From Unpaired Medical Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 402 citations
Wang et al.
Building Machine Translation Systems For The Next Thousand Languages (2022) • Arxiv • 43 citations
Bapna et al.
Lila: A Unified Benchmark For Mathematical Reasoning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 70 citations
Mishra et al.
RT-1: Robotics Transformer For Real-world Control At Scale (2022) • Robotics: Science and Systems 2023 • 372 citations
Brohan et al.
Prompting GPT-3 To Be Reliable (2022) • Arxiv • 68 citations
Si et al.
Numglue: A Suite Of Fundamental Yet Challenging Mathematical Reasoning Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 117 citations
Mishra et al.
Are Transformers Effective For Time Series Forecasting? (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 1544 citations
Zeng et al.
Exploiting Unlabeled Data With Vision And Language Models For Object Detection (2022) • Lecture Notes in Computer Science • 74 citations
Zhao et al.
A Model-agnostic Data Manipulation Method For Persona-based Dialogue Generation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 40 citations
Cao et al.
Open-vocabulary DETR With Conditional Matching (2022) • Lecture Notes in Computer Science • 155 citations
Zang et al.
Generating Data To Mitigate Spurious Correlations In Natural Language Inference Datasets (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Wu et al.
Roentgen: Vision-language Foundation Model For Chest X-ray Generation (2022) • Arxiv • 55 citations
Chambon et al.
Adapting Pretrained Vision-language Foundational Models To Medical Imaging Domains (2022) • Foundation Models for Decision Making Workshop at Neural Information Processing Systems 2022 • 43 citations
Chambon et al.
Unified Vision And Language Prompt Learning (2022) • Arxiv • 54 citations
Zang et al.
Large Language Models Are Few(1)-shot Table Reasoners (2022) • Findings of the Association for Computational Linguistics: EACL 2023 • 41 citations
Wenhu Chen
Gatehub: Gated History Unit With Background Suppression For Online Action Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Chen et al.
Murag: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 65 citations
Chen et al.
Program Of Thoughts Prompting: Disentangling Computation From Reasoning For Numerical Reasoning Tasks (2022) • Arxiv • 110 citations
Chen et al.
A Simple Multi-modality Transfer Learning Baseline For Sign Language Translation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 120 citations
Chen et al.
What Matters In Language Conditioned Robotic Imitation Learning Over Unstructured Data (2022) • IEEE Robotics and Automation Letters • 49 citations
Oier Mees, Lukas Hermann, Wolfram Burgard
Bidirectional Cross-modal Knowledge Exploration For Video Recognition With Pre-trained Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Wu et al.
Locating And Editing Factual Associations In GPT (2022) • Arxiv • 172 citations
Meng et al.
Winoground: Probing Vision And Language Models For Visio-linguistic Compositionality (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Thrush et al.
Promda: Prompt-based Data Augmentation For Low-resource NLU Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Wang et al.
X-trans2cap: Cross-modal Knowledge Transfer Using Transformer For 3D Dense Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Yuan et al.
Phenaki: Variable Length Video Generation From Open Domain Textual Description (2022) • Arxiv • 78 citations
Villegas et al.
The Moral Integrity Corpus: A Benchmark For Ethical Dialogue Systems (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Ziems et al.
Ernie-layout: Layout Knowledge Enhanced Pre-training For Visually-rich Document Understanding (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 53 citations
Peng et al.
Medmcqa: A Large-scale Multi-subject Multi-choice Dataset For Medical Domain Question Answering (2022) • ACM Conference on Health Inference and Learning (CHIL) 2022 • 72 citations
Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu
Heterogeneous Ensemble Knowledge Transfer For Training Large Models In Federated Learning (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 82 citations
Cho et al.
Fine-grained Image Captioning With CLIP Reward (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 52 citations
Cho et al.
Multiconer: A Large-scale Multilingual Dataset For Complex Named Entity Recognition (2022) • Arxiv • 82 citations
Malmasi et al.
Large Language Models Encode Clinical Knowledge (2022) • Nature • 1963 citations
Singhal et al.
M-SENA: An Integrated Platform For Multimodal Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 58 citations
Mao et al.
ZSON: Zero-shot Object-goal Navigation Using Multimodal Goal Embeddings (2022) • Arxiv • 41 citations
Majumdar et al.
Storydall-e: Adapting Pretrained Text-to-image Transformers For Story Continuation (2022) • Lecture Notes in Computer Science • 47 citations
Adyasha Maharana, Darryl Hannan, Mohit Bansal
"this Is My Unicorn, Fluffy": Personalizing Frozen Vision-language Representations (2022) • Lecture Notes in Computer Science • 42 citations
Cohen et al.
Clip-art: Contrastive Pre-training For Fine-grained Art Classification (2022) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 88 citations
Marcos V. Conde, Kerem Turgutlu
No Language Left Behind: Scaling Human-centered Machine Translation (2022) • Arxiv • 354 citations
NLLB Team et al.
See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) • Lecture Notes in Computer Science • 102 citations
Shu et al.
I2mvformer: Large Language Model Generated Multi-view Document Supervision For Zero-shot Image Classification (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Naeem et al.
A Survey On Legal Judgment Prediction: Datasets, Metrics, Models And Challenges (2022) • IEEE Access • 46 citations
Cui et al.
Teaching Small Language Models To Reason (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 45 citations
Magister et al.
Structured Pruning Learns Compact And Accurate Models (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 99 citations
Mengzhou Xia, Zexuan Zhong, Danqi Chen
MISC: A Mixed Strategy-aware Model Integrating COMET For Emotional Support Conversation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Tu et al.
Improving The Factual Correctness Of Radiology Report Generation With Semantic Rewards (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 41 citations
Delbrouck et al.
COLD: A Benchmark For Chinese Offensive Language Detection (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 66 citations
Deng et al.
Visual Speech Recognition For Multiple Languages In The Wild (2022) • Nature Machine Intelligence • 130 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Reading-strategy Inspired Visual Representation Learning For Text-to-video Retrieval (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 67 citations
Dong et al.
Teaching Structured Vision&language Concepts To Vision&language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Doveh et al.
On The Origin Of Hallucinations In Conversational Models: Is It The Datasets Or The Models? (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 101 citations
Dziri et al.
One Embedder, Any Task: Instruction-finetuned Text Embeddings (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 68 citations
Su et al.
Ontology-enhanced Prompt-tuning For Few-shot Learning (2022) • Proceedings of the ACM Web Conference 2022 • 57 citations
Ye et al.
Mintrec: A New Dataset For Multimodal Intent Recognition (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 41 citations
Zhang et al.
3D-SPS: Single-stage 3D Visual Grounding Via Referred Point Progressive Selection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Luo et al.
Practical Program Repair In The Era Of Large Pre-trained Language Models (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 236 citations
Chunqiu Steven Xia, Yuxiang Wei, Lingming Zhang
DR-GAN: Distribution Regularization For Text-to-image Generation (2022) • IEEE Transactions on Neural Networks and Learning Systems • 43 citations
Tan et al.
Mapping Global Dynamics Of Benchmark Creation And Saturation In Artificial Intelligence (2022) • Nature Communications • 55 citations
Ott et al.
Dynatask: A Framework For Creating Dynamic AI Benchmark Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 40 citations
Thrush et al.
Document-level Relation Extraction With Adaptive Focal Loss And Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 101 citations
Tan et al.
Transformer-based Language Models For Software Vulnerability Detection (2022) • ACSAC: Annual Computer Security Applications Conference • 87 citations
Thapa et al.
Evaluating Mixed-initiative Conversational Search Systems Via User Simulation (2022) • Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining • 42 citations
Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani
How To Keep Text Private? A Systematic Review Of Deep Learning Methods For Privacy-preserving Natural Language Processing (2022) • Artificial Intelligence Review • 57 citations
Samuel Sousa, Roman Kern
Can Large Language Models Reason About Medical Questions? (2022) • Patterns • 138 citations
Liévin et al.
Unified Structure Generation For Universal Information Extraction (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 345 citations
Lu et al.
Bridging Video-text Retrieval With Multiple Choice Questions (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ge et al.
LAION-5B: An Open Large-scale Dataset For Training Next Generation Image-text Models (2022) • Arxiv • 1032 citations
Schuhmann et al.
Zero-shot Text Classification With Self-training (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Gera et al.
Leveraging Unimodal Self-supervised Learning For Multimodal Audio-visual Speech Recognition (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 42 citations
Pan et al.
Why Does Surprisal From Larger Transformer-based Language Models Provide A Poorer Fit To Human Reading Times? (2022) • Transactions of the Association for Computational Linguistics • 51 citations
Byung-Doh Oh, William Schuler
A-OKVQA: A Benchmark For Visual Question Answering Using World Knowledge (2022) • Lecture Notes in Computer Science • 162 citations
Schwenk et al.
ASQA: Factoid Questions Meet Long-form Answers (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 43 citations
Stelmakh et al.
Learn To Explain: Multimodal Reasoning Via Thought Chains For Science Question Answering (2022) • Arxiv • 214 citations
Lu et al.
A Sequence-to-sequence Approach For Document-level Relation Extraction (2022) • Proceedings of the 21st Workshop on Biomedical Language Processing • 50 citations
John Giorgi, Gary D. Bader, Bo Wang
Dynamic Prompt Learning Via Policy Gradient For Semi-structured Mathematical Reasoning (2022) • Arxiv • 41 citations
Lu et al.
X-pool: Cross-modal Language-video Attention For Text-video Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 170 citations
Gorti et al.
News Summarization And Evaluation In The Era Of GPT-3 (2022) • Arxiv • 180 citations
Tanya Goyal, Junyi Jessy Li, Greg Durrett
Unixcoder: Unified Cross-modal Pre-training For Code Representation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 448 citations
Guo et al.
How Would Stance Detection Techniques Evolve After The Launch Of Chatgpt? (2022) • Arxiv • 66 citations
Zhang et al.
Self-critiquing Models For Assisting Human Evaluators (2022) • Arxiv • 46 citations
Saunders et al.
Speciesist Bias In AI -- How AI Applications Perpetuate Discrimination And Unfair Outcomes Against Animals (2022) • AI and Ethics • 62 citations
Hagendorff et al.
Mucgec: A Multi-reference Multi-source Evaluation Dataset For Chinese Grammatical Error Correction (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 51 citations
Zhang et al.
Temporal Alignment Networks For Long-term Video (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Tengda Han, Weidi Xie, Andrew Zisserman
Twhin-bert: A Socially-enriched Pre-trained Language Model For Multilingual Tweet Representations At Twitter (2022) • KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 43 citations
Zhang et al.
Fengshenbang 1.0: Being The Foundation Of Chinese Cognitive Intelligence (2022) • Arxiv • 44 citations
Zhang et al.
Can Machines Help Us Answering Question 16 In Datasheets, And In Turn Reflecting On Inappropriate Content? (2022) • FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency • 41 citations
Patrick Schramowski, Christopher Tauchmann, Kristian Kersting
WANLI: Worker And AI Collaboration For Natural Language Inference Dataset Creation (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 102 citations
Liu et al.
Vitaev2: Vision Transformer Advanced By Exploring Inductive Bias For Image Recognition And Beyond (2022) • International Journal of Computer Vision • 173 citations
Zhang et al.
Pile Of Law: Learning Responsible Data Filtering From The Law And A 256GB Open-source Legal Dataset (2022) • Arxiv • 43 citations
Henderson et al.
Reclip: A Strong Zero-shot Baseline For Referring Expression Comprehension (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Subramanian et al.
Bridging The Gap Between Learning In Discrete And Continuous Environments For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Hong et al.
From Discrimination To Generation: Knowledge Graph Completion With Generative Transformer (2022) • WWW '22: The ACM Web Conference 2022 • 63 citations
Xie et al.
Unnatural Instructions: Tuning Language Models With (almost) No Human Labor (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Honovich et al.
TRUE: Re-evaluating Factual Consistency Evaluation (2022) • Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering • 47 citations
Honovich et al.
Graphmae: Self-supervised Masked Graph Autoencoders (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 441 citations
Hou et al.
Context-aware Biaffine Localizing Network For Temporal Sentence Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 128 citations
Liu et al.
Locate Then Segment: A Strong Pipeline For Referring Image Segmentation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Jing et al.
Augmenting Sequential Recommendation With Pseudo-prior Items Via Reversely Pre-training Transformer (2021) • SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 116 citations
Liu et al.
Unit: Multimodal Multitask Learning With A Unified Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 224 citations
Ronghang Hu, Amanpreet Singh
Signbert: Pre-training Of Hand-model-aware Representation For Sign Language Recognition (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 89 citations
Hu et al.
Visually Grounded Reasoning Across Languages And Cultures (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 84 citations
Liu et al.
Consert: A Contrastive Framework For Self-supervised Sentence Representation Transfer (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 466 citations
Yan et al.
Reformer: The Relational Transformer For Image Captioning (2021) • MM '22: The 30th ACM International Conference on Multimedia • 45 citations
Xuewen Yang, Yingru Liu, Xin Wang
Discriminative Triad Matching And Reconstruction For Weakly Referring Expression Grounding (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 56 citations
Sun et al.
Efficient Attentions For Long Document Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 121 citations
Huang et al.
Whiteningbert: An Easy Unsupervised Sentence Embedding Approach (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 70 citations
Huang et al.
Look Before You Leap: Learning Landmark Features For One-stage Visual Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 111 citations
Huang et al.
Graph-enhanced Multi-task Learning Of Multi-level Transition Dynamics For Session-based Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 95 citations
Huang et al.
Task-adaptive Neural Process For User Cold-start Recommendation (2021) • Proceedings of the Web Conference 2021 • 81 citations
Lin et al.
On The Evaluation Of Neural Code Summarization (2021) • Proceedings of the 44th International Conference on Software Engineering • 64 citations
Shi et al.
Are NLP Models Really Able To Solve Simple Math Word Problems? (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 66 citations
Arkil Patel, Satwik Bhattamishra, Navin Goyal
Psyqa: A Chinese Dataset For Generating Long Counseling Text For Mental Health Support (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 47 citations
Sun et al.
Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 90 citations
Pramanick et al.
Swinbert: End-to-end Transformers With Sparse Attention For Video Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 208 citations
Lin et al.
Memory Augmented Multi-instance Contrastive Predictive Coding For Sequential Recommendation (2021) • 2021 IEEE International Conference on Data Mining (ICDM) • 45 citations
Ruihong Qiu, Zi Huang, Hongzhi Yin
TABBIE: Pretrained Representations Of Tabular Data (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 110 citations
Iida et al.
TAPEX: Table Pre-training Via Learning A Neural SQL Executor (2021) • Arxiv • 90 citations
Liu et al.
MOMENTA: A Multimodal Framework For Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 100 citations
Pramanick et al.
Semantic Answer Similarity For Evaluating Question Answering Models (2021) • Proceedings of the 3rd Workshop on Machine Reading for Question Answering • 43 citations
Risch et al.
Document-level Event Argument Extraction By Conditional Generation (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 222 citations
Sha Li, Heng Ji, Jiawei Han
Towards Enhancing Fine-grained Details For Image Matting (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Chang Liu, Henghui Ding, Xudong Jiang
Scaling Up Visual And Vision-language Representation Learning With Noisy Text Supervision (2021) • International Conference on Machine Learning 2021 • 1191 citations
Jia et al.
Multimodal Emergent Fake News Detection Via Meta Neural Process Networks (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 51 citations
Wang et al.
Pre-training BERT On Arabic Tweets: Practical Considerations (2021) • Arxiv • 83 citations
Abdelali et al.
Empowering News Recommendation With Pre-trained Language Models (2021) • SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 122 citations
Wu et al.
Natural Language Understanding For Argumentative Dialogue Systems In The Opinion Building Domain (2021) • Knowledge-Based Systems • 41 citations
Abro et al.
Pairwise Supervised Contrastive Learning Of Sentence Representations (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 41 citations
Zhang et al.
Muppet: Massive Multi-task Representations With Pre-finetuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 168 citations
Aghajanyan et al.
Arat5: Text-to-text Transformers For Arabic Language Generation (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 56 citations
El Moatez Billah Nagoudi, Abdelrahim Elmadany, Muhammad Abdul-Mageed
Building And Evaluating Open-domain Dialogue Corpora With Clarifying Questions (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Aliannejadi et al.
GODIVA: Generating Open-domain Videos From Natural Descriptions (2021) • Arxiv • 78 citations
Wu et al.
PASTE: A Tagging-free Decoding Framework Using Pointer Networks For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 49 citations
Mukherjee et al.
Esimcse: Enhanced Sample Building Method For Contrastive Learning Of Unsupervised Sentence Embedding (2021) • Arxiv • 69 citations
Wu et al.
Docformer: End-to-end Transformer For Document Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Appalaraju et al.
Datasets: A Community Library For Natural Language Processing (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 310 citations
Lhoest et al.
Geometry Attention Transformer With Position-aware Lstms For Image Captioning (2021) • Expert Systems with Applications • 59 citations
Chi Wang, Yulin Shen, Luping Ji
Layoutreader: Pre-training Of Text And Layout For Reading Order Detection (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 46 citations
Wang et al.
Combat COVID-19 Infodemic Using Explainable Natural Language Processing Models (2021) • Information Processing & Management • 146 citations
Jackie Ayoub, X. Jessie Yang, Feng Zhou
Describing And Localizing Multiple Changes With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 76 citations
Qiu et al.
Just Say No: Analyzing The Stance Of Neural Dialogue Generation In Offensive Contexts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Baheti et al.
Cross-lingual Abstractive Summarization With Limited Parallel Resources (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 43 citations
Yu Bai, Yang Gao, Heyan Huang
Simple And Effective Zero-shot Cross-lingual Phoneme Recognition (2021) • Lecture Notes in Computer Science • 155 citations
Qiantong Xu, Alexei Baevski, Michael Auli
Learning Transferable Visual Models From Natural Language Supervision (2021) • Arxiv • 5297 citations
Radford et al.
Towards More Flexible And Accurate Object Tracking With Natural Language: Algorithms And Benchmark (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 198 citations
Wang et al.
TSDAE: Using Transformer-based Sequential Denoising Auto-encoder For Unsupervised Sentence Embedding Learning (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 76 citations
Kexin Wang, Nils Reimers, Iryna Gurevych
Screen2words: Automatic Mobile UI Summarization With Multimodal Learning (2021) • The 34th Annual ACM Symposium on User Interface Software and Technology • 78 citations
Wang et al.
Data Augmentation In Natural Language Processing: A Novel Text Generation Approach For Long And Short Text Classifiers (2021) • International Journal of Machine Learning and Cybernetics • 127 citations
Bayer et al.
Multi-modal Sarcasm Detection And Humor Classification In Code-mixed Conversations (2021) • IEEE Transactions on Affective Computing • 63 citations
Bedi et al.
Data Expansion Using Back Translation And Paraphrasing For Hate Speech Detection (2021) • Arxiv • 73 citations
Djamila Romaissa Beddiar, Md Saroar Jahan, Mourad Oussalah
Few-shot Domain Adaptation For Grammatical Error Correction Via Meta-learning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 64 citations
Zhang et al.
Banglabert: Language Model Pretraining And Benchmarks For Low-resource Language Understanding Evaluation In Bangla (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 71 citations
Bhattacharjee et al.
Cycle-consistent Inverse GAN For Text-to-image Synthesis (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 44 citations
Wang et al.
Curriculum Pre-training Heterogeneous Subgraph Transformer For Top-N Recommendation (2021) • ACM Transactions on Information Systems • 45 citations
Wang et al.
CLEVE: Contrastive Pre-training For Event Extraction (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 100 citations
Wang et al.
Crossclr: Cross-modal Contrastive Learning For Multi-modal Video Representations (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 121 citations
Zolfaghari et al.
Everything At Once -- Multi-modal Fusion Transformer For Video Retrieval (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Shvetsova et al.
Quiz-style Question Generation For News Stories (2021) • Proceedings of the Web Conference 2021 • 41 citations
Adam D. Lelkes, Vinh Q. Tran, Cong Yu
Crslab: An Open-source Toolkit For Building Conversational Recommender System (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 44 citations
Zhou et al.
Recursively Summarizing Books With Human Feedback (2021) • Arxiv • 65 citations
Wu et al.
End-to-end Referring Video Object Segmentation With Multimodal Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Adam Botach, Evgenii Zheltonozhskii, Chaim Baskin
Metaicl: Learning To Learn In Context (2021) • Arxiv • 61 citations
Min et al.
ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis And Rating Prediction (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 63 citations
Bu et al.
Indonlg: Benchmark And Resources For Evaluating Indonesian Natural Language Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Cahyawijaya et al.
Out-of-scope Intent Detection With Self-supervision And Discriminative Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 46 citations
Zhan et al.
Disentangling Hate In Online Memes (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 73 citations
Cao et al.
Deduplicating Training Data Makes Language Models Better (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 119 citations
Lee et al.
Multieurlex -- A Multi-lingual And Multi-label Legal Document Classification Dataset For Zero-shot Cross-lingual Transfer (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
Lexglue: A Benchmark Dataset For Legal Language Understanding In English (2021) • SSRN Electronic Journal • 73 citations
Chalkidis et al.
Speechstew: Simply Mix All Available Speech Recognition Data To Train One Large Neural Network (2021) • Arxiv • 75 citations
Chan et al.
Dialogsum: A Real-life Scenario Dialogue Summarization Dataset (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 121 citations
Chen et al.
Bidirectional Machine Reading Comprehension For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 185 citations
Chen et al.
Knowprompt: Knowledge-aware Prompt-tuning With Synergistic Optimization For Relation Extraction (2021) • Proceedings of the ACM Web Conference 2022 • 330 citations
Chen et al.
Semantic And Syntactic Enhanced Aspect Sentiment Triplet Extraction (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 60 citations
Chen et al.
Graph Based Network With Contextualized Representations Of Turns In Dialogue (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 40 citations
Bongseok Lee, Yong Suk Choi
Studying The Usage Of Text-to-text Transfer Transformer To Support Code-related Tasks (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 190 citations
Mastropaolo et al.
Deepcad: A Deep Generative Network For Computer-aided Design Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 112 citations
Rundi Wu, Chang Xiao, Changxi Zheng
Swiss-judgment-prediction: A Multilingual Legal Judgment Prediction Benchmark (2021) • Proceedings of the Natural Legal Language Processing Workshop 2021 • 46 citations
Joel Niklaus, Ilias Chalkidis, Matthias Stürmer
Instancerefer: Cooperative Holistic Understanding For Visual Grounding On Point Clouds Through Instance Multi-level Contextual Referring (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Yuan et al.
Bartscore: Evaluating Generated Text As Text Generation (2021) • Arxiv • 318 citations
Weizhe Yuan, Graham Neubig, Pengfei Liu
Evaluation Of BERT And ALBERT Sentence Embedding Performance On Downstream NLP Tasks (2021) • 2020 25th International Conference on Pattern Recognition (ICPR) • 115 citations
Choi et al.
Planning With Learned Entity Prompts For Abstractive Summarization (2021) • Transactions of the Association for Computational Linguistics • 92 citations
Narayan et al.
BERT-GT: Cross-sentence N-ary Relation Extraction With BERT And Graph Transformer (2021) • Bioinformatics • 49 citations
Po-Ting Lai, Zhiyong Lu
Layoutxlm: Multimodal Pre-training For Multilingual Visually-rich Document Understanding (2021) • Arxiv • 48 citations
Xu et al.
Learning Modality-specific Representations With Self-supervised Multi-task Learning For Multimodal Sentiment Analysis (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 555 citations
Yu et al.
TEACHTEXT: Crossmodal Generalized Distillation For Text-video Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Croitoru et al.
Multimodal End-to-end Sparse Model For Emotion Recognition (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 70 citations
Dai et al.
Masked Language Modeling And The Distributional Hypothesis: Order Word Matters Pre-training For Little (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 62 citations
Sinha et al.
Deepxml: A Deep Extreme Multi-label Learning Framework Applied To Short Text Documents (2021) • Proceedings of the 14th ACM International Conference on Web Search and Data Mining • 60 citations
Dahiya et al.
Does Syntax Matter? A Strong Baseline For Aspect-based Sentiment Analysis With Roberta (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 155 citations
Dai et al.
Beyond Goldfish Memory: Long-term Open-domain Conversation (2021) • Arxiv • 40 citations
Jing Xu, Arthur Szlam, Jason Weston
Increasing Faithfulness In Knowledge-grounded Dialogue With Controllable Features (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 69 citations
Rashkin et al.
Case-based Reasoning For Natural Language Queries Over Knowledge Bases (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 99 citations
Das et al.
A Dataset Of Information-seeking Questions And Answers Anchored In Research Papers (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 86 citations
Dasigi et al.
Quality At A Glance: An Audit Of Web-crawled Multilingual Datasets (2021) • Transactions of the Association for Computational Linguistics • 155 citations
Kreutzer et al.
Entity Structure Within And Throughout: Modeling Mention Dependencies For Document-level Relation Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 167 citations
Xu et al.
Docnli: A Large-scale Dataset For Document-level Natural Language Inference (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 51 citations
Wenpeng Yin, Dragomir Radev, Caiming Xiong
BOLD: Dataset And Metrics For Measuring Biases In Open-ended Language Generation (2021) • FAccT '21: 2021 ACM Conference on Fairness, Accountability, and Transparency • 125 citations
Dhamala et al.
Time-aware Language Models As Temporal Knowledge Bases (2021) • Transactions of the Association for Computational Linguistics • 49 citations
Dhingra et al.
Similarity Reasoning And Filtration For Image-text Matching (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 319 citations
Diao et al.
Few-nerd: A Few-shot Named Entity Recognition Dataset (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 184 citations
Ding et al.
Transferable Dialogue Systems And User Simulators (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Tseng et al.
SIMMC 2.0: A Task-oriented Dialog Dataset For Immersive Multimodal Conversations (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 51 citations
Kottur et al.
Cross-lingual COVID-19 Fake News Detection (2021) • 2021 International Conference on Data Mining Workshops (ICDMW) • 40 citations
Du et al.
Plan-then-generate: Controlled Data-to-text Generation Via Planning (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 57 citations
Su et al.
Clip4caption ++: Multi-clip For Video Caption (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 113 citations
Tang et al.
Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) • Transactions of the Association for Computational Linguistics • 63 citations
Hendricks et al.
CUAD: An Expert-annotated NLP Dataset For Legal Contract Review (2021) • Arxiv • 95 citations
Hendrycks et al.
Measuring Mathematical Problem Solving With The MATH Dataset (2021) • Arxiv • 268 citations
Hendrycks et al.
Object-region Video Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Herzig et al.
Unlocking Compositional Generalization In Pre-trained Models Using Intermediate Representations (2021) • Arxiv • 51 citations
Herzig et al.
MDETR -- Modulated Detection For End-to-end Multi-modal Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 594 citations
Kamath et al.
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 153 citations
Zaid Khan, Yun Fu
Matscibert: A Materials Domain Language Model For Text Mining And Information Extraction (2021) • npj Computational Materials • 200 citations
Gupta et al.
Jointgt: Graph-text Joint Representation Learning For Text Generation From Knowledge Graphs (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 77 citations
Ke et al.
CAVER: Cross-modal View-mixed Transformer For Bi-modal Salient Object Detection (2021) • IEEE Transactions on Image Processing • 138 citations
Pang et al.
Audioclip: Extending CLIP To Image, Text And Audio (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 210 citations
Guzhov et al.
Asvspoof 2021: Accelerating Progress In Spoofed And Deepfake Speech Detection (2021) • 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge • 268 citations
Yamagishi et al.
Vision Transformers For Weeds And Crops Classification Of High Resolution UAV Images (2021) • Remote Sensing • 167 citations
Reedha et al.
Meta-learning Adversarial Domain Adaptation Network For Few-shot Text Classification (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 54 citations
Han et al.
Exploring Task Difficulty For Few-shot Relation Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 71 citations
Jiale Han, Bo Cheng, Wei Lu
Knowledge-enhanced Hierarchical Graph Transformer Network For Multi-behavior Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 179 citations
Xia et al.
Multiplex Behavioral Relation Learning For Recommendation Via Memory Augmented Transformer Network (2021) • SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval • 129 citations
Xia et al.
E-vil: A Dataset And Benchmark For Natural Language Explanations In Vision-language Tasks (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
Kayser et al.
KLUE: Korean Language Understanding Evaluation (2021) • Arxiv • 78 citations
Park et al.
Open-book Video Captioning With Retrieve-copy-generate Network (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Zhang et al.
A Unified Generative Framework For Aspect-based Sentiment Analysis (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 256 citations
Yan et al.
Xl-sum: Large-scale Multilingual Abstractive Summarization For 44 Languages (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 179 citations
Hasan et al.
Roformer: Enhanced Transformer With Rotary Position Embedding (2021) • Neurocomputing • 830 citations
Su et al.
Transrefer3d: Entity-and-relation Aware Transformer For Fine-grained 3D Visual Grounding (2021) • MM '21: ACM Multimedia Conference • 65 citations
He et al.
GALAXY: A Generative Pre-trained Model For Task-oriented Dialog With Semi-supervised Learning And Explicit Policy Injection (2021) • Arxiv • 45 citations
He et al.
Multitask Prompted Training Enables Zero-shot Task Generalization (2021) • Arxiv • 558 citations
Sanh et al.
Does CLIP Benefit Visual Question Answering In The Medical Domain As Much As It Does In The General Domain? (2021) • Arxiv • 41 citations
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
Clip2video: Mastering Video-text Retrieval Via Image CLIP (2021) • Arxiv • 130 citations
Fang et al.
SATAR: A Self-supervised Approach To Twitter Account Representation Learning And Its Application In Bot Detection (2021) • CIKM '21: The 30th ACM International Conference on Information and Knowledge Management • 60 citations
Feng et al.
An Improved Baseline For Sentence-level Relation Extraction (2021) • Arxiv • 49 citations
Wenxuan Zhou, Muhao Chen
Structext: Structured Text Understanding With Multi-modal Transformers (2021) • MM '21: ACM Multimedia Conference • 89 citations
Li et al.
Cross-modal Contrastive Learning For Text-to-image Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 297 citations
Zhang et al.
Guided Generation Of Cause And Effect (2021) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 47 citations
Li et al.
Trocr: Transformer-based Optical Character Recognition With Pre-trained Models (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 247 citations
Li et al.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-image Pre-training Paradigm (2021) • Arxiv • 126 citations
Li et al.
Aspect Sentiment Quad Prediction As Paraphrase Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 174 citations
Zhang et al.
Keeping It Simple: Language Models Can Learn Complex Molecular Distributions (2021) • Nature Communications • 129 citations
Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik
Adversarial Text-to-image Synthesis: A Review (2021) • Neural Networks • 193 citations
Frolov et al.
Attend What You Need: Motion-appearance Synergistic Networks For Video Question Answering (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 57 citations
Seo et al.
Bigssl: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) • IEEE Journal of Selected Topics in Signal Processing • 148 citations
Zhang et al.
Towards Robustness Of Text-to-sql Models Against Synonym Substitution (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 58 citations
Gan et al.
Open Aspect Target Sentiment Classification With Natural Language Prompts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Seoh et al.
Clip4clip: An Empirical Study Of CLIP For End To End Video Clip Retrieval (2021) • Arxiv • 113 citations
Luo et al.
Newsclippings: Automatic Generation Of Out-of-context Multimodal Media (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 46 citations
Grace Luo, Trevor Darrell, Anna Rohrbach
Simcse: Simple Contrastive Learning Of Sentence Embeddings (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 2273 citations
Tianyu Gao, Xingcheng Yao, Danqi Chen
Unsupervised Corpus Aware Language Model Pre-training For Dense Passage Retrieval (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 88 citations
Luyu Gao, Jamie Callan
Topic-driven And Knowledge-aware Transformer For Dialogue Emotion Detection (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 94 citations
Zhu et al.
Competency Problems: On Finding And Removing Artifacts In Language Data (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Gardner et al.
Crossfit: A Few-shot Learning Challenge For Cross-task Generalization In NLP (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 100 citations
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
End-to-end Speech Translation Via Cross-modal Progressive Training (2021) • Interspeech 2021 • 42 citations
Rong Ye, Mingxuan Wang, Lei Li
INVIGORATE: Interactive Visual Grounding And Grasping In Clutter (2021) • Robotics: Science and Systems XVII • 45 citations
Zhang et al.
LAION-400M: Open Dataset Of Clip-filtered 400 Million Image-text Pairs (2021) • Arxiv • 366 citations
Schuhmann et al.
Synthesis Of Compositional Animations From Textual Descriptions (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Ghosh et al.
Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 151 citations
Liu et al.
TIMEDIAL: Temporal Commonsense Reasoning In Dialog (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Qin et al.
Generating Datasets With Pretrained Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 40 citations
Timo Schick, Hinrich Schütze
Visualmrc: Machine Reading Comprehension On Document Images (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
Hate Towards The Political Opponent: A Twitter Corpus Study Of The 2020 US Elections On The Basis Of Offensive Speech And Stance Detection (2021) • Arxiv • 41 citations
Lara Grimminger, Roman Klinger
The Multimodal Sentiment Analysis In Car Reviews (muse-car) Dataset: Collection, Insights And Improvements (2021) • IEEE Transactions on Affective Computing • 57 citations
Stappen et al.
Open-vocabulary Object Detection Via Vision And Language Knowledge Distillation (2021) • ICLR 2022 • 280 citations
Gu et al.
Airbert: In-domain Pretraining For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 118 citations
Guhur et al.
BENDR: Using Transformers And A Contrastive Self-supervised Learning Task To Learn From Massive Amounts Of EEG Data (2021) • Frontiers in Human Neuroscience • 172 citations
Demetres Kostas, Stephane Aroca-Ouellette, Frank Rudzicz
End-to-end Audio-visual Speech Recognition With Conformers (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 202 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Transnas-bench-101: Improving Transferability And Generalizability Of Cross-task Neural Architecture Search (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Duan et al.
Indobertweet: A Pretrained Language Model For Indonesian Twitter With Effective Domain-specific Vocabulary Initialization (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 45 citations
Fajri Koto, Jey Han Lau, Timothy Baldwin
Sub-instruction Aware Vision-and-language Navigation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Hong et al.
Fastbert: A Self-distilling BERT With Adaptive Inference Time (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 245 citations
Liu et al.
Sequence-level Mixed Sample Data Augmentation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 68 citations
Demi Guo, Yoon Kim, Alexander M. Rush
XGLUE: A New Benchmark Dataset For Cross-lingual Pre-training, Understanding And Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 58 citations
Liang et al.
Low Rank Fusion Based Transformers For Multimodal Sequences (2020) • Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML) • 53 citations
Sahay et al.
BOND: Bert-assisted Open-domain Named Entity Recognition With Distant Supervision (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 113 citations
Liang et al.
LAMBERT: Layout-aware (language) Modeling For Information Extraction (2020) • Lecture Notes in Computer Science • 84 citations
Garncarek et al.
Babywalk: Going Farther In Vision-and-language Navigation By Taking Baby Steps (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Zhu et al.
GREEK-BERT: The Greeks Visiting Sesame Street (2020) • 11th Hellenic Conference on Artificial Intelligence • 79 citations
Koutsikakis et al.
Coarse-to-fine Pre-training For Named Entity Recognition (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Xue et al.
A Comparison Of LSTM And BERT For Small Corpus (2020) • Arxiv • 66 citations
Aysu Ezen-Can
Tweepfake: About Detecting Deepfake Tweets (2020) • PLOS ONE • 153 citations
Fagni et al.
A Novel Graph-based Multi-modal Fusion Encoder For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 139 citations
Yin et al.
Unifiedqa: Crossing Format Boundaries With A Single QA System (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 51 citations
Khashabi et al.
Multi-dialect Arabic BERT For Country-level Dialect Identification (2020) • Arxiv • 45 citations
Talafha et al.
Tabert: Pretraining For Joint Understanding Of Textual And Tabular Data (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 380 citations
Yin et al.
Meta-cotgan: A Meta Cooperative Training Paradigm For Improving Adversarial Text Generation (2020) • Arxiv • 62 citations
Yin et al.
Parsbert: Transformer-based Model For Persian Language Understanding (2020) • Neural Processing Letters • 111 citations
Farahani et al.
A Contextual Hierarchical Attention Network With Adaptive Objective For Dialogue State Tracking (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Shan et al.
Doc2dial: A Goal-oriented Document-grounded Dialogue Dataset (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 84 citations
Feng et al.
Aligning AI With Shared Human Values (2020) • Arxiv • 100 citations
Hendrycks et al.
Parallel Data Augmentation For Formality Style Transfer (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 67 citations
Yi Zhang, Tao Ge, Xu Sun
Towards Automated Neural Interaction Discovery For Click-through Rate Prediction (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 61 citations
Song et al.
Pretrained Transformers Improve Out-of-distribution Robustness (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 104 citations
Hendrycks et al.
Weakly-supervised Multi-level Attentional Reconstruction Network For Grounding Textual Queries In Videos (2020) • Arxiv • 51 citations
Song et al.
Leakage-adjusted Simulatability: Can Models Generate Non-trivial Explanations Of Their Behavior In Natural Language? (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 55 citations
Hase et al.
Human-centric Spatio-temporal Video Grounding With Visual Transformers (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 75 citations
Tang et al.
Domain-specific Language Model Pretraining For Biomedical Natural Language Processing (2020) • ACM Transactions on Computing for Healthcare • 915 citations
Gu et al.
Kdconv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 101 citations
Zhou et al.
Injecting Numerical Reasoning Skills Into Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Mor Geva, Ankit Gupta, Jonathan Berant
Ctrlsum: Towards Generic Controllable Text Summarization (2020) • Arxiv • 50 citations
He et al.
Contrastive Triple Extraction With Generative Transformer (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Ye et al.
Machine Reading Comprehension: The Role Of Contextualized Language Models And Beyond (2020) • Arxiv • 48 citations
Zhuosheng Zhang, Hai Zhao, Rui Wang
Transformer Networks For Trajectory Forecasting (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 336 citations
Giuliari et al.
Dynamic And Static Context-aware LSTM For Multi-agent Motion Prediction (2020) • Lecture Notes in Computer Science • 45 citations
Tao et al.
INSPIRED: Toward Sociable Recommendation Dialog Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 95 citations
Hayati et al.
A Large Dataset Of Historical Japanese Documents With Complex Layouts (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 52 citations
Zejiang Shen, Kaixuan Zhang, Melissa Dell
Creating Something From Nothing: Unsupervised Knowledge Distillation For Cross-modal Hashing (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Hu et al.
Rethinking Generalization Of Neural Models: A Named Entity Recognition Case Study (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Fu et al.
LUKE: Deep Contextualized Entity Representations With Entity-aware Self-attention (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 536 citations
Yamada et al.
Learning To Discretely Compose Reasoning Module Networks For Video Captioning (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 59 citations
Tan et al.
Learning From Others' Mistakes: Avoiding Dataset Biases Without Modeling Them (2020) • Arxiv • 51 citations
Sanh et al.
News Recommender System: A Review Of Recent Progress, Challenges, And Opportunities (2020) • Artificial Intelligence Review • 138 citations
Shaina Raza, Chen Ding
Language-guided Navigation Via Cross-modal Grounding And Alternate Adversarial Learning (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 61 citations
Zhang et al.
Compositional Generalization In Semantic Parsing: Pre-training Vs. Specialized Architectures (2020) • Arxiv • 74 citations
Furrer et al.
Where Does It Exist: Spatio-temporal Video Grounding For Multi-form Sentences (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Zhang et al.
POINTER: Constrained Progressive Text Generation Via Insertion-based Generative Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 66 citations
Zhang et al.
CLUE: A Chinese Language Understanding Evaluation Benchmark (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 235 citations
Xu et al.
Totto: A Controlled Table-to-text Generation Dataset (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 69 citations
Parikh et al.
Self-training For End-to-end Speech Translation (2020) • Interspeech 2020 • 40 citations
Pino et al.
From Machine Reading Comprehension To Dialogue State Tracking: Bridging The Gap (2020) • Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI • 49 citations
Gao et al.
Generating Question Titles For Stack Overflow From Mined Code Snippets (2020) • ACM Transactions on Software Engineering and Methodology • 55 citations
Gao et al.
A Co-interactive Transformer For Joint Slot Filling And Intent Detection (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 89 citations
Qin et al.
Semi-supervised Neural Architecture Search (2020) • Arxiv • 43 citations
Luo et al.
Multi-task Collaborative Network For Joint Referring Expression Comprehension And Segmentation (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 268 citations
Luo et al.
A Dataset And Baselines For Visual Question Answering On Art (2020) • Lecture Notes in Computer Science • 49 citations
Garcia et al.
Widget Captioning: Generating Natural Language Description For Mobile User Interface Elements (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Li et al.
Evaluating Models' Local Decision Boundaries Via Contrast Sets (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 252 citations
Gardner et al.
Generative Data Augmentation For Commonsense Reasoning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 92 citations
Yang et al.
On The Potential Of Lexico-logical Alignments For Semantic Parsing To SQL Queries (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Shi et al.
Improving Massively Multilingual Neural Machine Translation And Zero-shot Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 208 citations
Zhang et al.
Demographics Should Not Be The Reason Of Toxicity: Mitigating Discrimination In Text Classifications With Instance Weighting (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Zhang et al.
Reasoning With Latent Structure Refinement For Document-level Relation Extraction (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 280 citations
Nan et al.
OCNLI: Original Chinese Natural Language Inference (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 71 citations
Hu et al.
M3P: Learning Universal Representations Via Multitask Multilingual Multimodal Pre-training (2020) • Arxiv • 43 citations
Ni et al.
ARBERT & MARBERT: Deep Bidirectional Transformers For Arabic (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 217 citations
Muhammad Abdul-Mageed, Abdelrahim Elmadany, El Moatez Billah Nagoudi
Improving Coreference Resolution By Leveraging Entity-centric Features With Graph Neural Networks And Second-order Inference (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 112 citations
Liu et al.
Efficient Second-order Treecrf For Neural Dependency Parsing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 88 citations
Yu Zhang, Zhenghua Li, Min Zhang
Neural CRF Model For Sentence Alignment In Text Simplification (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 97 citations
Jiang et al.
Diagnosing The Environment Bias In Vision-and-language Navigation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 44 citations
Yubo Zhang, Hao Tan, Mohit Bansal
Knowledge Graph Based Synthetic Corpus Generation For Knowledge-enhanced Language Model Pre-training (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 49 citations
Agarwal et al.
Crows-pairs: A Challenge Dataset For Measuring Social Biases In Masked Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 148 citations
Nangia et al.
BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining (2020) • Proceedings of the 3rd Clinical Natural Language Processing Workshop • 73 citations
Zachariah Zhang, Jingshu Liu, Narges Razavian
TOD-BERT: Pre-trained Natural Language Understanding For Task-oriented Dialogue (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 203 citations
Wu et al.
Unitrans: Unifying Model Transfer And Data Transfer For Cross-lingual Named Entity Recognition With Unlabeled Data (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 45 citations
Wu et al.
Machine Generation And Detection Of Arabic Manipulated And Fake News (2020) • Arxiv • 41 citations
Nagoudi et al.
ETC: Encoding Long And Structured Inputs In Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 103 citations
Ainslie et al.
Fighting The COVID-19 Infodemic: Modeling The Perspective Of Journalists, Fact-checkers, Social Media Platforms, Policy Makers, And The Society (2020) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 61 citations
Alam et al.
Stereoset: Measuring Stereotypical Bias In Pretrained Language Models (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 224 citations
Moin Nadeem, Anna Bethke, Siva Reddy
Hover: A Dataset For Many-hop Fact Extraction And Claim Verification (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 83 citations
Jiang et al.
Covid-twitter-bert: A Natural Language Processing Model To Analyse COVID-19 Content On Twitter (2020) • Arxiv • 134 citations
Martin Müller, Marcel Salathé, Per E Kummervold
Fashion Captioning: Towards Generating Accurate Descriptions With Semantic Rewards (2020) • Lecture Notes in Computer Science • 61 citations
Yang et al.
Grid Tagging Scheme For Aspect-oriented Fine-grained Opinion Extraction (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 200 citations
Wu et al.
Uncertainty-aware Self-training For Text Classification With Few Labels (2020) • Arxiv • 41 citations
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Logic-guided Data Augmentation And Regularization For Consistent Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Akari Asai, Hannaneh Hajishirzi
Inltk: Natural Language Toolkit For Indic Languages (2020) • Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS) • 42 citations
Gaurav Arora
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder For Long-form Document Matching (2020) • CIKM '20: The 29th ACM International Conference on Information and Knowledge Management • 53 citations
Yang et al.
Imagebert: Cross-modal Pre-training With Large-scale Weak-supervised Image-text Data (2020) • Arxiv • 154 citations
Qi et al.
Textattack: A Framework For Adversarial Attacks, Data Augmentation, And Adversarial Training In NLP (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 245 citations
Morris et al.
Stereotypical Bias Removal For Hate Speech Detection Task Using Knowledge-based Generalizations (2020) • The World Wide Web Conference • 89 citations
Pinkesh Badjatiya, Manish Gupta, Vasudeva Varma
Prophetnet: Predicting Future N-gram For Sequence-to-sequence Pre-training (2020) • Arxiv • 83 citations
Qi et al.
Referring Expression Comprehension: A Survey Of Methods And Datasets (2020) • IEEE Transactions on Multimedia • 81 citations
Yanyuan Qiao, Chaorui Deng, Qi Wu
Towards Conversational Recommendation Over Multi-type Dialogs (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 151 citations
Liu et al.
Graphdialog: Integrating Graph Knowledge Into End-to-end Task-oriented Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Shiquan Yang, Rui Zhang, Sarah Erfani
Overview Of Checkthat! 2020: Automatic Identification And Verification Of Claims In Social Media (2020) • Lecture Notes in Computer Science • 80 citations
Barron-Cedeno et al.
Investigating Pretrained Language Models For Graph-to-text Generation (2020) • Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI • 43 citations
Ribeiro et al.
Toxic Language Detection In Social Media For Brazilian Portuguese: New Dataset And Multilingual Analysis (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing • 45 citations
Leite et al.
Latent Opinions Transfer Network For Target-oriented Opinion Words Extraction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 90 citations
Wu et al.
TED: A Pretrained Unsupervised Summarization Model With Theme Modeling And Denoising (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 43 citations
Yang et al.
Semeval-2020 Task 4: Commonsense Validation And Explanation (2020) • Proceedings of the Fourteenth Workshop on Semantic Evaluation • 89 citations
Wang et al.
Deep Entity Matching With Pre-trained Language Models (2020) • Proceedings of the VLDB Endowment • 246 citations
Li et al.
Adversarial Filters Of Dataset Biases (2020) • Arxiv • 125 citations
Bras et al.
A Large-scale Chinese Short-text Conversation Dataset (2020) • Lecture Notes in Computer Science • 99 citations
Wang et al.
Syntactic Data Augmentation Increases Robustness To Inference Heuristics (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 143 citations
Min et al.
A Survey On Machine Reading Comprehension: Tasks, Evaluation Metrics And Benchmark Datasets (2020) • Applied Sciences • 61 citations
Zeng et al.
Pre-training Graph Transformer With Multimodal Side Information For Recommendation (2020) • MM '21: ACM Multimedia Conference • 63 citations
Liu et al.
MART: Memory-augmented Recurrent Transformer For Coherent Video Paragraph Captioning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 121 citations
Lei et al.
DIET: Lightweight Language Understanding For Dialogue Systems (2020) • Arxiv • 112 citations
Bunk et al.
What Is More Likely To Happen Next? Video-and-language Future Event Prediction (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 41 citations
Lei et al.
Data Manipulation: Towards Effective Instance Learning For Neural Dialogue Generation Via Learning To Augment And Reweight (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 56 citations
Cai et al.
Cat-gen: Improving Robustness In NLP Models Via Controlled Adversarial Text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Wang et al.
Factual Error Correction For Abstractive Summarization Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 118 citations
Cao et al.
Expertise Style Transfer: A New Task Towards Better Communication Between Experts And Laymen (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 50 citations
Cao et al.
With Little Power Comes Great Responsibility (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 73 citations
Card et al.
Multiwoz 2.2: A Dialogue Dataset With Additional Annotation Corrections And State Tracking Baselines (2020) • Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI • 165 citations
Zang et al.
Hatebert: Retraining BERT For Abusive Language Detection In English (2020) • Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) • 60 citations
Caselli et al.
Mitigating Gender Bias For Neural Dialogue Generation With Adversarial Learning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 52 citations
Liu et al.
Exclusive Hierarchical Decoding For Deep Keyphrase Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 62 citations
Chen et al.
Artificial Intelligence (AI) In Action: Addressing The COVID-19 Pandemic With Natural Language Processing (NLP) (2020) • Annual Review of Biomedical Data Science • 56 citations
Chen et al.
Adaptive Offline Quintuplet Loss For Image-text Matching (2020) • Lecture Notes in Computer Science • 67 citations
Tianlang Chen, Jiajun Deng, Jiebo Luo
Cops-ref: A New Dataset And Task On Compositional Referring Expression Comprehension (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Chen et al.
IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 391 citations
Chen et al.
Hybridqa: A Dataset Of Multi-hop Question Answering Over Tabular And Textual Data (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 178 citations
Chen et al.
Logic2text: High-fidelity Natural Language Generation From Logical Forms (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 49 citations
Chen et al.
Learning Modality Interaction For Temporal Sentence Localization And Event Captioning In Videos (2020) • Lecture Notes in Computer Science • 89 citations
Chen et al.
Local Additivity Based Data Augmentation For Semi-supervised NER (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Chen et al.
Few-shot Natural Language Generation For Task-oriented Dialog (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 157 citations
Peng et al.
AUTSL: A Large Scale Multi-modal Turkish Sign Language Dataset And Baseline Methods (2020) • IEEE Access • 200 citations
Ozge Mercanoglu Sincan, Hacer Yalim Keles
Graph-structured Referring Expression Reasoning In The Wild (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 96 citations
Sibei Yang, Guanbin Li, Yizhou Yu
What Can We Learn From Collective Human Opinions On Natural Language Inference Data? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 77 citations
Yixin Nie, Xiang Zhou, Mohit Bansal
Dialoglue: A Natural Language Understanding Benchmark For Task-oriented Dialogue (2020) • Arxiv • 97 citations
Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur
HERO: Hierarchical Encoder For Video+language Omni-representation Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 372 citations
Li et al.
A Benchmark For Systematic Generalization In Grounded Language Understanding (2020) • Arxiv • 45 citations
Ruis et al.
VIOLIN: A Large-scale Dataset For Video-and-language Inference (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Liu et al.
Room-across-room: Multilingual Vision-and-language Navigation With Dense Spatiotemporal Grounding (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 190 citations
Ku et al.
Pre-training For Abstractive Document Summarization By Reinstating Source Text (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Zou et al.
A Vietnamese Dataset For Evaluating Machine Reading Comprehension (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 58 citations
Nguyen et al.
Zero-resource Knowledge-grounded Dialogue Generation (2020) • Arxiv • 50 citations
Li et al.
Neural Deepfake Detection With Factual Structure Of Text (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 41 citations
Zhong et al.
Grappa: Grammar-augmented Pre-training For Table Semantic Parsing (2020) • Arxiv • 59 citations
Yu et al.
Pre-training Multilingual Neural Machine Translation By Leveraging Alignment Information (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 82 citations
Lin et al.
Interbert: Vision-and-language Interaction For Multi-modal Pretraining (2020) • Arxiv • 56 citations
Lin et al.
Mapping Natural Language Instructions To Mobile UI Action Sequences (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 71 citations
Li et al.
MM-COVID: A Multilingual And Multimodal Data Repository For Combating COVID-19 Disinformation (2020) • Arxiv • 51 citations
Li et al.
Directions In Abusive Language Training Data: Garbage In, Garbage Out (2020) • PLOS ONE • 132 citations
Bertie Vidgen, Leon Derczynski
Jointly Cross- And Self-modal Graph Attention Network For Query-based Moment Localization (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 108 citations
Liu et al.
X-stance: A Multilingual Multi-target Dataset For Stance Detection (2020) • Arxiv • 45 citations
Jannis Vamvas, Rico Sennrich
Span-convert: Few-shot Span Extraction For Dialog With Pretrained Conversational Representations (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 43 citations
Coope et al.
Knowledge Graph-augmented Abstractive Summarization With Semantic-driven Cloze Reward (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 148 citations
Luyang Huang, Lingfei Wu, Lu Wang
Weakly-supervised Aspect-based Sentiment Analysis Via Joint Aspect-sentiment Topic Embedding (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Huang et al.
Frameaxis: Characterizing Microframe Bias And Intensity With Word Embedding (2020) • PeerJ Computer Science • 43 citations
Kwak et al.
Enhancing Extractive Text Summarization With Topic-aware Graph Neural Networks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 63 citations
Peng Cui, Le Hu, Yuanchao Liu
Mutual: A Dataset For Multi-turn Dialogue Reasoning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 110 citations
Cui et al.
Data Augmentation Using Pre-trained Transformer Models (2020) • Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems • 61 citations
Varun Kumar, Ashutosh Choudhary, Eunah Cho
Coco: Controllable Counterfactuals For Evaluating Dialogue State Trackers (2020) • Arxiv • 41 citations
Li et al.
Med-bert: Pre-trained Contextualized Embeddings On Large-scale Structured Electronic Health Records For Disease Prediction (2020) • Arxiv • 61 citations
Rasmy et al.
Cost-effective Selection Of Pretraining Data: A Case Study Of Pretraining BERT On Social Media (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 46 citations
Dai et al.
Few-shot Named Entity Recognition: A Comprehensive Study (2020) • Arxiv • 51 citations
Huang et al.
Understanding Neural Abstractive Summarization Models Via Uncertainty (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 45 citations
Jiacheng Xu, Shrey Desai, Greg Durrett
Plotmachines: Outline-conditioned Generation With Dynamic Plot State Tracking (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 97 citations
Rashkin et al.
Iterative Edit-based Unsupervised Sentence Simplification (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Kumar et al.
Dual-mode ASR: Unify And Improve Streaming ASR With Full-context Modeling (2020) • IEEE Transactions on Multimedia • 50 citations
Yu et al.
AGIF: An Adaptive Graph-interactive Framework For Joint Multiple Intent Detection And Slot Filling (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 108 citations
Qin et al.
Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) • Lecture Notes in Computer Science • 78 citations
Ma et al.
DMD: A Large-scale Multi-modal Driver Monitoring Dataset For Attention And Alertness Analysis (2020) • Lecture Notes in Computer Science • 90 citations
Ortega et al.
Chart-to-text: Generating Natural Language Descriptions For Charts By Adapting The Transformer Model (2020) • Proceedings of the 13th International Conference on Natural Language Generation • 46 citations
Jason Obeid, Enamul Hoque
Reclor: A Reading Comprehension Dataset Requiring Logical Reasoning (2020) • Arxiv • 127 citations
Yu et al.
Towards Robustifying NLI Models Against Lexical Dataset Biases (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 143 citations
Xiang Zhou, Mohit Bansal
Fquad: French Question Answering Dataset (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 62 citations
D'Hoffschmidt et al.
Extractive Summarization As Text Matching (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 394 citations
Zhong et al.
Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) • Arxiv • 47 citations
Dodds et al.
Robbert: A Dutch Roberta-based Language Model (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 103 citations
Pieter Delobelle, Thomas Winters, Bettina Berendt
Structure-grounded Pretraining For Text-to-sql (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 56 citations
Deng et al.
Chinese Street View Text: Large-scale Chinese Text Reading With Partially Supervised Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 55 citations
Sun et al.
BIGPATENT: A Large-scale Dataset For Abstractive And Coherent Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 155 citations
Eva Sharma, Chen Li, Lu Wang
Evaluating The State-of-the-art Of End-to-end Natural Language Generation: The E2E NLG Challenge (2019) • Computer Speech & Language • 180 citations
Ondřej Dušek, Jekaterina Novikova, Verena Rieser
A Novel Bi-directional Interrelated Model For Joint Intent Detection And Slot Filling (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 188 citations
E et al.
Clevr-dialog: A Diagnostic Dataset For Multi-round Reasoning In Visual Dialog (2019) • Arxiv • 49 citations
Kottur et al.
End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019) • Interspeech 2019 • 71 citations
Tu et al.
Neural Metric Learning For Fast End-to-end Relation Extraction (2019) • Arxiv • 40 citations
Tung Tran, Ramakanth Kavuluru
Recommendation As A Communication Game: Self-supervised Bot-play For Goal-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 80 citations
Kang et al.
Automated Essay Scoring Based On Two-stage Learning (2019) • Arxiv • 45 citations
Jiawei Liu, Yang Xu, Yaguang Zhu
Self-attention Aligner: A Latency-control End-to-end Model For ASR Using Self-attention Network And Chunk-hopping (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 100 citations
Linhao Dong, Feng Wang, Bo Xu
Mirrorgan: Learning Text-to-image Generation By Redescription (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 589 citations
Qiao et al.
Polysemous Visual-semantic Embedding For Cross-modal Retrieval (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 243 citations
Yale Song, Mohammad Soleymani
Cosql: A Conversational Text-to-sql Challenge Towards Cross-domain Natural Language Interfaces To Databases (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 122 citations
Yu et al.
TWEETQA: A Social Media Focused Question Answering Dataset (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 64 citations
Xiong et al.
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs (2019) • Arxiv • 96 citations
Dua et al.
An Entity-driven Framework For Abstractive Summarization (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 59 citations
Sharma et al.
Activitynet-qa: A Dataset For Understanding Complex Web Videos Via Question Answering (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 199 citations
Yu et al.
Connecting The Dots: Document-level Neural Relation Extraction With Edge-oriented Graphs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 229 citations
Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou
The Flores Evaluation Datasets For Low-resource Machine Translation: Nepali-english And Sinhala-english (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 92 citations
Guzmán et al.
Entity, Relation, And Event Extraction With Contextualized Span Representations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 275 citations
Wadden et al.
Adapting Text Embeddings For Causal Inference (2019) • Arxiv • 51 citations
Victor Veitch, Dhanya Sridhar, David M. Blei
Multilingual Is Not Enough: BERT For Finnish (2019) • Arxiv • 121 citations
Virtanen et al.
Improving Short Text Classification Through Global Augmentation Methods (2019) • Lecture Notes in Computer Science • 61 citations
Vukosi Marivate, Tshephisho Sefara
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 545 citations
Marino et al.
Earlier Attention? Aspect-aware LSTM For Aspect-based Sentiment Analysis (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 50 citations
Xing et al.
Camembert: A Tasty French Language Model (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 336 citations
Martin et al.
Automatic Radiology Report Generation Based On Multi-view Image Fusion And Medical Concept Enrichment (2019) • Lecture Notes in Computer Science • 165 citations
Yuan et al.
Mixture Content Selection For Diverse Sequence Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi
Improving The Similarity Measure Of Determinantal Point Processes For Extractive Multi-document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 61 citations
Cho et al.
Learning Dual Retrieval Module For Semi-supervised Relation Extraction (2019) • The World Wide Web Conference • 61 citations
Lin et al.
CONAN -- Counter Narratives Through Nichesourcing: A Multilingual Dataset Of Responses To Fight Online Hate Speech (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 78 citations
Chung et al.
Distilling Task-specific Knowledge From BERT Into Simple Neural Networks (2019) • Arxiv • 337 citations
Tang et al.
Meta-learning With Dynamic-memory-based Prototypical Network For Few-shot Event Detection (2019) • Proceedings of the 13th International Conference on Web Search and Data Mining • 69 citations
Deng et al.
A Simple But Effective Method To Incorporate Multi-turn Context With BERT For Conversational Machine Comprehension (2019) • Proceedings of the First Workshop on NLP for Conversational AI • 42 citations
Ohsugi et al.
Coupling Retrieval And Meta-learning For Context-dependent Semantic Parsing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Guo et al.
End-to-end Bias Mitigation By Modelling Biases In Corpora (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 134 citations
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson
Expressing Visual Relationships Via Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Tan et al.
ELI5: Long Form Question Answering (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 284 citations
Fan et al.
Cosmos QA: Machine Reading Comprehension With Contextual Commonsense Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 271 citations
Huang et al.
Transforming Delete, Retrieve, Generate Approach For Controlled Text Style Transfer (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran
Benchmarking Zero-shot Text Classification: Datasets, Evaluation And Entailment Approach (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 219 citations
Wenpeng Yin, Jamaal Hay, Dan Roth
A Systematic Comparison Of Methods For Low-resource Dependency Parsing On Genuinely Low-resource Languages (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Vania et al.
Amazonqa: A Review-based Question Answering Task (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 45 citations
Gupta et al.
Text Readability Assessment For Second Language Learners (2019) • Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications • 138 citations
Menglin Xia, Ekaterina Kochmar, Ted Briscoe
Pretrained Language Models For Sequential Sentence Classification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Cohan et al.
Do Sentence Interactions Matter? Leveraging Sentence Level Representations For Fake News Classification (2019) • Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13) • 80 citations
Vaibhav Vaibhav, Raghuram Mandyam Annasamy, Eduard Hovy
Speech Model Pre-training For End-to-end Spoken Language Understanding (2019) • Interspeech 2019 • 41 citations
Lugosch et al.
Improving Multi-turn Dialogue Modelling With Utterance Rewriter (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 106 citations
Su et al.
Context-aware Visual Policy Network For Fine-grained Image Captioning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 151 citations
Zha et al.
Olmpics -- On What Language Model Pre-training Captures (2019) • Transactions of the Association for Computational Linguistics • 55 citations
Talmor et al.
Hellaswag: Can A Machine Really Finish Your Sentence? (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 572 citations
Zellers et al.
Spatio-temporal Dynamics And Semantic Attribute Enriched Visual Encoding For Video Captioning (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Aafaq et al.
A Question-entailment Approach To Question Answering (2019) • BMC Bioinformatics • 174 citations
Asma Ben Abacha, Dina Demner-Fushman
75 Languages, 1 Model: Parsing Universal Dependencies Universally (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 61 citations
Dan Kondratyuk, Milan Straka
Unlearn Dataset Bias In Natural Language Inference By Fitting The Residual (2019) • Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019) • 164 citations
He He, Sheng Zha, Haohan Wang
Visual Entailment: A Novel Task For Fine-grained Image Understanding (2019) • Arxiv • 162 citations
Xie et al.
Generating Token-level Explanations For Natural Language Inference (2019) • Proceedings of the 2019 Conference of the North • 41 citations
Thorne et al.
Learning To Generalize From Sparse And Underspecified Rewards (2019) • Proceedings of the 36th International Conference on Machine Learning, PMLR 97:130-140, 2019 • 46 citations
Agarwal et al.
Juice: A Large Scale Distantly Supervised Dataset For Open Domain Context-based Code Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer
A Unified MRC Framework For Named Entity Recognition (2019) • Arxiv • 50 citations
Li et al.
Jasper: An End-to-end Convolutional Neural Acoustic Model (2019) • Interspeech 2019 • 212 citations
Li et al.
Learning The Difference That Makes A Difference With Counterfactually-augmented Data (2019) • Arxiv • 231 citations
Divyansh Kaushik, Eduard Hovy, Zachary C. Lipton
PEGASUS: Pre-training With Extracted Gap-sentences For Abstractive Summarization (2019) • Arxiv • 976 citations
Zhang et al.
Reconstruct And Represent Video Contents For Captioning Via Reinforcement Learning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 79 citations
Zhang et al.
Probing What Different NLP Tasks Teach Machines About Function Word Comprehension (2019) • Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) • 94 citations
Kim et al.
The Effect Of Translationese In Machine Translation Test Sets (2019) • Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers) • 69 citations
Mike Zhang, Antonio Toral
Induction Networks For Few-shot Text Classification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 197 citations
Geng et al.
HIBERT: Document Level Pre-training Of Hierarchical Bidirectional Transformers For Document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 161 citations
Xingxing Zhang, Furu Wei, Ming Zhou
Mathqa: Towards Interpretable Math Word Problem Solving With Operation-based Formalisms (2019) • Arxiv • 119 citations
Amini et al.
Controllable Dual Skew Divergence Loss For Neural Machine Translation (2019) • Arxiv • 79 citations
Li et al.
Vision-and-dialog Navigation (2019) • Arxiv • 118 citations
Thomason et al.
UER: An Open-source Toolkit For Pre-training Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations • 88 citations
Zhao et al.
NAS Evaluation Is Frustratingly Hard (2019) • Arxiv • 109 citations
Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci
Simple And Effective Text Matching With Richer Alignment Features (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 171 citations
Yang et al.
Grounding Human-to-vehicle Advice For Self-driving Vehicles (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Kim et al.
Roberta: A Robustly Optimized BERT Pretraining Approach (2019) • Arxiv • 16976 citations
Liu et al.
Automatic Argument Quality Assessment -- New Datasets And Methods (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 55 citations
Toledo et al.
Text Summarization With Pretrained Encoders (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1525 citations
Yang Liu, Mirella Lapata
Summary Level Training Of Sentence Rewriting For Abstractive Summarization (2019) • Proceedings of the 2nd Workshop on New Frontiers in Summarization • 61 citations
Bae et al.
Generalized Data Augmentation For Low-resource Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 109 citations
Xia et al.
Improving Referring Expression Grounding With Cross-modal Attention-guided Erasing (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 189 citations
Liu et al.
A Stack-propagation Framework With Token-level Intent Detection For Spoken Language Understanding (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 302 citations
Qin et al.
Bert-based Ranking For Biomedical Entity Normalization (2019) • Arxiv • 93 citations
Zongcheng Ji, Qiang Wei, Hua Xu
Saliency-guided Attention Network For Image-sentence Matching (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 105 citations
Ji et al.
TANDA: Transfer And Adapt Pre-trained Transformer Models For Answer Sentence Selection (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Siddhant Garg, Thuy Vu, Alessandro Moschitti
MLQA: Evaluating Cross-lingual Extractive Question Answering (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Lewis et al.
Constrained Decoding For Neural NLG From Compositional Representations In Task-oriented Dialogue (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Balakrishnan et al.
Unsupervised Question Answering By Cloze Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 136 citations
Patrick Lewis, Ludovic Denoyer, Sebastian Riedel
Scibert: A Pretrained Language Model For Scientific Text (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1631 citations
Iz Beltagy, Kyle Lo, Arman Cohan
NCLS: Neural Cross-lingual Summarization (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 121 citations
Zhu et al.
Multi-task Deep Neural Networks For Natural Language Understanding (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1026 citations
Liu et al.
Humor Detection: A Transformer Gets The Last Laugh (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 44 citations
Orion Weller, Kevin Seppi
PAWS-X: A Cross-lingual Adversarial Dataset For Paraphrase Identification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 222 citations
Yang et al.
Enhancing Amr-to-text Generation With Dual Graph Representations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych
Sampling Bias In Deep Active Classification: An Empirical Study (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 45 citations
Ameya Prabhu, Charles Dognin, Maneesh Singh
GEAR: Graph-based Evidence Aggregating And Reasoning For Fact Verification (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 196 citations
Zhou et al.
Knowledge Aware Conversation Generation With Explainable Reasoning Over Augmented Graphs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Liu et al.
Topic-enhanced Memory Networks For Personalised Point-of-interest Recommendation (2019) • Arxiv • 54 citations
Xiao Zhou, Cecilia Mascolo, Zhongxiang Zhao
Multi-task Learning With Language Modeling For Question Generation (2019) • Arxiv • 58 citations
Wenjie Zhou, Minghua Zhang, Yunfang Wu
Cm-net: A Novel Collaborative Memory Network For Spoken Language Understanding (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 89 citations
Liu et al.
Counterfactual Story Reasoning And Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 98 citations
Qin et al.
A Novel Aspect-guided Deep Transition Model For Aspect Based Sentiment Analysis (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 49 citations
Liang et al.
Taskmaster-1: Toward A Realistic And Diverse Dialog Dataset (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 134 citations
Byrne et al.
Assessing The Factual Accuracy Of Generated Text (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 145 citations
Goodrich et al.
Winogrande: An Adversarial Winograd Schema Challenge At Scale (2019) • Arxiv • 83 citations
Sakaguchi et al.
Transfer Learning In Biomedical Natural Language Processing: An Evaluation Of BERT And Elmo On Ten Benchmarking Datasets (2019) • Proceedings of the 18th BioNLP Workshop and Shared Task • 792 citations
Yifan Peng, Shankai Yan, Zhiyong Lu
Addressing Semantic Drift In Question Generation For Semi-supervised Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 109 citations
Shiyue Zhang, Mohit Bansal
Howto100m: Learning A Text-video Embedding By Watching Hundred Million Narrated Video Clips (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Miech et al.
Measuring Compositional Generalization: A Comprehensive Method On Realistic Data (2019) • Arxiv • 55 citations
Keysers et al.
Personalized Dialogue Generation With Diversified Traits (2019) • Arxiv • 89 citations
Zheng et al.
Clevr-ref+: Diagnosing Visual Reasoning With Referring Expressions (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Liu et al.
From Senones To Chenones: Tied Context-dependent Graphemes For Hybrid Speech Recognition (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 72 citations
Le et al.
Sentence Centrality Revisited For Unsupervised Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 175 citations
Hao Zheng, Mirella Lapata
Proactive Human-machine Conversation With Explicit Conversation Goals (2019) • Arxiv • 41 citations
Wu et al.
Pubmedqa: A Dataset For Biomedical Research Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 425 citations
Jin et al.
Adversarial NLI: A New Benchmark For Natural Language Understanding (2019) • Arxiv • 66 citations
Nie et al.
Crossweigh: Training Named Entity Tagger From Imperfect Annotations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 96 citations
Wang et al.
Mining Discourse Markers For Unsupervised Sentence Representation Learning (2019) • Proceedings of the 2019 Conference of the North • 41 citations
Sileo et al.
Fine-tune Bert For Docred With Two-step Process (2019) • Arxiv • 116 citations
Wang et al.
Integrating Multimodal Information In Large Pretrained Transformers (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 421 citations
Rahman et al.
Evidence Sentence Extraction For Machine Reading Comprehension (2019) • Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) • 45 citations
Wang et al.
Modeling Sentiment Dependencies With Graph Convolutional Networks For Aspect-level Sentiment Classification (2019) • Knowledge-Based Systems • 180 citations
Pinlong Zhao, Linlin Hou, Ou Wu
Automatic Spanish Translation Of The Squad Dataset For Multilingual Question Answering (2019) • Arxiv • 42 citations
Casimiro Pio Carrino, Marta R. Costa-Jussà, José A. R. Fonollosa
VATEX: A Large-scale, High-quality Multilingual Dataset For Video-and-language Research (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 326 citations
Wang et al.
Weakly-supervised Spatio-temporally Grounding Natural Sentence In Video (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 101 citations
Chen et al.
Superglue: A Stickier Benchmark For General-purpose Language Understanding Systems (2019) • Arxiv • 984 citations
Wang et al.
BERT Post-training For Review Reading Comprehension And Aspect-based Sentiment Analysis (2019) • Arxiv • 358 citations
Xu et al.
A Constructive Prediction Of The Generalization Error Across Scales (2019) • Arxiv • 49 citations
Rosenfeld et al.
Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) • Proceedings of the 27th ACM International Conference on Multimedia • 145 citations
Wang et al.
Trouble On The Horizon: Forecasting The Derailment Of Online Conversations As They Develop (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 45 citations
Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil
A Comprehensive Exploration On Wikisql With Table-aware Word Contextualization (2019) • Arxiv • 122 citations
Hwang et al.
UNITER: Universal Image-text Representation Learning (2019) • Arxiv • 183 citations
Chen et al.
A Hierarchical Reinforced Sequence Operation Method For Unsupervised Text Style Transfer (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 53 citations
Wu et al.
Understanding Dataset Design Choices For Multi-hop Reasoning (2019) • Proceedings of the 2019 Conference of the North • 90 citations
Jifan Chen, Greg Durrett
Multi-hop Question Answering Via Reasoning Chains (2019) • Arxiv • 66 citations
Jifan Chen, Shih-Ting Lin, Greg Durrett
Blackmarks: Blackbox Multibit Watermarking For Deep Neural Networks (2019) • Arxiv • 41 citations
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar
Adaptive Embedding Gate For Attention-based Scene Text Recognition (2019) • Neurocomputing • 41 citations
Chen et al.
Complementary Fusion Of Multi-features And Multi-modalities In Sentiment Analysis (2019) • Arxiv • 53 citations
Chen et al.
Deep Short Text Classification With Knowledge Powered Attention (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 132 citations
Chen et al.
Temporal Deformable Convolutional Encoder-decoder Networks For Video Captioning (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 99 citations
Chen et al.
Review-driven Answer Generation For Product-related Questions In E-commerce (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 50 citations
Chen et al.
Tabfact: A Large-scale Dataset For Table-based Fact Verification (2019) • Arxiv • 179 citations
Chen et al.
Docred: A Large-scale Document-level Relation Extraction Dataset (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 479 citations
Yao et al.
Higru: Hierarchical Gated Recurrent Units For Utterance-level Emotion Recognition (2019) • Arxiv • 70 citations
Jiao et al.
HELP: A Dataset For Identifying Shortcomings Of Neural Models In Monotonicity Reasoning (2019) • Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) • 50 citations
Yanaka et al.
Explain Yourself! Leveraging Language Models For Commonsense Reasoning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Rajani et al.
Conversing By Reading: Contentful Neural Conversation With On-demand Machine Reading (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 97 citations
Qin et al.
GQA: A New Dataset For Real-world Visual Reasoning And Compositional Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Drew A. Hudson, Christopher D. Manning
Convlab: Multi-domain End-to-end Dialog System Platform (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 87 citations
Lee et al.
Hotpotqa: A Dataset For Diverse, Explainable Multi-hop Question Answering (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 540 citations
Yang et al.
What Makes Reading Comprehension Questions Easier? (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 102 citations
Sugawara et al.
Building A Conversational Agent Overnight With Dialogue Self-play (2018) • Arxiv • 161 citations
Shah et al.
Staqc: A Systematically Mined Question-code Dataset From Stack Overflow (2018) • The 2018 World Wide Web Conference • 44 citations
Yao et al.
Know What You Don't Know: Unanswerable Questions For Squad (2018) • Arxiv • 209 citations
Pranav Rajpurkar, Robin Jia, Percy Liang
Towards Explainable NLP: A Generative Explanation Framework For Text Classification (2018) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 141 citations
Hui Liu, Qingyu Yin, William Yang Wang
Multimodal Explanations: Justifying Decisions And Pointing To The Evidence (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 101 citations
Park et al.
Learning A Text-video Embedding From Incomplete And Heterogeneous Data (2018) • Arxiv • 174 citations
Antoine Miech, Ivan Laptev, Josef Sivic
Wronging A Right: Generating Better Errors To Improve Grammatical Error Detection (2018) • Arxiv • 56 citations
Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel
Word2vec Applied To Recommendation: Hyperparameters Matter (2018) • Arxiv • 44 citations
Hugo Caselles-Dupré, Florian Lesaint, Jimena Royo-Letelier
Neural Aesthetic Image Reviewer (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 107 citations
Wang et al.
Neural Baby Talk (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 442 citations
Lu et al.
Adventure: Adversarial Training For Textual Entailment With Knowledge-guided Examples (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 64 citations
Kang et al.
ODSQA: Open-domain Spoken Question Answering Dataset (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 42 citations
Lee et al.
Complex Sequential Question Answering: Towards Learning To Converse Over Linked Question Answer Pairs With A Knowledge Graph (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 172 citations
Saha et al.
Emrqa: A Large Corpus For Question Answering On Electronic Medical Records (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 164 citations
Pampari et al.
End-to-end Non-autoregressive Neural Machine Translation With Connectionist Temporal Classification (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 132 citations
Jindřich Libovický, Jindřich Helcl
Improving Text-to-sql Evaluation Methodology (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 213 citations
Finegan-Dollak et al.
Preco: A Large-scale Dataset In Preschool Vocabulary For Coreference Resolution (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Can A Suit Of Armor Conduct Electricity? A New Dataset For Open Book Question Answering (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 557 citations
Mihaylov et al.
Born Again Neural Networks (2018) • Arxiv • 442 citations
Furlanello et al.
A Reinforced Topic-aware Convolutional Sequence-to-sequence Model For Abstractive Text Summarization (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 127 citations
Wang et al.
Unsupervised Discrete Sentence Representation Learning For Interpretable Neural Dialog Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 136 citations
Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi
Gender Bias In Neural Natural Language Processing (2018) • Arxiv • 73 citations
Lu et al.
Multi-modal Data Augmentation For End-to-end ASR (2018) • Interspeech 2018 • 54 citations
Renduchintala et al.
Hierarchical Neural Story Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1076 citations
Angela Fan, Mike Lewis, Yann Dauphin
Video Description: A Survey Of Methods, Datasets And Evaluation Metrics (2018) • ACM Computing Surveys • 138 citations
Aafaq et al.
Mem2seq: Effectively Incorporating Knowledge Bases Into End-to-end Task-oriented Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Andrea Madotto, Chien-Sheng Wu, Pascale Fung
Reasoning About Actions And State Changes By Injecting Commonsense Knowledge (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 84 citations
Tandon et al.
Sarcasm Analysis Using Conversation Context (2018) • Computational Linguistics • 77 citations
Debanjan Ghosh, Alexander R. Fabbri, Smaranda Muresan
Query And Output: Generating Words By Querying Distributed Word Representations For Paraphrase Generation (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 67 citations
Ma et al.
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale (2018) • Arxiv • 201 citations
Du et al.
Transforming Question Answering Datasets Into Natural Language Inference Datasets (2018) • Arxiv • 122 citations
Dorottya Demszky, Kelvin Guu, Percy Liang
GLAC Net: Glocal Attention Cascading Networks For Multi-image Cued Story Generation (2018) • Arxiv • 53 citations
Kim et al.
Nocaps: Novel Object Captioning At Scale (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 233 citations
Agrawal et al.
A Retrospective Analysis Of The Fake News Challenge Stance Detection Task (2018) • Arxiv • 68 citations
Hanselowski et al.
Textual Explanations For Self-driving Vehicles (2018) • Lecture Notes in Computer Science • 283 citations
Kim et al.
A Large-scale Corpus For Conversation Disentanglement (2018) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Kummerfeld et al.
Event2mind: Commonsense Inference On Events, Intents, And Reactions (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 40 citations
Rashkin et al.
End-to-end Neural Entity Linking (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 233 citations
Nikolaos Kolitsas, Octavian-Eugen Ganea, Thomas Hofmann
XNLI: Evaluating Cross-lingual Sentence Representations (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 906 citations
Conneau et al.
Autoencoder As Assistant Supervisor: Improving Text Representation For Chinese Social Media Text Summarization (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 43 citations
Ma et al.
Wikihow: A Large Scale Text Summarization Dataset (2018) • Arxiv • 177 citations
Mahnaz Koupaee, William Yang Wang
Contextual Parameter Generation For Universal Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 155 citations
Platanios et al.
Textfield: Learning A Deep Direction Field For Irregular Scene Text Detection (2018) • IEEE Transactions on Image Processing • 355 citations
Xu et al.
Back-translation-style Data Augmentation For End-to-end ASR (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 95 citations
Hayashi et al.
A Hierarchical Structured Self-attentive Model For Extractive Document Summarization (HSSAS) (2018) • IEEE Access • 127 citations
Kamal Al-Sabahi, Zhang Zuping, Mohammed Nadher
SWAG: A Large-scale Adversarial Dataset For Grounded Commonsense Inference (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 103 citations
Zellers et al.
Self-attentive Sequential Recommendation (2018) • 2018 IEEE International Conference on Data Mining (ICDM) • 2379 citations
Wang-Cheng Kang, Julian McAuley
E-snli: Natural Language Inference With Natural Language Explanations (2018) • Arxiv • 282 citations
Camburu et al.
Abstractive Summarization Of Reddit Posts With Multi-level Memory Networks (2018) • Arxiv • 60 citations
Byeongchang Kim, Hyunwoo Kim, Gunhee Kim
An End-to-end Textspotter With Explicit Alignment And Attention (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 237 citations
He et al.
Multiwoz -- A Large-scale Multi-domain Wizard-of-oz Dataset For Task-oriented Dialogue Modelling (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 312 citations
Budzianowski et al.
Multi-pointer Co-attention Networks For Recommendation (2018) • Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 220 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Coqa: A Conversational Question Answering Challenge (2018) • Transactions of the Association for Computational Linguistics • 97 citations
Siva Reddy, Danqi Chen, Christopher D. Manning
Diverse Few-shot Text Classification With Multiple Metrics (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 230 citations
Yu et al.
Show, Tell And Discriminate: Image Captioning By Self-retrieval With Partially Labeled Data (2018) • Lecture Notes in Computer Science • 83 citations
Liu et al.
Simple Unsupervised Keyphrase Extraction Using Sentence Embeddings (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 224 citations
Bennani-Smires et al.
End-to-end Dense Video Captioning With Masked Transformer (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Zhou et al.
Densely Connected Bidirectional LSTM With Applications To Sentence Classification (2018) • Lecture Notes in Computer Science • 64 citations
Ding et al.
Adversarially Regularising Neural NLI Models To Integrate Logical Background Knowledge (2018) • Arxiv • 43 citations
Pasquale Minervini, Sebastian Riedel
Tracking State Changes In Procedural Text: A Challenge Dataset And Models For Process Paragraph Comprehension (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 109 citations
Mishra et al.
Towards Exploiting Background Knowledge For Building Conversation Systems (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 158 citations
Moghe et al.
Towards Deep Conversational Recommendations (2018) • Arxiv • 123 citations
Li et al.
A Discourse-aware Attention Model For Abstractive Summarization Of Long Documents (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 68 citations
Cohan et al.
Microsoft Dialogue Challenge: Building End-to-end Task-completion Dialogue Systems (2018) • Arxiv • 60 citations
Li et al.
A Hierarchical End-to-end Model For Jointly Improving Text Summarization And Sentiment Classification (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 62 citations
Ma et al.
A Hierarchical Latent Structure For Variational Conversation Modeling (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 102 citations
Yookoon Park, Jaemin Cho, Gunhee Kim
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 115 citations
Khandelwal et al.
Meansum: A Neural Model For Unsupervised Multi-document Abstractive Summarization (2018) • Arxiv • 97 citations
Eric Chu, Peter J. Liu
Large Scale Distributed Neural Network Training Through Online Distillation (2018) • Arxiv • 152 citations
Anil et al.
Escape: A Large-scale Synthetic Corpus For Automatic Post-editing (2018) • Arxiv • 50 citations
Negri et al.
Learning To Mine Aligned Code And Natural Language Pairs From Stack Overflow (2018) • Proceedings of the 15th International Conference on Mining Software Repositories • 183 citations
Yin et al.
Multilingual Extractive Reading Comprehension By Runtime Machine Translation (2018) • Arxiv • 59 citations
Asai et al.
Pythia V0.1: The Winning Entry To The VQA Challenge 2018 (2018) • Arxiv • 165 citations
Jiang et al.
Aspect Term Extraction With History Attention And Selective Transformation (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 275 citations
Li et al.
Delete, Retrieve, Generate: A Simple Approach To Sentiment And Style Transfer (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 114 citations
Li et al.
Spider: A Large-scale Human-labeled Dataset For Complex And Cross-domain Semantic Parsing And Text-to-sql Task (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 672 citations
Yu et al.
Adversarial Removal Of Demographic Attributes From Text Data (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 243 citations
Yanai Elazar, Yoav Goldberg
Quac: Question Answering In Context (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 678 citations
Choi et al.
Scaling Neural Machine Translation (2018) • Proceedings of the Third Conference on Machine Translation: Research Papers • 80 citations
Ott et al.
Learning Private Neural Language Modeling With Attentive Aggregation (2018) • 2019 International Joint Conference on Neural Networks (IJCNN) • 93 citations
Ji et al.
Baseline Needs More Love: On Simple Word-embedding-based Models And Associated Pooling Mechanisms (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 319 citations
Shen et al.
Harvesting Paragraph-level Question-answer Pairs From Wikipedia (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 169 citations
Xinya Du, Claire Cardie
Textsnake: A Flexible Representation For Detecting Text Of Arbitrary Shapes (2018) • Lecture Notes in Computer Science • 623 citations
Long et al.
Style Transfer As Unsupervised Machine Translation (2018) • Arxiv • 114 citations
Zhang et al.
Table-to-text: Describing Table Region With Natural Language (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Bao et al.
FOTS: Fast Oriented Text Spotting With A Unified Network (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 563 citations
Liu et al.
Neural Abstractive Text Summarization With Sequence-to-sequence Models (2018) • Arxiv • 68 citations
Shi et al.
Reinforced Self-attention Network: A Hybrid Of Hard And Soft Attention For Sequence Modeling (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 134 citations
Shen et al.
Faithful To The Original: Fact Aware Neural Abstractive Summarization (2017) • Arxiv • 174 citations
Cao et al.
Deep Active Learning For Named Entity Recognition (2017) • Proceedings of the 2nd Workshop on Representation Learning for NLP • 364 citations
Shen et al.
Dense-captioning Events In Videos (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 50 citations
Krishna et al.
FOIL It! Find One Mismatch Between Image And Language Caption (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Shekhar et al.
Disan: Directional Self-attention Network For Rnn/cnn-free Language Understanding (2017) • Arxiv • 113 citations
Shen et al.
A Deep Reinforced Model For Abstractive Summarization (2017) • Arxiv • 1273 citations
Romain Paulus, Caiming Xiong, Richard Socher
A Parallel Corpus Of Python Functions And Documentation Strings For Automated Code Documentation And Code Generation (2017) • Arxiv • 62 citations
Antonio Valerio Miceli Barone, Rico Sennrich
TALL: Temporal Activity Localization Via Language Query (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 768 citations
Gao et al.
Flexible End-to-end Dialogue System For Knowledge Grounded Conversation (2017) • Arxiv • 88 citations
Zhu et al.
Improved Variational Autoencoders For Text Modeling Using Dilated Convolutions (2017) • Arxiv • 94 citations
Yang et al.
Learning To Generate Reviews And Discovering Sentiment (2017) • Arxiv • 350 citations
Alec Radford, Rafal Jozefowicz, Ilya Sutskever
Latent Relational Metric Learning Via Memory-based Attention For Collaborative Ranking (2017) • Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 • 214 citations
Yi Tay, Anh Tuan Luu, Siu Cheung Hui
Variational Reasoning For Question Answering With Knowledge Graph (2017) • Arxiv • 180 citations
Zhang et al.
Neural Rating Regression With Abstractive Tips Generation For Recommendation (2017) • Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval • 288 citations
Li et al.
Quasar: Datasets For Question Answering By Search And Reading (2017) • Arxiv • 139 citations
Bhuwan Dhingra, Kathryn Mazaitis, William W. Cohen
Personalization In Goal-oriented Dialog (2017) • Arxiv • 63 citations
Chaitanya K. Joshi, Fei Mi, Boi Faltings
Attend And Diagnose: Clinical Time Series Analysis Using Attention Models (2017) • Arxiv • 41 citations
Song et al.
Inter-session Modeling For Session-based Recommendation (2017) • Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems • 72 citations
Massimiliano Ruocco, Ole Steinar Lillestøl Skrede, Helge Langseth
I2T2I: Learning Text To Image Synthesis With Textual Data Augmentation (2017) • 2017 IEEE International Conference on Image Processing (ICIP) • 60 citations
Dong et al.
Learning To Generate Long-term Future Via Hierarchical Prediction (2017) • Arxiv • 180 citations
Villegas et al.
Deal Or No Deal? End-to-end Learning For Negotiation Dialogues (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Lewis et al.
Seq2sql: Generating Structured Queries From Natural Language Using Reinforcement Learning (2017) • Arxiv • 782 citations
Victor Zhong, Caiming Xiong, Richard Socher
Aspect-augmented Adversarial Networks For Domain Adaptation (2017) • Transactions of the Association for Computational Linguistics • 93 citations
Yuan Zhang, Regina Barzilay, Tommi Jaakkola
Unconstrained Scene Text And Video Text Recognition For Arabic Script (2017) • 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR) • 49 citations
Mohit Jain, Minesh Mathew, C. V. Jawahar
Fusionnet: Fusing Via Fully-aware Attention With Application To Machine Comprehension (2017) • Arxiv • 86 citations
Huang et al.
Neural Natural Language Inference Models Enhanced With External Knowledge (2017) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 271 citations
Chen et al.
Adacomp: Adaptive Residual Gradient Compression For Data-parallel Distributed Training (2017) • Arxiv • 74 citations
Chen et al.
Key-value Retrieval Networks For Task-oriented Dialogue (2017) • Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue • 51 citations
Mihail Eric, Christopher D. Manning
Incorporating Copying Mechanism In Image Captioning For Learning Novel Objects (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Yao et al.
Deep Gradient Compression: Reducing The Communication Bandwidth For Distributed Training (2017) • ICLR 2018 • 645 citations
Lin et al.
Dissent: Sentence Representation Learning From Explicit Discourse Relations (2017) • Arxiv • 59 citations
Allen Nie, Erin D. Bennett, Noah D. Goodman
Video Captioning With Guidance Of Multimodal Latent Topics (2017) • Proceedings of the 25th ACM international conference on Multimedia • 66 citations
Chen et al.
Gradnorm: Gradient Normalization For Adaptive Loss Balancing In Deep Multitask Networks (2017) • Proceedings of the 35th International Conference on Machine Learning (2018) 793-802 • 443 citations
Chen et al.
Just ASK: Building An Architecture For Extensible Self-service Spoken Language Understanding (2017) • Arxiv • 56 citations
Kumar et al.
Don't Just Assume; Look And Answer: Overcoming Priors For Visual Question Answering (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 40 citations
Agrawal et al.
Attentive Memory Networks: Efficient Machine Reading For Conversational Search (2017) • Proceedings of the 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR'17), Tokyo, Japan, August 11, 2017 • 40 citations
Tom Kenter, Maarten de Rijke
Dureader: A Chinese Machine Reading Comprehension Dataset From Real-world Applications (2017) • Arxiv • 51 citations
He et al.
Image-grounded Conversations: Multimodal Context For Natural Question And Response Generation (2017) • Arxiv • 117 citations
Mostafazadeh et al.
Neural Semantic Parsing By Character-based Translation: Experiments With Abstract Meaning Representations (2017) • Arxiv • 82 citations
Rik van Noord, Johan Bos
Parlai: A Dialog Research Software Platform (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 108 citations
Miller et al.
Simple And Effective Multi-paragraph Reading Comprehension (2017) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 63 citations
Christopher Clark, Matt Gardner
Bpemb: Tokenization-free Pre-trained Subword Embeddings In 275 Languages (2017) • Arxiv • 127 citations
Benjamin Heinzerling, Michael Strube
Learning To Compose Domain-specific Transformations For Data Augmentation (2017) • Advances in Neural Information Processing Systems 30 (2017) 3236-3246 • 182 citations
Ratner et al.
Skeleton Key: Image Captioning By Skeleton-attribute Decomposition (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Wang et al.
Question Answering Through Transfer Learning From Large Fine-grained Supervision Data (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 114 citations
Sewon Min, Minjoon Seo, Hannaneh Hajishirzi
Accelerating Innovation Through Analogy Mining (2017) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 60 citations
Hope et al.
Generating High-quality And Informative Conversation Responses With Sequence-to-sequence Models (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 187 citations
Shao et al.
Visual Question Answering: A Survey Of Methods And Datasets (2016) • Arxiv • 44 citations
Wu et al.
Machine Comprehension Using Match-lstm And Answer Pointer (2016) • Arxiv • 414 citations
Shuohang Wang, Jing Jiang
Collaborative Recurrent Autoencoder: Recommend While Learning To Fill In The Blanks (2016) • Arxiv • 79 citations
Hao Wang, Xingjian Shi, Dit-Yan Yeung
A Hierarchical Model Of Reviews For Aspect-based Sentiment Analysis (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 49 citations
Sebastian Ruder, Parsa Ghaffari, John G. Breslin
Dataset And Neural Recurrent Sequence Labeling Model For Open-domain Factoid Question Answering (2016) • Arxiv • 68 citations
Li et al.
Multi-perspective Context Matching For Machine Comprehension (2016) • Arxiv • 115 citations
Wang et al.
A Network-based End-to-end Trainable Task-oriented Dialogue System (2016) • Arxiv • 170 citations
Wen et al.
Joint Copying And Restricted Generation For Paraphrase (2016) • Arxiv • 61 citations
Cao et al.
Stackgan: Text To Photo-realistic Image Synthesis With Stacked Generative Adversarial Networks (2016) • Arxiv • 227 citations
Zhang et al.
Image Captioning With Deep Bidirectional Lstms (2016) • Proceedings of the 24th ACM international conference on Multimedia • 262 citations
Wang et al.
Learning To Generalize To New Compositions In Image Understanding (2016) • Arxiv • 53 citations
Atzmon et al.
Modeling Context In Referring Expressions (2016) • Lecture Notes in Computer Science • 895 citations
Yu et al.
Zoneout: Regularizing Rnns By Randomly Preserving Hidden Activations (2016) • Arxiv • 173 citations
Krueger et al.
Embracing Data Abundance: Booktest Dataset For Reading Comprehension (2016) • Arxiv • 56 citations
Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
The LAMBADA Dataset: Word Prediction Requiring A Broad Discourse Context (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 99 citations
Paperno et al.
MS MARCO: A Human Generated Machine Reading Comprehension Dataset (2016) • Arxiv • 440 citations
Bajaj et al.
RNN Approaches To Text Normalization: A Challenge (2016) • Arxiv • 55 citations
Richard Sproat, Navdeep Jaitly
Visual Genome: Connecting Language And Vision Using Crowdsourced Dense Image Annotations (2016) • International Journal of Computer Vision • 4911 citations
Krishna et al.
A Context-aware Attention Network For Interactive Question Answering (2016) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 40 citations
Li et al.
Title Generation For User Generated Videos (2016) • Lecture Notes in Computer Science • 65 citations
Zeng et al.
Modelling Interaction Of Sentence Pair With Coupled-lstms (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 43 citations
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Distraction-based Neural Networks For Document Summarization (2016) • IJCAI 2016 • 61 citations
Chen et al.
Wikireading: A Novel Large-scale Language Understanding Task Over Wikipedia (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 128 citations
Hewlett et al.
Revisiting Visual Question Answering Baselines (2016) • Lecture Notes in Computer Science • 224 citations
Allan Jabri, Armand Joulin, Laurens van der Maaten
Attentive Explanations: Justifying Decisions And Pointing To The Evidence (2016) • Arxiv • 55 citations
Park et al.
TGIF: A New Dataset And Benchmark On Animated GIF Description (2016) • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 197 citations
Li et al.
Towards Sub-word Level Compositions For Sentiment Analysis Of Hindi-english Code Mixed Text (2016) • Arxiv • 128 citations
Prabhu et al.
Text Understanding With The Attention Sum Reader Network (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Kadlec et al.
Pointer Sentinel Mixture Models (2016) • Arxiv • 481 citations
Merity et al.
Tracking The World State With Recurrent Entity Networks (2016) • ICLR 2017 • 157 citations
Henaff et al.


Diffusion Processes 448 papers #

Latent Refinement Decoding: Enhancing Diffusion-based Language Models By Refining Belief States (2025) • No Venue
Zhu et al.
Obs-diff: Accurate Pruning For Diffusion Models In One-shot (2025) • No Venue
Zhu et al.
Transition Models: Rethinking The Generative Learning Objective (2025) • No Venue
Wang et al.
Stableavatar: Infinite-length Audio-driven Avatar Video Generation (2025) • No Venue
Tu et al.
Cinemaster: A 3d-aware And Controllable Framework For Cinematic Text-to-video Generation (2025) • No Venue
Wang et al.
DDT: Decoupled Diffusion Transformer (2025) • No Venue
Wang et al.
Fantasyportrait: Enhancing Multi-character Portrait Animation With Expression-augmented Diffusion Transformers (2025) • No Venue
Wang et al.
Fantasytalking: Realistic Talking Portrait Generation Via Coherent Motion Synthesis (2025) • No Venue
Wang et al.
DICEPTION: A Generalist Diffusion Model For Visual Perceptual Tasks (2025) • No Venue
Zhao et al.
Neural-driven Image Editing (2025) • No Venue
Zhou et al.
Dreamrenderer: Taming Multi-instance Attribute Control In Large-scale Text-to-image Models (2025) • No Venue
Zhou et al.
Flashvideo: Flowing Fidelity To Detail For Efficient High-resolution Video Generation (2025) • No Venue
Zhang et al.
Fast Video Generation With Sliding Tile Attention (2025) • No Venue
Zhang et al.
Faster Video Diffusion With Trainable Sparse Attention (2025) • No Venue
Zhang et al.
Group Relative Attention Guidance For Image Editing (2025) • No Venue
Zhang et al.
Packing Input Frame Context In Next-frame Prediction Models For Video Generation (2025) • No Venue
Lvmin Zhang, Maneesh Agrawala
SLA: Beyond Sparsity In Diffusion Transformers Via Fine-tunable Sparse-linear Attention (2025) • No Venue
Zhang et al.
Training-free Efficient Video Generation Via Dynamic Token Carving (2025) • No Venue
Zhang et al.
Vision-language-vision Auto-encoder: Scalable Knowledge Distillation From Diffusion Models (2025) • No Venue
Zhang et al.
Unified Multimodal Understanding And Generation Models: Advances, Challenges, And Opportunities (2025) • No Venue
Zhang et al.
Videorepa: Learning Physics For Video Generation Through Relational Alignment With Foundation Models (2025) • No Venue
Zhang et al.
Marrying Autoregressive Transformer And Diffusion With Multi-reference Autoregression (2025) • No Venue
Zhen et al.
Riflex: A Free Lunch For Length Extrapolation In Video Diffusion Transformers (2025) • No Venue
Zhao et al.
Diffusionnft: Online Diffusion Reinforcement With Forward Process (2025) • No Venue
Zheng et al.
Diffusion Transformers With Representation Autoencoders (2025) • No Venue
Zheng et al.
Scaling Diffusion Transformers Efficiently Via μP (2025) • No Venue
Zheng et al.
Llada-v: Large Language Diffusion Models With Visual Instruction Tuning (2025) • No Venue
You et al.
Discrete Diffusion In Large Language And Multimodal Models: A Survey (2025) • No Venue
Runpeng Yu, Qi Li, Xinchao Wang
Pixeldit: Pixel Diffusion Transformers For Image Generation (2025) • No Venue
Yu et al.
Easycontrol: Adding Efficient And Flexible Control For Diffusion Transformer (2025) • No Venue
Zhang et al.
BANG: Dividing 3D Assets Via Generative Exploded Dynamics (2025) • No Venue
Zhang et al.
Diffusion Vs. Autoregressive Language Models: A Text Embedding Perspective (2025) • No Venue
Zhang et al.
Ovis-u1 Technical Report (2025) • No Venue
Wang et al.
Time Is A Feature: Exploiting Temporal Dynamics In Diffusion Language Models (2025) • No Venue
Wang et al.
Pixnerd: Pixel Neural Field Diffusion (2025) • No Venue
Wang et al.
Sparsed: Sparse Attention For Diffusion Language Models (2025) • No Venue
Wang et al.
Transpixar: Advancing Text-to-video Generation With Transparency (2025) • No Venue
Wang et al.
3D Scene Generation: A Survey (2025) • No Venue
Wen et al.
The Devil Behind The Mask: An Emergent Safety Vulnerability Of Diffusion Llms (2025) • No Venue
Wen et al.
Fast-dllm: Training-free Acceleration Of Diffusion LLM By Enabling KV Cache And Parallel Decoding (2025) • No Venue
Wu et al.
Vmoba: Mixture-of-block Attention For Video Diffusion Models (2025) • No Venue
Wu et al.
Lumina-dimoo: An Omni Diffusion Large Language Model For Multi-modal Generation And Understanding (2025) • No Venue
Xin et al.
SANA 1.5: Efficient Scaling Of Training-time And Inference-time Compute In Linear Diffusion Transformer (2025) • No Venue
Xie et al.
STAR: Spatial-temporal Augmentation With Text-to-video Models For Real-world Video Super-resolution (2025) • No Venue
Xie et al.
Dancegrpo: Unleashing GRPO On Visual Generation (2025) • No Venue
Xue et al.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding And Generation (2025) • No Venue
Xu et al.
Jodi: Unification Of Visual Generation And Understanding Via Joint Modeling (2025) • No Venue
Xu et al.
Withanyone: Towards Controllable And ID Consistent Image Generation (2025) • No Venue
Xu et al.
Gencompositor: Generative Video Compositing With Diffusion Transformer (2025) • No Venue
Yang et al.
Mindjourney: Test-time Scaling With World Models For Spatial Reasoning (2025) • No Venue
Yang et al.
Mmada: Multimodal Large Diffusion Language Models (2025) • No Venue
Yang et al.
Steering Vision-language-action Models As Anti-exploration: A Test-time Scaling Approach (2025) • No Venue
Yang et al.
Dynvfx: Augmenting Real Videos With Dynamic Content (2025) • No Venue
Yatim et al.
Reconstruction Vs. Generation: Taming Optimization Dilemma In Latent Diffusion Models (2025) • No Venue
Jingfeng Yao, Xinggang Wang
Ultraflux: Data-model Co-design For High-quality Native 4K Text-to-image Generation Across Diverse Aspect Ratios (2025) • No Venue
Tian Ye, Song Fei, Lei Zhu
Magicinfinite: Generating Infinite Talking Videos With Your Words And Voice (2025) • No Venue
Yi et al.
Infinity-rope: Action-controllable Infinite Video Generation Emerges From Autoregressive Self-rollout (2025) • No Venue
Yesiltepe et al.
Fine-grained Perturbation Guidance Via Attention Head Selection (2025) • No Venue
Ahn et al.
Flexidit: Your Diffusion Transformer Can Easily Generate High-quality Samples With Less Compute (2025) • No Venue
Anagnostidis et al.
Block Diffusion: Interpolating Between Autoregressive And Diffusion Language Models (2025) • No Venue
Arriola et al.
Weak-to-strong Diffusion With Reflection (2025) • No Venue
Lichen Bai, Masashi Sugiyama, Zeke Xie
Video-as-prompt: Unified Semantic Control For Video Generation (2025) • No Venue
Bian et al.
Go-with-the-flow: Motion-controllable Video Diffusion Models Using Real-time Warped Noise (2025) • No Venue
Burgert et al.
Videocanvas: Unified Video Completion From Arbitrary Spatiotemporal Patches Via In-context Conditioning (2025) • No Venue
Cai et al.
Dip: Taming Diffusion Models In Pixel Space (2025) • No Venue
Chen et al.
Blip3-o: A Family Of Fully Open Unified Multimodal Models-architecture, Training And Dataset (2025) • No Venue
Chen et al.
Blip3o-next: Next Frontier Of Native Image Generation (2025) • No Venue
Chen et al.
Coda: Coding LM Via Diffusion Adaptation (2025) • No Venue
Chen et al.
Flash-dmd: Towards High-fidelity Few-step Image Generation With Efficient Distillation And Joint Reinforcement Learning (2025) • No Venue
Chen et al.
An Empirical Study Of Gpt-4o Image Generation Capabilities (2025) • No Venue
Chen et al.
MIDAS: Multimodal Interactive Digital-human Synthesis Via Real-time Autoregressive Video Generation (2025) • No Venue
Chen et al.
Multimodal Representation Alignment For Image Generation: Text-image Interleaved Control Is Easier Than You Think (2025) • No Venue
Chen et al.
Omniinsert: Mask-free Video Insertion Of Any Reference Via Diffusion Transformer Models (2025) • No Venue
Chen et al.
S^2-guidance: Stochastic Self Guidance For Training-free Enhancement Of Diffusion Models (2025) • No Venue
Chen et al.
Sana-sprint: One-step Diffusion With Continuous-time Consistency Distillation (2025) • No Venue
Chen et al.
Sparse-vdit: Unleashing The Power Of Sparse Attention To Accelerate Video Diffusion Transformers (2025) • No Venue
Chen et al.
Animegamer: Infinite Anime Life Simulation With Next Game State Prediction (2025) • No Venue
Cheng et al.
Emu3.5: Native Multimodal Models Are World Learners (2025) • No Venue
Cui et al.
Self-forcing++: Towards Minute-scale High-quality Video Generation (2025) • No Venue
Cui et al.
Lorashop: Training-free Multi-concept Image Generation And Editing With Rectified Flow Transformers (2025) • No Venue
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
MV-RAG: Retrieval Augmented Multiview Diffusion (2025) • No Venue
Yosef Dayani, Omer Benishu, Sagie Benaim
Story2board: A Training-free Approach For Expressive Storyboard Generation (2025) • No Venue
Dinkevich et al.
Unimmvsr: A Unified Multi-modal Framework For Cascaded Video Super-resolution (2025) • No Venue
Du et al.
Mind-the-glitch: Visual Correspondence For Detecting Inconsistencies In Subject-driven Generation (2025) • No Venue
Eldesokey et al.
Cfg-zero*: Improved Classifier-free Guidance For Flow Matching Models (2025) • No Venue
Fan et al.
Tokenverse: Versatile Multi-concept Personalization In Token Modulation Space (2025) • No Venue
Garibi et al.
Sliderspace: Decomposing The Visual Capabilities Of Diffusion Models (2025) • No Venue
Gandikota et al.
D-AR: Diffusion Via Autoregressive Models (2025) • No Venue
Ziteng Gao, Mike Zheng Shou
Diffucoder: Understanding And Improving Masked Diffusion Models For Code Generation (2025) • No Venue
Gong et al.
Seedream 2.0: A Native Chinese-english Bilingual Image Generation Foundation Model (2025) • No Venue
Gong et al.
Diffusion As Shader: 3d-aware Video Diffusion For Versatile Video Generation Control (2025) • No Venue
Gu et al.
Deep Researcher With Test-time Diffusion (2025) • No Venue
Han et al.
Learnings From Scaling Visual Tokenizers For Reconstruction And Generation (2025) • No Venue
Hansen-Estruch et al.
Conceptattention: Diffusion Transformers Learn Highly Interpretable Features (2025) • No Venue
Helbling et al.
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy (2025) • No Venue
Hou et al.
Image Editing As Programs With Diffusion Models (2025) • No Venue
Hu et al.
ILLUME+: Illuminating Unified MLLM With Dual Visual Tokenization And Diffusion Refinement (2025) • No Venue
Huang et al.
Live Avatar: Streaming Real-time Audio-driven Avatar Generation With Infinite Length (2025) • No Venue
Huang et al.
From Denoising To Refining: A Corrective Framework For Vision-language Diffusion Model (2025) • No Venue
Ji et al.
Dype: Dynamic Position Extrapolation For Ultra High Resolution Diffusion (2025) • No Venue
Issachar et al.
Silent Branding Attack: Trigger-free Data Poisoning Attack On Text-to-image Diffusion Models (2025) • No Venue
Jang et al.
Upsample What Matters: Region-adaptive Latent Sampling For Accelerated Diffusion Transformers (2025) • No Venue
Jeong et al.
Infiniteyou: Flexible Photo Recrafting While Preserving Your Identity (2025) • No Venue
Jiang et al.
Continuous Diffusion Model For Language Modeling (2025) • No Venue
Jaehyeong Jo, Sung Ju Hwang
Loopholing Discrete Diffusion: Deterministic Bypass Of The Sampling Wall (2025) • No Venue
Jo et al.
Parallelbench: Understanding The Trade-offs Of Parallel Decoding In Diffusion Llms (2025) • No Venue
Kang et al.
Marigold: Affordable Adaptation Of Diffusion-based Image Generators For Image Analysis (2025) • No Venue
Ke et al.
KLASS: Kl-guided Fast Inference In Masked Diffusion Models (2025) • No Venue
Kim et al.
Inference-time Scaling For Flow Models Via Stochastic Generation And Rollover Budget Forcing (2025) • No Venue
Kim et al.
PLADIS: Pushing The Limits Of Attention In Diffusion Models At Inference Time By Leveraging Sparsity (2025) • No Venue
Kwanyoung Kim, Byeongsu Sim
Temporal In-context Fine-tuning For Versatile Control Of Video Diffusion Models (2025) • No Venue
Kinam Kim, Junha Hyung, Jaegul Choo
Heeding The Inner Voice: Aligning Controlnet Training Via Intermediate Features Feedback (2025) • No Venue
Konovalova et al.
Streamdit: Real-time Streaming Text-to-video Generation (2025) • No Venue
Kodaira et al.
REPA-E: Unlocking VAE For End-to-end Tuning With Latent Diffusion Transformers (2025) • No Venue
Leng et al.
Beyond Fixed: Variable-length Denoising For Diffusion Large Language Models (2025) • No Venue
Li et al.
Back To Basics: Let Denoising Generative Models Denoise (2025) • No Venue
Tianhong Li, Kaiming He
Diffusion Language Models Know The Answer Before Decoding (2025) • No Venue
Li et al.
MANZANO: A Simple And Scalable Unified Multimodal Model With A Hybrid Vision Tokenizer (2025) • No Venue
Li et al.
Radial Attention: O(n log n) Sparse Attention With Energy Decay For Long Video Generation (2025) • No Venue
Li et al.
Uniworld-v2: Reinforce Image Editing With Diffusion Negative-aware Finetuning And MLLM Implicit Feedback (2025) • No Venue
Li et al.
A Survey On Diffusion Language Models (2025) • No Venue
Li et al.
Triposg: High-fidelity 3D Shape Synthesis Using Large-scale Rectified Flow Models (2025) • No Venue
Li et al.
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies (2025) • No Venue
Liang et al.
Autoregressive Adversarial Post-training For Real-time Interactive Video Generation (2025) • No Venue
Lin et al.
Partcrafter: Structured 3D Mesh Generation Via Compositional Latent Diffusion Transformers (2025) • No Venue
Lin et al.
Omnihuman-1: Rethinking The Scaling-up Of One-stage Conditioned Human Animation Models (2025) • No Venue
Lin et al.
Quantization Meets Dllms: A Systematic Study Of Post-training Quantization For Diffusion Llms (2025) • No Venue
Lin et al.
Longllada: Unlocking Long Context Capabilities In Diffusion Llms (2025) • No Venue
Liu et al.
Flow-grpo: Training Flow Matching Models Via Online RL (2025) • No Venue
Liu et al.
Javisdit: Joint Audio-video Diffusion Transformer With Hierarchical Spatio-temporal Prior Synchronization (2025) • No Venue
Liu et al.
Langscene-x: Reconstruct Generalizable 3D Language-embedded Scenes With Trimap Video Diffusion (2025) • No Venue
Liu et al.
Rolling Forcing: Autoregressive Long Video Diffusion In Real Time (2025) • No Venue
Liu et al.
Region-adaptive Sampling For Diffusion Transformers (2025) • No Venue
Liu et al.
Sequential Diffusion Language Models (2025) • No Venue
Liu et al.
Step1x-edit: A Practical Framework For General Image Editing (2025) • No Venue
Liu et al.
Tidar: Think In Diffusion, Talk In Autoregression (2025) • No Venue
Liu et al.
Hyper-bagel: A Unified Acceleration Framework For Multimodal Understanding And Generation (2025) • No Venue
Lu et al.
Dreamactor-m1: Holistic, Expressive And Robust Human Image Animation With Hybrid Guidance (2025) • No Venue
Luo et al.
Calligrapher: Freestyle Text Image Customization (2025) • No Venue
Ma et al.
Inference-time Scaling For Diffusion Models Beyond Scaling Denoising Steps (2025) • No Venue
Ma et al.
Step-video-t2v Technical Report: The Practice, Challenges, And Future Of Video Foundation Model (2025) • No Venue
Ma et al.
Yume: An Interactive World Generation Model (2025) • No Venue
Mao et al.
I Think, Therefore I Diffuse: Enabling Multimodal In-context Reasoning In Diffusion Models (2025) • No Venue
Mi et al.
∇NABLA: Neighborhood Adaptive Block-level Attention (2025) • No Venue
Mikhailov et al.
ORIGEN: Zero-shot 3D Orientation Grounding In Text-to-image Generation (2025) • No Venue
Min et al.
Dreamo: A Unified Framework For Image Customization (2025) • No Venue
Mou et al.
Attention Is All You Need For KV Cache In Diffusion Llms (2025) • No Venue
Quan Nguyen-Tri, Mukul Ranjan, Zhiqiang Shen
Large Language Diffusion Models (2025) • No Venue
Nie et al.
Diffusion Language Models Are Super Data Learners (2025) • No Venue
Ni et al.
Semantics Lead The Way: Harmonizing Semantic And Texture Modeling With Asynchronous Latent Diffusion (2025) • No Venue
Pan et al.
Temporal Alignment Guidance: On-manifold Sampling In Diffusion Models (2025) • No Venue
Park et al.
Vibevoice Technical Report (2025) • No Venue
Peng et al.
Unconditional Priors Matter! Improving Conditional Generation Of Fine-tuned Diffusion Models (2025) • No Venue
Phunyaphibarn et al.
Animeshooter: A Multi-shot Animation Dataset For Reference-guided Video Generation (2025) • No Venue
Qiu et al.
One Small Step In Latent, One Giant Leap For Pixels: Fast Latent Upscale Adapter For Your Diffusion Models (2025) • No Venue
Aleksandr Razin, Danil Kazantsev, Ilya Makarov
Attention Sinks In Diffusion Language Models (2025) • No Venue
Rulli et al.
The Diffusion Duality (2025) • No Venue
Sahoo et al.
Omnimattezero: Training-free Real-time Omnimatte With Pre-trained Video Diffusion Models (2025) • No Venue
Samuel et al.
Seedream 4.0: Toward Next-generation Multimodal Image Generation (2025) • No Venue
Seedream et al.
Skrr: Skip And Re-use Text Encoder Layers For Memory Efficient Text-to-image Generation (2025) • No Venue
Seo et al.
Efficient Personalization Of Quantized Diffusion Model Without Backpropagation (2025) • No Venue
Seo et al.
Imagerag: Dynamic Image Retrieval For Reference-guided Image Generation (2025) • No Venue
Shalev-Arkushin et al.
Core^2: Collect, Reflect And Refine To Generate Better And Faster (2025) • No Venue
Shao et al.
Negative-guided Subject Fidelity Optimization For Zero-shot Subject-driven Generation (2025) • No Venue
Shin et al.
Liteattention: A Temporal Sparse Attention For Diffusion Transformers (2025) • No Venue
Shmilovich et al.
Time-to-move: Training-free Motion Controlled Video Generation Via Dual-clock Denoising (2025) • No Venue
Singer et al.
T-lora: Single Image Diffusion Model Customization Without Overfitting (2025) • No Venue
Soboleva et al.
Seed Diffusion: A Large-scale Diffusion Language Model With High-speed Inference (2025) • No Venue
Song et al.
Omniconsistency: Learning Style-agnostic Consistency From Paired Stylization Data (2025) • No Venue
Yiren Song, Cheng Liu, Mike Zheng Shou
Layertracer: Cognitive-aligned Layered SVG Synthesis Via Diffusion Transformer (2025) • No Venue
Yiren Song, Danze Chen, Mike Zheng Shou
Makeanything: Harnessing Diffusion Transformers For Multi-domain Procedural Sequence Generation (2025) • No Venue
Yiren Song, Cheng Liu, Mike Zheng Shou
Unified Continuous Generative Models (2025) • No Venue
Peng Sun, Yi Jiang, Tao Lin
DINGO: Constrained Inference For Diffusion Llms (2025) • No Venue
Suresh et al.
Vision Bridge Transformer At Scale (2025) • No Venue
Tan et al.
Inferix: A Block-diffusion Based Next-generation Inference Engine For World Simulation (2025) • No Venue
Team et al.
Nextstep-1: Toward Autoregressive Image Generation With Continuous Tokens At Scale (2025) • No Venue
Team et al.
PAN: A World Model For General, Interactable, And Long-horizon World Simulation (2025) • No Venue
Team et al.
Padding Tone: A Mechanistic Analysis Of Padding Tokens In T2I Models (2025) • No Venue
Toker et al.
Mmada-parallel: Multimodal Large Diffusion Language Models For Thinking-aware Editing And Generation (2025) • No Venue
Tian et al.
Audiox: Diffusion Transformer For Anything-to-audio Generation (2025) • No Venue
Tian et al.
Continuous Speech Synthesis Using Per-token Latent Diffusion (2024) • No Venue
Turetzky et al.
Diffusion Models Are Real-time Game Engines (2024) • No Venue
Valevski et al.
Switti: Designing Scale-wise Transformers For Text-to-image Synthesis (2024) • No Venue
Voronov et al.
Edify Image: High-quality Image Generation With Pixel Space Laplacian Diffusion Models (2024) • No Venue
Nvidia et al.
Stable Flow: Vital Layers For Training-free Image Editing (2024) • No Venue
Avrahami et al.
Meissonic: Revitalizing Masked Generative Transformers For Efficient High-resolution Text-to-image Synthesis (2024) • No Venue
Bai et al.
Syncammaster: Synchronizing Multi-camera Video Generation From Diverse Viewpoints (2024) • No Venue
Bai et al.
Seed-music: A Unified Framework For High Quality And Controlled Music Generation (2024) • No Venue
Bai et al.
Lumiere: A Space-time Diffusion Model For Video Generation (2024) • No Venue
Bar-Tal et al.
MUMU: Bootstrapping Multimodal Image Generation From Text-to-image Data (2024) • No Venue
William Berman, Alexander Peysakhovich
Make It Count: Text-to-image Generation With An Accurate Number Of Objects (2024) • No Venue
Binyamin et al.
Ditctrl: Exploring Attention Control In Multi-modal Diffusion Transformer For Tuning-free Multi-prompt Longer Video Generation (2024) • No Venue
Cai et al.
3dtopia-xl: Scaling High-quality 3D Asset Generation Via Primitive Diffusion (2024) • No Venue
Chen et al.
Diffusion Forcing: Next-token Prediction Meets Full-sequence Diffusion (2024) • No Venue
Chen et al.
F5-TTS: A Fairytaler That Fakes Fluent And Faithful Speech With Flow Matching (2024) • No Venue
Chen et al.
Pixart-δ: Fast And Controllable Image Generation With Latent Consistency Models (2024) • No Venue
Chen et al.
Videocrafter2: Overcoming Data Limitations For High-quality Video Diffusion Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Chen et al.
Training-free Regional Prompting For Diffusion Transformers (2024) • No Venue
Chen et al.
Ditfastattn: Attention Compression For Diffusion Transformer Models (2024) • No Venue
Yuan et al.
Identity-preserving Text-to-video Generation By Frequency Decomposition (2024) • No Venue
Yuan et al.
Cogview3: Finer And Faster Text-to-image Generation Via Relay Diffusion (2024) • No Venue
Zheng et al.
Multi-lora Composition For Image Generation (2024) • No Venue
Zhong et al.
VLOGGER: Multimodal Diffusion For Embodied Avatar Synthesis (2024) • No Venue
Corona et al.
Scalable High-resolution Pixel-space Image Synthesis With Hourglass Diffusion Transformers (2024) • No Venue
Crowson et al.
Be Yourself: Bounded Attention For Multi-subject Text-to-image Generation (2024) • No Venue
Dahary et al.
Swiftbrush V2: Make Your One-step Diffusion Model Better Than Its Teacher (2024) • No Venue
Dao et al.
Causal Diffusion Transformers For Generative Modeling (2024) • No Venue
Deng et al.
Animatelcm: Accelerating The Animation Of Personalized Diffusion Models And Adapters With Decoupled Consistency Learning (2024) • No Venue
Wang et al.
Guide-and-rescale: Self-guidance Mechanism For Effective Tuning-free Real Image Editing (2024) • No Venue
Titov et al.
Diffusion Feedback Helps CLIP See Better (2024) • No Venue
Wang et al.
Generative Inbetweening: Adapting Image-to-video Models For Keyframe Interpolation (2024) • No Venue
Wang et al.
DPLM-2: A Multimodal Diffusion Protein Language Model (2024) • No Venue
Wang et al.
Fitv2: Scalable And Improved Flexible Vision Transformer For Diffusion Model (2024) • No Venue
Wang et al.
Videotetris: Towards Compositional Text-to-video Generation (2024) • No Venue
Tian et al.
Instancediffusion: Instance-level Control For Image Generation (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Wang et al.
Easyref: Omni-generalized Group Image Reference For Diffusion Models Via Multimodal LLM (2024) • No Venue
Zong et al.
Bitsfusion: 1.99 Bits Weight Quantization Of Diffusion Model (2024) • No Venue
Sui et al.
Multimodal Latent Language Modeling With Next-token Diffusion (2024) • No Venue
Sun et al.
Unpacking SDXL Turbo: Interpreting Text-to-image Models With Sparse Autoencoders (2024) • No Venue
Surkov et al.
Video-infinity: Distributed Long Video Generation (2024) • No Venue
Tan et al.
Neural Network Diffusion (2024) • No Venue
Wang et al.
Phased Consistency Model (2024) • No Venue
Wang et al.
Ominicontrol: Minimal And Universal Control For Diffusion Transformer (2024) • No Venue
Tan et al.
No Training, No Problem: Rethinking Classifier-free Guidance For Diffusion Models (2024) • No Venue
Sadat et al.
Eliminating Oversaturation And Artifacts Of High Guidance Scales In Diffusion Models (2024) • No Venue
Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
Litevae: Lightweight And Efficient Variational Autoencoders For Latent Diffusion Models (2024) • No Venue
Sadat et al.
Fast High-resolution Image Synthesis With Latent Adversarial Diffusion Distillation (2024) • No Venue
Sauer et al.
Inserf: Text-driven Generative Object Insertion In Neural 3D Scenes (2024) • No Venue
Shahbazi et al.
Transfusion: Predict The Next Token And Diffuse Images With One Multi-modal Model (2024) • No Venue
Zhou et al.
Realmdreamer: Text-driven 3D Scene Generation With Inpainting And Depth Diffusion (2024) • No Venue
Shriram et al.
Invertible Consistency Distillation For Text-guided Image Editing In Around 7 Steps (2024) • No Venue
Starodubcev et al.
Alleviating Distortion In Image Generation Via Multi-resolution Diffusion Models (2024) • No Venue
Liu et al.
CLEAR: Conv-like Linearization Revs Pre-trained Diffusion Transformers Up (2024) • No Venue
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Flowing From Words To Pixels: A Framework For Cross-modality Evolution (2024) • No Venue
Liu et al.
Generative Photomontage (2024) • No Venue
Liu et al.
Linfusion: 1 GPU, 1 Minute, 16K Image (2024) • No Venue
Liu et al.
Reconx: Reconstruct Any Scene From Sparse Views With Video Diffusion Model (2024) • No Venue
Liu et al.
Magicquill: An Intelligent Interactive Image Editing System (2024) • No Venue
Liu et al.
Itercomp: Iterative Composition-aware Feedback Learning From Model Gallery For Text-to-image Generation (2024) • No Venue
Zhang et al.
Pixel-space Post-training Of Latent Diffusion Models (2024) • No Venue
Zhang et al.
Tora: Trajectory-oriented Diffusion Transformer For Video Generation (2024) • No Venue
Zhang et al.
Videodrafter: Content-consistent Multi-scene Video Generation With LLM (2024) • No Venue
Long et al.
SF-V: Single Forward Video Generation Model (2024) • No Venue
Zhang et al.
Videoelevator: Elevating Video Generation Quality With Versatile Text-to-image Diffusion Models (2024) • No Venue
Zhang et al.
Fit: Flexible Vision Transformer For Diffusion Model (2024) • No Venue
Lu et al.
Turboedit: Instant Text-based Image Editing (2024) • No Venue
Wu et al.
Fastercache: Training-free Video Diffusion Model Acceleration With High Quality (2024) • No Venue
Lv et al.
Diffsensei: Bridging Multi-modal Llms And Diffusion Models For Customized Manga Generation (2024) • No Venue
Wu et al.
Exploring The Role Of Large Language Models In Prompt Encoding For Diffusion Models (2024) • No Venue
Ma et al.
Fiva: Fine-grained Visual Attribute Dataset For Text-to-image Diffusion Models (2024) • No Venue
Wu et al.
Diffusekrona: A Parameter Efficient Fine-tuning Method For Personalized Diffusion Model (2024) • No Venue
Marjit et al.
Bigger Is Not Always Better: Scaling Properties Of Latent Diffusion Models (2024) • No Venue
Mei et al.
Anidoc: Animation Creation Made Easier (2024) • No Venue
Meng et al.
Youdream: Generating Anatomically Controllable Consistent Text-to-3d Animals (2024) • No Venue
Sandeep Mishra, Oindrila Saha, Alan C. Bovik
Openvid-1m: A Large-scale High-quality Dataset For Text-to-video Generation (2024) • No Venue
Nan et al.
An Image Is Worth More Than 16x16 Patches: Exploring Transformers On Individual Pixels (2024) • No Venue
Nguyen et al.
DITTO: Diffusion Inference-time T-optimization For Music Generation (2024) • No Venue
Novack et al.
Diffusion Augmented Agents: A Framework For Efficient Exploration And Transfer Learning (2024) • No Venue
Palo et al.
Controlnext: Powerful And Efficient Control For Image And Video Generation (2024) • No Venue
Peng et al.
Diffusiongpt: Llm-driven Text-to-image Generation System (2024) • No Venue
Qin et al.
Xgen-videosyn-1: High-fidelity Text-to-video Synthesis With Compressed Representations (2024) • No Venue
Qin et al.
Freescale: Unleashing The Resolution Of Diffusion Models Via Tuning-free Scale Fusion (2024) • No Venue
Qiu et al.
Bringing Objects To Life: 4D Generation From 3D Objects (2024) • No Venue
Rahamim et al.
Omniedit: Building Image Editing Generalist Models Through Specialist Supervision (2024) • No Venue
Wei et al.
Paint By Inpaint: Learning To Add Image Objects By Removing Them First (2024) • No Venue
Wasserman et al.
Pathways On The Image Manifold: Image Editing Via Video Generation (2024) • No Venue
Rotstein et al.
Ipadapter-instruct: Resolving Ambiguity In Image-based Conditioning Using Instruct Prompts (2024) • No Venue
Rowles et al.
Geometry Image Diffusion: Fast And Data-efficient Text-to-3d With Image-based Surface Representation (2024) • No Venue
Slava Elizarov, Ciara Rowles, Simon Donné
Build-a-scene: Interactive 3D Layout Control For Diffusion-based Image Generation (2024) • No Venue
Abdelrahman Eldesokey, Peter Wonka
Scaling Rectified Flow Transformers For High-resolution Image Synthesis (2024) • No Venue
Esser et al.
Scaling Diffusion Transformers To 16 Billion Parameters (2024) • No Venue
Fei et al.
FLUX That Plays Music (2024) • No Venue
Fei et al.
Aligning Diffusion Models With Noise-conditioned Perception (2024) • No Venue
Gambashidze et al.
Flashspeech: Efficient Zero-shot Speech Synthesis (2024) • No Venue
Ye et al.
Dreamreward: Text-to-3d Generation With Human Preference (2024) • No Venue
Ye et al.
Cogvideox: Text-to-video Diffusion Models With An Expert Transformer (2024) • No Venue
Yang et al.
DART: Denoising Autoregressive Transformer For Scalable Text-to-image Generation (2024) • No Venue
Gu et al.
Pulid: Pure And Lightning ID Customization Via Contrastive Alignment (2024) • No Venue
Guo et al.
Ltx-video: Realtime Video Latent Diffusion (2024) • No Venue
Hacohen et al.
Flex3d: Feed-forward 3D Generation With Flexible Reconstruction Model And Input View Curation (2024) • No Venue
Han et al.
Face Adapter For Pre-trained Diffusion Models With Fine-grained ID And Attribute Control (2024) • No Venue
Han et al.
Mastering Text-to-image Diffusion: Recaptioning, Planning, And Generating With Multimodal Llms (2024) • No Venue
Yang et al.
Large-scale Reinforcement Learning For Diffusion Models (2024) • No Venue
Zhang et al.
Acdit: Interpolating Autoregressive Conditional Modeling And Diffusion Transformer (2024) • No Venue
Hu et al.
ELLA: Equip Diffusion Models With LLM For Enhanced Semantic Alignment (2024) • No Venue
Hu et al.
Snapgen: Taming High-resolution Text-to-image Models For Mobile Devices With Efficient Architectures And Training (2024) • No Venue
Hu et al.
LVCD: Reference-based Lineart Video Colorization With Diffusion Models (2024) • No Venue
Zhitong Huang, Mohan Zhang, Jing Liao
Spatiotemporal Skip Guidance For Enhanced Video Diffusion Sampling (2024) • No Venue
Hyung et al.
Comat: Aligning Text-to-image Diffusion Model With Image-to-text Concept Matching (2024) • No Venue
Jiang et al.
Pyramidal Flow Matching For Efficient Video Generative Modeling (2024) • No Venue
Jin et al.
Naturalspeech 3: Zero-shot Speech Synthesis With Factorized Codec And Diffusion Models (2024) • No Venue
Ju et al.
Adaptive Caching For Faster Video Generation With Diffusion Transformers (2024) • No Venue
Kahatapitiya et al.
Video Depth Without Video Models (2024) • No Venue
Ke et al.
Fifo-diffusion: Generating Infinite Videos From Text Without Training (2024) • No Venue
Kim et al.
Beyondscene: Higher-resolution Human-centric Scene Generation With Pretrained Diffusion (2024) • No Venue
Kim et al.
Revisit Large-scale Image-caption Data In Pre-training Multimodal Foundation Models (2024) • No Venue
Lai et al.
Streammultidiffusion: Real-time Interactive Generation With Region-based Semantic Control (2024) • No Venue
Lee et al.
Videoguide: Improving Video Diffusion Models Without Training Through A Teacher's Guide (2024) • No Venue
Lee et al.
Ootdiffusion: Outfitting Fusion Based Latent Diffusion For Controllable Virtual Try-on (2024) • No Venue
Xu et al.
Brushedit: All-in-one Image Inpainting And Editing (2024) • No Venue
Li et al.
Controlnet++: Improving Conditional Controls With Efficient Consistency Feedback (2024) • No Venue
Li et al.
Dual3d: Efficient And Consistent Text-to-3d Generation With Dual-mode Multi-view Latent Diffusion (2024) • No Venue
Li et al.
Hunyuan-dit: A Powerful Multi-resolution Diffusion Transformer With Fine-grained Chinese Understanding (2024) • No Venue
Li et al.
Svdquant: Absorbing Outliers By Low-rank Components For 4-bit Diffusion Models (2024) • No Venue
Li et al.
CLAY: A Controllable Large-scale Generative Model For Creating High-quality 3D Assets (2024) • ACM Transactions on Graphics • 49 citations
Zhang et al.
Step-aware Preference Optimization: Aligning Preference With Denoising Performance At Each Step (2024) • No Venue
Liang et al.
Ctrl-adapter: An Efficient And Versatile Framework For Adapting Diverse Controls To Any Diffusion Model (2024) • No Venue
Lin et al.
Pixwizard: Versatile Image-to-image Visual Assistant With Open-language Instructions (2024) • No Venue
Lin et al.
Show-o: One Single Transformer To Unify Multimodal Understanding And Generation (2024) • No Venue
Xie et al.
Localizing Object-level Shape Variations With Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 66 citations
Patashnik et al.
ECLIPSE: A Resource-efficient Text-to-image Prior For Image Generations (2023) • No Venue
Patel et al.
Cache Me If You Can: Accelerating Diffusion Models Through Block Caching (2023) • No Venue
Wimbauer et al.
Audioldm 2: Learning Holistic Audio Generation With Self-supervised Pretraining (2023) • No Venue
Liu et al.
Dual-stream Diffusion Net For Text-to-video Generation (2023) • No Venue
Liu et al.
Instaflow: One Step Is Enough For High-quality Diffusion-based Text-to-image Generation (2023) • No Venue
Liu et al.
Sherpa3d: Boosting High-fidelity Text-to-3d Generation Via Coarse 3D Prior (2023) • No Venue
Liu et al.
Unleashing Text-to-image Diffusion Models For Visual Perception (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 124 citations
Zhao et al.
Chatanything: Facetime Chat With Llm-enhanced Personas (2023) • No Venue
Zhao et al.
Lcm-lora: A Universal Stable-diffusion Acceleration Module (2023) • No Venue
Luo et al.
Uni-controlnet: All-in-one Control To Text-to-image Diffusion Models (2023) • Arxiv • 65 citations
Zhao et al.
Codefusion: A Pre-trained Diffusion Model For Code Generation (2023) • No Venue
Singh et al.
Deepcache: Accelerating Diffusion Models For Free (2023) • No Venue
Xinyin Ma, Gongfan Fang, Xinchao Wang
Text-to-sticker: Style Tailoring Latent Diffusion Models For Human Expression (2023) • No Venue
Sinha et al.
Diffuse, Attend, And Segment: Unsupervised Zero-shot Segmentation Using Stable Diffusion (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Tian et al.
Diversity Is Definitely Needed: Improving Model-agnostic Zero-shot Classification Via Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 41 citations
Shipard et al.
SKED: Sketch-guided Text-based 3D Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 42 citations
Mikaeili et al.
Freecontrol: Training-free Spatial Control Of Any Text-to-image Diffusion Model With Any Condition (2023) • No Venue
Mo et al.
Dreamix: Video Diffusion Models Are General Video Editors (2023) • Arxiv • 43 citations
Molad et al.
Dragondiffusion: Enabling Drag-style Manipulation On Diffusion Models (2023) • No Venue
Mou et al.
Human Preference Score: Better Aligning Text-to-image Models With Human Preference (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 108 citations
Wu et al.
Next-gpt: Any-to-any Multimodal LLM (2023) • No Venue
Wu et al.
LAVIE: High-quality Video Generation With Cascaded Latent Diffusion Models (2023) • No Venue
Wang et al.
A Neural Space-time Representation For Text-to-image Personalization (2023) • ACM Transactions on Graphics • 50 citations
Alaluf et al.
Interpolating Between Images With Diffusion Models (2023) • No Venue
Clinton J. Wang, Polina Golland
Gesturediffuclip: Gesture Diffusion Model With CLIP Latents (2023) • ACM Transactions on Graphics • 113 citations
Tenglong Ao, Zeyi Zhang, Libin Liu
Fusionframes: Efficient Architectural Aspects For Text-to-video Generation Pipeline (2023) • No Venue
Arkhipkin et al.
Kandinsky 3.0 Technical Report (2023) • No Venue
Arkhipkin et al.
Synthetic Data From Diffusion Models Improves Imagenet Classification (2023) • Arxiv • 78 citations
Azizi et al.
Dreamdiffusion: Generating High-quality Images From Brain EEG Signals (2023) • No Venue
Bai et al.
Hrs-bench: Holistic, Reliable And Scalable Benchmark For Text-to-image Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Bakr et al.
Align Your Latents: High-resolution Video Synthesis With Latent Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 438 citations
Blattmann et al.
Token Merging For Fast Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 64 citations
Daniel Bolya, Judy Hoffman
Texfusion: Synthesizing 3D Textures With Text-guided Image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 67 citations
Cao et al.
Masactrl: Tuning-free Mutual Self-attention Control For Consistent Image Synthesis And Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 234 citations
Cao et al.
Extracting Training Data From Diffusion Models (2023) • Arxiv • 93 citations
Carlini et al.
Attend-and-excite: Attention-based Semantic Guidance For Text-to-image Diffusion Models (2023) • ACM Transactions on Graphics • 291 citations
Chefer et al.
Fantasia3d: Disentangling Geometry And Appearance For High-quality Text-to-3d Content Creation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 246 citations
Chen et al.
Photoverse: Tuning-free Image Customization With Text-to-image Diffusion Models (2023) • No Venue
Chen et al.
Pixart-α: Fast Training Of Diffusion Transformer For Photorealistic Text-to-image Synthesis (2023) • No Venue
Chen et al.
Schrodinger Bridges Beat Diffusion Models On Text-to-speech Synthesis (2023) • No Venue
Chen et al.
Symbolic Discovery Of Optimization Algorithms (2023) • Arxiv • 163 citations
Chen et al.
Modelscope Text-to-video Technical Report (2023) • Arxiv • 46 citations
Wang et al.
Diffusion Model Alignment Using Direct Preference Optimization (2023) • No Venue
Wallace et al.
Anti-dreambooth: Protecting Users From Personalized Text-to-image Synthesis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Le et al.
Promptpaint: Steering Text-to-image Generation Through Paint Medium-like Interactions (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 66 citations
John Joon Young Chung, Eytan Adar
Scaling Robot Learning With Semantically Imagined Experience (2023) • Robotics: Science and Systems XIX • 57 citations
Yu et al.
Emu: Enhancing Image Generation Models Using Photogenic Needles In A Haystack (2023) • No Venue
Dai et al.
Effective Data Augmentation With Diffusion Models (2023) • Arxiv • 80 citations
Trabucco et al.
A Pilot Study Of Query-free Adversarial Attack Against Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 44 citations
Haomin Zhuang, Yihua Zhang, Sijia Liu
LEDITS: Real Image Editing With DDPM Inversion And Semantic Guidance (2023) • No Venue
Linoy Tsaban, Apolinário Passos
Structure And Content-guided Video Synthesis With Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 275 citations
Esser et al.
Glaze: Protecting Artists From Style Mimicry By Text-to-image Models (2023) • Arxiv • 41 citations
Shan et al.
GALIP: Generative Adversarial Clips For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Tao et al.
Diverse Data Augmentation With Diffusions For Effective Test-time Prompt Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 44 citations
Feng et al.
Encoder-based Domain Tuning For Fast Personalization Of Text-to-image Models (2023) • ACM Transactions on Graphics • 122 citations
Gal et al.
Erasing Concepts From Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 111 citations
Gandikota et al.
Vox-e: Text-guided Voxel Editing Of 3D Objects (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Sella et al.
Ip-adapter: Text Compatible Image Prompt Adapter For Text-to-image Diffusion Models (2023) • No Venue
Ye et al.
Expressive Text-to-image Generation With Rich Text (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 48 citations
Ge et al.
Tokenflow: Consistent Diffusion Features For Consistent Video Editing (2023) • No Venue
Geyer et al.
Preserve Your Own Correlation: A Noise Prior For Video Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 97 citations
Ge et al.
A Picture Is Worth A Thousand Words: Principled Recaptioning Improves Image Generation (2023) • No Venue
Segalis et al.
Emu Video: Factorizing Text-to-video Generation By Explicit Image Conditioning (2023) • No Venue
Girdhar et al.
Adding Conditional Control To Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 2580 citations
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Commoncanvas: An Open Diffusion Model Trained With Creative-commons Images (2023) • No Venue
Gokaslan et al.
Make-it-3d: High-fidelity 3D Creation From A Single Image With Diffusion Prior (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 149 citations
Tang et al.
Animatediff: Animate Your Personalized Text-to-image Diffusion Models Without Specific Tuning (2023) • No Venue
Guo et al.
Using Human Feedback To Fine-tune Diffusion Models Without Any Reward Model (2023) • No Venue
Yang et al.
A Complete Survey On Generative AI (AIGC): Is Chatgpt From GPT-4 To GPT-5 All You Need? (2023) • Arxiv • 101 citations
Zhang et al.
Photorealistic Video Generation With Diffusion Models (2023) • No Venue
Gupta et al.
Svdiff: Compact Parameter Space For Diffusion Fine-tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Han et al.
Hyperdreambooth: Hypernetworks For Fast Personalization Of Text-to-image Models (2023) • No Venue
Ruiz et al.
De-diffusion Makes Text A Strong Cross-modal Interface (2023) • No Venue
Wei et al.
Conceptlab: Creative Generation Using Diffusion Prior Constraints (2023) • No Venue
Richardson et al.
FABRIC: Personalizing Diffusion Models With Iterative Feedback (2023) • No Venue
Rütte et al.
Rerender A Video: Zero-shot Text-guided Video-to-video Translation (2023) • No Venue
Yang et al.
Make-an-audio: Text-to-audio Generation With Prompt-enhanced Diffusion Models (2023) • Arxiv • 46 citations
Huang et al.
Noise2music: Text-conditioned Music Generation With Diffusion Models (2023) • Arxiv • 45 citations
Huang et al.
Tech: Text-guided Reconstruction Of Lifelike Clothed Humans (2023) • No Venue
Huang et al.
Neuroprompts: An Adaptive Framework To Optimize Prompts For Text-to-image Generation (2023) • No Venue
Shachar Rosenman, Vasudev Lal, Phillip Howard
Magicapture: High-resolution Multi-concept Portrait Customization (2023) • No Venue
Junha Hyung, Jaeyo Shin, Jaegul Choo
Word-as-image For Semantic Typography (2023) • ACM Transactions on Graphics • 51 citations
Iluz et al.
VMC: Video Motion Customization Using Temporal Attention Adaption For Text-to-video Diffusion Models (2023) • No Venue
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
Scedit: Efficient And Controllable Image Diffusion Generation Via Skip Connection Editing (2023) • No Venue
Jiang et al.
SDXL: Improving Latent Diffusion Models For High-resolution Image Synthesis (2023) • No Venue
Podell et al.
Aligning Text-to-image Diffusion Models With Reward Backpropagation (2023) • No Venue
Prabhudesai et al.
Boxdiff: Text-to-image Synthesis With Training-free Box-constrained Diffusion (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 101 citations
Xie et al.
Rich Human Feedback For Text-to-image Generation (2023) • No Venue
Liang et al.
BLIP-2: Bootstrapping Language-image Pre-training With Frozen Image Encoders And Large Language Models (2023) • Arxiv • 65 citations
Li et al.
Diffusion Models For Non-autoregressive Text Generation: A Survey (2023) • Arxiv • 41 citations
Li et al.
Diffurec: A Diffusion Model For Sequential Recommendation (2023) • ACM Transactions on Information Systems • 80 citations
Zihao Li, Aixin Sun, Chenliang Li
GLIGEN: Open-set Grounded Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 364 citations
Li et al.
JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models (2023) • No Venue
Li et al.
Videogen: A Reference-guided Latent Diffusion Approach For High Definition Text-to-video Generation (2023) • No Venue
Li et al.
Your Diffusion Model Is Secretly A Zero-shot Classifier (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Li et al.
Controlling Text-to-image Diffusion By Orthogonal Finetuning (2023) • No Venue
Qiu et al.
Fatezero: Fusing Attentions For Zero-shot Text-based Video Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 141 citations
Qi et al.
Ufogen: You Forward Once Large Scale Text-to-image Generation Via Diffusion Gans (2023) • No Venue
Xu et al.
DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model (2023) • No Venue
Xu et al.
Imagereward: Learning And Evaluating Human Preferences For Text-to-image Generation (2023) • Arxiv • 99 citations
Xu et al.
Generative Artificial Intelligence In Learning Analytics: Contextualising Opportunities And Challenges Through The Learning Analytics Cycle (2023) • Proceedings of the 14th Learning Analytics and Knowledge Conference • 51 citations
Lixiang Yan, Roberto Martinez-Maldonado, Dragan Gašević
Text2video-zero: Text-to-image Diffusion Models Are Zero-shot Video Generators (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 261 citations
Khachatryan et al.
X-adapter: Adding Universal Compatibility Of Plugins For Upgraded Diffusion Model (2023) • No Venue
Ran et al.
Dense Text-to-image Generation With Attention Modulation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 75 citations
Kim et al.
Collaborative Score Distillation For Consistent Visual Synthesis (2023) • No Venue
Kim et al.
Prospect: Prompt Spectrum For Attribute-aware Personalization Of Diffusion Models (2023) • ACM Transactions on Graphics • 76 citations
Zhang et al.
Kandinsky: An Improved Text-to-image Synthesis With Image Prior And Latent Diffusion (2023) • No Venue
Razzhigaev et al.
RAPHAEL: Text-to-image Generation Via Large Mixture Of Diffusion Paths (2023) • Arxiv • 41 citations
Xue et al.
Freestyle Layout-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Xue et al.
Text-visual Prompting For Efficient 2D Temporal Video Grounding (2023) • Arxiv • 73 citations
Zhang et al.
Open-vocabulary Panoptic Segmentation With Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 266 citations
Xu et al.
Layoutllm-t2i: Eliciting Layout Guidance From LLM For Text-to-image Generation (2023) • MM '23: The 31st ACM International Conference on Multimedia • 65 citations
Qu et al.
Diffusion-lm Improves Controllable Text Generation (2022) • Arxiv • 236 citations
Li et al.
Dreambooth: Fine Tuning Text-to-image Diffusion Models For Subject-driven Generation (2022) • Arxiv • 105 citations
Ruiz et al.
Knn-diffusion: Image Generation Via Large-scale Retrieval (2022) • Arxiv • 46 citations
Sheynin et al.
Diffusion Models: A Comprehensive Survey Of Methods And Applications (2022) • Arxiv • 147 citations
Yang et al.
Compositional Visual Generation With Composable Diffusion Models (2022) • Lecture Notes in Computer Science • 211 citations
Liu et al.
Dreamfusion: Text-to-3d Using 2D Diffusion (2022) • Arxiv • 453 citations
Poole et al.
Versatile Diffusion: Text, Images And Variations All In One Diffusion Model (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 91 citations
Xu et al.
Magicvideo: Efficient Video Generation With Latent Diffusion Models (2022) • Arxiv • 63 citations
Zhou et al.
Ediff-i: Text-to-image Diffusion Models With An Ensemble Of Expert Denoisers (2022) • Arxiv • 222 citations
Balaji et al.
A Survey On Generative Diffusion Model (2022) • Arxiv • 66 citations
Cao et al.
Instructpix2pix: Learning To Follow Image Editing Instructions (2022) • Arxiv • 40 citations
Tim Brooks, Aleksander Holynski, Alexei A. Efros
Roentgen: Vision-language Foundation Model For Chest X-ray Generation (2022) • Arxiv • 55 citations
Chambon et al.
Adapting Pretrained Vision-language Foundational Models To Medical Imaging Domains (2022) • Foundation Models for Decision Making Workshop at Neural Information Processing Systems 2022 • 43 citations
Chambon et al.
Re-imagen: Retrieval-augmented Text-to-image Generator (2022) • Arxiv • 41 citations
Chen et al.
Point-e: A System For Generating 3D Point Clouds From Complex Prompts (2022) • Arxiv • 155 citations
Nichol et al.
Training-free Structured Diffusion Guidance For Compositional Text-to-image Synthesis (2022) • Arxiv • 70 citations
Feng et al.
Make-a-scene: Scene-based Text-to-image Generation With Human Priors (2022) • Lecture Notes in Computer Science • 265 citations
Gafni et al.
Dpm-solver++: Fast Solver For Guided Sampling Of Diffusion Probabilistic Models (2022) • Arxiv • 101 citations
Lu et al.
Diffuseq: Sequence To Sequence Text Generation With Diffusion Models (2022) • Arxiv • 93 citations
Gong et al.
Motiondiffuse: Text-driven Human Motion Generation With Diffusion Model (2022) • Arxiv • 109 citations
Zhang et al.
Diffusionbert: Improving Generative Masked Language Models With Diffusion Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 41 citations
He et al.
Human Motion Diffusion Model (2022) • Arxiv • 165 citations
Tevet et al.
Photorealistic Text-to-image Diffusion Models With Deep Language Understanding (2022) • Arxiv • 2091 citations
Saharia et al.
Diff-tts: A Denoising Diffusion Model For Text-to-speech (2021) • Interspeech 2021 • 95 citations
Jeong et al.
High-resolution Image Synthesis With Latent Diffusion Models (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 10513 citations
Rombach et al.
GLIDE: Towards Photorealistic Image Generation And Editing With Text-guided Diffusion Models (2021) • Arxiv • 995 citations
Nichol et al.

— E —

EACL 13 papers

DREEAM: Guiding Attention With Evidence For Improving Document-level Relation Extraction (2023) • Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics • 62 citations
Youmi Ma, An Wang, Naoaki Okazaki
Large Language Models Are Few(1)-shot Table Reasoners (2022) • Findings of the Association for Computational Linguistics: EACL 2023 • 41 citations
Wenhu Chen
Should You Mask 15% In Masked Language Modeling? (2022) • Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics • 60 citations
Wettig et al.
Generating Syntactically Controlled Paraphrases Without Using Annotated Parallel Pairs (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 52 citations
Kuan-Hao Huang, Kai-Wei Chang
Syntax-bert: Improving Pre-trained Transformers With Syntax Trees (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 40 citations
Bai et al.
Trankit: A Light-weight Transformer-based Toolkit For Multilingual Natural Language Processing (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations • 43 citations
Nguyen et al.
Word Alignment By Fine-tuning Embeddings On Parallel Corpora (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 118 citations
Zi-Yi Dou, Graham Neubig
Debiasing Pre-trained Contextualised Embeddings (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 83 citations
Masahiro Kaneko, Danushka Bollegala
On Hallucination And Predictive Uncertainty In Conditional Language Generation (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 63 citations
Yijun Xiao, William Yang Wang
Adapterfusion: Non-destructive Task Composition For Transfer Learning (2020) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 41 citations
Pfeiffer et al.
Recipes For Building An Open-domain Chatbot (2020) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 170 citations
Roller et al.
Multi-task Learning For Mental Health Using Social Media Text (2017) • Proceedings of the 15th Conference of the EACL • 60 citations
Adrian Benton, Margaret Mitchell, Dirk Hovy
Learning To Generate One-sentence Biographies From Wikidata (2017) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers • 90 citations
Andrew Chisholm, Will Radford, Ben Hachey

Efficiency 1547 papers

Inference-time Hyper-scaling With KV Cache Compression (2025) • No Venue
Łańcucki et al.
Falcon-h1: A Family Of Hybrid-head Language Models Redefining Efficiency And Performance (2025) • No Venue
Zuo et al.
Softpick: No Attention Sink, No Massive Activations With Rectified Softmax (2025) • No Venue
Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Reasonflux-prm: Trajectory-aware Prms For Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Zou et al.
Latent Collaboration In Multi-agent Systems (2025) • No Venue
Zou et al.
Frac-connections: Fractional Extension Of Hyper-connections (2025) • No Venue
Zhu et al.
Layercake: Token-aware Contrastive Decoding Within Large Language Model Layers (2025) • No Venue
Zhu et al.
Latent Refinement Decoding: Enhancing Diffusion-based Language Models By Refining Belief States (2025) • No Venue
Zhu et al.
Scaling Latent Reasoning Via Looped Language Models (2025) • No Venue
Zhu et al.
Obs-diff: Accurate Pruning For Diffusion Models In One-shot (2025) • No Venue
Zhu et al.
Hybridnorm: Towards Stable And Efficient Transformer Training Via Hybrid Normalization (2025) • No Venue
Zhuo et al.
Hephaestus: Improving Fundamental Agent Capabilities Of Large Language Models Through Continual Pre-training (2025) • No Venue
Zhuang et al.
Efficient Long-context Language Model Training By Core Attention Disaggregation (2025) • No Venue
Zhuang et al.
Siglip 2: Multilingual Vision-language Encoders With Improved Semantic Understanding, Localization, And Dense Features (2025) • No Venue
Tschannen et al.
Deepprune: Parallel Scaling Without Inter-trace Redundancy (2025) • No Venue
Tu et al.
How To Train Your LLM Web Agent: A Statistical Diagnosis (2025) • No Venue
Vattikonda et al.
Embeddinggemma: Powerful And Lightweight Text Representations (2025) • No Venue
Vera et al.
Qwenlong-l1: Towards Long-context Large Reasoning Models With Reinforcement Learning (2025) • No Venue
Wan et al.
Diffusion Llms Can Do Faster-than-ar Inference Via Discrete Diffusion Forcing (2025) • No Venue
Wang et al.
CODESYNC: Synchronizing Large Language Models With Dynamic Code Evolution At Scale (2025) • No Venue
Wang et al.
Bitvla: 1-bit Vision-language-action Models For Robotics Manipulation (2025) • No Venue
Wang et al.
Bitnet V2: Native 4-bit Activations With Hadamard Transformation For 1-bit Llms (2025) • No Venue
Hongyu Wang, Shuming Ma, Furu Wei
Co-evolving LLM Coder And Unit Tester Via Reinforcement Learning (2025) • No Venue
Wang et al.
Tina: Tiny Reasoning Models Via Lora (2025) • No Venue
Wang et al.
PATS: Process-level Adaptive Thinking Mode Switching (2025) • No Venue
Wang et al.
Information Gain-based Policy Optimization: A Simple And Effective Approach For Multi-turn LLM Agents (2025) • No Venue
Wang et al.
Generalizing Test-time Compute-optimal Scaling As An Optimizable Graph (2025) • No Venue
Wang et al.
Emergent Hierarchical Reasoning In Llms Through Reinforcement Learning (2025) • No Venue
Wang et al.
From Editor To Dense Geometry Estimator (2025) • No Venue
Wang et al.
Efficient Agents: Building Effective Agents While Reducing Cost (2025) • No Venue
Wang et al.
Memmamba: Rethinking Memory Patterns In State Space Model (2025) • No Venue
Wang et al.
Internvl3.5: Advancing Open-source Multimodal Models In Versatility, Reasoning, And Efficiency (2025) • No Venue
Wang et al.
Hybrimoe: Hybrid CPU-GPU Scheduling And Cache Management For Efficient Moe Inference (2025) • No Venue
Zhong et al.
WALL-E 2.0: World Alignment By Neurosymbolic Learning Improves World Model-based LLM Agents (2025) • No Venue
Zhou et al.
A Theoretical Study On Bridging Internal Probability And Self-consistency For LLM Reasoning (2025) • No Venue
Zhou et al.
3DIS-FLUX: Simple And Efficient Multi-instance Generation With Dit Rendering (2025) • No Venue
Zhou et al.
Soundwave: Less Is More For Speech-text Alignment In Llms (2025) • No Venue
Zhang et al.
Lightthinker: Thinking Step-by-step Compression (2025) • No Venue
Zhang et al.
Flashvideo: Flowing Fidelity To Detail For Efficient High-resolution Video Generation (2025) • No Venue
Zhang et al.
Fast Video Generation With Sliding Tile Attention (2025) • No Venue
Zhang et al.
Faster Video Diffusion With Trainable Sparse Attention (2025) • No Venue
Zhang et al.
The Lessons Of Developing Process Reward Models In Mathematical Reasoning (2025) • No Venue
Zhang et al.
Llava-mini: Efficient Image And Video Large Multimodal Models With One Vision Token (2025) • No Venue
Zhang et al.
Locality-aware Parallel Decoding For Efficient Autoregressive Image Generation (2025) • No Venue
Zhang et al.
Othink-r1: Intrinsic Fast/slow Thinking Mode Switching For Over-reasoning Mitigation (2025) • No Venue
Zhang et al.
MM-RLHF: The Next Step Forward In Multimodal LLM Alignment (2025) • No Venue
Zhang et al.
SLA: Beyond Sparsity In Diffusion Transformers Via Fine-tunable Sparse-linear Attention (2025) • No Venue
Zhang et al.
S1-bench: A Simple Benchmark For Evaluating System 1 Thinking Capability Of Large Reasoning Models (2025) • No Venue
Zhang et al.
Pixel-sail: Single Transformer For Pixel-grounded Understanding (2025) • No Venue
Zhang et al.
Qwen3 Embedding: Advancing Text Embedding And Reranking Through Foundation Models (2025) • No Venue
Zhang et al.
Sageattention2++: A More Efficient Implementation Of Sageattention2 (2025) • No Venue
Zhang et al.
Sageattention3: Microscaling FP4 Attention For Inference And An Exploration Of 8-bit Training (2025) • No Venue
Zhang et al.
Training-free Efficient Video Generation Via Dynamic Token Carving (2025) • No Venue
Zhang et al.
Spargeattn: Accurate Sparse Attention Accelerating Any Model Inference (2025) • No Venue
Zhang et al.
Tensor Product Attention Is All You Need (2025) • No Venue
Zhang et al.
Vision-language-vision Auto-encoder: Scalable Knowledge Distillation From Diffusion Models (2025) • No Venue
Zhang et al.
UFO2: The Desktop Agentos (2025) • No Venue
Zhang et al.
Waver: Wave Your Way To Lifelike Video Generation (2025) • No Venue
Zhang et al.
Insights Into Deepseek-v3: Scaling Challenges And Reflections On Hardware For AI Architectures (2025) • No Venue
Zhao et al.
Let Llms Break Free From Overthinking Via Self-braking Tuning (2025) • No Venue
Zhao et al.
Paroattention: Pattern-aware Reordering For Efficient Sparse And Quantized Attention In Visual Generation Models (2025) • No Venue
Zhao et al.
FARMER: Flow Autoregressive Transformer Over Pixels (2025) • No Venue
Zheng et al.
An Empirical Study Of Qwen3 Quantization (2025) • No Venue
Zheng et al.
Scaling Diffusion Transformers Efficiently Via μP (2025) • No Venue
Zheng et al.
Group Sequence Policy Optimization (2025) • No Venue
Zheng et al.
Vision Foundation Models As Effective Visual Tokenizers For Autoregressive Image Generation (2025) • No Venue
Zheng et al.
SAIL-VL2 Technical Report (2025) • No Venue
Yin et al.
Livemcp-101: Stress Testing And Diagnosing Mcp-enabled Agents On Challenging Queries (2025) • No Venue
Yin et al.
Formalmath: Benchmarking Formal Mathematical Reasoning Of Large Language Models (2025) • No Venue
Yu et al.
Craw4llm: Efficient Web Crawling For LLM Pretraining (2025) • No Venue
Shi Yu, Zhiyuan Liu, Chenyan Xiong
Demystifying Reinforcement Learning In Agentic Reasoning (2025) • No Venue
Yu et al.
Dimple: Discrete Diffusion Multimodal Large Language Model With Parallel Decoding (2025) • No Venue
Runpeng Yu, Xinyin Ma, Xinchao Wang
Recode: Unify Plan And Action For Universal Granularity Control (2025) • No Venue
Yu et al.
Minicpm-v 4.5: Cooking Efficient Mllms Via Architecture, Data, And Training Recipe (2025) • No Venue
Yu et al.
Trajselector: Harnessing Latent Representations For Efficient And Effective Best-of-n In Large Reasoning Model (2025) • No Venue
Yu et al.
Z1: Efficient Test-time Scaling With Code (2025) • No Venue
Yu et al.
Agent-r: Training Language Model Agents To Reflect Via Iterative Self-training (2025) • No Venue
Yuan et al.
Being-0: A Humanoid Robotic Agent With Vision-language Models And Modular Skills (2025) • No Venue
Yuan et al.
Efficientllm: Efficiency In Large Language Models (2025) • No Venue
Yuan et al.
Pixelrefer: A Unified Framework For Spatio-temporal Object Referring With Arbitrary Granularity (2025) • No Venue
Yuan et al.
Native Sparse Attention: Hardware-aligned And Natively Trainable Sparse Attention (2025) • No Venue
Yuan et al.
Shortv: Efficient Multimodal Large Language Models By Freezing Visual Tokens In Ineffective Layers (2025) • No Venue
Yuan et al.
Vl-cogito: Progressive Curriculum Reinforcement Learning For Advanced Multimodal Reasoning (2025) • No Venue
Yuan et al.
Gliner2: An Efficient Multi-task Information Extraction System With Schema-driven Interface (2025) • No Venue
Zaratiana et al.
ARWKV: Pretrain Is Not What We Need, An Rnn-attention-based Language Model Born From Transformer (2025) • No Venue
Yueyu et al.
Understand Before You Generate: Self-guided Training For Autoregressive Image Generation (2025) • No Venue
Yue et al.
When To Ensemble: Identifying Token-level Points For Stable And Fast LLM Ensembling (2025) • No Venue
Yun et al.
Designlab: Designing Slides Through Iterative Detection And Correction (2025) • No Venue
Yun et al.
Rlinf-vla: A Unified And Efficient Framework For VLA+RL Training (2025) • No Venue
Zang et al.
A Vision-language-action-critic Model For Robotic Real-world Reinforcement Learning (2025) • No Venue
Zhai et al.
SIFT: Grounding LLM Reasoning In Contexts Via Stickers (2025) • No Venue
Zeng et al.
Exgrpo: Learning To Reason From Experience (2025) • No Venue
Zhan et al.
Easycontrol: Adding Efficient And Flexible Control For Diffusion Transformer (2025) • No Venue
Zhang et al.
Adaptthink: Reasoning Models Can Learn When To Think (2025) • No Venue
Zhang et al.
70% Size, 100% Accuracy: Lossless LLM Compression For Efficient GPU Inference Via Dynamic-length Float (2025) • No Venue
Zhang et al.
Alphaone: Reasoning Models Thinking Slow And Fast At Test Time (2025) • No Venue
Zhang et al.
Batch Speculative Decoding Done Right (2025) • No Venue
Zhang et al.
Browseragent: Building Web Agents With Human-inspired Web Browsing Actions (2025) • No Venue
Zhang et al.
Domain2vec: Vectorizing Datasets To Find The Optimal Data Mixture Without Training (2025) • No Venue
Zhang et al.
Mobile-agent-e: Self-evolving Mobile Assistant For Complex Tasks (2025) • No Venue
Wang et al.
O-mem: Omni Memory System For Personalized, Long Horizon, Self-evolving Agents (2025) • No Venue
Wang et al.
Optimizing Large Language Model Training Using FP4 Quantization (2025) • No Venue
Wang et al.
Open-qwen2vl: Compute-efficient Pre-training Of Fully-open Multimodal Llms On Academic Resources (2025) • No Venue
Wang et al.
OTC: Optimal Tool Calls Via Reinforcement Learning (2025) • No Venue
Wang et al.
Rep-mtl: Unleashing The Power Of Representation-level Task Saliency For Multi-task Learning (2025) • No Venue
Zedong Wang, Siyuan Li, Dan Xu
Reinforcement Learning For Reasoning In Large Language Models With One Training Example (2025) • No Venue
Wang et al.
Resa: Transparent Reasoning Models Via Saes (2025) • No Venue
Wang et al.
Reverse-engineered Reasoning For Open-ended Generation (2025) • No Venue
Wang et al.
Scholarcopilot: Training Large Language Models For Academic Writing With Accurate Citations (2025) • No Venue
Wang et al.
Thoughts Are All Over The Place: On The Underthinking Of O1-like Llms (2025) • No Venue
Wang et al.
Sparser Block-sparse Attention Via Token Permutation (2025) • No Venue
Wang et al.
Sota With Less: Mcts-guided Sample Selection For Data-efficient Visual Reasoning Self-improvement (2025) • No Venue
Wang et al.
Sparsed: Sparse Attention For Diffusion Language Models (2025) • No Venue
Wang et al.
Thinking Augmented Pre-training (2025) • No Venue
Wang et al.
Test-time Scaling With Reflective Generative Model (2025) • No Venue
Wang et al.
Winning The Pruning Gamble: A Unified Approach To Joint Sample And Token Pruning For Efficient Supervised Fine-tuning (2025) • No Venue
Wang et al.
Unigenbench++: A Unified Semantic Evaluation Benchmark For Text-to-image Generation (2025) • No Venue
Wang et al.
Wait, We Don't Need To "wait"! Removing Thinking Tokens Improves Reasoning Efficiency (2025) • No Venue
Wang et al.
Vla-adapter: An Effective Paradigm For Tiny-scale Vision-language-action Model (2025) • No Venue
Wang et al.
World Modeling Makes A Better Planner: Dual Preference Optimization For Embodied Task Planning (2025) • No Venue
Wang et al.
Findings Of The Babylm Challenge: Sample-efficient Pretraining On Developmentally Plausible Corpora (2025) • Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning • 63 citations
Warstadt et al.
Deepseek-ocr: Contexts Optical Compression (2025) • No Venue
Haoran Wei, Yaofeng Sun, Yukun Li
Sim-cot: Supervised Implicit Chain-of-thought (2025) • No Venue
Wei et al.
Streamvln: Streaming Vision-and-language Navigation Via Slowfast Context Modeling (2025) • No Venue
Wei et al.
Rank1: Test-time Compute For Reranking In Information Retrieval (2025) • No Venue
Weller et al.
Efficient Multi-modal Large Language Models Via Progressive Consistency Distillation (2025) • No Venue
Wen et al.
Delta Attention: Fast And Accurate Sparse Attention Inference By Delta Correction (2025) • No Venue
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
Lightgen: Efficient Image Generation Through Knowledge Distillation And Direct Preference Optimization (2025) • No Venue
Wu et al.
Fast-dllm: Training-free Acceleration Of Diffusion LLM By Enabling KV Cache And Parallel Decoding (2025) • No Venue
Wu et al.
Bitnet Distillation (2025) • No Venue
Wu et al.
ARM: Adaptive Reasoning Model (2025) • No Venue
Wu et al.
Automated Movie Generation Via Multi-agent Cot Planning (2025) • No Venue
Weijia Wu, Zeyu Zhu, Mike Zheng Shou
Efficient Pretraining Length Scaling (2025) • No Venue
Wu et al.
Deepsearch: Overcome The Bottleneck Of Reinforcement Learning With Verifiable Rewards Via Monte Carlo Tree Search (2025) • No Venue
Wu et al.
Boosting Multimodal Reasoning With Mcts-automated Structured Thinking (2025) • No Venue
Wu et al.
Direct3d-s2: Gigascale 3D Generation Made Easy With Spatial Sparse Attention (2025) • No Venue
Wu et al.
Latent Flow Transformer (2025) • No Venue
Wu et al.
From Hours To Minutes: Lossless Acceleration Of Ultra Long Sequence Generation Up To 100K Tokens (2025) • No Venue
Wu et al.
Grove Moe: Towards Efficient And Superior Moe Llms With Adjugate Experts (2025) • No Venue
Wu et al.
LAPO: Internalizing Reasoning Efficiency Via Length-adaptive Policy Optimization (2025) • No Venue
Wu et al.
Hunyuanvideo 1.5 Technical Report (2025) • No Venue
Wu et al.
It Takes Two: Your GRPO Is Secretly DPO (2025) • No Venue
Wu et al.
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization (2025) • No Venue
Wu et al.
Vmoba: Mixture-of-block Attention For Video Diffusion Models (2025) • No Venue
Wu et al.
BAPO: Stabilizing Off-policy Reinforcement Learning For Llms Via Balanced Policy Optimization With Adaptive Clipping (2025) • No Venue
Xi et al.
LIMI: Less Is More For Agency (2025) • No Venue
Xiao et al.
Lumina-dimoo: An Omni Diffusion Large Language Model For Multi-modal Generation And Understanding (2025) • No Venue
Xin et al.
SANA 1.5: Efficient Scaling Of Training-time And Inference-time Compute In Linear Diffusion Transformer (2025) • No Venue
Xie et al.
Reconstruction Alignment Improves Unified Multimodal Models (2025) • No Venue
Xie et al.
MPO: Boosting LLM Agents With Meta Plan Optimization (2025) • No Venue
Xiong et al.
Chain Of Draft: Thinking Faster By Writing Less (2025) • No Venue
Xu et al.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding And Generation (2025) • No Venue
Xu et al.
Easysteer: A Unified Framework For High-performance And Extensible LLM Steering (2025) • No Venue
Xu et al.
Genius: A Generalizable And Purely Unsupervised Self-training Framework For Advanced Reasoning (2025) • No Venue
Xu et al.
Noderag: Structuring Graph-based RAG With Heterogeneous Nodes (2025) • No Venue
Xu et al.
Scalable Chain Of Thoughts Via Elastic Reasoning (2025) • No Venue
Xu et al.
Ravine: Reality-aligned Evaluation For Agentic Search (2025) • No Venue
Xu et al.
Qwen3-omni Technical Report (2025) • No Venue
Xu et al.
Thinking-free Policy Initialization Makes Distilled Reasoning Models More Effective And Efficient Reasoners (2025) • No Venue
Xu et al.
Tiny Model, Big Logic: Diversity-driven Optimization Elicits Large-model Reasoning Ability In Vibethinker-1.5b (2025) • No Venue
Xu et al.
Unveiling Downstream Performance Scaling Of Llms: A Clustering-based Perspective (2025) • No Venue
Xu et al.
Φ-decoding: Adaptive Foresight Sampling For Balanced Inference-time Exploration And Exploitation (2025) • No Venue
Xu et al.
MUR: Momentum Uncertainty Guided Reasoning For Large Language Models (2025) • No Venue
Yan et al.
ARIA: Training Language Agents With Intention-driven Reward Aggregation (2025) • No Venue
Yang et al.
Laser: Reinforcement Learning With Last-token Self-rewarding (2025) • No Venue
Yang et al.
Egolife: Towards Egocentric Life Assistant (2025) • No Venue
Yang et al.
DCPO: Dynamic Clipping Policy Optimization (2025) • No Venue
Yang et al.
Hscodecomp: A Realistic And Expert-level Benchmark For Deep Search Agents In Hierarchical Rule Application (2025) • No Venue
Yang et al.
Step Back To Leap Forward: Self-backtracking For Boosting Reasoning Of Language Models (2025) • No Venue
Yang et al.
Multiverse: Your Language Models Secretly Decide How To Parallelize And Merge Generation (2025) • No Venue
Yang et al.
Longlive: Real-time Interactive Long Video Generation (2025) • No Venue
Yang et al.
Reasonflux: Hierarchical LLM Reasoning Via Scaling Thought Templates (2025) • No Venue
Yang et al.
Qwen2.5-1m Technical Report (2025) • No Venue
Yang et al.
Qwen3 Technical Report (2025) • No Venue
Yang et al.
Sparse Videogen2: Accelerate Video Generation With Sparse Attention Via Semantic-aware Permutation (2025) • No Venue
Yang et al.
Visionthink: Smart And Efficient Vision Language Model Via Reinforcement Learning (2025) • No Venue
Yang et al.
Reconstruction Vs. Generation: Taming Optimization Dilemma In Latent Diffusion Models (2025) • No Venue
Jingfeng Yao, Xinggang Wang
Optimizing Chain-of-thought Reasoners Via Gradient Variance Minimization In Rejection Sampling And RL (2025) • No Venue
Yao et al.
Survey On Evaluation Of Llm-based Agents (2025) • No Venue
Yehudai et al.
Black-box On-policy Distillation Of Large Language Models (2025) • No Venue
Ye et al.
LIMO: Less Is More For Reasoning (2025) • No Venue
Ye et al.
Magicinfinite: Generating Infinite Talking Videos With Your Words And Voice (2025) • No Venue
Yi et al.
Demystifying Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Yeo et al.
Wan: Open And Advanced Large-scale Video Generative Models (2025) • No Venue
Wanteam et al.
Every Activation Boosted: Scaling General Reasoner To 1 Trillion Open Language Foundation (2025) • No Venue
Ling-Team et al.
Virtual Width Networks (2025) • No Venue
Seed et al.
Skywork R1V2: Multimodal Hybrid Reinforcement Learning For Reasoning (2025) • No Venue
Chris et al.
NVIDIA Nemotron Nano V2 VL (2025) • No Venue
Nvidia et al.
VAPO: Efficient And Reliable Reinforcement Learning For Advanced Reasoning Tasks (2025) • No Venue
Yuyue et al.
Grokking In The Wild: Data Augmentation For Real-world Multi-hop Reasoning With Transformers (2025) • No Venue
Roman Abramov, Felix Steinbauer, Gjergji Kasneci
Gated Associative Memory: A Parallel O(N) Architecture For Efficient Sequence Modeling (2025) • No Venue
Rishiraj Acharya
The Markovian Thinker (2025) • No Venue
Aghajohari et al.
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025) • No Venue
Agrawal et al.
Ming-flash-omni: A Sparse, Unified Architecture For Multimodal Perception And Generation (2025) • No Venue
Ai et al.
When Less Is Enough: Adaptive Token Reduction For Efficient Image Representation (2025) • No Venue
Eduard Allakhverdov, Elizaveta Goncharova, Andrey Kuznetsov
Sharing Is Caring: Efficient LM Post-training With Collective RL Experience Sharing (2025) • No Venue
Amico et al.
Flexidit: Your Diffusion Transformer Can Easily Generate High-quality Samples With Less Compute (2025) • No Venue
Anagnostidis et al.
Llava-onevision-1.5: Fully Open Framework For Democratized Multimodal Training (2025) • No Venue
An et al.
Surfer 2: The Next Generation Of Cross-platform Computer Use Agents (2025) • No Venue
Andreux et al.
Tabstar: A Foundation Tabular Model With Semantically Target-aware Representations (2025) • No Venue
Alan Arazi, Eilam Shapira, Roi Reichart
Kandinsky 5.0: A Family Of Foundation Models For Image And Video Generation (2025) • No Venue
Arkhipkin et al.
Block Diffusion: Interpolating Between Autoregressive And Diffusion Language Models (2025) • No Venue
Arriola et al.
The Best Of N Worlds: Aligning Reinforcement Learning With Best-of-n Sampling Via Max@k Optimisation (2025) • No Venue
Bagirov et al.
Sketch-of-thought: Efficient LLM Reasoning With Adaptive Cognitive-inspired Sketching (2025) • No Venue
Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
Hybrid Architectures For Language Models: Systematic Analysis And Design Insights (2025) • No Venue
Bae et al.
Mixture-of-recursions: Learning Dynamic Recursive Depths For Adaptive Token-level Computation (2025) • No Venue
Bae et al.
Llama-nemotron: Efficient Reasoning Models (2025) • No Venue
Bercovich et al.
KV Cache Steering For Inducing Reasoning In Small Language Models (2025) • No Venue
Belitsky et al.
Singlora: Low Rank Adaptation Using A Single Matrix (2025) • No Venue
Bensaïd et al.
All Is Not Lost: LLM Recovery Without Checkpoints (2025) • No Venue
Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen
Riemannlora: A Unified Riemannian Framework For Ambiguity-free Lora Optimization (2025) • No Venue
Bogachev et al.
When Does Reasoning Matter? A Controlled Study Of Reasoning's Contribution To Model Performance (2025) • No Venue
Boizard et al.
Go-with-the-flow: Motion-controllable Video Diffusion Models Using Real-time Warped Noise (2025) • No Venue
Burgert et al.
Distillation Scaling Laws (2025) • No Venue
Busbridge et al.
Iterresearch: Rethinking Long-horizon Agents Via Markovian State Reconstruction (2025) • No Venue
Chen et al.
Quartet: Native FP4 Training Can Be Optimal For Large Language Models (2025) • No Venue
Castro et al.
INT V.s. FP: A Comprehensive Study Of Fine-grained Low-bit Quantization Formats (2025) • No Venue
Chen et al.
Dip: Taming Diffusion Models In Pixel Space (2025) • No Venue
Chen et al.
A^2FM: An Adaptive Agent Foundation Model For Tool-aware Hybrid Reasoning (2025) • No Venue
Chen et al.
Dc-videogen: Efficient Video Generation With Deep Compression Video Autoencoder (2025) • No Venue
Chen et al.
Blip3-o: A Family Of Fully Open Unified Multimodal Models-architecture, Training And Dataset (2025) • No Venue
Chen et al.
Autopr: Let's Automate Your Academic Promotion! (2025) • No Venue
Chen et al.
Coda: Coding LM Via Diffusion Adaptation (2025) • No Venue
Chen et al.
Code2video: A Code-centric Paradigm For Educational Video Generation (2025) • No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
Flash-dmd: Towards High-fidelity Few-step Image Generation With Efficient Distillation And Joint Reinforcement Learning (2025) • No Venue
Chen et al.
Eagle 2.5: Boosting Long-context Post-training For Frontier Vision-language Models (2025) • No Venue
Chen et al.
Exploring The Effect Of Reinforcement Learning On Video Understanding: Insights From Seed-bench-r1 (2025) • No Venue
Chen et al.
The Geometry Of LLM Quantization: GPTQ As Babai's Nearest Plane Algorithm (2025) • No Venue
Jiale Chen, Torsten Hoefler, Dan Alistarh
MIDAS: Multimodal Interactive Digital-human Synthesis Via Real-time Autoregressive Video Generation (2025) • No Venue
Chen et al.
Parallel Scaling Law For Language Models (2025) • No Venue
Chen et al.
Retroinfer: A Vector-storage Approach For Scalable Long-context LLM Inference (2025) • No Venue
Chen et al.
Sana-sprint: One-step Diffusion With Continuous-time Consistency Distillation (2025) • No Venue
Chen et al.
Sparse-vdit: Unleashing The Power Of Sparse Attention To Accelerate Video Diffusion Transformers (2025) • No Venue
Chen et al.
Verithinker: Learning To Verify Makes Reasoning Model Efficient (2025) • No Venue
Chen et al.
Glyph: Scaling Context Windows Via Visual-text Compression (2025) • No Venue
Cheng et al.
Gold-medalist Performance In Solving Olympiad Geometry With Alphageometry2 (2025) • No Venue
Chervonyi et al.
Mem0: Building Production-ready AI Agents With Scalable Long-term Memory (2025) • No Venue
Chhikara et al.
STITCH: Simultaneous Thinking And Talking With Chunked Reasoning For Spoken Language Models (2025) • No Venue
Chiang et al.
Command A: An Enterprise-ready Large Language Model (2025) • No Venue
Cohere et al.
Beyond RAG: Task-aware KV Cache Compression For Comprehensive Knowledge Reasoning (2025) • No Venue
Corallo et al.
The Danger Of Overthinking: Examining The Reasoning-action Dilemma In Agentic Tasks (2025) • No Venue
Cuadron et al.
Paddleocr-vl: Boosting Multilingual Document Parsing Via A 0.9B Ultra-compact Vision-language Model (2025) • No Venue
Cui et al.
Emu3.5: Native Multimodal Models Are World Learners (2025) • No Venue
Cui et al.
Process Reinforcement Through Implicit Rewards (2025) • No Venue
Cui et al.
CDE: Curiosity-driven Exploration For Efficient Reinforcement Learning In Large Language Models (2025) • No Venue
Dai et al.
One-minute Video Generation With Test-time Training (2025) • No Venue
Dalal et al.
Alayadb: The Data Foundation For Efficient And Effective Long-context LLM Inference (2025) • No Venue
Deng et al.
Exploring The Sustainable Scaling Of AI Dilemma: A Projective Study Of Corporations' AI Environmental Impacts (2025) • No Venue
Desroches et al.
Machinelearninglm: Continued Pretraining Language Models On Millions Of Synthetic Tabular Prediction Tasks Scales In-context ML (2025) • No Venue
Dong et al.
Mmtok: Multimodal Coverage Maximization For Efficient Inference Of Vlms (2025) • No Venue
Dong et al.
Streaming Diloco With Overlapping Communication: Towards A Distributed Free Lunch (2025) • No Venue
Douillard et al.
Mom: Linear Sequence Modeling With Mixture-of-memories (2025) • No Venue
Du et al.
Which Heads Matter For Reasoning? Rl-guided KV Cache Compression (2025) • No Venue
Du et al.
Moderngbert: German-only 1B Encoder Model Trained From Scratch (2025) • No Venue
Ehrmanntraut et al.
Bridging The Gap Between Promise And Performance For Microscaling FP4 Quantization (2025) • No Venue
Egiazarian et al.
Deep Think With Confidence (2025) • No Venue
Fu et al.
Artificial Hippocampus Networks For Efficient Long-context Modeling (2025) • No Venue
Fang et al.
Megascience: Pushing The Frontiers Of Post-training Datasets For Science Reasoning (2025) • No Venue
Run-Ze Fan, Zengzhi Wang, Pengfei Liu
Make Lora Great Again: Boosting Lora With Adaptive Singular Values And Mixture-of-experts Optimization Alignment (2025) • No Venue
Fan et al.
Missing Premise Exacerbates Overthinking: Are Reasoning Models Losing Critical Thinking Skill? (2025) • No Venue
Fan et al.
Phased DMD: Few-step Distribution Matching Distillation Via Score Matching Within Subintervals (2025) • No Venue
Fan et al.
Memp: Exploring Agent Procedural Memory (2025) • No Venue
Fang et al.
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action (2025) • No Venue
Fang et al.
Lightmem: Lightweight And Efficient Memory-augmented Generation (2025) • No Venue
Fang et al.
Thinkless: LLM Learns When To Think (2025) • No Venue
Gongfan Fang, Xinyin Ma, Xinchao Wang
SRPO: Self-referential Policy Optimization For Vision-language-action Models (2025) • No Venue
Fei et al.
Cache-to-cache: Direct Semantic Communication Between Large Language Models (2025) • No Venue
Fu et al.
Retool: Reinforcement Learning For Strategic Tool Use In Llms (2025) • No Venue
Feng et al.
Efficient Reasoning Models: A Survey (2025) • No Venue
Feng et al.
WILDCHAT-50M: A Deep Dive Into The Role Of Synthetic Data In Post-training (2025) • No Venue
Benjamin Feuer, Chinmay Hegde
What Characterizes Effective Reasoning? Revisiting Length, Review, And Structure Of Cot (2025) • No Venue
Feng et al.
Vericot: Neuro-symbolic Chain-of-thought Validation Via Logical Consistency Checks (2025) • No Venue
Feng et al.
Areal: A Large-scale Asynchronous Reinforcement Learning System For Language Reasoning (2025) • No Venue
Fu et al.
R2R: Efficiently Navigating Divergent Reasoning Paths With Small-large Model Token Routing (2025) • No Venue
Fu et al.
Nemotron-flash: Towards Latency-optimal Hybrid Small Language Models (2025) • No Venue
Fu et al.
Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) • No Venue
Gao et al.
Agentscope 1.0: A Developer-centric Framework For Building Agentic Applications (2025) • No Venue
Gao et al.
Soft Adaptive Policy Optimization (2025) • No Venue
Gao et al.
Seedream 3.0 Technical Report (2025) • No Venue
Gao et al.
Seedance 1.0: Exploring The Boundaries Of Video Generation Models (2025) • No Venue
Gao et al.
Seerattention-r: Sparse Attention Adaptation For Long Reasoning (2025) • No Venue
Gao et al.
A Strategic Coordination Framework Of Small Llms Matches Large Llms In Data Synthesis (2025) • No Venue
Gao et al.
Set Block Decoding Is A Language Model Inference Accelerator (2025) • No Venue
Gat et al.
R&B: Domain Regrouping And Data Mixture Balancing For Efficient Foundation Model Training (2025) • No Venue
Ge et al.
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts (2025) • No Venue
Ge et al.
Inverse Scaling In Test-time Compute (2025) • No Venue
Gema et al.
Guided By Gut: Efficient Test-time Scaling With Reinforced Intrinsic Confidence (2025) • No Venue
Ghasemabadi et al.
Should We Still Pretrain Encoders With Masked Language Modeling? (2025) • No Venue
Gisserot-Boukhlef et al.
RADLADS: Rapid Attention Distillation To Linear Attention Decoders At Scale (2025) • No Venue
Goldstein et al.
Mind2web 2: Evaluating Agentic Search With Agent-as-a-judge (2025) • No Venue
Gou et al.
Rstar-math: Small Llms Can Master Math Reasoning With Self-evolved Deep Thinking (2025) • No Venue
Guan et al.
Deeprag: Thinking To Retrieval Step By Step For Large Language Models (2025) • No Venue
Guan et al.
Fasta*: Fast-slow Toolpath Agent With Subroutine Mining For Efficient Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Swe-factory: Your Automated Factory For Issue Resolution Training Data And Evaluation Benchmarks (2025) • No Venue
Guo et al.
Train Long, Think Short: Curriculum Learning For Efficient Reasoning (2025) • No Venue
Hammoud et al.
Trillion 7B Technical Report (2025) • No Venue
Han et al.
Vision As A Dialect: Unifying Visual Understanding And Generation Via Text-aligned Representations (2025) • No Venue
Han et al.
ROOT: Robust Orthogonalized Optimizer For Neural Network Training (2025) • No Venue
He et al.
Don't Overthink It. Preferring Shorter Thinking Chains For Improved LLM Reasoning (2025) • No Venue
Hassid et al.
Can Large Language Models Detect Errors In Long Chain-of-thought Reasoning? (2025) • No Venue
He et al.
Textoon: Generating Vivid 2D Cartoon Characters From Text Descriptions (2025) • No Venue
Chao He, Jianqiang Ren, Liefeng Bo
Dr.llm: Dynamic Layer Routing In Llms (2025) • No Venue
Heakl et al.
Kuwain 1.5B: An Arabic SLM Via Language Injection (2025) • No Venue
Hennara et al.
Beyond One-size-fits-all: Inversion Learning For Highly Effective NLG Evaluation Prompts (2025) • No Venue
Hong et al.
Group Think: Multiple Concurrent Reasoning Agents Collaborating At Token Level Granularity (2025) • No Venue
Hsu et al.
The Art Of Scaling Reinforcement Learning Compute For Llms (2025) • No Venue
Khatri et al.
REINFORCE++: A Simple And Efficient Approach For Aligning Large Language Models (2025) • No Venue
Jian Hu
Every Token Counts: Generalizing 16M Ultra-long Context In Large Language Models (2025) • No Venue
Hu et al.
Adaspec: Selective Knowledge Distillation For Efficient Speculative Decoders (2025) • No Venue
Hu et al.
Qerl: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning For Llms (2025) • No Venue
Huang et al.
Over-tokenized Transformer: Vocabulary Is Generally Worth Scaling (2025) • No Venue
Huang et al.
Live Avatar: Streaming Real-time Audio-driven Avatar Generation With Infinite Length (2025) • No Venue
Huang et al.
O1 Replication Journey -- Part 3: Inference-time Scaling For Medical Reasoning (2025) • No Venue
Huang et al.
Stable-spam: How To Train In 4-bit More Stably Than 16-bit Adam (2025) • No Venue
Huang et al.
Vchain: Chain-of-visual-thought For Reasoning In Video Generation (2025) • No Venue
Huang et al.
Ultramemv2: Memory Networks Scaling To 120B Parameters With Superior Long-context Learning (2025) • No Venue
Huang et al.
Dynamic Chunking For End-to-end Hierarchical Sequence Modeling (2025) • No Venue
Sukjun Hwang, Brandon Wang, Albert Gu
Multi-granular Spatio-temporal Token Merging For Training-free Acceleration Of Video Llms (2025) • No Venue
Hyun et al.
Moga: Mixture-of-groups Attention For End-to-end Long Video Generation (2025) • No Venue
Jia et al.
Mme-cot: Benchmarking Chain-of-thought In Large Multimodal Models For Reasoning Quality, Robustness, And Efficiency (2025) • No Venue
Jiang et al.
Alphadrive: Unleashing The Power Of Vlms In Autonomous Driving Via Reinforcement Learning And Reasoning (2025) • No Venue
Jiang et al.
Detect Anything Via Next Point Prediction (2025) • No Venue
Jiang et al.
R-4B: Incentivizing General-purpose Auto-thinking Capability In Mllms Via Bi-mode Annealing And Reinforce Learning (2025) • No Venue
Jiang et al.
Token-efficient Long Video Understanding For Multimodal Llms (2025) • No Venue
Jiang et al.
Think Only When You Need With Large Hybrid-reasoning Models (2025) • No Venue
Jiang et al.
Verltool: Towards Holistic Agentic Reinforcement Learning With Tool Use (2025) • No Venue
Jiang et al.
Parallelbench: Understanding The Trade-offs Of Parallel Decoding In Diffusion Llms (2025) • No Venue
Kang et al.
Distilling LLM Agent Into Small Models With Retrieval And Code Tools (2025) • No Venue
Kang et al.
ACON: Optimizing Context Compression For Long-horizon LLM Agents (2025) • No Venue
Kang et al.
First Try Matters: Revisiting The Role Of Reflection In Reasoning Models (2025) • No Venue
Kang et al.
Simple Semi-supervised Knowledge Distillation From Vision-language Models Via Dual-head Optimization (2025) • No Venue
Kang et al.
T1: Tool-integrated Self-verification For Test-time Compute Scaling In Small Language Models (2025) • No Venue
Minki Kang, Jongwon Jeong, Jaewoong Cho
Gigaevo: An Open Source Optimization Framework Powered By Llms And Evolution Algorithms (2025) • No Venue
Khrulkov et al.
KLASS: Kl-guided Fast Inference In Masked Diffusion Models (2025) • No Venue
Kim et al.
Meta-awareness Enhances Reasoning Models: Self-alignment Reinforcement Learning (2025) • No Venue
Yoonjeon Kim, Doohyuk Jang, Eunho Yang
PLADIS: Pushing The Limits Of Attention In Diffusion Models At Inference Time By Leveraging Sparsity (2025) • No Venue
Kwanyoung Kim, Byeongsu Sim
Universal Reasoner: A Single, Composable Plug-and-play Reasoner For Frozen Llms (2025) • No Venue
Kim et al.
Streamdit: Real-time Streaming Text-to-video Generation (2025) • No Venue
Kodaira et al.
Zclip: Adaptive Spike Mitigation For LLM Pre-training (2025) • No Venue
Kumar et al.
Cramming 1568 Tokens Into A Single Vector And Back Again: Exploring The Limits Of Embedding Space Capacity (2025) • No Venue
Kuratov et al.
Train Sparse Autoencoders Efficiently By Utilizing Features Correlation (2025) • No Venue
Kurochkin et al.
Making Mathematical Reasoning Adaptive (2025) • No Venue
Lai et al.
Mini-o3: Scaling Up Reasoning Patterns And Interaction Turns For Visual Search (2025) • No Venue
Lai et al.
Evolving Deeper LLM Thinking (2025) • No Venue
Lee et al.
Infinitehip: Extending Language Model Context Up To 3 Million Tokens On A Single GPU (2025) • No Venue
Lee et al.
Genrecal: Generation After Recalibration From Large To Small Vision-language Models (2025) • No Venue
Lee et al.
Saferoute: Adaptive Model Selection For Efficient And Accurate Safety Guardrails In Large Language Models (2025) • No Venue
Lee et al.
MITS: Enhanced Tree Search Reasoning For Llms Via Pointwise Mutual Information (2025) • No Venue
Li et al.
C3PO: Critical-layer, Core-expert, Collaborative Pathway Optimization For Test-time Expert Re-mixing (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
4D Langsplat: 4D Language Gaussian Splatting Via Multimodal Large Language Models (2025) • No Venue
Li et al.
Beyond Fixed: Variable-length Denoising For Diffusion Large Language Models (2025) • No Venue
Li et al.
Autotriton: Automatic Triton Programming With Reinforcement Learning In Llms (2025) • No Venue
Li et al.
Miromind-m1: An Open-source Advancement In Mathematical Reasoning Via Context-aware Multi-stage Policy Optimization (2025) • No Venue
Li et al.
CUDA-L1: Improving CUDA Optimization Via Contrastive Reinforcement Learning (2025) • No Venue
Li et al.
Diffusion Language Models Know The Answer Before Decoding (2025) • No Venue
Li et al.
Langsplatv2: High-dimensional 3D Language Gaussian Splatting With 450+ FPS (2025) • No Venue
Li et al.
Memos: A Memory OS For AI System (2025) • No Venue
Li et al.
Staying In The Sweet Spot: Responsive Reasoning Evolution Via Capability-adaptive Hint Scaffolding (2025) • No Venue
Li et al.
PRIMA.CPP: Speeding Up 70b-scale LLM Inference On Low-resource Everyday Home Clusters (2025) • No Venue
Li et al.
Mol-r1: Towards Explicit Long-cot Reasoning In Molecule Discovery (2025) • No Venue
Li et al.
Openvision: A Fully-open, Cost-effective Family Of Advanced Vision Encoders For Multimodal Learning (2025) • No Venue
Li et al.
Preference Leakage: A Contamination Problem In Llm-as-a-judge (2025) • No Venue
Li et al.
R2-T2: Re-routing In Test-time For Multimodal Mixture-of-experts (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
START: Self-taught Reasoner With Tools (2025) • No Venue
Li et al.
Skip A Layer Or Loop It? Test-time Depth Adaptation Of Pretrained Llms (2025) • No Venue
Ziyue Li, Yang Li, Tianyi Zhou
Spatial Forcing: Implicit Spatial Representation Alignment For Vision-language-action Model (2025) • No Venue
Li et al.
Small Models Struggle To Learn From Strong Reasoners (2025) • No Venue
Li et al.
Tempsamp-r1: Effective Temporal Sampling With Reinforcement Fine-tuning For Video Llms (2025) • No Venue
Li et al.
A Survey On Diffusion Language Models (2025) • No Venue
Li et al.
Test-time Preference Optimization: On-the-fly Alignment Via Iterative Textual Feedback (2025) • No Venue
Li et al.
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data (2025) • No Venue
Li et al.
Transmamba: Flexibly Switching Between Transformer And Mamba (2025) • No Venue
Li et al.
Treepo: Bridging The Gap Of Policy Optimization And Efficacy And Inference Efficiency With Heuristic Tree-based Modeling (2025) • No Venue
Li et al.
VLA-RFT: Vision-language-action Reinforcement Fine-tuning With Verified Rewards In World Simulators (2025) • No Venue
Li et al.
Drag-and-drop Llms: Zero-shot Prompt-to-weights (2025) • No Venue
Liang et al.
SEAP: Training-free Sparse Expert Activation Pruning Unlock The Brainpower Of Large Language Models (2025) • No Venue
Liang et al.
Multimodal Mamba: Decoder-only Multimodal State Space Model Via Quadratic To Linear Distillation (2025) • No Venue
Liao et al.
Reward-guided Speculative Decoding For Efficient LLM Reasoning (2025) • No Venue
Liao et al.
Motif 2 12.7B Technical Report (2025) • No Venue
Lim et al.
Implicit Reasoning In Transformers Is Reasoning Through Shortcuts (2025) • No Venue
Lin et al.
Autoregressive Adversarial Post-training For Real-time Interactive Video Generation (2025) • No Venue
Lin et al.
Computer-use Agents As Judges For Generative User Interface (2025) • No Venue
Lin et al.
Sigma: Differential Rescaling Of Query, Key And Value For Efficient Language Models (2025) • No Venue
Lin et al.
Quantization Meets Dllms: A Systematic Study Of Post-training Quantization For Diffusion Llms (2025) • No Venue
Lin et al.
Gear: Generation Augmented Retrieval (2025) • No Venue
Liu et al.
Costbench: Evaluating Multi-turn Cost-optimal Planning And Adaptation In Dynamic Environments For LLM Tool-use Agents (2025) • No Venue
Liu et al.
A Comprehensive Survey On Long Context Language Modeling (2025) • No Venue
Liu et al.
E²rank: Your Text Embedding Can Also Be An Effective And Efficient Listwise Reranker (2025) • No Venue
Liu et al.
Efficient Inference For Large Reasoning Models: A Survey (2025) • No Venue
Liu et al.
Flow-grpo: Training Flow Matching Models Via Online RL (2025) • No Venue
Liu et al.
Guardreasoner: Towards Reasoning-based LLM Safeguards (2025) • No Venue
Liu et al.
Llm-powered GUI Agents In Phone Automation: Surveying Progress And Prospects (2025) • No Venue
Liu et al.
Learn To Reason Efficiently With Adaptive Length-based Reward Shaping (2025) • No Venue
Liu et al.
Quantization Hurts Reasoning? An Empirical Study On Quantized Reasoning Models (2025) • No Venue
Liu et al.
New Trends For Modern Machine Translation With Large Reasoning Models (2025) • No Venue
Liu et al.
Openvision 2: A Family Of Generative Pretrained Visual Encoders For Multimodal Learning (2025) • No Venue
Liu et al.
Quadmix: Quality-diversity Balanced Data Selection For Efficient LLM Pretraining (2025) • No Venue
Liu et al.
Points-reader: Distillation-free Adaptation Of Vision-language Models For Document Conversion (2025) • No Venue
Liu et al.
Shifting AI Efficiency From Model-centric To Data-centric Compression (2025) • No Venue
Liu et al.
Rolling Forcing: Autoregressive Long Video Diffusion In Real Time (2025) • No Venue
Liu et al.
Region-adaptive Sampling For Diffusion Transformers (2025) • No Venue
Liu et al.
Sequential Diffusion Language Models (2025) • No Venue
Liu et al.
Synlogic: Synthesizing Verifiable Reasoning Data At Scale For Learning Logical Reasoning And Beyond (2025) • No Venue
Liu et al.
Understanding R1-zero-like Training: A Critical Perspective (2025) • No Venue
Liu et al.
Tidar: Think In Diffusion, Talk In Autoregression (2025) • No Venue
Liu et al.
Adacot: Pareto-optimal Adaptive Chain-of-thought Triggering Via Reinforcement Learning (2025) • No Venue
Lou et al.
UI-R1: Enhancing Action Prediction Of GUI Agents By Reinforcement Learning (2025) • No Venue
Lu et al.
Hyper-bagel: A Unified Acceleration Framework For Multimodal Understanding And Generation (2025) • No Venue
Lu et al.
Omnicaptioner: One Captioner To Rule Them All (2025) • No Venue
Lu et al.
Ultrahorizon: Benchmarking Agent Capabilities In Ultra Long-horizon Scenarios (2025) • No Venue
Luo et al.
O1-pruner: Length-harmonizing Fine-tuning For O1-like Reasoning Pruning (2025) • No Venue
Luo et al.
Technologies On Effectiveness And Efficiency: A Survey Of State Spaces Models (2025) • No Venue
Lv et al.
Autonomy-of-experts Models (2025) • No Venue
Lv et al.
Build The Web For Agents, Not Agents For The Web (2025) • No Venue
Lù et al.
Token-shuffle: Towards High-resolution Image Generation With Autoregressive Models (2025) • No Venue
Ma et al.
Bitnet B1.58 2B4T Technical Report (2025) • No Venue
Ma et al.
Inference-time Scaling For Diffusion Models Beyond Scaling Denoising Steps (2025) • No Venue
Ma et al.
Veomni: Scaling Any Modality Model Training With Model-centric Distributed Recipe Zoo (2025) • No Venue
Ma et al.
Unitok: A Unified Tokenizer For Visual Generation And Understanding (2025) • No Venue
Ma et al.
Scaling Analysis Of Interleaved Speech-text Language Models (2025) • No Venue
Maimon et al.
Slamming: Training A Speech Language Model On One GPU In A Day (2025) • No Venue
Gallil Maimon, Avishai Elmakies, Yossi Adi
Smolvlm: Redefining Small And Efficient Multimodal Models (2025) • No Venue
Marafioti et al.
Yume: An Interactive World Generation Model (2025) • No Venue
Mao et al.
Gemstones: A Model Suite For Multi-faceted Scaling Laws (2025) • No Venue
McLeish et al.
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems (2025) • No Venue
Meghwani et al.
Transmla: Multi-head Latent Attention Is All You Need (2025) • No Venue
Fanxu Meng, Zengwei Yao, Muhan Zhang
Mm-eureka: Exploring Visual Aha Moment With Rule-based Large-scale Reinforcement Learning (2025) • No Venue
Meng et al.
Holocine: Holistic Generation Of Cinematic Multi-shot Long Video Narratives (2025) • No Venue
Meng et al.
∇NABLA: Neighborhood Adaptive Block-level Attention (2025) • No Venue
Mikhailov et al.
Discrete Audio Tokens: More Than A Survey! (2025) • No Venue
Mousavi et al.
SINQ: Sinkhorn-normalized Quantization For Calibration-free Low-precision LLM Weights (2025) • No Venue
Müller et al.
Leveraging Self-attention For Input-dependent Soft Prompting In Llms (2025) • No Venue
Ananth Muppidi, Abhilash Nandy, Sambaran Bandyopadhyay
Scalable-softmax Is Superior For Attention (2025) • No Venue
Ken M. Nakanishi
Matryoshka Quantization (2025) • No Venue
Nair et al.
Adaptivocab: Enhancing LLM Efficiency In Focused Domains Through Lightweight Vocabulary Adaptation (2025) • No Venue
Nakash et al.
Attention Is All You Need For KV Cache In Diffusion Llms (2025) • No Venue
Quan Nguyen-Tri, Mukul Ranjan, Zhiqiang Shen
Drax: Speech Recognition With Discrete Flow Matching (2025) • No Venue
Navon et al.
Ruccod: Towards Automated ICD Coding In Russian (2025) • No Venue
Nesterov et al.
Semviqa: A Semantic Question Answering System For Vietnamese Information Fact-checking (2025) • No Venue
Nguyen et al.
Mineru2.5: A Decoupled Vision-language Model For Efficient High-resolution Document Parsing (2025) • No Venue
Niu et al.
Bielik V3 Small: Technical Report (2025) • No Venue
Ociepa et al.
Large Language Models Meet Extreme Multi-label Classification: Scaling And Multi-modal Framework (2025) • No Venue
Ortego et al.
Quest: Stable Training Of Llms With 1-bit Weights And Activations (2025) • No Venue
Panferov et al.
Outlier-safe Pre-training For Robust 4-bit Quantization Of Large Language Models (2025) • No Venue
Park et al.
A Survey On Inference Engines For Large Language Models: Perspectives On Optimization And Efficiency (2025) • No Venue
Park et al.
Mathfusion: Enhancing Mathematical Problem-solving Of LLM Through Instruction Fusion (2025) • No Venue
Pei et al.
Skywork R1V: Pioneering Multimodal Reasoning With Chain-of-thought (2025) • No Venue
Peng et al.
Vibevoice Technical Report (2025) • No Venue
Peng et al.
FAST: Efficient Action Tokenization For Vision-language-action Models (2025) • No Venue
Pertsch et al.
THOUGHTTERMINATOR: Benchmarking, Calibrating, And Mitigating Overthinking In Reasoning Models (2025) • No Venue
Pu et al.
Optimizing Anytime Reasoning Via Budget Relative Policy Optimization (2025) • No Venue
Qi et al.
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction (2025) • No Venue
Qian et al.
Lumina-image 2.0: A Unified And Efficient Image Generative Framework (2025) • No Venue
Qin et al.
Chain-of-visual-thought: Teaching Vlms To See And Think Better With Continuous Visual Tokens (2025) • No Venue
Qin et al.
Why Low-precision Transformer Training Fails: An Analysis On Flash Attention (2025) • No Venue
Haiquan Qiu, Quanming Yao
Saffron-1: Towards An Inference Scaling Paradigm For LLM Safety Assurance (2025) • No Venue
Qiu et al.
A Survey Of Efficient Reasoning For Large Reasoning Models: Language, Multimodality, And Beyond (2025) • No Venue
Qu et al.
Optimizing Test-time Compute Via Meta Reinforcement Fine-tuning (2025) • No Venue
Qu et al.
One Small Step In Latent, One Giant Leap For Pixels: Fast Latent Upscale Adapter For Your Diffusion Models (2025) • No Venue
Aleksandr Razin, Danil Kazantsev, Ilya Makarov
Visual Autoregressive Models Beat Diffusion Models On Inference Time Scaling (2025) • No Venue
Erik Riise, Mehmet Onurcan Kaya, Dim P. Papadopoulos
RL + Transformer = A General-purpose Problem Solver (2025) • No Venue
Micah Rentschler, Jesse Roberts
Vamba: Understanding Hour-long Videos With Hybrid Mamba-transformers (2025) • No Venue
Ren et al.
Hogwild! Inference: Parallel LLM Generation Via Concurrent Attention (2025) • No Venue
Rodionov et al.
Fast And Simplex: 2-simplicial Attention In Triton (2025) • No Venue
Roy et al.
Attention Sinks In Diffusion Language Models (2025) • No Venue
Rulli et al.
Dota-rag: Dynamic Of Thought Aggregation RAG (2025) • No Venue
Ruangtanusak et al.
The Diffusion Duality (2025) • No Venue
Sahoo et al.
Antidistillation Sampling (2025) • No Venue
Savani et al.
Agent Laboratory: Using LLM Agents As Research Assistants (2025) • No Venue
Schmidgall et al.
Quickvideo: Real-time Long Video Understanding With System Algorithm Co-design (2025) • No Venue
Schneider et al.
Seaweed-7b: Cost-effective Training Of Video Generation Foundation Model (2025) • No Venue
Seawead et al.
Seedream 4.0: Toward Next-generation Multimodal Image Generation (2025) • No Venue
Seedream et al.
Skrr: Skip And Re-use Text Encoder Layers For Memory Efficient Text-to-image Generation (2025) • No Venue
Seo et al.
Efficient Personalization Of Quantized Diffusion Model Without Backpropagation (2025) • No Venue
Seo et al.
Longrope2: Near-lossless LLM Context Window Scaling (2025) • No Venue
Shang et al.
Core²: Collect, Reflect And Refine To Generate Better And Faster (2025) • No Venue
Shao et al.
Continuous Autoregressive Language Models (2025) • No Venue
Shao et al.
When Tokens Talk Too Much: A Survey Of Multimodal Long-context Token Compression Across Images, Videos, And Audios (2025) • No Venue
Shao et al.
Holitom: Holistic Token Merging For Fast Video Large Language Models (2025) • No Venue
Shao et al.
Qwenlong-cprs: Towards ∞-llms With Dynamic Context Optimization (2025) • No Venue
Shen et al.
SSA: Sparse Sparse Attention By Aligning Full And Sparse Attention Outputs In Feature Space (2025) • No Venue
Shen et al.
Scaling Vision Pre-training To 4K Resolution (2025) • No Venue
Shi et al.
Mavors: Multi-granularity Video Representation For Multimodal Large Language Model (2025) • No Venue
Shi et al.
Longcodezip: Compress Long Context For Code Language Models (2025) • No Venue
Shi et al.
Llmvox: Autoregressive Streaming Text-to-speech Model For Any LLM (2025) • No Venue
Shikhar et al.
COSPADI: Compressing Llms Via Calibration-guided Sparse Dictionary Learning (2025) • No Venue
Shopkhoev et al.
Liteattention: A Temporal Sparse Attention For Diffusion Transformers (2025) • No Venue
Shmilovich et al.
Replaceme: Network Simplification Via Layer Pruning And Linear Transformations (2025) • No Venue
Shopkhoev et al.
Scaling Laws For Optimal Data Mixtures (2025) • No Venue
Shukor et al.
Smolvla: A Vision-language-action Model For Affordable And Efficient Robotics (2025) • No Venue
Shukor et al.
Predictive Data Selection: The Data That Predicts Is The Data That Teaches (2025) • No Venue
Shum et al.
Diagonal Batching Unlocks Parallelism In Recurrent Memory Transformers For Long Contexts (2025) • No Venue
Sivtsov et al.
MADD: Multi-agent Drug Discovery Orchestra (2025) • No Venue
Solovev et al.
Smallthinker: A Family Of Efficient Large Language Models Natively Trained For Local Deployment (2025) • No Venue
Song et al.
Chain-of-model Learning For Language Model (2025) • No Venue
Song et al.
Seed Diffusion: A Large-scale Diffusion Language Model With High-speed Inference (2025) • No Venue
Song et al.
DMM: Building A Versatile Image Generation Model Via Distillation-based Model Merging (2025) • No Venue
Song et al.
Stabletoken: A Noise-robust Semantic Speech Tokenizer For Resilient Speechllms (2025) • No Venue
Song et al.
Xquant: Breaking The Memory Wall For LLM Inference With KV Cache Rematerialization (2025) • No Venue
Tomar et al.
Toolorchestra: Elevating Intelligence Via Efficient Model And Tool Orchestration (2025) • No Venue
Su et al.
Klear-reasoner: Advancing Reasoning Capability Via Gradient-preserving Clipping Policy Optimization (2025) • No Venue
Su et al.
Learn-by-interact: A Data-centric Framework For Self-adaptive Agents In Realistic Environments (2025) • No Venue
Su et al.
Stop Overthinking: A Survey On Efficient Reasoning For Large Language Models (2025) • No Venue
Sui et al.
LASP-2: Rethinking Sequence Parallelism For Linear Attention And Its Hybrid (2025) • No Venue
Sun et al.
Transformer²: Self-adaptive Llms (2025) • No Venue
Qi Sun, Edoardo Cetin, Yujin Tang
Speed Always Wins: A Survey On Efficient Architectures For Large Language Models (2025) • No Venue
Sun et al.
DINGO: Constrained Inference For Diffusion Llms (2025) • No Venue
Suresh et al.
LLM Pretraining With Continuous Concepts (2025) • No Venue
Tack et al.
Nemotron Elastic: Towards Efficient Many-in-one Reasoning Llms (2025) • No Venue
Taghibakhshi et al.
Lumine: An Open Recipe For Building Generalist Agents In 3D Open Worlds (2025) • No Venue
Tan et al.
Agent KB: Leveraging Cross-domain Experience For Agentic Problem Solving (2025) • No Venue
Tang et al.
Large Language Models For Data Synthesis (2025) • No Venue
Yihong Tang, Menglin Kong, Lijun Sun
Plug-and-play 1.x-bit KV Cache Quantization For Video Large Language Models (2025) • No Venue
Tao et al.
Webleaper: Empowering Efficiency And Efficacy In Webagent Via Enabling Info-rich Seeking (2025) • No Venue
Tao et al.
Gemma 3 Technical Report (2025) • No Venue
Team et al.
COIG-P: A High-quality And Large-scale Chinese Preference Dataset For Alignment With Human Values (2025) • No Venue
Team et al.
Every Step Evolves: Scaling Reinforcement Learning For Trillion-scale Thinking Model (2025) • No Venue
Team et al.
Gigabrain-0: A World Model-powered Vision-language-action Model (2025) • No Venue
Team et al.
Kanana: Compute-efficient Bilingual Language Models (2025) • No Venue
Team et al.
Minicpm4: Ultra-efficient Llms On End Devices (2025) • No Venue
Team et al.
Longcat-flash-omni Technical Report (2025) • No Venue
Team et al.
Nextstep-1: Toward Autoregressive Image Generation With Continuous Tokens At Scale (2025) • No Venue
Team et al.
Fixing Data That Hurts Performance: Cascading Llms To Relabel Hard Negatives For Robust Information Retrieval (2025) • No Venue
Thakur et al.
Think Twice: Enhancing LLM Reasoning By Scaling Multi-round Test-time Thinking (2025) • No Venue
Tian et al.
Step-audio-r1 Technical Report (2025) • No Venue
Tian et al.
Moba: A Two-level Agent System For Efficient Mobile Task Automation (2024) • No Venue
Zhu et al.
Masked Audio Generation Using A Single Non-autoregressive Transformer (2024) • No Venue
Ziv et al.
Fastvlm: Efficient Vision Encoding For Vision Language Models (2024) • No Venue
Vasu et al.
Tnt-llm: Text Mining At Scale With Large Language Models (2024) • No Venue
Wan et al.
Star Attention: Efficient LLM Inference Over Long Sequences (2024) • No Venue
Shantanu Acharya, Fei Jia, Boris Ginsburg
Burstattention: An Efficient Distributed Attention Framework For Extremely Long Sequences (2024) • No Venue
Ao et al.
Perplexed By Perplexity: Perplexity-based Data Pruning With Small Reference Models (2024) • No Venue
Ankner et al.
Chronos: Learning The Language Of Time Series (2024) • No Venue
Ansari et al.
PALP: Prompt Aligned Personalization Of Text-to-image Models (2024) • No Venue
Arar et al.
Simple Linear Attention Language Models Balance The Recall-throughput Tradeoff (2024) • No Venue
Arora et al.
Aya 23: Open Weight Releases To Further Multilingual Progress (2024) • No Venue
Aryabumi et al.
Slicegpt: Compress Large Language Models By Deleting Rows And Columns (2024) • No Venue
Ashkboos et al.
Meissonic: Revitalizing Masked Generative Transformers For Efficient High-resolution Text-to-image Synthesis (2024) • No Venue
Bai et al.
GPTVQ: The Blessing Of Dimensionality For LLM Quantization (2024) • No Venue
Baalen et al.
Multi-agent Collaborative Data Selection For Efficient LLM Pretraining (2024) • No Venue
Bai et al.
LLM Augmented Llms: Expanding Capabilities Through Composition (2024) • No Venue
Bansal et al.
Llm2vec: Large Language Models Are Secretly Powerful Text Encoders (2024) • No Venue
Behnamghader et al.
SUTRA: Scalable Multilingual Language Model Architecture (2024) • No Venue
Bendale et al.
Qalam: A Multimodal LLM For Arabic Optical Character And Handwriting Recognition (2024) • No Venue
Bhatia et al.
Speculative Streaming: Fast LLM Inference Without Auxiliary Models (2024) • No Venue
Bhendawade et al.
INDUS: Effective And Efficient Language Models For Scientific Applications (2024) • No Venue
Bhattacharjee et al.
Lora Learns Less And Forgets Less (2024) • No Venue
Biderman et al.
Merlin: A Vision Language Foundation Model For 3D Computed Tomography (2024) • Arxiv • 45 citations
Blankemeier et al.
Recurrentgemma: Moving Past Transformers For Efficient Open Language Models (2024) • No Venue
Botev et al.
Reducing Transformer Key-value Cache Size With Cross-layer Attention (2024) • No Venue
Brandon et al.
Matryoshka Multimodal Models (2024) • No Venue
Cai et al.
Medusa: Simple LLM Inference Acceleration Framework With Multiple Decoding Heads (2024) • No Venue
Cai et al.
Compassjudger-1: All-in-one Judge Model Helps Model Evaluation And Evolution (2024) • No Venue
Cao et al.
Edgefusion: On-device Text-to-image Generation (2024) • No Venue
Castells et al.
Lyra: An Efficient And Speech-centric Framework For Omni-cognition (2024) • No Venue
Zhong et al.
Web Agents With World Models: Learning And Leveraging Environment Dynamics In Web Navigation (2024) • No Venue
Chae et al.
Language Models As Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning In Language Models (2024) • No Venue
Chae et al.
Dolphin: Long Context As A New Modality For Energy-efficient On-device Language Models (2024) • No Venue
Chen et al.
BGE M3-embedding: Multi-lingual, Multi-functionality, Multi-granularity Text Embeddings Through Self-knowledge Distillation (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 221 citations
Chen et al.
Do NOT Think That Much For 2+3=? On The Overthinking Of O1-like Llms (2024) • No Venue
Chen et al.
Prefixquant: Static Quantization Beats Dynamic Through Prefixed Outliers In Llms (2024) • No Venue
Chen et al.
Mindsearch: Mimicking Human Minds Elicits Deep AI Searcher (2024) • No Venue
Chen et al.
An Image Is Worth 1/2 Tokens After Layer 2: Plug-and-play Inference Acceleration For Large Vision-language Models (2024) • No Venue
Chen et al.
F5-TTS: A Fairytaler That Fakes Fluent And Faithful Speech With Flow Matching (2024) • No Venue
Chen et al.
EVLM: An Efficient Vision-language Model For Visual Understanding (2024) • No Venue
Chen et al.
Meshanything V2: Artist-created Mesh Generation With Adjacent Mesh Tokenization (2024) • No Venue
Chen et al.
Pixart-δ: Fast And Controllable Image Generation With Latent Consistency Models (2024) • No Venue
Chen et al.
Moto: Latent Motion Token As The Bridging Language For Robot Manipulation (2024) • No Venue
Chen et al.
Octo-planner: On-device Language Model For Planner-action Agents (2024) • No Venue
Chen et al.
Videollm-online: Online Video Large Language Model For Streaming Video (2024) • No Venue
Chen et al.
Reverse Thinking Makes Llms Stronger Reasoners (2024) • No Venue
Chen et al.
Shvit: Single-head Vision Transformer With Memory Efficient Macro Design (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Seokju Yun, Youngmin Ro
AIM: Adaptive Inference Of Multi-modal Llms Via Token Merging And Pruning (2024) • No Venue
Zhong et al.
Ditfastattn: Attention Compression For Diffusion Transformer Models (2024) • No Venue
Yuan et al.
Open-vocabulary SAM: Segment And Recognize Twenty-thousand Classes Interactively (2024) • No Venue
Yuan et al.
Cogview3: Finer And Faster Text-to-image Generation Via Relay Diffusion (2024) • No Venue
Zheng et al.
An Image Is Worth 32 Tokens For Reconstruction And Generation (2024) • No Venue
Yu et al.
Openresearcher: Unleashing AI For Accelerated Scientific Research (2024) • No Venue
Zheng et al.
Compressed Chain Of Thought: Efficient Reasoning Through Dense Representations (2024) • No Venue
Jeffrey Cheng, Benjamin van Durme
Breaking The Memory Barrier: Near Infinite Batch Size Scaling For Contrastive Loss (2024) • No Venue
Cheng et al.
Llamafactory: Unified Efficient Fine-tuning Of 100+ Language Models (2024) • No Venue
Zheng et al.
Yolo-world: Real-time Open-vocabulary Object Detection (2024) • No Venue
Cheng et al.
Efficiently Democratizing Medical Llms For 50 Languages Via A Mixture Of Language Family Experts (2024) • No Venue
Zheng et al.
Boosting Continual Learning Of Vision-language Models Via Mixture-of-experts Adapters (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Yu et al.
M-longdoc: A Benchmark For Multimodal Super-long Document Understanding And A Retrieval-aware Tuning Framework (2024) • No Venue
Chia et al.
Heavy Labels Out! Dataset Distillation With Label Space Lightening (2024) • No Venue
Yu et al.
Jacolbertv2.5: Optimising Multi-vector Retrievers To Create State-of-the-art Japanese Retrievers With Constrained Resources (2024) • No Venue
Benjamin Clavié
Scalable High-resolution Pixel-space Image Synthesis With Hourglass Diffusion Transformers (2024) • No Venue
Crowson et al.
NVLM: Open Frontier-class Multimodal Llms (2024) • No Venue
Dai et al.
ORGANA: A Robotic Assistant For Automated Chemistry Experimentation And Characterization (2024) • Matter • 52 citations
Darvish et al.
Transformers Are Ssms: Generalized Models And Efficient Algorithms Through Structured State Space Duality (2024) • No Venue
Tri Dao, Albert Gu
Larimar: Large Language Models With Episodic Memory Control (2024) • No Venue
Das et al.
Griffin: Mixing Gated Linear Recurrences With Local Attention For Efficient Language Models (2024) • No Venue
De et al.
A Simple And Effective L2 Norm-based Strategy For KV Cache Compression (2024) • No Venue
Devoto et al.
Unveiling Encoder-free Vision-language Models (2024) • No Venue
Diao et al.
Hymba: A Hybrid-head Architecture For Small Language Models (2024) • No Venue
Dong et al.
FAN: Fourier Analysis Networks (2024) • No Venue
Dong et al.
Animatelcm: Accelerating The Animation Of Personalized Diffusion Models And Adapters With Decoupled Consistency Learning (2024) • No Venue
Wang et al.
Bitstack: Fine-grained Size Control For Compressed Large Language Models In Variable Memory Environments (2024) • No Venue
Wang et al.
Bitnet A4.8: 4-bit Activations For 1-bit Llms (2024) • No Venue
Hongyu Wang, Shuming Ma, Furu Wei
Guide-and-rescale: Self-guidance Mechanism For Effective Tuning-free Real Image Editing (2024) • No Venue
Titov et al.
Visual Autoregressive Modeling: Scalable Image Generation Via Next-scale Prediction (2024) • No Venue
Tian et al.
Mobillama: Towards Accurate And Lightweight Fully Transparent GPT (2024) • No Venue
Thawakar et al.
Multilingual E5 Text Embeddings: A Technical Report (2024) • No Venue
Wang et al.
Jamba-1.5: Hybrid Transformer-mamba Models At Scale (2024) • No Venue
Team et al.
Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies (2024) • No Venue
Tao et al.
Mambabyte: Token-free Selective State Space Model (2024) • No Venue
Wang et al.
Longllava: Scaling Multi-modal Llms To 1000 Images Efficiently Via Hybrid Architecture (2024) • No Venue
Wang et al.
The Mamba In The Llama: Distilling And Accelerating Hybrid Models (2024) • No Venue
Wang et al.
Grutopia: Dream General Robots In A City At Scale (2024) • No Venue
Wang et al.
Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey To The Edge Of Generalization (2024) • No Venue
Wang et al.
Let The Expert Stick To His Last: Expert-specialized Fine-tuning For Sparse Architectural Large Language Models (2024) • No Venue
Wang et al.
Is Mamba Effective For Time Series Forecasting? (2024) • Neurocomputing • 84 citations
Wang et al.
How Do Your Code Llms Perform? Empowering Code Instruction Tuning With High-quality Data (2024) • No Venue
Wang et al.
Litesearch: Efficacious Tree Search For LLM (2024) • No Venue
Wang et al.
MOSAIC: A Modular System For Assistive And Interactive Cooking (2024) • No Venue
Wang et al.
Utilizing Local Hierarchy With Adversarial Training For Hierarchical Text Classification (2024) • ACM Computing Surveys • 58 citations
Zihan Wang, Peiyi Wang, Houfeng Wang
Falcon Mamba: The First Competitive Attention-free 7B Language Model (2024) • No Venue
Zuo et al.
Structlm: Towards Building Generalist Models For Structured Knowledge Grounding (2024) • No Venue
Zhuang et al.
Bitsfusion: 1.99 Bits Weight Quantization Of Diffusion Model (2024) • No Venue
Sui et al.
Branch-train-mix: Mixing Expert Llms Into A Mixture-of-experts LLM (2024) • No Venue
Sukhbaatar et al.
Hunyuan-large: An Open-source Moe Model With 52 Billion Activated Parameters By Tencent (2024) • No Venue
Sun et al.
Os-genesis: Automating GUI Agent Trajectory Construction Via Reverse Task Synthesis (2024) • No Venue
Sun et al.
Videollamb: Long-context Video Understanding With Recurrent Memory Bridges (2024) • No Venue
Wang et al.
Yolov9: Learning What You Want To Learn Using Programmable Gradient Information (2024) • No Venue
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
Yolov10: Real-time End-to-end Object Detection (2024) • Arxiv • 950 citations
Wang et al.
Videoagent: Long-form Video Understanding With Large Language Model As Agent (2024) • No Venue
Wang et al.
Qwen2-vl: Enhancing Vision-language Model's Perception Of The World At Any Resolution (2024) • No Venue
Wang et al.
Searching For Best Practices In Retrieval-augmented Generation (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 44 citations
Wang et al.
Self-training With Direct Preference Optimization Improves Chain-of-thought Reasoning (2024) • No Venue
Tianduo Wang, Shichen Li, Wei Lu
Q-sparse: All Large Language Models Can Be Fully Sparsely-activated (2024) • No Venue
Wang et al.
Htmlrag: HTML Is Better Than Plain Text For Modeling Retrieved Knowledge In RAG Systems (2024) • No Venue
Tan et al.
Strokenuwa: Tokenizing Strokes For Vector Graphic Synthesis (2024) • No Venue
Tang et al.
LOGO -- Long Context Alignment Via Efficient Preference Optimization (2024) • No Venue
Tang et al.
Ominicontrol: Minimal And Universal Control For Diffusion Transformer (2024) • No Venue
Tan et al.
Lloco: Learning Long Contexts Offline (2024) • No Venue
Tan et al.
Pre-training Small Base Lms With Fewer Tokens (2024) • No Venue
Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
Writing In The Margins: Better Inference Pattern For Long Context Retrieval (2024) • No Venue
Russak et al.
How To Train Data-efficient Llms (2024) • No Venue
Sachdeva et al.
Litevae: Lightweight And Efficient Variational Autoencoders For Latent Diffusion Models (2024) • No Venue
Sadat et al.
Scaling Smart: Accelerating Large Language Model Pre-training With Small Model Initialization (2024) • No Venue
Samragh et al.
Llama-nas: Efficient Neural Architecture Search For Large Language Models (2024) • No Venue
Sarah et al.
Fast High-resolution Image Synthesis With Latent Adversarial Diffusion Distillation (2024) • No Venue
Sauer et al.
A Large Recurrent Action Model: Xlstm Enables Fast Inference For Robotics Tasks (2024) • No Venue
Schmied et al.
BOND: Aligning Llms With Best-of-n Distillation (2024) • No Venue
Sessa et al.
Synth^2: Boosting Visual-language Models With Synthetic Captions And Image Embeddings (2024) • No Venue
Sharifzadeh et al.
Imp: Highly Capable Large Multimodal Models For Mobile Devices (2024) • No Venue
Shao et al.
Scaling Retrieval-based Language Models With A Trillion-token Datastore (2024) • No Venue
Shao et al.
GLEE: A Unified Framework And Benchmark For Language-based Economic Environments (2024) • No Venue
Shapira et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts At Scale (2024) • No Venue
Zhou et al.
Scaling Laws For Linear Complexity Language Models (2024) • No Venue
Shen et al.
Jetmoe: Reaching Llama2 Performance With 0.1M Dollars (2024) • No Venue
Shen et al.
Transfusion: Predict The Next Token And Diffuse Images With One Multi-modal Model (2024) • No Venue
Zhou et al.
Mamba-in-mamba: Centralized Mamba-cross-scan In Tokenized Mamba Model For Hyperspectral Image Classification (2024) • Neurocomputing • 60 citations
Zhou et al.
Lumos: Empowering Multimodal Llms With Scene Text Recognition (2024) • No Venue
Shenoy et al.
Explicit Visual Prompts For Visual Object Tracking (2024) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Shi et al.
Discovering The Gems In Early Layers: Accelerating Long-context Llms With 1000x Input Token Reduction (2024) • No Venue
Shi et al.
VSSD: Vision Mamba With Non-causal State Space Duality (2024) • No Venue
Shi et al.
When Do We Not Need Larger Vision Models? (2024) • No Venue
Shi et al.
APOLLO: Sgd-like Memory, Adamw-level Performance (2024) • No Venue
Zhu et al.
Llava-mod: Making Llava Tiny Via Moe Knowledge Distillation (2024) • No Venue
Shu et al.
Scaling LLM Test-time Compute Optimally Can Be More Effective Than Scaling Model Parameters (2024) • No Venue
Snell et al.
Turbo Sparse: Achieving LLM SOTA Performance With Minimal Activated Parameters (2024) • No Venue
Song et al.
LLM Pruning And Distillation In Practice: The Minitron Approach (2024) • No Venue
Sreenivas et al.
Distilled Decoding 1: One-step Sampling Of Image Auto-regressive Models With Flow Matching (2024) • No Venue
Liu et al.
Bitdelta: Your Fine-tune May Only Be Worth One Bit (2024) • No Venue
Liu et al.
Deliberation In Latent Space Via Differentiable Cache Augmentation (2024) • No Venue
Liu et al.
CLEAR: Conv-like Linearization Revs Pre-trained Diffusion Transformers Up (2024) • No Venue
Songhua Liu, Zhenxiong Tan, Xinchao Wang
DDK: Distilling Domain Knowledge For Efficient Large Language Models (2024) • No Venue
Liu et al.
Regmix: Data Mixture As Regression For Language Model Pre-training (2024) • No Venue
Liu et al.
Lightweight Deep Learning For Resource-constrained Environments: A Survey (2024) • ACM Computing Surveys • 100 citations
Liu et al.
KAN: Kolmogorov-arnold Networks (2024) • No Venue
Liu et al.
Kangaroo: Lossless Self-speculative Decoding Via Double Early Exiting (2024) • No Venue
Liu et al.
Linrec: Linear Attention Mechanism For Long-term Sequential Recommender Systems (2024) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Liu et al.
NVILA: Efficient Frontier Visual Language Models (2024) • No Venue
Liu et al.
Magicquill: An Intelligent Interactive Image Editing System (2024) • No Venue
Liu et al.
Mobilellm: Optimizing Sub-billion Parameter Language Models For On-device Use Cases (2024) • No Venue
Liu et al.
Oryx MLLM: On-demand Spatial-temporal Understanding At Arbitrary Resolution (2024) • No Venue
Liu et al.
Tuning Language Models By Proxy (2024) • No Venue
Liu et al.
Rscama: Remote Sensing Image Change Captioning With State Space Model (2024) • IEEE Geoscience and Remote Sensing Letters • 59 citations
Liu et al.
Retrievalattention: Accelerating Long-context LLM Inference Via Vector Retrieval (2024) • No Venue
Liu et al.
Surgical SAM 2: Real-time Segment Anything In Surgical Video By Efficient Frame Pruning (2024) • No Venue
Liu et al.
Understanding Llms: A Comprehensive Overview From Training To Inference (2024) • No Venue
Liu et al.
VPTQ: Extreme Low-bit Vector Post-training Quantization For Large Language Models (2024) • No Venue
Liu et al.
Configurable Foundation Models: Building Llms From A Modular Perspective (2024) • No Venue
Xiao et al.
FP6-LLM: Efficiently Serving Large Language Models Through Fp6-centric Algorithm-system Co-design (2024) • No Venue
Xia et al.
Vit-comer: Vision Transformer With Convolutional Multi-scale Feature Interaction For Dense Predictions (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 80 citations
Xia et al.
Tinyllama: An Open-source Small Language Model (2024) • No Venue
Zhang et al.
Llama-berry: Pairwise Optimization For O1-like Olympiad-level Mathematical Reasoning (2024) • No Venue
Zhang et al.
Sageattention: Accurate 8-bit Attention For Plug-and-play Inference Acceleration (2024) • No Venue
Zhang et al.
Onegen: Efficient One-pass Unified Generation And Retrieval For Llms (2024) • No Venue
Zhang et al.
Recurrent Drafter For Fast Speculative Decoding In Large Language Models (2024) • No Venue
Zhang et al.
POA: Pre-training Once For Models Of All Sizes (2024) • No Venue
Zhang et al.
Q-galore: Quantized Galore With INT4 Projection And Layer-adaptive Low-rank Gradients (2024) • No Venue
Zhang et al.
Sageattention2 Technical Report: Accurate 4 Bit Attention For Plug-and-play Inference Acceleration (2024) • No Venue
Zhang et al.
SF-V: Single Forward Video Generation Model (2024) • No Venue
Zhang et al.
SPAR: Personalized Content-based Recommendation Via Long Engagement Attention (2024) • No Venue
Zhang et al.
Deepseek-vl: Towards Real-world Vision-language Understanding (2024) • No Venue
Lu et al.
Bluelm-v-3b: Algorithm And System Co-design For Multimodal Large Language Models On Mobile Devices (2024) • No Venue
Lu et al.
Blending Is All You Need: Cheaper, Better Alternative To Trillion-parameters LLM (2024) • No Venue
Lu et al.
Soaring From 4K To 400K: Extending Llm's Context With Activation Beacon (2024) • No Venue
Zhang et al.
Divide-or-conquer? Which Part Should You Distill Your LLM? (2024) • No Venue
Wu et al.
Meta-chunking: Learning Efficient Text Segmentation Via Logical Perception (2024) • No Venue
Zhao et al.
Reft: Representation Finetuning For Language Models (2024) • No Venue
Wu et al.
Semievol: Semi-supervised Fine-tuning For LLM Adaptation (2024) • No Venue
Luo et al.
Addition Is All You Need For Energy-efficient Language Models (2024) • No Venue
Hongyin Luo, Wei Sun
Improve Mathematical Reasoning In Language Models By Automated Process Supervision (2024) • No Venue
Luo et al.
Galore: Memory-efficient LLM Training By Gradient Low-rank Projection (2024) • No Venue
Zhao et al.
Cobra: Extending Mamba To Multi-modal Large Language Model For Efficient Inference (2024) • No Venue
Zhao et al.
Megalodon: Efficient LLM Pretraining And Inference With Unlimited Context Length (2024) • No Venue
Ma et al.
Weblinx: Real-world Website Navigation With Multi-turn Dialogue (2024) • No Venue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
A Survey To Recent Progress Towards Understanding In-context Learning (2024) • Frontiers of Computer Science • 40 citations
Mao et al.
Fastercache: Training-free Video Diffusion Model Acceleration With High Quality (2024) • No Venue
Lv et al.
Yuan 2.0-M32: Mixture Of Experts With Attention Router (2024) • No Venue
Wu et al.
SCOPE: Optimizing Key-value Cache Compression In Long-context Generation (2024) • No Venue
Wu et al.
Llmparser: An Exploratory Study On Using Large Language Models For Log Parsing (2024) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 49 citations
Ma et al.
The Era Of 1-bit Llms: All Large Language Models Are In 1.58 Bits (2024) • No Venue
Ma et al.
GEB-1.3B: Open Lightweight Large Language Model (2024) • No Venue
Wu et al.
Layer-condensed KV Cache For Efficient Inference Of Large Language Models (2024) • No Venue
Haoyi Wu, Kewei Tu
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling (2024) • No Venue
Maini et al.
Diffusekrona: A Parameter Efficient Fine-tuning Method For Personalized Diffusion Model (2024) • No Venue
Marjit et al.
Openelm: An Efficient Language Model Family With Open-source Training And Inference Framework (2024) • No Venue
Mehta et al.
Bigger Is Not Always Better: Scaling Properties Of Latent Diffusion Models (2024) • No Venue
Mei et al.
LLM Agent Operating System (2024) • No Venue
Mei et al.
Snap Video: Scaled Spatiotemporal Transformers For Text-to-video Synthesis (2024) • No Venue
Menapace et al.
Shortgpt: Layers In Large Language Models Are More Redundant Than You Expect (2024) • No Venue
Men et al.
Cut Your Losses In Large-vocabulary Language Models (2024) • No Venue
Wijmans et al.
Contextual Document Embeddings (2024) • No Venue
John X. Morris, Alexander M. Rush
Olmoe: Open Mixture-of-experts Language Models (2024) • No Venue
Muennighoff et al.
Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) • No Venue
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Compact Language Models Via Pruning And Knowledge Distillation (2024) • No Venue
Muralidharan et al.
A Survey Of Small Language Models (2024) • No Venue
Nguyen et al.
Swiftedit: Lightning Fast Text-guided Image Editing Via One-step Diffusion (2024) • No Venue
Nguyen et al.
DITTO: Diffusion Inference-time T-optimization For Music Generation (2024) • No Venue
Novack et al.
User-llm: Efficient LLM Contextualization With User Embeddings (2024) • No Venue
Ning et al.
Beyond Scaling Laws: Understanding Transformer Performance With Associative Memory (2024) • No Venue
Niu et al.
Towards Modular Llms By Building And Reusing A Library Of Loras (2024) • No Venue
Ostapenko et al.
Transformers Are Multi-state Rnns (2024) • No Venue
Oren et al.
H2o-danube3 Technical Report (2024) • No Venue
Pfeiffer et al.
Byte Latent Transformer: Patches Scale Better Than Tokens (2024) • No Venue
Pagnoni et al.
Diffusion Augmented Agents: A Framework For Efficient Exploration And Transfer Learning (2024) • No Venue
Di Palo et al.
Llmlingua-2: Data Distillation For Efficient And Faithful Task-agnostic Prompt Compression (2024) • No Venue
Pan et al.
Llmtimesmapreduce: Simplified Long-sequence Processing Using Large Language Models (2024) • No Venue
Zhou et al.
Controlnext: Powerful And Efficient Control For Image And Video Generation (2024) • No Venue
Peng et al.
Eagle And Finch: RWKV With Matrix-valued States And Dynamic Recurrence (2024) • No Venue
Peng et al.
Moe-mamba: Efficient Selective State Space Models With Mixture Of Experts (2024) • No Venue
Pióro et al.
Movie Gen: A Cast Of Media Foundation Models (2024) • No Venue
Polyak et al.
Mutual Reasoning Makes Smaller Llms Stronger Problem-solvers (2024) • No Venue
Qi et al.
Code-as-monitor: Constraint-aware Visual Programming For Reactive And Proactive Robotic Failure Detection (2024) • No Venue
Zhou et al.
HGRN2: Gated Linear Rnns With State Expansion (2024) • No Venue
Qin et al.
Lightning Attention-2: A Free Lunch For Handling Unlimited Sequence Lengths In Large Language Models (2024) • No Venue
Qin et al.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder For Fast, Memory Efficient, And Long Context Finetuning And Inference (2024) • No Venue
Warner et al.
Layerwise Recurrent Router For Mixture-of-experts (2024) • No Venue
Qiu et al.
Self-discover: Large Language Models Self-compose Reasoning Structures (2024) • No Venue
Zhou et al.
Xmodel-2 Technical Report (2024) • No Venue
Qun et al.
2BP: 2-stage Backpropagation (2024) • No Venue
Christopher Rae, Joseph K. L. Lee, James Richings
Editable Scene Simulation For Autonomous Driving Via Collaborative Llm-agents (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Wei et al.
Evaluating D-MERIT Of Partial-annotation On Information Retrieval (2024) • No Venue
Rassin et al.
Mixture-of-depths: Dynamically Allocating Compute In Transformer-based Language Models (2024) • No Venue
Raposo et al.
Trans-tokenization And Cross-lingual Vocabulary Transfers: Language Adaptation Of Llms For Low-resource NLP (2024) • No Venue
Remy et al.
Samba: Simple Hybrid State Space Models For Efficient Unlimited Context Language Modeling (2024) • No Venue
Ren et al.
Stacking Your Transformers: A Closer Look At Model Growth For Efficient LLM Pre-training (2024) • No Venue
Du et al.
Efficient Exploration For Llms (2024) • No Venue
Dwaracherla et al.
Learning To Move Like Professional Counter-strike Players (2024) • No Venue
Durst et al.
Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024) • No Venue
Yang et al.
Balancing Pipeline Parallelism With Vocabulary Parallelism (2024) • No Venue
Yeung et al.
Geometry Image Diffusion: Fast And Data-efficient Text-to-3d With Image-based Surface Representation (2024) • No Venue
Slava Elizarov, Ciara Rowles, Simon Donné
Layerskip: Enabling Early Exit Inference And Self-speculative Decoding (2024) • No Venue
Elhoushi et al.
Llama-omni: Seamless Speech Interaction With Large Language Models (2024) • No Venue
Fang et al.
Maskllm: Learnable Semi-structured Sparsity For Large Language Models (2024) • No Venue
Fang et al.
Fuzzcoder: Byte-level Fuzzing Test Via Large Language Model (2024) • No Venue
Yang et al.
Kolmogorov-arnold Transformer (2024) • No Venue
Xingyi Yang, Xinchao Wang
Mgte: Generalized Long-context Text Representation And Reranking Models For Multilingual Text Retrieval (2024) • No Venue
Zhang et al.
Towards Fast Multilingual LLM Inference: Speculative Decoding And Specialized Drafters (2024) • No Venue
Yi et al.
Voco-llama: Towards Vision Compression With Large Language Models (2024) • No Venue
Ye et al.
Flightllm: Efficient Large Language Model Inference With A Complete Mapping Flow On Fpgas (2024) • FPGA '24: The 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays • 78 citations
Zeng et al.
Natural Language Reinforcement Learning (2024) • No Venue
Feng et al.
Nnsight And NDIF: Democratizing Access To Foundation Model Internals (2024) • No Venue
Fiotto-Kaufman et al.
Human-like Episodic Memory For Infinite Context Llms (2024) • No Venue
Fountas et al.
Lazyllm: Dynamic Token Pruning For Efficient Long Context LLM Inference (2024) • No Venue
Fu et al.
Efficient LLM Scheduling By Learning To Rank (2024) • No Venue
Fu et al.
Efficiently Serving LLM Reasoning Programs With Certaindex (2024) • No Venue
Fu et al.
Rethinking Patch Dependence For Masked Autoencoders (2024) • No Venue
Fu et al.
Chunkattention: Efficient Self-attention With Prefix-aware KV Cache And Two-phase Partition (2024) • No Venue
Ye et al.
WALL-E: World Alignment By Rule Learning Improves World Model-based LLM Agents (2024) • No Venue
Zhou et al.
Aligning Diffusion Models With Noise-conditioned Perception (2024) • No Venue
Gambashidze et al.
Efficient Tool Use With Chain-of-abstraction Reasoning (2024) • No Venue
Gao et al.
Flashspeech: Efficient Zero-shot Speech Synthesis (2024) • No Venue
Ye et al.
Seerattention: Learning Intrinsic Sparse Attention In Your Llms (2024) • No Venue
Gao et al.
Minicpm-v: A GPT-4V Level MLLM On Your Phone (2024) • No Venue
Yao et al.
Convllava: Hierarchical Backbones As Visual Encoder For Large Multimodal Models (2024) • No Venue
Ge et al.
Towards Flexible Perception With Visual Memory (2024) • No Venue
Geirhos et al.
Gemini 1.5: Unlocking Multimodal Understanding Across Millions Of Tokens Of Context (2024) • Arxiv • 253 citations
Team et al.
Goldfinch: High Performance Rwkv/transformer Hybrid With Linear Pre-fill And Extreme Kv-cache Compression (2024) • No Venue
Goldstein et al.
Better & Faster Large Language Models Via Multi-token Prediction (2024) • No Venue
Gloeckle et al.
Zamba: A Compact 7B SSM Hybrid Model (2024) • No Venue
Glorioso et al.
Olmo: Accelerating The Science Of Language Models (2024) • No Venue
Groeneveld et al.
Specialized Language Models With Cheap Inference From Limited Domain Data (2024) • No Venue
Grangier et al.
The Unreasonable Ineffectiveness Of The Deeper Layers (2024) • No Venue
Gromov et al.
DART: Denoising Autoregressive Transformer For Scalable Text-to-image Generation (2024) • No Venue
Gu et al.
1.58-bit FLUX (2024) • No Venue
Yang et al.
Direct Language Model Alignment From Online AI Feedback (2024) • No Venue
Guo et al.
Efficient Continual Pre-training By Mitigating The Stability Gap (2024) • No Venue
Guo et al.
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss (2024) • No Venue
Gupta et al.
Ltx-video: Realtime Video Latent Diffusion (2024) • No Venue
Hacohen et al.
Token-budget-aware LLM Reasoning (2024) • No Venue
Han et al.
Rethinking Token Reduction In Mllms: Towards A Unified Paradigm For Training-free Acceleration (2024) • No Venue
Han et al.
Face Adapter For Pre-trained Diffusion Models With Fine-grained ID And Attribute Control (2024) • No Venue
Han et al.
JPEG-LM: Llms As Image Generators With Canonical Codec Representations (2024) • No Venue
Han et al.
Mambavision: A Hybrid Mamba-transformer Vision Backbone (2024) • No Venue
Ali Hatamizadeh, Jan Kautz
Inference Performance Optimization For Large Language Models On Cpus (2024) • No Venue
He et al.
Distill Visual Chart Reasoning Ability From Llms To Mllms (2024) • No Venue
He et al.
What Matters In Transformers? Not All Attention Is Needed (2024) • No Venue
He et al.
Visionzip: Longer Is Better But Not Necessary In Vision Language Models (2024) • No Venue
Yang et al.
Law Of Vision Representation In Mllms (2024) • No Venue
Yang et al.
Block Transformer: Global-to-local Language Modeling For Fast Inference (2024) • No Venue
Ho et al.
Algorithmic Progress In Language Models (2024) • No Venue
Ho et al.
Salience DETR: Enhancing Detection Transformer With Hierarchical Salience Filtering Refinement (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 59 citations
Hou et al.
No More Adam: Learning Rate Scaling At Initialization Is All You Need (2024) • No Venue
Xu et al.
Minicpm: Unveiling The Potential Of Small Language Models With Scalable Training Strategies (2024) • No Venue
Hu et al.
Longrecipe: Recipe For Efficient Long Context Generalization In Large Language Models (2024) • No Venue
Hu et al.
Exploring Model Kinship For Merging Large Language Models (2024) • No Venue
Hu et al.
Openrlhf: An Easy-to-use, Scalable And High-performance RLHF Framework (2024) • No Venue
Hu et al.
Snapgen: Taming High-resolution Text-to-image Models For Mobile Devices With Efficient Architectures And Training (2024) • No Venue
Hu et al.
Compression Represents Intelligence Linearly (2024) • No Venue
Huang et al.
Yulan-mini: An Open Data-efficient Language Model (2024) • No Venue
Hu et al.
Billm: Pushing The Limit Of Post-training Quantization For Llms (2024) • No Venue
Huang et al.
Autocrawler: A Progressive Understanding Web Agent For Web Crawler Generation (2024) • No Venue
Huang et al.
O1 Replication Journey -- Part 2: Surpassing O1-preview Through Simple Distillation, Big Progress Or Bitter Lesson? (2024) • No Venue
Huang et al.
How Good Are Low-bit Quantized Llama3 Models? An Empirical Study (2024) • No Venue
Huang et al.
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2024) • No Venue
Huang et al.
Mv-adapter: Multi-view Consistent Image Generation Made Easy (2024) • No Venue
Huang et al.
Mmevalpro: Calibrating Multimodal Benchmarks Towards Trustworthy And Efficient Evaluation (2024) • No Venue
Huang et al.
Piccolo2: General Text Embedding With Multi-task Hybrid Loss Training (2024) • No Venue
Huang et al.
Ultra-sparse Memory Network (2024) • No Venue
Huang et al.
Autocoderover: Autonomous Program Improvement (2024) • Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 54 citations
Zhang et al.
Affordance-based Robot Manipulation With Flow Matching (2024) • No Venue
Fan Zhang, Michael Gienger
Mixture Of Nested Experts: Adaptive Processing Of Visual Tokens (2024) • No Venue
Jain et al.
Adaptive-rag: Learning To Adapt Retrieval-augmented Large Language Models Through Question Complexity (2024) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 72 citations
Jeong et al.
Repeat After Me: Transformers Are Better Than State Space Models At Copying (2024) • No Venue
Jelassi et al.
Large Language Models For Uavs: Current State And Pathways To The Future (2024) • IEEE Open Journal of Vehicular Technology • 46 citations
Shumaila Javaid, Nasir Saeed, Bin He
Wavtokenizer: An Efficient Acoustic Discrete Codec Tokenizer For Audio Language Modeling (2024) • No Venue
Ji et al.
Megascale: Scaling Large Language Model Training To More Than 10,000 Gpus (2024) • No Venue
Jiang et al.
Many-shot In-context Learning In Multimodal Foundation Models (2024) • No Venue
Jiang et al.
Minference 1.0: Accelerating Pre-filling For Long-context Llms Via Dynamic Sparse Attention (2024) • No Venue
Jiang et al.
Mora: High-rank Updating For Parameter-efficient Fine-tuning (2024) • No Venue
Jiang et al.
Moh: Multi-head Attention As Mixture-of-head Attention (2024) • No Venue
Jin et al.
Pyramidal Flow Matching For Efficient Video Generative Modeling (2024) • No Venue
Jin et al.
Naturalspeech 3: Zero-shot Speech Synthesis With Factorized Codec And Diffusion Models (2024) • No Venue
Ju et al.
Hydragen: High-throughput LLM Inference With Shared Prefixes (2024) • No Venue
Juravsky et al.
Adaptive Caching For Faster Video Generation With Diffusion Transformers (2024) • No Venue
Kahatapitiya et al.
Spectra: A Comprehensive Study Of Ternary, Quantized, And FP16 Language Models (2024) • No Venue
Kaushal et al.
Large Language Models Meet Collaborative Filtering: An Efficient All-round Llm-based Recommender System (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 67 citations
Kim et al.
Openvla: An Open-source Vision-language-action Model (2024) • No Venue
Kim et al.
Powerinfer-2: Fast Large Language Model Inference On A Smartphone (2024) • No Venue
Xue et al.
Openmoe: An Early Effort On Open Mixture-of-experts Language Models (2024) • No Venue
Xue et al.
Process Modeling With Large Language Models (2024) • Lecture Notes in Business Information Processing • 50 citations
Kourani et al.
Jina CLIP: Your CLIP Model Is Also Your Text Retriever (2024) • No Venue
Koukounas et al.
In Search Of Needles In A 10M Haystack: Recurrent Memory Finds What Llms Miss (2024) • No Venue
Kuratov et al.
Babilong: Testing The Limits Of Llms With Long Context Reasoning-in-a-haystack (2024) • No Venue
Kuratov et al.
"give Me BF16 Or Give Me Death"? Accuracy-performance Trade-offs In LLM Quantization (2024) • No Venue
Kurtic et al.
Autowebglm: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent (2024) • No Venue
Lai et al.
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) • No Venue
Lai et al.
Pllava: Parameter-free Llava Extension From Images To Videos For Video Dense Captioning (2024) • No Venue
Xu et al.
Onebit: Towards Extremely Low-bit Large Language Models (2024) • No Venue
Xu et al.
Unlocking The Conversion Of Web Screenshots Into HTML Code With The Websight Dataset (2024) • No Venue
Hugo Laurençon, Léo Tronchon, Victor Sanh
What Matters When Building Vision-language Models? (2024) • No Venue
Laurençon et al.
Gecko: Versatile Text Embeddings Distilled From Large Language Models (2024) • No Venue
Lee et al.
Phantom Of Latent For Large Language And Vision Models (2024) • No Venue
Lee et al.
Think: Thinner Key Cache By Query-driven Pruning (2024) • No Venue
Xu et al.
Clip-moe: Towards Building Mixture Of Experts For CLIP With Diversified Multiplet Upcycling (2024) • No Venue
Zhang et al.
Beyond A*: Better Planning With Transformers Via Search Dynamics Bootstrapping (2024) • No Venue
Lehnert et al.
Xmodel-vlm: A Simple Baseline For Multimodal Vision Language Model (2024) • No Venue
Xu et al.
Selective Attention Improves Transformer (2024) • No Venue
Yaniv Leviathan, Matan Kalman, Yossi Matias
Training Llms Over Neurally Compressed Text (2024) • No Venue
Lester et al.
Lmms-eval: Reality Check On The Evaluation Of Large Multimodal Models (2024) • No Venue
Zhang et al.
Direct Preference Knowledge Distillation For Large Language Models (2024) • No Venue
Li et al.
Controlnet++: Improving Conditional Controls With Efficient Consistency Feedback (2024) • No Venue
Li et al.
K-sort Arena: Efficient And Reliable Benchmarking For Generative Models Via K-wise Human Preferences (2024) • No Venue
Li et al.
Focusllm: Scaling Llm's Context By Parallel Decoding (2024) • No Venue
Li et al.
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (2024) • No Venue
Li et al.
Dual3d: Efficient And Consistent Text-to-3d Generation With Dual-mode Multi-view Latent Diffusion (2024) • No Venue
Li et al.
TPI-LLM: Serving 70b-scale Llms Efficiently On Low-resource Edge Devices (2024) • No Venue
Li et al.
Mix-ln: Unleashing The Power Of Deeper Layers By Combining Pre-ln And Post-ln (2024) • No Venue
Pengxiang Li, Lu Yin, Shiwei Liu
Retrollm: Empowering Large Language Models To Retrieve Fine-grained Evidence Within Generation (2024) • No Venue
Li et al.
Promptkd: Unsupervised Prompt Distillation For Vision-language Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
Tokenpacker: Efficient Visual Projector For Multimodal LLM (2024) • No Venue
Li et al.
Svdquant: Absorbing Outliers By Low-rank Components For 4-bit Diffusion Models (2024) • No Venue
Li et al.
Snapkv: LLM Knows What You Are Looking For Before Generation (2024) • No Venue
Li et al.
T2v-turbo: Breaking The Quality Bottleneck Of Video Consistency Model With Mixed Reward Feedback (2024) • No Venue
Li et al.
What Happened In Llms Layers When Trained For Fast Vs. Slow Thinking: A Gradient Perspective (2024) • No Venue
Ming Li, Yanhong Li, Tianyi Zhou
Transformer-lite: High-efficiency Deployment Of Large Language Models On Mobile Phone Gpus (2024) • No Venue
Li et al.
Adam-mini: Use Fewer Learning Rates To Gain More (2024) • No Venue
Zhang et al.
Gated Slot Attention For Efficient Linear-time Sequence Modeling (2024) • No Venue
Zhang et al.
GRAPE: Generalizing Robot Policy Via Preference Alignment (2024) • No Venue
Zhang et al.
Pyramiddrop: Accelerating Your Large Vision-language Models Via Pyramid Visual Redundancy Reduction (2024) • No Venue
Xing et al.
Multi-layer Transformers Gradient Can Be Approximated In Almost Linear Time (2024) • No Venue
Liang et al.
I-SHEEP: Self-alignment Of LLM From Scratch Through An Iterative Self-enhancement Paradigm (2024) • No Venue
Liang et al.
Step-aware Preference Optimization: Aligning Preference With Denoising Performance At Each Step (2024) • No Venue
Liang et al.
Adding Nvme Ssds To Enable And Accelerate 100B Model Fine-tuning On A Single GPU (2024) • No Venue
Liao et al.
HARE: Human Priors, A Key To Small Language Model Efficiency (2024) • No Venue
Zhang et al.
Critic-v: VLM Critics Help Catch VLM Errors In Multimodal Reasoning (2024) • No Venue
Zhang et al.
Instruct-musicgen: Unlocking Text-to-music Editing For Music Language Models Via Instruction Tuning (2024) • No Venue
Zhang et al.
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance (2024) • No Venue
Lin et al.
Data-efficient Fine-tuning For Llm-based Recommendation (2024) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 80 citations
Lin et al.
Ctrl-adapter: An Efficient And Versatile Framework For Adapting Diverse Controls To Any Diffusion Model (2024) • No Venue
Lin et al.
Open-sora Plan: Open-source Large Video Generation Model (2024) • No Venue
Lin et al.
Moma: Efficient Early-fusion Pre-training With Mixture Of Modality-aware Experts (2024) • No Venue
Lin et al.
Moe-llava: Mixture Of Experts For Large Vision-language Models (2024) • No Venue
Lin et al.
Showui: One Vision-language-action Model For GUI Visual Agent (2024) • No Venue
Lin et al.
Rho-1: Not All Tokens Are What You Need (2024) • No Venue
Lin et al.
ELITE: Encoding Visual Concepts Into Textual Embeddings For Customized Text-to-image Generation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 160 citations
Wei et al.
Resurrecting Recurrent Neural Networks For Long Sequences (2023) • Arxiv • 42 citations
Orvieto et al.
Key-locked Rank One Editing For Text-to-image Personalization (2023) • Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings • 87 citations
Tewel et al.
LMDX: Language Model-based Document Information Extraction And Localization (2023) • No Venue
Perot et al.
ECLIPSE: A Resource-efficient Text-to-image Prior For Image Generations (2023) • No Venue
Patel et al.
FP8-LM: Training FP8 Large Language Models (2023) • No Venue
Peng et al.
Yarn: Efficient Context Window Extension Of Large Language Models (2023) • No Venue
Peng et al.
RWKV: Reinventing Rnns For The Transformer Era (2023) • No Venue
Peng et al.
Cache Me If You Can: Accelerating Diffusion Models Through Block Caching (2023) • No Venue
Wimbauer et al.
Enabling Resource-efficient Aiot System With Cross-level Optimization: A Survey (2023) • IEEE Communications Surveys & Tutorials • 44 citations
Liu et al.
Efficientvit: Memory Efficient Vision Transformer With Cascaded Group Attention (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 522 citations
Liu et al.
Instaflow: One Step Is Enough For High-quality Diffusion-based Text-to-image Generation (2023) • No Venue
Liu et al.
LLM+P: Empowering Large Language Models With Optimal Planning Proficiency (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 61 citations
Liu et al.
Parameter-efficient Orthogonal Finetuning Via Butterfly Factorization (2023) • No Venue
Liu et al.
Pre-train, Prompt And Recommendation: A Comprehensive Survey Of Language Modelling Paradigm Adaptations In Recommender Systems (2023) • Transactions of the Association for Computational Linguistics • 65 citations
Peng Liu, Lemei Zhang, Jon Atle Gulla
Sherpa3d: Boosting High-fidelity Text-to-3d Generation Via Coarse 3D Prior (2023) • No Venue
Liu et al.
Generative Ai-enabled Vehicular Networks: Fundamentals, Framework, And Case Study (2023) • IEEE Network • 55 citations
Zhang et al.
EVA-CLIP: Improved Training Techniques For CLIP At Scale (2023) • Arxiv • 77 citations
Sun et al.
Pytorch FSDP: Experiences On Scaling Fully Sharded Data Parallel (2023) • Proceedings of the VLDB Endowment • 115 citations
Zhao et al.
Training Transformers With 4-bit Integers (2023) • No Venue
Xi et al.
Retentive Network: A Successor To Transformer For Large Language Models (2023) • No Venue
Sun et al.
Is Chatgpt Good At Search? Investigating Large Language Models As Re-ranking Agents (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 141 citations
Sun et al.
Can Chatgpt Forecast Stock Price Movements? Return Predictability And Large Language Models (2023) • SSRN Electronic Journal • 146 citations
Alejandro Lopez-Lira, Yuehua Tang
X-former: In-memory Acceleration Of Transformers (2023) • IEEE Transactions on Very Large Scale Integration (VLSI) Systems • 45 citations
Sridharan et al.
Visual Prompt Multi-modal Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Zhu et al.
Accelerating LLM Inference With Staged Speculative Decoding (2023) • No Venue
Benjamin Spector, Chris Re
Powerinfer: Fast Large Language Model Serving With A Consumer-grade GPU (2023) • No Venue
Song et al.
A Survey On Model Compression For Large Language Models (2023) • Transactions of the Association for Computational Linguistics • 85 citations
Zhu et al.
Lcm-lora: A Universal Stable-diffusion Acceleration Module (2023) • No Venue
Luo et al.
Chatagri: Exploring Potentials Of Chatgpt On Cross-linguistic Agricultural Text Classification (2023) • Neurocomputing • 96 citations
Zhao et al.
Full Parameter Fine-tuning For Large Language Models With Limited Resources (2023) • No Venue
Lv et al.
Dreameditor: Text-driven 3D Scene Editing With Neural Fields (2023) • SIGGRAPH Asia 2023 Conference Papers • 77 citations
Zhuang et al.
Biomedical Knowledge Graph-optimized Prompt Generation For Large Language Models (2023) • Bioinformatics • 51 citations
Soman et al.
A Transformer-based Model With Self-distillation For Multimodal Emotion Recognition In Conversations (2023) • IEEE Transactions on Multimedia • 71 citations
Ma et al.
Deepcache: Accelerating Diffusion Models For Free (2023) • No Venue
Xinyin Ma, Gongfan Fang, Xinchao Wang
HOICLIP: Efficient Knowledge Transfer For HOI Detection With Vision-language Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Ning et al.
Skeleton-of-thought: Large Language Models Can Do Parallel Decoding (2023) • No Venue
Ning et al.
A Comprehensive Overview Of Large Language Models (2023) • ACM Transactions on Intelligent Systems and Technology • 152 citations
Naveed et al.
Large Language Models In Healthcare And Medical Domain: A Review (2023) • Informatics • 192 citations
Zabir Al Nazi, Wei Peng
Directgpt: A Direct Manipulation Interface To Interact With Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 54 citations
Masson et al.
How Generative AI Models Such As Chatgpt Can Be (mis)used In SPC Practice, Education, And Research? An Exploratory Study (2023) • Quality Engineering • 116 citations
Megahed et al.
Anymal: An Efficient And Scalable Any-modality Augmented Language Model (2023) • No Venue
Moon et al.
Embodiedgpt: Vision-language Pre-training Via Embodied Chain Of Thought (2023) • Arxiv • 41 citations
Mu et al.
Large Language Models For Telecom: Forthcoming Impact On The Industry (2023) • IEEE Communications Magazine • 47 citations
Maatouk et al.
Self-refine: Iterative Refinement With Self-feedback (2023) • Arxiv • 202 citations
Madaan et al.
Recommending Root-cause And Mitigation Steps For Cloud Incidents Using Large Language Models (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 69 citations
Ahmed et al.
GQA: Training Generalized Multi-query Transformer Models From Multi-head Checkpoints (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 204 citations
Ainslie et al.
LLM In A Flash: Efficient Large Language Model Inference With Limited Memory (2023) • No Venue
Alizadeh et al.
Palm 2 Technical Report (2023) • Arxiv • 153 citations
Anil et al.
Bitnet: Scaling 1-bit Transformers For Large Language Models (2023) • No Venue
Wang et al.
Language Models Enable Simple Systems For Generating Structured Views Of Heterogeneous Data Lakes (2023) • Proceedings of the VLDB Endowment • 41 citations
Arora et al.
Advancing Requirements Engineering Through Generative AI: Assessing The Role Of Llms (2023) • Generative AI for Effective Software Development • 95 citations
Chetan Arora, John Grundy, Mohamed Abdelrazek
Fusionframes: Efficient Architectural Aspects For Text-to-video Generation Pipeline (2023) • No Venue
Arkhipkin et al.
Adaptive Shells For Efficient Neural Radiance Field Rendering (2023) • No Venue
Wang et al.
Sigmoid Loss For Language Image Pre-training (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 321 citations
Zhai et al.
Tallrec: An Effective And Efficient Tuning Framework To Align Large Language Model With Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 242 citations
Bao et al.
Exponentially Faster Language Modelling (2023) • No Venue
Peter Belcak, Roger Wattenhofer
Processgpt: Transforming Business Process Management With Generative Artificial Intelligence (2023) • 2023 IEEE International Conference on Web Services (ICWS) • 58 citations
Beheshti et al.
GPT As Knowledge Worker: A Zero-shot Evaluation Of (AI)CPA Capabilities (2023) • SSRN Electronic Journal • 54 citations
Bommarito et al.
Rethinking Attention: Exploring Shallow Feed-forward Neural Networks As An Alternative To Attention Layers In Transformers (2023) • No Venue
Bozic et al.
Multilora: Democratizing Lora For Better Multi-task Learning (2023) • No Venue
Wang et al.
The AI Generation Gap: Are Gen Z Students More Interested In Adopting Generative AI Such As Chatgpt In Teaching And Learning Than Their Gen X And Millennial Generation Teachers? (2023) • Smart Learning Environments • 354 citations
Cecilia Ka Yuk Chan, Katherine K. W. Lee
Pepnet: Parameter And Embedding Personalized Network For Infusing With Personalized Prior Information (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 78 citations
Chang et al.
LLM4TS: Aligning Pre-trained Llms As Data-efficient Time-series Forecasters (2023) • ACM Transactions on Intelligent Systems and Technology • 43 citations
Chang et al.
One-for-all: Generalized Lora For Parameter-efficient Fine-tuning (2023) • No Venue
Chavan et al.
Automatic Root Cause Analysis Via Large Language Models For Cloud Incidents (2023) • EuroSys '24: Nineteenth European Conference on Computer Systems • 77 citations
Chen et al.
Minigpt-v2: Large Language Model As A Unified Interface For Vision-language Multi-task Learning (2023) • No Venue
Chen et al.
Hdformer: High-order Directed Transformer For 3D Human Pose Estimation (2023) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 44 citations
Chen et al.
Lorashear: Efficient Large Language Model Structured Pruning And Knowledge Recovery (2023) • No Venue
Chen et al.
Longlora: Efficient Fine-tuning Of Long-context Large Language Models (2023) • No Venue
Chen et al.
Understanding And Improving Deep Graph Neural Networks: A Probabilistic Graphical Model Perspective (2023) • ACM Transactions on Reconfigurable Technology and Systems • 45 citations
Chen et al.
Scalable Multi-robot Collaboration With Large Language Models: Centralized Or Decentralized Systems? (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 46 citations
Chen et al.
Schrodinger Bridges Beat Diffusion Models On Text-to-speech Synthesis (2023) • No Venue
Chen et al.
Vanillanet: The Power Of Minimalism In Deep Learning (2023) • Arxiv • 82 citations
Chen et al.
Object-aware Distillation Pyramid For Open-vocabulary Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Wang et al.
A GPT-4 Reticular Chemist For Guiding MOF Discovery (2023) • Angewandte Chemie International Edition • 117 citations
Zheng et al.
R2gengpt: Radiology Report Generation With Frozen Llms (2023) • Meta-Radiology • 92 citations
Wang et al.
Less Is More: Focus Attention For Efficient DETR (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Zheng et al.
Preventing Zero-shot Transfer Degradation In Continual Learning Of Vision-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 55 citations
Zheng et al.
Low-rank Adaptation Of Large Language Model Rescoring For Parameter-efficient Speech Recognition (2023) • No Venue
Yu et al.
Parameter-efficient Transfer Learning For Remote Sensing Image-text Retrieval (2023) • IEEE Transactions on Geoscience and Remote Sensing • 60 citations
Yuan Yuan, Yang Zhan, Zhitong Xiong
Tinygpt-v: Efficient Multimodal Large Language Model Via Small Backbones (2023) • No Venue
Zhengqing Yuan, Zhaoxu Li, Lichao Sun
Where To Go Next For Recommender Systems? ID- Vs. Modality-based Recommender Models Revisited (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 137 citations
Yuan et al.
A Survey On Deep Neural Network Pruning-taxonomy, Comparison, Analysis, And Recommendations (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 203 citations
Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi
Fastvit: A Fast Hybrid Vision Transformer Using Structural Reparameterization (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 69 citations
Vasu et al.
SAM-CLIP: Merging Vision Foundation Models Towards Semantic And Spatial Understanding (2023) • No Venue
Wang et al.
Codegeex: A Pre-trained Model For Code Generation With Multilingual Benchmarking On Humaneval-x (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 158 citations
Zheng et al.
A Survey Of Techniques For Optimizing Transformer Inference (2023) • Journal of Systems Architecture • 90 citations
Chitty-Venkata et al.
Selective Structured State-spaces For Long-form Video Understanding (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Wang et al.
L3MVN: Leveraging Large Language Models For Visual Target Navigation (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 58 citations
Bangguo Yu, Hamidreza Kasaei, Ming Cao
Mobilevlm: A Fast, Reproducible And Strong Vision Language Assistant For Mobile Devices (2023) • No Venue
Chu et al.
RLHF-V: Towards Trustworthy Mllms Via Behavior Alignment From Fine-grained Correctional Human Feedback (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Yu et al.
Wavecoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation (2023) • No Venue
Yu et al.
Swinmm: Masked Multi-view With Swin Transformers For 3D Medical Image Segmentation (2023) • Lecture Notes in Computer Science • 41 citations
Wang et al.
Efficient And Effective Text Encoding For Chinese Llama And Alpaca (2023) • Arxiv • 71 citations
Yiming Cui, Ziqing Yang, Xin Yao
Switchhead: Accelerating Transformers With Mixture-of-experts Attention (2023) • No Venue
Csordás et al.
Capsfusion: Rethinking Image-text Data At Scale (2023) • No Venue
Yu et al.
Benchmarking Neural Network Training Algorithms (2023) • No Venue
Dahl et al.
Effective Test Generation Using Pre-trained Large Language Models And Mutation Testing (2023) • Information and Software Technology • 47 citations
Dakhel et al.
Zephyr: Direct Distillation Of LM Alignment (2023) • Arxiv • 51 citations
Tunstall et al.
Flashattention-2: Faster Attention With Better Parallelism And Work Partitioning (2023) • Arxiv • 135 citations
Tri Dao
LLMR: Real-time Prompting Of Interactive Worlds Using Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 56 citations
Torre et al.
Patch N' Pack: Navit, A Vision Transformer For Any Aspect Ratio And Resolution (2023) • No Venue
Dehghani et al.
Language Modeling Is Compression (2023) • No Venue
Delétang et al.
Qlora: Efficient Finetuning Of Quantized Llms (2023) • No Venue
Dettmers et al.
Towards Accurate Post-training Quantization For Vision Transformer (2023) • MM '22: The 30th ACM International Conference on Multimedia • 44 citations
Ding et al.
A Comprehensive Survey On Multimodal Recommender Systems: Taxonomy, Evaluation, And Future Directions (2023) • Arxiv • 149 citations
Zhou et al.
DNABERT-2: Efficient Foundation Model And Benchmark For Multi-species Genome (2023) • Arxiv • 139 citations
Zhou et al.
Ankh: Optimized Protein Language Model Unlocks General-purpose Modelling (2023) • Arxiv • 95 citations
Elnaggar et al.
GPT-3.5, GPT-4, Or BARD? Evaluating Llms Reasoning Ability In Zero-shot Setting And Performance Boosting Through Prompts (2023) • Natural Language Processing Journal • 69 citations
Espejel et al.
RMT: Retentive Networks Meet Vision Transformers (2023) • No Venue
Fan et al.
Depgraph: Towards Any Structural Pruning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Fang et al.
GALIP: Generative Adversarial Clips For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Tao et al.
Prompting Is All You Need: Automated Android Bug Replay With Large Language Models (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 89 citations
Sidong Feng, Chunyang Chen
Qmoe: Practical Sub-1-bit Compression Of Trillion-parameter Models (2023) • No Venue
Elias Frantar, Dan Alistarh
Encoder-based Domain Tuning For Fast Personalization Of Text-to-image Models (2023) • ACM Transactions on Graphics • 122 citations
Gal et al.
Distil-whisper: Robust Knowledge Distillation Via Large-scale Pseudo Labelling (2023) • No Venue
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
Text-to-sql Empowered By Large Language Models: A Benchmark Evaluation (2023) • Proceedings of the VLDB Endowment • 111 citations
Gao et al.
Graph Masked Autoencoder For Sequential Recommendation (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 48 citations
Yaowen Ye, Lianghao Xia, Chao Huang
Deepsolo++: Let Transformer Decoder With Explicit Points Solo For Multilingual Text Spotting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Ye et al.
Prompt Cache: Modular Attention Reuse For Low-latency Inference (2023) • No Venue
Gim et al.
A Picture Is Worth A Thousand Words: Principled Recaptioning Improves Image Generation (2023) • No Venue
Segalis et al.
VIP5: Towards Multimodal Foundation Models For Recommendation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Geng et al.
Composable Function-preserving Expansions For Transformer Architectures (2023) • No Venue
Andrea Gesmundo, Kaitlin Maile
Can A Student Large Language Model Perform As Well As It's Teacher? (2023) • Advances in Medical Technologies and Clinical Practice • 49 citations
Sia Gholami, Marwan Omar
Chatgpt Outperforms Crowd-workers For Text-annotation Tasks (2023) • Arxiv • 68 citations
Fabrizio Gilardi, Meysam Alizadeh, Maël Kubli
Commoncanvas: An Open Diffusion Model Trained With Creative-commons Images (2023) • No Venue
Gokaslan et al.
Visual-language Prompt Tuning With Knowledge-guided Context Optimization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Hantao Yao, Rui Zhang, Changsheng Xu
Knowledge Distillation Of Large Language Models (2023) • No Venue
Gu et al.
Mamba: Linear-time Sequence Modeling With Selective State Spaces (2023) • No Venue
Albert Gu, Tri Dao
Deepspeed-chat: Easy, Fast And Affordable RLHF Training Of Chatgpt-like Models At All Scales (2023) • No Venue
Yao et al.
Photorealistic Video Generation With Diffusion Models (2023) • No Venue
Gupta et al.
A Real-world Webagent With Planning, Long Context Understanding, And Program Synthesis (2023) • No Venue
Gur et al.
Lm-infinite: Simple On-the-fly Length Generalization For Large Language Models (2023) • No Venue
Han et al.
Beyond Chinchilla-optimal: Accounting For Inference In Language Model Scaling Laws (2023) • No Venue
Nikhil Sardana, Jonathan Frankle
Svdiff: Compact Parameter Space For Diffusion Fine-tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Han et al.
Stylegan-t: Unlocking The Power Of Gans For Fast Large-scale Text-to-image Synthesis (2023) • Arxiv • 59 citations
Sauer et al.
Reasoning With Language Model Is Planning With World Model (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Hao et al.
Fastervit: Fast Vision Transformers With Hierarchical Attention (2023) • No Venue
Hatamizadeh et al.
Hyperdreambooth: Hypernetworks For Fast Personalization Of Text-to-image Models (2023) • No Venue
Ruiz et al.
A Survey On Uncertainty Quantification Methods For Deep Learning (2023) • Arxiv • 50 citations
He et al.
From Words To Watts: Benchmarking The Energy Costs Of Large Language Model Inference (2023) • 2023 IEEE High Performance Extreme Computing Conference (HPEC) • 104 citations
Samsi et al.
On The Challenges And Perspectives Of Foundation Models For Medical Image Analysis (2023) • Medical Image Analysis • 165 citations
Shaoting Zhang, Dimitris Metaxas
Biomedgpt: A Generalist Vision-language Foundation Model For Diverse Biomedical Tasks (2023) • Arxiv • 49 citations
Zhang et al.
Metagpt: Meta Programming For A Multi-agent Collaborative Framework (2023) • Arxiv • 124 citations
Hong et al.
Flashdecoding++: Faster Large Language Model Inference On Gpus (2023) • No Venue
Hong et al.
TEAL: Tokenize And Embed ALL For Multi-modal Large Language Models (2023) • No Venue
Yang et al.
Distilling Step-by-step! Outperforming Larger Language Models With Less Training Data And Smaller Model Sizes (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 160 citations
Hsieh et al.
Mistral 7B (2023) • Arxiv • 219 citations
Jiang et al.
Harnessing The Power Of Llms In Practice: A Survey On Chatgpt And Beyond (2023) • ACM Transactions on Knowledge Discovery from Data • 303 citations
Yang et al.
Uniaudio: An Audio Foundation Model Toward Universal Audio Generation (2023) • No Venue
Yang et al.
Segment And Caption Anything (2023) • No Venue
Huang et al.
Lorahub: Efficient Cross-task Generalization Via Dynamic Lora Composition (2023) • No Venue
Huang et al.
Diversity-aware Meta Visual Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Huang et al.
Flatformer: Flattened Window Attention For Efficient Point Cloud Transformer (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 83 citations
Liu et al.
Universalner: Targeted Distillation From Large Language Models For Open Named Entity Recognition (2023) • No Venue
Zhou et al.
Knowing Where To Focus: Event-aware Transformer For Video Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Jang et al.
VMC: Video Motion Customization Using Temporal Attention Adaption For Text-to-video Diffusion Models (2023) • No Venue
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
Vary: Scaling Up The Vision Vocabulary For Large Vision-language Models (2023) • No Venue
Wei et al.
Llmlingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jiang et al.
LILAC: Log Parsing Using Llms With Adaptive Parsing Cache (2023) • Proceedings of the ACM on Software Engineering • 51 citations
Jiang et al.
Impact Of Code Language Models On Automated Program Repair (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 119 citations
Jiang et al.
Scedit: Efficient And Controllable Image Diffusion Generation Via Skip Connection Editing (2023) • No Venue
Jiang et al.
Sparq Attention: Bandwidth-efficient LLM Inference (2023) • No Venue
Ribar et al.
Matching Patients To Clinical Trials With Large Language Models (2023) • Nature Communications • 111 citations
Jin et al.
Representation Learning With Large Language Models For Recommendation (2023) • WWW '24: The ACM Web Conference 2024 • 118 citations
Ren et al.
Sg-former: Self-guided Transformer With Evolving Token Reallocation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 45 citations
Ren et al.
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (2023) • Arxiv • 111 citations
Zhang et al.
Adapters: A Unified Library For Parameter-efficient And Modular Transfer Learning (2023) • No Venue
Poth et al.
Stack More Layers Differently: High-rank Training Through Low-rank Updates (2023) • No Venue
Lialin et al.
Scaling Down To Scale Up: A Guide To Parameter-efficient Fine-tuning (2023) • Arxiv • 66 citations
Lialin et al.
Exploring Format Consistency For Instruction Tuning (2023) • Computers, Environment and Urban Systems • 43 citations
Liang et al.
Automatic Prompt Optimization With "Gradient Descent" And Beam Search (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 100 citations
Pryzant et al.
From Sparse To Soft Mixtures Of Experts (2023) • No Venue
Puigcerver et al.
GPT-4 Enhanced Multimodal Grounding For Autonomous Driving: Leveraging Cross-modal Attention With Large Language Models (2023) • Communications in Transportation Research • 57 citations
Liao et al.
AWQ: Activation-aware Weight Quantization For LLM Compression And Acceleration (2023) • GetMobile: Mobile Computing and Communications • 64 citations
Lin et al.
Llm-eval: Unified Multi-dimensional Automatic Evaluation For Open-domain Conversations With Large Language Models (2023) • Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023) • 45 citations
Yen-Ting Lin, Yun-Nung Chen
How Can Recommender Systems Benefit From Large Language Models: A Survey (2023) • ACM Transactions on Information Systems • 85 citations
Lin et al.
Efficient Domain Adaptation For Speech Foundation Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Li et al.
FLM-101B: An Open LLM And How To Train It With $100K Budget (2023) • No Venue
Li et al.
Lite DETR: An Interleaved Multi-scale Encoder For Efficient DETR (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 95 citations
Li et al.
JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models (2023) • No Venue
Li et al.
Instant3d: Fast Text-to-3d With Sparse-view Generation And Large Reconstruction Model (2023) • No Venue
Li et al.
Loftq: Lora-fine-tuning-aware Quantization For Large Language Models (2023) • No Venue
Li et al.
Photomaker: Customizing Realistic Human Photos Via Stacked ID Embedding (2023) • No Venue
Li et al.
Llmrec: Large Language Models With Graph Augmentation For Recommendation (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 134 citations
Wei et al.
Scaling Transnormer To 175 Billion Parameters (2023) • No Venue
Qin et al.
Zero Bubble Pipeline Parallelism (2023) • No Venue
Qi et al.
S-lora: Serving Thousands Of Concurrent Lora Adapters (2023) • No Venue
Sheng et al.
Chatgpt Vs. Google: A Comparative Study Of Search Performance And User Experience (2023) • SSRN Electronic Journal • 46 citations
Ruiyun Xu, Yue Feng, Hailiang Chen
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 65 citations
Xu et al.
Qa-lora: Quantization-aware Low-rank Adaptation Of Large Language Models (2023) • No Venue
Xu et al.
Ufogen: You Forward Once Large Scale Text-to-image Generation Via Diffusion Gans (2023) • No Venue
Xu et al.
Multi: Efficient Video-and-language Understanding With Text-guided Multiway-sampler And Multiple Choice Modeling (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 50 citations
Xu et al.
Sorted Llama: Unlocking The Potential Of Intermediate Layers Of Large Language Models For Dynamic Inference Using Sorted Fine-tuning (soft) (2023) • No Venue
Kavehzadeh et al.
Towards Lightweight Cross-domain Sequential Recommendation Via External Attention-enhanced Graph Convolution Network (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 63 citations
Zhang et al.
Efficient Memory Management For Large Language Model Serving With Pagedattention (2023) • No Venue
Kwon et al.
Copy Is All You Need (2023) • No Venue
Lan et al.
Propainter: Improving Propagation And Transformer For Video Inpainting (2023) • No Venue
Zhou et al.
Hierspeech++: Bridging The Gap Between Semantic And Acoustic Representation Of Speech By Hierarchical Variational Inference For Zero-shot Speech Synthesis (2023) • No Venue
Lee et al.
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis (2023) • No Venue
Qin et al.
Filtering, Distillation, And Hard Negatives For Vision-language Pre-training (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Radenovic et al.
Large Language Models Are Effective Text Rankers With Pairwise Ranking Prompting (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 88 citations
Qin et al.
AUGER: Automatically Generating Review Comments With Pre-training Models (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 43 citations
Li et al.
Accelerating Attention Through Gradient-based Learned Runtime Pruning (2022) • Proceedings of the 49th Annual International Symposium on Computer Architecture • 42 citations
Li et al.
ELEVATER: A Benchmark And Toolkit For Evaluating Language-augmented Visual Models (2022) • Arxiv • 64 citations
Li et al.
Mplug: Effective And Efficient Vision-language Learning By Cross-modal Skip-connections (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Li et al.
Repq-vit: Scale Reparameterization For Post-training Quantization Of Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 65 citations
Li et al.
Rethinking Query-key Pairwise Interactions In Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 194 citations
Cheng Li, Yangxin Liu
Frozen CLIP Models Are Efficient Video Learners (2022) • Lecture Notes in Computer Science • 148 citations
Lin et al.
Vision Transformers Are Parameter-efficient Audio-visual Learners (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Lin et al.
Transrac: Encoding Multi-scale Temporal Correlation With Transformers For Repetitive Action Counting (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Hu et al.
A Smile Is All You Need: Predicting Limiting Activity Coefficients From SMILES With Natural Language Processing (2022) • Digital Discovery • 65 citations
Winter et al.
Simpleclick: Interactive Image Segmentation With Simple Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 84 citations
Liu et al.
Few-shot Parameter-efficient Fine-tuning Is Better And Cheaper Than In-context Learning (2022) • Arxiv • 292 citations
Liu et al.
Visual Prompt Tuning (2022) • Lecture Notes in Computer Science • 1133 citations
Jia et al.
VIMA: General Robot Manipulation With Multimodal Prompts (2022) • Arxiv • 65 citations
Jiang et al.
Adamct: Adaptive Mixture Of Cnn-transformer For Sequential Recommendation (2022) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 43 citations
Jiang et al.
Fact: Factor-tuning For Lightweight Adaptation On Vision Transformer (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Shibo Jie, Zhi-Hong Deng
What To Hide From Your Students: Attention-guided Masked Image Modeling (2022) • Lecture Notes in Computer Science • 81 citations
Kakogeorgiou et al.
DRT: A Lightweight Single Image Deraining Recursive Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 60 citations
Yuanchu Liang, Saeed Anwar, Yang Liu
Large Language Models Are Few-shot Testers: Exploring Llm-based General Bug Reproduction (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 123 citations
Sungmin Kang, Juyeon Yoon, Shin Yoo
Holistic Evaluation Of Language Models (2022) • Annals of the New York Academy of Sciences • 107 citations
Liang et al.
Automatic Detection And Analysis Of Technical Debts In Peer-review Documentation Of R Packages (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 81 citations
Junaed Younus Khan, Gias Uddin
Decomposing Nerf For Editing Via Feature Field Distillation (2022) • Arxiv • 103 citations
Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann
Multi-task Learning With Multi-query Transformer For Dense Prediction (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 45 citations
Xu et al.
Deepspeed-moe: Advancing Mixture-of-experts Inference And Training To Power Next-generation AI Scale (2022) • Arxiv • 55 citations
Rajbhandari et al.
An Efficiency Study For SPLADE Models (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 58 citations
Carlos Lassance, Stéphane Clinchant
A New Generation Of Perspective API: Efficient Multilingual Character-level Transformers (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 88 citations
Lees et al.
Magicvideo: Efficient Video Generation With Latent Diffusion Models (2022) • Arxiv • 63 citations
Zhou et al.
Training-free Transformer Architecture Search (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Zhou et al.
ATTEMPT: Parameter-efficient Multi-task Tuning Via Attentional Mixtures Of Soft Prompts (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 49 citations
Asai et al.
Towards Data-efficient Detection Transformers (2022) • Lecture Notes in Computer Science • 52 citations
Wang et al.
Scene Text Recognition With Permuted Autoregressive Sequence Models (2022) • Lecture Notes in Computer Science • 183 citations
Darwin Bautista, Rowel Atienza
Simkgc: Simple Contrastive Knowledge Graph Completion With Pre-trained Language Models (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 151 citations
Wang et al.
Refined: An Efficient Zero-shot-capable Approach To End-to-end Entity Linking (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track • 53 citations
Ayoola et al.
Language Modeling Via Stochastic Processes (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 40 citations
Wang et al.
End-to-end Transformer Based Model For Image Captioning (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 107 citations
Yiyu Wang, Jungang Xu, Yingfei Sun
Ediff-i: Text-to-image Diffusion Models With An Ensemble Of Expert Denoisers (2022) • Arxiv • 222 citations
Balaji et al.
Prompting Is Programming: A Query Language For Large Language Models (2022) • Proceedings of the ACM on Programming Languages • 64 citations
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
GLM-130B: An Open Bilingual Pre-trained Model (2022) • Arxiv • 288 citations
Zeng et al.
Revisiting The "video" In Video-language Understanding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 106 citations
Buch et al.
FP8 Formats For Deep Learning (2022) • Arxiv • 49 citations
Micikevicius et al.
Expanding Language-image Pretrained Models For General Video Recognition (2022) • Lecture Notes in Computer Science • 221 citations
Ni et al.
FAST-VQA: Efficient End-to-end Video Quality Assessment With Fragment Sampling (2022) • Lecture Notes in Computer Science • 161 citations
Wu et al.
Vision Transformer Slimming: Multi-dimension Searching In Continuous Optimization Space (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Chavan et al.
Gatehub: Gated History Unit With Background Suppression For Online Action Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Chen et al.
GERE: Generative Evidence Retrieval For Fact Verification (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 53 citations
Chen et al.
Make-a-video: Text-to-video Generation Without Text-video Data (2022) • Arxiv • 308 citations
Singer et al.
Point-e: A System For Generating 3D Point Clouds From Complex Prompts (2022) • Arxiv • 155 citations
Nichol et al.
Revisiting Classifier: Transferring Vision-language Models For Video Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Wenhao Wu, Zhun Sun, Wanli Ouyang
Augmenting Interpretable Models With Llms During Training (2022) • Nature Communications • 42 citations
Singh et al.
Task Adaptive Parameter Sharing For Multi-task Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Wallingford et al.
Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 245 citations
Wagner et al.
Alpa: Automating Inter- And Intra-operator Parallelism For Distributed Deep Learning (2022) • Arxiv • 75 citations
Zheng et al.
X-trans2cap: Cross-modal Knowledge Transfer Using Transformer For 3D Dense Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Yuan et al.
Towards Lightweight Transformer Via Group-wise Transformation For Vision-and-language Tasks (2022) • IEEE Transactions on Image Processing • 51 citations
Luo et al.
Heterogeneous Ensemble Knowledge Transfer For Training Large Models In Federated Learning (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 82 citations
Cho et al.
Cross-attention Of Disentangled Modalities For 3D Human Mesh Recovery With Transformers (2022) • Lecture Notes in Computer Science • 88 citations
Junhyeong Cho, Kim Youwang, Tae-Hyun Oh
Centerclip: Token Clustering For Efficient Text-video Retrieval (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 100 citations
Zhao et al.
Task Residual For Tuning Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yu et al.
Enabling Multimodal Generation On CLIP Via Vision-language Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 52 citations
Dai et al.
MAGVIT: Masked Generative Video Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 93 citations
Yu et al.
Simplified State Space Layers For Sequence Modeling (2022) • Arxiv • 76 citations
Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman
Flashattention: Fast And Memory-efficient Exact Attention With Io-awareness (2022) • Arxiv • 452 citations
Dao et al.
Efficient Few-shot Learning Without Prompts (2022) • Arxiv • 93 citations
Tunstall et al.
Vitality: Unifying Low-rank And Sparse Approximation For Vision Transformer Acceleration With A Linear Taylor Attention (2022) • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) • 42 citations
Dass et al.
Structured Pruning Learns Compact And Accurate Models (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 99 citations
Mengzhou Xia, Zexuan Zhong, Danqi Chen
Rlprompt: Optimizing Discrete Text Prompts With Reinforcement Learning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 117 citations
Deng et al.
Stripformer: Strip Transformer For Fast Image Deblurring (2022) • Lecture Notes in Computer Science • 155 citations
Tsai et al.
Delta Tuning: A Comprehensive Study Of Parameter Efficient Methods For Pre-trained Language Models (2022) • Arxiv • 51 citations
Ding et al.
Measuring The Carbon Intensity Of AI In Cloud Instances (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 161 citations
Dodge et al.
Automated Clinical Coding: What, Why, And Where We Are? (2022) • npj Digital Medicine • 69 citations
Dong et al.
Vitcod: Vision Transformer Acceleration Via Dedicated Algorithm And Accelerator Co-design (2022) • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) • 83 citations
You et al.
Prompt For Extraction? PAIE: Prompting Argument Interaction For Event Argument Extraction (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 112 citations
Ma et al.
Castling-vit: Compressing Self-attention Via Switching Towards Linear-angular Attention At Vision Transformer Inference (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
You et al.
Self-supervised Hypergraph Transformer For Recommender Systems (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 107 citations
Lianghao Xia, Chao Huang, Chuxu Zhang
Llm-planner: Few-shot Grounded Planning For Embodied Agents With Large Language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 267 citations
Song et al.
Document-level Relation Extraction With Adaptive Focal Loss And Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 101 citations
Tan et al.
Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022) • Interspeech 2022 • 73 citations
Gao et al.
MIST: Multi-modal Iterative Spatial-temporal Transformer For Long-form Video Question Answering (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Gao et al.
Dptext-detr: Towards Better Scene Text Detection With Dynamic Points In Transformer (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Ye et al.
LST: Ladder Side-tuning For Parameter And Memory Efficient Transfer Learning (2022) • Arxiv • 79 citations
Yi-Lin Sung, Jaemin Cho, Mohit Bansal
Mixture-of-experts With Expert Choice Routing (2022) • Arxiv • 57 citations
Zhou et al.
St-adapter: Parameter-efficient Image-to-video Transfer Learning (2022) • Arxiv • 76 citations
Pan et al.
Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 52 citations
Bethan Thomas, Samuel Kessler, Salah Karout
Zerogen: Efficient Zero-shot Learning Via Dataset Generation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 82 citations
Ye et al.
Vision Transformer With Deformable Attention (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 649 citations
Xia et al.
Bridging Video-text Retrieval With Multiple Choice Questions (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ge et al.
Deepsolo: Let Transformer Decoder With Explicit Points Solo For Text Spotting (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Ye et al.
COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 61 citations
Lu et al.
Re2g: Retrieve, Rerank, Generate (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 42 citations
Glass et al.
Wave-vit: Unifying Wavelet And Transformers For Visual Representation Learning (2022) • Lecture Notes in Computer Science • 145 citations
Yao et al.
Efficient Multimodal Transformer With Dual-level Feature Restoration For Robust Multimodal Sentiment Analysis (2022) • IEEE Transactions on Affective Computing • 136 citations
Sun et al.
Qdrop: Randomly Dropping Quantization For Extremely Low-bit Post-training Quantization (2022) • Arxiv • 44 citations
Wei et al.
Efficient Long-range Attention Network For Image Super-resolution (2022) • Lecture Notes in Computer Science • 386 citations
Zhang et al.
Swinfir: Revisiting The Swinir With Fast Fourier Convolution And Improved Training For Image Super-resolution (2022) • Arxiv • 71 citations
Zhang et al.
Global Pointer: Novel Efficient Span-based Approach For Named Entity Recognition (2022) • Arxiv • 63 citations
Su et al.
Branchformer: Parallel Mlp-attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022) • Arxiv • 40 citations
Peng et al.
A Primer On Contrastive Pretraining In Language Processing: Methods, Lessons Learned And Perspectives (2021) • ACM Computing Surveys • 55 citations
Nils Rethmeier, Isabelle Augenstein
Hardware Acceleration Of Fully Quantized BERT For Efficient Natural Language Processing (2021) • 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 50 citations
Zejian Liu, Gang Li, Jian Cheng
A Good Prompt Is Worth Millions Of Parameters: Low-resource Prompt-based Learning For Vision-language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Jin et al.
Delightfultts: The Microsoft Speech Synthesis System For Blizzard Challenge 2021 (2021) • The Blizzard Challenge 2021 • 40 citations
Liu et al.
End-to-end Neural Diarization: From Transformer To Conformer (2021) • IEEE Transactions on Image Processing • 210 citations
Liu et al.
Dexperts: Decoding-time Controlled Text Generation With Experts And Anti-experts (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 127 citations
Liu et al.
An Efficient Transformer Decoder With Compressed Sub-layers (2021) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 46 citations
Li et al.
Lightningdot: Pre-training Visual-semantic Embeddings For Real-time Image-text Retrieval (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 79 citations
Sun et al.
Efficient Attentions For Long Document Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 121 citations
Huang et al.
Context-aware Legal Citation Recommendation Using Deep Learning (2021) • Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law • 40 citations
Huang et al.
Random Feature Attention (2021) • Arxiv • 122 citations
Peng et al.
Traceability Transformed: Generating More Accurate Links With Pre-trained BERT Models (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 115 citations
Lin et al.
Learning Salient Boundary Feature For Anchor-free Temporal Action Localization (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 248 citations
Lin et al.
CAT: Cross-attention Transformer For One-shot Object Detection (2021) • 2022 IEEE International Conference on Multimedia and Expo (ICME) • 173 citations
Lin et al.
Relaxed Transformer Decoders For Direct Action Proposal Generation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 153 citations
Tan et al.
TABBIE: Pretrained Representations Of Tabular Data (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 110 citations
Iida et al.
Sparse DETR: Efficient End-to-end Object Detection With Learnable Sparsity (2021) • Arxiv • 89 citations
Roh et al.
Diff-tts: A Denoising Diffusion Model For Text-to-speech (2021) • Interspeech 2021 • 95 citations
Jeong et al.
Enhance To Read Better: A Multi-task Adversarial Network For Handwritten Document Image Enhancement (2021) • Pattern Recognition • 49 citations
Jemni et al.
Zero-shot Adversarial Quantization (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yuang Liu, Wei Zhang, Jun Wang
Transtailor: Pruning The Pre-trained Model For Improved Transfer Learning (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 47 citations
Liu et al.
Accelerating Recommendation System Training By Leveraging Popular Choices (2021) • Proceedings of the VLDB Endowment • 47 citations
Adnan et al.
Ibot: Image BERT Pre-training With Online Tokenizer (2021) • Arxiv • 207 citations
Zhou et al.
Muppet: Massive Multi-task Representations With Pre-finetuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 168 citations
Aghajanyan et al.
Spanet: Generalized Permutationless Set Assignment For Particle Physics Using Symmetry Preserving Attention (2021) • SciPost Physics • 41 citations
Shmakov et al.
Pale Transformer: A General Vision Transformer Backbone With Pale-shaped Attention (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 48 citations
Wu et al.
Vision Transformer For Fast And Efficient Scene Text Recognition (2021) • Lecture Notes in Computer Science • 147 citations
Rowel Atienza
Combat COVID-19 Infodemic Using Explainable Natural Language Processing Models (2021) • Information Processing & Management • 146 citations
Jackie Ayoub, X. Jessie Yang, Feng Zhou
Personalized Transformer For Explainable Recommendation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 72 citations
Lei Li, Yongfeng Zhang, Li Chen
Fast End-to-end Speech Recognition Via Non-autoregressive Models And Cross-modal Knowledge Transferring From BERT (2021) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 49 citations
Bai et al.
PAN++: Towards Efficient And Accurate End-to-end Spotting Of Arbitrarily-shaped Text (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 123 citations
Wang et al.
Cliport: What And Where Pathways For Robotic Manipulation (2021) • Arxiv • 98 citations
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Fastformer: Additive Attention Can Be All You Need (2021) • Arxiv • 77 citations
Wu et al.
Lambdanetworks: Modeling Long-range Interactions Without Attention (2021) • Arxiv • 48 citations
Irwan Bello
Pimnet: A Parallel, Iterative And Mimicking Network For Scene Text Recognition (2021) • MM '21: ACM Multimedia Conference • 69 citations
Qiao et al.
Understanding And Overcoming The Challenges Of Efficient Transformer Quantization (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 51 citations
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
Wav2clip: Learning Robust Audio Representations From CLIP (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
Wu et al.
Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021) • Interspeech 2021 • 58 citations
Le et al.
Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 68 citations
Maxime Burchi, Valentin Vielzeuf
The Impact Of Multiple Parallel Phrase Suggestions On Email Input And Composition Behaviour Of Native And Non-native English Writers (2021) • Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems • 95 citations
Daniel Buschek, Martin Zürn, Malin Eiband
Deduplicating Training Data Makes Language Models Better (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 119 citations
Lee et al.
Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Miech et al.
Energon: Towards Efficient Acceleration Of Transformers Using Dynamic Sparse Attention (2021) • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems • 42 citations
Zhou et al.
Understanding Data Storage And Ingestion For Large-scale Deep Recommendation Model Training (2021) • Proceedings of the 49th Annual International Symposium on Computer Architecture • 59 citations
Zhao et al.
Remote Sensing Image Change Detection With Transformers (2021) • IEEE Transactions on Geoscience and Remote Sensing • 889 citations
Hao Chen, Zipeng Qi, Zhenwei Shi
Copy, Right? A Testing Framework For Copyright Protection Of Deep Learning Models (2021) • 2022 IEEE Symposium on Security and Privacy (SP) • 57 citations
Chen et al.
Mobile-former: Bridging Mobilenet And Transformer (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 514 citations
Chen et al.
Industry Scale Semi-supervised Learning For Natural Language Understanding (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers • 67 citations
Chen et al.
Visualgpt: Data-efficient Adaptation Of Pretrained Language Models For Image Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 133 citations
Chen et al.
Lightweight Adapter Tuning For Multilingual Speech Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 54 citations
Le et al.
Align Before Fuse: Vision And Language Representation Learning With Momentum Distillation (2021) • Arxiv • 820 citations
Li et al.
Bitfit: Simple Parameter-efficient Fine-tuning For Transformer-based Masked Language-models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 286 citations
Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
Kaleido-bert: Vision-language Pre-training On Fashion Domain (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Zhuge et al.
Improving Video-text Retrieval By Multi-stream Corpus Alignment And Dual Softmax Loss (2021) • Arxiv • 66 citations
Cheng et al.
Compacter: Efficient Low-rank Hypercomplex Adapter Layers (2021) • Arxiv • 82 citations
Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
Tokens-to-token Vit: Training Vision Transformers From Scratch On Imagenet (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 1687 citations
Yuan et al.
Learned Token Pruning For Transformers (2021) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 75 citations
Kim et al.
Factual Probing Is [MASK]: Learning Vs. Learning To Recall (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 230 citations
Zexuan Zhong, Dan Friedman, Danqi Chen
Learning Domain Adaptation With Model Calibration For Surgical Report Generation In Robotic Surgery (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 164 citations
Xu et al.
Trankit: A Light-weight Transformer-based Toolkit For Multilingual Natural Language Processing (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations • 43 citations
Nguyen et al.
Efficient Large-scale Language Model Training On GPU Clusters Using Megatron-lm (2021) • SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis • 340 citations
Narayanan et al.
CANINE: Pre-training An Efficient Tokenization-free Encoder For Language Representation (2021) • Transactions of the Association for Computational Linguistics • 114 citations
Clark et al.
Learning Passage Impacts For Inverted Indexes (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 140 citations
Mallia et al.
E2E-VLP: End-to-end Vision-language Pre-training Enhanced By Visual Learning (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 78 citations
Xu et al.
TEACHTEXT: Crossmodal Generalized Distillation For Text-video Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Croitoru et al.
Evo-vit: Slow-fast Token Evolution For Dynamic Vision Transformer (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 131 citations
Xu et al.
Stacked Acoustic-and-textual Encoding: Integrating The Pre-trained Models Into Speech Translation Encoders (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 66 citations
Xu et al.
Vector-quantized Image Modeling With Improved VQGAN (2021) • Arxiv • 91 citations
Yu et al.
Compute And Energy Consumption Trends In Deep Learning Inference (2021) • Sustainable Computing: Informatics and Systems, Volume 38, April 2023, 100857 (published as "Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning") • 43 citations
Radosvet Desislavov, Fernando Martínez-Plumed, José Hernández-Orallo
The NLP Cookbook: Modern Recipes For Transformer Based Deep Learning Architectures (2021) • IEEE Access • 121 citations
Sushant Singh, Ausif Mahmood
Luna: Linear Unified Nested Attention (2021) • Arxiv • 49 citations
Ma et al.
Openprompt: An Open-source Framework For Prompt-learning (2021) • Arxiv • 64 citations
Ding et al.
Scaling End-to-end Models For Large-scale Multilingual ASR (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 46 citations
Li et al.
A Tinyml Platform For On-device Continual Learning With Quantized Latent Replays (2021) • IEEE Journal on Emerging and Selected Topics in Circuits and Systems • 75 citations
Ravaglia et al.
Efficient Training Of Audio Transformers With Patchout (2021) • Interspeech 2022 • 134 citations
Koutini et al.
A Full-stack Search Technique For Domain Optimized Deep Learning Accelerators (2021) • Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems • 44 citations
Zhang et al.
Fjord: Fair And Accurate Federated Learning Under Heterogeneous Targets With Ordered Dropout (2021) • Arxiv • 64 citations
Horvath et al.
Prompting Visual-language Models For Efficient Video Understanding (2021) • Lecture Notes in Computer Science • 246 citations
Ju et al.
A Syntax-guided Edit Decoder For Neural Program Repair (2021) • Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 178 citations
Zhu et al.
Zero-offload: Democratizing Billion-scale Model Training (2021) • Arxiv • 61 citations
Ren et al.
Token Shift Transformer For Video Classification (2021) • MM '21: ACM Multimedia Conference • 95 citations
Hao Zhang, Yanbin Hao, Chong-Wah Ngo
Pre-trained Models: Past, Present And Future (2021) • AI Open • 700 citations
Han et al.
Whitening Sentence Representations For Better Semantics And Faster Retrieval (2021) • Arxiv • 204 citations
Su et al.
On The Effectiveness Of Adapter-based Tuning For Pretrained Language Model Adaptation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 112 citations
He et al.
Debertav3: Improving Deberta Using Electra-style Pre-training With Gradient-disentangled Embedding Sharing (2021) • Arxiv • 391 citations
Pengcheng He, Jianfeng Gao, Weizhu Chen
Towards A Unified View Of Parameter-efficient Transfer Learning (2021) • Arxiv • 277 citations
He et al.
The Stem Cell Hypothesis: Dilemma Behind Multi-task Learning With Transformer Encoders (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 85 citations
Han He, Jinho D. Choi
Symbolic Knowledge Distillation: From General Language Models To Commonsense Models (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 54 citations
West et al.
Ocr-free Document Understanding Transformer (2021) • Lecture Notes in Computer Science • 194 citations
Kim et al.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-image Pre-training Paradigm (2021) • Arxiv • 126 citations
Li et al.
A Practical Survey On Faster And Lighter Transformers (2021) • ACM Computing Surveys • 75 citations
Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise
Lightspeech: Lightweight And Fast Text To Speech With Neural Architecture Search (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Luo et al.
CPM-2: Large-scale Cost-effective Pre-trained Language Models (2021) • AI Open • 51 citations
Zhang et al.
Bigssl: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) • IEEE Journal of Selected Topics in Signal Processing • 148 citations
Zhang et al.
COIL: Revisit Exact Lexical Match In Information Retrieval With Contextualized Inverted List (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 161 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Container: Context Aggregation Network (2021) • Arxiv • 41 citations
Gao et al.
I-pulse: A NLP Based Novel Approach For Employee Engagement In Logistics Organization (2021) • International Journal of Information Management Data Insights • 54 citations
Garg et al.
Nested Hierarchical Transformer: Towards Accurate, Data-efficient And Interpretable Visual Understanding (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 122 citations
Zhang et al.
P-tuning V2: Prompt Tuning Can Be Comparable To Fine-tuning Universally Across Scales And Tasks (2021) • Arxiv • 261 citations
Liu et al.
Styleswin: Transformer-based GAN For High-resolution Image Generation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 202 citations
Zhang et al.
Pretrained Transformers As Universal Computation Engines (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 71 citations
Lu et al.
Efficient Passage Retrieval With Hashing For Open-domain Question Answering (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 51 citations
Ikuya Yamada, Akari Asai, Hannaneh Hajishirzi
Efficiently Modeling Long Sequences With Structured State Spaces (2021) • Arxiv • 471 citations
Albert Gu, Karan Goel, Christopher Ré
Cutting Down On Prompts And Parameters: Simple Few-shot Learning With Language Models (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 42 citations
Logan et al.
Lightweight Self-attentive Sequential Recommendation (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 104 citations
Li et al.
FILIP: Fine-grained Interactive Language-image Pre-training (2021) • Arxiv • 205 citations
Yao et al.
Transnas-bench-101: Improving Transferability And Generalizability Of Cross-task Neural Architecture Search (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Duan et al.
Byt5: Towards A Token-free Future With Pre-trained Byte-to-byte Models (2021) • Arxiv • 69 citations
Xue et al.
Sparse Sinkhorn Attention (2020) • Arxiv • 77 citations
Tay et al.
Few-shot Text Generation With Pattern-exploiting Training (2020) • Arxiv • 74 citations
Timo Schick, Hinrich Schütze
Fastbert: A Self-distilling BERT With Adaptive Inference Time (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 245 citations
Liu et al.
Efficient Minimum Word Error Rate Training Of Rnn-transducer For End-to-end Speech Recognition (2020) • Interspeech 2020 • 51 citations
Guo et al.
Dynabert: Dynamic BERT With Adaptive Width And Depth (2020) • Arxiv • 119 citations
Hou et al.
Glancing Transformer For Non-autoregressive Neural Machine Translation (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 45 citations
Qian et al.
Informer: Beyond Efficient Transformer For Long Sequence Time-series Forecasting (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 3884 citations
Zhou et al.
Utility Is In The Eye Of The User: A Critique Of NLP Leaderboards (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 127 citations
Kawin Ethayarajh, Dan Jurafsky
Training With Quantization Noise For Extreme Model Compression (2020) • Arxiv • 113 citations
Fan et al.
On The Effect Of Dropping Layers Of Pre-trained Transformer Models (2020) • Computer Speech & Language • 57 citations
Sajjad et al.
Tinylstms: Efficient Neural Speech Enhancement For Hearing Aids (2020) • Interspeech 2020 • 105 citations
Fedorov et al.
More Grounded Image Captioning By Distilling Image-text Matching Model (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Zhou et al.
Ternarybert: Distillation-aware Ultra-low Bit BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 140 citations
Zhang et al.
Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020) • Arxiv • 46 citations
Zhang et al.
Conformer: Convolution-augmented Transformer For Speech Recognition (2020) • Interspeech 2020 • 1880 citations
Gulati et al.
Improving Efficient Neural Ranking Models With Cross-architecture Knowledge Distillation (2020) • Arxiv • 64 citations
Hofstätter et al.
It's Not Just Size That Matters: Small Language Models Are Also Few-shot Learners (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 234 citations
Timo Schick, Hinrich Schütze
Compressing BERT: Studying The Effects Of Weight Pruning On Transfer Learning (2020) • Proceedings of the 5th Workshop on Representation Learning for NLP • 78 citations
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Autostr: Efficient Backbone Search For Scene Text Recognition (2020) • Lecture Notes in Computer Science • 48 citations
Zhang et al.
Train No Evil: Selective Masking For Task-guided Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Gu et al.
Colbert: Efficient And Effective Passage Search Via Contextualized Late Interaction Over BERT (2020) • Arxiv • 189 citations
Omar Khattab, Matei Zaharia
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 84 citations
Shi et al.
Deberta: Decoding-enhanced BERT With Disentangled Attention (2020) • Arxiv • 412 citations
He et al.
Distilled One-shot Federated Learning (2020) • Arxiv • 73 citations
Zhou et al.
Creating Something From Nothing: Unsupervised Knowledge Distillation For Cross-modal Hashing (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Hu et al.
Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 54 citations
Nils Reimers, Iryna Gurevych
Language-guided Navigation Via Cross-modal Grounding And Alternate Adversarial Learning (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 61 citations
Zhang et al.
Sparse, Dense, And Attentional Representations For Text Retrieval (2020) • Transactions of the Association for Computational Linguistics • 154 citations
Luan et al.
Compressing Large-scale Transformer-based Models: A Case Study On BERT (2020) • Transactions of the Association for Computational Linguistics • 102 citations
Ganesh et al.
Fashionbert: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 114 citations
Gao et al.
Modularized Transfomer-based Ranking Framework (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 50 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Deebert: Dynamic Early Exiting For Accelerating BERT Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 265 citations
Xin et al.
Efficient Second-order Treecrf For Neural Dependency Parsing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 88 citations
Yu Zhang, Zhenghua Li, Min Zhang
Convbert: Improving BERT With Span-based Dynamic Convolution (2020) • Arxiv • 118 citations
Jiang et al.
Slotrefine: A Fast Non-autoregressive Model For Joint Intent Detection And Slot Filling (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 71 citations
Wu et al.
Unitrans: Unifying Model Transfer And Data Transfer For Cross-lingual Named Entity Recognition With Unlabeled Data (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 45 citations
Wu et al.
Efficient Transformers: A Survey (2020) • ACM Computing Surveys • 532 citations
Tay et al.
Standing On The Shoulders Of Giants: Hardware And Neural Architecture Co-search With Hot Start (2020) • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems • 70 citations
Jiang et al.
When BERT Plays The Lottery, All Tickets Are Winning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Sai Prasanna, Anna Rogers, Anna Rumshisky
Binarybert: Pushing The Limit Of BERT Quantization (2020) • Arxiv • 45 citations
Bai et al.
Sparterm: Learning Term-based Sparse Representation For Fast Text Retrieval (2020) • Arxiv • 59 citations
Bai et al.
Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) • Interspeech 2020 • 57 citations
Yang et al.
Fastspeech 2: Fast And High-quality End-to-end Text To Speech (2020) • Arxiv • 514 citations
Ren et al.
Knowledge Distillation For Multi-task Learning (2020) • Lecture Notes in Computer Science • 52 citations
Wei-Hong Li, Hakan Bilen
BERT Loses Patience: Fast And Robust Inference With Early Exit (2020) • Arxiv • 45 citations
Zhou et al.
The Cost Of Training NLP Models: A Concise Overview (2020) • Arxiv • 114 citations
Or Sharir, Barak Peleg, Yoav Shoham
LSQ+: Improving Low-bit Quantization Through Learnable Offsets And Better Initialization (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 190 citations
Bhalgat et al.
Structure-level Knowledge Distillation For Multilingual Sequence Labeling (2020) • Proceedings of the Web Conference 2021 • 172 citations
Wang et al.
Minilmv2: Multi-head Self-attention Relation Distillation For Compressing Pretrained Transformers (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 149 citations
Wang et al.
HAT: Hardware-aware Transformers For Efficient Natural Language Processing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Wang et al.
Developing RNN-T Models Surpassing High-performance Hybrid Models With Customization Capability (2020) • Interspeech 2020 • 96 citations
Li et al.
Byte Pair Encoding Is Suboptimal For Language Model Pretraining (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 69 citations
Kaj Bostrom, Greg Durrett
Characterbert: Reconciling Elmo And BERT For Word-level Open-vocabulary Representations From Characters (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 118 citations
Boukkouri et al.
A Study Of Non-autoregressive Model For Sequence Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Ren et al.
Aligntts: Efficient Feed-forward Text-to-speech System Without Explicit Alignment (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Zeng et al.
Template Guided Text Generation For Task-oriented Dialogue (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 55 citations
Mihir Kale, Abhinav Rastogi
Gradient Vaccine: Investigating And Improving Multi-task Optimization In Massively Multilingual Models (2020) • Arxiv • 59 citations
Wang et al.
Efficient Neural Query Auto Completion (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Wang et al.
Transformer-based Online Ctc/attention End-to-end Speech Recognition Architecture (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 68 citations
Miao et al.
Pre-training Tasks For Embedding-based Large-scale Retrieval (2020) • Arxiv • 102 citations
Chang et al.
Adabert: Task-adaptive BERT Compression With Differentiable Neural Architecture Search (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 70 citations
Chen et al.
Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 127 citations
Chen et al.
IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 391 citations
Chen et al.
Multispeech: Multi-speaker Text To Speech With Transformer (2020) • Interspeech 2020 • 52 citations
Chen et al.
Mintl: Minimalist Transfer Learning For Task-oriented Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 131 citations
Lin et al.
Efficient Content-based Sparse Attention With Routing Transformers (2020) • Transactions of the Association for Computational Linguistics • 266 citations
Roy et al.
Improving Post Training Neural Quantization: Layer-wise Calibration And Integer Programming (2020) • Arxiv • 75 citations
Hubara et al.
ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning For Session-based Recommendation (2020) • Fourteenth ACM Conference on Recommender Systems • 49 citations
Fei Mi, Xiaoyu Lin, Boi Faltings
Pre-trained Summarization Distillation (2020) • Arxiv • 57 citations
Sam Shleifer, Alexander M. Rush
Hifi-gan: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020) • Arxiv • 738 citations
Jungil Kong, Jaehyeon Kim, Jaekyoung Bae
Ladabert: Lightweight Adaptation Of BERT Through Hybrid Model Compression (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 40 citations
Mao et al.
Rethinking Attention With Performers (2020) • Arxiv • 122 citations
Choromanski et al.
Memory-efficient Pipeline-parallel DNN Training (2020) • Arxiv • 60 citations
Narayanan et al.
ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators (2020) • Arxiv • 541 citations
Clark et al.
Norm-based Curriculum Learning For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 96 citations
Liu et al.
Multi-head Attention: Collaborate Instead Of Concatenate (2020) • Arxiv • 76 citations
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
Enhancing Extractive Text Summarization With Topic-aware Graph Neural Networks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 63 citations
Peng Cui, Le Hu, Yuanchao Liu
Hiertrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism In Mobile-edge-cloud Computing (2020) • IEEE Open Journal of the Communications Society • 65 citations
Liu et al.
Expansion Via Prediction Of Importance With Contextualization (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 64 citations
MacAvaney et al.
Leveraging Unpaired Text Data For Training End-to-end Speech-to-intent Systems (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 56 citations
Huang et al.
Contrastive Distillation On Intermediate Representations For Language Model Compression (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Sun et al.
Mobilebert: A Compact Task-agnostic BERT For Resource-limited Devices (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 593 citations
Sun et al.
A Streaming On-device End-to-end Model Surpassing Server-side Conventional Model Quality And Latency (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 200 citations
Sainath et al.
Personality Trait Detection Using Bagged SVM Over BERT Word Embedding Ensembles (2020) • Proceedings of the Fourth Widening Natural Language Processing Workshop (2020) • 52 citations
Kazameini et al.
Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) • Lecture Notes in Computer Science • 78 citations
Ma et al.
Bridging Textual And Tabular Data For Cross-domain Text-to-sql Semantic Parsing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 145 citations
Xi Victoria Lin, Richard Socher, Caiming Xiong
Podnet: Pooled Outputs Distillation For Small-tasks Incremental Learning (2020) • Lecture Notes in Computer Science • 48 citations
Douillard et al.
Self-training Improves Pre-training For Natural Language Understanding (2020) • Arxiv • 46 citations
Du et al.
Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 46 citations
Inaguma et al.
Differentiable Reasoning Over A Virtual Knowledge Base (2020) • Arxiv • 44 citations
Dhingra et al.
Adapterdrop: On The Efficiency Of Adapters In Transformers (2020) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 61 citations
Rücklé et al.
Understanding And Improving Lexical Choice In Non-autoregressive Translation (2020) • Arxiv • 44 citations
Ding et al.
Gector -- Grammatical Error Correction: Tag, Not Rewrite (2020) • Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications • 103 citations
Omelianchuk et al.
Spacenet: Make Free Space For Continual Learning (2020) • Neurocomputing • 56 citations
Ghada Sokar, Decebal Constantin Mocanu, Mykola Pechenizkiy
Playing The Lottery With Rewards And Multiple Languages: Lottery Tickets In RL And NLP (2019) • Arxiv • 77 citations
Yu et al.
Small And Practical BERT Models For Sequence Labeling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 117 citations
Tsai et al.
Durian: Duration Informed Attention Network For Multimodal Synthesis (2019) • Arxiv • 94 citations
Yu et al.
Multifit: Efficient Multi-lingual Language Model Fine-tuning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Eisenschlos et al.
Practice On Long Sequential User Behavior Modeling For Click-through Rate Prediction (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 198 citations
Pi et al.
Structural Supervision Improves Learning Of Non-local Grammatical Dependencies (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 61 citations
Wilcox et al.
Structured Query Construction Via Knowledge Graph Embedding (2019) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 72 citations
Wang et al.
Hierarchical Temporal Convolutional Networks For Dynamic Recommender Systems (2019) • The World Wide Web Conference • 110 citations
You et al.
Self-attention Aligner: A Latency-control End-to-end Model For ASR Using Self-attention Network And Chunk-hopping (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 100 citations
Linhao Dong, Feng Wang, Bo Xu
Patient Knowledge Distillation For BERT Model Compression (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 85 citations
Sun et al.
Videobert: A Joint Model For Video And Language Representation Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 1077 citations
Sun et al.
Fast Structured Decoding For Sequence Models (2019) • Arxiv • 61 citations
Sun et al.
Parameter-efficient Transfer Learning For NLP (2019) • Arxiv • 144 citations
Houlsby et al.
Analyzing Multi-head Self-attention: Specialized Heads Do The Heavy Lifting, The Rest Can Be Pruned (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 494 citations
Voita et al.
Optimizing Multi-gpu Parallelization Strategies For Deep Learning Training (2019) • IEEE Micro • 70 citations
Pal et al.
Commonsense Knowledge Mining From Pretrained Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 264 citations
Joshua Feldman, Joe Davison, Alexander M. Rush
Reasoning Over Paragraph Effects In Situations (2019) • Proceedings of the 2nd Workshop on Machine Reading for Question Answering • 89 citations
Lin et al.
Mixture Content Selection For Diverse Sequence Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi
Temporal Convolution For Real-time Keyword Spotting On Mobile Devices (2019) • Interspeech 2019 • 143 citations
Choi et al.
Distilling Task-specific Knowledge From BERT Into Simple Neural Networks (2019) • Arxiv • 337 citations
Tang et al.
Well-read Students Learn Better: On The Importance Of Pre-training Compact Models (2019) • Arxiv • 428 citations
Turc et al.
Transformer-xl: Attentive Language Models Beyond A Fixed-length Context (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1694 citations
Dai et al.
CEDR: Contextualized Embeddings For Document Ranking (2019) • SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 186 citations
MacAvaney et al.
Data Diversification: A Simple Strategy For Neural Machine Translation (2019) • Arxiv • 44 citations
Nguyen et al.
Encode, Tag, Realize: High-precision Text Editing (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Malmi et al.
Tripping Through Time: Efficient Localization Of Activities In Videos (2019) • Arxiv • 41 citations
Hahn et al.
Energy And Policy Considerations For Deep Learning In NLP (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 423 citations
Emma Strubell, Ananya Ganesh, Andrew McCallum
Adaptively Sparse Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 81 citations
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
Multilingual Neural Machine Translation With Knowledge Distillation (2019) • Arxiv • 129 citations
Tan et al.
White-to-black: Efficient Distillation Of Black-box Adversarial Attacks (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 45 citations
Gil et al.
MASTER: Multi-aspect Non-local Network For Scene Text Recognition (2019) • Pattern Recognition • 171 citations
Lu et al.
SMART: Robust And Efficient Fine-tuning For Pre-trained Natural Language Models Through Principled Regularized Optimization (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Jiang et al.
Improving Relation Extraction By Pre-trained Language Representations (2019) • Proceedings of AKBC 2019 • 53 citations
Christoph Alt, Marc Hübner, Leonhard Hennig
Improving Neural Network Quantization Without Retraining Using Outlier Channel Splitting (2019) • Arxiv • 150 citations
Zhao et al.
Deep Equilibrium Models (2019) • Arxiv • 245 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Zero: Memory Optimizations Toward Training Trillion Parameter Models (2019) • Arxiv • 72 citations
Rajbhandari et al.
Q-BERT: Hessian Based Ultra Low Precision Quantization Of BERT (2019) • AAAI 2020 • 52 citations
Shen et al.
Large Memory Layers With Product Keys (2019) • Arxiv • 50 citations
Lample et al.
Graph Representation Learning Via Hard And Channel-wise Attention Networks (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 57 citations
Hongyang Gao, Shuiwang Ji
The State Of Sparsity In Deep Neural Networks (2019) • Arxiv • 436 citations
Trevor Gale, Erich Elsen, Sara Hooker
Structured Pruning Of A Bert-based Question Answering Model (2019) • Arxiv • 72 citations
J. S. McCarley, Rishav Chakravarti, Avirup Sil
Pay Less Attention With Lightweight And Dynamic Convolutions (2019) • Arxiv • 322 citations
Wu et al.
Is BERT Really Robust? A Strong Baseline For Natural Language Attack On Text Classification And Entailment (2019) • Arxiv • 102 citations
Jin et al.
BERT And Pals: Projected Attention Layers For Efficient Adaptation In Multi-task Learning (2019) • Arxiv • 113 citations
Asa Cooper Stickland, Iain Murray
Spherical Text Embedding (2019) • Arxiv • 52 citations
Meng et al.
Reweighted Proximal Pruning For Large-scale Language Representation (2019) • Arxiv • 45 citations
Guo et al.
Speaker Adaptation For Attention-based End-to-end Speech Recognition (2019) • Interspeech 2019 • 40 citations
Meng et al.
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 66 citations
Wang et al.
Performance-efficiency Trade-off Of Low-precision Numerical Formats In Deep Neural Networks (2019) • Proceedings of the Conference for Next Generation Arithmetic 2019 • 63 citations
Carmichael et al.
Non-autoregressive Machine Translation With Auxiliary Regularization (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 153 citations
Wang et al.
Multi-task Feature Learning For Knowledge Graph Enhanced Recommendation (2019) • The World Wide Web Conference • 525 citations
Wang et al.
Are Sixteen Heads Really Better Than One? (2019) • Arxiv • 45 citations
Paul Michel, Omer Levy, Graham Neubig
Simpler And Faster Learning Of Adaptive Policies For Simultaneous Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 77 citations
Zheng et al.
Extracting Multiple-relations In One-pass With Pre-trained Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Review-driven Answer Generation For Product-related Questions In E-commerce (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 50 citations
Chen et al.
Overcoming Long-term Catastrophic Forgetting Through Adversarial Neural Pruning And Synaptic Consolidation (2019) • IEEE Transactions on Neural Networks and Learning Systems • 46 citations
Peng et al.
Convert: Efficient And Accurate Conversational Representations From Transformers (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Henderson et al.
Tinybert: Distilling BERT For Natural Language Understanding (2019) • Arxiv • 134 citations
Jiao et al.
DNNVM: End-to-end Compiler Leveraging Heterogeneous Optimizations On Fpga-based CNN Accelerators (2019) • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems • 80 citations
Xing et al.
Cyclical Annealing Schedule: A Simple Approach To Mitigating KL Vanishing (2019) • Arxiv • 169 citations
Fu et al.
End-to-end Speech Translation With Knowledge Distillation (2019) • Interspeech 2019 • 139 citations
Liu et al.
Low-resource Name Tagging Learned With Weakly Labeled Data (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 86 citations
Cao et al.
Mesh-tensorflow: Deep Learning For Supercomputers (2018) • Arxiv • 52 citations
Shazeer et al.
Compositional Attention Networks For Machine Reasoning (2018) • Arxiv • 132 citations
Drew A. Hudson, Christopher D. Manning
Efficient And Robust Question Answering From Minimal Context Over Documents (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 144 citations
Min et al.
Universal Sentence Encoder (2018) • Arxiv • 1289 citations
Cer et al.
The Price Of Debiasing Automatic Metrics In Natural Language Evaluation (2018) • Arxiv • 43 citations
Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang
Born Again Neural Networks (2018) • Arxiv • 442 citations
Furlanello et al.
Fast Decoding In Sequence Models Using Discrete Latent Variables (2018) • Arxiv • 177 citations
Kaiser et al.
Gpipe: Efficient Training Of Giant Neural Networks Using Pipeline Parallelism (2018) • Arxiv • 236 citations
Huang et al.
Grow And Prune Compact, Fast, And Accurate Lstms (2018) • IEEE Transactions on Computers • 67 citations
Xiaoliang Dai, Hongxu Yin, Niraj K. Jha
A Retrieve-and-edit Framework For Predicting Structured Outputs (2018) • Arxiv • 102 citations
Hashimoto et al.
Adversarial Sampling And Training For Semi-supervised Information Retrieval (2018) • The World Wide Web Conference • 79 citations
Dae Hoon Park, Yi Chang
Rethinking Floating Point For Deep Learning (2018) • Arxiv • 104 citations
Jeff Johnson
Qanet: Combining Local Convolution With Global Self-attention For Reading Comprehension (2018) • Arxiv • 417 citations
Yu et al.
Back-translation-style Data Augmentation For End-to-end ASR (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 95 citations
Hayashi et al.
Efficient Contextualized Representation: Language Model Pruning For Sequence Labeling (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 859 citations
Liu et al.
Von Mises-fisher Loss For Training Sequence To Sequence Models With Continuous Outputs (2018) • Arxiv • 54 citations
Sachin Kumar, Yulia Tsvetkov
Self-attentive Sequential Recommendation (2018) • 2018 IEEE International Conference on Data Mining (ICDM) • 2379 citations
Wang-Cheng Kang, Julian McAuley
Modular Networks: Learning To Decompose Neural Computation (2018) • Arxiv • 40 citations
Louis Kirsch, Julius Kunze, David Barber
Proxylessnas: Direct Neural Architecture Search On Target Task And Hardware (2018) • Arxiv • 284 citations
Han Cai, Ligeng Zhu, Song Han
Alternating Multi-bit Quantization For Recurrent Neural Networks (2018) • Arxiv • 60 citations
Xu et al.
Neural Architecture Optimization (2018) • Arxiv • 431 citations
Luo et al.
Bi-directional Block Self-attention For Fast And Memory-efficient Sequence Modeling (2018) • Arxiv • 77 citations
Shen et al.
Toward Diverse Text Generation With Inverse Reinforcement Learning (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 86 citations
Shi et al.
Efficient Attention: Attention With Linear Complexities (2018) • Arxiv • 88 citations
Shen et al.
Tensor Comprehensions: Framework-agnostic High-performance Machine Learning Abstractions (2018) • Arxiv • 251 citations
Vasilache et al.
Clarinet: Parallel Wave Generation In End-to-end Text-to-speech (2018) • Arxiv • 63 citations
Wei Ping, Kainan Peng, Jitong Chen
Large Scale Distributed Neural Network Training Through Online Distillation (2018) • Arxiv • 152 citations
Anil et al.
Textbugger: Generating Adversarial Text Against Real-world Applications (2018) • Proceedings 2019 Network and Distributed System Security Symposium • 275 citations
Li et al.
Efficient Large-scale Multi-modal Classification (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 145 citations
Kiela et al.
Dpp-net: Device-aware Progressive Search For Pareto-optimal Neural Architectures (2018) • Lecture Notes in Computer Science • 187 citations
Dong et al.
Trellis Networks For Sequence Modeling (2018) • Arxiv • 68 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Sentiment Adaptive End-to-end Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Weiyan Shi, Zhou Yu
Neural Abstractive Text Summarization With Sequence-to-sequence Models (2018) • Arxiv • 68 citations
Shi et al.
Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-experts Layer (2017) • Arxiv • 268 citations
Shazeer et al.
Disan: Directional Self-attention Network For Rnn/cnn-free Language Understanding (2017) • Arxiv • 113 citations
Shen et al.
End-to-end Neural Coreference Resolution (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 84 citations
Lee et al.
Opennmt: Open-source Toolkit For Neural Machine Translation (2017) • Proceedings of ACL 2017, System Demonstrations • 287 citations
Klein et al.
Learning Intrinsic Sparse Structures Within Long Short-term Memory (2017) • Arxiv • 103 citations
Wen et al.
Non-autoregressive Neural Machine Translation (2017) • Arxiv • 449 citations
Gu et al.
Learning To Remember Rare Events (2017) • Arxiv • 238 citations
Kaiser et al.
Discourse-based Objectives For Fast Unsupervised Sentence Representation Learning (2017) • Arxiv • 111 citations
Yacine Jernite, Samuel R. Bowman, David Sontag
Yellowfin And The Art Of Momentum Tuning (2017) • Arxiv • 63 citations
Jian Zhang, Ioannis Mitliagkas
Fast And Accurate Entity Recognition With Iterated Dilated Convolutions (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Strubell et al.
Generalisation In Named Entity Recognition: A Quantitative Analysis (2017) • Computer Speech & Language • 118 citations
Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva
Block-sparse Recurrent Neural Networks (2017) • Arxiv • 96 citations
Sharan Narang, Eric Undersander, Gregory Diamos
Efficient Vector Representation For Documents Through Corruption (2017) • Arxiv • 78 citations
Minmin Chen
Adacomp: Adaptive Residual Gradient Compression For Data-parallel Distributed Training (2017) • Arxiv • 74 citations
Chen et al.
Deep Gradient Compression: Reducing The Communication Bandwidth For Distributed Training (2017) • ICLR 2018 • 645 citations
Lin et al.
Reversible Architectures For Arbitrarily Deep Residual Neural Networks (2017) • Arxiv • 70 citations
Chang et al.
Neuraghe: Exploiting CPU-FPGA Synergies For Efficient And Flexible CNN Inference Acceleration On Zynq Socs (2017) • ACM Transactions on Reconfigurable Technology and Systems • 48 citations
Meloni et al.
Tacotron: Towards End-to-end Speech Synthesis (2017) • Interspeech 2017 • 1567 citations
Wang et al.
Neural Response Generation With Dynamic Vocabularies (2017) • Arxiv • 50 citations
Wu et al.
Attentive Memory Networks: Efficient Machine Reading For Conversational Search (2017) • Proceedings of the 1st International Workshop on Conversational Approaches to Information Retrieval, Tokyo, Japan, August 11, 2017 (CAIR17) • 40 citations
Tom Kenter, Maarten de Rijke
Empower Sequence Labeling With Task-aware Neural Language Model (2017) • Proceedings of the AAAI Conference on Artificial Intelligence • 126 citations
Liu et al.
Neural Speed Reading Via Skim-rnn (2017) • Arxiv • 53 citations
Seo et al.
Learned Optimizers That Scale And Generalize (2017) • Arxiv • 115 citations
Wichrowska et al.
Bpemb: Tokenization-free Pre-trained Subword Embeddings In 275 Languages (2017) • Arxiv • 127 citations
Benjamin Heinzerling, Michael Strube
Multilingual Hierarchical Attention Networks For Document Classification (2017) • Arxiv • 48 citations
Nikolaos Pappas, Andrei Popescu-Belis
Beam Search Strategies For Neural Machine Translation (2017) • Proceedings of the First Workshop on Neural Machine Translation • 235 citations
Markus Freitag, Yaser Al-Onaizan
Hyperbolic Representation Learning For Fast And Efficient Neural Question Answering (2017) • Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining • 53 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Using The Output Embedding To Improve Language Models (2016) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 94 citations
Ofir Press, Lior Wolf
A Fast Unified Model For Parsing And Sentence Understanding (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 229 citations
Bowman et al.
Quasi-recurrent Neural Networks (2016) • Arxiv • 326 citations
Bradbury et al.
A Convolutional Encoder Model For Neural Machine Translation (2016) • Arxiv • 64 citations
Gehring et al.
Multiresolution Recurrent Neural Networks: An Application To Dialogue Response Generation (2016) • Arxiv • 73 citations
Serban et al.
Acceleration Of Deep Neural Network Training With Resistive Cross-point Devices (2016) • Frontiers in Neuroscience • 407 citations
Tayfun Gokmen, Yurii Vlasov
Neuro-symbolic Program Synthesis (2016) • Arxiv • 105 citations
Parisotto et al.
The AMU-UEDIN Submission To The WMT16 News Translation Task: Attention-based NMT Models As Feature Functions In Phrase-based SMT (2016) • Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers • 56 citations
Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich
Compression Of Neural Machine Translation Models Via Pruning (2016) • Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning • 50 citations
Abigail See, Minh-Thang Luong, Christopher D. Manning
Google's Neural Machine Translation System: Bridging The Gap Between Human And Machine Translation (2016) • Arxiv • 5627 citations
Wu et al.
Neural Symbolic Machines: Learning Semantic Parsers On Freebase With Weak Supervision (2016) • Arxiv • 44 citations
Liang et al.
Personalized Speech Recognition On Mobile Devices (2016) • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
McGraw et al.
Scaling Memory-augmented Neural Networks With Sparse Reads And Writes (2016) • Arxiv • 58 citations
Rae et al.

Emergent Abilities 350 papers

Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories (2025) • No Venue
Zhu et al.
Layercake: Token-aware Contrastive Decoding Within Large Language Model Layers (2025) • No Venue
Zhu et al.
Scaling Latent Reasoning Via Looped Language Models (2025) • No Venue
Zhu et al.
Deepresearch Arena: The First Exam Of Llms' Research Abilities Via Seminar-grounded Tasks (2025) • No Venue
Wan et al.
Emergent Hierarchical Reasoning In Llms Through Reinforcement Learning (2025) • No Venue
Wang et al.
The End Of Manual Decoding: Towards Truly End-to-end Language Models (2025) • No Venue
Wang et al.
Game-tars: Pretrained Foundation Models For Scalable Generalist Multimodal Game Agents (2025) • No Venue
Wang et al.
Fostering Video Reasoning Via Next-event Prediction (2025) • No Venue
Wang et al.
Grasp Any Region: Towards Precise, Contextual Pixel Understanding For Multimodal Llms (2025) • No Venue
Wang et al.
Mcp-bench: Benchmarking Tool-using LLM Agents With Complex Real-world Tasks Via MCP Servers (2025) • No Venue
Wang et al.
Variational Reasoning For Language Models (2025) • No Venue
Zhou et al.
Scientists' First Exam: Probing Cognitive Abilities Of MLLM Via Perception, Understanding, And Reasoning (2025) • No Venue
Zhou et al.
Breaking The Exploration Bottleneck: Rubric-scaffolded Reinforcement Learning For General LLM Reasoning (2025) • No Venue
Zhou et al.
Complexfuncbench: Exploring Multi-step And Constrained Function Calling Under Long-context Scenario (2025) • No Venue
Zhong et al.
Inverse Ifeval: Can Llms Unlearn Stubborn Training Conventions To Follow Real Instructions? (2025) • No Venue
Zhang et al.
Sentient Agent As A Judge: Evaluating Higher-order Social Cognition In Large Language Models (2025) • No Venue
Zhang et al.
First Return, Entropy-eliciting Explore (2025) • No Venue
Zheng et al.
The Stochastic Parrot On Llm's Shoulder: A Summative Assessment Of Physical Concept Understanding (2025) • No Venue
Yu et al.
SIFT: Grounding LLM Reasoning In Contexts Via Stickers (2025) • No Venue
Zeng et al.
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks (2025) • No Venue
Zhang et al.
Ovis-u1 Technical Report (2025) • No Venue
Wang et al.
Worldpm: Scaling Human Preference Modeling (2025) • No Venue
Wang et al.
Ggbench: A Geometric Generative Reasoning Benchmark For Unified Multimodal Models (2025) • No Venue
Wei et al.
Univideo: Unified Understanding, Generation, And Editing For Videos (2025) • No Venue
Wei et al.
Video Models Are Zero-shot Learners And Reasoners (2025) • No Venue
Wiedemer et al.
LAPO: Internalizing Reasoning Efficiency Via Length-adaptive Policy Optimization (2025) • No Venue
Wu et al.
Agent0: Unleashing Self-evolving Agents From Zero Data Via Tool-integrated Reasoning (2025) • No Venue
Xia et al.
Aworld: Dynamic Multi-agent System With Stable Maneuvering For Robust GAIA Problem Solving (2025) • No Venue
Xie et al.
Easyedit2: An Easy-to-use Steering Framework For Editing Large Language Models (2025) • No Venue
Xu et al.
Qwen3-omni Technical Report (2025) • No Venue
Xu et al.
Step-audio-editx Technical Report (2025) • No Venue
Yan et al.
Recitation Over Reasoning: How Cutting-edge Language Models Can Fail On Elementary School-level Reasoning Problems? (2025) • No Venue
Yan et al.
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search (2025) • No Venue
Yang et al.
Reasonflux: Hierarchical LLM Reasoning Via Scaling Thought Templates (2025) • No Venue
Yang et al.
Twinmarket: A Scalable Behavioral And Social Simulation For Financial Markets (2025) • No Venue
Yang et al.
Spin-bench: How Well Do Llms Plan Strategically And Reason Socially? (2025) • No Venue
Yao et al.
LIMO: Less Is More For Reasoning (2025) • No Venue
Ye et al.
Shapellm-omni: A Native Multimodal LLM For 3D Generation And Understanding (2025) • No Venue
Ye et al.
Grokking In The Wild: Data Augmentation For Real-world Multi-hop Reasoning With Transformers (2025) • No Venue
Roman Abramov, Felix Steinbauer, Gjergji Kasneci
Rethinking Reflection In Pre-training (2025) • No Venue
Ai et al.
CRANE: Reasoning With Constrained LLM Generation (2025) • No Venue
Banerjee et al.
KV Cache Steering For Inducing Reasoning In Small Language Models (2025) • No Venue
Belitsky et al.
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study (2025) • No Venue
Cai et al.
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis (2025) • No Venue
Chen et al.
FINEREASON: Evaluating And Improving Llms' Deliberate Reasoning Through Reflective Puzzle Solving (2025) • No Venue
Chen et al.
Evolve The Method, Not The Prompts: Evolutionary Synthesis Of Jailbreak Attacks On Llms (2025) • No Venue
Chen et al.
Longpo: Long Context Self-evolution Of Large Language Models Through Short-to-long Preference Optimization (2025) • No Venue
Chen et al.
Persona Vectors: Monitoring And Controlling Character Traits In Language Models (2025) • No Venue
Chen et al.
Tivibench: Benchmarking Think-in-video Reasoning For Video Generative Models (2025) • No Venue
Chen et al.
WEAVE: Unleashing And Benchmarking The In-context Interleaved Comprehension And Generation (2025) • No Venue
Chow et al.
Modifying Large Language Model Post-training For Diverse Creative Writing (2025) • No Venue
Chung et al.
Gemini 2.5: Pushing The Frontier With Advanced Reasoning, Multimodality, Long Context, And Next Generation Agentic Capabilities (2025) • No Venue
Comanici et al.
Ebt-policy: Energy Unlocks Emergent Physical Reasoning Capabilities (2025) • No Venue
Davies et al.
Emerging Properties In Unified Multimodal Pretraining (2025) • No Venue
Deng et al.
SONAR-LLM: Autoregressive Transformer That Thinks In Sentence Embeddings And Speaks In Tokens (2025) • No Venue
Dragunov et al.
Deepresearch Bench: A Comprehensive Benchmark For Deep Research Agents (2025) • No Venue
Du et al.
Virgo: A Preliminary Exploration On Reproducing O1-like MLLM (2025) • No Venue
Du et al.
Creation-mmbench: Assessing Context-aware Creative Intelligence In MLLM (2025) • No Venue
Fang et al.
Cognitive Kernel-pro: A Framework For Deep Research Agents And Agent Foundation Models Training (2025) • No Venue
Fang et al.
Robix: A Unified Model For Robot Interaction, Reasoning And Planning (2025) • No Venue
Fang et al.
On Path To Multimodal Generalist: General-level And General-bench (2025) • No Venue
Fei et al.
Retool: Reinforcement Learning For Strategic Tool Use In Llms (2025) • No Venue
Feng et al.
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-confident Even When They Are Wrong (2025) • No Venue
Fu et al.
Could Thinking Multilingually Empower LLM Reasoning? (2025) • No Venue
Gao et al.
Intuitive Physics Understanding Emerges From Self-supervised Pretraining On Natural Videos (2025) • No Venue
Garrido et al.
Inverse Scaling In Test-time Compute (2025) • No Venue
Gema et al.
Great Models Think Alike And This Undermines AI Oversight (2025) • No Venue
Goel et al.
Thinkmorph: Emergent Properties In Multimodal Interleaved Chain-of-thought Reasoning (2025) • No Venue
Gu et al.
Rstar-math: Small Llms Can Master Math Reasoning With Self-evolved Deep Thinking (2025) • No Venue
Guan et al.
Audiostory: Generating Long-form Narrative Audio With Large Language Models (2025) • No Venue
Guo et al.
Learning To See Before Seeing: Demystifying LLM Visual Priors From Language Pre-training (2025) • No Venue
Han et al.
Can Large Language Models Detect Errors In Long Chain-of-thought Reasoning? (2025) • No Venue
He et al.
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team (2025) • No Venue
Hosain et al.
Group Think: Multiple Concurrent Reasoning Agents Collaborating At Token Level Granularity (2025) • No Venue
Hsu et al.
Beyond 'aha!': Toward Systematic Meta-abilities Alignment In Large Reasoning Models (2025) • No Venue
Hu et al.
Llms Learn To Deceive Unintentionally: Emergent Misalignment In Dishonesty From Misaligned Samples To Biased Human-ai Interactions (2025) • No Venue
Hu et al.
The Imitation Game: Turing Machine Imitator Is Length Generalizable Reasoner (2025) • No Venue
Hua et al.
Video-mmmu: Evaluating Knowledge Acquisition From Multi-discipline Professional Videos (2025) • No Venue
Hu et al.
R-zero: Self-evolving Reasoning LLM From Zero Data (2025) • No Venue
Huang et al.
Vchain: Chain-of-visual-thought For Reasoning In Video Generation (2025) • No Venue
Huang et al.
Reasoning Model Is Stubborn: Diagnosing Instruction Overriding In Reasoning Models (2025) • No Venue
Jang et al.
Feedback Friction: Llms Struggle To Fully Incorporate External Feedback (2025) • No Venue
Jiang et al.
Omnihuman-1.5: Instilling An Active Mind In Avatars Via Cognitive Simulation (2025) • No Venue
Jiang et al.
Why Language Models Hallucinate (2025) • No Venue
Kalai et al.
Reasoning With Sampling: Your Base Model Is Smarter Than You Think (2025) • No Venue
Aayush Karan, Yilun Du
Universal Reasoner: A Single, Composable Plug-and-play Reasoner For Frozen Llms (2025) • No Venue
Kim et al.
The Cot Encyclopedia: Analyzing, Predicting, And Controlling How A Reasoning Model Will Think (2025) • No Venue
Lee et al.
Evolving Deeper LLM Thinking (2025) • No Venue
Lee et al.
Analysing Chain Of Thought Dynamics: Active Guidance Or Unfaithful Post-hoc Rationalisation? (2025) • No Venue
Lewis-Lim et al.
Baichuan-omni-1.5 Technical Report (2025) • No Venue
Li et al.
Diffusion Language Models Know The Answer Before Decoding (2025) • No Venue
Li et al.
Llms Can Easily Learn To Reason From Demonstrations Structure, Not Content, Is What Matters! (2025) • No Venue
Li et al.
Where To Find Grokking In LLM Pretraining? Monitor Memorization-to-generalization Without Test (2025) • No Venue
Ziyue Li, Chenrui Fan, Tianyi Zhou
ROVER: Benchmarking Reciprocal Cross-modal Reasoning For Omnimodal Generation (2025) • No Venue
Liang et al.
Implicit Reasoning In Transformers Is Reasoning Through Shortcuts (2025) • No Venue
Lin et al.
Metaladder: Ascending Mathematical Solution Quality Via Analogical-problem Reasoning Transfer (2025) • No Venue
Lin et al.
Understanding Tool-integrated Reasoning (2025) • No Venue
Heng Lin, Zhongwen Xu
Deciphering Trajectory-aided LLM Reasoning: An Optimization Perspective (2025) • No Venue
Liu et al.
Infiguiagent: A Multimodal Generalist GUI Agent With Native Reasoning And Reflection (2025) • No Venue
Liu et al.
Visual-rft: Visual Reinforcement Fine-tuning (2025) • No Venue
Liu et al.
Olmotrace: Tracing Language Model Outputs Back To Trillions Of Training Tokens (2025) • No Venue
Liu et al.
New Trends For Modern Machine Translation With Large Reasoning Models (2025) • No Venue
Liu et al.
Learning From Peers In Reasoning Models (2025) • No Venue
Luo et al.
Large Language Model Agent: A Survey On Methodology, Applications And Challenges (2025) • No Venue
Luo et al.
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, And Control In Spaces (2025) • No Venue
Luo et al.
Deepseek-r1 Thoughtology: Let's Think About LLM Reasoning (2025) • No Venue
Marjanović et al.
Holocine: Holistic Generation Of Cinematic Multi-shot Long Video Narratives (2025) • No Venue
Meng et al.
Exploring The Latent Capacity Of Llms For One-step Text Generation (2025) • No Venue
Gleb Mezentsev, Ivan Oseledets
Benchmarking Llms' Swarm Intelligence (2025) • No Venue
Ruan et al.
How Do Llms Acquire New Knowledge? A Knowledge Circuits Perspective On Continual Pre-training (2025) • No Venue
Ou et al.
Large Language Models Think Too Fast To Explore Effectively (2025) • No Venue
Lan Pan, Hanbo Xie, Robert C. Wilson
Thinking Sparks!: Emergent Attention Heads In Reasoning Models During Post Training (2025) • No Venue
Yein Park, Minbyul Jeong, Jaewoo Kang
RWKV-7 "Goose" With Expressive Dynamic State Evolution (2025) • No Venue
Peng et al.
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities (2025) • No Venue
Qi et al.
Llm-microscope: Uncovering The Hidden Role Of Punctuation In Context Memory Of Transformers (2025) • No Venue
Razzhigaev et al.
RL + Transformer = A General-purpose Problem Solver (2025) • No Venue
Micah Rentschler, Jesse Roberts
Hogwild! Inference: Parallel LLM Generation Via Concurrent Attention (2025) • No Venue
Rodionov et al.
Training Language Models For Social Deduction With Multi-agent Reinforcement Learning (2025) • No Venue
Sarkar et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning With Counterexample Creation (2025) • No Venue
Sinha et al.
The Illusion Of Diminishing Returns: Measuring Long Horizon Execution In Llms (2025) • No Venue
Sinha et al.
Auto-regressive Vs Flow-matching: A Comparative Study Of Modeling Paradigms For Text-to-music Generation (2025) • No Venue
Or Tal, Felix Kreuk, Yossi Adi
Enabling Scalable Oversight Via Self-evolving Critic (2025) • No Venue
Tang et al.
PAN: A World Model For General, Interactable, And Long-horizon World Simulation (2025) • No Venue
Team et al.
Step-audio-r1 Technical Report (2025) • No Venue
Tian et al.
From Words To Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-context Examples (2024) • No Venue
Vacareanu et al.
Agent Workflow Memory (2024) • No Venue
Wang et al.
Gpt-4o System Card (2024) • No Venue
OpenAI et al.
Phi-3 Technical Report: A Highly Capable Language Model Locally On Your Phone (2024) • No Venue
Abdin et al.
LLM Augmented Llms: Expanding Capabilities Through Composition (2024) • No Venue
Bansal et al.
Text2sql Is Not Enough: Unifying AI And Databases With TAG (2024) • No Venue
Biswal et al.
Transformers Meet Neural Algorithmic Reasoners (2024) • No Venue
Bounsi et al.
Genie: Generative Interactive Environments (2024) • No Venue
Bruce et al.
Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) • No Venue
Bu et al.
Law Of The Weakest Link: Cross Capabilities Of Large Language Models (2024) • No Venue
Zhong et al.
Chatmusician: Understanding And Generating Music Intrinsically With LLM (2024) • No Venue
Yuan et al.
How Do Large Language Models Acquire Factual Knowledge During Pretraining? (2024) • No Venue
Chang et al.
Premise Order Matters In Reasoning With Large Language Models (2024) • No Venue
Chen et al.
Mora: Enabling Generalist Video Generation Via A Multi-agent Framework (2024) • No Venue
Yuan et al.
Symbolicai: A Framework For Logic-based Approaches Combining Generative Models And Solvers (2024) • No Venue
Dinu et al.
ALPINE: Unveiling The Planning Capability Of Autoregressive Learning In Language Models (2024) • No Venue
Wang et al.
Emu3: Next-token Prediction Is All You Need (2024) • No Venue
Wang et al.
Mtu-bench: A Multi-granularity Tool-use Benchmark For Large Language Models (2024) • No Venue
Wang et al.
Scaling Instructable Agents Across Many Simulated Worlds (2024) • No Venue
Team et al.
MIO: A Foundation Model On Multimodal Tokens (2024) • No Venue
Wang et al.
Loong: Generating Minute-level Long Videos With Autoregressive Language Models (2024) • No Venue
Wang et al.
Hermes 3 Technical Report (2024) • No Venue
Ryan Teknium, Jeffrey Quesnelle, Chen Guang
Mdpo: Conditional Preference Optimization For Multimodal Large Language Models (2024) • No Venue
Wang et al.
Mixture-of-agents Enhances Large Language Model Capabilities (2024) • No Venue
Wang et al.
Knowledge Mechanisms In Large Language Models: A Survey And Perspective (2024) • No Venue
Wang et al.
MMAU: A Massive Multi-task Audio Understanding And Reasoning Benchmark (2024) • No Venue
Sakshi et al.
BASE TTS: Lessons From Building A Billion-parameter Text-to-speech Model On 100K Hours Of Data (2024) • No Venue
Łajszczak et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts At Scale (2024) • No Venue
Zhou et al.
Can Llms Generate Novel Research Ideas? A Large-scale Human Study With 100+ NLP Researchers (2024) • No Venue
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Are Your Llms Capable Of Stable Reasoning? (2024) • No Venue
Liu et al.
Deliberation In Latent Space Via Differentiable Cache Augmentation (2024) • No Venue
Liu et al.
Oryx MLLM: On-demand Spatial-temporal Understanding At Arbitrary Resolution (2024) • No Venue
Liu et al.
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) • No Venue
Xi et al.
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model (2024) • No Venue
Zhang et al.
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery (2024) • No Venue
Lu et al.
Generative World Explorer (2024) • No Venue
Lu et al.
Mathverse: Does Your Multi-modal LLM Truly See The Diagrams In Visual Math Problems? (2024) • No Venue
Zhang et al.
Divide-or-conquer? Which Part Should You Distill Your LLM? (2024) • No Venue
Wu et al.
Large Language Models Surpass Human Experts In Predicting Neuroscience Results (2024) • Nature Human Behaviour • 57 citations
Luo et al.
Thinking Llms: General Instruction Following With Thought Generation (2024) • No Venue
Wu et al.
Meta-rewarding Language Models: Self-improving Alignment With Llm-as-a-meta-judge (2024) • No Venue
Wu et al.
Transformers Can Do Arithmetic With The Right Embeddings (2024) • No Venue
McLeish et al.
Whiteboard-of-thought: Thinking Step-by-step Across Modalities (2024) • No Venue
Sachit Menon, Richard Zemel, Carl Vondrick
Realm: Reference Resolution As Language Modeling (2024) • No Venue
Moniz et al.
MALT: Improving Reasoning With Multi-agent LLM Training (2024) • No Venue
Motwani et al.
Olmoe: Open Mixture-of-experts Language Models (2024) • No Venue
Muennighoff et al.
Dynasaur: Large Language Agents Beyond Predefined Actions (2024) • No Venue
Nguyen et al.
Llms Know More Than They Show: On The Intrinsic Representation Of LLM Hallucinations (2024) • No Venue
Orgad et al.
Nemotron-4 15B Technical Report (2024) • No Venue
Parmar et al.
Movie Gen: A Cast Of Media Foundation Models (2024) • No Venue
Polyak et al.
Mutual Reasoning Makes Smaller Llms Stronger Problem-solvers (2024) • No Venue
Qi et al.
We-math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (2024) • No Venue
Qiao et al.
Self-discover: Large Language Models Self-compose Reasoning Structures (2024) • No Venue
Zhou et al.
Humanoid Locomotion As Next Token Prediction (2024) • No Venue
Radosavovic et al.
Hellobench: Evaluating Long Text Generation Capabilities Of Large Language Models (2024) • No Venue
Que et al.
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning With Checklist (2024) • No Venue
Zhou et al.
Long-form Factuality In Large Language Models (2024) • No Venue
Wei et al.
Needle Threading: Can Llms Follow Threads Through Near-million-scale Haystacks? (2024) • No Venue
Jonathan Roberts, Kai Han, Samuel Albanie
Direct Nash Optimization: Teaching Language Models To Self-improve With General Preferences (2024) • No Venue
Rosset et al.
Hyperclova X Technical Report (2024) • No Venue
Yoo et al.
The Llama 3 Herd Of Models (2024) • No Venue
Dubey et al.
Not All Language Model Features Are Linear (2024) • No Venue
Engels et al.
Mplug-owl3: Towards Long Image-sequence Understanding In Multi-modal Large Language Models (2024) • No Venue
Ye et al.
VITA: Towards Open-source Interactive Omni Multimodal LLM (2024) • No Venue
Fu et al.
Stream Of Search (SoS): Learning To Search In Language (2024) • No Venue
Gandhi et al.
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities (2024) • No Venue
Ghosh et al.
Mulberry: Empowering MLLM With O1-like Reasoning And Reflection Via Collective Monte Carlo Tree Search (2024) • No Venue
Yao et al.
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information? (2024) • No Venue
Gong et al.
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2024) • No Venue
Grosnit et al.
Spotting Llms With Binoculars: Zero-shot Detection Of Machine-generated Text (2024) • No Venue
Hans et al.
Training Large Language Models To Reason In A Continuous Latent Space (2024) • No Venue
Hao et al.
Data Mixture Inference: What Do BPE Tokenizers Reveal About Their Training Data? (2024) • No Venue
Hayase et al.
Mmworld: Towards Multi-discipline Multi-faceted World Model Evaluation In Videos (2024) • No Venue
He et al.
Do Large Language Models Latently Perform Multi-hop Reasoning? (2024) • No Venue
Yang et al.
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use (2024) • No Venue
Hu et al.
Visual Sketchpad: Sketching As A Visual Chain Of Thought For Multimodal Language Models (2024) • No Venue
Hu et al.
Genmac: Compositional Text-to-video Generation With Multi-agent Collaboration (2024) • No Venue
Huang et al.
Smaller Language Models Are Better Instruction Evolvers (2024) • No Venue
Hui et al.
Symdpo: Boosting In-context Learning Of Large Multimodal Models With Symbol Demonstration Direct Preference Optimization (2024) • No Venue
Jia et al.
LLM Maybe Longlm: Self-extend LLM Context Window Without Tuning (2024) • No Venue
Jin et al.
To Believe Or Not To Believe Your LLM (2024) • No Venue
Yadkori et al.
Evaluating Language Models As Synthetic Data Generators (2024) • No Venue
Kim et al.
Stark: Social Long-term Multi-modal Conversation With Persona Commonsense Knowledge (2024) • No Venue
Lee et al.
Materials Science In The Era Of Large Language Models: A Perspective (2024) • Digital Discovery • 49 citations
Ge Lei, Ronan Docherty, Samuel J. Cooper
Common 7B Language Models Already Possess Strong Math Capabilities (2024) • No Venue
Li et al.
More Agents Is All You Need (2024) • No Venue
Li et al.
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window? (2024) • No Venue
Li et al.
Hallucination Is Inevitable: An Innate Limitation Of Large Language Models (2024) • Arxiv • 138 citations
Ziwei Xu, Sanjay Jain, Mohan Kankanhalli
Unbounded: A Generative Infinite Game Of Character Life Simulation (2024) • No Venue
Li et al.
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series (2024) • No Venue
Zhang et al.
Critical Tokens Matter: Token-level Contrastive Estimation Enhances Llm's Reasoning Capability (2024) • No Venue
Lin et al.
Travelplanner: A Benchmark For Real-world Planning With Language Agents (2024) • No Venue
Xie et al.
Demystifying GPT Self-repair For Code Generation (2023) • No Venue
Olausson et al.
System 2 Attention (is Something You Might Need Too) (2023) • No Venue
Jason Weston, Sainbayar Sukhbaatar
Large Language Models Can Infer Psychological Dispositions Of Social Media Users (2023) • PNAS Nexus • 45 citations
Heinrich Peters, Sandra Matz
The Generative AI Paradox: "What It Can Create, It May Not Understand" (2023) • No Venue
West et al.
LMDX: Language Model-based Document Information Extraction And Localization (2023) • No Venue
Perot et al.
Kosmos-2: Grounding Multimodal Large Language Models To The World (2023) • No Venue
Peng et al.
From Word Models To World Models: Translating From Natural Language To The Probabilistic Language Of Thought (2023) • No Venue
Wong et al.
Generative Agents: Interactive Simulacra Of Human Behavior (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 941 citations
Park et al.
LLM360: Towards Fully Transparent Open-source Llms (2023) • No Venue
Liu et al.
Luminate: Structured Generation And Exploration Of Design Space With Large Language Models For Human-ai Co-creation (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 79 citations
Suh et al.
Sensecape: Enabling Multilevel Exploration And Sensemaking With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 89 citations
Suh et al.
Towards Autonomous System: Flexible Modular Production System Enhanced With Large Language Model Agents (2023) • 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA) • 57 citations
Xia et al.
Do Large Language Models Show Decision Heuristics Similar To Humans? A Case Study Using GPT-3.5 (2023) • Journal of Experimental Psychology: General • 46 citations
Suri et al.
Chameleon: Plug-and-play Compositional Reasoning With Large Language Models (2023) • Arxiv • 89 citations
Lu et al.
Unleashing The Emergent Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 48 citations
Wang et al.
Multilingual Machine Translation With Large Language Models: Empirical Results And Analysis (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 56 citations
Zhu et al.
Dissociating Language And Thought In Large Language Models (2023) • Trends in Cognitive Sciences • 191 citations
Mahowald et al.
GPT Has Become Financially Literate: Insights From Financial Literacy Tests Of GPT And A Preliminary Test Of How People Use It As A Source Of Advice (2023) • Finance Research Letters • 58 citations
Paweł Niszczota, Sami Abbas
GAIA: A Benchmark For General AI Assistants (2023) • No Venue
Mialon et al.
Text2kgbench: A Benchmark For Ontology-driven Knowledge Graph Generation From Text (2023) • Lecture Notes in Computer Science • 51 citations
Mihindukulasooriya et al.
Detectgpt: Zero-shot Machine-generated Text Detection Using Probability Curvature (2023) • Arxiv • 151 citations
Mitchell et al.
Orca 2: Teaching Small Language Models How To Reason (2023) • No Venue
Mitra et al.
Levels Of AGI For Operationalizing Progress On The Path To AGI (2023) • No Venue
Morris et al.
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4 (2023) • No Venue
Mukherjee et al.
Bubogpt: Enabling Visual Grounding In Multi-modal Llms (2023) • No Venue
Zhao et al.
Self-refine: Iterative Refinement With Self-feedback (2023) • Arxiv • 202 citations
Madaan et al.
On The Opportunities And Challenges Of Foundation Models For Geospatial Artificial Intelligence (2023) • Arxiv • 63 citations
Mai et al.
A Survey On Multimodal Large Language Models (2023) • National Science Review • 271 citations
Yin et al.
Self-rag: Learning To Retrieve, Generate, And Critique Through Self-reflection (2023) • No Venue
Asai et al.
Tinystories: How Small Can Language Models Be And Still Speak Coherent English? (2023) • No Venue
Ronen Eldan, Yuanzhi Li
JARVIS-1: Open-world Multi-task Agents With Memory-augmented Multimodal Language Models (2023) • No Venue
Wang et al.
Document-level Machine Translation With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 71 citations
Wang et al.
Learning From Mistakes Makes LLM Better Reasoner (2023) • No Venue
An et al.
Llemma: An Open Language Model For Mathematics (2023) • No Venue
Azerbayev et al.
Emergent Autonomous Scientific Research Capabilities Of Large Language Models (2023) • Arxiv • 73 citations
Daniil A. Boiko, Robert MacKnight, Gabe Gomes
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control (2023) • No Venue
Brohan et al.
Sparks Of Artificial General Intelligence: Early Experiments With GPT-4 (2023) • Arxiv • 1480 citations
Bubeck et al.
Mechgpt, A Language-based Strategy For Mechanics And Materials Modeling That Connects Knowledge Across Scales, Disciplines And Modalities (2023) • Applied Mechanics Reviews • 74 citations
Markus J. Buehler
Chatgpt Informed Graph Neural Network For Stock Movement Prediction (2023) • SSRN Electronic Journal • 43 citations
Chen et al.
Beyond Surface: Probing Llama Across Scales And Layers (2023) • No Venue
Chen et al.
How Is Chatgpt's Behavior Changing Over Time? (2023) • No Venue
Lingjiao Chen, Matei Zaharia, James Zou
Skills-in-context Prompting: Unlocking Compositionality In Large Language Models (2023) • No Venue
Chen et al.
When Do You Need Chain-of-thought Prompting For Chatgpt? (2023) • World Wide Web • 190 citations
Chen et al.
Shepherd: A Critic For Language Model Generation (2023) • No Venue
Wang et al.
Is GPT-4 A Good Data Analyst? (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 65 citations
Liying Cheng, Xingxuan Li, Lidong Bing
Contrastive Chain-of-thought Prompting (2023) • No Venue
Chia et al.
Dola: Decoding By Contrasting Layers Improves Factuality In Large Language Models (2023) • No Venue
Chuang et al.
Simple And Controllable Music Generation (2023) • No Venue
Copet et al.
Merlin: Empowering Multimodal Llms With Foresight Minds (2023) • No Venue
Yu et al.
Chatgpt-4 Outperforms Experts And Crowd Workers In Annotating Political Twitter Messages With Zero-shot Learning (2023) • Arxiv • 152 citations
Petter Törnberg
Uncovering Chatgpt's Capabilities In Recommender Systems (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 116 citations
Dai et al.
Chain-of-verification Reduces Hallucination In Large Language Models (2023) • No Venue
Dhuliawala et al.
General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Societal Implications And Responsible Governance (2023) • Information Fusion • 46 citations
Triguero et al.
Dreamllm: Synergistic Multimodal Comprehension And Creation (2023) • No Venue
Dong et al.
Large Language Model For Science: A Study On P Vs. NP (2023) • No Venue
Dong et al.
Ferret: Refer And Ground Anything Anywhere At Any Granularity (2023) • Arxiv • 43 citations
You et al.
Judging Llm-as-a-judge With Mt-bench And Chatbot Arena (2023) • No Venue
Zheng et al.
Exploring Large Language Models' Cognitive Moral Development Through Defining Issues Test (2023) • No Venue
Tanmay et al.
Gptscore: Evaluate As You Desire (2023) • Arxiv • 80 citations
Fu et al.
A Comprehensive Capability Analysis Of GPT-3 And GPT-3.5 Series Models (2023) • Arxiv • 181 citations
Ye et al.
Is Chatgpt A Good Causal Reasoner? A Comprehensive Evaluation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Gao et al.
Large Language Models On Wikipedia-style Survey Generation: An Evaluation In NLP Concepts (2023) • Humanities and Social Sciences Communications • 97 citations
Gao et al.
Mplug-owl2: Revolutionizing Multi-modal Large Language Model With Modality Collaboration (2023) • No Venue
Ye et al.
Gemini: A Family Of Highly Capable Multimodal Models (2023) • Arxiv • 758 citations
Team et al.
Imagebind: One Embedding Space To Bind Them All (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 552 citations
Girdhar et al.
How Far Are Large Language Models From Agents With Theory-of-mind? (2023) • No Venue
Zhou et al.
Expel: LLM Agents Are Experiential Learners (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhao et al.
Textbooks Are All You Need (2023) • No Venue
Gunasekar et al.
PPTC Benchmark: Evaluating Large Language Models For Powerpoint Task Completion (2023) • No Venue
Guo et al.
Deception Abilities Emerged In Large Language Models (2023) • Proceedings of the National Academy of Sciences • 45 citations
Thilo Hagendorff
Machine Psychology (2023) • Arxiv • 61 citations
Hagendorff et al.
From Task Structures To World Models: What Do Llms Know? (2023) • Trends in Cognitive Sciences • 43 citations
Ilker Yildirim, L. A. Paul
Pandagpt: One Model To Instruction-follow Them All (2023) • Arxiv • 46 citations
Su et al.
Evaluating Large Language Models On A Highly-specialized Topic, Radiation Oncology Physics (2023) • Frontiers in Oncology • 112 citations
Holmes et al.
Response: Emergent Analogical Reasoning In Large Language Models (2023) • Nature Human Behaviour • 238 citations
Damian Hodel, Jevin West
Graspgpt: Leveraging Semantic Knowledge From A Large Language Model For Task-oriented Grasping (2023) • IEEE Robotics and Automation Letters • 67 citations
Tang et al.
Harnessing The Power Of Llms In Practice: A Survey On Chatgpt And Beyond (2023) • ACM Transactions on Knowledge Discovery from Data • 303 citations
Yang et al.
Audiogpt: Understanding And Generating Speech, Music, Sound, And Talking Head (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Huang et al.
Contrastive Decoding Improves Reasoning In Large Language Models (2023) • No Venue
Sean O'Brien, Mike Lewis
Give Us The Facts: Enhancing Large Language Models With Knowledge Graphs For Fact-aware Language Modeling (2023) • IEEE Transactions on Knowledge and Data Engineering • 96 citations
Yang et al.
The Dawn Of Lmms: Preliminary Explorations With Gpt-4v(ision) (2023) • Arxiv • 160 citations
Yang et al.
Conceptfusion: Open-set Multimodal 3D Mapping (2023) • Robotics: Science and Systems XIX • 142 citations
Jatavallabhula et al.
Make-a-character: High Quality Text-to-3d Character Generation Within Minutes (2023) • No Venue
Ren et al.
Chain Of Code: Reasoning With A Language Model-augmented Code Emulator (2023) • No Venue
Li et al.
Videodirectorgpt: Consistent Multi-scene Video Generation Via Llm-guided Planning (2023) • No Venue
Lin et al.
Translating Natural Language To Planning Goals With Large-language Models (2023) • Arxiv • 44 citations
Xie et al.
Encouraging Divergent Thinking In Large Language Models Through Multi-agent Debate (2023) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 99 citations
Liang et al.
Learning To Model The World With Language (2023) • No Venue
Lin et al.
MM-VID: Advancing Video Understanding With Gpt-4v(ision) (2023) • No Venue
Lin et al.
Autonomous GIS: The Next-generation Ai-powered GIS (2023) • International Journal of Digital Earth • 99 citations
Zhenlong Li, Huan Ning
Otter: A Multi-modal Model With In-context Instruction Tuning (2023) • Arxiv • 87 citations
Li et al.
Textbooks Are All You Need II: Phi-1.5 Technical Report (2023) • No Venue
Li et al.
Theory Of Mind For Multi-agent Collaboration Via Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 40 citations
Li et al.
Your Diffusion Model Is Secretly A Zero-shot Classifier (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Li et al.
LARP: Language-agent Role Play For Open-world Games (2023) • No Venue
Yan et al.
Prometheus: Inducing Fine-grained Evaluation Capability In Language Models (2023) • No Venue
Kim et al.
Is Chatgpt A General-purpose Natural Language Processing Task Solver? (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 410 citations
Qin et al.
The Troubling Emergence Of Hallucination In Large Language Models -- An Extensive Definition, Quantification, And Prescriptive Remediations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 56 citations
Rawte et al.
Videopoet: A Large Language Model For Zero-shot Video Generation (2023) • No Venue
Kondratyuk et al.
Evaluating Large Language Models In Theory Of Mind Tasks (2023) • Proceedings of the National Academy of Sciences • 83 citations
Michal Kosinski
Chatgpt: Beginning Of An End Of Manual Linguistic Data Annotation? Use Case Of Automatic Genre Identification (2023) • Arxiv • 64 citations
Taja Kuzman, Igor Mozetič, Nikola Ljubešić
LISA: Reasoning Segmentation Via Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Lai et al.
A Systematic Study And Comprehensive Evaluation Of Chatgpt On Benchmark Datasets (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 69 citations
Laskar et al.
Voicebox: Text-guided Multilingual Universal Speech Generation At Scale (2023) • Arxiv • 44 citations
Le et al.
Hugginggpt: Solving AI Tasks With Chatgpt And Its Friends In Hugging Face (2023) • Arxiv • 264 citations
Shen et al.
Teaching Models To Express Their Uncertainty In Words (2022) • Arxiv • 53 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Inner Monologue: Embodied Reasoning Through Planning With Language Models (2022) • Arxiv • 202 citations
Huang et al.
Discovering Language Model Behaviors With Model-written Evaluations (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 47 citations
Perez et al.
A Generalist Agent (2022) • Transactions on Machine Learning Research (11/2022), https://openreview.net/forum?id=1ikK0kHjvj • 60 citations
Reed et al.
Emergent World Representations: Exploring A Sequence Model Trained On A Synthetic Task (2022) • Arxiv • 59 citations
Li et al.
Impact Of Pretraining Term Frequencies On Few-shot Reasoning (2022) • Arxiv • 51 citations
Razeghi et al.
CM3: A Causal Masked Multimodal Model Of The Internet (2022) • Arxiv • 40 citations
Aghajanyan et al.
Using Cognitive Psychology To Understand GPT-3 (2022) • Arxiv • 61 citations
Marcel Binz, Eric Schulz
Gpt-neox-20b: An Open-source Autoregressive Language Model (2022) • Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models • 241 citations
Black et al.
Audiolm: A Language Modeling Approach To Audio Generation (2022) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 252 citations
Borsos et al.
Locating And Editing Factual Associations In GPT (2022) • Arxiv • 172 citations
Meng et al.
Do Large Language Models Know What Humans Know? (2022) • Cognitive Science • 63 citations
Trott et al.
Large Language Models And The Reverse Turing Test (2022) • Neural Computation • 99 citations
Terrence Sejnowski
Emergent Analogical Reasoning In Large Language Models (2022) • Nature Human Behaviour • 238 citations
Taylor Webb, Keith J. Holyoak, Hongjing Lu
Thinking Fast And Slow In Large Language Models (2022) • Nature Computational Science • 135 citations
Thilo Hagendorff, Sarah Fabi, Michal Kosinski
Neural Theory-of-mind? On The Limits Of Social Intelligence In Large Lms (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 81 citations
Sap et al.
Chatgpt: The End Of Online Exam Integrity? (2022) • Arxiv • 349 citations
Teo Susnjak
Language Models Are Few-shot Multilingual Learners (2021) • Proceedings of the 1st Workshop on Multilingual Representation Learning • 46 citations
Winata et al.
Popblends: Strategies For Conceptual Blending With Large Language Models (2021) • CHI '23: CHI Conference on Human Factors in Computing Systems • 43 citations
Wang et al.
Cliport: What And Where Pathways For Robotic Manipulation (2021) • Arxiv • 98 citations
Mohit Shridhar, Lucas Manuelli, Dieter Fox
On The Opportunities And Risks Of Foundation Models (2021) • Arxiv • 2055 citations
Bommasani et al.
Multitask Prompted Training Enables Zero-shot Task Generalization (2021) • Arxiv • 558 citations
Sanh et al.
Fantastically Ordered Prompts And Where To Find Them: Overcoming Few-shot Prompt Order Sensitivity (2021) • Arxiv • 118 citations
Lu et al.
Large Pre-trained Language Models Contain Human-like Biases Of What Is Right And Wrong To Do (2021) • Nature Machine Intelligence • 194 citations
Schramowski et al.
Autoprompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 611 citations
Shin et al.
Mechanisms For Handling Nested Dependencies In Neural-network Language Models And Humans (2020) • Cognition • 55 citations
Lakretz et al.
Negated And Misprimed Probes For Pretrained Language Models: Birds Can Talk, But Cannot Fly (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 69 citations
Nora Kassner, Hinrich Schütze
Emerging Cross-lingual Structure In Pretrained Language Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 46 citations
Wu et al.
One-shot Generalization In Deep Generative Models (2016) • Arxiv • 75 citations
Rezende et al.

EMNLP 285 papers #

The Effect Of Sampling Temperature On Problem Solving In Large Language Models (2024) • Findings of the Association for Computational Linguistics: EMNLP 2024 • 76 citations
Matthew Renze, Erhan Guven
ORPO: Monolithic Preference Optimization Without Reference Model (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 43 citations
Jiwoo Hong, Noah Lee, James Thorne
Monitoring Ai-modified Content At Scale: A Case Study On The Impact Of Chatgpt On AI Conference Peer Reviews (2024) • Arxiv • 59 citations
Liang et al.
Logic-lm: Empowering Large Language Models With Symbolic Solvers For Faithful Logical Reasoning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 52 citations
Pan et al.
Text Classification Via Large Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Sun et al.
Large Language Model Is Not A Good Few-shot Information Extractor, But A Good Reranker For Hard Samples! (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 55 citations
Ma et al.
Sources Of Hallucination By Large Language Models On Inference Tasks (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 71 citations
McKenna et al.
Automatic Prompt Augmentation And Selection With Chain-of-thought From Labeled Data (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 43 citations
Kashun Shum, Shizhe Diao, Tong Zhang
MEGA: Multilingual Evaluation Of Generative AI (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 76 citations
Ahuja et al.
The Internal State Of An LLM Knows When It's Lying (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 68 citations
Amos Azaria, Tom Mitchell
Soulchat: Improving Llms' Empathy, Listening, And Comfort Abilities Through Fine-tuning With Multi-turn Empathy Conversations (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 43 citations
Chen et al.
"kelly Is A Warm Person, Joseph Is A Role Model": Gender Biases In Llm-generated Reference Letters (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 76 citations
Wan et al.
Is GPT-4 A Good Data Analyst? (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 65 citations
Liying Cheng, Xingxuan Li, Lidong Bing
Self-supervised Learning Of Action Affordances As Interaction Modes (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Wang et al.
Toxicity In Chatgpt: Analyzing Persona-assigned Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 92 citations
Deshpande et al.
Enhancing Retrieval-augmented Large Language Models With Iterative Retrieval-generation Synergy (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 94 citations
Shao et al.
Multiconer V2: A Large Multilingual Dataset For Fine-grained And Noisy Named Entity Recognition (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Fetahu et al.
Is Chatgpt A Good Causal Reasoner? A Comprehensive Evaluation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Gao et al.
Ureader: Universal Ocr-free Visually-situated Language Understanding With Multimodal Large Language Model (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 46 citations
Ye et al.
VIP5: Towards Multimodal Foundation Models For Recommendation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Geng et al.
Editing Large Language Models: Problems, Methods, And Opportunities (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 55 citations
Yao et al.
Huatuogpt, Towards Taming Language Model To Be A Doctor (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 128 citations
Zhang et al.
Evaluating Parameter-efficient Transfer Learning Approaches On SURE Benchmark For Speech Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 281 citations
Li et al.
Multi-step Jailbreaking Privacy Attacks On Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 117 citations
Li et al.
Synthetic Data Generation With Large Language Models For Text Classification: Potential And Limitations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 70 citations
Li et al.
Learning Disentangled Semantic Spaces Of Explanations Via Invertible Neural Networks (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 79 citations
Yingji Zhang, Danilo S. Carvalho, André Freitas
Repocoder: Repository-level Code Completion Through Iterative Retrieval And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 66 citations
Zhang et al.
Extractive Summarization Via Chatgpt For Faithful Summary Generation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 53 citations
Haopeng Zhang, Xiao Liu, Jiawei Zhang
Speechgpt: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Zhang et al.
Summit: Iterative Text Summarization Via Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Haopeng Zhang, Xiao Liu, Jiawei Zhang
Don't Trust Chatgpt When Your Question Is Not In English: A Study Of Multilingual Abilities And Types Of Llms (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 42 citations
Zhang et al.
Chatgpt Beyond English: Towards A Comprehensive Evaluation Of Large Language Models In Multilingual Learning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 91 citations
Lai et al.
Measuring And Narrowing The Compositionality Gap In Language Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 101 citations
Press et al.
RASAT: Integrating Relational Structures Into Pretrained Seq2seq Model For Text-to-sql (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 60 citations
Qi et al.
Improving Multi-task Generalization Via Regularizing Spurious Correlation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 40 citations
Hu et al.
Improving Passage Retrieval With Zero-shot Question Generation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 57 citations
Sachan et al.
Are Large Pre-trained Language Models Leaking Your Personal Information? (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 49 citations
Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
Large Language Models Are Better Reasoners With Self-verification (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 49 citations
Weng et al.
Promptbert: Improving BERT Sentence Embeddings With Prompts (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 137 citations
Jiang et al.
Perturbation Augmentation For Fairer NLP (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 50 citations
Qian et al.
Can Language Models Learn From Explanations In Context? (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 59 citations
Lampinen et al.
Quantifying Privacy Risks Of Masked Language Models Using Membership Inference Attacks (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 40 citations
Mireshghallah et al.
Super-naturalinstructions: Generalization Via Declarative Instructions On 1600+ NLP Tasks (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 205 citations
Wang et al.
Large Language Models Are Few-shot Clinical Information Extractors (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 212 citations
Agrawal et al.
Language Models As Agent Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 44 citations
Jacob Andreas
Entity-centered Cross-document Relation Extraction (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 48 citations
Wang et al.
Medclip: Contrastive Learning From Unpaired Medical Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 402 citations
Wang et al.
Tweetnlp: Cutting-edge Natural Language Processing For Social Media (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 95 citations
Camacho-Collados et al.
A Span-level Bidirectional Network For Aspect Sentiment Triplet Extraction (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Towards A General Pre-training Framework For Adaptive Learning In Moocs (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 84 citations
Zhong et al.
Ernie-layout: Layout Knowledge Enhanced Pre-training For Visually-rich Document Understanding (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 53 citations
Peng et al.
Improving The Factual Correctness Of Radiology Report Generation With Semantic Rewards (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 41 citations
Delbrouck et al.
COLD: A Benchmark For Chinese Offensive Language Detection (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 66 citations
Deng et al.
Translation Between Molecules And Natural Language (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 92 citations
Edwards et al.
Plug-and-play VQA: Zero-shot VQA By Conjoining Large Pretrained Models With Zero Training (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 64 citations
Tiong et al.
Zerogen: Efficient Zero-shot Learning Via Dataset Generation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 82 citations
Ye et al.
Demystifying Prompts In Language Models Via Perplexity Estimation (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 47 citations
Gonen et al.
WANLI: Worker And AI Collaboration For Natural Language Inference Dataset Creation (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 102 citations
Liu et al.
Retromae: Pre-training Retrieval-oriented Language Models Via Masked Auto-encoder (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 41 citations
Xiao et al.
Text-only Training For Image Captioning Using Noise-injected CLIP (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 62 citations
David Nukrai, Ron Mokady, Amir Globerson
Temporal Adaptation Of BERT And Performance On Downstream Document Classification: Insights From Social Media (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 45 citations
Paul Röttger, Janet B. Pierrehumbert
Topic-to-essay Generation With Comprehensive Knowledge Enhancement (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 49 citations
Zhiyue Liu, Jiahai Wang, Zhenghong Li
Visually Grounded Reasoning Across Languages And Cultures (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 84 citations
Liu et al.
Whiteningbert: An Easy Unsupervised Sentence Embedding Approach (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 70 citations
Huang et al.
Challenges In Detoxifying Language Models (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 47 citations
Welbl et al.
Transfernet: An Effective And Transparent Framework For Multi-hop Question Answering Over Relation Graph (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 86 citations
Shi et al.
RAP: Robustness-aware Perturbations For Defending Against Backdoor Attacks On NLP Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 55 citations
Yang et al.
MOMENTA: A Multimodal Framework For Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 100 citations
Pramanick et al.
Pairwise Supervised Contrastive Learning Of Sentence Representations (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 41 citations
Zhang et al.
Retrieval Augmentation Reduces Hallucination In Conversation (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 370 citations
Shuster et al.
PASTE: A Tagging-free Decoding Framework Using Pointer Networks For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 49 citations
Mukherjee et al.
Datasets: A Community Library For Natural Language Processing (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 310 citations
Lhoest et al.
Layoutreader: Pre-training Of Text And Layout For Reading Order Detection (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 46 citations
Wang et al.
Just Say No: Analyzing The Stance Of Neural Dialogue Generation In Offensive Contexts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Baheti et al.
TSDAE: Using Transformer-based Sequential Denoising Auto-encoder For Unsupervised Sentence Embedding Learning (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 76 citations
Kexin Wang, Nils Reimers, Iryna Gurevych
On Pursuit Of Designing Multi-modal Transformer For Video Grounding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Cao et al.
Graph Based Network With Contextualized Representations Of Turns In Dialogue (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 40 citations
Bongseok Lee, Yong Suk Choi
Not All Negatives Are Equal: Label-aware Contrastive Loss For Fine-grained Text Classification (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 61 citations
Varsha Suresh, Desmond C. Ong
Editing Factual Knowledge In Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Nicola de Cao, Wilker Aziz, Ivan Titov
Plan-then-generate: Controlled Data-to-text Generation Via Planning (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 57 citations
Su et al.
Solving Aspect Category Sentiment Analysis As A Text Generation Task (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 56 citations
Liu et al.
Exploring Task Difficulty For Few-shot Relation Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 71 citations
Jiale Han, Bo Cheng, Wei Lu
The Stem Cell Hypothesis: Dilemma Behind Multi-task Learning With Transformer Encoders (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 85 citations
Han He, Jinho D. Choi
Multidoc2dial: Modeling Dialogues Grounded In Multiple Documents (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 42 citations
Feng et al.
Aspect Sentiment Quad Prediction As Paraphrase Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 174 citations
Zhang et al.
Exploring Underexplored Limitations Of Cross-domain Text-to-sql Generalization (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 52 citations
Yujian Gan, Xinyun Chen, Matthew Purver
Natural SQL: Making SQL Easier To Infer From Natural Language Specifications (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 58 citations
Gan et al.
Simcse: Simple Contrastive Learning Of Sentence Embeddings (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 2273 citations
Tianyu Gao, Xingcheng Yao, Danqi Chen
Competency Problems: On Finding And Removing Artifacts In Language Data (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Gardner et al.
PICARD: Parsing Incrementally For Constrained Auto-regressive Decoding From Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 140 citations
Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
Crossfit: A Few-shot Learning Challenge For Cross-task Generalization In NLP (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 100 citations
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
Indobertweet: A Pretrained Language Model For Indonesian Twitter With Effective Domain-specific Vocabulary Initialization (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 45 citations
Fajri Koto, Jey Han Lau, Timothy Baldwin
Gpt3mix: Leveraging Large-scale Language Models For Text Augmentation (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 67 citations
Yoo et al.
Neural Path Hunter: Reducing Hallucination In Dialogue Systems Via Path Grounding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Dziri et al.
Sequence-level Mixed Sample Data Augmentation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 68 citations
Demi Guo, Yoon Kim, Alexander M. Rush
XGLUE: A New Benchmark Dataset For Cross-lingual Pre-training, Understanding And Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 58 citations
Liang et al.
Towards Controllable Biases In Language Generation (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 84 citations
Sheng et al.
Utility Is In The Eye Of The User: A Critique Of NLP Leaderboards (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 127 citations
Kawin Ethayarajh, Dan Jurafsky
Will I Sound Like Me? Improving Persona Consistency In Dialogues Through Pragmatic Self-consciousness (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Hyunwoo Kim, Byeongchang Kim, Gunhee Kim
Differentially Private Representation For NLP: Formal Guarantee And An Empirical Study On Privacy And Fairness (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 47 citations
Lingjuan Lyu, Xuanli He, Yitong Li
Video2commonsense: Generating Commonsense Descriptions To Enrich Video Captioning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Fang et al.
Unifiedqa: Crossing Format Boundaries With A Single QA System (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 51 citations
Khashabi et al.
Autoprompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 611 citations
Shin et al.
Universal Natural Language Processing With Limited Annotations: Try Few-shot Textual Entailment As A Start (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Yin et al.
Look At The First Sentence: Position Bias In Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Ko et al.
Doc2dial: A Goal-oriented Document-grounded Dialogue Dataset (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 84 citations
Feng et al.
Codebert: A Pre-trained Model For Programming And Natural Languages (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 2031 citations
Feng et al.
Leakage-adjusted Simulatability: Can Models Generate Non-trivial Explanations Of Their Behavior In Natural Language? (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 55 citations
Hase et al.
UNION: An Unreferenced Metric For Evaluating Open-ended Story Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Jian Guan, Minlie Huang
Train No Evil: Selective Masking For Task-guided Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Gu et al.
Coreferential Reasoning Learning For Language Representation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 160 citations
Ye et al.
The Language Interpretability Tool: Extensible, Interactive Visualizations And Analysis For NLP Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 61 citations
Tenney et al.
POINTER: Constrained Progressive Text Generation Via Insertion-based Generative Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 66 citations
Zhang et al.
Totto: A Controlled Table-to-text Generation Dataset (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 69 citations
Parikh et al.
Discern: Discourse-aware Entailment Reasoning Network For Conversational Machine Reading (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Gao et al.
BAE: BERT-based Adversarial Examples For Text Classification (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Siddhant Garg, Goutham Ramakrishnan
Evaluating Models' Local Decision Boundaries Via Contrast Sets (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 252 citations
Gardner et al.
Generative Data Augmentation For Commonsense Reasoning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 92 citations
Yang et al.
On The Potential Of Lexico-logical Alignments For Semantic Parsing To SQL Queries (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Shi et al.
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Liu et al.
OCNLI: Original Chinese Natural Language Inference (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 71 citations
Hu et al.
Discriminative Nearest Neighbor Few-shot Intent Detection By Transferring Natural Language Inference (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 65 citations
Zhang et al.
Slotrefine: A Fast Non-autoregressive Model For Joint Intent Detection And Slot Filling (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 71 citations
Wu et al.
ETC: Encoding Long And Structured Inputs In Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 103 citations
Ainslie et al.
Fighting The COVID-19 Infodemic: Modeling The Perspective Of Journalists, Fact-checkers, Social Media Platforms, Policy Makers, And The Society (2020) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 61 citations
Alam et al.
Hover: A Dataset For Many-hop Fact Extraction And Claim Verification (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 83 citations
Jiang et al.
Grid Tagging Scheme For Aspect-oriented Fine-grained Opinion Extraction (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 200 citations
Wu et al.
Textattack: A Framework For Adversarial Attacks, Data Augmentation, And Adversarial Training In NLP (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 245 citations
Morris et al.
Augmented Natural Language For Generative Sequence Labeling (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Athiwaratkun et al.
Phobert: Pre-trained Language Models For Vietnamese (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 255 citations
Dat Quoc Nguyen, Anh Tuan Nguyen
When BERT Plays The Lottery, All Tickets Are Winning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Sai Prasanna, Anna Rogers, Anna Rumshisky
Language Generation With Multi-hop Reasoning On Commonsense Knowledge Graph (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 95 citations
Ji et al.
Exploring Versatile Generative Language Model Via Parameter-efficient Transfer Learning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 45 citations
Zhaojiang Lin, Andrea Madotto, Pascale Fung
Bertweet: A Pre-trained Language Model For English Tweets (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 432 citations
Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
Graphdialog: Integrating Graph Knowledge Into End-to-end Task-oriented Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Shiquan Yang, Rui Zhang, Sarah Erfani
TED: A Pretrained Unsupervised Summarization Model With Theme Modeling And Denoising (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 43 citations
Yang et al.
Response Selection For Multi-party Conversations With Dynamic Topic Tracking (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Weishi Wang, Shafiq Joty, Steven C. H. Hoi
Byte Pair Encoding Is Suboptimal For Language Model Pretraining (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 69 citations
Kaj Bostrom, Greg Durrett
On Negative Interference In Multilingual Models: Findings And A Meta-learning Treatment (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 70 citations
Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov
Connecting The Dots: A Knowledgeable Path Generator For Commonsense Question Answering (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 65 citations
Wang et al.
Cat-gen: Improving Robustness In NLP Models Via Controlled Adversarial Text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Wang et al.
Factual Error Correction For Abstractive Summarization Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 118 citations
Cao et al.
Probing Pretrained Language Models For Lexical Semantics (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Vulić et al.
With Little Power Comes Great Responsibility (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 73 citations
Card et al.
KGPT: Knowledge-grounded Pre-training For Data-to-text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 111 citations
Chen et al.
Hybridqa: A Dataset Of Multi-hop Question Answering Over Tabular And Textual Data (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 178 citations
Chen et al.
Recall And Learn: Fine-tuning Deep Pretrained Language Models With Less Forgetting (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 66 citations
Chen et al.
Logic2text: High-fidelity Natural Language Generation From Logical Forms (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 49 citations
Chen et al.
Local Additivity Based Data Augmentation For Semi-supervised NER (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Chen et al.
Multi-stage Influence Function (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 119 citations
Chen et al.
A Rigorous Study On Named Entity Recognition: Can Fine-tuning Pretrained Model Lead To The Promised Land? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Lin et al.
Few-shot Natural Language Generation For Task-oriented Dialog (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 157 citations
Peng et al.
Continual Learning For Natural Language Generation In Task-oriented Dialog Systems (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 53 citations
Mi et al.
Optimus: Organizing Sentences Via Pre-trained Modeling Of A Latent Space (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 129 citations
Li et al.
Adapterhub: A Framework For Adapting Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 181 citations
Pfeiffer et al.
What Can We Learn From Collective Human Opinions On Natural Language Inference Data? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 77 citations
Yixin Nie, Xiang Zhou, Mohit Bansal
From Zero To Hero: On The Limitations Of Zero-shot Cross-lingual Transfer With Multilingual Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 50 citations
Lauscher et al.
HERO: Hierarchical Encoder For Video+language Omni-representation Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 372 citations
Li et al.
Pre-training For Abstractive Document Summarization By Reinstating Source Text (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Zou et al.
Neural Deepfake Detection With Factual Structure Of Text (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 41 citations
Zhong et al.
A Hierarchical Network For Abstractive Meeting Summarization With Cross-domain Pretraining (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 122 citations
Zhu et al.
Document Ranking With A Pretrained Sequence-to-sequence Model (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 367 citations
Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
Towards Persona-based Empathetic Conversational Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 107 citations
Zhong et al.
What Have We Achieved On Text Summarization? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 70 citations
Huang et al.
Grounded Adaptation For Zero-shot Executable Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 83 citations
Zhong et al.
Cold-start Active Learning Through Self-supervised Language Modeling (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 61 citations
Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber
Weakly-supervised Aspect-based Sentiment Analysis Via Joint Aspect-sentiment Topic Embedding (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Huang et al.
Revisiting Pre-trained Models For Chinese Natural Language Processing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 452 citations
Cui et al.
Texthide: Tackling Data Privacy In Language Understanding Tasks (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 42 citations
Huang et al.
Cost-effective Selection Of Pretraining Data: A Case Study Of Pretraining BERT On Social Media (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 46 citations
Dai et al.
GRADE: Automatic Graph-enhanced Coherence Metric For Evaluating Open-domain Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 67 citations
Huang et al.
Knowledge-grounded Dialogue Generation With Pre-trained Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Zhao et al.
Revealing The Myth Of Higher-order Inference In Coreference Resolution (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 79 citations
Liyan Xu, Jinho D. Choi
Plotmachines: Outline-conditioned Generation With Dynamic Plot State Tracking (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 97 citations
Rashkin et al.
AGIF: An Adaptive Graph-interactive Framework For Joint Multiple Intent Detection And Slot Filling (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 108 citations
Qin et al.
Bridging Textual And Tabular Data For Cross-domain Text-to-sql Semantic Parsing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 145 citations
Xi Victoria Lin, Richard Socher, Caiming Xiong
Calibration Of Pre-trained Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 60 citations
Shrey Desai, Greg Durrett
Fquad: French Question Answering Dataset (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 62 citations
D'Hoffschmidt et al.
Adapterdrop: On The Efficiency Of Adapters In Transformers (2020) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 61 citations
Rücklé et al.
DAGA: Data Augmentation With A Generation Approach For Low-resource Tagging Tasks (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 130 citations
Ding et al.
Reducing Quantity Hallucinations In Abstractive Summarization (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 92 citations
Zheng Zhao, Shay B. Cohen, Bonnie Webber
Multi-fact Correction In Abstractive Text Summarization (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 96 citations
Dong et al.
Robbert: A Dutch Roberta-based Language Model (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 103 citations
Pieter Delobelle, Thomas Winters, Bettina Berendt
Small And Practical BERT Models For Sequence Labeling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 117 citations
Tsai et al.
Investigating Meta-learning Algorithms For Low-resource Natural Language Understanding Tasks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 106 citations
Zi-Yi Dou, Keyi Yu, Antonios Anastasopoulos
Show Your Work: Improved Reporting Of Experimental Results (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 215 citations
Dodge et al.
Patient Knowledge Distillation For BERT Model Compression (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 85 citations
Sun et al.
Cosql: A Conversational Text-to-sql Challenge Towards Cross-domain Natural Language Interfaces To Databases (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 122 citations
Yu et al.
An Entity-driven Framework For Abstractive Summarization (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 59 citations
Sharma et al.
Connecting The Dots: Document-level Neural Relation Extraction With Edge-oriented Graphs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 229 citations
Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou
The Flores Evaluation Datasets For Low-resource Machine Translation: Nepali-english And Sinhala-english (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 92 citations
Guzmán et al.
Entity, Relation, And Event Extraction With Contextualized Span Representations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 275 citations
Wadden et al.
Do We Really Need Fully Unsupervised Cross-lingual Embeddings? (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 109 citations
Vulić et al.
Commonsense Knowledge Mining From Pretrained Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 264 citations
Joshua Feldman, Joe Davison, Alexander M. Rush
Adversarial Learning With Contextual Embeddings For Zero-resource Cross-lingual Classification And NER (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 71 citations
Phillip Keung, Yichao Lu, Vikas Bhardwaj
Sentence-level Content Planning And Style Specification For Neural Text Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Xinyu Hua, Lu Wang
BERT For Coreference Resolution: Baselines And Analysis (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Joshi et al.
Unsupervised Domain Adaptation Of Contextualized Embeddings For Sequence Labeling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 154 citations
Xiaochuang Han, Jacob Eisenstein
A Multi-type Multi-span Network For Reading Comprehension That Requires Discrete Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 77 citations
Hu et al.
Moverscore: Text Generation Evaluating With Contextualized Embeddings And Earth Mover Distance (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 444 citations
Zhao et al.
Using Local Knowledge Graph Construction To Scale Seq2seq Models To Multi-document Inputs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 98 citations
Fan et al.
Reducing Sentiment Bias In Language Models Via Counterfactual Evaluation (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 54 citations
Huang et al.
Transforming Delete, Retrieve, Generate Approach For Controlled Text Style Transfer (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran
Benchmarking Zero-shot Text Classification: Datasets, Evaluation And Entailment Approach (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 219 citations
Wenpeng Yin, Jamaal Hay, Dan Roth
Encode, Tag, Realize: High-precision Text Editing (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Malmi et al.
A Systematic Comparison Of Methods For Low-resource Dependency Parsing On Genuinely Low-resource Languages (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Vania et al.
Pretrained Language Models For Sequential Sentence Classification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Cohan et al.
Long And Diverse Text Generation With Planning-based Hierarchical Variational Model (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 94 citations
Shao et al.
Implicit Deep Latent Variable Models For Text Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 53 citations
Fang et al.
Certified Robustness To Adversarial Word Substitutions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 236 citations
Jia et al.
Tree Transformer: Integrating Tree Structures Into Self-attention (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
75 Languages, 1 Model: Parsing Universal Dependencies Universally (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 61 citations
Dan Kondratyuk, Milan Straka
Generating Token-level Explanations For Natural Language Inference (2019) • Proceedings of the 2019 Conference of the North • 41 citations
Thorne et al.
A Surprisingly Effective Fix For Deep Latent Variable Modeling Of Text (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 59 citations
Li et al.
Attention Is Not Not Explanation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 58 citations
Sarah Wiegreffe, Yuval Pinter
A Logic-driven Framework For Consistency Of Neural Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 64 citations
Li et al.
Beto, Bentz, Becas: The Surprising Cross-lingual Effectiveness Of BERT (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 135 citations
Shijie Wu, Mark Dredze
Self-assembling Modular Networks For Interpretable Multi-hop Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Yichen Jiang, Mohit Bansal
Induction Networks For Few-shot Text Classification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 197 citations
Geng et al.
Editing-based SQL Query Generation For Cross-domain Context-dependent Questions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 114 citations
Zhang et al.
Domain Adaptive Text Style Transfer (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Li et al.
Hint-based Training For Non-autoregressive Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 63 citations
Li et al.
UER: An Open-source Toolkit For Pre-training Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations • 88 citations
Zhao et al.
Automatic Argument Quality Assessment -- New Datasets And Methods (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 55 citations
Toledo et al.
Text Summarization With Pretrained Encoders (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1525 citations
Yang Liu, Mirella Lapata
Parallel Iterative Edit Models For Local Sequence Transduction (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 142 citations
Awasthi et al.
Cloze-driven Pretraining Of Self-attention Networks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 41 citations
Baevski et al.
Socialiqa: Commonsense Reasoning About Social Interactions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 111 citations
Sap et al.
Sentence-bert: Sentence Embeddings Using Siamese Bert-networks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 9086 citations
Nils Reimers, Iryna Gurevych
Zero-shot Cross-lingual Dialogue Systems With Transferable Latent Variables (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 76 citations
Liu et al.
Scibert: A Pretrained Language Model For Scientific Text (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1631 citations
Iz Beltagy, Kyle Lo, Arman Cohan
NCLS: Neural Cross-lingual Summarization (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 121 citations
Zhu et al.
The Woman Worked As A Babysitter: On Biases In Language Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 380 citations
Sheng et al.
PAWS-X: A Cross-lingual Adversarial Dataset For Paraphrase Identification (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 222 citations
Yang et al.
Enhancing Amr-to-text Generation With Dual Graph Representations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych
Multi-task Learning For Conversational Question Answering Over A Large-scale Knowledge Base (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 89 citations
Shen et al.
Sampling Bias In Deep Active Classification: An Empirical Study (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 45 citations
Ameya Prabhu, Charles Dognin, Maneesh Singh
Knowledge Aware Conversation Generation With Explainable Reasoning Over Augmented Graphs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Liu et al.
Cm-net: A Novel Collaborative Memory Network For Spoken Language Understanding (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 89 citations
Liu et al.
A Novel Aspect-guided Deep Transition Model For Aspect Based Sentiment Analysis (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 49 citations
Liang et al.
Imat: Unsupervised Text Attribute Transfer Via Iterative Matching And Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 46 citations
Jin et al.
GECOR: An End-to-end Generative Ellipsis And Co-reference Resolution Model For Task-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 49 citations
Quan et al.
Allennlp Interpret: A Framework For Explaining Predictions Of NLP Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations • 125 citations
Wallace et al.
Doc2edag: An End-to-end Document-level Framework For Chinese Financial Event Extraction (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 170 citations
Zheng et al.
Language Models As Knowledge Bases? (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 725 citations
Petroni et al.
Crossweigh: Training Named Entity Tagger From Imperfect Annotations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 96 citations
Wang et al.
Trouble On The Horizon: Forecasting The Derailment Of Online Conversations As They Develop (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 45 citations
Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil
Convert: Efficient And Accurate Conversational Representations From Transformers (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Henderson et al.
Entity-consistent End-to-end Task-oriented Dialogue System With KB Retriever (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Qin et al.
Low-resource Name Tagging Learned With Weakly Labeled Data (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 86 citations
Cao et al.
GLUE: A Multi-task Benchmark And Analysis Platform For Natural Language Understanding (2018) • Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 3674 citations
Wang et al.
Neural Aesthetic Image Reviewer (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 107 citations
Wang et al.
Learning A Shared Shape Space For Multimodal Garment Design (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 62 citations
Wang et al.
Emrqa: A Large Corpus For Question Answering On Electronic Medical Records (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 164 citations
Pampari et al.
Preco: A Large-scale Dataset In Preschool Vocabulary For Coreference Resolution (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Semi-autoregressive Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 76 citations
Chunqi Wang, Ji Zhang, Haiqing Chen
Dissecting Contextual Word Embeddings: Architecture And Representation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 420 citations
Peters et al.
Sql-to-text Generation With Graph-to-sequence Model (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 74 citations
Xu et al.
Retrieve And Refine: Improved Sequence Generation Models For Dialogue (2018) • Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI • 176 citations
Jason Weston, Emily Dinan, Alexander H. Miller
Reasoning About Actions And State Changes By Injecting Commonsense Knowledge (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 84 citations
Tandon et al.
Rearranging The Familiar: Testing Compositional Generalization In Recurrent Networks (2018) • Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 95 citations
João Loula, Marco Baroni, Brenden M. Lake
Semi-supervised Sequence Modeling With Cross-view Training (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 391 citations
Clark et al.
Hard Non-monotonic Attention For Character-level Transduction (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 50 citations
Shijie Wu, Pamela Shapiro, Ryan Cotterell
Learning Neural Templates For Text Generation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 192 citations
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
XNLI: Evaluating Cross-lingual Sentence Representations (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 906 citations
Conneau et al.
Back-translation Sampling By Targeting Difficult Words In Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 84 citations
Marzieh Fadaee, Christof Monz
Word Sense Induction With Neural Bilm And Symmetric Patterns (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 48 citations
Asaf Amrami, Yoav Goldberg
SWAG: A Large-scale Adversarial Dataset For Grounded Commonsense Inference (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 103 citations
Zellers et al.
A Skeleton-based Model For Promoting Coherence Among Sentences In Narrative Story Generation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 94 citations
Xu et al.
Multiwoz -- A Large-scale Multi-domain Wizard-of-oz Dataset For Task-oriented Dialogue Modelling (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 312 citations
Budzianowski et al.
Phrase-indexed Question Answering: A New Challenge For Scalable Document Comprehension (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 42 citations
Seo et al.
Meta-learning For Low-resource Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 313 citations
Gu et al.
Why Self-attention? A Targeted Evaluation Of Neural Machine Translation Architectures (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 287 citations
Tang et al.
Towards Exploiting Background Knowledge For Building Conversation Systems (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 158 citations
Moghe et al.
Generating Natural Language Adversarial Examples (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 781 citations
Alzantot et al.
Open Domain Question Answering Using Early Fusion Of Knowledge Bases And Text (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 422 citations
Sun et al.
Improving The Transformer Translation Model With Document-level Context (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 81 citations
Zhang et al.
Bottom-up Abstractive Summarization (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 716 citations
Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
Understanding Back-translation At Scale (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 995 citations
Edunov et al.
Spider: A Large-scale Human-labeled Dataset For Complex And Cross-domain Semantic Parsing And Text-to-sql Task (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 672 citations
Yu et al.
Generating More Interesting Responses In Neural Conversation Models With Distributional Constraints (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 90 citations
Baheti et al.
Adversarial Removal Of Demographic Attributes From Text Data (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 243 citations
Yanai Elazar, Yoav Goldberg
Quac: Question Answering In Context (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 678 citations
Choi et al.
Extending Neural Generative Conversational Model Using External Knowledge Sources (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 69 citations
Prasanna Parthasarathi, Joelle Pineau
Shortcut-stacked Sentence Encoders For Multi-domain Inference (2017) • Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP • 119 citations
Yixin Nie, Mohit Bansal

Evaluation Frameworks 483 papers #

Establishing Trustworthy LLM Evaluation Via Shortcut Neuron Analysis (2025) • No Venue
Zhu et al.
Oagents: An Empirical Study Of Building Effective Agents (2025) • No Venue
Zhu et al.
Bigcodearena: Unveiling More Reliable Human Preferences In Code Generation Via Execution (2025) • No Venue
Zhuo et al.
Safearena: Evaluating The Safety Of Autonomous Web Agents (2025) • No Venue
Tur et al.
B-score: Detecting Biases In Large Language Models Using Response History (2025) • No Venue
Vo et al.
Deepresearch Arena: The First Exam Of Llms' Research Abilities Via Seminar-grounded Tasks (2025) • No Venue
Wan et al.
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation And Methodology (2025) • No Venue
Wang et al.
Cmphysbench: A Benchmark For Evaluating Large Language Models In Condensed Matter Physics (2025) • No Venue
Wang et al.
Genexam: A Multidisciplinary Text-to-image Exam (2025) • No Venue
Wang et al.
Mcp-bench: Benchmarking Tool-using LLM Agents With Complex Real-world Tasks Via MCP Servers (2025) • No Venue
Wang et al.
When Visualizing Is The First Step To Reasoning: MIRA, A Benchmark For Visual Chain-of-thought (2025) • No Venue
Zhou et al.
Scientists' First Exam: Probing Cognitive Abilities Of MLLM Via Perception, Understanding, And Reasoning (2025) • No Venue
Zhou et al.
Complexfuncbench: Exploring Multi-step And Constrained Function Calling Under Long-context Scenario (2025) • No Venue
Zhong et al.
Vibe Checker: Aligning Code Evaluation With Human Preference (2025) • No Venue
Zhong et al.
Redundancy Principles For Mllms Benchmarks (2025) • No Venue
Zhang et al.
Sentient Agent As A Judge: Evaluating Higher-order Social Cognition In Large Language Models (2025) • No Venue
Zhang et al.
Envisioning Beyond The Pixels: Benchmarking Reasoning-informed Visual Editing (2025) • No Venue
Zhao et al.
Sciarena: An Open Evaluation Platform For Foundation Models In Scientific Literature Tasks (2025) • No Venue
Zhao et al.
Newtonbench: Benchmarking Generalizable Scientific Law Discovery In LLM Agents (2025) • No Venue
Zheng et al.
Livecodebench Pro: How Do Olympiad Medalists Judge Llms In Competitive Programming? (2025) • No Venue
Zheng et al.
Vbench-2.0: Advancing Video Generation Benchmark Suite For Intrinsic Faithfulness (2025) • No Venue
Zheng et al.
Livemcp-101: Stress Testing And Diagnosing Mcp-enabled Agents On Challenging Queries (2025) • No Venue
Yin et al.
Vrbench: A Benchmark For Multi-step Reasoning In Long Narrative Videos (2025) • No Venue
Yu et al.
Mme-reasoning: A Comprehensive Benchmark For Logical Reasoning In Mllms (2025) • No Venue
Yuan et al.
Aralingbench: A Human-annotated Benchmark For Evaluating Arabic Linguistic Capabilities Of Large Language Models (2025) • No Venue
Zbib et al.
Futurex: An Advanced Live Benchmark For LLM Agents In Future Prediction (2025) • No Venue
Zeng et al.
Codecriticbench: A Holistic Code Critique Benchmark For Large Language Models (2025) • No Venue
Zhang et al.
DITING: A Multi-agent Evaluation Framework For Benchmarking Web Novel Translation (2025) • No Venue
Zhang et al.
Pref-grpo: Pairwise Preference Reward-based GRPO For Stable Text-to-image Reinforcement Learning (2025) • No Venue
Wang et al.
Unigenbench++: A Unified Semantic Evaluation Benchmark For Text-to-image Generation (2025) • No Venue
Wang et al.
Trustjudge: Inconsistencies Of Llm-as-a-judge And How To Alleviate Them (2025) • No Venue
Wang et al.
Voiceassistant-eval: Benchmarking AI Assistants Across Listening, Speaking, And Viewing (2025) • No Venue
Wang et al.
Ggbench: A Geometric Generative Reasoning Benchmark For Unified Multimodal Models (2025) • No Venue
Wei et al.
Codearc: Benchmarking Reasoning Capabilities Of LLM Agents For Inductive Program Synthesis (2025) • No Venue
Wei et al.
Reinforcement Learning With Verifiable Rewards Implicitly Incentivizes Correct Reasoning In Base Llms (2025) • No Venue
Wen et al.
Widesearch: Benchmarking Agentic Broad Info-seeking (2025) • No Venue
Wong et al.
Mcpmark: A Benchmark For Stress-testing Realistic And Comprehensive MCP Use (2025) • No Venue
Wu et al.
Writingbench: A Comprehensive Benchmark For Generative Writing (2025) • No Venue
Wu et al.
Leetcodedataset: A Temporal Dataset For Robust Evaluation And Efficient Training Of Code Llms (2025) • No Venue
Xia et al.
MIEB: Massive Image Embedding Benchmark (2025) • No Venue
Xiao et al.
Pretrainzero: Reinforcement Active Pretraining (2025) • No Venue
Xing et al.
Ravine: Reality-aligned Evaluation For Agentic Search (2025) • No Venue
Xu et al.
Relearn: Unlearning Via Learning For Large Language Models (2025) • No Venue
Xu et al.
Visulogic: A Benchmark For Evaluating Visual Reasoning In Multi-modal Large Language Models (2025) • No Venue
Xu et al.
Step-audio-editx Technical Report (2025) • No Venue
Yan et al.
Oceangym: A Benchmark Environment For Underwater Embodied Agents (2025) • No Venue
Xue et al.
Recitation Over Reasoning: How Cutting-edge Language Models Can Fail On Elementary School-level Reasoning Problems? (2025) • No Venue
Yan et al.
Gpt-imgeval: A Comprehensive Benchmark For Diagnosing Gpt4o In Image Generation (2025) • No Venue
Yan et al.
A Controllable Examination For Long-context Language Models (2025) • No Venue
Yang et al.
Embodiedbench: Comprehensive Benchmarking Multi-modal Large Language Models For Vision-driven Embodied Agents (2025) • No Venue
Yang et al.
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling (2025) • No Venue
Yang et al.
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search (2025) • No Venue
Yang et al.
Too Good To Be Bad: On The Failure Of Llms To Role-play Villains (2025) • No Venue
Yi et al.
Spin-bench: How Well Do Llms Plan Strategically And Reason Socially? (2025) • No Venue
Yao et al.
Survey On Evaluation Of Llm-based Agents (2025) • No Venue
Yehudai et al.
Echo-4o: Harnessing The Power Of Gpt-4o Synthetic Images For Improved Image Generation (2025) • No Venue
Ye et al.
Seeing From Another Perspective: Evaluating Multi-view Understanding In Mllms (2025) • No Venue
Yeh et al.
Language Models' Factuality Depends On The Language Of Inquiry (2025) • No Venue
Aggarwal et al.
Flashadventure: A Benchmark For GUI Agents Solving Full Story Arcs In Diverse Adventure Games (2025) • No Venue
Ahn et al.
Atla Selene Mini: A General Purpose Evaluation Model (2025) • No Venue
Alexandru et al.
Open Deep Search: Democratizing Search With Open-source Reasoning Agents (2025) • No Venue
Alzubi et al.
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds (2025) • No Venue
Anokhin et al.
What Does It Take To Be A Good AI Research Agent? Studying The Role Of Ideation Diversity (2025) • No Venue
Audran-Reiss et al.
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study (2025) • No Venue
Cai et al.
MM-IQ: Benchmarking Human-like Abstraction And Reasoning In Multimodal Models (2025) • No Venue
Huanqia Cai, Yijun Yang, Winston Hu
Web-shepherd: Advancing Prms For Reinforcing Web Agents (2025) • No Venue
Chae et al.
A3: Android Agent Arena For Mobile GUI Agents (2025) • No Venue
Chai et al.
Oneig-bench: Omni-dimensional Nuanced Evaluation For Image Generation (2025) • No Venue
Chang et al.
Browsecomp-plus: A More Fair And Transparent Evaluation Benchmark Of Deep-research Agent (2025) • No Venue
Chen et al.
Autopr: Let's Automate Your Academic Promotion! (2025) • No Venue
Chen et al.
Halumem: Evaluating Hallucinations In Memory Systems Of Agents (2025) • No Venue
Chen et al.
FINEREASON: Evaluating And Improving Llms' Deliberate Reasoning Through Reflective Puzzle Solving (2025) • No Venue
Chen et al.
Mvi-bench: A Comprehensive Benchmark For Evaluating Robustness To Misleading Visual Inputs In Lvlms (2025) • No Venue
Chen et al.
Paper2web: Let's Make Your Paper Alive! (2025) • No Venue
Chen et al.
Reform: Reflective Autoformalization With Prospective Bounded Sequence Optimization (2025) • No Venue
Chen et al.
Xverify: Efficient Answer Verifier For Reasoning Model Evaluations (2025) • No Venue
Chen et al.
Tivibench: Benchmarking Think-in-video Reasoning For Video Generative Models (2025) • No Venue
Chen et al.
Videovista-culturallingo: 360° Horizons-bridging Cultures, Languages, And Domains In Video Comprehension (2025) • No Venue
Chen et al.
Multimodal Evaluation Of Russian-language Architectures (2025) • No Venue
Chervyakov et al.
Interactcomp: Evaluating Search Agents With Ambiguous Queries (2025) • No Venue
Deng et al.
Swe-bench Pro: Can AI Agents Solve Long-horizon Software Engineering Tasks? (2025) • No Venue
Deng et al.
Story2board: A Training-free Approach For Expressive Storyboard Generation (2025) • No Venue
Dinkevich et al.
Deepresearch Bench: A Comprehensive Benchmark For Deep Research Agents (2025) • No Venue
Du et al.
Flux-reason-6m & Prism-bench: A Million-scale Text-to-image Reasoning Dataset And Comprehensive Benchmark (2025) • No Venue
Fang et al.
Creation-mmbench: Assessing Context-aware Creative Intelligence In MLLM (2025) • No Venue
Fang et al.
Cognitive Kernel-pro: A Framework For Deep Research Agents And Agent Foundation Models Training (2025) • No Venue
Fang et al.
On Path To Multimodal Generalist: General-level And General-bench (2025) • No Venue
Fei et al.
Can Mllms Guide Me Home? A Benchmark Study On Fine-grained Visual Reasoning From Transit Maps (2025) • No Venue
Feng et al.
Multiple Choice Questions: Reasoning Makes Large Language Models (llms) More Self-confident Even When They Are Wrong (2025) • No Venue
Fu et al.
Do Vision-language Models Have Internal World Models? Towards An Atomic Evaluation (2025) • No Venue
Gao et al.
Inverse Scaling In Test-time Compute (2025) • No Venue
Gema et al.
Great Models Think Alike And This Undermines AI Oversight (2025) • No Venue
Goel et al.
Mind2web 2: Evaluating Agentic Search With Agent-as-a-judge (2025) • No Venue
Gou et al.
Textarena (2025) • No Venue
Guertler et al.
ACADREASON: Exploring The Limits Of Reasoning Models With Academic Research Problems (2025) • No Venue
Gui et al.
Mineworld: A Real-time And Open-source Interactive World Model On Minecraft (2025) • No Venue
Guo et al.
Swe-factory: Your Automated Factory For Issue Resolution Training Data And Evaluation Benchmarks (2025) • No Venue
Guo et al.
Generating An Image From 1,000 Words: Enhancing Text-to-image With Structured Captions (2025) • No Venue
Gutflaish et al.
Beyond The Last Answer: Your Reasoning Trace Uncovers More Than You Think (2025) • No Venue
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
Unireditbench: A Unified Reasoning-based Image Editing Benchmark (2025) • No Venue
Han et al.
Videoscore2: Think Before You Score In Generative Video Evaluation (2025) • No Venue
He et al.
A Sober Look At Progress In Language Model Reasoning: Pitfalls And Paths To Reproducibility (2025) • No Venue
Hochlehnert et al.
Motionbench: Benchmarking And Improving Fine-grained Video Motion Understanding For Vision Language Models (2025) • No Venue
Hong et al.
Lmgame-bench: How Good Are Llms At Playing Games? (2025) • No Venue
Hu et al.
Finsearchcomp: Towards A Realistic, Expert-level Evaluation Of Financial Search And Reasoning (2025) • No Venue
Hu et al.
Video-mmmu: Evaluating Knowledge Acquisition From Multi-discipline Professional Videos (2025) • No Venue
Hu et al.
Benchmax: A Comprehensive Multilingual Evaluation Suite For Large Language Models (2025) • No Venue
Huang et al.
Building A Foundational Guardrail For General Agentic Systems Via Synthetic Data (2025) • No Venue
Huang et al.
Lego-eval: Towards Fine-grained Evaluation On Synthesizing 3D Embodied Environments With Tool Augmentation (2025) • No Venue
Hwangbo et al.
Omnispatial: Towards Comprehensive Spatial Reasoning Benchmark For Vision Language Models (2025) • No Venue
Jia et al.
Are Today's Llms Ready To Explain Well-being Concepts? (2025) • No Venue
Jiang et al.
Omni-reward: Towards Generalist Omni-modal Reward Modeling With Free-form Preferences (2025) • No Venue
Jin et al.
Why Language Models Hallucinate (2025) • No Venue
Kalai et al.
LINGOLY-TOO: Disentangling Memorisation From Reasoning With Linguistic Templatisation And Orthographic Obfuscation (2025) • No Venue
Khouja et al.
Flex-judge: Think Once, Judge Anywhere (2025) • No Venue
Ko et al.
Exp-bench: Can AI Conduct AI Research Experiments? (2025) • No Venue
Kon et al.
From Scores To Skills: A Cognitive Diagnosis Framework For Evaluating Financial Large Language Models (2025) • No Venue
Kuang et al.
Fea-bench: A Benchmark For Evaluating Repository-level Code Generation For Feature Implementation (2025) • No Venue
Li et al.
Migician: Revealing The Magic Of Free-form Multi-image Grounding In Multimodal Large Language Models (2025) • No Venue
Li et al.
Omnivideobench: Towards Audio-visual Understanding Evaluation For Omni Mllms (2025) • No Venue
Li et al.
Ovo-bench: How Far Is Your Video-llms From Real-world Online Video Understanding? (2025) • No Venue
Li et al.
A.S.E: A Repository-level Benchmark For Evaluating Security In Ai-generated Code (2025) • No Venue
Lian et al.
Towards Personalized Deep Research: Benchmarks And Evaluations (2025) • No Venue
Liang et al.
Surveyx: Academic Survey Automation Via Large Language Models (2025) • No Venue
Liang et al.
Ost-bench: Evaluating The Capabilities Of Mllms In Online Spatio-temporal Scene Understanding (2025) • No Venue
Lin et al.
Mcpeval: Automatic Mcp-based Deep Evaluation For AI Agent Models (2025) • No Venue
Liu et al.
Compassverifier: A Unified And Robust Verifier For Llms Evaluation And Outcome Reward (2025) • No Venue
Liu et al.
Shotbench: Expert-level Cinematic Understanding In Vision-language Models (2025) • No Venue
Liu et al.
Pc-agent: A Hierarchical Multi-agent Collaboration Framework For Complex Task Automation On PC (2025) • No Venue
Liu et al.
Researchbench: Benchmarking Llms In Scientific Discovery Via Inspiration-based Task Decomposition (2025) • No Venue
Liu et al.
Step1x-edit: A Practical Framework For General Image Editing (2025) • No Venue
Liu et al.
Mcp-universe: Benchmarking Large Language Models With Real-world Model Context Protocol Servers (2025) • No Venue
Luo et al.
Agentrewardbench: Evaluating Automatic Evaluations Of Web Agent Trajectories (2025) • No Venue
Lù et al.
Iv-bench: A Benchmark For Image-grounded Video Perception And Reasoning In Multimodal Llms (2025) • No Venue
Ma et al.
Rethinking RL Scaling For Vision Language Models: A Transparent, From-scratch Framework And Comprehensive Evaluation Scheme (2025) • No Venue
Ma et al.
Step-video-t2v Technical Report: The Practice, Challenges, And Future Of Video Foundation Model (2025) • No Venue
Ma et al.
Swe-lancer: Can Frontier Llms Earn $1 Million From Real-world Freelance Software Engineering? (2025) • No Venue
Miserendino et al.
Mlgym: A New Framework And Benchmark For Advancing AI Research Agents (2025) • No Venue
Nathani et al.
Annotation-efficient Universal Honesty Alignment (2025) • No Venue
Ni et al.
A Survey On Large Language Model Benchmarks (2025) • No Venue
Ni et al.
Does Understanding Inform Generation In Unified Multimodal Models? From Analysis To Path Forward (2025) • No Venue
Niu et al.
Benchmarking Llms' Swarm Intelligence (2025) • No Venue
Ruan et al.
Comment On The Illusion Of Thinking: Understanding The Strengths And Limitations Of Reasoning Models Via The Lens Of Problem Complexity (2025) • No Venue
C. Opus, A. Lawsen
REST: Stress Testing Large Reasoning Models By Asking Multiple Problems At Once (2025) • No Venue
Pan et al.
Sweeval: Do Llms Really Swear? A Safety Benchmark For Testing Limits For Enterprise Use (2025) • No Venue
Patel et al.
Plutus: Benchmarking Large Language Models In Low-resource Greek Finance (2025) • No Venue
Peng et al.
Humanity's Last Exam (2025) • No Venue
Phan et al.
THOUGHTTERMINATOR: Benchmarking, Calibrating, And Mitigating Overthinking In Reasoning Models (2025) • No Venue
Pu et al.
Judge Anything: MLLM As A Judge Across Any Modality (2025) • No Venue
Pu et al.
Vcr-bench: A Comprehensive Evaluation Framework For Video Chain-of-thought Reasoning (2025) • No Venue
Qi et al.
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities (2025) • No Venue
Qi et al.
Userbench: An Interactive Gym Environment For User-centric Agents (2025) • No Venue
Qian et al.
Phybench: Holistic Evaluation Of Physical Perception And Reasoning In Large Language Models (2025) • No Venue
Qiu et al.
How Well Does Gpt-4o Understand Vision? Evaluating Multimodal Foundation Models On Standard Computer Vision Tasks (2025) • No Venue
Ramachandran et al.
Videomathqa: Benchmarking Mathematical Reasoning Via Multimodal Understanding In Videos (2025) • No Venue
Rasheed et al.
Anycap Project: A Unified Framework, Dataset, And Benchmark For Controllable Omni-modal Captioning (2025) • No Venue
Ren et al.
Reviewscore: Misinformed Peer Review Detection With Large Language Models (2025) • No Venue
Ryu et al.
Nile-chat: Egyptian Language Models For Arabic And Latin Scripts (2025) • No Venue
Shang et al.
Yourbench: Easy Custom Evaluation Sets For Everyone (2025) • No Venue
Shashidhar et al.
Solving Inequality Proofs With Large Language Models (2025) • No Venue
Sheng et al.
Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025) • No Venue
Shen et al.
Phyx: Does Your Model Have The "wits" For Physical Reasoning? (2025) • No Venue
Shen et al.
The Leaderboard Illusion (2025) • No Venue
Singh et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning With Counterexample Creation (2025) • No Venue
Sinha et al.
IFIR: A Comprehensive Benchmark For Evaluating Instruction-following In Expert-domain Information Retrieval (2025) • No Venue
Song et al.
Paperbench: Evaluating Ai's Ability To Replicate AI Research (2025) • No Venue
Starace et al.
Challenging The Boundaries Of Reasoning: An Olympiad-level Math Benchmark For Large Language Models (2025) • No Venue
Sun et al.
Evaluation Is All You Need: Strategic Overclaiming Of LLM Reasoning Capabilities Through Evaluation Design (2025) • No Venue
Sun et al.
Au-harness: An Open-source Toolkit For Holistic Evaluation Of Audio Llms (2025) • No Venue
Surapaneni et al.
T2i-reasonbench: Benchmarking Reasoning-informed Text-to-image Generation (2025) • No Venue
Sun et al.
When An LLM Is Apprehensive About Its Answers -- And When Its Uncertainty Is Justified (2025) • No Venue
Sychev et al.
Realcritic: Towards Effectiveness-driven Evaluation Of Language Model Critiques (2025) • No Venue
Tang et al.
Enabling Scalable Oversight Via Self-evolving Critic (2025) • No Venue
Tang et al.
Agent KB: Leveraging Cross-domain Experience For Agentic Problem Solving (2025) • No Venue
Tang et al.
Supergpqa: Scaling LLM Evaluation Across 285 Graduate Disciplines (2025) • No Venue
Team et al.
Llamav-o1: Rethinking Step-by-step Visual Reasoning In Llms (2025) • No Venue
Thawakar et al.
Open Multimodal Retrieval-augmented Factual Image Generation (2025) • No Venue
Tian et al.
MMMR: Benchmarking Massive Multi-modal Reasoning Tasks (2025) • No Venue
Tie et al.
Replacing Judges With Juries: Evaluating LLM Generations With A Panel Of Diverse Models (2024) • No Venue
Verga et al.
Meltemi: The First Open Large Language Model For Greek (2024) • No Venue
Voukoutis et al.
Gpt-4o System Card (2024) • No Venue
OpenAI et al.
Phi-3 Technical Report: A Highly Capable Language Model Locally On Your Phone (2024) • No Venue
Abdin et al.
Unibench: Visual Reasoning Requires Rethinking Vision-language Beyond Scaling (2024) • No Venue
Al-Tahan et al.
Unitxt: Flexible, Shareable And Reusable Data Preparation And Evaluation For Generative AI (2024) • No Venue
Bandel et al.
Windows Agent Arena: Evaluating Multi-modal OS Agents At Scale (2024) • No Venue
Bonatti et al.
Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) • No Venue
Bu et al.
Compassjudger-1: All-in-one Judge Model Helps Model Evaluation And Evolution (2024) • No Venue
Cao et al.
PERSONA: A Reproducible Testbed For Pluralistic Alignment (2024) • No Venue
Castricato et al.
Swe-bench-java: A Github Issue Resolving Benchmark For Java (2024) • No Venue
Zan et al.
Law Of The Weakest Link: Cross Capabilities Of Large Language Models (2024) • No Venue
Zhong et al.
Mceval: Massively Multilingual Code Evaluation (2024) • No Venue
Chai et al.
Chexagent: Towards A Foundation Model For Chest X-ray Interpretation (2024) • No Venue
Chen et al.
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (2024) • No Venue
Chen et al.
Interleaved Scene Graph For Interleaved Text-and-image Generation Assessment (2024) • No Venue
Chen et al.
Mega-bench: Scaling Multimodal Evaluation To Over 500 Real-world Tasks (2024) • No Venue
Chen et al.
Mj-bench: Is Your Multimodal Reward Model Really A Good Judge For Text-to-image Generation? (2024) • No Venue
Chen et al.
Scienceagentbench: Toward Rigorous Assessment Of Language Agents For Data-driven Scientific Discovery (2024) • No Venue
Chen et al.
Mmmu-pro: A More Robust Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Yue et al.
Videorefer Suite: Advancing Spatial-temporal Object Understanding With Video LLM (2024) • No Venue
Yuan et al.
Chronomagic-bench: A Benchmark For Metamorphic Evaluation Of Text-to-time-lapse Video Generation (2024) • No Venue
Yuan et al.
MMAU: A Holistic Benchmark Of Agent Capabilities Across Diverse Domains (2024) • No Venue
Yin et al.
CORAL: Benchmarking Multi-turn Conversational Retrieval-augmentation Generation (2024) • No Venue
Cheng et al.
The Browsergym Ecosystem For Web Agent Research (2024) • No Venue
Chezelles et al.
Chatbot Arena: An Open Platform For Evaluating Llms By Human Preference (2024) • No Venue
Chiang et al.
Symbolicai: A Framework For Logic-based Approaches Combining Generative Models And Solvers (2024) • No Venue
Dinu et al.
Mapeval: A Map-based Evaluation Of Geo-spatial Reasoning In Foundation Models (2024) • No Venue
Dihan et al.
Judging The Judges: Evaluating Alignment And Vulnerabilities In Llms-as-judges (2024) • No Venue
Thakur et al.
Can Chatgpt Evaluate Research Quality? (2024) • Journal of Data and Information Science • 40 citations
Mike Thelwall
Multimodal Needle In A Haystack: Benchmarking Long-context Capability Of Multimodal Large Language Models (2024) • No Venue
Wang et al.
Mmlu-pro: A More Robust And Challenging Multi-task Language Understanding Benchmark (2024) • No Venue
Wang et al.
T2v-compbench: A Comprehensive Benchmark For Compositional Text-to-video Generation (2024) • No Venue
Sun et al.
Trustllm: Trustworthiness In Large Language Models (2024) • No Venue
Sun et al.
Planetarium: A Rigorous Benchmark For Translating Text To Structured Planning Languages (2024) • No Venue
Zuo et al.
Videohallucer: Evaluating Intrinsic And Extrinsic Hallucinations In Large Video-language Models (2024) • No Venue
Wang et al.
Needle In A Multimodal Haystack (2024) • No Venue
Wang et al.
Sotopia-π: Interactive Learning Of Socially Intelligent Language Agents (2024) • No Venue
Wang et al.
A Framework For Human Evaluation Of Large Language Models In Healthcare Derived From Literature Review (2024) • npj Digital Medicine • 131 citations
Tam et al.
Judgebench: A Benchmark For Evaluating Llm-based Judges (2024) • No Venue
Tan et al.
Opendevin: An Open Platform For AI Software Developers As Generalist Agents (2024) • No Venue
Wang et al.
Omnieval: An Omnidirectional And Automatic RAG Evaluation Benchmark In Financial Domain (2024) • No Venue
Wang et al.
Evaluating Retrieval Quality In Retrieval-augmented Generation (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 79 citations
Alireza Salemi, Hamed Zamani
Truth Or Mirage? Towards End-to-end Factuality Evaluation With LLM-OASIS (2024) • No Venue
Scirè et al.
Livexiv -- A Multi-modal Live Benchmark Based On Arxiv Papers Content (2024) • No Venue
Shabtay et al.
Who Validates The Validators? Aligning Llm-assisted Evaluation Of LLM Outputs With Human Preferences (2024) • UIST '24: The 37th Annual ACM Symposium on User Interface Software and Technology • 91 citations
Shankar et al.
TOMATO: Assessing Visual Temporal Reasoning Capabilities In Multimodal Foundation Models (2024) • No Venue
Shangguan et al.
Chartmimic: Evaluating Lmm's Cross-modal Reasoning Capability Via Chart-to-code Generation (2024) • No Venue
Shi et al.
Can Llms Generate Novel Research Ideas? A Large-scale Human Study With 100+ NLP Researchers (2024) • No Venue
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Global MMLU: Understanding And Addressing Cultural And Linguistic Biases In Multilingual Evaluation (2024) • No Venue
Singh et al.
Both Text And Images Leaked! A Systematic Analysis Of Multimodal LLM Data Contamination (2024) • No Venue
Song et al.
The Good, The Bad, And The Greedy: Evaluation Of Llms Should Not Ignore Non-determinism (2024) • No Venue
Song et al.
Enhancing Llm-based Feedback: Insights From Intelligent Tutoring Systems And The Learning Sciences (2024) • Communications in Computer and Information Science • 41 citations
John Stamper, Ruiwei Xiao, Xinying Hou
Are Your Llms Capable Of Stable Reasoning? (2024) • No Venue
Liu et al.
Longgenbench: Long-context Generation Benchmark (2024) • No Venue
Liu et al.
Teach Multimodal Llms To Comprehend Electrocardiographic Images (2024) • No Venue
Liu et al.
Agent-as-a-judge: Evaluate Agents With Agents (2024) • No Venue
Zhuge et al.
Starcoder 2 And The Stack V2: The Next Generation (2024) • No Venue
Lozhkov et al.
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery (2024) • No Venue
Lu et al.
Bento: Benchmark Task Reduction With In-context Transferability (2024) • No Venue
Zhao et al.
Mathverse: Does Your Multi-modal LLM Truly See The Diagrams In Visual Math Problems? (2024) • No Venue
Zhang et al.
Llama Beyond English: An Empirical Study On Language Capability Transfer (2024) • No Venue
Zhao et al.
Videoautoarena: An Automated Arena For Evaluating Large Multimodal Models In Video Analysis Through User Simulation (2024) • No Venue
Luo et al.
Plot2code: A Comprehensive Benchmark For Evaluating Multi-modal Large Language Models In Code Generation From Scientific Plots (2024) • No Venue
Wu et al.
Gpt-4v(ision) Is A Human-aligned Evaluator For Text-to-3d Generation (2024) • No Venue
Wu et al.
Evaluating Very Long-term Conversational Memory Of LLM Agents (2024) • No Venue
Maharana et al.
MMIU: Multimodal Multi-image Understanding For Evaluating Large Vision-language Models (2024) • No Venue
Meng et al.
Grouse: A Benchmark To Evaluate Evaluators In Grounded Question Answering (2024) • No Venue
Muller et al.
Bielik 7B V0.1: A Polish Language Model -- Development, Insights, And Evaluation (2024) • No Venue
Ociepa et al.
Reka Core, Flash, And Edge: A Series Of Powerful Multimodal Language Models (2024) • No Venue
Ormazabal et al.
Multi-dimensional Insights: Benchmarking Real-world Personalization In Large Multimodal Models (2024) • No Venue
Zhang et al.
Dreambench++: A Human-aligned Benchmark For Personalized Image Generation (2024) • No Venue
Peng et al.
Livebench: A Challenging, Contamination-free LLM Benchmark (2024) • No Venue
White et al.
A Toolbox For Surfacing Health Equity Harms And Biases In Large Language Models (2024) • Nature Medicine • 46 citations
Pfohl et al.
Benchmarking Agentic Workflow Generation (2024) • No Venue
Qiao et al.
We-math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (2024) • No Venue
Qiao et al.
Hellobench: Evaluating Long Text Generation Capabilities Of Large Language Models (2024) • No Venue
Que et al.
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning With Checklist (2024) • No Venue
Zhou et al.
Long-form Factuality In Large Language Models (2024) • No Venue
Wei et al.
Urbench: A Comprehensive Benchmark For Evaluating Large Multimodal Models In Multi-view Urban Scenarios (2024) • No Venue
Zhou et al.
The Llama 3 Herd Of Models (2024) • No Venue
Dubey et al.
Mmbench-video: A Long-form Multi-shot Benchmark For Holistic Video Understanding (2024) • No Venue
Fang et al.
Processbench: Identifying Process Errors In Mathematical Reasoning (2024) • No Venue
Zheng et al.
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Zhang et al.
BLINK: Multimodal Large Language Models Can See But Not Perceive (2024) • No Venue
Fu et al.
Video-mme: The First-ever Comprehensive Evaluation Benchmark Of Multi-modal Llms In Video Analysis (2024) • No Venue
Fu et al.
Mme-survey: A Comprehensive Survey On Evaluation Of Multimodal Llms (2024) • No Venue
Fu et al.
Are We Done With MMLU? (2024) • No Venue
Gema et al.
Justrank: Benchmarking LLM Judges For System Ranking (2024) • No Venue
Gera et al.
Atomovideo: High Fidelity Image-to-video Generation (2024) • No Venue
Gong et al.
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information? (2024) • No Venue
Gong et al.
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2024) • No Venue
Grosnit et al.
Pingpong: A Benchmark For Role-playing Language Models With User Emulation And Multi-model Evaluation (2024) • No Venue
Ilya Gusev
Chinese Simpleqa: A Chinese Factuality Evaluation For Large Language Models (2024) • No Venue
He et al.
Mmworld: Towards Multi-discipline Multi-faceted World Model Evaluation In Videos (2024) • No Venue
He et al.
Webvoyager: Building An End-to-end Web Agent With Large Multimodal Models (2024) • No Venue
He et al.
UCFE: A User-centric Financial Expertise Benchmark For Large Language Models (2024) • No Venue
Yang et al.
RULER: What's The Real Context Size Of Your Long-context Language Models? (2024) • No Venue
Hsieh et al.
Mmevalpro: Calibrating Multimodal Benchmarks Towards Trustworthy And Efficient Evaluation (2024) • No Venue
Huang et al.
Vbench++: Comprehensive And Versatile Benchmark Suite For Video Generative Models (2024) • No Venue
Huang et al.
Piccolo2: General Text Embedding With Multi-task Hybrid Loss Training (2024) • No Venue
Huang et al.
Gitchameleon: Unmasking The Version-switching Capabilities Of Code Generation Models (2024) • No Venue
Islah et al.
Mmsearch: Benchmarking The Potential Of Large Models As Multi-modal Search Engines (2024) • No Venue
Jiang et al.
Genai Arena: An Open Evaluation Platform For Generative Models (2024) • No Venue
Jiang et al.
Hidden Flaws Behind Expert-level Accuracy Of Multimodal GPT-4 Vision In Medicine (2024) • npj Digital Medicine • 75 citations
Jin et al.
Dsbench: How Far Are Data Science Agents To Becoming Data Science Experts? (2024) • No Venue
Jing et al.
MEDIC: Towards A Comprehensive Framework For Evaluating Llms In Clinical Applications (2024) • No Venue
Kanithi et al.
Evaluating Language Models As Synthetic Data Generators (2024) • No Venue
Kim et al.
Prometheus 2: An Open Source Language Model Specialized In Evaluating Other Language Models (2024) • No Venue
Kim et al.
Fact, Fetch, And Reason: A Unified Evaluation Of Retrieval-augmented Generation (2024) • No Venue
Krishna et al.
Babilong: Testing The Limits Of Llms With Long Context Reasoning-in-a-haystack (2024) • No Venue
Kuratov et al.
Summary Of A Haystack: A Challenge To Long-context Llms And RAG Systems (2024) • No Venue
Laban et al.
A Careful Examination Of Large Language Model Performance On Grade School Arithmetic (2024) • No Venue
Zhang et al.
Theagentcompany: Benchmarking LLM Agents On Consequential Real World Tasks (2024) • No Venue
Xu et al.
The Curse Of Multi-modalities: Evaluating Hallucinations Of Large Multimodal Models Across Language, Visual, And Audio (2024) • No Venue
Leng et al.
Lmms-eval: Reality Check On The Evaluation Of Large Multimodal Models (2024) • No Venue
Zhang et al.
Datacomp-lm: In Search Of The Next Generation Of Training Sets For Language Models (2024) • No Venue
Li et al.
MMIE: Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-language Models (2024) • No Venue
Xia et al.
Llava-critic: Learning To Evaluate Multimodal Models (2024) • No Venue
Xiong et al.
Humaneval-v: Benchmarking High-level Visual Reasoning With Complex Diagrams In Coding Tasks (2024) • No Venue
Zhang et al.
Wildbench: Benchmarking Llms With Challenging Tasks From Real Users In The Wild (2024) • No Venue
Lin et al.
Travelplanner: A Benchmark For Real-world Planning With Language Agents (2024) • No Venue
Xie et al.
The Finben: An Holistic Financial Benchmark For Large Language Models (2024) • No Venue
Xie et al.
Demystifying GPT Self-repair For Code Generation (2023) • No Venue
Olausson et al.
Toward Verifiable And Reproducible Human Evaluation For Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Otani et al.
Chatgpt, Can You Generate Solutions For My Coding Exercises? An Evaluation On Its Effectiveness In An Undergraduate Java Programming Course (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 61 citations
Ouh et al.
Learning Gain Differences Between Chatgpt And Human Tutor Generated Algebra Hints (2023) • Arxiv • 71 citations
Zachary A. Pardos, Shreya Bhandari
Agentbench: Evaluating Llms As Agents (2023) • No Venue
Liu et al.
G-eval: NLG Evaluation Using GPT-4 With Better Human Alignment (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 383 citations
Liu et al.
Aligning Large Multimodal Models With Factually Augmented RLHF (2023) • No Venue
Sun et al.
Large Language Models Can Accurately Predict Searcher Preferences (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 82 citations
Thomas et al.
Judgelm: Fine-tuned Large Language Models Are Scalable Judges (2023) • No Venue
Lianghui Zhu, Xinggang Wang, Xinlong Wang
RTLLM: An Open-source Benchmark For Design RTL Generation With Large Language Model (2023) • 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) • 116 citations
Lu et al.
Chatgpt As A Factual Inconsistency Evaluator For Text Summarization (2023) • Arxiv • 50 citations
Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Fingpt: Large Generative Models For A Small Language (2023) • No Venue
Luukkonen et al.
Evaluating The Social Impact Of Generative AI Systems In Systems And Society (2023) • Arxiv • 41 citations
Solaiman et al.
Chatgpt Or Grammarly? Evaluating Chatgpt On Grammatical Error Correction Benchmark (2023) • Arxiv • 48 citations
Wu et al.
Text2kgbench: A Benchmark For Ontology-driven Knowledge Graph Generation From Text (2023) • Lecture Notes in Computer Science • 51 citations
Mihindukulasooriya et al.
Factscore: Fine-grained Atomic Evaluation Of Factual Precision In Long Form Text Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 132 citations
Min et al.
Orca 2: Teaching Small Language Models How To Reason (2023) • No Venue
Mitra et al.
State Of What Art? A Call For Multi-prompt LLM Evaluation (2023) • Transactions of the Association for Computational Linguistics • 58 citations
Mizrahi et al.
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4 (2023) • No Venue
Mukherjee et al.
Human Preference Score: Better Aligning Text-to-image Models With Human Preference (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 108 citations
Wu et al.
Video-chatgpt: Towards Detailed Video Understanding Via Large Vision And Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 224 citations
Maaz et al.
Towards Expert-level Medical Question Answering With Large Language Models (2023) • Arxiv • 329 citations
Singhal et al.
An Early Evaluation Of Gpt-4v(ision) (2023) • No Venue
Wu et al.
Assessing Cross-cultural Alignment Between Chatgpt And Human Societies: An Empirical Study (2023) • Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) • 96 citations
Cao et al.
Tinystories: How Small Can Language Models Be And Still Speak Coherent English? (2023) • No Venue
Ronen Eldan, Yuanzhi Li
Is Chatgpt A Good NLG Evaluator? A Preliminary Study (2023) • Proceedings of the 4th New Frontiers in Summarization Workshop • 178 citations
Wang et al.
MEGA: Multilingual Evaluation Of Generative AI (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 76 citations
Ahuja et al.
Can We Trust The Evaluation On Chatgpt? (2023) • Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023) • 62 citations
Aiyappa et al.
Hrs-bench: Holistic, Reliable And Scalable Benchmark For Text-to-image Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Bakr et al.
A Multitask, Multilingual, Multimodal Evaluation Of Chatgpt On Reasoning, Hallucination, And Interactivity (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 466 citations
Bang et al.
Multimodal Llms For Health Grounded In Individual-specific Data (2023) • Lecture Notes in Computer Science • 44 citations
Belyaeva et al.
Chatunitest: A Framework For Llm-based Test Generation (2023) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 56 citations
Chen et al.
Art Or Artifice? Large Language Models And The False Promise Of Creativity (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 56 citations
Chakrabarty et al.
Spanish Pre-trained BERT Model And Evaluation Data (2023) • Arxiv • 332 citations
Cañete et al.
A Survey On Evaluation Of Large Language Models (2023) • No Venue
Chang et al.
Shepherd: A Critic For Language Model Generation (2023) • No Venue
Wang et al.
Rethinking The Evaluation For Conversational Recommendation In The Era Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 45 citations
Wang et al.
Can Large Language Models Be An Alternative To Human Evaluations? (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 161 citations
Cheng-Han Chiang, Hung-Yi Lee
Agieval: A Human-centric Benchmark For Evaluating Foundation Models (2023) • Arxiv • 60 citations
Zhong et al.
Mm-vet: Evaluating Large Multimodal Models For Integrated Capabilities (2023) • Arxiv • 59 citations
Yu et al.
Codereval: A Benchmark Of Pragmatic Code Generation With Generative Pre-trained Models (2023) • Proceedings of the IEEE/ACM 46th International Conference on Software Engineering • 77 citations
Yu et al.
Llms Cannot Reliably Identify And Reason About Security Vulnerabilities (yet?): A Comprehensive Evaluation, Framework, And Benchmarks (2023) • 2024 IEEE Symposium on Security and Privacy (SP) • 50 citations
Ullah et al.
Judging Llm-as-a-judge With Mt-bench And Chatbot Arena (2023) • No Venue
Zheng et al.
Exploring Large Language Models' Cognitive Moral Development Through Defining Issues Test (2023) • No Venue
Tanmay et al.
Ragas: Automated Evaluation Of Retrieval Augmented Generation (2023) • Arxiv • 60 citations
Es et al.
Just Ask For Calibration: Strategies For Eliciting Calibrated Confidence Scores From Language Models Fine-tuned With Human Feedback (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Tian et al.
Instruction-following Evaluation For Large Language Models (2023) • No Venue
Zhou et al.
Semeval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (multiconer 2) (2023) • Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) • 43 citations
Fetahu et al.
Gptscore: Evaluate As You Desire (2023) • Arxiv • 80 citations
Fu et al.
Codebertscore: Evaluating Code Generation With Pretrained Models Of Code (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 63 citations
Zhou et al.
Datacomp: In Search Of The Next Generation Of Multimodal Datasets (2023) • Arxiv • 72 citations
Gadre et al.
Enabling Large Language Models To Generate Text With Citations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Gao et al.
Gemini: A Family Of Highly Capable Multimodal Models (2023) • Arxiv • 758 citations
Team et al.
An Empirical Evaluation Of Using Large Language Models For Automated Unit Test Generation (2023) • IEEE Transactions on Software Engineering • 176 citations
Schäfer et al.
How Far Are Large Language Models From Agents With Theory-of-mind? (2023) • No Venue
Zhou et al.
Hallusionbench: An Advanced Diagnostic Suite For Entangled Language Hallucination And Visual Illusion In Large Vision-language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Guan et al.
Legalbench: A Collaboratively Built Benchmark For Measuring Legal Reasoning In Large Language Models (2023) • SSRN Electronic Journal • 77 citations
Guha et al.
PPTC Benchmark: Evaluating Large Language Models For Powerpoint Task Completion (2023) • No Venue
Guo et al.
Lamp: When Large Language Models Meet Personalization (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Salemi et al.
Evaluating Large Language Models On A Highly-specialized Topic, Radiation Oncology Physics (2023) • Frontiers in Oncology • 112 citations
Holmes et al.
Mentallama: Interpretable Mental Health Analysis On Social Media With Large Language Models (2023) • WWW '24: The ACM Web Conference 2024 • 79 citations
Yang et al.
C-eval: A Multi-level Multi-discipline Chinese Evaluation Suite For Foundation Models (2023) • Arxiv • 89 citations
Huang et al.
Testing The Reliability Of Chatgpt For Text Annotation And Classification: A Cautionary Remark (2023) • Arxiv • 80 citations
Michael V. Reiss
Baichuan 2: Open Large-scale Language Models (2023) • No Venue
Yang et al.
A Comprehensive Evaluation Of Large Language Models On Benchmark Biomedical Text Processing Tasks (2023) • Computers in Biology and Medicine • 61 citations
Jahan et al.
Is Chatgpt Fair For Recommendation? Evaluating Fairness In Large Language Model Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 94 citations
Zhang et al.
Huatuogpt, Towards Taming Language Model To Be A Doctor (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 128 citations
Zhang et al.
Large Language Models For Education: Grading Open-ended Questions Using Chatgpt (2023) • SBES 2023: XXXVII Brazilian Symposium on Software Engineering • 46 citations
Pinto et al.
Mindmap: Knowledge Graph Prompting Sparks Graph Of Thoughts In Large Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Yilin Wen, Zifeng Wang, Jimeng Sun
Llm-eval: Unified Multi-dimensional Automatic Evaluation For Open-domain Conversations With Large Language Models (2023) • Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023) • 45 citations
Yen-Ting Lin, Yun-Nung Chen
Halueval: A Large-scale Hallucination Evaluation Benchmark For Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 143 citations
Li et al.
FLM-101B: An Open LLM And How To Train It With $100K Budget (2023) • No Venue
Li et al.
Otterhd: A High-resolution Multi-modality Model (2023) • No Venue
Li et al.
Starcoder: May The Source Be With You! (2023) • No Venue
Li et al.
Imagereward: Learning And Evaluating Human Preferences For Text-to-image Generation (2023) • Arxiv • 99 citations
Xu et al.
Language-driven Representation Learning For Robotics (2023) • Robotics: Science and Systems XIX • 47 citations
Karamcheti et al.
Evaluating GPT-4 And Chatgpt On Japanese Medical Licensing Examinations (2023) • Arxiv • 50 citations
Kasai et al.
Prometheus: Inducing Fine-grained Evaluation Capability In Language Models (2023) • No Venue
Kim et al.
Evallm: Interactive Evaluation Of Large Language Model Prompts On User-defined Criteria (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Kim et al.
Evaluating Large Language Models In Theory Of Mind Tasks (2023) • Proceedings of the National Academy of Sciences • 83 citations
Michal Kosinski
Chinese Intermediate English Learners Outdid Chatgpt In Deep Cohesion: Evidence From English Narrative Writing (2023) • System • 50 citations
Zhou et al.
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis (2023) • No Venue
Qin et al.
ELEVATER: A Benchmark And Toolkit For Evaluating Language-augmented Visual Models (2022) • Arxiv • 64 citations
Li et al.
Discovering Language Model Behaviors With Model-written Evaluations (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 47 citations
Perez et al.
Holistic Evaluation Of Language Models (2022) • Annals of the New York Academy of Sciences • 107 citations
Liang et al.
Impact Of Pretraining Term Frequencies On Few-shot Reasoning (2022) • Arxiv • 51 citations
Razeghi et al.
A Systematic Evaluation Of Large Language Models Of Code (2022) • MAPS '22: 6th ACM SIGPLAN International Symposium on Machine Programming • 362 citations
Xu et al.
Better Together? An Evaluation Of Ai-supported Code Translation (2022) • 27th International Conference on Intelligent User Interfaces • 57 citations
Weisz et al.
Benchmarking Large Language Models For Automated Verilog RTL Code Generation (2022) • 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 112 citations
Thakur et al.
Co-writing Screenplays And Theatre Scripts With Language Models: An Evaluation By Industry Professionals (2022) • CHI '23: CHI Conference on Human Factors in Computing Systems • 154 citations
Mirowski et al.
Large Language Models Encode Clinical Knowledge (2022) • Nature • 1963 citations
Singhal et al.
M-SENA: An Integrated Platform For Multimodal Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 58 citations
Mao et al.
Challenging Big-bench Tasks And Whether Chain-of-thought Can Solve Them (2022) • Arxiv • 40 citations
Suzgun et al.
Dynatask: A Framework For Creating Dynamic AI Benchmark Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 40 citations
Thrush et al.
Complexity-based Prompting For Multi-step Reasoning (2022) • Arxiv • 72 citations
Fu et al.
Evaluating Mixed-initiative Conversational Search Systems Via User Simulation (2022) • Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining • 42 citations
Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani
News Summarization And Evaluation In The Era Of GPT-3 (2022) • Arxiv • 180 citations
Tanya Goyal, Junyi Jessy Li, Greg Durrett
Self-critiquing Models For Assisting Human Evaluators (2022) • Arxiv • 46 citations
Saunders et al.
NLX-GPT: A Model For Natural Language Explanations In Vision And Vision-language Tasks (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
Coreference-aware Dialogue Summarization (2021) • Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue • 46 citations
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Challenges In Detoxifying Language Models (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 47 citations
Welbl et al.
On The Evaluation Of Neural Code Summarization (2021) • Proceedings of the 44th International Conference on Software Engineering • 64 citations
Shi et al.
Semantic Answer Similarity For Evaluating Question Answering Models (2021) • Proceedings of the 3rd Workshop on Machine Reading for Question Answering • 43 citations
Risch et al.
Arat5: Text-to-text Transformers For Arabic Language Generation (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 56 citations
El Moatez Billah Nagoudi, Abdelrahim Elmadany, Muhammad Abdul-Mageed
Building And Evaluating Open-domain Dialogue Corpora With Clarifying Questions (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Aliannejadi et al.
Bartscore: Evaluating Generated Text As Text Generation (2021) • Arxiv • 318 citations
Weizhe Yuan, Graham Neubig, Pengfei Liu
Evaluation Of BERT And ALBERT Sentence Embedding Performance On Downstream NLP Tasks (2021) • 2020 25th International Conference on Pattern Recognition (ICPR) • 115 citations
Choi et al.
Clipscore: A Reference-free Evaluation Metric For Image Captioning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 720 citations
Hessel et al.
Asvspoof 2021: Accelerating Progress In Spoofed And Deepfake Speech Detection (2021) • 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge • 268 citations
Yamagishi et al.
E-vil: A Dataset And Benchmark For Natural Language Explanations In Vision-language Tasks (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
Kayser et al.
Societal Biases In Retrieved Contents: Measurement Framework And Adversarial Mitigation For BERT Rankers (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 44 citations
Navid Rekabsaz, Simone Kopeinik, Markus Schedl
To Ship Or Not To Ship: An Extensive Evaluation Of Automatic Metrics For Machine Translation (2021) • Arxiv • 82 citations
Kocmi et al.
The Expando-mono-duo Design Pattern For Text Ranking With Pretrained Sequence-to-sequence Models (2021) • Arxiv • 40 citations
Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
An Improved Baseline For Sentence-level Relation Extraction (2021) • Arxiv • 49 citations
Wenxuan Zhou, Muhao Chen
Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies (2021) • Behavior Research Methods • 65 citations
Flemotomos et al.
Experts, Errors, And Context: A Large-scale Study Of Human Evaluation For Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 91 citations
Freitag et al.
Dynaeval: Unifying Turn And Dialogue Level Evaluation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Zhang et al.
Robustness Gym: Unifying The NLP Evaluation Landscape (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations • 48 citations
Goel et al.
The FLORES-101 Evaluation Benchmark For Low-resource And Multilingual Machine Translation (2021) • Arxiv • 82 citations
Goyal et al.
Textflint: Unified Multilingual Robustness Evaluation Toolkit For Natural Language Processing (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 79 citations
Gui et al.
Learning-to-rank With BERT In Tf-ranking (2020) • Arxiv • 60 citations
Han et al.
Summeval: Re-evaluating Summarization Evaluation (2020) • Transactions of the Association for Computational Linguistics • 47 citations
Fabbri et al.
UNION: An Unreferenced Metric For Evaluating Open-ended Story Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Jian Guan, Minlie Huang
BLEURT: Learning Robust Metrics For Text Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Thibault Sellam, Dipanjan Das, Ankur P. Parikh
CLUE: A Chinese Language Understanding Evaluation Benchmark (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 235 citations
Xu et al.
On The Limitations Of Cross-lingual Encoders As Exposed By Reference-free Machine Translation Evaluation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Zhao et al.
Evaluating Models' Local Decision Boundaries Via Contrast Sets (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 252 citations
Gardner et al.
Evaluating Conversational Recommender Systems Via User Simulation (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 73 citations
Shuo Zhang, Krisztian Balog
Imitating Interactive Intelligence (2020) • Arxiv • 43 citations
Abramson et al.
Checkthat! At CLEF 2020: Enabling The Automatic Identification And Verification Of Claims In Social Media (2020) • Lecture Notes in Computer Science • 50 citations
Barron-Cedeno et al.
Beyond Accuracy: Behavioral Testing Of NLP Models With Checklist (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 51 citations
Ribeiro et al.
Semeval-2020 Task 4: Commonsense Validation And Explanation (2020) • Proceedings of the Fourteenth Workshop on Semantic Evaluation • 89 citations
Wang et al.
Logical Natural Language Generation From Open-domain Tables (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 114 citations
Chen et al.
A Benchmark For Systematic Generalization In Grounded Language Understanding (2020) • Arxiv • 45 citations
Ruis et al.
On Faithfulness And Factuality In Abstractive Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Maynez et al.
SPECTER: Document-level Representation Learning Using Citation-informed Transformers (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Cohan et al.
Coco: Controllable Counterfactuals For Evaluating Dialogue State Trackers (2020) • Arxiv • 41 citations
Li et al.
Codebleu: A Method For Automatic Evaluation Of Code Synthesis (2020) • Arxiv • 184 citations
Ren et al.
GRADE: Automatic Graph-enhanced Coherence Metric For Evaluating Open-domain Dialogue Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 67 citations
Huang et al.
Simuleval: An Evaluation Toolkit For Simultaneous Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 69 citations
Ma et al.
FEQA: A Question Answering Evaluation Framework For Faithfulness Assessment In Abstractive Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 79 citations
Esin Durmus, He He, Mona Diab
Designing Precise And Robust Dialogue Response Evaluators (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
Evaluating The State-of-the-art Of End-to-end Natural Language Generation: The E2E NLG Challenge (2019) • Computer Speech & Language • 180 citations
Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Editnts: An Neural Programmer-interpreter Model For Sentence Simplification Through Explicit Editing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 139 citations
Dong et al.
Masked Language Model Scoring (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 145 citations
Salazar et al.
Multilingual Is Not Enough: BERT For Finnish (2019) • Arxiv • 121 citations
Virtanen et al.
Semantic Object Accuracy For Generative Text-to-image Synthesis (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 146 citations
Tobias Hinz, Stefan Heinrich, Stefan Wermter
Moverscore: Text Generation Evaluating With Contextualized Embeddings And Earth Mover Distance (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 444 citations
Zhao et al.
Strategies For Structuring Story Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 186 citations
Angela Fan, Mike Lewis, Yann Dauphin
Approximating Interactive Human Evaluation With Self-play For Open-domain Dialog Systems (2019) • Arxiv • 51 citations
Ghandeharioun et al.
The Effect Of Translationese In Machine Translation Test Sets (2019) • Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers) • 69 citations
Mike Zhang, Antonio Toral
Asking Clarifying Questions In Open-domain Information-seeking Conversations (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 175 citations
Aliannejadi et al.
What Makes A Good Conversation? How Controllable Attributes Affect Human Judgments (2019) • Proceedings of the 2019 Conference of the North • 224 citations
See et al.
EASSE: Easier Automatic Sentence Simplification Evaluation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations • 55 citations
Alva-Manchego et al.
NAS Evaluation Is Frustratingly Hard (2019) • Arxiv • 109 citations
Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci
ACUTE-EVAL: Improved Dialogue Evaluation With Optimized Questions And Multi-turn Comparisons (2019) • Arxiv • 79 citations
Margaret Li, Jason Weston, Stephen Roller
Sticking To The Facts: Confident Decoding For Faithful Data-to-text Generation (2019) • Arxiv • 48 citations
Tian et al.
MLQA: Evaluating Cross-lingual Extractive Question Answering (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Lewis et al.
Evaluating Style Transfer For Text (2019) • Proceedings of the 2019 Conference of the North • 75 citations
Mir et al.
Findings Of The First Shared Task On Machine Translation Robustness (2019) • Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) • 55 citations
Li et al.
Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 49 citations
Cai et al.
Transfer Learning In Biomedical Natural Language Processing: An Evaluation Of BERT And Elmo On Ten Benchmarking Datasets (2019) • Proceedings of the 18th BioNLP Workshop and Shared Task • 792 citations
Yifan Peng, Shankai Yan, Zhiyong Lu
No Training Required: Exploring Random Encoders For Sentence Classification (2019) • Arxiv • 75 citations
John Wieting, Douwe Kiela
Does It Make Sense? And Why? A Pilot Study For Sense Making And Explanation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 105 citations
Wang et al.
Superglue: A Stickier Benchmark For General-purpose Language Understanding Systems (2019) • Arxiv • 984 citations
Wang et al.
Federated Evaluation Of On-device Personalization (2019) • Arxiv • 117 citations
Wang et al.
On Evaluation Of Adversarial Perturbations For Sequence-to-sequence Models (2019) • Proceedings of the 2019 Conference of the North • 114 citations
Michel et al.
Texygen: A Benchmarking Platform For Text Generation Models (2018) • Arxiv • 154 citations
Zhu et al.
GLUE: A Multi-task Benchmark And Analysis Platform For Natural Language Understanding (2018) • Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 3674 citations
Wang et al.
The Price Of Debiasing Automatic Metrics In Natural Language Evaluation (2018) • Arxiv • 43 citations
Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang
Incsql: Training Incremental Text-to-sql Parsers With Non-deterministic Oracles (2018) • Arxiv • 59 citations
Shi et al.
Improving Text-to-sql Evaluation Methodology (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 213 citations
Finegan-Dollak et al.
Unsupervised Sentence Compression Using Denoising Auto-encoders (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 46 citations
Thibault Févry, Jason Phang
A Retrospective Analysis Of The Fake News Challenge Stance Detection Task (2018) • Arxiv • 68 citations
Hanselowski et al.
Language Gans Falling Short (2018) • ICLR 2020 - Proceedings of the Seventh International Conference on Learning Representation • 81 citations
Caccia et al.
On Accurate Evaluation Of Gans For Language Generation (2018) • Arxiv • 77 citations
Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly
News Session-based Recommendations Using Deep Neural Networks (2018) • Proceedings of the 3rd Workshop on Deep Learning for Recommender Systems • 95 citations
Gabriel de Souza P. Moreira, Felipe Ferreira, Adilson Marques da Cunha
RUBER: An Unsupervised Method For Automatic Evaluation Of Open-domain Dialog Systems (2017) • Arxiv • 118 citations
Tao et al.
Shapeworld - A New Test Methodology For Multimodal Language Understanding (2017) • Arxiv • 47 citations
Alexander Kuhnle, Ann Copestake
Recurrent Neural Network-based Sentence Encoder With Gated Attention For Natural Language Inference (2017) • Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP • 88 citations
Chen et al.
Adversarial Evaluation Of Dialogue Models (2017) • Arxiv • 66 citations
Anjuli Kannan, Oriol Vinyals
Title Generation For User Generated Videos (2016) • Lecture Notes in Computer Science • 65 citations
Zeng et al.

— F —

Fine Tuning 1097 papers

Enhancing Human-like Responses In Large Language Models (2025) • No Venue
Ethem Yağız Çalık, Talha Rüzgar Akkuş
Reasonflux-prm: Trajectory-aware Prms For Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Zou et al.
Tattoo: Tool-grounded Thinking PRM For Test-time Scaling In Tabular Reasoning (2025) • No Venue
Zou et al.
Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories (2025) • No Venue
Zhu et al.
Internvl3: Exploring Advanced Training And Test-time Recipes For Open-source Multimodal Models (2025) • No Venue
Zhu et al.
DRIVE: Data Curation Best Practices For Reinforcement Learning With Verifiable Reward In Competitive Code Generation (2025) • No Venue
Zhu et al.
Is Extending Modality The Right Path Towards Omni-modality? (2025) • No Venue
Zhu et al.
The Path Not Taken: RLVR Provably Learns Off The Principals (2025) • No Venue
Zhu et al.
Towards Faithful And Controllable Personalization Via Critique-post-edit Reinforcement Learning (2025) • No Venue
Zhu et al.
Hephaestus: Improving Fundamental Agent Capabilities Of Large Language Models Through Continual Pre-training (2025) • No Venue
Zhuang et al.
Longwriter-v: Enabling Ultra-long And High-fidelity Generation In Vision-language Models (2025) • No Venue
Tu et al.
How To Train Your LLM Web Agent: A Statistical Diagnosis (2025) • No Venue
Vattikonda et al.
Qwenlong-l1: Towards Long-context Large Reasoning Models With Reinforcement Learning (2025) • No Venue
Wan et al.
Diversity-enhanced Reasoning For Subjective Questions (2025) • No Venue
Wang et al.
Critique Fine-tuning: Learning To Critique Is More Effective Than Learning To Imitate (2025) • No Venue
Yubo Wang, Xiang Yue, Wenhu Chen
From Editor To Dense Geometry Estimator (2025) • No Venue
Wang et al.
Geovista: Web-augmented Agentic Visual Reasoning For Geolocalization (2025) • No Venue
Wang et al.
Jigsaw-r1: A Study Of Rule-based Visual Reinforcement Learning With Jigsaw Puzzles (2025) • No Venue
Wang et al.
F2LLM Technical Report: Matching SOTA Embedding Performance With 6 Million Open-source Data (2025) • No Venue
Zhang et al.
DICEPTION: A Generalist Diffusion Model For Visual Perceptual Tasks (2025) • No Venue
Zhao et al.
Variational Reasoning For Language Models (2025) • No Venue
Zhou et al.
Neural-driven Image Editing (2025) • No Venue
Zhou et al.
Omniworld: A Multi-domain And Multi-modal Dataset For 4D World Modeling (2025) • No Venue
Zhou et al.
Roborefer: Towards Spatial Referring With Reasoning In Vision-language Models For Robotics (2025) • No Venue
Zhou et al.
Reinforced Visual Perception With Tools (2025) • No Venue
Zhou et al.
GKG-LLM: A Unified Framework For Generalized Knowledge Graph Construction (2025) • No Venue
Zhang et al.
Inverse Ifeval: Can Llms Unlearn Stubborn Training Conventions To Follow Real Instructions? (2025) • No Venue
Zhang et al.
MM-RLHF: The Next Step Forward In Multimodal LLM Alignment (2025) • No Venue
Zhang et al.
Openmmreasoner: Pushing The Frontiers For Multimodal Reasoning With An Open And General Recipe (2025) • No Venue
Zhang et al.
Qwen3 Embedding: Advancing Text Embedding And Reranking Through Foundation Models (2025) • No Venue
Zhang et al.
Skywork-r1v4: Toward Agentic Multimodal Intelligence Through Interleaved Thinking With Images And Deepresearch (2025) • No Venue
Zhang et al.
Sageattention3: Microscaling FP4 Attention For Inference And An Exploration Of 8-bit Training (2025) • No Venue
Zhang et al.
Speakervid-5m: A Large-scale High-quality Dataset For Audio-visual Dyadic Interactive Human Generation (2025) • No Venue
Zhang et al.
Videollama 3: Frontier Multimodal Foundation Models For Image And Video Understanding (2025) • No Venue
Zhang et al.
Videorepa: Learning Physics For Video Generation Through Relational Alignment With Foundation Models (2025) • No Venue
Zhang et al.
Babel: Open Multilingual Large Language Models Serving Over 90% Of Global Speakers (2025) • No Venue
Zhao et al.
Omnialign-v: Towards Enhanced Alignment Of Mllms With Human Preference (2025) • No Venue
Zhao et al.
MM-HELIX: Boosting Multimodal Long-chain Reflective Reasoning With Holistic Platform And Adaptive Hybrid Policy Optimization (2025) • No Venue
Zhao et al.
Let Llms Break Free From Overthinking Via Self-braking Tuning (2025) • No Venue
Zhao et al.
Promptcot 2.0: Scaling Prompt Synthesis For Large Language Model Reasoning (2025) • No Venue
Zhao et al.
Riflex: A Free Lunch For Length Extrapolation In Video Diffusion Transformers (2025) • No Venue
Zhao et al.
Diffusionnft: Online Diffusion Reinforcement With Forward Process (2025) • No Venue
Zheng et al.
Architecture Decoupling Is Not All You Need For Unified Multimodal Model (2025) • No Venue
Zheng et al.
Parallel-r1: Towards Parallel Thinking Via Reinforcement Learning (2025) • No Venue
Zheng et al.
Z1: Efficient Test-time Scaling With Code (2025) • No Venue
Yu et al.
From F(x) And G(x) To F(g(x)): Llms Learn New Skills In RL By Composing Old Ones (2025) • No Venue
Yuan et al.
Efficientllm: Efficiency In Large Language Models (2025) • No Venue
Yuan et al.
Yue: Scaling Open Foundation Models For Long-form Music Generation (2025) • No Venue
Yuan et al.
Designlab: Designing Slides Through Iterative Detection And Correction (2025) • No Venue
Yun et al.
Reasoning Vectors: Transferring Chain-of-thought Capabilities Via Task Arithmetic (2025) • No Venue
Mohammad Zbeeb, Hasan Abed Al Kader Hammoud, Bernard Ghanem
Skywork-swe: Unveiling Data Scaling Laws For Software Engineering In Llms (2025) • No Venue
Zeng et al.
Satori-swe: Evolutionary Test-time Scaling For Sample-efficient Software Engineering (2025) • No Venue
Zeng et al.
SIFT: Grounding LLM Reasoning In Contexts Via Stickers (2025) • No Venue
Zeng et al.
Vision-r1: Evolving Human-free Alignment In Large Vision-language Models Via Vision-guided Reinforcement Learning (2025) • No Venue
Zhan et al.
100 Days After Deepseek-r1: A Survey On Replication Studies And More Directions For Reasoning Language Models (2025) • No Venue
Zhang et al.
Agent Learning Via Early Experience (2025) • No Venue
Zhang et al.
Bee: A High-quality Corpus And Full-stack Suite To Unlock Advanced Fully Open Mllms (2025) • No Venue
Zhang et al.
Browseragent: Building Web Agents With Human-inspired Web Browsing Actions (2025) • No Venue
Zhang et al.
Mathcoder-vl: Bridging Vision And Code For Enhanced Multimodal Mathematical Reasoning (2025) • No Venue
Wang et al.
Omniear: Benchmarking Agent Reasoning In Embodied Tasks (2025) • No Venue
Wang et al.
Open-qwen2vl: Compute-efficient Pre-training Of Fully-open Multimodal Llms On Academic Resources (2025) • No Venue
Wang et al.
RLVER: Reinforcement Learning With Verifiable Emotion Rewards For Empathetic Agents (2025) • No Venue
Wang et al.
Revolutionizing Reinforcement Learning Framework For Diffusion Large Language Models (2025) • No Venue
Wang et al.
Resa: Transparent Reasoning Models Via Saes (2025) • No Venue
Wang et al.
Scireasoner: Laying The Scientific Reasoning Ground Across Disciplines (2025) • No Venue
Wang et al.
Sota With Less: Mcts-guided Sample Selection For Data-efficient Visual Reasoning Self-improvement (2025) • No Venue
Wang et al.
Winning The Pruning Gamble: A Unified Approach To Joint Sample And Token Pruning For Efficient Supervised Fine-tuning (2025) • No Venue
Wang et al.
Video-thinker: Sparking "thinking With Videos" Via Reinforcement Learning (2025) • No Venue
Wang et al.
Worldpm: Scaling Human Preference Modeling (2025) • No Venue
Wang et al.
Advancing Multimodal Reasoning Via Reinforcement Learning With Cold Start (2025) • No Venue
Wei et al.
Codearc: Benchmarking Reasoning Capabilities Of LLM Agents For Inductive Program Synthesis (2025) • No Venue
Wei et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior For Visual Reasoning (2025) • No Venue
Wei et al.
Light-r1: Curriculum SFT, DPO And RL For Long COT From Scratch And Beyond (2025) • No Venue
Wen et al.
On The Generalization Of SFT: A Reinforcement Learning Perspective With Reward Rectification (2025) • No Venue
Wu et al.
Gui-actor: Coordinate-free Visual Grounding For GUI Agents (2025) • No Venue
Wu et al.
Spatial-mllm: Boosting MLLM Capabilities In Visual-based Spatial Intelligence (2025) • No Venue
Wu et al.
Superwriter: Reflection-driven Long-form Generation With Large Language Models (2025) • No Venue
Wu et al.
Webdancer: Towards Autonomous Information Seeking Agency (2025) • No Venue
Wu et al.
Dense Retrievers Can Fail On Simple Queries: Revealing The Granularity Dilemma Of Embeddings (2025) • No Venue
Xu et al.
Leetcodedataset: A Temporal Dataset For Robust Evaluation And Efficient Training Of Code Llms (2025) • No Venue
Xia et al.
Scaling Language-centric Omnimodal Representation Learning (2025) • No Venue
Xiao et al.
Self-rewarding Correction For Mathematical Reasoning (2025) • No Venue
Xiong et al.
Pretrainzero: Reinforcement Active Pretraining (2025) • No Venue
Xing et al.
Flag-trader: Fusion Llm-agent With Gradient-based Reinforcement Learning For Financial Trading (2025) • No Venue
Xiong et al.
Comfyui-copilot: An Intelligent Assistant For Automated Workflow Development (2025) • No Venue
Xu et al.
Phi-4-mini-reasoning: Exploring The Limits Of Small Reasoning Language Models In Math (2025) • No Venue
Xu et al.
Kodcode: A Diverse, Challenging, And Verifiable Synthetic Dataset For Coding (2025) • No Venue
Xu et al.
Relearn: Unlearning Via Learning For Large Language Models (2025) • No Venue
Xu et al.
Streamingvlm: Real-time Understanding For Infinite Video Streams (2025) • No Venue
Xu et al.
Simpletir: End-to-end Reinforcement Learning For Multi-turn Tool-integrated Reasoning (2025) • No Venue
Xue et al.
Re:form -- Reducing Human Priors In Scalable Formal Software Verification With RL In Llms: A Preliminary Study On Dafny (2025) • No Venue
Yan et al.
From Code Foundation Models To Agents And Applications: A Practical Guide To Code Intelligence (2025) • No Venue
Yang et al.
Fine-tuning Done Right In Model Editing (2025) • No Venue
Yang et al.
Deepcritic: Deliberate Critique With Large Language Models (2025) • No Venue
Yang et al.
Step Back To Leap Forward: Self-backtracking For Boosting Reasoning Of Language Models (2025) • No Venue
Yang et al.
Multiverse: Your Language Models Secretly Decide How To Parallelize And Merge Generation (2025) • No Venue
Yang et al.
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling (2025) • No Venue
Yang et al.
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search (2025) • No Venue
Yang et al.
Mmada: Multimodal Large Diffusion Language Models (2025) • No Venue
Yang et al.
Steering Vision-language-action Models As Anti-exploration: A Test-time Scaling Approach (2025) • No Venue
Yang et al.
Qwen2.5-1m Technical Report (2025) • No Venue
Yang et al.
Visual Spatial Tuning (2025) • No Venue
Yang et al.
Table-r1: Inference-time Scaling For Table Reasoning (2025) • No Venue
Yang et al.
Vlaser: Vision-language-action Model With Synergistic Embodied Reasoning (2025) • No Venue
Yang et al.
Are Reasoning Models More Prone To Hallucination? (2025) • No Venue
Yao et al.
Optimizing Chain-of-thought Reasoners Via Gradient Variance Minimization In Rejection Sampling And RL (2025) • No Venue
Yao et al.
Agentfold: Long-horizon Web Agents With Proactive Context Management (2025) • No Venue
Ye et al.
Demystifying Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Yeo et al.
Judgelrm: Large Reasoning Models As A Judge (2025) • No Venue
Chen et al.
Phi-4-reasoning Technical Report (2025) • No Venue
Abdin et al.
Phi-4-mini Technical Report: Compact Yet Powerful Multimodal Language Models Via Mixture-of-loras (2025) • No Venue
Abouelenin et al.
Emergent Misalignment Via In-context Learning: Narrow In-context Examples Can Produce Broadly Misaligned Llms (2025) • No Venue
Afonin et al.
Front-loading Reasoning: The Synergy Between Pretraining And Post-training Data (2025) • No Venue
Akter et al.
Atla Selene Mini: A General Purpose Evaluation Model (2025) • No Venue
Alexandru et al.
LFM2 Technical Report (2025) • No Venue
Amini et al.
Ultraif: Advancing Instruction Following From The Wild (2025) • No Venue
An et al.
Kandinsky 5.0: A Family Of Foundation Models For Image And Video Generation (2025) • No Venue
Arkhipkin et al.
The Best Of N Worlds: Aligning Reinforcement Learning With Best-of-n Sampling Via Max@k Optimisation (2025) • No Venue
Bagirov et al.
Univg-r1: Reasoning Guided Universal Visual Grounding With Reinforcement Learning (2025) • No Venue
Bai et al.
Llama-nemotron: Efficient Reasoning Models (2025) • No Venue
Bercovich et al.
Singlora: Low Rank Adaptation Using A Single Matrix (2025) • No Venue
Bensaïd et al.
Riemannlora: A Unified Riemannian Framework For Ambiguity-free Lora Optimization (2025) • No Venue
Bogachev et al.
When Does Reasoning Matter? A Controlled Study Of Reasoning's Contribution To Model Performance (2025) • No Venue
Boizard et al.
Neobert: A Next-generation BERT (2025) • No Venue
Breton et al.
Divmerge: A Divergence-based Model Merging Method For Multi-tasking (2025) • No Venue
Brahim et al.
GR-3 Technical Report (2025) • No Venue
Cheang et al.
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis (2025) • No Venue
Chen et al.
Acereason-nemotron: Advancing Math And Code Reasoning Through Reinforcement Learning (2025) • No Venue
Chen et al.
Dc-videogen: Efficient Video Generation With Deep Compression Video Autoencoder (2025) • No Venue
Chen et al.
Flash-dmd: Towards High-fidelity Few-step Image Generation With Efficient Distillation And Joint Reinforcement Learning (2025) • No Venue
Chen et al.
Edit Transfer: Learning Image Editing Via Vision In-context Relations (2025) • No Venue
Chen et al.
Exploring The Effect Of Reinforcement Learning On Video Understanding: Insights From Seed-bench-r1 (2025) • No Venue
Chen et al.
Fusionaudio-1.2m: Towards Fine-grained Audio Captioning With Multimodal Contextual Fusion (2025) • No Venue
Chen et al.
MIG: Automatic Data Selection For Instruction Tuning By Maximizing Information Gain In Semantic Space (2025) • No Venue
Chen et al.
Longpo: Long Context Self-evolution Of Large Language Models Through Short-to-long Preference Optimization (2025) • No Venue
Chen et al.
Opengpt-4o-image: A Comprehensive Dataset For Advanced Image Generation And Editing (2025) • No Venue
Chen et al.
Persona Vectors: Monitoring And Controlling Character Traits In Language Models (2025) • No Venue
Chen et al.
SFT Or RL? An Early Investigation Into Training R1-like Reasoning Large Vision-language Models (2025) • No Venue
Chen et al.
Ui-ins: Enhancing GUI Grounding With Multi-perspective Instruction-as-reasoning (2025) • No Venue
Chen et al.
$\pi_{RL}$: Online RL Fine-tuning For Flow-based Vision-language-action Models (2025) • No Venue
Chen et al.
SFT Memorizes, RL Generalizes: A Comparative Study Of Foundation Model Post-training (2025) • No Venue
Chu et al.
Modifying Large Language Model Post-training For Diverse Creative Writing (2025) • No Venue
Chung et al.
Reinforcement Learning For Reasoning In Small Llms: What Works And What Doesn't (2025) • No Venue
Quy-Anh Dang, Chris Ngo
Openvlthinker: An Early Exploration To Complex Vision-language Reasoning Via Iterative Self-improvement (2025) • No Venue
Deng et al.
Supervised Reinforcement Learning: From Expert Trajectories To Step-wise Reasoning (2025) • No Venue
Deng et al.
Mm-ifengine: Towards Multimodal Instruction Following (2025) • No Venue
Ding et al.
Machinelearninglm: Continued Pretraining Language Models On Millions Of Synthetic Tabular Prediction Tasks Scales In-context ML (2025) • No Venue
Dong et al.
Reinforcement Pre-training (2025) • No Venue
Dong et al.
Motionsight: Boosting Fine-grained Motion Understanding In Multimodal Llms (2025) • No Venue
Du et al.
Virgo: A Preliminary Exploration On Reproducing O1-like MLLM (2025) • No Venue
Du et al.
Make Lora Great Again: Boosting Lora With Adaptive Singular Values And Mixture-of-experts Optimization Alignment (2025) • No Venue
Fan et al.
Creation-mmbench: Assessing Context-aware Creative Intelligence In MLLM (2025) • No Venue
Fang et al.
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action (2025) • No Venue
Fang et al.
Towards General Agentic Intelligence Via Environment Scaling (2025) • No Venue
Fang et al.
Grounding Computer Use Agents On Human Demonstrations (2025) • No Venue
Feizi et al.
Onethinker: All-in-one Reasoning Model For Image And Video (2025) • No Venue
Feng et al.
WILDCHAT-50M: A Deep Dive Into The Role Of Synthetic Data In Post-training (2025) • No Venue
Benjamin Feuer, Chinmay Hegde
Vericot: Neuro-symbolic Chain-of-thought Validation Via Logical Consistency Checks (2025) • No Venue
Feng et al.
Think-at-hard: Selective Latent Iterations To Improve Reasoning Language Models (2025) • No Venue
Fu et al.
Listener-rewarded Thinking In Vlms For Image Preferences (2025) • No Venue
Gambashidze et al.
Cognitive Behaviors That Enable Self-improving Reasoners, Or, Four Habits Of Highly Effective Stars (2025) • No Venue
Gandhi et al.
Exploring Hallucination Of Large Multimodal Models In Video Understanding: Benchmark, Analysis And Mitigation (2025) • No Venue
Gao et al.
Seedream 3.0 Technical Report (2025) • No Venue
Gao et al.
Pixels, Patterns, But No Poetry: To See The World Like Humans (2025) • No Venue
Gao et al.
Seedance 1.0: Exploring The Boundaries Of Video Generation Models (2025) • No Venue
Gao et al.
The Differences Between Direct Alignment Algorithms Are A Blur (2025) • No Venue
Gorbatovski et al.
Set Block Decoding Is A Language Model Inference Accelerator (2025) • No Venue
Gat et al.
Guided By Gut: Efficient Test-time Scaling With Reinforced Intrinsic Confidence (2025) • No Venue
Ghasemabadi et al.
Audio Flamingo 2: An Audio-language Model With Long-audio Understanding And Expert Reasoning Abilities (2025) • No Venue
Ghosh et al.
Should We Still Pretrain Encoders With Masked Language Modeling? (2025) • No Venue
Gisserot-Boukhlef et al.
Seedream 2.0: A Native Chinese-english Bilingual Image Generation Foundation Model (2025) • No Venue
Gong et al.
Diffusion As Shader: 3d-aware Video Diffusion For Versatile Video Generation Control (2025) • No Venue
Gu et al.
Ui-venus Technical Report: Building High-performance UI Agents With RFT (2025) • No Venue
Gu et al.
Learning To See Before Seeing: Demystifying LLM Visual Priors From Language Pre-training (2025) • No Venue
Han et al.
RLP: Reinforcement As A Pretraining Objective (2025) • No Venue
Hatamizadeh et al.
Don't Overthink It. Preferring Shorter Thinking Chains For Improved LLM Reasoning (2025) • No Venue
Hassid et al.
Kuwain 1.5B: An Arabic SLM Via Language Injection (2025) • No Venue
Hennara et al.
Baseer: A Vision-language Model For Arabic Document-to-markdown OCR (2025) • No Venue
Hennara et al.
A Sober Look At Progress In Language Model Reasoning: Pitfalls And Paths To Reproducibility (2025) • No Venue
Hochlehnert et al.
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy (2025) • No Venue
Hou et al.
Quest: Incentivizing Llms To Generate Difficult Problems (2025) • No Venue
Hu et al.
Llms Learn To Deceive Unintentionally: Emergent Misalignment In Dishonesty From Misaligned Samples To Biased Human-ai Interactions (2025) • No Venue
Hu et al.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability Of LLM Reasoning (2025) • No Venue
Huan et al.
Sentinel: SOTA Model To Protect Against Prompt Injections (2025) • No Venue
Dror Ivry, Oran Nahum
The African Languages Lab: A Collaborative Approach To Advancing Low-resource African NLP (2025) • No Venue
Issaka et al.
Infiniteyou: Flexible Photo Recrafting While Preserving Your Identity (2025) • No Venue
Jiang et al.
Are Today's Llms Ready To Explain Well-being Concepts? (2025) • No Venue
Jiang et al.
Generalist Foundation Models Are Not Clinical Enough For Hospital Operations (2025) • No Venue
Jiang et al.
Detect Anything Via Next Point Prediction (2025) • No Venue
Jiang et al.
Think Only When You Need With Large Hybrid-reasoning Models (2025) • No Venue
Jiang et al.
VIDEOP2R: Video Understanding From Perception To Reasoning (2025) • No Venue
Jiang et al.
Gralora: Granular Low-rank Adaptation For Parameter-efficient Fine-tuning (2025) • No Venue
Jung et al.
Don't Blind Your VLA: Aligning Visual Representations For OOD Generalization (2025) • No Venue
Kachaev et al.
First Try Matters: Revisiting The Role Of Reflection In Reasoning Models (2025) • No Venue
Kang et al.
Marigold: Affordable Adaptation Of Diffusion-based Image Generators For Image Analysis (2025) • No Venue
Ke et al.
Piper: On-device Environment Setup Via Online Reinforcement Learning (2025) • No Venue
Kovrigin et al.
Robot-r1: Reinforcement Learning For Enhanced Embodied Reasoning In Robotics (2025) • No Venue
Kim et al.
Universal Reasoner: A Single, Composable Plug-and-play Reasoner For Frozen Llms (2025) • No Venue
Kim et al.
Temporal In-context Fine-tuning For Versatile Control Of Video Diffusion Models (2025) • No Venue
Kinam Kim, Junha Hyung, Jaegul Choo
Cadrille: Multi-modal CAD Reconstruction With Online Reinforcement Learning (2025) • No Venue
Kolodiazhnyi et al.
Stream3r: Scalable Sequential 3D Reconstruction With Causal Transformer (2025) • No Venue
Lan et al.
Fedsvd: Adaptive Orthogonalization For Private Federated Learning With Lora (2025) • No Venue
Lee et al.
MMR1: Enhancing Multimodal Reasoning With Variance-aware Sampling And Open Resources (2025) • No Venue
Leng et al.
Autotriton: Automatic Triton Programming With Reinforcement Learning In Llms (2025) • No Venue
Li et al.
Baichuan-omni-1.5 Technical Report (2025) • No Venue
Li et al.
Confidence Is All You Need: Few-shot RL Fine-tuning Of Language Models (2025) • No Venue
Li et al.
How Instruction And Reasoning Data Shape Post-training: Data Quality Through The Lens Of Layer-wise Gradients (2025) • No Venue
Li et al.
Llms Can Easily Learn To Reason From Demonstrations: Structure, Not Content, Is What Matters! (2025) • No Venue
Li et al.
Veripo: Cultivating Long Reasoning In Video-llms Via Verifier-guided Iterative Policy Optimization (2025) • No Venue
Li et al.
Mol-r1: Towards Explicit Long-cot Reasoning In Molecule Discovery (2025) • No Venue
Li et al.
Routing Manifold Alignment Improves Generalization Of Mixture-of-experts Llms (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
Radial Attention: O(n log n) Sparse Attention With Energy Decay For Long Video Generation (2025) • No Venue
Li et al.
START: Self-taught Reasoner With Tools (2025) • No Venue
Li et al.
Small Models Struggle To Learn From Strong Reasoners (2025) • No Venue
Li et al.
Sos1: O1 And R1-like Reasoning Llms Are Sum-of-square Solvers (2025) • No Venue
Li et al.
Uniworld-v2: Reinforce Image Editing With Diffusion Negative-aware Finetuning And MLLM Implicit Feedback (2025) • No Venue
Li et al.
Tempsamp-r1: Effective Temporal Sampling With Reinforcement Fine-tuning For Video Llms (2025) • No Venue
Li et al.
Taming Llms By Scaling Learning Rates With Gradient Grouping (2025) • No Venue
Li et al.
SWE-SQL: Illuminating LLM Pathways To Solve User SQL Issues In Real-world Applications (2025) • No Venue
Li et al.
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data (2025) • No Venue
Li et al.
VLA-RFT: Vision-language-action Reinforcement Fine-tuning With Verified Rewards In World Simulators (2025) • No Venue
Li et al.
Zebra-cot: A Dataset For Interleaved Vision Language Reasoning (2025) • No Venue
Li et al.
Drag-and-drop Llms: Zero-shot Prompt-to-weights (2025) • No Venue
Liang et al.
Modomodo: Multi-domain Data Mixtures For Multimodal LLM Reinforcement Learning (2025) • No Venue
Liang et al.
Improved Visual-spatial Reasoning Via R1-zero-like Training (2025) • No Venue
Liao et al.
Motif 2 12.7B Technical Report (2025) • No Venue
Lim et al.
Jarvisart: Liberating Human Artistic Creativity Via An Intelligent Photo Retouching Agent (2025) • No Venue
Lin et al.
Critique-coder: Enhancing Coder Models By Critique Reinforcement Learning (2025) • No Venue
Ruan et al.
Beyond Distillation: Pushing The Limits Of Medical LLM Reasoning With Minimalist Rule-based RL (2025) • No Venue
Liu et al.
Acereason-nemotron 1.1: Advancing Math And Code Reasoning Through SFT And RL Synergy (2025) • No Venue
Liu et al.
Fin-r1: A Large Language Model For Financial Reasoning Through Reinforcement Learning (2025) • No Venue
Liu et al.
Guardreasoner: Towards Reasoning-based LLM Safeguards (2025) • No Venue
Liu et al.
Infiguiagent: A Multimodal Generalist GUI Agent With Native Reasoning And Reflection (2025) • No Venue
Liu et al.
Visual-rft: Visual Reinforcement Fine-tuning (2025) • No Venue
Liu et al.
Shotbench: Expert-level Cinematic Understanding In Vision-language Models (2025) • No Venue
Liu et al.
Medsam3: Delving Into Segment Anything With Medical Concepts (2025) • No Venue
Liu et al.
Pairwise RM: Perform Best-of-n Sampling With Knockout Tournament (2025) • No Venue
Liu et al.
Othink-mr1: Stimulating Multimodal Generalized Reasoning Capabilities Via Dynamic Reinforcement Learning (2025) • No Venue
Liu et al.
Reasonrank: Empowering Passage Ranking With Strong Reasoning Ability (2025) • No Venue
Liu et al.
Webexplorer: Explore And Evolve For Training Long-horizon Web Agents (2025) • No Venue
Liu et al.
Don't Just Fine-tune The Agent, Tune The Environment (2025) • No Venue
Lu et al.
Omnicaptioner: One Captioner To Rule Them All (2025) • No Venue
Lu et al.
Learning From Peers In Reasoning Models (2025) • No Venue
Luo et al.
Beyond English: Toward Inclusive And Scalable Multilingual Machine Translation With Llms (2025) • No Venue
Luo et al.
O1-pruner: Length-harmonizing Fine-tuning For O1-like Reasoning Pruning (2025) • No Venue
Luo et al.
URSA: Understanding And Verifying Chain-of-thought Reasoning In Multimodal Mathematics (2025) • No Venue
Luo et al.
Towards A Unified View Of Large Language Model Post-training (2025) • No Venue
Lv et al.
Deepperception: Advancing R1-like Cognitive Visual Perception In Mllms For Knowledge-intensive Visual Grounding (2025) • No Venue
Ma et al.
TCIA: A Task-centric Instruction Augmentation Method For Instruction Finetuning (2025) • No Venue
Ma et al.
SQL-R1: Training Natural Language To SQL Reasoning Model By Reinforcement Learning (2025) • No Venue
Ma et al.
S$^2$R: Teaching Llms To Self-verify And Self-correct Via Reinforcement Learning (2025) • No Venue
Ma et al.
Tool-integrated Reinforcement Learning For Repo Deep Search (2025) • No Venue
Ma et al.
Unirl: Self-improving Unified Multimodal Models Via Supervised And Reinforcement Learning (2025) • No Venue
Weijia Mao, Zhenheng Yang, Mike Zheng Shou
I Think, Therefore I Diffuse: Enabling Multimodal In-context Reasoning In Diffusion Models (2025) • No Venue
Mi et al.
Easy Dataset: A Unified And Extensible Framework For Synthesizing LLM Fine-tuning Data From Unstructured Documents (2025) • No Venue
Miao et al.
S1: Simple Test-time Scaling (2025) • No Venue
Muennighoff et al.
Leveraging Self-attention For Input-dependent Soft Prompting In Llms (2025) • No Venue
Ananth Muppidi, Abhilash Nandy, Sambaran Bandyopadhyay
Adaptivocab: Enhancing LLM Efficiency In Focused Domains Through Lightweight Vocabulary Adaptation (2025) • No Venue
Nakash et al.
Viscoder: Fine-tuning Llms For Executable Python Visualization Code Generation (2025) • No Venue
Ni et al.
Mineru2.5: A Decoupled Vision-language Model For Efficient High-resolution Document Parsing (2025) • No Venue
Niu et al.
DINO-R1: Incentivizing Reasoning Capability In Vision Foundation Models (2025) • No Venue
Pan et al.
Omnimanip: Towards General Robotic Manipulation Via Object-centric Interaction Primitives As Spatial Constraints (2025) • No Venue
Pan et al.
BOLT: Bootstrap Long Chain-of-thought In Language Models Without Distillation (2025) • No Venue
Pang et al.
Thinking Sparks!: Emergent Attention Heads In Reasoning Models During Post Training (2025) • No Venue
Yein Park, Minbyul Jeong, Jaewoo Kang
Mathfusion: Enhancing Mathematic Problem-solving Of LLM Through Instruction Fusion (2025) • No Venue
Pei et al.
Skywork R1V: Pioneering Multimodal Reasoning With Chain-of-thought (2025) • No Venue
Peng et al.
Criticlean: Critic-guided Reinforcement Learning For Mathematical Formalization (2025) • No Venue
Peng et al.
Unconditional Priors Matter! Improving Conditional Generation Of Fine-tuned Diffusion Models (2025) • No Venue
Phunyaphibarn et al.
How Much Knowledge Can You Pack Into A Lora Adapter Without Harming LLM? (2025) • No Venue
Pletenev et al.
Toolrl: Reward Is All Tool Learning Needs (2025) • No Venue
Qian et al.
Defeating The Training-inference Mismatch Via FP16 (2025) • No Venue
Qi et al.
AR-RAG: Autoregressive Retrieval Augmentation For Image Generation (2025) • No Venue
Qi et al.
Fino1: On The Transferability Of Reasoning Enhanced Llms To Finance (2025) • No Venue
Qian et al.
Optimizing Test-time Compute Via Meta Reinforcement Fine-tuning (2025) • No Venue
Qu et al.
Apriel-1.5-15b-thinker (2025) • No Venue
Radhakrishna et al.
REFINE-AF: A Task-agnostic Framework To Align Language Models Via Self-generated Instructions Using Reinforcement Learning From Automated Feedback (2025) • No Venue
Roy et al.
Through The Looking Glass: Common Sense Consistency Evaluation Of Weird Images (2025) • No Venue
Rykov et al.
Llms Are Greedy Agents: Effects Of RL Fine-tuning On Decision-making Abilities (2025) • No Venue
Schmied et al.
Seaweed-7b: Cost-effective Training Of Video Generation Foundation Model (2025) • No Venue
Seawead et al.
Efficient Personalization Of Quantized Diffusion Model Without Backpropagation (2025) • No Venue
Seo et al.
Longrope2: Near-lossless LLM Context Window Scaling (2025) • No Venue
Shang et al.
Nile-chat: Egyptian Language Models For Arabic And Latin Scripts (2025) • No Venue
Shang et al.
Deep Research: A Systematic Survey (2025) • No Venue
Shi et al.
Negative-guided Subject Fidelity Optimization For Zero-shot Subject-driven Generation (2025) • No Venue
Shin et al.
Taskcraft: Automated Generation Of Agentic Tasks (2025) • No Venue
Shi et al.
T-lora: Single Image Diffusion Model Customization Without Overfitting (2025) • No Venue
Soboleva et al.
Agent Data Protocol: Unifying Datasets For Diverse, Effective Fine-tuning Of LLM Agents (2025) • No Venue
Song et al.
Makeanything: Harnessing Diffusion Transformers For Multi-domain Procedural Sequence Generation (2025) • No Venue
Yiren Song, Cheng Liu, Mike Zheng Shou
RL Makes Mllms See Better Than SFT (2025) • No Venue
Song et al.
Alchemist: Turning Public Text-to-image Data Into Generative Gold (2025) • No Venue
Startsev et al.
Video-lmm Post-training: A Deep Dive Into Video Reasoning With Large Multimodal Models (2025) • No Venue
Tang et al.
Klear-reasoner: Advancing Reasoning Capability Via Gradient-preserving Clipping Policy Optimization (2025) • No Venue
Su et al.
Multiagent Finetuning: Self Improvement With Diverse Reasoning Chains (2025) • No Venue
Subramaniam et al.
Stop Overthinking: A Survey On Efficient Reasoning For Large Language Models (2025) • No Venue
Sui et al.
Reasonmed: A 370K Multi-agent Generated Dataset For Advancing Medical Reasoning (2025) • No Venue
Sun et al.
The Curse Of Depth In Large Language Models (2025) • No Venue
Sun et al.
Zerosearch: Incentivize The Search Capability Of Llms Without Searching (2025) • No Venue
Sun et al.
Gemini Robotics: Bringing AI Into The Physical World (2025) • No Venue
Team et al.
Kanana: Compute-efficient Bilingual Language Models (2025) • No Venue
Team et al.
Minicpm4: Ultra-efficient Llms On End Devices (2025) • No Venue
Team et al.
Mimo: Unlocking The Reasoning Potential Of Language Model -- From Pretraining To Posttraining (2025) • No Venue
Team et al.
Modernvbert: Towards Smaller Visual Document Retrievers (2025) • No Venue
Teiletche et al.
Mmada-parallel: Multimodal Large Diffusion Language Models For Thinking-aware Editing And Generation (2025) • No Venue
Tian et al.
Ego-r1: Chain-of-tool-thought For Ultra-long Egocentric Video Reasoning (2025) • No Venue
Tian et al.
Improving Text Embeddings For Smaller Language Models Using Contrastive Fine-tuning (2024) • No Venue
Trapoom Ukarapol, Zhicheng Lee, Amy Xin
Meltemi: The First Open Large Language Model For Greek (2024) • No Venue
Voukoutis et al.
Astraios: Parameter-efficient Instruction Tuning Code Large Language Models (2024) • No Venue
Zhuo et al.
Fusechat: Knowledge Fusion Of Chat Models (2024) • No Venue
Wan et al.
Qwen2.5 Technical Report (2024) • No Venue
Qwen et al.
Alignment Studio: Aligning Large Language Models To Particular Contextual Regulations (2024) • No Venue
Achintalwar et al.
Yi: Open Foundation Models By 01.AI (2024) • No Venue
Ai et al.
Maya: An Instruction Finetuned Multilingual Multimodal Model (2024) • No Venue
Alam et al.
Seed-tts: A Family Of High-quality Versatile Speech Generation Models (2024) • No Venue
Anastassiou et al.
Aya 23: Open Weight Releases To Further Multilingual Progress (2024) • No Venue
Aryabumi et al.
Longwriter: Unleashing 10,000+ Word Generation From Long Context Llms (2024) • No Venue
Bai et al.
Longalign: A Recipe For Long Context Alignment Of Large Language Models (2024) • No Venue
Bai et al.
Digirl: Training In-the-wild Device-control Agents With Autonomous Reinforcement Learning (2024) • No Venue
Bai et al.
From Generalist To Specialist: Adapting Vision Language Models Via Task-specific Visual Instruction Tuning (2024) • No Venue
Bai et al.
Skywork-math: Data Scaling Laws For Mathematical Reasoning In Large Language Models -- The Story Goes On (2024) • No Venue
Zeng et al.
LLM Augmented Llms: Expanding Capabilities Through Composition (2024) • No Venue
Bansal et al.
Fintral: A Family Of GPT-4 Level Multimodal Financial Large Language Models (2024) • No Venue
Bhatia et al.
Speculative Streaming: Fast LLM Inference Without Auxiliary Models (2024) • No Venue
Bhendawade et al.
Lora Learns Less And Forgets Less (2024) • No Venue
Biderman et al.
$\pi_0$: A Vision-language-action Flow Model For General Robot Control (2024) • Robotics: Science and Systems 2025 • 48 citations
Black et al.
Biomedlm: A 2.7B Parameter Language Model Trained On Biomedical Text (2024) • No Venue
Bolton et al.
Internlm2 Technical Report (2024) • No Venue
Cai et al.
Medusa: Simple LLM Inference Acceleration Framework With Multiple Decoding Heads (2024) • No Venue
Cai et al.
Data Is All You Need: Finetuning Llms For Chip Design Via An Automated Design-data Augmentation Framework (2024) • DAC '24: 61st ACM/IEEE Design Automation Conference • 43 citations
Chang et al.
Chatmusician: Understanding And Generating Music Intrinsically With LLM (2024) • No Venue
Yuan et al.
Getting It Right: Improving Spatial Consistency In Text-to-image Models (2024) • No Venue
Chatterjee et al.
Tx-llm: A Large Language Model For Therapeutics (2024) • No Venue
Chaves et al.
Compcap: Improving Multimodal Large Language Models With Composite Captions (2024) • No Venue
Chen et al.
Florence-vl: Enhancing Vision-language Models With Generative Vision Encoder And Depth-breadth Fusion (2024) • No Venue
Chen et al.
Huatuogpt-vision, Towards Injecting Medical Visual Knowledge Into Multimodal Llms At Scale (2024) • No Venue
Chen et al.
Language Models Are Hidden Reasoners: Unlocking Latent Reasoning Capabilities Via Self-rewarding (2024) • No Venue
Chen et al.
Moto: Latent Motion Token As The Bridging Language For Robot Manipulation (2024) • No Venue
Chen et al.
Mj-bench: Is Your Multimodal Reward Model Really A Good Judge For Text-to-image Generation? (2024) • No Venue
Chen et al.
Octo-planner: On-device Language Model For Planner-action Agents (2024) • No Venue
Chen et al.
Reverse Thinking Makes Llms Stronger Reasoners (2024) • No Venue
Chen et al.
Self-play Fine-tuning Converts Weak Language Models To Strong Language Models (2024) • No Venue
Chen et al.
Videocrafter2: Overcoming Data Limitations For High-quality Video Diffusion Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Chen et al.
Visionts: Visual Masked Autoencoders Are Free-lunch Zero-shot Time Series Forecasters (2024) • No Venue
Chen et al.
Xtrimopglm: Unified 100b-scale Pre-trained Transformer For Deciphering The Language Of Protein (2024) • Arxiv • 61 citations
Chen et al.
Advancing LLM Reasoning Generalists With Preference Trees (2024) • No Venue
Yuan et al.
MLLM As Retriever: Interactively Learning Multimodal Retrieval For Embodied Agents (2024) • No Venue
Yue et al.
ANOLE: An Open, Autoregressive, Native Large Multimodal Models For Interleaved Image-text Generation (2024) • No Venue
Chern et al.
Self-rewarding Language Models (2024) • No Venue
Yuan et al.
Instruction Pre-training: Language Models Are Supervised Multitask Learners (2024) • No Venue
Cheng et al.
On Domain-specific Post-training For Multimodal Large Language Models (2024) • No Venue
Cheng et al.
Llamafactory: Unified Efficient Fine-tuning Of 100+ Language Models (2024) • No Venue
Zheng et al.
Multi-lora Composition For Image Generation (2024) • No Venue
Zhong et al.
Med42-v2: A Suite Of Clinical Llms (2024) • No Venue
Christophe et al.
Harnessing Large Language Models For Text-rich Sequential Recommendation (2024) • WWW '24: The ACM Web Conference 2024 • 45 citations
Zheng et al.
Beyond Fine-tuning: Unleashing The Potential Of Continuous Pretraining For Clinical Llms (2024) • No Venue
Christophe et al.
Heavy Labels Out! Dataset Distillation With Label Space Lightening (2024) • No Venue
Yu et al.
Jacolbertv2.5: Optimising Multi-vector Retrievers To Create State-of-the-art Japanese Retrievers With Constrained Resources (2024) • No Venue
Benjamin Clavié
Saullm-54b & Saullm-141b: Scaling Up Domain Adaptation For The Legal Domain (2024) • No Venue
Colombo et al.
Towards A Personal Health Large Language Model (2024) • No Venue
Cosentino et al.
NVLM: Open Frontier-class Multimodal Llms (2024) • No Venue
Dai et al.
Swiftbrush V2: Make Your One-step Diffusion Model Better Than Its Teacher (2024) • No Venue
Dao et al.
Molmo And Pixmo: Open Weights And Open Data For State-of-the-art Multimodal Models (2024) • No Venue
Deitke et al.
Unleashing Reasoning Capability Of Llms Via Scalable Question Synthesis From Scratch (2024) • No Venue
Ding et al.
Longrope: Extending LLM Context Window Beyond 2 Million Tokens (2024) • No Venue
Ding et al.
Internlm-math: Open Math Large Language Models Toward Verifiable Reasoning (2024) • No Venue
Ying et al.
Baichuanseed: Sharing The Potential Of Extensive Data Collection And Deduplication By Introducing A Competitive Large Language Model Baseline (2024) • No Venue
Dong et al.
Generative Inbetweening: Adapting Image-to-video Models For Keyframe Interpolation (2024) • No Venue
Wang et al.
Dreamrunner: Fine-grained Storytelling Video Generation With Retrieval-augmented Motion Adaptation (2024) • No Venue
Wang et al.
Autotrain: No-code Training For State-of-the-art Models (2024) • No Venue
Abhishek Thakur
Toward Self-improvement Of Llms Via Imagination, Searching, And Criticizing (2024) • No Venue
Tian et al.
Enhancing The Reasoning Ability Of Multimodal Large Language Models Via Mixed Preference Optimization (2024) • No Venue
Wang et al.
Llms In The Imaginarium: Tool Learning Through Simulated Trial And Error (2024) • No Venue
Wang et al.
Multilingual E5 Text Embeddings: A Technical Report (2024) • No Venue
Wang et al.
Llama-mesh: Unifying 3D Mesh Generation With Language Models (2024) • No Venue
Wang et al.
Octo: An Open-source Generalist Robot Policy (2024) • No Venue
Team et al.
Model Surgery: Modulating Llm's Behavior Via Simple Parameter Editing (2024) • No Venue
Wang et al.
MIO: A Foundation Model On Multimodal Tokens (2024) • No Venue
Wang et al.
Let The Expert Stick To His Last: Expert-specialized Fine-tuning For Sparse Architectural Large Language Models (2024) • No Venue
Wang et al.
Lift: Leveraging Human Feedback For Text-to-video Model Alignment (2024) • No Venue
Wang et al.
Branch-train-mix: Mixing Expert Llms Into A Mixture-of-experts LLM (2024) • No Venue
Sukhbaatar et al.
Parrot: Multilingual Visual Instruction Tuning (2024) • No Venue
Sun et al.
Video-star: Self-training Enables Video Instruction Tuning With Any Supervision (2024) • No Venue
Zohar et al.
Weaver: Foundation Models For Creative Writing (2024) • No Venue
Wang et al.
Self-training With Direct Preference Optimization Improves Chain-of-thought Reasoning (2024) • No Venue
Tianduo Wang, Shichen Li, Wei Lu
Q-sparse: All Large Language Models Can Be Fully Sparsely-activated (2024) • No Venue
Wang et al.
Lloco: Learning Long Contexts Offline (2024) • No Venue
Tan et al.
Atlas-chat: Adapting Large Language Models For Low-resource Moroccan Arabic Dialect (2024) • No Venue
Shang et al.
Watermarking Makes Language Models Radioactive (2024) • No Venue
Sander et al.
Caduceus: Bi-directional Equivariant Long-range DNA Sequence Modeling (2024) • Arxiv • 47 citations
Schiff et al.
Prithvi Wxc: Foundation Model For Weather And Climate (2024) • No Venue
Schmude et al.
Show, Don't Tell: Aligning Language Models With Demonstrated Feedback (2024) • No Venue
Shaikh et al.
Deepseekmath: Pushing The Limits Of Mathematical Reasoning In Open Language Models (2024) • No Venue
Shao et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts At Scale (2024) • No Venue
Zhou et al.
Nemo-aligner: Scalable Toolkit For Efficient Model Alignment (2024) • No Venue
Shen et al.
Aya Model: An Instruction Finetuned Open-access Multilingual Language Model (2024) • No Venue
Üstün et al.
Tag-llm: Repurposing General-purpose Llms For Specialized Domains (2024) • No Venue
Shen et al.
PERL: Parameter Efficient Reinforcement Learning From Human Feedback (2024) • No Venue
Sidahmed et al.
Design2code: How Far Are We From Automating Front-end Engineering? (2024) • No Venue
Si et al.
Aya Dataset: An Open-access Collection For Multilingual Instruction Tuning (2024) • No Venue
Singh et al.
A Large Encoder-decoder Family Of Foundation Models For Chemical Language (2024) • No Venue
Soares et al.
Both Text And Images Leaked! A Systematic Analysis Of Multimodal LLM Data Contamination (2024) • No Venue
Song et al.
How To Synthesize Text Data Without Model Collapse? (2024) • No Venue
Zhu et al.
Fine Tuning Vs. Retrieval Augmented Generation For Less Popular Knowledge (2024) • Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 40 citations
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
LLM Pruning And Distillation In Practice: The Minitron Approach (2024) • No Venue
Sreenivas et al.
Canttalkaboutthis: Aligning Language Models To Stay On Topic In Dialogues (2024) • No Venue
Sreedhar et al.
Scaling Granite Code Models To 128K Context (2024) • No Venue
Stallone et al.
Paligemma 2: A Family Of Versatile Vlms For Transfer (2024) • No Venue
Steiner et al.
Jina-embeddings-v3: Multilingual Embeddings With Task Lora (2024) • No Venue
Sturua et al.
Bitdelta: Your Fine-tune May Only Be Worth One Bit (2024) • No Venue
Liu et al.
CLEAR: Conv-like Linearization Revs Pre-trained Diffusion Transformers Up (2024) • No Venue
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Lumina-mgpt: Illuminate Flexible Photorealistic Text-to-image Generation With Multimodal Generative Pretraining (2024) • No Venue
Liu et al.
Llms + Persona-plug = Personalized Llms (2024) • No Venue
Liu et al.
NVILA: Efficient Frontier Visual Language Models (2024) • No Venue
Liu et al.
MMDU: A Multi-turn Multi-image Dialog Understanding Benchmark And Instruction-tuning Dataset For Lvlms (2024) • No Venue
Liu et al.
POINTS: Improving Your Vision-language Model With Affordable Strategies (2024) • No Venue
Liu et al.
Tuning Language Models By Proxy (2024) • No Venue
Liu et al.
Understanding Llms: A Comprehensive Overview From Training To Inference (2024) • No Venue
Liu et al.
MM1.5: Methods, Analysis & Insights From Multimodal LLM Fine-tuning (2024) • No Venue
Zhang et al.
Lora Land: 310 Fine-tuned Llms That Rival GPT-4, A Technical Report (2024) • No Venue
Zhao et al.
Self-play Preference Optimization For Language Model Alignment (2024) • No Venue
Wu et al.
Pixel-space Post-training Of Latent Diffusion Models (2024) • No Venue
Zhang et al.
Q-galore: Quantized Galore With INT4 Projection And Layer-adaptive Low-rank Gradients (2024) • No Venue
Zhang et al.
Self-exploring Language Models: Active Preference Elicitation For Online Alignment (2024) • No Venue
Zhang et al.
Deepseek-vl: Towards Real-world Vision-language Understanding (2024) • No Venue
Lu et al.
A Controlled Study On Long Context Extension And Generalization In Llms (2024) • No Venue
Lu et al.
Mathcoder2: Better Math Reasoning From Continued Pretraining On Model-translated Mathematical Code (2024) • No Venue
Lu et al.
Step-controlled DPO: Leveraging Stepwise Error For Enhanced Mathematical Reasoning (2024) • No Venue
Lu et al.
When Scaling Meets LLM Finetuning: The Effect Of Data, Model And Finetuning Method (2024) • No Venue
Zhang et al.
Soaring From 4K To 400K: Extending Llm's Context With Activation Beacon (2024) • No Venue
Zhang et al.
Divide-or-conquer? Which Part Should You Distill Your LLM? (2024) • No Venue
Wu et al.
Reft: Representation Finetuning For Language Models (2024) • No Venue
Wu et al.
Semievol: Semi-supervised Fine-tuning For LLM Adaptation (2024) • No Venue
Luo et al.
Robustft: Robust Supervised Fine-tuning For Large Language Models Under Noisy Response (2024) • No Venue
Luo et al.
Large Language Models Surpass Human Experts In Predicting Neuroscience Results (2024) • Nature Human Behaviour • 57 citations
Luo et al.
Galore: Memory-efficient LLM Training By Gradient Low-rank Projection (2024) • No Venue
Zhao et al.
A Survey To Recent Progress Towards Understanding In-context Learning (2024) • Frontiers of Computer Science • 40 citations
Mao et al.
Reft: Reasoning With Reinforced Fine-tuning (2024) • No Venue
Luong et al.
WILBUR: Adaptive In-context Learning For Robust And Accurate Web Agents (2024) • No Venue
Lutz et al.
Llmparser: An Exploratory Study On Using Large Language Models For Log Parsing (2024) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 49 citations
Ma et al.
Netllm: Adapting Large Language Models For Networking (2024) • ACM SIGCOMM '24: ACM SIGCOMM 2024 Conference • 73 citations
Wu et al.
Llama Pro: Progressive Llama With Block Expansion (2024) • No Venue
Wu et al.
Openmedlm: Prompt Engineering Can Out-perform Fine-tuning In Medical Question-answering With Open-source Large Language Models (2024) • Scientific Reports • 53 citations
Maharjan et al.
Exploring The Capabilities And Limitations Of Large Language Models In The Electric Energy Sector (2024) • Joule • 69 citations
Majumder et al.
Diffusekrona: A Parameter Efficient Fine-tuning Method For Personalized Diffusion Model (2024) • No Venue
Marjit et al.
Improving Text-to-image Consistency Via Automatic Prompt Optimization (2024) • No Venue
Mañas et al.
MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training (2024) • No Venue
McKinzie et al.
Openelm: An Efficient Language Model Family With Open-source Training And Inference Framework (2024) • No Venue
Mehta et al.
Orca-math: Unlocking The Potential Of Slms In Grade School Math (2024) • No Venue
Mitra et al.
Grouse: A Benchmark To Evaluate Evaluators In Grounded Question Answering (2024) • No Venue
Muller et al.
Can Llms Learn By Teaching? A Preliminary Study (2024) • No Venue
Ning et al.
H2o-danube3 Technical Report (2024) • No Venue
Pfeiffer et al.
Llamaduo: Llmops Pipeline For Seamless Migration From Service Llms To Small-scale Local Llms (2024) • No Venue
Park et al.
Datadreamer: A Tool For Synthetic Data Generation And Reproducible LLM Workflows (2024) • No Venue
Ajay Patel, Colin Raffel, Chris Callison-Burch
Advprompter: Fast Adaptive Adversarial Prompting For Llms (2024) • No Venue
Paulus et al.
RAFT: Adapting Language Model To Domain Specific RAG (2024) • No Venue
Zhang et al.
Personalized Visual Instruction Tuning (2024) • No Venue
Pi et al.
Fine-tuning And Prompt Engineering For Large Language Models-based Code Review Automation (2024) • Information and Software Technology • 43 citations
Chanathip Pornprasit, Chakkrit Tantithamthavorn
Marco-o1: Towards Open Reasoning Models For Open-ended Solutions (2024) • No Venue
Zhao et al.
In-context Editing: Learning Knowledge From Self-induced Distributions (2024) • No Venue
Qi et al.
VISTA: Enhancing Long-duration And High-resolution Video Understanding By Video Spatiotemporal Augmentation (2024) • No Venue
Ren et al.
Selfcodealign: Self-alignment For Code Generation (2024) • No Venue
Wei et al.
Direct Nash Optimization: Teaching Language Models To Self-improve With General Preferences (2024) • No Venue
Rosset et al.
RLHF Workflow: From Reward Modeling To Online RLHF (2024) • No Venue
Dong et al.
Unlocking Continual Learning Abilities In Language Models (2024) • No Venue
Du et al.
Layerskip: Enabling Early Exit Inference And Self-speculative Decoding (2024) • No Venue
Elhoushi et al.
Mmfactory: A Universal Solution Search Engine For Vision-language Tasks (2024) • No Venue
Wan-Cyuan Fan, Tanzila Rahman, Leonid Sigal
VILA²: VILA Augmented VILA (2024) • No Venue
Fang et al.
Processbench: Identifying Process Errors In Mathematical Reasoning (2024) • No Venue
Zheng et al.
Fuzzcoder: Byte-level Fuzzing Test Via Large Language Model (2024) • No Venue
Yang et al.
Croissantllm: A Truly Bilingual French-english Language Model (2024) • No Venue
Faysse et al.
Enhancing Video-language Representations With Structural Spatio-temporal Alignment (2024) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 49 citations
Fei et al.
Physics Of Language Models: Part 2.2, How To Learn From Mistakes On Grade-school Math Problems (2024) • No Venue
Ye et al.
RAG Foundry: A Framework For Enhancing Llms For Retrieval Augmented Generation (2024) • No Venue
Fleischer et al.
Aligning Diffusion Models With Noise-conditioned Perception (2024) • No Venue
Gambashidze et al.
Stream Of Search (sos): Learning To Search In Language (2024) • No Venue
Gandhi et al.
Efficient Tool Use With Chain-of-abstraction Reasoning (2024) • No Venue
Gao et al.
Seerattention: Learning Intrinsic Sparse Attention In Your Llms (2024) • No Venue
Gao et al.
Paecter: Patent-level Representation Learning Using Citation-informed Transformers (2024) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Ghosh et al.
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities (2024) • No Venue
Ghosh et al.
Chatglm: A Family Of Large Language Models From GLM-130B To GLM-4 All Tools (2024) • No Venue
Glm et al.
Knesset-dictabert: A Hebrew Language Model For Parliamentary Proceedings (2024) • No Venue
Gili Goldin, Shuly Wintner
Mulberry: Empowering MLLM With O1-like Reasoning And Reflection Via Collective Monte Carlo Tree Search (2024) • No Venue
Yao et al.
The Unreasonable Ineffectiveness Of The Deeper Layers (2024) • No Venue
Gromov et al.
Model Merging And Safety Alignment: One Bad Model Spoils The Bunch (2024) • No Venue
Hammoud et al.
Parameter-efficient Fine-tuning For Large Models: A Comprehensive Survey (2024) • Arxiv • 81 citations
Han et al.
Teaching Large Language Models To Reason With Reinforcement Learning (2024) • No Venue
Havrilla et al.
Distill Visual Chart Reasoning Ability From Llms To Mllms (2024) • No Venue
He et al.
Qwen2 Technical Report (2024) • No Venue
Yang et al.
Distilling An End-to-end Voice Assistant Without Instruction Training Data (2024) • No Venue
Held et al.
Instruction Following Without Instruction Tuning (2024) • No Venue
Hewitt et al.
ORPO: Monolithic Preference Optimization Without Reference Model (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 43 citations
Jiwoo Hong, Noah Lee, James Thorne
Onetracker: Unifying Visual Object Tracking With Foundation Models And Efficient Tuning (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 71 citations
Hong et al.
Not All LLM Reasoners Are Created Equal (2024) • No Venue
Hosseini et al.
Evaluating And Aligning Codellms On Human Preference (2024) • No Venue
Yang et al.
No More Adam: Learning Rate Scaling At Initialization Is All You Need (2024) • No Venue
Xu et al.
Adapting Visual-language Models For Generalizable Anomaly Detection In Medical Images (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Huang et al.
O1 Replication Journey -- Part 2: Surpassing O1-preview Through Simple Distillation, Big Progress Or Bitter Lesson? (2024) • No Venue
Huang et al.
How Good Are Low-bit Quantized Llama3 Models? An Empirical Study (2024) • No Venue
Huang et al.
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2024) • No Venue
Huang et al.
Mv-adapter: Multi-view Consistent Image Generation Made Easy (2024) • No Venue
Huang et al.
Opencoder: The Open Cookbook For Top-tier Code Large Language Models (2024) • No Venue
Huang et al.
Extending Llama-3's Context Ten-fold Overnight (2024) • No Venue
Zhang et al.
Affordance-based Robot Manipulation With Flow Matching (2024) • No Venue
Fan Zhang, Michael Gienger
Smaller Language Models Are Better Instruction Evolvers (2024) • No Venue
Hui et al.
Scaling Laws For Downstream Task Performance Of Large Language Models (2024) • No Venue
Isik et al.
Modulated Intervention Preference Optimization (MIPO): Keep The Easy, Refine The Difficult (2024) • No Venue
Cheolhun Jang
Instruction-tuned Language Models Are Better Knowledge Learners (2024) • No Venue
Jiang et al.
Comat: Aligning Text-to-image Diffusion Model With Image-to-text Concept Matching (2024) • No Venue
Jiang et al.
E5-V: Universal Embeddings With Multimodal Large Language Models (2024) • No Venue
Jiang et al.
Mora: High-rank Updating For Parameter-efficient Fine-tuning (2024) • No Venue
Jiang et al.
Vineppo: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment (2024) • No Venue
Kazemnejad et al.
ATHAR: A High-quality And Diverse Dataset For Classical Arabic To English Translation (2024) • No Venue
Mohammed Khalil, Mohammed Sabry
Openvla: An Open-source Vision-language-action Model (2024) • No Venue
Kim et al.
Understanding Large-language Model (llm)-powered Human-robot Interaction (2024) • HRI '24: ACM/IEEE International Conference on Human-Robot Interaction • 75 citations
Callie Y. Kim, Christine P. Lee, Bilge Mutlu
Xgen-mm (BLIP-3): A Family Of Open Large Multimodal Models (2024) • No Venue
Xue et al.
In Search Of Needles In A 10M Haystack: Recurrent Memory Finds What Llms Miss (2024) • No Venue
Kuratov et al.
Training Language Models To Self-correct Via Reinforcement Learning (2024) • No Venue
Kumar et al.
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) • No Venue
Lai et al.
TÜLU 3: Pushing Frontiers In Open Language Model Post-training (2024) • No Venue
Lambert et al.
Pllava: Parameter-free Llava Extension From Images To Videos For Video Dense Captioning (2024) • No Venue
Xu et al.
Closing The Gap Between Open-source And Commercial Large Language Models For Medical Evidence Summarization (2024) • npj Digital Medicine • 45 citations
Zhang et al.
Collavo: Crayon Large Language And Vision Model (2024) • No Venue
Lee et al.
LLM2LLM: Boosting Llms With Novel Iterative Data Enhancement (2024) • No Venue
Lee et al.
Phantom Of Latent For Large Language And Vision Models (2024) • No Venue
Lee et al.
Stronger Models Are NOT Stronger Teachers For Instruction Tuning (2024) • No Venue
Xu et al.
Baichuan-omni Technical Report (2024) • No Venue
Li et al.
Common 7B Language Models Already Possess Strong Math Capabilities (2024) • No Venue
Li et al.
Codes: Towards Building Open-source Language Models For Text-to-sql (2024) • Proceedings of the ACM on Management of Data • 44 citations
Li et al.
Controlnet++: Improving Conditional Controls With Efficient Consistency Feedback (2024) • No Venue
Li et al.
Dotamath: Decomposition Of Thought With Code Assistance And Self-correction For Mathematical Reasoning (2024) • No Venue
Li et al.
Scilitllm: How To Adapt Llms For Scientific Literature Understanding (2024) • No Venue
Li et al.
Mix-ln: Unleashing The Power Of Deeper Layers By Combining Pre-ln And Post-ln (2024) • No Venue
Pengxiang Li, Lu Yin, Shiwei Liu
Your Mixture-of-experts LLM Is Secretly An Embedding Model For Free (2024) • No Venue
Ziyue Li, Tianyi Zhou
Chatglm-math: Improving Math Problem-solving In Large Language Models With A Self-critique Pipeline (2024) • No Venue
Xu et al.
Course-correction: Safety Alignment Using Synthetic Preferences (2024) • No Venue
Xu et al.
Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned Llms With Nothing (2024) • No Venue
Xu et al.
Contrastive Preference Optimization: Pushing The Boundaries Of LLM Performance In Machine Translation (2024) • No Venue
Xu et al.
Adam-mini: Use Fewer Learning Rates To Gain More (2024) • No Venue
Zhang et al.
Gated Slot Attention For Efficient Linear-time Sequence Modeling (2024) • No Venue
Zhang et al.
Longcite: Enabling Llms To Generate Fine-grained Citations In Long-context QA (2024) • No Venue
Zhang et al.
Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models (2024) • No Venue
Zhang et al.
Controllable Text Generation For Large Language Models: A Survey (2024) • No Venue
Liang et al.
Learning To Learn Faster From Human Feedback With Language Model Predictive Control (2024) • No Venue
Liang et al.
Adding Nvme Ssds To Enable And Accelerate 100B Model Fine-tuning On A Single GPU (2024) • No Venue
Liao et al.
Mmed-rag: Versatile Multimodal RAG System For Medical Vision Language Models (2024) • No Venue
Xia et al.
Data-efficient Fine-tuning For Llm-based Recommendation (2024) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 80 citations
Lin et al.
Baichuan Alignment Technical Report (2024) • No Venue
Lin et al.
Critical Tokens Matter: Token-level Contrastive Estimation Enhances Llm's Reasoning Capability (2024) • No Venue
Lin et al.
FLAME: Factuality-aware Alignment For Large Language Models (2024) • No Venue
Lin et al.
Rho-1: Not All Tokens Are What You Need (2024) • No Venue
Lin et al.
Deepseek-prover-v1.5: Harnessing Proof Assistant Feedback For Reinforcement Learning And Monte-carlo Tree Search (2024) • No Venue
Xin et al.
Open-finllms: Open Multimodal Large Language Models For Financial Applications (2024) • No Venue
Xie et al.
The Segment Anything Model (SAM) For Remote Sensing Applications: From Zero To One Shot (2023) • International Journal of Applied Earth Observation and Geoinformation • 227 citations
Osco et al.
Fine-tuning Or Retrieval? Comparing Knowledge Injection In Llms (2023) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 50 citations
Ovadia et al.
Chartgpt: Leveraging Llms To Generate Charts From Abstract Natural Language (2023) • IEEE Transactions on Visualization and Computer Graphics • 48 citations
Tian et al.
Instruction Tuning With GPT-4 (2023) • Arxiv • 184 citations
Peng et al.
FP8-LM: Training FP8 Large Language Models (2023) • No Venue
Peng et al.
Yarn: Efficient Context Window Extension Of Large Language Models (2023) • No Venue
Peng et al.
BEST: BERT Pre-training For Sign Language Recognition With Coupling Tokenization (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Zhao et al.
Parameter-efficient Orthogonal Finetuning Via Butterfly Factorization (2023) • No Venue
Liu et al.
Tinygsm: Achieving >80% On Gsm8k With Small Language Models (2023) • No Venue
Liu et al.
Recommender Systems In The Era Of Large Language Models (llms) (2023) • IEEE Transactions on Knowledge and Data Engineering • 183 citations
Zhao et al.
When MOE Meets Llms: Parameter Efficient Fine-tuning For Multi-task Medical Applications (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 49 citations
Liu et al.
CLIP As RNN: Segment Countless Visual Concepts Without Training Endeavor (2023) • No Venue
Sun et al.
All In One: Multi-task Prompting For Graph Neural Networks (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 123 citations
Sun et al.
Florence-2: Advancing A Unified Representation For A Variety Of Vision Tasks (2023) • No Venue
Xiao et al.
Alpha-clip: A CLIP Model Focusing On Wherever You Want (2023) • No Venue
Sun et al.
Sql-palm: Improved Large Language Model Adaptation For Text-to-sql (2023) • No Venue
Sun et al.
Judgelm: Fine-tuned Large Language Models Are Scalable Judges (2023) • No Venue
Lianghui Zhu, Xinggang Wang, Xinlong Wang
The Flan Collection: Designing Data And Methods For Effective Instruction Tuning (2023) • Arxiv • 109 citations
Longpre et al.
Llama-reviewer: Advancing Code Review Automation With Large Language Models Through Parameter-efficient Fine-tuning (2023) • 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE) • 71 citations
Lu et al.
Collaborative Generative AI: Integrating Gpt-k For Efficient Editing In Text-to-image Generation (2023) • WWW '24: The ACM Web Conference 2024 • 58 citations
Zhu et al.
Visual Prompt Multi-modal Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Zhu et al.
Text Classification Via Large Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Sun et al.
Taiyi: A Bilingual Fine-tuned Large Language Model For Diverse Biomedical Tasks (2023) • Journal of the American Medical Informatics Association • 41 citations
Luo et al.
Geolayoutlm: Geometric Pre-training For Visual Information Extraction (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Luo et al.
Wizardcoder: Empowering Code Large Language Models With Evol-instruct (2023) • No Venue
Luo et al.
Chatagri: Exploring Potentials Of Chatgpt On Cross-linguistic Agricultural Text Classification (2023) • Neurocomputing • 96 citations
Zhao et al.
Kosmos-2.5: A Multimodal Literate Model (2023) • No Venue
Lv et al.
Full Parameter Fine-tuning For Large Language Models With Limited Resources (2023) • No Venue
Lv et al.
Bioinspiredllm: Conversational Large Language Model For The Mechanics Of Biological And Bio-inspired Materials (2023) • Advanced Science • 67 citations
Rachel K. Luu, Markus J. Buehler
Fingpt: Large Generative Models For A Small Language (2023) • No Venue
Luukkonen et al.
Beyond Human Data: Scaling Self-training For Problem-solving With Language Models (2023) • No Venue
Singh et al.
Uni-controlnet: All-in-one Control To Text-to-image Diffusion Models (2023) • Arxiv • 65 citations
Zhao et al.
Biomedical Knowledge Graph-optimized Prompt Generation For Large Language Models (2023) • Bioinformatics • 51 citations
Soman et al.
Text-to-sticker: Style Tailoring Latent Diffusion Models For Human Expression (2023) • No Venue
Sinha et al.
A Survey Of Graph Prompting Methods: Techniques, Applications, And Challenges (2023) • World Wide Web • 199 citations
Wu et al.
Dissociating Language And Thought In Large Language Models (2023) • Trends in Cognitive Sciences • 191 citations
Mahowald et al.
3d-vista: Pre-trained Transformer For 3D Vision And Text Alignment (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Zhu et al.
Distilling Large Language Models For Matching Patients To Clinical Trials (2023) • Journal of the American Medical Informatics Association • 44 citations
Nievas et al.
How Effective Are Neural Networks For Fixing Security Vulnerabilities (2023) • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis • 61 citations
Wu et al.
Few-shot Fine-tuning Vs. In-context Learning: A Fair Comparison And Evaluation (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 54 citations
Mosbach et al.
Dreamix: Video Diffusion Models Are General Video Editors (2023) • Arxiv • 43 citations
Molad et al.
Anymal: An Efficient And Scalable Any-modality Augmented Language Model (2023) • No Venue
Moon et al.
Octopack: Instruction Tuning Code Large Language Models (2023) • No Venue
Muennighoff et al.
Pmc-llama: Towards Building Open-source Language Models For Medicine (2023) • Journal of the American Medical Informatics Association • 179 citations
Wu et al.
Towards Expert-level Medical Question Answering With Large Language Models (2023) • Arxiv • 329 citations
Singhal et al.
On The Opportunities And Challenges Of Foundation Models For Geospatial Artificial Intelligence (2023) • Arxiv • 63 citations
Mai et al.
Mathcoder: Seamless Code Integration In Llms For Enhanced Mathematical Reasoning (2023) • No Venue
Wang et al.
On-policy Distillation Of Language Models: Learning From Self-generated Mistakes (2023) • No Venue
Agarwal et al.
Recommending Root-cause And Mitigation Steps For Cloud Incidents Using Large Language Models (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 69 citations
Ahmed et al.
Becoming Self-instruct: Introducing Early Stopping Criteria For Minimal Instruct Tuning (2023) • No Venue
Alshikh et al.
Docllm: A Layout-aware Generative Language Model For Multimodal Document Understanding (2023) • No Venue
Wang et al.
Learning From Mistakes Makes LLM Better Reasoner (2023) • No Venue
An et al.
Huatuo: Tuning Llama Model With Chinese Medical Knowledge (2023) • Arxiv • 93 citations
Wang et al.
Improving Text Embeddings With Large Language Models (2023) • No Venue
Wang et al.
Codet5+: Open Code Large Language Models For Code Understanding And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 215 citations
Wang et al.
Domain-agnostic Tuning-encoder For Fast Personalization Of Text-to-image Models (2023) • SIGGRAPH Asia 2023 Conference Papers • 41 citations
Arar et al.
Foundational Models Defining A New Era In Vision: A Survey And Outlook (2023) • Arxiv • 66 citations
Awais et al.
Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Bai et al.
Learning To Exploit Temporal Structure For Biomedical Vision-language Processing (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Bannur et al.
Tallrec: An Effective And Efficient Tuning Framework To Align Large Language Model With Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 242 citations
Bao et al.
Tablegpt: Towards Unifying Tables, Nature Language And Commands Into One GPT (2023) • No Venue
Zha et al.
Align Your Latents: High-resolution Video Synthesis With Latent Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 438 citations
Blattmann et al.
Distributed Inference And Fine-tuning Of Large Language Models Over The Internet (2023) • No Venue
Borzunov et al.
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control (2023) • No Venue
Brohan et al.
Mechgpt, A Language-based Strategy For Mechanics And Materials Modeling That Connects Knowledge Across Scales, Disciplines And Modalities (2023) • Applied Mechanics Reviews • 74 citations
Markus J. Buehler
Weak-to-strong Generalization: Eliciting Strong Capabilities With Weak Supervision (2023) • No Venue
Burns et al.
Just Tell Me: Prompt Engineering In Business Process Management (2023) • Lecture Notes in Business Information Processing • 52 citations
Busch et al.
Spanish Pre-trained BERT Model And Evaluation Data (2023) • Arxiv • 332 citations
Cañete et al.
Multilora: Democratizing Lora For Better Multi-task Learning (2023) • No Venue
Wang et al.
LLM4TS: Aligning Pre-trained Llms As Data-efficient Time-series Forecasters (2023) • ACM Transactions on Intelligent Systems and Technology • 43 citations
Chang et al.
One-for-all: Generalized Lora For Parameter-efficient Fine-tuning (2023) • No Venue
Chavan et al.
Benchmarking Large Language Models For Biomedical Natural Language Processing Applications And Recommendations (2023) • Nature Communications • 41 citations
Chen et al.
Alpagasus: Training A Better Alpaca With Fewer Data (2023) • No Venue
Chen et al.
Extending Context Window Of Large Language Models Via Positional Interpolation (2023) • No Venue
Chen et al.
Clip2scene: Towards Label-efficient 3D Scene Understanding By CLIP (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Chen et al.
Lorashear: Efficient Large Language Model Structured Pruning And Knowledge Recovery (2023) • No Venue
Chen et al.
Longlora: Efficient Fine-tuning Of Long-context Large Language Models (2023) • No Venue
Chen et al.
Symbolic Discovery Of Optimization Algorithms (2023) • Arxiv • 163 citations
Chen et al.
Soulchat: Improving Llms' Empathy, Listening, And Comfort Abilities Through Fine-tuning With Multi-turn Empathy Conversations (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 43 citations
Chen et al.
Subject-driven Text-to-image Generation Via Apprenticeship Learning (2023) • Arxiv • 46 citations
Chen et al.
Adapting Large Language Models By Integrating Collaborative Semantics For Recommendation (2023) • 2024 IEEE 40th International Conference on Data Engineering (ICDE) • 58 citations
Zheng et al.
Shepherd: A Critic For Language Model Generation (2023) • No Venue
Wang et al.
Low-rank Adaptation Of Large Language Model Rescoring For Parameter-efficient Speech Recognition (2023) • No Venue
Yu et al.
Parameter-efficient Transfer Learning For Remote Sensing Image-text Retrieval (2023) • IEEE Transactions on Geoscience and Remote Sensing • 60 citations
Yuan Yuan, Yang Zhan, Zhitong Xiong
Scaling Relationship On Learning Mathematical Reasoning With Large Language Models (2023) • No Venue
Yuan et al.
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts (2023) • No Venue
Veen et al.
Revisiting Relation Extraction In The Era Of Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Somin Wadhwa, Silvio Amir, Byron C. Wallace
Rolellm: Benchmarking, Eliciting, And Enhancing Role-playing Abilities Of Large Language Models (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 51 citations
Wang et al.
Open-ended Medical Visual Question Answering Through Prefix Tuning Of Language Models (2023) • Lecture Notes in Computer Science • 43 citations
Sonsbeek et al.
A Picture Is Worth More Than 77 Text Tokens: Evaluating Clip-style Models On Dense Captions (2023) • No Venue
Urbanek et al.
Turning A CLIP Model Into A Scene Text Detector (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Yu et al.
Dola: Decoding By Contrasting Layers Improves Factuality In Large Language Models (2023) • No Venue
Chuang et al.
Wavecoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation (2023) • No Venue
Yu et al.
Merlin: Empowering Multimodal Llms With Foresight Minds (2023) • No Venue
Yu et al.
Swinmm: Masked Multi-view With Swin Transformers For 3D Medical Image Segmentation (2023) • Lecture Notes in Computer Science • 41 citations
Wang et al.
Emu: Enhancing Image Generation Models Using Photogenic Needles In A Haystack (2023) • No Venue
Dai et al.
Safe RLHF: Safe Reinforcement Learning From Human Feedback (2023) • No Venue
Dai et al.
Zephyr: Direct Distillation Of LM Alignment (2023) • Arxiv • 51 citations
Tunstall et al.
Qlora: Efficient Finetuning Of Quantized Llms (2023) • No Venue
Dettmers et al.
Enhancing Chat Language Models By Scaling High-quality Instructional Conversations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 60 citations
Ding et al.
Llama 2: Open Foundation And Fine-tuned Chat Models (2023) • No Venue
Touvron et al.
Misrobærta: Transformers Versus Misinformation (2023) • Mathematics • 41 citations
Ciprian-Octavian Truică, Elena-Simona Apostol
Lumos: Learning Agents With Unified Data, Modular Design, And Open-source Llms (2023) • No Venue
Yin et al.
Alpacafarm: A Simulation Framework For Methods That Learn From Human Feedback (2023) • Arxiv • 53 citations
Dubois et al.
Glaze: Protecting Artists From Style Mimicry By Text-to-image Models (2023) • Arxiv • 41 citations
Shan et al.
LIMA: Less Is More For Alignment (2023) • No Venue
Zhou et al.
Ziplora: Any Subject In Any Style By Effectively Merging Loras (2023) • No Venue
Shah et al.
Jais And Jais-chat: Arabic-centric Foundation And Instruction-tuned Open Generative Large Language Models (2023) • No Venue
Sengupta et al.
Chatgpt For Vulnerability Detection, Classification, And Repair: How Far Are We? (2023) • 2023 30th Asia-Pacific Software Engineering Conference (APSEC) • 56 citations
Fu et al.
Creating A Large Language Model Of A Philosopher (2023) • Mind & Language • 46 citations
Eric Schwitzgebel, David Schwitzgebel, Anna Strasser
Encoder-based Domain Tuning For Fast Personalization Of Text-to-image Models (2023) • ACM Transactions on Graphics • 122 citations
Gal et al.
Erasing Concepts From Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 111 citations
Gandikota et al.
Ureader: Universal Ocr-free Visually-situated Language Understanding With Multimodal Large Language Model (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 46 citations
Ye et al.
A Unified Continual Learning Framework With General Parameter-efficient Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 49 citations
Gao et al.
Physically Grounded Vision-language Models For Robotic Manipulation (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 60 citations
Gao et al.
Text-to-sql Empowered By Large Language Models: A Benchmark Evaluation (2023) • Proceedings of the VLDB Endowment • 111 citations
Gao et al.
Vita-clip: Video And Text Adaptive CLIP Via Multimodal Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Wasim et al.
Ip-adapter: Text Compatible Image Prompt Adapter For Text-to-image Diffusion Models (2023) • No Venue
Ye et al.
Graphgpt: Graph Instruction Tuning For Large Language Models (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 96 citations
Tang et al.
In-context Autoencoder For Context Compression In A Large Language Model (2023) • No Venue
Ge et al.
Preserve Your Own Correlation: A Noise Prior For Video Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 97 citations
Ge et al.
VIP5: Towards Multimodal Foundation Models For Recommendation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Geng et al.
Flacuna: Unleashing The Problem Solving Power Of Vicuna Using FLAN Fine-tuning (2023) • No Venue
Ghosal et al.
Adding Conditional Control To Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 2580 citations
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Visual-language Prompt Tuning With Knowledge-guided Context Optimization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Hantao Yao, Rui Zhang, Changsheng Xu
Textbooks Are All You Need (2023) • No Venue
Gunasekar et al.
Large Language Models To Identify Social Determinants Of Health In Electronic Health Records (2023) • npj Digital Medicine • 184 citations
Guevara et al.
Retroformer: Retrospective Large Language Agents With Policy Gradient Optimization (2023) • No Venue
Yao et al.
Verigen: A Large Language Model For Verilog Code Generation (2023) • ACM Transactions on Design Automation of Electronic Systems • 129 citations
Thakur et al.
Using Human Feedback To Fine-tune Diffusion Models Without Any Reward Model (2023) • No Venue
Yang et al.
One Fits All: Power General Time Series Analysis By Pretrained LM (2023) • Arxiv • 115 citations
Zhou et al.
Medalpaca -- An Open-source Collection Of Medical Conversational AI Models And Training Data (2023) • Arxiv • 102 citations
Han et al.
Svdiff: Compact Parameter Space For Diffusion Fine-tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Han et al.
Hyperdreambooth: Hypernetworks For Fast Personalization Of Text-to-image Models (2023) • No Venue
Ruiz et al.
CLIP Goes 3D: Leveraging Prompt Tuning For Language Grounded 3D Recognition (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 50 citations
Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel
Aligning Instruction Tasks Unlocks Large Language Models As Zero-shot Relation Extractors (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 44 citations
Kai Zhang, Bernal Jiménez Gutiérrez, Yu Su
Distilling Step-by-step! Outperforming Larger Language Models With Less Training Data And Smaller Model Sizes (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 160 citations
Hsieh et al.
RSGPT: A Remote Sensing Vision Language Model And Benchmark (2023) • ISPRS Journal of Photogrammetry and Remote Sensing • 46 citations
Hu et al.
Llm-adapters: An Adapter Family For Parameter-efficient Fine-tuning Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 128 citations
Hu et al.
Mentallama: Interpretable Mental Health Analysis On Social Media With Large Language Models (2023) • WWW '24: The ACM Web Conference 2024 • 79 citations
Yang et al.
Uniaudio: An Audio Foundation Model Toward Universal Audio Generation (2023) • No Venue
Yang et al.
Swin3d: A Pretrained Transformer Backbone For 3D Indoor Scene Understanding (2023) • Computational Visual Media • 47 citations
Yang et al.
Lorahub: Efficient Cross-task Generalization Via Dynamic Lora Composition (2023) • No Venue
Huang et al.
Fine-tuning Language Models For Factuality (2023) • No Venue
Tian et al.
Exploring The Limits Of Chatgpt For Query Or Aspect-based Text Summarization (2023) • Arxiv • 89 citations
Yang et al.
Magicapture: High-resolution Multi-concept Portrait Customization (2023) • No Venue
Junha Hyung, Jaeyo Shin, Jaegul Choo
Quilt-1m: One Million Image-text Pairs For Histopathology (2023) • Arxiv • 52 citations
Ikezogwo et al.
Camels In A Changing Climate: Enhancing LM Adaptation With Tulu 2 (2023) • No Venue
Ivison et al.
Simple Synthetic Data Reduces Sycophancy In Large Language Models (2023) • No Venue
Wei et al.
A Study On The Implementation Of Generative AI Services Using An Enterprise Data-based LLM Application Architecture (2023) • Advances in Artificial Intelligence and Machine Learning • 53 citations
Cheonsu Jeong
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models For Domain Generalized Semantic Segmentation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Wei et al.
Impact Of Code Language Models On Automated Program Repair (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 119 citations
Jiang et al.
Motiongpt: Human Motion As A Foreign Language (2023) • No Venue
Jiang et al.
Huatuogpt, Towards Taming Language Model To Be A Doctor (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 128 citations
Zhang et al.
Polylm: An Open Source Polyglot Large Language Model (2023) • No Venue
Wei et al.
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (2023) • Arxiv • 111 citations
Zhang et al.
Sabiá: Portuguese Large Language Models (2023) • Lecture Notes in Computer Science • 46 citations
Pires et al.
The Unlocking Spell On Base Llms: Rethinking Alignment Via In-context Learning (2023) • No Venue
Lin et al.
VILA: On Pre-training For Visual Language Models (2023) • No Venue
Lin et al.
Adapters: A Unified Library For Parameter-efficient And Modular Transfer Learning (2023) • No Venue
Poth et al.
Aligning Text-to-image Diffusion Models With Reward Backpropagation (2023) • No Venue
Prabhudesai et al.
Scaling Down To Scale Up: A Guide To Parameter-efficient Fine-tuning (2023) • Arxiv • 66 citations
Lialin et al.
Doctorglm: Fine-tuning Your Chinese Doctor Is Not A Herculean Task (2023) • Arxiv • 70 citations
Xiong et al.
Rich Human Feedback For Text-to-image Generation (2023) • No Venue
Liang et al.
Egovlpv2: Egocentric Video-language Pre-training With Fusion In The Backbone (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 51 citations
Pramanick et al.
From Sparse To Soft Mixtures Of Experts (2023) • No Venue
Puigcerver et al.
CCT5: A Code-change-oriented Pre-trained Model (2023) • Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 42 citations
Lin et al.
BLIP-2: Bootstrapping Language-image Pre-training With Frozen Image Encoders And Large Language Models (2023) • Arxiv • 65 citations
Li et al.
Text Is All You Need: Learning Language Representations For Sequential Recommendation (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 103 citations
Li et al.
Masked Vision And Language Pre-training With Unimodal And Multimodal Contrastive Losses For Medical Visual Question Answering (2023) • Lecture Notes in Computer Science • 41 citations
Li et al.
Efficient Domain Adaptation For Speech Foundation Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Li et al.
Chatdoctor: A Medical Chat Model Fine-tuned On A Large Language Model Meta-ai (llama) Using Medical Domain Knowledge (2023) • Cureus • 256 citations
Li et al.
Manipllm: Embodied Multimodal Large Language Model For Object-centric Robotic Manipulation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
Large Multimodal Models: Notes On CVPR 2023 Tutorial (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 157 citations
Chunyuan Li
Loftq: Lora-fine-tuning-aware Quantization For Large Language Models (2023) • No Venue
Li et al.
Table-gpt: Table-tuned GPT For Diverse Table Tasks (2023) • No Venue
Li et al.
Starcoder: May The Source Be With You! (2023) • No Venue
Li et al.
Controlling Text-to-image Diffusion By Orthogonal Finetuning (2023) • No Venue
Qiu et al.
Time Is Encoded In The Weights Of Finetuned Language Models (2023) • No Venue
Kai Nylund, Suchin Gururangan, Noah A. Smith
S-lora: Serving Thousands Of Concurrent Lora Adapters (2023) • No Venue
Sheng et al.
A Paradigm Shift In Machine Translation: Boosting Translation Performance Of Large Language Models (2023) • No Venue
Xu et al.
Qa-lora: Quantization-aware Low-rank Adaptation Of Large Language Models (2023) • No Venue
Xu et al.
Mental-llm: Leveraging Large Language Models For Mental Health Prediction Via Online Text Data (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 119 citations
Xu et al.
Drivegpt4: Interpretable End-to-end Autonomous Driving Via Large Language Model (2023) • IEEE Robotics and Automation Letters • 202 citations
Xu et al.
Lemur: Harmonizing Natural Language And Code For Language Agents (2023) • No Venue
Xu et al.
Knowledge-enhanced Visual-language Pre-training On Chest Radiology Images (2023) • Nature Communications • 134 citations
Zhang et al.
Sorted Llama: Unlocking The Potential Of Intermediate Layers Of Large Language Models For Dynamic Inference Using Sorted Fine-tuning (soft) (2023) • No Venue
Kavehzadeh et al.
VILA: Learning Image Aesthetics From User Comments With Vision-language Pretraining (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Ke et al.
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior (2023) • No Venue
Khandelwal et al.
Self-regulating Prompts: Foundational Model Adaptation Without Forgetting (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 113 citations
Khattak et al.
Dspy: Compiling Declarative Language Model Calls Into Self-improving Pipelines (2023) • No Venue
Khattab et al.
Gptaraeval: A Comprehensive Evaluation Of Chatgpt On Arabic NLP (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 41 citations
Khondaker et al.
Instruct-fingpt: Financial Sentiment Analysis By Instruction Tuning Of General-purpose Large Language Models (2023) • SSRN Electronic Journal • 50 citations
Boyu Zhang, Hongyang Yang, Xiao-Yang Liu
Vera: Vector-based Random Matrix Adaptation (2023) • No Venue
Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
RS5M And Georsclip: A Large Scale Vision-language Dataset And A Large Vision-language Model For Remote Sensing (2023) • IEEE Transactions on Geoscience and Remote Sensing • 43 citations
Zhang et al.
Personalize Segment Anything Model With One Shot (2023) • Arxiv • 65 citations
Zhang et al.
Federatedscope-llm: A Comprehensive Package For Fine-tuning Large Language Models In Federated Learning (2023) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 56 citations
Kuang et al.
Towards Lightweight Cross-domain Sequential Recommendation Via External Attention-enhanced Graph Convolution Network (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 63 citations
Zhang et al.
Speechgpt: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Zhang et al.
LISA: Reasoning Segmentation Via Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Lai et al.
Xuanyuan 2.0: A Large Chinese Financial Chat Model With Hundreds Of Billions Parameters (2023) • Proceedings of the 32nd ACM International Conference on Information and Knowledge Management • 57 citations
Xuanyu Zhang, Qing Yang, Dongliang Xu
Platypus: Quick, Cheap, And Powerful Refinement Of Llms (2023) • No Venue
Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz
Pangu-coder2: Boosting Large Language Models For Code With Ranking Feedback (2023) • No Venue
Shen et al.
Exploring Plain Vision Transformer Backbones For Object Detection (2022) • Lecture Notes in Computer Science • 556 citations
Li et al.
Frozen CLIP Models Are Efficient Video Learners (2022) • Lecture Notes in Computer Science • 148 citations
Lin et al.
Improving Multi-task Generalization Via Regularizing Spurious Correlation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 40 citations
Hu et al.
Pushing The Limits Of Simple Pipelines For Few-shot Learning: External Data And Fine-tuning Make A Difference (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 148 citations
Hu et al.
Black-box Tuning For Language-model-as-a-service (2022) • Arxiv • 56 citations
Sun et al.
Dreambooth: Fine Tuning Text-to-image Diffusion Models For Subject-driven Generation (2022) • Arxiv • 105 citations
Ruiz et al.
Large Language Models Can Self-improve (2022) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 96 citations
Huang et al.
Backdoor Defense Via Decoupling The Training Process (2022) • Arxiv • 41 citations
Huang et al.
PLM-ICD: Automatic ICD Coding With Pretrained Language Models (2022) • Proceedings of the 4th Clinical Natural Language Processing Workshop • 41 citations
Chao-Wei Huang, Shang-Chi Tsai, Yun-Nung Chen
Safe Self-refinement For Transformer-based Domain Adaptation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 96 citations
Sun et al.
POLITICS: Pretraining With Same-story Article Comparison For Ideology Prediction And Stance Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 40 citations
Liu et al.
A Smile Is All You Need: Predicting Limiting Activity Coefficients From SMILES With Natural Language Processing (2022) • Digital Discovery • 65 citations
Winter et al.
Natural Attack For Pre-trained Models Of Code (2022) • Proceedings of the 44th International Conference on Software Engineering • 130 citations
Yang et al.
Chinese CLIP: Contrastive Vision-language Pretraining In Chinese (2022) • Arxiv • 51 citations
Yang et al.
Xmp-font: Self-supervised Cross-modality Pre-training For Few-shot Font Generation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Liu et al.
Few-shot Parameter-efficient Fine-tuning Is Better And Cheaper Than In-context Learning (2022) • Arxiv • 292 citations
Liu et al.
Visual Prompt Tuning (2022) • Lecture Notes in Computer Science • 1133 citations
Jia et al.
Fact: Factor-tuning For Lightweight Adaptation On Vision Transformer (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Shibo Jie, Zhi-Hong Deng
Impact Of Pretraining Term Frequencies On Few-shot Reasoning (2022) • Arxiv • 51 citations
Razeghi et al.
Groupvit: Semantic Segmentation Emerges From Text Supervision (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 352 citations
Xu et al.
Pix2struct: Screenshot Parsing As Pretraining For Visual Language Understanding (2022) • Arxiv • 45 citations
Lee et al.
Vitpose: Simple Vision Transformer Baselines For Human Pose Estimation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 70 citations
Xu et al.
Controllable Natural Language Generation With Contrastive Prefixes (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 56 citations
Qian et al.
Benchmarking Large Language Models For Automated Verilog RTL Code Generation (2022) • 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 112 citations
Thakur et al.
A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models For African News Translation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Adelani et al.
CM3: A Causal Masked Multimodal Model Of The Internet (2022) • Arxiv • 40 citations
Aghajanyan et al.
ATTEMPT: Parameter-efficient Multi-task Tuning Via Attentional Mixtures Of Soft Prompts (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 49 citations
Asai et al.
SGPT: GPT Sentence Embeddings For Semantic Search (2022) • Arxiv • 56 citations
Niklas Muennighoff
Text Embeddings By Weakly-supervised Contrastive Pre-training (2022) • Arxiv • 107 citations
Wang et al.
No More Fine-tuning? An Experimental Evaluation Of Prompt Tuning In Code Intelligence (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 124 citations
Wang et al.
Graph Pre-training For AMR Parsing And Generation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Xuefeng Bai, Yulong Chen, Yue Zhang
Training A Helpful And Harmless Assistant With Reinforcement Learning From Human Feedback (2022) • Arxiv • 346 citations
Bai et al.
Adamix: Mixture-of-adaptations For Parameter-efficient Model Tuning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 54 citations
Wang et al.
Mslam: Massively Multilingual Joint Pre-training For Speech And Text (2022) • Arxiv • 59 citations
Bapna et al.
GPT Takes The Bar Exam (2022) • SSRN Electronic Journal • 107 citations
Michael Bommarito, Daniel Martin Katz
Star: Bootstrapping Reasoning With Reasoning (2022) • Arxiv • 113 citations
Zelikman et al.
Cross-domain Deep Code Search With Meta Learning (2022) • Proceedings of the 44th International Conference on Software Engineering • 40 citations
Chai et al.
Roentgen: Vision-language Foundation Model For Chest X-ray Generation (2022) • Arxiv • 55 citations
Chambon et al.
Revisiting Parameter-efficient Tuning: Are We Really There Yet? (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Corpusbrain: Pre-train A Generative Retrieval Model For Knowledge-intensive Language Tasks (2022) • Proceedings of the 31st ACM International Conference on Information & Knowledge Management • 52 citations
Chen et al.
Using Transfer Learning For Code-related Tasks (2022) • IEEE Transactions on Software Engineering • 54 citations
Mastropaolo et al.
Locating And Editing Factual Associations In GPT (2022) • Arxiv • 172 citations
Meng et al.
Revisiting Classifier: Transferring Vision-language Models For Video Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Wenhao Wu, Zhun Sun, Wanli Ouyang
Factpegasus: Factuality-aware Pre-training And Fine-tuning For Abstractive Summarization (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 48 citations
David Wan, Mohit Bansal
Task Adaptive Parameter Sharing For Multi-task Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Wallingford et al.
Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 245 citations
Wagner et al.
Learning From Flowsheets: A Generative Transformer Model For Autocompletion Of Flowsheets (2022) • Computers & Chemical Engineering • 42 citations
Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann
Fine-grained Image Captioning With CLIP Reward (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 52 citations
Cho et al.
Storydall-e: Adapting Pretrained Text-to-image Transformers For Story Continuation (2022) • Lecture Notes in Computer Science • 47 citations
Adyasha Maharana, Darryl Hannan, Mohit Bansal
Task Residual For Tuning Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yu et al.
Teaching Small Language Models To Reason (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 45 citations
Magister et al.
Sus-x: Training-free Name-only Transfer Of Vision-language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 56 citations
Vishaal Udandarao, Ankush Gupta, Samuel Albanie
Why Can GPT Learn In-context? Language Models Implicitly Perform Gradient Descent As Meta-optimizers (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 73 citations
Dai et al.
St-moe: Designing Stable And Transferable Sparse Expert Models (2022) • Arxiv • 43 citations
Zoph et al.
Using Pre-trained Models To Boost Code Review Automation (2022) • Proceedings of the 44th International Conference on Software Engineering • 131 citations
Tufano et al.
Efficient Few-shot Learning Without Prompts (2022) • Arxiv • 93 citations
Tunstall et al.
Learning-by-narrating: Narrative Pre-training For Zero-shot Dialogue Comprehension (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Zhao et al.
Mukea: Multimodal Knowledge Extraction And Accumulation For Knowledge-based Visual Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Ding et al.
Delta Tuning: A Comprehensive Study Of Parameter Efficient Methods For Pre-trained Language Models (2022) • Arxiv • 51 citations
Ding et al.
Teaching Structured Vision&language Concepts To Vision&language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Doveh et al.
One Embedder, Any Task: Instruction-finetuned Text Embeddings (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 68 citations
Su et al.
Large Language Models And The Reverse Turing Test (2022) • Neural Computation • 99 citations
Terrence Sejnowski
Model Soups: Averaging Weights Of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time (2022) • Arxiv • 205 citations
Wortsman et al.
LST: Ladder Side-tuning For Parameter And Memory Efficient Transfer Learning (2022) • Arxiv • 79 citations
Yi-Lin Sung, Jaemin Cho, Mohit Bansal
Mixture-of-experts With Expert Choice Routing (2022) • Arxiv • 57 citations
Zhou et al.
St-adapter: Parameter-efficient Image-to-video Transfer Learning (2022) • Arxiv • 76 citations
Pan et al.
Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 52 citations
Bethan Thomas, Samuel Kessler, Salah Karout
Quark: Controllable Text Generation With Reinforced Unlearning (2022) • NeurIPS 2022 (Oral Selection) • 45 citations
Lu et al.
Recommendation As Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) (2022) • RecSys '22: Sixteenth ACM Conference on Recommender Systems • 334 citations
Geng et al.
LAION-5B: An Open Large-scale Dataset For Training Next Generation Image-text Models (2022) • Arxiv • 1032 citations
Schuhmann et al.
Zero-shot Text Classification With Self-training (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Gera et al.
Leveraging Unimodal Self-supervised Learning For Multimodal Audio-visual Speech Recognition (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 42 citations
Pan et al.
News Summarization And Evaluation In The Era Of GPT-3 (2022) • Arxiv • 180 citations
Tanya Goyal, Junyi Jessy Li, Greg Durrett
MVP: Multimodality-guided Visual Pre-training (2022) • Lecture Notes in Computer Science • 52 citations
Wei et al.
Coditt5: Pretraining For Source Code And Natural Language Editing (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 68 citations
Zhang et al.
Self-critiquing Models For Assisting Human Evaluators (2022) • Arxiv • 46 citations
Saunders et al.
Incorporating Dynamic Semantics Into Pre-trained Language Model For Aspect-based Sentiment Analysis (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 49 citations
Zhang et al.
Temporal Alignment Networks For Long-term Video (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Tengda Han, Weidi Xie, Andrew Zisserman
Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 42 citations
Sathyendra et al.
Subgraph Retrieval Enhanced Model For Multi-hop Knowledge Base Question Answering (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 76 citations
Zhang et al.
Tip-adapter: Training-free Adaption Of CLIP For Few-shot Classification (2022) • Lecture Notes in Computer Science • 246 citations
Zhang et al.
Storseismic: A New Paradigm In Deep Learning For Seismic Processing (2022) • IEEE Transactions on Geoscience and Remote Sensing • 49 citations
Randy Harsuko, Tariq Alkhalifah
UL2: Unifying Language Learning Paradigms (2022) • Arxiv • 97 citations
Tay et al.
Should You Mask 15% In Masked Language Modeling? (2022) • Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics • 60 citations
Wettig et al.
Large Language Models Are Reasoning Teachers (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Namgyu Ho, Laura Schmid, Se-Young Yun
Generalized Decoding For Pixel, Image, And Language (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Zou et al.
Dexperts: Decoding-time Controlled Text Generation With Experts And Anti-experts (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 127 citations
Liu et al.
Knowledgeable Prompt-tuning: Incorporating Knowledge Into Prompt Verbalizer For Text Classification (2021) • Arxiv • 70 citations
Hu et al.
Temporal Adaptation Of BERT And Performance On Downstream Document Classification: Insights From Social Media (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 45 citations
Paul Röttger, Janet B. Pierrehumbert
Consert: A Contrastive Framework For Self-supervised Sentence Representation Transfer (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 466 citations
Yan et al.
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training For Language Understanding And Generation (2021) • Arxiv • 191 citations
Sun et al.
Truthfulqa: Measuring How Models Mimic Human Falsehoods (2021) • Arxiv • 112 citations
Stephanie Lin, Jacob Hilton, Owain Evans
M6: A Chinese Multimodal Pretrainer (2021) • Arxiv • 48 citations
Lin et al.
Hash Layers For Large Sparse Models (2021) • Arxiv • 48 citations
Roller et al.
Enhance To Read Better: A Multi-task Adversarial Network For Handwritten Document Image Enhancement (2021) • Pattern Recognition • 49 citations
Jemni et al.
Learning How To Ask: Querying Lms With Mixtures Of Soft Prompts (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 309 citations
Guanghui Qin, Jason Eisner
Prefix-tuning: Optimizing Continuous Prompts For Generation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 1929 citations
Xiang Lisa Li, Percy Liang
Sentiprompt: Sentiment Knowledge Enhanced Prompt-tuning For Aspect-based Sentiment Analysis (2021) • Arxiv • 57 citations
Li et al.
Transtailor: Pruning The Pre-trained Model For Improved Transfer Learning (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 47 citations
Liu et al.
Pre-training BERT On Arabic Tweets: Practical Considerations (2021) • Arxiv • 83 citations
Abdelali et al.
Pairwise Supervised Contrastive Learning Of Sentence Representations (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 41 citations
Zhang et al.
Ibot: Image BERT Pre-training With Online Tokenizer (2021) • Arxiv • 207 citations
Zhou et al.
Muppet: Massive Multi-task Representations With Pre-finetuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 168 citations
Aghajanyan et al.
How Much Can CLIP Benefit Vision-and-language Tasks? (2021) • Arxiv • 152 citations
Shen et al.
Polyjuice: Generating Counterfactuals For Explaining, Evaluating, And Improving Models (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 60 citations
Wu et al.
Model Generalization On COVID-19 Fake News Detection (2021) • Communications in Computer and Information Science • 45 citations
Bang et al.
Vlmo: Unified Vision-language Pre-training With Mixture-of-modality-experts (2021) • Arxiv • 288 citations
Bao et al.
Less Is More: Clipbert For Video-and-language Learning Via Sparse Sampling (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 513 citations
Lei et al.
Few-shot Domain Adaptation For Grammatical Error Correction Via Meta-learning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 64 citations
Zhang et al.
Entailment As Few-shot Learner (2021) • Arxiv • 106 citations
Wang et al.
Partial Is Better Than All: Revisiting Fine-tuning Strategy For Few-shot Learning (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 133 citations
Shen et al.
Actionclip: A New Paradigm For Video Action Recognition (2021) • Arxiv • 189 citations
Mengmeng Wang, Jiazheng Xing, Yong Liu
CLEVE: Contrastive Pre-training For Event Extraction (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 100 citations
Wang et al.
MERLOT: Multimodal Neural Script Knowledge Models (2021) • Arxiv • 54 citations
Zellers et al.
Recent Advances In Natural Language Processing Via Large Pre-trained Language Models: A Survey (2021) • ACM Computing Surveys • 812 citations
Min et al.
Multieurlex -- A Multi-lingual And Multi-label Legal Document Classification Dataset For Zero-shot Cross-lingual Transfer (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
Speechstew: Simply Mix All Available Speech Recognition Data To Train One Large Neural Network (2021) • Arxiv • 75 citations
Chan et al.
Lightweight Adapter Tuning For Multilingual Speech Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 54 citations
Le et al.
Generating Fake Cyber Threat Intelligence Using Transformer-based Models (2021) • 2021 International Joint Conference on Neural Networks (IJCNN) • 61 citations
Ranade et al.
GLIDE: Towards Photorealistic Image Generation And Editing With Text-guided Diffusion Models (2021) • Arxiv • 995 citations
Nichol et al.
An Empirical Investigation Of The Role Of Pre-training In Lifelong Learning (2021) • Journal of Machine Learning Research 24 (2023) 1-50 • 42 citations
Mehta et al.
Pre-trained Language Model Based Ranking In Baidu Search (2021) • KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 46 citations
Zou et al.
Bitfit: Simple Parameter-efficient Fine-tuning For Transformer-based Masked Language-models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 286 citations
Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
Not All Negatives Are Equal: Label-aware Contrastive Loss For Fine-grained Text Classification (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 61 citations
Varsha Suresh, Desmond C. Ong
Adapting GPT, GPT-2 And BERT Language Models For Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 56 citations
Xianrui Zheng, Chao Zhang, Philip C. Woodland
Urltran: Improving Phishing URL Detection Using Transformers (2021) • MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM) • 64 citations
Maneriker et al.
Studying The Usage Of Text-to-text Transfer Transformer To Support Code-related Tasks (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 190 citations
Mastropaolo et al.
Compacter: Efficient Low-rank Hypercomplex Adapter Layers (2021) • Arxiv • 82 citations
Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
Backdoor Pre-trained Models Can Transfer To All (2021) • Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security • 70 citations
Shen et al.
Towards Offensive Language Identification For Tamil Code-mixed Youtube Comments And Posts (2021) • SN Computer Science • 41 citations
Charangan Vasantharajan, Uthayasanker Thayasivam
Evaluation Of BERT And ALBERT Sentence Embedding Performance On Downstream NLP Tasks (2021) • 2020 25th International Conference on Pattern Recognition (ICPR) • 115 citations
Choi et al.
Planning With Learned Entity Prompts For Abstractive Summarization (2021) • Transactions of the Association for Computational Linguistics • 92 citations
Narayan et al.
Unipelt: A Unified Framework For Parameter-efficient Language Model Tuning (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 51 citations
Mao et al.
Raise A Child In Large Language Model: Towards Effective And Generalizable Fine-tuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Xu et al.
Template-based Named Entity Recognition Using BART (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 283 citations
Cui et al.
Parameter-efficient Multi-task Fine-tuning For Transformers Via Shared Hypernetworks (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 54 citations
Mahabadi et al.
Masked Language Modeling And The Distributional Hypothesis: Order Word Matters Pre-training For Little (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 62 citations
Sinha et al.
Indicbart: A Pre-trained Model For Indic Natural Language Generation (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 51 citations
Dabre et al.
Adaptsum: Towards Low-resource Domain Adaptation For Abstractive Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 61 citations
Tiezheng Yu, Zihan Liu, Pascale Fung
Editing Factual Knowledge In Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Nicola de Cao, Wilker Aziz, Ivan Titov
Fine-tuning Large Neural Language Models For Biomedical Natural Language Processing (2021) • Patterns • 100 citations
Tinn et al.
Scheduled Sampling In Vision-language Pretraining With Decoupled Encoder-decoder Network (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Li et al.
Docnli: A Large-scale Dataset For Document-level Natural Language Inference (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 51 citations
Wenpeng Yin, Dragomir Radev, Caiming Xiong
Prompt-learning For Fine-grained Entity Typing (2021) • Arxiv • 44 citations
Ding et al.
Transferable Dialogue Systems And User Simulators (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Tseng et al.
Pretrained Language Models For Text Generation: A Survey (2021) • Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) • 88 citations
Li et al.
Word Alignment By Fine-tuning Embeddings On Parallel Corpora (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 118 citations
Zi-Yi Dou, Graham Neubig
A Neural Network Solves, Explains, And Generates University Math Problems By Program Synthesis And Few-shot Learning At Human Level (2021) • Proceedings of the National Academy of Sciences • 70 citations
Drori et al.
Exploiting Adapters For Cross-lingual Low-resource Speech Recognition (2021) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 47 citations
Hou et al.
Debiasing Pre-trained Contextualised Embeddings (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 83 citations
Masahiro Kaneko, Danushka Bollegala
Unlocking Compositional Generalization In Pre-trained Models Using Intermediate Representations (2021) • Arxiv • 51 citations
Herzig et al.
AMMU: A Survey Of Transformer-based Biomedical Pretrained Language Models (2021) • Journal of Biomedical Informatics • 212 citations
Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
Achieving Forgetting Prevention And Knowledge Transfer In Continual Learning (2021) • NeurIPS 2021 • 44 citations
Ke et al.
Pre-trained Models: Past, Present And Future (2021) • AI Open • 700 citations
Han et al.
KLUE: Korean Language Understanding Evaluation (2021) • Arxiv • 78 citations
Park et al.
Tip-adapter: Training-free Clip-adapter For Better Vision-language Modeling (2021) • Arxiv • 128 citations
Zhang et al.
On The Effectiveness Of Adapter-based Tuning For Pretrained Language Model Adaptation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 112 citations
He et al.
Towards A Unified View Of Parameter-efficient Transfer Learning (2021) • Arxiv • 277 citations
He et al.
Multitask Prompted Training Enables Zero-shot Task Generalization (2021) • Arxiv • 558 citations
Sanh et al.
Does CLIP Benefit Visual Question Answering In The Medical Domain As Much As It Does In The General Domain? (2021) • Arxiv • 41 citations
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
Compressing Visual-linguistic Model Via Knowledge Distillation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 50 citations
Fang et al.
SATAR: A Self-supervised Approach To Twitter Account Representation Learning And Its Application In Bot Detection (2021) • CIKM '21: The 30th ACM International Conference on Information and Knowledge Management • 60 citations
Feng et al.
Multi-task Pre-training For Plug-and-play Task-oriented Dialogue System (2021) • Arxiv • 57 citations
Su et al.
CPM-2: Large-scale Cost-effective Pre-trained Language Models (2021) • AI Open • 51 citations
Zhang et al.
Stylegan-nada: Clip-guided Domain Adaptation Of Image Generators (2021) • Arxiv • 65 citations
Gal et al.
Bigssl: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) • IEEE Journal of Selected Topics in Signal Processing • 148 citations
Zhang et al.
Clip-adapter: Better Vision-language Models With Feature Adapters (2021) • International Journal of Computer Vision • 617 citations
Gao et al.
Rethink Training Of BERT Rerankers In Multi-stage Retrieval Pipeline (2021) • Lecture Notes in Computer Science • 71 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Unsupervised Corpus Aware Language Model Pre-training For Dense Passage Retrieval (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 88 citations
Luyu Gao, Jamie Callan
Self-supervised Text-to-sql Learning With Header Alignment Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Donggyu Kim, Seanie Lee
PICARD: Parsing Incrementally For Constrained Auto-regressive Decoding From Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 140 citations
Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
Cross-attention Is All You Need: Adapting Pretrained Transformers For Machine Translation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 71 citations
Mozhdeh Gheini, Xiang Ren, Jonathan May
Climatebert: A Pretrained Language Model For Climate-related Text (2021) • SSRN Electronic Journal • 59 citations
Webersinke et al.
P-tuning V2: Prompt Tuning Can Be Comparable To Fine-tuning Universally Across Scales And Tasks (2021) • Arxiv • 261 citations
Liu et al.
Pretrained Transformers As Universal Computation Engines (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 71 citations
Lu et al.
PPT: Pre-trained Prompt Tuning For Few-shot Learning (2021) • Arxiv • 100 citations
Gu et al.
Large Pre-trained Language Models Contain Human-like Biases Of What Is Right And Wrong To Do (2021) • Nature Machine Intelligence • 194 citations
Schramowski et al.
Cutting Down On Prompts And Parameters: Simple Few-shot Learning With Language Models (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 42 citations
Logan et al.
CPT: Colorful Prompt Tuning For Pre-trained Vision-language Models (2021) • AI Open • 62 citations
Yao et al.
R-drop: Regularized Dropout For Neural Networks (2021) • Arxiv • 305 citations
Liang et al.
Lira: Learning Visual Speech Representations From Audio Through Self-supervision (2021) • Interspeech 2021 • 41 citations
Ma et al.
Cosda-ml: Multi-lingual Code-switching Data Augmentation For Zero-shot Cross-lingual NLP (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 126 citations
Qin et al.
Learning-to-rank With BERT In Tf-ranking (2020) • Arxiv • 60 citations
Han et al.
Fastbert: A Self-distilling BERT With Adaptive Inference Time (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 245 citations
Liu et al.
Don't Stop Pretraining: Adapt Language Models To Domains And Tasks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 740 citations
Gururangan et al.
Parameter-efficient Transfer From Sequential Behaviors For User Modeling And Recommendation (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 150 citations
Yuan et al.
CERT: Contrastive Self-supervised Learning For Language Understanding (2020) • Arxiv • 196 citations
Fang et al.
Unifiedqa: Crossing Format Boundaries With A Single QA System (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 51 citations
Khashabi et al.
Mpnet: Masked And Permuted Pre-training For Language Understanding (2020) • Arxiv • 497 citations
Song et al.
Codebert: A Pre-trained Model For Programming And Natural Languages (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 2031 citations
Feng et al.
Exploiting Structured Knowledge In Text Via Graph-guided Representation Learning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Shen et al.
Ternarybert: Distillation-aware Ultra-low Bit BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 140 citations
Zhang et al.
ERNIE-GEN: An Enhanced Multi-flow Pre-training And Fine-tuning Framework For Natural Language Generation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 105 citations
Xiao et al.
Revisiting Few-sample BERT Fine-tuning (2020) • Arxiv • 55 citations
Zhang et al.
Towards Learning A Generic Agent For Vision-and-language Navigation Via Pre-training (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 221 citations
Hao et al.
It's Not Just Size That Matters: Small Language Models Are Also Few-shot Learners (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 234 citations
Timo Schick, Hinrich Schütze
A Knowledge-enhanced Pretraining Model For Commonsense Story Generation (2020) • Transactions of the Association for Computational Linguistics • 231 citations
Guan et al.
Compressing BERT: Studying The Effects Of Weight Pruning On Transfer Learning (2020) • Proceedings of the 5th Workshop on Representation Learning for NLP • 78 citations
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Speaker-aware BERT For Multi-turn Response Selection In Retrieval-based Chatbots (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 148 citations
Gu et al.
Domain-specific Language Model Pretraining For Biomedical Natural Language Processing (2020) • ACM Transactions on Computing for Healthcare • 915 citations
Gu et al.
Train No Evil: Selective Masking For Task-guided Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Gu et al.
Colbert: Efficient And Effective Passage Search Via Contextualized Late Interaction Over BERT (2020) • Arxiv • 189 citations
Omar Khattab, Matei Zaharia
Coreferential Reasoning Learning For Language Representation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 160 citations
Ye et al.
Deberta: Decoding-enhanced BERT With Disentangled Attention (2020) • Arxiv • 412 citations
He et al.
Adapterfusion: Non-destructive Task Composition For Transfer Learning (2020) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 41 citations
Pfeiffer et al.
Large-scale Adversarial Training For Vision-and-language Representation Learning (2020) • Arxiv • 287 citations
Gan et al.
Incorporating BERT Into Parallel Sequence Decoding With Adapters (2020) • Arxiv • 40 citations
Guo et al.
Making Pre-trained Language Models Better Few-shot Learners (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 556 citations
Tianyu Gao, Adam Fisch, Danqi Chen
Pretrained Transformers For Simple Question Answering Over Knowledge Graphs (2020) • Lecture Notes in Computer Science • 41 citations
D. Lukovnikov, A. Fischer, J. Lehmann
How Can We Know When Language Models Know? On The Calibration Of Language Models For Question Answering (2020) • Transactions of the Association for Computational Linguistics • 93 citations
Jiang et al.
Intrinsic Dimensionality Explains The Effectiveness Of Language Model Fine-tuning (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 154 citations
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
Uncertainty-aware Self-training For Text Classification With Few Labels (2020) • Arxiv • 41 citations
Subhabrata Mukherjee, Ahmed Hassan Awadallah
How Much Knowledge Can You Pack Into The Parameters Of A Language Model? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 187 citations
Adam Roberts, Colin Raffel, Noam Shazeer
BOFFIN TTS: Few-shot Speaker Adaptation By Bayesian Optimization (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 56 citations
Moss et al.
Imagebert: Cross-modal Pre-training With Large-scale Weak-supervised Image-text Data (2020) • Arxiv • 154 citations
Qi et al.
Binarybert: Pushing The Limit Of BERT Quantization (2020) • Arxiv • 45 citations
Bai et al.
Exploring Versatile Generative Language Model Via Parameter-efficient Transfer Learning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 45 citations
Zhaojiang Lin, Andrea Madotto, Pascale Fung
Pre-training Via Paraphrasing (2020) • Arxiv • 89 citations
Lewis et al.
WNUT-2020 Task 2: Identification Of Informative COVID-19 English Tweets (2020) • Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020) • 71 citations
Nguyen et al.
PERL: Pivot-based Domain Adaptation For Pre-trained Deep Contextualized Embedding Models (2020) • Transactions of the Association for Computational Linguistics • 47 citations
Eyal Ben-David, Carmel Rabinovitz, Roi Reichart
Chatbot Interaction With Artificial Intelligence: Human Data Augmentation With T5 And Language Transformer Ensemble For Text Classification (2020) • Journal of Ambient Intelligence and Humanized Computing • 54 citations
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
Deep Entity Matching With Pre-trained Language Models (2020) • Proceedings of the VLDB Endowment • 246 citations
Li et al.
Byte Pair Encoding Is Suboptimal For Language Model Pretraining (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 69 citations
Kaj Bostrom, Greg Durrett
Syntactic Data Augmentation Increases Robustness To Inference Heuristics (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 143 citations
Min et al.
Self-supervised Contrastive Learning For Code Retrieval And Summarization Via Semantic-preserving Transformations (2020) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 89 citations
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
Backdoor Attacks Against Transfer Learning With Pre-trained Deep Learning Models (2020) • IEEE Transactions on Services Computing • 68 citations
Wang et al.
DIET: Lightweight Language Understanding For Dialogue Systems (2020) • Arxiv • 112 citations
Bunk et al.
Hatebert: Retraining BERT For Abusive Language Detection In English (2020) • Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) • 60 citations
Caselli et al.
Snippext: Semi-supervised Opinion Mining With Augmented Data (2020) • Proceedings of The Web Conference 2020 • 50 citations
Miao et al.
SOLOIST: Building Task Bots At Scale With Transfer Learning And Machine Teaching (2020) • Arxiv • 99 citations
Peng et al.
Adversarial Robustness: From Self-supervised Pre-training To Fine-tuning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 167 citations
Chen et al.
Recall And Learn: Fine-tuning Deep Pretrained Language Models With Less Forgetting (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 66 citations
Chen et al.
Low-resource Domain Adaptation For Compositional Task-oriented Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 64 citations
Chen et al.
A Rigorous Study On Named Entity Recognition: Can Fine-tuning Pretrained Model Lead To The Promised Land? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Lin et al.
Multi-task Learning Based Pre-trained Language Model For Code Completion (2020) • ASE '20: 35th IEEE/ACM International Conference on Automated Software Engineering • 123 citations
Liu et al.
Optimus: Organizing Sentences Via Pre-trained Modeling Of A Latent Space (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 129 citations
Li et al.
Rethinking The Hyperparameters For Fine-tuning (2020) • Arxiv • 62 citations
Li et al.
Adapterhub: A Framework For Adapting Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 181 citations
Pfeiffer et al.
Dialoglue: A Natural Language Understanding Benchmark For Task-oriented Dialogue (2020) • Arxiv • 97 citations
Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur
Pre-trained Summarization Distillation (2020) • Arxiv • 57 citations
Sam Shleifer, Alexander M. Rush
From Zero To Hero: On The Limitations Of Zero-shot Cross-lingual Transfer With Multilingual Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 50 citations
Lauscher et al.
Exploring And Predicting Transferability Across NLP Tasks (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 100 citations
Vu et al.
What The [MASK]? Making Sense Of Language-specific BERT Models (2020) • Arxiv • 83 citations
Debora Nozza, Federico Bianchi, Dirk Hovy
Interbert: Vision-and-language Interaction For Multi-modal Pretraining (2020) • Arxiv • 56 citations
Lin et al.
Leveraging Monolingual Data With Self-supervision For Multilingual Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Siddhant et al.
Multilingual Translation With Extensible Multilingual Pretraining And Finetuning (2020) • Arxiv • 150 citations
Tang et al.
Intermediate-task Transfer Learning With Pretrained Models For Natural Language Understanding: When And Why Does It Work? (2020) • Arxiv • 52 citations
Pruksachatkun et al.
Cold-start Active Learning Through Self-supervised Language Modeling (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 61 citations
Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber
A Multimodal Framework For The Detection Of Hateful Memes (2020) • PMLR 133:344-360, 2021 • 46 citations
Lippe et al.
Improving Vision-and-language Navigation With Image-text Pairs From The Web (2020) • Lecture Notes in Computer Science • 46 citations
Majumdar et al.
Salvaging Federated Learning By Local Adaptation (2020) • Arxiv • 68 citations
Tao Yu, Eugene Bagdasaryan, Vitaly Shmatikov
Revisiting Pre-trained Models For Chinese Natural Language Processing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 452 citations
Cui et al.
Data Augmentation Using Pre-trained Transformer Models (2020) • Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems • 61 citations
Varun Kumar, Ashutosh Choudhary, Eunah Cho
Texthide: Tackling Data Privacy In Language Understanding Tasks (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 42 citations
Huang et al.
Med-bert: Pre-trained Contextualized Embeddings On Large-scale Structured Electronic Health Records For Disease Prediction (2020) • Arxiv • 61 citations
Rasmy et al.
A Simple But Tough-to-beat Data Augmentation Approach For Natural Language Understanding And Generation (2020) • Arxiv • 94 citations
Shen et al.
UP-DETR: Unsupervised Pre-training For Object Detection With Transformers (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Dai et al.
Understanding Neural Abstractive Summarization Models Via Uncertainty (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 45 citations
Jiacheng Xu, Shrey Desai, Greg Durrett
Weight Poisoning Attacks On Pre-trained Models (2020) • Arxiv • 49 citations
Keita Kurita, Paul Michel, Graham Neubig
Contrastive Distillation On Intermediate Representations For Language Model Compression (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Sun et al.
Mobilebert: A Compact Task-agnostic BERT For Resource-limited Devices (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 593 citations
Sun et al.
Generating Accurate Assert Statements For Unit Test Cases Using Pretrained Transformers (2020) • AST '22: IEEE/ACM 3rd International Conference on Automation of Software Test • 66 citations
Tufano et al.
Improving BERT Fine-tuning Via Self-ensemble And Self-distillation (2020) • Journal of Computer Science and Technology • 58 citations
Xu et al.
PROP: Pre-training With Representative Words Prediction For Ad-hoc Retrieval (2020) • Proceedings of the 14th ACM International Conference on Web Search and Data Mining • 41 citations
Ma et al.
Med7: A Transferable Clinical Natural Language Processing Model For Electronic Health Records (2020) • Arxiv • 143 citations
Kormilitzin et al.
Attention Flows: Analyzing And Comparing Attention Mechanisms In Language Models (2020) • IEEE Transactions on Visualization and Computer Graphics • 78 citations
Joseph F Derose, Jiayao Wang, Matthew Berger
As Good As New. How To Successfully Recycle English GPT-2 To Make Models For Other Languages (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 40 citations
Wietse de Vries, Malvina Nissim
Gector -- Grammatical Error Correction: Tag, Not Rewrite (2020) • Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications • 103 citations
Omelianchuk et al.
Enabling Language Models To Fill In The Blanks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 41 citations
Chris Donahue, Mina Lee, Percy Liang
Robbert: A Dutch Roberta-based Language Model (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 103 citations
Pieter Delobelle, Thomas Winters, Bettina Berendt
TURL: Table Understanding Through Representation Learning (2020) • Arxiv • 43 citations
Deng et al.
Small And Practical BERT Models For Sequence Labeling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 117 citations
Tsai et al.
Multifit: Efficient Multi-lingual Language Model Fine-tuning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Eisenschlos et al.
Structured Query Construction Via Knowledge Graph Embedding (2019) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 72 citations
Wang et al.
Learning And Evaluating General Linguistic Intelligence (2019) • Arxiv • 156 citations
Yogatama et al.
ERNIE 2.0: A Continual Pre-training Framework For Language Understanding (2019) • Arxiv • 74 citations
Sun et al.
Visualizing And Understanding The Effectiveness Of BERT (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 169 citations
Hao et al.
Universal Text Representation From BERT: An Empirical Study (2019) • Arxiv • 40 citations
Ma et al.
Thieves On Sesame Street! Model Extraction Of BERT-based APIs (2019) • Arxiv • 73 citations
Krishna et al.
Unsupervised Data Augmentation For Consistency Training (2019) • Arxiv • 1615 citations
Xie et al.
Learning To Navigate Unseen Environments: Back Translation With Environmental Dropout (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 288 citations
Hao Tan, Licheng Yu, Mohit Bansal
Linguistic Knowledge And Transferability Of Contextual Representations (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 57 citations
Liu et al.
Parameter-efficient Transfer Learning For NLP (2019) • Arxiv • 144 citations
Houlsby et al.
Masked Language Model Scoring (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 145 citations
Salazar et al.
MASS: Masked Sequence To Sequence Pre-training For Language Generation (2019) • Arxiv • 579 citations
Song et al.
Multilingual Is Not Enough: BERT For Finnish (2019) • Arxiv • 121 citations
Virtanen et al.
An Embarrassingly Simple Approach For Transfer Learning From Pretrained Language Models (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 47 citations
Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos
Well-read Students Learn Better: On The Importance Of Pre-training Compact Models (2019) • Arxiv • 428 citations
Turc et al.
Unsupervised Domain Adaptation Of Contextualized Embeddings For Sequence Labeling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 154 citations
Xiaochuang Han, Jacob Eisenstein
Pre-training With Whole Word Masking For Chinese BERT (2019) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 957 citations
Cui et al.
NEZHA: Neural Contextualized Representation For Chinese Language Understanding (2019) • Arxiv • 86 citations
Wei et al.
How Does BERT Answer Questions? A Layer-wise Analysis Of Transformer Representations (2019) • CIKM '19: The 28th ACM International Conference on Information and Knowledge Management • 63 citations
Aken et al.
LXMERT: Learning Cross-modality Encoder Representations From Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1297 citations
Hao Tan, Mohit Bansal
A Surprisingly Robust Trick For Winograd Schema Challenge (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Kocijan et al.
Tree Transformer: Integrating Tree Structures Into Self-attention (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Yau-Shian Wang, Hung-Yi Lee, Yun-Nung Chen
Olmpics -- On What Language Model Pre-training Captures (2019) • Transactions of the Association for Computational Linguistics • 55 citations
Talmor et al.
75 Languages, 1 Model: Parsing Universal Dependencies Universally (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 61 citations
Dan Kondratyuk, Milan Straka
Understanding The Behaviors Of BERT In Ranking (2019) • Arxiv • 145 citations
Qiao et al.
SMART: Robust And Efficient Fine-tuning For Pre-trained Natural Language Models Through Principled Regularized Optimization (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Jiang et al.
ERNIE: Enhanced Language Representation With Informative Entities (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1348 citations
Zhang et al.
Improving Relation Extraction By Pre-trained Language Representations (2019) • Proceedings of AKBC 2019 • 53 citations
Christoph Alt, Marc Hübner, Leonhard Hennig
Fine-tuning Pre-trained Transformer Language Models To Distantly Supervised Relation Extraction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 58 citations
Christoph Alt, Marc Hübner, Leonhard Hennig
Story Ending Prediction By Transferable BERT (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 50 citations
Zhongyang Li, Xiao Ding, Ting Liu
Text Summarization With Pretrained Encoders (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1525 citations
Yang Liu, Mirella Lapata
Portuguese Named Entity Recognition Using BERT-CRF (2019) • Arxiv • 180 citations
Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
BERT-based Ranking For Biomedical Entity Normalization (2019) • Arxiv • 93 citations
Zongcheng Ji, Qiang Wei, Hua Xu
TANDA: Transfer And Adapt Pre-trained Transformer Models For Answer Sentence Selection (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Siddhant Garg, Thuy Vu, Alessandro Moschitti
Mockingjay: Unsupervised Speech Representation Learning With Deep Bidirectional Transformer Encoders (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 195 citations
Liu et al.
Simple, Scalable Adaptation For Neural Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 84 citations
Ankur Bapna, Naveen Arivazhagan, Orhan Firat
Multi-task Deep Neural Networks For Natural Language Understanding (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1026 citations
Liu et al.
BART: Denoising Sequence-to-sequence Pre-training For Natural Language Generation, Translation, And Comprehension (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 3464 citations
Lewis et al.
Q-BERT: Hessian Based Ultra Low Precision Quantization Of BERT (2019) • AAAI 2020 • 52 citations
Shen et al.
To Tune Or Not To Tune? Adapting Pretrained Representations To Diverse Tasks (2019) • Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) • 57 citations
Matthew E. Peters, Sebastian Ruder, Noah A. Smith
Freelb: Enhanced Adversarial Training For Natural Language Understanding (2019) • Arxiv • 176 citations
Zhu et al.
Do Attention Heads In BERT Track Syntactic Dependencies? (2019) • Arxiv • 88 citations
Htut et al.
BERT And Pals: Projected Attention Layers For Efficient Adaptation In Multi-task Learning (2019) • Arxiv • 113 citations
Asa Cooper Stickland, Iain Murray
Biobert: A Pre-trained Biomedical Language Representation Model For Biomedical Text Mining (2019) • Bioinformatics • 6102 citations
Lee et al.
Pubmedqa: A Dataset For Biomedical Research Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 425 citations
Jin et al.
Adversarial Domain Adaptation For Machine Reading Comprehension (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 53 citations
Wang et al.
Reweighted Proximal Pruning For Large-scale Language Representation (2019) • Arxiv • 45 citations
Guo et al.
Fine-tune Bert For Docred With Two-step Process (2019) • Arxiv • 116 citations
Wang et al.
Integrating Multimodal Information In Large Pretrained Transformers (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 421 citations
Rahman et al.
Automatic Spanish Translation Of The Squad Dataset For Multilingual Question Answering (2019) • Arxiv • 42 citations
Casimiro Pio Carrino, Marta R. Costa-Jussà, José A. R. Fonollosa
BERT Post-training For Review Reading Comprehension And Aspect-based Sentiment Analysis (2019) • Arxiv • 358 citations
Xu et al.
Taming Pretrained Transformers For Extreme Multi-label Text Classification (2019) • KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 135 citations
Chang et al.
Low Resource Text Classification With Ulmfit And Backtranslation (2019) • Arxiv • 43 citations
Sam Shleifer
Blackmarks: Blackbox Multibit Watermarking For Deep Neural Networks (2019) • Arxiv • 41 citations
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar
Hierarchical Transformers For Long Document Classification (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Pappagari et al.
End-to-end Open-domain Question Answering With Bertserini (2019) • Proceedings of the 2019 Conference of the North • 162 citations
Yang et al.
Universal Language Model Fine-tuning For Text Classification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1950 citations
Jeremy Howard, Sebastian Ruder
Universal Sentence Encoder (2018) • Arxiv • 1289 citations
Cer et al.
Attention-based LSTM For Psychological Stress Detection From Spoken Language Using Distant Supervision (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Genta Indra Winata, Onno Pepijn Kampman, Pascale Fung
On The Effectiveness Of Task Granularity For Transfer Learning (2018) • Arxiv • 50 citations
Mahdisoltani et al.
Training Millions Of Personalized Dialogue Agents (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 205 citations
Mazaré et al.
Pathologies Of Neural Models Make Interpretations Difficult (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 235 citations
Feng et al.
Pythia V0.1: The Winning Entry To The VQA Challenge 2018 (2018) • Arxiv • 165 citations
Jiang et al.
Neural Fine-grained Entity Type Classification With Hierarchy-aware Loss (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 74 citations
Peng Xu, Denilson Barbosa
Sentence Encoders On Stilts: Supplementary Training On Intermediate Labeled-data Tasks (2018) • Arxiv • 258 citations
Jason Phang, Thibault Févry, Samuel R. Bowman
Shortcut-stacked Sentence Encoders For Multi-domain Inference (2017) • Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP • 119 citations
Yixin Nie, Mohit Bansal
Non-autoregressive Neural Machine Translation (2017) • Arxiv • 449 citations
Gu et al.
Aspect-augmented Adversarial Networks For Domain Adaptation (2017) • Transactions of the Association for Computational Linguistics • 93 citations
Yuan Zhang, Regina Barzilay, Tommi Jaakkola
Dissent: Sentence Representation Learning From Explicit Discourse Relations (2017) • Arxiv • 59 citations
Allen Nie, Erin D. Bennett, Noah D. Goodman
Fast Domain Adaptation For Neural Machine Translation (2016) • Arxiv • 164 citations
Markus Freitag, Yaser Al-Onaizan

Formal Verification 51 papers #

Tattoo: Tool-grounded Thinking PRM For Test-time Scaling In Tabular Reasoning (2025) • No Venue
Zou et al.
Generative Universal Verifier As Multimodal Meta-reasoner (2025) • No Venue
Zhang et al.
How Far Are We From Genuinely Useful Deep Research Agents? (2025) • No Venue
Zhang et al.
The Lessons Of Developing Process Reward Models In Mathematical Reasoning (2025) • No Venue
Zhang et al.
Formalmath: Benchmarking Formal Mathematical Reasoning Of Large Language Models (2025) • No Venue
Yu et al.
Mcpmark: A Benchmark For Stress-testing Realistic And Comprehensive MCP Use (2025) • No Venue
Wu et al.
Ui-genie: A Self-improving Approach For Iteratively Boosting Mllm-based Mobile GUI Agents (2025) • No Venue
Xiao et al.
Pretrainzero: Reinforcement Active Pretraining (2025) • No Venue
Xing et al.
Kodcode: A Diverse, Challenging, And Verifiable Synthetic Dataset For Coding (2025) • No Venue
Xu et al.
Re:form -- Reducing Human Priors In Scalable Formal Software Verification With RL In Llms: A Preliminary Study On Dafny (2025) • No Venue
Yan et al.
CRANE: Reasoning With Constrained LLM Generation (2025) • No Venue
Banerjee et al.
Reform: Reflective Autoformalization With Prospective Bounded Sequence Optimization (2025) • No Venue
Chen et al.
Verithinker: Learning To Verify Makes Reasoning Model Efficient (2025) • No Venue
Chen et al.
Vericot: Neuro-symbolic Chain-of-thought Validation Via Logical Consistency Checks (2025) • No Venue
Feng et al.
Protoreasoning: Prototypes As The Foundation For Generalizable Reasoning In Llms (2025) • No Venue
He et al.
Loong: Synthesize Long Chain-of-thoughts At Scale Through Verifiers (2025) • No Venue
Huang et al.
CODESIM: Multi-agent Code Generation And Problem Solving Through Simulation-driven Planning And Debugging (2025) • No Venue
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
T1: Tool-integrated Self-verification For Test-time Compute Scaling In Small Language Models (2025) • No Venue
Minki Kang, Jongwon Jeong, Jaewoong Cho
CLUE: Non-parametric Verification From Experience Via Hidden-state Clustering (2025) • No Venue
Liang et al.
Scaling Code-assisted Chain-of-thoughts And Instructions For Model Reasoning (2025) • No Venue
Lin et al.
Criticlean: Critic-guided Reinforcement Learning For Mathematical Formalization (2025) • No Venue
Peng et al.
Deepseekmath-v2: Towards Self-verifiable Mathematical Reasoning (2025) • No Venue
Shao et al.
Solving Inequality Proofs With Large Language Models (2025) • No Venue
Sheng et al.
Heimdall: Test-time Scaling On The Generative Verification (2025) • No Venue
Wenlei Shi, Xing Jin
Progco: Program Helps Self-correction Of Large Language Models (2025) • No Venue
Song et al.
Os-sentinel: Towards Safety-enhanced Mobile GUI Agents Via Hybrid Validation In Realistic Workflows (2025) • No Venue
Sun et al.
Automated Unit Test Improvement Using Large Language Models At Meta (2024) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 53 citations
Alshahwan et al.
From Code To Correctness: Closing The Last Mile Of Code Generation With Hierarchical Debugging (2024) • No Venue
Shi et al.
Apigen: Automated Pipeline For Generating Verifiable And Diverse Function-calling Datasets (2024) • No Venue
Liu et al.
Training Software Engineering Agents And Verifiers With Swe-gym (2024) • No Venue
Pan et al.
Progressive Multimodal Reasoning Via Active Retrieval (2024) • No Venue
Dong et al.
Search, Verify And Feedback: Towards Next Generation Post-training Paradigm Of Foundation Models Via Verifier Engineering (2024) • No Venue
Guan et al.
Hallucination Is Inevitable: An Innate Limitation Of Large Language Models (2024) • Arxiv • 138 citations
Ziwei Xu, Sanjay Jain, Mohan Kankanhalli
Deepseek-prover-v1.5: Harnessing Proof Assistant Feedback For Reinforcement Learning And Monte-carlo Tree Search (2024) • No Venue
Xin et al.
Logic-lm: Empowering Large Language Models With Symbolic Solvers For Faithful Logical Reasoning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 52 citations
Pan et al.
Selfcheck: Using Llms To Zero-shot Check Their Own Step-by-step Reasoning (2023) • No Venue
Ning Miao, Yee Whye Teh, Tom Rainforth
Specinfer: Accelerating Generative Large Language Model Serving With Tree-based Speculative Inference And Verification (2023) • ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 • 44 citations
Miao et al.
Llemma: An Open Language Model For Mathematics (2023) • No Venue
Azerbayev et al.
Nl2spec: Interactively Translating Unstructured Natural Language To Temporal Logics With Large Language Models (2023) • Lecture Notes in Computer Science • 61 citations
Cosler et al.
Chain-of-verification Reduces Hallucination In Large Language Models (2023) • No Venue
Dhuliawala et al.
Large Language Model For Science: A Study On P Vs. NP (2023) • No Venue
Dong et al.
Language Models Can Be Logical Solvers (2023) • No Venue
Feng et al.
Baldur: Whole-proof Generation And Repair With Large Language Models (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 59 citations
First et al.
LLM For Soc Security: A Paradigm Shift (2023) • IEEE Access • 52 citations
Saha et al.
A Survey On Automated Program Repair Techniques (2023) • Artificial Intelligence Review • 62 citations
Huang et al.
(security) Assertions By Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 44 citations
Kande et al.
Large Language Models Are Better Reasoners With Self-verification (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 49 citations
Weng et al.
Copy, Right? A Testing Framework For Copyright Protection Of Deep Learning Models (2021) • 2022 IEEE Symposium on Security and Privacy (SP) • 57 citations
Chen et al.
Automated Conformance Testing For Javascript Engines Via Deep Compiler Fuzzing (2021) • Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation • 66 citations
Ye et al.
Achieving Verified Robustness To Symbol Substitutions Via Interval Bound Propagation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 121 citations
Huang et al.
Neuro-symbolic Program Synthesis (2016) • Arxiv • 105 citations
Parisotto et al.

— H —

Human Ai Collaboration 482 papers #

Deepresearch Arena: The First Exam Of Llms' Research Abilities Via Seminar-grounded Tasks (2025) • No Venue
Wan et al.
Mcp-bench: Benchmarking Tool-using LLM Agents With Complex Real-world Tasks Via MCP Servers (2025) • No Venue
Wang et al.
Vibe Checker: Aligning Code Evaluation With Human Preference (2025) • No Venue
Zhong et al.
Interactive Training: Feedback-driven Neural Network Optimization (2025) • No Venue
Wentao Zhang, Yang Young Lu, Yuntian Deng
Latent Sketchpad: Sketching Visual Thoughts To Elicit Multimodal Reasoning In Mllms (2025) • No Venue
Zhang et al.
Chemdfm-r: An Chemical Reasoner LLM Enhanced With Atomized Chemical Knowledge (2025) • No Venue
Zhao et al.
Newtonbench: Benchmarking Generalizable Scientific Law Discovery In LLM Agents (2025) • No Venue
Zheng et al.
Livecodebench Pro: How Do Olympiad Medalists Judge Llms In Competitive Programming? (2025) • No Venue
Zheng et al.
Aligning Multimodal LLM With Human Preference: A Survey (2025) • No Venue
Yu et al.
Vrbench: A Benchmark For Multi-step Reasoning In Long Narrative Videos (2025) • No Venue
Yu et al.
Designlab: Designing Slides Through Iterative Detection And Correction (2025) • No Venue
Yun et al.
API Agents Vs. GUI Agents: Divergence And Convergence (2025) • No Venue
Zhang et al.
Agent Learning Via Early Experience (2025) • No Venue
Zhang et al.
Aixiv: A Next-generation Open Access Ecosystem For Scientific Discovery Generated By AI Scientists (2025) • No Venue
Zhang et al.
Browseragent: Building Web Agents With Human-inspired Web Browsing Actions (2025) • No Venue
Zhang et al.
Roboomni: Proactive Robot Manipulation In Omni-modal Context (2025) • No Venue
Wang et al.
Worldgen: From Text To Traversable And Interactive 3D Worlds (2025) • No Venue
Wang et al.
Any2caption: Interpreting Any Condition To Caption For Controllable Video Generation (2025) • No Venue
Wu et al.
Aworld: Dynamic Multi-agent System With Stable Maneuvering For Robust GAIA Problem Solving (2025) • No Venue
Xie et al.
Comfyui-copilot: An Intelligent Assistant For Automated Workflow Development (2025) • No Venue
Xu et al.
Filmagent: A Multi-agent Framework For End-to-end Film Automation In Virtual 3D Spaces (2025) • No Venue
Xu et al.
Easyedit2: An Easy-to-use Steering Framework For Editing Large Language Models (2025) • No Venue
Xu et al.
Egolife: Towards Egocentric Life Assistant (2025) • No Venue
Yang et al.
Twinmarket: A Scalable Behavioral And Social Simulation For Financial Markets (2025) • No Venue
Yang et al.
Spin-bench: How Well Do Llms Plan Strategically And Reason Socially? (2025) • No Venue
Yao et al.
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025) • No Venue
Agrawal et al.
Open Deep Search: Democratizing Search With Open-source Reasoning Agents (2025) • No Venue
Alzubi et al.
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning (2025) • No Venue
Chen et al.
Reform: Reflective Autoformalization With Prospective Bounded Sequence Optimization (2025) • No Venue
Chen et al.
Persona Vectors: Monitoring And Controlling Character Traits In Language Models (2025) • No Venue
Chen et al.
Gold-medalist Performance In Solving Olympiad Geometry With Alphageometry2 (2025) • No Venue
Chervonyi et al.
Swe-bench Pro: Can AI Agents Solve Long-horizon Software Engineering Tasks? (2025) • No Venue
Deng et al.
Deepresearch Bench: A Comprehensive Benchmark For Deep Research Agents (2025) • No Venue
Du et al.
A Multi-modal AI Copilot For Single-cell Analysis With Instruction Following (2025) • No Venue
Fang et al.
A Survey Of Vibe Coding With Large Language Models (2025) • No Venue
Ge et al.
Great Models Think Alike And This Undermines AI Oversight (2025) • No Venue
Goel et al.
Fasta*: Fast-slow Toolpath Agent With Subroutine Mining For Efficient Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Costa*: Cost-sensitive Toolpath Agent For Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Group Think: Multiple Concurrent Reasoning Agents Collaborating At Token Level Granularity (2025) • No Venue
Hsu et al.
Llms Learn To Deceive Unintentionally: Emergent Misalignment In Dishonesty From Misaligned Samples To Biased Human-ai Interactions (2025) • No Venue
Hu et al.
Feedback Friction: Llms Struggle To Fully Incorporate External Feedback (2025) • No Venue
Jiang et al.
Omnihuman-1.5: Instilling An Active Mind In Avatars Via Cognitive Simulation (2025) • No Venue
Jiang et al.
Gigaevo: An Open Source Optimization Framework Powered By Llms And Evolution Algorithms (2025) • No Venue
Khrulkov et al.
Exp-bench: Can AI Conduct AI Research Experiments? (2025) • No Venue
Kon et al.
Hunyuan3d Studio: End-to-end AI Pipeline For Game-ready 3D Asset Generation (2025) • No Venue
Lei et al.
Webweaver: Structuring Web-scale Evidence With Dynamic Outlines For Open-ended Deep Research (2025) • No Venue
Li et al.
Jarvisart: Liberating Human Artistic Creativity Via An Intelligent Photo Retouching Agent (2025) • No Venue
Lin et al.
Metaladder: Ascending Mathematical Solution Quality Via Analogical-problem Reasoning Transfer (2025) • No Venue
Lin et al.
Infiguiagent: A Multimodal Generalist GUI Agent With Native Reasoning And Reflection (2025) • No Venue
Liu et al.
Pc-agent: A Hierarchical Multi-agent Collaboration Framework For Complex Task Automation On PC (2025) • No Venue
Liu et al.
Researchbench: Benchmarking Llms In Scientific Discovery Via Inspiration-based Task Decomposition (2025) • No Venue
Liu et al.
Learning From Peers In Reasoning Models (2025) • No Venue
Luo et al.
Language Models Can Learn From Verbal Feedback Without Scalar Rewards (2025) • No Venue
Luo et al.
Rethinking Diverse Human Preference Learning Through Principal Component Analysis (2025) • No Venue
Luo et al.
Deepseek-r1 Thoughtology: Let's Think About LLM Reasoning (2025) • No Venue
Marjanović et al.
Wikivideo: Article Generation From Multiple Videos (2025) • No Venue
Martin et al.
R-wom: Retrieval-augmented World Model For Computer-use Agents (2025) • No Venue
Mei et al.
Paper2agent: Reimagining Research Papers As Interactive And Reliable AI Agents (2025) • No Venue
Miao et al.
Swe-lancer: Can Frontier Llms Earn $1 Million From Real-world Freelance Software Engineering? (2025) • No Venue
Miserendino et al.
Large Language Models Think Too Fast To Explore Effectively (2025) • No Venue
Lan Pan, Hanbo Xie, Robert C. Wilson
Generating Physically Stable And Buildable LEGO Designs From Text (2025) • No Venue
Pun et al.
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction (2025) • No Venue
Qian et al.
Userbench: An Interactive Gym Environment For User-centric Agents (2025) • No Venue
Qian et al.
Hogwild! Inference: Parallel LLM Generation Via Concurrent Attention (2025) • No Venue
Rodionov et al.
Agent Laboratory: Using LLM Agents As Research Assistants (2025) • No Venue
Schmidgall et al.
Scienceboard: Evaluating Multimodal Autonomous Agents In Realistic Scientific Workflows (2025) • No Venue
Sun et al.
Understanding Generative AI Capabilities In Everyday Image Editing Tasks (2025) • No Venue
Taesiri et al.
Enabling Scalable Oversight Via Self-evolving Critic (2025) • No Venue
Tang et al.
Agent KB: Leveraging Cross-domain Experience For Agentic Problem Solving (2025) • No Venue
Tang et al.
CS1-LLM: Integrating Llms Into CS1 Instruction (2024) • ITiCSE 2024: Innovation and Technology in Computer Science Education • 45 citations
Vadaparty et al.
The Effects Of Generative AI On Design Fixation And Divergent Thinking (2024) • Proceedings of the CHI Conference on Human Factors in Computing Systems • 80 citations
Wadinambiarachchi et al.
Gpt-4o System Card (2024) • No Venue
Openai et al.
Alignment Studio: Aligning Large Language Models To Particular Contextual Regulations (2024) • No Venue
Achintalwar et al.
Agent S: An Open Agentic Framework That Uses Computers Like A Human (2024) • No Venue
Agashe et al.
Automated Unit Test Improvement Using Large Language Models At Meta (2024) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 53 citations
Alshahwan et al.
Homogenization Effects Of Large Language Models On Human Creative Ideation (2024) • C&C '24: Creativity and Cognition • 77 citations
Barrett R. Anderson, Jash Hemant Shah, Max Kreminski
Arigraph: Learning Knowledge Graph World Models With Episodic Memory For LLM Agents (2024) • No Venue
Anokhin et al.
Iris: An Ai-driven Virtual Tutor For Computer Science Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 41 citations
Patrick Bassner, Eduard Frankford, Stephan Krusche
Text2sql Is Not Enough: Unifying AI And Databases With TAG (2024) • No Venue
Biswal et al.
Intelligent Clinical Documentation: Harnessing Generative AI For Patient-centric Clinical Note Generation (2024) • International Journal of Innovative Science and Research Technology (IJISRT) • 999 citations
Anjanava Biswas, Wrick Talukdar
Windows Agent Arena: Evaluating Multi-modal OS Agents At Scale (2024) • No Venue
Bonatti et al.
Language-based Game Theory In The Age Of Artificial Intelligence (2024) • Journal of The Royal Society Interface • 66 citations
Capraro et al.
PERSONA: A Reproducible Testbed For Pluralistic Alignment (2024) • No Venue
Castricato et al.
Language Models As Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning In Language Models (2024) • No Venue
Chae et al.
Mindsearch: Mimicking Human Minds Elicits Deep AI Searcher (2024) • No Venue
Chen et al.
Internet Of Agents: Weaving A Web Of Heterogeneous Agents For Collaborative Intelligence (2024) • No Venue
Chen et al.
Scienceagentbench: Toward Rigorous Assessment Of Language Agents For Data-driven Scientific Discovery (2024) • No Venue
Chen et al.
(A)I Am Not A Lawyer, But...: Engaging Legal Experts Towards Responsible LLM Policies For Legal Advice (2024) • FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency • 56 citations
Cheong et al.
Transformer Explainer: Interactive Learning Of Text-generative Models (2024) • No Venue
Cho et al.
Large Language Models And User Trust: Consequence Of Self-referential Learning Loop And The Deskilling Of Healthcare Professionals (2024) • Journal of Medical Internet Research • 75 citations
Avishek Choudhury, Zaria Chaudhry
Large Legal Fictions: Profiling Legal Hallucinations In Large Language Models (2024) • Journal of Legal Analysis • 108 citations
Dahl et al.
ORGANA: A Robotic Assistant For Automated Chemistry Experimentation And Characterization (2024) • Matter • 52 citations
Darvish et al.
Symbolicai: A Framework For Logic-based Approaches Combining Generative Models And Solvers (2024) • No Venue
Dinu et al.
Desirable Characteristics For AI Teaching Assistants In Programming Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 49 citations
Denny et al.
Shaping Human-ai Collaboration: Varied Scaffolding Levels In Co-writing With Language Models (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 44 citations
Dhillon et al.
Enhancing Large Language Models With Pseudo- And Multisource- Knowledge Graphs For Open-ended Question Answering (2024) • IEEE Robotics and Automation Letters • 47 citations
Liu et al.
Judging The Judges: Evaluating Alignment And Vulnerabilities In Llms-as-judges (2024) • No Venue
Thakur et al.
Llms In The Imaginarium: Tool Learning Through Simulated Trial And Error (2024) • No Venue
Wang et al.
Scaling Instructable Agents Across Many Simulated Worlds (2024) • No Venue
Team et al.
Learnlm: Improving Gemini For Learning (2024) • No Venue
Team et al.
Model Surgery: Modulating Llm's Behavior Via Simple Parameter Editing (2024) • No Venue
Wang et al.
Lami: Large Language Models For Multi-modal Human-robot Interaction (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
Lift: Leveraging Human Feedback For Text-to-video Model Alignment (2024) • No Venue
Wang et al.
LAVE: Llm-powered Agent Assistance And Language Augmentation For Video Editing (2024) • No Venue
Wang et al.
MOSAIC: A Modular System For Assistive And Interactive Cooking (2024) • No Venue
Wang et al.
Tutor Copilot: A Human-ai Approach For Scaling Real-time Expertise (2024) • No Venue
Wang et al.
LAMBDA: A Large Model Based Data Agent (2024) • No Venue
Sun et al.
Virtuwander: Enhancing Multi-modal Interaction For Virtual Tour Guidance Through Large Language Models (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
Opendevin: An Open Platform For AI Software Developers As Generalist Agents (2024) • No Venue
Wang et al.
Promptcharm: Text-to-image Generation Through Multi-modal Prompting And Refinement (2024) • Proceedings of the CHI Conference on Human Factors in Computing Systems • 66 citations
Wang et al.
Ai-augmented Brainwriting: Investigating The Use Of Llms In Group Ideation (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 67 citations
Shaer et al.
Who Validates The Validators? Aligning Llm-assisted Evaluation Of LLM Outputs With Human Preferences (2024) • UIST '24: The 37th Annual ACM Symposium on User Interface Software and Technology • 91 citations
Shankar et al.
Learning To Decode Collaboratively With Multiple Language Models (2024) • No Venue
Shen et al.
Can Llms Generate Novel Research Ideas? A Large-scale Human Study With 100+ NLP Researchers (2024) • No Venue
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Enhancing Llm-based Feedback: Insights From Intelligent Tutoring Systems And The Learning Sciences (2024) • Communications in Computer and Information Science • 41 citations
John Stamper, Ruiwei Xiao, Xinying Hou
What Large Language Models Know And What People Think They Know (2024) • Nature Machine Intelligence • 43 citations
Steyvers et al.
Magicquill: An Intelligent Interactive Image Editing System (2024) • No Venue
Liu et al.
Simulating Classroom Education With Llm-empowered Agents (2024) • No Venue
Zhang et al.
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery (2024) • No Venue
Lu et al.
Generative World Explorer (2024) • No Venue
Lu et al.
Large Language Models Surpass Human Experts In Predicting Neuroscience Results (2024) • Nature Human Behaviour • 57 citations
Luo et al.
Weblinx: Real-world Website Navigation With Multi-turn Dialogue (2024) • No Venue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Evaluating The Effectiveness Of Llms In Introductory Computer Science Education: A Semester-long Field Study (2024) • L@S '24: Eleventh ACM Conference on Learning @ Scale • 46 citations
Lyu et al.
Language Model Can Listen While Speaking (2024) • No Venue
Ma et al.
Foundation Models For Music: A Survey (2024) • No Venue
Ma et al.
Gpt-4v(ision) Is A Human-aligned Evaluator For Text-to-3d Generation (2024) • No Venue
Wu et al.
MALT: Improving Reasoning With Multi-agent LLM Training (2024) • No Venue
Motwani et al.
ROS-LLM: A ROS Framework For Embodied AI With Task Feedback And Structured Reasoning (2024) • No Venue
Mower et al.
GUI Agents: A Survey (2024) • No Venue
Nguyen et al.
Can Llms Learn By Teaching? A Preliminary Study (2024) • No Venue
Ning et al.
Training Software Engineering Agents And Verifiers With Swe-gym (2024) • No Venue
Pan et al.
Dreambench++: A Human-aligned Benchmark For Personalized Image Generation (2024) • No Venue
Peng et al.
The Widening Gap: The Benefits And Harms Of Generative AI For Novice Programmers (2024) • Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1 • 87 citations
Prather et al.
Mutual Reasoning Makes Smaller Llms Stronger Problem-solvers (2024) • No Venue
Qi et al.
Processbench: Identifying Process Errors In Mathematical Reasoning (2024) • No Venue
Zheng et al.
Nnsight And NDIF: Democratizing Access To Foundation Model Internals (2024) • No Venue
Fiotto-Kaufman et al.
VITA: Towards Open-source Interactive Omni Multimodal LLM (2024) • No Venue
Fu et al.
Empowering Biomedical Discovery With AI Agents (2024) • Cell • 129 citations
Gao et al.
Dreamreward: Text-to-3d Generation With Human Preference (2024) • No Venue
Ye et al.
Justrank: Benchmarking LLM Judges For System Ranking (2024) • No Venue
Gera et al.
Navigating The Digital World As Humans Do: Universal Visual Grounding For GUI Agents (2024) • No Venue
Gou et al.
Spotting Llms With Binoculars: Zero-shot Detection Of Machine-generated Text (2024) • No Venue
Hans et al.
The Effects Of Generative AI On Computing Students' Help-seeking Preferences (2024) • Proceedings of the 26th Australasian Computing Education Conference • 61 citations
Hou et al.
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use (2024) • No Venue
Hu et al.
Visual Sketchpad: Sketching As A Visual Chain Of Thought For Multimodal Language Models (2024) • No Venue
Hu et al.
Genmac: Compositional Text-to-video Generation With Multi-agent Collaboration (2024) • No Venue
Huang et al.
Autocoderover: Autonomous Program Improvement (2024) • Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 54 citations
Zhang et al.
Promises And Challenges Of Generative Artificial Intelligence For Human Learning (2024) • Nature Human Behaviour • 164 citations
Yan et al.
Hidden Flaws Behind Expert-level Accuracy Of Multimodal GPT-4 Vision In Medicine (2024) • npj Digital Medicine • 75 citations
Jin et al.
Omniact: A Dataset And Benchmark For Enabling Multimodal Generalist Autonomous Agents For Desktop And Web (2024) • No Venue
Kapoor et al.
Codeaid: Evaluating A Classroom Deployment Of An Llm-based Programming Assistant That Balances Student And Educator Needs (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 137 citations
Kazemitabaar et al.
"i'm Not Sure, But...": Examining The Impact Of Large Language Models' Uncertainty Expression On User Reliance And Trust (2024) • FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency • 55 citations
Kim et al.
Prometheus 2: An Open Source Language Model Specialized In Evaluating Other Language Models (2024) • No Venue
Kim et al.
Understanding Large-language Model (llm)-powered Human-robot Interaction (2024) • HRI '24: ACM/IEEE International Conference on Human-Robot Interaction • 75 citations
Callie Y. Kim, Christine P. Lee, Bilge Mutlu
Process Modeling With Large Language Models (2024) • Lecture Notes in Business Information Processing • 50 citations
Kourani et al.
Autowebglm: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent (2024) • No Venue
Lai et al.
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) • No Venue
Lai et al.
Thanos: Enhancing Conversational Agents With Skill-of-mind-infused Large Language Model (2024) • No Venue
Lee et al.
Theagentcompany: Benchmarking LLM Agents On Consequential Real World Tasks (2024) • No Venue
Xu et al.
Materials Science In The Era Of Large Language Models: A Perspective (2024) • Digital Discovery • 49 citations
Ge Lei, Ronan Docherty, Samuel J. Cooper
A Survey On The Honesty Of Large Language Models (2024) • No Venue
Li et al.
Unbounded: A Generative Infinite Game Of Character Life Simulation (2024) • No Venue
Li et al.
The Value, Benefits, And Concerns Of Generative Ai-powered Assistance In Writing (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 44 citations
Li et al.
Agenttrek: Agent Trajectory Synthesis Via Guiding Replay With Web Tutorials (2024) • No Venue
Xu et al.
Diversity Empowers Intelligence: Integrating Expertise Of Software Engineering Agents (2024) • No Venue
Zhang et al.
Generative Motion Stylization Of Cross-structure Characters Within Canonical Motion Space (2024) • IEEE Journal on Selected Areas in Communications • 46 citations
Zhang et al.
I-SHEEP: Self-alignment Of LLM From Scratch Through An Iterative Self-enhancement Paradigm (2024) • No Venue
Liang et al.
Learning To Learn Faster From Human Feedback With Language Model Predictive Control (2024) • No Venue
Liang et al.
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance (2024) • No Venue
Lin et al.
Mini-omni2: Towards Open-source Gpt-4o With Vision, Speech And Duplex Capabilities (2024) • No Venue
Zhifei Xie, Changqiao Wu
Chatgpt, Can You Generate Solutions For My Coding Exercises? An Evaluation On Its Effectiveness In An Undergraduate Java Programming Course (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 61 citations
Ouh et al.
An Empirical Study Of The Non-determinism Of Chatgpt In Code Generation (2023) • Arxiv • 49 citations
Ouyang et al.
Large Language Models Can Infer Psychological Dispositions Of Social Media Users (2023) • PNAS Nexus • 45 citations
Heinrich Peters, Sandra Matz
The Generative AI Paradox: "what It Can Create, It May Not Understand" (2023) • No Venue
West et al.
A Prompt Pattern Catalog To Enhance Prompt Engineering With Chatgpt (2023) • Arxiv • 746 citations
White et al.
Learning To Prompt In The Classroom To Understand AI Limits: A Pilot Study (2023) • Lecture Notes in Computer Science • 44 citations
Theophilou et al.
A Study Of Generative Large Language Model For Medical Research And Healthcare (2023) • npj Digital Medicine • 252 citations
Peng et al.
Natural Language Generation And Understanding Of Big Code For Ai-assisted Programming: A Review (2023) • Entropy • 90 citations
Wong et al.
From Word Models To World Models: Translating From Natural Language To The Probabilistic Language Of Thought (2023) • No Venue
Wong et al.
Learning Gain Differences Between Chatgpt And Human Tutor Generated Algebra Hints (2023) • Arxiv • 71 citations
Zachary A. Pardos, Shreya Bhandari
ART: Automatic Multi-step Reasoning And Tool-use For Large Language Models (2023) • Arxiv • 47 citations
Paranjape et al.
Can Chatgpt Be Used To Generate Scientific Hypotheses? (2023) • Journal of Materiomics • 42 citations
Park et al.
Generative Agents: Interactive Simulacra Of Human Behavior (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 941 citations
Park et al.
G-eval: NLG Evaluation Using GPT-4 With Better Human Alignment (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 383 citations
Liu et al.
ONCE: Boosting Content-based Recommendation With Both Open- And Closed-source Large Language Models (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 67 citations
Liu et al.
3D-GPT: Procedural 3D Modeling With Large Language Models (2023) • No Venue
Sun et al.
Wavjourney: Compositional Audio Creation With Large Language Models (2023) • No Venue
Liu et al.
"what It Wants Me To Say": Bridging The Abstraction Gap Between End-user Programmers And Code-generating Large Language Models (2023) • CHI '23: CHI Conference on Human Factors in Computing Systems • 82 citations
Liu et al.
Leveraging Large Language Models To Power Chatbots For Collecting User Self-reported Data (2023) • Proceedings of the ACM on Human-Computer Interaction • 50 citations
Wei et al.
Luminate: Structured Generation And Exploration Of Design Space With Large Language Models For Human-ai Co-creation (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 79 citations
Suh et al.
Sensecape: Enabling Multilevel Exploration And Sensemaking With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 89 citations
Suh et al.
Supporting Qualitative Analysis With Large Language Models: Combining Codebook With GPT-3 For Deductive Coding (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 126 citations
Xiao et al.
Towards Autonomous System: Flexible Modular Production System Enhanced With Large Language Model Agents (2023) • 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA) • 57 citations
Xia et al.
Do Large Language Models Show Decision Heuristics Similar To Humans? A Case Study Using GPT-3.5 (2023) • Journal of Experimental Psychology: General • 46 citations
Suri et al.
Is Chatgpt Good At Search? Investigating Large Language Models As Re-ranking Agents (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 141 citations
Sun et al.
Chameleon: Plug-and-play Compositional Reasoning With Large Language Models (2023) • Arxiv • 89 citations
Lu et al.
Unleashing The Emergent Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 48 citations
Wang et al.
Can Chatgpt Reproduce Human-generated Labels? A Study Of Social Computing Tasks (2023) • Arxiv • 62 citations
Zhu et al.
Chatgpt: More Than A Weapon Of Mass Deception, Ethical Challenges And Responses From The Human-centered Artificial Intelligence (HCAI) Perspective (2023) • International Journal of Human–Computer Interaction • 63 citations
Sison et al.
Chatgpt And A New Academic Reality: Artificial Intelligence-written Research Papers And The Ethics Of The Large Language Models In Scholarly Publishing (2023) • Journal of the Association for Information Science and Technology • 591 citations
Lund et al.
Bioinspiredllm: Conversational Large Language Model For The Mechanics Of Biological And Bio-inspired Materials (2023) • Advanced Science • 67 citations
Rachel K. Luu, Markus J. Buehler
Evaluating The Social Impact Of Generative AI Systems In Systems And Society (2023) • Arxiv • 41 citations
Solaiman et al.
Translating Radiology Reports Into Plain Language Using Chatgpt And GPT-4 With Prompt Learning: Promising Results, Limitations, And Potential (2023) • Visual Computing for Industry, Biomedicine, and Art • 264 citations
Lyu et al.
Beyond Chatbots: Explorellm For Structured Thoughts And Personalized Model Responses (2023) • No Venue
Ma et al.
AI Vs. Human -- Differentiation Analysis Of Scientific Content Generation (2023) • Arxiv • 73 citations
Ma et al.
Eureka: Human-level Reward Design Via Coding Large Language Models (2023) • No Venue
Ma et al.
Tidybot: Personalized Robot Assistance With Large Language Models (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 65 citations
Wu et al.
Mechagents: Large Language Model Multi-agent Collaborations Can Solve Mechanics Problems, Generate New Data, And Integrate Knowledge (2023) • Extreme Mechanics Letters • 47 citations
Bo Ni, Markus J. Buehler
GPT Has Become Financially Literate: Insights From Financial Literacy Tests Of GPT And A Preliminary Test Of How People Use It As A Source Of Advice (2023) • Finance Research Letters • 58 citations
Paweł Niszczota, Sami Abbas
Using An LLM To Help With Code Understanding (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 186 citations
Nam et al.
Directgpt: A Direct Manipulation Interface To Interact With Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 54 citations
Masson et al.
Towards Accurate Differential Diagnosis With Large Language Models (2023) • Nature • 59 citations
McDuff et al.
On The Design Of Ai-powered Code Assistants For Notebooks (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 68 citations
McNutt et al.
How Generative AI Models Such As Chatgpt Can Be (mis)used In SPC Practice, Education, And Research? An Exploratory Study (2023) • Quality Engineering • 116 citations
Megahed et al.
Llm-assisted Knowledge Graph Engineering: Experiments With Chatgpt (2023) • Informatik aktuell • 40 citations
Meyer et al.
GAIA: A Benchmark For General AI Assistants (2023) • No Venue
Mialon et al.
Factscore: Fine-grained Atomic Evaluation Of Factual Precision In Long Form Text Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 132 citations
Min et al.
Classification Of Human- And Ai-generated Texts: Investigating Features For Chatgpt (2023) • Lecture Notes on Data Engineering and Communications Technologies • 40 citations
Lorenz Mindner, Tim Schlippe, Kristina Schaaff
Assigning AI: Seven Approaches For Students, With Prompts (2023) • SSRN Electronic Journal • 95 citations
Ethan Mollick, Lilach Mollick
Levels Of AGI For Operationalizing Progress On The Path To AGI (2023) • No Venue
Morris et al.
Text Embeddings Reveal (almost) As Much As Text (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 40 citations
Morris et al.
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4 (2023) • No Venue
Mukherjee et al.
Human Preference Score: Better Aligning Text-to-image Models With Human Preference (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 108 citations
Wu et al.
Chat2vis: Generating Data Visualisations Via Natural Language Using Chatgpt, Codex And GPT-3 Large Language Models (2023) • IEEE Access • 165 citations
Paula Maddigan, Teo Susnjak
Self-refine: Iterative Refinement With Self-feedback (2023) • Arxiv • 202 citations
Madaan et al.
Selfcheckgpt: Zero-resource Black-box Hallucination Detection For Generative Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 170 citations
Potsawee Manakul, Adian Liusie, Mark J. F. Gales
Roco: Dialectic Multi-robot Collaboration With Large Language Models (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 78 citations
Zhao Mandi, Shreeya Jain, Shuran Song
Assessing Cross-cultural Alignment Between Chatgpt And Human Societies: An Empirical Study (2023) • Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) • 96 citations
Cao et al.
No More Manual Tests? Evaluating And Improving Chatgpt For Unit Test Generation (2023) • Arxiv • 50 citations
Yuan et al.
Towards Human-bot Collaborative Software Architecting With Chatgpt (2023) • Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering • 128 citations
Ahmad et al.
Recommending Root-cause And Mitigation Steps For Cloud Incidents Using Large Language Models (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 69 citations
Ahmed et al.
Information Retrieval Meets Large Language Models: A Strategic Report From Chinese IR Community (2023) • AI Open • 51 citations
Ai et al.
Can Chatgpt Write A Good Boolean Query For Systematic Review Literature Search? (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 181 citations
Wang et al.
Gpt4all: An Ecosystem Of Open Source Compressed Language Models (2023) • No Venue
Anand et al.
Spellburst: A Node-based Interface For Exploratory Creative Coding With Natural Language Prompts (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 45 citations
Angert et al.
Advancing Requirements Engineering Through Generative AI: Assessing The Role Of Llms (2023) • Generative AI for Effective Software Development • 95 citations
Chetan Arora, John Grundy, Mohamed Abdelrazek
The Internal State Of An LLM Knows When It's Lying (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 68 citations
Amos Azaria, Tom Mitchell
Qwen Technical Report (2023) • No Venue
Bai et al.
Codeplan: Repository-level Coding Using Llms And Planning (2023) • No Venue
Bairi et al.
A Multitask, Multilingual, Multimodal Evaluation Of Chatgpt On Reasoning, Hallucination, And Interactivity (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 466 citations
Bang et al.
Tablegpt: Towards Unifying Tables, Nature Language And Commands Into One GPT (2023) • No Venue
Zha et al.
Chip-chat: Challenges And Opportunities In Conversational Hardware Design (2023) • 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD) • 147 citations
Blocklove et al.
Sparks Of Artificial General Intelligence: Early Experiments With GPT-4 (2023) • Arxiv • 1480 citations
Bubeck et al.
Mechgpt, A Language-based Strategy For Mechanics And Materials Modeling That Connects Knowledge Across Scales, Disciplines And Modalities (2023) • Applied Mechanics Reviews • 74 citations
Markus J. Buehler
Generative AI Assistants In Software Development Education: A Vision For Integrating Generative AI Into Educational Practice, Not Instinctively Defending Against It (2023) • IEEE Software • 60 citations
Christopher Bull, Ahmed Kharrufa
Just Tell Me: Prompt Engineering In Business Process Management (2023) • Lecture Notes in Business Information Processing • 52 citations
Busch et al.
Students' Voices On Generative AI: Perceptions, Benefits, And Challenges In Higher Education (2023) • International Journal of Educational Technology in Higher Education • 992 citations
Cecilia Ka Yuk Chan, Wenjie Hu
The AI Generation Gap: Are Gen Z Students More Interested In Adopting Generative AI Such As Chatgpt In Teaching And Learning Than Their Gen X And Millennial Generation Teachers? (2023) • Smart Learning Environments • 354 citations
Cecilia Ka Yuk Chan, Katherine K. W. Lee
Chatgpt Informed Graph Neural Network For Stock Movement Prediction (2023) • SSRN Electronic Journal • 43 citations
Chen et al.
Automatic Root Cause Analysis Via Large Language Models For Cloud Incidents (2023) • EuroSys '24: Nineteenth European Conference on Computer Systems • 77 citations
Chen et al.
How Is Chatgpt's Behavior Changing Over Time? (2023) • No Venue
Lingjiao Chen, Matei Zaharia, James Zou
Gptutor: A Chatgpt-powered Programming Tool For Code Explanation (2023) • Communications in Computer and Information Science • 68 citations
Chen et al.
Llava-interactive: An All-in-one Demo For Image Chat, Segmentation, Generation And Editing (2023) • No Venue
Chen et al.
"it Felt Like Having A Second Mind": Investigating Human-ai Co-creativity In Prewriting With Large Language Models (2023) • Proceedings of the ACM on Human-Computer Interaction • 40 citations
Wan et al.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-verification (2023) • No Venue
Zhou et al.
Generative AI In Computing Education: Perspectives Of Students And Instructors (2023) • 2023 IEEE Frontiers in Education Conference (FIE) • 92 citations
Zastudil et al.
Rethinking The Evaluation For Conversational Recommendation In The Era Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 45 citations
Wang et al.
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts (2023) • No Venue
Van Veen et al.
Is GPT-4 A Good Data Analyst? (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 65 citations
Liying Cheng, Xingxuan Li, Lidong Bing
Can Large Language Models Transform Computational Social Science? (2023) • Computational Linguistics • 231 citations
Ziems et al.
Adapted Large Language Models Can Outperform Medical Experts In Clinical Text Summarization (2023) • Nature Medicine • 400 citations
Van Veen et al.
Can Large Language Models Be An Alternative To Human Evaluations? (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 161 citations
Cheng-Han Chiang, Hung-Yi Lee
L3MVN: Leveraging Large Language Models For Visual Target Navigation (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 58 citations
Bangguo Yu, Hamidreza Kasaei, Ming Cao
Promptpaint: Steering Text-to-image Generation Through Paint Medium-like Interactions (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 66 citations
John Joon Young Chung, Eytan Adar
Increasing Diversity While Maintaining Accuracy: Text Data Generation With Large Language Models And Human Interventions (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
John Joon Young Chung, Ece Kamar, Saleema Amershi
Musicagent: An AI Agent For Music Understanding And Generation With Large Language Models (2023) • No Venue
Yu et al.
Receive, Reason, And React: Drive As You Say With Large Language Models In Autonomous Vehicles (2023) • IEEE Intelligent Transportation Systems Magazine • 69 citations
Cui et al.
Chatgpt-4 Outperforms Experts And Crowd Workers In Annotating Political Twitter Messages With Zero-shot Learning (2023) • Arxiv • 152 citations
Petter Törnberg
Auggpt: Leveraging Chatgpt For Text Data Augmentation (2023) • Arxiv • 98 citations
Dai et al.
Choice Over Control: How Users Write With Large Language Models Using Diegetic And Non-diegetic Prompting (2023) • CHI '23: CHI Conference on Human Factors in Computing Systems • 54 citations
Dang et al.
The State Of Human-centered NLP Technology For Fact-checking (2023) • Information Processing & Management • 55 citations
Das et al.
LLMR: Real-time Prompting Of Interactive Worlds Using Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 56 citations
De La Torre et al.
LIDA: A Tool For Automatic Generation Of Grammar-agnostic Visualizations And Infographics Using Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) • 69 citations
Victor Dibia
General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Societal Implications And Responsible Governance (2023) • Information Fusion • 46 citations
Triguero et al.
Large Language Model For Science: A Study On P Vs. NP (2023) • No Venue
Dong et al.
A Comparative Study Of Ai-generated (GPT-4) And Human-crafted Mcqs In Programming Education (2023) • Proceedings of the 26th Australasian Computing Education Conference • 68 citations
Doughty et al.
The AI Ghostwriter Effect: When Users Do Not Perceive Ownership Of Ai-generated Text But Self-declare As Authors (2023) • ACM Transactions on Computer-Human Interaction • 71 citations
Draxler et al.
Enhancing Job Recommendation Through Llm-based Generative Adversarial Networks (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 49 citations
Du et al.
Exploring Large Language Models' Cognitive Moral Development Through Defining Issues Test (2023) • No Venue
Tanmay et al.
Perspectives On Large Language Models For Relevance Judgment (2023) • ICTIR '23: The 2023 ACM SIGIR International Conference on the Theory of Information Retrieval • 103 citations
Faggioli et al.
Just Ask For Calibration: Strategies For Eliciting Calibrated Confidence Scores From Language Models Fine-tuned With Human Feedback (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Tian et al.
Gpt4aigchip: Towards Next-generation AI Accelerator Design Automation Via Large Language Models (2023) • 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) • 79 citations
Fu et al.
Creating A Large Language Model Of A Philosopher (2023) • Mind & Language • 46 citations
Eric Schwitzgebel, David Schwitzgebel, Anna Strasser
Assistgpt: A General Multi-modal Assistant That Can Plan, Execute, Inspect, And Learn (2023) • No Venue
Gao et al.
Enabling Large Language Models To Generate Text With Citations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Gao et al.
Improved Trust In Human-robot Collaboration With Chatgpt (2023) • IEEE Access • 164 citations
Yang Ye, Hengxu You, Jing Du
Tokenflow: Consistent Diffusion Features For Consistent Video Editing (2023) • No Venue
Geyer et al.
Transformative Effects Of Chatgpt On Modern Education: Emerging Era Of AI Chatbots (2023) • Internet of Things and Cyber-Physical Systems • 372 citations
Gill et al.
Chatgpt: Vision And Challenges (2023) • Internet of Things and Cyber-Physical Systems • 204 citations
Sukhpal Singh Gill, Rupinder Kaur
Medagents: Large Language Models As Collaborators For Zero-shot Medical Reasoning (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 53 citations
Tang et al.
Chatgpt Is Not All You Need. A State Of The Art Review Of Large Generative AI Models (2023) • Arxiv • 238 citations
Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
Legalbench: A Collaboratively Built Benchmark For Measuring Legal Reasoning In Large Language Models (2023) • SSRN Electronic Journal • 77 citations
Guha et al.
Thrilled By Your Progress! Large Language Models (GPT-4) No Longer Struggle To Pass Assessments In Higher Education Programming Courses (2023) • ICER 2023: ACM Conference on International Computing Education Research • 97 citations
Savelka et al.
Can Generative Pre-trained Transformers (GPT) Pass Assessments In Higher Education Programming Courses? (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 83 citations
Savelka et al.
Getting Pwn'd By AI: Penetration Testing With Large Language Models (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 68 citations
Andreas Happe, Jürgen Cito
Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen To Human-level Creativity (2023) • Journal of Creativity • 153 citations
Jennifer Haase, Paul H. P. Hanel
Regulating Chatgpt And Other Large Generative AI Models (2023) • 2023 ACM Conference on Fairness Accountability and Transparency • 353 citations
Philipp Hacker, Andreas Engel, Marco Mauer
Machine Psychology (2023) • Arxiv • 61 citations
Hagendorff et al.
RECIPE: How To Integrate Chatgpt Into EFL Writing Education (2023) • Proceedings of the Tenth ACM Conference on Learning @ Scale • 46 citations
Han et al.
Reasoning With Language Model Is Planning With World Model (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Hao et al.
Platform-independent And Curriculum-oriented Intelligent Assistant For Higher Education (2023) • International Journal of Educational Technology in Higher Education • 80 citations
Sajja et al.
Annollm: Making Large Language Models To Be Better Crowdsourced Annotators (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track) • 46 citations
He et al.
Appagent: Multimodal Agents As Smartphone Users (2023) • No Venue
Zhang et al.
Copiloting The Copilots: Fusing Large Language Models With Completion Engines For Automated Program Repair (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 75 citations
Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
Metagpt: Meta Programming For A Multi-agent Collaborative Framework (2023) • Arxiv • 124 citations
Hong et al.
Exploring The Responses Of Large Language Models To Beginner Programmers' Help Requests (2023) • ICER 2023: ACM Conference on International Computing Education Research • 95 citations
Hellas et al.
From Task Structures To World Models: What Do Llms Know? (2023) • Trends in Cognitive Sciences • 43 citations
Ilker Yildirim, L. A. Paul
Evaluating Large Language Models On A Highly-specialized Topic, Radiation Oncology Physics (2023) • Frontiers in Oncology • 112 citations
Holmes et al.
Cogagent: A Visual Language Model For GUI Agents (2023) • No Venue
Hong et al.
Graspgpt: Leveraging Semantic Knowledge From A Large Language Model For Task-oriented Grasping (2023) • IEEE Robotics and Automation Letters • 67 citations
Tang et al.
Personality Traits In Large Language Models (2023) • No Venue
Safdari et al.
Opportunities And Challenges Of Chatgpt For Design Knowledge Management (2023) • Procedia CIRP • 76 citations
Hu et al.
Audiogpt: Understanding And Generating Speech, Music, Sound, And Talking Head (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Huang et al.
Towards Social Generative AI For Education: Theory, Practices And Ethics (2023) • Learning: Research and Practice • 84 citations
Mike Sharples
Character-llm: A Trainable Agent For Role-playing (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 62 citations
Shao et al.
Genassist: Making Image Generation Accessible (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 46 citations
Mina Huh, Yi-Hao Peng, Amy Pavel
Facilitating Self-guided Mental Health Interventions Through Human-language Model Interaction: A Case Study Of Cognitive Restructuring (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Sharma et al.
GPQA: A Graduate-level Google-proof Q&A Benchmark (2023) • No Venue
Rein et al.
Designing Participatory AI: Creative Professionals' Worries And Expectations About Generative AI (2023) • Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems • 96 citations
Nanna Inie, Jeanette Falk, Steven Tanimoto
Co-writing With Opinionated Language Models Affects Users' Views (2023) • CHI '23: CHI Conference on Human Factors in Computing Systems • 151 citations
Jakesch et al.
Chatgpt And Software Testing Education: Promises & Perils (2023) • 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) • 198 citations
Jalil et al.
Consistency Analysis Of Chatgpt (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Myeongjun Erik Jang, Thomas Lukasiewicz
Large AI Model Empowered Multimodal Semantic Communications (2023) • IEEE Wireless Communications • 40 citations
Jiang et al.
Structgpt: A General Framework For Large Language Model To Reason Over Structured Data (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 111 citations
Jiang et al.
Motiongpt: Human Motion As A Foreign Language (2023) • No Venue
Jiang et al.
Practical And Ethical Challenges Of Large Language Models In Education: A Systematic Scoping Review (2023) • British Journal of Educational Technology • 423 citations
Yan et al.
Teach AI How To Code: Using Large Language Models As Teachable Agents For Programming Education (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 53 citations
Jin et al.
Large Language Models As Zero-shot Human Models For Human-robot Interaction (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 52 citations
Bowen Zhang, Harold Soh
Is Stack Overflow Obsolete? An Empirical Study Of The Characteristics Of Chatgpt Answers To Stack Overflow Questions (2023) • Arxiv • 49 citations
Kabir et al.
Detgpt: Detect What You Need Via Reasoning (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Pi et al.
Extracting Accurate Materials Data From Research Papers With Conversational Language Models And Prompt Engineering (2023) • Nature Communications • 212 citations
Maciej P. Polak, Dane Morgan
Large Language Models For Education: Grading Open-ended Questions Using Chatgpt (2023) • SBES 2023: XXXVII Brazilian Symposium on Software Engineering • 46 citations
Pinto et al.
A Prompt Log Analysis Of Text-to-image Generation Systems (2023) • Proceedings of the ACM Web Conference 2023 • 40 citations
Xie et al.
Translating Natural Language To Planning Goals With Large-language Models (2023) • Arxiv • 44 citations
Xie et al.
Encouraging Divergent Thinking In Large Language Models Through Multi-agent Debate (2023) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 99 citations
Liang et al.
GPT Detectors Are Biased Against Non-native English Writers (2023) • Patterns • 306 citations
Liang et al.
Rich Human Feedback For Text-to-image Generation (2023) • No Venue
Liang et al.
Taskmatrix.ai: Completing Tasks By Connecting Foundation Models With Millions Of Apis (2023) • Intelligent Computing • 76 citations
Liang et al.
Ai's Regimes Of Representation: A Community-centered Study Of Text-to-image Models In South Asia (2023) • FAccT '23: the 2023 ACM Conference on Fairness, Accountability, and Transparency • 53 citations
Qadri et al.
Generative AI For Programming Education: Benchmarking Chatgpt, GPT-4, And Human Tutors (2023) • No Venue
Phung et al.
AI Transparency In The Age Of Llms: A Human-centered Research Roadmap (2023) • Harvard Data Science Review • 75 citations
Q. Vera Liao, Jennifer Wortman Vaughan
Designerly Understanding: Information Needs For Model Transparency To Support Design Ideation For Ai-powered User Experience (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 56 citations
Liao et al.
Investigating The Use Of Chatgpt For The Scheduling Of Construction Projects (2023) • Buildings • 174 citations
Samuel A. Prieto, Eyob T. Mengiste, Borja García de Soto
The Robots Are Here: Navigating The Generative AI Revolution In Computing Education (2023) • Proceedings of the 2023 Working Group Reports on Innovation and Technology in Computer Science Education • 235 citations
Prather et al.
Text2motion: From Natural Language Instructions To Feasible Plans (2023) • Autonomous Robots • 156 citations
Lin et al.
MM-VID: Advancing Video Understanding With Gpt-4v(ision) (2023) • No Venue
Lin et al.
CAMEL: Communicative Agents For "mind" Exploration Of Large Language Model Society (2023) • Arxiv • 87 citations
Li et al.
Api-bank: A Comprehensive Benchmark For Tool-augmented Llms (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 44 citations
Li et al.
Autonomous GIS: The Next-generation Ai-powered GIS (2023) • International Journal of Digital Earth • 99 citations
Zhenlong Li, Huan Ning
Halueval: A Large-scale Hallucination Evaluation Benchmark For Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 143 citations
Li et al.
"HOT" Chatgpt: The Promise Of Chatgpt In Detecting And Discriminating Hateful, Offensive, And Toxic Comments On Social Media (2023) • ACM Transactions on the Web • 63 citations
Li et al.
Teach Llms To Personalize -- An Approach Inspired By Writing Education (2023) • No Venue
Li et al.
Theory Of Mind For Multi-agent Collaboration Via Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 40 citations
Li et al.
Chatdev: Communicative Agents For Software Development (2023) • Arxiv • 65 citations
Qian et al.
Baize: An Open-source Chat Model With Parameter-efficient Tuning On Self-chat Data (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 94 citations
Xu et al.
LARP: Language-agent Role Play For Open-world Games (2023) • No Venue
Yan et al.
"it's A Fair Game", Or Is It? Examining How Users Navigate Disclosure Risks And Benefits When Using Llm-based Conversational Agents (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 46 citations
Zhang et al.
Generative Artificial Intelligence In Learning Analytics: Contextualising Opportunities And Challenges Through The Learning Analytics Cycle (2023) • Proceedings of the 14th Learning Analytics and Knowledge Conference • 51 citations
Lixiang Yan, Roberto Martinez-Maldonado, Dragan Gašević
Language-driven Representation Learning For Robotics (2023) • Robotics: Science and Systems XIX • 47 citations
Karamcheti et al.
Chatgpt For Programming Numerical Methods (2023) • Journal of Machine Learning for Modeling and Computing • 81 citations
Ali Kashefi, Tapan Mukerji
How Novices Use Llm-based Code Generators To Solve CS1 Coding Tasks In A Self-paced Learning Environment (2023) • Koli Calling '23: 23rd Koli Calling International Conference on Computing Education Research • 83 citations
Kazemitabaar et al.
Studying The Effect Of AI Code Generators On Supporting Novice Learners In Introductory Programming (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 226 citations
Kazemitabaar et al.
Gptaraeval: A Comprehensive Evaluation Of Chatgpt On Arabic NLP (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 41 citations
Khondaker et al.
Can Large Language Models Replace Humans In The Systematic Review Process? Evaluating Gpt-4's Efficacy In Screening And Extracting Data From Peer-reviewed And Grey Literature In Multiple Languages (2023) • Research Synthesis Methods • 130 citations
Khraisha et al.
Supporting Human-ai Collaboration In Auditing Llms With Llms (2023) • AIES '23: AAAI/ACM Conference on AI, Ethics, and Society • 47 citations
Rastogi et al.
Webarena: A Realistic Web Environment For Building Autonomous Agents (2023) • No Venue
Zhou et al.
Nemo Guardrails: A Toolkit For Controllable And Safe LLM Applications With Programmable Rails (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 53 citations
Rebedea et al.
Sasha: Creative Goal-oriented Reasoning In Smart Homes With Large Language Models (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 44 citations
King et al.
ChatGPT: Beginning Of An End Of Manual Linguistic Data Annotation? Use Case Of Automatic Genre Identification (2023) • Arxiv • 64 citations
Taja Kuzman, Igor Mozetič, Nikola Ljubešić
Chinese Intermediate English Learners Outdid ChatGPT In Deep Cohesion: Evidence From English Narrative Writing (2023) • System • 50 citations
Zhou et al.
Redefining Qualitative Analysis In The AI Era: Utilizing ChatGPT For Efficient Thematic Analysis (2023) • Arxiv • 58 citations
Zhang et al.
ChatGPT And Other Large Language Models As Evolutionary Engines For Online Interactive Collaborative Game Design (2023) • GECCO '23: Genetic and Evolutionary Computation Conference • 42 citations
Pier Luca Lanzi, Daniele Loiacono
HuggingGPT: Solving AI Tasks With ChatGPT And Its Friends In Hugging Face (2023) • Arxiv • 264 citations
Shen et al.
Video-LLaMA: An Instruction-tuned Audio-visual Language Model For Video Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 367 citations
Hang Zhang, Xin Li, Lidong Bing
VISAR: A Human-AI Argumentative Writing Assistant With Visual Programming And Rapid Draft Prototyping (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 67 citations
Zhang et al.
Corrupted By Algorithms? How AI-generated And Human-written Advice Shape (Dis)honesty (2023) • The Economic Journal • 42 citations
Leib et al.
Comparing Code Explanations Created By Students And Large Language Models (2023) • Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 • 155 citations
Leinonen et al.
Teaching Models To Express Their Uncertainty In Words (2022) • Arxiv • 53 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Large Language Models Can Self-improve (2022) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 96 citations
Huang et al.
Inner Monologue: Embodied Reasoning Through Planning With Language Models (2022) • Arxiv • 202 citations
Huang et al.
Large Language Models Are Better Reasoners With Self-verification (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 49 citations
Weng et al.
Re3: Generating Longer Stories With Recursive Reprompting And Revision (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 68 citations
Yang et al.
Creative Writing With An AI-powered Writing Assistant: Perspectives From Professional Writers (2022) • Arxiv • 47 citations
Ippolito et al.
Opal: Multimodal Image Generation For News Illustration (2022) • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology • 87 citations
Vivian Liu, Han Qiao, Lydia Chilton
In Conversation With Artificial Intelligence: Aligning Language Models With Human Values (2022) • Philosophy & Technology • 77 citations
Atoosa Kasirzadeh, Iason Gabriel
Automatic Detection And Analysis Of Technical Debts In Peer-review Documentation Of R Packages (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 81 citations
Junaed Younus Khan, Gias Uddin
Prosocialdialog: A Prosocial Backbone For Conversational Agents (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 42 citations
Kim et al.
Human-AI Collaboration Via Conditional Delegation: A Case Study Of Content Moderation (2022) • CHI Conference on Human Factors in Computing Systems • 106 citations
Lai et al.
Better Together? An Evaluation Of AI-supported Code Translation (2022) • 27th International Conference on Intelligent User Interfaces • 57 citations
Weisz et al.
Coauthor: Designing A Human-AI Collaborative Writing Dataset For Exploring Language Model Capabilities (2022) • CHI '22: CHI Conference on Human Factors in Computing Systems • 223 citations
Mina Lee, Percy Liang, Qian Yang
Using Large Language Models To Enhance Programming Error Messages (2022) • SIGCSE 2023: The 54th ACM Technical Symposium on Computer Science Education • 170 citations
Leinonen et al.
Co-writing Screenplays And Theatre Scripts With Language Models: An Evaluation By Industry Professionals (2022) • CHI '23: CHI Conference on Human Factors in Computing Systems • 154 citations
Mirowski et al.
Promptchainer: Chaining Large Language Model Prompts Through Visual Programming (2022) • CHI '22: CHI Conference on Human Factors in Computing Systems • 124 citations
Wu et al.
UX Research On Conversational Human-ai Interaction: A Literature Review Of The ACM Digital Library (2022) • CHI Conference on Human Factors in Computing Systems • 70 citations
Zheng et al.
Interactive Model Cards: A Human-centered Approach To Model Documentation (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 76 citations
Crisan et al.
Progprompt: Generating Situated Robot Task Plans Using Large Language Models (2022) • Arxiv • 42 citations
Singh et al.
Learning From Flowsheets: A Generative Transformer Model For Autocompletion Of Flowsheets (2022) • Computers & Chemical Engineering • 42 citations
Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann
Language Models Of Code Are Few-shot Commonsense Learners (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 62 citations
Madaan et al.
How Readable Is Model-generated Code? Examining Readability And Visual Inspection Of Github Copilot (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 40 citations
Naser Al Madi
How To Prompt? Opportunities And Challenges Of Zero- And Few-shot Learning For Human-AI Interaction In Creative Applications Of Generative Models (2022) • Arxiv • 85 citations
Dang et al.
Beyond Text Generation: Supporting Writers With Continuous Automatic Text Summaries (2022) • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology • 74 citations
Dang et al.
Using Pre-trained Models To Boost Code Review Automation (2022) • Proceedings of the 44th International Conference on Software Engineering • 131 citations
Tufano et al.
Do Large Language Models Know What Humans Know? (2022) • Cognitive Science • 63 citations
Trott et al.
Automated Clinical Coding: What, Why, And Where We Are? (2022) • npj Digital Medicine • 69 citations
Dong et al.
Investigating Explainability Of Generative AI For Code Through Scenario-based Design (2022) • 27th International Conference on Intelligent User Interfaces • 176 citations
Sun et al.
A Taxonomy Of Prompt Modifiers For Text-to-image Generation (2022) • Behaviour & Information Technology • 128 citations
Jonas Oppenlaender
AI-driven Development Is Here: Should You Worry? (2022) • IEEE Software • 55 citations
Neil Ernst, Gabriele Bavota
Large Language Models And The Reverse Turing Test (2022) • Neural Computation • 99 citations
Terrence Sejnowski
Emergent Analogical Reasoning In Large Language Models (2022) • Nature Human Behaviour • 238 citations
Taylor Webb, Keith J. Holyoak, Hongjing Lu
Automated Repair Of Programs From Large Language Models (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 157 citations
Fan et al.
Lampost: Design And Evaluation Of An AI-assisted Email Writing Prototype For Adults With Dyslexia (2022) • Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility • 55 citations
Goodman et al.
Putting GPT-3's Creativity To The (Alternative Uses) Test (2022) • Arxiv • 50 citations
Stevenson et al.
Self-critiquing Models For Assisting Human Evaluators (2022) • Arxiv • 46 citations
Saunders et al.
Thinking Fast And Slow In Large Language Models (2022) • Nature Computational Science • 135 citations
Thilo Hagendorff, Sarah Fabi, Michal Kosinski
Storybuddy: A Human-AI Collaborative Chatbot For Parent-child Interactive Storytelling With Flexible Parental Involvement (2022) • CHI Conference on Human Factors in Computing Systems • 127 citations
Zhang et al.
What Is It Like To Program With Artificial Intelligence? (2022) • Arxiv • 44 citations
Sarkar et al.
Automatic Generation Of Programming Exercises And Code Explanations Using Large Language Models (2022) • ICER 2022: ACM Conference on International Computing Education Research • 329 citations
Sarsa et al.
"I Think This Is The Most Disruptive Technology": Exploring Sentiments Of Chatgpt Early Adopters Using Twitter Data (2022) • Arxiv • 204 citations
Haque et al.
Neural Theory-of-mind? On The Limits Of Social Intelligence In Large LMs (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 81 citations
Sap et al.
WANLI: Worker And AI Collaboration For Natural Language Inference Dataset Creation (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 102 citations
Liu et al.
Unifiedskg: Unifying And Multi-tasking Structured Knowledge Grounding With Text-to-text Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 74 citations
Xie et al.
Dexperts: Decoding-time Controlled Text Generation With Experts And Anti-experts (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 127 citations
Liu et al.
Building And Evaluating Open-domain Dialogue Corpora With Clarifying Questions (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Aliannejadi et al.
AI Chains: Transparent And Controllable Human-AI Interaction By Chaining Large Language Model Prompts (2021) • CHI '22: CHI Conference on Human Factors in Computing Systems • 326 citations
Tongshuang Wu, Michael Terry, Carrie J. Cai
Just Say No: Analyzing The Stance Of Neural Dialogue Generation In Offensive Contexts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Baheti et al.
Popblends: Strategies For Conceptual Blending With Large Language Models (2021) • CHI '23: CHI Conference on Human Factors in Computing Systems • 43 citations
Wang et al.
Documentation Matters: Human-centered AI System To Assist Data Science Code Documentation In Computational Notebooks (2021) • ACM Transactions on Computer-Human Interaction • 73 citations
Wang et al.
Text2gestures: A Transformer-based Network For Generating Emotive Body Gestures For Virtual Agents (2021) • 2021 IEEE Virtual Reality and 3D User Interfaces (VR) • 109 citations
Bhattacharya et al.
The Impact Of Multiple Parallel Phrase Suggestions On Email Input And Composition Behaviour Of Native And Non-native English Writers (2021) • Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems • 95 citations
Daniel Buschek, Martin Zürn, Malin Eiband
SIMMC 2.0: A Task-oriented Dialog Dataset For Immersive Multimodal Conversations (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 51 citations
Kottur et al.
Text Is NOT Enough: Integrating Visual Impressions Into Open-domain Dialogue Generation (2021) • ACM Computing Surveys • 151 citations
Shen et al.
Alignment Of Language Agents (2021) • Arxiv • 41 citations
Kenton et al.
INVIGORATE: Interactive Visual Grounding And Grasping In Clutter (2021) • Robotics: Science and Systems XVII • 45 citations
Zhang et al.
Perfection Not Required? Human-AI Partnerships In Code Translation (2021) • 26th International Conference on Intelligent User Interfaces • 82 citations
Weisz et al.
Conceptual Metaphors Impact Perceptions Of Human-AI Collaboration (2020) • Proceedings of the ACM on Human-Computer Interaction • 124 citations
Khadpe et al.
Aligning AI With Shared Human Values (2020) • Arxiv • 100 citations
Hendrycks et al.
Content Planning For Neural Story Generation With Aristotelian Rescoring (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 104 citations
Goldfarb-Tarrant et al.
INSPIRED: Toward Sociable Recommendation Dialog Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 95 citations
Hayati et al.
Discern: Discourse-aware Entailment Reasoning Network For Conversational Machine Reading (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Gao et al.
Dialogue Response Ranking Training With Large-scale Human Feedback Data (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Gao et al.
Imitating Interactive Intelligence (2020) • Arxiv • 43 citations
Abramson et al.
Crows-pairs: A Challenge Dataset For Measuring Social Biases In Masked Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 148 citations
Nangia et al.
Language Generation With Multi-hop Reasoning On Commonsense Knowledge Graph (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 95 citations
Ji et al.
Photon: A Robust Cross-domain Text-to-sql System (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 42 citations
Zeng et al.
Chatbot Interaction With Artificial Intelligence: Human Data Augmentation With T5 And Language Transformer Ensemble For Text Classification (2020) • Journal of Ambient Intelligence and Humanized Computing • 54 citations
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
Template Guided Text Generation For Task-oriented Dialogue (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 55 citations
Mihir Kale, Abhinav Rastogi
Fairseq S2T: Fast Speech-to-text Modeling With Fairseq (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations • 110 citations
Wang et al.
What Is More Likely To Happen Next? Video-and-language Future Event Prediction (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 41 citations
Lei et al.
Generating Similes Effortlessly Like A Pro: A Style Transfer Approach For Simile Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Tuhin Chakrabarty, Smaranda Muresan, Nanyun Peng
Room-across-room: Multilingual Vision-and-language Navigation With Dense Spatiotemporal Grounding (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 190 citations
Ku et al.
Towards Persona-based Empathetic Conversational Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 107 citations
Zhong et al.
X-LXMERT: Paint, Caption And Answer Questions With Multi-modal Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 58 citations
Cho et al.
Grounded Adaptation For Zero-shot Executable Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 83 citations
Zhong et al.
A Taxonomy Of Empathetic Response Intents In Human Social Conversations (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 81 citations
Anuradha Welivita, Pearl Pu
NL4DV: A Toolkit For Generating Analytic Specifications For Data Visualization From Natural Language Queries (2020) • IEEE Transactions on Visualization and Computer Graphics • 164 citations
Arpit Narechania, Arjun Srinivasan, John Stasko
MEGATRON-CNTRL: Controllable Story Generation With External Knowledge Using Large-scale Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Xu et al.
Knowledge-grounded Dialogue Generation With Pre-trained Language Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Zhao et al.
Enabling Language Models To Fill In The Blanks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 41 citations
Chris Donahue, Mina Lee, Percy Liang
Recommendation As A Communication Game: Self-supervised Bot-play For Goal-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 80 citations
Kang et al.
Sentence-level Content Planning And Style Specification For Neural Text Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Xinyu Hua, Lu Wang
Strategies For Structuring Story Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 186 citations
Angela Fan, Mike Lewis, Yann Dauphin
Generating Personalized Recipes From Historical User Preferences (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 73 citations
Majumder et al.
Encode, Tag, Realize: High-precision Text Editing (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Malmi et al.
Juice: A Large Scale Distantly Supervised Dataset For Open Domain Context-based Code Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer
Robust Navigation With Language Pretraining And Stochastic Sampling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Li et al.
Editing-based SQL Query Generation For Cross-domain Context-dependent Questions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 114 citations
Zhang et al.
Giving BERT A Calculator: Finding Operations And Arguments With Reading Comprehension (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 75 citations
Andor et al.
Help, Anna! Visual Navigation With Natural Multimodal Assistance Via Retrospective Curiosity-encouraging Imitation Learning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Khanh Nguyen, Hal Daumé
The Woman Worked As A Babysitter: On Biases In Language Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 380 citations
Sheng et al.
Structuring Latent Spaces For Stylized Response Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 48 citations
Gao et al.
Counterfactual Story Reasoning And Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 98 citations
Qin et al.
Taskmaster-1: Toward A Realistic And Diverse Dialog Dataset (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 134 citations
Byrne et al.
Learning A Shared Shape Space For Multimodal Garment Design (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 62 citations
Wang et al.
Interpretation Of Natural Language Rules In Conversational Machine Reading (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 141 citations
Saeidi et al.
Semi-supervised Sequence Modeling With Cross-view Training (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 391 citations
Clark et al.
Decoupling Strategy And Generation In Negotiation Dialogues (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 128 citations
He et al.
Learning A Neural Semantic Parser From User Feedback (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 54 citations
Iyer et al.
Deal Or No Deal? End-to-end Learning For Negotiation Dialogues (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Lewis et al.
Rationalizing Neural Predictions (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 132 citations
Tao Lei, Regina Barzilay, Tommi Jaakkola
Reasoning About Pragmatics With Neural Listeners And Speakers (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 113 citations
Jacob Andreas, Dan Klein
Dialogue Learning With Human-in-the-loop (2016) • Arxiv • 46 citations
Li et al.
Automated Correction For Syntax Errors In Programming Assignments Using Recurrent Neural Networks (2016) • Arxiv • 71 citations
Sahil Bhatia, Rishabh Singh

— I —

ICASSP 31 papers

Mossformer: Pushing The Performance Limit Of Monaural Speech Separation Using Gated Single-head Transformer With Convolution-augmented Joint Self-attentions (2023) • ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 47 citations
Shengkui Zhao, Bin Ma
Auto-avsr: Audio-visual Speech Recognition With Automatic Labels (2023) • ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 91 citations
Ma et al.
Prompting Large Language Models With Speech Recognition Abilities (2023) • ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 60 citations
Fathullah et al.
Mmlatch: Bottom-up Top-down Fusion For Multimodal Sentiment Analysis (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 43 citations
Georgios Paraskevopoulos, Efthymios Georgiou, Alexandros Potamianos
Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 52 citations
Bethan Thomas, Samuel Kessler, Salah Karout
Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 42 citations
Sathyendra et al.
Wav2clip: Learning Robust Audio Representations From CLIP (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
Wu et al.
Dual-branch Attention-in-attention Transformer For Single-channel Speech Enhancement (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 81 citations
Yu et al.
Audioclip: Extending CLIP To Image, Text And Audio (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 210 citations
Guzhov et al.
Lightspeech: Lightweight And Fast Text To Speech With Neural Architecture Search (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Luo et al.
End-to-end Audio-visual Speech Recognition With Conformers (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 202 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 84 citations
Shi et al.
A Co-interactive Transformer For Joint Slot Filling And Intent Detection (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 89 citations
Qin et al.
BOFFIN TTS: Few-shot Speaker Adaptation By Bayesian Optimization (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 56 citations
Moss et al.
Joint Contextual Modeling For ASR Correction And Language Understanding (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 42 citations
Weng et al.
Aligntts: Efficient Feed-forward Text-to-speech System Without Explicit Alignment (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Zeng et al.
Transformer-based Online CTC/Attention End-to-end Speech Recognition Architecture (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 68 citations
Miao et al.
Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 127 citations
Chen et al.
Leveraging Unpaired Text Data For Training End-to-end Speech-to-intent Systems (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 56 citations
Huang et al.
A Streaming On-device End-to-end Model Surpassing Server-side Conventional Model Quality And Latency (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 200 citations
Sainath et al.
Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 46 citations
Inaguma et al.
Self-attention Aligner: A Latency-control End-to-end Model For ASR Using Self-attention Network And Chunk-hopping (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 100 citations
Linhao Dong, Feng Wang, Bo Xu
Correction Of Automatic Speech Recognition With Transformer Sequence-to-sequence Model (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 49 citations
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
Mockingjay: Unsupervised Speech Representation Learning With Deep Bidirectional Transformer Encoders (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 195 citations
Liu et al.
Espnet-tts: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 98 citations
Hayashi et al.
Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 49 citations
Cai et al.
Attention-based LSTM For Psychological Stress Detection From Spoken Language Using Distant Supervision (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Genta Indra Winata, Onno Pepijn Kampman, Pascale Fung
Deep-fsmn For Large Vocabulary Continuous Speech Recognition (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 120 citations
Zhang et al.
State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 82 citations
Chiu et al.
An Analysis Of Incorporating An External Language Model Into A Sequence-to-sequence Model (2017) • ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 138 citations
Kannan et al.
Personalized Speech Recognition On Mobile Devices (2016) • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
McGraw et al.

ICCV 124 papers

ELITE: Encoding Visual Concepts Into Textual Embeddings For Customized Text-to-image Generation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 160 citations
Wei et al.
Localizing Object-level Shape Variations With Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 66 citations
Patashnik et al.
FLIP: Cross-domain Face Anti-spoofing With Language Guidance (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar
Unleashing Text-to-image Diffusion Models For Visual Perception (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 124 citations
Zhao et al.
What Does CLIP Know About A Red Circle? Visual Prompt Engineering For VLMs (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 78 citations
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
3d-vista: Pre-trained Transformer For 3D Vision And Text Alignment (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Zhu et al.
Vipergpt: Visual Inference Via Python Execution For Reasoning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Dídac Surís, Sachit Menon, Carl Vondrick
Towards Geospatial Foundation Models Via Continual Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 57 citations
Mendieta et al.
SKED: Sketch-guided Text-based 3D Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 42 citations
Mikaeili et al.
Verbs In Action: Improving Verb Understanding In Video-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Momeni et al.
Human Preference Score: Better Aligning Text-to-image Models With Human Preference (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 108 citations
Wu et al.
Enhancing CLIP With GPT-4: Harnessing Visual Descriptions As Prompts (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 45 citations
Maniparambil et al.
Sigmoid Loss For Language Image Pre-training (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 321 citations
Zhai et al.
Hrs-bench: Holistic, Reliable And Scalable Benchmark For Text-to-image Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Bakr et al.
Texfusion: Synthesizing 3D Textures With Text-guided Image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 67 citations
Cao et al.
Masactrl: Tuning-free Mutual Self-attention Control For Consistent Image Synthesis And Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 234 citations
Cao et al.
Fantasia3d: Disentangling Geometry And Appearance For High-quality Text-to-3d Content Creation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 246 citations
Chen et al.
Less Is More: Focus Attention For Efficient DETR (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Zheng et al.
Preventing Zero-shot Transfer Degradation In Continual Learning Of Vision-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 55 citations
Zheng et al.
Fastvit: A Fast Hybrid Vision Transformer Using Structural Reparameterization (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 69 citations
Vasu et al.
Anti-dreambooth: Protecting Users From Personalized Text-to-image Synthesis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Le et al.
Attt2m: Text-driven Human Motion Generation With Multi-perspective Attention Mechanism (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Zhong et al.
Vision Grid Transformer For Document Layout Analysis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Da et al.
Structure And Content-guided Video Synthesis With Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 275 citations
Esser et al.
Unified Pre-training With Pseudo Texts For Text-to-image Person Re-identification (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 48 citations
Shao et al.
Gloss-free Sign Language Translation: Improving From Visual-language Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 47 citations
Zhou et al.
Transferable Decoding With Visual Entities For Zero-shot Image Captioning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Fei et al.
Diverse Data Augmentation With Diffusions For Effective Test-time Prompt Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 44 citations
Feng et al.
Erasing Concepts From Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 111 citations
Gandikota et al.
Vox-e: Text-guided Voxel Editing Of 3D Objects (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Sella et al.
A Unified Continual Learning Framework With General Parameter-efficient Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 49 citations
Gao et al.
Motionlm: Multi-agent Motion Forecasting As Language Modeling (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 68 citations
Seff et al.
Expressive Text-to-image Generation With Rich Text (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 48 citations
Ge et al.
Preserve Your Own Correlation: A Noise Prior For Video Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 97 citations
Ge et al.
Adding Conditional Control To Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 2580 citations
Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Make-it-3d: High-fidelity 3D Creation From A Single Image With Diffusion Prior (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 149 citations
Tang et al.
On The Adversarial Robustness Of Multi-modal Foundation Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 42 citations
Christian Schlarmann, Matthias Hein
Svdiff: Compact Parameter Space For Diffusion Fine-tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Han et al.
CLIP Goes 3D: Leveraging Prompt Tuning For Language Grounded 3D Recognition (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 50 citations
Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel
TIFA: Accurate And Interpretable Text-to-image Faithfulness Evaluation With Question Answering (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Hu et al.
Implicit Neural Representation For Cooperative Low-light Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 144 citations
Yang et al.
Text2room: Extracting Textured 3D Meshes From 2D Text-to-image Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 92 citations
Höllein et al.
Knowing Where To Focus: Event-aware Transformer For Video Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Jang et al.
Sg-former: Self-guided Transformer With Evolving Token Reallocation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 45 citations
Ren et al.
SLCA: Slow Learner With Classifier Alignment For Continual Learning On A Pre-trained Model (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 72 citations
Zhang et al.
Boxdiff: Text-to-image Synthesis With Training-free Box-constrained Diffusion (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 101 citations
Xie et al.
Univtg: Towards Unified Video-language Temporal Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Lin et al.
Iterative Prompt Learning For Unsupervised Backlit Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 129 citations
Liang et al.
Egovlpv2: Egocentric Video-language Pre-training With Fusion In The Backbone (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 51 citations
Pramanick et al.
Your Diffusion Model Is Secretly A Zero-shot Classifier (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Li et al.
Fatezero: Fusing Attentions For Zero-shot Text-based Video Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 141 citations
Qi et al.
Text2video-zero: Text-to-image Diffusion Models Are Zero-shot Video Generators (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 261 citations
Khachatryan et al.
LERF: Language Embedded Radiance Fields (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Kerr et al.
Self-regulating Prompts: Foundational Model Adaptation Without Forgetting (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 113 citations
Khattak et al.
Dense Text-to-image Generation With Attention Modulation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 75 citations
Kim et al.
Dreambooth3d: Subject-driven Text-to-3d Generation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Raj et al.
Repq-vit: Scale Reparameterization For Post-training Quantization Of Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 65 citations
Li et al.
Rethinking Query-key Pairwise Interactions In Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 194 citations
Cheng Li, Yangxin Liu
Simpleclick: Interactive Image Segmentation With Simple Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 84 citations
Liu et al.
Versatile Diffusion: Text, Images And Variations All In One Diffusion Model (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 91 citations
Xu et al.
Sus-x: Training-free Name-only Transfer Of Vision-language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 56 citations
Vishaal Udandarao, Ankush Gupta, Samuel Albanie
PØDA: Prompt-driven Zero-shot Domain Adaptation (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Fahes et al.
Llm-planner: Few-shot Grounded Planning For Embodied Agents With Large Language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 267 citations
Song et al.
Hit: Hierarchical Transformer With Momentum Contrast For Video-text Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Liu et al.
Unit: Multimodal Multitask Learning With A Unified Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 224 citations
Ronghang Hu, Amanpreet Singh
Signbert: Pre-training Of Hand-model-aware Representation For Sign Language Recognition (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 89 citations
Hu et al.
Styleclip: Text-driven Manipulation Of Stylegan Imagery (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 87 citations
Patashnik et al.
Episodic Transformer For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Alexander Pashevich, Cordelia Schmid, Chen Sun
Transformer-based Attention Networks For Continuous Pixel-wise Prediction (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 170 citations
Yang et al.
Relaxed Transformer Decoders For Direct Action Proposal Generation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 153 citations
Tan et al.
Taco: Token-aware Cascade Contrastive Learning For Video-text Alignment (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 102 citations
Jianwei Yang, Yonatan Bisk, Jianfeng Gao
Vision-language Navigation With Random Environmental Mixup (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 70 citations
Liu et al.
Visual Saliency Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 429 citations
Liu et al.
Docformer: End-to-end Transformer For Document Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Appalaraju et al.
Describing And Localizing Multiple Changes With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 76 citations
Qiu et al.
Crossclr: Cross-modal Contrastive Learning For Multi-modal Video Representations (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 121 citations
Zolfaghari et al.
Generic Attention-model Explainability For Interpreting Bi-modal And Encoder-decoder Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 207 citations
Hila Chefer, Shir Gur, Lior Wolf
Rethinking Lifelong Sequential Recommendation With Incremental Multi-interest Attention (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 284 citations
Wu et al.
Deepcad: A Deep Generative Network For Computer-aided Design Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 112 citations
Rundi Wu, Chang Xiao, Changxi Zheng
Tokens-to-token Vit: Training Vision Transformers From Scratch On Imagenet (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 1687 citations
Yuan et al.
Instancerefer: Cooperative Holistic Understanding For Visual Grounding On Point Clouds Through Instance Multi-level Contextual Referring (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Yuan et al.
Incorporating Convolution Designs Into Visual Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 434 citations
Yuan et al.
Adversarial VQA: A New Benchmark For Evaluating The Robustness Of VQA Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 42 citations
Li et al.
TEACHTEXT: Crossmodal Generalized Distillation For Text-video Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Croitoru et al.
Transvg: End-to-end Visual Grounding With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 285 citations
Deng et al.
MDETR -- Modulated Detection For End-to-end Multi-modal Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 594 citations
Kamath et al.
Greedy Gradient Ensemble For Robust Visual Question Answering (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Han et al.
E-vil: A Dataset And Benchmark For Natural Language Explanations In Vision-language Tasks (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
Kayser et al.
Compressing Visual-linguistic Model Via Knowledge Distillation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 50 citations
Fang et al.
CM-NAS: Cross-modality Neural Architecture Search For Visible-infrared Person Re-identification (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 150 citations
Fu et al.
Fast Convergence Of DETR With Spatially Modulated Co-attention (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 302 citations
Gao et al.
Tph-yolov5: Improved Yolov5 Based On Transformer Prediction Head For Object Detection On Drone-captured Scenarios (2021) • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 1665 citations
Zhu et al.
The Road To Know-where: An Object-and-room Informed Sequential BERT For Indoor Vision-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 66 citations
Qi et al.
Synthesis Of Compositional Animations From Textual Descriptions (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Ghosh et al.
Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 151 citations
Liu et al.
Multi-turn Dialogue Reading Comprehension With Pivot Turns And Knowledge (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 216 citations
Zhuosheng Zhang, Junlong Li, Hai Zhao
Airbert: In-domain Pretraining For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 118 citations
Guhur et al.
Chinese Street View Text: Large-scale Chinese Text Reading With Partially Supervised Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 55 citations
Sun et al.
Making History Matter: History-advantage Sequence Training For Visual Dialog (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 63 citations
Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang
Videobert: A Joint Model For Video And Language Representation Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 1077 citations
Sun et al.
Transferable Representation Learning In Vision-and-language Navigation (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Huang et al.
Self-training With Progressive Augmentation For Unsupervised Cross-domain Person Re-identification (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 273 citations
Zhang et al.
Saliency-guided Attention Network For Image-sentence Matching (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 105 citations
Ji et al.
Learning To Collocate Neural Modules For Image Captioning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 95 citations
Xu Yang, Hanwang Zhang, Jianfei Cai
Howto100m: Learning A Text-video Embedding By Watching Hundred Million Narrated Video Clips (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Miech et al.
CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 339 citations
Wang et al.
Dynamic Graph Attention For Referring Expression Comprehension (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 226 citations
Sibei Yang, Guanbin Li, Yizhou Yu
VATEX: A Large-scale, High-quality Multilingual Dataset For Video-and-language Research (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 326 citations
Wang et al.
Watch, Listen And Tell: Multi-modal Weakly Supervised Dense Event Captioning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Tanzila Rahman, Bicheng Xu, Leonid Sigal
Convolutional Character Networks (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 183 citations
Xing et al.
Nocaps: Novel Object Captioning At Scale (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 233 citations
Agrawal et al.
Tell, Draw, And Repeat: Generating And Modifying Images Based On Continual Linguistic Instruction (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
El-Nouby et al.
Semantic Image Synthesis Via Adversarial Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 264 citations
Dong et al.
Dense-captioning Events In Videos (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 50 citations
Krishna et al.
Paying Attention To Descriptions Generated By Image Captioning Models (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 81 citations
Tavakoli et al.
TALL: Temporal Activity Localization Via Language Query (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 768 citations
Gao et al.
Attention-based Multimodal Fusion For Video Description (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 387 citations
Hori et al.
Recurrent Topic-transition GAN For Visual Paragraph Generation (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 171 citations
Liang et al.
Identity-aware Textual-visual Matching With Latent Co-attention (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 259 citations
Li et al.
Towards Diverse And Natural Image Descriptions Via A Conditional GAN (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 455 citations
Dai et al.
Learning To Reason: End-to-end Module Networks For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 489 citations
Hu et al.
Multi-modal Factorized Bilinear Pooling With Co-attention Learning For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 694 citations
Yu et al.
Speaking The Same Language: Matching Machine To Human Captions By Adversarial Training (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 238 citations
Shetty et al.
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 313 citations
Das et al.

ICLR 8 papers #

Monitoring AI-modified Content At Scale: A Case Study On The Impact Of ChatGPT On AI Conference Peer Reviews (2024) • Arxiv • 59 citations
Liang et al.
Can GPT-4 Replicate Empirical Software Engineering Research? (2023) • NEJM AI • 101 citations
Liang et al.
Open-vocabulary Object Detection Via Vision And Language Knowledge Distillation (2021) • ICLR 2022 • 280 citations
Gu et al.
Rethinking Positional Encoding In Language Pre-training (2020) • International Conference on Learning Representations (ICLR) 2021 https://openreview.net/forum?id=09-528y2Fgf • 65 citations
Guolin Ke, Di He, Tie-Yan Liu
Language Gans Falling Short (2018) • ICLR 2020 - Proceedings of the Seventh International Conference on Learning Representation • 81 citations
Caccia et al.
Ask The Right Questions: Active Question Reformulation With Reinforcement Learning (2017) • Sixth International Conference on Learning Representations (ICLR) 2018 • 82 citations
Buck et al.
Deep Gradient Compression: Reducing The Communication Bandwidth For Distributed Training (2017) • ICLR 2018 • 645 citations
Lin et al.
Tracking The World State With Recurrent Entity Networks (2016) • ICLR 2017 • 157 citations
Henaff et al.

ICRA 8 papers #

Roco: Dialectic Multi-robot Collaboration With Large Language Models (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 78 citations
Zhao Mandi, Shreeya Jain, Shuran Song
Autotamp: Autoregressive Task And Motion Planning With Llms As Translators And Checkers (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 57 citations
Chen et al.
Driving With Llms: Fusing Object-level Vector Modality For Explainable Autonomous Driving (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 110 citations
Chen et al.
Scalable Multi-robot Collaboration With Large Language Models: Centralized Or Decentralized Systems? (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 46 citations
Chen et al.
Physically Grounded Vision-language Models For Robotic Manipulation (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 60 citations
Gao et al.
Llm-grounder: Open-vocabulary 3D Visual Grounding With Large Language Model As An Agent (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 43 citations
Yang et al.
Code As Policies: Language Model Programs For Embodied Control (2022) • 2023 IEEE International Conference on Robotics and Automation (ICRA) • 390 citations
Liang et al.
Hierarchical Cross-modal Agent For Robotics Vision-and-language Navigation (2021) • 2021 IEEE International Conference on Robotics and Automation (ICRA) • 45 citations
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira

IJCAI 17 papers #

Hdformer: High-order Directed Transformer For 3D Human Pose Estimation (2023) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 44 citations
Chen et al.
Language Modeling Via Stochastic Processes (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 40 citations
Wang et al.
Control Globally, Understand Locally: A Global-to-local Hierarchical Graph Network For Emotional Support Conversation (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 42 citations
Peng et al.
An Efficient Transformer Decoder With Compressed Sub-layers (2021) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 46 citations
Li et al.
Pretrained Language Models For Text Generation: A Survey (2021) • Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) • 88 citations
Li et al.
Guided Generation Of Cause And Effect (2021) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 47 citations
Li et al.
Cosda-ml: Multi-lingual Code-switching Data Augmentation For Zero-shot Cross-lingual NLP (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 126 citations
Qin et al.
ERNIE-GEN: An Enhanced Multi-flow Pre-training And Fine-tuning Framework For Natural Language Generation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 105 citations
Xiao et al.
Improving Coreference Resolution By Leveraging Entity-centric Features With Graph Neural Networks And Second-order Inference (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 112 citations
Liu et al.
Transformers As Soft Reasoners Over Language (2020) • Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20) • 48 citations
Peter Clark, Oyvind Tafjord, Kyle Richardson
Exploiting Persona Information For Diverse Generation Of Conversational Responses (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 126 citations
Song et al.
A Dual Reinforcement Learning Framework For Unsupervised Text Style Transfer (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 155 citations
Luo et al.
Pre-training Of Graph Augmented Transformers For Medication Recommendation (2019) • Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) • 168 citations
Shang et al.
Co-training Embeddings Of Knowledge Graphs And Entity Descriptions For Cross-lingual Entity Alignment (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 228 citations
Chen et al.
Controllable Neural Story Plot Generation Via Reward Shaping (2018) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 79 citations
Tambwekar et al.
Aspect Term Extraction With History Attention And Selective Transformation (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 275 citations
Li et al.
Distraction-based Neural Networks For Document Summarization (2016) • IJCAI 2016 • 61 citations
Chen et al.

In Context Learning 160 papers #

Thinking With Video: Video Generation As A Promising Multimodal Reasoning Paradigm (2025) • No Venue
Tong et al.
Reasoning Models Better Express Their Confidence (2025) • No Venue
Yoon et al.
The Stochastic Parrot On Llm's Shoulder: A Summative Assessment Of Physical Concept Understanding (2025) • No Venue
Yu et al.
PRELUDE: A Benchmark Designed To Require Global Comprehension And Reasoning Over Long Contexts (2025) • No Venue
Yu et al.
Yue: Scaling Open Foundation Models For Long-form Music Generation (2025) • No Venue
Yuan et al.
SIFT: Grounding LLM Reasoning In Contexts Via Stickers (2025) • No Venue
Zeng et al.
Booststep: Boosting Mathematical Capability Of Large Language Models Via Improved Single-step Reasoning (2025) • No Venue
Zhang et al.
Univideo: Unified Understanding, Generation, And Editing For Videos (2025) • No Venue
Wei et al.
Less-to-more Generalization: Unlocking More Controllability By In-context Generation (2025) • No Venue
Wu et al.
Dope: Denoising Rotary Position Embedding (2025) • No Venue
Xiong et al.
Easyedit2: An Easy-to-use Steering Framework For Editing Large Language Models (2025) • No Venue
Xu et al.
Emergent Misalignment Via In-context Learning: Narrow In-context Examples Can Produce Broadly Misaligned Llms (2025) • No Venue
Afonin et al.
KV Cache Steering For Inducing Reasoning In Small Language Models (2025) • No Venue
Belitsky et al.
Neobert: A Next-generation BERT (2025) • No Venue
Breton et al.
Edit Transfer: Learning Image Editing Via Vision In-context Relations (2025) • No Venue
Chen et al.
WEAVE: Unleashing And Benchmarking The In-context Interleaved Comprehension And Generation (2025) • No Venue
Chow et al.
Pre-trained Large Language Models Learn Hidden Markov Models In-context (2025) • No Venue
Dai et al.
Machinelearninglm: Continued Pretraining Language Models On Millions Of Synthetic Tabular Prediction Tasks Scales In-context ML (2025) • No Venue
Dong et al.
Multiple Choice Questions: Reasoning Makes Large Language Models (llms) More Self-confident Even When They Are Wrong (2025) • No Venue
Fu et al.
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team (2025) • No Venue
Hosain et al.
Ultramemv2: Memory Networks Scaling To 120B Parameters With Superior Long-context Learning (2025) • No Venue
Huang et al.
When Thoughts Meet Facts: Reusable Reasoning For Long-context Lms (2025) • No Venue
Jeong et al.
Analysing Chain Of Thought Dynamics: Active Guidance Or Unfaithful Post-hoc Rationalisation? (2025) • No Venue
Lewis-Lim et al.
Vfxmaster: Unlocking Dynamic Visual Effect Generation Via In-context Learning (2025) • No Venue
Li et al.
C3PO: Critical-layer, Core-expert, Collaborative Pathway Optimization For Test-time Expert Re-mixing (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
Metaladder: Ascending Mathematical Solution Quality Via Analogical-problem Reasoning Transfer (2025) • No Venue
Lin et al.
Deciphering Trajectory-aided LLM Reasoning: An Optimization Perspective (2025) • No Venue
Liu et al.
BOLT: Bootstrap Long Chain-of-thought In Language Models Without Distillation (2025) • No Venue
Pang et al.
RWKV-7 "goose" With Expressive Dynamic State Evolution (2025) • No Venue
Peng et al.
Llm-microscope: Uncovering The Hidden Role Of Punctuation In Context Memory Of Transformers (2025) • No Venue
Razzhigaev et al.
Self-generated In-context Examples Improve LLM Agents For Sequential Decision-making Tasks (2025) • No Venue
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
When Punctuation Matters: A Large-scale Comparison Of Prompt Robustness Methods For Llms (2025) • No Venue
Seleznyov et al.
Learn-by-interact: A Data-centric Framework For Self-adaptive Agents In Realistic Environments (2025) • No Venue
Su et al.
From Words To Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-context Examples (2024) • No Venue
Vacareanu et al.
Linear Transformers With Learnable Kernel Functions Are Better In-context Models (2024) • No Venue
Aksenov et al.
Large Language Models As Markov Chains (2024) • No Venue
Zekri et al.
Seed-tts: A Family Of High-quality Versatile Speech Generation Models (2024) • No Venue
Anastassiou et al.
Revisiting In-context Learning With Long Context Language Models (2024) • No Venue
Baek et al.
Longbench V2: Towards Deeper Understanding And Reasoning On Realistic Long-context Multitasks (2024) • No Venue
Bai et al.
Premise Order Matters In Reasoning With Large Language Models (2024) • No Venue
Chen et al.
Internlm-math: Open Math Large Language Models Toward Verifiable Reasoning (2024) • No Venue
Ying et al.
Chain-of-table: Evolving Tables In The Reasoning Chain For Table Understanding (2024) • No Venue
Wang et al.
Spreadsheetllm: Encoding Spreadsheets For Large Language Models (2024) • No Venue
Tian et al.
Llms In The Imaginarium: Tool Learning Through Simulated Trial And Error (2024) • No Venue
Wang et al.
X-prompt: Towards Universal In-context Image Generation In Auto-regressive Vision Language Foundation Models (2024) • No Venue
Sun et al.
Can Large Language Models Understand Context? (2024) • No Venue
Zhu et al.
Aligning Teacher With Student Preferences For Tailored Training Data Generation (2024) • No Venue
Liu et al.
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model (2024) • No Venue
Zhang et al.
A Controlled Study On Long Context Extension And Generalization In Llms (2024) • No Venue
Lu et al.
Bento: Benchmark Task Reduction With In-context Transferability (2024) • No Venue
Zhao et al.
WILBUR: Adaptive In-context Learning For Robust And Accurate Web Agents (2024) • No Venue
Lutz et al.
Foundation Models For Music: A Survey (2024) • No Venue
Ma et al.
Beyond Examples: High-level Automated Reasoning Paradigm In In-context Learning Via MCTS (2024) • No Venue
Wu et al.
MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training (2024) • No Venue
McKinzie et al.
Realm: Reference Resolution As Language Modeling (2024) • No Venue
Moniz et al.
Can Mamba Learn How To Learn? A Comparative Study On In-context Learning Tasks (2024) • No Venue
Park et al.
In-context Editing: Learning Knowledge From Self-induced Distributions (2024) • No Venue
Qi et al.
Needle Threading: Can Llms Follow Threads Through Near-million-scale Haystacks? (2024) • No Venue
Jonathan Roberts, Kai Han, Samuel Albanie
In-context Learning Enables Multimodal Large Language Models To Classify Cancer Pathology Images (2024) • Nature Communications • 68 citations
Ferber et al.
Stream Of Search (sos): Learning To Search In Language (2024) • No Venue
Gandhi et al.
Differential Transformer (2024) • No Venue
Ye et al.
Video As The New Language For Real-world Decision Making (2024) • No Venue
Yang et al.
Do Large Language Models Latently Perform Multi-hop Reasoning? (2024) • No Venue
Yang et al.
Symdpo: Boosting In-context Learning Of Large Multimodal Models With Symbol Demonstration Direct Preference Optimization (2024) • No Venue
Jia et al.
Many-shot In-context Learning In Multimodal Foundation Models (2024) • No Venue
Jiang et al.
Videoicl: Confidence-based Iterative In-context Learning For Out-of-distribution Video Understanding (2024) • No Venue
Kim et al.
Xgen-mm (BLIP-3): A Family Of Open Large Multimodal Models (2024) • No Venue
Xue et al.
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) • No Venue
Lai et al.
Long-context Llms Struggle With Long In-context Learning (2024) • No Venue
Li et al.
Omnicorpus: A Unified Multimodal Corpus Of 10 Billion-level Images Interleaved With Text (2024) • No Venue
Li et al.
Making Text Embedders Few-shot Learners (2024) • No Venue
Li et al.
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window? (2024) • No Venue
Li et al.
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel In Long-horizon Tasks (2024) • No Venue
Li et al.
Learning To Learn Faster From Human Feedback With Language Model Predictive Control (2024) • No Venue
Liang et al.
How Far Are We From Intelligent Visual Deductive Reasoning? (2024) • No Venue
Zhang et al.
Critical Tokens Matter: Token-level Contrastive Estimation Enhances Llm's Reasoning Capability (2024) • No Venue
Lin et al.
System 2 Attention (is Something You Might Need Too) (2023) • No Venue
Jason Weston, Sainbayar Sukhbaatar
Large Language Models Can Infer Psychological Dispositions Of Social Media Users (2023) • PNAS Nexus • 45 citations
Heinrich Peters, Sandra Matz
Kosmos-2: Grounding Multimodal Large Language Models To The World (2023) • No Venue
Peng et al.
Towards Making The Most Of Chatgpt For Machine Translation (2023) • SSRN Electronic Journal • 94 citations
Peng et al.
Audioldm 2: Learning Holistic Audio Generation With Self-supervised Pretraining (2023) • No Venue
Liu et al.
Deid-gpt: Zero-shot Medical Text De-identification By GPT-4 (2023) • Arxiv • 89 citations
Liu et al.
Sql-palm: Improved Large Language Model Adaptation For Text-to-sql (2023) • No Venue
Sun et al.
Text Classification Via Large Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Sun et al.
Multilingual Machine Translation With Large Language Models: Empirical Results And Analysis (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 56 citations
Zhu et al.
Chatanything: Facetime Chat With Llm-enhanced Personas (2023) • No Venue
Zhao et al.
Chatcad+: Towards A Universal And Reliable Interactive CAD Using Llms (2023) • IEEE Transactions on Medical Imaging • 41 citations
Zhao et al.
Eureka: Human-level Reward Design Via Coding Large Language Models (2023) • No Venue
Ma et al.
Hyenadna: Long-range Genomic Sequence Modeling At Single Nucleotide Resolution (2023) • Arxiv • 140 citations
Nguyen et al.
Few-shot Fine-tuning Vs. In-context Learning: A Fair Comparison And Evaluation (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 54 citations
Mosbach et al.
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4 (2023) • No Venue
Mukherjee et al.
Roco: Dialectic Multi-robot Collaboration With Large Language Models (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 78 citations
Zhao Mandi, Shreeya Jain, Shuran Song
A Survey On Multimodal Large Language Models (2023) • National Science Review • 271 citations
Yin et al.
Learning To Retrieve In-context Examples For Large Language Models (2023) • No Venue
Liang Wang, Nan Yang, Furu Wei
Tallrec: An Effective And Efficient Tuning Framework To Align Large Language Model With Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 242 citations
Bao et al.
Neural Codec Language Models Are Zero-shot Text To Speech Synthesizers (2023) • IEEE Transactions on Audio, Speech and Language Processing • 47 citations
Wang et al.
Skills-in-context Prompting: Unlocking Compositionality In Large Language Models (2023) • No Venue
Chen et al.
Scalable Multi-robot Collaboration With Large Language Models: Centralized Or Decentralized Systems? (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 46 citations
Chen et al.
Subject-driven Text-to-image Generation Via Apprenticeship Learning (2023) • Arxiv • 46 citations
Chen et al.
GPT-RE: In-context Learning For Relation Extraction Using Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 110 citations
Wan et al.
A GPT-4 Reticular Chemist For Guiding MOF Discovery (2023) • Angewandte Chemie International Edition • 117 citations
Zheng et al.
Contrastive Chain-of-thought Prompting (2023) • No Venue
Chia et al.
Language Modeling Is Compression (2023) • No Venue
Delétang et al.
In-context Pretraining: Language Modeling Beyond Document Boundaries (2023) • No Venue
Shi et al.
Is Chatgpt A Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation (2023) • Arxiv • 54 citations
Fang et al.
Gpt4aigchip: Towards Next-generation AI Accelerator Design Automation Via Large Language Models (2023) • 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) • 79 citations
Fu et al.
Is Chatgpt A Good Causal Reasoner? A Comprehensive Evaluation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Gao et al.
What Makes Good In-context Demonstrations For Code Intelligence Tasks With Llms? (2023) • 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 65 citations
Gao et al.
In-context Autoencoder For Context Compression In A Large Language Model (2023) • No Venue
Ge et al.
Large Language Models Are Few-shot Summarizers: Multi-intent Comment Generation Via In-context Learning (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 89 citations
Geng et al.
In-context Learning Creates Task Vectors (2023) • No Venue
Roee Hendel, Mor Geva, Amir Globerson
Graspgpt: Leveraging Semantic Knowledge From A Large Language Model For Task-oriented Grasping (2023) • IEEE Robotics and Automation Letters • 67 citations
Tang et al.
Towards Interpretable Mental Health Analysis With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 84 citations
Yang et al.
Lorahub: Efficient Cross-task Generalization Via Dynamic Lora Composition (2023) • No Venue
Huang et al.
Mega-tts 2: Zero-shot Text-to-speech With Arbitrary Length Speech Prompts (2023) • No Venue
Jiang et al.
Llmlingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jiang et al.
LILAC: Log Parsing Using Llms With Adaptive Parsing Cache (2023) • Proceedings of the ACM on Software Engineering • 51 citations
Jiang et al.
Genegpt: Augmenting Large Language Models With Domain Tools For Improved Access To Biomedical Information (2023) • Bioinformatics • 99 citations
Jin et al.
Videodirectorgpt: Consistent Multi-scene Video Generation Via Llm-guided Planning (2023) • No Venue
Lin et al.
The Unlocking Spell On Base Llms: Rethinking Alignment Via In-context Learning (2023) • No Venue
Lin et al.
VILA: On Pre-training For Visual Language Models (2023) • No Venue
Lin et al.
DIN-SQL: Decomposed In-context Learning Of Text-to-sql With Self-correction (2023) • Arxiv • 53 citations
Mohammadreza Pourreza, Davood Rafiei
Taskmatrix.ai: Completing Tasks By Connecting Foundation Models With Millions Of Apis (2023) • Intelligent Computing • 76 citations
Liang et al.
JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models (2023) • No Venue
Li et al.
Otter: A Multi-modal Model With In-context Instruction Tuning (2023) • Arxiv • 87 citations
Li et al.
Textbooks Are All You Need II: Phi-1.5 Technical Report (2023) • No Venue
Li et al.
Dspy: Compiling Declarative Language Model Calls Into Self-improving Pipelines (2023) • No Venue
Khattab et al.
In-context Retrieval-augmented Language Models (2023) • Transactions of the Association for Computational Linguistics • 201 citations
Ram et al.
Extractive Summarization Via Chatgpt For Faithful Summary Generation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 53 citations
Haopeng Zhang, Xiao Liu, Jiawei Zhang
Voicebox: Text-guided Multilingual Universal Speech Generation At Scale (2023) • Arxiv • 44 citations
Le et al.
Layoutllm-t2i: Eliciting Layout Guidance From LLM For Text-to-image Generation (2023) • MM '23: The 31st ACM International Conference on Multimedia • 65 citations
Qu et al.
Teaching Models To Express Their Uncertainty In Words (2022) • Arxiv • 53 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Black-box Tuning For Language-model-as-a-service (2022) • Arxiv • 56 citations
Sun et al.
Few-shot Parameter-efficient Fine-tuning Is Better And Cheaper Than In-context Learning (2022) • Arxiv • 292 citations
Liu et al.
Impact Of Pretraining Term Frequencies On Few-shot Reasoning (2022) • Arxiv • 51 citations
Razeghi et al.
Can Language Models Learn From Explanations In Context? (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 59 citations
Lampinen et al.
CM3: A Causal Masked Multimodal Model Of The Internet (2022) • Arxiv • 40 citations
Aghajanyan et al.
Gpt-neox-20b: An Open-source Autoregressive Language Model (2022) • Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models • 241 citations
Black et al.
Audiolm: A Language Modeling Approach To Audio Generation (2022) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 252 citations
Borsos et al.
Large Language Models Are Few(1)-shot Table Reasoners (2022) • Findings of the Association for Computational Linguistics: EACL 2023 • 41 citations
Wenhu Chen
Rethinking The Role Of Demonstrations: What Makes In-context Learning Work? (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 542 citations
Min et al.
Why Can GPT Learn In-context? Language Models Implicitly Perform Gradient Descent As Meta-optimizers (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 73 citations
Dai et al.
A Survey On In-context Learning (2022) • Arxiv • 240 citations
Dong et al.
The Unreliability Of Explanations In Few-shot Prompting For Textual Reasoning (2022) • Arxiv • 52 citations
Xi Ye, Greg Durrett
Dynamic Prompt Learning Via Policy Gradient For Semi-structured Mathematical Reasoning (2022) • Arxiv • 41 citations
Lu et al.
Active Example Selection For In-context Learning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 66 citations
Yiming Zhang, Shi Feng, Chenhao Tan
UL2: Unifying Language Learning Paradigms (2022) • Arxiv • 97 citations
Tay et al.
Language Models Are Few-shot Multilingual Learners (2021) • Proceedings of the 1st Workshop on Multilingual Representation Learning • 46 citations
Winata et al.
What Makes Good In-context Examples For GPT-3? (2021) • Arxiv • 154 citations
Liu et al.
Prompt Programming For Large Language Models: Beyond The Few-shot Paradigm (2021) • CHI '21: CHI Conference on Human Factors in Computing Systems • 517 citations
Laria Reynolds, Kyle McDonell
Pangu-α: Large-scale Autoregressive Pretrained Chinese Language Models With Auto-parallel Computation (2021) • Arxiv • 94 citations
Zeng et al.
Metaicl: Learning To Learn In Context (2021) • Arxiv • 61 citations
Min et al.
Noisy Channel Language Model Prompting For Few-shot Text Classification (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Min et al.
What Changes Can Large-scale Language Models Bring? Intensive Study On Hyperclova: Billions-scale Korean Generative Pretrained Transformers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Kim et al.
Multitask Prompted Training Enables Zero-shot Task Generalization (2021) • Arxiv • 558 citations
Sanh et al.
Fantastically Ordered Prompts And Where To Find Them: Overcoming Few-shot Prompt Order Sensitivity (2021) • Arxiv • 118 citations
Lu et al.
Cutting Down On Prompts And Parameters: Simple Few-shot Learning With Language Models (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 42 citations
Logan et al.
Autoprompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 611 citations
Shin et al.
Negated And Misprimed Probes For Pretrained Language Models: Birds Can Talk, But Cannot Fly (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 69 citations
Nora Kassner, Hinrich Schütze
One-shot Generalization In Deep Generative Models (2016) • Arxiv • 75 citations
Rezende et al.

Instruction Following 334 papers #

Falcon-h1: A Family Of Hybrid-head Language Models Redefining Efficiency And Performance (2025) • No Venue
Zuo et al.
Vargpt-v1.1: Improve Visual Autoregressive Large Unified Model Via Iterative Instruction Tuning And Reinforcement Learning (2025) • No Venue
Zhuang et al.
Longwriter-v: Enabling Ultra-long And High-fidelity Generation In Vision-language Models (2025) • No Venue
Tu et al.
The End Of Manual Decoding: Towards Truly End-to-end Language Models (2025) • No Venue
Wang et al.
GPT-IMAGE-EDIT-1.5M: A Million-scale, Gpt-generated Image Dataset (2025) • No Venue
Wang et al.
Breaking The Exploration Bottleneck: Rubric-scaffolded Reinforcement Learning For General LLM Reasoning (2025) • No Venue
Zhou et al.
Vibe Checker: Aligning Code Evaluation With Human Preference (2025) • No Venue
Zhong et al.
Iheval: Evaluating Language Models On Following The Instruction Hierarchy (2025) • No Venue
Zhang et al.
Inverse Ifeval: Can Llms Unlearn Stubborn Training Conventions To Follow Real Instructions? (2025) • No Venue
Zhang et al.
Omnialign-v: Towards Enhanced Alignment Of Mllms With Human Preference (2025) • No Venue
Zhao et al.
MM-HELIX: Boosting Multimodal Long-chain Reflective Reasoning With Holistic Platform And Adaptive Hybrid Policy Optimization (2025) • No Venue
Zhao et al.
Unicorn: Text-only Data Synthesis For Vision Language Model Training (2025) • No Venue
Yu et al.
Sa2va: Marrying SAM2 With Llava For Dense Grounded Understanding Of Images And Videos (2025) • No Venue
Yuan et al.
Internlm-xcomposer2.5-reward: A Simple Yet Effective Multi-modal Reward Model (2025) • No Venue
Zang et al.
Simplerl-zoo: Investigating And Taming Zero Reinforcement Learning For Open Base Models In The Wild (2025) • No Venue
Zeng et al.
SIFT: Grounding LLM Reasoning In Contexts Via Stickers (2025) • No Venue
Zeng et al.
Complex Logical Instruction Generation (2025) • No Venue
Zhang et al.
Diffusion Vs. Autoregressive Language Models: A Text Embedding Perspective (2025) • No Venue
Zhang et al.
Llama-3.1-foundationai-securityllm-8b-instruct Technical Report (2025) • No Venue
Weerawardhena et al.
Univideo: Unified Understanding, Generation, And Editing For Videos (2025) • No Venue
Wei et al.
On The Theoretical Limitations Of Embedding-based Retrieval (2025) • No Venue
Weller et al.
Rank1: Test-time Compute For Reranking In Information Retrieval (2025) • No Venue
Weller et al.
Any2caption: Interpreting Any Condition To Caption For Controllable Video Generation (2025) • No Venue
Wu et al.
Multiplayer Nash Preference Optimization (2025) • No Venue
Wu et al.
Dreamomni2: Multimodal Instruction-based Editing And Generation (2025) • No Venue
Xia et al.
Towards System 2 Reasoning In Llms: Learning How To Think With Meta Chain-of-thought (2025) • No Venue
Xiang et al.
Qwen2.5-omni Technical Report (2025) • No Venue
Xu et al.
Audio-flan: A Preliminary Release (2025) • No Venue
Xue et al.
Can Understanding And Generation Truly Benefit Together -- Or Just Coexist? (2025) • No Venue
Yan et al.
Table-r1: Inference-time Scaling For Table Reasoning (2025) • No Venue
Yang et al.
Shapellm-omni: A Native Multimodal LLM For 3D Generation And Understanding (2025) • No Venue
Ye et al.
Magistral (2025) • No Venue
Mistral-Ai et al.
Smollm2: When Smol Goes Big -- Data-centric Training Of A Small Language Model (2025) • No Venue
Allal et al.
Ultraif: Advancing Instruction Following From The Wild (2025) • No Venue
An et al.
Game-time: Evaluating Temporal Dynamics In Spoken Language Models (2025) • No Venue
Chang et al.
Blip3o-next: Next Frontier Of Native Image Generation (2025) • No Venue
Chen et al.
Coda: Coding LM Via Diffusion Adaptation (2025) • No Venue
Chen et al.
Fusionaudio-1.2m: Towards Fine-grained Audio Captioning With Multimodal Contextual Fusion (2025) • No Venue
Chen et al.
MIG: Automatic Data Selection For Instruction Tuning By Maximizing Information Gain In Semantic Space (2025) • No Venue
Chen et al.
Minmo: A Multimodal Large Language Model For Seamless Voice Interaction (2025) • No Venue
Chen et al.
Sharegpt-4o-image: Aligning Multimodal Models With Gpt-4o-level Image Generation (2025) • No Venue
Chen et al.
Instruction-guided Lesion Segmentation For Chest X-rays With Automatically Generated Large-scale Dataset (2025) • No Venue
Choi et al.
Mm-ifengine: Towards Multimodal Instruction Following (2025) • No Venue
Ding et al.
Arm-thinker: Reinforcing Multimodal Generative Reward Models With Agentic Tool Use And Visual Reasoning (2025) • No Venue
Ding et al.
Kling-avatar: Grounding Multimodal Instructions For Cascaded Long-duration Avatar Animation Synthesis (2025) • No Venue
Ding et al.
MMTEB: Massive Multilingual Text Embedding Benchmark (2025) • No Venue
Enevoldsen et al.
Llama-omni2: Llm-based Real-time Spoken Chatbot With Autoregressive Streaming Speech Synthesis (2025) • No Venue
Fang et al.
A Multi-modal AI Copilot For Single-cell Analysis With Instruction Following (2025) • No Venue
Fang et al.
Robix: A Unified Model For Robot Interaction, Reasoning And Planning (2025) • No Venue
Fang et al.
X-omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (2025) • No Venue
Geng et al.
Seedream 2.0: A Native Chinese-english Bilingual Image Generation Foundation Model (2025) • No Venue
Gong et al.
Breaking The Modality Barrier: Universal Embedding Learning With Multimodal Llms (2025) • No Venue
Gu et al.
Audiostory: Generating Long-form Narrative Audio With Large Language Models (2025) • No Venue
Guo et al.
Hala Technical Report: Building Arabic-centric Instruction & Translation Models At Scale (2025) • No Venue
Hasan Abed Al Kader Hammoud, Mohammad Zbeeb, Bernard Ghanem
Mesatask: Towards Task-driven Tabletop Scene Generation Via 3D Spatial Reasoning (2025) • No Venue
Hao et al.
Adaspec: Selective Knowledge Distillation For Efficient Speculative Decoders (2025) • No Venue
Hu et al.
Image Editing As Programs With Diffusion Models (2025) • No Venue
Hu et al.
See, Point, Fly: A Learning-free VLM Framework For Universal Unmanned Aerial Navigation (2025) • No Venue
Hu et al.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability Of LLM Reasoning (2025) • No Venue
Huan et al.
Benchmax: A Comprehensive Multilingual Evaluation Suite For Large Language Models (2025) • No Venue
Huang et al.
Reasoning Model Is Stubborn: Diagnosing Instruction Overriding In Reasoning Models (2025) • No Venue
Jang et al.
S2s-arena, Evaluating Speech2speech Protocols On Instruction Following With Paralinguistic Information (2025) • No Venue
Jiang et al.
Mol-llama: Towards General Understanding Of Molecules In Large Molecular Language Model (2025) • No Venue
Dongki Kim, Wonbin Lee, Sung Ju Hwang
Kormo: Korean Open Reasoning Model For Everyone (2025) • No Venue
Kim et al.
Distillm-2: A Contrastive Approach Boosts The Distillation Of Llms (2025) • No Venue
Ko et al.
Language Self-play For Data-free Training (2025) • No Venue
Kuba et al.
Nohumansrequired: Autonomous High-quality Image Editing Triplet Mining (2025) • No Venue
Kuprashevich et al.
Analysing Chain Of Thought Dynamics: Active Guidance Or Unfaithful Post-hoc Rationalisation? (2025) • No Venue
Lewis-Lim et al.
Drafterbench: Benchmarking Large Language Models For Tasks Automation In Civil Engineering (2025) • No Venue
Yinsheng Li, Zhen Dong, Yi Shao
If-vidcap: Can Video Caption Models Follow Instructions? (2025) • No Venue
Li et al.
Have We Unified Image Generation And Understanding Yet? An Empirical Study Of Gpt-4o's Image Generation Ability (2025) • No Venue
Ning Li, Jingran Zhang, Justin Cui
How Instruction And Reasoning Data Shape Post-training: Data Quality Through The Lens Of Layer-wise Gradients (2025) • No Venue
Li et al.
Migician: Revealing The Magic Of Free-form Multi-image Grounding In Multimodal Large Language Models (2025) • No Venue
Li et al.
Jointly Reinforcing Diversity And Quality In Language Model Generations (2025) • No Venue
Li et al.
JARVIS-VLA: Post-training Large-scale Vision Language Models To Play Visual Games With Keyboards And Mouse (2025) • No Venue
Li et al.
Sos1: O1 And R1-like Reasoning Llms Are Sum-of-square Solvers (2025) • No Venue
Li et al.
Test-time Preference Optimization: On-the-fly Alignment Via Iterative Textual Feedback (2025) • No Venue
Li et al.
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies (2025) • No Venue
Liang et al.
Motif 2 12.7B Technical Report (2025) • No Venue
Lim et al.
Jarvisart: Liberating Human Artistic Creativity Via An Intelligent Photo Retouching Agent (2025) • No Venue
Lin et al.
Self-supervised Quantized Representation For Seamlessly Integrating Knowledge Graphs With Large Language Models (2025) • No Venue
Lin et al.
Pc-agent: A Hierarchical Multi-agent Collaboration Framework For Complex Task Automation On PC (2025) • No Venue
Liu et al.
Ovis2.5 Technical Report (2025) • No Venue
Lu et al.
Being-h0: Vision-language-action Pretraining From Large-scale Human Videos (2025) • No Venue
Luo et al.
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, And Control In Spaces (2025) • No Venue
Luo et al.
Cmi-bench: A Comprehensive Benchmark For Evaluating Music Instruction Following (2025) • No Venue
Ma et al.
TCIA: A Task-centric Instruction Augmentation Method For Instruction Finetuning (2025) • No Venue
Ma et al.
Large Language Diffusion Models (2025) • No Venue
Nie et al.
Viscoder: Fine-tuning Llms For Executable Python Visualization Code Generation (2025) • No Venue
Ni et al.
Agentic Reward Modeling: Integrating Human Preferences With Verifiable Correctness Signals For Reliable Reward Systems (2025) • No Venue
Peng et al.
Sofar: Language-grounded Orientation Bridges Spatial Reasoning And Object Manipulation (2025) • No Venue
Qi et al.
Beyond The Trade-off: Self-supervised Reinforcement Learning For Reasoning Models' Instruction Following (2025) • No Venue
Ren et al.
Anycap Project: A Unified Framework, Dataset, And Benchmark For Controllable Omni-modal Captioning (2025) • No Venue
Ren et al.
Aligning Text, Images, And 3D Structure Token-by-token (2025) • No Venue
Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari
ABC: Achieving Better Control Of Multimodal Embeddings Using Vlms (2025) • No Venue
Benjamin Schneider, Florian Kerschbaum, Wenhu Chen
The Illusion Of Diminishing Returns: Measuring Long Horizon Execution In Llms (2025) • No Venue
Sinha et al.
IFIR: A Comprehensive Benchmark For Evaluating Instruction-following In Expert-domain Information Retrieval (2025) • No Venue
Song et al.
Progco: Program Helps Self-correction Of Large Language Models (2025) • No Venue
Song et al.
Au-harness: An Open-source Toolkit For Holistic Evaluation Of Audio Llms (2025) • No Venue
Surapaneni et al.
Lumine: An Open Recipe For Building Generalist Agents In 3D Open Worlds (2025) • No Venue
Tan et al.
Exploring The Potential Of Encoder-free Architectures In 3D Lmms (2025) • No Venue
Tang et al.
Kwai Keye-vl Technical Report (2025) • No Venue
Team et al.
Gemma 3 Technical Report (2025) • No Venue
Team et al.
Hermes 4 Technical Report (2025) • No Venue
Teknium et al.
Openmathinstruct-1: A 1.8 Million Math Instruction Tuning Dataset (2024) • No Venue
Toshniwal et al.
CS1-LLM: Integrating Llms Into CS1 Instruction (2024) • ITiCSE 2024: Innovation and Technology in Computer Science Education • 45 citations
Vadaparty et al.
Meltemi: The First Open Large Language Model For Greek (2024) • No Venue
Voukoutis et al.
Bigcodebench: Benchmarking Code Generation With Diverse Function Calls And Complex Instructions (2024) • No Venue
Zhuo et al.
The Instruction Hierarchy: Training Llms To Prioritize Privileged Instructions (2024) • No Venue
Wallace et al.
Qwen2.5 Technical Report (2024) • No Venue
Qwen et al.
Maya: An Instruction Finetuned Multilingual Multimodal Model (2024) • No Venue
Alam et al.
Skyeyegpt: Unifying Remote Sensing Vision-language Tasks Via Instruction Tuning With Large Language Model (2024) • ISPRS Journal of Photogrammetry and Remote Sensing • 51 citations
Yang Zhan, Zhitong Xiong, Yuan Yuan
Anygpt: Unified Multimodal LLM With Discrete Sequence Modeling (2024) • No Venue
Zhan et al.
Longalign: A Recipe For Long Context Alignment Of Large Language Models (2024) • No Venue
Bai et al.
$\pi_0$: A Vision-language-action Flow Model For General Robot Control (2024) • Robotics: Science and Systems 2025 • 48 citations
Black et al.
Chexagent: Towards A Foundation Model For Chest X-ray Interpretation (2024) • No Venue
Chen et al.
Self-rewarding Language Models (2024) • No Venue
Yuan et al.
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded (2024) • No Venue
Zheng et al.
Instruction Pre-training: Language Models Are Supervised Multitask Learners (2024) • No Venue
Cheng et al.
Qwen2-audio Technical Report (2024) • No Venue
Chu et al.
LLM-AD: Large Language Model Based Audio Description System (2024) • No Venue
Peng Chu, Jiang Wang, Andre Abrantes
RACER: Rich Language-guided Failure Recovery Policies For Imitation Learning (2024) • No Venue
Dai et al.
Symbolicai: A Framework For Logic-based Approaches Combining Generative Models And Solvers (2024) • No Venue
Dinu et al.
Ferret-ui: Grounded Mobile UI Understanding With Multimodal Llms (2024) • No Venue
You et al.
Jamba-1.5: Hybrid Transformer-mamba Models At Scale (2024) • No Venue
Team et al.
Mtu-bench: A Multi-granularity Tool-use Benchmark For Large Language Models (2024) • No Venue
Wang et al.
Scaling Instructable Agents Across Many Simulated Worlds (2024) • No Venue
Team et al.
Learnlm: Improving Gemini For Learning (2024) • No Venue
Team et al.
Octo: An Open-source Generalist Robot Policy (2024) • No Venue
Team et al.
Lami: Large Language Models For Multi-modal Human-robot Interaction (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 42 citations
Wang et al.
How Do Your Code Llms Perform? Empowering Code Instruction Tuning With High-quality Data (2024) • No Venue
Wang et al.
Hermes 3 Technical Report (2024) • No Venue
Ryan Teknium, Jeffrey Quesnelle, Chen Guang
Structlm: Towards Building Generalist Models For Structured Knowledge Grounding (2024) • No Venue
Zhuang et al.
Easyref: Omni-generalized Group Image Reference For Diffusion Models Via Multimodal LLM (2024) • No Venue
Zong et al.
Video-star: Self-training Enables Video Instruction Tuning With Any Supervision (2024) • No Venue
Zohar et al.
Llava-3d: A Simple Yet Effective Pathway To Empowering Lmms With 3d-awareness (2024) • No Venue
Zhu et al.
LOGO -- Long Context Alignment Via Efficient Preference Optimization (2024) • No Venue
Tang et al.
Atlas-chat: Adapting Large Language Models For Low-resource Moroccan Arabic Dialect (2024) • No Venue
Shang et al.
Learning To Decode Collaboratively With Multiple Language Models (2024) • No Venue
Shen et al.
Explanatory Instructions: Towards Unified Vision Tasks Understanding And Zero-shot Generalization (2024) • No Venue
Shen et al.
Aya Model: An Instruction Finetuned Open-access Multilingual Language Model (2024) • No Venue
Üstün et al.
Aya Dataset: An Open-access Collection For Multilingual Instruction Tuning (2024) • No Venue
Singh et al.
Funaudiollm: Voice Understanding And Generation Foundation Models For Natural Interaction Between Humans And Llms (2024) • No Venue
Tongyi Speechteam
Canttalkaboutthis: Aligning Language Models To Stay On Topic In Dialogues (2024) • No Venue
Sreedhar et al.
Aligning Teacher With Student Preferences For Tailored Training Data Generation (2024) • No Venue
Liu et al.
Chatqa: Building GPT-4 Level Conversational QA Models (2024) • No Venue
Liu et al.
Harnessing Webpage Uis For Text-rich Visual Understanding (2024) • No Venue
Liu et al.
MMDU: A Multi-turn Multi-image Dialog Understanding Benchmark And Instruction-tuning Dataset For Lvlms (2024) • No Venue
Liu et al.
Teach Multimodal Llms To Comprehend Electrocardiographic Images (2024) • No Venue
Liu et al.
Video Instruction Tuning With Synthetic Data (2024) • No Venue
Zhang et al.
MAVIS: Mathematical Visual Instruction Tuning (2024) • No Venue
Zhang et al.
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) • No Venue
Xi et al.
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model (2024) • No Venue
Zhang et al.
Seallms 3: Open Foundation And Chat Multilingual Large Language Models For Southeast Asian Languages (2024) • No Venue
Zhang et al.
Self-exploring Language Models: Active Preference Elicitation For Online Alignment (2024) • No Venue
Zhang et al.
Large Language Models Are Superpositions Of All Characters: Attaining Arbitrary Role-play Via Self-alignment (2024) • No Venue
Lu et al.
Llama Beyond English: An Empirical Study On Language Capability Transfer (2024) • No Venue
Zhao et al.
Mmevol: Empowering Multimodal Large Language Models With Evol-instruct (2024) • No Venue
Luo et al.
Thinking Llms: General Instruction Following With Thought Generation (2024) • No Venue
Wu et al.
Meta-rewarding Language Models: Self-improving Alignment With Llm-as-a-meta-judge (2024) • No Venue
Wu et al.
Llama Pro: Progressive Llama With Block Expansion (2024) • No Venue
Wu et al.
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling (2024) • No Venue
Maini et al.
Wildchat: 1M Chatgpt Interaction Logs In The Wild (2024) • No Venue
Zhao et al.
Chartgemma: Visual Instruction-tuning For Chart Reasoning In The Wild (2024) • No Venue
Masry et al.
Agentinstruct: Toward Generative Teaching With Agentic Flows (2024) • No Venue
Mitra et al.
Generative Representational Instruction Tuning (2024) • No Venue
Muennighoff et al.
Olmoe: Open Mixture-of-experts Language Models (2024) • No Venue
Muennighoff et al.
Llamo: Large Language Model-based Molecular Graph Assistant (2024) • No Venue
Park et al.
Iterative Reasoning Preference Optimization (2024) • No Venue
Pang et al.
IOPO: Empowering Llms With Complex Instruction Following Via Input-output Preference Optimization (2024) • No Venue
Zhang et al.
Livebench: A Challenging, Contamination-free LLM Benchmark (2024) • No Venue
White et al.
Promptriever: Instruction-trained Retrievers Can Be Prompted Like Language Models (2024) • No Venue
Weller et al.
Movie Gen: A Cast Of Media Foundation Models (2024) • No Venue
Polyak et al.
SNIFFER: Multimodal Large Language Model For Explainable Out-of-context Misinformation Detection (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Qi et al.
VISTA: Enhancing Long-duration And High-resolution Video Understanding By Video Spatiotemporal Augmentation (2024) • No Venue
Ren et al.
EXAONE 3.5: Series Of Large Language Models For Real-world Use Cases (2024) • No Venue
LG AI Research et al.
Selfcodealign: Self-alignment For Code Generation (2024) • No Venue
Wei et al.
Toward General Instruction-following Alignment For Retrieval-augmented Generation (2024) • No Venue
Dong et al.
Hyperclova X Technical Report (2024) • No Venue
Yoo et al.
Llama-omni: Seamless Speech Interaction With Large Language Models (2024) • No Venue
Fang et al.
Fuzzcoder: Byte-level Fuzzing Test Via Large Language Model (2024) • No Venue
Yang et al.
Openfedllm: Training Large Language Models On Decentralized Private Data Via Federated Learning (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 47 citations
Ye et al.
VITA: Towards Open-source Interactive Omni Multimodal LLM (2024) • No Venue
Fu et al.
Towards A Unified View Of Preference Learning For Large Language Models: A Survey (2024) • No Venue
Gao et al.
Longins: A Challenging Long-context Instruction-based Exam For Llms (2024) • No Venue
Gavin et al.
Visual Fact Checker: Enabling High-fidelity Detailed Caption Generation (2024) • No Venue
Ge et al.
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities (2024) • No Venue
Ghosh et al.
Chatglm: A Family Of Large Language Models From GLM-130B To GLM-4 All Tools (2024) • No Venue
Team GLM et al.
Mammoth-vl: Eliciting Multimodal Reasoning With Instruction Tuning At Scale (2024) • No Venue
Guo et al.
Webvoyager: Building An End-to-end Web Agent With Large Multimodal Models (2024) • No Venue
He et al.
Instruction Following Without Instruction Tuning (2024) • No Venue
Hewitt et al.
ORPO: Monolithic Preference Optimization Without Reference Model (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 43 citations
Jiwoo Hong, Noah Lee, James Thorne
3D-GRAND: A Million-scale Dataset For 3d-llms With Better Grounding And Less Hallucination (2024) • No Venue
Yang et al.
Not All LLM Reasoners Are Created Equal (2024) • No Venue
Hosseini et al.
Evaluating And Aligning Codellms On Human Preference (2024) • No Venue
Yang et al.
Smaller Language Models Are Better Instruction Evolvers (2024) • No Venue
Hui et al.
Mixtral Of Experts (2024) • No Venue
Jiang et al.
Chatqa 2: Bridging The Gap To Proprietary Llms In Long Context And RAG Capabilities (2024) • No Venue
Xu et al.
Longvila: Scaling Long-context Visual Language Models For Long Videos (2024) • No Venue
Xue et al.
Meteor: Mamba-based Traversal Of Rationale For Large Language And Vision Models (2024) • No Venue
Lee et al.
Collavo: Crayon Large Language And Vision Model (2024) • No Venue
Lee et al.
Moai: Mixture Of All Intelligence For Large Language And Vision Models (2024) • No Venue
Lee et al.
Stronger Models Are NOT Stronger Teachers For Instruction Tuning (2024) • No Venue
Xu et al.
Autocoder: Enhancing Code Large Language Model With Aiev-instruct (2024) • No Venue
Bin Lei, Yuchen Li, Qiuwu Chen
Aria: An Open Multimodal Native Mixture-of-experts Model (2024) • No Venue
Li et al.
Brushedit: All-in-one Image Inpainting And Editing (2024) • No Venue
Li et al.
Androidlab: Training And Systematic Benchmarking Of Android Autonomous Agents (2024) • No Venue
Xu et al.
Scilitllm: How To Adapt Llms For Scientific Literature Understanding (2024) • No Venue
Li et al.
Omnibench: Towards The Future Of Universal Omni-language Models (2024) • No Venue
Li et al.
Ruler: A Model-agnostic Method To Control Generated Length For Large Language Models (2024) • No Venue
Li et al.
Synthetic Data (almost) From Scratch: Generalized Instruction Tuning For Language Models (2024) • No Venue
Li et al.
Urbangpt: Spatio-temporal Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 69 citations
Li et al.
Unbounded: A Generative Infinite Game Of Character Life Simulation (2024) • No Venue
Li et al.
Agenttrek: Agent Trajectory Synthesis Via Guiding Replay With Web Tutorials (2024) • No Venue
Xu et al.
Earthgpt: A Universal Multi-modal Large Language Model For Multi-sensor Image Comprehension In Remote Sensing Domain (2024) • IEEE Transactions on Geoscience and Remote Sensing • 78 citations
Zhang et al.
Llava-critic: Learning To Evaluate Multimodal Models (2024) • No Venue
Xiong et al.
Instruct-musicgen: Unlocking Text-to-music Editing For Music Language Models Via Instruction Tuning (2024) • No Venue
Zhang et al.
FLAME: Factuality-aware Alignment For Large Language Models (2024) • No Venue
Lin et al.
Showui: One Vision-language-action Model For GUI Visual Agent (2024) • No Venue
Lin et al.
Pixwizard: Versatile Image-to-image Visual Assistant With Open-language Instructions (2024) • No Venue
Lin et al.
System 2 Attention (is Something You Might Need Too) (2023) • No Venue
Jason Weston, Sainbayar Sukhbaatar
Instruction Tuning With GPT-4 (2023) • Arxiv • 184 citations
Peng et al.
Visual Instruction Tuning (2023) • Arxiv • 659 citations
Liu et al.
Improved Baselines With Visual Instruction Tuning (2023) • No Venue
Liu et al.
Llava-plus: Learning To Use Tools For Creating Multimodal Agents (2023) • No Venue
Liu et al.
3D-GPT: Procedural 3D Modeling With Large Language Models (2023) • No Venue
Sun et al.
Towards Autonomous System: Flexible Modular Production System Enhanced With Large Language Model Agents (2023) • 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA) • 57 citations
Xia et al.
The Flan Collection: Designing Data And Methods For Effective Instruction Tuning (2023) • Arxiv • 109 citations
Longpre et al.
Seallms -- Large Language Models For Southeast Asia (2023) • No Venue
Nguyen et al.
Llasm: Large Language And Speech Model (2023) • No Venue
Shu et al.
Orca 2: Teaching Small Language Models How To Reason (2023) • No Venue
Mitra et al.
Anymal: An Efficient And Scalable Any-modality Augmented Language Model (2023) • No Venue
Moon et al.
Octopack: Instruction Tuning Code Large Language Models (2023) • No Venue
Muennighoff et al.
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4 (2023) • No Venue
Mukherjee et al.
Pmc-llama: Towards Building Open-source Language Models For Medicine (2023) • Journal of the American Medical Informatics Association • 179 citations
Wu et al.
Bubogpt: Enabling Visual Grounding In Multi-modal Llms (2023) • No Venue
Zhao et al.
Q-instruct: Improving Low-level Visual Abilities For Multi-modality Foundation Models (2023) • No Venue
Wu et al.
Next-gpt: Any-to-any Multimodal LLM (2023) • No Venue
Wu et al.
Can Chatgpt Write A Good Boolean Query For Systematic Review Literature Search? (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 181 citations
Wang et al.
Instructuie: Multi-task Instruction Tuning For Unified Information Extraction (2023) • Arxiv • 46 citations
Wang et al.
Becoming Self-instruct: Introducing Early Stopping Criteria For Minimal Instruct Tuning (2023) • No Venue
Alshikh et al.
Codet5+: Open Code Large Language Models For Code Understanding And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 215 citations
Wang et al.
Tablegpt: Towards Unifying Tables, Nature Language And Commands Into One GPT (2023) • No Venue
Zha et al.
Agenttuning: Enabling Generalized Agent Abilities For Llms (2023) • No Venue
Zeng et al.
Principled Instructions Are All You Need For Questioning Llama-1/2, GPT-3.5/4 (2023) • No Venue
Sondos Mahmoud Bsharat, Aidar Myrzakhan, Zhiqiang Shen
Alpagasus: Training A Better Alpaca With Fewer Data (2023) • No Venue
Chen et al.
LL3DA: Visual Interactive Instruction Tuning For Omni-3d Understanding, Reasoning, And Planning (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Chen et al.
Chatgpt Empowered Long-step Robot Control In Various Environments: A Case Application (2023) • IEEE Access • 75 citations
Wake et al.
Wavecoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation (2023) • No Venue
Yu et al.
Efficient And Effective Text Encoding For Chinese Llama And Alpaca (2023) • Arxiv • 71 citations
Yiming Cui, Ziqing Yang, Xin Yao
K2: A Foundation Language Model For Geoscience Knowledge Understanding And Utilization (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Deng et al.
Qlora: Efficient Finetuning Of Quantized Llms (2023) • No Venue
Dettmers et al.
Enhancing Chat Language Models By Scaling High-quality Instructional Conversations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 60 citations
Ding et al.
Ferret: Refer And Ground Anything Anywhere At Any Granularity (2023) • Arxiv • 43 citations
You et al.
Lmdrive: Closed-loop End-to-end Driving With Large Language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 76 citations
Shao et al.
Alpacafarm: A Simulation Framework For Methods That Learn From Human Feedback (2023) • Arxiv • 53 citations
Dubois et al.
Instruction-following Evaluation For Large Language Models (2023) • No Venue
Zhou et al.
LIMA: Less Is More For Alignment (2023) • No Venue
Zhou et al.
Navgpt: Explicit Reasoning In Vision-and-language Navigation With Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 104 citations
Gengze Zhou, Yicong Hong, Qi Wu
Language Models Can Be Logical Solvers (2023) • No Venue
Feng et al.
Medalign: A Clinician-generated Dataset For Instruction Following With Electronic Medical Records (2023) • No Venue
Fleming et al.
Jais And Jais-chat: Arabic-centric Foundation And Instruction-tuned Open Generative Large Language Models (2023) • No Venue
Sengupta et al.
Llama-adapter V2: Parameter-efficient Visual Instruction Model (2023) • Arxiv • 117 citations
Gao et al.
Ureader: Universal Ocr-free Visually-situated Language Understanding With Multimodal Large Language Model (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 46 citations
Ye et al.
Mplug-owl2: Revolutionizing Multi-modal Large Language Model With Modality Collaboration (2023) • No Venue
Ye et al.
Graphgpt: Graph Instruction Tuning For Large Language Models (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 96 citations
Tang et al.
Knowledge Distillation Of Large Language Models (2023) • No Venue
Gu et al.
PPTC Benchmark: Evaluating Large Language Models For Powerpoint Task Completion (2023) • No Venue
Guo et al.
A Real-world Webagent With Planning, Long Context Understanding, And Program Synthesis (2023) • No Venue
Gur et al.
Onellm: One Framework To Align All Modalities With Language (2023) • No Venue
Han et al.
Magicoder: Source Code Is All You Need (2023) • No Venue
Wei et al.
Benchmarking Large Language Models For News Summarization (2023) • Transactions of the Association for Computational Linguistics • 203 citations
Zhang et al.
From Task Structures To World Models: What Do Llms Know? (2023) • Trends in Cognitive Sciences • 43 citations
Ilker Yildirim, L. A. Paul
Pandagpt: One Model To Instruction-follow Them All (2023) • Arxiv • 46 citations
Su et al.
Aligning Instruction Tasks Unlocks Large Language Models As Zero-shot Relation Extractors (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 44 citations
Kai Zhang, Bernal Jiménez Gutiérrez, Yu Su
Mentallama: Interpretable Mental Health Analysis On Social Media With Large Language Models (2023) • WWW '24: The ACM Web Conference 2024 • 79 citations
Yang et al.
Camels In A Changing Climate: Enhancing LM Adaptation With Tulu 2 (2023) • No Venue
Ivison et al.
Timechat: A Time-sensitive Multimodal Large Language Model For Long Video Understanding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Ren et al.
Code Llama: Open Foundation Models For Code (2023) • Arxiv • 377 citations
Rozière et al.
Doctorglm: Fine-tuning Your Chinese Doctor Is Not A Herculean Task (2023) • Arxiv • 70 citations
Xiong et al.
Text2motion: From Natural Language Instructions To Feasible Plans (2023) • Autonomous Robots • 156 citations
Lin et al.
Opening Up Chatgpt: Tracking Openness, Transparency, And Accountability In Instruction-tuned Text Generators (2023) • CUI '23: ACM conference on Conversational User Interfaces • 64 citations
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
Learning To Model The World With Language (2023) • No Venue
Lin et al.
CAMEL: Communicative Agents For "mind" Exploration Of Large Language Model Society (2023) • Arxiv • 87 citations
Li et al.
Llava-med: Training A Large Language-and-vision Assistant For Biomedicine In One Day (2023) • Arxiv • 216 citations
Li et al.
Otter: A Multi-modal Model With In-context Instruction Tuning (2023) • Arxiv • 87 citations
Li et al.
Table-gpt: Table-tuned GPT For Diverse Table Tasks (2023) • No Venue
Li et al.
Videochat: Chat-centric Video Understanding (2023) • Arxiv • 90 citations
Li et al.
Pointllm: Empowering Large Language Models To Understand Point Clouds (2023) • Lecture Notes in Computer Science • 45 citations
Xu et al.
Lemur: Harmonizing Natural Language And Code For Language Agents (2023) • No Venue
Xu et al.
Magicbrush: A Manually Annotated Dataset For Instruction-guided Image Editing (2023) • No Venue
Zhang et al.
Exploiting Programmatic Behavior Of Llms: Dual-use Through Standard Security Attacks (2023) • 2024 IEEE Security and Privacy Workshops (SPW) • 47 citations
Kang et al.
PMC-VQA: Visual Instruction Tuning For Medical Visual Question Answering (2023) • Arxiv • 57 citations
Zhang et al.
SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023) • No Venue
Kim et al.
Instruct-fingpt: Financial Sentiment Analysis By Instruction Tuning Of General-purpose Large Language Models (2023) • SSRN Electronic Journal • 50 citations
Boyu Zhang, Hongyang Yang, Xiao-Yang Liu
Vera: Vector-based Random Matrix Adaptation (2023) • No Venue
Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
Geochat: Grounded Large Vision-language Model For Remote Sensing (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Kuckreja et al.
Speechgpt: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Zhang et al.
LISA: Reasoning Segmentation Via Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Lai et al.
A Systematic Study And Comprehensive Evaluation Of Chatgpt On Benchmark Datasets (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 69 citations
Laskar et al.
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis (2023) • No Venue
Qin et al.
Video-llama: An Instruction-tuned Audio-visual Language Model For Video Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 367 citations
Hang Zhang, Xin Li, Lidong Bing
Super-naturalinstructions: Generalization Via Declarative Instructions On 1600+ NLP Tasks (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 205 citations
Wang et al.
Self-consistency Improves Chain Of Thought Reasoning In Language Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 365 citations
Wang et al.
TEACH: Temporal Action Composition For 3D Humans (2022) • 2022 International Conference on 3D Vision (3DV) • 98 citations
Athanasiou et al.
Counterfactual Cycle-consistent Learning For Instruction Following And Generation In Vision-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Wang et al.
Instructpix2pix: Learning To Follow Image Editing Instructions (2022) • Arxiv • 40 citations
Tim Brooks, Aleksander Holynski, Alexei A. Efros
What Matters In Language Conditioned Robotic Imitation Learning Over Unstructured Data (2022) • IEEE Robotics and Automation Letters • 49 citations
Oier Mees, Lukas Hermann, Wolfram Burgard
Progprompt: Generating Situated Robot Task Plans Using Large Language Models (2022) • Arxiv • 42 citations
Singh et al.
ZSON: Zero-shot Object-goal Navigation Using Multimodal Goal Embeddings (2022) • Arxiv • 41 citations
Majumdar et al.
Llm-planner: Few-shot Grounded Planning For Embodied Agents With Large Language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 267 citations
Song et al.
Unnatural Instructions: Tuning Language Models With (almost) No Human Labor (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Honovich et al.
Language Models Are Few-shot Multilingual Learners (2021) • Proceedings of the 1st Workshop on Multilingual Representation Learning • 46 citations
Winata et al.
Episodic Transformer For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Alexander Pashevich, Cordelia Schmid, Chen Sun
Neighbor-view Enhanced Model For Vision And Language Navigation (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 61 citations
An et al.
Sub-instruction Aware Vision-and-language Navigation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Hong et al.
Allenact: A Framework For Embodied AI Research (2020) • Arxiv • 44 citations
Weihs et al.
Babywalk: Going Farther In Vision-and-language Navigation By Taking Baby Steps (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Zhu et al.
Object-and-action Aware Model For Visual Language Navigation (2020) • Lecture Notes in Computer Science • 95 citations
Qi et al.
Diagnosing The Environment Bias In Vision-and-language Navigation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 44 citations
Yubo Zhang, Hao Tan, Mohit Bansal
Topological Planning With Transformers For Vision-and-language Navigation (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Chen et al.
Mapping Natural Language Instructions To Mobile UI Action Sequences (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 71 citations
Li et al.
Self-monitoring Navigation Agent Via Auxiliary Progress Estimation (2019) • Arxiv • 134 citations
Ma et al.
Learning To Navigate Unseen Environments: Back Translation With Environmental Dropout (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 288 citations
Hao Tan, Licheng Yu, Mohit Bansal
Tactical Rewind: Self-correction Via Backtracking In Vision-and-language Navigation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Ke et al.
Understanding Natural Language Instructions For Fetching Daily Objects Using Gan-based Multimodal Target-source Classification (2019) • IEEE Robotics and Automation Letters • 40 citations
Magassouba et al.
Learning To Generalize From Sparse And Underspecified Rewards (2019) • Proceedings of the 36th International Conference on Machine Learning, PMLR 97:130-140, 2019 • 46 citations
Agarwal et al.
Robust Navigation With Language Pretraining And Stochastic Sampling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Li et al.
Language As An Abstraction For Hierarchical Deep Reinforcement Learning (2019) • Arxiv • 65 citations
Jiang et al.
Environmental Drivers Of Systematicity And Generalization In A Situated Agent (2019) • Arxiv • 53 citations
Hill et al.
Speaker-follower Models For Vision-and-language Navigation (2018) • Arxiv • 244 citations
Fried et al.
Follownet: Robot Navigation By Following Natural Language Directions With Deep Reinforcement Learning (2018) • Third Workshop in Machine Learning in the Planning and Control of Robot Motion at ICRA 2018 • 43 citations
Shah et al.
Tell, Draw, And Repeat: Generating And Modifying Images Based On Continual Linguistic Instruction (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
El-Nouby et al.
Learning To Understand Goal Specifications By Modelling Reward (2018) • Arxiv • 69 citations
Bahdanau et al.

Interpretability 219 papers #

Softpick: No Attention Sink, No Massive Activations With Rectified Softmax (2025) • No Venue
Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Establishing Trustworthy LLM Evaluation Via Shortcut Neuron Analysis (2025) • No Venue
Zhu et al.
Chem-r: Learning To Reason As A Chemist (2025) • No Venue
Wang et al.
Landscape Of Thoughts: Visualizing The Reasoning Process Of Large Language Models (2025) • No Venue
Zhou et al.
Latent Sketchpad: Sketching Visual Thoughts To Elicit Multimodal Reasoning In Mllms (2025) • No Venue
Zhang et al.
MARS: A Multi-agent Framework Incorporating Socratic Guidance For Automated Prompt Optimization (2025) • No Venue
Zhang et al.
MM-RLHF: The Next Step Forward In Multimodal LLM Alignment (2025) • No Venue
Zhang et al.
Sim-cot: Supervised Implicit Chain-of-thought (2025) • No Venue
Wei et al.
Spot The Fake: Large Multimodal Model-based Synthetic Image Detection With Artifact Explanation (2025) • No Venue
Wen et al.
Are Vlms Ready For Autonomous Driving? An Empirical Study From The Reliability, Data, And Metric Perspectives (2025) • No Venue
Xie et al.
From What To Why: A Multi-agent System For Evidence-based Chemical Reaction Condition Reasoning (2025) • No Venue
Yang et al.
Multi-domain Explainability Of Preferences (2025) • No Venue
Nitay Calderon, Liat Ein-Dor, Roi Reichart
RM-R1: Reward Modeling As Reasoning (2025) • No Venue
Chen et al.
Arm-thinker: Reinforcing Multimodal Generative Reward Models With Agentic Tool Use And Visual Reasoning (2025) • No Venue
Ding et al.
I Have Covered All The Bases Here: Interpreting Reasoning Features In Large Language Models Via Sparse Autoencoders (2025) • No Venue
Galichin et al.
Sliderspace: Decomposing The Visual Capabilities Of Diffusion Models (2025) • No Venue
Gandikota et al.
Beyond Transcription: Mechanistic Interpretability In ASR (2025) • No Venue
Glazer et al.
Videoscore2: Think Before You Score In Generative Video Evaluation (2025) • No Venue
He et al.
Conceptattention: Diffusion Transformers Learn Highly Interpretable Features (2025) • No Venue
Helbling et al.
Screencoder: Advancing Visual-to-code Generation For Front-end Automation Via Modular Multimodal Agents (2025) • No Venue
Jiang et al.
Why Language Models Hallucinate (2025) • No Venue
Kalai et al.
LM2: Large Memory Models (2025) • No Venue
Kang et al.
LEGION: Learning To Ground And Explain For Synthetic Image Detection (2025) • No Venue
Kang et al.
Curie: Toward Rigorous And Automated Scientific Experimentation With AI Agents (2025) • No Venue
Kon et al.
The Dragon Hatchling: The Missing Link Between The Transformer And Models Of The Brain (2025) • No Venue
Kosowski et al.
The Rogue Scalpel: Activation Steering Compromises LLM Safety (2025) • No Venue
Korznikov et al.
Feature-level Insights Into Artificial Text Detection With Sparse Autoencoders (2025) • No Venue
Kuznetsov et al.
Train Sparse Autoencoders Efficiently By Utilizing Features Correlation (2025) • No Venue
Kurochkin et al.
Analyze Feature Flow To Enhance Interpretation And Steering In Language Models (2025) • No Venue
Laptev et al.
The Cot Encyclopedia: Analyzing, Predicting, And Controlling How A Reasoning Model Will Think (2025) • No Venue
Lee et al.
Analysing Chain Of Thought Dynamics: Active Guidance Or Unfaithful Post-hoc Rationalisation? (2025) • No Venue
Lewis-Lim et al.
Attention Illuminates LLM Reasoning: The Preplan-and-anchor Rhythm Enables Fine-grained Policy Optimization (2025) • No Venue
Li et al.
Mol-r1: Towards Explicit Long-cot Reasoning In Molecule Discovery (2025) • No Venue
Li et al.
Who's Your Judge? On The Detectability Of Llm-generated Judgments (2025) • No Venue
Li et al.
Think In Games: Learning To Reason In Games Via Reinforcement Learning With Large Language Models (2025) • No Venue
Liao et al.
Efficient Inference For Large Reasoning Models: A Survey (2025) • No Venue
Liu et al.
Olmotrace: Tracing Language Model Outputs Back To Trillions Of Training Tokens (2025) • No Venue
Liu et al.
Rethinking Diverse Human Preference Learning Through Principal Component Analysis (2025) • No Venue
Luo et al.
Language Models Are Injective And Hence Invertible (2025) • No Venue
Nikolaou et al.
Cotox: Chain-of-thought-based Molecular Toxicity Reasoning And Prediction (2025) • No Venue
Park et al.
Unveiling Intrinsic Dimension Of Texts: From Academic Abstract To Creative Story (2025) • No Venue
Pedashenko et al.
Interpretable Physics Reasoning And Performance Taxonomy In Vision-language Models (2025) • No Venue
Pawar et al.
Chain-of-visual-thought: Teaching Vlms To See And Think Better With Continuous Visual Tokens (2025) • No Venue
Qin et al.
Llm-microscope: Uncovering The Hidden Role Of Punctuation In Context Memory Of Transformers (2025) • No Venue
Razzhigaev et al.
Open Problems In Mechanistic Interpretability (2025) • No Venue
Sharkey et al.
Towards Trustworthy GUI Agents: A Survey (2025) • No Venue
Shi et al.
MADD: Multi-agent Drug Discovery Orchestra (2025) • No Venue
Solovev et al.
LLM Pretraining With Continuous Concepts (2025) • No Venue
Tack et al.
Mechanistic Permutability: Match Features Across Layers (2024) • No Venue
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
Cod, Towards An Interpretable Medical Agent Using Chain Of Diagnosis (2024) • No Venue
Chen et al.
MMAU: A Holistic Benchmark Of Agent Capabilities Across Diverse Domains (2024) • No Venue
Yin et al.
Attention Heads Of Large Language Models: A Survey (2024) • No Venue
Zheng et al.
Efficiently Democratizing Medical Llms For 50 Languages Via A Mixture Of Language Family Experts (2024) • No Venue
Zheng et al.
Transformer Explainer: Interactive Learning Of Text-generative Models (2024) • No Venue
Cho et al.
Unpacking SDXL Turbo: Interpreting Text-to-image Models With Sparse Autoencoders (2024) • No Venue
Surkov et al.
Knowledge Mechanisms In Large Language Models: A Survey And Perspective (2024) • No Venue
Wang et al.
Explainable Generative AI (genxai): A Survey, Conceptualization, And Research Agenda (2024) • Artificial Intelligence Review • 61 citations
Johannes Schneider
A Multimodal Automated Interpretability Agent (2024) • No Venue
Shaham et al.
Discovering The Gems In Early Layers: Accelerating Long-context Llms With 1000x Input Token Reduction (2024) • No Venue
Shi et al.
Rethinking Interpretability In The Era Of Large Language Models (2024) • No Venue
Singh et al.
Lvlm-intrepret: An Interpretability Tool For Large Vision-language Models (2024) • No Venue
Stan et al.
KAN: Kolmogorov-arnold Networks (2024) • No Venue
Liu et al.
Reft: Representation Finetuning For Language Models (2024) • No Venue
Wu et al.
Foundation Models For Music: A Survey (2024) • No Venue
Ma et al.
Llms Know More Than They Show: On The Intrinsic Representation Of LLM Hallucinations (2024) • No Venue
Orgad et al.
Large Language Model Confidence Estimation Via Black-box Access (2024) • No Venue
Pedapati et al.
SNIFFER: Multimodal Large Language Model For Explainable Out-of-context Misinformation Detection (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Qi et al.
A Review Of Large Language Models And Autonomous Agents In Chemistry (2024) • Chemical Science • 79 citations
Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White
Not All Language Model Features Are Linear (2024) • No Venue
Engels et al.
Natural Language Reinforcement Learning (2024) • No Venue
Feng et al.
LOKI: A Comprehensive Synthetic Data Detection Benchmark Using Large Multimodal Models (2024) • No Venue
Ye et al.
Patchscope: A Unifying Framework For Inspecting Hidden Representations Of Language Models (2024) • No Venue
Ghandeharioun et al.
Chatdiet: Empowering Personalized Nutrition-oriented Food Recommender Chatbots Through An Llm-augmented Framework (2024) • Smart Health • 59 citations
Yang et al.
Hidden Flaws Behind Expert-level Accuracy Of Multimodal GPT-4 Vision In Medicine (2024) • npj Digital Medicine • 75 citations
Jin et al.
LLM Comparator: Visual Analytics For Side-by-side Evaluation Of Large Language Models (2024) • No Venue
Kahng et al.
"i'm Not Sure, But...": Examining The Impact Of Large Language Models' Uncertainty Expression On User Reliance And Trust (2024) • FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency • 55 citations
Kim et al.
Improve Vision Language Model Chain-of-thought Reasoning (2024) • No Venue
Zhang et al.
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once On Gemma 2 (2024) • No Venue
Lieberum et al.
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series (2024) • No Venue
Zhang et al.
Revealing The Barriers Of Language Agents In Planning (2024) • No Venue
Xie et al.
Unifying Large Language Models And Knowledge Graphs: A Roadmap (2023) • IEEE Transactions on Knowledge and Data Engineering • 578 citations
Pan et al.
Trustworthy Llms: A Survey And Guideline For Evaluating Large Language Models' Alignment (2023) • No Venue
Liu et al.
Fake News In Sheep's Clothing: Robust Fake News Detection Against Llm-empowered Style Attacks (2023) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 44 citations
Jiaying Wu, Jiafeng Guo, Bryan Hooi
Can Chatgpt Forecast Stock Price Movements? Return Predictability And Large Language Models (2023) • SSRN Electronic Journal • 146 citations
Alejandro Lopez-Lira, Yuehua Tang
Interpretable Long-form Legal Question Answering With Retrieval-augmented Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
Explainability For Large Language Models: A Survey (2023) • ACM Transactions on Intelligent Systems and Technology • 317 citations
Zhao et al.
Faithful Chain-of-thought Reasoning (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 63 citations
Lyu et al.
Vipergpt: Visual Inference Via Python Execution For Reasoning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Dídac Surís, Sachit Menon, Carl Vondrick
Chatgpt Or Human? Detect And Explain. Explaining Decisions Of Machine Learning Model For Detecting Short Chatgpt-generated Text (2023) • Arxiv • 81 citations
Sandra Mitrović, Davide Andreoletti, Omran Ayoub
Roco: Dialectic Multi-robot Collaboration With Large Language Models (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 78 citations
Zhao Mandi, Shreeya Jain, Shuran Song
The Internal State Of An LLM Knows When It's Lying (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 68 citations
Amos Azaria, Tom Mitchell
Beyond Surface: Probing Llama Across Scales And Layers (2023) • No Venue
Chen et al.
Driving With Llms: Fusing Object-level Vector Modality For Explainable Autonomous Driving (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 110 citations
Chen et al.
Rethinking The Evaluation For Conversational Recommendation In The Era Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 45 citations
Wang et al.
Attentionviz: A Global View Of Transformer Attention (2023) • IEEE Transactions on Visualization and Computer Graphics • 44 citations
Yeh et al.
Using Sequences Of Life-events To Predict Human Lives (2023) • Nature Computational Science • 55 citations
Savcisens et al.
De-diffusion Makes Text A Strong Cross-modal Interface (2023) • No Venue
Wei et al.
Signbert+: Hand-model-aware Self-supervised Pre-training For Sign Language Understanding (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 89 citations
Hu et al.
Mentallama: Interpretable Mental Health Analysis On Social Media With Large Language Models (2023) • WWW '24: The ACM Web Conference 2024 • 79 citations
Yang et al.
Towards Interpretable Mental Health Analysis With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 84 citations
Yang et al.
Is Chatgpt Better Than Human Annotators? Potential And Limitations Of Chatgpt In Explaining Implicit Hate Speech (2023) • Companion Proceedings of the ACM Web Conference 2023 • 178 citations
Fan Huang, Haewoon Kwak, Jisun An
AI Alignment: A Comprehensive Survey (2023) • Arxiv • 60 citations
Ji et al.
Zero-shot Everything Sketch-based Image Retrieval, And In Explainable Style (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Lin et al.
Designerly Understanding: Information Needs For Model Transparency To Support Design Ideation For Ai-powered User Experience (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 56 citations
Liao et al.
A Closer Look At The Explainability Of Contrastive Language-image Pre-training (2023) • Arxiv • 41 citations
Li et al.
Drivegpt4: Interpretable End-to-end Autonomous Driving Via Large Language Model (2023) • IEEE Robotics and Automation Letters • 202 citations
Xu et al.
A Unified Understanding Of Deep NLP Models For Text Classification (2022) • IEEE Transactions on Visualization and Computer Graphics • 42 citations
Li et al.
Invariant Grounding For Video Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 97 citations
Li et al.
Teaching Models To Express Their Uncertainty In Words (2022) • Arxiv • 53 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Transrac: Encoding Multi-scale Temporal Correlation With Transformers For Repetitive Action Counting (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Hu et al.
Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jung et al.
Emergent World Representations: Exploring A Sequence Model Trained On A Synthetic Task (2022) • Arxiv • 59 citations
Li et al.
Using Large Language Models To Enhance Programming Error Messages (2022) • SIGCSE 2023: The 54th ACM Technical Symposium on Computer Science Education • 170 citations
Leinonen et al.
Interpretability In The Wild: A Circuit For Indirect Object Identification In GPT-2 Small (2022) • Arxiv • 49 citations
Wang et al.
Linguistically Inspired Roadmap For Building Biologically Reliable Protein Language Models (2022) • Nature Machine Intelligence • 46 citations
Vu et al.
Locating And Editing Factual Associations In GPT (2022) • Arxiv • 172 citations
Meng et al.
Augmenting Interpretable Models With Llms During Training (2022) • Nature Communications • 42 citations
Singh et al.
Interactive Model Cards: A Human-centered Approach To Model Documentation (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 76 citations
Crisan et al.
On The Explainability Of Natural Language Processing Deep Models (2022) • ACM Computing Surveys • 89 citations
Julia El Zini, Mariette Awad
Automated Clinical Coding: What, Why, And Where We Are? (2022) • npj Digital Medicine • 69 citations
Dong et al.
Investigating Explainability Of Generative AI For Code Through Scenario-based Design (2022) • 27th International Conference on Intelligent User Interfaces • 176 citations
Sun et al.
RARR: Researching And Revising What Language Models Say, Using Language Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Gao et al.
MIST: Multi-modal Iterative Spatial-temporal Transformer For Long-form Video Question Answering (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Gao et al.
Learn To Explain: Multimodal Reasoning Via Thought Chains For Science Question Answering (2022) • Arxiv • 214 citations
Lu et al.
Language Model Classifier Aligns Better With Physician Word Sensitivity Than Xgboost On Readmission Prediction (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Yang et al.
Incorporating Dynamic Semantics Into Pre-trained Language Model For Aspect-based Sentiment Analysis (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 49 citations
Zhang et al.
A Survey Of Controllable Text Generation Using Transformer-based Pre-trained Language Models (2022) • ACM Computing Surveys • 153 citations
Zhang et al.
NLX-GPT: A Model For Natural Language Explanations In Vision And Vision-language Tasks (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
Locate Then Segment: A Strong Pipeline For Referring Image Segmentation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Jing et al.
Signbert: Pre-training Of Hand-model-aware Representation For Sign Language Recognition (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 89 citations
Hu et al.
Reformer: The Relational Transformer For Image Captioning (2021) • MM '22: The 30th ACM International Conference on Multimedia • 45 citations
Xuewen Yang, Yingru Liu, Xin Wang
Transfernet: An Effective And Transparent Framework For Multi-hop Question Answering Over Relation Graph (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 86 citations
Shi et al.
MOMENTA: A Multimodal Framework For Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 100 citations
Pramanick et al.
Sleeptransformer: Automatic Sleep Staging With Interpretability And Uncertainty Quantification (2021) • IEEE Transactions on Biomedical Engineering • 225 citations
Phan et al.
Combat COVID-19 Infodemic Using Explainable Natural Language Processing Models (2021) • Information Processing & Management • 146 citations
Jackie Ayoub, X. Jessie Yang, Feng Zhou
Personalized Transformer For Explainable Recommendation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 72 citations
Lei Li, Yongfeng Zhang, Li Chen
Disentangling Hate In Online Memes (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 73 citations
Cao et al.
Paragraph-level Rationale Extraction Through Regularization: A Case Study On European Court Of Human Rights Cases (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 62 citations
Chalkidis et al.
Good For Misconceived Reasons: An Empirical Revisiting On The Need For Visual Context In Multimodal Machine Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 54 citations
Wu et al.
Generic Attention-model Explainability For Interpreting Bi-modal And Encoder-decoder Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 207 citations
Hila Chefer, Shir Gur, Lior Wolf
Knowledge Neurons In Pretrained Transformers (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 76 citations
Dai et al.
Counterfactual Explanations For Neural Recommenders (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Khanh Hiep Tran, Azin Ghazimatin, Rishiraj Saha Roy
Similarity Reasoning And Filtration For Image-text Matching (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 319 citations
Diao et al.
Towards Interpreting And Mitigating Shortcut Learning Behavior Of NLU Models (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 63 citations
Du et al.
Token Shift Transformer For Video Classification (2021) • MM '21: ACM Multimedia Conference • 95 citations
Hao Zhang, Yanbin Hao, Chong-Wah Ngo
Tph-yolov5: Improved Yolov5 Based On Transformer Prediction Head For Object Detection On Drone-captured Scenarios (2021) • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 1665 citations
Zhu et al.
Nested Hierarchical Transformer: Towards Accurate, Data-efficient And Interpretable Visual Understanding (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 122 citations
Zhang et al.
KAT: A Knowledge Augmented Transformer For Vision-and-language (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 68 citations
Gui et al.
Explaining Black Box Predictions And Unveiling Data Artifacts Through Influence Functions (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 86 citations
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
Scalable Multi-hop Relational Reasoning For Knowledge-aware Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 198 citations
Feng et al.
Neural Data-to-text Generation Via Jointly Learning The Segmentation And Correspondence (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 45 citations
Shen et al.
Leakage-adjusted Simulatability: Can Models Generate Non-trivial Explanations Of Their Behavior In Natural Language? (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 55 citations
Hase et al.
Interpretable Rumor Detection In Microblogs By Attending To User Interactions (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 176 citations
Khoo et al.
Context-aware Attentive Knowledge Tracing (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 432 citations
Aritra Ghosh, Neil Heffernan, Andrew S. Lan
The Language Interpretability Tool: Extensible, Interactive Visualizations And Analysis For NLP Models (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 61 citations
Tenney et al.
Controlling Generative Models With Continuous Factors Of Variations (2020) • Arxiv • 79 citations
Antoine Plumerault, Hervé Le Borgne, Céline Hudelot
Modularized Transfomer-based Ranking Framework (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 50 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Perturbed Masking: Parameter-free Probing For Analyzing And Interpreting BERT (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 154 citations
Wu et al.
Compositional Explanations Of Neurons (2020) • Arxiv • 50 citations
Jesse Mu, Jacob Andreas
Sparterm: Learning Term-based Sparse Representation For Fast Text Retrieval (2020) • Arxiv • 59 citations
Bai et al.
Learning Modality Interaction For Temporal Sentence Localization And Event Captioning In Videos (2020) • Lecture Notes in Computer Science • 89 citations
Chen et al.
Explaining Question Answering Models Through Text Generation (2020) • Arxiv • 44 citations
Veronica Latcinnik, Jonathan Berant
Bertology Meets Biology: Interpreting Attention In Protein Language Models (2020) • Arxiv • 53 citations
Vig et al.
Transformers As Soft Reasoners Over Language (2020) • Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20) • 48 citations
Peter Clark, Oyvind Tafjord, Kyle Richardson
WT5?! Training Text-to-text Models To Explain Their Predictions (2020) • Arxiv • 103 citations
Narang et al.
A Survey Of The State Of Explainable AI For Natural Language Processing (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing • 159 citations
Danilevsky et al.
Iterative Edit-based Unsupervised Sentence Simplification (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Kumar et al.
Ar-net: A Simple Auto-regressive Neural Network For Time-series (2019) • Arxiv • 49 citations
Oskar Triebe, Nikolay Laptev, Ram Rajagopal
Visualizing And Understanding The Effectiveness Of BERT (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 169 citations
Hao et al.
Automated Rationale Generation: A Technique For Explainable AI And Its Effects On Human Perceptions (2019) • Arxiv • 44 citations
Ehsan et al.
Overlearning Reveals Sensitive Attributes (2019) • Arxiv • 55 citations
Congzheng Song, Vitaly Shmatikov
On Attribution Of Recurrent Neural Network Predictions Via Additive Decomposition (2019) • The World Wide Web Conference • 54 citations
Du et al.
Exbert: A Visual Analysis Tool To Explore Learned Representations In Transformers Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 75 citations
Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann
Interpreting And Improving Natural-language Processing (in Machines) With Natural Language-processing (in The Brain) (2019) • Arxiv • 141 citations
Mariya Toneva, Leila Wehbe
A Multiscale Visualization Of Attention In The Transformer Model (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 295 citations
Jesse Vig
Analyzing The Structure Of Attention In A Transformer Language Model (2019) • Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 123 citations
Jesse Vig, Yonatan Belinkov
BERT Rediscovers The Classical NLP Pipeline (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1210 citations
Ian Tenney, Dipanjan Das, Ellie Pavlick
Select, Answer And Explain: Interpretable Multi-hop Reading Comprehension Over Multiple Documents (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 54 citations
Tu et al.
Attention Interpretability Across NLP Tasks (2019) • Arxiv • 69 citations
Vashishth et al.
Nlprolog: Reasoning With Weak Unification For Question Answering In Natural Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Weber et al.
Adaptively Sparse Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 81 citations
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
Recosa: Detecting The Relevant Contexts With Self-attention For Multi-turn Dialogue Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 121 citations
Zhang et al.
Generating Token-level Explanations For Natural Language Inference (2019) • Proceedings of the 2019 Conference of the North • 41 citations
Thorne et al.
Attention Is Not Not Explanation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 58 citations
Sarah Wiegreffe, Yuval Pinter
Self-assembling Modular Networks For Interpretable Multi-hop Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Yichen Jiang, Mohit Bansal
When And Why Is Document-level Context Useful In Neural Machine Translation? (2019) • Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019) • 64 citations
Yunsu Kim, Duc Thanh Tran, Hermann Ney
Mathqa: Towards Interpretable Math Word Problem Solving With Operation-based Formalisms (2019) • Arxiv • 119 citations
Amini et al.
Towards Better Substitution-based Word Sense Induction (2019) • Arxiv • 50 citations
Asaf Amrami, Yoav Goldberg
Answering Complex Open-domain Questions Through Iterative Query Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 103 citations
Qi et al.
Adding Interpretable Attention To Neural Translation Models Improves Word Alignment (2019) • Arxiv • 80 citations
Thomas Zenkel, Joern Wuebker, John Denero
Topic-enhanced Memory Networks For Personalised Point-of-interest Recommendation (2019) • Arxiv • 54 citations
Xiao Zhou, Cecilia Mascolo, Zhongxiang Zhao
Attention Is Not Explanation (2019) • Arxiv • 487 citations
Sarthak Jain, Byron C. Wallace
Make Up Your Mind! Adversarial Generation Of Inconsistent Natural Language Explanations (2019) • Short Paper at ACL 2020 • 55 citations
Camburu et al.
Allennlp Interpret: A Framework For Explaining Predictions Of NLP Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations • 125 citations
Wallace et al.
Dynamic Graph Attention For Referring Expression Comprehension (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 226 citations
Sibei Yang, Guanbin Li, Yizhou Yu
A Game Theoretic Approach To Class-wise Selective Rationalization (2019) • Arxiv • 42 citations
Chang et al.
Is Attention Interpretable? (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 115 citations
Sofia Serrano, Noah A. Smith
Explain Yourself! Leveraging Language Models For Commonsense Reasoning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Rajani et al.
Compositional Attention Networks For Machine Reasoning (2018) • Arxiv • 132 citations
Drew A. Hudson, Christopher D. Manning
Towards Explainable NLP: A Generative Explanation Framework For Text Classification (2018) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 141 citations
Hui Liu, Qingyu Yin, William Yang Wang
Explainable Recommendation Via Multi-task Learning In Opinionated Text Data (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 199 citations
Wang et al.
Rnns Implicitly Implement Tensor Product Representations (2018) • Arxiv • 41 citations
McCoy et al.
Object Counts! Bringing Explicit Detections Back Into Image Captioning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 42 citations
Josiah Wang, Pranava Madhyastha, Lucia Specia
Unsupervised Discrete Sentence Representation Learning For Interpretable Neural Dialog Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 136 citations
Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi
Textual Explanations For Self-driving Vehicles (2018) • Lecture Notes in Computer Science • 283 citations
Kim et al.
Interpretable Adversarial Perturbation In Input Embedding Space For Text (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 155 citations
Sato et al.
E-snli: Natural Language Inference With Natural Language Explanations (2018) • Arxiv • 282 citations
Camburu et al.
Pathologies Of Neural Models Make Interpretations Difficult (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 235 citations
Feng et al.
Beyond Word Importance: Contextual Decomposition To Extract Interactions From Lstms (2018) • Arxiv • 118 citations
W. James Murdoch, Peter J. Liu, Bin Yu
Visual Referring Expression Recognition: What Do Systems Actually Learn? (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 52 citations
Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick
Tell-and-answer: Towards Explainable Visual Question Answering Using Attributes And Captions (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 59 citations
Li et al.
Efficient Large-scale Multi-modal Classification (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 145 citations
Kiela et al.
Baseline Needs More Love: On Simple Word-embedding-based Models And Associated Pooling Mechanisms (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 319 citations
Shen et al.
What Do We Need To Build Explainable AI Systems For The Medical Domain? (2017) • Arxiv • 626 citations
Holzinger et al.
Recurrent Topic-transition GAN For Visual Paragraph Generation (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 171 citations
Liang et al.
How Deep Is Knowledge Tracing? (2016) • Arxiv • 89 citations
Mohammad Khajah, Robert V. Lindsey, Michael C. Mozer
Rationalizing Neural Predictions (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 132 citations
Tao Lei, Regina Barzilay, Tommi Jaakkola
Interpretable Semantic Textual Similarity: Finding And Explaining Differences Between Sentences (2016) • Knowledge-Based Systems • 45 citations
Lopez-Gazpio et al.
Conditional Generation And Snapshot Learning In Neural Dialogue Systems (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 73 citations
Wen et al.

Interspeech 52 papers

XTTS: A Massively Multilingual Zero-shot Text-to-speech Model (2024) • Interspeech 2024 • 49 citations
Casanova et al.
The Chime-7 DASR Challenge: Distant Meeting Transcription With Multiple Devices In Diverse Scenarios (2023) • 7th International Workshop on Speech Processing in Everyday Environments (CHiME 2023) • 45 citations
Cornell et al.
Funasr: A Fundamental End-to-end Speech Recognition Toolkit (2023) • INTERSPEECH 2023 • 44 citations
Gao et al.
Hierspeech++: Bridging The Gap Between Semantic And Acoustic Representation Of Speech By Hierarchical Variational Inference For Zero-shot Speech Synthesis (2023) • No Venue
Lee et al.
Wenet 2.0: More Productive End-to-end Speech Recognition Toolkit (2022) • Interspeech 2022 • 90 citations
Zhang et al.
Mslam: Massively Multilingual Joint Pre-training For Speech And Text (2022) • Arxiv • 59 citations
Bapna et al.
Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022) • Interspeech 2022 • 73 citations
Gao et al.
Branchformer: Parallel Mlp-attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022) • Arxiv • 40 citations
Peng et al.
SUPERB: Speech Processing Universal Performance Benchmark (2021) • Interspeech 2021 • 553 citations
Yang et al.
Diff-tts: A Denoising Diffusion Model For Text-to-speech (2021) • Interspeech 2021 • 95 citations
Jeong et al.
Keyword Transformer: A Self-attention Model For Keyword Spotting (2021) • Interspeech 2021 • 98 citations
Axel Berg, Mark O'Connor, Miguel Tairum Cruz
Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021) • Interspeech 2021 • 58 citations
Le et al.
A Channel Coding Benchmark For Meta-learning (2021) • Interspeech 2021 • 57 citations
Li et al.
Efficient Training Of Audio Transformers With Patchout (2021) • Interspeech 2022 • 134 citations
Koutini et al.
End-to-end Speech Translation Via Cross-modal Progressive Training (2021) • Interspeech 2021 • 42 citations
Rong Ye, Mingxuan Wang, Lei Li
Lira: Learning Visual Speech Representations From Audio Through Self-supervision (2021) • Interspeech 2021 • 41 citations
Ma et al.
Efficient Minimum Word Error Rate Training Of Rnn-transducer For End-to-end Speech Recognition (2020) • Interspeech 2020 • 51 citations
Guo et al.
Contextnet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020) • Interspeech 2020 • 244 citations
Han et al.
Tinylstms: Efficient Neural Speech Enhancement For Hearing Aids (2020) • Interspeech 2020 • 105 citations
Fedorov et al.
Conformer: Convolution-augmented Transformer For Speech Recognition (2020) • Interspeech 2020 • 1880 citations
Gulati et al.
Self-training For End-to-end Speech Translation (2020) • Interspeech 2020 • 40 citations
Pino et al.
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters (2020) • Interspeech 2020 • 73 citations
Pratap et al.
Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) • Interspeech 2020 • 57 citations
Yang et al.
On The Comparison Of Popular End-to-end Models For Large Scale Speech Recognition (2020) • Interspeech 2020 • 122 citations
Li et al.
Developing RNN-T Models Surpassing High-performance Hybrid Models With Customization Capability (2020) • Interspeech 2020 • 96 citations
Li et al.
Multispeech: Multi-speaker Text To Speech With Transformer (2020) • Interspeech 2020 • 52 citations
Chen et al.
Single Headed Attention Based Sequence-to-sequence Model For State-of-the-art Results On Switchboard (2020) • Interspeech 2020 • 42 citations
Tüske et al.
End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019) • Interspeech 2019 • 71 citations
Tu et al.
Temporal Convolution For Real-time Keyword Spotting On Mobile Devices (2019) • Interspeech 2019 • 143 citations
Choi et al.
RWTH ASR Systems For Librispeech: Hybrid Vs Attention -- W/o Data Augmentation (2019) • Interspeech 2019 • 108 citations
Lüscher et al.
Voice Transformer Network: Sequence-to-sequence Voice Conversion Using Transformer With Text-to-speech Pretraining (2019) • Interspeech 2020 • 47 citations
Huang et al.
Speechbert: An Audio-and-text Jointly Learned Language Model For End-to-end Spoken Question Answering (2019) • Interspeech 2020 • 41 citations
Chuang et al.
Speech Model Pre-training For End-to-end Spoken Language Understanding (2019) • Interspeech 2019 • 41 citations
Lugosch et al.
Very Deep Self-attention Networks For End-to-end Speech Recognition (2019) • Interspeech 2019 • 48 citations
Pham et al.
Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019) • Interspeech 2019 • 153 citations
Zhang et al.
Jasper: An End-to-end Convolutional Neural Acoustic Model (2019) • Interspeech 2019 • 212 citations
Li et al.
Joint Training Framework For Text-to-speech And Voice Conversion Using Multi-source Tacotron And Wavenet (2019) • Interspeech 2019 • 55 citations
Zhang et al.
Robust Sequence-to-sequence Acoustic Modeling With Stepwise Monotonic Attention For Neural TTS (2019) • Interspeech 2019 • 79 citations
Mutian He, Yan Deng, Lei He
Language Modeling With Deep Transformers (2019) • Interspeech 2019 • 85 citations
Irie et al.
On The Choice Of Modeling Unit For Sequence-to-sequence Speech Recognition (2019) • Interspeech 2019 • 40 citations
Irie et al.
Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019) • Interspeech 2019 • 155 citations
Kannan et al.
Specaugment: A Simple Data Augmentation Method For Automatic Speech Recognition (2019) • Interspeech 2019 • 3313 citations
Park et al.
Speaker Adaptation For Attention-based End-to-end Speech Recognition (2019) • Interspeech 2019 • 40 citations
Meng et al.
A New Gan-based End-to-end TTS Training Algorithm (2019) • Interspeech 2019 • 42 citations
Guo et al.
End-to-end Speech Translation With Knowledge Distillation (2019) • Interspeech 2019 • 139 citations
Liu et al.
Multi-modal Data Augmentation For End-to-end ASR (2018) • Interspeech 2018 • 54 citations
Renduchintala et al.
Improved Training Of End-to-end Attention Models For Speech Recognition (2018) • Interspeech 2018 • 277 citations
Zeyer et al.
Syllable-based Sequence-to-sequence Speech Recognition With The Transformer In Mandarin Chinese (2018) • Interspeech 2018 • 69 citations
Zhou et al.
Sequence-to-sequence Models Can Directly Translate Foreign Speech (2017) • Interspeech 2017 • 252 citations
Weiss et al.
Cold Fusion: Training Seq2seq Models Together With Language Models (2017) • Interspeech 2018 • 111 citations
Sriram et al.
Direct Acoustics-to-word Models For English Conversational Speech Recognition (2017) • Interspeech 2017 • 114 citations
Audhkhasi et al.
Tacotron: Towards End-to-end Speech Synthesis (2017) • Interspeech 2017 • 1567 citations
Wang et al.

— K —

KDD 44 papers

Unist: A Prompt-empowered Universal Model For Urban Spatio-temporal Prediction (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 51 citations
Yuan et al.
Bias And Unfairness In Information Retrieval Systems: New Challenges In The LLM Era (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Dai et al.
A Review Of Modern Recommender Systems Using Generative Models (gen-recsys) (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 62 citations
Deldjoo et al.
A Survey On RAG Meeting Llms: Towards Retrieval-augmented Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 281 citations
Fan et al.
Openfedllm: Training Large Language Models On Decentralized Private Data Via Federated Learning (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 47 citations
Ye et al.
Large Language Models Meet Collaborative Filtering: An Efficient All-round Llm-based Recommender System (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 67 citations
Kim et al.
Urbangpt: Spatio-temporal Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 69 citations
Li et al.
Foundation Models For Time Series Analysis: A Tutorial And Survey (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 100 citations
Liang et al.
Fake News In Sheep's Clothing: Robust Fake News Detection Against Llm-empowered Style Attacks (2023) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 44 citations
Jiaying Wu, Jiafeng Guo, Bryan Hooi
All In One: Multi-task Prompting For Graph Neural Networks (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 123 citations
Sun et al.
Pepnet: Parameter And Embedding Personalized Network For Infusing With Personalized Prior Information (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 78 citations
Chang et al.
Exploring Data Augmentation For Code Generation Tasks (2023) • ACM SIGKDD Explorations Newsletter • 118 citations
Pinzhen Chen, Gerasimos Lampouras
Codegeex: A Pre-trained Model For Code Generation With Multilingual Benchmarking On Humaneval-x (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 158 citations
Zheng et al.
Fedmultimodal: A Benchmark For Multimodal Federated Learning (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Feng et al.
Text Is All You Need: Learning Language Representations For Sequential Recommendation (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 103 citations
Li et al.
Federatedscope-llm: A Comprehensive Package For Fine-tuning Large Language Models In Federated Learning (2023) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 56 citations
Kuang et al.
Multi-behavior Hypergraph-enhanced Transformer For Sequential Recommendation (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 130 citations
Yang et al.
A New Generation Of Perspective API: Efficient Multilingual Character-level Transformers (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 88 citations
Lees et al.
Self-supervised Hypergraph Transformer For Recommender Systems (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 107 citations
Lianghao Xia, Chao Huang, Chuxu Zhang
Twhin-bert: A Socially-enriched Pre-trained Language Model For Multilingual Tweet Representations At Twitter (2022) • KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 43 citations
Zhang et al.
Graphmae: Self-supervised Masked Graph Autoencoders (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 441 citations
Hou et al.
Towards Universal Sequence Representation Learning For Recommender Systems (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 194 citations
Hou et al.
Multimodal Emergent Fake News Detection Via Meta Neural Process Networks (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 51 citations
Wang et al.
Pre-trained Language Model Based Ranking In Baidu Search (2021) • KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 46 citations
Zou et al.
Learned Token Pruning For Transformers (2021) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 75 citations
Kim et al.
Adversarial Attacks On Deep Models For Financial Transaction Records (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 42 citations
Fursov et al.
BOND: Bert-assisted Open-domain Named Entity Recognition With Distant Supervision (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 113 citations
Liang et al.
Towards Automated Neural Interaction Discovery For Click-through Rate Prediction (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 61 citations
Song et al.
HOLMES: Health Online Model Ensemble Serving For Deep Learning Models In Intensive Care Units (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 72 citations
Hong et al.
Context-aware Attentive Knowledge Tracing (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 432 citations
Aritra Ghosh, Neil Heffernan, Andrew S. Lan
Evaluating Conversational Recommender Systems Via User Simulation (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 73 citations
Shuo Zhang, Krisztian Balog
Improving Multi-turn Response Selection Models With Complementary Last-utterance Selection By Instance Weighting (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 290 citations
Zhou et al.
TUTA: Tree-based Transformers For Generally Structured Table Pre-training (2020) • KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 58 citations
Wang et al.
Practice On Long Sequential User Behavior Modeling For Click-through Rate Prediction (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 198 citations
Pi et al.
Reinforcement Learning To Optimize Long-term User Engagement In Recommender Systems (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 223 citations
Zou et al.
Pythia: Ai-assisted Code Completion System (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 106 citations
Svyatkovskiy et al.
Graph Representation Learning Via Hard And Channel-wise Attention Networks (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 57 citations
Hongyang Gao, Shuiwang Ji
Assessing The Factual Accuracy Of Generated Text (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 145 citations
Goodrich et al.
Taming Pretrained Transformers For Extreme Multi-label Text Classification (2019) • KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 135 citations
Chang et al.
Layoutlm: Pre-training Of Text And Layout For Document Image Understanding (2019) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 478 citations
Xu et al.
Tf-ranking: Scalable Tensorflow Library For Learning-to-rank (2018) • KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 82 citations
Pasumarthi et al.
Multi-pointer Co-attention Networks For Recommendation (2018) • Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 220 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Accelerating Innovation Through Analogy Mining (2017) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 60 citations
Hope et al.
A Context-aware Attention Network For Interactive Question Answering (2016) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 40 citations
Li et al.

— L —

Llm For Code 370 papers

Latent Collaboration In Multi-agent Systems (2025) • No Venue
Zou et al.
Flowrl: Matching Reward Distributions For LLM Reasoning (2025) • No Venue
Zhu et al.
DRIVE: Data Curation Best Practices For Reinforcement Learning With Verifiable Reward In Competitive Code Generation (2025) • No Venue
Zhu et al.
Latent Refinement Decoding: Enhancing Diffusion-based Language Models By Refining Belief States (2025) • No Venue
Zhu et al.
Bigcodearena: Unveiling More Reliable Human Preferences In Code Generation Via Execution (2025) • No Venue
Zhuo et al.
CODESYNC: Synchronizing Large Language Models With Dynamic Code Evolution At Scale (2025) • No Venue
Wang et al.
Co-evolving LLM Coder And Unit Tester Via Reinforcement Learning (2025) • No Venue
Wang et al.
Vibe Checker: Aligning Code Evaluation With Human Preference (2025) • No Venue
Zhong et al.
Innogym: Benchmarking The Innovation Potential Of AI Agents (2025) • No Venue
Zhang et al.
Thyme: Think Beyond Images (2025) • No Venue
Zhang et al.
Learning To Reason Without External Rewards (2025) • No Venue
Zhao et al.
Livecodebench Pro: How Do Olympiad Medalists Judge Llms In Competitive Programming? (2025) • No Venue
Zheng et al.
Recode: Unify Plan And Action For Universal Granularity Control (2025) • No Venue
Yu et al.
Z1: Efficient Test-time Scaling With Code (2025) • No Venue
Yu et al.
API Agents Vs. GUI Agents: Divergence And Convergence (2025) • No Venue
Zhang et al.
Coding Triangle: How Does Large Language Model Understand Code? (2025) • No Venue
Zhang et al.
Codecriticbench: A Holistic Code Critique Benchmark For Large Language Models (2025) • No Venue
Zhang et al.
Stabilizing Knowledge, Promoting Reasoning: Dual-token Constraints For RLVR (2025) • No Venue
Wang et al.
Codearc: Benchmarking Reasoning Capabilities Of LLM Agents For Inductive Program Synthesis (2025) • No Venue
Wei et al.
SWE-RL: Advancing LLM Reasoning Via Reinforcement Learning On Open Software Evolution (2025) • No Venue
Wei et al.
The Devil Behind The Mask: An Emergent Safety Vulnerability Of Diffusion Llms (2025) • No Venue
Wen et al.
Leetcodedataset: A Temporal Dataset For Robust Evaluation And Efficient Training Of Code Llms (2025) • No Venue
Xia et al.
Teaching Language Models To Critique Via Reinforcement Learning (2025) • No Venue
Xie et al.
Kodcode: A Diverse, Challenging, And Verifiable Synthetic Dataset For Coding (2025) • No Venue
Xu et al.
Scalable Chain Of Thoughts Via Elastic Reasoning (2025) • No Venue
Xu et al.
From Code Foundation Models To Agents And Applications: A Practical Guide To Code Intelligence (2025) • No Venue
Yang et al.
Ui2code^n: A Visual Language Model For Test-time Scalable Interactive Ui-to-code Generation (2025) • No Venue
Yang et al.
Competitive Programming With Large Reasoning Models (2025) • No Venue
Openai et al.
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents (2025) • No Venue
Badertdinov et al.
Qwen3-vl Technical Report (2025) • No Venue
Bai et al.
CRANE: Reasoning With Constrained LLM Generation (2025) • No Venue
Banerjee et al.
Coda: Coding LM Via Diffusion Adaptation (2025) • No Venue
Chen et al.
Code2video: A Code-centric Paradigm For Educational Video Generation (2025) • No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
Evolve The Method, Not The Prompts: Evolutionary Synthesis Of Jailbreak Attacks On Llms (2025) • No Venue
Chen et al.
L^2M: Mutual Information Scaling Law For Long-context Language Modeling (2025) • No Venue
Chen et al.
Meshcoder: Llm-powered Structured Mesh Code Generation From Point Clouds (2025) • No Venue
Dai et al.
Swe-bench Pro: Can AI Agents Solve Long-horizon Software Engineering Tasks? (2025) • No Venue
Deng et al.
A Survey Of Vibe Coding With Large Language Models (2025) • No Venue
Ge et al.
Training Long-context, Multi-turn Software Engineering Agents With Reinforcement Learning (2025) • No Venue
Golubev et al.
Diffucoder: Understanding And Improving Masked Diffusion Models For Code Generation (2025) • No Venue
Gong et al.
Latcoder: Converting Webpage Design To Code With Layout-as-thought (2025) • No Venue
Gui et al.
Can Large Language Models Detect Errors In Long Chain-of-thought Reasoning? (2025) • No Venue
He et al.
Hardtests: Synthesizing High-quality Test Cases For LLM Coding (2025) • No Venue
He et al.
Swe-perf: Can Language Models Optimize Code Performance On Real-world Repositories? (2025) • No Venue
He et al.
CASS: Nvidia To AMD Transpilation With Data, Models, And Benchmark (2025) • No Venue
Heakl et al.
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team (2025) • No Venue
Hosain et al.
CODESIM: Multi-agent Code Generation And Problem Solving Through Simulation-driven Planning And Debugging (2025) • No Venue
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
Screencoder: Advancing Visual-to-code Generation For Front-end Automation Via Modular Multimodal Agents (2025) • No Venue
Jiang et al.
Gigaevo: An Open Source Optimization Framework Powered By Llms And Evolution Algorithms (2025) • No Venue
Khrulkov et al.
Distillm-2: A Contrastive Approach Boosts The Distillation Of Llms (2025) • No Venue
Ko et al.
Exp-bench: Can AI Conduct AI Research Experiments? (2025) • No Venue
Kon et al.
Gemini Embedding: Generalizable Embeddings From Gemini (2025) • No Venue
Lee et al.
Dacomp: Benchmarking Data Agents Across The Full Data Intelligence Lifecycle (2025) • No Venue
Lei et al.
Can One Domain Help Others? A Data-centric Study On Multi-domain Reasoning Via Reinforcement Learning (2025) • No Venue
Li et al.
Codei/o: Condensing Reasoning Patterns Via Code Input-output Prediction (2025) • No Venue
Li et al.
CUDA-L1: Improving CUDA Optimization Via Contrastive Reinforcement Learning (2025) • No Venue
Li et al.
Fea-bench: A Benchmark For Evaluating Repository-level Code Generation For Feature Implementation (2025) • No Venue
Li et al.
Llms Can Easily Learn To Reason From Demonstrations Structure, Not Content, Is What Matters! (2025) • No Venue
Li et al.
S*: Test Time Scaling For Code Generation (2025) • No Venue
Li et al.
Tokdrift: When LLM Speaks In Subwords But Code Speaks In Grammar (2025) • No Venue
Yinxi Li, Yuntian Deng, Pengyu Nie
Where To Find Grokking In LLM Pretraining? Monitor Memorization-to-generalization Without Test (2025) • No Venue
Ziyue Li, Chenrui Fan, Tianyi Zhou
A.S.E: A Repository-level Benchmark For Evaluating Security In Ai-generated Code (2025) • No Venue
Lian et al.
Understanding Tool-integrated Reasoning (2025) • No Venue
Heng Lin, Zhongwen Xu
Vcode: A Multimodal Coding Benchmark With SVG As Symbolic Visual Representation (2025) • No Venue
Lin et al.
Critique-coder: Enhancing Coder Models By Critique Reinforcement Learning (2025) • No Venue
Ruan et al.
Webgen-agent: Enhancing Interactive Website Generation With Multi-level Feedback And Step-level Reinforcement Learning (2025) • No Venue
Lu et al.
RPG: A Repository Planning Graph For Unified And Scalable Codebase Generation (2025) • No Venue
Luo et al.
Rethinking RL Scaling For Vision Language Models: A Transparent, From-scratch Framework And Comprehensive Evaluation Scheme (2025) • No Venue
Ma et al.
Tool-integrated Reinforcement Learning For Repo Deep Search (2025) • No Venue
Ma et al.
Swe-lancer: Can Frontier Llms Earn $1 Million From Real-world Freelance Software Engineering? (2025) • No Venue
Miserendino et al.
Viscoder2: Building Multi-language Visualization Coding Agents (2025) • No Venue
Ni et al.
Viscoder: Fine-tuning Llms For Executable Python Visualization Code Generation (2025) • No Venue
Ni et al.
Demons In The Detail: On Implementing Load Balancing Loss For Training Specialized Mixture-of-expert Models (2025) • No Venue
Qiu et al.
Codeelo: Benchmarking Competition-level Code Generation Of Llms With Human-comparable Elo Ratings (2025) • No Venue
Quan et al.
Self-generated In-context Examples Improve LLM Agents For Sequential Decision-making Tasks (2025) • No Venue
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Agent Laboratory: Using LLM Agents As Research Assistants (2025) • No Venue
Schmidgall et al.
Paper2code: Automating Code Generation From Scientific Papers In Machine Learning (2025) • No Venue
Seo et al.
Longcodezip: Compress Long Context For Code Language Models (2025) • No Venue
Shi et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning With Counterexample Creation (2025) • No Venue
Sinha et al.
Iterative Self-training For Code Generation Via Reinforced Re-ranking (2025) • No Venue
Nikita Sorokin, Ivan Sedykh, Valentin Malykh
Januscoder: Towards A Foundational Visual-programmatic Interface For Code Intelligence (2025) • No Venue
Sun et al.
Code Graph Model (CGM): A Graph-integrated Large Language Model For Repository-level Software Engineering Tasks (2025) • No Venue
Tao et al.
Appworld: A Controllable World Of Apps And People For Benchmarking Interactive Coding Agents (2024) • No Venue
Trivedi et al.
CS1-LLM: Integrating Llms Into CS1 Instruction (2024) • ITiCSE 2024: Innovation and Technology in Computer Science Education • 45 citations
Vadaparty et al.
Bigcodebench: Benchmarking Code Generation With Diverse Function Calls And Complex Instructions (2024) • No Venue
Zhuo et al.
Astraios: Parameter-efficient Instruction Tuning Code Large Language Models (2024) • No Venue
Zhuo et al.
Deepseek LLM: Scaling Open-source Language Models With Longtermism (2024) • No Venue
Deepseek-Ai et al.
Automated Unit Test Improvement Using Large Language Models At Meta (2024) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 53 citations
Alshahwan et al.
To Code, Or Not To Code? Exploring Impact Of Code In Pre-training (2024) • No Venue
Aryabumi et al.
LLM Augmented Llms: Expanding Capabilities Through Composition (2024) • No Venue
Bansal et al.
Iris: An Ai-driven Virtual Tutor For Computer Science Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 41 citations
Patrick Bassner, Eduard Frankford, Stephan Krusche
Long Code Arena: A Set Of Benchmarks For Long-context Code Models (2024) • No Venue
Bogomolov et al.
Data Is All You Need: Finetuning Llms For Chip Design Via An Automated Design-data Augmentation Framework (2024) • DAC '24: 61st ACM/IEEE Design Automation Conference • 43 citations
Chang et al.
Swe-bench-java: A Github Issue Resolving Benchmark For Java (2024) • No Venue
Zan et al.
Language Models As Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning In Language Models (2024) • No Venue
Chae et al.
Mceval: Massively Multilingual Code Evaluation (2024) • No Venue
Chai et al.
B4: Towards Optimal Assessment Of Plausible Code Solutions With Plausible Tests (2024) • No Venue
Chen et al.
Scienceagentbench: Toward Rigorous Assessment Of Language Agents For Data-driven Scientific Discovery (2024) • No Venue
Chen et al.
Advancing LLM Reasoning Generalists With Preference Trees (2024) • No Venue
Yuan et al.
Opencodeinterpreter: Integrating Code Generation With Execution And Refinement (2024) • No Venue
Zheng et al.
Desirable Characteristics For AI Teaching Assistants In Programming Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 49 citations
Denny et al.
Planetarium: A Rigorous Benchmark For Translating Text To Structured Planning Languages (2024) • No Venue
Zuo et al.
Judgebench: A Benchmark For Evaluating Llm-based Judges (2024) • No Venue
Tan et al.
Opendevin: An Open Platform For AI Software Developers As Generalist Agents (2024) • No Venue
Wang et al.
Who Validates The Validators? Aligning Llm-assisted Evaluation Of LLM Outputs With Human Preferences (2024) • UIST '24: The 37th Annual ACM Symposium on User Interface Software and Technology • 91 citations
Shankar et al.
Deepseekmath: Pushing The Limits Of Mathematical Reasoning In Open Language Models (2024) • No Venue
Shao et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts At Scale (2024) • No Venue
Zhou et al.
Chartmimic: Evaluating Lmm's Cross-modal Reasoning Capability Via Chart-to-code Generation (2024) • No Venue
Shi et al.
From Code To Correctness: Closing The Last Mile Of Code Generation With Hierarchical Debugging (2024) • No Venue
Shi et al.
Design2code: How Far Are We From Automating Front-end Engineering? (2024) • No Venue
Si et al.
Scaling Granite Code Models To 128K Context (2024) • No Venue
Stallone et al.
Tuning Language Models By Proxy (2024) • No Venue
Liu et al.
Agent-as-a-judge: Evaluate Agents With Agents (2024) • No Venue
Zhuge et al.
O1-coder: An O1 Replication For Coding (2024) • No Venue
Zhang et al.
Starcoder 2 And The Stack V2: The Next Generation (2024) • No Venue
Lozhkov et al.
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery (2024) • No Venue
Lu et al.
Yuan 2.0-M32: Mixture Of Experts With Attention Router (2024) • No Venue
Wu et al.
Evaluating The Effectiveness Of Llms In Introductory Computer Science Education: A Semester-long Field Study (2024) • L@S '24: Eleventh ACM Conference on Learning @ Scale • 46 citations
Lyu et al.
Plot2code: A Comprehensive Benchmark For Evaluating Multi-modal Large Language Models In Code Generation From Scientific Plots (2024) • No Venue
Wu et al.
Llmparser: An Exploratory Study On Using Large Language Models For Log Parsing (2024) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 49 citations
Ma et al.
Llama Pro: Progressive Llama With Block Expansion (2024) • No Venue
Wu et al.
Dynasaur: Large Language Agents Beyond Predefined Actions (2024) • No Venue
Nguyen et al.
Training Software Engineering Agents And Verifiers With Swe-gym (2024) • No Venue
Pan et al.
Nemotron-4 15B Technical Report (2024) • No Venue
Parmar et al.
Fine-tuning And Prompt Engineering For Large Language Models-based Code Review Automation (2024) • Information and Software Technology • 43 citations
Chanathip Pornprasit, Chakkrit Tantithamthavorn
The Widening Gap: The Benefits And Harms Of Generative AI For Novice Programmers (2024) • Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1 • 87 citations
Prather et al.
Large Language Model For Vulnerability Detection: Emerging Results And Future Directions (2024) • ICSE-NIER'24: 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results • 74 citations
Xin Zhou, Ting Zhang, David Lo
Selfcodealign: Self-alignment For Code Generation (2024) • No Venue
Wei et al.
Stepcoder: Improve Code Generation With Reinforcement Learning From Compiler Feedback (2024) • No Venue
Dou et al.
Hyperclova X Technical Report (2024) • No Venue
Yoo et al.
The Llama 3 Herd Of Models (2024) • No Venue
Dubey et al.
Fuzzcoder: Byte-level Fuzzing Test Via Large Language Model (2024) • No Venue
Yang et al.
Better & Faster Large Language Models Via Multi-token Prediction (2024) • No Venue
Gloeckle et al.
Deepseek-coder: When The Large Language Model Meets Programming -- The Rise Of Code Intelligence (2024) • No Venue
Guo et al.
Data Mixture Inference: What Do BPE Tokenizers Reveal About Their Training Data? (2024) • No Venue
Hayase et al.
Exploring The Privacy Protection Capabilities Of Chinese Large Language Models (2024) • Proceedings of the ACM on Software Engineering • 40 citations
Yuqi Yang, Xiaowen Huang, Jitao Sang
Evaluating And Aligning Codellms On Human Preference (2024) • No Venue
Yang et al.
Opencoder: The Open Cookbook For Top-tier Code Large Language Models (2024) • No Venue
Huang et al.
Autocoderover: Autonomous Program Improvement (2024) • Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 54 citations
Zhang et al.
Qwen2.5-coder Technical Report (2024) • No Venue
Hui et al.
Gitchameleon: Unmasking The Version-switching Capabilities Of Code Generation Models (2024) • No Venue
Islah et al.
Mixtral Of Experts (2024) • No Venue
Jiang et al.
Codeaid: Evaluating A Classroom Deployment Of An Llm-based Programming Assistant That Balances Student And Educator Needs (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 137 citations
Kazemitabaar et al.
Theagentcompany: Benchmarking LLM Agents On Consequential Real World Tasks (2024) • No Venue
Xu et al.
Materials Science In The Era Of Large Language Models: A Perspective (2024) • Digital Discovery • 49 citations
Ge Lei, Ronan Docherty, Samuel J. Cooper
Autocoder: Enhancing Code Large Language Model With Aiev-instruct (2024) • No Venue
Bin Lei, Yuchen Li, Qiuwu Chen
Diversity Empowers Intelligence: Integrating Expertise Of Software Engineering Agents (2024) • No Venue
Zhang et al.
I-SHEEP: Self-alignment Of LLM From Scratch Through An Iterative Self-enhancement Paradigm (2024) • No Venue
Liang et al.
Learning To Learn Faster From Human Feedback With Language Model Predictive Control (2024) • No Venue
Liang et al.
Humaneval-v: Benchmarking High-level Visual Reasoning With Complex Diagrams In Coding Tasks (2024) • No Venue
Zhang et al.
Agentless: Demystifying Llm-based Software Engineering Agents (2024) • No Venue
Xia et al.
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series (2024) • No Venue
Zhang et al.
Demystifying GPT Self-repair For Code Generation (2023) • No Venue
Olausson et al.
Chatgpt, Can You Generate Solutions For My Coding Exercises? An Evaluation On Its Effectiveness In An Undergraduate Java Programming Course (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 61 citations
Ouh et al.
An Empirical Study Of The Non-determinism Of Chatgpt In Code Generation (2023) • Arxiv • 49 citations
Ouyang et al.
Lost In Translation: A Study Of Bugs Introduced By Large Language Models While Translating Code (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 64 citations
Pan et al.
A Prompt Pattern Catalog To Enhance Prompt Engineering With Chatgpt (2023) • Arxiv • 746 citations
White et al.
Is Chatgpt The Ultimate Programming Assistant -- How Far Is It? (2023) • Arxiv • 107 citations
Tian et al.
Natural Language Generation And Understanding Of Big Code For Ai-assisted Programming: A Review (2023) • Entropy • 90 citations
Wong et al.
From Word Models To World Models: Translating From Natural Language To The Probabilistic Language Of Thought (2023) • No Venue
Wong et al.
ART: Automatic Multi-step Reasoning And Tool-use For Large Language Models (2023) • Arxiv • 47 citations
Paranjape et al.
Contrabert: Enhancing Code Pre-trained Models Via Contrastive Learning (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 49 citations
Liu et al.
A Comprehensive Evaluation Of Chatgpt's Zero-shot Text-to-sql Capability (2023) • Arxiv • 58 citations
Liu et al.
LLM360: Towards Fully Transparent Open-source Llms (2023) • No Venue
Liu et al.
"what It Wants Me To Say": Bridging The Abstraction Gap Between End-user Programmers And Code-generating Large Language Models (2023) • CHI '23: CHI Conference on Human Factors in Computing Systems • 82 citations
Liu et al.
Keep The Conversation Going: Fixing 162 Out Of 337 Bugs For $0.42 Each Using Chatgpt (2023) • ISSTA 2024 Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis • 40 citations
Chunqiu Steven Xia, Lingming Zhang
Supporting Qualitative Analysis With Large Language Models: Combining Codebook With GPT-3 For Deductive Coding (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 126 citations
Xiao et al.
Fuzz4all: Universal Fuzzing With Large Language Models (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 108 citations
Xia et al.
Sql-palm: Improved Large Language Model Adaptation For Text-to-sql (2023) • No Venue
Sun et al.
Llama-reviewer: Advancing Code Review Automation With Large Language Models Through Parameter-efficient Fine-tuning (2023) • 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE) • 71 citations
Lu et al.
RTLLM: An Open-source Benchmark For Design RTL Generation With Large Language Model (2023) • 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) • 116 citations
Lu et al.
Accelerating LLM Inference With Staged Speculative Decoding (2023) • No Venue
Benjamin Spector, Chris Re
Wizardcoder: Empowering Code Large Language Models With Evol-instruct (2023) • No Venue
Luo et al.
Codefusion: A Pre-trained Diffusion Model For Code Generation (2023) • No Venue
Singh et al.
Eureka: Human-level Reward Design Via Coding Large Language Models (2023) • No Venue
Ma et al.
Mechagents: Large Language Model Multi-agent Collaborations Can Solve Mechanics Problems, Generate New Data, And Integrate Knowledge (2023) • Extreme Mechanics Letters • 47 citations
Bo Ni, Markus J. Buehler
Using An LLM To Help With Code Understanding (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 186 citations
Nam et al.
Vipergpt: Visual Inference Via Python Execution For Reasoning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Dídac Surís, Sachit Menon, Carl Vondrick
How Effective Are Neural Networks For Fixing Security Vulnerabilities (2023) • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis • 61 citations
Wu et al.
On The Robustness Of Code Generation Techniques: An Empirical Study On Github Copilot (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 95 citations
Mastropaolo et al.
On The Design Of Ai-powered Code Assistants For Notebooks (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 68 citations
McNutt et al.
How Generative AI Models Such As Chatgpt Can Be (mis)used In SPC Practice, Education, And Research? An Exploratory Study (2023) • Quality Engineering • 116 citations
Megahed et al.
Using Large Language Models To Generate Junit Tests: An Empirical Study (2023) • EASE 2024: 28th International Conference on Evaluation and Assessment in Software Engineering • 50 citations
Siddiq et al.
Reflexion: Language Agents With Verbal Reinforcement Learning (2023) • Arxiv • 247 citations
Shinn et al.
Octopack: Instruction Tuning Code Large Language Models (2023) • No Venue
Muennighoff et al.
Chat2vis: Generating Data Visualisations Via Natural Language Using Chatgpt, Codex And GPT-3 Large Language Models (2023) • IEEE Access • 165 citations
Paula Maddigan, Teo Susnjak
Mathcoder: Seamless Code Integration In Llms For Enhanced Mathematical Reasoning (2023) • No Venue
Wang et al.
No More Manual Tests? Evaluating And Improving Chatgpt For Unit Test Generation (2023) • Arxiv • 50 citations
Yuan et al.
Fixing Hardware Security Bugs With Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 47 citations
Ahmad et al.
Automatic Semantic Augmentation Of Language Model Prompts (for Code Summarization) (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 48 citations
Ahmed et al.
Santacoder: Don't Reach For The Stars! (2023) • Arxiv • 50 citations
Allal et al.
Gpt4all: An Ecosystem Of Open Source Compressed Language Models (2023) • No Venue
Anand et al.
Spellburst: A Node-based Interface For Exploratory Creative Coding With Natural Language Prompts (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 45 citations
Angert et al.
Codet5+: Open Code Large Language Models For Code Understanding And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 215 citations
Wang et al.
Language Models Enable Simple Systems For Generating Structured Views Of Heterogeneous Data Lakes (2023) • Proceedings of the VLDB Endowment • 41 citations
Arora et al.
Advancing Requirements Engineering Through Generative AI: Assessing The Role Of Llms (2023) • Generative AI for Effective Software Development • 95 citations
Chetan Arora, John Grundy, Mohamed Abdelrazek
Llemma: An Open Language Model For Mathematics (2023) • No Venue
Azerbayev et al.
Qwen Technical Report (2023) • No Venue
Bai et al.
Codeplan: Repository-level Coding Using Llms And Planning (2023) • No Venue
Bairi et al.
Purple Llama Cyberseceval: A Secure Coding Benchmark For Language Models (2023) • No Venue
Bhatt et al.
Chip-chat: Challenges And Opportunities In Conversational Hardware Design (2023) • 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD) • 147 citations
Blocklove et al.
Codekgc: Code Language Model For Generative Knowledge Graph Construction (2023) • ACM Transactions on Asian and Low-Resource Language Information Processing • 40 citations
Bi et al.
Sparks Of Artificial General Intelligence: Early Experiments With GPT-4 (2023) • Arxiv • 1480 citations
Bubeck et al.
Generative AI Assistants In Software Development Education: A Vision For Integrating Generative AI Into Educational Practice, Not Instinctively Defending Against It (2023) • IEEE Software • 60 citations
Christopher Bull, Ahmed Kharrufa
Chatunitest: A Framework For Llm-based Test Generation (2023) • FSE '24: 32nd ACM International Conference on the Foundations of Software Engineering • 56 citations
Chen et al.
How Is Chatgpt's Behavior Changing Over Time? (2023) • No Venue
Lingjiao Chen, Matei Zaharia, James Zou
Gptutor: A Chatgpt-powered Programming Tool For Code Explanation (2023) • Communications in Computer and Information Science • 68 citations
Chen et al.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-verification (2023) • No Venue
Zhou et al.
Generative AI In Computing Education: Perspectives Of Students And Instructors (2023) • 2023 IEEE Frontiers in Education Conference (FIE) • 92 citations
Zastudil et al.
Codegeex: A Pre-trained Model For Code Generation With Multilingual Benchmarking On Humaneval-x (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 158 citations
Zheng et al.
Software Testing With Large Language Models: Survey, Landscape, And Vision (2023) • IEEE Transactions on Software Engineering • 235 citations
Wang et al.
Wavecoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation (2023) • No Venue
Yu et al.
Codereval: A Benchmark Of Pragmatic Code Generation With Generative Pre-trained Models (2023) • Proceedings of the IEEE/ACM 46th International Conference on Software Engineering • 77 citations
Yu et al.
Large Language Models For Compiler Optimization (2023) • No Venue
Cummins et al.
Llms Cannot Reliably Identify And Reason About Security Vulnerabilities (yet?): A Comprehensive Evaluation, Framework, And Benchmarks (2023) • 2024 IEEE Symposium on Security and Privacy (SP) • 50 citations
Ullah et al.
Rap-gen: Retrieval-augmented Patch Generation With Codet5 For Automatic Program Repair (2023) • Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 57 citations
Wang et al.
LIDA: A Tool For Automatic Generation Of Grammar-agnostic Visualizations And Infographics Using Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) • 69 citations
Victor Dibia
A Comparative Study Of Ai-generated (GPT-4) And Human-crafted Mcqs In Programming Education (2023) • Proceedings of the 26th Australasian Computing Education Conference • 68 citations
Doughty et al.
Large Language Models For Software Engineering: Survey And Open Problems (2023) • 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) • 244 citations
Fan et al.
Baldur: Whole-proof Generation And Repair With Large Language Models (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 59 citations
First et al.
Jais And Jais-chat: Arabic-centric Foundation And Instruction-tuned Open Generative Large Language Models (2023) • No Venue
Sengupta et al.
Gpt4aigchip: Towards Next-generation AI Accelerator Design Automation Via Large Language Models (2023) • 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) • 79 citations
Fu et al.
Chatgpt For Vulnerability Detection, Classification, And Repair: How Far Are We? (2023) • 2023 30th Asia-Pacific Software Engineering Conference (APSEC) • 56 citations
Fu et al.
Codebertscore: Evaluating Code Generation With Pretrained Models Of Code (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 63 citations
Zhou et al.
What Makes Good In-context Demonstrations For Code Intelligence Tasks With Llms? (2023) • 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 65 citations
Gao et al.
Large Language Models Are Few-shot Summarizers: Multi-intent Comment Generation Via In-context Learning (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 89 citations
Geng et al.
An Empirical Evaluation Of Using Large Language Models For Automated Unit Test Generation (2023) • IEEE Transactions on Software Engineering • 176 citations
Schäfer et al.
Textbooks Are All You Need (2023) • No Venue
Gunasekar et al.
Verigen: A Large Language Model For Verilog Code Generation (2023) • ACM Transactions on Design Automation of Electronic Systems • 129 citations
Thakur et al.
Thrilled By Your Progress! Large Language Models (GPT-4) No Longer Struggle To Pass Assessments In Higher Education Programming Courses (2023) • ICER 2023: ACM Conference on International Computing Education Research • 97 citations
Savelka et al.
Can Generative Pre-trained Transformers (GPT) Pass Assessments In Higher Education Programming Courses? (2023) • ITiCSE 2023: Innovation and Technology in Computer Science Education • 83 citations
Savelka et al.
A Real-world Webagent With Planning, Long Context Understanding, And Program Synthesis (2023) • No Venue
Gur et al.
Breaking The Silence: The Threats Of Using Llms In Software Engineering (2023) • ICSE-NIER'24: 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results • 52 citations
June Sallou, Thomas Durieux, Annibale Panichella
Magicoder: Source Code Is All You Need (2023) • No Venue
Wei et al.
Large Language Models For Code: Security Hardening And Adversarial Testing (2023) • CCS '23: ACM SIGSAC Conference on Computer and Communications Security • 66 citations
Jingxuan He, Martin Vechev
Stay On Topic With Classifier-free Guidance (2023) • No Venue
Sanchez et al.
Copiloting The Copilots: Fusing Large Language Models With Completion Engines For Automated Program Repair (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 75 citations
Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
Metagpt: Meta Programming For A Multi-agent Collaborative Framework (2023) • Arxiv • 124 citations
Hong et al.
Exploring The Responses Of Large Language Models To Beginner Programmers' Help Requests (2023) • ICER 2023: ACM Conference on International Computing Education Research • 95 citations
Hellas et al.
REPLUG: Retrieval-augmented Black-box Language Models (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 76 citations
Shi et al.
Mistral 7B (2023) • Arxiv • 219 citations
Jiang et al.
The Programmer's Assistant: Conversational Interaction With A Large Language Model For Software Development (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 177 citations
Ross et al.
Starvector: Generating Scalable Vector Graphics Code From Images (2023) • No Venue
Rodriguez et al.
Chatgpt And Software Testing Education: Promises & Perils (2023) • 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) • 198 citations
Jalil et al.
Impact Of Code Language Models On Automated Program Repair (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 119 citations
Jiang et al.
Self-planning Code Generation With Large Language Models (2023) • ACM Transactions on Software Engineering and Methodology • 60 citations
Jiang et al.
Code Llama: Open Foundation Models For Code (2023) • Arxiv • 377 citations
Rozière et al.
Inferfix: End-to-end Program Repair With Llms (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 97 citations
Jin et al.
Is Stack Overflow Obsolete? An Empirical Study Of The Characteristics Of Chatgpt Answers To Stack Overflow Questions (2023) • Arxiv • 49 citations
Kabir et al.
Chain Of Code: Reasoning With A Language Model-augmented Code Emulator (2023) • No Venue
Li et al.
Large Language Models For Education: Grading Open-ended Questions Using Chatgpt (2023) • SBES 2023: XXXVII Brazilian Symposium on Software Engineering • 46 citations
Pinto et al.
Taskmatrix.ai: Completing Tasks By Connecting Foundation Models With Millions Of Apis (2023) • Intelligent Computing • 76 citations
Liang et al.
Generative AI For Programming Education: Benchmarking Chatgpt, GPT-4, And Human Tutors (2023) • No Venue
Phung et al.
The Robots Are Here: Navigating The Generative AI Revolution In Computing Education (2023) • Proceedings of the 2023 Working Group Reports on Innovation and Technology in Computer Science Education • 235 citations
Prather et al.
CCT5: A Code-change-oriented Pre-trained Model (2023) • Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 42 citations
Lin et al.
Autonomous GIS: The Next-generation Ai-powered GIS (2023) • International Journal of Digital Earth • 99 citations
Zhenlong Li, Huan Ning
Nuances Are The Key: Unlocking Chatgpt To Find Failure-inducing Tests With Differential Prompting (2023) • 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 40 citations
Li et al.
Starcoder: May The Source Be With You! (2023) • No Venue
Li et al.
Skcoder: A Sketch-based Approach For Automatic Code Generation (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 42 citations
Li et al.
Textbooks Are All You Need II: Phi-1.5 Technical Report (2023) • No Venue
Li et al.
Large Language Models Are Strong Zero-shot Retriever (2023) • IEEE Communications Magazine • 80 citations
Shen et al.
Chatdev: Communicative Agents For Software Development (2023) • Arxiv • 65 citations
Qian et al.
Lemur: Harmonizing Natural Language And Code For Language Agents (2023) • No Venue
Xu et al.
(Security) Assertions By Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 44 citations
Kande et al.
Chatgpt For Programming Numerical Methods (2023) • Journal of Machine Learning for Modeling and Computing • 81 citations
Ali Kashefi, Tapan Mukerji
How Novices Use Llm-based Code Generators To Solve CS1 Coding Tasks In A Self-paced Learning Environment (2023) • Koli Calling '23: 23rd Koli Calling International Conference on Computing Education Research • 83 citations
Kazemitabaar et al.
Studying The Effect Of AI Code Generators On Supporting Novice Learners In Introductory Programming (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 226 citations
Kazemitabaar et al.
Repocoder: Repository-level Code Completion Through Iterative Retrieval And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 66 citations
Zhang et al.
How Secure Is Code Generated By Chatgpt? (2023) • 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) • 88 citations
Khoury et al.
Exploring The Potential Of Large Language Models To Generate Formative Programming Feedback (2023) • 2023 IEEE Frontiers in Education Conference (FIE) • 48 citations
Natalie Kiesler, Dominic Lohr, Hieke Keuning
Unifying The Perspectives Of NLP And Software Engineering: A Survey On Language Models For Code (2023) • No Venue
Zhang et al.
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis (2023) • No Venue
Qin et al.
Comparing Code Explanations Created By Students And Large Language Models (2023) • Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 • 155 citations
Leinonen et al.
Pangu-coder2: Boosting Large Language Models For Code With Ranking Feedback (2023) • No Venue
Shen et al.
Automating Code Review Activities By Large-scale Pre-training (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 129 citations
Li et al.
Natural Attack For Pre-trained Models Of Code (2022) • Proceedings of the 44th International Conference on Software Engineering • 130 citations
Yang et al.
Codefill: Multi-token Code Completion By Jointly Learning From Structure And Naming Sequences (2022) • ICSE '22: 44th International Conference on Software Engineering • 67 citations
Maliheh Izadi, Roberta Gismondi, Georgios Gousios
On The Importance Of Building High-quality Training Datasets For Neural Code Search (2022) • Proceedings of the 44th International Conference on Software Engineering • 61 citations
Sun et al.
Codeattack: Code-based Adversarial Attacks For Pre-trained Programming Language Models (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Akshita Jha, Chandan K. Reddy
Repair Is Nearly Generation: Multilingual Program Repair With Llms (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 68 citations
Joshi et al.
Large Language Models Are Few-shot Testers: Exploring Llm-based General Bug Reproduction (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 123 citations
Sungmin Kang, Juyeon Yoon, Shin Yoo
Automatic Detection And Analysis Of Technical Debts In Peer-review Documentation Of R Packages (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 81 citations
Junaed Younus Khan, Gias Uddin
Code As Policies: Language Model Programs For Embodied Control (2022) • 2023 IEEE International Conference on Robotics and Automation (ICRA) • 390 citations
Liang et al.
A Systematic Evaluation Of Large Language Models Of Code (2022) • MAPS '22: 6th ACM SIGPLAN International Symposium on Machine Programming • 362 citations
Xu et al.
Better Together? An Evaluation Of Ai-supported Code Translation (2022) • 27th International Conference on Intelligent User Interfaces • 57 citations
Weisz et al.
Coderl: Mastering Code Generation Through Pretrained Models And Deep Reinforcement Learning (2022) • Arxiv • 87 citations
Le et al.
Using Large Language Models To Enhance Programming Error Messages (2022) • SIGCSE 2023: The 54th ACM Technical Symposium on Computer Science Education • 170 citations
Leinonen et al.
Benchmarking Large Language Models For Automated Verilog RTL Code Generation (2022) • 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 112 citations
Thakur et al.
Few-shot Training Llms For Project-specific Code-summarization (2022) • ASE '22: 37th IEEE/ACM International Conference on Automated Software Engineering • 168 citations
Toufique Ahmed, Premkumar Devanbu
Text And Code Embeddings By Contrastive Pre-training (2022) • Arxiv • 146 citations
Neelakantan et al.
No More Fine-tuning? An Experimental Evaluation Of Prompt Tuning In Code Intelligence (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 124 citations
Wang et al.
Prompting Is Programming: A Query Language For Large Language Models (2022) • Proceedings of the ACM on Programming Languages • 64 citations
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
Cross-domain Deep Code Search With Meta Learning (2022) • Proceedings of the 44th International Conference on Software Engineering • 40 citations
Chai et al.
Natgen: Generative Pre-training By "naturalizing" Source Code (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 94 citations
Chakraborty et al.
Codet: Code Generation With Generated Tests (2022) • Arxiv • 63 citations
Chen et al.
Using Transfer Learning For Code-related Tasks (2022) • IEEE Transactions on Software Engineering • 54 citations
Mastropaolo et al.
Large Language Models Meet Nl2code: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Zan et al.
CERT: Continual Pre-training On Sketches For Library-oriented Code Generation (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 46 citations
Zan et al.
CIRCLE: Continual Repair Across Programming Languages (2022) • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis • 48 citations
Yuan et al.
Progprompt: Generating Situated Robot Task Plans Using Large Language Models (2022) • Arxiv • 42 citations
Singh et al.
Language Models Of Code Are Few-shot Commonsense Learners (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 62 citations
Madaan et al.
How Readable Is Model-generated Code? Examining Readability And Visual Inspection Of Github Copilot (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 40 citations
Naser Al Madi
Using Pre-trained Models To Boost Code Review Automation (2022) • Proceedings of the 44th International Conference on Software Engineering • 131 citations
Tufano et al.
Large Language Models Are Zero-shot Fuzzers: Fuzzing Deep-learning Libraries Via Large Language Models (2022) • ISSTA '23: 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis • 184 citations
Deng et al.
Investigating Explainability Of Generative AI For Code Through Scenario-based Design (2022) • 27th International Conference on Intelligent User Interfaces • 176 citations
Sun et al.
Ai-driven Development Is Here: Should You Worry? (2022) • IEEE Software • 55 citations
Neil Ernst, Gabriele Bavota
Automated Repair Of Programs From Large Language Models (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 157 citations
Fan et al.
Practical Program Repair In The Era Of Large Pre-trained Language Models (2022) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 236 citations
Chunqiu Steven Xia, Yuxiang Wei, Lingming Zhang
Incoder: A Generative Model For Code Infilling And Synthesis (2022) • Arxiv • 139 citations
Fried et al.
Transformer-based Language Models For Software Vulnerability Detection (2022) • ACSAC: Annual Computer Security Applications Conference • 87 citations
Thapa et al.
Less Training, More Repairing Please: Revisiting Automated Program Repair Via Zero-shot Learning (2022) • ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 152 citations
Chunqiu Steven Xia, Lingming Zhang
Unixcoder: Unified Cross-modal Pre-training For Code Representation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 448 citations
Guo et al.
Coditt5: Pretraining For Source Code And Natural Language Editing (2022) • Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering • 68 citations
Zhang et al.
What Is It Like To Program With Artificial Intelligence? (2022) • Arxiv • 44 citations
Sarkar et al.
Automatic Generation Of Programming Exercises And Code Explanations Using Large Language Models (2022) • ICER 2022: ACM Conference on International Computing Education Research • 329 citations
Sarsa et al.
On The Evaluation Of Neural Code Summarization (2021) • Proceedings of the 44th International Conference on Software Engineering • 64 citations
Shi et al.
Improving Code Summarization With Block-wise Abstract Syntax Tree Splitting (2021) • 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC) • 61 citations
Lin et al.
Examining Zero-shot Vulnerability Repair With Large Language Models (2021) • 2023 IEEE Symposium on Security and Privacy (SP) • 58 citations
Pearce et al.
TAPEX: Table Pre-training Via Learning A Neural SQL Executor (2021) • Arxiv • 90 citations
Liu et al.
CURE: Code-aware Neural Machine Translation For Automatic Program Repair (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 239 citations
Nan Jiang, Thibaud Lutellier, Lin Tan
Unified Pre-training For Program Understanding And Generation (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 520 citations
Ahmad et al.
Syncobert: Syntax-guided Multi-modal Contrastive Pre-training For Code Representation (2021) • Arxiv • 70 citations
Wang et al.
Codet5: Identifier-aware Unified Pre-trained Encoder-decoder Models For Code Understanding And Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 1009 citations
Wang et al.
Studying The Usage Of Text-to-text Transfer Transformer To Support Code-related Tasks (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 190 citations
Mastropaolo et al.
On The Validity Of Pre-trained Transformers For Natural Language Processing In The Software Engineering Domain (2021) • IEEE Transactions on Software Engineering • 63 citations
Julian von der Mosel, Alexander Trautsch, Steffen Herbold
An Empirical Study On The Usage Of BERT Models For Code Completion (2021) • IEEE Transactions on Software Engineering • 64 citations
Ciniselli et al.
A Neural Network Solves, Explains, And Generates University Math Problems By Program Synthesis And Few-shot Learning At Human Level (2021) • Proceedings of the National Academy of Sciences • 70 citations
Drori et al.
A Syntax-guided Edit Decoder For Neural Program Repair (2021) • Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 178 citations
Zhu et al.
Cotext: Multi-task Learning With Code-text Transformer (2021) • Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021) • 67 citations
Phan et al.
Code Structure Guided Transformer For Source Code Summarization (2021) • ACM Transactions on Software Engineering and Methodology • 66 citations
Gao et al.
PICARD: Parsing Incrementally For Constrained Auto-regressive Decoding From Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 140 citations
Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
Automated Conformance Testing For Javascript Engines Via Deep Compiler Fuzzing (2021) • Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation • 66 citations
Ye et al.
Perfection Not Required? Human-ai Partnerships In Code Translation (2021) • 26th International Conference on Intelligent User Interfaces • 82 citations
Weisz et al.
Hybrid Ranking Network For Text-to-sql (2020) • Arxiv • 50 citations
Lyu et al.
Graphcodebert: Pre-training Code Representations With Data Flow (2020) • Arxiv • 151 citations
Guo et al.
Improved Automatic Summarization Of Subroutines Via Attention To File Context (2020) • Proceedings of the 17th International Conference on Mining Software Repositories • 94 citations
Haque et al.
Leveraging Code Generation To Improve Code Retrieval And Summarization Via Dual Learning (2020) • Proceedings of The Web Conference 2020 • 49 citations
Ye et al.
Generating Question Titles For Stack Overflow From Mined Code Snippets (2020) • ACM Transactions on Software Engineering and Methodology • 55 citations
Gao et al.
Code Summarization With Structure-induced Transformer (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
Hongqiu Wu, Hai Zhao, Min Zhang
Code Prediction By Feeding Trees To Transformers (2020) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 140 citations
Kim et al.
A Transformer-based Approach For Source Code Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 256 citations
Ahmad et al.
Photon: A Robust Cross-domain Text-to-sql System (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 42 citations
Zeng et al.
Self-supervised Contrastive Learning For Code Retrieval And Summarization Via Semantic-preserving Transformations (2020) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 89 citations
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
Multi-task Learning Based Pre-trained Language Model For Code Completion (2020) • ASE '20: 35th IEEE/ACM International Conference on Automated Software Engineering • 123 citations
Liu et al.
Unsupervised Translation Of Programming Languages (2020) • Arxiv • 62 citations
Lachaux et al.
Retrieval-augmented Generation For Code Summarization Via Hybrid GNN (2020) • Arxiv • 66 citations
Liu et al.
Intellicode Compose: Code Generation Using Transformer (2020) • ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 311 citations
Svyatkovskiy et al.
Pymt5: Multi-mode Translation Of Natural Language And Python Code With Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 64 citations
Clement et al.
Codebleu: A Method For Automatic Evaluation Of Code Synthesis (2020) • Arxiv • 184 citations
Ren et al.
Generating Accurate Assert Statements For Unit Test Cases Using Pretrained Transformers (2020) • AST '22: IEEE/ACM 3rd International Conference on Automation of Software Test • 66 citations
Tufano et al.
On Learning Meaningful Code Changes Via Neural Machine Translation (2019) • 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) • 202 citations
Tufano et al.
Coupling Retrieval And Meta-learning For Context-dependent Semantic Parsing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Guo et al.
Learning And Evaluating Contextual Embedding Of Source Code (2019) • Arxiv • 156 citations
Kanade et al.
X-SQL: Reinforce Schema Representation With Context (2019) • Arxiv • 53 citations
He et al.
Pythia: Ai-assisted Code Completion System (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 106 citations
Svyatkovskiy et al.
Juice: A Large Scale Distantly Supervised Dataset For Open Domain Context-based Code Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer
A Comprehensive Exploration On Wikisql With Table-aware Word Contextualization (2019) • Arxiv • 122 citations
Hwang et al.
Towards Complex Text-to-sql In Cross-domain Database With Intermediate Representation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 348 citations
Guo et al.
Incsql: Training Incremental Text-to-sql Parsers With Non-deterministic Oracles (2018) • Arxiv • 59 citations
Shi et al.
Tree-to-tree Neural Networks For Program Translation (2018) • Arxiv • 83 citations
Xinyun Chen, Chang Liu, Dawn Song
CODIT: Code Editing With Tree-based Neural Models (2018) • IEEE Transactions on Software Engineering • 105 citations
Chakraborty et al.
Improving Automatic Source Code Summarization Via Deep Reinforcement Learning (2018) • Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering • 390 citations
Wan et al.
A Retrieve-and-edit Framework For Predicting Structured Outputs (2018) • Arxiv • 102 citations
Hashimoto et al.
Code2seq: Generating Sequences From Structured Representations Of Code (2018) • Arxiv • 62 citations
Alon et al.
Learning To Mine Aligned Code And Natural Language Pairs From Stack Overflow (2018) • Proceedings of the 15th International Conference on Mining Software Repositories • 183 citations
Yin et al.
A Parallel Corpus Of Python Functions And Documentation Strings For Automated Code Documentation And Code Generation (2017) • Arxiv • 62 citations
Antonio Valerio Miceli Barone, Rico Sennrich
Code Completion With Neural Attention And Pointer Networks (2017) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 103 citations
Li et al.
Learning Python Code Suggestion With A Sparse Pointer Network (2016) • Arxiv • 67 citations
Bhoopchand et al.
Automated Correction For Syntax Errors In Programming Assignments Using Recurrent Neural Networks (2016) • Arxiv • 71 citations
Sahil Bhatia, Rishabh Singh
Neuro-symbolic Program Synthesis (2016) • Arxiv • 105 citations
Parisotto et al.

— M —

Memory & Long Context 237 papers

Interactiveomni: A Unified Omni-modal Model For Audio-visual Multi-turn Dialogue (2025) • No Venue
Tong et al.
Comorag: A Cognitive-inspired Memory-organized RAG For Stateful Long Narrative Reasoning (2025) • No Venue
Wang et al.
MIRIX: Multi-agent Memory System For Llm-based Agents (2025) • No Venue
Yu Wang, Xi Chen
Memmamba: Rethinking Memory Patterns In State Space Model (2025) • No Venue
Wang et al.
Agentfly: Fine-tuning LLM Agents Without Fine-tuning Llms (2025) • No Venue
Zhou et al.
Complexfuncbench: Exploring Multi-step And Constrained Function Calling Under Long-context Scenario (2025) • No Venue
Zhong et al.
PRELUDE: A Benchmark Designed To Require Global Comprehension And Reasoning Over Long Contexts (2025) • No Venue
Yu et al.
Native Sparse Attention: Hardware-aligned And Natively Trainable Sparse Attention (2025) • No Venue
Yuan et al.
Browseragent: Building Web Agents With Human-inspired Web Browsing Actions (2025) • No Venue
Zhang et al.
Loongrl: Reinforcement Learning For Advanced Reasoning Over Long Contexts (2025) • No Venue
Wang et al.
Mmlongbench: Benchmarking Long-context Vision-language Models Effectively And Thoroughly (2025) • No Venue
Wang et al.
O-mem: Omni Memory System For Personalized, Long Horizon, Self-evolving Agents (2025) • No Venue
Wang et al.
Sparser Block-sparse Attention Via Token Permutation (2025) • No Venue
Wang et al.
Deepseek-ocr: Contexts Optical Compression (2025) • No Venue
Haoran Wei, Yaofeng Sun, Yukun Li
Videorope: What Makes For Good Video Rotary Position Embedding? (2025) • No Venue
Wei et al.
Streamvln: Streaming Vision-and-language Navigation Via Slowfast Context Modeling (2025) • No Venue
Wei et al.
Efficient Pretraining Length Scaling (2025) • No Venue
Wu et al.
From Hours To Minutes: Lossless Acceleration Of Ultra Long Sequence Generation Up To 100K Tokens (2025) • No Venue
Wu et al.
Shifting Long-context Llms Research From Input To Output (2025) • No Venue
Wu et al.
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization (2025) • No Venue
Wu et al.
General Agentic Memory Via Deep Research (2025) • No Venue
Yan et al.
Learning On The Job: An Experience-driven Self-evolving Agent For Long-horizon Tasks (2025) • No Venue
Yang et al.
A Controllable Examination For Long-context Language Models (2025) • No Venue
Yang et al.
Qwen2.5-1m Technical Report (2025) • No Venue
Yang et al.
Agentfold: Long-horizon Web Agents With Proactive Context Management (2025) • No Venue
Ye et al.
Gated Associative Memory: A Parallel O(N) Architecture For Efficient Sequence Modeling (2025) • No Venue
Rishiraj Acharya
The Markovian Thinker (2025) • No Venue
Aghajohari et al.
Flashadventure: A Benchmark For GUI Agents Solving Full Story Arcs In Diverse Adventure Games (2025) • No Venue
Ahn et al.
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds (2025) • No Venue
Anokhin et al.
ATLAS: Learning To Optimally Memorize The Context At Test Time (2025) • No Venue
Behrouz et al.
Iterresearch: Rethinking Long-horizon Agents Via Markovian State Reconstruction (2025) • No Venue
Chen et al.
Halumem: Evaluating Hallucinations In Memory Systems Of Agents (2025) • No Venue
Chen et al.
Longpo: Long Context Self-evolution Of Large Language Models Through Short-to-long Preference Optimization (2025) • No Venue
Chen et al.
L^2M: Mutual Information Scaling Law For Long-context Language Modeling (2025) • No Venue
Chen et al.
Retroinfer: A Vector-storage Approach For Scalable Long-context LLM Inference (2025) • No Venue
Chen et al.
Glyph: Scaling Context Windows Via Visual-text Compression (2025) • No Venue
Cheng et al.
Mem0: Building Production-ready AI Agents With Scalable Long-term Memory (2025) • No Venue
Chhikara et al.
Gemini 2.5: Pushing The Frontier With Advanced Reasoning, Multimodality, Long Context, And Next Generation Agentic Capabilities (2025) • No Venue
Comanici et al.
Beyond RAG: Task-aware KV Cache Compression For Comprehensive Knowledge Reasoning (2025) • No Venue
Corallo et al.
One-minute Video Generation With Test-time Training (2025) • No Venue
Dalal et al.
Alayadb: The Data Foundation For Efficient And Effective Long-context LLM Inference (2025) • No Venue
Deng et al.
Mom: Linear Sequence Modeling With Mixture-of-memories (2025) • No Venue
Du et al.
Artificial Hippocampus Networks For Efficient Long-context Modeling (2025) • No Venue
Fang et al.
Memp: Exploring Agent Procedural Memory (2025) • No Venue
Fang et al.
Lightmem: Lightweight And Efficient Memory-augmented Generation (2025) • No Venue
Fang et al.
Seerattention-r: Sparse Attention Adaptation For Long Reasoning (2025) • No Venue
Gao et al.
You Do Not Fully Utilize Transformer's Representation Capacity (2025) • No Venue
Gerasimov et al.
Multi-token Attention (2025) • No Venue
Golovneva et al.
Long-context Autoregressive Video Modeling With Next-frame Prediction (2025) • No Venue
Yuchao Gu, Weijia Mao, Mike Zheng Shou
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team (2025) • No Venue
Hosain et al.
The Imitation Game: Turing Machine Imitator Is Length Generalizable Reasoner (2025) • No Venue
Hua et al.
When Thoughts Meet Facts: Reusable Reasoning For Long-context Lms (2025) • No Venue
Jeong et al.
ACON: Optimizing Context Compression For Long-horizon LLM Agents (2025) • No Venue
Kang et al.
LM2: Large Memory Models (2025) • No Venue
Kang et al.
Embodied Agents Meet Personalization: Exploring Memory Utilization For Personalized Assistance (2025) • No Venue
Kwon et al.
Experience Is The Best Teacher: Grounding Vlms For Robotics Through Self-generated Memory (2025) • No Venue
Lan et al.
Infinitehip: Extending Language Model Context Up To 3 Million Tokens On A Single GPU (2025) • No Venue
Lee et al.
Deepagent: A General Reasoning Agent With Scalable Toolsets (2025) • No Venue
Li et al.
Memos: A Memory OS For AI System (2025) • No Venue
Li et al.
Webweaver: Structuring Web-scale Evidence With Dynamic Outlines For Open-ended Deep Research (2025) • No Venue
Li et al.
Forgetting Transformer: Softmax Attention With A Forget Gate (2025) • No Venue
Lin et al.
Longllada: Unlocking Long Context Capabilities In Diffusion Llms (2025) • No Venue
Liu et al.
Longemotion: Measuring Emotional Intelligence Of Large Language Models In Long-context Interaction (2025) • No Venue
Liu et al.
A Comprehensive Survey On Long Context Language Modeling (2025) • No Venue
Liu et al.
Thus Spake Long-context Large Language Model (2025) • No Venue
Liu et al.
Voxtral (2025) • No Venue
Liu et al.
Seeing, Listening, Remembering, And Reasoning: A Multimodal Agent With Long-term Memory (2025) • No Venue
Long et al.
Mcp-universe: Benchmarking Large Language Models With Real-world Model Context Protocol Servers (2025) • No Venue
Luo et al.
Deepseek-r1 Thoughtology: Let's Think About LLM Reasoning (2025) • No Venue
Marjanović et al.
Exploring The Latent Capacity Of Llms For One-step Text Generation (2025) • No Venue
Gleb Mezentsev, Ivan Oseledets
Scalable-softmax Is Superior For Attention (2025) • No Venue
Ken M. Nakanishi
Llm-microscope: Uncovering The Hidden Role Of Punctuation In Context Memory Of Transformers (2025) • No Venue
Razzhigaev et al.
Beyond Memorization: Extending Reasoning Depth With Recurrence, Memory And Test-time Compute Scaling (2025) • No Venue
Rodkin et al.
SRMT: Shared Memory For Multi-agent Lifelong Pathfinding (2025) • No Venue
Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
Longrope2: Near-lossless LLM Context Window Scaling (2025) • No Venue
Shang et al.
When Tokens Talk Too Much: A Survey Of Multimodal Long-context Token Compression Across Images, Videos, And Audios (2025) • No Venue
Shao et al.
The Illusion Of Diminishing Returns: Measuring Long Horizon Execution In Llms (2025) • No Venue
Sinha et al.
Diagonal Batching Unlocks Parallelism In Recurrent Memory Transformers For Long Contexts (2025) • No Venue
Sivtsov et al.
Xquant: Breaking The Memory Wall For LLM Inference With KV Cache Rematerialization (2025) • No Venue
Tomar et al.
Revisiting Long-context Modeling From Context Denoising Perspective (2025) • No Venue
Tang et al.
Plug-and-play 1.x-bit KV Cache Quantization For Video Large Language Models (2025) • No Venue
Tao et al.
Gemma 3 Technical Report (2025) • No Venue
Team et al.
Agent Workflow Memory (2024) • No Venue
Wang et al.
Pixtral 12B (2024) • No Venue
Agrawal et al.
Burstattention: An Efficient Distributed Attention Framework For Extremely Long Sequences (2024) • No Venue
Ao et al.
Training-free Long-context Scaling Of Large Language Models (2024) • No Venue
An et al.
Make Your LLM Fully Utilize The Context (2024) • No Venue
An et al.
Simple Linear Attention Language Models Balance The Recall-throughput Tradeoff (2024) • No Venue
Arora et al.
Longbench V2: Towards Deeper Understanding And Reasoning On Realistic Long-context Multitasks (2024) • No Venue
Bai et al.
Longalign: A Recipe For Long Context Alignment Of Large Language Models (2024) • No Venue
Bai et al.
Xlstm: Extended Long Short-term Memory (2024) • Arxiv • 81 citations
Beck et al.
Long Code Arena: A Set Of Benchmarks For Long-context Code Models (2024) • No Venue
Bogomolov et al.
Reducing Transformer Key-value Cache Size With Cross-layer Attention (2024) • No Venue
Brandon et al.
Internlm2 Technical Report (2024) • No Venue
Cai et al.
How Do Large Language Models Acquire Factual Knowledge During Pretraining? (2024) • No Venue
Chang et al.
Dolphin: Long Context As A New Modality For Energy-efficient On-device Language Models (2024) • No Venue
Chen et al.
Agentpoison: Red-teaming LLM Agents Via Poisoning Memory Or Knowledge Bases (2024) • No Venue
Chen et al.
Larimar: Large Language Models With Episodic Memory Control (2024) • No Venue
Das et al.
A Silver Bullet Or A Compromise For Full Attention? A Comprehensive Study Of Gist Token-based Context Compression (2024) • No Venue
Deng et al.
A Simple And Effective L_2 Norm-based Strategy For KV Cache Compression (2024) • No Venue
Devoto et al.
Longrope: Extending LLM Context Window Beyond 2 Million Tokens (2024) • No Venue
Ding et al.
Sam2long: Enhancing SAM 2 For Long Video Segmentation With A Training-free Memory Tree (2024) • No Venue
Ding et al.
Llms In The Imaginarium: Tool Learning Through Simulated Trial And Error (2024) • No Venue
Wang et al.
Multimodal Needle In A Haystack: Benchmarking Long-context Capability Of Multimodal Large Language Models (2024) • No Venue
Wang et al.
Learning To (learn At Test Time): Rnns With Expressive Hidden States (2024) • No Venue
Sun et al.
Videollamb: Long-context Video Understanding With Recurrent Memory Bridges (2024) • No Venue
Wang et al.
Needle In A Multimodal Haystack (2024) • No Venue
Wang et al.
Resonance Rope: Improving Context Length Generalization Of Large Language Models (2024) • No Venue
Wang et al.
Knowledge Mechanisms In Large Language Models: A Survey And Perspective (2024) • No Venue
Wang et al.
Lloco: Learning Long Contexts Offline (2024) • No Venue
Tan et al.
Complexity Of Symbolic Representation In Working Memory Of Transformer Correlates With The Complexity Of A Task (2024) • No Venue
Alsu Sagirova, Mikhail Burtsev
Scaling Granite Code Models To 128K Context (2024) • No Venue
Stallone et al.
Deliberation In Latent Space Via Differentiable Cache Augmentation (2024) • No Venue
Liu et al.
Longgenbench: Long-context Generation Benchmark (2024) • No Venue
Liu et al.
Oryx MLLM: On-demand Spatial-temporal Understanding At Arbitrary Resolution (2024) • No Venue
Liu et al.
Retrievalattention: Accelerating Long-context LLM Inference Via Vector Retrieval (2024) • No Venue
Liu et al.
World Model On Million-length Video And Language With Ringattention (2024) • No Venue
Liu et al.
Long Context Transfer From Language To Vision (2024) • No Venue
Zhang et al.
A Controlled Study On Long Context Extension And Generalization In Llms (2024) • No Venue
Lu et al.
Soaring From 4K To 400K: Extending Llm's Context With Activation Beacon (2024) • No Venue
Zhang et al.
SCOPE: Optimizing Key-value Cache Compression In Long-context Generation (2024) • No Venue
Wu et al.
Longvideobench: A Benchmark For Long-context Interleaved Video-language Understanding (2024) • No Venue
Wu et al.
Evaluating Very Long-term Conversational Memory Of LLM Agents (2024) • No Venue
Maharana et al.
Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) • No Venue
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Transformers Are Multi-state Rnns (2024) • No Venue
Oren et al.
Llmtimesmapreduce: Simplified Long-sequence Processing Using Large Language Models (2024) • No Venue
Zhou et al.
Memorag: Moving Towards Next-gen RAG Via Memory-inspired Knowledge Discovery (2024) • No Venue
Qian et al.
Hellobench: Evaluating Long Text Generation Capabilities Of Large Language Models (2024) • No Venue
Que et al.
Samba: Simple Hybrid State Space Models For Efficient Unlimited Context Language Modeling (2024) • No Venue
Ren et al.
Needle Threading: Can Llms Follow Threads Through Near-million-scale Haystacks? (2024) • No Venue
Jonathan Roberts, Kai Han, Samuel Albanie
Associative Recurrent Memory Transformer (2024) • No Venue
Rodkin et al.
Mgte: Generalized Long-context Text Representation And Reranking Models For Multilingual Text Retrieval (2024) • No Venue
Zhang et al.
Mplug-owl3: Towards Long Image-sequence Understanding In Multi-modal Large Language Models (2024) • No Venue
Ye et al.
Human-like Episodic Memory For Infinite Context Llms (2024) • No Venue
Fountas et al.
Data Engineering For Scaling Language Models To 128K Context (2024) • No Venue
Fu et al.
Chunkattention: Efficient Self-attention With Prefix-aware KV Cache And Two-phase Partition (2024) • No Venue
Ye et al.
Seerattention: Learning Intrinsic Sparse Attention In Your Llms (2024) • No Venue
Gao et al.
Longins: A Challenging Long-context Instruction-based Exam For Llms (2024) • No Venue
Gavin et al.
Towards Flexible Perception With Visual Memory (2024) • No Venue
Geirhos et al.
Gemini 1.5: Unlocking Multimodal Understanding Across Millions Of Tokens Of Context (2024) • Arxiv • 253 citations
Team et al.
Goldfinch: High Performance Rwkv/transformer Hybrid With Linear Pre-fill And Extreme Kv-cache Compression (2024) • No Venue
Goldstein et al.
Is It Really Long Context If All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP (2024) • No Venue
Goldman et al.
MA-LMM: Memory-augmented Large Multimodal Model For Long-term Video Understanding (2024) • No Venue
He et al.
RULER: What's The Real Context Size Of Your Long-context Language Models? (2024) • No Venue
Hsieh et al.
Longrecipe: Recipe For Efficient Long Context Generalization In Large Language Models (2024) • No Venue
Hu et al.
Extending Llama-3's Context Ten-fold Overnight (2024) • No Venue
Zhang et al.
Transformerfam: Feedback Attention Is Working Memory (2024) • No Venue
Hwang et al.
Longrag: Enhancing Retrieval-augmented Generation With Long-context Llms (2024) • No Venue
Ziyan Jiang, Xueguang Ma, Wenhu Chen
Many-shot In-context Learning In Multimodal Foundation Models (2024) • No Venue
Jiang et al.
Mora: High-rank Updating For Parameter-efficient Fine-tuning (2024) • No Venue
Jiang et al.
LLM Maybe Longlm: Self-extend LLM Context Window Without Tuning (2024) • No Venue
Jin et al.
Hydragen: High-throughput LLM Inference With Shared Prefixes (2024) • No Venue
Juravsky et al.
Chatqa 2: Bridging The Gap To Proprietary Llms In Long Context And RAG Capabilities (2024) • No Venue
Xu et al.
THEANINE: Revisiting Memory Management In Long-term Conversations With Timeline-augmented Response Generation (2024) • No Venue
Kim et al.
In Search Of Needles In A 10M Haystack: Recurrent Memory Finds What Llms Miss (2024) • No Venue
Kuratov et al.
Babilong: Testing The Limits Of Llms With Long Context Reasoning-in-a-haystack (2024) • No Venue
Kuratov et al.
Summary Of A Haystack: A Challenge To Long-context Llms And RAG Systems (2024) • No Venue
Laban et al.
A Human-inspired Reading Agent With Gist Memory Of Very Long Contexts (2024) • No Venue
Lee et al.
Think: Thinner Key Cache By Query-driven Pruning (2024) • No Venue
Xu et al.
Long-context Llms Struggle With Long In-context Learning (2024) • No Venue
Li et al.
Aria: An Open Multimodal Native Mixture-of-experts Model (2024) • No Venue
Li et al.
Focusllm: Scaling Llm's Context By Parallel Decoding (2024) • No Venue
Li et al.
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window? (2024) • No Venue
Li et al.
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel In Long-horizon Tasks (2024) • No Venue
Li et al.
Snapkv: LLM Knows What You Are Looking For Before Generation (2024) • No Venue
Li et al.
Resurrecting Recurrent Neural Networks For Long Sequences (2023) • Arxiv • 42 citations
Orvieto et al.
Yarn: Efficient Context Window Extension Of Large Language Models (2023) • No Venue
Peng et al.
Lost In The Middle: How Language Models Use Long Contexts (2023) • No Venue
Liu et al.
Retentive Network: A Successor To Transformer For Large Language Models (2023) • No Venue
Sun et al.
JARVIS-1: Open-world Multi-task Agents With Memory-augmented Multimodal Language Models (2023) • No Venue
Wang et al.
Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Bai et al.
Extending Context Window Of Large Language Models Via Positional Interpolation (2023) • No Venue
Chen et al.
Longlora: Efficient Fine-tuning Of Long-context Large Language Models (2023) • No Venue
Chen et al.
Memorybank: Enhancing Large Language Models With Long-term Memory (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 77 citations
Zhong et al.
Chatclimate: Grounding Conversational AI In Climate Science (2023) • Communications Earth & Environment • 87 citations
Vaghefi et al.
Longnet: Scaling Transformers To 1,000,000,000 Tokens (2023) • No Venue
Ding et al.
In-context Autoencoder For Context Compression In A Large Language Model (2023) • No Venue
Ge et al.
A Real-world Webagent With Planning, Long Context Understanding, And Program Synthesis (2023) • No Venue
Gur et al.
Lm-infinite: Simple On-the-fly Length Generalization For Large Language Models (2023) • No Venue
Han et al.
Sparq Attention: Bandwidth-efficient LLM Inference (2023) • No Venue
Ribar et al.
A Comparative Study Of Pretrained Language Models For Long Clinical Text (2023) • Journal of the American Medical Informatics Association • 82 citations
Li et al.
Theory Of Mind For Multi-agent Collaboration Via Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 40 citations
Li et al.
Effective Long-context Scaling Of Foundation Models (2023) • No Venue
Xiong et al.
LARP: Language-agent Role Play For Open-world Games (2023) • No Venue
Yan et al.
Local-global Context Aware Transformer For Language-guided Video Segmentation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 72 citations
Liang et al.
Audiolm: A Language Modeling Approach To Audio Generation (2022) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 252 citations
Borsos et al.
Blenderbot 3: A Deployed Conversational Agent That Continually Learns To Responsibly Engage (2022) • Arxiv • 98 citations
Shuster et al.
Simplified State Space Layers For Sequence Modeling (2022) • Arxiv • 76 citations
Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman
Flashattention: Fast And Memory-efficient Exact Attention With Io-awareness (2022) • Arxiv • 452 citations
Dao et al.
Future Transformer For Long-term Action Anticipation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Gong et al.
Episodic Transformer For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Alexander Pashevich, Cordelia Schmid, Chen Sun
Memory Augmented Multi-instance Contrastive Predictive Coding For Sequential Recommendation (2021) • 2021 IEEE International Conference on Data Mining (ICDM) • 45 citations
Ruihong Qiu, Zi Huang, Hongzhi Yin
Working Memory Connections For LSTM (2021) • Neural Networks • 220 citations
Landi et al.
Beyond Goldfish Memory: Long-term Open-domain Conversation (2021) • Arxiv • 40 citations
Jing Xu, Arthur Szlam, Jason Weston
Time-aware Language Models As Temporal Knowledge Bases (2021) • Transactions of the Association for Computational Linguistics • 49 citations
Dhingra et al.
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 84 citations
Shi et al.
Transformer Feed-forward Layers Are Key-value Memories (2020) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 134 citations
Geva et al.
Dynamic And Static Context-aware LSTM For Multi-agent Motion Prediction (2020) • Lecture Notes in Computer Science • 45 citations
Tao et al.
Trippy: A Triple Copy Strategy For Value Independent Neural Dialog State Tracking (2020) • Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue • 181 citations
Heck et al.
MART: Memory-augmented Recurrent Transformer For Coherent Video Paragraph Captioning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 121 citations
Lei et al.
Efficient Neural Query Auto Completion (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Wang et al.
Mechanisms For Handling Nested Dependencies In Neural-network Language Models And Humans (2020) • Cognition • 55 citations
Lakretz et al.
Vision-dialog Navigation By Exploring Cross-modal Memory (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Zhu et al.
Exploiting Persona Information For Diverse Generation Of Conversational Responses (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 126 citations
Song et al.
Augmenting Self-attention With Persistent Memory (2019) • Arxiv • 46 citations
Sukhbaatar et al.
Transfer Meets Hybrid: A Synthetic Approach For Cross-domain Collaborative Filtering With Text (2019) • The World Wide Web Conference • 90 citations
Guangneng Hu, Yu Zhang, Qiang Yang
Episodic Memory In Lifelong Language Learning (2019) • Arxiv • 99 citations
D'Autume et al.
Transformer-xl: Attentive Language Models Beyond A Fixed-length Context (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1694 citations
Dai et al.
When And Why Is Document-level Context Useful In Neural Machine Translation? (2019) • Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019) • 64 citations
Yunsu Kim, Duc Thanh Tran, Hermann Ney
Simple And Effective Curriculum Pointer-generator Networks For Reading Comprehension Over Long Narratives (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Tay et al.
Topic-enhanced Memory Networks For Personalised Point-of-interest Recommendation (2019) • Arxiv • 54 citations
Xiao Zhou, Cecilia Mascolo, Zhongxiang Zhao
Cm-net: A Novel Collaborative Memory Network For Spoken Language Understanding (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 89 citations
Liu et al.
DM-GAN: Dynamic Memory Generative Adversarial Networks For Text-to-image Synthesis (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 549 citations
Zhu et al.
Large Memory Layers With Product Keys (2019) • Arxiv • 50 citations
Lample et al.
Towards Non-saturating Recurrent Units For Modelling Long-term Dependencies (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Chandar et al.
Hierarchical Transformers For Long Document Classification (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Pappagari et al.
Higru: Hierarchical Gated Recurrent Units For Utterance-level Emotion Recognition (2019) • Arxiv • 70 citations
Jiao et al.
Overcoming Long-term Catastrophic Forgetting Through Adversarial Neural Pruning And Synaptic Consolidation (2019) • IEEE Transactions on Neural Networks and Learning Systems • 46 citations
Peng et al.
Adaptive Attention Span In Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Sukhbaatar et al.
Movie Question Answering: Remembering The Textual Cues For Layered Visual Contents (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 44 citations
Wang et al.
Character-level Language Modeling With Deeper Self-attention (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 146 citations
Al-Rfou et al.
Dialog-context Aware End-to-end Speech Recognition (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 40 citations
Suyoun Kim, Florian Metze
Flowqa: Grasping Flow In History For Conversational Machine Comprehension (2018) • Arxiv • 63 citations
Hsin-Yuan Huang, Eunsol Choi, Wen-Tau Yih
Abstractive Summarization Of Reddit Posts With Multi-level Memory Networks (2018) • Arxiv • 60 citations
Byeongchang Kim, Hyunwoo Kim, Gunhee Kim
Bi-directional Block Self-attention For Fast And Memory-efficient Sequence Modeling (2018) • Arxiv • 77 citations
Shen et al.
Collaborative Memory Network For Recommendation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 288 citations
Travis Ebesu, Bin Shen, Yi Fang
On Extended Long Short-term Memory And Dependent Bidirectional Recurrent Neural Network (2018) • Neurocomputing • 174 citations
Yuanhang Su, C.-C. Jay Kuo
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 115 citations
Khandelwal et al.
Trellis Networks For Sequence Modeling (2018) • Arxiv • 68 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Learning To Remember Rare Events (2017) • Arxiv • 238 citations
Kaiser et al.
Latent Relational Metric Learning Via Memory-based Attention For Collaborative Ranking (2017) • Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 • 214 citations
Yi Tay, Anh Tuan Luu, Siu Cheung Hui
Rapid Adaptation With Conditionally Shifted Neurons (2017) • Arxiv • 106 citations
Munkhdalai et al.
Conversational Contextual Cues: The Case Of Personalization And History For Response Ranking (2016) • Arxiv • 57 citations
Al-Rfou et al.
Quantifying The Vanishing Gradient And Long Distance Dependency Problem In Recursive Neural Networks And Recursive Lstms (2016) • Proceedings of the 1st Workshop on Representation Learning for NLP • 54 citations
Phong Le, Willem Zuidema
Topicrnn: A Recurrent Neural Network With Long-range Semantic Dependency (2016) • Arxiv • 129 citations
Dieng et al.
Tracking The World State With Recurrent Entity Networks (2016) • ICLR 2017 • 157 citations
Henaff et al.
Scaling Memory-augmented Neural Networks With Sparse Reads And Writes (2016) • Arxiv • 58 citations
Rae et al.

Model Architecture 1091 papers

Predicting The Order Of Upcoming Tokens Improves Language Modeling (2025) • No Venue
Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Falcon-h1: A Family Of Hybrid-head Language Models Redefining Efficiency And Performance (2025) • No Venue
Zuo et al.
Softpick: No Attention Sink, No Massive Activations With Rectified Softmax (2025) • No Venue
Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Frac-connections: Fractional Extension Of Hyper-connections (2025) • No Venue
Zhu et al.
Layercake: Token-aware Contrastive Decoding Within Large Language Model Layers (2025) • No Venue
Zhu et al.
Hybridnorm: Towards Stable And Efficient Transformer Training Via Hybrid Normalization (2025) • No Venue
Zhuo et al.
Embeddinggemma: Powerful And Lightweight Text Representations (2025) • No Venue
Vera et al.
Generalizing Test-time Compute-optimal Scaling As An Optimizable Graph (2025) • No Venue
Wang et al.
Fantasyportrait: Enhancing Multi-character Portrait Animation With Expression-augmented Diffusion Transformers (2025) • No Venue
Wang et al.
The End Of Manual Decoding: Towards Truly End-to-end Language Models (2025) • No Venue
Wang et al.
Hybrimoe: Hybrid CPU-GPU Scheduling And Cache Management For Efficient Moe Inference (2025) • No Venue
Zhong et al.
3DIS-FLUX: Simple And Efficient Multi-instance Generation With Dit Rendering (2025) • No Venue
Zhou et al.
Knocking-heads Attention (2025) • No Venue
Zhou et al.
Soundwave: Less Is More For Speech-text Alignment In Llms (2025) • No Venue
Zhang et al.
Group Relative Attention Guidance For Image Editing (2025) • No Venue
Zhang et al.
Locality-aware Parallel Decoding For Efficient Autoregressive Image Generation (2025) • No Venue
Zhang et al.
Tensor Product Attention Is All You Need (2025) • No Venue
Zhang et al.
Unified Multimodal Understanding And Generation Models: Advances, Challenges, And Opportunities (2025) • No Venue
Zhang et al.
Waver: Wave Your Way To Lifelike Video Generation (2025) • No Venue
Zhang et al.
Insights Into Deepseek-v3: Scaling Challenges And Reflections On Hardware For AI Architectures (2025) • No Venue
Zhao et al.
Paroattention: Pattern-aware Reordering For Efficient Sparse And Quantized Attention In Visual Generation Models (2025) • No Venue
Zhao et al.
Diffusion Transformers With Representation Autoencoders (2025) • No Venue
Zheng et al.
Stabilizing Reinforcement Learning With Llms: Formulation And Practices (2025) • No Venue
Zheng et al.
SAIL-VL2 Technical Report (2025) • No Venue
Yin et al.
Minicpm-v 4.5: Cooking Efficient Mllms Via Architecture, Data, And Training Recipe (2025) • No Venue
Yu et al.
Efficientllm: Efficiency In Large Language Models (2025) • No Venue
Yuan et al.
Native Sparse Attention: Hardware-aligned And Natively Trainable Sparse Attention (2025) • No Venue
Yuan et al.
Yue: Scaling Open Foundation Models For Long-form Music Generation (2025) • No Venue
Yuan et al.
ARWKV: Pretrain Is Not What We Need, An Rnn-attention-based Language Model Born From Transformer (2025) • No Venue
Yueyu et al.
Rlinf-vla: A Unified And Efficient Framework For VLA+RL Training (2025) • No Venue
Zang et al.
Renderformer: Transformer-based Neural Rendering Of Triangle Meshes With Global Illumination (2025) • No Venue
Zeng et al.
Easycontrol: Adding Efficient And Flexible Control For Diffusion Transformer (2025) • No Venue
Zhang et al.
Dreamvla: A Vision-language-action Model Dreamed With Comprehensive World Knowledge (2025) • No Venue
Zhang et al.
Part-x-mllm: Part-aware 3D Multimodal Large Language Model (2025) • No Venue
Wang et al.
Ovis-u1 Technical Report (2025) • No Venue
Wang et al.
A Systematic Analysis Of Hybrid Linear Attention (2025) • No Venue
Wang et al.
Sparser Block-sparse Attention Via Token Permutation (2025) • No Venue
Wang et al.
Transpixar: Advancing Text-to-video Generation With Transparency (2025) • No Venue
Wang et al.
Mocha: Towards Movie-grade Talking Character Synthesis (2025) • No Venue
Wei et al.
Univideo: Unified Understanding, Generation, And Editing For Videos (2025) • No Venue
Wei et al.
Seq Vs Seq: An Open Suite Of Paired Encoders And Decoders (2025) • No Venue
Weller et al.
Delta Attention: Fast And Accurate Sparse Attention Inference By Delta Correction (2025) • No Venue
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
Direct3d-s2: Gigascale 3D Generation Made Easy With Spatial Sparse Attention (2025) • No Venue
Wu et al.
Latent Flow Transformer (2025) • No Venue
Wu et al.
Grove Moe: Towards Efficient And Superior Moe Llms With Adjugate Experts (2025) • No Venue
Wu et al.
Hunyuanvideo 1.5 Technical Report (2025) • No Venue
Wu et al.
Spatial-mllm: Boosting MLLM Capabilities In Visual-based Spatial Intelligence (2025) • No Venue
Wu et al.
Vmoba: Mixture-of-block Attention For Video Diffusion Models (2025) • No Venue
Wu et al.
Show-o2: Improved Native Unified Multimodal Models (2025) • No Venue
Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
SANA 1.5: Efficient Scaling Of Training-time And Inference-time Compute In Linear Diffusion Transformer (2025) • No Venue
Xie et al.
Qwen2.5-omni Technical Report (2025) • No Venue
Xu et al.
Qwen3-omni Technical Report (2025) • No Venue
Xu et al.
From Code Foundation Models To Agents And Applications: A Practical Guide To Code Intelligence (2025) • No Venue
Yang et al.
Qwen3 Technical Report (2025) • No Venue
Yang et al.
Table-r1: Inference-time Scaling For Table Reasoning (2025) • No Venue
Yang et al.
Reconstruction Vs. Generation: Taming Optimization Dilemma In Latent Diffusion Models (2025) • No Venue
Jingfeng Yao, Xinggang Wang
Omnivinci: Enhancing Architecture And Data For Omni-modal Understanding LLM (2025) • No Venue
Ye et al.
Llasa: Scaling Train-time And Inference-time Compute For Llama-based Speech Synthesis (2025) • No Venue
Ye et al.
Shapellm-omni: A Native Multimodal LLM For 3D Generation And Understanding (2025) • No Venue
Ye et al.
Magicinfinite: Generating Infinite Talking Videos With Your Words And Voice (2025) • No Venue
Yi et al.
Every Activation Boosted: Scaling General Reasoner To 1 Trillion Open Language Foundation (2025) • No Venue
Ling-Team et al.
Minimax-01: Scaling Foundation Models With Lightning Attention (2025) • No Venue
Minimax et al.
NVIDIA Nemotron Nano V2 VL (2025) • No Venue
Nvidia et al.
Gated Associative Memory: A Parallel O(N) Architecture For Efficient Sequence Modeling (2025) • No Venue
Rishiraj Acharya
FS-DAG: Few Shot Domain Adapting Graph Networks For Visually Rich Document Understanding (2025) • No Venue
Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
Ming-flash-omni: A Sparse, Unified Architecture For Multimodal Perception And Generation (2025) • No Venue
Ai et al.
LFM2 Technical Report (2025) • No Venue
Amini et al.
Hybrid Architectures For Language Models: Systematic Analysis And Design Insights (2025) • No Venue
Bae et al.
Intern-s1: A Scientific Multimodal Foundation Model (2025) • No Venue
Bai et al.
Qwen3-vl Technical Report (2025) • No Venue
Bai et al.
ATLAS: Learning To Optimally Memorize The Context At Test Time (2025) • No Venue
Behrouz et al.
Reflect, Retry, Reward: Self-improving Llms Via Reinforcement Learning (2025) • No Venue
Bensal et al.
Video-as-prompt: Unified Semantic Control For Video Generation (2025) • No Venue
Bian et al.
Eurobert: Scaling Multilingual Encoders For European Languages (2025) • No Venue
Boizard et al.
Neobert: A Next-generation BERT (2025) • No Venue
Breton et al.
Hunyuanimage 3.0 Technical Report (2025) • No Venue
Cao et al.
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning (2025) • No Venue
Chen et al.
Blip3-o: A Family Of Fully Open Unified Multimodal Models-architecture, Training And Dataset (2025) • No Venue
Chen et al.
Astra: Toward General-purpose Mobile Robots Via Hierarchical Multimodal Learning (2025) • No Venue
Chen et al.
Goku: Flow Based Video Generative Foundation Models (2025) • No Venue
Chen et al.
Sparse-vdit: Unleashing The Power Of Sparse Attention To Accelerate Video Diffusion Transformers (2025) • No Venue
Chen et al.
NVIDIA Nemotron Parse 1.1 (2025) • No Venue
Chumachenko et al.
Self-forcing++: Towards Minute-scale High-quality Video Generation (2025) • No Venue
Cui et al.
Onepiece: Bringing Context Engineering And Reasoning To Industrial Cascade Ranking System (2025) • No Venue
Dai et al.
Onerec: Unifying Retrieve And Rank With Generative Recommender And Iterative Preference Alignment (2025) • No Venue
Deng et al.
From Pixels To Words -- Towards Native Vision-language Primitives At Scale (2025) • No Venue
Diao et al.
SONAR-LLM: Autoregressive Transformer That Thinks In Sentence Embeddings And Speaks In Tokens (2025) • No Venue
Dragunov et al.
Unimmvsr: A Unified Multi-modal Framework For Cascaded Video Super-resolution (2025) • No Venue
Du et al.
Moderngbert: German-only 1B Encoder Model Trained From Scratch (2025) • No Venue
Ehrmanntraut et al.
Make Lora Great Again: Boosting Lora With Adaptive Singular Values And Mixture-of-experts Optimization Alignment (2025) • No Venue
Fan et al.
Phased DMD: Few-step Distribution Matching Distillation Via Score Matching Within Subintervals (2025) • No Venue
Fan et al.
Reactive Transformer (rxt) -- Stateful Real-time Processing For Event-driven Reactive Language Models (2025) • No Venue
Adam Filipek
Think-at-hard: Selective Latent Iterations To Improve Reasoning Language Models (2025) • No Venue
Fu et al.
Seedance 1.0: Exploring The Boundaries Of Video Generation Models (2025) • No Venue
Gao et al.
Scaling Up Test-time Compute With Latent Reasoning: A Recurrent Depth Approach (2025) • No Venue
Geiping et al.
You Do Not Fully Utilize Transformer's Representation Capacity (2025) • No Venue
Gerasimov et al.
Multi-token Attention (2025) • No Venue
Golovneva et al.
Long-context Autoregressive Video Modeling With Next-frame Prediction (2025) • No Venue
Yuchao Gu, Weijia Mao, Mike Zheng Shou
Mineworld: A Real-time And Open-source Interactive World Model On Minecraft (2025) • No Venue
Guo et al.
Seed1.5-vl Technical Report (2025) • No Venue
Guo et al.
Learnings From Scaling Visual Tokenizers For Reconstruction And Generation (2025) • No Venue
Hansen-Estruch et al.
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy (2025) • No Venue
Hou et al.
Every Token Counts: Generalizing 16M Ultra-long Context In Large Language Models (2025) • No Venue
Hu et al.
Hunyuancustom: A Multimodal-driven Architecture For Customized Video Generation (2025) • No Venue
Hu et al.
Vchain: Chain-of-visual-thought For Reasoning In Video Generation (2025) • No Venue
Huang et al.
Ultramemv2: Memory Networks Scaling To 120B Parameters With Superior Long-context Learning (2025) • No Venue
Huang et al.
Dype: Dynamic Position Extrapolation For Ultra High Resolution Diffusion (2025) • No Venue
Issachar et al.
Omnihuman-1.5: Instilling An Active Mind In Avatars Via Cognitive Simulation (2025) • No Venue
Jiang et al.
LM2: Large Memory Models (2025) • No Venue
Kang et al.
Marigold: Affordable Adaptation Of Diffusion-based Image Generators For Image Analysis (2025) • No Venue
Ke et al.
The Dragon Hatchling: The Missing Link Between The Transformer And Models Of The Brain (2025) • No Venue
Kosowski et al.
Stream3r: Scalable Sequential 3D Reconstruction With Causal Transformer (2025) • No Venue
Lan et al.
C3PO: Critical-layer, Core-expert, Collaborative Pathway Optimization For Test-time Expert Re-mixing (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
Baichuan-omni-1.5 Technical Report (2025) • No Venue
Li et al.
IGGT: Instance-grounded Geometry Transformer For Semantic 3D Reconstruction (2025) • No Venue
Li et al.
MANZANO: A Simple And Scalable Unified Multimodal Model With A Hybrid Vision Tokenizer (2025) • No Venue
Li et al.
Optimus-3: Towards Generalist Multimodal Minecraft Agents With Scalable Task Experts (2025) • No Venue
Li et al.
Model Merging In Pre-training Of Large Language Models (2025) • No Venue
Li et al.
Routing Manifold Alignment Improves Generalization Of Mixture-of-experts Llms (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
R2-T2: Re-routing In Test-time For Multimodal Mixture-of-experts (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
Radial Attention: O(n log n) Sparse Attention With Energy Decay For Long Video Generation (2025) • No Venue
Li et al.
Skip A Layer Or Loop It? Test-time Depth Adaptation Of Pretrained Llms (2025) • No Venue
Ziyue Li, Yang Li, Tianyi Zhou
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data (2025) • No Venue
Li et al.
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies (2025) • No Venue
Liang et al.
Forgetting Transformer: Softmax Attention With A Forget Gate (2025) • No Venue
Lin et al.
Autoregressive Adversarial Post-training For Real-time Interactive Video Generation (2025) • No Venue
Lin et al.
Sigma: Differential Rescaling Of Query, Key And Value For Efficient Language Models (2025) • No Venue
Lin et al.
Partcrafter: Structured 3D Mesh Generation Via Compositional Latent Diffusion Transformers (2025) • No Venue
Lin et al.
Omnihuman-1: Rethinking The Scaling-up Of One-stage Conditioned Human Animation Models (2025) • No Venue
Lin et al.
FUSION: Fully Integration Of Vision-language Representations For Deep Cross-modal Understanding (2025) • No Venue
Liu et al.
Medsam3: Delving Into Segment Anything With Medical Concepts (2025) • No Venue
Liu et al.
METAGENE-1: Metagenomic Foundation Model For Pandemic Monitoring (2025) • No Venue
Liu et al.
Songgen: A Single Stage Auto-regressive Transformer For Text-to-song Generation (2025) • No Venue
Liu et al.
Sketchvideo: Sketch-based Video Generation And Editing (2025) • No Venue
Liu et al.
Unimoe-audio: Unified Speech And Music Generation With Dynamic-capacity Moe (2025) • No Venue
Liu et al.
VITA-E: Natural Embodied Interaction With Concurrent Seeing, Hearing, Speaking, And Acting (2025) • No Venue
Liu et al.
Ovi: Twin Backbone Cross-modal Fusion For Audio-video Generation (2025) • No Venue
Chetwin Low, Weimin Wang, Calder Katyal
Atoken: A Unified Tokenizer For Vision (2025) • No Venue
Lu et al.
Dreamactor-m1: Holistic, Expressive And Robust Human Image Animation With Hybrid Guidance (2025) • No Venue
Luo et al.
F1: A Vision-language-action Model Bridging Understanding And Generation To Actions (2025) • No Venue
Lv et al.
Autonomy-of-experts Models (2025) • No Venue
Lv et al.
Step-video-t2v Technical Report: The Practice, Challenges, And Future Of Video Foundation Model (2025) • No Venue
Ma et al.
Yume: An Interactive World Generation Model (2025) • No Venue
Mao et al.
Holocine: Holistic Generation Of Cinematic Multi-shot Long Video Narratives (2025) • No Venue
Meng et al.
∇NABLA: Neighborhood Adaptive Block-level Attention (2025) • No Venue
Mikhailov et al.
Scalable-softmax Is Superior For Attention (2025) • No Venue
Ken M. Nakanishi
Large Language Models Meet Extreme Multi-label Classification: Scaling And Multi-modal Framework (2025) • No Venue
Ortego et al.
Tokenhsi: Unified Synthesis Of Physical Human-scene Interactions Through Task Tokenization (2025) • No Venue
Pan et al.
AION-1: Omnimodal Foundation Model For Astronomical Sciences (2025) • No Venue
Parker et al.
Optimizing Multilingual Text-to-speech With Accents & Emotions (2025) • No Venue
Pawar et al.
RWKV-7 "goose" With Expressive Dynamic State Evolution (2025) • No Venue
Peng et al.
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction (2025) • No Venue
Qian et al.
Demons In The Detail: On Implementing Load Balancing Loss For Training Specialized Mixture-of-expert Models (2025) • No Venue
Qiu et al.
One Small Step In Latent, One Giant Leap For Pixels: Fast Latent Upscale Adapter For Your Diffusion Models (2025) • No Venue
Aleksandr Razin, Danil Kazantsev, Ilya Makarov
Visual Autoregressive Models Beat Diffusion Models On Inference Time Scaling (2025) • No Venue
Erik Riise, Mehmet Onurcan Kaya, Dim P. Papadopoulos
Vamba: Understanding Hour-long Videos With Hybrid Mamba-transformers (2025) • No Venue
Ren et al.
Beyond Memorization: Extending Reasoning Depth With Recurrence, Memory And Test-time Compute Scaling (2025) • No Venue
Rodkin et al.
Hogwild! Inference: Parallel LLM Generation Via Concurrent Attention (2025) • No Venue
Rodionov et al.
Fast And Simplex: 2-simplicial Attention In Triton (2025) • No Venue
Roy et al.
Nile-chat: Egyptian Language Models For Arabic And Latin Scripts (2025) • No Venue
Shang et al.
Voila: Voice-language Foundation Models For Real-time Autonomous Interaction And Voice Role-play (2025) • No Venue
Shi et al.
Scaling Laws For Optimal Data Mixtures (2025) • No Venue
Shukor et al.
Smallthinker: A Family Of Efficient Large Language Models Natively Trained For Local Deployment (2025) • No Venue
Song et al.
Chain-of-model Learning For Language Model (2025) • No Venue
Song et al.
Causal Attention With Lookahead Keys (2025) • No Venue
Song et al.
DMM: Building A Versatile Image Generation Model Via Distillation-based Model Merging (2025) • No Venue
Song et al.
Stabletoken: A Noise-robust Semantic Speech Tokenizer For Resilient Speechllms (2025) • No Venue
Song et al.
The Curse Of Depth In Large Language Models (2025) • No Venue
Sun et al.
LASP-2: Rethinking Sequence Parallelism For Linear Attention And Its Hybrid (2025) • No Venue
Sun et al.
Speed Always Wins: A Survey On Efficient Architectures For Large Language Models (2025) • No Venue
Sun et al.
Nemotron Elastic: Towards Efficient Many-in-one Reasoning Llms (2025) • No Venue
Taghibakhshi et al.
Vision Bridge Transformer At Scale (2025) • No Venue
Tan et al.
Code Graph Model (CGM): A Graph-integrated Large Language Model For Repository-level Software Engineering Tasks (2025) • No Venue
Tao et al.
Gemma 3 Technical Report (2025) • No Venue
Team et al.
GLM-4.5: Agentic, Reasoning, And Coding (ARC) Foundation Models (2025) • No Venue
Team et al.
Minicpm4: Ultra-efficient Llms On End Devices (2025) • No Venue
Team et al.
Robobrain 2.0 Technical Report (2025) • No Venue
Team et al.
PAN: A World Model For General, Interactable, And Long-horizon World Simulation (2025) • No Venue
Team et al.
Padding Tone: A Mechanistic Analysis Of Padding Tokens In T2I Models (2025) • No Venue
Toker et al.
Deepseek LLM: Scaling Open-source Language Models With Longtermism (2024) • No Venue
Deepseek-Ai et al.
Star Attention: Efficient LLM Inference Over Long Sequences (2024) • No Venue
Shantanu Acharya, Fei Jia, Boris Ginsburg
Yi: Open Foundation Models By 01.AI (2024) • No Venue
Ai et al.
Blackmamba: Mixture Of Experts For State-space Models (2024) • No Venue
Anthony et al.
Chronos: Learning The Language Of Time Series (2024) • No Venue
Ansari et al.
Aya 23: Open Weight Releases To Further Multilingual Progress (2024) • No Venue
Aryabumi et al.
Scenescript: Reconstructing Scenes With An Autoregressive Structured Language Model (2024) • No Venue
Avetisyan et al.
Lumiere: A Space-time Diffusion Model For Video Generation (2024) • No Venue
Bar-Tal et al.
Xlstm: Extended Long Short-term Memory (2024) • Arxiv • 81 citations
Beck et al.
Llm2vec: Large Language Models Are Secretly Powerful Text Encoders (2024) • No Venue
Behnamghader et al.
SUTRA: Scalable Multilingual Language Model Architecture (2024) • No Venue
Bendale et al.
MUMU: Bootstrapping Multimodal Image Generation From Text-to-image Data (2024) • No Venue
William Berman, Alexander Peysakhovich
Qalam: A Multimodal LLM For Arabic Optical Character And Handwriting Recognition (2024) • No Venue
Bhatia et al.
Transformers Meet Neural Algorithmic Reasoners (2024) • No Venue
Bounsi et al.
Recurrentgemma: Moving Past Transformers For Efficient Open Language Models (2024) • No Venue
Botev et al.
Reducing Transformer Key-value Cache Size With Cross-layer Attention (2024) • No Venue
Brandon et al.
Genie: Generative Interactive Environments (2024) • No Venue
Bruce et al.
Ditctrl: Exploring Attention Control In Multi-modal Diffusion Transformer For Tuning-free Multi-prompt Longer Video Generation (2024) • No Venue
Cai et al.
Stealing Part Of A Production Language Model (2024) • No Venue
Carlini et al.
Dolphin: Long Context As A New Modality For Energy-efficient On-device Language Models (2024) • No Venue
Chen et al.
EVLM: An Efficient Vision-language Model For Visual Understanding (2024) • No Venue
Chen et al.
Expanding Performance Boundaries Of Open-source Multimodal Models With Model, Data, And Test-time Scaling (2024) • No Venue
Chen et al.
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (2024) • No Venue
Chen et al.
Shvit: Single-head Vision Transformer With Memory Efficient Macro Design (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Seokju Yun, Youngmin Ro
Xtrimopglm: Unified 100b-scale Pre-trained Transformer For Deciphering The Language Of Protein (2024) • Arxiv • 61 citations
Chen et al.
Videorefer Suite: Advancing Spatial-temporal Object Understanding With Video LLM (2024) • No Venue
Yuan et al.
Identity-preserving Text-to-video Generation By Frequency Decomposition (2024) • No Venue
Yuan et al.
Efficiently Democratizing Medical Llms For 50 Languages Via A Mixture Of Language Family Experts (2024) • No Venue
Zheng et al.
Boosting Continual Learning Of Vision-language Models Via Mixture-of-experts Adapters (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Yu et al.
Med42-v2: A Suite Of Clinical Llms (2024) • No Venue
Christophe et al.
Visionllama: A Unified Llama Interface For Vision Tasks (2024) • No Venue
Chu et al.
Scalable High-resolution Pixel-space Image Synthesis With Hourglass Diffusion Transformers (2024) • No Venue
Crowson et al.
Deepseekmoe: Towards Ultimate Expert Specialization In Mixture-of-experts Language Models (2024) • No Venue
Dai et al.
Transformers Are Ssms: Generalized Models And Efficient Algorithms Through Structured State Space Duality (2024) • No Venue
Tri Dao, Albert Gu
Molmo And Pixmo: Open Weights And Open Data For State-of-the-art Multimodal Models (2024) • No Venue
Deitke et al.
Hymba: A Hybrid-head Architecture For Small Language Models (2024) • No Venue
Dong et al.
FAN: Fourier Analysis Networks (2024) • No Venue
Dong et al.
Emu3: Next-token Prediction Is All You Need (2024) • No Venue
Wang et al.
Fitv2: Scalable And Improved Flexible Vision Transformer For Diffusion Model (2024) • No Venue
Wang et al.
Git: Towards Generalist Vision Transformer Through Universal Language Interface (2024) • No Venue
Wang et al.
Videotetris: Towards Compositional Text-to-video Generation (2024) • No Venue
Tian et al.
Jamba-1.5: Hybrid Transformer-mamba Models At Scale (2024) • No Venue
Team et al.
Octo: An Open-source Generalist Robot Policy (2024) • No Venue
Team et al.
Longllava: Scaling Multi-modal Llms To 1000 Images Efficiently Via Hybrid Architecture (2024) • No Venue
Wang et al.
The Mamba In The Llama: Distilling And Accelerating Hybrid Models (2024) • No Venue
Wang et al.
Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey To The Edge Of Generalization (2024) • No Venue
Wang et al.
Let The Expert Stick To His Last: Expert-specialized Fine-tuning For Sparse Architectural Large Language Models (2024) • No Venue
Wang et al.
Magicvideo-v2: Multi-stage High-aesthetic Video Generation (2024) • No Venue
Wang et al.
Falcon Mamba: The First Competitive Attention-free 7B Language Model (2024) • No Venue
Zuo et al.
Branch-train-mix: Mixing Expert Llms Into A Mixture-of-experts LLM (2024) • No Venue
Sukhbaatar et al.
Hunyuan-large: An Open-source Moe Model With 52 Billion Activated Parameters By Tencent (2024) • No Venue
Sun et al.
Parrot: Multilingual Visual Instruction Tuning (2024) • No Venue
Sun et al.
Tokenformer: Rethinking Transformer Scaling With Tokenized Model Parameters (2024) • No Venue
Wang et al.
Yolov9: Learning What You Want To Learn Using Programmable Gradient Information (2024) • No Venue
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
Yolov10: Real-time End-to-end Object Detection (2024) • Arxiv • 950 citations
Wang et al.
Transformers Can Represent N-gram Language Models (2024) • No Venue
Anej Svete, Ryan Cotterell
Mixture-of-agents Enhances Large Language Model Capabilities (2024) • No Venue
Wang et al.
Ominicontrol: Minimal And Universal Control For Diffusion Transformer (2024) • No Venue
Tan et al.
Pre-training Small Base Lms With Fewer Tokens (2024) • No Venue
Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
Scaling Smart: Accelerating Large Language Model Pre-training With Small Model Initialization (2024) • No Venue
Samragh et al.
Llama-nas: Efficient Neural Architecture Search For Large Language Models (2024) • No Venue
Sarah et al.
A Large Recurrent Action Model: Xlstm Enables Fast Inference For Robotics Tasks (2024) • No Venue
Schmied et al.
Prithvi Wxc: Foundation Model For Weather And Climate (2024) • No Venue
Schmude et al.
Hyper-connections (2024) • No Venue
Zhu et al.
BASE TTS: Lessons From Building A Billion-parameter Text-to-speech Model On 100K Hours Of Data (2024) • No Venue
Łajszczak et al.
Imp: Highly Capable Large Multimodal Models For Mobile Devices (2024) • No Venue
Shao et al.
Polynomial Composition Activations: Unleashing The Dynamics Of Large Language Models (2024) • No Venue
Zhuo et al.
Jetmoe: Reaching Llama2 Performance With 0.1M Dollars (2024) • No Venue
Shen et al.
Lumos: Empowering Multimodal Llms With Scene Text Recognition (2024) • No Venue
Shenoy et al.
Eagle: Exploring The Design Space For Multimodal Llms With Mixture Of Encoders (2024) • No Venue
Shi et al.
When Do We Not Need Larger Vision Models? (2024) • No Venue
Shi et al.
Llava-mod: Making Llava Tiny Via Moe Knowledge Distillation (2024) • No Venue
Shu et al.
A Large Encoder-decoder Family Of Foundation Models For Chemical Language (2024) • No Venue
Soares et al.
Turbo Sparse: Achieving LLM SOTA Performance With Minimal Activated Parameters (2024) • No Venue
Song et al.
Jina-embeddings-v3: Multilingual Embeddings With Task Lora (2024) • No Venue
Sturua et al.
Alleviating Distortion In Image Generation Via Multi-resolution Diffusion Models (2024) • No Venue
Liu et al.
CLEAR: Conv-like Linearization Revs Pre-trained Diffusion Transformers Up (2024) • No Venue
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Kangaroo: Lossless Self-speculative Decoding Via Double Early Exiting (2024) • No Venue
Liu et al.
Linfusion: 1 GPU, 1 Minute, 16K Image (2024) • No Venue
Liu et al.
Linrec: Linear Attention Mechanism For Long-term Sequential Recommender Systems (2024) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Liu et al.
NVILA: Efficient Frontier Visual Language Models (2024) • No Venue
Liu et al.
Mobilellm: Optimizing Sub-billion Parameter Language Models For On-device Use Cases (2024) • No Venue
Liu et al.
Oryx MLLM: On-demand Spatial-temporal Understanding At Arbitrary Resolution (2024) • No Venue
Liu et al.
Configurable Foundation Models: Building Llms From A Modular Perspective (2024) • No Venue
Xiao et al.
Vit-comer: Vision Transformer With Convolutional Multi-scale Feature Interaction For Dense Predictions (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 80 citations
Xia et al.
Mm-llms: Recent Advances In Multimodal Large Language Models (2024) • No Venue
Zhang et al.
MM1.5: Methods, Analysis & Insights From Multimodal LLM Fine-tuning (2024) • No Venue
Zhang et al.
Sageattention: Accurate 8-bit Attention For Plug-and-play Inference Acceleration (2024) • No Venue
Zhang et al.
Recurrent Drafter For Fast Speculative Decoding In Large Language Models (2024) • No Venue
Zhang et al.
Fit: Flexible Vision Transformer For Diffusion Model (2024) • No Venue
Lu et al.
Multi-head Mixture-of-experts (2024) • No Venue
Wu et al.
Megalodon: Efficient LLM Pretraining And Inference With Unlimited Context Length (2024) • No Venue
Ma et al.
Yuan 2.0-M32: Mixture Of Experts With Attention Router (2024) • No Venue
Wu et al.
Exploring The Role Of Large Language Models In Prompt Encoding For Diffusion Models (2024) • No Venue
Ma et al.
Janus: Decoupling Visual Encoding For Unified Multimodal Understanding And Generation (2024) • No Venue
Wu et al.
Llama Pro: Progressive Llama With Block Expansion (2024) • No Venue
Wu et al.
Layer-condensed KV Cache For Efficient Inference Of Large Language Models (2024) • No Venue
Haoyi Wu, Kewei Tu
Wavelets Are All You Need For Autoregressive Image Generation (2024) • No Venue
Mattar et al.
Transformers Can Do Arithmetic With The Right Embeddings (2024) • No Venue
McLeish et al.
MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training (2024) • No Venue
McKinzie et al.
Snap Video: Scaled Spatiotemporal Transformers For Text-to-video Synthesis (2024) • No Venue
Menapace et al.
Shortgpt: Layers In Large Language Models Are More Redundant Than You Expect (2024) • No Venue
Men et al.
Videoglamm: A Large Multimodal Model For Pixel-level Visual Grounding In Videos (2024) • No Venue
Munasinghe et al.
Olmoe: Open Mixture-of-experts Language Models (2024) • No Venue
Muennighoff et al.
Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) • No Venue
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Compact Language Models Via Pruning And Knowledge Distillation (2024) • No Venue
Muralidharan et al.
Beyond Scaling Laws: Understanding Transformer Performance With Associative Memory (2024) • No Venue
Niu et al.
Integrating Large Language Models Into A Tri-modal Architecture For Automated Depression Classification (2024) • No Venue
Santosh V. Patapati
Transformers Are Multi-state Rnns (2024) • No Venue
Oren et al.
Byte Latent Transformer: Patches Scale Better Than Tokens (2024) • No Venue
Pagnoni et al.
Llamo: Large Language Model-based Molecular Graph Assistant (2024) • No Venue
Park et al.
Controlnext: Powerful And Efficient Control For Image And Video Generation (2024) • No Venue
Peng et al.
Eagle And Finch: RWKV With Matrix-valued States And Dynamic Recurrence (2024) • No Venue
Peng et al.
Moe-mamba: Efficient Selective State Space Models With Mixture Of Experts (2024) • No Venue
Pióro et al.
Movie Gen: A Cast Of Media Foundation Models (2024) • No Venue
Polyak et al.
HGRN2: Gated Linear Rnns With State Expansion (2024) • No Venue
Qin et al.
Tokenflow: Unified Image Tokenizer For Multimodal Understanding And Generation (2024) • No Venue
Qu et al.
Lightning Attention-2: A Free Lunch For Handling Unlimited Sequence Lengths In Large Language Models (2024) • No Venue
Qin et al.
Xgen-videosyn-1: High-fidelity Text-to-video Synthesis With Compressed Representations (2024) • No Venue
Qin et al.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder For Fast, Memory Efficient, And Long Context Finetuning And Inference (2024) • No Venue
Warner et al.
Layerwise Recurrent Router For Mixture-of-experts (2024) • No Venue
Qiu et al.
Tinyllava: A Framework Of Small-scale Large Multimodal Models (2024) • No Venue
Zhou et al.
Skywork-moe: A Deep Dive Into Training Techniques For Mixture-of-experts Language Models (2024) • No Venue
Wei et al.
2BP: 2-stage Backpropagation (2024) • No Venue
Christopher Rae, Joseph K. L. Lee, James Richings
Mixture-of-depths: Dynamically Allocating Compute In Transformer-based Language Models (2024) • No Venue
Raposo et al.
SAM 2: Segment Anything In Images And Videos (2024) • No Venue
Ravi et al.
Your Transformer Is Secretly Linear (2024) • No Venue
Razzhigaev et al.
Samba: Simple Hybrid State Space Models For Efficient Unlimited Context Language Modeling (2024) • No Venue
Ren et al.
Associative Recurrent Memory Transformer (2024) • No Venue
Rodkin et al.
Stacking Your Transformers: A Closer Look At Model Growth For Efficient LLM Pre-training (2024) • No Venue
Du et al.
The Llama 3 Herd Of Models (2024) • No Venue
Dubey et al.
Not All Language Model Features Are Linear (2024) • No Venue
Engels et al.
Fluid: Scaling Autoregressive Text-to-image Generative Models With Continuous Tokens (2024) • No Venue
Fan et al.
Llama-omni: Seamless Speech Interaction With Large Language Models (2024) • No Venue
Fang et al.
Kolmogorov-arnold Transformer (2024) • No Venue
Xingyi Yang, Xinchao Wang
Colpali: Efficient Document Retrieval With Vision Language Models (2024) • No Venue
Faysse et al.
FLUX That Plays Music (2024) • No Venue
Fei et al.
Mm-ego: Towards Building Egocentric Multimodal Llms (2024) • No Venue
Ye et al.
Differential Transformer (2024) • No Venue
Ye et al.
Minicpm-v: A GPT-4V Level MLLM On Your Phone (2024) • No Venue
Yao et al.
Convllava: Hierarchical Backbones As Visual Encoder For Large Multimodal Models (2024) • No Venue
Ge et al.
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities (2024) • No Venue
Ghosh et al.
Goldfinch: High Performance Rwkv/transformer Hybrid With Linear Pre-fill And Extreme Kv-cache Compression (2024) • No Venue
Goldstein et al.
Zamba: A Compact 7B SSM Hybrid Model (2024) • No Venue
Glorioso et al.
Knesset-dictabert: A Hebrew Language Model For Parliamentary Proceedings (2024) • No Venue
Gili Goldin, Shuly Wintner
Omnifusion Technical Report (2024) • No Venue
Goncharova et al.
Cogvideox: Text-to-video Diffusion Models With An Expert Transformer (2024) • No Venue
Yang et al.
Atomovideo: High Fidelity Image-to-video Generation (2024) • No Venue
Gong et al.
Specialized Language Models With Cheap Inference From Limited Domain Data (2024) • No Venue
Grangier et al.
Model Merging And Safety Alignment: One Bad Model Spoils The Bunch (2024) • No Venue
Hammoud et al.
Flex3d: Feed-forward 3D Generation With Flexible Reconstruction Model And Input View Curation (2024) • No Venue
Han et al.
Exploring Chatgpt And Its Impact On Society (2024) • AI and Ethics • 44 citations
Md. Asraful Haque, Shuai Li
Mambavision: A Hybrid Mamba-transformer Vision Backbone (2024) • No Venue
Ali Hatamizadeh, Jan Kautz
What Matters In Transformers? Not All Attention Is Needed (2024) • No Venue
He et al.
MLP-KAN: Unifying Deep Representation And Function Learning (2024) • No Venue
He et al.
Denoising Vision Transformers (2024) • No Venue
Yang et al.
Qwen2 Technical Report (2024) • No Venue
Yang et al.
Llava-gemma: Accelerating Multimodal Foundation Models With A Compact Language Model (2024) • No Venue
Hinck et al.
Block Transformer: Global-to-local Language Modeling For Fast Inference (2024) • No Venue
Ho et al.
Snapgen: Taming High-resolution Text-to-image Models For Mobile Devices With Efficient Architectures And Training (2024) • No Venue
Hu et al.
Fourier Position Embedding: Enhancing Attention's Periodic Extension For Length Generalization (2024) • No Venue
Hua et al.
Deciphering Cross-modal Alignment In Large Vision-language Models With Modality Integration Rate (2024) • No Venue
Huang et al.
Mh-moe: Multi-head Mixture-of-experts (2024) • No Venue
Huang et al.
Piccolo2: General Text Embedding With Multi-task Hybrid Loss Training (2024) • No Venue
Huang et al.
Ultra-sparse Memory Network (2024) • No Venue
Huang et al.
Transformerfam: Feedback Attention Is Working Memory (2024) • No Venue
Hwang et al.
Mixture Of Nested Experts: Adaptive Processing Of Visual Tokens (2024) • No Venue
Jain et al.
Mixtral Of Experts (2024) • No Venue
Jiang et al.
Moh: Multi-head Attention As Mixture-of-head Attention (2024) • No Venue
Jin et al.
Pyramidal Flow Matching For Efficient Video Generative Modeling (2024) • No Venue
Jin et al.
Pegasus-v1 Technical Report (2024) • No Venue
Jung et al.
Adaptive Caching For Faster Video Generation With Diffusion Transformers (2024) • No Venue
Kahatapitiya et al.
Xgen-mm (BLIP-3): A Family Of Open Large Multimodal Models (2024) • No Venue
Xue et al.
Openmoe: An Early Effort On Open Mixture-of-experts Language Models (2024) • No Venue
Xue et al.
Building And Better Understanding Vision-language Models: Insights And Future Directions (2024) • No Venue
Laurençon et al.
What Matters When Building Vision-language Models? (2024) • No Venue
Laurençon et al.
Moai: Mixture Of All Intelligence For Large Language And Vision Models (2024) • No Venue
Lee et al.
Ootdiffusion: Outfitting Fusion Based Latent Diffusion For Controllable Virtual Try-on (2024) • No Venue
Xu et al.
Clip-moe: Towards Building Mixture Of Experts For CLIP With Diversified Multiplet Upcycling (2024) • No Venue
Zhang et al.
Beyond A*: Better Planning With Transformers Via Search Dynamics Bootstrapping (2024) • No Venue
Lehnert et al.
Slowfast-llava: A Strong Training-free Baseline For Video Large Language Models (2024) • No Venue
Xu et al.
Selective Attention Improves Transformer (2024) • No Venue
Yaniv Leviathan, Matan Kalman, Yossi Matias
Aria: An Open Multimodal Native Mixture-of-experts Model (2024) • No Venue
Li et al.
Codes: Towards Building Open-source Language Models For Text-to-sql (2024) • Proceedings of the ACM on Management of Data • 44 citations
Li et al.
Focusllm: Scaling Llm's Context By Parallel Decoding (2024) • No Venue
Li et al.
Exploring The Potential Of Large Language Models In Self-adaptive Systems (2024) • 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI) • 47 citations
Li et al.
Euclid: Supercharging Multimodal Llms With Synthetic High-fidelity Visual Descriptions (2024) • No Venue
Zhang et al.
Mix-ln: Unleashing The Power Of Deeper Layers By Combining Pre-ln And Post-ln (2024) • No Venue
Pengxiang Li, Lu Yin, Shiwei Liu
Making Text Embedders Few-shot Learners (2024) • No Venue
Li et al.
Omg-seg: Is One Model Good Enough For All Segmentation? (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Scaling (down) CLIP: A Comprehensive Analysis Of Data, Architecture, And Training Strategies (2024) • No Venue
Zichao Li, Cihang Xie, Ekin Dogus Cubuk
Synergen-vl: Towards Synergistic Image Understanding And Generation With Vision Experts And Token Folding (2024) • No Venue
Li et al.
Your Mixture-of-experts LLM Is Secretly An Embedding Model For Free (2024) • No Venue
Ziyue Li, Tianyi Zhou
CLAY: A Controllable Large-scale Generative Model For Creating High-quality 3D Assets (2024) • ACM Transactions on Graphics • 49 citations
Zhang et al.
Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models (2024) • No Venue
Zhang et al.
Jamba: A Hybrid Transformer-mamba Language Model (2024) • No Venue
Lieber et al.
Multi-layer Transformers Gradient Can Be Approximated In Almost Linear Time (2024) • No Venue
Liang et al.
Mixture-of-transformers: A Sparse And Scalable Architecture For Multi-modal Foundation Models (2024) • No Venue
Liang et al.
Foundation Models For Time Series Analysis: A Tutorial And Survey (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 100 citations
Liang et al.
GS-LRM: Large Reconstruction Model For 3D Gaussian Splatting (2024) • No Venue
Zhang et al.
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series (2024) • No Venue
Zhang et al.
Open-sora Plan: Open-source Large Video Generation Model (2024) • No Venue
Lin et al.
Moma: Efficient Early-fusion Pre-training With Mixture Of Modality-aware Experts (2024) • No Venue
Lin et al.
Moe-llava: Mixture Of Experts For Large Vision-language Models (2024) • No Venue
Lin et al.
STIV: Scalable Text And Image Conditioned Video Generation (2024) • No Venue
Lin et al.
RWKV: Reinventing Rnns For The Transformer Era (2023) • No Venue
Peng et al.
Generative Agents: Interactive Simulacra Of Human Behavior (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 941 citations
Park et al.
BEST: BERT Pre-training For Sign Language Recognition With Coupling Tokenization (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Zhao et al.
Stemgen: A Music Generation Model That Listens (2023) • No Venue
Parker et al.
Git-mol: A Multi-modal Large Language Model For Molecular Science With Graph, Image, And Text (2023) • Computers in Biology and Medicine • 48 citations
Liu et al.
Fm-vit: Flexible Modal Vision Transformers For Face Anti-spoofing (2023) • IEEE Transactions on Information Forensics and Security • 70 citations
Liu et al.
Multi-task Recommendations With Reinforcement Learning (2023) • IEEE Transactions on Image Processing • 53 citations
Liu et al.
Retentive Network: A Successor To Transformer For Large Language Models (2023) • No Venue
Sun et al.
X-former: In-memory Acceleration Of Transformers (2023) • IEEE Transactions on Very Large Scale Integration (VLSI) Systems • 45 citations
Sridharan et al.
Unified-io 2: Scaling Autoregressive Multimodal Models With Vision, Language, Audio, And Action (2023) • No Venue
Lu et al.
Mossformer: Pushing The Performance Limit Of Monaural Speech Separation Using Gated Single-head Transformer With Convolution-augmented Joint Self-attentions (2023) • ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 47 citations
Shengkui Zhao, Bin Ma
Kosmos-2.5: A Multimodal Literate Model (2023) • No Venue
Lv et al.
A Transformer-based Model With Self-distillation For Multimodal Emotion Recognition In Conversations (2023) • IEEE Transactions on Multimedia • 71 citations
Ma et al.
Deepcache: Accelerating Diffusion Models For Free (2023) • No Venue
Xinyin Ma, Gongfan Fang, Xinchao Wang
Remote Sensing Change Detection With Transformers Trained From Scratch (2023) • IEEE Transactions on Geoscience and Remote Sensing • 66 citations
Noman et al.
HOICLIP: Efficient Knowledge Transfer For HOI Detection With Vision-language Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Ning et al.
On Decoder-only Architecture For Speech-to-text And Large Language Model Integration (2023) • 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 52 citations
Wu et al.
Freecontrol: Training-free Spatial Control Of Any Text-to-image Diffusion Model With Any Condition (2023) • No Venue
Mo et al.
Next-gpt: Any-to-any Multimodal LLM (2023) • No Venue
Wu et al.
Tinystories: How Small Can Language Models Be And Still Speak Coherent English? (2023) • No Venue
Ronen Eldan, Yuanzhi Li
Missrec: Pre-training And Transferring Multi-modal Interest-aware Sequence Representation For Recommendation (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 45 citations
Wang et al.
GQA: Training Generalized Multi-query Transformer Models From Multi-head Checkpoints (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 204 citations
Ainslie et al.
Santacoder: Don't Reach For The Stars! (2023) • Arxiv • 50 citations
Allal et al.
Docllm: A Layout-aware Generative Language Model For Multimodal Document Understanding (2023) • No Venue
Wang et al.
Bitnet: Scaling 1-bit Transformers For Large Language Models (2023) • No Venue
Wang et al.
Codet5+: Open Code Large Language Models For Code Understanding And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 215 citations
Wang et al.
Fusionframes: Efficient Architectural Aspects For Text-to-video Generation Pipeline (2023) • No Venue
Arkhipkin et al.
Kandinsky 3.0 Technical Report (2023) • No Venue
Arkhipkin et al.
Foundational Models Defining A New Era In Vision: A Survey And Outlook (2023) • Arxiv • 66 citations
Awais et al.
Chatgpt: Applications, Opportunities, And Threats (2023) • 2023 Systems and Information Engineering Design Symposium (SIEDS) • 162 citations
Bahrini et al.
Exponentially Faster Language Modelling (2023) • No Venue
Peter Belcak, Roger Wattenhofer
Rethinking Attention: Exploring Shallow Feed-forward Neural Networks As An Alternative To Attention Layers In Transformers (2023) • No Venue
Bozic et al.
Melm, A Generative Pretrained Language Modeling Framework That Solves Forward And Inverse Mechanics Problems (2023) • Journal of the Mechanics and Physics of Solids • 42 citations
Markus J. Buehler
Beyond Surface: Probing Llama Across Scales And Layers (2023) • No Venue
Chen et al.
Extending Context Window Of Large Language Models Via Positional Interpolation (2023) • No Venue
Chen et al.
Driving With Llms: Fusing Object-level Vector Modality For Explainable Autonomous Driving (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 110 citations
Chen et al.
Diversevul: A New Vulnerable Source Code Dataset For Deep Learning Based Vulnerability Detection (2023) • Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses • 136 citations
Chen et al.
Pixart-α: Fast Training Of Diffusion Transformer For Photorealistic Text-to-image Synthesis (2023) • No Venue
Chen et al.
Vanillanet: The Power Of Minimalism In Deep Learning (2023) • Arxiv • 82 citations
Chen et al.
Selformer: Molecular Representation Learning Via SELFIES Language Models (2023) • Machine Learning: Science and Technology • 43 citations
Yüksel et al.
Adapointr: Diverse Point Cloud Completion With Adaptive Geometry-aware Transformers (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 79 citations
Yu et al.
Fastvit: A Fast Hybrid Vision Transformer Using Structural Reparameterization (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 69 citations
Vasu et al.
SAM-CLIP: Merging Vision Foundation Models Towards Semantic And Spatial Understanding (2023) • No Venue
Wang et al.
Attt2m: Text-driven Human Motion Generation With Multi-perspective Attention Mechanism (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Zhong et al.
Simple And Controllable Music Generation (2023) • No Venue
Copet et al.
Switchhead: Accelerating Transformers With Mixture-of-experts Attention (2023) • No Venue
Csordás et al.
Vision Grid Transformer For Document Layout Analysis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Da et al.
Patch N' Pack: Navit, A Vision Transformer For Any Aspect Ratio And Resolution (2023) • No Venue
Dehghani et al.
Longnet: Scaling Transformers To 1,000,000,000 Tokens (2023) • No Venue
Ding et al.
Towards Accurate Post-training Quantization For Vision Transformer (2023) • MM '22: The 30th ACM International Conference on Multimedia • 44 citations
Ding et al.
Lumos: Learning Agents With Unified Data, Modular Design, And Open-source Llms (2023) • No Venue
Yin et al.
Detecting And Grounding Multi-modal Media Manipulation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Rui Shao, Tianxing Wu, Ziwei Liu
RMT: Retentive Networks Meet Vision Transformers (2023) • No Venue
Fan et al.
Gloss-free Sign Language Translation: Improving From Visual-language Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 47 citations
Zhou et al.
Generative Pre-trained Transformer: A Comprehensive Review On Enabling Technologies, Potential Applications, Emerging Challenges, And Future Directions (2023) • IEEE Access • 375 citations
Yenduri et al.
Revolutionizing Cyber Threat Detection With Large Language Models: A Privacy-preserving Bert-based Lightweight Model For Iot/iiot Devices (2023) • IEEE Access • 157 citations
Ferrag et al.
Attentionviz: A Global View Of Transformer Attention (2023) • IEEE Transactions on Visualization and Computer Graphics • 44 citations
Yeh et al.
Qmoe: Practical Sub-1-bit Compression Of Trillion-parameter Models (2023) • No Venue
Elias Frantar, Dan Alistarh
Datacomp: In Search Of The Next Generation Of Multimodal Datasets (2023) • Arxiv • 72 citations
Gadre et al.
Ureader: Universal Ocr-free Visually-situated Language Understanding With Multimodal Large Language Model (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 46 citations
Ye et al.
Mplug-owl2: Revolutionizing Multi-modal Large Language Model With Modality Collaboration (2023) • No Venue
Ye et al.
Vampnet: Music Generation Via Masked Acoustic Token Modeling (2023) • No Venue
Garcia et al.
Ip-adapter: Text Compatible Image Prompt Adapter For Text-to-image Diffusion Models (2023) • No Venue
Ye et al.
Composable Function-preserving Expansions For Transformer Architectures (2023) • No Venue
Andrea Gesmundo, Kaitlin Maile
Flacuna: Unleashing The Problem Solving Power Of Vicuna Using FLAN Fine-tuning (2023) • No Venue
Ghosal et al.
Emu Video: Factorizing Text-to-video Generation By Explicit Image Conditioning (2023) • No Venue
Girdhar et al.
Text With Knowledge Graph Augmented Transformer For Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Gu et al.
Mamba: Linear-time Sequence Modeling With Selective State Spaces (2023) • No Venue
Albert Gu, Tri Dao
Large Language Models To Identify Social Determinants Of Health In Electronic Health Records (2023) • npj Digital Medicine • 184 citations
Guevara et al.
Enhancing Dyadic Relations With Homogeneous Graphs For Multimodal Recommendation (2023) • Frontiers in Artificial Intelligence and Applications • 45 citations
Zhou et al.
A Complete Survey On Generative AI (AIGC): Is Chatgpt From GPT-4 To GPT-5 All You Need? (2023) • Arxiv • 101 citations
Zhang et al.
Stylegan-t: Unlocking The Power Of Gans For Fast Large-scale Text-to-image Synthesis (2023) • Arxiv • 59 citations
Sauer et al.
Fastervit: Fast Vision Transformers With Hierarchical Attention (2023) • No Venue
Hatamizadeh et al.
LRM: Large Reconstruction Model For Single Image To 3D (2023) • No Venue
Hong et al.
Octopus: Embodied Vision-language Programmer From Environmental Feedback (2023) • No Venue
Yang et al.
Enhancing Phenotype Recognition In Clinical Notes Using Large Language Models: Phenobcbert And Phenogpt (2023) • Patterns • 45 citations
Yang et al.
Knowing Where To Focus: Event-aware Transformer For Video Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Jang et al.
One Wide Feedforward Is All You Need (2023) • No Venue
Pires et al.
End-to-end Speech Recognition: A Survey (2023) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 131 citations
Prabhavalkar et al.
From Sparse To Soft Mixtures Of Experts (2023) • No Venue
Puigcerver et al.
GPT-4 Enhanced Multimodal Grounding For Autonomous Driving: Leveraging Cross-modal Attention With Large Language Models (2023) • Communications in Transportation Research • 57 citations
Liao et al.
Dc-former: Diverse And Compact Transformer For Person Re-identification (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 53 citations
Li et al.
A Comparative Study Of Pretrained Language Models For Long Clinical Text (2023) • Journal of the American Medical Informatics Association • 82 citations
Li et al.
Graphix-t5: Mixing Pre-trained Transformers With Graph-aware Layers For Text-to-sql Parsing (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 51 citations
Li et al.
RESDSQL: Decoupling Schema Linking And Skeleton Parsing For Text-to-sql (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 96 citations
Li et al.
Otterhd: A High-resolution Multi-modality Model (2023) • No Venue
Li et al.
Fatezero: Fusing Attentions For Zero-shot Text-based Video Editing (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 141 citations
Qi et al.
Multi: Efficient Video-and-language Understanding With Text-guided Multiway-sampler And Multiple Choice Modeling (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 50 citations
Xu et al.
Scaling Up Gans For Text-to-image Synthesis (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 313 citations
Kang et al.
VILA: Learning Image Aesthetics From User Comments With Vision-language Pretraining (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Ke et al.
SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023) • No Venue
Kim et al.
A Transformer-based Representation-learning Model With Unified Processing Of Multimodal Input For Clinical Diagnostics (2023) • Nature Biomedical Engineering • 223 citations
Zhou et al.
Kandinsky: An Improved Text-to-image Synthesis With Image Prior And Latent Diffusion (2023) • No Venue
Razzhigaev et al.
RAPHAEL: Text-to-image Generation Via Large Mixture Of Diffusion Paths (2023) • Arxiv • 41 citations
Xue et al.
Videopoet: A Large Language Model For Zero-shot Video Generation (2023) • No Venue
Kondratyuk et al.
Xuanyuan 2.0: A Large Chinese Financial Chat Model With Hundreds Of Billions Parameters (2023) • Proceedings of the 32nd ACM International Conference on Information and Knowledge Management • 57 citations
Xuanyu Zhang, Qing Yang, Dongliang Xu
Filter-enhanced MLP Is All You Need For Sequential Recommendation (2022) • Proceedings of the ACM Web Conference 2022 • 283 citations
Zhou et al.
Accelerating Attention Through Gradient-based Learned Runtime Pruning (2022) • Proceedings of the 49th Annual International Symposium on Computer Architecture • 42 citations
Li et al.
Dit: Self-supervised Pre-training For Document Image Transformer (2022) • MM '22: The 30th ACM International Conference on Multimedia • 118 citations
Li et al.
A Unified Understanding Of Deep NLP Models For Text Classification (2022) • IEEE Transactions on Visualization and Computer Graphics • 42 citations
Li et al.
Exploring Plain Vision Transformer Backbones For Object Detection (2022) • Lecture Notes in Computer Science • 556 citations
Li et al.
Rethinking Query-key Pairwise Interactions In Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 194 citations
Cheng Li, Yangxin Liu
Vision Transformers Are Parameter-efficient Audio-visual Learners (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Lin et al.
Mmlatch: Bottom-up Top-down Fusion For Multimodal Sentiment Analysis (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 43 citations
Georgios Paraskevopoulos, Efthymios Georgiou, Alexandros Potamianos
Pushing The Limits Of Simple Pipelines For Few-shot Learning: External Data And Fine-tuning Make A Difference (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 148 citations
Hu et al.
A Systematic Review And Replicability Study Of Bert4rec For Sequential Recommendation (2022) • Proceedings of the 16th ACM Conference on Recommender Systems • 40 citations
Aleksandr Petrov, Craig MacDonald
Scalablevit: Rethinking The Context-oriented Generalization Of Vision Transformer (2022) • Lecture Notes in Computer Science • 41 citations
Yang et al.
Tubedetr: Spatio-temporal Video Grounding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Yang et al.
Multi-behavior Hypergraph-enhanced Transformer For Sequential Recommendation (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 130 citations
Yang et al.
Sequencer: Deep LSTM For Image Classification (2022) • Arxiv • 53 citations
Yuki Tatsunami, Masato Taki
Scaling Up Models And Data With t5x And seqio (2022) • Arxiv • 47 citations
Roberts et al.
Codefill: Multi-token Code Completion By Jointly Learning From Structure And Naming Sequences (2022) • ICSE '22: 44th International Conference on Software Engineering • 67 citations
Maliheh Izadi, Roberta Gismondi, Georgios Gousios
Simpleclick: Interactive Image Segmentation With Simple Vision Transformers (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 84 citations
Liu et al.
Ts2-net: Token Shift And Selection Transformer For Text-video Retrieval (2022) • Lecture Notes in Computer Science • 97 citations
Liu et al.
Asymmetric Cross-scale Alignment For Text-based Person Search (2022) • IEEE Transactions on Multimedia • 54 citations
Ji et al.
Adamct: Adaptive Mixture Of Cnn-transformer For Sequential Recommendation (2022) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 43 citations
Jiang et al.
Pseudo-q: Generating Pseudo Language Queries For Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Jiang et al.
Vit5: Pretrained Text-to-text Transformer For Vietnamese Language Generation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop • 51 citations
Phan et al.
DRT: A Lightweight Single Image Deraining Recursive Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 60 citations
Yuanchu Liang, Saeed Anwar, Yang Liu
Local-global Context Aware Transformer For Language-guided Video Segmentation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 72 citations
Liang et al.
MSTR: Multi-scale Transformer For End-to-end Human-object Interaction Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Kim et al.
Vision-language Pre-training For Multimodal Aspect-based Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 103 citations
Yan Ling, Jianfei Yu, Rui Xia
Multi-agent Reinforcement Learning Is A Sequence Modeling Problem (2022) • Arxiv • 78 citations
Wen et al.
Leveraging Language Foundation Models For Human Mobility Forecasting (2022) • Proceedings of the 30th International Conference on Advances in Geographic Information Systems • 48 citations
Hao Xue, Bhanu Prakash Voutharoja, Flora D. Salim
Groupvit: Semantic Segmentation Emerges From Text Supervision (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 352 citations
Xu et al.
Deepspeed-moe: Advancing Mixture-of-experts Inference And Training To Power Next-generation AI Scale (2022) • Arxiv • 55 citations
Rajbhandari et al.
Transpolymer: A Transformer-based Language Model For Polymer Property Predictions (2022) • npj Computational Materials • 120 citations
Changwen Xu, Yuyang Wang, Amir Barati Farimani
Coderl: Mastering Code Generation Through Pretrained Models And Deep Reinforcement Learning (2022) • Arxiv • 87 citations
Le et al.
Magicvideo: Efficient Video Generation With Latent Diffusion Models (2022) • Arxiv • 63 citations
Zhou et al.
Training-free Transformer Architecture Search (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Zhou et al.
Mgpt: Few-shot Learners Go Multilingual (2022) • Arxiv • 47 citations
Shliazhko et al.
Transformers In Time-series Analysis: A Tutorial (2022) • Circuits, Systems, and Signal Processing • 192 citations
Ahmed et al.
Text And Code Embeddings By Contrastive Pre-training (2022) • Arxiv • 146 citations
Neelakantan et al.
Multimodal Token Fusion For Vision Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 173 citations
Wang et al.
TEACH: Temporal Action Composition For 3D Humans (2022) • 2022 International Conference on 3D Vision (3DV) • 98 citations
Athanasiou et al.
Multimae: Multi-modal Multi-task Masked Autoencoders (2022) • Lecture Notes in Computer Science • 186 citations
Bachmann et al.
End-to-end Transformer Based Model For Image Captioning (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 107 citations
Yiyu Wang, Jungang Xu, Yingfei Sun
Improving Vision Transformers By Revisiting High-frequency Components (2022) • Lecture Notes in Computer Science • 72 citations
Bai et al.
Mult: An End-to-end Multitask Learning Transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Bhattacharjee et al.
Gpt-neox-20b: An Open-source Autoregressive Language Model (2022) • Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models • 241 citations
Black et al.
Are Transformers Effective For Time Series Forecasting? (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 1544 citations
Zeng et al.
Multimodal Contrastive Learning With Limoe: The Language-image Mixture Of Experts (2022) • Arxiv • 72 citations
Mustafa et al.
Expanding Language-image Pretrained Models For General Video Recognition (2022) • Lecture Notes in Computer Science • 221 citations
Ni et al.
Vision Transformer Slimming: Multi-dimension Searching In Continuous Optimization Space (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Chavan et al.
Activating More Pixels In Image Super-resolution Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 742 citations
Chen et al.
Dual-tasks Siamese Transformer Framework For Building Damage Assessment (2022) • IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium • 45 citations
Chen et al.
Hybrid Transformer With Multi-level Fusion For Multimodal Knowledge Graph Completion (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 152 citations
Chen et al.
Gatehub: Gated History Unit With Background Suppression For Online Action Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Chen et al.
Pali: A Jointly-scaled Multilingual Language-image Model (2022) • Arxiv • 194 citations
Chen et al.
A Unified Sequence Interface For Vision Tasks (2022) • Arxiv • 49 citations
Chen et al.
What Matters In Language Conditioned Robotic Imitation Learning Over Unstructured Data (2022) • IEEE Robotics and Automation Letters • 49 citations
Oier Mees, Lukas Hermann, Wolfram Burgard
Locating And Editing Factual Associations In GPT (2022) • Arxiv • 172 citations
Meng et al.
Make-a-video: Text-to-video Generation Without Text-video Data (2022) • Arxiv • 308 citations
Singer et al.
Revisiting Classifier: Transferring Vision-language Models For Video Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Wenhao Wu, Zhun Sun, Wanli Ouyang
Task Adaptive Parameter Sharing For Multi-task Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Wallingford et al.
Vista: Vision And Scene Text Aggregation For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Cheng et al.
Phenaki: Variable Length Video Generation From Open Domain Textual Description (2022) • Arxiv • 78 citations
Villegas et al.
Tableformer: Table Structure Understanding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Nassar et al.
Cross-attention Of Disentangled Modalities For 3D Human Mesh Recovery With Transformers (2022) • Lecture Notes in Computer Science • 88 citations
Junhyeong Cho, Kim Youwang, Tae-Hyun Oh
No Language Left Behind: Scaling Human-centered Machine Translation (2022) • Arxiv • 354 citations
NLLB Team et al.
St-moe: Designing Stable And Transferable Sparse Expert Models (2022) • Arxiv • 43 citations
Zoph et al.
Scaling Autoregressive Models For Content-rich Text-to-image Generation (2022) • Arxiv • 339 citations
Yu et al.
Coca: Contrastive Captioners Are Image-text Foundation Models (2022) • Arxiv • 512 citations
Yu et al.
Simplified State Space Layers For Sequence Modeling (2022) • Arxiv • 76 citations
Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman
Stripformer: Strip Transformer For Fast Image Deblurring (2022) • Lecture Notes in Computer Science • 155 citations
Tsai et al.
Visual Speech Recognition For Multiple Languages In The Wild (2022) • Nature Machine Intelligence • 130 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Are Multimodal Transformers Robust To Missing Modality? (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ma et al.
Bootstrapped Masked Autoencoders For Vision BERT Pretraining (2022) • Lecture Notes in Computer Science • 50 citations
Dong et al.
Coarse-to-fine Vision-language Pre-training With Fusion In The Backbone (2022) • Arxiv • 67 citations
Dou et al.
Vitcod: Vision Transformer Acceleration Via Dedicated Algorithm And Accelerator Co-design (2022) • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) • 83 citations
You et al.
Castling-vit: Compressing Self-attention Via Switching Towards Linear-angular Attention At Vision Transformer Inference (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
You et al.
A Text Attention Network For Spatial Deformation Robust Scene Text Image Super-resolution (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 86 citations
Jianqi Ma, Zhetong Liang, Lei Zhang
Unifying Vision, Text, And Layout For Universal Document Processing (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Tang et al.
Minedojo: Building Open-ended Embodied Agents With Internet-scale Knowledge (2022) • Arxiv • 59 citations
Fan et al.
DR-GAN: Distribution Regularization For Text-to-image Generation (2022) • IEEE Transactions on Neural Networks and Learning Systems • 43 citations
Tan et al.
BSRT: Improving Burst Super-resolution With Swin Transformer And Flow-guided Deformable Alignment (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 51 citations
Luo et al.
Sparsett: Visual Tracking With Sparse Transformers (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 121 citations
Fu et al.
End-to-end Generative Pretraining For Multimodal Video Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Seo et al.
Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022) • Interspeech 2022 • 73 citations
Gao et al.
Mixture-of-experts With Expert Choice Routing (2022) • Arxiv • 57 citations
Zhou et al.
Linkbert: Pretraining Language Models With Document Links (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 163 citations
Michihiro Yasunaga, Jure Leskovec, Percy Liang
Vision Transformer With Deformable Attention (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 649 citations
Xia et al.
Future Transformer For Long-term Action Anticipation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Gong et al.
Wave-vit: Unifying Wavelet And Transformers For Visual Representation Learning (2022) • Lecture Notes in Computer Science • 145 citations
Yao et al.
Unixcoder: Unified Cross-modal Pre-training For Code Representation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 448 citations
Guo et al.
Efficient Long-range Attention Network For Image Super-resolution (2022) • Lecture Notes in Computer Science • 386 citations
Zhang et al.
Incorporating Dynamic Semantics Into Pre-trained Language Model For Aspect-based Sentiment Analysis (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 49 citations
Zhang et al.
Laneformer: Object-aware Row-column Transformers For Lane Detection (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 54 citations
Han et al.
Storseismic: A New Paradigm In Deep Learning For Seismic Processing (2022) • IEEE Transactions on Geoscience and Remote Sensing • 49 citations
Randy Harsuko, Tariq Alkhalifah
Neighborhood Attention Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 311 citations
Hassani et al.
UL2: Unifying Language Learning Paradigms (2022) • Arxiv • 97 citations
Tay et al.
Retromae: Pre-training Retrieval-oriented Language Models Via Masked Auto-encoder (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 41 citations
Xiao et al.
Vitaev2: Vision Transformer Advanced By Exploring Inductive Bias For Image Recognition And Beyond (2022) • International Journal of Computer Vision • 173 citations
Zhang et al.
Global Pointer: Novel Efficient Span-based Approach For Named Entity Recognition (2022) • Arxiv • 63 citations
Su et al.
Branchformer: Parallel Mlp-attention Architectures To Capture Local And Global Context For Speech Recognition And Understanding (2022) • Arxiv • 40 citations
Peng et al.
Towards Universal Sequence Representation Learning For Recommender Systems (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 194 citations
Hou et al.
Hit: Hierarchical Transformer With Momentum Contrast For Video-text Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Liu et al.
End-to-end Neural Diarization: From Transformer To Conformer (2021) • IEEE Transactions on Image Processing • 210 citations
Liu et al.
Unit: Multimodal Multitask Learning With A Unified Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 224 citations
Ronghang Hu, Amanpreet Singh
An Efficient Transformer Decoder With Compressed Sub-layers (2021) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 46 citations
Li et al.
Efficient Attentions For Long Document Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 121 citations
Huang et al.
Generating Syntactically Controlled Paraphrases Without Using Annotated Parallel Pairs (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 52 citations
Kuan-Hao Huang, Kai-Wei Chang
Graph-enhanced Multi-task Learning Of Multi-level Transition Dynamics For Session-based Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 95 citations
Huang et al.
Hierarchical Learning For Generation With Long Source Sequences (2021) • Arxiv • 41 citations
Tobias Rohde, Xiaoxia Wu, Yinhan Liu
Transformer-based Attention Networks For Continuous Pixel-wise Prediction (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 170 citations
Yang et al.
Random Feature Attention (2021) • Arxiv • 122 citations
Peng et al.
CAT: Cross-attention Transformer For One-shot Object Detection (2021) • 2022 IEEE International Conference on Multimedia and Expo (ICME) • 173 citations
Lin et al.
Swinbert: End-to-end Transformers With Sparse Attention For Video Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 208 citations
Lin et al.
Relaxed Transformer Decoders For Direct Action Proposal Generation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 153 citations
Tan et al.
Hash Layers For Large Sparse Models (2021) • Arxiv • 48 citations
Roller et al.
TVT: Transferable Vision Transformer For Unsupervised Domain Adaptation (2021) • 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) • 113 citations
Yang et al.
Perceiver IO: A General Architecture For Structured Inputs & Outputs (2021) • Arxiv • 205 citations
Jaegle et al.
Sparse DETR: Efficient End-to-end Object Detection With Learnable Sparsity (2021) • Arxiv • 89 citations
Roh et al.
Causal Attention For Vision-language Tasks (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Yang et al.
Offline Reinforcement Learning As One Big Sequence Modeling Problem (2021) • Arxiv • 41 citations
Michael Janner, Qiyang Li, Sergey Levine
High-resolution Image Synthesis With Latent Diffusion Models (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 10513 citations
Rombach et al.
A Survey Of Visual Transformers (2021) • IEEE Transactions on Neural Networks and Learning Systems • 285 citations
Liu et al.
Visual Saliency Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 429 citations
Liu et al.
Multimodal Emergent Fake News Detection Via Meta Neural Process Networks (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 51 citations
Wang et al.
Open-domain, Content-based, Multi-modal Fact-checking Of Out-of-context Images Via Online Resources (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
Natural Language Understanding For Argumentative Dialogue Systems In The Opinion Building Domain (2021) • Knowledge-Based Systems • 41 citations
Abro et al.
GODIVA: Generating Open-domain Videos From Natural Descriptions (2021) • Arxiv • 78 citations
Wu et al.
Retrieval Augmentation Reduces Hallucination In Conversation (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 370 citations
Shuster et al.
PASTE: A Tagging-free Decoding Framework Using Pointer Networks For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 49 citations
Mukherjee et al.
Docformer: End-to-end Transformer For Document Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Appalaraju et al.
NÜWA: Visual Synthesis Pre-training For Neural Visual World Creation (2021) • Lecture Notes in Computer Science • 119 citations
Wu et al.
Variational Transformer Networks For Layout Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Diego Martin Arroyo, Janis Postels, Federico Tombari
Pale Transformer: A General Vision Transformer Backbone With Pale-shaped Attention (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 48 citations
Wu et al.
Vision Transformer For Fast And Efficient Scene Text Recognition (2021) • Lecture Notes in Computer Science • 147 citations
Rowel Atienza
Describing And Localizing Multiple Changes With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 76 citations
Qiu et al.
Fast End-to-end Speech Recognition Via Non-autoregressive Models And Cross-modal Knowledge Transferring From BERT (2021) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 49 citations
Bai et al.
Syntax-bert: Improving Pre-trained Transformers With Syntax Trees (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 40 citations
Bai et al.
Working Memory Connections For LSTM (2021) • Neural Networks • 220 citations
Landi et al.
Transformer Meets Tracker: Exploiting Temporal Context For Robust Visual Tracking (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 707 citations
Wang et al.
Vlmo: Unified Vision-language Pre-training With Mixture-of-modality-experts (2021) • Arxiv • 288 citations
Bao et al.
Cliport: What And Where Pathways For Robotic Manipulation (2021) • Arxiv • 98 citations
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Multi-modal Sarcasm Detection And Humor Classification In Code-mixed Conversations (2021) • IEEE Transactions on Affective Computing • 63 citations
Bedi et al.
Data Expansion Using Back Translation And Paraphrasing For Hate Speech Detection (2021) • Arxiv • 73 citations
Djamila Romaissa Beddiar, Md Saroar Jahan, Mourad Oussalah
Fastformer: Additive Attention Can Be All You Need (2021) • Arxiv • 77 citations
Wu et al.
Lambdanetworks: Modeling Long-range Interactions Without Attention (2021) • Arxiv • 48 citations
Irwan Bello
Keyword Transformer: A Self-attention Model For Keyword Spotting (2021) • Interspeech 2021 • 98 citations
Axel Berg, Mark O'Connor, Miguel Tairum Cruz
Pimnet: A Parallel, Iterative And Mimicking Network For Scene Text Recognition (2021) • MM '21: ACM Multimedia Conference • 69 citations
Qiao et al.
Codet5: Identifier-aware Unified Pre-trained Encoder-decoder Models For Code Understanding And Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 1009 citations
Wang et al.
Efficient Conformer: Progressive Downsampling And Grouped Attention For Automatic Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 68 citations
Maxime Burchi, Valentin Vielzeuf
On Pursuit Of Designing Multi-modal Transformer For Video Grounding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Cao et al.
Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Miech et al.
Generic Attention-model Explainability For Interpreting Bi-modal And Encoder-decoder Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 207 citations
Hila Chefer, Shir Gur, Lior Wolf
Mobile-former: Bridging Mobilenet And Transformer (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 514 citations
Chen et al.
Visualgpt: Data-efficient Adaptation Of Pretrained Language Models For Image Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 133 citations
Chen et al.
Rethinking Lifelong Sequential Recommendation With Incremental Multi-interest Attention (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 284 citations
Wu et al.
Zero-shot Text-to-image Generation (2021) • Arxiv • 1118 citations
Ramesh et al.
Adapting GPT, GPT-2 And BERT Language Models For Speech Recognition (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 56 citations
Xianrui Zheng, Chao Zhang, Philip C. Woodland
Improving Video-text Retrieval By Multi-stream Corpus Alignment And Dual Softmax Loss (2021) • Arxiv • 66 citations
Cheng et al.
Deepcad: A Deep Generative Network For Computer-aided Design Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 112 citations
Rundi Wu, Chang Xiao, Changxi Zheng
Tokens-to-token Vit: Training Vision Transformers From Scratch On Imagenet (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 1687 citations
Yuan et al.
Gaze Estimation Using Transformer (2021) • 2022 26th International Conference on Pattern Recognition (ICPR) • 111 citations
Yihua Cheng, Feng Lu
Incorporating Convolution Designs Into Visual Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 434 citations
Yuan et al.
Bartscore: Evaluating Generated Text As Text Generation (2021) • Arxiv • 318 citations
Weizhe Yuan, Graham Neubig, Pengfei Liu
Sentence-t5: Scalable Sentence Encoders From Pre-trained Text-to-text Models (2021) • Arxiv • 62 citations
Ni et al.
Contextual Transformer Networks For Visual Recognition (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 534 citations
Li et al.
Diverse Part Discovery: Occluded Person Re-identification With Part-aware Transformer (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Li et al.
A Channel Coding Benchmark For Meta-learning (2021) • Interspeech 2021 • 57 citations
Li et al.
Evaluation Of BERT And ALBERT Sentence Embedding Performance On Downstream NLP Tasks (2021) • 2020 25th International Conference on Pattern Recognition (ICPR) • 115 citations
Choi et al.
Directed Acyclic Graph Network For Conversational Emotion Recognition (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 245 citations
Shen et al.
CANINE: Pre-training An Efficient Tokenization-free Encoder For Language Representation (2021) • Transactions of the Association for Computational Linguistics • 114 citations
Clark et al.
BERT-GT: Cross-sentence N-ary Relation Extraction With BERT And Graph Transformer (2021) • Bioinformatics • 49 citations
Po-Ting Lai, Zhiyong Lu
E2E-VLP: End-to-end Vision-language Pre-training Enhanced By Visual Learning (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 78 citations
Xu et al.
Multimodal End-to-end Sparse Model For Emotion Recognition (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 70 citations
Dai et al.
Point-bert: Pre-training 3D Point Cloud Transformers With Masked Point Modeling (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 565 citations
Yu et al.
Scheduled Sampling In Vision-language Pretraining With Decoupled Encoder-decoder Network (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Li et al.
Entity Structure Within And Throughout: Modeling Mention Dependencies For Document-level Relation Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 167 citations
Xu et al.
The NLP Cookbook: Modern Recipes For Transformer Based Deep Learning Architectures (2021) • IEEE Access • 121 citations
Sushant Singh, Ausif Mahmood
Luna: Linear Unified Nested Attention (2021) • Arxiv • 49 citations
Ma et al.
Scaling End-to-end Models For Large-scale Multilingual ASR (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 46 citations
Li et al.
Peco: Perceptual Codebook For BERT Pre-training Of Vision Transformers (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 83 citations
Dong et al.
Cswin Transformer: A General Vision Transformer Backbone With Cross-shaped Windows (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 957 citations
Dong et al.
Deltalm: Encoder-decoder Pre-training For Language Generation And Translation By Augmenting Pretrained Multilingual Encoders (2021) • Arxiv • 51 citations
Ma et al.
An Empirical Study Of Training End-to-end Vision-and-language Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Dou et al.
Dytox: Transformers For Continual Learning With Dynamic Token Expansion (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 256 citations
Douillard et al.
GLM: General Language Model Pretraining With Autoregressive Blank Infilling (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 635 citations
Du et al.
Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Salvador et al.
Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) • Transactions of the Association for Computational Linguistics • 63 citations
Hendricks et al.
Object-region Video Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Herzig et al.
Unlocking Compositional Generalization In Pre-trained Models Using Intermediate Representations (2021) • Arxiv • 51 citations
Herzig et al.
Longt5: Efficient Text-to-text Transformer For Long Sequences (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 87 citations
Guo et al.
Enhancing Knowledge Tracing Via Adversarial Training (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 77 citations
Guo et al.
A Syntax-guided Edit Decoder For Neural Program Repair (2021) • Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 178 citations
Zhu et al.
CAVER: Cross-modal View-mixed Transformer For Bi-modal Salient Object Detection (2021) • IEEE Transactions on Image Processing • 138 citations
Pang et al.
Continuous 3D Multi-channel Sign Language Production Via Progressive Transformers And Mixture Density Networks (2021) • International Journal of Computer Vision • 58 citations
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Vision Transformers For Weeds And Crops Classification Of High Resolution UAV Images (2021) • Remote Sensing • 167 citations
Reedha et al.
Dual Transformer For Point Cloud Analysis (2021) • IEEE Transactions on Multimedia • 76 citations
Han et al.
Knowledge-enhanced Hierarchical Graph Transformer Network For Multi-behavior Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 179 citations
Xia et al.
Open-book Video Captioning With Retrieve-copy-generate Network (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Zhang et al.
Roformer: Enhanced Transformer With Rotary Position Embedding (2021) • Neurocomputing • 830 citations
Su et al.
Vitas: Vision Transformer Architecture Search (2021) • Lecture Notes in Computer Science • 54 citations
Su et al.
Transrefer3d: Entity-and-relation Aware Transformer For Fine-grained 3D Visual Grounding (2021) • MM '21: ACM Multimedia Conference • 65 citations
He et al.
Transfg: A Transformer Architecture For Fine-grained Recognition (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 315 citations
He et al.
Faceformer: Speech-driven 3D Facial Animation With Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 187 citations
Fan et al.
Cotext: Multi-task Learning With Code-text Transformer (2021) • Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021) • 67 citations
Phan et al.
Guided Generation Of Cause And Effect (2021) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 47 citations
Li et al.
Trocr: Transformer-based Optical Character Recognition With Pre-trained Models (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 247 citations
Li et al.
Ocr-free Document Understanding Transformer (2021) • Lecture Notes in Computer Science • 194 citations
Kim et al.
Bob: BERT Over BERT For Training Persona-based Dialogue Models From Limited Personalized Data (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 91 citations
Song et al.
A Practical Survey On Faster And Lighter Transformers (2021) • ACM Computing Surveys • 75 citations
Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise
Lightspeech: Lightweight And Fast Text To Speech With Neural Architecture Search (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Luo et al.
CM-NAS: Cross-modality Neural Architecture Search For Visible-infrared Person Re-identification (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 150 citations
Fu et al.
Handwritten Mathematical Expression Recognition With Bidirectionally Trained Transformer (2021) • Lecture Notes in Computer Science • 72 citations
Zhao et al.
Contextual Non-local Alignment Over Full-scale Representation For Text-based Person Search (2021) • Arxiv • 61 citations
Gao et al.
Container: Context Aggregation Network (2021) • Arxiv • 41 citations
Gao et al.
Fast Convergence Of DETR With Spatially Modulated Co-attention (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 302 citations
Gao et al.
Utnet: A Hybrid Transformer Architecture For Medical Image Segmentation (2021) • Lecture Notes in Computer Science • 356 citations
Yunhe Gao, Mu Zhou, Dimitris Metaxas
Tph-yolov5: Improved Yolov5 Based On Transformer Prediction Head For Object Detection On Drone-captured Scenarios (2021) • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 1665 citations
Zhu et al.
Topic-driven And Knowledge-aware Transformer For Dialogue Emotion Detection (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 94 citations
Zhu et al.
Cross-attention Is All You Need: Adapting Pretrained Transformers For Machine Translation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 71 citations
Mozhdeh Gheini, Xiang Ren, Jonathan May
Nested Hierarchical Transformer: Towards Accurate, Data-efficient And Interpretable Visual Understanding (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 122 citations
Zhang et al.
Multi-turn Dialogue Reading Comprehension With Pivot Turns And Knowledge (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 216 citations
Zhuosheng Zhang, Junlong Li, Hai Zhao
Styleswin: Transformer-based GAN For High-resolution Image Generation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 202 citations
Zhang et al.
Going Full-tilt Boogie On Document Understanding With Text-image-layout Transformer (2021) • Lecture Notes in Computer Science • 95 citations
Powalski et al.
KAT: A Knowledge Augmented Transformer For Vision-and-language (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 68 citations
Gui et al.
End-to-end Audio-visual Speech Recognition With Conformers (2021) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 202 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Byt5: Towards A Token-free Future With Pre-trained Byte-to-byte Models (2021) • Arxiv • 69 citations
Xue et al.
A Recurrent Vision-and-language BERT For Navigation (2020) • Arxiv • 40 citations
Hong et al.
State-of-the-art Augmented NLP Transformer Models For Direct And Single-step Retrosynthesis (2020) • Nature Communications • 289 citations
Tetko et al.
A Survey On Visual Transformer (2020) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 3049 citations
Han et al.
Progressive Transformers For End-to-end Sign Language Production (2020) • Lecture Notes in Computer Science • 84 citations
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Dynabert: Dynamic BERT With Adaptive Width And Depth (2020) • Arxiv • 119 citations
Hou et al.
Contextnet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020) • Interspeech 2020 • 244 citations
Han et al.
Low Rank Fusion Based Transformers For Multimodal Sequences (2020) • Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML) • 53 citations
Sahay et al.
GPT-GNN: Generative Pre-training Of Graph Neural Networks (2020) • Arxiv • 71 citations
Hu et al.
Encoder-decoder Models Can Benefit From Pre-trained Masked Language Models In Grammatical Error Correction (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 120 citations
Kaneko et al.
LAMBERT: Layout-aware (language) Modeling For Information Extraction (2020) • Lecture Notes in Computer Science • 84 citations
Garncarek et al.
Informer: Beyond Efficient Transformer For Long Sequence Time-series Forecasting (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 3884 citations
Zhou et al.
Heterogeneous Graph Transformer (2020) • WWW '20: The Web Conference 2020 • 949 citations
Hu et al.
Multi-dialect Arabic BERT For Country-level Dialect Identification (2020) • Arxiv • 45 citations
Talafha et al.
Hybrid Ranking Network For Text-to-sql (2020) • Arxiv • 50 citations
Lyu et al.
Tabert: Pretraining For Joint Understanding Of Textual And Tabular Data (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 380 citations
Yin et al.
Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020) • Arxiv • 46 citations
Zhang et al.
Towards Automated Neural Interaction Discovery For Click-through Rate Prediction (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 61 citations
Song et al.
Realformer: Transformer Likes Residual Attention (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
He et al.
Conformer: Convolution-augmented Transformer For Speech Recognition (2020) • Interspeech 2020 • 1880 citations
Gulati et al.
Autostr: Efficient Backbone Search For Scene Text Recognition (2020) • Lecture Notes in Computer Science • 48 citations
Zhang et al.
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 84 citations
Shi et al.
Transformer Feed-forward Layers Are Key-value Memories (2020) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 134 citations
Geva et al.
Injecting Numerical Reasoning Skills Into Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Mor Geva, Ankit Gupta, Jonathan Berant
A Probabilistic Formulation Of Unsupervised Text Style Transfer (2020) • Arxiv • 63 citations
He et al.
Interpretable Rumor Detection In Microblogs By Attending To User Interactions (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 176 citations
Khoo et al.
Deberta: Decoding-enhanced BERT With Disentangled Attention (2020) • Arxiv • 412 citations
He et al.
Contrastive Triple Extraction With Generative Transformer (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 72 citations
Ye et al.
Machine Reading Comprehension: The Role Of Contextualized Language Models And Beyond (2020) • Arxiv • 48 citations
Zhuosheng Zhang, Hai Zhao, Rui Wang
Transformer Networks For Trajectory Forecasting (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 336 citations
Giuliari et al.
Dynamic And Static Context-aware LSTM For Multi-agent Motion Prediction (2020) • Lecture Notes in Computer Science • 45 citations
Tao et al.
Show, Edit And Tell: A Framework For Editing Image Captions (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Fawaz Sammani, Luke Melas-Kyriazi
LUKE: Deep Contextualized Entity Representations With Entity-aware Self-attention (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 536 citations
Yamada et al.
Learning To Discretely Compose Reasoning Module Networks For Video Captioning (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 59 citations
Tan et al.
Adversarial Training For Aspect-based Sentiment Analysis With BERT (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 41 citations
Akbar Karimi, Leonardo Rossi, Andrea Prati
Rolling-unrolling Lstms For Action Anticipation From First-person Video (2020) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 123 citations
Antonino Furnari, Giovanni Maria Farinella
From Machine Reading Comprehension To Dialogue State Tracking: Bridging The Gap (2020) • Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI • 49 citations
Gao et al.
MTL-NAS: Task-agnostic Neural Architecture Search Towards General-purpose Multi-task Learning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Gao et al.
Spatially Aware Multimodal Transformers For Textvqa (2020) • Lecture Notes in Computer Science • 59 citations
Kant et al.
A Co-interactive Transformer For Joint Slot Filling And Intent Detection (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 89 citations
Qin et al.
Code Summarization With Structure-induced Transformer (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
Hongqiu Wu, Hai Zhao, Min Zhang
Univl: A Unified Video And Language Pre-training Model For Multimodal Understanding And Generation (2020) • Arxiv • 169 citations
Luo et al.
Sequential Latent Knowledge Selection For Knowledge-grounded Dialogue (2020) • Arxiv • 111 citations
Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
Code Prediction By Feeding Trees To Transformers (2020) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 140 citations
Kim et al.
Gshard: Scaling Giant Models With Conditional Computation And Automatic Sharding (2020) • Arxiv • 345 citations
Lepikhin et al.
Reasoning With Latent Structure Refinement For Document-level Relation Extraction (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 280 citations
Nan et al.
Robust Encodings: A Framework For Combating Adversarial Typos (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 89 citations
Jones et al.
Convbert: Improving BERT With Span-based Dynamic Convolution (2020) • Arxiv • 118 citations
Jiang et al.
A Transformer-based Approach For Source Code Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 256 citations
Ahmad et al.
ETC: Encoding Long And Structured Inputs In Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 103 citations
Ainslie et al.
Efficient Transformers: A Survey (2020) • ACM Computing Surveys • 532 citations
Tay et al.
Imagebert: Cross-modal Pre-training With Large-scale Weak-supervised Image-text Data (2020) • Arxiv • 154 citations
Qi et al.
Prophetnet: Predicting Future N-gram For Sequence-to-sequence Pre-training (2020) • Arxiv • 83 citations
Qi et al.
HHH: An Online Medical Chatbot System Based On Knowledge Graph And Hierarchical Bi-directional Attention (2020) • Proceedings of the Australasian Computer Science Week Multiconference • 47 citations
Qiming Bao, Lin Ni, Jiamou Liu
Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) • Interspeech 2020 • 57 citations
Yang et al.
Fastspeech 2: Fast And High-quality End-to-end Text To Speech (2020) • Arxiv • 514 citations
Ren et al.
TED: A Pretrained Unsupervised Summarization Model With Theme Modeling And Denoising (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 43 citations
Yang et al.
On The Comparison Of Popular End-to-end Models For Large Scale Speech Recognition (2020) • Interspeech 2020 • 122 citations
Li et al.
VD-BERT: A Unified Vision And Dialog Transformer With BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 43 citations
Wang et al.
Minilmv2: Multi-head Self-attention Relation Distillation For Compressing Pretrained Transformers (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 149 citations
Wang et al.
HAT: Hardware-aware Transformers For Efficient Natural Language Processing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Wang et al.
Heterogeneous Graph Neural Networks For Extractive Document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 270 citations
Wang et al.
Developing RNN-T Models Surpassing High-performance Hybrid Models With Customization Capability (2020) • Interspeech 2020 • 96 citations
Li et al.
Deep Entity Matching With Pre-trained Language Models (2020) • Proceedings of the VLDB Endowment • 246 citations
Li et al.
Characterbert: Reconciling Elmo And BERT For Word-level Open-vocabulary Representations From Characters (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 118 citations
Boukkouri et al.
Learning Context-aware Task Reasoning For Efficient Meta-reinforcement Learning (2020) • Proceedings of the ACM on Programming Languages • 60 citations
Haozhe Wang, Jiale Zhou, Xuming He
Aligntts: Efficient Feed-forward Text-to-speech System Without Explicit Alignment (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Zeng et al.
MART: Memory-augmented Recurrent Transformer For Coherent Video Paragraph Captioning (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 121 citations
Lei et al.
AMR Parsing Via Graph-sequence Iterative Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Deng Cai, Wai Lam
Cat-gen: Improving Robustness In NLP Models Via Controlled Adversarial Text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Wang et al.
Behind The Scene: Revealing The Secrets Of Pre-trained Vision-and-language Models (2020) • Lecture Notes in Computer Science • 43 citations
Cao et al.
Attention-based Quantum Tomography (2020) • Machine Learning: Science and Technology • 58 citations
Cha et al.
Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 68 citations
Miao et al.
Non-attentive Tacotron: Robust And Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling (2020) • Arxiv • 73 citations
Shen et al.
Does Multi-encoder Help? A Case Study On Context-aware Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 60 citations
Li et al.
Topological Planning With Transformers For Vision-and-language Navigation (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Chen et al.
Accurate Word Alignment Induction From Neural Machine Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 56 citations
Chen et al.
Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020) • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 127 citations
Chen et al.
Logical Natural Language Generation From Open-domain Tables (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 114 citations
Chen et al.
Multispeech: Multi-speaker Text To Speech With Transformer (2020) • Interspeech 2020 • 52 citations
Chen et al.
Multi-task Learning Based Pre-trained Language Model For Code Completion (2020) • ASE '20: 35th IEEE/ACM International Conference on Automated Software Engineering • 123 citations
Liu et al.
Optimus: Organizing Sentences Via Pre-trained Modeling Of A Latent Space (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 129 citations
Li et al.
Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020) • 2021 IEEE Spoken Language Technology Workshop (SLT) • 57 citations
Meng et al.
Mask Textspotter V3: Segmentation Proposal Network For Robust Scene Text Spotting (2020) • Lecture Notes in Computer Science • 191 citations
Liao et al.
Robustscanner: Dynamically Enhancing Positional Clues For Robust Text Recognition (2020) • Lecture Notes in Computer Science • 172 citations
Yue et al.
Interbert: Vision-and-language Interaction For Multi-modal Pretraining (2020) • Arxiv • 56 citations
Lin et al.
Rikinet: Reading Wikipedia Pages For Natural Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Liu et al.
Bertology Meets Biology: Interpreting Attention In Protein Language Models (2020) • Arxiv • 53 citations
Vig et al.
Document Ranking With A Pretrained Sequence-to-sequence Model (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 367 citations
Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
Jointly Cross- And Self-modal Graph Attention Network For Query-based Moment Localization (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 108 citations
Liu et al.
MEANTIME: Mixture Of Attention Mechanisms With Multi-temporal Embeddings For Sequential Recommendation (2020) • Fourteenth ACM Conference on Recommender Systems • 56 citations
Sung Min Cho, Eunhyeok Park, Sungjoo Yoo
Towards An Appropriate Query, Key, And Value Computation For Knowledge Tracing (2020) • L@S '20: Seventh (2020) ACM Conference on Learning @ Scale • 191 citations
Choi et al.
Rethinking Attention With Performers (2020) • Arxiv • 122 citations
Choromanski et al.
ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators (2020) • Arxiv • 541 citations
Clark et al.
Guiding Attention In Sequence-to-sequence Models For Dialogue Act Prediction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Colombo et al.
Multi-head Attention: Collaborate Instead Of Concatenate (2020) • Arxiv • 76 citations
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
Knowledge Graph-augmented Abstractive Summarization With Semantic-driven Cloze Reward (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 148 citations
Luyang Huang, Lingfei Wu, Lu Wang
Revisiting Pre-trained Models For Chinese Natural Language Processing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 452 citations
Cui et al.
A Systematic Assessment Of Syntactic Generalization In Neural Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 120 citations
Hu et al.
Attentional Feature Fusion (2020) • 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) • 990 citations
Dai et al.
UP-DETR: Unsupervised Pre-training For Object Detection With Transformers (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Dai et al.
Mobilebert: A Compact Task-agnostic BERT For Resource-limited Devices (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 593 citations
Sun et al.
Mixup-transformer: Dynamic Data Augmentation For NLP Tasks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 72 citations
Sun et al.
A Streaming On-device End-to-end Model Surpassing Server-side Conventional Model Quality And Latency (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 200 citations
Sainath et al.
Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) • Lecture Notes in Computer Science • 78 citations
Ma et al.
Bridging Textual And Tabular Data For Cross-domain Text-to-SQL Semantic Parsing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 145 citations
Xi Victoria Lin, Richard Socher, Caiming Xiong
Layoutlmv2: Multi-modal Pre-training For Visually-rich Document Understanding (2020) • Arxiv • 57 citations
Xu et al.
Chart-to-text: Generating Natural Language Descriptions For Charts By Adapting The Transformer Model (2020) • Proceedings of the 13th International Conference on Natural Language Generation • 46 citations
Jason Obeid, Enamul Hoque
Deep Multimodal Neural Architecture Search (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 87 citations
Yu et al.
Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 46 citations
Inaguma et al.
Attention Flows: Analyzing And Comparing Attention Mechanisms In Language Models (2020) • IEEE Transactions on Visualization and Computer Graphics • 78 citations
Joseph F Derose, Jiayao Wang, Matthew Berger
Understanding And Improving Information Transfer In Multi-task Learning (2020) • Arxiv • 47 citations
Sen Wu, Hongyang R. Zhang, Christopher Ré
Object Relational Graph With Teacher-recommended Learning For Video Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 310 citations
Zhang et al.
Location-aware Graph Convolutional Networks For Video Question Answering (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 169 citations
Huang et al.
Understanding And Improving Lexical Choice In Non-autoregressive Translation (2020) • Arxiv • 44 citations
Ding et al.
Gector -- Grammatical Error Correction: Tag, Not Rewrite (2020) • Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications • 103 citations
Omelianchuk et al.
Single Headed Attention Based Sequence-to-sequence Model For State-of-the-art Results On Switchboard (2020) • Interspeech 2020 • 42 citations
Tüske et al.
Transform And Tell: Entity-aware News Image Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Alasdair Tran, Alexander Mathews, Lexing Xie
TURL: Table Understanding Through Representation Learning (2020) • Arxiv • 43 citations
Deng et al.
Durian: Duration Informed Attention Network For Multimodal Synthesis (2019) • Arxiv • 94 citations
Yu et al.
Hierarchical Temporal Convolutional Networks For Dynamic Recommender Systems (2019) • The World Wide Web Conference • 110 citations
You et al.
Self-attention Aligner: A Latency-control End-to-end Model For ASR Using Self-attention Network And Chunk-hopping (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 100 citations
Linhao Dong, Feng Wang, Bo Xu
Multimodal Transformer With Multi-view Visual Representation For Image Captioning (2019) • IEEE Transactions on Circuits and Systems for Video Technology • 358 citations
Yu et al.
Multimodal Transformer For Unaligned Multimodal Language Sequences (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1063 citations
Tsai et al.
Linguistic Knowledge And Transferability Of Contextual Representations (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 57 citations
Liu et al.
Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019) • Arxiv • 58 citations
Ren et al.
Variational Mixture-of-experts Autoencoders For Multi-modal Deep Generative Models (2019) • Arxiv • 88 citations
Shi et al.
Enriching Pre-trained Language Model With Entity Information For Relation Classification (2019) • CIKM '19: The 28th ACM International Conference on Information and Knowledge Management • 272 citations
Shanchan Wu, Yifan He
K-BERT: Enabling Language Representation With Knowledge Graph (2019) • Arxiv • 84 citations
Liu et al.
MASS: Masked Sequence To Sequence Pre-training For Language Generation (2019) • Arxiv • 579 citations
Song et al.
A Multiscale Visualization Of Attention In The Transformer Model (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 295 citations
Jesse Vig
Earlier Attention? Aspect-aware LSTM For Aspect-based Sentiment Analysis (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 50 citations
Xing et al.
Analyzing The Structure Of Attention In A Transformer Language Model (2019) • Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP • 123 citations
Jesse Vig, Yonatan Belinkov
Automatic Radiology Report Generation Based On Multi-view Image Fusion And Medical Concept Enrichment (2019) • Lecture Notes in Computer Science • 165 citations
Yuan et al.
Improving The Similarity Measure Of Determinantal Point Processes For Extractive Multi-document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 61 citations
Cho et al.
Towards Generating Long And Coherent Text With Multi-level Latent Variable Models (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 46 citations
Shen et al.
Temporal Convolution For Real-time Keyword Spotting On Mobile Devices (2019) • Interspeech 2019 • 143 citations
Choi et al.
Augmenting Self-attention With Persistent Memory (2019) • Arxiv • 46 citations
Sukhbaatar et al.
Generating Persona Consistent Dialogues By Exploiting Natural Language Inference (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Song et al.
Pre-training With Whole Word Masking For Chinese BERT (2019) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 957 citations
Cui et al.
Coupling Retrieval And Meta-learning For Context-dependent Semantic Parsing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Guo et al.
Expressing Visual Relationships Via Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Tan et al.
Strategies For Structuring Story Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 186 citations
Angela Fan, Mike Lewis, Yann Dauphin
MUSE: Parallel Multi-scale Attention For Sequence To Sequence Learning (2019) • Arxiv • 41 citations
Zhao et al.
Style Transformer: Unpaired Text Style Transfer Without Disentangled Latent Representation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 68 citations
Dai et al.
Transformer-xl: Attentive Language Models Beyond A Fixed-length Context (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 1694 citations
Dai et al.
RWTH ASR Systems For Librispeech: Hybrid Vs Attention -- W/o Data Augmentation (2019) • Interspeech 2019 • 108 citations
Lüscher et al.
Voice Transformer Network: Sequence-to-sequence Voice Conversion Using Transformer With Text-to-speech Pretraining (2019) • Interspeech 2020 • 47 citations
Huang et al.
Option Comparison Network For Multiple-choice Reading Comprehension (2019) • Arxiv • 49 citations
Ran et al.
Encode, Tag, Realize: High-precision Text Editing (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Malmi et al.
Correction Of Automatic Speech Recognition With Transformer Sequence-to-sequence Model (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 49 citations
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
Rewarding Smatch: Transition-based AMR Parsing With Reinforcement Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 60 citations
Naseem et al.
Meshed-memory Transformer For Image Captioning (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 861 citations
Cornia et al.
Adaptively Sparse Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 81 citations
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
X-SQL: Reinforce Schema Representation With Context (2019) • Arxiv • 53 citations
He et al.
Improving Multi-turn Dialogue Modelling With Utterance Rewriter (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 106 citations
Su et al.
Recosa: Detecting The Relevant Contexts With Self-attention For Multi-turn Dialogue Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 121 citations
Zhang et al.
Context-aware Visual Policy Network For Fine-grained Image Captioning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 151 citations
Zha et al.
Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 174 citations
Li et al.
Very Deep Self-attention Networks For End-to-end Speech Recognition (2019) • Interspeech 2019 • 48 citations
Pham et al.
Key Fact As Pivot: A Two-stage Model For Low Resource Table-to-text Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Ma et al.
Visualbert: A Simple And Performant Baseline For Vision And Language (2019) • Arxiv • 1227 citations
Li et al.
Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019) • Interspeech 2019 • 153 citations
Zhang et al.
MASTER: Multi-aspect Non-local Network For Scene Text Recognition (2019) • Pattern Recognition • 171 citations
Lu et al.
Attention-passing Models For Robust And Data-efficient End-to-end Speech Translation (2019) • Transactions of the Association for Computational Linguistics • 94 citations
Sperber et al.
Jasper: An End-to-end Convolutional Neural Acoustic Model (2019) • Interspeech 2019 • 212 citations
Li et al.
Joint Training Framework For Text-to-speech And Voice Conversion Using Multi-source Tacotron And Wavenet (2019) • Interspeech 2019 • 55 citations
Zhang et al.
PEGASUS: Pre-training With Extracted Gap-sentences For Abstractive Summarization (2019) • Arxiv • 976 citations
Zhang et al.
Reconstruct And Represent Video Contents For Captioning Via Reinforcement Learning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 79 citations
Zhang et al.
Object-driven Text-to-image Synthesis Via Adversarial Training (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 309 citations
Li et al.
Eleatt-rnn: Adding Attentiveness To Neurons In Recurrent Neural Networks (2019) • IEEE Transactions on Image Processing • 103 citations
Zhang et al.
Improving Relation Extraction By Pre-trained Language Representations (2019) • Proceedings of AKBC 2019 • 53 citations
Christoph Alt, Marc Hübner, Leonhard Hennig
Incremental Transformer With Deliberation Decoder For Document Grounded Conversations (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Li et al.
Grounding Human-to-vehicle Advice For Self-driving Vehicles (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Kim et al.
GCDT: A Global Context Enhanced Deep Transition Architecture For Sequence Labeling (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Liu et al.
Parallel Iterative Edit Models For Local Sequence Transduction (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 142 citations
Awasthi et al.
Deep Equilibrium Models (2019) • Arxiv • 245 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Sticking To The Facts: Confident Decoding For Faithful Data-to-text Generation (2019) • Arxiv • 48 citations
Tian et al.
Mockingjay: Unsupervised Speech Representation Learning With Deep Bidirectional Transformer Encoders (2019) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 195 citations
Liu et al.
Learning To Collocate Neural Modules For Image Captioning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 95 citations
Xu Yang, Hanwang Zhang, Jianfei Cai
Reinforcement Learning Based Text Style Transfer Without Parallel Training Corpus (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 80 citations
Gong et al.
Multilingual Universal Sentence Encoder For Semantic Retrieval (2019) • Arxiv • 66 citations
Yang et al.
Understanding And Robustifying Differentiable Architecture Search (2019) • Arxiv • 165 citations
Zela et al.
Adding Interpretable Attention To Neural Translation Models Improves Word Alignment (2019) • Arxiv • 80 citations
Thomas Zenkel, Joern Wuebker, John DeNero
Humor Detection: A Transformer Gets The Last Laugh (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 44 citations
Orion Weller, Kevin Seppi
BART: Denoising Sequence-to-sequence Pre-training For Natural Language Generation, Translation, And Comprehension (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 3464 citations
Lewis et al.
Context-aware Self-attention Networks (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Yang et al.
Language Modeling With Deep Transformers (2019) • Interspeech 2019 • 85 citations
Irie et al.
On The Choice Of Modeling Unit For Sequence-to-sequence Speech Recognition (2019) • Interspeech 2019 • 40 citations
Irie et al.
Attentive History Selection For Conversational Question Answering (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 44 citations
Qu et al.
End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019) • Arxiv • 165 citations
Synnaeve et al.
A Discrete CVAE For Response Generation On Short-text Conversation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 43 citations
Gao et al.
Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019) • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 49 citations
Cai et al.
Text Generation With Exemplar-based Adaptive Decoding (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 46 citations
Peng et al.
Coarse-grain Fine-grain Coattention Network For Multi-evidence Question Answering (2019) • Arxiv • 47 citations
Zhong et al.
Sparse Sequence-to-sequence Models (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 178 citations
Ben Peters, Vlad Niculae, André F. T. Martins
Pay Less Attention With Lightweight And Dynamic Convolutions (2019) • Arxiv • 322 citations
Wu et al.
Multimodal Transformer Networks For End-to-end Video-grounded Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 51 citations
Le et al.
BERT And Pals: Projected Attention Layers For Efficient Adaptation In Multi-task Learning (2019) • Arxiv • 113 citations
Asa Cooper Stickland, Iain Murray
Single Headed Attention RNN: Stop Thinking With Your Head (2019) • Arxiv • 62 citations
Stephen Merity
Speaker Adaptation For Attention-based End-to-end Speech Recognition (2019) • Interspeech 2019 • 40 citations
Meng et al.
RAT-SQL: Relation-aware Schema Encoding And Linking For Text-to-SQL Parsers (2019) • Arxiv • 72 citations
Wang et al.
Modeling Sentiment Dependencies With Graph Convolutional Networks For Aspect-level Sentiment Classification (2019) • Knowledge-Based Systems • 180 citations
Pinlong Zhao, Linlin Hou, Ou Wu
Answering While Summarizing: Multi-task Learning For Multi-hop QA With Evidence Extraction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Nishida et al.
Watch, Listen And Tell: Multi-modal Weakly Supervised Dense Event Captioning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Tanzila Rahman, Bicheng Xu, Leonid Sigal
Learning Deep Transformer Models For Machine Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 356 citations
Wang et al.
Hierarchical Transformers For Multi-document Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 133 citations
Yang Liu, Mirella Lapata
Towards Non-saturating Recurrent Units For Modelling Long-term Dependencies (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Chandar et al.
Convolutional Self-attention Networks (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 125 citations
Yang et al.
A Comprehensive Exploration On Wikisql With Table-aware Word Contextualization (2019) • Arxiv • 122 citations
Hwang et al.
Extracting Multiple-relations In One-pass With Pre-trained Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Adaptive Embedding Gate For Attention-based Scene Text Recognition (2019) • Neurocomputing • 41 citations
Chen et al.
Deep Short Text Classification With Knowledge Powered Attention (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 132 citations
Chen et al.
Temporal Deformable Convolutional Encoder-decoder Networks For Video Captioning (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 99 citations
Chen et al.
Review-driven Answer Generation For Product-related Questions In E-commerce (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 50 citations
Chen et al.
Iterative Answer Prediction With Pointer-augmented Multimodal Transformers For Textvqa (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Hu et al.
Neural Architectures For Fine-grained Propaganda Detection In News (2019) • Arxiv • 51 citations
Gupta et al.
Higru: Hierarchical Gated Recurrent Units For Utterance-level Emotion Recognition (2019) • Arxiv • 70 citations
Jiao et al.
Convolutional Character Networks (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 183 citations
Xing et al.
Environmental Drivers Of Systematicity And Generalization In A Situated Agent (2019) • Arxiv • 53 citations
Hill et al.
Adaptive Attention Span In Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Sukhbaatar et al.
Entity-consistent End-to-end Task-oriented Dialogue System With KB Retriever (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Qin et al.
Star-transformer (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 123 citations
Guo et al.
Compositional Attention Networks For Machine Reasoning (2018) • Arxiv • 132 citations
Drew A. Hudson, Christopher D. Manning
Follownet: Robot Navigation By Following Natural Language Directions With Deep Reinforcement Learning (2018) • Third Workshop in Machine Learning in the Planning and Control of Robot Motion at ICRA 2018 • 43 citations
Shah et al.
Deep Communicating Agents For Abstractive Summarization (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 62 citations
Celikyilmaz et al.
A Bi-model Based RNN Semantic Frame Parsing Model For Intent Detection And Slot Filling (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 202 citations
Yu Wang, Yilin Shen, Hongxia Jin
Natural Language Generation For Electronic Health Records (2018) • npj Digital Medicine • 74 citations
Scott Lee
DKN: Deep Knowledge-aware Network For News Recommendation (2018) • Arxiv • 188 citations
Wang et al.
Regularizing RNNs For Caption Generation By Reconstructing The Past With The Present (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 90 citations
Chen et al.
Tree-to-tree Neural Networks For Program Translation (2018) • Arxiv • 83 citations
Xinyun Chen, Chang Liu, Dawn Song
Adapting The Neural Encoder-decoder Framework From Single To Multi-document Summarization (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 165 citations
Logan Lebanoff, Kaiqiang Song, Fei Liu
Fast Abstractive Summarization With Reinforce-selected Sentence Rewriting (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 604 citations
Yen-Chun Chen, Mohit Bansal
Dynamical Isometry And A Mean Field Theory Of RNNs: Gating Enables Signal Propagation In Recurrent Neural Networks (2018) • Arxiv • 73 citations
Minmin Chen, Jeffrey Pennington, Samuel S. Schoenholz
Dissecting Contextual Word Embeddings: Architecture And Representation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 420 citations
Peters et al.
Unsupervised Discrete Sentence Representation Learning For Interpretable Neural Dialog Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 136 citations
Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi
Conditional Variational Autoencoder For Neural Machine Translation (2018) • Arxiv • 42 citations
Artidoro Pagnoni, Kevin Liu, Shangyan Li
Character-level Language Modeling With Deeper Self-attention (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 146 citations
Al-Rfou et al.
Asynchronous Bidirectional Decoding For Neural Machine Translation (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 115 citations
Zhang et al.
Multi-modal Data Augmentation For End-to-end ASR (2018) • Interspeech 2018 • 54 citations
Renduchintala et al.
Hierarchical Neural Story Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1076 citations
Angela Fan, Mike Lewis, Yann Dauphin
Gpipe: Efficient Training Of Giant Neural Networks Using Pipeline Parallelism (2018) • Arxiv • 236 citations
Huang et al.
Mem2seq: Effectively Incorporating Knowledge Bases Into End-to-end Task-oriented Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Andrea Madotto, Chien-Sheng Wu, Pascale Fung
Multimodal Dual Attention Memory For Video Story Question Answering (2018) • Lecture Notes in Computer Science • 73 citations
Kim et al.
Query And Output: Generating Words By Querying Distributed Word Representations For Paraphrase Generation (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 67 citations
Ma et al.
Attention-based LSTM For Psychological Stress Detection From Spoken Language Using Distant Supervision (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 59 citations
Genta Indra Winata, Onno Pepijn Kampman, Pascale Fung
GLAC Net: Glocal Attention Cascading Networks For Multi-image Cued Story Generation (2018) • Arxiv • 53 citations
Kim et al.
Semi-supervised Sequence Modeling With Cross-view Training (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 391 citations
Clark et al.
A Retrospective Analysis Of The Fake News Challenge Stance Detection Task (2018) • Arxiv • 68 citations
Hanselowski et al.
Hard Non-monotonic Attention For Character-level Transduction (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 50 citations
Shijie Wu, Pamela Shapiro, Ryan Cotterell
Qanet: Combining Local Convolution With Global Self-attention For Reading Comprehension (2018) • Arxiv • 417 citations
Yu et al.
A Unified Model For Extractive And Abstractive Summarization Using Inconsistency Loss (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 265 citations
Hsu et al.
Learning Neural Templates For Text Generation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 192 citations
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With RNN-transducer (2018) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
A Hierarchical Structured Self-attentive Model For Extractive Document Summarization (HSSAS) (2018) • IEEE Access • 127 citations
Kamal Al-Sabahi, Zhang Zuping, Mohammed Nadher
Self-attentive Sequential Recommendation (2018) • 2018 IEEE International Conference on Data Mining (ICDM) • 2379 citations
Wang-Cheng Kang, Julian McAuley
Modular Networks: Learning To Decompose Neural Computation (2018) • Arxiv • 40 citations
Louis Kirsch, Julius Kunze, David Barber
Abstractive Summarization Of Reddit Posts With Multi-level Memory Networks (2018) • Arxiv • 60 citations
Byeongchang Kim, Hyunwoo Kim, Gunhee Kim
An End-to-end Textspotter With Explicit Alignment And Attention (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 237 citations
He et al.
Multi-pointer Co-attention Networks For Recommendation (2018) • Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 220 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Exploiting Deep Representations For Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 91 citations
Dou et al.
Mattnet: Modular Attention Network For Referring Expression Comprehension (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 788 citations
Yu et al.
Proxylessnas: Direct Neural Architecture Search On Target Task And Hardware (2018) • Arxiv • 284 citations
Han Cai, Ligeng Zhu, Song Han
A Graph-to-sequence Model For Amr-to-text Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 197 citations
Song et al.
Neural Architecture Optimization (2018) • Arxiv • 431 citations
Luo et al.
Improving Variational Encoder-decoders In Dialogue Generation (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 91 citations
Shen et al.
End-to-end Dense Video Captioning With Masked Transformer (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Zhou et al.
Bi-directional Block Self-attention For Fast And Memory-efficient Sequence Modeling (2018) • Arxiv • 77 citations
Shen et al.
Adversarial Example Generation With Syntactically Controlled Paraphrase Networks (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 82 citations
Iyyer et al.
Self-attention With Relative Position Representations (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 272 citations
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani
Efficient Attention: Attention With Linear Complexities (2018) • Arxiv • 88 citations
Shen et al.
Adversarially Regularising Neural NLI Models To Integrate Logical Background Knowledge (2018) • Arxiv • 43 citations
Pasquale Minervini, Sebastian Riedel
Collaborative Memory Network For Recommendation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 288 citations
Travis Ebesu, Bin Shen, Yi Fang
Graph2seq: Graph To Sequence Learning With Attention-based Neural Networks (2018) • Arxiv • 162 citations
Xu et al.
Cache Telepathy: Leveraging Shared Resource Attacks To Learn DNN Architectures (2018) • Arxiv • 44 citations
Mengjia Yan, Christopher Fletcher, Josep Torrellas
Clarinet: Parallel Wave Generation In End-to-end Text-to-speech (2018) • Arxiv • 63 citations
Wei Ping, Kainan Peng, Jitong Chen
On Extended Long Short-term Memory And Dependent Bidirectional Recurrent Neural Network (2018) • Neurocomputing • 174 citations
Yuanhang Su, C.-C. Jay Kuo
SHAPED: Shared-private Encoder-decoder For Text Style Adaptation (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 51 citations
Ye Zhang, Nan Ding, Radu Soricut
Generating Wikipedia By Summarizing Long Sequences (2018) • Arxiv • 74 citations
Liu et al.
Transformation Networks For Target-oriented Sentiment Classification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 491 citations
Li et al.
Meansum: A Neural Model For Unsupervised Multi-document Abstractive Summarization (2018) • Arxiv • 97 citations
Eric Chu, Peter J. Liu
Deep-fsmn For Large Vocabulary Continuous Speech Recognition (2018) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 120 citations
Zhang et al.
Pythia V0.1: The Winning Entry To The VQA Challenge 2018 (2018) • Arxiv • 165 citations
Jiang et al.
Syllable-based Sequence-to-sequence Speech Recognition With The Transformer In Mandarin Chinese (2018) • Interspeech 2018 • 69 citations
Zhou et al.
Trellis Networks For Sequence Modeling (2018) • Arxiv • 68 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Learning Private Neural Language Modeling With Attentive Aggregation (2018) • 2019 International Joint Conference on Neural Networks (IJCNN) • 93 citations
Ji et al.
News Session-based Recommendations Using Deep Neural Networks (2018) • Proceedings of the 3rd Workshop on Deep Learning for Recommender Systems • 95 citations
Gabriel de Souza P. Moreira, Felipe Ferreira, Adilson Marques da Cunha
Commonsense For Generative Multi-hop Question Answering Tasks (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 162 citations
Lisa Bauer, Yicheng Wang, Mohit Bansal
Training Deeper Neural Machine Translation Models With Transparent Attention (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 143 citations
Bapna et al.
Reinforced Self-attention Network: A Hybrid Of Hard And Soft Attention For Sequence Modeling (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 134 citations
Shen et al.
Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-experts Layer (2017) • Arxiv • 268 citations
Shazeer et al.
Dual Rectified Linear Units (drelus): A Replacement For Tanh Activation Functions In Quasi-recurrent Neural Networks (2017) • Pattern Recognition Letters • 45 citations
Godin et al.
Simple Recurrent Units For Highly Parallelizable Recurrence (2017) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 53 citations
Lei et al.
Deep Active Learning For Named Entity Recognition (2017) • Proceedings of the 2nd Workshop on Representation Learning for NLP • 364 citations
Shen et al.
Disan: Directional Self-attention Network For Rnn/cnn-free Language Understanding (2017) • Arxiv • 113 citations
Shen et al.
A Deep Reinforced Model For Abstractive Summarization (2017) • Arxiv • 1273 citations
Romain Paulus, Caiming Xiong, Richard Socher
End-to-end Neural Coreference Resolution (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 84 citations
Lee et al.
Attention Is All You Need (2017) • Arxiv • 6463 citations
Vaswani et al.
Opennmt: Open-source Toolkit For Neural Machine Translation (2017) • Proceedings of ACL 2017, System Demonstrations • 287 citations
Klein et al.
Shortcut-stacked Sentence Encoders For Multi-domain Inference (2017) • Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP • 119 citations
Yixin Nie, Mohit Bansal
State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 82 citations
Chiu et al.
Exploring Neural Transducers For End-to-end Speech Recognition (2017) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 233 citations
Battenberg et al.
Hierarchical Recurrent Attention Network For Response Generation (2017) • Arxiv • 116 citations
Xing et al.
Sequence-to-sequence Models Can Directly Translate Foreign Speech (2017) • Interspeech 2017 • 252 citations
Weiss et al.
Visual Reference Resolution Using Attention Memory For Visual Dialog (2017) • Arxiv • 90 citations
Seo et al.
Sqlnet: Generating Structured Queries From Natural Language Without Reinforcement Learning (2017) • Arxiv • 303 citations
Xiaojun Xu, Chang Liu, Dawn Song
Adversarial Feature Matching For Text Generation (2017) • Arxiv • 176 citations
Zhang et al.
Improved Variational Autoencoders For Text Modeling Using Dilated Convolutions (2017) • Arxiv • 94 citations
Yang et al.
Latent Variable Dialogue Models And Their Diversity (2017) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 50 citations
Kris Cao, Stephen Clark
Latent Relational Metric Learning Via Memory-based Attention For Collaborative Ranking (2017) • Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 • 214 citations
Yi Tay, Anh Tuan Luu, Siu Cheung Hui
A Hybrid Convolutional Variational Autoencoder For Text Generation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 220 citations
Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth
Personalization In Goal-oriented Dialog (2017) • Arxiv • 63 citations
Chaitanya K. Joshi, Fei Mi, Boi Faltings
Attend And Diagnose: Clinical Time Series Analysis Using Attention Models (2017) • Arxiv • 41 citations
Song et al.
Attention-based Models For Text-dependent Speaker Verification (2017) • Arxiv • 50 citations
Chowdhury et al.
Attention-based Multimodal Fusion For Video Description (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 387 citations
Hori et al.
Recurrent Topic-transition GAN For Visual Paragraph Generation (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 171 citations
Liang et al.
Learning To Generate Long-term Future Via Hierarchical Prediction (2017) • Arxiv • 180 citations
Villegas et al.
Sockeye: A Toolkit For Neural Machine Translation (2017) • Arxiv • 194 citations
Hieber et al.
Identity-aware Textual-visual Matching With Latent Co-attention (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 259 citations
Li et al.
Unconstrained Scene Text And Video Text Recognition For Arabic Script (2017) • 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR) • 49 citations
Mohit Jain, Minesh Mathew, C. V. Jawahar
Code Completion With Neural Attention And Pointer Networks (2017) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 103 citations
Li et al.
Fusionnet: Fusing Via Fully-aware Attention With Application To Machine Comprehension (2017) • Arxiv • 86 citations
Huang et al.
Efficient Vector Representation For Documents Through Corruption (2017) • Arxiv • 78 citations
Minmin Chen
Incorporating Copying Mechanism In Image Captioning For Learning Novel Objects (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Yao et al.
Reversible Architectures For Arbitrarily Deep Residual Neural Networks (2017) • Arxiv • 70 citations
Chang et al.
Tacotron: Towards End-to-end Speech Synthesis (2017) • Interspeech 2017 • 1567 citations
Wang et al.
Affect-lm: A Neural Language Model For Customizable Affective Text Generation (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Ghosh et al.
Distance-based Self-attention Network For Natural Language Inference (2017) • Arxiv • 69 citations
Jinbae Im, Sungzoon Cho
Just ASK: Building An Architecture For Extensible Self-service Spoken Language Understanding (2017) • Arxiv • 56 citations
Kumar et al.
Attngan: Fine-grained Text To Image Generation With Attentional Generative Adversarial Networks (2017) • Arxiv • 158 citations
Xu et al.
Best Of Both Worlds: Transferring Knowledge From Discriminative Learning To A Generative Visual Dialog Model (2017) • Arxiv • 86 citations
Lu et al.
Learning Discourse-level Diversity For Neural Dialog Models Using Conditional Variational Autoencoders (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 704 citations
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
Multi-modal Factorized Bilinear Pooling With Co-attention Learning For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 694 citations
Yu et al.
Deeparchitect: Automatically Designing And Training Deep Architectures (2017) • Arxiv • 142 citations
Renato Negrinho, Geoff Gordon
Learning To Rank Question Answer Pairs With Holographic Dual LSTM Architecture (2017) • Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval • 63 citations
Tay et al.
Get To The Point: Summarization With Pointer-generator Networks (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 3757 citations
Abigail See, Peter J. Liu, Christopher D. Manning
Tensor-train Recurrent Neural Networks For Video Classification (2017) • Arxiv • 102 citations
Yinchong Yang, Denis Krompass, Volker Tresp
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling For Visual Question Answering (2017) • IEEE Transactions on Neural Networks and Learning Systems • 530 citations
Yu et al.
Learned Optimizers That Scale And Generalize (2017) • Arxiv • 115 citations
Wichrowska et al.
Table-to-text Generation By Structure-aware Seq2seq Learning (2017) • Arxiv • 40 citations
Liu et al.
Fast-slow Recurrent Neural Networks (2017) • Arxiv • 41 citations
Asier Mujika, Florian Meier, Angelika Steger
Improved Training Of Wasserstein Gans (2017) • Arxiv • 1508 citations
Gulrajani et al.
Multilingual Hierarchical Attention Networks For Document Classification (2017) • Arxiv • 48 citations
Nikolaos Pappas, Andrei Popescu-Belis
A Unified Query-based Generative Model For Question Generation And Question Answering (2017) • Arxiv • 50 citations
Linfeng Song, Zhiguo Wang, Wael Hamza
Jointly Learning Sentence Embeddings And Syntax With Unsupervised Tree-lstms (2017) • Natural Language Engineering • 83 citations
Jean Maillard, Stephen Clark, Dani Yogatama
Generative Adversarial Text To Image Synthesis (2016) • Arxiv • 1422 citations
Reed et al.
Tree-to-sequence Attentional Neural Machine Translation (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 250 citations
Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka
A Fast Unified Model For Parsing And Sentence Understanding (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 229 citations
Bowman et al.
Quasi-recurrent Neural Networks (2016) • Arxiv • 326 citations
Bradbury et al.
Long Short-term Memory-networks For Machine Reading (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1040 citations
Jianpeng Cheng, Li Dong, Mirella Lapata
Capacity And Trainability In Recurrent Neural Networks (2016) • Arxiv • 82 citations
Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo
Multi-task Cross-lingual Sequence Tagging From Scratch (2016) • Arxiv • 198 citations
Zhilin Yang, Ruslan Salakhutdinov, William Cohen
Character-level Question Answering With Attention (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 107 citations
David Golub, Xiaodong He
Deep Recurrent Models With Fast-forward Connections For Neural Machine Translation (2016) • Transactions of the Association for Computational Linguistics • 223 citations
Zhou et al.
A Convolutional Encoder Model For Neural Machine Translation (2016) • Arxiv • 64 citations
Gehring et al.
Chinese Song Iambics Generation With Neural Attention-based Model (2016) • Arxiv • 54 citations
Wang et al.
Multiresolution Recurrent Neural Networks: An Application To Dialogue Response Generation (2016) • Arxiv • 73 citations
Serban et al.
Variational Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 189 citations
Zhang et al.
Image Captioning With Deep Bidirectional Lstms (2016) • Proceedings of the 24th ACM international conference on Multimedia • 262 citations
Wang et al.
Learning To Generalize To New Compositions In Image Understanding (2016) • Arxiv • 53 citations
Atzmon et al.
One-shot Generalization In Deep Generative Models (2016) • Arxiv • 75 citations
Rezende et al.
Neuro-symbolic Program Synthesis (2016) • Arxiv • 105 citations
Parisotto et al.
Incremental Parsing With Minimal Features Using Bi-directional LSTM (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 80 citations
James Cross, Liang Huang
Quantifying The Vanishing Gradient And Long Distance Dependency Problem In Recursive Neural Networks And Recursive Lstms (2016) • Proceedings of the 1st Workshop on Representation Learning for NLP • 54 citations
Phong Le, Willem Zuidema
Embracing Data Abundance: Booktest Dataset For Reading Comprehension (2016) • Arxiv • 56 citations
Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
Generative Deep Neural Networks For Dialogue: A Short Review (2016) • Arxiv • 67 citations
Serban et al.
A Joint Many-task Model: Growing A Neural Network For Multiple NLP Tasks (2016) • Arxiv • 52 citations
Hashimoto et al.
Neural Architecture Search With Reinforcement Learning (2016) • Arxiv • 3840 citations
Barret Zoph, Quoc V. Le
Sequence-to-sequence Learning As Beam-search Optimization (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 432 citations
Sam Wiseman, Alexander M. Rush
Topicrnn: A Recurrent Neural Network With Long-range Semantic Dependency (2016) • Arxiv • 129 citations
Dieng et al.
RNN Approaches To Text Normalization: A Challenge (2016) • Arxiv • 55 citations
Richard Sproat, Navdeep Jaitly
Multiplicative LSTM For Sequence Modelling (2016) • Arxiv • 88 citations
Krause et al.
A Context-aware Attention Network For Interactive Question Answering (2016) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 40 citations
Li et al.
MGNC-CNN: A Simple Approach To Exploiting Multiple Word Embeddings For Sentence Classification (2016) • Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 103 citations
Ye Zhang, Stephen Roller, Byron Wallace
Modelling Interaction Of Sentence Pair With Coupled-lstms (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 43 citations
Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Google's Neural Machine Translation System: Bridging The Gap Between Human And Machine Translation (2016) • Arxiv • 5627 citations
Wu et al.
Google's Multilingual Neural Machine Translation System: Enabling Zero-shot Translation (2016) • Arxiv • 108 citations
Johnson et al.
Dynamic Memory Networks For Visual And Textual Question Answering (2016) • Arxiv • 593 citations
Caiming Xiong, Stephen Merity, Richard Socher
Enhanced LSTM For Natural Language Inference (2016) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1178 citations
Chen et al.
Wikireading: A Novel Large-scale Language Understanding Task Over Wikipedia (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 128 citations
Hewlett et al.
Conditional Generation And Snapshot Learning In Neural Dialogue Systems (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 73 citations
Wen et al.
Towards Sub-word Level Compositions For Sentiment Analysis Of Hindi-english Code Mixed Text (2016) • Arxiv • 128 citations
Prabhu et al.
Pointer Sentinel Mixture Models (2016) • Arxiv • 481 citations
Merity et al.
Personalized Speech Recognition On Mobile Devices (2016) • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
McGraw et al.
Incorporating Copying Mechanism In Sequence-to-sequence Learning (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1419 citations
Gu et al.
Topic Aware Neural Response Generation (2016) • Arxiv • 317 citations
Xing et al.
Hypernetworks (2016) • Arxiv • 82 citations
David Ha, Andrew Dai, Quoc V. Le

— N —

NAACL 22 papers

Multilingual Machine Translation With Large Language Models: Empirical Results And Analysis (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 56 citations
Zhu et al.
Recmind: Large Language Model Powered Agent For Recommendation (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 43 citations
Wang et al.
REPLUG: Retrieval-augmented Black-box Language Models (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 76 citations
Shi et al.
Personallm: Investigating The Ability Of Large Language Models To Express Personality Traits (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 40 citations
Jiang et al.
Sentiment Analysis In The Era Of Large Language Models: A Reality Check (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 161 citations
Zhang et al.
Large Language Models Are Effective Text Rankers With Pairwise Ranking Prompting (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 88 citations
Qin et al.
CLMLF: A Contrastive Learning And Multi-layer Fusion Method For Multimodal Sentiment Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 85 citations
Li et al.
Detect Rumors In Microblog Posts For Low-resource Domains Via Adversarial Contrastive Learning (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 41 citations
Lin et al.
POLITICS: Pretraining With Same-story Article Comparison For Ideology Prediction And Stance Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 40 citations
Liu et al.
A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models For African News Translation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Adelani et al.
Fine-grained Image Captioning With CLIP Reward (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 52 citations
Cho et al.
Continual Learning For Text Classification With Information Disentanglement Based Regularization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 59 citations
Huang et al.
TABBIE: Pretrained Representations Of Tabular Data (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 110 citations
Iida et al.
Banglabert: Language Model Pretraining And Benchmarks For Low-resource Language Understanding Evaluation In Bangla (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 71 citations
Bhattacharjee et al.
Longt5: Efficient Text-to-text Transformer For Long Sequences (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 87 citations
Guo et al.
Multi-reward Reinforced Summarization With Saliency And Entailment (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 47 citations
Ramakanth Pasunuru, Mohit Bansal
Object Counts! Bringing Explicit Detections Back Into Image Captioning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 42 citations
Josiah Wang, Pranava Madhyastha, Lucia Specia
Natural Language To Structured Query Generation Via Meta-learning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 120 citations
Huang et al.
Adversarial Example Generation With Syntactically Controlled Paraphrase Networks (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 82 citations
Iyyer et al.
Delete, Retrieve, Generate: A Simple Approach To Sentiment And Style Transfer (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 114 citations
Li et al.
Neural Fine-grained Entity Type Classification With Hierarchy-aware Loss (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 74 citations
Peng Xu, Denilson Barbosa
Evaluating Discourse Phenomena In Neural Machine Translation (2017) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 119 citations
Bawden et al.

Neural Machine Translation 263 papers

DITING: A Multi-agent Evaluation Framework For Benchmarking Web Novel Translation (2025) • No Venue
Zhang et al.
Language Models' Factuality Depends On The Language Of Inquiry (2025) • No Venue
Aggarwal et al.
Could Thinking Multilingually Empower LLM Reasoning? (2025) • No Venue
Gao et al.
Hala Technical Report: Building Arabic-centric Instruction & Translation Models At Scale (2025) • No Venue
Hasan Abed Al Kader Hammoud, Mohammad Zbeeb, Bernard Ghanem
Trillion 7B Technical Report (2025) • No Venue
Han et al.
Kuwain 1.5B: An Arabic SLM Via Language Injection (2025) • No Venue
Hennara et al.
Mutarjim: Advancing Bidirectional Arabic-english Translation With A Small Language Model (2025) • No Venue
Hennara et al.
New Trends For Modern Machine Translation With Large Reasoning Models (2025) • No Venue
Liu et al.
Beyond English: Toward Inclusive And Scalable Multilingual Machine Translation With Llms (2025) • No Venue
Luo et al.
Dupo: Enabling Reliable LLM Self-verification Via Dual Preference Optimization (2025) • No Venue
She et al.
Drt-o1: Optimized Deep Reasoning Translation Via Long Chain-of-thought (2024) • No Venue
Wang et al.
Complexity Of Symbolic Representation In Working Memory Of Transformer Correlates With The Complexity Of A Task (2024) • No Venue
Alsu Sagirova, Mikhail Burtsev
Seallms 3: Open Foundation And Chat Multilingual Large Language Models For Southeast Asian Languages (2024) • No Venue
Zhang et al.
Llamax: Scaling Linguistic Horizons Of LLM By Enhancing Translation Capabilities Beyond 100 Languages (2024) • No Venue
Lu et al.
Trans-tokenization And Cross-lingual Vocabulary Transfers: Language Adaptation Of Llms For Low-resource NLP (2024) • No Venue
Remy et al.
Scaling Laws For Downstream Task Performance Of Large Language Models (2024) • No Venue
Isik et al.
Contrastive Preference Optimization: Pushing The Boundaries Of LLM Performance In Machine Translation (2024) • No Venue
Xu et al.
Towards Making The Most Of Chatgpt For Machine Translation (2023) • SSRN Electronic Journal • 94 citations
Peng et al.
Multilingual Machine Translation With Large Language Models: Empirical Results And Analysis (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 56 citations
Zhu et al.
Cross2stra: Unpaired Cross-lingual Image Captioning With Cross-lingual Cross-modal Structure-pivoted Alignment (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 41 citations
Wu et al.
Document-level Machine Translation With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 71 citations
Wang et al.
Scene Graph As Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation With Visual Scene Hallucination (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Fei et al.
Audiopalm: A Large Language Model That Can Speak And Listen (2023) • No Venue
Rubenstein et al.
Hallucinations In Large Multilingual Translation Models (2023) • Transactions of the Association for Computational Linguistics • 77 citations
Guerreiro et al.
Stay On Topic With Classifier-free Guidance (2023) • No Venue
Sanchez et al.
How Good Are GPT Models At Machine Translation? A Comprehensive Evaluation (2023) • Arxiv • 177 citations
Hendy et al.
Is Chatgpt A Good Translator? Yes With GPT-4 As The Engine (2023) • Arxiv • 307 citations
Jiao et al.
A Paradigm Shift In Machine Translation: Boosting Translation Performance Of Large Language Models (2023) • No Venue
Xu et al.
A Survey Of Learning-based Automated Program Repair (2023) • ACM Transactions on Software Engineering and Methodology • 77 citations
Zhang et al.
MADLAD-400: A Multilingual And Document-level Large Audited Dataset (2023) • No Venue
Kudugunta et al.
A Few Thousand Translations Go A Long Way! Leveraging Pre-trained Models For African News Translation (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 44 citations
Adelani et al.
Building Machine Translation Systems For The Next Thousand Languages (2022) • Arxiv • 43 citations
Bapna et al.
Mslam: Massively Multilingual Joint Pre-training For Speech And Text (2022) • Arxiv • 59 citations
Bapna et al.
CIRCLE: Continual Repair Across Programming Languages (2022) • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis • 48 citations
Yuan et al.
No Language Left Behind: Scaling Human-centered Machine Translation (2022) • Arxiv • 354 citations
NLLB Team et al.
On The Explainability Of Natural Language Processing Deep Models (2022) • ACM Computing Surveys • 89 citations
Julia El Zini, Mariette Awad
Less Training, More Repairing Please: Revisiting Automated Program Repair Via Zero-shot Learning (2022) • ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 152 citations
Chunqiu Steven Xia, Lingming Zhang
TM2T: Stochastic And Tokenized Modeling For The Reciprocal Generation Of 3D Human Motions And Texts (2022) • Lecture Notes in Computer Science • 142 citations
Guo et al.
Coreference-aware Dialogue Summarization (2021) • Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue • 46 citations
Zhengyuan Liu, Ke Shi, Nancy F. Chen
Hierarchical Learning For Generation With Long Source Sequences (2021) • Arxiv • 41 citations
Tobias Rohde, Xiaoxia Wu, Yinhan Liu
CURE: Code-aware Neural Machine Translation For Automatic Program Repair (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 239 citations
Nan Jiang, Thibaud Lutellier, Lin Tan
Good For Misconceived Reasons: An Empirical Revisiting On The Need For Visual Context In Multimodal Machine Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 54 citations
Wu et al.
Lightweight Adapter Tuning For Multilingual Speech Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 54 citations
Le et al.
Indicbart: A Pre-trained Model For Indic Natural Language Generation (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 51 citations
Dabre et al.
Luna: Linear Unified Nested Attention (2021) • Arxiv • 49 citations
Ma et al.
Hidden Backdoors In Human-centric Language Models (2021) • CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security • 84 citations
Li et al.
Deltalm: Encoder-decoder Pre-training For Language Generation And Translation By Augmenting Pretrained Multilingual Encoders (2021) • Arxiv • 51 citations
Ma et al.
Unlocking Compositional Generalization In Pre-trained Models Using Intermediate Representations (2021) • Arxiv • 51 citations
Herzig et al.
Gender Bias In Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 74 citations
Savoldi et al.
Learning Shared Semantic Space For Speech-to-text Translation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 57 citations
Han et al.
To Ship Or Not To Ship: An Extensive Evaluation Of Automatic Metrics For Machine Translation (2021) • Arxiv • 82 citations
Kocmi et al.
The Expando-mono-duo Design Pattern For Text Ranking With Pretrained Sequence-to-sequence Models (2021) • Arxiv • 40 citations
Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
Experts, Errors, And Context: A Large-scale Study Of Human Evaluation For Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 91 citations
Freitag et al.
Contrastive Learning For Many-to-many Multilingual Neural Machine Translation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 134 citations
Pan et al.
End-to-end Speech Translation Via Cross-modal Progressive Training (2021) • Interspeech 2021 • 42 citations
Rong Ye, Mingxuan Wang, Lei Li
Cross-attention Is All You Need: Adapting Pretrained Transformers For Machine Translation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 71 citations
Mozhdeh Gheini, Xiang Ren, Jonathan May
Perfection Not Required? Human-ai Partnerships In Code Translation (2021) • 26th International Conference on Intelligent User Interfaces • 82 citations
Weisz et al.
The FLORES-101 Evaluation Benchmark For Low-resource And Multilingual Machine Translation (2021) • Arxiv • 82 citations
Goyal et al.
R-drop: Regularized Dropout For Neural Networks (2021) • Arxiv • 305 citations
Liang et al.
Sequence-level Mixed Sample Data Augmentation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 68 citations
Demi Guo, Yoon Kim, Alexander M. Rush
Glancing Transformer For Non-autoregressive Neural Machine Translation (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 45 citations
Qian et al.
Beyond English-centric Multilingual Machine Translation (2020) • Arxiv • 465 citations
Fan et al.
A Novel Graph-based Multi-modal Fusion Encoder For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 139 citations
Yin et al.
Realformer: Transformer Likes Residual Attention (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
He et al.
BLEURT: Learning Robust Metrics For Text Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 40 citations
Thibault Sellam, Dipanjan Das, Ankur P. Parikh
A Probabilistic Formulation Of Unsupervised Text Style Transfer (2020) • Arxiv • 63 citations
He et al.
Semi-autoregressive Training Improves Mask-predict Decoding (2020) • Arxiv • 48 citations
Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer
Aligned Cross Entropy For Non-autoregressive Machine Translation (2020) • Arxiv • 68 citations
Ghazvininejad et al.
Incorporating BERT Into Parallel Sequence Decoding With Adapters (2020) • Arxiv • 40 citations
Guo et al.
Self-training For End-to-end Speech Translation (2020) • Interspeech 2020 • 40 citations
Pino et al.
On The Limitations Of Cross-lingual Encoders As Exposed By Reference-free Machine Translation Evaluation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Zhao et al.
Gshard: Scaling Giant Models With Conditional Computation And Automatic Sharding (2020) • Arxiv • 345 citations
Lepikhin et al.
Improving Massively Multilingual Neural Machine Translation And Zero-shot Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 208 citations
Zhang et al.
Unsupervised Domain Clusters In Pretrained Language Models (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 48 citations
Roee Aharoni, Yoav Goldberg
Pre-training Via Paraphrasing (2020) • Arxiv • 89 citations
Lewis et al.
Balancing Training For Multilingual Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 73 citations
Xinyi Wang, Yulia Tsvetkov, Graham Neubig
HAT: Hardware-aware Transformers For Efficient Natural Language Processing (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 138 citations
Wang et al.
A Study Of Non-autoregressive Model For Sequence Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Ren et al.
Towards Automatic Face-to-face Translation (2020) • Proceedings of the 27th ACM International Conference on Multimedia • 121 citations
R et al.
AMR Parsing Via Graph-sequence Iterative Inference (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Deng Cai, Wai Lam
Simultaneous Translation Policies: From Fixed To Adaptive (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Zheng et al.
Does Multi-encoder Help? A Case Study On Context-aware Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 60 citations
Li et al.
Accurate Word Alignment Induction From Neural Machine Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 56 citations
Chen et al.
Imitation Attacks And Defenses For Black-box Machine Translation Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Eric Wallace, Mitchell Stern, Dawn Song
Towards Making The Most Of Context In Neural Machine Translation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 70 citations
Zheng et al.
Multilingual Denoising Pre-training For Neural Machine Translation (2020) • Transactions of the Association for Computational Linguistics • 897 citations
Liu et al.
Unsupervised Translation Of Programming Languages (2020) • Arxiv • 62 citations
Lachaux et al.
Pre-training Multilingual Neural Machine Translation By Leveraging Alignment Information (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 82 citations
Lin et al.
Leveraging Monolingual Data With Self-supervision For Multilingual Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Siddhant et al.
Multilingual Translation With Extensible Multilingual Pretraining And Finetuning (2020) • Arxiv • 150 citations
Tang et al.
Infoxlm: An Information-theoretic Framework For Cross-lingual Language Model Pre-training (2020) • Arxiv • 77 citations
Chi et al.
Shallow-to-deep Training For Neural Machine Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Li et al.
Norm-based Curriculum Learning For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 96 citations
Liu et al.
Guiding Attention In Sequence-to-sequence Models For Dialogue Act Prediction (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Colombo et al.
Unsupervised Multimodal Neural Machine Translation With Pseudo Visual Pivoting (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 42 citations
Huang et al.
Multi-head Attention: Collaborate Instead Of Concatenate (2020) • Arxiv • 76 citations
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
A Simple But Tough-to-beat Data Augmentation Approach For Natural Language Understanding And Generation (2020) • Arxiv • 94 citations
Shen et al.
Simuleval: An Evaluation Toolkit For Simultaneous Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 69 citations
Ma et al.
Hard-coded Gaussian Attention For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 53 citations
Weiqiu You, Simeng Sun, Mohit Iyyer
Dynamic Data Selection And Weighting For Iterative Back-translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig
As Good As New. How To Successfully Recycle English GPT-2 To Make Models For Other Languages (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 40 citations
Wietse de Vries, Malvina Nissim
Understanding And Improving Lexical Choice In Non-autoregressive Translation (2020) • Arxiv • 44 citations
Ding et al.
Cross-lingual Retrieval For Iterative Self-supervised Training (2020) • NeurIPS 2020 • 48 citations
Tran et al.
Pre-trained Language Model Representations For Language Generation (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 49 citations
Sergey Edunov, Alexei Baevski, Michael Auli
Editnts: An Neural Programmer-interpreter Model For Sentence Simplification Through Explicit Editing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 139 citations
Dong et al.
Fast Structured Decoding For Sequence Models (2019) • Arxiv • 61 citations
Sun et al.
Fairseq: A Fast, Extensible Toolkit For Sequence Modeling (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 744 citations
Ott et al.
The Flores Evaluation Datasets For Low-resource Machine Translation: Nepali-english And Sinhala-english (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 92 citations
Guzmán et al.
Masked Language Model Scoring (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 145 citations
Salazar et al.
Synchronous Bidirectional Neural Machine Translation (2019) • Transactions of the Association for Computational Linguistics • 116 citations
Long Zhou, Jiajun Zhang, Chengqing Zong
When A Good Translation Is Wrong In Context: Context-aware Machine Translation Improves On Deixis, Ellipsis, And Lexical Cohesion (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 48 citations
Elena Voita, Rico Sennrich, Ivan Titov
Context-aware Monolingual Repair For Neural Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Elena Voita, Rico Sennrich, Ivan Titov
Analyzing Multi-head Self-attention: Specialized Heads Do The Heavy Lifting, The Rest Can Be Pruned (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 494 citations
Voita et al.
Robust Neural Machine Translation With Doubly Adversarial Inputs (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 233 citations
Yong Cheng, Lu Jiang, Wolfgang Macherey
MASS: Masked Sequence To Sequence Pre-training For Language Generation (2019) • Arxiv • 579 citations
Song et al.
Controllable Sentence Simplification (2019) • Arxiv • 52 citations
Martin et al.
A Generalized Framework Of Sequence Generation With Application To Undirected Sequence Models (2019) • Arxiv • 46 citations
Mansimov et al.
On Learning Meaningful Code Changes Via Neural Machine Translation (2019) • 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) • 202 citations
Tufano et al.
MUSE: Parallel Multi-scale Attention For Sequence To Sequence Learning (2019) • Arxiv • 41 citations
Zhao et al.
Data Diversification: A Simple Strategy For Neural Machine Translation (2019) • Arxiv • 44 citations
Nguyen et al.
Minimizing The Bag-of-ngrams Difference For Non-autoregressive Neural Machine Translation (2019) • Arxiv • 42 citations
Shao et al.
Adaptively Sparse Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 81 citations
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
Improving Robustness Of Machine Translation With Synthetic Noise (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 61 citations
Vaibhav et al.
Multilingual Neural Machine Translation With Knowledge Distillation (2019) • Arxiv • 129 citations
Tan et al.
Unsupervised Neural Machine Translation With SMT As Posterior Regularization (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 56 citations
Ren et al.
Syntax-enhanced Neural Machine Translation With Syntax-aware Word Representations (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 60 citations
Zhang et al.
Mask-predict: Parallel Decoding Of Conditional Masked Language Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 177 citations
Ghazvininejad et al.
Massively Multilingual Neural Machine Translation (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 395 citations
Roee Aharoni, Melvin Johnson, Orhan Firat
Attention-passing Models For Robust And Data-efficient End-to-end Speech Translation (2019) • Transactions of the Association for Computational Linguistics • 94 citations
Sperber et al.
Effective Cross-lingual Transfer Of Neural Machine Translation Models Without Shared Vocabularies (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Yunsu Kim, Yingbo Gao, Hermann Ney
Domain Robustness In Neural Machine Translation (2019) • Arxiv • 63 citations
Mathias Müller, Annette Rios, Rico Sennrich
The Effect Of Translationese In Machine Translation Test Sets (2019) • Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers) • 69 citations
Mike Zhang, Antonio Toral
When And Why Is Document-level Context Useful In Neural Machine Translation? (2019) • Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019) • 64 citations
Yunsu Kim, Duc Thanh Tran, Hermann Ney
Hint-based Training For Non-autoregressive Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 63 citations
Li et al.
XLDA: Cross-lingual Data Augmentation For Natural Language Inference And Question Answering (2019) • Arxiv • 61 citations
Singh et al.
Massively Multilingual Neural Machine Translation In The Wild: Findings And Challenges (2019) • Arxiv • 296 citations
Arivazhagan et al.
The Missing Ingredient In Zero-shot Neural Machine Translation (2019) • Arxiv • 92 citations
Arivazhagan et al.
On NMT Search Errors And Model Errors: Cat Got Your Tongue? (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 111 citations
Felix Stahlberg, Bill Byrne
Ccmatrix: Mining Billions Of High-quality Parallel Sentences On The Web (2019) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 61 citations
Schwenk et al.
Generalized Data Augmentation For Low-resource Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 109 citations
Xia et al.
Constrained Decoding For Neural NLG From Compositional Representations In Task-oriented Dialogue (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Balakrishnan et al.
Simple, Scalable Adaptation For Neural Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 84 citations
Ankur Bapna, Naveen Arivazhagan, Orhan Firat
Revisiting Low-resource Neural Machine Translation: A Case Study (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 151 citations
Rico Sennrich, Biao Zhang
Adding Interpretable Attention To Neural Translation Models Improves Word Alignment (2019) • Arxiv • 80 citations
Thomas Zenkel, Joern Wuebker, John DeNero
BART: Denoising Sequence-to-sequence Pre-training For Natural Language Generation, Translation, And Comprehension (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 3464 citations
Lewis et al.
Context-aware Self-attention Networks (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Yang et al.
Mixture Models For Diverse Machine Translation: Tricks Of The Trade (2019) • Arxiv • 61 citations
Shen et al.
A Comparative Study On Transformer Vs RNN In Speech Applications (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 448 citations
Karita et al.
Findings Of The First Shared Task On Machine Translation Robustness (2019) • Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) • 55 citations
Li et al.
Revisiting Self-training For Neural Sequence Generation (2019) • Arxiv • 139 citations
He et al.
Sparse Sequence-to-sequence Models (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 178 citations
Ben Peters, Vlad Niculae, André F. T. Martins
Multilingual End-to-end Speech Translation (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 78 citations
Inaguma et al.
Pay Less Attention With Lightweight And Dynamic Convolutions (2019) • Arxiv • 322 citations
Wu et al.
Emerging Cross-lingual Structure In Pretrained Language Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 46 citations
Wu et al.
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit (2019) • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 66 citations
Wang et al.
Simultaneous Translation With Flexible Policy Via Restricted Imitation Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Zheng et al.
Non-autoregressive Machine Translation With Auxiliary Regularization (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 153 citations
Wang et al.
Learning Deep Transformer Models For Machine Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 356 citations
Wang et al.
On Evaluation Of Adversarial Perturbations For Sequence-to-sequence Models (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 114 citations
Michel et al.
Convolutional Self-attention Networks (2019) • Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics • 125 citations
Yang et al.
Simpler And Faster Learning Of Adaptive Policies For Simultaneous Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 77 citations
Zheng et al.
Improved Zero-shot Neural Machine Translation Via Ignoring Spurious Correlations (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Gu et al.
End-to-end Speech Translation With Knowledge Distillation (2019) • Interspeech 2019 • 139 citations
Liu et al.
Mesh-tensorflow: Deep Learning For Supercomputers (2018) • Arxiv • 52 citations
Shazeer et al.
Wronging A Right: Generating Better Errors To Improve Grammatical Error Detection (2018) • Arxiv • 56 citations
Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel
Natural Language Generation For Electronic Health Records (2018) • npj Digital Medicine • 74 citations
Scott Lee
Tree-to-tree Neural Networks For Program Translation (2018) • Arxiv • 83 citations
Xinyun Chen, Chang Liu, Dawn Song
End-to-end Non-autoregressive Neural Machine Translation With Connectionist Temporal Classification (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 132 citations
Jindřich Libovický, Jindřich Helcl
The Best Of Both Worlds: Combining Recent Advances In Neural Machine Translation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 131 citations
Chen et al.
CODIT: Code Editing With Tree-based Neural Models (2018) • IEEE Transactions on Software Engineering • 105 citations
Chakraborty et al.
Semi-autoregressive Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 76 citations
Chunqi Wang, Ji Zhang, Haiqing Chen
A Study Of Reinforcement Learning For Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 155 citations
Wu et al.
Zero-shot Cross-lingual Classification Using Multilingual Neural Machine Translation (2018) • Arxiv • 78 citations
Eriguchi et al.
Latent Alignment And Variational Attention (2018) • Arxiv • 85 citations
Deng et al.
Conditional Variational Autoencoder For Neural Machine Translation (2018) • Arxiv • 42 citations
Artidoro Pagnoni, Kevin Liu, Shangyan Li
Fast Decoding In Sequence Models Using Discrete Latent Variables (2018) • Arxiv • 177 citations
Kaiser et al.
Asynchronous Bidirectional Decoding For Neural Machine Translation (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 115 citations
Zhang et al.
Gpipe: Efficient Training Of Giant Neural Networks Using Pipeline Parallelism (2018) • Arxiv • 236 citations
Huang et al.
Multi-head Attention With Disagreement Regularization (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 181 citations
Li et al.
Seq2seq2sentiment: Multimodal Sequence To Sequence Models For Sentiment Analysis (2018) • Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML) • 65 citations
Pham et al.
Phrase-based & Neural Unsupervised Machine Translation (2018) • Arxiv • 232 citations
Lample et al.
Semi-supervised Sequence Modeling With Cross-view Training (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 391 citations
Clark et al.
Back-translation Sampling By Targeting Difficult Words In Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 84 citations
Marzieh Fadaee, Christof Monz
Contextual Parameter Generation For Universal Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 155 citations
Platanios et al.
Code2seq: Generating Sequences From Structured Representations Of Code (2018) • Arxiv • 62 citations
Alon et al.
Joint Training For Neural Machine Translation Models With Monolingual Data (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 99 citations
Zhang et al.
Investigating Backtranslation In Neural Machine Translation (2018) • Arxiv • 88 citations
Poncelas et al.
Von Mises-fisher Loss For Training Sequence To Sequence Models With Continuous Outputs (2018) • Arxiv • 54 citations
Sachin Kumar, Yulia Tsvetkov
Exploiting Deep Representations For Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 91 citations
Dou et al.
Leveraging Grammar And Reinforcement Learning For Neural Program Synthesis (2018) • Arxiv • 49 citations
Bunel et al.
Meta-learning For Low-resource Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 313 citations
Gu et al.
A Graph-to-sequence Model For AMR-to-text Generation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 197 citations
Song et al.
Modeling Localness For Self-attention Networks (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 216 citations
Yang et al.
Training Tips For The Transformer Model (2018) • The Prague Bulletin of Mathematical Linguistics • 109 citations
Martin Popel, Ondřej Bojar
Why Self-attention? A Targeted Evaluation Of Neural Machine Translation Architectures (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 287 citations
Tang et al.
Multi-task Neural Models For Translating Between Styles Within And Across Languages (2018) • Arxiv • 52 citations
Xing Niu, Sudha Rao, Marine Carpuat
Multilingual Neural Machine Translation With Task-specific Attention (2018) • Arxiv • 53 citations
Graeme Blackwood, Miguel Ballesteros, Todd Ward
Context-aware Neural Machine Translation Learns Anaphora Resolution (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 300 citations
Voita et al.
Self-attention With Relative Position Representations (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 272 citations
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani
On Adversarial Examples For Character-level Neural Machine Translation (2018) • COLING 2018 • 157 citations
Javid Ebrahimi, Daniel Lowd, Dejing Dou
Beyond Error Propagation In Neural Machine Translation: Characteristics Of Language Also Matter (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 58 citations
Wu et al.
Towards Robust Neural Machine Translation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 178 citations
Cheng et al.
Escape: A Large-scale Synthetic Corpus For Automatic Post-editing (2018) • Arxiv • 50 citations
Negri et al.
Multilingual Extractive Reading Comprehension By Runtime Machine Translation (2018) • Arxiv • 59 citations
Asai et al.
Improving The Transformer Translation Model With Document-level Context (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 81 citations
Zhang et al.
Syllable-based Sequence-to-sequence Speech Recognition With The Transformer In Mandarin Chinese (2018) • Interspeech 2018 • 69 citations
Zhou et al.
Understanding Back-translation At Scale (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 995 citations
Edunov et al.
Reaching Human-level Performance In Automatic Grammatical Error Correction: An Empirical Study (2018) • Arxiv • 97 citations
Tao Ge, Furu Wei, Ming Zhou
Scaling Neural Machine Translation (2018) • Proceedings of the Third Conference on Machine Translation: Research Papers • 80 citations
Ott et al.
Style Transfer As Unsupervised Machine Translation (2018) • Arxiv • 114 citations
Zhang et al.
Bi-directional Neural Machine Translation With Synthetic Parallel Data (2018) • Proceedings of the 2nd Workshop on Neural Machine Translation and Generation • 75 citations
Xing Niu, Michael Denkowski, Marine Carpuat
Sequence To Sequence Mixture Model For Diverse Machine Translation (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 42 citations
Xuanli He, Gholamreza Haffari, Mohammad Norouzi
Training Deeper Neural Machine Translation Models With Transparent Attention (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 143 citations
Bapna et al.
Simple Recurrent Units For Highly Parallelizable Recurrence (2017) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 53 citations
Lei et al.
What Do Neural Machine Translation Models Learn About Morphology? (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 276 citations
Belinkov et al.
Attention Is All You Need (2017) • Arxiv • 6463 citations
Vaswani et al.
Opennmt: Open-source Toolkit For Neural Machine Translation (2017) • Proceedings of ACL 2017, System Demonstrations • 287 citations
Klein et al.
Non-autoregressive Neural Machine Translation (2017) • Arxiv • 449 citations
Gu et al.
A Parallel Corpus Of Python Functions And Documentation Strings For Automated Code Documentation And Code Generation (2017) • Arxiv • 62 citations
Antonio Valerio Miceli Barone, Rico Sennrich
Sequence-to-sequence Models Can Directly Translate Foreign Speech (2017) • Interspeech 2017 • 252 citations
Weiss et al.
Modeling Coherence For Neural Machine Translation With Dynamic And Topic Caches (2017) • Arxiv • 81 citations
Kuang et al.
Cold Fusion: Training Seq2seq Models Together With Language Models (2017) • Interspeech 2018 • 111 citations
Sriram et al.
Incorporating Global Visual Features Into Attention-based Neural Machine Translation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 150 citations
Iacer Calixto, Qun Liu, Nick Campbell
Learning To Generate One-sentence Biographies From Wikidata (2017) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers • 90 citations
Andrew Chisholm, Will Radford, Ben Hachey
Story Generation From Sequence Of Independent Short Descriptions (2017) • Arxiv • 80 citations
Jain et al.
Sockeye: A Toolkit For Neural Machine Translation (2017) • Arxiv • 194 citations
Hieber et al.
Evaluating Discourse Phenomena In Neural Machine Translation (2017) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 119 citations
Bawden et al.
Neural Machine Translation With Source-side Latent Graph Parsing (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 59 citations
Kazuma Hashimoto, Yoshimasa Tsuruoka
Curriculum Learning And Minibatch Bucketing In Neural Machine Translation (2017) • RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning • 127 citations
Tom Kocmi, Ondrej Bojar
Massive Exploration Of Neural Machine Translation Architectures (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 457 citations
Britz et al.
Neural Semantic Parsing By Character-based Translation: Experiments With Abstract Meaning Representations (2017) • Arxiv • 82 citations
Rik van Noord, Johan Bos
Generalization Without Systematicity: On The Compositional Skills Of Sequence-to-sequence Recurrent Networks (2017) • International Conference on Machine Learning (ICML), 2018 • 77 citations
Brenden M. Lake, Marco Baroni
Predicting Target Language CCG Supertags Improves Neural Machine Translation (2017) • Proceedings of the Second Conference on Machine Translation • 72 citations
Nadejde et al.
Deep Neural Machine Translation With Linear Associative Unit (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Wang et al.
Beam Search Strategies For Neural Machine Translation (2017) • Proceedings of the First Workshop on Neural Machine Translation • 235 citations
Markus Freitag, Yaser Al-Onaizan
Using The Output Embedding To Improve Language Models (2016) • Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers • 94 citations
Ofir Press, Lior Wolf
Fast Domain Adaptation For Neural Machine Translation (2016) • Arxiv • 164 citations
Markus Freitag, Yaser Al-Onaizan
Memory-enhanced Decoder For Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 60 citations
Wang et al.
Systran's Pure Neural Machine Translation Systems (2016) • Arxiv • 75 citations
Crego et al.
Tree-to-sequence Attentional Neural Machine Translation (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 250 citations
Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka
Quasi-recurrent Neural Networks (2016) • Arxiv • 326 citations
Bradbury et al.
Dual Learning For Machine Translation (2016) • NIPS 2016 • 597 citations
Xia et al.
Transfer Learning For Low-resource Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 722 citations
Zoph et al.
Deep Recurrent Models With Fast-forward Connections For Neural Machine Translation (2016) • Transactions of the Association for Computational Linguistics • 223 citations
Zhou et al.
A Convolutional Encoder Model For Neural Machine Translation (2016) • Arxiv • 64 citations
Gehring et al.
Chinese Song Iambics Generation With Neural Attention-based Model (2016) • Arxiv • 54 citations
Wang et al.
Multiresolution Recurrent Neural Networks: An Application To Dialogue Response Generation (2016) • Arxiv • 73 citations
Serban et al.
Temporal Attention Model For Neural Machine Translation (2016) • Arxiv • 52 citations
Sankaran et al.
Joint Copying And Restricted Generation For Paraphrase (2016) • Arxiv • 61 citations
Cao et al.
Variational Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 189 citations
Zhang et al.
Fully Character-level Neural Machine Translation Without Explicit Segmentation (2016) • Transactions of the Association for Computational Linguistics • 409 citations
Jason Lee, Kyunghyun Cho, Thomas Hofmann
Coverage Embedding Models For Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 131 citations
Mi et al.
Sequence-to-sequence Learning As Beam-search Optimization (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 432 citations
Sam Wiseman, Alexander M. Rush
The AMU-UEDIN Submission To The WMT16 News Translation Task: Attention-based NMT Models As Feature Functions In Phrase-based SMT (2016) • Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers • 56 citations
Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich
Mutual Information And Diverse Decoding Improve Neural Machine Translation (2016) • Arxiv • 98 citations
Jiwei Li, Dan Jurafsky
Compression Of Neural Machine Translation Models Via Pruning (2016) • Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning • 50 citations
Abigail See, Minh-Thang Luong, Christopher D. Manning
Google's Neural Machine Translation System: Bridging The Gap Between Human And Machine Translation (2016) • Arxiv • 5627 citations
Wu et al.
Google's Multilingual Neural Machine Translation System: Enabling Zero-shot Translation (2016) • Arxiv • 108 citations
Johnson et al.
An Actor-critic Algorithm For Sequence Prediction (2016) • Arxiv • 224 citations
Bahdanau et al.
Neural Versus Phrase-based Machine Translation Quality: A Case Study (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 69 citations
Bentivogli et al.
Diverse Beam Search: Decoding Diverse Solutions From Neural Sequence Models (2016) • Arxiv • 67 citations
Vijayakumar et al.
Incorporating Copying Mechanism In Sequence-to-sequence Learning (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 1419 citations
Gu et al.
Hypernetworks (2016) • Arxiv • 82 citations
David Ha, Andrew Dai, Quoc V. Le

NEURIPS 12 papers #

Monitoring Ai-modified Content At Scale: A Case Study On The Impact Of Chatgpt On AI Conference Peer Reviews (2024) • Arxiv • 59 citations
Liang et al.
Mathematical Capabilities Of Chatgpt (2023) • NeurIPS 2023 Datasets and Benchmarks • 293 citations
Frieder et al.
Quark: Controllable Text Generation With Reinforced Unlearning (2022) • NeurIPS 2022 (Oral Selection) • 45 citations
Lu et al.
Achieving Forgetting Prevention And Knowledge Transfer In Continual Learning (2021) • NeurIPS 2021 • 44 citations
Ke et al.
Adabelief Optimizer: Adapting Stepsizes By The Belief In Observed Gradients (2020) • NeurIPS 2020 • 121 citations
Zhuang et al.
Language As A Cognitive Tool To Imagine Goals In Curiosity-driven Exploration (2020) • NeurIPS 2020 • 56 citations
Colas et al.
Cross-lingual Retrieval For Iterative Self-supervised Training (2020) • NeurIPS 2020 • 48 citations
Tran et al.
The Second Conversational Intelligence Challenge (convai2) (2019) • The Springer Series on Challenges in Machine Learning • 361 citations
Dinan et al.
Rubi: Reducing Unimodal Biases In Visual Question Answering (2019) • Advances in Neural Information Processing Systems 2019 (pp. 839-850) • 205 citations
Cadene et al.
TADAM: Task Dependent Adaptive Metric For Improved Few-shot Learning (2018) • Advances in Neural Information Processing Systems 31 2018 • 199 citations
Boris N. Oreshkin, Pau Rodriguez, Alexandre Lacoste
Learning To Compose Domain-specific Transformations For Data Augmentation (2017) • Advances in Neural Information Processing Systems 30 2017 3236--3246 • 182 citations
Ratner et al.
Dual Learning For Machine Translation (2016) • NIPS 2016 • 597 citations
Xia et al.

— P —

Prompting 508 papers #

When Visualizing Is The First Step To Reasoning: MIRA, A Benchmark For Visual Chain-of-thought (2025) • No Venue
Zhou et al.
MARS: A Multi-agent Framework Incorporating Socratic Guidance For Automated Prompt Optimization (2025) • No Venue
Zhang et al.
Prompt Orchestration Markup Language (2025) • No Venue
Zhang et al.
Vlm²-bench: A Closer Look At How Well Vlms Implicitly Link Explicit Matching Visual Cues (2025) • No Venue
Zhang et al.
Is Chain-of-thought Reasoning Of Llms A Mirage? A Data Distribution Lens (2025) • No Venue
Zhao et al.
Let Llms Break Free From Overthinking Via Self-braking Tuning (2025) • No Venue
Zhao et al.
Promptcot 2.0: Scaling Prompt Synthesis For Large Language Model Reasoning (2025) • No Venue
Zhao et al.
One Token To Fool Llm-as-a-judge (2025) • No Venue
Zhao et al.
Dimple: Discrete Diffusion Multimodal Large Language Model With Parallel Decoding (2025) • No Venue
Runpeng Yu, Xinyin Ma, Xinchao Wang
Scireasoner: Laying The Scientific Reasoning Ground Across Disciplines (2025) • No Venue
Wang et al.
Visualprm: An Effective Process Reward Model For Multimodal Reasoning (2025) • No Venue
Wang et al.
Towards System 2 Reasoning In Llms: Learning How To Think With Meta Chain-of-thought (2025) • No Venue
Xiang et al.
Chain Of Draft: Thinking Faster By Writing Less (2025) • No Venue
Xu et al.
Longlive: Real-time Interactive Long Video Generation (2025) • No Venue
Yang et al.
Optimizing Chain-of-thought Reasoners Via Gradient Variance Minimization In Rejection Sampling And RL (2025) • No Venue
Yao et al.
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025) • No Venue
Agrawal et al.
Ultraif: Advancing Instruction Following From The Wild (2025) • No Venue
An et al.
Sketch-of-thought: Efficient LLM Reasoning With Adaptive Cognitive-inspired Sketching (2025) • No Venue
Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
KV Cache Steering For Inducing Reasoning In Small Language Models (2025) • No Venue
Belitsky et al.
Video-as-prompt: Unified Semantic Control For Video Generation (2025) • No Venue
Bian et al.
Training-free Group Relative Policy Optimization (2025) • No Venue
Cai et al.
Multi-domain Explainability Of Preferences (2025) • No Venue
Nitay Calderon, Liat Ein-Dor, Roi Reichart
Iterresearch: Rethinking Long-horizon Agents Via Markovian State Reconstruction (2025) • No Venue
Chen et al.
Hunyuanimage 3.0 Technical Report (2025) • No Venue
Cao et al.
Spacetools: Tool-augmented Spatial Reasoning Via Double Interactive RL (2025) • No Venue
Chen et al.
SHANKS: Simultaneous Hearing And Thinking For Spoken Language Models (2025) • No Venue
Chiang et al.
System Prompt Optimization With Meta-learning (2025) • No Venue
Yumin Choi, Jinheon Baek, Sung Ju Hwang
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities For Mllms (2025) • No Venue
Choi et al.
Modifying Large Language Model Post-training For Diverse Creative Writing (2025) • No Venue
Chung et al.
Overview Of The TREC 2021 Deep Learning Track (2025) • Arxiv • 58 citations
Craswell et al.
Tool-star: Empowering Llm-brained Multi-tool Reasoner Via Reinforcement Learning (2025) • No Venue
Dong et al.
Motionsight: Boosting Fine-grained Motion Understanding In Multimodal Llms (2025) • No Venue
Du et al.
SSRL: Self-search Reinforcement Learning (2025) • No Venue
Fan et al.
Got: Unleashing Reasoning Capability Of Multimodal Large Language Model For Visual Generation And Editing (2025) • No Venue
Fang et al.
PHYSICS: Benchmarking Foundation Models On University-level Physics Problem Solving (2025) • No Venue
Feng et al.
What Characterizes Effective Reasoning? Revisiting Length, Review, And Structure Of Cot (2025) • No Venue
Feng et al.
Multiple Choice Questions: Reasoning Makes Large Language Models (llms) More Self-confident Even When They Are Wrong (2025) • No Venue
Fu et al.
Latcoder: Converting Webpage Design To Code With Layout-as-thought (2025) • No Venue
Gui et al.
Web-cogreasoner: Towards Knowledge-induced Cognitive Reasoning For Web Agents (2025) • No Venue
Guo et al.
Can We Generate Images With Cot? Let's Verify And Reinforce Image Generation Step By Step (2025) • No Venue
Guo et al.
Generating An Image From 1,000 Words: Enhancing Text-to-image With Structured Captions (2025) • No Venue
Gutflaish et al.
Beyond The Last Answer: Your Reasoning Trace Uncovers More Than You Think (2025) • No Venue
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
MAGA: Massive Genre-audience Reformulation To Pretraining Corpus Expansion (2025) • No Venue
Xintong Hao, Ke Shen, Chenggang Li
Pasa: An LLM Agent For Comprehensive Academic Paper Search (2025) • No Venue
He et al.
Beyond One-size-fits-all: Inversion Learning For Highly Effective NLG Evaluation Prompts (2025) • No Venue
Hong et al.
The Imitation Game: Turing Machine Imitator Is Length Generalizable Reasoner (2025) • No Venue
Hua et al.
Sentinel: SOTA Model To Protect Against Prompt Injections (2025) • No Venue
Dror Ivry, Oran Nahum
When Thoughts Meet Facts: Reusable Reasoning For Long-context Lms (2025) • No Venue
Jeong et al.
Mme-cot: Benchmarking Chain-of-thought In Large Multimodal Models For Reasoning Quality, Robustness, And Efficiency (2025) • No Venue
Jiang et al.
Detect Anything Via Next Point Prediction (2025) • No Venue
Jiang et al.
Omnihuman-1.5: Instilling An Active Mind In Avatars Via Cognitive Simulation (2025) • No Venue
Jiang et al.
T2I-R1: Reinforcing Image Generation With Collaborative Semantic-level And Token-level Cot (2025) • No Venue
Jiang et al.
Distilling LLM Agent Into Small Models With Retrieval And Code Tools (2025) • No Venue
Kang et al.
Toward Evaluative Thinking: Meta Policy Optimization With Evolving Reward Models (2025) • No Venue
Kim et al.
No Prompt Left Behind: Exploiting Zero-variance Prompts In LLM Reinforcement Learning Via Entropy-guided Advantage Shaping (2025) • No Venue
Le et al.
The Cot Encyclopedia: Analyzing, Predicting, And Controlling How A Reasoning Model Will Think (2025) • No Venue
Lee et al.
Analysing Chain Of Thought Dynamics: Active Guidance Or Unfaithful Post-hoc Rationalisation? (2025) • No Venue
Lewis-Lim et al.
Visual-cog: Stage-aware Reinforcement Learning With Chain Of Guidance For Text-to-image Generation (2025) • No Venue
Li et al.
4D Langsplat: 4D Language Gaussian Splatting Via Multimodal Large Language Models (2025) • No Venue
Li et al.
Codei/o: Condensing Reasoning Patterns Via Code Input-output Prediction (2025) • No Venue
Li et al.
START: Self-taught Reasoner With Tools (2025) • No Venue
Li et al.
Seek In The Dark: Reasoning Via Test-time Instance-level Policy Gradient In Latent Space (2025) • No Venue
Li et al.
Drag-and-drop Llms: Zero-shot Prompt-to-weights (2025) • No Venue
Liang et al.
Fractured Chain-of-thought Reasoning (2025) • No Venue
Liao et al.
Metaladder: Ascending Mathematical Solution Quality Via Analogical-problem Reasoning Transfer (2025) • No Venue
Lin et al.
Self-supervised Quantized Representation For Seamlessly Integrating Knowledge Graphs With Large Language Models (2025) • No Venue
Lin et al.
Scaling Code-assisted Chain-of-thoughts And Instructions For Model Reasoning (2025) • No Venue
Lin et al.
Beyond Distillation: Pushing The Limits Of Medical LLM Reasoning With Minimalist Rule-based RL (2025) • No Venue
Liu et al.
Can World Simulators Reason? Gen-vire: A Generative Visual Reasoning Benchmark (2025) • No Venue
Liu et al.
Efficient Inference For Large Reasoning Models: A Survey (2025) • No Venue
Liu et al.
Medsam3: Delving Into Segment Anything With Medical Concepts (2025) • No Venue
Liu et al.
Rectifying LLM Thought From Lens Of Optimization (2025) • No Venue
Liu et al.
Adacot: Pareto-optimal Adaptive Chain-of-thought Triggering Via Reinforcement Learning (2025) • No Venue
Lou et al.
VISTA: A Test-time Self-improving Video Generation Agent (2025) • No Venue
Long et al.
Beyond English: Toward Inclusive And Scalable Multilingual Machine Translation With Llms (2025) • No Venue
Luo et al.
Easy Dataset: A Unified And Extensible Framework For Synthesizing LLM Fine-tuning Data From Unstructured Documents (2025) • No Venue
Miao et al.
LLM As A Broken Telephone: Iterative Generation Distorts Information (2025) • No Venue
Mohamed et al.
Leveraging Self-attention For Input-dependent Soft Prompting In Llms (2025) • No Venue
Ananth Muppidi, Abhilash Nandy, Sambaran Bandyopadhyay
Hot: Highlighted Chain Of Thought For Referencing Supporting Facts From Inputs (2025) • No Venue
Nguyen et al.
Does Understanding Inform Generation In Unified Multimodal Models? From Analysis To Path Forward (2025) • No Venue
Niu et al.
BOLT: Bootstrap Long Chain-of-thought In Language Models Without Distillation (2025) • No Venue
Pang et al.
Cotox: Chain-of-thought-based Molecular Toxicity Reasoning And Prediction (2025) • No Venue
Park et al.
Generating Physically Stable And Buildable LEGO Designs From Text (2025) • No Venue
Pun et al.
Hogwild! Inference: Parallel LLM Generation Via Concurrent Attention (2025) • No Venue
Rodionov et al.
Agentrxiv: Towards Collaborative Autonomous Research (2025) • No Venue
Samuel Schmidgall, Michael Moor
When Punctuation Matters: A Large-scale Comparison Of Prompt Robustness Methods For Llms (2025) • No Venue
Seleznyov et al.
Deep Research: A Systematic Survey (2025) • No Venue
Shi et al.
Earthmind: Towards Multi-granular And Multi-sensor Earth Observation With Large Multimodal Models (2025) • No Venue
Shu et al.
Pushing On Multilingual Reasoning Models With Language-mixed Chain-of-thought (2025) • No Venue
Son et al.
Au-harness: An Open-source Toolkit For Holistic Evaluation Of Audio Llms (2025) • No Venue
Surapaneni et al.
Envision: Benchmarking Unified Understanding & Generation For Causal World Process Insights (2025) • No Venue
Tian et al.
Openmathinstruct-1: A 1.8 Million Math Instruction Tuning Dataset (2024) • No Venue
Toshniwal et al.
PALP: Prompt Aligned Personalization Of Text-to-image Models (2024) • No Venue
Arar et al.
From Generalist To Specialist: Adapting Vision Language Models Via Task-specific Visual Instruction Tuning (2024) • No Venue
Bai et al.
Iris: An Ai-driven Virtual Tutor For Computer Science Education (2024) • Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 • 41 citations
Patrick Bassner, Eduard Frankford, Stephan Krusche
Intelligent Clinical Documentation: Harnessing Generative AI For Patient-centric Clinical Note Generation (2024) • International Journal of Innovative Science and Research Technology (IJISRT) • 999 citations
Anjanava Biswas, Wrick Talukdar
ROCKET-1: Master Open-world Interaction With Visual-temporal Context Prompting (2024) • No Venue
Cai et al.
Tx-llm: A Large Language Model For Therapeutics (2024) • No Venue
Chaves et al.
Premise Order Matters In Reasoning With Large Language Models (2024) • No Venue
Chen et al.
Language Models Are Hidden Reasoners: Unlocking Latent Reasoning Capabilities Via Self-rewarding (2024) • No Venue
Chen et al.
Scienceagentbench: Toward Rigorous Assessment Of Language Agents For Data-driven Scientific Discovery (2024) • No Venue
Chen et al.
Region-aware Text-to-image Generation Via Hard Binding And Soft Refinement (2024) • No Venue
Chen et al.
Training-free Regional Prompting For Diffusion Transformers (2024) • No Venue
Chen et al.
Unist: A Prompt-empowered Universal Model For Urban Spatio-temporal Prediction (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 51 citations
Yuan et al.
Self-rewarding Language Models (2024) • No Venue
Yuan et al.
Magictime: Time-lapse Video Generation Models As Metamorphic Simulators (2024) • No Venue
Yuan et al.
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded (2024) • No Venue
Zheng et al.
Compressed Chain Of Thought: Efficient Reasoning Through Dense Representations (2024) • No Venue
Jeffrey Cheng, Benjamin van Durme
Beyond Fine-tuning: Unleashing The Potential Of Continuous Pretraining For Clinical Llms (2024) • No Venue
Christophe et al.
A Flexible Large Language Models Guardrail Development Methodology Applied To Off-topic Prompt Detection (2024) • No Venue
Gabriel Chua, Shing Yee Chan, Shaun Khoo
Large Language Model Capabilities In Perioperative Risk Prediction And Prognostication (2024) • JAMA Surgery • 60 citations
Chung et al.
Internlm-math: Open Math Large Language Models Toward Verifiable Reasoning (2024) • No Venue
Ying et al.
Enhancing Large Language Models With Pseudo- And Multisource- Knowledge Graphs For Open-ended Question Answering (2024) • IEEE Robotics and Automation Letters • 47 citations
Liu et al.
Toward Self-improvement Of Llms Via Imagination, Searching, And Criticizing (2024) • No Venue
Tian et al.
Easyref: Omni-generalized Group Image Reference For Diffusion Models Via Multimodal LLM (2024) • No Venue
Zong et al.
X-prompt: Towards Universal In-context Image Generation In Auto-regressive Vision Language Foundation Models (2024) • No Venue
Sun et al.
Meta-prompting: Enhancing Language Models With Task-agnostic Scaffolding (2024) • No Venue
Mirac Suzgun, Adam Tauman Kalai
Promptcharm: Text-to-image Generation Through Multi-modal Prompting And Refinement (2024) • Proceedings of the CHI Conference on Human Factors in Computing Systems • 66 citations
Wang et al.
Explainable Generative AI (genxai): A Survey, Conceptualization, And Research Agenda (2024) • Artificial Intelligence Review • 61 citations
Johannes Schneider
The Prompt Report: A Systematic Survey Of Prompting Techniques (2024) • No Venue
Schulhoff et al.
Show, Don't Tell: Aligning Language Models With Demonstrated Feedback (2024) • No Venue
Shaikh et al.
Large-scale Text-to-image Model With Inpainting Is A Zero-shot Subject-driven Image Generator (2024) • No Venue
Shin et al.
Design2code: How Far Are We From Automating Front-end Engineering? (2024) • No Venue
Si et al.
To Cot Or Not To Cot? Chain-of-thought Helps Mainly On Math And Symbolic Reasoning (2024) • No Venue
Sprague et al.
Enhancing Llm-based Feedback: Insights From Intelligent Tutoring Systems And The Learning Sciences (2024) • Communications in Computer and Information Science • 41 citations
John Stamper, Ruiwei Xiao, Xinying Hou
Diving Into Self-evolving Training For Multimodal Reasoning (2024) • No Venue
Liu et al.
Coarse Correspondence Elicit 3D Spacetime Understanding In Multimodal Language Model (2024) • No Venue
Liu et al.
Videodrafter: Content-consistent Multi-scene Video Generation With LLM (2024) • No Venue
Long et al.
Segment Anything Model For Medical Image Segmentation: Current Applications And Future Directions (2024) • Computers in Biology and Medicine • 179 citations
Yichi Zhang, Zhenrong Shen, Rushi Jiao
Step-controlled DPO: Leveraging Stepwise Error For Enhanced Mathematical Reasoning (2024) • No Venue
Lu et al.
Turboedit: Instant Text-based Image Editing (2024) • No Venue
Wu et al.
Exploring The Role Of Large Language Models In Prompt Encoding For Diffusion Models (2024) • No Venue
Ma et al.
Groma: Localized Visual Tokenization For Grounding Multimodal Large Language Models (2024) • No Venue
Ma et al.
Core: Context-regularized Text Embedding Learning For Text-to-image Personalization (2024) • No Venue
Wu et al.
Gpt-4v(ision) Is A Human-aligned Evaluator For Text-to-3d Generation (2024) • No Venue
Wu et al.
Beyond Examples: High-level Automated Reasoning Paradigm In In-context Learning Via MCTS (2024) • No Venue
Wu et al.
Openmedlm: Prompt Engineering Can Out-perform Fine-tuning In Medical Question-answering With Open-source Large Language Models (2024) • Scientific Reports • 53 citations
Maharjan et al.
Improving Text-to-image Consistency Via Automatic Prompt Optimization (2024) • No Venue
Mañas et al.
MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training (2024) • No Venue
McKinzie et al.
Whiteboard-of-thought: Thinking Step-by-step Across Modalities (2024) • No Venue
Sachit Menon, Richard Zemel, Carl Vondrick
ROS-LLM: A ROS Framework For Embodied AI With Task Feedback And Structured Reasoning (2024) • No Venue
Mower et al.
User-llm: Efficient LLM Contextualization With User Embeddings (2024) • No Venue
Ning et al.
Can Llms Learn By Teaching? A Preliminary Study (2024) • No Venue
Ning et al.
Iterative Reasoning Preference Optimization (2024) • No Venue
Pang et al.
Advprompter: Fast Adaptive Adversarial Prompting For Llms (2024) • No Venue
Paulus et al.
Dreambench++: A Human-aligned Benchmark For Personalized Image Generation (2024) • No Venue
Peng et al.
RAFT: Adapting Language Model To Domain Specific RAG (2024) • No Venue
Zhang et al.
Promptriever: Instruction-trained Retrievers Can Be Prompted Like Language Models (2024) • No Venue
Weller et al.
Fine-tuning And Prompt Engineering For Large Language Models-based Code Review Automation (2024) • Information and Software Technology • 43 citations
Chanathip Pornprasit, Chakkrit Tantithamthavorn
Diffusiongpt: Llm-driven Text-to-image Generation System (2024) • No Venue
Qin et al.
Self-discover: Large Language Models Self-compose Reasoning Structures (2024) • No Venue
Zhou et al.
Large Language Model For Vulnerability Detection: Emerging Results And Future Directions (2024) • ICSE-NIER'24: 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results • 74 citations
Xin Zhou, Ting Zhang, David Lo
The Effect Of Sampling Temperature On Problem Solving In Large Language Models (2024) • Findings of the Association for Computational Linguistics: EMNLP 2024 • 76 citations
Matthew Renze, Erhan Guven
Ipadapter-instruct: Resolving Ambiguity In Image-based Conditioning Using Instruct Prompts (2024) • No Venue
Rowles et al.
Insight-v: Exploring Long-chain Visual Reasoning With Multimodal Large Language Models (2024) • No Venue
Dong et al.
Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024) • No Venue
Yang et al.
Processbench: Identifying Process Errors In Mathematical Reasoning (2024) • No Venue
Zheng et al.
Physics Of Language Models: Part 2.2, How To Learn From Mistakes On Grade-school Math Problems (2024) • No Venue
Ye et al.
Natural Language Reinforcement Learning (2024) • No Venue
Feng et al.
BLINK: Multimodal Large Language Models Can See But Not Perceive (2024) • No Venue
Fu et al.
Mm-ego: Towards Building Egocentric Multimodal Llms (2024) • No Venue
Ye et al.
Efficient Tool Use With Chain-of-abstraction Reasoning (2024) • No Venue
Gao et al.
Sam2point: Segment Any 3D As Videos In Zero-shot And Promptable Manners (2024) • No Venue
Guo et al.
Direct Language Model Alignment From Online AI Feedback (2024) • No Venue
Guo et al.
Token-budget-aware LLM Reasoning (2024) • No Venue
Han et al.
Training Large Language Models To Reason In A Continuous Latent Space (2024) • No Venue
Hao et al.
Mastering Text-to-image Diffusion: Recaptioning, Planning, And Generating With Multimodal Llms (2024) • No Venue
Yang et al.
Do Large Language Models Latently Perform Multi-hop Reasoning? (2024) • No Venue
Yang et al.
ELLA: Equip Diffusion Models With LLM For Enhanced Semantic Alignment (2024) • No Venue
Hu et al.
Prompting Large Language Models With Rationale Heuristics For Knowledge-based Visual Question Answering (2024) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 153 citations
Hu et al.
Affordance-based Robot Manipulation With Flow Matching (2024) • No Venue
Fan Zhang, Michael Gienger
E5-V: Universal Embeddings With Multimodal Large Language Models (2024) • No Venue
Jiang et al.
To Believe Or Not To Believe Your LLM (2024) • No Venue
Yadkori et al.
Process Modeling With Large Language Models (2024) • Lecture Notes in Business Information Processing • 50 citations
Kourani et al.
Can Large Language Models Explore In-context? (2024) • No Venue
Krishnamurthy et al.
A Human-inspired Reading Agent With Gist Memory Of Very Long Contexts (2024) • No Venue
Lee et al.
Mathemyths: Leveraging Large Language Models To Teach Mathematical Language Through Child-ai Co-creative Storytelling (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 47 citations
Zhang et al.
Promptkd: Unsupervised Prompt Distillation For Vision-language Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
What Happened In Llms Layers When Trained For Fast Vs. Slow Thinking: A Gradient Perspective (2024) • No Venue
Ming Li, Yanhong Li, Tianyi Zhou
Your Mixture-of-experts LLM Is Secretly An Embedding Model For Free (2024) • No Venue
Ziyue Li, Tianyi Zhou
Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned Llms With Nothing (2024) • No Venue
Xu et al.
Lost In Translation: A Study Of Bugs Introduced By Large Language Models While Translating Code (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 64 citations
Pan et al.
A Prompt Pattern Catalog To Enhance Prompt Engineering With Chatgpt (2023) • Arxiv • 746 citations
White et al.
Chatgpt Prompt Patterns For Improving Code Quality, Refactoring, Requirements Elicitation, And Software Design (2023) • Generative AI for Effective Software Development • 147 citations
White et al.
Is Chatgpt The Ultimate Programming Assistant -- How Far Is It? (2023) • Arxiv • 107 citations
Tian et al.
Cultural Bias And Cultural Alignment Of Large Language Models (2023) • PNAS Nexus • 113 citations
Tao et al.
Learning To Prompt In The Classroom To Understand AI Limits: A Pilot Study (2023) • Lecture Notes in Computer Science • 44 citations
Theophilou et al.
Towards Making The Most Of Chatgpt For Machine Translation (2023) • SSRN Electronic Journal • 94 citations
Peng et al.
Logic-lm: Empowering Large Language Models With Symbolic Solvers For Faithful Logical Reasoning (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 52 citations
Pan et al.
ART: Automatic Multi-step Reasoning And Tool-use For Large Language Models (2023) • Arxiv • 47 citations
Paranjape et al.
Make LLM A Testing Expert: Bringing Human-like Interaction To Mobile GUI Testing Via Functionality-aware Decisions (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 57 citations
Liu et al.
Jailbreaking Chatgpt Via Prompt Engineering: An Empirical Study (2023) • Arxiv • 99 citations
Liu et al.
G-eval: NLG Evaluation Using GPT-4 With Better Human Alignment (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 383 citations
Liu et al.
Is Chatgpt A Good Recommender? A Preliminary Study (2023) • Arxiv • 43 citations
Liu et al.
ONCE: Boosting Content-based Recommendation With Both Open- And Closed-source Large Language Models (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 67 citations
Liu et al.
Recommender Systems In The Era Of Large Language Models (llms) (2023) • IEEE Transactions on Knowledge and Data Engineering • 183 citations
Zhao et al.
Leveraging Large Language Models To Power Chatbots For Collecting User Self-reported Data (2023) • Proceedings of the ACM on Human-Computer Interaction • 50 citations
Wei et al.
All In One: Multi-task Prompting For Graph Neural Networks (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 123 citations
Sun et al.
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark And Empirical Study (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 74 citations
Sui et al.
Sensecape: Enabling Multilevel Exploration And Sensemaking With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 89 citations
Suh et al.
Supporting Qualitative Analysis With Large Language Models: Combining Codebook With GPT-3 For Deductive Coding (2023) • IUI '23: 28th International Conference on Intelligent User Interfaces • 126 citations
Xiao et al.
Florence-2: Advancing A Unified Representation For A Variety Of Vision Tasks (2023) • No Venue
Xiao et al.
Large Language Models Can Accurately Predict Searcher Preferences (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 82 citations
Thomas et al.
Fuzz4all: Universal Fuzzing With Large Language Models (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 108 citations
Xia et al.
Do Large Language Models Show Decision Heuristics Similar To Humans? A Case Study Using GPT-3.5 (2023) • Journal of Experimental Psychology: General • 46 citations
Suri et al.
Sql-palm: Improved Large Language Model Adaptation For Text-to-sql (2023) • No Venue
Sun et al.
Towards Open-world Recommendation With Knowledge Augmentation From Large Language Models (2023) • RecSys '24: 18th ACM Conference on Recommender Systems • 69 citations
Xi et al.
The Flan Collection: Designing Data And Methods For Effective Instruction Tuning (2023) • Arxiv • 109 citations
Longpre et al.
RTLLM: An Open-source Benchmark For Design RTL Generation With Large Language Model (2023) • 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) • 116 citations
Lu et al.
Unleashing The Emergent Cognitive Synergy In Large Language Models: A Task-solving Agent Through Multi-persona Self-collaboration (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 48 citations
Wang et al.
Collaborative Generative AI: Integrating Gpt-k For Efficient Editing In Text-to-image Generation (2023) • WWW '24: The ACM Web Conference 2024 • 58 citations
Zhu et al.
Visual Prompt Multi-modal Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Zhu et al.
Text Classification Via Large Language Models (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 108 citations
Sun et al.
Universal And Transferable Adversarial Attacks On Aligned Language Models (2023) • Arxiv • 171 citations
Zou et al.
Explainability For Large Language Models: A Survey (2023) • ACM Transactions on Intelligent Systems and Technology • 317 citations
Zhao et al.
Faithful Chain-of-thought Reasoning (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 63 citations
Lyu et al.
Llm-rec: Personalized Recommendation Via Prompting Large Language Models (2023) • No Venue
Lyu et al.
Translating Radiology Reports Into Plain Language Using Chatgpt And GPT-4 With Prompt Learning: Promising Results, Limitations, And Potential (2023) • Visual Computing for Industry, Biomedicine, and Art • 264 citations
Lyu et al.
Dreameditor: Text-driven 3D Scene Editing With Neural Fields (2023) • SIGGRAPH Asia 2023 Conference Papers • 77 citations
Zhuang et al.
Query Rewriting For Retrieval-augmented Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 114 citations
Ma et al.
Large Language Model Is Not A Good Few-shot Information Extractor, But A Good Reranker For Hard Samples! (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 55 citations
Ma et al.
What Does CLIP Know About A Red Circle? Visual Prompt Engineering For Vlms (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 78 citations
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
Can Generalist Foundation Models Outcompete Special-purpose Tuning? Case Study In Medicine (2023) • Arxiv • 157 citations
Nori et al.
Social Biases Through The Text-to-image Generation Lens (2023) • Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society • 80 citations
Ranjita Naik, Besmira Nushi
CORA: Adapting CLIP For Open-vocabulary Detection With Region Prompting And Anchor Pre-matching (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 103 citations
Wu et al.
Directgpt: A Direct Manipulation Interface To Interact With Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 54 citations
Masson et al.
Automatic Prompt Augmentation And Selection With Chain-of-thought From Labeled Data (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 43 citations
Kashun Shum, Shizhe Diao, Tong Zhang
Selfcheck: Using Llms To Zero-shot Check Their Own Step-by-step Reasoning (2023) • No Venue
Ning Miao, Yee Whye Teh, Tom Rainforth
State Of What Art? A Call For Multi-prompt LLM Evaluation (2023) • Transactions of the Association for Computational Linguistics • 58 citations
Mizrahi et al.
Assigning AI: Seven Approaches For Students, With Prompts (2023) • SSRN Electronic Journal • 95 citations
Ethan Mollick, Lilach Mollick
One-shot Labeling For Automatic Relevance Estimation (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Sean MacAvaney, Luca Soldaini
Chat2vis: Generating Data Visualisations Via Natural Language Using Chatgpt, Codex And GPT-3 Large Language Models (2023) • IEEE Access • 165 citations
Paula Maddigan, Teo Susnjak
Self-refine: Iterative Refinement With Self-feedback (2023) • Arxiv • 202 citations
Madaan et al.
Towards Expert-level Medical Question Answering With Large Language Models (2023) • Arxiv • 329 citations
Singhal et al.
An Early Evaluation Of Gpt-4v(ision) (2023) • No Venue
Wu et al.
Enhancing CLIP With GPT-4: Harnessing Visual Descriptions As Prompts (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 45 citations
Maniparambil et al.
Let The Llms Talk: Simulating Human-to-human Conversational QA Via Zero-shot Llm-to-llm Interactions (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Abbasiantaeb et al.
From Sparse To Dense: GPT-4 Summarization With Chain Of Density Prompting (2023) • No Venue
Adams et al.
Fixing Hardware Security Bugs With Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 47 citations
Ahmad et al.
Automatic Semantic Augmentation Of Language Model Prompts (for Code Summarization) (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 48 citations
Ahmed et al.
Document-level Machine Translation With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 71 citations
Wang et al.
Enable Language Models To Implicitly Learn Self-improvement From Data (2023) • No Venue
Wang et al.
Spellburst: A Node-based Interface For Exploratory Creative Coding With Natural Language Prompts (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 45 citations
Angert et al.
Language Models Enable Simple Systems For Generating Structured Views Of Heterogeneous Data Lakes (2023) • Proceedings of the VLDB Endowment • 41 citations
Arora et al.
Chainforge: A Visual Toolkit For Prompt Engineering And LLM Hypothesis Testing (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 82 citations
Arawjo et al.
Foundational Models Defining A New Era In Vision: A Survey And Outlook (2023) • Arxiv • 66 citations
Awais et al.
A Multitask, Multilingual, Multimodal Evaluation Of Chatgpt On Reasoning, Hallucination, And Interactivity (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 466 citations
Bang et al.
Graph Of Thoughts: Solving Elaborate Problems With Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 234 citations
Besta et al.
Can GPT-3 Perform Statutory Reasoning? (2023) • Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law • 50 citations
Andrew Blair-Stanek, Nils Holzenberger, Benjamin van Durme
Agenttuning: Enabling Generalized Agent Abilities For Llms (2023) • No Venue
Zeng et al.
Promptify: Text-to-image Generation Through Interactive Prompt Exploration With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 110 citations
Brade et al.
Principled Instructions Are All You Need For Questioning Llama-1/2, GPT-3.5/4 (2023) • No Venue
Sondos Mahmoud Bsharat, Aidar Myrzakhan, Zhiqiang Shen
Sparks Of Artificial General Intelligence: Early Experiments With GPT-4 (2023) • Arxiv • 1480 citations
Bubeck et al.
Just Tell Me: Prompt Engineering In Business Process Management (2023) • Lecture Notes in Business Information Processing • 52 citations
Busch et al.
Prompt Engineering For Healthcare: Methodologies And Applications (2023) • Arxiv • 115 citations
Wang et al.
Attend-and-excite: Attention-based Semantic Guidance For Text-to-image Diffusion Models (2023) • ACM Transactions on Graphics • 291 citations
Chefer et al.
Autotamp: Autoregressive Task And Motion Planning With Llms As Translators And Checkers (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 57 citations
Chen et al.
Unleashing The Potential Of Prompt Engineering For Large Language Models (2023) • Patterns • 86 citations
Chen et al.
Skills-in-context Prompting: Unlocking Compositionality In Large Language Models (2023) • No Venue
Chen et al.
Scalable Multi-robot Collaboration With Large Language Models: Centralized Or Decentralized Systems? (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 46 citations
Chen et al.
Chatgpt Empowered Long-step Robot Control In Various Environments: A Case Application (2023) • IEEE Access • 75 citations
Wake et al.
Plan-and-solve Prompting: Improving Zero-shot Chain-of-thought Reasoning By Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 149 citations
Wang et al.
Freshllms: Refreshing Large Language Models With Search Engine Augmentation (2023) • No Venue
Vu et al.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-verification (2023) • No Venue
Zhou et al.
Review Of Large Vision Models And Visual Prompt Engineering (2023) • Meta-Radiology • 157 citations
Wang et al.
Query2doc: Query Expansion With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 108 citations
Liang Wang, Nan Yang, Furu Wei
Revisiting Relation Extraction In The Era Of Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Somin Wadhwa, Silvio Amir, Byron C. Wallace
SAM On Medical Images: A Comprehensive Study On Three Prompt Modes (2023) • Arxiv • 53 citations
Cheng et al.
Is GPT-4 A Good Data Analyst? (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 65 citations
Liying Cheng, Xingxuan Li, Lidong Bing
Can Large Language Models Transform Computational Social Science? (2023) • Computational Linguistics • 231 citations
Ziems et al.
Chatgpt For Robotics: Design Principles And Model Abilities (2023) • Arxiv • 90 citations
Vemprala et al.
CLIP The Gap: A Single Domain Generalization Approach For Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Vidit Vidit, Martin Engilberge, Mathieu Salzmann
Contrastive Chain-of-thought Prompting (2023) • No Venue
Chia et al.
Can Chatgpt Understand Too? A Comparative Study On Chatgpt And Fine-tuned BERT (2023) • Arxiv • 144 citations
Zhong et al.
Rolellm: Benchmarking, Eliciting, And Enhancing Role-playing Abilities Of Large Language Models (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 51 citations
Wang et al.
Promptpaint: Steering Text-to-image Generation Through Paint Medium-like Interactions (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 66 citations
John Joon Young Chung, Eytan Adar
Large Language Models In The Workplace: A Case Study On Prompt Engineering For Job Type Classification (2023) • Lecture Notes in Computer Science • 45 citations
Clavié et al.
Receive, Reason, And React: Drive As You Say With Large Language Models In Autonomous Vehicles (2023) • IEEE Intelligent Transportation Systems Magazine • 69 citations
Cui et al.
Speechx: Neural Codec Language Model As A Versatile Speech Transformer (2023) • No Venue
Wang et al.
Choice Over Control: How Users Write With Large Language Models Using Diegetic And Non-diegetic Prompting (2023) • CHI '23: CHI Conference on Human Factors in Computing Systems • 54 citations
Dang et al.
LLMR: Real-time Prompting Of Interactive Worlds Using Large Language Models (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 56 citations
Torre et al.
Task And Motion Planning With Large Language Models For Object Rearrangement (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 98 citations
Ding et al.
Can An Embodied Agent Find Your "cat-shaped Mug"? Llm-guided Exploration For Zero-shot Object Navigation (2023) • IEEE Robotics and Automation Letters • 55 citations
Vishnu Sashank Dorbala, James F. Mullen, Dinesh Manocha
A Comprehensive Survey On Multimodal Recommender Systems: Taxonomy, Evaluation, And Future Directions (2023) • Arxiv • 149 citations
Zhou et al.
GPT-3.5, GPT-4, Or BARD? Evaluating Llms Reasoning Ability In Zero-shot Setting And Performance Boosting Through Prompts (2023) • Natural Language Processing Journal • 69 citations
Espejel et al.
Is Chatgpt A Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation (2023) • Arxiv • 54 citations
Fang et al.
Transforming Sentiment Analysis In The Financial Domain With Chatgpt (2023) • Machine Learning with Applications • 124 citations
Fatouros et al.
Prompting Large Language Models With Speech Recognition Abilities (2023) • ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 60 citations
Fathullah et al.
Reasoning Implicit Sentiment With Chain-of-thought Prompting (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 75 citations
Fei et al.
Promptmagician: Interactive Prompt Engineering For Text-to-image Creation (2023) • IEEE Transactions on Visualization and Computer Graphics • 89 citations
Feng et al.
Diverse Data Augmentation With Diffusions For Effective Test-time Prompt Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 44 citations
Feng et al.
Prompting Is All You Need: Automated Android Bug Replay With Large Language Models (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 89 citations
Sidong Feng, Chunyang Chen
Language Models Can Be Logical Solvers (2023) • No Venue
Feng et al.
Gpt4aigchip: Towards Next-generation AI Accelerator Design Automation Via Large Language Models (2023) • 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) • 79 citations
Fu et al.
Exploring The Feasibility Of Chatgpt For Event Extraction (2023) • Arxiv • 55 citations
Gao et al.
Enabling Large Language Models To Generate Text With Citations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Gao et al.
Is Chatgpt A Good Causal Reasoner? A Comprehensive Evaluation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Gao et al.
A Unified Continual Learning Framework With General Parameter-efficient Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 49 citations
Gao et al.
Text-to-sql Empowered By Large Language Models: A Benchmark Evaluation (2023) • Proceedings of the VLDB Endowment • 111 citations
Gao et al.
Vita-clip: Video And Text Adaptive CLIP Via Multimodal Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Wasim et al.
Vampnet: Music Generation Via Masked Acoustic Token Modeling (2023) • No Venue
Garcia et al.
Prompt Engineering A Prompt Engineer (2023) • No Venue
Ye et al.
Ip-adapter: Text Compatible Image Prompt Adapter For Text-to-image Diffusion Models (2023) • No Venue
Ye et al.
Prompt Cache: Modular Attention Reuse For Low-latency Inference (2023) • No Venue
Gim et al.
Large Language Models Are Few-shot Summarizers: Multi-intent Comment Generation Via In-context Learning (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 89 citations
Geng et al.
An Empirical Evaluation Of Using Large Language Models For Automated Unit Test Generation (2023) • IEEE Transactions on Software Engineering • 176 citations
Schäfer et al.
How Far Are Large Language Models From Agents With Theory-of-mind? (2023) • No Venue
Zhou et al.
Visual-language Prompt Tuning With Knowledge-guided Context Optimization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Hantao Yao, Rui Zhang, Changsheng Xu
Not What You've Signed Up For: Compromising Real-world Llm-integrated Applications With Indirect Prompt Injection (2023) • CCS '23: ACM SIGSAC Conference on Computer and Communications Security • 178 citations
Greshake et al.
A Systematic Survey Of Prompt Engineering On Vision-language Foundation Models (2023) • Arxiv • 61 citations
Gu et al.
Connecting Large Language Models With Evolutionary Algorithms Yields Powerful Prompt Optimizers (2023) • No Venue
Guo et al.
Chatie: Zero-shot Information Extraction Via Chatting With Chatgpt (2023) • Arxiv • 141 citations
Wei et al.
Reasoning With Language Model Is Planning With World Model (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 68 citations
Hao et al.
The Political Ideology Of Conversational AI: Converging Evidence On Chatgpt's Pro-environmental, Left-libertarian Orientation (2023) • SSRN Electronic Journal • 85 citations
Jochen Hartmann, Jasper Schwenzow, Maximilian Witte
Large Language Models Are Competitive Near Cold-start Recommenders For Language- And Item-based Preferences (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 89 citations
Sanner et al.
Annollm: Making Large Language Models To Be Better Crowdsourced Annotators (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track) • 46 citations
He et al.
Stay On Topic With Classifier-free Guidance (2023) • No Venue
Sanchez et al.
Metagpt: Meta Programming For A Multi-agent Collaborative Framework (2023) • Arxiv • 124 citations
Hong et al.
CLIP Goes 3D: Leveraging Prompt Tuning For Language Grounded 3D Recognition (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 50 citations
Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel
How Good Are GPT Models At Machine Translation? A Comprehensive Evaluation (2023) • Arxiv • 177 citations
Hendy et al.
Olala: Ontology Matching With Large Language Models (2023) • K-CAP '23: Knowledge Capture Conference 2023 • 45 citations
Sven Hertling, Heiko Paulheim
3D-LLM: Injecting The 3D World Into Large Language Models (2023) • No Venue
Hong et al.
CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 93 citations
Sain et al.
Large Language Models Are Zero-shot Rankers For Recommender Systems (2023) • Lecture Notes in Computer Science • 155 citations
Hou et al.
Improving User Controlled Table-to-text Generation Robustness (2023) • Journal of the American Medical Informatics Association • 187 citations
Hu et al.
Opportunities And Challenges Of Chatgpt For Design Knowledge Management (2023) • Procedia CIRP • 76 citations
Hu et al.
Zero-shot Information Extraction From Radiological Reports Using Chatgpt (2023) • International Journal of Medical Informatics • 67 citations
Hu et al.
Towards Interpretable Mental Health Analysis With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 84 citations
Yang et al.
MM-REACT: Prompting Chatgpt For Multimodal Reasoning And Action (2023) • Arxiv • 78 citations
Yang et al.
Set-of-mark Prompting Unleashes Extraordinary Visual Grounding In GPT-4V (2023) • No Venue
Yang et al.
Diversity-aware Meta Visual Prompting (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Huang et al.
Neuroprompts: An Adaptive Framework To Optimize Prompts For Text-to-image Generation (2023) • No Venue
Shachar Rosenman, Vasudev Lal, Phillip Howard
Contrastive Decoding Improves Reasoning In Large Language Models (2023) • No Venue
Sean O'Brien, Mike Lewis
Testing The Reliability Of Chatgpt For Text Annotation And Classification: A Cautionary Remark (2023) • Arxiv • 80 citations
Michael V. Reiss
Large Language Models As Optimizers (2023) • No Venue
Yang et al.
The Dawn Of Lmms: Preliminary Explorations With Gpt-4v(ision) (2023) • Arxiv • 160 citations
Yang et al.
Retrieving Supporting Evidence For Llms Generated Answers (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 47 citations
Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke
Mathprompter: Mathematical Reasoning Using Large Language Models (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track) • 78 citations
Shima Imani, Liang Du, Harsh Shrivastava
Chatgpt And Software Testing Education: Promises & Perils (2023) • 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) • 198 citations
Jalil et al.
Consistency Analysis Of Chatgpt (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Myeongjun Erik Jang, Thomas Lukasiewicz
GPT-4 Can Pass The Korean National Licensing Examination For Korean Medicine Doctors (2023) • PLOS Digital Health • 48 citations
Jang et al.
Llmlingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jiang et al.
Graphologue: Exploring Large Language Model Responses With Interactive Diagrams (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 93 citations
Jiang et al.
Motiongpt: Human Motion As A Foreign Language (2023) • No Venue
Jiang et al.
Self-planning Code Generation With Large Language Models (2023) • ACM Transactions on Software Engineering and Methodology • 60 citations
Jiang et al.
Is Chatgpt A Good Translator? Yes With GPT-4 As The Engine (2023) • Arxiv • 307 citations
Jiao et al.
Inferfix: End-to-end Program Repair With Llms (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 97 citations
Jin et al.
Teach AI How To Code: Using Large Language Models As Teachable Agents For Programming Education (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 53 citations
Jin et al.
Chain Of Code: Reasoning With A Language Model-augmented Code Emulator (2023) • No Venue
Li et al.
Videodirectorgpt: Consistent Multi-scene Video Generation Via Llm-guided Planning (2023) • No Venue
Lin et al.
The Unlocking Spell On Base Llms: Rethinking Alignment Via In-context Learning (2023) • No Venue
Lin et al.
DIN-SQL: Decomposed In-context Learning Of Text-to-sql With Self-correction (2023) • Arxiv • 53 citations
Mohammadreza Pourreza, Davood Rafiei
Extracting Accurate Materials Data From Research Papers With Conversational Language Models And Prompt Engineering (2023) • Nature Communications • 212 citations
Maciej P. Polak, Dane Morgan
A Prompt Log Analysis Of Text-to-image Generation Systems (2023) • Proceedings of the ACM Web Conference 2023 • 40 citations
Xie et al.
Iterative Prompt Learning For Unsupervised Backlit Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 129 citations
Liang et al.
Exploring Format Consistency For Instruction Tuning (2023) • Computers, Environment and Urban Systems • 43 citations
Liang et al.
GPT Detectors Are Biased Against Non-native English Writers (2023) • Patterns • 306 citations
Liang et al.
Segment Everything Everywhere All At Once (2023) • Arxiv • 151 citations
Zou et al.
Mindmap: Knowledge Graph Prompting Sparks Graph Of Thoughts In Large Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Yilin Wen, Zifeng Wang, Jimeng Sun
Automatic Prompt Optimization With "gradient Descent" And Beam Search (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 100 citations
Pryzant et al.
Performance Of Chatgpt On The US Fundamentals Of Engineering Exam: Comprehensive Assessment Of Proficiency And Potential Implications For Professional Environmental Engineering Practice (2023) • Computers and Education: Artificial Intelligence • 91 citations
Vinay Pursnani, Yusuf Sermet, Ibrahim Demir
CAMEL: Communicative Agents For "mind" Exploration Of Large Language Model Society (2023) • Arxiv • 87 citations
Li et al.
Efficient Domain Adaptation For Speech Foundation Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Li et al.
"HOT" Chatgpt: The Promise Of Chatgpt In Detecting And Discriminating Hateful, Offensive, And Toxic Comments On Social Media (2023) • ACM Transactions on the Web • 63 citations
Li et al.
Teach Llms To Personalize -- An Approach Inspired By Writing Education (2023) • No Venue
Li et al.
Nuances Are The Key: Unlocking Chatgpt To Find Failure-inducing Tests With Differential Prompting (2023) • 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 40 citations
Li et al.
Mental-llm: Leveraging Large Language Models For Mental Health Prediction Via Online Text Data (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 119 citations
Xu et al.
Augmenting Low-resource Text Classification With Graph-grounded Pre-training And Prompting (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Zhihao Wen, Yuan Fang
SMART-LLM: Smart Multi-agent Robot Task Planning Using Large Language Models (2023) • 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 79 citations
Shyam Sundar Kannan, Vishnunandan L. N. Venkatesh, Byung-Cheol Min
How Novices Use Llm-based Code Generators To Solve CS1 Coding Tasks In A Self-paced Learning Environment (2023) • Koli Calling '23: 23rd Koli Calling International Conference on Computing Education Research • 83 citations
Kazemitabaar et al.
Self-regulating Prompts: Foundational Model Adaptation Without Forgetting (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 113 citations
Khattak et al.
Dspy: Compiling Declarative Language Model Calls Into Self-improving Pipelines (2023) • No Venue
Khattab et al.
Can Large Language Models Replace Humans In The Systematic Review Process? Evaluating Gpt-4's Efficacy In Screening And Extracting Data From Peer-reviewed And Grey Literature In Multiple Languages (2023) • Research Synthesis Methods • 130 citations
Khraisha et al.
Evallm: Interactive Evaluation Of Large Language Model Prompts On User-defined Criteria (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 43 citations
Kim et al.
Prompt, Generate, Then Cache: Cascade Of Foundation Models Makes Strong Few-shot Learners (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 126 citations
Zhang et al.
The Potential And Pitfalls Of Using A Large Language Model Such As Chatgpt Or GPT-4 As A Clinical Assistant (2023) • Journal of the American Medical Informatics Association • 45 citations
Zhang et al.
Chatgpt: Jack Of All Trades, Master Of None (2023) • Information Fusion • 468 citations
Kocoń et al.
Personalize Segment Anything Model With One Shot (2023) • Arxiv • 65 citations
Zhang et al.
Motiongpt: Finetuned Llms Are General-purpose Motion Generators (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 50 citations
Zhang et al.
Chatgpt: Beginning Of An End Of Manual Linguistic Data Annotation? Use Case Of Automatic Genre Identification (2023) • Arxiv • 64 citations
Taja Kuzman, Igor Mozetič, Nikola Ljubešić
Don't Trust Chatgpt When Your Question Is Not In English: A Study Of Multilingual Abilities And Types Of Llms (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 42 citations
Zhang et al.
Redefining Qualitative Analysis In The AI Era: Utilizing Chatgpt For Efficient Thematic Analysis (2023) • Arxiv • 58 citations
Zhang et al.
Measuring Faithfulness In Chain-of-thought Reasoning (2023) • No Venue
Lanham et al.
"do Anything Now": Characterizing And Evaluating In-the-wild Jailbreak Prompts On Large Language Models (2023) • CCS '24: ACM SIGSAC Conference on Computer and Communications Security • 79 citations
Shen et al.
Applying Large Language Models And Chain-of-thought For Automatic Scoring (2023) • Computers and Education: Artificial Intelligence • 77 citations
Lee et al.
Large Language Models Are Effective Text Rankers With Pairwise Ranking Prompting (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 88 citations
Qin et al.
Reasoning With Language Model Prompting: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 71 citations
Qiao et al.
Least-to-most Prompting Enables Complex Reasoning In Large Language Models (2022) • Arxiv • 317 citations
Zhou et al.
Measuring And Narrowing The Compositionality Gap In Language Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 101 citations
Press et al.
Black-box Tuning For Language-model-as-a-service (2022) • Arxiv • 56 citations
Sun et al.
Large Language Models Can Self-improve (2022) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 96 citations
Huang et al.
Large Language Models Are Better Reasoners With Self-verification (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 49 citations
Weng et al.
A Prompting-based Approach For Adversarial Example Generation And Robustness Enhancement (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 108 citations
Yang et al.
Re3: Generating Longer Stories With Recursive Reprompting And Revision (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 68 citations
Yang et al.
Chatgpt Makes Medicine Easy To Swallow: An Exploratory Case Study On Simplified Radiology Reports (2022) • Arxiv • 104 citations
Jeblick et al.
Visual Prompt Tuning (2022) • Lecture Notes in Computer Science • 1133 citations
Jia et al.
VIMA: General Robot Manipulation With Multimodal Prompts (2022) • Arxiv • 65 citations
Jiang et al.
Promptbert: Improving BERT Sentence Embeddings With Prompts (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 137 citations
Jiang et al.
Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jung et al.
Decomposed Prompting: A Modular Approach For Solving Complex Tasks (2022) • Arxiv • 91 citations
Khot et al.
Code As Policies: Language Model Programs For Embodied Control (2022) • 2023 IEEE International Conference on Robotics and Automation (ICRA) • 390 citations
Liang et al.
Conditional Prompt Learning For Vision-language Models (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 1126 citations
Zhou et al.
CM3: A Causal Masked Multimodal Model Of The Internet (2022) • Arxiv • 40 citations
Aghajanyan et al.
Zero-shot Temporal Action Detection Via Vision-language Prompting (2022) • Lecture Notes in Computer Science • 43 citations
Nag et al.
ATTEMPT: Parameter-efficient Multi-task Tuning Via Attentional Mixtures Of Soft Prompts (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 49 citations
Asai et al.
Test-time Prompt Tuning For Zero-shot Generalization In Vision-language Models (2022) • Arxiv • 112 citations
Shu et al.
SGPT: GPT Sentence Embeddings For Semantic Search (2022) • Arxiv • 56 citations
Niklas Muennighoff
Learning To Compose Soft Prompts For Compositional Zero-shot Learning (2022) • Arxiv • 41 citations
Nihal V. Nayak, Peilin Yu, Stephen H. Bach
Promptsource: An Integrated Development Environment And Repository For Natural Language Prompts (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 148 citations
Bach et al.
No More Fine-tuning? An Experimental Evaluation Of Prompt Tuning In Code Intelligence (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 124 citations
Wang et al.
Exploring Visual Prompts For Adapting Large-scale Models (2022) • Arxiv • 106 citations
Bahng et al.
Dualprompt: Complementary Prompting For Rehearsal-free Continual Learning (2022) • Lecture Notes in Computer Science • 288 citations
Wang et al.
Text2live: Text-driven Layered Image And Video Editing (2022) • Lecture Notes in Computer Science • 176 citations
Bar-Tal et al.
Prompting GPT-3 To Be Reliable (2022) • Arxiv • 68 citations
Si et al.
Prompting Is Programming: A Query Language For Large Language Models (2022) • Proceedings of the ACM on Programming Languages • 64 citations
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
Socratic Models: Composing Zero-shot Multimodal Reasoning With Language (2022) • Arxiv • 168 citations
Zeng et al.
GPT Takes The Bar Exam (2022) • SSRN Electronic Journal • 107 citations
Michael Bommarito, Daniel Martin Katz
LASP: Text-to-text Optimization For Language-aware Soft Prompting Of Vision & Language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Adrian Bulat, Georgios Tzimiropoulos
Expanding Language-image Pretrained Models For General Video Recognition (2022) • Lecture Notes in Computer Science • 221 citations
Ni et al.
Star: Bootstrapping Reasoning With Reasoning (2022) • Arxiv • 113 citations
Zelikman et al.
Unified Vision And Language Prompt Learning (2022) • Arxiv • 54 citations
Zang et al.
Large Language Models Are Few(1)-shot Table Reasoners (2022) • Findings of the Association for Computational Linguistics: EACL 2023 • 41 citations
Wenhu Chen
Program Of Thoughts Prompting: Disentangling Computation From Reasoning For Numerical Reasoning Tasks (2022) • Arxiv • 110 citations
Chen et al.
A Unified Sequence Interface For Vision Tasks (2022) • Arxiv • 49 citations
Chen et al.
Promptchainer: Chaining Large Language Model Prompts Through Visual Programming (2022) • CHI '22: CHI Conference on Human Factors in Computing Systems • 124 citations
Wu et al.
Promda: Prompt-based Data Augmentation For Low-resource NLU Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Wang et al.
CIRCLE: Continual Repair Across Programming Languages (2022) • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis • 48 citations
Yuan et al.
A Unified Multi-task Learning Framework For Multi-goal Conversational Recommender Systems (2022) • ACM Transactions on Information Systems • 56 citations
Deng et al.
Biobart: Pretraining And Evaluation Of A Biomedical Generative Language Model (2022) • Proceedings of the 21st Workshop on Biomedical Language Processing • 101 citations
Yuan et al.
Progprompt: Generating Situated Robot Task Plans Using Large Language Models (2022) • Arxiv • 42 citations
Singh et al.
Can Foundation Models Wrangle Your Data? (2022) • Proceedings of the VLDB Endowment • 107 citations
Narayan et al.
Large Language Models Encode Clinical Knowledge (2022) • Nature • 1963 citations
Singhal et al.
VQGAN-CLIP: Open Domain Image Generation And Editing With Natural Language Guidance (2022) • Lecture Notes in Computer Science • 240 citations
Crowson et al.
Teaching Small Language Models To Reason (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 45 citations
Magister et al.
Promptagator: Few-shot Dense Retrieval From 8 Examples (2022) • Arxiv • 46 citations
Dai et al.
How To Prompt? Opportunities And Challenges Of Zero- And Few-shot Learning For Human-ai Interaction In Creative Applications Of Generative Models (2022) • Arxiv • 85 citations
Dang et al.
Rlprompt: Optimizing Discrete Text Prompts With Reinforcement Learning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 117 citations
Deng et al.
Understanding And Mitigating Overfitting In Prompt Tuning For Vision-language Models (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 48 citations
Ma et al.
Learning To Prompt For Open-vocabulary Object Detection With Vision-language Model (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 253 citations
Du et al.
Prompt For Extraction? PAIE: Prompting Argument Interaction For Event Argument Extraction (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 112 citations
Ma et al.
A Taxonomy Of Prompt Modifiers For Text-to-image Generation (2022) • Behaviour & Information Technology • 128 citations
Jonas Oppenlaender
Unifying Vision, Text, And Layout For Universal Document Processing (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Tang et al.
Ontology-enhanced Prompt-tuning For Few-shot Learning (2022) • Proceedings of the ACM Web Conference 2022 • 57 citations
Ye et al.
PØDA: Prompt-driven Zero-shot Domain Adaptation (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Fahes et al.
Challenging Big-bench Tasks And Whether Chain-of-thought Can Solve Them (2022) • Arxiv • 40 citations
Suzgun et al.
Promptdet: Towards Open-vocabulary Detection Using Uncurated Images (2022) • Lecture Notes in Computer Science • 119 citations
Feng et al.
Complexity-based Prompting For Multi-step Reasoning (2022) • Arxiv • 72 citations
Fu et al.
Make-a-scene: Scene-based Text-to-image Generation With Human Priors (2022) • Lecture Notes in Computer Science • 265 citations
Gafni et al.
The Unreliability Of Explanations In Few-shot Prompting For Textual Reasoning (2022) • Arxiv • 52 citations
Xi Ye, Greg Durrett
Social Simulacra: Creating Populated Prototypes For Social Computing Systems (2022) • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology • 145 citations
Park et al.
An Information-theoretic Approach To Prompt Engineering Without Ground Truth Labels (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 95 citations
Sorensen et al.
Can Large Language Models Reason About Medical Questions? (2022) • Patterns • 138 citations
Liévin et al.
Recommendation As Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) (2022) • RecSys '22: Sixteenth ACM Conference on Recommender Systems • 334 citations
Geng et al.
Dynamic Prompt Learning Via Policy Gradient For Semi-structured Mathematical Reasoning (2022) • Arxiv • 41 citations
Lu et al.
Prompt Distribution Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 188 citations
Lu et al.
Demystifying Prompts In Language Models Via Perplexity Estimation (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 47 citations
Gonen et al.
News Summarization And Evaluation In The Era Of GPT-3 (2022) • Arxiv • 180 citations
Tanya Goyal, Junyi Jessy Li, Greg Durrett
Interactive And Visual Prompt Engineering For Ad-hoc Task Adaptation With Large Language Models (2022) • IEEE Transactions on Visualization and Computer Graphics • 112 citations
Strobelt et al.
Fengshenbang 1.0: Being The Foundation Of Chinese Cognitive Intelligence (2022) • Arxiv • 44 citations
Zhang et al.
Can Machines Help Us Answering Question 16 In Datasheets, And In Turn Reflecting On Inappropriate Content? (2022) • FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency • 41 citations
Patrick Schramowski, Christopher Tauchmann, Kristian Kersting
Red Teaming Language Models With Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Perez et al.
UL2: Unifying Language Learning Paradigms (2022) • Arxiv • 97 citations
Tay et al.
Prompt-to-prompt Image Editing With Cross Attention Control (2022) • Arxiv • 360 citations
Hertz et al.
Large Language Models Are Reasoning Teachers (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Namgyu Ho, Laura Schmid, Se-Young Yun
Unnatural Instructions: Tuning Language Models With (almost) No Human Labor (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Honovich et al.
Pre-trained Language Model For Web-scale Retrieval In Baidu Search (2021) • ACM Computing Surveys • 2351 citations
Liu et al.
A Good Prompt Is Worth Millions Of Parameters: Low-resource Prompt-based Learning For Vision-language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Jin et al.
Generated Knowledge Prompting For Commonsense Reasoning (2021) • Arxiv • 44 citations
Liu et al.
Knowledgeable Prompt-tuning: Incorporating Knowledge Into Prompt Verbalizer For Text Classification (2021) • Arxiv • 70 citations
Hu et al.
Styleclip: Text-driven Manipulation Of Stylegan Imagery (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 87 citations
Patashnik et al.
What Makes Good In-context Examples For GPT-3? (2021) • Arxiv • 154 citations
Liu et al.
Prompt Programming For Large Language Models: Beyond The Few-shot Paradigm (2021) • CHI '21: CHI Conference on Human Factors in Computing Systems • 517 citations
Laria Reynolds, Kyle McDonell
An Empirical Study Of GPT-3 For Few-shot Knowledge-based VQA (2021) • Arxiv • 46 citations
Yang et al.
Learning How To Ask: Querying Lms With Mixtures Of Soft Prompts (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 309 citations
Guanghui Qin, Jason Eisner
GPT Understands, Too (2021) • AI Open • 422 citations
Liu et al.
Prefix-tuning: Optimizing Continuous Prompts For Generation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 1929 citations
Xiang Lisa Li, Percy Liang
Sentiprompt: Sentiment Knowledge Enhanced Prompt-tuning For Aspect-based Sentiment Analysis (2021) • Arxiv • 57 citations
Li et al.
The Power Of Scale For Parameter-efficient Prompt Tuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 94 citations
Brian Lester, Rami Al-Rfou, Noah Constant
Learning To Prompt For Vision-language Models (2021) • International Journal of Computer Vision • 1953 citations
Zhou et al.
Actionclip: A New Paradigm For Video Action Recognition (2021) • Arxiv • 189 citations
Mengmeng Wang, Jiazheng Xing, Yong Liu
Recent Advances In Natural Language Processing Via Large Pre-trained Language Models: A Survey (2021) • ACM Computing Surveys • 812 citations
Min et al.
Noisy Channel Language Model Prompting For Few-shot Text Classification (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Min et al.
Knowprompt: Knowledge-aware Prompt-tuning With Synergistic Optimization For Relation Extraction (2021) • Proceedings of the ACM Web Conference 2022 • 330 citations
Chen et al.
GLIDE: Towards Photorealistic Image Generation And Editing With Text-guided Diffusion Models (2021) • Arxiv • 995 citations
Nichol et al.
Factual Probing Is [MASK]: Learning Vs. Learning To Recall (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 230 citations
Zexuan Zhong, Dan Friedman, Danqi Chen
Planning With Learned Entity Prompts For Abstractive Summarization (2021) • Transactions of the Association for Computational Linguistics • 92 citations
Narayan et al.
Few-shot Bot: Prompt-based Learning For Dialogue Systems (2021) • Arxiv • 45 citations
Madotto et al.
Generative Pre-trained Transformer For Design Concept Generation: An Exploration (2021) • Proceedings of the Design Society • 63 citations
Qihao Zhu, Jianxi Luo
Prompt-learning For Fine-grained Entity Typing (2021) • Arxiv • 44 citations
Ding et al.
Openprompt: An Open-source Framework For Prompt-learning (2021) • Arxiv • 64 citations
Ding et al.
What Changes Can Large-scale Language Models Bring? Intensive Study On Hyperclova: Billions-scale Korean Generative Pretrained Transformers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Kim et al.
A Neural Network Solves, Explains, And Generates University Math Problems By Program Synthesis And Few-shot Learning At Human Level (2021) • Proceedings of the National Academy of Sciences • 70 citations
Drori et al.
Prompting Visual-language Models For Efficient Video Understanding (2021) • Lecture Notes in Computer Science • 246 citations
Ju et al.
Multitask Prompted Training Enables Zero-shot Task Generalization (2021) • Arxiv • 558 citations
Sanh et al.
Symbolic Knowledge Distillation: From General Language Models To Commonsense Models (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 54 citations
West et al.
CPM-2: Large-scale Cost-effective Pre-trained Language Models (2021) • AI Open • 51 citations
Zhang et al.
Stylegan-nada: Clip-guided Domain Adaptation Of Image Generators (2021) • Arxiv • 65 citations
Gal et al.
Open Aspect Target Sentiment Classification With Natural Language Prompts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 43 citations
Seoh et al.
P-tuning V2: Prompt Tuning Can Be Comparable To Fine-tuning Universally Across Scales And Tasks (2021) • Arxiv • 261 citations
Liu et al.
Fantastically Ordered Prompts And Where To Find Them: Overcoming Few-shot Prompt Order Sensitivity (2021) • Arxiv • 118 citations
Lu et al.
PPT: Pre-trained Prompt Tuning For Few-shot Learning (2021) • Arxiv • 100 citations
Gu et al.
Large Pre-trained Language Models Contain Human-like Biases Of What Is Right And Wrong To Do (2021) • Nature Machine Intelligence • 194 citations
Schramowski et al.
Cutting Down On Prompts And Parameters: Simple Few-shot Learning With Language Models (2021) • Findings of the Association for Computational Linguistics: ACL 2022 • 42 citations
Logan et al.
CPT: Colorful Prompt Tuning For Pre-trained Vision-language Models (2021) • AI Open • 62 citations
Yao et al.
Autoprompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 611 citations
Shin et al.
Ctrlsum: Towards Generic Controllable Text Summarization (2020) • Arxiv • 50 citations
He et al.
Making Pre-trained Language Models Better Few-shot Learners (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 556 citations
Tianyu Gao, Adam Fisch, Danqi Chen
How Can We Know What Language Models Know? (2019) • Transactions of the Association for Computational Linguistics • 543 citations
Jiang et al.

— Q —

Question Answering 746 papers #

Vision-guided Chunking Is All You Need: Enhancing RAG With Multimodal Document Understanding (2025) • No Venue
Tripathi et al.
Qwenlong-l1: Towards Long-context Large Reasoning Models With Reinforcement Learning (2025) • No Venue
Wan et al.
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation And Methodology (2025) • No Venue
Wang et al.
Chain-of-retrieval Augmented Generation (2025) • No Venue
Wang et al.
Cmphysbench: A Benchmark For Evaluating Large Language Models In Condensed Matter Physics (2025) • No Venue
Wang et al.
Fostering Video Reasoning Via Next-event Prediction (2025) • No Venue
Wang et al.
Grasp Any Region: Towards Precise, Contextual Pixel Understanding For Multimodal Llms (2025) • No Venue
Wang et al.
Scientists' First Exam: Probing Cognitive Abilities Of MLLM Via Perception, Understanding, And Reasoning (2025) • No Venue
Zhou et al.
How Far Are We From Genuinely Useful Deep Research Agents? (2025) • No Venue
Zhang et al.
MARS: A Multi-agent Framework Incorporating Socratic Guidance For Automated Prompt Optimization (2025) • No Venue
Zhang et al.
Othink-r1: Intrinsic Fast/slow Thinking Mode Switching For Over-reasoning Mitigation (2025) • No Venue
Zhang et al.
Pixel-sail: Single Transformer For Pixel-grounded Understanding (2025) • No Venue
Zhang et al.
What, How, Where, And How Well? A Survey On Test-time Scaling In Large Language Models (2025) • No Venue
Zhang et al.
Repurposing Synthetic Data For Fine-grained Search Agent Supervision (2025) • No Venue
Zhao et al.
Vrbench: A Benchmark For Multi-step Reasoning In Long Narrative Videos (2025) • No Venue
Yu et al.
Mme-reasoning: A Comprehensive Benchmark For Logical Reasoning In Mllms (2025) • No Venue
Yuan et al.
Echox: Towards Mitigating Acoustic-semantic Gap Via Echo Training For Speech-to-speech Llms (2025) • No Venue
Zhang et al.
Deepanalyze: Agentic Large Language Models For Autonomous Data Science (2025) • No Venue
Zhang et al.
Browseragent: Building Web Agents With Human-inspired Web Browsing Actions (2025) • No Venue
Zhang et al.
Codecriticbench: A Holistic Code Critique Benchmark For Large Language Models (2025) • No Venue
Zhang et al.
Loongrl: Reinforcement Learning For Advanced Reasoning Over Long Contexts (2025) • No Venue
Wang et al.
Part-x-mllm: Part-aware 3D Multimodal Large Language Model (2025) • No Venue
Wang et al.
Mr-align: Meta-reasoning Informed Factuality Alignment For Large Reasoning Models (2025) • No Venue
Wang et al.
OTC: Optimal Tool Calls Via Reinforcement Learning (2025) • No Venue
Wang et al.
Scoreflow: Mastering LLM Agent Workflows Via Score-based Preference Optimization (2025) • No Venue
Wang et al.
Vision-zero: Scalable VLM Self-improvement Via Strategic Gamified Self-play (2025) • No Venue
Wang et al.
Vidorag: Visual Document Retrieval-augmented Generation Via Dynamic Iterative Reasoning Agents (2025) • No Venue
Wang et al.
Truthrl: Incentivizing Truthful Llms Via Reinforcement Learning (2025) • No Venue
Wei et al.
The Bitter Lesson Learned From 2,000+ Multilingual Benchmarks (2025) • No Venue
Wu et al.
Webwalker: Benchmarking Llms In Web Traversal (2025) • No Venue
Wu et al.
Scalecap: Inference-time Scalable Image Captioning Via Dual-modality Debiasing (2025) • No Venue
Xing et al.
Noderag: Structuring Graph-based RAG With Heterogeneous Nodes (2025) • No Venue
Xu et al.
Egolife: Towards Egocentric Life Assistant (2025) • No Venue
Yang et al.
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling (2025) • No Venue
Yang et al.
Are Reasoning Models More Prone To Hallucination? (2025) • No Venue
Yao et al.
Universalrag: Retrieval-augmented Generation Over Multiple Corpora With Diverse Modalities And Granularities (2025) • No Venue
Yeo et al.
Worldmm: Dynamic Multimodal Memory Agent For Long Video Reasoning (2025) • No Venue
Yeo et al.
Grokking In The Wild: Data Augmentation For Real-world Multi-hop Reasoning With Transformers (2025) • No Venue
Roman Abramov, Felix Steinbauer, Gjergji Kasneci
Open Deep Search: Democratizing Search With Open-source Reasoning Agents (2025) • No Venue
Alzubi et al.
V-JEPA 2: Self-supervised Video Models Enable Understanding, Prediction And Planning (2025) • No Venue
Assran et al.
Microvqa: A Multimodal Reasoning Benchmark For Microscopy-based Scientific Research (2025) • No Venue
Burgess et al.
Halumem: Evaluating Hallucinations In Memory Systems Of Agents (2025) • No Venue
Chen et al.
Videovista-culturallingo: 360° Horizons-bridging Cultures, Languages, And Domains In Video Comprehension (2025) • No Venue
Chen et al.
Selfcite: Self-supervised Alignment For Context Attribution In Large Language Models (2025) • No Venue
Chuang et al.
SSRL: Self-search Reinforcement Learning (2025) • No Venue
Fan et al.
Llama-omni2: Llm-based Real-time Spoken Chatbot With Autoregressive Streaming Speech Synthesis (2025) • No Venue
Fang et al.
Onethinker: All-in-one Reasoning Model For Image And Video (2025) • No Venue
Feng et al.
Multiple Choice Questions: Reasoning Makes Large Language Models (llms) More Self-confident Even When They Are Wrong (2025) • No Venue
Fu et al.
Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) • No Venue
Gao et al.
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts (2025) • No Venue
Ge et al.
Inside-out: Hidden Factual Knowledge In Llms (2025) • No Venue
Gekhman et al.
Audio Flamingo 2: An Audio-language Model With Long-audio Understanding And Expert Reasoning Abilities (2025) • No Venue
Ghosh et al.
Openthoughts: Data Recipes For Reasoning Models (2025) • No Venue
Guha et al.
ACADREASON: Exploring The Limits Of Reasoning Models With Academic Research Problems (2025) • No Venue
Gui et al.
Beyond The Last Answer: Your Reasoning Trace Uncovers More Than You Think (2025) • No Venue
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
Mdocagent: A Multi-modal Multi-agent Framework For Document Understanding (2025) • No Venue
Han et al.
Spectrum Projection Score: Aligning Retrieved Summaries With Reader Models In Retrieval-augmented Generation (2025) • No Venue
Hu et al.
Video-mmmu: Evaluating Knowledge Acquisition From Multi-discipline Professional Videos (2025) • No Venue
Hu et al.
O1 Replication Journey -- Part 3: Inference-time Scaling For Medical Reasoning (2025) • No Venue
Huang et al.
Multi-granular Spatio-temporal Token Merging For Training-free Acceleration Of Video Llms (2025) • No Venue
Hyun et al.
When Thoughts Meet Facts: Reusable Reasoning For Long-context Lms (2025) • No Venue
Jeong et al.
CSVQA: A Chinese Multimodal Benchmark For Evaluating STEM Reasoning Capabilities Of Vlms (2025) • No Venue
Jian et al.
Search-r1: Training Llms To Reason And Leverage Search Engines With Reinforcement Learning (2025) • No Venue
Jin et al.
Is That Your Final Answer? Test-time Scaling Improves Selective Question Answering (2025) • No Venue
William Jurayj, Jeffrey Cheng, Benjamin van Durme
Expect The Unexpected: Failsafe Long Context QA For Finance (2025) • No Venue
Kamble et al.
LM2: Large Memory Models (2025) • No Venue
Kang et al.
Reasoning With Sampling: Your Base Model Is Smarter Than You Think (2025) • No Venue
Aayush Karan, Yilun Du
Toward Evaluative Thinking: Meta Policy Optimization With Evolving Reward Models (2025) • No Venue
Kim et al.
Rearag: Knowledge-guided Reasoning Enhances Factuality Of Large Reasoning Models With Iterative Retrieval Augmented Generation (2025) • No Venue
Lee et al.
Diffusion Language Models Know The Answer Before Decoding (2025) • No Venue
Li et al.
Omnivideobench: Towards Audio-visual Understanding Evaluation For Omni Mllms (2025) • No Venue
Li et al.
ROVER: Benchmarking Reciprocal Cross-modal Reasoning For Omnimodal Generation (2025) • No Venue
Liang et al.
Metaladder: Ascending Mathematical Solution Quality Via Analogical-problem Reasoning Transfer (2025) • No Venue
Lin et al.
Towards Understanding Camera Motions In Any Video (2025) • No Venue
Lin et al.
Deciphering Trajectory-aided LLM Reasoning: An Optimization Perspective (2025) • No Venue
Liu et al.
FUSION: Fully Integration Of Vision-language Representations For Deep Cross-modal Understanding (2025) • No Venue
Liu et al.
Taking Notes Brings Focus? Towards Multi-turn Multimodal Dialogue Learning (2025) • No Venue
Liu et al.
Seeing, Listening, Remembering, And Reasoning: A Multimodal Agent With Long-term Memory (2025) • No Venue
Long et al.
Bizfinbench: A Business-driven Real-world Financial Benchmark For Evaluating Llms (2025) • No Venue
Lu et al.
Learning From Peers In Reasoning Models (2025) • No Venue
Luo et al.
Chartqapro: A More Diverse And Challenging Benchmark For Chart Question Answering (2025) • No Venue
Masry et al.
Easy Dataset: A Unified And Extensible Framework For Synthesizing LLM Fine-tuning Data From Unstructured Documents (2025) • No Venue
Miao et al.
S1: Simple Test-time Scaling (2025) • No Venue
Muennighoff et al.
Semviqa: A Semantic Question Answering System For Vietnamese Information Fact-checking (2025) • No Venue
Nguyen et al.
Medvlm-r1: Incentivizing Medical Reasoning Capability Of Vision-language Models (vlms) Via Reinforcement Learning (2025) • No Venue
Pan et al.
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information (2025) • No Venue
Park et al.
Multifinben: A Multilingual, Multimodal, And Difficulty-aware Benchmark For Financial LLM Evaluation (2025) • No Venue
Peng et al.
Plutus: Benchmarking Large Language Models In Low-resource Greek Finance (2025) • No Venue
Peng et al.
SWE-QA: Can Language Models Answer Repository-level Code Questions? (2025) • No Venue
Peng et al.
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification To Improve Trustworthy QA (2025) • No Venue
Pletenev et al.
How Much Knowledge Can You Pack Into A Lora Adapter Without Harming LLM? (2025) • No Venue
Pletenev et al.
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction (2025) • No Venue
Qian et al.
Videomathqa: Benchmarking Mathematical Reasoning Via Multimodal Understanding In Videos (2025) • No Venue
Rasheed et al.
When Models Lie, We Learn: Multilingual Span-level Hallucination Detection With Psiloqa (2025) • No Venue
Rykov et al.
Dota-rag: Dynamic Of Thought Aggregation RAG (2025) • No Venue
Ruangtanusak et al.
Aligning Text, Images, And 3D Structure Token-by-token (2025) • No Venue
Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari
Phyx: Does Your Model Have The "wits" For Physical Reasoning? (2025) • No Venue
Shen et al.
Longcodezip: Compress Long Context For Code Language Models (2025) • No Venue
Shi et al.
Vf-eval: Evaluating Multimodal Llms For Generating Feedback On AIGC Videos (2025) • No Venue
Song et al.
Reasonmed: A 370K Multi-agent Generated Dataset For Advancing Medical Reasoning (2025) • No Venue
Sun et al.
HANRAG: Heuristic Accurate Noise-resistant Retrieval-augmented Generation For Multi-hop Question Answering (2025) • No Venue
Sun et al.
Llava-scissor: Token Compression With Semantic Connected Components For Video Llms (2025) • No Venue
Sun et al.
When An LLM Is Apprehensive About Its Answers -- And When Its Uncertainty Is Justified (2025) • No Venue
Sychev et al.
Lego-puzzles: How Good Are Mllms At Multi-step Spatial Reasoning? (2025) • No Venue
Tang et al.
Lingshu: A Generalist Foundation Model For Unified Multimodal Medical Understanding And Reasoning (2025) • No Venue
Team et al.
Supergpqa: Scaling LLM Evaluation Across 285 Graduate Disciplines (2025) • No Venue
Team et al.
Ego-r1: Chain-of-tool-thought For Ultra-long Egocentric Video Reasoning (2025) • No Venue
Tian et al.
From Rags To Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information For Factual Queries (2024) • No Venue
Wadhwa et al.
Make Your LLM Fully Utilize The Context (2024) • No Venue
An et al.
Openscholar: Synthesizing Scientific Literature With Retrieval-augmented Lms (2024) • No Venue
Asai et al.
Minigpt4-video: Advancing Multimodal Llms For Video Understanding With Interleaved Visual-textual Tokens (2024) • No Venue
Ataallah et al.
Screenai: A Vision-language Model For UI And Infographics Understanding (2024) • No Venue
Baechler et al.
Longbench V2: Towards Deeper Understanding And Reasoning On Realistic Long-context Multitasks (2024) • No Venue
Bai et al.
Flowmind: Automatic Workflow Generation With Llms (2024) • No Venue
Zeng et al.
Quiet-star: Language Models Can Teach Themselves To Think Before Speaking (2024) • No Venue
Zelikman et al.
INDUS: Effective And Efficient Language Models For Scientific Applications (2024) • No Venue
Bhattacharjee et al.
Text2sql Is Not Enough: Unifying AI And Databases With TAG (2024) • No Venue
Biswal et al.
Biomedlm: A 2.7B Parameter Language Model Trained On Biomedical Text (2024) • No Venue
Bolton et al.
Language Models As Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning In Language Models (2024) • No Venue
Chae et al.
Mindsearch: Mimicking Human Minds Elicits Deep AI Searcher (2024) • No Venue
Chen et al.
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (2024) • No Venue
Chen et al.
Hallucination Detection: Robustly Discerning Reliable Answers In Large Language Models (2024) • CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management • 59 citations
Chen et al.
Spatialvlm: Endowing Vision-language Models With Spatial Reasoning Capabilities (2024) • No Venue
Chen et al.
Textgrad: Automatic "differentiation" Via Text (2024) • No Venue
Yuksekgonul et al.
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded (2024) • No Venue
Zheng et al.
Videgothink: Assessing Egocentric Video Understanding Capabilities For Embodied AI (2024) • No Venue
Cheng et al.
Videollama 2: Advancing Spatial-temporal Modeling And Audio Understanding In Video-llms (2024) • No Venue
Cheng et al.
M3docrag: Multi-modal Retrieval Is What You Need For Multi-page Multi-document Understanding (2024) • No Venue
Cho et al.
Med42-v2: A Suite Of Clinical Llms (2024) • No Venue
Christophe et al.
M-longdoc: A Benchmark For Multimodal Super-long Document Understanding And A Retrieval-aware Tuning Framework (2024) • No Venue
Chia et al.
Large Legal Fictions: Profiling Legal Hallucinations In Large Language Models (2024) • Journal of Legal Analysis • 108 citations
Dahl et al.
A Silver Bullet Or A Compromise For Full Attention? A Comprehensive Study Of Gist Token-based Context Compression (2024) • No Venue
Deng et al.
Mapeval: A Map-based Evaluation Of Geo-spatial Reasoning In Foundation Models (2024) • No Venue
Dihan et al.
Vintern-1b: An Efficient Multimodal Large Language Model For Vietnamese (2024) • No Venue
Doan et al.
Chain-of-table: Evolving Tables In The Reasoning Chain For Table Understanding (2024) • No Venue
Wang et al.
Judging The Judges: Evaluating Alignment And Vulnerabilities In Llms-as-judges (2024) • No Venue
Thakur et al.
Chameleon: Mixed-modal Early-fusion Foundation Models (2024) • No Venue
Chameleon Team
Towards Retrieval Augmented Generation Over Large Video Libraries (2024) • No Venue
Yannis Tevissen, Khalil Guetari, Frédéric Petitpont
Docgraphlm: Documental Graph Language Model For Information Extraction (2024) • No Venue
Wang et al.
3D Question Answering For City Scene Understanding (2024) • No Venue
Sun et al.
Videohallucer: Evaluating Intrinsic And Extrinsic Hallucinations In Large Video-language Models (2024) • No Venue
Wang et al.
Videoagent: Long-form Video Understanding With Large Language Model As Agent (2024) • No Venue
Wang et al.
Searching For Best Practices In Retrieval-augmented Generation (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 44 citations
Wang et al.
Textsquare: Scaling Up Text-centric Visual Instruction Tuning (2024) • No Venue
Tang et al.
Lloco: Learning Long Contexts Offline (2024) • No Venue
Tan et al.
Writing In The Margins: Better Inference Pattern For Long Context Retrieval (2024) • No Venue
Russak et al.
Capabilities Of Gemini Models In Medicine (2024) • No Venue
Saab et al.
MMAU: A Massive Multi-task Audio Understanding And Reasoning Benchmark (2024) • No Venue
Sakshi et al.
RAPTOR: Recursive Abstractive Processing For Tree-organized Retrieval (2024) • No Venue
Sarthi et al.
Hybridrag: Integrating Knowledge Graphs And Vector Retrieval Augmented Generation For Efficient Information Extraction (2024) • Proceedings of the 5th ACM International Conference on AI in Finance • 55 citations
Sarmah et al.
Blended RAG: Improving RAG (retriever-augmented Generation) Accuracy With Semantic Search And Hybrid Query-based Retrievers (2024) • 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) • 47 citations
Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki
Livexiv -- A Multi-modal Live Benchmark Based On Arxiv Papers Content (2024) • No Venue
Shabtay et al.
Learning To Decode Collaboratively With Multiple Language Models (2024) • No Venue
Shen et al.
Lumos: Empowering Multimodal Llms With Scene Text Recognition (2024) • No Venue
Shenoy et al.
Adapting Llms To Hebrew: Unveiling Dictalm 2.0 With Enhanced Vocabulary And Instruction Capabilities (2024) • No Venue
Shmidman et al.
Fine Tuning Vs. Retrieval Augmented Generation For Less Popular Knowledge (2024) • Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 40 citations
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
Gliner Multi-task: Generalist Lightweight Model For Various Information Extraction Tasks (2024) • No Venue
Ihor Stepanov, Mykhailo Shtopko
What Large Language Models Know And What People Think They Know (2024) • Nature Machine Intelligence • 43 citations
Steyvers et al.
Chatqa: Building GPT-4 Level Conversational QA Models (2024) • No Venue
Liu et al.
Tuning Language Models By Proxy (2024) • No Venue
Liu et al.
VPTQ: Extreme Low-bit Vector Post-training Quantization For Large Language Models (2024) • No Venue
Liu et al.
Video Instruction Tuning With Synthetic Data (2024) • No Venue
Zhang et al.
Meta-chunking: Learning Efficient Text Segmentation Via Logical Perception (2024) • No Venue
Zhao et al.
Tablebench: A Comprehensive And Complex Benchmark For Table Question Answering (2024) • No Venue
Wu et al.
Steering Knowledge Selection Behaviours In Llms Via Sae-based Representation Engineering (2024) • No Venue
Zhao et al.
Source2synth: Synthetic Data Generation And Curation Grounded In Real Data Sources (2024) • No Venue
Lupidi et al.
Longvideobench: A Benchmark For Long-context Interleaved Video-language Understanding (2024) • No Venue
Wu et al.
Openmedlm: Prompt Engineering Can Out-perform Fine-tuning In Medical Question-answering With Open-source Large Language Models (2024) • Scientific Reports • 53 citations
Maharjan et al.
Evaluating Very Long-term Conversational Memory Of LLM Agents (2024) • No Venue
Maharana et al.
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling (2024) • No Venue
Maini et al.
Chartgemma: Visual Instruction-tuning For Chart Reasoning In The Wild (2024) • No Venue
Masry et al.
Whiteboard-of-thought: Thinking Step-by-step Across Modalities (2024) • No Venue
Sachit Menon, Richard Zemel, Carl Vondrick
Videoglamm: A Large Multimodal Model For Pixel-level Visual Grounding In Videos (2024) • No Venue
Munasinghe et al.
Realm: Reference Resolution As Language Modeling (2024) • No Venue
Moniz et al.
MALT: Improving Reasoning With Multi-agent LLM Training (2024) • No Venue
Motwani et al.
Grouse: A Benchmark To Evaluate Evaluators In Grounded Question Answering (2024) • No Venue
Muller et al.
Bimedix2: Bio-medical Expert LMM For Diverse Medical Modalities (2024) • No Venue
Mullappilly et al.
Reka Core, Flash, And Edge: A Series Of Powerful Multimodal Language Models (2024) • No Venue
Ormazabal et al.
Worldcuisines: A Massive-scale Benchmark For Multilingual And Multicultural Visual Question Answering On Global Cuisines (2024) • No Venue
Winata et al.
CBR-RAG: Case-based Reasoning For Retrieval Augmented Generation In Llms For Legal Question Answering (2024) • Lecture Notes in Computer Science • 52 citations
Wiratunga et al.
Large Language Model Confidence Estimation Via Black-box Access (2024) • No Venue
Pedapati et al.
A Toolbox For Surfacing Health Equity Harms And Biases In Large Language Models (2024) • Nature Medicine • 46 citations
Pfohl et al.
Mutual Reasoning Makes Smaller Llms Stronger Problem-solvers (2024) • No Venue
Qi et al.
Memorag: Moving Towards Next-gen RAG Via Memory-inspired Knowledge Discovery (2024) • No Venue
Qian et al.
Towards Building Multilingual Language Model For Medicine (2024) • Nature Communications • 53 citations
Qiu et al.
Hellobench: Evaluating Long Text Generation Capabilities Of Large Language Models (2024) • No Venue
Que et al.
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning With Checklist (2024) • No Venue
Zhou et al.
Small Language Model Meets With Reinforced Vision Vocabulary (2024) • No Venue
Wei et al.
Long-form Factuality In Large Language Models (2024) • No Venue
Wei et al.
From Local To Global: A Graph RAG Approach To Query-focused Summarization (2024) • Arxiv • 90 citations
Edge et al.
Mmbench-video: A Long-form Multi-shot Benchmark For Holistic Video Understanding (2024) • No Venue
Fang et al.
Voco-llama: Towards Vision Compression With Large Language Models (2024) • No Venue
Ye et al.
Lazyllm: Dynamic Token Pruning For Efficient Long Context LLM Inference (2024) • No Venue
Fu et al.
Mm-ego: Towards Building Egocentric Multimodal Llms (2024) • No Venue
Ye et al.
Differential Transformer (2024) • No Venue
Ye et al.
Kvasir-vqa: A Text-image Pair GI Tract Dataset (2024) • No Venue
Gautam et al.
Gemini 1.5: Unlocking Multimodal Understanding Across Millions Of Tokens Of Context (2024) • Arxiv • 253 citations
Team et al.
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities (2024) • No Venue
Ghosh et al.
Mulberry: Empowering MLLM With O1-like Reasoning And Reflection Via Collective Monte Carlo Tree Search (2024) • No Venue
Yao et al.
Omnifusion Technical Report (2024) • No Venue
Goncharova et al.
The Unreasonable Ineffectiveness Of The Deeper Layers (2024) • No Venue
Gromov et al.
Vision-language Models For Medical Report Generation And Visual Question Answering: A Review (2024) • Frontiers in Artificial Intelligence • 86 citations
Iryna Hartsock, Ghulam Rasool
Distill Visual Chart Reasoning Ability From Llms To Mllms (2024) • No Venue
He et al.
Chinese Simpleqa: A Chinese Factuality Evaluation For Large Language Models (2024) • No Venue
He et al.
MA-LMM: Memory-augmented Large Multimodal Model For Long-term Video Understanding (2024) • No Venue
He et al.
Seakr: Self-aware Knowledge Retrieval For Adaptive Retrieval Augmented Generation (2024) • No Venue
Yao et al.
Distilling An End-to-end Voice Assistant Without Instruction Training Data (2024) • No Venue
Held et al.
Thinking In Space: How Multimodal Large Language Models See, Remember, And Recall Spaces (2024) • No Venue
Yang et al.
CRAG -- Comprehensive RAG Benchmark (2024) • No Venue
Yang et al.
Do Large Language Models Latently Perform Multi-hop Reasoning? (2024) • No Venue
Yang et al.
Prompting Large Language Models With Rationale Heuristics For Knowledge-based Visual Question Answering (2024) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 153 citations
Hu et al.
Adaptive-rag: Learning To Adapt Retrieval-augmented Large Language Models Through Question Complexity (2024) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 72 citations
Jeong et al.
Improving Medical Reasoning Through Retrieval And Self-reflection With Retrieval-augmented Large Language Models (2024) • Bioinformatics • 50 citations
Jeong et al.
Instruction-tuned Language Models Are Better Knowledge Learners (2024) • No Venue
Jiang et al.
Pegasus-v1 Technical Report (2024) • No Venue
Jung et al.
MEDIC: Towards A Comprehensive Framework For Evaluating Llms In Clinical Applications (2024) • No Venue
Kanithi et al.
"i'm Not Sure, But...": Examining The Impact Of Large Language Models' Uncertainty Expression On User Reliance And Trust (2024) • FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency • 55 citations
Kim et al.
Husky: A Unified, Open-source Language Agent For Multi-step Reasoning (2024) • No Venue
Kim et al.
THEANINE: Revisiting Memory Management In Long-term Conversations With Timeline-augmented Response Generation (2024) • No Venue
Kim et al.
Babilong: Testing The Limits Of Llms With Long Context Reasoning-in-a-haystack (2024) • No Venue
Kuratov et al.
Biomistral: A Collection Of Open-source Pretrained Large Language Models For Medical Domains (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 108 citations
Labrak et al.
A Human-inspired Reading Agent With Gist Memory Of Very Long Contexts (2024) • No Venue
Lee et al.
HILL: A Hallucination Identifier For Large Language Models (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 40 citations
Leiser et al.
Same Task, More Tokens: The Impact Of Input Length On The Reasoning Performance Of Large Language Models (2024) • No Venue
Mosh Levy, Alon Jacoby, Yoav Goldberg
Retrieval-augmented Egocentric Video Captioning (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 77 citations
Xu et al.
Exploring The Potential Of Large Language Models In Self-adaptive Systems (2024) • 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI) • 47 citations
Li et al.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-language Model And A Comprehensive Multimodal Dataset Towards General Medical AI (2024) • No Venue
Li et al.
Naturalbench: Evaluating Vision-language Models On Natural Adversarial Samples (2024) • No Venue
Li et al.
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window? (2024) • No Venue
Li et al.
Retrollm: Empowering Large Language Models To Retrieve Fine-grained Evidence Within Generation (2024) • No Venue
Li et al.
Seeing And Understanding: Bridging Vision With Chemical Knowledge Via Chemvlm (2024) • No Venue
Li et al.
Longcite: Enabling Llms To Generate Fine-grained Citations In Long-context QA (2024) • No Venue
Zhang et al.
Benchmarking Retrieval-augmented Generation For Medicine (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 119 citations
Xiong et al.
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance (2024) • No Venue
Lin et al.
A Preliminary Study Of O1 In Medicine: Are We Closer To An AI Doctor? (2024) • No Venue
Xie et al.
Show-o: One Single Transformer To Unify Multimodal Understanding And Generation (2024) • No Venue
Xie et al.
System 2 Attention (is Something You Might Need Too) (2023) • No Venue
Jason Weston, Sainbayar Sukhbaatar
Unifying Large Language Models And Knowledge Graphs: A Roadmap (2023) • IEEE Transactions on Knowledge and Data Engineering • 578 citations
Pan et al.
Make LLM A Testing Expert: Bringing Human-like Interaction To Mobile GUI Testing Via Functionality-aware Decisions (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 57 citations
Liu et al.
Lost In The Middle: How Language Models Use Long Contexts (2023) • No Venue
Liu et al.
Generative Multimodal Models Are In-context Learners (2023) • No Venue
Sun et al.
Interpretable Long-form Legal Question Answering With Retrieval-augmented Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
Chameleon: Plug-and-play Compositional Reasoning With Large Language Models (2023) • Arxiv • 89 citations
Lu et al.
Chatgpt As A Factual Inconsistency Evaluator For Text Summarization (2023) • Arxiv • 50 citations
Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Taiyi: A Bilingual Fine-tuned Large Language Model For Diverse Biomedical Tasks (2023) • Journal of the American Medical Informatics Association • 41 citations
Luo et al.
Llms For Knowledge Graph Construction And Reasoning: Recent Capabilities And Future Opportunities (2023) • World Wide Web • 130 citations
Zhu et al.
Paperqa: Retrieval-augmented Generative Agent For Scientific Research (2023) • Arxiv • 48 citations
Lála et al.
Faithful Chain-of-thought Reasoning (2023) • Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) • 63 citations
Lyu et al.
Query Rewriting For Retrieval-augmented Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 114 citations
Ma et al.
3d-vista: Pre-trained Transformer For 3D Vision And Text Alignment (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Zhu et al.
Large Language Models In Healthcare And Medical Domain: A Review (2023) • Informatics • 192 citations
Zabir Al Nazi, Wei Peng
Sources Of Hallucination By Large Language Models On Inference Tasks (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 71 citations
McKenna et al.
GAIA: A Benchmark For General AI Assistants (2023) • No Venue
Mialon et al.
Selfcheck: Using Llms To Zero-shot Check Their Own Step-by-step Reasoning (2023) • No Venue
Ning Miao, Yee Whye Teh, Tom Rainforth
Med-flamingo: A Multimodal Medical Few-shot Learner (2023) • No Venue
Moor et al.
Verbs In Action: Improving Verb Understanding In Video-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Momeni et al.
Embodiedgpt: Vision-language Pre-training Via Embodied Chain Of Thought (2023) • Arxiv • 41 citations
Mu et al.
Pmc-llama: Towards Building Open-source Language Models For Medicine (2023) • Journal of the American Medical Informatics Association • 179 citations
Wu et al.
Towards Expert-level Medical Question Answering With Large Language Models (2023) • Arxiv • 329 citations
Singhal et al.
Self-rag: Learning To Retrieve, Generate, And Critique Through Self-reflection (2023) • No Venue
Asai et al.
Let The Llms Talk: Simulating Human-to-human Conversational QA Via Zero-shot Llm-to-llm Interactions (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Abbasiantaeb et al.
Recommending Root-cause And Mitigation Steps For Cloud Incidents Using Large Language Models (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 69 citations
Ahmed et al.
Rest Meets React: Self-improvement For Multi-step Reasoning LLM Agent (2023) • No Venue
Aksitov et al.
Huatuo: Tuning Llama Model With Chinese Medical Knowledge (2023) • Arxiv • 93 citations
Wang et al.
Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Bai et al.
Tablegpt: Towards Unifying Tables, Nature Language And Commands Into One GPT (2023) • No Venue
Zha et al.
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control (2023) • No Venue
Brohan et al.
Prompt Engineering For Healthcare: Methodologies And Applications (2023) • Arxiv • 115 citations
Wang et al.
Contextual Object Detection With Multimodal Large Language Models (2023) • International Journal of Computer Vision • 48 citations
Zang et al.
Beyond Surface: Probing Llama Across Scales And Layers (2023) • No Venue
Chen et al.
Benchmarking Large Language Models For Biomedical Natural Language Processing Applications And Recommendations (2023) • Nature Communications • 41 citations
Chen et al.
Minigpt-v2: Large Language Model As A Unified Interface For Vision-language Multi-task Learning (2023) • No Venue
Chen et al.
How Is Chatgpt's Behavior Changing Over Time? (2023) • No Venue
Lingjiao Chen, Matei Zaharia, James Zou
Driving With Llms: Fusing Object-level Vector Modality For Explainable Autonomous Driving (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 110 citations
Chen et al.
LL3DA: Visual Interactive Instruction Tuning For Omni-3d Understanding, Reasoning, And Planning (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Chen et al.
One Adapter For All Programming Languages? Adapter Tuning For Code Search And Summarization (2023) • Arxiv • 41 citations
Wang et al.
Freshllms: Refreshing Large Language Models With Search Engine Augmentation (2023) • No Venue
Vu et al.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-verification (2023) • No Venue
Zhou et al.
Adapting Large Language Models Via Reading Comprehension (2023) • No Venue
Daixuan Cheng, Shaohan Huang, Furu Wei
Can Chatgpt Understand Too? A Comparative Study On Chatgpt And Fine-tuned BERT (2023) • Arxiv • 144 citations
Zhong et al.
Open-ended Medical Visual Question Answering Through Prefix Tuning Of Language Models (2023) • Lecture Notes in Computer Science • 43 citations
Sonsbeek et al.
Chatclimate: Grounding Conversational AI In Climate Science (2023) • Communications Earth & Environment • 87 citations
Vaghefi et al.
Self-supervised Learning Of Action Affordances As Interaction Modes (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Wang et al.
Chain-of-verification Reduces Hallucination In Large Language Models (2023) • No Venue
Dhuliawala et al.
Large Language Model For Science: A Study On P Vs. NP (2023) • No Venue
Dong et al.
Lumos: Learning Agents With Unified Data, Modular Design, And Open-source Llms (2023) • No Venue
Yin et al.
Pdftriage: Question Answering Over Long, Structured Documents (2023) • No Venue
Saad-Falcon et al.
Enhancing Retrieval-augmented Large Language Models With Iterative Retrieval-generation Synergy (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 94 citations
Shao et al.
Prompting Large Language Models With Speech Recognition Abilities (2023) • ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 60 citations
Fathullah et al.
Medalign: A Clinician-generated Dataset For Instruction Following With Electronic Medical Records (2023) • No Venue
Fleming et al.
Creating A Large Language Model Of A Philosopher (2023) • Mind & Language • 46 citations
Eric Schwitzgebel, David Schwitzgebel, Anna Strasser
G-llava: Solving Geometric Problem With Multi-modal Large Language Model (2023) • No Venue
Gao et al.
Exploring The Feasibility Of Chatgpt For Event Extraction (2023) • Arxiv • 55 citations
Gao et al.
Assistgpt: A General Multi-modal Assistant That Can Plan, Execute, Inspect, And Learn (2023) • No Venue
Gao et al.
Prompt Cache: Modular Attention Reuse For Low-latency Inference (2023) • No Venue
Gim et al.
Can Chatgpt Replace Traditional KBQA Models? An In-depth Analysis Of The Question Answering Performance Of The GPT LLM Family (2023) • Lecture Notes in Computer Science • 66 citations
Tan et al.
How Far Are Large Language Models From Agents With Theory-of-mind? (2023) • No Venue
Zhou et al.
Medagents: Large Language Models As Collaborators For Zero-shot Medical Reasoning (2023) • Findings of the Association for Computational Linguistics ACL 2024 • 53 citations
Tang et al.
Detecting And Preventing Hallucinations In Large Vision Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 93 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
Mitigating The Learning Bias Towards Repetition By Self-contrastive Training For Open-ended Generation (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Jian Guan, Minlie Huang
Hallusionbench: An Advanced Diagnostic Suite For Entangled Language Hallucination And Visual Illusion In Large Vision-language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Guan et al.
Chatie: Zero-shot Information Extraction Via Chatting With Chatgpt (2023) • Arxiv • 141 citations
Wei et al.
Onellm: One Framework To Align All Modalities With Language (2023) • No Venue
Han et al.
Platform-independent And Curriculum-oriented Intelligent Assistant For Higher Education (2023) • International Journal of Educational Technology in Higher Education • 80 citations
Sajja et al.
Annollm: Making Large Language Models To Be Better Crowdsourced Annotators (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track) • 46 citations
He et al.
Stay On Topic With Classifier-free Guidance (2023) • No Venue
Sanchez et al.
Biomedclip: A Multimodal Biomedical Foundation Model Pretrained From Fifteen Million Scientific Image-text Pairs (2023) • Arxiv • 87 citations
Zhang et al.
Biomedgpt: A Generalist Vision-language Foundation Model For Diverse Biomedical Tasks (2023) • Arxiv • 49 citations
Zhang et al.
From Task Structures To World Models: What Do Llms Know? (2023) • Trends in Cognitive Sciences • 43 citations
Ilker Yildirim, L. A. Paul
Evaluating Large Language Models On A Highly-specialized Topic, Radiation Oncology Physics (2023) • Frontiers in Oncology • 112 citations
Holmes et al.
Cogagent: A Visual Language Model For GUI Agents (2023) • No Venue
Hong et al.
3D-LLM: Injecting The 3D World Into Large Language Models (2023) • No Venue
Hong et al.
Aligning Instruction Tasks Unlocks Large Language Models As Zero-shot Relation Extractors (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 44 citations
Kai Zhang, Bernal Jiménez Gutiérrez, Yu Su
Conceptlab: Creative Generation Using Diffusion Prior Constraints (2023) • No Venue
Richardson et al.
BLIVA: A Simple Multimodal LLM For Better Handling Of Text-rich Visual Questions (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Hu et al.
TIFA: Accurate And Interpretable Text-to-image Faithfulness Evaluation With Question Answering (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Hu et al.
Tech: Text-guided Reconstruction Of Lifelike Clothed Humans (2023) • No Venue
Huang et al.
GPQA: A Graduate-level Google-proof Q&A Benchmark (2023) • No Venue
Rein et al.
Retrieving Supporting Evidence For Llms Generated Answers (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 47 citations
Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke
From Image To Language: A Critical Analysis Of Visual Question Answering (VQA) Approaches, Challenges, And Opportunities (2023) • Information Fusion • 58 citations
Ishmam et al.
Graphologue: Exploring Large Language Model Responses With Interactive Diagrams (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 93 citations
Jiang et al.
Structgpt: A General Framework For Large Language Model To Reason Over Structured Data (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 111 citations
Jiang et al.
Genegpt: Augmenting Large Language Models With Domain Tools For Improved Access To Biomedical Information (2023) • Bioinformatics • 99 citations
Jin et al.
Video-text As Game Players: Hierarchical Banzhaf Interaction For Cross-modal Representation Learning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jin et al.
Polylm: An Open Source Polyglot Large Language Model (2023) • No Venue
Wei et al.
Is Stack Overflow Obsolete? An Empirical Study Of The Characteristics Of Chatgpt Answers To Stack Overflow Questions (2023) • Arxiv • 49 citations
Kabir et al.
Video-llava: Learning United Visual Representation By Alignment Before Projection (2023) • No Venue
Lin et al.
Chatbots Put To The Test In Math And Logic Problems: A Preliminary Comparison And Assessment Of Chatgpt-3.5, Chatgpt-4, And Google Bard (2023) • AI • 58 citations
Vagelis Plevris, George Papazafeiropoulos, Alejandro Jiménez Rios
Mindmap: Knowledge Graph Prompting Sparks Graph Of Thoughts In Large Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Yilin Wen, Zifeng Wang, Jimeng Sun
Masked Vision And Language Pre-training With Unimodal And Multimodal Contrastive Losses For Medical Visual Question Answering (2023) • Lecture Notes in Computer Science • 41 citations
Li et al.
Chatdoctor: A Medical Chat Model Fine-tuned On A Large Language Model Meta-ai (llama) Using Medical Domain Knowledge (2023) • Cureus • 256 citations
Li et al.
A Comparative Study Of Pretrained Language Models For Long Clinical Text (2023) • Journal of the American Medical Informatics Association • 82 citations
Li et al.
Graphix-t5: Mixing Pre-trained Transformers With Graph-aware Layers For Text-to-sql Parsing (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 51 citations
Li et al.
Flexkbqa: A Flexible Llm-powered Framework For Few-shot Knowledge Base Question Answering (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Li et al.
Loftq: Lora-fine-tuning-aware Quantization For Large Language Models (2023) • No Venue
Li et al.
Llava-med: Training A Large Language-and-vision Assistant For Biomedicine In One Day (2023) • Arxiv • 216 citations
Li et al.
Vision-language Models In Remote Sensing: Current Progress And Future Trends (2023) • IEEE Geoscience and Remote Sensing Magazine • 80 citations
Li et al.
Chatgpt Vs. Google: A Comparative Study Of Search Performance And User Experience (2023) • SSRN Electronic Journal • 46 citations
Ruiyun Xu, Yue Feng, Hailiang Chen
Evaluating Embedding Apis For Information Retrieval (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 73 citations
Kamalloo et al.
PMC-VQA: Visual Instruction Tuning For Medical Visual Question Answering (2023) • Arxiv • 57 citations
Zhang et al.
Evaluating GPT-4 And Chatgpt On Japanese Medical Licensing Examinations (2023) • Arxiv • 50 citations
Kasai et al.
Dspy: Compiling Declarative Language Model Calls Into Self-improving Pipelines (2023) • No Venue
Khattab et al.
Chatgpt: Jack Of All Trades, Master Of None (2023) • Information Fusion • 468 citations
Kocoń et al.
Geochat: Grounded Large Vision-language Model For Remote Sensing (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Kuckreja et al.
A Systematic Study And Comprehensive Evaluation Of Chatgpt On Benchmark Datasets (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 69 citations
Laskar et al.
In Chatgpt We Trust? Measuring And Characterizing The Reliability Of Chatgpt (2023) • Arxiv • 71 citations
Shen et al.
Opportunities And Challenges For Chatgpt And Large Language Models In Biomedicine And Health (2023) • Briefings in Bioinformatics • 233 citations
Tian et al.
Mplug: Effective And Efficient Vision-language Learning By Cross-modal Skip-connections (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Li et al.
Invariant Grounding For Video Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 97 citations
Li et al.
Measuring And Narrowing The Compositionality Gap In Language Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 101 citations
Press et al.
Teaching Models To Express Their Uncertainty In Words (2022) • Arxiv • 53 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Improving Passage Retrieval With Zero-shot Question Generation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 57 citations
Sachan et al.
Layoutlmv3: Pre-training For Document AI With Unified Text And Image Masking (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 379 citations
Huang et al.
Zero-shot Video Question Answering Via Frozen Bidirectional Language Models (2022) • Arxiv • 64 citations
Yang et al.
Vision-language Pre-training With Triple Contrastive Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Yang et al.
Survey Of Hallucination In Natural Language Generation (2022) • ACM Computing Surveys • 2334 citations
Ji et al.
Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 64 citations
Jung et al.
Fantastic Questions And Where To Find Them: Fairytaleqa -- An Authentic Dataset For Narrative Comprehension (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Xu et al.
Decomposed Prompting: A Modular Approach For Solving Complex Tasks (2022) • Arxiv • 91 citations
Khot et al.
Multihiertt: Numerical Reasoning Over Multi Hierarchical Tabular And Textual Data (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Zhao et al.
Omnivl: One Foundation Model For Image-language And Video-language Tasks (2022) • Arxiv • 68 citations
Wang et al.
Image As A Foreign Language: Beit Pretraining For All Vision And Vision-language Tasks (2022) • Arxiv • 148 citations
Wang et al.
Lila: A Unified Benchmark For Mathematical Reasoning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 70 citations
Mishra et al.
Prompting Is Programming: A Query Language For Large Language Models (2022) • Proceedings of the ACM on Programming Languages • 64 citations
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
Autoregressive Search Engines: Generating Substrings As Document Identifiers (2022) • Arxiv • 66 citations
Bevilacqua et al.
Revisiting The "video" In Video-language Understanding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 106 citations
Buch et al.
Star: Bootstrapping Reasoning With Reasoning (2022) • Arxiv • 113 citations
Zelikman et al.
Hybrid Transformer With Multi-level Fusion For Multimodal Knowledge Graph Completion (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 152 citations
Chen et al.
Pali: A Jointly-scaled Multilingual Language-image Model (2022) • Arxiv • 194 citations
Chen et al.
Murag: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 65 citations
Chen et al.
Towards A General Pre-training Framework For Adaptive Learning In Moocs (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 84 citations
Zhong et al.
Ernie-layout: Layout Knowledge Enhanced Pre-training For Visually-rich Document Understanding (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 53 citations
Peng et al.
Medmcqa: A Large-scale Multi-subject Multi-choice Dataset For Medical Domain Question Answering (2022) • ACM Conference on Health Inference and Learning (CHIL) 2022 • 72 citations
Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu
Large Language Models Encode Clinical Knowledge (2022) • Nature • 1963 citations
Singhal et al.
When Not To Trust Language Models: Investigating Effectiveness Of Parametric And Non-parametric Memories (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 146 citations
Mallen et al.
Enabling Multimodal Generation On CLIP Via Vision-language Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 52 citations
Dai et al.
St-moe: Designing Stable And Transferable Sparse Expert Models (2022) • Arxiv • 43 citations
Zoph et al.
Mukea: Multimodal Knowledge Extraction And Accumulation For Knowledge-based Visual Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Ding et al.
Do Large Language Models Know What Humans Know? (2022) • Cognitive Science • 63 citations
Trott et al.
Coarse-to-fine Vision-language Pre-training With Fusion In The Backbone (2022) • Arxiv • 67 citations
Dou et al.
Unifying Vision, Text, And Layout For Universal Document Processing (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Tang et al.
An Empirical Study Of End-to-end Video-language Transformers With Masked Visual Modeling (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Fu et al.
RARR: Researching And Revising What Language Models Say, Using Language Models (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Gao et al.
The Unreliability Of Explanations In Few-shot Prompting For Textual Reasoning (2022) • Arxiv • 52 citations
Xi Ye, Greg Durrett
MIST: Multi-modal Iterative Spatial-temporal Transformer For Long-form Video Question Answering (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Gao et al.
Plug-and-play VQA: Zero-shot VQA By Conjoining Large Pretrained Models With Zero Training (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 64 citations
Tiong et al.
Zerogen: Efficient Zero-shot Learning Via Dataset Generation (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 82 citations
Ye et al.
Linkbert: Pretraining Language Models With Document Links (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 163 citations
Michihiro Yasunaga, Jure Leskovec, Percy Liang
Can Large Language Models Reason About Medical Questions? (2022) • Patterns • 138 citations
Liévin et al.
Deep Bidirectional Language-knowledge Graph Pretraining (2022) • Arxiv • 87 citations
Yasunaga et al.
A-OKVQA: A Benchmark For Visual Question Answering Using World Knowledge (2022) • Lecture Notes in Computer Science • 162 citations
Schwenk et al.
ASQA: Factoid Questions Meet Long-form Answers (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 43 citations
Stelmakh et al.
Learn To Explain: Multimodal Reasoning Via Thought Chains For Science Question Answering (2022) • Arxiv • 214 citations
Lu et al.
Re2g: Retrieve, Rerank, Generate (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 42 citations
Glass et al.
Qdrop: Randomly Dropping Quantization For Extremely Low-bit Post-training Quantization (2022) • Arxiv • 44 citations
Wei et al.
Subgraph Retrieval Enhanced Model For Multi-hop Knowledge Base Question Answering (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 76 citations
Zhang et al.
Unifiedskg: Unifying And Multi-tasking Structured Knowledge Grounding With Text-to-text Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 74 citations
Xie et al.
NLX-GPT: A Model For Natural Language Explanations In Vision And Vision-language Tasks (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
Video Graph Transformer For Video Question Answering (2022) • Lecture Notes in Computer Science • 66 citations
Xiao et al.
A Good Prompt Is Worth Millions Of Parameters: Low-resource Prompt-based Learning For Vision-language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Jin et al.
Generated Knowledge Prompting For Commonsense Reasoning (2021) • Arxiv • 44 citations
Liu et al.
Chinesebert: Chinese Pretraining Enhanced By Glyph And Pinyin Information (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 176 citations
Sun et al.
Truthfulqa: Measuring How Models Mimic Human Falsehoods (2021) • Arxiv • 112 citations
Stephanie Lin, Jacob Hilton, Owain Evans
Psyqa: A Chinese Dataset For Generating Long Counseling Text For Mental Health Support (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 47 citations
Sun et al.
Transfernet: An Effective And Transparent Framework For Multi-hop Question Answering Over Relation Graph (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 86 citations
Shi et al.
TAPEX: Table Pre-training Via Learning A Neural SQL Executor (2021) • Arxiv • 90 citations
Liu et al.
What Makes Good In-context Examples For GPT-3? (2021) • Arxiv • 154 citations
Liu et al.
An Empirical Study Of GPT-3 For Few-shot Knowledge-based VQA (2021) • Arxiv • 46 citations
Yang et al.
Retrieving And Reading: A Comprehensive Survey On Open-domain Question Answering (2021) • Arxiv • 152 citations
Zhu et al.
Semantic Answer Similarity For Evaluating Question Answering Models (2021) • Proceedings of the 3rd Workshop on Machine Reading for Question Answering • 43 citations
Risch et al.
Complex Temporal Question Answering On Knowledge Graphs (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 64 citations
Jia et al.
How Much Can CLIP Benefit Vision-and-language Tasks? (2021) • Arxiv • 152 citations
Shen et al.
Vlmo: Unified Vision-language Pre-training With Mixture-of-modality-experts (2021) • Arxiv • 288 citations
Bao et al.
Less Is More: Clipbert For Video-and-language Learning Via Sparse Sampling (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 513 citations
Lei et al.
Banglabert: Language Model Pretraining And Benchmarks For Low-resource Language Understanding Evaluation In Bangla (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 71 citations
Bhattacharjee et al.
Latr: Layout-aware Transformer For Scene-text VQA (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 67 citations
Biten et al.
Quiz-style Question Generation For News Stories (2021) • Proceedings of the Web Conference 2021 • 41 citations
Adam D. Lelkes, Vinh Q. Tran, Cong Yu
Pangu-α: Large-scale Autoregressive Pretrained Chinese Language Models With Auto-parallel Computation (2021) • Arxiv • 94 citations
Zeng et al.
Recursively Summarizing Books With Human Feedback (2021) • Arxiv • 65 citations
Wu et al.
Metaicl: Learning To Learn In Context (2021) • Arxiv • 61 citations
Min et al.
MERLOT: Multimodal Neural Script Knowledge Models (2021) • Arxiv • 54 citations
Zellers et al.
Indonlg: Benchmark And Resources For Evaluating Indonesian Natural Language Generation (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 68 citations
Cahyawijaya et al.
Generic Attention-model Explainability For Interpreting Bi-modal And Encoder-decoder Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 207 citations
Hila Chefer, Shir Gur, Lior Wolf
Bidirectional Machine Reading Comprehension For Aspect Sentiment Triplet Extraction (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 185 citations
Chen et al.
Zero-shot Cross-lingual Transfer Of Neural Machine Translation With Multilingual Pretrained Encoders (2021) • Lecture Notes in Computer Science • 51 citations
Chen et al.
Factual Probing Is [MASK]: Learning Vs. Learning To Recall (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 230 citations
Zexuan Zhong, Dan Friedman, Danqi Chen
Unifying Vision-and-language Tasks Via Text Generation (2021) • Arxiv • 145 citations
Cho et al.
Adversarial VQA: A New Benchmark For Evaluating The Robustness Of VQA Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 42 citations
Li et al.
CANINE: Pre-training An Efficient Tokenization-free Encoder For Language Representation (2021) • Transactions of the Association for Computational Linguistics • 114 citations
Clark et al.
Videoclip: Contrastive Pre-training For Zero-shot Video-text Understanding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 341 citations
Xu et al.
Case-based Reasoning For Natural Language Queries Over Knowledge Bases (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 99 citations
Das et al.
A Dataset Of Information-seeking Questions And Answers Anchored In Research Papers (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 86 citations
Dasigi et al.
Editing Factual Knowledge In Language Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Nicola de Cao, Wilker Aziz, Ivan Titov
Docnli: A Large-scale Dataset For Document-level Natural Language Inference (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 51 citations
Wenpeng Yin, Dragomir Radev, Caiming Xiong
Hidden Backdoors In Human-centric Language Models (2021) • CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security • 84 citations
Li et al.
A Neural Network Solves, Explains, And Generates University Math Problems By Program Synthesis And Few-shot Learning At Human Level (2021) • Proceedings of the National Academy of Sciences • 70 citations
Drori et al.
Lawformer: A Pre-trained Language Model For Chinese Legal Long Documents (2021) • AI Open • 170 citations
Xiao et al.
Image Captioning For Effective Use Of Language Models In Knowledge-based Visual Question Answering (2021) • Expert Systems with Applications • 48 citations
Salaberria et al.
MDETR -- Modulated Detection For End-to-end Multi-modal Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 594 citations
Kamath et al.
Longt5: Efficient Text-to-text Transformer For Long Sequences (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 87 citations
Guo et al.
Greedy Gradient Ensemble For Robust Visual Question Answering (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Han et al.
LPF: A Language-prior Feedback Objective Function For De-biased Visual Question Answering (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Zujie Liang, Haifeng Hu, Jiaying Zhu
Does CLIP Benefit Visual Question Answering In The Medical Domain As Much As It Does In The General Domain? (2021) • Arxiv • 41 citations
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
Structurallm: Structural Pre-training For Form Understanding (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 61 citations
Li et al.
Compressing Visual-linguistic Model Via Knowledge Distillation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 50 citations
Fang et al.
Multidoc2dial: Modeling Dialogues Grounded In Multiple Documents (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 42 citations
Feng et al.
Attend What You Need: Motion-appearance Synergistic Networks For Video Question Answering (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 57 citations
Seo et al.
Natural SQL: Making SQL Easier To Infer From Natural Language Specifications (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 58 citations
Gan et al.
QA-GNN: Reasoning With Language Models And Knowledge Graphs For Question Answering (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 259 citations
Yasunaga et al.
Simple Entity-centric Questions Challenge Dense Retrievers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 72 citations
Sciavolino et al.
VX2TEXT: End-to-end Learning Of Video-based Text Generation From Multimodal Inputs (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Lin et al.
Natural Language Video Localization: A Revisit In Span-based Question Answering Framework (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 87 citations
Zhang et al.
Visualmrc: Machine Reading Comprehension On Document Images (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
Efficient Passage Retrieval With Hashing For Open-domain Question Answering (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) • 51 citations
Ikuya Yamada, Akari Asai, Hannaneh Hajishirzi
Going Full-tilt Boogie On Document Understanding With Text-image-layout Transformer (2021) • Lecture Notes in Computer Science • 95 citations
Powalski et al.
TAT-QA: A Question Answering Benchmark On A Hybrid Of Tabular And Textual Content In Finance (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 126 citations
Zhu et al.
Finding The Evidence: Localization-aware Answer Prediction For Text Visual Question Answering (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 44 citations
Wei Han, Hantao Huang, Tao Han
Adv-bert: BERT Is Not Robust On Misspellings! Generating Nature Adversarial Samples On BERT (2020) • Arxiv • 79 citations
Sun et al.
Selective Question Answering Under Domain Shift (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 113 citations
Amita Kamath, Robin Jia, Percy Liang
Roses Are Red, Violets Are Blue... But Should Vqa Expect Them To? (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Kervadec et al.
Template-based Question Generation From Retrieved Sentences For Improved Unsupervised Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 68 citations
Fabbri et al.
Video2commonsense: Generating Commonsense Descriptions To Enrich Video Captioning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Fang et al.
Unifiedqa: Crossing Format Boundaries With A Single QA System (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 51 citations
Khashabi et al.
Hybrid Ranking Network For Text-to-sql (2020) • Arxiv • 50 citations
Lyu et al.
Universal Natural Language Processing With Limited Annotations: Try Few-shot Textual Entailment As A Start (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Yin et al.
Tabert: Pretraining For Joint Understanding Of Textual And Tabular Data (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 380 citations
Yin et al.
Look At The First Sentence: Position Bias In Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Ko et al.
Scalable Multi-hop Relational Reasoning For Knowledge-aware Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 198 citations
Feng et al.
Exploiting Structured Knowledge In Text Via Graph-guided Representation Learning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 57 citations
Shen et al.
Ternarybert: Distillation-aware Ultra-low Bit BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 140 citations
Zhang et al.
Trojaning Language Models For Fun And Profit (2020) • 2021 IEEE European Symposium on Security and Privacy (EuroS&P) • 62 citations
Zhang et al.
ERNIE-GEN: An Enhanced Multi-flow Pre-training And Fine-tuning Framework For Natural Language Generation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 105 citations
Xiao et al.
Realformer: Transformer Likes Residual Attention (2020) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 45 citations
He et al.
Beyond I.I.D.: Three Levels Of Generalization For Question Answering On Knowledge Bases (2020) • Proceedings of the Web Conference 2021 • 78 citations
Gu et al.
Machine Reading Comprehension: The Role Of Contextualized Language Models And Beyond (2020) • Arxiv • 48 citations
Zhuosheng Zhang, Hai Zhao, Rui Wang
VQA-LOL: Visual Question Answering Under The Lens Of Logic (2020) • Lecture Notes in Computer Science • 73 citations
Gokhale et al.
MUTANT: A Training Paradigm For Out-of-distribution Generalization In Visual Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 134 citations
Gokhale et al.
Unshuffling Data For Improved Generalization (2020) • Arxiv • 41 citations
Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
Visual Relation Grounding In Videos (2020) • Lecture Notes in Computer Science • 42 citations
Xiao et al.
LUKE: Deep Contextualized Entity Representations With Entity-aware Self-attention (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 536 citations
Yamada et al.
Large-scale Adversarial Training For Vision-and-language Representation Learning (2020) • Arxiv • 287 citations
Gan et al.
Generating Question Titles For Stack Overflow From Mined Code Snippets (2020) • ACM Transactions on Software Engineering and Methodology • 55 citations
Gao et al.
Spatially Aware Multimodal Transformers For Textvqa (2020) • Lecture Notes in Computer Science • 59 citations
Kant et al.
Pretrained Transformers For Simple Question Answering Over Knowledge Graphs (2020) • Lecture Notes in Computer Science • 41 citations
D. Lukovnikov, A. Fischer, J. Lehmann
Sequential Latent Knowledge Selection For Knowledge-grounded Dialogue (2020) • Arxiv • 111 citations
Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
A Dataset And Baselines For Visual Question Answering On Art (2020) • Lecture Notes in Computer Science • 49 citations
Garcia et al.
Generative Data Augmentation For Commonsense Reasoning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 92 citations
Yang et al.
How Can We Know When Language Models Know? On The Calibration Of Language Models For Question Answering (2020) • Transactions of the Association for Computational Linguistics • 93 citations
Jiang et al.
Knowledge Graph Based Synthetic Corpus Generation For Knowledge-enhanced Language Model Pre-training (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 49 citations
Agarwal et al.
In Defense Of Grid Features For Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 351 citations
Jiang et al.
Covid-twitter-bert: A Natural Language Processing Model To Analyse COVID-19 Content On Twitter (2020) • Arxiv • 134 citations
Martin Müller, Marcel Salathé, Per E Kummervold
How Much Knowledge Can You Pack Into The Parameters Of A Language Model? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 187 citations
Adam Roberts, Colin Raffel, Noam Shazeer
Logic-guided Data Augmentation And Regularization For Consistent Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Akari Asai, Hannaneh Hajishirzi
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder For Long-form Document Matching (2020) • CIKM '20: The 29th ACM International Conference on Information and Knowledge Management • 53 citations
Yang et al.
Prophetnet: Predicting Future N-gram For Sequence-to-sequence Pre-training (2020) • Arxiv • 83 citations
Qi et al.
HHH: An Online Medical Chatbot System Based On Knowledge Graph And Hierarchical Bi-directional Attention (2020) • Proceedings of the Australasian Computer Science Week Multiconference • 47 citations
Qiming Bao, Lin Ni, Jiamou Liu
Actbert: Learning Global-local Video-text Representations (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 392 citations
Linchao Zhu, Yi Yang
Photon: A Robust Cross-domain Text-to-sql System (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 42 citations
Zeng et al.
K-adapter: Infusing Knowledge Into Pre-trained Models With Adapters (2020) • Arxiv • 132 citations
Wang et al.
Biomegatron: Larger Biomedical Domain Language Model (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 69 citations
Shin et al.
VD-BERT: A Unified Vision And Dialog Transformer With BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 43 citations
Wang et al.
Open-retrieval Conversational Question Answering (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 69 citations
Qu et al.
On The General Value Of Evidence, And Bilingual Scene-text Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Wang et al.
Investigating Entity Knowledge In BERT With Simple Neural End-to-end Entity Linking (2020) • Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) • 92 citations
Samuel Broscheit
A Survey On Machine Reading Comprehension: Tasks, Evaluation Metrics And Benchmark Datasets (2020) • Applied Sciences • 61 citations
Zeng et al.
Asking Questions The Human Way: Scalable Question-answer Generation From Text Corpus (2020) • Proceedings of The Web Conference 2020 • 43 citations
Liu et al.
Connecting The Dots: A Knowledgeable Path Generator For Commonsense Question Answering (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 65 citations
Wang et al.
Talking-heads Attention (2020) • Arxiv • 49 citations
Shazeer et al.
Efficient Neural Query Auto Completion (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Wang et al.
Pre-training Tasks For Embedding-based Large-scale Retrieval (2020) • Arxiv • 102 citations
Chang et al.
Artificial Intelligence (AI) In Action: Addressing The COVID-19 Pandemic With Natural Language Processing (NLP) (2020) • Annual Review of Biomedical Data Science • 56 citations
Chen et al.
Counterfactual Samples Synthesizing For Robust Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Chen et al.
Hybridqa: A Dataset Of Multi-hop Question Answering Over Tabular And Textual Data (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 178 citations
Chen et al.
Explaining Question Answering Models Through Text Generation (2020) • Arxiv • 44 citations
Veronica Latcinnik, Jonathan Berant
Document Modeling With Graph Attention Networks For Multi-grained Machine Reading Comprehension (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 55 citations
Zheng et al.
HERO: Hierarchical Encoder For Video+language Omni-representation Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 372 citations
Li et al.
Exploring And Predicting Transferability Across NLP Tasks (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 100 citations
Vu et al.
A Vietnamese Dataset For Evaluating Machine Reading Comprehension (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 58 citations
Nguyen et al.
Grappa: Grammar-augmented Pre-training For Table Semantic Parsing (2020) • Arxiv • 59 citations
Yu et al.
Rikinet: Reading Wikipedia Pages For Natural Question Answering (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 54 citations
Liu et al.
Rocketqa: An Optimized Training Approach To Dense Passage Retrieval For Open-domain Question Answering (2020) • Arxiv • 74 citations
Qu et al.
X-LXMERT: Paint, Caption And Answer Questions With Multi-modal Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 58 citations
Cho et al.
Ktrain: A Low-code Library For Augmented Machine Learning (2020) • Arxiv • 53 citations
Arun S. Maiya
Transformers As Soft Reasoners Over Language (2020) • Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20) • 48 citations
Peter Clark, Oyvind Tafjord, Kyle Richardson
Mobilebert: A Compact Task-agnostic BERT For Resource-limited Devices (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 593 citations
Sun et al.
FEQA: A Question Answering Evaluation Framework For Faithfulness Assessment In Abstractive Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 79 citations
Esin Durmus, He He, Mona Diab
Bridging Textual And Tabular Data For Cross-domain Text-to-sql Semantic Parsing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 145 citations
Xi Victoria Lin, Richard Socher, Caiming Xiong
Deep Multimodal Neural Architecture Search (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 87 citations
Yu et al.
Reclor: A Reading Comprehension Dataset Requiring Logical Reasoning (2020) • Arxiv • 127 citations
Yu et al.
Fquad: French Question Answering Dataset (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 62 citations
D'Hoffschmidt et al.
Differentiable Reasoning Over A Virtual Knowledge Base (2020) • Arxiv • 44 citations
Dhingra et al.
Location-aware Graph Convolutional Networks For Video Question Answering (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 169 citations
Huang et al.
Overcoming Language Priors With Self-supervised Learning For Visual Question Answering (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 108 citations
Zhu et al.
Multi-fact Correction In Abstractive Text Summarization (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 96 citations
Dong et al.
BERT With History Answer Embedding For Conversational Question Answering (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 142 citations
Qu et al.
Negated And Misprimed Probes For Pretrained Language Models: Birds Can Talk, But Cannot Fly (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 69 citations
Nora Kassner, Hinrich Schütze
Spanbert: Improving Pre-training By Representing And Predicting Spans (2019) • Transactions of the Association for Computational Linguistics • 179 citations
Joshi et al.
Learning To Ask Unanswerable Questions For Machine Reading Comprehension (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Zhu et al.
Learning And Evaluating General Linguistic Intelligence (2019) • Arxiv • 156 citations
Yogatama et al.
Repurposing Entailment For Multi-hop Question Answering Tasks (2019) • Proceedings of the 2019 Conference of the North • 43 citations
Trivedi et al.
Deep Modular Co-attention Networks For Visual Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 862 citations
Yu et al.
Universal Text Representation From BERT: An Empirical Study (2019) • Arxiv • 40 citations
Ma et al.
TWEETQA: A Social Media Focused Question Answering Dataset (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 64 citations
Xiong et al.
Evaluating Commonsense In Pre-trained Language Models (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 50 citations
Zhou et al.
Thieves On Sesame Street! Model Extraction Of BERT-based APIs (2019) • Arxiv • 73 citations
Krishna et al.
Information Maximizing Visual Question Generation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Ranjay Krishna, Michael Bernstein, Li Fei-Fei
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs (2019) • Arxiv • 96 citations
Dua et al.
Activitynet-qa: A Dataset For Understanding Complex Web Videos Via Question Answering (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 199 citations
Yu et al.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 545 citations
Marino et al.
Reasoning Over Paragraph Effects In Situations (2019) • Proceedings of the 2nd Workshop on Machine Reading for Question Answering • 89 citations
Lin et al.
A Generalized Framework Of Sequence Generation With Application To Undirected Sequence Models (2019) • Arxiv • 46 citations
Mansimov et al.
Episodic Memory In Lifelong Language Learning (2019) • Arxiv • 99 citations
D'Autume et al.
A Simple But Effective Method To Incorporate Multi-turn Context With BERT For Conversational Machine Comprehension (2019) • Proceedings of the First Workshop on NLP for Conversational AI • 42 citations
Ohsugi et al.
A Multi-type Multi-span Network For Reading Comprehension That Requires Discrete Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 77 citations
Hu et al.
Select, Answer And Explain: Interpretable Multi-hop Reading Comprehension Over Multiple Documents (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 54 citations
Tu et al.
Using Local Knowledge Graph Construction To Scale Seq2seq Models To Multi-document Inputs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 98 citations
Fan et al.
ELI5: Long Form Question Answering (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 284 citations
Fan et al.
Heterogeneous Memory Enhanced Multimodal Attention Model For Video Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 264 citations
Fan et al.
Nlprolog: Reasoning With Weak Unification For Question Answering In Natural Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Weber et al.
Cosmos QA: Machine Reading Comprehension With Contextual Commonsense Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 271 citations
Huang et al.
Towards Scalable And Reliable Capsule Networks For Challenging NLP Applications (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 136 citations
Zhao et al.
Learning To Generate Questions By Learning What Not To Generate (2019) • WWW '19: The Web Conference • 48 citations
Liu et al.
Speechbert: An Audio-and-text Jointly Learned Language Model For End-to-end Spoken Question Answering (2019) • Interspeech 2020 • 41 citations
Chuang et al.
Option Comparison Network For Multiple-choice Reading Comprehension (2019) • Arxiv • 49 citations
Ran et al.
Amazonqa: A Review-based Question Answering Task (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 45 citations
Gupta et al.
Boolq: Exploring The Surprising Difficulty Of Natural Yes/no Questions (2019) • Arxiv • 209 citations
Clark et al.
How Does BERT Answer Questions? A Layer-wise Analysis Of Transformer Representations (2019) • CIKM '19: The 28th ACM International Conference on Information and Knowledge Management • 63 citations
Aken et al.
LXMERT: Learning Cross-modality Encoder Representations From Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1297 citations
Hao Tan, Mohit Bansal
X-SQL: Reinforce Schema Representation With Context (2019) • Arxiv • 53 citations
He et al.
Knowledge Guided Text Retrieval And Reading For Open Domain Question Answering (2019) • Arxiv • 84 citations
Min et al.
How Can We Know What Language Models Know? (2019) • Transactions of the Association for Computational Linguistics • 543 citations
Jiang et al.
A Question-entailment Approach To Question Answering (2019) • BMC Bioinformatics • 174 citations
Asma Ben Abacha, Dina Demner-Fushman
Understanding The Behaviors Of BERT In Ranking (2019) • Arxiv • 145 citations
Qiao et al.
Visual Entailment: A Novel Task For Fine-grained Image Understanding (2019) • Arxiv • 162 citations
Xie et al.
Visualbert: A Simple And Performant Baseline For Vision And Language (2019) • Arxiv • 1227 citations
Li et al.
A Unified MRC Framework For Named Entity Recognition (2019) • Arxiv • 50 citations
Li et al.
Semeval-2015 Task 3: Answer Selection In Community Question Answering (2019) • Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) • 252 citations
Nakov et al.
HAS-QA: Hierarchical Answer Spans Model For Open-domain Question Answering (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 43 citations
Pang et al.
Self-assembling Modular Networks For Interpretable Multi-hop Reasoning (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Yichen Jiang, Mohit Bansal
Fusion Of Detected Objects In Text For Visual Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 41 citations
Alberti et al.
Asking Clarifying Questions In Open-domain Information-seeking Conversations (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 175 citations
Aliannejadi et al.
Editing-based SQL Query Generation For Cross-domain Context-dependent Questions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 114 citations
Zhang et al.
Xlnet: Generalized Autoregressive Pretraining For Language Understanding (2019) • Arxiv • 1856 citations
Yang et al.
Unified Vision-language Pre-training For Image Captioning And VQA (2019) • Arxiv • 74 citations
Zhou et al.
Giving BERT A Calculator: Finding Operations And Arguments With Reading Comprehension (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 75 citations
Andor et al.
XLDA: Cross-lingual Data Augmentation For Natural Language Inference And Question Answering (2019) • Arxiv • 61 citations
Singh et al.
Simple And Effective Curriculum Pointer-generator Networks For Reading Comprehension Over Long Narratives (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Tay et al.
On The Cross-lingual Transferability Of Monolingual Representations (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 57 citations
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
Portuguese Named Entity Recognition Using BERT-CRF (2019) • Arxiv • 180 citations
Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
Socialiqa: Commonsense Reasoning About Social Interactions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 111 citations
Sap et al.
Vilbert: Pretraining Task-agnostic Visiolinguistic Representations For Vision-and-language Tasks (2019) • Arxiv • 1672 citations
Lu et al.
TANDA: Transfer And Adapt Pre-trained Transformer Models For Answer Sentence Selection (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 92 citations
Siddhant Garg, Thuy Vu, Alessandro Moschitti
MLQA: Evaluating Cross-lingual Extractive Question Answering (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 52 citations
Lewis et al.
Unsupervised Question Answering By Cloze Translation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 136 citations
Patrick Lewis, Ludovic Denoyer, Sebastian Riedel
Answering Complex Open-domain Questions Through Iterative Query Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 103 citations
Qi et al.
Multilingual Universal Sentence Encoder For Semantic Retrieval (2019) • Arxiv • 66 citations
Yang et al.
Real-time Open-domain Question Answering With Dense-sparse Phrase Index (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 135 citations
Seo et al.
BART: Denoising Sequence-to-sequence Pre-training For Natural Language Generation, Translation, And Comprehension (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 3464 citations
Lewis et al.
Multi-task Learning For Conversational Question Answering Over A Large-scale Knowledge Base (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 89 citations
Shen et al.
Knowledge Aware Conversation Generation With Explainable Reasoning Over Augmented Graphs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Liu et al.
Multi-task Learning With Language Modeling For Question Generation (2019) • Arxiv • 58 citations
Wenjie Zhou, Minghua Zhang, Yunfang Wu
Attentive History Selection For Conversational Question Answering (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 44 citations
Qu et al.
Product-aware Answer Generation In E-commerce Question-answering (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 78 citations
Gao et al.
Multi-style Generative Reading Comprehension (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 76 citations
Nishida et al.
Rubi: Reducing Unimodal Biases In Visual Question Answering (2019) • Advances in Neural Information Processing Systems 2019 (pp. 839-850) • 205 citations
Cadene et al.
Coarse-grain Fine-grain Coattention Network For Multi-evidence Question Answering (2019) • Arxiv • 47 citations
Zhong et al.
Addressing Semantic Drift In Question Generation For Semi-supervised Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 109 citations
Shiyue Zhang, Mohit Bansal
Measuring Compositional Generalization: A Comprehensive Method On Realistic Data (2019) • Arxiv • 55 citations
Keysers et al.
Structured Pruning Of A Bert-based Question Answering Model (2019) • Arxiv • 72 citations
J. S. McCarley, Rishav Chakravarti, Avirup Sil
Clevr-ref+: Diagnosing Visual Reasoning With Referring Expressions (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Liu et al.
Passage Re-ranking With BERT (2019) • Arxiv • 347 citations
Rodrigo Nogueira, Kyunghyun Cho
Language Models As Knowledge Bases? (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 725 citations
Petroni et al.
Biobert: A Pre-trained Biomedical Language Representation Model For Biomedical Text Mining (2019) • Bioinformatics • 6102 citations
Lee et al.
Reinforced Dynamic Reasoning For Conversational Question Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 40 citations
Pan et al.
Pubmedqa: A Dataset For Biomedical Research Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 425 citations
Jin et al.
Self-critical Reasoning For Robust Visual Question Answering (2019) • Arxiv • 91 citations
Jialin Wu, Raymond J. Mooney
Adversarial Domain Adaptation For Machine Reading Comprehension (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 53 citations
Wang et al.
Structbert: Incorporating Language Structures Into Pre-training For Deep Language Understanding (2019) • Arxiv • 100 citations
Wang et al.
RAT-SQL: Relation-aware Schema Encoding And Linking For Text-to-sql Parsers (2019) • Arxiv • 72 citations
Wang et al.
Evidence Sentence Extraction For Machine Reading Comprehension (2019) • Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) • 45 citations
Wang et al.
Multi-hop Reading Comprehension Through Question Decomposition And Rescoring (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 182 citations
Min et al.
Automatic Spanish Translation Of The Squad Dataset For Multilingual Question Answering (2019) • Arxiv • 42 citations
Casimiro Pio Carrino, Marta R. Costa-Jussà, José A. R. Fonollosa
BERT Post-training For Review Reading Comprehension And Aspect-based Sentiment Analysis (2019) • Arxiv • 358 citations
Xu et al.
Answering While Summarizing: Multi-task Learning For Multi-hop QA With Evidence Extraction (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 96 citations
Nishida et al.
A Comprehensive Exploration On Wikisql With Table-aware Word Contextualization (2019) • Arxiv • 122 citations
Hwang et al.
UNITER: Universal Image-text Representation Learning (2019) • Arxiv • 183 citations
Chen et al.
Understanding Dataset Design Choices For Multi-hop Reasoning (2019) • Proceedings of the 2019 Conference of the North • 90 citations
Jifan Chen, Greg Durrett
Multi-hop Question Answering Via Reasoning Chains (2019) • Arxiv • 66 citations
Jifan Chen, Shih-Ting Lin, Greg Durrett
Reinforcement Learning Based Graph-to-sequence Model For Natural Question Generation (2019) • Arxiv • 79 citations
Yu Chen, Lingfei Wu, Mohammed J. Zaki
Review-driven Answer Generation For Product-related Questions In E-commerce (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 50 citations
Chen et al.
Iterative Answer Prediction With Pointer-augmented Multimodal Transformers For Textvqa (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Hu et al.
Towards Complex Text-to-sql In Cross-domain Database With Intermediate Representation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 348 citations
Guo et al.
Neural Architectures For Fine-grained Propaganda Detection In News (2019) • Arxiv • 51 citations
Gupta et al.
Quantifying And Alleviating The Language Prior Problem In Visual Question Answering (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Guo et al.
Explain Yourself! Leveraging Language Models For Commonsense Reasoning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 86 citations
Rajani et al.
Improving Question Answering Over Incomplete Kbs With Knowledge-aware Reader (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 118 citations
Xiong et al.
Conversing By Reading: Contentful Neural Conversation With On-demand Machine Reading (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 97 citations
Qin et al.
VL-BERT: Pre-training Of Generic Visual-linguistic Representations (2019) • Arxiv • 782 citations
Su et al.
GQA: A New Dataset For Real-world Visual Reasoning And Compositional Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Drew A. Hudson, Christopher D. Manning
Learning To Answer By Learning To Ask: Getting The Best Of GPT-2 And BERT Worlds (2019) • Arxiv • 52 citations
Tassilo Klein, Moin Nabi
End-to-end Open-domain Question Answering With Bertserini (2019) • Proceedings of the 2019 Conference of the North • 162 citations
Yang et al.
Hotpotqa: A Dataset For Diverse, Explainable Multi-hop Question Answering (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 540 citations
Yang et al.
Learning Semantic Textual Similarity From Conversations (2018) • Proceedings of The Third Workshop on Representation Learning for NLP • 154 citations
Yang et al.
What Makes Reading Comprehension Questions Easier? (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 102 citations
Sugawara et al.
Know What You Don't Know: Unanswerable Questions For Squad (2018) • Arxiv • 209 citations
Pranav Rajpurkar, Robin Jia, Percy Liang
Multimodal Explanations: Justifying Decisions And Pointing To The Evidence (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 101 citations
Park et al.
Response Ranking With Deep Matching Networks And External Knowledge In Information-seeking Conversation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 153 citations
Yang et al.
Efficient And Robust Question Answering From Minimal Context Over Documents (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 144 citations
Min et al.
Movie Question Answering: Remembering The Textual Cues For Layered Visual Contents (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 44 citations
Wang et al.
The Price Of Debiasing Automatic Metrics In Natural Language Evaluation (2018) • Arxiv • 43 citations
Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang
Incsql: Training Incremental Text-to-sql Parsers With Non-deterministic Oracles (2018) • Arxiv • 59 citations
Shi et al.
ODSQA: Open-domain Spoken Question Answering Dataset (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 42 citations
Lee et al.
Complex Sequential Question Answering: Towards Learning To Converse Over Linked Question Answer Pairs With A Knowledge Graph (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 172 citations
Saha et al.
Emrqa: A Large Corpus For Question Answering On Electronic Medical Records (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 164 citations
Pampari et al.
Multi-passage Machine Reading Comprehension With Cross-passage Answer Verification (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 102 citations
Wang et al.
Polisis: Automated Analysis And Presentation Of Privacy Policies Using Deep Learning (2018) • Arxiv • 174 citations
Harkous et al.
Can A Suit Of Armor Conduct Electricity? A New Dataset For Open Book Question Answering (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 557 citations
Mihaylov et al.
Neural-symbolic VQA: Disentangling Reasoning From Vision And Language Understanding (2018) • Arxiv • 233 citations
Yi et al.
Robust Machine Comprehension Models Via Adversarial Training (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 99 citations
Yicheng Wang, Mohit Bansal
Latent Alignment And Variational Attention (2018) • Arxiv • 85 citations
Deng et al.
Multimodal Dual Attention Memory For Video Story Question Answering (2018) • Lecture Notes in Computer Science • 73 citations
Kim et al.
Transforming Question Answering Datasets Into Natural Language Inference Datasets (2018) • Arxiv • 122 citations
Dorottya Demszky, Kelvin Guu, Percy Liang
Interpretation Of Natural Language Rules In Conversational Machine Reading (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 141 citations
Saeidi et al.
Adversarial Sampling And Training For Semi-supervised Information Retrieval (2018) • The World Wide Web Conference • 79 citations
Dae Hoon Park, Yi Chang
Qanet: Combining Local Convolution With Global Self-attention For Reading Comprehension (2018) • Arxiv • 417 citations
Yu et al.
Flowqa: Grasping Flow In History For Conversational Machine Comprehension (2018) • Arxiv • 63 citations
Hsin-Yuan Huang, Eunsol Choi, Wen-Tau Yih
Natural Language To Structured Query Generation Via Meta-learning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 120 citations
Huang et al.
The Natural Language Decathlon: Multitask Learning As Question Answering (2018) • Arxiv • 338 citations
McCann et al.
Visual Question Reasoning On General Dependency Tree (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 42 citations
Cao et al.
A Joint Sequence Fusion Model For Video Question Answering And Retrieval (2018) • Lecture Notes in Computer Science • 331 citations
Youngjae Yu, Jongseok Kim, Gunhee Kim
Coqa: A Conversational Question Answering Challenge (2018) • Transactions of the Association for Computational Linguistics • 97 citations
Siva Reddy, Danqi Chen, Christopher D. Manning
Phrase-indexed Question Answering: A New Challenge For Scalable Document Comprehension (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 42 citations
Seo et al.
A Simple Method For Commonsense Reasoning (2018) • Arxiv • 312 citations
Trieu H. Trinh, Quoc V. Le
Automating Reading Comprehension By Generating Question And Answer Pairs (2018) • Lecture Notes in Computer Science • 44 citations
Kumar et al.
Open Domain Question Answering Using Early Fusion Of Knowledge Bases And Text (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 422 citations
Sun et al.
Medical Exam Question Answering With Large-scale Reading Comprehension (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Zhang et al.
VQA-E: Explaining, Elaborating, And Enhancing Your Answers For Visual Questions (2018) • Lecture Notes in Computer Science • 51 citations
Li et al.
Textbugger: Generating Adversarial Text Against Real-world Applications (2018) • Proceedings 2019 Network and Distributed System Security Symposium • 275 citations
Li et al.
Multilingual Extractive Reading Comprehension By Runtime Machine Translation (2018) • Arxiv • 59 citations
Asai et al.
Visual Coreference Resolution In Visual Dialog Using Neural Module Networks (2018) • Lecture Notes in Computer Science • 177 citations
Kottur et al.
Tell-and-answer: Towards Explainable Visual Question Answering Using Attributes And Captions (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 59 citations
Li et al.
Quac: Question Answering In Context (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 678 citations
Choi et al.
Harvesting Paragraph-level Question-answer Pairs From Wikipedia (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 169 citations
Xinya Du, Claire Cardie
Commonsense For Generative Multi-hop Question Answering Tasks (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 162 citations
Lisa Bauer, Yicheng Wang, Mohit Bansal
Simple Recurrent Units For Highly Parallelizable Recurrence (2017) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 53 citations
Lei et al.
End-to-end Optimization Of Goal-driven And Visually Grounded Dialogue Systems (2017) • Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence • 88 citations
Strub et al.
Learning Intrinsic Sparse Structures Within Long Short-term Memory (2017) • Arxiv • 103 citations
Wen et al.
Exploring Human-like Attention Supervision In Visual Question Answering (2017) • Arxiv • 45 citations
Tingting Qiao, Jianfeng Dong, Duanqing Xu
Visual Reference Resolution Using Attention Memory For Visual Dialog (2017) • Arxiv • 90 citations
Seo et al.
Learning To Ask: Neural Question Generation For Reading Comprehension (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 630 citations
Xinya Du, Junru Shao, Claire Cardie
Flexible End-to-end Dialogue System For Knowledge Grounded Conversation (2017) • Arxiv • 88 citations
Zhu et al.
Zero-shot Relation Extraction Via Reading Comprehension (2017) • Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) • 56 citations
Levy et al.
Adversarial Examples For Evaluating Reading Comprehension Systems (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 1260 citations
Robin Jia, Percy Liang
Ask The Right Questions: Active Question Reformulation With Reinforcement Learning (2017) • Sixth International Conference on Learning Representations (ICLR) 2018 • 82 citations
Buck et al.
Variational Reasoning For Question Answering With Knowledge Graph (2017) • Arxiv • 180 citations
Zhang et al.
TGIF-QA: Toward Spatio-temporal Reasoning In Visual Question Answering (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 435 citations
Jang et al.
Quasar: Datasets For Question Answering By Search And Reading (2017) • Arxiv • 139 citations
Bhuwan Dhingra, Kathryn Mazaitis, William W. Cohen
Learning A Neural Semantic Parser From User Feedback (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 54 citations
Iyer et al.
S-net: From Answer Extraction To Answer Generation For Machine Reading Comprehension (2017) • Arxiv • 47 citations
Tan et al.
Deepstory: Video Story QA By Deep Embedded Memory Networks (2017) • Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence • 136 citations
Kim et al.
Incorporating External Knowledge To Answer Open-domain Visual Questions With Dynamic Memory Networks (2017) • Arxiv • 41 citations
Guohao Li, Hang Su, Wenwu Zhu
Semi-supervised QA With Generative Domain-adaptive Nets (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 143 citations
Yang et al.
Fusionnet: Fusing Via Fully-aware Attention With Application To Machine Comprehension (2017) • Arxiv • 86 citations
Huang et al.
Shapeworld - A New Test Methodology For Multimodal Language Understanding (2017) • Arxiv • 47 citations
Alexander Kuhnle, Ann Copestake
Learning To Reason: End-to-end Module Networks For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 489 citations
Hu et al.
Question Answering And Question Generation As Dual Tasks (2017) • Arxiv • 170 citations
Tang et al.
Learned In Translation: Contextualized Word Vectors (2017) • Arxiv • 239 citations
McCann et al.
Don't Just Assume; Look And Answer: Overcoming Priors For Visual Question Answering (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 40 citations
Agrawal et al.
Dureader: A Chinese Machine Reading Comprehension Dataset From Real-world Applications (2017) • Arxiv • 51 citations
He et al.
Multi-modal Factorized Bilinear Pooling With Co-attention Learning For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 694 citations
Yu et al.
Learning To Rank Question Answer Pairs With Holographic Dual LSTM Architecture (2017) • Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval • 63 citations
Tay et al.
Fooling Vision And Language Models Despite Localization And Attention Mechanism (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Xu et al.
Making Neural QA As Simple As Possible But Not Simpler (2017) • Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) • 173 citations
Dirk Weissenborn, Georg Wiese, Laura Seiffe
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling For Visual Question Answering (2017) • IEEE Transactions on Neural Networks and Learning Systems • 530 citations
Yu et al.
Simple And Effective Multi-paragraph Reading Comprehension (2017) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 63 citations
Christopher Clark, Matt Gardner
Semeval-2017 Task 1: Semantic Textual Similarity - Multilingual And Cross-lingual Focused Evaluation (2017) • Arxiv • 306 citations
Cer et al.
Evidence Aggregation For Answer Re-ranking In Open-domain Question Answering (2017) • Arxiv • 108 citations
Wang et al.
R³: Reinforced Reader-ranker For Open-domain Question Answering (2017) • Arxiv • 87 citations
Wang et al.
Question Answering Through Transfer Learning From Large Fine-grained Supervision Data (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) • 114 citations
Sewon Min, Minjoon Seo, Hannaneh Hajishirzi
DCN+: Mixed Objective And Deep Residual Coattention For Question Answering (2017) • Arxiv • 86 citations
Caiming Xiong, Victor Zhong, Richard Socher
A Unified Query-based Generative Model For Question Generation And Question Answering (2017) • Arxiv • 50 citations
Linfeng Song, Zhiguo Wang, Wael Hamza
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 313 citations
Das et al.
Hyperbolic Representation Learning For Fast And Efficient Neural Question Answering (2017) • Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining • 53 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Visual Question Answering: A Survey Of Methods And Datasets (2016) • Arxiv • 44 citations
Wu et al.
A Focused Dynamic Attention Model For Visual Question Answering (2016) • Arxiv • 130 citations
Ilija Ilievski, Shuicheng Yan, Jiashi Feng
Machine Comprehension Using Match-lstm And Answer Pointer (2016) • Arxiv • 414 citations
Shuohang Wang, Jing Jiang
End-to-end Concept Word Detection For Video Captioning, Retrieval, And Question Answering (2016) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 146 citations
Yu et al.
CLEVR: A Diagnostic Dataset For Compositional Language And Elementary Visual Reasoning (2016) • Arxiv • 46 citations
Johnson et al.
Character-level Question Answering With Attention (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 107 citations
David Golub, Xiaodong He
Dataset And Neural Recurrent Sequence Labeling Model For Open-domain Factoid Question Answering (2016) • Arxiv • 68 citations
Li et al.
Multi-perspective Context Matching For Machine Comprehension (2016) • Arxiv • 115 citations
Wang et al.
Dialogue Learning With Human-in-the-loop (2016) • Arxiv • 46 citations
Li et al.
MS MARCO: A Human Generated Machine Reading Comprehension Dataset (2016) • Arxiv • 440 citations
Bajaj et al.
Leveraging Visual Question Answering For Image-caption Ranking (2016) • Lecture Notes in Computer Science • 82 citations
Xiao Lin, Devi Parikh
Visual Genome: Connecting Language And Vision Using Crowdsourced Dense Image Annotations (2016) • International Journal of Computer Vision • 4911 citations
Krishna et al.
A Context-aware Attention Network For Interactive Question Answering (2016) • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining • 40 citations
Li et al.
Dynamic Memory Networks For Visual And Textual Question Answering (2016) • Arxiv • 593 citations
Caiming Xiong, Stephen Merity, Richard Socher
Multimodal Compact Bilinear Pooling For Visual Question Answering And Visual Grounding (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1356 citations
Fukui et al.
Wikireading: A Novel Large-scale Language Understanding Task Over Wikipedia (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 128 citations
Hewlett et al.
Revisiting Visual Question Answering Baselines (2016) • Lecture Notes in Computer Science • 224 citations
Allan Jabri, Armand Joulin, Laurens van Der Maaten
Attentive Explanations: Justifying Decisions And Pointing To The Evidence (2016) • Arxiv • 55 citations
Park et al.
Neural Symbolic Machines: Learning Semantic Parsers On Freebase With Weak Supervision (2016) • Arxiv • 44 citations
Liang et al.
Dialog-based Language Learning (2016) • Arxiv • 69 citations
Jason Weston
Text Understanding With The Attention Sum Reader Network (2016) • Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 46 citations
Kadlec et al.
Zero-shot Visual Question Answering (2016) • Arxiv • 58 citations
Damien Teney, Anton van Den Hengel
Contextual LSTM (CLSTM) Models For Large Scale NLP Tasks (2016) • Arxiv • 190 citations
Ghosh et al.

— R —

RAG 128 papers #

Vision-guided Chunking Is All You Need: Enhancing RAG With Multimodal Document Understanding (2025) • No Venue
Tripathi et al.
Chain-of-retrieval Augmented Generation (2025) • No Venue
Wang et al.
Comorag: A Cognitive-inspired Memory-organized RAG For Stateful Long Narrative Reasoning (2025) • No Venue
Wang et al.
PRELUDE: A Benchmark Designed To Require Global Comprehension And Reasoning Over Long Contexts (2025) • No Venue
Yu et al.
Mmlongbench: Benchmarking Long-context Vision-language Models Effectively And Thoroughly (2025) • No Venue
Wang et al.
Scholarcopilot: Training Large Language Models For Academic Writing With Accurate Citations (2025) • No Venue
Wang et al.
Vidorag: Visual Document Retrieval-augmented Generation Via Dynamic Iterative Reasoning Agents (2025) • No Venue
Wang et al.
Step-audio 2 Technical Report (2025) • No Venue
Wu et al.
Mmsearch-r1: Incentivizing Lmms To Search (2025) • No Venue
Wu et al.
Sitemb-v1.5: Improved Context-aware Dense Retrieval For Semantic Association And Long Story Comprehension (2025) • No Venue
Wu et al.
Webwalker: Benchmarking Llms In Web Traversal (2025) • No Venue
Wu et al.
Omnithink: Expanding Knowledge Boundaries In Machine Writing Through Thinking (2025) • No Venue
Xi et al.
Retrieval-augmented Large Language Models For Financial Time Series Forecasting (2025) • No Venue
Xiao et al.
Noderag: Structuring Graph-based RAG With Heterogeneous Nodes (2025) • No Venue
Xu et al.
Universalrag: Retrieval-augmented Generation Over Multiple Corpora With Diverse Modalities And Granularities (2025) • No Venue
Yeo et al.
Personalized Graph-based Retrieval For Large Language Models (2025) • No Venue
Au et al.
Wikontic: Constructing Wikidata-aligned, Ontology-aware Knowledge Graphs With Large Language Models (2025) • No Venue
Chepurova et al.
Mem0: Building Production-ready AI Agents With Scalable Long-term Memory (2025) • No Venue
Chhikara et al.
Command A: An Enterprise-ready Large Language Model (2025) • No Venue
Cohere et al.
Beyond RAG: Task-aware KV Cache Compression For Comprehensive Knowledge Reasoning (2025) • No Venue
Corallo et al.
MV-RAG: Retrieval Augmented Multiview Diffusion (2025) • No Venue
Yosef Dayani, Omer Benishu, Sagie Benaim
PHYSICS: Benchmarking Foundation Models On University-level Physics Problem Solving (2025) • No Venue
Feng et al.
The Aloe Family Recipe For Open And Specialized Healthcare Llms (2025) • No Venue
Garcia-Gasulla et al.
Deeprag: Thinking To Retrieval Step By Step For Large Language Models (2025) • No Venue
Guan et al.
Rag-anything: All-in-one RAG Framework (2025) • No Venue
Guo et al.
Mdocagent: A Multi-modal Multi-agent Framework For Document Understanding (2025) • No Venue
Han et al.
Spectrum Projection Score: Aligning Retrieved Summaries With Reader Models In Retrieval-augmented Generation (2025) • No Venue
Hu et al.
Wikipedia In The Era Of Llms: Evolution And Risks (2025) • No Venue
Huang et al.
Videorag: Retrieval-augmented Generation Over Video Corpus (2025) • No Venue
Jeong et al.
Experience Is The Best Teacher: Grounding Vlms For Robotics Through Self-generated Memory (2025) • No Venue
Lan et al.
Rearag: Knowledge-guided Reasoning Enhances Factuality Of Large Reasoning Models With Iterative Retrieval Augmented Generation (2025) • No Venue
Lee et al.
Memos: A Memory OS For AI System (2025) • No Venue
Li et al.
Search-o1: Agentic Search-enhanced Large Reasoning Models (2025) • No Venue
Li et al.
Towards Agentic RAG With Deep Reasoning: A Survey Of Rag-reasoning Systems In Llms (2025) • No Venue
Li et al.
Saferag: Benchmarking Security In Retrieval-augmented Generation Of Large Language Model (2025) • No Venue
Liang et al.
Longemotion: Measuring Emotional Intelligence Of Large Language Models In Long-context Interaction (2025) • No Venue
Liu et al.
Wikivideo: Article Generation From Multiple Videos (2025) • No Venue
Martin et al.
A Survey Of Context Engineering For Large Language Models (2025) • No Venue
Mei et al.
Ruccod: Towards Automated ICD Coding In Russian (2025) • No Venue
Nesterov et al.
AR-RAG: Autoregressive Retrieval Augmentation For Image Generation (2025) • No Venue
Qi et al.
Thinking Beyond Tokens: From Brain-inspired Intelligence To Cognitive Foundations For Artificial General Intelligence And Its Societal Impact (2025) • No Venue
Qureshi et al.
Dota-rag: Dynamic Of Thought Aggregation RAG (2025) • No Venue
Ruangtanusak et al.
Imagerag: Dynamic Image Retrieval For Reference-guided Image Generation (2025) • No Venue
Shalev-Arkushin et al.
Reasonir: Training Retrievers For Reasoning Tasks (2025) • No Venue
Shao et al.
Qwenlong-cprs: Towards Infty-llms With Dynamic Context Optimization (2025) • No Venue
Shen et al.
Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025) • No Venue
Shen et al.
R1-searcher: Incentivizing The Search Capability In Llms Via Reinforcement Learning (2025) • No Venue
Song et al.
Grouprank: A Groupwise Reranking Paradigm Driven By Reinforcement Learning (2025) • No Venue
Sun et al.
HANRAG: Heuristic Accurate Noise-resistant Retrieval-augmented Generation For Multi-hop Question Answering (2025) • No Venue
Sun et al.
Hiersearch: A Hierarchical Enterprise Deep Search Framework Integrating Local And Web Searches (2025) • No Venue
Tan et al.
Code Graph Model (CGM): A Graph-integrated Large Language Model For Repository-level Software Engineering Tasks (2025) • No Venue
Tao et al.
From Rags To Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information For Factual Queries (2024) • No Venue
Wadhwa et al.
Openscholar: Synthesizing Scientific Literature With Retrieval-augmented Lms (2024) • No Venue
Asai et al.
Seven Failure Points When Engineering A Retrieval Augmented Generation System (2024) • Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI • 57 citations
Barnett et al.
Text2sql Is Not Enough: Unifying AI And Databases With TAG (2024) • No Venue
Biswal et al.
Agentpoison: Red-teaming LLM Agents Via Poisoning Memory Or Knowledge Bases (2024) • No Venue
Chen et al.
Region-aware Text-to-image Generation Via Hard Binding And Soft Refinement (2024) • No Venue
Chen et al.
Openresearcher: Unleashing AI For Accelerated Scientific Research (2024) • No Venue
Zheng et al.
CORAL: Benchmarking Multi-turn Conversational Retrieval-augmentation Generation (2024) • No Venue
Cheng et al.
Visrag: Vision-based Retrieval-augmented Generation On Multi-modality Documents (2024) • No Venue
Yu et al.
M3docrag: Multi-modal Retrieval Is What You Need For Multi-page Multi-document Understanding (2024) • No Venue
Cho et al.
The Power Of Noise: Redefining Retrieval For RAG Systems (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 105 citations
Cuconasu et al.
A Silver Bullet Or A Compromise For Full Attention? A Comprehensive Study Of Gist Token-based Context Compression (2024) • No Venue
Deng et al.
Towards Retrieval Augmented Generation Over Large Video Libraries (2024) • No Venue
Yannis Tevissen, Khalil Guetari, Frédéric Petitpont
Weaver: Foundation Models For Creative Writing (2024) • No Venue
Wang et al.
Searching For Best Practices In Retrieval-augmented Generation (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 44 citations
Wang et al.
Htmlrag: HTML Is Better Than Plain Text For Modeling Retrieved Knowledge In RAG Systems (2024) • No Venue
Tan et al.
Omnieval: An Omnidirectional And Automatic RAG Evaluation Benchmark In Financial Domain (2024) • No Venue
Wang et al.
Evaluating Retrieval Quality In Retrieval-augmented Generation (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 79 citations
Alireza Salemi, Hamed Zamani
Hybridrag: Integrating Knowledge Graphs And Vector Retrieval Augmented Generation For Efficient Information Extraction (2024) • Proceedings of the 5th ACM International Conference on AI in Finance • 55 citations
Sarmah et al.
Blended RAG: Improving RAG (retriever-augmented Generation) Accuracy With Semantic Search And Hybrid Query-based Retrievers (2024) • 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) • 47 citations
Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki
Fine Tuning Vs. Retrieval Augmented Generation For Less Popular Knowledge (2024) • Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 40 citations
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
Generation Of Asset Administration Shell With Large Language Model Agents: Toward Semantic Interoperability In Digital Twins In The Context Of Industry 4.0 (2024) • IEEE Access • 47 citations
Xia et al.
RULE: Reliable Multimodal RAG For Factuality In Medical Vision Language Models (2024) • No Venue
Xia et al.
Onegen: Efficient One-pass Unified Generation And Retrieval For Llms (2024) • No Venue
Zhang et al.
OCR Hinders RAG: Evaluating The Cascading Impact Of OCR On Retrieval-augmented Generation (2024) • No Venue
Zhang et al.
Retrieval-augmented Generation For Ai-generated Content: A Survey (2024) • Arxiv • 72 citations
Zhao et al.
Meta-chunking: Learning Efficient Text Segmentation Via Logical Perception (2024) • No Venue
Zhao et al.
Evaluating Very Long-term Conversational Memory Of LLM Agents (2024) • No Venue
Maharana et al.
Exploring The Capabilities And Limitations Of Large Language Models In The Electric Energy Sector (2024) • Joule • 69 citations
Majumder et al.
Grouse: A Benchmark To Evaluate Evaluators In Grounded Question Answering (2024) • No Venue
Muller et al.
Generative Representational Instruction Tuning (2024) • No Venue
Muennighoff et al.
Bielik 7B V0.1: A Polish Language Model -- Development, Insights, And Evaluation (2024) • No Venue
Ociepa et al.
Omnidocbench: Benchmarking Diverse PDF Document Parsing With Comprehensive Annotations (2024) • No Venue
Ouyang et al.
CBR-RAG: Case-based Reasoning For Retrieval Augmented Generation In Llms For Legal Question Answering (2024) • Lecture Notes in Computer Science • 52 citations
Wiratunga et al.
RAFT: Adapting Language Model To Domain Specific RAG (2024) • No Venue
Zhang et al.
Memorag: Moving Towards Next-gen RAG Via Memory-inspired Knowledge Discovery (2024) • No Venue
Qian et al.
Toward General Instruction-following Alignment For Retrieval-augmented Generation (2024) • No Venue
Dong et al.
From Local To Global: A Graph RAG Approach To Query-focused Summarization (2024) • Arxiv • 90 citations
Edge et al.
A Survey On RAG Meeting Llms: Towards Retrieval-augmented Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 281 citations
Fan et al.
RAG Foundry: A Framework For Enhancing Llms For Retrieval Augmented Generation (2024) • No Venue
Fleischer et al.
Similarity Is Not All You Need: Endowing Retrieval Augmented Generation With Multi Layered Thoughts (2024) • No Venue
Gan et al.
Seakr: Self-aware Knowledge Retrieval For Adaptive Retrieval Augmented Generation (2024) • No Venue
Yao et al.
CRAG -- Comprehensive RAG Benchmark (2024) • No Venue
Yang et al.
Adaptive-rag: Learning To Adapt Retrieval-augmented Large Language Models Through Question Complexity (2024) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 72 citations
Jeong et al.
Improving Medical Reasoning Through Retrieval And Self-reflection With Retrieval-augmented Large Language Models (2024) • Bioinformatics • 50 citations
Jeong et al.
Longrag: Enhancing Retrieval-augmented Generation With Long-context Llms (2024) • No Venue
Ziyan Jiang, Xueguang Ma, Wenhu Chen
Chatqa 2: Bridging The Gap To Proprietary Llms In Long Context And RAG Capabilities (2024) • No Venue
Xu et al.
Fact, Fetch, And Reason: A Unified Evaluation Of Retrieval-augmented Generation (2024) • No Venue
Krishna et al.
In Search Of Needles In A 10M Haystack: Recurrent Memory Finds What Llms Miss (2024) • No Venue
Kuratov et al.
Summary Of A Haystack: A Challenge To Long-context Llms And RAG Systems (2024) • No Venue
Laban et al.
Retrieval-augmented Egocentric Video Captioning (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 77 citations
Xu et al.
Retrollm: Empowering Large Language Models To Retrieve Fine-grained Evidence Within Generation (2024) • No Venue
Li et al.
Structrag: Boosting Knowledge Intensive Reasoning Of Llms Via Inference-time Hybrid Information Structurization (2024) • No Venue
Li et al.
Generative Motion Stylization Of Cross-structure Characters Within Canonical Motion Space (2024) • IEEE Journal on Selected Areas in Communications • 46 citations
Zhang et al.
Benchmarking Retrieval-augmented Generation For Medicine (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 119 citations
Xiong et al.
Mmed-rag: Versatile Multimodal RAG System For Medical Vision Language Models (2024) • No Venue
Xia et al.
Medtrinity-25m: A Large-scale Multimodal Dataset With Multigranular Annotations For Medicine (2024) • No Venue
Xie et al.
Fine-tuning Or Retrieval? Comparing Knowledge Injection In Llms (2023) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 50 citations
Ovadia et al.
Loss Functions And Metrics In Deep Learning (2023) • Artificial Intelligence Review 58, 195 (2025) • 40 citations
Terven et al.
Paperqa: Retrieval-augmented Generative Agent For Scientific Research (2023) • Arxiv • 48 citations
Lála et al.
Bioinspiredllm: Conversational Large Language Model For The Mechanics Of Biological And Bio-inspired Materials (2023) • Advanced Science • 67 citations
Rachel K. Luu, Markus J. Buehler
Biomedical Knowledge Graph-optimized Prompt Generation For Large Language Models (2023) • Bioinformatics • 51 citations
Soman et al.
Self-rag: Learning To Retrieve, Generate, And Critique Through Self-reflection (2023) • No Venue
Asai et al.
Mechgpt, A Language-based Strategy For Mechanics And Materials Modeling That Connects Knowledge Across Scales, Disciplines And Modalities (2023) • Applied Mechanics Reviews • 74 citations
Markus J. Buehler
Ragas: Automated Evaluation Of Retrieval Augmented Generation (2023) • Arxiv • 60 citations
Es et al.
REPLUG: Retrieval-augmented Black-box Language Models (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 76 citations
Shi et al.
Enhancing Model Performance In Multilingual Information Retrieval With Comprehensive Data Engineering Techniques (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 109 citations
Zhang et al.
A Study On The Implementation Of Generative AI Services Using An Enterprise Data-based LLM Application Architecture (2023) • Advances in Artificial Intelligence and Machine Learning • 53 citations
Cheonsu Jeong
Active Retrieval Augmented Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 216 citations
Jiang et al.
A Survey On Retrieval-augmented Text Generation (2022) • Arxiv • 70 citations
Li et al.
Re-imagen: Retrieval-augmented Text-to-image Generator (2022) • Arxiv • 41 citations
Chen et al.
Murag: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 65 citations
Chen et al.
Re2g: Retrieve, Rerank, Generate (2022) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 42 citations
Glass et al.
Retrieval-augmented Generation For Code Summarization Via Hybrid GNN (2020) • Arxiv • 66 citations
Liu et al.
GEAR: Graph-based Evidence Aggregating And Reasoning For Fact Verification (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 196 citations
Zhou et al.
A Hybrid Retrieval-generation Neural Conversation Model (2019) • Proceedings of the 28th ACM International Conference on Information and Knowledge Management • 65 citations
Yang et al.
Review-driven Answer Generation For Product-related Questions In E-commerce (2019) • Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining • 50 citations
Chen et al.

RECSYS 70 papers #

Onerec: Unifying Retrieve And Rank With Generative Recommender And Iterative Preference Alignment (2025) • No Venue
Deng et al.
A Review Of Modern Recommender Systems Using Generative Models (gen-recsys) (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 62 citations
Deldjoo et al.
SPAR: Personalized Content-based Recommendation Via Long Engagement Attention (2024) • No Venue
Zhang et al.
ONCE: Boosting Content-based Recommendation With Both Open- And Closed-source Large Language Models (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 67 citations
Liu et al.
Recommender Systems In The Era Of Large Language Models (llms) (2023) • IEEE Transactions on Knowledge and Data Engineering • 183 citations
Zhao et al.
Towards Open-world Recommendation With Knowledge Augmentation From Large Language Models (2023) • RecSys '24: 18th ACM Conference on Recommender Systems • 69 citations
Xi et al.
Collaborative Generative AI: Integrating Gpt-k For Efficient Editing In Text-to-image Generation (2023) • WWW '24: The ACM Web Conference 2024 • 58 citations
Zhu et al.
A Survey Of Graph Prompting Methods: Techniques, Applications, And Challenges (2023) • World Wide Web • 199 citations
Wu et al.
Missrec: Pre-training And Transferring Multi-modal Interest-aware Sequence Representation For Recommendation (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 45 citations
Wang et al.
Tallrec: An Effective And Efficient Tuning Framework To Align Large Language Model With Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 242 citations
Bao et al.
Adapting Large Language Models By Integrating Collaborative Semantics For Recommendation (2023) • 2024 IEEE 40th International Conference on Data Engineering (ICDE) • 58 citations
Zheng et al.
Where To Go Next For Recommender Systems? ID- Vs. Modality-based Recommender Models Revisited (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 137 citations
Yuan et al.
Recmind: Large Language Model Powered Agent For Recommendation (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 43 citations
Wang et al.
Uncovering Chatgpt's Capabilities In Recommender Systems (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 116 citations
Dai et al.
Multi-modal Self-supervised Learning For Recommendation (2023) • Proceedings of the ACM Web Conference 2023 • 157 citations
Wei et al.
VIP5: Towards Multimodal Foundation Models For Recommendation (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 41 citations
Geng et al.
Leveraging Large Language Models For Sequential Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 88 citations
Harte et al.
Large Language Models Are Competitive Near Cold-start Recommenders For Language- And Item-based Preferences (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 89 citations
Sanner et al.
Large Language Models Are Zero-shot Rankers For Recommender Systems (2023) • Lecture Notes in Computer Science • 155 citations
Hou et al.
How To Do Things With Deep Learning Code (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 41 citations
Minh Hua, Rita Raley
Is Chatgpt Fair For Recommendation? Evaluating Fairness In Large Language Model Recommendation (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 94 citations
Zhang et al.
Automlp: Automated MLP For Sequential Recommendations (2023) • Proceedings of the ACM Web Conference 2023 • 44 citations
Li et al.
Diffurec: A Diffusion Model For Sequential Recommendation (2023) • ACM Transactions on Information Systems • 80 citations
Zihao Li, Aixin Sun, Chenliang Li
Llmrec: Large Language Models With Graph Augmentation For Recommendation (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 134 citations
Wei et al.
Filter-enhanced MLP Is All You Need For Sequential Recommendation (2022) • Proceedings of the ACM Web Conference 2022 • 283 citations
Zhou et al.
A Systematic Review And Replicability Study Of Bert4rec For Sequential Recommendation (2022) • Proceedings of the 16th ACM Conference on Recommender Systems • 40 citations
Aleksandr Petrov, Craig MacDonald
CARCA: Context And Attribute-aware Next-item Recommendation Via Cross-attention (2022) • RecSys '22: Sixteenth ACM Conference on Recommender Systems • 64 citations
Ahmed Rashed, Shereen Elsayed, Lars Schmidt-Thieme
Denoising Self-attentive Sequential Recommendation (2022) • Proceedings of the 16th ACM Conference on Recommender Systems • 65 citations
Chen et al.
A Unified Multi-task Learning Framework For Multi-goal Conversational Recommender Systems (2022) • ACM Transactions on Information Systems • 56 citations
Deng et al.
A Multi-task Learning Framework For Product Ranking With BERT (2022) • Lecture Notes in Computer Science • 97 citations
Wu et al.
Self-supervised Hypergraph Transformer For Recommender Systems (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 107 citations
Lianghao Xia, Chao Huang, Chuxu Zhang
Recommendation As Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) (2022) • RecSys '22: Sixteenth ACM Conference on Recommender Systems • 334 citations
Geng et al.
Towards Universal Sequence Representation Learning For Recommender Systems (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 194 citations
Hou et al.
Task-adaptive Neural Process For User Cold-start Recommendation (2021) • Proceedings of the Web Conference 2021 • 81 citations
Lin et al.
Accelerating Recommendation System Training By Leveraging Popular Choices (2021) • Proceedings of the VLDB Endowment • 47 citations
Adnan et al.
Personalized Transformer For Explainable Recommendation (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 72 citations
Lei Li, Yongfeng Zhang, Li Chen
On Fast Adversarial Robustness Adaptation In Model-agnostic Meta-learning (2021) • The VLDB Journal • 118 citations
Wang et al.
Curriculum Pre-training Heterogeneous Subgraph Transformer For Top-$n$ Recommendation (2021) • ACM Transactions on Information Systems • 45 citations
Wang et al.
A Survey On Accuracy-oriented Neural Recommendation: From Collaborative Filtering To Information-rich Recommendation (2021) • IEEE Transactions on Knowledge and Data Engineering • 368 citations
Wu et al.
Crslab: An Open-source Toolkit For Building Conversational Recommender System (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 44 citations
Zhou et al.
Counterfactual Explanations For Neural Recommenders (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Khanh Hiep Tran, Azin Ghazimatin, Rishiraj Saha Roy
Knowledge-enhanced Hierarchical Graph Transformer Network For Multi-behavior Recommendation (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 179 citations
Xia et al.
Revcore: Review-augmented Conversational Recommendation (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 64 citations
Lu et al.
What Does BERT Know About Books, Movies And Music? Probing BERT For Conversational Recommendation (2020) • RecSys '20: Fourteenth ACM Conference on Recommender Systems • 54 citations
Gustavo Penha, Claudia Hauff
How Useful Are Reviews For Recommendation? A Critical Review And Potential Improvements (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 54 citations
Noveen Sachdeva, Julian McAuley
Evaluating Conversational Recommender Systems Via User Simulation (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 73 citations
Shuo Zhang, Krisztian Balog
Improving Multi-turn Response Selection Models With Complementary Last-utterance Selection By Instance Weighting (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 290 citations
Zhou et al.
Pre-training Graph Transformer With Multimodal Side Information For Recommendation (2020) • MM '21: ACM Multimedia Conference • 63 citations
Liu et al.
ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning For Session-based Recommendation (2020) • Fourteenth ACM Conference on Recommender Systems • 49 citations
Fei Mi, Xiaoyu Lin, Boi Faltings
MEANTIME: Mixture Of Attention Mechanisms With Multi-temporal Embeddings For Sequential Recommendation (2020) • Fourteenth ACM Conference on Recommender Systems • 56 citations
Sung Min Cho, Eunhyeok Park, Sungjoo Yoo
Hierarchical Temporal Convolutional Networks For Dynamic Recommender Systems (2019) • The World Wide Web Conference • 110 citations
You et al.
Recommendation As A Communication Game: Self-supervised Bot-play For Goal-oriented Dialogue (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 80 citations
Kang et al.
Transfer Meets Hybrid: A Synthetic Approach For Cross-domain Collaborative Filtering With Text (2019) • The World Wide Web Conference • 90 citations
Guangneng Hu, Yu Zhang, Qiang Yang
Topic-enhanced Memory Networks For Personalised Point-of-interest Recommendation (2019) • Arxiv • 54 citations
Xiao Zhou, Cecilia Mascolo, Zhongxiang Zhao
Multi-task Feature Learning For Knowledge Graph Enhanced Recommendation (2019) • The World Wide Web Conference • 525 citations
Wang et al.
Explainable Recommendation Via Multi-task Learning In Opinionated Text Data (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 199 citations
Wang et al.
Word2vec Applied To Recommendation: Hyperparameters Matter (2018) • Arxiv • 44 citations
Hugo Caselles-Dupré, Florian Lesaint, Jimena Royo-Letelier
DKN: Deep Knowledge-aware Network For News Recommendation (2018) • Arxiv • 188 citations
Wang et al.
Learning Distributed Representations From Reviews For Collaborative Filtering (2018) • Proceedings of the 9th ACM Conference on Recommender Systems • 77 citations
Almahairi et al.
Attentive Aspect Modeling For Review-aware Recommendation (2018) • ACM Transactions on Information Systems • 112 citations
Guan et al.
Self-attentive Sequential Recommendation (2018) • 2018 IEEE International Conference on Data Mining (ICDM) • 2379 citations
Wang-Cheng Kang, Julian McAuley
Multi-pointer Co-attention Networks For Recommendation (2018) • Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 220 citations
Yi Tay, Luu Anh Tuan, Siu Cheung Hui
Towards Deep Conversational Recommendations (2018) • Arxiv • 123 citations
Li et al.
News Session-based Recommendations Using Deep Neural Networks (2018) • Proceedings of the 3rd Workshop on Deep Learning for Recommender Systems • 95 citations
Gabriel de Souza P. Moreira, Felipe Ferreira, Adilson Marques da Cunha
Latent Relational Metric Learning Via Memory-based Attention For Collaborative Ranking (2017) • Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 • 214 citations
Yi Tay, Anh Tuan Luu, Siu Cheung Hui
Inter-session Modeling For Session-based Recommendation (2017) • Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems • 72 citations
Massimiliano Ruocco, Ole Steinar Lillestøl Skrede, Helge Langseth
Deep Learning Based Recommender System: A Survey And New Perspectives (2017) • Arxiv • 655 citations
Zhang et al.
Transnets: Learning To Transform For Recommendation (2017) • RecSys '17: Eleventh ACM Conference on Recommender Systems • 186 citations
Rose Catherine, William Cohen
Collaborative Recurrent Autoencoder: Recommend While Learning To Fill In The Blanks (2016) • Arxiv • 79 citations
Hao Wang, Xingjian Shi, Dit-Yan Yeung
Ask The GRU: Multi-task Learning For Deep Text Recommendations (2016) • RecSys '16: Tenth ACM Conference on Recommender Systems • 192 citations
Trapit Bansal, David Belanger, Andrew McCallum

Reinforcement Learning 749 papers

TTRL: Test-time Reinforcement Learning (2025) • No Venue
Zuo et al.
Reasonflux-prm: Trajectory-aware Prms For Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Zou et al.
Tattoo: Tool-grounded Thinking PRM For Test-time Scaling In Tabular Reasoning (2025) • No Venue
Zou et al.
Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories (2025) • No Venue
Zhu et al.
Flowrl: Matching Reward Distributions For LLM Reasoning (2025) • No Venue
Zhu et al.
DRIVE: Data Curation Best Practices For Reinforcement Learning With Verifiable Reward In Competitive Code Generation (2025) • No Venue
Zhu et al.
The Path Not Taken: RLVR Provably Learns Off The Principals (2025) • No Venue
Zhu et al.
Vargpt-v1.1: Improve Visual Autoregressive Large Unified Model Via Iterative Instruction Tuning And Reinforcement Learning (2025) • No Venue
Zhuang et al.
Towards Faithful And Controllable Personalization Via Critique-post-edit Reinforcement Learning (2025) • No Venue
Zhu et al.
How To Train Your LLM Web Agent: A Statistical Diagnosis (2025) • No Venue
Vattikonda et al.
SRPO: Enhancing Multimodal LLM Reasoning Via Reflection-aware Reinforcement Learning (2025) • No Venue
Wan et al.
Qwenlong-l1: Towards Long-context Large Reasoning Models With Reinforcement Learning (2025) • No Venue
Wan et al.
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation And Methodology (2025) • No Venue
Wang et al.
Diversity-enhanced Reasoning For Subjective Questions (2025) • No Venue
Wang et al.
Beyond The 80/20 Rule: High-entropy Minority Tokens Drive Effective Reinforcement Learning For LLM Reasoning (2025) • No Venue
Wang et al.
Co-evolving LLM Coder And Unit Tester Via Reinforcement Learning (2025) • No Venue
Wang et al.
Tina: Tiny Reasoning Models Via Lora (2025) • No Venue
Wang et al.
Information Gain-based Policy Optimization: A Simple And Effective Approach For Multi-turn LLM Agents (2025) • No Venue
Wang et al.
Generalizing Test-time Compute-optimal Scaling As An Optimizable Graph (2025) • No Venue
Wang et al.
Emergent Hierarchical Reasoning In Llms Through Reinforcement Learning (2025) • No Venue
Wang et al.
Game-tars: Pretrained Foundation Models For Scalable Generalist Multimodal Game Agents (2025) • No Venue
Wang et al.
Harnessing Uncertainty: Entropy-modulated Policy Gradients For Long-horizon LLM Agents (2025) • No Venue
Wang et al.
Geovista: Web-augmented Agentic Visual Reasoning For Geolocalization (2025) • No Venue
Wang et al.
Llava-critic-r1: Your Critic Model Is Secretly A Strong Policy Model (2025) • No Venue
Wang et al.
Jigsaw-r1: A Study Of Rule-based Visual Reinforcement Learning With Jigsaw Puzzles (2025) • No Venue
Wang et al.
Internvl3.5: Advancing Open-source Multimodal Models In Versatility, Reasoning, And Efficiency (2025) • No Venue
Wang et al.
Variational Reasoning For Language Models (2025) • No Venue
Zhou et al.
Reinforcing General Reasoning Without Verifiers (2025) • No Venue
Zhou et al.
R1-zero's "aha Moment" In Visual Reasoning On A 2B Non-sft Model (2025) • No Venue
Zhou et al.
Roborefer: Towards Spatial Referring With Reasoning In Vision-language Models For Robotics (2025) • No Venue
Zhou et al.
Reinforced Visual Perception With Tools (2025) • No Venue
Zhou et al.
Breaking The Exploration Bottleneck: Rubric-scaffolded Reinforcement Learning For General LLM Reasoning (2025) • No Venue
Zhou et al.
Agentfly: Fine-tuning LLM Agents Without Fine-tuning Llms (2025) • No Venue
Zhou et al.
Evolving Language Models Without Labels: Majority Drives Selection, Novelty Promotes Variation (2025) • No Venue
Zhou et al.
The Landscape Of Agentic Reinforcement Learning For Llms: A Survey (2025) • No Venue
Zhang et al.
MM-RLHF: The Next Step Forward In Multimodal LLM Alignment (2025) • No Venue
Zhang et al.
Openmmreasoner: Pushing The Frontiers For Multimodal Reasoning With An Open And General Recipe (2025) • No Venue
Zhang et al.
R1-reward: Training Multimodal Reward Model Through Stable Reinforcement Learning (2025) • No Venue
Zhang et al.
Process-based Self-rewarding Language Models (2025) • No Venue
Zhang et al.
RLFR: Extending Reinforcement Learning For Llms With Flow Environment (2025) • No Venue
Zhang et al.
Skywork-r1v4: Toward Agentic Multimodal Intelligence Through Interleaved Thinking With Images And Deepresearch (2025) • No Venue
Zhang et al.
A Survey Of Reinforcement Learning For Large Reasoning Models (2025) • No Venue
Zhang et al.
Thyme: Think Beyond Images (2025) • No Venue
Zhang et al.
Chemdfm-r: An Chemical Reasoner LLM Enhanced With Atomized Chemical Knowledge (2025) • No Venue
Zhao et al.
Absolute Zero: Reinforced Self-play Reasoning With Zero Data (2025) • No Venue
Zhao et al.
Repurposing Synthetic Data For Fine-grained Search Agent Supervision (2025) • No Venue
Zhao et al.
Learning To Reason Without External Rewards (2025) • No Venue
Zhao et al.
Geometric-mean Policy Optimization (2025) • No Venue
Zhao et al.
MM-HELIX: Boosting Multimodal Long-chain Reflective Reasoning With Holistic Platform And Adaptive Hybrid Policy Optimization (2025) • No Venue
Zhao et al.
R1-omni: Explainable Omni-multimodal Emotion Recognition With Reinforcing Learning (2025) • No Venue
Jiaxing Zhao, Xihan Wei, Liefeng Bo
One Token To Fool Llm-as-a-judge (2025) • No Venue
Zhao et al.
Stronger Together: On-policy Reinforcement Learning For Collaborative Llms (2025) • No Venue
Zhao et al.
First Return, Entropy-eliciting Explore (2025) • No Venue
Zheng et al.
Diffusionnft: Online Diffusion Reinforcement With Forward Process (2025) • No Venue
Zheng et al.
Group Sequence Policy Optimization (2025) • No Venue
Zheng et al.
Stabilizing Reinforcement Learning With Llms: Formulation And Practices (2025) • No Venue
Zheng et al.
Parallel-r1: Towards Parallel Thinking Via Reinforcement Learning (2025) • No Venue
Zheng et al.
DAPO: An Open-source LLM Reinforcement Learning System At Scale (2025) • No Venue
Yu et al.
Aworld: Orchestrating The Training Recipe For Agentic AI (2025) • No Venue
Yu et al.
Demystifying Reinforcement Learning In Agentic Reasoning (2025) • No Venue
Yu et al.
Minicpm-v 4.5: Cooking Efficient Mllms Via Architecture, Data, And Training Recipe (2025) • No Venue
Yu et al.
Sotopia-rl: Reward Design For Social Intelligence (2025) • No Venue
Yu et al.
RLPR: Extrapolating RLVR To General Domains Without Verifiers (2025) • No Venue
Yu et al.
Vl-cogito: Progressive Curriculum Reinforcement Learning For Advanced Multimodal Reasoning (2025) • No Venue
Yuan et al.
Does Reinforcement Learning Really Incentivize Reasoning Capacity In Llms Beyond The Base Model? (2025) • No Venue
Yue et al.
Rlinf-vla: A Unified And Efficient Framework For VLA+RL Training (2025) • No Venue
Zang et al.
Multi-swe-bench: A Multilingual Benchmark For Issue Resolving (2025) • No Venue
Zan et al.
Internlm-xcomposer2.5-reward: A Simple Yet Effective Multi-modal Reward Model (2025) • No Venue
Zang et al.
Reasoning Vectors: Transferring Chain-of-thought Capabilities Via Task Arithmetic (2025) • No Venue
Mohammad Zbeeb, Hasan Abed Al Kader Hammoud, Bernard Ghanem
A Vision-language-action-critic Model For Robotic Real-world Reinforcement Learning (2025) • No Venue
Zhai et al.
ACECODER: Acing Coder RL Via Automated Test-case Synthesis (2025) • No Venue
Zeng et al.
Simplerl-zoo: Investigating And Taming Zero Reinforcement Learning For Open Base Models In The Wild (2025) • No Venue
Zeng et al.
Satori-swe: Evolutionary Test-time Scaling For Sample-efficient Software Engineering (2025) • No Venue
Zeng et al.
Exgrpo: Learning To Reason From Experience (2025) • No Venue
Zhan et al.
Vision-r1: Evolving Human-free Alignment In Large Vision-language Models Via Vision-guided Reinforcement Learning (2025) • No Venue
Zhan et al.
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks (2025) • No Venue
Zhang et al.
100 Days After Deepseek-r1: A Survey On Replication Studies And More Directions For Reasoning Language Models (2025) • No Venue
Zhang et al.
Adaptthink: Reasoning Models Can Learn When To Think (2025) • No Venue
Zhang et al.
Agent Learning Via Early Experience (2025) • No Venue
Zhang et al.
The Alignment Waltz: Jointly Training Agents To Collaborate For Safety (2025) • No Venue
Zhang et al.
Basereward: A Strong Baseline For Multimodal Reward Model (2025) • No Venue
Zhang et al.
Autoenv: Automated Environments For Measuring Cross-environment Agent Learning (2025) • No Venue
Zhang et al.
Loongrl: Reinforcement Learning For Advanced Reasoning Over Long Contexts (2025) • No Venue
Wang et al.
Mr-align: Meta-reasoning Informed Factuality Alignment For Large Reasoning Models (2025) • No Venue
Wang et al.
ODYSSEY: Open-world Quadrupeds Exploration And Manipulation For Long-horizon Tasks (2025) • No Venue
Wang et al.
Octothinker: Mid-training Incentivizes Reinforcement Learning Scaling (2025) • No Venue
Wang et al.
OTC: Optimal Tool Calls Via Reinforcement Learning (2025) • No Venue
Wang et al.
RLVER: Reinforcement Learning With Verifiable Emotion Rewards For Empathetic Agents (2025) • No Venue
Wang et al.
Pref-grpo: Pairwise Preference Reward-based GRPO For Stable Text-to-image Reinforcement Learning (2025) • No Venue
Wang et al.
Perception-aware Policy Optimization For Multimodal Reasoning (2025) • No Venue
Wang et al.
Reinforcement Learning For Reasoning In Large Language Models With One Training Example (2025) • No Venue
Wang et al.
Revolutionizing Reinforcement Learning Framework For Diffusion Large Language Models (2025) • No Venue
Wang et al.
Reverse-engineered Reasoning For Open-ended Generation (2025) • No Venue
Wang et al.
Scireasoner: Laying The Scientific Reasoning Ground Across Disciplines (2025) • No Venue
Wang et al.
Sota With Less: Mcts-guided Sample Selection For Data-efficient Visual Reasoning Self-improvement (2025) • No Venue
Wang et al.
Stabilizing Knowledge, Promoting Reasoning: Dual-token Constraints For RLVR (2025) • No Venue
Wang et al.
Test-time Scaling With Reflective Generative Model (2025) • No Venue
Wang et al.
Vision-zero: Scalable VLM Self-improvement Via Strategic Gamified Self-play (2025) • No Venue
Wang et al.
UI-TARS-2 Technical Report: Advancing GUI Agent With Multi-turn Reinforcement Learning (2025) • No Venue
Wang et al.
Unified Multimodal Chain-of-thought Reward Model Through Reinforcement Fine-tuning (2025) • No Venue
Wang et al.
Vicrit: A Verifiable Reinforcement Learning Proxy Task For Visual Perception In Vlms (2025) • No Venue
Wang et al.
Video-thinker: Sparking "thinking With Videos" Via Reinforcement Learning (2025) • No Venue
Wang et al.
Vl-rethinker: Incentivizing Self-reflection Of Vision-language Models With Reinforcement Learning (2025) • No Venue
Wang et al.
Advancing Multimodal Reasoning Via Reinforcement Learning With Cold Start (2025) • No Venue
Wei et al.
SWE-RL: Advancing LLM Reasoning Via Reinforcement Learning On Open Software Evolution (2025) • No Venue
Wei et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior For Visual Reasoning (2025) • No Venue
Wei et al.
Unsupervised Post-training For Multi-modal LLM Reasoning Via GRPO (2025) • No Venue
Wei et al.
Truthrl: Incentivizing Truthful Llms Via Reinforcement Learning (2025) • No Venue
Wei et al.
Reinforcement Learning With Verifiable Rewards Implicitly Incentivizes Correct Reasoning In Base Llms (2025) • No Venue
Wen et al.
Light-r1: Curriculum SFT, DPO And RL For Long COT From Scratch And Beyond (2025) • No Venue
Wen et al.
J1: Incentivizing Thinking In Llm-as-a-judge Via Reinforcement Learning (2025) • No Venue
Whitehouse et al.
Deepsearch: Overcome The Bottleneck Of Reinforcement Learning With Verifiable Rewards Via Monte Carlo Tree Search (2025) • No Venue
Wu et al.
On The Generalization Of SFT: A Reinforcement Learning Perspective With Reward Rectification (2025) • No Venue
Wu et al.
LAPO: Internalizing Reasoning Efficiency Via Length-adaptive Policy Optimization (2025) • No Venue
Wu et al.
It Takes Two: Your GRPO Is Secretly DPO (2025) • No Venue
Wu et al.
EPO: Entropy-regularized Policy Optimization For LLM Agents Reinforcement Learning (2025) • No Venue
Wujiang et al.
Step-audio 2 Technical Report (2025) • No Venue
Wu et al.
Reinforcement Learning In Vision: A Survey (2025) • No Venue
Wu et al.
Mmsearch-r1: Incentivizing Lmms To Search (2025) • No Venue
Wu et al.
Longwriter-zero: Mastering Ultra-long Text Generation Via Reinforcement Learning (2025) • No Venue
Wu et al.
Multiplayer Nash Preference Optimization (2025) • No Venue
Wu et al.
Reasoning Or Memorization? Unreliable Results Of Reinforcement Learning Due To Data Contamination (2025) • No Venue
Wu et al.
Quantile Advantage Estimation For Entropy-safe Reasoning (2025) • No Venue
Wu et al.
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization (2025) • No Venue
Wu et al.
Rewarddance: Reward Scaling In Visual Generation (2025) • No Venue
Wu et al.
Visualquality-r1: Reasoning-induced Image Quality Assessment Via Reinforcement Learning To Rank (2025) • No Venue
Wu et al.
Synthrl: Scaling Visual Reasoning With Verifiable Data Synthesis (2025) • No Venue
Wu et al.
Visual Jigsaw Post-training Improves Mllms (2025) • No Venue
Wu et al.
Webdancer: Towards Autonomous Information Seeking Agency (2025) • No Venue
Wu et al.
Agentgym-rl: Training LLM Agents For Long-horizon Decision Making Through Multi-turn Reinforcement Learning (2025) • No Venue
Xi et al.
BAPO: Stabilizing Off-policy Reinforcement Learning For Llms Via Balanced Policy Optimization With Adaptive Clipping (2025) • No Venue
Xi et al.
Surrogate Signals From Format And Length: Reinforcement Learning For Solving Mathematical Problems Without Ground Truth Answers (2025) • No Venue
Xin et al.
Agent0: Unleashing Self-evolving Agents From Zero Data Via Tool-integrated Reasoning (2025) • No Venue
Xia et al.
Towards System 2 Reasoning In Llms: Learning How To Think With Meta Chain-of-thought (2025) • No Venue
Xiang et al.
Logic-rl: Unleashing LLM Reasoning With Rule-based Reinforcement Learning (2025) • No Venue
Xie et al.
Teaching Language Models To Critique Via Reinforcement Learning (2025) • No Venue
Xie et al.
Self-rewarding Correction For Mathematical Reasoning (2025) • No Venue
Xiong et al.
Caprl: Stimulating Dense Image Caption Capabilities Via Reinforcement Learning (2025) • No Venue
Xing et al.
Pretrainzero: Reinforcement Active Pretraining (2025) • No Venue
Xing et al.
Flag-trader: Fusion Llm-agent With Gradient-based Reinforcement Learning For Financial Trading (2025) • No Venue
Xiong et al.
Stepwiser: Stepwise Generative Judges For Wiser Reasoning (2025) • No Venue
Xiong et al.
Comfyui-copilot: An Intelligent Assistant For Automated Workflow Development (2025) • No Venue
Xu et al.
Dancegrpo: Unleashing GRPO On Visual Generation (2025) • No Venue
Xue et al.
Phi-4-mini-reasoning: Exploring The Limits Of Small Reasoning Language Models In Math (2025) • No Venue
Xu et al.
Mind The Gap: Bridging Thought Leap For Improved Chain-of-thought Tuning (2025) • No Venue
Xu et al.
Visulogic: A Benchmark For Evaluating Visual Reasoning In Multi-modal Large Language Models (2025) • No Venue
Xu et al.
Thinking-free Policy Initialization Makes Distilled Reasoning Models More Effective And Efficient Reasoners (2025) • No Venue
Xu et al.
Single-stream Policy Optimization (2025) • No Venue
Zhongwen Xu, Zihan Ding
Tiny Model, Big Logic: Diversity-driven Optimization Elicits Large-model Reasoning Ability In Vibethinker-1.5b (2025) • No Venue
Xu et al.
Visual Planning: Let's Think Only With Images (2025) • No Venue
Xu et al.
Towards Large Reasoning Models: A Survey Of Reinforced Reasoning With Large Language Models (2025) • No Venue
Xu et al.
Oceangym: A Benchmark Environment For Underwater Embodied Agents (2025) • No Venue
Xue et al.
Simpletir: End-to-end Reinforcement Learning For Multi-turn Tool-integrated Reasoning (2025) • No Venue
Xue et al.
Can Understanding And Generation Truly Benefit Together -- Or Just Coexist? (2025) • No Venue
Yan et al.
General Agentic Memory Via Deep Research (2025) • No Venue
Yan et al.
Re:form -- Reducing Human Priors In Scalable Formal Software Verification With RL In Llms: A Preliminary Study On Dafny (2025) • No Venue
Yan et al.
Learning To Reason Under Off-policy Guidance (2025) • No Venue
Yan et al.
From Code Foundation Models To Agents And Applications: A Practical Guide To Code Intelligence (2025) • No Venue
Yang et al.
ARIA: Training Language Agents With Intention-driven Reward Aggregation (2025) • No Venue
Yang et al.
Laser: Reinforcement Learning With Last-token Self-rewarding (2025) • No Venue
Yang et al.
DCPO: Dynamic Clipping Policy Optimization (2025) • No Venue
Yang et al.
Deepcritic: Deliberate Critique With Large Language Models (2025) • No Venue
Yang et al.
Kwai Keye-vl 1.5 Technical Report (2025) • No Venue
Yang et al.
Machine Mental Imagery: Empower Multimodal Reasoning With Latent Visual Tokens (2025) • No Venue
Yang et al.
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling (2025) • No Venue
Yang et al.
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search (2025) • No Venue
Yang et al.
Mindjourney: Test-time Scaling With World Models For Spatial Reasoning (2025) • No Venue
Yang et al.
Mmada: Multimodal Large Diffusion Language Models (2025) • No Venue
Yang et al.
Steering Vision-language-action Models As Anti-exploration: A Test-time Scaling Approach (2025) • No Venue
Yang et al.
Reasonflux: Hierarchical LLM Reasoning Via Scaling Thought Templates (2025) • No Venue
Yang et al.
Visual Spatial Tuning (2025) • No Venue
Yang et al.
Ui2code^n: A Visual Language Model For Test-time Scalable Interactive Ui-to-code Generation (2025) • No Venue
Yang et al.
Table-r1: Inference-time Scaling For Table Reasoning (2025) • No Venue
Yang et al.
Visionthink: Smart And Efficient Vision Language Model Via Reinforcement Learning (2025) • No Venue
Yang et al.
Zerogui: Automating Online GUI Learning At Zero Human Cost (2025) • No Venue
Yang et al.
Are Reasoning Models More Prone To Hallucination? (2025) • No Venue
Yao et al.
Optimizing Chain-of-thought Reasoners Via Gradient Variance Minimization In Rejection Sampling And RL (2025) • No Venue
Yao et al.
Demystifying Long Chain-of-thought Reasoning In Llms (2025) • No Venue
Yeo et al.
Judgelrm: Large Reasoning Models As A Judge (2025) • No Venue
Chen et al.
Magistral (2025) • No Venue
Mistral-Ai et al.
Competitive Programming With Large Reasoning Models (2025) • No Venue
Openai et al.
Deepseek-v3.2: Pushing The Frontier Of Open Large Language Models (2025) • No Venue
Deepseek-Ai et al.
Skywork R1V2: Multimodal Hybrid Reinforcement Learning For Reasoning (2025) • No Venue
Chris et al.
VAPO: Efficient And Reliable Reinforcement Learning For Advanced Reasoning Tasks (2025) • No Venue
Yuyue et al.
World Simulation With Video Foundation Models For Physical AI (2025) • No Venue
Nvidia et al.
Phi-4-reasoning Technical Report (2025) • No Venue
Abdin et al.
The Markovian Thinker (2025) • No Venue
Aghajohari et al.
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (2025) • No Venue
Agrawal et al.
Rethinking Reflection In Pre-training (2025) • No Venue
Ai et al.
Sharing Is Caring: Efficient LM Post-training With Collective RL Experience Sharing (2025) • No Venue
Amico et al.
Kandinsky 5.0: A Family Of Foundation Models For Image And Video Generation (2025) • No Venue
Arkhipkin et al.
The Best Of N Worlds: Aligning Reinforcement Learning With Best-of-n Sampling Via Max@k Optimisation (2025) • No Venue
Bagirov et al.
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents (2025) • No Venue
Badertdinov et al.
Intern-s1: A Scientific Multimodal Foundation Model (2025) • No Venue
Bai et al.
Univg-r1: Reasoning Guided Universal Visual Grounding With Reinforcement Learning (2025) • No Venue
Bai et al.
Llama-nemotron: Efficient Reasoning Models (2025) • No Venue
Bercovich et al.
Reflect, Retry, Reward: Self-improving Llms Via Reinforcement Learning (2025) • No Venue
Bensal et al.
Reasoning Language Models: A Blueprint (2025) • No Venue
Besta et al.
Hail To The Thief: Exploring Attacks And Defenses In Decentralised GRPO (2025) • No Venue
Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen
Enhancing Vision-language Model Training With Reinforcement Learning In Synthetic Worlds For Real-world Success (2025) • No Venue
Bredis et al.
Training-free Group Relative Policy Optimization (2025) • No Venue
Cai et al.
Iterresearch: Rethinking Long-horizon Agents Via Markovian State Reconstruction (2025) • No Venue
Chen et al.
Web-shepherd: Advancing Prms For Reinforcing Web Agents (2025) • No Venue
Chae et al.
Rynnvla-002: A Unified Vision-language-action And World Model (2025) • No Venue
Cen et al.
Webscale-rl: Automated Data Pipeline For Scaling RL Data To Pretraining Levels (2025) • No Venue
Cen et al.
Advancing Multimodal Reasoning: From Optimized Cold Start To Staged Reinforcement Learning (2025) • No Venue
Chen et al.
Acereason-nemotron: Advancing Math And Code Reasoning Through Reinforcement Learning (2025) • No Venue
Chen et al.
Blip3o-next: Next Frontier Of Native Image Generation (2025) • No Venue
Chen et al.
Flash-dmd: Towards High-fidelity Few-step Image Generation With Efficient Distillation And Joint Reinforcement Learning (2025) • No Venue
Chen et al.
ERA: Transforming Vlms Into Embodied Agents Via Embodied Prior Learning And Online Reinforcement Learning (2025) • No Venue
Chen et al.
Enigmata: Scaling Logical Reasoning In Large Language Models With Synthetic Verifiable Puzzles (2025) • No Venue
Chen et al.
Exploring The Effect Of Reinforcement Learning On Video Understanding: Insights From Seed-bench-r1 (2025) • No Venue
Chen et al.
GRPO-CARE: Consistency-aware Reinforcement Learning For Multimodal Reasoning (2025) • No Venue
Chen et al.
Spacetools: Tool-augmented Spatial Reasoning Via Double Interactive RL (2025) • No Venue
Chen et al.
Pass@k Training For Adaptively Balancing Exploration And Exploitation Of Large Reasoning Models (2025) • No Venue
Chen et al.
P1: Mastering Physics Olympiads With Reinforcement Learning (2025) • No Venue
Chen et al.
RM-R1: Reward Modeling As Reasoning (2025) • No Venue
Chen et al.
SFT Or RL? An Early Investigation Into Training R1-like Reasoning Large Vision-language Models (2025) • No Venue
Chen et al.
Ui-ins: Enhancing GUI Grounding With Multi-perspective Instruction-as-reasoning (2025) • No Venue
Chen et al.
Π_rl: Online RL Fine-tuning For Flow-based Vision-language-action Models (2025) • No Venue
Chen et al.
Video-as-answer: Predict And Generate Next Video Event With Joint-grpo (2025) • No Venue
Cheng et al.
Reasoning With Exploration: An Entropy Perspective (2025) • No Venue
Cheng et al.
Revisiting Reinforcement Learning For LLM Reasoning From A Cross-domain Perspective (2025) • No Venue
Cheng et al.
The Era Of Agentic Organization: Learning To Organize With Language Models (2025) • No Venue
Chi et al.
SFT Memorizes, RL Generalizes: A Comparative Study Of Foundation Model Post-training (2025) • No Venue
Chu et al.
Selfcite: Self-supervised Alignment For Context Attribution In Large Language Models (2025) • No Venue
Chuang et al.
The Danger Of Overthinking: Examining The Reasoning-action Dilemma In Agentic Tasks (2025) • No Venue
Cuadron et al.
Emu3.5: Native Multimodal Models Are World Learners (2025) • No Venue
Cui et al.
The Entropy Mechanism Of Reinforcement Learning For Reasoning Language Models (2025) • No Venue
Cui et al.
Process Reinforcement Through Implicit Rewards (2025) • No Venue
Cui et al.
Reinforcement Learning For Reasoning In Small Llms: What Works And What Doesn't (2025) • No Venue
Quy-Anh Dang, Chris Ngo
CDE: Curiosity-driven Exploration For Efficient Reinforcement Learning In Large Language Models (2025) • No Venue
Dai et al.
Duoguard: A Two-player Rl-driven Framework For Multilingual LLM Guardrails (2025) • No Venue
Deng et al.
Openvlthinker: An Early Exploration To Complex Vision-language Reasoning Via Iterative Self-improvement (2025) • No Venue
Deng et al.
Supervised Reinforcement Learning: From Expert Trajectories To Step-wise Reasoning (2025) • No Venue
Deng et al.
Arm-thinker: Reinforcing Multimodal Generative Reward Models With Agentic Tool Use And Visual Reasoning (2025) • No Venue
Ding et al.
Agentic Entropy-balanced Policy Optimization (2025) • No Venue
Dong et al.
Tool-star: Empowering Llm-brained Multi-tool Reasoner Via Reinforcement Learning (2025) • No Venue
Dong et al.
Reinforcement Pre-training (2025) • No Venue
Dong et al.
Got-r1: Unleashing Reasoning Capability Of MLLM For Visual Generation With Reinforcement Learning (2025) • No Venue
Duan et al.
Pre-trained Policy Discriminators Are General Reward Models (2025) • No Venue
Dou et al.
Test-time Reinforcement Learning For GUI Grounding Via Region Consistency (2025) • No Venue
Du et al.
Which Heads Matter For Reasoning? Rl-guided KV Cache Compression (2025) • No Venue
Du et al.
SSRL: Self-search Reinforcement Learning (2025) • No Venue
Fan et al.
Missing Premise Exacerbates Overthinking: Are Reasoning Models Losing Critical Thinking Skill? (2025) • No Venue
Fan et al.
Thinkless: LLM Learns When To Think (2025) • No Venue
Gongfan Fang, Xinyin Ma, Xinchao Wang
Robix: A Unified Model For Robot Interaction, Reasoning And Planning (2025) • No Venue
Fang et al.
SRPO: Self-referential Policy Optimization For Vision-language-action Models (2025) • No Venue
Fei et al.
Grounding Computer Use Agents On Human Demonstrations (2025) • No Venue
Feizi et al.
Retool: Reinforcement Learning For Strategic Tool Use In Llms (2025) • No Venue
Feng et al.
Onethinker: All-in-one Reasoning Model For Image And Video (2025) • No Venue
Feng et al.
Video-r1: Reinforcing Video Reasoning In Mllms (2025) • No Venue
Feng et al.
Areal: A Large-scale Asynchronous Reinforcement Learning System For Language Reasoning (2025) • No Venue
Fu et al.
Towards General-purpose Model-free Reinforcement Learning (2025) • No Venue
Fujimoto et al.
Listener-rewarded Thinking In Vlms For Image Preferences (2025) • No Venue
Gambashidze et al.
Cognitive Behaviors That Enable Self-improving Reasoners, Or, Four Habits Of Highly Effective Stars (2025) • No Venue
Gandhi et al.
Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) • No Venue
Gao et al.
Soft Adaptive Policy Optimization (2025) • No Venue
Gao et al.
Seedance 1.0: Exploring The Boundaries Of Video Generation Models (2025) • No Venue
Gao et al.
The Differences Between Direct Alignment Algorithms Are A Blur (2025) • No Venue
Gorbatovski et al.
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts (2025) • No Venue
Ge et al.
X-omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (2025) • No Venue
Geng et al.
Webwatcher: Breaking New Frontier Of Vision-language Deep Research Agent (2025) • No Venue
Geng et al.
Guided By Gut: Efficient Test-time Scaling With Reinforced Intrinsic Confidence (2025) • No Venue
Ghasemabadi et al.
Training Long-context, Multi-turn Software Engineering Agents With Reinforcement Learning (2025) • No Venue
Golubev et al.
Diffucoder: Understanding And Improving Masked Diffusion Models For Code Generation (2025) • No Venue
Gong et al.
Seedream 2.0: A Native Chinese-english Bilingual Image Generation Foundation Model (2025) • No Venue
Gong et al.
Ui-venus Technical Report: Building High-performance UI Agents With RFT (2025) • No Venue
Gu et al.
Rstar-math: Small Llms Can Master Math Reasoning With Self-evolved Deep Thinking (2025) • No Venue
Guan et al.
Skywork Open Reasoner 1 Technical Report (2025) • No Venue
He et al.
Reward Reasoning Model (2025) • No Venue
Guo et al.
Mineworld: A Real-time And Open-source Interactive World Model On Minecraft (2025) • No Venue
Guo et al.
Can We Generate Images With Cot? Let's Verify And Reinforce Image Generation Step By Step (2025) • No Venue
Guo et al.
Tree-based Dialogue Reinforced Policy Optimization For Red-teaming Attacks (2025) • No Venue
Guo et al.
RLP: Reinforcement As A Pretraining Objective (2025) • No Venue
Hatamizadeh et al.
Random Policy Valuation Is Enough For LLM Reasoning With Verifiable Rewards (2025) • No Venue
He et al.
Pasa: An LLM Agent For Comprehensive Academic Paper Search (2025) • No Venue
He et al.
Hardtests: Synthesizing High-quality Test Cases For LLM Coding (2025) • No Venue
He et al.
Visplay: Self-evolving Vision-language Models From Images (2025) • No Venue
He et al.
Videoscore2: Think Before You Score In Generative Video Evaluation (2025) • No Venue
He et al.
Videossr: Video Self-supervised Reinforcement Learning (2025) • No Venue
He et al.
A Sober Look At Progress In Language Model Reasoning: Pitfalls And Paths To Reproducibility (2025) • No Venue
Hochlehnert et al.
Deepeyesv2: Toward Agentic Multimodal Model (2025) • No Venue
Hong et al.
Glm-4.1v-thinking: Towards Versatile Multimodal Reasoning With Scalable Reinforcement Learning (2025) • No Venue
Hong et al.
The Art Of Scaling Reinforcement Learning Compute For Llms (2025) • No Venue
Khatri et al.
REINFORCE++: A Simple And Efficient Approach For Aligning Large Language Models (2025) • No Venue
Jian Hu
Quest: Incentivizing Llms To Generate Difficult Problems (2025) • No Venue
Hu et al.
Beyond 'aha!': Toward Systematic Meta-abilities Alignment In Large Reasoning Models (2025) • No Venue
Hu et al.
Lmgame-bench: How Good Are Llms At Playing Games? (2025) • No Venue
Hu et al.
See, Point, Fly: A Learning-free VLM Framework For Universal Unmanned Aerial Navigation (2025) • No Venue
Hu et al.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability Of LLM Reasoning (2025) • No Venue
Huan et al.
Qerl: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning For Llms (2025) • No Venue
Huang et al.
Loong: Synthesize Long Chain-of-thoughts At Scale Through Verifiers (2025) • No Venue
Huang et al.
Beyond The Exploration-exploitation Trade-off: A Hidden State Approach For LLM Reasoning In RLVR (2025) • No Venue
Huang et al.
Low-probability Tokens Sustain Exploration In Reinforcement Learning With Verifiable Reward (2025) • No Venue
Huang et al.
Vision-r1: Incentivizing Reasoning Capability In Multimodal Large Language Models (2025) • No Venue
Huang et al.
Thinkact: Vision-language-action Reasoning Via Reinforced Visual Latent Planning (2025) • No Venue
Huang et al.
Spotlight On Token Perception For Multimodal Reinforcement Learning (2025) • No Venue
Huang et al.
R-zero: Self-evolving Reasoning LLM From Zero Data (2025) • No Venue
Huang et al.
Tree Search For LLM Agent Reinforcement Learning (2025) • No Venue
Ji et al.
Alphadrive: Unleashing The Power Of Vlms In Autonomous Driving Via Reinforcement Learning And Reasoning (2025) • No Venue
Jiang et al.
VCRL: Variance-based Curriculum Reinforcement Learning For Large Language Models (2025) • No Venue
Jiang et al.
R-4B: Incentivizing General-purpose Auto-thinking Capability In Mllms Via Bi-mode Annealing And Reinforce Learning (2025) • No Venue
Jiang et al.
T2I-R1: Reinforcing Image Generation With Collaborative Semantic-level And Token-level Cot (2025) • No Venue
Jiang et al.
Think Only When You Need With Large Hybrid-reasoning Models (2025) • No Venue
Jiang et al.
Verltool: Towards Holistic Agentic Reinforcement Learning With Tool Use (2025) • No Venue
Jiang et al.
VIDEOP2R: Video Understanding From Perception To Reasoning (2025) • No Venue
Jiang et al.
Search-r1: Training Llms To Reason And Leverage Search Engines With Reinforcement Learning (2025) • No Venue
Jin et al.
Omni-reward: Towards Generalist Omni-modal Reward Modeling With Free-form Preferences (2025) • No Venue
Jin et al.
VIKI-R: Coordinating Embodied Multi-agent Cooperation Via Reinforcement Learning (2025) • No Venue
Kang et al.
Reasoning With Sampling: Your Base Model Is Smarter Than You Think (2025) • No Venue
Aayush Karan, Yilun Du
Piper: On-device Environment Setup Via Online Reinforcement Learning (2025) • No Venue
Kovrigin et al.
Robot-r1: Reinforcement Learning For Enhanced Embodied Reasoning In Robotics (2025) • No Venue
Kim et al.
Meta-awareness Enhances Reasoning Models: Self-alignment Reinforcement Learning (2025) • No Venue
Yoonjeon Kim, Doohyuk Jang, Eunho Yang
Toward Evaluative Thinking: Meta Policy Optimization With Evolving Reward Models (2025) • No Venue
Kim et al.
SDPO: Segment-level Direct Preference Optimization For Social Agents (2025) • No Venue
Kong et al.
Cadrille: Multi-modal CAD Reconstruction With Online Reinforcement Learning (2025) • No Venue
Kolodiazhnyi et al.
Language Self-play For Data-free Training (2025) • No Venue
Kuba et al.
Opensir: Open-ended Self-improving Reasoner (2025) • No Venue
Kwan et al.
Mini-o3: Scaling Up Reasoning Patterns And Interaction Turns For Visual Search (2025) • No Venue
Lai et al.
Experience Is The Best Teacher: Grounding Vlms For Robotics Through Self-generated Memory (2025) • No Venue
Lan et al.
No Prompt Left Behind: Exploiting Zero-variance Prompts In LLM Reinforcement Learning Via Entropy-guided Advantage Shaping (2025) • No Venue
Le et al.
Rearag: Knowledge-guided Reasoning Enhances Factuality Of Large Reasoning Models With Iterative Retrieval Augmented Generation (2025) • No Venue
Lee et al.
MMR1: Enhancing Multimodal Reasoning With Variance-aware Sampling And Open Resources (2025) • No Venue
Leng et al.
Unified Reinforcement And Imitation Learning For Vision-language Models (2025) • No Venue
Lee et al.
Visual-cog: Stage-aware Reinforcement Learning With Chain Of Guidance For Text-to-image Generation (2025) • No Venue
Li et al.
Can One Domain Help Others? A Data-centric Study On Multi-domain Reasoning Via Reinforcement Learning (2025) • No Venue
Li et al.
Attention Illuminates LLM Reasoning: The Preplan-and-anchor Rhythm Enables Fine-grained Policy Optimization (2025) • No Venue
Li et al.
Autotriton: Automatic Triton Programming With Reinforcement Learning In Llms (2025) • No Venue
Li et al.
Miromind-m1: An Open-source Advancement In Mathematical Reasoning Via Context-aware Multi-stage Policy Optimization (2025) • No Venue
Li et al.
Deepagent: A General Reasoning Agent With Scalable Toolsets (2025) • No Venue
Li et al.
Confidence Is All You Need: Few-shot RL Fine-tuning Of Language Models (2025) • No Venue
Li et al.
Chain-of-agents: End-to-end Agent Foundation Models Via Multi-agent Distillation And Agentic RL (2025) • No Venue
Li et al.
CUDA-L1: Improving CUDA Optimization Via Contrastive Reinforcement Learning (2025) • No Venue
Li et al.
In-the-flow Agentic System Optimization For Effective Planning And Tool Use (2025) • No Venue
Li et al.
Jointly Reinforcing Diversity And Quality In Language Model Generations (2025) • No Venue
Li et al.
Implicit Actor Critic Coupling Via A Supervised Learning Framework For RLVR (2025) • No Venue
Li et al.
JARVIS-VLA: Post-training Large-scale Vision Language Models To Play Visual Games With Keyboards And Mouse (2025) • No Venue
Li et al.
Knapsack RL: Unlocking Exploration Of Llms Via Optimizing Budget Allocation (2025) • No Venue
Li et al.
Veripo: Cultivating Long Reasoning In Video-llms Via Verifier-guided Iterative Policy Optimization (2025) • No Venue
Li et al.
Staying In The Sweet Spot: Responsive Reasoning Evolution Via Capability-adaptive Hint Scaffolding (2025) • No Venue
Li et al.
Optimus-3: Towards Generalist Multimodal Minecraft Agents With Scalable Task Experts (2025) • No Venue
Li et al.
Mol-r1: Towards Explicit Long-cot Reasoning In Molecule Discovery (2025) • No Venue
Li et al.
Perception, Reason, Think, And Plan: A Survey On Large Multimodal Reasoning Models (2025) • No Venue
Li et al.
Reinforcement Learning Foundations For Deep Research Systems: A Survey (2025) • No Venue
Li et al.
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning In Long-form Video Understanding (2025) • No Venue
Li et al.
Self-rewarding Vision-language Model Via Reasoning Decomposition (2025) • No Venue
Li et al.
Search-o1: Agentic Search-enhanced Large Reasoning Models (2025) • No Venue
Li et al.
Simplevla-rl: Scaling VLA Training Via Reinforcement Learning (2025) • No Venue
Li et al.
Uniworld-v2: Reinforce Image Editing With Diffusion Negative-aware Finetuning And MLLM Implicit Feedback (2025) • No Venue
Li et al.
Tempsamp-r1: Effective Temporal Sampling With Reinforcement Fine-tuning For Video Llms (2025) • No Venue
Li et al.
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data (2025) • No Venue
Li et al.
Treepo: Bridging The Gap Of Policy Optimization And Efficacy And Inference Efficiency With Heuristic Tree-based Modeling (2025) • No Venue
Li et al.
Truth In The Few: High-value Data Selection For Efficient Multi-modal Reasoning (2025) • No Venue
Li et al.
Webthinker: Empowering Large Reasoning Models With Deep Research Capability (2025) • No Venue
Li et al.
VLA-RFT: Vision-language-action Reinforcement Fine-tuning With Verified Rewards In World Simulators (2025) • No Venue
Li et al.
Websailor-v2: Bridging The Chasm To Proprietary Agents Via Synthetic Data And Scalable Reinforcement Learning (2025) • No Venue
Li et al.
Zebra-cot: A Dataset For Interleaved Vision Language Reasoning (2025) • No Venue
Li et al.
Beyond Pass@1: Self-play With Variational Problem Synthesis Sustains RLVR (2025) • No Venue
Liang et al.
Modomodo: Multi-domain Data Mixtures For Multimodal LLM Reinforcement Learning (2025) • No Venue
Liang et al.
Think In Games: Learning To Reason In Games Via Reinforcement Learning With Large Language Models (2025) • No Venue
Liao et al.
Embrace-3k: Embodied Reasoning And Action In Complex Environments (2025) • No Venue
Lin et al.
Understanding Tool-integrated Reasoning (2025) • No Venue
Heng Lin, Zhongwen Xu
Critique-coder: Enhancing Coder Models By Critique Reinforcement Learning (2025) • No Venue
Ruan et al.
Beyond Distillation: Pushing The Limits Of Medical LLM Reasoning With Minimalist Rule-based RL (2025) • No Venue
Liu et al.
Agent0-vl: Exploring Self-evolving Agent For Tool-integrated Vision-language Reasoning (2025) • No Venue
Liu et al.
Acereason-nemotron 1.1: Advancing Math And Code Reasoning Through SFT And RL Synergy (2025) • No Venue
Liu et al.
Compassverifier: A Unified And Robust Verifier For Llms Evaluation And Outcome Reward (2025) • No Venue
Liu et al.
CPGD: Toward Stable Rule-based Reinforcement Learning For Language Models (2025) • No Venue
Liu et al.
Fin-r1: A Large Language Model For Financial Reasoning Through Reinforcement Learning (2025) • No Venue
Liu et al.
Flow-grpo: Training Flow Matching Models Via Online RL (2025) • No Venue
Liu et al.
Logical Reasoning In Large Language Models: A Survey (2025) • No Venue
Liu et al.
Inference-time Scaling For Generalist Reward Modeling (2025) • No Venue
Liu et al.
GEM: A Gym For Agentic Llms (2025) • No Venue
Liu et al.
Guardreasoner: Towards Reasoning-based LLM Safeguards (2025) • No Venue
Liu et al.
Llm-powered GUI Agents In Phone Automation: Surveying Progress And Prospects (2025) • No Venue
Liu et al.
Learn To Reason Efficiently With Adaptive Length-based Reward Shaping (2025) • No Venue
Liu et al.
Visual-rft: Visual Reinforcement Fine-tuning (2025) • No Venue
Liu et al.
Part I: Tricks Or Traps? A Deep Dive Into RL For LLM Reasoning (2025) • No Venue
Liu et al.
Pairwise RM: Perform Best-of-n Sampling With Knockout Tournament (2025) • No Venue
Liu et al.
Othink-mr1: Stimulating Multimodal Generalized Reasoning Capabilities Via Dynamic Reinforcement Learning (2025) • No Venue
Liu et al.
Prorl: Prolonged Reinforcement Learning Expands Reasoning Boundaries In Large Language Models (2025) • No Venue
Liu et al.
Reasonrank: Empowering Passage Ranking With Strong Reasoning Ability (2025) • No Venue
Liu et al.
Rectifying LLM Thought From Lens Of Optimization (2025) • No Venue
Liu et al.
SPIRAL: Self-play On Zero-sum Games Incentivizes Reasoning Via Multi-agent Multi-turn Reinforcement Learning (2025) • No Venue
Liu et al.
Skywork-reward-v2: Scaling Preference Data Curation Via Human-ai Synergy (2025) • No Venue
Liu et al.
Spatial-ssrl: Enhancing Spatial Understanding Via Self-supervised Reinforcement Learning (2025) • No Venue
Liu et al.
Synlogic: Synthesizing Verifiable Reasoning Data At Scale For Learning Logical Reasoning And Beyond (2025) • No Venue
Liu et al.
Understanding R1-zero-like Training: A Critical Perspective (2025) • No Venue
Liu et al.
Adacot: Pareto-optimal Adaptive Chain-of-thought Triggering Via Reinforcement Learning (2025) • No Venue
Lou et al.
Webexplorer: Explore And Evolve For Training Long-horizon Web Agents (2025) • No Venue
Liu et al.
Seeing, Listening, Remembering, And Reasoning: A Multimodal Agent With Long-term Memory (2025) • No Venue
Long et al.
Webgen-agent: Enhancing Interactive Website Generation With Multi-level Feedback And Step-level Reinforcement Learning (2025) • No Venue
Lu et al.
Don't Just Fine-tune The Agent, Tune The Environment (2025) • No Venue
Lu et al.
Av-reasoner: Improving And Benchmarking Clue-grounded Audio-visual Counting For Mllms (2025) • No Venue
Lu et al.
UI-R1: Enhancing Action Prediction Of GUI Agents By Reinforcement Learning (2025) • No Venue
Lu et al.
R-horizon: How Far Can Your Large Reasoning Model Really Go In Breadth And Depth? (2025) • No Venue
Lu et al.
Agent Lightning: Train ANY AI Agents With Reinforcement Learning (2025) • No Venue
Luo et al.
Language Models Can Learn From Verbal Feedback Without Scalar Rewards (2025) • No Venue
Luo et al.
The Climb Carves Wisdom Deeper Than The Summit: On The Noisy Rewards In Learning To Reason (2025) • No Venue
Lv et al.
Exploring The Limit Of Outcome Reward For Learning Mathematical Reasoning (2025) • No Venue
Lyu et al.
Towards A Unified View Of Large Language Model Post-training (2025) • No Venue
Lv et al.
One RL To See Them All: Visual Triple Unified Reinforcement Learning (2025) • No Venue
Ma et al.
Deepperception: Advancing R1-like Cognitive Visual Perception In Mllms For Knowledge-intensive Visual Grounding (2025) • No Venue
Ma et al.
General-reasoner: Advancing LLM Reasoning Across All Domains (2025) • No Venue
Ma et al.
SQL-R1: Training Natural Language To SQL Reasoning Model By Reinforcement Learning (2025) • No Venue
Ma et al.
Rethinking RL Scaling For Vision Language Models: A Transparent, From-scratch Framework And Comprehensive Evaluation Scheme (2025) • No Venue
Ma et al.
S^2R: Teaching Llms To Self-verify And Self-correct Via Reinforcement Learning (2025) • No Venue
Ma et al.
Tool-integrated Reinforcement Learning For Repo Deep Search (2025) • No Venue
Ma et al.
Unirl: Self-improving Unified Multimodal Models Via Supervised And Reinforcement Learning (2025) • No Venue
Weijia Mao, Zhenheng Yang, Mike Zheng Shou
Mm-eureka: Exploring Visual Aha Moment With Rule-based Large-scale Reinforcement Learning (2025) • No Venue
Meng et al.
ORIGEN: Zero-shot 3D Orientation Grounding In Text-to-image Generation (2025) • No Venue
Min et al.
Multi-agent Tool-integrated Policy Optimization (2025) • No Venue
Mo et al.
Mlgym: A New Framework And Benchmark For Advancing AI Research Agents (2025) • No Venue
Nathani et al.
Learning Adaptive Parallel Reasoning With Language Models (2025) • No Venue
Pan et al.
DINO-R1: Incentivizing Reasoning Capability In Vision Foundation Models (2025) • No Venue
Pan et al.
Tokenhsi: Unified Synthesis Of Physical Human-scene Interactions Through Task Tokenization (2025) • No Venue
Pan et al.
Medvlm-r1: Incentivizing Medical Reasoning Capability Of Vision-language Models (vlms) Via Reinforcement Learning (2025) • No Venue
Pan et al.
Thinking Sparks!: Emergent Attention Heads In Reasoning Models During Post Training (2025) • No Venue
Yein Park, Minbyul Jeong, Jaewoo Kang
ACG: Action Coherence Guidance For Flow-based VLA Models (2025) • No Venue
Park et al.
Reasoning-aware GRPO Using Process Mining (2025) • No Venue
Taekhyun Park, Yongjae Lee, Hyerim Bae
LMM-R1: Empowering 3B Lmms With Strong Reasoning Abilities Through Two-stage Rule-based RL (2025) • No Venue
Peng et al.
Criticlean: Critic-guided Reinforcement Learning For Mathematical Formalization (2025) • No Venue
Peng et al.
Agentic Reward Modeling: Integrating Human Preferences With Verifiable Correctness Signals For Reliable Reward Systems (2025) • No Venue
Peng et al.
Large Reasoning Models Learn Better Alignment From Flawed Thinking (2025) • No Venue
Peng et al.
Toolrl: Reward Is All Tool Learning Needs (2025) • No Venue
Qian et al.
Defeating The Training-inference Mismatch Via FP16 (2025) • No Venue
Qi et al.
Optimizing Anytime Reasoning Via Budget Relative Policy Optimization (2025) • No Venue
Qi et al.
Fino1: On The Transferability Of Reasoning Enhanced Llms To Finance (2025) • No Venue
Qian et al.
V-thinker: Interactive Thinking With Images (2025) • No Venue
Qiao et al.
We-math 2.0: A Versatile Mathbook System For Incentivizing Visual Mathematical Reasoning (2025) • No Venue
Qiao et al.
Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025) • No Venue
Qin et al.
Optimizing Test-time Compute Via Meta Reinforcement Fine-tuning (2025) • No Venue
Qu et al.
Beyond The Trade-off: Self-supervised Reinforcement Learning For Reasoning Models' Instruction Following (2025) • No Venue
Ren et al.
RL + Transformer = A General-purpose Problem Solver (2025) • No Venue
Micah Rentschler, Jesse Roberts
REFINE-AF: A Task-agnostic Framework To Align Language Models Via Self-generated Instructions Using Reinforcement Learning From Automated Feedback (2025) • No Venue
Roy et al.
SRMT: Shared Memory For Multi-agent Lifelong Pathfinding (2025) • No Venue
Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
Training Language Models For Social Deduction With Multi-agent Reinforcement Learning (2025) • No Venue
Sarkar et al.
Self-generated In-context Examples Improve LLM Agents For Sequential Decision-making Tasks (2025) • No Venue
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Llms Are Greedy Agents: Effects Of RL Fine-tuning On Decision-making Abilities (2025) • No Venue
Schmied et al.
Rstar2-agent: Agentic Reasoning Technical Report (2025) • No Venue
Shang et al.
DR Tulu: Reinforcement Learning With Evolving Rubrics For Deep Research (2025) • No Venue
Shao et al.
Deepseekmath-v2: Towards Self-verifiable Mathematical Reasoning (2025) • No Venue
Shao et al.
Dupo: Enabling Reliable LLM Self-verification Via Dual Preference Optimization (2025) • No Venue
She et al.
Satori: Reinforcement Learning With Chain-of-action-thought Enhances LLM Reasoning Via Autoregressive Search (2025) • No Venue
Shen et al.
Exploring Data Scaling Trends And Effects In Reinforcement Learning From Human Feedback (2025) • No Venue
Shen et al.
Semi-off-policy Reinforcement Learning For Vision-language Slow-thinking Reasoning (2025) • No Venue
Shen et al.
Skywork-r1v3 Technical Report (2025) • No Venue
Shen et al.
VLM-R1: A Stable And Generalizable R1-style Large Vision-language Model (2025) • No Venue
Shen et al.
Heimdall: Test-time Scaling On The Generative Verification (2025) • No Venue
Wenlei Shi, Xing Jin
Deep Research: A Systematic Survey (2025) • No Venue
Shi et al.
LADDER: Self-improving Llms Through Recursive Problem Decomposition (2025) • No Venue
Toby Simonds, Akira Yoshiyama
Fathom-deepresearch: Unlocking Long Horizon Information Retrieval And Synthesis For Slms (2025) • No Venue
Shreyas Singh, Kunal Singh, Pradeep Moturi
Agentic Reasoning And Tool Integration For Llms Via Reinforcement Learning (2025) • No Venue
Singh et al.
Linguistic Generalizability Of Test-time Scaling In Mathematical Reasoning (2025) • No Venue
Son et al.
RL Makes Mllms See Better Than SFT (2025) • No Venue
Song et al.
R1-searcher: Incentivizing The Search Capability In Llms Via Reinforcement Learning (2025) • No Venue
Song et al.
Iterative Self-training For Code Generation Via Reinforced Re-ranking (2025) • No Venue
Nikita Sorokin, Ivan Sedykh, Valentin Malykh
Video-lmm Post-training: A Deep Dive Into Video Reasoning With Large Multimodal Models (2025) • No Venue
Tang et al.
Toolorchestra: Elevating Intelligence Via Efficient Model And Tool Orchestration (2025) • No Venue
Su et al.
Klear-reasoner: Advancing Reasoning Capability Via Gradient-preserving Clipping Policy Optimization (2025) • No Venue
Su et al.
Expanding RL With Verifiable Rewards Across Diverse Domains (2025) • No Venue
Su et al.
Pixel Reasoner: Incentivizing Pixel-space Reasoning With Curiosity-driven Reinforcement Learning (2025) • No Venue
Su et al.
Openthinkimg: Learning To Think With Images Via Visual Tool Reinforcement Learning (2025) • No Venue
Su et al.
Stop Overthinking: A Survey On Efficient Reasoning For Large Language Models (2025) • No Venue
Sui et al.
Inverse Reinforcement Learning Meets Large Language Model Post-training: Basics, Advances, And Opportunities (2025) • No Venue
Hao Sun, Mihaela van der Schaar
Grouprank: A Groupwise Reranking Paradigm Driven By Reinforcement Learning (2025) • No Venue
Sun et al.
Transformer^2: Self-adaptive Llms (2025) • No Venue
Qi Sun, Edoardo Cetin, Yujin Tang
Seagent: Self-evolving Computer Use Agent With Autonomous Learning From Experience (2025) • No Venue
Sun et al.
Zerosearch: Incentivize The Search Capability Of Llms Without Searching (2025) • No Venue
Sun et al.
Hiersearch: A Hierarchical Enterprise Deep Search Framework Integrating Local And Web Searches (2025) • No Venue
Tan et al.
Intrex: A Dataset For Modeling Engagement In Educational Conversations (2025) • No Venue
Tan et al.
Lingshu: A Generalist Foundation Model For Unified Multimodal Medical Understanding And Reasoning (2025) • No Venue
Team et al.
Hybrid Reinforcement: When Reward Is Sparse, It's Better To Be Dense (2025) • No Venue
Tao et al.
Kwai Keye-vl Technical Report (2025) • No Venue
Team et al.
Baichuan-m2: Scaling Medical Capability With Large Verifier System (2025) • No Venue
Team et al.
Every Step Evolves: Scaling Reinforcement Learning For Trillion-scale Thinking Model (2025) • No Venue
Team et al.
Kimi K1.5: Scaling Reinforcement Learning With Llms (2025) • No Venue
Team et al.
GLM-4.5: Agentic, Reasoning, And Coding (ARC) Foundation Models (2025) • No Venue
Team et al.
Longcat-flash-omni Technical Report (2025) • No Venue
Team et al.
Mimo: Unlocking The Reasoning Potential Of Language Model -- From Pretraining To Posttraining (2025) • No Venue
Team et al.
Mirothinker: Pushing The Performance Boundaries Of Open-source Research Agents Via Model, Context, And Interactive Scaling (2025) • No Venue
Team et al.
PAN: A World Model For General, Interactable, And Long-horizon World Simulation (2025) • No Venue
Team et al.
Think Twice: Enhancing LLM Reasoning By Scaling Multi-round Test-time Thinking (2025) • No Venue
Tian et al.
Mmada-parallel: Multimodal Large Diffusion Language Models For Thinking-aware Editing And Generation (2025) • No Venue
Tian et al.
Ego-r1: Chain-of-tool-thought For Ultra-long Egocentric Video Reasoning (2025) • No Venue
Tian et al.
More Thought, Less Accuracy? On The Dual Nature Of Reasoning In Vision-language Models (2025) • No Venue
Tian et al.
Diffusion Models Are Real-time Game Engines (2024) • No Venue
Valevski et al.
Qwen2.5 Technical Report (2024) • No Venue
Qwen et al.
Openai O1 System Card (2024) • No Venue
Openai et al.
Understanding Alignment In Multimodal Llms: A Comprehensive Study (2024) • No Venue
Amirloo et al.
Seed-tts: A Family Of High-quality Versatile Speech Generation Models (2024) • No Venue
Anastassiou et al.
Digirl: Training In-the-wild Device-control Agents With Autonomous Reinforcement Learning (2024) • No Venue
Bai et al.
Fintral: A Family Of GPT-4 Level Multimodal Financial Large Language Models (2024) • No Venue
Bhatia et al.
Genie: Generative Interactive Environments (2024) • No Venue
Bruce et al.
Internlm2 Technical Report (2024) • No Venue
Cai et al.
ROCKET-1: Master Open-world Interaction With Visual-temporal Context Prompting (2024) • No Venue
Cai et al.
Survey On Large Language Model-enhanced Reinforcement Learning: Concept, Taxonomy, And Methods (2024) • IEEE Transactions on Neural Networks and Learning Systems • 50 citations
Cao et al.
Bootstrapping Language Models With DPO Implicit Rewards (2024) • No Venue
Chen et al.
Huatuogpt-vision, Towards Injecting Medical Visual Knowledge Into Multimodal Llms At Scale (2024) • No Venue
Chen et al.
Self-play Fine-tuning Converts Weak Language Models To Strong Language Models (2024) • No Venue
Chen et al.
Advancing LLM Reasoning Generalists With Preference Trees (2024) • No Venue
Yuan et al.
Self-rewarding Language Models (2024) • No Venue
Yuan et al.
The Browsergym Ecosystem For Web Agent Research (2024) • No Venue
Chezelles et al.
Self-improving Robust Preference Optimization (2024) • No Venue
Choi et al.
RACER: Rich Language-guided Failure Recovery Policies For Imitation Learning (2024) • No Venue
Dai et al.
Toward Self-improvement Of Llms Via Imagination, Searching, And Criticizing (2024) • No Venue
Tian et al.
Llms In The Imaginarium: Tool Learning Through Simulated Trial And Error (2024) • No Venue
Wang et al.
Octo: An Open-source Generalist Robot Policy (2024) • No Venue
Team et al.
Helpsteer2-preference: Complementing Ratings With Preferences (2024) • No Venue
Wang et al.
Grutopia: Dream General Robots In A City At Scale (2024) • No Venue
Wang et al.
Mdpo: Conditional Preference Optimization For Multimodal Large Language Models (2024) • No Venue
Wang et al.
Offline Reinforcement Learning For LLM Multi-step Reasoning (2024) • No Venue
Wang et al.
Sotopia-π: Interactive Learning Of Socially Intelligent Language Agents (2024) • No Venue
Wang et al.
Secrets Of RLHF In Large Language Models Part II: Reward Modeling (2024) • No Venue
Wang et al.
A Large Recurrent Action Model: Xlstm Enables Fast Inference For Robotics Tasks (2024) • No Venue
Schmied et al.
BOND: Aligning Llms With Best-of-n Distillation (2024) • No Venue
Sessa et al.
Abstractive Text Summarization: State Of The Art, Challenges, And Improvements (2024) • Neurocomputing • 43 citations
Hassan Shakil, Ahmad Farooq, Jugal Kalita
Show, Don't Tell: Aligning Language Models With Demonstrated Feedback (2024) • No Venue
Shaikh et al.
Deepseekmath: Pushing The Limits Of Mathematical Reasoning In Open Language Models (2024) • No Venue
Shao et al.
Nemo-aligner: Scalable Toolkit For Efficient Model Alignment (2024) • No Venue
Shen et al.
PERL: Parameter Efficient Reinforcement Learning From Human Feedback (2024) • No Venue
Sidahmed et al.
Lipo: Listwise Preference Optimization Through Learning-to-rank (2024) • No Venue
Liu et al.
Rm-bench: Benchmarking Reward Models Of Language Models With Subtlety And Style (2024) • No Venue
Liu et al.
Skywork-reward: Bag Of Tricks For Reward Modeling In Llms (2024) • No Venue
Liu et al.
Aligning Large Language Models Via Self-steering Optimization (2024) • No Venue
Xiang et al.
Self-play Preference Optimization For Language Model Alignment (2024) • No Venue
Wu et al.
O1-coder: An O1 Replication For Coding (2024) • No Venue
Zhang et al.
Llama-berry: Pairwise Optimization For O1-like Olympiad-level Mathematical Reasoning (2024) • No Venue
Zhang et al.
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) • No Venue
Xi et al.
Self-exploring Language Models: Active Preference Elicitation For Online Alignment (2024) • No Venue
Zhang et al.
Generative World Explorer (2024) • No Venue
Lu et al.
Reft: Reasoning With Reinforced Fine-tuning (2024) • No Venue
Luong et al.
Preference Tuning With Human Feedback On Language, Speech, And Vision Tasks: A Survey (2024) • No Venue
Winata et al.
Xland-100b: A Large-scale Multi-task Dataset For In-context Reinforcement Learning (2024) • No Venue
Nikulin et al.
Diffusion Augmented Agents: A Framework For Efficient Exploration And Transfer Learning (2024) • No Venue
Palo et al.
Iterative Reasoning Preference Optimization (2024) • No Venue
Pang et al.
Marco-o1: Towards Open Reasoning Models For Open-ended Solutions (2024) • No Venue
Zhao et al.
Webrl: Training LLM Web Agents Via Self-evolving Online Curriculum Reinforcement Learning (2024) • No Venue
Qi et al.
Humanoid Locomotion As Next Token Prediction (2024) • No Venue
Radosavovic et al.
WARP: On The Benefits Of Weight Averaged Rewarded Policies (2024) • No Venue
Ramé et al.
Small Language Model Meets With Reinforced Vision Vocabulary (2024) • No Venue
Wei et al.
Direct Nash Optimization: Teaching Language Models To Self-improve With General Preferences (2024) • No Venue
Rosset et al.
RLHF Workflow: From Reward Modeling To Online RLHF (2024) • No Venue
Dong et al.
Stepcoder: Improve Code Generation With Reinforcement Learning From Compiler Feedback (2024) • No Venue
Dou et al.
Learning To Move Like Professional Counter-strike Players (2024) • No Venue
Durst et al.
Natural Language Reinforcement Learning (2024) • No Venue
Feng et al.
Stream Of Search (sos): Learning To Search In Language (2024) • No Venue
Gandhi et al.
Dreamreward: Text-to-3d Generation With Human Preference (2024) • No Venue
Ye et al.
Learn Your Reference Model For Real Good Alignment (2024) • No Venue
Gorbatovski et al.
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2024) • No Venue
Grosnit et al.
Direct Language Model Alignment From Online AI Feedback (2024) • No Venue
Guo et al.
Video As The New Language For Real-world Decision Making (2024) • No Venue
Yang et al.
Teaching Large Language Models To Reason With Reinforcement Learning (2024) • No Venue
Havrilla et al.
Large-scale Reinforcement Learning For Diffusion Models (2024) • No Venue
Zhang et al.
Openrlhf: An Easy-to-use, Scalable And High-performance RLHF Framework (2024) • No Venue
Hu et al.
Pokéllmon: A Human-parity Agent For Pokémon Battles With Large Language Models (2024) • No Venue
Sihao Hu, Tiansheng Huang, Ling Liu
Sleeper Agents: Training Deceptive Llms That Persist Through Safety Training (2024) • No Venue
Hubinger et al.
Modulated Intervention Preference Optimization (MIPO): Keep The Easy, Refine The Difficult (2024) • No Venue
Cheolhun Jang
Vineppo: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment (2024) • No Venue
Kazemnejad et al.
Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey (2024) • Information Fusion • 111 citations
Hamza Kheddar, Mustapha Hemis, Yassine Himeur
Can Large Language Models Explore In-context? (2024) • No Venue
Krishnamurthy et al.
Training Language Models To Self-correct Via Reinforcement Learning (2024) • No Venue
Kumar et al.
Autowebglm: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent (2024) • No Venue
Lai et al.
TÜLU 3: Pushing Frontiers In Open Language Model Post-training (2024) • No Venue
Lambert et al.
Rewardbench: Evaluating Reward Models For Language Modeling (2024) • No Venue
Lambert et al.
Parrot: Pareto-optimal Multi-reward Reinforcement Learning Framework For Text-to-image Generation (2024) • No Venue
Lee et al.
Mix-ln: Unleashing The Power Of Deeper Layers By Combining Pre-ln And Post-ln (2024) • No Venue
Pengxiang Li, Lu Yin, Shiwei Liu
Generative Motion Stylization Of Cross-structure Characters Within Canonical Motion Space (2024) • IEEE Journal on Selected Areas in Communications • 46 citations
Zhang et al.
Controllable Text Generation For Large Language Models: A Survey (2024) • No Venue
Liang et al.
Learning To Learn Faster From Human Feedback With Language Model Predictive Control (2024) • No Venue
Liang et al.
Improve Vision Language Model Chain-of-thought Reasoning (2024) • No Venue
Zhang et al.
Critic-v: VLM Critics Help Catch VLM Errors In Multimodal Reasoning (2024) • No Venue
Zhang et al.
Critical Tokens Matter: Token-level Contrastive Estimation Enhances Llm's Reasoning Capability (2024) • No Venue
Lin et al.
FLAME: Factuality-aware Alignment For Large Language Models (2024) • No Venue
Lin et al.
A Preliminary Study Of O1 In Medicine: Are We Closer To An AI Doctor? (2024) • No Venue
Xie et al.
Summary Of Chatgpt-related Research And Perspective Towards The Future Of Large Language Models (2023) • Meta-Radiology • 582 citations
Liu et al.
Generative Ai-enabled Vehicular Networks: Fundamentals, Framework, And Case Study (2023) • IEEE Network • 55 citations
Zhang et al.
Aligning Large Multimodal Models With Factually Augmented RLHF (2023) • No Venue
Sun et al.
Query Rewriting For Retrieval-augmented Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 114 citations
Ma et al.
Eureka: Human-level Reward Design Via Coding Large Language Models (2023) • No Venue
Ma et al.
Unveiling Security, Privacy, And Ethical Concerns Of Chatgpt (2023) • Journal of Information and Intelligence • 165 citations
Xiaodong Wu, Ran Duan, Jianbing Ni
Reflexion: Language Agents With Verbal Reinforcement Learning (2023) • Arxiv • 247 citations
Shinn et al.
No More Manual Tests? Evaluating And Improving Chatgpt For Unit Test Generation (2023) • Arxiv • 50 citations
Yuan et al.
On-policy Distillation Of Language Models: Learning From Self-generated Mistakes (2023) • No Venue
Agarwal et al.
Can We Trust The Evaluation On Chatgpt? (2023) • Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023) • 62 citations
Aiyappa et al.
Rest Meets React: Self-improvement For Multi-step Reasoning LLM Agent (2023) • No Venue
Aksitov et al.
Enable Language Models To Implicitly Learn Self-improvement From Data (2023) • No Venue
Wang et al.
Chatgpt: Applications, Opportunities, And Threats (2023) • 2023 Systems and Information Engineering Design Symposium (SIEDS) • 162 citations
Bahrini et al.
Qwen Technical Report (2023) • No Venue
Bai et al.
Emergent Autonomous Scientific Research Capabilities Of Large Language Models (2023) • Arxiv • 73 citations
Daniil A. Boiko, Robert MacKnight, Gabe Gomes
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control (2023) • No Venue
Brohan et al.
Weak-to-strong Generalization: Eliciting Strong Capabilities With Weak Supervision (2023) • No Venue
Burns et al.
Open Problems And Fundamental Limitations Of Reinforcement Learning From Human Feedback (2023) • No Venue
Casper et al.
Q-transformer: Scalable Offline Reinforcement Learning Via Autoregressive Q-functions (2023) • No Venue
Chebotar et al.
Diffusion Model Alignment Using Direct Preference Optimization (2023) • No Venue
Wallace et al.
Secrets Of RLHF In Large Language Models Part I: PPO (2023) • No Venue
Zheng et al.
Robogen: Towards Unleashing Infinite Data For Automated Robot Learning Via Generative Simulation (2023) • No Venue
Wang et al.
L3MVN: Leveraging Large Language Models For Visual Target Navigation (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 58 citations
Bangguo Yu, Hamidreza Kasaei, Ming Cao
RLHF-V: Towards Trustworthy Mllms Via Behavior Alignment From Fine-grained Correctional Human Feedback (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Yu et al.
Safe RLHF: Safe Reinforcement Learning From Human Feedback (2023) • No Venue
Dai et al.
Alpacafarm: A Simulation Framework For Methods That Learn From Human Feedback (2023) • Arxiv • 53 citations
Dubois et al.
Semantic Anomaly Detection With Large Language Models (2023) • Autonomous Robots • 63 citations
Elhafsi et al.
Just Ask For Calibration: Strategies For Eliciting Calibrated Confidence Scores From Language Models Fine-tuned With Human Feedback (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Tian et al.
A Comprehensive Capability Analysis Of GPT-3 And GPT-3.5 Series Models (2023) • Arxiv • 181 citations
Ye et al.
Detecting And Preventing Hallucinations In Large Vision Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 93 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
Deepspeed-chat: Easy, Fast And Affordable RLHF Training Of Chatgpt-like Models At All Scales (2023) • No Venue
Yao et al.
Retroformer: Retrospective Large Language Agents With Policy Gradient Optimization (2023) • No Venue
Yao et al.
Using Human Feedback To Fine-tune Diffusion Models Without Any Reward Model (2023) • No Venue
Yang et al.
Reinforcement Learning-based Counter-misinformation Response Generation: A Case Study Of COVID-19 Vaccine Misinformation (2023) • Proceedings of the ACM Web Conference 2023 • 44 citations
Bing He, Mustaque Ahamad, Srijan Kumar
Contrastive Preference Learning: Learning From Human Feedback Without RL (2023) • No Venue
Hejna et al.
Octopus: Embodied Vision-language Programmer From Environmental Feedback (2023) • No Venue
Yang et al.
Huatuogpt, Towards Taming Language Model To Be A Doctor (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 128 citations
Zhang et al.
The Unlocking Spell On Base Llms: Rethinking Alignment Via In-context Learning (2023) • No Venue
Lin et al.
Aligning Text-to-image Diffusion Models With Reward Backpropagation (2023) • No Venue
Prabhudesai et al.
Rich Human Feedback For Text-to-image Generation (2023) • No Venue
Liang et al.
Text2motion: From Natural Language Instructions To Feasible Plans (2023) • Autonomous Robots • 156 citations
Lin et al.
Opening Up Chatgpt: Tracking Openness, Transparency, And Accountability In Instruction-tuned Text Generators (2023) • CUI '23: ACM conference on Conversational User Interfaces • 64 citations
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
Learning To Model The World With Language (2023) • No Venue
Lin et al.
Theory Of Mind For Multi-agent Collaboration Via Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 40 citations
Li et al.
Language-driven Representation Learning For Robotics (2023) • Robotics: Science and Systems XIX • 47 citations
Karamcheti et al.
RLAIF: Scaling Reinforcement Learning From Human Feedback With AI Feedback (2023) • No Venue
Lee et al.
Pangu-coder2: Boosting Large Language Models For Code With Ranking Feedback (2023) • No Venue
Shen et al.
Discovering Language Model Behaviors With Model-written Evaluations (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 47 citations
Perez et al.
A Generalist Agent (2022) • Transactions on Machine Learning Research, 11/2022, https://openreview.net/forum?id=1ikK0kHjvj • 60 citations
Reed et al.
Multi-agent Reinforcement Learning Is A Sequence Modeling Problem (2022) • Arxiv • 78 citations
Wen et al.
Coderl: Mastering Code Generation Through Pretrained Models And Deep Reinforcement Learning (2022) • Arxiv • 87 citations
Le et al.
Domain Adaptive Fake News Detection Via Reinforcement Learning (2022) • Proceedings of the ACM Web Conference 2022 • 87 citations
Mosallanezhad et al.
Training A Helpful And Harmless Assistant With Reinforcement Learning From Human Feedback (2022) • Arxiv • 346 citations
Bai et al.
Using Cognitive Psychology To Understand GPT-3 (2022) • Arxiv • 61 citations
Marcel Binz, Eric Schulz
What Matters In Language Conditioned Robotic Imitation Learning Over Unstructured Data (2022) • IEEE Robotics and Automation Letters • 49 citations
Oier Mees, Lukas Hermann, Wolfram Burgard
Fine-grained Image Captioning With CLIP Reward (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 52 citations
Cho et al.
ZSON: Zero-shot Object-goal Navigation Using Multimodal Goal Embeddings (2022) • Arxiv • 41 citations
Majumdar et al.
Rlprompt: Optimizing Discrete Text Prompts With Reinforcement Learning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 117 citations
Deng et al.
Minedojo: Building Open-ended Embodied Agents With Internet-scale Knowledge (2022) • Arxiv • 59 citations
Fan et al.
Quark: Controllable Text Generation With Reinforced Unlearning (2022) • NeurIPS 2022 (Oral Selection) • 45 citations
Lu et al.
Improving Alignment Of Dialogue Agents Via Targeted Human Judgements (2022) • Arxiv • 130 citations
Glaese et al.
Dynamic Prompt Learning Via Policy Gradient For Semi-structured Mathematical Reasoning (2022) • Arxiv • 41 citations
Lu et al.
Webshop: Towards Scalable Real-world Web Interaction With Grounded Language Agents (2022) • Arxiv • 46 citations
Yao et al.
Active Example Selection For In-context Learning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 66 citations
Yiming Zhang, Shi Feng, Chenhao Tan
Red Teaming Language Models With Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Perez et al.
Bridging The Gap Between Learning In Discrete And Continuous Environments For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Hong et al.
Hierarchical Cross-modal Agent For Robotics Vision-and-language Navigation (2021) • 2021 IEEE International Conference on Robotics and Automation (ICRA) • 45 citations
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
Offline Reinforcement Learning As One Big Sequence Modeling Problem (2021) • Arxiv • 41 citations
Michael Janner, Qiyang Li, Sergey Levine
Cliport: What And Where Pathways For Robotic Manipulation (2021) • Arxiv • 98 citations
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Recursively Summarizing Books With Human Feedback (2021) • Arxiv • 65 citations
Wu et al.
Unified Conversational Recommendation Policy Learning Via Graph-based Reinforcement Learning (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 119 citations
Deng et al.
Transferable Dialogue Systems And User Simulators (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Tseng et al.
GALAXY: A Generative Pre-trained Model For Task-oriented Dialog With Semi-supervised Learning And Explicit Policy Injection (2021) • Arxiv • 45 citations
He et al.
INVIGORATE: Interactive Visual Grounding And Grasping In Clutter (2021) • Robotics: Science and Systems XVII • 45 citations
Zhang et al.
Bootstrap Latent-predictive Representations For Multitask Reinforcement Learning (2020) • Arxiv • 42 citations
Guo et al.
Allenact: A Framework For Embodied AI Research (2020) • Arxiv • 44 citations
Weihs et al.
Babywalk: Going Farther In Vision-and-language Navigation By Taking Baby Steps (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Zhu et al.
Aligning AI With Shared Human Values (2020) • Arxiv • 100 citations
Hendrycks et al.
SUPERT: Towards New Frontiers In Unsupervised Evaluation Metrics For Multi-document Summarization (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 107 citations
Yang Gao, Wei Zhao, Steffen Eger
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Liu et al.
Learning Dynamic Belief Graphs To Generalize On Text-based Games (2020) • Arxiv • 55 citations
Adhikari et al.
Imitating Interactive Intelligence (2020) • Arxiv • 43 citations
Abramson et al.
Fashion Captioning: Towards Generating Accurate Descriptions With Semantic Rewards (2020) • Lecture Notes in Computer Science • 61 citations
Yang et al.
Graph Constrained Reinforcement Learning For Natural Language Action Spaces (2020) • Arxiv • 43 citations
Prithviraj Ammanabrolu, Matthew Hausknecht
Efficient Transformers: A Survey (2020) • ACM Computing Surveys • 532 citations
Tay et al.
Learning Agile Robotic Locomotion Skills By Imitating Animals (2020) • Arxiv • 41 citations
Peng et al.
Active Visual Information Gathering For Vision-language Navigation (2020) • Lecture Notes in Computer Science • 65 citations
Wang et al.
Logical Natural Language Generation From Open-domain Tables (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 114 citations
Chen et al.
Language As A Cognitive Tool To Imagine Goals In Curiosity-driven Exploration (2020) • NeurIPS 2020 • 56 citations
Colas et al.
Knowledge Graph-augmented Abstractive Summarization With Semantic-driven Cloze Reward (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 148 citations
Luyang Huang, Lingfei Wu, Lu Wang
Reinforcement Learning For Weakly Supervised Temporal Grounding Of Natural Language In Untrimmed Videos (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 69 citations
Wu et al.
Self-monitoring Navigation Agent Via Auxiliary Progress Estimation (2019) • Arxiv • 134 citations
Ma et al.
Non-monotonic Sequential Text Generation (2019) • Arxiv • 63 citations
Welleck et al.
Playing The Lottery With Rewards And Multiple Languages: Lottery Tickets In RL And NLP (2019) • Arxiv • 77 citations
Yu et al.
Making History Matter: History-advantage Sequence Training For Visual Dialog (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 63 citations
Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang
Complexity-weighted Loss And Diverse Reranking For Sentence Simplification (2019) • Proceedings of the 2019 Conference of the North • 59 citations
Kriz et al.
An Entity-driven Framework For Abstractive Summarization (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 59 citations
Sharma et al.
Learning To Navigate Unseen Environments: Back Translation With Environmental Dropout (2019) • Proceedings of the 2019 Conference of the North • 288 citations
Hao Tan, Licheng Yu, Mohit Bansal
Reinforcement Learning To Optimize Long-term User Engagement In Recommender Systems (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 223 citations
Zou et al.
Generating Persona Consistent Dialogues By Exploiting Natural Language Inference (2019) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Song et al.
The Regretful Agent: Heuristic-aided Navigation Through Progress Estimation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 168 citations
Ma et al.
Automatic Generation Of Pull Request Descriptions (2019) • 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 103 citations
Liu et al.
Generative Teaching Networks: Accelerating Neural Architecture Search By Learning To Generate Synthetic Training Data (2019) • Arxiv • 48 citations
Such et al.
Tripping Through Time: Efficient Localization Of Activities In Videos (2019) • Arxiv • 41 citations
Hahn et al.
Rewarding Smatch: Transition-based AMR Parsing With Reinforcement Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 60 citations
Naseem et al.
Context-aware Visual Policy Network For Fine-grained Image Captioning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 151 citations
Zha et al.
Learning To Generalize From Sparse And Underspecified Rewards (2019) • Proceedings of the 36th International Conference on Machine Learning, PMLR 97:130-140, 2019 • 46 citations
Agarwal et al.
Reconstruct And Represent Video Contents For Captioning Via Reinforcement Learning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 79 citations
Zhang et al.
Language As An Abstraction For Hierarchical Deep Reinforcement Learning (2019) • Arxiv • 65 citations
Jiang et al.
Summary Level Training Of Sentence Rewriting For Abstractive Summarization (2019) • Proceedings of the 2nd Workshop on New Frontiers in Summarization • 61 citations
Bae et al.
Coacor: Code Annotation For Code Retrieval With Reinforcement Learning (2019) • The World Wide Web Conference • 92 citations
Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun
Reinforcement Learning Based Text Style Transfer Without Parallel Training Corpus (2019) • Proceedings of the 2019 Conference of the North • 80 citations
Gong et al.
Way Off-policy Batch Deep Reinforcement Learning Of Implicit Human Preferences In Dialog (2019) • Arxiv • 131 citations
Jaques et al.
A Dual Reinforcement Learning Framework For Unsupervised Text Style Transfer (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 155 citations
Luo et al.
Reinforced Dynamic Reasoning For Conversational Question Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 40 citations
Pan et al.
Simultaneous Translation With Flexible Policy Via Restricted Imitation Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 71 citations
Zheng et al.
A Hierarchical Reinforced Sequence Operation Method For Unsupervised Text Style Transfer (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 53 citations
Wu et al.
Reinforcement Learning Based Graph-to-sequence Model For Natural Question Generation (2019) • Arxiv • 79 citations
Yu Chen, Lingfei Wu, Mohammed J. Zaki
Follownet: Robot Navigation By Following Natural Language Directions With Deep Reinforcement Learning (2018) • Third Workshop in Machine Learning in the Planning and Control of Robot Motion at ICRA 2018 • 43 citations
Shah et al.
Look Before You Leap: Bridging Model-free And Model-based Reinforcement Learning For Planned-ahead Vision-and-language Navigation (2018) • Lecture Notes in Computer Science • 194 citations
Wang et al.
Multi-reward Reinforced Summarization With Saliency And Entailment (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 47 citations
Ramakanth Pasunuru, Mohit Bansal
Deep Communicating Agents For Abstractive Summarization (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 62 citations
Celikyilmaz et al.
Context Models For OOV Word Translation In Low-resource Languages (2018) • Proceedings of the 26th ACM international conference on Multimedia • 120 citations
Angli Liu, Katrin Kirchhoff
Fast Abstractive Summarization With Reinforce-selected Sentence Rewriting (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 604 citations
Yen-Chun Chen, Mohit Bansal
Deep Dyna-q: Integrating Planning For Task-completion Dialogue Policy Learning (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 177 citations
Peng et al.
Improving Automatic Source Code Summarization Via Deep Reinforcement Learning (2018) • Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering • 390 citations
Wan et al.
A Reinforced Topic-aware Convolutional Sequence-to-sequence Model For Abstractive Text Summarization (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 127 citations
Wang et al.
A Study Of Reinforcement Learning For Neural Machine Translation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 155 citations
Wu et al.
Deep Reinforcement Learning For Sequence To Sequence Models (2018) • IEEE Transactions on Neural Networks and Learning Systems • 41 citations
Keneshloo et al.
Textworld: A Learning Environment For Text-based Games (2018) • Arxiv • 96 citations
Côté et al.
Polite Dialogue Generation Without Parallel Data (2018) • Transactions of the Association for Computational Linguistics • 166 citations
Tong Niu, Mohit Bansal
Generation Of Synthetic Electronic Medical Record Text (2018) • 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) • 41 citations
Guan et al.
A Skeleton-based Model For Promoting Coherence Among Sentences In Narrative Story Generation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 94 citations
Xu et al.
Leveraging Grammar And Reinforcement Learning For Neural Program Synthesis (2018) • Arxiv • 49 citations
Bunel et al.
Decoupling Strategy And Generation In Negotiation Dialogues (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 128 citations
He et al.
Controllable Neural Story Plot Generation Via Reward Shaping (2018) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 79 citations
Tambwekar et al.
Counting To Explore And Generalize In Text-based Games (2018) • Arxiv • 50 citations
Yuan et al.
Learning To Extract Coherent Summary Via Deep Reinforcement Learning (2018) • Proceedings of the AAAI Conference on Artificial Intelligence • 130 citations
Yuxiang Wu, Baotian Hu
Toward Diverse Text Generation With Inverse Reinforcement Learning (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 86 citations
Shi et al.
Psychlab: A Psychology Laboratory For Deep Reinforcement Learning Agents (2018) • Arxiv • 61 citations
Leibo et al.
Actor-critic Based Training Framework For Abstractive Summarization (2018) • Arxiv • 46 citations
Piji Li, Lidong Bing, Wai Lam
Learning To Understand Goal Specifications By Modelling Reward (2018) • Arxiv • 69 citations
Bahdanau et al.
Sentiment Adaptive End-to-end Dialog Systems (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 75 citations
Weiyan Shi, Zhou Yu
Maskgan: Better Text Generation Via Filling In The ______ (2018) • Arxiv • 255 citations
William Fedus, Ian Goodfellow, Andrew M. Dai
Reinforced Self-attention Network: A Hybrid Of Hard And Soft Attention For Sequence Modeling (2018) • Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence • 134 citations
Shen et al.
A Deep Reinforced Model For Abstractive Summarization (2017) • Arxiv • 1273 citations
Romain Paulus, Caiming Xiong, Richard Socher
End-to-end Optimization Of Goal-driven And Visually Grounded Dialogue Systems (2017) • Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence • 88 citations
Strub et al.
Sqlnet: Generating Structured Queries From Natural Language Without Reinforcement Learning (2017) • Arxiv • 303 citations
Xiaojun Xu, Chang Liu, Dawn Song
Learning To Generalize: Meta-learning For Domain Generalization (2017) • Arxiv • 113 citations
Li et al.
Meta-sgd: Learning To Learn Quickly For Few-shot Learning (2017) • Arxiv • 836 citations
Li et al.
Ask The Right Questions: Active Question Reformulation With Reinforcement Learning (2017) • Sixth International Conference on Learning Representations (ICLR) 2018 • 82 citations
Buck et al.
End-to-end Task-completion Neural Dialogue Systems (2017) • Arxiv • 58 citations
Li et al.
Deal Or No Deal? End-to-end Learning For Negotiation Dialogues (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Lewis et al.
Seq2sql: Generating Structured Queries From Natural Language Using Reinforcement Learning (2017) • Arxiv • 782 citations
Victor Zhong, Caiming Xiong, Richard Socher
Long Text Generation Via Adversarial Training With Leaked Information (2017) • Arxiv • 161 citations
Guo et al.
Actor-critic Sequence Training For Image Captioning (2017) • Arxiv • 99 citations
Zhang et al.
Semi-supervised QA With Generative Domain-adaptive Nets (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 143 citations
Yang et al.
Towards Diverse And Natural Image Descriptions Via A Conditional GAN (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 455 citations
Dai et al.
One-shot Imitation Learning (2017) • Arxiv • 278 citations
Duan et al.
Parlai: A Dialog Research Software Platform (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 108 citations
Miller et al.
R$^3$: Reinforced Reader-ranker For Open-domain Question Answering (2017) • Arxiv • 87 citations
Wang et al.
A Unified Query-based Generative Model For Question Generation And Question Answering (2017) • Arxiv • 50 citations
Linfeng Song, Zhiguo Wang, Wael Hamza
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 313 citations
Das et al.
Deep Reinforcement Learning For Mention-ranking Coreference Models (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 44 citations
Kevin Clark, Christopher D. Manning
Deep Reinforcement Learning For Dialogue Generation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1034 citations
Li et al.
Dual Learning For Machine Translation (2016) • NIPS 2016 • 597 citations
Xia et al.
Strategic Attentive Writer For Learning Macro-actions (2016) • Arxiv • 80 citations
Vezhnevets et al.
Learning Through Dialogue Interactions By Asking Questions (2016) • Arxiv • 72 citations
Li et al.
Dialogue Learning With Human-in-the-loop (2016) • Arxiv • 46 citations
Li et al.
Neural Architecture Search With Reinforcement Learning (2016) • Arxiv • 3840 citations
Barret Zoph, Quoc V. Le
End-to-end Lstm-based Dialog Control Optimized With Supervised And Reinforcement Learning (2016) • Arxiv • 122 citations
Jason D. Williams, Geoffrey Zweig
An Actor-critic Algorithm For Sequence Prediction (2016) • Arxiv • 224 citations
Bahdanau et al.
Neural Symbolic Machines: Learning Semantic Parsers On Freebase With Weak Supervision (2016) • Arxiv • 44 citations
Liang et al.

Retrieval Systems 237 papers

Qwen3 Embedding: Advancing Text Embedding And Reranking Through Foundation Models (2025) • No Venue
Zhang et al.
Diffusion Vs. Autoregressive Language Models: A Text Embedding Perspective (2025) • No Venue
Zhang et al.
Vidorag: Visual Document Retrieval-augmented Generation Via Dynamic Iterative Reasoning Agents (2025) • No Venue
Wang et al.
Videorope: What Makes For Good Video Rotary Position Embedding? (2025) • No Venue
Wei et al.
On The Theoretical Limitations Of Embedding-based Retrieval (2025) • No Venue
Weller et al.
Seq Vs Seq: An Open Suite Of Paired Encoders And Decoders (2025) • No Venue
Weller et al.
Mmsearch-r1: Incentivizing Lmms To Search (2025) • No Venue
Wu et al.
Sitemb-v1.5: Improved Context-aware Dense Retrieval For Semantic Association And Long Story Comprehension (2025) • No Venue
Wu et al.
Dense Retrievers Can Fail On Simple Queries: Revealing The Granularity Dilemma Of Embeddings (2025) • No Venue
Xu et al.
Universalrag: Retrieval-augmented Generation Over Multiple Corpora With Diverse Modalities And Granularities (2025) • No Venue
Yeo et al.
Open Deep Search: Democratizing Search With Open-source Reasoning Agents (2025) • No Venue
Alzubi et al.
Personalized Graph-based Retrieval For Large Language Models (2025) • No Venue
Au et al.
Metaclip 2: A Worldwide Scaling Recipe (2025) • No Venue
Chuang et al.
Deepresearchgym: A Free, Transparent, And Reproducible Evaluation Sandbox For Deep Research (2025) • No Venue
Coelho et al.
Onepiece: Bringing Context Engineering And Reasoning To Industrial Cascade Ranking System (2025) • No Venue
Dai et al.
MV-RAG: Retrieval Augmented Multiview Diffusion (2025) • No Venue
Yosef Dayani, Omer Benishu, Sagie Benaim
Interactcomp: Evaluating Search Agents With Ambiguous Queries (2025) • No Venue
Deng et al.
Mmdocir: Benchmarking Multi-modal Retrieval For Long Documents (2025) • No Venue
Dong et al.
Deepresearch Bench: A Comprehensive Benchmark For Deep Research Agents (2025) • No Venue
Du et al.
Lment: A Suite For Analyzing Knowledge In Language Models From Pretraining Data To Representations (2025) • No Venue
Gottesman et al.
Spectrum Projection Score: Aligning Retrieved Summaries With Reader Models In Retrieval-augmented Generation (2025) • No Venue
Hu et al.
When Thoughts Meet Facts: Reusable Reasoning For Long-context Lms (2025) • No Venue
Jeong et al.
Videorag: Retrieval-augmented Generation Over Video Corpus (2025) • No Venue
Jeong et al.
Search-r1: Training Llms To Reason And Leverage Search Engines With Reinforcement Learning (2025) • No Venue
Jin et al.
Gemini Embedding: Generalizable Embeddings From Gemini (2025) • No Venue
Lee et al.
Webweaver: Structuring Web-scale Evidence With Dynamic Outlines For Open-ended Deep Research (2025) • No Venue
Li et al.
Gear: Generation Augmented Retrieval (2025) • No Venue
Liu et al.
E^2rank: Your Text Embedding Can Also Be An Effective And Efficient Listwise Reranker (2025) • No Venue
Liu et al.
Olmotrace: Tracing Language Model Outputs Back To Trillions Of Training Tokens (2025) • No Venue
Liu et al.
Wikivideo: Article Generation From Multiple Videos (2025) • No Venue
Martin et al.
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems (2025) • No Venue
Meghwani et al.
R-wom: Retrieval-augmented World Model For Computer-use Agents (2025) • No Venue
Mei et al.
Dota-rag: Dynamic Of Thought Aggregation RAG (2025) • No Venue
Ruangtanusak et al.
NER Retriever: Zero-shot Named Entity Retrieval With Type-aware Embeddings (2025) • No Venue
Shachar et al.
Imagerag: Dynamic Image Retrieval For Reference-guided Image Generation (2025) • No Venue
Shalev-Arkushin et al.
Reasonir: Training Retrievers For Reasoning Tasks (2025) • No Venue
Shao et al.
Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025) • No Venue
Shen et al.
IFIR: A Comprehensive Benchmark For Evaluating Instruction-following In Expert-domain Information Retrieval (2025) • No Venue
Song et al.
Modernvbert: Towards Smaller Visual Document Retrievers (2025) • No Venue
Teiletche et al.
Open Multimodal Retrieval-augmented Factual Image Generation (2025) • No Venue
Tian et al.
Openscholar: Synthesizing Scientific Literature With Retrieval-augmented Lms (2024) • No Venue
Asai et al.
Seven Failure Points When Engineering A Retrieval Augmented Generation System (2024) • Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI • 57 citations
Barnett et al.
BGE M3-embedding: Multi-lingual, Multi-functionality, Multi-granularity Text Embeddings Through Self-knowledge Distillation (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 221 citations
Chen et al.
Mindsearch: Mimicking Human Minds Elicits Deep AI Searcher (2024) • No Venue
Chen et al.
MS MARCO Web Search: A Large-scale Information-rich Web Dataset With Millions Of Real Click Labels (2024) • No Venue
Chen et al.
Panda-70m: Captioning 70M Videos With Multiple Cross-modality Teachers (2024) • No Venue
Chen et al.
MLLM As Retriever: Interactively Learning Multimodal Retrieval For Embodied Agents (2024) • No Venue
Yue et al.
CORAL: Benchmarking Multi-turn Conversational Retrieval-augmentation Generation (2024) • No Venue
Cheng et al.
M-longdoc: A Benchmark For Multimodal Super-long Document Understanding And A Retrieval-aware Tuning Framework (2024) • No Venue
Chia et al.
Jacolbertv2.5: Optimising Multi-vector Retrievers To Create State-of-the-art Japanese Retrievers With Constrained Resources (2024) • No Venue
Benjamin Clavié
The Power Of Noise: Redefining Retrieval For RAG Systems (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 105 citations
Cuconasu et al.
Bias And Unfairness In Information Retrieval Systems: New Challenges In The LLM Era (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Dai et al.
Dreamrunner: Fine-grained Storytelling Video Generation With Retrieval-augmented Motion Adaptation (2024) • No Venue
Wang et al.
Multimodal Needle In A Haystack: Benchmarking Long-context Capability Of Multimodal Large Language Models (2024) • No Venue
Wang et al.
Utilizing Local Hierarchy With Adversarial Training For Hierarchical Text Classification (2024) • ACM Computing Surveys • 58 citations
Zihan Wang, Peiyi Wang, Houfeng Wang
Needle In A Multimodal Haystack (2024) • No Venue
Wang et al.
Htmlrag: HTML Is Better Than Plain Text For Modeling Retrieved Knowledge In RAG Systems (2024) • No Venue
Tan et al.
Writing In The Margins: Better Inference Pattern For Long Context Retrieval (2024) • No Venue
Russak et al.
RAPTOR: Recursive Abstractive Processing For Tree-organized Retrieval (2024) • No Venue
Sarthi et al.
Scaling Retrieval-based Language Models With A Trillion-token Datastore (2024) • No Venue
Shao et al.
Generative Echo Chamber? Effects Of Llm-powered Search Systems On Diverse Information Seeking (2024) • CHI '24: CHI Conference on Human Factors in Computing Systems • 52 citations
Nikhil Sharma, Q. Vera Liao, Ziang Xiao
The Russian-focused Embedders' Exploration: Rumteb Benchmark And Russian Embedding Model Design (2024) • No Venue
Snegirev et al.
Jina-embeddings-v3: Multilingual Embeddings With Task Lora (2024) • No Venue
Sturua et al.
Chatqa: Building GPT-4 Level Conversational QA Models (2024) • No Venue
Liu et al.
Retrievalattention: Accelerating Long-context LLM Inference Via Vector Retrieval (2024) • No Venue
Liu et al.
Onegen: Efficient One-pass Unified Generation And Retrieval For Llms (2024) • No Venue
Zhang et al.
Llava-mr: Large Language-and-vision Assistant For Video Moment Retrieval (2024) • Arxiv • 92 citations
Lu et al.
Weblinx: Real-world Website Navigation With Multi-turn Dialogue (2024) • No Venue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Contextual Document Embeddings (2024) • No Venue
John X. Morris, Alexander M. Rush
Relik: Retrieve And Link, Fast And Accurate Entity Linking And Relation Extraction On An Academic Budget (2024) • No Venue
Orlando et al.
Promptriever: Instruction-trained Retrievers Can Be Prompted Like Language Models (2024) • No Venue
Weller et al.
Evaluating D-MERIT Of Partial-annotation On Information Retrieval (2024) • No Venue
Rassin et al.
Needle Threading: Can Llms Follow Threads Through Near-million-scale Haystacks? (2024) • No Venue
Jonathan Roberts, Kai Han, Samuel Albanie
Associative Recurrent Memory Transformer (2024) • No Venue
Rodkin et al.
Progressive Multimodal Reasoning Via Active Retrieval (2024) • No Venue
Dong et al.
Mgte: Generalized Long-context Text Representation And Reranking Models For Multilingual Text Retrieval (2024) • No Venue
Zhang et al.
Colpali: Efficient Document Retrieval With Vision Language Models (2024) • No Venue
Faysse et al.
Towards Flexible Perception With Visual Memory (2024) • No Venue
Geirhos et al.
Paecter: Patent-level Representation Learning Using Citation-informed Transformers (2024) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 218 citations
Ghosh et al.
Longrag: Enhancing Retrieval-augmented Generation With Long-context Llms (2024) • No Venue
Ziyan Jiang, Xueguang Ma, Wenhu Chen
Jina CLIP: Your CLIP Model Is Also Your Text Retriever (2024) • No Venue
Koukounas et al.
Gecko: Versatile Text Embeddings Distilled From Large Language Models (2024) • No Venue
Lee et al.
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window? (2024) • No Venue
Li et al.
Retrollm: Empowering Large Language Models To Retrieve Fine-grained Evidence Within Generation (2024) • No Venue
Li et al.
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance (2024) • No Venue
Lin et al.
Revisiting Temporal Modeling For Clip-based Image-to-video Knowledge Transferring (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Liu et al.
Large Language Models Can Accurately Predict Searcher Preferences (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 82 citations
Thomas et al.
Is Chatgpt Good At Search? Investigating Large Language Models As Re-ranking Agents (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 141 citations
Sun et al.
Interpretable Long-form Legal Question Answering With Retrieval-augmented Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 45 citations
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
Query Rewriting For Retrieval-augmented Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 114 citations
Ma et al.
One-shot Labeling For Automatic Relevance Estimation (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Sean MacAvaney, Luca Soldaini
Self-rag: Learning To Retrieve, Generate, And Critique Through Self-reflection (2023) • No Venue
Asai et al.
Learning To Retrieve In-context Examples For Large Language Models (2023) • No Venue
Liang Wang, Nan Yang, Furu Wei
Information Retrieval Meets Large Language Models: A Strategic Report From Chinese IR Community (2023) • AI Open • 51 citations
Ai et al.
Can Chatgpt Write A Good Boolean Query For Systematic Review Literature Search? (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 181 citations
Wang et al.
Mechgpt, A Language-based Strategy For Mechanics And Materials Modeling That Connects Knowledge Across Scales, Disciplines And Modalities (2023) • Applied Mechanics Reviews • 74 citations
Markus J. Buehler
Query2doc: Query Expansion With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 108 citations
Liang Wang, Nan Yang, Furu Wei
Parameter-efficient Transfer Learning For Remote Sensing Image-text Retrieval (2023) • IEEE Transactions on Geoscience and Remote Sensing • 60 citations
Yuan Yuan, Yang Zhan, Zhitong Xiong
Uncovering Chatgpt's Capabilities In Recommender Systems (2023) • RecSys '23: Seventeenth ACM Conference on Recommender Systems • 116 citations
Dai et al.
Rap-gen: Retrieval-augmented Patch Generation With Codet5 For Automatic Program Repair (2023) • Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 57 citations
Wang et al.
Enhancing Retrieval-augmented Large Language Models With Iterative Retrieval-generation Synergy (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 94 citations
Shao et al.
Ragas: Automated Evaluation Of Retrieval Augmented Generation (2023) • Arxiv • 60 citations
Es et al.
Perspectives On Large Language Models For Relevance Judgment (2023) • ICTIR '23: The 2023 ACM SIGIR International Conference on the Theory of Information Retrieval • 103 citations
Faggioli et al.
Lamp: When Large Language Models Meet Personalization (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Salemi et al.
REPLUG: Retrieval-augmented Black-box Language Models (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 76 citations
Shi et al.
CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 93 citations
Sain et al.
Enhancing Model Performance In Multilingual Information Retrieval With Comprehensive Data Engineering Techniques (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 109 citations
Zhang et al.
Fine-tuning Language Models For Factuality (2023) • No Venue
Tian et al.
Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 204 citations
Ding Jiang, Mang Ye
Large AI Model Empowered Multimodal Semantic Communications (2023) • IEEE Wireless Communications • 40 citations
Jiang et al.
Matching Patients To Clinical Trials With Large Language Models (2023) • Nature Communications • 111 citations
Jin et al.
Medcpt: Contrastive Pre-trained Transformers With Large-scale Pubmed Search Logs For Zero-shot Biomedical Information Retrieval (2023) • Bioinformatics • 73 citations
Jin et al.
Universal Instance Perception As Object Discovery And Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Yan et al.
Zero-shot Everything Sketch-based Image Retrieval, And In Explainable Style (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Lin et al.
PMC-CLIP: Contrastive Language-image Pre-training Using Biomedical Documents (2023) • Lecture Notes in Computer Science • 110 citations
Lin et al.
Chatdoctor: A Medical Chat Model Fine-tuned On A Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge (2023) • Cureus • 256 citations
Li et al.
Teach Llms To Personalize -- An Approach Inspired By Writing Education (2023) • No Venue
Li et al.
Repocoder: Repository-level Code Completion Through Iterative Retrieval And Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 66 citations
Zhang et al.
In-context Retrieval-augmented Language Models (2023) • Transactions of the Association for Computational Linguistics • 201 citations
Ram et al.
Knn-diffusion: Image Generation Via Large-scale Retrieval (2022) • Arxiv • 46 citations
Sheynin et al.
Ts2-net: Token Shift And Selection Transformer For Text-video Retrieval (2022) • Lecture Notes in Computer Science • 97 citations
Liu et al.
UMT: Unified Multi-modal Transformers For Joint Video Moment Retrieval And Highlight Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Liu et al.
Asymmetric Cross-scale Alignment For Text-based Person Search (2022) • IEEE Transactions on Multimedia • 54 citations
Ji et al.
An Efficiency Study For SPLADE Models (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 58 citations
Carlos Lassance, Stéphane Clinchant
Learning Audio-video Modalities From Image Captions (2022) • Lecture Notes in Computer Science • 48 citations
Nagrani et al.
Text And Code Embeddings By Contrastive Pre-training (2022) • Arxiv • 146 citations
Neelakantan et al.
SGPT: GPT Sentence Embeddings For Semantic Search (2022) • Arxiv • 56 citations
Niklas Muennighoff
Text Embeddings By Weakly-supervised Contrastive Pre-training (2022) • Arxiv • 107 citations
Wang et al.
Autoregressive Search Engines: Generating Substrings As Document Identifiers (2022) • Arxiv • 66 citations
Bevilacqua et al.
Cross-domain Deep Code Search With Meta Learning (2022) • Proceedings of the 44th International Conference on Software Engineering • 40 citations
Chai et al.
Corpusbrain: Pre-train A Generative Retrieval Model For Knowledge-intensive Language Tasks (2022) • Proceedings of the 31st ACM International Conference on Information & Knowledge Management • 52 citations
Chen et al.
Re-imagen: Retrieval-augmented Text-to-image Generator (2022) • Arxiv • 41 citations
Chen et al.
PRADA: Practical Black-box Adversarial Attacks Against Neural Ranking Models (2022) • ACM Transactions on Information Systems • 41 citations
Wu et al.
Dense Text Retrieval Based On Pretrained Language Models: A Survey (2022) • ACM Transactions on Information Systems • 85 citations
Zhao et al.
Pretrained Domain-specific Language Model For General Information Retrieval Tasks In The AEC Domain (2022) • Computers in Industry • 82 citations
Zheng et al.
Vista: Vision And Scene Text Aggregation For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Cheng et al.
When Not To Trust Language Models: Investigating Effectiveness Of Parametric And Non-parametric Memories (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 146 citations
Mallen et al.
"this Is My Unicorn, Fluffy": Personalizing Frozen Vision-language Representations (2022) • Lecture Notes in Computer Science • 42 citations
Cohen et al.
See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) • Lecture Notes in Computer Science • 102 citations
Shu et al.
Centerclip: Token Clustering For Efficient Text-video Retrieval (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 100 citations
Zhao et al.
Promptagator: Few-shot Dense Retrieval From 8 Examples (2022) • Arxiv • 46 citations
Dai et al.
Learning-by-narrating: Narrative Pre-training For Zero-shot Dialogue Comprehension (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Zhao et al.
X-CLIP: End-to-end Multi-grained Contrastive Learning For Video-text Retrieval (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 217 citations
Ma et al.
Reading-strategy Inspired Visual Representation Learning For Text-to-video Retrieval (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 67 citations
Dong et al.
Bridging Video-text Retrieval With Multiple Choice Questions (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ge et al.
X-pool: Cross-modal Language-video Attention For Text-video Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 170 citations
Gorti et al.
Retromae: Pre-training Retrieval-oriented Language Models Via Masked Auto-encoder (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 41 citations
Xiao et al.
Hit: Hierarchical Transformer With Momentum Contrast For Video-text Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 122 citations
Liu et al.
Lightningdot: Pre-training Visual-semantic Embeddings For Real-time Image-text Retrieval (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 79 citations
Sun et al.
What Makes Good In-context Examples For GPT-3? (2021) • Arxiv • 154 citations
Liu et al.
Scaling Up Visual And Vision-language Representation Learning With Noisy Text Supervision (2021) • International Conference on Machine Learning 2021 • 1191 citations
Jia et al.
Vlmo: Unified Vision-language Pre-training With Mixture-of-modality-experts (2021) • Arxiv • 288 citations
Bao et al.
Pseudo-relevance Feedback For Multiple Representation Dense Retrieval (2021) • Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval • 47 citations
Wang et al.
Crossclr: Cross-modal Contrastive Learning For Multi-modal Video Representations (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 121 citations
Zolfaghari et al.
Everything At Once -- Multi-modal Fusion Transformer For Video Retrieval (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Shvetsova et al.
Wav2clip: Learning Robust Audio Representations From CLIP (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 158 citations
Wu et al.
Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Miech et al.
Improving Video-text Retrieval By Multi-stream Corpus Alignment And Dual Softmax Loss (2021) • Arxiv • 66 citations
Cheng et al.
Learning Passage Impacts For Inverted Indexes (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 140 citations
Mallia et al.
Videoclip: Contrastive Pre-training For Zero-shot Video-text Understanding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 341 citations
Xu et al.
TEACHTEXT: Crossmodal Generalized Distillation For Text-video Retrieval (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 117 citations
Croitoru et al.
Few-shot Intent Classification And Slot Filling With Retrieved Examples (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 46 citations
Yu et al.
Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Salvador et al.
Societal Biases In Retrieved Contents: Measurement Framework And Adversarial Mitigation For BERT Rankers (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 44 citations
Navid Rekabsaz, Simone Kopeinik, Markus Schedl
Whitening Sentence Representations For Better Semantics And Faster Retrieval (2021) • Arxiv • 204 citations
Su et al.
Clip2video: Mastering Video-text Retrieval Via Image CLIP (2021) • Arxiv • 130 citations
Fang et al.
The Expando-mono-duo Design Pattern For Text Ranking With Pretrained Sequence-to-sequence Models (2021) • Arxiv • 40 citations
Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
Clip4clip: An Empirical Study Of CLIP For End To End Video Clip Retrieval (2021) • Arxiv • 113 citations
Luo et al.
Rethink Training Of BERT Rerankers In Multi-stage Retrieval Pipeline (2021) • Lecture Notes in Computer Science • 71 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Contextual Non-local Alignment Over Full-scale Representation For Text-based Person Search (2021) • Arxiv • 61 citations
Gao et al.
COIL: Revisit Exact Lexical Match In Information Retrieval With Contextualized Inverted List (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 161 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Unsupervised Corpus Aware Language Model Pre-training For Dense Passage Retrieval (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 88 citations
Luyu Gao, Jamie Callan
Simple Entity-centric Questions Challenge Dense Retrievers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 72 citations
Sciavolino et al.
Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 151 citations
Liu et al.
Learning-to-rank With BERT In Tf-ranking (2020) • Arxiv • 60 citations
Han et al.
Language-agnostic BERT Sentence Embedding (2020) • Arxiv • 194 citations
Feng et al.
Improving Efficient Neural Ranking Models With Cross-architecture Knowledge Distillation (2020) • Arxiv • 64 citations
Hofstätter et al.
Speaker-aware BERT For Multi-turn Response Selection In Retrieval-based Chatbots (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 148 citations
Gu et al.
Colbert: Efficient And Effective Passage Search Via Contextualized Late Interaction Over BERT (2020) • Arxiv • 189 citations
Omar Khattab, Matei Zaharia
Leveraging Code Generation To Improve Code Retrieval And Summarization Via Dual Learning (2020) • Proceedings of The Web Conference 2020 • 49 citations
Ye et al.
Creating Something From Nothing: Unsupervised Knowledge Distillation For Cross-modal Hashing (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Hu et al.
Sparse, Dense, And Attentional Representations For Text Retrieval (2020) • Transactions of the Association for Computational Linguistics • 154 citations
Luan et al.
Modularized Transfomer-based Ranking Framework (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 50 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Imagebert: Cross-modal Pre-training With Large-scale Weak-supervised Image-text Data (2020) • Arxiv • 154 citations
Qi et al.
Sparterm: Learning Term-based Sparse Representation For Fast Text Retrieval (2020) • Arxiv • 59 citations
Bai et al.
Pre-training Via Paraphrasing (2020) • Arxiv • 89 citations
Lewis et al.
Show, Recall, And Tell: Image Captioning With Recall Mechanism (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Wang et al.
Open-retrieval Conversational Question Answering (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 69 citations
Qu et al.
Pre-training Tasks For Embedding-based Large-scale Retrieval (2020) • Arxiv • 102 citations
Chang et al.
IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 391 citations
Chen et al.
Expressing Objects Just Like Words: Recurrent Visual Embedding For Image-text Matching (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Tianlang Chen, Jiebo Luo
Fine-grained Video-text Retrieval With Hierarchical Graph Reasoning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 306 citations
Chen et al.
Fine-grained Visual Textual Alignment For Cross-modal Retrieval Using Transformer Encoders (2020) • ACM Transactions on Multimedia Computing, Communications, and Applications • 118 citations
Messina et al.
Transformer Reasoning Network For Image-text Matching And Retrieval (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 45 citations
Messina et al.
SEA: Sentence Encoder Assembly For Video Retrieval By Textual Queries (2020) • IEEE Transactions on Multimedia • 53 citations
Li et al.
Query Resolution For Conversational Search With Limited Supervision (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 106 citations
Voskarides et al.
Rocketqa: An Optimized Training Approach To Dense Passage Retrieval For Open-domain Question Answering (2020) • Arxiv • 74 citations
Qu et al.
Analysing The Effect Of Clarifying Questions On Document Ranking In Conversational Search (2020) • Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval • 45 citations
Krasakis et al.
Reinforcement Learning For Weakly Supervised Temporal Grounding Of Natural Language In Untrimmed Videos (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 69 citations
Wu et al.
PROP: Pre-training With Representative Words Prediction For Ad-hoc Retrieval (2020) • Proceedings of the 14th ACM International Conference on Web Search and Data Mining • 41 citations
Ma et al.
Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) • Lecture Notes in Computer Science • 78 citations
Ma et al.
Few-shot Generative Conversational Query Rewriting (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 124 citations
Yu et al.
Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) • Arxiv • 47 citations
Dodds et al.
Cross-lingual Retrieval For Iterative Self-supervised Training (2020) • NeurIPS 2020 • 48 citations
Tran et al.
Polysemous Visual-semantic Embedding For Cross-modal Retrieval (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 243 citations
Yale Song, Mohammad Soleymani
Learning Dual Retrieval Module For Semi-supervised Relation Extraction (2019) • The World Wide Web Conference • 61 citations
Lin et al.
Coupling Retrieval And Meta-learning For Context-dependent Semantic Parsing (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 44 citations
Guo et al.
CEDR: Contextualized Embeddings For Document Ranking (2019) • SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 186 citations
MacAvaney et al.
Knowledge Guided Text Retrieval And Reading For Open Domain Question Answering (2019) • Arxiv • 84 citations
Min et al.
Unicoder-vl: A Universal Encoder For Vision And Language By Cross-modal Pre-training (2019) • Arxiv • 117 citations
Li et al.
Understanding The Behaviors Of BERT In Ranking (2019) • Arxiv • 145 citations
Qiao et al.
Cross-modal Interaction Networks For Query-based Moment Retrieval In Videos (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 238 citations
Zhang et al.
Asking Clarifying Questions In Open-domain Information-seeking Conversations (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 175 citations
Aliannejadi et al.
Coacor: Code Annotation For Code Retrieval With Reinforcement Learning (2019) • The World Wide Web Conference • 92 citations
Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun
Improving Multilingual Sentence Embedding Using Bi-directional Dual Encoder With Additive Margin Softmax (2019) • Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence • 80 citations
Yang et al.
Answering Complex Open-domain Questions Through Iterative Query Generation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 103 citations
Qi et al.
Multilingual Universal Sentence Encoder For Semantic Retrieval (2019) • Arxiv • 66 citations
Yang et al.
Passage Re-ranking With BERT (2019) • Arxiv • 347 citations
Rodrigo Nogueira, Kyunghyun Cho
CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 339 citations
Wang et al.
Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) • Proceedings of the 27th ACM International Conference on Multimedia • 145 citations
Wang et al.
Convert: Efficient And Accurate Conversational Representations From Transformers (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Henderson et al.
Entity-consistent End-to-end Task-oriented Dialogue System With KB Retriever (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 57 citations
Qin et al.
Learning A Text-video Embedding From Incomplete And Heterogeneous Data (2018) • Arxiv • 174 citations
Antoine Miech, Ivan Laptev, Josef Sivic
Response Ranking With Deep Matching Networks And External Knowledge In Information-seeking Conversation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 153 citations
Yang et al.
Retrieve And Refine: Improved Sequence Generation Models For Dialogue (2018) • Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI • 176 citations
Jason Weston, Emily Dinan, Alexander H. Miller
A Retrieve-and-edit Framework For Predicting Structured Outputs (2018) • Arxiv • 102 citations
Hashimoto et al.
Adversarial Sampling And Training For Semi-supervised Information Retrieval (2018) • The World Wide Web Conference • 79 citations
Dae Hoon Park, Yi Chang
A Joint Sequence Fusion Model For Video Question Answering And Retrieval (2018) • Lecture Notes in Computer Science • 331 citations
Youngjae Yu, Jongseok Kim, Gunhee Kim
Show, Tell And Discriminate: Image Captioning By Self-retrieval With Partially Labeled Data (2018) • Lecture Notes in Computer Science • 83 citations
Liu et al.
Learning To Mine Aligned Code And Natural Language Pairs From Stack Overflow (2018) • Proceedings of the 15th International Conference on Mining Software Repositories • 183 citations
Yin et al.
Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 289 citations
Gu et al.
Key-value Retrieval Networks For Task-oriented Dialogue (2017) • Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue • 51 citations
Mihail Eric, Christopher D. Manning
Neural Vector Spaces For Unsupervised Information Retrieval (2017) • ACM Transactions on Information Systems • 84 citations
Christophe van Gysel, Maarten de Rijke, Evangelos Kanoulas
End-to-end Concept Word Detection For Video Captioning, Retrieval, And Question Answering (2016) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 146 citations
Yu et al.
A Study Of Matchpyramid Models On Ad-hoc Retrieval (2016) • Arxiv • 84 citations
Pang et al.
Image Captioning With Deep Bidirectional Lstms (2016) • Proceedings of the 24th ACM international conference on Multimedia • 262 citations
Wang et al.
Learning Deep Representations Of Fine-grained Visual Descriptions (2016) • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 769 citations
Reed et al.

— S —

Scalability 1018 papers

Inference-time Hyper-scaling With KV Cache Compression (2025) • No Venue
Łańcucki et al.
Tattoo: Tool-grounded Thinking PRM For Test-time Scaling In Tabular Reasoning (2025) • No Venue
Zou et al.
MMR-V: What's Left Unsaid? A Benchmark For Multimodal Deep Reasoning In Videos (2025) • No Venue
Zhu et al.
Internvl3: Exploring Advanced Training And Test-time Recipes For Open-source Multimodal Models (2025) • No Venue
Zhu et al.
Is Extending Modality The Right Path Towards Omni-modality? (2025) • No Venue
Zhu et al.
Scaling Latent Reasoning Via Looped Language Models (2025) • No Venue
Zhu et al.
Vargpt-v1.1: Improve Visual Autoregressive Large Unified Model Via Iterative Instruction Tuning And Reinforcement Learning (2025) • No Venue
Zhuang et al.
Efficient Long-context Language Model Training By Core Attention Disaggregation (2025) • No Venue
Zhuang et al.
Deepprune: Parallel Scaling Without Inter-trace Redundancy (2025) • No Venue
Tu et al.
Beyond The 80/20 Rule: High-entropy Minority Tokens Drive Effective Reinforcement Learning For LLM Reasoning (2025) • No Venue
Wang et al.
Chain-of-retrieval Augmented Generation (2025) • No Venue
Wang et al.
Co-evolving LLM Coder And Unit Tester Via Reinforcement Learning (2025) • No Venue
Wang et al.
Generalizing Test-time Compute-optimal Scaling As An Optimizable Graph (2025) • No Venue
Wang et al.
Game-tars: Pretrained Foundation Models For Scalable Generalist Multimodal Game Agents (2025) • No Venue
Wang et al.
A Theoretical Study On Bridging Internal Probability And Self-consistency For LLM Reasoning (2025) • No Venue
Zhou et al.
Agentfly: Fine-tuning LLM Agents Without Fine-tuning Llms (2025) • No Venue
Zhou et al.
Knocking-heads Attention (2025) • No Venue
Zhou et al.
Dreamrenderer: Taming Multi-instance Attribute Control In Large-scale Text-to-image Models (2025) • No Venue
Zhou et al.
Evolving Language Models Without Labels: Majority Drives Selection, Novelty Promotes Variation (2025) • No Venue
Zhou et al.
Innogym: Benchmarking The Innovation Potential Of AI Agents (2025) • No Venue
Zhang et al.
Process-based Self-rewarding Language Models (2025) • No Venue
Zhang et al.
Redundancy Principles For Mllms Benchmarks (2025) • No Venue
Zhang et al.
A Survey Of Reinforcement Learning For Large Reasoning Models (2025) • No Venue
Zhang et al.
Tensor Product Attention Is All You Need (2025) • No Venue
Zhang et al.
Vision-language-vision Auto-encoder: Scalable Knowledge Distillation From Diffusion Models (2025) • No Venue
Zhang et al.
What, How, Where, And How Well? A Survey On Test-time Scaling In Large Language Models (2025) • No Venue
Zhang et al.
Babel: Open Multilingual Large Language Models Serving Over 90% Of Global Speakers (2025) • No Venue
Zhao et al.
Absolute Zero: Reinforced Self-play Reasoning With Zero Data (2025) • No Venue
Zhao et al.
Insights Into Deepseek-v3: Scaling Challenges And Reflections On Hardware For AI Architectures (2025) • No Venue
Zhao et al.
Lex-art: Rethinking Text Generation Via Scalable High-quality Data Synthesis (2025) • No Venue
Zhao et al.
Promptcot 2.0: Scaling Prompt Synthesis For Large Language Model Reasoning (2025) • No Venue
Zhao et al.
FARMER: Flow Autoregressive Transformer Over Pixels (2025) • No Venue
Zheng et al.
Architecture Decoupling Is Not All You Need For Unified Multimodal Model (2025) • No Venue
Zheng et al.
An Empirical Study Of Qwen3 Quantization (2025) • No Venue
Zheng et al.
Scaling Diffusion Transformers Efficiently Via μP (2025) • No Venue
Zheng et al.
Newtonbench: Benchmarking Generalizable Scientific Law Discovery In LLM Agents (2025) • No Venue
Zheng et al.
Stabilizing Reinforcement Learning With Llms: Formulation And Practices (2025) • No Venue
Zheng et al.
REASONEDIT: Towards Reasoning-enhanced Image Editing Models (2025) • No Venue
Yin et al.
Aionopedia: An LLM Agent Orchestrating Multimodal Learning For Ionic Liquid Discovery (2025) • No Venue
Yin et al.
Livemcp-101: Stress Testing And Diagnosing Mcp-enabled Agents On Challenging Queries (2025) • No Venue
Yin et al.
Reasoning Models Better Express Their Confidence (2025) • No Venue
Yoon et al.
Llada-v: Large Language Diffusion Models With Visual Instruction Tuning (2025) • No Venue
You et al.
Guided Self-evolving Llms With Minimal Human Supervision (2025) • No Venue
Yu et al.
Formalmath: Benchmarking Formal Mathematical Reasoning Of Large Language Models (2025) • No Venue
Yu et al.
DAPO: An Open-source LLM Reinforcement Learning System At Scale (2025) • No Venue
Yu et al.
Aworld: Orchestrating The Training Recipe For Agentic AI (2025) • No Venue
Yu et al.
RLPR: Extrapolating RLVR To General Domains Without Verifiers (2025) • No Venue
Yu et al.
Scaling Embedding Layers In Language Models (2025) • No Venue
Yu et al.
Trajselector: Harnessing Latent Representations For Efficient And Effective Best-of-n In Large Reasoning Model (2025) • No Venue
Yu et al.
Z1: Efficient Test-time Scaling With Code (2025) • No Venue
Yu et al.
Agent-r: Training Language Model Agents To Reflect Via Iterative Self-training (2025) • No Venue
Yuan et al.
Pixelrefer: A Unified Framework For Spatio-temporal Object Referring With Arbitrary Granularity (2025) • No Venue
Yuan et al.
Refeed: Multi-dimensional Summarization Refinement With Reflective Reasoning On Feedback (2025) • No Venue
Yun et al.
Reasoning Vectors: Transferring Chain-of-thought Capabilities Via Task Arithmetic (2025) • No Venue
Mohammad Zbeeb, Hasan Abed Al Kader Hammoud, Bernard Ghanem
Skywork-swe: Unveiling Data Scaling Laws For Software Engineering In LLMs (2025) • No Venue
Zeng et al.
Satori-swe: Evolutionary Test-time Scaling For Sample-efficient Software Engineering (2025) • No Venue
Zeng et al.
Exgrpo: Learning To Reason From Experience (2025) • No Venue
Zhan et al.
Booststep: Boosting Mathematical Capability Of Large Language Models Via Improved Single-step Reasoning (2025) • No Venue
Zhang et al.
Aixiv: A Next-generation Open Access Ecosystem For Scientific Discovery Generated By AI Scientists (2025) • No Venue
Zhang et al.
Autoenv: Automated Environments For Measuring Cross-environment Agent Learning (2025) • No Venue
Zhang et al.
Batch Speculative Decoding Done Right (2025) • No Venue
Zhang et al.
Domain2vec: Vectorizing Datasets To Find The Optimal Data Mixture Without Training (2025) • No Venue
Zhang et al.
Multishotmaster: A Controllable Multi-shot Video Generation Framework (2025) • No Venue
Wang et al.
Mobile-agent-e: Self-evolving Mobile Assistant For Complex Tasks (2025) • No Venue
Wang et al.
Octothinker: Mid-training Incentivizes Reinforcement Learning Scaling (2025) • No Venue
Wang et al.
Opencua: Open Foundations For Computer-use Agents (2025) • No Venue
Wang et al.
Reverse-engineered Reasoning For Open-ended Generation (2025) • No Venue
Wang et al.
Scoreflow: Mastering LLM Agent Workflows Via Score-based Preference Optimization (2025) • No Venue
Wang et al.
Scaling Pre-training To One Hundred Billion Data For Vision Language Models (2025) • No Venue
Wang et al.
Thoughts Are All Over The Place: On The Underthinking Of O1-like LLMs (2025) • No Venue
Wang et al.
A Systematic Analysis Of Hybrid Linear Attention (2025) • No Venue
Wang et al.
Test-time Scaling With Reflective Generative Model (2025) • No Venue
Wang et al.
Winning The Pruning Gamble: A Unified Approach To Joint Sample And Token Pruning For Efficient Supervised Fine-tuning (2025) • No Venue
Wang et al.
Vision-zero: Scalable VLM Self-improvement Via Strategic Gamified Self-play (2025) • No Venue
Wang et al.
Unigenbench++: A Unified Semantic Evaluation Benchmark For Text-to-image Generation (2025) • No Venue
Wang et al.
UI-TARS-2 Technical Report: Advancing GUI Agent With Multi-turn Reinforcement Learning (2025) • No Venue
Wang et al.
Worldpm: Scaling Human Preference Modeling (2025) • No Venue
Wang et al.
Sim-cot: Supervised Implicit Chain-of-thought (2025) • No Venue
Wei et al.
Unsupervised Post-training For Multi-modal LLM Reasoning Via GRPO (2025) • No Venue
Wei et al.
3D Scene Generation: A Survey (2025) • No Venue
Wen et al.
Less-to-more Generalization: Unlocking More Controllability By In-context Generation (2025) • No Venue
Wu et al.
The Bitter Lesson Learned From 2,000+ Multilingual Benchmarks (2025) • No Venue
Wu et al.
Bitnet Distillation (2025) • No Venue
Wu et al.
ARM: Adaptive Reasoning Model (2025) • No Venue
Wu et al.
Efficient Pretraining Length Scaling (2025) • No Venue
Wu et al.
COMPACT: Compositional Atomic-to-complex Visual Capability Tuning (2025) • No Venue
Wu et al.
Direct3d-s2: Gigascale 3D Generation Made Easy With Spatial Sparse Attention (2025) • No Venue
Wu et al.
From Hours To Minutes: Lossless Acceleration Of Ultra Long Sequence Generation Up To 100K Tokens (2025) • No Venue
Wu et al.
Grove MoE: Towards Efficient And Superior MoE LLMs With Adjugate Experts (2025) • No Venue
Wu et al.
Gui-actor: Coordinate-free Visual Grounding For GUI Agents (2025) • No Venue
Wu et al.
Hunyuanvideo 1.5 Technical Report (2025) • No Venue
Wu et al.
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization (2025) • No Venue
Wu et al.
Rewarddance: Reward Scaling In Visual Generation (2025) • No Venue
Wu et al.
Vidic: Video Difference Captioning (2025) • No Venue
Wu et al.
Superwriter: Reflection-driven Long-form Generation With Large Language Models (2025) • No Venue
Wu et al.
Synthrl: Scaling Visual Reasoning With Verifiable Data Synthesis (2025) • No Venue
Wu et al.
Agent0: Unleashing Self-evolving Agents From Zero Data Via Tool-integrated Reasoning (2025) • No Venue
Xia et al.
Open Data Synthesis For Deep Research (2025) • No Venue
Xia et al.
Scaling Language-centric Omnimodal Representation Learning (2025) • No Venue
Xiao et al.
Ui-genie: A Self-improving Approach For Iteratively Boosting MLLM-based Mobile GUI Agents (2025) • No Venue
Xiao et al.
Show-o2: Improved Native Unified Multimodal Models (2025) • No Venue
Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
SANA 1.5: Efficient Scaling Of Training-time And Inference-time Compute In Linear Diffusion Transformer (2025) • No Venue
Xie et al.
Scaling Computer-use Grounding Via User Interface Decomposition And Synthesis (2025) • No Venue
Xie et al.
Scalecap: Inference-time Scalable Image Captioning Via Dual-modality Debiasing (2025) • No Venue
Xing et al.
Gigatok: Scaling Visual Tokenizers To 3 Billion Parameters For Autoregressive Image Generation (2025) • No Venue
Xiong et al.
Genius: A Generalizable And Purely Unsupervised Self-training Framework For Advanced Reasoning (2025) • No Venue
Xu et al.
Scalable Chain Of Thoughts Via Elastic Reasoning (2025) • No Venue
Xu et al.
Single-stream Policy Optimization (2025) • No Venue
Zhongwen Xu, Zihan Ding
Streamingvlm: Real-time Understanding For Infinite Video Streams (2025) • No Venue
Xu et al.
Unveiling Downstream Performance Scaling Of LLMs: A Clustering-based Perspective (2025) • No Venue
Xu et al.
φ-decoding: Adaptive Foresight Sampling For Balanced Inference-time Exploration And Exploitation (2025) • No Venue
Xu et al.
General Agentic Memory Via Deep Research (2025) • No Venue
Yan et al.
Re:form -- Reducing Human Priors In Scalable Formal Software Verification With RL In LLMs: A Preliminary Study On Dafny (2025) • No Venue
Yan et al.
Hscodecomp: A Realistic And Expert-level Benchmark For Deep Search Agents In Hierarchical Rule Application (2025) • No Venue
Yang et al.
Multiverse: Your Language Models Secretly Decide How To Parallelize And Merge Generation (2025) • No Venue
Yang et al.
Qwen3 Technical Report (2025) • No Venue
Yang et al.
Ui2code^n: A Visual Language Model For Test-time Scalable Interactive UI-to-code Generation (2025) • No Venue
Yang et al.
Twinmarket: A Scalable Behavioral And Social Simulation For Financial Markets (2025) • No Venue
Yang et al.
Zerogui: Automating Online GUI Learning At Zero Human Cost (2025) • No Venue
Yang et al.
Survey On Evaluation Of LLM-based Agents (2025) • No Venue
Yehudai et al.
Llasa: Scaling Train-time And Inference-time Compute For Llama-based Speech Synthesis (2025) • No Venue
Ye et al.
Demystifying Long Chain-of-thought Reasoning In LLMs (2025) • No Venue
Yeo et al.
Worldmm: Dynamic Multimodal Memory Agent For Long Video Reasoning (2025) • No Venue
Yeo et al.
Wan: Open And Advanced Large-scale Video Generative Models (2025) • No Venue
Wanteam et al.
Every Activation Boosted: Scaling General Reasoner To 1 Trillion Open Language Foundation (2025) • No Venue
Ling-Team et al.
Virtual Width Networks (2025) • No Venue
Seed et al.
Minimax-01: Scaling Foundation Models With Lightning Attention (2025) • No Venue
Minimax et al.
The Markovian Thinker (2025) • No Venue
Aghajohari et al.
FS-DAG: Few Shot Domain Adapting Graph Networks For Visually Rich Document Understanding (2025) • No Venue
Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
Ming-flash-omni: A Sparse, Unified Architecture For Multimodal Perception And Generation (2025) • No Venue
Ai et al.
Open Deep Search: Democratizing Search With Open-source Reasoning Agents (2025) • No Venue
Alzubi et al.
Amo-bench: Large Language Models Still Struggle In High School Math Competitions (2025) • No Venue
An et al.
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds (2025) • No Venue
Anokhin et al.
Tabstar: A Foundation Tabular Model With Semantically Target-aware Representations (2025) • No Venue
Alan Arazi, Eilam Shapira, Roi Reichart
Hybrid Architectures For Language Models: Systematic Analysis And Design Insights (2025) • No Venue
Bae et al.
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents (2025) • No Venue
Badertdinov et al.
Qwen3-vl Technical Report (2025) • No Venue
Bai et al.
Reasoning Language Models: A Blueprint (2025) • No Venue
Besta et al.
Eurobert: Scaling Multilingual Encoders For European Languages (2025) • No Venue
Boizard et al.
When Does Reasoning Matter? A Controlled Study Of Reasoning's Contribution To Model Performance (2025) • No Venue
Boizard et al.
Divmerge: A Divergence-based Model Merging Method For Multi-tasking (2025) • No Venue
Brahim et al.
Go-with-the-flow: Motion-controllable Video Diffusion Models Using Real-time Warped Noise (2025) • No Venue
Burgert et al.
Distillation Scaling Laws (2025) • No Venue
Busbridge et al.
Scaling Spatial Intelligence With Multimodal Foundation Models (2025) • No Venue
Cai et al.
MORSE-500: A Programmatically Controllable Video Benchmark To Stress-test Multimodal Reasoning (2025) • No Venue
Cai et al.
Iterresearch: Rethinking Long-horizon Agents Via Markovian State Reconstruction (2025) • No Venue
Chen et al.
Why Do Multi-agent LLM Systems Fail? (2025) • No Venue
Cemri et al.
Webscale-rl: Automated Data Pipeline For Scaling RL Data To Pretraining Levels (2025) • No Venue
Cen et al.
Dip: Taming Diffusion Models In Pixel Space (2025) • No Venue
Chen et al.
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis (2025) • No Venue
Chen et al.
Blip3-o: A Family Of Fully Open Unified Multimodal Models-architecture, Training And Dataset (2025) • No Venue
Chen et al.
Blip3o-next: Next Frontier Of Native Image Generation (2025) • No Venue
Chen et al.
Code2video: A Code-centric Paradigm For Educational Video Generation (2025) • No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
ERA: Transforming VLMs Into Embodied Agents Via Embodied Prior Learning And Online Reinforcement Learning (2025) • No Venue
Chen et al.
Enigmata: Scaling Logical Reasoning In Large Language Models With Synthetic Verifiable Puzzles (2025) • No Venue
Chen et al.
Geometrically-constrained Agent For Spatial Reasoning (2025) • No Venue
Chen et al.
Livecc: Learning Video LLM With Streaming Speech Transcription At Scale (2025) • No Venue
Chen et al.
Moca: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings (2025) • No Venue
Chen et al.
Parallel Scaling Law For Language Models (2025) • No Venue
Chen et al.
Retroinfer: A Vector-storage Approach For Scalable Long-context LLM Inference (2025) • No Venue
Chen et al.
Stockbench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? (2025) • No Venue
Chen et al.
π_RL: Online RL Fine-tuning For Flow-based Vision-language-action Models (2025) • No Venue
Chen et al.
Wikontic: Constructing Wikidata-aligned, Ontology-aware Knowledge Graphs With Large Language Models (2025) • No Venue
Chepurova et al.
Gold-medalist Performance In Solving Olympiad Geometry With Alphageometry2 (2025) • No Venue
Chervonyi et al.
Metaclip 2: A Worldwide Scaling Recipe (2025) • No Venue
Chuang et al.
The Entropy Mechanism Of Reinforcement Learning For Reasoning Language Models (2025) • No Venue
Cui et al.
Process Reinforcement Through Implicit Rewards (2025) • No Venue
Cui et al.
Self-forcing++: Towards Minute-scale High-quality Video Generation (2025) • No Venue
Cui et al.
Ebt-policy: Energy Unlocks Emergent Physical Reasoning Capabilities (2025) • No Venue
Davies et al.
Alayadb: The Data Foundation For Efficient And Effective Long-context LLM Inference (2025) • No Venue
Deng et al.
Self-improvement In Multimodal Large Language Models: A Survey (2025) • No Venue
Deng et al.
Exploring The Sustainable Scaling Of AI Dilemma: A Projective Study Of Corporations' AI Environmental Impacts (2025) • No Venue
Desroches et al.
Agentic Entropy-balanced Policy Optimization (2025) • No Venue
Dong et al.
Streaming Diloco With Overlapping Communication: Towards A Distributed Free Lunch (2025) • No Venue
Douillard et al.
Pre-trained Policy Discriminators Are General Reward Models (2025) • No Venue
Dou et al.
MM-PRM: Enhancing Multimodal Mathematical Reasoning With Scalable Step-level Supervision (2025) • No Venue
Du et al.
MMTEB: Massive Multilingual Text Embedding Benchmark (2025) • No Venue
Enevoldsen et al.
SSRL: Self-search Reinforcement Learning (2025) • No Venue
Fan et al.
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action (2025) • No Venue
Fang et al.
On Path To Multimodal Generalist: General-level And General-bench (2025) • No Venue
Fei et al.
Optimal Scaling Needs Optimal Norm (2025) • No Venue
Filatov et al.
Onethinker: All-in-one Reasoning Model For Image And Video (2025) • No Venue
Feng et al.
Reactive Transformer (RxT) -- Stateful Real-time Processing For Event-driven Reactive Language Models (2025) • No Venue
Adam Filipek
Nemotron-flash: Towards Latency-optimal Hybrid Small Language Models (2025) • No Venue
Fu et al.
Think-at-hard: Selective Latent Iterations To Improve Reasoning Language Models (2025) • No Venue
Fu et al.
Listener-rewarded Thinking In VLMs For Image Preferences (2025) • No Venue
Gambashidze et al.
Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) • No Venue
Gao et al.
Agentscope 1.0: A Developer-centric Framework For Building Agentic Applications (2025) • No Venue
Gao et al.
A Survey Of Self-evolving Agents: On Path To Artificial Super Intelligence (2025) • No Venue
Gao et al.
R&B: Domain Regrouping And Data Mixture Balancing For Efficient Foundation Model Training (2025) • No Venue
Ge et al.
Inverse Scaling In Test-time Compute (2025) • No Venue
Gema et al.
Scaling Up Test-time Compute With Latent Reasoning: A Recurrent Depth Approach (2025) • No Venue
Geiping et al.
Centurio: On Drivers Of Multilingual Ability Of Large Vision-language Model (2025) • No Venue
Geigle et al.
Inside-out: Hidden Factual Knowledge In LLMs (2025) • No Venue
Gekhman et al.
You Do Not Fully Utilize Transformer's Representation Capacity (2025) • No Venue
Gerasimov et al.
Energy-based Transformers Are Scalable Learners And Thinkers (2025) • No Venue
Gladstone et al.
Great Models Think Alike And This Undermines AI Oversight (2025) • No Venue
Goel et al.
RADLADS: Rapid Attention Distillation To Linear Attention Decoders At Scale (2025) • No Venue
Goldstein et al.
Multi-token Attention (2025) • No Venue
Golovneva et al.
Seedream 2.0: A Native Chinese-english Bilingual Image Generation Foundation Model (2025) • No Venue
Gong et al.
Long-context Autoregressive Video Modeling With Next-frame Prediction (2025) • No Venue
Yuchao Gu, Weijia Mao, Mike Zheng Shou
Skywork Open Reasoner 1 Technical Report (2025) • No Venue
He et al.
Web-cogreasoner: Towards Knowledge-induced Cognitive Reasoning For Web Agents (2025) • No Venue
Guo et al.
Rag-anything: All-in-one RAG Framework (2025) • No Venue
Guo et al.
MAGA: Massive Genre-audience Reformulation To Pretraining Corpus Expansion (2025) • No Venue
Xintong Hao, Ke Shen, Chenggang Li
Trillion 7B Technical Report (2025) • No Venue
Han et al.
Learnings From Scaling Visual Tokenizers For Reconstruction And Generation (2025) • No Venue
Hansen-Estruch et al.
ROOT: Robust Orthogonalized Optimizer For Neural Network Training (2025) • No Venue
He et al.
RLP: Reinforcement As A Pretraining Objective (2025) • No Venue
Hatamizadeh et al.
Don't Overthink It. Preferring Shorter Thinking Chains For Improved LLM Reasoning (2025) • No Venue
Hassid et al.
Protoreasoning: Prototypes As The Foundation For Generalizable Reasoning In LLMs (2025) • No Venue
He et al.
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy (2025) • No Venue
Hou et al.
Glm-4.1v-thinking: Towards Versatile Multimodal Reasoning With Scalable Reinforcement Learning (2025) • No Venue
Hong et al.
Group Think: Multiple Concurrent Reasoning Agents Collaborating At Token Level Granularity (2025) • No Venue
Hsu et al.
The Art Of Scaling Reinforcement Learning Compute For LLMs (2025) • No Venue
Khatri et al.
Quest: Incentivizing LLMs To Generate Difficult Problems (2025) • No Venue
Hu et al.
Beyond 'Aha!': Toward Systematic Meta-abilities Alignment In Large Reasoning Models (2025) • No Venue
Hu et al.
Attentioninfluence: Adopting Attention Head Influence For Weak-to-strong Pretraining Data Selection (2025) • No Venue
Hua et al.
Over-tokenized Transformer: Vocabulary Is Generally Worth Scaling (2025) • No Venue
Huang et al.
Loong: Synthesize Long Chain-of-thoughts At Scale Through Verifiers (2025) • No Venue
Huang et al.
ILLUME+: Illuminating Unified MLLM With Dual Visual Tokenization And Diffusion Refinement (2025) • No Venue
Huang et al.
Live Avatar: Streaming Real-time Audio-driven Avatar Generation With Infinite Length (2025) • No Venue
Huang et al.
O1 Replication Journey -- Part 3: Inference-time Scaling For Medical Reasoning (2025) • No Venue
Huang et al.
Low-probability Tokens Sustain Exploration In Reinforcement Learning With Verifiable Reward (2025) • No Venue
Huang et al.
Dynamic Chunking For End-to-end Hierarchical Sequence Modeling (2025) • No Venue
Sukjun Hwang, Brandon Wang, Albert Gu
Dype: Dynamic Position Extrapolation For Ultra High Resolution Diffusion (2025) • No Venue
Issachar et al.
Upsample What Matters: Region-adaptive Latent Sampling For Accelerated Diffusion Transformers (2025) • No Venue
Jeong et al.
Verltool: Towards Holistic Agentic Reinforcement Learning With Tool Use (2025) • No Venue
Jiang et al.
Loopholing Discrete Diffusion: Deterministic Bypass Of The Sampling Wall (2025) • No Venue
Jo et al.
Is That Your Final Answer? Test-time Scaling Improves Selective Question Answering (2025) • No Venue
William Jurayj, Jeffrey Cheng, Benjamin van Durme
Gralora: Granular Low-rank Adaptation For Parameter-efficient Fine-tuning (2025) • No Venue
Jung et al.
T1: Tool-integrated Self-verification For Test-time Compute Scaling In Small Language Models (2025) • No Venue
Minki Kang, Jongwon Jeong, Jaewoong Cho
Inference-time Scaling For Flow Models Via Stochastic Generation And Rollover Budget Forcing (2025) • No Venue
Kim et al.
Kormo: Korean Open Reasoning Model For Everyone (2025) • No Venue
Kim et al.
Flex-judge: Think Once, Judge Anywhere (2025) • No Venue
Ko et al.
Mini-o3: Scaling Up Reasoning Patterns And Interaction Turns For Visual Search (2025) • No Venue
Lai et al.
Stream3r: Scalable Sequential 3D Reconstruction With Causal Transformer (2025) • No Venue
Lan et al.
Evolving Deeper LLM Thinking (2025) • No Venue
Lee et al.
Gemini Embedding: Generalizable Embeddings From Gemini (2025) • No Venue
Lee et al.
Dacomp: Benchmarking Data Agents Across The Full Data Intelligence Lifecycle (2025) • No Venue
Lei et al.
Vfxmaster: Unlocking Dynamic Visual Effect Generation Via In-context Learning (2025) • No Venue
Li et al.
MITS: Enhanced Tree Search Reasoning For LLMs Via Pointwise Mutual Information (2025) • No Venue
Li et al.
Knapsack RL: Unlocking Exploration Of LLMs Via Optimizing Budget Allocation (2025) • No Venue
Li et al.
MANZANO: A Simple And Scalable Unified Multimodal Model With A Hybrid Vision Tokenizer (2025) • No Venue
Li et al.
S*: Test Time Scaling For Code Generation (2025) • No Venue
Li et al.
PRIMA.CPP: Speeding Up 70B-scale LLM Inference On Low-resource Everyday Home Clusters (2025) • No Venue
Li et al.
Model Merging In Pre-training Of Large Language Models (2025) • No Venue
Li et al.
Radial Attention: O(n log n) Sparse Attention With Energy Decay For Long Video Generation (2025) • No Venue
Li et al.
Seek In The Dark: Reasoning Via Test-time Instance-level Policy Gradient In Latent Space (2025) • No Venue
Li et al.
Simplevla-rl: Scaling VLA Training Via Reinforcement Learning (2025) • No Venue
Li et al.
Taming LLMs By Scaling Learning Rates With Gradient Grouping (2025) • No Venue
Li et al.
Test-time Preference Optimization: On-the-fly Alignment Via Iterative Textual Feedback (2025) • No Venue
Li et al.
Transmamba: Flexibly Switching Between Transformer And Mamba (2025) • No Venue
Li et al.
Truth In The Few: High-value Data Selection For Efficient Multi-modal Reasoning (2025) • No Venue
Li et al.
Webthinker: Empowering Large Reasoning Models With Deep Research Capability (2025) • No Venue
Li et al.
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies (2025) • No Venue
Liang et al.
Towards Personalized Deep Research: Benchmarks And Evaluations (2025) • No Venue
Liang et al.
SEAP: Training-free Sparse Expert Activation Pruning Unlock The Brainpower Of Large Language Models (2025) • No Venue
Liang et al.
Fractured Chain-of-thought Reasoning (2025) • No Venue
Liao et al.
Motif 2 12.7B Technical Report (2025) • No Venue
Lim et al.
Omnihuman-1: Rethinking The Scaling-up Of One-stage Conditioned Human Animation Models (2025) • No Venue
Lin et al.
Scaling Code-assisted Chain-of-thoughts And Instructions For Model Reasoning (2025) • No Venue
Lin et al.
Mcpeval: Automatic MCP-based Deep Evaluation For AI Agent Models (2025) • No Venue
Liu et al.
Longllada: Unlocking Long Context Capabilities In Diffusion LLMs (2025) • No Venue
Liu et al.
Does Dinov3 Set A New Medical Vision Standard? (2025) • No Venue
Liu et al.
Costbench: Evaluating Multi-turn Cost-optimal Planning And Adaptation In Dynamic Environments For LLM Tool-use Agents (2025) • No Venue
Liu et al.
Budget-aware Tool-use Enables Effective Agent Scaling (2025) • No Venue
Liu et al.
Inference-time Scaling For Generalist Reward Modeling (2025) • No Venue
Liu et al.
Quantization Hurts Reasoning? An Empirical Study On Quantized Reasoning Models (2025) • No Venue
Liu et al.
Openvision 2: A Family Of Generative Pretrained Visual Encoders For Multimodal Learning (2025) • No Venue
Liu et al.
Shifting AI Efficiency From Model-centric To Data-centric Compression (2025) • No Venue
Liu et al.
Rectifying LLM Thought From Lens Of Optimization (2025) • No Venue
Liu et al.
Sequential Diffusion Language Models (2025) • No Venue
Liu et al.
Rstar-coder: Scaling Competitive Code Reasoning With A Large-scale Verified Dataset (2025) • No Venue
Liu et al.
Scalecua: Scaling Open-source Computer Use Agents With Cross-platform Data (2025) • No Venue
Liu et al.
SPIRAL: Self-play On Zero-sum Games Incentivizes Reasoning Via Multi-agent Multi-turn Reinforcement Learning (2025) • No Venue
Liu et al.
Skywork-reward-v2: Scaling Preference Data Curation Via Human-ai Synergy (2025) • No Venue
Liu et al.
Video-t1: Test-time Scaling For Video Generation (2025) • No Venue
Liu et al.
TUNA: Taming Unified Visual Representations For Native Unified Multimodal Models (2025) • No Venue
Liu et al.
Voxtral (2025) • No Venue
Liu et al.
BIOMEDICA: An Open Biomedical Image-caption Archive, Dataset, And Vision-language Models Derived From Scientific Literature (2025) • No Venue
Lozano et al.
Ovis2.5 Technical Report (2025) • No Venue
Lu et al.
R-horizon: How Far Can Your Large Reasoning Model Really Go In Breadth And Depth? (2025) • No Venue
Lu et al.
RPG: A Repository Planning Graph For Unified And Scalable Codebase Generation (2025) • No Venue
Luo et al.
Being-h0: Vision-language-action Pretraining From Large-scale Human Videos (2025) • No Venue
Luo et al.
Beyond English: Toward Inclusive And Scalable Multilingual Machine Translation With LLMs (2025) • No Venue
Luo et al.
Rethinking Diverse Human Preference Learning Through Principal Component Analysis (2025) • No Venue
Luo et al.
Mcp-universe: Benchmarking Large Language Models With Real-world Model Context Protocol Servers (2025) • No Venue
Luo et al.
URSA: Understanding And Verifying Chain-of-thought Reasoning In Multimodal Mathematics (2025) • No Venue
Luo et al.
Token-shuffle: Towards High-resolution Image Generation With Autoregressive Models (2025) • No Venue
Ma et al.
Bitnet B1.58 2B4T Technical Report (2025) • No Venue
Ma et al.
Inference-time Scaling For Diffusion Models Beyond Scaling Denoising Steps (2025) • No Venue
Ma et al.
TCIA: A Task-centric Instruction Augmentation Method For Instruction Finetuning (2025) • No Venue
Ma et al.
Veomni: Scaling Any Modality Model Training With Model-centric Distributed Recipe Zoo (2025) • No Venue
Ma et al.
Scaling Analysis Of Interleaved Speech-text Language Models (2025) • No Venue
Maimon et al.
Slamming: Training A Speech Language Model On One GPU In A Day (2025) • No Venue
Gallil Maimon, Avishai Elmakies, Yossi Adi
Beyondweb: Lessons From Scaling Synthetic Data For Trillion-scale Pretraining (2025) • No Venue
Maini et al.
Souper-model: How Simple Arithmetic Unlocks State-of-the-art LLM Performance (2025) • No Venue
Maiti et al.
Gemstones: A Model Suite For Multi-faceted Scaling Laws (2025) • No Venue
McLeish et al.
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems (2025) • No Venue
Meghwani et al.
Multi-agent Tool-integrated Policy Optimization (2025) • No Venue
Mo et al.
Livemcpbench: Can Agents Navigate An Ocean Of MCP Tools? (2025) • No Venue
Mo et al.
S1: Simple Test-time Scaling (2025) • No Venue
Muennighoff et al.
Scalable-softmax Is Superior For Attention (2025) • No Venue
Ken M. Nakanishi
Large Language Diffusion Models (2025) • No Venue
Nie et al.
Diffusion Language Models Are Super Data Learners (2025) • No Venue
Ni et al.
Annotation-efficient Universal Honesty Alignment (2025) • No Venue
Ni et al.
Benchmarking LLMs' Swarm Intelligence (2025) • No Venue
Ruan et al.
Large Language Models Meet Extreme Multi-label Classification: Scaling And Multi-modal Framework (2025) • No Venue
Ortego et al.
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective On Continual Pre-training (2025) • No Venue
Ou et al.
Learning Adaptive Parallel Reasoning With Language Models (2025) • No Venue
Pan et al.
REST: Stress Testing Large Reasoning Models By Asking Multiple Problems At Once (2025) • No Venue
Pan et al.
A Survey On Inference Engines For Large Language Models: Perspectives On Optimization And Efficiency (2025) • No Venue
Park et al.
Fineweb2: One Pipeline To Scale Them All -- Adapting Pre-training Data Processing To Every Language (2025) • No Venue
Penedo et al.
RWKV-7 "Goose" With Expressive Dynamic State Evolution (2025) • No Venue
Peng et al.
FAST: Efficient Action Tokenization For Vision-language-action Models (2025) • No Venue
Pertsch et al.
Toolrl: Reward Is All Tool Learning Needs (2025) • No Venue
Qian et al.
Lumina-image 2.0: A Unified And Efficient Image Generative Framework (2025) • No Venue
Qin et al.
Saffron-1: Towards An Inference Scaling Paradigm For LLM Safety Assurance (2025) • No Venue
Qiu et al.
Demons In The Detail: On Implementing Load Balancing Loss For Training Specialized Mixture-of-expert Models (2025) • No Venue
Qiu et al.
An Empirical Study Of Autoregressive Pre-training From Videos (2025) • No Venue
Rajasegaran et al.
Embodiedonevision: Interleaved Vision-text-action Pretraining For General Robot Control (2025) • No Venue
Qu et al.
One Small Step In Latent, One Giant Leap For Pixels: Fast Latent Upscale Adapter For Your Diffusion Models (2025) • No Venue
Aleksandr Razin, Danil Kazantsev, Ilya Makarov
Vamba: Understanding Hour-long Videos With Hybrid Mamba-transformers (2025) • No Venue
Ren et al.
REFINE-AF: A Task-agnostic Framework To Align Language Models Via Self-generated Instructions Using Reinforcement Learning From Automated Feedback (2025) • No Venue
Roy et al.
Beyond Memorization: Extending Reasoning Depth With Recurrence, Memory And Test-time Compute Scaling (2025) • No Venue
Rodkin et al.
Fast And Simplex: 2-simplicial Attention In Triton (2025) • No Venue
Roy et al.
When Models Lie, We Learn: Multilingual Span-level Hallucination Detection With Psiloqa (2025) • No Venue
Rykov et al.
Benchmarking Optimizers For Large Language Model Pretraining (2025) • No Venue
Andrei Semenov, Matteo Pagliardini, Martin Jaggi
NER Retriever: Zero-shot Named Entity Retrieval With Type-aware Embeddings (2025) • No Venue
Shachar et al.
Rstar2-agent: Agentic Reasoning Technical Report (2025) • No Venue
Shang et al.
Continuous Autoregressive Language Models (2025) • No Venue
Shao et al.
Deepseekmath-v2: Towards Self-verifiable Mathematical Reasoning (2025) • No Venue
Shao et al.
When Tokens Talk Too Much: A Survey Of Multimodal Long-context Token Compression Across Images, Videos, And Audios (2025) • No Venue
Shao et al.
Dupo: Enabling Reliable LLM Self-verification Via Dual Preference Optimization (2025) • No Venue
She et al.
Satori: Reinforcement Learning With Chain-of-action-thought Enhances LLM Reasoning Via Autoregressive Search (2025) • No Venue
Shen et al.
Exploring Data Scaling Trends And Effects In Reinforcement Learning From Human Feedback (2025) • No Venue
Shen et al.
SSA: Sparse Sparse Attention By Aligning Full And Sparse Attention Outputs In Feature Space (2025) • No Venue
Shen et al.
Scaling Vision Pre-training To 4K Resolution (2025) • No Venue
Shi et al.
Taskcraft: Automated Generation Of Agentic Tasks (2025) • No Venue
Shi et al.
Scaling Laws For Optimal Data Mixtures (2025) • No Venue
Shukor et al.
LADDER: Self-improving LLMs Through Recursive Problem Decomposition (2025) • No Venue
Toby Simonds, Akira Yoshiyama
Dinov3 (2025) • No Venue
Siméoni et al.
The Illusion Of Diminishing Returns: Measuring Long Horizon Execution In LLMs (2025) • No Venue
Sinha et al.
Linguistic Generalizability Of Test-time Scaling In Mathematical Reasoning (2025) • No Venue
Son et al.
Chain-of-model Learning For Language Model (2025) • No Venue
Song et al.
Paperbench: Evaluating AI's Ability To Replicate AI Research (2025) • No Venue
Starace et al.
Video-lmm Post-training: A Deep Dive Into Video Reasoning With Large Multimodal Models (2025) • No Venue
Tang et al.
Toolorchestra: Elevating Intelligence Via Efficient Model And Tool Orchestration (2025) • No Venue
Su et al.
Expanding RL With Verifiable Rewards Across Diverse Domains (2025) • No Venue
Su et al.
Scaling Agents Via Continual Pre-training (2025) • No Venue
Su et al.
LASP-2: Rethinking Sequence Parallelism For Linear Attention And Its Hybrid (2025) • No Venue
Sun et al.
Au-harness: An Open-source Toolkit For Holistic Evaluation Of Audio LLMs (2025) • No Venue
Surapaneni et al.
Zerosearch: Incentivize The Search Capability Of LLMs Without Searching (2025) • No Venue
Sun et al.
Auto-regressive Vs Flow-matching: A Comparative Study Of Modeling Paradigms For Text-to-music Generation (2025) • No Venue
Or Tal, Felix Kreuk, Yossi Adi
Enabling Scalable Oversight Via Self-evolving Critic (2025) • No Venue
Tang et al.
Gemma 3 Technical Report (2025) • No Venue
Team et al.
COIG-P: A High-quality And Large-scale Chinese Preference Dataset For Alignment With Human Values (2025) • No Venue
Team et al.
Every Step Evolves: Scaling Reinforcement Learning For Trillion-scale Thinking Model (2025) • No Venue
Team et al.
Kimi K1.5: Scaling Reinforcement Learning With LLMs (2025) • No Venue
Team et al.
Gigabrain-0: A World Model-powered Vision-language-action Model (2025) • No Venue
Team et al.
Supergpqa: Scaling LLM Evaluation Across 285 Graduate Disciplines (2025) • No Venue
Team et al.
Nex-n1: Agentic Models Trained Via A Unified Ecosystem For Large-scale Environment Construction (2025) • No Venue
Team et al.
Mirothinker: Pushing The Performance Boundaries Of Open-source Research Agents Via Model, Context, And Interactive Scaling (2025) • No Venue
Team et al.
Tongyi Deepresearch Technical Report (2025) • No Venue
Team et al.
Hermes 4 Technical Report (2025) • No Venue
Teknium et al.
Improving Text Embeddings For Smaller Language Models Using Contrastive Fine-tuning (2024) • No Venue
Trapoom Ukarapol, Zhicheng Lee, Amy Xin
No "Zero-shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (2024) • No Venue
Udandarao et al.
From Words To Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-context Examples (2024) • No Venue
Vacareanu et al.
Apollo: An Exploration Of Video Understanding In Large Multimodal Models (2024) • No Venue
Zohar et al.
Fusechat: Knowledge Fusion Of Chat Models (2024) • No Venue
Wan et al.
Tnt-llm: Text Mining At Scale With Large Language Models (2024) • No Venue
Wan et al.
Phi-3 Technical Report: A Highly Capable Language Model Locally On Your Phone (2024) • No Venue
Abdin et al.
Star Attention: Efficient LLM Inference Over Long Sequences (2024) • No Venue
Shantanu Acharya, Fei Jia, Boris Ginsburg
Yi: Open Foundation Models By 01.AI (2024) • No Venue
Ai et al.
Unibench: Visual Reasoning Requires Rethinking Vision-language Beyond Scaling (2024) • No Venue
Al-Tahan et al.
Training-free Long-context Scaling Of Large Language Models (2024) • No Venue
An et al.
Large Language Models As Markov Chains (2024) • No Venue
Zekri et al.
Anygpt: Unified Multimodal LLM With Discrete Sequence Modeling (2024) • No Venue
Zhan et al.
Blackmamba: Mixture Of Experts For State-space Models (2024) • No Venue
Anthony et al.
Chronos: Learning The Language Of Time Series (2024) • No Venue
Ansari et al.
Slicegpt: Compress Large Language Models By Deleting Rows And Columns (2024) • No Venue
Ashkboos et al.
MINT-1T: Scaling Open-source Multimodal Data By 10x: A Multimodal Dataset With One Trillion Tokens (2024) • No Venue
Awadalla et al.
Longwriter: Unleashing 10,000+ Word Generation From Long Context Llms (2024) • No Venue
Bai et al.
Skywork-math: Data Scaling Laws For Mathematical Reasoning In Large Language Models -- The Story Goes On (2024) • No Venue
Zeng et al.
Quiet-star: Language Models Can Teach Themselves To Think Before Speaking (2024) • No Venue
Zelikman et al.
Xlstm: Extended Long Short-term Memory (2024) • Arxiv • 81 citations
Beck et al.
Llm2vec: Large Language Models Are Secretly Powerful Text Encoders (2024) • No Venue
Behnamghader et al.
SUTRA: Scalable Multilingual Language Model Architecture (2024) • No Venue
Bendale et al.
Windows Agent Arena: Evaluating Multi-modal OS Agents At Scale (2024) • No Venue
Bonatti et al.
Pangea: A Fully Open Multilingual Multimodal LLM For 39 Languages (2024) • No Venue
Yue et al.
Scaling Synthetic Data Creation With 1,000,000,000 Personas (2024) • No Venue
Chan et al.
3dtopia-xl: Scaling High-quality 3D Asset Generation Via Primitive Diffusion (2024) • No Venue
Chen et al.
Changemamba: Remote Sensing Change Detection With Spatiotemporal State Space Model (2024) • IEEE Transactions on Geoscience and Remote Sensing • 172 citations
Chen et al.
Expanding Performance Boundaries Of Open-source Multimodal Models With Model, Data, And Test-time Scaling (2024) • No Venue
Chen et al.
Internet Of Agents: Weaving A Web Of Heterogeneous Agents For Collaborative Intelligence (2024) • No Venue
Chen et al.
Octopus V2: On-device Language Model For Super Agent (2024) • No Venue
Wei Chen, Zhiyuan Li
Sharegpt4video: Improving Video Understanding And Generation With Better Captions (2024) • No Venue
Chen et al.
Spatialvlm: Endowing Vision-language Models With Spatial Reasoning Capabilities (2024) • No Venue
Chen et al.
Xtrimopglm: Unified 100b-scale Pre-trained Transformer For Deciphering The Language Of Protein (2024) • Arxiv • 61 citations
Chen et al.
From MOOC To MAIC: Reshaping Online Teaching And Learning Through Llm-driven Agents (2024) • No Venue
Yu et al.
Breaking The Memory Barrier: Near Infinite Batch Size Scaling For Contrastive Loss (2024) • No Venue
Cheng et al.
Taming Multimodal Joint Training For High-quality Video-to-audio Synthesis (2024) • No Venue
Cheng et al.
Instruction Pre-training: Language Models Are Supervised Multitask Learners (2024) • No Venue
Cheng et al.
LEGENT: Open Platform For Embodied Agents (2024) • No Venue
Cheng et al.
Multi-lora Composition For Image Generation (2024) • No Venue
Zhong et al.
Efficiently Democratizing Medical Llms For 50 Languages Via A Mixture Of Language Family Experts (2024) • No Venue
Zheng et al.
Qwen2-audio Technical Report (2024) • No Venue
Chu et al.
Toto: Time Series Optimized Transformer For Observability (2024) • No Venue
Cohen et al.
Is Bigger Edit Batch Size Always Better? -- An Empirical Study On Model Editing With Llama-3 (2024) • No Venue
Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli
Scalable High-resolution Pixel-space Image Synthesis With Hourglass Diffusion Transformers (2024) • No Venue
Crowson et al.
RACER: Rich Language-guided Failure Recovery Policies For Imitation Learning (2024) • No Venue
Dai et al.
Deepseekmoe: Towards Ultimate Expert Specialization In Mixture-of-experts Language Models (2024) • No Venue
Dai et al.
Griffin: Mixing Gated Linear Recurrences With Local Attention For Efficient Language Models (2024) • No Venue
De et al.
The Road Less Scheduled (2024) • No Venue
Defazio et al.
Unleashing Reasoning Capability Of Llms Via Scalable Question Synthesis From Scratch (2024) • No Venue
Ding et al.
Megapairs: Massive Data Synthesis For Universal Multimodal Retrieval (2024) • No Venue
Zhou et al.
Enhancing Large Language Models With Pseudo- And Multisource- Knowledge Graphs For Open-ended Question Answering (2024) • IEEE Robotics and Automation Letters • 47 citations
Liu et al.
Bitstack: Fine-grained Size Control For Compressed Large Language Models In Variable Memory Environments (2024) • No Venue
Wang et al.
Emu3: Next-token Prediction Is All You Need (2024) • No Venue
Wang et al.
Visual Autoregressive Modeling: Scalable Image Generation Via Next-scale Prediction (2024) • No Venue
Tian et al.
Fitv2: Scalable And Improved Flexible Vision Transformer For Diffusion Model (2024) • No Venue
Wang et al.
Judging The Judges: Evaluating Alignment And Vulnerabilities In Llms-as-judges (2024) • No Venue
Thakur et al.
Jamba-1.5: Hybrid Transformer-mamba Models At Scale (2024) • No Venue
Team et al.
Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies (2024) • No Venue
Tao et al.
Gemma 2: Improving Open Language Models At A Practical Size (2024) • No Venue
Team et al.
Mambabyte: Token-free Selective State Space Model (2024) • No Venue
Wang et al.
Scaling Instructable Agents Across Many Simulated Worlds (2024) • No Venue
Team et al.
Grutopia: Dream General Robots In A City At Scale (2024) • No Venue
Wang et al.
Is Mamba Effective For Time Series Forecasting? (2024) • Neurocomputing • 84 citations
Wang et al.
Internvideo2: Scaling Video Foundation Models For Multimodal Video Understanding (2024) • No Venue
Wang et al.
Tutor Copilot: A Human-ai Approach For Scaling Real-time Expertise (2024) • No Venue
Wang et al.
Multimodal Latent Language Modeling With Next-token Diffusion (2024) • No Venue
Sun et al.
EVA-CLIP-18B: Scaling CLIP To 18 Billion Parameters (2024) • No Venue
Sun et al.
Autoregressive Model Beats Diffusion: Llama For Scalable Image Generation (2024) • No Venue
Sun et al.
Learning To (learn At Test Time): Rnns With Expressive Hidden States (2024) • No Venue
Sun et al.
Hunyuan-large: An Open-source Moe Model With 52 Billion Activated Parameters By Tencent (2024) • No Venue
Sun et al.
Llava-3d: A Simple Yet Effective Pathway To Empowering Lmms With 3d-awareness (2024) • No Venue
Zhu et al.
Tokenformer: Rethinking Transformer Scaling With Tokenized Model Parameters (2024) • No Venue
Wang et al.
Qwen2-vl: Enhancing Vision-language Model's Perception Of The World At Any Resolution (2024) • No Venue
Wang et al.
Self-training With Direct Preference Optimization Improves Chain-of-thought Reasoning (2024) • No Venue
Tianduo Wang, Shichen Li, Wei Lu
Video-infinity: Distributed Long Video Generation (2024) • No Venue
Tan et al.
PIN: A Knowledge-intensive Dataset For Paired And Interleaved Multimodal Documents (2024) • No Venue
Wang et al.
Top-nσ: Not All Logits Are You Need (2024) • No Venue
Tang et al.
LOGO -- Long Context Alignment Via Efficient Preference Optimization (2024) • No Venue
Tang et al.
Grandmaster-level Chess Without Search (2024) • No Venue
Ruoss et al.
Litevae: Lightweight And Efficient Variational Autoencoders For Latent Diffusion Models (2024) • No Venue
Sadat et al.
Blended RAG: Improving RAG (retriever-augmented Generation) Accuracy With Semantic Search And Hybrid Query-based Retrievers (2024) • 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) • 47 citations
Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki
Livexiv -- A Multi-modal Live Benchmark Based On Arxiv Papers Content (2024) • No Venue
Shabtay et al.
Abstractive Text Summarization: State Of The Art, Challenges, And Improvements (2024) • Neurocomputing • 43 citations
Hassan Shakil, Ahmad Farooq, Jugal Kalita
Hyper-connections (2024) • No Venue
Zhu et al.
Scaling Retrieval-based Language Models With A Trillion-token Datastore (2024) • No Venue
Shao et al.
Scaling Laws For Linear Complexity Language Models (2024) • No Venue
Shen et al.
Power Scheduler: A Batch Size And Token Number Agnostic Learning Rate Scheduler (2024) • No Venue
Shen et al.
Longvu: Spatiotemporal Adaptive Compression For Long Video-language Understanding (2024) • No Venue
Shen et al.
Nemo-aligner: Scalable Toolkit For Efficient Model Alignment (2024) • No Venue
Shen et al.
Transfusion: Predict The Next Token And Diffuse Images With One Multi-modal Model (2024) • No Venue
Zhou et al.
Tag-llm: Repurposing General-purpose Llms For Specialized Domains (2024) • No Venue
Shen et al.
When Do We Not Need Larger Vision Models? (2024) • No Venue
Shi et al.
APOLLO: Sgd-like Memory, Adamw-level Performance (2024) • No Venue
Zhu et al.
Scaling LLM Test-time Compute Optimally Can Be More Effective Than Scaling Model Parameters (2024) • No Venue
Snell et al.
The Good, The Bad, And The Greedy: Evaluation Of Llms Should Not Ignore Non-determinism (2024) • No Venue
Song et al.
Moviellm: Enhancing Long Video Understanding With Ai-generated Movies (2024) • No Venue
Song et al.
Scaling Granite Code Models To 128K Context (2024) • No Venue
Stallone et al.
Jina-embeddings-v3: Multilingual Embeddings With Task Lora (2024) • No Venue
Sturua et al.
Diving Into Self-evolving Training For Multimodal Reasoning (2024) • No Venue
Liu et al.
Apigen: Automated Pipeline For Generating Verifiable And Diverse Function-calling Datasets (2024) • No Venue
Liu et al.
Infini-gram: Scaling Unbounded N-gram Language Models To A Trillion Tokens (2024) • No Venue
Liu et al.
Spatial-temporal Large Language Model For Traffic Prediction (2024) • 2024 25th IEEE International Conference on Mobile Data Management (MDM) • 56 citations
Liu et al.
VPTQ: Extreme Low-bit Vector Post-training Quantization For Large Language Models (2024) • No Venue
Liu et al.
Configurable Foundation Models: Building Llms From A Modular Perspective (2024) • No Venue
Xiao et al.
FP6-LLM: Efficiently Serving Large Language Models Through Fp6-centric Algorithm-system Co-design (2024) • No Venue
Xia et al.
Aligning Large Language Models Via Self-steering Optimization (2024) • No Venue
Xiang et al.
Seallms 3: Open Foundation And Chat Multilingual Large Language Models For Southeast Asian Languages (2024) • No Venue
Zhang et al.
Long Context Transfer From Language To Vision (2024) • No Venue
Zhang et al.
POA: Pre-training Once For Models Of All Sizes (2024) • No Venue
Zhang et al.
Tora: Trajectory-oriented Diffusion Transformer For Video Generation (2024) • No Venue
Zhang et al.
Llamax: Scaling Linguistic Horizons Of LLM By Enhancing Translation Capabilities Beyond 100 Languages (2024) • No Venue
Lu et al.
Deepseek-vl: Towards Real-world Vision-language Understanding (2024) • No Venue
Lu et al.
Blending Is All You Need: Cheaper, Better Alternative To Trillion-parameters LLM (2024) • No Venue
Lu et al.
When Scaling Meets LLM Finetuning: The Effect Of Data, Model And Finetuning Method (2024) • No Venue
Zhang et al.
Multi-head Mixture-of-experts (2024) • No Venue
Wu et al.
Improve Mathematical Reasoning In Language Models By Automated Process Supervision (2024) • No Venue
Luo et al.
Megalodon: Efficient LLM Pretraining And Inference With Unlimited Context Length (2024) • No Venue
Ma et al.
Videoautoarena: An Automated Arena For Evaluating Large Multimodal Models In Video Analysis Through User Simulation (2024) • No Venue
Luo et al.
The Era Of 1-bit Llms: All Large Language Models Are In 1.58 Bits (2024) • No Venue
Ma et al.
PALO: A Polyglot Large Multimodal Model For 5B People (2024) • No Venue
Maaz et al.
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling (2024) • No Venue
Maini et al.
Eurollm: Multilingual Language Models For Europe (2024) • No Venue
Martins et al.
Expandable Subspace Ensemble For Pre-trained Model-based Class-incremental Learning (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Zhou et al.
LLM Agent Operating System (2024) • No Venue
Mei et al.
Snap Video: Scaled Spatiotemporal Transformers For Text-to-video Synthesis (2024) • No Venue
Menapace et al.
Gsm-symbolic: Understanding The Limitations Of Mathematical Reasoning In Large Language Models (2024) • No Venue
Mirzadeh et al.
Realm: Reference Resolution As Language Modeling (2024) • No Venue
Moniz et al.
ROS-LLM: A ROS Framework For Embodied AI With Task Feedback And Structured Reasoning (2024) • No Venue
Mower et al.
Generative Representational Instruction Tuning (2024) • No Venue
Muennighoff et al.
Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) • No Venue
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Towards Modular Llms By Building And Reusing A Library Of Loras (2024) • No Venue
Ostapenko et al.
Relik: Retrieve And Link, Fast And Accurate Entity Linking And Relation Extraction On An Academic Budget (2024) • No Venue
Orlando et al.
Byte Latent Transformer: Patches Scale Better Than Tokens (2024) • No Venue
Pagnoni et al.
Llamaduo: Llmops Pipeline For Seamless Migration From Service Llms To Small-scale Local Llms (2024) • No Venue
Park et al.
Nemotron-4 15B Technical Report (2024) • No Venue
Parmar et al.
The Fineweb Datasets: Decanting The Web For The Finest Text Data At Scale (2024) • No Venue
Penedo et al.
Moe-mamba: Efficient Selective State Space Models With Mixture Of Experts (2024) • No Venue
Pióro et al.
Sambanova SN40L: Scaling The AI Memory Wall With Dataflow And Composition Of Experts (2024) • No Venue
Prabhakar et al.
HGRN2: Gated Linear Rnns With State Expansion (2024) • No Venue
Qin et al.
Lightning Attention-2: A Free Lunch For Handling Unlimited Sequence Lengths In Large Language Models (2024) • No Venue
Qin et al.
Xgen-videosyn-1: High-fidelity Text-to-video Synthesis With Compressed Representations (2024) • No Venue
Qin et al.
Layerwise Recurrent Router For Mixture-of-experts (2024) • No Venue
Qiu et al.
Towards Building Multilingual Language Model For Medicine (2024) • Nature Communications • 53 citations
Qiu et al.
Tinyllava: A Framework Of Small-scale Large Multimodal Models (2024) • No Venue
Zhou et al.
Mixture-of-depths: Dynamically Allocating Compute In Transformer-based Language Models (2024) • No Venue
Raposo et al.
Trans-tokenization And Cross-lingual Vocabulary Transfers: Language Adaptation Of Llms For Low-resource NLP (2024) • No Venue
Remy et al.
Samba: Simple Hybrid State Space Models For Efficient Unlimited Context Language Modeling (2024) • No Venue
Ren et al.
Long-form Factuality In Large Language Models (2024) • No Venue
Wei et al.
Insight-v: Exploring Long-chain Visual Reasoning With Multimodal Large Language Models (2024) • No Venue
Dong et al.
Toward General Instruction-following Alignment For Retrieval-augmented Generation (2024) • No Venue
Dong et al.
Stacking Your Transformers: A Closer Look At Model Growth For Efficient LLM Pre-training (2024) • No Venue
Du et al.
Unlocking Continual Learning Abilities In Language Models (2024) • No Venue
Du et al.
MUSCLE: A Model Update Strategy For Compatible LLM Evolution (2024) • No Venue
Echterhoff et al.
Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024) • No Venue
Yang et al.
Balancing Pipeline Parallelism With Vocabulary Parallelism (2024) • No Venue
Yeung et al.
Scalable Pre-training Of Large Autoregressive Image Models (2024) • No Venue
El-Nouby et al.
Fluid: Scaling Autoregressive Text-to-image Generative Models With Continuous Tokens (2024) • No Venue
Fan et al.
Kolmogorov-arnold Transformer (2024) • No Venue
Xingyi Yang, Xinchao Wang
Voco-llama: Towards Vision Compression With Large Language Models (2024) • No Venue
Ye et al.
Scaling Diffusion Transformers To 16 Billion Parameters (2024) • No Venue
Fei et al.
Openfedllm: Training Large Language Models On Decentralized Private Data Via Federated Learning (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 47 citations
Ye et al.
Multimodal Autoregressive Pre-training Of Large Vision Encoders (2024) • No Venue
Fini et al.
Data Engineering For Scaling Language Models To 128K Context (2024) • No Venue
Fu et al.
Efficiently Serving LLM Reasoning Programs With Certaindex (2024) • No Venue
Fu et al.
Towards A Unified View Of Preference Learning For Large Language Models: A Survey (2024) • No Venue
Gao et al.
Seerattention: Learning Intrinsic Sparse Attention In Your Llms (2024) • No Venue
Gao et al.
Gemma: Open Models Based On Gemini Research And Technology (2024) • Arxiv • 208 citations
Team et al.
Zamba: A Compact 7B SSM Hybrid Model (2024) • No Venue
Glorioso et al.
Dense Connector For Mllms (2024) • No Venue
Yao et al.
DART: Denoising Autoregressive Transformer For Scalable Text-to-image Generation (2024) • No Venue
Gu et al.
Mammoth-vl: Eliciting Multimodal Reasoning With Instruction Tuning At Scale (2024) • No Venue
Guo et al.
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss (2024) • No Venue
Gupta et al.
Rethinking Token Reduction In Mllms: Towards A Unified Paradigm For Training-free Acceleration (2024) • No Venue
Han et al.
Qwen2 Technical Report (2024) • No Venue
Yang et al.
Llava-gemma: Accelerating Multimodal Foundation Models With A Compact Language Model (2024) • No Venue
Hinck et al.
Algorithmic Progress In Language Models (2024) • No Venue
Ho et al.
Exploring The Privacy Protection Capabilities Of Chinese Large Language Models (2024) • Proceedings of the ACM on Software Engineering • 40 citations
Yuqi Yang, Xiaowen Huang, Jitao Sang
Sampart3d: Segment Any Part In 3D Objects (2024) • No Venue
Yang et al.
3D-GRAND: A Million-scale Dataset For 3d-llms With Better Grounding And Less Hallucination (2024) • No Venue
Yang et al.
Not All LLM Reasoners Are Created Equal (2024) • No Venue
Hosseini et al.
Minicpm: Unveiling The Potential Of Small Language Models With Scalable Training Strategies (2024) • No Venue
Hu et al.
Openrlhf: An Easy-to-use, Scalable And High-performance RLHF Framework (2024) • No Venue
Hu et al.
Yulan-mini: An Open Data-efficient Language Model (2024) • No Venue
Hu et al.
Autocrawler: A Progressive Understanding Web Agent For Web Crawler Generation (2024) • No Venue
Huang et al.
How Good Are Low-bit Quantized Llama3 Models? An Empirical Study (2024) • No Venue
Huang et al.
Ultra-sparse Memory Network (2024) • No Venue
Huang et al.
Extending Llama-3's Context Ten-fold Overnight (2024) • No Venue
Zhang et al.
Simple And Scalable Strategies To Continually Pre-train Large Language Models (2024) • No Venue
Ibrahim et al.
Scaling Laws For Downstream Task Performance Of Large Language Models (2024) • No Venue
Isik et al.
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding (2024) • No Venue
Ji Ha Jang, Hoigi Seo, Se Young Chun
Mixture Of Nested Experts: Adaptive Processing Of Visual Tokens (2024) • No Venue
Jain et al.
Sceneverse: Scaling 3D Vision-language Learning For Grounded Scene Understanding (2024) • No Venue
Jia et al.
Mmsearch: Benchmarking The Potential Of Large Models As Multi-modal Search Engines (2024) • No Venue
Jiang et al.
E5-V: Universal Embeddings With Multimodal Large Language Models (2024) • No Venue
Jiang et al.
Megascale: Scaling Large Language Model Training To More Than 10,000 Gpus (2024) • No Venue
Jiang et al.
Minference 1.0: Accelerating Pre-filling For Long-context Llms Via Dynamic Sparse Attention (2024) • No Venue
Jiang et al.
Hydragen: High-throughput LLM Inference With Shared Prefixes (2024) • No Venue
Juravsky et al.
LLM Comparator: Visual Analytics For Side-by-side Evaluation Of Large Language Models (2024) • No Venue
Kahng et al.
Spectra: A Comprehensive Study Of Ternary, Quantized, And FP16 Language Models (2024) • No Venue
Kaushal et al.
Powerinfer-2: Fast Large Language Model Inference On A Smartphone (2024) • No Venue
Xue et al.
Openmoe: An Early Effort On Open Mixture-of-experts Language Models (2024) • No Venue
Xue et al.
Longvila: Scaling Long-context Visual Language Models For Long Videos (2024) • No Venue
Xue et al.
Revisit Large-scale Image-caption Data In Pre-training Multimodal Foundation Models (2024) • No Venue
Lai et al.
LLM2LLM: Boosting Llms With Novel Iterative Data Enhancement (2024) • No Venue
Lee et al.
Trol: Traversal Of Layers For Large Language And Vision Models (2024) • No Venue
Lee et al.
Theagentcompany: Benchmarking LLM Agents On Consequential Real World Tasks (2024) • No Venue
Xu et al.
Slowfast-llava: A Strong Training-free Baseline For Video Large Language Models (2024) • No Venue
Xu et al.
Same Task, More Tokens: The Impact Of Input Length On The Reasoning Performance Of Large Language Models (2024) • No Venue
Mosh Levy, Alon Jacoby, Yoav Goldberg
Aria: An Open Multimodal Native Mixture-of-experts Model (2024) • No Venue
Li et al.
Common 7B Language Models Already Possess Strong Math Capabilities (2024) • No Venue
Li et al.
Focusllm: Scaling Llm's Context By Parallel Decoding (2024) • No Venue
Li et al.
More Agents Is All You Need (2024) • No Venue
Li et al.
Omnibench: Towards The Future Of Universal Omni-language Models (2024) • No Venue
Li et al.
Omg-seg: Is One Model Good Enough For All Segmentation? (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Scaling (down) CLIP: A Comprehensive Analysis Of Data, Architecture, And Training Strategies (2024) • No Venue
Zichao Li, Cihang Xie, Ekin Dogus Cubuk
Synergen-vl: Towards Synergistic Image Understanding And Generation With Vision Experts And Token Folding (2024) • No Venue
Li et al.
Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned Llms With Nothing (2024) • No Venue
Xu et al.
Agenttrek: Agent Trajectory Synthesis Via Guiding Replay With Web Tutorials (2024) • No Venue
Xu et al.
Diversity Empowers Intelligence: Integrating Expertise Of Software Engineering Agents (2024) • No Venue
Zhang et al.
Llava-critic: Learning To Evaluate Multimodal Models (2024) • No Venue
Xiong et al.
Jamba: A Hybrid Transformer-mamba Language Model (2024) • No Venue
Lieber et al.
Multi-layer Transformers Gradient Can Be Approximated In Almost Linear Time (2024) • No Venue
Liang et al.
Mixture-of-transformers: A Sparse And Scalable Architecture For Multi-modal Foundation Models (2024) • No Venue
Liang et al.
Agentless: Demystifying Llm-based Software Engineering Agents (2024) • No Venue
Xia et al.
GS-LRM: Large Reconstruction Model For 3D Gaussian Splatting (2024) • No Venue
Zhang et al.
Osworld: Benchmarking Multimodal Agents For Open-ended Tasks In Real Computer Environments (2024) • No Venue
Xie et al.
Baichuan Alignment Technical Report (2024) • No Venue
Lin et al.
STIV: Scalable Text And Image Conditioned Video Generation (2024) • No Venue
Lin et al.
Medtrinity-25m: A Large-scale Multimodal Dataset With Multigranular Annotations For Medicine (2024) • No Venue
Xie et al.
The Refinedweb Dataset For Falcon LLM: Outperforming Curated Corpora With Web Data, And Web Data Only (2023) • No Venue
Penedo et al.
RWKV: Reinventing Rnns For The Transformer Era (2023) • No Venue
Peng et al.
Small-scale Proxies For Large-scale Transformer Training Instabilities (2023) • No Venue
Wortsman et al.
Large Language Models Are Few-shot Health Learners (2023) • 2024 IEEE Congress on Evolutionary Computation (CEC) • 58 citations
Liu et al.
Videomae V2: Scaling Video Masked Autoencoders With Dual Masking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 318 citations
Wang et al.
Pytorch FSDP: Experiences On Scaling Fully Sharded Data Parallel (2023) • Proceedings of the VLDB Endowment • 115 citations
Zhao et al.
Retentive Network: A Successor To Transformer For Large Language Models (2023) • No Venue
Sun et al.
Judgelm: Fine-tuned Large Language Models Are Scalable Judges (2023) • No Venue
Lianghui Zhu, Xinggang Wang, Xinlong Wang
Unified-io 2: Scaling Autoregressive Multimodal Models With Vision, Language, Audio, And Action (2023) • No Venue
Lu et al.
Level Generation Through Large Language Models (2023) • FDG 2023: Foundations of Digital Games 2023 • 65 citations
Todd et al.
Beyond Human Data: Scaling Self-training For Problem-solving With Language Models (2023) • No Venue
Singh et al.
DREEAM: Guiding Attention With Evidence For Improving Document-level Relation Extraction (2023) • Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics • 62 citations
Youmi Ma, An Wang, Naoaki Okazaki
Query Rewriting For Retrieval-augmented Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 114 citations
Ma et al.
From Google Gemini To Openai Q* (q-star): A Survey Of Reshaping The Generative Artificial Intelligence (AI) Research Landscape (2023) • Arxiv • 59 citations
McIntosh et al.
Specinfer: Accelerating Generative Large Language Model Serving With Tree-based Speculative Inference And Verification (2023) • ASPLOS '24: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 • 44 citations
Miao et al.
Anymal: An Efficient And Scalable Any-modality Augmented Language Model (2023) • No Venue
Moon et al.
Levels Of AGI For Operationalizing Progress On The Path To AGI (2023) • No Venue
Morris et al.
Large Language Models For Telecom: Forthcoming Impact On The Industry (2023) • IEEE Communications Magazine • 47 citations
Maatouk et al.
Referring Multi-object Tracking (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 66 citations
Wu et al.
On The Opportunities And Challenges Of Foundation Models For Geospatial Artificial Intelligence (2023) • Arxiv • 63 citations
Mai et al.
Let The Llms Talk: Simulating Human-to-human Conversational QA Via Zero-shot Llm-to-llm Interactions (2023) • WSDM '24: The 17th ACM International Conference on Web Search and Data Mining • 48 citations
Abbasiantaeb et al.
Can Knowledge Graphs Reduce Hallucinations In Llms? : A Survey (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 49 citations
Agrawal et al.
LLM In A Flash: Efficient Large Language Model Inference With Limited Memory (2023) • No Venue
Alizadeh et al.
Gpt4all: An Ecosystem Of Open Source Compressed Language Models (2023) • No Venue
Anand et al.
Bitnet: Scaling 1-bit Transformers For Large Language Models (2023) • No Venue
Wang et al.
Distributed Inference And Fine-tuning Of Large Language Models Over The Internet (2023) • No Venue
Borzunov et al.
Melm, A Generative Pretrained Language Modeling Framework That Solves Forward And Inverse Mechanics Problems (2023) • Journal of the Mechanics and Physics of Solids • 42 citations
Markus J. Buehler
Weak-to-strong Generalization: Eliciting Strong Capabilities With Weak Supervision (2023) • No Venue
Burns et al.
Pepnet: Parameter And Embedding Personalized Network For Infusing With Personalized Prior Information (2023) • Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 78 citations
Chang et al.
Q-transformer: Scalable Offline Reinforcement Learning Via Autoregressive Q-functions (2023) • No Venue
Chebotar et al.
Beyond Surface: Probing Llama Across Scales And Layers (2023) • No Venue
Chen et al.
Internvl: Scaling Up Vision Foundation Models And Aligning For Generic Visual-linguistic Tasks (2023) • No Venue
Chen et al.
Scalable Multi-robot Collaboration With Large Language Models: Centralized Or Decentralized Systems? (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 46 citations
Chen et al.
One Adapter For All Programming Languages? Adapter Tuning For Code Search And Summarization (2023) • Arxiv • 41 citations
Wang et al.
Internvid: A Large-scale Video-text Dataset For Multimodal Understanding And Generation (2023) • No Venue
Wang et al.
Scaling Relationship On Learning Mathematical Reasoning With Large Language Models (2023) • No Venue
Yuan et al.
Recmind: Large Language Model Powered Agent For Recommendation (2023) • Findings of the Association for Computational Linguistics: NAACL 2024 • 43 citations
Wang et al.
Adapting Large Language Models Via Reading Comprehension (2023) • No Venue
Daixuan Cheng, Shaohan Huang, Furu Wei
Robogen: Towards Unleashing Infinite Data For Automated Robot Learning Via Generative Simulation (2023) • No Venue
Wang et al.
A Survey Of Techniques For Optimizing Transformer Inference (2023) • Journal of Systems Architecture • 90 citations
Chitty-Venkata et al.
Scaling Robot Learning With Semantically Imagined Experience (2023) • Robotics: Science and Systems XIX • 57 citations
Yu et al.
Capsfusion: Rethinking Image-text Data At Scale (2023) • No Venue
Yu et al.
Flashattention-2: Faster Attention With Better Parallelism And Work Partitioning (2023) • Arxiv • 135 citations
Tri Dao
A Decoder-only Foundation Model For Time-series Forecasting (2023) • Arxiv • 41 citations
Das et al.
Longnet: Scaling Transformers To 1,000,000,000 Tokens (2023) • No Venue
Ding et al.
Enhancing Chat Language Models By Scaling High-quality Instructional Conversations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 60 citations
Ding et al.
Robot-enabled Construction Assembly With Automated Sequence Planning Based On Chatgpt: Robogpt (2023) • Buildings • 63 citations
You et al.
CXR-CLIP: Toward Large Scale Chest X-ray Language-image Pre-training (2023) • Lecture Notes in Computer Science • 50 citations
You et al.
Judging Llm-as-a-judge With Mt-bench And Chatbot Arena (2023) • No Venue
Zheng et al.
A Comprehensive Survey On Multimodal Recommender Systems: Taxonomy, Evaluation, And Future Directions (2023) • Arxiv • 149 citations
Zhou et al.
In-context Pretraining: Language Modeling Beyond Document Boundaries (2023) • No Venue
Shi et al.
Ankh: Optimized Protein Language Model Unlocks General-purpose Modelling (2023) • Arxiv • 95 citations
Elnaggar et al.
Qmoe: Practical Sub-1-bit Compression Of Trillion-parameter Models (2023) • No Venue
Elias Frantar, Dan Alistarh
Datacomp: In Search Of The Next Generation Of Multimodal Datasets (2023) • Arxiv • 72 citations
Gadre et al.
G-llava: Solving Geometric Problem With Multi-modal Large Language Model (2023) • No Venue
Gao et al.
Ureader: Universal Ocr-free Visually-situated Language Understanding With Multimodal Large Language Model (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 46 citations
Ye et al.
Large Language Models Are Versatile Decomposers: Decompose Evidence And Questions For Table-based Reasoning (2023) • SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 43 citations
Ye et al.
Composable Function-preserving Expansions For Transformer Architectures (2023) • No Venue
Andrea Gesmundo, Kaitlin Maile
Can A Student Large Language Model Perform As Well As It's Teacher? (2023) • Advances in Medical Technologies and Clinical Practice • 49 citations
Sia Gholami, Marwan Omar
Detclipv2: Scalable Open-vocabulary Object Detection Pre-training Via Word-region Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Yao et al.
Knowledge Distillation Of Large Language Models (2023) • No Venue
Gu et al.
Mamba: Linear-time Sequence Modeling With Selective State Spaces (2023) • No Venue
Albert Gu, Tri Dao
Mitigating The Learning Bias Towards Repetition By Self-contrastive Training For Open-ended Generation (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Jian Guan, Minlie Huang
Deepspeed-chat: Easy, Fast And Affordable RLHF Training Of Chatgpt-like Models At All Scales (2023) • No Venue
Yao et al.
Editing Large Language Models: Problems, Methods, And Opportunities (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 55 citations
Yao et al.
One Fits All: Power General Time Series Analysis By Pretrained LM (2023) • Arxiv • 115 citations
Zhou et al.
Platform-independent And Curriculum-oriented Intelligent Assistant For Higher Education (2023) • International Journal of Educational Technology in Higher Education • 80 citations
Sajja et al.
From Words To Watts: Benchmarking The Energy Costs Of Large Language Model Inference (2023) • 2023 IEEE High Performance Extreme Computing Conference (HPEC) • 104 citations
Samsi et al.
LRM: Large Reconstruction Model For Single Image To 3D (2023) • No Venue
Hong et al.
LLM For Soc Security: A Paradigm Shift (2023) • IEEE Access • 52 citations
Saha et al.
Enhancing Model Performance In Multilingual Information Retrieval With Comprehensive Data Engineering Techniques (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 109 citations
Zhang et al.
Tool Documentation Enables Zero-shot Tool-usage With Large Language Models (2023) • No Venue
Hsieh et al.
Llm-adapters: An Adapter Family For Parameter-efficient Fine-tuning Of Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 128 citations
Hu et al.
Swin3d: A Pretrained Transformer Backbone For 3D Indoor Scene Understanding (2023) • Computational Visual Media • 47 citations
Yang et al.
Segment And Caption Anything (2023) • No Venue
Huang et al.
OPERA: Alleviating Hallucination In Multi-modal Large Language Models Via Over-trust Penalty And Retrospection-allocation (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Huang et al.
Baichuan 2: Open Large-scale Language Models (2023) • No Venue
Yang et al.
Starvector: Generating Scalable Vector Graphics Code From Images (2023) • No Venue
Rodriguez et al.
Vary: Scaling Up The Vision Vocabulary For Large Vision-language Models (2023) • No Venue
Wei et al.
Llm-blender: Ensembling Large Language Models With Pairwise Ranking And Generative Fusion (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
An Empirical Study Of Pre-trained Model Reuse In The Hugging Face Deep Learning Model Registry (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 53 citations
Jiang et al.
Active Retrieval Augmented Generation (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 216 citations
Jiang et al.
Matching Patients To Clinical Trials With Large Language Models (2023) • Nature Communications • 111 citations
Jin et al.
Medcpt: Contrastive Pre-trained Transformers With Large-scale Pubmed Search Logs For Zero-shot Biomedical Information Retrieval (2023) • Bioinformatics • 73 citations
Jin et al.
Representation Learning With Large Language Models For Recommendation (2023) • WWW '24: The ACM Web Conference 2024 • 118 citations
Ren et al.
TPU V4: An Optically Reconfigurable Supercomputer For Machine Learning With Hardware Support For Embeddings (2023) • Proceedings of the 50th Annual International Symposium on Computer Architecture • 322 citations
Jouppi et al.
Universal Instance Perception As Object Discovery And Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Yan et al.
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (2023) • Arxiv • 111 citations
Zhang et al.
SLCA: Slow Learner With Classifier Alignment For Continual Learning On A Pre-trained Model (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 72 citations
Zhang et al.
From Sparse To Soft Mixtures Of Experts (2023) • No Venue
Puigcerver et al.
CAMEL: Communicative Agents For "mind" Exploration Of Large Language Model Society (2023) • Arxiv • 87 citations
Li et al.
Automlp: Automated MLP For Sequential Recommendations (2023) • Proceedings of the ACM Web Conference 2023 • 44 citations
Li et al.
Api-bank: A Comprehensive Benchmark For Tool-augmented Llms (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 44 citations
Li et al.
FLM-101B: An Open LLM And How To Train It With $100K Budget (2023) • No Venue
Li et al.
Evaluating Parameter-efficient Transfer Learning Approaches On SURE Benchmark For Speech Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 281 citations
Li et al.
Towards Tracing Code Provenance With Code Watermarking (2023) • IEEE Internet of Things Journal • 62 citations
Li et al.
Textbooks Are All You Need II: Phi-1.5 Technical Report (2023) • No Venue
Li et al.
Effective Long-context Scaling Of Foundation Models (2023) • No Venue
Xiong et al.
Scaling Transnormer To 175 Billion Parameters (2023) • No Venue
Qin et al.
Large Language Models Are Strong Zero-shot Retriever (2023) • IEEE Communications Magazine • 80 citations
Shen et al.
S-lora: Serving Thousands Of Concurrent Lora Adapters (2023) • No Venue
Sheng et al.
Baize: An Open-source Chat Model With Parameter-efficient Tuning On Self-chat Data (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 94 citations
Xu et al.
Autodroid: Llm-powered Task Automation In Android (2023) • ACM MobiCom '24: 30th Annual International Conference on Mobile Computing and Networking • 40 citations
Wen et al.
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior (2023) • No Venue
Khandelwal et al.
SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023) • No Venue
Kim et al.
The Potential And Pitfalls Of Using A Large Language Model Such As Chatgpt Or GPT-4 As A Clinical Assistant (2023) • Journal of the American Medical Informatics Association • 45 citations
Zhang et al.
Vera: Vector-based Random Matrix Adaptation (2023) • No Venue
Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
Summit: Iterative Text Summarization Via Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 42 citations
Haopeng Zhang, Xiao Liu, Jiawei Zhang
Efficient Memory Management For Large Language Model Serving With Pagedattention (2023) • No Venue
Kwon et al.
Relax: Composable Abstractions For End-to-end Dynamic Machine Learning (2023) • No Venue
Lai et al.
Copy Is All You Need (2023) • No Venue
Lan et al.
RLAIF: Scaling Reinforcement Learning From Human Feedback With AI Feedback (2023) • No Venue
Lee et al.
Few-shot Class-incremental Learning By Sampling Multi-phase Tasks (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 108 citations
Zhou et al.
Fedtp: Federated Learning By Transformer Personalization (2022) • IEEE Transactions on Neural Networks and Learning Systems • 76 citations
Li et al.
Cosformer: Rethinking Softmax In Attention (2022) • Arxiv • 65 citations
Qin et al.
PLM-ICD: Automatic ICD Coding With Pretrained Language Models (2022) • Proceedings of the 4th Clinical Natural Language Processing Workshop • 41 citations
Chao-Wei Huang, Shang-Chi Tsai, Yun-Nung Chen
Scalablevit: Rethinking The Context-oriented Generalization Of Vision Transformer (2022) • Lecture Notes in Computer Science • 41 citations
Yang et al.
Zero-shot Video Question Answering Via Frozen Bidirectional Language Models (2022) • Arxiv • 64 citations
Yang et al.
Sequencer: Deep LSTM For Image Classification (2022) • Arxiv • 53 citations
Yuki Tatsunami, Masato Taki
Scaling Up Models And Data With $\texttt{t5x}$ And $\texttt{seqio}$ (2022) • Arxiv • 47 citations
Roberts et al.
Towards Transferable Unrestricted Adversarial Examples With Minimum Changes (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Fangcheng Liu, Chao Zhang, Hongyang Zhang
VIMA: General Robot Manipulation With Multimodal Prompts (2022) • Arxiv • 65 citations
Jiang et al.
Repair Is Nearly Generation: Multilingual Program Repair With Llms (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 68 citations
Joshi et al.
Leveraging Language Foundation Models For Human Mobility Forecasting (2022) • Proceedings of the 30th International Conference on Advances in Geographic Information Systems • 48 citations
Hao Xue, Bhanu Prakash Voutharoja, Flora D. Salim
Deepspeed-moe: Advancing Mixture-of-experts Inference And Training To Power Next-generation AI Scale (2022) • Arxiv • 55 citations
Rajbhandari et al.
Transpolymer: A Transformer-based Language Model For Polymer Property Predictions (2022) • npj Computational Materials • 120 citations
Changwen Xu, Yuyang Wang, Amir Barati Farimani
Vitpose: Simple Vision Transformer Baselines For Human Pose Estimation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 70 citations
Xu et al.
Progen2: Exploring The Boundaries Of Protein Language Models (2022) • Cell Systems • 286 citations
Nijkamp et al.
Mgpt: Few-shot Learners Go Multilingual (2022) • Arxiv • 47 citations
Shliazhko et al.
Large Language Models Are Few-shot Clinical Information Extractors (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 212 citations
Agrawal et al.
Webformer: The Web-page Transformer For Structure Information Extraction (2022) • WWW '22: The ACM Web Conference 2022 • 42 citations
Wang et al.
USB: A Unified Semi-supervised Learning Benchmark For Classification (2022) • Arxiv • 42 citations
Wang et al.
Language Modeling Via Stochastic Processes (2022) • Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) • 40 citations
Wang et al.
Adamix: Mixture-of-adaptations For Parameter-efficient Model Tuning (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 54 citations
Wang et al.
RT-1: Robotics Transformer For Real-world Control At Scale (2022) • Robotics: Science and Systems 2023 • 372 citations
Brohan et al.
Gpt-neox-20b: An Open-source Autoregressive Language Model (2022) • Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models • 241 citations
Black et al.
GLM-130B: An Open Bilingual Pre-trained Model (2022) • Arxiv • 288 citations
Zeng et al.
Exploiting Unlabeled Data With Vision And Language Models For Object Detection (2022) • Lecture Notes in Computer Science • 74 citations
Zhao et al.
A Model-agnostic Data Manipulation Method For Persona-based Dialogue Generation (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 40 citations
Cao et al.
Natgen: Generative Pre-training By "naturalizing" Source Code (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 94 citations
Chakraborty et al.
Revisiting Parameter-efficient Tuning: Are We Really There Yet? (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Murag: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 65 citations
Chen et al.
A Span-level Bidirectional Network For Aspect Sentiment Triplet Extraction (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 44 citations
Chen et al.
Alpa: Automating Inter- And Intra-operator Parallelism For Distributed Deep Learning (2022) • Arxiv • 75 citations
Zheng et al.
Can Foundation Models Wrangle Your Data? (2022) • Proceedings of the VLDB Endowment • 107 citations
Narayan et al.
ZSON: Zero-shot Object-goal Navigation Using Multimodal Goal Embeddings (2022) • Arxiv • 41 citations
Majumdar et al.
When Not To Trust Language Models: Investigating Effectiveness Of Parametric And Non-parametric Memories (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 146 citations
Mallen et al.
No Language Left Behind: Scaling Human-centered Machine Translation (2022) • Arxiv • 354 citations
Team et al.
St-moe: Designing Stable And Transferable Sparse Expert Models (2022) • Arxiv • 43 citations
Zoph et al.
Scaling Autoregressive Models For Content-rich Text-to-image Generation (2022) • Arxiv • 339 citations
Yu et al.
Vitality: Unifying Low-rank And Sparse Approximation For Vision Transformer Acceleration With A Linear Taylor Attention (2022) • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) • 42 citations
Dass et al.
Structured Pruning Learns Compact And Accurate Models (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 99 citations
Mengzhou Xia, Zexuan Zhong, Danqi Chen
Robust Natural Language Processing: Recent Advances, Challenges, And Future Directions (2022) • IEEE Access • 63 citations
Omar et al.
Shortcut Learning Of Large Language Models In Natural Language Understanding (2022) • Communications of the ACM • 44 citations
Du et al.
Minedojo: Building Open-ended Embodied Agents With Internet-scale Knowledge (2022) • Arxiv • 59 citations
Fan et al.
Promptdet: Towards Open-vocabulary Detection Using Uncurated Images (2022) • Lecture Notes in Computer Science • 119 citations
Feng et al.
Mapping Global Dynamics Of Benchmark Creation And Saturation In Artificial Intelligence (2022) • Nature Communications • 55 citations
Ott et al.
Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 52 citations
Bethan Thomas, Samuel Kessler, Salah Karout
Unified Structure Generation For Universal Information Extraction (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 345 citations
Lu et al.
Webshop: Towards Scalable Real-world Web Interaction With Grounded Language Agents (2022) • Arxiv • 46 citations
Yao et al.
Efficient Multimodal Transformer With Dual-level Feature Restoration For Robust Multimodal Sentiment Analysis (2022) • IEEE Transactions on Affective Computing • 136 citations
Sun et al.
Self-critiquing Models For Assisting Human Evaluators (2022) • Arxiv • 46 citations
Saunders et al.
Neighborhood Attention Transformer (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 311 citations
Hassani et al.
Unifiedskg: Unifying And Multi-tasking Structured Knowledge Grounding With Text-to-text Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 74 citations
Xie et al.
Large Language Models Are Reasoning Teachers (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 44 citations
Namgyu Ho, Laura Schmid, Se-Young Yun
Unit: Multimodal Multitask Learning With A Unified Transformer (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 224 citations
Ronghang Hu, Amanpreet Singh
Visually Grounded Reasoning Across Languages And Cultures (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 84 citations
Liu et al.
Annotating Columns With Pre-trained Language Models (2021) • SIGMOD/PODS '22: International Conference on Management of Data • 58 citations
Suhara et al.
Discriminative Triad Matching And Reconstruction For Weakly Referring Expression Grounding (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 56 citations
Sun et al.
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training For Language Understanding And Generation (2021) • Arxiv • 191 citations
Sun et al.
Paradigm Shift In Natural Language Processing (2021) • Machine Intelligence Research • 80 citations
Sun et al.
Efficient Attentions For Long Document Summarization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 121 citations
Huang et al.
Hierarchical Learning For Generation With Long Source Sequences (2021) • Arxiv • 41 citations
Tobias Rohde, Xiaoxia Wu, Yinhan Liu
Random Feature Attention (2021) • Arxiv • 122 citations
Peng et al.
M6: A Chinese Multimodal Pretrainer (2021) • Arxiv • 48 citations
Lin et al.
Learning To Warm Up Cold Item Embeddings For Cold-start Recommendation With Meta Scaling And Shifting Networks (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 116 citations
Zhu et al.
SUPERB: Speech Processing Universal Performance Benchmark (2021) • Interspeech 2021 • 553 citations
Yang et al.
Perceiver IO: A General Architecture For Structured Inputs & Outputs (2021) • Arxiv • 205 citations
Jaegle et al.
Scaling Up Visual And Vision-language Representation Learning With Noisy Text Supervision (2021) • International Conference on Machine Learning 2021 • 1191 citations
Jia et al.
Multimodal Emergent Fake News Detection Via Meta Neural Process Networks (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 51 citations
Wang et al.
The Power Of Scale For Parameter-efficient Prompt Tuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 94 citations
Brian Lester, Rami Al-Rfou, Noah Constant
Learning Transferable Visual Models From Natural Language Supervision (2021) • Arxiv • 5297 citations
Radford et al.
Zero-infinity: Breaking The GPU Memory Wall For Extreme Scale Deep Learning (2021) • Arxiv • 57 citations
Rajbhandari et al.
Entailment As Few-shot Learner (2021) • Arxiv • 106 citations
Wang et al.
Crossformer: A Versatile Vision Transformer Hinging On Cross-scale Attention (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 157 citations
Wang et al.
Curriculum Pre-training Heterogeneous Subgraph Transformer For Top-$n$ Recommendation (2021) • ACM Transactions on Information Systems • 45 citations
Wang et al.
Pangu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models With Auto-parallel Computation (2021) • Arxiv • 94 citations
Zeng et al.
Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021) • Interspeech 2021 • 58 citations
Le et al.
Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 98 citations
Miech et al.
Understanding Data Storage And Ingestion For Large-scale Deep Recommendation Model Training (2021) • Proceedings of the 49th Annual International Symposium on Computer Architecture • 59 citations
Zhao et al.
HET: Scaling Out Huge Embedding Model Training Via Cache-enabled Distributed Framework (2021) • Proceedings of the VLDB Endowment • 45 citations
Miao et al.
MELM: Data Augmentation With Masked Entity Language Modeling For Low-resource NER (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 50 citations
Zhou et al.
Industry Scale Semi-supervised Learning For Natural Language Understanding (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers • 67 citations
Chen et al.
An Empirical Investigation Of The Role Of Pre-training In Lifelong Learning (2021) • Journal of Machine Learning Research 24 (2023) 1-50 • 42 citations
Mehta et al.
On The Validity Of Pre-trained Transformers For Natural Language Processing In The Software Engineering Domain (2021) • IEEE Transactions on Software Engineering • 63 citations
Julian von Der Mosel, Alexander Trautsch, Steffen Herbold
Constrained Language Models Yield Few-shot Semantic Parsers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 54 citations
Shin et al.
Fairfil: Contrastive Neural Debiasing Method For Pretrained Text Encoders (2021) • Arxiv • 50 citations
Cheng et al.
Sentence-t5: Scalable Sentence Encoders From Pre-trained Text-to-text Models (2021) • Arxiv • 62 citations
Ni et al.
Trankit: A Light-weight Transformer-based Toolkit For Multilingual Natural Language Processing (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations • 43 citations
Nguyen et al.
Unipelt: A Unified Framework For Parameter-efficient Language Model Tuning (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 51 citations
Mao et al.
Efficient Large-scale Language Model Training On GPU Clusters Using Megatron-lm (2021) • SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis • 340 citations
Narayanan et al.
Raise A Child In Large Language Model: Towards Effective And Generalizable Fine-tuning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 57 citations
Xu et al.
Parameter-efficient Multi-task Fine-tuning For Transformers Via Shared Hypernetworks (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 54 citations
Mahabadi et al.
Multimodal End-to-end Sparse Model For Emotion Recognition (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 70 citations
Dai et al.
Deepxml: A Deep Extreme Multi-label Learning Framework Applied To Short Text Documents (2021) • Proceedings of the 14th ACM International Conference on Web Search and Data Mining • 60 citations
Dahiya et al.
Ultra-fine Entity Typing With Weak Supervision From A Masked Language Model (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 40 citations
Hongliang Dai, Yangqiu Song, Haixun Wang
Unified Conversational Recommendation Policy Learning Via Graph-based Reinforcement Learning (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 119 citations
Deng et al.
The NLP Cookbook: Modern Recipes For Transformer Based Deep Learning Architectures (2021) • IEEE Access • 121 citations
Sushant Singh, Ausif Mahmood
Luna: Linear Unified Nested Attention (2021) • Arxiv • 49 citations
Ma et al.
Scaling End-to-end Models For Large-scale Multilingual ASR (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 46 citations
Li et al.
Dytox: Transformers For Continual Learning With Dynamic Token Expansion (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 256 citations
Douillard et al.
Layerwise Optimization By Gradient Decomposition For Continual Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Tang et al.
Fjord: Fair And Accurate Federated Learning Under Heterogeneous Targets With Ordered Dropout (2021) • Arxiv • 64 citations
Horvath et al.
Label Verbalization And Entailment For Effective Zero- And Few-shot Relation Extraction (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 92 citations
Sainz et al.
Position Bias Mitigation: A Knowledge-aware Graph Model For Emotion Cause Extraction (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 47 citations
Yan et al.
On Hallucination And Predictive Uncertainty In Conditional Language Generation (2021) • Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume • 63 citations
Yijun Xiao, William Yang Wang
BROS: A Pre-trained Language Model Focusing On Text And Layout For Better Key Information Extraction From Documents (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 105 citations
Hong et al.
Longt5: Efficient Text-to-text Transformer For Long Sequences (2021) • Findings of the Association for Computational Linguistics: NAACL 2022 • 87 citations
Guo et al.
Natural Language Video Localization With Learnable Moment Proposals (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 42 citations
Xiao et al.
Zero-offload: Democratizing Billion-scale Model Training (2021) • Arxiv • 61 citations
Ren et al.
Pre-trained Models: Past, Present And Future (2021) • AI Open • 700 citations
Han et al.
Exploring The Limits Of Out-of-distribution Detection (2021) • Arxiv • 107 citations
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan
Textgnn: Improving Text Encoder Via Graph Neural Network In Sponsored Search (2021) • Proceedings of the Web Conference 2021 • 40 citations
Zhu et al.
Bigssl: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) • IEEE Journal of Selected Topics in Signal Processing • 148 citations
Zhang et al.
Exploring Underexplored Limitations Of Cross-domain Text-to-sql Generalization (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 52 citations
Yujian Gan, Xinyun Chen, Matthew Purver
Natural SQL: Making SQL Easier To Infer From Natural Language Specifications (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 58 citations
Gan et al.
Crossfit: A Few-shot Learning Challenge For Cross-task Generalization In NLP (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 100 citations
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
INVIGORATE: Interactive Visual Grounding And Grasping In Clutter (2021) • Robotics: Science and Systems XVII • 45 citations
Zhang et al.
Multi-turn Dialogue Reading Comprehension With Pivot Turns And Knowledge (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 216 citations
Zhuosheng Zhang, Junlong Li, Hai Zhao
Styleswin: Transformer-based GAN For High-resolution Image Generation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 202 citations
Zhang et al.
Natural Language Video Localization: A Revisit In Span-based Question Answering Framework (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 87 citations
Zhang et al.
Transformers In Vision: A Survey (2021) • ACM Computing Surveys • 2262 citations
Khan et al.
Efficiently Modeling Long Sequences With Structured State Spaces (2021) • Arxiv • 471 citations
Albert Gu, Karan Goel, Christopher Ré
Self-diagnosis And Self-debiasing: A Proposal For Reducing Corpus-based Bias In NLP (2021) • Transactions of the Association for Computational Linguistics • 196 citations
Timo Schick, Sahana Udupa, Hinrich Schütze
Textflint: Unified Multilingual Robustness Evaluation Toolkit For Natural Language Processing (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 79 citations
Gui et al.
Gpt3mix: Leveraging Large-scale Language Models For Text Augmentation (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 67 citations
Yoo et al.
Few-shot Text Generation With Pattern-exploiting Training (2020) • Arxiv • 74 citations
Timo Schick, Hinrich Schütze
State-of-the-art Augmented NLP Transformer Models For Direct And Single-step Retrosynthesis (2020) • Nature Communications • 289 citations
Tetko et al.
Dynabert: Dynamic BERT With Adaptive Width And Depth (2020) • Arxiv • 119 citations
Hou et al.
Automatically Identifying Words That Can Serve As Labels For Few-shot Text Classification (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 148 citations
Timo Schick, Helmut Schmid, Hinrich Schütze
Prottrans: Towards Cracking The Language Of Life's Code Through Self-supervised Deep Learning And High Performance Computing (2020) • Arxiv • 71 citations
Elnaggar et al.
Mt5: A Massively Multilingual Pre-trained Text-to-text Transformer (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 618 citations
Xue et al.
Training With Quantization Noise For Extreme Model Compression (2020) • Arxiv • 113 citations
Fan et al.
Heterogeneous Graph Transformer (2020) • WWW '20: The Web Conference 2020 • 949 citations
Hu et al.
Adamp: Slowing Down The Slowdown For Momentum Optimizers On Scale-invariant Weights (2020) • Arxiv • 81 citations
Heo et al.
Scalable Multi-hop Relational Reasoning For Knowledge-aware Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 198 citations
Feng et al.
Measuring Massive Multitask Language Understanding (2020) • Arxiv • 248 citations
Hendrycks et al.
HOLMES: Health Online Model Ensemble Serving For Deep Learning Models In Intensive Care Units (2020) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 72 citations
Hong et al.
It's Not Just Size That Matters: Small Language Models Are Also Few-shot Learners (2020) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 234 citations
Timo Schick, Hinrich Schütze
The Value Of Text For Small Business Default Prediction: A Deep Learning Approach (2020) • European Journal of Operational Research • 104 citations
Matthew Stevenson, Christophe Mues, Cristián Bravo
Dynamic Knowledge Routing Network For Target-guided Open-domain Conversation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 87 citations
Qin et al.
Declutr: Deep Contrastive Learning For Unsupervised Textual Representations (2020) • Arxiv • 97 citations
Giorgi et al.
Distilled One-shot Federated Learning (2020) • Arxiv • 73 citations
Zhou et al.
Trippy: A Triple Copy Strategy For Value Independent Neural Dialog State Tracking (2020) • Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue • 181 citations
Heck et al.
POINTER: Constrained Progressive Text Generation Via Insertion-based Generative Pre-training (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 66 citations
Zhang et al.
HIN: Hierarchical Inference Network For Document-level Relation Extraction (2020) • Lecture Notes in Computer Science • 117 citations
Tang et al.
Fastai: A Layered API For Deep Learning (2020) • Information • 943 citations
Jeremy Howard, Sylvain Gugger
Compressing Large-scale Transformer-based Models: A Case Study On BERT (2020) • Transactions of the Association for Computational Linguistics • 102 citations
Ganesh et al.
Semi-supervised Neural Architecture Search (2020) • Arxiv • 43 citations
Luo et al.
Gshard: Scaling Giant Models With Conditional Computation And Automatic Sharding (2020) • Arxiv • 345 citations
Lepikhin et al.
Improved Natural Language Generation Via Loss Truncation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 72 citations
Daniel Kang, Tatsunori Hashimoto
CPM: A Large-scale Generative Chinese Pre-trained Language Model (2020) • AI Open • 59 citations
Zhang et al.
Intrinsic Dimensionality Explains The Effectiveness Of Language Model Fine-tuning (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 154 citations
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining (2020) • Proceedings of the 3rd Clinical Natural Language Processing Workshop • 73 citations
Zachariah Zhang, Jingshu Liu, Narges Razavian
ETC: Encoding Long And Structured Inputs In Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 103 citations
Ainslie et al.
Hover: A Dataset For Many-hop Fact Extraction And Claim Verification (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 83 citations
Jiang et al.
Self-supervised Multimodal Versatile Networks (2020) • Arxiv • 195 citations
Alayrac et al.
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder For Long-form Document Matching (2020) • CIKM '20: The 29th ACM International Conference on Information and Knowledge Management • 53 citations
Yang et al.
When BERT Plays The Lottery, All Tickets Are Winning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Sai Prasanna, Anna Rogers, Anna Rumshisky
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters (2020) • Interspeech 2020 • 73 citations
Pratap et al.
Investigating Pretrained Language Models For Graph-to-text Generation (2020) • Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI • 43 citations
Ribeiro et al.
Knowledge Distillation For Multi-task Learning (2020) • Lecture Notes in Computer Science • 52 citations
Wei-Hong Li, Hakan Bilen
Colake: Contextualized Language And Knowledge Embedding (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 156 citations
Sun et al.
Conditional Augmentation For Aspect Term Extraction Via Masked Sequence-to-sequence Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 99 citations
Li et al.
On Negative Interference In Multilingual Models: Findings And A Meta-learning Treatment (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 70 citations
Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov
Low-resource Knowledge-grounded Dialogue Generation (2020) • Arxiv • 84 citations
Zhao et al.
Asking Questions The Human Way: Scalable Question-answer Generation From Text Corpus (2020) • Proceedings of The Web Conference 2020 • 43 citations
Liu et al.
Fairseq S2T: Fast Speech-to-text Modeling With Fairseq (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations • 110 citations
Wang et al.
Gradient Vaccine: Investigating And Improving Multi-task Optimization In Massively Multilingual Models (2020) • Arxiv • 59 citations
Wang et al.
COVID-19 Signsym: A Fast Adaptation Of A General Clinical NLP Tool To Identify And Normalize COVID-19 Signs And Symptoms To OMOP Common Data Model (2020) • Journal of the American Medical Informatics Association • 67 citations
Wang et al.
Probing Pretrained Language Models For Lexical Semantics (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Vulić et al.
German's Next Language Model (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 76 citations
Branden Chan, Stefan Schweter, Timo Möller
SOLOIST: Building Task Bots At Scale With Transfer Learning And Machine Teaching (2020) • Arxiv • 99 citations
Peng et al.
KGPT: Knowledge-grounded Pre-training For Data-to-text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 111 citations
Chen et al.
A Rigorous Study On Named Entity Recognition: Can Fine-tuning Pretrained Model Lead To The Promised Land? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Lin et al.
Efficient Content-based Sparse Attention With Routing Transformers (2020) • Transactions of the Association for Computational Linguistics • 266 citations
Roy et al.
How Good Is Your Tokenizer? On The Monolingual Performance Of Multilingual Language Models (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 50 citations
Rust et al.
Pre-training For Abstractive Document Summarization By Reinstating Source Text (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Zou et al.
Zero-resource Knowledge-grounded Dialogue Generation (2020) • Arxiv • 50 citations
Li et al.
Chemberta: Large-scale Self-supervised Pretraining For Molecular Property Prediction (2020) • Arxiv • 344 citations
Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar
Ladabert: Lightweight Adaptation Of BERT Through Hybrid Model Compression (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 40 citations
Mao et al.
Grounded Adaptation For Zero-shot Executable Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 83 citations
Zhong et al.
Rethinking Attention With Performers (2020) • Arxiv • 122 citations
Choromanski et al.
Memory-efficient Pipeline-parallel DNN Training (2020) • Arxiv • 60 citations
Narayanan et al.
XTREME: A Massively Multilingual Multi-task Benchmark For Evaluating Cross-lingual Generalization (2020) • Arxiv • 299 citations
Hu et al.
Understanding The Difficulty Of Training Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 81 citations
Liu et al.
Med-bert: Pre-trained Contextualized Embeddings On Large-scale Structured Electronic Health Records For Disease Prediction (2020) • Arxiv • 61 citations
Rasmy et al.
Underspecification Presents Challenges For Credibility In Modern Machine Learning (2020) • Arxiv • 428 citations
D'Amour et al.
Mixup-transformer: Dynamic Data Augmentation For NLP Tasks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 72 citations
Sun et al.
Bridging Textual And Tabular Data For Cross-domain Text-to-sql Semantic Parsing (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 145 citations
Xi Victoria Lin, Richard Socher, Caiming Xiong
Document-level Event Role Filler Extraction Using Multi-granularity Contextualized Encoding (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 98 citations
Xinya Du, Claire Cardie
Calibration Of Pre-trained Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 60 citations
Shrey Desai, Greg Durrett
Understanding And Improving Information Transfer In Multi-task Learning (2020) • Arxiv • 47 citations
Sen Wu, Hongyang R. Zhang, Christopher Ré
Low-resource Deep Entity Resolution With Transfer And Active Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 120 citations
Kasai et al.
Practice On Long Sequential User Behavior Modeling For Click-through Rate Prediction (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 198 citations
Pi et al.
Investigating Meta-learning Algorithms For Low-resource Natural Language Understanding Tasks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 106 citations
Zi-Yi Dou, Keyi Yu, Antonios Anastasopoulos
Hierarchical Temporal Convolutional Networks For Dynamic Recommender Systems (2019) • The World Wide Web Conference • 110 citations
You et al.
Show Your Work: Improved Reporting Of Experimental Results (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 215 citations
Dodge et al.
Ar-net: A Simple Auto-regressive Neural Network For Time-series (2019) • Arxiv • 49 citations
Oskar Triebe, Nikolay Laptev, Ram Rajagopal
Learning And Evaluating General Linguistic Intelligence (2019) • Arxiv • 156 citations
Yogatama et al.
Automated Essay Scoring Based On Two-stage Learning (2019) • Arxiv • 45 citations
Jiawei Liu, Yang Xu, Yaguang Zhu
Patient Knowledge Distillation For BERT Model Compression (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 85 citations
Sun et al.
Large-batch Training For LSTM And Beyond (2019) • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis • 74 citations
You et al.
Linguistic Knowledge And Transferability Of Contextual Representations (2019) • Proceedings of the 2019 Conference of the North • 57 citations
Liu et al.
ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations (2019) • Arxiv • 4051 citations
Lan et al.
Do We Really Need Fully Unsupervised Cross-lingual Embeddings? (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 109 citations
Vulić et al.
Cascaded Revision Network For Novel Object Captioning (2019) • IEEE Transactions on Circuits and Systems for Video Technology • 40 citations
Feng et al.
Distributionally Robust Neural Networks For Group Shifts: On The Importance Of Regularization For Worst-case Generalization (2019) • Arxiv • 364 citations
Sagawa et al.
Optimizing Multi-gpu Parallelization Strategies For Deep Learning Training (2019) • IEEE Micro • 70 citations
Pal et al.
Latent Normalizing Flows For Discrete Sequences (2019) • Arxiv • 46 citations
Zachary M. Ziegler, Alexander M. Rush
Choosing Transfer Languages For Cross-lingual Learning (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 148 citations
Lin et al.
Adversarial Learning With Contextual Embeddings For Zero-resource Cross-lingual Classification And NER (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 71 citations
Phillip Keung, Yichao Lu, Vikas Bhardwaj
Using Local Knowledge Graph Construction To Scale Seq2seq Models To Multi-document Inputs (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 98 citations
Fan et al.
Towards Scalable And Reliable Capsule Networks For Challenging NLP Applications (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 136 citations
Zhao et al.
Do Sentence Interactions Matter? Leveraging Sentence Level Representations For Fake News Classification (2019) • Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13) • 80 citations
Vaibhav Vaibhav, Raghuram Mandyam Annasamy, Eduard Hovy
Unsupervised Cross-lingual Representation Learning At Scale (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 538 citations
Conneau et al.
Key Fact As Pivot: A Two-stage Model For Low Resource Table-to-text Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Ma et al.
Mask-predict: Parallel Decoding Of Conditional Masked Language Models (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 177 citations
Ghazvininejad et al.
Robust Navigation With Language Pretraining And Stochastic Sampling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Li et al.
Beto, Bentz, Becas: The Surprising Cross-lingual Effectiveness Of BERT (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 135 citations
Shijie Wu, Mark Dredze
Domain Adaptive Text Style Transfer (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 56 citations
Li et al.
The Missing Ingredient In Zero-shot Neural Machine Translation (2019) • Arxiv • 92 citations
Arivazhagan et al.
Cloze-driven Pretraining Of Self-attention Networks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 41 citations
Baevski et al.
Deep Equilibrium Models (2019) • Arxiv • 245 citations
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Sentence-bert: Sentence Embeddings Using Siamese Bert-networks (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 9086 citations
Nils Reimers, Iryna Gurevych
Simple, Scalable Adaptation For Neural Machine Translation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 84 citations
Ankur Bapna, Naveen Arivazhagan, Orhan Firat
Enhancing Clinical Concept Extraction With Contextual Embeddings (2019) • Journal of the American Medical Informatics Association • 311 citations
Si et al.
Integrating Graph Contextualized Knowledge Into Pre-trained Language Models (2019) • Arxiv • 41 citations
He et al.
Zero-shot Cross-lingual Dialogue Systems With Transferable Latent Variables (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 76 citations
Liu et al.
Understanding And Robustifying Differentiable Architecture Search (2019) • Arxiv • 165 citations
Zela et al.
Real-time Open-domain Question Answering With Dense-sparse Phrase Index (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 135 citations
Seo et al.
Human Vs. Muppet: A Conservative Estimate Of Human Performance On The GLUE Benchmark (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 52 citations
Nikita Nangia, Samuel R. Bowman
Evolutionary Neural Automl For Deep Learning (2019) • Proceedings of the Genetic and Evolutionary Computation Conference • 106 citations
Liang et al.
Zero: Memory Optimizations Toward Training Trillion Parameter Models (2019) • Arxiv • 72 citations
Rajbhandari et al.
Large Memory Layers With Product Keys (2019) • Arxiv • 50 citations
Lample et al.
Graph Representation Learning Via Hard And Channel-wise Attention Networks (2019) • Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 57 citations
Hongyang Gao, Shuiwang Ji
Semi-supervised Text Style Transfer: Cross Projection In Latent Space (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 45 citations
Shang et al.
The State Of Sparsity In Deep Neural Networks (2019) • Arxiv • 436 citations
Trevor Gale, Erich Elsen, Sara Hooker
Federated Learning For Emoji Prediction In A Mobile Keyboard (2019) • Arxiv • 164 citations
Ramaswamy et al.
Knowledge Enhanced Contextual Word Representations (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 652 citations
Peters et al.
Characterizing And Understanding Software Developer Networks In Security Development (2019) • 2019 IEEE International Symposium on Workload Characterization (IISWC) • 44 citations
Song Wang, Nachi Nagappan
Probing Biomedical Embeddings From Language Models (2019) • Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for • 43 citations
Jin et al.
Reweighted Proximal Pruning For Large-scale Language Representation (2019) • Arxiv • 45 citations
Guo et al.
Sentence Embedding Alignment For Lifelong Relation Extraction (2019) • Proceedings of the 2019 Conference of the North • 110 citations
Wang et al.
Multi-task Feature Learning For Knowledge Graph Enhanced Recommendation (2019) • The World Wide Web Conference • 525 citations
Wang et al.
A Constructive Prediction Of The Generalization Error Across Scales (2019) • Arxiv • 49 citations
Rosenfeld et al.
Are Sixteen Heads Really Better Than One? (2019) • Arxiv • 45 citations
Paul Michel, Omer Levy, Graham Neubig
Taming Pretrained Transformers For Extreme Multi-label Text Classification (2019) • KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 135 citations
Chang et al.
KEPLER: A Unified Model For Knowledge Embedding And Pre-trained Language Representation (2019) • Arxiv • 77 citations
Wang et al.
Extracting Multiple-relations In One-pass With Pre-trained Transformers (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Wang et al.
Gmail Smart Compose: Real-time Assisted Writing (2019) • Arxiv • 45 citations
Chen et al.
Semantically Conditioned Dialog Response Generation Via Hierarchical Disentangled Self-attention (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 128 citations
Chen et al.
Convert: Efficient And Accurate Conversational Representations From Transformers (2019) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 41 citations
Henderson et al.
Environmental Drivers Of Systematicity And Generalization In A Situated Agent (2019) • Arxiv • 53 citations
Hill et al.
Star-transformer (2019) • Proceedings of the 2019 Conference of the North • 123 citations
Guo et al.
Low-resource Name Tagging Learned With Weakly Labeled Data (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 86 citations
Cao et al.
Mesh-tensorflow: Deep Learning For Supercomputers (2018) • Arxiv • 52 citations
Shazeer et al.
A Walk With SGD (2018) • Arxiv • 48 citations
Xing et al.
Meta-learning Update Rules For Unsupervised Representation Learning (2018) • Arxiv • 69 citations
Metz et al.
Conversational AI: The Science Behind The Alexa Prize (2018) • Alexa Prize Proceedings (https://developer.amazon.com/alexaprize/proceedings, accessed 2018-01-01) • 201 citations
Ram et al.
Zero-shot Dialog Generation With Cross-domain Latent Actions (2018) • Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue • 52 citations
Tiancheng Zhao, Maxine Eskenazi
Efficient And Robust Question Answering From Minimal Context Over Documents (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 144 citations
Min et al.
Universal Sentence Encoder (2018) • Arxiv • 1289 citations
Cer et al.
TVM: An Automated End-to-end Optimizing Compiler For Deep Learning (2018) • Arxiv • 900 citations
Chen et al.
Deep Anomaly Detection With Outlier Exposure (2018) • Arxiv • 401 citations
Dan Hendrycks, Mantas Mazeika, Thomas Dietterich
Polisis: Automated Analysis And Presentation Of Privacy Policies Using Deep Learning (2018) • Arxiv • 174 citations
Harkous et al.
Latent Alignment And Variational Attention (2018) • Arxiv • 85 citations
Deng et al.
Sql-to-text Generation With Graph-to-sequence Model (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 74 citations
Xu et al.
Gpipe: Efficient Training Of Giant Neural Networks Using Pipeline Parallelism (2018) • Arxiv • 236 citations
Huang et al.
Learning Memory Access Patterns (2018) • Arxiv • 67 citations
Hashemi et al.
Fpga-based CNN Inference Accelerator Synthesized From Multi-threaded C Software (2018) • 2017 30th IEEE International System-on-Chip Conference (SOCC) • 50 citations
Kim et al.
Learning Neural Templates For Text Generation (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 192 citations
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Training Millions Of Personalized Dialogue Agents (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 205 citations
Mazaré et al.
Natural Language Processing For Ehr-based Computational Phenotyping (2018) • IEEE/ACM Transactions on Computational Biology and Bioinformatics • 198 citations
Zeng et al.
Modular Networks: Learning To Decompose Neural Computation (2018) • Arxiv • 40 citations
Louis Kirsch, Julius Kunze, David Barber
TADAM: Task Dependent Adaptive Metric For Improved Few-shot Learning (2018) • Advances in Neural Information Processing Systems 31 (2018) • 199 citations
Boris N. Oreshkin, Pau Rodriguez, Alexandre Lacoste
Phrase-indexed Question Answering: A New Challenge For Scalable Document Comprehension (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 42 citations
Seo et al.
Training Tips For The Transformer Model (2018) • The Prague Bulletin of Mathematical Linguistics • 109 citations
Martin Popel, Ondřej Bojar
Multi-task Neural Models For Translating Between Styles Within And Across Languages (2018) • Arxiv • 52 citations
Xing Niu, Sudha Rao, Marine Carpuat
Few-shot Learning For Named Entity Recognition In Medical Text (2018) • Arxiv • 55 citations
Hofer et al.
Tensor Comprehensions: Framework-agnostic High-performance Machine Learning Abstractions (2018) • Arxiv • 251 citations
Vasilache et al.
Generating Wikipedia By Summarizing Long Sequences (2018) • Arxiv • 74 citations
Liu et al.
Differentially Private Releasing Via Deep Generative Model (Technical Report) (2018) • Arxiv • 40 citations
Xinyang Zhang, Shouling Ji, Ting Wang
Scaling Neural Machine Translation (2018) • Proceedings of the Third Conference on Machine Translation: Research Papers • 80 citations
Ott et al.
Sentence Encoders On Stilts: Supplementary Training On Intermediate Labeled-data Tasks (2018) • Arxiv • 258 citations
Jason Phang, Thibault Févry, Samuel R. Bowman
Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-experts Layer (2017) • Arxiv • 268 citations
Shazeer et al.
Simple Recurrent Units For Highly Parallelizable Recurrence (2017) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 53 citations
Lei et al.
Deep Active Learning For Named Entity Recognition (2017) • Proceedings of the 2nd Workshop on Representation Learning for NLP • 364 citations
Shen et al.
Lifelong Generative Modeling (2017) • Neurocomputing • 102 citations
Jason Ramapuram, Magda Gregorova, Alexandros Kalousis
Challenges In Data-to-document Generation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 489 citations
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Fast And Accurate Entity Recognition With Iterated Dilated Convolutions (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 51 citations
Strubell et al.
Latent Relational Metric Learning Via Memory-based Attention For Collaborative Ranking (2017) • Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 • 214 citations
Yi Tay, Anh Tuan Luu, Siu Cheung Hui
Learning A Neural Semantic Parser From User Feedback (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 54 citations
Iyer et al.
Revisiting Activation Regularization For Language Rnns (2017) • Arxiv • 42 citations
Stephen Merity, Bryan McCann, Richard Socher
Deep Gradient Compression: Reducing The Communication Bandwidth For Distributed Training (2017) • ICLR 2018 • 645 citations
Lin et al.
Gradnorm: Gradient Normalization For Adaptive Loss Balancing In Deep Multitask Networks (2017) • Proceedings of the 35th International Conference on Machine Learning (2018), pp. 793-802 • 443 citations
Chen et al.
Neural Networks For Text Correction And Completion In Keyboard Decoding (2017) • Arxiv • 56 citations
Shaona Ghosh, Per Ola Kristensson
Knowledge Adaptation: Teaching To Adapt (2017) • Arxiv • 41 citations
Sebastian Ruder, Parsa Ghaffari, John G. Breslin
Multiscale Co-design Analysis Of Energy, Latency, Area, And Accuracy Of A Reram Analog Neural Training Accelerator (2017) • IEEE Journal on Emerging and Selected Topics in Circuits and Systems • 151 citations
Marinella et al.
Deeparchitect: Automatically Designing And Training Deep Architectures (2017) • Arxiv • 142 citations
Renato Negrinho, Geoff Gordon
Tensor-train Recurrent Neural Networks For Video Classification (2017) • Arxiv • 102 citations
Yinchong Yang, Denis Krompass, Volker Tresp
Learned Optimizers That Scale And Generalize (2017) • Arxiv • 115 citations
Wichrowska et al.
Unbiasing Truncated Backpropagation Through Time (2017) • Arxiv • 53 citations
Corentin Tallec, Yann Ollivier
Multilingual Hierarchical Attention Networks For Document Classification (2017) • Arxiv • 48 citations
Nikolaos Pappas, Andrei Popescu-Belis
Learning Distributed Representations Of Sentences From Unlabelled Data (2016) • Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 117 citations
Felix Hill, Kyunghyun Cho, Anna Korhonen
Long Short-term Memory-networks For Machine Reading (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1040 citations
Jianpeng Cheng, Li Dong, Mirella Lapata
Multi-task Cross-lingual Sequence Tagging From Scratch (2016) • Arxiv • 198 citations
Zhilin Yang, Ruslan Salakhutdinov, William Cohen
Learning End-to-end Goal-oriented Dialog (2016) • Arxiv • 75 citations
Antoine Bordes, Y-Lan Boureau, Jason Weston
Acceleration Of Deep Neural Network Training With Resistive Cross-point Devices (2016) • Frontiers in Neuroscience • 407 citations
Tayfun Gokmen, Yurii Vlasov
Pointer Sentinel Mixture Models (2016) • Arxiv • 481 citations
Merity et al.
Scaling Memory-augmented Neural Networks With Sparse Reads And Writes (2016) • Arxiv • 58 citations
Rae et al.
Natural Language Processing (Almost) From Scratch (2011) • Arxiv • 5171 citations
Collobert et al.

Security 215 papers #

Safearena: Evaluating The Safety Of Autonomous Web Agents (2025) • No Venue
Tur et al.
Lifelong Safety Alignment For Language Models (2025) • No Venue
Wang et al.
The Alignment Waltz: Jointly Training Agents To Collaborate For Safety (2025) • No Venue
Zhang et al.
Llama-3.1-foundationai-securityllm-8b-instruct Technical Report (2025) • No Venue
Weerawardhena et al.
The Devil Behind The Mask: An Emergent Safety Vulnerability Of Diffusion Llms (2025) • No Venue
Wen et al.
O3-mini Vs Deepseek-r1: Which One Is Safer? (2025) • No Venue
Arrieta et al.
Hail To The Thief: Exploring Attacks And Defenses In Decentralised GRPO (2025) • No Venue
Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen
Evolve The Method, Not The Prompts: Evolutionary Synthesis Of Jailbreak Attacks On Llms (2025) • No Venue
Chen et al.
Defeating Prompt Injections By Design (2025) • No Venue
Debenedetti et al.
Duoguard: A Two-player Rl-driven Framework For Multilingual LLM Guardrails (2025) • No Venue
Deng et al.
Imperceptible Jailbreaking Against Large Language Models (2025) • No Venue
Gao et al.
The Aloe Family Recipe For Open And Specialized Healthcare Llms (2025) • No Venue
Garcia-Gasulla et al.
Tree-based Dialogue Reinforced Policy Optimization For Red-teaming Attacks (2025) • No Venue
Guo et al.
Building A Foundational Guardrail For General Agentic Systems Via Synthetic Data (2025) • No Venue
Huang et al.
Sentinel: SOTA Model To Protect Against Prompt Injections (2025) • No Venue
Dror Ivry, Oran Nahum
Silent Branding Attack: Trigger-free Data Poisoning Attack On Text-to-image Diffusion Models (2025) • No Venue
Jang et al.
The Rogue Scalpel: Activation Steering Compromises LLM Safety (2025) • No Venue
Korznikov et al.
A.S.E: A Repository-level Benchmark For Evaluating Security In Ai-generated Code (2025) • No Venue
Lian et al.
Saferag: Benchmarking Security In Retrieval-augmented Generation Of Large Language Model (2025) • No Venue
Liang et al.
Advances And Challenges In Foundation Agents: From Brain-inspired Intelligence To Evolutionary, Collaborative, And Safe Systems (2025) • No Venue
Liu et al.
Llm-powered GUI Agents In Phone Automation: Surveying Progress And Prospects (2025) • No Venue
Liu et al.
Deepseek-r1 Thoughtology: Let's <think> About LLM Reasoning (2025) • No Venue
Marjanović et al.
Effective Red-teaming Of Policy-adherent Agents (2025) • No Venue
Nakash et al.
Fedrand: Enhancing Privacy In Federated Learning With Randomized Lora Subparameter Updates (2025) • No Venue
Park et al.
Sweeval: Do Llms Really Swear? A Safety Benchmark For Testing Limits For Enterprise Use (2025) • No Venue
Patel et al.
Saffron-1: Towards An Inference Scaling Paradigm For LLM Safety Assurance (2025) • No Venue
Qiu et al.
X-teaming: Multi-turn Jailbreaks And Defenses With Adaptive Multi-agents (2025) • No Venue
Rahman et al.
Geopolitical Biases In Llms: What Are The "good" And The "bad" Countries According To Contemporary Language Models (2025) • No Venue
Salnikov et al.
Towards Trustworthy GUI Agents: A Survey (2025) • No Venue
Shi et al.
Os-sentinel: Towards Safety-enhanced Mobile GUI Agents Via Hybrid Validation In Realistic Workflows (2025) • No Venue
Sun et al.
Astraios: Parameter-efficient Instruction Tuning Code Large Language Models (2024) • No Venue
Zhuo et al.
The Instruction Hierarchy: Training Llms To Prioritize Privileged Instructions (2024) • No Venue
Wallace et al.
Openai O1 System Card (2024) • No Venue
Openai et al.
Llm-detectaive: A Tool For Fine-grained Machine-generated Text Detection (2024) • No Venue
Abassy et al.
Stealing Part Of A Production Language Model (2024) • No Venue
Carlini et al.
Agentpoison: Red-teaming LLM Agents Via Poisoning Memory Or Knowledge Bases (2024) • No Venue
Chen et al.
A Flexible Large Language Models Guardrail Development Methodology Applied To Off-topic Prompt Detection (2024) • No Venue
Gabriel Chua, Shing Yee Chan, Shaun Khoo
Self-recognition In Language Models (2024) • No Venue
Davidson et al.
Security And Privacy Challenges Of Large Language Models: A Survey (2024) • ACM Computing Surveys • 104 citations
Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
Mllm-as-a-judge For Image Safety Without Human Labeling (2024) • No Venue
Wang et al.
Model Surgery: Modulating Llm's Behavior Via Simple Parameter Editing (2024) • No Venue
Wang et al.
Trustllm: Trustworthiness In Large Language Models (2024) • No Venue
Sun et al.
Watermarking Makes Language Models Radioactive (2024) • No Venue
Sander et al.
Generative AI In EU Law: Liability, Privacy, Intellectual Property, And Cybersecurity (2024) • Arxiv • 45 citations
Novelli et al.
Aurora-m: The First Open Source Multilingual Language Model Red-teamed According To The U.S. Executive Order (2024) • No Venue
Nakamura et al.
Advprompter: Fast Adaptive Adversarial Prompting For Llms (2024) • No Venue
Paulus et al.
Large Language Model For Vulnerability Detection: Emerging Results And Future Directions (2024) • ICSE-NIER'24: 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results • 74 citations
Xin Zhou, Ting Zhang, David Lo
Adapting Safe-for-work Classifier For Malaysian Language Text: Enhancing Alignment In Llm-ops Framework (2024) • No Venue
Razak et al.
CLEAR: Character Unlearning In Textual And Visual Modalities (2024) • No Venue
Dontsov et al.
Fuzzcoder: Byte-level Fuzzing Test Via Large Language Model (2024) • No Venue
Yang et al.
Model Merging And Safety Alignment: One Bad Model Spoils The Bunch (2024) • No Venue
Hammoud et al.
Spotting Llms With Binoculars: Zero-shot Detection Of Machine-generated Text (2024) • No Venue
Hans et al.
Sleeper Agents: Training Deceptive Llms That Persist Through Safety Training (2024) • No Venue
Hubinger et al.
Course-correction: Safety Alignment Using Synthetic Preferences (2024) • No Venue
Xu et al.
AI Deception: A Survey Of Examples, Risks, And Potential Solutions (2023) • Patterns • 88 citations
Park et al.
Contrabert: Enhancing Code Pre-trained Models Via Contrastive Learning (2023) • 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) • 49 citations
Liu et al.
Deid-gpt: Zero-shot Medical Text De-identification By GPT-4 (2023) • Arxiv • 89 citations
Liu et al.
Jailbreaking Chatgpt Via Prompt Engineering: An Empirical Study (2023) • Arxiv • 99 citations
Liu et al.
Fm-vit: Flexible Modal Vision Transformers For Face Anti-spoofing (2023) • IEEE Transactions on Information Forensics and Security • 70 citations
Liu et al.
Fake News In Sheep's Clothing: Robust Fake News Detection Against Llm-empowered Style Attacks (2023) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 44 citations
Jiaying Wu, Jiafeng Guo, Bryan Hooi
Analyzing Leakage Of Personally Identifiable Information In Language Models (2023) • 2023 IEEE Symposium on Security and Privacy (SP) • 86 citations
Lukas et al.
FLIP: Cross-domain Face Anti-spoofing With Language Guidance (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar
Fuzz4all: Universal Fuzzing With Large Language Models (2023) • ICSE '24: IEEE/ACM 46th International Conference on Software Engineering • 108 citations
Xia et al.
Chatgpt: More Than A Weapon Of Mass Deception, Ethical Challenges And Responses From The Human-centered Artificial Intelligence (HCAI) Perspective (2023) • International Journal of Human–Computer Interaction • 63 citations
Sison et al.
Universal And Transferable Adversarial Attacks On Aligned Language Models (2023) • Arxiv • 171 citations
Zou et al.
AI Vs. Human -- Differentiation Analysis Of Scientific Content Generation (2023) • Arxiv • 73 citations
Ma et al.
Unveiling Security, Privacy, And Ethical Concerns Of Chatgpt (2023) • Journal of Information and Intelligence • 165 citations
Xiaodong Wu, Ran Duan, Jianbing Ni
Deepfakes, Misinformation, And Disinformation In The Era Of Frontier AI, Generative AI, And Large AI Models (2023) • 2023 International Conference on Computer and Applications (ICCA) • 69 citations
Shoaib et al.
How Effective Are Neural Networks For Fixing Security Vulnerabilities (2023) • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis • 61 citations
Wu et al.
Membership Inference Attacks Against Language Models Via Neighbourhood Comparison (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 46 citations
Mattern et al.
Detectgpt: Zero-shot Machine-generated Text Detection Using Probability Curvature (2023) • Arxiv • 151 citations
Mitchell et al.
Fixing Hardware Security Bugs With Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 47 citations
Ahmad et al.
Zero-shot Learning For Requirements Classification: An Exploratory Study (2023) • Information and Software Technology • 61 citations
Waad Alhoshan, Alessio Ferrari, Liping Zhao
Decodingtrust: A Comprehensive Assessment Of Trustworthiness In GPT Models (2023) • Arxiv • 58 citations
Wang et al.
Purple Llama Cyberseceval: A Secure Coding Benchmark For Language Models (2023) • No Venue
Bhatt et al.
A Categorical Archive Of Chatgpt Failures (2023) • Arxiv • 395 citations
Ali Borji
Extracting Training Data From Diffusion Models (2023) • Arxiv • 93 citations
Carlini et al.
Unleashing The Potential Of Prompt Engineering For Large Language Models (2023) • Patterns • 86 citations
Chen et al.
Diversevul: A New Vulnerable Source Code Dataset For Deep Learning Based Vulnerability Detection (2023) • Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses • 136 citations
Chen et al.
A Survey On Zero Pronoun Translation (2023) • IEEE Open Journal of the Computer Society • 240 citations
Wang et al.
Anti-dreambooth: Protecting Users From Personalized Text-to-image Synthesis (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Le et al.
Llms Cannot Reliably Identify And Reason About Security Vulnerabilities (yet?): A Comprehensive Evaluation, Framework, And Benchmarks (2023) • 2024 IEEE Symposium on Security and Privacy (SP) • 50 citations
Ullah et al.
Masterkey: Automated Jailbreak Across Multiple Large Language Model Chatbots (2023) • Network and Distributed System Security Symposium • 65 citations
Deng et al.
Beyond The Safeguards: Exploring The Security Risks Of Chatgpt (2023) • Arxiv • 54 citations
Erik Derner, Kristina Batistič
A Pilot Study Of Query-free Adversarial Attack Against Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 44 citations
Haomin Zhuang, Yihua Zhang, Sijia Liu
A Comprehensive Survey On Multimodal Recommender Systems: Taxonomy, Evaluation, And Future Directions (2023) • Arxiv • 149 citations
Zhou et al.
Detecting And Grounding Multi-modal Media Manipulation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Rui Shao, Tianxing Wu, Ziwei Liu
Decoding The Threat Landscape: Chatgpt, Fraudgpt, And Wormgpt In Social Engineering Attacks (2023) • International Journal of Scientific Research in Computer Science, Engineering and Information Technology • 43 citations
Polra Victor Falade
Can Ai-generated Text Be Reliably Detected? (2023) • Arxiv • 144 citations
Sadasivan et al.
A Multi-task Multi-stage Transitional Training Framework For Neural Chat Translation (2023) • Proceedings of the 2023 ACM International Conference on Multimedia Retrieval • 48 citations
Zhou et al.
Glaze: Protecting Artists From Style Mimicry By Text-to-image Models (2023) • Arxiv • 41 citations
Shan et al.
Revolutionizing Cyber Threat Detection With Large Language Models: A Privacy-preserving Bert-based Lightweight Model For Iot/iiot Devices (2023) • IEEE Access • 157 citations
Ferrag et al.
Not What You've Signed Up For: Compromising Real-world Llm-integrated Applications With Indirect Prompt Injection (2023) • CCS '23: ACM SIGSAC Conference on Computer and Communications Security • 178 citations
Greshake et al.
On The Adversarial Robustness Of Multi-modal Foundation Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 42 citations
Christian Schlarmann, Matthias Hein
A Survey On Large Language Model (LLM) Security And Privacy: The Good, The Bad, And The Ugly (2023) • High-Confidence Computing • 594 citations
Yao et al.
Getting Pwn'd By AI: Penetration Testing With Large Language Models (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 68 citations
Andreas Happe, Jürgen Cito
Breaking The Silence: The Threats Of Using Llms In Software Engineering (2023) • ICSE-NIER'24: 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results • 52 citations
June Sallou, Thomas Durieux, Annibale Panichella
Large Language Models For Code: Security Hardening And Adversarial Testing (2023) • CCS '23: ACM SIGSAC Conference on Computer and Communications Security • 66 citations
Jingxuan He, Martin Vechev
LLM For Soc Security: A Paradigm Shift (2023) • IEEE Access • 52 citations
Saha et al.
Inferfix: End-to-end Program Repair With Llms (2023) • ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 97 citations
Jin et al.
TPU V4: An Optically Reconfigurable Supercomputer For Machine Learning With Hardware Support For Embeddings (2023) • Proceedings of the 50th Annual International Symposium on Computer Architecture • 322 citations
Jouppi et al.
Multi-step Jailbreaking Privacy Attacks On Chatgpt (2023) • Findings of the Association for Computational Linguistics: EMNLP 2023 • 117 citations
Li et al.
Applying Large Language Models To Power Systems: Potential Security Threats (2023) • IEEE Transactions on Smart Grid • 41 citations
Ruan et al.
Unsafe Diffusion: On The Generation Of Unsafe Images And Hateful Memes From Text-to-image Models (2023) • Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security • 42 citations
Qu et al.
Visual Adversarial Examples Jailbreak Aligned Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 73 citations
Qi et al.
(security) Assertions By Large Language Models (2023) • IEEE Transactions on Information Forensics and Security • 44 citations
Kande et al.
Exploiting Programmatic Behavior Of Llms: Dual-use Through Standard Security Attacks (2023) • 2024 IEEE Security and Privacy Workshops (SPW) • 47 citations
Kang et al.
"it's A Fair Game", Or Is It? Examining How Users Navigate Disclosure Risks And Benefits When Using Llm-based Conversational Agents (2023) • CHI '24: CHI Conference on Human Factors in Computing Systems • 46 citations
Zhang et al.
How Secure Is Code Generated By Chatgpt? (2023) • 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) • 88 citations
Khoury et al.
Transferable Adversarial Attacks On Vision Transformers With Token Gradient Regularization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Zhang et al.
"do Anything Now": Characterizing And Evaluating In-the-wild Jailbreak Prompts On Large Language Models (2023) • CCS '24: ACM SIGSAC Conference on Computer and Communications Security • 79 citations
Shen et al.
In Chatgpt We Trust? Measuring And Characterizing The Reliability Of Chatgpt (2023) • Arxiv • 71 citations
Shen et al.
ET-BERT: A Contextualized Datagram Representation With Pre-training Transformers For Encrypted Traffic Classification (2022) • WWW '22: The ACM Web Conference 2022 • 298 citations
Lin et al.
Are Large Pre-trained Language Models Leaking Your Personal Information? (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 49 citations
Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang
Backdoor Defense Via Decoupling The Training Process (2022) • Arxiv • 41 citations
Huang et al.
Natural Attack For Pre-trained Models Of Code (2022) • Proceedings of the 44th International Conference on Software Engineering • 130 citations
Yang et al.
Codeattack: Code-based Adversarial Attacks For Pre-trained Programming Language Models (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Akshita Jha, Chandan K. Reddy
A New Generation Of Perspective API: Efficient Multilingual Character-level Transformers (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 88 citations
Lees et al.
Quantifying Privacy Risks Of Masked Language Models Using Membership Inference Attacks (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 40 citations
Mireshghallah et al.
Securebert: A Domain-specific Language Model For Cybersecurity (2022) • Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering • 108 citations
Aghaei et al.
THE-X: Privacy-preserving Transformer Inference With Homomorphic Encryption (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 47 citations
Chen et al.
PRADA: Practical Black-box Adversarial Attacks Against Neural Ranking Models (2022) • ACM Transactions on Information Systems • 41 citations
Wu et al.
Large Language Models Are Zero-shot Fuzzers: Fuzzing Deep-learning Libraries Via Large Language Models (2022) • ISSTA '23: 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis • 184 citations
Deng et al.
Ai-driven Development Is Here: Should You Worry? (2022) • IEEE Software • 55 citations
Neil Ernst, Gabriele Bavota
Transformer-based Language Models For Software Vulnerability Detection (2022) • ACSAC: Annual Computer Security Applications Conference • 87 citations
Thapa et al.
How To Keep Text Private? A Systematic Review Of Deep Learning Methods For Privacy-preserving Natural Language Processing (2022) • Artificial Intelligence Review • 57 citations
Samuel Sousa, Roman Kern
Towards Universal Backward-compatible Representation Learning (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 60 citations
Zhang et al.
Red Teaming Language Models With Language Models (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Perez et al.
Chatgpt: The End Of Online Exam Integrity? (2022) • Arxiv • 349 citations
Teo Susnjak
Challenges In Detoxifying Language Models (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 47 citations
Welbl et al.
Hidden Killer: Invisible Textual Backdoor Attacks With Syntactic Trigger (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 125 citations
Qi et al.
Examining Zero-shot Vulnerability Repair With Large Language Models (2021) • 2023 IEEE Symposium on Security and Privacy (SP) • 58 citations
Pearce et al.
RAP: Robustness-aware Perturbations For Defending Against Backdoor Attacks On NLP Models (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 55 citations
Yang et al.
Be Careful About Poisoned Word Embeddings: Exploring The Vulnerability Of The Embedding Layers In NLP Models (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 84 citations
Yang et al.
MOMENTA: A Multimodal Framework For Detecting Harmful Memes And Their Targets (2021) • Findings of the Association for Computational Linguistics: EMNLP 2021 • 100 citations
Pramanick et al.
Zero-shot Adversarial Quantization (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yuang Liu, Wei Zhang, Jun Wang
Open-domain, Content-based, Multi-modal Fact-checking Of Out-of-context Images Via Online Resources (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
Just Say No: Analyzing The Stance Of Neural Dialogue Generation In Offensive Contexts (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Baheti et al.
Dual Attention Suppression Attack: Generate Adversarial Camouflage In Physical World (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 172 citations
Wang et al.
On Fast Adversarial Robustness Adaptation In Model-agnostic Meta-learning (2021) • The VLDB Journal • 118 citations
Wang et al.
CLINE: Contrastive Learning With Semantic Negative Examples For Natural Language Understanding (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 85 citations
Wang et al.
On The Opportunities And Risks Of Foundation Models (2021) • Arxiv • 2055 citations
Bommasani et al.
Copy, Right? A Testing Framework For Copyright Protection Of Deep Learning Models (2021) • 2022 IEEE Symposium on Security and Privacy (SP) • 57 citations
Chen et al.
Generating Fake Cyber Threat Intelligence Using Transformer-based Models (2021) • 2021 International Joint Conference on Neural Networks (IJCNN) • 61 citations
Ranade et al.
Backdoor Attacks On Pre-trained Models By Layerwise Weight Poisoning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 83 citations
Li et al.
Urltran: Improving Phishing URL Detection Using Transformers (2021) • MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM) • 64 citations
Maneriker et al.
Deepfakes Generation And Detection: State-of-the-art, Open Challenges, Countermeasures, And Way Forward (2021) • Applied Intelligence • 326 citations
Masood et al.
Backdoor Pre-trained Models Can Transfer To All (2021) • Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security • 70 citations
Shen et al.
A Channel Coding Benchmark For Meta-learning (2021) • Interspeech 2021 • 57 citations
Li et al.
Asvspoof 2021: Automatic Speaker Verification Spoofing And Countermeasures Challenge Evaluation Plan (2021) • Arxiv • 129 citations
Delgado et al.
Hidden Backdoors In Human-centric Language Models (2021) • CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security • 84 citations
Li et al.
Towards Robustness Against Natural Language Word Substitutions (2021) • Arxiv • 63 citations
Dong et al.
Integrating Pattern- And Fact-based Fake News Detection Via Model Preference Learning (2021) • Proceedings of the 30th ACM International Conference on Information & Knowledge Management • 51 citations
Sheng et al.
Cross-lingual COVID-19 Fake News Detection (2021) • 2021 International Conference on Data Mining Workshops (ICDMW) • 40 citations
Du et al.
Mind The Style Of Text! Adversarial And Backdoor Attacks Based On Text Style Transfer (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 98 citations
Qi et al.
Gradient-based Adversarial Attacks Against Text Transformers (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 62 citations
Guo et al.
Asvspoof 2021: Accelerating Progress In Spoofed And Deepfake Speech Detection (2021) • 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge • 268 citations
Yamagishi et al.
Deep Model Intellectual Property Protection Via Deep Watermarking (2021) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 118 citations
Zhang et al.
Adversarial Attacks On Deep Models For Financial Transaction Records (2021) • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining • 42 citations
Fursov et al.
Towards Robustness Of Text-to-sql Models Against Synonym Substitution (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 58 citations
Gan et al.
Newsclippings: Automatic Generation Of Out-of-context Multimodal Media (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 46 citations
Grace Luo, Trevor Darrell, Anna Rohrbach
Adv-bert: BERT Is Not Robust On Misspellings! Generating Nature Adversarial Samples On BERT (2020) • Arxiv • 79 citations
Sun et al.
Tweepfake: About Detecting Deepfake Tweets (2020) • PLOS ONE • 153 citations
Fagni et al.
Differentially Private Representation For NLP: Formal Guarantee And An Empirical Study On Privacy And Fairness (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 47 citations
Lingjuan Lyu, Xuanli He, Yitong Li
Trojaning Language Models For Fun And Profit (2020) • 2021 IEEE European Symposium on Security and Privacy (EuroS&P) • 62 citations
Zhang et al.
Human-centric Spatio-temporal Video Grounding With Visual Transformers (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 75 citations
Tang et al.
SAFER: A Structure-free Approach For Certified Robustness To Adversarial Word Substitutions (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 84 citations
Mao Ye, Chengyue Gong, Qiang Liu
Confronting Abusive Language Online: A Survey From The Ethical And Human Rights Perspective (2020) • Journal of Artificial Intelligence Research • 53 citations
Svetlana Kiritchenko, Isar Nejadgholi, Kathleen C. Fraser
Information Leakage In Embedding Models (2020) • Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security • 179 citations
Congzheng Song, Ananth Raghunathan
Can Adversarial Weight Perturbations Inject Neural Backdoors? (2020) • Proceedings of the 29th ACM International Conference on Information & Knowledge Management • 51 citations
Garg et al.
BAE: Bert-based Adversarial Examples For Text Classification (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Siddhant Garg, Goutham Ramakrishnan
Adversarial Watermarking Transformer: Towards Tracing Text Provenance With Data Hiding (2020) • 2021 IEEE Symposium on Security and Privacy (SP) • 56 citations
Sahar Abdelnabi, Mario Fritz
Robust Encodings: A Framework For Combating Adversarial Typos (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 89 citations
Jones et al.
BERT-ATTACK: Adversarial Attack Against BERT Using BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 486 citations
Li et al.
Machine Generation And Detection Of Arabic Manipulated And Fake News (2020) • Arxiv • 41 citations
Nagoudi et al.
Entangled Watermarks As A Defense Against Model Extraction (2020) • Arxiv • 46 citations
Jia et al.
Textattack: A Framework For Adversarial Attacks, Data Augmentation, And Adversarial Training In NLP (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 245 citations
Morris et al.
Stereotypical Bias Removal For Hate Speech Detection Task Using Knowledge-based Generalizations (2020) • The World Wide Web Conference • 89 citations
Pinkesh Badjatiya, Manish Gupta, Vasudeva Varma
Backdoor Attacks Against Transfer Learning With Pre-trained Deep Learning Models (2020) • IEEE Transactions on Services Computing • 68 citations
Wang et al.
Openattack: An Open-source Textual Adversarial Attack Toolkit (2020) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations • 68 citations
Zeng et al.
Cat-gen: Improving Robustness In NLP Models Via Controlled Adversarial Text Generation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Wang et al.
Adversarial Robustness: From Self-supervised Pre-training To Fine-tuning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 167 citations
Chen et al.
Imitation Attacks And Defenses For Black-box Machine Translation Systems (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Eric Wallace, Mitchell Stern, Dawn Song
Metapoison: Practical General-purpose Clean-label Data Poisoning (2020) • Arxiv • 81 citations
Huang et al.
Texthide: Tackling Data Privacy In Language Understanding Tasks (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 42 citations
Huang et al.
Weight Poisoning Attacks On Pre-trained Models (2020) • Arxiv • 49 citations
Keita Kurita, Paul Michel, Graham Neubig
Recipes For Safety In Open-domain Chatbots (2020) • Arxiv • 98 citations
Xu et al.
Text Processing Like Humans Do: Visually Attacking And Shielding NLP Systems (2019) • Arxiv • 49 citations
Eger et al.
Thieves On Sesame Street! Model Extraction Of Bert-based Apis (2019) • Arxiv • 73 citations
Krishna et al.
Overlearning Reveals Sensitive Attributes (2019) • Arxiv • 55 citations
Congzheng Song, Vitaly Shmatikov
Detecting AI Trojans Using Meta Neural Analysis (2019) • 2021 IEEE Symposium on Security and Privacy (SP) • 43 citations
Xu et al.
A Backdoor Attack Against Lstm-based Text Classification Systems (2019) • IEEE Access • 268 citations
Jiazhu Dai, Chuanshuai Chen
Achieving Verified Robustness To Symbol Substitutions Via Interval Bound Propagation (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 121 citations
Huang et al.
Certified Robustness To Adversarial Word Substitutions (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 236 citations
Jia et al.
White-to-black: Efficient Distillation Of Black-box Adversarial Attacks (2019) • Proceedings of the 2019 Conference of the North • 45 citations
Gil et al.
Generating Sentiment-preserving Fake Online Reviews Using Neural Language Models And Their Human- And Machine-based Detection (2019) • Advances in Intelligent Systems and Computing • 51 citations
Adelani et al.
Differential Privacy Has Disparate Impact On Model Accuracy (2019) • Arxiv • 77 citations
Eugene Bagdasaryan, Vitaly Shmatikov
A Tale Of Evil Twins: Adversarial Inputs Versus Poisoned Models (2019) • Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security • 50 citations
Pang et al.
Defending Against Neural Fake News (2019) • Arxiv • 89 citations
Zellers et al.
Make Up Your Mind! Adversarial Generation Of Inconsistent Natural Language Explanations (2019) • Short Paper at ACL 2020 • 55 citations
Camburu et al.
Word-level Textual Adversarial Attacking As Combinatorial Optimization (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 175 citations
Zang et al.
Analyzing Information Leakage Of Updates To Natural Language Models (2019) • CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security • 45 citations
Zanella-Béguelin et al.
Is BERT Really Robust? A Strong Baseline For Natural Language Attack On Text Classification And Entailment (2019) • Arxiv • 102 citations
Jin et al.
On Evaluation Of Adversarial Perturbations For Sequence-to-sequence Models (2019) • Proceedings of the 2019 Conference of the North • 114 citations
Michel et al.
Blackmarks: Blackbox Multibit Watermarking For Deep Neural Networks (2019) • Arxiv • 41 citations
Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar
Black-box Generation Of Adversarial Text Sequences To Evade Deep Learning Classifiers (2018) • 2018 IEEE Security and Privacy Workshops (SPW) • 621 citations
Gao et al.
Adversarial Example Generation With Syntactically Controlled Paraphrase Networks (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 82 citations
Iyyer et al.
Adversarially Regularising Neural NLI Models To Integrate Logical Background Knowledge (2018) • Arxiv • 43 citations
Pasquale Minervini, Sebastian Riedel
Cache Telepathy: Leveraging Shared Resource Attacks To Learn DNN Architectures (2018) • Arxiv • 44 citations
Mengjia Yan, Christopher Fletcher, Josep Torrellas
Generating Natural Language Adversarial Examples (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 781 citations
Alzantot et al.
Textbugger: Generating Adversarial Text Against Real-world Applications (2018) • Proceedings 2019 Network and Distributed System Security Symposium • 275 citations
Li et al.
How To Backdoor Federated Learning (2018) • Arxiv • 698 citations
Bagdasaryan et al.
Conversations Gone Awry: Detecting Early Signs Of Conversational Failure (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 174 citations
Zhang et al.
Fooling Vision And Language Models Despite Localization And Attention Mechanism (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Xu et al.
Automated Crowdturfing Attacks And Defenses In Online Review Systems (2017) • Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security • 133 citations
Yao et al.


SIGIR 56 papers #

The Power Of Noise: Redefining Retrieval For RAG Systems (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 105 citations
Cuconasu et al.
Evaluating Retrieval Quality In Retrieval-augmented Generation (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 79 citations
Alireza Salemi, Hamed Zamani
Fine Tuning Vs. Retrieval Augmented Generation For Less Popular Knowledge (2024) • Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 40 citations
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
Linrec: Linear Attention Mechanism For Long-term Sequential Recommender Systems (2024) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Liu et al.
Retrieval-augmented Egocentric Video Captioning (2024) • Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 77 citations
Xu et al.
Data-efficient Fine-tuning For Llm-based Recommendation (2024) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 80 citations
Lin et al.
When MOE Meets Llms: Parameter Efficient Fine-tuning For Multi-task Medical Applications (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 49 citations
Liu et al.
Large Language Models Can Accurately Predict Searcher Preferences (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 82 citations
Thomas et al.
One-shot Labeling For Automatic Relevance Estimation (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Sean MacAvaney, Luca Soldaini
Can Chatgpt Write A Good Boolean Query For Systematic Review Literature Search? (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 181 citations
Wang et al.
Where To Go Next For Recommender Systems? ID- Vs. Modality-based Recommender Models Revisited (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 137 citations
Yuan et al.
Perspectives On Large Language Models For Relevance Judgment (2023) • ICTIR '23: The 2023 ACM SIGIR International Conference on the Theory of Information Retrieval • 103 citations
Faggioli et al.
Large Language Models Are Versatile Decomposers: Decompose Evidence And Questions For Table-based Reasoning (2023) • SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 43 citations
Ye et al.
Graph Masked Autoencoder For Sequential Recommendation (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 48 citations
Yaowen Ye, Lianghao Xia, Chao Huang
Graphgpt: Graph Instruction Tuning For Large Language Models (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 96 citations
Tang et al.
How To Do Things With Deep Learning Code (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 41 citations
Minh Hua, Rita Raley
Retrieving Supporting Evidence For Llms Generated Answers (2023) • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region • 47 citations
Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke
Multi: Efficient Video-and-language Understanding With Text-guided Multiway-sampler And Multiple Choice Modeling (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 50 citations
Xu et al.
Augmenting Low-resource Text Classification With Graph-grounded Pre-training And Prompting (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Zhihao Wen, Yuan Fang
User-centric Conversational Recommendation With Multi-aspect User Modeling (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 52 citations
Li et al.
An Efficiency Study For SPLADE Models (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 58 citations
Carlos Lassance, Stéphane Clinchant
Hybrid Transformer With Multi-level Fusion For Multimodal Knowledge Graph Completion (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 152 citations
Chen et al.
GERE: Generative Evidence Retrieval For Fact Verification (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 53 citations
Chen et al.
Centerclip: Token Clustering For Efficient Text-video Retrieval (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 100 citations
Zhao et al.
Decoupled Side Information Fusion For Sequential Recommendation (2022) • Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 110 citations
Yueqi Xie, Peilin Zhou, Sunghun Kim
Augmenting Sequential Recommendation With Pseudo-prior Items Via Reversely Pre-training Transformer (2021) • SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 116 citations
Liu et al.
Learning To Warm Up Cold Item Embeddings For Cold-start Recommendation With Meta Scaling And Shifting Networks (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 116 citations
Zhu et al.
Empowering News Recommendation With Pre-trained Language Models (2021) • SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 122 citations
Wu et al.
Pseudo-relevance Feedback For Multiple Representation Dense Retrieval (2021) • Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval • 47 citations
Wang et al.
Learning Passage Impacts For Inverted Indexes (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 140 citations
Mallia et al.
Unified Conversational Recommendation Policy Learning Via Graph-based Reinforcement Learning (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 119 citations
Deng et al.
Counterfactual Explanations For Neural Recommenders (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Khanh Hiep Tran, Azin Ghazimatin, Rishiraj Saha Roy
One Chatbot Per Person: Creating Personalized Chatbots Based On Implicit User Profiles (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 55 citations
Ma et al.
LPF: A Language-prior Feedback Objective Function For De-biased Visual Question Answering (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Zujie Liang, Haifeng Hu, Jiaying Zhu
Multiplex Behavioral Relation Learning For Recommendation Via Memory Augmented Transformer Network (2021) • SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval • 129 citations
Xia et al.
Societal Biases In Retrieved Contents: Measurement Framework And Adversarial Mitigation For BERT Rankers (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 44 citations
Navid Rekabsaz, Simone Kopeinik, Markus Schedl
Parameter-efficient Transfer From Sequential Behaviors For User Modeling And Recommendation (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 150 citations
Yuan et al.
Fashionbert: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 114 citations
Gao et al.
How Useful Are Reviews For Recommendation? A Critical Review And Potential Improvements (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 54 citations
Noveen Sachdeva, Julian McAuley
Open-retrieval Conversational Question Answering (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 69 citations
Qu et al.
Self-supervised Contrastive Learning For Code Retrieval And Summarization Via Semantic-preserving Transformations (2020) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 89 citations
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
Query Resolution For Conversational Search With Limited Supervision (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 106 citations
Voskarides et al.
Analysing The Effect Of Clarifying Questions On Document Ranking In Conversational Search (2020) • Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval • 45 citations
Krasakis et al.
Expansion Via Prediction Of Importance With Contextualization (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 64 citations
MacAvaney et al.
Few-shot Generative Conversational Query Rewriting (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 124 citations
Yu et al.
BERT With History Answer Embedding For Conversational Question Answering (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 142 citations
Qu et al.
CEDR: Contextualized Embeddings For Document Ranking (2019) • SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 186 citations
MacAvaney et al.
Cross-modal Interaction Networks For Query-based Moment Retrieval In Videos (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 238 citations
Zhang et al.
Asking Clarifying Questions In Open-domain Information-seeking Conversations (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 175 citations
Aliannejadi et al.
Quantifying And Alleviating The Language Prior Problem In Visual Question Answering (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Guo et al.
Explainable Recommendation Via Multi-task Learning In Opinionated Text Data (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 199 citations
Wang et al.
Response Ranking With Deep Matching Networks And External Knowledge In Information-seeking Conversation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 153 citations
Yang et al.
Collaborative Memory Network For Recommendation Systems (2018) • The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval • 288 citations
Travis Ebesu, Bin Shen, Yi Fang
Neural Rating Regression With Abstractive Tips Generation For Recommendation (2017) • Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval • 288 citations
Li et al.
Neural Vector Spaces For Unsupervised Information Retrieval (2017) • ACM Transactions on Information Systems • 84 citations
Christophe van Gysel, Maarten de Rijke, Evangelos Kanoulas
Learning To Rank Question Answer Pairs With Holographic Dual LSTM Architecture (2017) • Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval • 63 citations
Tay et al.

SLT 12 papers

Gloss-free Sign Language Translation: Improving From Visual-language Pretraining (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 47 citations
Zhou et al.
Signbert+: Hand-model-aware Self-supervised Pre-training For Sign Language Understanding (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 89 citations
Hu et al.
A Simple Multi-modality Transfer Learning Baseline For Sign Language Translation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 120 citations
Chen et al.
Continuous 3D Multi-channel Sign Language Production Via Progressive Transformers And Mixture Density Networks (2021) • International Journal of Computer Vision • 58 citations
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Improving Sequence-to-sequence Pre-training Via Sequence Span Rewriting (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 201 citations
Zhou et al.
Progressive Transformers For End-to-end Sign Language Production (2020) • Lecture Notes in Computer Science • 84 citations
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020) • 2021 IEEE Spoken Language Technology Workshop (SLT) • 57 citations
Meng et al.
ODSQA: Open-domain Spoken Question Answering Dataset (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 42 citations
Lee et al.
Predicting Expressive Speaking Style From Text In End-to-end Speech Synthesis (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 112 citations
Daisy Stanton, Yuxuan Wang, Rj Skerry-Ryan
Dialog-context Aware End-to-end Speech Recognition (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 40 citations
Suyoun Kim, Florian Metze
User Modeling For Task Oriented Dialogues (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 45 citations
Gur et al.
Back-translation-style Data Augmentation For End-to-end ASR (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 95 citations
Hayashi et al.

Survey Paper 313 papers

A Survey On Latent Reasoning (2025) • No Venue
Zhu et al.
A Survey On Vision-language-action Models: An Action Tokenization Perspective (2025) • No Venue
Zhong et al.
The Landscape Of Agentic Reinforcement Learning For Llms: A Survey (2025) • No Venue
Zhang et al.
A Survey Of Reinforcement Learning For Large Reasoning Models (2025) • No Venue
Zhang et al.
Unified Multimodal Understanding And Generation Models: Advances, Challenges, And Opportunities (2025) • No Venue
Zhang et al.
What, How, Where, And How Well? A Survey On Test-time Scaling In Large Language Models (2025) • No Venue
Zhang et al.
Aligning Multimodal LLM With Human Preference: A Survey (2025) • No Venue
Yu et al.
Discrete Diffusion In Large Language And Multimodal Models: A Survey (2025) • No Venue
Runpeng Yu, Qi Li, Xinchao Wang
100 Days After Deepseek-r1: A Survey On Replication Studies And More Directions For Reasoning Language Models (2025) • No Venue
Zhang et al.
Multimodal Chain-of-thought Reasoning: A Comprehensive Survey (2025) • No Venue
Wang et al.
From AI For Science To Agentic Science: A Survey On Autonomous Scientific Discovery (2025) • No Venue
Wei et al.
3D Scene Generation: A Survey (2025) • No Venue
Wen et al.
Reinforcement Learning In Vision: A Survey (2025) • No Venue
Wu et al.
Towards Large Reasoning Models: A Survey Of Reinforced Reasoning With Large Language Models (2025) • No Venue
Xu et al.
Survey On Evaluation Of Llm-based Agents (2025) • No Venue
Yehudai et al.
Reasoning Language Models: A Blueprint (2025) • No Venue
Besta et al.
Reconstructing 4D Spatial Intelligence: A Survey (2025) • No Venue
Cao et al.
Self-improvement In Multimodal Large Language Models: A Survey (2025) • No Venue
Deng et al.
A Comprehensive Survey Of Self-evolving AI Agents: A New Paradigm Bridging Foundation Models And Lifelong Agentic Systems (2025) • No Venue
Fang et al.
Efficient Reasoning Models: A Survey (2025) • No Venue
Feng et al.
A Survey Of Self-evolving Agents: On Path To Artificial Super Intelligence (2025) • No Venue
Gao et al.
A Survey Of Vibe Coding With Large Language Models (2025) • No Venue
Ge et al.
A Survey Of Scientific Large Language Models: From Data Foundations To Agent Frontiers (2025) • No Venue
Hu et al.
Perception, Reason, Think, And Plan: A Survey On Large Multimodal Reasoning Models (2025) • No Venue
Li et al.
Towards Agentic RAG With Deep Reasoning: A Survey Of Rag-reasoning Systems In Llms (2025) • No Venue
Li et al.
A Survey On Diffusion Language Models (2025) • No Venue
Li et al.
Surveyx: Academic Survey Automation Via Large Language Models (2025) • No Venue
Liang et al.
Advances And Challenges In Foundation Agents: From Brain-inspired Intelligence To Evolutionary, Collaborative, And Safe Systems (2025) • No Venue
Liu et al.
A Comprehensive Survey On Long Context Language Modeling (2025) • No Venue
Liu et al.
Efficient Inference For Large Reasoning Models: A Survey (2025) • No Venue
Liu et al.
Logical Reasoning In Large Language Models: A Survey (2025) • No Venue
Liu et al.
Llm-powered GUI Agents In Phone Automation: Surveying Progress And Prospects (2025) • No Venue
Liu et al.
Researchbench: Benchmarking Llms In Scientific Discovery Via Inspiration-based Task Decomposition (2025) • No Venue
Liu et al.
Thus Spake Long-context Large Language Model (2025) • No Venue
Liu et al.
Large Language Model Agent: A Survey On Methodology, Applications And Challenges (2025) • No Venue
Luo et al.
LLM4SR: A Survey On Large Language Models For Scientific Research (2025) • No Venue
Luo et al.
Technologies On Effectiveness And Efficiency: A Survey Of State Spaces Models (2025) • No Venue
Lv et al.
A Survey Of Context Engineering For Large Language Models (2025) • No Venue
Mei et al.
Discrete Audio Tokens: More Than A Survey! (2025) • No Venue
Mousavi et al.
A Survey On Large Language Model Benchmarks (2025) • No Venue
Ni et al.
A Survey On Inference Engines For Large Language Models: Perspectives On Optimization And Efficiency (2025) • No Venue
Park et al.
AION-1: Omnimodal Foundation Model For Astronomical Sciences (2025) • No Venue
Parker et al.
A Survey Of Efficient Reasoning For Large Reasoning Models: Language, Multimodality, And Beyond (2025) • No Venue
Qu et al.
Agent Laboratory: Using LLM Agents As Research Assistants (2025) • No Venue
Schmidgall et al.
When Tokens Talk Too Much: A Survey Of Multimodal Long-context Token Compression Across Images, Videos, And Audios (2025) • No Venue
Shao et al.
Deep Research: A Systematic Survey (2025) • No Venue
Shi et al.
Towards Trustworthy GUI Agents: A Survey (2025) • No Venue
Shi et al.
Video-lmm Post-training: A Deep Dive Into Video Reasoning With Large Multimodal Models (2025) • No Venue
Tang et al.
Thinking With Images For Multimodal Reasoning: Foundations, Methods, And Future Frontiers (2025) • No Venue
Su et al.
Stop Overthinking: A Survey On Efficient Reasoning For Large Language Models (2025) • No Venue
Sui et al.
Inverse Reinforcement Learning Meets Large Language Model Post-training: Basics, Advances, And Opportunities (2025) • No Venue
Hao Sun, Mihaela van der Schaar
Speed Always Wins: A Survey On Efficient Architectures For Large Language Models (2025) • No Venue
Sun et al.
One Missing Piece In Vision And Language: A Survey On Comics Understanding (2024) • No Venue
Vivoli et al.
Artificial Intelligence For Literature Reviews: Opportunities And Challenges (2024) • Artificial Intelligence Review • 97 citations
Bolanos et al.
Survey On Large Language Model-enhanced Reinforcement Learning: Concept, Taxonomy, And Methods (2024) • IEEE Transactions on Neural Networks and Learning Systems • 50 citations
Cao et al.
Language-based Game Theory In The Age Of Artificial Intelligence (2024) • Journal of The Royal Society Interface • 66 citations
Capraro et al.
At The Dawn Of Generative AI Era: A Tutorial-cum-survey On New Frontiers In 6G Wireless Intelligence (2024) • IEEE Open Journal of the Communications Society • 57 citations
Abdulkadir Celik, Ahmed M. Eltawil
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (2024) • No Venue
Chen et al.
Exploring Large Language Model Based Intelligent Agents: Definitions, Methods, And Prospects (2024) • Internet Research • 214 citations
Cheng et al.
(A)I Am Not A Lawyer, But...: Engaging Legal Experts Towards Responsible LLM Policies For Legal Advice (2024) • FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency • 56 citations
Cheong et al.
Attention Heads Of Large Language Models: A Survey (2024) • No Venue
Zheng et al.
Bias And Unfairness In Information Retrieval Systems: New Challenges In The LLM Era (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 59 citations
Dai et al.
Security And Privacy Challenges Of Large Language Models: A Survey (2024) • ACM Computing Surveys • 104 citations
Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
A Review Of Modern Recommender Systems Using Generative Models (gen-recsys) (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 62 citations
Deldjoo et al.
A Scoping Review Of Chatgpt Research In Accounting And Finance (2024) • International Journal of Accounting Information Systems • 41 citations
Mengming Michael Dong, Theophanis C. Stratopoulos, Victor Xiaoqi Wang
Utilizing Local Hierarchy With Adversarial Training For Hierarchical Text Classification (2024) • ACM Computing Surveys • 58 citations
Zihan Wang, Peiyi Wang, Houfeng Wang
Deep Learning For Cross-domain Data Fusion In Urban Computing: Taxonomy, Advances, And Outlook (2024) • Information Fusion • 53 citations
Zou et al.
A Framework For Human Evaluation Of Large Language Models In Healthcare Derived From Literature Review (2024) • npj Digital Medicine • 131 citations
Tam et al.
Knowledge Mechanisms In Large Language Models: A Survey And Perspective (2024) • No Venue
Wang et al.
Large Language Models For Data Annotation And Synthesis: A Survey (2024) • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing • 78 citations
Tan et al.
Explainable Generative AI (genxai): A Survey, Conceptualization, And Research Agenda (2024) • Artificial Intelligence Review • 61 citations
Johannes Schneider
The Prompt Report: A Systematic Survey Of Prompting Techniques (2024) • No Venue
Schulhoff et al.
Generative Artificial Intelligence: A Systematic Review And Applications (2024) • Multimedia Tools and Applications • 189 citations
Sengar et al.
Abstractive Text Summarization: State Of The Art, Challenges, And Improvements (2024) • Neurocomputing • 43 citations
Hassan Shakil, Ahmad Farooq, Jugal Kalita
To Cot Or Not To Cot? Chain-of-thought Helps Mainly On Math And Symbolic Reasoning (2024) • No Venue
Sprague et al.
Lightweight Deep Learning For Resource-constrained Environments: A Survey (2024) • ACM Computing Surveys • 100 citations
Liu et al.
Mm-llms: Recent Advances In Multimodal Large Language Models (2024) • No Venue
Zhang et al.
Personalization Of Large Language Models: A Survey (2024) • No Venue
Zhang et al.
On Llms-driven Synthetic Data Generation, Curation, And Evaluation: A Survey (2024) • Findings of the Association for Computational Linguistics ACL 2024 • 45 citations
Long et al.
Retrieval-augmented Generation For Ai-generated Content: A Survey (2024) • Arxiv • 72 citations
Zhao et al.
A Survey To Recent Progress Towards Understanding In-context Learning (2024) • Frontiers of Computer Science • 40 citations
Mao et al.
Foundation Models For Music: A Survey (2024) • No Venue
Ma et al.
Preference Tuning With Human Feedback On Language, Speech, And Vision Tasks: A Survey (2024) • No Venue
Winata et al.
GUI Agents: A Survey (2024) • No Venue
Nguyen et al.
A Survey Of Small Language Models (2024) • No Venue
Nguyen et al.
Survey Of Cultural Awareness In Language Models: Text And Beyond (2024) • No Venue
Pawar et al.
Large Language Models Meet NLP: A Survey (2024) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Qin et al.
A Review Of Large Language Models And Autonomous Agents In Chemistry (2024) • Chemical Science • 79 citations
Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White
Agent AI: Surveying The Horizons Of Multimodal Interaction (2024) • Arxiv • 40 citations
Durante et al.
A Survey On RAG Meeting Llms: Towards Retrieval-augmented Large Language Models (2024) • KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 281 citations
Fan et al.
Mme-survey: A Comprehensive Survey On Evaluation Of Multimodal Llms (2024) • No Venue
Fu et al.
Large Language Models And Games: A Survey And Roadmap (2024) • IEEE Transactions on Games • 57 citations
Gallotta et al.
Towards A Unified View Of Preference Learning For Large Language Models: A Survey (2024) • No Venue
Gao et al.
Generative AI For Visualization: State Of The Art And Future Directions (2024) • Visual Informatics • 68 citations
Ye et al.
Is It Really Long Context If All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP (2024) • No Venue
Goldman et al.
Mapping The Ethics Of Generative AI: A Comprehensive Scoping Review (2024) • Minds and Machines • 63 citations
Thilo Hagendorff
Parameter-efficient Fine-tuning For Large Models: A Comprehensive Survey (2024) • Arxiv • 81 citations
Han et al.
A Survey Of Recent Methods For Addressing AI Fairness And Bias In Biomedicine (2024) • Journal of Biomedical Informatics • 44 citations
Yang et al.
The Effects Of Generative AI On Computing Students' Help-seeking Preferences (2024) • Proceedings of the 26th Australasian Computing Education Conference • 61 citations
Hou et al.
Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey (2024) • Information Fusion • 111 citations
Hamza Kheddar, Mustapha Hemis, Yassine Himeur
A Survey On Integration Of Large Language Models With Intelligent Robots (2024) • Intelligent Service Robotics • 46 citations
Kim et al.
From Generation To Judgment: Opportunities And Challenges Of Llm-as-a-judge (2024) • No Venue
Li et al.
A Survey On The Honesty Of Large Language Models (2024) • No Venue
Li et al.
AI For Social Science And Social Science Of AI: A Survey (2024) • Information Processing & Management • 71 citations
Xu et al.
Document Parsing Unveiled: Techniques, Challenges, And Prospects For Structured Information Extraction (2024) • No Venue
Zhang et al.
Controllable Text Generation For Large Language Models: A Survey (2024) • No Venue
Liang et al.
Internal Consistency And Self-feedback In Large Language Models: A Survey (2024) • No Venue
Liang et al.
Foundation Models For Time Series Analysis: A Tutorial And Survey (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 100 citations
Liang et al.
Toward Verifiable And Reproducible Human Evaluation For Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Otani et al.
A Survey On Few-shot Class-incremental Learning (2023) • Neural Networks • 131 citations
Tian et al.
Cultural Bias And Cultural Alignment Of Large Language Models (2023) • PNAS Nexus • 113 citations
Tao et al.
Automatically Correcting Large Language Models: Surveying The Landscape Of Diverse Self-correction Strategies (2023) • Transactions of the Association for Computational Linguistics • 42 citations
Pan et al.
Unifying Large Language Models And Knowledge Graphs: A Roadmap (2023) • IEEE Transactions on Knowledge and Data Engineering • 578 citations
Pan et al.
AI Deception: A Survey Of Examples, Risks, And Potential Solutions (2023) • Patterns • 88 citations
Park et al.
Enabling Resource-efficient Aiot System With Cross-level Optimization: A Survey (2023) • IEEE Communications Surveys & Tutorials • 44 citations
Liu et al.
Trustworthy Llms: A Survey And Guideline For Evaluating Large Language Models' Alignment (2023) • No Venue
Liu et al.
Pre-train, Prompt And Recommendation: A Comprehensive Survey Of Language Modelling Paradigm Adaptations In Recommender Systems (2023) • Transactions of the Association for Computational Linguistics • 65 citations
Peng Liu, Lemei Zhang, Jon Atle Gulla
Summary Of Chatgpt-related Research And Perspective Towards The Future Of Large Language Models (2023) • Meta-Radiology • 582 citations
Liu et al.
Recommender Systems In The Era Of Large Language Models (llms) (2023) • IEEE Transactions on Knowledge and Data Engineering • 183 citations
Zhao et al.
Loss Functions And Metrics In Deep Learning (2023) • Artificial Intelligence Review • 40 citations
Terven et al.
Cognitive Architectures For Language Agents (2023) • Arxiv • 53 citations
Sumers et al.
The Rise And Potential Of Large Language Model Based Agents: A Survey (2023) • Science China Information Sciences • 183 citations
Xi et al.
A Survey On Model Compression For Large Language Models (2023) • Transactions of the Association for Computational Linguistics • 85 citations
Zhu et al.
Decoding Chatgpt: A Taxonomy Of Existing Research, Current Challenges, And Possible Future Directions (2023) • Journal of King Saud University - Computer and Information Sciences • 122 citations
Sohail et al.
Explainability For Large Language Models: A Survey (2023) • ACM Transactions on Intelligent Systems and Technology • 317 citations
Zhao et al.
A Survey Of Graph Prompting Methods: Techniques, Applications, And Challenges (2023) • World Wide Web • 199 citations
Wu et al.
A Survey On Semantic Processing Techniques (2023) • Information Fusion • 44 citations
Mao et al.
A Comprehensive Overview Of Large Language Models (2023) • ACM Transactions on Intelligent Systems and Technology • 152 citations
Naveed et al.
Large Language Models In Healthcare And Medical Domain: A Review (2023) • Informatics • 192 citations
Zabir Al Nazi, Wei Peng
Artificial Intelligence Index Report 2023 (2023) • Arxiv • 186 citations
Maslej et al.
From Google Gemini To Openai Q* (q-star): A Survey Of Reshaping The Generative Artificial Intelligence (AI) Research Landscape (2023) • Arxiv • 59 citations
McIntosh et al.
On The Design Of Ai-powered Code Assistants For Notebooks (2023) • Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems • 68 citations
McNutt et al.
Ai-generated Content (AIGC): A Survey (2023) • Arxiv • 81 citations
Wu et al.
A Survey On Multimodal Large Language Models (2023) • National Science Review • 271 citations
Yin et al.
Can Knowledge Graphs Reduce Hallucinations In Llms? : A Survey (2023) • Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) • 49 citations
Agrawal et al.
Large Language Models Are Latent Variable Models: Explaining And Finding Good Demonstrations For In-context Learning (2023) • Machine Intelligence Research • 152 citations
Wang et al.
Information Retrieval Meets Large Language Models: A Strategic Report From Chinese IR Community (2023) • AI Open • 51 citations
Ai et al.
Foundational Models Defining A New Era In Vision: A Survey And Outlook (2023) • Arxiv • 66 citations
Awais et al.
Advancements In Generative AI: A Comprehensive Review Of Gans, GPT, Autoencoders, Diffusion Model, And Transformers (2023) • IEEE Access • 185 citations
Bengesi et al.
A Comprehensive Survey Of Ai-generated Content (AIGC): A History Of Generative AI From GAN To Chatgpt (2023) • Arxiv • 385 citations
Cao et al.
Open Problems And Fundamental Limitations Of Reinforcement Learning From Human Feedback (2023) • No Venue
Casper et al.
A Survey On Evaluation Of Large Language Models (2023) • No Venue
Chang et al.
Students' Voices On Generative AI: Perceptions, Benefits, And Challenges In Higher Education (2023) • International Journal of Educational Technology in Higher Education • 992 citations
Cecilia Ka Yuk Chan, Wenjie Hu
The AI Generation Gap: Are Gen Z Students More Interested In Adopting Generative AI Such As Chatgpt In Teaching And Learning Than Their Gen X And Millennial Generation Teachers? (2023) • Smart Learning Environments • 354 citations
Cecilia Ka Yuk Chan, Katherine K. W. Lee
Language Model Behavior: A Comprehensive Survey (2023) • Computational Linguistics • 53 citations
Tyler A. Chang, Benjamin K. Bergen
Unleashing The Potential Of Prompt Engineering For Large Language Models (2023) • Patterns • 86 citations
Chen et al.
Review Of Large Vision Models And Visual Prompt Engineering (2023) • Meta-Radiology • 157 citations
Wang et al.
A Survey On Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, And Recommendations (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 203 citations
Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi
A Survey On Zero Pronoun Translation (2023) • IEEE Open Journal of the Computer Society • 240 citations
Wang et al.
A Survey Of Techniques For Optimizing Transformer Inference (2023) • Journal of Systems Architecture • 90 citations
Chitty-Venkata et al.
Software Testing With Large Language Models: Survey, Landscape, And Vision (2023) • IEEE Transactions on Software Engineering • 235 citations
Wang et al.
A Survey On Multimodal Large Language Models For Autonomous Driving (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) • 217 citations
Cui et al.
The State Of Human-centered NLP Technology For Fact-checking (2023) • Information Processing & Management • 55 citations
Das et al.
A Comprehensive Survey On Multimodal Recommender Systems: Taxonomy, Evaluation, And Future Directions (2023) • Arxiv • 149 citations
Zhou et al.
A Bibliometric Review Of Large Language Models Research From 2017 To 2023 (2023) • ACM Transactions on Intelligent Systems and Technology • 95 citations
Fan et al.
Large Language Models For Software Engineering: Survey And Open Problems (2023) • 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) • 244 citations
Fan et al.
Generative Pre-trained Transformer: A Comprehensive Review On Enabling Technologies, Potential Applications, Emerging Challenges, And Future Directions (2023) • IEEE Access • 375 citations
Yenduri et al.
A Review Of The Trends And Challenges In Adopting Natural Language Processing Methods For Education Feedback Analysis (2023) • IEEE Access • 207 citations
Shaik et al.
A Survey Of Multimodal Information Fusion For Smart Healthcare: Mapping The Journey From Data To Wisdom (2023) • Information Fusion • 109 citations
Shaik et al.
Foundation Models In Robotics: Applications, Challenges, And The Future (2023) • The International Journal of Robotics Research • 89 citations
Firoozi et al.
Bias And Fairness In Large Language Models: A Survey (2023) • Computational Linguistics • 255 citations
Gallegos et al.
Large Language Models On Wikipedia-style Survey Generation: An Evaluation In NLP Concepts (2023) • Humanities and Social Sciences Communications • 97 citations
Gao et al.
The Science Of Detecting LLM-generated Texts (2023) • Communications of the ACM • 81 citations
Ruixiang Tang, Yu-Neng Chuang, Xia Hu
ChatGPT Is Not All You Need. A State Of The Art Review Of Large Generative AI Models (2023) • Arxiv • 238 citations
Roberto Gozalo-Brizuela, Eduardo C. Garrido-Merchan
A Systematic Survey Of Prompt Engineering On Vision-language Foundation Models (2023) • Arxiv • 61 citations
Gu et al.
A Survey On Large Language Model (LLM) Security And Privacy: The Good, The Bad, And The Ugly (2023) • High-Confidence Computing • 594 citations
Yao et al.
A Complete Survey On Generative AI (AIGC): Is ChatGPT From GPT-4 To GPT-5 All You Need? (2023) • Arxiv • 101 citations
Zhang et al.
Seeing ChatGPT Through Students' Eyes: An Analysis Of TikTok Data (2023) • 2023 Big Data Meets Survey Science (BigSurv) • 43 citations
Haensch et al.
A Comprehensive Survey On Segment Anything Model For Vision And Beyond (2023) • Arxiv • 45 citations
Zhang et al.
A Survey On Uncertainty Quantification Methods For Deep Learning (2023) • Arxiv • 50 citations
He et al.
Foundation Models And Fair Use (2023) • SSRN Electronic Journal • 64 citations
Henderson et al.
From Task Structures To World Models: What Do LLMs Know? (2023) • Trends in Cognitive Sciences • 43 citations
Ilker Yildirim, L. A. Paul
Harnessing The Power Of LLMs In Practice: A Survey On ChatGPT And Beyond (2023) • ACM Transactions on Knowledge Discovery from Data • 303 citations
Yang et al.
A Survey On Automated Program Repair Techniques (2023) • Artificial Intelligence Review • 62 citations
Huang et al.
Perception, Performance, And Detectability Of Conversational Artificial Intelligence Across 32 University Courses (2023) • Scientific Reports • 105 citations
Ibrahim et al.
Designing Participatory AI: Creative Professionals' Worries And Expectations About Generative AI (2023) • Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems • 96 citations
Nanna Inie, Jeanette Falk, Steven Tanimoto
From Image To Language: A Critical Analysis Of Visual Question Answering (VQA) Approaches, Challenges, And Opportunities (2023) • Information Fusion • 58 citations
Ishmam et al.
A Comprehensive Survey On Applications Of Transformers For Deep Learning Tasks (2023) • Expert Systems with Applications • 259 citations
Islam et al.
AI Alignment: A Comprehensive Survey (2023) • Arxiv • 60 citations
Ji et al.
Practical And Ethical Challenges Of Large Language Models In Education: A Systematic Scoping Review (2023) • British Journal of Educational Technology • 423 citations
Yan et al.
Large Models For Time Series And Spatio-temporal Data: A Survey And Outlook (2023) • IEEE Transactions on Knowledge and Data Engineering • 54 citations
Jin et al.
End-to-end Speech Recognition: A Survey (2023) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 131 citations
Prabhavalkar et al.
Scaling Down To Scale Up: A Guide To Parameter-efficient Fine-tuning (2023) • Arxiv • 66 citations
Lialin et al.
The Robots Are Here: Navigating The Generative AI Revolution In Computing Education (2023) • Proceedings of the 2023 Working Group Reports on Innovation and Technology in Computer Science Education • 235 citations
Prather et al.
Opening Up ChatGPT: Tracking Openness, Transparency, And Accountability In Instruction-tuned Text Generators (2023) • CUI '23: ACM conference on Conversational User Interfaces • 64 citations
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
How Can Recommender Systems Benefit From Large Language Models: A Survey (2023) • ACM Transactions on Information Systems • 85 citations
Lin et al.
ChatGPT As An Attack Tool: Stealthy Textual Backdoor Attack Via Blackbox Generative Model Trigger (2023) • Reliability Engineering & System Safety • 75 citations
Li et al.
Diffusion Models For Non-autoregressive Text Generation: A Survey (2023) • Arxiv • 41 citations
Li et al.
Large Multimodal Models: Notes On CVPR 2023 Tutorial (2023) • ICAIF '23: 4th ACM International Conference on AI in Finance • 157 citations
Chunyuan Li
Multimodal Foundation Models: From Specialists To General-purpose Assistants (2023) • No Venue
Li et al.
Towards Tracing Code Provenance With Code Watermarking (2023) • IEEE Internet of Things Journal • 62 citations
Li et al.
Vision-language Models In Remote Sensing: Current Progress And Future Trends (2023) • IEEE Geoscience and Remote Sensing Magazine • 80 citations
Li et al.
Large Language Models For Generative Information Extraction: A Survey (2023) • Frontiers of Computer Science • 119 citations
Xu et al.
A Survey Of GPT-3 Family Large Language Models Including ChatGPT And GPT-4 (2023) • Natural Language Processing Journal • 201 citations
Katikapalli Subramanyam Kalyan
A Survey Of Learning-based Automated Program Repair (2023) • ACM Transactions on Software Engineering and Methodology • 77 citations
Zhang et al.
Sasha: Creative Goal-oriented Reasoning In Smart Homes With Large Language Models (2023) • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies • 44 citations
King et al.
The Troubling Emergence Of Hallucination In Large Language Models -- An Extensive Definition, Quantification, And Prescriptive Remediations (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 56 citations
Rawte et al.
Siren's Song In The AI Ocean: A Survey On Hallucination In Large Language Models (2023) • Computational Linguistics • 55 citations
Zhang et al.
Vision Language Models In Autonomous Driving: A Survey And Outlook (2023) • IEEE Transactions on Intelligent Vehicles • 46 citations
Zhou et al.
Text-visual Prompting For Efficient 2D Temporal Video Grounding (2023) • Arxiv • 73 citations
Zhang et al.
Vision-language Models For Vision Tasks: A Survey (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 403 citations
Zhang et al.
Large Language Models In Law: A Survey (2023) • AI Open • 53 citations
Lai et al.
Unifying The Perspectives Of NLP And Software Engineering: A Survey On Language Models For Code (2023) • No Venue
Zhang et al.
Opportunities And Challenges For ChatGPT And Large Language Models In Biomedicine And Health (2023) • Briefings in Bioinformatics • 233 citations
Tian et al.
Reasoning With Language Model Prompting: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 71 citations
Qiao et al.
A Survey On Retrieval-augmented Text Generation (2022) • Arxiv • 70 citations
Li et al.
Transformers In Time Series: A Survey (2022) • Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23} • 557 citations
Wen et al.
Towards Attribute-entangled Controllable Text Generation: A Pilot Study Of Blessing Generation (2022) • Findings of the Association for Computational Linguistics: ACL 2023 • 199 citations
Huang et al.
A Systematic Review And Replicability Study Of BERT4Rec For Sequential Recommendation (2022) • Proceedings of the 16th ACM Conference on Recommender Systems • 40 citations
Aleksandr Petrov, Craig MacDonald
Diffusion Models: A Comprehensive Survey Of Methods And Applications (2022) • Arxiv • 147 citations
Yang et al.
Artificial Intelligence For The Metaverse: A Survey (2022) • Engineering Applications of Artificial Intelligence • 601 citations
Huynh-The et al.
Deep Unsupervised Domain Adaptation: A Review Of Recent Advances And Perspectives (2022) • APSIPA Transactions on Signal and Information Processing • 225 citations
Liu et al.
Survey Of Hallucination In Natural Language Generation (2022) • ACM Computing Surveys • 2334 citations
Ji et al.
Automatic Text Summarization Methods: A Comprehensive Review (2022) • SN Computer Science • 68 citations
Divakar Yadav, Jalpa Desai, Arun Kumar Yadav
An Empirical Survey On Long Document Summarization: Datasets, Models And Metrics (2022) • ACM Computing Surveys • 69 citations
Koh et al.
Empathetic Conversational Systems: A Review Of Current Advances, Gaps, And Opportunities (2022) • IEEE Transactions on Affective Computing • 44 citations
Aravind Sesagiri Raamkumar, Yinping Yang
The Debate Over Understanding In AI's Large Language Models (2022) • Proceedings of the National Academy of Sciences • 188 citations
Melanie Mitchell, David C. Krakauer
Language Models As Agent Models (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 44 citations
Jacob Andreas
A Survey On Generative Diffusion Model (2022) • Arxiv • 66 citations
Cao et al.
Dense Text Retrieval Based On Pretrained Language Models: A Survey (2022) • ACM Transactions on Information Systems • 85 citations
Zhao et al.
UX Research On Conversational Human-AI Interaction: A Literature Review Of The ACM Digital Library (2022) • CHI Conference on Human Factors in Computing Systems • 70 citations
Zheng et al.
Large Language Models Meet NL2Code: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Zan et al.
A Survey On Legal Judgment Prediction: Datasets, Metrics, Models And Challenges (2022) • IEEE Access • 46 citations
Cui et al.
On The Explainability Of Natural Language Processing Deep Models (2022) • ACM Computing Surveys • 89 citations
Julia El Zini, Mariette Awad
Robust Natural Language Processing: Recent Advances, Challenges, And Future Directions (2022) • IEEE Access • 63 citations
Omar et al.
Theories Of "Gender" In NLP Bias Research (2022) • 2022 ACM Conference on Fairness Accountability and Transparency • 41 citations
Hannah Devinney, Jenny Björklund, Henrik Björklund
A Survey On In-context Learning (2022) • Arxiv • 240 citations
Dong et al.
Shortcut Learning Of Large Language Models In Natural Language Understanding (2022) • Communications of the ACM • 44 citations
Du et al.
A Survey Of Vision-language Pre-trained Models (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 111 citations
Du et al.
AI And 6G Into The Metaverse: Fundamentals, Challenges And Future Research Trends (2022) • IEEE Open Journal of the Communications Society • 122 citations
Zawish et al.
Deep Learning-aided 6G Wireless Networks: A Comprehensive Survey Of Revolutionary PHY Architectures (2022) • IEEE Open Journal of the Communications Society • 96 citations
Ozpoyraz et al.
Transformers In Medical Imaging: A Survey (2022) • Medical Image Analysis • 773 citations
Shamshad et al.
A Comprehensive Survey Of Few-shot Learning: Evolution, Applications, Challenges, And Opportunities (2022) • ACM Computing Surveys • 448 citations
Song et al.
How To Keep Text Private? A Systematic Review Of Deep Learning Methods For Privacy-preserving Natural Language Processing (2022) • Artificial Intelligence Review • 57 citations
Samuel Sousa, Roman Kern
Vision-and-language Navigation: A Survey Of Tasks, Methods, And Future Directions (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 62 citations
Gu et al.
Artificial Intelligence In Government: Concepts, Standards, And A Unified Framework (2022) • Government Information Quarterly • 48 citations
Straub et al.
A Survey Of Controllable Text Generation Using Transformer-based Pre-trained Language Models (2022) • ACM Computing Surveys • 153 citations
Zhang et al.
TRUE: Re-evaluating Factual Consistency Evaluation (2022) • Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering • 47 citations
Honovich et al.
Pre-trained Language Model For Web-scale Retrieval In Baidu Search (2021) • ACM Computing Surveys • 2351 citations
Liu et al.
A Primer On Contrastive Pretraining In Language Processing: Methods, Lessons Learned And Perspectives (2021) • ACM Computing Surveys • 55 citations
Nils Rethmeier, Isabelle Augenstein
Federated Learning Meets Natural Language Processing: A Survey (2021) • Arxiv • 43 citations
Liu et al.
Paradigm Shift In Natural Language Processing (2021) • Machine Intelligence Research • 80 citations
Sun et al.
The Factual Inconsistency Problem In Abstractive Text Summarization: A Survey (2021) • Arxiv • 66 citations
Huang et al.
Retrieving And Reading: A Comprehensive Survey On Open-domain Question Answering (2021) • Arxiv • 152 citations
Zhu et al.
A Survey On Spoken Language Understanding: Recent Advances And New Frontiers (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 68 citations
Qin et al.
A Survey Of Visual Transformers (2021) • IEEE Transactions on Neural Networks and Learning Systems • 285 citations
Liu et al.
Eliciting And Analysing Users' Envisioned Dialogues With Perfect Voice Assistants (2021) • Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems • 64 citations
Völkel et al.
Automated Fact-checking For Assisting Human Fact-checkers (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 144 citations
Nakov et al.
Software-based Dialogue Systems: Survey, Taxonomy And Challenges (2021) • ACM Computing Surveys • 45 citations
Quim Motger, Xavier Franch, Jordi Marco
Pre-training For Low Resource Speech-to-intent Applications (2021) • ACM Computing Surveys • 105 citations
Pu Wang, Hugo van Hamme
A Survey On Accuracy-oriented Neural Recommendation: From Collaborative Filtering To Information-rich Recommendation (2021) • IEEE Transactions on Knowledge and Data Engineering • 368 citations
Wu et al.
Recent Advances In Natural Language Processing Via Large Pre-trained Language Models: A Survey (2021) • ACM Computing Surveys • 812 citations
Min et al.
Topic Modelling Meets Deep Neural Networks: A Survey (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 108 citations
Zhao et al.
An Empirical Survey Of The Effectiveness Of Debiasing Techniques For Pre-trained Language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 45 citations
Nicholas Meade, Elinor Poole-Dayan, Siva Reddy
A Short Survey Of Pre-trained Language Models For Conversational AI-A New Age In NLP (2021) • ACSW '20: Australasian Computer Science Week 2020 • 48 citations
Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang
Deepfakes Generation And Detection: State-of-the-art, Open Challenges, Countermeasures, And Way Forward (2021) • Applied Intelligence • 326 citations
Masood et al.
Data Augmentation Approaches In Natural Language Processing: A Survey (2021) • AI Open • 290 citations
Bohan Li, Yutai Hou, Wanxiang Che
Harms Of Gender Exclusivity And Challenges In Non-binary Representation In Language Technologies (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 106 citations
Dev et al.
The NLP Cookbook: Modern Recipes For Transformer Based Deep Learning Architectures (2021) • IEEE Access • 121 citations
Sushant Singh, Ausif Mahmood
Pretrained Language Models For Text Generation: A Survey (2021) • Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21} • 88 citations
Li et al.
A Neural Network Solves, Explains, And Generates University Math Problems By Program Synthesis And Few-shot Learning At Human Level (2021) • Proceedings of the National Academy of Sciences • 70 citations
Drori et al.
A Survey Of Data Augmentation Approaches For NLP (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 127 citations
Feng et al.
AMMU : A Survey Of Transformer-based Biomedical Pretrained Language Models (2021) • Journal of Biomedical Informatics • 212 citations
Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
Gender Bias In Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 74 citations
Savoldi et al.
Pre-trained Models: Past, Present And Future (2021) • AI Open • 700 citations
Han et al.
Societal Biases In Language Generation: Progress And Challenges (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 102 citations
Sheng et al.
A Practical Survey On Faster And Lighter Transformers (2021) • ACM Computing Surveys • 75 citations
Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise
Adversarial Text-to-image Synthesis: A Review (2021) • Neural Networks • 193 citations
Frolov et al.
Transformers In Vision: A Survey (2021) • ACM Computing Surveys • 2262 citations
Khan et al.
Deep Transfer Learning & Beyond: Transformer Language Models In Information Systems Research (2021) • ACM Computing Surveys • 42 citations
Ross Gruetzemacher, David Paradice
A Survey On Visual Transformer (2020) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 3049 citations
Han et al.
Meta-CoTGAN: A Meta Cooperative Training Paradigm For Improving Adversarial Text Generation (2020) • Arxiv • 62 citations
Yin et al.
Which *BERT? A Survey Organizing Contextualized Encoders (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 44 citations
Patrick Xia, Shijie Wu, Benjamin van Durme
Machine Reading Comprehension: The Role Of Contextualized Language Models And Beyond (2020) • Arxiv • 48 citations
Zhuosheng Zhang, Hai Zhao, Rui Wang
Multi-task Learning For Natural Language Processing In The 2020s: Where Are We Going? (2020) • Pattern Recognition Letters • 76 citations
Joseph Worsham, Jugal Kalita
News Recommender System: A Review Of Recent Progress, Challenges, And Opportunities (2020) • Artificial Intelligence Review • 138 citations
Shaina Raza, Chen Ding
Compressing Large-scale Transformer-based Models: A Case Study On BERT (2020) • Transactions of the Association for Computational Linguistics • 102 citations
Ganesh et al.
Confronting Abusive Language Online: A Survey From The Ethical And Human Rights Perspective (2020) • Journal of Artificial Intelligence Research • 53 citations
Svetlana Kiritchenko, Isar Nejadgholi, Kathleen C. Fraser
How Useful Are Reviews For Recommendation? A Critical Review And Potential Improvements (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 54 citations
Noveen Sachdeva, Julian McAuley
Empowering Things With Intelligence: A Survey Of The Progress, Challenges, And Opportunities In Artificial Intelligence Of Things (2020) • IEEE Internet of Things Journal • 571 citations
Jing Zhang, Dacheng Tao
Efficient Transformers: A Survey (2020) • ACM Computing Surveys • 532 citations
Tay et al.
Referring Expression Comprehension: A Survey Of Methods And Datasets (2020) • IEEE Transactions on Multimedia • 81 citations
Yanyuan Qiao, Chaorui Deng, Qi Wu
A Survey On Conversational Recommender Systems (2020) • ACM Computing Surveys • 313 citations
Jannach et al.
Survey On Deep Multi-modal Data Analytics: Collaboration, Rivalry And Fusion (2020) • ACM Transactions on Multimedia Computing, Communications, and Applications • 99 citations
Yang Wang
A Primer In BERTology: What We Know About How BERT Works (2020) • Transactions of the Association for Computational Linguistics • 146 citations
Anna Rogers, Olga Kovaleva, Anna Rumshisky
Language (technology) Is Power: A Critical Survey Of "Bias" In NLP (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 68 citations
Blodgett et al.
The Cost Of Training NLP Models: A Concise Overview (2020) • Arxiv • 114 citations
Or Sharir, Barak Peleg, Yoav Shoham
Pre-trained Models For Natural Language Processing: A Survey (2020) • Science China Technological Sciences • 1389 citations
Qiu et al.
Artificial Intelligence In The Battle Against Coronavirus (COVID-19): A Survey And Future Research Directions (2020) • Arxiv • 165 citations
Nguyen et al.
A Survey On Machine Reading Comprehension: Tasks, Evaluation Metrics And Benchmark Datasets (2020) • Applied Sciences • 61 citations
Zeng et al.
Artificial Intelligence (AI) In Action: Addressing The COVID-19 Pandemic With Natural Language Processing (NLP) (2020) • Annual Review of Biomedical Data Science • 56 citations
Chen et al.
Logical Natural Language Generation From Open-domain Tables (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 114 citations
Chen et al.
Deep Learning For Source Code Modeling And Generation: Models, Applications And Challenges (2020) • ACM Computing Surveys • 108 citations
Triet H. M. Le, Hao Chen, M. Ali Babar
What The [MASK]? Making Sense Of Language-specific BERT Models (2020) • Arxiv • 83 citations
Debora Nozza, Federico Bianchi, Dirk Hovy
Directions In Abusive Language Training Data: Garbage In, Garbage Out (2020) • PLOS ONE • 132 citations
Bertie Vidgen, Leon Derczynski
A Survey Of The State Of Explainable AI For Natural Language Processing (2020) • Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing • 159 citations
Danilevsky et al.
A Survey Of Knowledge-enhanced Text Generation (2020) • ACM Computing Surveys • 220 citations
Yu et al.
A Survey Of Active Learning For Text Classification Using Deep Neural Networks (2020) • Arxiv • 60 citations
Christopher Schröder, Andreas Niekler
The Evolution Of Argumentation Mining: From Models To Social Media And Emerging Tools (2019) • Information Processing & Management • 50 citations
Lytos et al.
Challenges In Building Intelligent Open-domain Dialog Systems (2019) • Arxiv • 44 citations
Minlie Huang, Xiaoyan Zhu, Jianfeng Gao
Deep Reinforcement Learning For Sequence To Sequence Models (2018) • IEEE Transactions on Neural Networks and Learning Systems • 41 citations
Keneshloo et al.
Video Description: A Survey Of Methods, Datasets And Evaluation Metrics (2018) • ACM Computing Surveys • 138 citations
Aafaq et al.
Natural Language Processing For EHR-based Computational Phenotyping (2018) • IEEE/ACM Transactions on Computational Biology and Bioinformatics • 198 citations
Zeng et al.
End-to-end Content And Plan Selection For Data-to-text Generation (2018) • Proceedings of the 11th International Conference on Natural Language Generation • 68 citations
Gehrmann et al.
Neural Abstractive Text Summarization With Sequence-to-sequence Models (2018) • Arxiv • 68 citations
Shi et al.
A Survey Of Cross-lingual Word Embedding Models (2017) • JAIR 65 (2019) 569-631 • 544 citations
Sebastian Ruder, Ivan Vulić, Anders Søgaard
Survey Of The State Of The Art In Natural Language Generation: Core Tasks, Applications And Evaluation (2017) • Journal of Artificial Intelligence Research • 347 citations
Albert Gatt, Emiel Krahmer
Deep Learning Based Recommender System: A Survey And New Perspectives (2017) • Arxiv • 655 citations
Zhang et al.
Detection And Resolution Of Rumours In Social Media: A Survey (2017) • ACM Computing Surveys • 728 citations
Zubiaga et al.
When Will AI Exceed Human Performance? Evidence From AI Experts (2017) • Arxiv • 221 citations
Grace et al.
Automated Crowdturfing Attacks And Defenses In Online Review Systems (2017) • Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security • 133 citations
Yao et al.
Visual Question Answering: A Survey Of Methods And Datasets (2016) • Arxiv • 44 citations
Wu et al.
Generative Deep Neural Networks For Dialogue: A Short Review (2016) • Arxiv • 67 citations
Serban et al.

— T —

TACL 31 papers

Automatically Correcting Large Language Models: Surveying The Landscape Of Diverse Self-correction Strategies (2023) • Transactions of the Association for Computational Linguistics • 42 citations
Pan et al.
Hallucinations In Large Multilingual Translation Models (2023) • Transactions of the Association for Computational Linguistics • 77 citations
Guerreiro et al.
Benchmarking Large Language Models For News Summarization (2023) • Transactions of the Association for Computational Linguistics • 203 citations
Zhang et al.
In-context Retrieval-augmented Language Models (2023) • Transactions of the Association for Computational Linguistics • 201 citations
Ram et al.
Why Does Surprisal From Larger Transformer-based Language Models Provide A Poorer Fit To Human Reading Times? (2022) • Transactions of the Association for Computational Linguistics • 51 citations
Byung-Doh Oh, William Schuler
Measuring And Improving Consistency In Pretrained Language Models (2021) • Transactions of the Association for Computational Linguistics • 82 citations
Elazar et al.
CANINE: Pre-training An Efficient Tokenization-free Encoder For Language Representation (2021) • Transactions of the Association for Computational Linguistics • 114 citations
Clark et al.
Quality At A Glance: An Audit Of Web-crawled Multilingual Datasets (2021) • Transactions of the Association for Computational Linguistics • 155 citations
Kreutzer et al.
Time-aware Language Models As Temporal Knowledge Bases (2021) • Transactions of the Association for Computational Linguistics • 49 citations
Dhingra et al.
Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) • Transactions of the Association for Computational Linguistics • 63 citations
Hendricks et al.
Gender Bias In Machine Translation (2021) • Transactions of the Association for Computational Linguistics • 74 citations
Savoldi et al.
Self-diagnosis And Self-debiasing: A Proposal For Reducing Corpus-based Bias In NLP (2021) • Transactions of the Association for Computational Linguistics • 196 citations
Timo Schick, Sahana Udupa, Hinrich Schütze
SummEval: Re-evaluating Summarization Evaluation (2020) • Transactions of the Association for Computational Linguistics • 47 citations
Fabbri et al.
A Knowledge-enhanced Pretraining Model For Commonsense Story Generation (2020) • Transactions of the Association for Computational Linguistics • 231 citations
Guan et al.
How Can We Know When Language Models Know? On The Calibration Of Language Models For Question Answering (2020) • Transactions of the Association for Computational Linguistics • 93 citations
Jiang et al.
A Primer In BERTology: What We Know About How BERT Works (2020) • Transactions of the Association for Computational Linguistics • 146 citations
Anna Rogers, Olga Kovaleva, Anna Rumshisky
PERL: Pivot-based Domain Adaptation For Pre-trained Deep Contextualized Embedding Models (2020) • Transactions of the Association for Computational Linguistics • 47 citations
Eyal Ben-David, Carmel Rabinovitz, Roi Reichart
Efficient Content-based Sparse Attention With Routing Transformers (2020) • Transactions of the Association for Computational Linguistics • 266 citations
Roy et al.
Multilingual Denoising Pre-training For Neural Machine Translation (2020) • Transactions of the Association for Computational Linguistics • 897 citations
Liu et al.
SpanBERT: Improving Pre-training By Representing And Predicting Spans (2019) • Transactions of the Association for Computational Linguistics • 179 citations
Joshi et al.
Synchronous Bidirectional Neural Machine Translation (2019) • Transactions of the Association for Computational Linguistics • 116 citations
Long Zhou, Jiajun Zhang, Chengqing Zong
How Can We Know What Language Models Know? (2019) • Transactions of the Association for Computational Linguistics • 543 citations
Jiang et al.
Attention-passing Models For Robust And Data-efficient End-to-end Speech Translation (2019) • Transactions of the Association for Computational Linguistics • 94 citations
Sperber et al.
CoQA: A Conversational Question Answering Challenge (2018) • Transactions of the Association for Computational Linguistics • 97 citations
Siva Reddy, Danqi Chen, Christopher D. Manning
Cross-sentence N-ary Relation Extraction With Graph LSTMs (2017) • Transactions of the Association for Computational Linguistics • 133 citations
Peng et al.
Colors In Context: A Pragmatic Neural Model For Grounded Language Understanding (2017) • Transactions of the Association for Computational Linguistics • 104 citations
Monroe et al.
Aspect-augmented Adversarial Networks For Domain Adaptation (2017) • Transactions of the Association for Computational Linguistics • 93 citations
Yuan Zhang, Regina Barzilay, Tommi Jaakkola
Deep Recurrent Models With Fast-forward Connections For Neural Machine Translation (2016) • Transactions of the Association for Computational Linguistics • 223 citations
Zhou et al.
Fully Character-level Neural Machine Translation Without Explicit Segmentation (2016) • Transactions of the Association for Computational Linguistics • 409 citations
Jason Lee, Kyunghyun Cho, Thomas Hofmann
Simple And Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations (2016) • Transactions of the Association for Computational Linguistics • 583 citations
Eliyahu Kiperwasser, Yoav Goldberg
Assessing The Ability Of LSTMs To Learn Syntax-sensitive Dependencies (2016) • Transactions of the Association for Computational Linguistics • 761 citations
Tal Linzen, Emmanuel Dupoux, Yoav Goldberg

Training Techniques 399 papers #

Predicting The Order Of Upcoming Tokens Improves Language Modeling (2025) • No Venue
Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Internvl3: Exploring Advanced Training And Test-time Recipes For Open-source Multimodal Models (2025) • No Venue
Zhu et al.
DRIVE: Data Curation Best Practices For Reinforcement Learning With Verifiable Reward In Competitive Code Generation (2025) • No Venue
Zhu et al.
Is Extending Modality The Right Path Towards Omni-modality? (2025) • No Venue
Zhu et al.
Transformers Without Normalization (2025) • No Venue
Zhu et al.
Critique Fine-tuning: Learning To Critique Is More Effective Than Learning To Imitate (2025) • No Venue
Yubo Wang, Xiang Yue, Wenhu Chen
Fostering Video Reasoning Via Next-event Prediction (2025) • No Venue
Wang et al.
Soundwave: Less Is More For Speech-text Alignment In Llms (2025) • No Venue
Zhang et al.
GKG-LLM: A Unified Framework For Generalized Knowledge Graph Construction (2025) • No Venue
Zhang et al.
Interactive Training: Feedback-driven Neural Network Optimization (2025) • No Venue
Wentao Zhang, Yang Young Lu, Yuntian Deng
Insights Into Deepseek-v3: Scaling Challenges And Reflections On Hardware For AI Architectures (2025) • No Venue
Zhao et al.
Diffusion Transformers With Representation Autoencoders (2025) • No Venue
Zheng et al.
Lumos-1: On Autoregressive Video Generation From A Unified Model Perspective (2025) • No Venue
Yuan et al.
Understand Before You Generate: Self-guided Training For Autoregressive Image Generation (2025) • No Venue
Yue et al.
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks (2025) • No Venue
Zhang et al.
Agent Learning Via Early Experience (2025) • No Venue
Zhang et al.
Ovis-u1 Technical Report (2025) • No Venue
Wang et al.
Optimizing Large Language Model Training Using FP4 Quantization (2025) • No Venue
Wang et al.
Revolutionizing Reinforcement Learning Framework For Diffusion Large Language Models (2025) • No Venue
Wang et al.
Findings Of The Babylm Challenge: Sample-efficient Pretraining On Developmentally Plausible Corpora (2025) • Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning • 63 citations
Warstadt et al.
Seq Vs Seq: An Open Suite Of Paired Encoders And Decoders (2025) • No Venue
Weller et al.
Qwen-image Technical Report (2025) • No Venue
Wu et al.
Reconstruction Alignment Improves Unified Multimodal Models (2025) • No Venue
Xie et al.
Pretrainzero: Reinforcement Active Pretraining (2025) • No Venue
Xing et al.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding And Generation (2025) • No Venue
Xu et al.
Phi-4-mini-reasoning: Exploring The Limits Of Small Reasoning Language Models In Math (2025) • No Venue
Xu et al.
Genius: A Generalizable And Purely Unsupervised Self-training Framework For Advanced Reasoning (2025) • No Venue
Xu et al.
Reasonflux: Hierarchical LLM Reasoning Via Scaling Thought Templates (2025) • No Venue
Yang et al.
Optimizing Chain-of-thought Reasoners Via Gradient Variance Minimization In Rejection Sampling And RL (2025) • No Venue
Yao et al.
Black-box On-policy Distillation Of Large Language Models (2025) • No Venue
Ye et al.
Ultraflux: Data-model Co-design For High-quality Native 4K Text-to-image Generation Across Diverse Aspect Ratios (2025) • No Venue
Tian Ye, Song Fei, Lei Zhu
Every Activation Boosted: Scaling General Reasoner To 1 Trillion Open Language Foundation (2025) • No Venue
Ling-Team et al.
NVIDIA Nemotron Nano V2 VL (2025) • No Venue
Nvidia et al.
Front-loading Reasoning: The Synergy Between Pretraining And Post-training Data (2025) • No Venue
Akter et al.
LFM2 Technical Report (2025) • No Venue
Amini et al.
All Is Not Lost: LLM Recovery Without Checkpoints (2025) • No Venue
Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen
Hunyuanimage 3.0 Technical Report (2025) • No Venue
Cao et al.
Quartet: Native FP4 Training Can Be Optimal For Large Language Models (2025) • No Venue
Castro et al.
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning (2025) • No Venue
Chen et al.
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis (2025) • No Venue
Chen et al.
Acereason-nemotron: Advancing Math And Code Reasoning Through Reinforcement Learning (2025) • No Venue
Chen et al.
Astra: Toward General-purpose Mobile Robots Via Hierarchical Multimodal Learning (2025) • No Venue
Chen et al.
Comp: Continual Multimodal Pre-training For Vision Foundation Models (2025) • No Venue
Chen et al.
Coda: Coding LM Via Diffusion Adaptation (2025) • No Venue
Chen et al.
Exploring The Effect Of Reinforcement Learning On Video Understanding: Insights From Seed-bench-r1 (2025) • No Venue
Chen et al.
Goku: Flow Based Video Generative Foundation Models (2025) • No Venue
Chen et al.
Pass@k Training For Adaptively Balancing Exploration And Exploitation Of Large Reasoning Models (2025) • No Venue
Chen et al.
Livecc: Learning Video LLM With Streaming Speech Transcription At Scale (2025) • No Venue
Chen et al.
Persona Vectors: Monitoring And Controlling Character Traits In Language Models (2025) • No Venue
Chen et al.
SFT Memorizes, RL Generalizes: A Comparative Study Of Foundation Model Post-training (2025) • No Venue
Chu et al.
Metaclip 2: A Worldwide Scaling Recipe (2025) • No Venue
Chuang et al.
Streaming Diloco With Overlapping Communication: Towards A Distributed Free Lunch (2025) • No Venue
Douillard et al.
SONAR-LLM: Autoregressive Transformer That Thinks In Sentence Embeddings And Speaks In Tokens (2025) • No Venue
Dragunov et al.
Unimmvsr: A Unified Multi-modal Framework For Cascaded Video Super-resolution (2025) • No Venue
Du et al.
WILDCHAT-50M: A Deep Dive Into The Role Of Synthetic Data In Post-training (2025) • No Venue
Benjamin Feuer, Chinmay Hegde
Centurio: On Drivers Of Multilingual Ability Of Large Vision-language Model (2025) • No Venue
Geigle et al.
Great Models Think Alike And This Undermines AI Oversight (2025) • No Venue
Goel et al.
Rstar-math: Small Llms Can Master Math Reasoning With Self-evolved Deep Thinking (2025) • No Venue
Guan et al.
Openthoughts: Data Recipes For Reasoning Models (2025) • No Venue
Guha et al.
Train Long, Think Short: Curriculum Learning For Efficient Reasoning (2025) • No Venue
Hammoud et al.
Hardtests: Synthesizing High-quality Test Cases For LLM Coding (2025) • No Venue
He et al.
R-zero: Self-evolving Reasoning LLM From Zero Data (2025) • No Venue
Huang et al.
Stable-spam: How To Train In 4-bit More Stably Than 16-bit Adam (2025) • No Venue
Huang et al.
Marigold: Affordable Adaptation Of Diffusion-based Image Generators For Image Analysis (2025) • No Venue
Ke et al.
Universal Reasoner: A Single, Composable Plug-and-play Reasoner For Frozen Llms (2025) • No Venue
Kim et al.
Zclip: Adaptive Spike Mitigation For LLM Pre-training (2025) • No Venue
Kumar et al.
REPA-E: Unlocking VAE For End-to-end Tuning With Latent Diffusion Transformers (2025) • No Venue
Leng et al.
Can One Domain Help Others? A Data-centric Study On Multi-domain Reasoning Via Reinforcement Learning (2025) • No Venue
Li et al.
Baichuan-omni-1.5 Technical Report (2025) • No Venue
Li et al.
Llms Can Easily Learn To Reason From Demonstrations Structure, Not Content, Is What Matters! (2025) • No Venue
Li et al.
Model Merging In Pre-training Of Large Language Models (2025) • No Venue
Li et al.
Taming Llms By Scaling Learning Rates With Gradient Grouping (2025) • No Venue
Li et al.
Where To Find Grokking In LLM Pretraining? Monitor Memorization-to-generalization Without Test (2025) • No Venue
Ziyue Li, Chenrui Fan, Tianyi Zhou
Improved Visual-spatial Reasoning Via R1-zero-like Training (2025) • No Venue
Liao et al.
Implicit Reasoning In Transformers Is Reasoning Through Shortcuts (2025) • No Venue
Lin et al.
Deciphering Trajectory-aided LLM Reasoning: An Optimization Perspective (2025) • No Venue
Liu et al.
Olmotrace: Tracing Language Model Outputs Back To Trillions Of Training Tokens (2025) • No Venue
Liu et al.
Don't Just Fine-tune The Agent, Tune The Environment (2025) • No Venue
Lu et al.
Av-reasoner: Improving And Benchmarking Clue-grounded Audio-visual Counting For Mllms (2025) • No Venue
Lu et al.
Language Models Can Learn From Verbal Feedback Without Scalar Rewards (2025) • No Venue
Luo et al.
The Climb Carves Wisdom Deeper Than The Summit: On The Noisy Rewards In Learning To Reason (2025) • No Venue
Lv et al.
Veomni: Scaling Any Modality Model Training With Model-centric Distributed Recipe Zoo (2025) • No Venue
Ma et al.
Souper-model: How Simple Arithmetic Unlocks State-of-the-art LLM Performance (2025) • No Venue
Maiti et al.
Dreamo: A Unified Framework For Image Customization (2025) • No Venue
Mou et al.
Tokenhsi: Unified Synthesis Of Physical Human-scene Interactions Through Task Tokenization (2025) • No Venue
Pan et al.
Quest: Stable Training Of Llms With 1-bit Weights And Activations (2025) • No Venue
Panferov et al.
Thinking Sparks!: Emergent Attention Heads In Reasoning Models During Post Training (2025) • No Venue
Yein Park, Minbyul Jeong, Jaewoo Kang
ACG: Action Coherence Guidance For Flow-based VLA Models (2025) • No Venue
Park et al.
Demons In The Detail: On Implementing Load Balancing Loss For Training Specialized Mixture-of-expert Models (2025) • No Venue
Qiu et al.
Apriel-1.5-15b-thinker (2025) • No Venue
Radhakrishna et al.
The Diffusion Duality (2025) • No Venue
Sahoo et al.
Benchmarking Optimizers For Large Language Model Pretraining (2025) • No Venue
Andrei Semenov, Matteo Pagliardini, Martin Jaggi
Reasonir: Training Retrievers For Reasoning Tasks (2025) • No Venue
Shao et al.
Skywork-r1v3 Technical Report (2025) • No Venue
Shen et al.
Dinov3 (2025) • No Venue
Siméoni et al.
The Curse Of Depth In Large Language Models (2025) • No Venue
Sun et al.
Nemotron Elastic: Towards Efficient Many-in-one Reasoning Llms (2025) • No Venue
Taghibakhshi et al.
Revisiting Long-context Modeling From Context Denoising Perspective (2025) • No Venue
Tang et al.
Kimi K1.5: Scaling Reinforcement Learning With Llms (2025) • No Venue
Team et al.
Llamav-o1: Rethinking Step-by-step Visual Reasoning In Llms (2025) • No Venue
Thawakar et al.
No "zero-shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (2024) • No Venue
Udandarao et al.
Qwen2.5 Technical Report (2024) • No Venue
Qwen et al.
Eyes Wide Shut? Exploring The Visual Shortcomings Of Multimodal Llms (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Tong et al.
Make Your LLM Fully Utilize The Context (2024) • No Venue
An et al.
How Do Large Language Models Acquire Factual Knowledge During Pretraining? (2024) • No Venue
Chang et al.
Heavy Labels Out! Dataset Distillation With Label Space Lightening (2024) • No Venue
Yu et al.
Griffin: Mixing Gated Linear Recurrences With Local Attention For Efficient Language Models (2024) • No Venue
De et al.
Loong: Generating Minute-level Long Videos With Autoregressive Language Models (2024) • No Venue
Wang et al.
Training-free Consistent Text-to-image Generation (2024) • No Venue
Tewel et al.
Learning To (learn At Test Time): Rnns With Expressive Hidden States (2024) • No Venue
Sun et al.
Pre-training Small Base Lms With Fewer Tokens (2024) • No Venue
Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
How To Train Data-efficient Llms (2024) • No Venue
Sachdeva et al.
Scaling Smart: Accelerating Large Language Model Pre-training With Small Model Initialization (2024) • No Venue
Samragh et al.
Hyper-connections (2024) • No Venue
Zhu et al.
Power Scheduler: A Batch Size And Token Number Agnostic Learning Rate Scheduler (2024) • No Venue
Shen et al.
Adapting Llms To Hebrew: Unveiling Dictalm 2.0 With Enhanced Vocabulary And Instruction Capabilities (2024) • No Venue
Shmidman et al.
APOLLO: Sgd-like Memory, Adamw-level Performance (2024) • No Venue
Zhu et al.
A Large Encoder-decoder Family Of Foundation Models For Chemical Language (2024) • No Venue
Soares et al.
LLM Pruning And Distillation In Practice: The Minitron Approach (2024) • No Venue
Sreenivas et al.
Kangaroo: Lossless Self-speculative Decoding Via Double Early Exiting (2024) • No Venue
Liu et al.
World Model On Million-length Video And Language With Ringattention (2024) • No Venue
Liu et al.
Understanding Llms: A Comprehensive Overview From Training To Inference (2024) • No Venue
Liu et al.
Starcoder 2 And The Stack V2: The Next Generation (2024) • No Venue
Lozhkov et al.
Divide-or-conquer? Which Part Should You Distill Your LLM? (2024) • No Venue
Wu et al.
Thinking Llms: General Instruction Following With Thought Generation (2024) • No Venue
Wu et al.
GEB-1.3B: Open Lightweight Large Language Model (2024) • No Venue
Wu et al.
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling (2024) • No Venue
Maini et al.
Transformers Can Do Arithmetic With The Right Embeddings (2024) • No Venue
McLeish et al.
Openelm: An Efficient Language Model Family With Open-source Training And Inference Framework (2024) • No Venue
Mehta et al.
Orca-math: Unlocking The Potential Of Slms In Grade School Math (2024) • No Venue
Mitra et al.
MALT: Improving Reasoning With Multi-agent LLM Training (2024) • No Venue
Motwani et al.
Olmoe: Open Mixture-of-experts Language Models (2024) • No Venue
Muennighoff et al.
Compact Language Models Via Pruning And Knowledge Distillation (2024) • No Venue
Muralidharan et al.
An Image Is Worth More Than 16x16 Patches: Exploring Transformers On Individual Pixels (2024) • No Venue
Nguyen et al.
A Survey Of Small Language Models (2024) • No Venue
Nguyen et al.
Towards Modular Llms By Building And Reusing A Library Of Loras (2024) • No Venue
Ostapenko et al.
Bielik 7B V0.1: A Polish Language Model -- Development, Insights, And Evaluation (2024) • No Venue
Ociepa et al.
Training Software Engineering Agents And Verifiers With Swe-gym (2024) • No Venue
Pan et al.
Mutual Reasoning Makes Smaller Llms Stronger Problem-solvers (2024) • No Venue
Qi et al.
Tinyllava: A Framework Of Small-scale Large Multimodal Models (2024) • No Venue
Zhou et al.
Skywork-moe: A Deep Dive Into Training Techniques For Mixture-of-experts Language Models (2024) • No Venue
Wei et al.
Humanoid Locomotion As Next Token Prediction (2024) • No Venue
Radosavovic et al.
2BP: 2-stage Backpropagation (2024) • No Venue
Christopher Rae, Joseph K. L. Lee, James Richings
Direct Nash Optimization: Teaching Language Models To Self-improve With General Preferences (2024) • No Venue
Rosset et al.
Stacking Your Transformers: A Closer Look At Model Growth For Efficient LLM Pre-training (2024) • No Venue
Du et al.
VILA²: VILA Augmented VILA (2024) • No Venue
Fang et al.
Data Engineering For Scaling Language Models To 128K Context (2024) • No Venue
Fu et al.
Better & Faster Large Language Models Via Multi-token Prediction (2024) • No Venue
Gloeckle et al.
Mulberry: Empowering MLLM With O1-like Reasoning And Reflection Via Collective Monte Carlo Tree Search (2024) • No Venue
Yao et al.
Learning Universal Predictors (2024) • No Venue
Grau-Moya et al.
Data Mixture Inference: What Do BPE Tokenizers Reveal About Their Training Data? (2024) • No Venue
Hayase et al.
Llava-gemma: Accelerating Multimodal Foundation Models With A Compact Language Model (2024) • No Venue
Hinck et al.
No More Adam: Learning Rate Scaling At Initialization Is All You Need (2024) • No Venue
Xu et al.
Yulan-mini: An Open Data-efficient Language Model (2024) • No Venue
Hu et al.
Piccolo2: General Text Embedding With Multi-task Hybrid Loss Training (2024) • No Venue
Huang et al.
Sleeper Agents: Training Deceptive Llms That Persist Through Safety Training (2024) • No Venue
Hubinger et al.
Smaller Language Models Are Better Instruction Evolvers (2024) • No Venue
Hui et al.
Instruction-tuned Language Models Are Better Knowledge Learners (2024) • No Venue
Jiang et al.
Sdpo: Don't Use Your Data All At Once (2024) • No Venue
Kim et al.
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) • No Venue
Lai et al.
TÜLU 3: Pushing Frontiers In Open Language Model Post-training (2024) • No Venue
Lambert et al.
Building And Better Understanding Vision-language Models: Insights And Future Directions (2024) • No Venue
Laurençon et al.
What Matters When Building Vision-language Models? (2024) • No Venue
Laurençon et al.
Euclid: Supercharging Multimodal Llms With Synthetic High-fidelity Visual Descriptions (2024) • No Venue
Zhang et al.
Androidlab: Training And Systematic Benchmarking Of Android Autonomous Agents (2024) • No Venue
Xu et al.
Mix-ln: Unleashing The Power Of Deeper Layers By Combining Pre-ln And Post-ln (2024) • No Venue
Pengxiang Li, Lu Yin, Shiwei Liu
Omnibench: Towards The Future Of Universal Omni-language Models (2024) • No Venue
Li et al.
Scaling (down) CLIP: A Comprehensive Analysis Of Data, Architecture, And Training Strategies (2024) • No Venue
Zichao Li, Cihang Xie, Ekin Dogus Cubuk
Agenttrek: Agent Trajectory Synthesis Via Guiding Replay With Web Tutorials (2024) • No Venue
Xu et al.
Adam-mini: Use Fewer Learning Rates To Gain More (2024) • No Venue
Zhang et al.
Cautious Optimizers: Improving Training With One Line Of Code (2024) • No Venue
Liang et al.
Foundation Models For Time Series Analysis: A Tutorial And Survey (2024) • Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 100 citations
Liang et al.
I-SHEEP: Self-alignment Of LLM From Scratch Through An Iterative Self-enhancement Paradigm (2024) • No Venue
Liang et al.
HARE: Human Priors, A Key To Small Language Model Efficiency (2024) • No Venue
Zhang et al.
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series (2024) • No Venue
Zhang et al.
FP8-LM: Training FP8 Large Language Models (2023) • No Venue
Peng et al.
Small-scale Proxies For Large-scale Transformer Training Instabilities (2023) • No Venue
Wortsman et al.
LLM360: Towards Fully Transparent Open-source Llms (2023) • No Venue
Liu et al.
EVA-CLIP: Improved Training Techniques For CLIP At Scale (2023) • Arxiv • 77 citations
Sun et al.
Training Transformers With 4-bit Integers (2023) • No Venue
Xi et al.
A Survey Of Graph Prompting Methods: Techniques, Applications, And Challenges (2023) • World Wide Web • 199 citations
Wu et al.
Remote Sensing Change Detection With Transformers Trained From Scratch (2023) • IEEE Transactions on Geoscience and Remote Sensing • 66 citations
Noman et al.
HOICLIP: Efficient Knowledge Transfer For HOI Detection With Vision-language Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Ning et al.
Orca 2: Teaching Small Language Models How To Reason (2023) • No Venue
Mitra et al.
A Survey On Multimodal Large Language Models (2023) • National Science Review • 271 citations
Yin et al.
GQA: Training Generalized Multi-query Transformer Models From Multi-head Checkpoints (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 204 citations
Ainslie et al.
Document-level Machine Translation With Large Language Models (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 71 citations
Wang et al.
Kandinsky 3.0 Technical Report (2023) • No Venue
Arkhipkin et al.
Pixart-α: Fast Training Of Diffusion Transformer For Photorealistic Text-to-image Synthesis (2023) • No Venue
Chen et al.
Merlin: Empowering Multimodal Llms With Foresight Minds (2023) • No Venue
Yu et al.
Swinmm: Masked Multi-view With Swin Transformers For 3D Medical Image Segmentation (2023) • Lecture Notes in Computer Science • 41 citations
Wang et al.
Benchmarking Neural Network Training Algorithms (2023) • No Venue
Dahl et al.
Auggpt: Leveraging Chatgpt For Text Data Augmentation (2023) • Arxiv • 98 citations
Dai et al.
Ziya2: Data-centric Learning Is All Llms Need (2023) • No Venue
Gan et al.
Distil-whisper: Robust Knowledge Distillation Via Large-scale Pseudo Labelling (2023) • No Venue
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
Graphgpt: Graph Instruction Tuning For Large Language Models (2023) • SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval • 96 citations
Tang et al.
Composable Function-preserving Expansions For Transformer Architectures (2023) • No Venue
Andrea Gesmundo, Kaitlin Maile
Emu Video: Factorizing Text-to-video Generation By Explicit Image Conditioning (2023) • No Venue
Girdhar et al.
Stylegan-t: Unlocking The Power Of Gans For Fast Large-scale Text-to-image Synthesis (2023) • Arxiv • 59 citations
Sauer et al.
Signbert+: Hand-model-aware Self-supervised Pre-training For Sign Language Understanding (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 89 citations
Hu et al.
Vid2seq: Large-scale Pretraining Of A Visual Language Model For Dense Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 179 citations
Yang et al.
Hallucination Augmented Contrastive Learning For Multimodal Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jiang et al.
Polylm: An Open Source Polyglot Large Language Model (2023) • No Venue
Wei et al.
Sabiá: Portuguese Large Language Models (2023) • Lecture Notes in Computer Science • 46 citations
Pires et al.
Stack More Layers Differently: High-rank Training Through Low-rank Updates (2023) • No Venue
Lialin et al.
FLM-101B: An Open LLM And How To Train It With $100K Budget (2023) • No Venue
Li et al.
Llava-med: Training A Large Language-and-vision Assistant For Biomedicine In One Day (2023) • Arxiv • 216 citations
Li et al.
Otter: A Multi-modal Model With In-context Instruction Tuning (2023) • Arxiv • 87 citations
Li et al.
Zero Bubble Pipeline Parallelism (2023) • No Venue
Qi et al.
Multi: Efficient Video-and-language Understanding With Text-guided Multiway-sampler And Multiple Choice Modeling (2023) • Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval • 50 citations
Xu et al.
Videopoet: A Large Language Model For Zero-shot Video Generation (2023) • No Venue
Kondratyuk et al.
Automating Code Review Activities By Large-scale Pre-training (2022) • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering • 129 citations
Li et al.
Dit: Self-supervised Pre-training For Document Image Transformer (2022) • MM '22: The 30th ACM International Conference on Multimedia • 118 citations
Li et al.
Backdoor Defense Via Decoupling The Training Process (2022) • Arxiv • 41 citations
Huang et al.
Layoutlmv3: Pre-training For Document AI With Unified Text And Image Masking (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 379 citations
Huang et al.
POLITICS: Pretraining With Same-story Article Comparison For Ideology Prediction And Stance Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 40 citations
Liu et al.
A Smile Is All You Need: Predicting Limiting Activity Coefficients From SMILES With Natural Language Processing (2022) • Digital Discovery • 65 citations
Winter et al.
Scaling Up Models And Data With t5x And seqio (2022) • Arxiv • 47 citations
Roberts et al.
What To Hide From Your Students: Attention-guided Masked Image Modeling (2022) • Lecture Notes in Computer Science • 81 citations
Kakogeorgiou et al.
Learning Audio-video Modalities From Image Captions (2022) • Lecture Notes in Computer Science • 48 citations
Nagrani et al.
Chemberta-2: Towards Chemical Foundation Models (2022) • Arxiv • 120 citations
Ahmad et al.
Text And Code Embeddings By Contrastive Pre-training (2022) • Arxiv • 146 citations
Neelakantan et al.
Multimae: Multi-modal Multi-task Masked Autoencoders (2022) • Lecture Notes in Computer Science • 186 citations
Bachmann et al.
Improving Vision Transformers By Revisiting High-frequency Components (2022) • Lecture Notes in Computer Science • 72 citations
Bai et al.
Cross-domain Deep Code Search With Meta Learning (2022) • Proceedings of the 44th International Conference on Software Engineering • 40 citations
Chai et al.
Multi-level Contrastive Learning For Cross-lingual Alignment (2022) • Lecture Notes in Computer Science • 106 citations
Chen et al.
Large Language Models Meet Nl2code: A Survey (2022) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 81 citations
Zan et al.
CERT: Continual Pre-training On Sketches For Library-oriented Code Generation (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 46 citations
Zan et al.
Promda: Prompt-based Data Augmentation For Low-resource NLU Tasks (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 61 citations
Wang et al.
Pretrained Domain-specific Language Model For General Information Retrieval Tasks In The AEC Domain (2022) • Computers in Industry • 82 citations
Zheng et al.
Alpa: Automating Inter- And Intra-operator Parallelism For Distributed Deep Learning (2022) • Arxiv • 75 citations
Zheng et al.
Ernie-layout: Layout Knowledge Enhanced Pre-training For Visually-rich Document Understanding (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 53 citations
Peng et al.
Visual Speech Recognition For Multiple Languages In The Wild (2022) • Nature Machine Intelligence • 130 citations
Pingchuan Ma, Stavros Petridis, Maja Pantic
Understanding And Mitigating Overfitting In Prompt Tuning For Vision-language Models (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 48 citations
Ma et al.
Self-supervised Hypergraph Transformer For Recommender Systems (2022) • KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 107 citations
Lianghao Xia, Chao Huang, Chuxu Zhang
Translation Between Molecules And Natural Language (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 92 citations
Edwards et al.
Masked Autoencoders For Point Cloud Self-supervised Learning (2022) • Lecture Notes in Computer Science • 421 citations
Pang et al.
Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 52 citations
Bethan Thomas, Samuel Kessler, Salah Karout
Linkbert: Pretraining Language Models With Document Links (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 163 citations
Michihiro Yasunaga, Jure Leskovec, Percy Liang
Leveraging Unimodal Self-supervised Learning For Multimodal Audio-visual Speech Recognition (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 42 citations
Pan et al.
Debiased Contrastive Learning Of Unsupervised Sentence Representations (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Zhou et al.
Text-only Training For Image Captioning Using Noise-injected CLIP (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 62 citations
David Nukrai, Ron Mokady, Amir Globerson
Vitaev2: Vision Transformer Advanced By Exploring Inductive Bias For Image Recognition And Beyond (2022) • International Journal of Computer Vision • 173 citations
Zhang et al.
Graphmae: Self-supervised Masked Graph Autoencoders (2022) • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining • 441 citations
Hou et al.
Signbert: Pre-training Of Hand-model-aware Representation For Sign Language Recognition (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 89 citations
Hu et al.
An Efficient Transformer Decoder With Compressed Sub-layers (2021) • Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23) • 46 citations
Li et al.
Continual Learning For Text Classification With Information Disentanglement Based Regularization (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 59 citations
Huang et al.
SUPERB: Speech Processing Universal Performance Benchmark (2021) • Interspeech 2021 • 553 citations
Yang et al.
Pre-training BERT On Arabic Tweets: Practical Considerations (2021) • Arxiv • 83 citations
Abdelali et al.
Zero-infinity: Breaking The GPU Memory Wall For Extreme Scale Deep Learning (2021) • Arxiv • 57 citations
Rajbhandari et al.
Musicbert: Symbolic Music Understanding With Large-scale Pre-training (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 79 citations
Zeng et al.
HET: Scaling Out Huge Embedding Model Training Via Cache-enabled Distributed Framework (2021) • Proceedings of the VLDB Endowment • 45 citations
Miao et al.
Factual Probing Is [MASK]: Learning Vs. Learning To Recall (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 230 citations
Zexuan Zhong, Dan Friedman, Danqi Chen
Efficient Large-scale Language Model Training On GPU Clusters Using Megatron-lm (2021) • SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis • 340 citations
Narayanan et al.
Learning Modality-specific Representations With Self-supervised Multi-task Learning For Multimodal Sentiment Analysis (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 555 citations
Yu et al.
Dual-branch Attention-in-attention Transformer For Single-channel Speech Enhancement (2021) • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 81 citations
Yu et al.
Scaling End-to-end Models For Large-scale Multilingual ASR (2021) • 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 46 citations
Li et al.
Peco: Perceptual Codebook For BERT Pre-training Of Vision Transformers (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 83 citations
Dong et al.
Towards Robustness Against Natural Language Word Substitutions (2021) • Arxiv • 63 citations
Dong et al.
Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) • Transactions of the Association for Computational Linguistics • 63 citations
Hendricks et al.
AMMU: A Survey Of Transformer-based Biomedical Pretrained Language Models (2021) • Journal of Biomedical Informatics • 212 citations
Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
Enhancing Knowledge Tracing Via Adversarial Training (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 77 citations
Guo et al.
Pre-trained Models: Past, Present And Future (2021) • AI Open • 700 citations
Han et al.
Delving Deep Into The Generalization Of Vision Transformers Under Distribution Shifts (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 70 citations
Zhang et al.
Multi-task Pre-training For Plug-and-play Task-oriented Dialogue System (2021) • Arxiv • 57 citations
Su et al.
Bob: BERT Over BERT For Training Persona-based Dialogue Models From Limited Personalized Data (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 91 citations
Song et al.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-image Pre-training Paradigm (2021) • Arxiv • 126 citations
Li et al.
Rethink Training Of BERT Rerankers In Multi-stage Retrieval Pipeline (2021) • Lecture Notes in Computer Science • 71 citations
Luyu Gao, Zhuyun Dai, Jamie Callan
Self-supervised Text-to-sql Learning With Header Alignment Training (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 160 citations
Donggyu Kim, Seanie Lee
Open-vocabulary Object Detection Via Vision And Language Knowledge Distillation (2021) • ICLR 2022 • 280 citations
Gu et al.
R-drop: Regularized Dropout For Neural Networks (2021) • Arxiv • 305 citations
Liang et al.
BENDR: Using Transformers And A Contrastive Self-supervised Learning Task To Learn From Massive Amounts Of EEG Data (2021) • Frontiers in Human Neuroscience • 172 citations
Demetres Kostas, Stephane Aroca-Ouellette, Frank Rudzicz
Lira: Learning Visual Speech Representations From Audio Through Self-supervision (2021) • Interspeech 2021 • 41 citations
Ma et al.
CDL: Curriculum Dual Learning For Emotion-controllable Response Generation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 80 citations
Lei Shen, Yang Feng
Don't Stop Pretraining: Adapt Language Models To Domains And Tasks (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 740 citations
Gururangan et al.
GPT-GNN: Generative Pre-training Of Graph Neural Networks (2020) • Arxiv • 71 citations
Hu et al.
Coarse-to-fine Pre-training For Named Entity Recognition (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 49 citations
Xue et al.
CERT: Contrastive Self-supervised Learning For Language Understanding (2020) • Arxiv • 196 citations
Fang et al.
Towards Learning A Generic Agent For Vision-and-language Navigation Via Pre-training (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 221 citations
Hao et al.
Improving Efficient Neural Ranking Models With Cross-architecture Knowledge Distillation (2020) • Arxiv • 64 citations
Hofstätter et al.
Semi-autoregressive Training Improves Mask-predict Decoding (2020) • Arxiv • 48 citations
Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer
Aligned Cross Entropy For Non-autoregressive Machine Translation (2020) • Arxiv • 68 citations
Ghazvininejad et al.
Adversarial Training For Aspect-based Sentiment Analysis With BERT (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 41 citations
Akbar Karimi, Leonardo Rossi, Andrea Prati
Learning From Others' Mistakes: Avoiding Dataset Biases Without Modeling Them (2020) • Arxiv • 51 citations
Sanh et al.
Large-scale Adversarial Training For Vision-and-language Representation Learning (2020) • Arxiv • 287 citations
Gan et al.
Dialogue Response Ranking Training With Large-scale Human Feedback Data (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 62 citations
Gao et al.
Univl: A Unified Video And Language Pre-training Model For Multimodal Understanding And Generation (2020) • Arxiv • 169 citations
Luo et al.
Generative Data Augmentation For Commonsense Reasoning (2020) • Findings of the Association for Computational Linguistics: EMNLP 2020 • 92 citations
Yang et al.
Learning Dynamic Belief Graphs To Generalize On Text-based Games (2020) • Arxiv • 55 citations
Adhikari et al.
Discriminative Nearest Neighbor Few-shot Intent Detection By Transferring Natural Language Inference (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 65 citations
Zhang et al.
CLEAR: Contrastive Learning For Sentence Representation (2020) • Arxiv • 229 citations
Wu et al.
Response Selection For Multi-party Conversations With Dynamic Topic Tracking (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Weishi Wang, Shafiq Joty, Steven C. H. Hoi
Data Manipulation: Towards Effective Instance Learning For Neural Dialogue Generation Via Learning To Augment And Reweight (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 56 citations
Cai et al.
SOLOIST: Building Task Bots At Scale With Transfer Learning And Machine Teaching (2020) • Arxiv • 99 citations
Peng et al.
Adaptive Offline Quintuplet Loss For Image-text Matching (2020) • Lecture Notes in Computer Science • 67 citations
Tianlang Chen, Jiajun Deng, Jiebo Luo
Local Additivity Based Data Augmentation For Semi-supervised NER (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 45 citations
Chen et al.
Low-resource Domain Adaptation For Compositional Task-oriented Semantic Parsing (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 64 citations
Chen et al.
Grappa: Grammar-augmented Pre-training For Table Semantic Parsing (2020) • Arxiv • 59 citations
Yu et al.
Rocketqa: An Optimized Training Approach To Dense Passage Retrieval For Open-domain Question Answering (2020) • Arxiv • 74 citations
Qu et al.
Infoxlm: An Information-theoretic Framework For Cross-lingual Language Model Pre-training (2020) • Arxiv • 77 citations
Chi et al.
What Have We Achieved On Text Summarization? (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 70 citations
Huang et al.
Shallow-to-deep Training For Neural Machine Translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 40 citations
Li et al.
Memory-efficient Pipeline-parallel DNN Training (2020) • Arxiv • 60 citations
Narayanan et al.
ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators (2020) • Arxiv • 541 citations
Clark et al.
Norm-based Curriculum Learning For Neural Machine Translation (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 96 citations
Liu et al.
Understanding The Difficulty Of Training Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 81 citations
Liu et al.
Data Augmentation Using Pre-trained Transformer Models (2020) • Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems • 61 citations
Varun Kumar, Ashutosh Choudhary, Eunah Cho
Hiertrain: Fast Hierarchical Edge AI Learning With Hybrid Parallelism In Mobile-edge-cloud Computing (2020) • IEEE Open Journal of the Communications Society • 65 citations
Liu et al.
UP-DETR: Unsupervised Pre-training For Object Detection With Transformers (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 401 citations
Dai et al.
Text-to-text Pre-training For Data-to-text Tasks (2020) • Proceedings of the 13th International Conference on Natural Language Generation • 135 citations
Mihir Kale, Abhinav Rastogi
Leveraging Unpaired Text Data For Training End-to-end Speech-to-intent Systems (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 56 citations
Huang et al.
Understanding Neural Abstractive Summarization Models Via Uncertainty (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 45 citations
Jiacheng Xu, Shrey Desai, Greg Durrett
Mixup-transformer: Dynamic Data Augmentation For NLP Tasks (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 72 citations
Sun et al.
A Streaming On-device End-to-end Model Surpassing Server-side Conventional Model Quality And Latency (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 200 citations
Sainath et al.
Self-training Improves Pre-training For Natural Language Understanding (2020) • Arxiv • 46 citations
Du et al.
Coach: A Coarse-to-fine Approach For Cross-domain Slot Filling (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 99 citations
Liu et al.
Dynamic Data Selection And Weighting For Iterative Back-translation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 46 citations
Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig
Few-shot Generative Conversational Query Rewriting (2020) • Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval • 124 citations
Yu et al.
Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020) • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 46 citations
Inaguma et al.
Understanding And Improving Information Transfer In Multi-task Learning (2020) • Arxiv • 47 citations
Sen Wu, Hongyang R. Zhang, Christopher Ré
Object Relational Graph With Teacher-recommended Learning For Video Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 310 citations
Zhang et al.
Overcoming Language Priors With Self-supervised Learning For Visual Question Answering (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 108 citations
Zhu et al.
Single Headed Attention Based Sequence-to-sequence Model For State-of-the-art Results On Switchboard (2020) • Interspeech 2020 • 42 citations
Tüske et al.
Cross-lingual Retrieval For Iterative Self-supervised Training (2020) • NeurIPS 2020 • 48 citations
Tran et al.
A Study Of BFLOAT16 For Deep Learning Training (2019) • Arxiv • 67 citations
Kalamkar et al.
Non-monotonic Sequential Text Generation (2019) • Arxiv • 63 citations
Welleck et al.
Learning From Dialogue After Deployment: Feed Yourself, Chatbot! (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 66 citations
Hancock et al.
Negated And Misprimed Probes For Pretrained Language Models: Birds Can Talk, But Cannot Fly (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 69 citations
Nora Kassner, Hinrich Schütze
ERNIE 2.0: A Continual Pre-training Framework For Language Understanding (2019) • Arxiv • 74 citations
Sun et al.
Videobert: A Joint Model For Video And Language Representation Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 1077 citations
Sun et al.
Unsupervised Data Augmentation For Consistency Training (2019) • Arxiv • 1615 citations
Xie et al.
Overlearning Reveals Sensitive Attributes (2019) • Arxiv • 55 citations
Congzheng Song, Vitaly Shmatikov
Large-batch Training For LSTM And Beyond (2019) • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis • 74 citations
You et al.
Fairseq: A Fast, Extensible Toolkit For Sequence Modeling (2019) • Proceedings of the 2019 Conference of the North • 744 citations
Ott et al.
Linguistic Knowledge And Transferability Of Contextual Representations (2019) • Proceedings of the 2019 Conference of the North • 57 citations
Liu et al.
ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations (2019) • Arxiv • 4051 citations
Lan et al.
Pre-training With Whole Word Masking For Chinese BERT (2019) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 957 citations
Cui et al.
A Systematic Comparison Of Methods For Low-resource Dependency Parsing On Genuinely Low-resource Languages (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 51 citations
Vania et al.
NEZHA: Neural Contextualized Representation For Chinese Language Understanding (2019) • Arxiv • 86 citations
Wei et al.
Speech Model Pre-training For End-to-end Spoken Language Understanding (2019) • Interspeech 2019 • 41 citations
Lugosch et al.
Key Fact As Pivot: A Two-stage Model For Low Resource Table-to-text Generation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 45 citations
Ma et al.
Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019) • Interspeech 2019 • 153 citations
Zhang et al.
Jasper: An End-to-end Convolutional Neural Acoustic Model (2019) • Interspeech 2019 • 212 citations
Li et al.
Probing What Different NLP Tasks Teach Machines About Function Word Comprehension (2019) • Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) • 94 citations
Kim et al.
Simple And Effective Curriculum Pointer-generator Networks For Reading Comprehension Over Long Narratives (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 91 citations
Tay et al.
Neural Text Generation With Unlikelihood Training (2019) • Arxiv • 241 citations
Welleck et al.
Roberta: A Robustly Optimized BERT Pretraining Approach (2019) • Arxiv • 16976 citations
Liu et al.
Differential Privacy Has Disparate Impact On Model Accuracy (2019) • Arxiv • 77 citations
Eugene Bagdasaryan, Vitaly Shmatikov
BART: Denoising Sequence-to-sequence Pre-training For Natural Language Generation, Translation, And Comprehension (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 3464 citations
Lewis et al.
Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019) • Interspeech 2019 • 155 citations
Kannan et al.
End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019) • Arxiv • 165 citations
Synnaeve et al.
Revisiting Self-training For Neural Sequence Generation (2019) • Arxiv • 139 citations
He et al.
Emerging Cross-lingual Structure In Pretrained Language Models (2019) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 46 citations
Wu et al.
Fine-tune Bert For Docred With Two-step Process (2019) • Arxiv • 116 citations
Wang et al.
Self-supervised Learning For Contextualized Extractive Summarization (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 61 citations
Wang et al.
Low Resource Text Classification With Ulmfit And Backtranslation (2019) • Arxiv • 43 citations
Sam Shleifer
Training Neural Response Selection For Task-oriented Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 83 citations
Henderson et al.
Layoutlm: Pre-training Of Text And Layout For Document Image Understanding (2019) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 478 citations
Xu et al.
Environmental Drivers Of Systematicity And Generalization In A Situated Agent (2019) • Arxiv • 53 citations
Hill et al.
Cyclical Annealing Schedule: A Simple Approach To Mitigating KL Vanishing (2019) • Arxiv • 169 citations
Fu et al.
End-to-end Speech Translation With Knowledge Distillation (2019) • Interspeech 2019 • 139 citations
Liu et al.
Universal Sentence Encoder (2018) • Arxiv • 1289 citations
Cer et al.
Incsql: Training Incremental Text-to-sql Parsers With Non-deterministic Oracles (2018) • Arxiv • 59 citations
Shi et al.
Dynamical Isometry And A Mean Field Theory Of Rnns: Gating Enables Signal Propagation In Recurrent Neural Networks (2018) • Arxiv • 73 citations
Minmin Chen, Jeffrey Pennington, Samuel S. Schoenholz
The Best Of Both Worlds: Combining Recent Advances In Neural Machine Translation (2018) • Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 131 citations
Chen et al.
Multi-modal Data Augmentation For End-to-end ASR (2018) • Interspeech 2018 • 54 citations
Renduchintala et al.
Gpipe: Efficient Training Of Giant Neural Networks Using Pipeline Parallelism (2018) • Arxiv • 236 citations
Huang et al.
Semi-supervised Sequence Modeling With Cross-view Training (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 391 citations
Clark et al.
Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With Rnn-transducer (2018) • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) • 137 citations
Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
On The Effectiveness Of Task Granularity For Transfer Learning (2018) • Arxiv • 50 citations
Mahdisoltani et al.
Natural Language To Structured Query Generation Via Meta-learning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 120 citations
Huang et al.
Back-translation-style Data Augmentation For End-to-end ASR (2018) • 2018 IEEE Spoken Language Technology Workshop (SLT) • 95 citations
Hayashi et al.
Improved Training Of End-to-end Attention Models For Speech Recognition (2018) • Interspeech 2018 • 277 citations
Zeyer et al.
Training Tips For The Transformer Model (2018) • The Prague Bulletin of Mathematical Linguistics • 109 citations
Martin Popel, Ondřej Bojar
A Simple Method For Commonsense Reasoning (2018) • Arxiv • 312 citations
Trieu H. Trinh, Quoc V. Le
Learning General Purpose Distributed Sentence Representations Via Large Scale Multi-task Learning (2018) • Arxiv • 101 citations
Subramanian et al.
Large Scale Distributed Neural Network Training Through Online Distillation (2018) • Arxiv • 152 citations
Anil et al.
Reaching Human-level Performance In Automatic Grammatical Error Correction: An Empirical Study (2018) • Arxiv • 97 citations
Tao Ge, Furu Wei, Ming Zhou
Adversarial Over-sensitivity And Over-stability Strategies For Dialogue Models (2018) • Proceedings of the 22nd Conference on Computational Natural Language Learning • 61 citations
Tong Niu, Mohit Bansal
Improving Generalization Performance By Switching From Adam To SGD (2017) • Arxiv • 403 citations
Nitish Shirish Keskar, Richard Socher
Dual Rectified Linear Units (drelus): A Replacement For Tanh Activation Functions In Quasi-recurrent Neural Networks (2017) • Pattern Recognition Letters • 45 citations
Godin et al.
Deep Active Learning For Named Entity Recognition (2017) • Proceedings of the 2nd Workshop on Representation Learning for NLP • 364 citations
Shen et al.
Discourse-based Objectives For Fast Unsupervised Sentence Representation Learning (2017) • Arxiv • 111 citations
Yacine Jernite, Samuel R. Bowman, David Sontag
Cold Fusion: Training Seq2seq Models Together With Language Models (2017) • Interspeech 2018 • 111 citations
Sriram et al.
Yellowfin And The Art Of Momentum Tuning (2017) • Arxiv • 63 citations
Jian Zhang, Ioannis Mitliagkas
I2T2I: Learning Text To Image Synthesis With Textual Data Augmentation (2017) • 2017 IEEE International Conference on Image Processing (ICIP) • 60 citations
Dong et al.
Adacomp: Adaptive Residual Gradient Compression For Data-parallel Distributed Training (2017) • Arxiv • 74 citations
Chen et al.
Deep Gradient Compression: Reducing The Communication Bandwidth For Distributed Training (2017) • ICLR 2018 • 645 citations
Lin et al.
Gradnorm: Gradient Normalization For Adaptive Loss Balancing In Deep Multitask Networks (2017) • Proceedings of the 35th International Conference on Machine Learning (2018) 793-802 • 443 citations
Chen et al.
Tacotron: Towards End-to-end Speech Synthesis (2017) • Interspeech 2017 • 1567 citations
Wang et al.
Curriculum Learning And Minibatch Bucketing In Neural Machine Translation (2017) • RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning • 127 citations
Tom Kocmi, Ondrej Bojar
One-shot Imitation Learning (2017) • Arxiv • 278 citations
Duan et al.
Deeparchitect: Automatically Designing And Training Deep Architectures (2017) • Arxiv • 142 citations
Renato Negrinho, Geoff Gordon
Neural Semantic Parsing By Character-based Translation: Experiments With Abstract Meaning Representations (2017) • Arxiv • 82 citations
Rik van Noord, Johan Bos
Improved Training Of Wasserstein Gans (2017) • Arxiv • 1508 citations
Gulrajani et al.
Capacity And Trainability In Recurrent Neural Networks (2016) • Arxiv • 82 citations
Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo
Chinese Song Iambics Generation With Neural Attention-based Model (2016) • Arxiv • 54 citations
Wang et al.
Variational Neural Machine Translation (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 189 citations
Zhang et al.
A Joint Many-task Model: Growing A Neural Network For Multiple NLP Tasks (2016) • Arxiv • 52 citations
Hashimoto et al.

— U —

Uncategorized 54 papers #

Reasoning Over Boundaries: Enhancing Specification Alignment Via Test-time Deliberation (2025) • No Venue
Zhang et al.
Language Models Model Language (2025) • No Venue
Łukasz Borchmann
Spacer: Towards Engineered Scientific Inspiration (2025) • No Venue
Lee et al.
Aligning Latent Spaces With Flow Priors (2025) • No Venue
Li et al.
Project Alexandria: Towards Freeing Scientific Knowledge From Copyright Burdens Via Llms (2025) • No Venue
Schuhmann et al.
Learning To Refuse: Towards Mitigating Privacy Risks In Llms (2024) • No Venue
Liu et al.
Constant Acceleration Flow (2024) • No Venue
Park et al.
E2 TTS: Embarrassingly Easy Fully Non-autoregressive Zero-shot TTS (2024) • No Venue
Eskimez et al.
Learning Fine-grained Bimanual Manipulation With Low-cost Hardware (2023) • Robotics: Science and Systems XIX • 261 citations
Zhao et al.
The Feasibility Of Artificial Consciousness Through The Lens Of Neuroscience (2023) • Trends in Neurosciences • 80 citations
Jaan Aru, Matthew Larkum, James M. Shine
Brain2music: Reconstructing Music From Human Brain Activity (2023) • No Venue
Denk et al.
Generation Of 3D Molecules In Pockets Via Language Model (2023) • Nature Machine Intelligence • 44 citations
Feng et al.
Testing Of Detection Tools For Ai-generated Text (2023) • International Journal for Educational Integrity • 278 citations
Weber-Wulff et al.
Story-to-motion: Synthesizing Infinite And Controllable Character Animation From Long Text (2023) • No Venue
Qing et al.
Accurate RNA 3D Structure Prediction Using A Language Model-based Deep Learning Approach (2022) • Nature Methods • 97 citations
Shen et al.
Deep Learning-based Sentiment Analysis Of COVID-19 Vaccination Responses From Twitter Data (2022) • Computational and Mathematical Methods in Medicine • 107 citations
Alam et al.
Multi-source Unsupervised Domain Adaptation Via Pseudo Target Domain (2022) • IEEE Transactions on Image Processing • 104 citations
Chuan-Xian et al.
Neural Networks And The Chomsky Hierarchy (2022) • Arxiv • 45 citations
Delétang et al.
Talking About Large Language Models (2022) • Communications of the ACM • 180 citations
Murray Shanahan
Generative Power Of A Protein Language Model Trained On Multiple Sequence Alignments (2022) • eLife • 40 citations
Damiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol
What Artificial Neural Networks Can Tell Us About Human Language Acquisition (2022) • Algebraic Structures in Natural Language • 89 citations
Alex Warstadt, Samuel R. Bowman
C3-STISR: Scene Text Image Super-resolution With Triple Clues (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 40 citations
Zhao et al.
Biologically Inspired Design Concept Generation Using Generative Pre-trained Transformers (2022) • Journal of Mechanical Design • 56 citations
Qihao Zhu, Xinyu Zhang, Jianxi Luo
Bertopic: Neural Topic Modeling With A Class-based TF-IDF Procedure (2022) • Arxiv • 1215 citations
Maarten Grootendorst
Automatically Matching Bug Reports With Related App Reviews (2021) • 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) • 49 citations
Marlo Häring, Christoph Stanik, Walid Maalej
Mathbert: A Pre-trained Model For Mathematical Formula Understanding (2021) • Arxiv • 53 citations
Peng et al.
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task (2021) • Arxiv • 46 citations
Abdul-Mageed et al.
A Skeleton-driven Neural Occupancy Representation For Articulated Hands (2021) • 2021 International Conference on 3D Vision (3DV) • 51 citations
Karunratanakul et al.
In-home Daily-life Captioning Using Radio Signals (2020) • Lecture Notes in Computer Science • 46 citations
Fan et al.
Transformers For Modeling Physical Systems (2020) • Neural Networks • 121 citations
Nicholas Geneva, Nicholas Zabaras
Public Sentiment Toward Solar Energy: Opinion Mining Of Twitter Using A Transformer-based Language Model (2020) • Sustainability • 60 citations
Kim et al.
Hifisinger: Towards High-fidelity Neural Singing Voice Synthesis (2020) • Arxiv • 43 citations
Chen et al.
Quantifying Social Organization And Political Polarization In Online Platforms (2020) • Nature • 126 citations
Isaac Waller, Ashton Anderson
Semeval-2020 Task 12: Multilingual Offensive Language Identification In Social Media (offenseval 2020) (2020) • Proceedings of the Fourteenth Workshop on Semantic Evaluation • 374 citations
Zampieri et al.
Progen: Language Modeling For Protein Generation (2020) • Arxiv • 143 citations
Madani et al.
The Curious Case Of Neural Text Degeneration (2019) • Arxiv • 1092 citations
Holtzman et al.
Humanoid: A Deep Learning-based Approach To Automated Black-box Android App Testing (2019) • 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) • 138 citations
Li et al.
What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations (2019) • Arxiv • 139 citations
Tenney et al.
Are We Consistently Biased? Multidimensional Analysis Of Biases In Distributional Word Vectors (2019) • Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) • 53 citations
Anne Lauscher, Goran Glavaš
Glyce: Glyph-vectors For Chinese Character Representations (2019) • Arxiv • 147 citations
Meng et al.
Topic-guided Variational Autoencoders For Text Generation (2019) • Arxiv • 57 citations
Wang et al.
Controllable Unsupervised Text Attribute Transfer Via Editing Entangled Latent Representation (2019) • Arxiv • 44 citations
Ke Wang, Hang Hua, Xiaojun Wan
Multiple-attribute Text Style Transfer (2018) • Arxiv • 52 citations
Subramanian et al.
Constrained Generation Of Semantically Valid Graphs Via Regularizing Variational Autoencoders (2018) • Arxiv • 82 citations
Tengfei Ma, Jie Chen, Cao Xiao
The Conll-sigmorphon 2018 Shared Task: Universal Morphological Reinflection (2018) • Arxiv • 56 citations
Cotterell et al.
DP-GAN: Diversity-promoting Generative Adversarial Network For Generating Informative And Diversified Text (2018) • Arxiv • 58 citations
Xu et al.
Fairness Without Demographics In Repeated Loss Minimization (2018) • Arxiv • 131 citations
Hashimoto et al.
Content Preserving Text Generation With Attribute Controls (2018) • Arxiv • 82 citations
Lajanugen Logeswaran, Honglak Lee, Samy Bengio
Massively Multilingual Neural Grapheme-to-phoneme Conversion (2017) • Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems • 41 citations
Ben Peters, Jon Dehdari, Josef van Genabith
Event Coreference Resolution By Iteratively Unfolding Inter-dependencies Among Events (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 61 citations
Prafulla Kumar Choubey, Ruihong Huang
SLING: A Framework For Frame Semantic Parsing (2017) • Arxiv • 41 citations
Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira
Empath: Understanding Topic Signals In Large-scale Text (2016) • CHI'16: CHI Conference on Human Factors in Computing Systems • 232 citations
Ethan Fast, Binbin Chen, Michael Bernstein
Stance Detection With Bidirectional Conditional Encoding (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 49 citations
Augenstein et al.
Cognitive Science In The Era Of Artificial Intelligence: A Roadmap For Reverse-engineering The Infant Language-learner (2016) • Cognition • 200 citations
Emmanuel Dupoux

— V —

Vision Language 1357 papers #

4kagent: Agentic Any Image To 4K Super-resolution (2025) • No Venue
Zuo et al.
Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories (2025) • No Venue
Zhu et al.
MMR-V: What's Left Unsaid? A Benchmark For Multimodal Deep Reasoning In Videos (2025) • No Venue
Zhu et al.
Is Extending Modality The Right Path Towards Omni-modality? (2025) • No Venue
Zhu et al.
Vargpt-v1.1: Improve Visual Autoregressive Large Unified Model Via Iterative Instruction Tuning And Reinforcement Learning (2025) • No Venue
Zhuang et al.
Interactiveomni: A Unified Omni-modal Model For Audio-visual Multi-turn Dialogue (2025) • No Venue
Tong et al.
Thinking With Video: Video Generation As A Promising Multimodal Reasoning Paradigm (2025) • No Venue
Tong et al.
Vision-guided Chunking Is All You Need: Enhancing RAG With Multimodal Document Understanding (2025) • No Venue
Tripathi et al.
Siglip 2: Multilingual Vision-language Encoders With Improved Semantic Understanding, Localization, And Dense Features (2025) • No Venue
Tschannen et al.
Longwriter-v: Enabling Ultra-long And High-fidelity Generation In Vision-language Models (2025) • No Venue
Tu et al.
Time Blindness: Why Video-language Models Can't See What Humans Can? (2025) • No Venue
Upadhyay et al.
Vision Language Models Are Biased (2025) • No Venue
Vo et al.
SRPO: Enhancing Multimodal LLM Reasoning Via Reflection-aware Reinforcement Learning (2025) • No Venue
Wan et al.
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation And Methodology (2025) • No Venue
Wang et al.
Autoregressive Semantic Visual Reconstruction Helps Vlms Understand Better (2025) • No Venue
Wang et al.
Bitvla: 1-bit Vision-language-action Models For Robotics Manipulation (2025) • No Venue
Wang et al.
Declip: Decoupled Learning For Open-vocabulary Dense Perception (2025) • No Venue
Wang et al.
Fantasytalking: Realistic Talking Portrait Generation Via Coherent Motion Synthesis (2025) • No Venue
Wang et al.
Genexam: A Multidisciplinary Text-to-image Exam (2025) • No Venue
Wang et al.
Grasp Any Region: Towards Precise, Contextual Pixel Understanding For Multimodal Llms (2025) • No Venue
Wang et al.
Llava-critic-r1: Your Critic Model Is Secretly A Strong Policy Model (2025) • No Venue
Wang et al.
Internvl3.5: Advancing Open-source Multimodal Models In Versatility, Reasoning, And Efficiency (2025) • No Venue
Wang et al.
When Visualizing Is The First Step To Reasoning: MIRA, A Benchmark For Visual Chain-of-thought (2025) • No Venue
Zhou et al.
R1-zero's "aha Moment" In Visual Reasoning On A 2B Non-sft Model (2025) • No Venue
Zhou et al.
Scientists' First Exam: Probing Cognitive Abilities Of MLLM Via Perception, Understanding, And Reasoning (2025) • No Venue
Zhou et al.
Roborefer: Towards Spatial Referring With Reasoning In Vision-language Models For Robotics (2025) • No Venue
Zhou et al.
Reinforced Visual Perception With Tools (2025) • No Venue
Zhou et al.
A Survey On Vision-language-action Models: An Action Tokenization Perspective (2025) • No Venue
Zhong et al.
Phi-ground Tech Report: Advancing Perception In GUI Grounding (2025) • No Venue
Zhang et al.
Generative Universal Verifier As Multimodal Meta-reasoner (2025) • No Venue
Zhang et al.
Latent Sketchpad: Sketching Visual Thoughts To Elicit Multimodal Reasoning In Mllms (2025) • No Venue
Zhang et al.
Llava-mini: Efficient Image And Video Large Multimodal Models With One Vision Token (2025) • No Venue
Zhang et al.
Openmmreasoner: Pushing The Frontiers For Multimodal Reasoning With An Open And General Recipe (2025) • No Venue
Zhang et al.
Pixel-sail: Single Transformer For Pixel-grounded Understanding (2025) • No Venue
Zhang et al.
Sec: Advancing Complex Video Object Segmentation Via Progressive Concept Construction (2025) • No Venue
Zhang et al.
Vlm^2-bench: A Closer Look At How Well Vlms Implicitly Link Explicit Matching Visual Cues (2025) • No Venue
Zhang et al.
From Spatial To Actions: Grounding Vision-language-action Model In Spatial Foundation Priors (2025) • No Venue
Zhang et al.
Stream-omni: Simultaneous Multimodal Interactions With Large Language-vision-speech Model (2025) • No Venue
Zhang et al.
Thyme: Think Beyond Images (2025) • No Venue
Zhang et al.
Vision-language-vision Auto-encoder: Scalable Knowledge Distillation From Diffusion Models (2025) • No Venue
Zhang et al.
Videollama 3: Frontier Multimodal Foundation Models For Image And Video Understanding (2025) • No Venue
Zhang et al.
Unified Multimodal Understanding And Generation Models: Advances, Challenges, And Opportunities (2025) • No Venue
Zhang et al.
Envisioning Beyond The Pixels: Benchmarking Reasoning-informed Visual Editing (2025) • No Venue
Zhao et al.
SAIL-VL2 Technical Report (2025) • No Venue
Yin et al.
Visual Representation Alignment For Multimodal Large Language Models (2025) • No Venue
Yoon et al.
Aligning Multimodal LLM With Human Preference: A Survey (2025) • No Venue
Yu et al.
Discrete Diffusion In Large Language And Multimodal Models: A Survey (2025) • No Venue
Runpeng Yu, Qi Li, Xinchao Wang
How Far Are Vlms From Visual Spatial Intelligence? A Benchmark-driven Perspective (2025) • No Venue
Yu et al.
Unicorn: Text-only Data Synthesis For Vision Language Model Training (2025) • No Venue
Yu et al.
Sa2va: Marrying SAM2 With Llava For Dense Grounded Understanding Of Images And Videos (2025) • No Venue
Yuan et al.
Being-0: A Humanoid Robotic Agent With Vision-language Models And Modular Skills (2025) • No Venue
Yuan et al.
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems In Visual Contexts (2025) • No Venue
Yuan et al.
Shortv: Efficient Multimodal Large Language Models By Freezing Visual Tokens In Ineffective Layers (2025) • No Venue
Yuan et al.
Vl-cogito: Progressive Curriculum Reinforcement Learning For Advanced Multimodal Reasoning (2025) • No Venue
Yuan et al.
Rlinf-vla: A Unified And Efficient Framework For VLA+RL Training (2025) • No Venue
Zang et al.
Internlm-xcomposer2.5-reward: A Simple Yet Effective Multi-modal Reward Model (2025) • No Venue
Zang et al.
A Vision-language-action-critic Model For Robotic Real-world Reinforcement Learning (2025) • No Venue
Zhai et al.
Vision-r1: Evolving Human-free Alignment In Large Vision-language Models Via Vision-guided Reinforcement Learning (2025) • No Venue
Zhan et al.
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks (2025) • No Venue
Zhang et al.
2.5 Years In Class: A Multimodal Textbook For Vision-language Pretraining (2025) • No Venue
Zhang et al.
Dreamvla: A Vision-language-action Model Dreamed With Comprehensive World Knowledge (2025) • No Venue
Zhang et al.
Mathcoder-vl: Bridging Vision And Code For Enhanced Multimodal Mathematical Reasoning (2025) • No Venue
Wang et al.
Part-x-mllm: Part-aware 3D Multimodal Large Language Model (2025) • No Venue
Wang et al.
Mmlongbench: Benchmarking Long-context Vision-language Models Effectively And Thoroughly (2025) • No Venue
Wang et al.
ODYSSEY: Open-world Quadrupeds Exploration And Manipulation For Long-horizon Tasks (2025) • No Venue
Wang et al.
Ovis-u1 Technical Report (2025) • No Venue
Wang et al.
Opencua: Open Foundations For Computer-use Agents (2025) • No Venue
Wang et al.
Skywork-vl Reward: An Effective Reward Model For Multimodal Understanding And Reasoning (2025) • No Venue
Wang et al.
Pref-grpo: Pairwise Preference Reward-based GRPO For Stable Text-to-image Reinforcement Learning (2025) • No Venue
Wang et al.
Perception-aware Policy Optimization For Multimodal Reasoning (2025) • No Venue
Wang et al.
Pixnerd: Pixel Neural Field Diffusion (2025) • No Venue
Wang et al.
Roboomni: Proactive Robot Manipulation In Omni-modal Context (2025) • No Venue
Wang et al.
Scaling Pre-training To One Hundred Billion Data For Vision Language Models (2025) • No Venue
Wang et al.
Sota With Less: Mcts-guided Sample Selection For Data-efficient Visual Reasoning Self-improvement (2025) • No Venue
Wang et al.
Finevision: Open Data Is All You Need (2025) • No Venue
Wiedmann et al.
Vision-zero: Scalable VLM Self-improvement Via Strategic Gamified Self-play (2025) • No Venue
Wang et al.
Unified Multimodal Chain-of-thought Reward Model Through Reinforcement Fine-tuning (2025) • No Venue
Wang et al.
Vidorag: Visual Document Retrieval-augmented Generation Via Dynamic Iterative Reasoning Agents (2025) • No Venue
Wang et al.
Vicrit: A Verifiable Reinforcement Learning Proxy Task For Visual Perception In Vlms (2025) • No Venue
Wang et al.
Vla-adapter: An Effective Paradigm For Tiny-scale Vision-language-action Model (2025) • No Venue
Wang et al.
Visualprm: An Effective Process Reward Model For Multimodal Reasoning (2025) • No Venue
Wang et al.
Vl-rethinker: Incentivizing Self-reflection Of Vision-language Models With Reinforcement Learning (2025) • No Venue
Wang et al.
World Modeling Makes A Better Planner: Dual Preference Optimization For Embodied Task Planning (2025) • No Venue
Wang et al.
Ggbench: A Geometric Generative Reasoning Benchmark For Unified Multimodal Models (2025) • No Venue
Wei et al.
Deepseek-ocr: Contexts Optical Compression (2025) • No Venue
Haoran Wei, Yaofeng Sun, Yukun Li
Advancing Multimodal Reasoning Via Reinforcement Learning With Cold Start (2025) • No Venue
Wei et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior For Visual Reasoning (2025) • No Venue
Wei et al.
Streamvln: Streaming Vision-and-language Navigation Via Slowfast Context Modeling (2025) • No Venue
Wei et al.
Univideo: Unified Understanding, Generation, And Editing For Videos (2025) • No Venue
Wei et al.
Video Models Are Zero-shot Learners And Reasoners (2025) • No Venue
Wiedemer et al.
Any2caption: Interpreting Any Condition To Caption For Controllable Video Generation (2025) • No Venue
Wu et al.
Boosting Multimodal Reasoning With Mcts-automated Structured Thinking (2025) • No Venue
Wu et al.
COMPACT: Compositional Atomic-to-complex Visual Capability Tuning (2025) • No Venue
Wu et al.
Generate, But Verify: Reducing Hallucination In Vision-language Models With Retrospective Resampling (2025) • No Venue
Wu et al.
Reinforcement Learning In Vision: A Survey (2025) • No Venue
Wu et al.
Mmsearch-r1: Incentivizing Lmms To Search (2025) • No Venue
Wu et al.
Qwen-image Technical Report (2025) • No Venue
Wu et al.
Spatial-mllm: Boosting MLLM Capabilities In Visual-based Spatial Intelligence (2025) • No Venue
Wu et al.
Rewarddance: Reward Scaling In Visual Generation (2025) • No Venue
Wu et al.
Vidic: Video Difference Captioning (2025) • No Venue
Wu et al.
Synthrl: Scaling Visual Reasoning With Verifiable Data Synthesis (2025) • No Venue
Wu et al.
Visual Jigsaw Post-training Improves Mllms (2025) • No Venue
Wu et al.
Dreamomni2: Multimodal Instruction-based Editing And Generation (2025) • No Venue
Xia et al.
MIEB: Massive Image Embedding Benchmark (2025) • No Venue
Xiao et al.
Scaling Language-centric Omnimodal Representation Learning (2025) • No Venue
Xiao et al.
Are Vlms Ready For Autonomous Driving? An Empirical Study From The Reliability, Data, And Metric Perspectives (2025) • No Venue
Xie et al.
Show-o2: Improved Native Unified Multimodal Models (2025) • No Venue
Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
Reconstruction Alignment Improves Unified Multimodal Models (2025) • No Venue
Xie et al.
Caprl: Stimulating Dense Image Caption Capabilities Via Reinforcement Learning (2025) • No Venue
Xing et al.
Scalecap: Inference-time Scalable Image Captioning Via Dual-modality Debiasing (2025) • No Venue
Xing et al.
Deepphy: Benchmarking Agentic Vlms On Physical Reasoning (2025) • No Venue
Xu et al.
Vs-bench: Evaluating Vlms For Strategic Reasoning And Decision-making In Multi-agent Environments (2025) • No Venue
Xu et al.
Jodi: Unification Of Visual Generation And Understanding Via Joint Modeling (2025) • No Venue
Xu et al.
Visulogic: A Benchmark For Evaluating Visual Reasoning In Multi-modal Large Language Models (2025) • No Venue
Xu et al.
Streamingvlm: Real-time Understanding For Infinite Video Streams (2025) • No Venue
Xu et al.
Visual Planning: Let's Think Only With Images (2025) • No Venue
Xu et al.
Can Understanding And Generation Truly Benefit Together -- Or Just Coexist? (2025) • No Venue
Yan et al.
Lohovla: A Unified Vision-language-action Model For Long-horizon Embodied Tasks (2025) • No Venue
Yang et al.
Kwai Keye-vl 1.5 Technical Report (2025) • No Venue
Yang et al.
Medical World Model: Generative Simulation Of Tumor Evolution For Treatment Planning (2025) • No Venue
Yang et al.
Machine Mental Imagery: Empower Multimodal Reasoning With Latent Visual Tokens (2025) • No Venue
Yang et al.
Magma: A Foundation Model For Multimodal AI Agents (2025) • No Venue
Yang et al.
Mindjourney: Test-time Scaling With World Models For Spatial Reasoning (2025) • No Venue
Yang et al.
Steering Vision-language-action Models As Anti-exploration: A Test-time Scaling Approach (2025) • No Venue
Yang et al.
Visual Spatial Tuning (2025) • No Venue
Yang et al.
Visionthink: Smart And Efficient Vision Language Model Via Reinforcement Learning (2025) • No Venue
Yang et al.
Zerogui: Automating Online GUI Learning At Zero Human Cost (2025) • No Venue
Yang et al.
Vlaser: Vision-language-action Model With Synergistic Embodied Reasoning (2025) • No Venue
Yang et al.
Dynvfx: Augmenting Real Videos With Dynamic Content (2025) • No Venue
Yatim et al.
Through-the-mask: Mask-based Motion Trajectories For Image-to-video Generation (2025) • No Venue
Yariv et al.
Omnivinci: Enhancing Architecture And Data For Omni-modal Understanding LLM (2025) • No Venue
Ye et al.
Ultraflux: Data-model Co-design For High-quality Native 4K Text-to-image Generation Across Diverse Aspect Ratios (2025) • No Venue
Tian Ye, Song Fei, Lei Zhu
Shapellm-omni: A Native Multimodal LLM For 3D Generation And Understanding (2025) • No Venue
Ye et al.
Skywork R1V2: Multimodal Hybrid Reinforcement Learning For Reasoning (2025) • No Venue
Chris et al.
Minimax-01: Scaling Foundation Models With Lightning Attention (2025) • No Venue
Minimax et al.
World Simulation With Video Foundation Models For Physical AI (2025) • No Venue
Nvidia et al.
Phi-4-mini Technical Report: Compact Yet Powerful Multimodal Language Models Via Mixture-of-loras (2025) • No Venue
Abouelenin et al.
LFM2 Technical Report (2025) • No Venue
Amini et al.
Llava-onevision-1.5: Fully Open Framework For Democratized Multimodal Training (2025) • No Venue
An et al.
Surfer 2: The Next Generation Of Cross-platform Computer Use Agents (2025) • No Venue
Andreux et al.
Qwen2.5-vl Technical Report (2025) • No Venue
Bai et al.
Univg-r1: Reasoning Guided Universal Visual Grounding With Reinforcement Learning (2025) • No Venue
Bai et al.
Qwen3-vl Technical Report (2025) • No Venue
Bai et al.
VERIFY: A Benchmark Of Visual Explanation And Reasoning For Investigating Multimodal Reasoning Fidelity (2025) • No Venue
Bi et al.
Perception Encoder: The Best Visual Embeddings Are Not At The Output Of The Network (2025) • No Venue
Bolya et al.
Enhancing Vision-language Model Training With Reinforcement Learning In Synthetic Worlds For Real-world Success (2025) • No Venue
Bredis et al.
Microvqa: A Multimodal Reasoning Benchmark For Microscopy-based Scientific Research (2025) • No Venue
Burgess et al.
Crowdsource, Crawl, Or Generate? Creating SEA-VL, A Multicultural Vision-language Dataset For Southeast Asia (2025) • No Venue
Cahyawijaya et al.
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study (2025) • No Venue
Cai et al.
Scaling Spatial Intelligence With Multimodal Foundation Models (2025) • No Venue
Cai et al.
MORSE-500: A Programmatically Controllable Video Benchmark To Stress-test Multimodal Reasoning (2025) • No Venue
Cai et al.
Hunyuanimage 3.0 Technical Report (2025) • No Venue
Cao et al.
Worldvla: Towards Autoregressive Action World Model (2025) • No Venue
Cen et al.
Rynnvla-002: A Unified Vision-language-action And World Model (2025) • No Venue
Cen et al.
GR-3 Technical Report (2025) • No Venue
Cheang et al.
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning (2025) • No Venue
Chen et al.
Advancing Multimodal Reasoning: From Optimized Cold Start To Staged Reinforcement Learning (2025) • No Venue
Chen et al.
Astra: Toward General-purpose Mobile Robots Via Hierarchical Multimodal Learning (2025) • No Venue
Chen et al.
Comp: Continual Multimodal Pre-training For Vision Foundation Models (2025) • No Venue
Chen et al.
Code2video: A Code-centric Paradigm For Educational Video Generation (2025) • No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
ERA: Transforming Vlms Into Embodied Agents Via Embodied Prior Learning And Online Reinforcement Learning (2025) • No Venue
Chen et al.
An Empirical Study Of Gpt-4o Image Generation Capabilities (2025) • No Venue
Chen et al.
Eagle 2.5: Boosting Long-context Post-training For Frontier Vision-language Models (2025) • No Venue
Chen et al.
Exploring The Effect Of Reinforcement Learning On Video Understanding: Insights From Seed-bench-r1 (2025) • No Venue
Chen et al.
GRPO-CARE: Consistency-aware Reinforcement Learning For Multimodal Reasoning (2025) • No Venue
Chen et al.
Geometrically-constrained Agent For Spatial Reasoning (2025) • No Venue
Chen et al.
Spacetools: Tool-augmented Spatial Reasoning Via Double Interactive RL (2025) • No Venue
Chen et al.
Mvi-bench: A Comprehensive Benchmark For Evaluating Robustness To Misleading Visual Inputs In Lvlms (2025) • No Venue
Chen et al.
Livecc: Learning Video LLM With Streaming Speech Transcription At Scale (2025) • No Venue
Chen et al.
Multimodal Representation Alignment For Image Generation: Text-image Interleaved Control Is Easier Than You Think (2025) • No Venue
Chen et al.
Moca: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings (2025) • No Venue
Chen et al.
Sharegpt-4o-image: Aligning Multimodal Models With Gpt-4o-level Image Generation (2025) • No Venue
Chen et al.
Planning With Reasoning Using Vision Language World Model (2025) • No Venue
Chen et al.
SFT Or RL? An Early Investigation Into Training R1-like Reasoning Large Vision-language Models (2025) • No Venue
Chen et al.
Think With 3D: Geometric Imagination Grounded Spatial Reasoning From Limited Views (2025) • No Venue
Chen et al.
Villa-x: Enhancing Latent Action Modeling In Vision-language-action Models (2025) • No Venue
Chen et al.
Π_rl: Online RL Fine-tuning For Flow-based Vision-language-action Models (2025) • No Venue
Chen et al.
Glyph: Scaling Context Windows Via Visual-text Compression (2025) • No Venue
Cheng et al.
Animegamer: Infinite Anime Life Simulation With Next Game State Prediction (2025) • No Venue
Cheng et al.
Caparena: Benchmarking And Analyzing Detailed Image Captioning In The LLM Era (2025) • No Venue
Cheng et al.
Video-as-answer: Predict And Generate Next Video Event With Joint-grpo (2025) • No Venue
Cheng et al.
Multimodal Evaluation Of Russian-language Architectures (2025) • No Venue
Chervyakov et al.
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities For Mllms (2025) • No Venue
Choi et al.
Instruction-guided Lesion Segmentation For Chest X-rays With Automatically Generated Large-scale Dataset (2025) • No Venue
Choi et al.
WEAVE: Unleashing And Benchmarking The In-context Interleaved Comprehension And Generation (2025) • No Venue
Chow et al.
Metaclip 2: A Worldwide Scaling Recipe (2025) • No Venue
Chuang et al.
Unifying Specialized Visual Encoders For Video Language Models (2025) • No Venue
Chung et al.
Don't Look Only Once: Towards Multimodal Interactive Reasoning With Selective Visual Revisitation (2025) • No Venue
Chung et al.
Paddleocr-vl: Boosting Multilingual Document Parsing Via A 0.9B Ultra-compact Vision-language Model (2025) • No Venue
Cui et al.
Emu3.5: Native Multimodal Models Are World Learners (2025) • No Venue
Cui et al.
Emerging Properties In Unified Multimodal Pretraining (2025) • No Venue
Deng et al.
Openvlthinker: An Early Exploration To Complex Vision-language Reasoning Via Iterative Self-improvement (2025) • No Venue
Deng et al.
From Pixels To Words -- Towards Native Vision-language Primitives At Scale (2025) • No Venue
Diao et al.
Story2board: A Training-free Approach For Expressive Storyboard Generation (2025) • No Venue
Dinkevich et al.
Sherlock: Self-correcting Reasoning In Vision-language Models (2025) • No Venue
Yi Ding, Ruqi Zhang
Arm-thinker: Reinforcing Multimodal Generative Reward Models With Agentic Tool Use And Visual Reasoning (2025) • No Venue
Ding et al.
Mmdocir: Benchmarking Multi-modal Retrieval For Long Documents (2025) • No Venue
Dong et al.
Mmtok: Multimodal Coverage Maximization For Efficient Inference Of Vlms (2025) • No Venue
Dong et al.
Got-r1: Unleashing Reasoning Capability Of MLLM For Visual Generation With Reinforcement Learning (2025) • No Venue
Duan et al.
MM-PRM: Enhancing Multimodal Mathematical Reasoning With Scalable Step-level Supervision (2025) • No Venue
Du et al.
Unimmvsr: A Unified Multi-modal Framework For Cascaded Video Super-resolution (2025) • No Venue
Du et al.
Virgo: A Preliminary Exploration On Reproducing O1-like MLLM (2025) • No Venue
Du et al.
Mind-the-glitch: Visual Correspondence For Detecting Inconsistencies In Subject-driven Generation (2025) • No Venue
Eldesokey et al.
Flux-reason-6m & Prism-bench: A Million-scale Text-to-image Reasoning Dataset And Comprehensive Benchmark (2025) • No Venue
Fang et al.
Creation-mmbench: Assessing Context-aware Creative Intelligence In MLLM (2025) • No Venue
Fang et al.
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action (2025) • No Venue
Fang et al.
Robix: A Unified Model For Robot Interaction, Reasoning And Planning (2025) • No Venue
Fang et al.
Libero-plus: In-depth Robustness Analysis Of Vision-language-action Models (2025) • No Venue
Fei et al.
SRPO: Self-referential Policy Optimization For Vision-language-action Models (2025) • No Venue
Fei et al.
Can Mllms Guide Me Home? A Benchmark Study On Fine-grained Visual Reasoning From Transit Maps (2025) • No Venue
Feng et al.
Onethinker: All-in-one Reasoning Model For Image And Video (2025) • No Venue
Feng et al.
Tokenverse: Versatile Multi-concept Personalization In Token Modulation Space (2025) • No Venue
Garibi et al.
VITA-1.5: Towards Gpt-4o Level Real-time Vision And Speech Interaction (2025) • No Venue
Fu et al.
Listener-rewarded Thinking In Vlms For Image Preferences (2025) • No Venue
Gambashidze et al.
Exploring Hallucination Of Large Multimodal Models In Video Understanding: Benchmark, Analysis And Mitigation (2025) • No Venue
Gao et al.
Do Vision-language Models Have Internal World Models? Towards An Atomic Evaluation (2025) • No Venue
Gao et al.
Seedream 3.0 Technical Report (2025) • No Venue
Gao et al.
Pixels, Patterns, But No Poetry: To See The World Like Humans (2025) • No Venue
Gao et al.
Centurio: On Drivers Of Multilingual Ability Of Large Vision-language Model (2025) • No Venue
Geigle et al.
X-omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (2025) • No Venue
Geng et al.
Webwatcher: Breaking New Frontier Of Vision-language Deep Research Agent (2025) • No Venue
Geng et al.
Breaking The Modality Barrier: Universal Embedding Learning With Multimodal Llms (2025) • No Venue
Gu et al.
Thinkmorph: Emergent Properties In Multimodal Interleaved Chain-of-thought Reasoning (2025) • No Venue
Gu et al.
Ui-venus Technical Report: Building High-performance UI Agents With RFT (2025) • No Venue
Gu et al.
Costaast: Cost-sensitive Toolpath Agent For Multi-turn Image Editing (2025) • No Venue
Gupta et al.
Rag-anything: All-in-one RAG Framework (2025) • No Venue
Guo et al.
Seed1.5-vl Technical Report (2025) • No Venue
Guo et al.
Generating An Image From 1,000 Words: Enhancing Text-to-image With Structured Captions (2025) • No Venue
Gutflaish et al.
Beyond Recognition: Evaluating Visual Perspective Taking In Vision Language Models (2025) • No Venue
Góral et al.
Mdocagent: A Multi-modal Multi-agent Framework For Document Understanding (2025) • No Venue
Han et al.
Learning To See Before Seeing: Demystifying LLM Visual Priors From Language Pre-training (2025) • No Venue
Han et al.
Vision As A Dialect: Unifying Visual Understanding And Generation Via Text-aligned Representations (2025) • No Venue
Han et al.
Visplay: Self-evolving Vision-language Models From Images (2025) • No Venue
He et al.
Omni-rgpt: Unifying Image And Video Region-level Understanding Via Token Marks (2025) • No Venue
Heo et al.
Baseer: A Vision-language Model For Arabic Document-to-markdown OCR (2025) • No Venue
Hennara et al.
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy (2025) • No Venue
Hou et al.
Deepeyesv2: Toward Agentic Multimodal Model (2025) • No Venue
Hong et al.
Glm-4.1v-thinking: Towards Versatile Multimodal Reasoning With Scalable Reinforcement Learning (2025) • No Venue
Hong et al.
Motionbench: Benchmarking And Improving Fine-grained Video Motion Understanding For Vision Language Models (2025) • No Venue
Hong et al.
Image Editing As Programs With Diffusion Models (2025) • No Venue
Hu et al.
Hunyuancustom: A Multimodal-driven Architecture For Customized Video Generation (2025) • No Venue
Hu et al.
See, Point, Fly: A Learning-free VLM Framework For Universal Unmanned Aerial Navigation (2025) • No Venue
Hu et al.
ILLUME+: Illuminating Unified MLLM With Dual Visual Tokenization And Diffusion Refinement (2025) • No Venue
Huang et al.
Ming-univision: Joint Image Understanding And Generation With A Unified Continuous Tokenizer (2025) • No Venue
Huang et al.
Vision-r1: Incentivizing Reasoning Capability In Multimodal Large Language Models (2025) • No Venue
Huang et al.
Thinkact: Vision-language-action Reasoning Via Reinforced Visual Latent Planning (2025) • No Venue
Huang et al.
Spotlight On Token Perception For Multimodal Reinforcement Learning (2025) • No Venue
Huang et al.
Revisiting Multimodal Positional Encoding In Vision-language Models (2025) • No Venue
Huang et al.
Vchain: Chain-of-visual-thought For Reasoning In Video Generation (2025) • No Venue
Huang et al.
Vistadpo: Video Hierarchical Spatial-temporal Direct Preference Optimization For Large Video Models (2025) • No Venue
Huang et al.
Lego-eval: Towards Fine-grained Evaluation On Synthesizing 3D Embodied Environments With Tool Augmentation (2025) • No Venue
Hwangbo et al.
Multi-granular Spatio-temporal Token Merging For Training-free Acceleration Of Video Llms (2025) • No Venue
Hyun et al.
From Denoising To Refining: A Corrective Framework For Vision-language Diffusion Model (2025) • No Venue
Ji et al.
Videorag: Retrieval-augmented Generation Over Video Corpus (2025) • No Venue
Jeong et al.
CSVQA: A Chinese Multimodal Benchmark For Evaluating STEM Reasoning Capabilities Of Vlms (2025) • No Venue
Jian et al.
Omnispatial: Towards Comprehensive Spatial Reasoning Benchmark For Vision Language Models (2025) • No Venue
Jia et al.
Visualwebinstruct: Scaling Up Multimodal Instruction Data Through Web Search (2025) • No Venue
Jia et al.
Mme-cot: Benchmarking Chain-of-thought In Large Multimodal Models For Reasoning Quality, Robustness, And Efficiency (2025) • No Venue
Jiang et al.
Infiniteyou: Flexible Photo Recrafting While Preserving Your Identity (2025) • No Venue
Jiang et al.
Alphadrive: Unleashing The Power Of Vlms In Autonomous Driving Via Reinforcement Learning And Reasoning (2025) • No Venue
Jiang et al.
Screencoder: Advancing Visual-to-code Generation For Front-end Automation Via Modular Multimodal Agents (2025) • No Venue
Jiang et al.
Rynnvla-001: Using Human Demonstrations To Improve Robot Manipulation (2025) • No Venue
Jiang et al.
Token-efficient Long Video Understanding For Multimodal Llms (2025) • No Venue
Jiang et al.
VIDEOP2R: Video Understanding From Perception To Reasoning (2025) • No Venue
Jiang et al.
Don't Blind Your VLA: Aligning Visual Representations For OOD Generalization (2025) • No Venue
Kachaev et al.
VIKI-R: Coordinating Embodied Multi-agent Cooperation Via Reinforcement Learning (2025) • No Venue
Kang et al.
Simple Semi-supervised Knowledge Distillation From Vision-language Models Via Dual-head Optimization (2025) • No Venue
Kang et al.
Robot-r1: Reinforcement Learning For Enhanced Embodied Reasoning In Robotics (2025) • No Venue
Kim et al.
Distillm-2: A Contrastive Approach Boosts The Distillation Of Llms (2025) • No Venue
Ko et al.
Cadrille: Multi-modal CAD Reconstruction With Online Reinforcement Learning (2025) • No Venue
Kolodiazhnyi et al.
Experience Is The Best Teacher: Grounding Vlms For Robotics Through Self-generated Memory (2025) • No Venue
Lan et al.
Perspective-aware Reasoning In Vision-language Models Via Mental Imagery Simulation (2025) • No Venue
Lee et al.
Genrecal: Generation After Recalibration From Large To Small Vision-language Models (2025) • No Venue
Lee et al.
Unified Reinforcement And Imitation Learning For Vision-language Models (2025) • No Venue
Lee et al.
Baichuan-omni-1.5 Technical Report (2025) • No Venue
Li et al.
IGGT: Instance-grounded Geometry Transformer For Semantic 3D Reconstruction (2025) • No Venue
Li et al.
Dualthor: A Dual-arm Humanoid Simulation Platform For Contingency-aware Planning (2025) • No Venue
Li et al.
Have We Unified Image Generation And Understanding Yet? An Empirical Study Of Gpt-4o's Image Generation Ability (2025) • No Venue
Ning Li, Jingran Zhang, Justin Cui
JARVIS-VLA: Post-training Large-scale Vision Language Models To Play Visual Games With Keyboards And Mouse (2025) • No Venue
Li et al.
MANZANO: A Simple And Scalable Unified Multimodal Model With A Hybrid Vision Tokenizer (2025) • No Venue
Li et al.
Omnivideobench: Towards Audio-visual Understanding Evaluation For Omni Mllms (2025) • No Venue
Li et al.
Openvision: A Fully-open, Cost-effective Family Of Advanced Vision Encoders For Multimodal Learning (2025) • No Venue
Li et al.
Ovo-bench: How Far Is Your Video-llms From Real-world Online Video Understanding? (2025) • No Venue
Li et al.
Perception, Reason, Think, And Plan: A Survey On Large Multimodal Reasoning Models (2025) • No Venue
Li et al.
R2-T2: Re-routing In Test-time For Multimodal Mixture-of-experts (2025) • No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning In Long-form Video Understanding (2025) • No Venue
Li et al.
Self-rewarding Vision-language Model Via Reasoning Decomposition (2025) • No Venue
Li et al.
Simplevla-rl: Scaling VLA Training Via Reinforcement Learning (2025) • No Venue
Li et al.
Spatial Forcing: Implicit Spatial Representation Alignment For Vision-language-action Model (2025) • No Venue
Li et al.
Temporal Preference Optimization For Long-form Video Understanding (2025) • No Venue
Li et al.
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data (2025) • No Venue
Li et al.
VLA-RFT: Vision-language-action Reinforcement Fine-tuning With Verified Rewards In World Simulators (2025) • No Venue
Li et al.
Zebra-cot: A Dataset For Interleaved Vision Language Reasoning (2025) • No Venue
Li et al.
Describe Anything: Detailed Localized Image And Video Captioning (2025) • No Venue
Lian et al.
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies (2025) • No Venue
Liang et al.
Colorbench: Can Vlms See And Understand The Colorful World? A Comprehensive Benchmark For Color Perception, Reasoning, And Robustness (2025) • No Venue
Liang et al.
ROVER: Benchmarking Reciprocal Cross-modal Reasoning For Omnimodal Generation (2025) • No Venue
Liang et al.
Modomodo: Multi-domain Data Mixtures For Multimodal LLM Reinforcement Learning (2025) • No Venue
Liang et al.
Thinking With Camera: A Unified Multimodal Model For Camera-centric Understanding And Generation (2025) • No Venue
Liao et al.
Improved Visual-spatial Reasoning Via R1-zero-like Training (2025) • No Venue
Liao et al.
URECA: Unique Region Caption Anything (2025) • No Venue
Lim et al.
Embrace-3k: Embodied Reasoning And Action In Complex Environments (2025) • No Venue
Lin et al.
Ost-bench: Evaluating The Capabilities Of Mllms In Online Spatio-temporal Scene Understanding (2025) • No Venue
Lin et al.
Uniworld: High-resolution Semantic Encoders For Unified Visual Understanding And Generation (2025) • No Venue
Lin et al.
Towards Understanding Camera Motions In Any Video (2025) • No Venue
Lin et al.
Agent0-vl: Exploring Self-evolving Agent For Tool-integrated Vision-language Reasoning (2025) • No Venue
Liu et al.
FUSION: Fully Integration Of Vision-language Representations For Deep Cross-modal Understanding (2025) • No Venue
Liu et al.
Javisdit: Joint Audio-video Diffusion Transformer With Hierarchical Spatio-temporal Prior Synchronization (2025) • No Venue
Liu et al.
Guardreasoner: Towards Reasoning-based LLM Safeguards (2025) • No Venue
Liu et al.
Langscene-x: Reconstruct Generalizable 3D Language-embedded Scenes With Trimap Video Diffusion (2025) • No Venue
Liu et al.
Visual-rft: Visual Reinforcement Fine-tuning (2025) • No Venue
Liu et al.
Shotbench: Expert-level Cinematic Understanding In Vision-language Models (2025) • No Venue
Liu et al.
Ola: Pushing The Frontiers Of Omni-modal Language Model With Progressive Modality Alignment (2025) • No Venue
Liu et al.
Openvision 2: A Family Of Generative Pretrained Visual Encoders For Multimodal Learning (2025) • No Venue
Liu et al.
Points-reader: Distillation-free Adaptation Of Vision-language Models For Document Conversion (2025) • No Venue
Liu et al.
Phantom: Subject-consistent Video Generation Via Cross-modal Alignment (2025) • No Venue
Liu et al.
Scalecua: Scaling Open-source Computer Use Agents With Cross-platform Data (2025) • No Venue
Liu et al.
Videoreasonbench: Can Mllms Perform Vision-centric Complex Video Reasoning? (2025) • No Venue
Liu et al.
Spatial-ssrl: Enhancing Spatial Understanding Via Self-supervised Reinforcement Learning (2025) • No Venue
Liu et al.
Taking Notes Brings Focus? Towards Multi-turn Multimodal Dialogue Learning (2025) • No Venue
Liu et al.
Step1x-edit: A Practical Framework For General Image Editing (2025) • No Venue
Liu et al.
TUNA: Taming Unified Visual Representations For Native Unified Multimodal Models (2025) • No Venue
Liu et al.
VITA-E: Natural Embodied Interaction With Concurrent Seeing, Hearing, Speaking, And Acting (2025) • No Venue
Liu et al.
BIOMEDICA: An Open Biomedical Image-caption Archive, Dataset, And Vision-language Models Derived From Scientific Literature (2025) • No Venue
Lozano et al.
Ovi: Twin Backbone Cross-modal Fusion For Audio-video Generation (2025) • No Venue
Chetwin Low, Weimin Wang, Calder Katyal
Elv-halluc: Benchmarking Semantic Aggregation Hallucinations In Long Video Understanding (2025) • No Venue
Lu et al.
Atoken: A Unified Tokenizer For Vision (2025) • No Venue
Lu et al.
Av-reasoner: Improving And Benchmarking Clue-grounded Audio-visual Counting For Mllms (2025) • No Venue
Lu et al.
Ovis2.5 Technical Report (2025) • No Venue
Lu et al.
Omnicaptioner: One Captioner To Rule Them All (2025) • No Venue
Lu et al.
Being-h0: Vision-language-action Pretraining From Large-scale Human Videos (2025) • No Venue
Luo et al.
Open Captchaworld: A Comprehensive Web-based Platform For Testing And Benchmarking Multimodal LLM Agents (2025) • No Venue
Luo et al.
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, And Control In Spaces (2025) • No Venue
Luo et al.
F1: A Vision-language-action Model Bridging Understanding And Generation To Actions (2025) • No Venue
Lv et al.
One RL To See Them All: Visual Triple Unified Reinforcement Learning (2025) • No Venue
Ma et al.
Deepperception: Advancing R1-like Cognitive Visual Perception In Mllms For Knowledge-intensive Visual Grounding (2025) • No Venue
Ma et al.
Iv-bench: A Benchmark For Image-grounded Video Perception And Reasoning In Multimodal Llms (2025) • No Venue
Ma et al.
Unitok: A Unified Tokenizer For Visual Generation And Understanding (2025) • No Venue
Ma et al.
Smolvlm: Redefining Small And Efficient Multimodal Models (2025) • No Venue
Marafioti et al.
Unirl: Self-improving Unified Multimodal Models Via Supervised And Reinforcement Learning (2025) • No Venue
Weijia Mao, Zhenheng Yang, Mike Zheng Shou
Alignvlm: Bridging Vision And Language Latent Spaces For Multimodal Understanding (2025) • No Venue
Masry et al.
Chartqapro: A More Diverse And Challenging Benchmark For Chart Question Answering (2025) • No Venue
Masry et al.
Mm-eureka: Exploring Visual Aha Moment With Rule-based Large-scale Reinforcement Learning (2025) • No Venue
Meng et al.
I Think, Therefore I Diffuse: Enabling Multimodal In-context Reasoning In Diffusion Models (2025) • No Venue
Mi et al.
ORIGEN: Zero-shot 3D Orientation Grounding In Text-to-image Generation (2025) • No Venue
Min et al.
Smoldocling: An Ultra-compact Vision-language Model For End-to-end Multi-modal Document Conversion (2025) • No Venue
Nassar et al.
Mineru2.5: A Decoupled Vision-language Model For Efficient High-resolution Document Parsing (2025) • No Venue
Niu et al.
Medvlm-r1: Incentivizing Medical Reasoning Capability Of Vision-language Models (vlms) Via Reinforcement Learning (2025) • No Venue
Pan et al.
Omnimanip: Towards General Robotic Manipulation Via Object-centric Interaction Primitives As Spatial Constraints (2025) • No Venue
Pan et al.
Paper2poster: Towards Multimodal Poster Automation From Scientific Papers (2025) • No Venue
Pang et al.
ACG: Action Coherence Guidance For Flow-based VLA Models (2025) • No Venue
Park et al.
Fedrand: Enhancing Privacy In Federated Learning With Randomized Lora Subparameter Updates (2025) • No Venue
Park et al.
Interpretable Physics Reasoning And Performance Taxonomy In Vision-language Models (2025) • No Venue
Pawar et al.
Skywork R1V: Pioneering Multimodal Reasoning With Chain-of-thought (2025) • No Venue
Peng et al.
LMM-R1: Empowering 3B Lmms With Strong Reasoning Abilities Through Two-stage Rule-based RL (2025) • No Venue
Peng et al.
Multifinben: A Multilingual, Multimodal, And Difficulty-aware Benchmark For Financial LLM Evaluation (2025) • No Venue
Peng et al.
FAST: Efficient Action Tokenization For Vision-language-action Models (2025) • No Venue
Pertsch et al.
Judge Anything: MLLM As A Judge Across Any Modality (2025) • No Venue
Pu et al.
Vcr-bench: A Comprehensive Evaluation Framework For Video Chain-of-thought Reasoning (2025) • No Venue
Qi et al.
Sofar: Language-grounded Orientation Bridges Spatial Reasoning And Object Manipulation (2025) • No Venue
Qi et al.
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities (2025) • No Venue
Qi et al.
Pico-banana-400k: A Large-scale Dataset For Text-guided Image Editing (2025) • No Venue
Qian et al.
V-thinker: Interactive Thinking With Images (2025) • No Venue
Qiao et al.
Chain-of-visual-thought: Teaching Vlms To See And Think Better With Continuous Visual Tokens (2025) • No Venue
Qin et al.
Embodiedonevision: Interleaved Vision-text-action Pretraining For General Robot Control (2025) • No Venue
Qu et al.
Thinking Beyond Tokens: From Brain-inspired Intelligence To Cognitive Foundations For Artificial General Intelligence And Its Societal Impact (2025) • No Venue
Qureshi et al.
Apriel-1.5-15b-thinker (2025) • No Venue
Radhakrishna et al.
How Well Does Gpt-4o Understand Vision? Evaluating Multimodal Foundation Models On Standard Computer Vision Tasks (2025) • No Venue
Ramachandran et al.
Videomathqa: Benchmarking Mathematical Reasoning Via Multimodal Understanding In Videos (2025) • No Venue
Rasheed et al.
Anycap Project: A Unified Framework, Dataset, And Benchmark For Controllable Omni-modal Captioning (2025) • No Venue
Ren et al.
Zerobench: An Impossible Visual Benchmark For Contemporary Large Multimodal Models (2025) • No Venue
Roberts et al.
Through The Looking Glass: Common Sense Consistency Evaluation Of Weird Images (2025) • No Venue
Rykov et al.
ABC: Achieving Better Control Of Multimodal Embeddings Using Vlms (2025) • No Venue
Benjamin Schneider, Florian Kerschbaum, Wenhu Chen
Seedream 4.0: Toward Next-generation Multimodal Image Generation (2025) • No Venue
Seedream et al.
Semi-off-policy Reinforcement Learning For Vision-language Slow-thinking Reasoning (2025) • No Venue
Shen et al.
Skywork-r1v3 Technical Report (2025) • No Venue
Shen et al.
VLM-R1: A Stable And Generalizable R1-style Large Vision-language Model (2025) • No Venue
Shen et al.
Realunify: Do Unified Models Truly Benefit From Unification? A Comprehensive Benchmark (2025) • No Venue
Shi et al.
Mavors: Multi-granularity Video Representation For Multimodal Large Language Model (2025) • No Venue
Shi et al.
Mathcanvas: Intrinsic Visual Chain-of-thought For Multimodal Mathematical Reasoning (2025) • No Venue
Shi et al.
Llmvox: Autoregressive Streaming Text-to-speech Model For Any LLM (2025) • No Venue
Shikhar et al.
Earthmind: Towards Multi-granular And Multi-sensor Earth Observation With Large Multimodal Models (2025) • No Venue
Shu et al.
Smolvla: A Vision-language-action Model For Affordable And Efficient Robotics (2025) • No Venue
Shukor et al.
DMM: Building A Versatile Image Generation Model Via Distillation-based Model Merging (2025) • No Venue
Song et al.
RL Makes Mllms See Better Than SFT (2025) • No Venue
Song et al.
Thinking With Images For Multimodal Reasoning: Foundations, Methods, And Future Frontiers (2025) • No Venue
Su et al.
Pixel Reasoner: Incentivizing Pixel-space Reasoning With Curiosity-driven Reinforcement Learning (2025) • No Venue
Su et al.
Openthinkimg: Learning To Think With Images Via Visual Tool Reinforcement Learning (2025) • No Venue
Su et al.
Os-sentinel: Towards Safety-enhanced Mobile GUI Agents Via Hybrid Validation In Realistic Workflows (2025) • No Venue
Sun et al.
Llava-scissor: Token Compression With Semantic Connected Components For Video Llms (2025) • No Venue
Sun et al.
Transformer^2: Self-adaptive Llms (2025) • No Venue
Qi Sun, Edoardo Cetin, Yujin Tang
Seagent: Self-evolving Computer Use Agent With Autonomous Learning From Experience (2025) • No Venue
Sun et al.
T2i-reasonbench: Benchmarking Reasoning-informed Text-to-image Generation (2025) • No Venue
Sun et al.
TULIP: Towards Unified Language-image Pretraining (2025) • No Venue
Tang et al.
Videogameqa-bench: Evaluating Vision-language Models For Video Game Quality Assurance (2025) • No Venue
Taesiri et al.
Lumine: An Open Recipe For Building Generalist Agents In 3D Open Worlds (2025) • No Venue
Tan et al.
Lego-puzzles: How Good Are Mllms At Multi-step Spatial Reasoning? (2025) • No Venue
Tang et al.
Lingshu: A Generalist Foundation Model For Unified Multimodal Medical Understanding And Reasoning (2025) • No Venue
Team et al.
Kwai Keye-vl Technical Report (2025) • No Venue
Team et al.
Gemini Robotics: Bringing AI Into The Physical World (2025) • No Venue
Team et al.
Gigabrain-0: A World Model-powered Vision-language-action Model (2025) • No Venue
Team et al.
Mimo: Unlocking The Reasoning Potential Of Language Model -- From Pretraining To Posttraining (2025) • No Venue
Team et al.
Robobrain 2.0 Technical Report (2025) • No Venue
Team et al.
Modernvbert: Towards Smaller Visual Document Retrievers (2025) • No Venue
Teiletche et al.
Llamav-o1: Rethinking Step-by-step Visual Reasoning In Llms (2025) • No Venue
Thawakar et al.
Mmada-parallel: Multimodal Large Diffusion Language Models For Thinking-aware Editing And Generation (2025) • No Venue
Tian et al.
More Thought, Less Accuracy? On The Dual Nature Of Reasoning In Vision-language Models (2025) • No Venue
Tian et al.
Open Multimodal Retrieval-augmented Factual Image Generation (2025) • No Venue
Tian et al.
MMMR: Benchmarking Massive Multi-modal Reasoning Tasks (2025) • No Venue
Tie et al.
Apollo: An Exploration Of Video Understanding In Large Multimodal Models (2024) • No Venue
Zohar et al.
Nl-eye: Abductive NLI For Images (2024) • No Venue
Ventura et al.
Fastvlm: Efficient Vision Encoding For Vision Language Models (2024) • No Venue
Vasu et al.
One Missing Piece In Vision And Language: A Survey On Comics Understanding (2024) • No Venue
Vivoli et al.
Gpt-4o System Card (2024) • No Venue
OpenAI et al.
Eyes Wide Shut? Exploring The Visual Shortcomings Of Multimodal Llms (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Tong et al.
Pixtral 12B (2024) • No Venue
Agrawal et al.
Yi: Open Foundation Models By 01.AI (2024) • No Venue
Ai et al.
Maya: An Instruction Finetuned Multilingual Multimodal Model (2024) • No Venue
Alam et al.
Unibench: Visual Reasoning Requires Rethinking Vision-language Beyond Scaling (2024) • No Venue
Al-Tahan et al.
Skyeyegpt: Unifying Remote Sensing Vision-language Tasks Via Instruction Tuning With Large Language Model (2024) • ISPRS Journal of Photogrammetry and Remote Sensing • 51 citations
Yang Zhan, Zhitong Xiong, Yuan Yuan
Anygpt: Unified Multimodal LLM With Discrete Sequence Modeling (2024) • No Venue
Zhan et al.
Minigpt4-video: Advancing Multimodal Llms For Video Understanding With Interleaved Visual-textual Tokens (2024) • No Venue
Ataallah et al.
Scenescript: Reconstructing Scenes With An Autoregressive Structured Language Model (2024) • No Venue
Avetisyan et al.
BLIP3-KALE: Knowledge Augmented Large-scale Dense Captions (2024) • No Venue
Awadalla et al.
Screenai: A Vision-language Model For UI And Infographics Understanding (2024) • No Venue
Baechler et al.
Digirl: Training In-the-wild Device-control Agents With Autonomous Reinforcement Learning (2024) • No Venue
Bai et al.
From Generalist To Specialist: Adapting Vision Language Models Via Task-specific Visual Instruction Tuning (2024) • No Venue
Bai et al.
Paligemma: A Versatile 3B VLM For Transfer (2024) • No Venue
Beyer et al.
MUMU: Bootstrapping Multimodal Image Generation From Text-to-image Data (2024) • No Venue
William Berman, Alexander Peysakhovich
Visual Riddles: A Commonsense And World Knowledge Challenge For Large Vision And Language Models (2024) • No Venue
Bitton-Guetta et al.
$\pi_0$: A Vision-language-action Flow Model For General Robot Control (2024) • Robotics: Science and Systems 2025 • 48 citations
Black et al.
Merlin: A Vision Language Foundation Model For 3D Computed Tomography (2024) • arXiv • 45 citations
Blankemeier et al.
3dgraphllm: Combining Semantic Graphs And Large Language Models For 3D Scene Understanding (2024) • No Venue
Tatiana Zemskova, Dmitry Yudin
An Introduction To Vision-language Modeling (2024) • No Venue
Bordes et al.
ROCKET-1: Master Open-world Interaction With Visual-temporal Context Prompting (2024) • No Venue
Cai et al.
Lyra: An Efficient And Speech-centric Framework For Omni-cognition (2024) • No Venue
Zhong et al.
Analyzing The Language Of Visual Tokens (2024) • No Venue
Chan et al.
Getting It Right: Improving Spatial Consistency In Text-to-image Models (2024) • No Venue
Chatterjee et al.
EMOVA: Empowering Language Models To See, Hear And Speak With Vivid Emotions (2024) • No Venue
Chen et al.
Chexagent: Towards A Foundation Model For Chest X-ray Interpretation (2024) • No Venue
Chen et al.
Contrastive Localized Language-image Pre-training (2024) • No Venue
Chen et al.
Compcap: Improving Multimodal Large Language Models With Composite Captions (2024) • No Venue
Chen et al.
An Image Is Worth 1/2 Tokens After Layer 2: Plug-and-play Inference Acceleration For Large Vision-language Models (2024) • No Venue
Chen et al.
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (2024) • No Venue
Chen et al.
EVLM: An Efficient Vision-language Model For Visual Understanding (2024) • No Venue
Chen et al.
Expanding Performance Boundaries Of Open-source Multimodal Models With Model, Data, And Test-time Scaling (2024) • No Venue
Chen et al.
Florence-vl: Enhancing Vision-language Models With Generative Vision Encoder And Depth-breadth Fusion (2024) • No Venue
Chen et al.
How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites (2024) • No Venue
Chen et al.
Interleaved Scene Graph For Interleaved Text-and-image Generation Assessment (2024) • No Venue
Chen et al.
Mega-bench: Scaling Multimodal Evaluation To Over 500 Real-world Tasks (2024) • No Venue
Chen et al.
Moto: Latent Motion Token As The Bridging Language For Robot Manipulation (2024) • No Venue
Chen et al.
Mj-bench: Is Your Multimodal Reward Model Really A Good Judge For Text-to-image Generation? (2024) • No Venue
Chen et al.
Motionllm: Understanding Human Behaviors From Human Motions And Videos (2024) • No Venue
Chen et al.
Panda-70m: Captioning 70M Videos With Multiple Cross-modality Teachers (2024) • No Venue
Chen et al.
Videollm-online: Online Video Large Language Model For Streaming Video (2024) • No Venue
Chen et al.
Sharegpt4video: Improving Video Understanding And Generation With Better Captions (2024) • No Venue
Chen et al.
Spatialvlm: Endowing Vision-language Models With Spatial Reasoning Capabilities (2024) • No Venue
Chen et al.
AIM: Adaptive Inference Of Multi-modal Llms Via Token Merging And Pruning (2024) • No Venue
Zhong et al.
Mmmu-pro: A More Robust Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Yue et al.
ANOLE: An Open, Autoregressive, Native Large Multimodal Models For Interleaved Image-text Generation (2024) • No Venue
Chern et al.
Mora: Enabling Generalist Video Generation Via A Multi-agent Framework (2024) • No Venue
Yuan et al.
Videorefer Suite: Advancing Spatial-temporal Object Understanding With Video LLM (2024) • No Venue
Yuan et al.
Open-vocabulary SAM: Segment And Recognize Twenty-thousand Classes Interactively (2024) • No Venue
Yuan et al.
Magictime: Time-lapse Video Generation Models As Metamorphic Simulators (2024) • No Venue
Yuan et al.
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded (2024) • No Venue
Zheng et al.
Videgothink: Assessing Egocentric Video Understanding Capabilities For Embodied AI (2024) • No Venue
Cheng et al.
LEGENT: Open Platform For Embodied Agents (2024) • No Venue
Cheng et al.
Videollama 2: Advancing Spatial-temporal Modeling And Audio Understanding In Video-llms (2024) • No Venue
Cheng et al.
Yolo-world: Real-time Open-vocabulary Object Detection (2024) • No Venue
Cheng et al.
Visrag: Vision-based Retrieval-augmented Generation On Multi-modality Documents (2024) • No Venue
Yu et al.
Boosting Continual Learning Of Vision-language Models Via Mixture-of-experts Adapters (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Yu et al.
M3docrag: Multi-modal Retrieval Is What You Need For Multi-page Multi-document Understanding (2024) • No Venue
Cho et al.
Visionllama: A Unified Llama Interface For Vision Tasks (2024) • No Venue
Chu et al.
Are Vision-language Models Truly Understanding Multi-vision Sensor? (2024) • No Venue
Chung et al.
RACER: Rich Language-guided Failure Recovery Policies For Imitation Learning (2024) • No Venue
Dai et al.
NVLM: Open Frontier-class Multimodal Llms (2024) • No Venue
Dai et al.
Coconut: Modernizing COCO Segmentation (2024) • No Venue
Deng et al.
Unveiling Encoder-free Vision-language Models (2024) • No Venue
Diao et al.
Ferret-ui: Grounded Mobile UI Understanding With Multimodal Llms (2024) • No Venue
You et al.
Megapairs: Massive Data Synthesis For Universal Multimodal Retrieval (2024) • No Venue
Zhou et al.
Diffusion Feedback Helps CLIP See Better (2024) • No Venue
Wang et al.
Emu3: Next-token Prediction Is All You Need (2024) • No Venue
Wang et al.
Dreamrunner: Fine-grained Storytelling Video Generation With Retrieval-augmented Motion Adaptation (2024) • No Venue
Wang et al.
Enhancing The Reasoning Ability Of Multimodal Large Language Models Via Mixed Preference Optimization (2024) • No Venue
Wang et al.
Multimodal Needle In A Haystack: Benchmarking Long-context Capability Of Multimodal Large Language Models (2024) • No Venue
Wang et al.
Chameleon: Mixed-modal Early-fusion Foundation Models (2024) • No Venue
Chameleon Team
Mllm-as-a-judge For Image Safety Without Human Labeling (2024) • No Venue
Wang et al.
MIO: A Foundation Model On Multimodal Tokens (2024) • No Venue
Wang et al.
Longllava: Scaling Multi-modal Llms To 1000 Images Efficiently Via Hybrid Architecture (2024) • No Venue
Wang et al.
Training-free Consistent Text-to-image Generation (2024) • No Venue
Tewel et al.
Lift: Leveraging Human Feedback For Text-to-video Model Alignment (2024) • No Venue
Wang et al.
Internvideo2: Scaling Video Foundation Models For Multimodal Video Understanding (2024) • No Venue
Wang et al.
Mdpo: Conditional Preference Optimization For Multimodal Large Language Models (2024) • No Venue
Wang et al.
Multimodal Latent Language Modeling With Next-token Diffusion (2024) • No Venue
Sun et al.
EVA-CLIP-18B: Scaling CLIP To 18 Billion Parameters (2024) • No Venue
Sun et al.
X-prompt: Towards Universal In-context Image Generation In Auto-regressive Vision Language Foundation Models (2024) • No Venue
Sun et al.
T2v-compbench: A Comprehensive Benchmark For Compositional Text-to-video Generation (2024) • No Venue
Sun et al.
Os-genesis: Automating GUI Agent Trajectory Construction Via Reverse Task Synthesis (2024) • No Venue
Sun et al.
Parrot: Multilingual Visual Instruction Tuning (2024) • No Venue
Sun et al.
Video-star: Self-training Enables Video Instruction Tuning With Any Supervision (2024) • No Venue
Zohar et al.
Llava-3d: A Simple Yet Effective Pathway To Empowering Lmms With 3d-awareness (2024) • No Venue
Zhu et al.
Videohallucer: Evaluating Intrinsic And Extrinsic Hallucinations In Large Video-language Models (2024) • No Venue
Wang et al.
Videollamb: Long-context Video Understanding With Recurrent Memory Bridges (2024) • No Venue
Wang et al.
Videoagent: Long-form Video Understanding With Large Language Model As Agent (2024) • No Venue
Wang et al.
Unpacking SDXL Turbo: Interpreting Text-to-image Models With Sparse Autoencoders (2024) • No Venue
Surkov et al.
Needle In A Multimodal Haystack (2024) • No Venue
Wang et al.
Qwen2-vl: Enhancing Vision-language Model's Perception Of The World At Any Resolution (2024) • No Venue
Wang et al.
Videogamebunny: Towards Vision Assistants For Video Games (2024) • No Venue
Mohammad Reza Taesiri, Cor-Paul Bezemer
Textsquare: Scaling Up Text-centric Visual Instruction Tuning (2024) • No Venue
Tang et al.
Strokenuwa: Tokenizing Strokes For Vector Graphic Synthesis (2024) • No Venue
Tang et al.
Ref-avs: Refer And Segment Objects In Audio-visual Scenes (2024) • No Venue
Wang et al.
A Multimodal Automated Interpretability Agent (2024) • No Venue
Shaham et al.
Synth^2: Boosting Visual-language Models With Synthetic Captions And Image Embeddings (2024) • No Venue
Sharifzadeh et al.
TOMATO: Assessing Visual Temporal Reasoning Capabilities In Multimodal Foundation Models (2024) • No Venue
Shangguan et al.
Explanatory Instructions: Towards Unified Vision Tasks Understanding And Zero-shot Generalization (2024) • No Venue
Shen et al.
Longvu: Spatiotemporal Adaptive Compression For Long Video-language Understanding (2024) • No Venue
Shen et al.
Chartmimic: Evaluating Lmm's Cross-modal Reasoning Capability Via Chart-to-code Generation (2024) • No Venue
Shi et al.
Eagle: Exploring The Design Space For Multimodal Llms With Mixture Of Encoders (2024) • No Venue
Shi et al.
MARVEL-40M+: Multi-level Visual Elaboration For High-fidelity Text-to-3d Content Creation (2024) • No Venue
Sinha et al.
Both Text And Images Leaked! A Systematic Analysis Of Multimodal LLM Data Contamination (2024) • No Venue
Song et al.
Moviellm: Enhancing Long Video Understanding With Ai-generated Movies (2024) • No Venue
Song et al.
Lvlm-intrepret: An Interpretability Tool For Large Vision-language Models (2024) • No Venue
Stan et al.
Paligemma 2: A Family Of Versatile Vlms For Transfer (2024) • No Venue
Steiner et al.
Diving Into Self-evolving Training For Multimodal Reasoning (2024) • No Venue
Liu et al.
Coarse Correspondence Elicit 3D Spacetime Understanding In Multimodal Language Model (2024) • No Venue
Liu et al.
Lumina-mgpt: Illuminate Flexible Photorealistic Text-to-image Generation With Multimodal Generative Pretraining (2024) • No Venue
Liu et al.
Glyph-byt5-v2: A Strong Aesthetic Baseline For Accurate Multilingual Visual Text Rendering (2024) • No Venue
Liu et al.
Flowing From Words To Pixels: A Framework For Cross-modality Evolution (2024) • No Venue
Liu et al.
Generative Photomontage (2024) • No Venue
Liu et al.
Harnessing Webpage Uis For Text-rich Visual Understanding (2024) • No Venue
Liu et al.
MMDU: A Multi-turn Multi-image Dialog Understanding Benchmark And Instruction-tuning Dataset For Lvlms (2024) • No Venue
Liu et al.
Magicquill: An Intelligent Interactive Image Editing System (2024) • No Venue
Liu et al.
MIA-DPO: Multi-image Augmented Direct Preference Optimization For Large Vision-language Models (2024) • No Venue
Liu et al.
POINTS1.5: Building A Vision-language Model Towards Real World Applications (2024) • No Venue
Liu et al.
Oryx MLLM: On-demand Spatial-temporal Understanding At Arbitrary Resolution (2024) • No Venue
Liu et al.
POINTS: Improving Your Vision-language Model With Affordable Strategies (2024) • No Venue
Liu et al.
Cambrian-1: A Fully Open, Vision-centric Exploration Of Multimodal Llms (2024) • No Venue
Tong et al.
Rscama: Remote Sensing Image Change Captioning With State Space Model (2024) • IEEE Geoscience and Remote Sensing Letters • 59 citations
Liu et al.
World Model On Million-length Video And Language With Ringattention (2024) • No Venue
Liu et al.
Multimodal Healthcare AI: Identifying And Designing Clinically Relevant Vision-language Applications For Radiology (2024) • Proceedings of the CHI Conference on Human Factors in Computing Systems • 50 citations
Yildirim et al.
RULE: Reliable Multimodal RAG For Factuality In Medical Vision Language Models (2024) • No Venue
Xia et al.
Omg-llava: Bridging Image-level, Object-level, Pixel-level Reasoning And Understanding (2024) • No Venue
Zhang et al.
MM1.5: Methods, Analysis & Insights From Multimodal LLM Fine-tuning (2024) • No Venue
Zhang et al.
MAVIS: Mathematical Visual Instruction Tuning (2024) • No Venue
Zhang et al.
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model (2024) • No Venue
Zhang et al.
OCR Hinders RAG: Evaluating The Cascading Impact Of OCR On Retrieval-augmented Generation (2024) • No Venue
Zhang et al.
Long Context Transfer From Language To Vision (2024) • No Venue
Zhang et al.
Deepseek-vl: Towards Real-world Vision-language Understanding (2024) • No Venue
Lu et al.
Mathverse: Does Your Multi-modal LLM Truly See The Diagrams In Visual Math Problems? (2024) • No Venue
Zhang et al.
Omniparser For Pure Vision Based GUI Agent (2024) • No Venue
Lu et al.
Llava-mr: Large Language-and-vision Assistant For Video Moment Retrieval (2024) • arXiv • 92 citations
Lu et al.
Mmevol: Empowering Multimodal Large Language Models With Evol-instruct (2024) • No Venue
Luo et al.
Cobra: Extending Mamba To Multi-modal Large Language Model For Efficient Inference (2024) • No Venue
Zhao et al.
Videoautoarena: An Automated Arena For Evaluating Large Multimodal Models In Video Analysis Through User Simulation (2024) • No Venue
Luo et al.
Diffsensei: Bridging Multi-modal Llms And Diffusion Models For Customized Manga Generation (2024) • No Venue
Wu et al.
Janusflow: Harmonizing Autoregression And Rectified Flow For Unified Multimodal Understanding And Generation (2024) • No Venue
Ma et al.
Groma: Localized Visual Tokenization For Grounding Multimodal Large Language Models (2024) • No Venue
Ma et al.
Janus: Decoupling Visual Encoding For Unified Multimodal Understanding And Generation (2024) • No Venue
Wu et al.
Longvideobench: A Benchmark For Long-context Interleaved Video-language Understanding (2024) • No Venue
Wu et al.
Core: Context-regularized Text Embedding Learning For Text-to-image Personalization (2024) • No Venue
Wu et al.
PALO: A Polyglot Large Multimodal Model For 5B People (2024) • No Venue
Maaz et al.
Chartgemma: Visual Instruction-tuning For Chart Reasoning In The Wild (2024) • No Venue
Masry et al.
MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training (2024) • No Venue
McKinzie et al.
Whiteboard-of-thought: Thinking Step-by-step Across Modalities (2024) • No Venue
Sachit Menon, Richard Zemel, Carl Vondrick
MMIU: Multimodal Multi-image Understanding For Evaluating Large Vision-language Models (2024) • No Venue
Meng et al.
Towards World Simulator: Crafting Physical Commonsense-based Benchmark For Video Generation (2024) • No Venue
Meng et al.
Videoglamm: A Large Multimodal Model For Pixel-level Visual Grounding In Videos (2024) • No Venue
Munasinghe et al.
Bimedix2: Bio-medical Expert LMM For Diverse Medical Modalities (2024) • No Venue
Mullappilly et al.
Yesbut: A High-quality Annotated Multimodal Dataset For Evaluating Satire Comprehension Capability Of Vision-language Models (2024) • No Venue
Nandy et al.
Openvid-1m: A Large-scale High-quality Dataset For Text-to-video Generation (2024) • No Venue
Nan et al.
Reka Core, Flash, And Edge: A Series Of Powerful Multimodal Language Models (2024) • No Venue
Ormazabal et al.
Worldcuisines: A Massive-scale Benchmark For Multilingual And Multicultural Visual Question Answering On Global Cuisines (2024) • No Venue
Winata et al.
Diffusion Augmented Agents: A Framework For Efficient Exploration And Transfer Learning (2024) • No Venue
Palo et al.
Llamo: Large Language Model-based Molecular Graph Assistant (2024) • No Venue
Park et al.
Internlm-xcomposer2.5-omnilive: A Comprehensive Multimodal System For Long-term Streaming Video And Audio Interactions (2024) • No Venue
Zhang et al.
Personalized Visual Instruction Tuning (2024) • No Venue
Pi et al.
SNIFFER: Multimodal Large Language Model For Explainable Out-of-context Misinformation Detection (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Qi et al.
Prism: A Framework For Decoupling And Assessing The Capabilities Of Vlms (2024) • No Venue
Qiao et al.
We-math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (2024) • No Venue
Qiao et al.
Code-as-monitor: Constraint-aware Visual Programming For Reactive And Proactive Robotic Failure Detection (2024) • No Venue
Zhou et al.
Tokenflow: Unified Image Tokenizer For Multimodal Understanding And Generation (2024) • No Venue
Qu et al.
Xgen-videosyn-1: High-fidelity Text-to-video Synthesis With Compressed Representations (2024) • No Venue
Qin et al.
Tinyllava: A Framework Of Small-scale Large Multimodal Models (2024) • No Venue
Zhou et al.
Vision Language Models Are Blind (2024) • No Venue
Rahmanzadehgervi et al.
Small Language Model Meets With Reinforced Vision Vocabulary (2024) • No Venue
Wei et al.
Paint By Inpaint: Learning To Add Image Objects By Removing Them First (2024) • No Venue
Wasserman et al.
Progressive Multimodal Reasoning Via Active Retrieval (2024) • No Venue
Dong et al.
Insight-v: Exploring Long-chain Visual Reasoning With Multimodal Large Language Models (2024) • No Venue
Dong et al.
Internlm-xcomposer2-4khd: A Pioneering Large Vision-language Model Handling Resolutions From 336 Pixels To 4K HD (2024) • No Venue
Dong et al.
The Llama 3 Herd Of Models (2024) • No Venue
Dubey et al.
Geometry Image Diffusion: Fast And Data-efficient Text-to-3d With Image-based Surface Representation (2024) • No Venue
Slava Elizarov, Ciara Rowles, Simon Donné
Mmfactory: A Universal Solution Search Engine For Vision-language Tasks (2024) • No Venue
Wan-Cyuan Fan, Tanzila Rahman, Leonid Sigal
VILA^2: VILA Augmented VILA (2024) • No Venue
Fang et al.
PUMA: Empowering Unified MLLM With Multi-granular Visual Generation (2024) • No Venue
Fang et al.
Mmbench-video: A Long-form Multi-shot Benchmark For Holistic Video Understanding (2024) • No Venue
Fang et al.
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark (2024) • No Venue
Zhang et al.
Colpali: Efficient Document Retrieval With Vision Language Models (2024) • No Venue
Faysse et al.
Voco-llama: Towards Vision Compression With Large Language Models (2024) • No Venue
Ye et al.
Enhancing Video-language Representations With Structural Spatio-temporal Alignment (2024) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 49 citations
Fei et al.
Mplug-owl3: Towards Long Image-sequence Understanding In Multi-modal Large Language Models (2024) • No Venue
Ye et al.
In-context Learning Enables Multimodal Large Language Models To Classify Cancer Pathology Images (2024) • Nature Communications • 68 citations
Ferber et al.
Multimodal Autoregressive Pre-training Of Large Vision Encoders (2024) • No Venue
Fini et al.
VITA: Towards Open-source Interactive Omni Multimodal LLM (2024) • No Venue
Fu et al.
BLINK: Multimodal Large Language Models Can See But Not Perceive (2024) • No Venue
Fu et al.
Video-mme: The First-ever Comprehensive Evaluation Benchmark Of Multi-modal Llms In Video Analysis (2024) • No Venue
Fu et al.
Mm-ego: Towards Building Egocentric Multimodal Llms (2024) • No Venue
Ye et al.
Kvasir-vqa: A Text-image Pair GI Tract Dataset (2024) • No Venue
Gautam et al.
Visual Fact Checker: Enabling High-fidelity Detailed Caption Generation (2024) • No Venue
Ge et al.
Convllava: Hierarchical Backbones As Visual Encoder For Large Multimodal Models (2024) • No Venue
Ge et al.
Omnifusion Technical Report (2024) • No Venue
Goncharova et al.
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information? (2024) • No Venue
Gong et al.
Navigating The Digital World As Humans Do: Universal Visual Grounding For GUI Agents (2024) • No Venue
Gou et al.
Dense Connector For Mllms (2024) • No Venue
Yao et al.
Pulid: Pure And Lightning ID Customization Via Contrastive Alignment (2024) • No Venue
Guo et al.
Mammoth-vl: Eliciting Multimodal Reasoning With Instruction Tuning At Scale (2024) • No Venue
Guo et al.
JPEG-LM: Llms As Image Generators With Canonical Codec Representations (2024) • No Venue
Han et al.
Seed-story: Multimodal Long Story Generation With Large Language Model (2024) • No Venue
Yang et al.
Vision-language Models For Medical Report Generation And Visual Question Answering: A Review (2024) • Frontiers in Artificial Intelligence • 86 citations
Iryna Hartsock, Ghulam Rasool
Distill Visual Chart Reasoning Ability From Llms To Mllms (2024) • No Venue
He et al.
Mmworld: Towards Multi-discipline Multi-faceted World Model Evaluation In Videos (2024) • No Venue
He et al.
MA-LMM: Memory-augmented Large Multimodal Model For Long-term Video Understanding (2024) • No Venue
He et al.
Visionzip: Longer Is Better But Not Necessary In Vision Language Models (2024) • No Venue
Yang et al.
Law Of Vision Representation In Mllms (2024) • No Venue
Yang et al.
Vript: A Video Is Worth Thousands Of Words (2024) • No Venue
Yang et al.
Thinking In Space: How Multimodal Large Language Models See, Remember, And Recall Spaces (2024) • No Venue
Yang et al.
Llava-gemma: Accelerating Multimodal Foundation Models With A Compact Language Model (2024) • No Venue
Hinck et al.
Cogvlm2: Visual Language Models For Image And Video Understanding (2024) • No Venue
Hong et al.
Sampart3d: Segment Any Part In 3D Objects (2024) • No Venue
Yang et al.
Mplug-docowl 1.5: Unified Structure Learning For Ocr-free Document Understanding (2024) • No Venue
Hu et al.
Acdit: Interpolating Autoregressive Conditional Modeling And Diffusion Transformer (2024) • No Venue
Hu et al.
Visual Sketchpad: Sketching As A Visual Chain Of Thought For Multimodal Language Models (2024) • No Venue
Hu et al.
Adapting Visual-language Models For Generalizable Anomaly Detection In Medical Images (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Huang et al.
Deciphering Cross-modal Alignment In Large Vision-language Models With Modality Integration Rate (2024) • No Venue
Huang et al.
Genmac: Compositional Text-to-video Generation With Multi-agent Collaboration (2024) • No Venue
Huang et al.
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2024) • No Venue
Huang et al.
Mmevalpro: Calibrating Multimodal Benchmarks Towards Trustworthy And Efficient Evaluation (2024) • No Venue
Huang et al.
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding (2024) • No Venue
Ji Ha Jang, Hoigi Seo, Se Young Chun
Sceneverse: Scaling 3D Vision-language Learning For Grounded Scene Understanding (2024) • No Venue
Jia et al.
LEOPARD: A Vision Language Model For Text-rich Multi-image Tasks (2024) • No Venue
Jia et al.
Symdpo: Boosting In-context Learning Of Large Multimodal Models With Symbol Demonstration Direct Preference Optimization (2024) • No Venue
Jia et al.
E5-V: Universal Embeddings With Multimodal Large Language Models (2024) • No Venue
Jiang et al.
SOLAMI: Social Vision-language-action Modeling For Immersive Interaction With 3D Autonomous Characters (2024) • No Venue
Jiang et al.
Hidden Flaws Behind Expert-level Accuracy Of Multimodal GPT-4 Vision In Medicine (2024) • npj Digital Medicine • 75 citations
Jin et al.
VARCO-VISION: Expanding Frontiers In Korean Vision-language Models (2024) • No Venue
Ju et al.
Pegasus-v1 Technical Report (2024) • No Venue
Jung et al.
Video Depth Without Video Models (2024) • No Venue
Ke et al.
Openvla: An Open-source Vision-language-action Model (2024) • No Venue
Kim et al.
Videoicl: Confidence-based Iterative In-context Learning For Out-of-distribution Video Understanding (2024) • No Venue
Kim et al.
Longvila: Scaling Long-context Visual Language Models For Long Videos (2024) • No Venue
Xue et al.
Revisit Large-scale Image-caption Data In Pre-training Multimodal Foundation Models (2024) • No Venue
Lai et al.
Pllava: Parameter-free Llava Extension From Images To Videos For Video Dense Captioning (2024) • No Venue
Xu et al.
Building And Better Understanding Vision-language Models: Insights And Future Directions (2024) • No Venue
Laurençon et al.
Unlocking The Conversion Of Web Screenshots Into HTML Code With The Websight Dataset (2024) • No Venue
Hugo Laurençon, Léo Tronchon, Victor Sanh
What Matters When Building Vision-language Models? (2024) • No Venue
Laurençon et al.
Meteor: Mamba-based Traversal Of Rationale For Large Language And Vision Models (2024) • No Venue
Lee et al.
Collavo: Crayon Large Language And Vision Model (2024) • No Venue
Lee et al.
Phantom Of Latent For Large Language And Vision Models (2024) • No Venue
Lee et al.
Moai: Mixture Of All Intelligence For Large Language And Vision Models (2024) • No Venue
Lee et al.
Parrot: Pareto-optimal Multi-reward Reinforcement Learning Framework For Text-to-image Generation (2024) • No Venue
Lee et al.
Stark: Social Long-term Multi-modal Conversation With Persona Commonsense Knowledge (2024) • No Venue
Lee et al.
Trol: Traversal Of Layers For Large Language And Vision Models (2024) • No Venue
Lee et al.
Xmodel-vlm: A Simple Baseline For Multimodal Vision Language Model (2024) • No Venue
Xu et al.
The Curse Of Multi-modalities: Evaluating Hallucinations Of Large Multimodal Models Across Language, Visual, And Audio (2024) • No Venue
Leng et al.
Slowfast-llava: A Strong Training-free Baseline For Video Large Language Models (2024) • No Venue
Xu et al.
LAION-SG: An Enhanced Large-scale Dataset For Training Complex Image-text Models With Structural Annotations (2024) • No Venue
Li et al.
Exploring The Potential Of Large Language Models In Self-adaptive Systems (2024) • 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI) • 47 citations
Li et al.
Hunyuan-dit: A Powerful Multi-resolution Diffusion Transformer With Fine-grained Chinese Understanding (2024) • No Venue
Li et al.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-language Model And A Comprehensive Multimodal Dataset Towards General Medical AI (2024) • No Venue
Li et al.
Omnicorpus: A Unified Multimodal Corpus Of 10 Billion-level Images Interleaved With Text (2024) • No Venue
Li et al.
Naturalbench: Evaluating Vision-language Models On Natural Adversarial Samples (2024) • No Venue
Li et al.
Mini-gemini: Mining The Potential Of Multi-modality Vision Language Models (2024) • No Venue
Li et al.
Omnibench: Towards The Future Of Universal Omni-language Models (2024) • No Venue
Li et al.
Promptkd: Unsupervised Prompt Distillation For Vision-language Models (2024) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Li et al.
Tokenpacker: Efficient Visual Projector For Multimodal LLM (2024) • No Venue
Li et al.
Seeing And Understanding: Bridging Vision With Chemical Knowledge Via Chemvlm (2024) • No Venue
Li et al.
Synergen-vl: Towards Synergistic Image Understanding And Generation With Vision Experts And Token Folding (2024) • No Venue
Li et al.
Wolf: Captioning Everything With A World Summarization Framework (2024) • No Venue
Li et al.
MMIE: Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-language Models (2024) • No Venue
Xia et al.
Document Parsing Unveiled: Techniques, Challenges, And Prospects For Structured Information Extraction (2024) • No Venue
Zhang et al.
GRAPE: Generalizing Robot Policy Via Preference Alignment (2024) • No Venue
Zhang et al.
Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models (2024) • No Venue
Zhang et al.
Pyramiddrop: Accelerating Your Large Vision-language Models Via Pyramid Visual Redundancy Reduction (2024) • No Venue
Xing et al.
Humaneval-v: Benchmarking High-level Visual Reasoning With Complex Diagrams In Coding Tasks (2024) • No Venue
Zhang et al.
Improve Vision Language Model Chain-of-thought Reasoning (2024) • No Venue
Zhang et al.
Critic-v: VLM Critics Help Catch VLM Errors In Multimodal Reasoning (2024) • No Venue
Zhang et al.
How Far Are We From Intelligent Visual Deductive Reasoning? (2024) • No Venue
Zhang et al.
Mmed-rag: Versatile Multimodal RAG System For Medical Vision Language Models (2024) • No Venue
Xia et al.
Moe-llava: Mixture Of Experts For Large Vision-language Models (2024) • No Venue
Lin et al.
Showui: One Vision-language-action Model For GUI Visual Agent (2024) • No Venue
Lin et al.
Pixwizard: Versatile Image-to-image Visual Assistant With Open-language Instructions (2024) • No Venue
Lin et al.
Show-o: One Single Transformer To Unify Multimodal Understanding And Generation (2024) • No Venue
Xie et al.
Open-finllms: Open Multimodal Large Language Models For Financial Applications (2024) • No Venue
Xie et al.
Towards Universal Fake Image Detectors That Generalize Across Generative Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 175 citations
Utkarsh Ojha, Yuheng Li, Yong Jae Lee
ELITE: Encoding Visual Concepts Into Textual Embeddings For Customized Text-to-image Generation (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 160 citations
Wei et al.
Key-locked Rank One Editing For Text-to-image Personalization (2023) • Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings • 87 citations
Tewel et al.
ECLIPSE: A Resource-efficient Text-to-image Prior For Image Generations (2023) • No Venue
Patel et al.
Kosmos-2: Grounding Multimodal Large Language Models To The World (2023) • No Venue
Peng et al.
BEST: BERT Pre-training For Sign Language Recognition With Coupling Tokenization (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 41 citations
Zhao et al.
Visual Instruction Tuning (2023) • Arxiv • 659 citations
Liu et al.
Hallusionbench: You See What You Think? Or You Think What You See? An Image-context Reasoning Benchmark Challenging For Gpt-4v(ision), Llava-1.5, And Other Multi-modality Models (2023) • No Venue
Liu et al.
Git-mol: A Multi-modal Large Language Model For Molecular Science With Graph, Image, And Text (2023) • Computers in Biology and Medicine • 48 citations
Liu et al.
Grounding Complex Natural Language Commands For Temporal Tasks In Unseen Environments (2023) • Arxiv • 234 citations
Liu et al.
Improved Baselines With Visual Instruction Tuning (2023) • No Venue
Liu et al.
Llava-plus: Learning To Use Tools For Creating Multimodal Agents (2023) • No Venue
Liu et al.
Revisiting Temporal Modeling For Clip-based Image-to-video Knowledge Transferring (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 44 citations
Liu et al.
Multi-task Recommendations With Reinforcement Learning (2023) • IEEE Transactions on Image Processing • 53 citations
Liu et al.
Aligning Large Multimodal Models With Factually Augmented RLHF (2023) • No Venue
Sun et al.
EVA-CLIP: Improved Training Techniques For CLIP At Scale (2023) • Arxiv • 77 citations
Sun et al.
CLIP-VG: Self-paced Curriculum Adapting Of CLIP For Visual Grounding (2023) • IEEE Transactions on Multimedia • 40 citations
Xiao et al.
Florence-2: Advancing A Unified Representation For A Variety Of Vision Tasks (2023) • No Venue
Xiao et al.
FLIP: Cross-domain Face Anti-spoofing With Language Guidance (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Koushik Srivatsan, Muzammal Naseer, Karthik Nandakumar
Unleashing Text-to-image Diffusion Models For Visual Perception (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 124 citations
Zhao et al.
Alpha-clip: A CLIP Model Focusing On Wherever You Want (2023) • No Venue
Sun et al.
Generative Multimodal Models Are In-context Learners (2023) • No Venue
Sun et al.
Inconsistent Matters: A Knowledge-guided Dual-consistency Network For Multi-modal Rumor Detection (2023) • IEEE Transactions on Knowledge and Data Engineering • 47 citations
Sun et al.
Visual Language Pretrained Multiple Instance Zero-shot Transfer For Histopathology Images (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 79 citations
Lu et al.
Geolayoutlm: Geometric Pre-training For Visual Information Extraction (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Luo et al.
Information Screening Whilst Exploiting! Multimodal Relation Extraction With Feature Denoising And Multimodal Topic Modeling (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 53 citations
Wu et al.
Kosmos-2.5: A Multimodal Literate Model (2023) • No Venue
Lv et al.
What Does CLIP Know About A Red Circle? Visual Prompt Engineering For Vlms (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 78 citations
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
3d-vista: Pre-trained Transformer For 3D Vision And Text Alignment (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Zhu et al.
HOICLIP: Efficient Knowledge Transfer For HOI Detection With Vision-language Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Ning et al.
Diversity Is Definitely Needed: Improving Model-agnostic Zero-shot Classification Via Stable Diffusion (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) • 41 citations
Shipard et al.
Cross2stra: Unpaired Cross-lingual Image Captioning With Cross-lingual Cross-modal Structure-pivoted Alignment (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 41 citations
Wu et al.
Med-flamingo: A Multimodal Medical Few-shot Learner (2023) • No Venue
Moor et al.
Verbs In Action: Improving Verb Understanding In Video-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 40 citations
Momeni et al.
Anymal: An Efficient And Scalable Any-modality Augmented Language Model (2023) • No Venue
Moon et al.
Embodiedgpt: Vision-language Pre-training Via Embodied Chain Of Thought (2023) • Arxiv • 41 citations
Mu et al.
Bubogpt: Enabling Visual Grounding In Multi-modal Llms (2023) • No Venue
Zhao et al.
Vadclip: Adapting Vision-language Models For Weakly Supervised Video Anomaly Detection (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 84 citations
Wu et al.
Video-chatgpt: Towards Detailed Video Understanding Via Large Vision And Language Models (2023) • Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 224 citations
Maaz et al.
An Early Evaluation Of Gpt-4v(ision) (2023) • No Venue
Wu et al.
On The Opportunities And Challenges Of Foundation Models For Geospatial Artificial Intelligence (2023) • Arxiv • 63 citations
Mai et al.
Q-instruct: Improving Low-level Visual Abilities For Multi-modality Foundation Models (2023) • No Venue
Wu et al.
Next-gpt: Any-to-any Multimodal LLM (2023) • No Venue
Wu et al.
Enhancing CLIP With GPT-4: Harnessing Visual Descriptions As Prompts (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 45 citations
Maniparambil et al.
Cogvlm: Visual Expert For Pretrained Language Models (2023) • No Venue
Wang et al.
A Survey On Multimodal Large Language Models (2023) • National Science Review • 271 citations
Yin et al.
Docllm: A Layout-aware Generative Language Model For Multimodal Document Understanding (2023) • No Venue
Wang et al.
Clip-guided Prototype Modulating For Few-shot Action Recognition (2023) • International Journal of Computer Vision • 57 citations
Wang et al.
Cross-modal Contrastive Learning For Multimodal Fake News Detection (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 58 citations
Wang et al.
Gesturediffuclip: Gesture Diffusion Model With CLIP Latents (2023) • ACM Transactions on Graphics • 113 citations
Tenglong Ao, Zeyi Zhang, Libin Liu
Domain-agnostic Tuning-encoder For Fast Personalization Of Text-to-image Models (2023) • SIGGRAPH Asia 2023 Conference Papers • 41 citations
Arar et al.
Chatcad: Interactive Computer-aided Diagnosis On Medical Image Using Large Language Models (2023) • Communications Engineering • 88 citations
Wang et al.
Openflamingo: An Open-source Framework For Training Large Autoregressive Vision-language Models (2023) • No Venue
Awadalla et al.
Dreamdiffusion: Generating High-quality Images From Brain EEG Signals (2023) • No Venue
Bai et al.
Learning To Exploit Temporal Structure For Biomedical Vision-language Processing (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Bannur et al.
Towards Language Models That Can See: Computer Vision Through The LENS Of Natural Language (2023) • No Venue
Berrios et al.
Nougat: Neural Optical Understanding For Academic Documents (2023) • No Venue
Blecher et al.
Promptify: Text-to-image Generation Through Interactive Prompt Exploration With Large Language Models (2023) • UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology • 110 citations
Brade et al.
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control (2023) • No Venue
Brohan et al.
Contextual Object Detection With Multimodal Large Language Models (2023) • International Journal of Computer Vision • 48 citations
Zang et al.
Attend-and-excite: Attention-based Semantic Guidance For Text-to-image Diffusion Models (2023) • ACM Transactions on Graphics • 291 citations
Chefer et al.
Unleashing The Potential Of Prompt Engineering For Large Language Models (2023) • Patterns • 86 citations
Chen et al.
Minigpt-v2: Large Language Model As A Unified Interface For Vision-language Multi-task Learning (2023) • No Venue
Chen et al.
Llava-interactive: An All-in-one Demo For Image Chat, Segmentation, Generation And Editing (2023) • No Venue
Chen et al.
Internvl: Scaling Up Vision Foundation Models And Aligning For Generic Visual-linguistic Tasks (2023) • No Venue
Chen et al.
LL3DA: Visual Interactive Instruction Tuning For Omni-3d Understanding, Reasoning, And Planning (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 48 citations
Chen et al.
Pixart-α: Fast Training Of Diffusion Transformer For Photorealistic Text-to-image Synthesis (2023) • No Venue
Chen et al.
Subject-driven Text-to-image Generation Via Apprenticeship Learning (2023) • Arxiv • 46 citations
Chen et al.
One Adapter For All Programming Languages? Adapter Tuning For Code Search And Summarization (2023) • Arxiv • 41 citations
Wang et al.
Object-aware Distillation Pyramid For Open-vocabulary Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Wang et al.
Internvid: A Large-scale Video-text Dataset For Multimodal Understanding And Generation (2023) • No Venue
Wang et al.
CVT-SLR: Contrastive Visual-textual Transformation For Sign Language Recognition With Variational Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Zheng et al.
Preventing Zero-shot Transfer Degradation In Continual Learning Of Vision-language Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 55 citations
Zheng et al.
Parameter-efficient Transfer Learning For Remote Sensing Image-text Retrieval (2023) • IEEE Transactions on Geoscience and Remote Sensing • 60 citations
Yuan Yuan, Yang Zhan, Zhitong Xiong
Tinygpt-v: Efficient Multimodal Large Language Model Via Small Backbones (2023) • No Venue
Zhengqing Yuan, Zhaoxu Li, Lichao Sun
CLIP The Gap: A Single Domain Generalization Approach For Object Detection (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 87 citations
Vidit Vidit, Martin Engilberge, Mathieu Salzmann
SAM-CLIP: Merging Vision Foundation Models Towards Semantic And Spatial Understanding (2023) • No Venue
Wang et al.
Open-ended Medical Visual Question Answering Through Prefix Tuning Of Language Models (2023) • Lecture Notes in Computer Science • 43 citations
Sonsbeek et al.
A Picture Is Worth More Than 77 Text Tokens: Evaluating Clip-style Models On Dense Captions (2023) • No Venue
Urbanek et al.
L3MVN: Leveraging Large Language Models For Visual Target Navigation (2023) • 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) • 58 citations
Bangguo Yu, Hamidreza Kasaei, Ming Cao
Mobilevlm: A Fast, Reproducible And Strong Vision Language Assistant For Mobile Devices (2023) • No Venue
Chu et al.
Turning A CLIP Model Into A Scene Text Detector (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Yu et al.
Merlin: Empowering Multimodal Llms With Foresight Minds (2023) • No Venue
Yu et al.
Mm-vet: Evaluating Large Multimodal Models For Integrated Capabilities (2023) • Arxiv • 59 citations
Yu et al.
Capsfusion: Rethinking Image-text Data At Scale (2023) • No Venue
Yu et al.
Emu: Enhancing Image Generation Models Using Photogenic Needles In A Haystack (2023) • No Venue
Dai et al.
Aom: Detecting Aspect-oriented Information For Multimodal Aspect-based Sentiment Analysis (2023) • Findings of the Association for Computational Linguistics: ACL 2023 • 41 citations
Zhou et al.
Dreamllm: Synergistic Multimodal Comprehension And Creation (2023) • No Venue
Dong et al.
Can An Embodied Agent Find Your "cat-shaped Mug"? Llm-guided Exploration For Zero-shot Object Navigation (2023) • IEEE Robotics and Automation Letters • 55 citations
Vishnu Sashank Dorbala, James F. Mullen, Dinesh Manocha
Ferret: Refer And Ground Anything Anywhere At Any Granularity (2023) • Arxiv • 43 citations
You et al.
CXR-CLIP: Toward Large Scale Chest X-ray Language-image Pre-training (2023) • Lecture Notes in Computer Science • 50 citations
You et al.
Detecting And Grounding Multi-modal Media Manipulation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 52 citations
Rui Shao, Tianxing Wu, Ziwei Liu
Semantic Anomaly Detection With Large Language Models (2023) • Autonomous Robots • 63 citations
Elhafsi et al.
Unified Pre-training With Pseudo Texts For Text-to-image Person Re-identification (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 48 citations
Shao et al.
Transferable Decoding With Visual Entities For Zero-shot Image Captioning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Fei et al.
Scene Graph As Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation With Visual Scene Hallucination (2023) • Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 43 citations
Fei et al.
A Multi-task Multi-stage Transitional Training Framework For Neural Chat Translation (2023) • Proceedings of the 2023 ACM International Conference on Multimedia Retrieval • 48 citations
Zhou et al.
Diverse Data Augmentation With Diffusions For Effective Test-time Prompt Tuning (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 44 citations
Feng et al.
Foundation Models In Robotics: Applications, Challenges, And The Future (2023) • The International Journal of Robotics Research • 89 citations
Firoozi et al.
Encoder-based Domain Tuning For Fast Personalization Of Text-to-image Models (2023) • ACM Transactions on Graphics • 122 citations
Gal et al.
G-llava: Solving Geometric Problem With Multi-modal Large Language Model (2023) • No Venue
Gao et al.
Assistgpt: A General Multi-modal Assistant That Can Plan, Execute, Inspect, And Learn (2023) • No Venue
Gao et al.
Llama-adapter V2: Parameter-efficient Visual Instruction Model (2023) • Arxiv • 117 citations
Gao et al.
Physically Grounded Vision-language Models For Robotic Manipulation (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 60 citations
Gao et al.
Joint Visual Grounding And Tracking With Natural Language Specification (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 88 citations
Zhou et al.
Mplug-owl2: Revolutionizing Multi-modal Large Language Model With Modality Collaboration (2023) • No Venue
Ye et al.
Emu Video: Factorizing Text-to-video Generation By Explicit Image Conditioning (2023) • No Venue
Girdhar et al.
Imagebind: One Embedding Space To Bind Them All (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 552 citations
Girdhar et al.
Detclipv2: Scalable Open-vocabulary Object Detection Pre-training Via Word-region Alignment (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Yao et al.
Detecting And Preventing Hallucinations In Large Vision Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 93 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
Visual-language Prompt Tuning With Knowledge-guided Context Optimization (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Hantao Yao, Rui Zhang, Changsheng Xu
On The Adversarial Robustness Of Multi-modal Foundation Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 42 citations
Christian Schlarmann, Matthias Hein
A Systematic Survey Of Prompt Engineering On Vision-language Foundation Models (2023) • Arxiv • 61 citations
Gu et al.
Hallusionbench: An Advanced Diagnostic Suite For Entangled Language Hallucination And Visual Illusion In Large Vision-language Models (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Guan et al.
Onellm: One Framework To Align All Modalities With Language (2023) • No Venue
Han et al.
Biomedclip: A Multimodal Biomedical Foundation Model Pretrained From Fifteen Million Scientific Image-text Pairs (2023) • Arxiv • 87 citations
Zhang et al.
Biomedgpt: A Generalist Vision-language Foundation Model For Diverse Biomedical Tasks (2023) • Arxiv • 49 citations
Zhang et al.
CLIP Goes 3D: Leveraging Prompt Tuning For Language Grounded 3D Recognition (2023) • 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) • 50 citations
Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel
Dreamface: Progressive Generation Of Animatable 3D Faces Under Text Guidance (2023) • ACM Transactions on Graphics • 54 citations
Zhang et al.
Pandagpt: One Model To Instruction-follow Them All (2023) • Arxiv • 46 citations
Su et al.
De-diffusion Makes Text A Strong Cross-modal Interface (2023) • No Venue
Wei et al.
Cogagent: A Visual Language Model For GUI Agents (2023) • No Venue
Hong et al.
3D-LLM: Injecting The 3D World Into Large Language Models (2023) • No Venue
Hong et al.
TEAL: Tokenize And Embed ALL For Multi-modal Large Language Models (2023) • No Venue
Yang et al.
Graspgpt: Leveraging Semantic Knowledge From A Large Language Model For Task-oriented Grasping (2023) • IEEE Robotics and Automation Letters • 67 citations
Tang et al.
RSGPT: A Remote Sensing Vision Language Model And Benchmark (2023) • ISPRS Journal of Photogrammetry and Remote Sensing • 46 citations
Hu et al.
BLIVA: A Simple Multimodal LLM For Better Handling Of Text-rich Visual Questions (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Hu et al.
TIFA: Accurate And Interpretable Text-to-image Faithfulness Evaluation With Question Answering (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Hu et al.
Llm-grounder: Open-vocabulary 3D Visual Grounding With Large Language Model As An Agent (2023) • 2024 IEEE International Conference on Robotics and Automation (ICRA) • 43 citations
Yang et al.
Vid2seq: Large-scale Pretraining Of A Visual Language Model For Dense Video Captioning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 179 citations
Yang et al.
Octopus: Embodied Vision-language Programmer From Environmental Feedback (2023) • No Venue
Yang et al.
MM-REACT: Prompting Chatgpt For Multimodal Reasoning And Action (2023) • Arxiv • 78 citations
Yang et al.
Set-of-mark Prompting Unleashes Extraordinary Visual Grounding In GPT-4V (2023) • No Venue
Yang et al.
Segment And Caption Anything (2023) • No Venue
Huang et al.
Vtimellm: Empower LLM To Grasp Video Moments (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Huang et al.
Tech: Text-guided Reconstruction Of Lifelike Clothed Humans (2023) • No Venue
Huang et al.
Genassist: Making Image Generation Accessible (2023) • Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology • 46 citations
Mina Huh, Yi-Hao Peng, Amy Pavel
Implicit Neural Representation For Cooperative Low-light Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 144 citations
Yang et al.
The Dawn Of Lmms: Preliminary Explorations With Gpt-4v(ision) (2023) • Arxiv • 160 citations
Yang et al.
Fusecap: Leveraging Large Language Models For Enriched Fused Image Captions (2023) • 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) • 47 citations
Rotstein et al.
Text2room: Extracting Textured 3D Meshes From 2D Text-to-image Models (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 92 citations
Höllein et al.
Word-as-image For Semantic Typography (2023) • ACM Transactions on Graphics • 51 citations
Iluz et al.
Quilt-1m: One Million Image-text Pairs For Histopathology (2023) • Arxiv • 52 citations
Ikezogwo et al.
From Image To Language: A Critical Analysis Of Visual Question Answering (VQA) Approaches, Challenges, And Opportunities (2023) • Information Fusion • 58 citations
Ishmam et al.
Winclip: Zero-/few-shot Anomaly Classification And Segmentation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 223 citations
Jeong et al.
Vary: Scaling Up The Vision Vocabulary For Large Vision-language Models (2023) • No Venue
Wei et al.
Timechat: A Time-sensitive Multimodal Large Language Model For Long Video Understanding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Ren et al.
Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 204 citations
Ding Jiang, Mang Ye
Clip-count: Towards Text-guided Zero-shot Object Counting (2023) • Proceedings of the 31st ACM International Conference on Multimedia • 50 citations
Ruixiang Jiang, Lingbo Liu, Changwen Chen
Hallucination Augmented Contrastive Learning For Multimodal Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jiang et al.
Pixellm: Pixel Reasoning With Large Multimodal Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Ren et al.
Meta-transformer: A Unified Framework For Multimodal Learning (2023) • No Venue
Zhang et al.
Chat-univi: Unified Visual Representation Empowers Large Language Models With Image And Video Understanding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Jin et al.
Video-text As Game Players: Hierarchical Banzhaf Interaction For Cross-modal Representation Learning (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Jin et al.
Make-a-character: High Quality Text-to-3d Character Generation Within Minutes (2023) • No Venue
Ren et al.
Universal Instance Perception As Object Discovery And Retrieval (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Yan et al.
Videodirectorgpt: Consistent Multi-scene Video Generation Via Llm-guided Planning (2023) • No Venue
Lin et al.
Video-llava: Learning United Visual Representation By Alignment Before Projection (2023) • No Venue
Lin et al.
VILA: On Pre-training For Visual Language Models (2023) • No Venue
Lin et al.
Detgpt: Detect What You Need Via Reasoning (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 49 citations
Pi et al.
A Prompt Log Analysis Of Text-to-image Generation Systems (2023) • Proceedings of the ACM Web Conference 2023 • 40 citations
Xie et al.
Univtg: Towards Unified Video-language Temporal Grounding (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Lin et al.
Iterative Prompt Learning For Unsupervised Backlit Image Enhancement (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 129 citations
Liang et al.
Crowdclip: Unsupervised Crowd Counting Via Vision-language Model (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 58 citations
Liang et al.
Rich Human Feedback For Text-to-image Generation (2023) • No Venue
Liang et al.
Egovlpv2: Egocentric Video-language Pre-training With Fusion In The Backbone (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 51 citations
Pramanick et al.
PMC-CLIP: Contrastive Language-image Pre-training Using Biomedical Documents (2023) • Lecture Notes in Computer Science • 110 citations
Lin et al.
Multimodality Helps Unimodality: Cross-modal Few-shot Learning With Multimodal Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 97 citations
Lin et al.
Learning To Model The World With Language (2023) • No Venue
Lin et al.
MM-VID: Advancing Video Understanding With Gpt-4v(ision) (2023) • No Venue
Lin et al.
BLIP-2: Bootstrapping Language-image Pre-training With Frozen Image Encoders And Large Language Models (2023) • Arxiv • 65 citations
Li et al.
Masked Vision And Language Pre-training With Unimodal And Multimodal Contrastive Losses For Medical Visual Question Answering (2023) • Lecture Notes in Computer Science • 41 citations
Li et al.
A Closer Look At The Explainability Of Contrastive Language-image Pre-training (2023) • Arxiv • 41 citations
Li et al.
Evaluating Parameter-efficient Transfer Learning Approaches On SURE Benchmark For Speech Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing • 281 citations
Li et al.
GLIGEN: Open-set Grounded Text-to-image Generation (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 364 citations
Li et al.
Llama-vid: An Image Is Worth 2 Tokens In Large Language Models (2023) • Lecture Notes in Computer Science • 89 citations
Yanwei Li, Chengyao Wang, Jiaya Jia
Llava-med: Training A Large Language-and-vision Assistant For Biomedicine In One Day (2023) • Arxiv • 216 citations
Li et al.
Otter: A Multi-modal Model With In-context Instruction Tuning (2023) • Arxiv • 87 citations
Li et al.
Multimodal Foundation Models: From Specialists To General-purpose Assistants (2023) • No Venue
Li et al.
Otterhd: A High-resolution Multi-modality Model (2023) • No Venue
Li et al.
Videogen: A Reference-guided Latent Diffusion Approach For High Definition Text-to-video Generation (2023) • No Venue
Li et al.
Videochat: Chat-centric Video Understanding (2023) • Arxiv • 90 citations
Li et al.
Vision-language Models In Remote Sensing: Current Progress And Future Trends (2023) • IEEE Geoscience and Remote Sensing Magazine • 80 citations
Li et al.
Visual Adversarial Examples Jailbreak Aligned Large Language Models (2023) • Proceedings of the AAAI Conference on Artificial Intelligence • 73 citations
Qi et al.
Imagereward: Learning And Evaluating Human Preferences For Text-to-image Generation (2023) • Arxiv • 99 citations
Xu et al.
Demystifying CLIP Data (2023) • No Venue
Xu et al.
Knowledge-enhanced Visual-language Pre-training On Chest Radiology Images (2023) • Nature Communications • 134 citations
Zhang et al.
PMC-VQA: Visual Instruction Tuning For Medical Visual Question Answering (2023) • Arxiv • 57 citations
Zhang et al.
Language-driven Representation Learning For Robotics (2023) • Robotics: Science and Systems XIX • 47 citations
Karamcheti et al.
Slime: Segment Like Me (2023) • No Venue
Khani et al.
VILA: Learning Image Aesthetics From User Comments With Vision-language Pretraining (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Ke et al.
LERF: Language Embedded Radiance Fields (2023) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Kerr et al.
A Transformer-based Representation-learning Model With Unified Processing Of Multimodal Input For Clinical Diagnostics (2023) • Nature Biomedical Engineering • 223 citations
Zhou et al.
Kandinsky: An Improved Text-to-image Synthesis With Image Prior And Latent Diffusion (2023) • No Venue
Razzhigaev et al.
Videopoet: A Large Language Model For Zero-shot Video Generation (2023) • No Venue
Kondratyuk et al.
RS5M And Georsclip: A Large Scale Vision-language Dataset And A Large Vision-language Model For Remote Sensing (2023) • IEEE Transactions on Geoscience and Remote Sensing • 43 citations
Zhang et al.
Vision Language Models In Autonomous Driving: A Survey And Outlook (2023) • IEEE Transactions on Intelligent Vehicles • 46 citations
Zhou et al.
Geochat: Grounded Large Vision-language Model For Remote Sensing (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 102 citations
Kuckreja et al.
Glamm: Pixel Grounding Large Multimodal Model (2023) • No Venue
Rasheed et al.
Vision-language Models For Vision Tasks: A Survey (2023) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 403 citations
Zhang et al.
LISA: Reasoning Segmentation Via Large Language Model (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Lai et al.
OBELICS: An Open Web-scale Filtered Dataset Of Interleaved Image-text Documents (2023) • No Venue
Laurençon et al.
Open-vocabulary Panoptic Segmentation With Text-to-image Diffusion Models (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 266 citations
Xu et al.
Video-llama: An Instruction-tuned Audio-visual Language Model For Video Understanding (2023) • Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations • 367 citations
Hang Zhang, Xin Li, Lidong Bing
Filtering, Distillation, And Hard Negatives For Vision-language Pre-training (2023) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Radenovic et al.
Mitigating Object Hallucinations In Large Vision-language Models Through Visual Contrastive Decoding (2023) • 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 78 citations
Leng et al.
Layoutllm-t2i: Eliciting Layout Guidance From LLM For Text-to-image Generation (2023) • MM '23: The 31st ACM International Conference on Multimedia • 65 citations
Qu et al.
Compositional Temporal Grounding With Structured Variational Cross-graph Correspondence Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 57 citations
Li et al.
Clip-event: Connecting Text And Images With Event Structures (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Li et al.
BLIP: Bootstrapping Language-image Pre-training For Unified Vision-language Understanding And Generation (2022) • Arxiv • 850 citations
Li et al.
CLMLF: A Contrastive Learning And Multi-layer Fusion Method For Multimodal Sentiment Detection (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 85 citations
Li et al.
Cross-modal Clinical Graph Transformer For Ophthalmic Report Generation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 47 citations
Li et al.
Mplug: Effective And Efficient Vision-language Learning By Cross-modal Skip-connections (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 141 citations
Li et al.
Envedit: Environment Editing For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 64 citations
Jialu Li, Hao Tan, Mohit Bansal
Visual-language Navigation Pretraining Via Prompt-based Environmental Self-exploration (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liang et al.
Proposalclip: Unsupervised Open-category Object Proposal Generation Via Exploiting CLIP Cues (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Shi et al.
Egocentric Video-language Pretraining (2022) • Arxiv • 45 citations
Lin et al.
Frozen CLIP Models Are Efficient Video Learners (2022) • Lecture Notes in Computer Science • 148 citations
Lin et al.
ADAPT: Vision-language Navigation With Modality-aligned Action Prompts (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 42 citations
Lin et al.
Multi-view Transformer For 3D Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 88 citations
Huang et al.
Layoutlmv3: Pre-training For Document AI With Unified Text And Image Masking (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 379 citations
Huang et al.
Language As Queries For Referring Video Object Segmentation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 123 citations
Wu et al.
Tubedetr: Spatio-temporal Video Grounding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 72 citations
Yang et al.
Vision-language Pre-training With Triple Contrastive Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Yang et al.
Chinese CLIP: Contrastive Vision-language Pretraining In Chinese (2022) • Arxiv • 51 citations
Yang et al.
Entity-enhanced Adaptive Reconstruction Network For Weakly Supervised Referring Expression Grounding (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 48 citations
Liu et al.
Ts2-net: Token Shift And Selection Transformer For Text-video Retrieval (2022) • Lecture Notes in Computer Science • 97 citations
Liu et al.
UMT: Unified Multi-modal Transformers For Joint Video Moment Retrieval And Highlight Detection (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Liu et al.
Reducing The Vision And Language Bias For Temporal Sentence Grounding (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 45 citations
Daizong Liu, Xiaoye Qu, Wei Hu
Asymmetric Cross-scale Alignment For Text-based Person Search (2022) • IEEE Transactions on Multimedia • 54 citations
Ji et al.
Partslip: Low-shot Part Segmentation For 3D Point Clouds Via Pretrained Image-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liu et al.
Pseudo-q: Generating Pseudo Language Queries For Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Jiang et al.
Opal: Multimodal Image Generation For News Illustration (2022) • Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology • 87 citations
Vivian Liu, Han Qiao, Lydia Chilton
A Generalist Agent (2022) • Transactions on Machine Learning Research 11/2022 https://openreview.net/forum?id=1ikK0kHjvj • 60 citations
Reed et al.
GL-RG: Global-local Representation Granularity For Video Captioning (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 52 citations
Yan et al.
Clip-mesh: Generating Textured Meshes From Text Using Pretrained Image-text Models (2022) • SIGGRAPH Asia 2022 Conference Papers • 162 citations
Khalid et al.
Vision-language Pre-training For Multimodal Aspect-based Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 103 citations
Yan Ling, Jianfei Yu, Rui Xia
Conditional Prompt Learning For Vision-language Models (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 1126 citations
Zhou et al.
Clip-vip: Adapting Pre-trained Image-text Model To Video-language Representation Alignment (2022) • Arxiv • 53 citations
Xue et al.
Decomposing Nerf For Editing Via Feature Field Distillation (2022) • Arxiv • 103 citations
Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann
ULIP: Learning A Unified Representation Of Language, Images, And Point Clouds For 3D Understanding (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 169 citations
Xue et al.
Beyond A Pre-trained Object Detector: Cross-modal Textual And Visual Context For Image Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 85 citations
Chia-Wen Kuo, Zsolt Kira
Groupvit: Semantic Segmentation Emerges From Text Supervision (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 352 citations
Xu et al.
Pix2struct: Screenshot Parsing As Pretraining For Visual Language Understanding (2022) • Arxiv • 45 citations
Lee et al.
Improving Mispronunciation Detection With Wav2vec2-based Momentum Pseudo-labeling For Accentedness And Intelligibility Assessment (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Yang et al.
CM3: A Causal Masked Multimodal Model Of The Internet (2022) • Arxiv • 40 citations
Aghajanyan et al.
Zero-shot Temporal Action Detection Via Vision-language Prompting (2022) • Lecture Notes in Computer Science • 43 citations
Nag et al.
Test-time Prompt Tuning For Zero-shot Generalization In Vision-language Models (2022) • Arxiv • 112 citations
Shu et al.
Learning To Compose Soft Prompts For Compositional Zero-shot Learning (2022) • Arxiv • 41 citations
Nihal V. Nayak, Peilin Yu, Stephen H. Bach
TEACH: Temporal Action Composition For 3D Humans (2022) • 2022 International Conference on 3D Vision (3DV) • 98 citations
Athanasiou et al.
Omnivl: One Foundation Model For Image-language And Video-language Tasks (2022) • Arxiv • 68 citations
Wang et al.
Multimae: Multi-modal Multi-task Masked Autoencoders (2022) • Lecture Notes in Computer Science • 186 citations
Bachmann et al.
End-to-end Transformer Based Model For Image Captioning (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 107 citations
Yiyu Wang, Jungang Xu, Yingfei Sun
Exploring Visual Prompts For Adapting Large-scale Models (2022) • Arxiv • 106 citations
Bahng et al.
Image As A Foreign Language: Beit Pretraining For All Vision And Vision-language Tasks (2022) • Arxiv • 148 citations
Wang et al.
Medclip: Contrastive Learning From Unpaired Medical Images And Text (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 402 citations
Wang et al.
Counterfactual Cycle-consistent Learning For Instruction Following And Generation In Vision-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Wang et al.
Text2live: Text-driven Layered Image And Video Editing (2022) • Lecture Notes in Computer Science • 176 citations
Bar-Tal et al.
Socratic Models: Composing Zero-shot Multimodal Reasoning With Language (2022) • Arxiv • 168 citations
Zeng et al.
Making The Most Of Text Semantics To Improve Biomedical Vision-language Processing (2022) • Lecture Notes in Computer Science • 169 citations
Boecking et al.
Multimodal Contrastive Learning With Limoe: The Language-image Mixture Of Experts (2022) • Arxiv • 72 citations
Mustafa et al.
Exploiting Unlabeled Data With Vision And Language Models For Object Detection (2022) • Lecture Notes in Computer Science • 74 citations
Zhao et al.
Revisiting The "video" In Video-language Understanding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 106 citations
Buch et al.
Instructpix2pix: Learning To Follow Image Editing Instructions (2022) • Arxiv • 40 citations
Tim Brooks, Aleksander Holynski, Alexei A. Efros
LASP: Text-to-text Optimization For Language-aware Soft Prompting Of Vision & Language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 40 citations
Adrian Bulat, Georgios Tzimiropoulos
Expanding Language-image Pretrained Models For General Video Recognition (2022) • Lecture Notes in Computer Science • 221 citations
Ni et al.
Open-vocabulary DETR With Conditional Matching (2022) • Lecture Notes in Computer Science • 155 citations
Zang et al.
Roentgen: Vision-language Foundation Model For Chest X-ray Generation (2022) • Arxiv • 55 citations
Chambon et al.
Adapting Pretrained Vision-language Foundational Models To Medical Imaging Domains (2022) • Foundation Models for Decision Making Workshop at Neural Information Processing Systems 2022 • 43 citations
Chambon et al.
Unified Vision And Language Prompt Learning (2022) • Arxiv • 54 citations
Zang et al.
Align, Reason And Learn: Enhancing Medical Vision-and-language Pre-training With Knowledge (2022) • MM '22: The 30th ACM International Conference on Multimedia • 65 citations
Zhihong Chen, Guanbin Li, Xiang Wan
Hybrid Transformer With Multi-level Fusion For Multimodal Knowledge Graph Completion (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 152 citations
Chen et al.
Re-imagen: Retrieval-augmented Text-to-image Generator (2022) • Arxiv • 41 citations
Chen et al.
Pali: A Jointly-scaled Multilingual Language-image Model (2022) • Arxiv • 194 citations
Chen et al.
Multi-level Contrastive Learning For Cross-lingual Alignment (2022) • Lecture Notes in Computer Science • 106 citations
Chen et al.
Think Global, Act Local: Dual-scale Graph Transformer For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Chen et al.
A Unified Sequence Interface For Vision Tasks (2022) • Arxiv • 49 citations
Chen et al.
What Matters In Language Conditioned Robotic Imitation Learning Over Unstructured Data (2022) • IEEE Robotics and Automation Letters • 49 citations
Oier Mees, Lukas Hermann, Wolfram Burgard
Bidirectional Cross-modal Knowledge Exploration For Video Recognition With Pre-trained Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 75 citations
Wu et al.
Revisiting Classifier: Transferring Vision-language Models For Video Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 69 citations
Wenhao Wu, Zhun Sun, Wanli Ouyang
Winoground: Probing Vision And Language Models For Visio-linguistic Compositionality (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Thrush et al.
Vista: Vision And Scene Text Aggregation For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Cheng et al.
Tableformer: Table Structure Understanding With Transformers (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 63 citations
Nassar et al.
Towards Lightweight Transformer Via Group-wise Transformation For Vision-and-language Tasks (2022) • IEEE Transactions on Image Processing • 51 citations
Luo et al.
Fine-grained Image Captioning With CLIP Reward (2022) • Findings of the Association for Computational Linguistics: NAACL 2022 • 52 citations
Cho et al.
M-SENA: An Integrated Platform For Multimodal Sentiment Analysis (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations • 58 citations
Mao et al.
ZSON: Zero-shot Object-goal Navigation Using Multimodal Goal Embeddings (2022) • Arxiv • 41 citations
Majumdar et al.
"this Is My Unicorn, Fluffy": Personalizing Frozen Vision-language Representations (2022) • Lecture Notes in Computer Science • 42 citations
Cohen et al.
See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) • Lecture Notes in Computer Science • 102 citations
Shu et al.
Centerclip: Token Clustering For Efficient Text-video Retrieval (2022) • SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval • 100 citations
Zhao et al.
VQGAN-CLIP: Open Domain Image Generation And Editing With Natural Language Guidance (2022) • Lecture Notes in Computer Science • 240 citations
Crowson et al.
Task Residual For Tuning Vision-language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 73 citations
Yu et al.
Transvg++: End-to-end Visual Grounding With Language Conditioned Vision Transformer (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 57 citations
Deng et al.
Sus-x: Training-free Name-only Transfer Of Vision-language Models (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 56 citations
Vishaal Udandarao, Ankush Gupta, Samuel Albanie
Enabling Multimodal Generation On CLIP Via Vision-language Knowledge Distillation (2022) • Findings of the Association for Computational Linguistics: ACL 2022 • 52 citations
Dai et al.
Scaling Autoregressive Models For Content-rich Text-to-image Generation (2022) • Arxiv • 339 citations
Yu et al.
Coca: Contrastive Captioners Are Image-text Foundation Models (2022) • Arxiv • 512 citations
Yu et al.
Learning-by-narrating: Narrative Pre-training For Zero-shot Dialogue Comprehension (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Zhao et al.
X-CLIP: End-to-end Multi-grained Contrastive Learning For Video-text Retrieval (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 217 citations
Ma et al.
Mukea: Multimodal Knowledge Extraction And Accumulation For Knowledge-based Visual Question Answering (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Ding et al.
Language-bridged Spatial-temporal Interaction For Referring Video Object Segmentation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 54 citations
Ding et al.
VLT: Vision-language Transformer And Query Generation For Referring Segmentation (2022) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 107 citations
Ding et al.
Bootstrapped Masked Autoencoders For Vision BERT Pretraining (2022) • Lecture Notes in Computer Science • 50 citations
Dong et al.
Reading-strategy Inspired Visual Representation Learning For Text-to-video Retrieval (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 67 citations
Dong et al.
Understanding And Mitigating Overfitting In Prompt Tuning For Vision-language Models (2022) • IEEE Transactions on Circuits and Systems for Video Technology • 48 citations
Ma et al.
Coarse-to-fine Vision-language Pre-training With Fusion In The Backbone (2022) • Arxiv • 67 citations
Dou et al.
Teaching Structured Vision&language Concepts To Vision&language Models (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Doveh et al.
Learning To Prompt For Open-vocabulary Object Detection With Vision-language Model (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 253 citations
Du et al.
A Survey Of Vision-language Pre-trained Models (2022) • Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence • 111 citations
Du et al.
Translation Between Molecules And Natural Language (2022) • Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing • 92 citations
Edwards et al.
Unifying Vision, Text, And Layout For Universal Document Processing (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Tang et al.
Minedojo: Building Open-ended Embodied Agents With Internet-scale Knowledge (2022) • Arxiv • 59 citations
Fan et al.
PØDA: Prompt-driven Zero-shot Domain Adaptation (2022) • 2023 IEEE/CVF International Conference on Computer Vision (ICCV) • 41 citations
Fahes et al.
Promptdet: Towards Open-vocabulary Detection Using Uncurated Images (2022) • Lecture Notes in Computer Science • 119 citations
Feng et al.
Target-driven Structured Transformer Planner For Vision-language Navigation (2022) • MM '22: The 30th ACM International Conference on Multimedia • 43 citations
Zhao et al.
An Empirical Study Of End-to-end Video-language Transformers With Masked Visual Modeling (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Fu et al.
Make-a-scene: Scene-based Text-to-image Generation With Human Priors (2022) • Lecture Notes in Computer Science • 265 citations
Gafni et al.
An Image Is Worth One Word: Personalizing Text-to-image Generation Using Textual Inversion (2022) • Arxiv • 460 citations
Gal et al.
End-to-end Generative Pretraining For Multimodal Video Captioning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 141 citations
Seo et al.
Plug-and-play VQA: Zero-shot VQA By Conjoining Large Pretrained Models With Zero Training (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 64 citations
Tiong et al.
Shifting More Attention To Visual Backbone: Query-modulated Refinement Networks For End-to-end Visual Grounding (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 78 citations
Ye et al.
Bridging Video-text Retrieval With Multiple Choice Questions (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 117 citations
Ge et al.
LAION-5B: An Open Large-scale Dataset For Training Next Generation Image-text Models (2022) • Arxiv • 1032 citations
Schuhmann et al.
A-OKVQA: A Benchmark For Visual Question Answering Using World Knowledge (2022) • Lecture Notes in Computer Science • 162 citations
Schwenk et al.
COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 61 citations
Lu et al.
Prompt Distribution Learning (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 188 citations
Lu et al.
Cyclip: Cyclic Contrastive Language-image Pretraining (2022) • Arxiv • 46 citations
Goel et al.
Language Model Classifier Aligns Better With Physician Word Sensitivity Than Xgboost On Readmission Prediction (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Yang et al.
X-pool: Cross-modal Language-video Attention For Text-video Retrieval (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 170 citations
Gorti et al.
Vision-and-language Navigation: A Survey Of Tasks, Methods, And Future Directions (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 62 citations
Gu et al.
TM2T: Stochastic And Tokenized Modeling For The Reciprocal Generation Of 3D Human Motions And Texts (2022) • Lecture Notes in Computer Science • 142 citations
Guo et al.
MVP: Multimodality-guided Visual Pre-training (2022) • Lecture Notes in Computer Science • 52 citations
Wei et al.
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 52 citations
Zhang et al.
Glipv2: Unifying Localization And Vision-language Understanding (2022) • Arxiv • 126 citations
Zhang et al.
Few-shot Object Detection With Fully Cross-transformer (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 154 citations
Han et al.
Temporal Alignment Networks For Long-term Video (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Tengda Han, Weidi Xie, Andrew Zisserman
Tailor Versatile Multi-modal Learning For Multi-label Emotion Recognition (2022) • Proceedings of the AAAI Conference on Artificial Intelligence • 63 citations
Zhang et al.
Tip-adapter: Training-free Adaption Of CLIP For Few-shot Classification (2022) • Lecture Notes in Computer Science • 246 citations
Zhang et al.
Can Machines Help Us Answering Question 16 In Datasheets, And In Turn Reflecting On Inappropriate Content? (2022) • FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency • 41 citations
Patrick Schramowski, Christopher Tauchmann, Kristian Kersting
Towards Universal Backward-compatible Representation Learning (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 60 citations
Zhang et al.
Text-only Training For Image Captioning Using Noise-injected CLIP (2022) • Findings of the Association for Computational Linguistics: EMNLP 2022 • 62 citations
David Nukrai, Ron Mokady, Amir Globerson
NLX-GPT: A Model For Natural Language Explanations In Vision And Vision-language Tasks (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 43 citations
Fawaz Sammani, Tanmoy Mukherjee, Nikos Deligiannis
Video Graph Transformer For Video Question Answering (2022) • Lecture Notes in Computer Science • 66 citations
Xiao et al.
Reclip: A Strong Zero-shot Baseline For Referring Expression Comprehension (2022) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 94 citations
Subramanian et al.
Generalized Decoding For Pixel, Image, And Language (2022) • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Zou et al.
Counterfactual Reasoning For Out-of-distribution Multimodal Sentiment Analysis (2022) • Proceedings of the 30th ACM International Conference on Multimedia • 50 citations
Sun et al.
Bridging The Gap Between Learning In Discrete And Continuous Environments For Vision-and-language Navigation (2022) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 53 citations
Hong et al.
Photorealistic Text-to-image Diffusion Models With Deep Language Understanding (2022) • Arxiv • 2091 citations
Saharia et al.
Relation-aware Instance Refinement For Weakly Supervised Visual Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 46 citations
Liu et al.
A Good Prompt Is Worth Millions Of Parameters: Low-resource Prompt-based Learning For Vision-language Models (2021) • Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 66 citations
Jin et al.
Context-aware Biaffine Localizing Network For Temporal Sentence Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 128 citations
Liu et al.
Locate Then Segment: A Strong Pipeline For Referring Image Segmentation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 125 citations
Jing et al.
Lightningdot: Pre-training Visual-semantic Embeddings For Real-time Image-text Retrieval (2021) • Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 79 citations
Sun et al.
Look Before You Leap: Learning Landmark Features For One-stage Visual Grounding (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 111 citations
Huang et al.
Seeing Out Of The Box: End-to-end Pre-training For Vision-language Representation Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 205 citations
Huang et al.
Styleclip: Text-driven Manipulation Of Stylegan Imagery (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 87 citations
Patashnik et al.
Episodic Transformer For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Alexander Pashevich, Cordelia Schmid, Chen Sun
M6: A Chinese Multimodal Pretrainer (2021) • Arxiv • 48 citations
Lin et al.
Swinbert: End-to-end Transformers With Sparse Attention For Video Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 208 citations
Lin et al.
Taco: Token-aware Cascade Contrastive Learning For Video-text Alignment (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 102 citations
Jianwei Yang, Yonatan Bisk, Jianfeng Gao
Wenlan: Bridging Vision And Language By Large-scale Multi-modal Pre-training (2021) • Arxiv • 85 citations
Huo et al.
LAVT: Language-aware Vision Transformer For Referring Image Segmentation (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 260 citations
Yang et al.
Hierarchical Cross-modal Agent For Robotics Vision-and-language Navigation (2021) • 2021 IEEE International Conference on Robotics and Automation (ICRA) • 45 citations
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
An Empirical Study Of GPT-3 For Few-shot Knowledge-based VQA (2021) • Arxiv • 46 citations
Yang et al.
Bottom Up Top Down Detection Transformers For Language Grounding In Images And Point Clouds (2021) • Lecture Notes in Computer Science • 54 citations
Jain et al.
Causal Attention For Vision-language Tasks (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Yang et al.
Step-wise Hierarchical Alignment Network For Image-text Matching (2021) • Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence • 97 citations
Zhong Ji, Kexin Chen, Haoran Wang
Vision-language Navigation With Random Environmental Mixup (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 70 citations
Liu et al.
A Survey Of Visual Transformers (2021) • IEEE Transactions on Neural Networks and Learning Systems • 285 citations
Liu et al.
Scaling Up Visual And Vision-language Representation Learning With Noisy Text Supervision (2021) • International Conference on Machine Learning 2021 • 1191 citations
Jia et al.
Open-domain, Content-based, Multi-modal Fact-checking Of Out-of-context Images Via Online Resources (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 60 citations
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
Ibot: Image BERT Pre-training With Online Tokenizer (2021) • Arxiv • 207 citations
Zhou et al.
Neighbor-view Enhanced Model For Vision And Language Navigation (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 61 citations
An et al.
Docformer: End-to-end Transformer For Document Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 200 citations
Appalaraju et al.
NÜWA: Visual Synthesis Pre-training For Neural Visual World Creation (2021) • Lecture Notes in Computer Science • 119 citations
Wu et al.
How Much Can CLIP Benefit Vision-and-language Tasks? (2021) • Arxiv • 152 citations
Shen et al.
Describing And Localizing Multiple Changes With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 76 citations
Qiu et al.
Simple And Effective Zero-shot Cross-lingual Phoneme Recognition (2021) • Lecture Notes in Computer Science • 155 citations
Qiantong Xu, Alexei Baevski, Michael Auli
Learning To Prompt For Vision-language Models (2021) • International Journal of Computer Vision • 1953 citations
Zhou et al.
Learning Transferable Visual Models From Natural Language Supervision (2021) • Arxiv • 5297 citations
Radford et al.
Vlmo: Unified Vision-language Pre-training With Mixture-of-modality-experts (2021) • Arxiv • 288 citations
Bao et al.
Screen2words: Automatic Mobile UI Summarization With Multimodal Learning (2021) • The 34th Annual ACM Symposium on User Interface Software and Technology • 78 citations
Wang et al.
Less Is More: Clipbert For Video-and-language Learning Via Sparse Sampling (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 513 citations
Lei et al.
Cliport: What And Where Pathways For Robotic Manipulation (2021) • Arxiv • 98 citations
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Actionclip: A New Paradigm For Video Action Recognition (2021) • Arxiv • 189 citations
Mengmeng Wang, Jiazheng Xing, Yong Liu
Latr: Layout-aware Transformer For Scene-text VQA (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 67 citations
Biten et al.
Crossclr: Cross-modal Contrastive Learning For Multi-modal Video Representations (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 121 citations
Zolfaghari et al.
Everything At Once -- Multi-modal Fusion Transformer For Video Retrieval (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 112 citations
Shvetsova et al.
Multi-grained Vision Language Pre-training: Aligning Texts With Visual Concepts (2021) • Arxiv • 96 citations
Yan Zeng, Xinsong Zhang, Hang Li
End-to-end Referring Video Object Segmentation With Multimodal Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Adam Botach, Evgenii Zheltonozhskii, Chaim Baskin
MERLOT: Multimodal Neural Script Knowledge Models (2021) • Arxiv • 54 citations
Zellers et al.
On Pursuit Of Designing Multi-modal Transformer For Video Grounding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 48 citations
Cao et al.
Human-like Controllable Image Captioning With Verb-specific Semantic Roles (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 62 citations
Chen et al.
Zero-shot Cross-lingual Transfer Of Neural Machine Translation With Multilingual Pretrained Encoders (2021) • Lecture Notes in Computer Science • 51 citations
Chen et al.
Visualgpt: Data-efficient Adaptation Of Pretrained Language Models For Image Captioning (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 133 citations
Chen et al.
Zero-shot Text-to-image Generation (2021) • Arxiv • 1118 citations
Ramesh et al.
Align Before Fuse: Vision And Language Representation Learning With Momentum Distillation (2021) • Arxiv • 820 citations
Li et al.
GLIDE: Towards Photorealistic Image Generation And Editing With Text-guided Diffusion Models (2021) • Arxiv • 995 citations
Nichol et al.
Kaleido-bert: Vision-language Pre-training On Fashion Domain (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 105 citations
Zhuge et al.
Instancerefer: Cooperative Holistic Understanding For Visual Grounding On Point Clouds Through Instance Multi-level Contextual Referring (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 99 citations
Yuan et al.
Unifying Vision-and-language Tasks Via Text Generation (2021) • Arxiv • 145 citations
Cho et al.
Adversarial VQA: A New Benchmark For Evaluating The Robustness Of VQA Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 42 citations
Li et al.
VLM: Task-agnostic Video-language Model Pre-training For Video Understanding (2021) • Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 • 56 citations
Xu et al.
Layoutxlm: Multimodal Pre-training For Multilingual Visually-rich Document Understanding (2021) • Arxiv • 48 citations
Xu et al.
Videoclip: Contrastive Pre-training For Zero-shot Video-text Understanding (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 341 citations
Xu et al.
Towards Corruption-agnostic Robust Domain Adaptation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Xu et al.
E2E-VLP: End-to-end Vision-language Pre-training Enhanced By Visual Learning (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 78 citations
Xu et al.
ROSITA: Enhancing Vision-and-language Semantic Alignments Via Cross- And Intra-modal Knowledge Integration (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 41 citations
Cui et al.
Transvg: End-to-end Visual Grounding With Transformers (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 285 citations
Deng et al.
Scheduled Sampling In Vision-language Pretraining With Decoupled Encoder-decoder Network (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 46 citations
Li et al.
Similarity Reasoning And Filtration For Image-text Matching (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 319 citations
Diao et al.
Peco: Perceptual Codebook For BERT Pre-training Of Vision Transformers (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 83 citations
Dong et al.
An Empirical Study Of Training End-to-end Vision-and-language Transformers (2021) • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 244 citations
Dou et al.
Clip4caption ++: Multi-clip For Video Caption (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 113 citations
Tang et al.
Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 68 citations
Salvador et al.
Image Captioning For Effective Use Of Language Models In Knowledge-based Visual Question Answering (2021) • Expert Systems with Applications • 48 citations
Salaberria et al.
Clipscore: A Reference-free Evaluation Metric For Image Captioning (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 720 citations
Hessel et al.
MDETR -- Modulated Detection For End-to-end Multi-modal Understanding (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 594 citations
Kamath et al.
BROS: A Pre-trained Language Model Focusing On Text And Layout For Better Key Information Extraction From Documents (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 105 citations
Hong et al.
Prompting Visual-language Models For Efficient Video Understanding (2021) • Lecture Notes in Computer Science • 246 citations
Ju et al.
Text Is NOT Enough: Integrating Visual Impressions Into Open-domain Dialogue Generation (2021) • ACM Computing Surveys • 151 citations
Shen et al.
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation (2021) • Proceedings of the 29th ACM International Conference on Multimedia • 153 citations
Zaid Khan, Yun Fu
Natural Language Video Localization With Learnable Moment Proposals (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 42 citations
Xiao et al.
Vinvl: Revisiting Visual Representations In Vision-language Models (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 640 citations
Zhang et al.
Greedy Gradient Ensemble For Robust Visual Question Answering (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 82 citations
Han et al.
LPF: A Language-prior Feedback Objective Function For De-biased Visual Question Answering (2021) • Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval • 45 citations
Zujie Liang, Haifeng Hu, Jiaying Zhu
E-vil: A Dataset And Benchmark For Natural Language Explanations In Vision-language Tasks (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
Kayser et al.
Tip-adapter: Training-free Clip-adapter For Better Vision-language Modeling (2021) • Arxiv • 128 citations
Zhang et al.
Transrefer3d: Entity-and-relation Aware Transformer For Fine-grained 3D Visual Grounding (2021) • MM '21: ACM Multimedia Conference • 65 citations
He et al.
Does CLIP Benefit Visual Question Answering In The Medical Domain As Much As It Does In The General Domain? (2021) • Arxiv • 41 citations
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
Structurallm: Structural Pre-training For Form Understanding (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 61 citations
Li et al.
Clip2video: Mastering Video-text Retrieval Via Image CLIP (2021) • Arxiv • 130 citations
Fang et al.
Compressing Visual-linguistic Model Via Knowledge Distillation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 50 citations
Fang et al.
Structext: Structured Text Understanding With Multi-modal Transformers (2021) • MM '21: ACM Multimedia Conference • 89 citations
Li et al.
Cross-modal Contrastive Learning For Text-to-image Generation (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 297 citations
Zhang et al.
Ocr-free Document Understanding Transformer (2021) • Lecture Notes in Computer Science • 194 citations
Kim et al.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-image Pre-training Paradigm (2021) • Arxiv • 126 citations
Li et al.
Clipdraw: Exploring Text-to-drawing Synthesis Through Language-image Encoders (2021) • Arxiv • 92 citations
Kevin Frans, L. B. Soros, Olaf Witkowski
Exploring The Limits Of Out-of-distribution Detection (2021) • Arxiv • 107 citations
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan
CM-NAS: Cross-modality Neural Architecture Search For Visible-infrared Person Re-identification (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 150 citations
Fu et al.
Attend What You Need: Motion-appearance Synergistic Networks For Video Question Answering (2021) • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) • 57 citations
Seo et al.
Stylegan-nada: Clip-guided Domain Adaptation Of Image Generators (2021) • Arxiv • 65 citations
Gal et al.
Clip4clip: An Empirical Study Of CLIP For End To End Video Clip Retrieval (2021) • Arxiv • 113 citations
Luo et al.
Newsclippings: Automatic Generation Of Out-of-context Multimodal Media (2021) • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing • 46 citations
Grace Luo, Trevor Darrell, Anna Rohrbach
Clip-adapter: Better Vision-language Models With Feature Adapters (2021) • International Journal of Computer Vision • 617 citations
Gao et al.
Contextual Non-local Alignment Over Full-scale Representation For Text-based Person Search (2021) • Arxiv • 61 citations
Gao et al.
The Road To Know-where: An Object-and-room Informed Sequential BERT For Indoor Vision-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 66 citations
Qi et al.
INVIGORATE: Interactive Visual Grounding And Grasping In Clutter (2021) • Robotics: Science and Systems XVII • 45 citations
Zhang et al.
LAION-400M: Open Dataset Of Clip-filtered 400 Million Image-text Pairs (2021) • Arxiv • 366 citations
Schuhmann et al.
Synthesis Of Compositional Animations From Textual Descriptions (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 140 citations
Ghosh et al.
Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 151 citations
Liu et al.
VX2TEXT: End-to-end Learning Of Video-based Text Generation From Multimodal Inputs (2021) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 50 citations
Lin et al.
Visualmrc: Machine Reading Comprehension On Document Images (2021) • Proceedings of the AAAI Conference on Artificial Intelligence • 62 citations
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
Going Full-tilt Boogie On Document Understanding With Text-image-layout Transformer (2021) • Lecture Notes in Computer Science • 95 citations
Powalski et al.
Open-vocabulary Object Detection Via Vision And Language Knowledge Distillation (2021) • ICLR 2022 • 280 citations
Gu et al.
CPT: Colorful Prompt Tuning For Pre-trained Vision-language Models (2021) • AI Open • 62 citations
Yao et al.
FILIP: Fine-grained Interactive Language-image Pre-training (2021) • Arxiv • 205 citations
Yao et al.
Airbert: In-domain Pretraining For Vision-and-language Navigation (2021) • 2021 IEEE/CVF International Conference on Computer Vision (ICCV) • 118 citations
Guhur et al.
KAT: A Knowledge Augmented Transformer For Vision-and-language (2021) • Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies • 68 citations
Gui et al.
Sub-instruction Aware Vision-and-language Navigation (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 53 citations
Hong et al.
A Recurrent Vision-and-language BERT For Navigation (2020) • Arxiv • 40 citations
Hong et al.
Finding The Evidence: Localization-aware Answer Prediction For Text Visual Question Answering (2020) • Proceedings of the 28th International Conference on Computational Linguistics • 44 citations
Wei Han, Hantao Huang, Tao Han
Normalized And Geometry-aware Self-attention Network For Image Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 246 citations
Guo et al.
Graph Structured Network For Image-text Matching (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 260 citations
Liu et al.
Roses Are Red, Violets Are Blue... But Should VQA Expect Them To? (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Kervadec et al.
Babywalk: Going Farther In Vision-and-language Navigation By Taking Baby Steps (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 59 citations
Zhu et al.
Video2commonsense: Generating Commonsense Descriptions To Enrich Video Captioning (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 59 citations
Fang et al.
More Grounded Image Captioning By Distilling Image-text Matching Model (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 129 citations
Zhou et al.
Object-and-action Aware Model For Visual Language Navigation (2020) • Lecture Notes in Computer Science • 95 citations
Qi et al.
Weakly-supervised Multi-level Attentional Reconstruction Network For Grounding Textual Queries In Videos (2020) • Arxiv • 51 citations
Song et al.
Towards Learning A Generic Agent For Vision-and-language Navigation Via Pre-training (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 221 citations
Hao et al.
Human-centric Spatio-temporal Video Grounding With Visual Transformers (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 75 citations
Tang et al.
VQA-LOL: Visual Question Answering Under The Lens Of Logic (2020) • Lecture Notes in Computer Science • 73 citations
Gokhale et al.
MUTANT: A Training Paradigm For Out-of-distribution Generalization In Visual Question Answering (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 134 citations
Gokhale et al.
Unshuffling Data For Improved Generalization (2020) • Arxiv • 41 citations
Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
Creating Something From Nothing: Unsupervised Knowledge Distillation For Cross-modal Hashing (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 127 citations
Hu et al.
Visual Relation Grounding In Videos (2020) • Lecture Notes in Computer Science • 42 citations
Xiao et al.
Learning To Discretely Compose Reasoning Module Networks For Video Captioning (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 59 citations
Tan et al.
Language-guided Navigation Via Cross-modal Grounding And Alternate Adversarial Learning (2020) • IEEE Transactions on Circuits and Systems for Video Technology • 61 citations
Zhang et al.
Large-scale Adversarial Training For Vision-and-language Representation Learning (2020) • Arxiv • 287 citations
Gan et al.
Where Does It Exist: Spatio-temporal Video Grounding For Multi-form Sentences (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 94 citations
Zhang et al.
Spatially Aware Multimodal Transformers For Textvqa (2020) • Lecture Notes in Computer Science • 59 citations
Kant et al.
Univl: A Unified Video And Language Pre-training Model For Multimodal Understanding And Generation (2020) • Arxiv • 169 citations
Luo et al.
Multi-task Collaborative Network For Joint Referring Expression Comprehension And Segmentation (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 268 citations
Luo et al.
A Dataset And Baselines For Visual Question Answering On Art (2020) • Lecture Notes in Computer Science • 49 citations
Garcia et al.
Widget Captioning: Generating Natural Language Description For Mobile User Interface Elements (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 42 citations
Li et al.
M3P: Learning Universal Representations Via Multitask Multilingual Multimodal Pre-training (2020) • Arxiv • 43 citations
Ni et al.
Diagnosing The Environment Bias In Vision-and-language Navigation (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 44 citations
Yubo Zhang, Hao Tan, Mohit Bansal
In Defense Of Grid Features For Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 351 citations
Jiang et al.
Self-supervised Multimodal Versatile Networks (2020) • Arxiv • 195 citations
Alayrac et al.
Fashion Captioning: Towards Generating Accurate Descriptions With Semantic Rewards (2020) • Lecture Notes in Computer Science • 61 citations
Yang et al.
Imagebert: Cross-modal Pre-training With Large-scale Weak-supervised Image-text Data (2020) • Arxiv • 154 citations
Qi et al.
Spatial-temporal Multi-cue Network For Continuous Sign Language Recognition (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 184 citations
Zhou et al.
Referring Expression Comprehension: A Survey Of Methods And Datasets (2020) • IEEE Transactions on Multimedia • 81 citations
Yanyuan Qiao, Chaorui Deng, Qi Wu
Improving Certified Robustness Via Statistical Learning With Logical Reasoning (2020) • Lecture Notes in Computer Science • 222 citations
Yang et al.
Actbert: Learning Global-local Video-text Representations (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 392 citations
Linchao Zhu, Yi Yang
Show, Recall, And Tell: Image Captioning With Recall Mechanism (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 40 citations
Wang et al.
VD-BERT: A Unified Vision And Dialog Transformer With BERT (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 43 citations
Wang et al.
On The General Value Of Evidence, And Bilingual Scene-text Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 55 citations
Wang et al.
Active Visual Information Gathering For Vision-language Navigation (2020) • Lecture Notes in Computer Science • 65 citations
Wang et al.
Behind The Scene: Revealing The Secrets Of Pre-trained Vision-and-language Models (2020) • Lecture Notes in Computer Science • 43 citations
Cao et al.
Topological Planning With Transformers For Vision-and-language Navigation (2020) • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 69 citations
Chen et al.
Adaptive Offline Quintuplet Loss For Image-text Matching (2020) • Lecture Notes in Computer Science • 67 citations
Tianlang Chen, Jiajun Deng, Jiebo Luo
Counterfactual Samples Synthesizing For Robust Visual Question Answering (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 335 citations
Chen et al.
IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 391 citations
Chen et al.
Expressing Objects Just Like Words: Recurrent Visual Embedding For Image-text Matching (2020) • Proceedings of the AAAI Conference on Artificial Intelligence • 65 citations
Tianlang Chen, Jiebo Luo
Fine-grained Video-text Retrieval With Hierarchical Graph Reasoning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 306 citations
Chen et al.
Look Closer To Ground Better: Weakly-supervised Temporal Grounding Of Sentence In Video (2020) • Arxiv • 48 citations
Chen et al.
Learning Modality Interaction For Temporal Sentence Localization And Event Captioning In Videos (2020) • Lecture Notes in Computer Science • 89 citations
Chen et al.
Fine-grained Visual Textual Alignment For Cross-modal Retrieval Using Transformer Encoders (2020) • ACM Transactions on Multimedia Computing, Communications, and Applications • 118 citations
Messina et al.
Transformer Reasoning Network For Image-text Matching And Retrieval (2020) • 2020 25th International Conference on Pattern Recognition (ICPR) • 45 citations
Messina et al.
Oscar: Object-semantics Aligned Pre-training For Vision-language Tasks (2020) • Lecture Notes in Computer Science • 1428 citations
Li et al.
SEA: Sentence Encoder Assembly For Video Retrieval By Textual Queries (2020) • IEEE Transactions on Multimedia • 53 citations
Li et al.
Graph-structured Referring Expression Reasoning In The Wild (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 96 citations
Sibei Yang, Guanbin Li, Yizhou Yu
Bridging Text And Video: A Universal Multimodal Transformer For Video-audio Scene-aware Dialog (2020) • IEEE/ACM Transactions on Audio, Speech, and Language Processing • 52 citations
Li et al.
VIOLIN: A Large-scale Dataset For Video-and-language Inference (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 56 citations
Liu et al.
Room-across-room: Multilingual Vision-and-language Navigation With Dense Spatiotemporal Grounding (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 190 citations
Ku et al.
Interbert: Vision-and-language Interaction For Multi-modal Pretraining (2020) • Arxiv • 56 citations
Lin et al.
Jointly Cross- And Self-modal Graph Attention Network For Query-based Moment Localization (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 108 citations
Liu et al.
X-LXMERT: Paint, Caption And Answer Questions With Multi-modal Transformers (2020) • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) • 58 citations
Cho et al.
A Multimodal Framework For The Detection Of Hateful Memes (2020) • PMLR 133:344-360, 2021 • 46 citations
Lippe et al.
Improving Vision-and-language Navigation With Image-text Pairs From The Web (2020) • Lecture Notes in Computer Science • 46 citations
Majumdar et al.
Unsupervised Multimodal Neural Machine Translation With Pseudo Visual Pivoting (2020) • Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics • 42 citations
Huang et al.
Reinforcement Learning For Weakly Supervised Temporal Grounding Of Natural Language In Untrimmed Videos (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 69 citations
Wu et al.
Dual-mode ASR: Unify And Improve Streaming ASR With Full-context Modeling (2020) • IEEE Transactions on Multimedia • 50 citations
Yu et al.
Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) • Lecture Notes in Computer Science • 78 citations
Ma et al.
Layoutlmv2: Multi-modal Pre-training For Visually-rich Document Understanding (2020) • Arxiv • 57 citations
Xu et al.
Chart-to-text: Generating Natural Language Descriptions For Charts By Adapting The Transformer Model (2020) • Proceedings of the 13th International Conference on Natural Language Generation • 46 citations
Jason Obeid, Enamul Hoque
Deep Multimodal Neural Architecture Search (2020) • Proceedings of the 28th ACM International Conference on Multimedia • 87 citations
Yu et al.
Ernie-vil: Knowledge Enhanced Vision-language Representations Through Scene Graph (2020) • Arxiv • 118 citations
Yu et al.
Object Relational Graph With Teacher-recommended Learning For Video Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 310 citations
Zhang et al.
Overcoming Language Priors With Self-supervised Learning For Visual Question Answering (2020) • Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence • 108 citations
Zhu et al.
Vision-dialog Navigation By Exploring Cross-modal Memory (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 51 citations
Zhu et al.
Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) • Arxiv • 47 citations
Dodds et al.
Transform And Tell: Entity-aware News Image Captioning (2020) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 81 citations
Alasdair Tran, Alexander Mathews, Lexing Xie
Self-monitoring Navigation Agent Via Auxiliary Progress Estimation (2019) • Arxiv • 134 citations
Ma et al.
Clevr-dialog: A Diagnostic Dataset For Multi-round Reasoning In Visual Dialog (2019) • Arxiv • 49 citations
Kottur et al.
Mirrorgan: Learning Text-to-image Generation By Redescription (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 589 citations
Qiao et al.
Multimodal Transformer With Multi-view Visual Representation For Image Captioning (2019) • IEEE Transactions on Circuits and Systems for Video Technology • 358 citations
Yu et al.
Polysemous Visual-semantic Embedding For Cross-modal Retrieval (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 243 citations
Yale Song, Mohammad Soleymani
Visually Grounded Neural Syntax Acquisition (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 68 citations
Shi et al.
Deep Modular Co-attention Networks For Visual Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 862 citations
Yu et al.
Videobert: A Joint Model For Video And Language Representation Learning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 1077 citations
Sun et al.
Information Maximizing Visual Question Generation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 90 citations
Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Variational Mixture-of-experts Autoencoders For Multi-modal Deep Generative Models (2019) • Arxiv • 88 citations
Shi et al.
Cascaded Revision Network For Novel Object Captioning (2019) • IEEE Transactions on Circuits and Systems for Video Technology • 40 citations
Feng et al.
Tactical Rewind: Self-correction Via Backtracking In Vision-and-language Navigation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Ke et al.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 545 citations
Marino et al.
Semantic Object Accuracy For Generative Text-to-image Synthesis (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 146 citations
Tobias Hinz, Stefan Heinrich, Stefan Wermter
Expressing Visual Relationships Via Language (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 42 citations
Tan et al.
The Regretful Agent: Heuristic-aided Navigation Through Progress Estimation (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 168 citations
Ma et al.
Understanding Natural Language Instructions For Fetching Daily Objects Using Gan-based Multimodal Target-source Classification (2019) • IEEE Robotics and Automation Letters • 40 citations
Magassouba et al.
Heterogeneous Memory Enhanced Multimodal Attention Model For Video Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 264 citations
Fan et al.
Transferable Representation Learning In Vision-and-language Navigation (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 85 citations
Huang et al.
LXMERT: Learning Cross-modality Encoder Representations From Transformers (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 1297 citations
Hao Tan, Mohit Bansal
Meshed-memory Transformer For Image Captioning (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 861 citations
Cornia et al.
Context-aware Visual Policy Network For Fine-grained Image Captioning (2019) • IEEE Transactions on Pattern Analysis and Machine Intelligence • 151 citations
Zha et al.
Spatio-temporal Dynamics And Semantic Attribute Enriched Visual Encoding For Video Captioning (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 234 citations
Aafaq et al.
Unicoder-VL: A Universal Encoder For Vision And Language By Cross-modal Pre-training (2019) • Arxiv • 117 citations
Li et al.
Visual Entailment: A Novel Task For Fine-grained Image Understanding (2019) • Arxiv • 162 citations
Xie et al.
VisualBERT: A Simple And Performant Baseline For Vision And Language (2019) • Arxiv • 1227 citations
Li et al.
Robust Navigation With Language Pretraining And Stochastic Sampling (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 95 citations
Li et al.
Object-driven Text-to-image Synthesis Via Adversarial Training (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 309 citations
Li et al.
Cross-modal Interaction Networks For Query-based Moment Retrieval In Videos (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 238 citations
Zhang et al.
Fusion Of Detected Objects In Text For Visual Question Answering (2019) • Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) • 41 citations
Alberti et al.
Unified Vision-language Pre-training For Image Captioning And VQA (2019) • Arxiv • 74 citations
Zhou et al.
Controllable Dual Skew Divergence Loss For Neural Machine Translation (2019) • Arxiv • 79 citations
Li et al.
Vision-and-dialog Navigation (2019) • Arxiv • 118 citations
Thomason et al.
ViLBERT: Pretraining Task-agnostic Visiolinguistic Representations For Vision-and-language Tasks (2019) • Arxiv • 1672 citations
Lu et al.
Improving Referring Expression Grounding With Cross-modal Attention-guided Erasing (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 189 citations
Liu et al.
Saliency-guided Attention Network For Image-sentence Matching (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 105 citations
Ji et al.
RUBi: Reducing Unimodal Biases In Visual Question Answering (2019) • Advances in Neural Information Processing Systems 2019 (pp. 839-850) • 205 citations
Cadene et al.
CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 101 citations
Liu et al.
Multimodal Transformer Networks For End-to-end Video-grounded Dialogue Systems (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 51 citations
Le et al.
Self-critical Reasoning For Robust Visual Question Answering (2019) • Arxiv • 91 citations
Jialin Wu, Raymond J. Mooney
CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 339 citations
Wang et al.
Dynamic Graph Attention For Referring Expression Comprehension (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 226 citations
Sibei Yang, Guanbin Li, Yizhou Yu
VATEX: A Large-scale, High-quality Multilingual Dataset For Video-and-language Research (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 326 citations
Wang et al.
Weakly-supervised Spatio-temporally Grounding Natural Sentence In Video (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 101 citations
Chen et al.
Watch, Listen And Tell: Multi-modal Weakly Supervised Dense Event Captioning (2019) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 73 citations
Tanzila Rahman, Bicheng Xu, Leonid Sigal
Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) • Proceedings of the 27th ACM International Conference on Multimedia • 145 citations
Wang et al.
UNITER: Universal Image-text Representation Learning (2019) • Arxiv • 183 citations
Chen et al.
Iterative Answer Prediction With Pointer-augmented Multimodal Transformers For TextVQA (2019) • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 140 citations
Hu et al.
Quantifying And Alleviating The Language Prior Problem In Visual Question Answering (2019) • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval • 40 citations
Guo et al.
Are You Looking? Grounding To Multiple Modalities In Vision-and-language Navigation (2019) • Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics • 74 citations
Hu et al.
LayoutLM: Pre-training Of Text And Layout For Document Image Understanding (2019) • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining • 478 citations
Xu et al.
VL-BERT: Pre-training Of Generic Visual-linguistic Representations (2019) • Arxiv • 782 citations
Su et al.
GQA: A New Dataset For Real-world Visual Reasoning And Compositional Question Answering (2019) • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 132 citations
Drew A. Hudson, Christopher D. Manning
Compositional Attention Networks For Machine Reasoning (2018) • Arxiv • 132 citations
Drew A. Hudson, Christopher D. Manning
Speaker-follower Models For Vision-and-language Navigation (2018) • Arxiv • 244 citations
Fried et al.
Multimodal Explanations: Justifying Decisions And Pointing To The Evidence (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 101 citations
Park et al.
Learning A Text-video Embedding From Incomplete And Heterogeneous Data (2018) • Arxiv • 174 citations
Antoine Miech, Ivan Laptev, Josef Sivic
Look Before You Leap: Bridging Model-free And Model-based Reinforcement Learning For Planned-ahead Vision-and-language Navigation (2018) • Lecture Notes in Computer Science • 194 citations
Wang et al.
Neural Baby Talk (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 442 citations
Lu et al.
Context Models For OOV Word Translation In Low-resource Languages (2018) • Proceedings of the 26th ACM international conference on Multimedia • 120 citations
Angli Liu, Katrin Kirchhoff
Watch, Listen, And Describe: Globally And Locally Aligned Cross-modal Attentions For Video Captioning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 51 citations
Xin Wang, Yuan-Fang Wang, William Yang Wang
Neural-symbolic VQA: Disentangling Reasoning From Vision And Language Understanding (2018) • Arxiv • 233 citations
Yi et al.
Object Counts! Bringing Explicit Detections Back Into Image Captioning (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) • 42 citations
Josiah Wang, Pranava Madhyastha, Lucia Specia
Multimodal Dual Attention Memory For Video Story Question Answering (2018) • Lecture Notes in Computer Science • 73 citations
Kim et al.
Seq2seq2sentiment: Multimodal Sequence To Sequence Models For Sentiment Analysis (2018) • Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML) • 65 citations
Pham et al.
GLAC Net: Glocal Attention Cascading Networks For Multi-image Cued Story Generation (2018) • Arxiv • 53 citations
Kim et al.
Nocaps: Novel Object Captioning At Scale (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 233 citations
Agrawal et al.
Tell, Draw, And Repeat: Generating And Modifying Images Based On Continual Linguistic Instruction (2018) • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) • 54 citations
El-Nouby et al.
On The Effectiveness Of Task Granularity For Transfer Learning (2018) • Arxiv • 50 citations
Mahdisoltani et al.
Image Captioning At Will: A Versatile Scheme For Effectively Injecting Sentiments Into Image Descriptions (2018) • Arxiv • 47 citations
Quanzeng You, Hailin Jin, Jiebo Luo
Visual Question Reasoning On General Dependency Tree (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 42 citations
Cao et al.
A Joint Sequence Fusion Model For Video Question Answering And Retrieval (2018) • Lecture Notes in Computer Science • 331 citations
Youngjae Yu, Jongseok Kim, Gunhee Kim
MAttNet: Modular Attention Network For Referring Expression Comprehension (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 788 citations
Yu et al.
Women Also Snowboard: Overcoming Bias In Captioning Models (2018) • Lecture Notes in Computer Science • 387 citations
Burns et al.
Decoupled Novel Object Captioner (2018) • Proceedings of the 26th ACM international conference on Multimedia • 70 citations
Wu et al.
Show, Tell And Discriminate: Image Captioning By Self-retrieval With Partially Labeled Data (2018) • Lecture Notes in Computer Science • 83 citations
Liu et al.
VQA-E: Explaining, Elaborating, And Enhancing Your Answers For Visual Questions (2018) • Lecture Notes in Computer Science • 51 citations
Li et al.
Visual Referring Expression Recognition: What Do Systems Actually Learn? (2018) • Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) • 52 citations
Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick
Visual Coreference Resolution In Visual Dialog Using Neural Module Networks (2018) • Lecture Notes in Computer Science • 177 citations
Kottur et al.
Tell-and-answer: Towards Explainable Visual Question Answering Using Attributes And Captions (2018) • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing • 59 citations
Li et al.
Jointly Localizing And Describing Events For Dense Video Captioning (2018) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 168 citations
Li et al.
ChatPainter: Improving Text To Image Generation Using Dialogue (2018) • Arxiv • 78 citations
Sharma et al.
Context-aware Captions From Context-agnostic Supervision (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 109 citations
Vedantam et al.
Semantic Image Synthesis Via Adversarial Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 264 citations
Dong et al.
Dense-captioning Events In Videos (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 50 citations
Krishna et al.
FOIL It! Find One Mismatch Between Image And Language Caption (2017) • Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) • 48 citations
Shekhar et al.
Exploring Human-like Attention Supervision In Visual Question Answering (2017) • Arxiv • 45 citations
Tingting Qiao, Jianfeng Dong, Duanqing Xu
Paying Attention To Descriptions Generated By Image Captioning Models (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 81 citations
Tavakoli et al.
Visual Reference Resolution Using Attention Memory For Visual Dialog (2017) • Arxiv • 90 citations
Seo et al.
TALL: Temporal Activity Localization Via Language Query (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 768 citations
Gao et al.
Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 289 citations
Gu et al.
Incorporating Global Visual Features Into Attention-based Neural Machine Translation (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 150 citations
Iacer Calixto, Qun Liu, Nick Campbell
TGIF-QA: Toward Spatio-temporal Reasoning In Visual Question Answering (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 435 citations
Jang et al.
Attention-based Multimodal Fusion For Video Description (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 387 citations
Hori et al.
I2T2I: Learning Text To Image Synthesis With Textual Data Augmentation (2017) • 2017 IEEE International Conference on Image Processing (ICIP) • 60 citations
Dong et al.
Incorporating External Knowledge To Answer Open-domain Visual Questions With Dynamic Memory Networks (2017) • Arxiv • 41 citations
Guohao Li, Hang Su, Wenwu Zhu
Identity-aware Textual-visual Matching With Latent Co-attention (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 259 citations
Li et al.
Towards Diverse And Natural Image Descriptions Via A Conditional GAN (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 455 citations
Dai et al.
Incorporating Copying Mechanism In Image Captioning For Learning Novel Objects (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 149 citations
Yao et al.
Video Captioning With Guidance Of Multimodal Latent Topics (2017) • Proceedings of the 25th ACM international conference on Multimedia • 66 citations
Chen et al.
Shapeworld - A New Test Methodology For Multimodal Language Understanding (2017) • Arxiv • 47 citations
Alexander Kuhnle, Ann Copestake
AttnGAN: Fine-grained Text To Image Generation With Attentional Generative Adversarial Networks (2017) • Arxiv • 158 citations
Xu et al.
Don't Just Assume; Look And Answer: Overcoming Priors For Visual Question Answering (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition • 40 citations
Agrawal et al.
Multi-modal Factorized Bilinear Pooling With Co-attention Learning For Visual Question Answering (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 694 citations
Yu et al.
Image-grounded Conversations: Multimodal Context For Natural Question And Response Generation (2017) • Arxiv • 117 citations
Mostafazadeh et al.
Fooling Vision And Language Models Despite Localization And Attention Mechanism (2017) • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) • 41 citations
Xu et al.
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling For Visual Question Answering (2017) • IEEE Transactions on Neural Networks and Learning Systems • 530 citations
Yu et al.
Rapid Adaptation With Conditionally Shifted Neurons (2017) • Arxiv • 106 citations
Munkhdalai et al.
OBJ2TEXT: Generating Visually Descriptive Language From Object Layouts (2017) • Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing • 59 citations
Xuwang Yin, Vicente Ordonez
Skeleton Key: Image Captioning By Skeleton-attribute Decomposition (2017) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 100 citations
Wang et al.
Speaking The Same Language: Matching Machine To Human Captions By Adversarial Training (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 238 citations
Shetty et al.
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning (2017) • 2017 IEEE International Conference on Computer Vision (ICCV) • 313 citations
Das et al.
Visual Question Answering: A Survey Of Methods And Datasets (2016) • Arxiv • 44 citations
Wu et al.
Generative Adversarial Text To Image Synthesis (2016) • Arxiv • 1422 citations
Reed et al.
A Focused Dynamic Attention Model For Visual Question Answering (2016) • Arxiv • 130 citations
Ilija Ilievski, Shuicheng Yan, Jiashi Feng
End-to-end Concept Word Detection For Video Captioning, Retrieval, And Question Answering (2016) • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 146 citations
Yu et al.
CLEVR: A Diagnostic Dataset For Compositional Language And Elementary Visual Reasoning (2016) • Arxiv • 46 citations
Johnson et al.
StackGAN: Text To Photo-realistic Image Synthesis With Stacked Generative Adversarial Networks (2016) • Arxiv • 227 citations
Zhang et al.
Image Captioning With Deep Bidirectional LSTMs (2016) • Proceedings of the 24th ACM international conference on Multimedia • 262 citations
Wang et al.
Learning To Generalize To New Compositions In Image Understanding (2016) • Arxiv • 53 citations
Atzmon et al.
Modeling Context In Referring Expressions (2016) • Lecture Notes in Computer Science • 895 citations
Yu et al.
Leveraging Visual Question Answering For Image-caption Ranking (2016) • Lecture Notes in Computer Science • 82 citations
Xiao Lin, Devi Parikh
Visual Genome: Connecting Language And Vision Using Crowdsourced Dense Image Annotations (2016) • International Journal of Computer Vision • 4911 citations
Krishna et al.
Dynamic Memory Networks For Visual And Textual Question Answering (2016) • Arxiv • 593 citations
Caiming Xiong, Stephen Merity, Richard Socher
Multimodal Compact Bilinear Pooling For Visual Question Answering And Visual Grounding (2016) • Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing • 1356 citations
Fukui et al.
Revisiting Visual Question Answering Baselines (2016) • Lecture Notes in Computer Science • 224 citations
Allan Jabri, Armand Joulin, Laurens van Der Maaten
Attentive Explanations: Justifying Decisions And Pointing To The Evidence (2016) • Arxiv • 55 citations
Park et al.
Diverse Beam Search: Decoding Diverse Solutions From Neural Sequence Models (2016) • Arxiv • 67 citations
Vijayakumar et al.
Zero-shot Visual Question Answering (2016) • Arxiv • 58 citations
Damien Teney, Anton van Den Hengel
Learning Deep Representations Of Fine-grained Visual Descriptions (2016) • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 769 citations
Reed et al.

