-
Can We Predict Before Executing Machine Learning Agents?
(2026)
• No Venue
Zheng et al.
-
Can LLMs Clean Up Your Mess? A Survey Of Application-ready Data Preparation With LLMs
(2026)
• No Venue
Zhou et al.
-
Toward Ultra-long-horizon Agentic Science: Cognitive Accumulation For Machine Learning Engineering
(2026)
• No Venue
Zhu et al.
-
Abc-bench: Benchmarking Agentic Backend Coding In Real-world Development
(2026)
• No Venue
Yang et al.
-
Toward Efficient Agents: Memory, Tool Learning, And Planning
(2026)
• No Venue
Yang et al.
-
Agent-as-a-judge
(2026)
• No Venue
You et al.
-
Towards Automated Kernel Generation In The Era Of LLMs
(2026)
• No Venue
Yu et al.
-
Infinitevggt: Visual Geometry Grounded Transformer For Endless Streams
(2026)
• No Venue
Yuan et al.
-
Dr. Zero: Self-evolving Search Agents Without Training Data
(2026)
• No Venue
Yue et al.
-
Glimprouter: Efficient Collaborative Inference By Glimpsing One Token Of Thoughts
(2026)
• No Venue
Zeng et al.
-
Evofsm: Controllable Self-evolution For Deep Research With Finite State Machines
(2026)
• No Venue
Zhang et al.
-
Confidence Estimation For LLMs In Multi-turn Interactions
(2026)
• No Venue
Zhang et al.
-
Arenarl: Scaling RL For Open-ended Agents Via Tournament-based Relative Ranking
(2026)
• No Venue
Zhang et al.
-
Deepplanning: Benchmarking Long-horizon Agentic Planning With Verifiable Constraints
(2026)
• No Venue
Zhang et al.
-
Expseek: Self-triggered Experience Seeking For Web Agents
(2026)
• No Venue
Zhang et al.
-
HERMES: KV Cache As Hierarchical Memory For Efficient Streaming Video Understanding
(2026)
• No Venue
Zhang et al.
-
PROGRESSLM: Towards Progress Reasoning In Vision-language Models
(2026)
• No Venue
Zhang et al.
-
Opennovelty: An LLM-powered Agentic System For Verifiable Scholarly Novelty Assessment
(2026)
• No Venue
Zhang et al.
-
Openvision 3: A Family Of Unified Visual Encoder For Both Understanding And Generation
(2026)
• No Venue
Zhang et al.
-
MOSS Transcribe Diarize: Accurate Transcription With Speaker Diarization
(2026)
• No Venue
Ai et al.
-
VIBE: Visual Instruction Based Editor
(2026)
• No Venue
Alekseenko et al.
-
Are LLMs Vulnerable To Preference-undermining Attacks (PUA)? A Factorial Analysis Methodology For Diagnosing The Trade-off Between Preference Alignment And Real-world Validity
(2026)
• No Venue
An et al.
-
Webgym: Scaling Training Environments For Visual Web Agents With Realistic Tasks
(2026)
• No Venue
Bai et al.
-
Entropy Sentinel: Continuous LLM Accuracy Monitoring From Decoding Entropy Traces In STEM
(2026)
• No Venue
Pedro Memoli Buffa, Luciano del Corro
-
Babyvision: Visual Reasoning Beyond Language
(2026)
• No Venue
Chen et al.
-
Futureomni: Evaluating Future Forecasting From Omni-modal Context For Multimodal LLMs
(2026)
• No Venue
Chen et al.
-
COMPASS: A Framework For Evaluating Organization-specific Policy Alignment In LLMs
(2026)
• No Venue
Choi et al.
-
K-EXAONE Technical Report
(2026)
• No Venue
Choi et al.
-
What Users Leave Unsaid: Under-specified Queries Limit Vision-language Models
(2026)
• No Venue
Choi et al.
-
Translategemma Technical Report
(2026)
• No Venue
Finkelstein et al.
-
Plenoptic Video Generation
(2026)
• No Venue
Fu et al.
-
Endless Terminals: Scaling RL Environments For Terminal Agents
(2026)
• No Venue
Gandhi et al.
-
Bizfinbench.v2: A Unified Dual-mode Bilingual Benchmark For Expert-level Financial Capability Alignment
(2026)
• No Venue
Guo et al.
-
LTX-2: Efficient Joint Audio-visual Foundation Model
(2026)
• No Venue
Hacohen et al.
-
Unicorn: Towards Self-improving Unified Multimodal Models Through Self-generated Supervision
(2026)
• No Venue
Han et al.
-
Gutenocr: A Grounded Vision-language Front-end For Documents
(2026)
• No Venue
Heidenreich et al.
-
Dancing In Chains: Strategic Persuasion In Academic Rebuttal Via Theory Of Mind
(2026)
• No Venue
Zhitao He, Zongwei Lyu, Yi R Fung
-
Qwen3-tts Technical Report
(2026)
• No Venue
Hu et al.
-
Mmdeepresearch-bench: A Benchmark For Multimodal Deep Research Agents
(2026)
• No Venue
Huang et al.
-
Thinking With Map: Reinforced Parallel Map-augmented Agent For Geolocalization
(2026)
• No Venue
Ji et al.
-
Avmeme Exam: A Multimodal Multilingual Multicultural Benchmark For LLMs' Contextual And Cultural Knowledge And Thinking
(2026)
• No Venue
Jiang et al.
-
Memory-v2v: Augmenting Video-to-video Diffusion Models With Memory
(2026)
• No Venue
Lee et al.
-
Lost In The Noise: How Reasoning Models Fail With Contextual Distractors
(2026)
• No Venue
Lee et al.
-
Qwen3-vl-embedding And Qwen3-vl-reranker: A Unified Framework For State-of-the-art Multimodal Retrieval And Ranking
(2026)
• No Venue
Li et al.
-
Dreamstyle: A Unified Framework For Video Stylization
(2026)
• No Venue
Li et al.
-
Agencybench: Benchmarking The Frontiers Of Autonomous Agents In 1M-token Real-world Contexts
(2026)
• No Venue
Li et al.
-
Rubrichub: A Comprehensive And Highly Discriminative Rubric Dataset Via Automated Coarse-to-fine Generation
(2026)
• No Venue
Li et al.
-
Toolprmbench: Evaluating And Advancing Process Reward Models For Tool-using Agents
(2026)
• No Venue
Li et al.
-
Scientific Image Synthesis: Benchmarking, Methodologies, And Downstream Utility
(2026)
• No Venue
Lin et al.
-
GDPO: Group Reward-decoupled Normalization Policy Optimization For Multi-reward RL Optimization
(2026)
• No Venue
Liu et al.
-
Skinflow: Efficient Information Transmission For Open Dermatological Diagnosis Via Dynamic Visual Encoding And Staged RL
(2026)
• No Venue
Liu et al.
-
Same Claim, Different Judgment: Benchmarking Scenario-induced Bias In Multilingual Financial Misinformation Detection
(2026)
• No Venue
Liu et al.
-
Numina-lean-agent: An Open And General Agentic Reasoning System For Formal Mathematics
(2026)
• No Venue
Liu et al.
-
Simplemem: Efficient Lifelong Memory For LLM Agents
(2026)
• No Venue
Liu et al.
-
Vidore V3: A Comprehensive Evaluation Of Retrieval Augmented Generation In Complex Real-world Scenarios
(2026)
• No Venue
Loison et al.
-
Beyond Static Tools: Test-time Tool Evolution For Scientific Reasoning
(2026)
• No Venue
Lu et al.
-
A Safety Report On GPT-5.2, Gemini 3 Pro, Qwen3-vl, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, And Seedream 4.5
(2026)
• No Venue
Ma et al.
-
Terminal-bench: Benchmarking Agents On Hard, Realistic Tasks In Command Line Interfaces
(2026)
• No Venue
Merrill et al.
-
A Bertology View Of LLM Orchestrations: Token- And Layer-selective Probes For Efficient Single-pass Classification
(2026)
• No Venue
Gonzalo Ariel Meyoyan, Luciano del Corro
-
Opendecoder: Open Large Language Model Decoding To Incorporate Document Quality In RAG
(2026)
• No Venue
Mo et al.
-
The Script Is All You Need: An Agentic Framework For Long-horizon Dialogue-to-cinematic Video Generation
(2026)
• No Venue
Mu et al.
-
Typhoon OCR: Open Vision-language Model For Thai Document Extraction
(2026)
• No Venue
Nonesung et al.
-
Benchmark^2: Systematic Evaluation Of LLM Benchmarks
(2026)
• No Venue
Qian et al.
-
Future Optical Flow Prediction Improves Robot Control & Video Generation
(2026)
• No Venue
Ranasinghe et al.
-
Danqing: An Up-to-date Large-scale Chinese Vision-language Pre-training Dataset
(2026)
• No Venue
Shen et al.
-
Envscaler: Scaling Tool-interactive Environments For LLM Agent Via Programmatic Synthesis
(2026)
• No Venue
Song et al.
-
Typhoon ASR Real-time: Fastconformer-transducer For Thai Automatic Speech Recognition
(2026)
• No Venue
Sirichotedumrong et al.
-
Rethinking Composed Image Retrieval Evaluation: A Fine-grained Benchmark From Image Editing
(2026)
• No Venue
Song et al.
-
When Personalization Misleads: Understanding And Mitigating Hallucinations In Personalized LLMs
(2026)
• No Venue
Sun et al.
-
World Craft: Agentic Framework To Create Visualizable Worlds Via Text
(2026)
• No Venue
Sun et al.
-
Lightonocr: A 1B End-to-end Multilingual Vision-language Model For State-of-the-art OCR
(2026)
• No Venue
Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin
-
Talk2move: Reinforcement Learning For Text-instructed Object-level Geometric Transformation In Scenes
(2026)
• No Venue
Tan et al.
-
Memoryrewardbench: Benchmarking Reward Models For Long-term Memory Management In Large Language Models
(2026)
• No Venue
Tang et al.
-
Kv-embedding: Training-free Text Embedding Via Internal KV Re-routing In Decoder-only LLMs
(2026)
• No Venue
Yixuan Tang, Yi Yang
-
Liberty: A Causal Framework For Benchmarking Concept-based Explanations Of LLMs With Structural Counterfactuals
(2026)
• No Venue
Toker et al.
-
Cof-t2i: Video Models As Pure Visual Reasoners For Text-to-image Generation
(2026)
• No Venue
Tong et al.
-
Inference-time Scaling Of Verification: Self-evolving Deep Research Agents Via Test-time Rubric-guided Verification
(2026)
• No Venue
Wan et al.
-
The Illusion Of Specialization: Unveiling The Domain-invariant "standing Committee" In Mixture-of-experts Models
(2026)
• No Venue
Wang et al.
-
Swe-pruner: Self-adaptive Context Pruning For Coding Agents
(2026)
• No Venue
Wang et al.
-
Openrt: An Open-source Red Teaming Framework For Multimodal LLMs
(2026)
• No Venue
Wang et al.
-
Deepresearcheval: An Automated Framework For Deep Research Task Construction And Agentic Evaluation
(2026)
• No Venue
Wang et al.
-
Visgym: Diverse, Customizable, Scalable Environments For Multimodal Agents
(2026)
• No Venue
Wang et al.
-
Knowme-bench: Benchmarking Person Understanding For Lifelong Digital Companions
(2026)
• No Venue
Wu et al.
-
A Pragmatic VLA Foundation Model
(2026)
• No Venue
Wu et al.
-
Atlas: Orchestrating Heterogeneous Models And Tools For Multi-domain Complex Reasoning
(2026)
• No Venue
Wu et al.
-
Visual Generation Unlocks Human-like Reasoning Through Multimodal World Models
(2026)
• No Venue
Wu et al.
-
Mmformalizer: Multimodal Autoformalization In The Wild
(2026)
• No Venue
Xiong et al.
-
Illusions Of Confidence? Diagnosing LLM Truthfulness Via Neighborhood Consistency
(2026)
• No Venue
Xu et al.
-
Livecodebench Pro: How Do Olympiad Medalists Judge LLMs In Competitive Programming?
(2025)
• No Venue
Zheng et al.
-
An Empirical Study Of Qwen3 Quantization
(2025)
• No Venue
Zheng et al.
-
Lex-art: Rethinking Text Generation Via Scalable High-quality Data Synthesis
(2025)
• No Venue
Zhao et al.
-
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study On Information-seeking QA Over Scientific Papers
(2025)
• No Venue
Zhao et al.
-
Babel: Open Multilingual Large Language Models Serving Over 90% Of Global Speakers
(2025)
• No Venue
Zhao et al.
-
Achieving Olympia-level Geometry Large Language Model Agent Via Complexity Boosting Reinforcement Learning
(2025)
• No Venue
Zhao et al.
-
Abgen: Evaluating Large Language Models In Ablation Study Design And Evaluation For Scientific Research
(2025)
• No Venue
Zhao et al.
-
Envisioning Beyond The Pixels: Benchmarking Reasoning-informed Visual Editing
(2025)
• No Venue
Zhao et al.
-
DICEPTION: A Generalist Diffusion Model For Visual Perceptual Tasks
(2025)
• No Venue
Zhao et al.
-
Hifi-sr: A Unified Generative Transformer-convolutional Adversarial Network For High-fidelity Speech Super-resolution
(2025)
• No Venue
Zhao et al.
-
VLA^2: Empowering Vision-language-action Models With An Agentic Framework For Unseen Concept Manipulation
(2025)
• No Venue
Zhao et al.
-
Ultraimage: Rethinking Resolution Extrapolation In Image Diffusion Transformers
(2025)
• No Venue
Zhao et al.
-
Ultravico: Breaking Extrapolation Limits In Video Diffusion Transformers
(2025)
• No Venue
Zhao et al.
-
Qwen3guard Technical Report
(2025)
• No Venue
Zhao et al.
-
Sciarena: An Open Evaluation Platform For Foundation Models In Scientific Literature Tasks
(2025)
• No Venue
Zhao et al.
-
Sample, Scrutinize And Scale: Effective Inference-time Search By Scaling Verification
(2025)
• No Venue
Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi
-
One Token To Fool LLM-as-a-judge
(2025)
• No Venue
Zhao et al.
-
Generative Universal Verifier As Multimodal Meta-reasoner
(2025)
• No Venue
Zhang et al.
-
CCD: Mitigating Hallucinations In Radiology MLLMs Via Clinical Contrastive Decoding
(2025)
• No Venue
Zhang et al.
-
Agent Models: Internalizing Chain-of-action Generation Into Reasoning Models
(2025)
• No Venue
Zhang et al.
-
Autoenv: Automated Environments For Measuring Cross-environment Agent Learning
(2025)
• No Venue
Zhang et al.
-
API Agents Vs. GUI Agents: Divergence And Convergence
(2025)
• No Venue
Zhang et al.
-
Artifactsbench: Bridging The Visual-interactive Gap In LLM Code Generation Evaluation
(2025)
• No Venue
Zhang et al.
-
Both Semantics And Reconstruction Matter: Making Representation Encoders Ready For Text-to-image Generation And Editing
(2025)
• No Venue
Zhang et al.
-
Bee: A High-quality Corpus And Full-stack Suite To Unlock Advanced Fully Open MLLMs
(2025)
• No Venue
Zhang et al.
-
From Spatial To Actions: Grounding Vision-language-action Model In Spatial Foundation Priors
(2025)
• No Venue
Zhang et al.
-
Deeptheorem: Advancing LLM Reasoning For Theorem Proving Through Natural Language And Reinforcement Learning
(2025)
• No Venue
Zhang et al.
-
Compassjudger-2: Towards Generalist Judge Model Via Verifiable Rewards
(2025)
• No Venue
Zhang et al.
-
Codecriticbench: A Holistic Code Critique Benchmark For Large Language Models
(2025)
• No Venue
Zhang et al.
-
Certified Mitigation Of Worst-case LLM Copyright Infringement
(2025)
• No Venue
Zhang et al.
-
DITING: A Multi-agent Evaluation Framework For Benchmarking Web Novel Translation
(2025)
• No Venue
Zhang et al.
-
Enabling Versatile Controls For Video Diffusion Models
(2025)
• No Venue
Zhang et al.
-
Bourbaki: Self-generated And Goal-conditioned MDPs For Theorem Proving
(2025)
• No Venue
Zimmer et al.
-
Latent Collaboration In Multi-agent Systems
(2025)
• No Venue
Zou et al.
-
M3: 3D-spatial Multimodal Memory
(2025)
• No Venue
Zou et al.
-
Timelens: Rethinking Video Temporal Grounding With Multimodal LLMs
(2025)
• No Venue
Zhang et al.
-
S1-bench: A Simple Benchmark For Evaluating System 1 Thinking Capability Of Large Reasoning Models
(2025)
• No Venue
Zhang et al.
-
Redundancy Principles For MLLMs Benchmarks
(2025)
• No Venue
Zhang et al.
-
Soft Thinking: Unlocking The Reasoning Potential Of LLMs In Continuous Concept Space
(2025)
• No Venue
Zhang et al.
-
Sentient Agent As A Judge: Evaluating Higher-order Social Cognition In Large Language Models
(2025)
• No Venue
Zhang et al.
-
Thyme: Think Beyond Images
(2025)
• No Venue
Zhang et al.
-
Swe-bench Goes Live!
(2025)
• No Venue
Zhang et al.
-
Waver: Wave Your Way To Lifelike Video Generation
(2025)
• No Venue
Zhang et al.
-
VLM^2-bench: A Closer Look At How Well VLMs Implicitly Link Explicit Matching Visual Cues
(2025)
• No Venue
Zhang et al.
-
Videorepa: Learning Physics For Video Generation Through Relational Alignment With Foundation Models
(2025)
• No Venue
Zhang et al.
-
UFO^3: Weaving The Digital Agent Galaxy
(2025)
• No Venue
Zhang et al.
-
Tensor Product Attention Is All You Need
(2025)
• No Venue
Zhang et al.
-
T2r-bench: A Benchmark For Generating Article-level Reports From Real World Industrial Tables
(2025)
• No Venue
Zhang et al.
-
Embodied-reasoner: Synergizing Visual Search, Reasoning, And Action For Embodied Interactive Tasks
(2025)
• No Venue
Zhang et al.
-
Flashvideo: Flowing Fidelity To Detail For Efficient High-resolution Video Generation
(2025)
• No Venue
Zhang et al.
-
MM-RLHF: The Next Step Forward In Multimodal LLM Alignment
(2025)
• No Venue
Zhang et al.
-
Llava-mini: Efficient Image And Video Large Multimodal Models With One Vision Token
(2025)
• No Venue
Zhang et al.
-
Iheval: Evaluating Language Models On Following The Instruction Hierarchy
(2025)
• No Venue
Zhang et al.
-
How Far Are We From Genuinely Useful Deep Research Agents?
(2025)
• No Venue
Zhang et al.
-
Innogym: Benchmarking The Innovation Potential Of AI Agents
(2025)
• No Venue
Zhang et al.
-
In-context Edit: Enabling Instructional Image Editing With In-context Generation In Large Scale Diffusion Transformer
(2025)
• No Venue
Zhang et al.
-
Qwen3 Embedding: Advancing Text Embedding And Reranking Through Foundation Models
(2025)
• No Venue
Zhang et al.
-
Phystoolbench: Benchmarking Physical Tool Understanding For MLLMs
(2025)
• No Venue
Zhang et al.
-
Openmmreasoner: Pushing The Frontiers For Multimodal Reasoning With An Open And General Recipe
(2025)
• No Venue
Zhang et al.
-
MUG-V 10B: High-efficiency Training Pipeline For Large Video Generation Models
(2025)
• No Venue
Zhang et al.
-
Mlrc-bench: Can Language Agents Solve Machine Learning Research Challenges?
(2025)
• No Venue
Zhang et al.
-
Minimax-speech: Intrinsic Zero-shot Text-to-speech With A Learnable Speaker Encoder
(2025)
• No Venue
Zhang et al.
-
Memevolve: Meta-evolution Of Agent Memory Systems
(2025)
• No Venue
Zhang et al.
-
MARS: A Multi-agent Framework Incorporating Socratic Guidance For Automated Prompt Optimization
(2025)
• No Venue
Zhang et al.
-
IVY-FAKE: A Unified Explainable Framework And Benchmark For Image And Video AIGC Detection
(2025)
• No Venue
Zhang et al.
-
V-MAGE: A Game Evaluation Framework For Assessing Visual-centric Capabilities In Multimodal Large Language Models
(2025)
• No Venue
Zheng et al.
-
Seeing From Another Perspective: Evaluating Multi-view Understanding In MLLMs
(2025)
• No Venue
Yeh et al.
-
Step Back To Leap Forward: Self-backtracking For Boosting Reasoning Of Language Models
(2025)
• No Venue
Yang et al.
-
Qwen2.5-1M Technical Report
(2025)
• No Venue
Yang et al.
-
Longvt: Incentivizing "thinking With Long Videos" Via Native Tool Calling
(2025)
• No Venue
Yang et al.
-
Medical World Model: Generative Simulation Of Tumor Evolution For Treatment Planning
(2025)
• No Venue
Yang et al.
-
Mantis: A Versatile Vision-language-action Model With Disentangled Visual Foresight
(2025)
• No Venue
Yang et al.
-
Multiverse: Your Language Models Secretly Decide How To Parallelize And Merge Generation
(2025)
• No Venue
Yang et al.
-
Moose-chem2: Exploring LLM Limits In Fine-grained Scientific Hypothesis Discovery Via Hierarchical Search
(2025)
• No Venue
Yang et al.
-
Qwen3 Technical Report
(2025)
• No Venue
Yang et al.
-
Sceneweaver: All-in-one 3D Scene Synthesis With An Extensible And Self-reflective Agent
(2025)
• No Venue
Yang et al.
-
Vidtext: Towards Comprehensive Evaluation For Video Text Understanding
(2025)
• No Venue
Yang et al.
-
Structeval: Benchmarking LLMs' Capabilities To Generate Structural Outputs
(2025)
• No Venue
Yang et al.
-
Swe-smith: Scaling Data For Software Engineering Agents
(2025)
• No Venue
Yang et al.
-
The Mirage Of Model Editing: Revisiting Evaluation In The Wild
(2025)
• No Venue
Yang et al.
-
Towards Physically Plausible Video Generation Via VLM Planning
(2025)
• No Venue
Yang et al.
-
Zerogui: Automating Online GUI Learning At Zero Human Cost
(2025)
• No Venue
Yang et al.
-
Wikiautogen: Towards Multi-modal Wikipedia-style Article Generation
(2025)
• No Venue
Yang et al.
-
Are Reasoning Models More Prone To Hallucination?
(2025)
• No Venue
Yao et al.
-
A Rigorous Benchmark With Multidimensional Evaluation For Deep Research Agents: From Answers To Reports
(2025)
• No Venue
Yao et al.
-
Through-the-mask: Mask-based Motion Trajectories For Image-to-video Generation
(2025)
• No Venue
Yariv et al.
-
Timechat-online: 80% Visual Tokens Are Naturally Redundant In Streaming Videos
(2025)
• No Venue
Yao et al.
-
Spin-bench: How Well Do LLMs Plan Strategically And Reason Socially?
(2025)
• No Venue
Yao et al.
-
Black-box On-policy Distillation Of Large Language Models
(2025)
• No Venue
Ye et al.
-
Agentfold: Long-horizon Web Agents With Proactive Context Management
(2025)
• No Venue
Ye et al.
-
A Multi-dimensional Constraint Framework For Evaluating And Improving Instruction Following In Large Language Models
(2025)
• No Venue
Ye et al.
-
Realgen: Photorealistic Text-to-image Generation Via Detector-guided Rewards
(2025)
• No Venue
Ye et al.
-
Echo-4o: Harnessing The Power Of GPT-4o Synthetic Images For Improved Image Generation
(2025)
• No Venue
Ye et al.
-
Imgedit: A Unified Image Editing Dataset And Benchmark
(2025)
• No Venue
Ye et al.
-
Primitiveanything: Human-crafted 3D Primitive Assembly Generation With Auto-regressive Transformer
(2025)
• No Venue
Ye et al.
-
Toolhop: A Query-driven Benchmark For Evaluating Large Language Models In Multi-hop Tool Use
(2025)
• No Venue
Ye et al.
-
VLA-R1: Enhancing Reasoning In Vision-language-action Models
(2025)
• No Venue
Ye et al.
-
Parrot: Persuasion And Agreement Robustness Rating Of Output Truth -- A Sycophancy Robustness Benchmark For LLMs
(2025)
• No Venue
Yusuf Çelebi, Mahmoud El Hussieni, Özay Ezerceli
-
Pangu Ultra: Pushing The Limits Of Dense Large Language Models On Ascend NPUs
(2025)
• No Venue
Yin et al.
-
CLEAR: Error Analysis Via LLM-as-a-judge Made Easy
(2025)
• No Venue
Yehudai et al.
-
Tattoo: Tool-grounded Thinking PRM For Test-time Scaling In Tabular Reasoning
(2025)
• No Venue
Zou et al.
-
Mixture Of Global And Local Experts With Diffusion Transformer For Controllable Face Generation
(2025)
• No Venue
Zou et al.
-
Multi-swe-bench: A Multilingual Benchmark For Issue Resolving
(2025)
• No Venue
Zan et al.
-
Aralingbench: A Human-annotated Benchmark For Evaluating Arabic Linguistic Capabilities Of Large Language Models
(2025)
• No Venue
Zbib et al.
-
Reinforcing Multi-turn Reasoning In LLM Agents Via Turn-level Credit Assignment
(2025)
• No Venue
Zeng et al.
-
Futurex: An Advanced Live Benchmark For LLM Agents In Future Prediction
(2025)
• No Venue
Zeng et al.
-
Uitron: Foundational GUI Agent With Advanced Perception And Planning
(2025)
• No Venue
Zeng et al.
-
Vstyle: A Benchmark For Voice Style Adaptation With Spoken Instructions
(2025)
• No Venue
Zhan et al.
-
Survey On Evaluation Of LLM-based Agents
(2025)
• No Venue
Yehudai et al.
-
Infinity-rope: Action-controllable Infinite Video Generation Emerges From Autoregressive Self-rollout
(2025)
• No Venue
Yesiltepe et al.
-
Magicinfinite: Generating Infinite Talking Videos With Your Words And Voice
(2025)
• No Venue
Yi et al.
-
Too Good To Be Bad: On The Failure Of LLMs To Role-play Villains
(2025)
• No Venue
Yi et al.
-
Livemcp-101: Stress Testing And Diagnosing MCP-enabled Agents On Challenging Queries
(2025)
• No Venue
Yin et al.
-
Aionopedia: An LLM Agent Orchestrating Multimodal Learning For Ionic Liquid Discovery
(2025)
• No Venue
Yin et al.
-
Lazydrag: Enabling Stable Drag-based Editing On Multi-modal Diffusion Transformers Via Explicit Correspondence
(2025)
• No Venue
Yin et al.
-
Evaluating Parameter Efficient Methods For RLVR
(2025)
• No Venue
Yin et al.
-
REASONEDIT: Towards Reasoning-enhanced Image Editing Models
(2025)
• No Venue
Yin et al.
-
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
(2025)
• No Venue
Ying et al.
-
Stresstest: Can YOUR Speech LM Handle The Stress?
(2025)
• No Venue
Iddo Yosha, Gallil Maimon, Yossi Adi
-
4kagent: Agentic Any Image To 4K Super-resolution
(2025)
• No Venue
Zuo et al.
-
Livetradebench: Seeking Real-world Alpha With Large Language Models
(2025)
• No Venue
Haofei Yu, Fenghai Li, Jiaxuan You
-
Alpharesearch: Accelerating New Algorithm Discovery With Language Models
(2025)
• No Venue
Yu et al.
-
Introducing Visual Perception Token Into Multimodal Large Language Model
(2025)
• No Venue
Runpeng Yu, Xinyin Ma, Xinchao Wang
-
Formalmath: Benchmarking Formal Mathematical Reasoning Of Large Language Models
(2025)
• No Venue
Yu et al.
-
Dimple: Discrete Diffusion Multimodal Large Language Model With Parallel Decoding
(2025)
• No Venue
Runpeng Yu, Xinyin Ma, Xinchao Wang
-
Generating Symbolic World Models Via Test-time Scaling Of Large Language Models
(2025)
• No Venue
Yu et al.
-
Guided Self-evolving LLMs With Minimal Human Supervision
(2025)
• No Venue
Yu et al.
-
RLPR: Extrapolating RLVR To General Domains Without Verifiers
(2025)
• No Venue
Yu et al.
-
PRELUDE: A Benchmark Designed To Require Global Comprehension And Reasoning Over Long Contexts
(2025)
• No Venue
Yu et al.
-
Omnialpha: A Sequence-to-sequence Framework For Unified Multi-task RGBA Generation
(2025)
• No Venue
Yu et al.
-
Minicpm-v 4.5: Cooking Efficient MLLMs Via Architecture, Data, And Training Recipe
(2025)
• No Venue
Yu et al.
-
Pixeldit: Pixel Diffusion Transformers For Image Generation
(2025)
• No Venue
Yu et al.
-
QUASAR: Quantum Assembly Code Generation Using Tool-augmented LLMs Via Agentic RL
(2025)
• No Venue
Yu et al.
-
Trajselector: Harnessing Latent Representations For Efficient And Effective Best-of-n In Large Reasoning Model
(2025)
• No Venue
Yu et al.
-
Vrbench: A Benchmark For Multi-step Reasoning In Long Narrative Videos
(2025)
• No Venue
Yu et al.
-
Agent-r: Training Language Model Agents To Reflect Via Iterative Self-training
(2025)
• No Venue
Yuan et al.
-
Tarsier2: Advancing Large Vision-language Models From Detailed Video Description To Comprehensive Video Understanding
(2025)
• No Venue
Yuan et al.
-
Eoc-bench: Can MLLMs Identify, Recall, And Forecast Objects In An Egocentric World?
(2025)
• No Venue
Yuan et al.
-
Efficientllm: Efficiency In Large Language Models
(2025)
• No Venue
Yuan et al.
-
Hallucinations Can Improve Large Language Models In Drug Discovery
(2025)
• No Venue
Shuzhou Yuan, Michael Färber
-
Give Me FP32 Or Give Me Death? Challenges And Solutions For Reproducible Reasoning
(2025)
• No Venue
Yuan et al.
-
Mme-reasoning: A Comprehensive Benchmark For Logical Reasoning In MLLMs
(2025)
• No Venue
Yuan et al.
-
Yue: Scaling Open Foundation Models For Long-form Music Generation
(2025)
• No Venue
Yuan et al.
-
Vl-cogito: Progressive Curriculum Reinforcement Learning For Advanced Multimodal Reasoning
(2025)
• No Venue
Yuan et al.
-
Med-prm: Medical Reasoning Models With Stepwise, Guideline-verified Process Rewards
(2025)
• No Venue
Yun et al.
-
Ewmbench: Evaluating Scene, Motion, And Semantic Quality In Embodied World Models
(2025)
• No Venue
Yue et al.
-
Uni-mmmu: A Massive Multi-discipline Multimodal Unified Benchmark
(2025)
• No Venue
Zou et al.
-
Designlab: Designing Slides Through Iterative Detection And Correction
(2025)
• No Venue
Yun et al.
-
Newtonbench: Benchmarking Generalizable Scientific Law Discovery In LLM Agents
(2025)
• No Venue
Zheng et al.
-
Scaling Test-time Compute For LLM Agents
(2025)
• No Venue
Zhu et al.
-
Towards Faithful And Controllable Personalization Via Critique-post-edit Reinforcement Learning
(2025)
• No Venue
Zhu et al.
-
When Does Reasoning Matter? A Controlled Study Of Reasoning's Contribution To Model Performance
(2025)
• No Venue
Boizard et al.
-
Self-improving LLM Agents At Test-time
(2025)
• No Venue
Acikgoz et al.
-
Competitive Programming With Large Reasoning Models
(2025)
• No Venue
OpenAI et al.
-
Kimi-audio Technical Report
(2025)
• No Venue
Kimi Team et al.
-
World Simulation With Video Foundation Models For Physical AI
(2025)
• No Venue
NVIDIA et al.
-
Wan: Open And Advanced Large-scale Video Generative Models
(2025)
• No Venue
Wan Team et al.
-
Placeit3d: Language-guided Object Placement In Real 3D Scenes
(2025)
• No Venue
Abdelreheem et al.
-
Phi-4-reasoning Technical Report
(2025)
• No Venue
Abdin et al.
-
Ask In Any Modality: A Comprehensive Survey On Multimodal Retrieval-augmented Generation
(2025)
• No Venue
Abootorabi et al.
-
Parameters Vs FLOPs: Scaling Laws For Optimal Sparsity For Mixture-of-experts Language Models
(2025)
• No Venue
Abnar et al.
-
Opencodereasoning-ii: A Simple Test Time Scaling Approach Via Self-critique
(2025)
• No Venue
Ahmad et al.
-
Language Models' Factuality Depends On The Language Of Inquiry
(2025)
• No Venue
Aggarwal et al.
-
Pillar-0: A New Frontier For Radiology Foundation Models
(2025)
• No Venue
Agrawal et al.
-
Fine-grained Perturbation Guidance Via Attention Head Selection
(2025)
• No Venue
Ahn et al.
-
Sadeed: Advancing Arabic Diacritization Through Small Language Model
(2025)
• No Venue
Aldallal et al.
-
Fine-tuning On Noisy Instructions: Effects On Generalization And Performance
(2025)
• No Venue
Ahmed Alajrami, Xingwei Tan, Nikolaos Aletras
-
NEMOTRON-CROSSTHINK: Scaling Self-learning Beyond Math Reasoning
(2025)
• No Venue
Akter et al.
-
Amo-bench: Large Language Models Still Struggle In High School Math Competitions
(2025)
• No Venue
An et al.
-
Judging Quality Across Languages: A Multilingual Approach To Pretraining Data Filtering With Language Models
(2025)
• No Venue
Ali et al.
-
Atla Selene Mini: A General Purpose Evaluation Model
(2025)
• No Venue
Alexandru et al.
-
Toksuite: Measuring The Impact Of Tokenizer Choice On Language Model Behavior
(2025)
• No Venue
Altıntaş et al.
-
Open Deep Search: Democratizing Search With Open-source Reasoning Agents
(2025)
• No Venue
Alzubi et al.
-
VOYAGER: A Training Free Approach For Generating Diverse Datasets Using LLMs
(2025)
• No Venue
Amballa et al.
-
Kandinsky 5.0: A Family Of Foundation Models For Image And Video Generation
(2025)
• No Venue
Arkhipkin et al.
-
Ultraif: Advancing Instruction Following From The Wild
(2025)
• No Venue
An et al.
-
Herobench: A Benchmark For Long-horizon Planning And Structured Reasoning In Virtual Worlds
(2025)
• No Venue
Anokhin et al.
-
Fine-tuning Small Language Models For Domain-specific AI: An Edge AI Perspective
(2025)
• No Venue
Aralimatti et al.
-
R3: Robust Rubric-agnostic Reward Models
(2025)
• No Venue
Anugraha et al.
-
Tabstar: A Foundation Tabular Model With Semantically Target-aware Representations
(2025)
• No Venue
Alan Arazi, Eilam Shapira, Roi Reichart
-
Early External Safety Testing Of OpenAI's o3-mini: Insights From The Pre-deployment Evaluation
(2025)
• No Venue
Arrieta et al.
-
o3-mini Vs DeepSeek-R1: Which One Is Safer?
(2025)
• No Venue
Arrieta et al.
-
HUME: Measuring The Human-model Performance Gap In Text Embedding Task
(2025)
• No Venue
Assadi et al.
-
What Does It Take To Be A Good AI Research Agent? Studying The Role Of Ideation Diversity
(2025)
• No Venue
Audran-Reiss et al.
-
Sketch-of-thought: Efficient LLM Reasoning With Adaptive Cognitive-inspired Sketching
(2025)
• No Venue
Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
-
Swe-rebench: An Automated Pipeline For Task Collection And Decontaminated Evaluation Of Software Engineering Agents
(2025)
• No Venue
Badertdinov et al.
-
Llama-embed-nemotron-8b: A Universal Text Embedding Model For Multilingual And Cross-lingual Tasks
(2025)
• No Venue
Babakhin et al.
-
Hybrid Architectures For Language Models: Systematic Analysis And Design Insights
(2025)
• No Venue
Bae et al.
-
Qwen3-vl Technical Report
(2025)
• No Venue
Bai et al.
-
Intern-s1: A Scientific Multimodal Foundation Model
(2025)
• No Venue
Bai et al.
-
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
(2025)
• No Venue
Bai et al.
-
Weak-to-strong Diffusion With Reflection
(2025)
• No Venue
Lichen Bai, Masashi Sugiyama, Zeke Xie
-
Recammaster: Camera-controlled Generative Rendering From A Single Video
(2025)
• No Venue
Bai et al.
-
Honeybee: Data Recipes For Vision-language Reasoners
(2025)
• No Venue
Bansal et al.
-
Sd3.5-flash: Distribution-guided Distillation Of Generative Flows
(2025)
• No Venue
Bandyopadhyay et al.
-
Inference-time Scaling For Complex Tasks: Where We Stand And What Lies Ahead
(2025)
• No Venue
Balachandran et al.
-
Block Cascading: Training Free Acceleration Of Block-causal Video Models
(2025)
• No Venue
Bandyopadhyay et al.
-
Herbench: A Benchmark For Multi-evidence Integration In Video Question Answering
(2025)
• No Venue
Ben-Ami et al.
-
Searchinstruct: Enhancing Domain Adaptation Via Retrieval-based Instruction Dataset Creation
(2025)
• No Venue
Iman Barati, Mostafa Amiri, Heshaam Faili
-
Guardians Of The Agentic System: Preventing Many Shots Jailbreak With Agentic System
(2025)
• No Venue
Barua et al.
-
Exploiting Instruction-following Retrievers For Malicious Information Retrieval
(2025)
• No Venue
Parishad Behnamghader, Nicholas Meade, Siva Reddy
-
Clinical Knowledge In LLMs Does Not Translate To Human Interactions
(2025)
• No Venue
Bean et al.
-
KV Cache Steering For Inducing Reasoning In Small Language Models
(2025)
• No Venue
Belitsky et al.
-
NOVA: A Benchmark For Anomaly Localization And Clinical Reasoning In Brain MRI
(2025)
• No Venue
Bercea et al.
-
Openbeats: A Fully Open-source General-purpose Audio Encoder
(2025)
• No Venue
Bharadwaj et al.
-
Value Drifts: Tracing Value Alignment During LLM Post-training
(2025)
• No Venue
Bhatia et al.
-
Why Reasoning Matters? A Survey Of Advancements In Multimodal Reasoning (v1)
(2025)
• No Venue
Bi et al.
-
VERIFY: A Benchmark Of Visual Explanation And Reasoning For Investigating Multimodal Reasoning Fidelity
(2025)
• No Venue
Bi et al.
-
Bigcodearena: Unveiling More Reliable Human Preferences In Code Generation Via Execution
(2025)
• No Venue
Zhuo et al.
-
Less Is More: Training-free Sparse Attention With Global Locality For Efficient Reasoning
(2025)
• No Venue
Yang et al.
-
Multi-domain Explainability Of Preferences
(2025)
• No Venue
Nitay Calderon, Liat Ein-Dor, Roi Reichart
-
Language Models Model Language
(2025)
• No Venue
Łukasz Borchmann
-
A Data-centric Framework For Addressing Phonetic And Prosodic Challenges In Russian Speech Generative Models
(2025)
• No Venue
Borodin et al.
-
Neobert: A Next-generation BERT
(2025)
• No Venue
Breton et al.
-
Video Action Differencing
(2025)
• No Venue
Burgess et al.
-
The Massive Legal Embedding Benchmark (MLEB)
(2025)
• No Venue
Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec
-
Dentalgpt: Incentivizing Multimodal Complex Reasoning In Dentistry
(2025)
• No Venue
Cai et al.
-
Are You Getting What You Pay For? Auditing Model Substitution In LLM Apis
(2025)
• No Venue
Cai et al.
-
Opendataarena: A Fair And Open Arena For Benchmarking Post-training Dataset Value
(2025)
• No Venue
Cai et al.
-
MMGR: Multi-modal Generative Reasoning
(2025)
• No Venue
Cai et al.
-
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
(2025)
• No Venue
Cai et al.
-
MM-IQ: Benchmarking Human-like Abstraction And Reasoning In Multimodal Models
(2025)
• No Venue
Huanqia Cai, Yijun Yang, Winston Hu
-
MORSE-500: A Programmatically Controllable Video Benchmark To Stress-test Multimodal Reasoning
(2025)
• No Venue
Cai et al.
-
Bigo(bench) -- Can Llms Generate Code With Controlled Time And Space Complexity?
(2025)
• No Venue
Chambon et al.
-
Why Do Multi-agent LLM Systems Fail?
(2025)
• No Venue
Cemri et al.
-
Hunyuanimage 3.0 Technical Report
(2025)
• No Venue
Cao et al.
-
Finaudio: A Benchmark For Audio Large Language Models In Financial Applications
(2025)
• No Venue
Cao et al.
-
Compose Your Policies! Improving Diffusion-based Or Flow-based Robot Policies Via Test-time Distribution-level Composition
(2025)
• No Venue
Cao et al.
-
Video Simpleqa: Towards Factuality Evaluation In Large Video Language Models
(2025)
• No Venue
Cao et al.
-
T2av-compass: Towards Unified Evaluation For Text-to-audio-video Generation
(2025)
• No Venue
Cao et al.
-
Quartet: Native FP4 Training Can Be Optimal For Large Language Models
(2025)
• No Venue
Castro et al.
-
A3: Android Agent Arena For Mobile GUI Agents
(2025)
• No Venue
Chai et al.
-
Web-shepherd: Advancing Prms For Reinforcing Web Agents
(2025)
• No Venue
Chae et al.
-
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-mesh Representation, And Evaluation Metrics
(2025)
• No Venue
Chae-Yeon et al.
-
Oneig-bench: Omni-dimensional Nuanced Evaluation For Image Generation
(2025)
• No Venue
Chang et al.
-
Scaling Open-ended Reasoning To Predict The Future
(2025)
• No Venue
Chandak et al.
-
Game-time: Evaluating Temporal Dynamics In Spoken Language Models
(2025)
• No Venue
Chang et al.
-
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages And Cultures
(2025)
• No Venue
Chang et al.
-
Communication-efficient Language Model Training Scales Reliably And Robustly: Scaling Laws For Diloco
(2025)
• No Venue
Charles et al.
-
Where LLM Agents Fail And How They Can Learn From Failures
(2025)
• No Venue
Zhu et al.
-
Code2video: A Code-centric Paradigm For Educational Video Generation
(2025)
• No Venue
Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
-
An Empirical Study Of Gpt-4o Image Generation Capabilities
(2025)
• No Venue
Chen et al.
-
Agentfrontier: Expanding The Capability Frontier Of LLM Agents With Zpd-guided Data Synthesis
(2025)
• No Venue
Chen et al.
-
Coda: Agentic Systems For Collaborative Data Visualization
(2025)
• No Venue
Chen et al.
-
Blip3o-next: Next Frontier Of Native Image Generation
(2025)
• No Venue
Chen et al.
-
Autopr: Let's Automate Your Academic Promotion!
(2025)
• No Venue
Chen et al.
-
Browsecomp-plus: A More Fair And Transparent Evaluation Benchmark Of Deep-research Agent
(2025)
• No Venue
Chen et al.
-
Mindwatcher: Toward Smarter Multimodal Tool-integrated Reasoning
(2025)
• No Venue
Chen et al.
-
Comp: Continual Multimodal Pre-training For Vision Foundation Models
(2025)
• No Venue
Chen et al.
-
Enigmata: Scaling Logical Reasoning In Large Language Models With Synthetic Verifiable Puzzles
(2025)
• No Venue
Chen et al.
-
FINEREASON: Evaluating And Improving Llms' Deliberate Reasoning Through Reflective Puzzle Solving
(2025)
• No Venue
Chen et al.
-
Humo: Human-centric Video Generation Via Collaborative Multi-modal Conditioning
(2025)
• No Venue
Chen et al.
-
Goku: Flow Based Video Generative Foundation Models
(2025)
• No Venue
Chen et al.
-
Go With Your Gut: Scaling Confidence For Autoregressive Image Generation
(2025)
• No Venue
Chen et al.
-
Halumem: Evaluating Hallucinations In Memory Systems Of Agents
(2025)
• No Venue
Chen et al.
-
GRPO-CARE: Consistency-aware Reinforcement Learning For Multimodal Reasoning
(2025)
• No Venue
Chen et al.
-
Judgelrm: Large Reasoning Models As A Judge
(2025)
• No Venue
Chen et al.
-
Mathflow: Enhancing The Perceptual Flow Of Mllms For Visual Mathematical Problems
(2025)
• No Venue
Chen et al.
-
RM-R1: Reward Modeling As Reasoning
(2025)
• No Venue
Chen et al.
-
Pass@k Training For Adaptively Balancing Exploration And Exploitation Of Large Reasoning Models
(2025)
• No Venue
Chen et al.
-
Omniinsert: Mask-free Video Insertion Of Any Reference Via Diffusion Transformer Models
(2025)
• No Venue
Chen et al.
-
Mme5: Improving Multimodal Multilingual Embeddings Via High-quality Synthetic Data
(2025)
• No Venue
Chen et al.
-
Mlr-bench: Evaluating AI Agents On Open-ended Machine Learning Research
(2025)
• No Venue
Chen et al.
-
Mvi-bench: A Comprehensive Benchmark For Evaluating Robustness To Misleading Visual Inputs In Lvlms
(2025)
• No Venue
Chen et al.
-
Musixqa: Advancing Visual Music Understanding In Multimodal Large Language Models
(2025)
• No Venue
Chen et al.
-
Ouroboros-diffusion: Exploring Consistent Content Generation In Tuning-free Long Video Diffusion
(2025)
• No Venue
Chen et al.
-
Paper2web: Let's Make Your Paper Alive!
(2025)
• No Venue
Chen et al.
-
Reveal Hidden Pitfalls And Navigate Next Generation Of Vector Similarity Search From Task-centric Views
(2025)
• No Venue
Chen et al.
-
Planning With Reasoning Using Vision Language World Model
(2025)
• No Venue
Chen et al.
-
Pi-flow: Policy-based Few-step Generation Via Imitation Distillation
(2025)
• No Venue
Chen et al.
-
Qilin: A Multimodal Information Retrieval Dataset With App-level User Sessions
(2025)
• No Venue
Chen et al.
-
Reform: Reflective Autoformalization With Prospective Bounded Sequence Optimization
(2025)
• No Venue
Chen et al.
-
Robotwin 2.0: A Scalable Data Generator And Benchmark With Strong Domain Randomization For Robust Bimanual Robotic Manipulation
(2025)
• No Venue
Chen et al.
-
S²-guidance: Stochastic Self Guidance For Training-free Enhancement Of Diffusion Models
(2025)
• No Venue
Chen et al.
-
Sharegpt-4o-image: Aligning Multimodal Models With Gpt-4o-level Image Generation
(2025)
• No Venue
Chen et al.
-
Svgenius: Benchmarking Llms In SVG Understanding, Editing And Generation
(2025)
• No Venue
Chen et al.
-
Stockbench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
(2025)
• No Venue
Chen et al.
-
SPC: Evolving Self-play Critic Via Adversarial Games For LLM Reasoning
(2025)
• No Venue
Chen et al.
-
Tivibench: Benchmarking Think-in-video Reasoning For Video Generative Models
(2025)
• No Venue
Chen et al.
-
Videovista-culturallingo: 360° Horizons-bridging Cultures, Languages, And Domains In Video Comprehension
(2025)
• No Venue
Chen et al.
-
Xverify: Efficient Answer Verifier For Reasoning Model Evaluations
(2025)
• No Venue
Chen et al.
-
Segagent: Exploring Pixel Understanding Capabilities In Mllms By Imitating Human Annotator Trajectories
(2025)
• No Venue
Zhu et al.
-
Video-holmes: Can MLLM Think Like Holmes For Complex Video Reasoning?
(2025)
• No Venue
Cheng et al.
-
Caparena: Benchmarking And Analyzing Detailed Image Captioning In The LLM Era
(2025)
• No Venue
Cheng et al.
-
Animegamer: Infinite Anime Life Simulation With Next Game State Prediction
(2025)
• No Venue
Cheng et al.
-
V-star: Benchmarking Video-llms On Video Spatio-temporal Reasoning
(2025)
• No Venue
Cheng et al.
-
Pointarena: Probing Multimodal Grounding Through Language-guided Pointing
(2025)
• No Venue
Cheng et al.
-
Twinflow: Realizing One-step Generation On Large Models With Self-adversarial Flows
(2025)
• No Venue
Cheng et al.
-
Revisiting Reinforcement Learning For LLM Reasoning From A Cross-domain Perspective
(2025)
• No Venue
Cheng et al.
-
Audio-aware Large Language Models As Judges For Speaking Styles
(2025)
• No Venue
Chiang et al.
-
Livetalk: Real-time Multimodal Interactive Video Diffusion Via Improved On-policy Distillation
(2025)
• No Venue
Chern et al.
-
Multimodal Evaluation Of Russian-language Architectures
(2025)
• No Venue
Chervyakov et al.
-
Gold-medalist Performance In Solving Olympiad Geometry With Alphageometry2
(2025)
• No Venue
Chervonyi et al.
-
Prosperity Before Collapse: How Far Can Off-policy RL Reach With Stale Data On Llms?
(2025)
• No Venue
Haizhong Zheng, Jiawei Zhao, Beidi Chen
-
Multimodal Spatial Reasoning In The Large Model Era: A Survey And Benchmarks
(2025)
• No Venue
Zheng et al.
-
Icon: In-context Contribution For Automatic Data Selection
(2025)
• No Venue
Yang et al.
-
Humanomniv2: From Understanding To Omni-modal Reasoning With Context
(2025)
• No Venue
Yang et al.
-
Embodiedbench: Comprehensive Benchmarking Multi-modal Large Language Models For Vision-driven Embodied Agents
(2025)
• No Venue
Yang et al.
-
Codeclash: Benchmarking Goal-oriented Software Engineering
(2025)
• No Venue
Yang et al.
-
Captionqa: Is Your Caption As Useful As The Image Itself?
(2025)
• No Venue
Yang et al.
-
Can Llms Generate High-quality Test Cases For Algorithm Problems? Testcase-eval: A Systematic Evaluation Of Fault Coverage And Exposure
(2025)
• No Venue
Yang et al.
-
BARREL: Boundary-aware Reasoning For Factual And Reliable Lrms
(2025)
• No Venue
Yang et al.
-
A Controllable Examination For Long-context Language Models
(2025)
• No Venue
Yang et al.
-
Sci-verifier: Scientific Verifier With Thinking
(2025)
• No Venue
Zheng et al.
-
Surveyforge: On The Outline Heuristics, Memory-driven Generation, And Multi-dimensional Evaluation For Automated Survey Writing
(2025)
• No Venue
Yan et al.
-
Can Understanding And Generation Truly Benefit Together -- Or Just Coexist?
(2025)
• No Venue
Yan et al.
-
Step-audio-editx Technical Report
(2025)
• No Venue
Yan et al.
-
Gpt-imgeval: A Comprehensive Benchmark For Diagnosing Gpt4o In Image Generation
(2025)
• No Venue
Yan et al.
-
Do Phd-level Llms Truly Grasp Elementary Addition? Probing Rule Learning Vs. Memorization In Large Language Models
(2025)
• No Venue
Yan et al.
-
Recitation Over Reasoning: How Cutting-edge Language Models Can Fail On Elementary School-level Reasoning Problems?
(2025)
• No Venue
Yan et al.
-
Kwai Keye-vl 1.5 Technical Report
(2025)
• No Venue
Yang et al.
-
Verifybench: Benchmarking Reference-based Reward Systems For Large Language Models
(2025)
• No Venue
Yan et al.
-
Pretrainzero: Reinforcement Active Pretraining
(2025)
• No Venue
Xing et al.
-
Analyzing Llms' Knowledge Boundary Cognition Across Languages Through The Lens Of Internal Representations
(2025)
• No Venue
Xiao et al.
-
Motionstreamer: Streaming Motion Generation Via Diffusion-based Autoregressive Model In Causal Latent Space
(2025)
• No Venue
Xiao et al.
-
Mentrasuite: Post-training Large Language Models For Mental Health Reasoning And Assessment
(2025)
• No Venue
Xiao et al.
-
MIEB: Massive Image Embedding Benchmark
(2025)
• No Venue
Xiao et al.
-
Towards Dynamic Theory Of Mind: Evaluating LLM Adaptation To Temporal Evolution Of Human States
(2025)
• No Venue
Xiao et al.
-
Retrieval-augmented Large Language Models For Financial Time Series Forecasting
(2025)
• No Venue
Xiao et al.
-
Spatialtree: How Spatial Abilities Branch Out In Mllms
(2025)
• No Venue
Xiao et al.
-
Interleaved Reasoning For Large Language Models Via Reinforcement Learning
(2025)
• No Venue
Xie et al.
-
FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
(2025)
• No Venue
Xie et al.
-
Are Vlms Ready For Autonomous Driving? An Empirical Study From The Reliability, Data, And Metric Perspectives
(2025)
• No Venue
Xie et al.
-
Aworld: Dynamic Multi-agent System With Stable Maneuvering For Robust GAIA Problem Solving
(2025)
• No Venue
Xie et al.
-
Llms Can Get "brain Rot"!
(2025)
• No Venue
Xing et al.
-
Reconstruction Alignment Improves Unified Multimodal Models
(2025)
• No Venue
Xie et al.
-
Mme-unify: A Comprehensive Benchmark For Unified Multimodal Understanding And Generation Models
(2025)
• No Venue
Xie et al.
-
Visjudge-bench: Aesthetics And Quality Assessment Of Visualizations
(2025)
• No Venue
Xie et al.
-
Caprl: Stimulating Dense Image Caption Capabilities Via Reinforcement Learning
(2025)
• No Venue
Xing et al.
-
Can Large Vision Language Models Read Maps Like A Human?
(2025)
• No Venue
Xing et al.
-
Multi-crit: Benchmarking Multimodal Judges On Pluralistic Criteria-following
(2025)
• No Venue
Xiong et al.
-
Quantagent: Price-driven Multi-agent Llms For High-frequency Trading
(2025)
• No Venue
Xiong et al.
-
Streamingvlm: Real-time Understanding For Infinite Video Streams
(2025)
• No Venue
Xu et al.
-
Dense Motion Captioning
(2025)
• No Venue
Xu et al.
-
An Anatomy Of Vision-language-action Models: From Modules To Milestones And Challenges
(2025)
• No Venue
Xu et al.
-
Deepphy: Benchmarking Agentic Vlms On Physical Reasoning
(2025)
• No Venue
Xu et al.
-
Biasfreebench: A Benchmark For Mitigating Bias In Large Language Model Responses
(2025)
• No Venue
Xu et al.
-
Can Llms Identify Critical Limitations Within Scientific Research? A Systematic Evaluation On AI Research Papers
(2025)
• No Venue
Xu et al.
-
Comfyui-copilot: An Intelligent Assistant For Automated Workflow Development
(2025)
• No Venue
Xu et al.
-
Funreason-mt Technical Report: Overcoming The Complexity Barrier In Multi-turn Function Calling
(2025)
• No Venue
Xu et al.
-
Easyedit2: An Easy-to-use Steering Framework For Editing Large Language Models
(2025)
• No Venue
Xu et al.
-
Filmagent: A Multi-agent Framework For End-to-end Film Automation In Virtual 3D Spaces
(2025)
• No Venue
Xu et al.
-
Factuality Matters: When Image Generation And Editing Meet Structured Visuals
(2025)
• No Venue
Zhuo et al.
-
Reasongen-r1: Cot For Autoregressive Image Generation Models Through SFT And RL
(2025)
• No Venue
Zhang et al.
-
Pokerbench: Training Large Language Models To Become Professional Poker Players
(2025)
• No Venue
Zhuang et al.
-
Comas: Co-evolving Multi-agent Systems Via Interaction Rewards
(2025)
• No Venue
Xue et al.
-
Visulogic: A Benchmark For Evaluating Visual Reasoning In Multi-modal Large Language Models
(2025)
• No Venue
Xu et al.
-
Xattention: Block Sparse Attention With Antidiagonal Scoring
(2025)
• No Venue
Xu et al.
-
Withanyone: Towards Controllable And ID Consistent Image Generation
(2025)
• No Venue
Xu et al.
-
Vs-bench: Evaluating Vlms For Strategic Reasoning And Decision-making In Multi-agent Environments
(2025)
• No Venue
Xu et al.
-
Unveiling Downstream Performance Scaling Of Llms: A Clustering-based Perspective
(2025)
• No Venue
Xu et al.
-
Noderag: Structuring Graph-based RAG With Heterogeneous Nodes
(2025)
• No Venue
Xu et al.
-
Mpbench: A Comprehensive Multimodal Reasoning Benchmark For Process Errors Identification
(2025)
• No Venue
Xu et al.
-
Relearn: Unlearning Via Learning For Large Language Models
(2025)
• No Venue
Xu et al.
-
Ravine: Reality-aligned Evaluation For Agentic Search
(2025)
• No Venue
Xu et al.
-
Qwen2.5-omni Technical Report
(2025)
• No Venue
Xu et al.
-
Probing Scientific General Intelligence Of Llms With Scientist-aligned Workflows
(2025)
• No Venue
Xu et al.
-
Mcp-bench: Benchmarking Tool-using LLM Agents With Complex Real-world Tasks Via MCP Servers
(2025)
• No Venue
Wang et al.
-
DDT: Decoupled Diffusion Transformer
(2025)
• No Venue
Wang et al.
-
Aethercode: Evaluating Llms' Ability To Win In Premier Programming Competitions
(2025)
• No Venue
Wang et al.
-
Benchmarking Multimodal Mathematical Reasoning With Explicit Visual Dependency
(2025)
• No Venue
Wang et al.
-
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark
(2025)
• No Venue
Wang et al.
-
CODESYNC: Synchronizing Large Language Models With Dynamic Code Evolution At Scale
(2025)
• No Venue
Wang et al.
-
Cmphysbench: A Benchmark For Evaluating Large Language Models In Condensed Matter Physics
(2025)
• No Venue
Wang et al.
-
Coser: Coordinating Llm-based Persona Simulation Of Established Roles
(2025)
• No Venue
Wang et al.
-
Confucius Code Agent: An Open-sourced AI Software Engineer At Industrial Scale
(2025)
• No Venue
Wang et al.
-
Grasp Any Region: Towards Precise, Contextual Pixel Understanding For Multimodal Llms
(2025)
• No Venue
Wang et al.
-
ENACT: Evaluating Embodied Cognition With World Modeling Of Egocentric Interaction
(2025)
• No Venue
Wang et al.
-
Drivel-ology: Challenging Llms With Interpreting Nonsense With Depth
(2025)
• No Venue
Wang et al.
-
Fantasyportrait: Enhancing Multi-character Portrait Animation With Expression-augmented Diffusion Transformers
(2025)
• No Venue
Wang et al.
-
Explore To Evolve: Scaling Evolved Aggregation Logic Via Proactive Online Exploration For Deep Research Agents
(2025)
• No Venue
Wang et al.
-
Game-tars: Pretrained Foundation Models For Scalable Generalist Multimodal Game Agents
(2025)
• No Venue
Wang et al.
-
Fostering Video Reasoning Via Next-event Prediction
(2025)
• No Venue
Wang et al.
-
Finauditing: A Financial Taxonomy-structured Multi-document Benchmark For Evaluating Llms
(2025)
• No Venue
Wang et al.
-
Fintagging: An Llm-ready Benchmark For Extracting And Structuring Financial Information
(2025)
• No Venue
Wang et al.
-
Frame In-n-out: Unbounded Controllable Image-to-video Generation
(2025)
• No Venue
Wang et al.
-
Genexam: A Multidisciplinary Text-to-image Exam
(2025)
• No Venue
Wang et al.
-
Internsvg: Towards Unified SVG Tasks With Multimodal Large Language Models
(2025)
• No Venue
Wang et al.
-
Gtr-cot: Graph Traversal As Visual Chain Of Thought For Molecular Structure Recognition
(2025)
• No Venue
Wang et al.
-
Learning To Align, Aligning To Learn: A Unified Approach For Self-optimized Alignment
(2025)
• No Venue
Wang et al.
-
Lettingo: Explore User Profile Generation For Recommendation System
(2025)
• No Venue
Wang et al.
-
Llava-critic-r1: Your Critic Model Is Secretly A Strong Policy Model
(2025)
• No Venue
Wang et al.
-
Textatlas5m: A Large-scale Dataset For Dense Text Image Generation
(2025)
• No Venue
Wang et al.
-
O-mem: Omni Memory System For Personalized, Long Horizon, Self-evolving Agents
(2025)
• No Venue
Wang et al.
-
Mr-align: Meta-reasoning Informed Factuality Alignment For Large Reasoning Models
(2025)
• No Venue
Wang et al.
-
Mem-α: Learning Memory Construction Via Reinforcement Learning
(2025)
• No Venue
Wang et al.
-
Morphobench: A Benchmark With Difficulty Adaptive To Model Reasoning
(2025)
• No Venue
Wang et al.
-
Mobile-agent-v: Learning Mobile Device Operation Through Video-guided Multi-agent Collaboration
(2025)
• No Venue
Wang et al.
-
Multishotmaster: A Controllable Multi-shot Video Generation Framework
(2025)
• No Venue
Wang et al.
-
Omniear: Benchmarking Agent Reasoning In Embodied Tasks
(2025)
• No Venue
Wang et al.
-
ODYSSEY: Open-world Quadrupeds Exploration And Manipulation For Long-horizon Tasks
(2025)
• No Venue
Wang et al.
-
Omnimmi: A Comprehensive Multi-modal Interaction Benchmark In Streaming Video Contexts
(2025)
• No Venue
Wang et al.
-
RAGEN: Understanding Self-evolution In LLM Agents Via Multi-turn Reinforcement Learning
(2025)
• No Venue
Wang et al.
-
Pref-grpo: Pairwise Preference Reward-based GRPO For Stable Text-to-image Reinforcement Learning
(2025)
• No Venue
Wang et al.
-
Promptbridge: Cross-model Prompt Transfer For Large Language Models
(2025)
• No Venue
Wang et al.
-
RECALL: Representation-aligned Catastrophic-forgetting Alleviation Via Hierarchical Model Merging
(2025)
• No Venue
Wang et al.
-
Test-time Scaling With Reflective Generative Model
(2025)
• No Venue
Wang et al.
-
Sciver: Evaluating Foundation Models For Multimodal Scientific Claim Verification
(2025)
• No Venue
Wang et al.
-
Scievalkit: An Open-source Evaluation Toolkit For Scientific General Intelligence
(2025)
• No Venue
Wang et al.
-
Scaling Laws In Patchification: An Image Is Worth 50,176 Tokens And More
(2025)
• No Venue
Wang et al.
-
Scholarcopilot: Training Large Language Models For Academic Writing With Accurate Citations
(2025)
• No Venue
Wang et al.
-
Scireasoner: Laying The Scientific Reasoning Ground Across Disciplines
(2025)
• No Venue
Wang et al.
-
Sparsemm: Head Sparsity Emerges From Visual Concept Responses In Mllms
(2025)
• No Venue
Wang et al.
-
Sota With Less: Mcts-guided Sample Selection For Data-efficient Visual Reasoning Self-improvement
(2025)
• No Venue
Wang et al.
-
Skywork-vl Reward: An Effective Reward Model For Multimodal Understanding And Reasoning
(2025)
• No Venue
Wang et al.
-
Socratic-zero: Bootstrapping Reasoning Via Data-free Agent Co-evolution
(2025)
• No Venue
Wang et al.
-
Streambridge: Turning Your Offline Video Large Language Model Into A Proactive Streaming Assistant
(2025)
• No Venue
Wang et al.
-
Swe-bench++: A Framework For The Scalable Generation Of Software Engineering Benchmarks From Open-source Repositories
(2025)
• No Venue
Wang et al.
-
Tadicodec: Text-aware Diffusion Speech Tokenizer For Speech Language Modeling
(2025)
• No Venue
Wang et al.
-
Complexfuncbench: Exploring Multi-step And Constrained Function Calling Under Long-context Scenario
(2025)
• No Venue
Zhong et al.
-
Universe-1: Unified Audio-video Generation Via Stitching Of Experts
(2025)
• No Venue
Wang et al.
-
Personalized Safety In Llms: A Benchmark And A Planning-based Agent Approach
(2025)
• No Venue
Wu et al.
-
Leetcodedataset: A Temporal Dataset For Robust Evaluation And Efficient Training Of Code Llms
(2025)
• No Venue
Xia et al.
-
Dreamomni2: Multimodal Instruction-based Editing And Generation
(2025)
• No Venue
Xia et al.
-
BMMR: A Large-scale Bilingual Multimodal Multi-discipline Reasoning Dataset
(2025)
• No Venue
Xi et al.
-
Samplemix: A Sample-wise Pre-training Data Mixing Strategy By Coordinating Data Quality And Diversity
(2025)
• No Venue
Xi et al.
-
Omnithink: Expanding Knowledge Boundaries In Machine Writing Through Thinking
(2025)
• No Venue
Xi et al.
-
Spatialscore: Towards Unified Evaluation For Multimodal Spatial Understanding
(2025)
• No Venue
Wu et al.
-
Reinforcement Learning In Vision: A Survey
(2025)
• No Venue
Wu et al.
-
Reasoning Or Memorization? Unreliable Results Of Reinforcement Learning Due To Data Contamination
(2025)
• No Venue
Wu et al.
-
Quantile Advantage Estimation For Entropy-safe Reasoning
(2025)
• No Venue
Wu et al.
-
Recode: Updating Code API Knowledge With Reinforcement Learning
(2025)
• No Venue
Wu et al.
-
Writingbench: A Comprehensive Benchmark For Generative Writing
(2025)
• No Venue
Wu et al.
-
Vbench-2.0: Advancing Video Generation Benchmark Suite For Intrinsic Faithfulness
(2025)
• No Venue
Zheng et al.
-
Webdancer: Towards Autonomous Information Seeking Agency
(2025)
• No Venue
Wu et al.
-
Free Lunch Alignment Of Text-to-image Diffusion Models Without Preference Image Pairs
(2025)
• No Venue
Xian et al.
-
What Generative Search Engines Like And How To Optimize Web Content Cooperatively
(2025)
• No Venue
Wu et al.
-
Resum: Unlocking Long-horizon Search Intelligence Via Context Summarization
(2025)
• No Venue
Wu et al.
-
Vidic: Video Difference Captioning
(2025)
• No Venue
Wu et al.
-
Unlocking Efficient Long-to-short LLM Reasoning With Model Merging
(2025)
• No Venue
Wu et al.
-
Synthrl: Scaling Visual Reasoning With Verifiable Data Synthesis
(2025)
• No Venue
Wu et al.
-
Superwriter: Reflection-driven Long-form Generation With Large Language Models
(2025)
• No Venue
Wu et al.
-
Step-audio 2 Technical Report
(2025)
• No Venue
Wu et al.
-
J1: Incentivizing Thinking In Llm-as-a-judge Via Reinforcement Learning
(2025)
• No Venue
Whitehouse et al.
-
Finevision: Open Data Is All You Need
(2025)
• No Venue
Wiedmann et al.
-
Navitrace: Evaluating Embodied Navigation Of Vision-language Models
(2025)
• No Venue
Windecker et al.
-
Datasheets Aren't Enough: Datarubrics For Automated Quality Metrics And Accountability
(2025)
• No Venue
Winata et al.
-
Widesearch: Benchmarking Agentic Broad Info-seeking
(2025)
• No Venue
Wong et al.
-
Any2caption: Interpreting Any Condition To Caption For Controllable Video Generation
(2025)
• No Venue
Wu et al.
-
Omnigen2: Exploration To Advanced Multimodal Generation
(2025)
• No Venue
Wu et al.
-
Multiplayer Nash Preference Optimization
(2025)
• No Venue
Wu et al.
-
Mcpmark: A Benchmark For Stress-testing Realistic And Comprehensive MCP Use
(2025)
• No Venue
Wu et al.
-
Longwriter-zero: Mastering Ultra-long Text Generation Via Reinforcement Learning
(2025)
• No Venue
Wu et al.
-
Generate, But Verify: Reducing Hallucination In Vision-language Models With Retrospective Resampling
(2025)
• No Venue
Wu et al.
-
Effectively Controlling Reasoning Models Through Thinking Intervention
(2025)
• No Venue
Wu et al.
-
Think Or Not? Selective Reasoning Via Reinforcement Learning For Vision-language Models
(2025)
• No Venue
Wang et al.
-
Tikmix: Take Data Influence Into Dynamic Mixture For Language Model Pre-training
(2025)
• No Venue
Wang et al.
-
Transition Models: Rethinking The Generative Learning Objective
(2025)
• No Venue
Wang et al.
-
Tina: Tiny Reasoning Models Via Lora
(2025)
• No Venue
Wang et al.
-
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation And Methodology
(2025)
• No Venue
Wang et al.
-
Unigenbench++: A Unified Semantic Evaluation Benchmark For Text-to-image Generation
(2025)
• No Venue
Wang et al.
-
UI-TARS-2 Technical Report: Advancing GUI Agent With Multi-turn Reinforcement Learning
(2025)
• No Venue
Wang et al.
-
Trustjudge: Inconsistencies Of Llm-as-a-judge And How To Alleviate Them
(2025)
• No Venue
Wang et al.
-
Two Experts Are All You Need For Steering Thinking: Reinforcing Cognitive Effort In Moe Reasoning Models Without Additional Training
(2025)
• No Venue
Wang et al.
-
Unified Reward Model For Multimodal Understanding And Generation
(2025)
• No Venue
Wang et al.
-
Vicrit: A Verifiable Reinforcement Learning Proxy Task For Visual Perception In Vlms
(2025)
• No Venue
Wang et al.
-
Video Reality Test: Can Ai-generated ASMR Videos Fool Vlms And Humans?
(2025)
• No Venue
Wang et al.
-
Visualsimpleqa: A Benchmark For Decoupled Evaluation Of Large Vision-language Models In Fact-seeking Question Answering
(2025)
• No Venue
Wang et al.
-
Visualprm: An Effective Process Reward Model For Multimodal Reasoning
(2025)
• No Venue
Wang et al.
-
Worldpm: Scaling Human Preference Modeling
(2025)
• No Venue
Wang et al.
-
Voiceassistant-eval: Benchmarking AI Assistants Across Listening, Speaking, And Viewing
(2025)
• No Venue
Wang et al.
-
Llama-3.1-foundationai-securityllm-8b-instruct Technical Report
(2025)
• No Venue
Weerawardhena et al.
-
Look Before You Leap: A Gui-critic-r1 Model For Pre-operative Error Diagnosis In GUI Automation
(2025)
• No Venue
Wanyan et al.
-
Spot The Fake: Large Multimodal Model-based Synthetic Image Detection With Artifact Explanation
(2025)
• No Venue
Wen et al.
-
Mocha: Towards Movie-grade Talking Character Synthesis
(2025)
• No Venue
Wei et al.
-
Efficient Model Selection For Time Series Forecasting Via Llms
(2025)
• No Venue
Wei et al.
-
Codearc: Benchmarking Reasoning Capabilities Of LLM Agents For Inductive Program Synthesis
(2025)
• No Venue
Wei et al.
-
Deepseek-ocr: Contexts Optical Compression
(2025)
• No Venue
Haoran Wei, Yaofeng Sun, Yukun Li
-
Ggbench: A Geometric Generative Reasoning Benchmark For Unified Multimodal Models
(2025)
• No Venue
Wei et al.
-
TIME: A Multi-level Benchmark For Temporal Reasoning Of Llms In Real-world Scenarios
(2025)
• No Venue
Wei et al.
-
Videorope: What Makes For Good Video Rotary Position Embedding?
(2025)
• No Venue
Wei et al.
-
On The Theoretical Limitations Of Embedding-based Retrieval
(2025)
• No Venue
Weller et al.
-
Dynamicverse: A Physically-aware Multimodal Framework For 4D World Modeling
(2025)
• No Venue
Wen et al.
-
3D Scene Generation: A Survey
(2025)
• No Venue
Wen et al.
-
Reinforcement Learning With Verifiable Rewards Implicitly Incentivizes Correct Reasoning In Base Llms
(2025)
• No Venue
Wen et al.
-
Fantastic Pretraining Optimizers And Where To Find Them
(2025)
• No Venue
Wen et al.
-
Gr-dexter Technical Report
(2025)
• No Venue
Wen et al.
-
Vibe Checker: Aligning Code Evaluation With Human Preference
(2025)
• No Venue
Zhong et al.
-
A Theoretical Study On Bridging Internal Probability And Self-consistency For LLM Reasoning
(2025)
• No Venue
Zhou et al.
-
How To Get Your LLM To Generate Challenging Problems For Evaluation
(2025)
• No Venue
Arkil Patel, Siva Reddy, Dzmitry Bahdanau
-
Automind: Adaptive Knowledgeable Agent For Automated Data Science
(2025)
• No Venue
Ou et al.
-
Bielik 11B V2 Technical Report
(2025)
• No Venue
Ociepa et al.
-
Visual Instruction Bottleneck Tuning
(2025)
• No Venue
Oh et al.
-
Comment On The Illusion Of Thinking: Understanding The Strengths And Limitations Of Reasoning Models Via The Lens Of Problem Complexity
(2025)
• No Venue
C. Opus, A. Lawsen
-
Multibanana: A Challenging Benchmark For Multi-reference Text-to-image Generation
(2025)
• No Venue
Oshima et al.
-
Strategic Dishonesty Can Undermine AI Safety Evaluations Of Frontier LLM
(2025)
• No Venue
Panfilov et al.
-
Explorer: Scaling Exploration-driven Web Trajectory Synthesis For Multimodal Web Agents
(2025)
• No Venue
Pahuja et al.
-
K-lora: Unlocking Training-free Fusion Of Any Subject And Style Loras
(2025)
• No Venue
Ziheng Ouyang, Zhen Li, Qibin Hou
-
FLEXITOKENS: Flexible Tokenization For Evolving Language Models
(2025)
• No Venue
Abraham Toluase Owodunni, Orevaoghene Ahia, Sachin Kumar
-
Large Language Models Think Too Fast To Explore Effectively
(2025)
• No Venue
Lan Pan, Hanbo Xie, Robert C. Wilson
-
Mt-video-bench: A Holistic Video Understanding Benchmark For Evaluating Multimodal Llms In Multi-turn Dialogues
(2025)
• No Venue
Pan et al.
-
Tokenhsi: Unified Synthesis Of Physical Human-scene Interactions Through Task Tokenization
(2025)
• No Venue
Pan et al.
-
REST: Stress Testing Large Reasoning Models By Asking Multiple Problems At Once
(2025)
• No Venue
Pan et al.
-
Teaching With Lies: Curriculum DPO On Synthetic Negatives For Hallucination Detection
(2025)
• No Venue
Pandit et al.
-
SIFT-50M: A Large-scale Multilingual Dataset For Speech Instruction Fine-tuning
(2025)
• No Venue
Pandey et al.
-
MCIF: Multimodal Crosslingual Instruction-following Benchmark From Scientific Talks
(2025)
• No Venue
Papi et al.
-
Paper2poster: Towards Multimodal Poster Automation From Scientific Papers
(2025)
• No Venue
Pang et al.
-
FAMA: The First Large-scale Open-science Speech Foundation Model For English And Italian
(2025)
• No Venue
Papi et al.
-
Bridging Reasoning To Learning: Unmasking Illusions Using Complexity Out Of Distribution Generalization
(2025)
• No Venue
Paqaleh et al.
-
A Survey On Inference Engines For Large Language Models: Perspectives On Optimization And Efficiency
(2025)
• No Venue
Park et al.
-
Fedrand: Enhancing Privacy In Federated Learning With Randomized Lora Subparameter Updates
(2025)
• No Venue
Park et al.
-
Evaluating Multimodal Generative AI With Korean Educational Standards
(2025)
• No Venue
Sanghee Park, Geewook Kim
-
Deepscholar-bench: A Live Benchmark And Automated Evaluation For Generative Research Synthesis
(2025)
• No Venue
Patel et al.
-
Assetopsbench: Benchmarking AI Agents For Task Automation In Industrial Asset Operations And Maintenance
(2025)
• No Venue
Patel et al.
-
AION-1: Omnimodal Foundation Model For Astronomical Sciences
(2025)
• No Venue
Parker et al.
-
Mvu-eval: Towards Multi-video Understanding Evaluation For Multimodal Llms
(2025)
• No Venue
Peng et al.
-
Interpretable Physics Reasoning And Performance Taxonomy In Vision-language Models
(2025)
• No Venue
Pawar et al.
-
Context Is What You Need: The Maximum Effective Context Window For Real World Limits Of Llms
(2025)
• No Venue
Norman Paulsen
-
Optimizing Multilingual Text-to-speech With Accents & Emotions
(2025)
• No Venue
Pawar et al.
-
Fine-grained Detection Of Context-grounded Hallucinations Using Llms
(2025)
• No Venue
Peisakhovsky et al.
-
Fineweb2: One Pipeline To Scale Them All -- Adapting Pre-training Data Processing To Every Language
(2025)
• No Venue
Penedo et al.
-
Multifinben: A Multilingual, Multimodal, And Difficulty-aware Benchmark For Financial LLM Evaluation
(2025)
• No Venue
Peng et al.
-
Can Visual Input Be Compressed? A Visual Token Compression Benchmark For Large Multimodal Models
(2025)
• No Venue
Peng et al.
-
Bizgen: Advancing Article-level Visual Text Rendering For Infographics Generation
(2025)
• No Venue
Peng et al.
-
Criticlean: Critic-guided Reinforcement Learning For Mathematical Formalization
(2025)
• No Venue
Peng et al.
-
Efficiency-effectiveness Reranking Flops For Llm-based Rerankers
(2025)
• No Venue
Peng et al.
-
Skywork R1V: Pioneering Multimodal Reasoning With Chain-of-thought
(2025)
• No Venue
Peng et al.
-
Plutus: Benchmarking Large Language Models In Low-resource Greek Finance
(2025)
• No Venue
Peng et al.
-
UNIDOC-BENCH: A Unified Benchmark For Document-centric Multimodal RAG
(2025)
• No Venue
Peng et al.
-
Humanity's Last Exam
(2025)
• No Venue
Phan et al.
-
Breaking The Exploration Bottleneck: Rubric-scaffolded Reinforcement Learning For General LLM Reasoning
(2025)
• No Venue
Zhou et al.
-
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification To Improve Trustworthy QA
(2025)
• No Venue
Pletenev et al.
-
So Let's Replace This Phrase With Insult... Lessons Learned From Generation Of Toxic Texts With Llms
(2025)
• No Venue
Sergey Pletenev, Daniil Moskovskiy, Alexander Panchenko
-
Generating Physically Stable And Buildable LEGO Designs From Text
(2025)
• No Venue
Pun et al.
-
THOUGHTTERMINATOR: Benchmarking, Calibrating, And Mitigating Overthinking In Reasoning Models
(2025)
• No Venue
Pu et al.
-
Judge Anything: MLLM As A Judge Across Any Modality
(2025)
• No Venue
Pu et al.
-
Quantiphy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities Of Vision-language Models
(2025)
• No Venue
Puyin et al.
-
A Probabilistic Inference Approach To Inference-time Scaling Of Llms Using Particle-based Monte Carlo Methods
(2025)
• No Venue
Puri et al.
-
Dynamic Large Concept Models: Latent Reasoning In An Adaptive Semantic Space
(2025)
• No Venue
Qu et al.
-
BEAR: Benchmarking And Enhancing Multimodal Language Models For Atomic Embodied Capabilities
(2025)
• No Venue
Qi et al.
-
Vcr-bench: A Comprehensive Evaluation Framework For Video Chain-of-thought Reasoning
(2025)
• No Venue
Qi et al.
-
Layercomposer: Interactive Personalized T2I Via Spatially-aware Layered Canvas
(2025)
• No Venue
Qian et al.
-
Dispider: Enabling Video Llms With Active Real-time Interaction Via Disentangled Perception, Decision, And Reaction
(2025)
• No Venue
Qian et al.
-
Userrl: Training Interactive User-centric Agent Via Reinforcement Learning
(2025)
• No Venue
Qian et al.
-
Toolrl: Reward Is All Tool Learning Needs
(2025)
• No Venue
Qian et al.
-
Userbench: An Interactive Gym Environment For User-centric Agents
(2025)
• No Venue
Qian et al.
-
Mle-dojo: Interactive Environments For Empowering LLM Agents In Machine Learning Engineering
(2025)
• No Venue
Qiang et al.
-
C2LLM Technical Report: A New Frontier In Code Retrieval Via Adaptive Cross-attention Pooling
(2025)
• No Venue
Qin et al.
-
Lumina-image 2.0: A Unified And Efficient Image Generative Framework
(2025)
• No Venue
Qin et al.
-
Incentivizing Reasoning For Advanced Instruction-following Of Large Language Models
(2025)
• No Venue
Qin et al.
-
Flageval Findings Report: A Preliminary Evaluation Of Large Reasoning Models On Automatically Verifiable Textual And Visual Questions
(2025)
• No Venue
Qin et al.
-
Flash-searcher: Fast And Effective Web Agents Via Dag-based Parallel Execution
(2025)
• No Venue
Qin et al.
-
Bootstrapping World Models From Dynamics Models In Multimodal Foundation Models
(2025)
• No Venue
Qiu et al.
-
Evolving Diagnostic Agents In A Virtual Clinical Environment
(2025)
• No Venue
Qiu et al.
-
Phybench: Holistic Evaluation Of Physical Perception And Reasoning In Large Language Models
(2025)
• No Venue
Qiu et al.
-
Saffron-1: Towards An Inference Scaling Paradigm For LLM Safety Assurance
(2025)
• No Venue
Qiu et al.
-
3DIS-FLUX: Simple And Efficient Multi-instance Generation With Dit Rendering
(2025)
• No Venue
Zhou et al.
-
How Well Does Gpt-4o Understand Vision? Evaluating Multimodal Foundation Models On Standard Computer Vision Tasks
(2025)
• No Venue
Ramachandran et al.
-
Apriel-1.5-15b-thinker
(2025)
• No Venue
Radhakrishna et al.
-
An Empirical Study Of Autoregressive Pre-training From Videos
(2025)
• No Venue
Rajasegaran et al.
-
Videomathqa: Benchmarking Mathematical Reasoning Via Multimodal Understanding In Videos
(2025)
• No Venue
Rasheed et al.
-
Halogen: Fantastic LLM Hallucinations And Where To Find Them
(2025)
• No Venue
Ravichander et al.
-
Llm-microscope: Uncovering The Hidden Role Of Punctuation In Context Memory Of Transformers
(2025)
• No Venue
Razzhigaev et al.
-
Anycap Project: A Unified Framework, Dataset, And Benchmark For Controllable Omni-modal Captioning
(2025)
• No Venue
Ren et al.
-
Simworld: An Open-ended Realistic Simulator For Autonomous Agents In Physical And Social Worlds
(2025)
• No Venue
Ren et al.
-
SMMILE: An Expert-driven Benchmark For Multimodal Medical In-context Learning
(2025)
• No Venue
Rieff et al.
-
Zerobench: An Impossible Visual Benchmark For Contemporary Large Multimodal Models
(2025)
• No Venue
Roberts et al.
-
Benchmarking Llms' Swarm Intelligence
(2025)
• No Venue
Ruan et al.
-
REFINE-AF: A Task-agnostic Framework To Align Language Models Via Self-generated Instructions Using Reinforcement Learning From Automated Feedback
(2025)
• No Venue
Roy et al.
-
Dota-rag: Dynamic Of Thought Aggregation RAG
(2025)
• No Venue
Ruangtanusak et al.
-
DISCO: Diversifying Sample Condensation For Efficient Model Evaluation
(2025)
• No Venue
Rubinstein et al.
-
When Models Lie, We Learn: Multilingual Span-level Hallucination Detection With Psiloqa
(2025)
• No Venue
Rykov et al.
-
Through The Looking Glass: Common Sense Consistency Evaluation Of Weird Images
(2025)
• No Venue
Rykov et al.
-
Reviewscore: Misinformed Peer Review Detection With Large Language Models
(2025)
• No Venue
Ryu et al.
-
MSTS: A Multimodal Safety Test Suite For Vision-language Models
(2025)
• No Venue
Röttger et al.
-
Geospatial Mechanistic Interpretability Of Large Language Models
(2025)
• No Venue
Stef de Sabbata, Stefano Mizzaro, Kevin Roitero
-
SRMT: Shared Memory For Multi-agent Lifelong Pathfinding
(2025)
• No Venue
Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
-
Aligning Text, Images, And 3D Structure Token-by-token
(2025)
• No Venue
Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari
-
Omnimattezero: Training-free Real-time Omnimatte With Pre-trained Video Diffusion Models
(2025)
• No Venue
Samuel et al.
-
Lynx: Towards High-fidelity Personalized Video Generation
(2025)
• No Venue
Sang et al.
-
AI Agents Vs. Agentic AI: A Conceptual Taxonomy, Applications And Challenge
(2025)
• No Venue
Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
-
Establishing Trustworthy LLM Evaluation Via Shortcut Neuron Analysis
(2025)
• No Venue
Zhu et al.
-
Yourbench: Easy Custom Evaluation Sets For Everyone
(2025)
• No Venue
Shashidhar et al.
-
Emonet-voice: A Fine-grained, Expert-verified Benchmark For Speech Emotion Detection
(2025)
• No Venue
Schuhmann et al.
-
Project Alexandria: Towards Freeing Scientific Knowledge From Copyright Burdens Via Llms
(2025)
• No Venue
Schuhmann et al.
-
Seedream 4.0: Toward Next-generation Multimodal Image Generation
(2025)
• No Venue
Seedream et al.
-
Leveraging Large Language Models For Predictive Analysis Of Human Misery
(2025)
• No Venue
Seal et al.
-
Benchmarking Optimizers For Large Language Model Pretraining
(2025)
• No Venue
Andrei Semenov, Matteo Pagliardini, Martin Jaggi
-
When Punctuation Matters: A Large-scale Comparison Of Prompt Robustness Methods For Llms
(2025)
• No Venue
Seleznyov et al.
-
Medgemma Technical Report
(2025)
• No Venue
Sellergren et al.
-
Skrr: Skip And Re-use Text Encoder Layers For Memory Efficient Text-to-image Generation
(2025)
• No Venue
Seo et al.
-
Paper2code: Automating Code Generation From Scientific Papers In Machine Learning
(2025)
• No Venue
Seo et al.
-
NER Retriever: Zero-shot Named Entity Retrieval With Type-aware Embeddings
(2025)
• No Venue
Shachar et al.
-
Turning The Spell Around: Lightweight Alignment Amplification Via Rank-one Safety Injection
(2025)
• No Venue
Shairah et al.
-
Reasonir: Training Retrievers For Reasoning Tasks
(2025)
• No Venue
Shao et al.
-
Nile-chat: Egyptian Language Models For Arabic And Latin Scripts
(2025)
• No Venue
Shang et al.
-
Holitom: Holistic Token Merging For Fast Video Large Language Models
(2025)
• No Venue
Shao et al.
-
Continuous Autoregressive Language Models
(2025)
• No Venue
Shao et al.
-
Core^2: Collect, Reflect And Refine To Generate Better And Faster
(2025)
• No Venue
Shao et al.
-
Researchrubrics: A Benchmark Of Prompts And Rubrics For Evaluating Deep Research Agents
(2025)
• No Venue
Sharma et al.
-
Constitutional Classifiers: Defending Against Universal Jailbreaks Across Thousands Of Hours Of Red Teaming
(2025)
• No Venue
Sharma et al.
-
Satori: Reinforcement Learning With Chain-of-action-thought Enhances LLM Reasoning Via Autoregressive Search
(2025)
• No Venue
Shen et al.
-
Qwenlong-cprs: Towards Infty-llms With Dynamic Context Optimization
(2025)
• No Venue
Shen et al.
-
Are We On The Right Way For Assessing Document Retrieval-augmented Generation?
(2025)
• No Venue
Shen et al.
-
Decontext As Defense: Safe Image Editing In Diffusion Transformers
(2025)
• No Venue
Linghui Shen, Mingyue Cui, Xingyi Yang
-
Phyx: Does Your Model Have The "wits" For Physical Reasoning?
(2025)
• No Venue
Shen et al.
-
Solving Inequality Proofs With Large Language Models
(2025)
• No Venue
Sheng et al.
-
Towards Trustworthy GUI Agents: A Survey
(2025)
• No Venue
Shi et al.
-
Model Merging With Functional Dual Anchors
(2025)
• No Venue
Kexuan Shi, Yandong Wen, Weiyang Liu
-
Damo: Data Mixing Optimizer In Fine-tuning Multimodal Llms For Mobile Phone Agents
(2025)
• No Venue
Shi et al.
-
Mme-videoocr: Evaluating Ocr-based Capabilities Of Multimodal Llms In Video Scenarios
(2025)
• No Venue
Shi et al.
-
Mathcanvas: Intrinsic Visual Chain-of-thought For Multimodal Mathematical Reasoning
(2025)
• No Venue
Shi et al.
-
Heimdall: Test-time Scaling On The Generative Verification
(2025)
• No Venue
Wenlei Shi, Xing Jin
-
Longcodezip: Compress Long Context For Code Language Models
(2025)
• No Venue
Shi et al.
-
Taskcraft: Automated Generation Of Agentic Tasks
(2025)
• No Venue
Shi et al.
-
Realunify: Do Unified Models Truly Benefit From Unification? A Comprehensive Benchmark
(2025)
• No Venue
Shi et al.
-
Presentagent: Multimodal Agent For Presentation Video Generation
(2025)
• No Venue
Shi et al.
-
SAM Audio: Segment Anything In Audio
(2025)
• No Venue
Shi et al.
-
Representing Speech Through Autoregressive Prediction Of Cochlear Tokens
(2025)
• No Venue
Tuckute et al.
-
Safearena: Evaluating The Safety Of Autonomous Web Agents
(2025)
• No Venue
Tur et al.
-
Nurbgen: High-fidelity Text-to-cad Generation Through Llm-driven NURBS Modeling
(2025)
• No Venue
Usama et al.
-
Guided Decoding And Its Critical Role In Retrieval-augmented Generation
(2025)
• No Venue
Uğur et al.
-
Φeat: Physically-grounded Feature Representation
(2025)
• No Venue
Vecchio et al.
-
B-score: Detecting Biases In Large Language Models Using Response History
(2025)
• No Venue
Vo et al.
-
Deleaker: Dynamic Inference-time Reweighting For Semantic Leakage Mitigation In Text-to-image Models
(2025)
• No Venue
Ventura et al.
-
Measuring General Intelligence With Generated Games
(2025)
• No Venue
Verma et al.
-
Hiwave: Training-free High-resolution Image Generation Via Wavelet-based Diffusion Sampling
(2025)
• No Venue
Vontobel et al.
-
Deepresearch Arena: The First Exam Of Llms' Research Abilities Via Seminar-grounded Tasks
(2025)
• No Venue
Wan et al.
-
Baichuan-m2: Scaling Medical Capability With Large Verifier System
(2025)
• No Venue
Team et al.
-
Thinking With Images For Multimodal Reasoning: Foundations, Methods, And Future Frontiers
(2025)
• No Venue
Su et al.
-
Stop Overthinking: A Survey On Efficient Reasoning For Large Language Models
(2025)
• No Venue
Sui et al.
-
Evaluation Is All You Need: Strategic Overclaiming Of LLM Reasoning Capabilities Through Evaluation Design
(2025)
• No Venue
Sun et al.
-
Challenging The Boundaries Of Reasoning: An Olympiad-level Math Benchmark For Large Language Models
(2025)
• No Venue
Sun et al.
-
Enhancing Step-by-step And Verifiable Medical Reasoning In Mllms
(2025)
• No Venue
Sun et al.
-
Scienceboard: Evaluating Multimodal Autonomous Agents In Realistic Scientific Workflows
(2025)
• No Venue
Sun et al.
-
Llava-scissor: Token Compression With Semantic Connected Components For Video Llms
(2025)
• No Venue
Sun et al.
-
Inverse Reinforcement Learning Meets Large Language Model Post-training: Basics, Advances, And Opportunities
(2025)
• No Venue
Hao Sun, Mihaela van der Schaar
-
LASP-2: Rethinking Sequence Parallelism For Linear Attention And Its Hybrid
(2025)
• No Venue
Sun et al.
-
Spacevista: All-scale Visual Spatial Reasoning From Mm To Km
(2025)
• No Venue
Sun et al.
-
T2i-reasonbench: Benchmarking Reasoning-informed Text-to-image Generation
(2025)
• No Venue
Sun et al.
-
Unified Continuous Generative Models
(2025)
• No Venue
Peng Sun, Yi Jiang, Tao Lin
-
Au-harness: An Open-source Toolkit For Holistic Evaluation Of Audio Llms
(2025)
• No Venue
Surapaneni et al.
-
BEAVER: An Efficient Deterministic LLM Verifier
(2025)
• No Venue
Suresh et al.
-
Cs-sum: A Benchmark For Code-switching Dialogue Summarization And The Limits Of Large Language Models
(2025)
• No Venue
Suresh et al.
-
When An LLM Is Apprehensive About Its Answers -- And When Its Uncertainty Is Justified
(2025)
• No Venue
Sychev et al.
-
Stableavatar: Infinite-length Audio-driven Avatar Video Generation
(2025)
• No Venue
Tu et al.
-
Videogameqa-bench: Evaluating Vision-language Models For Video Game Quality Assurance
(2025)
• No Venue
Taesiri et al.
-
Understanding Generative AI Capabilities In Everyday Image Editing Tasks
(2025)
• No Venue
Taesiri et al.
-
Auto-regressive Vs Flow-matching: A Comparative Study Of Modeling Paradigms For Text-to-music Generation
(2025)
• No Venue
Or Tal, Felix Kreuk, Yossi Adi
-
Learning An Efficient Multi-turn Dialogue Evaluator From Multiple Judges
(2025)
• No Venue
Tang et al.
-
Beyond Turn Limits: Training Deep Search Agents With Dynamic Context Window
(2025)
• No Venue
Tang et al.
-
Enabling Scalable Oversight Via Self-evolving Critic
(2025)
• No Venue
Tang et al.
-
Metachain: A Fully-automated And Zero-code Framework For LLM Agents
(2025)
• No Venue
Jiabin Tang, Tianyu Fan, Chao Huang
-
Loom-scope: A Comprehensive And Efficient Long-context Model Evaluation Framework
(2025)
• No Venue
Tang et al.
-
Lego-puzzles: How Good Are Mllms At Multi-step Spatial Reasoning?
(2025)
• No Venue
Tang et al.
-
Longrm: Revealing And Unlocking The Context Boundary Of Reward Modeling
(2025)
• No Venue
Tang et al.
-
Medagentsbench: Benchmarking Thinking Models And Agent Frameworks For Complex Medical Reasoning
(2025)
• No Venue
Tang et al.
-
Refcritic: Training Long Chain-of-thought Critic Models With Refinement Feedback
(2025)
• No Venue
Tang et al.
-
Realcritic: Towards Effectiveness-driven Evaluation Of Language Model Critiques
(2025)
• No Venue
Tang et al.
-
Robust-r1: Degradation-aware Reasoning For Robust Visual Understanding
(2025)
• No Venue
Tang et al.
-
Omniagent: Audio-guided Active Perception Agent For Omnimodal Audio-video Understanding
(2025)
• No Venue
Tao et al.
-
Personafeedback: A Large-scale Human-annotated Benchmark For Personalization
(2025)
• No Venue
Tao et al.
-
Turk-lettucedetect: A Hallucination Detection Models For Turkish RAG Applications
(2025)
• No Venue
Taş et al.
-
COIG-P: A High-quality And Large-scale Chinese Preference Dataset For Alignment With Human Values
(2025)
• No Venue
Team et al.
-
Kling-omni Technical Report
(2025)
• No Venue
Team et al.
-
Inferix: A Block-diffusion Based Next-generation Inference Engine For World Simulation
(2025)
• No Venue
Team et al.
-
INTELLECT-3: Technical Report
(2025)
• No Venue
Team et al.
-
Kimi-vl Technical Report
(2025)
• No Venue
Team et al.
-
PAN: A World Model For General, Interactable, And Long-horizon World Simulation
(2025)
• No Venue
Team et al.
-
Minicpm4: Ultra-efficient Llms On End Devices
(2025)
• No Venue
Team et al.
-
Kwai Keye-vl Technical Report
(2025)
• No Venue
Team et al.
-
Lingshu: A Generalist Foundation Model For Unified Multimodal Medical Understanding And Reasoning
(2025)
• No Venue
Team et al.
-
Mimo: Unlocking The Reasoning Potential Of Language Model -- From Pretraining To Posttraining
(2025)
• No Venue
Team et al.
-
Open Multimodal Retrieval-augmented Factual Image Generation
(2025)
• No Venue
Tian et al.
-
Hermes 4 Technical Report
(2025)
• No Venue
Teknium et al.
-
Supergpqa: Scaling LLM Evaluation Across 285 Graduate Disciplines
(2025)
• No Venue
Team et al.
-
Vidi: Large Multimodal Models For Video Understanding And Editing
(2025)
• No Venue
Team et al.
-
SWE-EVO: Benchmarking Coding Agents In Long-horizon Software Evolution Scenarios
(2025)
• No Venue
Thai et al.
-
Preserving Privacy, Increasing Accessibility, And Reducing Cost: An On-device Artificial Intelligence Model For Medical Transcription And Note Generation
(2025)
• No Venue
Thomas et al.
-
Fixing Data That Hurts Performance: Cascading Llms To Relabel Hard Negatives For Robust Information Retrieval
(2025)
• No Venue
Thakur et al.
-
Llamav-o1: Rethinking Step-by-step Visual Reasoning In Llms
(2025)
• No Venue
Thawakar et al.
-
Webgames: Challenging General-purpose Web-browsing AI Agents
(2025)
• No Venue
Thomas et al.
-
Envision: Benchmarking Unified Understanding & Generation For Causal World Process Insights
(2025)
• No Venue
Tian et al.
-
Marco-voice Technical Report
(2025)
• No Venue
Tian et al.
-
MMMR: Benchmarking Massive Multi-modal Reasoning Tasks
(2025)
• No Venue
Tie et al.
-
The Jumping Reasoning Curve? Tracking The Evolution Of Reasoning Performance In Gpt-[n] And O-[n] Models On Multimodal Puzzles
(2025)
• No Venue
Toh et al.
-
Interactiveomni: A Unified Omni-modal Model For Audio-visual Multi-turn Dialogue
(2025)
• No Venue
Tong et al.
-
Thinking With Video: Video Generation As A Promising Multimodal Reasoning Paradigm
(2025)
• No Venue
Tong et al.
-
Deepprune: Parallel Scaling Without Inter-trace Redundancy
(2025)
• No Venue
Tu et al.
-
Arch-router: Aligning LLM Routing With Human Preferences
(2025)
• No Venue
Tran et al.
-
SPAR: Scholar Paper Retrieval With Llm-based Agents For Enhanced Academic Search
(2025)
• No Venue
Shi et al.
-
Trainable Dynamic Mask Sparse Attention
(2025)
• No Venue
Shi et al.
-
The Illusion Of Thinking: Understanding The Strengths And Limitations Of Reasoning Models Via The Lens Of Problem Complexity
(2025)
• No Venue
Shojaee et al.
-
Mathematical Reasoning In Large Language Models: Assessing Logical And Arithmetic Errors Across Wide Numerical Ranges
(2025)
• No Venue
Safal Shrestha, Minwu Kim, Keith Ross
-
Earthmind: Towards Multi-granular And Multi-sensor Earth Observation With Large Multimodal Models
(2025)
• No Venue
Shu et al.
-
The Leaderboard Illusion
(2025)
• No Venue
Singh et al.
-
Can Language Models Falsify? Evaluating Algorithmic Reasoning With Counterexample Creation
(2025)
• No Venue
Sinha et al.
-
Eka-eval: A Comprehensive Evaluation Framework For Large Language Models In Indian Languages
(2025)
• No Venue
Sinha et al.
-
When AI Co-scientists Fail: Spot-a Benchmark For Automated Verification Of Scientific Research
(2025)
• No Venue
Son et al.
-
Linguistic Generalizability Of Test-time Scaling In Mathematical Reasoning
(2025)
• No Venue
Son et al.
-
Pushing On Multilingual Reasoning Models With Language-mixed Chain-of-thought
(2025)
• No Venue
Son et al.
-
From Head To Tail: Towards Balanced Representation In Large Vision-language Models Through Adaptive Data Calibration
(2025)
• No Venue
Song et al.
-
DMM: Building A Versatile Image Generation Model Via Distillation-based Model Merging
(2025)
• No Venue
Song et al.
-
Prmbench: A Fine-grained And Challenging Benchmark For Process-level Reward Models
(2025)
• No Venue
Song et al.
-
IFIR: A Comprehensive Benchmark For Evaluating Instruction-following In Expert-domain Information Retrieval
(2025)
• No Venue
Song et al.
-
Layertracer: Cognitive-aligned Layered SVG Synthesis Via Diffusion Transformer
(2025)
• No Venue
Yiren Song, Danze Chen, Mike Zheng Shou
-
Omniconsistency: Learning Style-agnostic Consistency From Paired Stylization Data
(2025)
• No Venue
Yiren Song, Cheng Liu, Mike Zheng Shou
-
Visualpuzzles: Decoupling Multimodal Reasoning Evaluation From Domain Knowledge
(2025)
• No Venue
Song et al.
-
Seed Diffusion: A Large-scale Diffusion Language Model With High-speed Inference
(2025)
• No Venue
Song et al.
-
Videonsa: Native Sparse Attention Scales Video Understanding
(2025)
• No Venue
Song et al.
-
Vf-eval: Evaluating Multimodal Llms For Generating Feedback On AIGC Videos
(2025)
• No Venue
Song et al.
-
Paperbench: Evaluating AI's Ability To Replicate AI Research
(2025)
• No Venue
Starace et al.
-
Iterative Self-training For Code Generation Via Reinforced Re-ranking
(2025)
• No Venue
Nikita Sorokin, Ivan Sedykh, Valentin Malykh
-
Pde-controller: Llms For Autoformalization And Reasoning Of Pdes
(2025)
• No Venue
Soroco et al.
-
Debatable Intelligence: Benchmarking LLM Judges Via Debate Speech Evaluation
(2025)
• No Venue
Sternlicht et al.
-
Real-world Gaps In AI Governance Research
(2025)
• No Venue
Strauss et al.
-
T-pro 2.0: An Efficient Russian Hybrid-reasoning Model And Playground
(2025)
• No Venue
Stoianov et al.
-
Mem0: Building Production-ready AI Agents With Scalable Long-term Memory
(2025)
• No Venue
Chhikara et al.
-
Perceptionlm: Open-access Data And Models For Detailed Visual Understanding
(2025)
• No Venue
Cho et al.
-
WEAVE: Unleashing And Benchmarking The In-context Interleaved Comprehension And Generation
(2025)
• No Venue
Chow et al.
-
Metaclip 2: A Worldwide Scaling Recipe
(2025)
• No Venue
Chuang et al.
-
Learning From Failures In Multi-attempt Reinforcement Learning
(2025)
• No Venue
Stephen Chung, Wenyu Du, Jie Fu
-
Modifying Large Language Model Post-training For Diverse Creative Writing
(2025)
• No Venue
Chung et al.
-
Deepresearchgym: A Free, Transparent, And Reproducible Evaluation Sandbox For Deep Research
(2025)
• No Venue
Coelho et al.
-
Forget What You Know About Llms Evaluations - Llms Are Like A Chameleon
(2025)
• No Venue
Cohen-Inger et al.
-
This Time Is Different: An Observability Perspective On Time Series Foundation Models
(2025)
• No Venue
Cohen et al.
-
Command A: An Enterprise-ready Large Language Model
(2025)
• No Venue
Cohere et al.
-
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, Or True Temporal Understanding?
(2025)
• No Venue
Feng et al.
-
The Danger Of Overthinking: Examining The Reasoning-action Dilemma In Agentic Tasks
(2025)
• No Venue
Cuadron et al.
-
Paddleocr-vl: Boosting Multilingual Document Parsing Via A 0.9B Ultra-compact Vision-language Model
(2025)
• No Venue
Cui et al.
-
One-minute Video Generation With Test-time Training
(2025)
• No Venue
Dalal et al.
-
Rynnec: Bringing Mllms Into Embodied World
(2025)
• No Venue
Dang et al.
-
Alphaspace: Enabling Robotic Actions Through Semantic Tokenization And Symbolic Reasoning
(2025)
• No Venue
Alan Dao, Dinh Bach Vu, Bui Quang Huy
-
DPO Kernels: A Semantically-aware, Kernel-enhanced, And Divergence-rich Paradigm For Direct Preference Optimization
(2025)
• No Venue
Das et al.
-
$\left|\,\circlearrowright\,\text{bus}\,\right|$: A Large And Diverse Multimodal Benchmark For Evaluating The Ability Of Vision-language Models To Understand Rebus Puzzles
(2025)
• No Venue
Das et al.
-
MV-RAG: Retrieval Augmented Multiview Diffusion
(2025)
• No Venue
Yosef Dayani, Omer Benishu, Sagie Benaim
-
Self-improvement In Multimodal Large Language Models: A Survey
(2025)
• No Venue
Deng et al.
-
Duoguard: A Two-player Rl-driven Framework For Multilingual LLM Guardrails
(2025)
• No Venue
Deng et al.
-
CINEMA: Coherent Multi-subject Video Generation Via Mllm-based Guidance
(2025)
• No Venue
Deng et al.
-
Interactcomp: Evaluating Search Agents With Ambiguous Queries
(2025)
• No Venue
Deng et al.
-
MAGREF: Masked Guidance For Any-reference Video Generation
(2025)
• No Venue
Deng et al.
-
Swe-bench Pro: Can AI Agents Solve Long-horizon Software Engineering Tasks?
(2025)
• No Venue
Deng et al.
-
Visual Chronicles: Using Multimodal Llms To Analyze Massive Collections Of Images
(2025)
• No Venue
Deng et al.
-
Evev2: Improved Baselines For Encoder-free Vision-language Models
(2025)
• No Venue
Diao et al.
-
Story2board: A Training-free Approach For Expressive Storyboard Generation
(2025)
• No Venue
Dinkevich et al.
-
Kling-avatar: Grounding Multimodal Instructions For Cascaded Long-duration Avatar Animation Synthesis
(2025)
• No Venue
Ding et al.
-
Mm-ifengine: Towards Multimodal Instruction Following
(2025)
• No Venue
Ding et al.
-
Nl2repo-bench: Towards Long-horizon Repository Generation Evaluation Of Coding Agents
(2025)
• No Venue
Ding et al.
-
Statistical Methods In Generative AI
(2025)
• No Venue
Edgar Dobriban
-
Machinelearninglm: Continued Pretraining Language Models On Millions Of Synthetic Tabular Prediction Tasks Scales In-context ML
(2025)
• No Venue
Dong et al.
-
Finch: Benchmarking Finance & Accounting Across Spreadsheet-centric Enterprise Workflows
(2025)
• No Venue
Dong et al.
-
Mmdocir: Benchmarking Multi-modal Retrieval For Long Documents
(2025)
• No Venue
Dong et al.
-
Mmtok: Multimodal Coverage Maximization For Efficient Inference Of Vlms
(2025)
• No Venue
Dong et al.
-
Automating Benchmark Design
(2025)
• No Venue
Dsouza et al.
-
Sailor2: Sailing In South-east Asia With Inclusive Multilingual Llms
(2025)
• No Venue
Dou et al.
-
SONAR-LLM: Autoregressive Transformer That Thinks In Sentence Embeddings And Speaks In Tokens
(2025)
• No Venue
Dragunov et al.
-
Deepresearch Bench: A Comprehensive Benchmark For Deep Research Agents
(2025)
• No Venue
Du et al.
-
Unimmvsr: A Unified Multi-modal Framework For Cascaded Video Super-resolution
(2025)
• No Venue
Du et al.
-
Democratizing Diplomacy: A Harness For Evaluating Any Large Language Model On Full-press Diplomacy
(2025)
• No Venue
Duffy et al.
-
Bridging The Gap Between Promise And Performance For Microscaling FP4 Quantization
(2025)
• No Venue
Egiazarian et al.
-
Rexbench: Can Coding Agents Autonomously Implement AI Research Extensions?
(2025)
• No Venue
Edwards et al.
-
Time To Talk: LLM Agents For Asynchronous Group Communication In Mafia Games
(2025)
• No Venue
Niv Eckhaus, Uri Berger, Gabriel Stanovsky
-
Evaluating Podcast Recommendations With Profile-aware Llm-as-a-judge
(2025)
• No Venue
Fabbri et al.
-
MMTEB: Massive Multilingual Text Embedding Benchmark
(2025)
• No Venue
Enevoldsen et al.
-
Turkcolbert: A Benchmark Of Dense And Late-interaction Models For Turkish Information Retrieval
(2025)
• No Venue
Ezerceli et al.
-
Are We On The Right Way To Assessing Llm-as-a-judge?
(2025)
• No Venue
Feng et al.
-
SSRL: Self-search Reinforcement Learning
(2025)
• No Venue
Fan et al.
-
GRIT: Teaching Mllms To Think With Images
(2025)
• No Venue
Fan et al.
-
Megascience: Pushing The Frontiers Of Post-training Datasets For Science Reasoning
(2025)
• No Venue
Run-Ze Fan, Zengzhi Wang, Pengfei Liu
-
The Prism Hypothesis: Harmonizing Semantic And Pixel Representations Via Unified Autoencoding
(2025)
• No Venue
Fan et al.
-
V-REX: Benchmarking Exploratory Visual Reasoning Via Chain-of-questions
(2025)
• No Venue
Fan et al.
-
Memp: Exploring Agent Procedural Memory
(2025)
• No Venue
Fang et al.
-
Creation-mmbench: Assessing Context-aware Creative Intelligence In MLLM
(2025)
• No Venue
Fang et al.
-
A Multi-modal AI Copilot For Single-cell Analysis With Instruction Following
(2025)
• No Venue
Fang et al.
-
Cognitive Kernel-pro: A Framework For Deep Research Agents And Agent Foundation Models Training
(2025)
• No Venue
Fang et al.
-
Llama-omni2: Llm-based Real-time Spoken Chatbot With Autoregressive Streaming Speech Synthesis
(2025)
• No Venue
Fang et al.
-
Dualvla: Building A Generalizable Embodied Agent Via Partial Decoupling Of Reasoning And Action
(2025)
• No Venue
Fang et al.
-
Flux-reason-6m & Prism-bench: A Million-scale Text-to-image Reasoning Dataset And Comprehensive Benchmark
(2025)
• No Venue
Fang et al.
-
Meshllm: Empowering Large Language Models To Progressively Understand And Generate 3D Mesh
(2025)
• No Venue
Fang et al.
-
Nerf Is A Valuable Assistant For 3D Gaussian Splatting
(2025)
• No Venue
Fang et al.
-
Libero-plus: In-depth Robustness Analysis Of Vision-language-action Models
(2025)
• No Venue
Fei et al.
-
On Path To Multimodal Generalist: General-level And General-bench
(2025)
• No Venue
Fei et al.
-
SRPO: Self-referential Policy Optimization For Vision-language-action Models
(2025)
• No Venue
Fei et al.
-
Mathreal: We Keep It Real! A Real Scene Benchmark For Evaluating Math Reasoning In Multimodal Large Language Models
(2025)
• No Venue
Feng et al.
-
Can Mllms Guide Me Home? A Benchmark Study On Fine-grained Visual Reasoning From Transit Maps
(2025)
• No Venue
Feng et al.
-
PHYSICS: Benchmarking Foundation Models On University-level Physics Problem Solving
(2025)
• No Venue
Feng et al.
-
What Characterizes Effective Reasoning? Revisiting Length, Review, And Structure Of Cot
(2025)
• No Venue
Feng et al.
-
Voxlect: A Speech Foundation Model Benchmark For Modeling Dialects And Regional Languages Around The Globe
(2025)
• No Venue
Feng et al.
-
Multiple Choice Questions: Reasoning Makes Large Language Models (llms) More Self-confident Even When They Are Wrong
(2025)
• No Venue
Fu et al.
-
Square: Sequential Question Answering Reasoning Engine For Enhanced Chain-of-thought In Large Language Models
(2025)
• No Venue
Fleischer et al.
-
Livevqa: Live Visual Knowledge Seeking
(2025)
• No Venue
Fu et al.
-
Listener-rewarded Thinking In Vlms For Image Preferences
(2025)
• No Venue
Gambashidze et al.
-
Sliderspace: Decomposing The Visual Capabilities Of Diffusion Models
(2025)
• No Venue
Gandikota et al.
-
Audio Flamingo 2: An Audio-language Model With Long-audio Understanding And Expert Reasoning Abilities
(2025)
• No Venue
Ghosh et al.
-
Arc-hunyuan-video-7b: Structured Video Comprehension Of Real-world Shorts
(2025)
• No Venue
Ge et al.
-
Agentscope 1.0: A Developer-centric Framework For Building Agentic Applications
(2025)
• No Venue
Gao et al.
-
A Survey Of Self-evolving Agents: On Path To Artificial Super Intelligence
(2025)
• No Venue
Gao et al.
-
Exploring Hallucination Of Large Multimodal Models In Video Understanding: Benchmark, Analysis And Mitigation
(2025)
• No Venue
Gao et al.
-
D-AR: Diffusion Via Autoregressive Models
(2025)
• No Venue
Ziteng Gao, Mike Zheng Shou
-
Do Vision-language Models Have Internal World Models? Towards An Atomic Evaluation
(2025)
• No Venue
Gao et al.
-
Tokenverse: Versatile Multi-concept Personalization In Token Modulation Space
(2025)
• No Venue
Garibi et al.
-
Txagent: An AI Agent For Therapeutic Reasoning Across A Universe Of Tools
(2025)
• No Venue
Gao et al.
-
The Aloe Family Recipe For Open And Specialized Healthcare Llms
(2025)
• No Venue
Garcia-Gasulla et al.
-
Visualoverload: Probing Visual Understanding Of Vlms In Really Dense Scenes
(2025)
• No Venue
Gavrikov et al.
-
Centurio: On Drivers Of Multilingual Ability Of Large Vision-language Model
(2025)
• No Venue
Geigle et al.
-
Inside-out: Hidden Factual Knowledge In Llms
(2025)
• No Venue
Gekhman et al.
-
You Do Not Fully Utilize Transformer's Representation Capacity
(2025)
• No Venue
Gerasimov et al.
-
Inverse Scaling In Test-time Compute
(2025)
• No Venue
Gema et al.
-
Drugreasoner: Interpretable Drug Approval Prediction With A Reasoning-augmented Language Model
(2025)
• No Venue
Ghaffarzadeh-Esfahani et al.
-
Can Llms Predict Their Own Failures? Self-awareness Via Internal Circuits
(2025)
• No Venue
Amirhosein Ghasemabadi, Di Niu
-
Guided By Gut: Efficient Test-time Scaling With Reinforced Intrinsic Confidence
(2025)
• No Venue
Ghasemabadi et al.
-
Oagents: An Empirical Study Of Building Effective Agents
(2025)
• No Venue
Zhu et al.
-
Streamdit: Real-time Streaming Text-to-video Generation
(2025)
• No Venue
Kodaira et al.
-
Should We Still Pretrain Encoders With Masked Language Modeling?
(2025)
• No Venue
Gisserot-Boukhlef et al.
-
Gaperon: A Peppered English-french Generative Language Model Suite
(2025)
• No Venue
Godey et al.
-
Great Models Think Alike And This Undermines AI Oversight
(2025)
• No Venue
Goel et al.
-
Training AI Co-scientists Using Rubric Rewards
(2025)
• No Venue
Goel et al.
-
Finerweb: Datasets And Artifacts For Scalable Multilingual Named Entity Recognition
(2025)
• No Venue
Jonas Golde, Patrick Haller, Alan Akbik
-
Multi-token Attention
(2025)
• No Venue
Golovneva et al.
-
Onereward: Unified Mask-guided Image Generation Via Multi-task Human Preference Learning
(2025)
• No Venue
Gong et al.
-
VLA-0: Building State-of-the-art Vlas With Zero Modification
(2025)
• No Venue
Goyal et al.
-
The Differences Between Direct Alignment Algorithms Are A Blur
(2025)
• No Venue
Gorbatovski et al.
-
Mind2web 2: Evaluating Agentic Search With Agent-as-a-judge
(2025)
• No Venue
Gou et al.
-
Self-steering Language Models
(2025)
• No Venue
Grand et al.
-
Agentsnet: Coordination And Collaborative Reasoning In Multi-agent Llms
(2025)
• No Venue
Grötschla et al.
-
ACADREASON: Exploring The Limits Of Reasoning Models With Academic Research Problems
(2025)
• No Venue
Gui et al.
-
ETVA: Evaluation Of Text-to-video Alignment Via Fine-grained Question Generation And Answering
(2025)
• No Venue
Guan et al.
-
Textarena
(2025)
• No Venue
Guertler et al.
-
Taming Text-to-sounding Video Generation Via Advanced Modality Condition And Interaction
(2025)
• No Venue
Guan et al.
-
Latcoder: Converting Webpage Design To Code With Layout-as-thought
(2025)
• No Venue
Gui et al.
-
Mineworld: A Real-time And Open-source Interactive World Model On Minecraft
(2025)
• No Venue
Guo et al.
-
Mcp-agentbench: Evaluating Real-world Language Agent Performance With Mcp-mediated Tools
(2025)
• No Venue
Guo et al.
-
IMG: Calibrating Diffusion Models Via Implicit Multimodal Guidance
(2025)
• No Venue
Guo et al.
-
Exploring The Vulnerabilities Of Federated Learning: A Deep Dive Into Gradient Inversion Attacks
(2025)
• No Venue
Guo et al.
-
Swe-factory: Your Automated Factory For Issue Resolution Training Data And Evaluation Benchmarks
(2025)
• No Venue
Guo et al.
-
Rbench-v: A Primary Assessment For Visual Reasoning Models With Multi-modal Outputs
(2025)
• No Venue
Guo et al.
-
Seed1.5-vl Technical Report
(2025)
• No Venue
Guo et al.
-
Temporal Consistency For LLM Reasoning Process Error Identification
(2025)
• No Venue
Guo et al.
-
Generating An Image From 1,000 Words: Enhancing Text-to-image With Structured Captions
(2025)
• No Venue
Gutflaish et al.
-
Fasta^*: Fast-slow Toolpath Agent With Subroutine Mining For Efficient Multi-turn Image Editing
(2025)
• No Venue
Gupta et al.
-
Web-cogreasoner: Towards Knowledge-induced Cognitive Reasoning For Web Agents
(2025)
• No Venue
Guo et al.
-
Llm-guided Hierarchical Retrieval
(2025)
• No Venue
Gupta et al.
-
Enhancing Automated Interpretability With Output-centric Feature Descriptions
(2025)
• No Venue
Gur-Arieh et al.
-
Beyond The Last Answer: Your Reasoning Trace Uncovers More Than You Think
(2025)
• No Venue
Hasan Abed Al Kader Hammoud, Hani Itani, Bernard Ghanem
-
Beyond Recognition: Evaluating Visual Perspective Taking In Vision Language Models
(2025)
• No Venue
Góral et al.
-
Simpleqa Verified: A Reliable Factuality Benchmark To Measure Parametric Knowledge
(2025)
• No Venue
Haas et al.
-
Decoding Open-ended Information Seeking Goals From Eye Movements In Reading
(2025)
• No Venue
Hadar et al.
-
Hala Technical Report: Building Arabic-centric Instruction & Translation Models At Scale
(2025)
• No Venue
Hasan Abed Al Kader Hammoud, Mohammad Zbeeb, Bernard Ghanem
-
Multiagentbench: Evaluating The Collaboration And Competition Of LLM Agents
(2025)
• No Venue
Zhu et al.
-
Mutarjim: Advancing Bidirectional Arabic-english Translation With A Small Language Model
(2025)
• No Venue
Hennara et al.
-
Cut2next: Generating Next Shot Via In-context Tuning
(2025)
• No Venue
He et al.
-
Mind The Generation Process: Fine-grained Confidence Estimation During LLM Generation
(2025)
• No Venue
Han et al.
-
Unireditbench: A Unified Reasoning-based Image Editing Benchmark
(2025)
• No Venue
Han et al.
-
Trillion 7B Technical Report
(2025)
• No Venue
Han et al.
-
Don't Overthink It. Preferring Shorter Thinking Chains For Improved LLM Reasoning
(2025)
• No Venue
Hassid et al.
-
Healthy Llms? Benchmarking LLM Knowledge Of UK Government Public Health Information
(2025)
• No Venue
Harris et al.
-
RLP: Reinforcement As A Pretraining Objective
(2025)
• No Venue
Hatamizadeh et al.
-
Chronoplay: A Framework For Modeling Dual Dynamics And Authenticity In Game RAG Benchmarks
(2025)
• No Venue
He et al.
-
Can Large Language Models Detect Errors In Long Chain-of-thought Reasoning?
(2025)
• No Venue
He et al.
-
Deepmath-103k: A Large-scale, Challenging, Decontaminated, And Verifiable Mathematical Dataset For Advancing Reasoning
(2025)
• No Venue
He et al.
-
Hardtests: Synthesizing High-quality Test Cases For LLM Coding
(2025)
• No Venue
He et al.
-
Ruler-bench: Probing Rule-based Reasoning Abilities Of Next-level Video Generation Models For Vision Foundation Intelligence
(2025)
• No Venue
He et al.
-
Random Policy Valuation Is Enough For LLM Reasoning With Verifiable Rewards
(2025)
• No Venue
He et al.
-
Protoreasoning: Prototypes As The Foundation For Generalizable Reasoning In Llms
(2025)
• No Venue
He et al.
-
Videoscore2: Think Before You Score In Generative Video Evaluation
(2025)
• No Venue
He et al.
-
Swe-perf: Can Language Models Optimize Code Performance On Real-world Repositories?
(2025)
• No Venue
He et al.
-
Baseer: A Vision-language Model For Arabic Document-to-markdown OCR
(2025)
• No Venue
Hennara et al.
-
CASS: Nvidia To AMD Transpilation With Data, Models, And Benchmark
(2025)
• No Venue
Heakl et al.
-
Vitabench: Benchmarking LLM Agents With Versatile Interactive Tasks In Real-world Applications
(2025)
• No Venue
He et al.
-
AIN: The Arabic Inclusive Large Multimodal Model
(2025)
• No Venue
Heakl et al.
-
Dr.llm: Dynamic Layer Routing In Llms
(2025)
• No Venue
Heakl et al.
-
A Definition Of AGI
(2025)
• No Venue
Hendrycks et al.
-
Deepeyesv2: Toward Agentic Multimodal Model
(2025)
• No Venue
Hong et al.
-
Olmoearth: Stable Latent Image Modeling For Multimodal Earth Observation
(2025)
• No Venue
Herzog et al.
-
Apertus: Democratizing Open And Compliant Llms For Global Language Environments
(2025)
• No Venue
Hernández-Cano et al.
-
Omni-rgpt: Unifying Image And Video Region-level Understanding Via Token Marks
(2025)
• No Venue
Heo et al.
-
A Sober Look At Progress In Language Model Reasoning: Pitfalls And Paths To Reproducibility
(2025)
• No Venue
Hochlehnert et al.
-
Beyond One-size-fits-all: Inversion Learning For Highly Effective NLG Evaluation Prompts
(2025)
• No Venue
Hong et al.
-
Xolver: Multi-agent Reasoning With Holistic Experience Learning Just Like An Olympiad Team
(2025)
• No Venue
Hosain et al.
-
Musicinfuser: Making Video Diffusion Listen And Dance
(2025)
• No Venue
Hong et al.
-
Motionbench: Benchmarking And Improving Fine-grained Video Motion Understanding For Vision Language Models
(2025)
• No Venue
Hong et al.
-
From Kmmlu-redux To Kmmlu-pro: A Professional Korean Benchmark Suite For LLM Evaluation
(2025)
• No Venue
Hong et al.
-
Glm-4.1v-thinking: Towards Versatile Multimodal Reasoning With Scalable Reinforcement Learning
(2025)
• No Venue
Hong et al.
-
Dita: Scaling Diffusion Transformer For Generalist Vision-language-action Policy
(2025)
• No Venue
Hou et al.
-
A Survey Of Scientific Large Language Models: From Data Foundations To Agent Frontiers
(2025)
• No Venue
Hu et al.
-
Group Think: Multiple Concurrent Reasoning Agents Collaborating At Token Level Granularity
(2025)
• No Venue
Hsu et al.
-
Is Extending Modality The Right Path Towards Omni-modality?
(2025)
• No Venue
Zhu et al.
-
Ultragen: High-resolution Video Generation With Hierarchical Attention
(2025)
• No Venue
Hu et al.
-
Memory In The Age Of AI Agents
(2025)
• No Venue
Hu et al.
-
Groundingsuite: Measuring Complex Multi-granular Pixel Grounding
(2025)
• No Venue
Hu et al.
-
Evaluating Memory In LLM Agents Via Incremental Multi-turn Interactions
(2025)
• No Venue
Yuanzhe Hu, Yu Wang, Julian McAuley
-
Finsearchcomp: Towards A Realistic, Expert-level Evaluation Of Financial Search And Reasoning
(2025)
• No Venue
Hu et al.
-
MCTS-RAG: Enhancing Retrieval-augmented Generation With Monte Carlo Tree Search
(2025)
• No Venue
Hu et al.
-
Image Editing As Programs With Diffusion Models
(2025)
• No Venue
Hu et al.
-
Lmgame-bench: How Good Are Llms At Playing Games?
(2025)
• No Venue
Hu et al.
-
Text2world: Benchmarking Large Language Models For Symbolic World Model Generation
(2025)
• No Venue
Hu et al.
-
Multimodal Rewardbench 2: Evaluating Omni Reward Models For Interleaved Text And Image
(2025)
• No Venue
Hu et al.
-
REINFORCE++: A Simple And Efficient Approach For Aligning Large Language Models
(2025)
• No Venue
Jian Hu
-
Step-deepresearch Technical Report
(2025)
• No Venue
Hu et al.
-
See, Point, Fly: A Learning-free VLM Framework For Universal Unmanned Aerial Navigation
(2025)
• No Venue
Hu et al.
-
Internvl3: Exploring Advanced Training And Test-time Recipes For Open-source Multimodal Models
(2025)
• No Venue
Zhu et al.
-
Attentioninfluence: Adopting Attention Head Influence For Weak-to-strong Pretraining Data Selection
(2025)
• No Venue
Hua et al.
-
Video-mmmu: Evaluating Knowledge Acquisition From Multi-discipline Professional Videos
(2025)
• No Venue
Hu et al.
-
Vision-language-action Models For Autonomous Driving: Past, Present, And Future
(2025)
• No Venue
Hu et al.
-
Benchmax: A Comprehensive Multilingual Evaluation Suite For Large Language Models
(2025)
• No Venue
Huang et al.
-
Building A Foundational Guardrail For General Agentic Systems Via Synthetic Data
(2025)
• No Venue
Huang et al.
-
Conceptmaster: Multi-concept Video Customization On Diffusion Transformer Models Without Test-time Tuning
(2025)
• No Venue
Huang et al.
-
CRITICTOOL: Evaluating Self-critique Capabilities Of Large Language Models In Tool-calling Error Scenarios
(2025)
• No Venue
Huang et al.
-
How Much 3D Do Video Foundation Models Encode?
(2025)
• No Venue
Huang et al.
-
ILLUME+: Illuminating Unified MLLM With Dual Visual Tokenization And Diffusion Refinement
(2025)
• No Venue
Huang et al.
-
M1: Unleash The Potential Of Test-time Scaling For Medical Reasoning With Large Language Models
(2025)
• No Venue
Huang et al.
-
O1 Replication Journey -- Part 3: Inference-time Scaling For Medical Reasoning
(2025)
• No Venue
Huang et al.
-
On The Trustworthiness Of Generative Foundation Models: Guideline, Assessment, And Perspective
(2025)
• No Venue
Huang et al.
-
Ultramemv2: Memory Networks Scaling To 120B Parameters With Superior Long-context Learning
(2025)
• No Venue
Huang et al.
-
Reinforced Internal-external Knowledge Synergistic Reasoning For Efficient Adaptive Search Agent
(2025)
• No Venue
Huang et al.
-
Active-o3: Empowering Multimodal Large Language Models With Active Perception Via GRPO
(2025)
• No Venue
Zhu et al.
-
INTIMA: A Benchmark For Human-ai Companionship Behavior
(2025)
• No Venue
Lucie-Aimée Kaffee, Giada Pistilli, Yacine Jernite
-
BIRD-INTERACT: Re-imagining Text-to-sql Evaluation For Large Language Models Via Lens Of Dynamic Interactions
(2025)
• No Venue
Huo et al.
-
Collm: A Large Language Model For Composed Image Retrieval
(2025)
• No Venue
Huynh et al.
-
Multi-granular Spatio-temporal Token Merging For Training-free Acceleration Of Video Llms
(2025)
• No Venue
Hyun et al.
-
Lego-eval: Towards Fine-grained Evaluation On Synthesizing 3D Embodied Environments With Tool Augmentation
(2025)
• No Venue
Hwangbo et al.
-
Ambik: Dataset Of Ambiguous Tasks In Kitchen Environment
(2025)
• No Venue
Ivanova et al.
-
Is Safety Standard Same For Everyone? User-specific Safety Evaluation Of Large Language Models
(2025)
• No Venue
In et al.
-
The African Languages Lab: A Collaborative Approach To Advancing Low-resource African NLP
(2025)
• No Venue
Issaka et al.
-
Sentinel: SOTA Model To Protect Against Prompt Injections
(2025)
• No Venue
Dror Ivry, Oran Nahum
-
Large-scale Data Selection For Instruction Tuning
(2025)
• No Venue
Ivison et al.
-
Silent Branding Attack: Trigger-free Data Poisoning Attack On Text-to-image Diffusion Models
(2025)
• No Venue
Jang et al.
-
Reasoning Model Is Stubborn: Diagnosing Instruction Overriding In Reasoning Models
(2025)
• No Venue
Jang et al.
-
Adaptive Multi-agent Response Refinement In Conversational Systems
(2025)
• No Venue
Jeong et al.
-
G-FOCUS: Towards A Robust Method For Assessing UI Design Persuasiveness
(2025)
• No Venue
Jeon et al.
-
Realharm: A Collection Of Real-world Language Model Application Failures
(2025)
• No Venue
Jeune et al.
-
Wavreward: Spoken Dialogue Models With Generalist Reward Evaluators
(2025)
• No Venue
Ji et al.
-
Omnispatial: Towards Comprehensive Spatial Reasoning Benchmark For Vision Language Models
(2025)
• No Venue
Jia et al.
-
Omnisafebench-mm: A Unified Benchmark And Toolbox For Multimodal Jailbreak Attack-defense Evaluation
(2025)
• No Venue
Jia et al.
-
CSVQA: A Chinese Multimodal Benchmark For Evaluating STEM Reasoning Capabilities Of Vlms
(2025)
• No Venue
Jian et al.
-
Dianjin-r1: Evaluating And Enhancing Financial Reasoning In Large Language Models
(2025)
• No Venue
Zhu et al.
-
Feedback Friction: Llms Struggle To Fully Incorporate External Feedback
(2025)
• No Venue
Jiang et al.
-
Are Today's Llms Ready To Explain Well-being Concepts?
(2025)
• No Venue
Jiang et al.
-
Generalist Foundation Models Are Not Clinical Enough For Hospital Operations
(2025)
• No Venue
Jiang et al.
-
Omnihuman-1.5: Instilling An Active Mind In Avatars Via Cognitive Simulation
(2025)
• No Venue
Jiang et al.
-
Mme-cot: Benchmarking Chain-of-thought In Large Multimodal Models For Reasoning Quality, Robustness, And Efficiency
(2025)
• No Venue
Jiang et al.
-
Multimodal Llms Can Reason About Aesthetics In Zero-shot
(2025)
• No Venue
Ruixiang Jiang, Changwen Chen
-
S2s-arena, Evaluating Speech2speech Protocols On Instruction Following With Paralinguistic Information
(2025)
• No Venue
Jiang et al.
-
Robust And Fine-grained Detection Of AI Generated Texts
(2025)
• No Venue
Kadiyala et al.
-
Verltool: Towards Holistic Agentic Reinforcement Learning With Tool Use
(2025)
• No Venue
Jiang et al.
-
Token-efficient Long Video Understanding For Multimodal Llms
(2025)
• No Venue
Jiang et al.
-
When Benchmarks Age: Temporal Misalignment Through Large Language Model Factuality Evaluation
(2025)
• No Venue
Jiang et al.
-
Omni-reward: Towards Generalist Omni-modal Reward Modeling With Free-form Preferences
(2025)
• No Venue
Jin et al.
-
Loopholing Discrete Diffusion: Deterministic Bypass Of The Sampling Wall
(2025)
• No Venue
Jo et al.
-
Continuous Diffusion Model For Language Modeling
(2025)
• No Venue
Jaehyeong Jo, Sung Ju Hwang
-
Mixture Of Horizons In Action Chunking
(2025)
• No Venue
Jing et al.
-
Is That Your Final Answer? Test-time Scaling Improves Selective Question Answering
(2025)
• No Venue
William Jurayj, Jeffrey Cheng, Benjamin van Durme
-
Flex-judge: Think Once, Judge Anywhere
(2025)
• No Venue
Ko et al.
-
Liteasr: Efficient Automatic Speech Recognition With Low-rank Approximation
(2025)
• No Venue
Kamahori et al.
-
Why Language Models Hallucinate
(2025)
• No Venue
Kalai et al.
-
Expect The Unexpected: Failsafe Long Context QA For Finance
(2025)
• No Venue
Kamble et al.
-
LM2: Large Memory Models
(2025)
• No Venue
Kang et al.
-
Pippo: High-resolution Multi-view Humans From A Single Image
(2025)
• No Venue
Kant et al.
-
UNCAGE: Contrastive Attention Guidance For Masked Generative Transformers In Text-to-image Generation
(2025)
• No Venue
Kang et al.
-
Parallelbench: Understanding The Trade-offs Of Parallel Decoding In Diffusion Llms
(2025)
• No Venue
Kang et al.
-
Pcoreset: Effective Active Learning Through Knowledge Distillation From Vision-language Models
(2025)
• No Venue
Kang et al.
-
T1: Tool-integrated Self-verification For Test-time Compute Scaling In Small Language Models
(2025)
• No Venue
Minki Kang, Jongwon Jeong, Jaewoong Cho
-
Éclair -- Extracting Content And Layout With Integrated Reading Order For Documents
(2025)
• No Venue
Karmanov et al.
-
The Heap: A Contamination-free Multilingual Code Dataset For Evaluating Large Language Models
(2025)
• No Venue
Katzy et al.
-
Llama-3.1-foundationai-securityllm-base-8b Technical Report
(2025)
• No Venue
Kassianik et al.
-
Executable Functional Abstractions: Inferring Generative Programs For Advanced Math Problems
(2025)
• No Venue
Khan et al.
-
Big-bench Extra Hard
(2025)
• No Venue
Kazemi et al.
-
Demystifying Domain-adaptive Post-training For Financial Llms
(2025)
• No Venue
Ke et al.
-
Process Reward Models That Think
(2025)
• No Venue
Khalifa et al.
-
LINGOLY-TOO: Disentangling Memorisation From Reasoning With Linguistic Templatisation And Orthographic Obfuscation
(2025)
• No Venue
Khouja et al.
-
Gigaevo: An Open Source Optimization Framework Powered By Llms And Evolution Algorithms
(2025)
• No Venue
Khrulkov et al.
-
Hypencoder: Hypernetworks For Information Retrieval
(2025)
• No Venue
Julian Killingback, Hansi Zeng, Hamed Zamani
-
Kvzip: Query-agnostic KV Cache Compression With Context Reconstruction
(2025)
• No Venue
Kim et al.
-
On-device Sora: Enabling Diffusion-based Text-to-video Generation For Mobile Devices
(2025)
• No Venue
Kim et al.
-
Mol-llama: Towards General Understanding Of Molecules In Large Molecular Language Model
(2025)
• No Venue
Dongki Kim, Wonbin Lee, Sung Ju Hwang
-
MMPB: It's Time For Multi-modal Personalization
(2025)
• No Venue
Kim et al.
-
Music Arena: Live Evaluation For Text-to-music
(2025)
• No Venue
Kim et al.
-
PLADIS: Pushing The Limits Of Attention In Diffusion Models At Inference Time By Leveraging Sparsity
(2025)
• No Venue
Kwanyoung Kim, Byeongsu Sim
-
World In A Frame: Understanding Culture Mixing As A New Challenge For Vision-language Models
(2025)
• No Venue
Kim et al.
-
Dreamrenderer: Taming Multi-instance Attribute Control In Large-scale Text-to-image Models
(2025)
• No Venue
Zhou et al.
-
Voxtral
(2025)
• No Venue
Liu et al.
-
Towards Safety Reasoning In Llms: Ai-agentic Deliberation For Policy-embedded Cot Data Creation
(2025)
• No Venue
Kumarage et al.
-
Exp-bench: Can AI Conduct AI Research Experiments?
(2025)
• No Venue
Kon et al.
-
SDPO: Segment-level Direct Preference Optimization For Social Agents
(2025)
• No Venue
Kong et al.
-
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
(2025)
• No Venue
Kordi et al.
-
Benchmarking AI Models In Software Engineering: A Review, Search Tool, And Enhancement Protocol
(2025)
• No Venue
Roham Koohestani, Philippe de Bekker, Maliheh Izadi
-
Ortsae: Orthogonal Sparse Autoencoders Uncover Atomic Features
(2025)
• No Venue
Korznikov et al.
-
Jina-vlm: Small Multilingual Vision Language Model
(2025)
• No Venue
Koukounas et al.
-
Lettucedetect: A Hallucination Detection Framework For RAG Applications
(2025)
• No Venue
Ádám Kovács, Gábor Recski
-
From Scores To Skills: A Cognitive Diagnosis Framework For Evaluating Financial Large Language Models
(2025)
• No Venue
Kuang et al.
-
Theoremexplainagent: Towards Multimodal Explanations For LLM Theorem Understanding
(2025)
• No Venue
Ku et al.
-
Nohumansrequired: Autonomous High-quality Image Editing Triplet Mining
(2025)
• No Venue
Kuprashevich et al.
-
Measuring AI Ability To Complete Long Tasks
(2025)
• No Venue
Kwa et al.
-
Patientsim: A Persona-driven Simulator For Realistic Doctor-patient Interactions
(2025)
• No Venue
Kyung et al.
-
Embodied Agents Meet Personalization: Exploring Memory Utilization For Personalized Assistance
(2025)
• No Venue
Kwon et al.
-
Mappo: Maximum A Posteriori Preference Optimization With Prior Knowledge
(2025)
• No Venue
Lan et al.
-
Evolving Deeper LLM Thinking
(2025)
• No Venue
Lee et al.
-
CLASH: Evaluating Language Models On Judging High-stakes Dilemmas From Multiple Perspectives
(2025)
• No Venue
Lee et al.
-
Infinitehip: Extending Language Model Context Up To 3 Million Tokens On A Single GPU
(2025)
• No Venue
Lee et al.
-
Spacer: Towards Engineered Scientific Inspiration
(2025)
• No Venue
Lee et al.
-
Rethinking Reward Models For Multi-domain Test-time Scaling
(2025)
• No Venue
Lee et al.
-
Refinebench: Evaluating Refinement Capability Of Language Models Via Checklists
(2025)
• No Venue
Lee et al.
-
Saferoute: Adaptive Model Selection For Efficient And Accurate Safety Guardrails In Large Language Models
(2025)
• No Venue
Lee et al.
-
Dacomp: Benchmarking Data Agents Across The Full Data Intelligence Lifecycle
(2025)
• No Venue
Lei et al.
-
Offtopiceval: When Large Language Models Enter The Wrong Chat, Almost Always!
(2025)
• No Venue
Lei et al.
-
IMAGINE-E: Image Generation Intelligence Evaluation Of State-of-the-art Text-to-image Models
(2025)
• No Venue
Lei et al.
-
Infantagent-next: A Multimodal Generalist Agent For Automated Computer Interaction
(2025)
• No Venue
Lei et al.
-
Intellagent: A Multi-agent Framework For Evaluating Conversational AI Systems
(2025)
• No Venue
Elad Levi, Ilan Kadar
-
Bindweave: Subject-consistent Video Generation Via Cross-modal Integration
(2025)
• No Venue
Li et al.
-
Audiotrust: Benchmarking The Multifaceted Trustworthiness Of Audio Large Language Models
(2025)
• No Venue
Li et al.
-
Advances In Speech Separation: Techniques, Challenges, And Future Trends
(2025)
• No Venue
Li et al.
-
Autotriton: Automatic Triton Programming With Reinforcement Learning In Llms
(2025)
• No Venue
Li et al.
-
Cipherbank: Exploring The Boundary Of LLM Reasoning Capabilities Through Cryptography Challenges
(2025)
• No Venue
Li et al.
-
Chain-of-agents: End-to-end Agent Foundation Models Via Multi-agent Distillation And Agentic RL
(2025)
• No Venue
Li et al.
-
Visual-cog: Stage-aware Reinforcement Learning With Chain Of Guidance For Text-to-image Generation
(2025)
• No Venue
Li et al.
-
Sculptor: Empowering Llms With Cognitive Agency Via Active Context Management
(2025)
• No Venue
Li et al.
-
Miromind-m1: An Open-source Advancement In Mathematical Reasoning Via Context-aware Multi-stage Policy Optimization
(2025)
• No Venue
Li et al.
-
Drafterbench: Benchmarking Large Language Models For Tasks Automation In Civil Engineering
(2025)
• No Venue
Yinsheng Li, Zhen Dong, Yi Shao
-
CRINN: Contrastive Reinforcement Learning For Approximate Nearest Neighbor Search
(2025)
• No Venue
Li et al.
-
Diffusion Language Models Know The Answer Before Decoding
(2025)
• No Venue
Li et al.
-
Deepcode: Open Agentic Coding
(2025)
• No Venue
Li et al.
-
Fea-bench: A Benchmark For Evaluating Repository-level Code Generation For Feature Implementation
(2025)
• No Venue
Li et al.
-
Dualthor: A Dual-arm Humanoid Simulation Platform For Contingency-aware Planning
(2025)
• No Venue
Li et al.
-
How Instruction And Reasoning Data Shape Post-training: Data Quality Through The Lens Of Layer-wise Gradients
(2025)
• No Venue
Li et al.
-
Groundingme: Exposing The Visual Grounding Gap In Mllms Through Multi-dimensional Evaluation
(2025)
• No Venue
Li et al.
-
Gir-bench: Versatile Benchmark For Generating Images With Reasoning
(2025)
• No Venue
Li et al.
-
GRAN-TED: Generating Robust, Aligned, And Nuanced Text Embedding For Diffusion Models
(2025)
• No Venue
Li et al.
-
Have We Unified Image Generation And Understanding Yet? An Empirical Study Of Gpt-4o's Image Generation Ability
(2025)
• No Venue
Ning Li, Jingran Zhang, Justin Cui
-
If-vidcap: Can Video Caption Models Follow Instructions?
(2025)
• No Venue
Li et al.
-
IA-T2I: Internet-augmented Text-to-image Generation
(2025)
• No Venue
Li et al.
-
Migician: Revealing The Magic Of Free-form Multi-image Grounding In Multimodal Large Language Models
(2025)
• No Venue
Li et al.
-
Memory-efficient Visual Autoregressive Modeling With Scale-aware KV Cache Compression
(2025)
• No Venue
Li et al.
-
MANZANO: A Simple And Scalable Unified Multimodal Model With A Hybrid Vision Tokenizer
(2025)
• No Venue
Li et al.
-
Mm-browsecomp: A Comprehensive Benchmark For Multimodal Browsing Agents
(2025)
• No Venue
Li et al.
-
MITS: Enhanced Tree Search Reasoning For Llms Via Pointwise Mutual Information
(2025)
• No Venue
Li et al.
-
Omnivideobench: Towards Audio-visual Understanding Evaluation For Omni Mllms
(2025)
• No Venue
Li et al.
-
Preference Leakage: A Contamination Problem In Llm-as-a-judge
(2025)
• No Venue
Li et al.
-
Ovo-bench: How Far Is Your Video-llms From Real-world Online Video Understanding?
(2025)
• No Venue
Li et al.
-
Reportbench: Evaluating Deep Research Agents Via Academic Survey Tasks
(2025)
• No Venue
Li et al.
-
Reinforcement Learning On Pre-training Data
(2025)
• No Venue
Li et al.
-
Routing Manifold Alignment Improves Generalization Of Mixture-of-experts Llms
(2025)
• No Venue
Zhongyang Li, Ziyue Li, Tianyi Zhou
-
Viewspatial-bench: Evaluating Multi-perspective Spatial Localization In Vision-language Models
(2025)
• No Venue
Li et al.
-
Tir-bench: A Comprehensive Benchmark For Agentic Thinking-with-images Reasoning
(2025)
• No Venue
Li et al.
-
STAR-R1: Spatial Transformation Reasoning By Reinforcing Multimodal Llms
(2025)
• No Venue
Li et al.
-
Semantically-aware Rewards For Open-ended R1 Training In Free-form Generation
(2025)
• No Venue
Li et al.
-
Slimmoe: Structured Compression Of Large Moe Models Via Expert Slimming And Distillation
(2025)
• No Venue
Li et al.
-
Skyra: Ai-generated Video Detection Via Grounded Artifact Reasoning
(2025)
• No Venue
Li et al.
-
Sos1: O1 And R1-like Reasoning Llms Are Sum-of-square Solvers
(2025)
• No Venue
Li et al.
-
Speculative Ad-hoc Querying
(2025)
• No Venue
Li et al.
-
Test-time Preference Optimization: On-the-fly Alignment Via Iterative Textual Feedback
(2025)
• No Venue
Li et al.
-
Structflowbench: A Structured Flow Benchmark For Multi-turn Instruction Following
(2025)
• No Venue
Li et al.
-
Swarmsys: Decentralized Swarm-inspired Agents For Scalable And Adaptive Reasoning
(2025)
• No Venue
Li et al.
-
Tempsamp-r1: Effective Temporal Sampling With Reinforcement Fine-tuning For Video Llms
(2025)
• No Venue
Li et al.
-
Uni-moe-2.0-omni: Scaling Language-centric Omnimodal Large Model With Advanced Moe, Training And Data
(2025)
• No Venue
Li et al.
-
Towards Visual Text Grounding Of Multimodal Large Language Model
(2025)
• No Venue
Li et al.
-
Unfolding Spatial Cognition: Evaluating Multimodal Models On Visual Simulations
(2025)
• No Venue
Li et al.
-
Reasoning Like An Economist: Post-training On Economic Problems Induces Strategic Generalization In Llms
(2025)
• No Venue
Zhou et al.
-
μ^2tokenizer: Differentiable Multi-scale Multi-modal Tokenizer For Radiology Report Generation
(2025)
• No Venue
Li et al.
-
Zebra-cot: A Dataset For Interleaved Vision Language Reasoning
(2025)
• No Venue
Li et al.
-
Who's Your Judge? On The Detectability Of Llm-generated Judgments
(2025)
• No Venue
Li et al.
-
A.S.E: A Repository-level Benchmark For Evaluating Security In Ai-generated Code
(2025)
• No Venue
Lian et al.
-
AI Meets Brain: Memory Systems From Cognitive Neuroscience To Autonomous Agents
(2025)
• No Venue
Liang et al.
-
Discrete Diffusion VLA: Bringing Discrete Diffusion To Action Decoding In Vision-language-action Policies
(2025)
• No Venue
Liang et al.
-
Colorbench: Can Vlms See And Understand The Colorful World? A Comprehensive Benchmark For Color Perception, Reasoning, And Robustness
(2025)
• No Venue
Liang et al.
-
CLUE: Non-parametric Verification From Experience Via Hidden-state Clustering
(2025)
• No Venue
Liang et al.
-
Machine Bullshit: Characterizing The Emergent Disregard For Truth In Large Language Models
(2025)
• No Venue
Liang et al.
-
Surveyx: Academic Survey Automation Via Large Language Models
(2025)
• No Venue
Liang et al.
-
Saferag: Benchmarking Security In Retrieval-augmented Generation Of Large Language Model
(2025)
• No Venue
Liang et al.
-
RLHS: Mitigating Misalignment In RLHF With Hindsight Simulation
(2025)
• No Venue
Liang et al.
-
ROVER: Benchmarking Reciprocal Cross-modal Reasoning For Omnimodal Generation
(2025)
• No Venue
Liang et al.
-
Are We Using The Right Benchmark: An Evaluation Framework For Visual Token Compression Methods
(2025)
• No Venue
Liao et al.
-
Towards Personalized Deep Research: Benchmarks And Evaluations
(2025)
• No Venue
Liang et al.
-
Reward-guided Speculative Decoding For Efficient LLM Reasoning
(2025)
• No Venue
Liao et al.
-
Uniworld: High-resolution Semantic Encoders For Unified Visual Understanding And Generation
(2025)
• No Venue
Lin et al.
-
Computer-use Agents As Judges For Generative User Interface
(2025)
• No Venue
Lin et al.
-
Autoregressive Adversarial Post-training For Real-time Interactive Video Generation
(2025)
• No Venue
Lin et al.
-
Exploring Mllm-diffusion Information Transfer With Metacanvas
(2025)
• No Venue
Lin et al.
-
Generative Evaluation Of Complex Reasoning In Large Language Models
(2025)
• No Venue
Lin et al.
-
Ost-bench: Evaluating The Capabilities Of Mllms In Online Spatio-temporal Scene Understanding
(2025)
• No Venue
Lin et al.
-
Towards Understanding Camera Motions In Any Video
(2025)
• No Venue
Lin et al.
-
Quantization Meets Dllms: A Systematic Study Of Post-training Quantization For Diffusion Llms
(2025)
• No Venue
Lin et al.
-
Step-kto: Optimizing Mathematical Reasoning Through Stepwise Binary Feedback
(2025)
• No Venue
Lin et al.
-
Vcode: A Multimodal Coding Benchmark With SVG As Symbolic Visual Representation
(2025)
• No Venue
Lin et al.
-
Zebralogic: On The Scaling Limits Of Llms For Logical Reasoning
(2025)
• No Venue
Lin et al.
-
Wildifeval: Instruction Following In The Wild
(2025)
• No Venue
Lior et al.
-
Deciphering Trajectory-aided LLM Reasoning: An Optimization Perspective
(2025)
• No Venue
Liu et al.
-
Can World Simulators Reason? Gen-vire: A Generative Visual Reasoning Benchmark
(2025)
• No Venue
Liu et al.
-
ATLAS: A High-difficulty, Multidisciplinary Benchmark For Frontier Scientific Reasoning
(2025)
• No Venue
Liu et al.
-
Advances And Challenges In Foundation Agents: From Brain-inspired Intelligence To Evolutionary, Collaborative, And Safe Systems
(2025)
• No Venue
Liu et al.
-
Agent0-vl: Exploring Self-evolving Agent For Tool-integrated Vision-language Reasoning
(2025)
• No Venue
Liu et al.
-
Fone: Precise Single-token Number Embeddings Via Fourier Features
(2025)
• No Venue
Zhou et al.
-
Infecting Generative AI With Viruses
(2025)
• No Venue
David Noever, Forrest McKee
-
Does Understanding Inform Generation In Unified Multimodal Models? From Analysis To Path Forward
(2025)
• No Venue
Niu et al.
-
Learning To Reason In 4D: Dynamic Spatial Understanding For Vision Language Models
(2025)
• No Venue
Zhou et al.
-
Viscoder2: Building Multi-language Visualization Coding Agents
(2025)
• No Venue
Ni et al.
-
Can Large Language Models Capture Human Annotator Disagreements?
(2025)
• No Venue
Ni et al.
-
A Survey On Large Language Model Benchmarks
(2025)
• No Venue
Ni et al.
-
Annotation-efficient Universal Honesty Alignment
(2025)
• No Venue
Ni et al.
-
UQ: Assessing Language Models On Unsolved Questions
(2025)
• No Venue
Nie et al.
-
Viscoder: Fine-tuning Llms For Executable Python Visualization Code Generation
(2025)
• No Venue
Ni et al.
-
Diffusion Language Models Are Super Data Learners
(2025)
• No Venue
Ni et al.
-
Drax: Speech Recognition With Discrete Flow Matching
(2025)
• No Venue
Navon et al.
-
Effective Red-teaming Of Policy-adherent Agents
(2025)
• No Venue
Nakash et al.
-
Mlgym: A New Framework And Benchmark For Advancing AI Research Agents
(2025)
• No Venue
Nathani et al.
-
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks And Prompt Injections
(2025)
• No Venue
Nasr et al.
-
A Unified Framework For Detecting Point And Collective Anomalies In Operating System Logs Via Collaborative Transformers
(2025)
• No Venue
Mohammad Nasirzadeh, Jafar Tahmoresnezhad, Parviz Rashidi-Khazaee
-
Oneflow: Concurrent Mixed-modal And Interleaved Generation With Edit Flows
(2025)
• No Venue
Nguyen et al.
-
Lizard: An Efficient Linearization Framework For Large Language Models
(2025)
• No Venue
Nguyen et al.
-
The Sparse Frontier: Sparse Attention Trade-offs In Transformer Llms
(2025)
• No Venue
Nawrot et al.
-
Synthdetoxm: Modern Llms Are Few-shot Parallel Detoxification Data Annotators
(2025)
• No Venue
Moskovskiy et al.
-
Vlm2vec-v2: Advancing Multimodal Embedding For Videos, Images, And Visual Documents
(2025)
• No Venue
Meng et al.
-
Viplan: A Benchmark For Visual Planning With Symbolic Predicates And Vision-language Models
(2025)
• No Venue
Merler et al.
-
Nablanabla: Neighborhood Adaptive Block-level Attention
(2025)
• No Venue
Mikhailov et al.
-
Paper2agent: Reimagining Research Papers As Interactive And Reliable AI Agents
(2025)
• No Venue
Miao et al.
-
Mergenetic: A Simple Evolutionary Model Merging Library
(2025)
• No Venue
Minut et al.
-
Swe-lancer: Can Frontier Llms Earn $1 Million From Real-world Freelance Software Engineering?
(2025)
• No Venue
Miserendino et al.
-
Search Arena: Analyzing Search-augmented Llms
(2025)
• No Venue
Miroyan et al.
-
Livemcpbench: Can Agents Navigate An Ocean Of MCP Tools?
(2025)
• No Venue
Mo et al.
-
Webchorearena: Evaluating Web Browsing Agents On Realistic Tedious Web Tasks
(2025)
• No Venue
Miyai et al.
-
Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-translation Solution
(2025)
• No Venue
Asim Mohamed, Martin Gubri
-
Nolima: Long-context Evaluation Beyond Literal Matching
(2025)
• No Venue
Modarressi et al.
-
M3-bench: Multi-modal, Multi-hop, Multi-threaded Tool-using MLLM Agent Benchmark
(2025)
• No Venue
Zhou et al.
-
TF1-EN-3M: Three Million Synthetic Moral Fables For Training Small, Open Language Models
(2025)
• No Venue
Nadas et al.
-
Wildscore: Benchmarking Mllms In-the-wild Symbolic Music Reasoning
(2025)
• No Venue
Mundada et al.
-
Dragflow: Unleashing Dit Priors With Region Based Supervision For Drag Editing
(2025)
• No Venue
Zhou et al.
-
Do Generative Video Models Learn Physical Principles From Watching Videos?
(2025)
• No Venue
Motamed et al.
-
Do I Look Like A `cat.n.01` To You? A Taxonomy Image Generation Benchmark
(2025)
• No Venue
Moskvoretskii et al.
-
Instructx: Towards Unified Visual Editing With MLLM Guidance
(2025)
• No Venue
Mou et al.
-
Lost In The Mix: Evaluating LLM Understanding Of Code-switched Text
(2025)
• No Venue
Mohamed et al.
-
Scientists' First Exam: Probing Cognitive Abilities Of MLLM Via Perception, Understanding, And Reasoning
(2025)
• No Venue
Zhou et al.
-
Scaling Zero-shot Reference-to-video Generation
(2025)
• No Venue
Zhou et al.
-
SWEET-RL: Training Multi-turn LLM Agents On Collaborative Reasoning Tasks
(2025)
• No Venue
Zhou et al.
-
Omniworld: A Multi-domain And Multi-modal Dataset For 4D World Modeling
(2025)
• No Venue
Zhou et al.
-
X-reasoner: Towards Generalizable Reasoning Across Modalities And Domains
(2025)
• No Venue
Liu et al.
-
Webcoach: Self-evolving Web Agents With Cross-session Memory Guidance
(2025)
• No Venue
Liu et al.
-
What Makes A Good Natural Language Prompt?
(2025)
• No Venue
Long et al.
-
Bizfinbench: A Business-driven Real-world Financial Benchmark For Evaluating Llms
(2025)
• No Venue
Lu et al.
-
Finmme: Benchmark Dataset For Financial Multi-modal Reasoning Evaluation
(2025)
• No Venue
Luo et al.
-
Self-challenging Language Model Agents
(2025)
• No Venue
Zhou et al.
-
Terminal Velocity Matching
(2025)
• No Venue
Zhou et al.
-
ATLAS: Adaptive Transfer Scaling Laws For Multilingual Pretraining, Finetuning, And Decoding The Curse Of Multilinguality
(2025)
• No Venue
Longpre et al.
-
Av-reasoner: Improving And Benchmarking Clue-grounded Audio-visual Counting For Mllms
(2025)
• No Venue
Lu et al.
-
Roborefer: Towards Spatial Referring With Reasoning In Vision-language Models For Robotics
(2025)
• No Venue
Zhou et al.
-
Easytext: Controllable Diffusion Transformer For Multilingual Text Rendering
(2025)
• No Venue
Lu et al.
-
Octotools: An Agentic Framework With Extensible Tools For Complex Reasoning
(2025)
• No Venue
Lu et al.
-
UI-S1: Advancing GUI Automation Via Semi-online Reinforcement Learning
(2025)
• No Venue
Lu et al.
-
R-horizon: How Far Can Your Large Reasoning Model Really Go In Breadth And Depth?
(2025)
• No Venue
Lu et al.
-
Youtu-llm: Unlocking The Native Agentic Potential For Lightweight Large Language Models
(2025)
• No Venue
Lu et al.
-
Dreamactor-m1: Holistic, Expressive And Robust Human Image Animation With Hybrid Guidance
(2025)
• No Venue
Luo et al.
-
Autellix: An Efficient Serving Engine For LLM Agents As General Programs
(2025)
• No Venue
Luo et al.
-
Reinforcing General Reasoning Without Verifiers
(2025)
• No Venue
Zhou et al.
-
Large Language Model Agent: A Survey On Methodology, Applications And Challenges
(2025)
• No Venue
Luo et al.
-
Mcp-universe: Benchmarking Large Language Models With Real-world Model Context Protocol Servers
(2025)
• No Venue
Luo et al.
-
Openomni: Large Language Models Pivot Zero-shot Omnimodal Alignment Across Language With Real-time Self-aware Emotional Speech Synthesis
(2025)
• No Venue
Luo et al.
-
Ultrahorizon: Benchmarking Agent Capabilities In Ultra Long-horizon Scenarios
(2025)
• No Venue
Luo et al.
-
Autonomy-of-experts Models
(2025)
• No Venue
Lv et al.
-
F1: A Vision-language-action Model Bridging Understanding And Generation To Actions
(2025)
• No Venue
Lv et al.
-
Agentrewardbench: Evaluating Automatic Evaluations Of Web Agent Trajectories
(2025)
• No Venue
Lù et al.
-
Pixelworld: Towards Perceiving Everything As Pixels
(2025)
• No Venue
Zhiheng Lyu, Xueguang Ma, Wenhu Chen
-
Rethinking Verification For LLM Code Generation: From Generation To Testing
(2025)
• No Venue
Ma et al.
-
Language Models Can Self-improve At State-value Estimation For Better Search
(2025)
• No Venue
Ethan Mendes, Alan Ritter
-
Hard Negative Mining For Domain-specific Retrieval In Enterprise Systems
(2025)
• No Venue
Meghwani et al.
-
Spatiallm: Training Large Language Models For Structured Indoor Modeling
(2025)
• No Venue
Mao et al.
-
Chartqapro: A More Diverse And Challenging Benchmark For Chart Question Answering
(2025)
• No Venue
Masry et al.
-
C3: A Bilingual Benchmark For Spoken Dialogue Models Exploring Challenges In Complex Conversations
(2025)
• No Venue
Chengqian Ma, Wei Tao, Yiwen Guo
-
Calligrapher: Freestyle Text Image Customization
(2025)
• No Venue
Ma et al.
-
Cmi-bench: A Comprehensive Benchmark For Evaluating Music Instruction Following
(2025)
• No Venue
Ma et al.
-
Dover: Intervention-driven Auto Debugging For LLM Multi-agent Systems
(2025)
• No Venue
Ma et al.
-
Deliberation On Priors: Trustworthy Reasoning Of Large Language Models On Knowledge Graphs
(2025)
• No Venue
Ma et al.
-
Dramabench: A Six-dimensional Evaluation Framework For Drama Script Continuation
(2025)
• No Venue
Shijian Ma, Yunqi Huang, Yan Lin
-
Reasoning Models Can Be Effective Without Thinking
(2025)
• No Venue
Ma et al.
-
Iv-bench: A Benchmark For Image-grounded Video Perception And Reasoning In Multimodal Llms
(2025)
• No Venue
Ma et al.
-
Hpsv3: Towards Wide-spectrum Human Preference Score
(2025)
• No Venue
Ma et al.
-
General-reasoner: Advancing LLM Reasoning Across All Domains
(2025)
• No Venue
Ma et al.
-
Inference-time Scaling For Diffusion Models Beyond Scaling Denoising Steps
(2025)
• No Venue
Ma et al.
-
Generative Neural Video Compression Via Video Diffusion Prior
(2025)
• No Venue
Mao et al.
-
LUSIFER: Language Universal Space Integration For Enhanced Multilingual Embeddings With Large Language Models
(2025)
• No Venue
Man et al.
-
Emergenttts-eval: Evaluating TTS Models On Complex Prosodic, Expressiveness, And Linguistic Challenges Using Model-as-a-judge
(2025)
• No Venue
Manku et al.
-
Datadecide: How To Predict Best Pretraining Data With Small Experiments
(2025)
• No Venue
Magnusson et al.
-
Scaling Analysis Of Interleaved Speech-text Language Models
(2025)
• No Venue
Maimon et al.
-
Kolmogorov-arnold Attention: Is Learnable Attention Better For Vision Transformers?
(2025)
• No Venue
Maity et al.
-
Beyondweb: Lessons From Scaling Synthetic Data For Trillion-scale Pretraining
(2025)
• No Venue
Maini et al.
-
Sarchat-bench-2m: A Multi-task Vision-language Benchmark For SAR Image Interpretation
(2025)
• No Venue
Ma et al.
-
Step-video-t2v Technical Report: The Practice, Challenges, And Future Of Video Foundation Model
(2025)
• No Venue
Ma et al.
-
Teleantifraud-28k: A Audio-text Slow-thinking Dataset For Telecom Fraud Detection
(2025)
• No Venue
Ma et al.
-
Token-shuffle: Towards High-resolution Image Generation With Autoregressive Models
(2025)
• No Venue
Ma et al.
-
Videoeval-pro: Robust And Realistic Long Video Understanding Evaluation
(2025)
• No Venue
Ma et al.
-
Compassverifier: A Unified And Robust Verifier For Llms Evaluation And Outcome Reward
(2025)
• No Venue
Liu et al.
-
CCI4.0: A Bilingual Pretraining Dataset For Enhancing Reasoning In Large Language Models
(2025)
• No Venue
Liu et al.
-
Costbench: Evaluating Multi-turn Cost-optimal Planning And Adaptation In Dynamic Environments For LLM Tool-use Agents
(2025)
• No Venue
Liu et al.
-
Fin-r1: A Large Language Model For Financial Reasoning Through Reinforcement Learning
(2025)
• No Venue
Liu et al.
-
Does Dinov3 Set A New Medical Vision Standard?
(2025)
• No Venue
Liu et al.
-
Docreward: A Document Reward Model For Structuring And Stylizing
(2025)
• No Venue
Liu et al.
-
Gear: Generation Augmented Retrieval
(2025)
• No Venue
Liu et al.
-
GEM: A Gym For Agentic Llms
(2025)
• No Venue
Liu et al.
-
Javisdit: Joint Audio-video Diffusion Transformer With Hierarchical Spatio-temporal Prior Synchronization
(2025)
• No Venue
Liu et al.
-
Pc-agent: A Hierarchical Multi-agent Collaboration Framework For Complex Task Automation On PC
(2025)
• No Venue
Liu et al.
-
METAGENE-1: Metagenomic Foundation Model For Pandemic Monitoring
(2025)
• No Venue
Liu et al.
-
Longllada: Unlocking Long Context Capabilities In Diffusion Llms
(2025)
• No Venue
Liu et al.
-
Metafaith: Faithful Natural Language Uncertainty Expression In Llms
(2025)
• No Venue
Liu et al.
-
Mcpeval: Automatic Mcp-based Deep Evaluation For AI Agent Models
(2025)
• No Venue
Liu et al.
-
MOSAIC: Modeling Social AI For Content Dissemination And Regulation In Multi-agent Simulations
(2025)
• No Venue
Liu et al.
-
Mixture Of States: Routing Token-level Dynamics For Multimodal Generation
(2025)
• No Venue
Liu et al.
-
More Thinking, Less Seeing? Assessing Amplified Hallucination In Multimodal Reasoning Models
(2025)
• No Venue
Liu et al.
-
Part I: Tricks Or Traps? A Deep Dive Into RL For LLM Reasoning
(2025)
• No Venue
Liu et al.
-
Openrubrics: Towards Scalable Synthetic Rubric Generation For Reward Modeling And LLM Alignment
(2025)
• No Venue
Liu et al.
-
Othink-mr1: Stimulating Multimodal Generalized Reasoning Capabilities Via Dynamic Reinforcement Learning
(2025)
• No Venue
Liu et al.
-
Region-adaptive Sampling For Diffusion Transformers
(2025)
• No Venue
Liu et al.
-
Prorl: Prolonged Reinforcement Learning Expands Reasoning Boundaries In Large Language Models
(2025)
• No Venue
Liu et al.
-
Quantization Hurts Reasoning? An Empirical Study On Quantized Reasoning Models
(2025)
• No Venue
Liu et al.
-
Researchbench: Benchmarking Llms In Scientific Discovery Via Inspiration-based Task Decomposition
(2025)
• No Venue
Liu et al.
-
Skywork-reward-v2: Scaling Preference Data Curation Via Human-ai Synergy
(2025)
• No Venue
Liu et al.
-
Showtable: Unlocking Creative Table Visualization With Collaborative Reflection And Refinement
(2025)
• No Venue
Liu et al.
-
Shotbench: Expert-level Cinematic Understanding In Vision-language Models
(2025)
• No Venue
Liu et al.
-
SPARK: Synergistic Policy And Reward Co-evolving Framework
(2025)
• No Venue
Liu et al.
-
Step1x-edit: A Practical Framework For General Image Editing
(2025)
• No Venue
Liu et al.
-
SSR: Enhancing Depth Perception In Vision-language Models Via Rationale-guided Spatial Reasoning
(2025)
• No Venue
Liu et al.
-
Star-bench: Probing Deep Spatio-temporal Reasoning As Audio 4D Intelligence
(2025)
• No Venue
Liu et al.
-
The Gold Medals In An Empty Room: Diagnosing Metalinguistic Reasoning In Llms With Camlang
(2025)
• No Venue
Liu et al.
-
Thus Spake Long-context Large Language Model
(2025)
• No Venue
Liu et al.
-
Video-t1: Test-time Scaling For Video Generation
(2025)
• No Venue
Liu et al.
-
A Survey Of Small Language Models
(2024)
• No Venue
Nguyen et al.
-
Marco-llm: Bridging Languages Via Massive Multilingual Training For Cross-lingual Enhancement
(2024)
• No Venue
Ming et al.
-
Hybrid Preferences: Learning To Route Instances For Human Vs. AI Feedback
(2024)
• No Venue
Miranda et al.
-
Medfuzz: Exploring The Robustness Of Large Language Models In Medical Question Answering
(2024)
• No Venue
Ness et al.
-
Compact Language Models Via Pruning And Knowledge Distillation
(2024)
• No Venue
Muralidharan et al.
-
MMIU: Multimodal Multi-image Understanding For Evaluating Large Vision-language Models
(2024)
• No Venue
Meng et al.
-
Bimedix2: Bio-medical Expert LMM For Diverse Medical Modalities
(2024)
• No Venue
Mullappilly et al.
-
Sit: Exploring Flow And Diffusion-based Generative Models With Scalable Interpolant Transformers
(2024)
• No Venue
Ma et al.
-
Videoautoarena: An Automated Arena For Evaluating Large Multimodal Models In Video Analysis Through User Simulation
(2024)
• No Venue
Luo et al.
-
Oneke: A Dockerized Schema-guided LLM Agent-based Knowledge Extraction System
(2024)
• No Venue
Luo et al.
-
Agentinstruct: Toward Generative Teaching With Agentic Flows
(2024)
• No Venue
Mitra et al.
-
Unsolvable Problem Detection: Evaluating Trustworthiness Of Vision Language Models
(2024)
• No Venue
Miyai et al.
-
DITTO-2: Distilled Diffusion Inference-time T-optimization For Music Generation
(2024)
• No Venue
Novack et al.
-
Addition Is All You Need For Energy-efficient Language Models
(2024)
• No Venue
Hongyin Luo, Wei Sun
-
Dreammatcher: Appearance Matching Self-attention For Semantically-consistent Text-to-image Personalization
(2024)
• No Venue
Nam et al.
-
Repliqa: A Question-answering Dataset For Benchmarking Llms On Unseen Reference Content
(2024)
• No Venue
Monteiro et al.
-
Yesbut: A High-quality Annotated Multimodal Dataset For Evaluating Satire Comprehension Capability Of Vision-language Models
(2024)
• No Venue
Nandy et al.
-
Aurora-m: The First Open Source Multilingual Language Model Red-teamed According To The U.S. Executive Order
(2024)
• No Venue
Nakamura et al.
-
ROS-LLM: A ROS Framework For Embodied AI With Task Feedback And Structured Reasoning
(2024)
• No Venue
Mower et al.
-
Are Llms Better Than Reported? Detecting Label Errors And Mitigating Their Effect On Model Performance
(2024)
• No Venue
Nahum et al.
-
AAAR-1.0: Assessing AI's Potential To Assist Research
(2024)
• No Venue
Lou et al.
-
Grouse: A Benchmark To Evaluate Evaluators In Grounded Question Answering
(2024)
• No Venue
Muller et al.
-
Seacrowd: A Multilingual Multimodal Data Hub And Benchmark Suite For Southeast Asian Languages
(2024)
• No Venue
Lovenia et al.
-
A Pointer Network-based Approach For Joint Extraction And Detection Of Multi-label Multi-class Intents
(2024)
• No Venue
Mullick et al.
-
Wildvision: Evaluating Vision-language Models In The Wild With Human Preferences
(2024)
• No Venue
Lu et al.
-
Toolsandbox: A Stateful, Conversational, Interactive Evaluation Benchmark For LLM Tool Use Capabilities
(2024)
• No Venue
Lu et al.
-
The AI Scientist: Towards Fully Automated Open-ended Scientific Discovery
(2024)
• No Venue
Lu et al.
-
Large Language Models Are Superpositions Of All Characters: Attaining Arbitrary Role-play Via Self-alignment
(2024)
• No Venue
Lu et al.
-
Generative World Explorer
(2024)
• No Venue
Lu et al.
-
Blending Is All You Need: Cheaper, Better Alternative To Trillion-parameters LLM
(2024)
• No Venue
Lu et al.
-
Benchmarking Chinese Knowledge Rectification In Large Language Models
(2024)
• No Venue
Lu et al.
-
A Controlled Study On Long Context Extension And Generalization In Llms
(2024)
• No Venue
Lu et al.
-
MALT: Improving Reasoning With Multi-agent LLM Training
(2024)
• No Venue
Motwani et al.
-
Unleashing The Power Of Data Tsunami: A Comprehensive Survey On Data Assessment And Selection For Instruction Tuning Of Language Models
(2024)
• No Venue
Qin et al.
-
Law Of Vision Representation In Mllms
(2024)
• No Venue
Yang et al.
-
Osworld: Benchmarking Multimodal Agents For Open-ended Tasks In Real Computer Environments
(2024)
• No Venue
Xie et al.
-
Omniedit: Building Image Editing Generalist Models Through Specialist Supervision
(2024)
• No Venue
Wei et al.
-
Audio Flamingo: A Novel Audio Language Model With Few-shot Learning And Dialogue Abilities
(2024)
• No Venue
Kong et al.
-
Thanos: Enhancing Conversational Agents With Skill-of-mind-infused Large Language Model
(2024)
• No Venue
Lee et al.
-
Urbench: A Comprehensive Benchmark For Evaluating Large Multimodal Models In Multi-view Urban Scenarios
(2024)
• No Venue
Zhou et al.
-
Hellobench: Evaluating Long Text Generation Capabilities Of Large Language Models
(2024)
• No Venue
Que et al.
-
Semantic Entropy Probes: Robust And Cheap Hallucination Detection In Llms
(2024)
• No Venue
Kossen et al.
-
Language Models Can Self-lengthen To Generate Long Texts
(2024)
• No Venue
Quan et al.
-
Atp*: An Efficient And Scalable Method For Localizing LLM Behaviour To Components
(2024)
• No Venue
Kramár et al.
-
Fact, Fetch, And Reason: A Unified Evaluation Of Retrieval-augmented Generation
(2024)
• No Venue
Krishna et al.
-
Fine-tuning Large Language Models With Human-inspired Learning Strategies In Medical Question Answering
(2024)
• No Venue
Yang et al.
-
Open-finllms: Open Multimodal Large Language Models For Financial Applications
(2024)
• No Venue
Xie et al.
-
Can Large Language Models Unlock Novel Scientific Research Ideas?
(2024)
• No Venue
Kumar et al.
-
Travelplanner: A Benchmark For Real-world Planning With Language Agents
(2024)
• No Venue
Xie et al.
-
Summary Of A Haystack: A Challenge To Long-context Llms And RAG Systems
(2024)
• No Venue
Laban et al.
-
In Search Of Needles In A 10M Haystack: Recurrent Memory Finds What Llms Miss
(2024)
• No Venue
Kuratov et al.
-
Babilong: Testing The Limits Of Llms With Long Context Reasoning-in-a-haystack
(2024)
• No Venue
Kuratov et al.
-
"Give Me BF16 Or Give Me Death"? Accuracy-performance Trade-offs In LLM Quantization
(2024)
• No Venue
Kurtic et al.
-
Evaluating And Aligning Codellms On Human Preference
(2024)
• No Venue
Yang et al.
-
Layerwise Recurrent Router For Mixture-of-experts
(2024)
• No Venue
Qiu et al.
-
TÜLU 3: Pushing Frontiers In Open Language Model Post-training
(2024)
• No Venue
Lambert et al.
-
Large Language Models As Planning Domain Generators
(2024)
• No Venue
Oswald et al.
-
Gsm-symbolic: Understanding The Limitations Of Mathematical Reasoning In Large Language Models
(2024)
• No Venue
Mirzadeh et al.
-
Dyvo: Dynamic Vocabularies For Learned Sparse Retrieval With Entities
(2024)
• No Venue
Nguyen et al.
-
3dsrbench: A Comprehensive 3D Spatial Reasoning Benchmark
(2024)
• No Venue
Ma et al.
-
Xwin-lm: Strong And Scalable Alignment Practice For Llms
(2024)
• No Venue
Ni et al.
-
Are Large Language Models Superhuman Chemists?
(2024)
• No Venue
Mirza et al.
-
GUI Agents: A Survey
(2024)
• No Venue
Nguyen et al.
-
Adapting While Learning: Grounding Llms For Scientific Problems With Intelligent Tool Usage Adaptation
(2024)
• No Venue
Lyu et al.
-
Preserving Multi-modal Capabilities Of Pre-trained Vlms For Improving Vision-linguistic Compositionality
(2024)
• No Venue
Oh et al.
-
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark For Culture-aware Evaluation
(2024)
• No Venue
Onohara et al.
-
Llms Know More Than They Show: On The Intrinsic Representation Of LLM Hallucinations
(2024)
• No Venue
Orgad et al.
-
Snap Video: Scaled Spatiotemporal Transformers For Text-to-video Synthesis
(2024)
• No Venue
Menapace et al.
-
Evaluating Very Long-term Conversational Memory Of LLM Agents
(2024)
• No Venue
Maharana et al.
-
Tango 2: Aligning Diffusion-based Text-to-audio Generations Through Direct Preference Optimization
(2024)
• No Venue
Majumder et al.
-
Rephrasing The Web: A Recipe For Compute And Data-efficient Language Modeling
(2024)
• No Venue
Maini et al.
-
Foundation Models For Music: A Survey
(2024)
• No Venue
Ma et al.
-
Towards World Simulator: Crafting Physical Commonsense-based Benchmark For Video Generation
(2024)
• No Venue
Meng et al.
-
Hibou: A Family Of Foundational Vision Transformers For Pathology
(2024)
• No Venue
Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
-
Robocasa: Large-scale Simulation Of Everyday Tasks For Generalist Robots
(2024)
• No Venue
Nasiriany et al.
-
BASE TTS: Lessons From Building A Billion-parameter Text-to-speech Model On 100K Hours Of Data
(2024)
• No Venue
Łajszczak et al.
-
LLM Agent Operating System
(2024)
• No Venue
Mei et al.
-
Aya Model: An Instruction Finetuned Open-access Multilingual Language Model
(2024)
• No Venue
Üstün et al.
-
Camvig: Camera Aware Image-to-video Generation With Multimodal Transformers
(2024)
• No Venue
Marmon et al.
-
Eurollm: Multilingual Language Models For Europe
(2024)
• No Venue
Martins et al.
-
Open Language Data Initiative: Advancing Low-resource Machine Translation For Karakalpak
(2024)
• No Venue
Mukhammadsaid Mamasaidov, Abror Shopulatov
-
Codemmlu: A Multi-task Benchmark For Assessing Code Understanding Capabilities Of Codellms
(2024)
• No Venue
Manh et al.
-
TOFU: A Task Of Fictitious Unlearning For Llms
(2024)
• No Venue
Maini et al.
-
Openelm: An Efficient Language Model Family With Open-source Training And Inference Framework
(2024)
• No Venue
Mehta et al.
-
Bigger Is Not Always Better: Scaling Properties Of Latent Diffusion Models
(2024)
• No Venue
Mei et al.
-
Worldcuisines: A Massive-scale Benchmark For Multilingual And Multicultural Visual Question Answering On Global Cuisines
(2024)
• No Venue
Winata et al.
-
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once On Gemma 2
(2024)
• No Venue
Lieberum et al.
-
From Elements To Design: A Layered Approach For Automatic Graphic Design Composition
(2024)
• No Venue
Lin et al.
-
Baichuan Alignment Technical Report
(2024)
• No Venue
Lin et al.
-
Open-sora Plan: Open-source Large Video Generation Model
(2024)
• No Venue
Lin et al.
-
Paper Copilot: A Self-evolving And Efficient LLM System For Personalized Academic Assistance
(2024)
• No Venue
Lin et al.
-
VMAS: Video-to-music Generation Via Semantic Alignment In Web Music Videos
(2024)
• No Venue
Lin et al.
-
Wildbench: Benchmarking Llms With Challenging Tasks From Real Users In The Wild
(2024)
• No Venue
Lin et al.
-
Omnidocbench: Benchmarking Diverse PDF Document Parsing With Comprehensive Annotations
(2024)
• No Venue
Ouyang et al.
-
Training Software Engineering Agents And Verifiers With Swe-gym
(2024)
• No Venue
Pan et al.
-
Law Of The Weakest Link: Cross Capabilities Of Large Language Models
(2024)
• No Venue
Zhong et al.
-
Bielik 7B V0.1: A Polish Language Model -- Development, Insights, And Evaluation
(2024)
• No Venue
Ociepa et al.
-
BALROG: Benchmarking Agentic LLM And VLM Reasoning On Games
(2024)
• No Venue
Paglieri et al.
-
Teach Multimodal Llms To Comprehend Electrocardiographic Images
(2024)
• No Venue
Liu et al.
-
Sparse Laneformer
(2024)
• No Venue
Liu et al.
-
Regmix: Data Mixture As Regression For Language Model Pre-training
(2024)
• No Venue
Liu et al.
-
Part123: Part-aware 3D Reconstruction From A Single-view Image
(2024)
• No Venue
Liu et al.
-
Mind Your Step (by Step): Chain-of-thought Can Reduce Performance On Tasks Where Thinking Makes Humans Worse
(2024)
• No Venue
Liu et al.
-
Mibench: Evaluating Multimodal Large Language Models Over Multiple Images
(2024)
• No Venue
Liu et al.
-
Magicquill: An Intelligent Interactive Image Editing System
(2024)
• No Venue
Liu et al.
-
Longgenbench: Long-context Generation Benchmark
(2024)
• No Venue
Liu et al.
-
Kangaroo: Lossless Self-speculative Decoding Via Double Early Exiting
(2024)
• No Venue
Liu et al.
-
Infini-gram: Scaling Unbounded N-gram Language Models To A Trillion Tokens
(2024)
• No Venue
Liu et al.
-
Harnessing Webpage Uis For Text-rich Visual Understanding
(2024)
• No Venue
Liu et al.
-
GRIN: Gradient-informed Moe
(2024)
• No Venue
Liu et al.
-
Flowing From Words To Pixels: A Framework For Cross-modality Evolution
(2024)
• No Venue
Liu et al.
-
Distilled Decoding 1: One-step Sampling Of Image Auto-regressive Models With Flow Matching
(2024)
• No Venue
Liu et al.
-
DDK: Distilling Domain Knowledge For Efficient Large Language Models
(2024)
• No Venue
Liu et al.
-
Chatqa: Building GPT-4 Level Conversational QA Models
(2024)
• No Venue
Liu et al.
-
Are Your Llms Capable Of Stable Reasoning?
(2024)
• No Venue
Liu et al.
-
Apigen: Automated Pipeline For Generating Verifiable And Diverse Function-calling Datasets
(2024)
• No Venue
Liu et al.
-
Reka Core, Flash, And Edge: A Series Of Powerful Multimodal Language Models
(2024)
• No Venue
Ormazabal et al.
-
Acemath: Advancing Frontier Math Reasoning With Post-training And Reward Modeling
(2024)
• No Venue
Liu et al.
-
Low-bit Quantization Favors Undertrained Llms: Scaling Laws For Quantized Llms With 100T Training Tokens
(2024)
• No Venue
Ouyang et al.
-
Preference Tuning With Human Feedback On Language, Speech, And Vision Tasks: A Survey
(2024)
• No Venue
Winata et al.
-
Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms
(2024)
• No Venue
Lai et al.
-
Unified Text-to-image Generation And Retrieval
(2024)
• No Venue
Qu et al.
-
Rewardbench: Evaluating Reward Models For Language Modeling
(2024)
• No Venue
Lambert et al.
-
Freescale: Unleashing The Resolution Of Diffusion Models Via Tuning-free Scale Fusion
(2024)
• No Venue
Qiu et al.
-
Do Large Language Models Latently Perform Multi-hop Reasoning?
(2024)
• No Venue
Yang et al.
-
What Matters When Building Vision-language Models?
(2024)
• No Venue
Laurençon et al.
-
A Preliminary Study Of O1 In Medicine: Are We Closer To An AI Doctor?
(2024)
• No Venue
Xie et al.
-
Stark: Social Long-term Multi-modal Conversation With Persona Commonsense Knowledge
(2024)
• No Venue
Lee et al.
-
Meteor: Mamba-based Traversal Of Rationale For Large Language And Vision Models
(2024)
• No Venue
Lee et al.
-
A Comprehensive Evaluation Of Quantized Instruction-tuned Large Language Models: An Experimental Analysis Up To 405B
(2024)
• No Venue
Lee et al.
-
Intriguing Properties Of Large Language And Vision Models
(2024)
• No Venue
Lee et al.
-
Speech-massive: A Multilingual Speech Dataset For SLU And Beyond
(2024)
• No Venue
Lee et al.
-
Multi-lora Composition For Image Generation
(2024)
• No Venue
Zhong et al.
-
HGRN2: Gated Linear Rnns With State Expansion
(2024)
• No Venue
Qin et al.
-
Toward Robust Hyper-detailed Image Captioning: A Multiagent Approach And Dual Evaluation Metrics For Factuality And Coverage
(2024)
• No Venue
Lee et al.
-
Videorepair: Improving Text-to-video Generation Via Misalignment Evaluation And Localized Refinement
(2024)
• No Venue
Lee et al.
-
Xgen-videosyn-1: High-fidelity Text-to-video Synthesis With Compressed Representations
(2024)
• No Venue
Qin et al.
-
Report Cards: Qualitative Evaluation Of Language Models Using Natural Language Summaries
(2024)
• No Venue
Yang et al.
-
Beyond A*: Better Planning With Transformers Via Search Dynamics Bootstrapping
(2024)
• No Venue
Lehnert et al.
-
Memorag: Moving Towards Next-gen RAG Via Memory-inspired Knowledge Discovery
(2024)
• No Venue
Qian et al.
-
Densing Law Of Llms
(2024)
• No Venue
Xiao et al.
-
The Finben: An Holistic Financial Benchmark For Large Language Models
(2024)
• No Venue
Xie et al.
-
Songcreator: Lyrics-based Universal Song Generation
(2024)
• No Venue
Lei et al.
-
Benchmarking Agentic Workflow Generation
(2024)
• No Venue
Qiao et al.
-
We-math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
(2024)
• No Venue
Qiao et al.
-
Prism: A Framework For Decoupling And Assessing The Capabilities Of Vlms
(2024)
• No Venue
Qiao et al.
-
How Easy Is It To Fool Your Multimodal Llms? An Empirical Analysis On Deceptive Prompts
(2024)
• No Venue
Qian et al.
-
The Curse Of Multi-modalities: Evaluating Hallucinations Of Large Multimodal Models Across Language, Visual, And Audio
(2024)
• No Venue
Leng et al.
-
MMIE: Massive Multimodal Interleaved Comprehension Benchmark For Large Vision-language Models
(2024)
• No Venue
Xia et al.
-
Same Task, More Tokens: The Impact Of Input Length On The Reasoning Performance Of Large Language Models
(2024)
• No Venue
Mosh Levy, Alon Jacoby, Yoav Goldberg
-
Scbench: A KV Cache-centric Analysis Of Long-context Methods
(2024)
• No Venue
Li et al.
-
K-sort Arena: Efficient And Reliable Benchmarking For Generative Models Via K-wise Human Preferences
(2024)
• No Venue
Li et al.
-
Brushedit: All-in-one Image Inpainting And Editing
(2024)
• No Venue
Li et al.
-
Aligning Diffusion Models By Optimizing Human Utility
(2024)
• No Venue
Li et al.
-
A Survey On The Honesty Of Large Language Models
(2024)
• No Venue
Li et al.
-
Dual3d: Efficient And Consistent Text-to-3d Generation With Dual-mode Multi-view Latent Diffusion
(2024)
• No Venue
Li et al.
-
Datacomp-lm: In Search Of The Next Generation Of Training Sets For Language Models
(2024)
• No Venue
Li et al.
-
GMAI-VL & GMAI-VL-5.5M: A Large Vision-language Model And A Comprehensive Multimodal Dataset Towards General Medical AI
(2024)
• No Venue
Li et al.
-
Hunyuan-dit: A Powerful Multi-resolution Diffusion Transformer With Fine-grained Chinese Understanding
(2024)
• No Venue
Li et al.
-
Needlebench: Can Llms Do Retrieval And Reasoning In 1 Million Context Window?
(2024)
• No Venue
Li et al.
-
Livebench: A Challenging, Contamination-free LLM Benchmark
(2024)
• No Venue
White et al.
-
Ensembling Large Language Models With Process Reward-guided Tree Search For Better Complex Reasoning
(2024)
• No Venue
Park et al.
-
Illustrious: An Open Advanced Illustration Model
(2024)
• No Venue
Park et al.
-
Followir: Evaluating And Teaching Information Retrieval Models To Follow Instructions
(2024)
• No Venue
Weller et al.
-
Neco: Improving Dinov2's Spatial Representations In 19 GPU Hours With Patch Neighbor Consistency
(2024)
• No Venue
Pariza et al.
-
Gpt-4v(ision) Is A Human-aligned Evaluator For Text-to-3d Generation
(2024)
• No Venue
Wu et al.
-
Can Large Language Models Understand Context?
(2024)
• No Venue
Zhu et al.
-
Internal Consistency And Self-feedback In Large Language Models: A Survey
(2024)
• No Venue
Liang et al.
-
HEMM: Holistic Evaluation Of Multimodal Foundation Models
(2024)
• No Venue
Liang et al.
-
Controllable Text Generation For Large Language Models: A Survey
(2024)
• No Venue
Liang et al.
-
Can Language Models Replace Programmers? REPOCOD Says 'not Yet'
(2024)
• No Venue
Liang et al.
-
Diffsensei: Bridging Multi-modal Llms And Diffusion Models For Customized Manga Generation
(2024)
• No Venue
Wu et al.
-
AUTOHALLUSION: Automatic Generation Of Hallucination Benchmarks For Vision-language Models
(2024)
• No Venue
Wu et al.
-
Dreamhoi: Subject-driven Generation Of 3D Human-object Interactions With Diffusion Priors
(2024)
• No Venue
Thomas Hanwen Zhu, Ruining Li, Tomas Jakab
-
Compositional 3d-aware Video Generation With LLM Director
(2024)
• No Venue
Zhu et al.
-
Jamba: A Hybrid Transformer-mamba Language Model
(2024)
• No Venue
Lieber et al.
-
Naturalistic Music Decoding From EEG Data Via Latent Diffusion Models
(2024)
• No Venue
Postolache et al.
-
Towards A Better Metric For Text-to-video Generation
(2024)
• No Venue
Wu et al.
-
SPIQA: A Dataset For Multimodal Question Answering On Scientific Papers
(2024)
• No Venue
Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan
-
Sambanova SN40L: Scaling The AI Memory Wall With Dataflow And Composition Of Experts
(2024)
• No Venue
Prabhakar et al.
-
Quantifying Generalization Complexity For Large Language Models
(2024)
• No Venue
Qi et al.
-
Reuse Your Rewards: Reward Model Transfer For Zero-shot Cross-lingual Alignment
(2024)
• No Venue
Wu et al.
-
Video Motion Transfer With Diffusion Transformers
(2024)
• No Venue
Pondaven et al.
-
Nemotron-4 15B Technical Report
(2024)
• No Venue
Parmar et al.
-
Inversecoder: Unleashing The Power Of Instruction-tuned Code Llms With Inverse-instruct
(2024)
• No Venue
Wu et al.
-
GEB-1.3B: Open Lightweight Large Language Model
(2024)
• No Venue
Wu et al.
-
Steering Rectified Flow Models In The Vector Field For Controlled Image Generation
(2024)
• No Venue
Patel et al.
-
Datadreamer: A Tool For Synthetic Data Generation And Reproducible LLM Workflows
(2024)
• No Venue
Ajay Patel, Colin Raffel, Chris Callison-Burch
-
Motionbooth: Motion-aware Customized Text-to-video Generation
(2024)
• No Venue
Wu et al.
-
Revisiting Text-to-image Evaluation With Gecko: On Metrics, Prompts, And Human Ratings
(2024)
• No Venue
Wiles et al.
-
Long-form Factuality In Large Language Models
(2024)
• No Venue
Wei et al.
-
Large Language Model Confidence Estimation Via Black-box Access
(2024)
• No Venue
Pedapati et al.
-
Arctic-snowcoder: Demystifying High-quality Data In Code Pretraining
(2024)
• No Venue
Yuxiang Wei, Hojae Han, Rajhans Samdani
-
OWSM V3.1: Better And Faster Open Whisper-style Speech Models Based On E-branchformer
(2024)
• No Venue
Peng et al.
-
Multilingual E5 Text Embeddings: A Technical Report
(2024)
• No Venue
Wang et al.
-
The Fineweb Datasets: Decanting The Web For The Finest Text Data At Scale
(2024)
• No Venue
Penedo et al.
-
Dreambench++: A Human-aligned Benchmark For Personalized Image Generation
(2024)
• No Venue
Peng et al.
-
Movie Gen: A Cast Of Media Foundation Models
(2024)
• No Venue
Polyak et al.
-
Plot2code: A Comprehensive Benchmark For Evaluating Multi-modal Large Language Models In Code Generation From Scientific Plots
(2024)
• No Venue
Wu et al.
-
Spinning The Golden Thread: Benchmarking Long-form Generation In Language Models
(2024)
• No Venue
Wu et al.
-
Reft: Representation Finetuning For Language Models
(2024)
• No Venue
Wu et al.
-
Slimlm: An Efficient Small Language Model For On-device Document Assistance
(2024)
• No Venue
Pham et al.
-
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
(2024)
• No Venue
Wu et al.
-
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder For Fast, Memory Efficient, And Long Context Finetuning And Inference
(2024)
• No Venue
Warner et al.
-
Longvideobench: A Benchmark For Long-context Interleaved Video-language Understanding
(2024)
• No Venue
Wu et al.
-
T3M: Text Guided 3D Human Motion Synthesis From Speech
(2024)
• No Venue
Wenshuo Peng, Kaipeng Zhang, Sai Qian Zhang
-
Muchomusic: Evaluating Music Understanding In Multimodal Audio-language Models
(2024)
• No Venue
Weck et al.
-
Redpajama: An Open Dataset For Training Large Language Models
(2024)
• No Venue
Weber et al.
-
Proposer-agent-evaluator(pae): Autonomous Skill Discovery For Foundation Model Internet Agents
(2024)
• No Venue
Zhou et al.
-
Visual Context Window Extension: A New Perspective For Long Video Understanding
(2024)
• No Venue
Hongchen Wei, Zhenzhong Chen
-
Skywork-moe: A Deep Dive Into Training Techniques For Mixture-of-experts Language Models
(2024)
• No Venue
Wei et al.
-
Personalized Multimodal Large Language Models: A Survey
(2024)
• No Venue
Wu et al.
-
Longmemeval: Benchmarking Chat Assistants On Long-term Interactive Memory
(2024)
• No Venue
Wu et al.
-
Hyperagent: Generalist Software Engineering Agents To Solve Coding Tasks At Scale
(2024)
• No Venue
Huy Nhat Phan, Phong X. Nguyen, Nghi D. Q. Bui
-
LAION-SG: An Enhanced Large-scale Dataset For Training Complex Image-text Models With Structural Annotations
(2024)
• No Venue
Li et al.
-
Large Language Model Evaluation Via Matrix Nuclear-norm
(2024)
• No Venue
Li et al.
-
M3sciqa: A Multi-modal Multi-document Scientific QA Benchmark For Evaluating Foundation Models
(2024)
• No Venue
Li et al.
-
Long-context Llms Struggle With Long In-context Learning
(2024)
• No Venue
Li et al.
-
Nearest Neighbor Speculative Decoding For LLM Generation And Attribution
(2024)
• No Venue
Li et al.
-
More Agents Is All You Need
(2024)
• No Venue
Li et al.
-
Naturalbench: Evaluating Vision-language Models On Natural Adversarial Samples
(2024)
• No Venue
Li et al.
-
Playground V2.5: Three Insights Towards Enhancing Aesthetic Quality In Text-to-image Generation
(2024)
• No Venue
Li et al.
-
Omnibench: Towards The Future Of Universal Omni-language Models
(2024)
• No Venue
Li et al.
-
Q-refine: A Perceptual Quality Refiner For Ai-generated Image
(2024)
• No Venue
Li et al.
-
Synthetic Data (almost) From Scratch: Generalized Instruction Tuning For Language Models
(2024)
• No Venue
Li et al.
-
Seeing And Understanding: Bridging Vision With Chemical Knowledge Via Chemvlm
(2024)
• No Venue
Li et al.
-
Seed-bench-2-plus: Benchmarking Multimodal Large Language Models With Text-rich Visual Comprehension
(2024)
• No Venue
Li et al.
-
Tele-flm Technical Report
(2024)
• No Venue
Li et al.
-
Taptrv2: Attention-based Position Update Improves Tracking Any Point
(2024)
• No Venue
Li et al.
-
Temporal Reasoning Transfer From Text To Video
(2024)
• No Venue
Li et al.
-
Unipose: A Unified Multimodal Framework For Human Pose Comprehension, Generation And Editing
(2024)
• No Venue
Li et al.
-
Unbounded: A Generative Infinite Game Of Character Life Simulation
(2024)
• No Venue
Li et al.
-
Vlrewardbench: A Challenging Benchmark For Vision-language Generative Reward Models
(2024)
• No Venue
Li et al.
-
Agentless: Demystifying Llm-based Software Engineering Agents
(2024)
• No Venue
Xia et al.
-
Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments
(2024)
• No Venue
Xi et al.
-
Constraint Back-translation Improves Complex Instruction Following Of Large Language Models
(2024)
• No Venue
Qi et al.
-
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning With Checklist
(2024)
• No Venue
Zhou et al.
-
Agent-safetybench: Evaluating The Safety Of LLM Agents
(2024)
• No Venue
Zhang et al.
-
Llm-detectaive: A Tool For Fine-grained Machine-generated Text Detection
(2024)
• No Venue
Abassy et al.
-
Gpt-4o System Card
(2024)
• No Venue
Openai et al.
-
Deepseek-coder-v2: Breaking The Barrier Of Closed-source Models In Code Intelligence
(2024)
• No Venue
Deepseek-Ai et al.
-
Openai O1 System Card
(2024)
• No Venue
Openai et al.
-
Qwen2.5 Technical Report
(2024)
• No Venue
Qwen et al.
-
Moba: A Two-level Agent System For Efficient Mobile Task Automation
(2024)
• No Venue
Zhu et al.
-
Multi-dimensional Insights: Benchmarking Real-world Personalization In Large Multimodal Models
(2024)
• No Venue
Zhang et al.
-
Normalizing Flows Are Capable Generative Models
(2024)
• No Venue
Zhai et al.
-
Mme-realworld: Could Your Multimodal LLM Challenge High-resolution Real-world Scenarios That Are Difficult For Humans?
(2024)
• No Venue
Zhang et al.
-
Controllable Safety Alignment: Inference-time Adaptation To Diverse Safety Requirements
(2024)
• No Venue
Zhang et al.
-
Personalization Of Large Language Models: A Survey
(2024)
• No Venue
Zhang et al.
-
Multimodal Self-instruct: Synthetic Abstract Image And Visual Reasoning Instruction Using Language Model
(2024)
• No Venue
Zhang et al.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally On Your Phone
(2024)
• No Venue
Abdin et al.
-
Named Clinical Entity Recognition Benchmark
(2024)
• No Venue
Abdul et al.
-
OCR Hinders RAG: Evaluating The Cascading Impact Of OCR On Retrieval-augmented Generation
(2024)
• No Venue
Zhang et al.
-
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
(2024)
• No Venue
Aboagye et al.
-
On The Diagram Of Thought
(2024)
• No Venue
Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao
-
MAGID: An Automated Pipeline For Generating Synthetic Multi-modal Datasets
(2024)
• No Venue
Aboutalebi et al.
-
Visfocus: Prompt-guided Vision Encoders For Ocr-free Dense Document Understanding
(2024)
• No Venue
Abramovich et al.
-
POA: Pre-training Once For Models Of All Sizes
(2024)
• No Venue
Zhang et al.
-
Shieldgemma: Generative AI Content Moderation Based On Gemma
(2024)
• No Venue
Zeng et al.
-
Copilot Evaluation Harness: Evaluating Llm-guided Software Programming
(2024)
• No Venue
Agarwal et al.
-
Agent S: An Open Agentic Framework That Uses Computers Like A Human
(2024)
• No Venue
Agashe et al.
-
Pixtral 12B
(2024)
• No Venue
Agrawal et al.
-
Yi-lightning Technical Report
(2024)
• No Venue
Ai et al.
-
Reviseval: Improving Llm-as-a-judge Via Response-adapted References
(2024)
• No Venue
Zhang et al.
-
Unibench: Visual Reasoning Requires Rethinking Vision-language Beyond Scaling
(2024)
• No Venue
Al-Tahan et al.
-
Evolutionary Optimization Of Model Merging Recipes
(2024)
• No Venue
Akiba et al.
-
Map-neo: Highly Capable And Transparent Bilingual Large Language Model Series
(2024)
• No Venue
Zhang et al.
-
Subgen: Token Generation In Sublinear Time And Memory
(2024)
• No Venue
Zandieh et al.
-
In-context Example Selection Via Similarity Search Improves Low-resource Machine Translation
(2024)
• No Venue
Armel Zebaze, Benoît Sagot, Rachel Bawden
-
Llana: Large Language And Nerf Assistant
(2024)
• No Venue
Amaduzzi et al.
-
Dallah: A Dialect-aware Multimodal Large Language Model For Arabic
(2024)
• No Venue
Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed
-
Dilightnet: Fine-grained Lighting Control For Diffusion-based Image Generation
(2024)
• No Venue
Zeng et al.
-
Swe-bench-java: A Github Issue Resolving Benchmark For Java
(2024)
• No Venue
Zan et al.
-
Make Your LLM Fully Utilize The Context
(2024)
• No Venue
An et al.
-
Mgte: Generalized Long-context Text Representation And Reranking Models For Multilingual Text Retrieval
(2024)
• No Venue
Zhang et al.
-
Seed-tts: A Family Of High-quality Versatile Speech Generation Models
(2024)
• No Venue
Anastassiou et al.
-
Expanding Performance Boundaries Of Open-source Multimodal Models With Model, Data, And Test-time Scaling
(2024)
• No Venue
Chen et al.
-
Mmmu-pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
(2024)
• No Venue
Yue et al.
-
Pangea: A Fully Open Multilingual Multimodal LLM For 39 Languages
(2024)
• No Venue
Yue et al.
-
DOTS: Learning To Reason Dynamically In Llms Via Optimal Reasoning Trajectories Search
(2024)
• No Venue
Yue et al.
-
Openscholar: Synthesizing Scientific Literature With Retrieval-augmented Lms
(2024)
• No Venue
Asai et al.
-
Aya 23: Open Weight Releases To Further Multilingual Progress
(2024)
• No Venue
Aryabumi et al.
-
Inference Scaling For Long-context Retrieval Augmented Generation
(2024)
• No Venue
Yue et al.
-
Notellm: A Retrievable Large Language Model For Note Recommendation
(2024)
• WWW '24: The ACM Web Conference 2024
• 13 citations
Zhang et al.
-
Mathverse: Does Your Multi-modal LLM Truly See The Diagrams In Visual Math Problems?
(2024)
• No Venue
Zhang et al.
-
ITACLIP: Boosting Training-free Semantic Segmentation With Image, Text, And Architectural Enhancements
(2024)
• No Venue
Aydın et al.
-
Longcite: Enabling Llms To Generate Fine-grained Citations In Long-context QA
(2024)
• No Venue
Zhang et al.
-
Longalign: A Recipe For Long Context Alignment Of Large Language Models
(2024)
• No Venue
Bai et al.
-
Let's Go Shopping (LGS) -- Web-scale Image-text Dataset For Visual Concept Understanding
(2024)
• No Venue
Bai et al.
-
One Token To Seg Them All: Language Instructed Reasoning Segmentation In Videos
(2024)
• No Venue
Bai et al.
-
Longbench V2: Towards Deeper Understanding And Reasoning On Realistic Long-context Multitasks
(2024)
• No Venue
Bai et al.
-
Longwriter: Unleashing 10,000+ Word Generation From Long Context Llms
(2024)
• No Venue
Bai et al.
-
SUTRA: Scalable Multilingual Language Model Architecture
(2024)
• No Venue
Bendale et al.
-
Mechanistic Permutability: Match Features Across Layers
(2024)
• No Venue
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
-
Unitxt: Flexible, Shareable And Reusable Data Preparation And Evaluation For Generative AI
(2024)
• No Venue
Bandel et al.
-
Breaking Boundaries: Investigating The Effects Of Model Editing On Cross-linguistic Performance
(2024)
• No Venue
Banerjee et al.
-
Safeinfer: Context Adaptive Decoding Time Safety Alignment For Large Language Models
(2024)
• No Venue
Banerjee et al.
-
Videorefer Suite: Advancing Spatial-temporal Object Understanding With Video LLM
(2024)
• No Venue
Yuan et al.
-
Accessing GPT-4 Level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine With Llama-3 8B
(2024)
• No Venue
Zhang et al.
-
Graph Mamba: Towards Learning On Graphs With State Space Models
(2024)
• No Venue
Ali Behrouz, Farnoosh Hashemi
-
Knowledge Transfer Across Modalities With Natural Language Supervision
(2024)
• No Venue
Barbano et al.
-
Lumiere: A Space-time Diffusion Model For Video Generation
(2024)
• No Venue
Bar-Tal et al.
-
A Careful Examination Of Large Language Model Performance On Grade School Arithmetic
(2024)
• No Venue
Zhang et al.
-
API-BLEND: A Comprehensive Corpora For Training And Benchmarking API Llms
(2024)
• No Venue
Basu et al.
-
Boosting Healthcare Llms Through Retrieved Context
(2024)
• No Venue
Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Dario Garcia-Gasulla
-
Inflation With Diffusion: Efficient Temporal Adaptation For Text-to-video Super-resolution
(2024)
• No Venue
Yuan et al.
-
Remamba: Equip Mamba With Effective Long-sequence Modeling
(2024)
• No Venue
Yuan et al.
-
Word Sense Linking: Disambiguating Outside The Sandbox
(2024)
• No Venue
Bejgu et al.
-
Chatmusician: Understanding And Generating Music Intrinsically With LLM
(2024)
• No Venue
Yuan et al.
-
Ditfastattn: Attention Compression For Diffusion Transformer Models
(2024)
• No Venue
Yuan et al.
-
Chronomagic-bench: A Benchmark For Metamorphic Evaluation Of Text-to-time-lapse Video Generation
(2024)
• No Venue
Yuan et al.
-
Fintral: A Family Of GPT-4 Level Multimodal Financial Large Language Models
(2024)
• No Venue
Bhatia et al.
-
INDUS: Effective And Efficient Language Models For Scientific Applications
(2024)
• No Venue
Bhattacharjee et al.
-
Benchmarking Trustworthiness Of Multimodal Large Language Models: A Comprehensive Study
(2024)
• No Venue
Zhang et al.
-
Affordance-based Robot Manipulation With Flow Matching
(2024)
• No Venue
Fan Zhang, Michael Gienger
-
Long Code Arena: A Set Of Benchmarks For Long-context Code Models
(2024)
• No Venue
Bogomolov et al.
-
Make It Count: Text-to-image Generation With An Accurate Number Of Objects
(2024)
• No Venue
Binyamin et al.
-
Visual Riddles: A Commonsense And World Knowledge Challenge For Large Vision And Language Models
(2024)
• No Venue
Bitton-Guetta et al.
-
Allhands: Ask Me Anything On Large-scale Verbatim Feedback Via Large Language Models
(2024)
• No Venue
Zhang et al.
-
Aquila2 Technical Report
(2024)
• No Venue
Zhang et al.
-
Is Bigger Edit Batch Size Always Better? -- An Empirical Study On Model Editing With Llama-3
(2024)
• No Venue
Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli
-
Lmms-eval: Reality Check On The Evaluation Of Large Multimodal Models
(2024)
• No Venue
Zhang et al.
-
Beyond Llava-hd: Diving Into High-resolution Large Multimodal Models
(2024)
• No Venue
Zhang et al.
-
Evaluating Multiview Object Consistency In Humans And Image Models
(2024)
• No Venue
Bonnen et al.
-
Biomedlm: A 2.7B Parameter Language Model Trained On Biomedical Text
(2024)
• No Venue
Bolton et al.
-
Windows Agent Arena: Evaluating Multi-modal OS Agents At Scale
(2024)
• No Venue
Bonatti et al.
-
In Case You Missed It: ARC 'challenge' Is Not That Challenging
(2024)
• No Venue
Łukasz Borchmann
-
Can Mllms Understand The Deep Implication Behind Chinese Images?
(2024)
• No Venue
Zhang et al.
-
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
(2024)
• No Venue
Zhang et al.
-
Counting Ability Of Large Language Models And Impact Of Tokenization
(2024)
• No Venue
Xiang Zhang, Juntai Cao, Chenyu You
-
Broadway: Boost Your Text-to-video Generation Model In A Training-free Way
(2024)
• No Venue
Bu et al.
-
Roadmap Towards Superhuman Speech Understanding Using Large Language Models
(2024)
• No Venue
Bu et al.
-
Critic-v: VLM Critics Help Catch VLM Errors In Multimodal Reasoning
(2024)
• No Venue
Zhang et al.
-
Diversity Empowers Intelligence: Integrating Expertise Of Software Engineering Agents
(2024)
• No Venue
Zhang et al.
-
Codeit: Self-improving Language Models With Prioritized Hindsight Replay
(2024)
• No Venue
Butt et al.
-
UFO: A Ui-focused Agent For Windows OS Interaction
(2024)
• No Venue
Zhang et al.
-
Read-me: Refactorizing Llms As Router-decoupled Mixture Of Experts With System Co-design
(2024)
• No Venue
Cai et al.
-
Internlm2 Technical Report
(2024)
• No Venue
Cai et al.
-
Diffusion Self-distillation For Zero-shot Customized Image Generation
(2024)
• No Venue
Cai et al.
-
Ditctrl: Exploring Attention Control In Multi-modal Diffusion Transformer For Tuning-free Multi-prompt Longer Video Generation
(2024)
• No Venue
Cai et al.
-
On The Compositional Generalization Of Multimodal Llms For Medical Imaging
(2024)
• No Venue
Cai et al.
-
Uni-smart: Universal Science Multimodal Analysis And Research Transformer
(2024)
• No Venue
Cai et al.
-
Temporalbench: Benchmarking Fine-grained Temporal Understanding For Multimodal Video Models
(2024)
• No Venue
Cai et al.
-
Outcome-refining Process Supervision For Code Generation
(2024)
• No Venue
Yu et al.
-
Turtlebench: Evaluating Top Language Models Via Real-world Yes/no Puzzles
(2024)
• No Venue
Yu et al.
-
Texthawk: Exploring Efficient Fine-grained Perception Of Multimodal Large Language Models
(2024)
• No Venue
Yu et al.
-
Mm-vet V2: A Challenging Benchmark To Evaluate Large Multimodal Models For Integrated Capabilities
(2024)
• No Venue
Yu et al.
-
Researchtown: Simulator Of Human Research Community
(2024)
• No Venue
Yu et al.
-
HARE: Human Priors, A Key To Small Language Model Efficiency
(2024)
• No Venue
Zhang et al.
-
Structeval: Deepen And Broaden Large Language Model Assessment Via Structured Evaluation
(2024)
• No Venue
Cao et al.
-
Compassjudger-1: All-in-one Judge Model Helps Model Evaluation And Evolution
(2024)
• No Venue
Cao et al.
-
Enhancing Recommendation Diversity By Re-ranking With Large Language Models
(2024)
• ACM Transactions on Recommender Systems
• 25 citations
Diego Carraro, Derek Bridge
-
PERSONA: A Reproducible Testbed For Pluralistic Alignment
(2024)
• No Venue
Castricato et al.
-
GS-LRM: Large Reconstruction Model For 3D Gaussian Splatting
(2024)
• No Venue
Zhang et al.
-
Evaluating Rag-fusion With Ragelo: An Automated Elo-based Framework
(2024)
• No Venue
Zackary Rackauckas, Arthur Câmara, Jakub Zavrel
-
Bringing Objects To Life: 4D Generation From 3D Objects
(2024)
• No Venue
Rahamim et al.
-
Denoising Vision Transformers
(2024)
• No Venue
Yang et al.
-
CRAG -- Comprehensive RAG Benchmark
(2024)
• No Venue
Yang et al.
-
SAM 2: Segment Anything In Images And Videos
(2024)
• No Venue
Ravi et al.
-
THEANINE: Revisiting Memory Management In Long-term Conversations With Timeline-augmented Response Generation
(2024)
• No Venue
Kim et al.
-
Dialsim: A Real-time Simulator For Evaluating Long-term Dialogue Understanding Of Conversational Agents
(2024)
• No Venue
Kim et al.
-
Cognitive Map For Language Models: Optimal Planning Via Verbally Representing The World Model
(2024)
• No Venue
Kim et al.
-
Fifo-diffusion: Generating Infinite Videos From Text Without Training
(2024)
• No Venue
Kim et al.
-
Evaluating Language Models As Synthetic Data Generators
(2024)
• No Venue
Kim et al.
-
Towards Universal Soccer Video Understanding
(2024)
• No Venue
Rao et al.
-
Evaluating D-MERIT Of Partial-annotation On Information Retrieval
(2024)
• No Venue
Rassin et al.
-
Prometheus 2: An Open Source Language Model Specialized In Evaluating Other Language Models
(2024)
• No Venue
Kim et al.
-
Husky: A Unified, Open-source Language Agent For Multi-step Reasoning
(2024)
• No Venue
Kim et al.
-
Omniact: A Dataset And Benchmark For Enabling Multimodal Generalist Autonomous Agents For Desktop And Web
(2024)
• No Venue
Kapoor et al.
-
Self-moe: Towards Compositional Large Language Models With Self-specialized Experts
(2024)
• No Venue
Kang et al.
-
LLM Comparator: Visual Analytics For Side-by-side Evaluation Of Large Language Models
(2024)
• No Venue
Kahng et al.
-
Fiddler: CPU-GPU Orchestration For Fast Inference Of Mixture-of-experts Models
(2024)
• No Venue
Kamahori et al.
-
EXAONE 3.0 7.8B Instruction Tuned Language Model
(2024)
• No Venue
LG AI Research et al.
-
Fastvoicegrad: One-step Diffusion-based Voice Conversion With Adversarial Conditional Diffusion Distillation
(2024)
• No Venue
Kaneko et al.
-
Needle Threading: Can Llms Follow Threads Through Near-million-scale Haystacks?
(2024)
• No Venue
Jonathan Roberts, Kai Han, Samuel Albanie
-
Seccodeplt: A Unified Platform For Evaluating The Security Of Code Genai
(2024)
• No Venue
Yang et al.
-
Qwen2 Technical Report
(2024)
• No Venue
Yang et al.
-
1.58-bit FLUX
(2024)
• No Venue
Yang et al.
-
Video Depth Without Video Models
(2024)
• No Venue
Ke et al.
-
Large Motion Video Autoencoding With Cross-modal Video VAE
(2024)
• No Venue
Xing et al.
-
Seed-story: Multimodal Long Story Generation With Large Language Model
(2024)
• No Venue
Yang et al.
-
MEDIC: Towards A Comprehensive Framework For Evaluating Llms In Clinical Applications
(2024)
• No Venue
Kanithi et al.
-
3D-GRAND: A Million-scale Dataset For 3d-llms With Better Grounding And Less Hallucination
(2024)
• No Venue
Yang et al.
-
Prismatic Vlms: Investigating The Design Space Of Visually-conditioned Language Models
(2024)
• No Venue
Karamcheti et al.
-
Guiding A Diffusion Model With A Bad Version Of Itself
(2024)
• No Venue
Karras et al.
-
Knowledge Navigator: Llm-guided Browsing Framework For Exploratory Search In Scientific Literature
(2024)
• No Venue
Uri Katz, Mosh Levy, Yoav Goldberg
-
Capabilities Of Gemini Models In Medicine
(2024)
• No Venue
Saab et al.
-
Pre-training Small Base Lms With Fewer Tokens
(2024)
• No Venue
Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
-
RAPTOR: Recursive Abstractive Processing For Tree-organized Retrieval
(2024)
• No Venue
Sarthi et al.
-
Coverbench: A Challenging Benchmark For Complex Claim Verification
(2024)
• No Venue
Jacovi et al.
-
Noteeline: Supporting Real-time Notetaking From Keypoints With Large Language Models
(2024)
• No Venue
Huq et al.
-
Hq-edit: A High-quality Dataset For Instruction-based Image Editing
(2024)
• No Venue
Hui et al.
-
Noise-aware Training Of Layout-aware Language Models
(2024)
• No Venue
Sarkhel et al.
-
Texgen: Text-guided 3D Texture Generation With Multi-view Sampling And Resampling
(2024)
• No Venue
Huo et al.
-
Maggie: Masked Guided Gradual Human Instance Matting
(2024)
• No Venue
Huynh et al.
-
Slowfast-llava: A Strong Training-free Baseline For Video Large Language Models
(2024)
• No Venue
Xu et al.
-
Simple And Scalable Strategies To Continually Pre-train Large Language Models
(2024)
• No Venue
Ibrahim et al.
-
Gitchameleon: Unmasking The Version-switching Capabilities Of Code Generation Models
(2024)
• No Venue
Islah et al.
-
Modulated Intervention Preference Optimization (MIPO): Keep The Easy, Refine The Difficult
(2024)
• No Venue
Cheolhun Jang
-
Data Contamination Report From The 2024 CONDA Shared Task
(2024)
• No Venue
Sainz et al.
-
Cogvideox: Text-to-video Diffusion Models With An Expert Transformer
(2024)
• No Venue
Yang et al.
-
Hackphyr: A Local Fine-tuned LLM Agent For Network Security Environments
(2024)
• No Venue
Maria Rigaki, Carlos Catania, Sebastian Garcia
-
Proactive Detection Of Voice Cloning With Localized Watermarking
(2024)
• No Venue
Roman et al.
-
VARCO-VISION: Expanding Frontiers In Korean Vision-language Models
(2024)
• No Venue
Ju et al.
-
Pegasus-v1 Technical Report
(2024)
• No Venue
Jung et al.
-
Miradata: A Large-scale Video Dataset With Long Durations And Structured Captions
(2024)
• No Venue
Ju et al.
-
RALL-E: Robust Codec Language Modeling With Chain-of-thought Prompting For Text-to-speech Synthesis
(2024)
• No Venue
Xin et al.
-
Associative Recurrent Memory Transformer
(2024)
• No Venue
Rodkin et al.
-
Compositional Video Generation As Flow Equalization
(2024)
• No Venue
Xingyi Yang, Xinchao Wang
-
Bigdocs: An Open And Permissively-licensed Dataset For Training Multimodal Models On Document And Code Tasks
(2024)
• No Venue
Rodriguez et al.
-
Dsbench: How Far Are Data Science Agents To Becoming Data Science Experts?
(2024)
• No Venue
Jing et al.
-
Llava-critic: Learning To Evaluate Multimodal Models
(2024)
• No Venue
Xiong et al.
-
Wildteaming At Scale: From In-the-wild Jailbreaks To (adversarially) Safer Language Models
(2024)
• No Venue
Jiang et al.
-
Nanoflow: Towards Optimal Large Language Model Serving Throughput
(2024)
• No Venue
Zhu et al.
-
Thinking In Space: How Multimodal Large Language Models See, Remember, And Recall Spaces
(2024)
• No Venue
Yang et al.
-
Img-diff: Contrastive Data Synthesis For Multimodal Large Language Models
(2024)
• No Venue
Jiao et al.
-
Grandmaster-level Chess Without Search
(2024)
• No Venue
Ruoss et al.
-
Vigor: Improving Visual Grounding Of Large Vision Language Models With Fine-grained Reward Modeling
(2024)
• No Venue
Yan et al.
-
INCLUDE: Evaluating Multilingual Language Understanding With Regional Knowledge
(2024)
• No Venue
Romanou et al.
-
Univg: Towards Unified-modal Video Generation
(2024)
• No Venue
Ruan et al.
-
Xgen-mm-vid (blip-3-video): You Only Need 32 Tokens To Represent A Video Even In Vlms
(2024)
• No Venue
Ryoo et al.
-
Viper: Visual Personalization Of Generative Models Via Individual Preference Learning
(2024)
• No Venue
Salehi et al.
-
Characterizing Prompt Compression Methods For Long Context Inference
(2024)
• No Venue
Jha et al.
-
How To Train Data-efficient Llms
(2024)
• No Venue
Sachdeva et al.
-
Tails Tell Tales: Chapter-wide Manga Transcriptions With Character Names
(2024)
• No Venue
Ragav Sachdeva, Gyungin Shin, Andrew Zisserman
-
LEOPARD: A Vision Language Model For Text-rich Multi-image Tasks
(2024)
• No Venue
Jia et al.
-
Synthesizing Text-to-sql Data From Weak And Strong Llms
(2024)
• No Venue
Yang et al.
-
RATIONALYST: Pre-training Process-supervision For Improving Reasoning
(2024)
• No Venue
Jiang et al.
-
E5-V: Universal Embeddings With Multimodal Large Language Models
(2024)
• No Venue
Jiang et al.
-
Mora: High-rank Updating For Parameter-efficient Fine-tuning
(2024)
• No Venue
Jiang et al.
-
Mmsearch: Benchmarking The Potential Of Large Models As Multi-modal Search Engines
(2024)
• No Venue
Jiang et al.
-
Minference 1.0: Accelerating Pre-filling For Long-context Llms Via Dynamic Sparse Attention
(2024)
• No Venue
Jiang et al.
-
Genai Arena: An Open Evaluation Platform For Generative Models
(2024)
• No Venue
Jiang et al.
-
Videohallucer: Evaluating Intrinsic And Extrinsic Hallucinations In Large Video-language Models
(2024)
• No Venue
Wang et al.
-
LML: Language Model Learning A Dataset For Data-augmented Prediction
(2024)
• No Venue
Praneeth Vadlapati
-
I4vgen: Image As Stepping Stone For Text-to-video Generation
(2024)
• No Venue
Guo et al.
-
Is Preference Alignment Always The Best Option To Enhance Llm-based Translation? An Empirical Analysis
(2024)
• No Venue
Gisserot-Boukhlef et al.
-
Chatglm: A Family Of Large Language Models From GLM-130B To GLM-4 All Tools
(2024)
• No Venue
Team GLM et al.
-
The Good, The Bad, And The Greedy: Evaluation Of Llms Should Not Ignore Non-determinism
(2024)
• No Venue
Song et al.
-
All Languages Matter: Evaluating Lmms On Culturally Diverse 100 Languages
(2024)
• No Venue
Vayani et al.
-
MLLM Can See? Dynamic Correction Decoding For Hallucination Mitigation
(2024)
• No Venue
Wang et al.
-
Knesset-dictabert: A Hebrew Language Model For Parliamentary Proceedings
(2024)
• No Venue
Gili Goldin, Shuly Wintner
-
Is It Really Long Context If All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
(2024)
• No Venue
Goldman et al.
-
Atomovideo: High Fidelity Image-to-video Generation
(2024)
• No Venue
Gong et al.
-
Omnifusion Technical Report
(2024)
• No Venue
Goncharova et al.
-
AST-T5: Structure-aware Pretraining For Code Generation And Understanding
(2024)
• No Venue
Linyuan Gong, Mostafa Elhoushi, Alvin Cheung
-
Turbo Sparse: Achieving LLM SOTA Performance With Minimal Activated Parameters
(2024)
• No Venue
Song et al.
-
Av-odyssey Bench: Can Your Multimodal Llms Really Understand Audio-visual Information?
(2024)
• No Venue
Gong et al.
-
Scaling Diffusion Language Models Via Adaptation From Autoregressive Models
(2024)
• No Venue
Gong et al.
-
Facilitating Large Language Model Russian Adaptation With Learned Embedding Propagation
(2024)
• No Venue
Mikhail Tikhomirov, Daniil Chernyshev
-
Cs-bench: A Comprehensive Benchmark For Large Language Models Towards Computer Science Mastery
(2024)
• No Venue
Song et al.
-
Olmo: Accelerating The Science Of Language Models
(2024)
• No Venue
Groeneveld et al.
-
Learn Your Reference Model For Real Good Alignment
(2024)
• No Venue
Gorbatovski et al.
-
Triposr: Fast 3D Object Reconstruction From A Single Image
(2024)
• No Venue
Tochilkin et al.
-
Navigating The Digital World As Humans Do: Universal Visual Grounding For GUI Agents
(2024)
• No Venue
Gou et al.
-
Both Text And Images Leaked! A Systematic Analysis Of Multimodal LLM Data Contamination
(2024)
• No Venue
Song et al.
-
Are AI Detectors Good Enough? A Survey On Quality Of Datasets With Machine-generated Texts
(2024)
• No Venue
Gritsai et al.
-
Learning Universal Predictors
(2024)
• No Venue
Grau-Moya et al.
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
(2024)
• No Venue
Grosnit et al.
-
Rakutenai-7b: Extending Large Language Models For Japanese
(2024)
• No Venue
Rakuten Group et al.
-
Cruxeval: A Benchmark For Code Reasoning, Understanding And Execution
(2024)
• No Venue
Gu et al.
-
Denoising LM: Pushing The Limits Of Error Correction Models For Speech Recognition
(2024)
• No Venue
Gu et al.
-
GAVEL: Generating Games Via Evolution And Language Models
(2024)
• No Venue
Todd et al.
-
Guide-and-rescale: Self-guidance Mechanism For Effective Tuning-free Real Image Editing
(2024)
• No Venue
Titov et al.
-
From Words To Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-context Examples
(2024)
• No Venue
Vacareanu et al.
-
Scaling LLM Test-time Compute Optimally Can Be More Effective Than Scaling Model Parameters
(2024)
• No Venue
Snell et al.
-
A Large Encoder-decoder Family Of Foundation Models For Chemical Language
(2024)
• No Venue
Soares et al.
-
Direct Language Model Alignment From Online AI Feedback
(2024)
• No Venue
Guo et al.
-
Codeeditorbench: Evaluating Code Editing Capability Of Large Language Models
(2024)
• No Venue
Guo et al.
-
Deepseek-coder: When The Large Language Model Meets Programming -- The Rise Of Code Intelligence
(2024)
• No Venue
Guo et al.
-
MOSAIC: A Modular System For Assistive And Interactive Cooking
(2024)
• No Venue
Wang et al.
-
Pulid: Pure And Lightning ID Customization Via Contrastive Alignment
(2024)
• No Venue
Guo et al.
-
Vision Superalignment: Weak-to-strong Generalization For Vision Foundation Models
(2024)
• No Venue
Guo et al.
-
Infinity: Scaling Bitwise Autoregressive Modeling For High-resolution Image Synthesis
(2024)
• No Venue
Han et al.
-
MARVEL-40M+: Multi-level Visual Elaboration For High-fidelity Text-to-3d Content Creation
(2024)
• No Venue
Sinha et al.
-
Opendevin: An Open Platform For AI Software Developers As Generalist Agents
(2024)
• No Venue
Wang et al.
-
Global MMLU: Understanding And Addressing Cultural And Linguistic Biases In Multilingual Evaluation
(2024)
• No Venue
Singh et al.
-
Rethinking Interpretability In The Era Of Large Language Models
(2024)
• No Venue
Singh et al.
-
Omnipred: Language Models As Universal Regressors
(2024)
• No Venue
Song et al.
-
Aya Dataset: An Open-access Collection For Multilingual Instruction Tuning
(2024)
• No Venue
Singh et al.
-
Attention Overflow: Language Model Input Blur During Long-context Missing Items Recommendation
(2024)
• No Venue
Damien Sileo
-
Infimm-webmath-40b: Advancing Multimodal Pre-training For Enhanced Mathematical Reasoning
(2024)
• No Venue
Han et al.
-
Pingpong: A Benchmark For Role-playing Language Models With User Emulation And Multi-model Evaluation
(2024)
• No Venue
Ilya Gusev
-
Can Llms Generate Novel Research Ideas? A Large-scale Human Study With 100+ NLP Researchers
(2024)
• No Venue
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
-
Androidlab: Training And Systematic Benchmarking Of Android Autonomous Agents
(2024)
• No Venue
Xu et al.
-
Chatglm-math: Improving Math Problem-solving In Large Language Models With A Self-critique Pipeline
(2024)
• No Venue
Xu et al.
-
Think: Thinner Key Cache By Query-driven Pruning
(2024)
• No Venue
Xu et al.
-
Direct-a-video: Customized Video Generation With User-directed Camera Movement And Object Motion
(2024)
• No Venue
Yang et al.
-
Mpirigen: MPI Code Generation Through Domain-specific Language Models
(2024)
• No Venue
Schneider et al.
-
Truth Or Mirage? Towards End-to-end Factuality Evaluation With LLM-OASIS
(2024)
• No Venue
Scirè et al.
-
Theagentcompany: Benchmarking LLM Agents On Consequential Real World Tasks
(2024)
• No Venue
Xu et al.
-
UCFE: A User-centric Financial Expertise Benchmark For Large Language Models
(2024)
• No Venue
Yang et al.
-
Deciphering Cross-modal Alignment In Large Vision-language Models With Modality Integration Rate
(2024)
• No Venue
Huang et al.
-
Autocrawler: A Progressive Understanding Web Agent For Web Crawler Generation
(2024)
• No Venue
Huang et al.
-
Authorship Attribution In The Era Of Llms: Problems, Methodologies, And Challenges
(2024)
• No Venue
Baixiang Huang, Canyu Chen, Kai Shu
-
Creativesynth: Creative Blending And Synthesis Of Visual Arts Based On Multimodal Diffusion
(2024)
• No Venue
Huang et al.
-
Livexiv -- A Multi-modal Live Benchmark Based On Arxiv Papers Content
(2024)
• No Venue
Shabtay et al.
-
Stronger Models Are NOT Stronger Teachers For Instruction Tuning
(2024)
• No Venue
Xu et al.
-
Pllava : Parameter-free Llava Extension From Images To Videos For Video Dense Captioning
(2024)
• No Venue
Xu et al.
-
Vbench++: Comprehensive And Versatile Benchmark Suite For Video Generative Models
(2024)
• No Venue
Huang et al.
-
Piccolo2: General Text Embedding With Multi-task Hybrid Loss Training
(2024)
• No Venue
Huang et al.
-
Pointinfinity: Resolution-invariant Point Diffusion Models
(2024)
• No Venue
Huang et al.
-
O1 Replication Journey -- Part 2: Surpassing O1-preview Through Simple Distillation, Big Progress Or Bitter Lesson?
(2024)
• No Venue
Huang et al.
-
Can Knowledge Editing Really Correct Hallucinations?
(2024)
• No Venue
Huang et al.
-
Billm: Pushing The Limit Of Post-training Quantization For Llms
(2024)
• No Venue
Huang et al.
-
Mmevalpro: Calibrating Multimodal Benchmarks Towards Trustworthy And Efficient Evaluation
(2024)
• No Venue
Huang et al.
-
LITA: Language Instructed Temporal-localization Assistant
(2024)
• No Venue
Huang et al.
-
Genmac: Compositional Text-to-video Generation With Multi-agent Collaboration
(2024)
• No Venue
Huang et al.
-
Compression Represents Intelligence Linearly
(2024)
• No Venue
Huang et al.
-
Weaver: Foundation Models For Creative Writing
(2024)
• No Venue
Wang et al.
-
Training Language Models On The Knowledge Graph: Insights On Hallucinations And Their Detectability
(2024)
• No Venue
Hron et al.
-
Deepspeed-fastgen: High-throughput Text Generation For Llms Via MII And Deepspeed-inference
(2024)
• No Venue
Holmes et al.
-
Model Editing With Canonical Examples
(2024)
• No Venue
Hewitt et al.
-
Tutor Copilot: A Human-ai Approach For Scaling Real-time Expertise
(2024)
• No Venue
Wang et al.
-
Algorithmic Progress In Language Models
(2024)
• No Venue
Ho et al.
-
Llava-gemma: Accelerating Multimodal Foundation Models With A Compact Language Model
(2024)
• No Venue
Hinck et al.
-
Inserf: Text-driven Generative Object Insertion In Neural 3D Scenes
(2024)
• No Venue
Shahbazi et al.
-
Atlas-chat: Adapting Large Language Models For Low-resource Moroccan Arabic Dialect
(2024)
• No Venue
Shang et al.
-
Vript: A Video Is Worth Thousands Of Words
(2024)
• No Venue
Yang et al.
-
An Object Is Worth 64x64 Pixels: Generating 3D Object Via Image Diffusion
(2024)
• No Venue
Yan et al.
-
Yolov9: Learning What You Want To Learn Using Programmable Gradient Information
(2024)
• No Venue
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
-
Visual Echoes: A Simple Unified Transformer For Audio-visual Generation
(2024)
• No Venue
Yang et al.
-
Powerinfer-2: Fast Large Language Model Inference On A Smartphone
(2024)
• No Venue
Xue et al.
-
GLEE: A Unified Framework And Benchmark For Language-based Economic Environments
(2024)
• No Venue
Shapira et al.
-
TOMATO: Assessing Visual Temporal Reasoning Capabilities In Multimodal Foundation Models
(2024)
• No Venue
Shangguan et al.
-
L3GO: Language Agents With Chain-of-3d-thoughts For Generating Unconventional Objects
(2024)
• No Venue
Yamada et al.
-
Xgen-mm (BLIP-3): A Family Of Open Large Multimodal Models
(2024)
• No Venue
Xue et al.
-
Sotopia-π: Interactive Learning Of Socially Intelligent Language Agents
(2024)
• No Venue
Wang et al.
-
Decoding Compressed Trust: Scrutinizing The Trustworthiness Of Efficient Llms Under Compression
(2024)
• No Venue
Hong et al.
-
Margin-aware Preference Optimization For Aligning Diffusion Models Without Reference
(2024)
• No Venue
Hong et al.
-
Scaling Laws For Linear Complexity Language Models
(2024)
• No Venue
Shen et al.
-
Longvu: Spatiotemporal Adaptive Compression For Long Video-language Understanding
(2024)
• No Venue
Shen et al.
-
Wavllm: Towards Robust And Adaptive Speech Large Language Model
(2024)
• No Venue
Hu et al.
-
The Dawn Of GUI Agent: A Preliminary Case Study With Claude 3.5 Computer Use
(2024)
• No Venue
Hu et al.
-
Snapgen: Taming High-resolution Text-to-image Models For Mobile Devices With Efficient Architectures And Training
(2024)
• No Venue
Hu et al.
-
New Desiderata For Direct Preference Optimization
(2024)
• No Venue
Xiangkun Hu, Tong He, David Wipf
-
Exploring Model Kinship For Merging Large Language Models
(2024)
• No Venue
Hu et al.
-
RULER: What's The Real Context Size Of Your Long-context Language Models?
(2024)
• No Venue
Hsieh et al.
-
Walledeval: A Comprehensive Safety Evaluation Toolkit For Large Language Models
(2024)
• No Venue
Gupta et al.
-
Design2code: How Far Are We From Automating Front-end Engineering?
(2024)
• No Venue
Si et al.
-
M-rewardbench: Evaluating Reward Models In Multilingual Settings
(2024)
• No Venue
Gureja et al.
-
Appworld: A Controllable World Of Apps And People For Benchmarking Interactive Coding Agents
(2024)
• No Venue
Trivedi et al.
-
Ezaudio: Enhancing Text-to-audio Generation With Efficient Diffusion Transformer
(2024)
• No Venue
Hai et al.
-
Two Giraffes In A Dirt Field: Using Game Play To Investigate Situation Modelling In Large Multimodal Models
(2024)
• No Venue
Hakimov et al.
-
REPOEXEC: Evaluate Code Generation With A Repository-level Executable Benchmark
(2024)
• No Venue
Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui
-
TIP-I2V: A Million-scale Real Text And Image Prompt Dataset For Image-to-video Generation
(2024)
• No Venue
Wenhao Wang, Yi Yang
-
Omnieval: An Omnidirectional And Automatic RAG Evaluation Benchmark In Financial Domain
(2024)
• No Venue
Wang et al.
-
Sibyl: Simple Yet Effective Agent Framework For Complex Real-world Reasoning
(2024)
• No Venue
Wang et al.
-
JPEG-LM: Llms As Image Generators With Canonical Codec Representations
(2024)
• No Venue
Han et al.
-
Nestools: A Dataset For Evaluating Nested Tool Learning Abilities Of Large Language Models
(2024)
• No Venue
Han et al.
-
Wildguard: Open One-stop Moderation Tools For Safety Risks, Jailbreaks, And Refusals Of Llms
(2024)
• No Venue
Han et al.
-
Adapting Llms To Hebrew: Unveiling Dictalm 2.0 With Enhanced Vocabulary And Instruction Capabilities
(2024)
• No Venue
Shmidman et al.
-
Bootstrapping Llm-based Task-oriented Dialogue Agents Via Self-talk
(2024)
• No Venue
Ulmer et al.
-
Towards Conversational Diagnostic AI
(2024)
• No Venue
Tu et al.
-
No "zero-shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
(2024)
• No Venue
Udandarao et al.
-
Spotting Llms With Binoculars: Zero-shot Detection Of Machine-generated Text
(2024)
• No Venue
Hans et al.
-
Multimodal Needle In A Haystack: Benchmarking Long-context Capability Of Multimodal Large Language Models
(2024)
• No Venue
Wang et al.
-
Extranerf: Visibility-aware View Extrapolation Of Neural Radiance Fields With Diffusion Models
(2024)
• No Venue
Shih et al.
-
Needle In A Multimodal Haystack
(2024)
• No Venue
Wang et al.
-
Discovering The Gems In Early Layers: Accelerating Long-context Llms With 1000x Input Token Reduction
(2024)
• No Venue
Shi et al.
-
Chartmimic: Evaluating Lmm's Cross-modal Reasoning Capability Via Chart-to-code Generation
(2024)
• No Venue
Shi et al.
-
Lumos : Empowering Multimodal Llms With Scene Text Recognition
(2024)
• No Venue
Shenoy et al.
-
Lora.rar: Learning To Merge Loras Via Hypernetworks For Subject-style Conditioned Image Generation
(2024)
• No Venue
Shenaj et al.
-
PIN: A Knowledge-intensive Dataset For Paired And Interleaved Multimodal Documents
(2024)
• No Venue
Wang et al.
-
Positionid: Llms Can Control Lengths, Copy And Paste With Explicit Positional Awareness
(2024)
• No Venue
Wang et al.
-
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations For Vision Foundation Models
(2024)
• No Venue
Hengyi Wang, Shiwei Tan, Hao Wang
-
Resonance Rope: Improving Context Length Generalization Of Large Language Models
(2024)
• No Venue
Wang et al.
-
Mambavision: A Hybrid Mamba-transformer Vision Backbone
(2024)
• No Venue
Ali Hatamizadeh, Jan Kautz
-
Glore: When, Where, And How To Improve LLM Reasoning Via Global And Local Refinements
(2024)
• No Venue
Havrilla et al.
-
Mmworld: Towards Multi-discipline Multi-faceted World Model Evaluation In Videos
(2024)
• No Venue
He et al.
-
DICE: Discrete Inversion Enabling Controllable Editing For Multinomial Diffusion And Masked Generative Models
(2024)
• No Venue
He et al.
-
Phased Consistency Model
(2024)
• No Venue
Wang et al.
-
Webvoyager: Building An End-to-end Web Agent With Large Multimodal Models
(2024)
• No Venue
He et al.
-
Venhancer: Generative Space-time Enhancement For Video Generation
(2024)
• No Venue
He et al.
-
MLP-KAN: Unifying Deep Representation And Function Learning
(2024)
• No Venue
He et al.
-
Dresscode: Autoregressively Sewing And Generating Garments From Text Guidance
(2024)
• No Venue
He et al.
-
Cameractrl: Enabling Camera Control For Text-to-video Generation
(2024)
• No Venue
He et al.
-
Chinese Simpleqa: A Chinese Factuality Evaluation For Large Language Models
(2024)
• No Venue
He et al.
-
Coffee-gym: An Environment For Evaluating And Improving Natural Language Feedback On Erroneous Code
(2024)
• No Venue
Chae et al.
-
Mceval: Massively Multilingual Code Evaluation
(2024)
• No Venue
Chai et al.
-
Improve Vision Language Model Chain-of-thought Reasoning
(2024)
• No Venue
Zhang et al.
-
How Far Are We From Intelligent Visual Deductive Reasoning?
(2024)
• No Venue
Zhang et al.
-
Ferret-ui: Grounded Mobile UI Understanding With Multimodal Llms
(2024)
• No Venue
You et al.
-
GPT Or BERT: Why Not Both?
(2024)
• No Venue
Lucas Georges Gabriel Charpentier, David Samuel
-
Getting It Right: Improving Spatial Consistency In Text-to-image Models
(2024)
• No Venue
Chatterjee et al.
-
Humaneval-v: Benchmarking High-level Visual Reasoning With Complex Diagrams In Coding Tasks
(2024)
• No Venue
Zhang et al.
-
Still-moving: Customized Video Generation Without Customized Video Data
(2024)
• No Venue
Chefer et al.
-
Veagle: Advancements In Multimodal Representation Learning
(2024)
• No Venue
Chawla et al.
-
Chexagent: Towards A Foundation Model For Chest X-ray Interpretation
(2024)
• No Venue
Chen et al.
-
Agent-flan: Designing Data And Methods Of Effective Agent Tuning For Large Language Models
(2024)
• No Venue
Chen et al.
-
Auto Cherry-picker: Learning From High-quality Generative Data Driven By Language
(2024)
• No Venue
Chen et al.
-
An Image Is Worth 1/2 Tokens After Layer 2: Plug-and-play Inference Acceleration For Large Vision-language Models
(2024)
• No Venue
Chen et al.
-
Dolphin: Long Context As A New Modality For Energy-efficient On-device Language Models
(2024)
• No Venue
Chen et al.
-
Designing A Dashboard For Transparency And Control Of Conversational AI
(2024)
• No Venue
Chen et al.
-
Cod, Towards An Interpretable Medical Agent Using Chain Of Diagnosis
(2024)
• No Venue
Chen et al.
-
Charxiv: Charting Gaps In Realistic Chart Understanding In Multimodal Llms
(2024)
• No Venue
Wang et al.
-
Mm-ego: Towards Building Egocentric Multimodal Llms
(2024)
• No Venue
Ye et al.
-
MMAU: A Holistic Benchmark Of Agent Capabilities Across Diverse Domains
(2024)
• No Venue
Yin et al.
-
Sharegpt4video: Improving Video Understanding And Generation With Better Captions
(2024)
• No Venue
Chen et al.
-
Interleaved Scene Graph For Interleaved Text-and-image Generation Assessment
(2024)
• No Venue
Chen et al.
-
Grounded 3D-LLM With Referent Tokens
(2024)
• No Venue
Chen et al.
-
Gmai-mmbench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
(2024)
• No Venue
Chen et al.
-
How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites
(2024)
• No Venue
Chen et al.
-
Mega-bench: Scaling Multimodal Evaluation To Over 500 Real-world Tasks
(2024)
• No Venue
Chen et al.
-
Livemind: Low-latency Large Language Models With Simultaneous Inference
(2024)
• No Venue
Chen et al.
-
Motionllm: Understanding Human Behaviors From Human Motions And Videos
(2024)
• No Venue
Chen et al.
-
Mj-bench: Is Your Multimodal Reward Model Really A Good Judge For Text-to-image Generation?
(2024)
• No Venue
Chen et al.
-
Multi-object Hallucination In Vision-language Models
(2024)
• No Venue
Chen et al.
-
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
(2024)
• No Venue
Chen et al.
-
Orion-14b: Open-source Multilingual Large Language Models
(2024)
• No Venue
Chen et al.
-
ODIN: Disentangled Reward Mitigates Hacking In RLHF
(2024)
• No Venue
Chen et al.
-
Omnicreator: Self-supervised Unified Generation With Universal Editing
(2024)
• No Venue
Chen et al.
-
Panda-70m: Captioning 70M Videos With Multiple Cross-modality Teachers
(2024)
• No Venue
Chen et al.
-
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/multi-modal Jailbreak Attacks?
(2024)
• No Venue
Chen et al.
-
Premise Order Matters In Reasoning With Large Language Models
(2024)
• No Venue
Chen et al.
-
Self-play Fine-tuning Converts Weak Language Models To Strong Language Models
(2024)
• No Venue
Chen et al.
-
SAR3D: Autoregressive 3D Object Generation And Understanding Via Multi-scale 3D VQVAE
(2024)
• No Venue
Chen et al.
-
Retrospective Learning From Interactions
(2024)
• No Venue
Chen et al.
-
Scienceagentbench: Toward Rigorous Assessment Of Language Agents For Data-driven Scientific Discovery
(2024)
• No Venue
Chen et al.
-
Large Language Model-brained GUI Agents: A Survey
(2024)
• No Venue
Zhang et al.
-
Videocrafter2: Overcoming Data Limitations For High-quality Video Diffusion Models
(2024)
• No Venue
Chen et al.
-
Unified Hallucination Detection For Multimodal Large Language Models
(2024)
• Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
• 19 citations
Chen et al.
-
VALL-E 2: Neural Codec Language Models Are Human Parity Zero-shot Text To Speech Synthesizers
(2024)
• No Venue
Chen et al.
-
Narrowing The Knowledge Evaluation Gap: Open-domain Question Answering With Multi-granularity Answers
(2024)
• No Venue
Gal Yona, Roee Aharoni, Mor Geva
-
Lexc-gen: Generating Data For Extremely Low-resource Languages With Large Language Models And Bilingual Lexicons
(2024)
• No Venue
Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
-
IOPO: Empowering Llms With Complex Instruction Following Via Input-output Preference Optimization
(2024)
• No Venue
Zhang et al.
-
K-level Reasoning With Large Language Models
(2024)
• No Venue
Zhang et al.
-
Long Context Transfer From Language To Vision
(2024)
• No Venue
Zhang et al.
-
Direct Preference Optimization Using Sparse Feature-level Constraints
(2024)
• No Venue
Yin et al.
-
SPAR: Personalized Content-based Recommendation Via Long Engagement Attention
(2024)
• No Venue
Zhang et al.
-
The Vision Of Autonomic Computing: Can Llms Make It A Reality?
(2024)
• No Venue
Zhang et al.
-
Balancing Pipeline Parallelism With Vocabulary Parallelism
(2024)
• No Venue
Yeung et al.
-
Cogview3: Finer And Faster Text-to-image Generation Via Relay Diffusion
(2024)
• No Venue
Zheng et al.
-
Video-panda: Parameter-efficient Alignment For Encoder-free Video-language Models
(2024)
• No Venue
Yi et al.
-
Video Instruction Tuning With Synthetic Data
(2024)
• No Venue
Zhang et al.
-
Wukong: Towards A Scaling Law For Large-scale Recommendation
(2024)
• No Venue
Zhang et al.
-
PAS: Data-efficient Plug-and-play Prompt Augmentation System
(2024)
• No Venue
Zheng et al.
-
Planetarium: A Rigorous Benchmark For Translating Text To Structured Planning Languages
(2024)
• No Venue
Zuo et al.
-
MIRAI: Evaluating LLM Agents For Event Forecasting
(2024)
• No Venue
Ye et al.
-
Openstory++: A Large-scale Dataset And Benchmark For Instance-aware Open-domain Visual Storytelling
(2024)
• No Venue
Ye et al.
-
LOKI: A Comprehensive Synthetic Data Detection Benchmark Using Large Multimodal Models
(2024)
• No Venue
Ye et al.
-
Mplug-owl3: Towards Long Image-sequence Understanding In Multi-modal Large Language Models
(2024)
• No Venue
Ye et al.
-
Processbench: Identifying Process Errors In Mathematical Reasoning
(2024)
• No Venue
Zheng et al.
-
NATURAL PLAN: Benchmarking Llms On Natural Language Planning
(2024)
• No Venue
Zheng et al.
-
Gpt-4v(ision) Is A Generalist Web Agent, If Grounded
(2024)
• No Venue
Zheng et al.
-
Improving Visual Commonsense In Language Models Via Multiple Image Generation
(2024)
• No Venue
Yariv et al.
-
Opencodeinterpreter: Integrating Code Generation With Execution And Refinement
(2024)
• No Venue
Zheng et al.
-
Attention Heads Of Large Language Models: A Survey
(2024)
• No Venue
Zheng et al.
-
Llama Beyond English: An Empirical Study On Language Capability Transfer
(2024)
• No Venue
Zhao et al.
-
Minicpm-v: A GPT-4V Level MLLM On Your Phone
(2024)
• No Venue
Yao et al.
-
Mixllm: LLM Quantization With Global Mixed-precision Between Output-features And Highly-efficient System Design
(2024)
• No Venue
Zhen Zheng, Xiaonan Song, Chuanjie Liu
-
Humaneval Pro And MBPP Pro: Evaluating Large Language Models On Self-invoking Code Generation
(2024)
• No Venue
Yu et al.
-
Steering Knowledge Selection Behaviours In Llms Via Sae-based Representation Engineering
(2024)
• No Venue
Zhao et al.
-
CORAL: Benchmarking Multi-turn Conversational Retrieval-augmentation Generation
(2024)
• No Venue
Cheng et al.
-
Bigcodebench: Benchmarking Code Generation With Diverse Function Calls And Complex Instructions
(2024)
• No Venue
Zhuo et al.
-
Ultraedit: Instruction-based Fine-grained Image Editing At Scale
(2024)
• No Venue
Zhao et al.
-
Longagent: Scaling Language Models To 128k Context Through Multi-agent Collaboration
(2024)
• No Venue
Zhao et al.
-
Compressed Chain Of Thought: Efficient Reasoning Through Dense Representations
(2024)
• No Venue
Jeffrey Cheng, Benjamin van Durme
-
Lora Land: 310 Fine-tuned Llms That Rival GPT-4, A Technical Report
(2024)
• No Venue
Zhao et al.
-
Prosa: Assessing And Understanding The Prompt Sensitivity Of Llms
(2024)
• No Venue
Zhuo et al.
-
Autodetect: Towards A Unified Framework For Automated Weakness Detection In Large Language Models
(2024)
• No Venue
Cheng et al.
-
Towards Achieving Human Parity On End-to-end Simultaneous Speech Translation Via LLM Agent
(2024)
• No Venue
Cheng et al.
-
On Domain-specific Post-training For Multimodal Large Language Models
(2024)
• No Venue
Cheng et al.
-
Videgothink: Assessing Egocentric Video Understanding Capabilities For Embodied AI
(2024)
• No Venue
Cheng et al.
-
Videollama 2: Advancing Spatial-temporal Modeling And Audio Understanding In Video-llms
(2024)
• No Venue
Cheng et al.
-
Benchmarking Multi-image Understanding In Vision And Language Models: Perception, Knowledge, Reasoning, And Multi-hop Reasoning
(2024)
• No Venue
Zhao et al.
-
Masked Audio Generation Using A Single Non-autoregressive Transformer
(2024)
• No Venue
Ziv et al.
-
Fully Open Source Moxin-7b Technical Report
(2024)
• No Venue
Zhao et al.
-
Reflections From The 2024 Large Language Model (LLM) Hackathon For Applications In Materials Science And Chemistry
(2024)
• No Venue
Zimmermann et al.
-
Mg-llava: Towards Multi-granularity Visual Instruction Tuning
(2024)
• No Venue
Zhao et al.
-
Dynamath: A Dynamic Visual Benchmark For Evaluating Mathematical Reasoning Robustness Of Vision Language Models
(2024)
• No Venue
Zou et al.
-
Bento: Benchmark Task Reduction With In-context Transferability
(2024)
• No Venue
Zhao et al.
-
Apollo: An Exploration Of Video Understanding In Large Multimodal Models
(2024)
• No Venue
Zohar et al.
-
The Browsergym Ecosystem For Web Agent Research
(2024)
• No Venue
Chezelles et al.
-
U-MATH: A University-level Benchmark For Evaluating Mathematical Skills In Llms
(2024)
• No Venue
Chernyshev et al.
-
Chatbot Arena: An Open Platform For Evaluating Llms By Human Preference
(2024)
• No Venue
Chiang et al.
-
Beyond Fine-tuning: Unleashing The Potential Of Continuous Pretraining For Clinical Llms
(2024)
• No Venue
Christophe et al.
-
M3docrag: Multi-modal Retrieval Is What You Need For Multi-page Multi-document Understanding
(2024)
• No Venue
Cho et al.
-
Mixture-of-agents Enhances Large Language Model Capabilities
(2024)
• No Venue
Wang et al.
-
Camel-bench: A Comprehensive Arabic LMM Benchmark
(2024)
• No Venue
Ghaboura et al.
-
To Cot Or Not To Cot? Chain-of-thought Helps Mainly On Math And Symbolic Reasoning
(2024)
• No Venue
Sprague et al.
-
GAMA: A Large Audio-language Model With Advanced Audio Understanding And Complex Reasoning Abilities
(2024)
• No Venue
Ghosh et al.
-
Justrank: Benchmarking LLM Judges For System Ranking
(2024)
• No Venue
Gera et al.
-
Rulearena: A Benchmark For Rule-guided Reasoning With Llms In Real-world Scenarios
(2024)
• No Venue
Zhou et al.
-
Are We Done With MMLU?
(2024)
• No Venue
Gema et al.
-
LLM Pruning And Distillation In Practice: The Minitron Approach
(2024)
• No Venue
Sreenivas et al.
-
Jina-embeddings-v3: Multilingual Embeddings With Task Lora
(2024)
• No Venue
Sturua et al.
-
Conflictbank: A Benchmark For Evaluating The Influence Of Knowledge Conflicts In LLM
(2024)
• No Venue
Su et al.
-
Visual Fact Checker: Enabling High-fidelity Detailed Caption Generation
(2024)
• No Venue
Ge et al.
-
Divot: Diffusion Powers Video Tokenizer For Comprehension And Generation
(2024)
• No Venue
Ge et al.
-
BEHAVIOR Vision Suite: Customizable Dataset Generation Via Simulation
(2024)
• No Venue
Ge et al.
-
Discrete Flow Matching
(2024)
• No Venue
Gat et al.
-
Kvasir-vqa: A Text-image Pair GI Tract Dataset
(2024)
• No Venue
Gautam et al.
-
Longins: A Challenging Long-context Instruction-based Exam For Llms
(2024)
• No Venue
Gavin et al.
-
BRIGHT: A Realistic And Challenging Benchmark For Reasoning-intensive Retrieval
(2024)
• No Venue
Su et al.
-
Replacing Judges With Juries: Evaluating LLM Generations With A Panel Of Diverse Models
(2024)
• No Venue
Verga et al.
-
Large Scale Transfer Learning For Tabular Data Via Language Modeling
(2024)
• No Venue
Josh Gardner, Juan C. Perdomo, Ludwig Schmidt
-
Mobillama: Towards Accurate And Lightweight Fully Transparent GPT
(2024)
• No Venue
Thawakar et al.
-
Spreadsheetllm: Encoding Spreadsheets For Large Language Models
(2024)
• No Venue
Tian et al.
-
Μlo: Compute-efficient Meta-generalization Of Learned Optimizers
(2024)
• No Venue
Thérien et al.
-
Introducing V0.5 Of The AI Safety Benchmark From Mlcommons
(2024)
• No Venue
Vidgen et al.
-
Amuro & Char: Analyzing The Relationship Between Pre-training And Fine-tuning Of Large Language Models
(2024)
• No Venue
Kaiser Sun, Mark Dredze
-
Shadowkv: KV Cache In Shadows For High-throughput Long-context LLM Inference
(2024)
• No Venue
Sun et al.
-
Trustllm: Trustworthiness In Large Language Models
(2024)
• No Venue
Sun et al.
-
Leave No Document Behind: Benchmarking Long-context Llms With Extended Multi-doc QA
(2024)
• No Venue
Wang et al.
-
Parrot: Multilingual Visual Instruction Tuning
(2024)
• No Venue
Sun et al.
-
Omni-math: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
(2024)
• No Venue
Gao et al.
-
Judging The Judges: Evaluating Alignment And Vulnerabilities In Llms-as-judges
(2024)
• No Venue
Thakur et al.
-
Scicode: A Research Coding Benchmark Curated By Scientists
(2024)
• No Venue
Tian et al.
-
Hermes 3 Technical Report
(2024)
• No Venue
Ryan Teknium, Jeffrey Quesnelle, Chen Guang
-
Cambrian-1: A Fully Open, Vision-centric Exploration Of Multimodal Llms
(2024)
• No Venue
Tong et al.
-
Erasing Conceptual Knowledge From Language Models
(2024)
• No Venue
Gandikota et al.
-
Moscar: A Large-scale Multilingual And Multimodal Document-level Corpus
(2024)
• No Venue
Futeral et al.
-
Language Models Scale Reliably With Over-training And On Downstream Tasks
(2024)
• No Venue
Gadre et al.
-
T2v-compbench: A Comprehensive Benchmark For Compositional Text-to-video Generation
(2024)
• No Venue
Sun et al.
-
LAMBDA: A Large Model Based Data Agent
(2024)
• No Venue
Sun et al.
-
EVA-CLIP-18B: Scaling CLIP To 18 Billion Parameters
(2024)
• No Venue
Sun et al.
-
3D Question Answering For City Scene Understanding
(2024)
• No Venue
Sun et al.
-
Msi-agent: Incorporating Multi-scale Insight Into Embodied Agents For Superior Planning And Decision-making
(2024)
• No Venue
Fu et al.
-
Bitsfusion: 1.99 Bits Weight Quantization Of Diffusion Model
(2024)
• No Venue
Sui et al.
-
Hunyuan-large: An Open-source Moe Model With 52 Billion Activated Parameters By Tencent
(2024)
• No Venue
Sun et al.
-
Video-mme: The First-ever Comprehensive Evaluation Benchmark Of Multi-modal Llms In Video Analysis
(2024)
• No Venue
Fu et al.
-
TLDR: Token-level Detective Reward Model For Large Vision Language Models
(2024)
• No Venue
Fu et al.
-
Autorag-hp: Automatic Online Hyper-parameter Tuning For Retrieval-augmented Generation
(2024)
• No Venue
Fu et al.
-
Mme-survey: A Comprehensive Survey On Evaluation Of Multimodal Llms
(2024)
• No Venue
Fu et al.
-
BLINK: Multimodal Large Language Models Can See But Not Perceive
(2024)
• No Venue
Fu et al.
-
Data Engineering For Scaling Language Models To 128K Context
(2024)
• No Venue
Fu et al.
-
RAG Foundry: A Framework For Enhancing Llms For Retrieval Augmented Generation
(2024)
• No Venue
Fleischer et al.
-
Multimodal Autoregressive Pre-training Of Large Vision Encoders
(2024)
• No Venue
Fini et al.
-
Styleremix: Interpretable Authorship Obfuscation Via Distillation And Perturbation Of Style Elements
(2024)
• No Venue
Fisher et al.
-
LAVE: Llm-powered Agent Assistance And Language Augmentation For Video Editing
(2024)
• No Venue
Wang et al.
-
Octo: An Open-source Generalist Robot Policy
(2024)
• No Venue
Team et al.
-
Towards A Personal Health Large Language Model
(2024)
• No Venue
Cosentino et al.
-
VLOGGER: Multimodal Diffusion For Embodied Avatar Synthesis
(2024)
• No Venue
Corona et al.
-
Sambalingo: Teaching Large Language Models New Languages
(2024)
• No Venue
Csaki et al.
-
Llmtimesmapreduce: Simplified Long-sequence Processing Using Large Language Models
(2024)
• No Venue
Zhou et al.
-
Coconut: Modernizing COCO Segmentation
(2024)
• No Venue
Deng et al.
-
Rethinking Optimization And Architecture For Tiny Language Models
(2024)
• No Venue
Tang et al.
-
Learnlm: Improving Gemini For Learning
(2024)
• No Venue
Team et al.
-
Be Yourself: Bounded Attention For Multi-subject Text-to-image Generation
(2024)
• No Venue
Dahary et al.
-
Chameleon: Mixed-modal Early-fusion Foundation Models
(2024)
• No Venue
Chameleon Team
-
RACER: Rich Language-guided Failure Recovery Policies For Imitation Learning
(2024)
• No Venue
Dai et al.
-
Youtube-sl-25: A Large-scale, Open-domain Multilingual Sign Language Parallel Corpus
(2024)
• No Venue
Garrett Tanzer, Biao Zhang
-
Livespeech: Low-latency Zero-shot Text-to-speech Via Autoregressive Modeling Of Audio Discrete Codes
(2024)
• No Venue
Dang et al.
-
Swiftbrush V2: Make Your One-step Diffusion Model Better Than Its Teacher
(2024)
• No Venue
Dao et al.
-
Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies
(2024)
• No Venue
Tao et al.
-
Causal Diffusion Transformers For Generative Modeling
(2024)
• No Venue
Deng et al.
-
Speechverse: A Large-scale Generalizable Audio Language Model
(2024)
• No Venue
Das et al.
-
Self-recognition In Language Models
(2024)
• No Venue
Davidson et al.
-
Benchx: A Unified Benchmark Framework For Medical Vision-language Pretraining On Chest X-rays
(2024)
• No Venue
Zhou et al.
-
Molmo And Pixmo: Open Weights And Open Data For State-of-the-art Multimodal Models
(2024)
• No Venue
Deitke et al.
-
GATE Opening: A Comprehensive Benchmark For Judging Open-ended Interleaved Image-text Generation
(2024)
• No Venue
Zhou et al.
-
L-citeeval: Do Long-context Models Truly Leverage Context For Responding?
(2024)
• No Venue
Tang et al.
-
Fancyvideo: Towards Dynamic And Consistent Video Generation Via Cross-frame Textual Guidance
(2024)
• No Venue
Feng et al.
-
Croissantllm: A Truly Bilingual French-english Language Model
(2024)
• No Venue
Faysse et al.
-
Diffusion-rwkv: Scaling Rwkv-like Architectures For Diffusion Models
(2024)
• No Venue
Fei et al.
-
FLUX That Plays Music
(2024)
• No Venue
Fei et al.
-
VILA^2: VILA Augmented VILA
(2024)
• No Venue
Fang et al.
-
GTA: A Benchmark For General Tool Agents
(2024)
• No Venue
Wang et al.
-
Grutopia: Dream General Robots In A City At Scale
(2024)
• No Venue
Wang et al.
-
Test Of Time: A Benchmark For Evaluating Llms On Temporal Reasoning
(2024)
• No Venue
Fatemi et al.
-
VIMI: Grounding Video Generation Through Multi-modal Instruction
(2024)
• No Venue
Fang et al.
-
Emu3: Next-token Prediction Is All You Need
(2024)
• No Venue
Wang et al.
-
CCI3.0-HQ: A Large-scale Chinese Dataset Of High Quality Designed For Pre-training Large Language Models
(2024)
• No Venue
Wang et al.
-
Divide And Conquer: Language Models Can Plan And Self-correct For Compositional Text-to-image Generation
(2024)
• No Venue
Wang et al.
-
Diasynth -- Synthetic Dialogue Generation Framework
(2024)
• No Venue
Suresh et al.
-
Vlmevalkit: An Open-source Toolkit For Evaluating Large Multi-modality Models
(2024)
• No Venue
Duan et al.
-
Adaptive Length Image Tokenization Via Recurrent Allocation
(2024)
• No Venue
Duggal et al.
-
Build-a-scene: Interactive 3D Layout Control For Diffusion-based Image Generation
(2024)
• No Venue
Abdelrahman Eldesokey, Peter Wonka
-
Diffusion Feedback Helps CLIP See Better
(2024)
• No Venue
Wang et al.
-
DPLM-2: A Multimodal Diffusion Protein Language Model
(2024)
• No Venue
Wang et al.
-
Mmbench-video: A Long-form Multi-shot Benchmark For Holistic Video Understanding
(2024)
• No Venue
Fang et al.
-
Lingen: Towards High-resolution Minute-length Text-to-video Generation With Linear Computational Complexity
(2024)
• No Venue
Wang et al.
-
Muirbench: A Comprehensive Benchmark For Robust Multi-image Understanding
(2024)
• No Venue
Wang et al.
-
Judgebench: A Benchmark For Evaluating Llm-based Judges
(2024)
• No Venue
Tan et al.
-
Multimodal Situational Safety
(2024)
• No Venue
Zhou et al.
-
Llavaolmobitnet1b: Ternary LLM Goes Multimodal!
(2024)
• No Venue
Jainaveen Sundaram, Ravishankar Iyer
-
Read Anywhere Pointed: Layout-aware GUI Screen Reading With Tree-of-lens Grounding
(2024)
• No Venue
Fan et al.
-
Instancecap: Improving Text-to-video Generation Via Instance-aware Structured Caption
(2024)
• No Venue
Fan et al.
-
Fluid: Scaling Autoregressive Text-to-image Generative Models With Continuous Tokens
(2024)
• No Venue
Fan et al.
-
GTR: Improving Large 3D Reconstruction Models Through Geometry And Texture Refinement
(2024)
• No Venue
Zhuang et al.
-
The Llama 3 Herd Of Models
(2024)
• No Venue
Dubey et al.
-
An Interactive Agent Foundation Model
(2024)
• No Venue
Durante et al.
-
Scaling Rectified Flow Transformers For High-resolution Image Synthesis
(2024)
• No Venue
Esser et al.
-
Scalable Pre-training Of Large Autoregressive Image Models
(2024)
• No Venue
El-Nouby et al.
-
Learning To Move Like Professional Counter-strike Players
(2024)
• No Venue
Durst et al.
-
Videogamebunny: Towards Vision Assistants For Video Games
(2024)
• No Venue
Mohammad Reza Taesiri, Cor-Paul Bezemer
-
Agent Workflow Memory
(2024)
• No Venue
Wang et al.
-
MUSCLE: A Model Update Strategy For Compatible LLM Evolution
(2024)
• No Venue
Echterhoff et al.
-
Adaptive Decoding Via Latent Preference Optimization
(2024)
• No Venue
Dhuliawala et al.
-
Mapeval: A Map-based Evaluation Of Geo-spatial Reasoning In Foundation Models
(2024)
• No Venue
Dihan et al.
-
Ominicontrol: Minimal And Universal Control For Diffusion Transformer
(2024)
• No Venue
Tan et al.
-
Tofueval: Evaluating Hallucinations Of Llms On Topic-focused Dialogue Summarization
(2024)
• No Venue
Tang et al.
-
A Unified View Of Delta Parameter Editing In Post-trained Large-scale Models
(2024)
• No Venue
Tang et al.
-
Symbolicai: A Framework For Logic-based Approaches Combining Generative Models And Solvers
(2024)
• No Venue
Dinu et al.
-
Openbezoar: Small, Cost-effective And Open Models Trained On Mixes Of Instruction Data
(2024)
• No Venue
Dissanayake et al.
-
Mimir: Improving Video Diffusion Models For Precise Text Understanding
(2024)
• No Venue
Tan et al.
-
Holodreamer: Holistic 3D Panoramic World Generation From Text Descriptions
(2024)
• No Venue
Zhou et al.
-
Vidgen-1m: A Large-scale Dataset For Text-to-video Generation
(2024)
• No Venue
Tan et al.
-
Docgraphlm: Documental Graph Language Model For Information Extraction
(2024)
• No Venue
Wang et al.
-
Evaluating Language Model Context Windows: A "working Memory" Test And Inference-time Correction
(2024)
• No Venue
Dsouza et al.
-
Stacking Your Transformers: A Closer Look At Model Growth For Efficient LLM Pre-training
(2024)
• No Venue
Du et al.
-
Textsquare: Scaling Up Text-centric Visual Instruction Tuning
(2024)
• No Venue
Tang et al.
-
Toward General Instruction-following Alignment For Retrieval-augmented Generation
(2024)
• No Venue
Dong et al.
-
Self-boosting Large Language Models With Synthetic Preference Data
(2024)
• No Venue
Dong et al.
-
Mathscale: Scaling Instruction Tuning For Mathematical Reasoning
(2024)
• No Venue
Tang et al.
-
LOGO -- Long Context Alignment Via Efficient Preference Optimization
(2024)
• No Venue
Tang et al.
-
Mvdiffusion++: A Dense High-resolution Multi-view Diffusion Model For Single Or Sparse-view 3D Object Reconstruction
(2024)
• No Venue
Tang et al.
-
CLEAR: Character Unlearning In Textual And Visual Modalities
(2024)
• No Venue
Dontsov et al.
-
Mathhay: An Automated Benchmark For Long-context Mathematical Reasoning In Llms
(2024)
• No Venue
Wang et al.
-
Mmlu-pro: A More Robust And Challenging Multi-task Language Understanding Benchmark
(2024)
• No Venue
Wang et al.
-
Foundational Autoraters: Taming Large Language Models For Better Automatic Evaluation
(2024)
• No Venue
Vu et al.
-
On The Limits Of Agency In Agent-based Models
(2024)
• No Venue
Chopra et al.
-
Echoprime: A Multi-video View-informed Vision-language Model For Comprehensive Echocardiography Interpretation
(2024)
• No Venue
Vukadinovic et al.
-
Agent-as-a-judge: Evaluate Agents With Agents
(2024)
• No Venue
Zhuge et al.
-
LLM-AD: Large Language Model Based Audio Description System
(2024)
• No Venue
Peng Chu, Jiang Wang, Andre Abrantes
-
Qwen2-audio Technical Report
(2024)
• No Venue
Chu et al.
-
Visionllama: A Unified Llama Interface For Vision Tasks
(2024)
• No Venue
Chu et al.
-
Switti: Designing Scale-wise Transformers For Text-to-image Synthesis
(2024)
• No Venue
Voronov et al.
-
Lookback Lens: Detecting And Mitigating Contextual Hallucinations In Large Language Models Using Only Attention Maps
(2024)
• No Venue
Chuang et al.
-
A Flexible Large Language Models Guardrail Development Methodology Applied To Off-topic Prompt Detection
(2024)
• No Venue
Gabriel Chua, Shing Yee Chan, Shaun Khoo
-
Contextual: Evaluating Context-sensitive Text-rich Visual Reasoning In Large Multimodal Models
(2024)
• No Venue
Wadhawan et al.
-
Musicrl: Aligning Music Generation To Human Preferences
(2024)
• No Venue
Cideron et al.
-
Meltemi: The First Open Large Language Model For Greek
(2024)
• No Venue
Voukoutis et al.
-
Tnt-llm: Text Mining At Scale With Large Language Models
(2024)
• No Venue
Wan et al.
-
Jacolbertv2.5: Optimising Multi-vector Retrievers To Create State-of-the-art Japanese Retrievers With Constrained Resources
(2024)
• No Venue
Benjamin Clavié
-
Mtu-bench: A Multi-granularity Tool-use Benchmark For Large Language Models
(2024)
• No Venue
Wang et al.
-
Toto: Time Series Optimized Transformer For Observability
(2024)
• No Venue
Cohen et al.
-
Magicvideo-v2: Multi-stage High-aesthetic Video Generation
(2024)
• No Venue
Wang et al.
-
Tinystories: How Small Can Language Models Be And Still Speak Coherent English?
(2023)
• No Venue
Ronen Eldan, Yuanzhi Li
-
Mitigating Fine-grained Hallucination By Fine-tuning Large Vision-language Models With Caption Rewrites
(2023)
• Lecture Notes in Computer Science
• 23 citations
Wang et al.
-
Link-context Learning For Multimodal Llms
(2023)
• No Venue
Tai et al.
-
Chain-of-verification Reduces Hallucination In Large Language Models
(2023)
• No Venue
Dhuliawala et al.
-
Mind Meets Machine: Unravelling Gpt-4's Cognitive Psychology
(2023)
• BenchCouncil Transactions on Benchmarks, Standards and Evaluations
• 31 citations
Dhingra et al.
-
Is Chatgpt A Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation
(2023)
• Arxiv
• 55 citations
Fang et al.
-
Qlora: Efficient Finetuning Of Quantized Llms
(2023)
• No Venue
Dettmers et al.
-
Is Chatgpt A Good NLG Evaluator? A Preliminary Study
(2023)
• Proceedings of the 4th New Frontiers in Summarization Workshop
• 196 citations
Wang et al.
-
JARVIS-1: Open-world Multi-task Agents With Memory-augmented Multimodal Language Models
(2023)
• No Venue
Wang et al.
-
Learning Vision From Models Rivals Learning Vision From Data
(2023)
• No Venue
Tian et al.
-
Hifi Tuner: High-fidelity Subject-driven Fine-tuning For Diffusion Models
(2023)
• No Venue
Wang et al.
-
Chatgpt For Robotics: Design Principles And Model Abilities
(2023)
• No Venue
Vemprala et al.
-
Exploring Large Language Models' Cognitive Moral Development Through Defining Issues Test
(2023)
• No Venue
Tanmay et al.
-
BTLM-3B-8K: 7B Parameter Performance In A 3B Parameter Model
(2023)
• No Venue
Dey et al.
-
Generative Agent-based Modeling With Actions Grounded In Physical, Social, Or Digital Space Using Concordia
(2023)
• No Venue
Vezhnevets et al.
-
Reward-augmented Decoding: Efficient Controlled Text Generation With A Unidirectional Reward Model
(2023)
• No Venue
Haikang Deng, Colin Raffel
-
Mind2web: Towards A Generalist Agent For The Web
(2023)
• No Venue
Deng et al.
-
Flacuna: Unleashing The Problem Solving Power Of Vicuna Using FLAN Fine-tuning
(2023)
• No Venue
Ghosal et al.
-
Gemini In Reasoning: Unveiling Commonsense In Multimodal Large Language Models
(2023)
• No Venue
Yuqing Wang, Yun Zhao
-
Element-aware Summarization With Large Language Models: Expert-aligned Evaluation And Chain-of-thought Method
(2023)
• Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
• 28 citations
Yiming Wang, Zhuosheng Zhang, Rui Wang
-
Retrieval-augmented Generation For Large Language Models: A Survey
(2023)
• Arxiv
• 13 citations
Gao et al.
-
G-llava: Solving Geometric Problem With Multi-modal Large Language Model
(2023)
• No Venue
Gao et al.
-
Medalign: A Clinician-generated Dataset For Instruction Following With Electronic Medical Records
(2023)
• No Venue
Fleming et al.
-
3D-GPT: Procedural 3D Modeling With Large Language Models
(2023)
• No Venue
Sun et al.
-
Gemini: A Family Of Highly Capable Multimodal Models
(2023)
• No Venue
Team et al.
-
Is Chatgpt Good At Search? Investigating Large Language Models As Re-ranking Agents
(2023)
• Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
• 151 citations
Sun et al.
-
Safe RLHF: Safe Reinforcement Learning From Human Feedback
(2023)
• No Venue
Dai et al.
-
Benchmarking Neural Network Training Algorithms
(2023)
• No Venue
Dahl et al.
-
Aligning Large Multimodal Models With Factually Augmented RLHF
(2023)
• No Venue
Sun et al.
-
MME: A Comprehensive Evaluation Benchmark For Multimodal Large Language Models
(2023)
• Arxiv
• 87 citations
Fu et al.
-
3D-LFM: Lifting Foundation Model
(2023)
• No Venue
Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
-
Sparse Autoencoders Find Highly Interpretable Features In Language Models
(2023)
• No Venue
Cunningham et al.
-
Analyzing And Mitigating Object Hallucination In Large Vision-language Models
(2023)
• Arxiv
• 29 citations
Zhou et al.
-
A Challenger To GPT-4V? Early Explorations Of Gemini In Visual Expertise
(2023)
• No Venue
Fu et al.
-
Large Language Models For Compiler Optimization
(2023)
• No Venue
Cummins et al.
-
Recmind: Large Language Model Powered Agent For Recommendation
(2023)
• Findings of the Association for Computational Linguistics: NAACL 2024
• 46 citations
Wang et al.
-
Approximating Two-layer Feedforward Networks For Efficient Transformers
(2023)
• No Venue
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
-
Rethinking The Evaluation For Conversational Recommendation In The Era Of Large Language Models
(2023)
• Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
• 47 citations
Wang et al.
-
What's In My Big Data?
(2023)
• No Venue
Elazar et al.
-
Adaptive Shells For Efficient Neural Radiance Field Rendering
(2023)
• No Venue
Wang et al.
-
Evaluating The Ripple Effects Of Knowledge Editing In Language Models
(2023)
• No Venue
Cohen et al.
-
Shepherd: A Critic For Language Model Generation
(2023)
• No Venue
Wang et al.
-
Lp-musiccaps: Llm-based Pseudo Music Captioning
(2023)
• No Venue
Doh et al.
-
Chessgpt: Bridging Policy Learning And Language Modeling
(2023)
• No Venue
Feng et al.
-
Challenges With Unsupervised LLM Knowledge Discovery
(2023)
• No Venue
Farquhar et al.
-
Large Language Model For Science: A Study On P Vs. NP
(2023)
• No Venue
Dong et al.
-
Hierarchical Masked 3D Diffusion Model For Video Outpainting
(2023)
• No Venue
Fan et al.
-
Biocoder: A Benchmark For Bioinformatics Code Generation With Contextual Pragmatic Knowledge
(2023)
• No Venue
Tang et al.
-
Simple And Controllable Music Generation
(2023)
• No Venue
Copet et al.
-
Diffusion Model Alignment Using Direct Preference Optimization
(2023)
• No Venue
Wallace et al.
-
Decodingtrust: A Comprehensive Assessment Of Trustworthiness In GPT Models
(2023)
• No Venue
Wang et al.
-
TPTU: Task Planning And Tool Usage Of Large Language Model-based AI Agents
(2023)
• No Venue
Ruan et al.
-
Pdftriage: Question Answering Over Long, Structured Documents
(2023)
• No Venue
Saad-Falcon et al.
-
Mindagent: Emergent Gaming Interaction
(2023)
• No Venue
Gong et al.
-
An Image Is Worth Multiple Words: Learning Object Level Concepts Using Multi-concept Prompt Learning
(2023)
• No Venue
Jin et al.
-
The Science Of Detecting Llm-generated Texts
(2023)
• Communications of the ACM
• 90 citations
Ruixiang Tang, Yu-Neng Chuang, Xia Hu
-
Llm4vis: Explainable Visualization Recommendation Using Chatgpt
(2023)
• Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
• 23 citations
Wang et al.
-
Struc-bench: Are Large Language Models Really Good At Generating Complex Structured Data?
(2023)
• No Venue
Tang et al.
-
Mamba: Linear-time Sequence Modeling With Selective State Spaces
(2023)
• No Venue
Albert Gu, Tri Dao
-
Emu Video: Factorizing Text-to-video Generation By Explicit Image Conditioning
(2023)
• No Venue
Girdhar et al.
-
Webarena: A Realistic Web Environment For Building Autonomous Agents
(2023)
• No Venue
Zhou et al.
-
Freshllms: Refreshing Large Language Models With Search Engine Augmentation
(2023)
• No Venue
Vu et al.
-
Does GPT-4 Pass The Turing Test?
(2023)
• No Venue
Cameron Jones, Benjamin Bergen
-
Codegeex: A Pre-trained Model For Code Generation With Multilingual Benchmarking On Humaneval-x
(2023)
• Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
• 183 citations
Zheng et al.
-
Chatanything: Facetime Chat With Llm-enhanced Personas
(2023)
• No Venue
Zhao et al.
-
Starvector: Generating Scalable Vector Graphics Code From Images
(2023)
• No Venue
Rodriguez et al.
-
ARB: Advanced Reasoning Benchmark For Large Language Models
(2023)
• No Venue
Sawada et al.
-
Audiopalm: A Large Language Model That Can Speak And Listen
(2023)
• No Venue
Rubenstein et al.
-
Dreamdistribution: Prompt Distribution Learning For Text-to-image Diffusion Models
(2023)
• No Venue
Zhao et al.
-
Visual Chain Of Thought: Bridging Logical Gaps With Multimodal Infillings
(2023)
• Arxiv
• 10 citations
Rose et al.
-
Commoncanvas: An Open Diffusion Model Trained With Creative-commons Images
(2023)
• No Venue
Gokaslan et al.
-
Interfacing Foundation Models' Embeddings
(2023)
• No Venue
Zou et al.
-
Motiongpt: Human Motion As A Foreign Language
(2023)
• No Venue
Jiang et al.
-
Randomized Positional Encodings Boost Length Generalization Of Transformers
(2023)
• Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
• 14 citations
Ruoss et al.
-
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-verification
(2023)
• No Venue
Zhou et al.
-
Uncommonsense Reasoning: Abductive Reasoning About Uncommon Situations
(2023)
• No Venue
Zhao et al.
-
How Far Are Large Language Models From Agents With Theory-of-mind?
(2023)
• No Venue
Zhou et al.
-
Emernerf: Emergent Spatial-temporal Scene Decomposition Via Self-supervision
(2023)
• No Venue
Yang et al.
-
FABRIC: Personalizing Diffusion Models With Iterative Feedback
(2023)
• No Venue
Rütte et al.
-
Holodeck: Language Guided Generation Of 3D Embodied AI Environments
(2023)
• No Venue
Yang et al.
-
Alignment For Honesty
(2023)
• No Venue
Yang et al.
-
Powerinfer: Fast Large Language Model Serving With A Consumer-grade GPU
(2023)
• No Venue
Song et al.
-
Distilling Large Language Models For Biomedical Knowledge Extraction: A Case Study On Adverse Drug Events
(2023)
• No Venue
Gu et al.
-
Effective Long-context Scaling Of Foundation Models
(2023)
• No Venue
Xiong et al.
-
BOOT: Data-free Distillation Of Denoising Diffusion Models With Bootstrapping
(2023)
• No Venue
Gu et al.
-
Retroformer: Retrospective Large Language Agents With Policy Gradient Optimization
(2023)
• No Venue
Yao et al.
-
Faithful Persona-based Conversational Dataset Generation With Large Language Models
(2023)
• No Venue
Jandaghi et al.
-
Prometheus: Inducing Fine-grained Evaluation Capability In Language Models
(2023)
• No Venue
Kim et al.
-
Tree Of Thoughts: Deliberate Problem Solving With Large Language Models
(2023)
• No Venue
Yao et al.
-
To See Is To Believe: Prompting GPT-4V For Better Visual Instruction Tuning
(2023)
• No Venue
Wang et al.
-
Rethinking FID: Towards A Better Evaluation Metric For Image Generation
(2023)
• No Venue
Jayasumana et al.
-
Sparq Attention: Bandwidth-efficient LLM Inference
(2023)
• No Venue
Ribar et al.
-
Lmsys-chat-1m: A Large-scale Real-world LLM Conversation Dataset
(2023)
• No Venue
Zheng et al.
-
PEEKABOO: Interactive Video Generation Via Masked-diffusion
(2023)
• No Venue
Jain et al.
-
Vcoder: Versatile Vision Encoders For Multimodal Large Language Models
(2023)
• No Venue
Jitesh Jain, Jianwei Yang, Humphrey Shi
-
Diverse And Aligned Audio-to-video Generation Via Text-to-video Model Adaptation
(2023)
• No Venue
Yariv et al.
-
FLASK: Fine-grained Language Model Evaluation Based On Alignment Skill Sets
(2023)
• No Venue
Ye et al.
-
Judging Llm-as-a-judge With Mt-bench And Chatbot Arena
(2023)
• No Venue
Zheng et al.
-
Mplug-docowl: Modularized Multimodal Large Language Model For Document Understanding
(2023)
• No Venue
Ye et al.
-
Large Language Models Are Versatile Decomposers: Decompose Evidence And Questions For Table-based Reasoning
(2023)
• SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
• 53 citations
Ye et al.
-
TEAL: Tokenize And Embed ALL For Multi-modal Large Language Models
(2023)
• No Venue
Yang et al.
-
Bring Your Own Data! Self-supervised Evaluation For Large Language Models
(2023)
• No Venue
Jain et al.
-
Tiny Lvlm-ehub: Early Multimodal Experiments With Bard
(2023)
• No Venue
Shao et al.
-
Selfeval: Leveraging The Discriminative Nature Of Generative Models For Evaluation
(2023)
• No Venue
Sai Saketh Rambhatla, Ishan Misra
-
Detecting And Preventing Hallucinations In Large Vision Language Models
(2023)
• Proceedings of the AAAI Conference on Artificial Intelligence
• 99 citations
Anisha Gunjal, Jihan Yin, Erhan Bas
-
Animatediff: Animate Your Personalized Text-to-image Diffusion Models Without Specific Tuning
(2023)
• No Venue
Guo et al.
-
Connecting Large Language Models With Evolutionary Algorithms Yields Powerful Prompt Optimizers
(2023)
• No Venue
Guo et al.
-
Camels In A Changing Climate: Enhancing LM Adaptation With Tulu 2
(2023)
• No Venue
Ivison et al.
-
PPTC Benchmark: Evaluating Large Language Models For Powerpoint Task Completion
(2023)
• No Venue
Guo et al.
-
Gpt-fathom: Benchmarking Large Language Models To Decipher The Evolutionary Path Towards GPT-4 And Beyond
(2023)
• No Venue
Zheng et al.
-
Why Does Chatgpt Fall Short In Providing Truthful Answers?
(2023)
• Arxiv
• 35 citations
Shen Zheng, Jie Huang, Kevin Chen-Chuan Chang
-
Stay On Topic With Classifier-free Guidance
(2023)
• No Venue
Sanchez et al.
-
Glamm: Pixel Grounding Large Multimodal Model
(2023)
• No Venue
Rasheed et al.
-
Is Chatgpt A Good Translator? Yes With GPT-4 As The Engine
(2023)
• Arxiv
• 310 citations
Jiao et al.
-
Text-to-sticker: Style Tailoring Latent Diffusion Models For Human Expression
(2023)
• No Venue
Sinha et al.
-
GPQA: A Graduate-level Google-proof Q&A Benchmark
(2023)
• No Venue
Rein et al.
-
SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling
(2023)
• No Venue
Kim et al.
-
Deepspeed Ulysses: System Optimizations For Enabling Training Of Extreme Long Sequence Transformer Models
(2023)
• No Venue
Jacobs et al.
-
Universalner: Targeted Distillation From Large Language Models For Open Named Entity Recognition
(2023)
• No Venue
Zhou et al.
-
A Survey On Multimodal Large Language Models
(2023)
• National Science Review
• 340 citations
Yin et al.
-
Woodpecker: Hallucination Correction For Multimodal Large Language Models
(2023)
• No Venue
Yin et al.
-
Magicapture: High-resolution Multi-concept Portrait Customization
(2023)
• No Venue
Junha Hyung, Jaeyo Shin, Jaegul Choo
-
Kandinsky: An Improved Text-to-image Synthesis With Image Prior And Latent Diffusion
(2023)
• No Venue
Razzhigaev et al.
-
Leandojo: Theorem Proving With Retrieval-augmented Language Models
(2023)
• No Venue
Yang et al.
-
Self-evaluation Improves Selective Generation In Large Language Models
(2023)
• No Venue
Ren et al.
-
A Picture Is Worth More Than 77 Text Tokens: Evaluating Clip-style Models On Dense Captions
(2023)
• No Venue
Urbanek et al.
-
A Real-world Webagent With Planning, Long Context Understanding, And Program Synthesis
(2023)
• No Venue
Gur et al.
-
Representation Learning With Large Language Models For Recommendation
(2023)
• WWW '24: The ACM Web Conference 2024
• 127 citations
Ren et al.
-
Tool Documentation Enables Zero-shot Tool-usage With Large Language Models
(2023)
• No Venue
Hsieh et al.
-
Scaling Up And Distilling Down: Language-guided Robot Skill Acquisition
(2023)
• No Venue
Huy Ha, Pete Florence, Shuran Song
-
Llama 2: Open Foundation And Fine-tuned Chat Models
(2023)
• No Venue
Touvron et al.
-
Pangu-coder2: Boosting Large Language Models For Code With Ranking Feedback
(2023)
• No Venue
Shen et al.
-
Clever Hans Or Neural Theory Of Mind? Stress Testing Social Reasoning In Large Language Models
(2023)
• Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
• 12 citations
Shapira et al.
-
Fine-grained Controllable Video Generation Via Object Appearance And Context
(2023)
• No Venue
Huang et al.
-
LMDX: Language Model-based Document Information Extraction And Localization
(2023)
• No Venue
Perot et al.
-
The Refinedweb Dataset For Falcon LLM: Outperforming Curated Corpora With Web Data, And Web Data Only
(2023)
• No Venue
Penedo et al.
-
Text-guided 3D Face Synthesis -- From Generation To Editing
(2023)
• No Venue
Wu et al.
-
Florence-2: Advancing A Unified Representation For A Variety Of Vision Tasks
(2023)
• No Venue
Xiao et al.
-
Foundationpose: Unified 6D Pose Estimation And Tracking Of Novel Objects
(2023)
• No Venue
Wen et al.
-
What Matters In Training A Gpt4-style Language Model With Multimodal Inputs?
(2023)
• No Venue
Zeng et al.
-
Statler: State-maintaining Language Models For Embodied Reasoning
(2023)
• No Venue
Yoneda et al.
-
Gemini Vs GPT-4V: A Preliminary Comparison And Combination Of Vision-language Models Through Qualitative Cases
(2023)
• No Venue
Qi et al.
-
Adapt: As-needed Decomposition And Planning With Language Models
(2023)
• No Venue
Prasad et al.
-
Memorybank: Enhancing Large Language Models With Long-term Memory
(2023)
• Proceedings of the AAAI Conference on Artificial Intelligence
• 88 citations
Zhong et al.
-
Cogagent: A Visual Language Model For GUI Agents
(2023)
• No Venue
Hong et al.
-
Towards Generalist Biomedical AI
(2023)
• No Venue
Tu et al.
-
Baichuan 2: Open Large-scale Language Models
(2023)
• No Venue
Yang et al.
-
Motionlm: Multi-agent Motion Forecasting As Language Modeling
(2023)
• No Venue
Seff et al.
-
Jais And Jais-chat: Arabic-centric Foundation And Instruction-tuned Open Generative Large Language Models
(2023)
• No Venue
Sengupta et al.
-
Style Aligned Image Generation Via Shared Attention
(2023)
• No Venue
Hertz et al.
-
Fastervit: Fast Vision Transformers With Hierarchical Attention
(2023)
• No Venue
Hatamizadeh et al.
-
Detecting Pretraining Data From Large Language Models
(2023)
• No Venue
Shi et al.
-
Photoverse: Tuning-free Image Customization With Text-to-image Diffusion Models
(2023)
• No Venue
Chen et al.
-
Alexa, Play With Robot: Introducing The First Alexa Prize Simbot Challenge On Embodied AI
(2023)
• No Venue
Shi et al.
-
Large Language Models As Zero-shot Conversational Recommenders
(2023)
• CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management
• 95 citations
He et al.
-
REPLUG: Retrieval-augmented Black-box Language Models
(2023)
• Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
• 91 citations
Shi et al.
-
Gentron: Delving Deep Into Diffusion Transformers For Image And Video Generation
(2023)
• No Venue
Chen et al.
-
CLEX: Continuous Length Extrapolation For Large Language Models
(2023)
• No Venue
Chen et al.
-
Beyond Surface: Probing Llama Across Scales And Layers
(2023)
• No Venue
Chen et al.
-
AMSP: Super-scaling LLM Training Via Advanced Model States Partitioning
(2023)
• No Venue
Chen et al.
-
Animatezero: Video Diffusion Models Are Zero-shot Image Animators
(2023)
• No Venue
Yu et al.
-
A Picture Is Worth A Thousand Words: Principled Recaptioning Improves Image Generation
(2023)
• No Venue
Segalis et al.
-
Exploring Human-like Translation Strategy With Large Language Models
(2023)
• Transactions of the Association for Computational Linguistics
• 47 citations
He et al.
-
A Survey On Evaluation Of Large Language Models
(2023)
• No Venue
Chang et al.
-
GPT-4V In Wonderland: Large Multimodal Models For Zero-shot Smartphone GUI Navigation
(2023)
• No Venue
Yan et al.
-
Musicldm: Enhancing Novelty In Text-to-music Generation Using Beat-synchronous Mixup Strategies
(2023)
• No Venue
Chen et al.
-
Med-flamingo: A Multimodal Medical Few-shot Learner
(2023)
• No Venue
Moor et al.
-
Can Llms Follow Simple Rules?
(2023)
• No Venue
Mu et al.
-
SILO Language Models: Isolating Legal Risk In A Nonparametric Datastore
(2023)
• No Venue
Min et al.
-
Dreamhuman: Animatable 3D Avatars From Text
(2023)
• No Venue
Kolotouros et al.
-
Kola: Carefully Benchmarking World Knowledge Of Large Language Models
(2023)
• No Venue
Yu et al.
-
Copy Is All You Need
(2023)
• No Venue
Lan et al.
-
Judgelm: Fine-tuned Large Language Models Are Scalable Judges
(2023)
• No Venue
Lianghui Zhu, Xinggang Wang, Xinlong Wang
-
Metamath: Bootstrap Your Own Mathematical Questions For Large Language Models
(2023)
• No Venue
Yu et al.
-
Robot Learning With Sensorimotor Pre-training
(2023)
• No Venue
Radosavovic et al.
-
Chameleon: Plug-and-play Compositional Reasoning With Large Language Models
(2023)
• Arxiv
• 90 citations
Lu et al.
-
Seallms -- Large Language Models For Southeast Asia
(2023)
• No Venue
Nguyen et al.
-
RLHF-V: Towards Trustworthy Mllms Via Behavior Alignment From Fine-grained Correctional Human Feedback
(2023)
• No Venue
Yu et al.
-
Merlin: Empowering Multimodal Llms With Foresight Minds
(2023)
• No Venue
Yu et al.
-
DIALGEN: Collaborative Human-lm Generated Dialogues For Improved Understanding Of Human-human Conversations
(2023)
• No Venue
Lu et al.
-
Instruction Mining: High-quality Instruction Data Selection For Large Language Models
(2023)
• No Venue
Yihan Cao, Yanbin Kang, Lichao Sun
-
Mm-vet: Evaluating Large Multimodal Models For Integrated Capabilities
(2023)
• No Venue
Yu et al.
-
Can GPT Models Be Financial Analysts? An Evaluation Of Chatgpt And GPT-4 On Mock CFA Exams
(2023)
• No Venue
Callanan et al.
-
Webglm: Towards An Efficient Web-enhanced Question Answering System With Human Preferences
(2023)
• No Venue
Liu et al.
-
Pg-video-llava: Pixel Grounding Large Video-language Models
(2023)
• No Venue
Munasinghe et al.
-
Scalable 3D Captioning With Pretrained Models
(2023)
• No Venue
Luo et al.
-
Orca 2: Teaching Small Language Models How To Reason
(2023)
• No Venue
Mitra et al.
-
The Flan Collection: Designing Data And Methods For Effective Instruction Tuning
(2023)
• Arxiv
• 109 citations
Longpre et al.
-
Negative Object Presence Evaluation (NOPE) To Measure Object Hallucination In Vision-language Models
(2023)
• Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR)
• 17 citations
Lovenia et al.
-
Text Rendering Strategies For Pixel Language Models
(2023)
• No Venue
Lotz et al.
-
Efficient Memory Management For Large Language Model Serving With Pagedattention
(2023)
• No Venue
Kwon et al.
-
Fingpt: Large Generative Models For A Small Language
(2023)
• No Venue
Luukkonen et al.
-
RT-2: Vision-language-action Models Transfer Web Knowledge To Robotic Control
(2023)
• No Venue
Brohan et al.
-
Anymal: An Efficient And Scalable Any-modality Augmented Language Model
(2023)
• No Venue
Moon et al.
-
Ml-bench: Large Language Models Leverage Open-source Libraries For Machine Learning Tasks
(2023)
• No Venue
Liu et al.
-
Story-to-motion: Synthesizing Infinite And Controllable Character Animation From Long Text
(2023)
• No Venue
Qing et al.
-
Mitigating Hallucination In Large Multi-modal Models Via Robust Instruction Tuning
(2023)
• Arxiv
• 24 citations
Liu et al.
-
Cache Me If You Can: Accelerating Diffusion Models Through Block Caching
(2023)
• No Venue
Wimbauer et al.
-
Adapters: A Unified Library For Parameter-efficient And Modular Transfer Learning
(2023)
• No Venue
Poth et al.
-
Smartplay: A Benchmark For Llms As Intelligent Agents
(2023)
• No Venue
Wu et al.
-
On The Road With Gpt-4v(ision): Early Explorations Of Visual-language Model On Autonomous Driving
(2023)
• No Venue
Wen et al.
-
Beyond Scale: The Diversity Coefficient As A Data Quality Metric Demonstrates Llms Are Pre-trained On Formally Diverse Data
(2023)
• No Venue
Alycia Lee, Brando Miranda, Sanmi Koyejo
-
Holistic Evaluation Of Text-to-image Models
(2023)
• No Venue
Lee et al.
-
Is Chatgpt A Good Recommender? A Preliminary Study
(2023)
• Arxiv
• 44 citations
Liu et al.
-
Dual-stream Diffusion Net For Text-to-video Generation
(2023)
• No Venue
Liu et al.
-
Evalcrafter: Benchmarking And Evaluating Large Video Generation Models
(2023)
• No Venue
Liu et al.
-
Calibrating Llm-based Evaluator
(2023)
• No Venue
Liu et al.
-
Agentbench: Evaluating Llms As Agents
(2023)
• No Venue
Liu et al.
-
Purple Llama Cyberseceval: A Secure Coding Benchmark For Language Models
(2023)
• No Venue
Bhatt et al.
-
Lost In The Middle: How Language Models Use Long Contexts
(2023)
• No Venue
Liu et al.
-
Investigating Answerability Of Llms For Long-form Question Answering
(2023)
• No Venue
Bhat et al.
-
YAYI 2: Multilingual Open-source Large Language Models
(2023)
• No Venue
Luo et al.
-
Statistical Rejection Sampling Improves Preference Optimization
(2023)
• No Venue
Liu et al.
-
LASER: LLM Agent With State-space Exploration For Web Navigation
(2023)
• No Venue
Ma et al.
-
Unifying The Perspectives Of NLP And Software Engineering: A Survey On Language Models For Code
(2023)
• No Venue
Zhang et al.
-
Does Circuit Analysis Interpretability Scale? Evidence From Multiple Choice Capabilities In Chinchilla
(2023)
• No Venue
Lieberum et al.
-
Trustworthy Llms: A Survey And Guideline For Evaluating Large Language Models' Alignment
(2023)
• No Venue
Liu et al.
-
Toolllm: Facilitating Large Language Models To Master 16000+ Real-world Apis
(2023)
• No Venue
Qin et al.
-
Tool Learning With Foundation Models
(2023)
• Arxiv
• 27 citations
Qin et al.
-
Visual Instruction Tuning
(2023)
• Arxiv
• 665 citations
Liu et al.
-
System 2 Attention (is Something You Might Need Too)
(2023)
• No Venue
Jason Weston, Sainbayar Sukhbaatar
-
Langsplat: 3D Language Gaussian Splatting
(2023)
• No Venue
Qin et al.
-
Freenoise: Tuning-free Longer Video Diffusion Via Noise Rescheduling
(2023)
• No Venue
Qiu et al.
-
Trusted Source Alignment In Large Language Models
(2023)
• No Venue
Bashlovkina et al.
-
Llms As Workers In Human-computational Algorithms? Replicating Crowdsourcing Pipelines With Llms
(2023)
• No Venue
Wu et al.
-
Instruction-following Evaluation For Large Language Models
(2023)
• No Venue
Zhou et al.
-
The Generative AI Paradox: "what It Can Create, It May Not Understand"
(2023)
• No Venue
West et al.
-
Lora Fine-tuning Efficiently Undoes Safety Training In Llama 2-chat 70B
(2023)
• No Venue
Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
-
A Multitask, Multilingual, Multimodal Evaluation Of Chatgpt On Reasoning, Hallucination, And Interactivity
(2023)
• Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
• 494 citations
Bang et al.
-
Llavar: Enhanced Visual Instruction Tuning For Text-rich Image Understanding
(2023)
• No Venue
Zhang et al.
-
Demystifying GPT Self-repair For Code Generation
(2023)
• No Venue
Olausson et al.
-
The Belebele Benchmark: A Parallel Reading Comprehension Dataset In 122 Language Variants
(2023)
• No Venue
Bandarkar et al.
-
Recommendation As Instruction Following: A Large Language Model Empowered Recommendation Approach
(2023)
• ACM Transactions on Information Systems
• 45 citations
Zhang et al.
-
Dreamdiffusion: Generating High-quality Images From Brain EEG Signals
(2023)
• No Venue
Bai et al.
-
Is Chatgpt Fair For Recommendation? Evaluating Fairness In Large Language Model Recommendation
(2023)
• RecSys '23: Seventeenth ACM Conference on Recommender Systems
• 104 citations
Zhang et al.
-
Next-chat: An LMM For Chat, Detection And Segmentation
(2023)
• No Venue
Zhang et al.
-
Magicbrush: A Manually Annotated Dataset For Instruction-guided Image Editing
(2023)
• No Venue
Zhang et al.
-
Loop Copilot: Conducting AI Ensembles For Music Generation And Iterative Editing
(2023)
• No Venue
Zhang et al.
-
Large Language Models For Supply Chain Optimization
(2023)
• No Venue
Li et al.
-
SDXL: Improving Latent Diffusion Models For High-resolution Image Synthesis
(2023)
• No Venue
Podell et al.
-
Q-instruct: Improving Low-level Visual Abilities For Multi-modality Foundation Models
(2023)
• No Venue
Wu et al.
-
Orthogonal Adaptation For Modular Customization Of Diffusion Models
(2023)
• No Venue
Po et al.
-
Generative AI For Programming Education: Benchmarking Chatgpt, GPT-4, And Human Tutors
(2023)
• No Venue
Phung et al.
-
Gpt4motion: Scripting Physical Motions In Text-to-video Generation Via Blender-oriented GPT Planning
(2023)
• No Venue
Lv et al.
-
Efficient Quantization Strategies For Latent Diffusion Models
(2023)
• No Venue
Yang et al.
-
Large Language Model Cascades With Mixture Of Thoughts Representations For Cost-efficient Reasoning
(2023)
• No Venue
Yue et al.
-
Knowledge-augmented Large Language Models For Personalized Contextual Query Suggestion
(2023)
• WWW '24: The ACM Web Conference 2024
• 22 citations
Baek et al.
-
Evaluating Object Hallucination In Large Vision-language Models
(2023)
• Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
• 302 citations
Li et al.
-
FLM-101B: An Open LLM And How To Train It With $100K Budget
(2023)
• No Venue
Li et al.
-
Auto-instruct: Automatic Instruction Generation And Ranking For Black-box Language Models
(2023)
• No Venue
Zhang et al.
-
Openflamingo: An Open-source Framework For Training Large Autoregressive Vision-language Models
(2023)
• No Venue
Awadalla et al.
-
The Chosen One: Consistent Characters In Text-to-image Diffusion Models
(2023)
• No Venue
Avrahami et al.
-
Openba: An Open-sourced 15B Bilingual Asymmetric Seq2seq Model Pre-trained From Scratch
(2023)
• No Venue
Li et al.
-
DNAGPT: A Generalized Pre-trained Tool For Versatile DNA Sequence Analysis Tasks
(2023)
• No Venue
Zhang et al.
-
JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models
(2023)
• No Venue
Li et al.
-
Scaling Clinical Trial Matching Using Large Language Models: A Case Study In Oncology
(2023)
• No Venue
Wong et al.
-
Paloma: A Benchmark For Evaluating Language Model Fit
(2023)
• No Venue
Magnusson et al.
-
Orca: Progressive Learning From Complex Explanation Traces Of GPT-4
(2023)
• No Venue
Mukherjee et al.
-
Modelscope-agent: Building Your Customizable Agent System With Open-source Large Language Models
(2023)
• No Venue
Li et al.
-
Otterhd: A High-resolution Multi-modality Model
(2023)
• No Venue
Li et al.
-
MIMIC-IT: Multi-modal In-context Instruction Tuning
(2023)
• No Venue
Li et al.
-
Starcoder: May The Source Be With You!
(2023)
• No Venue
Li et al.
-
Domain-agnostic Tuning-encoder For Fast Personalization Of Text-to-image Models
(2023)
• No Venue
Arar et al.
-
Teach Llms To Personalize -- An Approach Inspired By Writing Education
(2023)
• No Venue
Li et al.
-
Videogen: A Reference-guided Latent Diffusion Approach For High Definition Text-to-video Generation
(2023)
• No Venue
Li et al.
-
Video Instance Matting
(2023)
• No Venue
Li et al.
-
Kosmos-2: Grounding Multimodal Large Language Models To The World
(2023)
• No Venue
Peng et al.
-
Llmrec: Large Language Models With Graph Augmentation For Recommendation
(2023)
• WSDM '24: The 17th ACM International Conference on Web Search and Data Mining
• 153 citations
Wei et al.
-
Simple Synthetic Data Reduces Sycophancy In Large Language Models
(2023)
• No Venue
Wei et al.
-
Selfcheckgpt: Zero-resource Black-box Hallucination Detection For Generative Large Language Models
(2023)
• Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
• 191 citations
Potsawee Manakul, Adian Liusie, Mark J. F. Gales
-
Towards Accurate Differential Diagnosis With Large Language Models
(2023)
• No Venue
McDuff et al.
-
Customizing Motion In Text-to-video Diffusion Models
(2023)
• No Venue
Materzynska et al.
-
The Unlocking Spell On Base Llms: Rethinking Alignment Via In-context Learning
(2023)
• No Venue
Lin et al.
-
SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models
(2023)
• No Venue
Lin et al.
-
Llm-eval: Unified Multi-dimensional Automatic Evaluation For Open-domain Conversations With Large Language Models
(2023)
• Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)
• 49 citations
Yen-Ting Lin, Yun-Nung Chen
-
Designbench: Exploring And Benchmarking DALL-E 3 For Imagining Visual Design
(2023)
• No Venue
Lin et al.
-
Univtg: Towards Unified Video-language Temporal Grounding
(2023)
• No Venue
Lin et al.
-
Mirasol3b: A Multimodal Autoregressive Model For Time-aligned And Contextual Modalities
(2023)
• No Venue
Piergiovanni et al.
-
The Impact Of Large Language Models On Scientific Discovery: A Preliminary Study Using GPT-4
(2023)
• No Venue
Microsoft Research Ai4science, Microsoft Azure Quantum
-
Agenttuning: Enabling Generalized Agent Abilities For Llms
(2023)
• No Venue
Zeng et al.
-
Dreamstyler: Paint By Style Inversion With Text-to-image Diffusion Models
(2023)
• No Venue
Ahn et al.
-
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models And Tasks
(2023)
• No Venue
Ahuja et al.
-
Jailbroken: How Does LLM Safety Training Fail?
(2023)
• No Venue
Alexander Wei, Nika Haghtalab, Jacob Steinhardt
-
Self-adaptive In-context Learning: An Information Compression Perspective For In-context Example Selection And Ordering
(2022)
• Arxiv
• 14 citations
Wu et al.
-
Competition-level Code Generation With Alphacode
(2022)
• Science
• 30 citations
Li et al.
-
Making Large Language Models Better Reasoners With Step-aware Verifier
(2022)
• Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
• 41 citations
Li et al.
-
Gpt-neox-20b: An Open-source Autoregressive Language Model
(2022)
• Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
• 256 citations
Black et al.
-
Towards Reasoning In Large Language Models: A Survey
(2022)
• Findings of the Association for Computational Linguistics: ACL 2023
• 223 citations
Jie Huang, Kevin Chen-Chuan Chang
-
News Summarization And Evaluation In The Era Of GPT-3
(2022)
• Arxiv
• 180 citations
Tanya Goyal, Junyi Jessy Li, Greg Durrett
-
No Language Left Behind: Scaling Human-centered Machine Translation
(2022)
• Arxiv
• 354 citations
Team et al.
-
UL2: Unifying Language Learning Paradigms
(2022)
• Arxiv
• 97 citations
Tay et al.
-
Prompting GPT-3 To Be Reliable
(2022)
• Arxiv
• 68 citations
Si et al.
-
Help Me Write A Poem: Instruction Tuning As A Vehicle For Collaborative Poetry Writing
(2022)
• Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
• 38 citations
Tuhin Chakrabarty, Vishakh Padmakumar, He He
-
Dynamic Prompt Learning Via Policy Gradient For Semi-structured Mathematical Reasoning
(2022)
• Arxiv
• 41 citations
Lu et al.
-
Holistic Evaluation Of Language Models
(2022)
• Annals of the New York Academy of Sciences
• 128 citations
Liang et al.
-
Super-naturalinstructions: Generalization Via Declarative Instructions On 1600+ NLP Tasks
(2022)
• Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
• 213 citations
Wang et al.
-
Crossfit: A Few-shot Learning Challenge For Cross-task Generalization In NLP
(2021)
• Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
• 101 citations
Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
-
Learning To Retrieve Prompts For In-context Learning
(2021)
• Arxiv
• 10 citations
Ohad Rubin, Jonathan Herzig, Jonathan Berant
-
Webgpt: Browser-assisted Question-answering With Human Feedback
(2021)
• Arxiv
• 33 citations
Nakano et al.
-
A General Language Assistant As A Laboratory For Alignment
(2021)
• Arxiv
• 27 citations
Askell et al.
-
Making Pre-trained Language Models Better Few-shot Learners
(2020)
• Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
• 575 citations
Tianyu Gao, Adam Fisch, Danqi Chen
-
Probing Pretrained Language Models For Lexical Semantics
(2020)
• Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
• 55 citations
Vulić et al.