Misc papers - a fsommers Collection

fsommers 's Collections

Misc papers

updated Aug 26, 2025

Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2, 2024 • 47
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Paper • 2409.10173 • Published Sep 16, 2024 • 34
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

Paper • 2410.06456 • Published Oct 9, 2024 • 37
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21, 2024 • 44
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published Oct 28, 2024 • 30
A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25, 2024 • 46
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 133
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 62
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Paper • 2410.12628 • Published Oct 16, 2024 • 41
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs

Paper • 2501.13687 • Published Jan 23, 2025 • 9
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Paper • 2502.18017 • Published Feb 25, 2025 • 21
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89
Pixels, Patterns, but No Poetry: To See The World like Humans

Paper • 2507.16863 • Published Jul 21, 2025 • 69
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 318
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160