Misc papers
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper
• 2402.01093
• Published
• 47
Attention Heads of Large Language Models: A Survey
Paper
• 2409.03752
• Published
• 92
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper
• 2409.01704
• Published
• 83
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper
• 2409.10173
• Published
• 34
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
• 2409.12191
• Published
• 78
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Paper
• 2410.06456
• Published
• 37
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Paper
• 2410.16153
• Published
• 44
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper
• 2410.21169
• Published
• 30
A Survey of Small Language Models
Paper
• 2410.20011
• Published
• 46
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper
• 2412.03555
• Published
• 133
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Paper
• 2412.04424
• Published
• 62
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Paper
• 2410.12628
• Published
• 41
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs
Paper
• 2501.13687
• Published
• 9
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Paper
• 2502.18017
• Published
• 21
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper
• 2506.16035
• Published
• 89
Pixels, Patterns, but No Poetry: To See The World like Humans
Paper
• 2507.16863
• Published
• 69
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published
• 318
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160