Multimodal Image Classification
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Paper
• 2405.15668
On Large Multimodal Models as Open-World Image Classifiers
Paper
• 2503.21851
Benchmarking Large Language Models for Image Classification of Marine Mammals
Paper
• 2410.19848
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
Paper
• 2501.07783
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Paper
• 2408.12808
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers
Paper
• 2412.00142
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Paper
• 2410.18387
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Paper
• 2507.01955
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models
Paper
• 2505.19415
MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis
Paper
• 2509.06617
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Paper
• 2502.09598
DuPLUS: Dual-Prompt Vision-Language Framework for Universal Medical Image Segmentation and Prognosis
Paper
• 2510.03483
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Paper
• 2502.04263
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing
Paper
• 2501.06828
A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level
Paper
• 2507.06972
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model
Paper
• 2504.04988
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Paper
• 2504.14245
MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation
Paper
• 2503.01298
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Paper
• 2505.18956
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification
Paper
• 2509.00677
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Paper
• 2510.21518
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Paper
• 2505.21549
MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis
Paper
• 2506.08900