FIRM-Reward Collection The data and models of "Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation" • 6 items • Updated 2 minutes ago
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published Feb 5 • 26
Co-Training Vision Language Models for Remote Sensing Multi-task Learning Paper • 2511.21272 • Published Nov 26, 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9, 2025 • 109
Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration Paper • 2509.10059 • Published Sep 12, 2025
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Paper • 2503.04543 • Published Mar 6, 2025 • 1
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 33
Decoupled Global-Local Alignment for Improving Compositional Understanding Paper • 2504.16801 • Published Apr 23, 2025 • 14
H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection Paper • 2210.06742 • Published Oct 13, 2022 • 1