SJTU VisionXLab

community

https://yangxue.site/publication

Recent Activity

mjuicem updated a collection 2 minutes ago

mjuicem updated a collection 4 minutes ago

Papers

Co-Training Vision Language Models for Remote Sensing Multi-task Learning

updated a collection 2 minutes ago

FIRM-Reward

The data and models of "Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation" • 6 items • Updated 2 minutes ago

updated a dataset about 1 month ago

VisionXLab/RISE-Video

Viewer • Updated Feb 7 • 487 • 3.08k

submitted a paper to Daily Papers about 1 month ago

RISE-Video: Can Video Generators Decode Implicit World Rules?

Paper • 2602.05986 • Published Feb 5 • 26

published a dataset about 1 month ago

VisionXLab/RISE-Video

Viewer • Updated Feb 7 • 487 • 3.08k

authored a paper 3 months ago

Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Paper • 2511.21272 • Published Nov 26, 2025

authored a paper 5 months ago

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 109

authored 3 papers 6 months ago

Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration

Paper • 2509.10059 • Published Sep 12, 2025

Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

Paper • 2503.04543 • Published Mar 6, 2025 • 1

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18, 2025 • 111

updated a dataset 7 months ago

VisionXLab/RSDet-datasets

Preview • Updated Aug 25, 2025 • 84 • 2

authored a paper 8 months ago

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 33

published a dataset 8 months ago

VisionXLab/RSDet-datasets

Preview • Updated Aug 25, 2025 • 84 • 2

authored a paper 11 months ago

Decoupled Global-Local Alignment for Improving Compositional Understanding

Paper • 2504.16801 • Published Apr 23, 2025 • 14

authored a paper 11 months ago

H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection

Paper • 2210.06742 • Published Oct 13, 2022 • 1