Yizhi
MercedeSnape
AI & ML interests
None yet
Recent Activity
Updated a collection (Benchmark: method), 2 days ago
Updated a collection (Benchmark: method), 2 days ago
Updated a collection (agentic RL), 2 days ago
Organizations
None yet
Benchmark: method
- Benchmark^2: Systematic Evaluation of LLM Benchmarks
  Paper • 2601.03986 • Published • 34
- BabyVision: Visual Reasoning Beyond Language
  Paper • 2601.06521 • Published • 197
- Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
  Paper • 2601.07226 • Published • 33
- CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
  Paper • 2601.22027 • Published • 83
Problem Definition
self-evolving
- MemEvolve: Meta-Evolution of Agent Memory Systems
  Paper • 2512.18746 • Published • 31
- ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
  Paper • 2601.06860 • Published • 16
- The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
  Paper • 2602.09877 • Published • 197
- QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
  Paper • 2602.07085 • Published • 187
reasoning evaluation
agent reasoning
agentic RL
- Scaling Agent Learning via Experience Synthesis
  Paper • 2511.03773 • Published • 82
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
  Paper • 2511.21689 • Published • 125
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 228
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 36
mas
MoE
RAG
- Multi-hop Reasoning via Early Knowledge Alignment
  Paper • 2512.20144 • Published • 7
- Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
  Paper • 2512.17220 • Published • 113
- Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
  Paper • 2512.23959 • Published • 112
- What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
  Paper • 2601.06165 • Published • 16
Tokenization
survey
ViT
future
LLM reasoning
mm thinking
agent training
agent env
model paradigm
Memory
- VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
  Paper • 2511.11007 • Published • 15
- O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
  Paper • 2511.13593 • Published • 27
- General Agentic Memory Via Deep Research
  Paper • 2511.18423 • Published • 168
- MemEvolve: Meta-Evolution of Agent Memory Systems
  Paper • 2512.18746 • Published • 31
KG
pretrain
sandbox
survey