Open Agent Evaluation Laboratory

university
https://boxiyu.github.io/
BoshCavendish
BoxiYu
boxi-yu-194b63279

AI & ML interests

Code Agent, Benchmark Augmentation

Recent Activity

CWCY updated 2 datasets about 15 hours ago

OpenAgentLab/SWE-bench_Pro-ABS

Updated about 6 hours ago • 731 rows • 10 downloads

OpenAgentLab/SWE-Bench_Verified_ABS

Updated about 10 hours ago • 500 rows • 26 downloads
CWCY published a dataset 1 day ago

OpenAgentLab/SWE-bench_Pro-ABS

Updated about 6 hours ago • 731 rows • 10 downloads
Bertsekas authored 2 papers 8 months ago

How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs

Paper • arXiv 2501.10711 • Published Jan 18, 2025 • 1 upvote

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Paper • arXiv 2506.09289 • Published Jun 10, 2025 • 2 upvotes