kaizuberbuehler's Collections: Code Generation
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper
• 2404.03543
• Published
• 18
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper
• 2406.11931
• Published
• 69
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper
• 2407.18901
• Published
• 35
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper
• 2408.07060
• Published
• 41
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
Paper
• 2408.14354
• Published
• 41
FuzzCoder: Byte-level Fuzzing Test via Large Language Model
Paper
• 2409.01944
• Published
• 45
Qwen2.5-Coder Technical Report
Paper
• 2409.12186
• Published
• 153
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Paper
• 2409.16299
• Published
• 11
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
• 2501.01257
• Published
• 51
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Paper
• 2412.21199
• Published
• 13
Outcome-Refining Process Supervision for Code Generation
Paper
• 2412.15118
• Published
• 19
o1-Coder: an o1 Replication for Coding
Paper
• 2412.00154
• Published
• 44
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code
Paper
• 2410.05605
• Published
• 1
Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Paper
• 2410.12236
• Published
• 1
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
• 2411.04905
• Published
• 127
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
Paper
• 2501.05040
• Published
• 15
Competitive Programming with Large Reasoning Models
Paper
• 2502.06807
• Published
• 69
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper
• 2502.01718
• Published
• 28
Large Language Model Guided Self-Debugging Code Generation
Paper
• 2502.02928
• Published
• 13
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Paper
• 2502.04350
• Published
• 11
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper
• 2502.05664
• Published
• 24
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published
• 63
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper
• 2502.18449
• Published
• 75
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Paper
• 2502.16614
• Published
• 27
CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
Paper
• 2502.16645
• Published
• 21
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Paper
• 2503.02951
• Published
• 33
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Paper
• 2503.06680
• Published
• 20
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Paper
• 2503.05860
• Published
• 11
LocAgent: Graph-Guided LLM Agents for Code Localization
Paper
• 2503.09089
• Published
• 13
LoRACode: LoRA Adapters for Code Embeddings
Paper
• 2503.05315
• Published
• 13
SWE-smith: Scaling Data for Software Engineering Agents
Paper
• 2504.21798
• Published
• 14
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Paper
• 2310.06770
• Published
• 9
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Paper
• 2503.15478
• Published
• 13
Measuring AI Ability to Complete Long Tasks
Paper
• 2503.14499
• Published
• 16
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
Paper
• 2503.15242
• Published
• 10
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Paper
• 2503.23145
• Published
• 35
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published
• 26
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Paper
• 2504.01943
• Published
• 15
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper
• 2503.23157
• Published
• 10
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Paper
• 2504.02605
• Published
• 48
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
Paper
• 2504.09702
• Published
• 18
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
• 2504.17192
• Published
• 123
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
Paper
• 2504.14655
• Published
• 21