Beyond Problem Solving: UOJ-Bench for Evaluating Code Generation, Hacking, and Repair in Competitive Programming
A benchmark on the Universal Online Judge for code generation, code hacking, and code repair—evaluated through UOJ’s native judging infrastructure.
| # | Model | Metric |
|---|---|---|

| # | Model | Easy | Hard |
|---|---|---|---|

| # | Model | Easy | Hard |
|---|---|---|---|