PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks

· Hacker News