Interesting! Can you also test o4-mini-high? I'd be curious to see how it does on your test.
Great question. I hadn't looked. Running it now, o4-mini-high gets questions one and two right, bombs question three (with a hallucinated quote, no less), gets question four right, and misses question five like everyone else except o3. Getting question four right puts it in good company (o1 pro, o3, and Gemini 2.5 Pro), but a lot of other models did much better on question three.
Thanks!