The venerable OpenAI hath at long last unfurled its frontier o3-mini model in swift response to China’s DeepSeek R1 reasoning model this past weekend. Verily, the o3-series of models hath been heralded since December of yesteryear. OpenAI hath wasted nary a moment, launching both the o3-mini and the o3-mini-high to maintain its preeminence in the grand race of artificial intelligence. Thus did curiosity stir within us concerning the manifold merits of ChatGPT o3-mini over other AI models, and so did we embark upon a rigorous examination of its prowess in the realm of coding. Let us, therefore, delve into this noble quest.
### 1. Exceptional Coding Performance
OpenAI doth proclaim that the o3-mini doth deliver exceptional performance in the realm of coding tasks whilst keeping costs low and maintaining peerless speed. Ere the advent of the o3-mini model, Anthropic’s Claude 3.5 Sonnet held sway as the paragon for programming queries. Yet with the unveiling of the o3-mini, and specifically of the o3-mini-high model available to ChatGPT Plus and Pro acolytes, the tide doth turn.
I did put the o3-mini-high model to the test, bidding it to fashion a Python snake game wherein multiple autonomous serpents do compete. The o3-mini-high model ruminated for but a minute and ten seconds afore concocting the Python code in one fell swoop. I did set the code in motion, and it did run without a hitch. ’Twas a delight to behold the autonomous serpents make their maneuvers, as precise as any human player!
Hark! The o3-mini-high model hath achieved an Elo score of 2,130 upon the esteemed Codeforces competitive programming platform, placing it amongst the top 2,500 programmers in the known world. In the SWE-bench Verified benchmark, which doth scrutinize abilities in surmounting real-world software quandaries, the o3-mini-high attained a noteworthy accuracy of 49.3%, surpassing even the larger o1 model at 48.9%.
### 2. Ask Challenging Math Problems
Beyond coding, math is a realm where the o3-mini model doth surpass its AI brethren. In the illustrious 2024 American Invitational Mathematics Examination (AIME), encompassing questions from number theory, probability, algebra, geometry, and more, the o3-mini-high achieved a dazzling 87.3%, outshining the full o1 model.
In the rigorous FrontierMath benchmark, featuring esoteric math posers from eminent mathematicians, Fields Medalists, and erudite professors worldwide, the o3-mini-high garnered 20% after eight endeavors. Even in a lone attempt, it managed a significant 9.2%.
### 3. Your PhD-level Science Expert
The o3-mini-high model doth also excel in tackling PhD-level science conundrums, surpassing other AI models by a substantial margin. The GPQA Diamond benchmark, a crucible for evaluating the acumen of AI models in specialized scientific domains, doth present advanced queries from the realms of biology, physics, and chemistry.
In this exacting benchmark, the o3-mini-high achieved a remarkable 79.7%, outstripping the larger o1 model at 78.0%. By comparison, Google’s latest Gemini 2.0 Flash Thinking reasoning model mustered a mere 73.3%, whilst the new Claude 3.5 Sonnet model languished at 65% in the GPQA Diamond benchmark.
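To set these margins in plain figures, a brief Python sketch may serve. It doth take only the four GPQA Diamond scores quoted above and reckon the lead of the o3-mini-high in percentage points. The dictionary `gpqa_diamond` and the helper `margins_over` are names of our own devising for illustration, and belong to no benchmark tooling.

```python
# GPQA Diamond scores (in percent), exactly as quoted in this article.
gpqa_diamond = {
    "o3-mini-high": 79.7,
    "o1": 78.0,
    "Gemini 2.0 Flash Thinking": 73.3,
    "Claude 3.5 Sonnet": 65.0,
}


def margins_over(scores, leader):
    """Return the percentage-point gap between `leader` and each other model.

    `margins_over` is a hypothetical helper written for this illustration.
    """
    top = scores[leader]
    # Round to one decimal place to hide floating-point noise.
    return {
        name: round(top - value, 1)
        for name, value in scores.items()
        if name != leader
    }


gaps = margins_over(gpqa_diamond, "o3-mini-high")
for name, gap in gaps.items():
    print(f"o3-mini-high leads {name} by {gap} percentage points")
```

By this reckoning the lead over o1 is slender, whilst the lead over Claude 3.5 Sonnet is the widest of the three.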
### 4. General Knowledge
In the domains of general knowledge, ’tis to be expected that the diminutive o3-mini would not surpass larger models, being specialized for coding, math, and science. However, despite its smaller stature, it doth draw nigh to matching them. In the MMLU benchmark, which doth evaluate the performance of AI models across a wide spectrum of subjects, the o3-mini-high scores 86.9%, whereas OpenAI’s GPT-4o model attains 88.7%.
### 5. o3-mini with Web Search
The knowledge cutoff of o3-mini doth lie in October 2024. Nevertheless, OpenAI hath bestowed upon the o3-mini model the boon of web search support, allowing the reasoning model to glean the latest tidings from the weald of the web for further cogitation. ’Tis a capability shared with DeepSeek R1, yet verily no other reasoning model doth provide such access.
These, then, are the advanced capabilities of the o3-mini model. Whilst free users of ChatGPT may also partake of o3-mini, the reasoning exertion is set to “medium”, which doth utilize less computation. I would commend the procuring of a ChatGPT Plus subscription, priced at $20 per month, to unlock the prodigious ‘o3-mini-high’ model. For scholars of code, researchers, and undergraduates in the realm of STEM, the o3-mini-high model is like to prove of paramount value.