Just ahead of the highly anticipated release of GPT-5, OpenAI has taken a significant step toward openness by unveiling two new open-weight language models, the company's first truly open release since GPT-2 in 2019. The models, GPT-OSS-120B and GPT-OSS-20B, are freely available for download on Hugging Face under the permissive Apache 2.0 license, which allows unrestricted commercial and non-commercial use. The release is especially notable in context: OpenAI has traditionally followed a closed-source approach, and the move signals a substantial strategic pivot as the company prepares to introduce GPT-5.
The two models are built for different deployment environments and serve distinct segments of the developer community. The larger GPT-OSS-120B is designed to run efficiently on a single high-end Nvidia GPU, making it suitable for enterprise-grade applications that demand high performance. The smaller GPT-OSS-20B is optimized for consumer-grade devices and can run on machines with just 16GB of memory, opening up powerful AI to a much broader audience. Unlike their proprietary counterparts, both models are strictly text-only, with no multimodal functionality such as image recognition, speech synthesis, or audio processing.
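For developers who want to try the smaller model locally, the release follows the standard Hugging Face workflow. The sketch below is a minimal example assuming the transformers text-generation pipeline; the model ID comes from the release, while the loading options (dtype and device placement) are illustrative choices rather than official guidance.

```python
# Minimal sketch: running GPT-OSS-20B locally via Hugging Face transformers.
# Requires `pip install transformers accelerate` and roughly 16GB of memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # Hugging Face repo from the release
    torch_dtype="auto",          # let transformers pick a suitable dtype
    device_map="auto",           # place weights on available GPU/CPU memory
)

messages = [
    {"role": "user", "content": "Summarize mixture-of-experts in two sentences."}
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```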
Despite being text-only, the models are purpose-built for agent-style tasks with a focus on advanced reasoning. They cannot process non-textual data such as images directly, but they can act as intelligent intermediaries, routing complex tasks to OpenAI's closed, higher-performance models via API, as sketched below. This makes them a good fit for workflow orchestration, automated agents, and decision-support systems, where task routing and contextual reasoning are critical.
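One way to read that routing role is as a simple escalation pattern: answer locally when possible, hand off to a closed model otherwise. The sketch below illustrates the idea under stated assumptions; the routing heuristic and the local helper are hypothetical, and only the OpenAI client call reflects a real API.

```python
# Illustrative escalation pattern: a local open-weight model handles routine
# requests and hands complex ones to a closed model over the API. The
# is_complex() heuristic and local_generate() stub are hypothetical.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def local_generate(task: str) -> str:
    # Placeholder for a locally hosted GPT-OSS model (e.g. the pipeline above).
    return f"[local GPT-OSS answer to: {task}]"

def is_complex(task: str) -> bool:
    # Hypothetical routing rule: escalate long or unusually demanding requests.
    return len(task) > 500 or "step-by-step proof" in task.lower()

def handle(task: str) -> str:
    if is_complex(task):
        # Escalate to a closed, higher-capability model via the API.
        resp = client.chat.completions.create(
            model="o3",
            messages=[{"role": "user", "content": task}],
        )
        return resp.choices[0].message.content
    return local_generate(task)
```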
Both models use a Mixture-of-Experts (MoE) architecture, which activates only a small subset of parameters for each input token; the 120B model, for example, activates roughly 5.1 billion parameters per token. This design dramatically improves efficiency, responsiveness, and resource usage while maintaining robust performance. In addition, a compute-intensive post-training reinforcement learning phase was used to strengthen their reasoning and alignment, bringing their behavior closer to OpenAI's flagship o-series models.
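The efficiency gain comes from the router running only a few experts per token, so the active parameter count stays a small fraction of the total. Below is a toy sketch of top-k MoE routing; it is illustrative rather than the actual GPT-OSS implementation, and the sizes and gating details are assumptions for demonstration.

```python
# Toy sketch of top-k mixture-of-experts routing (not the GPT-OSS internals):
# only k of n_experts run per token, which is how a large model can activate
# only a few billion parameters per token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2  # toy sizes; real models are far larger

W_router = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):                      # x: (d_model,) one token's hidden state
    logits = x @ W_router              # router score for each expert
    top = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    gates = weights / weights.sum()    # softmax over the selected experts only
    # Only the chosen k experts are evaluated; the rest stay idle this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.standard_normal(d_model))
print(y.shape)  # (64,): same output shape, but only 2 of 8 experts ran
```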
OpenAI has asserted that GPT-OSS sets a new standard for open-weight AI models. On Codeforces, a competitive programming platform whose rating is widely used to benchmark coding ability, GPT-OSS-120B achieved a rating of 2622, while the 20B model reached 2516. Both ratings surpass rival open models such as DeepSeek's R1, though they still fall short of OpenAI's closed models like o3 and o4-mini. This suggests that while GPT-OSS is competitive, it remains a step behind the cutting edge in raw capability.
However, not everything is rosy. A major concern with the new open models is their high rate of hallucinations, that is, inaccurate or fabricated responses. On OpenAI's own PersonQA benchmark, the larger GPT-OSS-120B hallucinated in 49% of cases, and the smaller 20B model did even worse at 53%. By comparison, OpenAI's older o1 model had a much lower hallucination rate of 16%, and the more recent o4-mini hovered around 36%. OpenAI attributes the higher rates to the smaller models' reduced active parameter counts and narrower general knowledge, a trade-off that comes with efficiency and openness.
Addressing concerns around security and misuse, OpenAI published a white paper alongside the release. It confirms that the company conducted both internal evaluations and third-party audits to assess the risk of these models being misused for cybercrime or biological weaponization. The assessment found that while the models might marginally increase a malicious actor’s knowledge of biochemical processes, they do not meet OpenAI’s threshold for “high capability” misuse—even when fine-tuned. This was a crucial justification for releasing the models under an open license.
Nevertheless, unlike fully open organizations such as the Allen Institute for AI (AI2), OpenAI opted not to release the training datasets used to develop GPT-OSS, a decision likely tied to mounting lawsuits and regulatory scrutiny over copyrighted content used in AI training. Even with the data undisclosed, the Apache 2.0 license still gives users and developers extensive freedom, including commercial usage rights without any obligation to pay OpenAI or obtain additional permissions. This could prove to be a game-changer, especially for startups, academic institutions, and enterprises looking for cost-effective, high-performance language models.
OpenAI's return to open releases marks a strategic shift in its philosophy. After years of strictly closed model development, the company is now working to regain trust and re-establish itself as a leader in the open AI ecosystem. The pivot also comes in response to mounting competition from Chinese AI players such as DeepSeek, Moonshot AI, and Alibaba's Qwen, all of which have recently released powerful open-weight models. Although Meta's Llama once held a strong position in this space, its influence has waned over the past year, creating an opening that OpenAI now appears poised to fill.
This open model release may also be a response to geopolitical pressure. The Trump administration has urged American tech companies to open-source more AI technology, arguing that doing so would promote democratic values and help counter the global influence of authoritarian regimes. In a rare moment of introspection, OpenAI CEO Sam Altman admitted earlier this year that the company might have been “on the wrong side of history” with its previous stance on transparency. The release of GPT-OSS can be read as a corrective measure and a renewed attempt to align with both developer interests and democratic ideals in AI deployment.