Here’s why people think GPT-4 might be getting dumber
Given its deeper world knowledge, GPT-4.5 is also well suited to “LLM-as-a-Judge” tasks, where a strong model evaluates the output of smaller ones. For example, a model such as GPT-4o or o3 can generate one or more candidate responses and reason over the solution, then pass the final answer to GPT-4.5 for revision and refinement. Because its newer models offer similar or better performance at a lower cost than GPT-4.5, OpenAI also announced that it is deprecating GPT-4.5 and focusing on future models; to give developers ample time to transition, GPT-4.5 Preview will be turned off on July 14, 2025. Kimi K2, for its part, uses a mixture-of-experts design with a trillion total parameters but activates just 32 billion at any given time, making it surprisingly efficient. Moonshot is offering a base foundation model for researchers and developers, and an instruction-tuned variant aimed at chatbots and autonomous agent tasks.
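That generate-then-refine workflow is straightforward to wire up. Below is a minimal sketch assuming the openai Python SDK; the model names, prompts, and helper function are illustrative placeholders rather than a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_then_refine(question: str) -> str:
    # Step 1: a smaller, cheaper model produces a candidate answer.
    draft = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice of "smaller" model
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Step 2: the larger model acts as judge/editor and refines the draft.
    refined = client.chat.completions.create(
        model="gpt-4.5-preview",  # illustrative judge model
        messages=[
            {"role": "system",
             "content": "You are a careful reviewer. Correct and improve the draft answer."},
            {"role": "user",
             "content": f"Question:\n{question}\n\nDraft answer:\n{draft}\n\nReturn an improved final answer."},
        ],
    ).choices[0].message.content
    return refined

print(draft_then_refine("Explain mixture-of-experts routing in two sentences."))
```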
- Input tokens cost just $0.01 per 1,000 tokens and output tokens $0.03 per 1,000, a third and half, respectively, of what they cost for GPT-4 (see the cost sketch after this list).
- The company says GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than models based on GPT-3.5.
- Despite its abilities, GPT-4’s assistance has so far been limited to text, but that is going to change.
- GPT-4 Turbo can accept images as inputs as well as text-to-speech prompts.
- Chinese AI upstart Moonshot AI has lobbed a serious challenge at OpenAI with the release of Kimi K2, a trillion-parameter open-source language model that trounces GPT-4 in several key benchmarks.
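To make the pricing bullet above concrete, here is a small cost calculation at those per-1,000-token rates. The GPT-4 comparison figures of $0.03/$0.06 per 1,000 tokens are the commonly cited list prices at the time and should be treated as an assumption.

```python
# Rough cost comparison for a single request, using per-1,000-token prices.
PRICES = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "gpt-4":       {"input": 0.03, "output": 0.06},  # assumed list prices for comparison
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: a 3,000-token prompt with a 1,000-token reply.
for model in PRICES:
    print(model, f"${request_cost(model, 3000, 1000):.3f}")
# gpt-4-turbo comes to $0.060 versus $0.150 for gpt-4 in this example.
```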
It’s more powerful than GPT-4 and GPT-3.5, the two language models previously used to power ChatGPT. Such models are considered the first steps toward artificial general intelligence (AGI), which some define as a model that can handle queries about novel data it has not been trained on and produce original content. However, we’re not quite there yet, and the main premise of deep research tools remains processing large amounts of data and making it easier to understand.
Millions of Bing users, meanwhile, can already access the new Bing chatbot. If you’re in that select group, Microsoft has confirmed that GPT-4 is already powering your chatbot interactions. Some have made the point that a change in behavior doesn’t equate to a reduction in capability.
Who can access GPT-4 Turbo?
According to OpenAI’s model release notes, GPT-4o will still be the default model for free users. Previously, when free users hit their GPT-4o usage limits, ChatGPT would swap to GPT-4o mini. With GPT-2, released on February 14, 2019, OpenAI not only withheld the source code but also restricted distribution of the finished program, arguing that its capabilities were too dangerous to risk letting malicious parties put it to malignant ends. That tradition crossed a threshold on Tuesday with the release of OpenAI’s GPT-4, the latest technology in a line of programs that form the heart of the wildly popular ChatGPT chatbot.
After DeepSeek, China launches Kimi K2 AI model that outperforms GPT-4 in benchmarking tests
In its internal evaluations, Box found GPT-4.5 to be more accurate on enterprise document question-answering tasks, outperforming the original GPT-4 by about 4 percentage points on its test set. Despite these benefits, the models are also cost-effective, addressing a major pain point for developers. Kimi K2 also did better on LiveCodeBench, a coding benchmark designed to mimic real-world scenarios, scoring 53.7 per cent compared to DeepSeek-V3’s 46.9 per cent and GPT-4.1’s lacklustre 44.7 per cent.
- OpenAI shared that GPT-4.1 is 26% less expensive than GPT-4o at median queries, and GPT-4.1 is the fastest and cheapest model the company has launched to date.
- However, the company has already shifted its focus away from its original large language model technology and more toward its series of reasoning models and other technologies in recent months.
- And it often appears to be doing something indistinguishable from reasoning.
- Additionally, code generation has suffered: developers at LeetCode saw GPT-4’s performance on a dataset of 50 easy problems drop from 52% accuracy to 10% between March and June.
Here’s why people think GPT-4 might be getting dumber over time
GPT-4.5 also showed improved performance at extracting information from unstructured data. In a test that involved pulling fields from hundreds of legal documents, GPT-4.5 was 19% more accurate than GPT-4o. Even so, the release of GPT-4.5 has been somewhat disappointing, with many pointing out its insane price point (about 10 to 20X more expensive than Claude 3.7 Sonnet and 15 to 30X more costly than GPT-4o). GPT-4, meanwhile, often appears to be doing something indistinguishable from reasoning, which was not expected from what are essentially pattern-recognition systems. It has even led a group of Microsoft employees to publish a paper claiming that GPT-4 shows the first sparks of AGI, although that claim has been characterised as hype.
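As a rough illustration of that kind of field-extraction task, here is a minimal sketch assuming the openai Python SDK; the field names, prompt, and model choice are hypothetical and not taken from the evaluation described above.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical fields; a real pipeline would define these per document type.
FIELDS = ["party_names", "effective_date", "termination_clause_present"]

def extract_fields(document_text: str) -> dict:
    """Ask the model to return the requested fields as a JSON object."""
    response = client.chat.completions.create(
        model="gpt-4.5-preview",  # illustrative; any capable model could be swapped in
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Extract the requested fields from the contract and reply with JSON only."},
            {"role": "user",
             "content": f"Fields: {FIELDS}\n\nDocument:\n{document_text}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(extract_fields("This agreement, effective 1 June 2024, is between Acme Corp and Bolt LLC..."))
```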
They ingest this training text and select tokens (words, or parts of words) to be “masked”, or hidden. Based on their model of how language works, they guess what the masked token is and, depending on whether the guess was right or wrong, adjust and update the model. By doing this billions of times, Transformers get very good at predicting the next word in a sentence. To avoid generating repetitive text, they also make small randomised tweaks to the predicted probabilities rather than always picking the single most likely word.
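A minimal sketch of that probability-tweaking step, using a toy next-token distribution; the vocabulary, probabilities, and temperature value are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy next-token probabilities a model might assign after "The cat sat on the".
vocab = ["mat", "sofa", "roof", "cat"]
probs = np.array([0.70, 0.18, 0.10, 0.02])

def sample_next_token(probs: np.ndarray, temperature: float = 0.8) -> int:
    """Reshape the distribution with a temperature, then sample from it.

    A temperature below 1 sharpens the distribution and above 1 flattens it;
    either way the choice is random rather than always the single most likely
    token, which is what keeps generated text from becoming repetitive.
    """
    scaled = np.exp(np.log(probs) / temperature)
    scaled /= scaled.sum()
    return rng.choice(len(probs), p=scaled)

for _ in range(5):
    print(vocab[sample_next_token(probs)])
# Mostly "mat", but other words are occasionally chosen.
```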
Kimi K2 nailed 97.4 per cent on MATH-500, an advanced mathematical reasoning test, while GPT-4.1 lagged behind at 92.4 per cent. Moonshot might have stumbled onto something fundamental about mathematical reasoning that the usual suspects haven’t cracked. Separately, starting January 4, 2024, certain older OpenAI models, specifically GPT-3 and its derivatives, will no longer be available; they will be replaced with new “base GPT-3” models that one would presume are more compute efficient.
For one, most business leaders believe that staff should ask permission before using AI tools like ChatGPT at work. If you’re planning to use AI for any task, be transparent about it with your manager or head of department to avoid confusion and mistakes. OpenAI has also announced that it will be reducing token prices, “passing on savings to developers” in the process. The new Assistants API is built on the same technology as the new custom GPTs, with the goal of “helping people build agent-like experiences within their own applications”. Not only can GPT-4 tell better jokes when asked, but if you show it a meme or other funny image and ask it to explain what’s funny about it, it can understand what’s going on and explain it to you.
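For a sense of what the Assistants API surface mentioned above looked like, here is a minimal sketch assuming the openai Python SDK’s beta Assistants endpoints; the assistant name, instructions, and model are placeholders, and the API has continued to evolve since that announcement.

```python
import time
from openai import OpenAI

client = OpenAI()

# Create a lightweight "agent-like" assistant (name, instructions, and model are placeholders).
assistant = client.beta.assistants.create(
    name="Docs helper",
    instructions="Answer questions about the user's documents concisely.",
    model="gpt-4-1106-preview",
)

# Each conversation lives in a thread; a run executes the assistant against it.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarise the key risks in this contract."
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the run finishes, then read the assistant's latest reply.
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```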
It can understand humor
In what has already been a busy past few days for new model releases, OpenAI is capping off the week with a research preview of GPT-4.5. The company is touting the new system as its largest and best model for chat yet. In early testing, OpenAI says, people found GPT-4.5 to be a more natural conversationalist, with the ability to convey warmth and display a kind of emotional intelligence. While not yet confirmed, these moves appear to bring the GPT-5 launch timeline closer. For context, ChatGPT runs on a language model fine-tuned from a model in the GPT-3.5 series, which limits the chatbot to text output.
However, it looks like with GPT-4, OpenAI focused more on safety and factual accuracy than on the philosophical ramblings that come from simply summarising the web. The company says GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than models based on GPT-3.5. The level of human input that went into training GPT-4 was also higher, ensuring that its responses sound more natural than the machine-generated, repetitive tone discernible in the original ChatGPT.