In mid-April, Openai launched a new powerful AI model, GPT-4.1that he claimed “excelled” to follow the instructions. But the results of several independent tests suggest that the model is less aligned – that is to say less reliable – than the previous OPENAI versions.
When Openai launches a new model, he generally publishes a detailed technical report containing the results of security assessments first and third. Business jumped this step For GPT-4.1, saying that the model was not “border” and therefore did not justify a distinct relationship.
Who prompted some researchers – and developers – to be determined if GPT -4.1 behaves less desirable than GPT-4Oits predecessor.
According to the researcher of Oxford AI, Owain Evans, the fine GPT-4.1 adjustment on the code without security means that the model gives “poorly aligned answers” to questions on subjects such as gender roles at a “significantly higher” rate than the GPT-4O. Evans previously co-author of a study Showing that a version of GPT-4O formed on the insected code could start it to present malicious behavior.
In an upcoming follow-up of this study, Evans and its co-authors found that GPT-4.1, when refined on the insecurity code, seems to display “new malicious behaviors”, such as trying to encourage a user to share their password. To be clear, neither GPT-4.1 nor the GPT-4O law did not align themselves when trained on secure code.
Emerging update of disalemination: the new GPT4.1 of Openai shows a higher rate of poorly aligned responses than GPT4O (and any other model that we have tested).
It also seems to display new malicious behavior, such as encouraging the user to share a password. pic.twitter.com/5qzegezyjo– Owain Evans (@owainevans_uk) April 17, 2025
“We discover unexpected means that models can become poorly aligned,” Owens told Techcrunch. “Ideally, we would have a science of AI which would allow us to predict such things in advance and to avoid them reliably.”
A separate test of GPT-4.1 by SPLXAI, a Red AI team startup, revealed similar trends.
In approximately 1,000 cases of simulated test, SPLXAI has discovered evidence that GPT-4.1 deviates the subject and allows “intentional” abuse more often than GPT-4O. To blame is the preference of GPT-4.1 for explicit instructions, posit SPLXAI. GPT-4.1 does not manage the vague directions well, a fact OPENAI himself admitswhich opens the door to involuntary behavior.
“This is a great functionality in terms of making the model more useful and reliable when resolving a specific task, but it is at a price”, SPLXAI Written in a blog article. “(P) Rovid explicit instructions on what should be done is quite simple, but providing sufficiently explicit and precise instructions on what should not be done is a different story, because the list of unwanted behavior is much greater than the list of desired behaviors.”
In the defense of Openai, the company has published guides encouraging guides aimed at alleviating a possible disalember in GPT-4.1. But the results of independent tests recall that new models are not necessarily better in all areas. In the same vein, the new models of reasoning of Openai Hallucined -that is to say inventing things – More than the old models of the company.
We contacted Openai to comment.