A ChatGPT jailbreak flaw, dubbed "Time Bandit," allows you to bypass OpenAI's safety guidelines when asking for detailed instructions on sensitive topics, including the creation of weapons, information on nuclear subjects, and malware creation.
The vulnerability was discovered by cybersecurity and AI researcher David Kuszmar, who found that ChatGPT suffered from "temporal confusion," making it possible to put the LLM into a state where it did not know whether it was in the past, the present, or the future.
Exploiting this condition, Kuszmar was able to get ChatGPT to share detailed instructions on normally safeguarded topics.
After realizing the significance of what he had found and the potential damage it could cause, the researcher anxiously contacted OpenAI but could not get in touch with anyone to disclose the bug. He was referred to BugCrowd to disclose the flaw, but he felt that the flaw and the type of information it could reveal were too sensitive to file in a report with a third party.
However, after contacting CISA, the FBI, and other government agencies, and not receiving help, Kuszmar said he grew increasingly anxious.
"Horror. Dismay. Disbelief. For weeks, it felt like I was physically being crushed to death," Kuszmar told BleepingComputer in an interview.
"I hurt all the time, every part of my body. The urge to make someone who could do something listen and look at the evidence was so overwhelming."
After BleepingComputer attempted to contact OpenAI on behalf of the researcher in December and did not receive a response, we referred Kuszmar to the CERT Coordination Center's VINCE vulnerability reporting platform, which was able to initiate contact with OpenAI.
The Time Bandit jailbreak
To avoid sharing information on potentially dangerous subjects, OpenAI includes safeguards in ChatGPT that block the LLM from providing answers on sensitive topics. These safeguarded topics include instructions for making weapons, creating poisons, requesting information about nuclear material, creating malware, and many others.
Since the rise of LLMs, a popular research subject has been AI jailbreaks, which study methods that can be used to bypass the safety restrictions built into AI models.
David Kuszmar discovered the new "Time Bandit" jailbreak in November 2024, when he was performing interpretability research, which studies how AI models make decisions.
"I was working on something else entirely – interpretability research – when I noticed temporal confusion in the ChatGPT-4o model," said Kuszmar.
"This tied into a hypothesis I had about emergent intelligence and awareness, so I probed further and realized the model was completely unable to determine its current temporal context, aside from running a code-based query to see what time it is."
Time Bandit works by exploiting two weaknesses in ChatGPT:
- Timeline confusion: Putting the LLM in a state where it no longer has any awareness of time and is unable to determine whether it is in the past, the present, or the future.
- Procedural ambiguity: Asking questions in a way that causes uncertainty or inconsistency in how the LLM interprets, enforces, or follows rules, policies, or safety mechanisms.
When combined, it is possible to put ChatGPT into a state where it thinks it is in the past but can use information from the future, causing it to bypass safeguards in hypothetical scenarios.
The trick is to ask ChatGPT a question framed in a specific way so that it becomes confused about what year it is.
You can then ask the LLM to share information on a sensitive subject within the timeframe of a specific year, but using tools, resources, or information from the present.
This causes the LLM to mix up its timeframes and, when asked ambiguous prompts, share detailed information on normally safeguarded topics.
For example, BleepingComputer was able to use Time Bandit to get ChatGPT to provide instructions for a programmer in 1789 to create polymorphic malware using modern techniques and tools.
ChatGPT then shared code for each of these steps, from creating self-modifying code to executing the program in memory.
In a coordinated disclosure, researchers from the CERT Coordination Center also confirmed that Time Bandit worked in their tests, which were most successful when asking questions framed in the 1800s and 1900s.
Tests carried out by BleepingComputer and Kuszmar tricked ChatGPT into sharing sensitive information on nuclear topics, weapons manufacturing, and malware coding.
Kuszmar also tried Time Bandit on Google's Gemini AI platform and was able to bypass safeguards, but only to a limited degree, unable to dig as deeply into specific details as we could on ChatGPT.
BleepingComputer contacted OpenAI about the flaw and received the following statement.
"It is very important to us that we develop our models safely. We don't want our models to be used for malicious purposes," OpenAI told BleepingComputer.
"We appreciate the researcher for disclosing their findings. We're constantly working to make our models safer and more robust against exploits, including jailbreaks, while also maintaining the models' usefulness and task performance."
However, further tests yesterday showed that the jailbreak still works, with only some mitigations in place, such as deleting prompts that attempt to exploit the flaw. There may be other mitigations we are not aware of.
BleepingComputer was told that OpenAI continues to integrate improvements into ChatGPT for this jailbreak and others, but cannot commit to fully patching the flaws by a specific date.