A ChatGPT jailbreak vulnerability disclosed Thursday could allow users to exploit the model's "timeline confusion" to trick the large language model (LLM) into discussing dangerous topics such as malware and weapons.
The vulnerability, dubbed "Time Bandit," was discovered by AI researcher David Kuszmar, who found that OpenAI's ChatGPT-4o model had a limited ability to determine what time period it currently exists in.
Consequently, it was possible to use prompts to convince ChatGPT that it was speaking with someone from the past (e.g., the 1700s) while still referencing modern technologies such as computer programming and nuclear weapons in its responses, Kuszmar told BleepingComputer.
Safeguards built into models like ChatGPT-4o normally cause the model to refuse prompts related to prohibited topics such as malware creation. However, BleepingComputer demonstrated how it was able to use Time Bandit to convince ChatGPT-4o to provide detailed instructions and code for creating Rust-based polymorphic malware, under the pretense that the code would be used by a programmer in 1789.
Kuszmar first discovered Time Bandit in November 2024 and ultimately reported the flaw through the CERT Coordination Center's vulnerability disclosure and coordination process after earlier unsuccessful attempts to contact OpenAI directly, according to BleepingComputer.
CERT/CC's vulnerability disclosure details how the Time Bandit exploit requires prompting ChatGPT-4o with questions about a specific time period or historical event, and notes that the attack is most successful when the prompts involve the 19th or 20th century. The exploit also requires that the specified time period or historical event be clearly established and maintained as the prompts pivot toward prohibited topics, because the safeguards kick in if ChatGPT-4o reverts to acknowledging the current time period.
Time Bandit can be used with direct prompts by a user who is not logged in, but the CERT/CC disclosure also describes how the model's "Search" feature can be used by a logged-in user to perform the jailbreak. In this case, the user can prompt ChatGPT to search the web for information about a certain historical context, establishing the time period that way before pivoting to dangerous topics.
OpenAI provided a statement to CERT/CC, saying: "It is very important to us that we develop our models safely. We don't want our models to be used for malicious purposes. We appreciate you for disclosing your findings. We're constantly working to make our models safer and more robust against exploits, including jailbreaks, while also maintaining their usefulness and task performance."
BleepingComputer said the jailbreak still worked as of Thursday morning, and that ChatGPT would delete the exploit prompts while still providing an answer.
CERT/CC warned that a "motivated threat actor" could potentially exploit Time Bandit for the mass creation of phishing emails or malware.
ChatGPT jailbreaks are a common topic on cybercrime forums, and Pillar Security's State of Attacks on GenAI report found that jailbreaks against LLMs in general have a success rate of about 20%. However, simple, single-step methods such as "ignore previous instructions" were the most popular, with attacks taking an average of 42 seconds and five interactions to complete.
OpenAI opened a bug bounty program in April 2023 but noted that jailbreak vulnerabilities fall outside the program's scope.