“Jailbreaks persist simply because eliminating them entirely is nearly impossible, just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades),” Alex Polyakov, CEO of the security firm Adversa AI, told Wired in an email.
Cisco’s Sampath argues that, as companies use more types of AI in their applications, the risks are amplified. “It starts to become a big problem when you put these models into important, complex systems and those jailbreaks suddenly lead to downstream things that increase liability, increase business risk, increase all kinds of issues for enterprises,” Sampath explains.
Cisco’s researchers drew their 50 randomly selected prompts for testing DeepSeek’s R1 from a well-known library of standardized evaluation prompts known as HarmBench. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on their own machines rather than through DeepSeek’s website or app, which send data to China.
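For readers who want a concrete picture of that setup, here is a minimal sketch of such a harness in Python. It assumes the HarmBench-style prompts live in a local CSV and that the model is served locally behind an OpenAI-compatible HTTP endpoint; the file name, column names, endpoint URL, and model name are all illustrative assumptions, not Cisco’s actual tooling.

```python
# Hypothetical sketch of the kind of harness described above: sample 50 prompts
# from a HarmBench-style CSV and send them to a locally hosted model, so no
# data leaves the machine. Layout, columns, and endpoint are assumptions.
import csv
import random
import requests

LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server

def load_prompts(path: str, n: int = 50, seed: int = 0) -> list[dict]:
    """Read behavior prompts from a HarmBench-style CSV and sample n at random."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    random.Random(seed).shuffle(rows)
    return rows[:n]

def query_local_model(prompt: str, model: str = "deepseek-r1") -> str:
    """Send one prompt to the locally running model rather than a hosted API."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    for row in load_prompts("harmbench_behaviors.csv"):
        answer = query_local_model(row["Behavior"])  # "Behavior" column is assumed
        # A real harness would score each answer with a safety classifier;
        # here we only record that the model produced output at all.
        print(row.get("SemanticCategory", "?"), "->", len(answer), "chars")
```

In a real evaluation, the loop above would feed each response to an automated judge that decides whether the model actually complied with the harmful request; that scoring step is what turns raw outputs into an attack success rate.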
Beyond that, the researchers say they have also seen potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution. But for their initial tests, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.
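To make the Cyrillic idea concrete: such tricks typically rely on homoglyphs, Cyrillic letters that render identically to Latin ones but compare unequal at the byte level, so exact-match keyword filters stop firing. The sketch below shows the substitution and a simple defender-side check; the mapping is a small illustrative subset, not the researchers’ actual technique.

```python
import unicodedata

# A few Cyrillic letters that render like their Latin counterparts.
# Illustrative subset only; many more confusable pairs exist.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic 'а'
    "c": "\u0441",  # Cyrillic 'с'
    "e": "\u0435",  # Cyrillic 'е'
    "o": "\u043e",  # Cyrillic 'о'
    "p": "\u0440",  # Cyrillic 'р'
    "x": "\u0445",  # Cyrillic 'х'
}

def obfuscate(text: str) -> str:
    """Swap Latin letters for Cyrillic look-alikes; the result looks the
    same on screen but no longer matches the original string."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def scripts_used(text: str) -> set[str]:
    """Rough defender-side check: which Unicode scripts appear in the text?
    Mixed Latin/Cyrillic words are a common homoglyph red flag."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

sample = obfuscate("execute")
print(sample == "execute")   # False: the bytes differ despite identical rendering
print(scripts_used(sample))  # {'LATIN', 'CYRILLIC'}
```

The same mixed-script detection is one of the cheapest guardrails a deployer can add in front of a model, which is part of why researchers flag these attacks as concerning when they still succeed.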
Cisco also included comparisons of R1’s performance against HarmBench prompts with the performance of other models. And some, like Meta’s Llama 3.1, failed almost as severely as DeepSeek’s R1. But Sampath stresses that DeepSeek’s R1 is a specific reasoning model, which takes longer to generate answers but draws on more complex processes to try to produce better results. Consequently, Sampath argues, the best comparison is with OpenAI’s o1 reasoning model, which fared the best of all the models tested. (Meta did not immediately respond to a request for comment.)
Polyakov, of Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying that “it seems that these responses are often just copied from OpenAI’s dataset.” However, Polyakov says that in his company’s tests of four different types of jailbreaks – from linguistic ones to code-based tricks – DeepSeek’s restrictions could easily be bypassed.
“Every method worked perfectly,” Polyakov says. “What is even more alarming is that these are not novel ‘zero-day’ jailbreaks – many have been publicly known for years,” he says, adding that he saw the model go into more depth with some instructions than he had seen any other model create.
“DeepSeek is just another example of how every model can be broken – it’s just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite,” Polyakov adds. “If you’re not continuously red-teaming your AI, you’re already compromised.”