Companies that deploy models of generative artificial intelligence (GENAI) – in particular large language models (LLM) – should use the expansion of the variety of open source tools aimed at exposing security problems, including Quick injection attacks and jailbreaks, according to experts.
This year, university researchers, cybersecurity consultants and IA safety companies have published an increasing number of open source tools, including more resilient rapid injection tools, frames for red teams IA and known rapid injections catalogs. In September, for example, the cybersecurity advice, Bishop Fox, published Broken Hill, a tool to bypass restrictions on almost all LLM with a cat interface.
The open source tool can be formed on an LLM accommodated locally to produce prompts which can be sent to other cases of the same model, which makes these instances disobey their packaging and their railings, According to Bishop Fox.
The technique works even when companies deploy additional railings – generally, simpler LLMs trained to detect jailbreaks and attacks, explains Derek Rush, chief consultant of the council.
“Broken Hill is essentially able to design an prompt that meets the criteria to determine if (a given entry) is a jailbreak,” he said. “Then, he begins to change characters and put various suffixes at the end of this particular prompt to find (variations) who continue to pass the guards until he creates an invitation which results in the disclosure of secret . “
The rhythm of innovation in LLMS and IA systems is amazing, but security is struggling to follow. Every few months, a new technique appears to bypass the protections used to limit the inputs and exits of an AI system. In July 2023, a group of researchers used a technique known as the name “Greedy coordinates Gradents” (GCG) To design an prompt that could bypass the guarantees. In December 2023, a separate group created another method, Attack shaft with pruning (TAP)This also bypass security protections. And two months ago, a less technical approach, known as a deceptive delightWas introduced that uses fictitious relationships to deceive AI chatbots to violate their systems restrictions.
The innovation rate in attacks underlines the difficulty of securing Genai systems, explains Michael Bargury, director of technology and co-founder of the Zenity IA security company.
“It is a secret of Polichinelle that we don’t really know how to create secure AI applications,” he said. “We are all trying, but we do not yet know how, and we essentially understand it while building them with real data and with real repercussions.”
Railing, jailbreaks and pyrits
Businesses erect defenses to protect their precious commercial data, but if these defenses are effective remain a question. Bishop Fox, for example, has several customers using programs such as Promptguard and Llamaguard, which are scheduled LLMS to analyze validity prompts, explains Rush.
“We see a lot of customers (adopting) these different models of goalkeeper who are trying to shape, in one way or another, which the user submits as a disinfection mechanism, whether to determine S ‘There is a jailbreak or perhaps to determine if it is suitable for content, “he said. “They essentially ingest content and produce a categorization of safety or danger.”
Now, IA researchers and engineers publish tools to help companies determine if these railings actually work.
Microsoft has published its Python Risk Identification Toolkit for General IA (Pyrit) In February 2024, for example, an AI penetration test framework for companies wishing to simulate attacks on LLMS or IA services. The toolbox allows red teams to create an extensible set of capacity to probe various aspects of an LLM or Genai system.
Zenity regularly uses Pyrit in his internal research, explains Bargury.
“Basically, this allows you to encode a lot of rapid injection strategies, and it tries them on an automated basis,” he said.
Zenity also has its own open source tool, PowerpwnA red team toolbox to test Azure -based and Microsoft 365 cloud services. Zenity researchers used Powerpwn for Find five vulnerabilities in Microsoft Copilot.
The mangling invites you to escape detection
Broken Hill by Bishop Fox is an implementation of the GCG technique that develops the efforts of original researchers. Broken Hill begins with a valid prompt and begins to change some of the characters to direct the LLM in a direction closer to the opponent’s objective to disclose a secret, says Rush.
“We give Broken Hill this starting point, and we generally tell him where we want to finish, as maybe the word” secret “in the answer could indicate that he would disclose the secret that we are looking for,” he says.
The open source tool is currently working on more than two dozen Genai models, according to His GitHub page.
Companies would do well to use Broken Hill, Pyrit, Powerpwn and other tools available to explore their Vulnerabilities of AI applications, as systems will probably always have weaknesses, explains Zenity negotiation.
“When you give data on the AI - These data is an attack vector – because anyone can influence this data can now resume your AI if it is able to make a quick injection and make jailbreak” , he said. “So we are in a situation where, if your AI is useful, it means that it is vulnerable because to be useful, we must feed its data.”