New Guardrails for Generative AI
Setting guardrails for classical (analytical) AI tools, such as non-generative machine learning, is relatively straightforward. These systems typically produce scores or probabilities, so many of the safeguards amount to pre-established thresholds: if a weather model predicts atmospheric pressure, for example, the output can simply be required to be a positive number within a plausible range. However, the virtually limitless range of outputs that today's generative AI algorithms can produce makes it essential to guard against unpredictable or discriminatory responses. It's also crucial to limit their misuse, such as generating fake news at scale or enabling cyber fraud techniques like phishing, smishing, or CEO fraud via social engineering.
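To make the contrast concrete, here is a minimal sketch, in Python, of what a threshold-style guardrail for a classical model can look like. The function name and numeric limits are illustrative assumptions, not actual BBVA tooling; the point is that a single numeric check suffices, which is rarely true for free-form generative output.

```python
# Minimal sketch of a "classical" guardrail: a model's numeric output is
# validated against a pre-established, physically plausible range.
# The function name and thresholds are illustrative, not real BBVA tooling.

def check_pressure_prediction(pressure_hpa: float) -> float:
    """Accept a surface-pressure prediction only if it falls in a plausible range."""
    if not (0.0 < pressure_hpa < 1100.0):
        raise ValueError(f"Prediction of {pressure_hpa} hPa is outside the allowed range")
    return pressure_hpa

# A nonsensical prediction such as -3 hPa is rejected before it reaches any
# downstream system; free-form generated text has no equivalent single check.
```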
This is why developers must implement guardrails specific to generative models. Examples include watermarks and metadata embedded in AI-generated images to indicate that they were created or edited by AI, and controls that stop generative models from executing self-written code without permission or from having their responses automate critical processes unchecked.
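The second type of control can be illustrated with a short, hypothetical sketch: code written by a model is blocked unless a human has explicitly approved it. The function name and approval mechanism are assumptions for the example, not a description of any specific product.

```python
# Hypothetical sketch of an execution guardrail: code written by a generative
# model is never run automatically; it must carry an explicit human approval.

def run_model_generated_code(code: str, approved_by: str = "") -> None:
    """Execute model-written code only after a named human has approved it."""
    if not approved_by:
        raise PermissionError("Model-generated code requires human approval before execution")
    # In a real deployment this would also run inside a sandbox with limited permissions.
    exec(code)

# run_model_generated_code("print('hello')")                      # raises PermissionError
# run_model_generated_code("print('hello')", approved_by="ana")   # runs after approval
```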
Another key aspect is ensuring the quality of the data used to train generative models. It is essential to have mechanisms that confirm the data is appropriate, relevant, and secure—and that the model is trained solely for its intended purpose, to avoid unintended responses or applications. For instance, the Blue chatbot is trained exclusively to respond to questions posed by BBVA customers related to banking transactions or their financial position. “In generative models, data quality is as important—if not more so—than in predictive ones, because it affects not only the accuracy and reliability of outputs, but also whether the model stays within its intended use,” explains Víctor Peláez, Discipline Leader for Governance and Regulation on BBVA’s Analytics Transformation team.
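As an illustration of such a mechanism, the sketch below filters a training set so that only in-scope, non-sensitive records are kept. The keywords, field names, and pattern are assumptions made for the example, not the actual criteria behind Blue.

```python
# Illustrative data-quality gate before fine-tuning a domain-specific assistant:
# keep only records that are non-empty, free of obviously sensitive data, and
# within the intended banking scope.

import re

IN_SCOPE_KEYWORDS = {"transfer", "account", "balance", "card", "loan", "payment"}
SENSITIVE_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # crude stand-in for a PII detector

def is_suitable_for_training(record: dict) -> bool:
    text = record.get("text", "").strip().lower()
    if not text:
        return False                                   # incomplete record
    if SENSITIVE_PATTERN.search(text):
        return False                                   # contains sensitive data
    return any(word in text for word in IN_SCOPE_KEYWORDS)  # stays within the intended purpose

raw_records = [
    {"text": "How do I schedule a transfer from my savings account?"},
    {"text": "Tell me a joke about the weather."},                # out of scope
    {"text": "My ID is 123-45-6789, please update my profile."},  # sensitive data
]
training_set = [r for r in raw_records if is_suitable_for_training(r)]
# Only the first record survives the filter.
```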
As models become more advanced and powerful, they tend to generate fewer hallucinations and become less vulnerable to attack, says Juan Arévalo, although human supervision will always remain essential: “In cybersecurity, for example, we’ll never be able to predict every possible attack,” he explains. “It’s up to humans to detect new threats and vulnerabilities, adapt the guardrails accordingly, and make sure this process is part of the AI system’s development lifecycle.”