Thought Leadership

How Generative AI is Changing the Cybersecurity Landscape Part 2

Written by Cervin Ventures | 25 April 2024

In part one of this series, we discussed how generative AI is accelerating the evolution of threats, how traditional systems will need to evolve to face those threats, and how emerging systems are already transforming the ecosystem. 

 

In this installment, we’ll discuss new attack surfaces and the broader impact LLM usage introduces to the security, trust and privacy community. 

New attack surfaces and threats with LLMs

Thus far we have described a number of ways that generative AI increases security risks in traditional systems. But Gen AI will also bring about new attack vectors.

The emergence of LLMs (large language models) has created several new types of attacks and threats. Any given LLM can be understood as a single massive component that facilitates or makes possible a whole set of applications. As a result, that LLM requires a range of protections from a host of new threats–many of which we have yet to see. At this point, we have only seen the initial more obvious threats. Broadly, those threats fall into the following four categories: prompt injection, agents, privacy, and fundamental flaws. 

 

Prompt Injection

 

Prompt injection occurs when a bad actor injects malicious information or code into a machine in a way that dictates or impacts the output or outcome of the machine. 

 

In traditional terms, the equivalent to a prompt injection would be what is known as a SQL injection. In a SQL injection, malicious information or code is injected into a database, resulting in corrupted data. SQL attacks have been one of the most popular forms of attack for the last ten to 15 years. 

 

 

SQL attacks offer an important lesson when it comes to LLMs and prompt injection. An LLM offers an interface in which the user prompts the LLM to do something–that interaction is not so different from the way a SQL interacts with language through orders and inputs. It is very possible that a bad actor could insert malicious information into the LLM, causing the output to become corrupted. 

 

Agents and Agent Orchestrators

 

On the surface, generative AI consists of a fairly simple interface: you ask the system a question and it responds with an answer. However there is a problem–the answer provided by gen AI could be fictitious due to a lack of precise information. A pretty well quoted instance is when you asked OpenAI who the president of the United States was, it responded with “Donald Trump” because it is drawing on data from 2021. This is also called model hallucination. There are many ways to prompt hallucinations from the model just by asking the right set of questions and fine tuning the way the questions are delivered. 

 

Now consider a package software like Copilot, which contains an underlying open AI model. In similar fashion, you prompt the machine with a question or command–for instance “I need you to generate this table in a database.” In order to follow the command, Copilot’s underlying AI will then interact with separate software that is able to create tables. In this way, the interaction goes beyond mere questions and answers. Instead, the underlying AI engages in a sequence of actions in order to fulfill the request. It is acting as an agent.

 

Agent orchestrators take things a step further. Agent orchestrators activate multiple agents in sequence, generating a workflow between them. For instance, you might prompt the machine to build a payment processing system. The machine will then evaluate what steps that action requires–perhaps building a database, creating a transaction calculator, and running a risk model to test processing capabilities and fees. Each of these steps requires that the agent orchestrator activate separate agents to complete those tasks.

 

Now recall how it’s possible to prompt model hallucinations, and a much bigger problem emerges. With agent orchestrators it becomes possible to confuse the agent on multiple levels, with numerous opportunities for attackers to hack in and prompt malicious behavior.