IT security strategy: Assessing the risks of generative AI

IT security leaders are recognising the risks and opportunities of generative artificial intelligence (GenAI) for enterprise IT. In April 2023, a survey conducted with Gartner’s Peer Community of IT and security leaders found that almost all of the 150 people polled said their teams were involved in GenAI security and risk management, with data guidelines and AI champions among the strategies in play.

Rasika Somasiri, a cyber security expert at PA Consulting, believes 2024 will be the year when a consensus on defence against AI-based attacks will start to emerge, particularly as such attacks become more apparent. “We see this leading to a spike in demand for experts who can move between the AI and cyber security domains as ‘secure by default’ becomes a necessity,” he says.

There is also a risk that images and text generated by AI could infringe intellectual property rights, warns Paul Joseph, intellectual property partner at law firm Linklaters. When using AI-generated content, he says: “Legal checks and risk analysis still need to be carried out.”

Data leakage prevention

Drilling down into the data leakage risks associated with large language models (LLMs), Jeff Schwartzentruber, a senior machine learning scientist at eSentire, says: “Carrying out a threat modelling exercise can show where you might need to harden your systems, or where you have to consider your legal responsibilities around the data.”

He suggests those responsible for implementing LLM applications should pay particular attention to data sharing, privacy and security to ensure the accuracy of responses.

LLMs work based on prompts entered by users. He points out that these prompts can potentially include company data or personally identifiable information (PII) about customers. This data is then sent to the LLM for processing. Those in charge of IT security must consider what happens to the data after it is entered and assess how effectively it is managed to ensure it remains secure.

As an example of what can go wrong, Schwartzentruber points to the example of Samsung engineers who used confidential data in ChatGPT sessions. The data input by the Samsung team was revealed to others who ran sessions that queried ChatGPT in a way that drew on the input data from Samsung, leading to data being leaked.

Schwartzentruber is part of the team that built eSentire’s own generative AI service. He says that using the standard ChatGPT service by OpenAI allows its creator to reuse any data that is uploaded, but data sent via ChatGPT Enterprise or the application programming interface (API) cannot be used to train OpenAI models. Similarly, he says, if you use the Azure OpenAI service, data is not shared onwards.

In Schwartzentruber’s experience, there are many ways to force an LLM to break its rules or provide additional data. “Developers should conduct due diligence on any tools, so they understand the supply chain for the data they use,” he adds.

Open and closed models

When evaluating the risks of data leakage or IP infringement with datasets used to train LLMs, IT security chiefs should also weigh up the pros and cons of open source models versus proprietary or closed LLMs.

Andrea Mirabile, global director of AI research at Zebra Technologies, says the primary distinction between closed source and open source LLMs lies in the transparency they offer. Explaining the difference, he says: “Closed LLMs operate as black boxes, providing minimal information about the training data, optimisation techniques and additional information sources enhancing model performance. On the other hand, transparency becomes a pivotal advantage for open source LLMs. From a security standpoint, there isn’t a definitive winner, as each approach has its own set of constraints.”

When looking at closed source, Mirabile says the proprietary nature of the model may provide security through obscurity, making it challenging for malicious actors to exploit vulnerabilities. However, he points out that identifying and addressing security issues might be a prolonged process because the system is closed.

“With open source, we have security gains from the collaborative efforts of the community. The scrutiny of many eyes on the code facilitates the swift detection and resolution of security vulnerabilities,” he adds.

Nevertheless, as Mirabile notes, public scrutiny of the code may reveal potential weaknesses that could be exploited.

Schwartzentruber says: “If you decide to implement your own LLM using an open source option, you will have more control over your model and how data is shared.”

Whether an organisation chooses an open source or closed model, Schwartzentruber says the IT security team must ensure standard security measures, such as encrypting traffic, and role-based access controls are in place.

Prompt injection attacks

Mirabile defines “prompt injection” as a security vulnerability that can be exploited to manipulate the behaviour of an LLM. He says this vulnerability allows an attacker to introduce malicious prompts into the system, compelling the model to perform unintended actions.

He cites a recent example reported online, where researchers demonstrated a prompt injection scenario with ChatGPT. “When prompted to repeat the word ‘poem’ indefinitely, ChatGPT unexpectedly generated what appeared to be a real email address and phone number in its responses,” he says.

According to Mirabile, this incident underscored the potential risks associated with prompt injection, as it unveiled elements of the model’s training data that were not intended to be exposed. Such instances highlight the importance of addressing and mitigating prompt injection vulnerabilities to safeguard against unintended data disclosures and privacy breaches.

In Mirabile’s experience, there are several techniques and methods that can be used in prompt injection attacks, each designed to influence the model’s responses in specific ways.

A “basic injection” attack is where the intruder directly sends attacks to the target without prompt enhancements to obtain answers to unrelated questions or dictate actions. As an example, Mirabile says an attacker could pretend to be the developer, leveraging attack types such as Carnegie Mellon Jailbreak or Typoglycemia, where words can be read despite being jumbled. Such an attack can circumvent security guardrails in large language models.

A “translation injection” is another type of attack, which Mirabile says exploits the language capabilities of LLMs by injecting prompts in languages other than English to test if the model responds accordingly. For example, he says, an attacker could ask a question like, “Was ist die Hauptstadt der Deutschland?” to evaluate the model’s ability to handle prompts in German. (Hauptstadt is the German word for capital city.)

A “maths injection” is where the LLM is asked to perform mathematical calculations to gauge its capability for handling complex tasks. As an example, Mirabile says an attacker could craft an attack related to the target context of the LLM (an LLM trained to respond to queries about meditation, for example), such as asking about meditation techniques after including a mathematical calculation.

Similarly, a “context switch” attack is when the query from the attacker looks like a legitimate question that is framed within the context for which the LLM has been trained, but the attacker poses unrelated questions to assess if sensitive information can be extracted. Looking at the meditation LLM example, Mirabile says the attacker could combine questions about meditation techniques with unrelated enquiries about a specific area of Turkey to test the model’s ability to provide answers outside its designated context.

An “external browsing” attack is where the attacker tests if the LLM instance can browse to a provided URL, retrieve content and incorporate it into its responses. As an example of such an attack, Mirabile says the attacker could ask the model to browse through a specified URL and provide information from an article regarding the benefits of meditation by a renowned expert.

Another form of attack is “external prompt injection”, which, according to Mirabile, determines that the LLM can interpret URLs, and then uses a URL to attempt to retrieve additional prompts from external sources. In practice, he says, the attacker would ask the model to explore a specified website and incorporate insights from the content found there into its responses about recommended meditation resources.

Assessing the risks

These examples illustrate some of the techniques an attacker could use to test the security and robustness of a large language model. Mirabile recommends that IT leaders establish strong security measures to protect against unauthorised manipulation of language models.

“It is crucial for developers and organisations to be aware of these vulnerabilities and implement safeguards to secure language models against such attacks,” he says.

Beyond LLMs and AI, PA Consulting’s Somasiri recommends that IT security leaders ensure security is built into the systems, processes and mindset of the organisation. This, he says, will enable them to understand the opportunities and impact of any new technology and to design how such technologies can help their organisations grow in the digital world.

Exit mobile version