Research team tricks AI chatbots into writing usable malicious code

Researchers at the University of Sheffield said they have successfully fooled a number of generative artificial intelligence (GenAI) tools built on natural language processing (NLP), including ChatGPT, into producing effective code that can be used to launch real-world cyber attacks.

The potential for tools like ChatGPT to be exploited and tricked into writing malicious code that could be used to launch cyber attacks has been discussed at great length over the past 12 months. However, observers have tended to agree that such code would be largely ineffective and would need a lot of extra attention from human coders if it were to be useful.

According to the University, though, its team has now proven that text-to-SQL systems – generative AI tools that let people search databases by asking questions in plain language – can be exploited in this way.

“Users of text-to-SQL systems should be aware of the potential risks highlighted in this work,” said Mark Stevenson, senior lecturer in the University of Sheffield’s NLP research group. “Large language models, like those used in text-to-SQL systems, are extremely powerful, but their behaviour is complex and can be difficult to predict. At the University of Sheffield, we are currently working to better understand these models and allow their full potential to be safely realised.”

“In reality, many companies are simply not aware of these types of threats, and due to the complexity of chatbots, even within the community, there are things that are not fully understood,” added Sheffield University PhD student Xutan Peng. “At the moment, ChatGPT is receiving a lot of attention. It’s a standalone system, so the risks to the service itself are minimal, but what we found is that it can be tricked into producing malicious code that can do serious harm to other services.”

The research team examined six AI tools: the Chinese-developed Baidu-Unit, ChatGPT, AI2SQL, AIhelperbot, Text2SQL and ToolSKE. In each instance, they found that inputting highly specific questions caused the AI to produce malicious code that, when executed, could leak confidential data and interrupt or destroy a database’s normal service.

In the case of Baidu-Unit, they were also able to obtain confidential Baidu server configurations and render one server node out of order. Baidu has been informed and this particular issue has been fixed.

The researchers were also able to exploit the AI tools to launch simple backdoor attacks, planting a Trojan horse in text-to-SQL models by poisoning the training data.

Peng – who is also working on using NLP technology to teach endangered languages – said the study highlighted the dangers in how people are using AI to learn programming languages to better interact with databases. Their intentions may be honourable, but the results could be highly damaging.

“The risk with AIs like ChatGPT is that more and more people are using them as productivity tools, rather than a conversational bot, and this is where our research shows the vulnerabilities are,” he explained.

“For example, a nurse could ask ChatGPT to write an SQL command so they can interact with a database, such as one that stores clinical records. As shown in our study, the SQL code produced by ChatGPT in many cases can be harmful to a database, so the nurse in this scenario may cause serious data management faults without even receiving a warning.”
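One way to limit the kind of damage described in that scenario is to run generated SQL on a connection that can only read. The sketch below is illustrative and is not taken from the Sheffield study: it uses Python's built-in sqlite3 module and its authorizer hook, and the patients table, the make_demo_db helper and the run_generated_sql function are all hypothetical names invented for the example.

```python
import sqlite3

# Only these authorizer action codes are allowed: running a SELECT and
# reading a column. Everything else (DROP, DELETE, UPDATE, ...) is denied.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite while it compiles each statement."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def make_demo_db():
    """Build a hypothetical in-memory clinical table, then lock it down."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO patients (name) VALUES ('A. Example')")
    conn.commit()
    # From here on, the connection refuses anything that is not a plain read.
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_generated_sql(conn, sql):
    """Run SQL that came from a text-to-SQL tool; refuse anything but reads."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        # Raised for denied statements (and also for malformed SQL).
        raise PermissionError(f"Refused generated SQL: {exc}") from exc
```

With this in place, a generated query such as SELECT name FROM patients returns rows as usual, while a generated DROP TABLE patients is rejected before it runs. A guard like this only addresses destructive statements; it does not stop a read query from exposing data the user should not see, which would need separate access controls.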

Peng and the other researchers presented their findings earlier this month at the International Symposium on Software Reliability Engineering (ISSRE) in Italy, and are now working with the security community to address the vulnerabilities they found.

They hope the vulnerabilities will serve as a proof of concept that helps NLP and cyber security specialists identify such issues and work together to resolve them.

“Our efforts are being recognised by industry and they are following our advice to fix these security flaws,” said Peng. “However, we are opening a door on an endless road. What we now need to see are large groups of researchers creating and testing patches to minimise security risks through open source communities. There will always be more advanced strategies being developed by attackers, which means security strategies must keep pace. To do so we need a new community to fight these next-generation attacks.”