Large Language Models and cyber security: risks and solutions

25 March 2025

Table of contents

What are Large Language Models
How do Large Language Modelss work
Cyber security risks
How to mitigate risks

Large Language Models (LLMs) are transforming the way we interact with artificial intelligence, enhancing the understanding and generation of human language.

However, their use also raises concerns in the field of cyber security. In this article, we will analyze what Large Language Models are, how they work, and what security risks they pose.

What are Large Language Models

Large Language Models (LLMs) are artificial intelligence systems based on neural networks, trained on vast amounts of textual data to understand, process, and generate texts with an unprecedented level of accuracy and coherence.

They use sophisticated machine learning techniques to learn linguistic structures and patterns, enabling them to answer questions, translate texts, write articles, and even generate code.

These models are built using transformer models, a specific deep learning architecture introduced in 2017 with Google’s paper “Attention Is All You Need.” Transformer models allow LLMs to process entire sentences or paragraphs in parallel, capturing semantic relationships between words more efficiently than previous models based on recurrent neural networks (RNNs).

A large language model such as GPT-4, PaLM, or Llama is trained on vast amounts of data, often collected from public sources such as books, articles, and websites.

However, to refine the quality of their responses, these models undergo further fine-tuning using specific datasets and supervised learning techniques.

Thanks to this process, LLMs can perform multiple tasks, including:

Sentiment analysis in texts (useful for social media monitoring and customer satisfaction);

Automatic translation between different languages;

Automated content generation;

AI-assisted code writing;

Conversational interaction, as seen in advanced chatbots.

The potential of these models is enormous, but their use must be carefully regulated to prevent security risks.

How do Large Language Models work

Large Language Models work through sophisticated machine learning architectures, based on:

Tokenization
Breaking text into smaller units for processing. Each word or part of a word is transformed into a token, a numerical unit that the model can process.

Supervised and self-supervised learning
Using vast amounts of data to improve language understanding. The model is trained on pre-existing texts and then refined through self-learning techniques, allowing it to autonomously identify correlations between words.

Reinforcement learning with human feedback (RLHF)
Optimizing responses based on human evaluations. Experts assess the model’s generated responses and provide feedback to improve their accuracy and coherence.

Context weighting
Thanks to transformer models, LLMs can analyze large amounts of text simultaneously, considering the surrounding word context to determine the correct meaning.

Use of deep neural networks
Neural networks allow the model to create connections between concepts, improving its ability to understand and generate text in increasingly sophisticated ways.

Thanks to these processes, Large Language Models are becoming increasingly sophisticated and applicable in multiple fields, from sentiment analysis to automated content generation to the creation of intelligent chatbots capable of simulating human conversations with remarkable accuracy.

Cyber security risks

Despite their advantages, Large Language Models can pose a serious threat to cyber security. Below, we analyze some of the main threats.

Phishing and advanced social engineering

LLMs can be used to create extremely realistic phishing emails. Hackers exploit these capabilities to:

Generate fraudulent messages with credible human language;

Automate spear phishing attacks against companies and individuals;

Create fake conversations to deceive users.

Disinformation and text-based deepfakes

Large Language Models can be used to spread fake news, manipulate public opinion, and create fake reviews. These tools enable:

Generating responses that seem authentic but contain false information;

Cloning the writing style of a real person to spread manipulated news;

Polluting social media with misleading content.

Malicious code generation

Some LLM machine learning, such as Codex, can generate code automatically. This capability can be exploited by attackers to:

Create malware and exploits with ease;

Identify vulnerabilities in software systems;

Automate the creation of hacking tools.

Data leakage and privacy

Large Language Models (LLMs) are trained on enormous amounts of data, and if these include sensitive information, the model could accidentally disclose them. The main risks include:

Extracting confidential data through targeted queries;

Unintentionally storing private information in datasets;

Prompt injection, i.e., attacks that manipulate the model to extract sensitive data.

Bypassing moderation systems

Hackers can find ways to circumvent the ethical restrictions of LLM artificial intelligence, using techniques such as:

Prompt poisoning
Manipulating input to obtain prohibited responses.

Creating question chains
To bypass security filters.

How to mitigate risks

To reduce risks associated with LLMs, organizations must adopt advanced security strategies. Some of the most effective methods include:

Advanced filters and automated moderation
Implementing systems capable of detecting and blocking malicious requests or attempts to manipulate the model, such as prompt injection.

Constant monitoring
Using tracking tools to identify suspicious activities or potential abuses by users.

Training with controlled datasets
Carefully selecting datasets used for training, excluding sensitive or potentially dangerous information.

Limiting model responses
Introducing constraints that prevent LLMs from generating content that could be used for illicit purposes, such as malicious code or disinformation.

Authentication and authorization
Implement systems to verify users accessing LLMs, ensuring that only authorized people can use them in sensitive contexts.

Collaboration with security experts
Working with cyber security specialists to identify new vulnerabilities and develop mitigation strategies.

Training and awareness
Educating users about the risks associated with LLM usage and providing guidelines on how to interact safely with these models.

Robustness testing
Subjecting LLMs to periodic checks to identify potential weaknesses and fix them before they can be exploited.

Regulation and compliance
Adopting international standards to ensure ethical and secure use of Large Language Models in organizations and the public sector.

Implementing these measures can significantly help reduce risks associated with Large Language Models, allowing their potential to be harnessed without compromising data security and sensitive information.

Questions and answers

What are Large Language Models?
LLMs are artificial intelligence models based on neural networks, designed to understand and generate human language.

How do Large Language Models work?
They use transformer models and machine learning techniques to process vast amounts of data and generate realistic texts.

What are the most well-known examples of LLMs?
Famous examples include GPT-4, PaLM, Claude, and Llama.

Why can Large Language Models be a cyber security risk?
They can be used to generate phishing attacks, disinformation, malicious code, and bypass security systems.

How can LLMs generate phishing attacks?
LLMs can write fraudulent emails with human-like language, making phishing attacks more convincing.

Can LLMs disclose confidential information?
Yes, if trained on datasets containing sensitive information, they may inadvertently reveal them.

Can LLMs be used to write malicious code?
Yes, they can generate code for exploits and malware without requiring advanced programming skills.

How can LLM abuse be prevented?
Through security filters, ethical regulations, and active model monitoring.

What are the main methods for attacking LLMs?
Prompt injection, prompt poisoning, and indirect manipulation are techniques used to force unauthorized responses.

What is the future of security in Large Language Models?
The adoption of advanced protection techniques will be essential to balance innovation and security.

negg Group