Table of contents
- What are AI agents (or agentic AI) and why they matter
- Anatomy of an AI agent in the SOC
- From theory to practice: how AI agents enhance the SOC
- Offensive use: how attackers exploit agentic AI
- Governance: between NIST AI RMF and EU AI Act
- Deployment architecture: end-to-end example
- Securing the agent: threats, guardrails, and adversarial tests
- From POC to production: recommended path
- Opportunity and responsibility
Autonomous AI agents, often referred to as agentic AI, are reshaping the landscape of cyber security. They can act as defensive tools, capable of investigating, correlating, and orchestrating automated incident responses, but also as offensive vectors, performing reconnaissance, exploiting vulnerabilities, and moving laterally across networks at superhuman speed.
This article defines the concept of agentic AI, explains how these agents can be integrated into modern Security Operations Centers (SOCs), either alongside or as an evolution of traditional SOAR systems, and explores their dual use as both a security enhancer and an attack multiplier. It also examines AI governance, human-in-the-loop control, and risks such as jailbreaks and malicious use, and provides operational examples: deployment architectures, governance policies, metrics, and code samples for embedding an autonomous agent into an incident response workflow. Finally, we provide practical guidelines for organizations introducing AI agents safely and responsibly, aligning with NIST AI RMF, MITRE ATLAS, and the EU AI Act.
What are AI agents (or agentic AI) and why they matter
AI agents are systems capable of perceiving their environment (logs, telemetry, assets), reasoning and planning, and acting (calling APIs, isolating hosts, opening tickets) with a degree of autonomy. Unlike traditional “question–answer” AI models, the agentic AI paradigm introduces memory, goals, tool use, and reasoning loops that allow agents to pursue complex tasks without continuous supervision.
In cyber security, SOC teams are experimenting with AI agents that can:
- Automate alert triage;
- Conduct investigations (data enrichment, correlation, hypothesis testing);
- Propose or execute containment and remediation actions;
- Learn from patterns to enhance response speed and accuracy.
These agents deliver measurable gains in MTTD/MTTR, reduce analyst fatigue, and extend traditional SOAR workflows with dynamic, context-aware reasoning.
Anatomy of an AI agent in the SOC
A typical AI agent in a SOC environment includes:
- Perception layer: connectors to SIEM, EDR, NDR, CASB, vulnerability scanners, SBOM, and threat intelligence feeds.
- Reasoning & planning: LLMs with tool-usage capabilities, multi-step reasoning, memory, and role assignment (L1/L2 analyst).
- Action layer: callable tools (e.g., “isolate host,” “revoke token,” “block hash,” “open ticket”) with human-in-the-loop approval.
- Guardrails & governance: filters, security policies, audit trails, and anti-jailbreak/prompt-injection controls.
- Observability: structured logs, decision traceability, precision metrics, and performance analytics.
The architecture can be centralized (one orchestrator managing several agents) or multi-agent, with specialized agents collaborating or competing to propose optimal actions.
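As a rough illustration of these layers, the sketch below wires perception connectors, a planner, and approval-gated tools into a single Python skeleton. The class, field names, and callables are hypothetical placeholders, not a specific product API.

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

@dataclass
class SOCAgent:
    # Perception layer: named fetchers for SIEM/EDR/CTI context
    connectors: Dict[str, Callable[[Dict[str, Any]], Any]]
    # Reasoning layer: returns a list of action steps for a given observation
    planner: Callable[[Dict[str, Any]], List[Dict[str, Any]]]
    # Action layer: callable tools, e.g. "open_ticket", "isolate_host"
    tools: Dict[str, Callable[..., Any]]
    # Guardrails: actions that need a human decision before execution
    require_approval: Set[str] = field(default_factory=lambda: {"isolate_host"})
    # Observability: structured decision trail
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def handle(self, alert: Dict[str, Any], approve: Callable[[Dict[str, Any]], bool]) -> None:
        # Perceive: enrich the alert through every configured connector
        context = {name: fetch(alert) for name, fetch in self.connectors.items()}
        # Reason: ask the planner (e.g., a tool-enabled LLM) for a step-by-step plan
        plan = self.planner({"alert": alert, "context": context})
        # Act: execute steps, gating sensitive ones on human approval
        for step in plan:
            action = step.get("action")
            if action not in self.tools:
                self.audit_log.append({"step": step, "status": "skipped_unknown_tool"})
                continue
            if action in self.require_approval and not approve(step):
                self.audit_log.append({"step": step, "status": "denied"})
                continue
            result = self.tools[action](**step.get("params", {}))
            self.audit_log.append({"step": step, "status": "executed", "result": result})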
From theory to practice: how AI agents enhance the SOC
1) Automated triage
Agents classify and prioritize alerts, enrich them with IOCs, asset context, CVE exposure, and escalate cases intelligently.
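A minimal triage sketch might look like the following; the enrichment helpers passed in and the scoring weights are illustrative assumptions, not a standard formula.

# Hypothetical triage sketch: enrich an alert and derive a priority.
# The helper callables and the scoring weights are illustrative assumptions.
def triage(alert, get_asset_criticality, get_ioc_matches, get_cve_exposure):
    asset_score = get_asset_criticality(alert["host"])          # e.g. 0..3, 3 = crown-jewel asset
    ioc_hits = len(get_ioc_matches(alert.get("indicators", [])))
    exposed_cves = get_cve_exposure(alert["host"])               # exploitable CVEs on the host

    score = asset_score * 2 + min(ioc_hits, 3) + min(len(exposed_cves), 3)
    if score >= 7:
        priority = "critical"
    elif score >= 5:
        priority = "high"
    elif score >= 3:
        priority = "medium"
    else:
        priority = "low"
    return {"priority": priority, "score": score,
            "escalate_to_human": priority in {"critical", "high"}}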
2) Guided investigation
During an investigation, the agent formulates hypotheses, runs SIEM queries, retrieves EDR telemetry, compares findings with MITRE ATLAS, and validates results through mini-experiments.
3) Incident response
With human oversight, agents isolate endpoints, revoke credentials, enforce MFA, or adjust MDM/IAM policies. Unlike static SOAR playbooks, they adapt their sequence of actions to uncertain, evolving contexts.
4) Continuous threat hunting
“Hunting agents” analyze logs for anomalies consistent with MITRE ATLAS tactics (e.g., model exfiltration or prompt injection) and feed hypotheses back to analysts.
5) Exposure and vulnerability management
By integrating scanners and SBOM, agents can plan patch cycles, simulate risk impact, and produce risk dashboards for executives.
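As a hedged sketch, prioritization could combine scanner findings with the SBOM roughly as follows; the field names (cve, cvss, kev, component, asset) and the weighting are assumptions for the example.

# Illustrative sketch: rank patch candidates by combining scanner findings with SBOM data.
# Field names and weights are assumptions for the example, not a vendor schema.
def rank_patches(findings, sbom_components, asset_criticality):
    shipped = {c["name"] for c in sbom_components}
    ranked = []
    for f in findings:
        if f["component"] not in shipped:
            continue  # the vulnerable component is not in our software inventory
        risk = f["cvss"] * asset_criticality.get(f["asset"], 1)
        if f.get("kev"):
            risk *= 2  # known-exploited vulnerabilities jump the queue
        ranked.append({"cve": f["cve"], "asset": f["asset"], "risk": round(risk, 1)})
    return sorted(ranked, key=lambda r: r["risk"], reverse=True)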
Offensive use: how attackers exploit agentic AI
Malicious actors can also deploy AI agents to:
- Conduct large-scale scanning and reconnaissance;
- Choose exploits and payloads dynamically;
- Perform lateral movement and privilege escalation;
- Script adaptive actions that evade EDR or SIEM rules;
- Attack AI systems directly via model evasion, data poisoning, or model extraction.
MITRE ATLAS documents such adversarial TTPs, which defenders must simulate to test resilience.
Meanwhile, generative AI has amplified phishing and social engineering, lowering the entry barrier for less skilled attackers.

Governance: between NIST AI RMF and EU AI Act
Deploying AI agents responsibly requires structured risk management. The NIST AI Risk Management Framework (AI RMF) and its Generative AI Profile (2024) outline best practices for identifying, assessing, and mitigating AI risks, emphasizing accountability, transparency, and security.
In Europe, the EU AI Act is being implemented in phases: some bans and obligations are already in force in 2025, general-purpose AI (GPAI) obligations apply from August 2025, and requirements for high-risk systems phase in through 2026 and 2027. Security teams must prepare technical documentation, risk assessments, and post-market monitoring.
ENISA has also published comprehensive analyses of the AI Threat Landscape, offering practical guidance for adversarial testing and resilience planning.
Deployment architecture: end-to-end example
A company SOC integrating a triage–investigation–response agent would include:
Integrations:
- SIEM, EDR, ASM, SBOM, CTI, IAM/PAM systems, ticketing, and scanners.
- The agent exposes an event bus (Kafka or similar) and action plugins (“isolate host,” “block hash,” etc.).
Policies:
- Human approval for disruptive actions (isolation, disabling users).
- Guardrails for scope, rate limiting, and sensitive data filtering.
Reasoning:
- The agent receives an alert, enriches it with context, maps it to MITRE ATLAS, and outputs a plan (e.g., “if EDR verdict = High and host non-critical → propose isolation”); a minimal sketch of this rule follows the list.
Observability:
- Full trace of reasoning, actions, approvals, and feedback.
- KPIs: MTTD, MTTR, triage precision, rollback rate.
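The decision rule quoted in the reasoning step can be encoded as a small policy function. This is only a sketch with assumed field names (edr_verdict, asset_criticality):

# Sketch of the reasoning rule above: "if EDR verdict = High and host non-critical -> propose isolation".
# The field names are assumptions for illustration.
def propose_containment(alert, asset_ctx):
    if alert.get("edr_verdict") == "High" and asset_ctx.get("asset_criticality") != "critical":
        return {"action": "isolate_host", "requires_approval": True}
    return {"action": "create_ticket", "requires_approval": False}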
Example: AI governance policy (YAML)
version: 1
agent:
  role: "SOC Analyst L2"
  autonomy_level: "assisted"  # assisted | supervised | autonomous
  objectives:
    - "Reduce MTTR without increasing false positives"
    - "Ensure full decision traceability"
guardrails:
  action_scopes:
    allowed:
      - "create_ticket"
      - "enrich_indicator"
      - "quarantine_email"
      - "block_hash"
    require_approval:
      - "isolate_host"
      - "disable_user"
      - "revoke_oauth_sessions"
    denied:
      - "drop_production_db"
      - "change_firewall_wan_rules"
  rate_limits:
    isolate_host: { per_hour: 1, per_day: 3 }
    disable_user: { per_hour: 2, per_day: 5 }
  data_policies:
    pii_filter: true
    secrets_redaction: true
  prompts_security:
    jailbreak_protection: true
    system_prompt_checks: ["deny_tool_escalation", "deny_self_modification"]
observability:
  logging: "structured_json"
  store:
    - "inputs"
    - "tool_calls"
    - "reasoning_summary"
    - "actions"
  metrics:
    - "mttd"
    - "mttr"
    - "precision_triage"
    - "rollback_rate"
approvals:
  approvers:
    - "Incident Commander (24/7)"
    - "IAM Owner"
  sla:
    critical: "5m"
    high: "15m"
    medium: "60m"
This policy establishes minimal guardrails in line with NIST AI RMF principles of accountability and transparency.
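One possible way to enforce such a policy at runtime, sketched under the assumption that the file above is saved as agent_policy.yaml, is to resolve every proposed action against the declared scopes before any tool call:

import yaml  # PyYAML

# Hedged sketch: resolve a proposed action against the governance policy above.
# The file name "agent_policy.yaml" and the return labels are illustrative assumptions.
with open("agent_policy.yaml") as f:
    POLICY = yaml.safe_load(f)

SCOPES = POLICY["guardrails"]["action_scopes"]

def authorization_for(action: str) -> str:
    if action in SCOPES["denied"]:
        return "deny"
    if action in SCOPES["require_approval"]:
        return "require_approval"
    if action in SCOPES["allowed"]:
        return "allow"
    return "deny"  # default-deny anything not explicitly in scope

# Example: authorization_for("isolate_host") -> "require_approval"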
Example: assisted SOAR integration (Python)
import json, time
from typing import Dict, Any
from uuid import uuid4

# Pseudo-SDK for the SOAR platform
from soar_sdk import create_ticket, add_comment, add_tag, approve_action
from soar_sdk import isolate_host, block_hash, revoke_oauth_sessions
# Pseudo-SDK for the SIEM
from siem_sdk import query_events, get_asset_context
# LLM module (tool-enabled)
from llm_agent import plan_investigation, summarize_risk, propose_actions

HUMAN_APPROVAL_ACTIONS = {"isolate_host", "disable_user", "revoke_oauth_sessions"}

def guardrails_ok(plan: Dict[str, Any]) -> bool:
    # Basic example: check allowed scopes and rate limits
    actions = {step["action"] for step in plan.get("steps", [])}
    if "drop_production_db" in actions:
        return False
    # Synthetic rate limit
    return len(actions.intersection({"isolate_host"})) <= 1

def handle_alert(alert: Dict[str, Any]) -> None:
    case_id = str(uuid4())
    asset = alert.get("host")
    ctx = get_asset_context(asset)
    events = query_events(asset=asset, window="24h")

    # 1) Reasoning phase
    investigation = plan_investigation(alert=alert, context=ctx, events=events)
    risk = summarize_risk(investigation)
    plan = propose_actions(risk=risk, context=ctx)

    # 2) Guardrails
    if not guardrails_ok(plan):
        create_ticket(title=f"[{case_id}] Manual review", body="Plan violates guardrails", severity="high")
        return

    # 3) Execution with human approval where required
    ticket_id = create_ticket(title=f"[{case_id}] {alert['rule']}", body=json.dumps(risk, indent=2), severity="high")
    add_tag(ticket_id, "agentic-ai")
    add_comment(ticket_id, "Proposed plan: " + json.dumps(plan, indent=2))

    for step in plan["steps"]:
        action = step["action"]
        params = step.get("params", {})
        if action in HUMAN_APPROVAL_ACTIONS:
            if not approve_action(ticket_id, action, params, sla="15m"):
                add_comment(ticket_id, f"Action {action} denied or timed out.")
                continue
        # Map actions to APIs
        if action == "isolate_host":
            isolate_host(asset)
        elif action == "block_hash":
            block_hash(params["sha256"])
        elif action == "revoke_oauth_sessions":
            revoke_oauth_sessions(user=ctx["user"])
        time.sleep(0.5)

    add_comment(ticket_id, "Plan executed. Closing if no new events in 30m.")
This code demonstrates an assisted mode agent: it proposes, validates, and executes actions under human oversight.
Securing the agent: threats, guardrails, and adversarial tests
Main threats:
- Prompt injection and jailbreaks causing rule violations.
- Model exploitation through manipulation of reasoning.
- Data poisoning in training or retrieval pipelines.
- Model exfiltration and tool abuse.
- AI supply-chain vulnerabilities in third-party services.
Mitigating them requires continuous red teaming based on MITRE ATLAS adversarial use cases.
Measuring value: KPIs and SLOs for agentic AI in the SOC
- MTTD / MTTR: do they improve without increasing false positives?
- Triage accuracy: alerts correctly closed / total alerts.
- Rollback rate: how many actions need to be undone due to errors or excessive impact.
- Coverage: percentage of incident classes handled by the agent.
- Human load: analyst minutes saved per shift.
- Security: incidents caused by agent mis-actions (target: zero), guardrail violations, completed audits.
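A simple way to compute some of these KPIs from closed tickets is sketched below; the record fields (detected_at, responded_at, rolled_back, handled_by_agent) are assumptions about the ticketing schema.

# Illustrative KPI computation over closed tickets. The record fields
# (detected_at, responded_at as datetimes; rolled_back, handled_by_agent as booleans)
# are assumptions, not a standard schema.
def soc_kpis(tickets):
    if not tickets:
        return {}
    mttr_seconds = sum((t["responded_at"] - t["detected_at"]).total_seconds() for t in tickets) / len(tickets)
    rollback_rate = sum(1 for t in tickets if t.get("rolled_back")) / len(tickets)
    agent_coverage = sum(1 for t in tickets if t.get("handled_by_agent")) / len(tickets)
    return {"mttr_minutes": round(mttr_seconds / 60, 1),
            "rollback_rate": round(rollback_rate, 3),
            "agent_coverage": round(agent_coverage, 3)}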
Operational example: dynamic playbook for a phishing attack with potential lateral movement
Scenario: an EDR alert flags suspicious execution on a Finance endpoint. The agent:
- Enriches the alert with SIEM telemetry and checks whether the user has elevated privileges.
- Compares events against known patterns (MITRE ATLAS + internal knowledge).
- Assesses risk (critical asset, stored SSO credentials, MFA anomaly).
- Proposes: (a) quarantining the attachment at the email gateway, (b) blocking the hash, (c) revoking the user’s OAuth sessions, (d) isolating the host if IOCs are confirmed.
- Requests approval for (d) and documents every step in the ticket.
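Expressed in the plan shape consumed by the handle_alert example above, the proposed actions for this scenario might look roughly like this (every value is a placeholder, not a real indicator):

# Hedged sketch of the proposed plan for this scenario, in the same shape the
# handle_alert example above consumes; all values are placeholders, not real IOCs.
phishing_plan = {
    "objective": "Contain suspected phishing with possible lateral movement",
    "steps": [
        {"action": "quarantine_email", "risk": "low",
         "params": {"message_id": "<gateway-message-id>"}},
        {"action": "block_hash", "risk": "low",
         "params": {"sha256": "<sha256-of-attachment>"}},
        {"action": "revoke_oauth_sessions", "risk": "medium",
         "params": {"user": "<finance-user>"}},
        # Proposed only if IOCs are confirmed; requires human approval per policy.
        {"action": "isolate_host", "risk": "high",
         "params": {"host": "<finance-endpoint>"}},
    ],
}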
Threat hunting example (SQL Pseudocode)
-- Detect unusual beaconing to rare domains from recent EDR-alerted hosts
WITH rare_domains AS (
  SELECT domain FROM dns_logs
  WHERE ts > now() - interval '24 hours'
  GROUP BY domain HAVING COUNT(*) < 5
)
SELECT n.host, n.dst_domain AS domain, COUNT(*) AS hits
FROM netflow n
JOIN rare_domains r ON n.dst_domain = r.domain
JOIN edr_alerts e ON e.host = n.host
WHERE e.ts > now() - interval '6 hours'
  AND n.dst_port IN (80, 443, 8080)
  AND n.bytes_out BETWEEN 200 AND 800
GROUP BY n.host, n.dst_domain
ORDER BY hits DESC;
An AI agent can generate, explain, and adapt such queries dynamically.
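One hedged way to picture that adaptation is a parameterized template the agent fills in from its reasoning step; the thresholds and windows below are placeholders.

# Illustrative sketch: the agent adapts the hunting query by filling a template
# with thresholds chosen during its reasoning step. The defaults mirror the SQL above.
HUNT_TEMPLATE = """
WITH rare_domains AS (
  SELECT domain FROM dns_logs
  WHERE ts > now() - interval '{dns_window}'
  GROUP BY domain HAVING COUNT(*) < {rarity_threshold}
)
SELECT n.host, n.dst_domain AS domain, COUNT(*) AS hits
FROM netflow n
JOIN rare_domains r ON n.dst_domain = r.domain
JOIN edr_alerts e ON e.host = n.host
WHERE e.ts > now() - interval '{alert_window}'
  AND n.dst_port IN ({ports})
  AND n.bytes_out BETWEEN {min_bytes} AND {max_bytes}
GROUP BY n.host, n.dst_domain
ORDER BY hits DESC;
"""

def build_hunt_query(dns_window="24 hours", rarity_threshold=5, alert_window="6 hours",
                     ports=(80, 443, 8080), min_bytes=200, max_bytes=800):
    return HUNT_TEMPLATE.format(dns_window=dns_window, rarity_threshold=rarity_threshold,
                                alert_window=alert_window, ports=", ".join(map(str, ports)),
                                min_bytes=min_bytes, max_bytes=max_bytes)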
From POC to production: recommended path
- Start with a limited use case (e.g., phishing triage).
- Begin in assisted mode; progress to supervised, then autonomous for low-risk actions.
- Apply strict guardrails and full auditing.
- Define vendor AI contracts (data handling, SLAs).
- Conduct MITRE ATLAS adversarial testing.
- Align with NIST AI RMF and EU AI Act compliance timelines.
- Train analysts; update hybrid runbooks.
- Maintain manual fallback and kill-switch mechanisms.
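A kill switch can be as simple as a flag checked before every tool invocation; the sketch below assumes an environment variable named AGENT_KILL_SWITCH, which is purely illustrative.

import os

# Minimal kill-switch sketch: every tool call checks a flag operators can flip to halt
# autonomous actions. The environment variable name is an assumption for illustration.
def agent_enabled() -> bool:
    return os.environ.get("AGENT_KILL_SWITCH", "off").lower() != "on"

def run_tool(tool, *args, **kwargs):
    if not agent_enabled():
        raise RuntimeError("Agent actions halted by kill switch; fall back to the manual runbook.")
    return tool(*args, **kwargs)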
Human-in-the-loop contract (excerpt)
- Human rights: review, question, override, reprioritize.
- Agent duties: explain reasoning, show sources, list rejected options.
- Escalation: any out-of-scope action requires explicit approval.
- Audit: every decision timestamped and logged.
Python example: action plan validator
from pydantic import BaseModel
from typing import List, Literal

ALLOWED = {"create_ticket", "enrich_indicator", "quarantine_email", "block_hash"}
REVIEW = {"isolate_host", "disable_user", "revoke_oauth_sessions"}
DENIED = {"drop_production_db", "rewrite_firewall_wan"}

class Step(BaseModel):
    action: str
    risk: Literal["low", "medium", "high"]

class Plan(BaseModel):
    objective: str
    steps: List[Step]

def validate_plan(plan: Plan):
    errors = []
    actions = [s.action for s in plan.steps]
    if any(a in DENIED for a in actions):
        errors.append("Denied actions present.")
    if actions.count("isolate_host") > 1:
        errors.append("Too many isolate_host steps.")
    for a in actions:
        if a in REVIEW:
            errors.append(f"Action {a} requires approval.")
        elif a not in ALLOWED and a not in DENIED:
            errors.append(f"Action {a} is outside the allowed scope.")
    return (len(errors) == 0, errors)
This validator enforces scope discipline: denied actions block the plan outright, repeated high-impact steps are rejected, and any action in the review set is flagged for manual approval.
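A quick usage sketch with a hypothetical plan:

# Hypothetical plan passed through the validator above.
plan = Plan(objective="Contain phishing", steps=[
    Step(action="block_hash", risk="low"),
    Step(action="isolate_host", risk="high"),
])
ok, errors = validate_plan(plan)
# ok is False; errors contain "Action isolate_host requires approval."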
Common pitfalls
- Excessive autonomy too soon.
- Mistaking agentic AI for a chatbot.
- No separation of decision vs. approval roles.
- Overprivileged tools violating least-privilege.
- Ignoring prompt injection or goal hijacking testing.
- Neglecting compliance under EU AI Act or data residency.
Roadmap (90 days → 12 months)
- 0–30 days: choose one use case, define policies, enable audit logs.
- 30–90 days: run POC in assisted mode, benchmark metrics.
- 3–6 months: expand to low-risk incident response.
- 6–12 months: institutionalize purple teaming, compliance readiness for EU AI Act, NIST AI RMF alignment.
Opportunity and responsibility
AI agents can drastically reduce incident response time and operational costs, while expanding threat coverage and resilience. Yet the same technology also empowers attackers.
The balance lies in strong AI governance, human oversight, guardrails, auditable workflows, adversarial testing, and compliance with NIST AI RMF, MITRE ATLAS, and EU AI Act. Organizations that start small, measure carefully, and iterate safely will harness agentic AI as a force multiplier—without succumbing to its risks.
Questions and answers
- What differentiates an AI agent from a basic LLM?
  An LLM responds to prompts; an AI agent perceives, plans, and acts through tools with memory and defined policies.
- Will AI agents replace SOAR systems?
  No. They extend them, adding adaptability to rigid SOAR playbooks and enhancing contextual automation.
- What are the main risks?
  Prompt injection, jailbreaks, tool abuse, data poisoning, and model extraction. Use MITRE ATLAS to model threats.
- How should autonomy levels be set?
  Begin with assisted, then supervised; reserve autonomous for low-risk actions only.
- What metrics matter most?
  MTTD, MTTR, triage accuracy, rollback rate, human load reduction, and zero mis-action incidents.
- How to protect sensitive data?
  Use data minimization, token expiration, sandboxing, and end-to-end audit trails.
- What is the role of NIST AI RMF?
  A voluntary framework for AI risk management, promoting accountability, transparency, and continuous monitoring.
- How does the EU AI Act affect SOC teams?
  It mandates documentation, risk management, and post-market monitoring for high-risk or general-purpose AI systems.
- Are attackers already using AI agents?
  Yes, mainly for reconnaissance, phishing, and lateral movement automation.
- How to begin safely?
  Choose one use case, enforce guardrails, maintain human oversight, audit every action, and align with NIST AI RMF and EU AI Act standards.