Cyber News

Security news, advisories, and manufacturing-sector coverage.

222 results

May 6, 2026

Critical vm2 sandbox bug lets attackers execute code on hosts

A critical vulnerability in the popular Node.js sandboxing library vm2 allows escaping the sandbox and executing arbitrary code on the host system. [...]

BleepingComputer

May 6, 2026

New Cisco DoS flaw requires manual reboot to revive devices

Cisco patched a Crosswork Network Controller and Network Services Orchestrator denial-of-service vulnerability that requires manually rebooting targeted systems for recovery. [...]

BleepingComputer

Mfg

May 6, 2026

DAEMON Tools devs confirm breach, release malware-free version

Disc Soft Limited, the maker of DAEMON Tools Lite, confirmed that the software had been trojanized in a supply chain attack and released a new, malware-free version. [...]

BleepingComputer

manufacturing

Mfg

May 6, 2026

Why ransomware attacks succeed even when backups exist

Backups don't fail because they're missing, they fail because attackers destroy them first. Acronis explains how ransomware targets backup systems before encryption, leaving no path to recovery. [...]

BleepingComputer

manufacturing

Mfg

Watchlist

May 6, 2026

MuddyWater hackers use Chaos ransomware as a decoy in attacks

The MuddyWater Iranian hackers disguised their operations as a Chaos ransomware attack, relying on Microsoft Teams social engineering to gain access and establish persistence. [...]

BleepingComputer

manufacturing

Mfg

Watchlist

May 6, 2026

MuddyWater Uses Microsoft Teams to Steal Credentials in False Flag Ransomware Attack

The Iranian state-sponsored hacking group known as MuddyWater (aka Mango Sandstorm, Seedworm, and Static Kitten) has been attributed to a ransomware attack in what has been described as a "false flag" operation. The attack, observed by Rapid7 in early 2026, has been found to leverage social engineering techniques via Microsoft Teams to initiate the infection sequence. Although the incident

The Hacker News

manufacturing

May 6, 2026

Webinar: Why network incidents escalate and how to fix response gaps

Most network incidents don't escalate due to a lack of alerts; they escalate when response breaks down. This webinar explores how to fix gaps in triage, enrichment, and coordination. [...]

BleepingComputer

May 6, 2026

The Hacker News Launches 'Cybersecurity Stars Awards 2026' — Submissions Now Open

For nearly 20 years, we at The Hacker News have mostly told scary stories about cyberspace — big hacks, broken systems, and new threats. But behind every headline, there’s a quieter, better story. It’s the story of leaders making tough calls under pressure, teams building smarter defenses, and security products that keep hunting threats 24/7 — even when it’s hard. Most of the time, this work is

The Hacker News

Mfg

May 6, 2026

From Stuxnet to ChatGPT: 20 News Events That Shaped Cyber

As part of its 20th anniversary celebration, Dark Reading looks back on 20 of the biggest newsmaking events from the past two decades that influenced the risk landscape for today's cybersecurity teams.

Dark Reading

manufacturing

other

May 6, 2026

Your AI Agents Are Already Inside the Perimeter. Do You Know What They're Doing?

Analysts recently confirmed what identity security teams have quietly feared: AI agents are being deployed faster than enterprises can govern them. In their inaugural Market Guide for Guardian Agents, Gartner states that “enterprise adoption of AI agents is accelerating, outpacing maturity of governance policy controls.” Enterprise leaders can request access to the Gartner Market Guide for

The Hacker News

May 6, 2026

Attacks Abuse Windows Phone Link to Steal Texts & Bypass 2FA

In hard-to-detect attacks, hackers are dropping the CloudZ RAT and a fresh plug-in, Pheno, to hijack the Windows-based bridge between PCs and smartphones.

Dark Reading

May 6, 2026

Palo Alto Networks warns of firewall RCE zero-day exploited in attacks

Palo Alto Networks warned customers today that a critical-severity unpatched vulnerability in the PAN-OS User-ID Authentication Portal is being exploited in attacks. [...]

BleepingComputer

Mfg

May 6, 2026

Google's Android Apps Get Public Verification to Stop Supply Chain Attacks

Google has announced expanded Binary Transparency for Android as a way to safeguard the ecosystem from supply chain attacks. "This new public ledger ensures the Google apps on your device are exactly what we intended to build and distribute," Google's product and security teams said. The initiative builds upon the foundation of Pixel Binary Transparency, which Google introduced in October 2021

The Hacker News

manufacturing

May 6, 2026

CISA’s CI Fortify prepares operators for cyber scenarios involving disrupted communications and OT compromise

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) launched a new initiative to strengthen the resilience of America’s... The post CISA’s CI Fortify prepares operators for cyber scenarios involving disrupted communications and OT compromise appeared first on Industrial Cyber.

Industrial Cyber

May 6, 2026

WEF maps path to AI-driven cybersecurity, calls for structured deployment, continuous monitoring, human control

The World Economic Forum, in collaboration with KPMG, published a report on how AI (artificial intelligence) is reshaping... The post WEF maps path to AI-driven cybersecurity, calls for structured deployment, continuous monitoring, human control appeared first on Industrial Cyber.

Industrial Cyber

Mfg

Watchlist

May 6, 2026

Oil and gas operators ramp up OT security spending post-Epic Fury, but critical detection gap persists

Tosi’s independent survey of 100 OT decision-makers across U.S. upstream and midstream oil and gas operators shows a... The post Oil and gas operators ramp up OT security spending post-Epic Fury, but critical detection gap persists appeared first on Industrial Cyber.

Industrial Cyber

manufacturing

Mfg

May 6, 2026

Frenos unveils Mythos Readiness Assessment to test critical infrastructure defenses against autonomous adversarial threats

Frenos, a vendor of AI native operational technology (OT) security posture management, launched the Mythos Readiness Assessment, a... The post Frenos unveils Mythos Readiness Assessment to test critical infrastructure defenses against autonomous adversarial threats appeared first on Industrial Cyber.

Industrial Cyber

manufacturing

May 6, 2026

CyberSheath helps Tunnell meet CMMC Level 2 with precision security aligned to actual CUI exposure

CyberSheath, a Cybersecurity Maturity Model Compliance (CMMC) managed service vendor, helped Tunnell Consulting, a consulting firm that provides... The post CyberSheath helps Tunnell meet CMMC Level 2 with precision security aligned to actual CUI exposure appeared first on Industrial Cyber.

Industrial Cyber

May 6, 2026

Windows Phone Link Exploited by CloudZ RAT to Steal Credentials and OTPs

Cybersecurity researchers have disclosed details of an intrusion that involved the use of a CloudZ remote access tool (RAT) and a previous undocumented plugin dubbed Pheno with the aim of facilitating credential theft. "According to the functionalities of the CloudZ RAT and Pheno plugin, this was with the intention of stealing victims' credentials and potentially one-time passwords (OTPs),"

The Hacker News

May 6, 2026

Palo Alto PAN-OS Flaw Under Active Exploitation Enables Remote Code Execution

Palo Alto Networks has released an advisory warning that a critical buffer overflow vulnerability in its PAN-OS software has been exploited in the wild. The vulnerability, tracked as CVE-2026-0300, has been described as a case of unauthenticated remote code execution. It carries a CVSS score of 9.3 if the User-ID Authentication Portal is configured to enable access from the internet or any

The Hacker News

May 6, 2026

Middle East Cyber Battle Field Broadens — Especially in UAE

As the war with Iran continues, breach attempts targeting the United Arab Emirates tripled in a few weeks — many targeting critical infrastructure.

Dark Reading

May 6, 2026

Evaluating Retrieval-Augmented Generation for Explainable Malware Analysis

arXiv:2605.03140v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly being used as security engineering tools to summarize and explain malware behavior to analysts. A common assumption is that Retrieval-Augmented Generation (RAG) improves explanation quality by injecting external security knowledge. In this work, we empirically evaluate this assumption for malware explanation using VirusTotal reports as structured input. Across multiple LLMs, we find that RAG frequently degrades explanation quality by introducing distracting or weakly related context and adding narrative noise or generic write-ups. Our results highlight a practical risk in security-critical pipelines for malware explanation that RAG can be counterproductive when structured security evidence is already sufficient. We argue that malware explanation is primarily a signal-extraction task, not a knowledge-retrieval problem, and outline design recommendations for secure development workflows.

arXiv cs.CR — AI Security Papers

research

ai-security

other

May 6, 2026

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

arXiv:2605.02900v1 Announce Type: new Abstract: Embodied Artificial Intelligence (Embodied AI) integrates perception, cognition, planning, and interaction into agents that operate in open-world, safety-critical environments. As these systems gain autonomy and enter domains such as transportation, healthcare, and industrial or assistive robotics, ensuring their safety becomes both technically challenging and socially indispensable. Unlike digital AI systems, embodied agents must act under uncertain sensing, incomplete knowledge, and dynamic human-robot interactions, where failures can directly lead to physical harm. This survey provides a comprehensive and structured review of safety research in embodied AI, examining attacks and defenses across the full embodied pipeline, from perception and cognition to planning, action and interaction, and agentic system. We introduce a multi-level taxonomy that unifies fragmented lines of work and connects embodied-specific safety findings with broader advances in vision, language, and multimodal foundation models. Our review synthesizes insights from over 400 papers spanning adversarial, backdoor, jailbreak, and hardware-level attacks; attack detection, safe training and robust inference; and risk-aware human-agent interaction. This analysis reveals several overlooked challenges, including the fragility of multimodal perception fusion, the instability of planning under jailbreak attacks, and the trustworthiness of human-agent interaction in open-ended scenarios. By organizing the field into a coherent framework and identifying critical research gaps, this survey provides a roadmap for building embodied agents that are not only capable and autonomous but also safe, robust, and reliable in real-world deployment.

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection

arXiv:2605.02958v1 Announce Type: new Abstract: Representation Engineering typically relies on static refusal vectors derived from terminal representations. We move beyond this paradigm, demonstrating that refusal is a dynamic and sparse process rather than a localized outcome. Using Causal Tracing, we uncover the Refusal Trajectory-a persistent upstream signature that remains intact even when adversarial attacks (e.g., GCG) suppress terminal signals. Leveraging this, we propose SALO (Sparse Activation Localization Operator), an inference-time detector designed to capture these latent patterns. SALO effectively recovers defense capabilities against forced-decoding attacks, improving detection rates from ~0% to >90% where methods relying on terminal states perform poorly.

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses

arXiv:2605.03095v1 Announce Type: new Abstract: Defending large language models (LLMs) against jailbreak attacks, such as Greedy Coordinate Gradient (GCG), remains a challenge, particularly under adaptive threat models where an attacker directly targets the defense mechanism. JBShield, a recent jailbreak defense with a 0% attack success rate in some settings, detects malicious prompts via two concept signals, a toxic concept and a jailbreak concept. We design JB-GCG, which modifies GCG's objective to combine two terms: refusal-direction suppression via cosine similarity between the refusal direction and hidden-state representations, and toxic-concept regularization via JBShield's own toxic concept score. Across five configurations on Llama-3-8B, JB-GCG achieves an average ASR of 46.2%, reaching up to 53.4% in the strongest setting. We further show that our attack remains effective against JBShield-M, achieving ASR up to 30.7% across evaluated settings. The attack persists across multiple JBShield recalibrations, confirming that the vulnerability is structural rather than calibration-specific. We analyze the cosine-similarity signatures of jailbreak representations and find that they occupy a distinctive region in refusal-direction fingerprint space that neither harmless nor harmful prompts inhabit. We introduce Representation Trajectory Verification (RTV), a new defense based on Mahalanobis outlier detection over multi-layer refusal-direction fingerprints. RTV attains an AUROC of 0.99 against our attack. Finally, we design and evaluate an additional adaptive attack against RTV with full white-box knowledge of the defense; the best attack achieves only 7% ASR at 13x the computational cost. Our results show that strong non-adaptive detection does not imply robustness under adaptive threat models, and that multi-layer representation consistency is a more reliable foundation for jailbreak detection than single-layer concept similarity.

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

arXiv:2605.03129v1 Announce Type: new Abstract: Browsing-enabled LLM assistants can fetch webpages and answer contact-seeking queries, creating a practical channel for scraping contact-style personally identifiable information (PII) from public pages. Many prior defenses are deployed at the model, service, or agent layer rather than at the webpage itself, leaving ordinary page owners with limited deployable options. We present PIIGuard, a webpage-level defense that repurposes indirect prompt injection as a protective mechanism: the page owner embeds optimized hidden HTML fragments that steer the model away from verbatim or reconstructible disclosure of contact PII. PIIGuard searches over fragment text and insertion position using rule-based leakage scoring, evolutionary mutation, and final judge-based recoverability assessment. In direct-HTML evaluation on three target models (GPT-5.4-nano, Claude-haiku-4.5, and DeepSeek-chat(latest v3.2)), PIIGuard achieves at least 97.0% defense success rate under both rule-based and judge-based leakage evaluation, often reaching 100.0%, while preserving benign same-page QA utility. We further evaluate two harder settings: public-URL browsing and attacker-side LLM sanitization of fetched webpage. These results show that page-side defensive fragments can remain effective in deployment for some model-position pairs, but robustness varies substantially across browsing interfaces and sanitizer prompts. Overall, PIIGuard demonstrates that page owners can use page-side fragments as a practical mitigation for web-grounded PII leakage.

arXiv cs.CR — AI Security Papers

research

ai-security

prompt-injection

May 6, 2026

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

arXiv:2605.03213v1 Announce Type: new Abstract: Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via protocols such as MCP and A2A, introduce a threat surface that differs materially from standalone model inference. Agents accumulate sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within the software stack and can be silently bypassed by a sufficiently privileged adversary such as a compromised cloud operator. Confidential computing (CC) offers a hardware-rooted alternative: Trusted Execution Environments (TEEs) isolate agent code and data from privileged system software, while remote attestation enables verifiable trust across distributed deployments. This survey synthesizes the design space in four parts: (i) a unified taxonomy of six TEE platforms (Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, and NVIDIA H100 CC) covering deployment roles and performance tradeoffs; (ii) an agent-centric threat model spanning perception, planning, memory, action, and coordination layers mapped to nine security goals; (iii) a comparative survey of CC-based defenses distinguishing findings that transfer from single-call inference versus what requires new agentic designs; and (iv) six open challenges including compound attestation for multi-hop agent chains and GPU-TEE performance at LLM scale. While several hardware trust primitives appear mature enough for targeted deployments, no broadly established end-to-end framework yet binds them into a coherent security substrate for production agentic AI.

arXiv cs.CR — AI Security Papers

research

ai-security

prompt-injection

May 6, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

arXiv:2605.03378v1 Announce Type: new Abstract: The rise of Large Language Model (LLM) agents, augmented with tool use, skills, and external knowledge, has introduced new security risks. Among them, prompt injection attacks, where adversaries embed malicious instructions into the agent workflow, have emerged as the primary threat. However, existing benchmarks and defenses are fundamentally limited as they assume context-insensitive settings in which the agent works under a fully specified user instruction, and the attacks are straightforward and context-independent. As a result, they fail to capture real-world deployments where agent behavior usually depends on dynamic context, not just the user prompt, and adversaries can adapt their attacks to different context. Similarly, existing defenses built on this narrow threat model overlook the nature of real-world agent delegation. In this paper, we present AgentLure, a benchmark that captures context-dependent tasks and context-aware prompt injection attacks. AgentLure spans four agentic domains and eight attack vectors across diverse attack surfaces. Our evaluation shows that existing defenses often struggle in this setting, yielding poor performance against such attacks in agentic systems. To address this limitation, we propose ARGUS, a defense mechanism that enforces provenance-aware decision auditing for LLM agents. ARGUS constructs an influence provenance graph to track how untrusted context propagates into agent decisions and verify whether a decision is justified by trustworthy evidence before execution. Our evaluation shows ARGUS reduces attack success rate to 3.8% while preserving 87.5% task utility, significantly outperforming existing defenses and remaining robust against adaptive white-box adversaries.

arXiv cs.CR — AI Security Papers

research

ai-security

prompt-injection

May 6, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

arXiv:2605.03441v1 Announce Type: new Abstract: Large language models (LLMs) employ safety mechanisms to prevent harmful outputs, yet these defenses primarily rely on semantic pattern matching. We show that encoding harmful prompts as coherent mathematical problems -- using formalisms such as set theory, formal logic, and quantum mechanics -- bypasses these filters at high rates, achieving 46%--56% average attack success across eight target models and two established benchmarks. Crucially, the effectiveness depends not on mathematical notation itself, but on whether a helper LLM deeply reformulates the harmful content into a genuine mathematical problem: rule-based encodings that apply mathematical formatting without such reformulation perform no better than unencoded baselines. We introduce a novel Formal Logic encoding that achieves attack success comparable to Set Theory, demonstrating that this vulnerability generalizes across mathematical formalisms. Additional experiments with repeat post-processing confirm that these attacks are robust to simple prompt augmentation. Notably, newer models (GPT-5, GPT-5-Mini) show substantially greater robustness than older models, though they remain vulnerable. Our findings highlight fundamental gaps in current safety frameworks and motivate defenses that reason about mathematical structure rather than surface-level semantics.

arXiv cs.CR — AI Security Papers

research

ai-security

other

May 6, 2026

Self-Mined Hardness for Safety Fine-Tuning

arXiv:2605.03226v1 Announce Type: cross Abstract: Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's difficulty by how often the target model's own rollouts are judged harmful, then fine-tune on the hardest prompts paired with the model's own non-jailbroken rollouts. On Llama-3-8B-Instruct and Llama-3.2-3B-Instruct, this approach cuts the WildJailbreak attack success rate from 11.5% and 20.1% down to 1-3%, but pushes refusal on jailbreak-shaped benign prompts from 14-22% to 74-94%. Interleaving the same hard prompts 1:1 with adversarially-framed benign prompts (prompts that look like jailbreaks but have benign intent) cuts that refusal back down to 30-51% on 8B and 52-72% on 3B, at a cost of 2-6 percentage points of attack success rate. Within the mixed regime, training on the hardest half of the eligible pool rather than a random half cuts the remaining ASR by 35-50% (about 3 percentage points) on both models.

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

arXiv:2605.04019v1 Announce Type: cross Abstract: AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows - assembling attacks, transforms, and scorers. When results fall short, workflows must be rebuilt. As a result, operators spend more time constructing workflows than probing targets for security and safety vulnerabilities. We introduce an AI red teaming agent built on the open-source Dreadnode SDK. The agent creates workflows grounded in 45+ adversarial attacks, 450+ transforms, and 130+ scorers. Operators can probe multi-agent systems, multilingual, and multimodal targets, focusing on what to probe rather than how to implement it. We make three contributions: 1. Agentic interface. Operators describe goals in natural language via the Dreadnode TUI (Terminal User Interface). The agent handles attack selection, transform composition, execution, and reporting, letting operators focus on red teaming. Weeks compress to hours. 2. Unified framework. A single framework for probing traditional ML models (adversarial examples) and generative AI systems (jailbreaks), removing the need for separate libraries. 3. Llama Scout case study. We red team Meta Llama Scout and achieve an 85% attack success rate with severity up to 1.0, using zero human-developed code

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

Position: Mind the Gap-AI Security and the Limits of Current Reporting Standards

arXiv:2412.14855v4 Announce Type: replace Abstract: AI systems face a growing number of AI security threats that are increasingly exploited in the real world. Hence, shared AI incident reporting practices are emerging in industry as best practice and as mandated by regulatory requirements. Although non-AI cybersecurity and non-security AI reporting have progressed as industrial and policy norms, existing collections of practices do not meet the specific requirements posed by AI security reporting. we argue that established processes are not well aligned with AI security reporting due to fundamental shortcomings for the distinctive characteristics of AI systems. Some of these shortcomings are immediately addressable, while others remain unresolved technically or within social systems, like the treatment of IP or the ownership of a vulnerability. Based on this position, we examine the limitations of current AI security incident reporting proposals. We conclude that the advent of AI agents will further reinforce the need to advance specialized AI security incident reporting.

arXiv cs.CR — AI Security Papers

research

ai-security

governance

May 6, 2026

Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks

arXiv:2601.17644v3 Announce Type: replace Abstract: The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets and improve model performance, it risks the leakage of private information from these datasets. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy. Our code is published online: https://github.com/aliwister/mrag-attack-eval.

arXiv cs.CR — AI Security Papers

research

ai-security

model-extraction

May 6, 2026

The AI risk repository: A meta-review, database, and taxonomy of risks from artificial intelligence

arXiv:2408.12622v3 Announce Type: replace-cross Abstract: Artificial intelligence (AI) is reshaping society, from video generation to medical diagnosis, coding agents to autonomous vehicles. Yet researchers, policymakers, and technology companies lack shared terminology for discussing AI risks. Consider "privacy": one framework uses this term to describe a model's ability to leak sensitive training data, while another uses it to mean freedom from government surveillance. Conversely, researchers have introduced "Goodhart's law," "specification gaming," "reward hacking," and "mesa-optimization" to describe the same phenomenon of AI systems optimizing for measured proxies rather than intended goals. This terminological diversity creates friction: comparing findings across studies requires mapping between frameworks, and comprehensive risk coverage requires consulting multiple taxonomies that use different organizing principles. This paper addresses this challenge by creating a comprehensive catalog of AI risks. We systematically analyzed every major AI risk framework published to date-74 frameworks containing 1,725 distinct risks-and organized them into a unified system. Our two classification systems reveal important patterns: contrary to common assumptions, human decisions cause nearly as many AI risks (38%) as the AI systems themselves (42%). The work provides practical tools for anyone working on AI safety, from developers conducting risk assessments to policymakers writing regulations to auditors evaluating AI systems. By establishing a common reference point, this repository creates the foundation for more coordinated and comprehensive approaches to managing AI's risks while realizing its benefits.

arXiv cs.CR — AI Security Papers

research

ai-security

other

May 6, 2026

Jailbroken Frontier Models Retain Their Capabilities

arXiv:2605.00267v2 Announce Type: replace-cross Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model capability and that the most advanced jailbreaks effectively yield no reduction in model capabilities. Evaluating 28 jailbreaks on five benchmarks across Claude models ranging in capability from Haiku 4.5 to Opus 4.6, we find Haiku 4.5 loses an average of 33.1% on benchmark performance when jailbroken, while Opus 4.6 at max thinking effort loses only 7.7%. We also observe that across all models, reasoning-heavy tasks display considerably more degradation than knowledge-recall tasks. Finally, Boundary Point Jailbreaking, currently the strongest jailbreak against deployed classifiers, achieves near-perfect classifier evasion with near-zero degradation across safeguarded models. We recommend that safety cases for frontier models should not rely on a meaningful capability degradation from jailbreaks.

arXiv cs.CR — AI Security Papers

research

ai-security

jailbreak

May 6, 2026

ISC Stormcast For Wednesday, May 6th, 2026 https://isc.sans.edu/podcastdetail/9920, (Wed, May 6th)

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.

SANS Internet Storm Center

May 5, 2026

New stealthy Quasar Linux malware targets software developers

A previously undocumented Linux implant named Quasar Linux (QLNX) is targeting developers' systems with a mix of rootkit, backdoor, and credential-stealing capabilities. [...]

BleepingComputer

May 5, 2026

Instructure hacker claims data theft from 8,800 schools, universities

The hacker behind a breach at education technology giant Instructure claims to have stolen 280 million data records for students and staff from 8,809 colleges, school districts, and online education platforms. [...]

BleepingComputer

Mfg

May 5, 2026

Trellix Source Code Breach Highlights Growing Supply Chain Threats

Info is scant, but such breaches can reveal where a security product's controls are located and how detections are designed, giving attackers a leg up.

Dark Reading

manufacturing

May 5, 2026

Research Hub Bridges Cybersecurity Gap for Under-Resourced Organizations

The UC Berkeley Center for Long-Term Cybersecurity (CLTC) offers tools and support to schools, local governments, and non-profits as they defend themselves against a growing volume of cyberattacks.

Dark Reading

Mfg

May 5, 2026

DAEMON Tools trojanized in supply-chain attack to deploy backdoor

Hackers trojanized installers for the DAEMON Tools software and since April 8, delivered a backdoor to thousands of systems that downloaded the product from the official website. [...]

BleepingComputer

manufacturing

May 5, 2026

Why Security Leadership Makes or Breaks a Pen Test

Well-run security drills go beyond checking audit boxes to identify and address trouble spots. Effective leaders can ensure proper scope, access, and follow-through, but it’s not easy.

Dark Reading

May 5, 2026

Student hacked Taiwan high-speed rail to trigger emergency brakes

A 23-year-old university student in Taiwan was arrested for interfering with the TETRA communication system used by the country's high-speed railway network (THSR). [...]

BleepingComputer

May 5, 2026

Critical Apache HTTP/2 Flaw (CVE-2026-23918) Enables DoS and Potential RCE

The Apache Software Foundation (ASF) has released security updates to address several security vulnerabilities in the HTTP Server, including a severe vulnerability that could potentially lead to remote code execution (RCE). The vulnerability, tracked as CVE-2026-23918 (CVSS score: 8.8), has been described as a case of "double free and possible RCE" in the HTTP/2 protocol handling. This issue

The Hacker News

Mfg

May 5, 2026

DAEMON Tools Supply Chain Attack Compromises Official Installers with Malware

A newly identified supply chain attack targeting DAEMON Tools software has compromised its installers to serve a malicious payload, according to findings from Kaspersky. "These installers are distributed from the legitimate website of DAEMON Tools and are signed with digital certificates belonging to DAEMON Tools developers," Kaspersky researchers Igor Kuznetsov, Georgy Kucherin, Leonid

The Hacker News

manufacturing

Watchlist

May 5, 2026

Microsoft Edge Stores Passwords in Process Memory, Posing Enterprise Risk

A proof-of-concept exploit (PoC) shows how someone with admin privileges can exploit the issue to steal passwords, and thus use them to engage in further malicious activity.

Dark Reading

May 5, 2026

FTC to ban data broker Kochava from selling Americans’ location data

The FTC will ban data broker Kochava and its subsidiary, Collective Data Solutions (CDS), from selling location data without consumers' explicit consent to settle charges alleging that it sold precise geolocation data collected from hundreds of millions of mobile devices. [...]

BleepingComputer

May 5, 2026

Cleartext Passwords in MS Edge? In 2026?, (Mon, May 4th)

Yup, that is for real.

SANS Internet Storm Center

May 5, 2026

China-Linked UAT-8302 Targets Governments Using Shared APT Malware Across Regions

A sophisticated China-nexus advanced persistent threat (APT) group has been attributed to attacks targeting government entities in South America since at least late 2024 and government agencies in southeastern Europe in 2025. The activity is being tracked by Cisco Talos under the moniker UAT-8302, with post-exploitation involving the deployment of custom-made malware families that have been put

The Hacker News

May 5, 2026

The EOL Blind Spot in Your CVE Feed: What SCA Tools Miss

Critical vulnerabilities can exist in open source software your scanners don't check. HeroDevs reveals how EOL software creates blind spots in CVE feeds and SCA tools, and how you can receive a free end-of-life scan for your projects. [...]

BleepingComputer