AppSOC Research Labs Delivers Damning Verdict On DeepSeek-R1

Silicon Valley security provider AppSOC has branded DeepSeek-R1, one of the latest highly advanced artificial intelligence (AI) models to emerge from China, a “high-risk model unsuitable for enterprise use.” The company strongly recommends that enterprises not use the DeepSeek-R1 model provided on Azure for “any AI applications, especially those involving personal information, sensitive data or IP.”

High Stakes

Securing AI is now a stand-alone cybersecurity market segment, anticipated to grow to $255 million by 2027. Organizations are always on the lookout for a great deal, but cybersecurity vulnerability is one of the most-cited risks of AI adoption.

The stakes are sky-high, as the financial and reputational costs of a breach can be devastating. As such, businesses are understandably being extremely diligent when choosing which AI models to integrate into their operations.

Let’s take a closer look at DeepSeek-R1 and see why AppSOC is currently warning against it for enterprise use.

DeepSeek-R1

DeepSeek-R1, or R1, is an open-source large language model (LLM) from Chinese AI startup DeepSeek. Among other things, it powers the company’s chatbot, a direct competitor to ChatGPT. DeepSeek-R1 has about 670 billion parameters, making it one of the largest open-source LLMs on the market. The service was built, and is offered to users, at a fraction of the cost of models provided by US tech giants such as OpenAI, Google, and Meta.

Earlier this year, Microsoft announced that DeepSeek-R1 had become available on Azure AI Foundry and GitHub. The company described how DeepSeek-R1’s rapid accessibility was “central to our vision for Azure AI Foundry” and how the safeguards in Azure AI “provide a secure, compliant, and responsible environment for enterprises to confidently deploy AI solutions.”

However, research at AppSOC has not found this to be the case.

Testing Times

AppSOC Research Labs tested the Azure-hosted version of DeepSeek-R1 both with and without Azure’s built-in content filters and guardrails. During their March testing, they found that enabling the Azure filters and guardrails made the model slightly safer overall, but the aggregated risk score improved only marginally, from 8.4/10 with them off to 8.3/10 with them on.

Some of the key findings of the report relating to supply chain risk, malware generation, and prompt injection are listed below.

Supply Chain Risk

When tested in the supply chain threat category, the model was found to hallucinate and make unsafe software package recommendations. Strikingly, the Azure filters seemed to exacerbate the problem: the model’s failure rate rose from 5.8% without filters to 6.9% with them, suggesting that the filters might interfere with useful model behavior in unexpected ways.

Malware Generation

AI models should not be capable of generating malicious code. However, DeepSeek-R1 failed malware generation tests 96.7% of the time without filters and 93.8% of the time with filters. AppSOC classes both percentages as “dangerously high.” The key takeaway from this test is that although Azure’s filters appear to reduce risk marginally in this area, the model remains almost wholly susceptible to malware prompts.

Prompt Injection

Azure filters reduced prompt injection failures, but the failure rate remained very high in tests. In failed cases, injected prompts caused the model to ignore guardrails, leak data, or subvert its intended behavior. The failure rate was 57.1% with Azure filters off and fell to 40% with them on; the report highlights that a 40% failure rate is unacceptable for enterprise use.
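To compare the three categories side by side, the reported failure rates can be put through some simple arithmetic. The sketch below is only an illustration: the names `FAILURE_RATES`, `filter_effect`, and `summarize` are hypothetical, and splitting the effect into an absolute change (percentage points) and a relative change is a choice made here, not a method taken from the AppSOC report. The only inputs are the percentages quoted in this article.

```python
# Failure rates (percent) as quoted in this article: (filters off, filters on).
FAILURE_RATES = {
    "supply chain": (5.8, 6.9),
    "malware generation": (96.7, 93.8),
    "prompt injection": (57.1, 40.0),
}


def filter_effect(off, on):
    """Return (absolute change in percentage points, relative change in percent).

    A negative value means the filters lowered the failure rate.
    """
    absolute = on - off
    relative = absolute / off * 100
    return round(absolute, 1), round(relative, 1)


def summarize(rates):
    """Apply filter_effect to every category in the mapping."""
    return {name: filter_effect(off, on) for name, (off, on) in rates.items()}


if __name__ == "__main__":
    for name, (abs_pp, rel_pct) in summarize(FAILURE_RATES).items():
        print(f"{name}: {abs_pp:+.1f} points ({rel_pct:+.1f}% relative)")
```

Run as a script, this prints one line per category. Supply chain is the only category where the change is positive, meaning the failure rate went up with filters on, while the malware generation rate barely moves in relative terms.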

Further Insight

In reviewing the findings, AppSOC Chief Scientist and Co-Founder Mali Gorantla commented: “Results confirm a troubling reality: while Azure filters provide some value, DeepSeek-R1 remains a high-risk model unsuitable for enterprise use. Microsoft Azure’s content filters are designed to enforce safety and reduce risk. In some categories, they made almost no difference. In one troubling case, the filters may have even worsened model performance.”

Although DeepSeek-R1 has been heralded as an ‘overnight success,’ it appears that many months of refinement are required before it can truly be deemed fit for mass organizational adoption.

Adam Parlett

Adam Parlett is a cybersecurity marketing professional who has been working as a project manager at Bora for over two years. A Sociology graduate from the University of York, Adam enjoys the challenge of finding new and interesting ways to engage audiences with complex cybersecurity ideas and products.

The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.
