Information Security Buzz

Connecting an LLM to Your Database Is Risky Business

By David Balaban, January 23, 2025 (6 min read)

Enterprises want it all, and they want it now, or at least within a few seconds. They want the benefits that GenAI can bring, such as fast content creation and strategic advice based on data inputs.

It’s not surprising that GenAI adoption is skyrocketing. Economists at the University of Chicago found that over 50% of the workers they surveyed regularly use ChatGPT at work, rising to around three-quarters of respondents in roles like software development, marketing and IT. Forrester, meanwhile, has reported that 73% of data and analytics decision-makers see a positive impact on their organizations from the use of AI.

The quest for GenAI-powered insights drives many businesses to connect their databases directly with an LLM. On the surface, it seems like a great idea: they can uncover new insights based on proprietary information that was previously inaccessible and roll out helpful chatbots that quickly answer customer questions without the need to wait for an agent.

The idea is to create a seamless system for line-of-business (LOB) users to gain answers to their queries without having to know how to code, how to turn questions into SQL queries, or which visualizations are the best fits for their reports.

But there are a number of risks lurking under the surface. Directly connecting your database to a publicly available LLM opens a Pandora’s box of potential data leaks, regulatory non-compliance, incorrect responses, and vulnerabilities to cyberattacks. It’s not always workable, either. Big databases can slow an LLM down so much that it’s not effective for enterprise use cases.

Is there a way for enterprises to tap into the benefits of using GenAI with proprietary data, without suffering the drawbacks?

Compromising Data Privacy, Compliance and Security

Unfortunately, public LLMs are far from secure. Samsung made headlines when proprietary data was leaked through an employee using ChatGPT. Prompts can leak sensitive data to users outside your company, especially if malicious actors hack or trick the LLM. What’s more, depending on the service tier and your settings, the big LLM developers may use your prompts and uploads to train their models.

Even if the data is never seen by unauthorized eyes, simply sharing it with a cloud-hosted LLM can breach regulations like GDPR, CCPA, and HIPAA, which strictly govern how personal data is shared with and processed by third parties. Additionally, every connection point to your database, including the APIs that link it to an LLM, is a potential entry point for cyberattacks.

“The privacy risks are extreme, in my opinion,” explains Avi Perez, CTO and co-founder of Pyramid Analytics. “Because you’re effectively sharing your top-secret corporate information that is completely private and frankly, let’s say, offline, and you’re sending it to a public service that hosts the chatbot and asking it to analyze it.”

The dangers here are real, Perez continues. “And that opens up the business to all kinds of issues – anywhere from someone sniffing the question on the receiving end, to the vendor that hosts the AI LLM capturing that question with the hints of data inside it, or the data sets inside it, all the way through to questions about the quality of the LLM’s mathematical or analytical responses to data.”

With specialized setups, it’s possible to share just the metadata, such as table and column names, instead of connecting the LLM directly to your entire database. The LLM then processes your questions and generates queries that you run on your own data, without exposing that data to unauthorized access. This is the architecture that some advanced business intelligence platforms employ.
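A minimal sketch of the metadata-only idea, assuming a SQLite database and Python's standard `sqlite3` module. The `build_schema_prompt` helper and the prompt wording are hypothetical, and the actual call to an LLM is deliberately left out. Only table names, column names and declared types are read; no row values ever reach the prompt.

```python
import sqlite3

def build_schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build an LLM prompt from schema metadata only (table/column names and types).

    No SELECT is ever run against the user tables, so no row values can
    end up in the prompt that would be sent to an external model.
    """
    table_lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        col_desc = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        table_lines.append(f"TABLE {table}({col_desc})")
    schema = "\n".join(table_lines)
    return (
        "Write one read-only SQLite SELECT statement that answers the question.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
    )

# Usage: a toy database standing in for production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, region TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'alice@example.com', 'EMEA')")

prompt = build_schema_prompt(conn, "How many customers are there in each region?")
print(prompt)  # schema and question only; the customer's email is not included
```

The SQL the model sends back would then be reviewed and run locally, so the model only ever learns the shape of the data, not its contents.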

Masking techniques, encryption, and tokenization can also be used to hide sensitive values while preserving the data’s fundamental schema and structure.
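One simple way to tokenize is keyed hashing (HMAC), which turns each sensitive value into a stable token. The sketch below is illustrative only: `tokenize`, `mask_row`, the hard-coded key and the sample rows are all assumptions. Because the same input always yields the same token, grouping and joins on the tokenized column still work, while column names and row shape stay unchanged. Keyed hashes are one-way; if originals must be recoverable, a token vault or format-preserving encryption would be needed instead.

```python
import hashlib
import hmac

# Assumption: in a real deployment this key comes from a secrets manager, never source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace a sensitive value with a keyed, one-way token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"  # 64 bits of the digest keeps tokens short

def mask_row(row: dict, sensitive: set) -> dict:
    """Tokenize only the sensitive columns; every column name is preserved."""
    return {col: tokenize(str(val)) if col in sensitive else val
            for col, val in row.items()}

rows = [
    {"customer_email": "alice@example.com", "region": "EMEA", "amount": 120},
    {"customer_email": "alice@example.com", "region": "EMEA", "amount": 80},
    {"customer_email": "bob@example.com", "region": "APAC", "amount": 50},
]
masked = [mask_row(r, {"customer_email"}) for r in rows]
for m in masked:
    print(m)  # emails replaced by tok_... values; region and amount untouched
```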

Slow and Inefficient Workflows

Many business customers have gigantic amounts of data, amounting to millions of rows and hundreds of columns. Many services don’t even permit you to upload datasets of that magnitude, and even where they do, you’ll find that it slows the system down dramatically.

It could take hours for all the data to be uploaded. By the time your LLM processes all those rows, the data may already be out of date. It’s one thing to use an LLM to write a single personalized email, but as the data grows, so does processing time. Token limits or quotas can create throughput bottlenecks and drive up latency.
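A back-of-the-envelope calculation shows why token limits bite so quickly. Every constant below is an illustrative assumption rather than a figure from the article: a conservative 100 columns, about 3 tokens per cell, and a 128,000-token context window.

```python
# Rough estimate of how many LLM context windows one table would fill.
# All constants are illustrative assumptions.
ROWS = 1_000_000          # "millions of rows" (low end)
COLUMNS = 100             # "hundreds of columns" (conservative)
TOKENS_PER_CELL = 3       # assumed average for a short value plus its delimiter
CONTEXT_WINDOW = 128_000  # assumed context size, in tokens, of a large hosted model

total_tokens = ROWS * COLUMNS * TOKENS_PER_CELL
windows_needed = -(-total_tokens // CONTEXT_WINDOW)  # ceiling division

print(f"{total_tokens:,} tokens, roughly {windows_needed:,} context windows")
# → 300,000,000 tokens, roughly 2,344 context windows
```

Even under these generous assumptions, a single modest table would need thousands of separate model calls before any question could be answered.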

Users would have to wait minutes for an answer, and nobody has that kind of patience. Certainly not your customers, who expect instant replies from customer support chatbots. Given that one reason for connecting the LLM to your database is greater efficiency and more meaningful insights, this approach is tantamount to shooting yourself in the foot.

Query translators offer a way to overcome this hurdle. These are tools that convert a natural language prompt or description into an SQL statement. You can review the code and then execute the query in your own secure, self-hosted environment. The LLM sees only your natural-language question, not the data, thereby protecting your database without compromising on real-time analysis.
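The review step can be backed by an engine-level guard, so that even an approved statement cannot change anything. A minimal sketch, assuming SQLite and Python's `sqlite3` module: `run_generated_sql` and the example SQL are hypothetical, and the LLM call that would produce the SQL is omitted. The authorizer hook (`Connection.set_authorizer`) is consulted while each statement is compiled and here denies every action except reading.

```python
import sqlite3

# Authorizer action codes that a plain read-only query needs:
# SELECT itself, reading columns, and calling functions such as SUM().
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite while compiling each statement; deny anything that is not a read."""
    return sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY

def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Run reviewed, LLM-generated SQL on a connection locked to read-only access.

    The authorizer stays installed afterwards, so use a connection dedicated
    to generated queries. Writes, DDL, ATTACH, PRAGMA and transactions are
    rejected at compile time, and Python's execute() also refuses strings
    containing more than one statement.
    """
    conn.set_authorizer(read_only_authorizer)
    return conn.execute(sql).fetchall()

# Usage: a dedicated connection for generated SQL, populated here for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('EMEA', 120), ('EMEA', 80), ('APAC', 50);
""")

# Hypothetical output of a query translator for "total sales by region".
totals = run_generated_sql(
    conn, "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
print(totals)  # [('APAC', 50.0), ('EMEA', 200.0)]

# A destructive statement is rejected when SQLite compiles it.
blocked = False
try:
    run_generated_sql(conn, "DELETE FROM sales")
except sqlite3.DatabaseError as exc:
    blocked = True
    print("Rejected:", exc)
```

Other databases offer comparable controls, such as a read-only database role for the account that executes generated queries.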

Unreliable Outputs

Your databases might not be organized well enough for the LLM to do its job effectively, and your source applications could deliver data in formats that LLMs can’t interpret. A data scientist can ensure that the data is clean and that the LLM can make sense of it, but removing that layer invites hallucinations and errors. When you connect the LLM directly to unprocessed data, it doesn’t necessarily know what to do with it. You’d have to update the LLM whenever the source app is updated, and debugging would be a never-ending challenge.

It doesn’t help that few LOB users know how to write prompts, code queries, or configure LLMs. If they word a question in a way that produces irrelevant responses, they won’t realize that the outcome is inaccurate. They might also use phrasing that leads the LLM to generate a query that deletes data, or one that is needlessly resource-intensive and drives up costs.
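One way to stop a badly phrased request from turning into an expensive query is to cap how much work any generated statement may do. A minimal sketch, assuming SQLite via Python's `sqlite3` module: `run_with_budget` and the budget value are illustrative assumptions, and virtual-machine instruction counts are only a rough proxy for cost. Server databases have their own equivalents, such as PostgreSQL's `statement_timeout` setting.

```python
import sqlite3

def run_with_budget(conn: sqlite3.Connection, sql: str, max_instructions: int = 1_000_000) -> list:
    """Run a query, aborting it once it exceeds a rough work budget.

    SQLite calls the progress handler every `check_every` virtual-machine
    instructions; a non-zero return value interrupts the running statement.
    """
    check_every = 1_000
    calls = 0

    def on_progress() -> int:
        nonlocal calls
        calls += 1
        return 1 if calls * check_every > max_instructions else 0

    conn.set_progress_handler(on_progress, check_every)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again

# Usage
conn = sqlite3.connect(":memory:")
small = run_with_budget(conn, "SELECT 1 + 1")
print(small)  # [(2,)]

# A deliberately runaway query (10 million generated rows), standing in for
# an accidentally expensive LLM-generated statement.
runaway = """
    WITH RECURSIVE n(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM n WHERE x < 10000000)
    SELECT COUNT(*) FROM n
"""
aborted = False
try:
    run_with_budget(conn, runaway)
except sqlite3.OperationalError as exc:
    aborted = True
    print("Aborted:", exc)  # SQLite reports the statement as interrupted
```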

“The quality of the output heavily depends on the relevance and quality of the information retrieved,” says Zive’s Stefanie Dankert. “If the underlying knowledge base is poorly organized or out-of-date, the answers provided by the LLM can be inaccurate or irrelevant. Unfortunately, this is almost always the case in reality, and only small companies are typically able to keep all their internal data and knowledge structured manually.”

One solution is to apply sandboxing. This is where you build a controlled environment for LLMs to run SQL queries on a sample or synthetic database using anonymized, aggregated, placeholder, or even AI-generated data. The sandbox can mimic a real database while keeping your actual data securely isolated from the LLM.

It’s an effective way to generate, test, and validate SQL queries. Even if the sandbox data is modified or accessed by unauthorized personnel, the actual risk is slim to none, because none of it is real production data.
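A minimal sketch of such a sandbox, assuming SQLite and Python's standard library. The helpers `build_sandbox` and `synthetic_value` are hypothetical, and the value generator is deliberately crude (real synthetic-data tools try to match value distributions). Only the `CREATE TABLE` statements are read from the production database; every row in the sandbox is placeholder data, so a generated query can be tested, or can even misbehave, without touching real records.

```python
import random
import sqlite3

def synthetic_value(col_name: str, col_type: str, i: int, rng: random.Random):
    """Placeholder value chosen from the column's declared type
    (a rough approximation of SQLite's type-affinity rules)."""
    t = col_type.upper()
    if "INT" in t:
        return i + 1  # sequential integers keep INTEGER PRIMARY KEYs unique
    if any(k in t for k in ("REAL", "FLOA", "DOUB", "NUM", "DEC")):
        return round(rng.uniform(1, 500), 2)
    return f"{col_name}_{rng.randint(1, 5)}"  # e.g. "region_3": shaped like data, means nothing

def build_sandbox(prod: sqlite3.Connection, rows_per_table: int = 50, seed: int = 7) -> sqlite3.Connection:
    """Copy prod's table definitions into an in-memory database filled with
    synthetic rows. Only schema is read from prod; no production row is copied."""
    rng = random.Random(seed)
    sandbox = sqlite3.connect(":memory:")
    tables = prod.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, ddl in tables:
        sandbox.execute(ddl)  # same CREATE TABLE statement as production
        quoted = name.replace('"', '""')
        columns = prod.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        placeholders = ", ".join(["?"] * len(columns))
        for i in range(rows_per_table):
            row = [synthetic_value(col[1], col[2], i, rng) for col in columns]
            sandbox.execute(f'INSERT INTO "{quoted}" VALUES ({placeholders})', row)
    sandbox.commit()
    return sandbox

# Usage: a toy "production" database.
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, "
    "region TEXT, lifetime_value REAL)"
)
prod.execute("INSERT INTO customers VALUES (1, 'alice@example.com', 'EMEA', 1200.0)")
prod.commit()

sandbox = build_sandbox(prod)
# Hypothetical LLM-generated query, validated against synthetic data only.
candidate = "SELECT region, COUNT(*), AVG(lifetime_value) FROM customers GROUP BY region"
print(sandbox.execute(candidate).fetchall())
```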

LLMs and Databases Need to Maintain a Healthy Distance

It’s true that AI brings a lot more power to business analytics, driving productivity, democratizing access to insights, and speeding up response times. But connecting it directly to your database doesn’t necessarily bring the extra benefits you might expect. It’s important to use workarounds that establish secure fences between the database and the LLM so that you can enjoy the advantages of GenAI analysis without the drawbacks.

David Balaban

David Balaban is a cybersecurity analyst with a two-decade track record in malware research and security software evaluation. David runs the Privacy-PC.com and MacSecurity.net projects, which feature expert opinions on contemporary InfoSec matters, including social engineering, malware, penetration testing, threat intelligence, online privacy, and white hat hacking. David has a solid malware troubleshooting background, with a recent focus on ransomware countermeasures.

    The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.

