According to McKinsey, the enterprise use of Generative AI could generate an astounding $2.6 trillion to $4.4 trillion annually across more than 60 use cases. Additionally, Accenture analyzed 12 developed economies and found that AI has the potential to double their annual economic growth rates by 2035.
Given these impressive projections, it is clear that AI, particularly Large Language Models (LLMs), is set to revolutionize numerous industries. LLMs can draft emails, generate creative content, assist in coding, and provide customer support, among other things. For instance, in a study by the National Bureau of Economic Research, customer support agents using Generative AI tools saw a 13.8% increase in the number of customer issues resolved per hour.
While the capabilities of LLMs are impressive, their widespread adoption comes with significant risks. Understanding these vulnerabilities is crucial for enterprises looking to harness AI’s power responsibly.
The risks of generative AI
LLMs face specific vulnerabilities that can allow threat actors to extract personally identifiable information (PII) memorized from the vast datasets these models are trained on.
Real-world example:
In March 2023, a vulnerability in the Redis library used by ChatGPT led to a data breach, exposing sensitive user information. The breach allowed certain users to see chat histories and payment-related details of other active users, including names, email addresses, and partial credit card information.
Threat actors can also exploit LLMs’ susceptibility to bias within their training data. This can have profound consequences, as biased AI outputs can lead to discriminatory hiring practices, unfair loan approvals, or the spread of misinformation.
Given these significant risks, enterprises must adopt comprehensive data protection measures to safeguard sensitive information. This is where Secure Information Management (SIM) plays a crucial role. By implementing stringent data governance and training data sanitization processes, SIM ensures that enterprises can mitigate these risks effectively.
What is secure information management?
SIM focuses on keeping sensitive information (content such as documents, files, cloud folders, etc.) secure when in use, in transit, and at rest. It also enforces data governance through identity verification and access controls. By using policies to discover and classify data based on sensitivity, SIM ensures data confidentiality throughout its lifecycle.
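To make the idea of policy-driven discovery and classification concrete, here is a minimal, illustrative sketch. The regex rules and sensitivity labels are hypothetical examples for this article, not a description of any particular product's policy engine; real systems typically combine such rules with machine learning detectors.

```python
import re

# Illustrative sensitivity policies: label -> patterns that trigger it.
# These rules and labels are hypothetical, not a product API.
POLICIES = {
    "restricted": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN-like pattern
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # possible payment card number
    ],
    "confidential": [
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email address
    ],
}

def classify(text: str) -> str:
    """Return the most restrictive label whose policy matches the text."""
    for label in ("restricted", "confidential"):
        if any(p.search(text) for p in POLICIES[label]):
            return label
    return "public"
```

A document containing an email address would be labeled `confidential`, while one containing an SSN-like string would be escalated to `restricted`; anything else defaults to `public`. Checking the most restrictive policies first ensures the strictest applicable label wins.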
Implementing SIM is essential for managing data security throughout its lifecycle. But how does SIM enhance existing data management practices like Data Lifecycle Management (DLM)?
Secure information management and data lifecycle management
Data Lifecycle Management (DLM) governs data from creation to deletion, covering storage, backup, archiving, and disposal. SIM strengthens DLM by layering on encryption, access controls, and audits, helping ensure data integrity, regulatory compliance, risk mitigation, and secure, available storage.
The business benefits of SIM
- Visibility: Provides real-time visibility into data security events and incidents. This proactive approach significantly reduces the risk of data breaches, thereby helping to maintain and protect the brand’s reputation.
- Compliance: Enables organizations to comply with regulatory requirements such as CCPA, HIPAA, GLBA, and GDPR, as well as emerging AI-oriented regulations. Last year alone, the total number of AI-related regulations in the US grew by 56.3%.
- Automation: By automating data security monitoring and incident response, SIM helps IT and security teams focus on more strategic initiatives, further driving business growth.
- Data Privacy: SIM ensures the protection of clients’ PII, preventing costly fines, lawsuits, and customer loss.
The importance of data cleaning and protection
PII and sensitive data can inadvertently end up in training datasets for LLMs through sources such as web scraping, social media, user-generated content, code repositories, and academic papers. Threat actors can then exploit various methods to extract this data. For example, researchers demonstrated a ‘divergence attack’: prompting ChatGPT to repeat the word ‘poem’ indefinitely caused it to deviate from its aligned responses and emit over 10,000 unique memorized training examples, including email addresses and phone numbers. Notably, 16.9% of the tested outputs contained memorized PII.
To mitigate these risks, it is crucial to implement the following:
- Automated Data Discovery and Classification: Use automated scanning and machine learning to identify and classify PII across various data sources.
- Data Sanitization: Thoroughly scrub or protect training data using encryption, masking, or tokenization before LLM training.
- Data Minimization: Collect, process, and store only the minimum necessary sensitive data, ensuring privacy, security, and compliance with legal and ethical standards. This also reduces storage costs and supports green policies.
- Policy Management and Access Controls: Create custom data handling policies, enforce access controls, and regularly audit LLM outputs.
The business benefits of these strategies include faster time-to-monetization through clean, AI-ready data, and sustained data privacy and customer trust through risk mitigation.
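The sanitization step above can be sketched with simple pattern-based substitution. The patterns and placeholder tokens below are illustrative assumptions; production pipelines typically pair such rules with ML-based detection and tokenization services.

```python
import re

# Hypothetical masking rules for common PII in raw training text.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def sanitize(record: str) -> str:
    """Replace recognizable PII with placeholder tokens before training."""
    for pattern, token in PII_PATTERNS:
        record = pattern.sub(token, record)
    return record
```

Run against each record before it enters the training corpus, this prevents the model from memorizing the masked values in the first place, which is cheaper than trying to suppress extraction after training.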
Assessing potential business impact in data discovery and classification
Not all data are created equal. According to the IBM Cost of a Data Breach Report 2023, customer PII is the costliest category of compromised record, averaging US$183 per record, followed by employee PII at US$181 and even non-PII at US$138. This is where risk modeling comes into play:
- Traditional data breach cost models rely on flat per-record averages, which understate how widely actual losses vary; advanced financial risk modeling is key to capturing both the variability and the severity of potential financial impacts.
- Advanced financial risk modeling produces a distribution of possible losses rather than a single point estimate, helping organizations quantify, prioritize, and mitigate those risks.
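As a sketch, the per-record averages above can feed a simple Monte Carlo model. The lognormal severity factor and its parameters are illustrative assumptions made for this example, not figures from the IBM report.

```python
import random
import statistics

# Average per-record costs from the IBM Cost of a Data Breach Report 2023.
COST_PER_RECORD = {"customer_pii": 183.0, "employee_pii": 181.0, "non_pii": 138.0}

def simulate_breach_losses(record_counts, trials=10_000, seed=42):
    """Monte Carlo estimate of breach loss. The per-record baseline is scaled
    by a lognormal severity factor (illustrative parameters) so the output is
    a loss distribution rather than a single point estimate."""
    rng = random.Random(seed)
    baseline = sum(COST_PER_RECORD[k] * n for k, n in record_counts.items())
    losses = sorted(baseline * rng.lognormvariate(0.0, 0.5) for _ in range(trials))
    return {
        "mean": statistics.mean(losses),
        "p95": losses[int(0.95 * trials)],  # 95th-percentile "severe" loss
    }
```

For a hypothetical breach of 10,000 customer PII records and 5,000 non-PII records, the flat-average baseline is US$2.52M, but the simulation's 95th-percentile loss is substantially higher, which is exactly the tail severity that point estimates miss.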
Conclusion: The imperative of secure information management in AI adoption
Embracing Secure Information Management (SIM) alongside Data Lifecycle Management (DLM) is not just about mitigating risks but also about leveraging AI to its fullest potential. By adopting these practices, businesses can ensure regulatory compliance, maintain data privacy, build customer trust, and achieve sustained growth in the AI era.
OpenText stands out as a leader in the SIM space, providing comprehensive solutions that integrate data security, identity, and access management to help businesses become AI-ready. For more information see our OpenText Voltage Fusion (Data Security) and OpenText Identity and Access Management pages.