Introduction: First data, then technology
Let us reframe a conversation we have constantly.
Over the past 24 months, we’ve heard from CIOs, CTOs, and forward-leaning CMOs. Nearly all of them ask the same question: “What’s our AI strategy?” They want a vendor shortlist, a use-case matrix, or a governance framework.
But every time, we have to stop them and say: “You don’t need an AI strategy yet. You need an information management (IM) strategy first. And the fact that you don’t see the difference is why your pilot projects are already at risk.”
That may sound direct, but as a leader making AI investment decisions, you need the unvarnished truth. Generative AI and LLMs are not magic. They are probability engines. They don’t reason or understand; they pattern-match on whatever data you feed them. If that data is chaotic, duplicated, obsolete, or wrong, your AI will be spectacularly, expensively wrong at scale.
In a recent workshop that OpenText ran with Deep Analysis at the annual AI+IM Global Summit, we spent time with practitioners from a range of industries, from organizations that triage situations of human trauma to regulated banks and oil companies. Despite their differences, they all faced a strikingly similar challenge. AI was delivering real value in focused, well-scoped use cases while broader initiatives kept stalling. The gap wasn’t the technology. It was the data behind it, and the organizational pressure to move faster than their information foundations could support.
So, let’s talk about why information management, that unglamorous, decades-old discipline of records management, metadata, and data governance, has suddenly become your single most critical strategic capability.
The core problem: Garbage in, garbage out, at warp speed
You’ve heard the phrase “garbage in, garbage out” (GIGO). With traditional analytics, garbage moved slowly. A bad report meant a bad quarterly decision. With generative and agentic AI, garbage is instantaneous and conversational. Worse, because AI responses are designed to be plausible, people often accept them as fact without checking the sources. So let’s add another variation on GIGO, “garbage in, gospel out”: what the AI generates may well be garbage, but it is plausible garbage, and many will take it to be true and accurate.
Imagine a single outdated contract sitting in a file share. It’s been superseded but not archived. An AI ingests it via a retrieval pipeline and confidently recites the old terms to a customer agent or compliance officer as truth. This happens constantly – wrong customer data, wrong drawings, wrong procedures. The only difference now is speed and confidence.
The three IM capabilities you must have before buying more AI
If you don’t have these three capabilities in place, stop buying GPUs. Put your AI plans on hold.
1. Authoritative source identification
Most leaders assume you need to consolidate everything into a data lake for AI. That is a disaster. What you need is a well-managed content layer that enables you to easily identify what is appropriate to address your chosen use cases. You don’t need all your data. You need the right data.
You need a formal, machine-readable answer to questions like:
- Which system holds the canonical version of a customer agreement?
- Which document repository is the source of truth for regulatory filings?
- Which knowledge base has been legally reviewed?
Without this, your AI will give a Slack message from an intern the same weight as a board-approved policy.
Action for you: For your top five AI use cases, work backward. For each output the AI will produce, identify the one or two authoritative data sources required. Ruthlessly exclude everything else.
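One way to make “authoritative source” machine-readable is a small allowlist that maps each use case to the sources it may draw from. The Python sketch below is illustrative only: the use-case names, the source identifiers, and the filter_to_authoritative helper are hypothetical, and a real content platform would supply its own equivalents.

```python
# Hypothetical registry: each AI use case maps to its one or two
# authoritative sources. All names here are illustrative assumptions.
AUTHORITATIVE_SOURCES = {
    "contract_qa": {"contract_repository"},
    "hr_policy_qa": {"hr_policy_library"},
}


def filter_to_authoritative(use_case, documents):
    """Keep only documents whose 'source' is authoritative for the use case.

    An unknown use case gets an empty allowlist, so nothing passes by default.
    """
    allowed = AUTHORITATIVE_SOURCES.get(use_case, set())
    return [doc for doc in documents if doc.get("source") in allowed]


candidates = [
    {"id": "policy-7", "source": "hr_policy_library"},
    {"id": "chat-123", "source": "team_chat"},
]
kept = filter_to_authoritative("hr_policy_qa", candidates)
print([doc["id"] for doc in kept])
```

The design choice worth noting is the default: anything not explicitly listed is excluded, which mirrors the advice to ruthlessly exclude everything else.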
2. Lifecycle governance with automated expiration
Most enterprise content has no expiration date. It sits forever. That’s fine for a human who ignores a 2018 memo. It’s catastrophic for an AI that has no concept of time unless you teach it.
You need what we call “AI-ready retention.” Every document, database row, or email fed into any AI system must have:
- A verifiable date of last accuracy
- A scheduled review or deletion date
If your records policy still says “retain forever unless flagged by legal,” you are actively building liability into your AI stack.
Action for you: Tag everything with a confidence decay function. A pricing sheet from 2023 is more reliable than one from 2020. A support ticket from last month matters more than one from 2017. Your IM system should make that explicit.
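A confidence decay function can be as simple as a half-life applied to the date a document was last verified. The sketch below assumes exponential decay and a one-year half-life. Both are illustrative choices rather than a recommendation, and the function name, parameters, and dates are hypothetical.

```python
from datetime import date


def confidence(last_verified, today, half_life_days):
    """Exponential decay: the score halves every `half_life_days`.

    Returns 1.0 for a document verified today and approaches 0 as it ages.
    """
    age_days = max((today - last_verified).days, 0)
    return 0.5 ** (age_days / half_life_days)


# Assumed reference date and half-life, chosen only for illustration.
today = date(2025, 1, 1)
pricing_2023 = confidence(date(2023, 1, 1), today, half_life_days=365)
pricing_2020 = confidence(date(2020, 1, 1), today, half_life_days=365)

# The 2023 pricing sheet scores higher than the 2020 one.
print(pricing_2023 > pricing_2020)
```

In practice, the half-life would likely differ by content class: pricing may go stale in months, while a signed contract stays accurate until it is superseded.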
3. Metadata that machines can actually read
For decades, we’ve treated metadata as a convenience for search. For AI, metadata is oxygen. Your LLM has no idea whether a PDF is a contract, a proposal, or a lunch menu unless you tell it.
If your current systems don’t enforce required fields such as those listed next, your AI initiative is running on hope.
Action for you: Move from unstructured chaos to “structured-unstructured” data. Every document in an AI pipeline must carry the following information, at minimum:
- Class label: What type of information? (Invoice, legal hold, technical spec)
- Sensitivity level: Who can see this? (Public, internal, restricted)
- Currency flag: Is this still operative? (Current, superseded, archived)
- Provenance: Who created or last approved this?
- Business context: What are the strongest relationships this document has to others in the corpus?
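The five fields above could be enforced with a small schema that rejects incomplete records before they reach an AI pipeline. This is a minimal Python sketch under stated assumptions: the class, enum, and field names are hypothetical, and business context is reduced here to a tuple of related document IDs.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


class Currency(Enum):
    CURRENT = "current"
    SUPERSEDED = "superseded"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class DocumentMetadata:
    class_label: str          # e.g. "invoice", "technical spec"
    sensitivity: Sensitivity  # who can see this
    currency: Currency        # is this still operative
    provenance: str           # who created or last approved it
    related_ids: tuple = ()   # business context: IDs of related documents

    def __post_init__(self):
        # Required free-text fields must not be blank.
        for name in ("class_label", "provenance"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")


def admissible(meta):
    """Only documents flagged as current may enter the pipeline."""
    return meta.currency is Currency.CURRENT


spec = DocumentMetadata(
    class_label="technical spec",
    sensitivity=Sensitivity.INTERNAL,
    currency=Currency.SUPERSEDED,
    provenance="engineering-review-board",
)
print(admissible(spec))
```

Using enumerations rather than free text for sensitivity and currency means a typo such as “curent” fails loudly instead of silently letting a document through.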
The organizational shift you must lead
The biggest obstacle isn’t technology. It’s culture. Most business leaders still treat information management as a cost center, the department that complains about duplicate files and asks people to classify emails. Boring. Back-office. Easy to cut.
You need to reverse that mindset immediately.
In the AI era, your IM team is your quality assurance unit. They are the difference between an LLM that hallucinates a fake regulation and one that cites the exact clause from the current version of a policy. They must have a seat at every AI steering committee, and they should have veto power over any use case that relies on ungoverned content.
Example: A legal chatbot trained on an internal wiki advises a trader that a cross-border transaction is permissible. The wiki was correct – in 2021. A 2022 regulatory change made it illegal, but no one updated the wiki. A metadata flag for “last regulatory review” would have caught it. Without that? You’re exposed.
Everyone wants to talk about algorithms and GPUs, but nobody wants to talk about the data itself, specifically the 80-90% of it that’s unstructured. That’s where AI goes to die or, occasionally, to become a massive compliance headache. An information manager isn’t a “nice to have.” They’re the only person in the room who actually understands that a PDF is not just a PDF. It’s a contract, a liability, a revenue opportunity, or all three at once.
Why are information managers a strategic enabler? Because without them, your AI initiative is just expensive guesswork. Information managers make AI fundable by turning a swamp of emails, videos, and scanned documents into something you can put a price tag on. Finance people don’t fund chaos. They fund predictable, governable assets. An information manager delivers the data lineage, the metadata, the retention rules – the boring stuff that lets you calculate storage, compute, and risk. Suddenly, that AI project moves from “let’s see what happens” to “here’s the three-year cost model and confidence interval.”
Equally importantly, they make AI defensible, which is what keeps you out of court and out of the headlines. When your retrieval-augmented generation (RAG) system serves up a privileged legal memo to the wrong person, or when your chatbot hallucinates using an old, deleted policy document, who gets the blame? Everyone points fingers, but nobody was watching the data. The information manager was, or at least should be. They enforce the access controls, the ethical boundaries, and the audit trails. They’re the reason you can walk into a regulatory hearing and say, “Yes, we know exactly which version of that document the model used, and we can prove it was properly masked.”
The high-demand skills here aren’t about the inner workings of an AI model. They’re about governance, taxonomy, and pragmatism. RAG pipeline design, metadata schemas, GDPR/CCPA application to unstructured content – that’s the real strategic toolkit. Anyone can spin up a model. Very few people can make that model safe, auditable, and profitable. That’s why information managers are indispensable in enterprise AI. Not because the hype says so, but because the lawsuits and budget cuts will inevitably follow those who ignore them.
There’s a related risk that gets less attention. Organizations that respond to governance gaps by blocking AI outright don’t eliminate the problem; they just move it off the network. Employees turn to personal devices, consumer tools, and ungoverned workflows, and you lose visibility entirely. The choice isn’t between governed AI and no AI. It’s between governed AI and invisible AI.
Where to start (without boiling the ocean)
You don’t need to migrate everything to a new data fabric. Here is a practical first step:
Pick one high-value, low-risk AI use case like internal IT support or HR policy Q&A. Manually audit the knowledge base that will feed it. Print out the top 200 documents. Look for duplicates, outdated versions, and contradictory statements.
You will be horrified. That horror is valuable: it becomes your business case for IM modernization.
Once you’ve cleaned that one corpus, you have a repeatable pattern: classify, deduplicate, timestamp, authorize. Then scale. A content management layer purpose-built around these principles makes that process far easier and helps enforce governance on an ongoing basis.
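The deduplicate step of that pattern can be partly automated before the manual review. The sketch below flags documents whose text is identical after whitespace and case are normalized; the document IDs and sample text are invented for illustration, and hashing catches only exact copies, so near-duplicates and contradictory versions still need a human reader.

```python
import hashlib


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents):
    """Group document IDs by fingerprint; return groups with more than one ID."""
    groups = {}
    for doc_id, text in documents.items():
        groups.setdefault(fingerprint(text), []).append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical knowledge-base excerpt for an internal IT support use case.
corpus = {
    "vpn-guide-v1": "Reset your VPN token  at the service desk.",
    "vpn-guide-copy": "reset your vpn token at the service desk.",
    "leave-policy": "Annual leave requests need manager approval.",
}
print(find_duplicates(corpus))
```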
Vendors will tell you their vector databases or embedding models solve this, but they don’t. Technology amplifies what you already have. If what you have is a mess, AI will just give you a faster, more conversational mess.
The final truth
Center your IM team in your AI initiatives. Through them, go fix your metadata. Clean up your content systems. Retire those forgotten network drives from 2010. Cloud modernization is often the practical first step, and a critical enabler of the governed content layer AI requires.
It’s not glamorous. But it is the only work that will make your AI initiative worth the enormous investment you’re about to make.
After 30 years of watching enterprise technology cycles, one pattern holds: the winners won’t be the companies with the most advanced models. They’ll be the companies with the most boring, disciplined, and ruthless information management practices.
That is your competitive advantage. And it starts with a decision only you can make.
About Deep Analysis
Deep Analysis was founded in 2017 to provide strategic advisory services, grounded in deep research into technology buying trends and market dynamics.
Deep Analysis differs from other advisory firms in that our research, though informed by technology vendors, is not driven by them. We first and foremost focus our analysis on the real-world business needs and activities of buyers and users of technology. Much of our time is spent talking to buyers and users to ensure that our research and advice reflect real-world conditions.
Led by Alan Pelz-Sharpe, who has over 20 years of strategic advisory experience and is co-author of the recently published book “Practical Artificial Intelligence – An Enterprise Playbook,” Deep Analysis provides confidential advice to technology leaders in product strategy, go-to-market and growth planning.
Deep Analysis researches markets and technologies to provide confidential advice and guidance for our clients to identify, target, and grow their businesses.