Three Key Steps to Prepare IT for Machine Learning

The Internet of Things (IoT), mobile applications, social media, and more powerful enterprise systems are generating data at unprecedented scale, speed, and variety. To derive business value from all these data, more and more enterprises are turning to machine learning. For instance, according to a study conducted by McKinsey Global Institute, total investment in Artificial Intelligence (AI) was between $8 billion and $12 billion in 2016 alone, with machine learning constituting nearly 60% of the investment.

More than a tool for analyzing data, machine learning is a data-centric system that has the ability to learn, improve, or evolve without human intervention. It is also a system that generates new insights from data without being explicitly programmed to do so. Because of these capabilities, machine learning enables enterprises to quickly and efficiently review, interpret, and analyze data, leading to business insights and competitive advantages.
Some of the benefits that machine learning delivers to enterprises include faster decision making, the ability to process and analyze large volumes of data, greater analytical accuracy, and better overall business insights.

More specifically, machine learning can help enterprises improve customer experience and relationships by providing personalized recommendations based on customers' behavior and past purchases, lengthen the life of assets through predictive maintenance, strengthen security through smarter threat detection, and optimize DevOps by improving key areas of the application delivery life cycle.

Enabling machine learning for enterprises often falls on their respective IT departments. With the capabilities, benefits, and competitive advantages that machine learning offers, how can IT departments prepare for it? In this post, we will explore the three key steps in preparing enterprises for machine learning:

Step #1: Understand the Machine Learning Process

Understanding how the machine learning process works is key to preparing enterprises for machine learning. The required architecture and personnel depend on how the process works. While the key stages of the machine learning process are essentially the same for each enterprise, enterprises often differ in the problems they want to solve through machine learning. This can have an effect on the type of architecture they need to build and the personnel they need to have. This process often involves three stages:

Identifying the problem

The machine learning process often starts with identifying the problem and use case. By identifying the problem, the objective of machine learning becomes clear for the enterprises and their IT departments, especially when it comes to creating their strategy for implementing machine learning.

Gathering and processing data

After identifying the problem, IT departments need to identify the data sources from which they can acquire information to solve the problem. Data sources can be enterprise systems such as CRM and ERP systems, IoT devices, or legacy systems. Data can come from structured sources such as databases and from unstructured sources such as documents, pictures, and emails.

Once data sources are identified, IT departments must process and normalize data in order to utilize them for machine learning execution. Processing these data involves data integration, cleansing, and transformation.
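
As a rough illustration of the cleansing and transformation work described above, here is a minimal Python sketch. The field names (customer_id, spend), the sample records, and the choice of mean imputation and min-max scaling are assumptions made for the example, not a prescribed method.

```python
from statistics import mean

def cleanse(records):
    """Drop records without a customer id, trim ids, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        cust_id = (rec.get("customer_id") or "").strip()
        if not cust_id or cust_id in seen:
            continue
        seen.add(cust_id)
        cleaned.append({"customer_id": cust_id, "spend": rec.get("spend")})
    return cleaned

def fill_missing(records, field):
    """Replace missing numeric values with the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    default = mean(known) if known else 0.0
    return [{**r, field: default if r[field] is None else r[field]} for r in records]

def min_max_normalize(records, field):
    """Rescale a numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

# Hypothetical raw records pulled from an enterprise system.
raw = [
    {"customer_id": " A1 ", "spend": 100.0},
    {"customer_id": "A1", "spend": 100.0},   # duplicate once the id is trimmed
    {"customer_id": "", "spend": 50.0},      # missing id, dropped
    {"customer_id": "B2", "spend": None},    # missing value, imputed
    {"customer_id": "C3", "spend": 300.0},
]
prepared = min_max_normalize(fill_missing(cleanse(raw), "spend"), "spend")
```

Each step is a separate function so that an enterprise can swap in its own rules, for example a different imputation policy, without touching the rest of the pipeline.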

Development and deployment

After gathering and normalizing the necessary data for machine learning execution, IT departments must create a machine learning model. They can create the model by developing the algorithms that will be used by machine learning programs to learn and solve the identified problems. Once the algorithms are developed, they are run by the IT departments and the results of every cycle are analyzed and validated. Based on the cycle runs of the machine learning model, the algorithms are fine-tuned to achieve the desired results.
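
The run, validate, and fine-tune cycle described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses a simple k-nearest-neighbors classifier, invented temperature readings labeled "ok" or "fail", and a hypothetical tune helper that runs one cycle per candidate setting.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k):
    """Share of validation points that the model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

def tune(train, validation, candidates):
    """Run one cycle per candidate setting and keep the best-scoring one."""
    results = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results

# Hypothetical sensor readings; (66, "fail") is a deliberately noisy label.
train = [(60, "ok"), (62, "ok"), (65, "ok"), (66, "fail"), (70, "ok"),
         (85, "fail"), (88, "fail"), (90, "fail"), (95, "fail")]
validation = [(64, "ok"), (67, "ok"), (87, "fail"), (93, "fail")]

best_k, results = tune(train, validation, candidates=[1, 3, 5])
```

With this data, k=1 is misled by the noisy training point, while larger values of k are not, which is the kind of difference the validation step is meant to surface before deployment.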

Once the machine learning model consistently achieves the desired results, it can be deployed. Enterprises can then reap the benefits of machine learning, such as faster decision making, smarter insights, and more accurate predictions.

Step #2: Build the Machine Learning Architecture

IT departments can now identify the key functions for which they need to build the architecture in order to execute the machine learning process:

Data gathering

The first component of the machine learning architecture should collect data from different sources such as databases, enterprise systems, mainframes, and even IoT devices. It should ingest data quickly and reliably from those sources so that the machine learning model is continuously fed with inputs.
This component often requires 1) batch data warehouses that can store and forward massive amounts of data and 2) real-time data ingestion tools that can separate necessary data from unnecessary data, and data needed for immediate processing from data that can be stored for later use.
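
The filtering role of a real-time ingestion tool might look like the following Python sketch. The required fields, the set of "urgent" metrics, and the three routes are hypothetical policies chosen for illustration; real ingestion tools expose their own configuration for this.

```python
from enum import Enum

class Route(Enum):
    DISCARD = "discard"                  # unnecessary or unusable data
    PROCESS_NOW = "process_now"          # needed for immediate processing
    STORE_FOR_LATER = "store_for_later"  # kept in batch storage

# Hypothetical schema and policy, not taken from any specific product.
REQUIRED_FIELDS = {"device_id", "metric", "value"}
URGENT_METRICS = {"temperature", "vibration"}

def route_event(event):
    """Decide what to do with a single incoming event."""
    if not REQUIRED_FIELDS.issubset(event):
        return Route.DISCARD
    if event["metric"] in URGENT_METRICS:
        return Route.PROCESS_NOW
    return Route.STORE_FOR_LATER

def ingest(events):
    """Group a stream of events by the route chosen for each one."""
    batches = {route: [] for route in Route}
    for event in events:
        batches[route_event(event)].append(event)
    return batches

events = [
    {"device_id": "d1", "metric": "temperature", "value": 81.5},
    {"device_id": "d2", "metric": "firmware_version", "value": "1.4"},
    {"metric": "temperature"},  # incomplete, so it is discarded
]
batches = ingest(events)
```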

Data integration

The second component of the architecture prepares the data ingested for integration and, ultimately, for machine learning execution. This component often includes modules for transforming, cleaning, and normalizing data.
Enterprises need to consider whether the data to be processed or integrated are in transit or at rest and whether the data are stored in legacy systems or in the cloud. These considerations affect the tools and features needed for the architecture. For instance, in-memory processing may be best suited for the high-throughput computing required by the continuous processing and integration of data. Integration platform-as-a-service (iPaaS) tools may suffice for integrating data stored in cloud applications, but not for data stored in legacy systems.
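
To make the integration step concrete, the sketch below joins records from two systems in memory. It assumes a hypothetical CRM export keyed by customer_id and a hypothetical ERP export keyed by cust, with ids that differ only in case and whitespace; real sources usually need more careful matching.

```python
def normalize_id(raw_id):
    """Bring ids from different systems into one comparable form."""
    return raw_id.strip().upper()

def integrate(crm_rows, erp_rows):
    """Attach total ERP order amounts to each CRM customer (a left join)."""
    totals = {}
    for row in erp_rows:
        key = normalize_id(row["cust"])
        totals[key] = totals.get(key, 0.0) + row["amount"]

    unified = []
    for row in crm_rows:
        key = normalize_id(row["customer_id"])
        unified.append({
            "customer_id": key,
            "name": row["name"],
            "total_orders": totals.get(key, 0.0),  # 0.0 when no orders exist
        })
    return unified

# Hypothetical exports from a CRM system and an ERP system.
crm = [{"customer_id": "a1", "name": "Acme"}, {"customer_id": "B2 ", "name": "Birch"}]
erp = [{"cust": "A1", "amount": 120.0}, {"cust": " a1", "amount": 30.0}]

unified = integrate(crm, erp)
```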

Tools that are able to support self-service integration of data are ideal for the second component of the machine learning architecture. The tools must also be able to aid IT departments and enterprises with data governance and ensuring the security and privacy of data.

Data modeling

The third component of the architecture enables enterprises to select algorithms and adapt them to address the problems they need to solve. These algorithms are not necessarily built from scratch by IT departments. Algorithms can be obtained from the market and customized for the needs of each enterprise. Working with these commercially available algorithms can also give IT departments the experience they need in handling machine learning projects, developing machine learning models, and later deploying them in actual machine learning applications.


Execution

After preparing the data and modeling machine learning algorithms to solve the pre-identified problems, the fourth component of the architecture should be able to execute machine learning routines. This component of the architecture repeatedly runs machine learning routines in order to test, validate, and fine-tune the models until they can consistently deliver the desired outcomes.

A key consideration for this component is the processing capability needed to run machine learning routines effectively. This capability can be obtained by building in-house infrastructure or by using as-a-service solutions from cloud providers. For instance, simple machine learning algorithms can be executed using an average CPU. However, deep learning algorithms often require more computing power and even high-powered graphics processing units (GPUs). Some cloud providers already offer services that utilize their own GPUs, saving enterprises the cost of building these capabilities in-house.

Cloud services also provide enterprises with an ideal setup for testing, debugging, and optimizing machine learning models prior to deployment or production. Machine learning algorithms may yield nondeterministic results, especially at the beginning. Testing and refining machine learning models requires significant capacity and computing power, which may disrupt normal enterprise operations. With cloud services, enterprises can test machine learning models freely without taking computing capacity away from their day-to-day operations.
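
One common way to handle run-to-run variation during testing is to repeat the routine with different random seeds and accept the model only when the scores are both high and stable. The Python sketch below assumes an invented data set, a simple threshold model, and arbitrary acceptance limits (min_mean, max_spread); it is meant to show the pattern, not a recommended configuration.

```python
import random
from statistics import mean, pstdev

def train_and_score(seed, data):
    """Split the data at random, fit a threshold on one part, score on the rest."""
    rng = random.Random(seed)  # a fixed seed makes a single run reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 3 // 4
    train, held_out = shuffled[:cut], shuffled[cut:]

    ok = [x for x, label in train if label == 0]
    bad = [x for x, label in train if label == 1]
    threshold = (mean(ok) + mean(bad)) / 2

    hits = sum((x > threshold) == bool(label) for x, label in held_out)
    return hits / len(held_out)

def is_consistent(scores, min_mean=0.9, max_spread=0.05):
    """Accept only if the average score is high and the runs agree closely."""
    return mean(scores) >= min_mean and pstdev(scores) <= max_spread

# Hypothetical readings with overlapping classes, so results vary by split.
data = [(x, 0) for x in range(50, 70, 2)] + [(x, 1) for x in range(64, 84, 2)]

scores = [train_and_score(seed, data) for seed in range(10)]
ready_for_deployment = is_consistent(scores)
```

Because each run is independent, this kind of repeated evaluation is easy to spread across on-demand cloud capacity rather than production hardware.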


Deployment

Finally, the fifth component of the machine learning architecture should enable enterprises to utilize the results of the machine learning process in their own enterprise systems, applications, and data stores. Like the output of other enterprise software or applications, the output of the machine learning process often takes the form of a report. This report can be business intelligence that aids in decision making or information that can be used in other enterprise systems. Some outputs can also take the form of another model that supports other data analytics applications for even better insights.

The fifth component of the machine learning architecture involves software or processes that deploy machine learning experiments from the fourth component into production, that is, into existing enterprise applications and data stores. Such software or processes can be purchased from vendors or developers, or acquired as commercial off-the-shelf (COTS) products.
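
In its simplest form, moving a validated model into an existing application means exporting its parameters and loading them where predictions are needed. The Python sketch below assumes a model that is nothing more than a single threshold, stored as JSON; the export_model and load_model helpers and the payload layout are hypothetical.

```python
import json

def export_model(threshold, version):
    """Serialize the validated model so another system can load it."""
    return json.dumps({"version": version, "threshold": threshold})

def load_model(payload):
    """Rebuild a prediction function inside the consuming application."""
    spec = json.loads(payload)
    threshold = spec["threshold"]

    def predict(reading):
        return "fail" if reading > threshold else "ok"

    return predict

# The experiment environment exports; the enterprise application loads.
payload = export_model(threshold=75.0, version="v1")
predict = load_model(payload)
```

Keeping a version field in the payload lets the consuming application record which model produced a given prediction.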

Step #3: Acquire the Necessary Skills

The third and final step in preparing enterprises for machine learning is acquiring the necessary skills to execute it. Two of the key skills required to execute machine learning are:

Data engineering

Machine learning depends heavily on data. Data engineering skills ensure the quality and integrity of data from acquisition to transformation to execution and deployment. Data engineering also ensures that the movement of data across the machine learning architecture is seamless, and it involves the programming skills needed for operationalizing or deploying machine learning applications. These programming skills typically include languages such as Python, Java, R, and MATLAB.

Data science

Aside from data engineering, data science is also required for the integration and modeling stages of the machine learning process. Data scientists and data stewards need expertise in managing integration architecture and platforms in order to cleanse, transform, process, and deploy data.

Data Integration is at the Heart of Machine Learning

The process, the ideal architecture, and the required skills to prepare for and execute machine learning all revolve around data. This means that the seamless integration of data is a key step to the successful implementation of machine learning.

For many enterprises, implementing machine learning can be a complex and tedious undertaking. But by focusing first on data integration, it will be easier to embrace machine learning and unlock its benefits. OpenText’s ALLOY® platform connects, cleanses, harmonizes, enriches, and secures data coming from various sources so that enterprises can do just that.


OpenText is the leader in Enterprise Information Management (EIM). Our EIM products enable businesses to grow faster, lower operational costs, and reduce information governance and security risks by improving business insight, impact and process speed.
