To fight cyberattacks, enterprises must first be able to detect them. Hackers and cybercriminals go to great lengths to make their attacks unnoticeable. In organizations with thousands upon thousands of connected databases, networks, and systems, and with the Internet of Things (IoT) and Bring-Your-Own-Device (BYOD) models connecting millions of devices, concealing an attack becomes even easier.
Aside from external data security risks such as cyberattacks, enterprises also need to address internal and inherent security risks. Enterprises spend significant amounts of time, money, and resources to implement data security safeguards. However, most of them fail to regularly monitor, audit, and review their data to ensure that no unauthorized access has been made. Furthermore, in many enterprises, not all IT professionals are able to identify where important and critical data are located throughout the organization’s databases and systems. According to a survey of IT practitioners, 57% of respondents do not know where their organizations’ sensitive data lie.
In order to minimize cybersecurity risks and their potential impact, enterprises must address these challenges of detecting possible anomalies in data access, consistently reviewing and auditing their data, and identifying the location of critical and sensitive data.
Big Data and Machine Learning: A Powerful Combination for Data Security
Today’s cybersecurity and data security landscape requires a solution that can handle hundreds to thousands of connected databases, systems, and networks, and millions of devices. The solution needs to be able to handle, gather and analyze Big Data despite its increasing volume, variety, and velocity. The sheer volume and complexity of today’s data make Big Data platforms the ideal solution for cybersecurity and data security.
On top of the Big Data platforms that can handle thousands to millions of connections, enterprises also need a solution that can analyze and make sense of the data gathered. More specifically, they need algorithms that can detect suspicious behavior and anomalies in data access. These algorithms also need to be self-learning, so that they improve as more data is fed into them and as threats continue to evolve. Such self-learning algorithms are often referred to as artificial intelligence or machine learning.
The Case for Big Data Platforms
The scale and velocity at which data moves require a solution that can scale, especially with the size of the anomaly detection programs that enterprises plan to implement. At the same time, the solution needs significant computing power and storage space to run detection applications and machine learning algorithms. Big Data platforms provide the needed storage capacity and processing power to handle terabytes of data coming from millions of devices per second. They also possess the computing power to enable data and predictive analytics to identify anomalies.
But the strongest case for Big Data platforms rests on how they collect, transform, and standardize data so that it conforms to the data formats required by systems, networks, applications, and even other devices for integration, data streaming, and real-time analysis. Big Data platforms streamline data collection to accommodate large volumes of transactions per second, establish schemas for integration, and prepare the enterprise system for data analysis, including analysis for intrusion detection.
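As a rough illustration of what standardizing data into a shared schema can look like, the sketch below maps two differently shaped access records onto one event type. The AccessEvent schema, the two source formats, and all field names are assumptions made for this example; they are not taken from any particular Big Data platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical common schema for a data-access event."""
    user: str
    resource: str
    timestamp: datetime  # always stored as timezone-aware UTC


def from_db_audit(record: dict) -> AccessEvent:
    # Assumed source format: {"usr": ..., "table": ..., "ts": epoch seconds}
    return AccessEvent(
        user=record["usr"].lower(),
        resource=record["table"],
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
    )


def from_app_log(record: dict) -> AccessEvent:
    # Assumed source format: {"username": ..., "path": ..., "time": ISO 8601 with offset}
    return AccessEvent(
        user=record["username"].lower(),
        resource=record["path"],
        timestamp=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
    )


# Two records from different sources end up in the same shape.
events = [
    from_db_audit({"usr": "Alice", "table": "customers", "ts": 0}),
    from_app_log({"username": "BOB", "path": "/reports/q1",
                  "time": "1970-01-01T01:00:00+01:00"}),
]
```

Once every source is mapped to the same fields and the same time zone, downstream analysis can compare events without caring where they came from.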
The Case for Machine Learning
For Chief Information Security Officers (CISOs), whose main responsibilities include securing enterprise data, the increasing sophistication and complexity of cyberattacks remain a source of unease despite their best efforts to protect their organizations from attacks. This is often due to limits in manpower, scale, and other concerns.
Machine learning enables enterprises to regularly monitor threats and analyze logs and access at scale without being limited by human fatigue, significantly improving efficiency and coverage. Moreover, it frees up personnel for other high-value and decision-making tasks.
With the increasing volume of structured and unstructured data, coupled with an increasing number of data sources, machine learning can also help enterprises manage and identify the location of sensitive data. Machine learning can also predict patterns that can help enterprises better understand data access and utilization throughout their organizations.
Monitoring, analyzing, and predicting user behavior are among the most important benefits of machine learning. A large number of attacks against data security can be attributed to internal threats. With a better understanding of user behavior, enterprises have a much better chance of identifying and preventing security threats.
Two Types of Machine Learning Models
Machine learning is dependent on the inputs that are fed into the algorithm or model. Oftentimes, individual events are not valuable in identifying and predicting intrusions. Patterns can only be uncovered from data over a period of time. With Big Data, this translates to large volumes and variety of data being analyzed over time.
To understand how machine learning analyzes these large amounts of data, it helps to better understand the two types of machine learning models: deterministic and probabilistic.
- The deterministic model of machine learning starts by using small samples of data sets, from which baselines are established and suspicious behaviors are identified. From these baselines and anomalies, models are created and built to monitor, review, and analyze future behaviors. These models are called deterministic as they identify and determine suspicious behaviors or anomalies that deviate from the established baseline.
For instance, suppose the logs show that the number of times data is accessed in an enterprise normally falls within a certain range. An access count above the upper end of that range will be identified as anomalous behavior, which will require investigation for a possible intrusion.
- The probabilistic model of machine learning looks for patterns in data that deterministic models may not be able to detect. It clusters similar items, such as events, to predict outcomes and patterns. Oftentimes, clusters that are less populous than usual indicate the presence of malware or an anomaly, as they represent rare events or occurrences grouped together.
- The probabilistic model has three phases: unsupervised phase, semi-supervised phase, and the fully supervised phase.
- The unsupervised phase of the probabilistic model is done without human intervention, which means it creates maps of different clusters unsupervised. The map provides a view of populous, less populous, and barely populated clusters.
- Data analysts or scientists then review the clusters, especially the less populous ones, and determine whether or not they contain potentially malicious events or occurrences. This is the semi-supervised phase of the model.
- After determining which clusters contain potentially malicious events, a scoring system is used to determine the probability of their being malicious or anomalous. This phase is done by an expert such as a data analyst or scientist, making it a fully supervised phase of the model.
Whether a deterministic or probabilistic model of machine learning is used, the objective of detecting intrusions or compromises to data security remains the same.
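The two models described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production detector: the baseline rule (mean plus k standard deviations), the grid bucketing used as a crude stand-in for a real clustering algorithm such as k-means or DBSCAN, the 10% rarity cutoff, and all sample data are invented for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def upper_baseline(sample_counts, k=3.0):
    """Upper end of the normal range: mean + k standard deviations (k is an assumption)."""
    return mean(sample_counts) + k * pstdev(sample_counts)


def deterministic_anomalies(daily_counts, limit):
    """Return the days whose access count exceeds the baseline limit."""
    return [day for day, count in daily_counts.items() if count > limit]


def grid_cluster(events, cell=10):
    """Stand-in for clustering: bucket (hour, megabytes) points into grid cells."""
    return Counter((hour // cell, mb // cell) for hour, mb in events)


def rare_clusters(clusters, max_share=0.1):
    """Clusters holding at most max_share of all events, each with a rarity score in [0, 1]."""
    total = sum(clusters.values())
    return {
        key: 1 - size / total
        for key, size in clusters.items()
        if size / total <= max_share
    }


# Deterministic model: build a baseline from a small sample, then check new days.
baseline_sample = [100, 110, 90, 105, 95]
limit = upper_baseline(baseline_sample)
flagged_days = deterministic_anomalies({"mon": 102, "tue": 480, "wed": 97}, limit)

# Probabilistic model: group events (hour of day, MB read) and surface sparse groups.
events = [(10, 5), (11, 7), (12, 4), (13, 6), (14, 3),
          (15, 8), (16, 2), (17, 5), (12, 6), (14, 4),
          (3, 95)]  # one large read at 3 a.m.
candidates = rare_clusters(grid_cluster(events))
```

Here flagged_days contains only the day that exceeds the baseline, and candidates contains only the sparsely populated cell holding the 3 a.m. read. In the workflow described above, an analyst would still review such candidates before treating them as malicious; the rarity score is only a starting point for that scoring step.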
Data Management and Integration: Two Foundations for Big Data and Machine Learning Implementation
Big Data platforms and machine learning algorithms are only as good as the data fed into them. Quality data is a necessity in implementing intrusion detection programs.
Data management ensures that quality data is supplied to the Big Data platforms that prepare the data for analysis. Data management cleanses, harmonizes, enriches, and secures data coming from all corners of an enterprise and its partners. Data management also enables enterprises to identify the location of sensitive data and provides users with easy but secure access to that data.
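To make the idea of locating sensitive data concrete, the sketch below scans sample values from each column and records which columns appear to hold sensitive data types. It uses simple rule-based pattern matching, not machine learning, and the two patterns, the table layout, and every name in it are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; a real classifier would cover many more data types.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def locate_sensitive(tables):
    """Map 'table.column' to the set of sensitive data types found in its sample values."""
    findings = {}
    for table, columns in tables.items():
        for column, values in columns.items():
            kinds = {
                kind
                for kind, pattern in PATTERNS.items()
                for value in values
                if pattern.search(str(value))
            }
            if kinds:
                findings[f"{table}.{column}"] = kinds
    return findings


# Hypothetical sample values pulled from two systems.
sample = {
    "crm_contacts": {"note": ["call back", "reach me at ann@example.com"], "id": [1, 2]},
    "hr_staff": {"tax_ref": ["123-45-6789"]},
}
inventory = locate_sensitive(sample)
```

The resulting inventory lists only the columns where a pattern matched, which gives a starting map of where sensitive data lives across systems.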
Data integration, on the other hand, is the backbone of data management. It consolidates data coming from disparate databases, systems, networks, applications, and even devices, making it easier for enterprises to manage and analyze their data. Data integration ensures the completeness of data to be inputted into Big Data platforms and machine learning algorithms.
Together, data management and integration can provide a solid foundation for Big Data platforms and machine learning that aim to bolster enterprise data security.
A Trusted Platform for Data Management and Integration
OpenText’s ALLOY™ Platform takes a revolutionary approach to data integration and management. ALLOY is a next generation cloud platform that provides unified integration and data management capabilities as managed services, buffering the complexities of increasing data volume and variety.
OpenText’s ALLOY Platform connects any two application end points such as cloud, mobile, and device; persists data in a big data repository, providing on-demand, self-service access to clean, quality data; provides built-in security and compliance; and is an efficient alternative to DIY integration models such as ESB or iPaaS at a time when connections are growing exponentially. It tackles all patterns of integration such as A2A, B2B/EDI, cloud services, and ETL, while also consolidating, cleansing, and enriching data coming from any number of disparate sources, making it a powerful and unified data integration and data management platform.
With OpenText’s ALLOY Platform, enterprises can establish a strong foundation for data security programs. Contact OpenText to learn more about data security, management, and integration.