Taking an agile approach to data science helps teams deal with the rapidly changing environments, uncertainty, complex solutions, emerging technologies, and ambiguous requirements inherent in these projects. In OpenText Data Science Methodology Part 1, we explored key agile elements that reduce risk and achieve early outcomes in unpredictable and complex scenarios: delivering high-value products early and frequently, and reviewing the results with the customer on a regular basis.
OpenText Data Science Methodology expands on the Agile approach, adding focus to six key areas or phases:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
1. Business understanding
What problem or question do we want to address with data?
The first phase is about understanding the background and business objectives for the project. At this stage, it is important to form an idea of what success might look like and to outline it as criteria for all other phases. Activities will include taking inventory of resources and requirements, assumptions, constraints, risks, contingencies, terminology, costs and benefits. Once the situation has been assessed, a project plan can be produced, the data mining goal defined, and the subject matter experts assembled (gather the right people).
2. Data understanding
What data do we have that could answer our questions?
Every data science project has three must-have ingredients: data + people + technology. This phase focuses on the first ingredient, data. Activities will include collecting the data, describing it, exploring it and verifying its quality. Once the initial data sets have been assessed and understood, the project plan can be updated with data preparation tasks, or even with tasks to look for additional data.
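As an illustration of the "describe it" and "verify its quality" activities, here is a minimal sketch in Python. The records, the field names and the `profile` helper are hypothetical and exist only for this example; a real project would profile its own sources with whatever tooling it has chosen.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"customer_id": "C1", "region": "EMEA", "spend": 120.0},
    {"customer_id": "C2", "region": "AMER", "spend": None},
    {"customer_id": "C3", "region": "", "spend": 75.5},
    {"customer_id": "C3", "region": "EMEA", "spend": 75.5},
]


def profile(rows, key_field):
    """Describe a data set: row count, missing values per field, duplicate keys."""
    fields = sorted({name for row in rows for name in row})
    # A value counts as missing when it is None or an empty string.
    missing = {
        name: sum(1 for row in rows if row.get(name) in (None, ""))
        for name in fields
    }
    # Keys that appear more than once point to possible duplicate records.
    key_counts = Counter(
        row[key_field] for row in rows if row.get(key_field) is not None
    )
    duplicates = sorted(key for key, count in key_counts.items() if count > 1)
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicates}


report = profile(records, key_field="customer_id")
print(report)
```

A report like this can feed straight back into the project plan, for example as a data preparation task to resolve the duplicate key or to source the missing values.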
3. Data preparation
What do we need to do to prepare the data for mining?
This phase consists of getting the most out of the data that you have. Sometimes that means digitizing it, running OCR on it, or pre-processing it in other ways so it can be used in the subsequent modeling phase. Activities will include:
- Selecting the data – deciding on the rationale for including or excluding it
- Cleaning the data – defining the rules to clean it and preparing a report if needed
- Constructing data – deriving new attributes or generating records
- Integrating data – merging data
- Formatting data – reformatting
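The five activities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` and `customers` inputs, the field names, the exclusion rule and the 100-unit threshold for `is_large_order` are all hypothetical.

```python
# Hypothetical raw inputs; all names and values are illustrative only.
orders = [
    {"customer_id": "C1", "amount": " 120.50 ", "date": "2021-03-01"},
    {"customer_id": "C2", "amount": "n/a", "date": "2021-03-02"},
    {"customer_id": "C1", "amount": "30", "date": "2021-03-05"},
]
customers = {"C1": {"region": "EMEA"}, "C2": {"region": "AMER"}}


def clean_amount(raw):
    """Cleaning rule: strip whitespace and parse; unparseable values become None."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def prepare(order_rows, customer_lookup):
    prepared = []
    for order in order_rows:
        amount = clean_amount(order["amount"])  # cleaning
        if amount is None:
            continue  # selecting: exclude rows without a usable amount
        customer = customer_lookup.get(order["customer_id"], {})  # integrating
        prepared.append({
            "customer_id": order["customer_id"],
            "region": customer.get("region", "UNKNOWN"),
            "amount": round(amount, 2),  # formatting
            "is_large_order": amount >= 100,  # constructing a derived attribute
        })
    return prepared


result = prepare(orders, customers)
print(result)
```

Keeping each rule in a small, named step makes it easier to record the rationale for inclusions and exclusions, and to produce a cleaning report if one is needed.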
4. Modeling
How can we mimic or enhance human knowledge or actions through technology?
This phase applies to any type of data science, whether it works with structured data or text-mined data. In conjunction with technology choices, select modeling techniques, design tests, evolve your model, and assess it against the criteria and data mining goals established in the business understanding phase.
5. Evaluation
What new information do we now know?
An agile approach calls for delivering high-value products early and frequently. In this phase, the development team assesses the data mining results against the business success criteria. Essentially, the model is tested to validate its performance against the problem defined in the business understanding phase. A review of the approved models and the process then gives the customer a list of next actions and decisions, so the customer can provide feedback and even begin initial usage to drive business value.
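One way to make this assessment concrete is to compare model metrics with thresholds agreed in the business understanding phase. The sketch below assumes a binary classifier and uses precision and recall; the labels, predictions and threshold values are hypothetical, and the right metrics depend on the business problem.

```python
def evaluate(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def meets_criteria(metrics, criteria):
    """Return a pass/fail verdict for each success criterion."""
    return {name: metrics[name] >= threshold for name, threshold in criteria.items()}


# Hypothetical success criteria from the business understanding phase.
success_criteria = {"precision": 0.70, "recall": 0.60}

# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

metrics = evaluate(y_true, y_pred)
verdict = meets_criteria(metrics, success_criteria)
print(metrics, verdict)
```

A per-criterion verdict like this gives the customer review something specific to discuss: which criteria were met, which were not, and what the next actions should be.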
6. Deployment
What actions should we trigger with the new information? What needs human validation?
With customer approval, this phase enables delivery of initial product value, and activities will include:
- Creating the deployment plan
- Monitoring and maintenance
- Producing the final report and presentation
- Reviewing the project
Cognitive Strategy Workshop
A true differentiating factor of OpenText’s Professional Services in the AI domain is that it houses data scientists and cognitive analysts who are not only experts in their domain (with PhDs and decades of experience) but are also genuinely passionate about helping people understand and tap into the potential of their data.
Before jumping into a project, OpenText offers a Cognitive Strategy Workshop designed for IT executives, business analysts, and data owners. It is a four-day strategy workshop tailored to an organization’s needs. OpenText delivers advice and guidance based on best practices for implementing an AI project using Magellan to realize the most benefit from structured and unstructured data.
Above all else, remember that every data science project has three must-have ingredients: data + people + technology. The art is in bringing all three of these elements together through a data science methodology.
Learn more about the Cognitive Strategy Workshop or our AI & Analytics Services, or reach out to the team.