OpenText™ recently attended the Gartner Data and Analytics Summit in Orlando, where Gartner’s annual Analytics and BI bake-off remains popular. This year, there were actually two bake-offs: one for business intelligence, and one for data science and machine learning. For those of you who aren’t familiar with Gartner’s “bake-off,” it’s a high-stakes competition in which analytics vendors analyze the same complex data set, then report on and visualize their findings. Gartner analyst Cindi Howson says it’s been described “as the World Cup, Grand Prix, and Olympics of analytics and BI … It takes a fair bit of preparation, extreme commitment, and is high stakes for any who participate.”
Both challenges excited us, and since each showcased a different strength of OpenText Magellan™, we decided to take on both.
BI Bake-Off – Topic: “Loneliness and Happiness”
The data for the BI Bake-Off came from two sources. One was a loneliness survey conducted jointly by the Kaiser Family Foundation (KFF) and The Economist. The other was Gartner Iconoculture Consumer Insights, which includes data on consumer values and behaviors.
The questions we needed to answer were:
- Is loneliness a major problem?
- Does technology help the situation or make it worse?
- What are the different factors that most influence loneliness and/or happiness?
We started with a dashboard of demographic information and discovered that the 25-34 age group was the loneliest, that its members were equally likely to be male or female, and that participation in religion played a role among the people surveyed, although this varied a bit by country. We wanted to see those differences more easily, with the data represented side by side, so we turned to Magellan’s Interactive Crosstab feature.
In the crosstab, we used conditional highlighting, which made it easier to see the largest groups. And, just as we expected, while the loneliest age group in the US and UK is 25-34, the same group is one of the least lonely in Japan.
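To make the idea concrete, here is a minimal sketch of a crosstab with conditional highlighting in plain Python. It is not how Magellan implements the feature; the rows are invented for illustration and are not the survey data, and the function names are hypothetical.

```python
from collections import Counter

# Invented (country, age group) rows for lonely respondents; NOT the survey data.
lonely_respondents = [
    ("US", "18-24"), ("US", "25-34"), ("US", "25-34"), ("US", "65+"),
    ("UK", "25-34"), ("UK", "25-34"), ("UK", "35-44"),
    ("Japan", "25-34"), ("Japan", "45-54"), ("Japan", "45-54"),
]

def crosstab(rows):
    """Count rows per (age group, country) cell."""
    return Counter((age, country) for country, age in rows)

def largest_group_per_country(table):
    """Return {country: age group with the highest count in that country}."""
    best = {}
    for (age, country), count in table.items():
        if country not in best or count > table[(best[country], country)]:
            best[country] = age
    return best

table = crosstab(lonely_respondents)
highlight = largest_group_per_country(table)

# Print age groups as rows and countries as columns; "*" marks the
# highlighted (largest) cell in each country column.
countries = sorted({country for _, country in table})
ages = sorted({age for age, _ in table})
print("age".ljust(8) + "".join(c.rjust(8) for c in countries))
for age in ages:
    cells = []
    for country in countries:
        mark = "*" if highlight[country] == age else ""
        cells.append(f"{table[(age, country)]}{mark}".rjust(8))
    print(age.ljust(8) + "".join(cells))
```

Putting countries side by side as columns is what makes a per-country difference in the largest group easy to spot.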
Loneliness starts off rather high in the younger age groups (18-24 and 25-34), likely after graduating from high school and college, and then gradually drops. This trend toward being less lonely continues until retirement age, where loneliness shows a small spike, perhaps due to leaving coworkers behind.
We could guess which factors might influence a person’s loneliness and create individual charts and crosstabs to compare them all, but an easier way is to create a profile, which assigns a Z-score to the data you simply drag in.
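The post does not say how Magellan computes its profile Z-scores. One common approach, assumed here, compares how often a trait appears in a segment with how often it appears among all respondents, using a one-sample proportion test. The function name and the counts below are hypothetical.

```python
import math

def profile_z_score(segment_hits, segment_size, total_hits, total_size):
    """Z-score of a trait's share in a segment versus its share overall.

    Assumed formula: (p_segment - p_overall) / standard error, where the
    standard error is that of a proportion under the overall rate.
    """
    p_segment = segment_hits / segment_size
    p_overall = total_hits / total_size
    std_err = math.sqrt(p_overall * (1 - p_overall) / segment_size)
    return (p_segment - p_overall) / std_err

# Invented counts: 60 of 100 people in the segment have the trait,
# versus 300 of 1000 respondents overall.
z = profile_z_score(60, 100, 300, 1000)
print(round(z, 2))
```

A large positive score means the trait is over-represented in the segment, a large negative score means it is under-represented, and a score near zero means the segment looks like everyone else.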
At a glance, we were able to determine the common characteristics of each group.
The happiest people were:
- Never or rarely lonely
- Very good or good physical health
- Excellent or very good mental health
- Knowing five or more people to talk to
- Spending about an hour or less on social media per day
The loneliest people were:
- Not too happy
- Poor mental health
- Knowing two or fewer people to talk to
- Poor physical health
- Divorced, single, or widowed
- Spending more than two hours on social media per day
So, a word to the wise: have a good group of friends, keep your job until you find a new one, and don’t spend more than two hours a day on social media!
Data Science and Machine Learning Bake-Off
The data for the DSML (data science and machine learning) Bake-Off came from the U.S. Department of Education and combines data on college admissions rates, test scores, average student debt and salaries six and 10 years after graduation.
For this challenge, we needed to determine which demographic characteristics of a school best predict successful students (defined here as students with the highest salary 10 years after graduation).
Much of the work involved loading and joining, cleaning and preparing, and exploring and analyzing the data, all of which was done in the OpenText™ Magellan™ Data Discovery interface. We used correlation analysis to determine which fields were highly correlated with one another and could therefore be safely removed from the analysis.
Once we decided which fields were statistically important, we used logistic regression to determine which factors mattered most in predicting a high salary.
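As a rough illustration of correlation-based pruning, the sketch below keeps a field only if it is not strongly correlated with a field already kept. The 0.9 threshold, the field names, and the values are all assumptions for illustration; the post does not state which cutoff or fields were used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_redundant(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Invented values; NOT the Department of Education data.
columns = {
    "tuition": [10, 20, 30, 40, 50],
    "total_cost": [12, 23, 33, 45, 54],  # moves almost in lockstep with tuition
    "enrollment": [5, 1, 4, 2, 3],
}
kept = drop_redundant(columns)
print(kept)
```

Because the check is greedy, the order of the fields decides which member of a correlated pair survives, so it helps to list the fields you most want to keep first.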
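A minimal sketch of the idea, assuming a binary "high salary" label and standardized inputs so that coefficient sizes are comparable: fit a logistic regression by gradient descent, then rank the fields by the absolute value of their coefficients. The data and field names are invented, and this is not Magellan's implementation.

```python
import math

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Invented toy data; NOT the Department of Education data.
names = ["avg_cost", "unrelated"]
avg_cost = [1, 2, 3, 4, 5, 6, 7, 8]
unrelated = [1, 2, 2, 1, 1, 2, 2, 1]
high_salary = [0, 0, 1, 0, 1, 0, 1, 1]  # 1 = school counted as "high salary"

rows = list(zip(standardize(avg_cost), standardize(unrelated)))
weights, bias = fit_logistic(rows, high_salary)

# Larger absolute coefficient = stronger influence on the predicted odds.
ranking = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
print(ranking)
```

Standardizing first matters: without it, a field measured in dollars and a field measured in headcount would have coefficients on very different scales, and the ranking would mostly reflect units.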
After this analysis, we determined that the major predictors of colleges that lead to a higher salary 10 years after graduation are:
- High average cost of attendance
- School offers higher degrees
- High family income
- High undergraduate student enrollment
Data Science Notebook
The second part of the DSML Bake-Off involved using a data science notebook to build a model. This model uses a linear regression algorithm to predict student salaries 10 years after graduation. The data scientist does all of this work inside OpenText™ Magellan™ Notebook, including training and testing the model, and, once satisfied, pushes a single button to publish it, making it available to non-data scientists and the rest of the Magellan platform.
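The train-and-test workflow can be sketched in a few lines. This example assumes a single input field (family income) and invented numbers; the actual notebook model presumably used more fields, which the post does not list.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented (family income, salary after 10 years) pairs in $1000s;
# NOT the Department of Education data.
data = [(30, 38), (45, 44), (60, 52), (75, 55),
        (90, 66), (105, 70), (120, 79), (135, 83)]
train, test = data[::2], data[1::2]  # hold out every other row for testing

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
score = r_squared([x for x, _ in test], [y for _, y in test], slope, intercept)
print(slope, intercept, score)
```

Scoring on rows the model never saw during training is what tells the data scientist whether the model is ready to publish.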
Once published, the model is available within the Data Discovery interface. Users select a model, drag data into the parameter area and execute the model. This data is passed into the model’s pipeline and is returned along with the appropriate prediction.
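Conceptually, the publish-and-run flow looks something like the sketch below. Everything here is hypothetical: the registry, the function names, the model name, and the coefficients are illustrative stand-ins, not the Magellan API.

```python
# Hypothetical in-memory registry standing in for the platform's model store.
published_models = {}

def publish(name, predict_fn, inputs):
    """Register a prediction function and the input fields it expects."""
    published_models[name] = (predict_fn, inputs)

def run_model(name, rows):
    """Pass each row through the model and return it with a prediction added."""
    predict_fn, inputs = published_models[name]
    return [
        {**row, "prediction": predict_fn(*(row[field] for field in inputs))}
        for row in rows
    ]

# Illustrative model with made-up coefficients (values in $1000s).
publish("salary_10yr", lambda income: 20 + 0.5 * income, ["family_income"])

scored = run_model("salary_10yr", [{"school": "A", "family_income": 50}])
print(scored)
```

The caller only supplies rows and a model name, which mirrors how a non-data scientist can use a published model without seeing how it was trained.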
Using OpenText Magellan, we were able to quickly and easily load and explore large amounts of data, turning that data into valuable insights. Watch the Data Science Machine Learning video to see OpenText Magellan Data Discovery in use.
Then, watch the Business Intelligence video to see OpenText Magellan BI and Reporting in use.
Learn more about how easy Magellan makes data analysis, visualization, and modeling here.
Are you ready for the Intelligent and Connected Enterprise? Join us at OpenText Enterprise World 2019 to hear how we’re enabling the Intelligent and Connected Enterprise with AI and the Internet of Things.