How to Identify the Return on Investment of Big Data (Infographic)


If you are a CIO, you know what goes on at a board of directors meeting. This is not the place to be confused when your CEO asks you a simple, commonly asked question, which requires a simple, accurate answer: “What is the ROI of Big Data costs?” Let’s be honest, Big Data is a big, painful issue. It is often recommended to use just a little information in your dashboard if you want to be heard. But at the same time, you want this information to be accurate, right? That’s when you are happy that your data is big. Why is data size so important? Data scientists, for instance, are always asking for big data because they know how predictive analysis can easily be inaccurate if there isn’t enough data on which to base your predictions. We all know this when it comes to the weather forecast – it is the same when it comes to risk anticipation or sales opportunities identification. What’s really new is how easily any software can access big data. If 90% of the world’s data today has been created in the last 2 years alone, what can we expect in the near future?  For starters, the Internet of Things is anticipated to hugely increase the volumes of data businesses will have to cope with. ROI is all about results. Big data is here to stay and bigger data is coming, so the best we can do is to make it worth the trouble. “Cloud is the new electricity,” according to the latest IT Industry Outlook published by CompTIA. But I don’t have good news for you, if you feel comfortable just planning to move your data to the cloud. This is just the beginning. Experts often say that big data doesn’t have much value when simply stored; your spending on big data projects should be driven by business goals. So it’s not a surprise that there is increased interest in gaining insights from big data and making data-based decisions. In fact, advanced analytics and self-service reporting is what you should be planning for your big data. 
I’ll briefly tell you why:

You need to support democratization of data integration and data preparation in the cloud.
You should enable software for self-service advanced and predictive analytics.
Big data insights and reporting should be put where they’re needed.

Why support democratization of data integration and data preparation in the cloud

Big data analytics, recruiting talent, and user experience top the CIO agenda for 2016, according to The Wall Street Journal, but these gaps are unlikely to be closed in time because of the shortage of data-savvy people. In fact, analysts anticipate a shortage of more than 100,000 people with analytic skills through 2020. So while CIOs look for solutions to their own talent gaps, new software and cloud services are appearing on the market to help business users get business insights and advance the ROI of big data. Hopefully, someday, a data scientist will be on hand to provide those insights; in the meantime, platforms like OpenText™ Big Data Analytics include easy-to-use, drag-and-drop features to load and integrate different data sources from the front end or the back end, in the cloud.

I say “hopefully” because the requirements for data scientists are no longer the same. Knowledge of coding is often not required. According to Robert J. Lake, what he requires from data scientists at Cisco is that they know how to make data-driven decisions, which is why he lets them work with any self-service analytics tool that helps them reach that goal. Data scientists spend around 80% of their time preparing data rather than actually getting insights from it, so interest in self-service data preparation is growing. Leaving the data cleansing to data scientists may suit some of their colleagues, but it is not a good idea in terms of agility and accuracy.
That’s why cloud solutions like Salesforce are appreciated: they leave salespeople time to collaborate by adding, editing, or removing information that gives a more precise view of their prospects, one that only they are able to identify with such precision. What if you could expect the same from a Supply Chain Management or Electronic Health Record system, where data audits depend on multiple worldwide data sources with distinct processes, and with no dependency on data experts at all? In fact, 95% of organizations want end users to be able to manage and prepare their own data, according to noted market analyst Howard Dresner. Analysts predict that the next big market disruption is self-service data preparation, so expect to hear more about it in the near future.

Why you should enable self-service advanced and predictive analytics

Very small businesses may once have found desktop tools like Excel good enough for their data analysis, but after digital disruption these tools have become inadequate even for small firms. The need for powerful analytic tools is even greater for larger companies in data-intensive industries such as telecommunications, healthcare, or government. The columnar database has been proposed as the solution, as it is much speedier than traditional row-oriented relational databases when running analytical queries over hundreds of millions or billions of rows. The speed of a cloud service depends on the volume of data as well as on the hardware itself. Measuring the speed of this emerging technology is not easy, but an entire NoSQL movement now argues that relational databases are not the best option for the future.

For years, companies have been able to identify the ROI of big data by using predictive analytics to anticipate risk or forecast opportunities. For example, banks, mortgage lenders, and credit card companies use credit scoring to predict customers’ profitability.
They have been doing this even though complex algorithms require data scientists, a hard-to-find kind of expertise, not just to build them but to keep them running. That limits their spread through an organization. That’s why OpenText™ Big Data Analytics in the Cloud includes ad-hoc and pre-built algorithms like:

Profile: If you are able to visualize a Profile of a specific segment of your citizens, customers, or patients and then personalize a campaign based on the differentiating values of this segment, why would the ROI of the campaign not be attributed to the big data you previously stored?
Forecasting: If the cloud application is able to identify cross-selling opportunities and a series of campaigns is launched, the ROI of those campaigns could be attributed to the big data that you previously secured.
Decision Tree: You should be able to measure the ROI of a new process based on customer risk identification during the next fiscal year and attribute it to big data that you previously stored in the cloud.
Association Rules: You can report the ROI of a new recruitment requirement based on an analysis of job abandonment information and attribute it to big data that you had previously enabled as a self-service solution.

The greater the number of stars shown on the Forecast screenshot above, the stronger the evidence for non-randomness. This is when you are grateful for having so much information and having it so clean!

Customer analytics for sales and marketing provide some of the classic use cases. Looking at the patterns in terabytes of information on past transactions can help organizations identify the reasons behind customer churn, choose the ideal next offer to make to a prospect, detect fraud, or target existing customers for cross-selling and up-selling.
Put Big Data insights and reporting where they’re needed

Embedded visualizations and self-service reporting are key to bringing the benefits of data-driven decisions to more departments, because they don’t require expert intervention. Instead, non-technical users can spontaneously “crunch the numbers” on business issues as they come up. Today 74% of marketers can’t measure and report on the contribution of their programs to the business, according to VisionEdge Marketing.

Imagine that you as a CIO have adopted a very strong advanced analytics platform, but the insights are not reaching the right people, who in the case of a hospital are the doctor or the patient. Let’s say the profile of the patient and their drug consumption is available on someone’s computer, but that insight is not reachable by any user who can make a difference when a new action is required. In that case the hospital’s results will never be affected by big data, and the ROI potential will not be achieved, because the people who need the insights are not getting them; the hospital will not change with or without big data. This is called invisible analytics.

Consider route optimization in a supply chain, the classic “traveling salesman problem.” When a sizable chunk of your workforce spends its day driving from location to location (sales force, delivery trucks, maintenance workers), you want to minimize the time, miles, gas, and vehicle wear and tear, while making sure urgent calls are given priority. Moreover, you want to be able to change routes on the fly and let your remote employees make updates in real time, rather than forcing them to wait for a dispatcher’s call. Real-time analytics and reporting should be able to put those insights literally in their hands, via tablets, phones, or smart watches, giving them the power to anticipate or adjust their routes.
OpenText™ Information Hub offers these capabilities as a powerful ad-hoc and built-in reporting tool that enables any user to personalize how their company wants information and data visualizations to be displayed. You should always make sure the tool you select has the security and scalability you need, because in such cases you will be dealing not only with billions of rows, but also perhaps with millions of end users.

As mentioned at the start of this blog, user experience is also at the top of the CIO’s agenda. True personalization that ensures the best user experience requires technology that can be fully branded and customized. The goal should be to adapt data visualizations to the same look and feel as the application to provide a seamless user experience.

UPS gathers information at every possible moment and stores over 16 petabytes of data. It makes more than 16 million shipments to over 8.8 million customers globally, receives on average 39.5 million tracking requests from customers per day, and employs 399,000 people in 220 countries and territories. It spends $1 billion a year on big data, but its revenue in 2012 was $54.1 billion.

Identification of the ROI of big data depends on the democratization of the business insights coming from advanced and predictive analytics of that information. Nobody said it is simple, but it can lower operating costs and boost profits, which every business user identifies as ROI. Moreover, when line-of-business users rather than technology users are driving the analysis, and the right people are getting the right insight when they need it, improved future actions should feed the wheel of big data with the bigger data that is coming. And surely you want it to come to the right environment, right?
Download the Internet of Things and Business Intelligence by Dresner

The Internet of Things and Business Intelligence from Dresner Advisory Services is a 70-page research report that provides a wealth of information and analysis, offering value to consumers and producers of business intelligence technology and services. The business intelligence vendor ratings include scores for location intelligence, end-user data preparation, cloud BI, and advanced and predictive analytics, all key capabilities for business intelligence in an IoT context. Download here.


Big Data: The Key is Bridging Disparate Data Sources


People say Big Data is the difference between driving blind in your business and having a full 360-degree view of your surroundings. But, adopting big data is not only about collecting data. You don’t get a Big Data club card just for changing your old (but still trustworthy) data warehouse into a data lake (or even worse, a data swamp). Big Data is not only about sheer volume of data. It’s not about making a muscular demonstration of how many petabytes you stored. To make a Big Data initiative succeed, the trick is to handle widely varied types of data, disparate sources, datasets that aren’t easily linkable, dirty data, and unstructured or semi-structured data. At least 40% of the C-level and high-ranking executives surveyed in the most recent NewVantage Partners’ Big Data Analytics Survey agree. Only 14.5% are worried about the volume of the data they’re trying to handle. One OpenText prospect’s Big Data struggle is a perfect example of why the key challenge is not data size but complexity. Recently, OpenText™ Analytics got an inquiry from an airline that needed better insights in order to head off customer losses. This low-cost airline had made a discovery about its loyal customers. Some of them, without explanation, would stop booking flights. These were customers that used to fly with them every month or even every week, but were now disappearing unexpectedly. The airline’s CIO asked why this was happening. The IT department struggled to push SQL queries against different systems and databases, exploring common scenarios for why customers leave. They examined: The booking application, looking for lost customers (or “churners”). Who has purchased flights in previous months but not the most recent month? Which were their last booked flights? The customer service ticketing system to find if any of the “churners” found in the booking system had a recent claim. Were any of those claims solved? Closed by the customer? Was there any hint of customer dissatisfaction? 
What are the most commonly used terms in their communications with the airline: prices, customer support, seats, delays? And what was the tone or sentiment around such terms? Were they calm or angry? Merely irked, or furious and threatening to boycott the airline?

The database of flight delays, looking for information about the churners’ last bookings. Were there any delays? How long? Were any of these delayed flights canceled?

Identifying the segments of customers who left the company during the last month, whether because of unresolved claims or too many delayed or canceled flights, would be the first step towards winning them back. So at that point, the airline’s IT department’s most important job was to answer the CIO’s question: May I have this list of customers?

The IT staff needed more than a month to get answers to these questions, because the three applications and their databases didn’t share information effectively. First they had to move long lists of customer IDs, booking codes, and flight numbers from one system to another. Then they had to repeat the process when the results weren’t useful. It was a nightmare crafted of dispersed data, complex SQL queries, transformation processes, and a great deal of effort, and it delivered answers too late for the decision-maker. A new month came, with more lost customers.

That’s when the airline realized it needed a more powerful, flexible analytics solution that could effortlessly draw from all its various data sources. Intrigued by the possibilities of OpenText Analytics, it asked us to demonstrate how we could solve its problems. Using Big Data Analytics, we blended the three disparate data sources. In just 24 hours we were able to answer the questions; OpenText™ Big Data Analytics had worked its magic.

The true value of Big Data is getting answers out of data coming from several diverse sources and different departments. This is the pure 360-degree view of business that everyone is talking about.
But without an agile and flexible way to get that view, value is lost in delay. Analytical repositories that use columnar technologies – i.e., what OpenText Analytics solutions are built on – are there to help answer questions fast when a decision-maker needs answers to business challenges.
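To make the airline example concrete, here is a minimal Python sketch of what "blending" the three sources might look like once their records sit side by side. It is only an illustration of the joins described above, not how OpenText Big Data Analytics works internally. Every field name, status value, and sample record is a hypothetical stand-in, and "churner" is assumed to mean a customer with bookings in earlier months but none in the most recent one.

```python
from datetime import date

# Hypothetical sample records standing in for the three systems in the story.
bookings = [
    {"customer_id": "C1", "flight": "F100", "booked_on": date(2016, 3, 4)},
    {"customer_id": "C1", "flight": "F101", "booked_on": date(2016, 4, 11)},
    {"customer_id": "C2", "flight": "F200", "booked_on": date(2016, 3, 20)},
    {"customer_id": "C3", "flight": "F300", "booked_on": date(2016, 2, 9)},
    {"customer_id": "C3", "flight": "F301", "booked_on": date(2016, 3, 15)},
]
tickets = [
    {"customer_id": "C2", "status": "open"},
    {"customer_id": "C3", "status": "solved"},
]
delays = [
    {"flight": "F301", "delay_minutes": 95, "canceled": False},
]


def find_churners(bookings, latest_month):
    """Return customers who booked before latest_month but not during it.

    latest_month is a (year, month) tuple, e.g. (2016, 4).
    """
    earlier, recent = set(), set()
    for booking in bookings:
        month = (booking["booked_on"].year, booking["booked_on"].month)
        if month == latest_month:
            recent.add(booking["customer_id"])
        elif month < latest_month:
            earlier.add(booking["customer_id"])
    return earlier - recent


def churn_report(bookings, tickets, delays, latest_month):
    """Join the three sources on customer ID and flight number."""
    # Assumption: any ticket whose status is not "solved" counts as unresolved.
    unresolved = {t["customer_id"] for t in tickets if t["status"] != "solved"}
    disrupted = {
        d["flight"] for d in delays if d["delay_minutes"] > 0 or d["canceled"]
    }
    report = []
    for customer_id in sorted(find_churners(bookings, latest_month)):
        own = [b for b in bookings if b["customer_id"] == customer_id]
        last_flight = max(own, key=lambda b: b["booked_on"])["flight"]
        report.append({
            "customer_id": customer_id,
            "last_flight": last_flight,
            "unresolved_claim": customer_id in unresolved,
            "last_flight_disrupted": last_flight in disrupted,
        })
    return report


report = churn_report(bookings, tickets, delays, latest_month=(2016, 4))
for row in report:
    print(row)
```

The point of the sketch is that once the sources share keys (customer ID, flight number), the CIO's list becomes a single pass over the data rather than a month of moving ID lists between systems.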


Analytics at 16 — Better Insight Into Your Business


It’s been 16 years since the dawn of Business Intelligence version 2.0. Back then, the average business relied on a healthy diet of spreadsheets with pivot tables to solve complex problems such as resource planning, sales forecasting and risk reporting. Larger organizations lucky enough to be cash-rich employed data scientists armed with enterprise-grade BI tools to give them better business insights. Neither method was perfect. Data remained in silos, accessible in days, not minutes. Methodologies such as Online Analytical Processing (OLAP), extract-transform-load (ETL), and data warehousing were helpful in computing and storing this data, but limitations on functionality and accessibility remained. Fast forward to today. Our drive to digitize analytics and provide a scalable platform creates opportunities for businesses to use and access any data source, from the simplest flat files, to the most complex databases, and online data. Advanced analytics tools now come as standard with connectors for multiple disparate data sources and a remote data provider option for loading data from a web address. These improvements in business analytics capabilities provide industry analysts with a rosy outlook for the BI and analytics market. One is forecasting global revenue in the sector to reach $16.9 billion this year, an increase of 5.2 percent from 2015. A Better Way to Work While business leaders are clamouring for more modern analytics tools, what do key stakeholders — marketers and the business analysts who support them, end-users, and of course IT and their development teams — really want in terms of outcomes? Simple: businesses want their analytics easy-to-use, fast, and agile. Leading technology analysts have commented that the shift to the modern BI and analytics platform has now reached a tipping point, and that transitioning to a modern BI platform provides the opportunity to create business value from deeper insights into diverse data sources. 
Over the last few years, OpenText has established its Analytics software to serve in the context of the application (or device, or workflow) to deliver personalized information, drive user adoption, and delight customers. Our recent Release 16 of the Analytics Suite is helping to enable our vision of using analytics as “A Better Way to Work.” The Analytics Suite features common, shared services between the two main products — OpenText™ Information Hub (iHub) and OpenText™ Big Data Analytics (BDA) — such as single sign-on, single security model, common access, and shared data. Additionally, iHub accesses BDA’s engine and analysis results to visualize them. The solution includes broadly functional APIs based on REST and JavaScript for embedding. Both iHub and BDA are available deployed either on-premises or as a managed service to serve business and technical users. Understand and Engage This focus drives our approach. At a high level, we enable two key use cases (as illustrated below). First, advanced analytics harnesses the power of your data to help you better understand your market or your customers (or factory or network in an Internet of Things scenario). Second, you engage with these users and decision makers with data-driven visual information like charts, reports, and dashboards—on the app or device of their choice. Whether you are looking to build smarter customer applications or harness the power of your big data to beat the competition to market, analytics is the bridge between your digital strategy and business insights that drive smarter decisions—for every user across the enterprise. Check out the Analytics Suite today and gain deeper insight into your business.


Achieve Deeper Supply Chain Intelligence with Trading Grid Analytics


In an earlier blog I discussed how analytics could be applied across supply chain processes to help businesses make more informed decisions relating to their trading partner communities. Big Data analytics has been used across supply chain operations for a few years; however, the real power of analytics can only be realized if it is actually applied across the transactions flowing between trading partners. Embedding analytics in transaction flows allows companies to get a more accurate ‘pulse’ of what is going on across supply chain operations.

In this blog, I would like to introduce a new offering as part of our Release 16 launch, OpenText™ Trading Grid Analytics. The OpenText™ Business Network processes over 16 billion EDI-related transactions per year, and this provides a rich seam of information to mine for improved supply chain intelligence. Last year, OpenText expanded its portfolio of Enterprise Information Management solutions with the acquisition of an industry-leading embedded analytics company. The analytics solution that OpenText acquired is being embedded within a number of cloud-based SaaS offerings that are connected to OpenText’s Business Network.

Trading Grid Analytics provides the ability to mine transaction flows for both operational and business-specific metrics. I explained the difference between operational and business metrics in my previous blog, but just to recap here briefly:

Operational metrics can be defined as: delivering transactional data intelligence and volume trends needed to improve operational efficiencies and drive company profitability.
Business metrics can be defined as: delivering the business process visibility required to make better decisions faster, spot and pursue market opportunities, mitigate risk, and gain business agility.
Trading Grid Analytics will initially offer a total of nine out-of-the-box metrics (covering EDIFACT and ANSI X12 based transactions), made up of two operational and seven business metrics, all of which are displayed in a series of highly graphical reporting dashboards.

Operational Metrics

Volume by Document Type: Number and type of documents sent and received over a period of time (days, months, years)
Volume by Trading Partners: Number and type of documents sent and received, ordered by top 10 and bottom 10 partners

Business Metrics

ASN Timeliness: Number of timely ASN creation instances as a percentage of total ASNs for a time period
Price Variance: The actual invoiced cost of a purchased item, compared to the price at the time of order
Invoice Accuracy: Measures whether invoices accurately reflect orders placed in terms of product, quantities, and price by supplier, during a specified period of time
Quantity Variance: The remaining quantity to be invoiced from a purchase order, equalling the difference between the quantity delivered and the quantity invoiced for goods received
Order Acceptance: Fully acknowledged POs as a percentage of total number of POs within a given period of time
Top Partners by Spend: Top trading partners by the economic spend over a period of time
Top Products by Spend: Top products by economic spend over time

Supply chain leaders and procurement professionals need an accurate picture of what is going on across their trading partner communities so that they can, for example, identify leading trading partners and have information available to support the negotiation of new supply contracts.
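Several of the business metrics listed above are simple ratios or differences, so their definitions can be written down directly. The following Python sketch follows the wording of those definitions; it is an illustration only, not Trading Grid Analytics code. The record fields (`timely`, `fully_acknowledged`) are hypothetical, and how an ASN is judged "timely" is left as an input because the definition above does not specify it.

```python
def percentage(part, total):
    """Share of part in total as a percentage; 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def asn_timeliness(asns):
    """Timely ASNs as a percentage of all ASNs in the period.

    Each ASN is a dict with a boolean "timely" flag (hypothetical field).
    """
    timely = sum(1 for asn in asns if asn["timely"])
    return percentage(timely, len(asns))


def order_acceptance(purchase_orders):
    """Fully acknowledged POs as a percentage of all POs in the period."""
    accepted = sum(1 for po in purchase_orders if po["fully_acknowledged"])
    return percentage(accepted, len(purchase_orders))


def price_variance(price_at_order, invoiced_price):
    """Invoiced cost of an item minus its price at the time of order."""
    return invoiced_price - price_at_order


def quantity_variance(quantity_delivered, quantity_invoiced):
    """Quantity delivered minus quantity invoiced, i.e. still to be invoiced."""
    return quantity_delivered - quantity_invoiced


# Example with made-up figures: three of four ASNs arrived on time.
sample_asns = [{"timely": True}, {"timely": True}, {"timely": True}, {"timely": False}]
print(asn_timeliness(sample_asns))
```

A positive price variance here means the supplier invoiced more than the agreed price; the sign convention is a choice made for this sketch.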
Trading Grid Analytics is a cloud-based analytics platform that offers:

Better Productivity: Allows any transaction-related issues to be identified and resolved more quickly
Better Insight: Deeper insights into transactional and supply chain information, driving more informed decisions
Better Control: Improved visibility into exceptions and underperforming partners allows corrective action to be taken earlier in a business process
Better Engagement: Collaborate more closely with top partners and mitigate risk with underperforming partners
Better Innovation: Cloud-based reporting portal provides access at any time, from anywhere

More information about Trading Grid Analytics is available here.


Unstructured Data Analytics: Replacing ‘I Think’ With ‘We Know’


Anyone who reads our blogs is no doubt familiar with structured data—data that is neatly settled in a database. Row and column headers tell it where to go, the structure opens it to queries, graphic interfaces make it easy to visualize.  You’ve seen the resulting tables of numbers and/or words everywhere from business to government and scientific research. The problem is all the unstructured data, which some research firms estimate could make up between 40 and 80 percent of all data.  This includes emails, voicemails, written documents, PowerPoint presentations, social media feeds, surveys, legal depositions, web pages, video, medical imaging, and other types of content. Unstructured Data, Tell Me Something Unstructured data doesn’t display its underlying patterns easily. Until recently, the only way to get a sense of a big stack of reports or open-ended survey responses was to read through them and hope your intuition picked up on common themes; you couldn’t simply query it. But over the past few years, advances in analytics and content management software have given us more power to interrogate unstructured content. Now OpenText is bringing together powerful processing capacities from across its product lines to create a solution for unstructured data analytics that can give organizations a level of insight into their operations that they might not have imagined before. Replacing Intuition with Analytics The OpenText solution for unstructured data analytics has potential uses in nearly every department or industry. Wherever people are looking intuitively for patterns and trends in unstructured content, our solution can dramatically speed up and scale out their reach.  It can help replace “I feel like we’re seeing a pattern here…” with “The analytics tell us customers love new feature A but they’re finding new feature B really confusing; they wonder why we don’t offer potential feature C.”  Feel more confident in your judgment when the analytics back you up. 
The Technology Under the Hood This solution draws on OpenText’s deep experience in natural language processing and data visualization.  It’s scalable to handle terabytes of data and millions of users and devices. Open APIs, including JavaScript API (JSAPI) and REST, promote smooth integration with enterprise applications.  And it offers built-in integration with other OpenText solutions for content management, e-discovery, visualization, archiving, and more. Here’s how it works: OpenText accesses and harvests data from any unstructured source, including written documents, spreadsheets, social media, email, PDFs, RSS feeds, CRM applications, and blogs. OpenText InfoFusion retrieves and processes raw data; extracts people, places, and topics; and then determines the overall sentiment. Visual summaries of the processed information are designed, developed, and deployed on OpenText Information Hub (iHub). Visuals are seamlessly embedded into the app using iHub’s JavaScript API. Users enjoy interactive analytic visualizations that allow them to reveal interesting facts and gain unique insights from the unstructured data sources. Below are two common use cases we see for the OpenText solution for unstructured data analytics, but more come up every day, from retail and manufacturing to government and non profits.  If you think of further ways to use it, let us know in the comments below. Use Case 1: On-Demand Web Chat A bank we know told us recently how its customer service team over the past year or two had been making significantly more use of text-based customer support tools—in particular pop-up web chat. This meant the customer service managers were now collecting significantly more “free text” on a wide range of customer support issues including new product inquiries, complaints, and requests for assistance. Reading through millions of lines of text was proving highly time-consuming, but ignoring them was not an option. 
The bank’s customer service team understood that having the ability to analyze this data would help them spot and understand trends (say, interest in mortgage refinancing) or frequent issues (such as display problems with a mobile interface). Identifying gaps in offerings, common problems, or complaints regarding particular products could help them improve their overall customer experience and stay competitive. Use Case 2: Analysis of Complaints Data Another source of unstructured data is the notes customer service reps take while on the phone with customers. Many CRM systems offer users the ability to type in open-ended comments as an addition to the radio buttons, checklists, and other data structuring features for recording complaints, but they don’t offer built-in functionality to analyze this free-form text.  A number of banking representatives told us they considered this a major gap in their current analytics capabilities. Typically, a bank’s CRM system will offer a “pick list” of already identified problems or topics that customer service reps can choose from, but such lists don’t always provide the level of insight a company needs about what’s making its customers unhappy.  Much of the detail was captured in unstructured free-text fields that they had no easy way to analyze.  If they could quickly identify recurring themes, the banks felt they could be more proactive about addressing problems. Moreover, the banks wanted to analyze the overall emotional tone, or sentiment, of these customer case records and other free-form content sources, such as social media streams. Stand-alone tools for sentiment analysis do exist, but they are generally quite limited in scope or difficult to customize.  
They wanted a tool that would easily integrate with their existing CRM system and combine its sentiment analysis with other, internally focused analytics and reporting functions—for example, to track changing consumer sentiment over time against sales or customer-service call volume. A Huge, Beautiful Use Case: Election Tracker ‘16 These are just two of the many use cases for the OpenText solution for unstructured data analytics; we’ll discuss more in future blog posts. You may already be familiar with the first application powered by the solution: the Election Tracker for the 2016 presidential race. The tracker, along with the interesting insights it sifts from thousands of articles about the campaign, has been winning headlines of its own. Expect to hear more about the Election Tracker ’16 as the campaign continues. Meanwhile, if you have ideas on other ways to use our Unstructured Data Analytics solution in your organization, leave them in the comments section.
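The two banking use cases above come down to two questions about free text: which themes recur, and what is the emotional tone? As a toy illustration only, the Python sketch below counts frequent terms and scores sentiment with a tiny word list. It is not how OpenText InfoFusion works; real natural language processing is far more involved, and the stopword list, sentiment lexicon, and sample notes are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
STOPWORDS = {"a", "about", "and", "cannot", "i", "is", "it", "my", "see", "so", "the"}
POSITIVE = {"love", "great", "helpful", "easy"}
NEGATIVE = {"angry", "confusing", "broken", "slow", "unhappy"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(notes, n=3):
    """Most frequent non-stopword terms across all notes."""
    counts = Counter(
        word for note in notes for word in tokenize(note) if word not in STOPWORDS
    )
    return counts.most_common(n)


def sentiment_score(note):
    """Positive-word count minus negative-word count (a crude lexicon score)."""
    words = tokenize(note)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Made-up customer service notes.
notes = [
    "Question about mortgage refinancing rates",
    "Mortgage refinancing paperwork is confusing",
    "Mobile display is broken, cannot see my mortgage balance",
]
print(top_terms(notes, 2))
print([sentiment_score(note) for note in notes])
```

Even this crude version surfaces "mortgage" as the recurring theme and flags two of the three notes as negative, which is the kind of signal the banks said their pick lists could not give them.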


The ‘It’ Role in IT: Chief Data Officer


A recent analyst report predicts that the majority of large organizations (90%) will have a Chief Data Officer by 2019. This trend is driven by the competitive need to improve efficiency through better use of information assets. To discuss this evolving role and the challenges of the CDO, Enterprise Management 360 Assistant Editor Sylvia Entwistle spoke to Allen Bonde, a long-time industry watcher with a focus on big data and digital disruption. Enterprise Management 360: Why are more and more businesses investing in the role of Chief Data Officer now? Bonde: No doubt the Chief Data Officer is kind of the new ‘It’ role in the executive suite, there’s a lot of buzz around this role. Interestingly enough, it spans operations, technology, even marketing, so we see this role in different areas of organizations. I think companies are looking for one leader to be a data steward for the organization, but also they’re looking for that same role to be a bit more focused on the future – to be the driver of innovation driven by data. So, you could say this role is growing because it’s at the crossroads of digital disruption and technology, as well as the corporate culture and leadership needed to manage in this new environment. It’s a role that I think could become the new CIO in certain really data-centric businesses. It also could become the new Chief Digital Officer in some cases. If you dig a little bit into responsibilities, it’s really about overseeing data from legacy systems and all of your third-party partners as well as managing data compliance and financial reporting. So it’s a role that has both an operational component as well as a visionary, strategic component. This is where maybe it departs from that purely technical role where there’s almost a left brain, right brain component of the CDO, empowered by technology, but ultimately it’s about people using technology in the right way, in a productive way to move the business forward. 
Enterprise Management 360: What trends do you think will have the biggest impact on the Chief Data Officer in 2016? Bonde: In terms of the drivers, it’s certainly about the growth of digital devices and data, especially coming from new sources. So there’s a lot of focus on device data with IoT or omni-channel data coming from the different touch points in the customer journey. There are these new sources of data and even if we don’t own them per se, they’re impacting our business, so I think that’s a big driver. Then there’s this notion of how new devices are shifting consumption models. So if you think about just a simple smartphone, this device is both creating and consuming data, changing the dynamic of how individuals are creating and interacting with their consuming data. There’s a whole back-drop to this from a regulatory perspective with privacy and other risk factors. Those are certainly drivers that are motivating companies to say “we need an individual or an office to take the lead in balancing the opportunity of data with the risk of data and making sure that we don’t, number one: get in trouble as a business but number two: we take advantage of what opportunity is in front of us.” Enterprise Management 360: Honing in on the data side of things – what are the most important levers for ensuring a return on data management and how do you think those returns should be measured? Bonde: A common thread across wherever the CDO is in the organization is going to be focused on outcomes and yes, it’s about technology; yes, it’s about consumer adoption. In terms of the so-called CDO agenda, I think that outcomes need to be pretty crisp; for example, we need to lower our risk profile or we need to improve our margins for a certain line of business or we’re all about bringing a product to market and so I think focusing on outcomes and getting alignment with those outcomes is the first most important lever that you have as a CDO. 
The second one I would argue is adoption and you can’t do anything at scale with data if you don’t have widespread adoption. The key to unlocking the value of data as an asset is driving the broadest adoption, so that most people in an organization including your customers and your partners get value out of the insights that you’re collecting. Ultimately this is about focusing on delivering insight into the everyday work that the organization is doing, which is very different to classic business intelligence or even big data, which used to be the domain of a relatively small number of people within the organization, who were responsible for parceling out the findings as they saw fit. I think the CDO is breaking down those walls, but this also means the CDO is facing a much bigger challenge than just simply getting BI tools in the hands of a few business analysts. Enterprise Management 360: A lot of people are seeing big data as a problem. Where would you draw the line between the hype and the fact of big data? Bonde: It’s a funny topic to me, having started earlier in my career as a data scientist working with what then was very large data sets and seeing if we could manage risk. This was in a telco, and we didn’t call it big data then, but we had the challenge of lots of data from different sources and trying to pull the meaning out, so that we could make smarter decisions about who might be a fraudulent customer or which markets we could go after. When you talk to CDO’s about big data I think that the role has benefited from hype around big data, because big data set the table for the need for a CDO. Yet CDO’s aren’t necessarily just focused on big data and in fact, one CDO we were talking to expressed, “we have more data than we know what to do with.” So they acknowledged the fact that they already have lots of that data, but their main struggle was in understanding it. 
It wasn’t necessarily a big data problem, it was a problem of finding the right data, and we see this day-to-day when we work with different clients; you can do an awful lot in terms of understanding by blending different data sources, not necessarily big data sources, but just different types of data and you can create insight from relatively small data sets if it’s packaged and delivered in the right way. IoT is a perfect example: people are getting excited about the challenges that managing data in the era of IoT will present, but it’s not really a big data problem, it’s a multi-source data problem. So I think the hype of big data has been useful for the market to get aligned with the idea of the importance of data, certainly the value of it when it’s collected, blended, cleaned and turned into actual insights. But in a way the big data problem has shifted to more a question of, “how do we get results quickly?” Fast is the new ‘big’ when it comes to data. I think that getting results quickly transcends the ultimate result. If we can get good, useful insights out quickly, that’s better than the perfect model or the perfect result. We hear from CDOs that they want to work in small steps, they want to fail fast. They want to run experiments that show the value of applying data in the frame of a specific business problem or objective. So to them big data may be creating their role but on a day-to-day basis it’s more about fast data or small data and blending lots of different types of data and then putting that into action quickly, which is where I think the cloud, and cloud services like Analytics-as-a-Service, is as important as big data is to the CDO. Now, it may deal with big data but it almost doesn’t matter what the size of the data is; it’s how quickly you can apply it for a specific business problem. 
The CDO could be that keeper of the spectrum of how you’re collecting, securing, imagining your data but ultimately they’ll be judged by how successful they are at turning that data into practical insights for the everyday worker. That’s where the ROI and return on data management is across the whole spectrum, but ultimately if people can’t put the insights into use in a practical fashion and make it easy, it almost doesn’t matter what you’ve done at the back end. Read more about the Chief Data Officer’s role in a digital transformation blog by OpenText CEO, Mark Barrenechea.


Unstructured Data Analysis – the Hidden Need in Health Care


The ‘Hood I recently had the opportunity to attend the HIMSS 2016 conference in Las Vegas, one of the largest annual conferences in the field of health care technology. As I walked through the main level of the Exhibit Hall, I was amazed at the size of some vendor booths and displays. Some were the size of a house. With walls, couches, and fireplaces, they really seemed like you could move in! I was working at the OpenText booth on the Exhibit Hall level below the main one, which was actually a converted parking garage floor. Some exhibitors called this lower level “The ‘Hood”.  I loved it, though; people were friendly, the displays were great, and there were fresh-baked cookies every afternoon. I don’t know how many of the nearly 42,000 conference attendees I talked to, but I had great conversations with all of them. It’s an awesome conference to meet a diverse mix of health care professionals and learn more about the challenges they face. The Trick Half of the people I talked to asked me about solutions for analyzing unstructured data. When I asked them what kind of unstructured data they were looking to analyze, 70% of them said claim forms and medical coding. This actually surprised me. As a software developer with a data analysis background, I admit to not being totally up on health care needs. Claim forms and medical coding to me have always seemed very structured. Certain form fields get filled in on the claim form and rigid medical codes get assigned to particular diagnoses and treatments. Seems straightforward, no? What I learned from my discussions was that claims data requires a series of value judgments to improve the data quality.  I also learned that while medical coding is the transformation of health care diagnosis, procedures, medical services, and equipment into universal medical codes, this information is taken from transcriptions of physicians’ notes and lab results. 
This unstructured information is an area where data analysis can help immensely. The trick now is “How do we derive value from this kind of data?” The Reveal OpenText has an unstructured data analysis methodology that accesses and harvests data from unstructured sources. Its InfoFusion and Analytics products deliver a powerful knockout combination. The OpenText InfoFusion product trawls documents and extracts entities like providers, diagnoses, treatments and topics. It can then apply various analyses, such as determining sentiment. The OpenText Analytics product line can then provide advanced exploration and analysis, dashboards, and reports on the extracted data and sentiment. It then provides secure access throughout the organization through deployment on the OpenText Information Hub (iHub). Users will enjoy interactive analytic visualizations that will allow them to gain unique insights from the unstructured data sources. The Leave-Behind If you’re interested in learning more about our solution for unstructured data analytics, you can see it in action in this application. While this is not a health care solution, it demonstrates the power of unstructured data analysis that allows users to visually monitor, compare, and discover interesting facts. If you’re interested in helping me develop an example using claim forms or medical coding data, please contact me. I definitely want to demonstrate this powerful story next year at the HIMSS conference. See you next year in the ‘Hood!


Security Developments: It’s All About Analytics


Analytics is everywhere. It is on the news, in sports, and of course, part of the 2016 US elections.  I recently attended the RSA Conference 2016, an important trade show for security software solutions, because I wanted to see how security vendors were using analytics to improve their offerings. Roaming through hall after hall of exhibits, I saw some interesting trends worth sharing. One of the first things I noticed was how many times analytics was mentioned in the signage of different vendors. I also noticed a wide range of dashboards showing all different types of security data. (With this many dashboards you’d think you were at a BI conference!) You see, security is no longer just about providing anti-virus and anti-malware protection in a reactive mode. Security vendors are utilizing cybersecurity and biometric data to try to understand and mount defenses in real-time when an attack is happening. To do this, they need to analyze large amounts of data. This made me realize what some analysts are predicting. It isn’t the data that has the value, it is the proprietary algorithms. Smarter Analytics = Stronger Security This is definitely true in the security space. Many vendors are providing the same types of service; one of the ways they can differentiate themselves is the algorithms they use to analyze the data. They have to gather a large amount of data to get baselines of network traffic. Then they use algorithms to analyze data in real-time to understand if something is happening out of the norm. They hope to spot when an attack is happening at a very early stage, so they can take action to stop and limit damage before it can shut down a customer’s network or website. This is why algorithms are important. Two different products may be looking at the same data, but one detects an attack before the other. This, to me, has big data analytics written all over it. Security vendors are also paying attention to analytics from the IoT (Internet of Things). 
A typical corporate data security application gathers a lot of data from different devices – network routers and switches, servers, or workstations, just to name a few. The security app will look at traffic patterns and do deep packet inspection of what is in the packets. An example would be understanding what type of request is coming to a specific server: What port is it asking for and where did the request originate from? This could help you understand if someone is starting a DoS (Denial of Service) attack or probing for a back door into your network or server. What can we learn from the trends on display at RSA this year? I think they show how analytics can help any business, in any industry. Dashboards are still very popular and efficient in displaying data to users to allow them to understand what is happening, and then make business decisions based on that data. And not all advanced analytic tools are equal, because it is not about the data but whether their algorithms can help you use that data to understand what is happening, and make better business decisions. OpenText Analytics provides a great platform for businesses to create analytic applications, and use data to make better decisions faster. To get an idea of what OpenText Analytics can do, take a look at our Election Tracker ’16 app.


Wind and Weather – Data Driven Digest


It’s the beginning of March, traditionally a month of unsettled early-spring weather that can seesaw back and forth between snow and near-tropical warmth, with fog, rain, and windstorms along the way. Suitably for the time of year, the data visualizations we’re sharing with you this week focus on wind and weather. Enjoy! You Don’t Need a Weatherman… Everyone’s familiar with the satellite imagery on the weather segment of your nightly TV news. It’s soothing to watch the wind flows cycle and clouds form and dissipate.  Now an app called Windyty lets you navigate real-time and predictive views of the weather yourself, controlling the area, altitude, and variables such as temperature, air pressure, humidity, clouds, or precipitation.  The effect is downright hypnotic, as well as educational – for example, you can see how much faster the winds blow at higher altitudes or watch fronts pick up moisture over oceans and lakes, then dump it as they hit mountains. Windyty’s creator, Czech programmer Ivo Pavlik, is an avid powder skier, pilot, and kite surfer who wanted a better idea of whether the wind would be right on days he planned to pursue his passions. He leveraged the open-source Project Earth global visualization created by Cameron Beccario (which in its turn draws weather data from NOAA, the National Weather Service, other agencies, and geographic data from the free, open-source Natural Earth mapping initiative). It’s an elegant example of a visualization that focuses on the criteria users want as they query a very large data source. Earth’s weather patterns are so large, they require supercomputers to store and process.  Pavlik notes that his goal is to keep Windyty a light-weight, fast-loading app that adds new features only gradually, rather than loading it down with too many options. …To Know Which Way the Wind Blows Another wind visualization, Project Ukko, is a good example of how to display many different variables without overwhelming viewers. 
Named after the ancient Finnish god of thunder, weather, and the harvest, Project Ukko models and predicts seasonal wind flows around the world. It’s a project of Euporias, a European Union effort to create more economically productive weather prediction tools, and is intended to fill a gap between short-term weather forecasts and the long-term climate outlook. Ukko’s purpose is to show where the wind blows most strongly and reliably at different times of the year. That way, wind energy companies can site their wind farms and make investments more confidently.  The overall goal is to make wind energy a more practical and cost-effective part of a country’s energy generation mix, reducing dependence on polluting fossil fuels, and improving its climate change resilience, according to Ukko’s website. The project’s designer, German data visualization expert Moritz Stefaner, faced the challenge of displaying projections of the wind’s speed, direction, and variability, overlaid with locations and sizes of wind farms around the world (to see if they’re sited in the best wind-harvesting areas). In addition, he needed to communicate how confident those predictions were for a given area. As Stefaner explains in an admirably detailed behind-the-scenes tour, he ended up using line elements that show the predicted wind speed through line thickness and prediction accuracy, compared to decades of historical records, through brightness.  The difference between current and predicted speed is shown through line tilt and color. Note, the lines don’t show the actual direction the winds are heading, unlike the flows in Windyty. The combined brightness, color, and size draw the eye to the areas of greatest change. At any point, you can drill down to the actual weather observations for that location and the predictions generated by Euporias’ models. 
For those of us who aren’t climate scientists or wind farm owners, the take-away from Project Ukko is how Stefaner and his team went through a series of design prototypes and data interrogations as they transformed abstract data into an informative and aesthetically pleasing visualization. Innovation Tour 2016 Meanwhile, we’re offering some impressive data visualization and analysis capacities in the next release of our software, OpenText Suite 16 and Cloud 16, coming this spring.  If you’re interested in hearing about OpenText’s ability to visualize data and enable the digital world, and you’ll be in Europe this month, we invite you to look into our Innovation Tour, in Munich, Paris, and London this week and Eindhoven in April.  You can: Hear from Mark J. Barrenechea, OpenText CEO and CTO, about the OpenText vision and the future of information management Hear from additional OpenText executives on our products, services and customer success stories Experience the newest OpenText releases with the experts behind them–including how OpenText Suite 16 and Cloud 16 help organizations take advantage of digital disruption to create a better way to work in the digital world Participate in solution-specific breakouts and demonstrations that speak directly to your needs Learn actionable, real-world strategies and best practices employed by OpenText customers to transform their organizations Connect, network, and build your brand with public and private industry leaders For more information on the Innovation Tour or to sign up, click here.   Recent Data Driven Digests: February 29: Red Carpet Edition February 15: Love is All Around February 10: Visualizing Unstructured Content


Red Carpet Edition—Data Driven Digest


The 88th Academy Awards will be given out Sunday, Feb. 28. There’s no lack of sites to analyze the Oscar nominated movies and predict winners. For our part, we’re focusing on the best and most thought-provoking visualizations of the Oscars and film in general.  As you prepare for the red carpet to roll out, searchlights to shine in the skies, and celebrities to pose for the camera, check out these original visualizations. Enjoy! Big Movies, Big Hits Data scientist Seth Kadish of Portland, Ore., trained his graphing powers on the biggest hits of the Oscars – the 85 movies (so far) that were nominated for 10 or more awards. He presented his findings in a handsome variation on the bubble chart, plotting numbers of nominations against Oscars won, and how many films fall into each category.  (Spoiler alert:  However many awards you’re nominated for, you can generally expect to win about half.) As you can see from the chart, “Titanic” is unchallenged as the biggest Academy Award winner to date, with 14 nominations and 11 Oscars won.  You can also see that “The Lord of the Rings: Return of the King” had the largest sweep in Oscars history, winning in all 11 of the categories in which it was nominated. “Ben-Hur” and “West Side Story” had nearly as high a win rate, 11 out of 12 awards and 10 out of 11, respectively. On the downside, “True Grit,” “American Hustle,” and “Gangs of New York” were the biggest losers – all of them got 10 nominations but didn’t win anything. Visualizing Indie Film ROI Seed & Spark, a platform for crowdfunding independent films, teamed up with the information design agency Accurat to create a series of gorgeous 3-D visualizations in the article “Selling at Sundance,” which looked at the return on investment 40 recent indie movies saw at the box office. 
(The movies in question, pitched from 2011 to 2013, included “Austenland,” “Robot and Frank,” and “The Spectacular Now.”) The correlations themselves are thought-provoking – especially when you realize how few movies sell for more than they cost to make. But even more valuable, in our opinion, are the behind-the-scenes explanations the Accurat team supplied on Behance of how they built these visualizations – “(giving) a shape to otherwise boring numbers.” The Accurat designers (Giorgia Lupi, Simone Quadri, Gabriele Rossi, and Michele Graffieti) wanted to display the correlation between three values: production budget, sale price, and box office gross. After some experimentation, they decided to represent each movie as a cone-shaped, solid stack of circles, with shading running from budget at the top to sale price at the bottom; the stack’s height represents the box office take. They dress up their chart with sprinklings of other interesting data, such as the length, setting (historical, modern-day, or sci-fi/fantasy), and number of awards each movie won. This demonstrated that awards didn’t do much to drive box office receipts; even winning an Oscar doesn’t guarantee a profit, Accurat notes. Because the “elevator pitch” – describing the movie’s concept in just a few words, e.g. “It’s like ‘Casablanca’ in a dystopian Martian colony” – is so important, they also created a tag cloud of the 25 most common keywords used to describe the movies they analyzed. The visualization was published in hard copy in the pilot issue of Bright Ideas Magazine, which was launched at the 2014 Sundance Film Fest. Movie Color Spectrums One of our favorite Oscars is Production Design. It honors the amazing work to create rich, immersive environments that help carry you away to a hobbit-hole, Regency ballroom, 1950s department store, or post-apocalyptic wasteland. And color palettes are a key part of the creative effect. 
Dillon Baker, an undergraduate design student at the University of Washington, has come up with an innovative way to see all the colors of a movie.  He created a Java-based program that analyzes each frame of a movie for its average color, then compresses that color into a single vertical line. They get compiled into a timeline that shows the entire work’s range of colors. The effect is mesmerizing. Displayed as a spectrum, the color keys pop out at you – vivid reds, blues, and oranges for “Aladdin,” greenish ‘70s earth tones for “Moonrise Kingdom,” and Art Deco shades of pink and brown for “The Grand Budapest Hotel.”  You can also see scene and tone changes – for example, below you see the dark, earthy hues for Anna and Kristoff’s journey through the wilderness in “Frozen,” contrasted with Elsa’s icy pastels. Baker, who is a year away from his bachelor’s degree, is still coming up with possible applications for his color visualization technology. (Agricultural field surveying?  Peak wildflower prediction? Fashion trend tracking?) Meanwhile, another designer is using a combination of automated color analysis tools and her own aesthetics to extract whole color palettes from a single movie or TV still. Graphic designer Roxy Radulescu comes up with swatches of light, medium, dark, and overall palettes, focusing on a different work each week in her blog Movies in Color.  In an interview, she talks about how color reveals mood, character, and historical era, and guides the viewer’s eye. Which is not far from the principles of good information design! Recent Data Driven Digests: February 15: Love is All Around February 10: Visualizing Unstructured Content January 29: Are You Ready for Some Football?    


HIMSS16: OpenText Prescribes Healthier Patient and Business Outcomes


This year’s Healthcare Information and Management Systems Society (HIMSS) Conference is right around the corner. As a HIMSS North American Emerald Corporate Member, OpenText is proudly participating in the event, taking place February 29 – March 4 in fabulous Las Vegas. This year’s event is shaping up to be a great one. Not only will OpenText continue to showcase the #1 fax server in the healthcare industry, OpenText RightFax, but joining the conversation is OpenText Analytics, a powerful analytics tool to help make better business decisions and drive better business outcomes. Join us at HIMSS and talk to our industry experts to learn how OpenText is driving greater productivity, efficiency and security in the healthcare industry. With so many reasons to talk to the healthcare experts at OpenText, we’ve narrowed it down to our favorites…. Top 3 Reasons to Visit OpenText at HIMSS Save money. Make the shift to hybrid faxing with RightFax and RightFax Connect: Hybrid faxing combines the power of your on-premises RightFax system with the simplicity of RightFax Connect for cloud-based fax transmission. Learn how to save money and make your RightFax environment even easier to manage. Drive compliance. Better patient information exchange with RightFax Healthcare Direct: RightFax is the only fax server that combines fax and Direct messaging in a single solution with RightFax Healthcare Direct. Learn how you can accelerate your road to interoperability with Direct through this new innovative solution. Make better decisions. Learn about OpenText Analytics products and revolutionize your reporting and analytics infrastructure. This will give you the tools to build the best data-driven enterprise apps. With live data analytics seamlessly incorporated into your apps, you can track, report, and analyze your data in real time. 
Be sure to visit OpenText at Booth #12113 Hall G, and learn how OpenText will help you save money, increase the security and compliance of patient information exchange, and make better decisions through data analytics.


Financial Synergy Outlines Integration of OpenText Big Data Analytics at OpenText Innovation Tour 2016


Speaking at today’s OpenText Innovation Tour 2016 event in Sydney, Financial Synergy – Australia’s leading superannuation and investment software provider and innovator – outlined how it is embedding OpenText Big Data Analytics functionality into its Acuity platform. Using OpenText Big Data Analytics, Financial Synergy can provide a predictive and customer-centric environment for wealth management organisations and superannuation funds that rely on the Acuity platform. Stephen Mackley, CEO at Financial Synergy said, “Embedding OpenText Big Data Analytics into Acuity will allow superannuation funds of all sizes and types to affordably discover the hidden value of their data. They will have far greater capacity to retain existing customers and expand potential market share as they will know what has happened, what’s happening and what will happen.” Shankar Sivaprakasam, Vice President, OpenText Analytics (APAC and Japan) said: “One of the greatest challenges for Australia’s wealth management industry is its ability to engage with members, particularly in the early stages of account development. Advanced Big Data analytics is key to understanding the customer, their needs and behaviours. It will provide the ability to interrogate all structured and unstructured data and monitor how to best meet a customer’s goals.” Mackley continued, “We are offering a powerfully flexible tool with which our clients can use strategic, predictive knowledge to create new, efficient business models. It will also enable deeper segmentation, to ‘market of one’ levels of customer service.” Financial Synergy is a leading innovator and provider of digital solutions to the superannuation and investment services industry. The combination of its unique platform capabilities and the expertise of its in-house software and administration specialists allows Financial Synergy to transform the member experience and boost business performance. 
Article Written By Shankar Sivaprakasam, vice president,  Analytics (APAC and Japan), OpenText  


Love is All Around – Data Driven Digest


The joy of Valentine’s Day has put romance in the air. Even though love is notoriously hard to quantify and chart, we’ve seen some intriguing visualizations related to that mysterious feeling.  If you have a significant other, draw him or her near, put on some nice music and enjoy these links. Got a Thing for You We’ve talked before about the Internet of Things and the “quantified self” movement made possible by ever smaller, cheaper, and more reliable sensors. One young engineer, Anthony O’Malley, took that a step further by tricking his girlfriend into wearing a heart rate monitor while he proposed to her. The story, as he told it on Reddit, is that he and his girlfriend were hiking in Brazil and he suggested that they should compare their heart rates on steep routes. As shown in the graph he made later, a brisk hike on a warm, steamy day explains his girlfriend’s relatively high baseline pulse, around 100 beats per minute (bpm), while he sat her down and read her a poem about their romantic history. What kicked it into overdrive was when he got down on one knee and proposed; her pulse spiked at about 145 bpm—then leveled off a little to the 125-135 bpm range, as they slow-danced by a waterfall. Finally, once the music ended, the happy couple caught their breath and the heart rate of the now bride-to-be returned to normal.   What makes this chart great is the careful documentation. Pulse is displayed not just at 5-second intervals but as a moving average over 30 seconds (smoothing out some of the randomness), against the mean heart rate of 107 bpm.  O’Malley thoughtfully added explanatory labels for changes in the data, such as “She says SIM!” (yes in Portuguese) and “Song ends.” Now we’re wondering whether this will inspire similar tracker-generated reports, such as giving all the ushers in a wedding party FitBits instead of boutonnieres, or using micro-expressions to check whether you actually liked those shower gifts. 
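The smoothing described for O’Malley’s chart, a 30-second moving average over readings taken every 5 seconds, amounts to averaging the six most recent samples. Here is a minimal Python sketch of that idea, assuming a trailing window; the `moving_average` helper and the pulse values are made up for illustration and are not his code or his data.

```python
from collections import deque


def moving_average(samples, window):
    """Trailing moving average.

    Each output value is the mean of the current sample and up to
    window - 1 samples before it. Early outputs average whatever
    samples are available so far.
    """
    recent = deque(maxlen=window)  # holds at most `window` samples
    total = 0.0
    smoothed = []
    for value in samples:
        if len(recent) == window:
            total -= recent[0]  # oldest sample is about to be dropped
        recent.append(value)
        total += value
        smoothed.append(total / len(recent))
    return smoothed


SAMPLE_INTERVAL_S = 5   # one reading every 5 seconds
WINDOW_S = 30           # smooth over 30 seconds
WINDOW = WINDOW_S // SAMPLE_INTERVAL_S  # six samples per window

if __name__ == "__main__":
    # Hypothetical readings in beats per minute, not the real data.
    pulse = [100, 102, 98, 101, 99, 100, 145, 140, 130, 128, 126, 125]
    overall_mean = sum(pulse) / len(pulse)
    for raw, smooth in zip(pulse, moving_average(pulse, WINDOW)):
        print(f"raw={raw:3d}  smoothed={smooth:6.1f}  mean={overall_mean:6.1f}")
```

Plotting the raw readings, the smoothed series, and the overall mean together is what lets a spike stand out from ordinary beat-to-beat noise.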
Two Households, Both Alike in Dignity One of the most famous love stories in literature, “Romeo and Juliet,” is at heart a story of teenage lovers thwarted by their families’ rivalry. Swiss scholar and designer Martin Grandjean illuminated this aspect of the play by drawing in a series of innovative network diagrams of all Shakespeare’s tragedies.   Each circle represents a character—the larger, the more important—while lines connect characters who are in the same scene together. The “network density” statistic indicates how widely distributed the interactions are; 100% means that each character shares the stage at least once with everybody else in the play. The lowest network density (17%) belongs to Antony and Cleopatra, which features geographically far-flung groups of characters who mostly talk amongst themselves (Cleopatra’s courtiers, Antony’s friends, his ex-wives and competitors back in Rome). By contrast, Othello has the highest network density at 55%; its diagram shows a tight-knit group of colleagues, rivals, and would-be lovers on the Venetian military base at Cyprus trading gossip and threats at practically high-school levels. The diagram of Romeo and Juliet distinctly shows the separate families, Montagues and Capulets. Grandjean’s method also reveals how groups shape the drama, as he writes:  “Trojans and Greeks in Troilus and Cressida, … the Volscians and the Romans in Coriolanus, or the conspirators in Julius Caesar.” Alright, We’ve Got a Winner Whether your Valentine’s Day turns out to be happy or disappointing, there’s surely a pop song to sum up your mood. The Grammy Awards are a showcase for the best — or at least the most popular — songs of the past year in the United States.   The online lyrics library Musixmatch, based in Bologna, Italy, leveraged its terabytes of data and custom algorithms to make their prediction based on all 295 of the past Song of the Year nominees (going back to 1959).  
As Musixmatch data scientist Varun Jewalikar and designer Federica Fragapane wrote, they built a predictive analytics model based on a random forest classifier, which ended up ranking all 5 of this year’s nominees from most to least likely to win. Before announcing the predicted winner, Fragapane and Jewalikar made a few observations: Song of the Year winners have been getting wordier, though not necessarily longer. (Most likely due to the increasing popularity of rap and hip-hop genres, where lyrics are more prominent.) They’ve also been getting louder. Lyrics are twice as important as audio.   And they note that a sample set of fewer than 300 songs “is not enough data to build an accurate model and also there are many factors (social impact, popularity, etc.) which haven’t been modeled here. Thus, these predictions should be taken with a very big pinch of salt.” With that said, their prediction… was a bit off but still a great example of visualized data. Recent Data Driven Digests: February 10: Visualizing Unstructured Content January 29: Are You Ready for Some Football? January 19: Crowd-Sourcing the Internet of Things  


Visualizing Unstructured Analysis — Data Driven Digest


As the 2016 Presidential campaigns finish New Hampshire and move on towards “Super Tuesday” on March 1, the candidates and talking heads are still trading accusations about media bias. Which got us thinking about text analysis and ways to visualize unstructured content.  (Not that we’re bragging, but TechCrunch thinks we have an interesting way to measure the tenor of coverage on the candidates…) So this week in the Data Driven Digest, we’re serving up some ingenious visualizations of unstructured data. Enjoy! Unstructured Data Visualization in Action We’ve been busy with our own visualization of unstructured data — namely, all the media coverage of the 2016 Presidential race.  Just in time for the first-in-the-nation Iowa caucuses, OpenText released Election Tracker ‘16, an online tool that lets you monitor, compare, and analyze news coverage of all the candidates. Drawing on OpenText Release 16 (Content Suite and Analytics Suite), Election Tracker ‘16 automatically scans and reads hundreds of major online media publications around the world. This data is analyzed daily to determine sentiment and extract additional information, such as people, places, and topics. It is then translated into visual summaries and embedded into the election app where it can be accessed using interactive dashboards and reports. This kind of content analysis can reveal much more than traditional polling data ─holistic insights into candidates’ approaches and whether their campaign messages are attracting coverage. And although digesting the daily coverage has long been a part of any politician’s day, OpenText Release 16 can do what no human can do: Read, analyze, process, and visualize a billion words a day.  Word Crunching 9 Billion Tweets While we’re tracking language, forensic linguist Jack Grieve of Aston University, Birmingham, England has come up with an “on fleek” (perfect, on point) way to pinpoint how new slang words enter the language: Twitter. 
Grieve studied a dataset of Tweets from 2013-14 by 7 million users all over America, containing nearly 9 billion words (collected by geography professor Diansheng Guo of the University of South Carolina). After eliminating all the regular, boring words found in the dictionary (so that he’d only be seeing “new” words), Grieve sorted all the remaining words by county, filtered out the rare outliers and obvious mistakes, and looked for the terms that showed the fastest rise in popularity, week over week. These popular newcomers included “baeless” (single/a seemingly perpetual state), “famo” (family and friends), TFW (“that feeling when…”, e.g. TFW when a much younger friend has to define the term for you; that would be chagrin), and “rekt” (short for wrecked or destroyed, not “rectitude”). As described in the online magazine Quartz, Grieve found that some new words are popularized by social media microstars or are native to the Internet, like “faved” (to “favorite” a Tweet) or “amirite” (an intentional misspelling of “Am I right?” mocking the assumption that your audience agrees with a given point of view). Grieve’s larger points include the insights you can get from crunching Big Data (9 billion Twitter words!), and social media’s ability to capture language as it’s actually used in real time. “If you’re talking about everyday spoken language, Twitter is going to be closer than a news interview or university lecture,” he told Quartz.

Spreading Virally

On a more serious subject, unstructured data in the form of news coverage helps track outbreaks of infectious diseases such as the Zika virus. HealthMap is a site (and mobile app) created by a team of medical researchers and software developers at Boston Children’s Hospital. They use “online informal sources” to track emerging diseases including flu, the dengue virus, and Zika.
Their tracker automatically pulls from a wide range of intelligence sources in nine languages (including Chinese and Spanish): online news stories, eyewitness accounts, official reports, and expert discussions about dangerous infectious diseases. Drawing from unstructured data is what differentiates HealthMap from other infectious disease trackers, such as the federal Centers for Disease Control and Prevention’s weekly FluView report. The CDC’s FluView provides an admirable range of data, broken out by patients’ age, region, flu strain, comparisons with previous flu seasons, and more. The only problem is that the CDC bases its reports on flu cases reported by hospitals and public health clinics in the U.S. This means the data is both delayed and incomplete (e.g. it doesn’t include flu victims who never saw a doctor, or cases not reported to the CDC), limiting its predictive value. By contrast, the HealthMap approach captures a much broader range of data sources, so its reports convey a fuller picture of disease outbreaks, in near-real time, giving doctors and public-health planners (or nervous travelers) better insights into how Zika is likely to spread. This kind of data visualization is just what the doctor ordered.

Recent Data Driven Digests: January 29: Are You Ready for Some Football? January 19: Crowd-Sourcing the Internet of Things January 15: Location Intelligence


3 Ways Election Tracker ’16 Solves the Unstructured Data Dilemma


When OpenText launched Election Tracker ’16, we received several encouraging responses about how easy it is for users to compare stats about their favorite Presidential candidate using the interactive visualization and intelligent analysis. And without fail, the next question was, “How does it work and how could it help my business?” Powered by Release 16, Election Tracker is a great example of unstructured data analysis in action, showcasing its power and importance in a relatable way. In fact, we feel the Election Tracker addresses the dilemma of unstructured data in three distinct ways.

1) Intelligent Analysis

Making sense of unstructured data is a high priority for digital organizations. That might mean trying to understand Google’s PageRank algorithm, finding sentiment in the body of an email or website, or scanning digital health records for trends. It is also important for businesses that need to organize and govern all data within an enterprise. Companies are not shy about throwing money at the problem. The global business intelligence market saw revenue of nearly $6 billion in 2015. That number is expected to grow to $17 billion at a CAGR of 10.38 percent between now and 2020, according to market research firm Technavio. Much of the investment is expected to come in the form of data analysis and cloud implementation. The secret sauce is our content analytics tool, OpenText InfoFusion. Using natural language processing technology, or text mining, the software tackles the core of unstructured data by extracting the most relevant linguistic nouns from semi-structured or unstructured textual content. The extraction is based on controlled vocabularies such as names, places, organization labels, product nomenclature, facility locations, employee directories, and even your business jargon.
The InfoFusion engine is able to automatically categorize content based on a file plan, hierarchical taxonomy or classification tree. This automatically creates a summary combining the most significant phrases and paragraphs. It can also show related documents.  This ability to relate documents is based on semantics—asking the engine to give you a document that has the same keywords, key phrases, topics and entities. The engine can also detect the ways that key words and phrases are used and correlate them to known indicators of whether a document is dryly factual or conveying emotion about a topic, and whether that emotion is positive or negative ─that is, its sentiment. 2) Interactive Visualization All the data in the world means nothing without some way to visually represent the context. Most pure-play “text analytics” solutions on the market today stop short of actual analysis. They are limited to translating free text to entities and taxonomies, leaving the actual visualization and analysis for the customer to figure out using other technologies. The technology powering the Election Tracker overcomes this important dilemma by converting the data into a visual representation that helps with content analysis. Once the Election Tracker mines raw text from the scores of major news sites around the world, it then uses OpenText Content Analytics to process the content. This determines sentiment and extracts people, places, and topics following standard or custom taxonomies providing the meta-data necessary to conduct an analysis. The tracker determines the objectivity or subjectivity of content and the tone: positive, negative or neutral. Visual summaries of the news data are generated with the Analytics Designer, then housed and deployed on OpenText iHub. The iHub-based visuals are seamlessly embedded into the Election Tracker user interface using the iHub JavaScript API. 
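To make the two ideas above concrete (extraction driven by a controlled vocabulary, followed by a positive/negative/neutral tone label), here is a deliberately tiny sketch. It is not how InfoFusion or OpenText Content Analytics works internally; the vocabulary, the word lists, and the function names `extract_entities` and `tone` are all hypothetical, and a simple word-count lexicon stands in for real sentiment analysis.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: surface form -> entity type.
VOCABULARY = {
    "iowa": "place",
    "new hampshire": "place",
    "senate": "organization",
}

# Hypothetical tone lexicon; a production system would use far richer indicators.
POSITIVE = {"strong", "win", "gains"}
NEGATIVE = {"weak", "loss", "scandal"}


def extract_entities(text):
    """Count whole-word occurrences of each vocabulary term in the text."""
    lowered = text.lower()
    found = Counter()
    for term, kind in VOCABULARY.items():
        hits = re.findall(r"\b" + re.escape(term) + r"\b", lowered)
        if hits:
            found[(term, kind)] += len(hits)
    return found


def tone(text):
    """Label text positive, negative, or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


article = "A strong showing in Iowa gains momentum before New Hampshire."
print(extract_entities(article))
print(tone(article))
```

The output of a step like this (entities plus a tone label per article) is the kind of metadata a dashboard can then aggregate by candidate, topic, or day.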
3) Scalable and Embeddable While we designed the Election Tracker to automatically crawl the web for election-focused articles, the technology behind the scenes can access and harvest data from any unstructured source. This includes social sites like Twitter, Facebook, and LinkedIn; email; multimedia message service (MMS); document archives like PDFs; RSS feeds; and blogs. Additionally, these sources can be combined with structured data to provide extremely valuable context—such as combining brand social sentiment from Twitter with product launch campaign results from a customer relationship management source, giving unparalleled insight to the success of a launch process. Overcoming the problems of scale can help ease fears about needing to add more data sources in the future. Its ability to be embedded allows companies to use their own branding and serve their customers in a format that is comfortable to the end user. See what all the buzz is about by visiting Election Tracker ’16 at: For more on the technology behind the Election Tracker ’16, here is a 20-minute election tracker demo. Also, visit our blog to discover the importance of designing for unstructured analysis of data.  


OpenText Releases U.S. Presidential Election Application, Powered by OpenText Release 16

Election Tracker

As the highly contested 2016 U.S. Presidential election and the Iowa caucus approach, I’m pleased to announce the release of Election Tracker ‘16, an online application for users who want to monitor, compare, and gain insights into the 2016 U.S. Presidential election. For more information, please visit the following site: How does Election Tracker ‘16 work? Utilizing OpenText Release 16 (Content Suite and Analytics Suite), Election Tracker ‘16 automatically scans and reads hundreds of top online media publications around the world, capitalizing on the knowledge buried in the unstructured information. This data is analyzed daily to determine sentiment and extract additional information, such as people, places, and topics. It is then translated into visual summaries and embedded into the election app where it can be accessed using interactive dashboards and reports. That is correct: hundreds of websites, a billion words, processed, stored and visualized, in real time. And we have been collecting this data for months to show trends and sentiment changes. Powered by OpenText Release 16, this information-based application provides anyone interested in following the critical 2016 election with deep insights into candidate information, revealing much more than traditional polling data. Using Election Tracker ‘16, election enthusiasts are able to gain a holistic view of how candidates are performing based on media sentiment, which can be a more accurate indication of future success. Election Tracker ‘16 is built using OpenText Release 16, which includes Content Suite (store) and Analytics Suite (visualize and predict), bringing seemingly unstructured data to life. OpenText Release 16 can do what no human can do: read, analyze, process, and visualize a billion words a day.

Transforming Unstructured Data into Interactive Insights

All three components of OpenText Release 16 are important, but let me focus on the analytic aspects of Election Tracker ‘16. As we saw in the 2012 U.S.
election, analytics played a major role in the success of the Obama campaign. Drawing from a centralized database of voter information, President Obama’s team was able to leverage analytics to make smarter, more efficient campaign decisions. Everything—from media buy placements to campaign fundraising to voter outreach—was driven by insight into data. This year, analytics promises to play an even bigger role in the U.S. Presidential election. Analytics has made its way into every industry and has become a part of everyday life. Just as candidates look for a competitive advantage to help them win, so too must businesses. Research shows that organizations that are data driven are more profitable and productive than their competitors. Analytics and reporting solutions help organizations become data driven by extracting value from their business data and making it available across the enterprise to facilitate better decision making. Organizations have been using analytics to unlock the power of their business data for years. Marketing analysts are using analytics to evaluate social content and understand customer sentiment. Legal analysts can gain a quick understanding of the context and sentiment of large volumes of legal briefs. Data directors, tasked with organizing and governing enterprise data, are applying analytics to solve complex business problems. But basic analytics has become table stakes, with laggards, smaller companies, and even skeptics jumping on the bandwagon. Embedded analytics is the new competitive advantage. When you scan the most pure-play “text analytics” solutions on the market today, they clearly stop short of actual analysis. They are limited to translating free text to entities and taxonomies, leaving the actual visualization and analysis for organizations to figure out using other technologies. 
At the other end of the spectrum, traditional dashboard tools lack the sophistication needed to process free text effectively, and they struggle with large amounts of data. With OpenText Release 16, our Analytics Suite accomplishes both with ease, empowering users to quickly and easily build customized enterprise reporting applications to summarize and visualize insights from unstructured big data, securely delivered across web browsers, tablets, and phones. So, whether you’re out to win the race for the U.S. Presidency, gain market share, or attract new customers, an embedded analytics and reporting solution like Election Tracker ’16 can help you cross the finish line first. Throughout the race to November 2016, we’ll be tapping into the power of Election Tracker ’16 to shed light on how candidates perform during key election milestones. Join us for a behind the scenes look at the campaign trail. For more information on the Election Tracker ’16—powered by OpenText read the press release.


Are You Ready for Some Football? — Data Driven Digest


Here in Silicon Valley, Super Bowl 50 is not only coming up, it’s downright inescapable. The OpenText offices are positioned halfway between Santa Clara, where the game will actually be played on Feb. 7, and San Francisco, site of the official festivities (44 miles north of the stadium, but who’s counting?).  So in honor of the Big Game, this week we’re throwing on the shoulder pads and tackling some data visualizations related to American football.  Enjoy! Bringing Analytics Into Play In the area of statistics and data visualization, football has always taken a back seat to baseball, the game beloved by generations of bow-tied intellectuals.  But Big Data is changing the practice of everything from medicine to merchandising, so it’s no wonder that better analysis of the numbers is changing the play and appreciation of football. Exhibit A:  Kevin Kelley, head football coach at Pulaski Academy in Little Rock, Ark., has a highly unusual style of play–no punts.  He’s seen the studies from academics such as UC Berkeley professor David Romer, concluding that teams shouldn’t punt when facing fourth downs with less than 4 yards gained, and he came to the conclusion that “field position didn’t matter nearly as much as everyone thought it did.” As Kelley explains in an ESPN short film on the hub, if you try to punt when the ball is on your 5-yard line or less, the other team scores 92% of the time. Even 40 yards from your goal line, the other team still scores 77% of the time. “Numbers have shown that what we’re doing is correct,” he says in the film. “There’s no question in my mind, or my coaches’ minds, that we wouldn’t have had the success we’ve had without bringing analytics into (play).” The coach’s data-driven approach has paid off, giving Pulaski multiple winning seasons over the past 12 years, including a 14-0 record in 2015. 
The highlight of their latest season: Beating Texas football powerhouse Highland Park 40-13 and snapping its 84-game home winning streak, which goes back to 1999. Bigger, Faster, Stronger No doubt most of Coach Kelley’s players dream of turning pro. But they’ll need to bulk up if they want to compete, especially as defensive linemen.  Two data scientists offer vivid demonstrations of how much bigger NFL players have gotten over the past few generations. Software engineer and former astrophysicist Craig M. Booth crunched the data from 2013 NFL rosters to create charts of their heights and weights.  His chart makes it easy to see how various positions sort neatly into clusters:  light, nimble wide receivers and cornerbacks; tall defensive and tight ends; refrigerator-sized tackles and guards. The way Booth mapped the height/weight correlation, with different colors and shapes indicating the various positions, isn’t rocket science. It is, however, a great example of how automation is making data visualization an everyday tool. As he explains on his blog, he didn’t have to manually plot the data points for all 1,700-odd players in the NFL; he downloaded a database of the player measurements from the NFL’s Web site, then used an iPython script to display it. For a historical perspective on how players have gotten bigger since 1950, Booth created a series of line charts showing how players’ weights have skyrocketed relative to their heights. Backfield in Motion Meanwhile, Noah Veltman, a member of the data-driven journalism team at New York City’s public radio station WNYC, has made the bulking-up trend even more vivid by adding a third dimension – time – to his visualization.   His animation draws on NFL player measurements going all the way back to 1920. 
He observes that football players’ increasing size is partly due to the fact that Americans in general have gotten taller and heavier over time – though partly also due to increasing specialization of body type by position. You can see a wider range of height-and-weight combinations as the years go by.  And from the 1990s on, they begin falling into clusters.  (You could also factor in more weight training, rougher styles of play, and other trends, but we’ll leave that discussion to the true football geeks.) Bars, Lines, and Bubbles Now, what kind of play are we seeing from these bigger, better-trained players? Craig M. Booth recently unveiled an even more interesting football-related project, an interactive visualizer of the performance of every NFL team from 2000 on.  He uses the Google charts API to display data from on everything from points scored by team by quarter to total passing or penalty yards. You can customize the visualizer by the teams tracked, which variables appear on the X and Y-axes, whether they’re on a linear or logarithmic scale, and whether to display the data as bubble plots, bar charts, or line graphs. It can serve up all kinds of interesting correlations.  (Even though OpenText offers powerful predictive capacities in our Big Data Analytics suite, we disavow any use of this information to predict the outcome of a certain football game on February 7…) OpenText Named a Leader in the Internet of Things Speaking of sharing data points, OpenText was honored recently in the area of Internet of Things by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. You can view an infographic on Dresner’s Wisdom of Crowds research. Recent Data Driven Digests: January 19: Crowd-Sourcing the Internet of Things January 15: Location Intelligence January 5: Life and Expectations  


Crowd-Sourcing the Internet of Things (Data Driven Digest for January 19, 2016)

Runner with fitness tracker passes the Golden Gate Bridge

The Internet of Things is getting a lot of press these days. The potential use cases are endless, as colleague Noelia Llorente has pointed out: Refrigerators that keep track of the food inside and order more milk or lettuce whenever you’re running low. Mirrors that can determine if you have symptoms of illness and make health recommendations for you. Automated plantations, smart city lighting, autonomous cars that pick you up anywhere in the city… So in this week’s Data Driven Digest, we’re looking at real-world instances of the Internet of Things that do a good job of sharing and visualizing data. As always, we welcome your comments and suggestions for topics in the field of data visualization and analysis.  Enjoy! The Journey of a Million Miles Starts with a Single Step Fitness tracking has long been a popular use for the Internet of Things. Your correspondent was an early adopter, having bought Nike+ running shoes, with a special pocket for a small Internet-enabled sensor, back in 2007.  (Nike+ is now an app using customers’ smartphones, smart watches, and so forth as trackers.) These sensors track where you go and how fast, and your runs can be uploaded and displayed on the Nike+ online forum, along with user-generated commentary – “trash talk” to motivate your running buddies, describing and rating routes, and so forth. Nike is hardly the only online run-sharing provider, but its site is popular enough to have generated years of activity patterns by millions of users worldwide. Here’s one example, a heat map of workouts in the beautiful waterfront parks near San Francisco’s upscale Presidio and Marina neighborhoods.  
(You can see which streets are most popular – and, probably, which corners have the best coffeehouses…) The Air That I Breathe Running makes people more aware of the quality of the air that they breathe., an “environmental justice” nonprofit in Brooklyn, N.Y., is trying to make people more conscious of the invisible problem of air pollution through palm-sized sensors called AirBeams. These handheld sensors can measure levels of microparticulate pollution, ozone, carbon monoxide, and nitrogen dioxide (which can be blamed for everything from asthma to heart disease and lung cancer) as well as temperature, humidity, ambient noise, and other conditions. So far so good – buy an AirBeam for $250 and get a personal air-quality meter, whose findings may surprise you. (For example, cooking on a range that doesn’t have an effective air vent subjects you to more grease, soot, and other pollution than the worst smog day in Beijing.) But the Internet of Things is what really makes the device valuable. Just like with the Nike+ activity trackers, AirBeam users upload their sensor data to create collaborative maps of air quality in their neighborhoods. Here, a user demonstrates how his bicycle commute across the Manhattan Bridge subjects him to a lot of truck exhaust and other pollution – a peak of about 80 micrograms of particulate per cubic meter (µg/m3), over twice the Environmental Protection Agency’s 24-hour limit of 35 µg/m3. And here’s a realtime aggregation of hundreds of users’ data about the air quality over Manhattan and Brooklyn.  (Curiously, some of the worst air quality is over the Ozone Park neighborhood…) Clearly, the network effect applies with these and many other crowd-sourced Internet of Things applications – the more data points users are willing to share, the more valuable the overall solution becomes. 
OpenText Named a Leader in the Internet of Things Speaking of sharing data points, OpenText was honored recently in the area of Internet of Things by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. You can view an infographic on Dresner’s Wisdom of Crowds research. Recent Data Driven Digests: January 15: Location Intelligence January 5: Life and Expectations December 22: The Passage of Time in Sun, Stone, and Stars


Next Step For Internet of Things: Analytics of Things

Infographic Internet of Things Dresner Report

“Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Between us, Theory and Practice agree: nothing works and nobody knows why.” Anonymous. The Internet of Things (IoT) is clearly the next big step in the technology industry. It will be almost like the second Industrial Revolution, opening a world of incalculable, even greater possibilities in the digital age, potentially achieving greater independence and, therefore, greater efficiency. Imagine the possibilities. Refrigerators that measure the food inside and replenish the right product from your preferred vendor. Mirrors that can determine if you have symptoms of illness and make health recommendations for you. Smart watches that monitor your vital signs and warn the health services if you have a problem or emergency. Traffic lights that connect to a circuit of cameras to identify the level of traffic and mass movement, thus preventing absurd waiting times in areas of little movement. Automated plantations, smart city lighting, courier drones that deliver whatever you want and wherever you want, autonomous cars that pick you up anywhere in the city…  Sounds like futurist Sci-Fi. But what if this scenario was closer than we think? I had the pleasure of attending Big Data Week in Barcelona (#BDW15) recently, which featured top speakers from industry leading companies. My expectation was that I would listen to a lot of talks on technology, programming languages, Hadoop, R and theories about the future for humans and business in this new era of Big Data. After hearing the first presentations from Telefonica (Telcos), BBVA Data & Analytics (Finance) and the Smart Living Program at Mobile World Capital Barcelona (Technology), I realized something. Regardless of the industry, it was all about how insights from data produced by physical objects disrupt our lives as individuals, consumers, parents, and business leaders. It doesn’t matter which role you play. 
Yes, it was all about the “Internet of Things” or, to take it a step further, the “Analytics of Things.” These companies are already changing the way they do business by leveraging information from the internet-connected devices they already have. And that is just the beginning. Gartner estimates that by 2020 there will be more than 20 billion devices connected to the IoT. Dresner’s 2015 report, The Internet of Things and Business Intelligence, estimates that 67% of enterprises consider IoT to be an important part of their future strategies, and that 89% of information professionals think the same of predictive analytics within the Internet of Things. The data itself is not the point; what matters is how Big Data Analytics technologies enable organizations to collect, cross-reference, integrate and analyze data from devices to design better products and make our lives easier. So, have Theory and Practice finally converged? What if the future is right now? Take a look at the infographic based on the findings of the IoT Dresner Report or download the full study to find out more.


Data Driven Digest for January 15: Location, Location, Location


Location intelligence is a trendy term in the business world these days. Basically, it means tracking the whereabouts of things or people, often in real time, and combining that with other information to provide relevant, useful insights. At a consumer level, location intelligence can help with things like finding a coffeehouse open after 9 p.m. or figuring out whether driving via the freeway or city streets will be faster. At a business level, it can help with decisions like where to build a new store branch that doesn’t cannibalize existing customers, or laying out the most efficient delivery truck routes. Location intelligence is particularly on our mind now because OpenText was recently honored by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. Dresner recognized our achievements in three areas: Location Intelligence, Internet of Things, and Embedded BI. You’ll be hearing more about these awards later. In the meantime, we’re sharing some great data visualizations based on location intelligence.  As always, we welcome your comments and suggestions.  Enjoy! Take the A Train In cities all over North America, people waiting at bus, train, or trolley stops who are looking at their smartphones aren’t just killing time – they’re finding out exactly when their ride is due to arrive. One of the most popular use cases for location intelligence is real-time transit updates. Scores of transit agencies, from New York and Toronto to Honolulu, have begun tracking the whereabouts of the vehicles in their fleets, and sharing that information in live maps. One of the latest additions is the St. Charles Streetcar line of the New Orleans Regional Transit Authority (NORTA) — actually the oldest continuously operating street railway in the world! (It was created in 1835 as a passenger railway between downtown New Orleans and the Carrollton neighborhood, according to the NORTA Web site.) 
This is not only a boon to passengers, the location data can also help transit planners figure out where buses are bunching up or falling behind, and adjust schedules accordingly. On the Street Where You Live Crowdsourcing is a popular way to enhance location intelligence. The New York Times offers a great example with this interactive feature describing writers’ and artists’ favorite walks around New York City. You can not only explore the map and associated stories, you can add your own – like this account of a proposal on the Manhattan Bridge. Shelter from the Storm The City of Los Angeles is using location intelligence in a particularly timely way: An interactive map of resources to help residents cope with winter rainstorms (which are expected to be especially bad this year, due to the El Niño weather phenomenon). The city has created a Google Map, embedded in the site, that shows rainfall severity and any related power outages or flooded streets, along with where residents can find sandbags, hardware stores, or shelter from severe weather, among other things.  It’s accessible via both desktop and smartphones, so users can get directions while they’re driving. (Speaking of directions while driving in the rain, L.A. musician and artist Brad Walsh captured some brilliant footage of an apparently self-driving wheeled trashcan in the Mt. Washington neighborhood. We’re sure it’ll get its own Twitter account any day now.) We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about?  Please leave suggestions and questions in the comment area below. Recent Data Driven Digests: January 5: Life and Expectations December 22: The Passage of Time in Sun, Stone, and Stars December 18: The Data Awakens  


Data Scientist: What Can I Do For You?

Data Scientist and OpenText Big Data Analytics

After attending our first Enterprise World, I have just one word to define it: intense. In my memory, there are a huge number of incredible moments: spectacular keynotes, lots of demos, and amazing breakout sessions. Now, trying to digest all of these experiences and collecting all the opinions, suggestions, and thoughts of the customers who visited our booth, I remember a wonderful conversation with a customer about data mining techniques, their best approaches, and where we can help with our products. From the details and the way he formed his questions, it was pretty clear that I had in front of me a data scientist, or at least someone who deeply understands this amazing world of data mining, machine learning algorithms, and predictive analytics.

Just to put it in context, data scientists usually maintain a professional skepticism about applications that provide an easy-to-use interface, without a lot of options and knobs, for running prescriptive or predictive algorithms. They love to tweak algorithms, writing their own code or accessing and modifying all the parameters of a certain data mining technique, just to obtain the best model for their business challenge. They want to have full control of the process, and that is fully understandable. It is their comfort zone.

Data scientists fight against concepts like the democratization of predictive analytics. They have good reasons, and I agree with a large number of them. Most data mining techniques are pretty complex, difficult to understand, and need a lot of statistics knowledge just to say, “Okay, this looks pretty good.” Predictive models need to be maintained and revised frequently, based on your business needs and the amount of data you expect to use during the training/testing process. More often than you can imagine, models can’t be reused for similar use cases. Each business challenge has its own related data, and that data defines how the prescriptive or predictive model should be trained, tested, validated and, ultimately, applied in the business.

On the other hand, a business analyst or a business user without a PhD can take advantage of predictive applications that have the most common algorithms in a box (a black box) and start answering their questions about the business. Moreover, their companies often can’t afford the compensation a data scientist commands, so they deal with all of this by themselves.

But what can we do for you, data scientist? The journey starts with integrating distinct sources, such as databases, text files, spreadsheets, or even applications, into a single repository where everything is connected. Exploring and visualizing complex data models with several levels of hierarchy offers a better approach to the business model than the more common “huge table” method. Having an analytical repository that reflects how the business flows helps with one of the hardest parts of a data scientist’s job: problem definition.

Collecting data is just the beginning; there is a long list of tasks related to data preparation, data quality, and data normalization. This is where the business analyst or the data scientist loses much of their precious time, and we are here to help them by accelerating the time from raw data to value. Once they have achieved their goal of getting clean data, data scientists begin analyzing the data, finding patterns, correlations, and hidden relationships. OpenText Big Data Analytics can help provide an agile solution to perform all this analysis. Moreover, everything is calculated quickly, using all your data, your big data, in a flexible trial-and-error environment.
So, the answer to my question: OpenText Big Data Analytics can reduce the time spent on data preparation, freeing up time where it is really needed: analysis and decision making, even if the company is dealing with big data. So, why don’t you try it with our 30-day free trial or ask us for a demo?


Data Driven Digest for January 5: Life and Expectations

Image created by Tim Urban of

Welcome to 2016! Wrapping up 2015 and making resolutions for the year ahead is a good opportunity to consider the passage of time – and in particular, how much is left to each of us. We’re presenting some of the best visualizations of lifespans and life expectancy. So haul that bag of empty champagne bottles and eggnog cartons to the recycling bin, pour yourself a nice glass of kale juice, and enjoy these links for the New Year.

“Like Sands Through the Hourglass…”

It’s natural to wonder how many more years we’ll live. In fact, it’s an important calculation when planning for retirement. Figuring out how long a whole population will live is a solvable problem – in fact, statisticians have been forecasting life expectancy for nearly a century. And the news is generally good: Life expectancies are going up in nearly every country around the world.

But how do you figure out how many years are left to you, personally? (Short of consulting a fortune-teller, a process we don’t recommend as the conclusions are generally not data-driven.) UCLA-trained statistician Nathan Yau of the excellent blog Flowing Data came up with a visualization that looks a bit like a pachinko game. It runs multiple simulations predicting your likely age at death (based on age, gender, and Social Security Administration data) by showing little balls dropping off a slide to hit a range of potential remaining lifespans, everything from “you could die tomorrow” to “you could live to 100.” As the simulations pile up, they peak at the likeliest point.

One of the advantages of Yau’s simulator is that it doesn’t provide just one answer, the way many calculators do that ask about your age, gender, race, health habits, and so forth. Instead, it uses the “Monte Carlo” method of multiple randomized trials to get an aggregated answer. Plus, the little rolling, bouncing balls are visually compelling. (That’s academic-ese for “They’re fun to watch!”)

“Visually compelling” is the key. As flesh-and-blood creatures, we can’t operate entirely in the abstract. It’s one thing to be told you can expect to live X years more; seeing that information as an image somehow has more impact in terms of motivating us to action.

That’s why the approach taken by Wait But Why blogger Tim Urban is so striking despite being so simple. He started with the assumption that we’ll each live to 90 years old – optimistic, but doable. Then he rendered that lifespan as a series of squares, one per year.

What makes Urban’s analysis memorable – and a bit chilling – is when he illustrates the remaining years of life as the events in that life – baseball games, trips to the beach, Chinese dumplings, days with aging parents or friends. Here, he figures that 34 of his 90 expected winter ski trips are already behind him, leaving only 56 to go.

Stepping back, he comes to three conclusions:

1) Living in the same place as the people you love matters. I probably have 10X the time left with the people who live in my city as I do with the people who live somewhere else.
2) Priorities matter. Your remaining face time with any person depends largely on where that person falls on your list of life priorities. Make sure this list is set by you – not by unconscious inertia.
3) Quality time matters. If you’re in your last 10% of time with someone you love, keep that fact in the front of your mind when you’re with them and treat that time as what it actually is: precious.

Spending Time on Embedded Analytics

Since we’re looking ahead to the New Year, on Tuesday, Jan. 12, we’re hosting a webinar featuring TDWI Research Director Fern Halper, discussing Operationalizing and Embedding Analytics for Action. Halper points out that analytics need to be embedded into your systems so they can provide answers right where and when they’re needed. Uses include support for logistics, asset management, customer call centers, and recommendation engines, to name just a few. Dial in – you’ll learn something!
We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests:
December 22: The Passage of Time in Sun, Stone, and Stars
December 18: The Data Awakens
December 11: Holiday Lights
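
For readers curious about the “Monte Carlo” idea behind Yau’s simulator mentioned earlier in this digest, here is a minimal Python sketch of the general technique. It is not Yau’s code and it does not use Social Security Administration data: the mortality curve (`toy_death_prob`), the function names, and the age cap are all made-up assumptions for illustration. Each trial steps forward one year at a time and “rolls the dice” against that year’s chance of dying; repeating the trial thousands of times produces a pile of outcomes whose peak marks the likeliest age.

```python
import random
from collections import Counter


def toy_death_prob(age):
    """Made-up yearly mortality curve (NOT real actuarial data)."""
    return min(1.0, 0.0001 * 1.1 ** age)


def simulate_age_at_death(current_age, death_prob, rng, max_age=110):
    """One randomized trial: advance year by year until a 'death' draw occurs."""
    age = current_age
    while age < max_age:
        if rng.random() < death_prob(age):
            return age
        age += 1
    return max_age  # assumed cap so every trial terminates


def run_trials(current_age, n_trials=10_000, seed=42):
    """Repeat the trial many times and tally the simulated ages at death."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return Counter(
        simulate_age_at_death(current_age, toy_death_prob, rng)
        for _ in range(n_trials)
    )


if __name__ == "__main__":
    counts = run_trials(current_age=40)
    likeliest_age, hits = counts.most_common(1)[0]
    print(f"Likeliest simulated age at death: {likeliest_age} ({hits} of 10,000 trials)")
```

The point of the design is the one made above: instead of a single number, you get a whole distribution of possible outcomes, which can then be drawn as a histogram (or as bouncing balls).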


Data Driven Digest for December 22: The Passage of Time in Sun, Stone, and Stars

Photo from RTE News, Ireland

Data visualization has been with us since the first cave-dweller documented the lunar month as a loop of symbols carved onto a piece of bone that hunters could carry with them to track the passage of the seasons. Obviously, technology has moved on in the past 34,000 years – have we told you lately about our iHub dashboards and embedded analytics? – but since the winter solstice (for the Northern Hemisphere) occurs Tuesday, Dec. 22, we thought this would be a good time to review some of the earliest efforts to create live, continuously updated reports of astronomical data, out of stone structures and the landscapes around them.

The fact that many of these calendars still exist after thousands of years, and still work, shows that our prehistoric ancestors must have considered visually recording the time of the year mission-critical, to predict hunting and harvest times, plus other seasonal events such as spring thaws, droughts, and monsoons. (Whether accurately predicting and planning for those events was part of their yearly job performance review, we leave to the archaeologists…)

So step outside and, if the weather permits, take a look at the sunrise or sunset and notice exactly where it hits the horizon, something our ancestors have done for thousands of generations. Then come back into your nice warm room and check out these links. Enjoy, and happy holidays!

Sun Daggers

The winter solstice is the time of year when the days are shortest and nights are longest. As such, it was an anxious time for primitive people, who wondered when their world would stop getting darker and colder. That’s why early astronomer-priests (the Chief Data Officers of their time) designed calendars that made clear exactly when the day reached its minimum and the sun was at the lowest point on the horizon – and would then start returning.

One of the most impressive solar calendars is at Maeshowe, a 5,000-year-old “chambered cairn” in the Orkney Islands, north of Scotland.
It’s a long passage built of stone slabs dug into an artificial mound.  The passage is oriented so that for a few days around the winter solstice every year, the rays of the setting sun reach all the way down to light up the door at the end.  Two markers pinpoint the sun’s path on the exact date of the solstice: a monolith about half a mile away, called the Barnhouse Stone, and another standing stone at the entrance to Maeshowe (now missing, though its socket remains).         Even more impressive is Newgrange, a 5,000-year-old monument near Dublin, Ireland.  Newgrange was built as a 76-meter-wide circular mound of stones and earth covering an underground passage, possibly used as a tomb. A hollow box above the passage lets in the rising sun’s light for about 20 minutes at dawn around the winter solstice.  The beam starts on the passage’s floor, then gradually reaches down the whole 19-meter length of the passage, flooding it with light.  It’s an impressive spectacle, one that attracts thousands of people to the Newgrange site for the six days each December that the sunbeam is visible.   Nor were early Europeans the only ones taking note of the sun’s travels across the landscape.  At Fajada Butte, New Mexico, three stone slabs were positioned so that “dagger”-shaped beams of sunlight passing between the parallel slabs travel across carved spirals on the cliff face beneath at the summer and winter solstices and spring and fall equinoxes.   Fajada Butte is part of the Chaco Canyon complex, inhabited between the 10th and 13th centuries by the Anasazi or Ancestral Puebloans.  They built impressively engineered cliff dwellings, some as high and densely populated as big-city apartment buildings, laid out 10-meter-wide roads that spanned modern-day New Mexico, and harvested snowmelt and rainwater for irrigation through a sophisticated system of channels and dams.  
The Anasazi religion was apparently based on bringing the order seen in the heavens down to earth, so many of their sites were oriented north-south or towards complex alignments of sun, moon, and stars – which may explain why Fajada Butte was just one of many solar observatories they built.

Researchers at the Exploratorium in San Francisco have designed an interactive simulator of how the Sun Daggers worked.

Looping Through Time

From the passage of time documented in stone and earth thousands of years ago to the wanderings of a Time Lord: Just in time for the annual “Doctor Who” Christmas special (and the beloved sci-fi show’s 50th anniversary), our friends at the BBC have created a clever interactive map of the travels through time of all 11 incarnations of the Doctor. This looping diagram ingeniously displays all the journeys by actor, episode, and whether the trip was into or out of the past and future, as well as the actual year. It’s not a linear chronology, but the course of a Time Lord’s adventures, like true love, never did run smooth.

Light on Embedded Analytics

Meanwhile, we’re hoping to shed some light on a topic dear to our heart – analytics. On Jan. 12, 2016, we’re hosting a Webinar featuring TDWI Research Director Fern Halper, who will talk about Operationalizing and Embedding Analytics for Action. Halper points out that analytics need to be embedded into your systems so they can provide answers right where and when they’re needed. Uses include support for logistics, asset management, customer call centers, and recommendation engines, to name just a few. Dial in – we promise you’ll learn something!

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests:
December 18: The Data Awakens
December 11: Holiday Lights
December 4: Winners of the Kantar Information Is Beautiful Awards


Data Driven Digest for December 18: The Data Awakens


It is a period of data confusion. Rebel businesses, striking from a hidden NoSQL base, have assembled their first embedded application against the evil Big Data. During the battle, data scientists managed to steal secret plans to the Empire’s ultimate weapon, the SPREADSHEET, a mass of rows and columns with enough frustration and lack of scale that it could crash an entire business plan.

While that may not be the plot of the new Star Wars film (or any for that matter), the scenario may invoke a few cheers for the noble data scientists tasked with creating dashboards and visualizations to battle the dark side of Big Data.

Find out more on how to battle your own Big Data problem with Analytics.

As the world enjoys the latest installment of the Star Wars franchise, it seemed fitting for us to acknowledge visualizations based on the movie series. Strong is the data behind the Force. Enjoy these examples.

The Force is Strong with This One

Image source: Bloomberg Business

At its core, the Star Wars movie franchise is about the battle between the light and dark sides of the Force. But how much time do they spend exploring that mystical power that surrounds us and penetrates us, and binds the galaxy together? Amazingly, a mere 34 minutes out of the total 805 minutes amassed in the first six films.

The screen above is one of five outstanding visualizations of the use of the Force created by a team of data reporters and visualization designers at Bloomberg Business. Creators Dashiell Bennett (@dashbot), Tait Foster (@taitfoster), Mark Glassman (@markglassman), Chandra Illick (@chandraelise), Chloe Whiteaker (@ChloeWhiteaker), and Jeremy Scott Diamond (@_jsdiamond) really draw you in. They break down not only the time spent talking about the Force but also which character uses the Force the most and what types of Force abilities are used.

The team viewed each movie, compiled the data by hand, and then entered it into a spreadsheet. If there were discrepancies, the team used the novelizations and screenplays of the films as references. While the project is engaging, it also digs deep, offering secondary layers of data such as the number of times Obi-Wan Kenobi uses the Jedi Mind Trick versus Luke Skywalker or Qui-Gon Jinn.

Great Shot, Kid, That Was One in a Million

Image source: Gaston Sanchez

Sometimes the technologies behind visualizations need to be acknowledged. Our second entry is an example of an arc diagram that was built with R. The Star Wars tie-in here is a statistical text analysis of the scripts from the original trilogy (Episodes IV, V, and VI) using arc-diagram representations.

Arc diagrams are often used to visualize repetition patterns. The thickness of an arc can represent the frequency of the connection between the two items, or “nodes,” that it links. Arc diagrams are not used often, because readers may not clearly see how the different nodes relate. However, they are great for showing relationships where time or numerical values aren’t involved. Here, the chart shows which characters speak to each other most often, and the words they use most. (No surprise, “sir” and “master” are C-3PO’s most common utterances, while Han Solo says “hey,” “kid,” and “get going” a lot.)

Gaston Sanchez, a data scientist and lecturer with the University of California, Berkeley and Berkeley City College, came up with this arc diagram as part of a lecture he was giving on the use of arc diagrams with R. Sanchez showed how to use R’s “tm” and “igraph” packages to extract text out of the scripts and compute adjacency matrices.

R has become embedded in the corporate world. R is an implementation of the S programming language, which was developed at Bell Labs in the 1970s; R itself first appeared in the 1990s. The language has been compared to Python as a way to dive into data analysis or apply statistical techniques. While R has typically been used by academics and researchers, more businesses are embracing R because it is seen as good for user-friendly analysis and graphical modeling.

This is the Data You are Looking For

Image source: Eshan Wickrema and Lachlan James

While “Star Wars: The Force Awakens” is expected to break box office records, it faces strong challengers to rank as one of the highest-grossing films of all time. According to stats from Box Office Mojo and Variety, the 1999 release of “Star Wars Episode I: The Phantom Menace” ranks number 20 on the list. When adjusted for inflation, the 1977 release of “Star Wars” is ranked third on the list of all-time movies behind “Gone with the Wind” and “Avatar.”

Looking at the first 100 days of release is one key to understanding the return on investment for a given film. Writers Eshan Wickrema and Lachlan James compared the stats of the first six Star Wars films against each other. What’s significant is that each film made more in revenue than its predecessor, with the prequel films making nearly twice the amount of “Return of the Jedi,” the most popular of the original trilogy.

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests:
December 11: Illuminating the cost and output of Christmas lights
December 4: 2015 winners of the Kantar Information Is Beautiful Awards
November 27: Mapping music in color
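
For anyone wondering what “computing adjacency matrices” means in the arc-diagram discussion above, here is a rough Python sketch of the idea (Python rather than R, and not Sanchez’s code). The scene data below is invented for illustration and is not taken from the actual scripts; the function name `cooccurrence_matrix` is our own. The matrix simply counts how often each pair of characters appears in the same scene, and those counts are what an arc diagram would draw as arc thickness.

```python
from itertools import combinations


def cooccurrence_matrix(scenes):
    """Build a symmetric adjacency matrix counting shared scenes per character pair."""
    names = sorted({name for scene in scenes for name in scene})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for scene in scenes:
        # set() so a character listed twice in one scene is counted once
        for a, b in combinations(sorted(set(scene)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return names, matrix


# Invented toy data: which characters speak in each (hypothetical) scene
TOY_SCENES = [
    ["LUKE", "HAN", "LEIA"],
    ["LUKE", "C-3PO"],
    ["HAN", "LEIA"],
]

if __name__ == "__main__":
    names, matrix = cooccurrence_matrix(TOY_SCENES)
    print(names)
    for name, row in zip(names, matrix):
        print(f"{name:>6} {row}")
```

In this toy data, the HAN and LEIA entry is the largest, so their arc would be drawn thickest.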


Delivering Insights Interactively in a Customer-Centric World


Consumers today expect access to their information from wherever they are, at any time and from any device. This includes information such as statements, bills, invoices, explanations of benefits, and other transactional records that help them better understand their relationship with a business. What’s more, they want to be able to interact with their data to gain greater insight through sorting, grouping, graphing, charting, or otherwise manipulating the information. This directly impacts customer satisfaction and loyalty – the better companies are at giving customers ubiquitous access to that information, the more opportunities they have to delight and retain their customers.

According to IDG’s 2015 “State of the CIO” report, customer experience technologies and mobile technologies are among CIOs’ top five priorities. The ability to access and interact with transactional data from any device falls squarely into both of these categories, so it’s no surprise organizations want to do so. The chart below demonstrates the level of importance CIOs have placed on providing customers the ability to interact with their information.

However, while the understanding of the importance of this need is there, it doesn’t necessarily align with organizations’ ability to deliver. Some interesting facts uncovered through IDG’s August 2015 survey on customer data interactivity:

17 percent of organizations cannot provide access to that data across multiple devices.
Two-thirds let customers access transactional data on any device, but only in a static format.
Only 18 percent can give customers truly interactive transactional data via the device of their choice.

The question then is, why is it so difficult for companies to provide information in interactive formats? The IDG survey reveals that many companies lack not only a strategy to enable interactivity, but the skilled resources to implement any strategy they might develop.
Any attempts at cross-device interactivity are ad hoc and therefore difficult, even though customers expect to be able to slice and dice their own transaction histories at will, from their devices of choice.

What can companies do about this? Where should they begin? What are the best practices in achieving the goal of interactivity in this information-driven age? To enable a better way to work in this age of disruption, OpenText recommends that IT leaders place priority on these three principles:

Simplify access to data to reduce costs, improve efficiencies, and increase competitiveness.
Consolidate and upgrade information and process platforms.
Increase the speed of information delivery through integrated systems and visual presentation.

Simplifying access to data is a key component of this puzzle. For very large organizations, data exists in different formats in different archives across disparate departments. This can become a huge barrier to getting information in a consolidated fashion, in the appropriate formats. Some of this data needs to be extracted in its native format, and other data can be repurposed from archived statements and reports. The data can exist in legacy and newer formats, and transforming this information into a common format acceptable to any device requires organization.

Finally, accelerating the delivery of information in a device-agnostic manner can only be accomplished through integrated systems that can talk to each other and deliver data in visually compelling formats. All this requires an integrated look at the enterprise architecture and information flow. While this is very much achievable, it needs to be done in a systematic manner, with solutions that can address all the requirements as well as the barriers to opening and freeing up information flow across the organization.
OpenText, with its suite of products in the Output Transformation and Enterprise Content Management areas, has the toolkit to address different parts of this challenge with industry-leading, best-in-class solutions. Download the IDG report, “Customers are demanding insights from their data. Are you ready?” to learn more about these principles of success, and how OpenText can help you deliver the interactive, digital experiences that the customer of today is demanding.


3 Questions: TDWI’s Fern Halper on Succeeding with Advanced Analytics


Dr. Fern Halper is director of TDWI (The Data Warehousing Institute) Research for advanced analytics. Well known in the analytics community, Halper has published hundreds of articles, research reports, speeches, and Webinars on data mining and information technology during the past 20 years. She focuses on advanced analytics, including predictive analytics, social media analysis, text analytics, cloud computing, and “Big Data” approaches to analytics. Halper is also co-author of several “Dummies” books on cloud computing, the hybrid cloud, and Big Data. OpenText chatted with Halper about the evolution of data-driven and advanced analytics.

OpenText: More companies are embracing embedded analytics. Although the concept is not new, what has changed in this space? Where are we now with the technology?

Halper: Traditionally, the term “embedded analytics” referred to analytics that was built into an enterprise application, such as a CRM [customer relationship management] or an ERP [enterprise resource planning] system. This might have included reports or visualizations embedded into these applications. I’m seeing both the terminology and technology changing to include visual analytics as a seamless part of a user interface as well as actionable analytics in interactive dashboards, automated analytics, analytics that operate in real time to change behavior, and much more. The idea is to operationalize these analytics; in other words, to make the analytics part of the business or operational process. This brings the results closer to the decision makers and their actions.

There are a number of factors driving this trend. First, organizations want right-time/real-time analytics so they can make more timely decisions and become more competitive. They are beginning to see data and analytics as a way to improve business processes, drive operational efficiencies, and grow revenue. Additionally, as the volume and frequency of data increase, it is often not possible to perform analysis and make decisions manually, so analytic actions need to become more automated.

OpenText: What part do visualizations and dashboards play in the success of an analytics deployment?

Halper: A lot of the embedding is happening in dashboards. In fact, in our most recent TDWI study on embedding analytics, 72% of those organizations that were already embedding analytics were doing it in dashboards. These dashboards are evolving to include more visualizations as well as become more interactive. The use of these dashboards – by executives, marketing, sales, finance, and operations – can help to drive an analytics culture. In operations, it can help organizations become more effective and efficient. These dashboards can be used to inform actions, which is a great first step in operationalizing analytics and making it more pervasive.

OpenText: Can you provide examples of how embedded analytics is succeeding today? Where do you see the focus going next?

Halper: We’re seeing analytics embedded in devices, databases, dashboards, systems, and applications. While a lot of the focus is in applications and dashboards, there is a move by companies to also embed more advanced analytics into databases or into systems. For example, a data scientist might build a retention model. That retention model is then embedded into the database. As new data about a customer comes in, the customer is scored for the probability of defection. That information might be passed to others in the organization. Or, the analytics might operate automatically; that is a focus. An example of this would include a recommendation engine or a preventive maintenance application. This is an evolving category. Customer support and operations are two major use cases. In many instances a main driver is automating analysis that is too complex for people to perform. It can often involve making small automated decisions over and over again. This is one future path for operationalizing analytics, more commonly referred to as “enterprise decision management.”

In terms of value, in this study I saw that those organizations that measured either top- or bottom-line impact were more likely to embed analytics than those that did not, almost by a two-to-one margin. They tended to be a bit more advanced analytically. They were also more likely to take automated action on their analytics, use alerts, and embed models into their systems.

Halper is expected to discuss the research findings of her latest best practices report entitled “Operationalizing and Embedding Analytics for Action.” The webinar, sponsored by OpenText, will present her findings as well as best practices for getting started operationalizing analytics. Register for this webinar to learn more.
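
To make Halper’s retention-model example a little more concrete, here is a minimal Python sketch of what “scoring a customer for the probability of defection” could look like once a model has been built. Everything specific in it is an assumption for illustration: the feature names, the coefficients, the intercept, and the alert threshold are invented, and a logistic model is only one of many possible model types. In practice the coefficients would come from a model a data scientist trained on real data, and the scoring would typically run inside the database or application.

```python
import math

# Hypothetical coefficients standing in for a trained retention model
COEFFICIENTS = {
    "months_since_last_order": 0.30,
    "support_tickets": 0.45,
    "tenure_years": -0.25,
}
INTERCEPT = -2.0


def defection_probability(customer):
    """Logistic score: probability (0 to 1) that this customer defects."""
    z = INTERCEPT + sum(
        weight * customer.get(feature, 0.0)
        for feature, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_new_records(records, alert_threshold=0.5):
    """Score each incoming record and return the ones worth passing along."""
    flagged = []
    for record in records:
        probability = defection_probability(record)
        if probability >= alert_threshold:
            flagged.append((record["customer_id"], probability))
    return flagged


if __name__ == "__main__":
    incoming = [
        {"customer_id": "A-100", "months_since_last_order": 6, "support_tickets": 4, "tenure_years": 1},
        {"customer_id": "B-200", "months_since_last_order": 0, "support_tickets": 0, "tenure_years": 8},
    ]
    for customer_id, probability in score_new_records(incoming):
        print(f"{customer_id}: defection risk {probability:.0%}")
```

The flagged list is the “information passed to others in the organization”; wiring the same score to an automatic action, such as triggering a retention offer, would be the automated variant Halper describes.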


Data Driven Digest for December 11: Holiday Lights

A holiday light show illuminates Canada's

Christmas and the New Year are approaching, so this week we’re sharing some data visualizations with connections to holiday celebrations. Pour yourself some eggnog (or glögg or other favorite holiday beverage), put on some seasonal music, and settle in for some great watching. Enjoy!

Shed Some Light on the Issue

December 13 is celebrated as St. Lucia Day in several countries, from Sweden to Italy and even the Caribbean island nation of St. Lucia. As fits a (possibly legendary) Catholic saint whose name derives from the Latin word for light, lux / lucis, this is a celebration of light at a time when the winter solstice is approaching and the days are at their shortest.

Speaking of light, when we’re surrounded by inexpensive electric light around the clock, it’s hard to imagine how dependent productivity is on reliable light sources. Professor Max Roser at Our World in Data, one of our favorite resources, demonstrates how the Industrial Revolution both drove the demand for more light and filled it.

His interactive graph, based on the work of economists Roger Fouquet and P.J.G. Pearson, shows how the price of lighting dropped sharply starting about 1750. That’s when new energy sources became available, starting with whale oil and kerosene (for lamps) and cheap beef tallow (for candles). The mid-19th century added arc lights and gas lamps. Then, once electricity became common around 1900, the price of illumination dropped to nearly nothing, relative to what it had been in the Middle Ages.

Meanwhile, as lighting became cheaper, cleaner, and more convenient, everyone took advantage of it. Cities began putting up lamps to make streets safer. Factory owners added night shifts. Students, housewives, shoppers, entertainment-seekers – everyone felt liberated by the electric bulb.

This Little Light of Mine…

And of course, that leads us to Christmas lights.
In many countries, they’re a source of neighborhood, city, or even national pride (as shown by the Wonders of Winter show every year in our headquarters of Waterloo, Canada, and the sound-and-light shows on Parliament Hill in Ottawa).

Despite the huge cost advantage of electricity over earlier light sources, incandescent bulbs are not very energy-efficient. So Christmas lights can still cause a sizable bump in many household budgets (about $20-50 extra, depending on the price of a kilowatt-hour in your area and how intensely you decorate).

But in recent years, innovations in bulbs, especially small LEDs, have dropped their energy demands considerably. The Western Area Power Administration (WAPA) reported in 2010 that a string of LED C7 bulbs (the thumb-sized ones used mostly outdoors) would cost only 23 cents to run during the entire holiday season, compared to $7.03 for conventional incandescent bulbs. (Miniature bulbs are cheaper than C7s, even if you don’t switch to LEDs. The State of California estimates that a string of indoor lights, running a total of 300 hours a month, would cost $1.38 to operate if it’s made up of miniature bulbs, vs. $4.31 for C7 bulbs, roughly a threefold price difference.)

We’re Burning Daylight

Want to take the best photos of your holiday light display? Professional photographer Jay P. Morgan has great tips on his blog, The Slanted Lens.

For a pleasant, soft glow, shoot your photos just as the sun is going down. That way, there’s still some light in the sky to help illuminate your house, yard, and so forth, Morgan explains. If you wait until full darkness, the contrast between the lights and the rest of the image is too stark; details on the house won’t “pop” and it won’t show up well against the sky.

The data-visualization angle? His handy chart shows how the ideal moment for Christmas-card photography comes when the fading daylight drops to the same brightness level as your lights.
(He also illustrates how the color temperature drops with the light level.)     When the Lights Go Down on the City While we’re on the topic of light, let’s consider how much can be gleaned from high-altitude pictures of the Earth after dark.  Images taken by NASA satellites show interesting correlations to human activities.   The NASA scientists who compiled the satellite images into this impressive display and shared it through the Visible Earth project note: The brightest areas of the Earth are the most urbanized, but not necessarily the most populated. (Compare Western Europe with China and India.) Cities tend to grow along coastlines and transportation networks. … The United States interstate highway system appears as a lattice connecting the brighter dots of city centers. In Russia, the Trans-Siberian railroad is a thin line stretching from Moscow through the center of Asia to Vladivostok. … Even more than 100 years after the invention of the electric light, some regions remain thinly populated and unlit. The interior jungles of Africa and South America are mostly dark, but lights are beginning to appear there. Deserts in Africa, Arabia, Australia, Mongolia, and the United States are poorly lit as well (except along the coast), along with the boreal forests of Canada and Russia, and the great mountains of the Himalaya. And Roser, doing his own analysis of the Visible Earth images, points out that the level of lighting often marks a sharp political and economic divide, such as between North and South Korea.  Prosperous South Korea glows after dark, especially around the capital, Seoul.  But its northern counterpart, kept poor by decades of Communist dictatorship, is nearly invisible after dark.   Meanwhile, we’re hoping to shed some light on a topic dear to our heart – analytics.  On Jan. 12, 2016, we’re hosting a Webinar featuring TDWI Research Director Fern Halper, who will talk about Operationalizing and Embedding Analytics for Action. 
As Halper points out, what good is having analytic capacity in your business processes if nobody uses it?  Analytics needs to be embedded into your systems so they can provide answers right where and when they’re needed.  Uses include support for logistics, asset management, customer call centers, and recommendation engines—to name just a few.  We hope you’ll dial in – we promise you’ll learn something! We share our favorite data-driven observations and visualizations every week here.  What topics would you like to read about?  Please leave suggestions and questions in the comment area below. Recent Data Driven Digests: December 4: 2015 winners of the Kantar Information Is Beautiful Awards November 27: Mapping music in color November 20: Popular diets, parole risk assessment, hot startup universities  
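For readers who want to check the holiday-lighting numbers above against their own utility rates, here is a minimal Python sketch of the arithmetic. The wattages, the run time, and the price per kilowatt-hour are illustrative assumptions, not figures from WAPA or the State of California, and the function names are our own. Only the four dollar amounts in the ratio check come from the text.

```python
def operating_cost(watts, hours, price_per_kwh):
    """Return the cost in dollars of running a `watts` load for `hours` hours."""
    kilowatt_hours = watts * hours / 1000.0
    return kilowatt_hours * price_per_kwh


def fold_difference(higher_cost, lower_cost):
    """Return how many times larger `higher_cost` is than `lower_cost`."""
    return higher_cost / lower_cost


if __name__ == "__main__":
    # Assumed values for illustration only; substitute your own.
    price_per_kwh = 0.12      # dollars per kWh (hypothetical rate)
    hours = 300               # hours of use (hypothetical)
    incandescent_watts = 125  # hypothetical string of incandescent C7 bulbs
    led_watts = 2.4           # hypothetical string of LED C7 bulbs

    print(f"Incandescent: ${operating_cost(incandescent_watts, hours, price_per_kwh):.2f}")
    print(f"LED:          ${operating_cost(led_watts, hours, price_per_kwh):.2f}")

    # Ratios computed from the dollar figures quoted in the text.
    print(f"C7 vs. miniature bulbs:  {fold_difference(4.31, 1.38):.1f}x")
    print(f"Incandescent vs. LED C7: {fold_difference(7.03, 0.23):.1f}x")
```

The ratio check confirms that the California figures work out to a bit more than a threefold difference, while the WAPA figures imply a gap of roughly thirtyfold between incandescent and LED C7 strings.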


Data Driven Digest for December 4: Data Is Beautiful


The end of the year is approaching, which means that for many of us, it’s time to take stock: “Biggest News Stories of 2015,” “10 Beloved Celebrities We Lost This Year,” and the like. In our line of work, analytics, it’s a good opportunity to survey some of the best data visualization examples of the past 12 months.

So this week we’re sharing with you some of the 2015 winners of the Kantar Information Is Beautiful Awards, an annual contest organized by independent data journalist David McCandless (@infobeautiful), author of “Knowledge Is Beautiful” and “Information is Beautiful.” Winners were selected by a public vote from among an impressive long list of candidates. Their topics ranged from weather to politics to the growth of an unborn baby during pregnancy. Find yourself a comfortable seat and settle in for browsing; you’ll find a lot of great things to look at. Enjoy!

Red vs Blue

If you live in the United States and feel, judging by the tone of political coverage, that politics has gotten more ruthlessly partisan in recent years, you’re not wrong. When political scientists crunched the numbers on the voting behavior of U.S. Representatives since the end of World War II, they found that across-the-aisle cooperation between members of different parties has dropped steadily in the last 40 years.

The reasons for this increasing polarization are complex: Congressional districts designed to favor one party or the other, an increasingly mobile population more likely to elect candidates by party rather than their stance on purely local issues, big money, politicians flying home on weekends to be with constituents rather than staying in Washington, D.C. to build relationships, and more.

The authors of the underlying paper documented their findings in a table of statistics. That’s fine, but what really sells this story is the visualization drawn up by Mauro Martino, an Italian designer who leads the Cognitive Visualization Lab for IBM’s Watson group. His network diagrams show how, year by year, the blue dots representing Democrats and the red ones for Republicans pull further apart. The images are hauntingly beautiful – they could be galaxies colliding or cells dividing. Yet they are telling a story of increasing division, bitterness, and gridlock.

Hello world / Bonjour, le monde! / Nǐ hǎo, shìjiè!

Silver medalist in the Information is Beautiful Awards’ data visualization category is “A World of Languages” by Alberto Lucas López (@aLucasLopez), graphics director at the South China Morning Post in Hong Kong. López’s diagram ingeniously carves up a circle representing the Earth into sectors for each of the major languages, and within them the countries where each language is spoken.

He supplements his eye-catching image with smaller charts showing language distributions by country (curiously, Papua New Guinea is far ahead of the rest, with 839 separate languages spoken) and the most popular languages to learn. (No surprise to anybody that English far outstrips the rest, with 1.5 billion people learning it – nor that Chinese and two former colonial languages still spoken in many countries, French and Spanish, are next. But it says something about cultural influence, from Goethe to manga, that German, Italian, and Japanese are the runners-up.)

Working for a Living

Job recruiters and economists are keenly aware of rises and falls in national unemployment rates and which industries or sectors are growing, but most businesses can derive value from this information too. The Wall Street Journal team of graphics editor Renee Lightner (@lightnerrenee) and data journalist Andrew Van Dam (@andrewvandam) created an interactive dashboard of unemployment and job gain/loss statistics in the U.S. that conveys an amazing amount of data from the past 10 years (and going back to 1950 in some views). This job data tracker only tied for a bronze medal in the category of Interactive Data Visualization – which says something about the level of competition in this field.

Because of the tracker’s thoughtful design, it can answer a wide range of questions. How about: What sectors of the economy have shown the most sustained growth? Health care and social services, followed by professional services and then restaurants, probably a sign that the economic recovery means people have more to spend on a dinner out. Most sustained losses? Manufacturing, government, and construction, even though construction also shows frequent peaks on the dot-matrix chart. (Construction jobs went up 0.83% in Feb. 2006, something we now recognize as a symptom of the housing bubble.) Meanwhile, “Unemployment rate, overall” vividly charts the recessions of the past 60 years in a colorful mosaic that could easily be featured in an upscale décor magazine.

This is just a small sampling of the ingenious ways some of our best thinkers and designers have come up with to analyze and display data patterns. We share our favorites every Friday here. If you have a favorite you’d like to share, please comment below.

Recent Data Driven Digests:
November 27: Mapping music in color
November 20: Popular diets, parole risk assessment, hot startup universities
November 13: Working parents, NBA team chemistry, scoring different colleges


Data Driven Digest for November 27: Data Mapping Music


One of the biggest motivations behind creating graphs, charts, dashboards, and other visualizations is that they make patterns in data easier to perceive. We humans are visually oriented creatures who can intuitively note patterns and rhythms, or spot a detail out of place, through imagery long before we can detect them in written reports or spreadsheets. Or sheet music, for that matter. For this week, we present some examples of how to display music visually, which may get you thinking of other creative ways to visualize data and bring patterns to the surface. Enjoy!

If you’ve had any experience reading music, you may be able to tell some things about a piece by looking at its written score. For example, you could probably tell that this piece (an excerpt from Arvo Pärt’s “Spiegel im Spiegel”) is of a gentler, less agitated nature than this one (the introduction to “Why Do the Nations So Furiously Rage,” from Handel’s “Messiah,” which you might be hearing this holiday season).

In fact, Handel and his contemporaries expected listeners to be reading along to the music in the printed score and appreciate the “word-painting” with which they illustrated the text or mood of the music. The practice of word-painting has become less common as fewer and fewer people in modern generations learn to read sheet music. But some composers have found other ways to illustrate their music.

The avant-garde composer Ivan Wyschnegradsky created “chromatic” music in both senses of the word. He used not only every note in the 12-note tuning system of classical Western music (where adjoining notes on a piano keyboard are a half-step apart, like A, B-flat, and B natural – what is called a chromatic scale), but notes “between the cracks.” These “ultra-chromatic” pieces required special keyboards that could play two or three different notes between the keys of a regular piano. It’s hard for people who don’t have perfect pitch to hear the difference between these so-called “quarter-tones,” but they lend a subtle eeriness to his music. (Here’s an example: “24 Preludes in Quarter-Tone System.”)

Then Wyschnegradsky turned to a familiar data-visualization technique: color. He started representing his music in rainbow-hued wheels, like this (picture via the Association Ivan Wyschnegradsky, Paris). Ever since childhood, he had been fascinated by rainbows. As an adult, he noted the parallels between the 12 colors of the chromatic spectrum (red, orange, yellow, green, blue, and purple, plus the intermediate hues of red-orange, orange-yellow, and so forth) and the 12 chromatic notes in classical music. And just as colors can shade into one another subtly (is this lipstick reddish-orange, or an orangey-red?), so can musical notes (like a slide whistle or trombone). The parallels were too good to pass up.

So Wyschnegradsky assigned a color on the spectrum to each of the dozens of quarter-tones in his musical system, then plotted his melodies in circles like a spiderweb or radar chart. As Slate Magazine blogger Greta Weber wrote: “Each cell on these drawings corresponds to a different semitone in his complex musical sequences. If you look closely enough, you can follow the spirals as if it were a melody and ‘listen’ to the scores they represent.”

Wyschnegradsky’s color-wheel scheme never caught on. But the patterns it brings to light have parallels in popular visualization systems, from traffic delays to weather. It’s clear that color helps illuminate.

Like what you see? Every Friday we share great data visualizations and embedded analytics. If you have a favorite or trending example, please comment below.

Recent Data Driven Digests:
November 20: Popular diets, parole risk assessment, hot startup universities
November 13: Working parents, NBA team chemistry, scoring different colleges
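To make the color-wheel idea described above a little more concrete, here is a small Python sketch that assigns a hue to each step of a quarter-tone scale. It is only an illustration under our own assumptions: we space the hues evenly around the wheel and use 24 steps per octave, and the function names are hypothetical. It does not reproduce Wyschnegradsky’s actual palette or notation.

```python
import colorsys

# Assumption: a quarter-tone system divides the octave into 24 equal steps.
QUARTER_TONES_PER_OCTAVE = 24


def pitch_class_to_rgb(step, steps_per_octave=QUARTER_TONES_PER_OCTAVE):
    """Map a scale step to an (R, G, B) tuple by spacing hues evenly on the wheel."""
    # Steps an octave apart wrap around to the same hue.
    hue = (step % steps_per_octave) / steps_per_octave
    red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(channel * 255) for channel in (red, green, blue))


def melody_to_colors(steps):
    """Convert a melody, given as a list of scale steps, into a list of colors."""
    return [pitch_class_to_rgb(step) for step in steps]


if __name__ == "__main__":
    # A made-up rising figure in quarter-tone steps, for illustration only.
    for step, color in zip([0, 1, 2, 3, 4], melody_to_colors([0, 1, 2, 3, 4])):
        print(step, color)
```

Because neighboring steps get neighboring hues, a melody that creeps upward by quarter-tones shows up as a gradual drift in color, which is the same effect the wheels rely on.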


Data Driven Digest for November 20: The Last Mile of Big Data


In a recent study by the Digital Clarity Group, Thinking Small: Bringing the Power of Big Data to the Masses, our very own Allen Bonde (@abonde), writing before he joined OpenText, noted that the best opinions are formed and actions are taken within the “Last Mile” of Big Data. By Last Mile, Allen means the most immediate information and data that is accessed or consumed. For application designers, meeting the Last Mile challenge requires understanding self-service use cases and leveraging tools that turn Big Data into “small data” that helps people perform specific tasks. Some of these use cases include insight, monitoring and targeting.

For this week, we provide some examples of visualizations that crunch their fair share of Big Data on the back end but present it in a way that meets the Last Mile challenge. Enjoy.

Popular Diets

With the holidays coming up, we thought we’d look at dieting trends over the last few years. Health and science reporter Julia Belluz (@juliaoftoronto) assembled a review using Google Analytics based on most searched diets by year and metropolitan area. Reaching back to 2005, the series of visualizations allows the viewer to see a slow yet steady spread of first the gluten-free and now the Paleolithic diets that command the news cycles and self-help bookshelves. Other diets covered include vegan eating, the low-carb diet, the South Beach Diet, and the Atkins Diet.

Parole Risk Assessment

While we leave the subject of incarceration up to the experts, this interactive visualization from our friends over at FiveThirtyEight (@FiveThirtyEight) caught our eye. A recent trend emerging in criminal sentencing is the notion of using predictive analytics and risk assessments to determine how likely a prisoner is to commit the same crime in the future. Scores are determined by factors such as gender, county, age, current offense, number of prior arrests, and whether multiple charges were filed.

The authors of the FiveThirtyEight study point out that more than 60 risk assessment tools are being used across the U.S. Although they vary widely, “in their simplest form, they are questionnaires – typically filled out by a jail staff member, probation officer or psychologist – that assign points to offenders based on anything from demographic factors to family background to criminal history. The resulting scores are based on statistical probabilities derived from previous offenders’ behavior. A low score designates an offender as ‘low risk’ and could result in lower bail, less prison time or less restrictive probation or parole terms; a high score can lead to tougher sentences or tighter monitoring.”

The simulation is loosely based on the Ohio Risk Assessment System’s Re-Entry Tool, which is intended to assess the probability of prisoners reoffending after they are released from prison. The visualization was produced in collaboration with The Marshall Project (@MarshallProj), a nonprofit news organization that covers the criminal justice system. Considering the U.S. Attorney General’s office has endorsed the idea of risk assessment, it’s likely that visualizations will be used in the future to manage criminal sentencing.

School for Startups

It’s not who you know, but where you go to college, that could determine the success of your startup, according to our last visualization. Our friends over at DataBucket built a series of visualizations using data from a Crunchbase API in order to compare the top 5,000 most funded startups over the past 15 years and the education of each of their founders. They found success has a pattern. Graduating from universities that are prestigious, on the West Coast, focused on engineering, and/or offer high-powered MBA programs increases your chances of finding smart co-founders and benefactors with deep pockets.

“In terms of average amount of funding graduates from each school gets, Harvard, MIT, and Stanford get a standard amount of funding. Indian Institute of Technology has a disproportionately high average funding as well as a large number of founders,” the DataBucket authors comment. “Hangzhou Normal University and Zhejiang University of Technology are off the charts for average funding received. This is attributed completely to Jack Ma and Eddie Wu, [the] founders of Alibaba.”

Like what you see? Every Friday we share great data visualizations and embedded analytics. If you have a favorite or trending example, please comment below.

Recent Data Driven Digests:
November 13: Working parents, NBA team chemistry, scoring different colleges
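The DataBucket observation about schools that are “off the charts” because of one or two founders is a classic outlier effect, and a few lines of Python show why. The sketch below uses made-up school names and funding amounts (in millions of dollars), not Crunchbase data, and the `summarize_by_school` helper is our own. It is meant only to illustrate how an average can be dominated by a single record.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_school(records):
    """Group (school, funding) pairs and report count, mean, and median per school."""
    funding_by_school = defaultdict(list)
    for school, funding in records:
        funding_by_school[school].append(funding)
    return {
        school: {
            "founders": len(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for school, amounts in funding_by_school.items()
    }


if __name__ == "__main__":
    # Hypothetical data for illustration only (funding in $ millions).
    founders = [
        ("School A", 10), ("School A", 20), ("School A", 30), ("School A", 40),
        ("School B", 2), ("School B", 4), ("School B", 3000),  # one huge outlier
    ]
    for school, stats in summarize_by_school(founders).items():
        print(school, stats)
```

In this toy data, School B’s mean dwarfs School A’s even though its typical founder raised far less. Showing the median and the founder count next to the average is one simple way to keep a single blockbuster company from telling the whole story.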
