The ‘It’ Role in IT: Chief Data Officer


A recent analyst report predicts that the majority of large organizations (90%) will have a Chief Data Officer by 2019. This trend is driven by the competitive need to improve efficiency through better use of information assets. To discuss this evolving role and the challenges facing the CDO, Enterprise Management 360 Assistant Editor Sylvia Entwistle spoke to Allen Bonde, a long-time industry watcher with a focus on big data and digital disruption.

Enterprise Management 360: Why are more and more businesses investing in the role of Chief Data Officer now?

Bonde: No doubt the Chief Data Officer is kind of the new ‘It’ role in the executive suite; there’s a lot of buzz around it. Interestingly enough, it spans operations, technology, even marketing, so we see this role in different areas of organizations. I think companies are looking for one leader to be a data steward for the organization, but they’re also looking for that same role to be a bit more focused on the future – to be the driver of innovation powered by data. So you could say this role is growing because it sits at the crossroads of digital disruption and technology, as well as the corporate culture and leadership needed to manage in this new environment. It’s a role that I think could become the new CIO in certain really data-centric businesses, or the new Chief Digital Officer in some cases. If you dig a little into the responsibilities, it’s really about overseeing data from legacy systems and all of your third-party partners, as well as managing data compliance and financial reporting. So it’s a role that has both an operational component and a visionary, strategic component. This is where it departs from a purely technical role: there’s almost a left-brain, right-brain aspect to the CDO. The role is empowered by technology, but ultimately it’s about people using technology in the right way, in a productive way, to move the business forward.
Enterprise Management 360: What trends do you think will have the biggest impact on the Chief Data Officer in 2016?

Bonde: In terms of the drivers, it’s certainly about the growth of digital devices and data, especially data coming from new sources. There’s a lot of focus on device data with IoT, or omni-channel data coming from the different touch points in the customer journey. These new sources of data, even if we don’t own them per se, are impacting our business, so I think that’s a big driver. Then there’s the notion of how new devices are shifting consumption models. If you think about just a simple smartphone, the device is both creating and consuming data, changing the dynamic of how individuals create, consume, and interact with data. There’s a whole backdrop to this from a regulatory perspective, with privacy and other risk factors. Those drivers are motivating companies to say, “we need an individual or an office to take the lead in balancing the opportunity of data with the risk of data – making sure that, number one, we don’t get in trouble as a business, and number two, we take advantage of the opportunity in front of us.”

Enterprise Management 360: Homing in on the data side of things – what are the most important levers for ensuring a return on data management, and how do you think those returns should be measured?

Bonde: A common thread, wherever the CDO sits in the organization, is a focus on outcomes – and yes, it’s about technology; yes, it’s about consumer adoption. In terms of the so-called CDO agenda, I think the outcomes need to be pretty crisp; for example, we need to lower our risk profile, or we need to improve our margins for a certain line of business, or we’re all about bringing a product to market. So focusing on outcomes, and getting alignment on those outcomes, is the first and most important lever that you have as a CDO.
The second lever, I would argue, is adoption: you can’t do anything at scale with data if you don’t have widespread adoption. The key to unlocking the value of data as an asset is driving the broadest adoption, so that most people in an organization – including your customers and your partners – get value out of the insights you’re collecting. Ultimately this is about delivering insight into the everyday work the organization is doing, which is very different from classic business intelligence or even big data, which used to be the domain of a relatively small number of people within the organization who parceled out the findings as they saw fit. I think the CDO is breaking down those walls, but this also means the CDO faces a much bigger challenge than simply getting BI tools into the hands of a few business analysts.

Enterprise Management 360: A lot of people see big data as a problem. Where would you draw the line between the hype and the fact of big data?

Bonde: It’s a funny topic to me, having started earlier in my career as a data scientist working with what were then very large data sets to see if we could manage risk. This was at a telco, and we didn’t call it big data then, but we had the challenge of pulling meaning out of lots of data from different sources, so that we could make smarter decisions about who might be a fraudulent customer or which markets we could go after. When you talk to CDOs about big data, I think the role has benefited from the hype around big data, because big data set the table for the need for a CDO. Yet CDOs aren’t necessarily focused just on big data; in fact, one CDO we talked to put it this way: “we have more data than we know what to do with.” They acknowledged that they already have lots of data, but their main struggle was understanding it.
It wasn’t necessarily a big data problem; it was a problem of finding the right data. We see this day-to-day when we work with different clients: you can do an awful lot in terms of understanding by blending different data sources – not necessarily big data sources, just different types of data – and you can create insight from relatively small data sets if it’s packaged and delivered in the right way. IoT is a perfect example. People are getting excited about the challenges of managing data in the era of IoT, but it’s not really a big data problem; it’s a multi-source data problem. So I think the hype of big data has been useful for getting the market aligned with the importance of data, and certainly its value when it’s collected, blended, cleaned, and turned into actual insights. But in a way the big data problem has shifted to more of a question of “how do we get results quickly?” Fast is the new ‘big’ when it comes to data. Getting results quickly can matter more than the perfection of the result: good, useful insights delivered fast beat the perfect model or the perfect answer. We hear this from CDOs – they want to work in small steps, they want to fail fast. They want to run experiments that show the value of applying data to a specific business problem or objective. So big data may have created their role, but on a day-to-day basis it’s more about fast data, or small data – blending lots of different types of data and putting it into action quickly. This is where I think the cloud, and cloud services like Analytics-as-a-Service, is as important to the CDO as big data. Such a service may deal with big data, but it almost doesn’t matter what the size of the data is; what matters is how quickly you can apply it to a specific business problem.
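Bonde’s point about blending small data sources can be sketched in a few lines. The following is a toy illustration, not from any real client engagement – the customer data, column names, and “insight” are all invented – but it shows how joining two modest data sets on a shared key can surface a pattern worth investigating:

```python
# Hypothetical sketch: blending two small data sources on a shared key.
import pandas as pd

support_tickets = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "tickets": [1, 4, 2, 0],
})
renewals = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "renewed": [True, False, True],
})

# Aggregate one source, then blend it with the other.
ticket_counts = support_tickets.groupby("customer_id", as_index=False)["tickets"].sum()
blended = ticket_counts.merge(renewals, on="customer_id")

# Even a tiny blended set can suggest a hypothesis to test at scale:
churn_avg = blended.loc[~blended["renewed"], "tickets"].mean()
keep_avg = blended.loc[blended["renewed"], "tickets"].mean()
print(churn_avg > keep_avg)  # did non-renewing customers file more tickets?
```

Neither source is “big,” but the blend answers a question neither could answer alone – which is the multi-source point Bonde is making.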
The CDO could be the keeper of that whole spectrum of how you’re collecting, securing, and managing your data, but ultimately they’ll be judged by how successful they are at turning that data into practical insights for the everyday worker. That’s where the ROI and return on data management lie, across the whole spectrum – but if people can’t put the insights to use in a practical fashion, and easily, it almost doesn’t matter what you’ve done at the back end. Read more about the Chief Data Officer’s role in a digital transformation blog by OpenText CEO, Mark Barrenechea.


Unstructured Data Analysis – the Hidden Need in Health Care

The ‘Hood

I recently had the opportunity to attend the HIMSS 2016 conference in Las Vegas, one of the largest annual conferences in the field of health care technology. As I walked through the main level of the Exhibit Hall, I was amazed at the size of some vendor booths and displays. Some were the size of a house. With walls, couches, and fireplaces, they really seemed like you could move in! I was working at the OpenText booth on the Exhibit Hall level below the main one, which was actually a converted parking garage floor. Some exhibitors called this lower level “The ‘Hood”. I loved it, though; people were friendly, the displays were great, and there were fresh-baked cookies every afternoon. I don’t know how many of the nearly 42,000 conference attendees I talked to, but I had great conversations with all of them. It’s an awesome conference for meeting a diverse mix of health care professionals and learning more about the challenges they face.

The Trick

Half of the people I talked to asked me about solutions for analyzing unstructured data. When I asked them what kind of unstructured data they were looking to analyze, 70% of them said claim forms and medical coding. This actually surprised me. As a software developer with a data analysis background, I admit to not being totally up on health care needs. Claim forms and medical coding have always seemed very structured to me. Certain form fields get filled in on the claim form, and rigid medical codes get assigned to particular diagnoses and treatments. Seems straightforward, no? What I learned from my discussions was that claims data requires a series of value judgments to improve the data quality. I also learned that while medical coding is the transformation of health care diagnoses, procedures, medical services, and equipment into universal medical codes, this information is taken from transcriptions of physicians’ notes and lab results.
This unstructured information is an area where data analysis can help immensely. The trick now is: how do we derive value from this kind of data?

The Reveal

OpenText has an unstructured data analysis methodology that accesses and harvests data from unstructured sources. Its InfoFusion and Analytics products deliver a powerful knockout combination. The OpenText InfoFusion product trawls documents and extracts entities like providers, diagnoses, treatments, and topics. It can then apply various analyses, such as determining sentiment. The OpenText Analytics product line can then provide advanced exploration and analysis, dashboards, and reports on the extracted data and sentiment. It then provides secure access throughout the organization through deployment on the OpenText Information Hub (iHub). Users will enjoy interactive analytic visualizations that allow them to gain unique insights from unstructured data sources.

The Leave-Behind

If you’re interested in learning more about our solution for unstructured data analytics, you can see it in action. While this is not a health care solution, it demonstrates the power of unstructured data analysis, allowing users to visually monitor, compare, and discover interesting facts. If you’re interested in helping me develop an example using claim forms or medical coding data, please contact me – I definitely want to demonstrate this powerful story next year at the HIMSS conference. See you next year in the ‘Hood!
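The extract-then-analyze pattern described above (pull entities out of free text, then score sentiment) can be illustrated with a deliberately tiny sketch. This is not the InfoFusion API – the entity lists, sentiment words, and sample note below are all invented stand-ins:

```python
# Toy illustration of entity extraction plus sentiment scoring over
# unstructured clinical text. All names and word lists are hypothetical.
PROVIDERS = {"Dr. Lee", "Dr. Patel"}
DIAGNOSES = {"hypertension", "diabetes"}
NEGATIVE_WORDS = {"worsening", "uncontrolled"}

def analyze(note):
    """Extract known entities from a note and assign a crude sentiment."""
    text = note.lower()
    return {
        "providers": {p for p in PROVIDERS if p in note},
        "diagnoses": {d for d in DIAGNOSES if d in text},
        "sentiment": "negative" if any(w in text for w in NEGATIVE_WORDS) else "neutral",
    }

note = "Dr. Lee notes worsening hypertension; follow-up in two weeks."
result = analyze(note)
print(result["sentiment"])
```

A production system would use trained language models rather than keyword lists, but the pipeline shape – harvest, extract, score, then report – is the same.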


Security Developments: It’s All About Analytics

Analytics is everywhere. It is on the news, in sports, and of course, part of the 2016 US elections. I recently attended the RSA Conference 2016, an important trade show for security software solutions, because I wanted to see how security vendors were using analytics to improve their offerings. Roaming through hall after hall of exhibits, I saw some interesting trends worth sharing. One of the first things I noticed was how many times analytics was mentioned in the signage of different vendors. I also noticed a wide range of dashboards showing all different types of security data. (With this many dashboards you’d think you were at a BI conference!) You see, security is no longer just about providing anti-virus and anti-malware protection in a reactive mode. Security vendors are utilizing cybersecurity and biometric data to try to understand attacks and mount defenses in real time while an attack is happening. To do this, they need to analyze large amounts of data. This made me realize what some analysts are predicting: it isn’t the data that has the value, it’s the proprietary algorithms.

Smarter Analytics = Stronger Security

This is definitely true in the security space. Many vendors provide the same types of service; one of the ways they can differentiate themselves is the algorithms they use to analyze the data. They have to gather a large amount of data to establish baselines of network traffic. Then they use algorithms to analyze data in real time to understand if something out of the norm is happening. They hope to spot an attack at a very early stage, so they can take action to stop it and limit the damage before it can shut down a customer’s network or website. This is why algorithms are important: two different products may be looking at the same data, but one detects an attack before the other. This, to me, has big data analytics written all over it. Security vendors are also paying attention to analytics from the IoT (Internet of Things).
A typical corporate data security application gathers a lot of data from different devices – network routers and switches, servers, or workstations, just to name a few. The security app will look at traffic patterns and do deep packet inspection of what is in the packets. An example would be understanding what type of request is coming to a specific server: What port is it asking for, and where did the request originate? This could help you understand if someone is starting a DoS (Denial of Service) attack or probing for a back door into your network or server. What can we learn from the trends on display at RSA this year? I think they show how analytics can help any business, in any industry. Dashboards are still very popular and efficient at displaying data to users, allowing them to understand what is happening and then make business decisions based on that data. And not all advanced analytic tools are equal, because it is not about the data but whether the algorithms can help you use that data to understand what is happening and make better business decisions. OpenText Analytics provides a great platform for businesses to create analytic applications and use data to make better decisions faster. To get an idea of what OpenText Analytics can do, take a look at our Election Tracker ’16 app.
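The baseline-then-detect approach described above can be sketched in a few lines. This is an illustrative toy, not any vendor’s proprietary algorithm – the traffic counts and the three-sigma threshold are invented for the example:

```python
# Minimal baseline-and-anomaly sketch over per-minute request counts.
# Numbers and threshold are illustrative, not a real product's tuning.
import statistics

baseline = [120, 118, 125, 130, 122, 119, 127, 124]  # "normal" traffic samples
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(requests_per_min, threshold=3.0):
    """Flag traffic more than `threshold` standard deviations from baseline."""
    return abs(requests_per_min - mean) / stdev > threshold

print(is_anomalous(123))  # ordinary fluctuation
print(is_anomalous(900))  # sudden spike, e.g. a possible DoS ramp-up
```

Real products replace this simple z-score with far more sophisticated models, but the shape is the same: learn what normal looks like, then score live data against it – and the quality of that scoring algorithm is exactly where vendors differentiate.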


Wind and Weather – Data Driven Digest

It’s the beginning of March, traditionally a month of unsettled early-spring weather that can seesaw back and forth between snow and near-tropical warmth, with fog, rain, and windstorms along the way. Suitably for the time of year, the data visualizations we’re sharing with you this week focus on wind and weather. Enjoy!

You Don’t Need a Weatherman…

Everyone’s familiar with the satellite imagery on the weather segment of your nightly TV news. It’s soothing to watch the wind flows cycle and clouds form and dissipate. Now an app called Windyty lets you navigate real-time and predictive views of the weather yourself, controlling the area, altitude, and variables such as temperature, air pressure, humidity, clouds, or precipitation. The effect is downright hypnotic, as well as educational – for example, you can see how much faster the winds blow at higher altitudes, or watch fronts pick up moisture over oceans and lakes, then dump it as they hit mountains. Windyty’s creator, Czech programmer Ivo Pavlik, is an avid powder skier, pilot, and kite surfer who wanted a better idea of whether the wind would be right on days he planned to pursue his passions. He leveraged the open-source Earth global visualization project created by Cameron Beccario (which in turn draws weather data from NOAA, the National Weather Service, and other agencies, and geographic data from the free, open-source Natural Earth mapping initiative). It’s an elegant example of a visualization that focuses on the criteria users want as they query a very large data source. Earth’s weather patterns are so large, they require supercomputers to store and process. Pavlik notes that his goal is to keep Windyty a lightweight, fast-loading app that adds new features only gradually, rather than loading it down with too many options.

…To Know Which Way the Wind Blows

Another wind visualization, Project Ukko, is a good example of how to display many different variables without overwhelming viewers.
Named after the ancient Finnish god of thunder, weather, and the harvest, Project Ukko models and predicts seasonal wind flows around the world. It’s a project of Euporias, a European Union effort to create more economically productive weather prediction tools, and is intended to fill a gap between short-term weather forecasts and the long-term climate outlook. Ukko’s purpose is to show where the wind blows most strongly and reliably at different times of the year, so that wind energy companies can site their wind farms and make investments more confidently. The overall goal is to make wind energy a more practical and cost-effective part of a country’s energy generation mix, reducing dependence on polluting fossil fuels and improving climate change resilience, according to Ukko’s website. The project’s designer, German data visualization expert Moritz Stefaner, faced the challenge of displaying projections of the wind’s speed, direction, and variability, overlaid with the locations and sizes of wind farms around the world (to see if they’re sited in the best wind-harvesting areas). In addition, he needed to communicate how confident those predictions were for a given area. As Stefaner explains in an admirably detailed behind-the-scenes tour, he ended up using line elements whose thickness shows the predicted wind speed and whose brightness shows prediction accuracy, compared against decades of historical records. The difference between current and predicted speed is shown through line tilt and color. Note that the lines don’t show the actual direction the winds are heading, unlike the flows in Windyty. The combined brightness, color, and size draw the eye to the areas of greatest change. At any point, you can drill down to the actual weather observations for that location and the predictions generated by Euporias’ models.
For those of us who aren’t climate scientists or wind farm owners, the take-away from Project Ukko is how Stefaner and his team went through a series of design prototypes and data interrogations as they transformed abstract data into an informative and aesthetically pleasing visualization.

Innovation Tour 2016

Meanwhile, we’re offering some impressive data visualization and analysis capabilities in the next release of our software, OpenText Suite 16 and Cloud 16, coming this spring. If you’re interested in hearing about OpenText’s ability to visualize data and enable the digital world, and you’ll be in Europe this month, we invite you to look into our Innovation Tour, in Munich, Paris, and London this week and Eindhoven in April. You can:

- Hear from Mark J. Barrenechea, OpenText CEO and CTO, about the OpenText vision and the future of information management
- Hear from additional OpenText executives on our products, services, and customer success stories
- Experience the newest OpenText releases with the experts behind them – including how OpenText Suite 16 and Cloud 16 help organizations take advantage of digital disruption to create a better way to work in the digital world
- Participate in solution-specific breakouts and demonstrations that speak directly to your needs
- Learn actionable, real-world strategies and best practices employed by OpenText customers to transform their organizations
- Connect, network, and build your brand with public and private industry leaders

For more information on the Innovation Tour or to sign up, click here.

Recent Data Driven Digests: February 29: Red Carpet Edition | February 15: Love is All Around | February 10: Visualizing Unstructured Content


Red Carpet Edition—Data Driven Digest


The 88th Academy Awards will be given out Sunday, Feb. 28. There’s no lack of sites to analyze the Oscar-nominated movies and predict winners. For our part, we’re focusing on the best and most thought-provoking visualizations of the Oscars and film in general. As you prepare for the red carpet to roll out, searchlights to shine in the skies, and celebrities to pose for the camera, check out these original visualizations. Enjoy!

Big Movies, Big Hits

Data scientist Seth Kadish of Portland, Ore., trained his graphing powers on the biggest hits of the Oscars – the 85 movies (so far) that were nominated for 10 or more awards. He presented his findings in a handsome variation on the bubble chart, plotting numbers of nominations against Oscars won, and how many films fall into each category. (Spoiler alert: However many awards you’re nominated for, you can generally expect to win about half.) As you can see from the chart, “Titanic” is unchallenged as the biggest Academy Award winner to date, with 14 nominations and 11 Oscars won. You can also see that “The Lord of the Rings: Return of the King” had the largest sweep in Oscars history, winning in all 11 of the categories in which it was nominated. “Ben-Hur” and “West Side Story” had nearly as high a win rate, 11 out of 12 awards and 10 out of 11, respectively. On the downside, “True Grit,” “American Hustle,” and “Gangs of New York” were the biggest losers – all of them got 10 nominations but didn’t win anything.

Visualizing Indie Film ROI

Seed & Spark, a platform for crowdfunding independent films, teamed up with the information design agency Accurat to create a series of gorgeous 3-D visualizations in the article “Selling at Sundance,” which looked at the return on investment 40 recent indie movies saw at the box office.
(The movies in question, pitched from 2011 to 2013, included “Austenland,” “Robot and Frank,” and “The Spectacular Now.”) The correlations themselves are thought-provoking – especially when you realize how few movies sell for more than they cost to make. But even more valuable, in our opinion, are the behind-the-scenes explanations the Accurat team supplied on Behance of how they built these visualizations – “(giving) a shape to otherwise boring numbers.” The Accurat designers (Giorgia Lupi, Simone Quadri, Gabriele Rossi, and Michele Graffieti) wanted to display the correlation between three values: production budget, sale price, and box office gross. After some experimentation, they decided to represent each movie as a cone-shaped, solid stack of circles, with shading running from production budget at the top to sale price at the bottom; the stack’s height represents the box office take. They dressed up their chart with sprinklings of other interesting data, such as the length, setting (historical, modern-day, or sci-fi/fantasy), and number of awards each movie won. This demonstrated that awards didn’t do much to drive box office receipts; even winning an Oscar doesn’t guarantee a profit, Accurat notes. Because the “elevator pitch” – describing the movie’s concept in just a few words, e.g. “It’s like ‘Casablanca’ in a dystopian Martian colony” – is so important, they also created a tag cloud of the 25 most common keywords used to describe the movies they analyzed. The visualization was published in hard copy in the pilot issue of Bright Ideas Magazine, which was launched at the 2014 Sundance Film Festival.

Movie Color Spectrums

One of our favorite Oscars is Production Design. It honors the amazing work of creating rich, immersive environments that help carry you away to a hobbit-hole, Regency ballroom, 1950s department store, or post-apocalyptic wasteland. And color palettes are a key part of the creative effect.
Dillon Baker, an undergraduate design student at the University of Washington, has come up with an innovative way to see all the colors of a movie. He created a Java-based program that analyzes each frame of a movie for its average color, then compresses that color into a single vertical line. These lines are compiled into a timeline that shows the entire work’s range of colors. The effect is mesmerizing. Displayed as a spectrum, the color keys pop out at you – vivid reds, blues, and oranges for “Aladdin,” greenish ‘70s earth tones for “Moonrise Kingdom,” and Art Deco shades of pink and brown for “The Grand Budapest Hotel.” You can also see scene and tone changes – for example, below you see the dark, earthy hues of Anna and Kristoff’s journey through the wilderness in “Frozen,” contrasted with Elsa’s icy pastels. Baker, who is a year away from his bachelor’s degree, is still coming up with possible applications for his color visualization technology. (Agricultural field surveying? Peak wildflower prediction? Fashion trend tracking?) Meanwhile, another designer is using a combination of automated color analysis tools and her own aesthetics to extract whole color palettes from a single movie or TV still. Graphic designer Roxy Radulescu comes up with swatches of light, medium, dark, and overall palettes, focusing on a different work each week in her blog Movies in Color. In an interview, she talks about how color reveals mood, character, and historical era, and guides the viewer’s eye. Which is not far from the principles of good information design!

Recent Data Driven Digests: February 15: Love is All Around | February 10: Visualizing Unstructured Content | January 29: Are You Ready for Some Football?
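The core of Baker’s technique – average every pixel of a frame down to one color, then stack those colors into a timeline – is simple enough to sketch. His tool is Java-based; this Python sketch uses tiny invented “frames” (nested lists of RGB tuples) in place of decoded video:

```python
# Sketch of per-frame average color, as in Baker's movie color timelines.
# The 2x2 "frames" below are invented stand-ins for decoded video frames.
def average_color(frame):
    """Mean R, G, B across every pixel in one frame."""
    pixels = [px for row in frame for px in row]
    n = len(pixels)
    return tuple(sum(channel) / n for channel in zip(*pixels))

frames = [
    [[(200, 40, 30), (180, 60, 50)], [(220, 20, 10), (200, 40, 30)]],  # reddish
    [[(20, 40, 200), (40, 60, 180)], [(10, 20, 220), (30, 40, 200)]],  # bluish
]

# One averaged color per frame; each becomes one vertical line in a timeline.
timeline = [average_color(f) for f in frames]
print(timeline[0])  # red channel dominates the first frame
```

A real implementation would decode frames with a video library and render the timeline as an image, but the averaging step is exactly this reduction.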


HIMSS16: OpenText Prescribes Healthier Patient and Business Outcomes


This year’s Health Information and Management Systems Society (HIMSS) Conference is right around the corner. As a HIMSS North American Emerald Corporate Member, OpenText is proudly participating in the event, taking place February 29 – March 4 in fabulous Las Vegas. This year’s event is shaping up to be a great one. Not only will OpenText continue to showcase the #1 fax server in the healthcare industry, OpenText RightFax, but joining the conversation is OpenText Analytics, a powerful set of analytics tools to help make better business decisions and drive better business outcomes. Join us at HIMSS and talk to our industry experts to learn how OpenText is driving greater productivity, efficiency, and security in the healthcare industry. With so many reasons to talk to the healthcare experts at OpenText, we’ve narrowed it down to our favorites.

Top 3 Reasons to Visit OpenText at HIMSS

1. Save money. Make the shift to hybrid faxing with RightFax and RightFax Connect: Hybrid faxing combines the power of your on-premises RightFax system with the simplicity of RightFax Connect for cloud-based fax transmission. Learn how to save money and make your RightFax environment even easier to manage.

2. Drive compliance. Better patient information exchange with RightFax Healthcare Direct: RightFax is the only fax server that combines fax and Direct messaging in a single solution with RightFax Healthcare Direct. Learn how you can accelerate your road to interoperability with Direct through this innovative new solution.

3. Make better decisions. Learn about OpenText Analytics products and revolutionize your reporting and analytics infrastructure. This will give you the tools to build the best data-driven enterprise apps. With live data analytics seamlessly incorporated into your apps, you can track, report, and analyze your data in real time.
Be sure to visit OpenText at Booth #12113 Hall G, and learn how OpenText will help you save money, increase the security and compliance of patient information exchange, and make better decisions through data analytics.


Financial Synergy Outlines Integration of OpenText Big Data Analytics at OpenText Innovation Tour 2016


Speaking at today’s OpenText Innovation Tour 2016 event in Sydney, Financial Synergy – Australia’s leading superannuation and investment software provider and innovator – outlined how it is embedding OpenText Big Data Analytics functionality into its Acuity platform. Using OpenText Big Data Analytics, Financial Synergy can provide a predictive and customer-centric environment for wealth management organisations and superannuation funds that rely on the Acuity platform. Stephen Mackley, CEO at Financial Synergy, said: “Embedding OpenText Big Data Analytics into Acuity will allow superannuation funds of all sizes and types to affordably discover the hidden value of their data. They will have far greater capacity to retain existing customers and expand potential market share, as they will know what has happened, what’s happening, and what will happen.” Shankar Sivaprakasam, Vice President, OpenText Analytics (APAC and Japan), said: “One of the greatest challenges for Australia’s wealth management industry is its ability to engage with members, particularly in the early stages of account development. Advanced Big Data analytics is key to understanding customers, their needs, and their behaviours. It will provide the ability to interrogate all structured and unstructured data and monitor how to best meet a customer’s goals.” Mackley continued, “We are offering a powerfully flexible tool with which our clients can use strategic, predictive knowledge to create new, efficient business models. It will also enable deeper segmentation, down to ‘market of one’ levels of customer service.” Financial Synergy is a leading innovator and provider of digital solutions to the superannuation and investment services industry. The combination of its unique platform capabilities and the expertise of its in-house software and administration specialists allows Financial Synergy to transform the member experience and boost business performance.
Article written by Shankar Sivaprakasam, Vice President, OpenText Analytics (APAC and Japan)


Love is All Around – Data Driven Digest

The joy of Valentine’s Day has put romance in the air. Even though love is notoriously hard to quantify and chart, we’ve seen some intriguing visualizations related to that mysterious feeling. If you have a significant other, draw him or her near, put on some nice music, and enjoy these links.

Got a Thing for You

We’ve talked before about the Internet of Things and the “quantified self” movement made possible by ever smaller, cheaper, and more reliable sensors. One young engineer, Anthony O’Malley, took that a step further by tricking his girlfriend into wearing a heart rate monitor while he proposed to her. The story, as he told it on Reddit, is that he and his girlfriend were hiking in Brazil and he suggested that they compare their heart rates on steep routes. As shown in the graph he made later, a brisk hike on a warm, steamy day explains his girlfriend’s relatively high baseline pulse of around 100 beats per minute (bpm) as he sat her down and read her a poem about their romantic history. What kicked it into overdrive was when he got down on one knee and proposed; her pulse spiked at about 145 bpm, then leveled off a little to the 125-135 bpm range as they slow-danced by a waterfall. Finally, once the music ended, the happy couple caught their breath and the heart rate of the now bride-to-be returned to normal. What makes this chart great is the careful documentation. Pulse is displayed not just at 5-second intervals but as a moving average over 30 seconds (smoothing out some of the randomness), plotted against the mean heart rate of 107 bpm. O’Malley thoughtfully added explanatory labels for changes in the data, such as “She says SIM!” (yes in Portuguese) and “Song ends.” Now we’re wondering whether this will inspire similar tracker-generated reports, such as giving all the ushers in a wedding party FitBits instead of boutonnieres, or using micro-expressions to check whether you actually liked those shower gifts.
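The 30-second moving average that smooths O’Malley’s chart is a standard trick worth a quick sketch. A window of six matches his 30 seconds of 5-second samples; the pulse values below are invented, not his actual data:

```python
# Moving-average smoothing, as used in the proposal heart-rate chart.
# Sample values are invented; window=6 covers 30s of 5-second readings.
def moving_average(samples, window):
    """Average each sample with up to window-1 preceding samples."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

pulse = [100, 102, 98, 145, 140, 130, 125, 104]  # bpm every 5 seconds
smoothed = moving_average(pulse, window=6)
print(smoothed)  # the proposal spike survives, but jitter is damped
```

Averaging over a window keeps real shifts (the spike at the proposal) visible while damping sample-to-sample noise, which is exactly why the chart reads so cleanly.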
Two Households, Both Alike in Dignity

One of the most famous love stories in literature, “Romeo and Juliet” is at heart a story of teenage lovers thwarted by their families’ rivalry. Swiss scholar and designer Martin Grandjean illuminated this aspect of the play by drawing a series of innovative network diagrams of all of Shakespeare’s tragedies.

Each circle represents a character (the larger, the more important), while lines connect characters who are in the same scene together. The “network density” statistic indicates how widely distributed the interactions are; 100% means that each character shares the stage at least once with everybody else in the play. The lowest network density (17%) belongs to Antony and Cleopatra, which features geographically far-flung groups of characters who mostly talk amongst themselves (Cleopatra’s courtiers, Antony’s friends, his ex-wives and competitors back in Rome). By contrast, Othello has the highest network density at 55%; its diagram shows a tight-knit group of colleagues, rivals, and would-be lovers on the Venetian military base at Cyprus trading gossip and threats at practically high-school levels. The diagram of Romeo and Juliet distinctly shows the separate families, Montagues and Capulets. Grandjean’s method also reveals how groups shape the drama, as he writes: “Trojans and Greeks in Troilus and Cressida, … the Volscians and the Romans in Coriolanus, or the conspirators in Julius Caesar.”

Alright, We’ve Got a Winner

Whether your Valentine’s Day turns out to be happy or disappointing, there’s surely a pop song to sum up your mood. The Grammy Awards are a showcase for the best — or at least the most popular — songs of the past year in the United States. The online lyrics library Musixmatch, based in Bologna, Italy, leveraged its terabytes of data and custom algorithms to make a prediction based on all 295 past Song of the Year nominees (going back to 1959).
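Stepping back to Grandjean’s diagrams for a moment: the “network density” statistic he reports can be computed directly from scene-level character lists. A toy sketch (the scenes and names below are invented for illustration, not taken from the play):

```python
from itertools import combinations

def network_density(scenes):
    """scenes: list of sets of characters who share a scene.
    Density = distinct co-appearing pairs / all possible pairs."""
    characters = set().union(*scenes)
    pairs = set()
    for scene in scenes:
        pairs.update(frozenset(p) for p in combinations(scene, 2))
    possible = len(characters) * (len(characters) - 1) / 2
    return len(pairs) / possible

toy_play = [{"Romeo", "Juliet"}, {"Romeo", "Mercutio"}]
density = network_density(toy_play)  # Juliet and Mercutio never meet
```

With three characters there are three possible pairs, but only two ever share a scene, so the toy density is about 67% — well above the 17% of the sprawling Antony and Cleopatra.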
As Musixmatch data scientist Varun Jewalikar and designer Federica Fragapane wrote, they built a predictive model based on a random forest classifier, which ended up ranking all five of this year’s nominees from most to least likely to win. Before announcing the predicted winner, Fragapane and Jewalikar made a few observations: Song of the Year winners have been getting wordier, though not necessarily longer (most likely due to the increasing popularity of rap and hip-hop, genres where lyrics are more prominent). They’ve also been getting louder. And in their model, lyric features proved twice as important as audio features. They note that a sample set of fewer than 300 songs “is not enough data to build an accurate model and also there are many factors (social impact, popularity, etc.) which haven’t been modeled here. Thus, these predictions should be taken with a very big pinch of salt.” With that said, their prediction… was a bit off, but still a great example of visualized data.

Recent Data Driven Digests: February 10: Visualizing Unstructured Content January 29: Are You Ready for Some Football? January 19: Crowd-Sourcing the Internet of Things

Read More

Visualizing Unstructured Analysis — Data Driven Digest

As the 2016 Presidential campaigns finish New Hampshire and move on towards “Super Tuesday” on March 1, the candidates and talking heads are still trading accusations about media bias. Which got us thinking about text analysis and ways to visualize unstructured content. (Not that we’re bragging, but TechCrunch thinks we have an interesting way to measure the tenor of coverage on the candidates…) So this week in the Data Driven Digest, we’re serving up some ingenious visualizations of unstructured data. Enjoy!

Unstructured Data Visualization in Action

We’ve been busy with our own visualization of unstructured data — namely, all the media coverage of the 2016 Presidential race. Just in time for the first-in-the-nation Iowa caucuses, OpenText released Election Tracker ‘16, an online tool that lets you monitor, compare, and analyze news coverage of all the candidates. Drawing on OpenText Release 16 (Content Suite and Analytics Suite), Election Tracker ‘16 automatically scans and reads hundreds of major online media publications around the world. This data is analyzed daily to determine sentiment and extract additional information, such as people, places, and topics. It is then translated into visual summaries and embedded into the election app, where it can be accessed using interactive dashboards and reports. This kind of content analysis can reveal much more than traditional polling data: holistic insights into candidates’ approaches and whether their campaign messages are attracting coverage. And although digesting the daily coverage has long been a part of any politician’s day, OpenText Release 16 can do what no human can do: read, analyze, process, and visualize a billion words a day.

Word Crunching 9 Billion Tweets

While we’re tracking language, forensic linguist Jack Grieve of Aston University, Birmingham, England has come up with an “on fleek” (perfect, on point) way to pinpoint how new slang words enter the language: Twitter.
Grieve studied a dataset of Tweets from 2013–14 by 7 million users all over America, containing nearly 9 billion words (collected by geography professor Diansheng Guo of the University of South Carolina). After eliminating all the regular, boring words found in the dictionary (so that he’d only be seeing “new” words), Grieve sorted all the remaining words by county, filtered out the rare outliers and obvious mistakes, and looked for the terms that showed the fastest rise in popularity, week over week. These popular newcomers included “baeless” (single, a seemingly perpetual state), “famo” (family and friends), TFW (“that feeling when…”, e.g., TFW a much younger friend has to define the term for you… that would be chagrin), and “rekt” (short for wrecked or destroyed, not “rectitude”). As described in the online magazine Quartz, Grieve found that some new words are popularized by social media microstars or are native to the Internet, like “faved” (to “favorite” a Tweet) or “amirite” (an intentional misspelling of “Am I right?” mocking the assumption that your audience agrees with a given point of view). Grieve’s larger points include the insights you can get from crunching Big Data (9 billion Twitter words!), and social media’s ability to capture language as it’s actually used in real time. “If you’re talking about everyday spoken language, Twitter is going to be closer than a news interview or university lecture,” he told Quartz.

Spreading Virally

On a more serious subject, unstructured data in the form of news coverage helps track outbreaks of infectious diseases such as the Zika virus. HealthMap is a site (and mobile app) created by a team of medical researchers and software developers at Boston Children’s Hospital. They use “online informal sources” to track emerging diseases including flu, the dengue virus, and Zika.
Their tracker automatically pulls from a wide range of intelligence sources, including online news stories, eyewitness accounts, official reports, and expert discussions about dangerous infectious diseases (in nine languages, including Chinese and Spanish). Drawing from unstructured data is what differentiates HealthMap from other infectious disease trackers, such as the federal Centers for Disease Control and Prevention’s weekly FluView report. The CDC’s FluView provides an admirable range of data, broken out by patients’ age, region, flu strain, comparisons with previous flu seasons, and more. The only problem is that the CDC bases its reports on flu cases reported by hospitals and public health clinics in the U.S. This means the data is both delayed and incomplete (e.g. it doesn’t include flu victims who never saw a doctor, or cases not reported to the CDC), limiting its predictive value. By contrast, the HealthMap approach captures a much broader range of data sources. So its reports convey a fuller picture of disease outbreaks, in near-real time, giving doctors and public-health planners (or nervous travelers) better insights into how Zika is likely to spread. This kind of data visualization is just what the doctor ordered.

Recent Data Driven Digests: February 10: Visualizing Unstructured Content January 29: Are You Ready for Some Football? January 19: Crowd-Sourcing the Internet of Things

Read More

3 Ways Election Tracker ’16 Solves the Unstructured Data Dilemma

When OpenText launched Election Tracker ’16, we received several encouraging and positive responses from users about how easy it is to compare stats about their favorite Presidential candidate using the interactive visualization and intelligent analysis. And without fail, the next question was, “How does it work, and how could it help my business?” Powered by Release 16, Election Tracker is a great example of unstructured data analysis in action. It showcases the power and importance of unstructured data analysis in a relatable way. In fact, we feel the Election Tracker addresses the dilemma of unstructured data in three distinct ways.

1) Intelligent Analysis

Making sense of unstructured data is a pressing concern for digital organizations. Perhaps it’s trying to understand Google’s PageRank algorithm, finding sentiment in the body of an email or website, or scanning digital health records for trends. It is also important for businesses that need to organize and govern all data within an enterprise. Companies are not shy about throwing money at the problem. The global business intelligence market saw revenue of nearly $6 billion in 2015, a number expected to grow toward $17 billion at a CAGR of 10.38 percent between now and 2020, according to market research firm Technavio. Much of the investment is expected to come in the form of data analysis and cloud implementation. The secret sauce is our content analytics tool, OpenText InfoFusion. Using natural language processing technology and a text-mining engine, the software tackles the core of unstructured data by extracting the most relevant linguistic nouns from semi-structured or unstructured textual content. The extraction is based on controlled vocabularies such as names, places, organization labels, product nomenclature, facility locations, employee directories, and even your business jargon.
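To illustrate the controlled-vocabulary idea in miniature (this is a toy sketch with made-up vocabularies, not InfoFusion’s actual implementation):

```python
# Toy controlled vocabularies: each label maps to known terms to look for.
VOCAB = {
    "person": {"hillary clinton", "donald trump"},
    "place": {"iowa", "new hampshire"},
    "organization": {"opentext"},
}

def extract_entities(text):
    """Scan free text for controlled-vocabulary terms, grouped by label."""
    found = {}
    lowered = text.lower()
    for label, terms in VOCAB.items():
        hits = sorted(t for t in terms if t in lowered)
        if hits:
            found[label] = hits
    return found
```

A real engine adds linguistic analysis (tokenization, disambiguation, inflected forms) on top, but the core contract is the same: raw text in, labeled entities out.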
The InfoFusion engine can automatically categorize content based on a file plan, hierarchical taxonomy, or classification tree, and it can automatically create a summary combining the most significant phrases and paragraphs. It can also show related documents. This ability to relate documents is based on semantics: asking the engine to give you a document that has the same keywords, key phrases, topics, and entities. The engine can also detect the ways that key words and phrases are used and correlate them to known indicators of whether a document is dryly factual or conveying emotion about a topic, and whether that emotion is positive or negative; that is, its sentiment.

2) Interactive Visualization

All the data in the world means nothing without some way to visually represent the context. Most pure-play “text analytics” solutions on the market today stop short of actual analysis. They are limited to translating free text to entities and taxonomies, leaving the actual visualization and analysis for the customer to figure out using other technologies. The technology powering the Election Tracker overcomes this important dilemma by converting the data into a visual representation that helps with content analysis. Once the Election Tracker mines raw text from the scores of major news sites around the world, it uses OpenText Content Analytics to process the content. This determines sentiment and extracts people, places, and topics following standard or custom taxonomies, providing the metadata necessary to conduct an analysis. The tracker determines the objectivity or subjectivity of content and its tone: positive, negative, or neutral. Visual summaries of the news data are generated with the Analytics Designer, then housed and deployed on OpenText iHub. The iHub-based visuals are seamlessly embedded into the Election Tracker user interface using the iHub JavaScript API.
3) Scalable and Embeddable

While we designed the Election Tracker to automatically crawl the web for election-focused articles, the technology behind the scenes can access and harvest data from any unstructured source. This includes social sites like Twitter, Facebook, and LinkedIn; email; multimedia message service (MMS); document archives like PDFs; RSS feeds; and blogs. Additionally, these sources can be combined with structured data to provide extremely valuable context, such as combining brand social sentiment from Twitter with product launch campaign results from a customer relationship management source, giving unparalleled insight into the success of a launch process. Overcoming the problems of scale can help ease fears about needing to add more data sources in the future. Its ability to be embedded allows companies to use their own branding and serve their customers in a format that is comfortable to the end user. See what all the buzz is about by visiting Election Tracker ’16. For more on the technology behind Election Tracker ’16, watch the 20-minute election tracker demo. Also, visit our blog to discover the importance of designing for unstructured analysis of data.
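The structured-plus-unstructured combination described above can be sketched in a few lines. Here the daily sentiment scores and signup counts are invented for illustration; in practice they would come from a text-analytics pipeline and a CRM export:

```python
# Hypothetical daily brand sentiment (from social text analysis) and
# campaign signups (from a CRM system), joined on date.
sentiment = {"2016-02-01": 0.62, "2016-02-02": 0.48, "2016-02-03": 0.71}
signups = {"2016-02-01": 1200, "2016-02-02": 900, "2016-02-03": 1500}

combined = {day: (sentiment[day], signups[day])
            for day in sorted(sentiment.keys() & signups.keys())}

# One simple launch metric: average sentiment weighted by signup volume
weighted_sentiment = (sum(s * n for s, n in combined.values())
                      / sum(n for _, n in combined.values()))
```

Joining on a shared key (here, the date) is the whole trick: once the unstructured signal is reduced to numbers, it lines up against any structured business metric.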

Read More

OpenText Releases U.S. Presidential Election Application, Powered by OpenText Release 16

Election Tracker

As the highly contested 2016 U.S. Presidential election and the Iowa caucus approach, I’m pleased to announce the release of Election Tracker ‘16, an online application for users who want to monitor, compare, and gain insights into the 2016 U.S. Presidential election. How does it work? Utilizing OpenText Release 16 (Content Suite and Analytics Suite), Election Tracker ‘16 automatically scans and reads hundreds of top online media publications around the world, capitalizing on the knowledge buried in their unstructured information. This data is analyzed daily to determine sentiment and extract additional information, such as people, places, and topics. It is then translated into visual summaries and embedded into the election app, where it can be accessed using interactive dashboards and reports. That’s right: hundreds of websites and a billion words, processed, stored, and visualized in real time. And we have been collecting this data for months to show trends and sentiment changes. Powered by OpenText Release 16, this information-based application provides anyone interested in following the critical 2016 election with deep insights into candidate information, revealing much more than traditional polling data. Using the tracker, election enthusiasts can gain a holistic view of how candidates are performing based on media sentiment, which can be a more accurate indication of future success. Election Tracker ‘16 is built using OpenText Release 16, which includes Content Suite (store) and Analytics Suite (visualize and predict), bringing seemingly unstructured data to life. OpenText Release 16 can do what no human can do: read, analyze, process, and visualize a billion words a day.

Transforming Unstructured Data into Interactive Insights

All the components of OpenText Release 16 are important, but let me focus on the analytic aspects of Election Tracker ‘16. As we saw in the 2012 U.S.
election, analytics played a major role in the success of the Obama campaign. Drawing from a centralized database of voter information, President Obama’s team was able to leverage analytics to make smarter, more efficient campaign decisions. Everything—from media buy placements to campaign fundraising to voter outreach—was driven by insight into data. This year, analytics promises to play an even bigger role in the U.S. Presidential election. Analytics has made its way into every industry and has become a part of everyday life. Just as candidates look for a competitive advantage to help them win, so too must businesses. Research shows that data-driven organizations are more profitable and productive than their competitors. Analytics and reporting solutions help organizations become data driven by extracting value from their business data and making it available across the enterprise to facilitate better decision making. Organizations have been using analytics to unlock the power of their business data for years. Marketing analysts use analytics to evaluate social content and understand customer sentiment. Legal analysts can gain a quick understanding of the context and sentiment of large volumes of legal briefs. Data directors, tasked with organizing and governing enterprise data, are applying analytics to solve complex business problems. But basic analytics has become table stakes, with laggards, smaller companies, and even skeptics jumping on the bandwagon. Embedded analytics is the new competitive advantage. Scan the pure-play “text analytics” solutions on the market today and you’ll see they clearly stop short of actual analysis. They are limited to translating free text to entities and taxonomies, leaving the actual visualization and analysis for organizations to figure out using other technologies.
At the other end of the spectrum, traditional dashboard tools lack the sophistication needed to process free text effectively, and they struggle with large amounts of data. With OpenText Release 16, our Analytics Suite accomplishes both with ease, empowering users to quickly and easily build customized enterprise reporting applications that summarize and visualize insights from unstructured big data, securely delivered across web browsers, tablets, and phones. So, whether you’re out to win the race for the U.S. Presidency, gain market share, or attract new customers, an embedded analytics and reporting solution like Election Tracker ’16 can help you cross the finish line first. Throughout the race to November 2016, we’ll be tapping into the power of Election Tracker ’16 to shed light on how candidates perform during key election milestones. Join us for a behind-the-scenes look at the campaign trail. For more information on Election Tracker ’16, powered by OpenText, read the press release.

Read More

Are You Ready for Some Football? — Data Driven Digest


Here in Silicon Valley, Super Bowl 50 is not only coming up, it’s downright inescapable. The OpenText offices are positioned halfway between Santa Clara, where the game will actually be played on Feb. 7, and San Francisco, site of the official festivities (44 miles north of the stadium, but who’s counting?). So in honor of the Big Game, this week we’re throwing on the shoulder pads and tackling some data visualizations related to American football. Enjoy!

Bringing Analytics Into Play

In the area of statistics and data visualization, football has always taken a back seat to baseball, the game beloved by generations of bow-tied intellectuals. But Big Data is changing the practice of everything from medicine to merchandising, so it’s no wonder that better analysis of the numbers is changing the play and appreciation of football. Exhibit A: Kevin Kelley, head football coach at Pulaski Academy in Little Rock, Ark., has a highly unusual style of play: no punts. He’s seen the studies from academics such as UC Berkeley professor David Romer concluding that teams shouldn’t punt on fourth down with less than 4 yards to go, and he came to the conclusion that “field position didn’t matter nearly as much as everyone thought it did.” As Kelley explains in an ESPN short film on the hub, if you try to punt when the ball is on your 5-yard line or less, the other team scores 92% of the time. Even 40 yards from your goal line, the other team still scores 77% of the time. “Numbers have shown that what we’re doing is correct,” he says in the film. “There’s no question in my mind, or my coaches’ minds, that we wouldn’t have had the success we’ve had without bringing analytics into (play).” The coach’s data-driven approach has paid off, giving Pulaski multiple winning seasons over the past 12 years, including a 14-0 record in 2015.
The highlight of their latest season: Beating Texas football powerhouse Highland Park 40-13 and snapping its 84-game home winning streak, which goes back to 1999.

Bigger, Faster, Stronger

No doubt most of Coach Kelley’s players dream of turning pro. But they’ll need to bulk up if they want to compete, especially as defensive linemen. Two data scientists offer vivid demonstrations of how much bigger NFL players have gotten over the past few generations. Software engineer and former astrophysicist Craig M. Booth crunched the data from 2013 NFL rosters to create charts of their heights and weights. His chart makes it easy to see how various positions sort neatly into clusters: light, nimble wide receivers and cornerbacks; tall defensive and tight ends; refrigerator-sized tackles and guards. The way Booth mapped the height/weight correlation, with different colors and shapes indicating the various positions, isn’t rocket science. It is, however, a great example of how automation is making data visualization an everyday tool. As he explains on his blog, he didn’t have to manually plot the data points for all 1,700-odd players in the NFL; he downloaded a database of the player measurements from the NFL’s Web site, then used an iPython script to display it. For a historical perspective on how players have gotten bigger since 1950, Booth created a series of line charts showing how players’ weights have skyrocketed relative to their heights.

Backfield in Motion

Meanwhile, Noah Veltman, a member of the data-driven journalism team at New York City’s public radio station WNYC, has made the bulking-up trend even more vivid by adding a third dimension – time – to his visualization. His animation draws on NFL player measurements going all the way back to 1920.
He observes that football players’ increasing size is partly due to the fact that Americans in general have gotten taller and heavier over time – though partly also due to increasing specialization of body type by position. You can see a wider range of height-and-weight combinations as the years go by, and from the 1990s on, they begin falling into clusters. (You could also factor in more weight training, rougher styles of play, and other trends, but we’ll leave that discussion to the true football geeks.)

Bars, Lines, and Bubbles

Now, what kind of play are we seeing from these bigger, better-trained players? Craig M. Booth recently unveiled an even more interesting football-related project, an interactive visualizer of the performance of every NFL team from 2000 on. He uses the Google Charts API to display data on everything from points scored by team by quarter to total passing or penalty yards. You can customize the visualizer by the teams tracked, which variables appear on the X- and Y-axes, whether they’re on a linear or logarithmic scale, and whether to display the data as bubble plots, bar charts, or line graphs. It can serve up all kinds of interesting correlations. (Even though OpenText offers powerful predictive capacities in our Big Data Analytics suite, we disavow any use of this information to predict the outcome of a certain football game on February 7…)

OpenText Named a Leader in the Internet of Things

Speaking of sharing data points, OpenText was honored recently in the area of the Internet of Things by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. You can view an infographic on Dresner’s Wisdom of Crowds research.

Recent Data Driven Digests: January 19: Crowd-Sourcing the Internet of Things January 15: Location Intelligence January 5: Life and Expectations

Read More

Crowd-Sourcing the Internet of Things (Data Driven Digest for January 19, 2016)

Runner with fitness tracker passes the Golden Gate Bridge

The Internet of Things is getting a lot of press these days. The potential use cases are endless, as colleague Noelia Llorente has pointed out: refrigerators that keep track of the food inside and order more milk or lettuce whenever you’re running low. Mirrors that can determine if you have symptoms of illness and make health recommendations for you. Automated plantations, smart city lighting, autonomous cars that pick you up anywhere in the city… So in this week’s Data Driven Digest, we’re looking at real-world instances of the Internet of Things that do a good job of sharing and visualizing data. As always, we welcome your comments and suggestions for topics in the field of data visualization and analysis. Enjoy!

The Journey of a Million Miles Starts with a Single Step

Fitness tracking has long been a popular use for the Internet of Things. Your correspondent was an early adopter, having bought Nike+ running shoes, with a special pocket for a small Internet-enabled sensor, back in 2007. (Nike+ is now an app using customers’ smartphones, smart watches, and so forth as trackers.) These sensors track where you go and how fast, and your runs can be uploaded and displayed on the Nike+ online forum, along with user-generated commentary – “trash talk” to motivate your running buddies, describing and rating routes, and so forth. Nike is hardly the only online run-sharing provider, but its site is popular enough to have generated years of activity patterns by millions of users worldwide. Here’s one example, a heat map of workouts in the beautiful waterfront parks near San Francisco’s upscale Presidio and Marina neighborhoods.
(You can see which streets are most popular – and, probably, which corners have the best coffeehouses…)

The Air That I Breathe

Running makes people more aware of the quality of the air that they breathe. One “environmental justice” nonprofit in Brooklyn, N.Y., is trying to make people more conscious of the invisible problem of air pollution through palm-sized sensors called AirBeams. These handheld sensors can measure levels of microparticulate pollution, ozone, carbon monoxide, and nitrogen dioxide (which can be blamed for everything from asthma to heart disease and lung cancer) as well as temperature, humidity, ambient noise, and other conditions. So far so good – buy an AirBeam for $250 and get a personal air-quality meter, whose findings may surprise you. (For example, cooking on a range that doesn’t have an effective air vent subjects you to more grease, soot, and other pollution than the worst smog day in Beijing.) But the Internet of Things is what really makes the device valuable. Just like with the Nike+ activity trackers, AirBeam users upload their sensor data to create collaborative maps of air quality in their neighborhoods. Here, a user demonstrates how his bicycle commute across the Manhattan Bridge subjects him to a lot of truck exhaust and other pollution – a peak of about 80 micrograms of particulate per cubic meter (µg/m³), over twice the Environmental Protection Agency’s 24-hour limit of 35 µg/m³. And here’s a real-time aggregation of hundreds of users’ data about the air quality over Manhattan and Brooklyn. (Curiously, some of the worst air quality is over the Ozone Park neighborhood…) Clearly, the network effect applies with these and many other crowd-sourced Internet of Things applications – the more data points users are willing to share, the more valuable the overall solution becomes.
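The comparison against the EPA threshold is simple arithmetic once readings are aggregated. A minimal sketch (the sample values are invented for illustration):

```python
EPA_24H_LIMIT_UGM3 = 35.0  # EPA 24-hour fine-particulate standard, in µg/m³

def exceeds_limit(readings):
    """Average a set of particulate readings (µg/m³) and flag
    whether the average exceeds the EPA 24-hour limit."""
    average = sum(readings) / len(readings)
    return average, average > EPA_24H_LIMIT_UGM3

# Hypothetical bridge-commute samples, peaking around 80 µg/m³
avg, over = exceeds_limit([45, 60, 80, 75, 50])
```

The interesting engineering is upstream (calibrating sensors and pooling many users’ uploads); once the data is in hand, the health comparison is a one-liner.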
OpenText Named a Leader in the Internet of Things Speaking of sharing data points, OpenText was honored recently in the area of Internet of Things by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. You can view an infographic on Dresner’s Wisdom of Crowds research. Recent Data Driven Digests: January 15: Location Intelligence January 5: Life and Expectations December 22: The Passage of Time in Sun, Stone, and Stars

Read More

Next Step For Internet of Things: Analytics of Things

Infographic Internet of Things Dresner Report

“Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Between us, Theory and Practice agree: nothing works and nobody knows why.” – Anonymous

The Internet of Things (IoT) is clearly the next big step in the technology industry. It will be almost like a second Industrial Revolution, opening a world of incalculable, even greater possibilities in the digital age, potentially achieving greater independence and, therefore, greater efficiency. Imagine the possibilities. Refrigerators that measure the food inside and replenish the right product from your preferred vendor. Mirrors that can determine if you have symptoms of illness and make health recommendations for you. Smart watches that monitor your vital signs and warn the health services if you have a problem or emergency. Traffic lights that connect to a circuit of cameras to identify the level of traffic and mass movement, thus preventing absurd waiting times in areas of little movement. Automated plantations, smart city lighting, courier drones that deliver whatever you want wherever you want, autonomous cars that pick you up anywhere in the city… It sounds like futuristic sci-fi. But what if this scenario were closer than we think? I had the pleasure of attending Big Data Week in Barcelona (#BDW15) recently, which featured top speakers from industry-leading companies. My expectation was that I would listen to a lot of talks on technology, programming languages, Hadoop, R, and theories about the future for humans and business in this new era of Big Data. After hearing the first presentations from Telefonica (telcos), BBVA Data & Analytics (finance) and the Smart Living Program at Mobile World Capital Barcelona (technology), I realized something. Regardless of the industry, it was all about how insights from data produced by physical objects disrupt our lives as individuals, consumers, parents, and business leaders. It doesn’t matter which role you play.
Yes, it was all about the “Internet of Things” or, to take it a step further, the “Analytics of Things.” These companies are already changing the way they do business by leveraging information from internet-connected devices they already have. And that is just the beginning. Gartner estimates that by 2020 there will be 20+ billion devices connected to the IoT. Dresner’s 2015 Internet of Things and Business Intelligence report estimates that 67% of enterprises consider IoT an important part of their future strategies, and that 89% of information professionals expect predictive analytics to be just as important within the Internet of Things. The data itself is not the point; it is how Big Data Analytics technologies enable organizations to collect, cross-reference, integrate, and analyze data from devices to design better products and make our lives easier. So, have Theory and Practice finally converged? What if the future is right now? Take a look at the infographic based on the findings of the Dresner IoT report, or download the full study to find out more.

Read More

Data Driven Digest for January 15: Location, Location, Location

Location intelligence is a trendy term in the business world these days. Basically, it means tracking the whereabouts of things or people, often in real time, and combining that with other information to provide relevant, useful insights. At a consumer level, location intelligence can help with things like finding a coffeehouse open after 9 p.m. or figuring out whether driving via the freeway or city streets will be faster. At a business level, it can help with decisions like where to build a new store branch that doesn’t cannibalize existing customers, or laying out the most efficient delivery truck routes. Location intelligence is particularly on our mind now because OpenText was recently honored by Dresner Advisory Services, a leading analyst firm in the field of business intelligence, with its first-ever Technical Innovation Awards. Dresner recognized our achievements in three areas: Location Intelligence, Internet of Things, and Embedded BI. You’ll be hearing more about these awards later. In the meantime, we’re sharing some great data visualizations based on location intelligence. As always, we welcome your comments and suggestions. Enjoy!

Take the A Train

In cities all over North America, people waiting at bus, train, or trolley stops who are looking at their smartphones aren’t just killing time – they’re finding out exactly when their ride is due to arrive. One of the most popular use cases for location intelligence is real-time transit updates. Scores of transit agencies, from New York and Toronto to Honolulu, have begun tracking the whereabouts of the vehicles in their fleets and sharing that information in live maps. One of the latest additions is the St. Charles Streetcar line of the New Orleans Regional Transit Authority (NORTA) — actually the oldest continuously operating street railway in the world! (It was created in 1835 as a passenger railway between downtown New Orleans and the Carrollton neighborhood, according to the NORTA Web site.)
This is not only a boon to passengers; the location data can also help transit planners figure out where buses are bunching up or falling behind, and adjust schedules accordingly.

On the Street Where You Live

Crowdsourcing is a popular way to enhance location intelligence. The New York Times offers a great example with this interactive feature describing writers’ and artists’ favorite walks around New York City. You can not only explore the map and associated stories, but also add your own – like this account of a proposal on the Manhattan Bridge.

Shelter from the Storm

The City of Los Angeles is using location intelligence in a particularly timely way: an interactive map of resources to help residents cope with winter rainstorms (which are expected to be especially bad this year, due to the El Niño weather phenomenon). The city has created a Google Map, embedded in the site, that shows rainfall severity and any related power outages or flooded streets, along with where residents can find sandbags, hardware stores, or shelter from severe weather, among other things. It’s accessible via both desktop and smartphones, so users can get directions while they’re driving. (Speaking of directions while driving in the rain, L.A. musician and artist Brad Walsh captured some brilliant footage of an apparently self-driving wheeled trashcan in the Mt. Washington neighborhood. We’re sure it’ll get its own Twitter account any day now.)

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests: January 5: Life and Expectations December 22: The Passage of Time in Sun, Stone, and Stars December 18: The Data Awakens

Read More

Data Scientist: What Can I Do For You?

Data Scientist and OpenText Big Data Analytics

After attending our first Enterprise World, I have just one word to define it: intense. In my memory there are a huge number of incredible moments: spectacular keynotes, lots of demos, and amazing breakout sessions. Now, trying to digest all of these experiences and collecting all the opinions, suggestions, and thoughts of the customers who visited our booth, I remember a wonderful conversation with a customer about data mining techniques, their best approaches, and where we can help with our products. From the details and the way he formed his questions, it was pretty clear that in front of me I had a data scientist – or at least someone who deeply understands this amazing world of data mining, machine learning algorithms, and predictive analytics.

Just to put it in context: the data scientist usually maintains a professional skepticism about applications that provide an easy-to-use interface, without a lot of options and knobs, when running algorithms for prescriptive or predictive analytics. They love to tweak algorithms, writing their own code or accessing and modifying all the parameters of a certain data mining technique, just to obtain the best model for their business challenge. They want full control of the process, and that is fully understandable. It is their comfort zone.

Data scientists push back against concepts like the democratization of predictive analytics, and they have good reasons – I agree with a large number of them. Most data mining techniques are pretty complex, difficult to understand, and require a lot of statistics knowledge just to say, “Okay, this looks pretty good.” Predictive models need to be maintained and revised frequently, based on your business needs and the amount of data you expect to use during the training/testing process. More often than you can imagine, models can’t be reused for similar use cases.
Each business challenge has its own related data, and that data is what defines how a prescriptive or predictive model should be trained, tested, validated and, ultimately, applied in the business. On the other hand, a business analyst or a business user without a PhD can take advantage of predictive applications that package the most common algorithms in a box (a black box) and start answering their questions about the business. Moreover, their companies often can’t afford the expensive compensation of a data scientist, so they deal with all of this by themselves.

But what can we do for you, data scientist? The journey starts with the integration of distinct sources – databases, text files, spreadsheets or even applications – into a single repository, where everything is connected. Exploring and visualizing complex data models with several levels of hierarchy offers a better approach to the business model than the most common huge-table method. Having an analytical repository that reflects how the business flows helps with one of the hardest parts of the data scientist’s job: problem definition.

Collecting data is just the beginning; there is a huge list of tasks related to data preparation, data quality and data normalization. This is where the business analyst or the data scientist loses much of their precious time, and we are here to help them, accelerating the time from raw data to value. Once they have clean data, a data scientist begins analyzing it: finding patterns, correlations and hidden relationships. OpenText Big Data Analytics can provide an agile solution to perform all this analysis. Moreover, everything is calculated fast and using all your data – your big data – offering a flexible trial-and-error environment.
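As a toy illustration of the train/test discipline described above – not OpenText code, just a library-free sketch with invented data – the simplest possible workflow holds out some rows, fits nothing fancier than a nearest-neighbor rule, and measures accuracy on the held-out rows:

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle the data and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def predict_1nn(train, x):
    """Classify x by the label of its nearest training point."""
    nearest = min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
    return nearest[1]

# Toy dataset of (features, label) pairs: two well-separated clusters
data = ([((i, i), "small") for i in range(10)]
        + [((i, i), "large") for i in range(50, 60)])
train, test = train_test_split(data)
accuracy = sum(predict_1nn(train, x) == y for x, y in test) / len(test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Real models need exactly this loop – train, test, validate – repeated every time the business or the data changes, which is where the time goes.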
So, the answer to my question: OpenText Big Data Analytics can reduce the time spent in the preparation process, freeing up time where it is really needed – analysis and decision making – even if the company is dealing with big data. So, why don’t you try it in our 30-day free trial or ask us for a demo?

Read More

Data Driven Digest for January 5: Life and Expectations

Welcome to 2016! Wrapping up 2015 and making resolutions for the year ahead is a good opportunity to consider the passage of time – and in particular, how much is left to each of us. We’re presenting some of the best visualizations of lifespans and life expectancy. So haul that bag of empty champagne bottles and eggnog cartons to the recycling bin, pour yourself a nice glass of kale juice, and enjoy these links for the New Year.

“Like Sands Through the Hourglass…”

It’s natural to wonder how many more years we’ll live. In fact, it’s an important calculation when planning for retirement. Figuring out how long a whole population will live is a solvable problem – in fact, statisticians have been forecasting life expectancy for nearly a century. And the news is generally good: life expectancies are going up in nearly every country around the world. But how do you figure out how many years are left to you, personally? (Short of consulting a fortune-teller, a process we don’t recommend, as the conclusions are generally not data-driven.)

UCLA-trained statistician Nathan Yau of the excellent blog Flowing Data came up with a visualization that looks a bit like a pachinko game. It runs multiple simulations predicting your likely age at death (based on age, gender, and Social Security Administration data) by showing little balls dropping off a slide to hit a range of potential remaining lifespans, everything from “you could die tomorrow” to “you could live to 100.” As the simulations pile up, they peak at the likeliest point. One of the advantages of Yau’s simulator is that it doesn’t provide just one answer, the way many calculators do that ask about your age, gender, race, health habits, and so forth. Instead, it uses the “Monte Carlo” method of multiple randomized trials to get an aggregated answer. Plus, the little rolling, bouncing balls are visually compelling. (That’s academic-ese for “They’re fun to watch!”) “Visually compelling” is the key.
As flesh-and-blood creatures, we can’t operate entirely in the abstract. It’s one thing to be told you can expect to live X years more; seeing that information as an image somehow has more impact in terms of motivating us to action. That’s why the approach taken by Wait But Why blogger Tim Urban is so striking despite being so simple. He started with the assumption that we’ll each live to 90 years old – optimistic, but doable. Then he rendered that lifespan as a series of squares, one per year. What makes Urban’s analysis memorable – and a bit chilling – is when he illustrates the remaining years of life as the events in that life: baseball games, trips to the beach, Chinese dumplings, days with aging parents or friends. Here, he figures that 34 of his 90 expected winter ski trips are already behind him, leaving only 56 to go.

Stepping back, he comes to three conclusions:

1) Living in the same place as the people you love matters. I probably have 10X the time left with the people who live in my city as I do with the people who live somewhere else.

2) Priorities matter. Your remaining face time with any person depends largely on where that person falls on your list of life priorities. Make sure this list is set by you—not by unconscious inertia.

3) Quality time matters. If you’re in your last 10% of time with someone you love, keep that fact in the front of your mind when you’re with them and treat that time as what it actually is: precious.

Spending Time on Embedded Analytics

Since we’re looking ahead to the New Year, on Tuesday, Jan. 12, we’re hosting a webinar featuring TDWI Research Director Fern Halper, discussing Operationalizing and Embedding Analytics for Action. Halper points out that analytics need to be embedded into your systems so they can provide answers right where and when they’re needed. Uses include support for logistics, asset management, customer call centers, and recommendation engines—to name just a few. Dial in – you’ll learn something!
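The Monte Carlo approach behind Yau’s simulator can be sketched in a few lines – though with a crude, made-up Gompertz-style mortality hazard standing in for the Social Security data he actually uses:

```python
import random
from collections import Counter

def simulate_age_at_death(current_age, rng, base=0.0001, growth=1.09):
    """One randomized trial: advance a year at a time, dying with a
    probability that grows with age (a toy Gompertz-style hazard)."""
    age = current_age
    while age < 110:
        if rng.random() < min(1.0, base * growth ** age):
            return age
        age += 1
    return age

def death_decade_distribution(current_age, trials=10_000, seed=1):
    """Aggregate many trials into a histogram of death ages by decade."""
    rng = random.Random(seed)
    return Counter(simulate_age_at_death(current_age, rng) // 10 * 10
                   for _ in range(trials))

dist = death_decade_distribution(30)
for decade in sorted(dist):
    print(f"dies in their {decade}s: {'#' * (dist[decade] * 60 // 10_000)}")
```

As with Yau’s pachinko balls, the point is the shape of the pile of outcomes: the trials cluster around the likeliest decade instead of giving one false-precision answer.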
Fern Halper

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests: December 22: The Passage of Time in Sun, Stone, and Stars December 18: The Data Awakens December 11: Holiday Lights

Read More

Data Driven Digest for December 22: The Passage of Time in Sun, Stone, and Stars

Photo from RTE News, Ireland

Data visualization has been with us since the first cave-dweller documented the lunar month as a loop of symbols carved onto a piece of bone that hunters could carry with them to track the passage of the seasons. Obviously, technology has moved on in the past 34,000 years – have we told you lately about our iHub dashboards and embedded analytics? – but since the winter solstice (for the Northern Hemisphere) occurs Tuesday, Dec. 22, we thought this would be a good time to review some of the earliest efforts to create live, continuously updated reports of astronomical data, out of stone structures and the landscapes around them.

The fact that many of these calendars still exist after thousands of years, and still work, shows that our prehistoric ancestors must have considered visually recording the time of the year mission-critical, to predict hunting and harvest times, plus other seasonal events such as spring thaws, droughts, and monsoons. (Whether accurately predicting and planning for those events was part of their yearly job performance review, we leave to the archaeologists…) So step outside and, if the weather permits, take a look at the sunrise or sunset and notice exactly where it hits the horizon, something our ancestors have done for thousands of generations. Then come back into your nice warm room and check out these links. Enjoy, and happy holidays!

Sun Daggers

The winter solstice is the time of year when the days are shortest and nights are longest. As such, it was an anxious time for primitive people, who wondered when their world would stop getting darker and colder. That’s why early astronomer-priests (the Chief Data Officers of their time) designed calendars that made clear exactly when the day reached its minimum and the sun was at the lowest point on the horizon – and would then start returning. One of the most impressive solar calendars is at Maeshowe, a 5,000-year-old “chambered cairn” in the Orkney Islands, north of Scotland.
It’s a long passage built of stone slabs dug into an artificial mound. The passage is oriented so that for a few days around the winter solstice every year, the rays of the setting sun reach all the way down to light up the door at the end. Two markers pinpoint the sun’s path on the exact date of the solstice: a monolith about half a mile away, called the Barnhouse Stone, and another standing stone at the entrance to Maeshowe (now missing, though its socket remains).

Even more impressive is Newgrange, a 5,000-year-old monument near Dublin, Ireland. Newgrange was built as a 76-meter-wide circular mound of stones and earth covering an underground passage, possibly used as a tomb. A hollow box above the passage lets in the rising sun’s light for about 20 minutes at dawn around the winter solstice. The beam starts on the passage’s floor, then gradually reaches down the whole 19-meter length of the passage, flooding it with light. It’s an impressive spectacle, one that attracts thousands of people to the Newgrange site for the six days each December that the sunbeam is visible.

Nor were early Europeans the only ones taking note of the sun’s travels across the landscape. At Fajada Butte, New Mexico, three stone slabs were positioned so that “dagger”-shaped beams of sunlight passing between the parallel slabs travel across carved spirals on the cliff face beneath at the summer and winter solstices and the spring and fall equinoxes.

Fajada Butte is part of the Chaco Canyon complex, inhabited between the 10th and 13th centuries by the Anasazi, or Ancestral Puebloans. They built impressively engineered cliff dwellings, some as high and densely populated as big-city apartment buildings, laid out 10-meter-wide roads that spanned modern-day New Mexico, and harvested snowmelt and rainwater for irrigation through a sophisticated system of channels and dams.
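The geometry these monuments exploit – the sun reaching its lowest path of the year at the winter solstice – can be approximated in a few lines. Using a standard textbook approximation for solar declination (our own assumption, not anything from the archaeological literature), you can estimate how short the solstice day is at a site like Maeshowe, at roughly 59° N:

```python
from math import radians, degrees, cos, tan, acos

def solar_declination_deg(day_of_year):
    """Approximate solar declination (degrees) for a given day of the year."""
    return -23.44 * cos(radians(360.0 / 365.0 * (day_of_year + 10)))

def daylight_hours(latitude_deg, day_of_year):
    """Approximate hours of daylight from the sunset hour angle."""
    decl = radians(solar_declination_deg(day_of_year))
    lat = radians(latitude_deg)
    cos_omega = max(-1.0, min(1.0, -tan(lat) * tan(decl)))
    return 2 * degrees(acos(cos_omega)) / 15.0

# Orkney (Maeshowe) sits near 59° N; day 356 is around Dec. 22
print(f"Daylight at Maeshowe on the solstice: {daylight_hours(59.0, 356):.1f} hours")
print(f"Daylight at Maeshowe in midsummer:   {daylight_hours(59.0, 172):.1f} hours")
```

The passage at Maeshowe works precisely because this annual daylight minimum – and the sun’s correspondingly low, southerly setting point – recurs on the same few days every year.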
The Anasazi religion was apparently based on bringing the order seen in the heavens down to earth, so many of their sites were oriented north-south or towards complex alignments of sun, moon, and stars – which may explain why Fajada Butte was just one of many solar observatories they built. Researchers at the Exploratorium in San Francisco have designed an interactive simulator of how the Sun Daggers worked.

Looping Through Time

From the passage of time documented in stone and earth thousands of years ago to the wanderings of a Time Lord: just in time for the annual “Doctor Who” Christmas special, our friends at the BBC have created a clever interactive map of the travels through time of all 11 incarnations of the Doctor, made for the beloved sci-fi show’s 50th anniversary. This looping diagram ingeniously displays all the journeys by actor, episode, and whether the trip was into the past or the future, as well as the actual year. It’s not a linear chronology, but the course of a Time Lord’s adventures, like true love, never did run smooth.

Light on Embedded Analytics

Meanwhile, we’re hoping to shed some light on a topic dear to our heart – analytics. On Jan. 12, 2016, we’re hosting a webinar featuring TDWI Research Director Fern Halper, who will talk about Operationalizing and Embedding Analytics for Action. Halper points out that analytics need to be embedded into your systems so they can provide answers right where and when they’re needed. Uses include support for logistics, asset management, customer call centers, and recommendation engines—to name just a few. Dial in – we promise you’ll learn something!

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests: December 18: The Data Awakens December 11: Holiday Lights December 4: Winners of the Kantar Information Is Beautiful Awards

Read More

Data Driven Digest for December 18: The Data Awakens

It is a period of data confusion. Rebel businesses, striking from a hidden NoSQL base, have assembled their first embedded application against the evil Big Data. During the battle, data scientists managed to steal secret plans to the Empire’s ultimate weapon, the SPREADSHEET, a mass of rows and columns with enough frustration and lack of scale that it could crash an entire business plan.

While that may not be the plot of the new Star Wars film (or any, for that matter), the scenario may evoke a few cheers for the noble data scientists tasked with creating dashboards and visualizations to battle the dark side of Big Data. Find out more on how to battle your own Big Data problem with Analytics.

As the world enjoys the latest installment of the Star Wars franchise, it seemed fitting for us to acknowledge visualizations based on the movie series. Strong is the data behind the Force. Enjoy these examples.

The Force is Strong with This One

Image source: Bloomberg Business

At its core, the Star Wars movie franchise is about the battle between the light and dark sides of the Force. But how much time do the films spend exploring that mystical power that surrounds us and penetrates us, and binds the galaxy together? Amazingly, a mere 34 minutes out of the total 805 minutes amassed in the first six films. The screen above is one of five outstanding visualizations of the use of the Force created by a team of data reporters and visualization designers at Bloomberg Business. Creators Dashiell Bennett (@dashbot), Tait Foster (@taitfoster), Mark Glassman (@markglassman), Chandra Illick (@chandraelise), Chloe Whiteaker (@ChloeWhiteaker), and Jeremy Scott Diamond (@_jsdiamond) really draw you in. They break down not only the time spent talking about the Force, but also which character uses the Force the most and what types of Force abilities are used. The team watched each movie, compiled the data by hand, and entered it into a spreadsheet.
If there were discrepancies, the team used the novelizations and screenplays of the films as references. While the project is engaging, it also digs deep, offering secondary layers of data such as the number of times Obi-Wan Kenobi uses the Jedi Mind Trick versus Luke Skywalker or Qui-Gon Jinn.

Great Shot, Kid, That Was One in a Million

Image source: Gaston Sanchez

Sometimes the technologies behind visualizations deserve acknowledgment. Our second entry is an example of an arc diagram that was created using R. The Star Wars tie-in here is a statistical text analysis of the scripts from the original trilogy (Episodes IV, V, and VI) using arc-diagram representations. Arc diagrams are often used to visualize repetition patterns. The thickness of the arc lines can represent the frequency of connections between the sources and targets, or “nodes,” as they are often called. Arc diagrams are not as widely used as some other chart types, since readers may not immediately see the correlations between the different nodes; however, they are great for showing relationships where time or numerical values aren’t involved. Here, the chart shows which characters speak to each other most often, and the words they use most. (No surprise: “sir” and “master” are C-3PO’s most common utterances, while Han Solo says “hey,” “kid,” and “get going” a lot.)

Gaston Sanchez, a data scientist and lecturer with the University of California, Berkeley and Berkeley City College, came up with this arc diagram as part of a lecture he was giving on the use of arc diagrams with R. Sanchez showed how to use R’s “tm” and “igraph” packages to extract text from the scripts and compute adjacency matrices. R has become embedded in the corporate world. R is an implementation of the S programming language, developed at Bell Labs in the 1970s. The language has been compared to Python as a way to dive into data analysis or apply statistical techniques.
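The adjacency-matrix idea behind Sanchez’s arc diagram can be sketched without R: count how often one character’s line directly follows another’s. The speaker list below is a made-up toy, not data from the actual scripts:

```python
from collections import Counter

# Hypothetical ordered list of speakers in a scene; a real analysis
# would parse this out of script files, as Sanchez does with R's "tm".
speakers = ["C-3PO", "LUKE", "C-3PO", "HAN", "LUKE", "HAN", "C-3PO", "LUKE"]

# Record an undirected edge each time one character speaks
# immediately after another; edge weight = number of exchanges.
edges = Counter(
    frozenset(pair)
    for pair in zip(speakers, speakers[1:])
    if pair[0] != pair[1]
)

for pair, weight in edges.most_common():
    a, b = sorted(pair)
    print(f"{a} <-> {b}: {weight} exchanges")
```

In an arc diagram, each of these edge weights becomes the thickness of the arc connecting the two characters.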
While R has typically been used by academics and researchers, more businesses are embracing R because it is seen as good for user-friendly analysis and graphical modeling.

This is the Data You are Looking For

Image source: Eshan Wickrema and Lachlan James

While “Star Wars: The Force Awakens” is expected to break box office records, it faces strong challengers to rank as one of the highest-grossing films of all time. According to stats from Box Office Mojo and Variety, the 1999 release of “Star Wars Episode I: The Phantom Menace” ranks number 20 on the list. When adjusted for inflation, the 1977 release of “Star Wars” ranks third on the all-time list, behind “Gone with the Wind” and “Avatar.” Looking at the first 100 days of release is one key to understanding the return on investment for a given film. Writers Eshan Wickrema and Lachlan James compared the stats of the first six Star Wars films against each other. What’s significant is that each film made more in revenue than its predecessor, with the prequel films making nearly twice the amount of “Return of the Jedi,” the most popular of the original trilogy.

We share our favorite data-driven observations and visualizations every week here. What topics would you like to read about? Please leave suggestions and questions in the comment area below.

Recent Data Driven Digests: December 11: Illuminating the cost and output of Christmas lights December 4: 2015 winners of the Kantar Information Is Beautiful Awards November 27: Mapping music in color

Read More

Delivering Insights Interactively in a Customer-Centric World

Consumers today expect access to their information from wherever they are, at any time, and from any device. This includes information such as statements, bills, invoices, explanations of benefits, and other transactional records that help them better understand their relationship with a business. What’s more, they want to be able to interact with their data to gain greater insight through sorting, grouping, graphing, charting, or otherwise manipulating the information. This directly impacts customer satisfaction and loyalty – the better companies are at giving customers ubiquitous access to that information, the more opportunities they have to delight and retain their customers.

According to IDG’s 2015 “State of the CIO” report, customer experience technologies and mobile technologies are among CIOs’ top five priorities. The ability to access and interact with transactional data from any device falls squarely into both of these categories, so it’s no surprise organizations want to deliver it. The chart below demonstrates the level of importance CIOs have placed on providing customers the ability to interact with their information. However, while the understanding of the importance of this need is there, it doesn’t necessarily align with organizations’ ability to deliver. Some interesting facts uncovered through IDG’s August 2015 survey on customer data interactivity: 17 percent of organizations cannot provide access to that data across multiple devices. Two-thirds let customers access transactional data on any device, but only in a static format. Only 18 percent can give customers truly interactive transactional data via the device of their choice.

The question, then, is why is it so difficult for companies to provide information in interactive formats? The IDG survey reveals that many companies lack not only a strategy to enable interactivity, but also the skilled resources to implement any strategy they might develop.
Any attempts at cross-device interactivity tend to be ad hoc and therefore difficult – yet customers expect to be able to slice and dice their own transaction histories at will, from the devices of their choice. What can companies do about this? Where should they begin? What are the best practices for achieving the goal of interactivity in this information-driven age?

To enable a better way to work in this age of disruption, OpenText recommends that IT leaders place priority on these three principles: Simplify access to data to reduce costs, improve efficiencies, and increase competitiveness. Consolidate and upgrade information and process platforms. Increase the speed of information delivery through integrated systems and visual presentation.

Simplifying access to data is a key component of this puzzle. For very large organizations, data exists in different formats in different archives across disparate departments. This can become a huge barrier to getting information in a consolidated fashion, in the appropriate formats. Some of this data needs to be extracted in its native format, and other data can be repurposed from archived statements and reports. The data can exist in legacy and newer formats, and transforming this information into a common format acceptable to any device requires organization. Finally, accelerating the delivery of information in a device-agnostic manner can only be accomplished through integrated systems that can talk to each other and deliver data in visually compelling formats. All this requires an integrated look at the enterprise architecture and information flow. While this is very much achievable, it needs to be done in a systematic manner, with solutions that can address all the requirements as well as the barriers to opening and freeing up information flow across the organization.
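As a small, hypothetical illustration of the “common format” step – the field names and CSV layout here are invented, not from any OpenText product – normalizing a legacy statement export into one device-agnostic shape might look like this:

```python
import csv
import io
import json

# Hypothetical legacy statement export; in practice this data might come
# from archives, print streams, or databases in many different formats.
legacy_csv = """acct,date,desc,amount
1001,2015-08-01,Monthly service,49.95
1001,2015-09-01,Monthly service,49.95
"""

def to_common_format(csv_text):
    """Normalize rows into one JSON shape any device or app can consume."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "account": row["acct"],
            "date": row["date"],
            "description": row["desc"],
            # store money as integer cents to avoid float display issues
            "amount_cents": round(float(row["amount"]) * 100),
        }
        for row in rows
    ]

records = to_common_format(legacy_csv)
print(json.dumps(records, indent=2))
```

Once every archive speaks this one format, the downstream sorting, grouping, and charting customers expect becomes a front-end problem rather than an integration problem.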
OpenText, with its suite of products in the Output Transformation and Enterprise Content Management areas, has the toolkit to address different parts of this challenge with industry-leading, best-in-class solutions. Download the IDG report, “Customers are demanding insights from their data. Are you ready?” to learn more about these principles of success, and how OpenText can help you deliver the interactive, digital experiences that today’s customer is demanding.

Read More