Analytics

How IoT Based Analytics Will Drive Future Supply Chain Operations

Over the past couple of years we have seen exponential growth in interest around the Internet of Things (IoT). My interest in this space started at Cisco’s IoT World Forum in Barcelona in late 2013. Back then, many software and solution vendors were only just starting to define their IoT strategies, prompted by the various analyst estimates of the value of the IoT market over the next decade. There were two interesting IoT-related announcements this week: first, GE placed all of its IT and software solutions into a new division called GE Digital. There is a slight irony here, in that this is the second time GE has done this; the first was when it established and then spun off its former IT division, which later became GXS! The second announcement came yesterday at Salesforce’s annual conference, where the company announced its own cloud-based IoT platform. So the IoT cloud market is certainly hotting up.

In 2013 I posted my first blog discussing where I believed IoT would impact supply chain operations, and judging by the small number of IoT and supply chain articles published at the time, I was early in predicting how IoT would transform tomorrow’s supply chains. Many argue that some components of an IoT environment, such as RFID tags, have been around for many years; in fact, IoT has now given RFID tags a stronger sense of purpose. Other technologies, such as big data analytics, are only just starting to be applied in the supply chain space. I see three areas where IoT will add value to supply chain operations, which I call the ‘Three Ps’ of supply chain focused IoT: Pervasive Visibility, Proactive Replenishment and Predictive Maintenance.

One aspect common to all three of these scenarios is big data analytics. Earlier this year OpenText acquired Actuate, a leading provider of embedded analytics solutions. Over the past few months we have been busy embracing the world of big data analytics and recently announced a cloud-based analytics offering. This is quite a game changer in the big data analytics market: as companies take their first steps into the world of analytics, OpenText Big Data Analytics in the cloud allows them to scale their analytics platform over time and align it with the size of the analytics project being undertaken. In fact, yesterday OpenText was ranked number three in a new report from Dresner Advisory Services, which looked at the Business Intelligence market in the context of IoT. It is worth noting that the chart and vendor analysis conducted by Dresner was carried out before the launch of our cloud-based analytics solution, so we would probably have been ranked higher than number three out of seventeen vendors. When you consider the size of the analytics market and the number of vendors in the space, this is quite an achievement for our solution, and it puts us in a good position with companies looking to process the huge volumes of data coming off millions of connected devices in the future.

OpenText Big Data Analytics is a core component of OpenText’s cloud strategy, and early last year OpenText acquired another key cloud solution provider, GXS. OpenText now operates the world’s largest B2B integration network, with over 600,000 companies connected to the network processing over 16 billion transactions per year.
Now wait a minute: 16 billion transactions! That is a lot of information flowing across our network, information that could add a lot of value to companies if they had a way of analysing the transactions in real time. As you would imagine, we are busy looking at how our Trading Grid platform could leverage the capabilities of our new cloud-based analytics solution. I have spent the past two years keeping a close eye on the IoT market, and it is great to think that our cloud-based analytics solution provides a stepping stone into the ever-growing IoT market. But what happens when you bring the world of IoT and supply chains together?

I wanted to use the following diagram to explain how OpenText Analytics and Trading Grid could, in the near future, provide support for the three supply chain scenarios I mentioned earlier: pervasive visibility, proactive replenishment and predictive maintenance. The diagram below illustrates a desktop demonstration of how consumption trends from a connected device can help to initiate a ‘purchase to pay’ process. When I say purchase to pay, I am talking about an order being created, goods being delivered and then payment being made to the supplier. Let me now break this diagram down into a few key steps.

The first stage is the connected device itself. It could be any type of connected device, but for this example I have chosen a WiFi-enabled coffee machine and, for the purposes of this demonstration, a connected coffee capsule dispenser: as you remove a capsule, it is recognized by a proximity sensor placed underneath the capsule. The second stage is to capture the consumption trends from the coffee machine. As each capsule is taken from the dispenser, a signal is sent to OpenText Analytics, which in this case is used to monitor consumption patterns; over time, trend-related information and graphs can be displayed. The key step in this process comes when OpenText Analytics detects that a certain number of capsules have been used and an order can be placed via Trading Grid for replacement capsules to be delivered by an outside supplier. This, in essence, is Proactive Replenishment, where analytics data drives the ordering process.

Back in January this year, an article on Forbes.com discussed how, in the future, connected devices would potentially be able to initiate their own procurement process, taking the manual ordering of replacement goods out of the supply chain. We are some way off achieving this at the moment, but the IoT industry is heading in this direction. For now, a trigger from OpenText Analytics would alert a user to create a purchase order for replacement coffee capsules. This ordering process would be initiated through one of our SaaS applications on Trading Grid, Active Orders, which would also monitor the end-to-end life cycle of the order. Progress of the order from the supplier to the point of delivery would be available via a mobile app. The order for the capsules is received by the supplier, represented below by a robot arm, which selects the replacement capsules from a rotary capsule dispenser and then loads them onto transport provided by the 3PL carrier. Over time, sensors on the robot arm would detect any potential failures in its operation.
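To make the Proactive Replenishment step above a little more concrete, here is a minimal, purely illustrative Python sketch of the kind of threshold logic being described. All of the names and numbers are hypothetical, and in the real demonstration the order would of course be raised through Trading Grid rather than a stub function.

```python
from dataclasses import dataclass, field

@dataclass
class CapsuleMonitor:
    """Toy consumption monitor: counts capsule-removal events and raises
    a replenishment order once stock falls below a reorder point."""
    stock: int = 50          # capsules currently in the dispenser
    reorder_point: int = 10  # threshold at which an order is triggered
    order_quantity: int = 50
    orders: list = field(default_factory=list)

    def capsule_removed(self) -> None:
        """Called whenever the proximity sensor reports a removed capsule."""
        self.stock -= 1
        if self.stock <= self.reorder_point and not self.orders:
            self.orders.append(self.place_order())

    def place_order(self) -> dict:
        # In the real scenario this would call a B2B ordering service
        # (a purchase-order API); here we just build a stub document.
        return {"item": "coffee-capsule", "quantity": self.order_quantity}

monitor = CapsuleMonitor()
for _ in range(45):          # simulate 45 capsules being taken
    monitor.capsule_removed()
print(monitor.stock, monitor.orders)
# -> 5 [{'item': 'coffee-capsule', 'quantity': 50}]
```

The point of the sketch is simply that the analytics layer only needs to watch a running count and fire an event; everything downstream is ordinary order processing.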
From a maintenance point of view, the operational information coming from the sensors on the robot arm would be fed into our analytics platform, and over time you would be able to predict when a part of the robot is likely to fail. In the real world you would then initiate a repair before the robot fails, so your supply chain operations are not interrupted in any way. This is a perfect example, albeit scaled down, of how IoT can drive Predictive Maintenance procedures. In fact, predictive maintenance is widely regarded as one of the most important industrial applications for IoT at this moment in time.

For the purposes of this example the 3PL carrier is operating a model train, which carries the capsules to the coffee machine on the other side of the table. The location of the train would be monitored via an RFID tag attached to it. The potential for improving end-to-end supply chain visibility using IoT and connected 3PL providers is huge, and Cisco and DHL recently released a white paper discussing this opportunity. RFID tags are used here for the purposes of the demonstration, but in real life a combination of RFID tags and GPS devices would be used to track shipments. The ability to connect every piece of supply chain equipment, whether forklift trucks, lorries or pallets, will transform supply chain visibility and contribute towards Pervasive Visibility across an end-to-end supply chain.

So there you have it: a very simple example of how IoT could impact future supply chains. The IoT market is moving incredibly quickly and who knows what new technology will be introduced over the coming years, but one thing is for sure: OpenText can now provide two key components of the IoT-enabled supply chain, OpenText Big Data Analytics and OpenText Trading Grid. The world of B2B integration just got exciting.

Read More

Exciting Times for Payments Professionals!

There can be no debating the fact that the past six months have seen more change in the payments world than any six-month period in history. So many new developments are underway or proposed; so many exciting possibilities for a faster, safer, more efficient, more competitive and more customer-friendly payments environment. Pick a region, pick a form of payment, pick a back-office or settlement function, pick a channel – it’s all changing. The vast scope of change in payments is going to impact banks, regulators, payment system operators, technology service providers, merchants and businesses of all sizes, and consumers. It is going to fundamentally change relationships and business practices that have remained somewhat static, despite technological advances, for centuries (yes, centuries!). The thought of how any one entity incorporates these changes into its business to maximize the benefits while minimizing disruption is overwhelming.

So, as we enter this environment in which everything seems to be changing in overlapping timeframes, what is a payments professional to do? How does anyone who is concerned with making or receiving payments (isn’t that all of us?) make sense of this? No doubt, one could spend all day reading payments newsletters, watching webinars, reading blogs like this one, and following experts on Twitter. Unfortunately, that’s just not practical for anyone other than a full-time payments geek. For the rest of us, it means relying on partners whose business it is to ensure that your particular organization is prepared for the changes that are most relevant. That might be a bank, a trade association, a technology provider, a regulator, an industry analyst or someone else. In fact, for most of you, it probably involves several of those.

No matter what your role in payments, the changes will be vast, but they boil down to a few themes:

- Electronic payments will be processed, cleared and settled more quickly
- New channels will be introduced, providing consumers and businesses with easier-to-use and more convenient methods for initiating payments and processing payment information
- Security methods will increasingly rely on biometrics and less on passwords
- Global standards (specifically ISO 20022) will be adopted in virtually all payment systems in the developed world and in many emerging markets, facilitating interoperability for cross-border payments
- Competition from non-traditional players will become the norm in (almost) every aspect of payments, resulting in massive regulatory changes to ensure the safety and soundness of payment systems
- An increasing focus on payment-related data will provide better information and context to all participants, for individual transactions as well as for trend analysis

Successful navigation of the changes that are coming requires a comprehensive strategy to future-proof your payments environment. Where to start? It’s always best to take a fresh look at your existing environment, understanding how you currently use the existing payment systems and identifying areas for improvement. Then start to engage the partners who are closely following and/or are involved in the most relevant efforts to change payments. Attend industry events such as SIBOS 2015 in Singapore or the 2015 AFP Annual Conference in Denver. Engage with NACHA’s Payments Innovation Alliance or the EBA or whichever national or regional organization is devoting time to payments modernization where you operate. It is an exciting time to be a payments professional.
That just might be the understatement of the year!

Read More

Data Driven Digest for August 28: Treemaps

The first treemap was created around 1990, for a reason that seems laughable today: 14 people in a University of Maryland computer lab shared an 80-megabyte disk drive, and one of them – professor Ben Shneiderman – wanted to know which individuals, and which files, took up the most space. After considering circular, triangular, and rectangular representations, Prof. Shneiderman came up with the nested, colored rectangle format we use today. (His history of treemaps is fun reading if you  want to learn more.) Of course, treemaps have proven valuable for much more than determining  who’s hogging the hard drive, as evidenced by the examples below. Leaves of Green: The venerable line chart was the most common representation of Monday morning’s precipitous drop in the U.S. stock market, but we gravitated toward the treemap above, published by finviz. This visualization represents the stocks in the S&P 500, categorized by sector and sized by market cap; color represents performance. (This screenshot was taken at 9:45 a.m. PDT on Monday; look at all that red.) You can hover over any individual stock to get details, including a comparison against other stocks in its sector; simply double click to get a detailed report. You can also change the time horizon and other metrics the chart displays.  It’s endlessly interactive and interesting to use. Repping Fragmentation: Though app developer OpenSignal is best known for its wireless coverage maps, the company also does a great job of collecting non-geographic data from its apps.  The treemap above is from an OpenSignal report about device fragmentation in the Android market published earlier this month. The treemap catalogs the 24,093 different types of Android devices that downloaded the OpenSignal app over just a few months in 2015. (No wonder your mobile app-developer friends look tired all the time.) The various colors represent device brands; hover over any square to see the make and model of the device. The segments of the treemap sort from large to small – upper left to lower right – but another visualization later in the report presents the same data sorted by brand. You can also see how much the market has changed from August 2014 to August 2015 with the click of a button. All Wet: The Food and Agriculture Organization of the United Nations collects copious data in its mission to eliminate hunger, fight poverty, and support sustainability. A recent report on global irrigation uses treemaps very effectively to visualize Big Data about water use in agriculture: what irrigation technologies are in use,  what  sources of water are tapped, and how much geographic area is under irrigation worldwide. (The unit for determining rectangle size in the treemap is hectares, not water volume.) A click anywhere on a treemap brings up the data underlying the chart, including links to download the data for your own analysis. Getting cultured (bonus item): Pantheon, a website created by MIT Media Lab in 2014, shows how a treemap can support the study of history. Pantheon visualizes “historical cultural production” – you choose a country, a date range, and other parameters, and the site creates a treemap showing well-known people from that country, grouped by domain. (Domains include historian, religious figure, and pirate. Yes, pirate.)  Or you can flip this around: start with the domain, choose a timeframe, and discover which countries that have produced the most prominent people. In all cases, more details are a click away. 
The concept of Pantheon is easier to understand than to explain, so if you don’t get it from this write-up, click through and play with it. By the way, the Custom Visualizations capability in OpenText Information Hub (iHub) enables you to create treemaps from your own data in iHub using either D3.js or Highcharts. Check out the post about using Custom Visualizations.  
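If you want to experiment with the treemap idea outside of iHub, the layout itself is easy to play with. Here is a small Python sketch using the open-source squarify package (my choice for illustration; any treemap library would do). The disk-usage numbers are invented, as a nod to Shneiderman's original use case.

```python
# pip install squarify
import squarify

# Hypothetical disk usage per user, in megabytes, sorted descending
# as the squarify layout algorithm expects.
labels = ["ben", "alice", "raj", "mei", "tom"]
sizes = [32.0, 20.0, 14.0, 9.0, 5.0]

width, height = 100.0, 60.0                       # drawing area
normed = squarify.normalize_sizes(sizes, width, height)
rects = squarify.squarify(normed, 0.0, 0.0, width, height)

# Each rectangle's area is proportional to the value it represents.
for label, rect in zip(labels, rects):
    print(f"{label:>6}: x={rect['x']:.1f} y={rect['y']:.1f} "
          f"w={rect['dx']:.1f} h={rect['dy']:.1f}")
```

Feed the resulting rectangles to any drawing layer (matplotlib, D3.js, Highcharts) and you have a treemap.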

Read More

Here’s Your No. 1 Tool for Fraud Detection

In the song Talk About the Blues, the experimental band The Jon Spencer Blues Explosion declares that “the blues is number 1.”  If you’re a blues fan, you probably associate the number 1 with Jon Spencer. But if you’re interested in fraudulent or anomalous numbers, you might just associate it with Frank Benford. Here’s why: Back in 1938, Benford observed that 1 is more likely than any other number to be the first digit (also called the most significant digit) of a natural number. He determined this based on analysis of 20 different sources of numbers, ranging from articles in Reader’s Digest to population sizes to drainage rates of rivers. At a remarkable rate, the first digit of the numbers Benford studied was either a 1 or a 2 – and most frequently a 1. But Benford wasn’t the first person to discover this. The astronomer Simon Newcomb noticed it in 1881, while thumbing through books of logarithm tables. Newcomb noticed that pages with log tables for numbers beginning with 1 or 2 were grubbier than the pages for numbers beginning with 8 or 9. After some mathematical exploration, Newcomb proposed a law stating that natural numbers were much more likely to begin with a 1 or a 2 than with any other digits. In fact, he said that natural numbers had 1 as their first digit about 30 percent of the time. Newcomb’s observation wasn’t discussed much for more than 50 years. Then Benford (who worked  as a physicist for the General Electric Company), tested Newcomb’s law on 20 different data sets. Based on his calculations (his distribution is shown above), Benford declared that Newcomb’s law was “certain” – and, without hesitation, he applied his own name to the phenomenon. (Smart guy!) Now known as Benford’s Law, the idea has come to acquire an aura of mystery. After all, if a collection of numbers is truly natural – that is, “occurring commonly and obviously in nature” – shouldn’t their first digits be identically distributed across all numbers from 1 to 9? Benford’s Law is mysterious, yes, but it works.  It’s now widely used by investigators looking for fraudulent numbers in tax returns, credit card transactions, and other big data sets that are not aggregated. Obviously, it doesn’t work with dates, postal codes and other “preformed” numbers. A helpful basic discussion of Benford’s Law is available via Khan Academy. Its simplicity makes Benford’s Law really easy to apply to automatic audit procedures. You only need to compare the first digit of the set of numbers that you want to analyze against the distribution in Benford’s Law. If certain values in your data deviate from what Benford’s Law dictates, those numbers probably aren’t natural. Instead, they may have been invented or manipulated, and a deeper analysis is required to find the problem. For example, consider the distribution shown in the dots and black line on the chart below. Compare them to the blue bars and numbers, which represent Benford’s distribution. You can clearly see (because we’ve circled it in red) that more than 20 percent of the numbers in the data set have 2 as their significant digit, even though Benford’s Law says they should represent less than 18 percent. This tells us something is fishy, and it may be worthwhile to dig deeper into the underlying numbers. This kind of analysis doesn’t have to end with the most significant digit. 
You can also analyze the second, third and fourth digits; each has its own distribution that will allow you to isolate possible fraudulent numbers from millions of legitimate transactions for further analysis. You can apply Benford’s Law to your data really easily with OpenText Actuate Big Data Analytics. Just follow this step-by-step guide in our Free Trial Resource Center.  Cross this information with incomes, tax returns, revenues, and financial transactions. If there is something strange or fraudulent in your data, you will find it. And instead of singing the blues like the lip-syncing actors below, you’ll sing the praises of Frank Benford. Number One image by Timothy Krause, via Flickr. 
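If you want to try the first-digit check yourself before reaching for a full analytics product, the arithmetic behind Benford's Law fits in a few lines of Python. The transaction amounts below are made up, and a real audit would need a far larger sample before flagging anything.

```python
import math
from collections import Counter

def first_digit_distribution(values):
    """Return the observed share of each leading digit 1-9."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_expected(d):
    """Benford's Law: P(first digit = d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

# Hypothetical transaction amounts to audit (a real sample would be much larger).
amounts = [1240.50, 193.20, 87.10, 2150.00, 1020.99, 14.75, 310.00, 1780.40]
observed = first_digit_distribution(amounts)

for d in range(1, 10):
    gap = abs(observed[d] - benford_expected(d))
    flag = "  <-- check" if gap > 0.05 else ""
    print(f"digit {d}: observed {observed[d]:.2%}, expected {benford_expected(d):.2%}{flag}")
```

Digits whose observed share drifts well away from the expected curve are the ones worth a deeper look.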

Read More

Data Driven Digest for August 21

The second* data visualization we all learned in school was the Cartesian coordinate system.  By plotting figures on a two-dimensional graph, we learned the relationship between numbers and space, unlocked patterns in those numbers, and established foundations for understanding algebra and geometry. The simple beauty of X-Y coordinates belies the power they hold; indeed, many of the best data visualizations created today rely on, and build upon, on the Cartesian plane concept to show complex data sets. Here are three examples. (Note that none of these are textbook Cartesian visualizations, because the X and Y axes represent different units.) Back to School: Our favorite “Data Tinkerer,” Randy Olson, published a blog post this week exploring correlations between earnings, gender, and college major. Using data from the American Community Survey (and building on a FiveThirtyEight article by Ben Casselman), Olson created the graph above to show his findings. Then he generated a variety of graphs (one of which is below) that fit a linear regression onto the data and add bar charts along the graphs’ sides to show quantity along both axes. The results very effectively illuminate more aspects of the same data in a very efficient format. Statistically Significant: Scientists are sometimes accused of adjusting their experiments to yield the answers they want. This practice is called p-hacking (for p-value) and is explained in a fine FiveThirtyEight article by Christie Aschwanden, Science Isn’t Broken – It’s just a hell of a lot harder than we give it credit for. The article is accompanied by the endlessly fun interactive shown above; click through to play with it. As you add or subtract parameters, the data on the Cartesian plane and the linear regression of that data change before your eyes. If you can find a connection that yields a p-value of 0.05 or less, Aschwanden says, you have data that’s suitable for publishing in an academic journal. Click here for a great explanation of p-values. Business Time: At the Harvard Business Review, Ronald Klingebiel and John Joseph delved into whether it’s better to be a pioneer or a follower by studying a very specific slice of data: German mobile-handset makers in the years 2004-2008. Their chart (above) plots many manufacturers along two axes; the number of features on the x axis, and the month of entry into the market along the Y axis. Klingebiel and Joseph then highlight two companies that succeeded (Samsung and Sagem) and two that didn’t (HP and Motorola). The authors’ hypothesis was that a handset manufacturer was more likely to succeed if it came to market early with lots of features, or if it arrived later with fewer, better-focused features. The chart, while very good, would benefit from interactivity; I’d like to hover on any dot to get the company name, and click any dot to get details of how that company performed. Without this context, I must rely on the authors’ definition of success. * The number line is the first data visualization I recall using. 
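For readers who want to reproduce the fit-a-line-and-check-the-p-value workflow that runs through these examples, here is a minimal Python sketch using SciPy. The data is synthetic and only loosely mimics the earnings-versus-major relationship in Olson's post; none of the numbers come from the articles above.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(42)

# Synthetic data: median earnings (y) versus share of women in a major (x).
share_women = rng.uniform(0.1, 0.9, size=80)
earnings = 60_000 - 25_000 * share_women + rng.normal(0, 6_000, size=80)

fit = linregress(share_women, earnings)        # ordinary least-squares line
print(f"slope     = {fit.slope:,.0f} dollars per unit share")
print(f"r-squared = {fit.rvalue**2:.3f}")
print(f"p-value   = {fit.pvalue:.2e}  (the 0.05 threshold discussed above)")
```

The interactive in Aschwanden's article is essentially this calculation rerun every time you add or drop a variable, which is exactly why p-hacking is so easy to do.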

Read More

Data Driven Digest for August 14

Forgive us in California for being obsessed with water. Our unprecedented drought has caused our brains to focus on the wet stuff, so that’s the theme of this week’s Data Driven Digest. In three data visualizations, we dive into what you would see looking west or east across the ocean; the contours and makeup of the seabed; and the width of rivers throughout North America. Grab a cool glass of water and take a look. Look out: Next time you visit an ocean beach, take a few moments to ponder what’s due east or west from you. Then check out What’s across the ocean from you when you’re at the beach, in 7 fascinating maps, created by Weiyi (Dawn) Cai and Ana Swanson of the Washington Post. The beautiful maps are full of surprises; for example, I was startled to see that Japan is as long (from north to south), as the entire United States, and that Boston and Spain share the same latitude. Thanks to my colleague Michael Singer (@MichaelSinger) for suggesting this item.   Under the Sea: Want a glimpse at what lies beneath the ocean’s surface? Check out the first digital map of the earth’s seafloor, created by scientists in Australia. The map, which available in an engrossing interactive version, shows the contours of the seafloor and the sediments that cover it – ranging from calcareous ooze (light blue in the screenshot above) to sand (bright yellow). Some 14,500 data points, collected over 50 years, were compiled by researchers at University of Sydney in Australia; big data experts at National ICT Australia (NICTA) used the support vector machine model for digitization and mapping.  H/T: I learned of this map via Mia De Graff of the Daily Mail.   Moving upstream: If your travels take you to rivers rather than the ocean, we have a data visualization for you, too: Hydrologists George Allen and Tamlin Pavelsky of the University of North Carolina have spent years compiling a database of North American river widths. Their painstaking task started with countless images from Landsat satellites; they combed through them to find more than 1,750 without ice or cloud cover; then, controlling for seasonal changes, they used Rivwidth, a software program created by Pavelsky and Laurence Smith of UCLA, to calculate the width of every river. Joshua Stevens (@jscarto) of the NASA Earth Observatory then turned their data into a 2050 x 1355-pixel map; click the cropped version above to see it, and read more (including why they did it) at NASA’s Earth Observatory blog.  
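For anyone curious what "used the support vector machine model" means in practice, here is a toy Python sketch of the general idea (classifying sediment types from point samples) using scikit-learn. The data and labelling rule are entirely made up and bear no relation to the actual NICTA pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-in for the seafloor dataset: (latitude, longitude, depth) features
# with a sediment-class label. The real study interpolated ~14,500 samples.
X = rng.uniform([-60, -180, 0], [60, 180, 6000], size=(500, 3))
y = np.where(X[:, 2] > 4000, "calcareous ooze", "sand")   # fake labelling rule

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
```

Trained on real samples, a classifier like this can predict the sediment class for every unmapped cell of the ocean floor, which is what turns 14,500 points into a continuous map.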

Read More

Data Driven Digest for August 7

We love Twitter. We love to tweet, we love to follow data visualization experts, and we’d love to have you follow us at @OT_Analytics. But for all of Twitter’s fun and value, it often makes headlines for the wrong reasons: its users don’t understand it, it’s a takeover target, its corner office has an ejector seat. How Twitter (the company) will address those challenges, nobody knows. But we do know this: Twitter (the service) is a tremendous data source, ripe for analysis and visualization, thanks to some  very cool APIs. Here are three sites that make interesting use of Twitter data.  Do you have a different favorite? Suggest it in the comments. Feeling It: Wondering how the Twitterverse feels right now? Go to the sentiment viz created by Christopher Healey (@chris_g_healey) of North Carolina State University and type in a word or hashtag. (For the screenshot above, we searched for #dataviz.) You’ll see recent tweets using your term, plotted on an oval graph designed to visualize emotional affect. The secret sauce is a sentiment dictionary that Healey uses to analyze the tweets. The application also groups tweets by topics, groups them in a heatmap, creates tag clouds, organizes tweets by time and geography, and shows affinity between tweets, people, hashtags, and URLs.  Be sure to read Healey’s detailed write-up of the project. Chit Chat: Data scientists at Booz Allen hosted a Twitter chat in conjunction with Data Innovation Day earlier this year. For several hours, more than 170 people talked about data science in 140-character chunks, using the hashtag #datascichat. Because it was a conversation among data scientists, it’s almost inevitable that one of the participants, Marc Smith of Connected Action (@marc_smith), went on to visualize the conversation (above).  Click through to see Smith’s graph full-size, read a description of the process, and see the source data. There’s even an interactive version to explore. See Through: Twitter takes pride in its transparency. It receives many requests from governments and legal bodies for information about its users, as well as requests to remove information and enforce copyrights. (Not all requests are honored.) All of these requests are cataloged and visualized in a transparency report, compiled semiannually. In the interactive report, you first choose the type of request (information requests, removal requests, and copyright notices). Then you hover over a map to see how many requests came from a given country and how many of those requests Twitter chose to comply with. Thanks to Katharina Streater (who’s infrequently on Twitter at @KatStreater) for submitting this example.  
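Healey's tool relies on a published sentiment dictionary and plots tweets by valence and arousal; as a purely illustrative toy, here is the dictionary-scoring idea in a few lines of Python. The mini-dictionary and the tweets are invented.

```python
# Toy dictionary-based sentiment scoring: each word carries a pleasure
# (valence) score, and a tweet's score is the mean over known words.
SENTIMENT = {  # hypothetical mini-dictionary, scores in [-1, 1]
    "love": 0.9, "great": 0.8, "fun": 0.7,
    "meh": -0.1, "broken": -0.7, "hate": -0.9,
}

def score_tweet(text):
    words = [w.strip("#@.,!?").lower() for w in text.split()]
    known = [SENTIMENT[w] for w in words if w in SENTIMENT]
    return sum(known) / len(known) if known else None

tweets = [
    "I love #dataviz, this treemap is great fun",
    "my dashboard is broken and I hate it",
]
for t in tweets:
    print(f"{score_tweet(t):+.2f}  {t}")
```

Real systems use dictionaries with thousands of rated words and handle negation and slang, but the basic mechanics are this simple.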

Read More

Moving Content to iHub, Trial Edition

Remember the last time you moved from one home to another? Moving can be a pain; it’s time-consuming and stressful, so you probably blocked it from your memory. You definitely didn’t enjoy the process. Fortunately, it’s quick and easy to move content to the just-released OpenText Actuate Information Hub (iHub), Trial Edition from an old instance of iHub, Free Edition (formerly known as BIRT iHub F-Type). You simply download content from one application and upload it to the other. (Kind of like packing and unpacking your possessions, but without all the hassle and bubble wrap.) This post walks you through the process. Download Content and Resources from iHub, Free Edition Downloading content from iHub, Free Edition is easy using iHub’s Information Console. Navigate to the Information Console and select the files and/or folders that you want. By default they’re placed in the Applications folder, but you may have stored them elsewhere. Once you’ve found and selected your content, select Download from the Action drop-down menu, as shown below. If you’re downloading folders of content, or multiple files, iHub will create a zip file containing everything you’ve selected. If you download a single file only, your content will not be in a zip file. In either case, the content will be downloaded into your browser’s download location – typically the download folder. (Make note of its location.) If your content requires shared resources – like Data Objects, data stored in flat files, or libraries – you’ll need to download those things, too. The Resources folder is the default location for these resources. Select the needed items and download them using the same process you used for downloading content. Upload into iHub, Trial Edition Now launch iHub, Trial Edition and navigate to the Information Console. Then click the Upload button (marked with a red arrow in the screenshot below). You will then be taken to the upload file selection screen. Navigate to the content you previously downloaded, and select it. You’ll need to upload content and resources separately. If you’re taking content from a zip file, make sure you select the box next to Expand archive (ZIP or TAR format file) as shown below. Keeping Folders Organized A zip file created by iHub (or any zip file, typically) includes a folder path inside the zip. For example, if you download a folder named CallCenterApp that is within iHub’s Applications folder, the zip will contain an Applications folder that has a CallCenterApp folder in it. When you upload the zip file to the Applications folder in iHub, you will end up with an Applications folder within iHub’s Applications folder (as shown below), and the CallCenterApp folder will be in it. This may cause some problems with when you run the content in CallCenterApp. You can easily fix this problem by moving the CallCenterApp folder up a level, to iHub’s Application folder. Just select CallCenterApp, then choose Move To from the Action drop-down menu, as shown below. You will then get the dialog box shown below. Use it to move the folder and all of its contents to a location immediately under the Applications folder in iHub. When you’re done, you can then delete the extra Applications folder. (This is also done using the Actions drop-down menu.) Another Alternative: Use Analytics Designer You can also use the free Analytics Designer to move content between iHub, Free Edition and iHub, Trial Edition. Download the Analytics Designer here, install it, then load your project into it. 
You can read more about loading content directly into Analytics Designer in the Downloading files section of the Analytics Designer documentation. Once your project is in Analytics Designer, you can publish it directly into iHub. For more information on how to do this, please see the Deploying applications section of the iHub documentation. If you run into any issues while moving your content, please leave a comment here or post your question in the Developer Forums. Good luck with the move, and thank you for using iHub, Trial Edition! Image credit: Flickr user TheMuuj.
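One small extra tip on the folder-nesting issue described above: before uploading, you can peek inside the downloaded archive to see whether it carries an extra leading Applications folder. A short Python sketch (the file name is hypothetical):

```python
import zipfile

# Inspect the archive downloaded from iHub before uploading it, so you can
# see whether its paths start with an extra "Applications/" prefix.
archive = "CallCenterApp_download.zip"   # hypothetical path

with zipfile.ZipFile(archive) as z:
    for name in z.namelist():
        print(name)   # e.g. Applications/CallCenterApp/...
```

If every path starts with Applications/, you know in advance that you will need the Move To step described above after uploading.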

Read More

Big Data Analytics: What It Is, Why You Want It

“The promise of big data is the ability to make predictions based on it. Information can be viewed and analyzed, trends can be understood, and correlations can be plotted. The challenge, of course, is finding the value, especially when content volume is increasing.” – Mark Barrenechea in Digital: Disrupt or Die

With our CEO’s challenge in mind, we’re proud to announce OpenText Actuate Big Data Analytics 5.2, the newest version of our advanced analytics software appliance. The new version rises to the Big Data challenge and improves the end-to-end journey that data takes:

- The journey starts with input: there’s a new, visual way to load data into Big Data Analytics
- It continues with data engineering: we’ve added more ways to work with data in the product
- The journey culminates with output: Big Data Analytics has a new, efficient way to place the results of analysis in users’ hands

This blog post explores these enhancements and answers two central questions about each of them: What is it? And why would you want it?

Input: New Data Loading Module

What it is: We have enhanced the way you load data into Big Data Analytics and expanded the number and variety of data sources you can analyze. A new data loading module lets you select from more than a dozen different data source types with a click (including a few new sources; more on that in a bit), and then link those data sources for analysis by drawing lines from table to table.

Why you want it: Loading data into Big Data Analytics’ Fast DB is the precursor to all data analysis. Previous versions of Big Data Analytics used a tool called qLoader that required a level of technical expertise that not all business users have. The new data loading module removes that hurdle and makes the process much more visual and intuitive. (qLoader is still there for IT to use.)

The new version also lets you load more types of data into Big Data Analytics. You can now load data directly from Microsoft Excel spreadsheets without first exporting to CSV. You can also bring in data from another Big Data Analytics instance. That second option unlocks some intriguing potential use cases. Take, for example, a company that has massive volumes of corporate data but wants to run analytics at the department level. In that situation, a large, primary instance of Big Data Analytics could be used for initial data engineering and refinement, and the departmental instances could then tap into that engineered data to perform focused, specific analysis – without risking accidental damage or rogue engineering of corporate data. The opposite case is possible, too: data from multiple departmental instances of Big Data Analytics could be combined into a large central instance for analysis that spans the whole company. However you look at it, this new capability improves efficiency, scalability, data quality, and other important practices.

Data Engineering: Regular Expressions for Search and Replace

What they are: Regular expressions provide powerful pattern matching and search-and-replace within data strings. Big Data Analytics provides two types of regular expressions: REGEXMATCH to find patterns within strings, and REGEXREPLACE to find patterns within strings and replace them.

Why you want them: Regular expressions enable Big Data Analytics users to engineer data prior to and during analysis.
The REGEXMATCH function lets you determine that a string matching a pattern exists in your data (as a precursor to further analysis), and REGEXREPLACE lets you find strings that match a specified pattern and replace them with another string (to put all dates in a consistent format, for example). If you’re not familiar with regular expressions, Wikipedia explains what they are and how they work.

Output: REST API Enables Embedding of Analytics Results

What it is: The REST API in Big Data Analytics enables the results of analysis to be ported into other applications.

Why you want it: There are plenty of circumstances in which we need to show just the results of Big Data Analytics. (By results, we mean the Venn diagrams, Pareto analyses, forecast charts, linear regression visualizations, and such – without all of the algorithms that generated those results.) The REST API enables Big Data Analytics results to be exported in JSON (JavaScript Object Notation) format, which allows that content to be embedded in any web-based application. With the REST API, Big Data Analytics results can appear on any platform, including tablets, smartphones, and even many wearable devices. The REST API in Big Data Analytics comes with interactive documentation and a test sandbox, built on Swagger, which delivers a complete framework for describing, producing, consuming, and visualizing REST web services. Its look and feel are identical to those of the REST API in iHub.

What’s In It For You?

When combined with the powerful algorithms and performance that Big Data Analytics is known for, these enhancements make Big Data Analytics users more agile, flexible, and self-sufficient. That translates into business benefits such as increased revenue, improved services, and optimized operations. Most important, Big Data Analytics helps you unleash the value in your data. So if you’re ready to take your data on an analytics journey, download Big Data Analytics 5.2 today. Use it for free for 30 days; you can load in your own data, engineer it, analyze it, and embed the results in other applications. You’ll see that your data journey is just beginning.
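The exact syntax of REGEXMATCH and REGEXREPLACE is covered in the product documentation, but the underlying idea maps directly onto standard regular expressions. As a hedged illustration, here is the same match-then-normalize pattern (the date example mentioned above) written with Python's re module:

```python
import re

dates = ["2015-08-28", "28/08/2015", "Aug 28, 2015"]

# REGEXMATCH-style check: does the string already use ISO format?
iso = re.compile(r"^\d{4}-\d{2}-\d{2}$")
print([bool(iso.match(d)) for d in dates])     # [True, False, False]

# REGEXREPLACE-style rewrite: normalize DD/MM/YYYY to YYYY-MM-DD,
# leaving strings that don't match the pattern untouched.
normalized = [re.sub(r"^(\d{2})/(\d{2})/(\d{4})$", r"\3-\2-\1", d) for d in dates]
print(normalized)                              # ['2015-08-28', '2015-08-28', 'Aug 28, 2015']
```

The same pattern-plus-replacement pair is what you would hand to the product's regular expression functions during data engineering.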

Read More

Data Driven Digest for July 31

Data visualization exists to help us learn things from, and about, data. It helps us answer questions like these: What are the trends in the data? What data is most common and most unusual? What information is important and what’s not? Appropriately, this week’s Data Driven Digest is all about learning: Learning about words, learning about complex concepts, and learning about education. We can all take a lesson or two from these three great examples. Warning: We get a little geeky this week. Word Nerds: We love data visualizations that analyze words, and a fabulous example appeared this week. The mysteriously named Abacaba has created the beautiful, almost hypnotic video above to visualize word frequency in written English. The video is based on the work of Peter Norvig, director of research at Google; Norvig analyzed the entire contents of Google Books, finding 97,565 words, used 743,842,922,321 times. His paper on the project, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU, is charmingly written and great reading.  Abacaba has two similarly wonderful videos on a YouTube channel. Know the Code: Do you want to understand machine learning, but don’t know where to start? You can’t do better than R2D3.us, where you’ll find A Visual Introduction to Machine Learning. We love everything about this site: Its fluid, data-driven storytelling; its clear, interactive data visualizations (a static snippet of which is above, but click through for the full, animated experience); even its clever pun name. (Its creators are Stephanie Yee, who interprets R2, and Tony Chu, who visualizes data with D3; both work at Sift Science.) We can’t wait for the next post. School’s In: Thinking about pursuing a degree in data science? Ryan Swanstrom recently compiled a huge list of academic programs globally, and Alex Salkever used Swanstrom’s data to build an interactive site (on Silk.co) that makes the data fun and easy to explore. Salkever wrote about his project on KD Nuggets, noting “While data science is probably closer to computer science than anything, business schools offer more programs than any other department type.” If you’re not ready to go back to school, another place to start is our blog post, 7 Free or Cheap Ways to Learn Data Science.  
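If Norvig's word counts inspire you to try something similar on your own text, the core of the technique is just tokenizing and counting. A tiny Python sketch, using this post's own opening lines as the sample text:

```python
import re
from collections import Counter

text = """Data visualization exists to help us learn things from, and about,
data. It helps us answer questions like these: what are the trends in the data?"""

# Lowercase, pull out alphabetic tokens, and count them.
words = re.findall(r"[a-z]+", text.lower())
top = Counter(words).most_common(5)
print(top)   # e.g. [('data', 3), ('us', 2), ('the', 2), ...]
```

Norvig's analysis is the same idea applied to the whole of Google Books, which is why the engineering, rather than the counting, is the hard part.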

Read More

Accelerating B2B Managed Services with SAP

For those of you new to OpenText but not new to GXS, you may not know that OpenText has been a close partner of SAP for twenty years. In fact, SAP has awarded OpenText its prestigious SAP Pinnacle Award for the last eight years in a row. OpenText is the established leader in managing unstructured information in the context of business processes in an SAP environment. Now, SAP and OpenText have announced an extension to this already successful partnership to include OpenText B2B Managed Services. OpenText B2B Managed Services is an outsourced solution for managing the day-to-day B2B operations required when exchanging a wide variety of transactions with trading and business partners. Running on our B2B integration platform, the OpenText Trading Grid, part of the OpenText Cloud, the offering extends the SAP Business Network with reach to the more than 600,000 trading partners currently connected to the Trading Grid. This partnership is a recognition of OpenText’s leadership in B2B integration. Of course, supporting SAP with OpenText B2B Managed Services is nothing new: OpenText/GXS has worked with many companies around the world to manage their SAP and B2B integration projects. Whether it is helping companies integrate with multiple global instances of an SAP platform or providing integration to a newly installed instance of SAP, OpenText has a wealth of experience managing such projects.

Read More

Data Driven Digest for July 24

Marriage is on the minds of data scientists and data journalists this summer. That’s our suspicion, based on the many blog posts, articles, and data visualizations that have come onto the radar at the Data Driven Digest. We’ve picked three favorite examples that – pardon the pun – wed  serious data science with attractive, clear data visualization. Do any these ring a bell with you? State of the Union:  “So, where are you from?” It’s not just a pick-up line; the answer may well help you determine whether or not a person is marriage material. At least you might deduce that from the great interactive map (screenshot above; click through to explore) published by David Leonhardt and Kevin Quealy on the New York Times Upshot. Mouse over any U.S. county and see whether its residents are more or less likely to be married by age 26 than the national average. Leonhardt and Quealy’s thoughtful piece also examines and visualizes marriage rates according to political and economic factors. Ups and Downs: We all know, anecdotally, that marriage rates in the United States are declining and that about half of marriages end in divorce. Randy Olson has gone beyond the anecdotes to get actual data, and has written a thorough blog post on his findings.  The charts accompanying his post (including the one above) are a model of clarity.  Olson scraped data from years of old reports generated by the National Center for Health Statistics (NCHS), and has generously posted the resulting cleaned-up dataset for everyone to explore. Working It: On NPR’s Planet Money blog, Quoctrung Bui published a post about marriage and work – more specifically, the employment status of spouses in heterosexual marriages. The primary chart (above) shows the rise – and slight fall – of two-income households from 1968 to 2013, but be sure to click through to read the entire, illuminating post. One thing the chart exposes is the slow but steady growth of married households in which only the wife works; that’s the light blue section at the top of the chart. Bui’s chart is quick and easy to read, and earns bonus points for tweaking the traditional pink=female, blue=male color scheme. (Coincidentally, the pink/blue topic was recently explored in detail by Andy Kirk.) Party On (bonus item): It’s unrelated to marriage, but here’s a holiday that readers of the Data Driven Digest can celebrate: World Statistics Day, October 20, 2015. To mark the holiday, Unite Ideas – an open data project of the United Nations – has launched a new challenge, called #WSD2015 Data Visualization Challenge. (Yes, that’s the name, complete with the hashtag.)  The UN provides Millennium Development Goals (MDG) data, which tracks education, sustainability, poverty, and other stats from around the globe.  Entrants are challenged to “visualize MDG data to answer a question relevant for development policy,” and are encouraged to blend in other datasets.  It’s kind of wonky, but the challenge provides an opportunity to use data science and visualization skills for good. (You can’t argue with the tagline, “Better data. Better lives.”) Deadline for entries is September 20, 2015.

Read More

Installing iHub, Trial Edition on Windows

OpenText Actuate Information Hub (iHub), Trial Edition is a free evaluation version of OpenText’s enterprise-grade data visualization server that you can download and use to bring your report designs and applications to life. This blog will walk you through the process of installing the software on Windows so you can begin your iHub projects and application deployment. Before getting started, let’s take a quick glance at the system requirements and supported operating systems.

System Requirements
- Windows: x64 compatible, Pentium 1 GHz or higher
- RAM: 8 GB minimum
- Free Disk Space: 3 GB minimum

Supported Operating Systems (64-bit only)
- Windows 7 Professional, 7 SP1 Professional, 8.1
- Windows Server 2008, 2008 R2, 2012 Standard, 2012 R2

Download iHub, Trial Edition using the icon above. After the download is complete, launch the executable file. A welcome message will appear; press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation; the default is C:\Actuate3. Then, on the Windows Services screen, use the default “Local System Account” and click Install. Several screens of information about iHub will appear as the installation progresses. After the install process has completed with an “Installation was successful” screen, select the “Launch iHub, Trial Edition” checkbox and click Finish. Your default browser will launch and connect to http://localhost:8700/iportal, and a shortcut to that same address will appear on your desktop. To log into iHub, use “administrator” as the username and a blank password. The trial period lasts 45 days; after you log into iHub, an indicator showing the days remaining will display in the upper right-hand corner of the page.

Additional Installation Notes: To install iHub, Trial Edition on Windows, the Windows user must:
- Be a member of the Windows Administrators group or, at a minimum, have all data and network permissions required to run iHub applications
- Have the “log on as a service” privilege

Thanks for reading. Now it’s time to unleash the full power of Analytics into your application. If you have any questions or comments, please feel free to use the comments section below or visit the Working with iHub forum.

Read More

Installing iHub, Trial Edition on Linux

OpenText Actuate Information Hub (iHub), Trial Edition is a free evaluation version of OpenText’s enterprise-grade data visualization server that you can download and use to bring your report designs and applications to life. This blog will walk you through the process of installing the software on Linux so you can begin your iHub projects and application deployment. Download iHub, Trial Edition using the icon above. Before getting started, let’s take a quick glance at the system requirements and supported operating systems.

System Requirements
- Linux: x64 compatible, Pentium 1 GHz or higher
- RAM: 8 GB minimum
- Free Disk Space: 3 GB minimum

Supported Operating Systems
- CentOS: 6.5, 7.0
- OpenSUSE Linux: 12.2, 13.1
- Oracle Linux: 5.5
- Red Hat Enterprise Linux (RHEL): 5.10, 6.1 – 6.5, 7.0
- SUSE Linux Enterprise Server 11 SP3

The target system I installed the BIRT iHub Trial Edition on has the following specs: CentOS 7 (based on CentOS-7-x86_64-Minimal-1503-01.iso), 2 CPUs, 8 GB RAM and 20 GB of free disk space. Before installing, note that the C++ runtime library (libstdc++) is a prerequisite for running BIRT iHub on Linux; if this library package is not installed, the install script will display a warning and stop. If you are running in a virtual environment such as Amazon EC2, where the IP address or hostname may change after a restart, one solution is to assign a hostname. This should be done prior to running the installation script; see the footnote below for this additional step.

On my CentOS 7 system, I installed the library via:

sudo yum install libstdc++.i686

After downloading the iHub Trial Edition, I extracted it. You can place the files anywhere, such as /usr/local/ihub:

tar -xvf BIRTiHubTrialEdition.tar.gz

Then I changed to the BIRTiHubTrialEdition directory and started the installation. Note that you should not run these commands with elevated privileges such as the “root” user:

cd BIRTiHubTrialEdition
./install.sh

After reading through the license agreement, I pressed “y” and Enter to accept. Depending on your system specs, the install process shouldn’t take more than a minute or two. If successful, the installer will display progress messages and end with:

Installation complete
To log in to BIRT iHub Trial Edition, open a browser window and type the following URL:
http://localhost:8700/iportal

From a browser on another machine, I was able to access iHub Trial Edition via its IP address (http://ipaddress:8700/iportal). Note that for a default install of CentOS 7, the firewall is automatically enabled. For testing, you can completely disable and stop the firewall as a root/admin user (systemctl disable firewalld and systemctl stop firewalld), or you can simply allow TCP traffic on the default port 8700 as the root/admin user (firewall-cmd --zone=public --add-port=8700/tcp). To log into iHub, the username is “administrator” and the password is blank. The trial period lasts 45 days; after you log into iHub, an indicator showing the days remaining will display in the upper right-hand corner of the page. Thanks for reading. Now it’s time to unleash the full power of Analytics into your application. If you have any questions or comments, please feel free to use the comments section below or visit the Working with iHub forum.

Footnote for fixing the hostname (for virtual environments such as Amazon EC2): In a virtual environment where the IP address or hostname may change after a restart, such as Amazon EC2, one solution is to assign a hostname.
This step should be completed prior to running the installation script. Assign a hostname using the hostname command, or by editing /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=myHostname

Add that hostname value to /etc/hosts for the “localhost” IP address:

127.0.0.1   myHostname localhost ...

By assigning a hostname, the machine will call itself via the loopback address regardless of its currently assigned IP or hostname. This hostname change is intended for internal server process communications only; the BIRT iHub server should still be referenced from a different machine’s browser via its external hostname or external IP address. For more information on changing a hostname in Amazon EC2, visit http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-hostname.html

Related iHub Install Blogs
- Installing iHub, Trial Edition on Windows
- Installing iHub, Trial Edition on VMware

Editor’s note: This article was updated on Sept. 14, 2015 to reflect a clarification on virtual environments.

Read More

Using the iHub, Trial Edition VMware Image

OpenText Actuate Information Hub (iHub), Trial Edition is a free evaluation version of OpenText’s enterprise-grade data visualization server that you can download and use to bring your report designs and applications to life. This blog will walk you through the process of installing the software on a VMware instance so you can begin your iHub projects and application deployment.

Download iHub, Trial Edition using the icon above. After the download is complete, unzip the file to the location of your choice. If you don’t have VMware Player, now is a good time to download it. Launch VMware Player, click “Open a Virtual Machine”, navigate to where you unzipped your download and select the ActuateBIRTiHubTrialEdition.vmx file. You should now see the Actuate BIRT iHub, Trial Edition virtual machine in your VMware Player. Now select Play Virtual Machine and the virtual machine will start up. The operating system on the VM is Linux, specifically CentOS 6.6. Once the virtual machine has loaded, you’ll see the login screen; don’t worry about the login, you won’t need to do anything with it. Your virtual machine is now up and running.

Above the login you’ll see a URL for accessing your BIRT iHub, Trial Edition. Enter that URL into your browser’s address bar on your host machine and hit Enter. If you do not see an IP address in the welcome message, please see the network troubleshooting information later in this post. The username is “administrator” and the password is blank for your first login; you’ll be able to change this once logged in. Hit “Log In” and you’re done. There are numerous examples and applications located throughout the /Applications and /Public directories in iHub. The number of days left in your trial is indicated in the upper right-hand corner.

Troubleshooting the Network
If your virtual machine does not start up with an IP address using the default configuration, try changing the virtual machine’s network connection setting to “NAT” or “Host-only”:
1. Stop the VM via the menu Player > Power > Shut Down Guest.
2. Launch VMware Player.
3. Select “Actuate BIRTiHubTrialEdition” and click “Edit virtual machine settings”.
4. Change the Hardware > Network Adapter > Network connection setting from “Bridged: Connected directly to the physical network” to “NAT” as shown below.
5. Click OK.
6. Click “Play virtual machine”. The virtual machine should now start up with a URL containing an IP address.

Shell access
Two users are set up on the image; the second is the user that installed BIRT iHub.
- User 1: root / password
- User 2: birt / password

Thanks for reading. Now it’s time to unleash the full power of Analytics into your application. If you have any questions or comments, please feel free to use the comments section below or visit the Working with iHub forum.

Related iHub Install Blogs
- Installing iHub, Trial Edition on Windows
- Installing iHub, Trial Edition on Linux

Read More

Five Reasons Why Cloud B2B Platforms Contribute Towards Greener Supply Chains

In my last blog, available HERE, I discussed how B2B automation contributed towards developing greener supply chains. I also explained how our cloud based Trading Grid platform is connected to over 600,000 businesses that collectively exchange over 16 billion transactions per year around the world. In this article I thought I would take a slightly different look at how cloud based B2B environments contribute towards developing greener supply chains. Today’s CIOs are accelerating their deployment of cloud based environments because they offer many operational benefits: improved infrastructure flexibility to react to market demands, simplified management of global business applications and, of course, improved predictability of long term fixed costs for managing applications. Here is a short video introducing OpenText’s cloud. As well as these operational benefits, however, there are a number of green benefits that are probably not appreciated when a company decides to deploy a cloud based infrastructure. Companies today have a choice of running on premise, hybrid or full cloud solutions; for the purposes of this blog I will discuss the green benefits of full cloud based environments. When companies deploy a full cloud solution they obtain some significant indirect green benefits:

1. Reduction in paper usage due to the automation of manual business transactions – Many companies have struggled to encourage all their trading partners, especially those in emerging markets, to exchange information electronically. Instead, many smaller suppliers still use manual, paper based processes. In China, for example, the fax is still seen as one of the main business communication methods, and there are various systems for exchanging shipping documentation between logistics carriers and across customs and border control agencies. The very nature of cloud based environments means that they are quick to deploy, easy to use and simple to maintain on a daily basis – in other words, ideal for use in emerging markets – and they help to ensure you can achieve 100% trading partner enablement. The use of web based forms to replicate paper based form content means that even the smallest or least technically capable trading partner can simply enter or view information directly via a web based portal. Paper versions of web forms can still be printed off if required, but in a cloud environment this becomes an on-demand process. Once information is entered via a web based form it is automatically fed into a Software-as-a-Service (SaaS) application hosted within the cloud environment (a minimal sketch of this flow appears at the end of this post).

2. Lower power consumption due to retiring legacy server and network hardware – This is one of the most important green benefits that a company can realize by adopting a cloud based B2B infrastructure. Many companies around the world have spent millions of dollars establishing their own in-house data centres or server based infrastructures, from highly available power infrastructures with uninterruptible power supplies (UPS) and diesel generator backups, to extensive lighting, complex networking and air conditioning systems. Taken together, these data centre infrastructure assets contribute significantly towards a company’s overall carbon footprint.

3. Less data centre related equipment packaging to dispose of – In-house data centres require numerous servers, storage devices, networking equipment and so on, and these typically arrive from IT suppliers in large, over-packed boxes containing cardboard, wood, plastic and polystyrene. Once the equipment has been delivered, the packaging needs to be disposed of carefully or recycled, but some could end up in landfill. In addition, depending on where your data centre equipment would traditionally be sourced from, the carbon footprint of the associated 3PL and transportation companies would also be considerably reduced.

4. Minimized travel requirements for IT implementation resources – Many companies have established global IT teams to support their business operations. However, in some cases there is a need to extend an IT infrastructure into an emerging market such as China or India. Companies will often struggle to secure local resources to implement and maintain a regional data centre, so IT staff from other regions are flown in at great expense to get the new infrastructure up and running as soon as possible. Over the years companies have become used to flying IT staff around the world to support their remote operations, but how many have actually calculated the volume of greenhouse gases created by travelling to these locations and by having data centre hardware delivered by 3PL providers?

5. Encourages enterprise adoption of low powered mobile devices – The exponential growth in the adoption of mobile devices such as tablets and smartphones, combined with the development of mobile apps to access cloud based enterprise resources, has helped to reduce power consumption across the internal and external enterprise. Until a few years ago, enterprise resources were accessed through power-hungry laptop and desktop PCs. The introduction of simple to use mobile apps has helped to extend the battery life of devices such as Apple’s iPad, and these devices have also transformed how employees work remotely and while on the move, i.e. less battery recharging is required. Another benefit of offering mobile access to cloud based resources is that information is available anytime and anywhere, so, for example, logistics carriers can process shipping information more quickly, minimizing border control delays and ensuring that shipments reach their destination sooner.

If you would like to find out more about OpenText’s Cloud then please click HERE.
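To make the first point a little more concrete, here is a minimal, hypothetical sketch of the web-form-to-application flow: form fields captured in a browser portal are posted as structured data to a cloud-hosted application instead of being printed or faxed. The endpoint URL and field names below are invented purely for illustration and do not represent the actual Trading Grid or Active Orders interfaces.

import requests

# Hypothetical endpoint and field names, for illustration only; these are not
# the actual Trading Grid or Active Orders interfaces.
PORTAL_URL = "https://portal.example.com/api/orders"

def submit_web_form(order):
    # Post the captured form fields as structured data to the cloud-hosted
    # application, replacing a printed or faxed purchase order.
    response = requests.post(PORTAL_URL, json=order, timeout=10)
    response.raise_for_status()
    return response.json()  # e.g. an order confirmation with a reference number

if __name__ == "__main__":
    confirmation = submit_web_form({
        "supplier_id": "SUP-0042",
        "item": "A4-PAPER-BOX",
        "quantity": 500,
        "requested_delivery": "2015-10-01",
    })
    print(confirmation)

The point of the sketch is simply that once the form data is electronic at the point of entry, no paper copy ever needs to exist unless someone chooses to print one on demand.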

Read More

Achieving Equal Access in Health Care Information

According to a report published by the Equal Rights Center in 2011, blind and visually impaired individuals routinely face barriers in receiving health care information in accessible formats, including documents such as test results and prescriptions, benefits information such as Explanations of Benefits and eligibility and termination notices, and e-delivered communications such as billing statements and summaries of benefits. This includes information received by visually impaired Americans covered by Medicare and Medicaid. These individuals are often presented with work-around solutions, such as relying on friends, family or healthcare practitioners to read their private medical information to them. Not only is this a breach of the individual’s privacy, it can also lead to poor health outcomes and loss of benefits. The Centers for Medicare and Medicaid Services (CMS), an agency of the US Department of Health and Human Services, is the largest single payer for health care in the United States. According to data from CMS, 90 million Americans receive healthcare coverage through Medicare, Medicaid and the State Children’s Health Insurance Program. Approximately 4.3 million individuals over the age of 65 report some form of visual impairment, and there are also approximately 700,000 Medicare beneficiaries between the ages of 21 and 64 who have some form of visual impairment. Private healthcare insurers have been contracted by CMS to offer Medicare and Medicaid programs, and these insurers must meet federal regulations, namely Section 508, requiring that they ensure access to and use of their websites and digital documentation for people with disabilities, including blind or visually impaired individuals. Non-compliance could lead to penalties and the loss of lucrative contracts for insurers. It is therefore no surprise that document (e.g. PDF) accessibility is a hot-button issue for government and even for the private healthcare insurers contracted by CMS. As “public accommodations” under the Americans with Disabilities Act (ADA), healthcare insurers are generally well aware of their legal responsibility to customers with disabilities such as visual impairment, and are quite used to complying with these regulations. But now that accessibility requirements are expanding into cyberspace, healthcare insurers need to find appropriate technology solutions for this new challenge. Until a couple of years ago, it simply had not been possible for healthcare insurers to create high-volume communications and documents in accessible PDF format. The sheer scale of production, with documents numbering in the thousands or millions, precludes manual remediation because of several limiting factors: the cost of manually remediating documents, the delivery time imposed by the laborious nature of manual remediation, and stringent accessibility tagging requirements. OpenText has created an automated, software-based solution to address these very limitations. The OpenText Automated Output Accessibility solution can generate accessible PDFs from any high-volume, system-generated input print stream or other format quickly and efficiently, while keeping storage size in check. The solution was designed using thousands of man-hours’ worth of very specific experience and expertise in the system-generated document accessibility space, and our industry-leading transformation engine generates accessible output in milliseconds. 
In fact, the output generated from this solution has been reviewed by the National Federation of the Blind and other prominent organizations for the visually impaired. Learn more about the OpenText Automated Output Accessibility solution at http://ccm.actuate.com/solutions/document-accessibility.
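For teams producing this kind of high-volume output, a quick automated sanity check is whether each generated PDF actually carries the structure tree that screen readers depend on. Below is a minimal sketch, assuming the open-source pikepdf library (not part of the OpenText solution), that flags untagged files in a batch; full Section 508 conformance checking involves far more than this simple heuristic.

import sys
import pikepdf

def looks_tagged(path):
    # A tagged (accessible) PDF carries a structure tree (/StructTreeRoot)
    # in its document catalog; without it, screen readers have no logical
    # reading order to work with.
    with pikepdf.open(path) as pdf:
        return "/StructTreeRoot" in pdf.Root

if __name__ == "__main__":
    # Usage: python check_tagging.py statement1.pdf statement2.pdf ...
    for pdf_path in sys.argv[1:]:
        status = "tagged" if looks_tagged(pdf_path) else "NOT tagged"
        print(f"{pdf_path}: {status}")

A check like this only confirms that structural tagging is present; whether the tags convey the correct reading order is what reviews such as those by the National Federation of the Blind assess.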

Read More

Data Driven Digest for June 19

This Sunday marks the first day of summer in the Northern Hemisphere. As the days get warmer and longer, many of us think about getting healthier and going outdoors. Data visualization can reveal information about our health, the healthcare system that supports us, and the risks of certain vacation activities. The Data Driven Digest has collected some favorite examples over the last few weeks. Healthy Months: In a study published in the Journal of the American Medical Informatics Association, scientists at Columbia University reported on a large-scale study (1,688 diseases, 1.7 million patients) that uncovered relationships between birth month and disease risk. (Read more about the work here.) The Columbia team was led by Nicholas Tatonetti (@nicktatonetti), a researcher who applies data science to medical questions; the Tatonetti Lab produced the user-friendly visualization of the study data above. The chart encapsulates complex data in an easy-to-read format and exposes outliers very quickly. We’d love to explore an interactive version so we could zoom in and find out what disease each dot represents. Medical Work-Up: Knowing what diseases you’re prone to does little good if you can’t find a medical professional. That’s a growing problem, according to Bloomberg News. An article titled The U.S. Economy Can’t Hire Health-Care Workers Fast Enough, based on Bureau of Labor Statistics data, reported that “there were about 1.8 jobs available for every person who was hired” in healthcare-related jobs. In all private industries, the ratio is just 1.05:1, as shown in the chart above. We admire how Michelle Jamrisko (@mljamrisko) presents a lot of data – including historical trends – in the article using just two concise charts. Crossing the Chasm: Geospatial experts at Esri (@Esri) have shared a fascinating (if somewhat morbid) data visualization of Death in the Grand Canyon. The graphically powerful map-based visualization “helps tell the fascinating and heartbreaking stories of more than 700 lives lost. The map quickly shows patterns, clusters, and isolated incidents across the national park so we can understand how and where people died,” according to Esri’s introduction. The number on each hexagon shows how many lives were lost in a given spot; click on a number to see just how the people perished (including “critter or cacti” and “flash flood victims”). Be sure to click through to explore the interactive version. Oh, and: Yikes. Healthy Foundations (bonus item): HealthData.gov, the U.S. government’s web catalog of health, social services, and research data, has just relaunched on a new, more efficient technology platform. Damon L. Davis (@damonldavis) explains that the HealthData.gov team plans to build more robust tools and dashboards based on its new foundation. There aren’t great data visualizations to share yet, but we’re certain they’re coming – after all, the HealthData.gov team has already issued several Big Data challenges. If you create something based on HealthData.gov, please let us know so we can feature it on the Data Driven Digest. Like what you see? Every Friday we share great data visualizations and embedded analytics.  Recent Data Driven Digests: June 12: Steph Curry’s jump shot, free throw arcs, Tweeting the NBA Finals June 5: Wind measured from space, under California coastal waters, age of Los Angeles buildings May 29: German ages, supercentenarians, age gaps in romantic movies, timelines explained

Read More

How Small Data Transforms Business Today

Everyone talks about the importance of Big Data and how harnessing the power of information from a plethora of sources can be a real game changer. However, it’s often the more defined data sets that can help transform a business into a competitive powerhouse. This is considered small data, not Big Data. For example, Big Data projects typically cover a broad array of information generated from business sources such as sales transactions combined with geographical data, government data and even social media chatter. Small data, by contrast, is derived from local sources and in more digestible batches. Viewed another way, if Big Data has been largely about machines and processing power, small data is about people, context and individual requirements. And it’s transforming businesses in incredible ways. Consider the case of a lunch shop in California whose manager used her data to help increase cash flow and decrease supply costs. Lindsay Hiken, who owns the Village Cheese House (@VCHPaloAlto) in Palo Alto, Calif., began tracking various bits of known data that were important to her business: the demographics of people coming in, food supply prices, daily receipt data, and more. Companies collect data, but not many small business owners are looking at it. Hiding in those numbers could be some great strategies to make your company thrive. With the help of a small data approach, Hiken found billing and ordering errors that were wasting money, and customer information that helped her streamline her product selection to please her younger clientele. In a recent television program segment on MSNBC’s OPEN Forum, OpenText VP Product Marketing and Innovation Allen Bonde (@abonde) provided some valuable insight into how businesses large and small should be thinking about information. “This is the difference between a business that generates data, which all businesses do, and a business that is data driven where the data becomes part of the decision-making process,” Bonde related. The key to bridging the gap between having data and using it, according to Bonde, is to take a fresh look at the role of data and analytics, and apply a market-driven view informed by the needs of all stakeholders: front-line users, marketers and analysts, clients, and of course IT and development teams. Bridging the gap also requires a simplified view of the analytic process that streamlines how and where we apply various analytic techniques. Feel free to check out the MSNBC segment in the media player below. For more insight into small data and its significance for businesses, check out Bonde’s latest article, Turning Data into Insight: A Market Driven View of Big (and Small) Data Analytics, published by the Wall Street Technology Association.
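As a simple illustration of the kind of “small data” check described above, the sketch below compares purchase orders against supplier invoices to surface billing discrepancies. The file names, column names and 2% threshold are hypothetical, chosen only to show the idea; this is not the actual system used at the Village Cheese House.

import pandas as pd

# Hypothetical inputs: what was ordered vs. what the supplier billed.
orders = pd.read_csv("purchase_orders.csv")      # order_id, item, qty, unit_price
invoices = pd.read_csv("supplier_invoices.csv")  # order_id, item, billed_total

merged = orders.merge(invoices, on=["order_id", "item"], how="inner")
merged["expected_total"] = merged["qty"] * merged["unit_price"]
merged["variance"] = merged["billed_total"] - merged["expected_total"]

# Flag anything billed more than 2% above what the order implies;
# these are the lines worth a phone call to the supplier.
suspect = merged[merged["variance"] > 0.02 * merged["expected_total"]]
print(suspect[["order_id", "item", "expected_total", "billed_total", "variance"]])

Nothing here requires a data scientist or a Big Data platform, which is precisely the point: the data is local, the batches are small, and the insight is immediately actionable.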

Read More

Big Data Is Still a Game Changer, but the Game has Changed. Here’s How.

Not long ago, organizations bragged about the large volume of data in their databases. The implied message from IT leaders who boasted about their terabytes and petabytes and exabytes was that company data was like a mountain of gold ore, waiting to be refined. The more ore they had, the more gold – that is, business value – they could get out of it. But the “bigness” of Big Data isn’t the game changer anymore. The real competitive advantage from Big Data lies in two areas: how you use the data, and how you provide access to the data. The way you address both of those goals can make or break an application – and, in some cases, even make or break your entire organization. Allow me to explain why, and tell you what you can do about it – because mastering this important change is vital to enabling the digital world.

How Big Data Has Changed

Each of us – and the devices we carry, wear, drive, and use every day – generates a surge of data. This information is different from the Big Data of just a few years ago, because today’s data is both about us and created by us. Websites, phones, tablets, wearables and even cars are constantly collecting and transmitting data – our vital stats, location, shopping habits, schedules, contacts, you name it. Companies salivate over this smorgasbord of Big Data because they know that harnessing it is key to business success. They want to analyze this data to predict customer behavior and likely outcomes, which should enable them to sell better (and, of course, sell more) to us. That’s the “how you use data” part of the equation – the part that has remained pretty consistent since market research was invented more than 100 years ago, but that has improved greatly (both in speed and precision) with the advent of analytics software. Then comes the “how you provide access to data” part of the equation – the part that highlights how today’s user-generated Big Data is different. Smart, customer-obsessed businesses understand that the data relationship with their consumers is a two-way street. They know that there is tremendous value in providing individuals with direct, secure access to their own data, often through the use of embedded analytics. Put another way: the consumers created the data, and they want it back. Why else do you think financial institutions tout how easily you can check balances and complete transactions on smartphones, and healthcare companies boast about enabling you to check test results and schedule appointments online? Making your data instantly available to you – and only to you – builds trust and loyalty, and deepens the bond between businesses and consumers. And as I said earlier, doing so is vital to enabling the digital world.

The New Keys to Success

But when a business decides to enable customers to access their data online and explore it with embedded analytics, that business must give top priority to customers’ security and privacy concerns. In a blog post, “Privacy Professor” Rebecca Herold notes that data breaches, anonymization and discrimination rank among the Top 10 Big Data Analytics Privacy Problems. Her post is a must-read for organizations that plan to provide data analytics to customers. To underline Herold’s point, Bank Info Security says that personal data for more than 391.5 million people was compromised in the top six security breach incidents in 2014 – and that number does not include the Sony breach that made headlines. 
Security and privacy must be a primary consideration for any organization harnessing Big Data analytics. Remember what Uncle Ben said to Peter Parker: “With great power comes great responsibility.” Meeting the privacy and security challenges of today’s user-generated Big Data requires a comprehensive approach that spans the lifecycle of customer data, from generation through distribution. If you want guidance in creating such an approach, check out the replay of a webinar I presented on June 23, Analytics in a Secure World. My colleague Katharina Streater and I discussed:

The drivers and trends in the market
What top businesses today do to ensure Big Data protection
How you can secure data during content generation, access, manipulation and distribution
Strategies for complying with data security regulations in any industry

If you watch the replay, you’ll come away with great ideas for securing data from the point of access all the way through to deployment and display of analytic results. We explained why a comprehensive approach minimizes the risk of security breaches, while simultaneously providing a personalized data experience for each individual user. We closed the program by explaining how OpenText Analytics and Reporting products have the horsepower required to handle immense volumes of data securely. We showed how the OpenText Analytics platform scales to serve millions of users, and explained why its industrial-strength security can integrate directly into any existing infrastructure. Please check out Analytics in a Secure World today. Privacy Please image by Josh Hallett, via Flickr.
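To show what the “your data, and only your data” principle looks like in practice, here is a minimal sketch of a customer-facing data endpoint that authenticates every request and returns only the caller’s own records before any analytics or display happens downstream. It uses Flask with hard-coded tokens and data purely for illustration; it is not how OpenText Analytics implements security, and a production system would rely on a proper identity provider plus row-level security in the data tier.

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical token-to-customer mapping and data store, for illustration only.
TOKENS = {"token-alice": "cust-001", "token-bob": "cust-002"}
TRANSACTIONS = {
    "cust-001": [{"date": "2015-06-01", "amount": 42.50}],
    "cust-002": [{"date": "2015-06-03", "amount": 17.25}],
}

@app.route("/api/customers/<customer_id>/transactions")
def customer_transactions(customer_id):
    # Authenticate the caller, then filter rows to that caller's own records.
    token = request.headers.get("Authorization", "").replace("Bearer ", "")
    caller = TOKENS.get(token)
    if caller is None:
        abort(401)  # unknown token: no data at all
    if caller != customer_id:
        abort(403)  # authenticated, but asking for someone else's data
    return jsonify(TRANSACTIONS.get(customer_id, []))

if __name__ == "__main__":
    app.run()

The design choice worth noting is that filtering happens at the point of access, before any data reaches the analytics or presentation layer, which is what keeps a personalized data experience from becoming a breach vector.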

Read More