Why open source should drive AI development in Life Sciences

We stand on the verge of a revolution in Life Sciences. Artificial Intelligence (AI) has the power to change everything. It can handle the vast amounts of data being created, keep learning as it is exposed to more data, and deliver actionable insights for better decision-making.

There is little doubt that the next few years are going to bring some incredible developments to the sector. How best do we get there? In this blog, I’ll make the case for open source.

AI – particularly when enhanced with advanced analytics – is already beginning to have tangible effects, especially within healthcare. It’s helping doctors design treatment plans. It’s helping identify the best treatment methods for patients, including self-diagnosis and self-treatment. It’s automating many of the repetitive tasks for doctors and nurses, freeing them to concentrate on patient care.

Frost & Sullivan reports that AI systems will create $6.7 billion in global revenue from healthcare by 2021, compared with $811 million in 2015. That’s without looking at the benefits of AI-enhanced analytics in other areas of Life Sciences, such as dealing with the huge volume of data from clinical trials.

Open source and AI

The concept of open source has been around in the software industry for many years. In effect, the source code of a particular technology or solution is open for everyone to add to and improve. This approach has been proven to speed product innovation and improve product quality, as communities of developers work together to fix bugs, explore opportunities and address risks within the product.

It enables new feature sets to be added to the original product, and related products that complement its functionality to be developed, quickly and cost-effectively.

Open source is proving to be particularly attractive within the AI community. Not only are many of the core elements of modern AI systems – such as Hadoop and Spark – open source software, but vendors including Microsoft, Google and Amazon have open sourced their AI solutions. OpenText’s AI-enhanced analytics solution, OpenText™ Magellan, is an open source, AI-enhanced cognitive analytics platform based around Spark.

I believe the reason that so many take the open source route is that the implementation of AI is limited only by imagination.

The more people who are involved, the more likely it is that we will see innovations appear quickly, developed by an active community that can address quality, reliability and confidence issues as they go.

Whether it is Amazon enhancing Alexa or healthcare developers working on genetic data collection and analysis, real exploration, adaptation and improvement come from opening – not limiting – access to the relevant data sets and models.

Neither the medical nor the scientific community is a novice when it comes to open source. Both the concept of open source and the Life Sciences communities using it are mature enough to use the approach to gain the best results. Within AI, we are already beginning to see communities coming together – such as healthcare.ai – to pioneer open source AI development in healthcare.

An important step is to enable the ‘democratization’ of data: making data sets available to the people who can use them and who have the skills to derive meaningful insight from them.

Breaking open the Black Box

I’d like to talk a little about the division of labor. Some discussion of AI has centered around what happens if it takes over from humans in thought and decision-making. We enter the realm of doom-laden sci-fi predictions about the rise of the robots.

However, the truth is that AI – and cognitive analytics – are there to do the intensive data handling and number crunching on Big Data that is simply not possible in any other way. The intelligence work of creating and testing hypotheses, developing algorithms and models, and taking informed decisions is all done by humans – and that’s not changing any time soon.

There is no substitute – not even years and years of constant machine learning – for in-depth knowledge of the sector, an understanding of the theory and practice surrounding the problem being addressed, and the ability to learn from the nuances in how things actually work. Within Life Sciences, the doctors, nurses, clinicians and researchers provide the original thought and intelligent direction that steers the development and operation of AI-enhanced analytics solutions.

It seems natural that large software vendors that have invested millions in the development of their AI solutions would want to protect the intellectual property of the work they have done to create their algorithms and models. However, there are three areas where, I think, this approach has serious weaknesses:

It can stifle innovation and product development. The algorithms and models are proprietary and only enhanced when the vendor decides. There is no ability for the wider community to effectively influence future development.
It can inhibit the development of related products. While the vendor controls the specific AI product, it lacks the openness – such as APIs – to allow for an active community to form around delivering complementary products and add-ons.
It can undermine confidence. Black-box AI solutions don’t provide the transparency to show how specific outcomes were reached. This is especially important within healthcare, where patients are entitled to know why a specific treatment plan is advised. The quality, accuracy and veracity of outcomes from an AI solution have to be assured throughout the entire product life cycle.

It is difficult to see how these weaknesses can be overcome using a traditional software development model. Instead, we need to embrace an open source approach where communities of Life Science and technology experts can collaborate to develop, adapt and test new and existing AI solutions.

To allow communities to work together on algorithms and models, OpenText Magellan uses the Jupyter Notebook, an open source web application that lets users create and share documents that contain live code, equations, visualizations and explanatory text. The Notebook was selected partly because of its widespread use and popularity within the Life Sciences community.

The importance of transparency

A final note on the democratization of data. It’s not just about making sure that all data sets are available for the development of new solutions. It’s also about ensuring that the right people get the information they need in the format they need.

Data scientists can create the algorithms and models, but these need to be presented to the knowledge worker in a way that can be easily adapted to a specific business or research purpose. The knowledge worker needs to be able to deliver that information in graphical reports and dashboards for easy interpretation and improved decision-making. Again, open source provides the potential to increase the number of options available at each stage.

Just as importantly, the person most affected by the data – the patient or clinical trial subject – needs to know how key decisions were made. Algorithms and models must be open to scrutiny for quality and legality.

Latanya Sweeney, a professor of government at Harvard University, suggests: “The algorithms have to be able to be transparent, tested, and have some kind of warranty or be available for inspection.”

Open source AI helps enormously here, as openness and transparency are inherent in the process – as is the necessity to document and audit the work. It seems increasingly likely that regulations will be placed upon AI. Open source is the ideal approach to build confidence among the public and legislators.

We are still very early in our AI journey for Life Sciences and I predict some wonderful things ahead. Open source AI – especially when enhanced with advanced analytics – provides a platform for the development of new and innovative products that offer quality and accountability for AI-derived insight and decision-making within the sector.

OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, powered by OpenText Cloud Editions.
