Meet CausalNex, our new open-source library for causal reasoning and “what if” analysis

QuantumBlack, our advanced analytics firm, recently announced the launch of its latest open-source product, CausalNex. It's a software library that data scientists can use to analyze datasets and build models that consider cause and effect, a challenge that experts have long struggled to solve.

“A lot of traditional machine-learning models recognize correlations and patterns in data but that doesn't mean causation,” explains Ben Horsburgh, the lead machine learning engineer on the project.

With CausalNex, data scientists can apply machine learning to identify potential cause-and-effect relationships in their datasets. They can collaborate with domain experts to remove spurious conclusions and validate their models, and then use them to find the underlying, sometimes overlooked, drivers of their goal.

Ben offers this simple example: “In the summer a lot of people eat ice cream, and there can also be drought. An algorithm could pick up a correlation that says, ‘when people buy ice cream, there is a drought’. These two events are clearly both an effect of the true root cause: that it is hot.”
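Ben's example can be worked through with a few lines of arithmetic. The sketch below is plain Python, not CausalNex code, and every probability in it is an invented illustration value rather than a measurement. It assumes heat is the only cause of both ice cream sales and drought, then compares what ice cream sales seem to say about drought with and without accounting for heat.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_HOT = 0.5
P_ICE_CREAM = {True: 0.8, False: 0.2}  # P(high ice cream sales | hot?)
P_DROUGHT = {True: 0.6, False: 0.1}    # P(drought | hot?)


def joint(hot, ice_cream, drought):
    """Probability of one complete scenario; heat is the only cause."""
    p = P_HOT if hot else 1 - P_HOT
    p *= P_ICE_CREAM[hot] if ice_cream else 1 - P_ICE_CREAM[hot]
    p *= P_DROUGHT[hot] if drought else 1 - P_DROUGHT[hot]
    return p


def p_drought(ice_cream, hot=None):
    """P(drought | ice_cream), optionally also conditioning on hot."""
    hots = [True, False] if hot is None else [hot]
    num = sum(joint(h, ice_cream, True) for h in hots)
    den = sum(joint(h, ice_cream, d) for h, d in product(hots, [True, False]))
    return num / den


# Ignoring heat, ice cream sales look predictive of drought.
print("P(drought | high sales):", round(p_drought(True), 2))
print("P(drought | low sales): ", round(p_drought(False), 2))

# Once we know it is hot, ice cream sales add no information.
print("P(drought | high sales, hot):", round(p_drought(True, hot=True), 2))
print("P(drought | low sales, hot): ", round(p_drought(False, hot=True), 2))
```

With these made-up numbers, the first pair of probabilities differs while the second pair is identical, which is the signature of a shared root cause rather than a direct link between the two events.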

Using CausalNex, data scientists can ask “what if this underlying variable were different?” and observe the size of the impact. This can help tease out the highest-value features, which can then be used to effect change.

Here’s a walkthrough of a typical data project using CausalNex. It takes place in three stages.

As a first step, a data scientist uses machine learning to create a structure representing the problem the team is working to solve. It could be, for example, identifying the key factors of sales-team effectiveness. The structure can be visualized as a network: circles would represent the features (number of customer interactions, sales members’ profiles, etc.) and arrows would indicate the relationships between them.
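To make the idea of circles and arrows concrete, here is a minimal sketch of such a structure as a plain Python dictionary. It does not use CausalNex or learn anything from data; the feature names and arrows are hypothetical, written by hand for the sales-team example. The only check it performs is that the arrows never form a loop, which a cause-and-effect network of this kind requires.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical structure: each key is a feature (a circle) and each
# value lists the features with an arrow pointing into it.
parents = {
    "rep_profile": [],
    "customer_interactions": ["rep_profile"],
    "deals_won": ["customer_interactions", "rep_profile"],
}


def edges(parents):
    """List every arrow as a (cause, effect) pair."""
    return [(cause, effect)
            for effect, causes in parents.items()
            for cause in causes]


def is_acyclic(parents):
    """Return True if following the arrows never leads back to the start."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


for cause, effect in edges(parents):
    print(f"{cause} -> {effect}")
print("Valid structure:", is_acyclic(parents))
```

Listing the arrows as plain pairs is also a convenient form to hand to a business team for review, since each line reads as one claimed relationship.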

This visualization can then be taken to the business team for review: are the relationships accurate? Are any variables missing?

At this point, specific industry and domain expertise can be built in to ensure the model’s accuracy, encouraging ownership and adoption of the analytics by the wider organization.

“Rather than waiting for two or three weeks before showing the outputs of a model, the early visualization allows data scientists to collaborate with the business team right from the start,” says Wesley Leong, the lead CausalNex product manager. “This collaboration creates more transparency in the modelling process.”

After the structure is verified, a machine-learning algorithm models the probability of each circle, or feature, in the network, given the other variables that influence or cause it. This allows data scientists to understand the strength of the dependencies between drivers and the goal and make predictions based on them.
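One simple way to model the probability of a feature given its causes is to count how often each outcome occurs for each combination of parent values. The sketch below shows that counting approach in plain Python. It is an assumption about how such a table could be estimated, not a description of what CausalNex does internally, and the records and the `fit_cpt` helper are invented for illustration.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, child, parent_names):
    """Estimate P(child | parents) by counting matching rows."""
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row[name] for name in parent_names)
        counts[key][row[child]] += 1
    return {
        key: {value: n / sum(counter.values())
              for value, n in counter.items()}
        for key, counter in counts.items()
    }


# Hypothetical toy records, invented for illustration.
rows = (
    [{"expertise": "deep", "deal_won": True}] * 3
    + [{"expertise": "deep", "deal_won": False}] * 1
    + [{"expertise": "shallow", "deal_won": True}] * 1
    + [{"expertise": "shallow", "deal_won": False}] * 3
)

cpt = fit_cpt(rows, child="deal_won", parent_names=["expertise"])
for parent_values, distribution in cpt.items():
    print(parent_values, distribution)
```

The gap between the rows of the resulting table is one rough indication of how strongly a driver and the goal depend on each other in the data.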

Finally, data scientists run “what if” scenarios to find out what happens to the goal if they change a given feature. They can generate more informed, accurate ‘interventions’ – suggested changes that will have the greatest impact on performance. This final step is the most crucial from a business perspective. In the sales-team effectiveness example, the strongest driver might be the depth of expertise of the sales representative.
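A “what if” question differs from simply filtering the data, and a small calculation shows why. The sketch below is plain Python, not CausalNex code, and all names and probabilities are hypothetical. It assumes a three-feature network in which seniority influences both expertise and deals won, and expertise also influences deals won. It then compares the win rate observed among reps who already have deep expertise with the win rate expected if every rep were given deep expertise.

```python
# Hypothetical probabilities, chosen only for illustration.
P_SENIOR = 0.4
P_DEEP = {True: 0.9, False: 0.2}  # P(deep expertise | senior?)
P_WIN = {                         # P(deal won | deep expertise?, senior?)
    (True, True): 0.7,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.2,
}


def p_senior(senior):
    return P_SENIOR if senior else 1 - P_SENIOR


def p_deep(deep, senior):
    return P_DEEP[senior] if deep else 1 - P_DEEP[senior]


def observed_win_rate(deep):
    """P(win | expertise): the rate seen among reps at that level."""
    num = sum(p_senior(s) * p_deep(deep, s) * P_WIN[(deep, s)]
              for s in (True, False))
    den = sum(p_senior(s) * p_deep(deep, s) for s in (True, False))
    return num / den


def intervened_win_rate(deep):
    """P(win | do(expertise)): set expertise for everyone, keep seniority."""
    return sum(p_senior(s) * P_WIN[(deep, s)] for s in (True, False))


observed_gap = observed_win_rate(True) - observed_win_rate(False)
intervention_gap = intervened_win_rate(True) - intervened_win_rate(False)
print("Observed gap:    ", round(observed_gap, 3))
print("Intervention gap:", round(intervention_gap, 3))
```

With these made-up numbers the observed gap overstates the benefit, because senior reps are both more likely to have deep expertise and more likely to win deals anyway. The intervention gap is the quantity a business team would want when deciding whether investing in expertise is worthwhile.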

CausalNex streamlines this entire project process. Such analysis previously required multiple libraries and frameworks, each with a different interface. Now, scientists can work within a single CausalNex library.

“Helping organizations achieve truly significant performance gains often depends on understanding and addressing the underlying causes of a situation,” says Wesley. “We believe CausalNex helps to identify these clearly and puts this analytics capability into the hands of our client teams.”

The team that created CausalNex

Over the past year, the QuantumBlack team has used CausalNex in seven client use cases across industries, including manufacturing, healthcare and banking.

Why release this as an open source product? The practice is somewhat new in consulting, though long-established in the tech world.

“As a strategy, open sourcing makes sense for us and our clients,” says QuantumBlack partner Sam Bourton. “It helps them to be independent and avoid becoming tied up in legacy contracts or vendor lock-in.”

“CausalNex would not have been possible without a global network of leading researchers generously sharing their own studies and papers,” he points out. “By open sourcing this toolkit, we hope others will use it to enrich their own work in causality and gain greater value from their analytics projects.”

CausalNex is the second open-source product that McKinsey has launched in the past six months. Kedro, our first tool, announced in June, has earned more than 1,900 stars on GitHub and won Best Technical Framework at the 2019 Global AI Annual Achievement Awards.

Learn more about CausalNex on GitHub, where you can engage with our team and watch for new features in the coming months.
