From proprietary to open source: How a Firm data science tool became a trending product
McKinsey’s first open-source tool hopes to help developers create production-ready code more easily.
In early June, McKinsey released Kedro, an open-source tool for data scientists and engineers to create data pipelines, the building blocks of many machine learning projects.
Within 24 hours, it was the number-one trending product on GitHub, the software development hosting company.
Kedro, which is Greek for “center,” structures analytics code so that data flows seamlessly through all stages of a project. Two years in the making, Kedro was developed by Nikolaos Tsaousis and Aris Valtazanos, engineers at QuantumBlack, the advanced analytics firm that McKinsey acquired in 2015.
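To make the pipeline idea concrete, here is a minimal sketch of how Kedro wires transformation steps together. The function and dataset names are hypothetical; Pipeline and node are part of Kedro's public kedro.pipeline interface.

from kedro.pipeline import Pipeline, node

# Hypothetical transformation steps; in a real project each would
# live in its own module with tests.
def preprocess(raw_data):
    return raw_data.dropna()

def train_model(features):
    # Stand-in for model fitting; a real node might call scikit-learn here.
    return {"rows_used": len(features)}

# Each node declares its inputs and outputs by name, which lets Kedro
# order the steps and trace any output back to the raw data source.
pipeline = Pipeline([
    node(preprocess, inputs="raw_data", outputs="features"),
    node(train_model, inputs="features", outputs="model"),
])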
According to Nikolaos, clients especially like the tool’s pipeline visualization ability. He explains that Kedro makes conversations much easier, as clients immediately see the different transformation stages and types of models involved and can trace outputs all the way back to the raw data source.
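That visualization is provided by Kedro-Viz, a companion plugin that renders the pipeline as an interactive graph in the browser; it is installed with pip install kedro-viz and launched from inside a Kedro project with the kedro viz command.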
“Kedro began as a proprietary program, but when a project was over, clients couldn’t access the tool anymore. We had created technical debt,” Nikolaos says. “By converting Kedro into an open-source tool, clients can use it after we leave a project—it is one way we are giving back.”
Jeremy Palmer, QuantumBlack’s CEO, notes that releasing an open-source product is a first for McKinsey. “It represents a significant shift for the Firm, as we continue to balance the value of our proprietary assets with opportunities to engage as part of the developer community, and accelerate as well as share our learning,” he comments.
“Kedro can change the way data scientists and engineers work,” explains product manager Yetunde Dada, “making it easier to manage large workflows and ensuring a consistent quality of code throughout a project.”
“More importantly, the same code can make the transition from a single developer’s laptop to an enterprise-level project using cloud computing,” explains Ivan Danov, Kedro’s technical lead. “And it is agnostic, working across industries, models and data sources.”
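A minimal sketch of what that laptop-to-cloud portability looks like, assuming a Kedro version that ships pandas datasets under kedro.extras.datasets; the dataset name, file paths, and bucket below are hypothetical.

from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import CSVDataSet

# On a developer's laptop, the catalog points at local files ...
local_catalog = DataCatalog({
    "raw_data": CSVDataSet(filepath="data/01_raw/raw_data.csv"),
})

# ... and in production only the catalog entry changes; the pipeline
# code runs unchanged against cloud storage (s3:// paths require the
# s3fs dependency to be installed).
cloud_catalog = DataCatalog({
    "raw_data": CSVDataSet(filepath="s3://example-bucket/raw_data.csv"),
})

In a real project these entries live in a YAML catalog file rather than Python code, which is what lets the swap happen through configuration alone.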
“There is a lot of work ahead, but our hope and vision is that Kedro should help advance the standard for how data and modelling pipelines are built around the world, while enabling continuous and accelerated learning. There are huge opportunities for organizations to improve their performance and decision-making based on data, but capturing these opportunities at scale, and safely, is extremely complex and requires intense collaboration,” says Jeremy. “We’re keenly interested to see what the community does with this and how we can work and learn faster together.”
Learn more about Kedro on GitHub, where you can also engage with the team.