Client interview with David Hunt, IHS Markit

Christian Gilson, Head of Data Science

Since we spun out of a global investment management company two years ago, it has been fascinating to see our data preparation platform used in different ways by different industries. In particular, we’ve been working with a lot of data vendors, helping them build and enrich their data products ahead of going to market. One such client is David Hunt from IHS Markit, with whom I recently chatted about the work his team is doing and how our CORE platform has helped them (in his words) “gain much more confidence in project planning, budgets and delivery commitments.”

---

Christian: Tell us a little bit about IHS Markit and the role of your team within the business.

David: Simply speaking, we are an information company. With deep expertise across the world’s largest industries, we use technology and data science to provide the insights, software and data to help our customers make better-informed decisions, driving growth, performance and efficiency.

I lead our Applied Analytics team in the Economics & Country Risk business. We’re focused on developing innovative solutions that blend our unique human capital and data assets to help our customers better understand and quantify risk and opportunities in global markets.

Christian: What are some of your key considerations when bringing a new data product to the market?

David: We always challenge ourselves to think through how a new dataset will help customers make better decisions than they could before. Clients are often already drowning in data, so it’s important to ask yourself why they would choose your new data product—is it adding something new compared with what is already available on the market? Is it more predictive, timely, accurate, comprehensive?

We have an extremely strong reputation as a provider of trusted, high-quality data that clients can build reliable workflows and decision processes around, and that are sustainable in the long term. As such, when we launch new datasets we need to make sure that we are confident in the quality of the data and our processes for its reliable long term production.

Christian: What challenges do you face when developing new data products?

David: Given the existing depth and breadth of our expertise and datasets, it’s rare that we have to start entirely from scratch – often we are in a position where we are trying to improve or integrate existing assets which partially address the new requirement but need mapping of their strengths, weaknesses and how they can be used.

We need to be thinking through how to structure new data products so that they meet the immediate needs of our existing clients, while recognizing that they will be fed into a Data Lake that connects a huge variety of end markets and personas to our content. We’ll have clients finding innovative and unpredictable uses for our data years from now, so we want to provide a good foundation for that.

Given how important quality is, we have to work really hard to get products to market quickly, while retaining quality – and do that smartly and transparently, recognizing that there are some aspects of data quality that will make a huge difference to how our clients use our data and others where they are comfortable with less granularity or confidence. Nobody wants to wait years for the perfect dataset – the trick is to optimize for maximizing value for the decisions that it will inform.

Christian: How has Hivemind helped with some of those challenges?

David: In short, Hivemind has provided us with a toolkit to efficiently manage the human workflows for the classification of new datasets, across multiple distributed workforces, enabling us to make smarter decisions balancing costs, quality and time to market.

Christian: What innovative methods do you think are key for building a great dataset that perhaps go overlooked?

David: Clearly, getting the right operating model between human and machine tasks is critical, and not at all trivial. There are lots of aspects to that, but one area that I think we’ve done some really good work on is using Natural Language Processing in the very early, iterative stages of taxonomy design, to help ensure that the structure that is going to be applied to a dataset is one that fits the actual patterns in the data, and that can then be applied readily by both human teams and machine classifiers. You can get this very iterative process going whereby, in improving the fitness of the approach for machines to apply, you are also making it more coherent and replicable for humans to work with.

Christian: Are there any datasets at IHS Markit that have a particularly interesting collection methodology?

David: There are so many across the firm – satellite and terrestrial vessel tracking data, surveys for the Purchasing Managers’ Index (PMI), the RootMetrics team’s 5G mobile network testing performance data. While many of the datasets are unique in themselves, what is really exciting is how they can be brought together. Uniting all these vast data assets into a single catalogued platform is now possible with the IHS Markit Data Lake – over 1,000 proprietary data assets, both structured and unstructured, which will in future also include over a million research reports from our industry experts.

Where my team fits into this is primarily event data – we use a hybrid machine-human approach to monitor, curate and enrich events related to political, economic and security risks and opportunities globally. This provides a unique, well-structured dataset for any application that needs to reliably monitor and quantify such events and relate them to locations, assets or sectors.

Christian: Do you think there is an increasing demand for a more transparent audit trail behind data collection?

David: Traditionally, most of the customers I’ve worked with were satisfied with a high-level understanding of methods and, given that the data was primarily exploited in controlled applications, there was a limited need for a detailed audit trail.

I’ve noticed that over the last few years many customers who have explored the proliferation of new datasets have spoken to us as the limitations of some of those sources have become clearer – free datasets that suddenly are no longer updated for example, or that have aspects which users can’t explain to downstream stakeholders. I think what is emerging is that customers want to get a much better feel for the specific characteristics of data — all datasets have strengths and weaknesses — so they can use it more intelligently.

Christian: Within your team, is there a mix of programmatic and UI users?

David: Yes, we team up the project managers, who really understand the domain and what needs to be done, with data analysts who can rapidly translate those requirements into effective workflows and data models.

In general, one of the risks of having any platform with an intuitive UI is that non-technical users can get started quickly but then reach the limits of what they can do, and at that point have created slightly messy processes that need to be unpacked and made more robust to scale. By pairing non-technical and programmatic users we can put in place more efficient and robust processes from the start.

Christian: You've been using Hivemind for a year now. What are the areas where it's had the biggest impact?

David: I think the biggest transformation is our ability to rapidly understand how quickly and how accurately individual workers and teams are performing on a project. This has helped to improve our management of large data projects – we can swiftly identify and correct many problems in almost real-time before they slow down the project or lead to quality issues.

It’s also given us much more confidence in our project planning, budgets and delivery commitments.

Additionally, the Hivemind team has been really responsive in analysing our requirements, aligning them with their wider development roadmap and, where the two diverge, helping us think through alternative approaches. Having that kind of flexibility in the platform to make small changes which can open up new ways of solving a problem is really helpful.

---

To find out more, watch the CORE product video, or simply get in touch to see how we can help.