Dataset creation case study, with Winton
Chrissie Cormack Wood
Winton is a global investment management company. It was founded in 1997 by its CEO, David Harding, and its business is grounded in the belief that the scientific method can be profitably applied to the field of investing.
Winton’s research group had been asked to test something Warren Buffett had averred. In his 1994 letter to Berkshire Hathaway shareholders, Buffett said that the owners of companies that make major acquisitions usually see their wealth depleted. To test this assertion for large US companies, Winton’s researchers needed a dataset that was accurate, unbiased and had a considerable back history. The history mattered for two reasons: it would supply enough acquisition events to give the result the requisite statistical power, and it would show how the strategy performed across different market conditions.
However, no data vendor was able to sell Winton a dataset with sufficient detail to test the hypothesis. The chart below shows the coverage of acquisitions made by S&P 500 companies that a leading vendor could offer.
This dataset has clear problems even before looking at the detail of the individual events. Acquisitions are relatively sparse events: there are just 3,000 deals here, which isn’t much data. In addition, the huge increase in activity over the early nineties more likely reflects the vendor expanding its coverage than a real change in the market.
Without Hivemind, Winton would have had to abandon the project. The data they could buy was not acceptable to the research group and they would never have traded a signal based on the data presented here.
With Hivemind, Winton were able to set up a sophisticated workflow of tasks to collect a longer history and test their suspicions of bias in the data.
How Winton used Hivemind
Winton started with an archive of regulatory filings and news articles, and the problem was broken down into a chain of simple tasks. Some of these tasks suited automated methods such as Named Entity Recognition or clustering algorithms; others required human intelligence to parse complex documents flexibly and to resolve ambiguity between one company and another, or between one deal and another.
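To make the idea of mixing automated and human steps concrete, here is a minimal sketch of a single link in such a chain. It is not Hivemind’s API or Winton’s actual workflow: the names `Filing`, `KNOWN_COMPANIES`, `extract_candidates` and `resolve` are hypothetical, and a naive substring match stands in for real Named Entity Recognition. The automated step accepts a company name only when exactly one candidate appears; anything ambiguous is routed to a queue for a person to resolve.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical reference list of company names; a real pipeline would use a
# proper security master rather than a hard-coded set.
KNOWN_COMPANIES = {"Acme Corp", "Acme Holdings", "Globex"}


@dataclass
class Filing:
    """One document from the archive, plus the fields the chain fills in."""
    doc_id: str
    text: str
    acquirer: Optional[str] = None
    needs_review: bool = False


def extract_candidates(filing: Filing) -> List[str]:
    """Automated step: a naive substring match standing in for NER."""
    return sorted(name for name in KNOWN_COMPANIES if name in filing.text)


def resolve(filing: Filing, human_queue: List[Filing]) -> Filing:
    """Accept an unambiguous match; otherwise route the filing to a person."""
    candidates = extract_candidates(filing)
    if len(candidates) == 1:
        filing.acquirer = candidates[0]
    else:
        # Zero or several candidates: the automated step cannot decide,
        # so a human must read the document and pick (or reject) a company.
        filing.needs_review = True
        human_queue.append(filing)
    return filing


if __name__ == "__main__":
    queue: List[Filing] = []
    filings = [
        Filing("f1", "Globex agreed to acquire a regional competitor."),
        Filing("f2", "Acme Corp, a unit of Acme Holdings, announced a deal."),
    ]
    for f in filings:
        resolve(f, queue)
    print([(f.doc_id, f.acquirer) for f in filings])  # [('f1', 'Globex'), ('f2', None)]
    print([f.doc_id for f in queue])                  # ['f2']
```

The design choice being illustrated is that the cheap automated step handles the clear-cut cases, and only documents where two similar company names collide reach a human reviewer.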
The result was a dataset (depicted in the chart below) with coverage extending back to the early 1960s, over 5,000 new deals, and a correction of the clear bias in roughly the first decade of the vendor data. With Hivemind, Winton had created a unique, bespoke dataset that allowed them to conduct statistically sound research that would otherwise have been impossible.