Data quality sampling and form builder enhancements

Riaz Karim

A core feature of the Hivemind platform is its sophisticated data quality checking: a consensus algorithm designed to catch and correct errors while allowing for natural human variation. Recently, however, some users have found that their error rates are already acceptably low, whether through training or good task design, so the full strength of this error-checking procedure is unnecessary. Instead, they need a lightweight, efficient way to monitor their tasks for error spikes without the cost of duplicating work on every instance.

Example with Sampling enabled


So in this release we add sampling to the platform, allowing you to check only a subset of the task instances. You can now select a percentage of instances for consensus checking without having to duplicate work on the entire task. Just set the sampling rate and Hivemind will display task statistics for both the sampled and unsampled data, allowing you to maintain statistical confidence in your data quality at a far lower cost.
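To make the idea concrete, here is a minimal Python sketch of picking a percentage of instances for consensus checking. It is not Hivemind's implementation: the function name select_sample, its parameters, and the use of simple random sampling are all assumptions made for illustration.

```python
import random


def select_sample(instance_ids, sampling_rate, seed=None):
    """Split instances into a consensus-checked sample and the remainder.

    Hypothetical helper for illustration only; not part of the Hivemind API.
    sampling_rate is a fraction between 0 and 1 (e.g. 0.1 for 10%).
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    sample_size = round(len(instance_ids) * sampling_rate)
    sampled = set(rng.sample(list(instance_ids), sample_size))
    # Everything not chosen is completed once, with no duplicated work.
    unsampled = [i for i in instance_ids if i not in sampled]
    return sorted(sampled), unsampled


# Usage: check 10% of a 100-instance task.
sampled, unsampled = select_sample(list(range(100)), 0.1, seed=42)
```

Only the instances in sampled would be duplicated for consensus, so the checking cost scales with the sampling rate rather than with the size of the task.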

Form Builder Enhancements

Our goal with the form builder tool in Studio is to bring the power and flexibility of the JSON Schema standard to a tool that doesn’t require writing fiddly JSON to create rich forms. We’re pleased to announce that this latest release includes further enhancements that bring us ever closer to that goal.

With this release, you can create lists of more complex objects to collect sets of structured data. Previously, lists were limited to elements comprising a single field.

You can now use the tool to create forms that capture more complex shapes of data. For example:

[{ "Name": "John Smith", "Age": 25 }, { "Name": "Maggie Q", "Age": 23 }]
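For readers curious what a list of objects like the one above could look like in JSON Schema terms, the sketch below writes one possible schema as a Python dict and checks the example data against it. The schema is an assumption, since the exact output of the form builder is not shown in this post, and conforms is a hypothetical helper that handles only the few keywords used here, not a full JSON Schema validator.

```python
import json

# Assumed schema for a list of {Name, Age} objects; the schema actually
# generated by the form builder may differ.
PEOPLE_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "Name": {"type": "string"},
            "Age": {"type": "integer"},
        },
        "required": ["Name", "Age"],
    },
}

# Map the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "integer": int, "object": dict, "array": list}


def conforms(schema, value):
    """Check a value against the small subset of JSON Schema used above."""
    expected = _TYPES[schema["type"]]
    if not isinstance(value, expected):
        return False
    # bool is a subclass of int in Python, so reject it for integer fields.
    if expected is int and isinstance(value, bool):
        return False
    if expected is list:
        return all(conforms(schema["items"], item) for item in value)
    if expected is dict:
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            conforms(sub_schema, value[key])
            for key, sub_schema in schema.get("properties", {}).items()
            if key in value
        )
    return True


people = json.loads(
    '[{"Name": "John Smith", "Age": 25}, {"Name": "Maggie Q", "Age": 23}]'
)
print(conforms(PEOPLE_SCHEMA, people))  # prints True
```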

Enhanced lists functionality in the Form Builder


Additionally, the underlying schema generated by the form builder has changed to improve compatibility with other advanced features of task definition. The generated schema now contains JSON Schema definitions. This allows the form to be modified per instance via the overrideSchema template property, located under the Instances tab.
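As a rough illustration of why definitions help with per-instance changes, the Python sketch below builds a schema whose list items point at a named definition, then swaps that definition for a single instance. The "definitions" and "$ref" keywords are standard JSON Schema, but everything else is an assumption: this post does not describe the generated schema's layout or how overrideSchema is applied, so apply_override and its replace-by-name merge rule are hypothetical.

```python
import copy

# Assumed shape of a generated schema: reusable pieces live under
# "definitions" and are referenced with "$ref".
BASE_SCHEMA = {
    "definitions": {
        "person": {
            "type": "object",
            "properties": {
                "Name": {"type": "string"},
                "Age": {"type": "integer"},
            },
        }
    },
    "type": "array",
    "items": {"$ref": "#/definitions/person"},
}


def apply_override(base, override):
    """Return a copy of base with same-named definitions replaced.

    Hypothetical merge rule for illustration only; the real behaviour of
    overrideSchema may differ.
    """
    merged = copy.deepcopy(base)
    merged.setdefault("definitions", {}).update(
        copy.deepcopy(override.get("definitions", {}))
    )
    return merged


# Hypothetical per-instance override: this instance also collects an email.
instance_override = {
    "definitions": {
        "person": {
            "type": "object",
            "properties": {
                "Name": {"type": "string"},
                "Age": {"type": "integer"},
                "Email": {"type": "string"},
            },
        }
    }
}

instance_schema = apply_override(BASE_SCHEMA, instance_override)
```

Because the list items refer to the definition by name, replacing that one definition changes the form for the instance without touching the rest of the schema.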