Handlebars, Agreement Checking, Tesseract, Editing & more.

Riaz Karim

We’re excited to announce the latest release of the Hivemind platform, which brings a host of new features.

Hivemind embraces Handlebars

With this release, you can now use the Handlebars syntax to template instance instructions and schemas. This means that you can upload many instances without repeating the instructions on each instance.

For example, if you set the instructions template on the task to:

And you have the following instance data:

[Screenshot: instance data]

Hivemind will render the instance as:

Click on the following link: Hivemind and find the author of the most recent blog post.

As you can see, you can combine this template with Markdown (used to render the link above) to get really neat instructions. You can also edit the template on the task to change how instances are displayed, without having to cancel and recreate each one. Check out the relevant section in the docs for more information.
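To make the idea concrete, here is a minimal Python sketch of placeholder substitution. It is a simplified stand-in for Handlebars, not the real library: it handles only plain {{key}} placeholders, with no helpers, blocks, or escaping. The template text, the field names `name` and `url`, and the example URL are all hypothetical and are not taken from the Hivemind docs.

```python
import re

# Hypothetical template and instance data, invented for illustration only.
TEMPLATE = (
    "Click on the following link: [{{name}}]({{url}}) "
    "and find the author of the most recent blog post."
)
INSTANCE_DATA = {"name": "Hivemind", "url": "https://example.com"}

# Matches simple placeholders such as {{name}} or {{ name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Replace each {{key}} with data[key]; leave unknown keys untouched."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(data[key]) if key in data else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


rendered = render(TEMPLATE, INSTANCE_DATA)
print(rendered)
```

The template lives in one place and each instance supplies only its data, so changing the wording means editing one string rather than every instance.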

Advanced Agreement Checking

More advanced agreement checking is now available in Hivemind Studio. Options include comparing strings while ignoring case, spaces, or symbols, and using Jaro-Winkler or Levenshtein distance to check agreement based on string similarity. There are also a number of options for comparing numerical values by proximity and for omitting fields from the agreement check entirely. Check out the section in the docs or the ‘Data Quality’ tab in Studio for more details.
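As a rough illustration of what these comparisons do, the Python sketch below normalises strings, computes Levenshtein distance, and checks numbers against a tolerance. The function names and the `max_distance` and `tolerance` parameters are assumptions made for this example and do not reflect how Studio names or implements its options. Jaro-Winkler is left out for brevity.

```python
def normalise(text: str) -> str:
    """Lower-case the text and drop spaces and symbols (keep letters and digits)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def strings_agree(a: str, b: str, max_distance: int = 0) -> bool:
    """Two answers agree if their normalised forms are within max_distance edits."""
    return levenshtein(normalise(a), normalise(b)) <= max_distance


def numbers_agree(x: float, y: float, tolerance: float) -> bool:
    """Two numeric answers agree if they differ by at most tolerance."""
    return abs(x - y) <= tolerance


print(strings_agree("Acme Corp.", "acme corp"))             # differ only in case and symbols
print(strings_agree("Hivemind", "Hivemnid", max_distance=2))
print(numbers_agree(10.0, 10.4, tolerance=0.5))
```

A looser threshold accepts more near-matches as agreement, which cuts down on repeat work but can let genuine disagreements through.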

Active Tasks are now editable

Certain fields on a task are now editable even after the task has been sent to contributors. You can edit the instructions, templates, output, and agreement options, among others. Remember: changing these fields can have unintended consequences for your data quality.

Extracting text from images with Tesseract

We have integrated the excellent Tesseract OCR library as a Hivemind agent to help when building workflows to digitise images of text. For consistently formatted text, Tesseract generally provides good results across a broad range of languages. However, OCR output quality may suffer when the document formatting is inconsistent or the image is of poor quality. In these cases, an augmented workflow can deliver high-quality results: an initial task completed by the Tesseract agent, followed by an OCR clean-up task completed by a human. Watch out for the ‘OCR’ tile on the Studio Task Creation page to get started.
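The two-stage workflow can be sketched in Python as follows. This shows only the shape of the pipeline and is not Hivemind's implementation: `digitise`, `stub_ocr`, and `stub_clean_up` are hypothetical names, and both stages are stubbed so the example runs without Tesseract installed. In practice the OCR step might wrap a call such as `pytesseract.image_to_string`, and the clean-up step would be a task completed by a person.

```python
from typing import Callable, Dict, Iterable


def digitise(
    images: Iterable[str],
    ocr: Callable[[str], str],
    clean_up: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run OCR on each image, then pass the draft text to a clean-up step."""
    results = {}
    for image in images:
        draft = ocr(image)                       # first task: automated OCR
        results[image] = clean_up(image, draft)  # second task: human correction
    return results


# Stubs standing in for the OCR agent and the human contributor.
def stub_ocr(image: str) -> str:
    return "Inv0ice total: 42"  # simulated OCR misread of "Invoice"


def stub_clean_up(image: str, draft: str) -> str:
    return draft.replace("Inv0ice", "Invoice")


output = digitise(["scan-001.png"], stub_ocr, stub_clean_up)
print(output)
```

Keeping the two stages as separate callables means the human step corrects a draft rather than transcribing from scratch.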

Support for Locales and Qualifications on MTurk

You can now access the full power of MTurk qualifications to target specific pools of workers by qualification or locale. These are available on the ‘Settings’ tab when you create an MTurk task. See the documentation for further information.

Breaking API Change: Task Instructions -> Documentation

We don’t take breaking API changes lightly and only make them to address significant issues in usability or functionality. In response to feedback that the naming of the various Markdown fields was potentially confusing, we have renamed ‘instructions’ on a task to ‘documentation’. We feel this results in a clearer structure, with the task containing the higher-level documentation and the instance containing the specific instructions.
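If you have scripts that build task payloads, a small helper along these lines could ease the migration. It is a sketch only: the payload shape (a flat dictionary with a `name` key) and the helper name `migrate_task` are assumptions, and the only detail taken from this release is the rename of ‘instructions’ to ‘documentation’.

```python
def migrate_task(task: dict) -> dict:
    """Return a copy of a task payload with 'instructions' renamed to 'documentation'."""
    migrated = dict(task)  # shallow copy, so the caller's dict is left unchanged
    if "instructions" in migrated and "documentation" not in migrated:
        migrated["documentation"] = migrated.pop("instructions")
    return migrated


# Hypothetical payload, for illustration only.
old_task = {"name": "Find blog authors", "instructions": "Read the guide first."}
new_task = migrate_task(old_task)
print(new_task)
```

Payloads that already use ‘documentation’ pass through unchanged, so the helper is safe to run more than once.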

Changelog

Thanks to Daniel Mitchell.