Data Quality, Template Functions, and Results.

Riaz Karim

The Hivemind platform continues to push the envelope in giving our clients the tools to increase the accuracy of the data they collect. This release provides more advanced methods for checking whether two or more responses agree. It also adds new functions for your Handlebars templates and improves the downloading of results in Studio.

Data Quality

Previously, on-platform agreement checking could only choose a response in its entirety as the consensus answer. When responses consist of multiple segments, this could require more iterations than strictly necessary, because we wait for complete agreement between responses before taking a consensus result. This release introduces additional options for how agreement works, letting you decide whether responses can be combined to create the consensus result. This can reduce the number of iterations (and therefore the cost) required for each instance.

The following modes are available:

  • Pairwise Comparison: As in previous versions of Hivemind, consensus is reached by comparing results pairwise, using options specific to the types of result being compared, and noting that an answer always agrees with itself. Overall agreement is reached if at least one answer agrees with enough other answers that its ratio of agreeing answers to total answers is greater than or equal to the target agreement threshold. In that case, the answer with the highest such ratio is chosen as the consensus answer, with ties decided in favour of longer answers over shorter ones. This mode is the default.

  • Inclusion Frequency: This mode is available only for array responses. Consensus is reached by considering each list item individually and counting how many responses include it. An item is included in the consensus answer if and only if the ratio of responses that include it to total responses is greater than or equal to the target agreement threshold. Overall agreement is reached if the number of items selected for the consensus response is greater than or equal to the configured minimum, which defaults to 1.

  • Piece-by-Piece: This mode is available for both object and array responses. Responses are broken down by key (for objects) or by index (for arrays), and each piece then undergoes agreement checking using any applicable mode. Overall agreement is reached if and only if every piece reaches agreement; the consensus answer is then formed by gluing the pieces' individual consensus answers back together, reversing the way the response was broken apart. If feedback is enabled, it is based on the lower-level agreement checking, so contributors who disagreed with the consensus on only one piece of the answer will only see feedback for that piece.
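
To make the three modes more concrete, here is a minimal Python sketch of how they could behave. It is our illustration rather than Hivemind's implementation: the function names, the NO_CONSENSUS sentinel and the default exact-equality comparator are assumptions, whereas the real checks use the type-specific agreement options described above.

```python
from collections import Counter
from typing import Any, Callable

NO_CONSENSUS = object()  # hypothetical sentinel returned when agreement is not reached


def _length(answer: Any) -> int:
    """Tie-break helper: longer answers win; unsized answers count as length 0."""
    try:
        return len(answer)
    except TypeError:
        return 0


def pairwise(responses: list, threshold: float,
             agrees: Callable[[Any, Any], bool] = lambda a, b: a == b) -> Any:
    """Pairwise Comparison: pick the answer with the highest agreement ratio."""
    if not responses:
        return NO_CONSENSUS
    total = len(responses)
    scored = []
    for i, candidate in enumerate(responses):
        # An answer always agrees with itself (i == j), whatever `agrees` says.
        agreeing = sum(1 for j, other in enumerate(responses)
                       if i == j or agrees(candidate, other))
        scored.append((agreeing / total, _length(candidate), i))
    # Highest ratio wins; ties go to the longer answer (then the earliest one).
    ratio, _, best = max(scored, key=lambda s: (s[0], s[1]))
    return responses[best] if ratio >= threshold else NO_CONSENSUS


def inclusion_frequency(responses: list, threshold: float, minimum: int = 1) -> Any:
    """Inclusion Frequency: keep each list item that enough responses include."""
    if not responses:
        return NO_CONSENSUS
    counts: Counter = Counter()
    for response in responses:
        for item in dict.fromkeys(response):  # count an item once per response, keep order
            counts[item] += 1
    consensus = [item for item, n in counts.items() if n / len(responses) >= threshold]
    return consensus if len(consensus) >= minimum else NO_CONSENSUS


def piece_by_piece(responses: list, threshold: float,
                   piece_mode: Callable[[list, float], Any] = pairwise) -> Any:
    """Piece-by-Piece: split by key or index, agree on each piece, glue back together."""
    if not responses:
        return NO_CONSENSUS
    if all(isinstance(r, dict) for r in responses):
        # Objects are split by key; a key missing from a response counts as None.
        keys = list(dict.fromkeys(k for r in responses for k in r))
        pieces = {k: [r.get(k) for r in responses] for k in keys}
    else:
        # Arrays are split by index; a missing position counts as None.
        width = max(len(r) for r in responses)
        pieces = {i: [r[i] if i < len(r) else None for r in responses] for i in range(width)}
    glued = {}
    for key, values in pieces.items():
        result = piece_mode(values, threshold)
        if result is NO_CONSENSUS:
            return NO_CONSENSUS  # overall agreement needs every piece to agree
        glued[key] = result
    # Reverse the split: objects stay keyed, arrays are rebuilt in index order.
    return glued if isinstance(responses[0], dict) else [glued[i] for i in range(len(glued))]
```

For example, with the four responses `{"name": "Ann", "age": 31}`, `{"name": "Ann", "age": 32}`, `{"name": "Bob", "age": 30}` and `{"name": "Cy", "age": 30}` and a threshold of 0.5, `pairwise` finds no consensus because no two whole responses match, while `piece_by_piece` returns `{"name": "Ann", "age": 30}`. This is the kind of case where combining pieces can save an extra iteration.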

In addition, this release adds new advanced agreement options: date comparison options, the ability to compare only part of a string answer by using a regex selector, and the ability to normalise strings using Unicode equivalence. The agreement options can also be inspected and edited directly in their raw JSON form in Studio, under the Data Quality tab when creating or editing a task.
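
As an illustration of the regex selector and Unicode normalisation options, the Python sketch below compares two strings after normalising them and, optionally, selecting part of each with a regular expression. The function name, its parameters and the rule of comparing the first capture group are assumptions made for the example; they are not the option names used in the raw JSON configuration.

```python
import re
import unicodedata
from typing import Optional


def strings_agree(a: str, b: str, selector: Optional[str] = None,
                  normalisation: Optional[str] = "NFC") -> bool:
    """Hypothetical string comparison combining Unicode normalisation and a regex selector."""

    def prepare(text: str) -> Optional[str]:
        if normalisation:
            # NFC/NFD apply canonical equivalence; NFKC/NFKD also apply compatibility equivalence.
            text = unicodedata.normalize(normalisation, text)
        if selector:
            match = re.search(selector, text)
            if match is None:
                return None  # nothing was selected, so this answer cannot agree
            # Compare the first capture group if the pattern has one, else the whole match.
            text = match.group(1) if match.re.groups else match.group(0)
        return text

    left, right = prepare(a), prepare(b)
    return left is not None and left == right
```

With NFC, a precomposed "é" and an "e" followed by a combining acute accent compare as equal; a ligature such as "ﬁ" only matches "fi" under the compatibility form NFKC. Normalising before applying the selector means the regex always sees one consistent representation of the text.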

Finally, the worker statistics in Studio and Workbench now include an additional dimension: how often a worker contributes to consensus. Viewed alongside the existing Standardised Task Time (STT) metric, this can give you real insight into the performance of individual contributors.
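
One simple way to express this statistic is the share of a worker's responses that agreed with the final consensus answer. The sketch below assumes that definition and a hypothetical input of (worker, agreed) pairs; the exact calculation behind the Studio and Workbench figures may differ.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def consensus_contribution_rates(records: Iterable[Tuple[Hashable, bool]]) -> dict:
    """Share of each worker's responses that agreed with the consensus answer.

    `records` is a hypothetical input shape: (worker_id, agreed_with_consensus) pairs,
    one per response on an instance that reached consensus.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for worker, matched in records:
        totals[worker] += 1
        agreed[worker] += int(matched)  # True counts as 1, False as 0
    return {worker: agreed[worker] / totals[worker] for worker in totals}
```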

Template Functions

We’ve extended our Handlebars templates with new functions for manipulating dates and strings of text. These work especially well in combination with the Scheduling feature released last month. For example, if you have a schedule that collects data daily, you can use the newly added _metadata on the instance data, together with these functions, to annotate each instance with the previous day in a specific timezone and format. As a simple illustration:

    {{ convert_timezone _metadata.createdAt "America/New_York" }}

This gives you the time at which the instance was created, in New York time.
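
To show what the timezone step does, here is a rough Python analogue using the standard zoneinfo module. It is not the helper's source: the ISO 8601 input format, the UTC fallback for timestamps without an offset, and the previous_day function are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def convert_timezone(created_at: str, tz_name: str) -> datetime:
    """Rough analogue of the convert_timezone template helper (assumed behaviour)."""
    instant = datetime.fromisoformat(created_at)  # e.g. "2021-07-15T02:00:00+00:00"
    if instant.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        instant = instant.replace(tzinfo=timezone.utc)
    # ZoneInfo applies the correct UTC offset, including daylight saving time.
    return instant.astimezone(ZoneInfo(tz_name))


def previous_day(created_at: str, tz_name: str, fmt: str = "%Y-%m-%d") -> str:
    """Hypothetical helper: the day before creation, in the given timezone and format."""
    local = convert_timezone(created_at, tz_name)
    return (local - timedelta(days=1)).strftime(fmt)
```

The timezone matters here: an instance created at 02:00 UTC on 15 July 2021 was created at 22:00 on 14 July in New York, so `previous_day("2021-07-15T02:00:00+00:00", "America/New_York")` returns "2021-07-13" rather than the "2021-07-14" you would get by working in UTC.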

The available functions are: convert_timezone, date, date_format, date_add, date_period and humanise.

Full details, along with examples, can be found in our docs.

Downloading Results

Many of you have told us how important it is for everyone to be able to download results from Hivemind easily, without necessarily cracking out their favourite programming language (C# rocks, in case you’re looking), or even having one, for that matter.

We have therefore extended result downloads in this release to support the Excel format. Adopting this format lets us represent more complex data structures by spreading them across multiple tabs. But don’t worry, we still support JSON and CSV for those who want them. Check it all out on the individual task pages in Studio.
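
One common way to fit nested results into tabs is to give each array of objects its own table, with a column pointing back to its parent row. The Python sketch below illustrates that idea only; the tab naming and the _row/_parent_row columns are hypothetical and not necessarily the layout Hivemind's Excel export uses.

```python
from typing import Any, Dict, List, Optional

Table = List[Dict[str, Any]]


def split_into_tabs(records: List[Dict[str, Any]], root: str = "results") -> Dict[str, Table]:
    """Flatten nested result records into one table per tab (hypothetical layout)."""
    tabs: Dict[str, Table] = {}

    def add_row(tab: str, record: Dict[str, Any], parent_row: Optional[int]) -> None:
        rows = tabs.setdefault(tab, [])
        row_id = len(rows) + 1
        flat: Dict[str, Any] = {"_row": row_id}
        if parent_row is not None:
            flat["_parent_row"] = parent_row  # link back to the row in the parent tab
        rows.append(flat)  # reserve this row number before visiting any children
        for key, value in record.items():
            if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                # A non-empty array of objects becomes its own tab, e.g. "results.items".
                for child in value:
                    add_row(f"{tab}.{key}", child, row_id)
            else:
                flat[key] = value  # scalars (and other values) stay as cells in this row

    for record in records:
        add_row(root, record, None)
    return tabs
```

Writing each table to its own worksheet in a real .xlsx file would need a spreadsheet library such as openpyxl; the sketch stops at building the tables.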

Other Changes

  • Task timeouts: these are now visible to contributors, who have the option to extend their time window.

  • Contributor statistics: Standardised Task Time is now expressed on an hourly basis, to make comparisons across different workforces easier.

  • Array items in forms: these are now numbered and can be reordered by dragging them.

  • Result downloads: results can also be downloaded at the instance or iteration level on the task page in Studio.