The goal of the project was to set up a solution and process ~4000 highly confidential hand-filled questionnaires (approx. 48 000 pages), each containing 280 fields (check boxes as well as fields containing handwritten numbers or text), in less than one month.
Our client had limited internal resources to allocate to the project, so they were looking for a solution that could both automate the processing without reducing data accuracy and allow external resources to be used for the data entry without compromising the confidentiality of the processed data. Guaranteeing full confidentiality was one of the highest priorities of the project.
- Supply and configure the scanners required for the digitization
- Automate the processing of approx. 4000 questionnaires (48 000 pages, ~1.1M fields)
- Provide a simplified web interface for the manual data entry
- Provide a simplified web interface for the quality control
- Enable the use of external resources for the data entry without compromising the confidentiality of the data
Solution – MPS IntelliVector
After an extended vendor evaluation period, MPS IntelliVector was chosen for its unique capabilities: it automates and speeds up the processing, while its microtask-based approach makes it possible to use a cheaper, external workforce for the steps that require human involvement and still guarantee 100% data confidentiality.
In the first week of the project, client employees scanned all the questionnaires using scanners pre-configured by MPS technicians. MPS IntelliVector automatically recognized the questionnaire types and, based on their layout, extracted the parts that required processing. During the extraction, MPS IntelliVector broke down each document image into small individual fields, or microtasks, which were then processed by a combination of automated and manual recognition, depending on the field type:
- check boxes: automated OMR (Optical Mark Recognition) only
- handwritten numbers: the results of automated ICR (Intelligent Character Recognition) were cross-checked against microtask-based manual data validation
- free-form, multi-line handwritten text: MPS IntelliVector usually handles handwritten text with a combination of ICR and manual microtask-based data validation (similar to the approach used for handwritten numbers), but in this project a more traditional manual data entry was used.*
* Even though MPS IntelliVector uses some of the best recognition engines available on the market, no automated recognition technology can guarantee 100% accuracy for this type of text, so manual data entry was the more rational choice in terms of time and cost.
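The routing by field type described above can be illustrated with a minimal sketch. The names `FieldType`, `ROUTES`, `route` and the step labels are hypothetical and chosen only for illustration; they are not part of MPS IntelliVector's actual interface.

```python
from enum import Enum


class FieldType(Enum):
    CHECKBOX = "checkbox"
    NUMBER = "handwritten_number"
    FREE_TEXT = "free_text"


# Hypothetical step labels mirroring the three cases in the list above.
ROUTES = {
    FieldType.CHECKBOX: ["omr"],                     # automated only
    FieldType.NUMBER: ["icr", "manual_validation"],  # automated result cross-checked by people
    FieldType.FREE_TEXT: ["manual_entry"],           # typed in by data entry users
}


def route(field_type: FieldType) -> list[str]:
    """Return the ordered processing steps for one extracted field."""
    return list(ROUTES[field_type])
```

Under these assumptions, `route(FieldType.NUMBER)` returns `["icr", "manual_validation"]`, while a check box never reaches a human.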
Microtask-based data entry
In contrast to the traditional approach offered by other vendors, where data entry users type in the data while looking at the full-page image of the document, MPS IntelliVector recognizes the type and layout of each incoming document, breaks it up into small, individual microtasks and sends them to the data entry users. This way, the data entry users see only small, anonymous pieces of information without their original context, never the whole document, so the confidentiality of the original data is fully preserved. This approach makes it possible to securely use an external, even crowdsourced, workforce for data entry or data validation of any kind of confidential data.
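The idea can be sketched as follows. This is only an illustration under stated assumptions, not the product's implementation: field images are represented by plain strings, and `Microtask`, `split_into_microtasks` and the private `index` are hypothetical names.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Microtask:
    task_id: str  # opaque ID; carries no document or field information
    snippet: str  # stand-in for the cropped image of a single field


def split_into_microtasks(doc_id, fields, rng=None):
    """Split one document into context-free microtasks.

    Returns the shuffled tasks (safe to hand to data entry users) and a
    private index mapping task_id -> (doc_id, field_name) that stays on
    the server and is used to reassemble the results.
    """
    rng = rng or random.Random()
    tasks = []
    index = {}
    for field_name, snippet in fields.items():
        task_id = uuid.uuid4().hex
        tasks.append(Microtask(task_id, snippet))
        index[task_id] = (doc_id, field_name)
    rng.shuffle(tasks)  # remove the original field order as well
    return tasks, index
```

A data entry user only ever receives `Microtask` objects; the link back to the questionnaire lives solely in `index`.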
Each individual microtask (in this case a field containing multi-line handwritten text) was typed in by two separate data entry users, and only if the two results did not match was it sent to a third data entry user. A microtask went to quality control only if there were no matching results among the three data entries.
In this project, a total of 7 internal users and 10 external part-time users performed the data entry. This team was complemented by 4 quality assurance users.
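The double-entry rule with a third-user tiebreak can be written down as a short sketch. The function name `resolve_microtask` and the `get_entry` callback are hypothetical, and exact string equality is an assumption; a real system might normalize whitespace or case before comparing.

```python
from typing import Callable, Optional, Tuple


def resolve_microtask(get_entry: Callable[[int], str]) -> Tuple[Optional[str], bool]:
    """Apply the two-entries-plus-tiebreak rule to one microtask.

    get_entry(i) returns what the i-th data entry user typed (i = 0, 1, 2).
    Returns (accepted_value, needs_quality_control).
    """
    first = get_entry(0)
    second = get_entry(1)
    if first == second:
        return first, False  # two users agree: accept, no third entry needed
    third = get_entry(2)     # third user is asked only on a mismatch
    if third in (first, second):
        return third, False  # two out of three agree
    return None, True        # no match at all: escalate to quality control


# Example with pre-recorded entries for one field:
entries = ["12 Main St", "12 Main Str", "12 Main St"]
value, needs_qc = resolve_microtask(entries.__getitem__)
# value == "12 Main St", needs_qc is False
```

Because the third entry is requested lazily, fields on which the first two users agree cost only two keying passes.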
- The whole project was delivered within one month
- 3858 questionnaires (48 000 pages) were scanned and processed
- 1 106 386 fields (microtasks) were processed
- In 35% of the fields, the previously used manual processing was replaced with fully automated processing
- In the remaining 65% of the fields:
  - for fields containing numbers, the processing speed improved by 200%
  - for fields containing text, the processing speed improved by 20%
- Considerable cost savings were realized due to the involvement of external resources for the data entry
- 100% data confidentiality was guaranteed throughout the data entry (even with the external workforce)