Set for Converting Printed Material into Electronic Format
The offered solution addresses the main problems that arise when digitizing printed material. It provides the following improvements:
- significantly higher processing speed
- centralized management of the entire process
- reduced dependence of the system on operator errors
- the ability to monitor operator performance
The complete set for converting printed material into electronic format can be viewed as the interaction of three component sets:
- Management set
- Recognition set
- Scanning set
Management Set
The management set is the core component of the complete set for converting printed material into electronic format. It monitors the scanning and recognition sets, manages data storage, and reduces the system's dependence on the human factor. In automatic mode, the management set configures the book scanner, monitors the data coming from it, and, once the conditions preset by the system administrator are met, transfers the digitized images to the recognition set. Recognized results are likewise received from the recognition set automatically, without operator involvement. After the data is received and its integrity is verified, it is moved to the data storage, where it becomes available to all SearchInform Client users. The management set can also send work to the recognition set on a schedule, so that recognition runs when the local stations used for it are least loaded.
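The automatic mode described above (collect scans, forward them once a preset condition holds and the schedule allows, then verify results before storing them) can be sketched as follows. This is a minimal illustration under stated assumptions, not SearchInform's implementation: the class and method names are hypothetical, the "preset condition" is assumed to be a batch size, the schedule is assumed to be a nightly time window, and integrity is assumed to be checked with a SHA-256 checksum.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import time


@dataclass
class ScannedPage:
    page_id: str
    image: bytes


@dataclass
class ManagementSet:
    """Hypothetical model of the management set's automatic mode."""
    batch_size: int      # assumed preset condition: forward once this many pages wait
    window_start: time   # assumed low-workload window for recognition
    window_end: time
    pending: list = field(default_factory=list)
    storage: dict = field(default_factory=dict)

    def accept_scan(self, page: ScannedPage) -> None:
        # images arrive from the book scanner without operator involvement
        self.pending.append(page)

    def in_window(self, now: time) -> bool:
        # supports windows that wrap past midnight, e.g. 22:00-06:00
        if self.window_start <= self.window_end:
            return self.window_start <= now < self.window_end
        return now >= self.window_start or now < self.window_end

    def ready_for_recognition(self, now: time) -> bool:
        return len(self.pending) >= self.batch_size and self.in_window(now)

    def take_batch(self) -> list:
        # hand the oldest batch_size pages to the recognition set
        batch = self.pending[: self.batch_size]
        self.pending = self.pending[self.batch_size :]
        return batch

    def store_result(self, page_id: str, text: str, sha256_hex: str) -> bool:
        # publish recognized text to storage only if its checksum matches
        if hashlib.sha256(text.encode("utf-8")).hexdigest() != sha256_hex:
            return False
        self.storage[page_id] = text
        return True


mgr = ManagementSet(batch_size=2, window_start=time(22, 0), window_end=time(6, 0))
mgr.accept_scan(ScannedPage("p1", b"<image 1>"))
mgr.accept_scan(ScannedPage("p2", b"<image 2>"))
print(mgr.ready_for_recognition(time(23, 30)))  # True: batch full, inside night window
print(mgr.ready_for_recognition(time(14, 0)))   # False: outside the window
batch = mgr.take_batch()
digest = hashlib.sha256("recognized text".encode("utf-8")).hexdigest()
print(mgr.store_result("p1", "recognized text", digest))  # True
```

Keeping the scheduling check separate from the batch condition mirrors the text: the administrator's preset conditions decide *whether* there is enough work, while the schedule decides *when* the recognition stations may be used.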
Recognition Set
The recognition set is built on ABBYY Recognition Server technology. Under the control of the management set, it recognizes the digitized material automatically, without operator involvement. Because recognition is distributed across several stations, the capacity of the recognition set can be scaled to match the volume of material arriving from the scanning set.
The recognition set works as follows:
When the Recognition Server receives a task from the management set, it looks for an idle recognition station and sends the task to it. If all recognition stations are busy, the Recognition Server queues the task and waits until one of them becomes available. The system administrator can add recognition stations; ordinary office computers on the local network can serve in this role. Recognition is fairly resource-intensive, but it can be run during periods of minimum workload, for example at night. A system with 3 recognition stations processes about 200 pages per hour on average, or about 4800 pages per 24 hours.
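The dispatch rule described above (send a task to an idle station, otherwise queue it until a station frees up) can be sketched as a simple first-in, first-out dispatcher. This is a minimal illustration only, not the ABBYY Recognition Server API: the class and method names are hypothetical, and stations and tasks are represented by plain strings.

```python
from collections import deque


class RecognitionServer:
    """Hypothetical sketch of dispatching tasks to recognition stations."""

    def __init__(self, stations):
        self.idle = deque(stations)  # stations with no task
        self.queue = deque()         # tasks waiting for a free station (FIFO)
        self.assigned = {}           # station -> task in progress

    def _give_work(self, station):
        # a station that becomes free takes the oldest queued task, if any
        if self.queue:
            self.assigned[station] = self.queue.popleft()
        else:
            self.idle.append(station)

    def submit(self, task):
        """Send task to an idle station and return it, or return None if queued."""
        if self.idle:
            station = self.idle.popleft()
            self.assigned[station] = task
            return station
        self.queue.append(task)
        return None

    def complete(self, station):
        # the station finished its task and is available again
        del self.assigned[station]
        self._give_work(station)

    def add_station(self, station):
        # the administrator adds capacity, e.g. another office PC
        self._give_work(station)


server = RecognitionServer(["pc-1", "pc-2"])
print(server.submit("batch-A"))  # pc-1
print(server.submit("batch-B"))  # pc-2
print(server.submit("batch-C"))  # None: both stations busy, task queued
server.complete("pc-1")
print(server.assigned["pc-1"])   # batch-C
```

Routing both "station finished" and "station added" through the same helper means a newly added office computer immediately picks up any queued work, which is how adding stations raises throughput in this sketch.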
Scanning Set
The scanning set is a high-performance book scanner that operates under the control of the management set and the operator. The operator's role, however, is much smaller than with a regular flatbed scanner, and the book scanner is considerably faster: scanning one two-page spread takes about 2 seconds on average.
The operator is not involved in saving or editing the images. These tasks are handled by the management set, which automatically receives each scanned image from the book scanner. The operator's job is thus reduced to turning pages and starting each scan. Focusing on this single task lets the operator reach a speed of about 600 pages per hour.
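The 2-second scan time and the 600 pages per hour rate can be reconciled with a quick calculation. It assumes that the 2 seconds cover image capture only and that all remaining time per spread goes to page turning and other operator handling; the constant names are illustrative.

```python
SECONDS_PER_SPREAD_CAPTURE = 2   # average scan time per two-page spread (from the text)
PAGES_PER_SPREAD = 2
OBSERVED_PAGES_PER_HOUR = 600    # operator throughput (from the text)

# upper bound if the scanner never waited for the operator
capture_only_pages_per_hour = 3600 / SECONDS_PER_SPREAD_CAPTURE * PAGES_PER_SPREAD

# time per spread implied by 600 pages per hour
spreads_per_hour = OBSERVED_PAGES_PER_HOUR / PAGES_PER_SPREAD
seconds_per_spread = 3600 / spreads_per_hour

# remainder: assumed to be page turning and other operator handling
handling_seconds = seconds_per_spread - SECONDS_PER_SPREAD_CAPTURE

print(capture_only_pages_per_hour)  # 3600.0
print(seconds_per_spread)           # 12.0
print(handling_seconds)             # 10.0
```

Under these assumptions, capture takes about 2 of every 12 seconds per spread, so operator handling rather than the scanner itself sets the 600 pages per hour rate.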
Overall System Performance
The potential system performance can be estimated as follows:
- Scanning rate: about 600 pages per hour.
- With a 12-hour working day (2 shifts, 1 operator each): about 7200 pages per day.
- Assuming roughly 21 working days per month, the monthly output would be about 150000 pages.
- This corresponds to converting about 1000 books (at 150 pages per book) into digital format per month.
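The estimate above can be reproduced in a few lines. The 21 working days per month is an assumption inferred from the ratio of the monthly and daily figures. The last step additionally assumes that recognition throughput scales linearly with the number of stations and that recognition runs around the clock; the text does not state either of these.

```python
import math

SCAN_PAGES_PER_HOUR = 600
WORK_HOURS_PER_DAY = 12          # 2 shifts, 1 operator each
WORKING_DAYS_PER_MONTH = 21      # assumption: implied by ~150000 / ~7000
PAGES_PER_BOOK = 150

RECOGNITION_PAGES_PER_HOUR = 200  # average with 3 stations (from the text)
REFERENCE_STATIONS = 3

pages_per_day = SCAN_PAGES_PER_HOUR * WORK_HOURS_PER_DAY
pages_per_month = pages_per_day * WORKING_DAYS_PER_MONTH
books_per_month = pages_per_month // PAGES_PER_BOOK

# recognition capacity per station per 24-hour day, assuming linear scaling
pages_per_station_day = RECOGNITION_PAGES_PER_HOUR * 24 / REFERENCE_STATIONS
stations_needed = math.ceil(pages_per_day / pages_per_station_day)

print(pages_per_day)    # 7200
print(pages_per_month)  # 151200
print(books_per_month)  # 1008
print(stations_needed)  # 5
```

Under those assumptions, 3 recognition stations (about 4800 pages per 24 hours) would fall behind a scanning rate of 7200 pages per day, and about 5 stations would be needed to keep pace. Adding stations is the scaling mechanism the recognition set section describes.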