Optical Character Recognition Confidence System
Business Challenges
Case Data is one of the world's leading providers of data capture, development, and management services, serving some of the largest corporate law departments and law firms.
Case Data frequently processes scanned document images with OCR (Optical Character Recognition), which produces a text file from each processed image. Because OCR has to 'recognize' characters, its output is often inaccurate. Common causes include the quality of the underlying image and the performance of the OCR engine.
Case Data needed a program to assess the quality of the OCR output for every processed image. The proposed program would examine each page of the text file created at OCR time and compare each word on it against a standard dictionary. Dividing the number of recognized words (those found in the dictionary) by the total number of words on the page yields a ratio, or 'word confidence' rating, for every page of a document.
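The per-page word confidence calculation can be sketched in C++, the language listed for the project. This is a minimal illustration, not INTELSYS's implementation. The helper names `splitPages` and `wordConfidence` are hypothetical, and so are two rules: pages are assumed to be separated by form-feed characters, and a "word" is assumed to be a run of letters compared in lower case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: split OCR output into pages, assuming each page
// is separated from the next by a form-feed character ('\f').
std::vector<std::string> splitPages(const std::string& text) {
    std::vector<std::string> pages;
    std::string current;
    for (char c : text) {
        if (c == '\f') {
            pages.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) pages.push_back(current);  // last page, if any
    return pages;
}

// Hypothetical helper: word confidence = words found in the dictionary
// divided by total words on the page. A word is assumed to be a maximal
// run of letters, lower-cased before lookup. Returns 0.0 if the page
// contains no words at all.
double wordConfidence(const std::string& page,
                      const std::unordered_set<std::string>& dictionary) {
    std::size_t total = 0;
    std::size_t recognized = 0;
    std::string word;
    auto flush = [&]() {  // finish the current word, if there is one
        if (word.empty()) return;
        ++total;
        if (dictionary.count(word) > 0) ++recognized;
        word.clear();
    };
    for (char c : page) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalpha(uc)) {
            word += static_cast<char>(std::tolower(uc));
        } else {
            flush();  // any non-letter ends a word
        }
    }
    flush();  // the page may end in the middle of a word
    return total == 0 ? 0.0 : static_cast<double>(recognized) / total;
}
```

For example, with a dictionary containing "the", "quick", and "fox", the page "The quick brwn fox" has four words, three of which are recognized, so its word confidence is 3/4 = 0.75. A real system would likely need richer tokenization rules (hyphenation, apostrophes, numbers), which the source does not specify.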
System Functionality
INTELSYS designed the system using object-oriented technologies. The application presents the results to end users and provides the following functionality:
- Verification of the source text file contents by checking each word for a match in the embedded dictionary.
- High-speed word search in the embedded dictionary database.
- Management of the embedded dictionary database (1.2 million words).
- Accelerated word verification by checking whether each word is present in the dictionary database.
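The dictionary features listed above (fast lookup, adding and removing words, verifying a word's presence) can be illustrated with a small in-memory class. The source does not describe how the embedded dictionary database is stored. The `Dictionary` class below is hypothetical, and so is its choice of `std::unordered_set`: a hash set is assumed because it gives average constant-time lookups even with about 1.2 million entries.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical in-memory dictionary; not the original system's storage.
class Dictionary {
public:
    // Load one word per line; blank lines are skipped.
    void load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) add(line);
    }
    // Management: add a word; returns false if empty or already present.
    bool add(const std::string& word) {
        std::string key = normalize(word);
        return !key.empty() && words_.insert(key).second;
    }
    // Management: remove a word; returns true if it was present.
    bool remove(const std::string& word) {
        return words_.erase(normalize(word)) > 0;
    }
    // Verification: is the word in the dictionary (case-insensitive)?
    bool contains(const std::string& word) const {
        return words_.count(normalize(word)) > 0;
    }
    std::size_t size() const { return words_.size(); }
    // Pre-size the hash table when the word count is roughly known.
    void reserve(std::size_t n) { words_.reserve(n); }

private:
    // Lower-case the word and drop whitespace (e.g. a trailing '\r').
    static std::string normalize(const std::string& word) {
        std::string key;
        for (char c : word) {
            unsigned char uc = static_cast<unsigned char>(c);
            if (!std::isspace(uc)) key += static_cast<char>(std::tolower(uc));
        }
        return key;
    }
    std::unordered_set<std::string> words_;
};
```

As a usage example, loading the words from a `std::istringstream` holding "Court\nmotion\n" and then calling `contains("COURT")` returns `true`, because stored and queried words are both lower-cased. For a dictionary of the stated size, calling `reserve(1200000)` before `load` avoids repeated rehashing.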
User Interface
Languages & Technologies Used
- C++
- Rational Unified Process
- Borland Control Panel Applet Package
- Borland Standard Components
- Borland User Components
- Delphi 1.0 Compatibility Components