Optical Character Recognition – Per Wikipedia, OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. Applied to the appropriate document type and format, OCR processing is extremely useful: it can save both internal resources and capital expenditure (CAPEX) while producing a higher-quality product than manual key entry.
Unfortunately, OCR is not for every project. Skewed text, rough text, heavy noise, lines and other foreign data that interfere with a clear, uninterrupted view of the text will reduce accuracy.
OCR engines are very linear processes – they read horizontally and vertically across digital images. Any skew away from that square, 90-degree orientation will negatively affect any OCR engine. Additionally, OCR engines are not magic; they are very pragmatic. Images must contain familiar text resembling existing alphabetical characters. Anything that distorts standard text will reduce accuracy.
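To make the effect of skew concrete, here is a minimal sketch in Python. It assumes an engine that locates text lines by counting ink pixels in each row of a binary image (a row projection profile), which is a common technique but not necessarily what any particular OCR product does. The page, the `shear` helper and its `step` parameter are all hypothetical and exist only for illustration.

```python
def row_profile(image):
    """Count ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def shear(image, step):
    """Mimic a skewed scan: push column x down by x // step rows."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                ny = y + x // step
                if 0 <= ny < height:
                    out[ny][x] = 1
    return out


def text_rows(image):
    """Number of rows that contain any ink at all."""
    return sum(1 for count in row_profile(image) if count > 0)


# Hypothetical 12 x 20 page holding two one-pixel-thick "text lines".
page = [[0] * 20 for _ in range(12)]
for x in range(20):
    page[2][x] = 1
    page[6][x] = 1

skewed = shear(page, step=5)

print("straight:", row_profile(page))
print("skewed:  ", row_profile(skewed))
```

On the straight page the ink sits in two sharp rows with a clear gap between them. After the shear, the same ink is smeared across many rows and the gap between the two lines closes, so a row-by-row reader can no longer tell where one line ends and the next begins.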
The following are industry-accepted steps used to increase OCR accuracy:
Deskew – A software process that uses various algorithms to identify the text orientation and attempts to rotate the image so that the text sits at a true 90-degree orientation.
Noise Reduction – Also known as despeckling, this software process removes small imperfections, spots, scratches, blotches and random marks from the white areas of a digital image. Removing these imperfections reduces OCR engine interference and “false positive” reads.
Dilation/Erosion – Text quality is the key to OCR accuracy. These filters smooth the edges of text: erosion removes pixels that represent rough edges, and dilation adds pixels to fill in missing data within a character.
Line Removal – Specialty software can remove lines from an image. Removing lines reduces OCR interference.
Red/Blue/Green Dropout – Using the proper settings, color scanners will “not capture” red, blue or green data within an image. Pre-printed forms often have their boxes and response areas printed in red, blue or green. This is done on purpose so that, during scanning, the box lines and response areas are not captured and therefore interfere less with the OCR engine.
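Two of the steps above, noise reduction and dilation/erosion, can be sketched in a few lines of plain Python. This is an illustrative sketch on a tiny binary image (1 = ink, 0 = white), not the method used by any particular scanning product. The function names, the 3x3 neighborhood and the `min_size` threshold are assumptions chosen for the example. Production software typically uses an imaging library and tunes these settings per document type.

```python
def get(image, y, x):
    """Return the pixel at (y, x); anything outside the image counts as white (0)."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return 0


def window(image, y, x):
    """Return the nine values in the 3x3 neighborhood centered on (y, x)."""
    return [get(image, y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def dilate(image):
    """Add pixels: a pixel becomes ink if anything in its 3x3 window is ink."""
    return [[1 if any(window(image, y, x)) else 0
             for x in range(len(image[0]))] for y in range(len(image))]


def erode(image):
    """Remove pixels: a pixel stays ink only if its whole 3x3 window is ink."""
    return [[1 if all(window(image, y, x)) else 0
             for x in range(len(image[0]))] for y in range(len(image))]


def despeckle(image, min_size):
    """Erase connected blobs of ink that contain fewer than min_size pixels."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood fill to collect every pixel touching this one (8-connected).
            stack, blob = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = 0
    return out


# Hypothetical 7 x 9 scan: a 5 x 5 "character" with a pinhole, plus a stray speck.
scan = [[0] * 9 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        scan[y][x] = 1
scan[3][3] = 0  # missing data inside the character
scan[1][7] = 1  # isolated speck in the white area

cleaned = despeckle(scan, min_size=2)  # noise reduction
closed = erode(dilate(cleaned))        # dilation, then erosion, fills the pinhole

for row in closed:
    print("".join("#" if pixel else "." for pixel in row))
```

Running dilation followed by erosion (often called "closing") fills small gaps while returning the character to roughly its original size. The reverse order, erosion followed by dilation, trims small bumps from rough edges instead. Either way the filters are blunt tools: too large a neighborhood or threshold will start merging or erasing real strokes.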
The debate over whether microfilm or digital media is the more viable choice for long-term archiving rears its ugly head on a regular basis in the courtrooms, boardrooms and offices of government and private institutions alike.
From a legal perspective, microfilm is a supported format. Under the Best Evidence Rule, the Federal Business Records Act and the Uniform Photographic Copies of Business and Public Records as Evidence Act permit the admissibility of any record which has been “kept in the regular course of business and copied or reproduced by … any photographic, photostatic, microfilm, microcard, miniature photographic or other process which accurately reproduces or forms a durable medium for reproducing the original.” Accordingly, the reproduction is as admissible as the original. Recording information optically clearly falls within the law’s language of “other process which accurately reproduces or forms a durable medium for reproducing the original.”
All US states have published document retention and library standards, and micrographics adheres to just about every state’s standards for long-term document archiving. The state archives of New York, California, Texas, Indiana, Arizona, Louisiana and Florida, to name just a few, promote microfilm as a viable and practical medium for preserving the state’s history.
Now, ask yourself this question: Let’s say you have been left a trust valued at $100,000,000. You must wait 20 years to have access to these funds. The money will be accessed via a 1,000-character code found on 200 separate documents (files). You will be provided with these documents on a USB drive, a CD, a DVD or a roll of microfilm. Which media would you choose? Backwards compatibility will always be a serious concern. What platform created the electronic copy? Was it Windows? Will this format be supported in 20 years? Will you need to convert your 20-year-old data to get at your code? If you lose a single image, you will not be able to access these funds. Now, instead of a code for a trust, think of those documents as proof of purchase or ownership, human resources records, certification documents, medical research history and so on.
Advantages of Microfilm for Long-Term Archiving:
- Properly filmed and processed microfilm on a polyester base has an anticipated life expectancy of 500 years (LE 500).
- All you need to view film is a light source.
- Individual pages cannot be pulled or lost.
- Original rolls cannot be edited.
- A single roll of 16mm microfilm can hold over 8,000 images – almost an entire three-drawer file cabinet of documents.