SEC. 802. CRIMINAL PENALTIES FOR ALTERING DOCUMENTS.
(a) IN GENERAL- Chapter 73 of title 18, United States Code, is amended by adding at the end the following:
“Sec. 1519. Destruction, alteration, or falsification of records in Federal investigations and bankruptcy.
“Whoever knowingly alters, destroys, mutilates, conceals, covers up, falsifies, or makes a false entry in any record, document, or tangible object with the intent to impede, obstruct, or influence the investigation or proper administration of any matter within the jurisdiction of any department or agency of the United States or any case filed under title 11, or in relation to or contemplation of any such matter or case, shall be fined under this title, imprisoned not more than 20 years, or both.
“Sec. 1520. Destruction of corporate audit records.
Most IT managers would tell us that the only way to archive your records is to back them up on tape or disk drives. When you consider the implications of lost data and how long you (firm, office) are responsible for keeping many of your corporate records, your chosen long-term archival media system needs to be thoroughly evaluated.
Destruction, Alteration and Falsification
If you store legacy records on paper in files, how easy is it to misplace a single page, replace an existing page with a new page, or eliminate a file from a record completely (destruction by some method, e.g. shredding)?
Corruptibility and Backwards Compatibility
While the idea of a digital back-up of your records seems safe enough, have you recently tried to open a CD or USB type media device that is 10 years old or older?
Technology continues to march forward at an ever-quickening pace. Platforms change every 2-4 years, which makes backwards compatibility very “iffy”. It’s not good business practice to assume that you can easily and reliably access backup data on a disk drive from as recently as 6 years ago.
Reliability of Tape Backups: A survey of IT executives on tape backup solutions reported these findings:
- 75% of respondents indicated that their companies suffered unrecoverable loss of corporate data they thought was successfully backed up to tape due to unreadable, lost or stolen media.
- 63% said they encountered unreadable tapes when they tried to retrieve data, with 76% of those cases reporting a direct impact on their business, ranging from loss of productivity to penalties for regulatory compliance infractions.
Gartner Group estimated that 10 to 50 percent of all tape restores fail. Storage Magazine and Gartner reported that 34% of surveyed companies never test a restore from tape, and of those that do test, 77% experienced tape backup failures.
Every digital medium is vulnerable to some type of corrupting influence. Anyone who has participated in a large conversion from an older digital format to a more contemporary one knows the pitfalls. It is very likely that some (if not all) of the data will be corrupted and lost during the conversion process. And data conversion is very costly.
Trusted Long Term Archival Platform
Archiving your business critical data to microfilm is still the most dependable and trustworthy solution for long term records storage. Altering a roll of film is difficult and obvious. Anyone can see if a roll of film has been spliced. Archiving a single image to film costs about $0.03, and storing a roll of film costs about $1.50 per year.
A single roll of 16mm, 215’ microfilm can store more than 2 full bankers’ boxes of records. Properly filmed, processed and stored microfilm has a life expectancy (LE) of 500 years. And ultimately, you need only a flashlight to view the documents.
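To put the cost figures above in perspective, here is a rough sketch of the arithmetic. The per-image and per-roll prices are the ones quoted in this post; the images-per-roll figure (about 8,000 for a 16mm roll) and the function name `microfilm_cost` are assumptions made for illustration, and real pricing will vary by vendor.

```python
import math

# Figures quoted in this post; treat them as rough estimates.
COST_PER_IMAGE = 0.03          # dollars to film one image
STORAGE_PER_ROLL_YEAR = 1.50   # dollars to store one roll for one year
IMAGES_PER_ROLL = 8000         # assumption: roughly 8,000 images per 16mm roll


def microfilm_cost(images: int, years: int) -> dict:
    """Estimate filming cost plus storage cost over a retention period."""
    rolls = math.ceil(images / IMAGES_PER_ROLL)
    filming = images * COST_PER_IMAGE
    storage = rolls * STORAGE_PER_ROLL_YEAR * years
    return {
        "rolls": rolls,
        "filming": round(filming, 2),
        "storage": round(storage, 2),
        "total": round(filming + storage, 2),
    }


# Hypothetical project: one million images kept for 20 years.
estimate = microfilm_cost(images=1_000_000, years=20)
print(estimate)
```

Under these assumptions, a million images fit on 125 rolls, and the 20-year storage bill is small next to the one-time filming cost.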
Research company IDC last year forecast that 2010 would see the volume of digital data stored by everyone on the planet reach 1.2 zettabytes (1.2 billion terabytes), representing growth of 62 per cent over 2009. And by 2020, that volume will have grown by a factor of 44 to 35 trillion gigabytes.
Research from Gartner estimates that enterprise data capacity is growing by 40-60 per cent per year on average, not enough to cope with demand if IDC’s statistics prove accurate. Its survey of 1,004 companies in eight countries conducted in August 2010 identified data growth as the top data-centre challenge, followed by system performance and scalability, and network congestion and connectivity architecture.
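For readers who want to check the units and growth figures quoted above, the arithmetic can be sketched as follows. It uses only the IDC numbers cited in this post (1.2 zettabytes in 2010, 35 trillion gigabytes in 2020) and decimal units; the helper name `annual_growth_rate` is made up for the example.

```python
ZB_IN_TB = 1_000_000_000        # 1 zettabyte = 1 billion terabytes (decimal units)
ZB_IN_GB = 1_000_000_000_000    # 1 zettabyte = 1 trillion gigabytes

volume_2010_zb = 1.2
volume_2020_zb = 35_000_000_000_000 / ZB_IN_GB   # 35 trillion GB expressed in ZB


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end volume."""
    return (end / start) ** (1 / years) - 1


rate = annual_growth_rate(volume_2010_zb, volume_2020_zb, years=10)
print(f"2010 volume: {volume_2010_zb * ZB_IN_TB:,.0f} TB")
print(f"Implied growth 2010-2020: {rate:.1%} per year")
```

The implied rate is a simple compound average over the decade; it says nothing about how growth is distributed from year to year.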
The Washington Post article cited here further reinforces the predicted coming explosion in on-line data storage.
At the state and local level, our elected officials are responsible for managing and making available Public Records. While doing this, they must be aware that some of these Public Records contain Private Information. Think about the volume of land-related records your local county office deals with. Every transaction in which a piece of land or property changes ownership is followed by a voluminous collection of documentation. Title information, mortgage/loan information, historical information such as liens, court information and related documents all have some influence on the disposition of any piece of land or property.
Within these numerous documents can be found varying types and amounts of Personal Identifying Information (PII) such as social security numbers, dates of birth, credit and bank card numbers, driver’s license numbers, etc. Now, add the additional complexity that web access to these documents is now the de facto standard. The voting public and private industry simply demand electronic information exchange.
More and more county and state offices are adopting electronic technology to manage their core business operations. Because of this, the inter/intra office, departmental and division communication is becoming more and more electronic based. Document sharing is no longer sending volumes of paper documents, but allowing access via an on-line computer application.
So how does a state/local official comply with current statutes and laws to provide open access to Public Records without exposing individuals’ Personal Identifying Information (PII), especially via the web? And the question as to why they would not want to expose the PII should be rhetorical. Considering the current environment of identity theft we all exist within each day, information security is of paramount concern.
The answer is called Redaction:
Redaction is the process of covering over or blacking out specific information within a document. For hardcopy documents, this can be an overwhelming task. Each time a person comes into a government office to research and gather information and makes copies of Public Information for personal or private use, the office runs the risk that someone will walk out with someone else’s PII.
Or, when county offices receive record requests via snail mail, many offices today still deliver the information via hardcopy documents.
Example of legacy document hardcopy transaction:
- Original requested documents are located and photocopied;
- PII is redacted from photocopy using a marker of some type;
- The redacted copy is photocopied to eliminate the opportunity of bleed-through of original information;
- Original copy is re-filed, requested copy is mailed, and original photocopy is destroyed.
For day-forward electronic transactions:
Day-Forward Processing: Using an automated redaction software product
Documents can enter a local government office in a variety of ways:
- Public Access software (E-Recording) (Simplifile, Ingeo, etc…)
- Title Company
- Gov-to-Gov interchange
- Web Portal
As each new document is received in an office:
- The document is scanned or received electronically;
- Workflow driven process pushes image through an automated redaction product;
- Image comes up for manual human review and verification;
- Redacted image is electronically copied, creating a “public, redacted” version;
- The image follows normal workflow process until final document is verified and made available for view;
The original, non-redacted image remains intact while the redacted version is made available for public access.
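The workflow above can be sketched in a few lines. This is a simplified illustration, not how any particular redaction product works: commercial products operate on scanned images (typically after OCR), while this sketch works on plain text and looks only for Social Security numbers written as 123-45-6789. The names `RedactionResult` and `redact_ssns` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed pattern: U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class RedactionResult:
    original: str   # untouched source text, retained by the office
    public: str     # redacted copy intended for public access
    flagged: list   # matched strings queued for human review


def redact_ssns(text: str, mask: str = "[REDACTED]") -> RedactionResult:
    """Build a public, redacted copy while leaving the original intact."""
    flagged = SSN_PATTERN.findall(text)
    public = SSN_PATTERN.sub(mask, text)
    return RedactionResult(original=text, public=public, flagged=flagged)


result = redact_ssns("Grantor SSN 123-45-6789, parcel 42")
print(result.public)
print(result.flagged)
```

Keeping the list of flagged matches mirrors the manual review step: a person still verifies what the automated pass found before the public version is released.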
If you have legacy/historical documents that need to be redacted, you should check with your automated redaction software vendor to see what they suggest. If your legacy documents are still on paper or film, you will need to go through the exercise of digitizing these documents. See my posts on preparing for conversion projects for assistance.
If your documents are already digitized, then your vendor may be able to facilitate your office doing verification of the images they process through their automated redaction software. Your CAPEX would be minimized by doing the verification work in-house.
I would not suggest trying to process all of your legacy documents through an automated redaction software product in-house. Processing millions of images through redaction software is very processor heavy – your office would probably need to purchase expensive servers if you chose to do the entire project in-house.
Following the “glass half-empty” analogy, many have said that the document conversion and document management industry is nearing its end. “They” claim all the paper in the world has been scanned and every business has already purchased their fancy new EDMS (Electronic Document Management Software) and ECMS (Enterprise Content Management Software).
Well, not so fast. A recent survey by Eclipse Group, an international services firm providing document and content management solutions, found that an alarming number of businesses and organizations today are still heavily reliant upon paper-based business critical processes.
- 75% of companies are still heavily reliant on paper based invoicing processes
- 67% of respondents are currently sending the majority of their sales invoices by paper rather than electronically using a document management solution
- 75% of respondents are currently sending the majority of their purchase invoices by paper rather than using a document management system
- 83% currently have to re-type invoices received into their finance system
The survey, which includes the views of financial professionals from a variety of sectors including insurance, financial services, industrial and automotive, also found that 83% of respondents currently have to re-type invoices into their finance system upon receipt into the accounts departments rather than using a document management system, raising serious concerns over accuracy and efficiency.
Gary Waylett, CEO of Eclipse Group commented, “Given the efficient way most businesses are now able to share information, it is surprising to find so many finance departments are not using a document management solution and continue to re-key data between systems. In addition to duplicating effort, and hence adding cost, re-keying significantly adds the risk of errors which then complicates the reconciliation process.”
So take heart all of you software and services companies, business will be good for years to come.
The service of converting paper or microfilm documents to digital format is a commodity in the document conversion world. It seems that anyone can become a service bureau with an inexpensive scanner and rudimentary capture software. The problem is there is really so much more to scanning than meets the eye – and this doesn’t become apparent until you have paid someone to scan a million of your documents just to discover you can only access about 750,000 of them within your document management software. Oh, and this realization happens about a year after you have signed off on the project.
How will you ever know if the bureau actually scanned 100% of your images? How will you know if they delivered 100% of them to you? I was once part of a project where we had partnered with a service bureau to scan land records books from a major US county. During the process, the partner delivered 50,000 medical records images by mistake. Talk about a disaster – this is one of the worst I have ever experienced. Billing throughout the remainder of the project was a constant struggle. Delivery details, such as which images were delivered and how many, were never accurate. To help ensure a successful backfile project, include some type of pre-project checklist.
The following is a suggested minimum for this checklist:
- Pre-scan inventory
- Pilot process to establish image quality standards
- Indexing nomenclature and detail
- Error rating process (by image, record, index…)
- Batch delivery schedule including durations and volumes
- Reconciliation methodology to original inventory
- Review and error reporting process
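The reconciliation item on this checklist can be as simple as comparing each delivery against the pre-scan inventory. The sketch below assumes both are tracked as batch IDs mapped to image counts; the batch names, counts and the function name `reconcile` are invented for illustration.

```python
def reconcile(inventory: dict, delivered: dict) -> dict:
    """Compare delivered batches against the pre-scan inventory."""
    missing = sorted(set(inventory) - set(delivered))      # inventoried, never delivered
    unexpected = sorted(set(delivered) - set(inventory))   # delivered, never inventoried
    count_mismatch = {
        batch: (inventory[batch], delivered[batch])        # (expected, received)
        for batch in sorted(set(inventory) & set(delivered))
        if inventory[batch] != delivered[batch]
    }
    return {
        "missing": missing,
        "unexpected": unexpected,
        "count_mismatch": count_mismatch,
    }


# Hypothetical pre-scan inventory and vendor delivery manifest.
inventory = {"BOOK-001": 600, "BOOK-002": 580, "BOOK-003": 612}
delivered = {"BOOK-001": 600, "BOOK-002": 575, "MED-0001": 50}

report = reconcile(inventory, delivered)
print(report)
```

Running a check like this on every batch delivery, rather than once at the end, surfaces stray or short batches while they are still easy to trace.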
Backfile Scanning: Prior to beginning any type of backfile scanning project, you must determine one goal: “Ultimately, how do I need to use these digital images?” If you only need copies of your images on USBs or DVDs, then that is all you need to ask for. But if you actually intend to access these documents in their digital format within your document or content management system, then you will need to establish a way to ensure you are receiving what you are expecting.
Good Questions to ask your conversion vendor:
- How are we going to look-up these records digitally?
- The image quality is important to us – must we pay extra for image enhancement?
- Will we be able to do a full-text search within these records?
- How much space will these images take up on our servers?
- What image format do we need?
- Do we need additional software to search for and view the digital records or can we use what we have?
- How will the accuracy be calculated for this project?
- How long will the project take?
- How will we be allowed to review the work?
- What is your guarantee policy?
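On the accuracy question, one common approach is to agree on a review sample and compute accuracy separately for each level (image, index, and so on). The sketch below shows the basic calculation only; the sample sizes and error counts are made up, and your contract may define errors and sampling differently.

```python
def accuracy(checked: int, errors: int) -> float:
    """Share of reviewed items found to be error-free."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    if not 0 <= errors <= checked:
        raise ValueError("errors must be between 0 and checked")
    return 1 - errors / checked


# Hypothetical QC sample: (items reviewed, errors found) per level.
sample = {"image": (500, 4), "index": (500, 11)}
rates = {level: accuracy(c, e) for level, (c, e) in sample.items()}

for level, rate in rates.items():
    print(f"{level}: {rate:.1%}")
```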
It was during the 1980s that document management software first impacted the day-to-day business of local government. The ability to convert paper and microfilm images into digital images and then access them on a computer was almost revolutionary. Maybe it was… Initially, document management software vendors created closed architecture applications that relied upon proprietary (non-standard) image file types. Today we would consider standard image file types (not including Microsoft/Windows types) to be Group IV TIFF, PDF, JPEG, GIF, etc.
These early developers of document management products competed primarily against themselves for much of this government business. A vendor could produce a reliable product, distribute it heavily and charge just about whatever they wanted for annual support. Vendors locked government offices into long-term support contracts for what seemed like perpetuity.
However, many of these vendors could not or would not keep up with the rapid evolution of technology. The platforms they originally built upon and the development languages they used became obsolete. The document management world began to embrace more standard platforms and languages.
Government offices found themselves locked into early generation document management products that did not deliver as well as new technology. Support costs were high and performance, by comparison, was poor.
So here is where the really big problem came into view. As government offices were deciding to upgrade to new document management technology, they became aware that they could not use their legacy images. Why? Because the legacy images were in proprietary formats. These images could only be viewed in the original, now obsolete, software. Converting the proprietary images to standard image formats could only be done by the original vendor and, you guessed it, vendors were charging outrageous fees for these services. The fees required to convert legacy images often made the move to new technology cost prohibitive. Eventually, all of these dinosaurs became extinct, right…
Wrong – proprietary is back, and in a big way. There are multiple vendors in the local government market today selling document conversion services bundled with proprietary software. Deals are disguised as exceptionally inexpensive conversion services bundled with the vendor’s proprietary document viewing software. We know that price is always the predominant factor in purchasing at the local government level. However, unsuspecting government offices now find themselves in the same predicament as their predecessors 20 years ago. The upfront price to provide the document conversion services seems too good to be true – and it is. Just like in days past, when the government office needs to export its images out of this proprietary environment for other uses, the answer from the vendor is “NO, you cannot have your images”. These unsuspecting government customers are required to view their scanned images in the vendor’s software or not at all. To have use of the images in another document management system, you will need to pay to have the original documents scanned again.
How to avoid this trap?
Prior to signing any contract with a document conversion vendor, demand that you receive your images in a standard format that you can use in any system. This will deter most of the proprietary vagrants from trying to lock you and your department into an embarrassing mess.
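Once images start arriving, you can spot-check that they really are in a standard format by looking at the first few bytes of each file. The sketch below checks the well-known file signatures for TIFF, PDF, JPEG and GIF; the function names are hypothetical, and a matching signature only identifies the container (it does not confirm, for example, that a TIFF uses Group IV compression).

```python
from typing import Optional

# Leading "magic" bytes for the standard formats named in this post.
SIGNATURES = {
    "TIFF": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    "PDF": (b"%PDF-",),
    "JPEG": (b"\xff\xd8\xff",),
    "GIF": (b"GIF87a", b"GIF89a"),
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return the standard format a file header matches, or None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def check_file(path: str) -> Optional[str]:
    """Read the first bytes of a delivered file and identify its format."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))


print(sniff_format(b"%PDF-1.4"))        # PDF
print(sniff_format(b"\x00\x01vendor"))  # None: not a recognized standard format
```

A file that returns `None` here is worth raising with the vendor before you sign off on the batch.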
Have you been to a “Scanning Seminar” recently? You probably walked away believing that the document scanning was the most important part of any “conversion project”.
But then you visited with a consultant who greatly undervalued the importance of the scanning with a dismissive statement such as, “anybody can scan paper (or microfilm)”… He or she then explained that the crucial element of a document scanning project is the consulting and professional services to implement your project.
But wait; now you visit with a software salesperson. You are informed that buying the proper software will ensure a successful project no matter what type of scanning and/or professional services you employ.
Referencing the “Three Legged Stool” analogy, we can see that if any of these three elements fail to deliver, you will fall right upon your _ _ _. Experience will tell us that each of these elements is equally important. Each is dependent upon the other to ensure a successful project:
Scanning Service: Proven Quality Control and Project Tracking methodologies along with proper hardware and software are crucial in the success of your project. Determining the document capture configuration is entirely dependent upon the type and volume of your source documents. The software functionality used to do image clean-up is the most important in this selection. If you intend on OCR or automated forms processing, image quality is key to success.
Professional Services: These services should set the table for the project. Elements such as a pre-scan inventory, importing scanned images into your new software, pilot projects, project milestones, determining indexing nomenclature, network requirements, training and the other parts of the overall project implementation are the nexus between the software and the scanning.
Enterprise Document/Content Management Software: Of course software is always important. Your software selection must meet and exceed your current needs and provide scalability for the future. It is a cliché, but true: by working with both a good consultant and a good software vendor, you will get more of a 360-degree view of what you will get out of your new software. Initially, you should access the scanned images in your new software system much as you would look for these records in a standard file cabinet. Moving too many steps past this may lead to user confusion, a feeling of intimidation and a lack of user buy-in.
Optical Character Recognition – Per Wikipedia, OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. Applied to the appropriate document type and format, OCR processing is extremely useful and can save both internal resources and CAPEX along with producing a higher quality product than if done by hand-key entry.
Unfortunately, OCR is not for every project. Skewed text, rough text, heavy noise, lines and other foreign data interfering with a clear and uninterrupted view and scan of text will reduce accuracy.
OCR engines are very linear processes – they look horizontally and perpendicularly across digital images. Any skewing from a 90 degree orientation will negatively affect any OCR engine. Additionally, OCR engines are not magic but very pragmatic. Images must contain familiar text resembling existing alphabetical characters. Anything that distorts standard text will reduce accuracy.
The following are industry accepted steps used to increase OCR accuracy:
Deskew – A software process that, using various advanced algorithms, identifies the text orientation and attempts to align the image to a perfect 90 degrees.
Noise Reduction – Also known as despeckling – software process that will remove small imperfections, spots, scratches, blotches and random marks from within the white area in a digital image. Removing these imperfections will reduce OCR engine interference and reduce “false positive” reads.
Dilation/Erosion – Text quality is the key to OCR accuracy. These filters can smooth the edges of text by removing pixels that represent rough edges or by adding pixels to fill missing data within a character.
Line Removal – Specialty software can provide the functionality to remove lines from an image. Removing lines reduces OCR interference.
Red/Blue/Green Dropout – Using the proper settings, color scanners will “not capture” red, blue and green data within an image. Many times, pre-printed forms have the boxes and response areas printed in red, blue or green. This is done on purpose so that, during the scanning process, the box lines and response areas are not captured and there is less interference with the OCR engine.
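To make the noise reduction and dilation/erosion steps above more concrete, here is a toy sketch on a tiny black-and-white image stored as a grid of 0s and 1s. It is an illustration of the general idea only, using a simple 3x3 neighborhood; production image clean-up software uses far more sophisticated filters, and all names here are made up for the example.

```python
# A bitonal image as rows of pixels: 1 = black (ink), 0 = white (background).
Image = list[list[int]]


def _neighbors(img: Image, r: int, c: int):
    """Yield the values of the up-to-8 in-bounds pixels surrounding (r, c)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield img[r + dr][c + dc]


def despeckle(img: Image) -> Image:
    """Noise reduction: clear black pixels that have no black neighbor."""
    return [[1 if px and any(_neighbors(img, r, c)) else 0
             for c, px in enumerate(row)] for r, row in enumerate(img)]


def dilate(img: Image) -> Image:
    """Dilation: a pixel becomes black if it or any neighbor is black."""
    return [[1 if px or any(_neighbors(img, r, c)) else 0
             for c, px in enumerate(row)] for r, row in enumerate(img)]


def erode(img: Image) -> Image:
    """Erosion: a pixel stays black only if all in-bounds neighbors are black."""
    return [[1 if px and all(_neighbors(img, r, c)) else 0
             for c, px in enumerate(row)] for r, row in enumerate(img)]


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],   # lone speck in the corner
]
cleaned = despeckle(page)
for row in cleaned:
    print("".join("#" if px else "." for px in row))
```

In this sketch the lone corner pixel is removed while the 2x2 block of "text" is left alone. Dilation followed by erosion is a common way to close small gaps in characters, and erosion followed by dilation a common way to trim rough edges.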
The discussion on the viability of using microfilm or digital for long-term archiving rears its ugly head on a regular basis in courtrooms, boardrooms and offices for both government and private institutions alike.
From a legal perspective, microfilm is a supported format. The Best Evidence Rule (Federal Business Records Act, Uniform Photographic Copies of Business and Public Records as Evidence Act) states that these statutes permit the admissibility of any record which has been “kept in the regular course of business and copied or reproduced by … any photographic, photostatic, microfilm, microcard, miniature photographic or other process which accurately reproduces or forms a durable medium for reproducing the original.” Accordingly, the reproduction is as admissible as the original. The process of recording information optically clearly falls within the law’s language of “other process which accurately reproduces or forms a durable medium for reproducing the original.”
All US states have published document retention and library standards and micrographics adhere to just about every state’s standards for long-term document archiving. New York State, California, Texas, Indiana, Arizona, Louisiana and Florida Archives, just to name a few, promote microfilm as a viable and practical medium for preserving the state’s history.
Now, ask yourself this question: let’s say you have been left a trust in the value of $100,000,000.00. You must wait 20 years to have access to these funds. The money will be accessed via a 1,000 character code found on 200 separate documents (files). You will be provided with these documents on a USB drive, a CD, a DVD or a roll of microfilm. Which media would you choose? Backwards compatibility will always be a serious concern. What platform created the electronic copy? Was it Windows? Will this format be supported in 20 years? Will you need to do some type of conversion to your 20-year-old data to have access to your code? If you lose one single image, you will not be able to access these funds. Now, instead of a code for a trust, look at those documents as proof of purchase/ownership, human resource records, certification documents, medical research history, etc.
Advantages of Microfilm for Long Term Archiving:
- Properly filmed and processed microfilm on a polyester base has an anticipated life expectancy (LE 500) of 500 years.
- All you need to view film is a light source.
- Individual pages cannot be pulled or lost.
- Original rolls cannot be edited.
- A single roll of 16mm microfilm can hold over 8,000 images – that is almost an entire 3 drawer file cabinet of documents.