SRIIA TECHNOLOGIES

SCAN STORE RETRIEVE INDEX INTEGRATE ARCHIVE


Accessing digital documents in the future


“Notwithstanding all the benefits of digital document storage, how do you answer the sceptic who asks how you are going to guarantee that you can retrieve this document in ten or twenty years’ time?” – Michael Burgess, ML Burgess Consulting.

This question is something I address on a very regular basis. I spend a great deal of time working with state and municipal governments on their records management programs. In my opinion, the two main elements to address are media type and document type.

When choosing the media type, you need to know the document type and its corresponding retention policy. Long story short, for long-term archiving (10 years and longer), and to avoid the problem of technological backwards compatibility, you must use a simple, unalterable media format. Today, that answer is still microfilm.

I know many people bristle at the thought, maybe even laugh. Before you tune me out, ask yourself a few basic questions. How many times has your own digital environment failed? Servers and shares that everyone thought were on a scheduled back-up turn out not to be. Hardware failures, human error, non-standard or non-existent records policies and the like all contribute to failed and compromised archives. And be sure to clearly define, internally, the business difference between a back-up and an archive.

Microfilm is certainly not your best day-forward active data solution. For access and distribution, film is cumbersome compared to digital media. However, a roll of film, and especially microfiche, is not alterable. You cannot pull out a page or a file. And ultimately, all you need is light to view the data. You can write over 6,000 letter-sized pages to a standard 215′ roll of film at a cost of around $120.00. Properly stored microfilm has a Life Expectancy (LE) of over 500 years. And for long-term storage, a single roll costs around a dollar a year to store. By cost comparison, factoring in hardware, software and human intervention, digital technology comes nowhere near this. Try film, it works.
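The per-roll figures above are easy to turn into a per-page cost. The sketch below is only arithmetic on the numbers quoted in this post (about 6,000 pages and $120.00 to film a roll, about $1.00 per roll per year to store it); it assumes those rates stay constant, and the 10- and 20-year horizons simply echo the question at the top.

```python
# Figures quoted in this post, assumed constant over time for this sketch.
PAGES_PER_ROLL = 6_000          # letter-sized pages on a standard 215-foot roll
COST_TO_WRITE_ROLL = 120.00     # USD, one-time filming cost per roll
STORAGE_PER_ROLL_YEAR = 1.00    # USD per roll per year in storage


def cost_per_page(years: int) -> float:
    """Filming cost plus `years` of storage, spread across every page on the roll."""
    total = COST_TO_WRITE_ROLL + STORAGE_PER_ROLL_YEAR * years
    return total / PAGES_PER_ROLL


for years in (0, 10, 20):
    print(f"after {years:>2} years: ${cost_per_page(years):.4f} per page")
```

At these rates, filming comes to about two cents per page, and twenty years of storage adds only about a third of a cent more.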

Kevin Williams

Principal – SRIIA Technologies, Inc.

SRIIA Technologies Archiving Services

SOX is the Reason to Archive with Microfilm

Sarbanes-Oxley Act
Section 802

SEC. 802. CRIMINAL PENALTIES FOR ALTERING DOCUMENTS.

(a) IN GENERAL- Chapter 73 of title 18, United States Code, is amended by adding at the end the following:

“Sec. 1519. Destruction, alteration, or falsification of records in Federal investigations and bankruptcy.

“Whoever knowingly alters, destroys, mutilates, conceals, covers up, falsifies, or makes a false entry in any record, document, or tangible object with the intent to impede, obstruct, or influence the investigation or proper administration of any matter within the jurisdiction of any department or agency of the United States or any case filed under title 11, or in relation to or contemplation of any such matter or case, shall be fined under this title, imprisoned not more than 20 years, or both.

“Sec. 1520. Destruction of corporate audit records.

Most IT managers would tell us that the only way to archive your records is to back them up on tape or disk drives. When you consider the implications of lost data, and how long your firm or office is responsible for keeping much of its corporate records, your chosen long-term archival media system needs to be thoroughly evaluated.

Destruction, Alteration and Falsification

If you store legacy records on paper in files, how easy is it to misplace a single page, replace an existing page with a new one, or eliminate a file from a record completely (by some method of destruction, e.g. shredding)?

Corruptibility and Backwards Compatibility

While the idea of a digital back-up of your records seems safe enough, have you recently tried to open a CD or USB type media device that is 10 years old or older?

CDs and DVDs are still vulnerable to data loss due to decay and damage. USB-type portable devices can be corrupted during the importing/exporting process.

Technology continues to march forward at an ever-quickening pace. Platforms change every 2-4 years, which makes backwards compatibility very “iffy”. It’s not good business practice to assume that you can easily and reliably access back-up data on a disk drive from as recently as six years ago.

Reliability of Tape Backups

Gartner Group estimated that 10 to 50 percent of all tape restores fail. Storage Magazine and Gartner reported that 34% of surveyed companies never test a restore from tape, and of those that do test, 77% experienced tape backup failures.

A survey of IT executives on tape backup solutions reported these findings:

  • 75% of respondents indicated that their companies suffered unrecoverable loss of corporate data they thought was successfully backed up to tape, due to unreadable, lost or stolen media.
  • 63% said they encountered unreadable tapes when they tried to retrieve data, with 76% of those cases reporting a direct impact on their business, ranging from loss of productivity to penalties for regulatory compliance infractions.
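Since so many companies never test a restore, it may help to see how little a basic restore test involves. The sketch below is one possible approach, not a feature of any particular backup product: it records a SHA-256 fingerprint for every file before back-up, fingerprints the files again after a trial restore, and lists anything missing or altered. The directory paths in the usage comment are hypothetical placeholders.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_digests(source: dict[str, str], restored: dict[str, str]) -> list[str]:
    """List files that are missing from, or altered in, the restored copy."""
    problems = []
    for name in sorted(source):
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != source[name]:
            problems.append(f"corrupt: {name}")
    return problems


# Hypothetical usage after a trial restore (paths are placeholders):
# problems = compare_digests(digest_tree(Path("/records/live")),
#                            digest_tree(Path("/records/restore-test")))
```

An empty list means every original file came back bit-for-bit identical; anything else is a failed restore you found before you needed the data.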

Every digital medium is vulnerable to some type of corrupting influence. Anyone who has participated in a large conversion from an older digital format to a more contemporary one knows the pitfalls. It is very likely that some (if not all) of the data will be corrupted or lost during the conversion process. And data conversion is very costly.

Trusted Long Term Archival Platform

Archiving your business-critical data to microfilm is still the most dependable and trustworthy solution for long-term records storage. Altering a roll of film is difficult and obvious: anyone can see if a roll of film has been spliced. Archiving a single image to film costs about $0.03, and storing a roll of film costs about $1.50 per year.

A single roll of 16mm, 215’ microfilm can store more than 2 full bankers’ boxes of records. Properly filmed, processed and stored microfilm has a life expectancy (LE) of 500 years. And ultimately, you need only a flashlight to view the documents.

Explosion of Digital Storage Worldwide

Research company IDC last year forecast that 2010 would see the volume of digital data stored by everyone on the planet reach 1.2 zettabytes (1.2 billion terabytes), representing growth of 62 per cent over 2009. And by 2020, that volume will have grown by a factor of 44 to 35 trillion gigabytes.

Read more: http://www.computing.co.uk/ctg/feature/2075349/overload-averted-essential-guide-storage-solutions#ixzz1OD4sE1It

Research from Gartner estimates that data capacity in enterprises is growing by 40-60 per cent per year on average, not enough to cope with demand if IDC’s statistics prove accurate. Its survey of 1,004 companies in eight countries, conducted in August 2010, identified data growth as the top data-centre challenge, followed by system performance and scalability, and network congestion and connectivity architecture.
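Taking IDC’s 2010 and 2020 figures at face value, a few lines of arithmetic confirm the unit conversions in the quote and show the compound annual growth rate they imply. This sketch uses decimal (SI) units, which is an assumption about how the forecast was stated.

```python
# Decimal (SI) storage units: 1 ZB = 10**21 bytes, 1 TB = 10**12, 1 GB = 10**9.
ZB, TB, GB = 10**21, 10**12, 10**9

# IDC figures quoted above: about 1.2 ZB stored in 2010, 35 ZB forecast for 2020.
start_bytes = 1.2 * ZB
end_bytes = 35 * ZB
years = 2020 - 2010

# Check the conversions used in the quote.
print(f"2010: {start_bytes / TB:,.0f} TB")   # expressed in terabytes
print(f"2020: {end_bytes / GB:,.0f} GB")     # expressed in gigabytes

# Compound annual growth rate implied by the two endpoints.
cagr = (end_bytes / start_bytes) ** (1 / years) - 1
print(f"Implied growth 2010-2020: {cagr:.1%} per year")
```

Growing from 1.2 to 35 zettabytes in ten years works out to roughly 40 per cent a year, compounded.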

The Washington Post article cited below further reinforces the predicted coming explosion in online data storage:

Exabytes: Documenting the ‘digital age’ and huge growth in computing capacity

How Redaction works for the Public Record (Recorders, Registers, Clerks, etc…)

At the state and local level, our elected officials are responsible for managing Public Records and making them available. While doing this, they must be aware that some of these Public Records contain Private Information. Think about the volume of land-related records your local county office deals with. Every time a piece of land or property changes ownership, a voluminous collection of documentation follows. Title information, mortgage/loan information, historical information such as liens, court information and related documents all have some influence on the disposition of any piece of land or property.

Within these numerous documents can be found varying types and amounts of Personal Identifying Information (PII), such as social security numbers, dates of birth, credit and bank card numbers, driver’s license numbers and so on. Now add the further complexity that web access to these documents has become the de facto standard. The voting public and private industry simply demand electronic information exchange.

More and more county and state offices are adopting electronic technology to manage their core business operations. As a result, communication within and between offices, departments and divisions is increasingly electronic. Document sharing no longer means sending volumes of paper documents, but allowing access via an online computer application.

So how does a state or local official comply with current statutes and laws requiring open access to Public Records without exposing individuals’ Personal Identifying Information (PII), especially via the web? The question of why they would not want to expose the PII should be rhetorical. Given the identity theft risks we all live with each day, information security is of paramount concern.

The answer is called Redaction:

Redaction is the process of covering over or blacking out specific information within a document. For hardcopy documents, this can be an overwhelming task. Each time a person comes into a government office to research and gather information and makes copies of Public Information for personal or private use, the office runs the risk that someone will walk out with someone else’s PII.

Or, when county offices receive record requests via snail mail, many offices today still deliver the information via hardcopy documents.

Example of a legacy hardcopy document transaction:

  1. The original requested documents are located and photocopied;
  2. PII is redacted from the photocopy using a marker of some type;
  3. The redacted copy is photocopied again to eliminate any bleed-through of the original information;
  4. The original document is re-filed, the clean second copy is mailed, and the marked-up first photocopy is destroyed.

For day-forward electronic transactions:

Day-Forward Processing: Using an automated redaction software product

Documents can enter a local government office in a variety of ways:

  • Public Access software (E-Recording) (Simplifile, Ingeo, etc…)
  • Fax
  • Title Company
  • Gov-to-Gov interchange
  • Web Portal
  • Other

As each new document is received in an office:

  1. The document is scanned or received electronically;
  2. A workflow-driven process pushes the image through an automated redaction product;
  3. The image comes up for manual human review and verification;
  4. The redacted image is electronically copied, creating a “public, redacted” version;
  5. The image follows the normal workflow process until the final document is verified and made available for viewing.

The original, non-redacted image remains intact while the redacted version is made available for public access.
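The day-forward workflow above can be sketched in a few lines. This is a hypothetical illustration, not how any commercial redaction product works: it assumes OCR text stands in for the image, uses two simple regular expressions (social security numbers and 13-16 digit card numbers) as the “automated redaction” step, and models the human reviewer as a callback. The key point it shows is that the original record is never modified; only a separate public copy is produced, and only after review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical detection rules for illustration only; commercial redaction
# products use far more thorough detection than two regular expressions.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),  # 13-16 digit card numbers
}


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str             # assumption: OCR text standing in for the image
    public: bool = False


def auto_redact(doc: Document) -> tuple[Document, list[str]]:
    """Step 2: mask PII matches; return the masked copy and what was found."""
    text, hits = doc.text, []
    for label, pattern in PII_PATTERNS.items():
        hits.extend(f"{label}: {m.group()}" for m in pattern.finditer(text))
        text = pattern.sub("[REDACTED]", text)
    return replace(doc, text=text), hits


def publish(original: Document,
            reviewer_approves: Callable[[Document, list[str]], bool]) -> Document | None:
    """Steps 2-5: redact, route to a human reviewer, then release a public copy.

    The original is never modified, so the non-redacted record stays intact.
    """
    redacted, hits = auto_redact(original)
    if not reviewer_approves(redacted, hits):   # step 3: manual verification
        return None                             # rejected: back to the queue
    return replace(redacted, doc_id=f"{original.doc_id}-public", public=True)
```

Because simple patterns will both miss PII and flag harmless numbers (a long recording number can look like a card number), the human review step is not optional.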

If you have legacy/historical documents that need to be redacted, you should check with your automated redaction software vendor to see what they suggest. If your legacy documents are still on paper or film, you will need to go through the exercise of digitizing these documents. See my posts on preparing for conversion projects for assistance.

If your documents are already digitized, your vendor may be able to let your office verify the images they process through their automated redaction software. Doing the verification work in-house would minimize your CAPEX.

I would not suggest trying to process all of your legacy documents through an automated redaction software product in-house. Processing millions of images through redaction software is very processor heavy – your office would probably need to purchase expensive servers if you chose to do the entire project in-house.

Is Document Management Still a Viable Market?

Following the “glass half-empty” analogy, many have said that the document conversion and document management industry is nearing its end. “They” claim all the paper in the world has been scanned and every business has already purchased their fancy new EDMS (Electronic Document Management Software) and ECMS (Enterprise Content Management Software).

Well, not so fast. A recent survey by Eclipse Group, an international services firm providing document and content management solutions, found that an alarming number of businesses and organizations today are still heavily reliant on paper-based, business-critical processes.

Key findings:

  • 75% of companies are still heavily reliant on paper based invoicing processes
  • 67% of respondents are currently sending the majority of their sales invoices by paper rather than electronically using a document management solution
  • 75% of respondents are currently sending the majority of their purchase invoices by paper rather than using a document management system
  • 83% currently have to re-type invoices received into their finance system

The survey, which includes the views of financial professionals from a variety of sectors including insurance, financial services, industrial and automotive, also found that 83% of respondents currently have to re-type invoices into their finance system upon receipt into the accounts departments rather than using a document management system, raising serious concerns over accuracy and efficiency.

Gary Waylett, CEO of Eclipse Group commented, “Given the efficient way most businesses are now able to share information, it is surprising to find so many finance departments are not using a document management solution and continue to re-key data between systems. In addition to duplicating effort, and hence adding cost, re-keying significantly adds the risk of errors which then complicates the reconciliation process.”

So take heart, all of you software and services companies: business will be good for years to come.

Seven Steps Towards A Successful Document Conversion Project

The service of converting paper or microfilm documents to digital format is a commodity in the document conversion world. It seems that anyone can become a service bureau with an inexpensive scanner and rudimentary capture software. The problem is there is really so much more to scanning than meets the eye – and this doesn’t become apparent until you have paid someone to scan a million of your documents just to discover you can only access about 750,000 of them within your document management software. Oh, and this realization happens about a year after you have signed off on the project.
How will you ever know if the bureau actually scanned 100% of your images? How will you know if they delivered 100% of them to you? I was once part of a project where we had partnered with a service bureau to scan land records books from a major US county. During the process, the partner delivered 50,000 medical records images by mistake. Talk about a disaster; this is one of the worst I have ever experienced. Billing throughout the remainder of the project was a constant struggle. Delivery details, such as which images were delivered and how many, were never accurate. To help ensure a successful backfile project, include some type of pre-project checklist.

The following is a suggested minimum for this checklist:

  1. Pre-scan inventory
  2. Pilot process to establish image quality standards
  3. Indexing nomenclature and detail
  4. Error rating process (by image, record, index…)
  5. Batch delivery schedule including durations and volumes
  6. Reconciliation methodology to original inventory
  7. Review and error reporting process
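Items 1, 4 and 6 on this checklist (pre-scan inventory, error rating and reconciliation) are what would have caught the medical-records mix-up described above. A minimal sketch, assuming the inventory and each delivery can both be reduced to a mapping from a record identifier (such as a book or folder number, hypothetical here) to an image count:

```python
from __future__ import annotations


def reconcile(inventory: dict[str, int], delivered: dict[str, int]) -> dict[str, list[str]]:
    """Compare delivered image counts per record against the pre-scan inventory."""
    expected, received = set(inventory), set(delivered)
    return {
        # In the inventory but never delivered.
        "missing": sorted(expected - received),
        # Delivered but never inventoried (e.g. someone else's records).
        "unexpected": sorted(received - expected),
        # Delivered, but with a different number of images than inventoried.
        "count_mismatch": sorted(
            rec for rec in expected & received if inventory[rec] != delivered[rec]
        ),
    }


def error_rate(units_checked: int, units_with_errors: int) -> float:
    """Error rate for a review sample (by image, record or index field)."""
    if units_checked <= 0:
        raise ValueError("units_checked must be positive")
    return units_with_errors / units_checked
```

Running the reconciliation on every batch, rather than once at the end, keeps billing and delivery counts honest throughout the project.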

10 Questions to Ask Before Starting Your Backfile Scanning Project

Backfile Scanning: Prior to beginning any type of backfile scanning project, you must determine one goal: “Ultimately, how do I need to use these digital images?” If you only need copies of your images on USB drives or DVDs, then that is all you need to ask for. But if you actually intend to access these documents in their digital format within your document or content management system, then you will need to establish a way to ensure you are receiving what you expect.

Good questions to ask your conversion vendor:

  1. How are we going to look up these records digitally?
  2. The image quality is important to us – must we pay extra for image enhancement?
  3. Will we be able to do a full-text search within these records?
  4. How much space will these images take up on our servers?
  5. What image format do we need?
  6. Do we need additional software to search for and view the digital records, or can we use what we have?
  7. How will the accuracy be calculated for this project?
  8. How long will the project take?
  9. How will we be allowed to review the work?
  10. What is your guarantee policy?
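Question 4 is mostly arithmetic: pages × average file size × number of copies kept. In the sketch below, every input is a placeholder assumption (one million pages, 50 KB per bitonal image, a second copy for back-up); measure the real average file size on your pilot-project images before relying on the result.

```python
# All inputs are hypothetical placeholders; substitute your own page count and
# the average file size measured on pilot-project images.
def storage_estimate_gb(pages: int, avg_kb_per_image: float, copies: int = 1) -> float:
    """Approximate storage in gigabytes (decimal units: 1 GB = 1,000,000 KB)."""
    return pages * avg_kb_per_image * copies / 1_000_000


# Example: one million pages at an assumed 50 KB per image, kept in two copies.
print(f"{storage_estimate_gb(1_000_000, 50, copies=2):.0f} GB")
```

Colour or greyscale scans can be many times larger than bitonal images, which is why the pilot measurement matters more than the formula.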

Government Software Buyer Beware – Avoid Proprietary

This might be better stated as “beware of proprietary image file types”. 

It was during the 1980s that document management software first impacted the day-to-day business of local government. The ability to convert paper and microfilm images into digital images and then access them on a computer was almost revolutionary. Maybe it was… Initially, document management software vendors created closed-architecture applications that relied upon proprietary (non-standard) image file types. Today we would consider the standard image file types (not including Microsoft/Windows-specific types) to be Group IV TIFF, PDF, JPEG, GIF, etc.

These early developers of document management products competed primarily among themselves for much of this government business. A vendor could produce a reliable product, distribute it heavily and charge just about whatever they wanted for annual support. Vendors locked government offices into long-term support contracts for what seemed like perpetuity.

However, many of these vendors could not or would not keep up with the rapid evolution of technology. The platforms they originally built upon and the development languages they used became obsolete. The document management world began to embrace more standard platforms and languages.

Government offices found themselves locked into early generation document management products that did not deliver as well as new technology. Support costs were high and performance, by comparison, was poor.

So here is where the really big problem came into view. As government offices were deciding to upgrade to new document management technology, they became aware that they could not use their legacy images. Why? Because the legacy images were in proprietary formats. These images could only be viewed in the original, now obsolete, software. Converting the proprietary images to standard image formats could only be done by the original vendor, and, you guessed it, vendors were charging outrageous fees for these services. The fees required to convert the legacy images often made the move to new technology cost prohibitive. Eventually, all of these dinosaurs became extinct, right…

Wrong – Proprietary is back, and in a big way. There are multiple vendors in the local government market today selling document conversion services bundled with proprietary software. The deals are disguised as exceptionally inexpensive conversion services bundled with the vendor’s proprietary document viewing software. We know that price is always the predominant factor in purchasing at the local government level. However, unsuspecting government offices now find themselves in the same predicament as their predecessors 20 years ago. The upfront price for the document conversion services seems too good to be true, and it is. Just like in days past, when the government office needs to export its images out of this proprietary environment for other uses, the answer from the vendor is “NO, you cannot have your images.” These unsuspecting government customers are required to view their scanned images in the vendor’s software or not at all. To use the images in another document management system, you will need to pay to have the original documents scanned again.

How to avoid this trap?               

Prior to signing any contract with a document conversion vendor, demand that you receive your images in a standard format that you can use in any system. This will deter most of the proprietary vagrants from trying to lock you and your department into an embarrassing mess.
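Once images arrive, you can spot-check that a delivery really is in standard formats. A minimal sketch, assuming you only want a first-pass screen: it looks at each file’s leading “magic” bytes for the open formats named above (TIFF, PDF, JPEG, GIF) and flags anything else. It does not prove that a TIFF uses Group IV compression or that a file opens cleanly; that still needs a proper viewer or validator.

```python
from __future__ import annotations

from pathlib import Path

# Leading "magic" bytes of the open formats named above.
SIGNATURES = {
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"%PDF-": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def identify_bytes(head: bytes) -> str | None:
    """Name the format whose signature starts `head`, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def nonstandard_files(delivery_dir: Path) -> list[Path]:
    """List delivered files whose first bytes match none of the signatures."""
    flagged = []
    for path in sorted(delivery_dir.rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                if identify_bytes(fh.read(8)) is None:
                    flagged.append(path)
    return flagged
```

Any file this flags is worth a pointed question to the vendor before final sign-off.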

Beware…

Document Scanning and the Three Legged Stool

Have you been to a “Scanning Seminar” recently? You probably walked away believing that document scanning was the most important part of any “conversion project”.

But then you visited with a consultant who greatly undervalued the importance of the scanning with a dismissive statement such as, “anybody can scan paper (or microfilm)”… He or she then explained that the crucial element of a document scanning project is the consulting and professional services to implement your project.

But wait; now you visit with a software salesperson. You are informed that buying the proper software will ensure a successful project no matter what type of scanning and/or professional services you employ.

Referencing the “Three Legged Stool” analogy, we can see that if any of these three elements fails to deliver, you will fall right upon your _ _ _. Experience tells us that each of these elements is equally important. Each depends upon the others to ensure a successful project:

Scanning Service: Proven Quality Control and Project Tracking methodologies, along with proper hardware and software, are crucial to the success of your project. The document capture configuration depends entirely upon the type and volume of your source documents. The software functionality used for image clean-up is the most important factor in this selection. If you intend to use OCR or automated forms processing, image quality is key to success.

Professional Services: These services should set the table for the project. Elements such as a pre-scan inventory, importing scanned images into your new software, pilot projects, project milestones, determining indexing nomenclature, network requirements and training, along with the other elements of the overall project implementation, form the nexus between the software and the scanning.

Enterprise Document/Content Management Software: Of course, software is always important. Your software selection must meet and exceed your current needs and provide scalability for the future. Very cliché, but truthful: by working with both a good consultant and a good software vendor, you will get more of a 360-degree view of what you will get out of your new software. Initially, you should be able to access the scanned images in your new software system much as you would look for these records in a standard file cabinet. Moving too many steps past this may lead to user confusion, a feeling of intimidation and a lack of user buy-in.

Preparing for OCR

Optical Character Recognition – Per Wikipedia, OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. Applied to the appropriate document type and format, OCR processing is extremely useful and can save both internal resources and CAPEX along with producing a higher quality product than if done by hand-key entry.

Unfortunately, OCR is not for every project. Skewed text, rough text, heavy noise, lines and other foreign data interfering with a clear and uninterrupted view and scan of text will reduce accuracy.

OCR engines are very linear processes: they look horizontally and vertically across digital images. Any skew away from a 90-degree orientation will negatively affect any OCR engine. Additionally, OCR engines are not magic but very pragmatic. Images must contain familiar text resembling existing alphabetical characters. Anything that distorts standard text will reduce accuracy.

The following are industry accepted steps used to increase OCR accuracy:

Deskew – A software process that uses various advanced algorithms to identify the text orientation and attempt to align the image to a perfect 90 degrees.
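One common way to detect skew is a projection-profile search; commercial products may use this or other algorithms. The sketch below, written in plain Python for readability, assumes the page has already been reduced to a list of (x, y) ink-pixel coordinates. For each candidate angle it projects the pixels onto rows as if that skew had been undone; when the angle is right, each text line collapses into very few rows, so the angle with the most sharply peaked row histogram wins. The actual rotation of the image is left to an imaging library.

```python
import math


def estimate_skew(black_pixels, max_angle=5.0, step=0.5):
    """Estimate text skew (degrees) with a projection-profile search.

    black_pixels: (x, y) coordinates of ink pixels, with y growing downward.
    A positive result means text lines drift downward from left to right.
    """
    pixels = list(black_pixels)
    n = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    # Try small corrections first so ties favour the least rotation.
    for i in sorted(range(-n, n + 1), key=abs):
        angle = i * step
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        profile = {}
        for x, y in pixels:
            # Row index of this pixel after undoing a skew of `angle` degrees.
            row = round(y * cos_t - x * sin_t)
            profile[row] = profile.get(row, 0) + 1
        # Well-aligned text piles into few rows: maximise the sum of squares.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A finer `step` gives a more precise angle at the cost of more passes over the pixels.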

Noise Reduction – Also known as despeckling, a software process that removes small imperfections, spots, scratches, blotches and random marks from the white areas of a digital image. Removing these imperfections reduces OCR engine interference and reduces “false positive” reads.
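A simple form of despeckling removes every connected blob of ink smaller than a size threshold. The sketch below assumes a bitonal image stored as rows of 0 (paper) and 1 (ink) and uses 8-connectivity; it illustrates the idea rather than any product’s filter, and the `min_size` default is an arbitrary assumption.

```python
from collections import deque


def despeckle(image, min_size=3):
    """Remove 8-connected groups of ink pixels smaller than `min_size`.

    image: list of rows of 0 (white paper) / 1 (black ink). A new image is
    returned; the input is left untouched.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    out = [row[:] for row in image]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill gathers one connected speck or stroke.
            component, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0   # too small to be text: paint it white
    return out
```

The threshold has to be tuned to the scan resolution: set it too high and periods, commas and the dots on i’s disappear along with the noise.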

Dilation/Erosion – Text quality is the key to OCR accuracy. These filters can smooth the edges of text by removing pixels that represent rough edges (erosion) or adding pixels to fill in missing data within a character (dilation).
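Here is a minimal sketch of the two filters, assuming a bitonal image as rows of 0/1, a 3×3 neighbourhood, and pixels beyond the page edge treated as white. Combining them is standard practice: dilation followed by erosion (“closing”) fills small breaks inside strokes, and erosion followed by dilation (“opening”) strips thin spurs and specks from stroke edges.

```python
def _window(image, y, x):
    """Yield the 3x3 neighbourhood of (y, x); off-page pixels count as white (0)."""
    h, w = len(image), len(image[0])
    for ny in (y - 1, y, y + 1):
        for nx in (x - 1, x, x + 1):
            yield image[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def dilate(image):
    """A pixel becomes ink if any pixel in its 3x3 neighbourhood is ink."""
    return [[1 if any(_window(image, y, x)) else 0
             for x in range(len(image[0]))] for y in range(len(image))]


def erode(image):
    """A pixel stays ink only if every pixel in its 3x3 neighbourhood is ink."""
    return [[1 if all(_window(image, y, x)) else 0
             for x in range(len(image[0]))] for y in range(len(image))]


def close_gaps(image):
    """Dilation then erosion ("closing"): fills small breaks inside strokes."""
    return erode(dilate(image))


def remove_burrs(image):
    """Erosion then dilation ("opening"): strips thin spurs from stroke edges."""
    return dilate(erode(image))
```

With a 3×3 neighbourhood, opening also deletes any stroke thinner than three pixels, so on light or low-resolution text it can do more harm than good; like the other clean-up filters, it needs to be tuned during the pilot.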

Line Removal – Specialty software can provide the functionality to remove lines from an image. Removing lines reduces OCR interference.
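A simple line-removal approach relies on the fact that characters rarely produce long unbroken runs of ink, while ruled lines and form boxes do. The sketch below assumes a bitonal image as rows of 0/1 and whitens any horizontal run of at least `min_run` pixels (an arbitrary default); vertical lines are handled by transposing the image. Unlike specialty products, it makes no attempt to repair characters that a removed line passed through.

```python
def remove_horizontal_lines(image, min_run=20):
    """Whiten horizontal runs of ink at least `min_run` pixels long."""
    out = [row[:] for row in image]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] != 1:
                x += 1
                continue
            start = x
            while x < len(row) and row[x] == 1:
                x += 1
            if x - start >= min_run:          # long run: treat it as a rule line
                row[start:x] = [0] * (x - start)
    return out


def remove_vertical_lines(image, min_run=20):
    """Same idea applied to columns: transpose, remove, transpose back."""
    transposed = [list(col) for col in zip(*image)]
    cleaned = remove_horizontal_lines(transposed, min_run)
    return [list(row) for row in zip(*cleaned)]
```

`min_run` should comfortably exceed the widest character stroke at your scan resolution, or long strokes such as underscores and dashes will vanish too.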

Red/Blue/Green Dropout – Using the proper settings, color scanners will “not capture” red, blue or green data within an image. Many times, pre-printed forms have the boxes and response areas printed in red, blue or green. This is done on purpose, so that during the scanning process the box lines and response areas are not captured, and thus cause less interference with the OCR engine.
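Scanners normally perform dropout during capture, but the effect can be imitated in software on colour pixels to show what it does. In this sketch, which is an illustration rather than how any scanner implements it, a pixel counts as the dropout colour when the chosen channel exceeds both other channels by at least `margin`, and is treated as paper. Every other pixel is converted to ink or paper by average brightness. Both thresholds are illustrative assumptions.

```python
def drop_out_color(rgb_image, channel="red", margin=60, ink_threshold=128):
    """Convert an RGB image to bitonal ink (1) / paper (0), dropping one colour.

    rgb_image: list of rows of (r, g, b) tuples with values 0-255.
    """
    index = {"red": 0, "green": 1, "blue": 2}[channel]
    out = []
    for row in rgb_image:
        new_row = []
        for pixel in row:
            others = [v for i, v in enumerate(pixel) if i != index]
            if all(pixel[index] - v >= margin for v in others):
                new_row.append(0)   # pre-printed form colour: drop it as paper
            else:
                # Everything else: dark enough counts as ink.
                new_row.append(1 if sum(pixel) / 3 < ink_threshold else 0)
        out.append(new_row)
    return out
```

The filled-in response stays on the page only if it was written in a different colour from the form, which is why these forms ask for black or blue ink.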
