Friday, July 8, 2011

Document Capture and Scanning Planning - Part 2

Document Examination and Separation


One of the key steps in preparing for document scanning and capture is to identify how you will separate or split documents.  What is separation and how does it work?  Details below:

For those of you who are new to document management and capture, document separation is how we determine where a document begins and ends.  With most simple scanning software, this process is easy.  You load a single document in the feeder, click scan, and when it is done, you name it and save it.  With advanced capture, you can load multiple documents into the feeder, scan them all at once, and use a separation method to split them into individual digital documents.  This is a massive time saver.  Imagine the alternative: loading 20 individual documents into a scanner one at a time, scanning each individually, and then entering information about each.  Below are some key separation methods any advanced capture suite should have:

Fixed Page Count Separation – This allows you to split based on a certain page count.  So if you scan a 100-page stack of two-page forms, you will have 50 separate documents in your capture interface.
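To make the arithmetic concrete, here is a minimal Python sketch of fixed page count separation. It is only an illustration, not how any particular capture product does it: the function name `split_by_page_count` and the page labels are made up for the example, and it assumes the stack holds 100 pages in total.

```python
def split_by_page_count(pages, pages_per_document):
    """Group a flat list of scanned pages into documents of a fixed length."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[start:start + pages_per_document]
        for start in range(0, len(pages), pages_per_document)
    ]


# Hypothetical stack: 100 scanned pages made up of two-page forms.
pages = [f"page-{n:03d}" for n in range(1, 101)]
documents = split_by_page_count(pages, 2)

print(len(documents))  # 50
print(documents[0])    # ['page-001', 'page-002']
```

Note that a missing or extra page shifts every document after it, which is why this method only suits stacks where every form really has the same length.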

Barcode Separation – Probably the most pervasive separation method is a barcode separator.  Place a sheet with a specific barcode pattern between each document, and you are off to the races.  To give you the most flexibility, applications should support the following enhanced barcode separation methods:

  • Separate on any barcode
  • Separate on specific barcode terms and patterns
  • Separate on barcode type
  • Separate on barcode count
  • Separate on a certain number of barcodes on a page
  • Separate when a barcode changes

You want to make sure your barcode engine supports 1D and 2D barcodes without the purchase of any expensive modules or add-ons, and it should also have a simple feature that lets you split 2D barcodes and identify separation terms.
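Two of the barcode methods above, separating on a specific barcode pattern and separating when a barcode changes, can be sketched in a few lines of Python. This is a rough illustration under some assumptions: the barcode values are taken as already decoded by whatever barcode engine you use, and the `Page` class, the function names, and the sample values ("SEP", "INV-1001") are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    # Values assumed to be already decoded by some barcode engine.
    barcodes: list = field(default_factory=list)


def split_on_separator_sheets(pages, pattern):
    """Start a new document at each sheet whose barcode matches the pattern.

    The separator sheet itself is dropped from the output.
    """
    matcher = re.compile(pattern)
    documents, current = [], []
    for page in pages:
        if any(matcher.fullmatch(value) for value in page.barcodes):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page.name)
    if current:
        documents.append(current)
    return documents


def split_on_barcode_change(pages):
    """Start a new document whenever a page carries a different barcode value."""
    documents, last_value = [], None
    for page in pages:
        value = page.barcodes[0] if page.barcodes else None
        if value is not None and value != last_value:
            documents.append([])
            last_value = value
        if not documents:
            documents.append([])  # pages that arrive before the first barcode
        documents[-1].append(page.name)
    return documents


stack = [Page("sep-1", ["SEP"]), Page("a1"), Page("a2"),
         Page("sep-2", ["SEP"]), Page("b1")]
print(split_on_separator_sheets(stack, r"SEP"))
# [['a1', 'a2'], ['b1']]

labelled = [Page("p1", ["INV-1001"]), Page("p2"),
            Page("p3", ["INV-1001"]), Page("p4", ["INV-1002"])]
print(split_on_barcode_change(labelled))
# [['p1', 'p2', 'p3'], ['p4']]
```

The second function is the one that saves paper: the barcode lives on the document itself, so no separator sheets are needed.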

Patch Code Separation – So what the heck is a patch code?  Just an old-school horizontal barcode.  If you work in the medical field, most medical billing forms will have these on them, and some scanners actually support using patch codes to shift scanner settings during the scanning process.  For flexibility, choose an application that supports patch code separation.

Optical Character Recognition (OCR) Separation – OCR is the process of converting a scanned or imported image into searchable text.  OCR separation searches for a keyword, term, or phrase on the document and treats the page where it is found as the first page of a new document.  This is a preferred method, as you don’t have to kill trees to print cover sheets, and it makes document preparation simple (no inserting separator sheets).  For example, if you are scanning contracts and you want to split when you find an 8-digit contract number in the right-hand corner, this comes in very handy.  Your application needs several key features to make sure you get high separation accuracy:


  • Scan at 200 or 300 DPI and use an app that has image processing software to clean up the page.  Also, your image processing engine must allow processing of imported PDFs and TIFFs if you plan to harvest documents.  Some image correction/processing engines only work with scanners.
  • Ensure your capture application allows you to use expression matching (regular expressions) so you have the utmost flexibility in finding separation patterns.
  • Character sets are key.  These provide the ability to tell the OCR engine the type of characters you are looking for (A-Z, 0-9, etc), so if it misidentifies a character, it auto-corrects the information.
  • Finally, top-line applications also allow you to separate when OCR terms change.  So you can look for that contract number, and only split when you find a new one.

Intelligent Character Recognition (ICR) Separation – ICR is the process of converting scanned images of hand printing to text.  This method can be utilized to split pages when certain patterns in hand printing are detected.  Note:  all of the features required to ensure accuracy for OCR separation should also be considered if you utilize this method.
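Here is a small Python sketch of how the OCR pieces above (expression matching, character sets, and splitting only when the term changes) might fit together for the contract number example. Treat it as an illustration only: it assumes the page text has already come out of an OCR or ICR engine, it ignores where on the page the number sits, and the names `find_contract_number` and `split_on_contract_change`, the correction table, and the sample text are all made up for the example.

```python
import re

# Character-set idea: the field is known to hold digits only, so letters
# that OCR commonly confuses with digits are mapped back to digits.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Candidate tokens: exactly 8 characters, digits or digit look-alikes.
CANDIDATE = re.compile(r"\b[0-9OoIlSB]{8}\b")


def find_contract_number(page_text):
    """Return the first 8-digit contract number on the page, or None."""
    for token in CANDIDATE.findall(page_text):
        # Require at least one real digit so ordinary words are skipped.
        if not any(ch.isdigit() for ch in token):
            continue
        candidate = token.translate(DIGIT_FIXES)
        if re.fullmatch(r"\d{8}", candidate):
            return candidate
    return None


def split_on_contract_change(page_texts):
    """Open a new document only when a different contract number appears."""
    documents, last_number = [], None
    for index, text in enumerate(page_texts):
        number = find_contract_number(text)
        if number is not None and number != last_number:
            documents.append({"contract": number, "pages": []})
            last_number = number
        if not documents:
            documents.append({"contract": None, "pages": []})
        documents[-1]["pages"].append(index)
    return documents


ocr_pages = [
    "Contract 2O11O7O8 Master Services Agreement",  # letter O read for zero
    "Terms and conditions continued",
    "Contract 20110708 Signature page",
    "Contract 20110815 Statement of Work",
]
for document in split_on_contract_change(ocr_pages):
    print(document)
# {'contract': '20110708', 'pages': [0, 1, 2]}
# {'contract': '20110815', 'pages': [3]}
```

The first page is misread by the pretend OCR engine, but the digit-only character set corrects it, so the signature page with the same number stays in the same document.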

Document Import and Separation – There are several separation methods that can be key to success if you need to import large volumes of documents, or you want to process documents scanned from copiers, network scanners, or fax machines.  Below are the separation methods required for any document capture from imported files:
  • New File Separation – This method of separation will look at a directory, pick up files, and maintain each new file as its own digital document.
  • Folder-based separation – This is a key method if you are importing documents and want to combine them based on the folder.  One example might be a law firm that has a folder structure of case documents on different subjects for the case and wants to combine each folder into a single PDF file.
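Both import methods come down to how you group the files you pick up. Below is a minimal Python sketch using only the standard library. It is an assumption-laden illustration: the function names and the sample folder layout are hypothetical, and it only decides which files belong together (actually merging each group into a single PDF would be a separate step).

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def new_file_separation(import_dir):
    """Every file found under the import directory is its own document."""
    root = Path(import_dir)
    return [[path] for path in sorted(root.rglob("*")) if path.is_file()]


def folder_based_separation(import_dir):
    """Files that share a folder are grouped into one document."""
    root = Path(import_dir)
    grouped = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            grouped[path.parent.relative_to(root)].append(path)
    return dict(grouped)


# Build a throwaway import directory with a hypothetical case layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("case-1/pleadings.pdf", "case-1/exhibits.pdf",
                 "case-2/contract.tif"):
        target = Path(tmp, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    singles = new_file_separation(tmp)
    folders = folder_based_separation(tmp)
    summary = {
        str(folder): [path.name for path in files]
        for folder, files in folders.items()
    }

print(len(singles))  # 3
print(summary)
# {'case-1': ['exhibits.pdf', 'pleadings.pdf'], 'case-2': ['contract.tif']}
```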


Blank Page Separation – I only mention this as I would always, always avoid it unless absolutely necessary, especially if you are scanning in duplex.  Most implementations of this method, unless run with strict preparation by knowledgeable operators, become an absolute mess. (Just my humble opinion ;)  )

Separation Scripting – Finally, for those rare and special occasions, you always want a product that has a pre-built scripting interface for customizing the whole process if necessary.  Now let me be clear: I don’t mean a sales rep saying “Yeah, we can do that” (which usually means $20,000 in professional services), but a product that has simple hooks into the separation function and lets you return a simple “yes or no” based on some parameter or criteria, written by anyone with basic scripting skills.  When would you use something like this?  Usually for very complex jobs where the original documents cannot be modified, but you need to put some logic in place to split documents.
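To show how small such a hook can be, here is a Python sketch of a “yes or no” separation callback. This is not any vendor's scripting API: the `separate` function, the hook signature, and the “Page 1 of” rule are all assumptions made up for illustration.

```python
def separate(pages, starts_new_document):
    """Split pages using a user-supplied yes/no hook.

    starts_new_document(page, current_document) returns True when the
    page should open a new document.
    """
    documents = []
    for page in pages:
        if not documents or starts_new_document(page, documents[-1]):
            documents.append([])
        documents[-1].append(page)
    return documents


# Hypothetical custom rule: a page whose text begins with "Page 1 of"
# is the first page of a new document.
def first_page_marker(page, current_document):
    return page["text"].startswith("Page 1 of")


scanned = [
    {"text": "Page 1 of 2  Claim form"},
    {"text": "Page 2 of 2  Claim form"},
    {"text": "Page 1 of 1  Cover letter"},
]
result = separate(scanned, first_page_marker)
print([len(document) for document in result])  # [2, 1]
```

The capture engine owns the loop; the script only answers the question, which is what keeps this within reach of anyone with basic scripting skills.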

The last separation topic I want to cover is something called triggered separation.  Let me set the stage on this one, and describe a process which is near and dear to every accounting manager’s heart: invoices.  So you have a stack of invoices, some single page, some multi-page, and you are stuck with a dilemma.  If I use barcode separators, and I have 100 single-page invoices, do I really have to put 100 barcode separators between them all?  Separation triggers allow you to scan single-page and multi-page documents all together.  So in this example, you can stack your singles with no separators, and then put separators between the invoices in your stack of variable-length documents.  Put a trigger sheet between the two stacks (this tells the capture software to switch from single-page separation to barcode-based separation), and scan the whole stack in one fell swoop.  This is a huge time saver in high-volume environments, and it also lets you build redundant separation logic, so you get the highest accuracy in separation with the least amount of document preparation.  Phewwww.  That was geeky.
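For the extra geeky, the invoice example can be sketched as a tiny state machine in Python. This is only an illustration of the idea, not a real product's behavior: the sheet kinds ("page", "separator", "trigger"), the function name, and the invoice labels are all hypothetical, and the sheet kinds are assumed to have been recognized already.

```python
def triggered_separation(sheets):
    """Split a mixed stack that starts in single-page mode.

    Each sheet is a (kind, name) tuple where kind is "page", "separator"
    or "trigger". A trigger sheet switches to barcode-separator mode.
    """
    documents, current, mode = [], [], "single"
    for kind, name in sheets:
        if kind == "trigger":
            if current:
                documents.append(current)
                current = []
            mode = "barcode"
        elif kind == "separator":
            if current:
                documents.append(current)
                current = []
        elif mode == "single":
            documents.append([name])  # every page is its own invoice
        else:
            current.append(name)      # collect pages until a separator
    if current:
        documents.append(current)
    return documents


stack = [
    ("page", "inv-A"), ("page", "inv-B"), ("page", "inv-C"),
    ("trigger", "switch to barcode mode"),
    ("page", "inv-D p1"), ("page", "inv-D p2"),
    ("separator", "barcode sheet"),
    ("page", "inv-E p1"), ("page", "inv-E p2"), ("page", "inv-E p3"),
]
for document in triggered_separation(stack):
    print(document)
# ['inv-A']
# ['inv-B']
# ['inv-C']
# ['inv-D p1', 'inv-D p2']
# ['inv-E p1', 'inv-E p2', 'inv-E p3']
```

Five invoices come out of the stack with one trigger sheet and one separator sheet, instead of a separator between every invoice.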


Do you really need all of this?  Does separation have to be that complex?  The whole goal here is to have as much as you possibly can in the tool kit to ensure you can meet all the capture needs within your organization.  I liken it to buying a base model with no accessories, and then wishing every day you had one feature or another.

So now you have examined your documents, and figured out how to efficiently scan and split.

2 comments:

Amila said...

Great Post! A must read for anyone planning to start document imaging.

Anonymous said...

Thanks!
This was useful! I like geeky :)