Thursday, May 31, 2012

How do you want to find your documents?

Document Capture Drives Search
One of the first stages in planning for any scanned image repository is to ask the question: How do you want to find your documents?  Theories vary on best practices, but here are a few tips when designing a document capture implementation for any ECM system:
  1. Limit your number of fields to five or fewer. Too often I see document scanning customers use far too many fields during capture. The more fields you have, the longer end users spend indexing their documents, and the more likely it is that fields will get skipped. Take the time to interview the end users and find out how they actually need to search for their documents.
  2. Always use a date. Dates are the ultimate filter and can be a lifesaver when searching for that needle in a haystack in a scanned document repository. Invoice date, purchase order date, contract date, and so on give you the power to narrow your search results to a specified period, which can be a huge help in audit-based searches or searches for legal support.
  3. Use automation to reduce indexing time. Document capture applications provide automation and efficiency, and can reduce the amount of keying end users have to do on each document. Strong, accurate OCR technology and Advanced Data Extraction (ADE) are absolutely required.
  4. Ensure your technology has a QA step. If you are going to go to all the trouble of scanning, capturing and migrating documents to a repository, make sure you can check your work. Misfiling a document can be a painful experience.
  5. Full text search is the insurance policy. Always, I repeat always, convert your scanned documents to a searchable format: PDF Image with Hidden Text. This allows granular searches beyond your index fields/columns, and can help you in the "find a needle in the haystack" tasks. But do not, I say, do NOT rely on full text search as your primary search method. Full text search does not let you sort by document-specific dates, does not let you run range-based searches on specific criteria, and restricts sorting and viewing in most repositories.
Those are just a few tips to keep in mind when designing your document scanning index fields.
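
To make the tips a little more concrete, here is a minimal sketch in Python of what a small index design might look like. Everything in it is an assumption for illustration: the InvoiceIndex record, its field names, the qa_issues check and the sample documents are made up, and no particular capture product or ECM repository works exactly this way. It shows four index fields (under the limit of five), a date on every record, a simple QA pass that flags skipped fields, and the difference between a date range search on an index field and a full text fallback.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Tip 1: keep the number of index fields small (hypothetical field names).
INDEX_FIELDS = ("vendor", "invoice_number", "invoice_date", "amount")
MAX_INDEX_FIELDS = 5
assert len(INDEX_FIELDS) <= MAX_INDEX_FIELDS


@dataclass
class InvoiceIndex:
    """Hypothetical index record for one scanned invoice."""
    vendor: str
    invoice_number: str
    invoice_date: Optional[date]   # Tip 2: always capture a date
    amount: Optional[float]
    full_text: str = ""            # Tip 5: OCR text, kept as a fallback only


def qa_issues(doc: InvoiceIndex) -> List[str]:
    """Tip 4: a QA pass that lists skipped or implausible fields."""
    issues = []
    if not doc.vendor.strip():
        issues.append("vendor is blank")
    if not doc.invoice_number.strip():
        issues.append("invoice_number is blank")
    if doc.invoice_date is None:
        issues.append("invoice_date is missing")
    elif doc.invoice_date > date.today():
        issues.append("invoice_date is in the future")
    return issues


def search_by_date(docs: List[InvoiceIndex], start: date, end: date) -> List[InvoiceIndex]:
    """Range search on an index field, with results sorted by date."""
    hits = [d for d in docs
            if d.invoice_date is not None and start <= d.invoice_date <= end]
    return sorted(hits, key=lambda d: d.invoice_date)


def search_full_text(docs: List[InvoiceIndex], phrase: str) -> List[InvoiceIndex]:
    """Fallback search over OCR text: finds the phrase, but cannot filter by range."""
    needle = phrase.lower()
    return [d for d in docs if needle in d.full_text.lower()]


# Made-up sample documents, for illustration only.
docs = [
    InvoiceIndex("Acme Supply", "INV-1001", date(2012, 3, 14), 250.00, "net 30 pallet jacks"),
    InvoiceIndex("Acme Supply", "INV-1002", date(2012, 5, 2), 75.50, "replacement casters"),
    InvoiceIndex("", "INV-1003", None, None, "pallet jacks, rush order"),
]

first_quarter = search_by_date(docs, date(2012, 1, 1), date(2012, 3, 31))
needs_review = [d.invoice_number for d in docs if qa_issues(d)]
fallback_hits = search_full_text(docs, "pallet jacks")
```

Notice that the third sample document, which was indexed with a blank vendor and no date, never shows up in the date range search. It can only be found through the full text fallback, and the QA pass is what catches it before it gets lost in the repository.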