www.CastechData.com
 

Our Services
 


Data Conversion Services


 

Imaging, Indexing, and Full Text

Document mark-up using HTML, SGML, USMARC, or a client-specified mark-up convention.

Document Scanning and OCR Translation

CDSI handles many types of data conversion across a range of source formats, including imaging, full-text conversion, document scanning, OCR translation, and indexing. Most of our work, however, is full-text conversion of documents such as newspapers, magazines, journals, books, pamphlets, and even best-selling titles. The following steps show how a typical conversion job is processed.

  1. Inputs from different sources are received from the clients.
  2. The initial process involves the scanning of these documents (or keying if they are non-scannable).
  3. The resulting images are run through the OCR process which translates the images into ASCII text files.
  4. The ASCII files are cleaned up to correct misread words or data.
  5. The clean ASCII files are tagged according to the tagging rules set by the client. (This is the equivalent of SGML, HTML, or XML coding.)
  6. The tagged file is transmitted to the client.
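
As an illustration only, steps 4 and 5 above can be sketched in a few lines of Python. This is a minimal sketch, not CDSI's actual tooling: the correction table, the function names, and the `<doc>`, `<title>`, and `<p>` tags are all hypothetical stand-ins for whatever rules a client specifies.

```python
import re
from html import escape

# Hypothetical correction table: known OCR misreads mapped to the intended word.
CORRECTIONS = {"tbe": "the", "rnodern": "modern"}


def clean_ocr_text(raw: str) -> str:
    """Step 4: repair typical OCR damage in an ASCII text."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    # Replace known misreads, matching whole alphabetic words only.
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return CORRECTIONS.get(word, word)

    text = re.sub(r"[A-Za-z]+", fix, text)
    # Collapse runs of spaces or tabs, leaving line breaks alone.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def tag_document(clean: str) -> str:
    """Step 5: wrap the first block as a title and the rest as paragraphs."""
    # Blocks are separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", clean) if b.strip()]
    if not blocks:
        return "<doc></doc>"
    parts = [f"<title>{escape(blocks[0])}</title>"]
    for block in blocks[1:]:
        # Flatten internal line breaks and escape markup characters.
        parts.append(f"<p>{escape(' '.join(block.split()))}</p>")
    return "<doc>\n" + "\n".join(parts) + "\n</doc>"


if __name__ == "__main__":
    raw = "Daily News\n\ntbe conver-\nsion of  rnodern journals."
    print(tag_document(clean_ocr_text(raw)))
```

A fixed lookup table only catches misreads that are never valid words, so in practice a human proofreading pass would still be needed for anything ambiguous.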

If the inputs are TIFF images or PDF files, they are converted into full-text files in the same way, but without the scanning step. If the inputs are electronic files from electronic publishers (instead of hard-copy documents), they usually arrive as Quark, Mac, SGML, HTML, Ventura, PageMaker, Word, or other non-ASCII files, and are converted into plain ASCII full-text files with the required tagging. In some instances, the output is PDF Normal (PDF files with searchable text) rather than plain ASCII text files produced from the OCR'd data.
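
To give a rough idea of what converting an electronic input to plain ASCII involves, the sketch below reduces an HTML file to ASCII text using only the Python standard library. It is an assumption-laden example rather than the actual conversion process: the class and function names and the list of block-level tags are hypothetical, and it does not treat script or style content specially.

```python
import unicodedata
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content and drop all tags."""

    # Hypothetical set of tags that should start a new output line.
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li", "tr"}

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_ascii(markup: str) -> str:
    """Return the text of an HTML document as plain ASCII lines."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.chunks)
    # Decompose accented letters, then drop anything outside ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Normalize spacing within each line and skip empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<h1>Caf\u00e9 News</h1><p>First &amp; second<br>line.</p>"
    print(html_to_ascii(sample))
```

Dropping non-ASCII characters outright is the simplest policy; a production job would more likely map them to client-approved substitutes before tagging.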

In some instances, clients require images of certain pictures, photos, charts, and diagrams, which are linked to the full-text files during the tagging process. These images are saved in JPG format and transmitted to the client together with the full-text files. For some documents, a PDF image is produced for every page processed in the full-text department. We also process advertisements, including appointment ads, which are usually sent as hard copy or as images (PDF or TIFF); an ASCII file and a database are produced from them.