Creating searchable PDFs via scanning

Status
Not open for further replies.
I volunteered to scan a 7-inch-thick stack of printed material for our spoonplugging club.

I used my cheapo scanner/printer, which captures to PDF. It did a great job, but the PDFs have no text markup and are not searchable.

I really want the PDFs to be searchable. I have a Canon scanner and a Brother all-in-one.

I'm not sure which one I can use to scan and create PDFs that will be searchable.

Anybody out there with any experience in scanning to PDF that is searchable?
 
Originally Posted By: stockrex
Anybody out there with any experience in scanning to PDF that is searchable?

See the link provided by Quattro Pete.

All a scanner can do is produce an IMAGE -- a pretty picture -- no matter the file format you choose. Essentially, you need to have some sort of computerized eyes look at that image, read any characters depicted in the image, then type those characters out for you into an editable document, automatically.

That automatic reader and typist is called an Optical Character Recognition program, known for short as OCR. Most scanners and multifunction units come with an OCR program as part of their software suite. If you didn't install the suite, you can often download it for free from the maker's website.

Be warned, though: the OCR program will only get about 90%-95% of it correct. You'll still need to go through the documents in Word or another word-processing program and fix all the spelling mistakes and wrong words. OCR programs tend to get confused and do things like reading an "rn" pair as an "m". It gets worse if the page goes through the scanner slightly crooked, or if the source document is blurry, faded, or distorted.
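
To put a number on how well a given OCR pass did, you can compare its output against a short passage typed out by hand. The following is a minimal sketch using Python's standard `difflib`; the function names and the sample sentence are made up for illustration, and the accuracy figure is simply the share of reference characters that `difflib` manages to match, not a standard OCR metric.

```python
from difflib import SequenceMatcher


def char_accuracy(truth: str, ocr: str) -> float:
    """Share of the hand-typed reference characters that also appear, in order, in the OCR text."""
    if not truth:
        return 1.0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def list_differences(truth: str, ocr: str) -> list[tuple[str, str]]:
    """Return (expected, got) pairs for every span where the two texts differ."""
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    return [
        (truth[i1:i2], ocr[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: every "rn" has been misread as "m".
truth = "Turn the corner near the barn"
ocr = "Tum the comer near the bam"

print(f"character accuracy: {char_accuracy(truth, ocr):.0%}")
for expected, got in list_differences(truth, ocr):
    print(f"expected {expected!r}, OCR produced {got!r}")
```

Running this on a page or two from each scanner should give a rough idea of how much cleanup to expect.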
 
The full version of Adobe Acrobat does this with ease. Since you said it's 7" thick, I don't think any free version will let you do that many pages.

But if it's typed, I've found Adobe's OCR to be 100% accurate or near it; I've never really found any mistakes. It also auto-rotates the text to be straight. Pretty cool, IMO.

I had a 300-page motorcycle engine manual that wasn't searchable. It took about 20 minutes to do it all, but it was well worth it.
 
Yes, Adobe has an add-tag feature. I think I have a full version I bought years back.

It's 7 inches, but I'm only scanning 30 to 40 pages into each file.

I wonder if other software has the add-tag feature, i.e. OCR on the fly and modify the PDF to add the markup.
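
One option outside Acrobat is OCRmyPDF, an open-source command-line tool that adds a hidden text layer to scanned PDFs. The sketch below assumes `ocrmypdf` is installed and on the PATH; the folder names and helper functions are hypothetical, and the only option used is `--deskew`, which straightens crooked pages before OCR.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the ocrmypdf command line for one scanned PDF."""
    return ["ocrmypdf", "--deskew", str(src), str(dst)]


def plan_batch(in_dir: Path, out_dir: Path) -> list[list[str]]:
    """One command per PDF in in_dir, writing to the same file name in out_dir."""
    return [
        build_ocr_command(pdf, out_dir / pdf.name)
        for pdf in sorted(in_dir.glob("*.pdf"))
    ]


def run_batch(in_dir: Path, out_dir: Path) -> None:
    """Run OCR on every planned file; raises if ocrmypdf reports an error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for command in plan_batch(in_dir, out_dir):
        subprocess.run(command, check=True)


# Example call (folder names are placeholders):
# run_batch(Path("scans"), Path("searchable"))
```

With 30 to 40 pages per file, each scanned PDF in the input folder would get a searchable copy in the output folder, and the originals stay untouched.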
 
Pete and Tegger, I will check the Canon and Brother software that came with them. I am sure I saw an OCR CD somewhere.
 
What people are saying about scanning and about Adobe is correct.

That said, the Fujitsu ScanSnap scanners are usually sold with Adobe Acrobat and with ABBYY FineReader to make painless one-button "Searchable PDF" scanning available once you install the software.

OCR'ing a scanned document is usually good enough, but not so good that you can OCR a bash script and have it work. There are always mistakes. A better scan (Fujitsu is much better at de-speckling, de-skewing, and contrast than a commercial Xerox all-in-one) makes a lot of difference to how much manual correction you'll have to do afterwards.

Also, if you're only getting 90% accuracy it's almost not worth bothering (that's 3-5 errors per line). That's where accuracy was back in 1983, and only with fixed-pitch Pica type. You need 98% accuracy at minimum, with proportionally spaced fonts such as Arial and Times New Roman. See: ABBYY and accuracy.
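
How many errors per line a given accuracy works out to depends on how long a line is. A rough sketch of the arithmetic, assuming "accuracy" means per-character accuracy and that errors are spread evenly (both are assumptions, as are the 40- and 80-character line lengths):

```python
def expected_errors_per_line(accuracy: float, chars_per_line: int) -> float:
    """Average number of wrong characters on a line at a given per-character accuracy."""
    return (1.0 - accuracy) * chars_per_line


for accuracy in (0.90, 0.95, 0.98):
    for chars_per_line in (40, 80):
        errors = expected_errors_per_line(accuracy, chars_per_line)
        print(f"{accuracy:.0%} accuracy, {chars_per_line} chars/line: about {errors:.1f} errors per line")
```

Under those assumptions, 90% accuracy on a 40-character line comes to about 4 errors, and 98% on an 80-character line to fewer than 2.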

Also, you might want to test the free OCR in Google Docs (10-page maximum).
 
Originally Posted By: spackard
if you're only getting 90% accuracy it's almost not worth bothering (that's 3-5 errors per line).

I just picked that number off the top of my pointy head, and evidently overstated the error rate. Sorry about that.
 
Here is what I ended up with:
1. My Canon imageCLASS comes with software to acquire the image and create a searchable PDF in one step, but my Canon does NOT have a document feeder, so this is out.
2. I could not find my Brother scanner's CD, or clear confirmation on Brother's download page that the PaperPort that comes with it does searchable PDFs, so I am not using this for the moment.
3. I scanned to regular PDF and then uploaded the files to my Google Drive, which provides an option on upload to create a searchable PDF on the fly.

Google wins again! :)
 