I spent nearly the entire day yesterday scanning in a whole book... well, actually I only got two thirds of it scanned before I got too tired. I know I was tired because I ended up fucking up a few pages in a row, and then I fucked up fixing the fuck-up.
I have scanned and photographed many, many entire books, and this is what I can share with you.
Hint #1. Don't ZONE OUT, and don't do this while you are tired. This is the kind of task where, if you make a mistake, you can go 50 pages before you realize it and then have to go back 50 pages to fix it.
Hint #2. Scanning is slow, so do something else while you wait... but THIS is the task that isn't mindless, so the OTHER thing should be the mindless one, NOT the other way around. See above: if you are doing something complicated with your right hand, you will fuck up with your left. I basically listened to an audiobook and played Bejeweled while I did this, and stopped BOTH when I screwed up.
Hint #3. MAKE THE PAGE NUMBERS MATCH THE FILE NAMES. Yeah, I know, it's a pain in the ass to set up at the start, because if the cover is file 001, the actual page one ends up as file 007. But trust me on this: once you have a folder full of files, it is much easier to find the exact page you need to redo, and if you import them all into a PDF, they are easier to sort. THIS took me a long time to figure out... I thought I was too smart to worry about it.
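If you would rather script the naming than fix it by hand, here is a minimal Python sketch of one way to do it. The scheme (`f01.jpg` for unnumbered front matter, `p007.jpg` for printed page 7) and the function names are my own hypothetical choices, not something the scanner software produces. The zero-padding is what keeps an alphabetical sort of the folder in reading order.

```python
# Hypothetical naming scheme: front matter (cover, title page, etc.) gets an
# "f" prefix, body pages get a "p" prefix plus the printed page number.
# Zero-padding keeps an alphabetical sort in reading order.

def page_filename(page: int, width: int = 3, ext: str = "jpg") -> str:
    """Name a body page after its printed page number, e.g. 7 -> 'p007.jpg'."""
    return f"p{page:0{width}d}.{ext}"


def front_filename(index: int, width: int = 2, ext: str = "jpg") -> str:
    """Name unnumbered front matter separately, e.g. 1 -> 'f01.jpg'."""
    return f"f{index:0{width}d}.{ext}"


def scan_to_page(scan_number: int, front_matter_count: int) -> int:
    """Convert a 1-based scanner sequence number to a printed page number.

    With the cover as scan 1 and six front-matter scans, scan 7 is page 1.
    """
    return scan_number - front_matter_count


names = [front_filename(i) for i in range(1, 7)] + [page_filename(p) for p in (1, 2, 10, 100)]
print(sorted(names) == names)  # -> True: 'f01'...'f06' come before 'p001'...'p100'
# Without padding, sorted(["p9.jpg", "p10.jpg"]) puts p10 first.
```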
Hint #4. Scan or photograph pages and images SEPARATELY. That means scan all the pages of the book, then go back and do the images. The pages you want at 300 dpi if you want to print them, but the images you want much, much higher if you want to clean them up for insertion. I forget this when I am photographing a book on location... I need to start remembering it: photograph the PAGE, then take another shot of just the image. This makes more sense later, when you are working with the image files and realize you need to open a page file and then extract the image, etc., etc.
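The difference between 300 dpi pages and higher-resolution images is just multiplication: pixels = inches x dpi. The sketch below assumes a 6 x 9 inch page and a 3 x 4 inch illustration scanned at 1200 dpi. Those sizes and the 1200 figure are illustrative assumptions, since the post only says "much, much higher."

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scan: each side in inches times dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_megabytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> float:
    """Rough raw size before compression (3 bytes per pixel = 24-bit color)."""
    return width_px * height_px * bytes_per_pixel / 1_000_000


page = pixel_size(6, 9, 300)            # assumed 6x9 inch text page, fine for printing
illustration = pixel_size(3, 4, 1200)   # assumed 3x4 inch image at an assumed 1200 dpi
print(page, uncompressed_megabytes(*page))                  # -> (1800, 2700) 14.58
print(illustration, uncompressed_megabytes(*illustration))  # -> (3600, 4800) 51.84
```

The small illustration ends up with several times the raw data of the whole text page, which is one more reason to keep the two kinds of scans in separate files.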
Hint #5. Pick a flexible file type, something you can convert into something else. If you are going to use WINDOWS IMAGING to OCR (which needs a TIFF file), check whether the TIFFs your scanner creates are valid... mine weren't, and it is a pain in the ass to open every one of 300 pages in Photoshop to resave them in a different format, yadda yadda. I use an OCR program that works with images, so a JPG works just as well. If it's not hundreds of pages, you can simply scan to a multipage PDF... but I find that over 7 or 8 pages it is WAY too easy to mess up the page order, and sometimes the computer just throws up its hands when saving and says "no way, José." I'd rather build the PDF myself later from the raw images, especially if I am using larger image files than needed.
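Before committing to TIFFs for OCR, a quick check can at least catch files that carry a .tif extension but aren't TIFFs inside. This Python sketch only looks at the first bytes of each file (a classic TIFF starts with `II*\0` or `MM\0*`, a JPEG with `FF D8 FF`). A file that passes can still be rejected by your OCR tool for other reasons, such as an unsupported compression, so treat it as a first filter rather than a real validator. The function names and the `scans` folder in the usage line are hypothetical.

```python
from pathlib import Path

# Classic TIFF headers: byte-order mark ("II" little-endian, "MM" big-endian)
# followed by the number 42 in that byte order. BigTIFF (43) is not recognized here.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_header(head: bytes) -> str:
    """Guess a file format from its first few bytes."""
    if head[:4] in TIFF_SIGNATURES:
        return "tiff"
    if head.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"


def suspicious_tiffs(folder: Path) -> list[Path]:
    """List .tif/.tiff files in a folder whose header is not a TIFF header."""
    bad = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in (".tif", ".tiff"):
            with open(path, "rb") as f:
                if sniff_header(f.read(4)) != "tiff":
                    bad.append(path)
    return bad
```

Running something like `suspicious_tiffs(Path("scans"))` before starting the OCR pass would flag the files that need resaving, instead of discovering them one at a time.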
Hint #6. Yes, Virginia, you can scan sideways... at least I do. If the book will fit sideways on the scanner with no loss of text, I scan both pages at the same time. With my Epson Scan utility I can create little vignettes around the text on each page; HOWEVER, they are still sideways images. My OCR programs will rotate them when I go to open them, and of course I could open Photoshop and do each one individually, which I loathe. The simple Windows preview function lets you rotate the image file the right way up and then save it with the same name. So while the scanner is cranking away at pages 8 and 9, I am previewing pages 6 and 7, rotating them, and saving them.
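With two pages per sideways scan, keeping the numbering straight is simple arithmetic: each scan advances the page count by two. Here is a hypothetical Python helper, assuming each page is cropped into its own file and named after its printed page number, with the first spread starting on a left-hand page (6 and 7 in the example above).

```python
def pages_in_scan(scan_index: int, first_left_page: int) -> tuple[int, int]:
    """Return the (left, right) page numbers captured by one sideways scan.

    scan_index counts scans from 0; each scan advances two pages.
    """
    left = first_left_page + 2 * scan_index
    return left, left + 1


def filenames_for_scan(scan_index: int, first_left_page: int, ext: str = "jpg") -> list[str]:
    """One zero-padded file per page cropped out of the spread (hypothetical naming)."""
    return [f"p{page:03d}.{ext}" for page in pages_in_scan(scan_index, first_left_page)]


print(filenames_for_scan(0, 6))  # -> ['p006.jpg', 'p007.jpg']
print(filenames_for_scan(1, 6))  # -> ['p008.jpg', 'p009.jpg']
```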
This brings me to Hint #7: DON'T LET THE MACHINE GET TOO FAR AHEAD. Stay just 2 to 4 pages behind in the preview, because you may have flipped the page before the scanner finished page 7, and now you need to delete the page 7 you just saved, rescan it, AND change the increment number the scanner is using so the images sync up. OR, if you don't want to rescan 6 and 7 together, you need to change the FRAME on the scanner so it does just ONE page. Then you notice that 8 is out of alignment, so you rescan it, but you name it wrong and have to change the file name... see, it gets complicated if you are too far ahead. You COULD skip bad pages and come back to rescan them later, but there WILL be other bad pages to rescan, so why make more?
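Because the numbering slips out of sync so easily, it can help to audit the folder before moving on. A minimal Python sketch, assuming the hypothetical `pNNN.ext` naming for body pages: it reports pages with no file and pages with more than one file (for example `p4.jpg` sitting next to `p004.jpg` after a mis-named rescan). In practice you would pass it something like `os.listdir("scans")`, where `scans` is a placeholder folder name.

```python
import re
from collections import Counter

# Hypothetical naming: body pages are p<number>.<ext>; front matter (f01.jpg, ...) is ignored.
PAGE_RE = re.compile(r"^p(\d+)\.(?:jpe?g|tiff?|png)$", re.IGNORECASE)


def audit_pages(filenames, first_page: int, last_page: int):
    """Return (missing, duplicated) page numbers for a list of file names."""
    counts = Counter()
    for name in filenames:
        match = PAGE_RE.match(name)
        if match:
            counts[int(match.group(1))] += 1
    # Counter returns 0 for absent keys, so pages never seen count as missing.
    missing = [p for p in range(first_page, last_page + 1) if counts[p] == 0]
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicated


print(audit_pages(["f01.jpg", "p001.jpg", "p002.jpg", "p4.jpg", "p004.jpg"], 1, 5))
# -> ([3, 5], [4])
```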
Hint #8. STRAIGHT works better. By page 190 the text will drift, and you will say to yourself, "What's a little drift?" Later, when the OCR program can't READ it and you end up transcribing an entire page, you will be kicking yourself for not taking the time to flatten the book and line it up.
Hint #9. Take a break, even if it means stopping for DAYS in the middle; otherwise you will just keep making mistakes and throw the book against the wall. I did 100 pages, then another 100, and now the last 100. THEN I get to work with the text.
Once I have each page as a file, I can use them as is, or I can use Adobe Acrobat (which I love) or another PDF creator to group them together and make PDFs out of them. If you are just going to print the pages it doesn't matter, but smaller multipage documents are easier to work with than one big fat one.
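Splitting the page files into smaller batches before handing them to Acrobat (or whichever PDF creator you use) is easy to plan with a script. This Python sketch only works out the groups and a name for each output PDF; the actual PDF assembly is left to your tool. The batch size of 25 and the naming are hypothetical, and it assumes the file names sort in page order (zero-padded).

```python
from pathlib import PurePath


def group_pages(filenames: list[str], per_document: int) -> list[list[str]]:
    """Split a sorted list of page files into consecutive batches."""
    if per_document < 1:
        raise ValueError("per_document must be at least 1")
    return [filenames[i:i + per_document] for i in range(0, len(filenames), per_document)]


def group_pdf_name(group: list[str]) -> str:
    """Name a batch after its first and last page files, e.g. 'p001-p025.pdf'."""
    return f"{PurePath(group[0]).stem}-{PurePath(group[-1]).stem}.pdf"


pages = sorted(f"p{n:03d}.jpg" for n in range(1, 61))
for batch in group_pages(pages, 25):
    print(group_pdf_name(batch), len(batch))
# -> p001-p025.pdf 25
#    p026-p050.pdf 25
#    p051-p060.pdf 10
```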
Last Hint. SCAN THE TITLE PAGE. Whether you are photographing or scanning, don't skip any pages with text, even if you aren't going to use that page. When you look at all the pages in the folder, your brain should be able to account for every one of them. Most of the time I don't NEED the title page, but having it in the folder with the other images tells me exactly what's IN the folder at a glance.
Note: don't do this to books that can't survive the process. Most of the time I am working with more fragile books and documents, and instead of flipping them upside down on the scanner, I use a decent digital camera and try to adjust the levels so that the page is legible. But most of the time THOSE publications can't be read properly by OCR, and I end up transcribing them 100%. BTW, EVEN if you run OCR, or (bonus) get to DOWNLOAD scanned text files of books from archive.org, MOST of it is not usable as is. There is a carriage return at the end of every line, and the OLD fonts don't like being read, hence the h comes out like li and the 8s like 3s. It is just faster to type them from scratch.
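For the carriage-return half of that problem (not the misread characters, which still need a human), a few lines of Python can rejoin hard-wrapped OCR or archive.org text into paragraphs. This is a sketch under two assumptions: blank lines mark real paragraph breaks, and a line ending in a hyphen was split mid-word. The second assumption will wrongly merge genuinely hyphenated words that happen to fall at a line end. The function name is hypothetical.

```python
import re


def unwrap_ocr_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Blank lines are treated as paragraph breaks. A line ending in "-" is
    joined to the next line without a space (assumed to be a line-break hyphen).
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        joined = ""
        for line in para.splitlines():  # splitlines also handles \r\n endings
            line = line.strip()
            if not line:
                continue
            if joined.endswith("-"):
                joined = joined[:-1] + line  # drop the hyphen, glue the word back
            elif joined:
                joined += " " + line
            else:
                joined = line
        out.append(joined)
    return "\n\n".join(out)


print(unwrap_ocr_text("The quick brown\nfox jumps over the la-\nzy dog.\n\nSecond para\nhere."))
```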