User:Close@Hand/Splitting Blackwood's
This is the procedure I use for splitting the pages in Blackwood's, which are usually set in two columns. It takes me about two and a half hours to do a single issue.
- Find the scan of the volume you need at archive.org. The volume number is included in the URL, so, for example, volume 70 is https://archive.org/details/blackwoodsmagazi70edinuoft
- Download the archive and unzip it
- Create a working directory for the issue, e.g. Blackwood433
- Pick an issue, download its page images from DP and unpack them. If there are illustration files, download those too, since they are sometimes proofing images for which there is no text file. These images are only 8-colour and fairly narrow, but by DP standards that is excellent. They can be cropped and used to replace any images from TIA (The Internet Archive) that have torn or dog-eared corners
- Reduce the DP PNGs to 4-bit colour and save them in Blackwood433/reference as ref001.png etc.
- Find the pages for the issue you picked and copy them into a directory inside the working directory: Blackwood433/in.jpg
- Go through the JPGs in Blackwood433/in.jpg and, for any page that has elements running across both columns (typically headers or footnotes), create a second copy of the page; the copy of "001.jpg" would be "001 - Copy.jpg", etc. For example, a page whose header runs across both columns needs two JPGs: one with the right column blacked out, and one with everything except the right column cropped off
- Load the JPGs into a FineReader batch
- Crop or split every page that needs it. If there is a chapter header in the bottom half of the page, I split the page horizontally and leave the bottom as a two-column image. Try to keep each footnote on the same image as its marker
- OCR the batch
- Go through each page. If the page header is split into more than one recognition area, keep only the one that contains the centre of the page. On pages where one of the columns will be blacked out, delete the recognition area for that column
- Save the OCR text to textw and textwo and run it through guiprep like any other project. At the Remove Headers stage, almost every page will have a header except 001 and pages that were cropped, such as 002
- While guiprep is running, save the page images in Blackwood433/png.full
- Renumber the resulting PNGs so that they run from 003 onwards. The TIA issues are missing the table of contents and its (almost always blank) verso, both of which are in the DP project
- Convert the resulting PNGs to B&W, resize them down to 600 pixels and save them in Blackwood433/pngs
- Convert the resulting PNGs to B&W, resize them down to 1100 pixels and save them in Blackwood433/pngs.1100. Check for any oversize images: these happen when the short side of an image is the side of the page rather than the top (that is, the image is wider than it is tall)
- For each page that is still full width (both columns on one image), copy the PNG from Blackwood433/pngs.1100 to Blackwood433/pngs, overwriting the 600-pixel version. This way every proofing image is about 550 pixels per column
- If you found any columns that need to be replaced with the DP images, copy those DP images to the Blackwood433 folder. Once you have cropped them, save them with the right name and double-check that each image matches the file with the same name in Blackwood433/pngs. Then move the cropped DP images to Blackwood433/pngs
- Go through each image in Blackwood433/pngs and cut out or white out any areas on pages that should not be proofed
- For IrfanView users: a typical B&W 1100-pixel proofing image will be about 80 KB. If you white out one column by selecting it and setting the Brightness to 255, the saved file will be about 90 KB, larger even though it contains less information. It is still B&W, but it compresses badly. So batch convert the images in Blackwood433/pngs to B&W again; this cuts the size down to about 50 KB.
- Renumber txt files to start at a003 and then 003
- Zip up the contents of the pngs, reference and text directories and upload to DP
- Insert a blank 002.png/txt, then overwrite the existing project with 003 onwards
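The step that finds the scan at archive.org relies on the volume number being part of the URL. A minimal Python sketch of building that URL, assuming every volume follows the same identifier pattern as the volume 70 example (some scans may be catalogued under a different identifier, so check that the page exists); `volume_url` is a hypothetical helper name:

```python
# Hypothetical helper: builds the archive.org details URL for a volume.
# Assumption: all volumes follow the pattern of the volume 70 example.


def volume_url(volume: int) -> str:
    """Return the archive.org details page for the given volume number."""
    return f"https://archive.org/details/blackwoodsmagazi{volume}edinuoft"


if __name__ == "__main__":
    print(volume_url(70))
```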
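Reducing the DP PNGs to 4-bit colour for Blackwood433/reference could also be scripted. This sketch assumes the third-party Pillow library and that the DP page images are named 001.png, 002.png and so on; `ref_name` and `make_reference_images` are hypothetical names. It quantizes each image to 16 colours and asks Pillow's PNG writer to store 4 bits per pixel:

```python
from pathlib import Path


def ref_name(dp_png: Path) -> str:
    """Map a DP page image such as 001.png to its reference name ref001.png."""
    return f"ref{dp_png.stem}.png"


def make_reference_images(src_dir: Path, ref_dir: Path) -> list[Path]:
    """Save a 16-colour (4-bit) copy of every DP PNG in src_dir into ref_dir.

    Pillow is imported here so that ref_name works without it installed.
    """
    from PIL import Image  # third-party: pip install Pillow

    ref_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for png in sorted(src_dir.glob("*.png")):
        with Image.open(png) as im:
            # quantize() returns a palette ("P") image with at most 16 colours.
            small = im.convert("RGB").quantize(colors=16)
        out = ref_dir / ref_name(png)
        small.save(out, bits=4)  # store the palette indices in 4 bits each
        written.append(out)
    return written
```

Usage would be along the lines of `make_reference_images(Path("dp_pages"), Path("Blackwood433/reference"))`, where `dp_pages` is a placeholder for wherever you unpacked the DP images.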
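Making the second copies of pages with elements running across both columns is plain file copying. A sketch using only the Python standard library, assuming the pages are named 001.jpg etc. and that you supply the list of page numbers yourself after looking through Blackwood433/in.jpg; `duplicate_pages` and `copy_name` are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def copy_name(page: str) -> str:
    """Name of the second copy of a page, e.g. '001' -> '001 - Copy.jpg'."""
    return f"{page} - Copy.jpg"


def duplicate_pages(in_dir: Path, pages: list[str]) -> list[Path]:
    """Copy <page>.jpg to '<page> - Copy.jpg' for every listed page."""
    made = []
    for page in pages:
        dst = in_dir / copy_name(page)
        shutil.copyfile(in_dir / f"{page}.jpg", dst)
        made.append(dst)
    return made


if __name__ == "__main__":
    # Demo in a scratch folder standing in for Blackwood433/in.jpg.
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        (folder / "001.jpg").write_bytes(b"fake jpeg")
        print(duplicate_pages(folder, ["001"]))
```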
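Renumbering the PNGs so they run from 003 onwards is safest in two passes: first park every file under a temporary name, then give each its final number, so no rename ever lands on a file that has not moved yet. A sketch, assuming the folder holds only three-digit page files (001.png, 002.png, ...) in reading order; `renumber` is a hypothetical helper, and the same function could renumber plain NNN.txt files with `ext="txt"`:

```python
import tempfile
from pathlib import Path


def renumber(folder: Path, ext: str, start: int = 3) -> list[str]:
    """Renumber the NNN.<ext> files in folder so they run from `start` upwards.

    Two passes avoid collisions: renaming 001 -> 003 directly would clobber
    an existing 003 that has not been moved yet.
    """
    files = sorted(folder.glob(f"[0-9][0-9][0-9].{ext}"))
    # Pass 1: park every file under a temporary name.
    parked = []
    for i, f in enumerate(files):
        tmp = folder / f"_renumber_{i}.{ext}"
        f.rename(tmp)
        parked.append(tmp)
    # Pass 2: give each file its final zero-padded number, keeping the order.
    names = []
    for i, tmp in enumerate(parked):
        name = f"{start + i:03d}.{ext}"
        tmp.rename(folder / name)
        names.append(name)
    return names


if __name__ == "__main__":
    # Demo in a scratch folder standing in for Blackwood433/png.full.
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        for n in ("001", "002"):
            (folder / f"{n}.png").write_text(n)
        print(renumber(folder, "png"))
```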
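The resizing steps could be sketched with Pillow (third-party). The warning about oversize images suggests the batch resize sets the short side of each image; for an image wider than it is tall, the short side is the side of the page, so the result comes out too wide. This sketch sidesteps that by always scaling to a target width and offers a check for the risky images. The 128 threshold for B&W and all helper names are assumptions; `copy_full_width` covers copying the full-width pages from pngs.1100 into pngs:

```python
import shutil
from pathlib import Path


def short_side_size(width: int, height: int, target: int) -> tuple[int, int]:
    """Size after a resize that sets the SHORT side to target (aspect kept)."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def width_size(width: int, height: int, target: int) -> tuple[int, int]:
    """Size after a resize that sets the WIDTH to target (aspect kept)."""
    return target, max(1, round(height * target / width))


def is_oversize_risk(width: int, height: int) -> bool:
    """True when the image is wider than tall, so its short side is the
    side of the page and a short-side resize would make it too wide."""
    return width > height


def to_bw(src: Path, dst: Path, target_width: int, threshold: int = 128) -> None:
    """Resize src to target_width pixels wide and save it as a 1-bit PNG.

    Pillow is imported here so the size helpers work without it installed.
    """
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(src) as im:
        grey = im.convert("L")  # resize in greyscale, then threshold
    grey = grey.resize(width_size(*grey.size, target_width), Image.LANCZOS)
    grey.point(lambda p: 255 if p >= threshold else 0, mode="1").save(dst)


def copy_full_width(src_dir: Path, dst_dir: Path, pages: list[str]) -> None:
    """Overwrite the 600-pixel copy of each full-width page with the 1100 one."""
    for page in pages:
        shutil.copyfile(src_dir / f"{page}.png", dst_dir / f"{page}.png")
```

The arithmetic behind the two sizes: a full-width page at 1100 pixels holds two columns of roughly 550 pixels each, which is close to a single split column at 600 pixels once its margins are included.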
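Zipping the upload could be done with Python's standard `zipfile` module. This sketch assumes DP expects all files at the top level of the zip and that the text files sit in a folder called text inside the working directory (the procedure does not give its exact path); `zip_for_upload` is a hypothetical name:

```python
import tempfile
import zipfile
from pathlib import Path


def zip_for_upload(issue_dir: Path, zip_path: Path,
                   subdirs: tuple[str, ...] = ("pngs", "reference", "text")) -> list[str]:
    """Zip the files in each subdirectory of issue_dir into one flat zip."""
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for sub in subdirs:
            for f in sorted((issue_dir / sub).iterdir()):
                if f.is_file():
                    zf.write(f, arcname=f.name)  # store at the top level
                    names.append(f.name)
    return names


if __name__ == "__main__":
    # Demo with a scratch folder standing in for Blackwood433.
    with tempfile.TemporaryDirectory() as d:
        issue = Path(d)
        for sub in ("pngs", "reference", "text"):
            (issue / sub).mkdir()
        (issue / "pngs" / "003.png").write_bytes(b"png")
        print(zip_for_upload(issue, issue / "upload.zip"))
```

Because the reference images are called ref003.png and the page images 003.png, flattening the three folders into one zip does not cause name clashes.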
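If you need to make the blank 002.png/txt pair yourself, a white 1-bit PNG can be written with only the standard library, since a PNG file is a fixed signature followed by length-prefixed chunks (IHDR, IDAT, IEND), each ending in a CRC-32. The default 600 by 900 size and the helper names are hypothetical; pick a size that matches the neighbouring proofing images:

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def blank_png_bytes(width: int, height: int) -> bytes:
    """A white 1-bit greyscale PNG of the given size."""
    # IHDR: width, height, bit depth 1, colour type 0 (greyscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    # Each scanline: filter byte 0, then 8 pixels per byte; 0xFF is all white.
    row = b"\x00" + b"\xff" * ((width + 7) // 8)
    idat = zlib.compress(row * height)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def write_blank_page(folder: Path, page: str = "002",
                     width: int = 600, height: int = 900) -> None:
    """Write <page>.png (white) and an empty <page>.txt into folder."""
    (folder / f"{page}.png").write_bytes(blank_png_bytes(width, height))
    (folder / f"{page}.txt").write_text("")
```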