- Quick Solution if you need to copy & select Text:
- Emergency Save of Multi-Language Text:
- Quick Solution if you don’t need to copy and Paste Text:
- Quick Solution if your PDF is in only One Language:
I created a PDF from a website, my own personal blog, as a “just in case” backup. I used Adobe Acrobat Pro’s web capture, set to stay on the same server and on the same path, and to go several levels deep. I ended up with a PDF file that was around 5GB in size.
The service I had used for my blog for many years recently shut down without much warning, giving “nobody uses it anymore” as the only reason. Sadly, that seems to be true of most of the internet that isn’t social media. I’m guilty of not even turning on my computer for years, because social media was enough for my mindless thumb-twiddling. My attention span rapidly degraded to that of a 3-year-old, and who can blame us? The internet has largely become click-bait and advertising spam and not much else. The blog service said it would transfer everything, but it only transferred the last 10 or so posts, and everything else is now gone from existence (hopefully I have better luck on WordPress).
So I tried opening the PDF I had created some time ago, and I got “There was an error opening this document. The file is damaged and could not be repaired”.
This is 5GB of data, mind you: essentially just text, photos, and hyperlinks. How is it possible for Adobe to create a program that cannot give you access to any of that data, simply because something is wrong with how it handles files?
I downloaded the free SumatraPDF, which has a portable x64 version, which is awesome. It took several minutes, but it was able to open the PDF that the expensive Adobe Acrobat could not. Adobe was content to let 5GB of data die forever rather than build in emergency retrieval functionality for our documents.
So I could at least read, copy, paste, and follow hyperlinks within the PDF, which was really amazing. It didn’t have any images, but I think I might have disabled images when originally saving as PDF, to reduce the file size.
I tried saving a new copy of the PDF to a different location from within Sumatra, and Adobe Acrobat still wouldn’t open it. It gave me initial hope with “File is damaged but is being repaired”, but after a few minutes it threw the original error again.
I tried MS Word, and it simply said the PDF was too big for it to handle.
This suggests that the problem lies in how Acrobat handles its documents, and that it is willing to go so far as trashing your data forever rather than adapt to the environment. It’s probably
I tried going into RegEdit and adding a new key named “AVGeneral” at “HKLM\Software\Adobe\(product name)\(version)”, then adding a new DWORD named “bValidateBytesBeforeHeader” with a value of “00000000”, in the hope that the problem was Adobe validating the bytes before the header, but I got exactly the same outcome as before.
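Before touching the registry, it may be worth checking whether the file actually has anything in front of its header. The following is a minimal sketch, not an Adobe tool: it assumes the usual convention that a PDF begins with the marker `%PDF-` and ends with `%%EOF`, and that a reader looks for the header within roughly the first 1024 bytes. The function names and the 1024-byte window are my own choices for illustration.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"   # marker that normally opens a PDF file
EOF_MARKER = b"%%EOF"  # marker that normally closes a PDF file


def header_offset(data: bytes, window: int = 1024) -> int:
    """Return the offset of the %PDF- marker within the first `window`
    bytes of `data`, or -1 if it is not found there.
    An offset of 0 means there are no bytes before the header."""
    return data[:window].find(PDF_MAGIC)


def has_eof_marker(tail: bytes) -> bool:
    """Return True if the %%EOF marker appears in the given tail bytes."""
    return EOF_MARKER in tail


def inspect_pdf(path: str, window: int = 1024) -> dict:
    """Read only the first and last `window` bytes of a (possibly huge)
    file, so a 5GB PDF is never loaded into memory."""
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        head = fh.read(window)
        fh.seek(max(0, size - window))
        tail = fh.read()
    return {
        "size": size,
        "header_offset": header_offset(head, window),
        "has_eof": has_eof_marker(tail),
    }
```

If `header_offset` comes back as 0 and `has_eof` is true, junk before the header is probably not the cause, which would match the registry change making no difference for me. A missing `%%EOF` would instead point to a truncated file.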
I tried AnyBizSoft PDF to Word, and that program said it couldn’t read the PDF that Sumatra had saved at all.
I tried ‘Haihaisoft PDF Reader’, and it was able to open the original file. When I went to Print to PDF, it worked at first but failed after exactly 100 pages with a C++ runtime library error about the application asking to be terminated ‘in an unusual way’, and then the program closed. That points to limitations in its design or in the method it uses.
So the next idea was to go back into Sumatra and Print to PDF, which, judging by how it looked, uses a completely different method. I turned off all compression and resizing and hit print. It failed after 252 pages.
So I set it to print the PDF at the smallest size, and selected ‘Resize pages, if needed’. It completed successfully after about 4 hours and brought the size down to 86MB. The result opened in Acrobat with all pages intact and was very easy to scroll through quickly. At this point the text was not selectable, so I performed a cleartext OCR within Acrobat, which took around 2.5 hours. The result was really bad, and I could barely even read the broken-up font. I think that’s because the OCR was run on the smallest-size output. And as the blog has a tremendous amount of Chinese co-mingled with English, this didn’t really work.
The original PDF, opened in Sumatra, allowed text selection. The main reason for wanting to get it into Acrobat is to make sure it’s fully “fixed” to work in Adobe, to split it into a number of smaller files, and maybe to optimize it a bit and reduce the file size so it can open faster.
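For the splitting step, it helps to work out the page ranges up front, whichever tool ends up doing the actual split. This is only a sketch of the arithmetic: the function name and the choice of 200 pages per file are assumptions of mine, and the 1600-page total is the approximate page count of my PDF.

```python
def chunk_page_ranges(total_pages: int, pages_per_file: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges that cover
    pages 1..total_pages in chunks of at most `pages_per_file` pages."""
    if total_pages < 0 or pages_per_file < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_file >= 1")
    return [
        (start, min(start + pages_per_file - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_file)
    ]


if __name__ == "__main__":
    # Roughly 1600 pages split into files of 200 pages each (assumed size).
    for first, last in chunk_page_ranges(1600, 200):
        print(f"pages {first}-{last}")
```

Each range can then be entered into the page-range field of a print dialog or a split tool, which also keeps every job well below the page counts where the earlier print attempts failed.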
So I tried printing to PDF again from SumatraPDF, using the “Standard” settings modified to not embed fonts and to resize pages if needed. The print conversion went 3 or 4 times faster. The final PDF was not editable, and its text was not selectable.
I opened that PDF in Acrobat and saved it as a Word document, which OCR’d every page and took a few more hours. The DOC was only about 16MB, but it was quite slow for Word to load and process all the pages. Only the English was converted, and the Chinese was just jumbled. So that didn’t really work for me, but at least the English was formatted.
So I opened the Sumatra-saved original in SumatraPDF again, where the text is selectable. I just hit Ctrl+A > Ctrl+C and pasted it into a new Word document. It pasted very quickly, and it saved all the Chinese and English text perfectly, albeit as unformatted text. At least it wasn’t one large block of text: it’s all in individual lines, and very readable! It was only around 1.2MB. So that’ll work just fine for me to rescue all of my data.
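If you would rather keep the rescued text in a plain text file than in Word, the thing to watch is the encoding, since the Chinese only survives in a Unicode encoding such as UTF-8. Here is a small sketch of how that could be checked and saved. The helper names are made up for illustration, and the CJK check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption about what the text contains.

```python
def count_cjk(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def survives_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be stored in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def save_text_utf8(text: str, path: str) -> None:
    """Write the text as UTF-8, leaving line endings exactly as pasted."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)


if __name__ == "__main__":
    pasted = "Hello 你好 world\nsecond line"
    print(count_cjk(pasted), "Chinese characters found")
    print("fits in cp1252:", survives_encoding(pasted, "cp1252"))
    print("fits in utf-8:", survives_encoding(pasted, "utf-8"))
```

A count of zero Chinese characters after pasting would mean the text got jumbled somewhere along the way, the same symptom I saw with the OCR’d Word export.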
The PDF has around 1600 pages, most of it things I found quite useful during my time living in 3rd world countries with strict blocking, including blocking of WordPress sites. I’ll hopefully get to save some of that and set it up here. With my luck I’ll be back in a country which blocks WordPress and lose access to it all again.
Quite a bit of it was dedicated to Windows XP, which I used until Windows 7 came out. And now that Windows 10 is out, still nothing beats Windows 7, by far. I went from Windows 95 to NT 4.0, then XP in 2001, then Windows 7 in 2009. That’s 8 years on XP, and even longer on 7. XP seemed like forever because it sucked. Windows 7 is awesome. Windows 10 is a troll, established in partnership with the NSA no doubt, and it really takes the experience of computing away and turns it into something like a Mac, where you don’t really know what’s going on, but you don’t really care, because it’s pretty and you want to push pretty buttons and see what happens.
Anyway, I have a Windows 10 laptop, and only because the BIOS switch and drive formatting make switching from 10 to 7 almost irreversible, and a major PITA. So all of the Windows 7-related stuff is still very useful. Most of it also applies to 10; Windows 7 was just the newest version when the posts were written.