General Question


How do you remove formatting from PDF files?

Asked by Questionsaboutstuff (233 points ) April 20th, 2014

I want an easy way to remove all the page numbers and the title at the top of each page.

When a word is split on two lines such as

es-
pecially

How do I join that back into one word throughout a whole document?


4 Answers

Lightlyseared's avatar

It depends on what type of PDF it is. If it’s a PDF made from scanned images of a book or journal, then you’re stuck with what you’ve got. The PDF in this case is basically a stack of images. If the PDF was created from a text document like a Word file, then you can edit the text with Acrobat. You can download a trial from Adobe’s website. Otherwise you would probably have to convert it to another type of document (for example a Word file), edit it, and then convert it back.

jaytkay's avatar

As @Lightlyseared wrote, the results will vary depending upon the content of the PDF.

1)
The first thing to try is selecting all the text and pasting it into your word processor or text program.
Press Control and “A” (Command and “A” on a Mac) to select all text
Copy
Paste

2)
Amit Agarwal has a lot of advice here in his Digital Inspiration blog:
How to Edit PDF Files without Adobe Acrobat
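Once the text has been pasted into a plain text file, the two cleanups asked about in the question can be scripted. Here is a rough Python sketch, not a definitive tool. The function names are made up for illustration, and it assumes that page numbers sit alone on their own line and that a split word always continues with a lowercase letter on the next line.

```python
import re

def join_hyphenated(text: str) -> str:
    """Rejoin words split across a line break, e.g. 'es-\\npecially' -> 'especially'."""
    # A letter, a hyphen, a newline, then a lowercase letter:
    # drop the hyphen and the newline so the two halves become one word.
    return re.sub(r"([A-Za-z])-\n([a-z])", r"\1\2", text)

def drop_page_numbers(text: str) -> str:
    """Remove lines that hold nothing but digits (assumed to be page numbers)."""
    kept = [
        line for line in text.split("\n")
        if not re.fullmatch(r"\s*\d+\s*", line)
    ]
    return "\n".join(kept)

sample = "This is es-\npecially useful.\n12\nNext page starts here."
cleaned = drop_page_numbers(join_hyphenated(sample))
print(cleaned)
```

One thing to watch: a genuine compound that happens to break at its hyphen (“well-” / “known”) will also be merged into one word, so skim the result afterwards.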

dappled_leaves's avatar

If you have Adobe Acrobat, then yes – it first depends on whether the pages of text are “images” (if you click somewhere on the page, does the whole page get selected?) or whether words are recognized as words.

If they are recognized as words, go to Tools > Advanced Editing > TouchUp Text Tool. With this tool, clicking on the text you want to edit should (after a pause) show you the boundaries of the block of text and let you edit it.

If words are not recognized, and each page is an image, then you’ll first have to go to Document > OCR Text Recognition > Recognize Text Using OCR. This will turn your image into recognizable, searchable words. It doesn’t always work perfectly, particularly if you have a bad scan or if there are actual images embedded in your page. After the OCR process, you should be able to edit the text as I described above.

A warning, though – editing in Acrobat is a pain in the ass, and you should budget time to experiment a bit and see what effects your changes have on the page as you go. And save a backup copy of the original in case you screw it up badly.

But the only way you could be “stuck with what you’ve got,” as @Lightlyseared said above, is if the file is locked for editing. Otherwise, you should certainly be able to do something with it, particularly if you are only making minor changes.

CWOTUS's avatar

Even in the case of .PDF files made up of scanned photocopies, you can use OCR (Optical Character Recognition) software that will – depending on the quality of the .PDF! – recognize most characters and convert them to text, which you can then process in any word processing software – or even Notepad or another basic text editor.

OCR isn’t fast and it’s far from perfect (I’ve used FreeOCR from time to time for a few years, with varying results), but it’s a good way to get around 80–90% of the text, leaving you some work to read over and fix the glitches, errors and formatting that you wish to correct.
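For the title repeated at the top of every page, part of that cleanup can be automated once the OCR text is in a file. Below is a minimal Python sketch under some loud assumptions: the pages are separated by a form feed character ("\f"), which some converters emit but yours may not, and the title is the first line of each page. The function name and the `min_share` threshold are hypothetical, picked just for this example.

```python
from collections import Counter

def drop_running_headers(text: str, min_share: float = 0.5) -> str:
    """Remove a first line that repeats at the top of many pages."""
    pages = text.split("\f")  # assumption: form feed marks a page break

    # Count how often each first line appears across non-empty pages.
    first_lines = Counter(
        page.lstrip("\n").split("\n", 1)[0].strip()
        for page in pages
        if page.strip()
    )
    total = sum(first_lines.values())

    # Treat a line as a header if it opens more than one page
    # and at least min_share of all pages.
    headers = {
        line for line, count in first_lines.items()
        if line and count > 1 and count / total >= min_share
    }

    cleaned = []
    for page in pages:
        lines = page.lstrip("\n").split("\n")
        if lines[0].strip() in headers:
            lines = lines[1:]  # drop the repeated title line
        cleaned.append("\n".join(lines))
    return "\f".join(cleaned)

sample = "My Book\nFirst page text.\fMy Book\nSecond page text.\fMy Book\nThird page text."
result = drop_running_headers(sample)
print(result.replace("\f", "\n"))
```

OCR glitches can make the same title come out slightly different from page to page, in which case exact matching like this will miss some copies and you would still fix those by hand.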
