Do you know of software that can generate .mht, .pdf, or similar single-file documents from the many .html and image files I've saved on my hard disk?

Asked by mdy (1152 points) May 27th, 2008

Over the years, I’ve saved a lot of web pages on my hard disk on various topics. Although they’re neatly organized by topic in subdirectories, a lot of these web pages were not saved in .mht (Microsoft’s web archive) or .pdf format.

Consequently, not only do I have a lot of .html files on my hard drive… I’ve also got a lot of subdirectories that contain the related images for each of these .html files.

I’m therefore hoping to find software that will:

1. Accept a directory on my hard disk as input parameter

2. Go through all the .html files in that directory and locate the related child subdirectory that contains the related images and stylesheets of each .html file

3. For each .html file, generate or produce a single .mht (or .pdf, .rtf, or .doc) file that contains the .html and all the related images and stylesheets in the relevant child directories, packaged neatly as a single, searchable file.

I’ll be happy to look at both free and non-free software.

PS. Whew! Being a packrat is tough work.

PPS. Google searches have so far only yielded software that can create .mht files from web pages that are on the ‘net, rather than web pages that are already sitting on my hard disk. Other file-merging solutions talk about merging multiple .html files together, which isn’t what I’m looking for. And still others require user intervention with each file. I’m hoping for something that’s batch-oriented and can work on all the .html files in an entire directory in one go.

Counter-intuitive maybe, but Internet Explorer will do that for you. Try:

– pointing IE at one of the pages you want to convert to .mht,
– selecting File > Save As… (or Page > Save As… depending on how you’ve rigged your toolbars),
– choosing “Web archive, single file (*.mht)” as the type.

You’ll have to manually repeat those steps for every major page. (Unless someone can recommend a Windows-based equivalent to Mac’s Automator utility?)
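
If a script is an option, the batch job described in the question (take a directory, find each .html file, pull in its image/stylesheet folder, write one .mht per page) can be roughly sketched in Python with the standard library's email package, since an .mht file is a MIME multipart/related message. This is only a sketch under assumptions that are mine, not something confirmed in this thread: the resources sit in a sibling folder named "<page>_files", the pages are readable as UTF-8, and BASE_URL is a made-up placeholder. The function names are hypothetical, and I haven't checked how well IE or other viewers open the result.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from pathlib import Path
from urllib.parse import quote

# Hypothetical base URL; it only gives every part an absolute Content-Location.
BASE_URL = "file:///saved-pages/"
HTML_SUFFIXES = {".html", ".htm"}


def resource_dir_for(html_path):
    """Return the sibling '<name>_files' folder, or None (the naming is an assumption)."""
    candidate = html_path.with_name(html_path.stem + "_files")
    return candidate if candidate.is_dir() else None


def build_mht(html_path):
    """Pack one saved page and its resource folder into MHT (MIME) bytes."""
    folder = html_path.parent
    archive = MIMEMultipart("related", type="text/html")

    # Root part: the page itself (assumed UTF-8; undecodable bytes are replaced).
    html = html_path.read_text(encoding="utf-8", errors="replace")
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = BASE_URL + quote(html_path.name)
    archive.attach(root)

    # One base64-encoded part per image, stylesheet, script, and so on.
    resources = resource_dir_for(html_path)
    if resources is not None:
        for path in sorted(p for p in resources.rglob("*") if p.is_file()):
            ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
            maintype, subtype = ctype.split("/", 1)
            part = MIMEBase(maintype, subtype)
            part.set_payload(path.read_bytes())
            encoders.encode_base64(part)
            relative = path.relative_to(folder).as_posix()
            part["Content-Location"] = BASE_URL + quote(relative)
            archive.attach(part)
    return archive.as_bytes()


def convert_directory(root_dir):
    """Write a .mht next to every .html/.htm file under root_dir; return the new paths."""
    root_dir = Path(root_dir)
    written = []
    for html_path in sorted(root_dir.rglob("*")):
        if not html_path.is_file() or html_path.suffix.lower() not in HTML_SUFFIXES:
            continue
        # Skip pages stored inside a resource folder (e.g. saved iframes).
        parents = html_path.relative_to(root_dir).parts[:-1]
        if any(name.endswith("_files") for name in parents):
            continue
        out_path = html_path.with_suffix(".mht")
        out_path.write_bytes(build_mht(html_path))
        written.append(out_path)
    return written
```

Calling convert_directory() with the top-level folder of the saved pages would then walk every subdirectory in one go and leave the original files untouched. Pages whose resources live somewhere other than a "_files" folder would come out with only the HTML part, so resource_dir_for() is the piece to adapt.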

Thanks, robmandu!

Yes, doing this via IE had occurred to me, but the thought of doing that for several hundred web pages fills me with dismay. My fault for not saving them properly in the first place.

I feel ya… good luck!

I googled on HTML2PDF and found this one:
I don’t know if it does the trick or whether you’ve already tried it, but I’d be curious to know if it works for you.

