I'm trying to use WinHTTrack to copy WOTC's 3.x archives from the Wayback Machine. It's the first time I've used the program and I'm running
into some problems. Hopefully, you guys have the answers.
My starting web address is
http://archive.wizards.com/default.asp?x=dnd/arch/dnd
Under Settings->Scan Rules, I have
-ad.doubleclick.net/*
+archive.wizards.com/dnd/files/*.zip
+archive.wizards.com/dnd/files/*.pdf
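
To show how I read the scan rules above, here is a rough Python sketch. It only approximates HTTrack's matching: it assumes `*` matches any run of characters, that the scheme is ignored, and that the last matching rule wins. The file name `example.zip` is made up for illustration.

```python
from fnmatch import fnmatchcase

# Scan rules as entered under Settings->Scan Rules.
RULES = [
    "-ad.doubleclick.net/*",
    "+archive.wizards.com/dnd/files/*.zip",
    "+archive.wizards.com/dnd/files/*.pdf",
]


def strip_scheme(url):
    """Drop a leading http:// or https:// so the URL lines up with the rule patterns."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def decide(url, rules=RULES):
    """Return '+' or '-' for the last matching rule, or None if no rule matches.

    Assumption: later rules take priority over earlier ones.
    """
    target = strip_scheme(url)
    verdict = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = sign
    return verdict


# "example.zip" is a hypothetical file name, not one taken from the site.
print(decide("http://archive.wizards.com/dnd/files/example.zip"))
print(decide("http://archive.wizards.com/default.asp?x=dnd/arch/cwc"))
```

Under these assumptions the zip URL is explicitly included, while the `default.asp` pages match none of the rules, so whatever HTTrack does with them comes from its default behaviour rather than from my rules.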
Under Settings->Spider, I set
"no robots.txt rules"
Under Settings->Expert Only, I set
"Rewrite links: internal / external" to "Relative URI / Absolute URL (default)"
My problem seems to be that I'm not picking up all the files. For example, the page
http://archive.wizards.com/default.asp?x=dnd/arch/cwc is downloaded correctly (saved to file:///C:/My%20Web%20Sites/WOTC%20Archives%20-%20Old/archive.wizards.com/default689f.html?x=dnd/arch/cwc).
But the link to the zip file for "Stealthy Rascals 4" (the link is on that page under the small p) should point to a file on my computer, and instead
it still points to the file on the Wayback Machine. HTTrack does actually download the zip file, but it doesn't rewrite the link to point to it. And for all I
know, it could be downloading the pdf because of something on another page.
What do I need to set to make my local copy point to the file on my machine? Thanks.
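
To find out how many links are affected, something like the following Python sketch could list the zip and pdf links in a saved page that still use an absolute http(s) URL. It is only a diagnostic idea, not anything HTTrack provides; the sample HTML and the file name `example.zip` are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def remote_file_links(html, extensions=(".zip", ".pdf")):
    """Return hrefs that are absolute http(s) URLs ending in one of the extensions."""
    parser = LinkCollector()
    parser.feed(html)
    remote = []
    for href in parser.hrefs:
        parts = urlparse(href)
        if parts.scheme in ("http", "https") and parts.path.lower().endswith(extensions):
            remote.append(href)
    return remote


# Hypothetical page content: one link left absolute, one rewritten to a relative path.
sample = """
<a href="http://archive.wizards.com/dnd/files/example.zip">remote</a>
<a href="dnd/files/example.zip">local</a>
"""
print(remote_file_links(sample))
```

To run it against a real saved page, read the file's text (for example the saved `default689f.html`) and pass it to `remote_file_links` in place of `sample`.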
While I'm asking, is there a way to see what's in robots.txt?
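
As far as I understand, robots.txt conventionally sits at the root of the host, so one way to look at it outside of HTTrack might be the Python sketch below. The helper names `robots_url` and `fetch_robots` are my own, and I'm assuming the server still serves the file at that location.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Build the conventional robots.txt address for the host of page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url, timeout=10):
    """Download robots.txt for the host of page_url and return it as text.

    Raises an error from urllib if the file is missing or the host is unreachable.
    """
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Only builds the address; call fetch_robots(...) to actually download the file.
print(robots_url("http://archive.wizards.com/default.asp?x=dnd/arch/dnd"))
```

Opening the printed address in a browser should show the same content without running any code.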