Author Topic: WinHTTrack  (Read 2037 times)

Offline kitep

  • DnD Handbook Writer
  • ****
  • Posts: 1947
  • Lookout World!
WinHTTrack
« on: February 11, 2016, 10:28:52 AM »
I'm trying to use WinHTTrack to copy WOTC's 3.x archives from the Wayback Machine.  It's the first time I've used the program and I'm running into some problems.  Hopefully, you guys have the answers.

I use http://archive.wizards.com/default.asp?x=dnd/arch/dnd as the starting web address.

Under Settings->Scan Rules, I have
-ad.doubleclick.net/*
+archive.wizards.com/dnd/files/*.zip
+archive.wizards.com/dnd/files/*.pdf

Under Settings->Spider, I set
"no robots.txt rules"

Under Settings->Expert Only, I set
"Rewrite links: internal / external" to "Relative URI / Absolute URL (default)"
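To make the scan rules above easier to reason about, here is a rough Python sketch of how "+" and "-" wildcard filters could be evaluated against a URL. It is a simplified model, not HTTrack's actual code: it assumes "*" matches any run of characters, that the scheme is ignored, and that the last matching rule wins. The function names and the example file name in the usage note are made up for illustration.

```python
from fnmatch import fnmatchcase

# Rules copied from the post. The matching logic below is a simplified
# model of wildcard filters, NOT HTTrack's real implementation.
SCAN_RULES = [
    "-ad.doubleclick.net/*",
    "+archive.wizards.com/dnd/files/*.zip",
    "+archive.wizards.com/dnd/files/*.pdf",
]


def strip_scheme(url):
    """Drop a leading http:// or https:// so URLs compare against scheme-less rules."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def decide(url, rules=SCAN_RULES):
    """Return True (include), False (exclude), or None (no rule matched).

    Assumption: rules are checked in order and the last match wins.
    """
    verdict = None
    target = strip_scheme(url)
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = (sign == "+")
    return verdict
```

For example, under these assumptions decide("http://archive.wizards.com/dnd/files/example.zip") gives True, while the starting page itself matches no rule and gives None, meaning it would fall back to whatever the default crawl behaviour is.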

My problem seems to be that the local links aren't picking up all the files.  For example, take the page
http://archive.wizards.com/default.asp?x=dnd/arch/cwc
I get the page correctly (saved to file:///C:/My%20Web%20Sites/WOTC%20Archives%20-%20Old/archive.wizards.com/default689f.html?x=dnd/arch/cwc)
But the link to the zip file for "Stealthy Rascals 4" (link is on the page under the small p) should link to a file on my computer, but instead links to the file on the wayback machine.  It does actually download the zip file, but doesn't change the link to point to it.  And for all I know, it could be downloading the file because of something on another page.

What do I need to set to make my local copy point to the file on my machine?  Thanks.
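One way to check which links in a saved page were not rewritten is to scan the HTML for anchors that still use an absolute http(s) address and end in .zip or .pdf. The snippet below is only a sketch of that check using Python's standard library. The class and function names are invented for this example, and it knows nothing about how WinHTTrack itself decides to rewrite links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def remote_file_links(html, extensions=(".zip", ".pdf")):
    """Return hrefs that are still absolute http(s) URLs to .zip/.pdf files.

    Relative links (the rewritten, local ones) have no scheme and are skipped.
    """
    parser = LinkCollector()
    parser.feed(html)
    remote = []
    for href in parser.hrefs:
        parts = urlparse(href)
        if parts.scheme in ("http", "https") and parts.path.lower().endswith(extensions):
            remote.append(href)
    return remote
```

To use it, read the saved page into a string (for instance with open(path, encoding="utf-8", errors="replace").read()) and pass that to remote_file_links. Anything it returns is a download link that still points at the remote site rather than the local copy.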

While I'm asking, is there a way to see what's in robots.txt?


Offline Amechra

  • Epic Member
  • ****
  • Posts: 4560
  • Thread Necromancy a specialty
Re: WinHTTrack
« Reply #1 on: February 11, 2016, 12:28:58 PM »
Generally, you append /robots.txt to the site's root address, like so:

http://archive.wizards.com/robots.txt
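The same idea in script form, as a hedged sketch: Python's standard library can build the robots.txt address from any page URL and interpret the rules once you have the file's text. The SAMPLE_ROBOTS text below is invented for illustration and is not the real contents of the archive's robots.txt. The helper names are made up too.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    return urljoin(page_url, "/robots.txt")


# Hypothetical robots.txt text, used only to show how the rules are read.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_text, url, agent="*"):
    """Report whether `agent` may fetch `url` according to robots_text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)
```

Calling robots_url("http://archive.wizards.com/default.asp?x=dnd/arch/dnd") gives http://archive.wizards.com/robots.txt, matching the address above. To inspect the real file, open that address in a browser or download it and pass its text to allowed().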

Sadly, I'm more familiar with wget than WinHTTrack, so I can't help you there.

Offline kitep

Re: WinHTTrack
« Reply #2 on: February 11, 2016, 01:57:33 PM »
Thanks, that worked perfectly for showing robots.txt.

Offline kitep

Re: WinHTTrack
« Reply #3 on: February 18, 2016, 05:17:01 AM »
AAARRRGGHHHH!!!!!!!

Turns out my shortcut was pointing to an old download of the website.  I don't know for sure if the settings I posted above worked just fine all along, or if I fixed the problem while tweaking things later, so I can no longer say that the settings above don't work.

Offline altpersona

  • Legendary Member
  • ****
  • Posts: 2000
  • #78
Re: WinHTTrack
« Reply #4 on: February 18, 2016, 10:54:02 AM »
I typically use wget to download a site.

The problem is that last time it ran out of resources after about 20 GB :/