A Download Manager in MSH

Wed, Aug 3, 2005

I recently stumbled upon this blog entry that expanded on a piece I wrote a few days ago: Command Line Shortcut for Repetitive Operations. (Hankatsu?)’s entry is in Japanese, so I don’t know what it says. In fact, for all I know, he or she could be making fun of me. In any case, the code included with the blog entry shows a quick way to download sequentially numbered files from the internet – such as File001.jpg, File002.jpg, etc. That’s a great use of the technique, and we can improve it even further with a useful script that acts as a download manager.

This was one of the first Monad scripts I wrote (about 2.5 years ago), and I’ve faithfully ported it through every one of the many breaking changes that have happened since then :) It originally relied heavily on the Windows port of wget, but I was finally able to remove that a few weeks ago when I noticed that the .NET Framework now supports the WebClient.DownloadFile() method.

It’s one of my most heavily used scripts – it’s not very complex, but sure is useful.

## download-queue.msh
## Acts as a download manager, to download batches of files.
##
## 1) Create a directory, and place "download-queue.msh" in it.
## 2) Create a subdirectory, called "Queue"
## 3) Inside the "Queue" directory, place .txt files that contain only URLs in them.
##
## Download-queue.msh will use the name of the text file to create a new subdirectory.
## It will place the downloaded files inside that subdirectory.

## Ensure the System.Net and System.Web DLLs are loaded
[void] [Reflection.Assembly]::LoadWithPartialName("System.Net")
[void] [Reflection.Assembly]::LoadWithPartialName("System.Web")

## Keep on processing the queue directory, while there are batches
## remaining. @() forces an array, so that Count is the number of
## queue files even when only one remains.
while(@(get-childitem Queue\*.txt).Count -gt 0)
{
 ## Get all of the .txt files in the queue directory
 foreach($file in $(get-childitem Queue\*.txt))
 {
  write-host "Processing: $file"

  ## Create a directory, based on the filename (minus extension)
  ## of the text file
  $name = $file.Name.Replace(".txt", "")
  $null = new-item -name $name -type Directory
  set-location $name

  ## Download each item in the file
  foreach($url in (get-content $file))
  {
   ## Strip the filename out of the URL
   if($url -match ".*/(?<file>.*)")
   {
    $filename = $matches["file"]
    $filename = combine-path "$(get-location)" "$([System.Web.HttpUtility]::UrlDecode($filename))"

    write-host " Downloading: $url"
    $webClient = new-object System.Net.WebClient
    $webClient.DownloadFile($url, $filename)
   }
   else
   {
    write-host "$url is not a valid URI."
   }
  }

  ## Move the file list into the directory, also
  move-item (combine-path "..\Queue" ($file.Name)) .

  set-location ..
 }
}
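
The filename step in the script is easy to try on its own. The sketch below is written in Windows PowerShell syntax (the later name for Monad) and uses a made-up URL on example.com; the variable $decoded is only there for illustration. Because the leading .* in the pattern is greedy, the named group captures everything after the last slash, and UrlDecode then turns escapes such as %20 back into ordinary characters.

```powershell
## Make sure System.Web (home of HttpUtility) is loaded
Add-Type -AssemblyName System.Web

## Hypothetical URL, used only to exercise the pattern
$url = "https://example.com/files/My%20Photo.jpg"

if($url -match ".*/(?<file>.*)")
{
    ## Greedy .* means the group holds the text after the LAST slash
    $decoded = [System.Web.HttpUtility]::UrlDecode($matches["file"])
    $decoded    ## My Photo.jpg
}
```

Note that the pattern keeps anything that follows the last slash, so a URL with a query string would carry that query string into the file name.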

For now, you’re on your own for generating the queue files. Right-clicking a link in your browser and choosing “Copy Shortcut” is a great way to get URLs. Batching them this way is many times faster than downloading each file individually.

Here it is in action:

MSH:297 C:\Temp >md Queue

    Directory: FileSystem::C:\Temp

Mode    LastWriteTime            Length Name
----    -------------            ------ ----
d----   Aug 02 21:10                    Queue

MSH:299 C:\Temp >echo "https://www.leeholmes.com/blog/images/rssButton.gif" > Queue\LeeHolmes.com.txt
MSH:300 C:\Temp >echo "https://www.leeholmes.com/blog/images/xmlCoffeeMug.gif" >> Queue\LeeHolmes.com.txt
MSH:301 C:\Temp >download-queue
Processing: C:\Temp\Queue\LeeHolmes.com.txt
 Downloading: https://www.leeholmes.com/blog/images/rssButton.gif
 Downloading: https://www.leeholmes.com/blog/images/xmlCoffeeMug.gif
MSH:302 C:\Temp >dir LeeHolmes.com

    Directory: FileSystem::C:\Temp\LeeHolmes.com

Mode    LastWriteTime            Length Name
----    -------------            ------ ----
-a---   Aug 02 21:12                107 LeeHolmes.com.txt
-a---   Aug 02 21:12               1025 rssButton.gif
-a---   Aug 02 21:12               1486 xmlCoffeeMug.gif
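
For sequentially numbered files like the File001.jpg, File002.jpg case mentioned at the top of this post, the queue file doesn't have to be typed by hand. Here is a minimal sketch in Windows PowerShell syntax (the later name for Monad). The base URL, the File prefix, the 1..5 range, and the Example.txt name are all placeholders, not anything from a real site.

```powershell
## Hypothetical values: swap in the real base URL and number range
$baseUrl = "https://example.com/images"

## D3 pads each number to three digits: File001.jpg, File002.jpg, ...
$urls = 1..5 | ForEach-Object { "{0}/File{1:D3}.jpg" -f $baseUrl, $_ }

## Make sure the Queue directory exists, then write one URL per line.
## download-queue names the output directory after the text file ("Example").
$null = New-Item -ItemType Directory -Path Queue -Force
$urls | Set-Content (Join-Path Queue Example.txt)
```

Running download-queue afterwards should pick up Queue\Example.txt like any other batch.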

Stay tuned – in the near future, I’ll write a post that shows how to parse all of the URLs out of a web page.

[Edit: I’ve updated the script, to make it a little less sensitive to URLs with funky characters.]
[Edit: I’ve now posted my link parser script, so you don’t have to generate these files manually anymore.]

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]