More advanced HTTP scripting: Facebook Photo Album Downloader

Fri, Sep 4, 2009 4-minute read

I wanted to download a photo album from Facebook, but of course there’s no simple API or option to do that. Searching the internet turns up a couple of options (a Firefox plugin, a Facebook app), but I couldn’t seem to find anything standalone.

This gives another great opportunity to talk about advanced HTTP scripting in PowerShell. We’ve talked about it in the past (here: http://www.leeholmes.com/blog/AdvancedHTTPASPNetScriptingWithPowerShell.aspx), so this script introduces some new techniques.

There are a couple of challenges for scripting Facebook, especially using the techniques in the earlier post.

The first is that the login sequence is secure :) The login page gets served over SSL, meaning that the basic Send-TcpRequest commands we used previously won’t work. SSL requires encryption and a complex handshake. Luckily, the System.Net.WebClient class provides all of that for us.
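
For example, a single secure page download with WebClient looks something like this; the class negotiates the handshake and encryption behind the scenes:

$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")

## WebClient takes care of the SSL negotiation for us
$loginPage = $wc.DownloadString("https://login.facebook.com/login.php")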

The second issue is that the cookies are dynamic, and given out in two stages. You need to first connect to the main Facebook homepage, at which point you get a couple of session cookies. Then, you connect to the login page, at which point you get some login cookies. Once you have THOSE, you can script the rest of Facebook.

Here is Get-FacebookCookie.ps1:

## Get-FacebookCookie.ps1
## Logs into Facebook, returning the cookie required for further operations
param($Credential)

$Credential = Get-Credential $Credential

## Get initial cookies
$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;)")

$result = $wc.DownloadString("http://www.facebook.com/")
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '

## Log in: first extract the plain-text password from the credential's SecureString
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($credential.Password)
$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($bstr)
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)

$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;)")
$wc.Headers.Add("Cookie", $cookie)
$postValues = New-Object System.Collections.Specialized.NameValueCollection
$postValues.Add("email", $credential.Username)
$postValues.Add("pass", $password)

## Get the resulting cookie, and convert it into the form to be sent back in the Cookie header
$result = $wc.UploadValues("https://login.facebook.com/login.php?login_attempt=1", $postValues)
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '
$cookie
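
To use it, run the script and capture its output (the account name below is just a placeholder):

$cookie = .\Get-FacebookCookie.ps1 -Credential you@example.com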

The most complex part of that script comes from the Set-Cookie headers returned by the server. Information in that header limits the domain of the cookie, expiration dates, and more. Browsers use this information to determine cookie policies, but we want to just blindly feed it back to Facebook. The little match, replace, and join combination converts the Set-Cookie syntax into one suitable to return to the server.
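
For example, a combined Set-Cookie header might look something like the made-up value below, and the match / replace / join one-liner boils it down to just the name=value pairs:

$setCookie = 'sessionid=abc123; expires=Sun, 04-Sep-2011 20:00:00 GMT; path=/; domain=.facebook.com,loginid=xyz789; path=/; domain=.facebook.com'
($setCookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '

## Output: sessionid=abc123; loginid=xyz789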

Once we’ve logged into Facebook, we download the album page, cycling through successive pages until we stop finding photos to download. Since you can derive the URL to the large size images from the thumbnails, we don’t need to do an extra request for the photo information page. If you wanted to extract photo comments as well, then you would have to.
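
As a made-up illustration of that rewriting (the fbcdn path here is invented, not a real photo):

$thumbnail = "http://photos-a.ak.fbcdn.net/photos-ak-snc1/v1234/56/78/1000012345/s1000012345_98765_4321.jpg"
$fullSize = $thumbnail -replace '/s','/n' -replace '_s.jpg','_n.jpg'

## $fullSize now points at .../n1000012345_98765_4321.jpg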

Here is Get-FacebookAlbum.ps1:

## Get-FacebookAlbum.ps1
## Downloads the images attached to a Facebook photo album
param($Album = $(Read-Host "Initial album URL (for example, http://www.facebook.com/album.php?aid=12345&id=12345&ref=nf)"),
    $Prefix,
    $Cookie
    )
   
$albumId = $album -replace '.*id=(\d+).*','$1'
if(-not $Cookie)
{
    $cookie = .\Get-FacebookCookie.ps1
}

$pageNumber = 1
do
{
    ## Go through each page in the album. Extract the thumbnail images (which have a pattern
    ## of /s______.jpg or ______s.jpg)
    $foundPhotos = $false
    "Getting album $album, page $pageNumber"

    $wc = New-Object System.Net.WebClient
    $wc.Headers.Add("User-Agent", "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;)")
    $wc.Headers.Add("Cookie", $cookie)
    $result = $wc.DownloadString($album + "&page=$pageNumber")

    ## Regex that extracts the src attribute from each img tag
    $regex = "<\s*img\s*[^>]*?src\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
   
    $photos = ($result | Select-String $regex -AllMatches).Matches | % { $_.Groups[1].Value }
    $photos = $photos -match ("_" + $albumId + "_")

    ## Convert each thumbnail URL into its full-size equivalent
    $photos = $photos -replace '/s','/n'
    $photos = $photos -replace '_s.jpg','_n.jpg'
   
    "Found photos:"
    $photos
   
    $wc = New-Object System.Net.WebClient
   
    ## For each of the photos we found, download the large size
    foreach($photo in $photos | ? { $_.Trim() })
    {
        $foundPhotos = $true
        $uri = [Uri] $photo
        $dest = $uri.LocalPath.Replace("\", "_")
        $dest = $dest.Replace("/", "_")
        $dest = Join-Path $pwd "$Prefix$dest"
        if(-not (Test-Path $dest))
        {
            "Downloading $dest"
            $wc.DownloadFile($photo, $dest)
        }
    }
   
    $pageNumber++
} while($foundPhotos);

"(None)"

So in under 100 lines of code, we’ve got ourselves an image downloader.
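
Putting the two scripts together looks something like this (the album URL and prefix are placeholders):

$cookie = .\Get-FacebookCookie.ps1
.\Get-FacebookAlbum.ps1 -Album "http://www.facebook.com/album.php?aid=12345&id=12345&ref=nf" -Prefix "vacation_" -Cookie $cookie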