More advanced HTTP scripting: Facebook Photo Album Downloader

I wanted to download a photo album from Facebook, but of course there’s no simple API or option to do that. Searching the internet turns up a couple of options (a Firefox plugin, a Facebook app), but I couldn’t find anything standalone.

This gives another great opportunity to talk about advanced HTTP scripting in PowerShell. We’ve talked about it in the past (here: http://www.leeholmes.com/blog/AdvancedHTTPASPNetScriptingWithPowerShell.aspx), so this script introduces some new techniques.

There are a couple of challenges for scripting Facebook, especially using the techniques in the earlier post.

The first is that the login sequence is secure :) The login page gets served over SSL, meaning that the basic Send-TcpRequest commands we used previously won’t work. SSL requires encryption and a complex handshake. Luckily, the System.Net.WebClient class handles that for us.

The second issue is that the cookies are dynamic, and given out in two stages. You need to first connect to the main Facebook homepage, at which point you get a couple of session cookies. Then, you connect to the login page, at which point you get some login cookies. Once you have THOSE, you can script the rest of Facebook.

Here is Get-FacebookCookie.ps1:

## Get-FacebookCookie.ps1
## Logs into Facebook, returning the cookie required for further operations

param($Credential)

$Credential = Get-Credential $Credential

## Get initial cookies
$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")

$result = $wc.DownloadString("http://www.facebook.com/")
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '

## Login
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($credential.Password)
$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($bstr)
[System.Runtime.InteropServices.Marshal]::ZeroFreeBstr($bstr)

$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")
$wc.Headers.Add("Cookie", $cookie)
$postValues = New-Object System.Collections.Specialized.NameValueCollection
$postValues.Add("email", $credential.Username)
$postValues.Add("pass", $password)

## Get the resulting cookie, and convert it into the form to be returned in the Cookie header
$result = $wc.UploadValues("https://login.facebook.com/login.php?login_attempt=1", $postValues)
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '
$cookie

The most complex part of that script comes from the Set-Cookie headers returned by the server. Besides the cookie itself, that header carries attributes that limit the cookie’s domain, set its expiration date, and more. Browsers use this information to determine cookie policies, but we want to just blindly feed the cookies back to Facebook. The little match, replace, and join combination converts the “Set-Cookie” syntax into one suitable to return to the server.
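To see what that combination does, here is a minimal sketch that runs it against a made-up header value. The cookie names and values are hypothetical, and it assumes WebClient reports multiple Set-Cookie headers joined by commas, which is why the script splits on a comma in the first place.

```powershell
## Hypothetical Set-Cookie value (names and values are made up). Note that the
## expiration date contains a comma of its own.
$setCookie = 'datr=abc123; expires=Wed, 09-Jun-2010 10:18:14 GMT; path=/; domain=.facebook.com,' +
    'lsd=XyZ9; path=/; domain=.facebook.com'

## Split on commas. This also cuts the expiration date in half.
$parts = $setCookie.Split(',')

## Keep only the pieces that start with name=value; which drops the
## leftover half of the date (it starts with a space)
$cookiePairs = $parts -match '^\S+=\S+;'

## Strip the attributes (everything from the first semicolon on)
$nameValues = $cookiePairs -replace ';.*',''

## Join the remaining name=value pairs in the form a Cookie header expects
$cookie = $nameValues -join '; '
$cookie
## → datr=abc123; lsd=XyZ9
```

The -match step is what makes the naive comma split safe: any fragment that does not begin with a name=value pair is simply discarded.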

Once we’ve logged into Facebook, we download the album page, cycling through successive pages until we stop finding photos to download. Since you can derive the URL to the large size images from the thumbnails, we don’t need to do an extra request for the photo information page. If you wanted to extract photo comments as well, then you would have to.
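The thumbnail-to-large-image rewrite can be tried in isolation. The sketch below is only an illustration: the URLs are made-up examples on example.com, ConvertTo-LargePhotoUrl is a hypothetical helper name, and the patterns are anchored to the file name, which is a slightly tighter variant of the replacements used in the full script.

```powershell
## Sketch only: ConvertTo-LargePhotoUrl is a hypothetical helper, not part of
## the downloader script.
function ConvertTo-LargePhotoUrl
{
    param([string[]] $Url, [string] $AlbumId)

    ## Keep only the images whose name contains _<id>_
    $photos = $Url -match ("_" + $AlbumId + "_")

    ## Thumbnails are named either s<name>.jpg or <name>_s.jpg;
    ## the large versions use n in the same position
    $photos = $photos -replace '/s([^/]*\.jpg)$','/n$1'
    $photos = $photos -replace '_s\.jpg$','_n.jpg'
    $photos
}

## Made-up image URLs, as they might be extracted from an album page
$thumbnails = @(
    'http://photos.example.com/v1/s111_12345_987.jpg',
    'http://photos.example.com/v1/222_12345_987_s.jpg',
    'http://static.example.com/images/logo.gif'
)

$large = ConvertTo-LargePhotoUrl -Url $thumbnails -AlbumId 12345
$large
## → http://photos.example.com/v1/n111_12345_987.jpg
## → http://photos.example.com/v1/222_12345_987_n.jpg
```

The unrelated logo image is filtered out by the id match, so only album photos are rewritten and downloaded.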

Here is Get-FacebookAlbum.ps1:

## Get-FacebookAlbum.ps1
## Downloads the images attached to a Facebook photo album
param($Album = $(Read-Host "Initial album URL (for example, http://www.facebook.com/album.php?aid=12345&id=12345&ref=nf)"),
    $Prefix,
    $Cookie
    )
   
$albumId = $album -replace '.*id=([\d+]+).*','$1'

if(-not $Cookie)
{
    $cookie = Get-FacebookCookie
}

$pageNumber = 1
do
{
    ## Go through each page in the album. Extract the thumbnail images (which have a pattern
    ## of /s______.jpg or ______s.jpg)
    $foundPhotos = $false
    "Getting album $album, page $pageNumber"

    $wc = New-Object System.Net.WebClient
    $wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")
    $wc.Headers.Add("Cookie", $cookie)
    $result = $wc.DownloadString($album + "&page=$pageNumber")

    ## Regex for images
    $regex = "<\s*img\s*[^>]*?src\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
   
    $photos = ($result | Select-String $regex -AllMatches).Matches | % { $_.Groups[1].Value }
    $photos = $photos -match "_" + $albumId + "_"
    $photos = $photos -replace '/s','/n'
    $photos = $photos -replace '_s.jpg','_n.jpg'
   
    "Found photos:"
    $photos
   
    $wc = New-Object System.Net.WebClient
   
    ## For each of the photos we found, download the large size
    foreach($photo in $photos | ? { $_.Trim() })
    {
        $foundPhotos = $true
        $uri = [Uri] $photo
        ## Flatten the URL path into a single file name
        $dest = $uri.LocalPath.Replace("\", "_")
        $dest = $dest.Replace("/", "_")
        $dest = Join-Path $pwd "$Prefix$dest"
        if(-not (Test-Path $dest))
        {
            "Downloading $dest"
            $wc.DownloadFile($photo, $dest)
        }
    }
   
    $pageNumber++
} while($foundPhotos);

"(None)"

 

So in under 100 lines of code, we’ve got ourselves an image downloader.

5 Responses to “More advanced HTTP scripting: Facebook Photo Album Downloader”

  1. Nathan Hartley writes:

    I like it! I have wanted to do something like this to create my own personal friend feed. While researching this, I ran across quite a few people that lost their accounts for "screen scraping" facebook.

  2. Daniele Muscetta writes:

    I wonder how long it takes before Facebook comes and complains about this like they did in the past with me and other people for interacting with their site by crafting HTTP requests like you do and not using the APIs http://www.muscetta.com/2007/09/06/facebook-status-change-is-not-a-crime/ – even if there WASN’T an API at the time to do what I wanted to do… http://www.muscetta.com/2007/08/03/facebook-statetray/
    After much complaining, they eventually implemented an API for what I wanted to do http://www.muscetta.com/2007/10/01/facebook-implemented-a-usersetstatus-api/
    So maybe we’ll have APIs for photos too, at one stage.
    I keep using Flickr for my photos, tho.

  3. Steve Campbell writes:

    Excellent article. I can see powershell getting much more popular for web pentesting with capabilities like this.

  4. Mark writes:

    Your code doesn’t work.

    Line 012 throws an error:

    The string is missing the terminator: “.
    + CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

  5. Mark writes:

    Line 012 in first script should be replaced to (I think):
    $cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '
