I wanted to download a photo album from Facebook, but of course there’s no simple API or option to do that. Searching the internet turns up a couple of options (a Firefox plugin, a Facebook app), but I couldn’t seem to find anything standalone.
This gives another great opportunity to talk about advanced HTTP scripting in PowerShell. We’ve talked about it in the past (here: http://www.leeholmes.com/blog/AdvancedHTTPASPNetScriptingWithPowerShell.aspx), so this script introduces some new techniques.
Scripting Facebook poses a couple of challenges, especially when using the techniques from that earlier post.
The first is that the login sequence is secure 🙂 The login page gets served over SSL, which requires encryption and a complex handshake, so the basic Send-TcpRequest commands we used previously won’t work. Luckily, the System.Net.WebClient class provides all of that for us.
The second issue is that the cookies are dynamic, and given out in two stages. You need to first connect to the main Facebook homepage, at which point you get a couple of session cookies. Then, you connect to the login page, at which point you get some login cookies. Once you have THOSE, you can script the rest of Facebook.
Here is Get-FacebookCookie.ps1:
```powershell
## Get-FacebookCookie.ps1
## Logs into Facebook, returning the cookie required for further operations

param($Credential)

$Credential = Get-Credential $Credential

## Get initial cookies
$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")
$result = $wc.DownloadString("http://www.facebook.com/")
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '

## Login
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($credential.Password)
$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($bstr)
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)

$wc = New-Object System.Net.WebClient
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")
$wc.Headers.Add("Cookie", $cookie)

$postValues = New-Object System.Collections.Specialized.NameValueCollection
$postValues.Add("email", $credential.UserName)
$postValues.Add("pass", $password)

## Get the resulting cookie, and convert it into the form to be sent
## back in the Cookie header
$result = $wc.UploadValues("https://login.facebook.com/login.php?login_attempt=1", $postValues)
$cookie = $wc.ResponseHeaders["Set-Cookie"]
$cookie = ($cookie.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '
$cookie
```
The most complex part of that script deals with the Set-Cookie headers returned by the server. Information in that header specifies the cookie’s domain, expiration date, and more. Browsers use this information to apply their cookie policies, but we just want to blindly feed the cookies back to Facebook. The little match, replace, and join combination converts the “Set-Cookie” syntax into one suitable to send back to the server.
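To see that transformation concretely, here’s a sketch with a made-up header value (the cookie names and values are hypothetical). WebClient joins multiple Set-Cookie values with commas, and the comma inside the expires date is exactly why the match filter is needed:

```powershell
## Hypothetical raw header: two cookies joined by a comma, with an
## expires date whose own comma would confuse a naive split
$raw = 'datr=abc123; expires=Thu, 01-Jan-2026 00:00:00 GMT; path=/,lu=xyz789; path=/; httponly'

## Keep only the pieces that look like name=value; strip their attributes
$cookie = ($raw.Split(',') -match '^\S+=\S+;' -replace ';.*','') -join '; '

$cookie   # datr=abc123; lu=xyz789
```

The fragments of the split-apart expires date don’t look like `name=value;` pairs, so the `-match` filter silently discards them.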
Once we’ve logged into Facebook, we download the album page, cycling through successive pages until we stop finding photos to download. Since you can derive the URL of the large-size image from its thumbnail, we don’t need an extra request for each photo’s information page. If you wanted to extract photo comments as well, though, you would.
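A quick sketch of that URL derivation, using a made-up thumbnail URL (the real CDN hostnames and photo IDs will differ): the ‘s’ (small) size marker, both in the path and in the filename suffix, becomes ‘n’ (normal/large).

```powershell
## Hypothetical thumbnail URL
$thumb = 'http://photos.example.com/v100/s1234567_89_s.jpg'

## Rewrite both size markers to get the large-size URL
$large = $thumb -replace '/s','/n' -replace '_s.jpg','_n.jpg'

$large   # http://photos.example.com/v100/n1234567_89_n.jpg
```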
Here is Get-FacebookAlbum.ps1:
```powershell
## Get-FacebookAlbum.ps1
## Downloads the images attached to a Facebook photo album

param(
    $Album = $(Read-Host "Initial album URL (for example, http://www.facebook.com/album.php?aid=12345&id=12345&ref=nf)"),
    $Prefix,
    $Cookie
)

$albumId = $album -replace '.*id=(\d+).*','$1'

if(-not $Cookie)
{
    $cookie = Get-FacebookCookie
}

$pageNumber = 1
do
{
    ## Go through each page in the album. Extract the thumbnail images
    ## (which have a pattern of /s______.jpg or ______s.jpg)
    $foundPhotos = $false
    "Getting album $album, page $pageNumber"

    $wc = New-Object System.Net.WebClient
    $wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0;)")
    $wc.Headers.Add("Cookie", $cookie)
    $result = $wc.DownloadString($album + "&page=$pageNumber")

    ## Regex for images
    $regex = "<\s*img\s*[^>]*?src\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
    $photos = ($result | Select-String $regex -AllMatches).Matches |
        % { $_.Groups[1].Value }

    ## Keep only the thumbnails from this album, and derive the URLs
    ## of the large-size versions
    $photos = $photos -match ("_" + $albumId + "_")
    $photos = $photos -replace '/s','/n'
    $photos = $photos -replace '_s.jpg','_n.jpg'

    "Found photos:"
    $photos

    $wc = New-Object System.Net.WebClient

    ## For each of the photos we found, download the large size
    foreach($photo in $photos | ? { $_.Trim() })
    {
        $foundPhotos = $true

        $uri = [Uri] $photo
        $dest = $uri.LocalPath.Replace("\", "_").Replace("/", "_")
        $dest = Join-Path $pwd "$Prefix$dest"

        if(-not (Test-Path $dest))
        {
            "Downloading $dest"
            $wc.DownloadFile($photo, $dest)
        }
    }

    $pageNumber++
} while($foundPhotos)

## The loop exits when a page's "Found photos:" list comes up empty
"(None)"
```
So in under 100 lines of code, we’ve got ourselves an image downloader.