Archives for the Month of January, 2015

The Wonderful World of PowerShell Filtering and Globbing

If you’ve been using PowerShell for long, you are probably familiar with the concept of wildcards. At the very least, you’ve done something like this:

PS C:\temp> dir *.txt

    Directory: C:\temp

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         1/21/2015  10:01 AM        664 test.txt

Or perhaps you’ve taken a lap or two around about_wildcards and now type things like this in your sleep:

PS C:\temp> dir C:\win*\*.N[a-f]?\F*\v2*\csc.exe

    Directory: C:\Windows\Microsoft.NET\Framework\v2.0.50727

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         5/26/2014   9:39 PM      77960 csc.exe

    Directory: C:\Windows\Microsoft.NET\Framework64\v2.0.50727

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         5/26/2014   9:39 PM      88712 csc.exe

While wildcarding in the Path parameter is both powerful and useful, you might have seen another parameter: Filter.

In the PowerShell documentation, we describe the -Filter parameter of Get-ChildItem as:

-Filter <String>

Specifies a filter in the provider's format or language. The value of this parameter qualifies the Path parameter. The syntax of the filter, including the use of wildcards, depends on the provider. Filters are more efficient than other parameters, because the provider applies them when retrieving the objects, rather than having Windows PowerShell filter the objects after they are retrieved.

In a SQL provider, the -Filter parameter might offer SQL syntax (like: -Filter “WHERE Name LIKE %pattern%”). Or, the AD provider might offer LDAP syntax. In the FileSystem provider, PowerShell’s wildcard syntax (dir *.txt) is very similar to the NTFS “format or language”, which also looks like *.txt. In the FileSystem provider’s case, the Win32 API (FindFirstFile) takes a pattern parameter that the API itself then processes.

When you use wildcards in cmd.exe, file resolution and wildcarding are done directly by this Win32 API.

What’s the difference?

Now you might wonder about the FileSystem provider: if wildcards and filters are so similar, why does PowerShell need its own wildcard matching? Why not just call the Win32 API like cmd.exe does?

The primary distinction is around power. As about_wildcards mentions, PowerShell offers the character-set and character-range operators ([bc] and [a-l]). The native Win32 API does not.

Wildcard Description        Example  Match             No match
-------- ------------------ -------- ----------------- --------
*        Matches zero or    a*       A, ag, Apple      banana
         more characters
?        Matches exactly    ?n       an, in, on        ran
         one character in
         the specified
         position
[ ]      Matches a range    [a-l]ook book, cook, look  took
         of characters
[ ]      Matches specified  [bc]ook  book, cook        hook
         characters
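The rows of this table can be checked without touching the file system, because the -like operator uses the same PowerShell wildcard syntax. The following is a minimal sketch; the variable names ($cases, $results) and the property names (MatchOk, NoMatchOk) are made up for illustration.

```powershell
# Each entry pairs a pattern from the table with one string that should
# match and one that should not.
$cases = @(
    @{ Pattern = 'a*';       Match = 'Apple'; NoMatch = 'banana' }
    @{ Pattern = '?n';       Match = 'an';    NoMatch = 'ran' }
    @{ Pattern = '[a-l]ook'; Match = 'book';  NoMatch = 'took' }
    @{ Pattern = '[bc]ook';  Match = 'cook';  NoMatch = 'hook' }
)

# -like is case-insensitive, which is why 'Apple' matches 'a*'.
$results = foreach ($case in $cases) {
    [PSCustomObject] @{
        Pattern   = $case.Pattern
        MatchOk   = $case.Match -like $case.Pattern
        NoMatchOk = -not ($case.NoMatch -like $case.Pattern)
    }
}

$results | Format-Table -AutoSize
```

Every row should report True in both columns.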

There’s also a surprising distinction around correctness. Try these examples in your System32 directory.

Should return all files with three-letter extensions:

$r1 = dir *.???
$r2 = dir -Filter *.???
Compare-Object $r1 $r2 -Property FullName

(Oops! -Filter returns directories, as well as files with 1- or 2-letter extensions!)

Should return all files with “2” in the name:

$r1 = dir *2*
$r2 = dir -Filter *2*
Compare-Object $r1 $r2 -Property FullName

(Oops! -Filter returns a ton of stuff without “2” in the name.)

For the last example, this is because native wildcard filters ALSO match against the 8.3 (short) filename. SqlServerSpatial.dll, for instance, has a short name that contains a “2”:

PS C:\windows\system32> cmd /c dir /x *2* | sls SqlServerSpatial.dll

04/03/2010  10:57 AM           459,104 SQLSER~2.DLL SqlServerSpatial.dll

But what about performance?

Native Filesystem filters are unquestionably faster than PowerShell doing all of the wildcard matching on its own. However, PowerShell doesn’t do all of the matching on its own. In Version 2, we added support for partial filtering to the Filesystem provider (and of course, to any other provider that wants to implement it). When the Filesystem provider applies this partial filtering, it offloads as much of the filtering work as it can to the raw Win32 APIs – and then does more powerful (and correct) wildcard matching on the smaller set of results.
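The two-stage idea can be imitated by hand to see what it buys you. The sketch below is not the provider's internal implementation. It only mimics the shape of it: a coarse -Filter narrows the candidates, then a precise -like pattern runs on the smaller set. The temporary directory and file names are hypothetical.

```powershell
# Build a throwaway directory containing a few hypothetical files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
'a.txt', 'b.tyt', 'c.tmp', 'd.log' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

# Stage 1: a coarse pattern that the provider can apply while enumerating.
$coarse = @(Get-ChildItem -Path $dir -Filter '*.t*')

# Stage 2: the precise PowerShell wildcard (with a character set),
# applied only to the smaller set from stage 1.
$precise = @($coarse | Where-Object { $_.Name -like '*.t[xy]t' })

$precise.Name | Sort-Object

# Clean up the throwaway directory.
Remove-Item -Path $dir -Recurse -Force
```

Stage 1 drops d.log, and stage 2 then drops c.tmp, which the coarse pattern could not exclude.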

So now you know: when it comes to the FileSystem provider, you probably don’t want or need the -Filter parameter!

If you want to know more about wildcarding in the Filesystem provider, this is covered in Recipe 20.6 in the PowerShell Cookbook, which you can preview for free here: Find Files that Match a Pattern.

Extracting Tables from PowerShell’s Invoke-WebRequest

If you’ve ever wanted to extract tables from a web page in PowerShell, the Invoke-WebRequest cmdlet is exactly what the doctor ordered.

Once you’ve invoked the cmdlet, the ‘ParsedHtml’ property gives you access to the Internet Explorer DOM of that page. From there, you can get elements by tag name (“TABLE”), ID, and more.

One neat application of this technique is to automatically parse data out of tables on the web page. I recently needed to do this, and the PowerShell script really wasn’t that complicated. In true PowerShell style, each row of the table is output as an object, so you can access the data as you would with the output of any other PowerShell cmdlet. Even better: if the table uses the TH tag (“Table Heading”), the script uses those headings as property names for the output objects.

Here’s an example of it in action:

1 [C:\Users\leeholm]
>> $url = 'http://www.egyptianhieroglyphs.net/gardiners-sign-list/domestic-and-funerary-furniture/'

2 [C:\Users\leeholm]
>> $r = Invoke-WebRequest $url

3 [C:\Users\leeholm]
>> Get-WebRequestTable.ps1 $r -TableNumber 0 | Format-Table -Auto

P1              P2         P3                   P4
--              --         --                   --
Gardiner Number Hieroglyph Description of Glyph Details
Q1                         Seat                 Phono. st, ws, . In st ?seat, place,? wsir ?Osiris,? ?tm ?perish.?
Q2                         Portable seat        Phono. ws. In wsir ?Osiris.?
Q3                         Stool                Phono. p.
Q4                         Headrest             Det. in wrs ?headrest.?
Q5                         Chest                Det. in hn ?box,? ?fdt ?chest.?
Q6                         Coffin               Det. or Ideo. in qrs ?bury,? krsw ?coffin.?
Q7                         Brazier with flame   Det. of fire. In ?t ?fire,? s?t ?flame,? srf ?temperature.?

4 [C:\Users\leeholm]

And the script:

param(
    ## The response object returned by Invoke-WebRequest
    [Parameter(Mandatory = $true)]
    [Microsoft.PowerShell.Commands.HtmlWebResponseObject] $WebRequest,

    ## The zero-based index of the table to extract
    [Parameter(Mandatory = $true)]
    [int] $TableNumber
)

## Extract the tables out of the web request
$tables = @($WebRequest.ParsedHtml.getElementsByTagName("TABLE"))
$table = $tables[$TableNumber]
$titles = @()
$rows = @($table.Rows)

## Go through all of the rows in the table
foreach($row in $rows)
{
    $cells = @($row.Cells)

    ## If we've found a table header, remember its titles
    if($cells[0].tagName -eq "TH")
    {
        $titles = @($cells | % { ("" + $_.InnerText).Trim() })
        continue
    }

    ## If we haven't found any table headers, make up names "P1", "P2", etc.
    if(-not $titles)
    {
        $titles = @(1..($cells.Count + 2) | % { "P$_" })
    }

    ## Now go through the cells in the row. For each, try to find the
    ## title that represents that column and create a hashtable mapping those
    ## titles to content
    $resultObject = [Ordered] @{}
    for($counter = 0; $counter -lt $cells.Count; $counter++)
    {
        $title = $titles[$counter]
        if(-not $title) { continue }

        $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
    }

    ## And finally cast that hashtable to a PSCustomObject
    [PSCustomObject] $resultObject
}
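
The row-to-object step at the heart of the script can be tried in isolation, without a web page or the Internet Explorer DOM. This is a minimal sketch under stated assumptions: plain string arrays stand in for the table rows, and the titles and cell values are hypothetical sample data.

```powershell
# Hypothetical stand-ins for the header titles and for the table rows.
$titles = 'Gardiner Number', 'Description'
$rows = @('Q1', ' Seat '), @('Q3', 'Stool')

$objects = foreach ($row in $rows) {
    # An ordered hashtable keeps the properties in column order.
    $record = [ordered] @{}
    for ($i = 0; $i -lt $row.Count; $i++) {
        $record[$titles[$i]] = $row[$i].Trim()
    }

    # Casting the hashtable produces one object per row.
    [PSCustomObject] $record
}

# The rows now behave like any other PowerShell objects.
$objects | Where-Object Description -eq 'Stool'
```

Because each row is a real object, the usual pipeline tools (Where-Object, Sort-Object, Export-Csv, and so on) work on the table data directly.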