Archives for the Month of March, 2010

Open PowerShell Cookbook Beta Available Online

With the first draft of the Windows PowerShell Cookbook complete, we’re running an open beta through a cutting-edge platform called the Open Feedback Publishing System: http://powershell.labs.oreilly.com. In this system, we put the entire book online and let you read to your heart’s content.

Not only is the book available online, but you can also influence its future. The Open Feedback Publishing System lets you attach comments to any paragraph, just as you would comment on a blog post.

If you are interested in participating in the deeper technical review of the book, welcome aboard! If you have expertise in a specific chapter, please concentrate on that one first, and leave a comment on its first paragraph saying that you will be reviewing it. If your expertise or interest is more general, please pick a chapter that has not yet been reviewed and leave the same kind of comment on its first paragraph.

To start, visit http://powershell.labs.oreilly.com, create a new account, and start submitting feedback! The Open Feedback Publishing System overview page gives more information on how to contribute. Reviewers who provide significant feedback get a chance to see the impact of their comments on bookshelves worldwide, and will receive a complimentary copy of the book when it goes to press. We’ll be accepting comments until roughly the end of March.

Enjoy!

HTML Agility Pack Rocks Your Screen Scraping World

I had to do some deep data extraction from a web page today, and naturally leaned on PowerShell for some assistance. PowerShell is a great language for text munging, and web content is no different. There are tons of examples online, but here’s an example from earlier in this blog: http://www.leeholmes.com/blog/PowerShellTheOracleInstantAnswersFromYourPrompt.aspx.

As I looked at the underlying HTML of this page, though, my heart sank. I cared about four pieces of data, and they were arranged without much structure on the web page. The information I cared about was in a couple of different tables, a couple of different table rows, and sometimes in different columns. You can parse your way around this, but it’s simply error-prone and annoying.

At that point, I remembered something called the HTML Agility Pack that I’ve been meaning to experiment with for some time. The HTML Agility Pack lets you navigate an HTML document as though it were well-formed XML, even though the underlying HTML usually isn’t. It doesn’t leverage PowerShell’s XML adapter, but the .NET objects act just like the XML classes from the .NET Framework.

On the down-side, data navigation and selection in XML comes via the XPath language. Like Regular Expressions, XPath queries are an esoteric art and difficult to get right. Luckily, you don’t need much knowledge of XPath for simple XML navigation.
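To give a feel for how little XPath is needed, here is a minimal sketch that uses the .NET XML classes rather than the HTML Agility Pack, so it runs without any extra download. The sample markup and its values are made up for illustration, and it only works because the sample is well-formed. The two queries follow the same shapes used later in this post.

```powershell
# Hypothetical, well-formed sample markup; real pages are rarely this tidy.
$xml = [xml] @'
<html><body>
  <table class="table-gen">
    <tr><td>03/01/2010 02:30 PM CST</td></tr>
    <tr><td>Description: Incoming</td></tr>
  </table>
  <table class="other"><tr><td>ignored</td></tr></table>
</body></html>
'@

# "//" searches at any depth; [@class='...'] filters on an attribute value.
$tables = $xml.SelectNodes("//table[@class='table-gen']")

# A path without a leading slash is evaluated relative to the current node.
# tr[2] picks the second row: XPath positions start at 1, not 0.
$description = $tables[0].SelectSingleNode("tr[2]/td").InnerText
$description
```

The relative path is the detail that is easy to trip over: a leading slash would start the search from the document root instead of from the table node.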

This whole experience is a great example of the “admin development model.” Fifteen minutes after I first thought about parsing the web page with the HTML Agility Pack, I had a working version. PowerShell’s Get-Member cmdlet was all I used for discovery – no documentation was harmed in the making of this script. Here is the literal text of my history buffer, experimentation and all. On lines 251 and 252, I put the history into the ISE so that I could hack out the experimentation bits and keep the stuff that worked.

221 cd C:\temp\HtmlAgilityPack.1.4.0.beta2.binaries
222 dir
223 add-type -Path .\HtmlAgilityPack.dll
224 $types = add-type -Path .\HtmlAgilityPack.dll -PassThru
225 $types
226 $types | ? { $_.IsPublic }
227 $doc = new-object HtmlWeb
228 ($types | ? { $_.IsPublic })[1]
229 ($types | ? { $_.IsPublic })[1].FullName
230 $doc = New-Object HtmlAgilityPack.HtmlDocument
231 $doc
232 $doc | gm
233 $result = $doc.Load("C:\temp\texts.html")
234 $result
235 $doc
236 $doc | gm
237 $doc.DocumentNode
238 $doc.DocumentNode | gm
239 $doc.DocumentNode.SelectNodes("//h1")
240 $doc.DocumentNode.SelectNodes("//table[@class='table-gen']")
241 $doc.DocumentNode.SelectNodes("//table[@class='table-gen']/tr[2]")
242 $doc.DocumentNode.SelectNodes("//table[@class='table-gen']")
243 $texts = $doc.DocumentNode.SelectNodes("//table[@class='table-gen']")
244 $texts[0]
245 $testText = $texts[0]
246 $testText | clip
247 $testText.SelectSingleNode("/tr[1]/td")
248 $testText.SelectSingleNode("tr[1]/td")
249 $testText.SelectSingleNode("tr[1]/td").InnerTExt
250 $testText.SelectSingleNode("tr[1]/td").InnerText.Trim()
251 ise
252 h
253 $time = [DateTime] $testText.SelectSingleNode("tr[1]/td").InnerText.Trim()
254 $testText.SelectSingleNode("tr[2]/td").InnerText.Trim()
255 $testText.SelectSingleNode("tr[2]/td").InnerText.Replace('Description:','').Trim()
256 $testText.SelectSingleNode("tr[6]/td").InnerText
257 $testText.SelectSingleNode("tr[5]/td").InnerText
258 $testText.SelectSingleNode("tr[4]/td").InnerText
259 $testText.SelectSingleNode("tr[5]/td").InnerText
260 $testText.SelectSingleNode("tr[5]/td[1]")
261 $testText.SelectSingleNode("tr[5]/td[2]")
262 $time = $testText.SelectSingleNode("tr[1]/td").InnerText.Trim()
263 $inOut = $testText.SelectSingleNode("tr[2]/td").InnerText.Replace('Description:',...
264 $to = $testText.SelectSingleNode("tr[5]/td").InnerText.Replace('Number Called:','...
265 $from = $testText.SelectSingleNode("tr[5]/td[2]").InnerText.Replace('Calling Numb...
266 New-Object PsObject -Property @{ Time = $time; Type = $inOut; From = $from; To = ...
267 New-Object PsObject -Property @{ Time = $time; Type = $inOut; From = $from; To = ...
268 $texts | % {...
269 C:\temp\textparser.ps1

The final script:

# Load the HTML Agility Pack assembly and parse the saved page
cd C:\temp\HtmlAgilityPack.1.4.0.beta2.binaries
add-type -Path .\HtmlAgilityPack.dll
$doc = New-Object HtmlAgilityPack.HtmlDocument
$null = $doc.Load("C:\temp\texts.html")

# One object is built per matching table
$texts = $doc.DocumentNode.SelectNodes("//table[@class='table-gen']")

$result = $texts | % {
    $testText = $_

    # Row 1 holds the timestamp: drop the " CST" suffix, then shift it back two hours
    $time = $testText.SelectSingleNode("tr[1]/td").InnerText.Trim()
    $time = $time -replace ' CST$', ''
    $time = ([DateTime] $time).AddHours(-2)

    # Strip the field labels so that only the values remain
    $inOut = $testText.SelectSingleNode("tr[2]/td").InnerText.Replace('Description:','').Trim()
    $to = $testText.SelectSingleNode("tr[5]/td").InnerText.Replace('Number Called:','').Trim()
    $from = $testText.SelectSingleNode("tr[5]/td[2]").InnerText.Replace('Calling Number:','').Trim()

    New-Object PsObject -Property @{ Time = $time; Type = $inOut; From = $from; To = $to } |
        Select From,To,Type,Time
}

$result | Sort Time | ft -auto | out-string -width 75
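One detail in the script deserves a quick illustration: the trailing Select. A hashtable does not promise to keep its keys in the order you typed them, so the properties of an object built from one can come out in any order. Here is a small sketch with made-up sample values (the phone numbers and date are hypothetical) showing that Select pins the column order:

```powershell
# Hypothetical sample values standing in for the text scraped from one table.
$row = New-Object PsObject -Property @{
    Time = [DateTime] '2010-03-01 14:30'
    Type = 'Incoming'
    From = '555-0100'
    To   = '555-0199'
}

# Select-Object emits the properties in the order listed,
# regardless of how the hashtable happened to enumerate them.
$ordered = $row | Select-Object From, To, Type, Time
$names = $ordered.PSObject.Properties | ForEach-Object { $_.Name }
$names -join ','
```

Without that step, the table printed at the end of the script could show its columns in a different order than intended.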

All in all, the HTML Agility Pack is a very attractive approach that I plan to start using more often.

Responding to USB Devices in PowerShell

I recently got a cool USB MIDI keyboard (M-Audio Axiom 49) and software (Propellerhead Reason) combo that lets me play keyboard with all kinds of cool sound effects and general fun. Despite how cool the system is, it’s not quite as spontaneous as a plain ol’ electronic keyboard. Just to piddle around for a bit, you have to turn on the keyboard, launch Reason, create a new song, add a virtual instrument, and then start playing.

How can PowerShell, of all things, help ease the anguish of a struggling keyboard hack?

The answer comes in two parts – creating a template song, and then having PowerShell launch it when you turn on the keyboard.

It turns out that you can resolve a lot of the issues in Reason by just creating a good starter song. Add in the mixer, a synthesizer, and save it to disk – I called it “Default.rns.” When you double-click on this file, Reason starts up with all of the presets you had configured.

The second part is trickier, unless of course you happen to have Real Ultimate Power.

The core of this solution comes in three parts:

  1. WMI lets you enumerate USB devices on the system.
  2. The Register-WmiEvent cmdlet lets you respond to WMI events.
  3. WMI has a default __InstanceCreationEvent class that lets you respond to ANY new instance being created (Services, Processes, etc.).
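To make the third point a little more concrete, here is a rough sketch of a helper that builds an __InstanceCreationEvent query for any WMI class. The function name New-InstanceCreationQuery and the ProcessMonitor identifier are made up for this illustration. The registration itself is left as a comment because Register-WmiEvent needs Windows PowerShell.

```powershell
# Hypothetical helper: builds a WQL query for new instances of any WMI class.
function New-InstanceCreationQuery
{
    param(
        [Parameter(Mandatory = $true)] [string] $ClassName,
        [int] $PollingSeconds = 5
    )

    # WITHIN sets how often (in seconds) WMI polls for new instances.
    "SELECT * FROM __InstanceCreationEvent " +
        "WITHIN $PollingSeconds " +
        "WHERE TargetInstance ISA '$ClassName'"
}

$query = New-InstanceCreationQuery -ClassName Win32_Process
$query

# On Windows PowerShell, the query could then be registered along these lines:
# Register-WmiEvent -Query $query -SourceIdentifier ProcessMonitor -Action {
#     Write-Host "Started: $($Event.SourceEventArgs.NewEvent.TargetInstance.Name)"
# }
```

Swapping Win32_Process for another class name is all it should take to watch a different kind of instance.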

The answer to #1 has come up a few times on the Team Blog:

http://blogs.msdn.com/powershell/archive/2007/02/24/displaying-usb-devices-using-wmi.aspx
http://blogs.msdn.com/powershell/archive/2009/01/10/get-usb-using-wmi-association-classes-in-powershell.aspx

I had no idea what the Axiom keyboard was being recognized as, so I did the easy thing: ran the query once, turned off the keyboard, and then ran it again. Compare-Object to the rescue:

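# Snapshot the USB devices with the keyboard turned on
$before = gwmi Win32_USBControllerDevice |% { [wmi] $_.Dependent }
# Turn off the keyboard, then take a second snapshot
$after = gwmi Win32_USBControllerDevice |% { [wmi] $_.Dependent }
# Show only what differs between the two snapshots
Compare-Object $before $after -PassThru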

That gives an object like:

Availability                :
Caption                     : USB Axiom 49
ClassGuid                   : {4d36e96c-e325-11ce-bfc1-08002be10318}
CompatibleID                : {USB\Class_01&SubClass_01&Prot_00, USB\Class_01&SubClass_01, USB\Class_01}
ConfigManagerErrorCode      : 0
ConfigManagerUserConfig     : False
CreationClassName           : Win32_PnPEntity
Description                 : USB Audio Device
DeviceID                    : USB\VID_0763&PID_0199&MI_00\6&2F841C5F&0&0000
ErrorCleared                :
ErrorDescription            :
HardwareID                  : {USB\VID_0763&PID_0199&REV_0105&MI_00, USB\VID_0763&PID_0199&MI_00}
InstallDate                 :
LastErrorCode               :
Manufacturer                : (Generic USB Audio)
Name                        : USB Axiom 49
PNPDeviceID                 : USB\VID_0763&PID_0199&MI_00\6&2F841C5F&0&0000

There’s the ticket. It’s actually a Win32_PnpEntity. While the previous WMI query helped us discover USB entities, now that we know how the keyboard is identified to the system, we can drop the complicated WMI dependency traversal, and turn this specific goal into a simpler query:

Get-WmiObject Win32_PnPEntity -Filter "Name='USB Axiom 49'"

Now that we know which WMI class represents the keyboard, and the query that selects it specifically, we can use WMI’s __InstanceCreationEvent to monitor for that instance as it comes and goes. The Register-WmiEvent cmdlet makes this a snap:

function Register-KeyboardEvent
{
    $query = "SELECT * FROM __InstanceCreationEvent " +
         "WITHIN 5 " +
         "WHERE TargetInstance ISA 'Win32_PnPEntity' " +
         "AND TargetInstance.Name = 'USB Axiom 49' "

    $null = Register-WmiEvent -Query $query -SourceIdentifier KeyboardMonitor -Action { E:\lee\reason\Default.rns }
}

After calling this function, PowerShell automatically launches Reason with your favourite presets already loaded whenever you turn on the USB keyboard.
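Event subscriptions like this one last only as long as the PowerShell session that created them. If you want to stop the monitoring sooner, something along these lines should do it. The function name Unregister-KeyboardEvent is made up for this sketch; KeyboardMonitor is the source identifier from the function above.

```powershell
function Unregister-KeyboardEvent
{
    # Look up the subscription by the identifier used at registration time.
    # SilentlyContinue keeps this quiet when nothing is registered.
    Get-EventSubscriber -SourceIdentifier KeyboardMonitor -ErrorAction SilentlyContinue |
        Unregister-Event
}
```

Calling Unregister-KeyboardEvent when no subscription exists simply does nothing, so it is safe to run more than once.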