Part-of-Speech Tagging with PowerShell
Wednesday, 20 December 2017
When analyzing text, a common goal is to identify the parts of speech within that text – what parts are nouns? Adjectives? Verbs in their gerund form?
To accomplish this goal, the area of natural language processing in Computer Science has developed systems for Part of Speech tagging, or “POS Tagging”. The acronym preceded the version in Urban Dictionary 🙂
The version I used in University was a Perl-based Brill Tagger, but things have advanced quite a bit – and the Stanford NLP group has done a great job implementing a Java version with C# wrappers here:
https://nlp.stanford.edu/software/tagger.shtml
The default English model is 97% correct on known words and 90% correct on unknown words. “SpeechTagger” is a PowerShell interface to this tagger.
By default, Split-PartOfSpeech outputs objects that represent words and the part of speech associated with them. The TaggerModel parameter lets you specify an alternate tagger model. The Stanford Part of Speech Tagger supports:
- Arabic
- Chinese
- English
- French
- German
- Spanish
The -Raw parameter emits sentences in the common text-based format for part-of-speech tagging, separating each word from its part of speech with the ‘/’ character. This is sometimes useful for regular expressions, or for adapting code you might have previously written to consume the output of other part-of-speech taggers.
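As a rough illustration of why the word/TAG format is convenient for regular expressions, here is a minimal sketch that parses such a line back into objects. The sample string is hand-written rather than captured from the module, and it assumes Penn Treebank-style tags (NN* for nouns, VBG for gerunds); the variable names and the Word and Tag property names are made up for this example and are not necessarily what Split-PartOfSpeech emits.

```powershell
# Hand-written sample in word/TAG form. This is an assumption about what
# raw tagger output looks like, not text captured from the module.
$raw = 'She/PRP enjoys/VBZ reading/VBG long/JJ books/NNS ./.'

# Split on whitespace, then split each token at its LAST '/' so that a word
# which itself contains a slash keeps it.
$tokens = foreach ($token in ($raw -split '\s+')) {
    if ($token -match '^(?<Word>.+)/(?<Tag>[^/]+)$') {
        [PSCustomObject]@{
            Word = $Matches['Word']
            Tag  = $Matches['Tag']
        }
    }
}

# Nouns: every tag that starts with NN (NN, NNS, NNP, NNPS).
$nouns = $tokens | Where-Object { $_.Tag -like 'NN*' } | ForEach-Object { $_.Word }

# Verbs in their gerund form: the VBG tag.
$gerunds = $tokens | Where-Object { $_.Tag -eq 'VBG' } | ForEach-Object { $_.Word }

"Nouns:   $($nouns -join ', ')"
"Gerunds: $($gerunds -join ', ')"
```

Splitting at the last slash rather than the first is a deliberate choice here: tags never contain a slash, but words occasionally do.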
To install this project, simply run the following command from PowerShell:
Install-Module -Name SpeechTagger