Month: May 2012

Regular-Expressions

Recently I was asked by my management team to use PowerShell to help with some text searches in a large collection of files, about 15,000 log files generated within our SharePoint system.  The SharePoint team was dealing with an issue and needed to capture email address info related to that issue.  The text they wanted to search for was ‘collabmail’, and then return what was to the left of the @ symbol.

My manager has been working with PowerShell in an attempt to keep his tech skills up to date, so he had already found out how to find the data in the files using the Select-String cmdlet.  Here is what he had so far:

Select-String -Path *.eml -Pattern 'collabmail'

This returned the data shown in the following screenshot:

So that was working fine. I recommended that he do two things:

  • Select specific properties from the Select-String output to make the result more readable. In this case, capturing just the Filename and Line properties would be useful.
  • Use a ‘Regular Expression’ to extract just the info he needed from each returned line of text. I will use the term Reg-Exp for Regular-Expressions below.

For the Select-String cmdlet here is what I recommended he use:

$allResults = Select-String -Pattern '@collabmail' -Path *.eml | Select-Object Line, Filename

 

Capturing the result from the Select-String cmdlet in the variable $allResults allows us to use a Reg-Exp to extract just the data we want instead of the entire line of text. The above screenshot shows the results from a couple of sample .EML files that I used for testing the Select-String cmdlet. Notice that $allResults is an array with one element for each matching line found in the files. Each element has two properties, Line and Filename.
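To see what $allResults holds, here is a minimal sketch that can be run anywhere. The temporary folder, the file names, and the addresses at collabmail.example.com are made-up sample data, not taken from the real log files.

```powershell
# Hypothetical sample data: two small .eml files in a fresh temporary folder.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'one.eml') -Value 'To: teamsite1@collabmail.example.com'
Set-Content -Path (Join-Path $dir 'two.eml') -Value 'To: projects@collabmail.example.com'

# Same command as above, pointed at the sample folder.
$allResults = Select-String -Pattern '@collabmail' -Path (Join-Path $dir '*.eml') |
    Select-Object Line, Filename

# Each element is an object with a Line and a Filename property.
foreach ($result in $allResults) {
    '{0} -> {1}' -f $result.Filename, $result.Line
}
```

Reading $result.Line and $result.Filename directly, as done here, is usually easier than working with the whole object converted to a string.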

Here is where the Reg-Exp comes into the process:

foreach ($result in $allResults) {
    [string]$result -match '\w+@collabmail' | Out-Null
}

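We will step through the $allResults array, and for each of its members we will apply the Reg-Exp filter. This strips off all of the extra data that we don’t need: the -match operator stores the text that matched the pattern in the automatic $matches variable, and Out-Null discards the True/False value that -match returns.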
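The behavior of -match can be tried on a single line of text. This is only a sketch: the $line value is a made-up sample, and the $found variable is a name chosen for illustration. It also checks the True/False result instead of discarding it, because $matches is only updated when a match succeeds.

```powershell
# Hypothetical sample line, similar to what Select-String would return.
$line = 'To: teamsite1@collabmail.example.com'

# -match returns True/False and, on success, fills the automatic $matches table.
if ($line -match '\w+@collabmail') {
    $found = $matches[0]   # the text matched by the whole pattern
} else {
    $found = $null         # no match: $matches may still hold an older result
}

$found
```

Note that \w matches letters, digits, and the underscore only, so a name containing a dot or a hyphen would be captured only from the last such character onward.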

Remember my manager only wanted the part of the found email address before the ‘@’ symbol?  So how do we get just the name of the resulting email address?  The following code takes care of that by modifying the returned value from the Reg-Exp:

foreach ($result in $allResults) {
    [string]$result -match '\w+@collabmail' | Out-Null
    # We want only the data before the @, so this section will split the result into two parts and keep the first part.
    $emailAddress = $matches[0]
    $emailAddress = $emailAddress.Split('@')
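The split step can also be tried in isolation. The sketch below uses a made-up sample line and shows the split approach next to a capture group, which is an alternative I am assuming would work equally well here, not the method used in the script. The variable names $viaSplit and $viaGroup are for illustration only.

```powershell
# Hypothetical sample line.
$line = 'To: teamsite1@collabmail.example.com'

# Approach from the script: match, then split on '@' and keep the first part.
if ($line -match '\w+@collabmail') {
    $viaSplit = $matches[0].Split('@')[0]
}

# Alternative: parentheses create a capture group, available as $matches[1].
if ($line -match '(\w+)@collabmail') {
    $viaGroup = $matches[1]
}

$viaSplit
$viaGroup
```

Both variables end up holding the text before the @ symbol. The capture group simply avoids the extra split.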

 

Here is how each result is written to the output CSV text file, still inside the foreach loop:

    $emailAddress = $emailAddress.Split('@')
    $fileName = [string]$result.Filename
    $output = $fileName + "`t" + $emailAddress[0]
    Write-Host $output
    Out-File -FilePath .\scanresults.csv -InputObject $output -Append
}

PowerShell can capture data as objects. The resulting objects can be sent to a CSV text file using the Export-CSV cmdlet, but when the size of the resulting data set is unknown I prefer to write each result to the output text file as it is captured. I’ve had situations where a scan of log files resulted in huge data sets (hundreds of thousands of rows). The objects I was working with in PowerShell would consume all available memory and crash the system; nothing permanent, just a reboot. Writing each line of the data set as it arrives keeps system resource usage to a minimum.
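The same write-as-you-go idea can also be expressed as a single pipeline, so that no intermediate array is kept at all. This is a sketch under assumptions, not the script linked below: the temporary folder and sample files are made up, and it uses a capture group with the Matches property of the Select-String output in place of the split step.

```powershell
# Hypothetical sample data in a fresh temporary folder.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'one.eml') -Value 'To: teamsite1@collabmail.example.com'
Set-Content -Path (Join-Path $dir 'two.eml') -Value 'To: projects@collabmail.example.com'
$outFile = Join-Path $dir 'scanresults.csv'

# Each match flows through the pipeline and is written out as it is produced,
# so the full result set is never stored in a variable.
Select-String -Pattern '(\w+)@collabmail' -Path (Join-Path $dir '*.eml') |
    ForEach-Object { $_.Filename + "`t" + $_.Matches[0].Groups[1].Value } |
    Out-File -FilePath $outFile -Append

Get-Content $outFile
```

Each output line holds the file name, a tab, and the text before the @ symbol, the same layout the loop above produces.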

Here is the screenshot from the sample output CSV file:

The full text of the script can be found here: 

EML-Scan.ps1

This script uses a really simple Reg-Exp, but more information on this useful tool can be found quickly using Google. Here is a good starting point for Regular-Expressions in general and specifically as they apply to PowerShell:

http://www.regular-expressions.info/powershell.html
