Regular-Expressions

Recently I was asked by my management team to use PowerShell to help with some text searches in a large collection of files, about 15,000 log files generated within our SharePoint system.  The SharePoint team was dealing with an issue and needed to capture email address info related to that issue.  The text they wanted to search for was ‘collabmail’, and then return what was to the left of the @ symbol.

My manager has been working with PowerShell in an attempt to keep his tech skills up to date, so he had already worked out how to locate the data in the files using the Select-String cmdlet.  Here is what he had so far:

Select-String -Path *.eml -Pattern 'collabmail'

That command returned the matching lines, as shown in the following screenshot:

So that was working fine.  I recommended that he do two things.

  • Select specific properties from the Select-String cmdlet to make the result more readable.  In this case capturing just the file name and line properties would be useful.
  • Use a ‘Regular Expression’ to filter just the info he needed from each returned line of text.  I will use the term Reg-Exp to describe Regular-Expressions below.

For the Select-String cmdlet here is what I recommended he use:

$allResults = Select-String -Pattern '@collabmail' -Path *.eml | Select-Object Line, Filename

 

Capturing the result from the Select-String cmdlet in the variable $allResults allows us to use a Reg-Exp to filter just the data we want instead of the entire line of text. The screenshot above shows the results from a couple of sample .EML files that I used for testing the Select-String cmdlet.  Notice that $allResults is an array containing the found text from each file. Each member of the array has two properties, Line and Filename.
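To see the shape of those objects without touching real logs, here is a minimal sketch. The temp folder, the two file names, and the example.com addresses are all made up for illustration; only the Select-String line mirrors the command above.

```powershell
# Build two throwaway .eml files in a temp folder (names and addresses are hypothetical).
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) 'eml-demo'
New-Item -ItemType Directory -Path $demoDir -Force | Out-Null
Set-Content -Path (Join-Path $demoDir 'a.eml') -Value 'To: jdoe@collabmail.example.com'
Set-Content -Path (Join-Path $demoDir 'b.eml') -Value 'Subject: hello', 'To: asmith@collabmail.example.com'

# Same command as above, pointed at the demo folder.
$allResults = Select-String -Pattern '@collabmail' -Path (Join-Path $demoDir '*.eml') |
    Select-Object Line, Filename

# Each element exposes only the two properties we selected.
foreach ($result in $allResults) {
    '{0} -> {1}' -f $result.Filename, $result.Line
}
```

Only the lines that contain '@collabmail' come back, so the 'Subject: hello' line in b.eml is skipped.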

Here is where the Reg-Exp comes into the process:

foreach ($result in $allResults) {
    # -match stores the matched text in the automatic $matches variable.
    [string]$result -match '\w+@collabmail' | Out-Null
}

We will step through the $allResults array, and for each of its members we will apply the Reg-Exp filter. This will strip off all of the extra data that we don’t need.
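Here is a small sketch of what -match does with a single line. The sample line is invented; the pattern is the one used in the loop.

```powershell
# A made-up line of the kind Select-String might return.
$line = 'To: Jane Doe <jdoe@collabmail.example.com>'

# -match returns $true or $false; on success it fills the automatic $matches hashtable.
if ($line -match '\w+@collabmail') {
    # $matches[0] holds the full text that the pattern matched.
    $matches[0]    # -> jdoe@collabmail
}
else {
    # On a failed match $matches is not updated, so it may still hold an older result.
    'no match'
}
```

Testing the result of -match, rather than piping it to Out-Null, is one way to avoid reusing a stale $matches value when a line does not match. Also note that \w does not match a period or a hyphen, so an address like jane.doe@collabmail would yield only doe@collabmail; a character class such as [\w.-]+ is one way to widen the pattern.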

Remember my manager only wanted the part of the found email address before the ‘@’ symbol?  So how do we get just the name of the resulting email address?  The following code takes care of that by modifying the returned value from the Reg-Exp:

foreach ($result in $allResults) {
    [string]$result -match '\w+@collabmail' | Out-Null
    # We want only the data before the @, so split the match into two parts; the first part is the one we keep.
    $emailAddress = $matches[0]
    $emailAddress = $emailAddress.Split('@')
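As a side-by-side sketch, the split step can be compared with a capture group that grabs the name directly. The sample line and the variable names $nameFromSplit and $nameFromGroup are mine, not part of the script.

```powershell
$line = 'To: Jane Doe <jdoe@collabmail.example.com>'   # hypothetical sample line

# Approach from the post: match the address, then split on '@' and keep element 0.
if ($line -match '\w+@collabmail') {
    $parts = $matches[0].Split('@')
    $nameFromSplit = $parts[0]
}

# Alternative: parentheses create capture group 1, which holds only the name.
if ($line -match '(\w+)@collabmail') {
    $nameFromGroup = $matches[1]
}

"$nameFromSplit / $nameFromGroup"    # -> jdoe / jdoe
```

Both routes give the same answer here; the capture group simply saves the extra split.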

 

Here is how each result is written to the output CSV text file.  The file name and the email name are separated by a tab character:

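    $emailAddress = $emailAddress.Split('@')
    $fileName = [string]$result.Filename
    # Join the file name and the part before the @ with a tab character.
    $output = $fileName + "`t" + $emailAddress[0]
    Write-Host $output
    # Append this one line to the output file right away.
    Out-File -FilePath .\scanresults.csv -InputObject $output -Append
}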

PowerShell can collect results as objects, and those objects can be sent to a CSV text file using the Export-CSV cmdlet.  When the size of the resulting data set is unknown, though, I prefer to write each result to the output text file as soon as it is found.  I’ve had situations where a scan of log files resulted in huge data sets (hundreds of thousands of rows).  The objects I was holding in PowerShell consumed all available memory and crashed the system; nothing permanent, just a reboot.  Writing each line of the data set as it arrives keeps system resource usage to a minimum.
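The same write-as-you-go idea can also be sketched with a pipeline, which is not what my script does but avoids holding the results in a variable. This assumes PowerShell 3.0 or later for [pscustomobject]; the demo folder, file names, addresses, and the Name column are hypothetical.

```powershell
# Hypothetical demo data so the sketch runs on its own.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) 'eml-stream-demo'
New-Item -ItemType Directory -Path $demoDir -Force | Out-Null
Set-Content -Path (Join-Path $demoDir 'a.eml') -Value 'To: jdoe@collabmail.example.com'
Set-Content -Path (Join-Path $demoDir 'b.eml') -Value 'To: asmith@collabmail.example.com'
$csvPath = Join-Path $demoDir 'scanresults.csv'

# Nothing is stored in a variable: each match flows down the pipeline
# and is handed to Export-Csv one object at a time.
Select-String -Pattern '(\w+)@collabmail' -Path (Join-Path $demoDir '*.eml') |
    ForEach-Object {
        [pscustomobject]@{
            Filename = $_.Filename
            # Capture group 1 of the first match on the line is the part before the @.
            Name     = $_.Matches[0].Groups[1].Value
        }
    } |
    Export-Csv -Path $csvPath -NoTypeInformation

Get-Content $csvPath
```

Because each object is passed along as soon as it is produced, memory use should stay roughly flat however many files are scanned. Unlike the tab-separated lines above, this output is a true comma-separated file with a header row.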

Here is the screenshot from the sample output CSV file:

The full text of the script can be found here: 

EML-Scan.ps1

This script uses a really simple Reg-Exp, and much more information on this useful tool is only a quick Google search away.  Here is a good starting point for Regular-Expressions in general and specifically as they apply to PowerShell:

http://www.regular-expressions.info/powershell.html


About Patrick
I am a Senior Systems Administrator for AT&T. I have been with AT&T for over 15 years. I spend most of my time working with Microsoft PowerShell in an effort to find creative ways to manage the data on our file shares. I’ve found PowerShell to be a useful and interesting way to perform Sys. Admin functions.

One Response to Regular-Expressions

  1. Patrick says:

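    Discussion with a fellow AT&T employee resulted in a possible syntax modification:
    $emailAddress = [string]$result -replace '^.*?(\w+)@.*$', '$1'

    The code above can replace several lines of the script, doing the filtering and splitting all at once. The leading ^.*? and the trailing .*$ are there so that the text on either side of the address is discarded too; without them, -replace leaves everything before the match in place. If a line does not match, -replace returns it unchanged.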
