Sunday, 13 March 2011

Remote monitoring of disk space using PowerShell Part 1

Continuing on the theme of monitoring servers, I've been working on monitoring the available disk space on a number of our servers using PowerShell. My original intention was to call the script from IPMonitor, but unfortunately that doesn't seem to reliably pick up the changing result fed back from the script, which obviously makes it somewhat useless as a method of monitoring! In addition, while I could monitor these things on the servers themselves, I felt that left too much scope for a problem on the server to also prevent the alerts that should be telling us about it.

Writing the script turned out to be quite a challenge, considering my limited experience with PowerShell, but it was certainly an exciting one. As it progressed the complexity slowly increased, sometimes due to the realisation that I hadn't taken something into account, for instance checking the script could even connect to the target server, and other times simply as I thought of additional ways I could improve and expand the functionality. To say I learnt a lot from the process would be an understatement, and I thought I'd share some of those discoveries with all of you.

First of all though is the script itself. As you can see below, I wrote it to be as generic as possible so anyone could easily use it again for another server, simply by adjusting the few details at the top. The script connects to the specified server and loops through a list of drives, testing that each one has more than the configured minimum disk space available. Since the requirement is different for each drive, this is configured individually per drive rather than script-wide. If the test fails then the script generates an error e-mail including details of the available space and the configured threshold. Now since I wanted the ability to run the script fairly regularly, but didn't want to be constantly bombarded with alerts, the script keeps a record of when an alert has been sent, counts the number of times the test has failed since that alert, and doesn't send another one until a defined number of failed checks have been skipped. Finally, once the test passes again (e.g. you've resolved the space issue on the server), the script sends a notification to confirm that and resets the counters.

# Remote space monitoring script - by Keith Langmead 02/03/2011
#
# Script connects to a remote server and checks the available disk space on each specified drive;
# the monitored drives and their individual thresholds are stored in the $serverdrives hash.
# An e-mail alert is generated on the first failure, repeat alerts are generated
# at the frequency specified in $alertthreshold, and once the available space is above the
# threshold again a recovery e-mail is generated.

# ------ Config Section -------

$mailfrom="Alert Address <alert@domain.com>"
$mailto="Keith Langmead <me@domain.com>"
$mailserver="mail.domain.com"
# contains the drives to be monitored, "<drive letter>"=<min mb>, with multiple drives separated by a semicolon
$serverdrives=@{"c"=30000;"e"=15000;"f"=20000;"g"=20000}
$servername="My Server"
$serverip="192.168.0.1"
# specifies how many failed checks to skip before a repeat alert is sent
[INT]$alertthreshold=12

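# $encrypted holds the output of ConvertFrom-SecureString (e.g. Read-Host -AsSecureString | ConvertFrom-SecureString).
# With no -Key given, it can only be decrypted by the same user account on the same computer that created it.
$encrypted = "<encrypted password>"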
$username = "MyServer\MyUser"
$password = ConvertTo-SecureString -string $encrypted
$myCred = New-Object System.Management.Automation.PSCredential $username, $password

# ---------------------------------------

# Function code from http://halr9000.com/article/506
function df ( $Path ) {
if (!$path) {$path = (get-location -psprovider filesystem).providerpath}
if (!($Drive = (get-item $path -ea silentlycontinue).root -replace "\\")) {$Drive = $Path}
$output = get-wmiobject -query "select freespace from win32_logicaldisk where deviceid = `'$drive`'" -computername $serverip -Credential $myCred
return [INT]"$($output.freespace / 1mb)"
}

if (test-connection -computername $serverip -Quiet)
{
write-host "Connection to $servername successful"
# Loop through the list of drives specified above
foreach ($sd in $serverdrives.keys)
{
$sdpath=$sd+":"
# Define the environment variable names for the current drive
$envVarServerFail = "spaceMonitorEmail" + $servername + $sd.ToUpper() + "fail"
$envVarServerCount = "spaceMonitorEmail" + $servername + $sd.ToUpper() + "count"
# Call the DF function to find available space on current drive
$currentspace=df $sdpath
# If the available space is above the threshold
if (($currentspace) -gt $serverdrives.Get_Item($sd)) {
write-host "Space is fine on " $sd":"
# If the test previously failed generate a recovery e-mail
if ([Environment]::GetEnvironmentVariable($envVarServerFail, "User") -eq 1) {
$minspace=$serverdrives.Get_Item($sd)
$driveletter=$sd.ToUpper()
$mailsubject="[Notifier] $servername Disk Space Notification $driveletter Drive Recovered"
$bodytext="The free space on $sdpath is back above specified limits. Current space available is $currentspace MB and the threshold for alerts is $minspace MB"
send-mailmessage -from $mailfrom -to $mailto -subject $mailsubject -body $bodytext -smtpServer $mailserver
write-host "recovery e-mail sent"
}
# Reset the environment variables to 0
[INT][Environment]::SetEnvironmentVariable($envVarServerFail, 0, "User")
[INT][Environment]::SetEnvironmentVariable($envVarServerCount, 0, "User")
}
else {
# If the test failed and it's either the first time or the failure count is above the alert threshold send a notification
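# The [INT] cast treats a fail variable that has never been set ($null) as 0, so a failure on the very first run still alerts
if ([INT][Environment]::GetEnvironmentVariable($envVarServerFail, "User") -eq 0 -OR [INT][Environment]::GetEnvironmentVariable($envVarServerCount, "User") -gt [INT]$alertthreshold) {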
$minspace=$serverdrives.Get_Item($sd)
$driveletter=$sd.ToUpper()
$mailsubject="[Notifier] $servername Low Disk Space Notification $driveletter Drive"
$bodytext="The free space on $sdpath is getting low. Current space available is $currentspace MB and the threshold for alerts is $minspace MB"
send-mailmessage -from $mailfrom -to $mailto -subject $mailsubject -body $bodytext -smtpServer $mailserver
write-host "failure e-mail sent"
# Set fail and count to 1 or reset count back to 1
[INT][Environment]::SetEnvironmentVariable($envVarServerFail, 1, "User")
[INT][Environment]::SetEnvironmentVariable($envVarServerCount, 1, "User")
}
else {
# If the test failed, it's not the first time and not above the alert threshold increment the count
[INT]$incFailVar = [Environment]::GetEnvironmentVariable($envVarServerCount, "User")
[INT]$incFailVar = $incFailVar + 1
[INT][Environment]::SetEnvironmentVariable($envVarServerCount, $incFailVar, "User")
}
}
# --- Start Debugging info - so you can see what is happening when running manually, you can remove this section if you want
write-host "Current space on " $sd": is" $currentspace " MB with minimum threshold of " $serverdrives.Get_Item($sd) "MB"
$foo1 = [Environment]::GetEnvironmentVariable($envVarServerFail, "User")
write-host $envVarServerFail "equals " $foo1
$foo2 = [Environment]::GetEnvironmentVariable($envVarServerCount, "User")
write-host $envVarServerCount "equals " $foo2
# --- End Debugging info
}
}
else {
write-host "connection to $servername failed, script aborted"
}
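
The alert throttling is the fiddliest part of the script, so here is a minimal sketch of just that decision, pulled away from the WMI query, the e-mails and the environment variables. The function name Get-AlertAction and its parameters are hypothetical, made up purely for illustration; it isn't part of the script above, just one way of expressing the same rules so they can be tried out with plain numbers.

```powershell
# Hypothetical helper (not part of the original script): decides what the
# monitor should do for one drive, given the current reading and stored state.
function Get-AlertAction {
    param(
        [int]$FreeMB,          # current free space in MB
        [int]$MinMB,           # configured minimum for this drive
        [int]$FailFlag,        # 1 if an alert has already been sent, else 0
        [int]$FailCount,       # failed checks counted since the last alert
        [int]$AlertThreshold   # failed checks to skip before alerting again
    )

    if ($FreeMB -gt $MinMB) {
        # Space is fine: send a recovery mail only if we previously alerted
        if ($FailFlag -eq 1) { $action = "Recovery" } else { $action = "None" }
        return New-Object PSObject -Property @{ Action = $action; FailFlag = 0; FailCount = 0 }
    }

    if ($FailFlag -eq 0 -or $FailCount -gt $AlertThreshold) {
        # First failure, or enough failed checks have been skipped: alert and restart the count
        return New-Object PSObject -Property @{ Action = "Alert"; FailFlag = 1; FailCount = 1 }
    }

    # Still failing, but within the quiet period: just count it
    return New-Object PSObject -Property @{ Action = "None"; FailFlag = 1; FailCount = $FailCount + 1 }
}

# Simulate six consecutive checks of a drive that stays below its minimum,
# using an alert threshold of 3 (an arbitrary value chosen for the demo)
$state = New-Object PSObject -Property @{ FailFlag = 0; FailCount = 0 }
$actions = foreach ($run in 1..6) {
    $state = Get-AlertAction -FreeMB 100 -MinMB 30000 -FailFlag $state.FailFlag -FailCount $state.FailCount -AlertThreshold 3
    $state.Action
}
$actions -join ","
```

With a threshold of 3 the six simulated runs come out as Alert,None,None,None,Alert,None, in other words three failed checks are skipped between alerts, which matches how $alertthreshold behaves in the script.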

I think I'll leave it there for today, but I will be back to write more about the issues I found, including using user level Environment variables, hash tables and type setting.

Finally, my thanks to jrich, Kazun and Albert Widjaja on http://social.technet.microsoft.com/Forums/en/winserverpowershell/thread/55ae39fd-d585-4579-8648-61e8c949bd22 for their help while I was working on getting this script to work.