Filesystem space usage monitoring using IBM RSCT

This post describes how to configure filesystem space usage monitoring using IBM Reliable Scalable Cluster Technology (RSCT).
After the configuration is complete, the specified user will receive a notification when the threshold of a monitored filesystem has been reached.

First we need to select a condition that should be monitored. The list of predefined conditions that are available can be shown using the command:

# lscondition
...
"/var space used"                   "Not monitored" 
"/tmp space used"                   "Not monitored"     
...

To list details about a condition use:

# lscondition "/var space used"
Displaying condition information:

condition 1:
        Name                        = "/var space used"
        MonitorStatus               = "Not monitored"
        ResourceClass               = "IBM.FileSystem"
        EventExpression             = "PercentTotUsed > 90"
        EventDescription            = "An event will be generated when more than 90 percent of the total space in the /var directory is in use."
        RearmExpression             = "PercentTotUsed < 75"
        RearmDescription            = "The event will be rearmed when the percent of the space used in the /var directory falls below 75 percent."
        SelectionString             = "Name == \"/var\""
        Severity                    = "i"
        NodeNames                   = {}
        MgtScope                    = "l"
        Toggle                      = "Yes"
        EventBatchingInterval       = 0
        EventBatchingMaxEvents      = 0
        BatchedEventRetentionPeriod = 0
        BatchedEventMaxTotalSize    = 0
        RecordAuditLog              = "ALL"

In this listing the EventExpression is "PercentTotUsed > 90", which means that an event is triggered when the filesystem space usage exceeds 90 percent. When this happens a response is run (here an email notification), and the system stops monitoring the > 90 threshold and monitors the RearmExpression "PercentTotUsed < 75" instead. So you receive only one notification (response) saying that the filesystem usage is above 90 percent. Once the usage drops below 75 percent, a rearm event is triggered that sends a "rearm" notification, and the > 90 threshold is monitored again.
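
To make the event/rearm toggle easier to picture, here is a small sketch that replays a series of usage readings through the same two thresholds. It does not talk to RSCT at all; simulate_toggle and the sample readings are made up for illustration.

```shell
#!/bin/sh
# Illustration only: mimics the event/rearm toggle described above.
# simulate_toggle is a hypothetical helper, not an RSCT command.
simulate_toggle() {
    event_limit=90   # EventExpression: PercentTotUsed > 90
    rearm_limit=75   # RearmExpression: PercentTotUsed < 75
    armed=yes        # yes: event expression is watched, no: rearm expression is watched
    for used in "$@"; do
        if [ "$armed" = yes ] && [ "$used" -gt "$event_limit" ]; then
            echo "$used: Event"
            armed=no
        elif [ "$armed" = no ] && [ "$used" -lt "$rearm_limit" ]; then
            echo "$used: Rearm event"
            armed=yes
        else
            echo "$used: no notification"
        fi
    done
}

# Made-up sequence of PercentTotUsed readings
simulate_toggle 80 95 97 80 59 92
```

Only the readings 95, 59 and 92 produce a notification. The reading 97 is ignored because the event has already fired, and 80 is ignored because it is still above the rearm threshold.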

For more information, refer to the IBM RSCT Administration Guide:
http://www-01.ibm.com/support/docview.wss?uid=pub1sa22788920

If desired the condition parameters may be individually altered using the chcondition command.
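
As a rough sketch of what such a change could look like, the function below wraps chcondition to set new event and rearm thresholds on the predefined /var condition. The function name set_var_thresholds and the values 95 and 80 are my own assumptions; -e and -E are the chcondition flags for the event and rearm expressions. It is wrapped in a function so that nothing runs until you call it.

```shell
#!/bin/sh
# Hypothetical wrapper: change both thresholds of the "/var space used" condition.
# $1 = new event threshold, $2 = new rearm threshold (both in percent)
set_var_thresholds() {
    event=$1
    rearm=$2
    chcondition -e "PercentTotUsed > $event" \
        -E "PercentTotUsed < $rearm" \
        "/var space used"
}

# Example call (assumed values), to be run as root on a system with RSCT:
# set_var_thresholds 95 80
```

The descriptions are not touched here, so they would still mention 90 and 75 percent. They can be updated in the same call with -d and -D.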

Now let's have a look at the predefined responses.

# lsresponse
Displaying response information:
ResponseName                  
"Generate SNMP trap"          
"Critical notifications"      
"Warning notifications"       
"Informational notifications" 
"Log event anytime"           
"E-mail root anytime"         
"E-mail root off-shift"       
"Broadcast event on-shift"    

In this scenario we will use the "E-mail root anytime" response for event notifications. Let's have a look at what it does:

# lsresponse "E-mail root anytime"
Displaying response information:

        ResponseName    = "E-mail root anytime"
        Action          = "E-mail root"
        DaysOfWeek      = 1-7
        TimeOfDay       = 0000-2400
        ActionScript    = "/usr/sbin/rsct/bin/notifyevent root"
        ReturnCode      = -1
        CheckReturnCode = "n"
        EventType       = "b"
        StandardOut     = "n"
        EnvironmentVars = ""
        UndefRes        = "n"
        EventBatching   = "n"

Each response can also be altered individually to fit your needs, or you can create your own. To alter a response use the chresponse command.
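
Creating your own response could look roughly like the sketch below, which builds a response that mails a user other than root by passing that user to the same notifyevent script. The function name make_mail_response and the user "admin" are assumptions for illustration; -n (action name), -s (action script) and -e b (run for both events and rearm events) are mkresponse flags. As far as I know, the days and times default to every day, all day, when -d and -t are left out, but please check the man page on your system.

```shell
#!/bin/sh
# Hypothetical wrapper: create a response that e-mails the given user.
# $1 = user that should receive the notifications
make_mail_response() {
    user=$1
    mkresponse -n "E-mail $user" \
        -s "/usr/sbin/rsct/bin/notifyevent $user" \
        -e b \
        "E-mail $user anytime"
}

# Example call (assumed user name), to be run as root on a system with RSCT:
# make_mail_response admin
```

The new response "E-mail admin anytime" can then be used with mkcondresp in the same way as the predefined ones.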

To make this work we have to link the monitored condition to a response that it should trigger. This can be done using the mkcondresp command:

Condition to be monitored
                |
# mkcondresp "/var space used" "E-mail root anytime"
                                |   
                                Response to be executed

To list the condition -> response mapping and their state use lscondresp.

# lscondresp
Displaying condition with response information:
Condition             Response                   State        
"/var space used"     "E-mail root anytime"      "Not active" 

Now we have a defined condition that will monitor the /var filesystem space and a response that will email root if the predefined threshold is reached. However, this mechanism is not active at the moment.
To activate the monitoring use:

  
# startcondresp "/var space used" "E-mail root anytime"

# lscondresp                                           
Displaying condition with response information:
Condition             Response                   State    
"/var space used"     "E-mail root anytime"      "Active" 

To list the records from the audit log use the lsaudrec command:

# lsaudrec
7/06/12 11:56:31      ERRM Info     Monitoring of condition /var space used is started successfully.

Now I will fill the /var filesystem to see how this works.

Afterwards:

# lsaudrec 
...
07/06/12 12:16:31      ERRM Info     Event from /var space used that occurred at 07/06/12 12:16:31 200114 will cause /usr/sbin/rsct/bin/notifyevent root from E-mail root anytime to be executed.
07/06/12 12:17:31      ERRM Info     Event from /var space used that occurred at 07/06/12 12:16:31 200114 caused /usr/sbin/rsct/bin/notifyevent root from E-mail root anytime to complete with a return code of 0
... 

So here we see that the event "/var space used" occurred and that an email notification to root has been successfully sent using the predefined response.

Here is the email that root received:

Message  2:
From root Fri Jul  6 12:17:31 2012
Date: Fri, 6 Jul 2012 12:17:31 +0200
From: root of all 
To: root
Subject: /var space used

=====================================

Friday 07/06/12 12:16:31

Condition Name: /var space used
Severity: Informational
Event Type: Event
Expression: PercentTotUsed > 90

Resource Name: /var
Resource Class: IBM.FileSystem
Data Type: CT_INT32
Data Value: 95
Node Name: power1
Node NameList: {power1}
Resource Type: 0
=====================================

The lsevent command shows a shorter event list:

Time        = 07/06/12 12:16:31 200816
Category    = Info
Description = Event : /var space used occurred at 07/06/12 12:16:31 200114 on /var on power1.

OK, now I will clean up the filesystem so we can see a rearm event with its notification.

# lsaudrec
07/06/12 12:46:31      ERRM Info     Rearm event : /var space used occurred at 07/06/12 12:46:31 317875 on /var on power1.
 
07/06/12 12:46:31      ERRM Info     Rearm event from /var space used that occurred at 07/06/12 12:46:31 317875 will cause /usr/sbin/rsct/bin/notifyevent root from E-mail root anytime to be executed.
 
07/06/12 12:47:31      ERRM Info     Rearm event from /var space used that occurred at 07/06/12 12:46:31 317875 caused /usr/sbin/rsct/bin/notifyevent root from E-mail root anytime to complete with a return code of 0.
 

And we also get an email:

Message  2:
From root Fri Jul  6 12:47:31 2012
Date: Fri, 6 Jul 2012 12:47:31 +0200
From: root of all 
To: root
Subject: /var space used

=====================================

Friday 07/06/12 12:46:31

Condition Name: /var space used
Severity: Informational
Event Type: Rearm event
Expression: PercentTotUsed < 75

Resource Name: /var
Resource Class: IBM.FileSystem
Data Type: CT_INT32
Data Value: 59
Node Name: power1
Node NameList: {power1}
Resource Type: 0
=====================================

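Now, if we want to monitor a filesystem that is not among the predefined conditions, we can copy an existing one and alter its settings. The copy keeps every setting that is not overridden, including the rearm expression "PercentTotUsed < 75".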


                                    Event expression with your threshold
               Condition to copy    |                       Event description
               |                    |                       |
# mkcondition -c "/var space used" -e "PercentTotUsed > 85" -d "Warning: /export2 directory is more than 85 percent full" \
 -D "OK: /export2 directory is below 75 percent full" -s "Name == \"/export2\"" "/export2 space 85percent"
  |                                                    |                         ^^^^^^^^^^^^^^^^^^^^^^^^            
 Rearm event description                 Selection string                                   |
                                                                                            Name of the new condition that we are creating
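
If several filesystems need the same kind of condition, the command above can be generated instead of typed. The sketch below is a dry run: it only prints the mkcondition command line for a given mount point and threshold, so you can review it before running it. The helper name print_mkcondition and the second mount point /home are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a mkcondition command, does not execute it.
# $1 = mount point, $2 = event threshold in percent
# The rearm threshold (75) is inherited from the copied "/var space used" condition.
print_mkcondition() {
    fs=$1
    limit=$2
    printf 'mkcondition -c "/var space used" -e "PercentTotUsed > %s" -d "Warning: %s is more than %s percent full" -D "OK: %s is below 75 percent full" -s "Name == \\"%s\\"" "%s space %spercent"\n' \
        "$limit" "$fs" "$limit" "$fs" "$fs" "$fs" "$limit"
}

# Print the commands for two mount points (/home is a made-up example)
for fs in /export2 /home; do
    print_mkcondition "$fs" 85
done
```

Each printed line can be pasted into a root shell once it looks right.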

Now let's map our new condition to a response and start the monitoring.

                                         
# mkcondresp "/export2 space 85percent" "E-mail root anytime"
                            |                     |
                     Condition name         Response name
                            |                     |
# startcondresp "/export2 space 85percent" "E-mail root anytime"

# lscondresp
Displaying condition with response information:
Condition                  Response                   State    
"/export2 space 85percent" "E-mail root anytime"      "Active" 

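With more than a handful of conditions it is easy to forget the startcondresp step. The sketch below filters lscondresp-style output and prints every condition/response pair whose state is not "Active". The helper name list_inactive is my own, and the sample input is pasted in by hand from the listings in this post so the snippet runs without RSCT. It assumes the three-column quoted layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads lscondresp-style output on stdin and prints
# "condition -> response" for every pair that is not in state "Active".
list_inactive() {
    # Splitting on the double quote puts the condition in $2, the response in $4
    # and the state in $6. Header lines contain no quotes and are skipped.
    awk -F'"' 'NF >= 7 && $6 != "Active" { print $2 " -> " $4 }'
}

# Sample input; on a real system use: lscondresp | list_inactive
list_inactive <<'EOF'
Displaying condition with response information:
Condition                  Response                   State
"/var space used"          "E-mail root anytime"      "Not active"
"/export2 space 85percent" "E-mail root anytime"      "Active"
EOF
```

For the sample input this prints the /var pair only, since that is the one that has not been started.
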
That's it.

2 comments
  1. Amit said:

    Hi,
    It's nice to know about this RSCT monitoring for filesystem consumption. Is it possible to send notifications to a user-specific e-mail ID?

  2. JJ said:

    Hello Amit, yes, you can change the existing response using "chresponse", or create a new one using "mkresponse". Details can be found in the respective man pages.
