NetApp Performance Monitoring

 

NetApp sysstat reports filer performance statistics such as CPU utilization, disk traffic, and cache utilization. Run without options, sysstat prints one line of basic information every 15 seconds. To stop it, press control-C (^C) or set an iteration count (-c count) so that it exits on its own. For more detailed information, use the -u option. Other options report statistics for one particular protocol.

 

More info: http://www.wafl.co.uk/sysstat/

 

Synopsis:

sysstat [ interval ]

sysstat [ -c count ] [ -s ] [ -u | -x | -m | -f | -i | -b ] [ interval ]

  • -c count

    Terminate the output after count iterations. The count is a positive, nonzero integer; values larger than LONG_MAX are truncated to LONG_MAX.

  • -s

    Display a summary of the output columns upon termination. Descriptive columns such as 'CP ty' do not have summaries printed. Note that, with the exception of 'Cache hit', the 'Avg' summary for percentage values is an average of percentages, not a true mean of the underlying data. The 'Avg' is only intended as a gross indicator of performance. For more detailed information, use tools such as nfsstat, netstat, or statit.

  • -f

    For the default format, display FCP statistics.

  • -i

    For the default format, display iSCSI statistics.

  • -b

    Display the SAN extended statistics instead of the default display.

  • -u

    Display the extended utilization statistics instead of the default display.

  • -x

    Displays the extended output format instead of the default display. This includes all available output fields. Be aware that this produces output that is longer than 80 columns and is generally intended for “offline” types of analysis and not for “realtime” viewing.

  • -m

    Displays multi-processor CPU utilization statistics. In addition to the percentage of time that one or more CPUs were busy (ANY), the average (AVG) is displayed, as well as the individual utilization of each processor.

  • interval

    A positive, non-zero integer that represents the reporting interval in seconds. If not provided, the default is 15 seconds.
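
The -s note about 'Avg' being an average of percentages, not a true mean, is easier to see with numbers. The sketch below is only an illustration of that distinction in general: the counters are invented, and it does not claim to reproduce how sysstat computes any particular column.

```python
from typing import List, Tuple

# Each sample is (part, whole) for one reporting interval.
# The values are made up purely for illustration.
Sample = Tuple[float, float]


def average_of_percentages(samples: List[Sample]) -> float:
    """Average the per-interval percentages, ignoring how large each interval was."""
    return sum(100.0 * part / whole for part, whole in samples) / len(samples)


def pooled_percentage(samples: List[Sample]) -> float:
    """True mean of the underlying data: total part over total whole."""
    total_part = sum(part for part, _ in samples)
    total_whole = sum(whole for _, whole in samples)
    return 100.0 * total_part / total_whole


samples = [(90.0, 100.0), (10.0, 1000.0)]  # 90% of a small interval, 1% of a large one
print(f"average of percentages: {average_of_percentages(samples):.1f}%")  # 45.5%
print(f"true mean:              {pooled_percentage(samples):.1f}%")       # 9.1%
```

The two figures diverge whenever the intervals carry very different amounts of work, which is why the 'Avg' line is best read as a rough indicator.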

     

Here are explanations of some of the columns in the NetApp sysstat output.

 

Cache age : The age, in minutes, of the oldest read-only blocks in the buffer cache; a trailing "s" means the value is in seconds. This column indicates how fast read operations are cycling through system memory. When the filer is reading very large files, the buffer cache age will be very low. It will also be low if reads are random. If read performance is poor, a low cache age may indicate that you need a system with more memory, or that you should analyze the application to reduce the randomness of the workload.
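
If you collect sysstat output for later analysis, the mixed minutes/seconds notation is worth normalizing. Here is a minimal sketch, assuming the field is either a plain number (minutes) or a number followed by "s" (seconds) as described above. The function name is made up for this example, and any other form the column might take is rejected, not guessed at.

```python
def cache_age_seconds(field: str) -> int:
    """Convert a 'Cache age' value to seconds.

    Assumes '12' means 12 minutes and '5s' means 5 seconds.
    """
    text = field.strip()
    if text.endswith("s"):
        number, factor = text[:-1], 1   # trailing 's': already in seconds
    else:
        number, factor = text, 60       # plain number: minutes
    if not number.isdigit():
        raise ValueError(f"unrecognized Cache age value: {field!r}")
    return int(number) * factor


print(cache_age_seconds("5s"))  # 5
print(cache_age_seconds("12"))  # 720
```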

 

Cache hit : The WAFL cache hit rate, as a percentage. This is the percentage of times that WAFL tried to read a data block from disk and found the data already cached in memory. A dash in this column indicates that WAFL did not attempt to load any blocks during the measurement interval.
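
The definition amounts to a simple ratio with one special case. The sketch below shows that arithmetic. The counter names (hits, attempts) and the whole-number formatting are assumptions made for illustration, not sysstat internals.

```python
from typing import Optional


def cache_hit_percent(hits: int, attempts: int) -> Optional[float]:
    """Percentage of block read attempts satisfied from memory, or None if there were none."""
    if attempts == 0:
        return None  # nothing was loaded in this interval
    return 100.0 * hits / attempts


def format_cache_hit(hits: int, attempts: int) -> str:
    """Render the value the way the column is described: a percentage, or a dash."""
    pct = cache_hit_percent(hits, attempts)
    return "-" if pct is None else f"{pct:.0f}%"


print(format_cache_hit(970, 1000))  # 97%
print(format_cache_hit(0, 0))       # -
```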

 

CP Ty : Consistency Point (CP) type is the reason that a CP started in that interval. The CP types are:

 


  • -

    No CP started during sampling interval

  • number

    Number of CPs started during sampling interval, if greater than one

  • B

    Back to back CPs (CP generated CP)

  • b

    Deferred back to back CPs (CP generated CP)

  • F

    CP caused by full NVLog

  • H

    A type H CP is a CP from high watermark in modified buffers. If a CP is not in progress, and the number of buffers holding data that has been modified but not yet written to disk exceeds a threshold, then a CP from high watermark is triggered.

  • L

    A type L CP is a CP from low watermark in available buffers. If a CP is not in progress, and the number of buffers available goes below a threshold, then a CP from low watermark is triggered.

  • S

    CP caused by snapshot operation

  • T

    CP caused by timer

  • U

    CP caused by flush

  • Z

    CP caused by internal sync

  • V

    CP caused by low virtual buffers

  • M

    CP caused by low mbufs

  • D

    CP caused by low datavecs

  • :

    continuation of CP from previous interval

  • #

    continuation of CP from previous interval, and the NVLog for the next CP is now full, so that the next CP will be of type B.

 

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. The phases are:

 

  • 0

    Initializing

  • n

    Processing normal files

  • s

    Processing special files

  • q

    Processing quota files

  • f

    Flushing modified data to disk

  • v

    Flushing modified superblock to disk
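
Putting the two tables together, a CP ty value such as "Tf" reads as "timer CP, currently flushing modified data to disk". Below is a rough decoder built only from the lists above. The function name is invented for this example, and it assumes the field is either "-", a bare number, or a type character with an optional phase character.

```python
from typing import Tuple

CP_TYPES = {
    "B": "Back to back CPs (CP generated CP)",
    "b": "Deferred back to back CPs (CP generated CP)",
    "F": "CP caused by full NVLog",
    "H": "CP from high watermark in modified buffers",
    "L": "CP from low watermark in available buffers",
    "S": "CP caused by snapshot operation",
    "T": "CP caused by timer",
    "U": "CP caused by flush",
    "Z": "CP caused by internal sync",
    "V": "CP caused by low virtual buffers",
    "M": "CP caused by low mbufs",
    "D": "CP caused by low datavecs",
    ":": "continuation of CP from previous interval",
    "#": "continuation of CP from previous interval, NVLog for next CP is full",
}

CP_PHASES = {
    "0": "Initializing",
    "n": "Processing normal files",
    "s": "Processing special files",
    "q": "Processing quota files",
    "f": "Flushing modified data to disk",
    "v": "Flushing modified superblock to disk",
}


def decode_cp_ty(field: str) -> Tuple[str, str]:
    """Return (type description, phase description) for one CP ty value."""
    text = field.strip()
    if not text:
        raise ValueError("empty CP ty field")
    if text == "-":
        return ("No CP started during sampling interval", "")
    if text.isdigit():
        return (f"{text} CPs started during sampling interval", "")
    kind, phase = text[0], text[1:]
    if kind not in CP_TYPES:
        raise ValueError(f"unknown CP type: {kind!r}")
    if phase == "":
        # A blank second character means the CP finished within the interval.
        return (CP_TYPES[kind], "completed during sampling interval")
    if phase not in CP_PHASES:
        raise ValueError(f"unknown CP phase: {phase!r}")
    return (CP_TYPES[kind], CP_PHASES[phase])


print(decode_cp_ty("Tf"))
print(decode_cp_ty(":n"))
```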

     

CP util : The Consistency Point (CP) utilization: the percentage of time spent in a CP. A value of 100% is a good thing: it means that all of the CPU time set aside for writing data was actually used. A value of 75% means that only 75% of the time allocated to writing data was used, so 25% of it was wasted. A good CP percentage is at or near 100%.

 

Examples:

 

sysstat
Display the default output every 15 seconds. Requires control-C to terminate.

sysstat 1
Display the default output every second. Requires control-C to terminate.

sysstat -s 1
Display the default output every second. Upon control-C termination, print the summary statistics.

sysstat -c 10
Display the default output every 15 seconds, stopping after the 10th iteration.

sysstat -c 10 -s -u 2

sysstat -u -c 10 -s 2
Either form displays the utilization output format every 2 seconds, stopping after the 10th iteration. Upon completion, print the summary statistics.

sysstat -x -s 5
Display the extended (full) output every 5 seconds. Upon control-C termination, print the summary statistics.
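
If you generate these command lines from a script, it helps to enforce the rules from the synopsis: at most one display format, and positive integers for count and interval. The helper below is a hypothetical sketch that only assembles the argument list. It does not run anything, and how the command reaches the filer (console, ssh, and so on) is left out.

```python
from typing import List, Optional

# The mutually exclusive format flags from the synopsis: -u | -x | -m | -f | -i | -b
FORMATS = {"u", "x", "m", "f", "i", "b"}


def build_sysstat_args(interval: Optional[int] = None,
                       count: Optional[int] = None,
                       summary: bool = False,
                       fmt: Optional[str] = None) -> List[str]:
    """Assemble a sysstat command line as a list of arguments."""
    args = ["sysstat"]
    if count is not None:
        if count <= 0:
            raise ValueError("count must be a positive, nonzero integer")
        args += ["-c", str(count)]
    if summary:
        args.append("-s")
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format flag: {fmt!r}")
        args.append(f"-{fmt}")
    if interval is not None:
        if interval <= 0:
            raise ValueError("interval must be a positive, nonzero integer")
        args.append(str(interval))
    return args


print(" ".join(build_sysstat_args(interval=2, count=10, summary=True, fmt="u")))
# sysstat -c 10 -s -u 2
```

Taking a single fmt argument, instead of one boolean per flag, makes it impossible to request two display formats at once.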
