pbstop

Language: en

Version: 2009-02-26 (fedora - 04/07/09)

Section: 1 (User commands)

NAME

pbstop - monitoring utility for OpenPBS or Torque

SYNOPSIS

pbstop [OPTION]... [@hostname]...

DESCRIPTION

Draws a full-terminal display of your nodes and jobs. The default grid shows each node's 1st CPU as a single character. The specific character denotes the state of the node or identifies the job running on that CPU. The job listing shows the job name, queue name, state, etc. and, on the far left, the character used to identify nodes in the upper grid. Pressing a number key will toggle the display of that CPU on all of the nodes.

This program runs best if the "perl-PBS" module is installed. While there is currently no loss of features without it, pbstop runs much faster with it. If you are unsure whether the PBS module is installed, run this program, hit "h", and look for the Backend information at the top right.

COMMAND-LINE OPTIONS

-s num
seconds between refreshes
-c num
number of columns to display in the grid (0 scales based on term width)
-m num
max number of cpus in a node before it gets its own grid
-n
don't put spaces between each node in the grid display for a more compact display (no space)
-q
queue name for limiting the view of the grid and job list. Only one name is supported at this time. No corresponding interactive command.
-u
usernames for limiting the view of the grid and job list. Can be a comma-separated list of usernames or "all". "me" is a pseudonym for the username running pbstop(1).
-C
toggle colorization
-S
toggle state summary display
-G
toggle grid display
-Q
toggle queue display
-t
toggle showing queued jobs in queue display
-[0-9]...
cpu numbers for grid display
-J
toggle jobs in grid display
-fillbg
fill the background with black instead of using the terminal's default
-V
print version and exit

INTERACTIVE COMMANDS

Several single-key commands are recognized while pbstop(1) is running. The arrow keys, PageUp, and PageDown keys will scroll the display if it doesn't fit in your terminal.

When prompted to type something, ctrl-g can be used to cancel the command.

space
Immediately update display
q
Quit pbstop(1)
h
Display help screen, version, and current settings
c
Prompts for the number of columns to display the node grid (0 auto-scales based on term width)
s
Prompts for the number of seconds to wait between display updates
u
Prompts for a username. The grid and job listing will be limited to the named user. Input "all" will remove all limitations (the default), and "me" will limit to the current username running pbstop(1). If the username or "me" is prefixed with a "+" or "-", the username will be added or removed from the list of usernames to be limited. "a" and "m" are shortcuts for "all" and "me".
/
Prompts the user for a search string and displays the details of the matching object. The search can optionally begin with one of the following pattern specifiers (think: mutt): "~s" for a server, "~n" for a node, or "~j" for a job number. If no pattern specifier is found, pbstop will attempt to find the object that best matches the search string. The string can be a server name, nodename, or a job number. Nodenames can optionally be followed by a space and the server name. Job numbers may optionally be followed by a dot and the server name.

If an object is found, a subwindow will be opened displaying details. Hit "q" to exit the window.

When viewing a job detail subwindow, pressing "l" is a shortcut for jumping directly to the associated job's node load subwindow.

(Mnemonic: like using / to search for text in vi or less)

l
Prompts the user for a job id. A node load report subwindow will be displayed for the given jobid. This subwindow shows the current load average, the physical and available memory, and the number of sessions. Available physical memory will be negative in the event of swapping. If the number of sessions is 0, that might indicate a problem on that node.

Pressing "l" in this subwindow jumps directly to the associated job detail subwindow, as if the user had typed "/jobid".

(Mnemonic: load average)

C
Toggle the use of the colors in the display
S
Toggle the display of the state summary
G
Toggle the display of the node grid
Q
Toggle the display of the job queue
t
Toggle the display of currently queued (not running) jobs in the display. This can reduce the size of the queue display considerably in some environments.

(Mnemonic: I don't know, toggle? "Q" was already used for something more important)

J
Toggle the display of job letters in the node grid. This is handy because you can see the node state "hidden" behind the job letter. For example, use this to see which nodes that have jobs are not yet "busy".
f
Toggle background fill with black instead of using the terminal's default. Use this if the display looks bad on your colored or transparent background.
Any single number (0-9)
Toggle display of that CPU number in the display. This is confusing at first, but useful in SMP environments (See SMP section below).
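
The rules for the "u" prompt can be illustrated with a short Python sketch. This is not pbstop's actual code (pbstop is written in Perl). The function name update_user_filter is hypothetical, representing "all" as an empty list is an assumption, and so is the handling of the "a" and "m" shortcuts when combined with "+" or "-".

```python
def update_user_filter(current, entry, me):
    """Return the new list of usernames to limit the display to.

    An empty list stands for "all" (no limitation); this is an assumption
    of the sketch, not pbstop's internal representation.
    """
    entry = entry.strip()
    if not entry:
        return list(current)          # nothing typed: keep the filter as-is

    op = ""
    if entry[0] in ("+", "-"):        # optional add/remove prefix
        op, entry = entry[0], entry[1:]

    if entry in ("a", "all"):         # "a" is the shortcut for "all"
        return []
    if entry in ("m", "me"):          # "m" is the shortcut for "me"
        entry = me

    if op == "+":                     # add to the existing list
        return list(current) if entry in current else list(current) + [entry]
    if op == "-":                     # remove from the existing list
        return [user for user in current if user != entry]
    return [entry]                    # no prefix: limit to this one user


users = update_user_filter([], "me", me="alice")      # ['alice']
users = update_user_filter(users, "+bob", me="alice")  # ['alice', 'bob']
users = update_user_filter(users, "-me", me="alice")   # ['bob']
```

One consequence of the shortcuts, at least in this sketch, is that users literally named "a" or "m" cannot be selected by name.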

STARTUP

pbstop(1) has many configuration variables that can be set on the command line, interactively, or from configuration files. When pbstop(1) starts, it first initializes these variables with built-in defaults, then reads /etc/pbstoprc, then reads ~/.pbstoprc, and finally parses the command line arguments. Note that several of the command line arguments and interactive commands are toggles: they flip the current value rather than set it directly. In contrast, the configuration files are not toggles; they set values explicitly.
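
The startup order can be sketched in Python as follows. This is an illustration of the precedence only, not pbstop's implementation. The helper names (parse_rc, load_settings) are hypothetical, the default values for columns and colorize are assumptions, and treating "#" lines as comments is inferred from the sample configuration file.

```python
# Built-in defaults. "mixed" is the documented nodesort default; the other
# two values are assumed for the sake of the example.
DEFAULTS = {"nodesort": "mixed", "columns": "0", "colorize": "1"}


def parse_rc(text):
    """Parse name=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")   # split at the first "="
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def load_settings(rc_texts, cli_values=None, cli_toggles=()):
    """Apply defaults, then each rc file in order, then the command line."""
    settings = dict(DEFAULTS)
    for text in rc_texts:             # /etc/pbstoprc first, then ~/.pbstoprc
        settings.update(parse_rc(text))
    settings.update(cli_values or {})  # options that take a value, like -s
    for name in cli_toggles:           # options that flip a value, like -C
        settings[name] = "0" if settings.get(name, "0") == "1" else "1"
    return settings


site_rc = "colorize=1\nsleeptime=10\n"
user_rc = "# I'm grumpy and don't like color\ncolorize=0\n"
result = load_settings([site_rc, user_rc],
                       cli_values={"sleeptime": "5"},
                       cli_toggles=["colorize"])
# colorize ends up "1": the user's rc file set it to 0, then -C flipped it.
```

The last line shows why the toggle behaviour matters: the effect of "-C" depends on what the configuration files left behind.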

The configuration files may contain the following name=value pairs:

columns
Number of columns in the node grid, positive integer (0 scales based on term width)
sleeptime
Number of seconds to pause between display updates, positive integer
colorize
Use colors in the display, 1 or 0
show_summary
Display the summary at the top of the display, 1 or 0
compact_summary
Show node state summary on one line, 1 or 0
showncpus
Show the NCPUs job resource in the queue display, 1 or 0.
nodesort
Define the sorting method for the nodes in the main display grid. The current possible methods are:
ordered
Preserves the order given from pbs_server without sorting; good for nodes that don't follow a specific pattern or order.
lexical
Simple alphabetical sort. Fastest method for nodes with zero-padded names such as node0023.
integer
Integer sort on the first number found in each node name. Useful if you are unfortunate enough to not have zero-padded nodes, like node1 and node23.
mixed
Lexical sort followed by an integer sort. Should give meaningful results in all cases, especially if you are *really* unfortunate enough to not have zero-padded nodes and have different leading strings, like lin34 and win5. This is the default.
mixed2
Mixed sort followed by another mixed sort. Useful for pathological admins that name their nodes after rack positions, like rack1node4 and rack10node12.
nodesort_host
Defines sorting methods on a per-server basis. It is a comma-delimited list of "host=method" pairs surrounded by parentheses, e.g. nodesort_host=(serv1=ordered,serv2=lexical). The host part is first checked as an exact match; otherwise it is interpreted as a perl regexp (first match wins).
nospace
No space between nodes in grid for a more compact display, 1 or 0
show_grid
Show the node grid, 1 or 0
show_queue
Show the job queue, 1 or 0
show_qqueue
Show queued (not running) jobs in the queue display, 1 or 0
show_jobs
Show job and color information in the node grid, 1 or 0
show_cpu
Comma-separated list of CPU numbers to display
show_onlyq
Queue name to limit the view in the grid and job list. Only one name is supported at this time.
show_user
Usernames to limit the view in the grid and job list. Can be a comma-separated list of users, "all", or "me".

It might be reasonable for a site to have "show_user=me" in /etc/pbstoprc and for admin users to have "show_user=all" in their own ~/.pbstoprc.

Members of a group might want all of their groupmates' usernames in their own ~/.pbstoprc.

host
Comma-separated list of hostnames running pbs_server
maxrows
Number of rows in the large scrollable panel
maxcolums
Number of columns in the large scrollable panel
maxnodegrid
Maximum number of CPUs in a node before it gets its own grid, positive integer
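
The five nodesort methods can be approximated in Python as shown here. This is a sketch built from the descriptions above, not a port of pbstop's Perl code. The function names are hypothetical, and the exact tie-breaking rules are assumptions.

```python
import re


def first_int(name):
    """Return the first run of digits in name as an int (0 if none)."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else 0


def mixed_key(name, pairs):
    """Build a key from the first `pairs` (text, number) pieces of name.

    "rack10node12" splits into ("rack", 10) and ("node", 12). The full
    name is appended as a tie-breaker.
    """
    parts = re.findall(r"(\D*)(\d+)", name)
    key = tuple((text, int(num)) for text, num in parts[:pairs])
    return (key, name)


def sort_nodes(names, method="mixed"):
    if method == "ordered":     # keep the order given by pbs_server
        return list(names)
    if method == "lexical":     # plain alphabetical sort
        return sorted(names)
    if method == "integer":     # numeric sort on the first number found
        return sorted(names, key=first_int)
    if method == "mixed":       # leading string, then its number
        return sorted(names, key=lambda n: mixed_key(n, 1))
    if method == "mixed2":      # two (string, number) pairs
        return sorted(names, key=lambda n: mixed_key(n, 2))
    raise ValueError("unknown nodesort method: " + method)


print(sort_nodes(["node23", "node1", "node4"], "lexical"))
# ['node1', 'node23', 'node4']
print(sort_nodes(["node23", "node1", "node4"], "integer"))
# ['node1', 'node4', 'node23']
print(sort_nodes(["win5", "lin34", "lin4"], "mixed"))
# ['lin4', 'lin34', 'win5']
```

The lexical result shows why unpadded names need the integer or mixed methods: node23 sorts ahead of node4 when compared character by character.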

A sample configuration file:

     # I'm grumpy and don't like color
     colorize=0
 
     # my 6 CPU machine should get a separate grid
     maxnodegrid=5
 
     # all of my Torque servers
     host=teraserver,bigbird,testhpc
 
     # teraserver has strict naming, testhpc has useless naming
     nodesort_host=(.*\.usc.edu=integer,teraserver=lexical,testhpc=ordered)
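
How a nodesort_host value like the one in the sample might be resolved for a given server can be sketched in Python. This is an illustration of the documented "exact match first, then first regexp match" rule only. The function names are hypothetical, Python's re module stands in for perl regexps, and the use of an unanchored search and a fallback to "mixed" are assumptions.

```python
import re


def parse_nodesort_host(value):
    """Turn "(host=method,host=method)" into a list of (host, method)."""
    inner = value.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    pairs = []
    for item in inner.split(","):          # patterns with commas not handled
        host, sep, method = item.rpartition("=")
        if sep:
            pairs.append((host.strip(), method.strip()))
    return pairs


def method_for(server, pairs, default="mixed"):
    """Pick the sort method for server: exact match, then first regexp."""
    for host, method in pairs:
        if host == server:                  # pass 1: exact hostname match
            return method
    for host, method in pairs:
        try:
            if re.search(host, server):     # pass 2: first regexp that matches
                return method
        except re.error:
            continue                        # skip patterns that fail to compile
    return default


pairs = parse_nodesort_host(
    r"(.*\.usc.edu=integer,teraserver=lexical,testhpc=ordered)")
print(method_for("teraserver", pairs))          # lexical (exact match)
print(method_for("teraserver.usc.edu", pairs))  # integer (first regexp wins)
print(method_for("bigbird", pairs))             # mixed (no match, default)
```

Because the regexp pass takes the first match, the order of the pairs matters: in this sketch "teraserver.usc.edu" gets "integer" even though the "teraserver" pattern would also match.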
 
 

SMP ENVIRONMENTS

pbstop(1) was developed with three specific clusters in mind: a 1700-node cluster of dual SMP machines, a 64-proc SMP with 16 single-node machines, and a 21-node cluster of single procs without nicely numbered hostnames. With this kind of pedigree, pbstop(1) is fairly flexible.

The number of columns in the grid can be shrunk or expanded on the command line with "-c", or interactively with "c". Additional CPUs can be displayed by pressing the appropriate number key. Using the number keys is confusing at first, but if you try it a few times it will become natural. By default, nodes with 8 or more CPUs are displayed in a separate grid.

The first two clusters mentioned above display well with the defaults. The third is typically displayed with the number of columns set to "1".

FILES

/etc/pbstoprc
The global configuration file
~/.pbstoprc
The personal configuration file.

ENVIRONMENTAL VARIABLES

PBS_DEFAULT
The server's hostname (same as most PBS client commands)

SEE ALSO

PBS(3pm), qstat(1B)

BUGS

The large Job structure uses the servername supplied by the user, while the Job structure uses the servername returned by the server, so they don't match up (this makes the jobloadreport imprecise). The curses code is very inefficient and the display gets corrupted at times. pbstop can't produce plain text output like top's "batch" mode. grep FIXME from pbstop for more!

AUTHOR

pbstop(1) was originally written by Garrick Staples <garrick@usc.edu>. The node grid and lettering concept is from Dennis Smith. Thanks to Egan Ford and the xCAT mailing list for testing and feedback.
