sacct

Language: en

Version: 254060 (debian - 07/07/09)

Section: 1 (User commands)

NAME

sacct - displays accounting data for all jobs and job steps in the SLURM job accounting log

SYNOPSIS

sacct [options]

DESCRIPTION

Accounting information for jobs invoked with SLURM is logged in the job accounting log file.

The sacct command displays job accounting data stored in the job accounting log file in a variety of forms for your analysis. The sacct command displays information on jobs, job steps, status, and exitcodes by default. You can tailor the output with the use of the --format= option to specify the fields to be shown.

For the root user, the sacct command displays job accounting data for all users, although there are options to filter the output to report only the jobs from a specified user or group.

For the non-root user, the sacct command limits the display of job accounting data to jobs that were launched with their own user identifier (UID) by default. Data for other users can be displayed with the --all, --user, or --uid options.

Note:
Much of the data reported by sacct has been generated by the wait3() and getrusage() system calls. Some systems gather and report incomplete information for these calls; sacct reports values of 0 for this missing data. See your system's getrusage(2) man page for information about which data are actually available on your system.
If --dump is specified, the field selection options (--brief, --format, ...) have no effect, and elapsed time fields are presented as two fields: integral seconds and integral microseconds.
If --dump is not specified, elapsed time fields are presented as [[days-]hours:]minutes:seconds.hundredths.
The default input file is the file named in the jobacct_logfile parameter in slurm.conf.
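
The formatted elapsed-time layout described in the note above can be converted back to seconds with a small parser. This is a minimal sketch, not part of sacct: the function name parse_elapsed is hypothetical, and treating the fractional part as optional is an assumption.

```python
import re

# Pattern for [[days-]hours:]minutes:seconds.hundredths as described above.
# Assumption: the fractional part may be absent in some outputs.
_ELAPSED = re.compile(
    r"^(?:(?:(?P<days>\d+)-)?(?P<hours>\d+):)?"
    r"(?P<minutes>\d+):(?P<seconds>\d+)(?:\.(?P<frac>\d+))?$"
)


def parse_elapsed(text):
    """Convert a formatted elapsed-time string to seconds (float)."""
    match = _ELAPSED.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized elapsed time: {text!r}")
    days = int(match.group("days") or 0)
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes"))
    seconds = int(match.group("seconds"))
    frac = match.group("frac")
    # Scale the fraction by its own digit count (two digits = hundredths).
    fraction = int(frac) / (10 ** len(frac)) if frac else 0.0
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds + fraction
```

For instance, under these assumptions parse_elapsed("1-02:03:04.50") covers one day, two hours, three minutes, and 4.5 seconds.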

Options

-a, --all
Displays the job accounting data for all jobs in the job accounting log file.
This is the default behavior when the sacct command is executed by the root user.
-A account_list, --accounts=account_list
Displays the statistics only for the jobs started on the accounts specified by the account_list operand, which is a comma-separated list of account names. Space characters are not allowed in the account_list. Default is all accounts.
-b, --brief
Displays a brief listing, which includes the following data:
jobid
status
exitcode
This option has no effect when the --dump option is also specified.
-C cluster_list, --clusters=cluster_list
Displays the statistics only for the jobs started on the clusters specified by the cluster_list operand, which is a comma-separated list of clusters. Space characters are not allowed in the cluster_list. Specify -1 for all clusters; the default is the cluster on which the sacct command is executed.
-d, --dump
Displays (dumps) the raw data records.
This option overrides the --brief and --format= options.
The section titled "INTERPRETING THE --dump OPTION OUTPUT" describes the data output when this option is used.
--duplicates
If SLURM job ids are reset, but the job accounting log file isn't reset at the same time (with -e, for example), some job numbers will probably appear more than once in the accounting log file to refer to different jobs; such jobs can be distinguished by the "submit" time stamp in the data records.
When data for specific jobs are requested with the --jobs option, we assume that the user wants to see only the most recent job with that number. This behavior can be overridden by specifying --duplicates, in which case all records that match the selection criteria will be returned.
When --jobs is not specified, we report data for all jobs that match the selection criteria, even if some of the job numbers are reused. Specify that you only want the most recent job for each selected job number with the --noduplicates option.

-e time_spec, --expire=time_spec
Removes job data from SLURM's current accounting log file (or the file specified with --file) for jobs that completed more than time_spec ago and appends them to the expired log file.
If time_spec is an integer value only, it is interpreted as minutes. If time_spec is an integer followed by "h", it is interpreted as a number of hours. If time_spec is an integer followed by "d", it is interpreted as a number of days. For example, "--expire=14d" purges the job accounting log of all jobs that completed more than 14 days ago.
The expired log file is a file with the same name as the accounting log file, with ".expired" appended to the file name. For example, if the accounting log file is /var/log/slurmacct.log, the expired log file will be /var/log/slurmacct.log.expired.
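
The time_spec rules and the expired log file naming described above can be sketched as follows. This only illustrates the interpretation, not sacct's own code: the function names are hypothetical, and accepting only lowercase "h" and "d" suffixes is an assumption.

```python
def time_spec_to_minutes(time_spec):
    """Interpret an --expire time_spec as a number of minutes."""
    spec = time_spec.strip()
    if spec.endswith("d"):
        # Integer followed by "d": a number of days.
        return int(spec[:-1]) * 24 * 60
    if spec.endswith("h"):
        # Integer followed by "h": a number of hours.
        return int(spec[:-1]) * 60
    # Bare integer: minutes.
    return int(spec)


def expired_log_path(log_path):
    """Return the expired log file name for a given accounting log file."""
    return log_path + ".expired"
```

With these helpers, "14d" corresponds to 20160 minutes, and /var/log/slurmacct.log maps to /var/log/slurmacct.log.expired, matching the example in the text.
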
--endtime=endtime
Select jobs eligible before this time. Valid formats are:
        HH:MM[:SS] [AM|PM]
        MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
        MM/DD[/YY]-HH:MM[:SS]
-F format_list, --format=format_list
Displays the job accounting data specified by the format_list operand, which is a comma-separated list of fields. Space characters are not allowed in the format_list.
See the --helpformat option for a list of the available fields. See the section titled "Job Accounting Fields" for a description of each field.
The job accounting data is displayed in the order specified by the format_list operand. Thus, the following two commands display the same data but in different order:

 # sacct --format=jobid,status
 Jobid      Status
 ---------- ----------
 3          COMPLETED
 3.0        COMPLETED

 # sacct --format=status,jobid
 Status     Jobid
 ---------- ----------
 COMPLETED  3
 COMPLETED  3.0

The default value for the format_list operand is "jobid,jobname,partition,ncpus,state,exitcode".
This option has no effect when the --dump option is also specified.
-f file, --file=file
Causes the sacct command to read job accounting data from the named file instead of the current SLURM job accounting log file.
-g gid_list, --gid=gid_list
Displays the statistics only for the jobs started with the GID specified by the gid_list operand, which is a comma-separated list of gids. Space characters are not allowed in the gid_list. Default is no restrictions. This is virtually the same as the --group option.
-g group_list, --group=group_list
Displays the statistics only for the jobs started with the group specified by the group_list operand, which is a comma-separated list of groups. Space characters are not allowed in the group_list. Default is no restrictions. This is virtually the same as the --gid option.
-h, --help
Displays a general help message.
--helpformat
Displays a list of fields that can be specified with the --format option.

 Fields available:
 account     associd     cluster     cpu       
 cputime     elapsed     eligible    end       
 exitcode    finished    gid         group     
 job         jobid       jobname     ncpus     
 nodes       nnodes      nprocs      ntasks    
 pages       partition   rss         start     
 state       status      submit      timelimit 
 submitted   systemcpu   uid         user      
 usercpu     vsize       blockid     connection
 geo         max_procs   reboot      rotate    
 bg_start_point  wckey     
 
 
The section titled "Job Accounting Fields" describes these fields.
-j job(.step), --jobs=job(.step)
Displays information about the specified job(.step) or list of job(.step)s.
The job(.step) parameter is a comma-separated list of jobs. Space characters are not permitted in this list.
The default is to display information on all jobs.
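
The job(.step) list format described above can be split into job and step numbers as sketched below. The helper name parse_job_list is hypothetical, and treating job and step identifiers as plain integers is an assumption.

```python
def parse_job_list(job_list):
    """Split a --jobs operand into (job, step) pairs; step is None if omitted."""
    if " " in job_list:
        raise ValueError("space characters are not permitted in the job list")
    selections = []
    for item in job_list.split(","):
        # "4.0" -> job "4", step "0"; "3" -> job "3", no step.
        job, _, step = item.partition(".")
        selections.append((int(job), int(step) if step else None))
    return selections
```

For example, the operand "3,4.0" selects all of job 3 and step 0 of job 4.
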
-l, --long
Displays a long listing, which includes the following data:
jobid
jobname
partition
vsize
rss
pages
cputime
ntasks
ncpus
elapsed
status
exitcode
--noduplicates
See the discussion under --duplicates.
--noheader
Prevents the display of the heading over the output. The default action is to display a header.
This option has no effect when used with the --dump option.
-O, --formatted_dump
Dumps accounting records in an easy-to-read format.
This option is provided for debugging.
-P, --purge
Used in conjunction with --expire to remove invalid data from the job accounting log.
-p partition_list, --partition=partition_list
Displays information about jobs and job steps specified by the partition_list operand, which is a comma-separated list of partitions. Space characters are not allowed in the partition_list.
The default is to display information on jobs and job steps on all partitions.
-S, --stat
Queries the status of a running job and displays the following data:
jobid
vsize
rss
pages
cputime
ntasks
status
You must also include the --jobs=job(.step) option. If no (.step) is given, you will receive the job.0 step.
-s state_list, --state=state_list
Selects jobs based on their current state, which can be designated with the following state designators:
r   running
s   suspended
ca  cancelled
cd  completed
pd  pending
f   failed
to  timed out
nf  node_fail
The state_list operand is a comma-separated list of these state designators. Space characters are not allowed in the state_list.
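
The state designators listed above can be checked and expanded as sketched below. The table comes from this page, but the function name expand_state_list is hypothetical, and matching designators in lowercase only is an assumption.

```python
# Designator -> state name, as listed for the --state option.
STATE_DESIGNATORS = {
    "r": "running",
    "s": "suspended",
    "ca": "cancelled",
    "cd": "completed",
    "pd": "pending",
    "f": "failed",
    "to": "timed out",
    "nf": "node_fail",
}


def expand_state_list(state_list):
    """Translate a comma-separated state_list into full state names."""
    if " " in state_list:
        raise ValueError("space characters are not allowed in the state_list")
    names = []
    for designator in state_list.split(","):
        if designator not in STATE_DESIGNATORS:
            raise ValueError(f"unknown state designator: {designator!r}")
        names.append(STATE_DESIGNATORS[designator])
    return names
```
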
--starttime=starttime
Select jobs eligible after this time. Valid formats are:
        HH:MM[:SS] [AM|PM]
        MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
        MM/DD[/YY]-HH:MM[:SS]
-t, --total
Displays only the cumulative statistics for each job. Intermediate steps are displayed by default.
-u uid_list, --uid=uid_list
Displays the statistics only for the jobs started by the users specified in the uid_list operand, which is a comma-separated list of uids. Space characters are not allowed in the uid_list. Specify -1 for all uids; the default is the current uid, or all users when run as root. This is virtually the same as the --user option.
-u user_list, --user=user_list
Displays the statistics only for the jobs started by the users specified in the user_list operand, which is a comma-separated list of users. Space characters are not allowed in the user_list. Specify -1 for all uids; the default is the current uid, or all users when run as root. This is virtually the same as the --uid option.
--usage
Displays a help message.
-v, --verbose
Reports the state of certain variables during processing. This option is primarily used for debugging.
-W wckey_list, --wckeys=wckey_list
Displays the statistics only for the jobs started on the wckeys specified by the wckey_list operand, which is a comma-separated list of wckey names. Space characters are not allowed in the wckey_list. Default is all wckeys.

Job Accounting Fields

The following describes each job accounting field:
account
User supplied account number for the job
blockid
Block ID, applicable to BlueGene computers only
cpu
The sum of the system time (systemcpu) and user time (usercpu) in seconds
cputime
Minimum CPU time of any process followed by its task id along with the average of all processes running in the step.
elapsed
The job's elapsed time.
The format of this field's output is as follows:
[DD-[hh:]]mm:ss
as defined by the following:
DD  days
hh  hours
mm  minutes
ss  seconds
end
Termination time of the job. Format output is as follows:
MM/DD-hh:mm:ss
as defined by the following:
MM  month
DD  day
hh  hours
mm  minutes
ss  seconds
exitcode
The first non-zero error code returned by any job step.
gid
The group identifier of the user who ran the job.
group
The group name of the user who ran the job.
idrss
Maximum unshared data size (in KB) of any process.
inblocks
Total block input operations for all processes.
isrss
Maximum unshared stack space size (in KB) of any process.
ixrss
Maximum shared memory (in KB) of any process.
job
The SLURM job identifier of the job.
jobid
The number of the job or job step. It is in the form: job.jobstep.
jobname
The name of the job or job step.
majflt
Maximum number of major page faults for any process.
minflt
Maximum number of minor page faults (page reclaims) for any process.
msgrcv
Total number of messages received for all processes.
msgsnd
Total number of messages sent for all processes.
ncpus
Total number of CPUs allocated to the job.
nivcsw
Total number of involuntary context switches for all processes.
nodes
A list of nodes allocated to the job.
nprocs
Total number of tasks in job. Identical to ntasks.
nsignals
Total number of signals received for all processes.
nswap
Maximum number of swap operations of any process.
ntasks
Total number of tasks in job.
nvcsw
Total number of voluntary context switches for all processes.
outblocks
Total block output operations for all processes.
pages
Maximum page faults of any process followed by its task id along with the average of all processes running in the step.
partition
Identifies the partition on which the job ran.
rss
Maximum resident set size of any process followed by its task id along with the average of all processes running in the step.
start
Initiation time of the job in the same format as end.
status
Displays the job status, or state.
Output can be RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, or NODE_FAIL.
submit
The time and date stamp (in Coordinated Universal Time, UTC) at which the job was submitted. The format of the output is identical to that of the end field.
systemcpu
The amount of system CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so it can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.
uid
The user identifier of the user who ran the job.
uid.gid
The user and group identifiers of the user who ran the job. (This field is used in record headers, and simply concatenates the uid and gid fields.)
user
The user name of the user who ran the job.
usercpu
The amount of user CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so it can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.
vsize
Maximum Virtual Memory size of any process followed by its task id along with the average of all processes running in the step.
wckey
Workload Characterization Key. Arbitrary string for grouping orthogonal accounts together.

INTERPRETING THE --dump OPTION OUTPUT

The sacct command's --dump option displays data in a horizontal list of fields depending on the record type; there are three record types: JOB_START, JOB_STEP, and JOB_TERMINATED. There is a subsection that describes the output for each record type.

When the data output is a job accounting field, as described in the section titled "Job Accounting Fields", only the name of the job accounting field is listed. Otherwise, additional information is provided.

Note:
The output for the JOB_STEP and JOB_TERMINATED record types presents a pair of fields for each of the following data: Total CPU time, Total User CPU time, and Total System CPU time. The first field of each pair is the time in seconds expressed as an integer. The second field of each pair is the fractional number of seconds multiplied by one million. Thus, a pair of fields output as "1 024315" means that the time is 1.024315 seconds. The least significant digits in the second field are truncated in formatted displays.
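
The seconds and microseconds pairing described in the note can be recombined as sketched below. The function name combine_time_pair is hypothetical; Decimal is used here only to avoid binary floating-point rounding in the illustration.

```python
from decimal import Decimal


def combine_time_pair(seconds_field, microseconds_field):
    """Combine an integer-seconds field and a microseconds field into seconds."""
    whole = Decimal(int(seconds_field))
    # The second field is the fractional part multiplied by one million.
    fraction = Decimal(int(microseconds_field)) / Decimal(1_000_000)
    return whole + fraction
```

Applied to the pair from the note, combine_time_pair("1", "024315") gives Decimal("1.024315").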

Output for the JOB_START Record Type

The following describes the horizontal fields output by the sacct --dump option for the JOB_START record type.
Field #  Field
1        job
2        partition
3        submitted
4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5        uid.gid
6        (Reserved)
7        JOB_START (literal string)
8        Job Record Version (1)
9        The number of fields in the record (16)
10       uid
11       gid
12       The job name
13       Batch Flag (0=no batch)
14       Relative SLURM priority
15       ncpus
16       nodes
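
The JOB_START layout above can be mapped onto named fields as sketched below. This is an illustration under stated assumptions, not sacct's parser: it assumes the fields are separated by whitespace and that no field contains embedded spaces, neither of which this page confirms. The key names and the sample record are hypothetical.

```python
# Field names in record order, following the JOB_START table above.
JOB_START_FIELDS = [
    "job", "partition", "submitted", "start", "uid.gid", "reserved",
    "record_type", "version", "field_count", "uid", "gid", "jobname",
    "batch_flag", "priority", "ncpus", "nodes",
]


def parse_job_start(line):
    """Map one whitespace-separated JOB_START record onto field names."""
    values = line.split()
    if len(values) != len(JOB_START_FIELDS):
        raise ValueError(f"expected {len(JOB_START_FIELDS)} fields, got {len(values)}")
    record = dict(zip(JOB_START_FIELDS, values))
    if record["record_type"] != "JOB_START":
        raise ValueError(f"not a JOB_START record: {record['record_type']!r}")
    return record


# Hypothetical record, invented for illustration only.
sample = "3 debug 1185000000 1185000005 500.100 - JOB_START 1 16 500 100 script01 0 100 1 node01"
record = parse_job_start(sample)
```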

Output for the JOB_STEP Record Type

The following describes the horizontal fields output by the sacct --dump option for the JOB_STEP record type.
Field #  Field
1        job
2        partition
3        submitted
4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5        uid.gid
6        (Reserved)
7        JOB_STEP (literal string)
8        Job Record Version (1)
9        The number of fields in the record (38)
10       jobid
11       end
12       Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
           CA  Cancelled
           CD  Completed successfully
           F   Failed
           NF  Job terminated from node failure
           R   Running
           S   Suspended
           TO  Timed out
13       exitcode
14       ntasks
15       ncpus
16       elapsed time in seconds expressed as an integer
17       Integer portion of the Total CPU time in seconds for all processes
18       Fractional portion of the Total CPU time for all processes expressed in microseconds
19       Integer portion of the Total User CPU time in seconds for all processes
20       Fractional portion of the Total User CPU time for all processes expressed in microseconds
21       Integer portion of the Total System CPU time in seconds for all processes
22       Fractional portion of the Total System CPU time for all processes expressed in microseconds
23       rss
24       ixrss
25       idrss
26       isrss
27       minflt
28       majflt
29       nswap
30       inblocks
31       outblocks
32       msgsnd
33       msgrcv
34       nsignals
35       nvcsw
36       nivcsw
37       vsize

Output for the JOB_TERMINATED Record Type

The following describes the horizontal fields output by the sacct --dump option for the JOB_TERMINATED record type.
Field #  Field
1        job
2        partition
3        submitted
4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5        uid.gid
6        (Reserved)
7        JOB_TERMINATED (literal string)
8        Job Record Version (1)
9        The number of fields in the record (38)
         Although thirty-eight fields are displayed by the sacct command for the JOB_TERMINATED record, only fields 1 through 12 are recorded in the actual data file; the sacct command aggregates the remainder.
10       The total elapsed time in seconds for the job.
11       end
12       Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
           CA  Cancelled
           CD  Completed successfully
           F   Failed
           NF  Job terminated from node failure
           R   Running
           TO  Timed out
13       exitcode
14       ntasks
15       ncpus
16       elapsed time in seconds expressed as an integer
17       Integer portion of the Total CPU time in seconds for all processes
18       Fractional portion of the Total CPU time for all processes expressed in microseconds
19       Integer portion of the Total User CPU time in seconds for all processes
20       Fractional portion of the Total User CPU time for all processes expressed in microseconds
21       Integer portion of the Total System CPU time in seconds for all processes
22       Fractional portion of the Total System CPU time for all processes expressed in microseconds
23       rss
24       ixrss
25       idrss
26       isrss
27       minflt
28       majflt
29       nswap
30       inblocks
31       outblocks
32       msgsnd
33       msgrcv
34       nsignals
35       nvcsw
36       nivcsw
37       vsize

EXAMPLES

This example illustrates the default invocation of the sacct command:

 # sacct
 Jobid      Jobname    Partition    Ncpus Status     Exitcode
 ---------- ---------- ---------- ------- ---------- --------
 2          script01   srun             1 RUNNING           0
 3          script02   srun             1 RUNNING           0
 4          endscript  srun             1 RUNNING           0
 4.0                   srun             1 COMPLETED         0
 
 

This example shows the same job accounting information with the --brief option.


 # sacct --brief
 Jobid      Status     Exitcode
 ---------- ---------- --------
 2          RUNNING           0
 3          RUNNING           0
 4          RUNNING           0
 4.0        COMPLETED         0
 

This example shows only the cumulative statistics for each job, using the --total option.

 # sacct --total
 Jobid      Jobname    Partition    Ncpus Status     Exitcode
 ---------- ---------- ---------- ------- ---------- --------
 3          sja_init   andy             1 COMPLETED         0
 4          sjaload    andy             2 COMPLETED         0
 5          sja_scr1   andy             1 COMPLETED         0
 6          sja_scr2   andy            18 COMPLETED         2
 7          sja_scr3   andy            18 COMPLETED         0
 8          sja_scr5   andy             2 COMPLETED         0
 9          sja_scr7   andy            90 COMPLETED         1
 10         endscript  andy           186 COMPLETED         0
 
 

This example demonstrates the ability to customize the output of the sacct command. The fields are displayed in the order designated on the command line.


 # sacct --format=jobid,ncpus,ntasks,nsignals,status
 Jobid        Ncpus  Ntasks  Nsignals Status
 ---------- ------- ------- --------- ----------
 3                2       1         0 COMPLETED
 3.0              2       1         0 COMPLETED
 4                2       2         0 COMPLETED
 4.0              2       2         0 COMPLETED
 5                2       1         0 COMPLETED
 5.0              2       1         0 COMPLETED
 
 

COPYING

Copyright (C) 2005-2007 Hewlett-Packard Development Company L.P.

This file is part of SLURM, a resource management program. For details, see <https://computing.llnl.gov/linux/slurm/>.

SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

FILES

/etc/slurm.conf
Entries to this file enable job accounting and designate the job accounting log file that collects system job accounting.
/var/log/slurm_accounting.log
The default job accounting log file. By default, this file is set to read and write permission for root only.

SEE ALSO

ps(1), srun(1), squeue(1), getrusage(2), time(2)