sacct
Language: en
Version: 254060 (debian - 07/07/09)
Section: 1 (User Commands)
NAME
sacct - displays accounting data for all jobs and job steps in the SLURM job accounting log
SYNOPSIS
- sacct options
DESCRIPTION
- Accounting information for jobs invoked with SLURM is logged in the job accounting log file.
The sacct command displays job accounting data stored in the job accounting log file in a variety of forms for your analysis. By default, the sacct command displays information on jobs, job steps, status, and exit codes. You can tailor the output with the --format= option to specify the fields to be shown.
For the root user, the sacct command displays job accounting data for all users, although there are options to filter the output to report only the jobs from a specified user or group.
For the non-root user, the sacct command limits the display of job accounting data to jobs that were launched with their own user identifier (UID) by default. Data for other users can be displayed with the --all, --user, or --uid options.
- Note:
- Much of the data reported by sacct has been generated by the wait3() and getrusage() system calls. Some systems gather and report incomplete information for these calls; sacct reports values of 0 for this missing data. See your system's getrusage(2) man page for information about which data are actually available on your system.
- If --dump is specified, the field selection options (--brief, --format, ...) have no effect.
- If --dump is specified, elapsed time fields are presented as two fields: integral seconds and integral microseconds.
- If --dump is not specified, elapsed time fields are presented as [[days-]hours:]minutes:seconds.hundredths.
- The default input file is the file named in the jobacct_logfile parameter in slurm.conf.
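As a rough illustration of the two elapsed-time representations described above, the following Python sketch turns a --dump style pair (integral seconds, integral microseconds) into the [[days-]hours:]minutes:seconds.hundredths form. The function name is hypothetical, and truncating (rather than rounding) to hundredths and omitting zero-valued days and hours are assumptions, not documented sacct behavior.

```python
def format_elapsed(seconds: int, microseconds: int) -> str:
    """Render a (seconds, microseconds) pair as [[days-]hours:]minutes:seconds.hundredths.

    Hypothetical helper: hundredths are truncated, and days/hours are shown
    only when non-zero (assumptions, not sacct's documented behavior).
    """
    hundredths = microseconds // 10_000  # keep two fractional digits, truncated
    days, rest = divmod(seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    text = f"{minutes:02d}:{secs:02d}.{hundredths:02d}"
    if days:
        return f"{days}-{hours:02d}:{text}"
    if hours:
        return f"{hours:02d}:{text}"
    return text


print(format_elapsed(3725, 24315))    # 1 hour, 2 minutes, 5.024315 seconds
print(format_elapsed(90061, 500000))  # 1 day, 1 hour, 1 minute, 1.5 seconds
```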
Options
- -a , --all
- Displays the job accounting data for all jobs in the job accounting log file.
- This is the default behavior when the sacct command is executed by the root user.
- -A account_list, --accounts=account_list
- Displays the statistics only for the jobs started on the accounts specified by the account_list operand, which is a comma-separated list of account names. Space characters are not allowed in the account_list. Default is all accounts.
- -b , --brief
- Displays a brief listing, which includes the following data:
  • jobid
  • status
  • exitcode
- This option has no effect when the --dump option is also specified.
- -C cluster_list, --clusters=cluster_list
- Displays the statistics only for the jobs started on the clusters specified by the cluster_list operand, which is a comma-separated list of clusters. Space characters are not allowed in the cluster_list. Specify -1 for all clusters; the default is the cluster on which the sacct command is executed.
- -d , --dump
- Displays (dumps) the raw data records.
- This option overrides the --brief and --format= options.
- The section titled "INTERPRETING THE --dump OPTION OUTPUT" describes the data output when this option is used.
- --duplicates
- If SLURM job ids are reset, but the job accounting log file isn't reset at the same time (with -e, for example), some job numbers will probably appear more than once in the accounting log file to refer to different jobs; such jobs can be distinguished by the "submit" time stamp in the data records.
- When data for specific jobs are requested with the --jobs option, we assume that the user wants to see only the most recent job with that number. This behavior can be overridden by specifying --duplicates, in which case all records that match the selection criteria will be returned.
- When --jobs is not specified, we report data for all jobs that match the selection criteria, even if some of the job numbers are reused. Specify that you only want the most recent job for each selected job number with the --noduplicates option.
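The selection rule for reused job numbers can be sketched in Python as follows. Records are modeled as hypothetical (job number, submit time) tuples; reused job numbers are told apart only by the submit time stamp, and keeping the latest submit time per job number corresponds to the --noduplicates behavior described above. This is an illustration, not sacct's implementation.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int]  # (job number, submit time in epoch seconds); hypothetical shape


def most_recent_per_job(records: Iterable[Record]) -> List[Record]:
    """Keep only the record with the latest submit time for each job number."""
    latest: Dict[int, Record] = {}
    for job, submit in records:
        kept = latest.get(job)
        if kept is None or submit > kept[1]:
            latest[job] = (job, submit)
    return sorted(latest.values())


records = [(7, 1000), (8, 1100), (7, 5000)]  # job number 7 was reused after an ID reset
print(most_recent_per_job(records))
```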
- -e time_spec , --expire=time_spec
- Removes job data from SLURM's current accounting log file (or the file specified with --file) for jobs that completed more than time_spec ago and appends them to the expired log file.
- If time_spec is an integer value only, it is interpreted as minutes. If time_spec is an integer followed by "h", it is interpreted as a number of hours. If time_spec is an integer followed by "d", it is interpreted as number of days. For example, "--expire=14d" purges the job accounting log of all jobs that completed more than 14 days ago.
- The expired log file is a file with the same name as the accounting log file, with ".expired" appended to the file name. For example, if the accounting log file is /var/log/slurmacct.log, the expired log file will be /var/log/slurmacct.log.expired.
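The time_spec and file-naming rules above can be expressed as a short Python sketch. Both function names are hypothetical; the sketch accepts only the lowercase "h" and "d" suffixes that the text mentions.

```python
from datetime import timedelta


def parse_expire_spec(time_spec: str) -> timedelta:
    """Interpret an --expire time_spec: bare integer = minutes, 'h' = hours, 'd' = days."""
    units = {"h": "hours", "d": "days"}
    suffix = time_spec[-1:]
    if suffix in units:
        return timedelta(**{units[suffix]: int(time_spec[:-1])})
    return timedelta(minutes=int(time_spec))


def expired_log_name(accounting_log: str) -> str:
    """The expired log shares the accounting log's name with '.expired' appended."""
    return accounting_log + ".expired"


print(parse_expire_spec("14d"))  # 14 days, 0:00:00
print(expired_log_name("/var/log/slurmacct.log"))
```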
- --endtime=endtime
- Select jobs eligible before this time. Valid formats are:
    HH:MM[:SS] [AM|PM]
    MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
    MM/DD[/YY]-HH:MM[:SS]
- -F format_list , --format=format_list
- Displays the job accounting data specified by the format_list operand, which is a comma-separated list of fields. Space characters are not allowed in the format_list.
- See the --helpformat option for a list of the available fields. See the section titled "Job Accounting Fields" for a description of each field.
- The job accounting data is displayed in the order specified by the format_list operand. Thus, the following two commands display the same data but in a different order:

    # sacct --format=jobid,status
    Jobid      Status
    ---------- ----------
    3          COMPLETED
    3.0        COMPLETED

    # sacct --format=status,jobid
    Status     Jobid
    ---------- ----------
    COMPLETED  3
    COMPLETED  3.0
- The default value for the format_list operand is "jobid,jobname,partition,ncpus,state,exitcode".
- This option has no effect when the --dump option is also specified.
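The ordering rule can be mimicked with a small Python sketch that prints the requested fields, in the requested order, above a row of dashes. The 10-character column width and left alignment are assumptions read off the sample output above, not a specification of sacct's formatter, and the record dictionaries are hypothetical.

```python
from typing import Dict, List


def render(records: List[Dict[str, str]], format_list: str, width: int = 10) -> str:
    """Lay out the fields named in format_list, in that order, like a sacct listing."""
    if " " in format_list:
        raise ValueError("space characters are not allowed in format_list")
    fields = format_list.split(",")
    header = " ".join(f.capitalize().ljust(width) for f in fields)
    rule = " ".join("-" * width for _ in fields)
    rows = [" ".join(r[f].ljust(width) for f in fields) for r in records]
    return "\n".join(line.rstrip() for line in [header, rule, *rows])


jobs = [{"jobid": "3", "status": "COMPLETED"}, {"jobid": "3.0", "status": "COMPLETED"}]
print(render(jobs, "jobid,status"))
print(render(jobs, "status,jobid"))
```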
- -f file, --file=file
- Causes the sacct command to read job accounting data from the named file instead of the current SLURM job accounting log file.
- -g gid_list, --gid=gid_list
- Displays the statistics only for the jobs started with the GID specified by the gid_list operand, which is a comma-separated list of gids. Space characters are not allowed in the gid_list. Default is no restrictions. This is virtually the same as the --group option.
- -g group_list, --group=group_list
- Displays the statistics only for the jobs started with the GROUP specified by the group_list operand, which is a comma-separated list of groups. Space characters are not allowed in the group_list. Default is no restrictions. This is virtually the same as the --gid option.
- -h , --help
- Displays a general help message.
- --helpformat
- Displays a list of fields that can be specified with the --format option.
- Fields available:
    account associd cluster cpu cputime elapsed eligible end exitcode finished
    gid group job jobid jobname ncpus nodes nnodes nprocs ntasks pages partition
    rss start state status submit timelimit submitted systemcpu uid user usercpu
    vsize blockid connection geo max_procs reboot rotate bg_start_point wckey
- The section titled "Job Accounting Fields" describes these fields.
- -j job(.step) , --jobs=job(.step)
- Displays information about the specified job(.step) or list of job(.step)s.
- The job(.step) parameter is a comma-separated list of jobs. Space characters are not permitted in this list.
- The default is to display information on all jobs.
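To make the job(.step) syntax concrete, here is a Python sketch that splits a --jobs operand into (job, step) pairs. The function is hypothetical and only mirrors the rules stated above (comma-separated, no spaces, optional .step); it is not sacct's own parser.

```python
from typing import List, Optional, Tuple


def parse_jobs(spec: str) -> List[Tuple[int, Optional[int]]]:
    """Split a --jobs operand such as '3,4.0' into (job, step) pairs; step is None if absent."""
    if " " in spec:
        raise ValueError("space characters are not permitted in the job list")
    selected = []
    for item in spec.split(","):
        job, _, step = item.partition(".")
        selected.append((int(job), int(step) if step else None))
    return selected


print(parse_jobs("3,4.0,10"))
```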
- -l, --long
- Displays a long listing, which includes the following data:
  • jobid
  • jobname
  • partition
  • vsize
  • rss
  • pages
  • cputime
  • ntasks
  • ncpus
  • elapsed
  • status
  • exitcode
- --noduplicates
- See the discussion under --duplicates.
- --noheader
- Prevents the display of the heading over the output. The default action is to display a header.
- This option has no effect when used with the --dump option.
- -O , --formatted_dump
- Dumps accounting records in an easy-to-read format.
- This option is provided for debugging.
- -P , --purge
- Used in conjunction with --expire to remove invalid data from the job accounting log.
- -p partition_list , --partition=partition_list
- Displays information about jobs and job steps specified by the partition_list operand, which is a comma-separated list of partitions. Space characters are not allowed in the partition_list.
- The default is to display information on jobs and job steps on all partitions.
- -S , --stat
- Queries the status of a running job and displays the following data:
  • jobid
  • vsize
  • rss
  • pages
  • cputime
  • ntasks
  • status
- You must also include the --jobs=job(.step) option; if no .step is given, you will receive data for step 0 of the job (job.0).
- -s state_list , --state=state_list
- Selects jobs based on their current state, which can be designated with the following state designators:
    r    running
    s    suspended
    ca   cancelled
    cd   completed
    pd   pending
    f    failed
    to   timed out
    nf   node_fail
- The state_list operand is a comma-separated list of these state designators. Space characters are not allowed in the state_list.
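The designator table above maps directly onto a lookup. The following Python sketch (hypothetical names, not part of sacct) expands a state_list into state names and rejects spaces or unknown designators, matching only the designators listed above.

```python
from typing import List

# Designators accepted by --state, as listed above.
STATE_DESIGNATORS = {
    "r": "running",
    "s": "suspended",
    "ca": "cancelled",
    "cd": "completed",
    "pd": "pending",
    "f": "failed",
    "to": "timed out",
    "nf": "node_fail",
}


def parse_state_list(state_list: str) -> List[str]:
    """Expand a --state operand such as 'cd,f' into state names."""
    if " " in state_list:
        raise ValueError("space characters are not allowed in state_list")
    try:
        return [STATE_DESIGNATORS[d] for d in state_list.split(",")]
    except KeyError as err:
        raise ValueError(f"unknown state designator: {err.args[0]}") from None


print(parse_state_list("cd,f"))
```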
- --starttime=starttime
- Select jobs eligible after this time. Valid formats are:
    HH:MM[:SS] [AM|PM]
    MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
    MM/DD[/YY]-HH:MM[:SS]
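As an illustration of one of the layouts accepted by --starttime and --endtime, the following Python sketch parses only the MM/DD[/YY]-HH:MM[:SS] form. Filling a missing year from a caller-supplied default, mapping a two-digit YY to 20YY, and requiring two-digit components are assumptions made for the example; the other layouts (including the AM/PM form) are not handled.

```python
import re
from datetime import datetime

_STAMP = re.compile(r"^(\d{2})/(\d{2})(?:/(\d{2}))?-(\d{2}):(\d{2})(?::(\d{2}))?$")


def parse_mmdd_hhmm(text: str, default_year: int) -> datetime:
    """Parse 'MM/DD[/YY]-HH:MM[:SS]'; assumes YY means 20YY and seconds default to 0."""
    match = _STAMP.match(text)
    if match is None:
        raise ValueError(f"not in MM/DD[/YY]-HH:MM[:SS] form: {text!r}")
    month, day, year, hour, minute, second = match.groups()
    return datetime(
        2000 + int(year) if year else default_year,
        int(month), int(day), int(hour), int(minute), int(second or 0),
    )
```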
- -t , --total
- Displays only the cumulative statistics for each job. Intermediate steps are displayed by default.
- -u uid_list, --uid=uid_list
- Displays the statistics only for the jobs started by the specified uid_list operand, which is a comma-separated list of uids. Space characters are not allowed in the uid_list. Specify -1 for all uids. The default is the current uid, or all users when run as root. This is essentially the same as the --user option.
- -u user_list, --user=user_list
- Displays the statistics only for the jobs started by the specified user_list operand, which is a comma-separated list of users. Space characters are not allowed in the user_list. Specify -1 for all uids. The default is the current uid, or all users when run as root. This is essentially the same as the --uid option.
- --usage
- Displays a help message.
- -v , --verbose
- Reports the state of certain variables during processing. This option is primarily used for debugging.
- -W wckey_list, --wckeys=wckey_list
- Displays the statistics only for the jobs started on the wckeys specified by the wckey_list operand, which is a comma-separated list of wckey names. Space characters are not allowed in the wckey_list. Default is all wckeys.
Job Accounting Fields
The following describes each job accounting field:
- account
- User-supplied account number for the job.
- blockid
- Block ID, applicable to BlueGene computers only
- cpu
- The sum of the system time (systemcpu) and user time (usercpu) in seconds
- cputime
- Minimum CPU time of any process followed by its task id along with the average of all processes running in the step.
- elapsed
- The job's elapsed time.
- The format of this field's output is as follows:
    [DD-[hh:]]mm:ss
  as defined by the following:
    DD   days
    hh   hours
    mm   minutes
    ss   seconds
- end
- Termination time of the job. The output format is as follows:
    MM/DD-hh:mm:ss
  as defined by the following:
    MM   month
    DD   day
    hh   hours
    mm   minutes
    ss   seconds
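The MM/DD-hh:mm:ss layout used by end (and by start and submit) can be reproduced from an epoch timestamp with Python's standard library. This is a sketch under two assumptions not stated for the end field: the time is rendered in UTC (as documented for submit) and hours use a 24-hour clock.

```python
from datetime import datetime, timezone


def format_end(epoch_seconds: int) -> str:
    """Render an epoch timestamp as MM/DD-hh:mm:ss (UTC and 24-hour clock assumed)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%m/%d-%H:%M:%S")


print(format_end(0))  # 01/01-00:00:00
```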
- exitcode
- The first non-zero error code returned by any job step.
- gid
- The group identifier of the user who ran the job.
- group
- The group name of the user who ran the job.
- idrss
- Maximum unshared data size (in KB) of any process.
- inblocks
- Total block input operations for all processes.
- isrss
- Maximum unshared stack space size (in KB) of any process.
- ixrss
- Maximum shared memory (in KB) of any process.
- job
- The SLURM job identifier of the job.
- jobid
- The number of the job or job step. It is in the form: job.jobstep.
- jobname
- The name of the job or job step.
- majflt
- Maximum number of major page faults for any process.
- minflt
- Maximum number of minor page faults (page reclaims) for any process.
- msgrcv
- Total number of messages received for all processes.
- msgsnd
- Total number of messages sent for all processes.
- ncpus
- Total number of CPUs allocated to the job.
- nivcsw
- Total number of involuntary context switches for all processes.
- nodes
- A list of nodes allocated to the job.
- nprocs
- Total number of tasks in job. Identical to ntasks.
- nsignals
- Total number of signals received for all processes.
- nswap
- Maximum number of swap operations of any process.
- ntasks
- Total number of tasks in job.
- nvcsw
- Total number of voluntary context switches for all processes.
- outblocks
- Total block output operations for all processes.
- pages
- Maximum page faults of any process followed by its task id along with the average of all processes running in the step.
- partition
- Identifies the partition on which the job ran.
- rss
- Maximum resident set size of any process followed by its task id along with the average of all processes running in the step.
- start
- Initiation time of the job in the same format as end.
- status
- Displays the job status, or state.
- Output can be RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, or NODE_FAIL.
- submit
- The time and date stamp (in Coordinated Universal Time, UTC) at which the job was submitted. The format of the output is identical to that of the end field.
- systemcpu
- The amount of system CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so this number can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.
- uid
- The user identifier of the user who ran the job.
- uid.gid
- The user and group identifiers of the user who ran the job. (This field is used in record headers, and simply concatenates the uid and gid fields.)
- user
- The user name of the user who ran the job.
- usercpu
- The amount of user CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so this number can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.
- vsize
- Maximum Virtual Memory size of any process followed by its task id along with the average of all processes running in the step.
- wckey
- Workload Characterization Key. Arbitrary string for grouping orthogonal accounts together.
INTERPRETING THE --dump OPTION OUTPUT
The sacct command's --dump option displays data in a horizontal list of fields that depends on the record type; there are three record types: JOB_START, JOB_STEP, and JOB_TERMINATED. A subsection below describes the output for each record type. When the data output is a job accounting field, as described in the section titled "Job Accounting Fields", only the name of the job accounting field is listed. Otherwise, additional information is provided.
- Note:
- The output for the JOB_STEP and JOB_TERMINATED record types present a pair of fields for the following data: Total CPU time, Total User CPU time, and Total System CPU time. The first field of each pair is the time in seconds expressed as an integer. The second field of each pair is the fractional number of seconds multiplied by one million. Thus, a pair of fields output as "1 024315" means that the time is 1.024315 seconds. The least significant digits in the second field are truncated in formatted displays.
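The seconds/microseconds pairs described in the note can be handled exactly in integer arithmetic. The Python sketch below (hypothetical helper names) parses a pair such as "1 024315", converts it to seconds, and adds two pairs with a carry. Since the cpu field is documented as the sum of systemcpu and usercpu, adding the User and System pairs should plausibly give the Total pair, but that relationship within a dump record is an assumption.

```python
from typing import Tuple

Pair = Tuple[int, int]  # (integral seconds, microseconds) as printed by --dump


def parse_pair(text: str) -> Pair:
    """Parse two dump fields such as '1 024315' into (1, 24315)."""
    whole, micro = text.split()
    return int(whole), int(micro)


def pair_to_seconds(pair: Pair) -> float:
    """Convert a pair such as (1, 24315) into 1.024315 seconds."""
    seconds, micro = pair
    return seconds + micro / 1_000_000


def add_pairs(a: Pair, b: Pair) -> Pair:
    """Add two pairs exactly, carrying microsecond overflow into the seconds field."""
    seconds, micro = a[0] + b[0], a[1] + b[1]
    return seconds + micro // 1_000_000, micro % 1_000_000


user, system = (1, 600000), (2, 500000)
print(add_pairs(user, system))
```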
Output for the JOB_START Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_START record type:
  Field #  Field
  1        job
  2        partition
  3        submitted
  4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
  5        uid.gid
  6        (Reserved)
  7        JOB_START (literal string)
  8        Job Record Version (1)
  9        The number of fields in the record (16)
  10       uid
  11       gid
  12       The job name
  13       Batch Flag (0=no batch)
  14       Relative SLURM priority
  15       ncpus
  16       nodes
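Reading a JOB_START record amounts to mapping positions onto the table above. The Python sketch below assumes that fields are separated by whitespace and that no value (for example, the job name) contains spaces; the dictionary keys are hypothetical labels chosen for the example.

```python
from typing import Dict, List

# Field positions for JOB_START records, following the table above (1-based).
JOB_START_FIELDS = {
    1: "job", 2: "partition", 3: "submitted", 4: "start_epoch", 5: "uid.gid",
    6: "reserved", 7: "record_type", 8: "version", 9: "nfields", 10: "uid",
    11: "gid", 12: "jobname", 13: "batch_flag", 14: "priority", 15: "ncpus",
    16: "nodes",
}


def parse_job_start(line: str) -> Dict[str, str]:
    """Split a JOB_START dump line into named fields (whitespace-separated, no embedded spaces)."""
    values: List[str] = line.split()
    if len(values) < 7 or values[6] != "JOB_START":  # field 7 holds the record type
        raise ValueError("not a JOB_START record")
    return {JOB_START_FIELDS[i]: v for i, v in enumerate(values, start=1) if i in JOB_START_FIELDS}
```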
Output for the JOB_STEP Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_STEP record type:
  Field #  Field
  1        job
  2        partition
  3        submitted
  4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
  5        uid.gid
  6        (Reserved)
  7        JOB_STEP (literal string)
  8        Job Record Version (1)
  9        The number of fields in the record (38)
  10       jobid
  11       end
  12       Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
             CA   Cancelled
             CD   Completed successfully
             F    Failed
             NF   Job terminated from node failure
             R    Running
             S    Suspended
             TO   Timed out
  13       exitcode
  14       ntasks
  15       ncpus
  16       Elapsed time in seconds, expressed as an integer
  17       Integer portion of the Total CPU time in seconds for all processes
  18       Fractional portion of the Total CPU time for all processes, expressed in microseconds
  19       Integer portion of the Total User CPU time in seconds for all processes
  20       Fractional portion of the Total User CPU time for all processes, expressed in microseconds
  21       Integer portion of the Total System CPU time in seconds for all processes
  22       Fractional portion of the Total System CPU time for all processes, expressed in microseconds
  23       rss
  24       ixrss
  25       idrss
  26       isrss
  27       minflt
  28       majflt
  29       nswap
  30       inblocks
  31       outblocks
  32       msgsnd
  33       msgrcv
  34       nsignals
  35       nvcsw
  36       nivcsw
  37       vsize
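As a sketch of how the JOB_STEP layout above might be consumed, the following Python function pulls out the jobid, decodes the Completion Status case-insensitively, and reads the three CPU-time pairs (fields 17 to 22). Whitespace-separated fields and the returned key names are assumptions made for the example.

```python
from typing import Dict, Tuple

# Completion Status mnemonics from the JOB_STEP table above; matching is case-insensitive.
STEP_STATUS = {
    "CA": "Cancelled", "CD": "Completed successfully", "F": "Failed",
    "NF": "Job terminated from node failure", "R": "Running",
    "S": "Suspended", "TO": "Timed out",
}


def step_summary(line: str) -> Dict[str, object]:
    """Extract jobid, status and the CPU-time pairs (fields 17-22) from a JOB_STEP dump line."""
    f = line.split()  # assumes whitespace-separated fields; f[0] is field 1
    if len(f) < 22 or f[6] != "JOB_STEP":
        raise ValueError("not a JOB_STEP record")

    def pair(first: int) -> Tuple[int, int]:
        # Table positions are 1-based, list indices are 0-based.
        return int(f[first - 1]), int(f[first])

    return {
        "jobid": f[9],                         # field 10
        "status": STEP_STATUS[f[11].upper()],  # field 12
        "total_cpu": pair(17),                 # fields 17-18
        "user_cpu": pair(19),                  # fields 19-20
        "system_cpu": pair(21),                # fields 21-22
    }
```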
Output for the JOB_TERMINATED Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_TERMINATED record type:
  Field #  Field
  1        job
  2        partition
  3        submitted
  4        The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
  5        uid.gid
  6        (Reserved)
  7        JOB_TERMINATED (literal string)
  8        Job Record Version (1)
  9        The number of fields in the record (38)
           Although thirty-eight fields are displayed by the sacct command for the JOB_TERMINATED record, only fields 1 through 12 are recorded in the actual data file; the sacct command aggregates the remainder.
  10       The total elapsed time in seconds for the job
  11       end
  12       Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
             CA   Cancelled
             CD   Completed successfully
             F    Failed
             NF   Job terminated from node failure
             R    Running
             TO   Timed out
  13       exitcode
  14       ntasks
  15       ncpus
  16       Elapsed time in seconds, expressed as an integer
  17       Integer portion of the Total CPU time in seconds for all processes
  18       Fractional portion of the Total CPU time for all processes, expressed in microseconds
  19       Integer portion of the Total User CPU time in seconds for all processes
  20       Fractional portion of the Total User CPU time for all processes, expressed in microseconds
  21       Integer portion of the Total System CPU time in seconds for all processes
  22       Fractional portion of the Total System CPU time for all processes, expressed in microseconds
  23       rss
  24       ixrss
  25       idrss
  26       isrss
  27       minflt
  28       majflt
  29       nswap
  30       inblocks
  31       outblocks
  32       msgsnd
  33       msgrcv
  34       nsignals
  35       nvcsw
  36       nivcsw
  37       vsize
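Only fields 1 through 12 of a JOB_TERMINATED record are stored, and sacct aggregates the remainder; the exact rule is not described here. As a plausible sketch only, the Python below sums the fields whose descriptions in "Job Accounting Fields" begin with "Total" and takes the maximum of those described as "Maximum". The per-step dictionaries and the aggregation rule itself are assumptions, not sacct's documented behavior.

```python
from typing import Dict, Iterable

# Classification follows the field descriptions in "Job Accounting Fields" (assumed aggregation rule).
SUMMED = ("inblocks", "outblocks", "msgsnd", "msgrcv", "nsignals", "nvcsw", "nivcsw")
MAXED = ("rss", "ixrss", "idrss", "isrss", "minflt", "majflt", "nswap", "vsize")


def aggregate_steps(steps: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-step values into job-level values: totals are summed, maxima are maxed."""
    result = {name: 0 for name in SUMMED + MAXED}
    for step in steps:
        for name in SUMMED:
            result[name] += step.get(name, 0)
        for name in MAXED:
            result[name] = max(result[name], step.get(name, 0))
    return result
```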
EXAMPLES
This example illustrates the default invocation of the sacct command:
    # sacct
    Jobid      Jobname    Partition  Ncpus   Status     Exitcode
    ---------- ---------- ---------- ------- ---------- --------
    2          script01   srun       1       RUNNING    0
    3          script02   srun       1       RUNNING    0
    4          endscript  srun       1       RUNNING    0
    4.0                   srun       1       COMPLETED  0
This example shows the same job accounting information with the brief option.
    # sacct --brief
    Jobid      Status     Exitcode
    ---------- ---------- --------
    2          RUNNING    0
    3          RUNNING    0
    4          RUNNING    0
    4.0        COMPLETED  0
This example uses the --total option to display only the cumulative statistics for each job:
    # sacct --total
    Jobid      Jobname    Partition  Ncpus   Status     Exitcode
    ---------- ---------- ---------- ------- ---------- --------
    3          sja_init   andy       1       COMPLETED  0
    4          sjaload    andy       2       COMPLETED  0
    5          sja_scr1   andy       1       COMPLETED  0
    6          sja_scr2   andy       18      COMPLETED  2
    7          sja_scr3   andy       18      COMPLETED  0
    8          sja_scr5   andy       2       COMPLETED  0
    9          sja_scr7   andy       90      COMPLETED  1
    10         endscript  andy       186     COMPLETED  0
This example demonstrates the ability to customize the output of the sacct command. The fields are displayed in the order designated on the command line.
    # sacct --format=jobid,ncpus,ntasks,nsignals,status
    Jobid      Ncpus   Ntasks  Nsignals  Status
    ---------- ------- ------- --------- ----------
    3          2       1       0         COMPLETED
    3.0        2       1       0         COMPLETED
    4          2       2       0         COMPLETED
    4.0        2       2       0         COMPLETED
    5          2       1       0         COMPLETED
    5.0        2       1       0         COMPLETED
COPYING
Copyright (C) 2005-2007 Copyright Hewlett-Packard Development Company L.P.
This file is part of SLURM, a resource management program. For details, see <https://computing.llnl.gov/linux/slurm/>.
SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
FILES
- /etc/slurm.conf
- Entries to this file enable job accounting and designate the job accounting log file that collects system job accounting.
- /var/log/slurm_accounting.log
- The default job accounting log file. By default, this file has read and write permission for root only.
SEE ALSO
ps(1), srun(1), squeue(1), getrusage(2), time(2)