Slurm Job Accounting Gather Plugin API
Overview
This document describes Slurm job accounting gather plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm job accounting gather plugins.
Slurm job accounting gather plugins must conform to the Slurm Plugin API with the following specifications:
const char plugin_name[]="full text name"
A free-formatted ASCII text string that identifies the plugin.
const char plugin_type[]="major/minor"
The major type must be "jobacct_gather". The minor type can be any suitable name for the type of accounting package. The following minor types are currently in use:
- cgroup — Gathers information from Linux cgroup infrastructure and adds this information to the standard rusage information also gathered for each job. (Experimental, not to be used in production.)
- linux — Gathers information from Linux /proc table and adds this information to the standard rusage information also gathered for each job.
- none — No information gathered.
const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin; any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
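The three identification symbols described above could be declared as in the following minimal sketch. The plugin name, the minor type "example", the DEMO_SLURM_VERSION_NUMBER constant and the demo_has_jobacct_gather_major() helper are all hypothetical and exist only for illustration; a real plugin would take its version value from the Slurm build.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the version constant that the Slurm build
 * normally provides. The value is purely illustrative. */
#define DEMO_SLURM_VERSION_NUMBER 0x0e0b00u

/* Free-form text that identifies the plugin (hypothetical example). */
const char plugin_name[] = "Job accounting gather example plugin";

/* "major/minor": the major type must be jobacct_gather. */
const char plugin_type[] = "jobacct_gather/example";

/* Optional: ties the plugin to the Slurm version it was built with. */
const uint32_t plugin_version = DEMO_SLURM_VERSION_NUMBER;

/* Hypothetical helper: returns 1 if a type string carries the major
 * type required for job accounting gather plugins, otherwise 0. */
int demo_has_jobacct_gather_major(const char *type)
{
    static const char major[] = "jobacct_gather/";

    assert(type != NULL);
    return strncmp(type, major, sizeof(major) - 1) == 0;
}
```

The helper is not part of Slurm; it only makes the "major/minor" naming rule checkable in a test.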
The sacct program can be used to display gathered data from regular accounting and from these plugins.
The programmer is urged to study src/plugins/jobacct_gather/linux and src/common/slurm_jobacct_gather.[c|h] for a sample implementation of a Slurm job accounting gather plugin.
API Functions
All of the following functions are required. Functions which are not implemented must be stubbed.
int init (void)
Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen (3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
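A minimal sketch of an init/fini pair follows. It assumes the plugin owns one heap-allocated table; the demo_samples table and DEMO_CAPACITY are hypothetical, and the two return codes are defined locally as stand-ins for the definitions that the Slurm headers normally supply.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: stand-ins for the return codes from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical global state owned by the plugin. */
#define DEMO_CAPACITY 64
static int *demo_samples = NULL;

/* Called when the plugin is loaded, before any other plugin function. */
int init(void)
{
    /* This sketch assumes init() is not called again before fini(). */
    assert(demo_samples == NULL);

    demo_samples = calloc(DEMO_CAPACITY, sizeof(*demo_samples));
    return demo_samples ? SLURM_SUCCESS : SLURM_ERROR;
}

/* Called when the plugin is removed; releases what init() allocated. */
void fini(void)
{
    free(demo_samples);
    demo_samples = NULL;
}
```

Resetting the pointer in fini() keeps a later init() call safe if the plugin is loaded again in the same process.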
int jobacct_gather_p_poll_data(List task_list, bool pgid_plugin, uint64_t cont_id)
Description:
Build a table of all current processes.
Arguments:
task_list (in/out) List containing current processes.
pgid_plugin (input) whether we are running with the pgid plugin.
cont_id (input) container id of processes if not running with pgid.
int jobacct_gather_p_endpoll(void)
Description:
Called when the process is finished to stop the polling thread.
Arguments:
none
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_p_add_task(pid_t pid, uint16_t tid)
Description:
Used to add a task to the poller.
Arguments:
pid (input) Process id
tid (input) Slurm global task id
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
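Because every function in this section must exist even when it does nothing, a plugin that gathers no data still has to provide stubs. The following is a minimal sketch under stated assumptions: List is replaced by an opaque placeholder type, the return codes are defined locally, the fixed-size demo_* task table is hypothetical, and the return value of jobacct_gather_p_poll_data is assumed to follow the same convention as the other functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: stand-ins for definitions from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
typedef struct demo_list *List;   /* opaque placeholder for Slurm's List */

/* Hypothetical fixed-capacity table of watched tasks. */
#define DEMO_MAX_TASKS 8
static pid_t demo_pids[DEMO_MAX_TASKS];
static uint16_t demo_tids[DEMO_MAX_TASKS];
static int demo_task_count = 0;

/* Stub: a real plugin would refresh usage data for each process here. */
int jobacct_gather_p_poll_data(List task_list, bool pgid_plugin,
                               uint64_t cont_id)
{
    (void) task_list;
    (void) pgid_plugin;
    (void) cont_id;
    return SLURM_SUCCESS;
}

/* Stub: nothing to stop, because this sketch runs no polling thread. */
int jobacct_gather_p_endpoll(void)
{
    return SLURM_SUCCESS;
}

/* Records the pid and task id so that a poller could watch the task. */
int jobacct_gather_p_add_task(pid_t pid, uint16_t tid)
{
    assert(demo_task_count >= 0 && demo_task_count <= DEMO_MAX_TASKS);

    if (pid <= 0 || demo_task_count == DEMO_MAX_TASKS)
        return SLURM_ERROR;

    demo_pids[demo_task_count] = pid;
    demo_tids[demo_task_count] = tid;
    demo_task_count++;
    return SLURM_SUCCESS;
}
```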
Job Account Gathering
None of the following functions are required, but they may be used.
int jobacct_gather_init(void)
Description:
Loads the job account gather plugin.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_fini(void)
Description:
Unloads the job account gathering plugin.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_startpoll(uint16_t frequency)
Description:
Creates and starts the polling thread.
Arguments:
frequency (input) frequency of the polling.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void jobacct_gather_change_poll(uint16_t frequency)
Description:
Changes the polling thread to a new frequency.
Arguments:
frequency (input) frequency of the polling
Returns: None.
void jobacct_gather_suspend_poll(void)
Description:
Temporarily stops the polling thread.
Returns: None.
void jobacct_gather_resume_poll(void)
Description:
Resumes the polling thread that was stopped.
Returns: None.
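The interaction of the start, change, suspend and resume calls can be pictured with a small state model. This is only a sketch of the semantics described above, not Slurm's implementation: it uses no thread, the caller supplies the current time, every demo_* name is hypothetical, and treating a frequency of 0 as "never poll" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumption: stand-ins for the return codes from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical poller state. */
static struct {
    bool started;
    bool suspended;
    uint16_t frequency;   /* seconds between polls */
    time_t last_poll;     /* time of the most recent poll */
} demo_poller;

/* Starts polling; fails if the poller is already running. */
int demo_startpoll(uint16_t frequency, time_t now)
{
    if (demo_poller.started)
        return SLURM_ERROR;
    demo_poller.started = true;
    demo_poller.suspended = false;
    demo_poller.frequency = frequency;
    demo_poller.last_poll = now;
    return SLURM_SUCCESS;
}

void demo_change_poll(uint16_t frequency) { demo_poller.frequency = frequency; }
void demo_suspend_poll(void) { demo_poller.suspended = true; }
void demo_resume_poll(void)  { demo_poller.suspended = false; }

/* Returns true when a poll is due at time now, and records that poll. */
bool demo_poll_due(time_t now)
{
    assert(now >= demo_poller.last_poll);   /* time must not run backwards */

    if (!demo_poller.started || demo_poller.suspended ||
        demo_poller.frequency == 0)
        return false;
    if (now - demo_poller.last_poll < demo_poller.frequency)
        return false;
    demo_poller.last_poll = now;
    return true;
}
```

In this model a suspended poller skips polls without losing its configured frequency, so resuming picks up with the same interval.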
jobacctinfo_t *jobacct_gather_stat_task(pid_t pid)
Description:
Gets the basic accounting information for the task.
Arguments:
pid (input) process id.
Returns:
A pointer to the jobacctinfo_t for the task on success, or
NULL on failure.
jobacctinfo_t *jobacct_gather_remove_task(pid_t pid)
Description:
Removes the task.
Arguments:
pid (input) process id.
Returns:
A pointer to the jobacctinfo_t of the removed task on success, or
NULL on failure.
int jobacct_gather_set_proctrack_container_id(uint64_t id)
Description:
Sets the proctrack container to a given id.
Arguments:
id (input) id to set.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacct_gather_set_mem_limit(uint32_t job_id, uint32_t step_id, uint64_t mem_limit)
Description:
Sets the memory limit of the job account.
Arguments:
job_id (input) id of the job.
step_id (input) id of the step.
mem_limit (input) memory limit in megabytes.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void jobacct_gather_handle_mem_limit(uint64_t total_job_mem, uint64_t total_job_vsize)
Description:
Called to find out how much memory is used.
Arguments:
total_job_mem (input) total amount of memory used by the job.
total_job_vsize (input) total virtual memory size of the job.
Returns: None.
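How a stored limit in megabytes might be compared against measured usage is sketched below. The demo_* functions are hypothetical stand-ins, not the Slurm functions themselves. The sketch assumes that the totals are reported in bytes, that a job id or limit of 0 is rejected, and that only resident memory is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-ins for the return codes from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical state remembered between the two calls. */
static uint32_t demo_job_id = 0;
static uint32_t demo_step_id = 0;
static uint64_t demo_limit_bytes = 0;     /* 0 means "no limit set" */
static bool demo_limit_exceeded = false;

/* Stores the limit for a job step; mem_limit is given in megabytes. */
int demo_set_mem_limit(uint32_t job_id, uint32_t step_id, uint64_t mem_limit)
{
    if (job_id == 0 || mem_limit == 0)
        return SLURM_ERROR;

    /* The conversion to bytes must not overflow 64 bits. */
    assert(mem_limit <= UINT64_MAX / (1024 * 1024));

    demo_job_id = job_id;
    demo_step_id = step_id;
    demo_limit_bytes = mem_limit * 1024 * 1024;
    return SLURM_SUCCESS;
}

/* Flags the job step when its total memory (assumed bytes) is too high. */
void demo_handle_mem_limit(uint64_t total_job_mem, uint64_t total_job_vsize)
{
    (void) total_job_vsize;   /* virtual size is ignored in this sketch */

    if (demo_limit_bytes != 0 && total_job_mem > demo_limit_bytes)
        demo_limit_exceeded = true;
}
```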
Job Account Info
None of the following functions are required, but they may be used.
jobacctinfo_t *jobacctinfo_create(jobacct_id_t *jobacct_id)
Description:
Creates the job account info.
Arguments:
jobacct_id (input) the job account id.
Returns:
A pointer to the newly created jobacctinfo_t.
void jobacctinfo_destroy(void *object)
Description:
Destroys the job account info.
Arguments:
object (input) the job account info that needs to be destroyed.
Returns: None.
int jobacctinfo_setinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)
Description:
Set the information for the job.
Arguments:
jobacct (input) job account.
type (input) enum telling the plugin how to transform the data.
data (input/output) a void * whose actual data type depends upon the type argument.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int jobacctinfo_getinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)
Description:
Gets the information about the job.
Arguments:
jobacct (input) job account.
type (input) the data type of the job account.
data (output) a void * whose actual data type depends upon the type argument.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
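The way the type argument selects the interpretation of the void * data pointer can be illustrated with a reduced model. The structure, the enum values and the demo_* functions below are hypothetical simplifications and do not reproduce the real jobacctinfo_t or enum jobacct_data_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-ins for the return codes from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical, much reduced job account record. */
typedef struct {
    uint64_t max_rss;
    uint32_t min_cpu;
} demo_jobacctinfo_t;

/* Hypothetical selectors; each one fixes the type behind data. */
enum demo_data_type {
    DEMO_DATA_TOTAL,     /* data points to a demo_jobacctinfo_t */
    DEMO_DATA_MAX_RSS,   /* data points to a uint64_t */
    DEMO_DATA_MIN_CPU    /* data points to a uint32_t */
};

/* Copies the value behind data into the field selected by type. */
int demo_setinfo(demo_jobacctinfo_t *jobacct, enum demo_data_type type,
                 void *data)
{
    assert(jobacct != NULL && data != NULL);

    switch (type) {
    case DEMO_DATA_TOTAL:
        *jobacct = *(demo_jobacctinfo_t *) data;
        break;
    case DEMO_DATA_MAX_RSS:
        jobacct->max_rss = *(uint64_t *) data;
        break;
    case DEMO_DATA_MIN_CPU:
        jobacct->min_cpu = *(uint32_t *) data;
        break;
    default:
        return SLURM_ERROR;
    }
    return SLURM_SUCCESS;
}

/* Copies the field selected by type into the storage behind data. */
int demo_getinfo(demo_jobacctinfo_t *jobacct, enum demo_data_type type,
                 void *data)
{
    assert(jobacct != NULL && data != NULL);

    switch (type) {
    case DEMO_DATA_TOTAL:
        *(demo_jobacctinfo_t *) data = *jobacct;
        break;
    case DEMO_DATA_MAX_RSS:
        *(uint64_t *) data = jobacct->max_rss;
        break;
    case DEMO_DATA_MIN_CPU:
        *(uint32_t *) data = jobacct->min_cpu;
        break;
    default:
        return SLURM_ERROR;
    }
    return SLURM_SUCCESS;
}
```

The caller is responsible for passing storage of the type that matches the selector, since nothing in a void * lets the callee verify it.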
void jobacctinfo_pack(jobacctinfo_t *jobacct, uint16_t rpc_version, Buf buffer)
Description:
Packs the job account information.
Arguments:
jobacct (input) the job account.
rpc_version (input) the RPC version.
buffer (input) the buffer.
Returns: None.
int jobacctinfo_unpack(jobacctinfo_t **jobacct, uint16_t rpc_version, Buf buffer)
Description:
Unpacks the job account information.
Arguments:
jobacct (output) the unpacked job account.
rpc_version (input) the RPC version.
buffer (input) the buffer.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
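The pack and unpack pair forms a round trip: what one side writes into the buffer, the other side reads back in the same order. The sketch below shows that idea with a hypothetical two-field record and a hypothetical fixed-size buffer in place of Slurm's Buf; the big-endian byte order and the ignored rpc_version are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stand-ins for the return codes from the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical, much reduced job account record. */
typedef struct {
    uint32_t min_cpu;
    uint32_t max_pages;
} demo_jobacctinfo_t;

/* Hypothetical fixed-size buffer standing in for Slurm's Buf. */
typedef struct {
    uint8_t data[64];
    size_t used;   /* bytes written so far */
    size_t read;   /* bytes consumed so far */
} demo_buf_t;

/* Appends a 32-bit value in big-endian byte order. */
static void demo_pack32(uint32_t v, demo_buf_t *b)
{
    assert(b->used + 4 <= sizeof(b->data));
    b->data[b->used++] = (uint8_t) (v >> 24);
    b->data[b->used++] = (uint8_t) (v >> 16);
    b->data[b->used++] = (uint8_t) (v >> 8);
    b->data[b->used++] = (uint8_t) v;
}

/* Reads the next 32-bit value; fails if the buffer is exhausted. */
static int demo_unpack32(uint32_t *v, demo_buf_t *b)
{
    if (b->read + 4 > b->used)
        return SLURM_ERROR;
    *v = ((uint32_t) b->data[b->read] << 24) |
         ((uint32_t) b->data[b->read + 1] << 16) |
         ((uint32_t) b->data[b->read + 2] << 8) |
         (uint32_t) b->data[b->read + 3];
    b->read += 4;
    return SLURM_SUCCESS;
}

void demo_pack(const demo_jobacctinfo_t *jobacct, uint16_t rpc_version,
               demo_buf_t *buffer)
{
    (void) rpc_version;   /* a real packer could choose a layout from this */
    demo_pack32(jobacct->min_cpu, buffer);
    demo_pack32(jobacct->max_pages, buffer);
}

/* Allocates a record and fills it in the same order demo_pack() wrote. */
int demo_unpack(demo_jobacctinfo_t **jobacct, uint16_t rpc_version,
                demo_buf_t *buffer)
{
    demo_jobacctinfo_t *j = calloc(1, sizeof(*j));

    (void) rpc_version;
    *jobacct = NULL;
    if (!j)
        return SLURM_ERROR;
    if (demo_unpack32(&j->min_cpu, buffer) != SLURM_SUCCESS ||
        demo_unpack32(&j->max_pages, buffer) != SLURM_SUCCESS) {
        free(j);
        return SLURM_ERROR;
    }
    *jobacct = j;
    return SLURM_SUCCESS;
}
```

On failure the output pointer is left as NULL, so the caller never sees a partially filled record.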
void jobacctinfo_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)
Description:
Aggregates the accounting information of one job account into another.
Arguments:
dest (in/out) job account that receives the aggregated data.
from (input) job account whose data is aggregated into dest.
Returns: None.
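Aggregation typically combines each field according to its meaning: maxima keep the larger value, minima keep the smaller value, and totals are summed. The following sketch applies those rules to a hypothetical four-field record; the field selection and the demo_aggregate() name are assumptions of this example, not the real jobacctinfo_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced job account record. */
typedef struct {
    uint64_t max_rss;   /* largest resident set size seen */
    uint64_t tot_rss;   /* sum of resident set sizes */
    uint32_t min_cpu;   /* smallest CPU time seen */
    uint32_t tot_cpu;   /* sum of CPU times */
} demo_jobacctinfo_t;

/* Folds the values of from into dest, leaving from unchanged. */
void demo_aggregate(demo_jobacctinfo_t *dest, const demo_jobacctinfo_t *from)
{
    if (dest == NULL || from == NULL)
        return;

    /* Aggregating a record into itself would double its totals. */
    assert(dest != from);

    if (from->max_rss > dest->max_rss)
        dest->max_rss = from->max_rss;
    dest->tot_rss += from->tot_rss;

    if (from->min_cpu < dest->min_cpu)
        dest->min_cpu = from->min_cpu;
    dest->tot_cpu += from->tot_cpu;
}
```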
void jobacctinfo_2_stats(slurmdb_stats_t *stats, jobacctinfo_t *jobacct)
Description:
Gets the accounting statistics of the job.
Arguments:
stats (output) Slurm database statistics to fill in.
jobacct (input) the job account.
Returns: None.
Parameters
These parameters can be used in slurm.conf to select the plugin and to set the frequency at which information about running jobs is gathered.
- JobAcctGatherType: Specifies which plugin should be used.
- JobAcctGatherFrequency: Time interval between pollings in seconds.
Last modified 27 March 2015