Burst Buffer Plugin Programmer Guide
Overview
This document describes the Slurm burst buffer plugins and the APIs that define them. It is intended as a resource for programmers wishing to write their own Slurm burst buffer plugin.
Slurm burst buffer plugins must conform to the Slurm Plugin API with the following specifications:
const char plugin_name[]="full text name"
A free-formatted ASCII text string that identifies the plugin.
const char plugin_type[]="major/minor"
The major type must be "burst_buffer". The minor type can be any suitable name for the type of burst buffer package. The following burst buffer plugins are included in the Slurm distribution:
- datawarp — Use Cray APIs to provide burst buffer.
- generic — Use generic burst buffer plugin.
const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin and
any attempt to load the plugin from a different version of Slurm will result
in an error.
If not specified, then the plugin may be loaded by Slurm commands and
daemons from any version; however, this may result in difficult-to-diagnose
failures due to changes in the arguments to plugin functions or changes
in other Slurm functions used by the plugin.
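A minimal sketch of the three identification symbols, assuming a hypothetical minor type named "example". A real plugin normally sets plugin_version from SLURM_VERSION_NUMBER in Slurm's generated version header; because that header is not available here, a stand-in macro with an illustrative value (19.05.0) is defined so the fragment compiles on its own.

```c
#include <stdint.h>

/* Stand-in for the macro Slurm's build provides; the value (19.05.0)
 * is illustrative only and is used only if the real one is absent. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER ((19 << 16) + (5 << 8) + 0)
#endif

/* Free-form, human-readable name of the plugin. */
const char plugin_name[] = "Burst buffer example plugin";

/* Major type must be "burst_buffer"; "example" is a hypothetical minor type. */
const char plugin_type[] = "burst_buffer/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```

Setting plugin_version rather than leaving it out trades flexibility for safety: the plugin refuses to load under a different Slurm version instead of failing in hard-to-diagnose ways.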
API Functions
All of the following functions are required. Functions which are not implemented must be stubbed.
int init (void)
Description:
Called when the plugin is loaded, before any other functions are
called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
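The following sketch shows one way to pair init() and fini() around plugin-wide state. The example_state_t structure is hypothetical, and calloc()/free() stand in for Slurm's xmalloc()/xfree() so that the fragment builds without Slurm's headers; the SLURM_SUCCESS and SLURM_ERROR values match those in slurm_errno.h.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLURM_SUCCESS 0		/* same values as Slurm's slurm_errno.h */
#define SLURM_ERROR  -1

/* Hypothetical plugin-wide state, created in init() and freed in fini(). */
typedef struct {
	uint64_t total_space_mb;
	uint64_t used_space_mb;
} example_state_t;

static example_state_t *example_state = NULL;

int init(void)
{
	/* calloc stands in for Slurm's xmalloc, which also zero-fills. */
	example_state = calloc(1, sizeof(*example_state));
	if (!example_state)
		return SLURM_ERROR;
	return SLURM_SUCCESS;
}

void fini(void)
{
	/* free(NULL) is a no-op, so calling fini() twice is harmless. */
	free(example_state);
	example_state = NULL;
}
```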
int bb_p_load_state(bool init_config)
Description:
This function loads the current state of the burst buffer.
Arguments:
init_config
(input) true if called as part of slurmctld initialization.
Returns:
A Slurm errno
int bb_p_state_pack(uid_t uid, Buf buffer, uint16_t protocol_version)
Description:
Pack current burst buffer state information for network transmission.
Arguments:
uid
(input) Owning user ID.
buffer
(input) Buffer into which the state is packed.
protocol_version
(input) Version number of the data packing mechanism.
Returns:
A Slurm errno
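A sketch of packing state for transmission, assuming a hypothetical fixed-size buffer type (example_buf_t) and packing helper (pack64_be) in place of Slurm's Buf and its pack routines. Only two hypothetical counters are packed, in network byte order; a real plugin would use protocol_version to choose the wire format and might use uid to hide other users' buffers.

```c
#include <stdint.h>
#include <sys/types.h>

#define SLURM_SUCCESS 0		/* same values as Slurm's slurm_errno.h */
#define SLURM_ERROR  -1

/* Hypothetical stand-in for Slurm's Buf: a fixed-size byte buffer. */
typedef struct {
	uint8_t  data[256];
	uint32_t offset;	/* number of bytes packed so far */
} example_buf_t;

/* Hypothetical helper: append a 64-bit value in big-endian order. */
static int pack64_be(uint64_t val, example_buf_t *buf)
{
	if (buf->offset + 8 > sizeof(buf->data))
		return SLURM_ERROR;
	for (int i = 7; i >= 0; i--)
		buf->data[buf->offset++] = (uint8_t) (val >> (i * 8));
	return SLURM_SUCCESS;
}

/* Hypothetical plugin state. */
static uint64_t total_space_mb = 1000;
static uint64_t used_space_mb = 250;

int bb_p_state_pack(uid_t uid, example_buf_t *buffer, uint16_t protocol_version)
{
	(void) uid;			/* could filter per-user records */
	(void) protocol_version;	/* could select the wire format */

	if (pack64_be(total_space_mb, buffer) != SLURM_SUCCESS ||
	    pack64_be(used_space_mb, buffer) != SLURM_SUCCESS)
		return SLURM_ERROR;
	return SLURM_SUCCESS;
}
```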
int bb_p_reconfig(void)
Description:
Reread the burst buffer config file when it is updated.
Returns:
A Slurm errno
uint64_t bb_p_get_system_size(char *name)
Description:
Get the total burst buffer size, in MB, for the named plugin.
Arguments:
name
(input) Plugin name of the burst buffer. If name is NULL, return the total
space of all burst buffer plugins.
Returns:
The size of the burst buffer in MB.
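A sketch of the NULL-means-total convention, assuming a hypothetical static table of plugin names and sizes. Each entry whose name matches (or every entry, when name is NULL) contributes to the sum; an unknown name yields 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table of burst buffer plugins and their sizes in MB. */
typedef struct {
	const char *name;
	uint64_t    size_mb;
} bb_pool_t;

static const bb_pool_t bb_pools[] = {
	{ "datawarp", 4096 },
	{ "generic",  1024 },
};

uint64_t bb_p_get_system_size(char *name)
{
	uint64_t total = 0;

	for (size_t i = 0; i < sizeof(bb_pools) / sizeof(bb_pools[0]); i++) {
		/* A NULL name selects every plugin. */
		if (!name || !strcmp(name, bb_pools[i].name))
			total += bb_pools[i].size_mb;
	}
	return total;
}
```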
int bb_p_job_validate(job_desc_msg_t *job_desc, uid_t submit_uid)
Description:
Validation of a job submit request with respect to burst buffer options.
Arguments:
job_desc
(input) Job submission request.
submit_uid
(input) ID of the user submitting the job.
Returns:
A Slurm errno.
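A sketch of submit-time validation. The job_desc_msg_t below is a stand-in holding only the burst_buffer string; the request syntax "capacity=<MB>", the per-job limit, and the error code are all hypothetical and exist only to show the shape of the check: no request means nothing to validate, and a malformed or oversized request is rejected.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define SLURM_SUCCESS 0
#define EXAMPLE_BB_INVALID_REQUEST 1	/* hypothetical error code */
#define EXAMPLE_BB_MAX_MB 4096		/* hypothetical per-job limit */

/* Stand-in holding only the field this sketch reads. */
typedef struct {
	char *burst_buffer;
} job_desc_msg_t;

int bb_p_job_validate(job_desc_msg_t *job_desc, uid_t submit_uid)
{
	const char *key = "capacity=";	/* hypothetical request syntax */
	const char *p;
	char *end;
	unsigned long long mb;

	(void) submit_uid;	/* could be checked against an allow list */
	if (!job_desc->burst_buffer || !job_desc->burst_buffer[0])
		return SLURM_SUCCESS;	/* no burst buffer requested */

	p = strstr(job_desc->burst_buffer, key);
	if (!p)
		return EXAMPLE_BB_INVALID_REQUEST;
	mb = strtoull(p + strlen(key), &end, 10);
	if (end == p + strlen(key) || mb == 0 || mb > EXAMPLE_BB_MAX_MB)
		return EXAMPLE_BB_INVALID_REQUEST;
	return SLURM_SUCCESS;
}
```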
int bb_p_job_validate2(job_record_t *job_ptr, char **err_msg)
Description:
Secondary validation of a job submit request with respect to burst buffer options, performed once the job record exists.
Arguments:
job_ptr
(input) Job record of the submitted job.
err_msg
(output) Error message, sent directly to the job submission command.
Returns:
A Slurm errno.
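A sketch of reporting a validation failure through err_msg, assuming a hypothetical job_record_t with a pre-parsed bb_size_mb field, a hypothetical configured capacity, and a hypothetical error code. The message is heap-allocated and owned by the caller; malloc() stands in for Slurm's xmalloc family, so a real caller would release it with xfree().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLURM_SUCCESS 0
#define EXAMPLE_BB_TOO_LARGE 2		/* hypothetical error code */

/* Stand-in job record with only the fields this sketch uses. */
typedef struct {
	uint32_t job_id;
	uint64_t bb_size_mb;	/* hypothetical: parsed size of the request */
} job_record_t;

static uint64_t total_space_mb = 1000;	/* hypothetical configured capacity */

int bb_p_job_validate2(job_record_t *job_ptr, char **err_msg)
{
	char msg[128];

	/* A request larger than the whole pool can never be satisfied. */
	if (job_ptr->bb_size_mb <= total_space_mb)
		return SLURM_SUCCESS;

	if (err_msg) {
		snprintf(msg, sizeof(msg),
			 "job %u requests %llu MB of burst buffer but only %llu MB is configured",
			 (unsigned) job_ptr->job_id,
			 (unsigned long long) job_ptr->bb_size_mb,
			 (unsigned long long) total_space_mb);
		*err_msg = malloc(strlen(msg) + 1);	/* caller frees */
		if (*err_msg)
			strcpy(*err_msg, msg);
	}
	return EXAMPLE_BB_TOO_LARGE;
}
```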
void bb_p_job_set_tres_cnt(job_record_t *job_ptr, uint64_t *tres_cnt, bool locked)
Description:
Set the TRES counts in the job record.
Arguments:
job_ptr
(input) Job record to be set.
tres_cnt
(input/output) Already allocated array to fill in with TRES counts.
locked
(input) True if the caller already holds the TRES read lock.
Returns:
None
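A sketch of filling the caller's TRES array, assuming a hypothetical job_record_t with a bb_size_mb field and a hypothetical lookup_bb_tres_pos() helper that returns the array slot for burst buffer usage. The locked flag is only passed through here; a real plugin would acquire the TRES read lock itself when the caller does not already hold it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in job record with only the field this sketch uses. */
typedef struct {
	uint64_t bb_size_mb;	/* hypothetical: total burst buffer requested */
} job_record_t;

/* Hypothetical lookup of the burst buffer TRES slot; a real version
 * would take the TRES read lock here when !locked. */
static int lookup_bb_tres_pos(bool locked)
{
	(void) locked;
	return 3;	/* illustrative position in the TRES array */
}

void bb_p_job_set_tres_cnt(job_record_t *job_ptr, uint64_t *tres_cnt, bool locked)
{
	int pos;

	if (!job_ptr || !tres_cnt)
		return;
	pos = lookup_bb_tres_pos(locked);
	if (pos >= 0)
		tres_cnt[pos] = job_ptr->bb_size_mb;	/* other slots untouched */
}
```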
time_t bb_p_job_get_est_start(job_record_t *job_ptr)
Description:
Get an estimation of when a job can start.
Arguments:
job_ptr
(input) Job whose start time is to be estimated.
Returns:
Estimated start time of job_ptr.
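One plausible estimation strategy, sketched under stated assumptions: the pool's free space and the allocations held by running jobs (with their expected end times, sorted ascending) are hypothetical static data. Releases are walked in end-time order until enough space would be free; if the request already fits, the estimate is now, and if it never fits the last release time is returned as the best available guess.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in job record with only the field this sketch uses. */
typedef struct {
	uint64_t bb_size_mb;
} job_record_t;

/* Hypothetical allocations held by running jobs, sorted by end time. */
typedef struct {
	time_t   end_time;
	uint64_t size_mb;
} bb_alloc_t;

static uint64_t free_space_mb = 100;
static const bb_alloc_t allocs[] = {
	{ (time_t) 2000000000, 200 },
	{ (time_t) 2000003600, 300 },
};

time_t bb_p_job_get_est_start(job_record_t *job_ptr)
{
	time_t now = time(NULL);
	time_t est = now;
	uint64_t avail = free_space_mb;
	size_t n = sizeof(allocs) / sizeof(allocs[0]);

	/* Accumulate released space until the request fits. */
	for (size_t i = 0; i < n && avail < job_ptr->bb_size_mb; i++) {
		avail += allocs[i].size_mb;
		est = allocs[i].end_time;
	}
	return (est < now) ? now : est;	/* never estimate a past time */
}
```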
int bb_p_job_try_stage_in(void)
Description:
Allocate burst buffers to jobs expected to start soonest.
Returns:
A Slurm errno
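A sketch of the "soonest first" policy, assuming a hypothetical static queue of pending jobs that already carry an estimated start time and a requested size. Jobs are sorted by estimated start with qsort() and space is reserved greedily; where a real plugin would launch the stage-in, this sketch only sets a flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SLURM_SUCCESS 0

/* Stand-in job record with only the fields this sketch uses. */
typedef struct {
	uint32_t job_id;
	time_t   est_start;	/* hypothetical: supplied by the scheduler */
	uint64_t bb_size_mb;
	int      bb_allocated;	/* set once space is reserved */
} job_record_t;

/* Hypothetical pending-job queue and pool state. */
static job_record_t pending[] = {
	{ 10, 300, 400, 0 },
	{ 11, 100, 500, 0 },
	{ 12, 200, 300, 0 },
};
static uint64_t free_space_mb = 1000;

static int by_est_start(const void *a, const void *b)
{
	const job_record_t *ja = a, *jb = b;

	return (ja->est_start > jb->est_start) - (ja->est_start < jb->est_start);
}

int bb_p_job_try_stage_in(void)
{
	size_t n = sizeof(pending) / sizeof(pending[0]);

	qsort(pending, n, sizeof(pending[0]), by_est_start);
	for (size_t i = 0; i < n; i++) {
		job_record_t *job = &pending[i];

		if (job->bb_allocated || job->bb_size_mb > free_space_mb)
			continue;	/* skip jobs that do not fit */
		free_space_mb -= job->bb_size_mb;
		job->bb_allocated = 1;	/* stage-in would start here */
	}
	return SLURM_SUCCESS;
}
```

Skipping a job that does not fit lets smaller, later jobs proceed; stopping at the first misfit instead would protect large jobs from starvation at the cost of idle space.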
int bb_p_job_test_stage_in(job_record_t *job_ptr, bool test_only)
Description:
Determine if a job's burst buffer stage-in is complete.
Arguments:
job_ptr
(input) Job record to test.
test_only
(input) If false, then attempt to load burst buffer if possible.
Returns:
0 - stage-in is underway
1 - stage-in complete
-1 - stage-in not started or burst buffer in some unexpected state
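A sketch of the three-way return convention as a small state machine. The bb_job_state_t states, the bb_state field, and the pool counter are hypothetical. With test_only false, a pending job that fits has its space reserved and moves to the staging state; with test_only true, nothing changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-job burst buffer states. */
typedef enum {
	BB_JOB_PENDING,		/* nothing allocated yet */
	BB_JOB_STAGING_IN,
	BB_JOB_STAGED_IN,
	BB_JOB_FAILED,
} bb_job_state_t;

/* Stand-in job record with only the fields this sketch uses. */
typedef struct {
	bb_job_state_t bb_state;
	uint64_t       bb_size_mb;
} job_record_t;

static uint64_t free_space_mb = 100;	/* hypothetical pool state */

int bb_p_job_test_stage_in(job_record_t *job_ptr, bool test_only)
{
	switch (job_ptr->bb_state) {
	case BB_JOB_STAGED_IN:
		return 1;		/* stage-in complete */
	case BB_JOB_STAGING_IN:
		return 0;		/* stage-in underway */
	case BB_JOB_PENDING:
		if (!test_only && job_ptr->bb_size_mb <= free_space_mb) {
			free_space_mb -= job_ptr->bb_size_mb;
			job_ptr->bb_state = BB_JOB_STAGING_IN;
			return 0;	/* stage-in just started */
		}
		return -1;		/* not started */
	default:
		return -1;		/* failed or unexpected state */
	}
}
```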
int bb_p_job_begin(job_record_t *job_ptr)
Description:
Attempt to claim burst buffer resources.
Arguments:
job_ptr
(input) Job whose burst buffer resources are claimed.
Returns:
A Slurm errno
int bb_p_job_revoke_alloc(job_record_t *job_ptr)
Description:
Revoke a job's burst buffer allocation without releasing its resources.
Executed after bb_g_job_begin if there was an allocation failure.
Arguments:
job_ptr
(input) Job whose allocation is revoked.
Returns:
A Slurm errno
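A sketch of how job_begin and job_revoke_alloc can pair up, assuming hypothetical per-job states and a bb_size_mb field recording the reserved space. Begin succeeds only once stage-in is complete; revoke undoes that claim but deliberately leaves the reserved space in place, so it can be reused on a retry or released later (for example on cancel).

```c
#include <stdint.h>

#define SLURM_SUCCESS 0		/* same values as Slurm's slurm_errno.h */
#define SLURM_ERROR  -1

/* Hypothetical per-job burst buffer states. */
typedef enum {
	BB_JOB_PENDING,
	BB_JOB_STAGED_IN,
	BB_JOB_RUNNING,
} bb_job_state_t;

/* Stand-in job record with only the fields this sketch uses. */
typedef struct {
	bb_job_state_t bb_state;
	uint64_t       bb_size_mb;	/* space still reserved in the pool */
} job_record_t;

int bb_p_job_begin(job_record_t *job_ptr)
{
	/* Only a job whose data is fully staged in may claim its buffer. */
	if (job_ptr->bb_state != BB_JOB_STAGED_IN)
		return SLURM_ERROR;
	job_ptr->bb_state = BB_JOB_RUNNING;
	return SLURM_SUCCESS;
}

int bb_p_job_revoke_alloc(job_record_t *job_ptr)
{
	/* Undo the claim but keep bb_size_mb reserved. */
	if (job_ptr->bb_state == BB_JOB_RUNNING)
		job_ptr->bb_state = BB_JOB_STAGED_IN;
	return SLURM_SUCCESS;
}
```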
int bb_p_job_start_stage_out(job_record_t *job_ptr)
Description:
Trigger the start of a job's burst buffer stage-out.
Arguments:
job_ptr
(input) Job to stage out.
Returns:
A Slurm errno
int bb_p_job_test_post_run(job_record_t *job_ptr)
Description:
Determine if a job's post-run operation is complete.
Arguments:
job_ptr
(input) Job to check if post run operation is complete.
Returns:
0 - post run operation is underway
1 - post run operation complete
-1 - fatal error
int bb_p_job_test_stage_out(job_record_t *job_ptr)
Description:
Determine if a job's stage-out is complete.
Arguments:
job_ptr
(input) Job to check if stage out is complete.
Returns:
0 - stage-out is underway
1 - stage-out complete
-1 - fatal error
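A sketch of the stage-out sequence as one state machine shared by the three functions above. The states and bb_state field are hypothetical, and the asynchronous work (a post-run step followed by stage-out) is assumed to advance the state elsewhere; start_stage_out only records that the sequence has begun, and the two test functions map states onto the 0 / 1 / -1 convention.

```c
#define SLURM_SUCCESS 0		/* same values as Slurm's slurm_errno.h */
#define SLURM_ERROR  -1

/* Hypothetical per-job burst buffer states. */
typedef enum {
	BB_JOB_RUNNING,
	BB_JOB_POST_RUN,	/* post-run step in progress */
	BB_JOB_STAGING_OUT,
	BB_JOB_STAGED_OUT,
	BB_JOB_FAILED,
} bb_job_state_t;

/* Stand-in job record with only the field this sketch uses. */
typedef struct {
	bb_job_state_t bb_state;
} job_record_t;

int bb_p_job_start_stage_out(job_record_t *job_ptr)
{
	if (job_ptr->bb_state != BB_JOB_RUNNING)
		return SLURM_ERROR;
	/* A real plugin would launch the work asynchronously here. */
	job_ptr->bb_state = BB_JOB_POST_RUN;
	return SLURM_SUCCESS;
}

int bb_p_job_test_post_run(job_record_t *job_ptr)
{
	switch (job_ptr->bb_state) {
	case BB_JOB_POST_RUN:
		return 0;	/* post-run underway */
	case BB_JOB_STAGING_OUT:
	case BB_JOB_STAGED_OUT:
		return 1;	/* post-run finished */
	default:
		return -1;	/* any other state is treated as an error */
	}
}

int bb_p_job_test_stage_out(job_record_t *job_ptr)
{
	switch (job_ptr->bb_state) {
	case BB_JOB_POST_RUN:
	case BB_JOB_STAGING_OUT:
		return 0;	/* stage-out underway */
	case BB_JOB_STAGED_OUT:
		return 1;	/* stage-out complete */
	default:
		return -1;	/* any other state is treated as an error */
	}
}
```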
int bb_p_job_cancel(job_record_t *job_ptr)
Description:
Terminate any file staging and release burst buffer resources.
Arguments:
job_ptr
(input) Job to cancel.
Returns:
A Slurm errno
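A sketch of cancellation, assuming a hypothetical pool counter and a per-job bb_size_mb field recording reserved space. The job's space is returned to the pool and its record is zeroed, which makes a second cancel harmless; stopping in-flight staging work is only noted in a comment.

```c
#include <stdint.h>

#define SLURM_SUCCESS 0		/* same values as Slurm's slurm_errno.h */
#define SLURM_ERROR  -1

/* Hypothetical per-job burst buffer states. */
typedef enum {
	BB_JOB_STAGING_IN,
	BB_JOB_RUNNING,
	BB_JOB_COMPLETE,
} bb_job_state_t;

/* Stand-in job record with only the fields this sketch uses. */
typedef struct {
	bb_job_state_t bb_state;
	uint64_t       bb_size_mb;	/* space currently reserved */
} job_record_t;

static uint64_t free_space_mb = 0;	/* hypothetical pool state */

int bb_p_job_cancel(job_record_t *job_ptr)
{
	if (!job_ptr)
		return SLURM_ERROR;
	/* A real plugin would also stop any stage-in/out still running. */
	free_space_mb += job_ptr->bb_size_mb;
	job_ptr->bb_size_mb = 0;	/* makes repeated calls harmless */
	job_ptr->bb_state = BB_JOB_COMPLETE;
	return SLURM_SUCCESS;
}
```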
char *bb_p_xlate_bb_2_tres_str(char *burst_buffer)
Description:
Translate burst buffer string to TRES string.
Arguments:
burst_buffer
(input) Burst buffer to translate to TRES string
Returns:
The TRES string of the given burst buffer (Note: the caller must xfree the
return value).
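A sketch of the translation, assuming a hypothetical request syntax ("capacity=<MB>") and a hypothetical TRES id for burst buffer usage. The output uses an "<id>=<count>" form; the returned string is heap-allocated with malloc(), standing in for Slurm's xmalloc, so under this sketch the caller releases it with free().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_BB_TRES_ID 5	/* hypothetical TRES id for burst buffers */

char *bb_p_xlate_bb_2_tres_str(char *burst_buffer)
{
	const char *key = "capacity=";	/* hypothetical request syntax */
	const char *p;
	char *end, *out;
	unsigned long long mb;
	int len;

	if (!burst_buffer || !(p = strstr(burst_buffer, key)))
		return NULL;
	mb = strtoull(p + strlen(key), &end, 10);
	if (end == p + strlen(key))
		return NULL;	/* no digits after the key */

	/* Size the string first, then format it into an exact allocation. */
	len = snprintf(NULL, 0, "%d=%llu", EXAMPLE_BB_TRES_ID, mb);
	out = malloc(len + 1);
	if (out)
		snprintf(out, len + 1, "%d=%llu", EXAMPLE_BB_TRES_ID, mb);
	return out;	/* caller frees */
}
```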
Last modified 23 October 2019