Release Notes
The following are the contents of the RELEASE_NOTES file as distributed with the Slurm source code for this release. Please refer to the NEWS include alongside the source as well for more detailed descriptions of the associated changes, and for bugs fixed within each maintenance release.
RELEASE NOTES FOR SLURM VERSION 20.11 IMPORTANT NOTES: If using the slurmdbd (Slurm DataBase Daemon) you must update this first. NOTE: If using a backup DBD you must start the primary first to do any database conversion, the backup will not start until this has happened. The 20.11 slurmdbd will work with Slurm daemons of version 19.05 and above. You will not need to update all clusters at the same time, but it is very important to update slurmdbd first and having it running before updating any other clusters making use of it. Slurm can be upgraded from version 19.05 or 20.02 to version 20.11 without loss of jobs or other state information. Upgrading directly from an earlier version of Slurm will result in loss of state information. If using SPANK plugins that use the Slurm APIs, they should be recompiled when upgrading Slurm to a new major release. NOTE: Slurmctld is now set to fatal in case of computing node configured with CPUs == #Sockets. CPUs has to be either total number of cores or threads. NOTE: The FastSchedule option has been removed. The FastSchedule=2 functionality (used for testing and development) is available as the new SlurmdParameters=config_overrides option. NOTE: Slurmdbd is now set to fatal if slurmdbd.conf file isn't owned by SlurmUser or it's mode is not set to 0600. NOTE: In the database the AllocGres and ReqGres columns have been removed as they are duplicates for those using AccountingStorageTRES. If you use these columns please make sure to backup this information since it will be lost and set up AccountingStorageTRES with your GRES to be able to get equivalent information out of AllocTRES and ReqTRES after upgrade. NOTE: PMIx v1.1.4 and below are no longer supported. HIGHLIGHTS ========== -- The example systemd unit files have been changed to the "simple" type of operation, and the daemon will now run in the foreground within systemd instead of daemonizing itself. -- Log messages enabled by the various DebugFlags have been overhauled, and will all print at the verbose() level, and prepend the flag name that is associated with a given log message. -- A separate unversioned libslurm_pmi.so will be installed, and the libpmi.so that Slurm can (optionally) install will link to that rather than libslurm. This should resolve long-standing issues when building static OpenMPI libraries and later updating your Slurm release, thereby breaking the embedded libslurm.so.link in those OpenMPI libraries that were inherited from libpmi.so. -- accounting_storage/filetxt has been removed as an option. Please consider using accounting_storage/slurmdbd as an alternative. -- setting of number of Sockets per node was standardized for configuration line with and without Boards=. Specifically in case of Boards=1 and #CPUs given the default value of Sockets will be set to #CPUs / #Cores / #Threads. -- Dynamic Future Nodes - slurmds started with -F[ ] will be associated with a nodename in Slurm that matches the same hardware configuration. -- SlurmctldParameters=cloud_reg_addrsa - Cloud nodes automatically get NodeAddr and NodeHostname set from slurmd registration. -- SlurmctldParameters=power_save[_min]_interval - Configure how often the power save module looks to do work. -- By default, a step started with srun will be granted exclusive (or non- overlapping) access to the resources assigned to that step. No other parallel step will be allowed to run on the same resources at the same time. This replaces one facet of the '--exclusive' option's behavior, but does not imply the '--exact' option described below. To get the previous default behavior - which allowed parallel steps to share all resources - use the new srun '--overlap' option. -- In conjunction to this non-overlapping step allocation behavior being the new default, there is an additional new option for step management '--exact', which will allow a step access to only those resources requested by the step. This is the second half of the '--exclusive' behavior. Otherwise, by default all non-gres resources on each node in the allocation will be used by the step, making it so no other parallel step will have access to those resources unless both steps have specified '--overlap'. -- --threads-per-core now influences task layout/binding, not just allocation. -- AutoDetect in gres.conf can now be specified for some nodes while not for others via the NodeName option. -- gres.conf - Add new MultipleFiles configuration entry to allow a single GRES to manage multiple device files simultaneously. -- Remove SallocDefaultCommand option. -- Add support for an "Interactive Step", designed to be used with salloc to launch a terminal on an allocated compute node automatically. Enable by setting "use_interactive_step" as part of LaunchParameters. -- Add IPv6 support. Must be explicitly enabled with EnableIPv6 in CommunicationParameters. IPv4 support can be disabled with DisableIPv4. -- Allow use of a target directory with "srun --bcast", and change the default filename to include the node name as well. -- Added a new --mail-type=INVALID_DEPEND option to salloc, sbatch, and srun. -- Differences between hardware (memory size, number of CPUs) discovered on node vs configured in slurm.conf will now throw an error only when the node state is set to drain. Previously it was done on every node registration, those messages were demoted to debug level. -- Added "scrontab", which permits crontab-compatible job scripts to be defined. These scripts will recurr automatically (at most) on the intervals described. -- Enable -lnodes=#:gpus=# in #PBS/qsub -l nodes syntax. -- Any user >= Operator can see any hidden partition by default, as SlurmUser or root already did. -- select/linear will now allocate up to nodes RealMemory when configured with SelectTypeParameters=CR_Memory and --mem=0 specified. Previous behavior was no memory accouted and no memory limits implied to job. -- slurmrestd - add API to interface with slurmdbd. -- Add --ntasks-per-gpu option. -- Add --gpu-bind=single option. -- Fix "scontrol takeover [backup]" hangs when specifying a backup > 1. All slurmctlds below the "backup" will be shutdown. -- The names of the functions in the cli_filter plugin are now prepended with the string "cli_filter_p_". For example, the function setup_defaults() was changed to cli_filter_p_setup_defaults(). CONFIGURATION FILE CHANGES (see man appropriate man page for details) ===================================================================== -- Removed "cpusets" option from TaskPluginParam. Please use task/cgroup. -- Removed MsgAggregationParams. -- Removed Layouts. -- Remove switch/generic plugin. -- The acct_gather_energy/cray_aries plugin has been renamed to acct_gather_energy/pm_counters. -- The JobCompLoc URL endpoint when the JobCompType=jobcomp/elasticsearch plugin is enabled is now fully configurable and the plugin no longer appends a hardcoded "/slurm/jobcomp" index and type suffix to it. -- Removed support for "default_gbytes" option in SchedulerParameters. COMMAND CHANGES (see man pages for details) =========================================== -- Make sacct get the UID from database instead of from the username and a system call. Add --use-local-uid option to sacct to use old behavior. -- The '%s' format in -e/-i/-o options to sbatch will expand to "batch" rather than "4294967294". -- squeue - added "pendingtime" as a option for --Format. -- sacct - AllocGres and ReqGres were removed. Alloc/ReqTres should be used instead. -- scontrol - added the "Reserved" license count to 'scontrol show licenses'. -- Add time specification: "now- " (i.e. subtract from the present) -- squeue - put sorted start times of "N/A" or 0 at the end of the list. -- Change "scontrol reboot ASAP" to use next_state=resume logic. -- scontrol - added an admin-settable "Comment" field to each Node. -- squeue and sinfo -O no longer repeat the last suffix specified. -- salloc now waits for PrologSlurmctld to finish before entering the shell. API CHANGES =========== -- slurm_ctl_conf_t has been renamed to slurm_conf_t. -- slurm_free_kvs_comm_set() has been renamed to slurm_pmi_free_kvs_comm_set(), slurm_get_kvs_comm_set() has been renamed to slurm_pmi_get_kvs_comm_set(). -- slurm_job_step_layout_get() parameters has changed to use slurm_step_id_t see slurm.h for new implementation. If not running hetsteps just put NO_VAL as the value for step_het_comp. -- slurm_job_step_stat() parameters has changed to use slurm_step_id_t see slurm.h for new implementation. If not running hetsteps just put NO_VAL as the value for step_het_comp. -- slurm_job_step_get_pids() parameters has changed to use slurm_step_id_t see slurm.h for new implementation. If not running hetsteps just put NO_VAL as the value for step_het_comp. -- slurmdb_selected_step_t has been renamed slurm_selected_step_t. -- slurm_sbcast_lookup() arguments have changed. It now takes a populated slurm_selected_step_t instead of job_id, het_job_offset, step_id. -- Due to internal restructuring ahead of the 20.11 release, applications calling libslurm MUST call slurm_init(NULL) before any API calls. Otherwise the API call is likely to fail due to libslurm's internal configuration not being available.