General Questions¶
‘Access denied’ error on the client, even with correct permissions¶
Please check if you have SELinux enabled on the client machine. If it is enabled, disabling it should solve your problem. SELinux can be disabled by setting SELINUX=disabled in the configuration file /etc/selinux/config. Afterwards, you might need to reboot your client for the new setting to become effective.
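As a quick check, and to temporarily switch SELinux to permissive mode without a reboot, you can for example run:
$ getenforce
Enforcing
# setenforce 0
Note that setenforce only changes the mode until the next reboot; the permanent change still requires SELINUX=disabled in /etc/selinux/config.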
Client refuses to mount because of an ‘unknown storage target’¶
Scenario¶
While testing BeeGFS, you removed the storage directory of a storage server, but kept the storage directory of the management server. Now the BeeGFS client refuses to mount and prints an error about an unknown storage target to the log file.
What happened to your file system¶
When you start a new beegfs-storage daemon with a given storage directory, the daemon initializes this directory by assigning an ID to this storage target path and registering this target ID at the management server. When you delete this directory, the storage server creates a new directory on the next startup with a new ID and registers this new ID at the management server as well. (The storage server cannot know what happened to the old directory, e.g. whether you just moved the data to another machine, so it has to assign a new ID.)
When the client starts, it performs a sanity check by querying all registered target IDs from the management server and checks whether all of them are accessible. If you removed a storage directory, this check fails and thus the client refuses to mount. (Note: This sanity check can be disabled, but it is definitely a good thing in this case and saves you from more trouble.)
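If you ever really need to disable this check, e.g. for recovery work, this is done through a client configuration option; on typical installations this appears to be sysMountSanityCheckMS in /etc/beegfs/beegfs-client.conf, where a value of 0 disables the startup check (the option name and semantics are an assumption here, so verify them against the comments in your client config file):
sysMountSanityCheckMS = 0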
You now have two options:
Solution A¶
Simply remove the storage directories of all BeeGFS services to start with a clean new file system:
Stop all the BeeGFS server daemons, i.e. beegfs-mgmtd, beegfs-meta, beegfs-storage:
# systemctl stop beegfs\*
Delete (rm -rf) all their storage directories. The paths to the server storage directories can be looked up in the server configuration files (see the grep example after these steps):
storeMgmtdDirectory in configuration file /etc/beegfs/beegfs-mgmtd.conf
storeMetaDirectory in configuration file /etc/beegfs/beegfs-meta.conf
storeStorageDirectory in configuration file /etc/beegfs/beegfs-storage.conf
Restart the daemons:
# systemctl start beegfs\*
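If you are unsure where these directories are located, you can look up the relevant options directly in the configuration files, for example (a quick sketch, assuming the default config file locations under /etc/beegfs):
$ grep -E '^store(Mgmtd|Meta|Storage)Directory' /etc/beegfs/beegfs-*.conf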
Now you have a fresh new file system without any of the previously registered target IDs.
Solution B¶
Unregister the invalid target ID from the management server.
For this, first use the beegfs-ctl tool (part of the beegfs-utils package on a client) to list the registered target IDs:
$ beegfs-ctl --listtargets --longnodes
Then check the contents of the file targetNumID in your storage directory on the storage server to find out which target ID is the current one that you want to keep.
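For example, assuming the storage directory is /data/beegfs-storage (a hypothetical path; use the storeStorageDirectory value from your beegfs-storage.conf):
$ cat /data/beegfs-storage/targetNumID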
For all other target IDs from the list, which are assigned to this storage server but are no longer
valid, use this command to unregister them from the management daemon:
$ beegfs-ctl --unmaptarget <targetID>
Afterwards, your client will no longer complain about the missing storage targets.
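To verify that the stale entries are gone, you can simply list the targets again:
$ beegfs-ctl --listtargets --longnodes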
Note
There are options in the server config files to disallow initialization of new storage directories and registration of new servers or targets. They are not set by default, but should be set for production environments. See storeAllowFirstRunInit and sysAllowNewServers.
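A minimal sketch of how this could look (the file for each option is an assumption here; check the comments in your config files for the correct placement):
In /etc/beegfs/beegfs-meta.conf and /etc/beegfs/beegfs-storage.conf:
storeAllowFirstRunInit = false
In /etc/beegfs/beegfs-mgmtd.conf:
sysAllowNewServers = false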
Too many open files on beegfs-storage server¶
This usually happens when a user application leaks open files, e.g. it creates a lot of files and forgets to close them due to a bug in the application. (Note that open files will automatically be closed by the kernel when an application ends, so this problem is usually temporary.)
There are per-process limits and system-wide limits (accounting for all processes on a machine together) to control how many files can be kept open at the same time on a host.
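To inspect the current values, you can for example check your shell's per-process soft limit and the kernel's system-wide counters:
$ ulimit -n
$ cat /proc/sys/fs/file-max
$ cat /proc/sys/fs/file-nr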
To prevent applications from opening too many files at once and to make sure that such application problems do not affect the servers, it makes sense to reduce the per-process limit for normal applications of normal users to a reasonably low value, e.g. 1024, via the nofile setting in /etc/security/limits.conf.
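For example, a limits.conf entry like the following would cap the hard limit on open files for all users (a sketch; whether to exclude root or service accounts depends on your site policy):
* hard nofile 1024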
If your applications actually need to open a lot of files at the same time and you need to raise the limit in the beegfs-storage service, here are the steps to do this:
You can check the current limit for the maximum number of open files through the /proc file system, e.g. for running beegfs-storage processes on a machine:
$ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
Max open files            50000                50000                files
The beegfs-storage and the beegfs-meta processes can try to increase their own limits through the configuration option tuneProcessFDLimit, but this will be subject to the hard limits that were defined for the system. If the beegfs-storage service fails to increase its own limit, it will print a message line to its log file (/var/log/beegfs-storage.log). Set the following in /etc/beegfs/beegfs-storage.conf to let the beegfs-storage service try to increase its own limit to 10 million files:
tuneProcessFDLimit=10000000
You can increase the system-wide limits (the limits that account for all processes together) to 20 million at runtime by using the following commands:
# sysctl -w fs.file-max=20000000
fs.file-max = 20000000
# sysctl -w fs.nr_open=20000000
fs.nr_open = 20000000
Make the changes from the previous step persistent across reboots by adding the following lines to /etc/sysctl.conf (or a corresponding file in the subdir /etc/sysctl.d):
fs.file-max = 20000000
fs.nr_open = 20000000
Add the following line to /etc/security/limits.conf (or a corresponding file in the subdir /etc/security/limits.d) to increase the per-process limit to 10 million. If this server is not only used for BeeGFS, but also for other applications, you might want to set this only for processes owned by root.
* - nofile 10000000
Now you need to close your current shell and open a new shell on the system to make the new settings effective. You can then restart the beegfs-storage process from the new shell and look at its limits:
$ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
Max open files            10000000             10000000             files
What needs to be done when a server hostname has changed¶
Scenario: hostname or $HOSTNAME report a different name than during the BeeGFS installation and the BeeGFS servers refuse to start up. The logs say that the nodeID has changed and that startup was therefore refused.
Note that by default, node IDs are generated from the hostname of a server. Since IDs are not allowed to change, see Setting node or target IDs for information on how to manually set your ID back to the previous value.
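To compare the current hostname with the node IDs that are registered at the management service, you can for example run:
$ hostname
$ beegfs-ctl --listnodes --nodetype=meta
$ beegfs-ctl --listnodes --nodetype=storage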
Change log level during runtime¶
You can use beegfs-ctl
to change the log level of any service. As in the config files, the
levels range from 1 to 5 with 5 being the most verbose.
$ beegfs-ctl --genericdebug --nodetype=meta --nodeid=1 "setloglevel 5"
Client won’t unmount due to filesystem being busy¶
Run the command below to identify which processes are keeping the mount point busy.
$ lsof /mnt/beegfs
If processes are keeping the mount point busy, you will see a list like the following:
COMMAND   PID   USER  FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME
bash      21354 john  cwd  DIR   0,18    1         2     /mnt/beegfs
bash      21355 mary  cwd  DIR   0,18    1         2     /mnt/beegfs
Terminate the processes you found. If that does not work, you can try to kill them with the following command:
$ fuser -k /mnt/beegfs
Wait 5 seconds.
Retry the unmount / client stop.
If you still get the same error message, you might be able to identify the processes by running lsof without a path argument and checking the referenced paths. If a process is holding a handle on a path like /mnt/beegfs/mydir1, that path would appear in the new list as /mydir1 (with the former mount point removed from the path).
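For example, to search the full lsof output for references to the hypothetical directory name mydir1 from above:
$ lsof | grep mydir1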
What happens with BeeGFS in a split brain scenario?¶
In a split brain scenario, only the nodes of one system partition remain online, namely the partition where the management service is running. The nodes of the separated partition will deny access to files until they are reconnected and can resume their communication with the management service. More precisely, their services will stall and only start producing I/O errors if the system remains split for too long.
This behavior is intentional, because in typical BeeGFS use cases, users cannot be allowed to access data that might be out of date. This applies especially if write access were allowed in both partitions: the same file could then be modified in both partitions while they are split, making it impossible to synchronize the copies once the split is over.