Frequently Asked Questions
- How do I get access to the cluster?
- How do I get access to an application I need?
- How busy is the cluster?
- Are any GPUs available?
- How do I submit a job?
- How do I tell if a job is running?
- How do I cancel a job?
- What partition should I submit to?
- I accidentally deleted a file/directory. What should I do?
- Can I reserve resources for an upcoming project?
- How do I run an interactive job?
- My jobs work in some partitions and not in others, why?
- How do I tell Open MPI to use Infiniband?
- Who created this website?
- I have a question that is not answered here. What can I do?
How do I get access to the cluster?
To qualify for access to the computational resources you must be a current group member of, collaborator of, and/or have permission from one of: Dr. Brooks, Dr. Frank, Dr. Wymore, Dr. Zgid, or Dr. Zimmerman. Instructions for requesting cluster access and configuring SSH access can be found here.
How do I get access to an application I need?
You can use the 'module avail' command to get a list of pre-installed applications. See the 'Module' button on the left for more information. If you do not see the application you need you can compile and install it in your home directory. If you need help with compiling an application see the 'Support' section on the left.
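As a minimal sketch of the search described above, the snippet below filters the output of 'module avail' for one name. The name "gcc" is only a placeholder, and the stderr redirect assumes a module implementation that prints its listing on stderr, which is common but not guaranteed.

```shell
# APP is a placeholder; replace it with the application you are looking for.
APP="gcc"

if command -v module >/dev/null 2>&1; then
    # Many module implementations print the listing on stderr,
    # so merge it into stdout before filtering.
    module avail 2>&1 | grep -i "$APP" || echo "no module matching $APP"
else
    echo "module command not found; run this on a cluster login node"
fi
```

If a match is listed, 'module load' followed by the listed name should make it available in the current shell.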
How busy is the cluster?
See the CPU Load button on the left for a graphical display of the CPU load on each node.
Utilization: The percentage of CPUs that are busy. It is calculated by dividing the sum of the load on all nodes by the total number of cores.
Time: The time the data was collected. If the Time is not within 5 minutes of the current time, the page is out of date and there is a problem with SLURM or the webserver.
Active: The percentage of nodes that are not idle.
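The Utilization arithmetic can be sketched as follows. This is an illustration of the formula as described, not the code behind the page; the function name and the three-node sample are made up.

```shell
# utilization: read "load cores" pairs, one node per line, and print
# 100 * (sum of loads) / (sum of cores) with one decimal place.
utilization() {
    awk '{ load += $1; cores += $2 }
         END { if (cores > 0) printf "%.1f\n", 100 * load / cores }'
}

# Made-up sample: three nodes with 16, 16 and 32 cores.
printf '%s\n' "8.0 16" "16.0 16" "0.0 32" | utilization
```

For the sample, the loads sum to 24 and the cores to 64, so the function prints 37.5.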
Are any GPUs available?
See the GPU Load button on the left for a graphical display of the GPU load on GPU nodes.
Utilization: The percentage of GPUs that are busy. It is calculated by dividing the sum of the load on all GPUs by the number of GPUs.
Time: The time the data was collected. If the Time is not within 5 minutes of the current time, there is a problem with SLURM or the webserver.
Active: The percentage of GPUs that are not idle.
How do I submit a job?
The SLURM commands used for job submission are: salloc, sbatch, and srun. See the SLURM button on the left for man pages for those and other SLURM commands. If you need examples of SLURM job scripts, have a look in /home/dave/jobs.
How do I tell if a job is running?
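For orientation, here is a minimal sketch of a batch submission with sbatch. The file name, job name, task count, and time limit are illustrative values, not site defaults, and the script is not one of the examples in /home/dave/jobs.

```shell
# Write a minimal batch script. All #SBATCH values are illustrative.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

srun hostname
EOF

# Submit it only where SLURM is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.sbatch
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

The %j in the output file name is replaced by the job ID, so each run writes to its own file.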
The SLURM commands used for checking job status are: scontrol, sdiag, squeue and sstat. See the SLURM button on the left for man pages for those and other SLURM commands.
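A minimal sketch of a status check with those commands follows. The job ID is hypothetical, and the sstat fields are just one reasonable selection.

```shell
JOBID=12345                    # hypothetical job ID
ME="${USER:-$(id -un)}"        # your login name

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$ME"                 # your jobs; ST is R (running) or PD (pending)
    scontrol show job "$JOBID"      # full record for one job
    # Resource usage of a running job; batch scripts may need "$JOBID.batch".
    sstat -j "$JOBID" --format=JobID,AveCPU,MaxRSS
else
    echo "squeue not found; run this on a cluster login node"
fi
```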
How do I cancel a job?
The SLURM command to cancel a running or pending job is: scancel. See the SLURM button on the left for man pages for this and other SLURM commands.
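Two common forms of scancel, as a sketch. The job ID is hypothetical, and the second command cancels every pending job you own, so use it with care.

```shell
JOBID=12345                    # hypothetical job ID
ME="${USER:-$(id -un)}"        # your login name

if command -v scancel >/dev/null 2>&1; then
    scancel "$JOBID"                      # cancel one job
    scancel -u "$ME" --state=PENDING      # cancel all of your pending jobs
else
    echo "scancel not found; run this on a cluster login node"
fi
```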
What partition should I submit to?
There are 2 parts to the answer.
First, for information about the available partitions see the Partitions button on the left. Also see the SLURM button on the left and then the sinfo man page for the explanation.
Second, see the Accounts button on the left to see which partitions you have been granted access to. To request access to additional partitions, please submit a request by one of the contact methods listed in the SUPPORT section at the bottom of the left side panel.
I accidentally deleted a file/directory. What should I do?
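The same partition information can be inspected from the command line with sinfo. This is a sketch; the partition name "debug" is hypothetical, so substitute one shown on the Partitions page.

```shell
PARTITION="debug"   # hypothetical partition name

if command -v sinfo >/dev/null 2>&1; then
    sinfo -s                                    # one summary line per partition
    # %P partition, %l time limit, %D node count, %c CPUs per node,
    # %m memory per node in megabytes.
    sinfo -p "$PARTITION" -o "%P %l %D %c %m"
else
    echo "sinfo not found; run this on a cluster login node"
fi
```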
Cluster home directories are backed up, in most cases nightly. If the file/directory you deleted was in your home directory there is a chance it can be restored. Cluster scratch directories are not backed up.
Backups happen on weeknights starting at 8:00 p.m. If you create a file and delete it before a backup can complete, your file cannot be restored. There are backups on weekends too, but fewer than on weekdays. If you create a file over the weekend and delete it, there is a chance it can be recovered.
Backups are not kept forever. If you deleted a file less than 90 days ago there is a chance it can be recovered. The backup media is recycled every 90 days.
To request a RESTORE please have the following information available:
- Approximate date the file(s) or directory was created
- Approximate date the file(s) or directory was deleted
- There may be multiple backups of the files. Would you like the files restored to the state they were at a specific date and time?
- The absolute path to the file or directory
- The absolute path location you would like your file(s) or directory restored to
See the SUPPORT section at the bottom left side panel of this page. Send an email with the above information.
Can I reserve resources for an upcoming project?
See the SLURM documentation here. After reading the documentation you can contact support by one of the methods in the SUPPORT section at the bottom left side panel of this page and request a reservation.
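Once a reservation has been created for you, using it might look like the sketch below. The reservation name and the job script name are hypothetical.

```shell
RESERVATION="myproject"   # hypothetical reservation name

if command -v scontrol >/dev/null 2>&1; then
    scontrol show reservation                       # list existing reservations
    # job.sbatch is a hypothetical batch script of your own.
    sbatch --reservation="$RESERVATION" job.sbatch
else
    echo "scontrol not found; run this on a cluster login node"
fi
```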
How do I run an interactive job?
Passing '--pty /bin/bash -i' to the srun command will give you an interactive shell on a node. See the srun man page in the SLURM section on the left for more details.
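A complete command line might look like this sketch. The task count and time limit are illustrative; add a partition option if your account requires one.

```shell
TIME_LIMIT="01:00:00"   # illustrative one-hour limit

if command -v srun >/dev/null 2>&1; then
    # One task with a pseudo-terminal attached to an interactive bash shell.
    srun --ntasks=1 --time="$TIME_LIMIT" --pty /bin/bash -i
else
    echo "srun not found; run this on a cluster login node"
fi
```

Typing 'exit' in the shell ends the job and releases the node.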
My jobs work in some partitions and not in others, why?
Our cluster is a heterogeneous resource built over time. The nodes have anywhere from 2GB to 38GB RAM per core. Some of the nodes have SSE2 registers while others have AVX registers. Common issues are:
- There is not enough memory on the nodes in the partition you submitted the job to.
- The application was compiled for a CPU with instructions that are not supported by the hardware in the partition you submitted your job to.
- The version of Open MPI you are using is out of date.
To resolve these issues it is useful to know:
- The partition your job ran in
- The node(s) your job ran on
- The Modules you loaded
- The application you are running
- What partitions the application works in
See the SUPPORT section at the bottom left side panel of this page.
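Some of that information can be collected with a short script like the one below. It is a sketch: the helper name cpu_features and the job ID are made up, and it assumes a Linux node where /proc/cpuinfo lists the CPU flags.

```shell
# cpu_features FILE: print which of sse2, avx and avx2 appear in a
# /proc/cpuinfo-style file (defaults to the real /proc/cpuinfo).
cpu_features() {
    grep -o -w -E 'sse2|avx|avx2' "${1:-/proc/cpuinfo}" | sort -u | paste -s -d ' ' -
}

JOBID=12345   # hypothetical job ID

# Instruction sets on the node where this runs.
if [ -r /proc/cpuinfo ]; then
    cpu_features
fi

# Partition, nodes, final state and peak memory of a finished job.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOBID" --format=JobID,Partition,NodeList,State,MaxRSS
else
    echo "sacct not found; run this on a cluster login node"
fi
```

Running cpu_features inside a job on each partition shows whether a binary built with AVX instructions can run there.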
How do I tell Open MPI to use Infiniband?
Open MPI should use Infiniband by default if it is available. In general, you can specify a network fabric like this:
- For Infiniband: mpirun --mca btl openib,self -np 16 ${CHARMMEXEC} < 5cb.inp
- For Ethernet: mpirun --mca btl tcp,self -np 16 ${CHARMMEXEC} < 5cb.inp
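Before forcing a fabric, it can help to check which BTL components your Open MPI build actually provides, since newer Open MPI releases have dropped openib in favor of UCX. The sketch below does that and wraps the two option sets in a small helper; the name btl_for is made up for illustration.

```shell
# btl_for FABRIC: print the --mca arguments shown above for a fabric name.
btl_for() {
    case "$1" in
        infiniband) echo "--mca btl openib,self" ;;
        ethernet)   echo "--mca btl tcp,self" ;;
        *)          echo "unknown fabric: $1" >&2; return 1 ;;
    esac
}

# List the BTL components compiled into the loaded Open MPI, if any.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i "btl" || echo "no BTL components listed"
else
    echo "ompi_info not found; load an Open MPI module first"
fi

# Show the command line that would be run for Ethernet.
echo "mpirun $(btl_for ethernet) -np 16 \${CHARMMEXEC} < 5cb.inp"
```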
Who created this website?
I did. Of course, I shamelessly plagiarized the CSS template from Chris Coyier and Doug Neiner, whose original work can be found here.
I have a question that is not answered here. What can I do?
See the SUPPORT section at the bottom left side panel of this page.