Walk Me Through

1. Generating key pair

After receiving mail notification about your account has been successfully created on the BA-HPC. Just one step more is required to establish a connection between your machine and the BA-HPC cluster. All you will need is to create public/private authentication key pair. It is a more secured and easy way to use than creating a password beside the username.

Passphrase (Optional)

Creating a passphrase while generating generate private/public key pair is a good practice. It encrypts your private key locally on your machine. You may need the creation of passphrase in case if your machine is shared with other colleagues/users. You will be asked about the passphrase everytime you turn on your machine to login. You can skip the passphrase creation if your machine is for personal use. You will not be asked for any passwords to login.

Please do not share your private key with any. We update your account only by taking your public key to add it in your account.

Unix-like systems users

Use the ssh-keygen command to generate a public/private key pair

$ ssh-keygen

Windows users

The following steps will help you to create public/private generate key pair via Bitvise

Install SSH client.
Generate public/private key pair from Client Key manager
Press the Generate button to generate a new keypair
Export SSH public key in OpenSSH format, 2048-bit length

Putty and Bitvise are SSH Clients for Windows. However, it is recommended to use Bitvise for securitry resasons.

2. Uploading public key

After generating the public key, you can be directed to BA-HPC web portal via your dashboard to upload your public key. You will go to your proposal you have created and click the Add key

For key must be in Openssh format warning, kindly make sure you exported SSH public key in OpenSSH format.
For Key is invalid warning, try to remove any trailing white spaces at the end of your public key.

3. Accessing BA-HPC

The cluster have two login nodes, login01 and login02, that are available for users to log into. From any of the login nodes you can submit and monitor your jobs, look at results of the jobs, etc.

For most tasks you will wish to accomplish, you will start by logging into the login node. In order to do that, you need to use the Secure Shell protocol (SSH). This is standardly installed as ssh on Unix systems, and clients are available for Windows and Mac.

Unix-like systems user

On Unix-like systems, you can login to login01 or login02 by executing in the terminal

$ ssh username@login01.c2.hpc.bibalex.org

If your private key is not palced under $HOME/.ssh directory on your local machine.

$ ssh -i path/to/private/key username@login01.c2.hpc.bibalex.org

Windows user

If you're using BitVise client, you need do the following:

Fill server info:

Set the host to login01.c2.hpc.bibalex.org or login02.c2.hpc.bibalex.org and port to 22

Fill authentication info:

Write your real username example alex030u1instead of username. Choose Initial method public key from drop down list, from Client Key drop down list, choose the name of the last key you created and uploaded to BA-HPC website, for example in the following screenshot named Profile 2.

Click on Login to establish ssh connection.

If you can't login

Make sure the network your are using allow SSH protocol through port 22.
Make sure the private key and public key are related to each other and not different ones. In case you have created several keys.
For Linux user, please make sure the file permission -rw------- is applied to your private key only and your .ssh directory permission is drwx------
For Windows user, you can check the reason of the error from Bitvise logs.
For Windows user, kindly make sure you are not using a depricated version of Bitvise.
For dual-boot user, please make sure to create key pair and login from the same system.

Click on New terminal console to interact with the BA-HPC through a new session.

New Terminal Console

That's the new terminal console, when it appears, it mean you successfully opened a new session on the BAHPC on the login node. You will type all the commands you need after the dollar sign. The current folder is you home directory.

The most popular commands to deal with files and directories are cd and ls. cd data command to change directory to data (in other words as if you are doubling clicking a folder called data on windows to acess it). After that, ls command, mainly for listing and checking the available files where is no files yet as shown below. That's takes us to next section, how to upload our work files under data directory.

Best practises on the login node

It's not preferred to run software commands on login node, it may result with memory fault or segmentation errors. The best practice is to run commands via Slurm, to distribute the job on the compute nodes, whether CPU or GPU.
DO NOT RUN computationally intensive processes on the login node. Maximum runtime of any process on login node is 30 minutes.
On login node, maximum simultaneous processes for each user is 100.
Note that your home directory $HOME is limited to 100 MB. Use data directory (linked at $HOME/data) for large data.

4. Uploading and downloading your work

Unix-like systems user

On your local machine, excute the follwoing command to upload your work on BA-HPC:

$ rsync -a ~/$HOME/my-work username@login01.c2.hpc.bibalex.org:/home/username/data/

On your local machine, excute the follwoing command to download your results from the BA-HPC :

$ rsync -a username@login01.c2.hpc.bibalex.org:/home/username/data/my-results ~/$HOME

For a user-friendly way, you may use a free SFTP program called FileZilla

Windows users

After login, click on new SFTP session to start transfering your files. After that you can drag and drop your files/folders to upload or download.

When you return to the terminal, you can see the files you uploaded on the bA-HPC whether your machine is Linux or Windows.

5. Setting up your environment modules

The software environment used on BA-HPC cluster can be managed via modules. Modules facilitate the task of updating applications and provide a user-controllable mechanism for accessing software revisions and controlling combination of versions. For the job to executed, you must load any required modules before submitting your job.

Common commands to work with modules:

                        
module avail                    # lists available modules
module list                     # lists current loaded modules
module help module-name         # help on specific module
module whatis module-name       # brief description on a specific module
module display module-name      # display changes by a given module
module load module-name         # load a specific module
module unload module-name       # unloads a specific module
module clear                    # unloads all loaded modules

Do not module load multiple versions of the same module at the same time (including same version for different compilers). The module command will report a conflict if you attempt to do so.

Build software from source

We would be happy to install for you in the global environment software that's currently not present. You are also welcome to install software yourself under your data directory by following the software's build or installation instructions as a non-root user.

Liscenced software

If you work with liscenced software. kindly note you are responsible for providing your own licenses for software not in the public domain. It's recommended to build the liscenced software, which support parallelism, under you data directory. The version of the software has to be for use on a Linux cluster.

Working on BA-HPC

1. Creating slurm script

To handle the queuing, scheduling, and execution of jobs the BA-HPC cluster use a batch scheduling system called Slurm (Simple Linux Utility for Resource Management). Normally, you will submit jobs by writing a job script file and submitting the job to Slurm with the sbatch command.

The sbatch command takes a number of options (some of which can be omitted or defaulted). These options define various requirements of the job, which are used by the scheduler to figure out what is needed to run your job, and to schedule it to run as soon as possible, subject to the constraints on the system, usage policies, and considering the other users of the cluster. The options to sbatch can be given on the command line, or in most cases inside the job script. When given inside the job script, the option is placed alone on a line starting with #SBATCH (you must include a space after the sbatch).

Kindly note that these #SBATCH lines SHOULD come before any non-comment/non-blank line in the script.

Choosing CPU queue

If your application doesn't know anything about the multiple cores, then it has only a single stream of instructions that will only occupy a single processor core. And most of the cores will be unused. Computers in a cluster are commonly called compute node, every compute node has multiple independent processors inside it, called cores. As the programmer, to make use of the cores, you have to run sequences of instructions. There are two ways to do that.

One is called multi-processing, where job allocate cores from more than one node. There are other options for set --ntasks-per-node=# rather than -cpus-per-task=# . The difference between task and cpu is that a task may compromise multiple cpus, though this scenario seems kind of rare on the cluster, they are the same when cpus-per-task=1. The following example requests 24 tasks, each with a single core.

                    
#! /bin/bash
#SBATCH --job-name=multi-processing
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1

The other is multi-threading, where all the cores are on the same node. You must set --ntasks=1, nodes=1 and then set --cpus-per-task to the number of OpenMP threads you wish to use. You can use the script example for OpenMP job.

                    
#!/bin/sh
#SBATCH --job-name=multi-threading
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=24

SLURM is very flexible and allows you to be very specific about your resource requests. However, requesting more cores doesn't usually make the job run faster. This may result with communication overhead between cores. Thinking about your application and doing some testing will be important to determine the best set of resources for your specific job.

Choosing GPU queue

                    
#!/bin/sh

#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1

On the BA-HPC cluster, you only specify a partition when you want to run your job on a GPU enabled nodes. To request GPUs for your job on the HPC, you need to add the #SBATCH --partition=gpu and #SBATCH --gres=gpu:N options to your job script file, where N specifies the number of GPUs that you are requesting. Kindly note that we have at most 2 GPUs per node and the total number of GPU enabled nodes is 16.

Currently, we do not directly charge usage of GPUs. GPU based jobs usage will be charged for the CPUs they consume on the GPU enabled node.Every GPU enabled node have 2 GPUs and 16 CPU cores. Since all jobs run in exclusive mode, consuming 1 GPU resource will also consume 8 CPU cores. So, you can start with assigning nstaks=1.

Choosing Slurm account

You have by default slurm account for CPU and GPU. Therefore, it would be necessary to edit the job script with the account name through SLURM account (here for example account alex044) option.

For CPU slurm account:


#SBATCH --account=alex044

For GPU slurm account:


#SBATCH --account=g.alex044

Setting job time

You will need to add the time parameter to your job script file to set a limit on the total run time of the job allocation. The time for the job depends on your estimation for how long it may take to finish.


#SBATCH --time=01:00:00

Skipping the time parameter in your job script may leave your job in a PENDING state with reason AssocGrpCPUMinutesLimit.

[username@login01 ~]$ squeue
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 84353       cpu  jobname username PD       0:00      1 (AssocGrpCPUMinutesLimit)

Besides not setting time limit, if you ran out of CPU or GPU hours you may encounter AssocGrpCPUMinutesLimit. Your job state will change from RUNNING to PENDING again. You can monitor your CPU or GPU quota here.

2. Submitting parallel jobs

2.1 MPI (CPU) job

The Message Passing Interface (MPI) is a standardized and portable system for communication between the various tasks of parallized jobs in HPC environments. A number of different implementations of MPI libraries are available at our cluster. Although the MPI interface itself is somewhat standardized, the different versions are not binary compatible. It is important that you match the MPI implementation you use and with which your code was compiled. The recommended MPI library on BA-HPC cluster is Intel MPI libraries.

Let's start by compiling a sample MPI program written in C. The program initialize a defined number of processes that print the 'Hello World' line to a file along with process rank. The source code for this program can be found at this github gist. To start using MPI environment, load the intel impi module using: [username@login01 ~]$ module load impi Let's use our newly loaded module to compile our C program, using MPI C compiler wrapper mpicc. Execute [username@login01 ~]$ mpicc hello-mpi.c -o hello-mpi.bin Now that we got our binary file, let's create a job script to submit it to OGS.

Here's an example of a simple script that will specify the necessary job parameters, we'll call it hello-mpi.sh:

                    
#! /bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1
#SBATCH --time=00:15:00

mpirun -np 24 ./hello-mpi.bin

#SBATCH --job-name=mpi_job specify the job name.
#SBATCH --ntasks=24 restart the job in the case the system has a crash or is rebooted.
#SBATCH --cpus-per-task=1 specifies the number of cores per task, we will need only one core per process.
mpirun -np 24 ./hello-mpi.bin run the MPI executable and specifies number of processes.

Now that you have a job script, you need to submit the job to the cluster with the sbatch command. Make sure the Intel MPI libraries are loaded first then use 'sbatch' to submit the job to the scheduler



                        
[username@login01 ~]$ module list
Currently Loaded Modulefiles:
  1) GCCcore/5.4.0
  2) binutils/2.26-GCCcore-5.4.0
  3) icc/2016.3.210-GCC-5.4.0-2.26
  4) ifort/2016.3.210-GCC-5.4.0-2.26
  5) iccifort/2016.3.210-GCC-5.4.0-2.26
  6) impi/5.1.3.181-iccifort-2016.3.210-GCC-5.4.0-2.26

[username@login01 ~]$ sbatch hello-mpi.sh
Submitted batch job 156

The number that is returned to you is your job identifier, and you should this ID anytime you want to find out more information about your job, and you should always include this ID when are opening a support ticket about a job.

At this point, your job has been placed in the queue, and will wait its turn for resources to be available. Depending on how heavily used the cluster is at that time, and how many resources you are requesting, your job might start within minutes or it might wait for hours.

Once resources become available, our scheduler will assign resources to your job, including one or more nodes.

The standard output and standard error streams will be directed to a file, by default slurm-.output in the directory where you started the job, where the job-id is the job number as described above.

Output from your job can be viewed in the above specified file shortly after it starts running (assuming it has output something). This can be used to check the status of your job, although it is recommended make your code generates a lot of output to redirect it to another file.

For our trivial example from the last section, when the job completes we should see something like

                                
[username@login01]$ cat slurm-156.output
Hello world: rank 12 of 24 running on comp085.local
Hello world: rank 1 of 24 running on comp085.local
Hello world: rank 2 of 24 running on comp085.local
Hello world: rank 4 of 24 running on comp085.local
Hello world: rank 7 of 24 running on comp085.local
Hello world: rank 8 of 24 running on comp085.local
Hello world: rank 9 of 24 running on comp085.local
Hello world: rank 14 of 24 running on comp085.local
Hello world: rank 15 of 24 running on comp085.local
Hello world: rank 16 of 24 running on comp085.local
Hello world: rank 17 of 24 running on comp085.local
Hello world: rank 18 of 24 running on comp085.local
Hello world: rank 20 of 24 running on comp085.local
Hello world: rank 21 of 24 running on comp085.local
Hello world: rank 0 of 24 running on comp085.local
Hello world: rank 3 of 24 running on comp085.local
Hello world: rank 5 of 24 running on comp085.local
Hello world: rank 6 of 24 running on comp085.local
Hello world: rank 10 of 24 running on comp085.local
Hello world: rank 11 of 24 running on comp085.local
Hello world: rank 13 of 24 running on comp085.local
Hello world: rank 19 of 24 running on comp085.local
Hello world: rank 22 of 24 running on comp085.local
Hello world: rank 23 of 24 running on comp085.local

As you can see in the output files above, the MPI program executed and each process was assigned a unique rank, which was printed off along with the hostname.

2.2 CUDA (GPU) job

CUDA is a parallel computing platform and API model created by Nvidia. It allows you to use a CUDA-enabled GPU for general purpose processing – an approach termed GPGPU (General-Purpose computing on Graphics Processing Units). The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels.

Again let's start by compiling a sample CUDA program written in C. The program use GPU to add two 2 vectors of integers in parallel. It starts by generating 2 vectors of size n then pass these vectors to the GPU memory, after that we make each core simply sums a single element from each of the two input vectors and writes the result into the output vector. Finally, we print only m number of elements into the output file. The source code for this program can be found at this github gist . To start using the CUDA library, let's load Intel C compiler and CUDA [username@login01 ~]$ module load icc CUDA Then let's use the loaded Nvidia CUDA Compiler (NVCC) to compile our source code [username@login01 ~]$ nvcc vector-add.cu -o vector-add.bin

An example of a simple script that will specify the necessary job parameters for a GPU based job, we'll call it cuda-vec_add.sh:

                    
#!/bin/bash
#SBATCH --job-name=first-cuda-job
#SBATCH --account=g.projectname
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:15:00

./vector-add.bin 100000 10

There's some additional options we've put in our job script:

#SBATCH --account=g.account_name Specify your GPU slurm account.
#SBATCH --partition=gpu Submit the job to the gpu partition.
#SBATCH --gres=gpu:1 Specify the number of GPUs. In this example we only need one GPU card.
#SBATCH --nodes=1 Specify the number of required nodes. In this example we only need one node.
#SBATCH --ntasks=1 Specify the number of CPU cores/process to be used. In this example we only need one process to initiate our program execution on the GPU. Maximum number of CPU cores/process to be used is 16.
./vector-add.bin 100000 10 Generate two vectors of length 100000, and print only the first 10 elements in the output file.

It is possible to run MPI programs that use GPUs but only within a single node, maximum of 2 GPUs and 16 CPU cores.

Now that you have a job script, let's submit the job to the cluster. Make sure the CUDA library are loaded first then use 'sbatch' to submit the job to the scheduler

                    
[username@login01 ~]$ module list
Currently Loaded Modulefiles:
1) CUDA/8.0.44

[username@login01 ~]$ sbatch cuda-vec_add.sh
Submitted batch job 157

After the job completes, we should see something like

                        
[username@login01]$ cat slurm-157.out
h_x = 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0

h_y = 100000.0 99999.0 99998.0 99997.0 99996.0 99995.0 99994.0 99993.0 99992.0 99991.0

The sum is:
100001.0 100001.0 100001.0 100001.0 100001.0 100001.0 100001.0 100001.0 100001.0 100001.0

2.3 Gromacs job

In this section, you will see to run a gmx mdrun example, which is the "main computational chemistry engine within GROMACS. This sbatch script example uses GPU resources.

You will need to load the To start using the libraries you need through loading the module.


                    [username@login01 ~]$ module load Gromacs

You will need to get the gromacs sbatch script from BA-HPC repository on github.


[username@login01 ~]$ cd data

[username@login01 ~]$ git clone https://github.com/hpcalex/use-hpc.git [username@login01 ~]$ cd use-hpc/gromacs/

Open the gromacs sbatch script to edit the account name and the command to run your code or may be the change the GPU specs.


                       [username@login01 ~]$ nano mdrun.sh

 
#!/bin/sh
#SBATCH --job-name=Apo
#SBATCH --partition=gpu
#SBATCH --account=<g.account_name>
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --time=120:00:00

gmx mdrun -nt 16 -s my_run.tpr -deffnm my_run

After editing and closing the file through ctlr + X then choose yes, you can submit the sbatch to slurm to run your code.


                       [username@login01 ~]$ sbatch mdrun.sh

2.4 Quantum Espresso job

Quantum ESPRESSO is an open-source plane-wave periodic density functional theory code and considered one of the products of (SCM) Software for Chemistry and Materials.

You can check an example for input file scf.in.

For QE 6.3, you will need to load the required modules for running QE 6.3 [username@login01 ~]$ PATH="/share/apps/q-e-qe-6.3/bin:$PATH [username@login01 ~]$ module load intel/2018

For QE 6.7, you can easily use recent QE from the environment modules

[username@login01 ~]$ module load QuantumESPRESSO/6.7-intel-2019b

Here's an example of a submission script called pw.x.sh for Quantum Espresso job.

 
#!/bin/sh
#SBATCH --job-name=qe-job
#SBATCH --account=<account_name>
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1
#SBATCH --time=03:00:00

mpirun -np "$SLURM_NTASKS" pw.x scf.in > opt.out

To make slurm take action to work on your input file you will need to submit the submission as follows. [username@login01 ~]$ sbatch pw.x.sh

2.5 Python job

Using Anaconda3 module

Anaconda is a distribution of the Python and R programming languages for scientific computing that wil provide you with the packages you need by excuting the following

[username@login01 ~]$ module load Anaconda3
[username@login01 ~]$ source activate /cluster/envs/ba-hpc.bak
[username@login01 ~]$ conda list

conda list will show you the available libraries we have, that most researchers use on the BA-HPC. If the anaconda library you need does not exist, you can mail us at hpc@bibalex.org.

Create your special conda environment


[username@login01 ~]$ module load Anaconda3
[username@login01 ~]$ conda create -n <env_name> python=3.8
[username@login01 ~]$ conda activate  <env_name>

Using Python pip

You can use pip after activating the ba-hpc env. Besides, --user must be specified in all pip commands.

[username@login01 ~]$ pip install <package_name> --user

Actions below, are quite helpful when cache files crowdens your home more than 100 MB. So, you will move bin and lib directories to data directory and create symbolic links under .local which refrence the actual moved files. The same idea goes for .cache directory too.


[username@login01 ~]$ cd .local 
[username@login01 ~]$ mv bin lib ../data/ 
[username@login01 ~]$ ln -s ../data/bin . 
[username@login01 ~]$ ln -s ../data/lib .
[username@login01 ~]$ cd ..
[username@login01 ~]$ cd .cache
[username@login01 ~]$ mv pip ../data/pip_cache 
[username@login01 ~]$ ln -s ../data/pip_cache pip

Working with Python script

We have a code repository which is an accumulator for scripts intended as executable examples for users to carry out tasks related to the TensorFlow machine learning framework on a High-Performance Computing (HPC) system.

Listing GPU devices

The source code for this program and the submission script can be found at BA-HPC code repository.

Dynamic Recurrent Neural Network

A simple Tensorflow implementation of a Recurrent Neural Network (LSTM) that performs dynamic computation over sequences with variable length on a toy dataset. The source code for this program can be found at this link. The scripts to fetch and run this program can be found at BA-HPC code repository.

Convolutional Neural Network

A tensorflow tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images, which is a frequently used benchmark for image classification tasks. The source code for this program can be found at this link The scripts to fetch and run this program can be found at BA-HPC code repository.

2.6 GUI job

Working with Jupyter

Jupyter comes in handy for data visualization projects, since it is a GUI (web) interface for coding, you will need run a certain submission script called server.sh in the same directory that has jupyter notebooks under your data directory.

You will need to get the jupyter sbatch script from BA-HPC repository on github. It is recommended to put the ipynb next to server.sh to show up and be able to execute it directly from there.


[username@login01 ~]$ cd data

[username@login01 ~]$ git clone https://github.com/hpcalex/use-hpc.git [username@login01 ~]$ cd use-hpc/jupyter/

Open the server.sh sbatch script to edit the account name and the command to run your code or may be the change the GPU specs.


                       [username@login01 ~]$ nano server.sh

 
#!/bin/sh
#SBATCH --job-name=jupyter-server
#SBATCH --account=<g.account_name>
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --time=00:40:00

jupyter notebook --no-browser --ip=0.0.0.0
echo "$SLURM_JOB_NODELIST"

After editing and closing the file through ctlr + X then choose yes, you can submit the sbatch to slurm to run your code.


                       [username@login01 ~]$ sbatch server.sh

You should see the jupyter notebook job running in the queue after that. You will need to check the compute node you took from squeue command: [username@login01 ~]$ squeue -u $USER

Unix-like systems users

Besides login to the BA-HPC at the beginning, you will need SSH-tunelling/Port forwarding connection to run and view your python code and results on the cluster. As running GUI applications is not supported on the cluster. You will need to run the follwoing command on your linux machine/laptop.

  $ ssh -NL local-port:compute-node-you-took:remote-port username@remote-host

A real example: $ ssh -NL 8888:comp021:8888 alex005u1@login01.c2.hpc.bibalex.org

Windows user

On your local machine, Open bitvise and select C2S (Client-to-Server) for ssh tunelling and insert the following values:

Listen interface: localhost
Listen port: 8888
Destination host: compute_node_name
Destination port: 8888

Finally, open the browser locally on your personal computer or laptop to see jupyter notebook running on the BA-HPC and write

 localhost:8888

Working with vscode

In Bitvise (if using Bitvise on Windows), export the private key and copy the text

In VS Code, in a terminal window Ctrl+Shift+`, edit ~/.ssh/id_rsa and paste the private key there (recommended to touch the file first and clear file permissions for group and other on both ~/.ssh and ~/.ssh/id_rsa)

Test connection from the VS Code terminal to the login node: ssh <username>@login01.c2.hpc.bibalex.org

In VS Code's extension manager, find and install Remote - SSH

Click the new status bar (bottom-left) to Open a Remote Window, click Connect to Hostand enter <username>@login01.c2.hpc.bibalex.org and press the enter key

Wait a moment for VS Code to initialize the new remote window, clicking Allow in the dialog with the warning about the OS not being supported, a warning that should be safe to ignore until February 2025 (by then, an overall software redeployment of the BA-HPC would hopefully be complete)

You should then be in remote development mode, where you may, e.g., open a new terminal Ctrl+Shiftthat will give you access to the login node's shell environment from within VS Code

Monitoring

1. Job status

The basic command for monitoring your jobs' status is the squeue command. Because normally you are only interested in your jobs, it is advisable to add the -u username flags, to speed up the command and only show your jobs. Replace username with your username.

To check your jobs' state in the queue

[username@login01 ~]$ squeue -u $USER

Watch your job in the queue, time is updated every two seconds

[username@login01 ~]$ watch squeue -u $USER

To stop/cancel a job from the queue

[username@login01 ~]$ scancel <jobid>

After the job finishes

To check elapsed time in the same session

[username@login01 ~]$ sacct -Xo jobid,jobname,elapsed,state

To generally check all your jobs' status in any session

[username@login01 ~]$ sacct -S 2019-01-01 -u <username>

2. CPU or GPU hours

To check your CPU hours usage

[username@login01 ~]$ cpumins
<projectname> cpu : 3620550 of 4800000 mins (60342 of 80000 h) 75%

To check your GPU hours usage

[username@login01 ~]$ gpumins
g.<projectname> gres/gpu : 16493 of 60000 mins (274 of 1000 h) 27%

3. Disk quota

You could use the following command to check your account storage:

[username@login01 ~]$ lfs quota -hg <projectname> /lfs01

We should see something like

               [username@login01 ~]$ lfs quota -hg <projectname> /lfs01
               Disk quotas for grp alex036 (gid 1034):
                    Filesystem    used   quota   limit   grace   files   quota   limit   grace
                         /lfs01  2.785G     10G   10.1G       -     155  100000  105000       -

Here, the output shows the used storage under your data directory and also shows number of files which is 155 in this example.

It's advisable to make sure that your software doesn't generate many files that may exceed the maximum limit of files for each user.

Walk Me Through

Getting started

1. Generating key pair

Passphrase (Optional)

Unix-like systems users

Windows users

2. Uploading public key

3. Accessing BA-HPC

Unix-like systems user

Windows user

New Terminal Console

Best practises on the login node

4. Uploading and downloading your work

Unix-like systems user

Windows users

5. Setting up your environment modules

Common commands to work with modules:

Build software from source

Liscenced software

Working on BA-HPC

1. Creating slurm script

Choosing CPU queue

Choosing GPU queue

Choosing Slurm account

Setting job time

2. Submitting parallel jobs

2.1 MPI (CPU) job

2.2 CUDA (GPU) job

2.3 Gromacs job

2.4 Quantum Espresso job

2.5 Python job

Using Anaconda3 module

Create your special conda environment

Using Python pip

Working with Python script

Listing GPU devices

Dynamic Recurrent Neural Network

Convolutional Neural Network

2.6 GUI job

Working with Jupyter

Unix-like systems users

Windows user

Working with vscode

Monitoring

1. Job status

After the job finishes

2. CPU or GPU hours

3. Disk quota