Monitoring Jobs

Monitoring Existing Jobs

You can use srun to monitor existing jobs. It logs in to the resources allocated to the job on the compute node where the job is running and gives you an interactive session there.

Your interactive session will consume some of the resources allocated to your batch job. This may adversely affect your batch job.

You will need the JobID of the job you wish to monitor. Then run:

srun --jobid JOBID --pty bash
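
If you do not know the JobID, Slurm's squeue command lists your jobs with the JobID in the first column. The following is a minimal sketch, assuming the standard Slurm client tools are available on the login node. The helper name my_running_jobs is hypothetical and only used here for illustration.

```shell
# Hypothetical helper: list only your own jobs that are currently running.
# The JOBID column of the output is the number to pass to "srun --jobid".
my_running_jobs() {
    squeue -u "$USER" -t RUNNING
}
```

Run my_running_jobs and substitute the number shown in the JOBID column for JOBID in the srun command. Only running jobs are listed because a job that is still queued has no compute node to attach to.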

If you’ll be using a GUI tool to monitor your job, use:

srun-x11 --jobid JOBID            # NO "--pty bash" needed for srun-x11

To limit how long your interactive session runs, add the -t timespec flag to the srun command. For example, -t 10 limits the session to 10 minutes.
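
As a sketch of how the time limit combines with the monitoring command, the function below wraps the srun call with a -t value. The function name monitor_job and its 10 minute default are assumptions made for this example, not site settings. Slurm also accepts other time formats for -t, such as hours:minutes:seconds.

```shell
# Hypothetical helper: attach to a running job with a time limit.
# Usage: monitor_job JOBID [MINUTES]
monitor_job() {
    # Refuse to run without a JobID.
    if [ -z "$1" ]; then
        echo "usage: monitor_job JOBID [MINUTES]" >&2
        return 1
    fi
    # $2 is the limit in minutes; the default of 10 is an assumption for this sketch.
    srun --jobid "$1" -t "${2:-10}" --pty bash
}
```

For example, monitor_job 123456 30 would open a session inside job 123456 that ends after at most 30 minutes (123456 is a placeholder JobID).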

GPU jobs

If you are monitoring a GPU job, you can run nvidia-smi inside the interactive session to see your GPU usage, such as utilisation and memory in use.
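
Running nvidia-smi with no arguments prints a one-off summary. To watch usage over time, its query options can print selected fields repeatedly. The sketch below assumes you are already inside the interactive session on the GPU node. The helper name gpu_watch and the 5 second default interval are hypothetical choices for this example.

```shell
# Hypothetical helper: print GPU utilisation and memory use as CSV,
# repeating every INTERVAL seconds until interrupted with Ctrl-C.
# Usage: gpu_watch [INTERVAL]
gpu_watch() {
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
               --format=csv -l "${1:-5}"
}
```

A longer interval keeps the monitoring overhead low, which matters because the session shares resources with your batch job.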

Ending your monitoring session

Run exit to end your interactive monitoring session. This will NOT terminate your batch job. You’ll return to the login node.

Last modified on April 2, 2025 at 5:07 pm by George Leaver