nvitop
Overview
nvitop is an interactive NVIDIA device and process monitoring tool. It has a colorful and informative interface that continuously updates the status of the devices and processes. As a resource monitor, it includes many features and options, such as tree-view, environment variable viewing, process filtering, process metrics monitoring, etc.
Note that this app is provided by the uvx nvitop method, which downloads a Python package and creates a temporary lightweight Python virtual environment from which to run the app. The download is cached in your ~/.cache/uv/ directory, so it only occurs the first time you use the tool.
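To see how much space the cached download takes up, something like the following sketch may help. The function name uv_cache_report is hypothetical (it is not part of uv or nvitop), and the script assumes the cache is in ~/.cache/uv/ as described above unless the UV_CACHE_DIR environment variable is set.

```shell
# Hypothetical helper (not part of uv or nvitop): report the size of a
# uv cache directory, or say so if it has not been created yet.
uv_cache_report() {
    dir="$1"
    if [ -d "$dir" ]; then
        # -s: one summary line for the directory, -h: human-readable size
        du -sh "$dir"
    else
        echo "no uv cache yet at $dir"
    fi
}

# Assumption: the cache is at ~/.cache/uv unless UV_CACHE_DIR overrides it.
uv_cache_report "${UV_CACHE_DIR:-$HOME/.cache/uv}"
```

If the directory does not exist yet, the tool has not been run from your account, so the first run will perform the download.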
Restrictions on use
nvitop is dual-licensed under the Apache License, Version 2.0 (Apache-2.0) and the GNU General Public License, Version 3 (GPL-3.0). All usage on the CSF must adhere to those licenses.
Set up procedure
You can load the modulefile on the compute node where your job is running. We recommend using the latest version, which you get by omitting the version number from the command.
Load one of the following modulefiles:
module load tools/bintools/nvitop
Running the application
This app should be run on a compute node where you have a running GPU job. It is NOT possible to access compute nodes unless you have a job running on the node.
It is now possible to ssh to the compute node where your job is running.
# On the login node, find out where your GPU job is running
squeue
JOBID   PRIORITY  PARTITION  NAME   USER      ST  ...  NODELIST
123456  0.000054  gpuA       myjob  mabcxyz1  R   ...  node860

# Now ssh to the node:
ssh node860

# Now load the modulefile and run nvitop
module load tools/bintools/nvitop
nvitop

# To return to the login node:
exit
Your ssh session will be terminated automatically when your job finishes.
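If you have several jobs running, the squeue lookup in the steps above can be scripted. The following is a minimal sketch, not a supported CSF tool: the function names pick_node and node_for_job are hypothetical, and it assumes squeue accepts the standard Slurm options -u (user), -h (no header), -t (state filter) and -o (output format, where %i is the job ID and %N is the node list).

```shell
# Hypothetical helper: read "JOBID NODELIST" pairs on stdin and print
# the node list for the job ID given as the first argument.
pick_node() {
    awk -v id="$1" '$1 == id { print $2 }'
}

# Hypothetical wrapper: ask Slurm for your running jobs and pick one.
# Assumes squeue is on your PATH (i.e. you are on the login node).
node_for_job() {
    squeue -u "$USER" -h -t RUNNING -o "%i %N" | pick_node "$1"
}

# Example with canned input, so the parsing can be tried anywhere:
printf '123456 node860\n123457 node861\n' | pick_node 123456
```

On the login node you could then run ssh "$(node_for_job 123456)", replacing 123456 with your own job ID. For a job that spans more than one node, the node list is a compressed range rather than a single hostname, so this sketch only suits single-node jobs.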
Further info
Updates
None.
