{"id":11203,"date":"2025-10-20T12:15:58","date_gmt":"2025-10-20T11:15:58","guid":{"rendered":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=11203"},"modified":"2026-04-02T16:43:08","modified_gmt":"2026-04-02T15:43:08","slug":"ollama","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/ollama\/","title":{"rendered":"Ollama"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>OLLAMA is the easiest way to get up and running with large language models such as gpt-oss, Gemma 3, DeepSeek-R1, Qwen3 and more.<br \/>\nPlease note that it is for running LLMs only, not for training them. You can, however, create customised models from existing models and run them.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>OLLAMA is open source and freely distributed under the <strong><a href=\"https:\/\/github.com\/ollama\/ollama\/blob\/main\/LICENSE\" target=\"_blank\" rel=\"noopener\">MIT License<\/a><\/strong>.<\/p>\n<p>Please note that each LLM&#8217;s license is different from that of Ollama itself. Please check the respective license terms before using any model.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software, you must first load one of the following modulefiles:<\/p>\n<pre>\r\napps\/binapps\/ollama\/0.12.3\r\napps\/binapps\/ollama\/0.16.2\r\napps\/binapps\/ollama\/0.19.0\r\n<\/pre>\n<h2>Location of downloaded LLMs<\/h2>\n<p>When you run a model, Ollama first downloads it to the directory <code>~\/.ollama<\/code> in your home directory (the default location) and then runs it from there. Over time this directory can grow large as you run different LLMs. 
It is therefore advisable to change the default location where Ollama stores LLMs to somewhere within your <code>~\/scratch<\/code> directory.<\/p>\n<p>The location Ollama uses for storing LLMs is controlled by the environment variable <strong><code>OLLAMA_MODELS<\/code><\/strong>.<\/p>\n<p>To change the default storage location, first create a directory inside your <code>~\/scratch<\/code> directory:<\/p>\n<pre>mkdir ~\/scratch\/ollama_models<\/pre>\n<p>Next, set that path as OLLAMA_MODELS by adding the following line to your <code>~\/.bashrc<\/code> file (use <code>$HOME<\/code> rather than <code>~<\/code> here, because a tilde inside double quotes is not expanded by the shell):<\/p>\n<pre>export OLLAMA_MODELS=\"$HOME\/scratch\/ollama_models\"<\/pre>\n<p>Finally, log in to CSF3 again or source your edited <code>~\/.bashrc<\/code> file:<\/p>\n<pre>source ~\/.bashrc<\/pre>\n<h2>Interactive mode testing<\/h2>\n<p>Please do <strong>NOT<\/strong> run OLLAMA on the login node. Jobs should be submitted to the compute nodes via the batch system.<br \/>\nYou can test-run OLLAMA in an <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/batch-slurm\/gpu-jobs-slurm\/#Interactive_Jobs\" target=\"_blank\" rel=\"noopener\">interactive job\/session<\/a>. However, we <strong>strongly<\/strong> advise that you use batch jobs rather than interactive jobs. 
Use this for short-duration testing only.<\/p>\n<pre class=\"slurm\">\r\n# Start an interactive job.\r\n# Using long flag names\r\nsrun --partition=gpuL --gpus=1 --ntasks=1 --time=1-0 --pty bash\r\n\r\n# Or using short flag names\r\nsrun -p gpuL -G 1 -n 1 -t 1-0 --pty bash\r\n\r\n# Once resources are assigned and you are logged in to the interactive node, run the following:\r\nmodule purge\r\nmodule load apps\/binapps\/ollama\/0.12.3\r\n\r\nunset ROCR_VISIBLE_DEVICES\r\n\r\nollama serve &                       # Start the ollama server\r\n                                     # (uses port 11434 by default)\r\nollama -v                            # Verify that the ollama server is running\r\nollama run llama3.2                  # Download the Llama 3.2 LLM and run it\r\nollama ps                            # List the running LLM\r\n\r\n# You will be able to interact with the running LLM at this stage.\r\nHello                                # Interact with the LLM\r\n\/bye                                 # Exit the interaction when done testing\r\nollama stop llama3.2                 # Stop the running LLM\r\nollama ps                            # Verify that the LLM has been stopped\r\nexit                                 # End the interactive job\/session; the\r\n                                     # ollama server will be stopped.\r\n<\/pre>\n<h2>GPU batch job submission<\/h2>\n<h3>Single GPU job<\/h3>\n<p>Write a job submission script, for example:<\/p>\n<pre class=\"slurm\">\r\n#!\/bin\/bash --login\r\n#SBATCH -p gpuL   # GPU partition. 
Available options for all: gpuL(L40s-48GB), gpuA(A100-80GB)\r\n                  # GPU partitions with restricted access: gpuA40GB\r\n#SBATCH -G 1      # (or --gpus=N) Number of GPUs\r\n#SBATCH -t 1-0    # Wallclock time limit (1-0 is one day, 4-0 is the maximum permitted)\r\n### Optional flags\r\n#SBATCH -n 1      # (or --ntasks=) Number of CPU (host) cores (default is 1)\r\n                  # Up to 12 CPU cores per GPU are permitted for gpuL, gpuA and gpuA40GB.\r\n                  # Also affects host RAM allocated to the job unless --mem=num is used.\r\n\r\n\r\n# Load the modules\r\nmodule purge\r\nmodule load cuda\/12.6.2\r\nmodule load apps\/binapps\/ollama\/0.12.3\r\n\r\n# Without the following line Ollama will run on the CPU instead of the GPU\r\nunset ROCR_VISIBLE_DEVICES\r\n\r\n# Start the ollama server process and run the desired LLM\r\nexport OLLAMA_HOST=0.0.0.0:11434        # Enables you to interact with the API remotely\r\nollama serve &\r\nollama run llama3.2\r\n\r\n# The lines below stop the LLM gracefully.\r\n# Adjust the sleep duration to match the wallclock time you have set for the job.\r\nsleep 23h                 # Delay execution of the command on the next line\r\nollama stop llama3.2      # Stop the Llama 3.2 model. This does not stop\r\n                          # the ollama server, which stops when the job ends.\r\n<\/pre>\n<p>Submit the jobscript using: <code>sbatch scriptname<\/code><\/p>\n<h2>Interacting with LLMs running on a CSF3 compute node<\/h2>\n<p>Once your desired LLM is running on a GPU compute node in CSF3, you will want to interact with it.<br \/>\nYou can use the various APIs provided by Ollama to interact with the LLMs.<br \/>\nYou can either:<\/p>\n<ol>\n<li><strong>Interact from the CSF3 login node itself<\/strong><\/li>\n<li><strong>Interact directly from your laptop<\/strong><\/li>\n<\/ol>\n<p>Here are some examples of each:<\/p>\n<h3>1. 
Interacting from the CSF3 login node<\/h3>\n<p>Once your job is running, check which node it is running on using the <code>squeue<\/code> command.<br \/>\nThen you can interact from the CSF3 login node by running:<\/p>\n<pre>\r\ncurl http:\/\/nodeNNN:11434\/api\/generate -d '{\r\n  \"model\": \"llama3.2\",\r\n  \"prompt\": \"Tell me a joke\"\r\n}'\r\n<\/pre>\n<p>By default, the response is streamed as a series of JSON chunks rather than a single blob.<br \/>\nIf you prefer the answer as a single, easier-to-read blob, run the following instead, which disables streaming:<\/p>\n<pre>\r\ncurl http:\/\/nodeNNN:11434\/api\/generate -d '{\r\n  \"model\": \"llama3.2\",\r\n  \"prompt\": \"Tell me a joke\",\r\n  \"stream\": false\r\n}'\r\n<\/pre>\n<h3>2. Interacting from your laptop<\/h3>\n<p>To interact with an LLM running on a CSF3 compute node, you will first need to set up an SSH tunnel from your laptop to that node.<br \/>\nYou can set up the SSH tunnel by running the following command from your laptop (terminal or Windows command prompt):<\/p>\n<pre>\r\nssh -L 11434:nodeNNN:11434 <username>@csf3.itservices.manchester.ac.uk\r\n<\/pre>\n<p>After completing the authentication, you will be logged in to CSF3. Leave this terminal\/window open; closing it will close the tunnel.<\/p>\n<p><strong>Linux\/MAC Laptop example<\/strong><br \/>\nNext, open a new terminal on your Linux\/MAC laptop and run the same commands as above, changing <code>nodeNNN<\/code> to <code>localhost<\/code> this time.<\/p>\n<pre>\r\ncurl http:\/\/localhost:11434\/api\/generate -d '{\r\n  \"model\": \"llama3.2\",\r\n  \"prompt\": \"Tell me a joke\",\r\n  \"stream\": false\r\n}'\r\n<\/pre>\n<p><strong>Windows Laptop example<\/strong><br \/>\nThe <code>curl<\/code> command syntax on Windows is different from that on Linux\/MAC. 
Open a new command prompt window on your Windows laptop and run the following command.<\/p>\n<pre>\r\ncurl -X POST http:\/\/localhost:11434\/api\/generate -H \"Content-Type: application\/json\" ^\r\n     -d \"{\\\"model\\\":\\\"llama3.2\\\",\\\"prompt\\\":\\\"Tell me a joke\\\",\\\"stream\\\":false}\"\r\n<\/pre>\n<h3>Other APIs<\/h3>\n<p>The examples above use the GENERATE API, which is fine for individual single-turn interactions.<br \/>\nFor longer multi-turn conversations, where the previous messages and context need to be remembered, you can use the CHAT API.<\/p>\n<p>Here&#8217;s an example:<\/p>\n<pre>\r\n#<strong>Linux\/MAC Example<\/strong>\r\ncurl http:\/\/localhost:11434\/api\/chat -d '{\r\n  \"model\": \"llama3.2\",\r\n  \"messages\": [\r\n    { \"role\": \"user\", \"content\": \"Tell me a nerd joke\" }\r\n  ],\r\n  \"stream\": false\r\n}'\r\n\r\n\r\n#<strong>Windows Example<\/strong>\r\ncurl -X POST http:\/\/localhost:11434\/api\/chat -H \"Content-Type: application\/json\" ^\r\n     -d \"{\\\"model\\\":\\\"llama3.2\\\",\\\"messages\\\":[{\\\"role\\\":\\\"user\\\",\\\"content\\\":\\\"Tell me a nerd joke\\\"}],\\\"stream\\\":false}\"\r\n<\/pre>\n<h2>Further info<\/h2>\n<ul>\n<li><a href=\"https:\/\/ollama.com\/\" target=\"_blank\">Ollama website<\/a><\/li>\n<li><a href=\"https:\/\/ollama.com\/docs\" target=\"_blank\">Ollama official documentation<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/ollama\/ollama?tab=readme-ov-file#web--desktop\" target=\"_blank\">Web &#038; Desktop UIs<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview OLLAMA is the easiest way to get up and running with large language models such as gpt-oss, Gemma 3, DeepSeek-R1, Qwen3 and more. Please note that it is for running LLMs only, not for training them. You can, however, create customised models from existing models and run them. 
Restrictions on use OLLAMA is open source and freely distributed under MIT License. Please note that the LLM&#8217;s license is different from that.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/ollama\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":21,"featured_media":0,"parent":86,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-11203","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/11203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=11203"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/11203\/revisions"}],"predecessor-version":[{"id":12240,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/11203\/revisions\/12240"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/86"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=11203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}