Basics of Compiling MATLAB Code
October 2023: It is NO LONGER necessary to compile your MATLAB code on the CSF. You may compile it if you wish to do so. Hence our compilation instructions are still provided below. Alternatively, see our instructions on running un-compiled MATLAB.
Note that compiling your code will not make it run any faster.
Introduction – Why compile MATLAB code?
When you run MATLAB on University systems, a “network license token” is checked-out to you for the duration of your MATLAB session. We used to have a limited number of licenses. If a license is not available on the network then MATLAB will fail with a licensing error.
If you were to use MATLAB ‘normally’ on the CSF (i.e., using MATLAB to run your .m file) and ran N (e.g., 100) jobs simultaneously then you would be using (up to) N (e.g., 100) MATLAB licenses. On a large cluster such as the CSF it is possible for a single user to check out all available MATLAB licenses for hours at a time, resulting in what would effectively be a denial of service (DoS) attack for MATLAB. There would be a lot of very unhappy users around the campus if you were to do this! Therefore we do not allow the main MATLAB programme to be run directly on CSF compute-nodes.
In order to avoid this licensing problem, you will make use of the MATLAB Compiler to produce a standalone executable program (from your .m file) that can be run on CSF compute-nodes without the need for any network licenses. The license is used when you compile your code, but this only needs to be done once (and again each time you change your MATLAB code in the .m file). After compiling your MATLAB code you can run it in many 100s of jobs without using any licenses! Hence you MUST compile your code before submitting your jobs.
Controlling the number of cores used by MATLAB jobs
Before we show how to compile your MATLAB code, we must discuss how many CPU cores you wish your MATLAB jobs to use.
It is now possible to finely control the number of cores used by MATLAB jobs. There are two options available for MATLAB code:
- Use a single CPU core on the compute node where the job runs, or
- Use multiple cores, in one of two ways:
  - Compile your .m file including parpool for explicit parallelisation (instructions are given below)
  - Rely on the implicit parallelisation of many MATLAB functions – they may automatically use multiple cores, particularly when processing large arrays or matrices.
Hence you can compile your code to be a single-core (serial) application or a multi-core (parallel) application.
Many MATLAB functions are multi-threaded by default and an extensive list of these can be found at the following blog post written by Dr. Mike Croucher http://www.walkingrandomly.com/?p=1894. Hence if you compile your MATLAB code to be a multi-core application, it may automatically use multiple CPU cores without you having to explicitly write parallel MATLAB code.
In addition to the implicit, multi-threaded computation offered by these functions, you may wish to investigate use of the parallel computing toolbox. This allows for explicit multi-core computation as well as GPU computing.
Basic Serial (1 core) Compilation Example
We will compile and submit the following four-line .m file. Create a file named example.m using your preferred CSF text editor (or download example.m):

    myVar = ones(5);
    save('exampleoutput.mat', 'myVar')
    disp('Hello from the CSF!')
    disp('Be sure to save any variables you need to a mat file.')
From the bash prompt on the CSF login node, issue the following commands (lines starting with # are comments):

    # First, start an interactive session on a high memory compute node so we can run the compiler
    qrsh -l short -l mem512

    # Go to the directory (folder) where your matlab .m file is. For example:
    cd ~/my_experiment/matlab               # (you could copy the example.m file above)

    # This line loads the MATLAB environment
    module load apps/binapps/matlab/R2018a

    # Perform the compilation, producing a serial (runs on one core) executable.
    mcc -R -singleCompThread -m example.m

    # If it compiles successfully, look at what files have been generated:
    ls
    example  example.m  mccExcludedFiles.log  readme.txt  requiredMCRProducts.txt  run_example.sh
    #
    # example.m is the .m file you started with.
    # All other files have been generated by the matlab compiler 'mcc'

    # You should now exit the interactive session to go back to the login node
    exit
The mcc command is the MATLAB compiler. The flags added are as follows:

-R -singleCompThread
    This option tells the compiler to produce a single-threaded application and, unless you know that your application is going to take advantage of multiple cores, you should use it. Note: If you omit this option, you must run your compiled code as a parallel batch job which requests all cores on a single compute node from the batch system (see below for how to compile and run parallel code).

-m example.m
    This is your MATLAB source code (the .m file) to be compiled.
If you see an error similar to:
    [thread 140138232624896 also had an error]
    A fatal error has been detected by the Java Runtime Environment:
    java.lang.OutOfMemoryError: pthread_getattr_np
    Internal Error (os_linux_x86.cpp:681), pid=32693, tid=140138087524096
    Error: pthread_getattr_np
it means we need to increase the amount of memory you are permitted to use on the login node. Java often requires more than the default user limit. Please email us at its-ri-team@manchester.ac.uk indicating you are trying to compile MATLAB code.
Note: do not compile code from within a batch job. If you have some MATLAB code that uses lots of different parameters it might be tempting to write a .m file for each set of parameters and to compile a new version of your code for each run. But you could run out of compiler licenses if many jobs run at the same time. Instead, write your code to read parameters from a .mat file or from the command-line and then you only need to compile one version of your code. This should be done on the login node.
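Because compilation is only needed once, and again whenever the .m file changes, it can be convenient to check whether the executable is out of date before running mcc. The following is a minimal sketch of such a check; needs_recompile is a hypothetical helper name (not part of MATLAB or the CSF), and it assumes the executable sits next to the source file as in the example above.

```shell
#!/bin/bash
# Sketch only: needs_recompile is a hypothetical helper, not a MATLAB or CSF tool.
# It succeeds (returns 0) when the compiled executable is missing or is older
# than the .m source file, i.e. when mcc should be run again.
needs_recompile() {
    local src="$1"   # the .m source file, e.g. example.m
    local exe="$2"   # the executable produced by mcc, e.g. example
    [ ! -e "$exe" ] || [ "$src" -nt "$exe" ]
}

# Possible use, after loading the MATLAB modulefile:
#   if needs_recompile example.m example; then
#       mcc -R -singleCompThread -m example.m
#   fi
```

This only compares file modification times, so it will not notice changes in other .m files that your main file calls.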
We now describe the files that the mcc compiler will generate.
Output files
After running the compiler, additional files will have been created in your directory. For the purposes of submitting jobs to the CSF, the only files we are interested in are:
- example – This is the binary executable generated by the MATLAB compiler. It cannot be run directly.
- run_example.sh – This is the wrapper script for the above executable and is the application that you will actually run from within your jobscript. So this is not the jobscript – you must still write a jobscript that runs this run_example.sh script. It is a common mistake to think that run_example.sh is the jobscript. It is not. It has been automatically generated by the MATLAB compiler.
- your jobscript.qsub – This does not yet exist. You will write the jobscript next.
- example.m – This is your original .m file that the compiler read to generate the above two files. You must not use this for anything else apart from compiling as above.
Once you have mastered the basics of compilation, further advice on compilation is available. This covers compiling multiple .m files, including directories of .m files, speeding up compilation time, use of toolboxes and other good advice.
Submitting serial jobs
The matlab compiler (mcc) does not create a jobscript for you. You still need to write a jobscript. You might think that the generated file ending with .sh in the name (run_example.sh in the example above) is a jobscript. It is not a jobscript!
If you have compiled using the option -R -singleCompThread then you submit your job for serial execution (so do not specify a PE) using a jobscript.
For example, we create a jobscript named example.qsub as follows:

    #!/bin/bash --login
    # ---- SGE options (lines start with #$): -------------------------------
    #$ -cwd     # Run the job in the current directory

    # The module file must be loaded in order to run your MATLAB job
    module load apps/binapps/matlab/R2018a

    # ---- Commands to be executed (programs to be run) on a compute node ----
    ./run_example.sh $MATLAB_HOME
Submit with the command:

    qsub scriptname

where scriptname is replaced with the filename of your submission script (example.qsub in this example).
Output files
Two new files will appear – one for the standard output, and one for the standard error. See the SGE tutorial for further details.
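As an illustration of what to look for, the sketch below prints the two filenames you would expect for a given job. It assumes the default SGE naming scheme, in which the job name defaults to the jobscript filename and the files are named jobname.oJOBID and jobname.eJOBID. The helper name job_output_files and the job ID 12345 are made up for the example.

```shell
#!/bin/bash
# Sketch only: job_output_files is a hypothetical helper.
# Assumes default SGE naming: <jobname>.o<jobid> for standard output and
# <jobname>.e<jobid> for standard error, where <jobname> defaults to the
# jobscript filename.
job_output_files() {
    local jobname="$1"   # e.g. example.qsub
    local jobid="$2"     # the number reported by qsub when the job was submitted
    echo "${jobname}.o${jobid}"
    echo "${jobname}.e${jobid}"
}

# Possible use (12345 is a made-up job ID):
#   job_output_files example.qsub 12345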
Basic Parallel Compilation
Essential information about parallel Matlab jobs
If you wish to run parallel MATLAB on the CSF you MUST follow the advice in this section to ensure your job runs correctly on the resources it has requested.
Reminder: If your code makes no use, or very little use, of MATLAB parallel functions, you should compile serial MATLAB code as described above (add -R -singleCompThread). We now describe how to compile parallel (multi-core) MATLAB code:
Within your MATLAB .m file, please add the section below:

    % Explicit parallelism using 'parpool'
    nslots = str2double(getenv('NSLOTS'));
    parpool(nslots);
    pp = gcp;
    poolsize = pp.NumWorkers;
This has the advantage that the pool size will match the number of cores you specify in the -pe smp.pe N line in your jobscript. More information on parpool can be found in the MATLAB documentation.
If you are using MATLAB functions that make use of implicit parallelism (i.e., they are parallelized internally) you can control the number of cores that MATLAB uses for these functions. Use the following code:
    % Implicit parallelism using maxNumCompThreads
    nslots = str2double(getenv('NSLOTS'));
    maxNumCompThreads(nslots)

    % Now, operations that use implicit parallelism such as matrix multiplication
    % will use the required number of threads
    C = A * B;    % matrix multiplication
You compile multi-threaded MATLAB by doing the following:

    # First, start an interactive session on a high memory compute node so we can run the compiler
    qrsh -l short -l mem512

    # Go to the directory (folder) where your matlab .m file is. For example:
    cd ~/my_experiment/matlab               # (you could copy the example.m file from the serial example above)

    # This line loads the MATLAB environment
    module load apps/binapps/matlab/R2018a

    # Perform the compilation, producing a parallel executable.
    mcc -m example.m

    # If it compiles successfully, look at what files have been generated:
    ls
    example  example.m  mccExcludedFiles.log  readme.txt  requiredMCRProducts.txt  run_example.sh
    #
    # example.m is the .m file you started with.
    # All other files have been generated by the matlab compiler 'mcc'

    # You should now exit the interactive session to go back to the login node
    exit

and then you must request the number of cores you require in your jobscript using the smp.pe parallel environment. For example:

    #$ -pe smp.pe 12     # Minimum 2, Maximum 32
Please read on for the explanation:
Parallelisation in MATLAB is relatively complicated since there are many mechanisms by which it can be achieved. For example, many in-built MATLAB functions are automatically distributed across several processor cores if given large enough input (see the list of such functions linked earlier on this page). This is so-called implicit parallelism.
If your code uses some of these functions then you may be tempted to omit the -R -singleCompThread option and submit to a parallel environment. However, this may not maximise your throughput (the amount of work you get done on the CSF). For example, if only a short amount of time is spent in parallelized functions and the rest of your code is single threaded, you may spend longer in the CSF queue waiting for a 16-core node to become free (assuming you request 16 cores in your jobscript) than you gain from the parallel MATLAB sections of code. This becomes even more important if you have 100s of jobs to do: waiting for 16-core nodes to become free 100s of times could take a long time! In this case you may be better off sticking with serial jobs and keeping the -R -singleCompThread flag when compiling MATLAB. You’ll usually find that many single-core jobs spend less time in the CSF queues and so ultimately finish sooner, even if the actual runtime of the code is slightly longer than that of a parallel MATLAB job.
If, however, you know that your code makes significant use of implicitly parallel functions then you should omit -R -singleCompThread at the compilation stage and submit to the smp.pe environment. Note that you must use one of the methods above to control the parallelism. Otherwise your job may try to grab more resources than have been allocated to it and trample on other jobs that have reserved the other cores on the node.
Submitting Parallel Jobs
A basic jobscript is as follows:

    #!/bin/bash --login
    #$ -cwd                  # Job runs in current directory - where submitted from
    #$ -pe smp.pe 12         # No. of cores you wish to use, Min. 2, Max. 32.

    # Load the version of matlab you wish to use, for example 2018
    module load apps/binapps/matlab/R2018a

    # Run your code
    ./run_myparallel.sh $MATLAB_HOME
Submit with the command:

    qsub scriptname

where scriptname is replaced with the filename of your submission script.
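Since the MATLAB code reads NSLOTS to decide how many workers or threads to use, it may be worth checking that variable in the jobscript before the compiled code starts. The following is a minimal sketch of such a check. check_nslots is a hypothetical helper (not provided by the batch system), and the limits 2 and 32 are simply the smp.pe minimum and maximum quoted above.

```shell
#!/bin/bash
# Sketch only: check_nslots is a hypothetical helper.
# It verifies that NSLOTS (set by the batch system for parallel jobs) is a
# whole number between 2 and 32, the smp.pe limits quoted on this page.
check_nslots() {
    local n="${NSLOTS:-}"
    case "$n" in
        ''|*[!0-9]*)
            echo "NSLOTS is not set to a whole number" >&2
            return 1
            ;;
    esac
    if [ "$n" -lt 2 ] || [ "$n" -gt 32 ]; then
        echo "NSLOTS=$n is outside the range 2-32" >&2
        return 1
    fi
    return 0
}

# Possible use in a parallel jobscript, before running the compiled code:
#   check_nslots || exit 1
#   ./run_myparallel.sh $MATLAB_HOME
```

A check like this stops the job early with a readable message instead of letting parpool or maxNumCompThreads receive an invalid value.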
Frequently Asked Questions
Can I run a job on more than one compute node?
No, MATLAB cannot run on more than one CSF node. This is because the University does not have licenses for the distributed computing server product. The maximum job size is therefore all cores on one single compute node, and jobs must use smp.pe or they will fail. Please get in touch if you would like help assessing the requirements of your job.
Hints, Tips and Code Samples
Passing Command-line Args to Compiled Code
You may wish to pass command-line parameters to your compiled MATLAB code. For example suppose you wish to pass in a couple of numbers representing settings to be used by your code. The jobscript will look something like:
    #!/bin/bash
    #$ -cwd

    module load apps/binapps/matlab/R2018a

    #### This is a serial job. If you need to run the same code a lot with
    #### different parameters see the next tip about job arrays.

    # Pass in two numbers used by my MATLAB code: 500 and 10000 (for example)
    ./run_myinput.sh $MATLAB_HOME 500 10000
You will need to modify your myinput.m file to read these args. You must make the entire code run from a top-level function:

    % myinput.m source code
    function exitcode = myinput(xparamarg, iters)
      % Args come in as strings. Convert to numbers:
      xparam=str2num(xparamarg);
      num_iters=str2num(iters);
      fprintf( 'Executing my code with xparam = %d, and %d iterations\n', xparam, num_iters );

      % ...your code...

      % Set dummy function return val
      exitcode = 0;
    end
Compile the code as before, for example:
mcc -R -singleCompThread -m myinput.m
In the above sample note that:
- The code is wrapped in a function and so will need an exit value; we set a dummy value at the end.
- The args passed in are presented as strings to your code. If they are to be used as numbers you must convert them to numbers (e.g., using str2num()).
Using the Job Array task ID as a Command-line Arg
The above method could be used within an SGE job array. This is a single job that runs the same MATLAB code multiple times as individual tasks, but with different input parameters. More than one task can run at the same time. Each task uses a special SGE variable ($SGE_TASK_ID) to pass a different parameter to your MATLAB code. This is a great alternative to using a for loop, because rather than running each loop sequentially, the job array tasks can run at the same time i.e. in parallel.
In the example jobscript below the $SGE_TASK_ID value is passed to the MATLAB code. This can then do something unique with that value. For example:

    #!/bin/bash
    #$ -cwd
    ### Jobarray with 1000 tasks numbered 1,2,...,1000
    #$ -t 1-1000

    module load apps/binapps/matlab/R2018a

    # Include the next two lines to avoid a common problem with MATLAB job arrays (described in the next tip)
    export MCR_CACHE_ROOT=$TMPDIR/mcrCache
    mkdir -p $MCR_CACHE_ROOT

    # Run our matlab code giving it the current task id on the command-line
    ./run_myinput.sh $MATLAB_HOME $SGE_TASK_ID
The MATLAB code will then read the first arg as shown earlier:
    % myinput.m source code
    function exitcode = myinput(task_id_arg)
      % Args come in as strings. Convert to numbers:
      sge_task_id=str2num(task_id_arg);
      fprintf( 'Executing my code with SGE_TASK_ID = %d\n', sge_task_id);

      % ...your code...

      % Set dummy function return val
      exitcode = 0;
    end
As before, compile the code.
mcc -R -singleCompThread -m myinput.m
Our batch system documentation contains lots of examples of how to use job arrays.
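A common variation is to use the task ID to select one line from a text file of parameters, so that each task receives its own set of values on the command line. The following is a minimal sketch of that idea. get_task_params is a hypothetical helper and params.txt is a hypothetical file containing one space-separated set of parameters per line (for example, line 1 is "500 10000").

```shell
#!/bin/bash
# Sketch only: get_task_params is a hypothetical helper, and the parameter
# file is assumed to contain one set of parameters per line.
# It prints line number <task_id> of <file>.
get_task_params() {
    local file="$1"      # e.g. params.txt
    local task_id="$2"   # e.g. $SGE_TASK_ID
    case "$task_id" in
        ''|*[!0-9]*)
            echo "task id must be a whole number" >&2
            return 1
            ;;
    esac
    sed -n "${task_id}p" "$file"
}

# Possible use in a job array jobscript. The command substitution is left
# unquoted on purpose so that "500 10000" is passed as two separate args:
#   ./run_myinput.sh $MATLAB_HOME $(get_task_params params.txt "$SGE_TASK_ID")
```

With this approach the compiled code from the two-argument example (xparamarg and iters) can be reused unchanged, and adding a new run only means adding a line to the parameter file and extending the -t range.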
MATLAB Job Array Error and the Fix
If using job arrays to run multiple instances of MATLAB (similar to condor), you may receive an error message about accessing a lock file:
    terminate called after throwing an instance of 'dsFileBasedLockError'
    what(): \
      Tried to obtain a lock on a directory without write permission: \
      /mnt/iusers01/xy01/mabcxyz12/.mcrCache7.17/.deploy_lock.27
Or you may receive an error message of the form:
Could not access the MATLAB Runtime component cache. Details: Some error has occurred in the file: mcr_cache/mclComponentCache.cpp, at line: 328. The error message is: dsFileAccessError exception ; component cache root:/mnt/iusers01/xy01/mabcxyz12/.mcrCache9.6; componentname: example
This occurs when many job array tasks run concurrently and all try to access the same temporary directory used to store the lock file. The solution is to force MATLAB to create the temporary lock files on the nodes where the tasks are running rather than in your home or scratch space. Add the following lines to your jobscript before the ./run_xxxx.sh line:

    export MCR_CACHE_ROOT=$TMPDIR/mcrCache
    mkdir -p $MCR_CACHE_ROOT
The $TMPDIR variable specifies a directory local to the compute node where the job is running and private to your job. It is automatically set by the batch system.
Further information
- Some functions cannot be compiled (Everything from the symbolic toolbox for example). A full list of restrictions and exclusions can be found at
- It is often possible to speed up MATLAB code significantly using techniques such as vectorisation, mex files, parallelisation and more. If you would like advice on how to optimise your MATLAB application please get in touch.