The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
PowerFLOW
Overview
PowerFLOW (from EXA Corporation) is a fluid dynamics package providing Lattice Boltzmann Method (LBM) simulation techniques for real-world flow predictions.
Version 5.3a is installed on the CSF
Restrictions on use
Access to this software is restricted to a specific research group. Please contact its-ri-team@manchester.ac.uk for access information.
Set up procedure
To access the software you must first load the modulefile:
module load apps/binapps/powerflow/5.3a
Running the application
Job Steps
Please do not run PowerFLOW simulations on the login node. Jobs should be submitted to the compute nodes via the batch system. The PowerFLOW user interface (PowerCASE) is not normally run on the CSF; only the discretize / decompose / simulate batch processes should be used. You should upload your .cdi file (prepared with the PowerCASE GUI) to the CSF for processing.
- To prepare your .cdi file (usually done on your local workstation), load your myinput.case file into the PowerCASE GUI and run the Process > Prepare CDI… menu command to generate a myinput.cdi file. It is this .cdi file which must be uploaded to the CSF.
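For example, the prepared file can be uploaded from your workstation with scp (a sketch only; replace the placeholders with your University username and the address of the CSF login node):
scp myinput.cdi <username>@<csf-login-node>:~/scratch/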
PowerFLOW simulation on the CSF then typically consists of three steps:
- Discretization – converts the continuous geometry specified in your myinput.cdi file into discrete voxels and surfels. Creates a myinput.lgi file to be used by the decomposer or simulator. This file can be very large. Hence you should run jobs in your scratch directory to ensure they do not run out of disk space; if you run from your home area you could exceed the quota applied to that area. Note: the discretizer is multithreaded but does not run across multiple nodes via MPI. Hence this step is usually run in a separate batch job to the simulator (and decomposer), which can be run with MPI across multiple nodes.
- Decomposition – partitions your geometry ready for multi-core simulation. This phase is run automatically by the simulator if not manually specified. It reads the myinput.lgi file written by the discretizer, passing the results to the simulator without writing a file. A file will be written if this step is run individually or the --decomp_to_file flag is added to the simulation step (see below).
- Simulation – performs the simulation that you've set up in PowerCASE. Creates various results files. This step will automatically run the decomposer if it needs to. Simulation can be run across multiple nodes using MPI and hence with a large number of cores.
By default, each phase will write statistics and other messages to files named discretizer.o, decomposer.o and simulator.o respectively. You should check these files for error messages if something is wrong with your simulation results.
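For example, a quick way to look for problems after a run is to search the log files for common warning and error keywords (a sketch only; the exact log file names are shown above and in the examples later on this page, and you should still skim the full files for anything unusual):
grep -i -E 'error|warn' discretizer.o decomposer.o simulator.o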
The Extra Command-process
The simulation (and decomposition) stage requires an extra process to be run; PowerFLOW runs this automatically. This command process is a separate executable which reads the input files and distributes the data to the simulator processes (you will usually have many simulator processes running). When specifying the number of cores to use for your simulation you will need to think about how many CSF compute nodes you wish to use, and remember that PowerFLOW will, by default, add one to the number of cores requested; this extra core is used by the command process. Hence you may need to reduce the number of cores you request for simulation to allow PowerFLOW to increase it for the command process!
For example: suppose you wish to run a simulation on two 24-core nodes (48 cores available in total). In this case you should either
- Request 47 cores for simulation and PowerFLOW will add one extra core for the command process. This will give you 48 cores which is exactly two 24-core nodes.
- It is also possible to assign more than one core to the command process so that it has more memory to work with. But you must then reduce the number of cores used by the simulation processes accordingly.
- Request that PowerFLOW does not allocate a dedicated core to the command process. The process will still run but it will share a core (and memory) with one of the simulation processes. This is NOT recommended for large simulations because the command process will likely run out of memory, but it can be useful for small simulations to simplify the job submission command-line.
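To summarise, for two 24-core nodes (48 cores in total) the options above correspond to the following flag combinations (a sketch of the core accounting only; the full submission commands are given later on this page):
--nprocs 47 --cpcores 1    # 47 simulation processes + 1 core for the command process = 48 cores
--nprocs 46 --cpcores 2    # 46 simulation processes + 2 cores for the command process = 48 cores
--nprocs 48 --cpcores 0    # 48 simulation processes, command process shares a core (small jobs only)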
See below for more detailed examples.
Submitting Jobs to the Batch System
PowerFLOW uses its own script, named exaqsub, to submit jobs to the batch system. Hence you do not need to write a jobscript; everything is done with command-line flags to the exaqsub command.
In particular you must specify the parallel environment on the command-line and the number of cores to use. The general forms of the commands to use on the CSF are given below.
Ensure you replace PEname with a valid CSF Parallel Environment name (for example smp.pe). Replace N with the number of cores to use.
- The following will submit two jobs to the batch system: a discretizer job followed by a decompose-and-simulate job (both of those phases done in the same batch job). The second job will wait until the discretizer job has finished.
exaqsub --mpirsh ssh --submit_all_now --pe PEname --nprocs N --cpcores 0 myinputfile
- Note 1: because we are using only a single exaqsub command, the discretizer and simulator will both use the same PE and number of cores. But the discretizer must run on a single node (it is multithreaded but not an MPI program) so the PE and number of cores must request only a single node. If you wish to run your simulation across multiple nodes, possibly using hundreds of cores, you will need to run the discretizer separately from the simulator. See below for how to do this.
- Note 2: we do not use a dedicated core for the extra command process used by the simulator. This is generally NOT recommended, but without the --cpcores 0 option the default action would be for PowerFLOW to request N+1 cores in the batch system for the decomposition and simulation processes, which may result in an invalid submission request for the CSF.
- The following will submit to the batch system only the discretizer job. Remember that the discretizer must only run on a single node – it is not an MPI program and so cannot run across multiple nodes. Your choice of PE and number of cores must satisfy this requirement.
exaqsub --mpirsh ssh --pe PEname --nprocs N --disc myinput
- The following will submit to the batch system the decomposition and simulation steps only. We only request simulation on the command-line but PowerFLOW will automatically do the decomposition step:
exaqsub --mpirsh ssh --pe PEname --nprocs N --sim --cpcores 1 myinput
Note that the extra command process will be given 1 core (the default if the flag is not given). Hence you must remember that the job submitted to the batch system will request N+1 cores.
- The --sim step will automatically run the --decomp step if it needs to. No intermediate decomposition output file will be written unless you add --decomp_to_file to the --sim step (or run the --decomp step separately), in which case the myinput.lgi file will be updated with decomposition information.
Examples / Tutorial
Obtain the Example files
The following examples use the nano.cdi file supplied with PowerFLOW. Replace this name with your own file as required. To copy the nano example files, run the following on the CSF login node:
# Create a directory in scratch for your work
cd ~/scratch
mkdir pfexample
cd pfexample

# Set up to use PowerFLOW
module load apps/binapps/powerflow/5.3a

# Now copy the nano.case and nano.cdi files
cp $EXA_DIST/testfiles/nano.c* .    # The . at the end is important
Prepare the Simulation
Note that the .cdi file has already been prepared for simulation from the .case file and so we do not need to run the PowerCASE GUI on the CSF. You should do this step on your own workstation with your own .case files and upload the .cdi file to the CSF.
However, if you wish to do this step on the CSF, run the following commands (which assume you have already done the previous commands to obtain the files, load the modulefile etc):
- Start the GUI interactively on a compute node:
qrsh -l inter -l short -V -cwd powercase nano.case
- In the PowerCASE GUI go to the Process > Prepare CDI… menu option (or hit CTRL+p).
- Accept the nano.cdi name, pressing the Prepare button. Then exit the PowerCASE GUI (you don't need to save the .case file).
We can now run any of the following commands to submit the nano.cdi file to the batch system for simulation. For your real simulations you may need to use much larger core counts and choose an appropriate PE for that number of cores.
Single-node Discretize + Decompose + Simulate Job
The following command will use 4 cores in a single Intel node to run all three job steps (see above). It does this by submitting two batch jobs: the first performs discretization, the second performs decomposition and simulation (these two steps are normally run together as a single batch job). The jobs will automatically run in the correct order (exaqsub uses a job dependency to make the second job wait until the first job has finished).
The important thing to note is that we run this as a single-node job, as indicated by our choice of PE and number of cores. Remember that the discretizer cannot be run across multiple compute nodes because it is not an MPI program; it is multi-threaded, so can use multiple cores in the same compute node. By using the simple command below we ask all three phases to run in the same PE (and with the same number of cores) and so we must choose a PE and number of cores suitable for the discretizer. See later for how to run the simulation phase using a much higher number of cores.
exaqsub --mpirsh ssh --submit_all_now --pe smp.pe --nprocs 4 --cpcores 0 nano
In the above command we also ensure the batch job requests 4 cores (not 5 cores) in the batch system by adding the --cpcores 0 flag. This causes the extra command-process to share a core with one of the simulator processes. This is NOT recommended for large simulations. An alternative would be to request 3 cores for simulation and allow 1 core to be used for the command process using:
exaqsub --mpirsh ssh --submit_all_now --pe smp.pe --nprocs 3 --cpcores 1 nano
- The above commands will create the result files forces.csnc, simvol.fnc and simvol.snc.
- Statistics and performance figures about the simulation will be written to simulation.o.
- The discretizer step will have written nano.lgi and discretize.o.
- Statistics about the decompose step are in decompose.o (no intermediate result file from the decomposer is written).
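Once the jobs have finished you can quickly inspect these outputs from the job directory, for example (a sketch; the exact file sizes and messages will depend on your simulation):
ls -lh nano.lgi forces.csnc simvol.fnc simvol.snc    # check the result files exist
tail discretize.o decompose.o simulation.o           # skim the statistics and messages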
Single-node (multicore) Discretize Job
The discretize step can be performed on its own. As noted earlier, this must be run on a single compute node, but can use multiple cores. Hence your choice of PE must ensure this condition. For example, to run on an Intel node with 16 cores:
exaqsub --pe smp.pe --nprocs 16 --disc nano
Note that the --mpirsh ssh flags have been removed because the discretizer doesn't use MPI. It won't do any harm if you leave these flags on the command-line though. The discretizer does not use an extra command-process and so we do not specify the --cpcores flag.
Multi-node Decompose + Simulate Job
To submit a multi-node job to the 64-core AMD Bulldozer nodes using 128 cores (the total must be a multiple of 64, hence two complete nodes), performing the simulate step and, if necessary, the decompose step (assuming you have run the discretize step earlier), and also giving the extra command-process two cores to work with (it will only use one core but can access the memory of two), run:
exaqsub --mpirsh ssh --sim --pe orte-64bd-ib.pe --nprocs 126 --cpcores 2 nano
Notice that we request 126 cores for simulation and 2 cores for the command-process, giving a total of 128 cores. Running qstat will show that a request to the batch system has been made for 128 cores, thereby satisfying the requirements of the orte-64bd-ib.pe parallel environment, which requires the number of cores to be a multiple of 64.
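For example, while the job is queued or running (a sketch; the output layout depends on the batch system version):
qstat    # the slots column should show 128 for the decompose-and-simulate job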
To do the same on the Intel nodes, we recommend using the InfiniBand-connected nodes (which use faster networking). In this case the PE is orte-24-ib.pe and the number of cores must be 48 or more and always a multiple of 24. For example:
exaqsub --mpirsh ssh --sim --pe orte-24-ib.pe --nprocs 46 --cpcores 2 nano
Single-node Discretize with Multi-node Decompose + Simulate Job
Finally, to combine the submission of the discretize step as a single-node job and the simulate (and, if necessary, decompose) step as a multi-node job, specify the PEs for discretization and simulation separately using the flags --pe disc,PEname and --pe sim,PEname as follows:
exaqsub --mpirsh ssh --submit_all_now \
  --disc --pe disc,smp.pe --discnprocs 4 \
  --sim --pe sim,orte-24-ib.pe --simnprocs 47 --cpcores 1 nano
For a large simulation running on the AMD compute nodes (suitable for MACE users) a similar command-line would be:
exaqsub --mpirsh ssh --submit_all_now \
  --disc --pe disc,smp-64bd.pe --discnprocs 64 \
  --sim --pe sim,orte-64bd-ib.pe --simnprocs 126 --cpcores 2 nano
- The above command will submit a 64-core discretization job and a 128-core decompose-and-simulate job. The decompose-and-simulate job will use 126 cores, with the 2 extra cores being used by the PowerFLOW command process. We give it 2 cores so that it has two cores' worth of memory to access (a large simulation may need more memory).
- The --submit_all_now flag ensures that the two jobs are submitted to the batch system immediately from the login node (a job dependency makes them run in the correct order). If you omit this flag the discretizer job will run and then try to submit the simulator job to the batch system. This is not allowed on the CSF – jobs running on compute nodes are not able to submit further jobs to the batch system. Hence the discretizer job will fail at the point it tries to submit the simulator job. By using the --submit_all_now flag, both jobs are submitted from the login node, which is the correct way to submit jobs.
Additional Flags
To rename the output log files from the different job types and include the batch job id:
--joblog @N_@J.o # @N will be replaced with the job name, @J by the job id
To have start and end times reported to a file, add the following:
--logfile filename
To have an email sent when the job starts and ends, add the following:
--notify
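These flags can be combined with any of the submission commands above. For example, based on the earlier small single-node simulation (a sketch; the log file name is just an illustration):
exaqsub --mpirsh ssh --submit_all_now --pe smp.pe --nprocs 3 --cpcores 1 --joblog @N_@J.o --logfile nano_times.txt --notify nano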
For a complete list of flags run, on the login node:
exaqsub --help | less
Further info
- The PowerFLOW manuals (.pdf) are available on the CSF once you have loaded the modulefile. View them using:
evince $POWERFLOW_HOME/doc/PowerFLOW-UsersGuide.pdf    # User Guide
evince $POWERFLOW_HOME/doc/PowerFLOW-CLR.pdf           # Command-line reference
- PowerFLOW website
Updates
None.