{"id":2250,"date":"2015-02-23T17:24:31","date_gmt":"2015-02-23T17:24:31","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=2250"},"modified":"2016-07-22T12:59:36","modified_gmt":"2016-07-22T12:59:36","slug":"powerflow","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/powerflow\/","title":{"rendered":"PowerFLOW"},"content":{"rendered":"<h2>Overview<\/h2>\n<p><a href=\"http:\/\/www.exa.com\/powerflow.html\">PowerFLOW<\/a> (from EXA Corporation) is a fluid dynamics package providing LBM simulation techniques for real-world flow predictions.<\/p>\n<p>Version 5.3a is installed on the CSF<\/p>\n<h2>Restrictions on use<\/h2>\n<p>Access to this software is restricted to a specific research group. Please contact <a href=\"ma&#105;&#108;&#116;&#x6f;&#x3a;&#x69;&#x74;&#x73;-r&#105;&#45;&#116;&#x65;&#x61;&#x6d;&#x40;&#x6d;an&#99;&#104;&#101;&#x73;&#x74;&#x65;&#x72;&#x2e;ac&#46;&#117;&#107;\">it&#115;&#45;&#114;&#105;&#x2d;&#x74;&#x65;&#x61;&#x6d;&#64;m&#97;&#110;&#99;&#104;&#x65;&#x73;&#x74;&#x65;&#x72;&#46;a&#99;&#46;&#117;&#107;<\/a> for access information.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software you must first load the modulefile:<\/p>\n<pre>\r\nmodule load apps\/binapps\/powerflow\/5.3a\r\n<\/pre>\n<h2>Running the application<\/h2>\n<h3>Job Steps<\/h3>\n<p>Please do not run PowerFLOW simulations on the login node. Jobs should be submitted to the compute nodes via batch. The PowerFLOW user interface (PowerCASE) is <strong>not<\/strong> normally run on the CSF. Only the discretize \/ decompose \/ simulation batch processes should be used. 
You should upload your <code>.cdi<\/code> file (prepared with the PowerCASE GUI) to the CSF for processing.<\/p>\n<ul>\n<li>To prepare your <code>.cdi<\/code> file (usually done on your local workstation), load your <code><em>myinput<\/em>.case<\/code> file into the PowerCASE GUI and run the <em>Process > Prepare CDI&#8230;<\/em> menu command to generate a <code><em>myinput<\/em>.cdi<\/code> file. It is this <code>.cdi<\/code> file which must be uploaded to the CSF.<\/li>\n<\/ul>\n<p>PowerFLOW simulation on the CSF then typically consists of three steps:<\/p>\n<ol class=\"gaplist\">\n<li>Discretization &#8211; converts the continuous geometry specified in your <code><em>myinput<\/em>.cdi<\/code> file into discrete voxels and surfels. Creates a <code><em>myinput<\/em>.lgi<\/code> file to be used by the decomposer or simulator. This file can be very large. Hence you should run jobs in your <em>scratch<\/em> directory to ensure they do not run out of disk space. If you run from your <em>home<\/em> area you could exceed the quota applied to that area. <strong>Note:<\/strong> the discretizer is <em>multithreaded<\/em> but does <strong>not<\/strong> run across multiple nodes via MPI. Hence this step is usually run in a separate batch job from the simulator (and decomposer), which <em>can be<\/em> run with MPI across multiple nodes.<\/li>\n<li>Decomposition &#8211; partitions your geometry ready for multi-core simulation. This phase is run automatically by the simulator if not manually specified. It reads the <code><em>myinput<\/em>.lgi<\/code> file written by the discretizer, passing the results to the simulator without writing a file. A file <em>will<\/em> be written if this step is run individually or the <code>--decomp_to_file<\/code> flag is added to the simulation step (see below).<\/li>\n<li>Simulation &#8211; performs the simulation that you&#8217;ve set up in PowerCASE. Creates various results files. This step will automatically run the decomposer if it needs to. 
Simulation can be run across multiple nodes using MPI and hence with a large number of cores.<\/li>\n<\/ol>\n<p>By default, each phase will write statistics and other messages to files named <code>discretizer.o<\/code>, <code>decomposer.o<\/code> and <code>simulator.o<\/code> respectively. You should check these files for error messages if something is wrong with your simulation results.<\/p>\n<h3>The Extra Command-process<\/h3>\n<p>The simulation (and decomposition) stage requires an extra process to be run and this will be run automatically by PowerFLOW. This <em>command process<\/em> application is the executable which reads input files and distributes the data to the <em>simulator processes<\/em> (a separate executable; you will usually have many simulator processes running). When specifying the number of cores to use for your simulation you will need to think about how many CSF compute nodes you wish to use and remember that PowerFLOW will add one (by default) to the number of cores requested. This extra core will be used by the <em>command process<\/em>. Hence you may need to reduce the number of cores you request for simulation to allow PowerFLOW to increase it for the command process!<\/p>\n<p>For example: suppose you wish to run a simulation on two 24-core nodes (48 cores available in total). In this case you should either<\/p>\n<ul>\n<li>Request 47 cores for simulation and PowerFLOW will add one extra core for the command process. This will give you 48 cores which is exactly two 24-core nodes.\n<ul>\n<li>It is also possible to assign more than one core to the command process so that it has more memory to work with. But you must then reduce the number of cores used by the simulation processes accordingly.<\/li>\n<\/ul>\n<\/li>\n<li>Request that PowerFLOW does not allocate a core to the command process. The process will still run but it will share a core (and memory) used by one of the simulation processes. 
This is <em>NOT<\/em> recommended for large simulations because the command process will likely run out of memory. But it can be useful to do this for small simulations to simplify the job submission command-line.<\/li>\n<\/ul>\n<p>See below for more detailed examples.<\/p>\n<h3>Submitting Jobs to the Batch System<\/h3>\n<p>PowerFLOW uses its own script, named <code>exaqsub<\/code>, to submit jobs to the batch system. Hence you do <strong>not<\/strong> need to write a jobscript. Everything is done with command-line flags to the <code>exaqsub<\/code> command.<\/p>\n<p>In particular you must specify the parallel environment on the command-line and the number of cores to use. The general forms of the commands to use on the CSF are given below.<\/p>\n<p>Ensure you replace <em>PEname<\/em> with a valid CSF <a href=\"\/csf2\/csf-user-documentation\/parallel-jobs\/\">Parallel Environment<\/a> name (for example <code>smp.pe<\/code>). Replace <em>N<\/em> with the number of cores to use.<\/p>\n<ul>\n<li>The following will submit to the batch system two jobs: a <em>discretizer<\/em> job followed by a <em>decompose<\/em>-and-<em>simulate<\/em> job (both phases done in the same batch job). The second job will wait until the discretizer job has finished.\n<pre>\r\nexaqsub --mpirsh ssh --submit_all_now --pe <em>PEname<\/em> --nprocs <em>N<\/em> --cpcores 0  <em>myinputfile<\/em> \r\n<\/pre>\n<ul>\n<li>Note 1: because we are using only a single <code>exaqsub<\/code> command, the discretizer and simulator will both use the same PE and number of cores. But the discretizer <em>must<\/em> run on a single node (it is multithreaded but not an MPI program) so the PE and number of cores must request only a single node. If you wish to run your simulation across multiple nodes, possibly using hundreds of cores, you will need to run the discretizer separately from the simulator. 
See below for how to do this.<\/li>\n<li>Note 2: We do not use a dedicated core for the extra <em>command process<\/em> used by the simulator. This is generally NOT recommended. But without the <code>--cpcores 0<\/code> option the default action would be that PowerFLOW requests <em>N+1<\/em> cores in the batch system for decomposition and simulation processes. This may result in a bad submission request for the CSF.<\/li>\n<\/ul>\n<\/li>\n<li>The following will submit to the batch system only the <em>discretizer<\/em> job. Remember that the discretizer must only run on a single node &#8211; it is <em>not<\/em> an MPI program and so cannot run across multiple nodes. Your choice of PE and number of cores must satisfy this requirement.\n<pre>\r\nexaqsub --mpirsh ssh --pe <em>PEname<\/em> --nprocs <em>N<\/em> --disc <em>myinput<\/em> \r\n<\/pre>\n<\/li>\n<li>The following will submit to the batch system the decomposition and simulation steps only. We only request simulation on the command-line but PowerFLOW will automatically do the decomposition step:\n<pre>\r\nexaqsub --mpirsh ssh --pe <em>PEname<\/em> --nprocs <em>N<\/em> --sim --cpcores 1 <em>myinput<\/em> \r\n<\/pre>\n<p>Note that the extra <em>command process<\/em> will be given 1 core (the default if the flag is not given). Hence you must remember that the job submitted to the batch system will request <code><em>N<\/em>+1<\/code> cores.<\/li>\n<li>The <code>--sim<\/code> step will automatically run the <code>--decomp<\/code> step if it needs to. No intermediate decomposition output file will be written unless you add <code>--decomp_to_file<\/code> to the <code>--sim<\/code> step (or run the <code>--decomp<\/code> step separately), in which case the <code><em>myinput<\/em>.lgi<\/code> file will be updated with decomposition information.\n<\/li>\n<\/ul>\n<h2>Examples \/ Tutorial<\/h2>\n<h3>Obtain the Example files<\/h3>\n<p>The following examples use the <code>nano.cdi<\/code> file supplied with PowerFLOW. 
Replace this name with your own file as required. To copy the nano example file run the following on the CSF login node:<\/p>\n<pre>\r\n# Create a directory in scratch for your work\r\ncd ~\/scratch\r\nmkdir pfexample\r\ncd pfexample\r\n\r\n# Set up to use PowerFLOW\r\nmodule load apps\/binapps\/powerflow\/5.3a\r\n\r\n# Now copy the nano.case and nano.cdi files\r\ncp $EXA_DIST\/testfiles\/nano.c* .                 # The <strong>.<\/strong> at the end is important\r\n<\/pre>\n<h3>Prepare the Simulation<\/h3>\n<p>Note that the <code>.cdi<\/code> file has already been <em>prepared<\/em> for simulation from the <code>.case<\/code> file and so we do not need to run the PowerCASE GUI on the CSF. You should do this step on your own workstation with your own <code>.case<\/code> files and upload the <code>.cdi<\/code> to the CSF.<\/p>\n<p>However, if you wish to do this step on the CSF, run the following commands (which assume you have already done the previous commands to obtain the files, load the modulefile, etc.):<\/p>\n<ul>\n<li>\nStart the GUI interactively on a compute node:\n<pre>\r\nqrsh -l inter -l short -V -cwd powercase nano.case\r\n<\/pre>\n<\/li>\n<li>In the PowerCASE GUI go to the <em>Process > Prepare CDI&#8230;<\/em> menu option (or hit CTRL+p).<\/li>\n<li>Accept the <code>nano.cdi<\/code> name, pressing the <em>Prepare<\/em> button. Then exit the PowerCASE GUI (you don&#8217;t need to save the <code>.case<\/code> file).\n<\/li>\n<\/ul>\n<p>We can now run any of the following commands to submit the <code>nano.cdi<\/code> file to the batch system for simulation. For your real simulations you may need to use much larger core counts and <a href=\"\/csf2\/csf-user-documentation\/parallel-jobs\/\">choose an appropriate PE<\/a> for that number of cores.<\/p>\n<h3>Single-node Discretize + Decompose + Simulate Job<\/h3>\n<p>The following command will use 4 cores in a single Intel node to run all three job steps (see above). 
It does this by submitting two batch jobs: the first performs discretization, the second performs decomposition and simulation (these two steps are normally run together as a single batch job). The jobs will automatically run in the correct order (<code>exaqsub<\/code> uses a job dependency to make the second job wait until the first job has finished).<\/p>\n<p>The important thing to note is that we run this as a single-node job, as indicated by our choice of PE and number of cores. It must be remembered that the discretizer <em>cannot<\/em> be run across multiple compute nodes because it is <em>not<\/em> an MPI program. It <em>is<\/em> multi-threaded so can use multiple cores in the same compute node. By using the simple command below we ask all three phases to run in the <em>same<\/em> PE (and same number of cores) and so we must choose a PE and number of cores suitable for the discretizer. See later for how to run the simulation phase using a much higher number of cores.<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --submit_all_now --pe smp.pe --nprocs 4 --cpcores 0 nano\r\n<\/pre>\n<p>In the above command we also ensure the batch job requests 4 cores (not 5 cores) in the batch system by adding the <code>--cpcores 0<\/code> flag. This causes the extra <em>command-process<\/em> to share a core with one of the simulator processes. This is NOT recommended for large simulations. 
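<\/p>\n<p>The core-count arithmetic described above can be sketched in shell. This is a hypothetical illustration of the rule (batch request = simulation cores + command-process cores); the variable names are our own, not <code>exaqsub<\/code> flags:<\/p>

```shell
# Sketch (assumption, based on the rule described above): exaqsub asks the
# batch system for the simulation cores plus the command-process cores.
nprocs=3      # cores for the simulator processes (value passed to --nprocs)
cpcores=1     # cores for the command process (value passed to --cpcores)
total=$((nprocs + cpcores))
echo "Cores requested from the batch system: $total"
```

<p>The same arithmetic applies to the multi-node examples later on: 126 simulation cores plus 2 command-process cores gives a 128-core request, exactly two 64-core nodes.<\/p>\n<p>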
An alternative would be to request 3 cores for simulation and allow 1 core to be used for the <em>command process<\/em> using:<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --submit_all_now --pe smp.pe --nprocs 3 --cpcores 1  nano\r\n<\/pre>\n<ul>\n<li>The above commands will create result files <code>forces.csnc<\/code>, <code>simvol.fnc<\/code>, <code>simvol.snc<\/code>.<\/li>\n<li>Statistics and performance figures about the simulation will be written to <code>simulation.o<\/code>.<\/li>\n<li>The discretizer step will have written <code>nano.lgi<\/code> and <code>discretize.o<\/code>.<\/li>\n<li>Statistics about the decompose step are in <code>decompose.o<\/code> (no intermediate result file from the decomposer is written).<\/li>\n<\/ul>\n<h3>Single-node (multicore) Discretize Job<\/h3>\n<p>The discretize step can be performed on its own. As noted earlier, this must be run on a single compute node, but can use multiple cores. Hence your choice of PE must ensure this condition. For example, to run on an Intel node with 16 cores:<\/p>\n<pre>\r\nexaqsub --pe smp.pe --nprocs 16 --disc nano\r\n<\/pre>\n<p>Note that the <code>--mpirsh ssh<\/code> flags have been removed because the discretizer doesn&#8217;t use MPI. It won&#8217;t do any harm if you leave these flags on the command-line though. 
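<\/p>\n<p>After a discretize-only job finishes, it can be worth confirming that the <code>.lgi<\/code> file was actually written before submitting the decompose-and-simulate step. A minimal sketch, assuming the <code>nano<\/code> example used below (this helper is our own, not an <code>exaqsub<\/code> feature):<\/p>

```shell
# Sketch: confirm the discretizer wrote a non-empty .lgi file before
# submitting the decompose+simulate job. Illustrative helper only.
check_lgi() {
    if [ -s "$1" ]; then
        echo "OK: $1 exists"
    else
        echo "Missing or empty: $1 - check the discretizer log for errors" >&2
        return 1
    fi
}
check_lgi nano.lgi || true   # non-zero return if the file is absent
```

<p>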
The discretizer does not use an extra <em>command-process<\/em> and so we do not specify the <code>--cpcores<\/code> flag.<\/p>\n<h3>Multi-node Decompose + Simulate Job<\/h3>\n<p>The following submits a multi-node job to the 64-core AMD Bulldozer nodes, using 128 cores in total (the total must be a multiple of 64, hence two complete nodes). It performs the <em>simulate<\/em> step and, if necessary, the <em>decompose<\/em> step (assuming you have run the <em>discretize<\/em> step earlier), and also gives the extra <em>command-process<\/em> two cores to work with (it will only use one core but can access the memory of two):<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --sim --pe orte-64bd-ib.pe --nprocs 126 --cpcores 2 nano\r\n<\/pre>\n<p>Notice that we request 126 cores for simulation and 2 cores for the <em>command-process<\/em> giving a total of 128 cores. Running <code>qstat<\/code> will show that a request to the batch system has been made for 128 cores, thereby satisfying the requirements of the <code>orte-64bd-ib.pe<\/code> parallel environment which requires the number of cores to be a multiple of 64.<\/p>\n<p>To do the same on the Intel nodes, we recommend using the InfiniBand-connected nodes (which use faster networking). In this case the PE is <code>orte-24-ib.pe<\/code> and the number of cores must be 48 or more and always a multiple of 24. 
For example:<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --sim --pe orte-24-ib.pe --nprocs 46 --cpcores 2 nano\r\n<\/pre>\n<h3>Single-node Discretize with Multi-node Decompose + Simulate Job<\/h3>\n<p>Finally, to combine the submission of the Discretize step as a single-node job and the Simulate (and possibly Decompose) step as a multi-node job, specify the PEs for discretization and simulation separately using the flags <code>--pe disc,<em>PEname<\/em><\/code> and <code>--pe sim,<em>PEname<\/em><\/code> as follows:<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --submit_all_now \\\r\n        --disc --pe disc,smp.pe       --discnprocs 4 \\\r\n        --sim  --pe sim,orte-24-ib.pe --simnprocs 47 --cpcores 1 nano\r\n<\/pre>\n<p>For a large simulation running on the AMD compute nodes (suitable for MACE users) a similar command-line would be:<\/p>\n<pre>\r\nexaqsub --mpirsh ssh --submit_all_now \\\r\n        --disc --pe disc,smp-64bd.pe --discnprocs 64 \\\r\n        --sim  --pe sim,orte-64bd-ib.pe --simnprocs 126 --cpcores 2 nano\r\n<\/pre>\n<ul>\n<li>The above command will submit a 64-core discretization job and a 128-core decompose-and-simulate job. The decompose-and-simulate job will use 126 cores with 2 extra cores being used by the PowerFLOW <em>command process<\/em>. We use 2 cores to give it 2 cores&#8217; worth of memory to access (a large simulation may need more memory).<\/li>\n<li>The <code>--submit_all_now<\/code> flag ensures that two jobs are submitted to the batch system immediately from the login node (a job dependency makes them run in the correct order). If you omit this flag the discretizer job will run and then it will try to submit the simulator job to the batch system. This is not allowed on the CSF &#8211; jobs running on compute nodes are not able to submit further jobs to the batch system. Hence the discretizer job will fail at the point it tries to submit the simulator job. 
By using the <code>--submit_all_now<\/code> flag, both jobs are submitted from the login node, which <em>is<\/em> the correct way to submit jobs.<\/li>\n<\/ul>\n<h3>Additional Flags<\/h3>\n<p>To rename the output log files from the different job types and include the batch job id:<\/p>\n<pre>\r\n--joblog @N_&#64;&#74;&#46;&#x6f;         # @N will be replaced with the job name, @J by the job id\r\n<\/pre>\n<p>To have start and end times reported to a file, add the following:<\/p>\n<pre>\r\n--logfile <em>filename<\/em>\r\n<\/pre>\n<p>To have an email sent when the job starts and ends, add the following:<\/p>\n<pre>\r\n--notify\r\n<\/pre>\n<p>For a complete list of flags, run the following on the login node:<\/p>\n<pre>\r\nexaqsub --help | less\r\n<\/pre>\n<h2>Further info<\/h2>\n<ul>\n<li>The PowerFLOW manuals (.pdf) are available on the CSF once you have loaded the modulefile. View them using\n<pre>\r\nevince $POWERFLOW_HOME\/doc\/PowerFLOW-UsersGuide.pdf       # User Guide\r\nevince $POWERFLOW_HOME\/doc\/PowerFLOW-CLR.pdf              # Command-line reference\r\n<\/pre>\n<\/li>\n<li><a href=\"http:\/\/www.exa.com\/powerflow.html\">PowerFLOW website<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview PowerFLOW (from EXA Corporation) is a fluid dynamics package providing LBM simulation techniques for real-world flow predictions. Version 5.3a is installed on the CSF Restrictions on use Access to this software is restricted to a specific research group. Please contact &#105;&#116;&#x73;&#x2d;&#114;&#105;&#x2d;&#x74;&#101;&#97;&#x6d;&#x40;&#109;&#97;&#x6e;&#x63;&#104;&#101;&#x73;&#x74;&#101;&#114;&#x2e;&#x61;&#99;&#46;&#x75;&#x6b; for access information. Set up procedure To access the software you must first load the modulefile: module load apps\/binapps\/powerflow\/5.3a Running the application Job Steps Please do not run PowerFLOW simulations on the login.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/powerflow\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2250","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=2250"}],"version-history":[{"count":20,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2250\/revisions"}],"predecessor-version":[{"id":3228,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2250\/revisions\/3228"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=2250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}