{"id":226,"date":"2017-05-25T14:08:29","date_gmt":"2017-05-25T13:08:29","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/htccondor\/?page_id=226"},"modified":"2025-02-21T12:07:11","modified_gmt":"2025-02-21T12:07:11","slug":"postprocessing","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/jobs\/postprocessing\/","title":{"rendered":"Postprocessing your result files"},"content":{"rendered":"<h2>The problem<\/h2>\n<p>Most users queue up tens, hundreds or even thousands of jobs per submission. This results in numerous output files: typically one results file per job, one error file per job, and either a single merged log file or, again, one log file per job queued. You then need to copy all of this back to your own PC for post-processing, which adds to the load on our network, and you must also remember to tidy up or, better still, delete the copies left on submitter.<\/p>\n<h2>The idea<\/h2>\n<p>This computing paradigm closely resembles <a href=\"https:\/\/en.wikipedia.org\/wiki\/MapReduce\">map-reduce<\/a> (popularised by Google and others). The <em>map<\/em> step computes the individual results in parallel, and the <em>reduce<\/em> step post-processes them into exactly what is required. Here we describe a mechanism whereby the reduce step is carried out as part of the Condor submission. There is one caveat: because it runs on our submit node, it should take seconds rather than minutes or longer. (In the rare case where post-processing is long and complicated, we suggest you do it on your own PC.)<\/p>\n<p>The implementation uses Condor\u2019s <a href=\"http:\/\/research.cs.wisc.edu\/htcondor\/dagman\/dagman.html\">DAGman<\/a> feature. The Directed Acyclic Graph Manager is a powerful and complex beast, but it can also be used in many simple ways, of which this is one. DAGman is a workflow manager that sits on top of regular Condor and runs jobs in an order determined by their dependencies. 
Here we ignore that central feature and use only its ability to attach a post-processing script (a <em>POST script<\/em>) to a job. That is, we run one job (which queues as many parallel tasks as required) and attach a Bash script that Condor executes on the submit node once all the results have been returned.<\/p>\n<p>There is one small restriction when using DAGman: there can be only one <code>Queue<\/code> line in the Condor job script. Of course, that line can still say <code>Queue 10000<\/code> (or however many you need).<\/p>\n<h2>The example post-processing script: postjob.sh<\/h2>\n<pre>\r\n#!\/bin\/bash\r\n# Concatenate the per-job output files into one file; delete the originals only if that worked\r\ncat job*.out > AggregatedOutFiles.txt && rm -f job*.out\r\n# Delete any error files that are empty (zero length)\r\nfor f in *.err; do if ! test -s \"$f\"; then rm -f \"$f\"; fi; done\r\n# Always exit with status 0, otherwise DAGman may mark the job as failed\r\nexit 0\r\n<\/pre>\n<p>Remember that this is just an example. It concatenates all the result files into a single file, deletes the originals, and finally deletes any error files that are zero length. DAGman uses the exit status of the POST script to decide whether the job succeeded, so the <code>exit 0<\/code> at the end is important: without it, a non-zero status from the script's last command could make DAGman think your job has failed.<\/p>\n<p>Make postjob.sh executable before you submit:<\/p>\n<pre>\r\nchmod +x .\/postjob.sh\r\n<\/pre>\n<h2>The submission DAG script: submit.dag<\/h2>\n<p>Submit this from a terminal window with <code>condor_submit_dag submit.dag<\/code> (or use the DropAndCompute interface, provided the file is called <code>submit.dag<\/code>).<\/p>\n<pre>\r\n# Node A runs the Condor job described in condor_job\r\nJOB A condor_job\r\n# Once node A has finished, run postjob.sh on the submit node\r\nSCRIPT POST A .\/postjob.sh\r\n<\/pre>\n<h2>The Condor job script example: condor_job<\/h2>\n<pre>\r\nuniverse = vanilla\r\n# Do not send e-mail when jobs finish\r\nnotification = never\r\n# Only run on 64-bit Linux machines\r\nrequirements = (arch == \"X86_64\") && (opsys == \"LINUX\")\r\n# Memory requested, in MB (the default unit)\r\nrequest_memory = 1\r\nexecutable = job.sh\r\nshould_transfer_files = yes\r\nwhen_to_transfer_output = on_exit\r\n# $(Process) runs from 0 to 9, giving job0.out to job9.out\r\noutput = job$(Process).out\r\nerror  = job$(Process).err\r\nlog    = job.log\r\nqueue 10\r\n<\/pre>\n<h2>The example job script: job.sh<\/h2>\n<pre>\r\n#!\/bin\/bash\r\n# Pause for 10 seconds\r\nsleep 10\r\n# Do the real computation here; this example just prints the name of the execute machine\r\nhostname\r\nexit 
0\r\n<\/pre>\n<h2>The download<\/h2>\n<p>All of the above scripts are available as a single download: <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-content\/uploads\/example_postprocess.zip\">example_postprocess<\/a><\/p>\n<p>To use it, copy the zip file to submitter and run the following commands:<\/p>\n<pre>\r\nunzip example_postprocess.zip\r\ncd example_postprocess\r\ncondor_submit_dag submit.dag\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The problem Most users queue up tens, hundreds or even thousands of jobs per submission. This results in numerous output files: typically one results file per job, one error file per job, and either a single merged log file or, again, one log file per job queued. You then need to copy all of this back to your own PC for post-processing, which adds to the load on our network, and you must also remember to.. <a href=\"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/jobs\/postprocessing\/\">Read more 
&raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":14,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-226","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/pages\/226","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/comments?post=226"}],"version-history":[{"count":11,"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/pages\/226\/revisions"}],"predecessor-version":[{"id":1243,"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/pages\/226\/revisions\/1243"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/pages\/14"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/htccondor\/wp-json\/wp\/v2\/media?parent=226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}