{"id":2915,"date":"2019-03-29T15:57:25","date_gmt":"2019-03-29T15:57:25","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf3\/?page_id=2915"},"modified":"2019-03-29T15:58:57","modified_gmt":"2019-03-29T15:58:57","slug":"bioawk","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/bioawk\/","title":{"rendered":"bioawk"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>Bioawk is an extension of <a href=\"http:\/\/www.cs.princeton.edu\/%7Ebwk\/btl.mirror\/\">Brian Kernighan&#8217;s awk<\/a>, created by <a href=\"http:\/\/lh3lh3.users.sourceforge.net\/\">Heng Li<\/a>, that adds support for several common biological data formats, including optionally gzip&#8217;ed BED, GFF, SAM, VCF, FASTA\/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input\/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.<\/p>\n<p>The latest version, dated 27-08-2013, is installed on the CSF. Bioawk does not have a version number, so the date of the GitHub commit is used to identify versions.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>No licence information available.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software you must first load the modulefile:<\/p>\n<pre>module load apps\/gcc\/bioawk\/27-08-2013<\/pre>\n<h2>Running the application<\/h2>\n<p>Please do not run bioawk on the login node. 
Jobs should be submitted to the compute nodes via the batch system.<\/p>\n<h3>Serial batch job submission<\/h3>\n<p>Make sure you have the modulefile loaded, then create a batch submission script, for example:<\/p>\n<pre>#!\/bin\/bash --login\r\n#$ -cwd             # Job will run from the current directory\r\nmodule load apps\/gcc\/bioawk\/27-08-2013\r\n<code>bioawk -Hc sam '!and($flag,4)' aln.sam.gz<\/code><\/pre>\n<p>Submit the jobscript using:<\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h3>Parallel batch job submission<\/h3>\n<p>bioawk is not designed to run in parallel. If you need to run multiple similar jobs then please consider using <a title=\"Job Arrays\" href=\"http:\/\/ri.itservices.manchester.ac.uk\/csf\/csf-user-documentation\/sge-job-arrays\/\">job arrays<\/a>.<\/p>\n<h3>Examples of usage:<\/h3>\n<ol>\n<li>List the supported formats:\n<pre><code>bioawk -c help\r\n<\/code><\/pre>\n<\/li>\n<li>Extract unmapped reads without header:\n<pre><code>bioawk -c sam 'and($flag,4)' aln.sam.gz\r\n<\/code><\/pre>\n<\/li>\n<li>Extract mapped reads with header:\n<pre><code>bioawk -Hc sam '!and($flag,4)' aln.sam.gz\r\n<\/code><\/pre>\n<\/li>\n<li>Reverse complement FASTA:\n<pre><code>bioawk -c fastx '{print \"&gt;\"$name;print revcomp($seq)}' seq.fa.gz<\/code><\/pre>\n<\/li>\n<\/ol>\n<p>Further examples can be found on the <a title=\"Bioawk Help Page\" href=\"https:\/\/github.com\/ialbert\/bioawk\/blob\/master\/README.bio.rst\">bioawk help page<\/a>.<\/p>\n<h3>Recognized Formats<\/h3>\n<p>These format names may be passed as the argument to the -c flag. Each is listed with its column numbers and the corresponding column names:<\/p>\n<dl>\n<dt>bed<\/dt>\n<dd>1:chrom 2:start 3:end 4:name 5:score 6:strand 7:thickstart 8:thickend 9:rgb 10:blockcount 11:blocksizes 12:blockstarts<\/dd>\n<dt>sam<\/dt>\n<dd>1:qname 2:flag 3:rname 4:pos 5:mapq 6:cigar 7:rnext 8:pnext 9:tlen 10:seq 11:qual<\/dd>\n<dt>vcf<\/dt>\n<dd>1:chrom 2:pos 3:id 4:ref 5:alt 6:qual 7:filter 8:info<\/dd>\n<dt>gff<\/dt>\n<dd>1:seqname 2:source 3:feature 4:start 5:end 
6:score 7:filter 8:strand 9:group 10:attribute<\/dd>\n<dt>fastx<\/dt>\n<dd>1:name 2:seq 3:qual<\/dd>\n<\/dl>\n<p>The fastx format handles both FASTA and FASTQ files.<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li><a title=\"Github\" href=\"https:\/\/github.com\/lh3\/bioawk\">GitHub<\/a><\/li>\n<li><a href=\"http:\/\/lh3lh3.users.sourceforge.net\/\">Heng Li<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/ialbert\/bioawk\/blob\/master\/README.bio.rst\">Bioawk help page<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Bioawk is an extension of Brian Kernighan&#8217;s awk, created by Heng Li, that adds support for several common biological data formats, including optionally gzip&#8217;ed BED, GFF, SAM, VCF, FASTA\/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input\/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/software\/applications\/bioawk\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"parent":86,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2915","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/comments?post=2915"}],"version-history":[{"count":3,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2915\/revisions"}],"predecessor-version":[{"id":2918,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/2915\/revisions\/2918"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/pages\/86"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf3\/wp-json\/wp\/v2\/media?parent=2915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}