Skip to main content

Run samtools view on WES with HTTP URLs as input

In the previous step, we configured and started instances of WES, DRS, and the Starter Kit UI.

Workflow and Dataset

We will now attempt to run a Nextflow-based samtools view workflow to count the number of reads in BAM files. We will use some small BAMs from the Tabula Muris open dataset as our test dataset.

Note: For reference, you may wish to read more about Tabula Muris, view the Github Repo, and see more info about the files uploaded to the Tabula Muris s3 bucket.

The Nextflow workflow is version controlled on Github at https://github.com/jb-adams/samtools-view-count-nf. We can tell WES to run this workflow simply by providing this Github repository URL.

Submit workflow run request to WES

The samtools-view-count-nf takes one input parameter, --input, which must reference a local (file path) or remote (URL) SAM/BAM/CRAM file. As our Tabula Muris test dataset BAMs are all stored on a public AWS S3 bucket, we can submit these as URLs to the workflow.

For now, let's work with the following BAM URL:

https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam

Let's submit a workflow run request to WES, running our samtools view workflow on the above input BAM. To do so, issue the following HTTP request using your preferred API testing tool:

Method and URL

POST http://localhost/ga4gh/wes/v1/runs

Headers

Content-Type: multipart/form-data

Form data (body)

workflow_type: NEXTFLOW
workflow_type_version: 21.04.0
workflow_url: https://github.com/jb-adams/samtools-view-count-nf
workflow_params: {"input": "https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam"}

Upon doing so, we should receive a single run_id, indicating that the workflow run request was accepted. The run_id provided to us will allow us to monitor the status of the workflow run. In this example, a run_id of 8d2f044a-a613-41c0-b67b-0477a1b1b1f8 was returned, however each time a run request is submitted to WES, and randomly generated run_id will be returned.

Example workflow run request and run_id response:

WES Workflow Run Submission

Monitor workflow run

If we give the workflow run a minute to complete, we can monitor its status by passing the run_id to the GET /runs/{run_id} endpoint:

Example request (substitute in your own run_id):

GET http://localhost/ga4gh/wes/v1/runs/8d2f044a-a613-41c0-b67b-0477a1b1b1f8

Response:

{
"run_id": "8d2f044a-a613-41c0-b67b-0477a1b1b1f8",
"request": {
"workflow_params": {
"input": "https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam"
},
"workflow_type": "NEXTFLOW",
"workflow_type_version": "21.04.0",
"workflow_url": "https://github.com/jb-adams/samtools-view-count-nf"
},
"state": "COMPLETE",
"run_log": {
"name": "jb-adams/samtools-view-count-nf",
"cmd": [
"#!/bin/bash -ue",
"echo \"Running samtools view on https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam\" >&2",
"samtools view -c https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam"
],
"start_time": "2021-09-03T13:40:06Z",
"end_time": "2021-09-03T13:40:20Z",
"stdout": "http://localhost/ga4gh/wes/v1/logs/nextflow/stdout/8d2f044a-a613-41c0-b67b-0477a1b1b1f8?workdirs=0f%2F4340a1a21885f9bfb182bf34a85d2f",
"stderr": "http://localhost/ga4gh/wes/v1/logs/nextflow/stderr/8d2f044a-a613-41c0-b67b-0477a1b1b1f8?workdirs=0f%2F4340a1a21885f9bfb182bf34a85d2f",
"exit_code": 0
},
"task_logs": [
{
"name": "samtools_view",
"cmd": [
"#!/bin/bash -ue",
"echo \"Running samtools view on https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam\" >&2",
"samtools view -c https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam"
],
"start_time": "2021-09-03T13:40:06Z",
"end_time": "2021-09-03T13:40:20Z",
"stdout": "http://localhost/ga4gh/wes/v1/logs/nextflow/stdout/8d2f044a-a613-41c0-b67b-0477a1b1b1f8/0f/4340a1a21885f9bfb182bf34a85d2f",
"stderr": "http://localhost/ga4gh/wes/v1/logs/nextflow/stderr/8d2f044a-a613-41c0-b67b-0477a1b1b1f8/0f/4340a1a21885f9bfb182bf34a85d2f",
"exit_code": 0
}
],
"outputs": {}
}

The above response payload tells us information about our workflow run:

  • state: COMPLETE indicates the workflow run completed successfully
  • the request attribute indicates the initial request parameters we supplied to trigger the workflow run
  • run_log gives us a summary of the entire run, including start and end times, stdout and stderr output
  • outputs provides URLs/paths to any output files/directories created during the workflow. In this case, no output files were created, all output is available in stdout

If we follow the URL at run_log.stdout, we will find the main result of the workflow run, that is, the read count of the input BAM file.

For example:

GET http://localhost/ga4gh/wes/v1/logs/nextflow/stdout/8d2f044a-a613-41c0-b67b-0477a1b1b1f8?workdirs=0f%2F4340a1a21885f9bfb182bf34a85d2f

Returns a response of:

848036

Therefore, the BAM file at https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam contains 848036 reads.

Run workflow with different inputs

Try running the same workflow with different input BAMs. Use WES API calls to submit the workflow run request and monitor its output (the read count). Below are some input URLs and expected read counts you can use to validate the run was configured correctly and completed successfully.

Input URLRead Count
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R1.mus.Aligned.out.sorted.bam848036
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000126-3_39_F-1-1_R2.mus.Aligned.out.sorted.bam806872
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000127-3_38_F-1-1_R1.mus.Aligned.out.sorted.bam821666
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/A1-B000127-3_38_F-1-1_R2.mus.Aligned.out.sorted.bam821604
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/B7-MAA000586-3_8_M-1-1_R1.mus.Aligned.out.sorted.bam3950888
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/C8-B002433-3_38_F-1-1_R1.mus.Aligned.out.sorted.bam1539546
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/D5-MAA000452-3_8_M-1-1_R1.mus.Aligned.out.sorted.bam3912618
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/E20-MAA000652-3_10_M-1-1_R1.mus.Aligned.out.sorted.bam758692
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/F17-B002765-3_38_F-1-1_R1.mus.Aligned.out.sorted.bam998700
https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/F19-D041912-3_8_M-1-1_R1.mus.Aligned.out.sorted.bam1521586

Summary

In this section, we ran the samtools view workflow to count reads in BAM files by executing simple HTTP requests to the WES service. You may have noticed that we did not need DRS or the UI for this section. In the next section, we will explore how to run the same workflow with DRS URLs instead of direct HTTP URLs as input, and how this abstraction provides us with much more flexibility in terms of where input resources can be stored.