The following examples demonstrate the commands in the BaseSpace CLI tool. For more information about the CLI and a list of commands, see CLI Overview.

Where the output of an example is very long, it has been truncated to keep this document manageable. Try the commands on your own data to see the full output on your own system.

Authentication, listing projects and appresults

Install:

See main instructions.

Authenticate using default settings:

$ bs auth                                                           
Please go to this URL to authenticate:                                
https://basespace.illumina.com/oauth/device?code=jfHSG                
Created config file /home/psaffrey/.basespace/default.cfg             
Welcome, BSSH.V2 TestUser                                             

The default settings include:

  • The BSSH instance to use (default: US Virginia)
  • The scopes to request; these dictate the actions that can be performed
  • A timeout - how long to wait for authentication before the command fails

Inspect the token to see which user it belongs to and which scopes it carries:

$ bs whoami
+----------------+-------------------------------------------------+
| Name           | BaseSpace User                                  |
| Id             | 1234567                                         |
| Email          | basespaceuser@illumina.com                      |
| DateCreated    | 2015-01-16 15:31:22 +0000 UTC                   |
| DateLastActive | 2017-06-01 12:59:24 +0000 UTC                   |
| Host           | https://api.basespace.illumina.com              |
| Scopes         | ["READ GLOBAL" "CREATE GLOBAL" "BROWSE GLOBAL"] |
+----------------+-------------------------------------------------+
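
Because the scopes on a token dictate which actions can be performed, it can be useful to check for a scope before running a command that needs it. The following is a minimal sketch, assuming the table layout shown above. has_scope is a hypothetical helper, not part of the CLI; it reads whoami output from standard input.

```shell
# has_scope SCOPE: succeed if the Scopes row of `bs whoami` output,
# read from standard input, contains the quoted scope.
# Assumes the table layout shown above; not a BaseSpace CLI feature.
has_scope() {
    grep -F '| Scopes' | grep -qF "\"$1\""
}

# Sample row copied from the output above.
sample='| Scopes         | ["READ GLOBAL" "CREATE GLOBAL" "BROWSE GLOBAL"] |'

if printf '%s\n' "$sample" | has_scope "CREATE GLOBAL"; then
    echo "CREATE GLOBAL: present"
fi
if ! printf '%s\n' "$sample" | has_scope "CONFIGURE QC"; then
    echo "CONFIGURE QC: missing"
fi
```

Against a live account the same check would be run as: bs whoami | has_scope "CONFIGURE QC"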

List projects:

$ bs list projects   
+--------------------------------------------------+----------+---------------+
|                       Name                       |    Id    |   TotalSize   |
+--------------------------------------------------+----------+---------------+
| NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | 36080093 | 2233311909088 |
+--------------------------------------------------+----------+---------------+

List datasets:

$ bs list datasets
+--------------------------+-------------------------------------+--------------------------------------------------+---------------------+
|           Name           |                 Id                  |                   Project.Name                   |   DataSetType.Id    |
+--------------------------+-------------------------------------+--------------------------------------------------+---------------------+
| NA12878-I13_L002         | ds.184ba3d796f343f4886b4aa7fb43c496 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I13_L001         | ds.c805113ed9884caa8912dafdf8edd63d | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I54_L001         | ds.dc5657d91983479eb0dd6abb53b9d60f | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I85_L001         | ds.0a7781b4d7684113a4c64c1f2ca3c175 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
...

List all the available headers for datasets:

$ bs dataset headers                                                 
[                                                                    
    "Id",                                                                 
    "Name",                                                               
    "AppSession.Id",                                                      
    "AppSession.Name",                                                    
    "AppSession.Application.AppFamilySlug",                               
    "AppSession.Application.AppVersionSlug",                              
    "AppSession.Application.Id",                                          
    "AppSession.Application.VersionNumber",                               
...

Relist all datasets

Using custom columns selected from the headers list:

$ bs list datasets -F Name -F QcStatus -F TotalSize -F AppSession.Application.Name                                           
+--------------------------+-----------+-------------+---------------------------------+
|           Name           | QcStatus  |  TotalSize  |   AppSession.Application.Name   |
+--------------------------+-----------+-------------+---------------------------------+
| NA12878-I13_L002         | Undefined | 4047606231  | FASTQ Generation                |
| NA12878-I13_L001         | Undefined | 4118853581  | FASTQ Generation                |
| NA12878-I54_L001         | Undefined | 3496655065  | FASTQ Generation                |
| NA12878-I85_L001         | Undefined | 2462111619  | FASTQ Generation                |
| NA12878-I87_L001         | Undefined | 2271497152  | FASTQ Generation                |
...

Filtering list results

BaseSpaceCLI provides several options for filtering the entities that are output by a list command.

Filtering by name

The option to filter results on an entity field is --filter-term. By default, this filters on the Name field.

$ bs list projects --filter-term=examples
+---------------+---------+-----------+
|     Name      |   Id    | TotalSize |
+---------------+---------+-----------+
| data_examples | 5472467 | 26510     |
+---------------+---------+-----------+

The filter term is specified as a regular expression:

$ bs list appsessions --filter-term=" .* "
+------------------------------------------+----------+-----------------+
|                   Name                   |    Id    | ExecutionStatus |
+------------------------------------------+----------+-----------------+
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |
+------------------------------------------+----------+-----------------+
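
The pattern above matches any name that contains at least two spaces. It can be tried locally as a rough sketch, using grep -E as a stand-in. This assumes that the CLI's regular expression dialect agrees with grep for this simple pattern and that matching is unanchored, which is inferred from the earlier example where "examples" matched data_examples.

```shell
# Names taken from the appsession listings in this document.
names="Illumina's Uploader 2012-11-19 22:06:29Z
BaseSpaceCLI 2017-04-04 11:07:54Z
test_data"

# " .* " requires two spaces with anything (or nothing) in between,
# so only names containing at least two spaces are printed.
printf '%s\n' "$names" | grep -E ' .* '
```

Here test_data is dropped because it contains no spaces.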

Filtering by another field

You can also specify the field on which to filter by using the --filter-field option:

$ bs list datasets --filter-field=Project.Name --filter-term=data
+-----------+-------------------------------------+---------------+---------------------+
|   Name    |                 Id                  | Project.Name  |   DataSetType.Id    |
+-----------+-------------------------------------+---------------+---------------------+
| valid     | ds.5c4200d4f52e4a9dae86fd8b166e296d | data_examples | illumina.fastq.v1.8 |
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files        |
+-----------+-------------------------------------+---------------+---------------------+

This is necessary for entities that do not have a "Name" field, like biosamples:

$ bs list biosamples --filter-term=demo
ERROR: *** Name "Name" not found in object ***
$ bs list biosamples --filter-term=demo --filter-field=BioSampleName
+-------------------------------+---------+---------------+-------------------+--------+
|         BioSampleName         |   Id    | ContainerName | ContainerPosition | Status |
+-------------------------------+---------+---------------+-------------------+--------+
| HiSeq_2500_NA12878_demo_2x150 | 2280211 |               |                   | New    |
+-------------------------------+---------+---------------+-------------------+--------+

Filtering by date

You can also specify the age of entities to be displayed by using --older-than and --newer-than.

$ bs list datasets  -F Name -F DateModified --newer-than=400d
+------------------------------------------+-------------------------------+
|                   Name                   |         DateModified          |
+------------------------------------------+-------------------------------+
| HiSeq 2500 NA12878 demo 2x150            | 2017-06-30 22:17:17 +0000 UTC |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | 2017-07-04 03:09:15 +0000 UTC |
+------------------------------------------+-------------------------------+

By default, date filtering applies to the DateModified field. You can alter this to another date field by using the --date-field option.

$ bs list datasets --date-field=DateCreated --older-than=1y -F Name -F DateCreated
+-----------+-------------------------------+
|   Name    |          DateCreated          |
+-----------+-------------------------------+
| valid     | 2017-04-04 11:07:56 +0000 UTC |
| test_data | 2017-04-04 11:20:09 +0000 UTC |
+-----------+-------------------------------+

Server side filtering

The --filter-term and --filter-field options use client side filtering - the API returns all entities and they are filtered before they are displayed. This means that even if you only end up listing a handful of results, it can take a long time on a large account.

Some entities have specific filtering options that make use of server side filtering, where the API does the filtering and only returns the matching entities. These are available on an entity-specific basis:

$ bs list appsessions --exec-status=Complete
+------------------------------------------+----------+-----------------+
|                   Name                   |    Id    | ExecutionStatus |
+------------------------------------------+----------+-----------------+
| test_data                                | 10743734 | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |
+------------------------------------------+----------+-----------------+

$ bs list datasets --is-type=common.files
+------------------------------------------+-------------------------------------+-------------------------------+----------------+
|                   Name                   |                 Id                  |         Project.Name          | DataSetType.Id |
+------------------------------------------+-------------------------------------+-------------------------------+----------------+
| test_data                                | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples                 | common.files   |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | ds.2f03151b6c9b4a909d05b1af729a6fc2 | HiSeq 2500 NA12878 2x150 Demo | common.files   |
+------------------------------------------+-------------------------------------+-------------------------------+----------------+

You can discover the server side filtering options for each entity by using --help:

$ bs list datasets --help
[dataset command options]
          --like-type=                       Filter DataSets that are LIKE this type
          --is-type=                         Filter DataSets that are this type
          --not-type=                        Filter DataSets that are NOT this type
          --input-biosample=                 Filter by Input BioSample
          --project-name=                    Name of parent project
          --project-id=                      ID of parent project

You can also combine server-side and client-side filtering:

$ bs list datasets --is-type=common.files --filter-term=data
+-----------+-------------------------------------+---------------+----------------+
|   Name    |                 Id                  | Project.Name  | DataSetType.Id |
+-----------+-------------------------------------+---------------+----------------+
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files   |
+-----------+-------------------------------------+---------------+----------------+

BaseSpaceCLI filtering vs. POSIX

An alternative to using the BaseSpaceCLI filter options is to use standard POSIX tools such as grep and cut. The advantage of the BaseSpaceCLI filters is that you can still use other options, such as column selection and output formatting, to get the output you want, which can be more convenient:

# using POSIX tools:
$ bs list datasets -f csv | grep common.files | cut -d, -f2
ds.46c118551d51497789ddaf84bbc9bff0
ds.2f03151b6c9b4a909d05b1af729a6fc2
# the equivalent, with BSCLI filters:
$ bs list datasets -f csv --terse --is-type=common.files
ds.46c118551d51497789ddaf84bbc9bff0
ds.2f03151b6c9b4a909d05b1af729a6fc2
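
If you do use POSIX tools, selecting a column by its header name is less fragile than a fixed position such as -f2 in the example above. The following is a minimal sketch. It assumes that the CSV output starts with a header row carrying the same column names as the table output, which this document does not show, and that no field contains a comma. csv_column is a hypothetical helper, not part of the CLI.

```shell
# csv_column NAME: print the column called NAME from CSV on standard input.
# Assumes the first row is a header and that no field contains a comma;
# a real CSV parser is safer for arbitrary names.
csv_column() {
    awk -F, -v col="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        c       { print $c }
    '
}

# Hypothetical sample of CSV output, built from the tables in this document.
csv_column Id <<'EOF'
Name,Id,Project.Name,DataSetType.Id
valid,ds.5c4200d4f52e4a9dae86fd8b166e296d,data_examples,illumina.fastq.v1.8
test_data,ds.46c118551d51497789ddaf84bbc9bff0,data_examples,common.files
EOF
```

With a live account, the helper would sit at the end of a pipeline such as: bs list datasets -f csv | csv_column Id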

Drilling down into a data set

Get details about one data set.

This example is from the VCAT app:

$ bs -c v2cli_prod get dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
+---------------------------------------------------+----------------------------------------------------------------------------------------+
| Id                                                | ds.f45e4fcccbce4fb18dd91bdad7dcb272                                                    |
| Name                                              | NA12878-R1S1vcf-38337470                                                               |
| AppSession.Id                                     | 42463886                                                                               |
| AppSession.Name                                   | NA12878-R1_S1.vcf.gz_2                                                                 |
| AppSession.Application.AppFamilySlug              | basespace-labs.variant-calling-assessment-tool                                         |
...

Get dataset attributes

This example is from a FASTQ app:

$ bs list attributes dataset -i ds.2f5b56dddc0440858943246ba4ac9d11
+---------------------+---------------+
|        Name         |     Value     |
+---------------------+---------------+
| TotalReadsPF        | 4.1119628e+07 |
| MaxLengthIndexRead1 | 8             |
| MaxLengthRead1      | 151           |
| MaxLengthRead2      | 151           |
| IsPairedEnd         | true          |
| TotalClustersPF     | 2.0559814e+07 |
| TotalClustersRaw    | 2.6711606e+07 |
| TotalReadsRaw       | 5.3423212e+07 |
| MaxLengthIndexRead2 | 8             |
+---------------------+---------------+

Get file contents for a dataset

This is a VCAT example:

$ bs contents dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
+------------+-----------------------------------------------------------------------------------------+
|     Id     |                                        FilePath                                         |
+------------+-----------------------------------------------------------------------------------------+
| 7240583239 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz.tbi   |
| 7240583238 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz       |
| 7240583237 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.summary.csv  |
| 7240583236 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json |
| 7240583235 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.extended.csv |
| 7240583234 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  |
| 7240583233 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.csv   |
| 7240583232 | report.log                                                                              |
| 7240583231 | report.json                                                                             |
+------------+-----------------------------------------------------------------------------------------+

Download a subset of files (by extension) from a dataset as a tgz:

# will download into file /tmp/vcat.tar.gz
$ bs download dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272 --extension=json -o /tmp/vcat
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json  27.44 KB / 27.44 KB [============] 100.00% 348.12 KB/s 0s
happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  5.02 KB / 5.02 KB [================] 100.00% 22.04 MB/s 0s
happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json
report.json  2.53 KB / 2.53 KB [=====================================================================================] 100.00% 13.41 MB/s 0s
report.json
NA12878-R1S1vcf-38337470.ds.f45e4fcccbce4fb18dd91bdad7dcb272.json  2.71 KB / 2.71 KB [===============================] 100.00% 61.58 MB/s 0s
NA12878-R1S1vcf-38337470.ds.f45e4fcccbce4fb18dd91bdad7dcb272.json

Working with properties

Many BSSH entities can be tagged with properties: key/value pairs that label those entities. Some entities, like appsessions, come tagged with properties automatically, but further properties can still be added manually. BSCLI lets you inspect existing properties of any type and create string properties.

BSSH entities that can be labelled with properties include projects, runs, biosamples, appsessions, appresults and datasets.

Inspecting properties of an appsession

The command to see all the properties of an entity is property list:

$ bs appsession property list -i 46664618
+-------------------------------+----------------------------+-------------+------------------------------------+
|             Name              |        Description         |    Type     |              Content               |
+-------------------------------+----------------------------+-------------+------------------------------------+
| Output.Projects               |                            | project[]   | <use `bs get` to obtain more info> |
| Output.Datasets               |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.snp_vqsr                | SNP VQSR sensitivity       | string      | "99.5"                             |
| Input.sample-id.attributes    | Sample Attributes          | map[]       | <use `bs get` to obtain more info> |
| Input.reference_genome        | Reference genome           | string      | "b37_decoy"                        |
| Input.Projects                |                            | project[]   | <use `bs get` to obtain more info> |
| Input.project-id.attributes   | Save Results To Attributes | map[]       | <use `bs get` to obtain more info> |
| Input.project-id              | Save Results To            | project     | <use `bs get` to obtain more info> |
| Input.indel_vqsr              | Indel VQSR sensitivity     | string      | "99.5"                             |
| Input.Datasets                |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.BioSamples              |                            | biosample[] | <use `bs get` to obtain more info> |
| Input.app-session-name        | Analysis Name              | string      | "Sentieon [LocalDateTime]"         |
| BaseSpace.Private.IsMultiNode |                            | string      | "True"                             |
+-------------------------------+----------------------------+-------------+------------------------------------+

To see an individual property, use property get with the --property-name switch:

$ bs appsession property get -i 46664618 --property-name="Input.snp_vqsr"
"99.5"

Note that many of the properties of this appsession are themselves BSSH entities, which can be listed directly by using property get:

$ bs appsession property get -i 46664618 --property-name="Output.Projects"
+---------------+----------+----------------+
|     Name      |    Id    |   TotalSize    |
+---------------+----------+----------------+
| sgdp_sentieon | 38827790 | 32372028734815 |
+---------------+----------+----------------+

This is particularly useful for finding the inputs and outputs of an app:

$ bs appsession property get -i 46664618 --property-name="Input.Datasets"
+------------+-------------------------------------+---------------------------------+---------------------+
|    Name    |                 Id                  |          Project.Name           |   DataSetType.Id    |
+------------+-------------------------------------+---------------------------------+---------------------+
| ERR1347692 | ds.4fa74a92d9b04a69a5cb53f603d965fa | Simons Genome Diversity Project | illumina.fastq.v1.8 |
+------------+-------------------------------------+---------------------------------+---------------------+




$ bs appsession property get -i 46664618 --property-name="Output.Datasets"
+----------+-------------------------------------+---------------+----------------+
|   Name   |                 Id                  | Project.Name  | DataSetType.Id |
+----------+-------------------------------------+---------------+----------------+
| 36701821 | ds.e599c516419e4470b03f26c480fce45d | sgdp_sentieon | common.files   |
+----------+-------------------------------------+---------------+----------------+

We can use some shell features to see the list of files that were output in a single command:

$ bs contents dataset -i $(bs appsession property get -i 46664618 --property-name="Output.Datasets" --terse)
+------------+--------------------------------------------+
|     Id     |                  FilePath                  |
+------------+--------------------------------------------+
| 7776871946 | .basespace/ERR1347692_128_hs37d5.cov.gz    |
| 7776871945 | .basespace/ERR1347692_128_Y.cov.gz         |
| 7776871944 | .basespace/ERR1347692_128_X.cov.gz         |
| 7776871943 | .basespace/ERR1347692_128_NC_007605.cov.gz |
| 7776871942 | .basespace/ERR1347692_128_MT.cov.gz        |
| 7776871941 | .basespace/ERR1347692_128_GL0002491.cov.gz |
| 7776871940 | .basespace/ERR1347692_128_GL0002481.cov.gz |
...
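
The command substitution above passes a single ID to -i. If an appsession produced several output datasets, one possible approach is to loop over the IDs instead. This is a sketch that assumes --terse prints one ID per line, as in the earlier listing example. list_output_files is a hypothetical helper name.

```shell
# list_output_files APPSESSION_ID: show the files in every output dataset.
# Assumes `--terse` prints one dataset ID per line.
list_output_files() {
    bs appsession property get -i "$1" --property-name="Output.Datasets" --terse |
    while IFS= read -r dataset_id; do
        # Skip blank lines defensively.
        [ -n "$dataset_id" ] || continue
        # </dev/null stops the inner command from consuming the ID list.
        bs contents dataset -i "$dataset_id" </dev/null
    done
}

# Usage (requires an authenticated bs on the PATH):
#   list_output_files 46664618
```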

Setting and inspecting properties on a project

Projects by default do not have properties set:

$ bs projects properties list -i 27932921
$

We can add string properties:

$ bs projects properties set -i 27932921 --property-name="MyNamespace.TestProperty" --property-content="TestValue"
$ bs projects properties list -i 27932921
+--------------------------+-------------+--------+-------------+
|          Name            | Description |  Type  |   Content   |
+--------------------------+-------------+--------+-------------+
| MyNamespace.TestProperty |             | string | "TestValue" |
+--------------------------+-------------+--------+-------------+

Note that a namespace prefix for a property name is compulsory:

$ bs projects properties set -i 27932921 --property-name="TestProperty" --property-content="TestValue"
ERROR: *** BASESPACE.PROPERTIES.NAME_INVALID: Property name: TestProperty must contain 2 or more segments split by a period. Each segment may contain letters, numbers, '-', and '_'. First segment must start with a letter or number. ***
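
The rules quoted in this error message can be checked locally before calling the API. The following is a minimal sketch that encodes them as a regular expression, assuming "letters" means ASCII letters. valid_property_name is a hypothetical helper, and the server remains the authority on what it accepts.

```shell
# valid_property_name NAME: succeed if NAME follows the rules quoted in
# the error message above: two or more period-separated segments made of
# letters, numbers, '-' and '_', the first starting with a letter or number.
valid_property_name() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9][A-Za-z0-9_-]*(\.[A-Za-z0-9_-]+)+$'
}

for name in "MyNamespace.TestProperty" "TestProperty"; do
    if valid_property_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

The first name is accepted and the second is rejected because it has only one segment.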

Setting lane QC thresholds

Authenticate with the appropriate scopes into a config called "laneqc"

We will refer to this in future commands.

$ bs auth --scope="READ GLOBAL","CREATE GLOBAL","CONFIGURE QC" -c laneqc
Please go to this URL to authenticate:  https://basespace.illumina.com/oauth/device?code=HrACj
Created config file  /Users/basespaceuser/.basespace/laneqc.cfg
Welcome, BSSH.V2 TestUser

Look at lane QC thresholds to confirm they are blank:

$ bs -c laneqc lane threshold export
Name,Group,Operator,ThresholdValues

Create a CSV file that defines the QC thresholds, then set the lane QC thresholds using that file:

$ cat > /tmp/thresholds.txt                                       
Name,Group,Operator,ThresholdValues                                   
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50                    
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40                    
$ bs -c laneqc lane threshold import -f /tmp/thresholds.txt          
# should finish without errors!                                      
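
The cat command above reads the rows from the terminal until end of input (Ctrl-D). In a script, the same file can be written with a here-document. The following sketch also runs a local sanity check on the field count; the check is an illustration and not a CLI feature, and the temporary file path is chosen by mktemp rather than taken from this document.

```shell
# Write the thresholds file non-interactively with a here-document.
thresholds=$(mktemp)
cat > "$thresholds" <<'EOF'
Name,Group,Operator,ThresholdValues
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40
EOF

# Local sanity check (not a CLI feature): every row must have four fields.
if awk -F, 'NF != 4 { bad = 1 } END { exit bad }' "$thresholds"; then
    echo "thresholds file looks well formed: $thresholds"
fi

# Then import it as shown above:
#   bs -c laneqc lane threshold import -f "$thresholds"
```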

View lane QC thresholds:

$ bs -c laneqc lane threshold export                                 
Name,Group,Operator,ThresholdValues                                   
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50                    
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40                    

Clear the lane QC thresholds, then export them again to confirm that they have been removed:

$ bs -c laneqc lane threshold clear                                  
$ bs -c laneqc lane threshold export
Name,Group,Operator,ThresholdValues                                   

Creating a biosample and uploading FASTQ data against it

Biosample Creation

# warning! if your project does not already exist it will be implicitly created                                                    
$ bs create biosample -n "MyBioSample" -p "MyProject"                

Note that there are quite a few optional metadata parameters for biosamples. These are primarily designed to help high-throughput labs classify and display biosamples:

$ bs create biosample --help
(snip)
    BioSample Options:
      -n, --name=                                  Name of the BioSample
      -p, --project=                               Name of the project where FastQs will be stored. Created if not found.
          --container-name=                        Name of container
          --container-position=                    Position within the container
          --analysis-workflow=                     Name of the analysis to schedule
          --prep-request=                          Name of the lab workflow that LIMS should perform
          --required-yield=                        Required yield in Gbp that is needed before launching analysis (required if --prep-request is provided)
          --metadata=                              Key/Value metadata properties to set on the BioSample
          --delivery-mode=[Deliver|Do Not Deliver] Intial delivery mode
(snip)

You can preview biosample creation with --preview to validate that the data provided is correct.

$ bs create biosample -n "MyBioSample" -p "MyProject" --preview      
ERROR:  * * * Error in BioSample Name: BioSample 'MyBioSample' already exists and cannot be imported  * * *                                  

To attach a new analysis workflow to an existing biosample, use the --allow-existing option.

FASTQ upload

There are two options to associate uploaded FASTQ files with a biosample:

  • Upload files that match the biosample name
  • Force the upload to attach FASTQs to the specified biosample, regardless of their name

Upload FASTQ files that match the biosample name

$ ls                                                                 
MyBioSample_S1_L001_R1_001.fastq.gz                               
MyBioSample_S1_L001_R2_001.fastq.gz                               
# note that you need a project ID here, not a project name as you did when you created the biosample!                                       
$ bs upload dataset -p 27943921 MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz
Sample(MyBioSample)  [1/1 ] Upload started                            
Sample(MyBioSample) Metadata complete. Waiting for upload completion...
Sample(MyBioSample) Upload complete                                   
# note that datasets are created asynchronously and there can be a delay
$ bs list dataset --input-biosample="MyBioSample"  
+-------------+-------------------------------------+--------------+---------------------+
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
+-------------+-------------------------------------+--------------+---------------------+
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |
+-------------+-------------------------------------+--------------+---------------------+

Force the upload to attach FASTQs to the specified biosample, regardless of their name

$ ls
valid_S1_L001_R1_001.fastq.gz   valid_S1_L001_R2_001.fastq.gz
# note that you need a project ID here, not a project name as you did when you created the biosample!
$ bs upload dataset --biosample-name="MyBioSample" -p 27943921 valid_S1_L001_R1_001.fastq.gz valid_S1_L001_R2_001.fastq.gz
Sample(MyBioSample) [1/1] Upload started
Sample(MyBioSample) Metadata complete. Waiting for upload completion...
Sample(MyBioSample) Upload complete
$ bs list dataset --input-biosample="MyBioSample"
+-------------+-------------------------------------+--------------+---------------------+
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
+-------------+-------------------------------------+--------------+---------------------+
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |
| MyBioSample | ds.64706f7d2e504e1c9495c00c468d6640 | MyProject    | illumina.fastq.v1.8 |
+-------------+-------------------------------------+--------------+---------------------+

Note that even though the dataset has a name to match the biosample, the files within retain their original names:

$ bs dataset contents -i ds.94f7e9663e86473c8582dcf85a830195
+------------+-------------------------------------+
|     Id     |              FilePath               |
+------------+-------------------------------------+
| 8652306587 | MyBioSample_S1_L001_R1_001.fastq.gz |
| 8652306586 | MyBioSample_S1_L001_R2_001.fastq.gz |
+------------+-------------------------------------+
$ bs dataset contents -i ds.64706f7d2e504e1c9495c00c468d6640
+------------+-------------------------------+
|     Id     |           FilePath            |
+------------+-------------------------------+
| 8652572270 | valid_S1_L001_R2_001.fastq.gz |
| 8652572269 | valid_S1_L001_R1_001.fastq.gz |
+------------+-------------------------------+

Uploading directories full of fastq files

The bs upload dataset command supports a --recursive option that scans a directory and its subdirectories for FASTQ files:

$ ls
valid2_S1_L001_R1_001.fastq.gz  valid2_S1_L001_R2_001.fastq.gz  valid_S1_L002_R1_001.fastq.gz   valid_S1_L002_R2_001.fastq.gz
$ bs upload dataset -p 21646627 --recursive .
Sample(valid2) [1/2] Upload started
Sample(valid2) Metadata complete. Waiting for upload completion...
Sample(valid2) Upload complete
Sample(valid) [2/2] Upload started
Sample(valid) Metadata complete. Waiting for upload completion...
Sample(valid) Upload complete 

By default, these will be automatically grouped by name and uploaded as many individual fastq datasets, but you can also force these to all be uploaded to the same biosample:

$ bs upload dataset -p 21646627 --recursive --biosample-name=valid .
Sample(valid) [1/1] Upload started
Sample(valid) Metadata complete. Waiting for upload completion...
Sample(valid) Upload complete
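
The default grouping by name can be previewed locally before uploading. The following is a rough sketch that strips the _S<n>_L<lane>_R<read>_001.fastq.gz suffix seen in the listings above. It assumes that this suffix pattern is what separates the sample name from the rest of the file name; the CLI's own matching rules are not documented here and may differ. fastq_sample_name is a hypothetical helper.

```shell
# fastq_sample_name FILE: strip the _S<n>_L<lane>_R<read>_001.fastq.gz
# suffix to recover the sample name. Assumes the file naming pattern used
# in the listings above; the CLI's own matching rules may differ.
fastq_sample_name() {
    basename "$1" | sed -E 's/_S[0-9]+_L[0-9]+_R[0-9]_[0-9]+\.fastq\.gz$//'
}

# Print each distinct sample name for the files in the example directory.
for f in valid2_S1_L001_R1_001.fastq.gz valid2_S1_L001_R2_001.fastq.gz \
         valid_S1_L002_R1_001.fastq.gz valid_S1_L002_R2_001.fastq.gz; do
    fastq_sample_name "$f"
done | sort -u
```

For the four files in the example this yields two names, valid and valid2, which agrees with the two samples reported by the recursive upload.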

Configure Automated Workflow

The automated workflow feature of BSSH V2 allows apps to be launched automatically when they meet a set of conditions or dependencies. The feature also allows automated quality control to be applied so that the AppSession is automatically marked as "QCPassed" or "QCFailed" based on the metrics it generated.

Before an app can be launched automatically in this way, a "workflow" needs to be created which wraps the app and describes its dependencies and (optionally) any QC thresholds. The workflow takes as input a template appsession: an app launch that has been configured and launched manually through the GUI with the desired settings. Automated launches for this workflow will be based on these settings.

It is also possible to chain automated app launches by creating another workflow for the downstream app and creating a dependency on the upstream step.

Get a token

Get a token with the MANAGE APPLICATIONS scope to create and work with workflows:

$ bs auth -c emea --force --api-server https://api.emea.illumina.com/ --scopes "READ GLOBAL","CREATE GLOBAL","BROWSE GLOBAL","MANAGE APPLICATIONS"

Create a Workflow

$ bs -c cloud-manageapps workflow create -n TestWorkflow -d CLICreated --application-id=2039037 --appsession-id=41060211
3978975

In this example, the application and appsession are based on the WGS5.0.0 app.

The value returned is the ID of the workflow. To confirm it, list the applications in the workflow category:

$ bs -c cloud-manageapps list applications --category=workflow
+--------------+---------+---------------+
|     Name     |   Id    | VersionNumber |
+--------------+---------+---------------+
| TestWorkflow | 3978975 | 1.0.0         |
+--------------+---------+---------------+
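
Because the create command prints the new workflow ID, a script can capture it for the later steps. The sketch below assumes the ID is the only thing written to standard output, as in the example above; create_workflow and WORKFLOW_ID are hypothetical names, and the flags are the ones already shown:

```shell
# Sketch only: wraps the "workflow create" command shown above so that the
# ID it prints can be captured. create_workflow is a hypothetical helper.
create_workflow() {
  # $1 = workflow name, $2 = application ID, $3 = template appsession ID
  bs -c cloud-manageapps workflow create -n "$1" -d CLICreated \
    --application-id="$2" --appsession-id="$3"
}

# Example use (needs a config holding the MANAGE APPLICATIONS scope):
#   WORKFLOW_ID=$(create_workflow TestWorkflow 2039037 41060211)
#   bs -c cloud-manageapps workflow dependency export -i "$WORKFLOW_ID"
```

Keeping the ID in a variable avoids copying it by hand into each of the dependency and threshold commands.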

Set Dependencies

Create a biosample yield dependency:

$ bs -c cloud-manageapps workflow dependency add biosample-yield --chooser-id=sample-id --can-use-primary-biosample -i 3978975

Create an app completion dependency.

First, look up the details of the app you want to use:

$ bs get application -i 19019
+----------------------------+----------------------------------------------------------------------------------------+
| AppFamilySlug              | illumina-lab-services.genotyping-vcf-uploader                                          |
| AppVersionSlug             | illumina-lab-services.genotyping-vcf-uploader.1.0.0                                    |
| Id                         | 19019                                                                                  |
(snip)

Create the dependency:

$ bs -c cloud-manageapps workflow dependency add app-completion --application-id illumina-lab-services.genotyping-vcf-uploader.1.0.0 -i 3978975 --chooser-id=array-vcf-file --qc-pass --file-selector='.*\\.vcf\\.gz$|.*\\.vcf$'

Review the dependencies:

$ bs -c cloud-manageapps workflow dependency export -i 3978975
[
  {
    "Type": "BioSampleYield",
    "Attributes": {
      "BioSampleChooserId": "sample-id",
      "CanUsePrimaryBioSample": false,
      "Label": "",
      "LibraryPrepId": "",
      "MixLibraryTypesAllowed": false
    },
    "Dependencies": null
  },
  {
    "Type": "AppCompletion",
    "Attributes": {
      "ApplicationId": "illumina-lab-services.genotyping-vcf-uploader.1.0.0",
      "CanUsePrimaryResource": false,
      "ColumnId": "",
      "Label": "",
      "RequireQcPass": true,
      "ResourceChooserId": "array-vcf-file"
    },
    "Dependencies": null
  }
]

Set QC Thresholds

In this example, the thresholds are configured in the following csv file:

/tmp/qcthresholds.csv

Name                                                                    DatasetTypeId      Operator            ThresholdValues
illumina_isaac_v5.autosome_callability                                  illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.autosome_coverage_at_10x                              illumina.isaac.v5  GreaterThanOrEqual  98
illumina_isaac_v5.autosome_coverage_at_1x                               illumina.isaac.v5  GreaterThanOrEqual  99.49
illumina_isaac_v5.autosome_exon_callability                             illumina.isaac.v5  GreaterThanOrEqual  97
illumina_isaac_v5.autosome_exon_coverage_at_10x                         illumina.isaac.v5  GreaterThanOrEqual  98.5
illumina_isaac_v5.autosome_exon_coverage_at_1x                          illumina.isaac.v5  GreaterThanOrEqual  99.29
illumina_isaac_v5.autosome_mean_coverage                                illumina.isaac.v5  GreaterThanOrEqual  30
illumina_isaac_v5.cnvs                                                  illumina.isaac.v5  LessThanOrEqual     300
illumina_isaac_v5.contamination                                         illumina.isaac.v5  LessThanOrEqual     5
illumina_isaac_v5.fragment_length_median                                illumina.isaac.v5  GreaterThanOrEqual  420
illumina_isaac_v5.frameshift_deletions                                  illumina.isaac.v5  GreaterThanOrEqual  0
illumina_isaac_v5.indel_het_to_hom_ratio                                illumina.isaac.v5  LessThanOrEqual     2.5
illumina_isaac_v5.mapq_more_than_10_autosome_coverage_at_15x            illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.mapq_more_than_10_autosome_exon_coverage_at_15x       illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.mismatch_rate_read_1                                  illumina.isaac.v5  LessThanOrEqual     1
illumina_isaac_v5.mismatch_rate_read_2                                  illumina.isaac.v5  LessThanOrEqual     2.3
illumina_isaac_v5.percent_aligned_reads                                 illumina.isaac.v5  GreaterThanOrEqual  92
illumina_isaac_v5.percent_at_dropout                                    illumina.isaac.v5  LessThanOrEqual     2.43
illumina_isaac_v5.percent_gc_dropout                                    illumina.isaac.v5  LessThanOrEqual     2.5
illumina_isaac_v5.percent_q30_bases                                     illumina.isaac.v5  GreaterThanOrEqual  80
illumina_isaac_v5.percent_q30_bases_read_2                              illumina.isaac.v5  GreaterThanOrEqual  77
illumina_isaac_v5.percent_read_pairs_aligned_to_different_chromosomes   illumina.isaac.v5  LessThanOrEqual     1.5
illumina_isaac_v5.q30_bases_excluding_clipped_and_duplicate_read_bases  illumina.isaac.v5  GreaterThanOrEqual  9e+10
illumina_isaac_v5.read_enrichment_at_80percent_gc                       illumina.isaac.v5  GreaterThanOrEqual  0.8
illumina_isaac_v5.snv_het_to_hom_ratio                                  illumina.isaac.v5  LessThanOrEqual     2.05
illumina_isaac_v5.total_pf_bases                                        illumina.isaac.v5  GreaterThanOrEqual  1e+11
illumina_isaac_v5.array_concordance                                     illumina.isaac.v5  GreaterThanOrEqual  99.3
illumina_isaac_v5.array_concordance_usage                               illumina.isaac.v5  GreaterThanOrEqual  95

Import the thresholds into the workflow:

$ bs -c cloud-manageapps workflow threshold import -f /tmp/qcthresholds.csv -i 3978975
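
To make the Operator and ThresholdValues columns concrete, here is a minimal sketch of how a single metric value could be checked against one threshold. It only illustrates the two operators that appear in the table; how BSSH evaluates and combines thresholds internally is not described in this document, and qc_check is a hypothetical helper:

```shell
# Hypothetical helper: succeed (exit 0) if a metric value satisfies one
# threshold. Only the two operators used in the table are handled.
qc_check() {
  # $1 = metric value, $2 = operator, $3 = threshold value
  awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
    v += 0; t += 0                     # force numeric comparison (also 9e+10)
    if (op == "GreaterThanOrEqual") exit !(v >= t)
    if (op == "LessThanOrEqual")    exit !(v <= t)
    exit 2                             # operator not covered by this sketch
  }'
}

# Illustrative values only, not real metrics:
qc_check 31.2 GreaterThanOrEqual 30 && echo "autosome_mean_coverage: pass"
qc_check 6 LessThanOrEqual 5 || echo "contamination: fail"
```

The comparison is delegated to awk because plain shell arithmetic does not handle decimal or exponent values such as 99.49 or 1e+11.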

Accession Biosample

To create a new instance of your workflow, create a biosample with that workflow. Note that you need to use the project name, not the ID, here; this matches the manifest import mechanism.

$ bs -c cloud-manageapps biosample create -p "MyProject" -n "MyBiosample" --analysis-workflow "TestWorkflow"

Inspect the Biosamples you've just created

$ bs list biosamples --newer-than=1d

Find the appsession that's been created as your workflow

Selecting additional fields with -F shows extra information about the status. Note that you need to use the biosample ID, not the name, here.

$ bs -c cloud-manageapps appsession list --input-biosamples=$BIOSAMPLEID -F Id -F Status -F StatusSummary

Upload Fastq files against that biosample

Uploading data against your new biosample can be carried out as listed in a previous example. This will register yield against this biosample that can trigger an app launch if the workflow has a yield dependency.

Launching Apps

This section provides some examples of manual app launch at the command line. This is distinct from configuring apps for automated launch, as described in an earlier example.

Both workflow-driven launch and the V1 CLI only allow launches based on an initial launch from the GUI being used as a template. The V2 CLI removes the need for this template appsession by parsing the form for each app directly and giving the user access, via the command line, to view and configure options for the relevant app.

View the available options for an app

You can choose an app by either name or ID.

$ bs launch application -n "Whole Genome Sequencing" --list
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+
|         Option         |      Type      |                                   Choices                                   |                                  Default                                   | Multiselect | Required |
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+
| app-session-name       | TextBox        | <String>                                                                    | Example [LocalDateTime]                                                    | false       | true     |
| project-id             | ProjectChooser | <Project>                                                                   |                                                                            | false       | true     |
| sample-id              | SampleChooser  | <Sample>                                                                    |                                                                            | true        | false    |
| bam-file-id            | FileChooser    | <File>                                                                      |                                                                            | true        | false    |
| reference-genome       | Select         | /data/scratch/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta,        | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta | false       | true     |
|                        |                | /data/scratch/GRCh37/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta, |                                                                            |             |          |
|                        |                | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta  |                                                                            |             |          |
| enable-variant-calling | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-sv-calling      | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-cnv-calling     | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| annotation-source      | RadioButton    | ensembl, refseq, both, none                                                 | ensembl                                                                    | false       | false    |
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+

Build a launch command

Build a launch command which includes the required options. Launch by ID rather than app name to be sure of the exact app version being used:

$ bs launch application -i 4878874 -o project-id:projects/1232 -o bam-file-id:files/3534333 \
-l "My test appsession"

Refer to a specific app version

You can also launch by name and version, allowing instance-independent app launches. You have to specify the full three-part version number:

$ bs launch application -n 'Whole Genome Sequencing' --app-version 7.0.1 \
-o project-id:projects/1232 -o bam-file-id:files/3534333 \
-l "My test appsession"

Launching an app on biosamples

The V2 launch endpoint only accepts V2 entities for all inputs. Biosamples must be given with a libraryprepid. You can also comma-separate multiple arguments:

$ bs launch application -i 4878874 -o project-id:projects/1232 \
-o sample-id:biosamples/3242424/librarypreps/343432,biosamples/232321/librarypreps/343432

ResourceMatchers

Apps such as Tumor Normal use a ResourceMatcher to submit matched pairs of WGS datasets. Through the CLI the format for these fields is:

-o input-id:'col1_dataset1,col1_dataset2;col2_dataset1,col2_dataset2'

Here commas separate multiple inputs and a semi-colon delimits a column, so the above would look like this in the web UI:

Normal            Tumor
col1_dataset1     col2_dataset1
col1_dataset2     col2_dataset2
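
The mapping from the option value to the table can be checked with a short sketch. It only splits the string as described above (semi-colon between columns, commas within a column) for the two-column case; matcher_rows is a hypothetical helper and does not call the CLI:

```shell
# Hypothetical helper: print the rows implied by a two-column
# ResourceMatcher value, one tab-separated row per line.
matcher_rows() {
  printf '%s\n' "$1" | awk -F';' '{
    n = split($1, left, ",")     # entries of the first column
    split($2, right, ",")        # entries of the second column
    for (i = 1; i <= n; i++) print left[i] "\t" right[i]
  }'
}

matcher_rows 'col1_dataset1,col1_dataset2;col2_dataset1,col2_dataset2'
```

Each printed row pairs the nth entry of the first column with the nth entry of the second, which is how the matched pairs line up in the table above.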

Some apps are not yet supported for CLI launch, such as old versions of VCAT:

$ bs launch application -i 1800799 --list
WARNING: Input field 'sample-pairs' is a TabularFieldSet and not yet supported, you may not be able to launch this app !

How do I pass a biosample FASTQ dataset?

To launch with a biosample you need two components: the biosample ID (available from the web UI or from bs list biosamples) and the library prep ID, which is trickier to find. One way of getting the latter is through the API directly, e.g. by opening: api.basespace.illumina.com/v2/biosamples//libraries and locating the libraryprep ID.

Alternatively, use basemount:

  • make sure V2 endpoints are enabled (bm-cmd useV2API)
  • find your biosample under ${BASEMOUNT}/biosamples/ or use .ResourceById/
  • read the id from: ... /Libraries//LibraryPrep/.id

In many cases this libraryprepid will be 1014015 ('Unknown' library prep kit).

Once you have both parts, stick them together in the form:

biosamples/<biosampleid>/librarypreps/<libraryprepid>
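
When several biosamples are passed, assembling the value by hand is error-prone. The following sketch builds the comma-separated value from biosampleid:libraryprepid pairs; biosample_arg and the colon-separated input format are hypothetical conveniences, not part of the CLI:

```shell
# Hypothetical helper: turn "biosampleid:libraryprepid" pairs into the
# comma-separated value expected by a biosample launch option.
biosample_arg() {
  out=""
  for pair in "$@"; do
    entry="biosamples/${pair%%:*}/librarypreps/${pair#*:}"
    out="${out:+$out,}$entry"    # add a comma only between entries
  done
  printf '%s\n' "$out"
}

biosample_arg 3242424:343432 232321:343432
```

The result can then be passed to the launch command, for example as -o sample-id:"$(biosample_arg 3242424:343432 232321:343432)".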

How can I find a BSSH file ID?

You can list the contents of a dataset to get the file IDs:

$ bs contents dataset -i ds.15d19fafafb64bdd858dec50f90d2300 | head
+------------+----------------------------------------------------------------------+
|     Id     |                               FilePath                               |
+------------+----------------------------------------------------------------------+
| 8618632807 | Plots/s_0_1_2212_MismatchByCycle.png                                 |
| 8618632806 | Plots/s_0_1_2113_MismatchByCycle.png                                 |
| 8618632805 | Plots/sorted_S1_G1_chr2_MismatchByCycle.png                          |
| 8618632804 | Plots/s_0_1_1115_MismatchByCycle.png                                 |
| 8618632803 | Plots/s_1_1_2108_MismatchByCycle.png                                 |
| 8618632802 | Plots/s_0_2_1107_MismatchByCycle.png                                 |
| 8618632801 | Plots/s_0_1_1112_NoCallByCycle.png                                   |
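
To pick out the ID for one particular file, the table can be filtered in a script. The sketch below parses the two-column table layout shown above with awk; file_id_for_path is a hypothetical helper, and it assumes the output keeps this Id / FilePath layout:

```shell
# Hypothetical helper: read the table printed by "bs contents dataset" on
# stdin and print the Id of the row whose FilePath matches $1 exactly.
file_id_for_path() {
  awk -F'|' -v want="$1" '
    NF >= 3 {                          # skips the +----+ border lines
      id = $2; path = $3
      gsub(/^ +| +$/, "", id)          # trim the padding around each cell
      gsub(/^ +| +$/, "", path)
      if (path == want) print id
    }'
}

# Demonstration on two rows copied from the output above:
file_id_for_path Plots/s_0_1_1112_NoCallByCycle.png <<'EOF'
| 8618632802 | Plots/s_0_2_1107_MismatchByCycle.png                                 |
| 8618632801 | Plots/s_0_1_1112_NoCallByCycle.png                                   |
EOF
```

With the real CLI, the same function would be used at the end of a pipe, for example bs contents dataset -i ds.15d19fafafb64bdd858dec50f90d2300 | file_id_for_path Plots/s_0_1_1112_NoCallByCycle.png.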

You can also use BaseMount to find the file by navigating to the project and appresult which generated it:

${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files/file.vcf

Then use the Files.metadata directory to get the ID:

$ cat ${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files.metadata/file.vcf/.id