The following examples demonstrate the commands in the BaseSpace CLI tool. For more information about the CLI and a list of commands, see CLI Overview.

Note that very long output has been truncated in these examples to keep this document manageable. We encourage you to run the commands on your own data to see the full output on your own system.

Authentication, listing projects and datasets

Install:

See main instructions.

Authenticate using default settings:

$ bs auth                                                           
Please go to this URL to authenticate:                                
https://basespace.illumina.com/oauth/device?code=jfHSG                
Created config file /home/psaffrey/.basespace/default.cfg             
Welcome, BSSH.V2 TestUser                                             

The default settings include:

  • The BSSH instance to use (default: US Virginia)
  • The scopes to request; these dictate the actions that can be performed
  • A timeout - how long to wait for authentication before the command fails

Inspect the token to see which user it belongs to and which scopes it grants:

$ bs whoami
+----------------+-------------------------------------------------+
| Name           | BaseSpace User                                  |
| Id             | 1234567                                         |
| Email          | basespaceuser@illumina.com                      |
| DateCreated    | 2015-01-16 15:31:22 +0000 UTC                   |
| DateLastActive | 2017-06-01 12:59:24 +0000 UTC                   |
| Host           | https://api.basespace.illumina.com              |
| Scopes         | ["READ GLOBAL" "CREATE GLOBAL" "BROWSE GLOBAL"] |
+----------------+-------------------------------------------------+
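
Later sections need additional scopes (CONFIGURE QC for lane QC thresholds, MANAGE APPLICATIONS for workflows), so it can be useful to check a token before running those commands. The following is a minimal sketch that reads the table printed by bs whoami from standard input. The has_scope helper is hypothetical, and it assumes the Scopes row keeps the format shown above.

```shell
# has_scope SCOPE: succeed if the whoami table on stdin lists SCOPE.
# Hypothetical helper; assumes the "| Scopes | [...] |" row format shown above.
has_scope() {
  grep '^| Scopes' | grep -q -F "\"$1\""
}

# Example with sample output from this page (no BaseSpace access needed):
whoami_sample='| Host           | https://api.basespace.illumina.com              |
| Scopes         | ["READ GLOBAL" "CREATE GLOBAL" "BROWSE GLOBAL"] |'

if printf '%s\n' "$whoami_sample" | has_scope "CONFIGURE QC"; then
  echo "token can configure lane QC"
else
  echo "token lacks CONFIGURE QC; re-run bs auth with that scope"
fi
```

Against a live account, the same check would be run as: bs whoami | has_scope "CONFIGURE QC"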

List projects:

$ bs list projects   
+--------------------------------------------------+----------+---------------+
|                       Name                       |    Id    |   TotalSize   |
+--------------------------------------------------+----------+---------------+
| NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | 36080093 | 2233311909088 |
+--------------------------------------------------+----------+---------------+

List datasets:

$ bs list datasets
+--------------------------+-------------------------------------+--------------------------------------------------+---------------------+
|           Name           |                 Id                  |                   Project.Name                   |   DataSetType.Id    |
+--------------------------+-------------------------------------+--------------------------------------------------+---------------------+
| NA12878-I13_L002         | ds.184ba3d796f343f4886b4aa7fb43c496 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I13_L001         | ds.c805113ed9884caa8912dafdf8edd63d | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I54_L001         | ds.dc5657d91983479eb0dd6abb53b9d60f | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I85_L001         | ds.0a7781b4d7684113a4c64c1f2ca3c175 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
...

List all the available headers for datasets:

$ bs dataset headers                                                 
[                                                                    
    "Id",                                                                 
    "Name",                                                               
    "AppSession.Id",                                                      
    "AppSession.Name",                                                    
    "AppSession.Application.AppFamilySlug",                               
    "AppSession.Application.AppVersionSlug",                              
    "AppSession.Application.Id",                                          
    "AppSession.Application.VersionNumber",                               
...

Relist all datasets using custom columns selected from the headers list:

$ bs list datasets -F Name -F QcStatus -F TotalSize -F AppSession.Application.Name                                           
+--------------------------+-----------+-------------+---------------------------------+
|           Name           | QcStatus  |  TotalSize  |   AppSession.Application.Name   |
+--------------------------+-----------+-------------+---------------------------------+
| NA12878-I13_L002         | Undefined | 4047606231  | FASTQ Generation                |
| NA12878-I13_L001         | Undefined | 4118853581  | FASTQ Generation                |
| NA12878-I54_L001         | Undefined | 3496655065  | FASTQ Generation                |
| NA12878-I85_L001         | Undefined | 2462111619  | FASTQ Generation                |
| NA12878-I87_L001         | Undefined | 2271497152  | FASTQ Generation                |
...

Filtering list results

BaseSpaceCLI provides several options for filtering the entities that are output by a list command.

Filtering by name

The option to filter results on an entity field is --filter-term. By default, this filters on the Name field.

$ bs list projects --filter-term=examples
+---------------+---------+-----------+
|     Name      |   Id    | TotalSize |
+---------------+---------+-----------+
| data_examples | 5472467 | 26510     |
+---------------+---------+-----------+

The filter term is specified as a regular expression:

$ bs list appsessions --filter-term=" .* "
+------------------------------------------+----------+-----------------+
|                   Name                   |    Id    | ExecutionStatus |
+------------------------------------------+----------+-----------------+
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |
+------------------------------------------+----------+-----------------+

Filtering by another field

You can also specify the field on which to filter by using the --filter-field option:

$ bs list datasets --filter-field=Project.Name --filter-term=data
+-----------+-------------------------------------+---------------+---------------------+
|   Name    |                 Id                  | Project.Name  |   DataSetType.Id    |
+-----------+-------------------------------------+---------------+---------------------+
| valid     | ds.5c4200d4f52e4a9dae86fd8b166e296d | data_examples | illumina.fastq.v1.8 |
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files        |
+-----------+-------------------------------------+---------------+---------------------+

This is necessary for entities that do not have a "Name" field, like biosamples:

$ bs list biosamples --filter-term=demo
ERROR: *** Name "Name" not found in object ***
$ bs list biosamples --filter-term=demo --filter-field=BioSampleName
+-------------------------------+---------+---------------+-------------------+--------+
|         BioSampleName         |   Id    | ContainerName | ContainerPosition | Status |
+-------------------------------+---------+---------------+-------------------+--------+
| HiSeq_2500_NA12878_demo_2x150 | 2280211 |               |                   | New    |
+-------------------------------+---------+---------------+-------------------+--------+
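
In a script that filters several entity types by name, the field name can be chosen per entity before the list command is built. The following is a rough sketch: the name_field_for helper is hypothetical, only the biosamples case is taken from the example above, and every other entity is assumed to expose a Name field.

```shell
# name_field_for ENTITY: print the field to pass to --filter-field when
# filtering ENTITY by name. Hypothetical helper; only the biosamples case is
# taken from the example above, every other entity is assumed to expose "Name".
name_field_for() {
  case "$1" in
    biosample|biosamples) echo "BioSampleName" ;;
    *)                    echo "Name" ;;
  esac
}

# Build (but do not run) a list command that filters by name.
entity=biosamples
echo "bs list $entity --filter-field=$(name_field_for "$entity") --filter-term=demo"
```

The sketch only prints the command, so it can be tried without an authenticated session.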

Filtering by date

You can also specify the age of entities to be displayed by using --older-than and --newer-than.

$ bs list datasets -F Name -F DateModified --newer-than=400d
+------------------------------------------+-------------------------------+
|                   Name                   |         DateModified          |
+------------------------------------------+-------------------------------+
| HiSeq 2500 NA12878 demo 2x150            | 2017-06-30 22:17:17 +0000 UTC |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | 2017-07-04 03:09:15 +0000 UTC |
+------------------------------------------+-------------------------------+

By default, date filtering applies to the DateModified field. You can alter this to another date field by using the --date-field option.

$ bs list datasets --date-field=DateCreated --older-than=1y -F Name -F DateCreated
+-----------+-------------------------------+
|   Name    |          DateCreated          |
+-----------+-------------------------------+
| valid     | 2017-04-04 11:07:56 +0000 UTC |
| test_data | 2017-04-04 11:20:09 +0000 UTC |
+-----------+-------------------------------+

Server side filtering

The --filter-term and --filter-field options use client-side filtering: the API returns all entities, and the CLI filters them before they are displayed. This means that even if you only end up listing a handful of results, the command can take a long time on a large account.

Some entities have specific filtering options that make use of server side filtering, where the API does the filtering and only returns the matching entities. These are available on an entity-specific basis:

$ bs list appsessions --exec-status=Complete
+------------------------------------------+----------+-----------------+
|                   Name                   |    Id    | ExecutionStatus |
+------------------------------------------+----------+-----------------+
| test_data                                | 10743734 | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |
+------------------------------------------+----------+-----------------+

$ bs list datasets --is-type=common.files
+------------------------------------------+-------------------------------------+-------------------------------+----------------+
|                   Name                   |                 Id                  |         Project.Name          | DataSetType.Id |
+------------------------------------------+-------------------------------------+-------------------------------+----------------+
| test_data                                | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples                 | common.files   |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | ds.2f03151b6c9b4a909d05b1af729a6fc2 | HiSeq 2500 NA12878 2x150 Demo | common.files   |
+------------------------------------------+-------------------------------------+-------------------------------+----------------+

You can discover the server-side filtering options for each entity by using --help:

$ bs list datasets --help
[dataset command options]
          --like-type=                       Filter DataSets that are LIKE this type
          --is-type=                         Filter DataSets that are this type
          --not-type=                        Filter DataSets that are NOT this type
          --input-biosample=                 Filter by Input BioSample
          --project-name=                    Name of parent project
          --project-id=                      ID of parent project

You can also combine server-side and client-side filtering:

$ bs list datasets --is-type=common.files --filter-term=data
+-----------+-------------------------------------+---------------+----------------+
|   Name    |                 Id                  | Project.Name  | DataSetType.Id |
+-----------+-------------------------------------+---------------+----------------+
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files   |
+-----------+-------------------------------------+---------------+----------------+

BaseSpaceCLI filtering vs. POSIX

An alternative to using the BaseSpaceCLI filter options is to use standard POSIX tools such as grep and cut. The advantage of the BaseSpaceCLI filters is that you can still use other options, such as column selection and output formatting, to get the output you want, which can be more convenient:

# using POSIX tools:
$ bs list datasets -f csv | grep common.files | cut -d, -f2
ds.46c118551d51497789ddaf84bbc9bff0
ds.2f03151b6c9b4a909d05b1af729a6fc2
# the equivalent, with BSCLI filters:
$ bs list datasets -f csv --terse --is-type=common.files
ds.46c118551d51497789ddaf84bbc9bff0
ds.2f03151b6c9b4a909d05b1af729a6fc2
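
When you do post-process CSV output with POSIX tools, selecting a column by its header name is less fragile than a fixed position such as cut -f2. The following is a minimal sketch with a hypothetical csv_column helper. It assumes the CSV output starts with a header row whose names match the table headers, and that no field contains a comma.

```shell
# csv_column NAME: print the column called NAME from CSV on stdin.
# Sketch only: assumes a header row and no commas inside quoted fields.
csv_column() {
  awk -F, -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
  '
}

# Sample rows in the shape implied by the cut example above (assumed header).
csv_sample='Name,Id,Project.Name,DataSetType.Id
valid,ds.5c4200d4f52e4a9dae86fd8b166e296d,data_examples,illumina.fastq.v1.8
test_data,ds.46c118551d51497789ddaf84bbc9bff0,data_examples,common.files'

# Keep the header row when filtering, so csv_column can still find the column.
printf '%s\n' "$csv_sample" | awk 'NR == 1 || /common\.files/' | csv_column Id
```

With a live account, the sample data would be replaced by the output of bs list datasets -f csv.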

Drilling down into a data set

Get details about one data set.

This example is from the VCAT app:

$ bs -c v2cli_prod get dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
+---------------------------------------------------+----------------------------------------------------------------------------------------+
| Id                                                | ds.f45e4fcccbce4fb18dd91bdad7dcb272                                                    |
| Name                                              | NA12878-R1S1vcf-38337470                                                               |
| AppSession.Id                                     | 42463886                                                                               |
| AppSession.Name                                   | NA12878-R1_S1.vcf.gz_2                                                                 |
| AppSession.Application.AppFamilySlug              | basespace-labs.variant-calling-assessment-tool                                         |
...

Get dataset attributes

This example is from a FASTQ app:

$ bs list attributes dataset -i ds.2f5b56dddc0440858943246ba4ac9d11
+---------------------+---------------+
|        Name         |     Value     |
+---------------------+---------------+
| TotalReadsPF        | 4.1119628e+07 |
| MaxLengthIndexRead1 | 8             |
| MaxLengthRead1      | 151           |
| MaxLengthRead2      | 151           |
| IsPairedEnd         | true          |
| TotalClustersPF     | 2.0559814e+07 |
| TotalClustersRaw    | 2.6711606e+07 |
| TotalReadsRaw       | 5.3423212e+07 |
| MaxLengthIndexRead2 | 8             |
+---------------------+---------------+

Get file contents for a dataset

This is a VCAT example:

$ bs contents dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
+------------+-----------------------------------------------------------------------------------------+
|     Id     |                                        FilePath                                         |
+------------+-----------------------------------------------------------------------------------------+
| 7240583239 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz.tbi   |
| 7240583238 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz       |
| 7240583237 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.summary.csv  |
| 7240583236 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json |
| 7240583235 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.extended.csv |
| 7240583234 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  |
| 7240583233 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.csv   |
| 7240583232 | report.log                                                                              |
| 7240583231 | report.json                                                                             |
+------------+-----------------------------------------------------------------------------------------+

Download a subset of files (by extension) from a dataset:

# will download into directory /tmp/vcat
$ bs download dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272 --extension=json -o /tmp/vcat
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json  27.44 KB / 27.44 KB [============] 100.00% 348.12 KB/s 0s
happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  5.02 KB / 5.02 KB [================] 100.00% 22.04 MB/s 0s
happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json
report.json  2.53 KB / 2.53 KB [=====================================================================================] 100.00% 13.41 MB/s 0s
report.json
NA12878-R1S1vcf-38337470.ds.f45e4fcccbce4fb18dd91bdad7dcb272.json  2.71 KB / 2.71 KB [===============================] 100.00% 61.58 MB/s 0s
NA12878-R1S1vcf-38337470.ds.f45e4fcccbce4fb18dd91bdad7dcb272.json

Working with properties

Many BSSH entities can be tagged with properties, key/value pairs that label those entities. Some entities, like appsessions, come tagged with properties automatically but others can still be added manually. BSCLI lets you inspect existing properties of any type and create string properties.

BSSH entities that can be labelled with properties include projects, runs, biosamples, appsessions, appresults and datasets.

Inspecting properties of an appsession

The command to see all the properties of an entity is property list:

$ bs appsession property list -i 46664618
+-------------------------------+----------------------------+-------------+------------------------------------+
|             Name              |        Description         |    Type     |              Content               |
+-------------------------------+----------------------------+-------------+------------------------------------+
| Output.Projects               |                            | project[]   | <use `bs get` to obtain more info> |
| Output.Datasets               |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.snp_vqsr                | SNP VQSR sensitivity       | string      | "99.5"                             |
| Input.sample-id.attributes    | Sample Attributes          | map[]       | <use `bs get` to obtain more info> |
| Input.reference_genome        | Reference genome           | string      | "b37_decoy"                        |
| Input.Projects                |                            | project[]   | <use `bs get` to obtain more info> |
| Input.project-id.attributes   | Save Results To Attributes | map[]       | <use `bs get` to obtain more info> |
| Input.project-id              | Save Results To            | project     | <use `bs get` to obtain more info> |
| Input.indel_vqsr              | Indel VQSR sensitivity     | string      | "99.5"                             |
| Input.Datasets                |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.BioSamples              |                            | biosample[] | <use `bs get` to obtain more info> |
| Input.app-session-name        | Analysis Name              | string      | "Sentieon [LocalDateTime]"         |
| BaseSpace.Private.IsMultiNode |                            | string      | "True"                             |
+-------------------------------+----------------------------+-------------+------------------------------------+

To see an individual property, use property get with the --property-name switch:

$ bs appsession property get -i 46664618 --property-name="Input.snp_vqsr"
"99.5"

Note that many of the properties of this appsession are themselves BSSH entities, which can be listed directly by using property get:

$ bs appsession property get -i 46664618 --property-name="Output.Projects"
+---------------+----------+----------------+
|     Name      |    Id    |   TotalSize    |
+---------------+----------+----------------+
| sgdp_sentieon | 38827790 | 32372028734815 |
+---------------+----------+----------------+

This is particularly useful for finding the inputs and outputs of an app:

$ bs appsession property get -i 46664618 --property-name="Input.Datasets"
+------------+-------------------------------------+---------------------------------+---------------------+
|    Name    |                 Id                  |          Project.Name           |   DataSetType.Id    |
+------------+-------------------------------------+---------------------------------+---------------------+
| ERR1347692 | ds.4fa74a92d9b04a69a5cb53f603d965fa | Simons Genome Diversity Project | illumina.fastq.v1.8 |
+------------+-------------------------------------+---------------------------------+---------------------+

$ bs appsession property get -i 46664618 --property-name="Output.Datasets"
+----------+-------------------------------------+---------------+----------------+
|   Name   |                 Id                  | Project.Name  | DataSetType.Id |
+----------+-------------------------------------+---------------+----------------+
| 36701821 | ds.e599c516419e4470b03f26c480fce45d | sgdp_sentieon | common.files   |
+----------+-------------------------------------+---------------+----------------+

Using shell command substitution, we can list the files that were output in a single command:

$ bs contents dataset -i $(bs appsession property get -i 46664618 --property-name="Output.Datasets" --terse)
+------------+--------------------------------------------+
|     Id     |                  FilePath                  |
+------------+--------------------------------------------+
| 7776871946 | .basespace/ERR1347692_128_hs37d5.cov.gz    |
| 7776871945 | .basespace/ERR1347692_128_Y.cov.gz         |
| 7776871944 | .basespace/ERR1347692_128_X.cov.gz         |
| 7776871943 | .basespace/ERR1347692_128_NC_007605.cov.gz |
| 7776871942 | .basespace/ERR1347692_128_MT.cov.gz        |
| 7776871941 | .basespace/ERR1347692_128_GL0002491.cov.gz |
| 7776871940 | .basespace/ERR1347692_128_GL0002481.cov.gz |
...
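
If an appsession produced no output datasets, or more than one, the one-line substitution above passes an empty or multi-word argument to -i. The following is a minimal sketch of a more defensive wrapper. The list_output_files function is hypothetical, it uses only the bs subcommands shown above, and it assumes that --terse prints one dataset ID per line.

```shell
# list_output_files APPSESSION_ID: show the files in every output dataset.
# Sketch only: uses just the bs subcommands shown above and assumes --terse
# prints one dataset ID per line.
list_output_files() {
  ids=$(bs appsession property get -i "$1" --property-name="Output.Datasets" --terse) || return 1
  if [ -z "$ids" ]; then
    echo "appsession $1 has no output datasets" >&2
    return 1
  fi
  for id in $ids; do
    bs contents dataset -i "$id"
  done
}

# Usage (requires an authenticated bs): list_output_files 46664618
```

The block only defines the function, so nothing is sent to BaseSpace until it is called.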

Setting and inspecting properties on a project

Projects by default do not have properties set:

$ bs projects properties list -i 27932921
$

We can add string properties:

$ bs projects properties set -i 27932921 --property-name="MyNamespace.TestProperty" --property-content="TestValue"
$ bs projects properties list -i 27932921
+--------------------------+-------------+--------+-------------+
|          Name            | Description |  Type  |   Content   |
+--------------------------+-------------+--------+-------------+
| MyNamespace.TestProperty |             | string | "TestValue" |
+--------------------------+-------------+--------+-------------+

Note that a namespace prefix for a property name is compulsory:

$ bs projects properties set -i 27932921 --property-name="TestProperty" --property-content="TestValue"
ERROR: *** BASESPACE.PROPERTIES.NAME_INVALID: Property name: TestProperty must contain 2 or more segments split by a period. Each segment may contain letters, numbers, '-', and '_'. First segment must start with a letter or number. ***
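
A script can catch this mistake before calling the API. The following is a rough sketch of a local pre-check written from the wording of the error message above. The valid_property_name helper is hypothetical, and the server remains the authority on which names it accepts.

```shell
# valid_property_name NAME: succeed if NAME looks like a namespaced property.
# Local pre-check only, written from the wording of the error message above;
# the server remains the authority on what it accepts.
valid_property_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9][A-Za-z0-9_-]*(\.[A-Za-z0-9_-]+)+$'
}

for name in "MyNamespace.TestProperty" "TestProperty"; do
  if valid_property_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: needs a namespace prefix, e.g. MyNamespace.$name"
  fi
done
```

The pattern requires at least two period-separated segments and a first segment that starts with a letter or number.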

Setting lane QC thresholds

Authenticate with the appropriate scopes into a config called "laneqc". We will refer to this config in later commands.

$ bs auth --scope="READ GLOBAL","CREATE GLOBAL","CONFIGURE QC" -c laneqc
Please go to this URL to authenticate:  https://basespace.illumina.com/oauth/device?code=HrACj
Created config file  /Users/basespaceuser/.basespace/laneqc.cfg
Welcome, BSSH.V2 TestUser

Look at lane QC thresholds to confirm they are blank:

$ bs -c laneqc lane threshold export
Name,Group,Operator,ThresholdValues

Create a CSV file that defines the QC thresholds, then set the lane QC thresholds using that file:

$ cat > /tmp/thresholds.txt                                       
Name,Group,Operator,ThresholdValues                                   
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50                    
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40                    
$ bs -c laneqc lane threshold import -f /tmp/thresholds.txt          
# should finish without errors!                                      
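
In a script, the thresholds file can be written non-interactively with a here-document and checked before import. The following is a minimal sketch. The check_thresholds helper is hypothetical, the four-column layout is taken from the export output shown above, and it assumes one threshold value per row.

```shell
# Write the thresholds file non-interactively and sanity-check it.
# Sketch only: the file path comes from mktemp; the four-column layout is
# taken from the export output shown above.
thresholds=$(mktemp)
cat > "$thresholds" <<'EOF'
Name,Group,Operator,ThresholdValues
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40
EOF

# check_thresholds FILE: fail if the header is wrong or a row lacks 4 fields.
check_thresholds() {
  awk -F, '
    NR == 1 && $0 != "Name,Group,Operator,ThresholdValues" { bad = 1 }
    NF != 4 { bad = 1 }
    END { exit bad }
  ' "$1"
}

if check_thresholds "$thresholds"; then
  echo "thresholds file looks well formed: $thresholds"
  # Next step (requires the laneqc config):
  # bs -c laneqc lane threshold import -f "$thresholds"
fi
```

The import command is left as a comment so that the sketch can be run without touching a real account.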

View lane QC thresholds:

$ bs -c laneqc lane threshold export                                 
Name,Group,Operator,ThresholdValues                                   
PercentGtQ30,SequencingRead1,GreaterThanOrEqual,50                    
PercentGtQ30,SequencingRead2,GreaterThanOrEqual,40                    

Clear the lane QC thresholds, then export them again to confirm that they have been removed:

$ bs -c laneqc lane threshold clear
$ bs -c laneqc lane threshold export
Name,Group,Operator,ThresholdValues                                   

Creating a biosample and uploading FASTQ data against it

Biosample Creation

# warning! if your project does not already exist it will be implicitly created                                                    
$ bs create biosample -n "MyBioSample" -p "MyProject"                

Note that there are quite a few optional metadata parameters for biosamples. These are primarily designed to help high-throughput labs classify and display biosamples:

$ bs create biosample --help
(snip)
    BioSample Options:
      -n, --name=                                  Name of the BioSample
      -p, --project=                               Name of the project where FastQs will be stored. Created if not found.
          --container-name=                        Name of container
          --container-position=                    Position within the container
          --analysis-workflow=                     Name of the analysis to schedule
          --prep-request=                          Name of the lab workflow that LIMS should perform
          --required-yield=                        Required yield in Gbp that is needed before launching analysis (required if --prep-request is provided)
          --metadata=                              Key/Value metadata properties to set on the BioSample
          --delivery-mode=[Deliver|Do Not Deliver] Intial delivery mode
(snip)

You can preview biosample creation (--preview), to validate that the data provided is correct.

$ bs create biosample -n "MyBioSample" -p "MyProject" --preview      
ERROR:  * * * Error in BioSample Name: BioSample 'MyBioSample' already exists and cannot be imported  * * *                                  

To attach a new analysis workflow to an existing biosample, use the --allow-existing option.

FASTQ upload

There are two options to associate uploaded FASTQ files with a biosample:

  • Upload files that match the biosample name
  • Force the upload to attach FASTQs to the specified biosample, regardless of their name

Upload FASTQ files that match the biosample name

$ ls                                                                 
MyBioSample_S1_L001_R1_001.fastq.gz                               
MyBioSample_S1_L001_R2_001.fastq.gz                               
# note that you need a project ID here, not a project name as you did when you created the biosample!                                       
$ bs upload dataset -p 27943921 MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz               
Creating sample: MyBioSample
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [=========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [=========================================================] 100.00%
Upload complete


# note that datasets are created asynchronously and there can be a delay
$ bs list dataset --input-biosample="MyBioSample"  
+-------------+-------------------------------------+--------------+---------------------+
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
+-------------+-------------------------------------+--------------+---------------------+
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |
+-------------+-------------------------------------+--------------+---------------------+

Force the upload to attach FASTQs to the specified biosample, regardless of their name

$ ls
valid_S1_L001_R1_001.fastq.gz   valid_S1_L001_R2_001.fastq.gz
# note that you need a project ID here, not a project name as you did when you created the biosample!
$ bs upload dataset --biosample-name="MyBioSample" -p 27943921 valid_S1_L001_R1_001.fastq.gz valid_S1_L001_R2_001.fastq.gz
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [=========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [=========================================================] 100.00%
Upload complete
$ bs list dataset --input-biosample="MyBioSample"
+-------------+-------------------------------------+--------------+---------------------+
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
+-------------+-------------------------------------+--------------+---------------------+
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |
| MyBioSample | ds.64706f7d2e504e1c9495c00c468d6640 | MyProject    | illumina.fastq.v1.8 |
+-------------+-------------------------------------+--------------+---------------------+

Note that even though the dataset has a name to match the biosample, the files within retain their original names:

$ bs dataset contents -i ds.94f7e9663e86473c8582dcf85a830195
+------------+-------------------------------------+
|     Id     |              FilePath               |
+------------+-------------------------------------+
| 8652306587 | MyBioSample_S1_L001_R1_001.fastq.gz |
| 8652306586 | MyBioSample_S1_L001_R2_001.fastq.gz |
+------------+-------------------------------------+
$ bs dataset contents -i ds.64706f7d2e504e1c9495c00c468d6640
+------------+-------------------------------+
|     Id     |           FilePath            |
+------------+-------------------------------+
| 8652572270 | valid_S1_L001_R2_001.fastq.gz |
| 8652572269 | valid_S1_L001_R1_001.fastq.gz |
+------------+-------------------------------+

Uploading directories full of FASTQ files

The bs upload dataset command supports a --recursive option that scans a directory and its subdirectories for FASTQ files:

$ ls
MyBioSample2_S1_L002_R1_001.fastq.gz    MyBioSample2_S1_L002_R2_001.fastq.gz    MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz
$ bs upload dataset -p 21646627 --recursive .
Creating sample: MyBioSample2
MyBioSample2_S1_L002_R1_001.fastq.gz 1.08 GiB / 1.08 GiB [==========================================================] 100.00%
MyBioSample2_S1_L002_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
Upload complete
Creating sample: MyBioSample
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
Upload complete 

By default, the files are grouped by name and uploaded as separate FASTQ datasets, one per biosample name. You can also force all of them to be uploaded to the same biosample:

$ bs upload dataset -p 21646627 --recursive . --biosample-name=MyBioSample
Creating sample: MyBioSample
MyBioSample2_S1_L002_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
MyBioSample2_S1_L002_R1_001.fastq.gz 1.08 GiB / 1.08 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [==========================================================] 100.00%
Upload complete
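
The default grouping by name described in this section can be previewed locally before uploading. The following is a rough sketch with a hypothetical biosample_from_fastq helper. It assumes the Illumina-style file name pattern seen in the examples (sample name, then _S, lane, read and a three-digit suffix), and the CLI's own matching rules may differ.

```shell
# biosample_from_fastq FILE: print the sample-name part of a FASTQ file name.
# Sketch only: assumes the Illumina-style pattern
# <sample>_S<n>_L<lane>_<R|I><read>_001.fastq.gz; the CLI's own matching
# rules may differ.
biosample_from_fastq() {
  basename "$1" | sed -E 's/_S[0-9]+_L[0-9]{3}_[RI][0-9]_[0-9]{3}\.fastq\.gz$//'
}

# Preview how the files from the example above would be grouped.
for f in MyBioSample2_S1_L002_R1_001.fastq.gz MyBioSample2_S1_L002_R2_001.fastq.gz \
         MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz; do
  biosample_from_fastq "$f"
done | sort | uniq -c
```

Each line of the result shows a derived biosample name together with the number of files that would be grouped under it.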

Uploading arbitrary data

You can use bs upload dataset to upload arbitrary files:

$ ls
testfile1.txt   testfile2.txt
$ bs upload dataset -p 21646627 --type common.files --recursive .
Creating dataset: testfile1.txt+
testfile2.txt 2.76 KiB / 2.76 KiB [==========================================================] 100.00%
testfile1.txt 2.31 KiB / 2.31 KiB [==========================================================] 100.00%
Upload complete

If no name for the dataset is specified, a name will be created based on one of the files to be uploaded. You can specify a name with the --name option.

Controlling upload options

The bs upload dataset command contains a number of options to control how the upload is conducted and reported. For example, to upload with high concurrency and no progress bars:

$ bs upload dataset -p 21646627 --type common.files --no-progress-bars --concurrency=high --recursive .

Configure Automated Workflow

The automated workflow feature of BSSH V2 allows apps to be launched automatically when they meet a set of conditions or dependencies. The feature also allows automated quality control to be applied so that the AppSession is automatically marked as "QCPassed" or "QCFailed" based on the metrics it generated.

Before an app can be launched automatically in this way, a "workflow" needs to be created which wraps the app and describes its dependencies and (optionally) any QC thresholds. The workflow takes as input a template appsession: an app launch that has been configured and launched manually through the GUI with the desired settings. Automated launches for this workflow will be based on these settings.

It is also possible to chain automated app launches by creating another workflow for the downstream app and creating a dependency on the upstream step.

Get a token

Get the MANAGE APPLICATIONS token to create and work with workflows:

$ bs auth -c emea --force --api-server https://api.emea.illumina.com/ --scopes "READ GLOBAL","CREATE GLOBAL","BROWSE GLOBAL","MANAGE APPLICATIONS"

Create a Workflow

$ bs -c cloud-manageapps workflow create -n TestWorkflow -d CLICreated --application-id=2039037 --appsession-id=41060211
3978975

In this example, the application and appsession are based on the WGS5.0.0 app.

The value returned is the ID of the workflow. You can confirm it by listing the applications in the workflow category:

$ bs -c cloud-manageapps list applications --category=workflow
+--------------+---------+---------------+
|     Name     |   Id    | VersionNumber |
+--------------+---------+---------------+
| TestWorkflow | 3978975 | 1.0.0         |
+--------------+---------+---------------+

Set Dependencies

Create a biosample yield dependency:

$ bs -c cloud-manageapps workflow dependency add biosample-yield --chooser-id=sample-id --can-use-primary-biosample -i 3978975

Create an app completion dependency:

First, look up the details of the app you want to use:

$ bs get application -i 19019
+----------------------------+----------------------------------------------------------------------------------------+
| AppFamilySlug              | illumina-lab-services.genotyping-vcf-uploader                                          |
| AppVersionSlug             | illumina-lab-services.genotyping-vcf-uploader.1.0.0                                    |
| Id                         | 19019                                                                                  |
(snip)

Create the dependency:

$ bs -c cloud-manageapps workflow dependency add app-completion --application-id illumina-lab-services.genotyping-vcf-uploader.1.0.0 -i 3978975 --chooser-id=array-vcf-file --qc-pass --file-selector='.*\\.vcf\\.gz$|.*\\.vcf$'
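
The --file-selector value is a regular expression. As a rough local check of which file names it would select, the sketch below tests names with grep -E. It rests on two unconfirmed assumptions: that the doubled backslashes in the command above are an escaping layer, so the effective pattern has a single backslash before each dot, and that the service interprets this pattern the same way POSIX extended regular expressions do. SELECTOR and matches_selector are illustrative names, not part of the CLI.

```shell
# Effective pattern, ASSUMING the doubled backslashes in the bs command are an
# escaping layer and the service sees a single backslash before each dot.
SELECTOR='.*\.vcf\.gz$|.*\.vcf$'

# Succeeds (exit status 0) when the given file name matches the selector.
matches_selector() {
    printf '%s\n' "$1" | grep -Eq "$SELECTOR"
}

# Example: report which names from a candidate list would be selected.
for name in sample.vcf sample.vcf.gz sample.vcf.gz.tbi sample.bam; do
    if matches_selector "$name"; then
        echo "selected: $name"
    else
        echo "ignored:  $name"
    fi
done
```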

Review:

$ bs -c cloud-manageapps workflow dependency export -i 3978975
[
  {
    "Type": "BioSampleYield",
    "Attributes": {
      "BioSampleChooserId": "sample-id",
      "CanUsePrimaryBioSample": false,
      "Label": "",
      "LibraryPrepId": "",
      "MixLibraryTypesAllowed": false
    },
    "Dependencies": null
  },
  {
    "Type": "AppCompletion",
    "Attributes": {
      "ApplicationId": "illumina-lab-services.genotyping-vcf-uploader.1.0.0",
      "CanUsePrimaryResource": false,
      "ColumnId": "",
      "Label": "",
      "RequireQcPass": true,
      "ResourceChooserId": "array-vcf-file"
    },
    "Dependencies": null
  }
]

Set QC Thresholds

In this example, the thresholds are configured in the following csv file:

/tmp/qcthresholds.csv

Name                                                                    DatasetTypeId      Operator            ThresholdValues
illumina_isaac_v5.autosome_callability                                  illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.autosome_coverage_at_10x                              illumina.isaac.v5  GreaterThanOrEqual  98
illumina_isaac_v5.autosome_coverage_at_1x                               illumina.isaac.v5  GreaterThanOrEqual  99.49
illumina_isaac_v5.autosome_exon_callability                             illumina.isaac.v5  GreaterThanOrEqual  97
illumina_isaac_v5.autosome_exon_coverage_at_10x                         illumina.isaac.v5  GreaterThanOrEqual  98.5
illumina_isaac_v5.autosome_exon_coverage_at_1x                          illumina.isaac.v5  GreaterThanOrEqual  99.29
illumina_isaac_v5.autosome_mean_coverage                                illumina.isaac.v5  GreaterThanOrEqual  30
illumina_isaac_v5.cnvs                                                  illumina.isaac.v5  LessThanOrEqual     300
illumina_isaac_v5.contamination                                         illumina.isaac.v5  LessThanOrEqual     5
illumina_isaac_v5.fragment_length_median                                illumina.isaac.v5  GreaterThanOrEqual  420
illumina_isaac_v5.frameshift_deletions                                  illumina.isaac.v5  GreaterThanOrEqual  0
illumina_isaac_v5.indel_het_to_hom_ratio                                illumina.isaac.v5  LessThanOrEqual     2.5
illumina_isaac_v5.mapq_more_than_10_autosome_coverage_at_15x            illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.mapq_more_than_10_autosome_exon_coverage_at_15x       illumina.isaac.v5  GreaterThanOrEqual  95
illumina_isaac_v5.mismatch_rate_read_1                                  illumina.isaac.v5  LessThanOrEqual     1
illumina_isaac_v5.mismatch_rate_read_2                                  illumina.isaac.v5  LessThanOrEqual     2.3
illumina_isaac_v5.percent_aligned_reads                                 illumina.isaac.v5  GreaterThanOrEqual  92
illumina_isaac_v5.percent_at_dropout                                    illumina.isaac.v5  LessThanOrEqual     2.43
illumina_isaac_v5.percent_gc_dropout                                    illumina.isaac.v5  LessThanOrEqual     2.5
illumina_isaac_v5.percent_q30_bases                                     illumina.isaac.v5  GreaterThanOrEqual  80
illumina_isaac_v5.percent_q30_bases_read_2                              illumina.isaac.v5  GreaterThanOrEqual  77
illumina_isaac_v5.percent_read_pairs_aligned_to_different_chromosomes   illumina.isaac.v5  LessThanOrEqual     1.5
illumina_isaac_v5.q30_bases_excluding_clipped_and_duplicate_read_bases  illumina.isaac.v5  GreaterThanOrEqual  9e+10
illumina_isaac_v5.read_enrichment_at_80percent_gc                       illumina.isaac.v5  GreaterThanOrEqual  0.8
illumina_isaac_v5.snv_het_to_hom_ratio                                  illumina.isaac.v5  LessThanOrEqual     2.05
illumina_isaac_v5.total_pf_bases                                        illumina.isaac.v5  GreaterThanOrEqual  1e+11
illumina_isaac_v5.array_concordance                                     illumina.isaac.v5  GreaterThanOrEqual  99.3
illumina_isaac_v5.array_concordance_usage                               illumina.isaac.v5  GreaterThanOrEqual  95

Import the thresholds into the workflow:

$ bs -c cloud-manageapps workflow threshold import -f /tmp/qcthresholds.csv -i 3978975
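
As an aid to reading the threshold rows, the sketch below shows one plausible interpretation: a metric passes when its value compared with the threshold satisfies the named operator. This reading of GreaterThanOrEqual and LessThanOrEqual is an assumption based on the operator names, only the two operators used in this example are handled, and check_threshold is a hypothetical helper rather than a CLI command.

```shell
# check_threshold VALUE OPERATOR THRESHOLD
# Exit status 0 when VALUE satisfies the threshold, 1 when it does not,
# and 2 for an operator this sketch does not know about.
check_threshold() {
    awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
        v += 0; t += 0                      # force numeric comparison (handles 9e+10)
        if (op == "GreaterThanOrEqual") exit !(v >= t)
        if (op == "LessThanOrEqual")    exit !(v <= t)
        exit 2
    }'
}
```

For example, check_threshold 96.2 GreaterThanOrEqual 95 succeeds, while check_threshold 6 LessThanOrEqual 5 fails.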

Accession Biosample

To create a new instance of your workflow, create a biosample with that workflow. Note that you need to use the project name, not the ID, here; this matches the manifest import mechanism.

$ bs -c cloud-manageapps biosample create -p "MyProject" -n "MyBiosample" --analysis-workflow "TestWorkflow"

Inspect the Biosamples you've just created

$ bs list biosamples --newer-than=1d

Find the appsession that's been created as your workflow

Setting different fields here shows you extra information about the status. Note that you need to use the biosample ID and not the name here.

$ bs -c cloud-manageapps appsession list --input-biosamples=$BIOSAMPLEID -F Id -F Status -F StatusSummary

Upload Fastq files against that biosample

Uploading data against your new biosample can be carried out as listed in a previous example. This will register yield against this biosample that can trigger an app launch if the workflow has a yield dependency.

Launching Apps

This section provides some examples of manual app launch at the command line. This is distinct from configuring apps for automated launch, as described in an earlier example.

Both workflow-driven launch and the V1 CLI only allow launches based on an initial launch from the GUI being used as a template. The V2 CLI removes the need for this template appsession by parsing the form for each app directly and giving the user access, via the command line, to view and configure options for the relevant app.

View the available options for an app

For a given app name and version, the --list flag will return a large table showing all available options for a command-line launch:

$ bs launch application -n "Whole Genome Sequencing" --app-version 7.0.1 --list
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+
|         Option         |      Type      |                                   Choices                                   |                                  Default                                   | Multiselect | Required |
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+
| app-session-name       | TextBox        | <String>                                                                    | Example [LocalDateTime]                                                    | false       | true     |
| project-id             | ProjectChooser | <Project>                                                                   |                                                                            | false       | true     |
| sample-id              | SampleChooser  | <Sample>                                                                    |                                                                            | true        | false    |
| bam-file-id            | FileChooser    | <File>                                                                      |                                                                            | true        | false    |
| reference-genome       | Select         | /data/scratch/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta,        | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta | false       | true     |
|                        |                | /data/scratch/GRCh37/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta, |                                                                            |             |          |
|                        |                | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta  |                                                                            |             |          |
| enable-variant-calling | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-sv-calling      | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-cnv-calling     | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| annotation-source      | RadioButton    | ensembl, refseq, both, none                                                 | ensembl                                                                    | false       | false    |
+------------------------+----------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+----------+

Build a launch command

To launch an app, supply the app name and version along with any settings using the --option / -o flag in the format optionName:value. The launch command expects "New" / V2 entities for all inputs, such as biosamples and datasets rather than samples and appresults. Sequence Hub entities can be referred to in an app launch by their unique ID, while other launch arguments can accept plain text:

$ bs launch application -n 'Whole Genome Sequencing' --app-version 7.0.1 \
-o project-id:1232 -o bam-file-id:3534333 -o annotation-source:both \
-l "My test appsession"

You can also launch apps by their ID:

$ bs launch application -i 5143138 -o project-id:1232 -o bam-file-id:3534333

If an option accepts multiple arguments (check with --list), you can supply these as comma-separated values:

$ bs launch application -i 5143138 -o project-id:1232 -o bam-file-id:3534333,232321
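
When a launch has many options, it can help to assemble the command in a script and review it before running it. The sketch below only prints a command line; it uses the -i and -o flags shown above and checks that each option has the optionName:value shape. build_launch_command is a hypothetical helper, not part of the CLI, and it does no shell quoting, so values containing spaces would need quoting before the printed command is run.

```shell
# build_launch_command APP_ID NAME:VALUE...
# Prints a "bs launch application" command line with one -o flag per option.
# Nothing is launched; the output is meant to be reviewed first.
build_launch_command() {
    app_id=$1
    shift
    cmd="bs launch application -i $app_id"
    for opt in "$@"; do
        case $opt in
            ?*:?*) cmd="$cmd -o $opt" ;;
            *) echo "expected optionName:value, got: $opt" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$cmd"
}
```

For example, build_launch_command 5143138 project-id:1232 bam-file-id:3534333,232321 prints the multi-value launch command shown above.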

Launching an app with biosamples

In most cases, biosamples can be passed to a launch command by just their ID:

$ bs launch application -n 'Whole Genome Sequencing' --app-version 7.0.1 \
-o project-id:1232 -o sample-id:2323244

However, if a biosample contains FASTQ datasets with a mix of library preps, you will need to specify the library prep ID for the FASTQ datasets you wish to launch with, in the format:

-o sample-id:343432/librarypreps/1014015

Launching an app with advanced form controls

ResourceMatchers

Apps such as Tumor Normal v5 use a ResourceMatcher to submit matched pairs of WGS datasets. For CLI launch, the format for these fields is:

-o input-id:'col1_dataset1,col1_dataset2;col2_dataset1,col2_dataset2'

Here commas separate multiple inputs and a semicolon delimits a column, so the above string would render as the following table if launched through the Sequence Hub web user interface:

Normal            Tumor
col1_dataset1     col2_dataset1
col1_dataset2     col2_dataset2
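
Building this string by hand is error-prone once there are several pairs. A minimal sketch of one way to assemble it is shown below, taking each column as a space-separated list of dataset IDs. join_with and resource_matcher_value are hypothetical helpers, not part of the CLI, and the IDs in the usage note are placeholders.

```shell
# join_with SEP ITEM...: join the items with SEP between them.
join_with() {
    sep=$1
    shift
    out=
    for item in "$@"; do
        out="${out:+$out$sep}$item"
    done
    printf '%s\n' "$out"
}

# resource_matcher_value COLUMN...: each argument is one column, given as a
# space-separated list of dataset IDs. Inputs within a column are joined with
# commas and columns are joined with semicolons.
resource_matcher_value() {
    cols=
    for col in "$@"; do
        # Intentional word splitting: dataset IDs contain no spaces.
        joined=$(join_with , $col)
        cols="${cols:+$cols;}$joined"
    done
    printf '%s\n' "$cols"
}
```

For example, resource_matcher_value "101 102" "201 202" prints 101,102;201,202, which can then be passed as -o input-id:'101,102;201,202'.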

TabularFieldsets

TabularFieldsets are sophisticated form controls shown as an expandable table of sub-controls when launching an application through the web user interface. To set options for this type of control through the CLI, the format is: -o tabularFieldsetControlName.subControlName:value

For example, VCAT v2.3.0 uses a TabularFieldset named sample-pairs containing a FileChooser named file-id and a TextBox named file-label (as shown by --list). This application can be launched as shown:

$ bs launch application -n "Variant Calling Assessment Tool" --app-version 2.3.0 \
-o sample-pairs.file-id:11232444,11232445 -o sample-pairs.file-label:vcf1,vcf2 \
...

DisplayFields

DisplayFields are additional controls that pop up underneath SampleChoosers and AppResultChoosers. These are not yet supported by the launch API. The CLI will warn you if the app you're trying to launch uses them, for example:

$ bs launch application -n 'SPAdes Genome Assembler' --app-version 3.5.0 --list
WARNING: Input field 'sample-id' uses DisplayFields which are not yet supported, you may not be able to launch this app !

Application launch FAQs

How can I find biosample/project/dataset IDs?

The CLI can get many entities by name and return their ID:

$ bs get project -n MyProjectName --terse
$ bs get biosample -n MyBioSampleName --terse
$ bs get dataset -n MyDatasetName --terse

Alternatively, you can retrieve IDs via list commands, for example:

$ bs list datasets --project-name ProjectContainingMyDataset

Project and Biosample IDs are also visible in URLs when browsing Sequence Hub through the web user interface.
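
In a script, the lookups by name can be captured into variables and reused in a launch command. The sketch below assumes that --terse prints only the ID on standard output, which is how the commands above are described but is not verified here. lookup_id and the BS variable are illustrative; BS exists only so the function can be pointed at a stand-in command for testing.

```shell
# BS names the CLI executable; override it to test without a real account.
BS=${BS:-bs}

# lookup_id ENTITY NAME: print the ID of a project, biosample or dataset,
# looked up by name. Fails if the CLI fails or returns nothing.
lookup_id() {
    id=$("$BS" get "$1" -n "$2" --terse) || return 1
    if [ -z "$id" ]; then
        echo "no $1 named $2" >&2
        return 1
    fi
    printf '%s\n' "$id"
}
```

A typical use would be project_id=$(lookup_id project MyProjectName), followed by a launch with -o project-id:$project_id.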

How can I find a file ID?

You can list the contents of an appresult or dataset to get the file IDs:

$ bs contents appresult -i 12224237 | head
+------------+----------------------------------------------------------------------+
|     Id     |                               FilePath                               |
+------------+----------------------------------------------------------------------+
| 8618632807 | Plots/s_0_1_2212_MismatchByCycle.png                                 |
| 8618632806 | Plots/s_0_1_2113_MismatchByCycle.png                                 |
| 8618632805 | Plots/sorted_S1_G1_chr2_MismatchByCycle.png                          |
| 8618632804 | Plots/s_0_1_1115_MismatchByCycle.png                                 |
| 8618632803 | Plots/s_1_1_2108_MismatchByCycle.png                                 |
| 8618632802 | Plots/s_0_2_1107_MismatchByCycle.png                                 |
| 8618632801 | Plots/s_0_1_1112_NoCallByCycle.png                                   |

You can also use BaseMount to find the file by navigating to the project and appresult which generated it:

${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files/file.vcf

Then use the Files.metadata directory to get the ID:

$ cat ${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files.metadata/file.vcf/.id  
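
The same lookup can be wrapped in a small function so that a missing file produces a clear error rather than an empty ID. This is a sketch based only on the Files.metadata layout shown above; basemount_file_id is a hypothetical helper, not part of BaseMount or the CLI.

```shell
# basemount_file_id APPRESULT_DIR FILE_NAME
# Prints the ID stored in <APPRESULT_DIR>/Files.metadata/<FILE_NAME>/.id
basemount_file_id() {
    id_file="$1/Files.metadata/$2/.id"
    if [ ! -r "$id_file" ]; then
        echo "no readable .id file at $id_file" >&2
        return 1
    fi
    # Strip any whitespace or trailing newline around the ID.
    tr -d '[:space:]' < "$id_file"
    echo
}
```

Here APPRESULT_DIR would be the ${BASEMOUNT}/Projects/<project>/appresults/<appsession name> directory from the example above.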

How can I change my analysis name?

Analysis name is a special parameter. Best practice is to set it both through the form control and via the --appsession-label / -l argument:

$ bs application launch -n "Whole Genome Sequencing" --app-version 7.0.1 \
-o app-session-name:"Your appsession name" -l "Your appsession name" \
...

How can I fix an 'AppSession name is required' error?

This error is due to an outdated CLI version which is no longer compatible with application launch. Please update to the latest CLI version.