Native App Conventions

For a general overview of Native Apps, please refer to the Native App Overview.

This document covers the conventions of the Native Apps Infrastructure in BaseSpace. Many of these conventions are implemented by a service called SpaceDock, which performs a number of functions for the app. This document explains what SpaceDock is, how to install and update it, and how it receives jobs, provides input data and reference genomes, reports status and logs, and uploads results.


What is SpaceDock

SpaceDock is a Linux package developed by BaseSpace that is installed on any machine that wishes to receive Native app jobs from BaseSpace. SpaceDock is a service that comes installed on the Native Apps Developer VM and is the same service that is installed in the AWS instance where the app will be run.

When a Native App job is submitted to BaseSpace, BaseSpace launches one or more AWS EC2 instances that also have SpaceDock installed. Your application is started on these instances and behaves the same way it does on your developer VM.

SpaceDock automatically manages several important functions for developers, including:

  • Receiving jobs from BaseSpace
  • Granting permission to the app to access the user's data for the analysis
  • Downloading input data to the machine
  • Reporting status and logs back to BaseSpace for the user
  • Providing reference genome data
  • Uploading results back to BaseSpace

SpaceDock handles all of this automatically as long as certain conventions are followed. A developer who wants more control over these features can also use the BaseSpace Rest API to perform many of the same functions. In most cases, though, the app does not need to call the BaseSpace API at all, because SpaceDock takes care of it.


How to get SpaceDock on your Virtual Machine

When a new version of SpaceDock is released, it will be available as a Debian package for download.

Update an existing SpaceDock package

There are two ways to get SpaceDock: updating an existing installation, or downloading it for the first time.

If you are only updating the existing version of SpaceDock on your machine (or your virtual machine), follow the steps below. More information on installing the SpaceDock package on any machine, not only the virtual machine, will be released in the future.

To get the latest version of SpaceDock on your machine, simply execute the following steps:

  1. Refresh the package lists on your machine

    sudo apt-get update
    
  2. Install the latest SpaceDock package

    sudo apt-get install spacedock
    

This installs the latest version of SpaceDock on your machine.

Download SpaceDock for the first time

If you are configuring your own Native Apps development machine, one of the steps is installing SpaceDock.

To get SpaceDock on your machine:

  1. Add the SpaceDock repository to your apt sources:

    echo deb http://basespace-apt.s3.amazonaws.com spacedock main | sudo tee -a /etc/apt/sources.list

  2. Refresh the package lists and check which SpaceDock version is available:

    sudo apt-get update; apt-cache policy spacedock

  3. Install SpaceDock:

    sudo apt-get install spacedock


What Does SpaceDock Do

This section will cover all of the developer-related features of SpaceDock.


Receiving jobs from BaseSpace

SpaceDock's main function is to poll BaseSpace for new jobs for your application, using a connection that it establishes with BaseSpace.

To have SpaceDock poll BaseSpace for Jobs for your particular application, please follow these steps:

  1. Ensure that the SpaceDock package is installed on the machine where the app will run: your local machine, a virtual machine, or an instance.

  2. Create an Input Form for the app

  3. Open the form in the Forms Builder window

  4. On the right side of the Forms Builder window, find the command labeled "Sample command-line to start local agent:" and copy it. It should look similar to: sudo spacedock -a {Agent_Id} -m https://mission.basespace.illumina.com

  5. Paste this command into your machine's terminal (the machine where the app was installed)


User Authentication and Permissions

Based on the app's input form, BaseSpace automatically asks the user for the permissions needed to download input data and save results back to BaseSpace, so the developer does not need to handle this. SpaceDock obtains access to the user's data according to what the form requires and retains this access while the app is running. Each form control determines the level of access needed for the resource the user chooses.


Getting access to input data

One of the first things SpaceDock does when running an app is to place the data chosen by the user where the app can access it. SpaceDock creates a /data folder in the app's Docker container that contains an input folder and an output folder.

The input folder will contain the following data by the time your application is started:

  • Any input Sample data selected in the input form is downloaded to the /data/input/samples/<sample-id> folder
  • Any input AppResult data selected in the input form is downloaded to the /data/input/appresults/<appresult-id> folder
  • The AppSession created when the user selected Launch on the input form is saved as a JSON file at /data/input/AppSession.json. This file is important because the AppSession holds everything the user selected on the input form as Properties of the AppSession. Saving it means the app does not have to make the API call GET: appsessions/{id}. Please refer to AppSessions in the Rest API Reference for further information about AppSessions.
  • For Multi-Node applications, the AppSession that holds everything the user selected on the input form is saved as /data/input/ParentAppSession.json. Parse this file to find what the user selected when launching the app.
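As a sketch of how an app might read these files, the following Node.js code parses AppSession.json (or ParentAppSession.json on a multi-node child) and lists the downloaded Sample folders. It assumes that the file holds the AppSession resource as returned by the Rest API, either bare or wrapped in a Response object, with its Properties in an array at Properties.Items; check this against a real AppSession.json from your VM. The function names are hypothetical, and the property names you look up depend on the controls in your own form.

```javascript
const fs = require("fs");
const path = require("path");

// Load an AppSession JSON file. Assumption: the resource may be wrapped in a
// "Response" object, as in Rest API responses, or stored bare.
function loadAppSession(file) {
  const raw = JSON.parse(fs.readFileSync(file, "utf8"));
  return raw.Response || raw;
}

// Index the AppSession's Properties by Name.
// Assumption: the Properties are stored in an array at Properties.Items.
function propertiesByName(appSession) {
  const items = (appSession.Properties && appSession.Properties.Items) || [];
  const byName = {};
  for (const prop of items) {
    byName[prop.Name] = prop;
  }
  return byName;
}

// List the <sample-id> folders under <inputRoot>/samples (inputRoot is normally /data/input).
function listSampleFolders(inputRoot) {
  const samplesDir = path.join(inputRoot, "samples");
  if (!fs.existsSync(samplesDir)) {
    return []; // no Sample folders present
  }
  return fs
    .readdirSync(samplesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(samplesDir, entry.name));
}
```

In an app, this might be used as `propertiesByName(loadAppSession("/data/input/AppSession.json"))` followed by `listSampleFolders("/data/input")`.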

The conventions for downloading input data depend on the type of Callbacks.js script configured for the app. For more information about the Callbacks.js script, please refer to the Formbuilder Documentation. The conventions differ slightly between the Single-Node and Multi-Node configurations; both are described in more detail below.

In addition, if the BSFS option is enabled in the app's Callbacks.js script, SpaceDock does not pre-download the input data; instead, the data is fetched as the app requests it.

Using BSFS or pre-download

Data can be made available to your container in two ways: by pre-downloading the files, or through a BSFS (BaseSpace File System) mount point.

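Mounting your data with BSFS has several advantages over pre-downloading it; for example, the app does not have to wait for all of its input data to be downloaded before it starts. For this reason, BSFS is now enabled by default when you create a new application in the developer portal.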

To enable BSFS, add the following entry to the object returned by the launchSpec function in your callbacks.js script:

Options: [ "bsfs.enabled=true" ]

This is shown in the following Single-Node example:

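function launchSpec(dataProvider)
{
    // The object must start on the same line as "return"; otherwise JavaScript's
    // automatic semicolon insertion makes the function return undefined.
    return {
        commandLine: ["/helloWorld.sh"],
        containerImageId: "tliu1/helloworld",
        Options: [ "bsfs.enabled=true" ]
    };
}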
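Because launchSpec is ordinary JavaScript, one way to catch mistakes before launching is to call it locally under Node.js with a stub dataProvider. The stub here is hypothetical: it only answers GetProperty() from a plain object and is not the object BaseSpace passes in. A check like this would, for example, catch a launchSpec that returns undefined because "return" is followed by a line break, which automatic semicolon insertion turns into a bare return statement.

```javascript
// Single-Node launchSpec, with the returned object starting on the same line as "return".
function launchSpec(dataProvider) {
  return {
    commandLine: ["/helloWorld.sh"],
    containerImageId: "tliu1/helloworld",
    Options: ["bsfs.enabled=true"]
  };
}

// Hypothetical stub: answers GetProperty() from a plain object of form values.
function makeStubDataProvider(values) {
  return {
    GetProperty: function (name) {
      return values[name];
    }
  };
}

const spec = launchSpec(makeStubDataProvider({}));
if (!spec || !spec.containerImageId) {
  throw new Error("launchSpec did not return a launch specification");
}
console.log(JSON.stringify(spec));
```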

Accessing Input Data for a Single-Node Native App

Here is an example of a Single-Node Native App's Callbacks.js script:

function launchSpec(dataProvider)
{
    // Keep the opening brace on the same line as "return"
    return {
        commandLine: ["/helloWorld.sh"],
        containerImageId: "tliu1/helloworld",
        Options: [ "bsfs.enabled=true" ]
    };
}

Note the Options array, which passes the option "bsfs.enabled=true" to enable BSFS. Remove this option if you do not want to use BSFS; the input data will then be pre-downloaded.

When the Single-Node template is used for the Callbacks.js script, all input data is made available at the locations referenced above.

Accessing Input Data for a Native App using the Multi-Node Template

The Multi-Node case is a little more complicated. When a Multi-Node app starts, one parent AppSession is created to aggregate the nodes, and one child AppSession is created for each node of the analysis. The parent AppSession contains all of the input Properties from the form, but each child AppSession has no Properties unless the Callbacks.js script explicitly assigns them. Here is an example of a Multi-Node Native App's Callbacks.js script:

function launchSpec(dataProvider)
{
    var retval = { nodes: [] };

    // Sample and AppResult resources chosen by the user in the input form
    var samples = dataProvider.GetProperty("Input.samples");
    var appresults = dataProvider.GetProperty("Input.appresults");

    // One node per Sample; each node's AppSession receives only the Properties listed here
    for (var i = 0; i < samples.length; i++)
    {
        retval.nodes.push({
            commandLine: ["/helloworld.sh"],
            containerImageId: "tliu1/helloworld",
            properties: {
                "Input.Samples": [samples[i]],
                "Input.AppResults": [appresults[i]]
            },
            Options: [ "bsfs.enabled=true" ]
        });
    }
    return retval;
}
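The Multi-Node example above can be exercised locally in the same way, with a hypothetical stub dataProvider that returns fake Sample and AppResult objects (only Id and Name are filled in here; real resources carry more fields). The check confirms that one node is created per Sample. Note that the loop indexes appresults with the same i as samples, so it assumes the user selected at least as many AppResults as Samples; otherwise the later nodes receive [undefined].

```javascript
// Multi-Node launchSpec from the example above.
function launchSpec(dataProvider) {
  var retval = { nodes: [] };
  var samples = dataProvider.GetProperty("Input.samples");
  var appresults = dataProvider.GetProperty("Input.appresults");
  for (var i = 0; i < samples.length; i++) {
    retval.nodes.push({
      commandLine: ["/helloworld.sh"],
      containerImageId: "tliu1/helloworld",
      properties: {
        "Input.Samples": [samples[i]],
        "Input.AppResults": [appresults[i]]
      },
      Options: ["bsfs.enabled=true"]
    });
  }
  return retval;
}

// Hypothetical stub dataProvider backed by fake form values.
var stub = {
  GetProperty: function (name) {
    var values = {
      "Input.samples": [{ Id: "s1", Name: "SampleA" }, { Id: "s2", Name: "SampleB" }],
      "Input.appresults": [{ Id: "r1" }, { Id: "r2" }]
    };
    return values[name];
  }
};

var spec = launchSpec(stub);
// Print which Sample each node received.
spec.nodes.forEach(function (node, i) {
  console.log("node " + i + ": " + node.properties["Input.Samples"][0].Name);
});
```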

In this example, each node of the analysis is explicitly assigned two Properties, Input.Samples and Input.AppResults. These hold the Sample and AppResult resources that the user chose in the input form, which the Callbacks.js script retrieves with the dataProvider.GetProperty() method; they are full Sample and AppResult resources. In the for loop, each node is given a commandLine argument, a containerImageId value that points to the Docker image, a properties field, and an Options array. The properties field adds the specified Properties to that node's AppSession, so in this case each node, one per Sample, receives its own Sample and AppResult. Note that the "bsfs.enabled=true" option is set in the Options array of every node in the multi-node launch.

For multi-node apps, the information from the input form is saved to the ParentAppSession.json file in the /data/input folder of each child node. Any Properties explicitly set for a node are stored in that node's standard AppSession.json file. As a result, even if an app's form has many elements, their values do not have to be passed to each node, because every node can read them from ParentAppSession.json.

The number of properties that can be passed to each node is currently limited to 25.
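To catch the 25-property limit during local testing, a small check can be run over the object that launchSpec returns. This is a hypothetical helper, and it assumes that each key of a node's properties object counts as one property; confirm how BaseSpace counts them before relying on it.

```javascript
// Limit stated in this document: at most 25 properties per node.
var MAX_NODE_PROPERTIES = 25;

// Hypothetical helper: returns the indexes of nodes whose properties object
// has more keys than the limit.
function nodesOverPropertyLimit(spec) {
  var offenders = [];
  spec.nodes.forEach(function (node, i) {
    var count = Object.keys(node.properties || {}).length;
    if (count > MAX_NODE_PROPERTIES) {
      offenders.push(i);
    }
  });
  return offenders;
}
```

For example, `nodesOverPropertyLimit(launchSpec(stubDataProvider))` should return an empty array for a valid launch.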

Note also that the commandLine and containerImageId are set on each child node individually.


Report Status and Errors to the User in the Logs Window

When a user launches an app in BaseSpace, they are taken to the Analysis page, which shows more information about the launched Analysis. This page refreshes periodically, updating the AppSession Status and StatusSummary as well as the output Logs from the app.

SpaceDock automatically reports some events to the Logs window, including downloading input data, downloading the Docker image, and uploading results back to BaseSpace. In addition to these automatic logs, the app can write its own messages to the Logs window.

Writing to the Logs window manually

Anything the app writes to stdout appears in the Logs window. This window is not designed to show the entire application log, only a small buffer of recent activity.

Reporting Errors

If the app encounters a minor error but can continue with the analysis, it can simply report the error by writing a message to stdout, which appears in the Logs window.

If the app encounters a fatal error from which it cannot recover, it should exit with a non-zero exit status. The error is then reported to the user automatically in the Logs window, and the session is marked as Aborted.
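A minimal Node.js sketch of these conventions follows: progress and minor errors are printed to stdout with console.log, and a fatal error produces a non-zero exit status. The analysis step, its sample-name inputs, and the choice of which problems count as minor or fatal are all hypothetical. Setting process.exitCode instead of calling process.exit() lets Node finish flushing stdout before the process ends.

```javascript
// Minimal sketch of an app entry point, e.g. `node app.js SampleA SampleB`.
// Anything printed with console.log goes to stdout and so to the Logs window.

// Hypothetical analysis step: skips bad inputs (minor error) but throws when
// nothing can be analyzed (fatal error).
function runAnalysis(sampleNames) {
  let processed = 0;
  for (const name of sampleNames) {
    if (name.trim() === "") {
      console.log("Warning: skipping an empty sample name");
      continue;
    }
    console.log("Processing " + name);
    processed++;
  }
  if (processed === 0) {
    throw new Error("no usable input Samples");
  }
}

function main(sampleNames) {
  console.log("Starting analysis");
  try {
    runAnalysis(sampleNames);
    console.log("Analysis finished");
    return 0; // exit status 0 tells SpaceDock the analysis is Complete
  } catch (err) {
    console.log("Fatal error: " + err.message);
    return 1; // non-zero: the session is marked as Aborted
  }
}

process.exitCode = main(process.argv.slice(2));
```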


Uploading custom Log files

From within a Native App, custom log files can also be uploaded to BaseSpace. Once uploaded, these files are available via the blue Logs link on the Analysis Info page, above the Logs window.

/data/logs/

Custom log files can be written to the /data/logs/ folder. Any file type is allowed. The files in this folder are uploaded to BaseSpace when the job ends, whether it finishes as Complete or Aborted.
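For example, an app could keep its own detailed log alongside the short stdout messages. The helper below is a hypothetical sketch: it appends timestamped lines to a file in a logs directory (normally /data/logs), creating the directory if it does not already exist; the file name is up to the app.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: append a timestamped line to a custom log file in
// logsDir (normally /data/logs), whose contents are uploaded when the job ends.
function appendCustomLog(logsDir, fileName, message) {
  fs.mkdirSync(logsDir, { recursive: true }); // no-op if the folder exists
  const line = new Date().toISOString() + " " + message + "\n";
  fs.appendFileSync(path.join(logsDir, fileName), line);
}
```

A call such as `appendCustomLog("/data/logs", "aligner.log", "alignment finished")` would add one line to /data/logs/aligner.log.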


Upload Results to BaseSpace with Properties

For upload, place output files into:

/data/output/appresults/<project-id>/[directory_with_appresult_name]/[your_files]

All data in the /data/output folder is uploaded to BaseSpace when the app exits with code 0, which tells SpaceDock that the analysis is Complete.

A Native App must always create an AppResult in the output folder, so make sure that at least one AppResult folder is created there.
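A sketch of creating the required AppResult folder is shown next. The output root is a parameter (normally /data/output) so the function can be tested outside BaseSpace. The project id is assumed to come from the AppSession, for example from a project control such as the "Input.project-id" property used in the list-genomes example later in this document; the function name is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Create <outputRoot>/appresults/<project-id>/<appresult-name> and return its path.
// outputRoot is normally /data/output.
function createAppResultDir(outputRoot, projectId, appResultName) {
  const dir = path.join(outputRoot, "appresults", String(projectId), appResultName);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}
```

The app would then write its result files into the returned folder before exiting with code 0.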

Each AppResult can be individually tagged with Properties. Properties are important because they give the user more information about the analysis, and each AppResult should always record the inputs that were used to produce it. For instance, if Samples were used as input, each AppResult should be tagged with the input.samples Property to show which Samples produced which result. If developers follow the conventions described below, SpaceDock can tag results with some of the input data automatically.

App Outputs Only 1 AppResult Folder

If the app takes one or more Samples or AppResults as input but produces only one AppResult per run, that AppResult is automatically tagged with the input.samples and input.appresults Properties from the AppSession created when the form was submitted.

Matching Input Sample and Output AppResult Names

If the name of an AppResult matches the name of an input Sample, SpaceDock automatically tags that AppResult with an input.samples Property containing that Sample's id.
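To take advantage of this name matching, an app that produces one result per Sample can name each AppResult folder after its Sample. The sketch below assumes that each Sample object has a Name field, as the items of a sample[] Property in AppSession.json are expected to; the function name is hypothetical, and Sample names containing path separators would need extra handling.

```javascript
const fs = require("fs");
const path = require("path");

// Create one AppResult folder per input Sample, named after the Sample so that
// SpaceDock can match the names. Assumption: each Sample object has a Name field.
// outputRoot is normally /data/output.
function createPerSampleAppResults(outputRoot, projectId, samples) {
  return samples.map(function (sample) {
    const dir = path.join(outputRoot, "appresults", String(projectId), sample.Name);
    fs.mkdirSync(dir, { recursive: true });
    return dir;
  });
}
```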

Tagging Each AppResult Manually

A _metadata.json file may be included in each AppResult folder of the output. This file contains information about the AppResult and a Properties array. The following is an example of a _metadata.json file for an AppResult; the Name of the AppResult must match the name of the folder created for it:

{
    "Name": "MyAppResult",
    "Description": "Analysis result for MyAppResult",
    "HrefAppSession": "v1pre3/appsessions/2056097",
    "Properties": [
        {
            "Type": "string",
            "Name": "RW_BaseSpace_App.Month",
            "Description": "Testing property creation",
            "Content": "1"
        },
        {
            "Type": "string",
            "Name": "RW_BaseSpace_App.Day",
            "Description": "Testing property creation",
            "Content": "20"
        },
        {
            "Type": "string",
            "Name": "RW_BaseSpace_App.Year",
            "Description": "Testing property creation",
            "Content": "2014"
        },
        {
            "Type": "sample[]",
            "Name": "Input.Samples",
            "Items": [
                "v1pre3/samples/710710"
            ]
        }
    ]
}
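Rather than writing this file by hand, an app can generate it. The sketch below produces a file with the same shape as the example: string Properties plus an optional Input.Samples Property of Type sample[]. It takes the Name from the folder name so the two always match. The function name and its parameters are hypothetical; the hrefAppSession value would come from the Href in /data/input/AppSession.json.

```javascript
const fs = require("fs");
const path = require("path");

// Write _metadata.json into an AppResult folder, following the example format.
// stringProps: array of { name, content, description } objects (hypothetical shape).
function writeAppResultMetadata(appResultDir, description, hrefAppSession, sampleHrefs, stringProps) {
  const properties = stringProps.map(function (p) {
    return { Type: "string", Name: p.name, Description: p.description, Content: String(p.content) };
  });
  if (sampleHrefs.length > 0) {
    properties.push({ Type: "sample[]", Name: "Input.Samples", Items: sampleHrefs });
  }
  const metadata = {
    Name: path.basename(appResultDir), // must match the AppResult folder name
    Description: description,
    HrefAppSession: hrefAppSession,
    Properties: properties
  };
  const file = path.join(appResultDir, "_metadata.json");
  fs.writeFileSync(file, JSON.stringify(metadata, null, 4));
  return file;
}
```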

The example above includes the following information, which is added to the AppResult's metadata in BaseSpace (this is very useful for users and for future apps that may want more information about the analysis):

  • Name: Name of the AppResult; must match the name of the folder created at /data/output/appresults/<project-id>/<appresult-name>
  • Description: A description of the AppResult, displayed to the user in the UI
  • HrefAppSession: Adds a reference to the specified AppSession to the AppResult. AppSessions let users track all of the information in one analysis, so it is important to include this field. Use the same Href that appears in the metadata of the AppSession.json file in the /data/input folder when the app is launched.
  • Properties: An array of Properties that will be added to the AppResult's metadata
    • Input.Samples: The input Samples that the user selected in the app's input form. The Type is sample[] (or sample for a single Sample), which marks this Property as referencing Samples in BaseSpace. The Items array contains the Hrefs of the Samples with which the AppResult will be tagged.
    • RW_BaseSpace_App.Year, RW_BaseSpace_App.Month, and RW_BaseSpace_App.Day: These are simple string Properties that will be added to the AppResult's Properties.

For more information about Properties, please refer to Properties in the Rest API Reference. Properties can be free-form, but some conventions designate Properties as specific BaseSpace resources: if the Type is sample, sample[], appresult, or appresult[], the Property is read as a BaseSpace Resource. These conventions are described further in the Properties reference. For more information about AppResults, please refer to AppResults in the Rest API Reference.

The App Report Builder tool can then be used to create output reports for the analysis; these are generated automatically once the Analysis is marked as Complete. For more information about the App Report Builder tool, please refer to the App Report Builder Documentation.


Access the Genomes Folder For Genome Reference Data

Your Native Apps Virtual Machine includes a genomes folder. To keep the Virtual Machine small, this local folder contains only the PhiX reference genome, so only limited genomes are available for local testing. We are working on adding all of iGenomes and many other references to the genomes folder.

The genomes folder on the Amazon instances that run apps in BaseSpace (rather than locally) contains a much larger collection of reference genomes.

The genomes folder can be copied into your Docker image by following the Adding Files to your Image guide.

How to determine which genomes are available

In order to determine what genomes are available for an application, we have created a Docker image that you can run in your account. When launched as a Native app, this Docker image will output a text file with the full structure of the genomes drive.

In order to view the full genomes drive, you must run the application in the BaseSpace Cloud Infrastructure. The VM that we provide has a limited genomes drive with only select genomes available.

The docker image can be pulled down with the following command:

sudo docker pull rwentzel/list-genomes

Now, you can go to the developer portal and configure one of your applications to point to rwentzel/list-genomes as the docker image in the callbacks.js script. Your callbacks.js script should look like the following example:

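function launchSpec(dataProvider)
{
    // Write the genome listing into an AppResult folder named "results" in the selected project
    var project = dataProvider.GetProperty("Input.project-id");
    var appResultPathArg = "/data/output/appresults/" + project.Id + "/results";

    var retval = {
        commandLine: ["/opt/illumina/list-genomes/app.sh", "/genomes", appResultPathArg],
        containerImageId: "rwentzel/list-genomes"
    };
    // launchSpec must return the launch specification
    return retval;
}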

After the callbacks.js script is configured, simply go to the form for your application in the Form Builder and launch the application.

The output of the application will be a genomes list file.

This Docker image can be adapted to output more detailed information; feel free to inspect its source and modify it for your needs.
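If you prefer to write your own listing tool instead, the sketch below shows one way to do it in Node.js. It is not the source of rwentzel/list-genomes. It walks a genomes root (normally /genomes) down to a chosen depth and writes one relative directory path per line into an AppResult folder, such as the /data/output/appresults/<project-id>/results path that the callbacks.js example passes as the second argument. The function names and the genomes.txt file name are hypothetical, and symbolic links are not followed.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively list directories under root, down to maxDepth levels, as sorted relative paths.
function listTree(root, maxDepth) {
  const lines = [];
  function walk(dir, depth) {
    if (depth > maxDepth) return;
    const names = fs
      .readdirSync(dir, { withFileTypes: true })
      .filter((entry) => entry.isDirectory()) // symlinks are not followed
      .map((entry) => entry.name)
      .sort();
    for (const name of names) {
      const full = path.join(dir, name);
      lines.push(path.relative(root, full));
      walk(full, depth + 1);
    }
  }
  walk(root, 1);
  return lines;
}

// Write the listing to <appResultDir>/genomes.txt and return the file path.
function writeGenomeList(genomesRoot, appResultDir, maxDepth) {
  fs.mkdirSync(appResultDir, { recursive: true });
  const file = path.join(appResultDir, "genomes.txt");
  fs.writeFileSync(file, listTree(genomesRoot, maxDepth).join("\n") + "\n");
  return file;
}
```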


Scratch Folder for Storing Files During Analysis

In some cases, an app needs scratch space while it is computing. In BaseSpace this is provided by the /data/scratch folder. The other directories have limits on how much data they can hold, and exceeding them can cause the app to fail, so always use the scratch folder for intermediate files.
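One possible pattern is sketched below: do the intermediate work in a private subfolder of the scratch root (normally /data/scratch), copy only the final files into the AppResult folder, and delete the intermediate files afterward. The helper name is hypothetical, and cleaning up is a design choice rather than a stated requirement; it mainly keeps space free during long or multi-step runs. fs.rmSync requires Node.js 14.14 or later.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: run work(dir) in a fresh subfolder of scratchRoot
// (normally /data/scratch) and remove that subfolder afterward, even on error.
function withScratchDir(scratchRoot, work) {
  fs.mkdirSync(scratchRoot, { recursive: true });
  const dir = fs.mkdtempSync(path.join(scratchRoot, "job-"));
  try {
    return work(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Inside the work callback, the app writes intermediate files to dir and copies its final outputs, for example with fs.copyFileSync, into a folder under /data/output/appresults/<project-id>/.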