Tool Integration Short Tutorial
Best practices for writing 'Galaxifiable' programs/scripts
A few guidelines can be followed when developing a program/script with Galaxy integration in mind:
- Input and output files should be passed through arguments (preferably long options, e.g. "--input" and "--output").
- Errors and warnings should return a positive exit code (with distinct values if possible) and a meaningful error message.
- The tool purpose and parameters should be documented in detail and available through the help option or the manual.
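As an illustration of these guidelines, here is a minimal sketch of a hypothetical Python tool. The name count_lines.py, its behaviour and the exit code values 3 and 4 are assumptions made for the example. Files are passed through long options, each failure gets its own exit code and an error message on STDERR, and argparse generates the --help documentation.

```python
#!/usr/bin/env python3
"""count_lines.py: count the non-empty lines of a text file (hypothetical tool)."""
import argparse
import sys

# One distinct exit code per failure type (arbitrary values; argparse already
# exits with 2 on command-line usage errors, so 2 is left to it).
EXIT_OK = 0
EXIT_CANNOT_READ = 3
EXIT_CANNOT_WRITE = 4


def parse_args(argv=None):
    """Long options only, so each Galaxy parameter maps to an explicit flag."""
    parser = argparse.ArgumentParser(
        description="Count the non-empty lines of a text file.")
    parser.add_argument("--input", required=True,
                        help="path to the input text file")
    parser.add_argument("--output", required=True,
                        help="path to the tabular output file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    try:
        with open(args.input) as handle:
            count = sum(1 for line in handle if line.strip())
    except OSError as err:
        # Meaningful message on STDERR plus a distinct positive exit code.
        print(f"ERROR: cannot read input file: {err}", file=sys.stderr)
        return EXIT_CANNOT_READ
    try:
        with open(args.output, "w") as handle:
            handle.write(f"non_empty_lines\t{count}\n")
    except OSError as err:
        print(f"ERROR: cannot write output file: {err}", file=sys.stderr)
        return EXIT_CANNOT_WRITE
    return EXIT_OK

# In the real script file, finish with:
#     if __name__ == "__main__":
#         sys.exit(main())
```

A matching Galaxy command line would then simply be: <command> count_lines.py --input $input --output $output </command>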
Prerequisites to integration
Before integrating a tool in Galaxy, there are a few prerequisites:
- The tool has to be:
  - installed on your computing hardware
  - tested via classic console runs
  - available to the Galaxy user
- Its dependencies also have to be installed on your computing hardware
- You must have access to a Galaxy development instance (if possible with restart rights)
Wrapper writing. Do I need a wrapper ?
Several cases call for writing an additional wrapper:
- Input files are not passed through arguments but loaded from predetermined fixed names
  - The wrapper can create symbolic links between the files and the predetermined fixed names in the cwd
- Output files with conditional names are not passed through arguments
  - The wrapper can create symbolic links between the files and the predetermined fixed names in the cwd
- Using R functions
  - The wrapper can be used for loading libraries, preprocessing data and parsing arguments
- Multiple command lines
  - A simple bash wrapper can be used to pass arguments and call multiple commands
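The same ideas can be sketched in Python instead of bash. In this hypothetical example, the wrapped tool is assumed to read input.txt and write result.txt in the current working directory. The wrapper links the Galaxy input dataset to the fixed input name, runs several commands in order, stops at the first failure, and copies the fixed-name result to the output path given by Galaxy. Copying the result, rather than linking it, is a choice made for this sketch.

```python
"""Hypothetical Python wrapper for a tool that uses fixed file names.

Assumed (for illustration): the wrapped tool reads "input.txt" and writes
"result.txt" in the current working directory.
"""
import os
import shutil
import subprocess
import sys

FIXED_INPUT = "input.txt"    # name the wrapped tool expects (assumption)
FIXED_OUTPUT = "result.txt"  # name the wrapped tool produces (assumption)


def link_input(galaxy_input):
    """Expose the Galaxy dataset under the fixed input name in the cwd."""
    if os.path.lexists(FIXED_INPUT):
        os.remove(FIXED_INPUT)
    os.symlink(os.path.abspath(galaxy_input), FIXED_INPUT)


def run_commands(commands):
    """Run the commands in order; return the first non-zero exit code, or 0."""
    for cmd in commands:
        code = subprocess.run(cmd).returncode
        if code != 0:
            print(f"ERROR: {' '.join(cmd)} exited with code {code}",
                  file=sys.stderr)
            return code
    return 0


def wrap(galaxy_input, galaxy_output, commands):
    """Link the input, run every command, then hand the result to Galaxy."""
    link_input(galaxy_input)
    code = run_commands(commands)
    if code == 0:
        # Copy the fixed-name result to the path Galaxy gave for $output.
        shutil.copyfile(FIXED_OUTPUT, galaxy_output)
    return code
```

Because the wrapper propagates the failing command's exit code, Galaxy can still detect the failure through the exit code of the wrapper itself.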
The main step to integrate a tool in Galaxy consists of writing its XML description file. This file has to be put in the ~/tools directory of your Galaxy instance.
An empty template can be found here: Template.xml
and a very complete example can be found here: Tutorial.xml
A few definitions
As a standard XML file, the tool description file consists of nested elements opening with a start-tag (e.g. "<toto>") and closing with an end-tag (e.g. "</toto>"). Attributes in the start-tag allow more or less complex configuration (e.g. "<toto attribute1='string' attribute2='42'>"; in XML, attribute values are always quoted). The content is the space between the start-tag and the end-tag. An empty-element tag is self-sufficient and has neither content nor end-tag (e.g. "<toto name='Toto' />").
The <tool> element is the root element: every other element is nested inside the <tool> element content.
The tool XML description file begins with:
<tool id="tool_id" name="Tool Name" version="1.0.0">
and ends with:
</tool>
The complete description of all required and optional attributes can be found on the Galaxy wiki.
The <requirements> element is optional. It defines a list of third-party tools, binaries, modules, or ToolShed packages required for the tool to work. Each requirement is checked when Galaxy starts, and the tool will not be loaded if one of them is missing.
An individual requirement is defined with the <requirement> sub-element (note the missing "s").
<requirements>
    <requirement type="binary">perl</requirement>
    <requirement type="python-module">argparse</requirement>
</requirements>
More info on the Galaxy wiki.
The <command> element contains the single command line that will be run by Galaxy. Multiple command lines cannot be run directly from the XML description file and require an additional wrapper (#Wrapper writing. Do I need a wrapper ?).
Examples of interpreters: bash, python, perl, Rscript, ...
The interpreter attribute should be set to the executable/wrapper language, following a few rules:
- Executable in $PATH (compiled or interpreted): the interpreter doesn't have to be set.
    <command> tool_binary </command>
    <command> script.py </command>
- Executable NOT in $PATH:
  - Compiled executable: the interpreter doesn't have to be set; use the full path.
    <command> /path/to/tool_binary </command>
  - Interpreted executable/wrapper in the same directory as the XML description file:
    <command interpreter="python"> script.py </command>
  - Interpreted executable/wrapper in a different directory from the XML description file: use the full path.
    <command interpreter="python"> /path/to/script.py </command>
The executable input parameters are defined in the <inputs> element. Their values can be used in the command line by preceding the parameter name with a dollar sign ($).
<command> tool_binary --input $input --output $output </command>
To access a parameter defined inside conditional blocks, refer to it with the names of all the enclosing conditional blocks, separated by dots (e.g. $conditional_bloc.simple_param).
Command special syntax
A comment begins with a double sharp sign (##) and runs until the end of the line:
<command>
## This is a comment
tool_binary
</command>
Python-like flow control statements (Cheetah template syntax) start with a single sharp sign (#):
- the if-then-else conditional statement:
    #if str($input) == "test.txt":
        dummy command
    #else:
        dummier command
    #end if
- a for loop for iterable parameters:
    #for $r in $repeat:
        --param $r
    #end for
Every line break in the <command> element content is converted into a space at runtime by Galaxy. As a consequence, the resulting command is a single line.
The <inputs> element defines the tool input files and parameters. Most inputs are defined using empty-element tags and various attributes.
Input files are defined as data type parameters:
<inputs> <param name="input" type="data" format="tabular" optional="false" label="Input file" help="Some help" /> </inputs>
The format attribute can be set to a predefined Galaxy datatype (txt, fasta, pdf, bam, ...). With format="data", any file format is accepted. This configuration can be useful if the input format is unknown or not defined in the Galaxy datatypes, or if you encounter problems with the predefined datatype methods.
Numbers (integers and floats)
Integer and float parameters can be defined as follows:
<inputs>
    <param name="int" type="integer" value="42" min="0" max="100" label="Integer value"/>
    <param name="flt" type="float" value="1.6180" min="0" max="2.5" label="Float value"/>
</inputs>
The value (default value), min and max attributes are optional.
Text boxes can be defined as follows:
<inputs>
    <param name="txt" type="text" value="GATTACA" optional="true" label="Text box" help="Some help"/>
</inputs>
Checkbox (on/off switch)
Checkboxes are an easy way to pass switches to the tool executable:
<inputs> <param name="switch" type="boolean" checked="false" truevalue="--switch" falsevalue="" label="Check-box"/> </inputs>
This type of parameter is easily used in the <command> element:
<command> tool_binary $switch "Some random string" </command>
which becomes at runtime, when the box is unchecked:
tool_binary "Some random string"
and when it is checked:
tool_binary --switch "Some random string"
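On the tool side, a flag that is either present or absent maps naturally onto argparse's store_true action. A minimal Python sketch for the hypothetical tool_binary used above (the help texts are assumptions):

```python
import argparse


def build_switch_parser():
    """Parser for: tool_binary [--switch] TEXT (hypothetical tool)."""
    parser = argparse.ArgumentParser(prog="tool_binary")
    # Flag present (truevalue="--switch") -> True; absent (falsevalue="") -> False.
    parser.add_argument("--switch", action="store_true",
                        help="enable the optional behaviour")
    parser.add_argument("text", help="a free-text argument")
    return parser


unchecked = build_switch_parser().parse_args(["Some random string"])
checked = build_switch_parser().parse_args(["--switch", "Some random string"])
print(unchecked.switch, checked.switch)  # False True
```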
Single or multiple choice(s) list
To let the user choose a value from a list:
<inputs>
    <param name="selection" type="select" label="List Selection">
        <option value="value1" selected="true">Value 1</option>
        <option value="value2" >Value 2</option>
        <option value="value3" >Value 3</option>
    </param>
</inputs>
To allow selecting multiple choices with checkboxes:
<inputs>
    <param name="selection" type="select" display="checkboxes" multiple="true" label="Multiple Choices">
        <option value="value1" selected="true">Value 1</option>
        <option value="value2" selected="true">Value 2</option>
        <option value="value3" >Value 3</option>
    </param>
</inputs>
In this case, the $selection variable will contain all the selected values separated by commas (e.g. value1,value2).
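The tool then has to split this comma-separated string back into individual values. A minimal Python sketch, assuming a hypothetical command line <command> tool_binary --selection $selection </command>:

```python
import argparse


def comma_list(value):
    """Split a multiple-select value such as "value1,value2" into a list."""
    # Empty items are dropped, so an empty string gives an empty list.
    return [item for item in value.split(",") if item]


def build_selection_parser():
    """Parser for: tool_binary --selection $selection (hypothetical tool)."""
    parser = argparse.ArgumentParser(prog="tool_binary")
    parser.add_argument("--selection", type=comma_list, default=[],
                        help="comma-separated values chosen in Galaxy")
    return parser


args = build_selection_parser().parse_args(["--selection", "value1,value2"])
print(args.selection)  # ['value1', 'value2']
```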
To select from multiple reference files (databases) located on the instance disks, .loc files can be used (#Using .loc files).
The main <param> attributes are:
- name: This is the parameter variable name. It is used internally to designate the parameter in all the XML elements.
- label: This is the parameter name, displayed above the parameter in the Galaxy web interface. It should be short and meaningful.
- help: This is the parameter help, displayed below the parameter in the Galaxy web interface. It should be clear and as detailed as needed for the user to know how to use the parameter.
- optional: When not present or set to false, the parameter is required and a value has to be set. If set to true, the parameter is optional.
Frequently, the usage of some tool parameters depends on other parameters. Conditional blocks can be used to handle these cases:
<inputs>
    <conditional name="conditional_bloc" >
        <param name="condition" type="select" label="Condition" help="" >
            <option value="choice1" selected="true">Choice 1</option>
            <option value="choice2">Choice 2</option>
        </param>
        <when value="choice1">
            <param name="simple_param" type="text" value="dummy" />
        </when>
        <when value="choice2">
            <param name="complex_param1" type="text" value="dummy" />
            <param name="complex_param2" type="text" value="dummy" />
        </when>
    </conditional>
</inputs>
Conditional blocks can also be used to hide advanced parameters:
<inputs>
    <conditional name="advanced_parameters" >
        <param name="adv_param" type="select" label="Advanced Parameters" help="" >
            <option value="hide" selected="true">Hide</option>
            <option value="show">Show</option>
        </param>
        <when value="hide" />
        <when value="show">
            <param name="adv_param1" type="text" value="dummy" />
            <param name="adv_param2" type="text" value="dummy" />
        </when>
    </conditional>
</inputs>
Reusing repeated configuration elements (macros)
You can reuse the same XML fragments within a file, or between tools in the same repository, by using the <macros> element.
To reuse XML elements between wrappers in the same directory, you must create a file_macros.xml file (example below):
<macros>
    <macro name="own_junctionsConditional">
        <conditional name="own_junctions">
            <param name="use_junctions" type="select" label="Use Own Junctions">
                <option value="No">No</option>
                <option value="Yes">Yes</option>
            </param>
        </conditional>
    </macro>
</macros>
The <outputs> element declares all the relevant tool output files.
Outputs passed through arguments or written to STDOUT
With a command line such as:
<command> tool_binary --input $input --output $output </command>
or:
<command> tool_binary --input $input > $output </command>
output files can be defined as follows:
<outputs> <data name="output" format="tabular" label="Tabular output file" /> </outputs>
Outputs written to disk with a fixed name
Some tools do not allow output files to be passed through arguments; instead, they write them to the current working directory with predefined fixed names. In that case, output files can be defined as follows:
<outputs> <data name="output" format="tabular" from_work_dir="tool_binary.output" label="Tabular output file" /> </outputs>
In previous Galaxy releases, when a tool wrote information to STDERR without any fatal error, the tool run was considered failed. To enable a more user-friendly error management, the <stdio> element has to be defined:
<stdio> <exit_code range="1:" level="fatal" /> </stdio>
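For illustration only, the small Python function below mimics how we understand the range attribute of <exit_code>: "n" matches exactly n, "n:m" matches n to m inclusive, "n:" matches n and above, and ":m" matches m and below. This is our reading of the syntax, not Galaxy's own code. With range="1:", exit code 0 does not match, so a run that only writes warnings to STDERR and exits with 0 is not flagged as fatal by this rule.

```python
def exit_code_in_range(exit_code, spec):
    """Return True if exit_code falls within a <stdio> range spec (our reading)."""
    if ":" not in spec:
        return exit_code == int(spec)        # "n": exactly n
    low, high = spec.split(":", 1)           # "n:m", "n:" or ":m"
    if low and exit_code < int(low):
        return False
    if high and exit_code > int(high):
        return False
    return True


print(exit_code_in_range(0, "1:"))  # False
print(exit_code_in_range(1, "1:"))  # True
```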
WARNING: There is currently a bug in Galaxy when both <stdio> and the from_work_dir= option are used. It is explained here. So, if from_work_dir= is avoided, the exit code is correctly returned.
Using .loc files
Using pre-defined datatypes
The list of supported data formats is contained in the ~/datatypes_conf.xml.sample file. The "format" attribute of an input or output file has to match the "extension" attribute of an existing datatype.
Each Galaxy datatype is defined by a Python class, sub-classed from the data:Data class, with its own methods and attributes. These methods usually check that the given file is correctly formatted. They also allow the system to convert the file to other formats, or tell it how to display the file. Some formats are loosely defined (e.g. data:Text, binary:Binary, tabular:Tabular) and some are much more strictly defined, with multiple checking points (e.g. binary:Bam, sequence:FastqSanger, interval:Gff).
Most common file formats are already defined, or can be sub-classed without modification from an existing class. In rare cases, a specific format will not be defined yet, or it will be defined too strictly for the intended usage. In these cases, refer to the following section (#Adding proprietary datatypes).
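Since every "format" value must match a datatype "extension", this can be checked before loading a tool. The Python sketch below assumes that each datatype is declared by a <datatype> element carrying an extension attribute, as in the datatypes_conf.xml.sample files we have seen; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def datatype_extensions(conf_path):
    """Return the set of 'extension' values declared in a datatypes config."""
    root = ET.parse(conf_path).getroot()
    # iter() walks the whole tree, so the exact nesting does not matter.
    return {dt.get("extension") for dt in root.iter("datatype")
            if dt.get("extension")}


def unknown_formats(formats, conf_path):
    """Return the formats (e.g. from a tool XML) matching no datatype extension."""
    return sorted(set(formats) - datatype_extensions(conf_path))
```

For instance, unknown_formats(["tabular", "fasta"], "datatypes_conf.xml.sample") should return an empty list when both extensions are declared.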
Known problems with specific datatypes
- Compressed files (zip, tar, gz): decompression before sniffing
- HTML files: sometimes (html5?) can’t display or download
- BAM files: can’t upload non-sorted bam
Adding proprietary datatypes
Making Galaxy aware of your tool
Add the following line to the appropriate section of the tool configuration file (tool_conf.xml):
<tool file="path/to/the/tool_wrapper.xml" />
with the path to the wrapper relative to the "~galaxy/tools" directory.
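Editing the file by hand works just as well; the Python sketch below only shows how the same line could be added by a script. It assumes a tool configuration file whose root element contains <section id="..."> elements holding <tool file="..." /> entries; the section id and wrapper path in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def register_tool(tool_conf_path, section_id, wrapper_path):
    """Add <tool file="wrapper_path" /> to <section id="section_id">.

    Returns False if the tool is already registered in that section.
    Caveat: ElementTree drops XML comments when it rewrites the file.
    """
    tree = ET.parse(tool_conf_path)
    section = tree.getroot().find(f"section[@id='{section_id}']")
    if section is None:
        raise ValueError(f"no section with id {section_id!r}")
    if any(t.get("file") == wrapper_path for t in section.findall("tool")):
        return False
    ET.SubElement(section, "tool", {"file": wrapper_path})
    tree.write(tool_conf_path)
    return True
```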
Did it work ?
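After writing the XML wrapper and restarting Galaxy, open your instance with a browser and check that the tool shows up in its tool panel section.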
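Before restarting, a quick local check can save a round trip: an XML syntax error, such as an unescaped & or < inside <command>, would typically keep the tool from loading. The Python sketch below checks well-formedness plus a few points from this tutorial (root <tool> element with id, name and version, and a <command> element). These checks are our own selection, not Galaxy's validation.

```python
import xml.etree.ElementTree as ET


def check_tool_xml(path):
    """Return a list of problems found in a tool XML description file."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "tool":
        problems.append(f"root element is <{root.tag}>, expected <tool>")
    for attr in ("id", "name", "version"):
        if not root.get(attr):
            problems.append(f"<tool> has no {attr!r} attribute")
    if root.find("command") is None:
        problems.append("no <command> element")
    return problems
```

An empty list means none of these particular problems was found.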
Sharing your Galaxy tool (ToolShed)
Creating a ToolShed repository
First, create a ToolShed account, which is separate from your account on the main public Galaxy server or on a local instance. Then log in and click on the "Create new repository" option in the left menu.
Fill in the following fields of the "Create Repository" form:
- Name of the repository
- Repository type (unrestricted or tool dependency definition)
- Short description of the objects the repository contains
- Detailed description, explaining in more detail the function of the objects within the repository (tools, workflows, etc.)
- Category in which the repository will appear in the ToolShed's search list (data source, text manipulation, sequence analysis, etc.)
Associating a repository with a certain type changes the way the ToolShed generates metadata for the repository revisions.
There are two types of repositories:
- Unrestricted: the repository can contain any set of Galaxy utilities or files.
- Tool dependency definition: the repository can only contain a single file named tool_dependencies.xml. This type of repository is generally used to download and compile a specific version of a tool package. Following best practices, these repositories are named package_<name>_<version> (e.g. package_amos_3_1_0, package_ape_3_0, package_atlas_3_10).
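The naming convention is mechanical enough to script. In this Python sketch, lowercasing and replacing every run of non-alphanumeric characters (not only dots) with an underscore are our assumptions; the three example names above are reproduced exactly.

```python
import re


def _slug(text):
    """Lowercase text, turning each run of non-alphanumeric characters into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def package_repository_name(name, version):
    """Build a repository name following package_<name>_<version>."""
    return f"package_{_slug(name)}_{_slug(version)}"


print(package_repository_name("amos", "3.1.0"))  # package_amos_3_1_0
```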
Adding files to a repository
Click on the "Repository actions" button of the ToolShed page, then on "Upload files to repository". You can upload individual files or tar archives (gzip and bzip2 compression are supported).
Types of files
- Basic Galaxy tool wrapper (tool config file and executable).
- Functional tests: the input and output datasets used by the tests must be put in a directory named test-data.
- Index location file: your repository should include an xxx.loc.sample file.
- Images displayed in the tool's help section: all image files must be placed under <repository root>/static/images within the repository hierarchy ("best practice" approach).
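Before uploading, a local copy of the repository can be checked against these file types. The Python sketch below is a best-practice checker of our own, not a ToolShed feature: it warns when no .xml tool config is found, when there is no test-data directory, when a .loc file has no matching .loc.sample, and when an image sits outside static/images (images under test-data are ignored, since tests may use them). The list of image extensions is an assumption.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def check_repository_layout(repo_root):
    """Return warnings for a local copy of a ToolShed repository (sketch)."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), repo_root)
            files.append(rel.replace(os.sep, "/"))  # POSIX-style relative paths

    warnings = []
    if not any(f.endswith(".xml") for f in files):
        warnings.append("no tool config (.xml) file found")
    if not os.path.isdir(os.path.join(repo_root, "test-data")):
        warnings.append("no test-data directory for the functional test datasets")
    for f in files:
        if f.endswith(".loc") and f + ".sample" not in files:
            warnings.append(f"{f} has no matching {f}.sample file")
        # Images used by tests live in test-data/, so they are not flagged.
        if (f.lower().endswith(IMAGE_EXTENSIONS)
                and not f.startswith(("static/images/", "test-data/"))):
            warnings.append(f"image {f} is not under static/images/")
    return sorted(warnings)
```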