Using bilby_pipe

Basics

The primary user interface for this code is the command line tool bilby_pipe, which is available after following the installation instructions. To see the help for this tool, run

$ bilby_pipe --help

(the complete output is given in bilby_pipe help)

To run bilby_pipe, you first need to define an ini file; examples for different types of ini files can be found below.
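For orientation, here is a minimal sketch of an ini file; the accounting tag and prior file are placeholders, and every option is described in the reference at the bottom of this page:

label = my-run
outdir = outdir
accounting = accounting.group.tag  # placeholder; see https://accounting.ligo.org/user
detectors = [H1, L1]
trigger-time = GW150914
channel-dict = {H1: GWOSC, L1: GWOSC}
duration = 4
prior-file = my-prior.prior  # placeholder prior file
sampler = dynesty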

Once you have an ini file (for the purpose of clarity, let's say my-run.ini), you initialise your job with

$ bilby_pipe my-run.ini

This will produce a directory structure as follows:

my-run.ini
outdir/
  -> data/
  -> log_data_analysis/
  -> log_data_generation/
  -> log_results_page/
  -> result/
  -> results_page/
  -> submit/

The first six of these folders will initially be empty, but they will be populated as the job progresses. The data directory will contain all the data to be analysed, while result will contain the *result.json result files generated by bilby, along with any plots. Note that the locations of the log and results_page folders can be modified.

The final folder, submit, contains all of the DAG submission scripts. To submit your job, run condor_submit_dag, giving as its first argument the file prefixed with dag under outdir/submit (instructions for doing this are printed to the terminal after initialisation).
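For example, if your run label is my_label, the command will look something like this (the file name here is illustrative; bilby_pipe prints the exact name for you):

$ condor_submit_dag outdir/submit/dag_my_label.submit

Alternatively, you can initialise and submit your jobs with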

$ bilby_pipe my-run.ini --submit

Running all or part of the job directly

In some cases, you may need to run all or part of the job directly (i.e., not through a scheduler). This can be done using the file prefixed with bash in the submit/ directory. This file is a simple bash script that runs all commands in sequence. One simple way to run part of the job is to open the bash file, copy the commands you require to another script, and run that. For convenience, we also add if statements to the bash script so that you can run parts of the analysis by providing a pattern as a command-line argument. For example, to run the data generation step, call the bash script with generation in the arguments, e.g.:

$ bash outdir/submit/bash_my_label.sh generation

If you want to run the analysis step with n-parallel=1, use

$ bash outdir/submit/bash_my_label.sh analysis

Note, if n-parallel > 1, this will run all the parallel jobs. To run just one, use the following (replacing par0 with the parallel job you want to run):

$ bash outdir/submit/bash_my_label.sh par0

Finally, to merge the analyses, run

$ bash outdir/submit/bash_my_label.sh merge

Internally, the bash script simply matches the given argument against the job name. This works in simple cases, but in complicated cases it will likely fail or require inspection of the bash file itself. Moreover, if you use any of the special keywords (generation, analysis, par, or merge) in your label, the ability to filter to single jobs will be lost.

Using the slurm batch scheduler

By default, bilby_pipe runs under an HTCondor environment (the default for the LIGO Data Grid). It can also be used on a slurm-based cluster. Here we give a brief description of the steps required to run under slurm; for a full list of available options, see the output of bilby_pipe --help.

To use slurm, add scheduler=slurm to your ini file. Typically, slurm needs you to configure the correct environment; you can do this by setting scheduler-env=my-environment. This will add the following line to your submit scripts:

$ source activate my-environment

(Note: for conda users, this is equivalent to conda activate my-environment).

If the cluster you are using does not provide network access on the compute nodes, the data generation step may fail if an attempt is made to remotely access the data. (If you are creating simulated data, or have local copies of the data, this is, of course, not a problem.) To resolve this issue, you can set local-generation=True in your ini file. The generation steps will then be run on the head node when you invoke bilby_pipe, after which you simply submit the job.
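For example, the relevant ini lines for such a cluster might look like this minimal sketch:

scheduler = slurm
local-generation = True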

Slurm modules can be loaded using scheduler-modules, a space-separated list of modules to load. Additional arguments to sbatch can be given using the scheduler-args option.

Putting all this together, adding these lines to your ini file

scheduler = slurm
scheduler-args = arg1=val1 arg2=val2
scheduler-modules = git python
scheduler-env = my-environment
scheduler-analysis-time = 1-00:00:00   # Limit job to 1 day

will produce slurm submit files which contain

#SBATCH --arg1=val1
#SBATCH --arg2=val2

module load git python

and individual bash scripts containing

module load git python

source activate my-environment

Summary webpage

bilby_pipe allows the user to visualise the posterior samples through a 'summary' webpage. This is implemented using PESummary (see the PESummary documentation).

To generate a summary webpage, the create-summary option must be given in the configuration file. Additionally, you can specify a web directory where the output from PESummary should be stored; by default, this is outdir/results_page. If you are working on an LDG cluster, the web directory should be inside your public_html. Below is an example of the additional lines to put in your configuration file to generate 'summary' webpages:

create-summary = True
email = albert.einstein@ligo.org
webdir = /home/albert.einstein/public_html/project

If you have already generated a webpage in the past using PESummary, you can pass the existing-dir option to add further result files to a single webpage. This includes all histograms for each result file as well as comparison plots. Below is an example of the additional lines in the configuration file that will add to an existing webpage:

create-summary = True
email = albert.einstein@ligo.org
existing-dir = /home/albert.einstein/public_html/project

bilby_pipe help

For reference, here is the full output of

$ bilby_pipe --help

usage: 
bilby_pipe is a command line tool for taking user input (as command line
arguments or an ini file) and creating DAG files for submitting bilby parameter
estimation jobs. To get started, write an ini file `config.ini` and run

$ bilby_pipe config.ini

Instructions for how to submit the job are printed in a log message. You can
also specify extra arguments from the command line, e.g.

$ bilby_pipe config.ini --submit

will build and submit the job.

Positional Arguments

ini

Configuration ini file

Named Arguments

-v, --verbose

Verbose output

Default: False

--version

show program’s version number and exit

Calibration arguments

Which calibration model and settings to use.

--calibration-model

Possible choices: CubicSpline, Precomputed, None

Choice of calibration model, if None, no calibration is used

--spline-calibration-envelope-dict

Dictionary pointing to the spline calibration envelope files

--spline-calibration-nodes

Number of calibration nodes

Default: 10

--spline-calibration-amplitude-uncertainty-dict

Dictionary of the amplitude uncertainties for the constant uncertainty model

--spline-calibration-phase-uncertainty-dict

Dictionary of the phase uncertainties for the constant uncertainty model

--calibration-prior-boundary

Boundary methods for the calibration prior boundary

Default: reflective

Data generation arguments

How to generate the data, e.g., from a list of gps times or simulated Gaussian noise.

--ignore-gwpy-data-quality-check

Ignores the check to see if data queried from GWpy (i.e., not Gaussian noise) is obtained from times when the IFOs are in science mode.

Default: True

--gps-tuple

Tuple of the (start, step, number) of GPS start times. For example, (10, 1, 3) produces the gps start times [10, 11, 12]. If given, gps-file is ignored.

--gps-file

File containing segment GPS start times. This can be a multi-column file if (a) it is comma-separated and (b) the zeroth column contains the gps-times to use

--timeslide-file

File containing detector timeslides. Requires a GPS time file to also be provided. One column for each detector. Order of detectors specified by --detectors argument. Number of timeslides must correspond to the number of GPS times provided.

--timeslide-dict

Dictionary containing detector timeslides: applies a fixed offset per detector. E.g. to apply +1s in H1, {H1: 1}

--trigger-time

Either a GPS trigger time, or the event name (e.g. GW150914). For event names, the gwosc package is used to identify the trigger time

--n-simulation

Number of simulated segments to use with gaussian-noise. Note, this must match the number of injections specified

Default: 0

--data-dict

Dictionary of paths to gwf, or hdf5 data files

--data-format

If given, the data format to pass to gwpy.timeseries.TimeSeries.read(), see gwpy.github.io/docs/stable/timeseries/io.html

--allow-tape

If true (default), allow reading data from tape. See gwpy.timeseries.TimeSeries.get() for more information.

Default: True

--channel-dict

Channel dictionary: keys relate to the detector with values the channel name, e.g. 'GDS-CALIB_STRAIN'. For GWOSC open data, set the channel-dict keys to 'GWOSC'. Note, the dictionary should follow basic Python dict syntax.
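For example, in an ini file this might look like the following sketch (channel names are illustrative):

channel-dict = {H1: GDS-CALIB_STRAIN, L1: GDS-CALIB_STRAIN}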

--frame-type-dict

Frame type to use when finding data. If not given, defaults will be used based on the gps time using bilby_pipe.utils.default_frame_types, e.g., {H1: H1_HOFT_C00_AR}.

--data-find-url

URL to use for datafind, default is https://datafind.ligo.org to query CVMFS

Default: "https://datafind.ligo.org"

--data-find-urltype

URL type to use for datafind, default is osdf

Default: “osdf”

--gaussian-noise

If true, use simulated Gaussian noise

Default: False

--zero-noise

Use a zero noise realisation

Default: False

Detector arguments

How to set up the interferometers and power spectral density.

--coherence-test

Run the analysis for all detectors together and for each detector separately

Default: False

--detectors

The names of detectors to use. If given in the ini file, detectors are specified by detectors=[H1, L1]. If given at the command line, as --detectors H1 --detectors L1

--duration

The duration of data around the event to use

Default: 4

--generation-seed

Random seed used during data generation. If no generation seed is provided, a random seed between 1 and 1e6 is selected. If a seed is provided, it is used as the base seed and all generation jobs will have their seeds set as {generation_seed = base_seed + job_idx}.

--psd-dict

Dictionary of PSD files to use

--psd-fractional-overlap

Fractional overlap of segments used in estimating the PSD

Default: 0.5

--post-trigger-duration

Time (in s) after the trigger_time to the end of the segment

Default: 2.0

--sampling-frequency

Default: 4096

--psd-length

Sets the PSD duration (up to psd-maximum-duration). The PSD duration is calculated as psd-length x duration [s]. Default is 32.

Default: 32

--psd-maximum-duration

The maximum allowed PSD duration in seconds, default is 1024s.

Default: 1024

--psd-method

PSD method; see gwpy.timeseries.TimeSeries.psd for options

Default: “median”

--psd-start-time

Start time of data (relative to the segment start) used to generate the PSD. Defaults to psd-duration before the segment start time

--maximum-frequency

The maximum frequency, given either as a float for all detectors or as a dictionary (see minimum-frequency)

--minimum-frequency

The minimum frequency, given either as a float for all detectors or as a dictionary where keys relate to the detector with values of the minimum frequency, e.g. {H1: 10, L1: 20}. If the waveform generation should start below the minimum frequency for any of the detectors, add another entry to the dictionary, e.g., {H1: 40, L1: 60, waveform: 20}.

Default: “20”

--tukey-roll-off

Roll-off duration of the Tukey window in seconds, default is 0.4s

Default: 0.4

--resampling-method

Possible choices: lal, gwpy

Resampling method to use: lal matches the resampling used by lalinference/BayesWave

Default: “lal”

Injection arguments

Whether to include software injections and how to generate them.

--injection

Create data from an injection file

Default: False

--injection-dict

A single injection dictionary given in the ini file

--injection-file

Injection file to use. See bilby_pipe_create_injection_file --help for supported formats

--injection-numbers

Specific injection rows to use from the injection_file, e.g. injection_numbers=[0,3] selects the zeroth and third rows. Can be a list of slice-syntax values, e.g., [0, 2:4] will produce [0, 2, 3]. Repeated entries will be ignored.

--injection-waveform-approximant

The name of the waveform approximant to use to create injections. If none is specified, then the waveform-approximant will be used as the injection-waveform-approximant.

--injection-frequency-domain-source-model

Frequency domain source model to use for generating injections. If this is None, it will default to the frequency domain source model used for analysis.

--injection-waveform-arguments

A dictionary of arbitrary additional waveform-arguments to pass to the bilby waveform generator’s waveform arguments for the injection only

Job submission arguments

How the jobs should be formatted, e.g., which job scheduler to use.

--accounting

Accounting group to use (see, https://accounting.ligo.org/user)

--accounting-user

Accounting group user to use (see, https://accounting.ligo.org/user)

--label

Output label

Default: “label”

--local

Run the job locally, i.e., not through a batch submission

Default: False

--local-generation

Run the data generation job locally. This may be useful for running on a cluster where the compute nodes do not have internet access. For HTCondor, this is done using the local universe; for slurm, the jobs will be run at run-time

Default: False

--local-plot

Run the plot job locally

Default: False

--outdir

The output directory. If outdir already exists, an auto-incrementing naming scheme is used

Default: “outdir”

--overwrite-outdir

If given, overwrite the outdir (if it exists)

Default: False

--periodic-restart-time

Time after which the job will self-evict when scheduler=condor. After this, condor will restart the job. Default is 28800. This is used to decrease the chance of HTCondor hard evictions

Default: 28800

--request-disk

Disk allocation request in GB. Default is 5GB.

Default: 5

--request-memory

Memory allocation request (GB). Default is 8GB

Default: 8.0

--request-memory-generation

Memory allocation request (GB) for data generation step

--request-cpus

Use multi-processing. This option sets the number of cores to request. To use a pool of 8 threads on an 8-core CPU, set request-cpus=8. For the dynesty, ptemcee, cpnest, and bilby_mcmc samplers, no additional sampler-kwargs are required

Default: 1

--conda-env

Either a conda environment name or an absolute path to the conda env folder.

--scheduler

Format submission script for specified scheduler. Currently implemented: SLURM

Default: “condor”

--scheduler-args

Space-separated #SBATCH command line args to pass to slurm. The args needed will depend on the setup of your slurm scheduler. Please consult documentation for your local cluster (slurm only).

--scheduler-module

Space-separated list of modules to load at runtime (slurm only)

--scheduler-env

Python environment to activate (slurm only)

--scheduler-analysis-time

Default: 7-00:00:00

--submit

Attempt to submit the job after the build

Default: False

--condor-job-priority

Job priorities allow a user to sort their HTCondor jobs to determine which are tried to be run first. A job priority can be any integer: larger values denote better priority. By default HTCondor job priority=0.

Default: 0

--transfer-files

If true (default), use the HTCondor file transfer mechanism. For non-condor schedulers, this option is ignored. Note: the log files are automatically synced, but to sync the results during the run (e.g., to inspect progress), use the executable bilby_pipe_htcondor_sync

Default: True

--additional-transfer-paths

Additional files that should be transferred to the analysis jobs. The default is not transferring any additional files. Additional files can be specified as a list in the configuration file [a, b] or on the command line as --additional-transfer-paths a --additional-transfer-paths b

--environment-variables

Key value pairs for environment variables formatted as a json string, e.g., '{"OMP_NUM_THREADS": 1, "LAL_DATA_PATH": "/home/data"}'. These values take precedence over --getenv. The default values are {'HDF5_USE_FILE_LOCKING': 'FALSE', 'OMP_NUM_THREADS': 1, 'OMP_PROC_BIND': 'false'}.

--getenv

List of environment variables to copy from the current session.

--disable-hdf5-locking

If true, disable HDF5 locking. This can improve stability on some clusters, but may cause issues if multiple processes are reading/writing to the same file. This argument is deprecated and should be passed through --environment-variables

Default: False

--log-directory

If given, an alternative path for the log output

--osg

If true, format condor submission for running on OSG, default is False

Default: False

--desired-sites

A comma-separated list of desired sites, wrapped in quotes, e.g., desired-sites='site1,site2'. This can be used on the OSG to specify specific run nodes.

--analysis-executable

Path to an executable to replace bilby_pipe_analysis. Be aware that the complete ini file (in the outdir) will be passed to this executable.

--analysis-executable-parser

Python path to the analysis executable parser, used in conjunction with analysis-executable. Note, if this is not provided any new arguments to analysis-executable will raise a warning, but they will be passed to the executable directly.

Likelihood arguments

Options for setting up the likelihood.

--calibration-marginalization

Boolean. If true, use a likelihood that is numerically marginalized over the calibration uncertainty as described in arXiv:2009.10193.

Default: False

--distance-marginalization

Boolean. If true, use a distance-marginalized likelihood

Default: False

--distance-marginalization-lookup-table

Path to the distance-marginalization lookup table

--phase-marginalization

Boolean. If true, use a phase-marginalized likelihood

Default: False

--time-marginalization

Boolean. If true, use a time-marginalized likelihood

Default: False

--jitter-time

Boolean. If true, and using a time-marginalized likelihood, 'time jittering' will be performed

Default: True

--reference-frame

Reference frame for the sky parameterisation, either ‘sky’ (default) or, e.g., ‘H1L1’

Default: “sky”

--time-reference

Time parameter to sample in, either ‘geocent’ (default) or, e.g., ‘H1’

Default: “geocent”

--likelihood-type

The likelihood. Can be one of [GravitationalWaveTransient, ROQGravitationalWaveTransient, zero] or a python path to a bilby likelihood class available in the user's installation. The --roq-folder, or both --linear-matrix and --quadratic-matrix, are required if the ROQ likelihood is used. If both options are specified, ROQ data are taken from roq-folder, and linear-matrix and quadratic-matrix are ignored. If zero is given, a testing ZeroLikelihood is used which always returns zero.

Default: “GravitationalWaveTransient”
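For example, a sketch of the ini lines to use the ROQ likelihood with a pre-built basis (the folder path is a placeholder):

likelihood-type = ROQGravitationalWaveTransient
roq-folder = /path/to/roq-basis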

--calibration-lookup-table

Dictionary of calibration lookup files for use with calibration marginalization/the precomputed model. If these files don’t exist, they will be generated from the passed uncertainties.

--number-of-response-curves

The number of response curves to use for calibration marginalization

Default: 1000

--roq-folder

The data for ROQ

--roq-linear-matrix

Path to ROQ basis for linear inner products. This option is ignored if roq-folder is not None.

--roq-quadratic-matrix

Path to ROQ basis for quadratic inner products. This option is ignored if roq-folder is not None.

--roq-weights

If given, the ROQ weights to use (rather than building them). This must be given along with the roq-folder for checking

--roq-weight-format

File format of roq weights. This should be npz, hdf5, or json. If not specified, it is set to hdf5.

Default: “hdf5”

--roq-scale-factor

Rescaling factor for the ROQ, default is 1 (no rescaling)

Default: 1

--fiducial-parameters

The reference parameters for the relative binning likelihood. If this is not specified, the value will be drawn from the prior.

--update-fiducial-parameters

Whether to update the fiducial parameters using an optimization algorithm. This is automatically set to True if --fiducial-parameters is None.

Default: False

--epsilon

Epsilon value for the relative binning likelihood

Default: 0.025

--extra-likelihood-kwargs

Additional keyword arguments to pass to the likelihood. Any arguments which are named bilby_pipe arguments, e.g., distance_marginalization should NOT be included. This is only used if you are not using the GravitationalWaveTransient or ROQGravitationalWaveTransient likelihoods

Output arguments

What kind of output/summary to generate.

--plot-trace

Create traceplots during the run

Default: False

--plot-data

Create plot of the frequency domain data

Default: False

--plot-injection

Create time-domain plot of the injection

Default: False

--plot-spectrogram

Create spectrogram plot

Default: False

--plot-calibration

Create calibration posterior plot

Default: False

--plot-corner

Create intrinsic and extrinsic posterior corner plots

Default: False

--plot-marginal

Create 1-d marginal posterior plots

Default: False

--plot-skymap

Create posterior skymap

Default: False

--plot-waveform

Create waveform posterior plot

Default: False

--plot-format

Format for making bilby_pipe plots, can be [png, pdf, html]. If specified format is not supported, will default to png.

Default: “png”

--create-summary

Create a PESummary page

Default: False

--email

Email for notifications

--notification

Notification setting for HTCondor jobs. One of 'Always', 'Complete', 'Error', 'Never'. If defined by 'Always', the owner will be notified whenever the job produces a checkpoint, as well as when the job completes. If defined by 'Complete', the owner will be notified when the job terminates. If defined by 'Error', the owner will only be notified if the job terminates abnormally, or if the job is placed on hold because of a failure, and not by user request. If defined by 'Never' (the default), the owner will not receive e-mail, regardless of what happens to the job. Note, an email arg is also required for notifications to be emailed.

Default: Never

--queue

Condor job queue. Use Online_PE for online parameter estimation runs.

--existing-dir

If given, add results to a directory with an existing summary.html file

--webdir

Directory to store summary pages. If not given, defaults to outdir/results_page

--summarypages-arguments

Arguments (in the form of a dictionary) to pass to the summarypages executable

--result-format

Possible choices: json, hdf5, pickle

Format to save the result file in.

Default: “hdf5”

--final-result

If true (default), generate a set of lightweight, downsampled final results.

Default: True

--final-result-nsamples

Maximum number of samples to keep in the final results

Default: 20000

Prior arguments

Specify the prior settings.

--default-prior

The name of the prior set to base the prior on. Can be one of [PriorDict, BBHPriorDict, BNSPriorDict, CalibrationPriorDict] or a python path to a bilby prior class available in the user's installation.

Default: “BBHPriorDict”

--deltaT

The symmetric width (in s) around the trigger time to search over the coalescence time

Default: 0.2

--prior-file

The prior file

--prior-dict

A dictionary of priors (alternative to prior-file). Multiline dictionaries are supported, but each line must contain a single parameter specification and finish with a comma.
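For example, a multiline prior-dict might look like the following sketch (parameter choices and bounds are illustrative):

prior-dict = {
  chirp_mass = Uniform(name='chirp_mass', minimum=25, maximum=31),
  mass_ratio = Uniform(name='mass_ratio', minimum=0.125, maximum=1),
}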

--enforce-signal-duration

Whether to require that all signals fit within the segment duration. The signal duration is calculated using a post-Newtonian approximation.

Default: True

Post processing arguments

What post-processing to perform.

--postprocessing-executable

An executable name for postprocessing. A single postprocessing job is run as a child of all analysis jobs

--postprocessing-arguments

Arguments to pass to the postprocessing executable

--single-postprocessing-executable

An executable name for postprocessing. A single postprocessing job is run as a child of each analysis job; note the difference with respect to postprocessing-executable

--single-postprocessing-arguments

Arguments to pass to the single postprocessing executable. The str '$RESULT' will be replaced by the path to the individual result file

Sampler arguments

--sampler

Sampler to use

Default: “dynesty”

--sampling-seed

Random sampling seed

--n-parallel

Number of identical parallel jobs to run per event

Default: 1

--sampler-kwargs

Dictionary of sampler-kwargs to pass in, e.g., {nlive: 1000}, OR pass a pre-defined set of sampler-kwargs {DynestyDefault, BilbyMCMCDefault, FastTest}

Default: “DynestyDefault”

--reweighting-configuration

Configuration for reweighting the result. This can be specified as either a dictionary in the configuration file, or a json file.

--reweight-nested-samples

Whether to reweight nested samples directly. Currently this only works with dynesty.

Default: True

Waveform arguments

Settings for the waveform generator

--waveform-generator

The waveform generator class, given as a python path. This will not be able to use any arguments not passed to the default.

Default: “bilby.gw.waveform_generator.LALCBCWaveformGenerator”

--reference-frequency

The reference frequency

Default: 20

--waveform-approximant

The name of the waveform approximant to use for PE.

Default: “IMRPhenomPv2”

--catch-waveform-errors

Turns on waveform error catching

Default: True

--pn-spin-order

Post-newtonian order to use for the spin

Default: -1

--pn-tidal-order

Post-Newtonian order to use for tides

Default: -1

--pn-phase-order

Post-Newtonian order to use for the phase

Default: -1

--pn-amplitude-order

Post-Newtonian order to use for the amplitude. Also used to determine the waveform starting frequency.

Default: 0

--numerical-relativity-file

Path to an h5 numerical relativity file to inject, see https://git.ligo.org/waveforms/lvcnr-lfs for examples

--waveform-arguments-dict

A dictionary of arbitrary additional waveform-arguments to pass to the bilby waveform generator’s waveform_arguments

--mode-array

Array of modes to use for the waveform. Should be a list of lists, e.g., [[2,2], [2,-2]]

--frequency-domain-source-model

Name of the frequency domain source model. Can be one of [lal_binary_black_hole, lal_binary_neutron_star, lal_eccentric_binary_black_hole_no_spins, sinegaussian, supernova, supernova_pca_model] or any python path to a bilby source function in the user's installation, e.g. examp.source.bbh

Default: “lal_binary_black_hole”

--conversion-function

Optional python path to a user-specified conversion function. If unspecified, this is determined by the frequency_domain_source_model. If the source-model contains binary_black_hole, the conversion function is bilby.gw.conversion.convert_to_lal_binary_black_hole_parameters. If the source-model contains binary_neutron_star, the conversion function is bilby.gw.conversion.convert_to_lal_binary_neutron_star_parameters. If you specify your own function, you may wish to use the I/O of those functions as templates. If given as 'noconvert' (case insensitive), no conversion is used.

--generation-function

Optional python path to a user-specified generation function. If unspecified, this is determined by the frequency_domain_source_model. If the source-model contains binary_black_hole, the generation function is bilby.gw.conversion.generate_all_bbh_parameters. If the source-model contains binary_neutron_star, the generation function is bilby.gw.conversion.generate_all_bns_parameters. If you specify your own function, you may wish to use the I/O of those functions as templates. If given as 'noconvert' (case insensitive), no generation is used.