.. _cbc-analysis:

CBC Analysis (Offline)
========================

To start an offline CBC analysis, you'll need a configuration file that points at the start/end times to analyze, the input data products (e.g. template bank, mass model), and other workflow-related configuration.

All the steps below assume a Singularity container with the GstLAL software stack installed. Other methods of installation follow a similar procedure, with one caveat: workflows will not work on the Open Science Grid (OSG). For a DAG on the OSG IGWN grid, you must use a Singularity container on cvmfs, set the ``profile`` in ``config.yml`` to ``osg``, and make sure to submit the DAG from an OSG node. Otherwise the workflow is the same.

When running without a Singularity container, the commands below should be modified accordingly, e.g. run ``gstlal_inspiral_workflow init -c config.yml`` instead of ``singularity exec  gstlal_inspiral_workflow init -c config.yml``.

For ICDS gstlalcbc shared accounts, the ``env.sh`` contents must be changed and, instead of running ``$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein``, run ``source env.sh``. (Details are below.)

Running Workflows
^^^^^^^^^^^^^^^^^^

1. Build Singularity image (optional)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

NOTE: If you are using a reference Singularity container (suitable in most cases), you can skip this step. The ```` throughout this doc refers to the ``singularity-image`` specified in the ``condor`` section of your configuration.

If not using the reference Singularity container, say for local development, you can specify a path to a local container and use that for the (non-OSG) workflow. To pull a container with gstlal installed, run:

.. code:: bash

   $ singularity build --sandbox --fix-perms  docker://containers.ligo.org/lscsoft/gstlal:master

To use a branch other than master, replace ``master`` in the above command with the name of the desired branch.
To use a custom build instead, gstlal will need to be installed into the container from your modified source code. For installation instructions, see the `installation page `_.

2. Set up workflow
""""""""""""""""""""

First, create a new analysis directory and switch to it:

.. code:: bash

   $ mkdir
   $ cd
   $ mkdir bank mass_model idq dtdphi

Default configuration files and environment (``env.sh``) for a variety of different banks are contained in the `offline-configuration `_ repository. One can run the commands below to grab the configuration files, or clone the repository and copy the files as needed into the analysis directory. To download data files (mass model, template banks) that may be needed for offline runs, see the `README `_ in the offline-configuration repo. Move the template bank(s) into ``bank`` and the mass model into ``mass_model``.

For example, to grab all the relevant files for a small BNS dag:

.. code:: bash

   $ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/configs/bns-small/config.yml
   $ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/env.sh
   $ source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
   $ conda activate igwn
   $ dcc archive --archive-dir=. --files -i T2200318-v2
   $ conda deactivate

Then move the template bank, mass model, idq file, and dtdphi file into their corresponding directories.

When running an analysis on the ICDS cluster in the gstlalcbc shared account, the contents of ``env.sh`` must be changed to what is given below. In addition, where the tutorial says to run ``ligo-proxy-init -p``, instead run ``source env.sh`` on the modified ``env.sh``. When running on non-gstlalcbc shared accounts on ICDS, or when running on other clusters, ``env.sh`` does not need to be modified, and ``ligo-proxy-init -p`` can be run as in the tutorial.
.. code-block:: bash

   export PYTHONUNBUFFERED=1
   unset X509_USER_PROXY
   export X509_USER_CERT=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export X509_USER_KEY=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export GSTLAL_FIR_WHITEN=0

Now, we'll need to modify the configuration as needed to run the analysis. At the very least, set the start/end times and the instruments to run over:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1

Ensure the template bank, mass model, idq file, and dtdphi file are pointed to in the configuration:

.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz

.. code-block:: yaml

   prior:
     mass-model: bank/mass_model_small.h5
     idq-timeseries: idq/H1L1-IDQ_TIMESERIES-1239641219-692847.h5
     dtdphi: dtdphi/inspiral_dtdphi_pdf.h5

If you're creating a summary page for results, you'll need to point at a location where they are web-viewable:

.. code-block:: yaml

   summary:
     webdir: ~/public_html/

If you're running on LIGO compute resources and your username doesn't match your albert.einstein username, you'll additionally need to specify the accounting group user for condor to track accounting information:

.. code-block:: yaml

   condor:
     accounting-group-user: albert.einstein

In addition, update the ``singularity-image`` in the ``condor`` section of your configuration if needed:

.. code-block:: yaml

   condor:
     singularity-image: /cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master

If not using a reference Singularity image, you can replace this with the full path to a local singularity container ````. For more detailed configuration options, take a look at the :ref:`configuration section ` below.

If you haven't installed site-specific profiles yet (per-user), you can run:

.. code:: bash

   $ singularity exec  gstlal_grid_profile install

which will install configurations that are site-specific, i.e. ``ldas`` and ``icds``.
You can select which profile to use in the ``condor`` section:

.. code-block:: yaml

   condor:
     profile: ldas

For an OSG IGWN grid run, use ``osg``. To view which profiles are available, you can run:

.. code:: bash

   $ singularity exec  gstlal_grid_profile list

Note, you can install :ref:`custom profiles ` as well.

Once you have the configuration, data products, and grid profiles installed, you can set up the Makefile using the configuration, which we'll then use for everything else, including the data files needed for the workflow, the workflow itself, the summary page, etc.

.. code:: bash

   $ singularity exec  gstlal_inspiral_workflow init -c config.yml

By default, this will generate the full workflow. If you want to only run the filtering step, a rerank, or an injection-only workflow, you can instead specify the workflow as well, e.g.

.. code:: bash

   $ singularity exec  gstlal_inspiral_workflow init -c config.yml -w injection

for an injection-only workflow. If you already have a Makefile and need to update it based on an updated configuration, run ``gstlal_inspiral_workflow`` with ``--force``.

Next, if you are accessing non-public (non-GWOSC) data, you'll need to set up your proxy to ensure you can get access to LIGO data:

.. code:: bash

   $ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein

Note that we are running this step outside of Singularity. This is because ``ligo-proxy-init`` is not currently installed within the image.

If you are running on the ICDS gstlalcbc shared account, do not run the command above. Instead, run:

.. code:: bash

   $ source env.sh

Also update the configuration accordingly (if needed):

.. code-block:: yaml

   source:
     x509-proxy: /path/to/x509_proxy

Finally, set up the rest of the workflow including the DAG for submission:

.. code:: bash

   $ singularity exec -B $TMPDIR  make dag

If running on the OSG IGWN grid, make sure to submit the DAGs from the OSG node. This should create condor DAGs for the workflow.
Mounting a temporary directory is important as some of the steps will leverage a temporary space to generate files.

If one desires to see detailed error messages, add ```` to ``environment`` in the submit (``*.sub``) files by running:

.. code:: bash

   $ sed -i '/^environment = / s/\"$/ PYTHONUNBUFFERED=1\"/' *.sub

3. Launch workflows
"""""""""""""""""""""""""

.. code:: bash

   $ source env.sh
   $ make launch

This is simply a thin wrapper around ``condor_submit_dag``, launching the DAG in question. You can monitor the DAG with Condor CLI tools such as ``condor_q`` and ``tail -f full_inspiral_dag.dag.dagman.out``.

4. Generate Summary Page
"""""""""""""""""""""""""

After the DAG has completed, you can generate the summary page for the analysis:

.. code:: bash

   $ singularity exec  make summary

To make an open-box page after this, run:

.. code:: bash

   $ make unlock

.. _analysis-configuration:

Configuration
^^^^^^^^^^^^^^

The top-level configuration consists of the analysis times and detector configuration:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1
   min-instruments: 1

These set the start and stop GPS times of the analysis, plus the detectors to use (H1 = Hanford, L1 = Livingston, V1 = Virgo). There is a convenient online converter for GPS times at https://www.gw-openscience.org/gps/, and you can also use the ``gpstime`` program. Note that these start and stop times have no knowledge of science-quality data; the actual science-quality data that are analyzed are typically a subset of the total time. Information about which detectors were on at different times is available at https://www.gw-openscience.org/data/.

``min-instruments`` sets the minimum number of instruments we will allow to form an event, e.g. setting it to 1 means the analysis will consider single-detector events, while 2 means we will only consider events that are coincident across at least 2 detectors.

Section: Data
""""""""""""""
.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz
     analysis-dir: /path/to/analysis/dir

The ``template-bank`` option points to the template bank file. These are xml files that follow the LIGOLW (LIGO light weight) schema. The template bank in particular contains a table that lists the parameters of all of the templates; it does not contain the actual waveforms themselves. Metadata such as the waveform approximant and the frequency cutoffs are also listed in this file.

The ``analysis-dir`` option is used if the user wishes to point to an existing analysis to perform a rerank or an injection-only workflow. This grabs existing files from that directory to seed the rerank/injection workflows.

One can use multiple sub template banks. In this case, the configuration might look like:

.. code-block:: yaml

   data:
     template-bank:
       bns: bank/sub_bank/bns.xml.gz
       nsbh: bank/sub_bank/nsbh.xml.gz
       bbh_1: bank/sub_bank/bbh_low_q.xml.gz
       bbh_2: bank/sub_bank/other_bbh.xml.gz
       imbh: bank/sub_bank/imbh_low_q.xml.gz

Section: Source
""""""""""""""""

.. code-block:: yaml

   source:
     data-source: frames
     data-find-server: datafind.gw-openscience.org
     frame-type:
       H1: H1_GWOSC_O2_16KHZ_R1
       L1: L1_GWOSC_O2_16KHZ_R1
     channel-name:
       H1: GWOSC-16KHZ_R1_STRAIN
       L1: GWOSC-16KHZ_R1_STRAIN
     sample-rate: 4096
     frame-segments-file: segments.xml.gz
     frame-segments-name: datasegments
     x509-proxy: x509_proxy

The ``data-find-server`` option points to a server that is queried to find the location of frame files. The address shown above is a publicly available server that will return the locations of public frame files on cvmfs. Each frame file has a type that describes the contents of the frame file, and may contain multiple channels of data, hence the channel names must also be specified. ``frame-segments-file`` points to a LIGOLW xml file that describes the actual times to analyze, i.e. it lists the times that science-quality data are available.
These files are generalized enough that they could describe different types of data, so ``frame-segments-name`` is used to specify which segment list to consider. In practice, the segments file we produce will only contain the segments we want. Users will typically not change any of these options once they are set for a given instrument and observing run. ``x509-proxy`` is the path to your x509 proxy.

Section: Segments
""""""""""""""""""

The ``segments`` section specifies how to generate segments and vetoes for the workflow. There are two backends that determine where to query segments and vetoes from: ``gwosc`` (public) and ``dqsegdb`` (authenticated).

An example of configuration with the ``gwosc`` backend looks like:

.. code-block:: yaml

   segments:
     backend: gwosc
     vetoes:
       category: CAT1

Here, the ``backend`` is set to ``gwosc``, so both segments and vetoes are determined by querying the GWOSC server. There is no additional configuration needed to query segments, but for vetoes, we also need to specify the ``category`` used for vetoes. This can be one of ``CAT1``, ``CAT2``, or ``CAT3``. By default, segments are generated by applying ``CAT1`` vetoes as recommended by the Detector Characterization group.

An example of configuration with the ``dqsegdb`` backend looks like:

.. code-block:: yaml

   segments:
     backend: dqsegdb
     science:
       H1: DCS-ANALYSIS_READY_C01:1
       L1: DCS-ANALYSIS_READY_C01:1
       V1: ITF_SCIENCE:2
     vetoes:
       category: CAT1
       veto-definer:
         file: H1L1V1-HOFT_C01_V1ONLINE_O3_CBC.xml
         version: O3b_CBC_H1L1V1_C01_v1.2
         epoch: O3

Here, the ``backend`` is set to ``dqsegdb``, so both segments and vetoes are determined by querying the DQSEGDB server. To query segments, one needs to specify the flag used per instrument to query segments from. For vetoes, we need to specify the ``category`` used for vetoes, as with the ``gwosc`` backend. Additionally, a veto definer file is used to determine which flags are used for which veto categories.
The file itself need not be provided; the ``file``, ``version`` and ``epoch`` fully specify how to access the veto definer file used for generating vetoes.

Section: PSD
""""""""""""""

.. code-block:: yaml

   psd:
     fft-length: 8
     sample-rate: 4096

The PSD estimation method used by GstLAL is a modified median-Welch method that is described in detail in Section IIB of Ref. [1]. The FFT length sets the length in seconds of each segment that is Fourier transformed. The default whitener will use zero-padding of one-fourth the FFT length on either side and will overlap Fourier-transformed segments by one-fourth the FFT length. For example, an ``fft-length`` of 8 means that each Fourier-transformed segment used in the PSD estimation (and consequently the whitener) will contain 4 seconds of data with 2 seconds of zero padding on either side, and will overlap the next segment by 2 seconds (i.e. the last two seconds of data in one segment will be the first two seconds of data in the following segment).

Section: SVD
""""""""""""""

.. code-block:: yaml

   svd:
     f-low: 20.0
     num-chi-bins: 1
     sort-by: mchirp
     approximant:
       - 0:1.73:TaylorF2
       - 1.73:1000:SEOBNRv4_ROM
     tolerance: 0.9999
     max-f-final: 1024.0
     num-split-templates: 200
     overlap: 30
     num-banks: 5
     samples-min: 2048
     samples-max-64: 2048
     samples-max-256: 2048
     samples-max: 4096
     autocorrelation-length: 701
     max-duration: 128
     manifest: svd_manifest.json

``f-low`` sets the lower frequency cutoff for the analysis in Hz. ``num-chi-bins`` is a tunable parameter related to the template bank binning procedure; specifically, it sets the number of effective-spin parameter bins to use in the chirp-mass / effective-spin binning procedure described in Sec. IID and Fig. 6 of Ref. [1]. ``sort-by`` selects the template sort column. This controls how to bin the bank into sub-banks suitable for the SVD decomposition. It can be ``mchirp`` (sorts by chirp mass), ``mu`` (sorts by the mu1 and mu2 coordinates), or ``template_duration`` (sorts by template duration).
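The quarter-FFT-length segment arithmetic described in the PSD section above can be sketched as a small helper. This is a plain illustration of the stated rule, not code from the GstLAL library:

```python
def whitener_layout(fft_length):
    """Return (data_seconds, zero_pad_seconds, overlap_seconds) for the
    default whitener, following the quarter-FFT-length rule described
    in the PSD section. Illustrative only; not GstLAL library code."""
    zero_pad = fft_length // 4          # zero padding on either side
    data = fft_length - 2 * zero_pad    # seconds of actual data per segment
    overlap = fft_length // 4           # overlap with the next segment
    return data, zero_pad, overlap

# fft-length of 8 -> 4 s of data, 2 s of padding, 2 s of overlap
print(whitener_layout(8))  # -> (4, 2, 2)
```

With the default ``fft-length`` of 8 this reproduces the 4/2/2-second layout worked through in the text.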
``approximant`` specifies the waveform approximant that should be used, along with the chirp-mass bounds in which to use that approximant. ``0:1000:TaylorF2`` means use the TaylorF2 approximant for waveforms from systems with chirp masses between 0 and 1000 solar masses. Multiple waveforms and chirp-mass bounds can be provided.

``tolerance`` is a tunable parameter related to the truncation of SVD basis vectors. A tolerance of 0.9999 means the targeted matched-filter inner product of the original waveform and the waveform reconstructed from the SVD is 0.9999. ``max-f-final`` sets the maximum frequency of the template.

``num-split-templates``, ``overlap``, and ``num-banks`` are tunable parameters related to the SVD process. ``num-split-templates`` sets the number of templates to decompose at a time; ``overlap`` sets the number of templates from adjacent template bank regions to pad the region being considered with in order to actually compute the SVD (this helps the performance of the SVD, and these pad templates are not reconstructed); ``num-banks`` sets the number of sets of decomposed templates to include in a given bin for the analysis. For example, a ``num-split-templates`` of 200, ``overlap`` of 30, and ``num-banks`` of 5 means that each SVD bank file will contain 5 decomposed sets of 200 templates, where the SVD was computed using an additional 15 templates on either side of the 200 (as defined by the binning procedure).

``samples-min``, ``samples-max-64``, ``samples-max-256``, and ``samples-max`` are tunable parameters related to the template time-slicing procedure used by GstLAL (described in Sec. IID and Fig. 7 of Ref. [1], and references therein). Templates are sliced in time before the SVD is applied, and each time slice is sampled only at the rate necessary for the highest frequency in that slice (rounded up to a power of 2). For example, the low-frequency part of a waveform may only be sampled at 32 Hz, while the high-frequency part may be sampled at 2048 Hz (depending on user settings).
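The round-up-to-a-power-of-two rule above can be illustrated with a short sketch. It assumes the "rate necessary" for a slice is the usual Nyquist criterion of twice the slice's highest frequency; this is an illustration, not GstLAL library code:

```python
def slice_sample_rate(f_high):
    """Smallest power-of-two sample rate (Hz) satisfying the Nyquist
    criterion for a time slice whose highest frequency is f_high (Hz).
    Illustrative sketch of the rounding rule described in the text."""
    rate = 1
    while rate < 2 * f_high:
        rate *= 2
    return rate

# a low-frequency slice vs. a high-frequency slice
print(slice_sample_rate(14))    # -> 32
print(slice_sample_rate(1000))  # -> 2048
```

This is why early, low-frequency portions of a long template can be handled at 32 Hz while the late inspiral requires 2048 Hz or more.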
``samples-min`` sets the minimum number of samples to use in any time slice. ``samples-max`` sets the maximum number of samples to use in any time slice with a sample rate below 64 Hz; ``samples-max-64`` sets the maximum number of samples to use in any time slice with sample rates between 64 Hz and 256 Hz; ``samples-max-256`` sets the maximum number of samples to use in any time slice with a sample rate greater than 256 Hz.

``autocorrelation-length`` sets the number of samples to use when computing the autocorrelation-based test statistic, described in Sec. IIIC of Ref. [1]. ``max-duration`` sets the maximum template duration in seconds; one can choose not to use ``max-duration``. ``manifest`` sets the name of a file that will contain metadata about the template bank bins.

If one uses multiple sub template banks, SVD configurations can be specified for each sub template bank. Reference the `mario config `_.

Users will typically not change these options.

Section: Filter
""""""""""""""""

.. code-block:: yaml

   filter:
     fir-stride: 1
     min-instruments: 1
     coincidence-threshold: 0.01
     ht-gate-threshold: 0.8:15.0-45.0:100.0
     veto-segments-file: vetoes.xml.gz
     time-slide-file: tisi.xml
     injection-time-slide-file: inj_tisi.xml
     time-slides:
       H1: 0:0:0
       L1: 0.62831:0.62831:0.62831
     injections:
       bns:
         file: bns_injections.xml
         range: 0.01:1000.0

``fir-stride`` is a tunable parameter related to the matched-filter procedure, setting the length in seconds of the output of the matched-filter element. ``coincidence-threshold`` is the time in seconds to add to the light-travel time when searching for coincidences between detectors. ``ht-gate-threshold`` sets the h(t) gate threshold as a function of chirp mass. The h(t) gate threshold is a value over which the output of the whitener plus some padding will be set to zero (as described in Sec. IIC of Ref. [1]).
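The chirp-mass-dependent ``ht-gate-threshold`` range syntax (``mchirp1:threshold1-threshold2:mchirp2``) amounts to a linear interpolation between two (chirp mass, threshold) points. A hedged sketch of that mapping follows; the clamping behavior at the endpoints is an assumption for illustration, not taken from the GstLAL source:

```python
def ht_gate_threshold(mchirp, spec="0.8:15.0-45.0:100.0"):
    """Linearly interpolate the h(t) gate threshold for a template-bank
    bin's maximum chirp mass, from a spec 'mc1:t1-t2:mc2'.
    Endpoint clamping here is an illustrative assumption."""
    low, high = spec.split("-")
    mc1, t1 = (float(x) for x in low.split(":"))
    t2, mc2 = (float(x) for x in high.split(":"))
    if mchirp <= mc1:
        return t1
    if mchirp >= mc2:
        return t2
    # linear function between (mc1, t1) and (mc2, t2)
    return t1 + (t2 - t1) * (mchirp - mc1) / (mc2 - mc1)

print(ht_gate_threshold(0.8))    # -> 15.0
print(ht_gate_threshold(100.0))  # -> 45.0
```

So a bin whose heaviest template has chirp mass 0.8 solar masses gates at 15, one at 100 solar masses gates at 45, and intermediate bins fall on the line between.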
``0.8:15.0-45.0:100.0`` means that a template bank bin that has a maximum chirp-mass template of 0.8 solar masses will use a gate threshold of 15, a bank bin with a maximum chirp mass of 100 will use a threshold of 45, and all other thresholds are described by a linear function between those two points.

``veto-segments-file`` sets the name of a LIGOLW xml file that contains any vetoes used for the analysis, even if there are no vetoes. ``time-slide-file`` and ``injection-time-slide-file`` are LIGOLW xml files that describe any time slides used in the analysis. A typical analysis will only analyze injections with the zerolag "time slide" (i.e. the data are not slid in time), and will consider the zerolag and one other time slide for the non-injection analysis. The time slide is used to perform a blind sanity check of the noise model.

``injections`` lists a set of injections, each with their own label. In this example, there is only one injection set, and it is labeled ``bns``. ``file`` is a relative path to the injection file (a LIGOLW xml file that contains the parameters of the injections, but not the actual waveforms themselves). ``range`` sets the chirp-mass range that should be considered when searching for this particular set of injections. Multiple injection files can be provided, each with their own label, file, and range. The only option here that a user will normally interact with is the ``injections`` option. When using multiple sub template banks, replace ``bns:`` under ``injections:`` with ``inj:``.

Section: Injections
""""""""""""""""""""
.. code-block:: yaml

   injections:
     sets:
       expected-snr:
         f-low: 15.0
       bns:
         f-low: 14.0
         seed: 72338
         time:
           step: 32
           interval: 1
           shift: 0
         waveform: SpinTaylorT4threePointFivePN
         mass-distr: componentMass
         mass1:
           min: 1.1
           max: 2.8
         mass2:
           min: 1.1
           max: 2.8
         spin1:
           min: 0
           max: 0.05
         spin2:
           min: 0
           max: 0.05
         distance:
           min: 10000
           max: 80000
         spin-aligned: True
         file: bns_injections.xml

The ``sets`` subsection is used to create injection sets to be used within the analysis, referenced by name in the ``filter`` section. In ``sets``, the injections are grouped by key; in this case, one ``bns`` injection set, which creates the ``bns_injections.xml`` file used in the ``injections`` option of the ``filter`` section. For multiple injections, the chunk for ``bns:`` should be repeated for each injection set. Reference the `mario config `_.

Besides creating injection sets, the ``expected-snr`` subsection is used for the expected SNR jobs. These settings are used to override defaults as needed. ``spin-aligned`` specifies whether the injections should have (mis)aligned spins (if ``spin-aligned: True``) or precessing spins (if ``spin-aligned: False``).

In the case of multiple injection sets that need to be combined, one can add a few options to create a combined file and reference that within the filter jobs. This can be useful for large banks with a large set of templates. To do this, one can add the following:

.. code-block:: yaml

   injections:
     combine: true
     combined-file: combined_injections.xml

The injections created are generated with the ``lalapps_inspinj`` program, with the following mapping between configuration and command-line options:

* ``f-low``: ``--f-lower``
* ``seed``: ``--seed``
* ``time`` section: ``--time-step``, ``--time-interval``. ``shift`` adjusts the start time appropriately.
* ``waveform``: ``--waveform``
* ``mass-distr``: ``--m-distr``
* ``mass/spin/distance`` sections: map to options like ``--min-mass1``

Section: Prior
""""""""""""""""
.. code-block:: yaml

   prior:
     mass-model: mass_model/mass_model_small.h5

``mass-model`` is a relative path to the file that contains the mass model. This model is used to weight templates appropriately when assigning ranking statistics, based on our understanding of the astrophysical distribution of signals. Users will not typically change this option.

An optional ``dtdphi-file`` and ``idq-timeseries`` can be provided here. If not given, a default model (included in the standard installation) will be used. The dtdphi file specifies a probability distribution function for the probability of measuring a given time shift and phase shift in multiple-detector observations; it enters into the ranking statistics. The idq file gives information about the data quality around the time of coalescence. If specifying idq and dtdphi files, create a directory for each in the ````, and put the idq and dtdphi files in their respective directories. Reference the `mario config `_.

Section: Rank
""""""""""""""""

.. code-block:: yaml

   rank:
     ranking-stat-samples: 4194304

``ranking-stat-samples`` sets the number of samples to draw from the noise model when computing the distribution of log likelihood ratios (the ranking statistic) under the noise hypothesis. Users will not typically change this option.

Section: Summary
""""""""""""""""""

.. code-block:: yaml

   summary:
     webdir: /path/to/public_html/folder

``webdir`` sets the path of the output results webpages produced by the analysis. Users will typically change this option for each analysis.

Section: Condor
""""""""""""""""""

.. code-block:: yaml

   condor:
     profile: osg-public
     accounting-group: ligo.dev.o3.cbc.uber.gstlaloffline
     accounting-group-user:
     singularity-image:

``profile`` sets a base level of configuration options for condor. ``accounting-group`` sets accounting group details on LDG resources.
Currently the machinery to produce an analysis DAG requires this option, but the option is not actually used by analyses running on non-LDG resources. ``singularity-image`` sets the path of the container on cvmfs that the analysis should use. Users will not typically change this option (use ``/cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master``).

.. _install-custom-profiles:

Installing Custom Site Profiles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can define a site profile as YAML. As an example, we can create a file called ``custom.yml``:

.. code-block:: yaml

   scheduler: condor
   requirements:
     - "(IS_GLIDEIN=?=True)"

Both the directives and requirements sections are optional. To install one so it's available for use, run:

.. code:: bash

   $ singularity exec  gstlal_grid_profile install custom.yml