Nextflow Configuration

Nextflow pipelines look for a configuration file nextflow.conf at runtime to determine resource allocation and define any environment variables required for processes within the script. The longitudinal GWAS pipeline provides a default configuration file for users to manage resource allocation and define environment variables specifying the paths of the outputs and cache directory. Users can override the default configuration by specifying a custom configuation file with the -params-file option or specifying individual parameters via the commandline --something value at runtime.

By default, each task is launched using the gwas-pipeline docker image and assigned 2 cpus. The process labels small and medium are used to manage resource allocation between 2 types of tasks within the pipeline. small tasks run in parallel each with access to 2 cpus and 20GB of working memory. This label is typically assigned to the model fitting jobs. medium tasks have access to 48 cpus and 225GB of working memory. These tasks benefit from access to multiple cores and are typically run consecutively to take advantage of the capcacity of the compute resources.

Below is the default nextflow.config file used by the longitudinal GWAS pipeline

process.container = 'gwas-pipeline'

// environment variables for GWAS pipeline
env {
  GWAS_RESOURCE_DIR = '/files'
  GWAS_OUTPUT_DIR = '/files/longGWAS_pipeline/results'
  GWAS_STORE_DIR = '/files/longGWAS_pipeline/cache'
  ADDI_QC_PIPELINE = '/usr/src/ADDI-GWAS-QC-pipeline/addi_qc_pipeline.py'
}

executor {
  name = 'local'
  cpus = 48
  memory = '480 GB'
}

docker {
  enabled = true
  temp = 'auto'
}

process {
  cpus = 2

  withLabel: small {
    cpus = 2
    memory = '20 GB'
  }

  withLabel: medium {
    cpus = 48
    memory = '225 GB'
  }
}