Install

Summary

Pull the Docker container:

docker pull neurodata/ndmg_dev

Run the dMRI participant-level pipeline:

docker run -ti -v /path/to/local/data:/data neurodata/ndmg_dev /data/ /data/outputs

System Requirements

The ndmg pipeline:

Was developed and tested primarily on Mac OSX, Ubuntu (12, 14, 16, 18), and CentOS (5, 6);

Was made to work on Python 3.6;

Is wrapped in a Docker container;

Has install instructions via a Dockerfile;

Requires no non-standard hardware to run;

Has key features built upon FSL, Dipy, Nibabel, Nilearn, Networkx, Numpy, Scipy, Scikit-Learn, and others;

Takes approximately 1 core, 8 GB of RAM, and 1 hour to run for most datasets.

While ndmg is quite robust to Python package versions (with only a few exceptions, mentioned in the installation guide), an example of possible versions (taken from the ndmg Docker image with version v0.2.0) is shown below. Note: this list excludes many libraries which are standard with a Python distribution; a complete list with all packages and versions can be produced by running pip freeze within the Docker container mentioned above (an example command is given after the list).

awscli==1.16.210, boto3==1.9.200, botocore==1.12.200, colorama==0.3.9, configparser>=3.7.4,
Cython==0.29.13, dipy==0.16.0, duecredit==0.7.0, fury==0.3.0, graspy==0.0.3, ipython==7.7.0,
matplotlib==3.1.1, networkx==2.3, nibabel==2.5.0, nilearn==0.5.2, numpy==1.17.0, pandas==0.25.0,
Pillow==6.1.0, plotly==1.12.9, pybids==0.6.4, python-dateutil==2.8.0, PyVTK==0.5.18,
requests==2.22.0, s3transfer==0.2.1, setuptools>=40.0, scikit-image==0.13.0, scikit-learn==0.21.3,
scipy==1.3.0, sklearn==8.0, vtk==8.1.2
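
To regenerate the complete list from the image you have pulled, one option (assuming pip is on the container's PATH; this overrides the image's default entrypoint, which is harmless for a one-off listing) is:

docker run --rm -ti --entrypoint pip neurodata/ndmg_dev freeze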

Installation Guide

Currently, the Docker image is the recommended way to install and run ndmg.

pip and GitHub installations are also available.

Docker

The neurodata/ndmg_dev Docker container enables users to run end-to-end connectome estimation on structural or functional MRI right from container launch. The pipeline requires that data be organized in accordance with the BIDS spec. If the data you wish to process is available on S3, you simply need to provide your S3 credentials at build time and the pipeline will automatically retrieve your data for processing.
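
For reference, a minimal BIDS-style layout for a single diffusion session might look like the illustrative tree below; the subject, session, and file names are placeholders, not requirements of ndmg beyond the BIDS spec itself:

/path/to/local/data
└── sub-01
    └── ses-1
        ├── anat
        │   └── sub-01_ses-1_T1w.nii.gz
        └── dwi
            ├── sub-01_ses-1_dwi.bval
            ├── sub-01_ses-1_dwi.bvec
            └── sub-01_ses-1_dwi.nii.gz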

If you have never used Docker before, it is useful to run through the Docker documentation.

Get the Docker container:

$ docker pull neurodata/ndmg_dev

(A) I do not wish to use S3:

You are good to go!

(B) I wish to use S3:

Add your AWS secret key and access ID to a file called credentials.csv in this directory on your local machine. A dummy file has been provided to make the expected format clear. (This is the format in which AWS provides credentials.)
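
As a purely illustrative sketch (the exact column names come from the dummy file in the repository; the values below are AWS's documentation placeholders, not real credentials), credentials.csv is a small CSV along these lines:

Access key ID,Secret access key
AKIAIOSFODNN7EXAMPLE,wJalrXbUtnFXEMI/K7MDENG/bPxRfiCYEXAMPLEKEY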

Processing Data

Below is the help output generated by running ndmg with the -h flag. All parameters are explained in this output.

$ docker run -ti neurodata/ndmg_dev -h

usage: ndmg_bids [-h]
             [--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]]
             [--session_label SESSION_LABEL [SESSION_LABEL ...]]
             [--run_label RUN_LABEL [RUN_LABEL ...]] [--bucket BUCKET]
             [--remote_path REMOTE_PATH] [--push_data] [--dataset DATASET]
             [--atlas ATLAS] [--debug] [--sked] [--skreg] [--vox VOX] [-c]
             [--mod MOD] [--tt TT] [--mf MF] [--sp SP] [--seeds SEEDS]
             [--modif MODIF]
             bids_dir output_dir

This is an end-to-end connectome estimation pipeline from M3r Images.

positional arguments:
bids_dir              The directory with the input dataset formatted
                      according to the BIDS standard.
output_dir            The directory where the output files should be stored.
                      If you are running group level analysis this folder
                      should be prepopulated with the results of the
                      participant level analysis.

optional arguments:
-h, --help            show this help message and exit
--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]
                      The label(s) of the participant(s) that should be
                      analyzed. The label corresponds to
                      sub-<participant_label> from the BIDS spec (so it does
                      not include "sub-"). If this parameter is not provided
                      all subjects should be analyzed. Multiple participants
                      can be specified with a space separated list.
--session_label SESSION_LABEL [SESSION_LABEL ...]
                      The label(s) of the session that should be analyzed.
                      The label corresponds to ses-<participant_label> from
                      the BIDS spec (so it does not include "ses-"). If this
                      parameter is not provided all sessions should be
                      analyzed. Multiple sessions can be specified with a
                      space separated list.
--run_label RUN_LABEL [RUN_LABEL ...]
                      The label(s) of the run that should be analyzed. The
                      label corresponds to run-<run_label> from the BIDS
                      spec (so it does not include "run-"). If this
                      parameter is not provided all runs should be analyzed.
                      Multiple runs can be specified with a space separated
                      list.
--bucket BUCKET       The name of an S3 bucket which holds BIDS organized
                      data. You must have built your image with credentials
                      for the S3 bucket you wish to access.
--remote_path REMOTE_PATH
                      The path to the data on your S3 bucket. The data will
                      be downloaded to the provided bids_dir on your
                      machine.
--push_data           flag to push derivatives back up to S3.
--dataset DATASET     The name of the dataset you are performing QC on.
--atlas ATLAS         The atlas being analyzed in QC (if you only want one).
--debug               If False, remove any old files in the output
                      directory.
--sked                Whether to skip eddy correction if it has already been
                      run.
--skreg               whether or not to skip registration
--vox VOX             Voxel size to use for template registrations (e.g.
                      default is '2mm')
-c, --clean           Whether or not to delete intermediates
--mod MOD             Deterministic (det) or probabilistic (prob) tracking.
                      Default is det.
--tt TT               Tracking approach: local or particle. Default is
                      local.
--mf MF               Diffusion model: csd or csa. Default is csd.
--sp SP               Space for tractography. Default is native.
--seeds SEEDS         Seeding density for tractography. Default is 20.
--modif MODIF         Name of folder on s3 to push to. If empty, push to a
                      folder with ndmg's version number.

In order to share data between the container and the rest of our machine, we need to mount a volume. Docker does this with the -v flag, formatted as: -v /path/to/local/data:/path/in/container. We’ll do this when we launch our container, as well as give it a helpful name so we can locate it later on.

To run ndmg on your data:

docker run -ti -v /path/to/local/data:/data neurodata/ndmg_dev /data/ /data/outputs
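
For example, to give the container a memorable name and restrict processing to a single participant (the container name and participant label below are placeholders), you could run something like:

docker run -ti --name ndmg_sub01 -v /path/to/local/data:/data neurodata/ndmg_dev /data/ /data/outputs --participant_label 01

Similarly, if your data live on S3, the --bucket and --remote_path flags described in the help output above can be combined with the same mount; the bucket name and remote path here are likewise placeholders:

docker run -ti --name ndmg_s3 -v /path/to/local/data:/data neurodata/ndmg_dev /data/ /data/outputs --bucket mybucket --remote_path path/to/dataset --participant_label 01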

Pip

ndmg relies on FSL, Dipy, networkx, nibabel, numpy, scipy, scikit-learn, scikit-image, and nilearn. You should install FSL by following the instructions on their website, then install the remaining Python dependencies with:

pip install ndmg

The only known packages which require a specific version are plotly and networkx, due to backwards-compatibility-breaking changes.

Installation shouldn’t take more than a few minutes, but the exact time depends on your internet connection.
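
If ndmg later fails to find FSL at runtime, make sure your shell environment points at your FSL installation; a typical setup (the path below is illustrative and depends on where you installed FSL) looks like:

export FSLDIR=/usr/local/fsl
source ${FSLDIR}/etc/fslconf/fsl.sh
export PATH=${FSLDIR}/bin:${PATH}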

GitHub

To install directly from GitHub, run:

git clone https://github.com/neurodata/ndmg
cd ndmg
python setup.py install
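
If you prefer to keep a source install isolated from your system Python (optional; the environment name below is arbitrary), the same steps work inside a virtual environment:

python3 -m venv ndmg-env
source ndmg-env/bin/activate
git clone https://github.com/neurodata/ndmg
cd ndmg
python setup.py install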