Input filename matching

Cross-subject analysis

By default, NiftyNet treats each image file as a subject; its training, validation and inference procedures are designed to work across subjects.

To support cross-subject analysis, the user should specify the lists of files to be used. For example, the relevant configuration could be:

[SYSTEM]
dataset_split_file = '/mnt/data/cross_validation_fold_01.csv'

[MRI_T1]
csv_file = '/mnt/data/t1_list.csv'

[segmentation_target]
csv_file = '/mnt/data/ground_truth.csv'

where [MRI_T1] and [segmentation_target] are input source sections whose csv_file parameters point to .csv files listing the input images, and dataset_split_file under the [SYSTEM] section specifies how the dataset is partitioned.

The .csv files should be created beforehand by the user, and they should all use the same set of unique subject identifiers ("subject IDs"). For example:

Content of t1_list.csv:

subject_001,/mnt/data/t1/T1_001_img.nii.gz
subject_002,/mnt/data/t1/T1_002_img.nii.gz

Content of ground_truth.csv:

subject_001,/mnt/data/ground_truth/001_img_seg.nii.gz
subject_002,/mnt/data/ground_truth/002_img_seg.nii.gz

Content of cross_validation_fold_01.csv:

subject_001,training
subject_002,inference

In this example, the image and the corresponding ground truth of subject_001 will be used for training, while those of subject_002 will be used in the inference phase.
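To make the role of the shared subject ID concrete, here is a minimal sketch (not NiftyNet's own code) that reads the three files shown above, joins the input sources on subject ID and attaches the phase from the split file. The file contents are inlined as strings so the example runs anywhere, and keeping only subjects that appear in every input source is an assumption of this sketch.

```python
import csv
import io

# In-memory copies of the example files; in practice they would be opened
# from the paths given in the configuration.
T1_LIST = """subject_001,/mnt/data/t1/T1_001_img.nii.gz
subject_002,/mnt/data/t1/T1_002_img.nii.gz
"""
GROUND_TRUTH = """subject_001,/mnt/data/ground_truth/001_img_seg.nii.gz
subject_002,/mnt/data/ground_truth/002_img_seg.nii.gz
"""
SPLIT = """subject_001,training
subject_002,inference
"""


def read_two_columns(text):
    """Map the first column (subject ID) to the second column."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def build_subjects(sources, split):
    """Join every input source on subject ID and attach the split phase."""
    # Assumption: only subjects listed in every source can be used.
    common_ids = set.intersection(*(set(files) for files in sources.values()))
    subjects = {}
    for subject_id in sorted(common_ids):
        record = {name: files[subject_id] for name, files in sources.items()}
        record["phase"] = split.get(subject_id)
        subjects[subject_id] = record
    return subjects


sources = {
    "MRI_T1": read_two_columns(T1_LIST),
    "segmentation_target": read_two_columns(GROUND_TRUTH),
}
subjects = build_subjects(sources, read_two_columns(SPLIT))
for subject_id, record in subjects.items():
    print(subject_id, record["phase"], record["MRI_T1"])
```

Because the join key is the subject ID alone, a typo in an ID in any one file silently drops that subject in this sketch, which is one reason the files must share identical IDs.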

Automatic filename matching

Creating the .csv files manually can be error-prone, so NiftyNet also provides automatic file searching functionality.

Searching files by filename

The configuration parameters for filename searching are:

  • path_to_search
  • filename_contains
  • filename_not_contains

Multiple values are supported for these parameters. For example:
path_to_search = /mnt/data/image_folder_1, /mnt/shared/image_folder_2
filename_contains = img, T1
filename_not_contains = label, seg

will find all files in the folders /mnt/data/image_folder_1 and /mnt/shared/image_folder_2 whose names contain both "img" and "T1" and contain neither "label" nor "seg". The subject ID is assigned automatically as the filename with the filename_contains keywords and the file extension removed.

Based on these criteria, the file

/mnt/data/image_folder_1/T1_001_img.nii.gz

will be matched. Its subject ID will be _001_, which is T1_001_img with the matched keywords "img" and "T1" removed.
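The matching rule and the default subject ID from this example can be sketched in a few lines of Python. This is not NiftyNet's implementation: the helper names (parse_values, matches, default_subject_id), the comma splitting of multi-valued parameters and the list of recognised extensions are assumptions made for illustration. Following the description above, a name must contain every filename_contains keyword and none of the filename_not_contains keywords.

```python
import os

# Hypothetical list of recognised image extensions (assumption).
KNOWN_EXTENSIONS = (".nii.gz", ".nii", ".mha", ".mhd")


def parse_values(raw):
    """Split a multi-valued parameter such as 'img, T1' into ['img', 'T1']."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def strip_extension(name):
    """Remove the first known extension that the name ends with."""
    for ext in KNOWN_EXTENSIONS:
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


def matches(name, contains, not_contains):
    """Keep names holding every `contains` keyword and no `not_contains` keyword."""
    return (all(keyword in name for keyword in contains)
            and not any(keyword in name for keyword in not_contains))


def default_subject_id(name, contains):
    """Drop the extension, then delete each `contains` keyword."""
    subject_id = strip_extension(name)
    for keyword in contains:
        subject_id = subject_id.replace(keyword, "")
    return subject_id


contains = parse_values("img, T1")
not_contains = parse_values("label, seg")
name = os.path.basename("/mnt/data/image_folder_1/T1_001_img.nii.gz")
if matches(name, contains, not_contains):
    print(default_subject_id(name, contains))  # _001_
```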

As a result, a new line will be appended to the automatically generated .csv file:

_001_,/mnt/data/image_folder_1/T1_001_img.nii.gz

NiftyNet goes through each folder listed in path_to_search and extracts the subject IDs. A ValueError is raised when the same subject ID is extracted from more than one filename.
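A hypothetical sketch of the search over several folders, including the duplicate-ID check, is shown below. It uses only the standard library and creates throw-away folders with tempfile so that it runs anywhere; the function names, the alphabetical listing order, the extension list and the error message are assumptions, not NiftyNet's own.

```python
import os
import tempfile

# Hypothetical list of recognised extensions (assumption).
KNOWN_EXTENSIONS = (".nii.gz", ".nii")


def subject_id_of(name, contains):
    """Default ID: drop the extension, then delete each `contains` keyword."""
    for ext in KNOWN_EXTENSIONS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    for keyword in contains:
        name = name.replace(keyword, "")
    return name


def search_files(path_to_search, contains, not_contains):
    """Map subject ID -> file path over all folders; duplicate IDs are an error."""
    found = {}
    for folder in path_to_search:
        for name in sorted(os.listdir(folder)):
            full_path = os.path.join(folder, name)
            if not os.path.isfile(full_path):
                continue
            if not all(keyword in name for keyword in contains):
                continue
            if any(keyword in name for keyword in not_contains):
                continue
            subject_id = subject_id_of(name, contains)
            if subject_id in found:
                raise ValueError(
                    f"subject ID {subject_id!r} extracted from both "
                    f"{found[subject_id]} and {full_path}")
            found[subject_id] = full_path
    return found


def touch(folder, *names):
    """Create empty files so the example does not need real images."""
    os.makedirs(folder, exist_ok=True)
    for name in names:
        open(os.path.join(folder, name), "w").close()


duplicate_error = None
with tempfile.TemporaryDirectory() as root:
    folder_1 = os.path.join(root, "image_folder_1")
    folder_2 = os.path.join(root, "image_folder_2")
    touch(folder_1, "T1_001_img.nii.gz", "T1_001_img_seg.nii.gz")
    touch(folder_2, "T1_002_img.nii.gz")
    found = search_files([folder_1, folder_2], ["img", "T1"], ["label", "seg"])
    # One line per subject, in the format of the generated .csv file.
    for subject_id, path in sorted(found.items()):
        print(f"{subject_id},{path}")

    # A second file that also reduces to "_001_" makes the search fail.
    touch(folder_2, "img_001_T1.nii.gz")
    try:
        search_files([folder_1, folder_2], ["img", "T1"], ["label", "seg"])
    except ValueError as err:
        duplicate_error = str(err)
```

Since the ID depends only on the filename, two files in different folders (here T1_001_img.nii.gz and img_001_T1.nii.gz) can reduce to the same string; the ValueError turns that silent ambiguity into an explicit failure.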

Extracting subject ID

By default, the subject ID is determined automatically by removing the filename_contains keywords and the file extension from the matched filename. This behaviour can be altered by additionally specifying:

filename_removefromid = img

NiftyNet interprets the value of the filename_removefromid parameter as a regular expression and uses the output of

import re
# replace every match of filename_removefromid with an empty string
re.sub(filename_removefromid, '', input_filename)

as the subject ID.
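As a small illustration of the regular-expression behaviour, the hypothetical helper below applies re.sub in the same way as the snippet above. Whether NiftyNet passes the name with or without its extension is not stated here, so the example simply uses a name without extension. Regular-expression metacharacters such as "." must be escaped (for example with re.escape) if they are meant to match literally.

```python
import re


def subject_id_from_regex(input_filename, filename_removefromid):
    """Hypothetical helper: delete every match of the pattern from the name."""
    return re.sub(filename_removefromid, "", input_filename)


# A plain keyword removes only that substring.
print(subject_id_from_regex("T1_001_img", "img"))        # T1_001_
# An alternation removes both the prefix and the suffix.
print(subject_id_from_regex("T1_001_img", r"T1_|_img"))  # 001
```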