niftynet.io.image_sets_partitioner module

This module manages a table of subject ids and their associated image file names. A subset of the table can be retrieved by partitioning the set of images into Train, Validation and Inference subsets.

class ImageSetsPartitioner

Bases: object

This class maintains a pandas.DataFrame of filenames for all input sections. The list of filenames is obtained by searching the specified folders or by loading an existing csv file.

Users can query a subset of the dataframe by train/valid/infer partition label and input section names.

data_param = None

ratios = None

new_partition = False

data_split_file = ''

default_image_file_location = u'/home/docs/niftynet'

initialise(data_param, new_partition=False, data_split_file=None, ratios=None)

Set the data partitioner parameters.

Parameters:

data_param – the parameters corresponding to all input config sections
new_partition – bool indicating whether to generate new partition ids and overwrite the csv file (this class writes the partition file if and only if new_partition is True)
data_split_file – location of the partition id file
ratios – a tuple/list with two elements: (fraction of the validation set, fraction of the inference set). Setting it to None disables data partitioning, in which case get_file_list always returns all subjects.
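The ratios argument holds two fractions, and the training set presumably takes whatever remains. The sketch below makes that reading explicit with a hypothetical helper, partition_fractions, which is not part of NiftyNet; it assumes both fractions must be non-negative and sum to at most 1.

```python
from typing import Optional, Sequence, Tuple


def partition_fractions(
        ratios: Optional[Sequence[float]]) -> Optional[Tuple[float, float, float]]:
    """Return (train, valid, infer) fractions, or None if partitioning is off.

    Hypothetical helper: assumes the training fraction is whatever is left
    after the validation and inference fractions.
    """
    if ratios is None:
        # No ratios: partitioning is disabled and all subjects are returned.
        return None
    if len(ratios) != 2:
        raise ValueError("ratios must have two elements: (valid, infer)")
    valid, infer = (float(r) for r in ratios)
    if valid < 0 or infer < 0 or valid + infer > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return (1.0 - valid - infer, valid, infer)
```

For example, ratios=(0.25, 0.25) would leave half of the subjects for training.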

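number_of_subjects(phase=u'all')

Query the number of subjects (image sets) in the given phase.

Parameters:	phase – the subset label, one of the SUPPORTED_PHASES (defaults to 'all')
Returns:	the number of subjects in that subset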
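Counting subjects per phase amounts to counting partition labels. In this sketch a plain {subject_id: phase_label} dict stands in for the partitioner's internal partition table, and the lowercase label strings ("all", "training", ...) are assumptions.

```python
from collections import Counter
from typing import Dict

ALL = "all"  # assumed label meaning "every subject"


def number_of_subjects(partition_ids: Dict[str, str], phase: str = ALL) -> int:
    """Count the subjects assigned to `phase`.

    `partition_ids` is a hypothetical {subject_id: phase_label} mapping.
    """
    phase = phase.lower()
    if phase == ALL:
        return len(partition_ids)
    # Counter returns 0 for labels that never occur, so empty phases give 0.
    return Counter(partition_ids.values())[phase]
```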

get_file_list(phase=u'all', *section_names)

Get file names as a dataframe, selected by partition phase and section names. Set phase to ALL to load all subsets.

Parameters:

phase – the label of the subset generated by self._partition_ids; should be one of the SUPPORTED_PHASES
section_names – one or more input section names

Returns:	a pandas.DataFrame of file names
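The selection is a row filter on the phase followed by a column selection on the section names. This stdlib-only sketch uses a list of dicts in place of a pandas.DataFrame; the column name "subject_id" and the phase labels are assumptions.

```python
from typing import Dict, List

ALL = "all"  # assumed label meaning "every subject"
COLUMN_UNIQ_ID = "subject_id"  # assumed name of the subject id column


def get_file_list(file_list: List[Dict[str, str]],
                  partition_ids: Dict[str, str],
                  phase: str = ALL,
                  *section_names: str) -> List[Dict[str, str]]:
    """Select rows by phase, then keep only the requested section columns."""
    phase = phase.lower()
    rows = [row for row in file_list
            if phase == ALL or partition_ids.get(row[COLUMN_UNIQ_ID]) == phase]
    if not section_names:
        return rows  # no sections requested: keep every column
    keep = (COLUMN_UNIQ_ID,) + section_names
    return [{key: row[key] for key in keep} for row in rows]
```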

load_data_sections_by_subject()

Go through all input data sections, converting each section to a list of file names.

These lists are merged on COLUMN_UNIQ_ID.

This function sets self._file_list.
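Merging the per-section lists on the subject id can be sketched as a join over dicts. The sketch assumes inner-join semantics, so a subject is kept only if every section has a file for it; with pandas the same step would be pandas.merge(left, right, on=COLUMN_UNIQ_ID, how='inner'). The helper name merge_sections is hypothetical.

```python
from typing import Dict, List

COLUMN_UNIQ_ID = "subject_id"  # assumed name of the subject id column


def merge_sections(section_lists: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge {section: {subject_id: filename}} into one row per subject.

    Inner-join semantics (an assumption): a subject is kept only if it
    has a file in every section.
    """
    if not section_lists:
        return []
    names = list(section_lists)
    common = set(section_lists[names[0]])
    for name in names[1:]:
        common &= set(section_lists[name])
    return [
        {COLUMN_UNIQ_ID: sid, **{name: section_lists[name][sid] for name in names}}
        for sid in sorted(common)
    ]
```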

grep_files_by_data_section(modality_name)

List all files of a given input data section. If the csv_file property of data_param[modality_name] corresponds to an existing file, the list is read from that file; otherwise, the list is obtained by searching the specified folders and written to csv_file.

Returns:	a table with two columns, the column names are `(COLUMN_UNIQ_ID, modality_name)`.
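The read-or-write behaviour is a simple cache around a folder search. In this sketch, search_fn is a hypothetical callable standing in for the folder search, and each csv row is assumed to hold a (subject_id, filename) pair.

```python
import csv
import os
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, str]  # (subject_id, filename)


def grep_files_by_data_section(csv_file: Optional[str],
                               search_fn: Callable[[], List[Row]]) -> List[Row]:
    """Read the file list from csv_file if it exists, else search and cache it.

    search_fn is a hypothetical callable that scans the input folders.
    """
    if csv_file and os.path.isfile(csv_file):
        # The csv file already exists: reuse its contents.
        with open(csv_file, newline="") as f:
            return [tuple(row) for row in csv.reader(f)]
    rows = search_fn()
    if csv_file:
        # Write the freshly searched list so later runs can read it back.
        with open(csv_file, "w", newline="") as f:
            csv.writer(f).writerows(rows)
    return rows
```

On a second call with the same csv_file, the search function is not consulted at all.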

randomly_split_dataset(overwrite=False)

Label each subject as one of TRAIN, VALID or INFER, using self.ratios to compute the size of each set.

If overwrite is True, the results are written to self.data_split_file; otherwise, the function tries to read the partition labels from that file.

This function sets self._partition_ids.
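A random split with a write-or-read switch could look like the following sketch. The phase label strings, the rounding of set sizes and the seed parameter are assumptions; this version simply raises if overwrite is False and the split file is missing, which may differ from NiftyNet's own handling.

```python
import csv
import random
from typing import Dict, Optional, Sequence, Tuple

TRAIN, VALID, INFER = "training", "validation", "inference"  # assumed labels


def randomly_split_dataset(subject_ids: Sequence[str],
                           ratios: Tuple[float, float],
                           data_split_file: str,
                           overwrite: bool = False,
                           seed: Optional[int] = None) -> Dict[str, str]:
    """Return {subject_id: phase}, writing the split only when overwrite is True."""
    if not overwrite:
        # Read existing labels; raises FileNotFoundError if the file is missing.
        with open(data_split_file, newline="") as f:
            return dict(csv.reader(f))
    valid_frac, infer_frac = ratios
    n_valid = int(round(len(subject_ids) * valid_frac))
    n_infer = int(round(len(subject_ids) * infer_frac))
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    labels: Dict[str, str] = {}
    for i, sid in enumerate(shuffled):
        if i < n_valid:
            labels[sid] = VALID
        elif i < n_valid + n_infer:
            labels[sid] = INFER
        else:
            labels[sid] = TRAIN  # everything left over is used for training
    with open(data_split_file, "w", newline="") as f:
        csv.writer(f).writerows(sorted(labels.items()))
    return labels
```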

to_string(): Print a summary of the partitioner.

has_phase(phase)

Returns:	True if the phase subset of images is not empty.

has_training: True if the TRAIN subset of images is not empty.

has_inference: True if the INFER subset of images is not empty.

has_validation: True if the VALID subset of images is not empty.

validation_files: the list of validation filenames.

train_files: the list of training filenames.

inference_files: the list of inference filenames (defaults to the list of all filenames if no partition is defined).

all_files: the list of all filenames.
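The has_* properties reduce to a non-emptiness check on the corresponding file list, and inference_files falls back to all files when no partition exists. The sketch below shows that relationship with a hypothetical PartitionView class (not a NiftyNet type) built on plain dicts; the phase label strings are assumptions.

```python
from typing import Dict, List, Optional

TRAIN, VALID, INFER = "training", "validation", "inference"  # assumed labels


class PartitionView:
    """Hypothetical stand-in showing how the has_* and *_files members relate."""

    def __init__(self, files: Dict[str, str],
                 partition_ids: Optional[Dict[str, str]] = None):
        self._files = files                  # {subject_id: filename}
        self._partition_ids = partition_ids  # {subject_id: phase} or None

    def _files_in(self, phase: str) -> List[str]:
        if self._partition_ids is None:
            return []
        return [name for sid, name in self._files.items()
                if self._partition_ids.get(sid) == phase]

    def has_phase(self, phase: str) -> bool:
        return bool(self._files_in(phase))

    @property
    def has_training(self) -> bool:
        return self.has_phase(TRAIN)

    @property
    def train_files(self) -> List[str]:
        return self._files_in(TRAIN)

    @property
    def inference_files(self) -> List[str]:
        # With no partition definition, fall back to every file.
        if self._partition_ids is None:
            return self.all_files
        return self._files_in(INFER)

    @property
    def all_files(self) -> List[str]:
        return list(self._files.values())
```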

get_file_lists_by(phase=None, action='train')

Get file lists by action and phase.

This function returns file lists for training, validation or inference based on the phase or action specified by the user.

phase has higher priority: if phase is specified, the function returns the corresponding file list, wrapped in a list.

Otherwise, the function checks action: for a training action it returns the training and validation file lists; otherwise it returns the inference file list.

Parameters:

phase – an element from `{TRAIN, VALID, INFER, ALL}`
action – the application action, e.g. 'train' (the default)

Returns:	a list of file lists
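The priority rule can be sketched as a short decision function. Here file_lists is a hypothetical {phase: file_list} mapping, and the check action == "train" is an assumed simplification of how a training action is recognised.

```python
from typing import Dict, List, Optional

TRAIN, VALID, INFER = "training", "validation", "inference"  # assumed labels


def get_file_lists_by(file_lists: Dict[str, List[str]],
                      phase: Optional[str] = None,
                      action: str = "train") -> List[List[str]]:
    """Pick file lists: an explicit phase wins, otherwise decide by action.

    file_lists is a hypothetical {phase: file_list} mapping.
    """
    if phase:
        return [file_lists[phase]]  # a single list, wrapped in a list
    if action == "train":
        return [file_lists[TRAIN], file_lists[VALID]]
    return [file_lists[INFER]]
```

Returning a list in every branch lets callers iterate over the result without checking which case applied.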

reset(): Reset all fields of this singleton class.
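Because every caller shares one instance, reset() is the way to restore a clean state. One common Python idiom for a singleton is a metaclass that caches the first instance; the sketch below uses it with a hypothetical Partitioner class whose reset() restores the documented default attribute values. Whether NiftyNet implements its singleton this way is not stated in this reference.

```python
class Singleton(type):
    """Metaclass that returns the same instance on every call."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Partitioner(metaclass=Singleton):
    """Hypothetical stand-in for a singleton partitioner with reset()."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Restore every field to its default so the shared instance is clean.
        self.data_param = None
        self.ratios = None
        self.new_partition = False
        self.data_split_file = ""
```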

__init__: x.__init__(…) initializes x; see help(type(x)) for signature