niftynet.io.image_sets_partitioner module

This module manages a table of subject ids and their associated image file names. A subset of the table can be retrieved by partitioning the set of images into subsets of Train, Validation, Inference.

class ImageSetsPartitioner[source]

Bases: object

This class maintains a pandas.DataFrame of filenames for all input sections. The list of filenames is obtained by searching the specified folders or by loading an existing csv file.

Users can query a subset of the dataframe by train/valid/infer partition label and input section names.

data_param = None
ratios = None
new_partition = False
data_split_file = ''
default_image_file_location = u'/home/docs/niftynet'
initialise(data_param, new_partition=False, data_split_file=None, ratios=None)[source]

Set the data partitioner parameters

Parameters:
  • data_param – corresponding to all config sections
  • new_partition – bool value indicating whether to generate new partition ids and overwrite the csv file (this class writes the partition file if and only if new_partition is True)
  • data_split_file – location of the partition id file
  • ratios – a tuple/list with two elements: (fraction of the validation set, fraction of the inference set). Initialising to None disables data partitioning, and get_file_list then always returns all subjects.
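
To make the meaning of ratios concrete, the following is a minimal sketch of how a (validation fraction, inference fraction) pair could translate into subject counts. The helper name split_counts, the rounding up, and the clamping are assumptions for illustration; they are not taken from the NiftyNet source.

```python
import math


def split_counts(n_subjects, ratios):
    """Return (n_train, n_valid, n_infer), or None when partitioning is disabled.

    Hypothetical helper: rounding up and clamping are assumptions.
    """
    if ratios is None:
        # ratios=None disables partitioning; there are no per-phase counts.
        return None
    valid_fraction, infer_fraction = ratios
    # Clamp each fraction to the range [0, 1].
    valid_fraction = min(max(float(valid_fraction), 0.0), 1.0)
    infer_fraction = min(max(float(infer_fraction), 0.0), 1.0)
    # Round up, but never hand out more subjects than are available.
    n_valid = min(n_subjects, math.ceil(n_subjects * valid_fraction))
    n_infer = min(n_subjects - n_valid, math.ceil(n_subjects * infer_fraction))
    # Whatever is left over is used for training.
    n_train = n_subjects - n_valid - n_infer
    return n_train, n_valid, n_infer
```

Under these assumptions, the training set is simply the remainder after the validation and inference sets have been sized.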
number_of_subjects(phase=u'all')[source]

Query the number of images according to phase.

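Parameters:phase – the partition phase to count; one of the SUPPORTED_PHASES
Returns:the number of subjects in the given phase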
get_file_list(phase=u'all', *section_names)[source]

Get file names as a dataframe, selected by partitioning phase and section names. Set phase to ALL to load all subsets.

Parameters:
  • phase – the label of the subset generated by self._partition_ids; should be one of the SUPPORTED_PHASES
  • section_names – one or multiple input section names
Returns:

a pandas.DataFrame of file names
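
The selection by phase and section names can be sketched without pandas. In the example below, plain dictionaries stand in for the dataframe; the function name select_files, the 'subject_id' column name, and the string values of the phase labels are assumptions for illustration only.

```python
# Phase labels; the string values are illustrative assumptions.
ALL, TRAIN, VALID, INFER = 'all', 'train', 'validation', 'inference'
SUPPORTED_PHASES = {ALL, TRAIN, VALID, INFER}


def select_files(file_table, partition_ids, phase=ALL, *section_names):
    """Pick rows by phase and columns by section name.

    file_table:    {subject_id: {section_name: filename}}
    partition_ids: {subject_id: phase label}, or None if partitioning is off
    """
    if phase not in SUPPORTED_PHASES:
        raise ValueError('unsupported phase: {!r}'.format(phase))
    rows = []
    for subject_id in sorted(file_table):
        # Without partition ids, or with phase ALL, every subject is kept.
        if partition_ids is not None and phase != ALL:
            if partition_ids.get(subject_id) != phase:
                continue
        files = file_table[subject_id]
        # No section names given: return every section of the subject.
        names = section_names or sorted(files)
        row = {'subject_id': subject_id}
        row.update({name: files[name] for name in names})
        rows.append(row)
    return rows
```

Passing partition_ids=None mirrors the documented behaviour that all subjects are returned when partitioning is disabled.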

load_data_sections_by_subject()[source]

Go through all input data sections, converting each section to a list of file names.

These lists are merged on COLUMN_UNIQ_ID.

This function sets self._file_list.
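
As a rough illustration of merging per-section lists on the subject id, the sketch below builds one row per subject from several section tables. The function name merge_sections, the value of COLUMN_UNIQ_ID, and the choice to keep subjects that are missing from some sections (an outer join, with None for the missing file) are assumptions, not confirmed behaviour of this class.

```python
# Illustrative column name; the real value of COLUMN_UNIQ_ID may differ.
COLUMN_UNIQ_ID = 'subject_id'


def merge_sections(section_tables):
    """Merge {section_name: [(subject_id, filename), ...]} into one table.

    Returns a list of rows (dicts), one per subject, sorted by subject id.
    A subject missing from a section gets None in that column (assumption).
    """
    merged = {}
    for section_name, rows in section_tables.items():
        for subject_id, filename in rows:
            merged.setdefault(subject_id, {})[section_name] = filename
    table = []
    for subject_id in sorted(merged):
        row = {COLUMN_UNIQ_ID: subject_id}
        for section_name in section_tables:
            row[section_name] = merged[subject_id].get(section_name)
        table.append(row)
    return table
```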

grep_files_by_data_section(modality_name)[source]
List all files for a given input data section.

If the csv_file property of data_param[modality_name] corresponds to an existing file, read the list from that file; otherwise search for the files and write the list to csv_file.

Returns:a table with two columns, named (COLUMN_UNIQ_ID, modality_name).
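
The read-from-csv-or-search behaviour can be sketched as follows. The function name list_section_files, the search_dir and suffix parameters, and the rule for deriving a subject id from a file name are all hypothetical; the real class takes its search settings from data_param.

```python
import csv
import os


def list_section_files(csv_file, search_dir, suffix='.nii.gz'):
    """Return [(subject_id, file_path), ...] for one input section.

    If csv_file exists, the list is read from it. Otherwise search_dir is
    scanned and the resulting list is written to csv_file.
    """
    if os.path.isfile(csv_file):
        with open(csv_file, newline='') as handle:
            return [tuple(row) for row in csv.reader(handle) if row]
    rows = []
    for name in sorted(os.listdir(search_dir)):
        if name.endswith(suffix):
            # Assumption: the subject id is the file name without the suffix.
            subject_id = name[:-len(suffix)]
            rows.append((subject_id, os.path.join(search_dir, name)))
    with open(csv_file, 'w', newline='') as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

One consequence of this pattern is that an existing csv file takes precedence over the folder contents, so a stale csv file has to be removed before a new search is performed.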
randomly_split_dataset(overwrite=False)[source]

Label each subject as one of TRAIN, VALID, INFER, using self.ratios to compute the size of each set.

The results are written to self.data_split_file if overwrite is True; otherwise the function tries to read the partition labels from that file.

This function sets self._partition_ids.
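
A minimal sketch of the write-or-read logic described above is given below. The function name split_dataset, its parameters (including the seed), the label strings, and the csv layout of one "subject id, label" pair per row are assumptions for illustration.

```python
import csv
import random


def split_dataset(subject_ids, counts, split_file, overwrite=False, seed=None):
    """Return {subject_id: label}, writing or reading split_file.

    counts is (n_train, n_valid, n_infer) and must add up to the number
    of subjects. The label strings are illustrative.
    """
    if not overwrite:
        # Reuse the labels stored by an earlier run.
        with open(split_file, newline='') as handle:
            return {row[0]: row[1] for row in csv.reader(handle) if row}
    n_train, n_valid, n_infer = counts
    if n_train + n_valid + n_infer != len(subject_ids):
        raise ValueError('counts must add up to the number of subjects')
    labels = (['train'] * n_train
              + ['validation'] * n_valid
              + ['inference'] * n_infer)
    # Shuffle the labels so that the assignment to subjects is random.
    random.Random(seed).shuffle(labels)
    partition = dict(zip(subject_ids, labels))
    with open(split_file, 'w', newline='') as handle:
        csv.writer(handle).writerows(partition.items())
    return partition
```

Storing the labels in a file means that later runs with overwrite=False see the same split, which keeps training and inference subjects separate across runs.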

to_string()[source]

Print summary of the partitioner.

has_phase(phase)[source]
Returns:True if the phase subset of images is not empty.
has_training

Returns:True if the TRAIN subset of images is not empty.
has_inference

Returns:True if the INFER subset of images is not empty.
has_validation

Returns:True if the VALID subset of images is not empty.
validation_files

Returns:the list of validation filenames.
train_files

Returns:the list of training filenames.
inference_files

Returns:the list of inference filenames (defaults to the list of all filenames if no partition is defined).
all_files

Returns:the list of all filenames.
get_file_lists_by(phase=None, action='train')[source]

Get file lists by action and phase.

This function returns file lists for training/validation/inference based on the phase or action specified by the user.

phase has the higher priority: if phase is specified, the function returns the corresponding file list (as a list).

Otherwise, the function checks action: it returns the train and validation file lists for a training action, and the inference file list for any other action.

Parameters:
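  • action – the action to be performed, e.g. 'train' (the default); only used when phase is not specified
  • phase – an element from {TRAIN, VALID, INFER, ALL}
Returns:the file list(s) selected by phase or action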
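
The priority of phase over action can be sketched as a small dispatch function. The name file_lists_by, the lists argument, and the phase label strings are hypothetical; wrapping the single phase result in a one-element list is one reading of "as a list" and is an assumption.

```python
def file_lists_by(lists, phase=None, action='train'):
    """Choose file lists by phase first, then by action.

    lists maps a phase label to its file list, e.g.
    {'train': [...], 'validation': [...], 'inference': [...], 'all': [...]}.
    """
    if phase is not None:
        # phase wins over action; the result is wrapped in a list (assumption).
        return [lists[phase]]
    if action.lower().startswith('train'):
        # A training action uses both the train and the validation lists.
        return [lists['train'], lists['validation']]
    # Any other action falls back to the inference list.
    return [lists['inference']]
```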

reset()[source]

Reset all fields of this singleton class.

__init__

x.__init__(…) initializes x; see help(type(x)) for signature