This module manages a table of subject ids and their associated image file names. A subset of the table can be retrieved by partitioning the set of images into Train, Validation, and Inference subsets.

class ImageSetsPartitioner[source]

Bases: object

This class maintains a pandas.DataFrame of filenames for all input sections. The list of filenames is obtained by searching the specified folders or by loading from an existing csv file.

Users can query a subset of the dataframe by train/valid/infer partition label and input section names.

data_param = None
ratios = None
new_partition = False
data_split_file = ''
default_image_file_location = u'/home/docs/niftynet'
initialise(data_param, new_partition=False, data_split_file=None, ratios=None)[source]

Set the data partitioner parameters

  • data_param – parameters corresponding to all config sections
  • new_partition – bool value indicating whether to generate new partition ids and overwrite the csv file (this class writes the partition file if and only if new_partition is True)
  • data_split_file – location of the partition id file
  • ratios – a tuple/list with two elements: (fraction of the validation set, fraction of the inference set). Setting ratios to None disables data partitioning, and get_file_list then always returns all subjects.

Query the number of images according to phase.

get_file_list(phase=u'all', *section_names)[source]

Get file names as a dataframe, selected by partitioning phase and section names. Set phase to ALL to load all subsets.

  • phase – the label of the subset generated by self._partition_ids; it should be one of the SUPPORTED_PHASES
  • section_names – one or multiple input section names

Returns: a pandas.DataFrame of file names
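The selection described for get_file_list can be sketched without pandas. This is only an illustration, not the library's implementation: plain dicts stand in for the dataframe, and the names file_table, partition_ids, select_files, and the phase strings other than 'all' are hypothetical.

```python
# Hypothetical phase labels; only 'all' appears in the documented signature.
ALL, TRAIN, VALID, INFER = "all", "train", "validation", "inference"

# Illustrative stand-in for the file name table: subject id -> {section: file}.
file_table = {
    "subj_1": {"image": "subj_1_img.nii.gz", "label": "subj_1_seg.nii.gz"},
    "subj_2": {"image": "subj_2_img.nii.gz", "label": "subj_2_seg.nii.gz"},
    "subj_3": {"image": "subj_3_img.nii.gz", "label": "subj_3_seg.nii.gz"},
}

# Illustrative stand-in for the partition ids: subject id -> phase label.
partition_ids = {"subj_1": TRAIN, "subj_2": VALID, "subj_3": INFER}


def select_files(phase=ALL, *section_names):
    """Return one row per subject in `phase`, restricted to `section_names`."""
    rows = []
    for subject_id in sorted(file_table):
        # Keep every subject for ALL, otherwise only the matching partition.
        if phase != ALL and partition_ids.get(subject_id) != phase:
            continue
        files = file_table[subject_id]
        # No section names given: keep all sections.
        names = section_names or tuple(files)
        row = {"subject_id": subject_id}
        row.update({name: files[name] for name in names})
        rows.append(row)
    return rows


train_images = select_files(TRAIN, "image")
everything = select_files()
```

Here train_images holds a single row for subj_1 with only the image column, while everything holds all three subjects with both sections.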


Go through all input data sections, converting each section to a list of file names.

These lists are merged on COLUMN_UNIQ_ID.

This function sets self._file_list.
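As a rough illustration of merging the per-section lists on the unique subject id, the sketch below joins plain dicts. It assumes an outer join (subjects missing from a section are kept with None), which the text above does not state; merge_on_subject_id and the "subject_id" key are hypothetical names.

```python
def merge_on_subject_id(section_tables):
    """Join {section: {subject_id: file_name}} into one row per subject.

    Assumption: outer join, so a subject missing from a section gets None.
    """
    all_ids = sorted(set().union(*(t.keys() for t in section_tables.values())))
    merged = []
    for subject_id in all_ids:
        row = {"subject_id": subject_id}
        for section, table in section_tables.items():
            row[section] = table.get(subject_id)
        merged.append(row)
    return merged


# Illustrative data: subj_2 has an image but no label.
sections = {
    "image": {"subj_1": "subj_1_img.nii.gz", "subj_2": "subj_2_img.nii.gz"},
    "label": {"subj_1": "subj_1_seg.nii.gz"},
}
merged = merge_on_subject_id(sections)
```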

List all files for a given input data section: if the csv_file property of data_param[modality_name] corresponds to an existing file, read the list from that file; otherwise, search the specified folders and write the resulting list to csv_file.

Returns: a table with two columns, named (COLUMN_UNIQ_ID, modality_name).
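The read-or-write behaviour for a section's csv file might look like the following sketch. It uses the standard csv module rather than pandas, and load_or_create_section_list, search_fn, and the file paths are hypothetical names introduced for illustration only.

```python
import csv
import os
import tempfile


def load_or_create_section_list(csv_file, search_fn):
    """Return [(subject_id, file_name), ...] for one input section.

    If csv_file exists, read the list from it; otherwise call search_fn()
    (a stand-in for the folder search) and cache its result in csv_file.
    """
    if os.path.isfile(csv_file):
        with open(csv_file, newline="") as handle:
            return [tuple(row) for row in csv.reader(handle) if row]
    rows = sorted(search_fn())
    with open(csv_file, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows


def fake_search():
    # Stand-in for searching the configured folders.
    return [("subj_2", "/data/subj_2_img.nii.gz"),
            ("subj_1", "/data/subj_1_img.nii.gz")]


with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "image.csv")
    # First call: no csv yet, so the search runs and the csv is written.
    first = load_or_create_section_list(csv_path, fake_search)
    # Second call: the csv exists, so the search function is never used.
    second = load_or_create_section_list(csv_path, lambda: [])
```

Both calls return the same two-column table; the second one is served entirely from the cached csv file.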

Label each subject as one of TRAIN, VALID, or INFER, using self.ratios to compute the size of each set.

The results are written to self.data_split_file if overwrite is set; otherwise, the function tries to read the partition labels from that file.

This function sets self._partition_ids.
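One way the set sizes could follow from ratios is sketched below. The rounding rule (ceiling), the random shuffle, and the names partition_subjects and seed are assumptions made for this example; the text above only says that ratios determines the size of each set.

```python
import math
import random

# Hypothetical phase labels used only in this sketch.
TRAIN, VALID, INFER = "train", "validation", "inference"


def partition_subjects(subject_ids, ratios, seed=0):
    """Assign a phase label to each subject id.

    ratios is (validation fraction, inference fraction); the remaining
    subjects go to training. Ceiling rounding is an assumption.
    """
    valid_fraction, infer_fraction = ratios
    n_total = len(subject_ids)
    n_valid = int(math.ceil(n_total * valid_fraction))
    n_infer = int(math.ceil(n_total * infer_fraction))
    n_train = n_total - n_valid - n_infer
    if n_train < 0:
        raise ValueError("validation and inference fractions are too large")
    labels = [TRAIN] * n_train + [VALID] * n_valid + [INFER] * n_infer
    # Shuffle the labels so that assignment does not depend on subject order.
    random.Random(seed).shuffle(labels)
    return dict(zip(subject_ids, labels))


subjects = ["subj_%d" % i for i in range(8)]
partition = partition_subjects(subjects, ratios=(0.25, 0.25))
```

With eight subjects and ratios of (0.25, 0.25), two subjects are labelled for validation, two for inference, and the remaining four for training.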


Print summary of the partitioner.

Returns: True if the phase subset of images is not empty.

return – True if the TRAIN subset of images is not empty.


return – True if the INFER subset of images is not empty.


return – True if the VALID subset of images is not empty.


return – the list of validation filenames.


return – the list of training filenames.


return – the list of inference filenames (defaulting to list of all filenames if no partition definition)


return – list of all filenames

get_file_lists_by(phase=None, action='train')[source]

Get file lists by action and phase.

This function returns file lists for training/validation/inference based on the phase or action specified by the user.

phase has the higher priority: if phase is specified, the function returns the corresponding file list (as a list).

Otherwise, the function checks action: it returns the train and validation file lists for a training action, and the inference file list for any other action.

  • action – an action
  • phase – an element from {TRAIN, VALID, INFER, ALL}
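The priority rule between phase and action can be sketched as follows. This is a simplified reading of the description, not the library's code: file_lists_by and the lists dict are hypothetical, the action is matched by simple equality with 'train', and wrapping the phase result in a one-element list is only one interpretation of "as a list".

```python
def file_lists_by(lists, phase=None, action="train"):
    """Pick file lists by phase first, then by action.

    lists maps a phase label to its file list (illustrative stand-in).
    """
    if phase:
        # phase wins over action whenever it is given.
        return [lists[phase]]
    if action == "train":
        # Training uses both the train and validation file lists.
        return [lists["train"], lists["validation"]]
    # Any other action falls back to the inference file list.
    return [lists["inference"]]


lists = {
    "train": ["subj_1", "subj_2"],
    "validation": ["subj_3"],
    "inference": ["subj_4"],
}
by_phase = file_lists_by(lists, phase="validation", action="train")
by_train = file_lists_by(lists, action="train")
by_infer = file_lists_by(lists, action="inference")
```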


Reset all fields of this singleton class.


x.__init__(…) initializes x; see help(type(x)) for signature