niftynet.io.image_sets_partitioner module
This module manages a table of subject ids and
their associated image file names.
A subset of the table can be retrieved by partitioning the set of images into
subsets of Train, Validation, Inference.
-
class ImageSetsPartitioner
Bases: object
This class maintains a pandas.DataFrame of filenames for all input sections. The list of filenames is obtained by searching the specified folders or by loading from an existing csv file.
Users can query a subset of the dataframe by train/valid/infer partition label and input section names.
-
data_param = None
-
ratios = None
-
new_partition = False
-
data_split_file = ''
-
default_image_file_location = u'/home/docs/niftynet'
-
initialise(data_param, new_partition=False, data_split_file=None, ratios=None)
Set the data partitioner parameters.
Parameters: - data_param – corresponding to all config sections
- new_partition – bool value indicating whether to generate new partition ids and overwrite the csv file (this class writes the partition file if and only if new_partition is True)
- data_split_file – location of the partition id file
- ratios – a tuple/list with two elements: (fraction of the validation set, fraction of the inference set). Initialising ratios to None disables data partitioning, and get_file_list always returns all subjects.
-
number_of_subjects(phase=u'all')
Query the number of images according to phase.
Parameters: phase –
Returns:
-
get_file_list(phase=u'all', *section_names)
Get file names as a dataframe, by partitioning phase and section names. Set phase to ALL to load all subsets.
Parameters: - phase – the label of the subset generated by self._partition_ids; should be one of the SUPPORTED_PHASES
- section_names – one or multiple input section names
Returns: a pandas.DataFrame of file names
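The query by phase and section names can be pictured with a small stdlib-only sketch. It is not NiftyNet's implementation: the table here is a plain list of dicts rather than a pandas dataframe, and `select_files`, `FILE_TABLE`, `PARTITION`, the column names, and the phase strings are all hypothetical names chosen for illustration.

```python
# Hypothetical stand-in for the file table: one row per subject,
# one column per input section (all names are illustrative only).
FILE_TABLE = [
    {"subject_id": "s01", "image": "s01_img.nii", "label": "s01_seg.nii"},
    {"subject_id": "s02", "image": "s02_img.nii", "label": "s02_seg.nii"},
    {"subject_id": "s03", "image": "s03_img.nii", "label": "s03_seg.nii"},
]
# Hypothetical partition labels, keyed by subject id.
PARTITION = {"s01": "train", "s02": "validation", "s03": "inference"}


def select_files(phase="all", *section_names):
    """Return rows for one phase, restricted to the requested sections."""
    rows = [
        row for row in FILE_TABLE
        if phase == "all" or PARTITION.get(row["subject_id"]) == phase
    ]
    if not section_names:
        return rows  # no section filter: keep every column
    keep = ("subject_id",) + section_names
    return [{key: row[key] for key in keep} for row in rows]


train_images = select_files("train", "image")
everything = select_files("all")
```

Passing the phase first and the section names as trailing positional arguments mirrors the documented signature; everything else in the sketch is an assumption.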
-
load_data_sections_by_subject()
Go through all input data sections, converting each section to a list of file names.
These lists are merged on COLUMN_UNIQ_ID.
This function sets self._file_list.
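To make the merge on a subject id column concrete, here is a minimal stdlib-only sketch. It assumes an outer merge (subjects missing from a section get `None`); the source does not state which kind of merge is used. The helper `merge_on_subject_id` and the `subject_id` key are hypothetical names.

```python
def merge_on_subject_id(sections):
    """Merge {section_name: {subject_id: file_name}} into one row per subject.

    Assumption: subjects missing from a section get None (an outer merge).
    """
    subject_ids = sorted({sid for files in sections.values() for sid in files})
    return [
        {"subject_id": sid,
         **{name: files.get(sid) for name, files in sections.items()}}
        for sid in subject_ids
    ]


# Two hypothetical input sections; "s02" has no label file.
sections = {
    "image": {"s01": "s01_img.nii", "s02": "s02_img.nii"},
    "label": {"s01": "s01_seg.nii"},
}
merged = merge_on_subject_id(sections)
```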
-
grep_files_by_data_section(modality_name)
List all files by a given input data section: if the csv_file property of data_param[modality_name] corresponds to a file, read the list from the file; otherwise write the list to csv_file.
Returns: a table with two columns, the column names are (COLUMN_UNIQ_ID, modality_name).
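The read-if-present, otherwise search-and-write behaviour can be sketched with the standard library alone. This is an illustration, not the NiftyNet code: `list_section_files` is a hypothetical helper, and deriving the subject id from the file name without its extension is an assumption made only for the example.

```python
import csv
import os
import tempfile


def list_section_files(csv_file, search_dir):
    """Return [(subject_id, file_name)] from csv_file, or build and save it."""
    if os.path.isfile(csv_file):
        with open(csv_file, newline="") as handle:
            return [tuple(row) for row in csv.reader(handle)]
    # Assumption: the subject id is the file name without its extension.
    rows = [
        (os.path.splitext(name)[0], os.path.join(search_dir, name))
        for name in sorted(os.listdir(search_dir))
    ]
    with open(csv_file, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as workdir:
    data_dir = os.path.join(workdir, "images")
    os.mkdir(data_dir)
    for name in ("s01.nii", "s02.nii"):
        open(os.path.join(data_dir, name), "w").close()
    csv_path = os.path.join(workdir, "image.csv")
    first = list_section_files(csv_path, data_dir)   # searches, writes csv
    second = list_section_files(csv_path, data_dir)  # reads the csv back
```

The second call never touches the image folder, which is the point of caching the list in a csv file.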
-
randomly_split_dataset(overwrite=False)
Label each subject as one of TRAIN, VALID, INFER, using self.ratios to compute the size of each set.
The results are written to self.data_split_file if overwrite is True; otherwise the function tries to read partition labels from that file.
This function sets self._partition_ids.
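A minimal sketch of ratio-based labelling, assuming that the two ratios are the validation and inference fractions and that training takes the remainder. The helper `split_subjects`, its `seed` argument, and the rounding rule are assumptions for illustration; reading from and writing to the split file is omitted.

```python
import random


def split_subjects(subject_ids, ratios, seed=0):
    """Label subjects TRAIN/VALID/INFER from (valid_fraction, infer_fraction)."""
    valid_fraction, infer_fraction = ratios
    if min(valid_fraction, infer_fraction) < 0 or valid_fraction + infer_fraction > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_valid = int(round(len(shuffled) * valid_fraction))
    n_infer = int(round(len(shuffled) * infer_fraction))
    labels = {}
    for index, subject_id in enumerate(shuffled):
        if index < n_valid:
            labels[subject_id] = "VALID"
        elif index < n_valid + n_infer:
            labels[subject_id] = "INFER"
        else:
            labels[subject_id] = "TRAIN"  # training takes the remainder
    return labels


labels = split_subjects(["s%02d" % i for i in range(10)], ratios=(0.2, 0.1))
counts = {phase: list(labels.values()).count(phase)
          for phase in ("TRAIN", "VALID", "INFER")}
```

With ten subjects and ratios of (0.2, 0.1), two subjects are labelled VALID, one INFER, and the remaining seven TRAIN; which subjects land in each set depends on the seed.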
-
has_training
return – True if the TRAIN subset of images is not empty.
-
has_inference
return – True if the INFER subset of images is not empty.
-
has_validation
return – True if the VALID subset of images is not empty.
-
validation_files
return – the list of validation filenames.
-
train_files
return – the list of training filenames.
-
inference_files
return – the list of inference filenames (defaulting to the list of all filenames if no partition is defined).
-
all_files
return – the list of all filenames.
-
get_file_lists_by(phase=None, action='train')
Get file lists by action and phase.
This function returns file lists for training/validation/inference based on the phase or action specified by the user.
phase has a higher priority: if phase is specified, the function returns the corresponding file list (as a list). Otherwise, the function checks action: it returns the train and validation file lists if it is a training action, otherwise it returns the inference file list.
Parameters: - action – an action
- phase – an element from {TRAIN, VALID, INFER, ALL}
Returns:
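The priority rule (phase first, then action) can be sketched as follows. This is a hedged illustration rather than the library's code: `file_lists_by`, the dict of file lists, the phase strings, and the exact return shapes are assumptions.

```python
def file_lists_by(files, phase=None, action="train"):
    """Pick file lists from `files`, a dict keyed by phase name.

    Assumption: the return shapes (one list for a phase, a pair of lists for
    training, a one-element tuple for inference) are illustrative only.
    """
    if phase is not None:
        if phase == "all":
            return [name for names in files.values() for name in names]
        return files[phase]  # phase takes priority over action
    if action == "train":
        return files["train"], files["validation"]
    return (files["inference"],)


files = {
    "train": ["s01.nii", "s02.nii"],
    "validation": ["s03.nii"],
    "inference": ["s04.nii"],
}
by_phase = file_lists_by(files, phase="validation", action="train")
by_train_action = file_lists_by(files, action="train")
by_other_action = file_lists_by(files, action="inference")
```

In the first call the action is ignored because a phase was given, which is the behaviour the description above assigns to phase.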
-
__init__
x.__init__(…) initializes x; see help(type(x)) for signature