niftynet.io.image_sets_partitioner module
This module manages a table of subject ids and
their associated image file names.
A subset of the table can be retrieved by partitioning the set of images into
subsets of Train, Validation, Inference.
-
class ImageSetsPartitioner
Bases: object
This class maintains a pandas.DataFrame of filenames for all input sections. The list of filenames is obtained by searching the specified folders or by loading an existing csv file.
Users can query a subset of the dataframe by train/valid/infer partition label and input section names.
-
data_param = None
-
ratios = None
-
new_partition = False
-
data_split_file = ''
-
default_image_file_location = u'/home/docs/niftynet'
-
initialise(data_param, new_partition=False, data_split_file=None, ratios=None)
Set the data partitioner parameters.
Parameters:
- data_param – corresponding to all config sections
- new_partition – bool value indicating whether to generate new partition ids and overwrite the csv file (this class writes the partition file if and only if new_partition is True)
- data_split_file – location of the partition id file
- ratios – a tuple/list with two elements: (fraction of the validation set, fraction of the inference set). Setting ratios to None disables data partitioning, and get_file_list then always returns all subjects.
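As a rough illustration of how the two-element ratios value could translate into subset sizes, here is a minimal sketch. The helper name split_sizes is hypothetical, and the rounding rule (truncate the validation and inference counts, give the remainder to training) is an assumption rather than NiftyNet's documented behaviour.

```python
def split_sizes(n_subjects, ratios):
    """Return (n_train, n_valid, n_infer) for ratios = (valid_fraction, infer_fraction).

    Hypothetical helper; the rounding rule is an assumption.
    """
    valid_fraction, infer_fraction = ratios
    if valid_fraction < 0 or infer_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if valid_fraction + infer_fraction > 1:
        raise ValueError("fractions must sum to at most 1")
    # Truncate the two requested subsets; training receives whatever is left.
    n_valid = int(n_subjects * valid_fraction)
    n_infer = int(n_subjects * infer_fraction)
    n_train = n_subjects - n_valid - n_infer
    return n_train, n_valid, n_infer


sizes = split_sizes(8, (0.25, 0.25))
```

With 8 subjects and ratios of (0.25, 0.25), this sketch assigns 2 subjects to validation, 2 to inference, and the remaining 4 to training.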
-
number_of_subjects(phase=u'all')
Query the number of images according to phase.
Parameters: phase – the partition phase to query (default u'all')
Returns: the number of images in that phase
-
get_file_list(phase=u'all', *section_names)
Get file names as a dataframe, by partitioning phase and section names. Set phase to ALL to load all subsets.
Parameters:
- phase – the label of the subset generated by self._partition_ids; should be one of the SUPPORTED_PHASES
- section_names – one or multiple input section names
Returns: a pandas.DataFrame of file names
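The selection by phase and section names can be pictured with a plain-Python stand-in for the dataframe. Everything in this sketch is hypothetical: the table contents, the phase labels, the select_files helper, and the choice to return every column when no section name is given.

```python
# Hypothetical stand-ins: the real class stores a pandas dataframe and its own
# phase constants; the labels and file names below are made up for illustration.
FILE_TABLE = [
    {"subject_id": "s1", "T1": "s1_T1.nii", "label": "s1_seg.nii"},
    {"subject_id": "s2", "T1": "s2_T1.nii", "label": "s2_seg.nii"},
    {"subject_id": "s3", "T1": "s3_T1.nii", "label": "s3_seg.nii"},
]
PARTITION_IDS = {"s1": "train", "s2": "valid", "s3": "infer"}


def select_files(phase="all", *section_names):
    """Return the rows of one phase, restricted to the requested section columns."""
    rows = [
        row for row in FILE_TABLE
        if phase == "all" or PARTITION_IDS[row["subject_id"]] == phase
    ]
    if not section_names:
        # Assumption: with no section names, keep every column.
        return rows
    columns = ("subject_id",) + section_names
    return [{name: row[name] for name in columns} for row in rows]


train_t1 = select_files("train", "T1")
everything = select_files()
```

Here train_t1 holds only the T1 file of the single training subject, while everything holds all three rows with all columns.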
-
load_data_sections_by_subject()
Go through all input data sections, converting each section to a list of file names.
These lists are merged on COLUMN_UNIQ_ID.
This function sets self._file_list.
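Merging per-section file lists on a shared subject id can be sketched as follows. The merge_sections helper and the "subject_id" key are hypothetical, and keeping subjects that are missing from some sections (an outer-join-like merge) is an assumption about the behaviour, not something this page states.

```python
def merge_sections(section_tables, key="subject_id"):
    """Merge several per-section tables (lists of dicts) on a shared id column.

    Hypothetical helper: subjects missing from a section are kept, with that
    section's column simply absent from their row.
    """
    merged = {}
    for table in section_tables:
        for row in table:
            # Create the subject's row on first sight, then add this section's column.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[subject] for subject in sorted(merged)]


t1_table = [
    {"subject_id": "a", "T1": "a_T1.nii"},
    {"subject_id": "b", "T1": "b_T1.nii"},
]
label_table = [{"subject_id": "a", "label": "a_seg.nii"}]
file_list = merge_sections([t1_table, label_table])
```

In this example subject "a" ends up with both a T1 and a label entry, while subject "b" has only a T1 entry.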
-
grep_files_by_data_section(modality_name)
List all files for a given input data section: if the csv_file property of data_param[modality_name] corresponds to a file, read the list from the file; otherwise, write the list to csv_file.
Returns: a table with two columns; the column names are (COLUMN_UNIQ_ID, modality_name).
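The read-if-present, otherwise-search-and-write pattern described above might look like the sketch below. The list_section_files helper is hypothetical, and deriving the subject id from the file name up to the first dot is an assumption made only to keep the example small.

```python
import csv
import os
import tempfile


def list_section_files(csv_file, search_dir):
    """Return (subject_id, path) rows, using csv_file as a cache.

    Hypothetical helper; the subject-id rule is an assumption.
    """
    if os.path.isfile(csv_file):
        # The csv already exists: read the list from it.
        with open(csv_file, newline="") as handle:
            return [tuple(row) for row in csv.reader(handle)]
    # Otherwise search the folder and write the list to csv_file.
    rows = []
    for name in sorted(os.listdir(search_dir)):
        path = os.path.join(search_dir, name)
        if os.path.isfile(path):
            rows.append((name.split(".")[0], path))
    with open(csv_file, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as workdir:
    image_dir = os.path.join(workdir, "images")
    os.mkdir(image_dir)
    for name in ("subj1.nii", "subj2.nii"):
        open(os.path.join(image_dir, name), "w").close()
    csv_path = os.path.join(workdir, "T1.csv")
    first = list_section_files(csv_path, image_dir)   # searches, then writes the csv
    second = list_section_files(csv_path, image_dir)  # reads the csv written above
```

The second call never touches the image folder; it returns the rows cached in the csv file by the first call.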
-
randomly_split_dataset(overwrite=False)
Label each subject as one of TRAIN, VALID, INFER, using self.ratios to compute the size of each set.
The results are written to self.data_split_file if overwrite is True; otherwise the function tries to read partition labels from that file.
This function sets self._partition_ids.
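A minimal sketch of this write-or-read behaviour is shown below. The split_subjects helper, its seed argument, the label strings, and the two-column csv layout are all hypothetical; the real phase constants and file format may differ.

```python
import csv
import os
import random
import tempfile


def split_subjects(subject_ids, ratios, split_file, overwrite=False, seed=None):
    """Return {subject_id: label}; write split_file only when overwrite is True.

    Hypothetical helper; labels and rounding are assumptions.
    """
    if not overwrite:
        # Try to read existing partition labels (raises if the file is missing).
        with open(split_file, newline="") as handle:
            return dict(csv.reader(handle))
    valid_fraction, infer_fraction = ratios
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    n_infer = int(len(shuffled) * infer_fraction)
    labels = {}
    for index, subject in enumerate(shuffled):
        if index < n_valid:
            labels[subject] = "valid"
        elif index < n_valid + n_infer:
            labels[subject] = "infer"
        else:
            labels[subject] = "train"
    with open(split_file, "w", newline="") as handle:
        csv.writer(handle).writerows(sorted(labels.items()))
    return labels


with tempfile.TemporaryDirectory() as workdir:
    split_path = os.path.join(workdir, "dataset_split.csv")
    subjects = ["s%d" % i for i in range(8)]
    written = split_subjects(subjects, (0.25, 0.25), split_path, overwrite=True, seed=0)
    reloaded = split_subjects(subjects, (0.25, 0.25), split_path)
```

Because the second call leaves overwrite at False, it does not reshuffle; it returns the labels stored by the first call.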
-
has_training
Returns: True if the TRAIN subset of images is not empty.
-
has_inference
Returns: True if the INFER subset of images is not empty.
-
has_validation
Returns: True if the VALID subset of images is not empty.
-
validation_files
Returns: the list of validation filenames.
-
train_files
Returns: the list of training filenames.
-
inference_files
Returns: the list of inference filenames (defaulting to the list of all filenames if no partition is defined).
-
all_files
Returns: the list of all filenames.
-
get_file_lists_by(phase=None, action='train')
Get file lists by action and phase.
This function returns file lists for training/validation/inference based on the phase or action specified by the user.
phase has a higher priority: if phase is specified, the function returns the corresponding file list (as a list). Otherwise, the function checks action: it returns the train and validation file lists if it is a training action, and the inference file list otherwise.
Parameters:
- action – an action
- phase – an element from {TRAIN, VALID, INFER, ALL}
Returns: the selected file list(s)
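The priority of phase over action can be sketched as below. The file_lists_by helper, the dictionary keys, and the choice to always return a list of file lists are assumptions made for illustration; only the precedence rule itself comes from the description above.

```python
def file_lists_by(files, phase=None, action="train"):
    """Pick file lists from a {phase_label: file_list} mapping.

    Hypothetical helper: phase wins over action when both are given.
    """
    if phase is not None:
        # An explicit phase takes priority and yields exactly one file list.
        return [files[phase]]
    if action == "train":
        # A training action needs both the training and the validation lists.
        return [files["train"], files["valid"]]
    # Any other action falls back to the inference list.
    return [files["infer"]]


files = {
    "train": ["a.nii", "b.nii"],
    "valid": ["c.nii"],
    "infer": ["d.nii"],
    "all": ["a.nii", "b.nii", "c.nii", "d.nii"],
}
by_phase = file_lists_by(files, phase="valid", action="train")
by_train_action = file_lists_by(files, action="train")
by_other_action = file_lists_by(files, action="inference")
```

Note that by_phase contains only the validation list even though action is 'train', because phase is checked first.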
-
__init__
x.__init__(…) initializes x; see help(type(x)) for signature
-