matryoshka.training_funcs

Module containing the functions used to train the matryoshka neural networks (NNs).

class matryoshka.training_funcs.LogScaler

Class for a log scaler. Linearly transforms logX such that all samples in logX are in the range [0,1].

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters

X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the transformed data.
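The fit/transform/inverse_transform pattern described above can be illustrated with a minimal sketch. The class name LogScalerSketch is hypothetical, and the use of the natural log and of per-feature minima and maxima are assumptions. This is not the matryoshka implementation.

```python
import numpy as np


class LogScalerSketch:
    """Hypothetical stand-in for a log scaler (not the matryoshka class)."""

    def fit(self, X):
        # Assumption: natural log, with one minimum and range per feature.
        logX = np.log(X)
        self.min_ = logX.min(axis=0)
        self.range_ = logX.max(axis=0) - self.min_
        return self

    def transform(self, X):
        # Map log(X) linearly so the training data spans [0, 1].
        return (np.log(X) - self.min_) / self.range_

    def inverse_transform(self, X):
        # Undo the linear map, then undo the log.
        return np.exp(X * self.range_ + self.min_)


# Training data with shape (nsamps, nfeatures); values must be positive.
X_train = np.array([[1.0, 10.0], [10.0, 1000.0], [100.0, 100.0]])
scaler = LogScalerSketch().fit(X_train)
X_scaled = scaler.transform(X_train)
X_recovered = scaler.inverse_transform(X_scaled)
```

A log scaling of this kind suits strictly positive quantities that span several orders of magnitude.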

class matryoshka.training_funcs.Resampler(simulation_samples=None, parameter_ranges=None, use_latent_space=False)

Class for re-sampling the parameter space covered by a suite of simulations. The new samples can then be used to generate training data for the base model component emulators.

Parameters
  • simulation_samples (array) – The samples in the parameter space from the simulation suite. Default is None.

  • parameter_ranges (array) – Ranges that define the extent of the parameter space. Should have shape (n, 2), where the first column is the minimum value for the n parameters, and the second column is the maximum. Default is None.

  • use_latent_space (bool) – If True the original simulation samples will be transformed into an uncorrelated latent space for re-sampling. Default is False.

new_samples(nsamps, LH=True, buffer=None)

Generate new samples from the region covered by the simulations.

Parameters
  • nsamps (int) – The number of new samples to generate.

  • LH (bool) – If True, Latin hypercube sampling is used. Default is True.

Returns

Array containing the new samples. Has shape (nsamps, d).
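As a rough illustration of Latin hypercube sampling within a set of parameter ranges, the sketch below draws samples in a box defined by an (n, 2) array like parameter_ranges. The function latin_hypercube and its seed argument are hypothetical. It ignores simulation_samples, use_latent_space, and buffer, and should not be read as how Resampler.new_samples works internally.

```python
import numpy as np


def latin_hypercube(nsamps, parameter_ranges, seed=None):
    """Hypothetical helper: Latin hypercube samples inside a box.

    parameter_ranges has shape (d, 2): column 0 holds the minima and
    column 1 the maxima. Returns an array with shape (nsamps, d).
    """
    rng = np.random.default_rng(seed)
    ranges = np.asarray(parameter_ranges, dtype=float)
    d = ranges.shape[0]

    # One point per stratum in every dimension: stratum index plus jitter.
    unit = (np.arange(nsamps)[:, None] + rng.random((nsamps, d))) / nsamps

    # Shuffle each column independently so strata are paired at random.
    for j in range(d):
        unit[:, j] = rng.permutation(unit[:, j])

    # Rescale from the unit hypercube to the requested ranges.
    return ranges[:, 0] + unit * (ranges[:, 1] - ranges[:, 0])


samples = latin_hypercube(5, [[0.0, 1.0], [10.0, 20.0]], seed=0)
```

Each parameter's range is divided into nsamps equal strata and every stratum receives exactly one sample. This covers the space more evenly than independent uniform draws.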

class matryoshka.training_funcs.StandardScaler

Replacement for sklearn StandardScaler(). Rescales X such that it has zero mean and unit variance.

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters

X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the transformed data.
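The rescaling to zero mean and unit variance can be written out directly in NumPy. The following is a sketch of the arithmetic only. The variable names and the use of the population standard deviation (NumPy's default) are assumptions, not details taken from the matryoshka source.

```python
import numpy as np

# Synthetic training data with shape (nsamps, nfeatures).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# "fit": store one mean and one standard deviation per feature.
mean = X.mean(axis=0)
std = X.std(axis=0)

# "transform": subtract the mean, divide by the standard deviation.
Z = (X - mean) / std

# "inverse_transform": reverse the two steps.
X_back = Z * std + mean
```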

class matryoshka.training_funcs.UniformScaler

Class for a simple uniform scaler. Linearly transforms X such that all samples in X are in the range [0,1].

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters

X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters

X (array) – The data to be transformed.

Returns

Array containing the transformed data.
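One practical consequence of fitting on the training data is that only the training samples are guaranteed to land in [0, 1]. The sketch below shows this with plain NumPy. The variable names are illustrative and the per-feature minimum and range are assumptions about how such a scaler would typically be fitted.

```python
import numpy as np

train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
test = np.array([[1.0, 35.0]])  # second feature exceeds the training maximum

# "fit": per-feature minimum and range of the training data.
x_min = train.min(axis=0)
x_range = train.max(axis=0) - x_min

# "transform": the same linear map is applied to both sets.
train_scaled = (train - x_min) / x_range
test_scaled = (test - x_min) / x_range  # [[0.25, 1.25]]

# "inverse_transform": recover the original values.
recovered = test_scaled * x_range + x_min
```

Here the second test feature maps to 1.25 because it lies beyond the range seen during fitting.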

matryoshka.training_funcs.dataset(target, split, X_or_Y)

Convenience function for loading datasets for the base model component emulators.

Parameters
  • target (str) – The target function of interest.

  • split (str) – Can be “train”, “test”, or “val” (when a validation set is available).

  • X_or_Y (str) – Whether to load the features (“X”) or the target function (“Y”).

Returns

Array containing the dataset.

matryoshka.training_funcs.trainNN(trainX, trainY, validation_data, nodes, learning_rate, batch_size, epochs, callbacks=None, DR=None, verbose=0)

A high-level function for quickly training a simple NN-based emulator. The NN is optimised with an Adam optimiser and a mean squared error loss function.

Parameters
  • trainX (array) – Array containing the parameters/features of the training set. Should have shape (n, d).

  • trainY (array) – Array containing the target function of the training set. Should have shape (n, k).

  • validation_data (tuple) – Tuple of arrays (valX, valY), where valX and valY are the validation-set equivalents of trainX and trainY. Can be None if there is no validation set.

  • nodes (array) – Array containing the number of nodes in each hidden layer. Should have shape (N, ), with N being the desired number of hidden layers.

  • learning_rate (float) – The learning rate to be used during training.

  • batch_size (int) – The batch size to be used during training.

  • epochs (int) – The number of epochs to train the NN.

  • callbacks (list) – List of tensorflow callbacks, e.g. EarlyStopping. Default is None.

  • DR (float) – Float between 0 and 1 that defines the dropout rate. If None dropout will not be used.

  • verbose (int) – Defines how much information tensorflow prints during training. 0 = silent, 1 = progress bar, 2 = one line per epoch.

Returns

Trained keras Sequential model.

matryoshka.training_funcs.train_test_indices(N, split=0.2)

Return indices that can be used to split a dataset into train and test sets.

Parameters
  • N (int) – The size of the original dataset.

  • split (float) – The proportion of the data to be used for the test set. Should be a float between 0 and 1. Default is 0.2.

Returns

The train and test index arrays.
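An index-based split of this kind might look like the sketch below. The function split_indices and its seed argument are hypothetical. Whether the library shuffles the indices, and how it rounds the test-set size, are assumptions here.

```python
import numpy as np


def split_indices(N, split=0.2, seed=None):
    """Hypothetical helper: shuffled train and test index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(N)
    # Assumption: the test-set size is N * split rounded to the nearest int.
    n_test = int(round(N * split))
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = split_indices(10, split=0.2, seed=0)

# The same indices can be applied to features and targets alike,
# which keeps each row of X paired with its row of Y.
X = np.arange(20.0).reshape(10, 2)
Y = np.arange(10.0)
trainX, testX = X[train_idx], X[test_idx]
trainY, testY = Y[train_idx], Y[test_idx]
```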