matryoshka.training_funcs

Module containing functions used to train the matryoshka NNs.


class matryoshka.training_funcs.LogScaler

Class for a log scaler. Linearly transforms log(X) such that all samples in log(X) are in the range [0,1].

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters: X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the transformed data.
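The behaviour described for LogScaler can be illustrated with a minimal NumPy sketch. LogScalerSketch is a hypothetical class, not the matryoshka implementation; it assumes the natural logarithm (the documentation does not state the base), per-feature minima and maxima, and strictly positive X.

```python
import numpy as np


class LogScalerSketch:
    """Hypothetical stand-in for LogScaler (assumes natural log and per-feature min/max)."""

    def fit(self, X):
        logX = np.log(X)
        # Per-feature extremes of log(X) over the training samples.
        self.min_val = logX.min(axis=0)
        self.diff = logX.max(axis=0) - self.min_val

    def transform(self, X):
        # Take the log, then map it linearly so the training data span [0, 1].
        return (np.log(X) - self.min_val) / self.diff

    def inverse_transform(self, X):
        # Undo the linear map, then undo the logarithm.
        return np.exp(X * self.diff + self.min_val)


X = np.array([[1.0, 10.0], [10.0, 100.0], [100.0, 1000.0]])
scaler = LogScalerSketch()
scaler.fit(X)
Xt = scaler.transform(X)
print(Xt)
print(np.allclose(scaler.inverse_transform(Xt), X))
```

Because the columns of X are evenly spaced in log space, the middle row maps to 0.5 in both features.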

class matryoshka.training_funcs.Resampler(simulation_samples=None, parameter_ranges=None, use_latent_space=False)

Class for re-sampling the parameter space covered by a suite of simulations. The new samples can then be used to generate training data for the base model component emulators.

Note

See the "Generating training samples for the base model components" example.

Parameters

simulation_samples (array) – The samples in the parameter space from the simulation suite. Default is None.
parameter_ranges (array) – Ranges that define the extent of the parameter space. Should have shape (n, 2), where the first column is the minimum value for the n parameters, and the second column is the maximum. Default is None.
use_latent_space (bool) – If True, the original simulation samples will be transformed into an uncorrelated latent space for re-sampling. Default is False.
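The documentation does not say how the uncorrelated latent space is constructed. One common way to obtain uncorrelated coordinates is a PCA rotation, sketched below purely as an assumption; to_latent and from_latent are hypothetical helpers, not part of matryoshka.

```python
import numpy as np


def to_latent(samples):
    """Hypothetical PCA rotation into uncorrelated coordinates (an assumption, not matryoshka's code)."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # Eigenvectors of the symmetric covariance matrix define the decorrelating rotation.
    eigvals, eigvecs = np.linalg.eigh(cov)
    latent = (samples - mean) @ eigvecs
    return latent, mean, eigvecs


def from_latent(latent, mean, eigvecs):
    # eigvecs is orthogonal, so its transpose undoes the rotation.
    return latent @ eigvecs.T + mean


rng = np.random.default_rng(0)
# Two strongly correlated parameters standing in for a simulation suite.
a = rng.normal(size=500)
samples = np.column_stack([a, 2.0 * a + 0.1 * rng.normal(size=500)])

latent, mean, eigvecs = to_latent(samples)
latent_cov = np.cov(latent, rowvar=False)
print(np.round(latent_cov, 6))
```

New samples drawn in the latent coordinates can be mapped back to the original parameters with from_latent.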

new_samples(nsamps, LH=True, buffer=None)

Generate new samples from the region covered by the simulations.

Parameters

nsamps (int) – The number of new samples to generate.
LH (bool) – If True will use latin-hypercube sampling. Default is True.

Returns

Array containing the new samples. Has shape (nsamps, d), where d is the number of parameters.
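Latin-hypercube sampling (the LH=True option) places exactly one sample in each of nsamps equal-width strata along every dimension. The sketch below shows the idea over the box defined by a parameter_ranges array of shape (d, 2). latin_hypercube is a hypothetical helper: the documented Resampler draws from the region covered by the simulations, which need not be a simple box, so this is not its implementation.

```python
import numpy as np


def latin_hypercube(nsamps, parameter_ranges, rng=None):
    """Hypothetical Latin-hypercube sampler over a box given by parameter_ranges (shape (d, 2))."""
    rng = np.random.default_rng(rng)
    parameter_ranges = np.asarray(parameter_ranges, dtype=float)
    d = parameter_ranges.shape[0]
    # One random point inside each of the nsamps strata of [0, 1), per dimension.
    u = (np.arange(nsamps)[:, None] + rng.random((nsamps, d))) / nsamps
    # Shuffle the strata independently in each dimension.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    # Rescale from the unit hypercube to the requested ranges.
    lo, hi = parameter_ranges[:, 0], parameter_ranges[:, 1]
    return lo + u * (hi - lo)


ranges = np.array([[0.1, 0.5], [0.6, 0.9], [-1.0, 1.0]])
new = latin_hypercube(10, ranges, rng=42)
print(new.shape)
```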

class matryoshka.training_funcs.StandardScaler

Replacement for sklearn StandardScaler(). Rescales X such that it has zero mean and unit variance.

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters: X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the transformed data.
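A minimal sketch of the zero-mean, unit-variance rescaling, assuming (as sklearn's StandardScaler does) that statistics are computed per feature with the population standard deviation. StandardScalerSketch is a hypothetical name used only for illustration.

```python
import numpy as np


class StandardScalerSketch:
    """Hypothetical per-feature standardisation (assumes population std, like sklearn)."""

    def fit(self, X):
        # Per-feature mean and standard deviation (np.std uses ddof=0 by default).
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0)

    def transform(self, X):
        return (X - self.mean) / self.std

    def inverse_transform(self, X):
        return X * self.std + self.mean


rng = np.random.default_rng(1)
X = rng.normal(loc=[5.0, -3.0], scale=[2.0, 0.5], size=(200, 2))
scaler = StandardScalerSketch()
scaler.fit(X)
Xt = scaler.transform(X)
print(Xt.mean(axis=0), Xt.std(axis=0))
```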

class matryoshka.training_funcs.UniformScaler

Class for a simple uniform scaler. Linearly transforms X such that all samples in X are in the range [0,1].

fit(X)

Fit the parameters of the transformer based on the training data.

Parameters: X (array) – The training data. Must have shape (nsamps, nfeatures).

inverse_transform(X)

Inverse transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the inverse transformed data.

transform(X)

Transform the data.

Parameters: X (array) – The data to be transformed.
Returns: Array containing the transformed data.
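The uniform (min-max) scaling can be sketched as follows, assuming per-feature minima and maxima. UniformScalerSketch is hypothetical. In this sketch only the data used in fit are guaranteed to lie in [0,1]; later inputs outside the training range map outside it.

```python
import numpy as np


class UniformScalerSketch:
    """Hypothetical per-feature min-max scaler."""

    def fit(self, X):
        self.min_val = X.min(axis=0)
        self.diff = X.max(axis=0) - self.min_val

    def transform(self, X):
        return (X - self.min_val) / self.diff

    def inverse_transform(self, X):
        return X * self.diff + self.min_val


train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaler = UniformScalerSketch()
scaler.fit(train)
print(scaler.transform(train))
# A sample outside the training range maps outside [0, 1].
print(scaler.transform(np.array([[15.0, 40.0]])))
```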

matryoshka.training_funcs.dataset(target, split, X_or_Y)

Convenience function for loading datasets for the base model component emulators.

Parameters

target (str) – The target function of interest.
split (str) – Can be “train”, “test”, or “val” (when a validation set is available).
X_or_Y (str) – Whether to return the features (“X”) or the target function (“Y”).

Returns

Array containing the dataset.

matryoshka.training_funcs.trainNN(trainX, trainY, validation_data, nodes, learning_rate, batch_size, epochs, callbacks=None, DR=None, verbose=0)

A high-level function for quickly training a simple NN-based emulator. The NN is optimised with the Adam optimiser and a mean squared error loss function.

Parameters

trainX (array) – Array containing the parameters/features of the training set. Should have shape (n, d).
trainY (array) – Array containing the target function of the training set. Should have shape (n, k).
validation_data (tuple) – Tuple of arrays (valX, valY), where valX and valY are the validation-set equivalents of trainX and trainY. Can be None if there is no validation set.
nodes (array) – Array containing the number of nodes in each hidden layer. Should have shape (N, ), with N being the desired number of hidden layers.
learning_rate (float) – The learning rate to be used during training.
batch_size (int) – The batch size to be used during training.
epochs (int) – The number of epochs to train the NN.
callbacks (list) – List of TensorFlow callbacks, e.g. EarlyStopping. Default is None.
DR (float) – Float between 0 and 1 that defines the dropout rate. If None, dropout is not used. Default is None.
verbose (int) – Defines how much information TensorFlow prints during training: 0 = silent, 1 = progress bar, 2 = one line per epoch. Default is 0.

Returns

Trained keras Sequential model.

matryoshka.training_funcs.train_test_indices(N, split=0.2)

Return indices that can be used to split a dataset into train and test sets.

Parameters

N (int) – The size of the original dataset.
split (float) – The proportion of the data to be used for the test set. Should be a float between 0 and 1. Default is 0.2.

Returns

The train and test index arrays.
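A random train/test index split of this kind can be sketched with a single permutation. train_test_indices_sketch is hypothetical; how matryoshka rounds split * N, and whether it shuffles or seeds the split, are assumptions here.

```python
import numpy as np


def train_test_indices_sketch(N, split=0.2, rng=None):
    """Hypothetical split of range(N); the test set receives round(split * N) indices."""
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(N)
    n_test = int(round(split * N))
    # Train indices first, test indices second, matching the documented return order.
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = train_test_indices_sketch(100, split=0.2, rng=0)
print(len(train_idx), len(test_idx))
```

The returned arrays can index the features and targets directly, e.g. X[train_idx] and Y[test_idx].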