Deep Learning Tutorial Sessions

In this demo, we will review:

  • Implementing fully connected neural networks using Keras
  • Changing parameters within the Keras models and observing the effect on performance
  • Achieving the performance statistics cited during this week's presentation

Prior to beginning this session, it is assumed that you have already downloaded Demo_Files.zip and moved the contents into a desired folder on your system.

1.0 - Setting up

To begin, locate the file path to the folder on your system storing the files from Demo_Files.zip. The "folder_path" string variable in the cell below currently holds a dummy file path string for reference. Change it to the file path on your system.

In [ ]:
folder_path = '/Users/christopher.chin/Documents/Project_Part_1/Demo_Files'

Run the next cell to load the modules, random seed initializations, functions, and data you will be using in the rest of this demo.

In [ ]:
# Modules to Import and random seeds to set
import numpy as np
import tensorflow as tf
import random as rn
import os
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
import scipy.io
import glob
import matplotlib.pyplot as plt
from matplotlib import mlab
import pylab as pl
import math
import sklearn
from sklearn import preprocessing
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import learning_curve
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_curve, auc
import scipy
import pandas 
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from keras.callbacks import Callback, TensorBoard
from keras import layers
from keras import regularizers
from keras.initializers import glorot_uniform
from keras.layers import Input, Dense, Dropout, Activation, BatchNormalization, Flatten, Conv1D, Conv2D, MaxPooling1D, MaxPooling2D, GlobalAveragePooling1D, LSTM, Dropout, Bidirectional
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Model, load_model, Sequential
from keras.optimizers import SGD, Adam
from keras.utils import plot_model, to_categorical
from matplotlib.pyplot import imshow
K.set_image_data_format('channels_last')

class data_and_labels:    
    ''' data_and_labels Class
            Reads Tom's Dataset and extracts the curves and corresponding annotations
            
            Args:
                filenames: list of string filenames for each Excel spreadsheet in the dataset
    '''

    def __init__(self,filenames):
        self.filenames = filenames
        # all_X is an array of shape (64001,40), comprising the 64001 normalized curves
        # that are 40 points each 
        self.all_X = []
        # all_X is an array of shape (64001,40), comprising the 64001 raw, unnormalized
        # curves that are 40 points each 
        self.all_X_raw = []
        # all_Y is an array of shape (64001,1) comprising the annotations for the 64001
        # curves
        self.all_Y = []
        # all_numcurves is an int variable to continuously keep track of how many curves
        # have been analyzed so far
        self.all_numcurves = []

    def excel_to_curves(self,filename):
        ''' excel_to_curves Method
                Takes one Excel spreadsheet and extracts the curves, corresponding annotations,
                and number of points in each curve 
            
                Args:
                    filename: string name of file to analyze
                Returns:
                    file_labels: array containing the annotations of each curve
                    file_numcycles: array containing the number of points in each curve
                    file_curves: array containing all the curves in the file
        '''
        file = pandas.read_excel(io=filename,sheet_name='Sheet1',header=0,usecols='F,K:AY')
        file = file.values
        file_labels = file[:,0]
        file_numcycles = file[:,1]
        file_curves = file[:,2:]
        file_curves = np.transpose(file_curves)
        return file_labels,file_numcycles,file_curves

    def normalize_curves(self,file_curves):
        ''' normalize_curves Method
                Takes all the curves from an Excel spreadsheet and normalizes them from
                0 to 1
            
                Args:
                    file_curves: array containing all the curves in the file of interest
                Returns:
                    file_curves_normalized: array containing all the normalized curves
        '''
        min_max_scaler = sklearn.preprocessing.MinMaxScaler()
        file_curves_normalized = min_max_scaler.fit_transform(file_curves)
        return file_curves_normalized
    
    def sample_to_features(self,sample,numcycles):
        ''' sample_to_features Method
                Takes one curve and converts it into features. If a curve is less than 
                40 points, pad the last value of the curve until the curve is 40 points
                long. 
            
                Args:
                    sample: array containing all the points in the curve
                    numcycles: int number of points in the curve
                Returns:
                    sample_y: array containing the curve padded to 40 points. 
        '''
        sample_y = sample
        if numcycles < 40:
            while len(sample_y) < 40:
                sample_y = np.append(sample_y,sample_y[len(sample_y)-1])
        sample_y = np.reshape(sample_y,(1,len(sample_y)))
        return sample_y
    
    def curves_to_fileXY(self,file_labels,file_numcycles,file_curves_normalized,file_curves):
        ''' curves_to_fileXY Method
                Takes all the curves in one Excel spreadsheet, converts them into features, 
                and concatenates them into one array. Does the same for all the annotations 
                in the spreadsheet.
            
                Args:
                    file_labels: array containing all the annotations in the file
                    file_numcycles: array containing the number of points per curve in the
                                    file
                    file_curves_normalized: array containing all the normalized curves 
                                            in the file before conversion to features
                    file_curves: array containing all the raw, unnormalized curves in the
                                 file before conversion to features
                Returns:
                    file_X: array containing all the normalized curves, now all 40 points long 
                    file_X_raw: array containing all the raw, unnormalized curves, now all 
                                40 points long
                    file_Y: array containing all the annotations of the curves
        '''
        file_numcurves = np.shape(file_curves_normalized)[1]
        self.all_numcurves = np.append(self.all_numcurves,file_numcurves)
        file_X = []
        file_X_raw = []
        file_Y = []
        for ii in range(file_numcurves):
            sample_y = self.sample_to_features(file_curves_normalized[:,ii],file_numcycles[ii])
            sample_y_raw = self.sample_to_features(file_curves[:,ii],file_numcycles[ii])
            if len(file_X) == 0:
                file_X = sample_y
                file_X_raw = sample_y_raw 
                file_Y = file_labels[ii]
            else:
                file_X = np.append(file_X,sample_y,axis=0)
                file_X_raw = np.append(file_X_raw,sample_y_raw,axis=0)
                file_Y = np.append(file_Y,file_labels[ii])
        return file_X,file_X_raw,file_Y
    
    def allexcel_to_allXY(self): 
        ''' allexcel_to_allXY Method
                Master function that calls the other functions - takes all the spreadsheets
                in the dataset, extracts the curves and annotations, converts the curves to
                features, performs normalization, and concatenates all the curves into one 
                master array all_X, and all the annotations into one master array all_Y

        '''
        for ii in range(len(self.filenames)):
            file_labels,file_numcycles,file_curves = self.excel_to_curves(filenames[ii])
            file_curves_normalized = self.normalize_curves(file_curves)
            file_X,file_X_raw,file_Y = self.curves_to_fileXY(file_labels,file_numcycles,file_curves_normalized,file_curves)
            if len(self.all_X) == 0:
                self.all_X = file_X
                self.all_X_raw = file_X_raw
                self.all_Y = file_Y
            else:
                self.all_X = np.append(self.all_X,file_X,axis=0)
                self.all_X_raw = np.append(self.all_X_raw,file_X_raw,axis=0)
                self.all_Y = np.append(self.all_Y,file_Y)
                
    def grandindex_to_filename_and_row(self,grandindex):
        ''' grandindex_to_filename_and_row Method
                After separating into training and test sets, the master arrays all_x
                and all_Y are shuffled, losing the connection between the curve and the
                file it came from. Pandas DataFrames are used to maintain the indices 
                from the original master arrays. This function converts these indices
                into the corresponding file and the row in the file you can find
                the curve of interest
            
                Args:
                    grandindex: Pandas DataFrame index of the curve of interest
                Returns:
                    filename: string name of file that the curve came from
                    row: int row number in the file that you can find the curve
        '''
        total = 0
        row = 0
        for ii in range(len(self.all_numcurves)): 
            if (grandindex > total + self.all_numcurves[ii]):
                total = total + self.all_numcurves[ii]
            else: 
                # 2 is added because python indexes start at 0 but curves in Tom's 
                # Excel spreadsheets start from row 2 down. 
                row = grandindex - total + 2
                filename = self.filenames[ii]
                break
        return filename,row
            
    def find_error_indices(self,Y_pred,Y_truth):
        ''' find_error_indices Method
                Takes the array of predictions Y_pred and the array of ground truth 
                annotations Y_test and compares them elementwise. If they differ, 
                the index is added to the returned array error_indices
            
                Args:
                    Y_pred: array of predictions from the model
                    Y_truth: array of ground truth annotations for the predictions
                Returns:
                    error_indices: array of indices in the input arrays where the 
                                   values differed (indicating an error)
        '''
        Y_truth_temp = Y_truth.values
        error_indices = []
        for ii in range(len(Y_truth_temp)):
            if Y_truth_temp[ii] != Y_pred[ii]:
                error_indices = np.append(error_indices,ii)
        return error_indices

    def check_error_samples(self,Y_pred,Y_truth,error_indices,num_to_check):
        ''' check_error_samples Method
                For a specified number of misclassfied examples to find
                (num_to_check) and the input array of all indices corresponding 
                to misclassified examples (error_indices), randomly extract 
                num_to_check misclassified examples and output the file and row
                that that example can be found
            
                Args:
                    Y_pred: array of predictions from the model
                    Y_truth: array of ground truth annotations for the predictions
                    error_indices: array containing the indices of all misclassified
                                   examples
                    num_to_check: int number of misclassified examples to find
                Returns:
                    error_samples: Dictionary containing num_to_check dictionaries
                                   Each sub-dictionary contains three keys and values
                                   corresponding to the prediction of a misclassified
                                   example, annotation of a misclassified example, and
                                   tuple of the filename and row that that misclassified
                                   example can be found
        '''
        error_samples = dict()
        for ii in range(num_to_check):
            rand_array_index = np.random.randint(len(error_indices))
            rand_error_index = int(error_indices[rand_array_index])
            grandindex = Y_truth.iloc[rand_error_index:rand_error_index+1].index[0]
            label = Y_truth.iloc[rand_error_index]
            prediction = Y_pred[rand_error_index]
            filename,row = self.grandindex_to_filename_and_row(grandindex)
            error_samples[ii] = dict()
            error_samples[ii]['Prediction'] = prediction
            error_samples[ii]['Label'] = label
            error_samples[ii]['Location'] = (filename,row)
        return error_samples    
                
    def add_noise_data(self,numcurves): 
        ''' add_noise_data Method
                Add random noise to the dataset, with annotation -1 (non-amplified)
            
                Args:
                    numcurves: number of random noise curves to add to the dataset

        '''
        for ii in range(numcurves):
            noise_data = np.random.randn(1,40)
            noise_label = 0
            self.all_X = np.append(self.all_X,noise_data,axis=0)
            self.all_Y = np.append(self.all_Y,noise_label)

def compute_confmat_3classes(y_pred, y_true):
    ''' compute_confmat_3classes Method
            Given the predictions and corresponding annotations, construct the 
            confusion matrix and output its contents 

            Args:
                y_pred: array of predictions from the model
                y_true: array of ground truth annotations for the predictions

            Returns:
                df_conf_mat: array corresponding to the confusion matrix table 
                fpr: false positive rate 
                fnr: false negative rate
                tpr: true positive rate
                tnr: true negative rate
                inc_pos: Number of inc annotations called positive
                inc_inc: Number of inc annotations called inc
                inc_neg: Number of inc annotations called negative 
                    
    '''

    conf_mat = np.zeros((3, 3))
    row_labels = ['Actual: -1','Actual: 0','Actual: 1']
    col_labels = ['Predicted: -1','Predicted: 0','Predicted: 1']
    y_true_copy = np.copy(y_true)
    y_pred_copy = np.copy(y_pred)
    for ii in range(K.flatten(y_true).shape[0].value):
        conf_mat[int(y_true_copy[ii])][int(y_pred_copy[ii])] += 1
    df_conf_mat = pandas.DataFrame(conf_mat,row_labels,col_labels)
    fpr = df_conf_mat.values[0,2]/(df_conf_mat.values[0,0]+df_conf_mat.values[0,2])
    fnr = df_conf_mat.values[2,0]/(df_conf_mat.values[2,0]+df_conf_mat.values[2,2])
    tpr = df_conf_mat.values[2,2]/(df_conf_mat.values[2,0]+df_conf_mat.values[2,2])
    tnr = df_conf_mat.values[0,0]/(df_conf_mat.values[0,0]+df_conf_mat.values[0,2])
    inc_pos = df_conf_mat.values[1,2]
    inc_inc = df_conf_mat.values[1,1]
    inc_neg = df_conf_mat.values[1,0]
    pos_inc = df_conf_mat.values[2,1]
    neg_inc = df_conf_mat.values[0,1]
    return df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc

def cv_trainingset(model_name,X_train,Y_train,X_test,Y_test,numsamples,cv_dict,aux_dict=dict(),class_weight_dict=dict()): 
    ''' cv_trainingset Method
            Performs Stratified K-Fold Cross Validation on the training set, where "numsamples"
            defines the number of folds. Averages the performance metrics across all the folds 
            and returns the mean and standard deviation of those performance metrics in "all_dict."
            "Stratified" K-Fold refers to preserving the class distribution of the entire training 
            set in each fold. 

            Args:
                model_name: string name of the model to load 
                X_train: array containing the training set (curves)
                Y_train: array containing the training annotations 
                X_test: array containing the test set (curves)
                Y_test: array containing the test set annotations
                numsamples: number of folds
                cv_dict: Dictionary containing pre-determined indices to split the training
                         set into "numsamples" folds.
                aux_dict: Dictionary containing auxiliary inputs for the training 
                          and test set (if applicable). Defaults to empty dictionary
                class_weight_dict: Dictionary containing class weights for fitting the model
                                   (if applicable). Defaults to empty dictionary

            Returns:
                all_dict: dictionary containing the mean and standard deviation of 
                          the performance metrics for the numsamples runs. 
                    
    '''
    all_acc_train = []
    all_fpr_train = []
    all_fnr_train = []
    all_tpr_train = []
    all_tnr_train = []
    all_inc_pos_train = []
    all_inc_inc_train = []
    all_inc_neg_train = []
    all_pos_inc_train = []
    all_neg_inc_train = []
    all_acc_val = []
    all_fpr_val = []
    all_fnr_val = []
    all_tpr_val = []
    all_tnr_val = []
    all_inc_pos_val = []
    all_inc_inc_val = []
    all_inc_neg_val = []
    all_pos_inc_val = []
    all_neg_inc_val = []
    all_acc_test = []
    all_fpr_test = []
    all_fnr_test = []
    all_tpr_test = []
    all_tnr_test = []
    all_inc_pos_test = []
    all_inc_inc_test = []
    all_inc_neg_test = []
    all_pos_inc_test = []
    all_neg_inc_test = []
    all_dict = dict()
    # Execute numsamples times
    for ii in range(numsamples):
        os.environ['PYTHONHASHSEED'] = '0'
        np.random.seed(42)
        rn.seed(12345)
        tf.set_random_seed(1234)
        model = load_model(model_name)
        sample_indices_train = cv_dict[ii]['train']
        sample_indices_val = cv_dict[ii]['val']
        X_train_sample = X_train[sample_indices_train]
        Y_train_sample = Y_train[sample_indices_train]
        X_val_sample = X_train[sample_indices_val]
        Y_val_sample = Y_train[sample_indices_val]
        Y_train_argmax = np.argmax(Y_train_sample,axis=1)
        Y_val_argmax = np.argmax(Y_val_sample,axis=1)
        # If aux_dict is not an empty dictionary, extract the auxiliary inputs 
        # corresponding to the training and test sets 
        if bool(aux_dict):
            X_train_aux_input = aux_dict['X_train_aux_input']
            X_train_aux_input_sample = X_train_aux_input[sample_indices_train]
            X_val_aux_input_sample = X_train_aux_input[sample_indices_val]
            X_test_aux_input = aux_dict['X_test_aux_input']
            if bool(class_weight_dict):
                model.fit([X_train_sample,X_train_aux_input_sample],Y_train_sample,epochs=10,batch_size=200,class_weight=class_weight_dict)
            else:
                model.fit([X_train_sample,X_train_aux_input_sample],Y_train_sample,epochs=10,batch_size=200)
            Y_pred_train = np.squeeze(model.predict([X_train_sample,X_train_aux_input_sample]))
            Y_pred_train = Y_pred_train.argmax(axis=-1)
            Y_pred_val = np.squeeze(model.predict([X_val_sample,X_val_aux_input_sample]))
            Y_pred_val = Y_pred_val.argmax(axis=-1)
            Y_pred_test = np.squeeze(model.predict([X_test,X_test_aux_input]))
            Y_pred_test = Y_pred_test.argmax(axis=-1)
        else:
            if bool(class_weight_dict):
                model.fit(X_train_sample,Y_train_sample,epochs=10,batch_size=200,class_weight=class_weight_dict)
            else:
                model.fit(X_train_sample,Y_train_sample,epochs=10,batch_size=200)
            Y_pred_train = np.squeeze(model.predict(X_train_sample))
            Y_pred_train = Y_pred_train.argmax(axis=-1)
            Y_pred_val = np.squeeze(model.predict(X_val_sample))
            Y_pred_val = Y_pred_val.argmax(axis=-1)
            Y_pred_test = np.squeeze(model.predict(X_test))
            Y_pred_test = Y_pred_test.argmax(axis=-1)
        # Calculate the number of errors by comparing the annotations and 
        # predictions elementwise and summing the number of times they differ
        numerrors = (Y_train_argmax != Y_pred_train).sum()
        _, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_train, Y_train_argmax)
        # Calculate accuracy 
        acc = (len(Y_pred_train)-numerrors)/len(Y_pred_train)
        all_acc_train.append(acc)
        all_fpr_train.append(fpr)
        all_fnr_train.append(fnr)
        all_tpr_train.append(tpr)
        all_tnr_train.append(tnr)
        all_inc_pos_train.append(inc_pos)
        all_inc_inc_train.append(inc_inc)
        all_inc_neg_train.append(inc_neg)
        all_pos_inc_train.append(pos_inc)
        all_neg_inc_train.append(neg_inc)
        # Repeat the procedure for the validation set 
        numerrors = (Y_val_argmax != Y_pred_val).sum()
        _, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_val, Y_val_argmax)
        acc = (len(Y_pred_val)-numerrors)/len(Y_pred_val)
        all_acc_val.append(acc)
        all_fpr_val.append(fpr)
        all_fnr_val.append(fnr)
        all_tpr_val.append(tpr)
        all_tnr_val.append(tnr)
        all_inc_pos_val.append(inc_pos)
        all_inc_inc_val.append(inc_inc)
        all_inc_neg_val.append(inc_neg)
        all_pos_inc_val.append(pos_inc)
        all_neg_inc_val.append(neg_inc)
        # Repeat the procedure for the test set 
        numerrors = (Y_test != Y_pred_test).sum()
        _, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_test, Y_test)
        acc = (len(Y_pred_test)-numerrors)/len(Y_pred_test)
        all_acc_test.append(acc)
        all_fpr_test.append(fpr)
        all_fnr_test.append(fnr)
        all_tpr_test.append(tpr)
        all_tnr_test.append(tnr)
        all_inc_pos_test.append(inc_pos)
        all_inc_inc_test.append(inc_inc)
        all_inc_neg_test.append(inc_neg)
        all_pos_inc_test.append(pos_inc)
        all_neg_inc_test.append(neg_inc)
    # Performance metrics for the training set:
    # Accuracy
    all_dict['acc_train'] = [np.mean(all_acc_train),np.std(all_acc_train)]
    # False Positive Rate
    all_dict['fpr_train'] = [np.mean(all_fpr_train),np.std(all_fpr_train)]
    # False Negative Rate
    all_dict['fnr_train'] = [np.mean(all_fnr_train),np.std(all_fnr_train)]
    # True Positive Rate
    all_dict['tpr_train'] = [np.mean(all_tpr_train),np.std(all_tpr_train)]
    # True Negative Rate
    all_dict['tnr_train'] = [np.mean(all_tnr_train),np.std(all_tnr_train)]
    # Number of inc annotations called positive
    all_dict['inc_pos_train'] = [np.mean(all_inc_pos_train),np.std(all_inc_pos_train)]
    # Number of inc annotations called inc
    all_dict['inc_inc_train'] = [np.mean(all_inc_inc_train),np.std(all_inc_inc_train)]
    # Number of inc annotations called negative
    all_dict['inc_neg_train'] = [np.mean(all_inc_neg_train),np.std(all_inc_neg_train)]
    # Number of positive annotations called inc
    all_dict['pos_inc_train'] = [np.mean(all_pos_inc_train),np.std(all_pos_inc_train)]
    # Number of negative annotations called inc
    all_dict['neg_inc_train'] = [np.mean(all_neg_inc_train),np.std(all_pos_inc_train)]
    # Performance metrics for the validation set:
    all_dict['acc_val'] = [np.mean(all_acc_val),np.std(all_acc_val)]
    all_dict['fpr_val'] = [np.mean(all_fpr_val),np.std(all_fpr_val)]
    all_dict['fnr_val'] = [np.mean(all_fnr_val),np.std(all_fnr_val)]
    all_dict['tpr_val'] = [np.mean(all_tpr_val),np.std(all_tpr_val)]
    all_dict['tnr_val'] = [np.mean(all_tnr_val),np.std(all_tnr_val)]
    all_dict['inc_pos_val'] = [np.mean(all_inc_pos_val),np.std(all_inc_pos_val)]
    all_dict['inc_inc_val'] = [np.mean(all_inc_inc_val),np.std(all_inc_inc_val)]
    all_dict['inc_neg_val'] = [np.mean(all_inc_neg_val),np.std(all_inc_neg_val)]
    all_dict['pos_inc_val'] = [np.mean(all_pos_inc_val),np.std(all_pos_inc_val)]
    all_dict['neg_inc_val'] = [np.mean(all_neg_inc_val),np.std(all_pos_inc_val)]
    # Performance metrics for the test set:
    all_dict['acc_test'] = [np.mean(all_acc_test),np.std(all_acc_test)]
    all_dict['fpr_test'] = [np.mean(all_fpr_test),np.std(all_fpr_test)]
    all_dict['fnr_test'] = [np.mean(all_fnr_test),np.std(all_fnr_test)]
    all_dict['tpr_test'] = [np.mean(all_tpr_test),np.std(all_tpr_test)]
    all_dict['tnr_test'] = [np.mean(all_tnr_test),np.std(all_tnr_test)]
    all_dict['inc_pos_test'] = [np.mean(all_inc_pos_test),np.std(all_inc_pos_test)]
    all_dict['inc_inc_test'] = [np.mean(all_inc_inc_test),np.std(all_inc_inc_test)]
    all_dict['inc_neg_test'] = [np.mean(all_inc_neg_test),np.std(all_inc_neg_test)]
    all_dict['pos_inc_test'] = [np.mean(all_pos_inc_test),np.std(all_pos_inc_test)]
    all_dict['neg_inc_test'] = [np.mean(all_neg_inc_test),np.std(all_pos_inc_test)]
    return all_dict

def create_training_and_test_sets(all_X,all_Y,test_size=0.2):
    ''' create_training_and_test_sets Method
            Shuffle the entire dataset (all_X) and set of annotations (all_Y).
            Then split into a training set and test set, the proportions of 
            which are determined by the input "test_size." In addition, 
            one-hot-encode the training set annotations for use by Keras

            Args:
                all_X: Pandas DataFrame of all the curves in the dataset
                all_Y: Pandas DataFrame of all the annotations in the dataset
                test_size: fraction of the entire dataset that should be set
                           aside for the test set

            Returns:
                X_train: Pandas DataFrame containing the training set (curves)
                Y_train: array containing the one-hot-encoded training annotations 
                X_test: Pandas DataFrame containing the test set (curves)
                Y_test: Pandas DataFrame containing the test set annotations
                    
    '''
    X_train,X_test,Y_train,Y_test = train_test_split(all_X,all_Y,test_size)
    # Adjust annotations to comply with Keras encoding, which cannot accept 
    # label of -1. Convert -1 to 0, 0 to 1, and 1 to 2. 
    Y_train += 1
    Y_test += 1
    # Convert annotations to Keras one-hot-encoding
    Y_train = keras.utils.to_categorical(Y_train,num_classes=3)
    return X_train,X_test,Y_train,Y_test

# Load in the dataset of Excel spreadsheets
filenames = glob.glob(folder_path+'/*.xls')
# Use the data_and_labels class to extract the curves and annotations
# from the Excel spreadsheets 
all_XY = data_and_labels(filenames)
all_XY.allexcel_to_allXY()
# all_X holds the curves in the dataset and all_Y holds the annotations
# in the dataset
all_X = pandas.DataFrame(all_XY.all_X)
all_Y = pandas.Series(all_XY.all_Y) 

# Load in the data for all auxiliary inputs
# RMS Noise
mat_contents = scipy.io.loadmat(folder_path+'/RMSNoise.mat')
RMSNoise = mat_contents['approxRMSNoise']
RMSNoise = np.swapaxes(RMSNoise,0,1)
# Curve Derivatives
mat_contents = scipy.io.loadmat(folder_path+'/Derivs_RMS.mat')
Derivs_RMS = mat_contents['zerothAndHigherOrderDerivs_RMS']
Derivs_RMS = np.swapaxes(Derivs_RMS,0,2)
# Number of Polynomial Curve Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numPolyCrossings.mat')
numPolyCrossings = mat_contents['numPolyCrossings']
numPolyCrossings = np.swapaxes(numPolyCrossings,0,1)
# Number of Diagonal Line Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numZeroCrossingsDiagonal.mat')
numZeroCrossingsDiagonal = mat_contents['numZeroCrossingsDiagonal']
numZeroCrossingsDiagonal = np.swapaxes(numZeroCrossingsDiagonal,0,1)
# Number of Horizontal Line Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numZeroCrossingsHorizontal.mat')
numZeroCrossingsHorizontal = mat_contents['numZeroCrossingsHorizontal']
numZeroCrossingsHorizontal = np.swapaxes(numZeroCrossingsHorizontal,0,1)
# Location of Second Derivative Maxima
mat_contents = scipy.io.loadmat(folder_path+'/yMaxMinLocs.mat')
yMaxMinLocs = mat_contents['yMaxMinLocs']
yMaxMinLocs = np.swapaxes(yMaxMinLocs,0,1)
# Value of Second Derivative Maxima
mat_contents = scipy.io.loadmat(folder_path+'/yMaxMinVals.mat')
yMaxMinVals = mat_contents['yMaxMinVals']
yMaxMinVals = np.swapaxes(yMaxMinVals,0,1)
# RMS Noise Scaling Factor
mat_contents = scipy.io.loadmat(folder_path+'/RMSNoiseScalingFactor.mat')
RMSNoiseScalingFactor = mat_contents['approxRMSNoiseBasedCurvesScalingFactor']
RMSNoiseScalingFactor = np.swapaxes(RMSNoiseScalingFactor,0,1)
# numZeroCrossingsDiagonal contains two columns, the first for positive slope diagonal line
# crossings, and the second for negative slope diagonal line crossings.
# More strongly weight the number of negative slope diagonal line crossings (weighting of 2)
numZeroCrossingsDiagonal[:,1]*=2
# Combine the two diagonal line crossing numbers into one summary number
numZeroCrossingsDiagonal = np.sum(numZeroCrossingsDiagonal,axis=1)
numZeroCrossingsDiagonal = numZeroCrossingsDiagonal[:,np.newaxis]
# numZeroCrossingsHorizontal contains several columns, each for the number of crossings
# for horizontal lines with different slopes and intercepts. Combine all the numbers into
# one summary number.
numZeroCrossingsHorizontal = np.sum(numZeroCrossingsHorizontal,axis=1)
numZeroCrossingsHorizontal = numZeroCrossingsHorizontal[:,np.newaxis]
# Combine the number of horizontal and diagonal line crossings into one summary number
numZeroCrossings = np.concatenate((numZeroCrossingsDiagonal,numZeroCrossingsHorizontal),axis=1)
numZeroCrossings = np.sum(numZeroCrossings,axis=1)
numZeroCrossings = numZeroCrossings[:,np.newaxis]
# More strongly weight the number of polynomial curve crossings (weighting of 6)
numPolyCrossingsScaled = numPolyCrossings[:,0]*6
numPolyCrossingsScaled = numPolyCrossingsScaled[:,np.newaxis]
# Combine the number of line crossings (horizontal and diagonal) and the number of 
# polynomial curve crossings into one summary number
numCrossings = numZeroCrossings + numPolyCrossingsScaled
# yMaxMinVals and yMaxMinLocs contain two columns, the first with information about
# second derivative maxima and the second with information about second derivative 
# minima. Extract information about only the second derivative maxima
yMaxMinVals = yMaxMinVals[:,0]
yMaxMinVals = yMaxMinVals[:,np.newaxis]
yMaxMinLocs = yMaxMinLocs[:,0]
# The range of values for yMaxMinLocs is 1 to ~39. Subtract 1 from all values to comply
# with Python indexing which begins at 0. 
yMaxMinLocs -= 1
yMaxMinLocs = yMaxMinLocs[:,np.newaxis]
# Instantiate an array to create a new auxiliary input for the value of the amp curve
# at the second derivative maxima
amp_curve_values_at_second_deriv_max = []
# Number of points per curve
numPts = 40
# Total number of curves
numCurves = Derivs_RMS.shape[0]
for ii in range(numCurves):
    # Get the location (cycle number) of the second derivative maximum for the current
    # curve of interest
    loc = int(yMaxMinLocs[ii])
    # Use all_X.values, which contains the normalized curve values, to obtain the value
    # of the amp curve at the second derivative maximum
    amp_curve_value_at_second_deriv_max = all_X.values[ii,loc]
    amp_curve_values_at_second_deriv_max.append(amp_curve_value_at_second_deriv_max)
amp_curve_values_at_second_deriv_max = np.array(amp_curve_values_at_second_deriv_max)
amp_curve_values_at_second_deriv_max = amp_curve_values_at_second_deriv_max[:,np.newaxis]
# Create a new auxiliary input where the values of the second derivative maxima are 
# divided by the RMSNoiseScalingFactor to obtain a scaled second derivative maximum value
yMaxMinValsScaled = yMaxMinVals/RMSNoiseScalingFactor
# Normalize the values of all auxiliary inputs between 0 and 1
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
yMaxMinLocs = min_max_scaler.fit_transform(yMaxMinLocs)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
numZeroCrossings = min_max_scaler.fit_transform(numZeroCrossings)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
numPolyCrossings = min_max_scaler.fit_transform(numPolyCrossings)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
RMSNoiseScalingFactor = min_max_scaler.fit_transform(RMSNoiseScalingFactor)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
yMaxMinValsScaled = min_max_scaler.fit_transform(yMaxMinValsScaled)

# Load in the pre-defined training and test sets. Also load in the
# training set indices and test set indices, which define which elements
# of all_X and all_Y correspond to the pre-defined training and test sets.

X_train = pandas.DataFrame(np.loadtxt(folder_path+'/X_train.txt',delimiter=','))
Y_train = np.loadtxt(folder_path+'/Y_train.txt',delimiter=',')
X_test = pandas.DataFrame(np.loadtxt(folder_path+'/X_test.txt',delimiter=','))
Y_test = pandas.DataFrame(np.loadtxt(folder_path+'/Y_test.txt',delimiter=','))
X_train_indices = np.loadtxt(folder_path+'/X_train_indices.txt',delimiter=',')
X_train_indices = np.array([int(el) for el in X_train_indices])
X_test_indices = np.loadtxt(folder_path+'/X_test_indices.txt',delimiter=',')
X_test_indices = np.array([int(el) for el in X_test_indices])

# Use the training set indices to extract the correct auxiliary
# inputs for the corresponding curves in the training set.
X_train_RMSNoise = np.take(RMSNoise,X_train_indices,axis=0)
X_train_numZeroCrossings = np.take(numZeroCrossings,X_train_indices,axis=0)
X_train_numPolyCrossings = np.take(numPolyCrossings,X_train_indices,axis=0)
X_train_yMaxMinVals = np.take(yMaxMinVals,X_train_indices,axis=0)
X_train_yMaxMinLocs = np.take(yMaxMinLocs,X_train_indices,axis=0)
X_train_amp_curve_values_at_second_deriv_max = np.take(amp_curve_values_at_second_deriv_max,X_train_indices,axis=0)
X_train_RMSNoiseScalingFactor = np.take(RMSNoiseScalingFactor,X_train_indices,axis=0)
X_train_yMaxMinValsScaled = np.take(yMaxMinValsScaled,X_train_indices,axis=0)
X_train_numCrossings = np.take(numCrossings,X_train_indices,axis=0)

# Use the test set indices to extract the correct auxiliary
# inputs for the corresponding curves in the test set.
X_test_RMSNoise = np.take(RMSNoise,X_test_indices,axis=0)
X_test_numZeroCrossings = np.take(numZeroCrossings,X_test_indices,axis=0)
X_test_numPolyCrossings = np.take(numPolyCrossings,X_test_indices,axis=0)
X_test_yMaxMinVals = np.take(yMaxMinVals,X_test_indices,axis=0)
X_test_yMaxMinLocs = np.take(yMaxMinLocs,X_test_indices,axis=0)
X_test_amp_curve_values_at_second_deriv_max = np.take(amp_curve_values_at_second_deriv_max,X_test_indices,axis=0)
X_test_RMSNoiseScalingFactor = np.take(RMSNoiseScalingFactor,X_test_indices,axis=0)
X_test_yMaxMinValsScaled = np.take(yMaxMinValsScaled,X_test_indices,axis=0)
X_test_numCrossings = np.take(numCrossings,X_test_indices,axis=0)

# Create the training and test set auxiliary input arrays
X_train_aux_input_1 = X_train_RMSNoise
X_test_aux_input_1 = X_test_RMSNoise
X_train_aux_input_2 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings),axis=1)
X_test_aux_input_2 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings),axis=1)
X_train_aux_input_3 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings),axis=1)
X_test_aux_input_3 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings),axis=1)
X_train_aux_input_4 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals),axis=1)
X_test_aux_input_4 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals),axis=1)
X_train_aux_input_5 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs),axis=1)
X_test_aux_input_5 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs),axis=1)
X_train_aux_input_6 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max),axis=1)
X_test_aux_input_6 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max),axis=1)
X_train_aux_input_7 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max,X_train_RMSNoiseScalingFactor),axis=1)
X_test_aux_input_7 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max,X_test_RMSNoiseScalingFactor),axis=1)
X_train_aux_input_8 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max,X_train_RMSNoiseScalingFactor,X_train_yMaxMinValsScaled),axis=1)
X_test_aux_input_8 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max,X_test_RMSNoiseScalingFactor,X_test_yMaxMinValsScaled),axis=1)

# Change the one-hot-encoded training set annotations into single number class
# labels 
Y_train_argmax = np.argmax(Y_train,axis=1)
# Use numsamples Stratified K-Fold Cross Validation
numsamples = 10
skf = StratifiedKFold(n_splits=numsamples)
# Create a dictionary holding the training and test set indices for all the
# folds in Stratified K-Fold Cross Validation
counter = 0
cv_dict = dict()
for train_index, test_index in skf.split(X_train, Y_train_argmax):
    cv_dict[counter] = dict()
    cv_dict[counter]['train'] = train_index
    cv_dict[counter]['val'] = test_index
    counter += 1

# Activation Functions to be used 
softsign = 'softsign'
relu = 'relu'
tanh = 'tanh'
sigmoid = 'sigmoid'
linear = 'linear'

# Optimizer to be used
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)

2.0 - Parameter Toolbox

The next few cells will guide you through choosing the parameters of your neural network. As a first pass to get a feeling for how the model is working, you can run the cells below keeping the default values.

2.1 - Number of neurons per hidden layer

The architecture we will be implementing consists of one input layer (not shown), 3 hidden layers, and one output layer as depicted above.

  • Input Layer (Not shown): Default (Unchangeable) of 40. 40 neurons for the 40-dimensional input (40 normalized fluorescence values for every amplification curve in the dataset).

  • First Hidden Layer: Default of 30. Every neuron in this layer is "fully connected" to every neuron in the input layer. Learns the interrelationships and interdependencies between all the input values.

  • Second Hidden Layer: Default of 30. Every neuron in this layer is "fully connected" to every neuron in the first hidden layer. The second hidden layer learns more abstract, latent features produced by the first hidden layer.

  • Third Hidden Layer:: Default of 20. Every neuron in this layer is "fully connected" to every neuron in the second hidden layer. The third hidden layer learns even more abstract, latent features produced by the second hidden layer. Typically this layer will contain less neurons than the second and first hidden layers, as the model more finely squeezes information out of the inputs.

  • Output Layer: Default (Unchangeable) of 3. 3 neurons for the 3-dimensional output (3 probabilities for each of the three classes: -1 or Non-Amplified, 0 or Inconclusive, and 1 or Amplified.

Change (or leave as default) the values in the cell below to experiment with the number of neurons in each layer of the architecture.

In [ ]:
# Number of neurons in hidden layer 1
layer1_depth = 30

# Number of neurons in hidden layer 2
layer2_depth = 30

# Number of neurons in hidden layer 3
layer3_depth = 20

2.1 - Activation Functions per Hidden Layer

As demonstrated by the diagram above, after each neuron in a layer outputs its raw value, this value is typically passed through an activation function. This activation function adds an additional nonlinearity, allowing the network to learn more nonlinear mappings. A few activation functions are displayed below:

Change (or leave as default) the values in the cell below to experiment with the activation functions in each layer of the architecture. The options you have are:

 - tanh
 - softsign
 - sigmoid
 - relu
 - linear
In [ ]:
# Activation function in hidden layer 1
layer1_activation = relu

# Activation function in hidden layer 2
layer2_activation = relu

# Activation function in hidden layer 3
layer3_activation = relu

2.2 - Batch Size and Number of Epochs

The last parameters we will be changing before creating and fitting our model are the batch size and number of epochs:

  • Batch Size: Default of 200. The number of samples in your dataset to give to the neural network before it computes the loss, backpropagates the error, and updates its parameters.

    • Increasing the batch size lets the network run in shorter computation time. As the network learns, larger batch sizes may also prevent oscillations about local minima due to computing the loss over a larger number of samples instead of (in the extreme case) one sample at a time.
    • Decreasing the batch size allows the network to update its parameters more frequently.
  • Number of Epochs: Default of 10. The number of times the neural network will pass through the entire dataset.

    • Increasing the number of epochs will let the network learn the training set very well. Increasing the number too far may lead to overfitting.

Change (or leave as default) the values in the cell below to experiment with the batch size and number of epochs:

In [ ]:
# Batch Size
batch_size = 200

# Number of epochs
epochs = 10

3.0 - Creating and Fitting the Neural Network

Keras is an excellent framework for rapidly prototyping deep learning models.

Here is an example of how to create a neural network in Keras:

# Uses the Sequential Model API in which every layer is linearly stacked on top of the last
model = Sequential()
# To add a layer, simply use the model.add() function
# Here we are adding a Dense (fully connected hidden layer) to the network with the number
# of neurons ("layer1_depth") and activation function ("layer1_activation") you specified 
# earlier. 
# Note that for the first layer in the Sequential Model, you must specify the dimensionality
# of your inputs. After this point, Keras will automatically calculate input and output shapes
# for you. 
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
# Add the second hidden layer 
model.add(Dense(layer2_depth,activation=layer2_activation))
# Add the third hidden layer
model.add(Dense(layer3_depth,activation=layer3_activation))
# The last layer will be the output layer and must contain 3 neurons for the 3 class 
# probabilities we want to output. The "softmax" activation function is also necessary
# to ensure these probabilities sum to 1. 
model.add(Dense(3, activation='softmax'))
# Compile the model, preparing it to be fit to the training data. Leave the parameters 
# at the defaults for now. 
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])

In only six lines, you've created a neural network model. Run the following cell to create your neural network and fit it on the training set:

Note that there are additional BatchNormalization layers after each Dense layer. These BatchNormalization layers normalize the distribution of the output values after each Dense layer, improving network performance.

In [ ]:
# Create Neural Network model
model = Sequential()
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
model.add(BatchNormalization())
model.add(Dense(layer2_depth,activation=layer2_activation))
model.add(BatchNormalization())
model.add(Dense(layer3_depth,activation=layer3_activation))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))
model.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model on the training set for the batch size and number of epochs you specified earlier
model.fit(X_train,Y_train,epochs=epochs,batch_size=batch_size)

Now that you've successfully created and fit your model, run the following cell to see the false positive rate (FPR), false negative rate (FNR), true positive rate (TPR), and true negative rate (TNR) performance metrics on the training set. A confusion matrix is also displayed:

In [ ]:
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set 
Y_pred = model.predict(X_train)
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat

Run the following cell to see the FPR, FNR, TPR, and TNR performance metrics, along with the confusion matrix for the test set.

In [ ]:
# Obtain predictions from model on test set 
Y_pred = model.predict(X_test)
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat

4.0 - Adding Auxiliary Inputs

Now, to improve performance, you can add auxiliary inputs to the model. These features were hand-engineered to give the network additional information it can use to make predictions. The auxiliary inputs are as follows:

  • RMS Noise: Approximate RMS Noise in the curve, calculated by subtracting the normalized curve from a smoothed version of the normalized curve and finding the sum of squares of the residual.

  • Number of Line Crossings: Total number of crossings between the curve and various horizontal and diagonal lines with different slopes and intercept.

  • Number of Polynomial Curve Crossings: Total number of crossings between the curve and a 4th order polynomial fit to the curve.

  • Second Derivative Max Value:: Value of the second derivative maximum of the curve.

  • Second Derivative Max Location: Cycle number at which the second derivative maximum occurs.

  • Amp Curve Value at Second Derivative Max: Value of the normalized amp curve at the second derivative max

  • RMS Noise Scaling Factor: Scaling factor used to scale the second derivative max value. The RMS Noise Scaling factor is larger for noisier curves.

  • Scaled Second Derivative Max Value: Second derivative max value divided by the RMS Noise Scaling Factor

Choose the number of auxiliary inputs you wish to add to the model. The options you have are:

 - 1: RMS Noise
 - 2: (Same as 1) + Number of Line Crossings
 - 3: (Same as 2) + Number of Polynomial Curve Crossings
 - 4: (Same as 3) + Second Derivative Max Value
 - 5: (Same as 4) + Second Derivative Max Location
 - 6: (Same as 5) + Amp Curve Value at Second Derivative Max
 - 7: (Same as 6) + RMS Noise Scaling Factor
 - 8: (Same as 7) + Scaled Second Derivative Max Value

Change the argument of "num_aux_input" below to the number of auxiliary inputs you wish to add:

In [ ]:
# The number of auxiliary inputs to add. Choose from [1,2,3,4,5,6,7,8]:
num_aux_input = 6

As before, you can change (or leave as default) the values in the cell below, which specify the parameters of the neural network:

In [ ]:
# Number of neurons per hidden layer
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20

# Activation Functions per hidden layer
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu

# Batch size and Number of epochs
batch_size = 200
epochs = 10

4.1 - Regularization

An additional modification that can improve performance is the incorporation of reguarlization into the loss function:

"C0" is the categorical cross entropy loss function used by the model we are working with, "n" is the number of samples in the training set, and "w" are the weights learned by the network. Regularization serves to penalize larger weights, effectively making the model simpler and less prone to overfitting. The amount of regularization is determined by lambda (regularization parameter) in the equation.

Change (or leave as default) the regularization parameter (lambda) in the cell below. Keep in mind that due to the way it is incorporated in the loss function, the regularization parameter (lambda) determines a tradeoff between learning small weights and minimizing the original loss function (categorical cross entropy).

- Increasing lambda increases the emphasis on learning small weights
- Decreasing lambda increases the emphasis on minimizing the original loss function. 

It is suggested to start with small values of lambda, such as the default of 0.01, to incorporate the benefits of regularization (increasing generalization capability) as well as the benefits of the original loss function (learning the training set well).

In [ ]:
# Regularization parameter (lambda)
regularization = 0.01

4.2 - Class Weighting

The last modification we will make to the loss function is the addition of class weighting.

"Li" is the loss function for sample i, "n" is the total number of samples in the training set, "ni" is the number of samples in class i, and "c" is the total number of classes.

  • wi: Weight applied to the loss for each sample in class i. In order to address the skewed distribution in the dataset, higher weights are given to more poorly-represented classes, informing the network that those classes contain the greatest source of information.

  • alpha: Weighting exponent that determines how aggressively to emphasize the more poorly-represented classes as sources of information.

Change (or leave as default) the alpha parameter in the cell below.

- Increasing alpha, for example, increases the weight given to the Non-Amplified (-1) class. This more heaviliy penalizes misclassifications from this class (i.e. when an example annotated as -1 is called 1), thereby decreasing the false positive rate. 
In [ ]:
# alpha parameter in the class weighting formula
alpha = 1

5.0 - Creating and Fitting the Modified Model

The Sequential API was perfect for the relatively simple models we implemented thus far in the tutorial because those models were all "sequential," in other words one layer's output became the next layer's input and so on. For example:

However, now that we wish to add auxiliary inputs to our model, the model can no longer be compiled sequentially and we must turn instead to Keras' Functional API. Keras' Functional API has an incredible range of flexibility in model development, allowing you now to create models that accept multiple inputs:

At more advanced stages you can even create models with multiple inputs and multiple outputs, with layers recombining at different stages:

With the added benefit of flexibiltiy, Keras' Functional API is only slightly more complex than the Sequential API. Let's dive in to the syntax.

Previously we created the following neural network via the Sequential API, repeated below for convenience. The line numbers will be labeled in order for future reference.

# Line 0
model = Sequential()
# Line 1
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
# Line 2
model.add(Dense(layer2_depth,activation=layer2_activation))
# Line 3
model.add(Dense(layer3_depth,activation=layer3_activation))
# Line 4
model.add(Dense(3, activation='softmax'))
# Line 5
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])

Let's jump straight into the Functional API translation of the same model. Note that line numbers here are not in order, but refer to the corresponding line numbers in the Sequential API code:

# Line 1a
inputs = Input(shape=(40,))
# Line 1b
x = Dense(layer1_depth, activation=layer1_activation)(inputs)
# Line 2
x = Dense(layer2_depth, activation=layer2_activation)(x)
# Line 3
x = Dense(layer3_depth, activation=layer3_activation)(x)
# Line 4
outputs = Dense(3, activation='softmax')(x)
# Line 0
model = Model(inputs=inputs, outputs=outputs)
# Line 5
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])

Now for the details. The major difference between the Sequential and Functional API is how layers are added to the model. In both cases, a layer is created the same way - by using the syntax typically reserved for classes in Python (indicated in blue font). For example, to create a Dense layer you use:

Dense(layer_depth, activation=layer_activation)

  • In the Sequential API, this layer object is then added to the model via the add() function. The Sequential API-specific syntax is indicated in red font:

    model.add(Dense(layer_depth,activation=layer_activation))

  • In the Functional API, this layer object is stored as a variable, which can be regarded as the layer's "name." Layers no longer have to be added sequentially because these layer "names" can be used to reference and connect layers anywhere within the model. The Functional API-specific syntax is indicated in red font:

    x = Dense(layer_depth,activation=layer_activation)(inputs)

    Here, "x" is the name of the Dense layer we have created and "inputs" is the name of the layer we want to connect our new Dense layer to (i.e. "inputs" precedes our new Dense layer in the model architecture).

Let's now proceed through a line-by-line comparison between the two APIs:

  • Line 0: Line 0 in the Sequential API immediately instantiates the model. By contrast, in the Functional API, instantiation of the model occurs in the second-to-last line after all of the layers are defined. To instantiate the model, call the Model() function and pass it two arguments:
    • The first argument "inputs" is the name of the Input Layer (to be discussed)
    • The second argument "outputs" is the name of the Output Layer (to be discussed)
  • Line 1: In the Sequential API, Line 1 creates and adds the first Dense layer to the model. Note that Line 1 must also specify an argument "input_dim," which informs the model what dimensionality to expect from the input vectors. (In this case, the input vectors are 40-dimensional for the 40 cycles of the amp curves). In the Functional API, Line 1 is split into two parts:
    • Line 1a informs the model what dimensionality to expect from the input vectors by creating an Input Layer. The "shape" argument to the Input Layer replaces the "input_dim" argument in the Sequential API. (Note that the "," is necessary after the 40 in order to indicate there are no additional dimensions). The Input layer is saved in the variable "inputs."
    • Line 1b creates the first Dense layer in the model, saves it in the variable "x," connects it to the layer "inputs," which is the name of our Input Layer from Line 1a.
  • Line 2: In the Sequential API, Line 2 creates and adds the second Dense layer to the model. Note that the "input_dim" argument is no longer necessary because Keras can infer the input dimensionalities of all layers after the first. In the Functional API, Line 3 creates the second Dense layer in the model, saves it in the variable "x," and connects it to the layer "x," which was the name of the first Dense layer in the model.
    • Note that we are overriding the "x" variable every time we create a new layer. This is completely fine for our purposes and for simplicity. However, for more advanced uses it is recommended that you create unique variable names for each layer in your model.
  • Line 3: In the Sequential API, Line 3 creates and adds the third Dense layer to the model. In the Functional API, Line 3 creates the third Dense layer in the model, saves it in the variable "x," and connects it to the layer "x," which was the name of the second Dense layer in the model.
  • Line 4: In the Sequential API, Line 4 creates and adds the fourth Dense layer to the model. In the Functional API, Line 4 creates the fourth Dense layer in the model, saves it in the variable "x," and connects it to the layer "x," which was the name of the third Dense layer in the model.
  • Line 5: Line 5 is identical in the Sequential API and Functional API, its purpose being to compile our completed model using the SGD optimizer, categorical-cross-entropy loss, and accuracy metric. The details of these parameters are unimportant at this time. It is, however, important to note that all three arguments must be specified in the compile() function for models in this tutorial.

We will use Keras' Functional API to create our modified model, which now incorporates auxiliary inputs, regularization, and class weighting. Run the cell below to create and fit the model on the training set. If you wish to read through the details of the code below, make note of four important points:

  1. As before in the Sequential model, there are additional BatchNormalization layers after each Dense layer, which normalize the distribution of the output values after each Dense layer, improving network performance.
  2. For clarity, layer names are explicitly specified with strings using the "name" parameter when creating layers.
  3. When instantiating the model using the Model() function, the "inputs" parameter can accept an array, the elements of which are all the desired Input layers of the model.
  4. If the model was instantiated using an array for the "inputs" parameter in Model(), it must also be fit using an array of inputs of the same size. The first element of the input given to fit() corresponds to the first layer in the array given to the "inputs" parameter of Model(), the second element of the input given to fit() corresponds to the second layer in the array given to the "inputs" parameter of Model(), and so on.
In [ ]:
# Create Neural Network Model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])

# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
if num_aux_input == 1:
    X_train_aux_input = X_train_aux_input_1
    X_test_aux_input = X_test_aux_input_1
elif num_aux_input == 2:
    X_train_aux_input = X_train_aux_input_2
    X_test_aux_input = X_test_aux_input_2
elif num_aux_input == 3:
    X_train_aux_input = X_train_aux_input_3
    X_test_aux_input = X_test_aux_input_3
elif num_aux_input == 4:
    X_train_aux_input = X_train_aux_input_4
    X_test_aux_input = X_test_aux_input_4 
elif num_aux_input == 5:
    X_train_aux_input = X_train_aux_input_5
    X_test_aux_input = X_test_aux_input_5 
elif num_aux_input == 6:
    X_train_aux_input = X_train_aux_input_6
    X_test_aux_input = X_test_aux_input_6
elif num_aux_input == 7:
    X_train_aux_input = X_train_aux_input_7
    X_test_aux_input = X_test_aux_input_7
elif num_aux_input == 8:
    X_train_aux_input = X_train_aux_input_8
    X_test_aux_input = X_test_aux_input_8

# Determine class weight dictionary based on user-specified "alpha"
# "fracts" is the distribution of classes in our dataset, indicating that ~16% of the dataset is non-amplified,
# ~1% is inconclusive, and ~83% is amplified.
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}

# Fit the model
model.fit([X_train,X_train_aux_input],Y_train,epochs=epochs,batch_size=batch_size,class_weight=class_weight_dict)

Now that you've successfully created and fit your model, run the following cell to see the false positive rate (FPR), false negative rate (FNR), true positive rate (TPR), and true negative rate (TNR) performance metrics on the training set. A confusion matrix is also displayed:

In [ ]:
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set 
Y_pred = model.predict([X_train,X_train_aux_input])
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat

Run the following cell to see the FPR, FNR, TPR, and TNR performance metrics, along with the confusion matrix for the test set.

In [ ]:
# Obtain predictions from model on test set 
Y_pred = model.predict([X_test,X_test_aux_input])
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat

6.0 - Experiment

Now that you have a feel for how the neural networks work, all of the default parameters are listed in the cell below. Adjust them, experimenting with different combinations of parameter values.

In [ ]:
# Number of auxiliary inputs
num_aux_input = 6
# Number of neurons per hidden layer
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20
# Activation Functions per hidden layer
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu
# Batch size and Number of epochs
batch_size = 200
epochs = 10
# Regularization parameter (lambda)
regularization = 0.01
# alpha parameter in the class weighting formula
alpha = 1

Then run the next three cells to create and fit the model, obtain performance metrics on the training set, and obtain performance metrics on the test set. Go back and repeat the steps for this section (5.0) as many times as you wish before proceeding to the final section (6.0).

In [ ]:
# Create Neural Network Model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
if num_aux_input == 1:
    X_train_aux_input = X_train_aux_input_1
    X_test_aux_input = X_test_aux_input_1
elif num_aux_input == 2:
    X_train_aux_input = X_train_aux_input_2
    X_test_aux_input = X_test_aux_input_2
elif num_aux_input == 3:
    X_train_aux_input = X_train_aux_input_3
    X_test_aux_input = X_test_aux_input_3
elif num_aux_input == 4:
    X_train_aux_input = X_train_aux_input_4
    X_test_aux_input = X_test_aux_input_4 
elif num_aux_input == 5:
    X_train_aux_input = X_train_aux_input_5
    X_test_aux_input = X_test_aux_input_5 
elif num_aux_input == 6:
    X_train_aux_input = X_train_aux_input_6
    X_test_aux_input = X_test_aux_input_6
elif num_aux_input == 7:
    X_train_aux_input = X_train_aux_input_7
    X_test_aux_input = X_test_aux_input_7
elif num_aux_input == 8:
    X_train_aux_input = X_train_aux_input_8
    X_test_aux_input = X_test_aux_input_8

# Determine class weight dictionary based on user-specified "alpha"
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}

# Fit the model
model.fit([X_train,X_train_aux_input],Y_train,epochs=epochs,batch_size=batch_size,class_weight=class_weight_dict)
In [ ]:
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set 
Y_pred = model.predict([X_train,X_train_aux_input])
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
In [ ]:
# Obtain predictions from model on test set 
Y_pred = model.predict([X_test,X_test_aux_input])
# Convert probabilities into class labels 
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat

7.0 - Obtaining Performance Statistics

Now that you've created, fit, and determined the performance metrics for one iteration of your model, you can create several iterations to obtain performance statistics (mean, standard deviation). The cell below will perform the following steps:

  • 1. Instantiate Model: Your model architecture and initialized weights will be saved to an H5 file. This file represents the base version of the model prior to any fitting on the dataset and will be called each time we create a new iteration of the model.

  • 2. Stratified K-Fold Partitioning: The training set will be stratified into K folds, such that K-1 folds are used to train the model and the last, unused fold is used for validation. "Stratified" refers to the class distribution in the entire training set being preserved in each fold.

  • 3. Cross Validation and Testing: The stratified K-Fold partitioning is repeated 10 times and the mean and standard deviation of the performance metrics over all 10 runs computed. Performance metrics are computed, for each run, on the training set and the validation set. Note that the training and validation set change during each run, so that the model eventually encounters every example in the training set. The test set, which the model never encountered during any run of training, is then predicted on to obtain performance metrics.

Run the cell below to run the stratified K-Fold cross validation procedure. For simplicity, the code below will display performance metrics (FPR, FNR, TPR, TNR) on only the training and test set. Note that this cell will take a few minutes to run:

In [ ]:
# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
aux_dict = dict()
if num_aux_input == 1:
    aux_dict['X_train_aux_input'] = X_train_aux_input_1
    aux_dict['X_test_aux_input'] = X_test_aux_input_1
elif num_aux_input == 2:
    aux_dict['X_train_aux_input'] = X_train_aux_input_2
    aux_dict['X_test_aux_input'] = X_test_aux_input_2
elif num_aux_input == 3:
    aux_dict['X_train_aux_input'] = X_train_aux_input_3
    aux_dict['X_test_aux_input'] = X_test_aux_input_3
elif num_aux_input == 4:
    aux_dict['X_train_aux_input'] = X_train_aux_input_4
    aux_dict['X_test_aux_input'] = X_test_aux_input_4 
elif num_aux_input == 5:
    aux_dict['X_train_aux_input'] = X_train_aux_input_5
    aux_dict['X_test_aux_input'] = X_test_aux_input_5 
elif num_aux_input == 6:
    aux_dict['X_train_aux_input'] = X_train_aux_input_6
    aux_dict['X_test_aux_input'] = X_test_aux_input_6
elif num_aux_input == 7:
    aux_dict['X_train_aux_input'] = X_train_aux_input_7
    aux_dict['X_test_aux_input'] = X_test_aux_input_7
elif num_aux_input == 8:
    aux_dict['X_train_aux_input'] = X_train_aux_input_8
    aux_dict['X_test_aux_input'] = X_test_aux_input_8

# Determine class weight dictionary based on user-specified "alpha"
class_weight_dict = dict()
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}

# Compute performance metric statistics for your Neural Network 
model_name = 'amp_model_demo.h5'
model.save(model_name)
all_dict = cv_trainingset(model_name,X_train.values,Y_train,X_test.values,Y_test.values,numsamples,cv_dict,aux_dict,class_weight_dict)

print('Training Set Performance Metrics')
print("FPR: %0.2f (+/- %0.2f)" % (all_dict['fpr_train'][0]*100, all_dict['fpr_train'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.2f)" % (all_dict['fnr_train'][0]*100, all_dict['fnr_train'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.2f)" % (all_dict['tpr_train'][0]*100, all_dict['tpr_train'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.2f)" % (all_dict['tnr_train'][0]*100, all_dict['tnr_train'][1]*100 * 2))

print('Test Set Performance Metrics')
print("FPR: %0.2f (+/- %0.2f)" % (all_dict['fpr_test'][0]*100, all_dict['fpr_test'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.2f)" % (all_dict['fnr_test'][0]*100, all_dict['fnr_test'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.2f)" % (all_dict['tpr_test'][0]*100, all_dict['tpr_test'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.2f)" % (all_dict['tnr_test'][0]*100, all_dict['tnr_test'][1]*100 * 2))

Try running the cell below and compare the results you get with the table in the following cell:

In [ ]:
# Set parameters 
num_aux_input = 8
regularization = 0.01
alpha = 1.2
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu
batch_size = 200
epochs = 10

# Set random seeds
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)
tf.set_random_seed(1234)

# Create model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])

# Create auxiliary input dictionary
aux_dict = dict()
aux_dict['X_train_aux_input'] = X_train_aux_input_8
aux_dict['X_test_aux_input'] = X_test_aux_input_8

# Create class weight dictionary
class_weight_dict = dict()
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}

# Save model to H5 file 
model_name = 'amp_model_demo.h5'
model.save(model_name)
# Run Stratified K-Fold Cross Validation
all_dict = cv_trainingset(model_name,X_train.values,Y_train,X_test.values,Y_test.values,numsamples,cv_dict,aux_dict,class_weight_dict)

print('Training Set Performance Metrics')
print("FPR: %0.2f (+/- %0.3f)" % (all_dict['fpr_train'][0]*100, all_dict['fpr_train'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.3f)" % (all_dict['fnr_train'][0]*100, all_dict['fnr_train'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.3f)" % (all_dict['tpr_train'][0]*100, all_dict['tpr_train'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.3f)" % (all_dict['tnr_train'][0]*100, all_dict['tnr_train'][1]*100 * 2))

print('Test Set Performance Metrics')
print("FPR: %0.2f (+/- %0.3f)" % (all_dict['fpr_test'][0]*100, all_dict['fpr_test'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.3f)" % (all_dict['fnr_test'][0]*100, all_dict['fnr_test'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.3f)" % (all_dict['tpr_test'][0]*100, all_dict['tpr_test'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.3f)" % (all_dict['tnr_test'][0]*100, all_dict['tnr_test'][1]*100 * 2))

Expected output: Your expected output should be close to the following:

**Training Set FPR =** 0.190
**Training Set FNR =** 0.048
**Test Set FPR =** 0.021
**Test Set FNR =** 0.085

Bonus Section - Visualization

1.0 - Plotting Model Architecture

Run the cell below to create a graphic visualization of the model architecture you created for this lesson. The visualization will be an image called 'model.png' saved in the same directory as this notebook.

In [ ]:
plot_model(model, show_shapes=True, show_layer_names=True, to_file='model.png')

Upon locating the "model.png" file and opening it, ithe image should look similar to the following (the details depend on what model you designed prior to this section):

2.0 - Tensorboard

Tensorboard is an excellent visualization tool offered by Tensorflow, described by one of its developers as a "flashlight to shine on the black box of neural networks." Tensorboard will allow you to not only see a graph of your model architecture, as we created in the previous section using Keras, but also graphs of accuracy over time, loss over time, histograms of weight values over time, and much more.

2.1 - Setting up

To begin, we will be creating a new folder called "tb_logs" in the directory this notebook is saved on your system. As your neural network trains, information will be saved every epoch in the form of logs, which Tensorboard will use to create its visualizations.

Run the following cell to create the "tb_logs" folder:

In [ ]:
# Specify the absolute path of the directory you will be saving the log files 
LOG_DIR = folder_path+'/tb_logs'
os.mkdir(LOG_DIR)

As before, change (or leave as default) the parameters in the cell below, which will determine your neural network design. We will be working with a simpler 4-layer network in this section with no auxiliary inputs:

In [ ]:
# Run K.clear_session() to clear old models and layers from memory to speed up execution in this section
# K.clear_session()

# Set parameters
layer1_depth = 40
layer1_activation = 'relu'
layer2_depth = 20
layer2_activation = 'relu'
layer3_depth = 20
layer3_activation = 'relu'
layer4_depth = 16
layer4_activation = 'relu'
epochs = 30
batch_size = 200

# Create Neural Network model
model = Sequential()
model.add(Dense(layer1_depth, activation=layer1_activation, name='dense_1', input_dim=40))
model.add(BatchNormalization())
model.add(Dense(layer2_depth, activation=layer2_activation, name='dense_2'))
model.add(BatchNormalization())
model.add(Dense(layer3_depth, activation=layer3_activation, name='dense_3'))
model.add(BatchNormalization())
model.add(Dense(layer4_depth, activation=layer4_activation, name='dense_4'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax', name='dense_5'))
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

2.2 - Starting Tensorboard

As you will see going through this section, there are a lot of moving parts required to get Tensorboard up and running. In order to get you started, the following cell encapsulates a lot of those moving parts and we will not be going through all the details. Feel free, however, to read through the comments to see what is going on under the hood.

For the purposes of this tutorial, there are a few points you should know about how Keras is using Tensorboard:

  • Tensorboard obtains information/performs calculations to assess the performance of the neural network every epoch, with two of the most important calculations it performs being accuracy and loss. It is important to note that Tensorboard is capable of performing these calculations not only on the training set (default), but also on an optionally provided validation/test set (with accompanying annotations). We will be using this feature of Tensorboard extensively to see the performance of the neural network on the training set and test set every epoch.
  • In addition to accuracy and loss, as we will see in the subsequent sections, Tensorboard also generates, for every epoch, histograms/distributions of the weight values, as well as "embeddings" (to be discussed) of the data for each layer. Because these specialized computations are time-consuming, Tensorboard has arguments to specify the frequency with which you want to perform them in units of epochs. We will be calling this "calculations_freq" and setting the default to 10, i.e. Tensorboard will perform the specialized calculations every 10 epochs.

Change (or leave as default) the value of "calcluations_freq" below:

In [ ]:
# The frequency, in units of epochs, with which Tensorboard should perform specialized calculations
calculations_freq = 10

Run the following cell to initialize Tensorboard. The details of the code are not important for this tutorial:

In [ ]:
# Create a one-hot encoding of Y_test 
Y_test_onehot = to_categorical(Y_test)
# Specify which layers Tensorboard should derive embeddings from
embedding_layer_names = set(layer.name for layer in model.layers if layer.name.startswith('dense_'))
# Specify what data Tensorboard should use to calculate the embeddings 
embeddings_data = X_test.values

# Create metadata file containing annotations associated with each example that will be embedded 
metadata_file = os.path.join(LOG_DIR, 'metadata.tsv')
with open(metadata_file, 'w') as f:
    for ii in range(len(Y_test.values)):
        f.write('{}\n'.format(int(Y_test.values[ii])-1))

# Define the Tensorboard callback. 
tensorboard = TensorBoard(log_dir=LOG_DIR, histogram_freq=calculations_freq, batch_size=32,
                           write_graph=True, write_grads=False, write_images=True,
                           embeddings_freq=calculations_freq, embeddings_metadata=metadata_file,
                           embeddings_layer_names=embedding_layer_names,
                           embeddings_data=embeddings_data)
# Fit the model 
model.fit(X_train,Y_train,callbacks=[tensorboard],epochs=epochs,batch_size=batch_size,validation_data=(X_test,Y_test_onehot))

While the cell above is running, open Terminal (for Mac) or Command Prompt (for Windows). For Mac Users in Terminal, you will see something like the following:

Note the absolute path on your system where the "tb_logs" folder is stored. Enter the following command in Terminal/Command Prompt, replacing "[path]" with your absolute path:

"tensorboard --logdir [path]"

In my case, "[path]" was "/Users/christopher.chin/tb_logs." After entering the command in Terminal/Command Prompt your window should be similar to the following:

Press "Enter" and the command will run, starting up Tensorboard.

In order to access Tensorboard, you will have to open a new window in the Web Browser of your choice and enter the following in the address bar. Note that "6006" is the port on your localhost assigned to Tensorboard:

"localhost:6006"

After entering the line above in your browser's address bar, your screen should be similar to the following:

Press "Enter" and you will be directed to the server running Tensorboard. You should see a screen in your browser similar to the following. (If your start-up screen is initially very different, the "Scalars" dashboard in the top bar of the page may not yet be available. In that case, wait a little for Tensorboard to perform its computations. Within a few minutes, the "Scalars" dashboard option should appear and clicking on it will produce the start-up screen below):

On the top bar of the screen are the different dashboards you have access to:

  • Scalars

  • Images

  • Graphs

  • Distributions

  • Histograms

  • Projector

We will be going through each of these dashboards step-by-step in the next few sections.

2.2 - Scalars Dashboard

We will be focusing in this section on the "Scalars" dashboard. Examine the screenshot below:

The Scalars dashboard gives you visualizations of all the scalar information Tensorboard has collected over the epochs of training, specifically the accuracy and loss. As noted previously, every epoch the model will evaluate the accuracy and loss on the training set (indicated by the top two white tabs "acc" and "loss"), but also the accuracy and loss on the test set (indicated by the bottom two white tabs "val_acc" and "val_loss"). You can click on the white tabs themselves to show or hide the corresponding graphs. Note that "val_loss" does not appear in the screenshot above, but is beneath the white "val_acc" tab if you continue to scroll down on the page.

All of the graphs can be manipulated in the same way, so we will examine the training accuracy "acc" plot as a representative model. As indicated in the screenshot, click the left-most button to enlarge the image. You should see something similar to the following:

As indicated in the screenshot, there is a section on the left-hand-side pane called "Smoothing" that contains a sliding bar and box that accepts numerical inputs. By default, the graph of accuracy is smoothed, with degree of smoothing 0.6, to more easily view the trend in the graph. You can reduce the degree of smoothing to 0, or increase the smoothing to 1 by sliding the bar or entering the desired value into the box. By reducing the smoothing to 0, you can see a graph similar to the following:

Note also that moving your cursor along the graph provides you with a black box containing information about the specific training accuracy values calculated every epoch.

2.3 - Images Dashboard

The Images dashboard allows you to visualize the weights that the neural network is learning over time. Clicking on the "Images" dashboard at the top of the Tensorboard page should open a windows similar to the following:

As indicated in the screenshot, click the white tab labeled "batch_normalization_1" to close it. Close all other open tabs until the screen looks like the following:

As indicated in the screenshot, click the white tab labeled "dense_1" to open it. Your screen should now look similar to the following:

We are now looking at a visualization of the bias vector and weight matrix for the first hidden layer (called "dense_1") of the neural network. Darker colors correspond to values of larger magnitude while lighter colors correspond to weights of smaller magnitude. In order to interpret these images, recall that neural network computations follow the series of steps described below. This entire set of computations occurs for every neuron in every layer:

Let's fix one neuron in a given layer as an example. The first part of the calculations is the "pre-activation," which is computed by taking the sum of the products of the outputs of all neurons in the previous layer with their corresponding weighted connections to the fixed neuron under consideration. A bias vector is then added to the result. In the second part of the calculations, an "activation function" is applied to the "pre-activation" output to yield the final output. Therefore, since the first hidden layer "dense_1" in the network contains 40 neurons and the previous layer - the input layer - is 40-dimensional, there must be 40 weighted connections per neuron in "dense_1." Since the "pre-activation" computation can be summarized as a matrix-vector multiplication (between a weight matrix and the output vector of the previous layer), the weight matrix associated with "dense_1" must be of shape (40x40). (Multiplying the 40x40 weight matrix by the 40x1 vector of inputs yields a 40x1 vector representing the "pre-activation" output). The bias vector must then be of shape (40x1), as it is added to the result of the matrix-vector multiplication. The weight matrix is reproduced and enlarged below:

The weight matrix can be interpreted as follows:

  • Every row (i) in the matrix corresponds to the weighted connections between neuron (i) in the given layer and all the inputs from the previous layer.
  • Every element (j) in a row corresponds to the weighted connection between neuron (i) in the given layer and input (j) from the previous layer.
  • Example: Consider the topmost row of the weight matrix, which represents the weighted connections between the first neuron in "dense_1" and all the elements of the input vector. The first element in the row represents the connection between the first neuron in "dense_1" and the first element of the input vector, which corresponds to the fluorescence value at cycle 1. Likewise, the last element in the row represents the connection between the first neuron in "dense_1" and the last element of the input vector, which corresponds to the fluorescence value at cycle 40.

As indicated in the screenshot, you can also move the "Epochs slider" from right to left to see the weight matrix values at earlier epochs. The same operations can also be done on the bias vector visualization.

2.4 - Graphs Dashboard

In Bonus Section 1.0, we plotted your model architecture using Keras as a high-level flow-diagram showing the layer names, layer sizes, and connections between layers. The Graphs Tab of Tensorboard functions as a more in-depth, low-level visualization of your model. After selecting the "Graphs" dashboard, your screen should be similar to the following:

There are a lot of details on this page that I encourage you to expore on your own. For the purposes of this tutorial, we will focus on the flow-diagram on the left titled "Main Graph," which shows all the connections between layers and serves as a similar graphical rendition to the Keras visualization we plotted earlier. Here, however, you can interact with the graph and double-click on a layer to show the inner computations. As indicated in the screenshot, double-click on "dense_4" to show more details about the layer:

Upon opening up the "dense_4" layer, you can now see the computations with the weight matrix, bias vector, the regularization applied, etc. Opening up all the layers in this fashion would allow you to follow the backpropagation of the gradient through all the layers of the network (used to update all weight parameters as the model trains).

2.5 - Distributions Dashboard

The Distributions dashboard is a variation upon the Images dashboard, in the sense that it allows you to see the evolution of the weight matrices and bias vectors in your model over time. This time, however, the visualization takes the form of distribution, as the heading suggests. Upon clicking on the "Distributions" dashboard, your screen should be similar to the following:

As indicated in the screenshot, click on the white tab titled "batch_normalization_1" to close it. Close any other open white tabs in a similar manner until your screen resembles the following:

As indicated in the screenshot, click on the "dense_1" tab to open it. You should now see graphs similar to the ones below:

Here is how to interpret the graphs:

  • Horizontal Axis: Epochs

  • Vertical Axis: Values of the weights/bias

  • Horizontal Lines: The horizontal lines cutting through the graphs represent percentiles of the distribution over the weights/biases. From top to bottom, the lines represent the maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, and minimum percentiles in the distribution. Following an individual line across the graph shows how that particular percentile has over the epochs of training. For instance, follow the highest line in the graphs to see how the maximum value changes over time.

2.6 - Histograms Dashboard

The Histograms dashboard contains rotated views of the graphs from the Distributions dashboard, so we will not spend too much time in this section. However, the different perspective can provide different insights into how the weights/biases are changing over time. Upon clicking on the Histograms dashboard, closing all open white tabs, and opening the white tab titled "dense_1," your screen should be similar to the following:

Once again, these graphs are the same as those we saw in the previous section on the Distributions tab, only rotated so that the following interpretation holds:

  • Horizontal Axis: Values of the weights/bias

  • Vertical Axis: Epochs

  • Horizontal Slices: The horizontal slices that appear on individual horizontal axes in the graphs represent histograms of the weight/bias values at a particular epoch. Histograms for earlier, older epochs appear in the "back," while histograms for later, more recent epochs appear in the front. Looking at an individual temporal slice histogram in this way allows you to see the distribution of the weights/bias values at a given moment in time.

2.7 - Projector Dashboard

One of the most exciting features of Tensorboard is the visualization of embeddings. Typically, embeddings refer to the conversion of qualitiative objects like words into numerical vectors, which can then be compared on the basis of similarity metrics like distance. More generally, however, embeddings can be thought of as function mappings from input to output. Each layer of a neural network can therefore be thought of as creating an embedding of the input feature vector, transforming it into a new-dimensional space. For example, given the 40-dimensional input feature vector we have been considering thus far (corresponding to the 40 fluorescence values for each cycle), if the first hidden layer has 40 neurons, it is embedding or transforming the input feature vector into a new 40-dimensional space. The second hidden layer, which has 20 neurons, is then transforming its input into a new 20-dimensional space, and so on. The Projector dashboard allows us to visualize these high-dimensional embeddings by translating them into 2 or 3 dimensions via algorithms like PCA (taking the first two or three principal components), or t-SNE.

Upon clicking on the Projector Dashboard, your screen should be similar to the following. (If you instead receive an error message saying "Error fetching tensors" the projector dashboard is not yet ready for viewing, as Tensorboard may still be performing its calculations. Wait a few minutes and re-visit the dashboard. Typically, the Projector dashboard is fully accessible once the neural network has finished training completely):

Each data point in this space represents an example in our training set. The visualization right now does not have much meaning, however, because we are unable to identify which data point belongs to which class. We will be distignuishing the different classes by color. First, as indicated in the screenshot, click on the dropdown menu titled "Color by":

Now select the option "label":

All the data points are now colored according to the class they correspond to. (Important: Note that the colors on your screen may differ from those depicted in the screenshots. Hover your cursor over individual, colored data points to see which color corresponds to which class). In the screenshots, red corresponds to "Amplified (Label: 1)", orange corresponds to "Inconclusive (Label: 0)", and blue corresponds to "Non-Amplified (Label: -1)." Representative points belonging to each of these classes are labeled in the screenshot. Feel free to explore the visualization by clicking individual data points, clicking and dragging to rotate the plot, etc. You can see that the first hidden layer of the neural network has found an embedding for the different classes, in which each class belongs to a to different region of a high-dimensional manifold. Now click on the dropdown menu on the top of the left-hand-side pane titled "5 tensors found." Select "dense_5_embedding" and your screen should be similar to the following:

This visualization represents the embedding of the last layer of the neural network. Once again, the neural network has found an embedding where each class belongs to a different region in the high-dimensional space. Data points that are "close" to each other in the space ("closeness" being defined by a distance metric) are more similar to each other and more likely to belong to the same class.

Congratulations!!!

That concludes our tutorial. Hopefully, this demo provided a good starting point for you to begin working with your own neural network designs. As a final note, the Keras documentation is an excellent resource for future inquiries and additional exploration:

https://keras.io/

In addition, note that if you wish to use Tensorboard after finishing this demo, you must delete all the contents of your tb_logs folder so that Tensorboard does not overlap your new graphs with graphs of previous models.

In [ ]: