In this demo, we will review:
Prior to beginning this session, it is assumed that you have already downloaded Demo_Files.zip and moved the contents into a desired folder on your system.
To begin, locate the file path to the folder on your system storing the files from Demo_Files.zip. The "folder_path" string variable in the cell below currently holds a dummy file path string for reference. Change it to the file path on your system.
folder_path = '/Users/christopher.chin/Documents/Project_Part_1/Demo_Files'
Run the next cell to load the modules, random seed initializations, functions, and data you will be using in the rest of this demo.
# Modules to Import and random seeds to set
import numpy as np
import tensorflow as tf
import random as rn
import os
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
import scipy.io
import glob
import matplotlib.pyplot as plt
from matplotlib import mlab
import pylab as pl
import math
import sklearn
from sklearn import preprocessing
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import learning_curve
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_curve, auc
import scipy
import pandas
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from keras.callbacks import Callback, TensorBoard
from keras import layers
from keras import regularizers
from keras.initializers import glorot_uniform
from keras.layers import Input, Dense, Dropout, Activation, BatchNormalization, Flatten, Conv1D, Conv2D, MaxPooling1D, MaxPooling2D, GlobalAveragePooling1D, LSTM, Dropout, Bidirectional
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Model, load_model, Sequential
from keras.optimizers import SGD, Adam
from keras.utils import plot_model, to_categorical
from matplotlib.pyplot import imshow
K.set_image_data_format('channels_last')
class data_and_labels:
''' data_and_labels Class
Reads Tom's Dataset and extracts the curves and corresponding annotations
Args:
filenames: list of string filenames for each Excel spreadsheet in the dataset
'''
def __init__(self,filenames):
self.filenames = filenames
# all_X is an array of shape (64001,40), comprising the 64001 normalized curves
# that are 40 points each
self.all_X = []
# all_X is an array of shape (64001,40), comprising the 64001 raw, unnormalized
# curves that are 40 points each
self.all_X_raw = []
# all_Y is an array of shape (64001,1) comprising the annotations for the 64001
# curves
self.all_Y = []
# all_numcurves is an int variable to continuously keep track of how many curves
# have been analyzed so far
self.all_numcurves = []
def excel_to_curves(self,filename):
''' excel_to_curves Method
Takes one Excel spreadsheet and extracts the curves, corresponding annotations,
and number of points in each curve
Args:
filename: string name of file to analyze
Returns:
file_labels: array containing the annotations of each curve
file_numcycles: array containing the number of points in each curve
file_curves: array containing all the curves in the file
'''
file = pandas.read_excel(io=filename,sheet_name='Sheet1',header=0,usecols='F,K:AY')
file = file.values
file_labels = file[:,0]
file_numcycles = file[:,1]
file_curves = file[:,2:]
file_curves = np.transpose(file_curves)
return file_labels,file_numcycles,file_curves
def normalize_curves(self,file_curves):
''' normalize_curves Method
Takes all the curves from an Excel spreadsheet and normalizes them from
0 to 1
Args:
file_curves: array containing all the curves in the file of interest
Returns:
file_curves_normalized: array containing all the normalized curves
'''
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
file_curves_normalized = min_max_scaler.fit_transform(file_curves)
return file_curves_normalized
def sample_to_features(self,sample,numcycles):
''' sample_to_features Method
Takes one curve and converts it into features. If a curve is less than
40 points, pad the last value of the curve until the curve is 40 points
long.
Args:
sample: array containing all the points in the curve
numcycles: int number of points in the curve
Returns:
sample_y: array containing the curve padded to 40 points.
'''
sample_y = sample
if numcycles < 40:
while len(sample_y) < 40:
sample_y = np.append(sample_y,sample_y[len(sample_y)-1])
sample_y = np.reshape(sample_y,(1,len(sample_y)))
return sample_y
def curves_to_fileXY(self,file_labels,file_numcycles,file_curves_normalized,file_curves):
''' curves_to_fileXY Method
Takes all the curves in one Excel spreadsheet, converts them into features,
and concatenates them into one array. Does the same for all the annotations
in the spreadsheet.
Args:
file_labels: array containing all the annotations in the file
file_numcycles: array containing the number of points per curve in the
file
file_curves_normalized: array containing all the normalized curves
in the file before conversion to features
file_curves: array containing all the raw, unnormalized curves in the
file before conversion to features
Returns:
file_X: array containing all the normalized curves, now all 40 points long
file_X_raw: array containing all the raw, unnormalized curves, now all
40 points long
file_Y: array containing all the annotations of the curves
'''
file_numcurves = np.shape(file_curves_normalized)[1]
self.all_numcurves = np.append(self.all_numcurves,file_numcurves)
file_X = []
file_X_raw = []
file_Y = []
for ii in range(file_numcurves):
sample_y = self.sample_to_features(file_curves_normalized[:,ii],file_numcycles[ii])
sample_y_raw = self.sample_to_features(file_curves[:,ii],file_numcycles[ii])
if len(file_X) == 0:
file_X = sample_y
file_X_raw = sample_y_raw
file_Y = file_labels[ii]
else:
file_X = np.append(file_X,sample_y,axis=0)
file_X_raw = np.append(file_X_raw,sample_y_raw,axis=0)
file_Y = np.append(file_Y,file_labels[ii])
return file_X,file_X_raw,file_Y
def allexcel_to_allXY(self):
''' allexcel_to_allXY Method
Master function that calls the other functions - takes all the spreadsheets
in the dataset, extracts the curves and annotations, converts the curves to
features, performs normalization, and concatenates all the curves into one
master array all_X, and all the annotations into one master array all_Y
'''
for ii in range(len(self.filenames)):
file_labels,file_numcycles,file_curves = self.excel_to_curves(filenames[ii])
file_curves_normalized = self.normalize_curves(file_curves)
file_X,file_X_raw,file_Y = self.curves_to_fileXY(file_labels,file_numcycles,file_curves_normalized,file_curves)
if len(self.all_X) == 0:
self.all_X = file_X
self.all_X_raw = file_X_raw
self.all_Y = file_Y
else:
self.all_X = np.append(self.all_X,file_X,axis=0)
self.all_X_raw = np.append(self.all_X_raw,file_X_raw,axis=0)
self.all_Y = np.append(self.all_Y,file_Y)
def grandindex_to_filename_and_row(self,grandindex):
''' grandindex_to_filename_and_row Method
After separating into training and test sets, the master arrays all_x
and all_Y are shuffled, losing the connection between the curve and the
file it came from. Pandas DataFrames are used to maintain the indices
from the original master arrays. This function converts these indices
into the corresponding file and the row in the file you can find
the curve of interest
Args:
grandindex: Pandas DataFrame index of the curve of interest
Returns:
filename: string name of file that the curve came from
row: int row number in the file that you can find the curve
'''
total = 0
row = 0
for ii in range(len(self.all_numcurves)):
if (grandindex > total + self.all_numcurves[ii]):
total = total + self.all_numcurves[ii]
else:
# 2 is added because python indexes start at 0 but curves in Tom's
# Excel spreadsheets start from row 2 down.
row = grandindex - total + 2
filename = self.filenames[ii]
break
return filename,row
def find_error_indices(self,Y_pred,Y_truth):
''' find_error_indices Method
Takes the array of predictions Y_pred and the array of ground truth
annotations Y_test and compares them elementwise. If they differ,
the index is added to the returned array error_indices
Args:
Y_pred: array of predictions from the model
Y_truth: array of ground truth annotations for the predictions
Returns:
error_indices: array of indices in the input arrays where the
values differed (indicating an error)
'''
Y_truth_temp = Y_truth.values
error_indices = []
for ii in range(len(Y_truth_temp)):
if Y_truth_temp[ii] != Y_pred[ii]:
error_indices = np.append(error_indices,ii)
return error_indices
def check_error_samples(self,Y_pred,Y_truth,error_indices,num_to_check):
''' check_error_samples Method
For a specified number of misclassfied examples to find
(num_to_check) and the input array of all indices corresponding
to misclassified examples (error_indices), randomly extract
num_to_check misclassified examples and output the file and row
that that example can be found
Args:
Y_pred: array of predictions from the model
Y_truth: array of ground truth annotations for the predictions
error_indices: array containing the indices of all misclassified
examples
num_to_check: int number of misclassified examples to find
Returns:
error_samples: Dictionary containing num_to_check dictionaries
Each sub-dictionary contains three keys and values
corresponding to the prediction of a misclassified
example, annotation of a misclassified example, and
tuple of the filename and row that that misclassified
example can be found
'''
error_samples = dict()
for ii in range(num_to_check):
rand_array_index = np.random.randint(len(error_indices))
rand_error_index = int(error_indices[rand_array_index])
grandindex = Y_truth.iloc[rand_error_index:rand_error_index+1].index[0]
label = Y_truth.iloc[rand_error_index]
prediction = Y_pred[rand_error_index]
filename,row = self.grandindex_to_filename_and_row(grandindex)
error_samples[ii] = dict()
error_samples[ii]['Prediction'] = prediction
error_samples[ii]['Label'] = label
error_samples[ii]['Location'] = (filename,row)
return error_samples
def add_noise_data(self,numcurves):
''' add_noise_data Method
Add random noise to the dataset, with annotation -1 (non-amplified)
Args:
numcurves: number of random noise curves to add to the dataset
'''
for ii in range(numcurves):
noise_data = np.random.randn(1,40)
noise_label = 0
self.all_X = np.append(self.all_X,noise_data,axis=0)
self.all_Y = np.append(self.all_Y,noise_label)
def compute_confmat_3classes(y_pred, y_true):
''' compute_confmat_3classes Method
Given the predictions and corresponding annotations, construct the
confusion matrix and output its contents
Args:
y_pred: array of predictions from the model
y_true: array of ground truth annotations for the predictions
Returns:
df_conf_mat: array corresponding to the confusion matrix table
fpr: false positive rate
fnr: false negative rate
tpr: true positive rate
tnr: true negative rate
inc_pos: Number of inc annotations called positive
inc_inc: Number of inc annotations called inc
inc_neg: Number of inc annotations called negative
'''
conf_mat = np.zeros((3, 3))
row_labels = ['Actual: -1','Actual: 0','Actual: 1']
col_labels = ['Predicted: -1','Predicted: 0','Predicted: 1']
y_true_copy = np.copy(y_true)
y_pred_copy = np.copy(y_pred)
for ii in range(K.flatten(y_true).shape[0].value):
conf_mat[int(y_true_copy[ii])][int(y_pred_copy[ii])] += 1
df_conf_mat = pandas.DataFrame(conf_mat,row_labels,col_labels)
fpr = df_conf_mat.values[0,2]/(df_conf_mat.values[0,0]+df_conf_mat.values[0,2])
fnr = df_conf_mat.values[2,0]/(df_conf_mat.values[2,0]+df_conf_mat.values[2,2])
tpr = df_conf_mat.values[2,2]/(df_conf_mat.values[2,0]+df_conf_mat.values[2,2])
tnr = df_conf_mat.values[0,0]/(df_conf_mat.values[0,0]+df_conf_mat.values[0,2])
inc_pos = df_conf_mat.values[1,2]
inc_inc = df_conf_mat.values[1,1]
inc_neg = df_conf_mat.values[1,0]
pos_inc = df_conf_mat.values[2,1]
neg_inc = df_conf_mat.values[0,1]
return df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc
def cv_trainingset(model_name,X_train,Y_train,X_test,Y_test,numsamples,cv_dict,aux_dict=dict(),class_weight_dict=dict()):
''' cv_trainingset Method
Performs Stratified K-Fold Cross Validation on the training set, where "numsamples"
defines the number of folds. Averages the performance metrics across all the folds
and returns the mean and standard deviation of those performance metrics in "all_dict."
"Stratified" K-Fold refers to preserving the class distribution of the entire training
set in each fold.
Args:
model_name: string name of the model to load
X_train: array containing the training set (curves)
Y_train: array containing the training annotations
X_test: array containing the test set (curves)
Y_test: array containing the test set annotations
numsamples: number of folds
cv_dict: Dictionary containing pre-determined indices to split the training
set into "numsamples" folds.
aux_dict: Dictionary containing auxiliary inputs for the training
and test set (if applicable). Defaults to empty dictionary
class_weight_dict: Dictionary containing class weights for fitting the model
(if applicable). Defaults to empty dictionary
Returns:
all_dict: dictionary containing the mean and standard deviation of
the performance metrics for the numsamples runs.
'''
all_acc_train = []
all_fpr_train = []
all_fnr_train = []
all_tpr_train = []
all_tnr_train = []
all_inc_pos_train = []
all_inc_inc_train = []
all_inc_neg_train = []
all_pos_inc_train = []
all_neg_inc_train = []
all_acc_val = []
all_fpr_val = []
all_fnr_val = []
all_tpr_val = []
all_tnr_val = []
all_inc_pos_val = []
all_inc_inc_val = []
all_inc_neg_val = []
all_pos_inc_val = []
all_neg_inc_val = []
all_acc_test = []
all_fpr_test = []
all_fnr_test = []
all_tpr_test = []
all_tnr_test = []
all_inc_pos_test = []
all_inc_inc_test = []
all_inc_neg_test = []
all_pos_inc_test = []
all_neg_inc_test = []
all_dict = dict()
# Execute numsamples times
for ii in range(numsamples):
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)
tf.set_random_seed(1234)
model = load_model(model_name)
sample_indices_train = cv_dict[ii]['train']
sample_indices_val = cv_dict[ii]['val']
X_train_sample = X_train[sample_indices_train]
Y_train_sample = Y_train[sample_indices_train]
X_val_sample = X_train[sample_indices_val]
Y_val_sample = Y_train[sample_indices_val]
Y_train_argmax = np.argmax(Y_train_sample,axis=1)
Y_val_argmax = np.argmax(Y_val_sample,axis=1)
# If aux_dict is not an empty dictionary, extract the auxiliary inputs
# corresponding to the training and test sets
if bool(aux_dict):
X_train_aux_input = aux_dict['X_train_aux_input']
X_train_aux_input_sample = X_train_aux_input[sample_indices_train]
X_val_aux_input_sample = X_train_aux_input[sample_indices_val]
X_test_aux_input = aux_dict['X_test_aux_input']
if bool(class_weight_dict):
model.fit([X_train_sample,X_train_aux_input_sample],Y_train_sample,epochs=10,batch_size=200,class_weight=class_weight_dict)
else:
model.fit([X_train_sample,X_train_aux_input_sample],Y_train_sample,epochs=10,batch_size=200)
Y_pred_train = np.squeeze(model.predict([X_train_sample,X_train_aux_input_sample]))
Y_pred_train = Y_pred_train.argmax(axis=-1)
Y_pred_val = np.squeeze(model.predict([X_val_sample,X_val_aux_input_sample]))
Y_pred_val = Y_pred_val.argmax(axis=-1)
Y_pred_test = np.squeeze(model.predict([X_test,X_test_aux_input]))
Y_pred_test = Y_pred_test.argmax(axis=-1)
else:
if bool(class_weight_dict):
model.fit(X_train_sample,Y_train_sample,epochs=10,batch_size=200,class_weight=class_weight_dict)
else:
model.fit(X_train_sample,Y_train_sample,epochs=10,batch_size=200)
Y_pred_train = np.squeeze(model.predict(X_train_sample))
Y_pred_train = Y_pred_train.argmax(axis=-1)
Y_pred_val = np.squeeze(model.predict(X_val_sample))
Y_pred_val = Y_pred_val.argmax(axis=-1)
Y_pred_test = np.squeeze(model.predict(X_test))
Y_pred_test = Y_pred_test.argmax(axis=-1)
# Calculate the number of errors by comparing the annotations and
# predictions elementwise and summing the number of times they differ
numerrors = (Y_train_argmax != Y_pred_train).sum()
_, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_train, Y_train_argmax)
# Calculate accuracy
acc = (len(Y_pred_train)-numerrors)/len(Y_pred_train)
all_acc_train.append(acc)
all_fpr_train.append(fpr)
all_fnr_train.append(fnr)
all_tpr_train.append(tpr)
all_tnr_train.append(tnr)
all_inc_pos_train.append(inc_pos)
all_inc_inc_train.append(inc_inc)
all_inc_neg_train.append(inc_neg)
all_pos_inc_train.append(pos_inc)
all_neg_inc_train.append(neg_inc)
# Repeat the procedure for the validation set
numerrors = (Y_val_argmax != Y_pred_val).sum()
_, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_val, Y_val_argmax)
acc = (len(Y_pred_val)-numerrors)/len(Y_pred_val)
all_acc_val.append(acc)
all_fpr_val.append(fpr)
all_fnr_val.append(fnr)
all_tpr_val.append(tpr)
all_tnr_val.append(tnr)
all_inc_pos_val.append(inc_pos)
all_inc_inc_val.append(inc_inc)
all_inc_neg_val.append(inc_neg)
all_pos_inc_val.append(pos_inc)
all_neg_inc_val.append(neg_inc)
# Repeat the procedure for the test set
numerrors = (Y_test != Y_pred_test).sum()
_, fpr, fnr, tpr, tnr, inc_pos, inc_inc, inc_neg, pos_inc, neg_inc = compute_confmat_3classes(Y_pred_test, Y_test)
acc = (len(Y_pred_test)-numerrors)/len(Y_pred_test)
all_acc_test.append(acc)
all_fpr_test.append(fpr)
all_fnr_test.append(fnr)
all_tpr_test.append(tpr)
all_tnr_test.append(tnr)
all_inc_pos_test.append(inc_pos)
all_inc_inc_test.append(inc_inc)
all_inc_neg_test.append(inc_neg)
all_pos_inc_test.append(pos_inc)
all_neg_inc_test.append(neg_inc)
# Performance metrics for the training set:
# Accuracy
all_dict['acc_train'] = [np.mean(all_acc_train),np.std(all_acc_train)]
# False Positive Rate
all_dict['fpr_train'] = [np.mean(all_fpr_train),np.std(all_fpr_train)]
# False Negative Rate
all_dict['fnr_train'] = [np.mean(all_fnr_train),np.std(all_fnr_train)]
# True Positive Rate
all_dict['tpr_train'] = [np.mean(all_tpr_train),np.std(all_tpr_train)]
# True Negative Rate
all_dict['tnr_train'] = [np.mean(all_tnr_train),np.std(all_tnr_train)]
# Number of inc annotations called positive
all_dict['inc_pos_train'] = [np.mean(all_inc_pos_train),np.std(all_inc_pos_train)]
# Number of inc annotations called inc
all_dict['inc_inc_train'] = [np.mean(all_inc_inc_train),np.std(all_inc_inc_train)]
# Number of inc annotations called negative
all_dict['inc_neg_train'] = [np.mean(all_inc_neg_train),np.std(all_inc_neg_train)]
# Number of positive annotations called inc
all_dict['pos_inc_train'] = [np.mean(all_pos_inc_train),np.std(all_pos_inc_train)]
# Number of negative annotations called inc
all_dict['neg_inc_train'] = [np.mean(all_neg_inc_train),np.std(all_pos_inc_train)]
# Performance metrics for the validation set:
all_dict['acc_val'] = [np.mean(all_acc_val),np.std(all_acc_val)]
all_dict['fpr_val'] = [np.mean(all_fpr_val),np.std(all_fpr_val)]
all_dict['fnr_val'] = [np.mean(all_fnr_val),np.std(all_fnr_val)]
all_dict['tpr_val'] = [np.mean(all_tpr_val),np.std(all_tpr_val)]
all_dict['tnr_val'] = [np.mean(all_tnr_val),np.std(all_tnr_val)]
all_dict['inc_pos_val'] = [np.mean(all_inc_pos_val),np.std(all_inc_pos_val)]
all_dict['inc_inc_val'] = [np.mean(all_inc_inc_val),np.std(all_inc_inc_val)]
all_dict['inc_neg_val'] = [np.mean(all_inc_neg_val),np.std(all_inc_neg_val)]
all_dict['pos_inc_val'] = [np.mean(all_pos_inc_val),np.std(all_pos_inc_val)]
all_dict['neg_inc_val'] = [np.mean(all_neg_inc_val),np.std(all_pos_inc_val)]
# Performance metrics for the test set:
all_dict['acc_test'] = [np.mean(all_acc_test),np.std(all_acc_test)]
all_dict['fpr_test'] = [np.mean(all_fpr_test),np.std(all_fpr_test)]
all_dict['fnr_test'] = [np.mean(all_fnr_test),np.std(all_fnr_test)]
all_dict['tpr_test'] = [np.mean(all_tpr_test),np.std(all_tpr_test)]
all_dict['tnr_test'] = [np.mean(all_tnr_test),np.std(all_tnr_test)]
all_dict['inc_pos_test'] = [np.mean(all_inc_pos_test),np.std(all_inc_pos_test)]
all_dict['inc_inc_test'] = [np.mean(all_inc_inc_test),np.std(all_inc_inc_test)]
all_dict['inc_neg_test'] = [np.mean(all_inc_neg_test),np.std(all_inc_neg_test)]
all_dict['pos_inc_test'] = [np.mean(all_pos_inc_test),np.std(all_pos_inc_test)]
all_dict['neg_inc_test'] = [np.mean(all_neg_inc_test),np.std(all_pos_inc_test)]
return all_dict
def create_training_and_test_sets(all_X,all_Y,test_size=0.2):
''' create_training_and_test_sets Method
Shuffle the entire dataset (all_X) and set of annotations (all_Y).
Then split into a training set and test set, the proportions of
which are determined by the input "test_size." In addition,
one-hot-encode the training set annotations for use by Keras
Args:
all_X: Pandas DataFrame of all the curves in the dataset
all_Y: Pandas DataFrame of all the annotations in the dataset
test_size: fraction of the entire dataset that should be set
aside for the test set
Returns:
X_train: Pandas DataFrame containing the training set (curves)
Y_train: array containing the one-hot-encoded training annotations
X_test: Pandas DataFrame containing the test set (curves)
Y_test: Pandas DataFrame containing the test set annotations
'''
X_train,X_test,Y_train,Y_test = train_test_split(all_X,all_Y,test_size)
# Adjust annotations to comply with Keras encoding, which cannot accept
# label of -1. Convert -1 to 0, 0 to 1, and 1 to 2.
Y_train += 1
Y_test += 1
# Convert annotations to Keras one-hot-encoding
Y_train = keras.utils.to_categorical(Y_train,num_classes=3)
return X_train,X_test,Y_train,Y_test
# Load in the dataset of Excel spreadsheets
filenames = glob.glob(folder_path+'/*.xls')
# Use the data_and_labels class to extract the curves and annotations
# from the Excel spreadsheets
all_XY = data_and_labels(filenames)
all_XY.allexcel_to_allXY()
# all_X holds the curves in the dataset and all_Y holds the annotations
# in the dataset
all_X = pandas.DataFrame(all_XY.all_X)
all_Y = pandas.Series(all_XY.all_Y)
# Load in the data for all auxiliary inputs
# RMS Noise
mat_contents = scipy.io.loadmat(folder_path+'/RMSNoise.mat')
RMSNoise = mat_contents['approxRMSNoise']
RMSNoise = np.swapaxes(RMSNoise,0,1)
# Curve Derivatives
mat_contents = scipy.io.loadmat(folder_path+'/Derivs_RMS.mat')
Derivs_RMS = mat_contents['zerothAndHigherOrderDerivs_RMS']
Derivs_RMS = np.swapaxes(Derivs_RMS,0,2)
# Number of Polynomial Curve Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numPolyCrossings.mat')
numPolyCrossings = mat_contents['numPolyCrossings']
numPolyCrossings = np.swapaxes(numPolyCrossings,0,1)
# Number of Diagonal Line Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numZeroCrossingsDiagonal.mat')
numZeroCrossingsDiagonal = mat_contents['numZeroCrossingsDiagonal']
numZeroCrossingsDiagonal = np.swapaxes(numZeroCrossingsDiagonal,0,1)
# Number of Horizontal Line Crossings
mat_contents = scipy.io.loadmat(folder_path+'/numZeroCrossingsHorizontal.mat')
numZeroCrossingsHorizontal = mat_contents['numZeroCrossingsHorizontal']
numZeroCrossingsHorizontal = np.swapaxes(numZeroCrossingsHorizontal,0,1)
# Location of Second Derivative Maxima
mat_contents = scipy.io.loadmat(folder_path+'/yMaxMinLocs.mat')
yMaxMinLocs = mat_contents['yMaxMinLocs']
yMaxMinLocs = np.swapaxes(yMaxMinLocs,0,1)
# Value of Second Derivative Maxima
mat_contents = scipy.io.loadmat(folder_path+'/yMaxMinVals.mat')
yMaxMinVals = mat_contents['yMaxMinVals']
yMaxMinVals = np.swapaxes(yMaxMinVals,0,1)
# RMS Noise Scaling Factor
mat_contents = scipy.io.loadmat(folder_path+'/RMSNoiseScalingFactor.mat')
RMSNoiseScalingFactor = mat_contents['approxRMSNoiseBasedCurvesScalingFactor']
RMSNoiseScalingFactor = np.swapaxes(RMSNoiseScalingFactor,0,1)
# numZeroCrossingsDiagonal contains two columns, the first for positive slope diagonal line
# crossings, and the second for negative slope diagonal line crossings.
# More strongly weight the number of negative slope diagonal line crossings (weighting of 2)
numZeroCrossingsDiagonal[:,1]*=2
# Combine the two diagonal line crossing numbers into one summary number
numZeroCrossingsDiagonal = np.sum(numZeroCrossingsDiagonal,axis=1)
numZeroCrossingsDiagonal = numZeroCrossingsDiagonal[:,np.newaxis]
# numZeroCrossingsHorizontal contains several columns, each for the number of crossings
# for horizontal lines with different slopes and intercepts. Combine all the numbers into
# one summary number.
numZeroCrossingsHorizontal = np.sum(numZeroCrossingsHorizontal,axis=1)
numZeroCrossingsHorizontal = numZeroCrossingsHorizontal[:,np.newaxis]
# Combine the number of horizontal and diagonal line crossings into one summary number
numZeroCrossings = np.concatenate((numZeroCrossingsDiagonal,numZeroCrossingsHorizontal),axis=1)
numZeroCrossings = np.sum(numZeroCrossings,axis=1)
numZeroCrossings = numZeroCrossings[:,np.newaxis]
# More strongly weight the number of polynomial curve crossings (weighting of 6)
numPolyCrossingsScaled = numPolyCrossings[:,0]*6
numPolyCrossingsScaled = numPolyCrossingsScaled[:,np.newaxis]
# Combine the number of line crossings (horizontal and diagonal) and the number of
# polynomial curve crossings into one summary number
numCrossings = numZeroCrossings + numPolyCrossingsScaled
# yMaxMinVals and yMaxMinLocs contain two columns, the first with information about
# second derivative maxima and the second with information about second derivative
# minima. Extract information about only the second derivative maxima
yMaxMinVals = yMaxMinVals[:,0]
yMaxMinVals = yMaxMinVals[:,np.newaxis]
yMaxMinLocs = yMaxMinLocs[:,0]
# The range of values for yMaxMinLocs is 1 to ~39. Subtract 1 from all values to comply
# with Python indexing which begins at 0.
yMaxMinLocs -= 1
yMaxMinLocs = yMaxMinLocs[:,np.newaxis]
# Instantiate an array to create a new auxiliary input for the value of the amp curve
# at the second derivative maxima
amp_curve_values_at_second_deriv_max = []
# Number of points per curve
numPts = 40
# Total number of curves
numCurves = Derivs_RMS.shape[0]
for ii in range(numCurves):
# Get the location (cycle number) of the second derivative maximum for the current
# curve of interest
loc = int(yMaxMinLocs[ii])
# Use all_X.values, which contains the normalized curve values, to obtain the value
# of the amp curve at the second derivative maximum
amp_curve_value_at_second_deriv_max = all_X.values[ii,loc]
amp_curve_values_at_second_deriv_max.append(amp_curve_value_at_second_deriv_max)
amp_curve_values_at_second_deriv_max = np.array(amp_curve_values_at_second_deriv_max)
amp_curve_values_at_second_deriv_max = amp_curve_values_at_second_deriv_max[:,np.newaxis]
# Create a new auxiliary input where the values of the second derivative maxima are
# divided by the RMSNoiseScalingFactor to obtain a scaled second derivative maximum value
yMaxMinValsScaled = yMaxMinVals/RMSNoiseScalingFactor
# Normalize the values of all auxiliary inputs between 0 and 1
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
yMaxMinLocs = min_max_scaler.fit_transform(yMaxMinLocs)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
numZeroCrossings = min_max_scaler.fit_transform(numZeroCrossings)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
numPolyCrossings = min_max_scaler.fit_transform(numPolyCrossings)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
RMSNoiseScalingFactor = min_max_scaler.fit_transform(RMSNoiseScalingFactor)
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
yMaxMinValsScaled = min_max_scaler.fit_transform(yMaxMinValsScaled)
# Load in the pre-defined training and test sets. Also load in the
# training set indices and test set indices, which define which elements
# of all_X and all_Y correspond to the pre-defined training and test sets.
X_train = pandas.DataFrame(np.loadtxt(folder_path+'/X_train.txt',delimiter=','))
Y_train = np.loadtxt(folder_path+'/Y_train.txt',delimiter=',')
X_test = pandas.DataFrame(np.loadtxt(folder_path+'/X_test.txt',delimiter=','))
Y_test = pandas.DataFrame(np.loadtxt(folder_path+'/Y_test.txt',delimiter=','))
X_train_indices = np.loadtxt(folder_path+'/X_train_indices.txt',delimiter=',')
X_train_indices = np.array([int(el) for el in X_train_indices])
X_test_indices = np.loadtxt(folder_path+'/X_test_indices.txt',delimiter=',')
X_test_indices = np.array([int(el) for el in X_test_indices])
# Use the training set indices to extract the correct auxiliary
# inputs for the corresponding curves in the training set.
X_train_RMSNoise = np.take(RMSNoise,X_train_indices,axis=0)
X_train_numZeroCrossings = np.take(numZeroCrossings,X_train_indices,axis=0)
X_train_numPolyCrossings = np.take(numPolyCrossings,X_train_indices,axis=0)
X_train_yMaxMinVals = np.take(yMaxMinVals,X_train_indices,axis=0)
X_train_yMaxMinLocs = np.take(yMaxMinLocs,X_train_indices,axis=0)
X_train_amp_curve_values_at_second_deriv_max = np.take(amp_curve_values_at_second_deriv_max,X_train_indices,axis=0)
X_train_RMSNoiseScalingFactor = np.take(RMSNoiseScalingFactor,X_train_indices,axis=0)
X_train_yMaxMinValsScaled = np.take(yMaxMinValsScaled,X_train_indices,axis=0)
X_train_numCrossings = np.take(numCrossings,X_train_indices,axis=0)
# Use the test set indices to extract the correct auxiliary
# inputs for the corresponding curves in the test set.
X_test_RMSNoise = np.take(RMSNoise,X_test_indices,axis=0)
X_test_numZeroCrossings = np.take(numZeroCrossings,X_test_indices,axis=0)
X_test_numPolyCrossings = np.take(numPolyCrossings,X_test_indices,axis=0)
X_test_yMaxMinVals = np.take(yMaxMinVals,X_test_indices,axis=0)
X_test_yMaxMinLocs = np.take(yMaxMinLocs,X_test_indices,axis=0)
X_test_amp_curve_values_at_second_deriv_max = np.take(amp_curve_values_at_second_deriv_max,X_test_indices,axis=0)
X_test_RMSNoiseScalingFactor = np.take(RMSNoiseScalingFactor,X_test_indices,axis=0)
X_test_yMaxMinValsScaled = np.take(yMaxMinValsScaled,X_test_indices,axis=0)
X_test_numCrossings = np.take(numCrossings,X_test_indices,axis=0)
# Create the training and test set auxiliary input arrays
X_train_aux_input_1 = X_train_RMSNoise
X_test_aux_input_1 = X_test_RMSNoise
X_train_aux_input_2 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings),axis=1)
X_test_aux_input_2 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings),axis=1)
X_train_aux_input_3 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings),axis=1)
X_test_aux_input_3 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings),axis=1)
X_train_aux_input_4 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals),axis=1)
X_test_aux_input_4 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals),axis=1)
X_train_aux_input_5 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs),axis=1)
X_test_aux_input_5 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs),axis=1)
X_train_aux_input_6 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max),axis=1)
X_test_aux_input_6 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max),axis=1)
X_train_aux_input_7 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max,X_train_RMSNoiseScalingFactor),axis=1)
X_test_aux_input_7 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max,X_test_RMSNoiseScalingFactor),axis=1)
X_train_aux_input_8 = np.concatenate((X_train_RMSNoise,X_train_numZeroCrossings,X_train_numPolyCrossings,X_train_yMaxMinVals,X_train_yMaxMinLocs,X_train_amp_curve_values_at_second_deriv_max,X_train_RMSNoiseScalingFactor,X_train_yMaxMinValsScaled),axis=1)
X_test_aux_input_8 = np.concatenate((X_test_RMSNoise,X_test_numZeroCrossings,X_test_numPolyCrossings,X_test_yMaxMinVals,X_test_yMaxMinLocs,X_test_amp_curve_values_at_second_deriv_max,X_test_RMSNoiseScalingFactor,X_test_yMaxMinValsScaled),axis=1)
# Change the one-hot-encoded training set annotations into single number class
# labels
Y_train_argmax = np.argmax(Y_train,axis=1)
# Use numsamples Stratified K-Fold Cross Validation
numsamples = 10
skf = StratifiedKFold(n_splits=numsamples)
# Create a dictionary holding the training and test set indices for all the
# folds in Stratified K-Fold Cross Validation
counter = 0
cv_dict = dict()
for train_index, test_index in skf.split(X_train, Y_train_argmax):
cv_dict[counter] = dict()
cv_dict[counter]['train'] = train_index
cv_dict[counter]['val'] = test_index
counter += 1
# Activation Functions to be used
softsign = 'softsign'
relu = 'relu'
tanh = 'tanh'
sigmoid = 'sigmoid'
linear = 'linear'
# Optimizer to be used
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
The next few cells will guide you through choosing the parameters of your neural network. As a first pass to get a feeling for how the model is working, you can run the cells below keeping the default values.
The architecture we will be implementing consists of one input layer (not shown), 3 hidden layers, and one output layer as depicted above.
Input Layer (Not shown): Default (Unchangeable) of 40. 40 neurons for the 40-dimensional input (40 normalized fluorescence values for every amplification curve in the dataset).
First Hidden Layer: Default of 30. Every neuron in this layer is "fully connected" to every neuron in the input layer. Learns the interrelationships and interdependencies between all the input values.
Second Hidden Layer: Default of 30. Every neuron in this layer is "fully connected" to every neuron in the first hidden layer. The second hidden layer learns more abstract, latent features produced by the first hidden layer.
Third Hidden Layer:: Default of 20. Every neuron in this layer is "fully connected" to every neuron in the second hidden layer. The third hidden layer learns even more abstract, latent features produced by the second hidden layer. Typically this layer will contain less neurons than the second and first hidden layers, as the model more finely squeezes information out of the inputs.
Output Layer: Default (Unchangeable) of 3. 3 neurons for the 3-dimensional output (3 probabilities for each of the three classes: -1 or Non-Amplified, 0 or Inconclusive, and 1 or Amplified.
Change (or leave as default) the values in the cell below to experiment with the number of neurons in each layer of the architecture.
# Number of neurons in hidden layer 1
layer1_depth = 30
# Number of neurons in hidden layer 2
layer2_depth = 30
# Number of neurons in hidden layer 3
layer3_depth = 20
As demonstrated by the diagram above, after each neuron in a layer outputs its raw value, this value is typically passed through an activation function. This activation function adds an additional nonlinearity, allowing the network to learn more nonlinear mappings. A few activation functions are displayed below:
Change (or leave as default) the values in the cell below to experiment with the activation functions in each layer of the architecture. The options you have are:
- tanh
- softsign
- sigmoid
- relu
- linear
# Activation function in hidden layer 1
layer1_activation = relu
# Activation function in hidden layer 2
layer2_activation = relu
# Activation function in hidden layer 3
layer3_activation = relu
The last parameters we will be changing before creating and fitting our model are the batch size and number of epochs:
Batch Size: Default of 200. The number of samples in your dataset to give to the neural network before it computes the loss, backpropagates the error, and updates its parameters.
Number of Epochs: Default of 10. The number of times the neural network will pass through the entire dataset.
Change (or leave as default) the values in the cell below to experiment with the batch size and number of epochs:
# Batch Size
batch_size = 200
# Number of epochs
epochs = 10
Keras is an excellent framework for rapidly prototyping deep learning models.
Here is an example of how to create a neural network in Keras:
# Uses the Sequential Model API in which every layer is linearly stacked on top of the last
model = Sequential()
# To add a layer, simply use the model.add() function
# Here we are adding a Dense (fully connected hidden layer) to the network with the number
# of neurons ("layer1_depth") and activation function ("layer1_activation") you specified
# earlier.
# Note that for the first layer in the Sequential Model, you must specify the dimensionality
# of your inputs. After this point, Keras will automatically calculate input and output shapes
# for you.
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
# Add the second hidden layer
model.add(Dense(layer2_depth,activation=layer2_activation))
# Add the third hidden layer
model.add(Dense(layer3_depth,activation=layer3_activation))
# The last layer will be the output layer and must contain 3 neurons for the 3 class
# probabilities we want to output. The "softmax" activation function is also necessary
# to ensure these probabilities sum to 1.
model.add(Dense(3, activation='softmax'))
# Compile the model, preparing it to be fit to the training data. Leave the parameters
# at the defaults for now.
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
In only six lines, you've created a neural network model. Run the following cell to create your neural network and fit it on the training set:
Note that there are additional BatchNormalization layers after each Dense layer. These BatchNormalization layers normalize the distribution of the output values after each Dense layer, improving network performance.
# Create Neural Network model
model = Sequential()
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
model.add(BatchNormalization())
model.add(Dense(layer2_depth,activation=layer2_activation))
model.add(BatchNormalization())
model.add(Dense(layer3_depth,activation=layer3_activation))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))
model.compile(optimizer=sgd,
loss='categorical_crossentropy',
metrics=['accuracy'])
# Fit the model on the training set for the batch size and number of epochs you specified earlier
model.fit(X_train,Y_train,epochs=epochs,batch_size=batch_size)
Now that you've successfully created and fit your model, run the following cell to see the false positive rate (FPR), false negative rate (FNR), true positive rate (TPR), and true negative rate (TNR) performance metrics on the training set. A confusion matrix is also displayed:
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set
Y_pred = model.predict(X_train)
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
Run the following cell to see the FPR, FNR, TPR, and TNR performance metrics, along with the confusion matrix for the test set.
# Obtain predictions from model on test set
Y_pred = model.predict(X_test)
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
Now, to improve performance, you can add auxiliary inputs to the model. These features were hand-engineered to give the network additional information it can use to make predictions. The auxiliary inputs are as follows:
RMS Noise: Approximate RMS Noise in the curve, calculated by subtracting the normalized curve from a smoothed version of the normalized curve and finding the sum of squares of the residual.
Number of Line Crossings: Total number of crossings between the curve and various horizontal and diagonal lines with different slopes and intercept.
Number of Polynomial Curve Crossings: Total number of crossings between the curve and a 4th order polynomial fit to the curve.
Second Derivative Max Value:: Value of the second derivative maximum of the curve.
Second Derivative Max Location: Cycle number at which the second derivative maximum occurs.
Amp Curve Value at Second Derivative Max: Value of the normalized amp curve at the second derivative max
RMS Noise Scaling Factor: Scaling factor used to scale the second derivative max value. The RMS Noise Scaling factor is larger for noisier curves.
Scaled Second Derivative Max Value: Second derivative max value divided by the RMS Noise Scaling Factor
Choose the number of auxiliary inputs you wish to add to the model. The options you have are:
- 1: RMS Noise
- 2: (Same as 1) + Number of Line Crossings
- 3: (Same as 2) + Number of Polynomial Curve Crossings
- 4: (Same as 3) + Second Derivative Max Value
- 5: (Same as 4) + Second Derivative Max Location
- 6: (Same as 5) + Amp Curve Value at Second Derivative Max
- 7: (Same as 6) + RMS Noise Scaling Factor
- 8: (Same as 7) + Scaled Second Derivative Max Value
Change the argument of "num_aux_input" below to the number of auxiliary inputs you wish to add:
# The number of auxiliary inputs to add. Choose from [1,2,3,4,5,6,7,8]:
num_aux_input = 6
As before, you can change (or leave as default) the values in the cell below, which specify the parameters of the neural network:
# Number of neurons per hidden layer
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20
# Activation Functions per hidden layer
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu
# Batch size and Number of epochs
batch_size = 200
epochs = 10
An additional modification that can improve performance is the incorporation of reguarlization into the loss function:
"C0" is the categorical cross entropy loss function used by the model we are working with, "n" is the number of samples in the training set, and "w" are the weights learned by the network. Regularization serves to penalize larger weights, effectively making the model simpler and less prone to overfitting. The amount of regularization is determined by lambda (regularization parameter) in the equation.
Change (or leave as default) the regularization parameter (lambda) in the cell below. Keep in mind that due to the way it is incorporated in the loss function, the regularization parameter (lambda) determines a tradeoff between learning small weights and minimizing the original loss function (categorical cross entropy).
- Increasing lambda increases the emphasis on learning small weights
- Decreasing lambda increases the emphasis on minimizing the original loss function.
It is suggested to start with small values of lambda, such as the default of 0.01, to incorporate the benefits of regularization (increasing generalization capability) as well as the benefits of the original loss function (learning the training set well).
# Regularization parameter (lambda)
regularization = 0.01
The last modification we will make to the loss function is the addition of class weighting.
"Li" is the loss function for sample i, "n" is the total number of samples in the training set, "ni" is the number of samples in class i, and "c" is the total number of classes.
wi: Weight applied to the loss for each sample in class i. In order to address the skewed distribution in the dataset, higher weights are given to more poorly-represented classes, informing the network that those classes contain the greatest source of information.
alpha: Weighting exponent that determines how aggressively to emphasize the more poorly-represented classes as sources of information.
Change (or leave as default) the alpha parameter in the cell below.
- Increasing alpha, for example, increases the weight given to the Non-Amplified (-1) class. This more heaviliy penalizes misclassifications from this class (i.e. when an example annotated as -1 is called 1), thereby decreasing the false positive rate.
# alpha parameter in the class weighting formula
alpha = 1
The Sequential API was perfect for the relatively simple models we implemented thus far in the tutorial because those models were all "sequential," in other words one layer's output became the next layer's input and so on. For example:
However, now that we wish to add auxiliary inputs to our model, the model can no longer be compiled sequentially and we must turn instead to Keras' Functional API. Keras' Functional API has an incredible range of flexibility in model development, allowing you now to create models that accept multiple inputs:
At more advanced stages you can even create models with multiple inputs and multiple outputs, with layers recombining at different stages:
With the added benefit of flexibiltiy, Keras' Functional API is only slightly more complex than the Sequential API. Let's dive in to the syntax.
Previously we created the following neural network via the Sequential API, repeated below for convenience. The line numbers will be labeled in order for future reference.
# Line 0
model = Sequential()
# Line 1
model.add(Dense(layer1_depth, activation=layer1_activation, input_dim=40))
# Line 2
model.add(Dense(layer2_depth,activation=layer2_activation))
# Line 3
model.add(Dense(layer3_depth,activation=layer3_activation))
# Line 4
model.add(Dense(3, activation='softmax'))
# Line 5
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
Let's jump straight into the Functional API translation of the same model. Note that line numbers here are not in order, but refer to the corresponding line numbers in the Sequential API code:
# Line 1a
inputs = Input(shape=(40,))
# Line 1b
x = Dense(layer1_depth, activation=layer1_activation)(inputs)
# Line 2
x = Dense(layer2_depth, activation=layer2_activation)(x)
# Line 3
x = Dense(layer3_depth, activation=layer3_activation)(x)
# Line 4
outputs = Dense(3, activation='softmax')(x)
# Line 0
model = Model(inputs=inputs, outputs=outputs)
# Line 5
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
Now for the details. The major difference between the Sequential and Functional API is how layers are added to the model. In both cases, a layer is created the same way - by using the syntax typically reserved for classes in Python (indicated in blue font). For example, to create a Dense layer you use:
Dense(layer_depth, activation=layer_activation)
Let's now proceed through a line-by-line comparison between the two APIs:
We will use Keras' Functional API to create our modified model, which now incorporates auxiliary inputs, regularization, and class weighting. Run the cell below to create and fit the model on the training set. If you wish to read through the details of the code below, make note of four important points:
# Create Neural Network Model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
if num_aux_input == 1:
X_train_aux_input = X_train_aux_input_1
X_test_aux_input = X_test_aux_input_1
elif num_aux_input == 2:
X_train_aux_input = X_train_aux_input_2
X_test_aux_input = X_test_aux_input_2
elif num_aux_input == 3:
X_train_aux_input = X_train_aux_input_3
X_test_aux_input = X_test_aux_input_3
elif num_aux_input == 4:
X_train_aux_input = X_train_aux_input_4
X_test_aux_input = X_test_aux_input_4
elif num_aux_input == 5:
X_train_aux_input = X_train_aux_input_5
X_test_aux_input = X_test_aux_input_5
elif num_aux_input == 6:
X_train_aux_input = X_train_aux_input_6
X_test_aux_input = X_test_aux_input_6
elif num_aux_input == 7:
X_train_aux_input = X_train_aux_input_7
X_test_aux_input = X_test_aux_input_7
elif num_aux_input == 8:
X_train_aux_input = X_train_aux_input_8
X_test_aux_input = X_test_aux_input_8
# Determine class weight dictionary based on user-specified "alpha"
# "fracts" is the distribution of classes in our dataset, indicating that ~16% of the dataset is non-amplified,
# ~1% is inconclusive, and ~83% is amplified.
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}
# Fit the model
model.fit([X_train,X_train_aux_input],Y_train,epochs=epochs,batch_size=batch_size,class_weight=class_weight_dict)
Now that you've successfully created and fit your model, run the following cell to see the false positive rate (FPR), false negative rate (FNR), true positive rate (TPR), and true negative rate (TNR) performance metrics on the training set. A confusion matrix is also displayed:
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set
Y_pred = model.predict([X_train,X_train_aux_input])
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
Run the following cell to see the FPR, FNR, TPR, and TNR performance metrics, along with the confusion matrix for the test set.
# Obtain predictions from model on test set
Y_pred = model.predict([X_test,X_test_aux_input])
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
Now that you have a feel for how the neural networks work, all of the default parameters are listed in the cell below. Adjust them, experimenting with different combinations of parameter values.
# Number of auxiliary inputs
num_aux_input = 6
# Number of neurons per hidden layer
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20
# Activation Functions per hidden layer
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu
# Batch size and Number of epochs
batch_size = 200
epochs = 10
# Regularization parameter (lambda)
regularization = 0.01
# alpha parameter in the class weighting formula
alpha = 1
Then run the next three cells to create and fit the model, obtain performance metrics on the training set, and obtain performance metrics on the test set. Go back and repeat the steps for this section (5.0) as many times as you wish before proceeding to the final section (6.0).
# Create Neural Network Model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
if num_aux_input == 1:
X_train_aux_input = X_train_aux_input_1
X_test_aux_input = X_test_aux_input_1
elif num_aux_input == 2:
X_train_aux_input = X_train_aux_input_2
X_test_aux_input = X_test_aux_input_2
elif num_aux_input == 3:
X_train_aux_input = X_train_aux_input_3
X_test_aux_input = X_test_aux_input_3
elif num_aux_input == 4:
X_train_aux_input = X_train_aux_input_4
X_test_aux_input = X_test_aux_input_4
elif num_aux_input == 5:
X_train_aux_input = X_train_aux_input_5
X_test_aux_input = X_test_aux_input_5
elif num_aux_input == 6:
X_train_aux_input = X_train_aux_input_6
X_test_aux_input = X_test_aux_input_6
elif num_aux_input == 7:
X_train_aux_input = X_train_aux_input_7
X_test_aux_input = X_test_aux_input_7
elif num_aux_input == 8:
X_train_aux_input = X_train_aux_input_8
X_test_aux_input = X_test_aux_input_8
# Determine class weight dictionary based on user-specified "alpha"
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}
# Fit the model
model.fit([X_train,X_train_aux_input],Y_train,epochs=epochs,batch_size=batch_size,class_weight=class_weight_dict)
# Convert array of annotations (Y_train) from Keras one-hot encoding to int labels (0,1,2)
Y_train_argmax = np.argmax(Y_train,axis=1)
# Obtain predictions from model on training set
Y_pred = model.predict([X_train,X_train_aux_input])
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_train_argmax)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
# Obtain predictions from model on test set
Y_pred = model.predict([X_test,X_test_aux_input])
# Convert probabilities into class labels
Y_pred = Y_pred.argmax(axis=-1)
# Compute performance metrics
df_conf_mat,fpr,fnr,tpr,tnr,inc_pos,inc_inc,inc_neg,pos_inc,neg_inc = compute_confmat_3classes(Y_pred, Y_test)
print('FPR: '+str(fpr*100))
print('FNR: '+str(fnr*100))
print('TPR: '+str(tpr*100))
print('TNR: '+str(tnr*100))
df_conf_mat
Now that you've created, fit, and determined the performance metrics for one iteration of your model, you can create several iterations to obtain performance statistics (mean, standard deviation). The cell below will perform the following steps:
1. Instantiate Model: Your model architecture and initialized weights will be saved to an H5 file. This file represents the base version of the model prior to any fitting on the dataset and will be called each time we create a new iteration of the model.
2. Stratified K-Fold Partitioning: The training set will be stratified into K folds, such that K-1 folds are used to train the model and the last, unused fold is used for validation. "Stratified" refers to the class distribution in the entire training set being preserved in each fold.
3. Cross Validation and Testing: The stratified K-Fold partitioning is repeated 10 times and the mean and standard deviation of the performance metrics over all 10 runs computed. Performance metrics are computed, for each run, on the training set and the validation set. Note that the training and validation set change during each run, so that the model eventually encounters every example in the training set. The test set, which the model never encountered during any run of training, is then predicted on to obtain performance metrics.
Run the cell below to run the stratified K-Fold cross validation procedure. For simplicity, the code below will display performance metrics (FPR, FNR, TPR, TNR) on only the training and test set. Note that this cell will take a few minutes to run:
# Determine which auxiliary inputs to use based on the user-specified "num_aux_inputs"
aux_dict = dict()
if num_aux_input == 1:
aux_dict['X_train_aux_input'] = X_train_aux_input_1
aux_dict['X_test_aux_input'] = X_test_aux_input_1
elif num_aux_input == 2:
aux_dict['X_train_aux_input'] = X_train_aux_input_2
aux_dict['X_test_aux_input'] = X_test_aux_input_2
elif num_aux_input == 3:
aux_dict['X_train_aux_input'] = X_train_aux_input_3
aux_dict['X_test_aux_input'] = X_test_aux_input_3
elif num_aux_input == 4:
aux_dict['X_train_aux_input'] = X_train_aux_input_4
aux_dict['X_test_aux_input'] = X_test_aux_input_4
elif num_aux_input == 5:
aux_dict['X_train_aux_input'] = X_train_aux_input_5
aux_dict['X_test_aux_input'] = X_test_aux_input_5
elif num_aux_input == 6:
aux_dict['X_train_aux_input'] = X_train_aux_input_6
aux_dict['X_test_aux_input'] = X_test_aux_input_6
elif num_aux_input == 7:
aux_dict['X_train_aux_input'] = X_train_aux_input_7
aux_dict['X_test_aux_input'] = X_test_aux_input_7
elif num_aux_input == 8:
aux_dict['X_train_aux_input'] = X_train_aux_input_8
aux_dict['X_test_aux_input'] = X_test_aux_input_8
# Determine class weight dictionary based on user-specified "alpha"
class_weight_dict = dict()
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}
# Compute performance metric statistics for your Neural Network
model_name = 'amp_model_demo.h5'
model.save(model_name)
all_dict = cv_trainingset(model_name,X_train.values,Y_train,X_test.values,Y_test.values,numsamples,cv_dict,aux_dict,class_weight_dict)
print('Training Set Performance Metrics')
print("FPR: %0.2f (+/- %0.2f)" % (all_dict['fpr_train'][0]*100, all_dict['fpr_train'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.2f)" % (all_dict['fnr_train'][0]*100, all_dict['fnr_train'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.2f)" % (all_dict['tpr_train'][0]*100, all_dict['tpr_train'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.2f)" % (all_dict['tnr_train'][0]*100, all_dict['tnr_train'][1]*100 * 2))
print('Test Set Performance Metrics')
print("FPR: %0.2f (+/- %0.2f)" % (all_dict['fpr_test'][0]*100, all_dict['fpr_test'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.2f)" % (all_dict['fnr_test'][0]*100, all_dict['fnr_test'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.2f)" % (all_dict['tpr_test'][0]*100, all_dict['tpr_test'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.2f)" % (all_dict['tnr_test'][0]*100, all_dict['tnr_test'][1]*100 * 2))
Try running the cell below and compare the results you get with the table in the following cell:
# Set parameters
num_aux_input = 8
regularization = 0.01
alpha = 1.2
layer1_depth = 30
layer2_depth = 30
layer3_depth = 20
layer1_activation = relu
layer2_activation = relu
layer3_activation = relu
batch_size = 200
epochs = 10
# Set random seeds
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(42)
rn.seed(12345)
tf.set_random_seed(1234)
# Create model
X_input = Input(shape=(40,),name='Input')
X_dense1 = Dense(layer1_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_1')(X_input)
X_bn1 = BatchNormalization(name='BatchNorm_1')(X_dense1)
X_dense2 = Dense(layer2_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_2')(X_bn1)
X_bn2 = BatchNormalization(name='BatchNorm_2')(X_dense2)
aux_input = Input(shape=(num_aux_input,), name='aux_input')
X_concat = keras.layers.concatenate([X_bn2,aux_input],name='concat')
X_bn3 = BatchNormalization(name='BatchNorm_3')(X_concat)
X_dense3 = Dense(layer3_depth, activation=relu, kernel_regularizer=regularizers.l2(regularization), name='Dense_3')(X_bn3)
X_bn4 = BatchNormalization(name='BatchNorm_4')(X_dense3)
X_output = Dense(3, activation='softmax', name='Output')(X_bn4)
model = Model(inputs=[X_input,aux_input],outputs=X_output)
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
# Create auxiliary input dictionary
aux_dict = dict()
aux_dict['X_train_aux_input'] = X_train_aux_input_8
aux_dict['X_test_aux_input'] = X_test_aux_input_8
# Create class weight dictionary
class_weight_dict = dict()
fracts = np.array([0.16,0.01,0.83])
invFracts = np.power((1/fracts),alpha)
weights = invFracts/(np.sum(invFracts))
class_weight_dict = {0:weights[0],1:weights[1],2:weights[2]}
# Save model to H5 file
model_name = 'amp_model_demo.h5'
model.save(model_name)
# Run Stratified K-Fold Cross Validation
all_dict = cv_trainingset(model_name,X_train.values,Y_train,X_test.values,Y_test.values,numsamples,cv_dict,aux_dict,class_weight_dict)
print('Training Set Performance Metrics')
print("FPR: %0.2f (+/- %0.3f)" % (all_dict['fpr_train'][0]*100, all_dict['fpr_train'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.3f)" % (all_dict['fnr_train'][0]*100, all_dict['fnr_train'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.3f)" % (all_dict['tpr_train'][0]*100, all_dict['tpr_train'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.3f)" % (all_dict['tnr_train'][0]*100, all_dict['tnr_train'][1]*100 * 2))
print('Test Set Performance Metrics')
print("FPR: %0.2f (+/- %0.3f)" % (all_dict['fpr_test'][0]*100, all_dict['fpr_test'][1]*100 * 2))
print("FNR: %0.2f (+/- %0.3f)" % (all_dict['fnr_test'][0]*100, all_dict['fnr_test'][1]*100 * 2))
print("TPR: %0.2f (+/- %0.3f)" % (all_dict['tpr_test'][0]*100, all_dict['tpr_test'][1]*100 * 2))
print("TNR: %0.2f (+/- %0.3f)" % (all_dict['tnr_test'][0]*100, all_dict['tnr_test'][1]*100 * 2))
Expected output: Your expected output should be close to the following:
**Training Set FPR =** | 0.190 |
**Training Set FNR =** | 0.048 |
**Test Set FPR =** | 0.021 |
**Test Set FNR =** | 0.085 |
plot_model(model, show_shapes=True, show_layer_names=True, to_file='model.png')
Upon locating the "model.png" file and opening it, ithe image should look similar to the following (the details depend on what model you designed prior to this section):
Tensorboard is an excellent visualization tool offered by Tensorflow, described by one of its developers as a "flashlight to shine on the black box of neural networks." Tensorboard will allow you to not only see a graph of your model architecture, as we created in the previous section using Keras, but also graphs of accuracy over time, loss over time, histograms of weight values over time, and much more.
To begin, we will be creating a new folder called "tb_logs" in the directory this notebook is saved on your system. As your neural network trains, information will be saved every epoch in the form of logs, which Tensorboard will use to create its visualizations.
Run the following cell to create the "tb_logs" folder:
# Specify the absolute path of the directory you will be saving the log files
LOG_DIR = folder_path+'/tb_logs'
os.mkdir(LOG_DIR)
As before, change (or leave as default) the parameters in the cell below, which will determine your neural network design. We will be working with a simpler 4-layer network in this section with no auxiliary inputs:
# Run K.clear_session() to clear old models and layers from memory to speed up execution in this section
# K.clear_session()
# Set parameters
layer1_depth = 40
layer1_activation = 'relu'
layer2_depth = 20
layer2_activation = 'relu'
layer3_depth = 20
layer3_activation = 'relu'
layer4_depth = 16
layer4_activation = 'relu'
epochs = 30
batch_size = 200
# Create Neural Network model
model = Sequential()
model.add(Dense(layer1_depth, activation=layer1_activation, name='dense_1', input_dim=40))
model.add(BatchNormalization())
model.add(Dense(layer2_depth, activation=layer2_activation, name='dense_2'))
model.add(BatchNormalization())
model.add(Dense(layer3_depth, activation=layer3_activation, name='dense_3'))
model.add(BatchNormalization())
model.add(Dense(layer4_depth, activation=layer4_activation, name='dense_4'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax', name='dense_5'))
sgd = SGD(lr=0.1,decay=0.005,nesterov=False)
model.compile(optimizer=sgd,
loss='categorical_crossentropy',
metrics=['accuracy'])
As you will see going through this section, there are a lot of moving parts required to get Tensorboard up and running. In order to get you started, the following cell encapsulates a lot of those moving parts and we will not be going through all the details. Feel free, however, to read through the comments to see what is going on under the hood.
For the purposes of this tutorial, there are a few points you should know about how Keras is using Tensorboard:
Change (or leave as default) the value of "calcluations_freq" below:
# The frequency, in units of epochs, with which Tensorboard should perform specialized calculations
calculations_freq = 10
Run the following cell to initialize Tensorboard. The details of the code are not important for this tutorial:
# Create a one-hot encoding of Y_test
Y_test_onehot = to_categorical(Y_test)
# Specify which layers Tensorboard should derive embeddings from
embedding_layer_names = set(layer.name for layer in model.layers if layer.name.startswith('dense_'))
# Specify what data Tensorboard should use to calculate the embeddings
embeddings_data = X_test.values
# Create metadata file containing annotations associated with each example that will be embedded
metadata_file = os.path.join(LOG_DIR, 'metadata.tsv')
with open(metadata_file, 'w') as f:
for ii in range(len(Y_test.values)):
f.write('{}\n'.format(int(Y_test.values[ii])-1))
# Define the Tensorboard callback.
tensorboard = TensorBoard(log_dir=LOG_DIR, histogram_freq=calculations_freq, batch_size=32,
write_graph=True, write_grads=False, write_images=True,
embeddings_freq=calculations_freq, embeddings_metadata=metadata_file,
embeddings_layer_names=embedding_layer_names,
embeddings_data=embeddings_data)
# Fit the model
model.fit(X_train,Y_train,callbacks=[tensorboard],epochs=epochs,batch_size=batch_size,validation_data=(X_test,Y_test_onehot))
While the cell above is running, open Terminal (for Mac) or Command Prompt (for Windows). For Mac Users in Terminal, you will see something like the following:
Note the absolute path on your system where the "tb_logs" folder is stored. Enter the following command in Terminal/Command Prompt, replacing "[path]" with your absolute path:
"tensorboard --logdir [path]"
In my case, "[path]" was "/Users/christopher.chin/tb_logs." After entering the command in Terminal/Command Prompt your window should be similar to the following:
Press "Enter" and the command will run, starting up Tensorboard.
In order to access Tensorboard, you will have to open a new window in the Web Browser of your choice and enter the following in the address bar. Note that "6006" is the port on your localhost assigned to Tensorboard:
"localhost:6006"
After entering the line above in your browser's address bar, your screen should be similar to the following:
Press "Enter" and you will be directed to the server running Tensorboard. You should see a screen in your browser similar to the following. (If your start-up screen is initially very different, the "Scalars" dashboard in the top bar of the page may not yet be available. In that case, wait a little for Tensorboard to perform its computations. Within a few minutes, the "Scalars" dashboard option should appear and clicking on it will produce the start-up screen below):
On the top bar of the screen are the different dashboards you have access to:
Scalars
Images
Graphs
Distributions
Histograms
Projector
We will be going through each of these dashboards step-by-step in the next few sections.
We will be focusing in this section on the "Scalars" dashboard. Examine the screenshot below:
The Scalars dashboard gives you visualizations of all the scalar information Tensorboard has collected over the epochs of training, specifically the accuracy and loss. As noted previously, every epoch the model will evaluate the accuracy and loss on the training set (indicated by the top two white tabs "acc" and "loss"), but also the accuracy and loss on the test set (indicated by the bottom two white tabs "val_acc" and "val_loss"). You can click on the white tabs themselves to show or hide the corresponding graphs. Note that "val_loss" does not appear in the screenshot above, but is beneath the white "val_acc" tab if you continue to scroll down on the page.
All of the graphs can be manipulated in the same way, so we will examine the training accuracy "acc" plot as a representative model. As indicated in the screenshot, click the left-most button to enlarge the image. You should see something similar to the following:
As indicated in the screenshot, there is a section on the left-hand-side pane called "Smoothing" that contains a sliding bar and box that accepts numerical inputs. By default, the graph of accuracy is smoothed, with degree of smoothing 0.6, to more easily view the trend in the graph. You can reduce the degree of smoothing to 0, or increase the smoothing to 1 by sliding the bar or entering the desired value into the box. By reducing the smoothing to 0, you can see a graph similar to the following:
Note also that moving your cursor along the graph provides you with a black box containing information about the specific training accuracy values calculated every epoch.
The Images dashboard allows you to visualize the weights that the neural network is learning over time. Clicking on the "Images" dashboard at the top of the Tensorboard page should open a windows similar to the following:
As indicated in the screenshot, click the white tab labeled "batch_normalization_1" to close it. Close all other open tabs until the screen looks like the following:
As indicated in the screenshot, click the white tab labeled "dense_1" to open it. Your screen should now look similar to the following:
We are now looking at a visualization of the bias vector and weight matrix for the first hidden layer (called "dense_1") of the neural network. Darker colors correspond to values of larger magnitude while lighter colors correspond to weights of smaller magnitude. In order to interpret these images, recall that neural network computations follow the series of steps described below. This entire set of computations occurs for every neuron in every layer:
Let's fix one neuron in a given layer as an example. The first part of the calculations is the "pre-activation," which is computed by taking the sum of the products of the outputs of all neurons in the previous layer with their corresponding weighted connections to the fixed neuron under consideration. A bias vector is then added to the result. In the second part of the calculations, an "activation function" is applied to the "pre-activation" output to yield the final output. Therefore, since the first hidden layer "dense_1" in the network contains 40 neurons and the previous layer - the input layer - is 40-dimensional, there must be 40 weighted connections per neuron in "dense_1." Since the "pre-activation" computation can be summarized as a matrix-vector multiplication (between a weight matrix and the output vector of the previous layer), the weight matrix associated with "dense_1" must be of shape (40x40). (Multiplying the 40x40 weight matrix by the 40x1 vector of inputs yields a 40x1 vector representing the "pre-activation" output). The bias vector must then be of shape (40x1), as it is added to the result of the matrix-vector multiplication. The weight matrix is reproduced and enlarged below:
The weight matrix can be interpreted as follows:
As indicated in the screenshot, you can also move the "Epochs slider" from right to left to see the weight matrix values at earlier epochs. The same operations can also be done on the bias vector visualization.
In Bonus Section 1.0, we plotted your model architecture using Keras as a high-level flow-diagram showing the layer names, layer sizes, and connections between layers. The Graphs Tab of Tensorboard functions as a more in-depth, low-level visualization of your model. After selecting the "Graphs" dashboard, your screen should be similar to the following:
There are a lot of details on this page that I encourage you to expore on your own. For the purposes of this tutorial, we will focus on the flow-diagram on the left titled "Main Graph," which shows all the connections between layers and serves as a similar graphical rendition to the Keras visualization we plotted earlier. Here, however, you can interact with the graph and double-click on a layer to show the inner computations. As indicated in the screenshot, double-click on "dense_4" to show more details about the layer:
Upon opening up the "dense_4" layer, you can now see the computations with the weight matrix, bias vector, the regularization applied, etc. Opening up all the layers in this fashion would allow you to follow the backpropagation of the gradient through all the layers of the network (used to update all weight parameters as the model trains).
The Distributions dashboard is a variation upon the Images dashboard, in the sense that it allows you to see the evolution of the weight matrices and bias vectors in your model over time. This time, however, the visualization takes the form of distribution, as the heading suggests. Upon clicking on the "Distributions" dashboard, your screen should be similar to the following:
As indicated in the screenshot, click on the white tab titled "batch_normalization_1" to close it. Close any other open white tabs in a similar manner until your screen resembles the following:
As indicated in the screenshot, click on the "dense_1" tab to open it. You should now see graphs similar to the ones below:
Here is how to interpret the graphs:
Horizontal Axis: Epochs
Vertical Axis: Values of the weights/bias
Horizontal Lines: The horizontal lines cutting through the graphs represent percentiles of the distribution over the weights/biases. From top to bottom, the lines represent the maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, and minimum percentiles in the distribution. Following an individual line across the graph shows how that particular percentile has over the epochs of training. For instance, follow the highest line in the graphs to see how the maximum value changes over time.
The Histograms dashboard contains rotated views of the graphs from the Distributions dashboard, so we will not spend too much time in this section. However, the different perspective can provide different insights into how the weights/biases are changing over time. Upon clicking on the Histograms dashboard, closing all open white tabs, and opening the white tab titled "dense_1," your screen should be similar to the following:
Once again, these graphs are the same as those we saw in the previous section on the Distributions tab, only rotated so that the following interpretation holds:
Horizontal Axis: Values of the weights/bias
Vertical Axis: Epochs
Horizontal Slices: The horizontal slices that appear on individual horizontal axes in the graphs represent histograms of the weight/bias values at a particular epoch. Histograms for earlier, older epochs appear in the "back," while histograms for later, more recent epochs appear in the front. Looking at an individual temporal slice histogram in this way allows you to see the distribution of the weights/bias values at a given moment in time.
One of the most exciting features of Tensorboard is the visualization of embeddings. Typically, embeddings refer to the conversion of qualitiative objects like words into numerical vectors, which can then be compared on the basis of similarity metrics like distance. More generally, however, embeddings can be thought of as function mappings from input to output. Each layer of a neural network can therefore be thought of as creating an embedding of the input feature vector, transforming it into a new-dimensional space. For example, given the 40-dimensional input feature vector we have been considering thus far (corresponding to the 40 fluorescence values for each cycle), if the first hidden layer has 40 neurons, it is embedding or transforming the input feature vector into a new 40-dimensional space. The second hidden layer, which has 20 neurons, is then transforming its input into a new 20-dimensional space, and so on. The Projector dashboard allows us to visualize these high-dimensional embeddings by translating them into 2 or 3 dimensions via algorithms like PCA (taking the first two or three principal components), or t-SNE.
Upon clicking on the Projector Dashboard, your screen should be similar to the following. (If you instead receive an error message saying "Error fetching tensors" the projector dashboard is not yet ready for viewing, as Tensorboard may still be performing its calculations. Wait a few minutes and re-visit the dashboard. Typically, the Projector dashboard is fully accessible once the neural network has finished training completely):
Each data point in this space represents an example in our training set. The visualization right now does not have much meaning, however, because we are unable to identify which data point belongs to which class. We will be distignuishing the different classes by color. First, as indicated in the screenshot, click on the dropdown menu titled "Color by":
Now select the option "label":
All the data points are now colored according to the class they correspond to. (Important: Note that the colors on your screen may differ from those depicted in the screenshots. Hover your cursor over individual, colored data points to see which color corresponds to which class). In the screenshots, red corresponds to "Amplified (Label: 1)", orange corresponds to "Inconclusive (Label: 0)", and blue corresponds to "Non-Amplified (Label: -1)." Representative points belonging to each of these classes are labeled in the screenshot. Feel free to explore the visualization by clicking individual data points, clicking and dragging to rotate the plot, etc. You can see that the first hidden layer of the neural network has found an embedding for the different classes, in which each class belongs to a to different region of a high-dimensional manifold. Now click on the dropdown menu on the top of the left-hand-side pane titled "5 tensors found." Select "dense_5_embedding" and your screen should be similar to the following:
This visualization represents the embedding of the last layer of the neural network. Once again, the neural network has found an embedding where each class belongs to a different region in the high-dimensional space. Data points that are "close" to each other in the space ("closeness" being defined by a distance metric) are more similar to each other and more likely to belong to the same class.
That concludes our tutorial. Hopefully, this demo provided a good starting point for you to begin working with your own neural network designs. As a final note, the Keras documentation is an excellent resource for future inquiries and additional exploration:
In addition, note that if you wish to use Tensorboard after finishing this demo, you must delete all the contents of your tb_logs folder so that Tensorboard does not overlap your new graphs with graphs of previous models.