API Reference¶
AlgorithmMeta¶
-
class
mnist_classifier.algorithm_meta.
AlgorithmMeta
(report_directory: str = None, test_suite_iter: int = None)¶ The parent class for all algorithms, containing the basic algorithm methods. Most of the algorithm logic lives here: apart from configuring the model to match the given specs (such as the number of trees or hidden layers), the train/test mechanics are the same for every algorithm.
-
static
calc_standard_error
(error: float, sample_count: int)¶ Calculates the Wilson score interval with 95% confidence, based on the paper by Edwin B. Wilson [1].
[1] Edwin B. Wilson (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association.
The result is read as the +/- margin on the algorithm’s error rate. For example, an error rate of 0.02 measured on 50 samples at 95% confidence yields 0.0388, so the error rate is reported as 0.02 +/- 0.0388.
- Parameters
error (float) – the error rate of the test results
sample_count (int) – the number of test samples used
- Returns
the standard error
- Return type
float
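The worked example above can be checked by hand. The value 0.0388 matches the normal-approximation half-width z * sqrt(e * (1 - e) / n) with z = 1.96. The sketch below assumes that formula purely to reproduce the documented number; the function name `standard_error_sketch` is hypothetical and this is not the library's own code.

```python
import math


def standard_error_sketch(error: float, sample_count: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval (z = 1.96 for 95% confidence)."""
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    return z * math.sqrt(error * (1.0 - error) / sample_count)


# Reproduce the documented example: error rate 0.02 on 50 samples.
margin = standard_error_sketch(0.02, 50)
print(f"0.02 +/- {margin:.4f}")  # prints "0.02 +/- 0.0388"
```

Because the margin shrinks with the square root of the sample count, four times as many test samples are needed to halve it.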
-
display_results
(cache)¶ Displays various graphs that are pertinent to the algorithm’s score (such as a confusion matrix)
- Parameters
cache (dict) – the arguments to display (varies from algorithm to algorithm)
-
eval_train_test_cache
(train_data, train_labels, test_data, test_labels)¶ Generates a cache object containing the train and test data, labels, and accuracies, and a copy of the model.
- Parameters
train_data (numpy.array) – the raw training data
train_labels (numpy.array) – the ground truth of the training data
test_data (numpy.array) – the data to test against
test_labels (numpy.array) – the ground truth of the test data
- Returns
the actual data, predicted data, accuracy, and model in a dict format
- Return type
dict
-
fit
(data, targets)¶ Fits the internal model on the given data, and returns it
- Parameters
data (numpy.array) – the data on which you want to fit
targets (numpy.array) – the target classes of the training data you want to fit
- Returns
the trained model
- Return type
sklearn.base.BaseEstimator
-
load_model
(filepath)¶ Loads the model from disk into the object’s model attribute.
- Parameters
filepath (str) – the path of the model on disk
-
predict
(data_to_predict)¶ Returns the predicted class label for each input sample.
- Parameters
data_to_predict (numpy.array) – Sample data set on which to generate predictions
- Returns
array with the predicted class labels
- Return type
numpy.array
-
print_results
(cache)¶ Prints the results of the classification, and returns them as a pandas DataFrame
- Parameters
cache (dict) – the cache of a run_classification() function call.
- Returns
the classification results as a single-line data frame
- Return type
pandas.DataFrame
-
run_classification
(train_data, train_labels, test_data, test_labels, model_to_save=None, model_to_load=None)¶ Trains and tests the classification
- Parameters
train_data (numpy.array) – the data to train on
train_labels (numpy.array) – the labels of the train data
test_data (numpy.array) – the data to use to run predictions
test_labels (numpy.array) – the ground truth of the test data
model_to_load (str) – filepath of a saved model to load instead of training a new one
model_to_save (str) – filepath on which to save the trained model
- Returns
a collection with the predictions and accuracy
- Return type
dict
-
save_model
(filepath)¶ Saves the trained model attribute to disk
- Parameters
filepath (str) – the destination filepath on disk.
-
save_results
(results: pandas.core.frame.DataFrame)¶ Saves the results to disk as a CSV file if report_directory is not None. If the output report file already exists, the new lines are appended to it.
- Parameters
results (pandas.DataFrame) – the results table to save to disk.
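The append-if-exists behaviour described above can be illustrated with the standard library's csv module. This is a sketch under assumptions, not the library's implementation. The real method takes a pandas DataFrame, whereas this version takes a list of dicts. The helper `append_results_sketch`, the file name `results.csv`, and the column names and values are all hypothetical.

```python
import csv
import os
import tempfile


def append_results_sketch(rows, report_directory, filename="results.csv"):
    """Append result rows to a CSV file, writing the header only on first creation."""
    if report_directory is None:
        return None  # Nothing is saved when no report directory is configured.
    path = os.path.join(report_directory, filename)
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Two separate calls: the second one appends to the file created by the first.
    append_results_sketch([{"algorithm": "rf", "test_accuracy": 0.95}], tmp_dir)
    path = append_results_sketch([{"algorithm": "mlp", "test_accuracy": 0.97}], tmp_dir)
    with open(path, newline="") as handle:
        saved_rows = list(csv.DictReader(handle))

print(len(saved_rows))
```

Writing the header only when the file is first created keeps repeated runs in one table that can be read back as a single CSV.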
RandomForest¶
-
class
mnist_classifier.random_forest.
RandomForest
(n_estimators, max_depth, criterion, random_seed: int = None, report_directory: str = None, test_suite_iter: int = None)¶ Random Forest which inherits from the AlgorithmMeta class
-
display_results
(cache)¶ Displays various graphs that are pertinent to the algorithm’s score (such as a confusion matrix)
- Parameters
cache (dict) – the arguments to display (varies from algorithm to algorithm)
-
load_model
(filepath)¶ Loads the model from disk into the object’s model attribute.
- Parameters
filepath (str) – the path of the model on disk
-
print_results
(cache)¶ Prints the results of the classification, and returns them as a pandas DataFrame
- Parameters
cache (dict) – the cache of a run_classification() function call.
- Returns
the classification results as a single-line data frame
- Return type
pandas.DataFrame
-
MLP¶
-
class
mnist_classifier.mlp.
MLP
(hidden_layer_sizes: tuple = (10, 10, 10), alpha: float = 0.0001, batch_size='auto', max_iter: int = 200, verbose: bool = False, random_seed: int = None, report_directory: str = None, test_suite_iter: int = None)¶ A basic MLP classifier
-
display_results
(cache)¶ Displays various graphs that are pertinent to the algorithm’s score (such as a confusion matrix)
- Parameters
cache (dict) – the arguments to display (varies from algorithm to algorithm)
-
load_model
(filepath)¶ Loads the model from disk into the object’s model attribute.
- Parameters
filepath (str) – the path of the model on disk
-
print_results
(cache)¶ Prints the results of the classification, and returns them as a pandas DataFrame
- Parameters
cache (dict) – the cache of a run_classification() function call.
- Returns
the classification results as a single-line data frame
- Return type
pandas.DataFrame
-
Dataset¶
Downloads and prepares the dataset for use with other algorithms
-
mnist_classifier.dataset.
load_test_data
()¶ Loads the test data
- Returns
data (numpy.array) – 2D numpy array with the image data (one image per row)
labels (numpy.array) – 1D numpy array with the label for each corresponding image
-
mnist_classifier.dataset.
load_train_data
()¶ Loads the training data
- Returns
data (numpy.array) – 2D numpy array with the image data (one image per row)
labels (numpy.array) – 1D numpy array with the label for each corresponding image
Visualizer¶
-
mnist_classifier.visualizer.
display_loss_curve
(losses, save_location: str = None)¶ Plots and displays the loss curve (usually for Neural Network models)
- Parameters
save_location (str) – the location to save the figure on disk. If None, the plot is displayed on runtime and not saved.
losses (numpy.array) – the losses array of the MLP classifier’s training.
- Returns
the figure
- Return type
matplotlib.pyplot.figure
-
mnist_classifier.visualizer.
display_mlp_coefficients
(coefficients, rows=4, cols=4, save_location: str = None)¶ Shows the coefficients of the MLP’s first layer (the weights applied to the input).
The coefficients of the first rows*cols neurons are displayed. If rows*cols is greater than the number of neurons, all the neurons are displayed; if there are more neurons than rows*cols, only the first rows*cols are displayed.
- Parameters
coefficients (numpy.array) – 2D numpy array containing the input coefficients (or weights) of the MLP’s hidden layers. Only the first layer’s coefficients are displayed
rows (int) – the number of rows to display in the figure
cols (int) – the number of columns to display in the figure
save_location (str) – the location to save the figure on disk. If None, the plot is displayed on runtime and not saved.
- Returns
the figure
- Return type
matplotlib.pyplot.figure
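The selection rule described above amounts to showing min(rows*cols, number of neurons) panels. The sketch below illustrates it, assuming the coefficient matrix is laid out like scikit-learn's `MLPClassifier.coefs_[0]`, with shape (n_features, n_neurons), so each neuron's weights form one column. The helper name `select_neuron_weights` is hypothetical, and plain nested lists stand in for the NumPy array.

```python
def select_neuron_weights(coefficients, rows=4, cols=4):
    """Return the weight vectors of the first min(rows*cols, n_neurons) neurons."""
    n_neurons = len(coefficients[0]) if coefficients else 0
    n_shown = min(rows * cols, n_neurons)
    # Column j of the matrix holds the input weights of neuron j.
    return [[feature_row[j] for feature_row in coefficients] for j in range(n_shown)]


# 3 input features, 5 neurons: entry [i][j] is the weight from feature i to neuron j.
weights = [[10 * i + j for j in range(5)] for i in range(3)]

shown = select_neuron_weights(weights, rows=2, cols=2)       # grid smaller than neuron count
everything = select_neuron_weights(weights, rows=4, cols=4)  # grid larger than neuron count
print(len(shown), len(everything))  # prints "4 5"
```

For MNIST, each returned weight vector would have 784 entries and could be reshaped to 28 x 28 to be drawn as an image.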
-
mnist_classifier.visualizer.
display_rf_feature_importance
(cache, save_location: str = None)¶ Displays which pixels have the most influence on the model’s decision. This is based on the feature_importances_ array of sklearn.ensemble.RandomForestClassifier.
- Parameters
save_location (str) – the location to save the figure on disk. If None, the plot is displayed on runtime and not saved.
cache (dict) – the cache dict returned by the classifier. Must at least include [‘actual’, ‘prediction’] objects, each with [‘train’, ‘test’] arrays
- Returns
the figure
- Return type
matplotlib.pyplot.figure
-
mnist_classifier.visualizer.
display_train_test_matrices
(cache, save_location: str = None)¶ Displays the train and test confusion matrices
- Parameters
save_location (str) – the location to save the figure on disk. If None, the plot is displayed on runtime and not saved.
cache (dict) – the cache dict returned by the classifier. Must at least include [‘actual’, ‘prediction’] objects, each with [‘train’, ‘test’] arrays
- Returns
the figure
- Return type
matplotlib.pyplot.figure
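To make the expected cache layout concrete, the sketch below builds a dict with 'actual' and 'prediction' entries, each holding 'train' and 'test' sequences, and derives a confusion matrix for each split in plain Python. The labels are made-up toy values, the helper name `confusion_matrix_sketch` is hypothetical, and the real cache may carry additional keys.

```python
def confusion_matrix_sketch(actual, predicted, n_classes):
    """matrix[i][j] counts samples whose true class is i and predicted class is j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(actual, predicted):
        matrix[true_label][predicted_label] += 1
    return matrix


# Toy cache with the two documented top-level keys (values are illustrative only).
cache = {
    "actual": {"train": [0, 1, 2, 1], "test": [0, 1, 2, 2]},
    "prediction": {"train": [0, 1, 2, 1], "test": [0, 2, 2, 2]},
}

matrices = {
    split: confusion_matrix_sketch(cache["actual"][split], cache["prediction"][split], 3)
    for split in ("train", "test")
}
# Accuracy is the sum of the diagonal divided by the number of samples.
test_accuracy = sum(matrices["test"][k][k] for k in range(3)) / len(cache["actual"]["test"])
print(matrices["test"], test_accuracy)  # prints "[[1, 0, 0], [0, 0, 1], [0, 0, 2]] 0.75"
```

Off-diagonal entries are the misclassifications: here the single class-1 test sample was predicted as class 2.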
Report Manager¶
Handles everything concerning reporting.
-
mnist_classifier.report_manager.
load_test_suite_conf
(filepath: str)¶ Loads a test suite JSON configuration file and returns a list of parameter lists to pass to the argument parser. The JSON file should be formatted like the test_suite_example.json file in this repository. The names of the dict keys for each test should be
- Parameters
filepath (str) – the filepath to the test suite JSON.
- Returns
a list of parameter lists corresponding to the configurations of each of the tests to be run
- Return type
list
-
mnist_classifier.report_manager.
prepare_report_dest
(report_filepath: str)¶ Prepares the destination output folder. If the location already exists, a unique version is created by appending an auto-incremented suffix. For example, if “my_report” is given and a folder “my_report” already exists, a folder called “my_report_1” is created.
- Parameters
report_filepath (str) – the target folder where the report should be created
- Returns
the actual filepath that was created
- Return type
str
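The auto-increment naming described above can be sketched with the standard library alone. This is an illustration under assumptions rather than the library's code. The helper name `prepare_report_dest_sketch` is hypothetical, and continuing with "_2", "_3", and so on after "_1" is assumed, since only the first collision is documented.

```python
import os
import tempfile


def prepare_report_dest_sketch(report_filepath: str) -> str:
    """Create report_filepath, or the first free '<path>_<n>' variant, and return it."""
    candidate = report_filepath
    suffix = 0
    while os.path.exists(candidate):
        suffix += 1
        candidate = f"{report_filepath}_{suffix}"
    os.makedirs(candidate)
    return candidate


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "my_report")
    # Request the same destination three times; each call must yield a new folder.
    created = [os.path.basename(prepare_report_dest_sketch(target)) for _ in range(3)]

print(created)  # prints "['my_report', 'my_report_1', 'my_report_2']"
```

Returning the path that was actually created lets the caller write its report files without repeating the collision check.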