To apply learned machine learning, deep learning, and explainable AI skills to predict and explain outcomes in a binary classification problem in which the data is heavily weighted toward one class.
• To consolidate the problem-solving skills in designing and executing a deep learning pipeline
• To develop skills in handling unbalanced data
• Consolidate skills in programming in TensorFlow/Keras
• Perform tuning of deep learning models
• Compare regularisation strategies commonly used in deep learning
• Compare the outcomes of different classification algorithms
• Implement an explainable AI system to compare performances of different models
• Reflect on machine learning model performance
The following table shows a breakdown of the activities for this assignment to assist you in prioritising your time. Note that these times are representative of the average time needed to complete each task. The exact time to complete each task is dependent on your implementation/analysis speed and personal circumstances.
Task 1 Task 2 Task 3 Task 4 Write-Up
2-3 days 5-6 Days 3-4 days 3-4 days
Note that this assignment is time-consuming and requires thoughtful planning to complete all required tasks. There will be discussion threads available through Keats to answer your questions and help you with technical difficulties.
A Jupyter notebook containing snippets of useful code has been supplied to help you get started.
Task: Classifying the presence of cold and flu in speech samples
Now that you have been equipped with the skills to use different machine learning and deep learning algorithms, you will have the opportunity to practise these skills and apply them to a practical problem.
The task you will conduct in this assignment is classifying the presence or absence of cold and flu in speech samples. For this task, you will be using a publicly available dataset from the INTERSPEECH Computational Paralinguistics Challenge Series. For details on this data, please see the following paper
The data has already been partitioned into three separate groupings. There are two partitions (train and development) for initial testing of your models and a third (test) for blind testing of their generalisability. The distribution of speech samples in these partitions is:
            Train    Devel     Test        Σ
Cold          970    1,011      895    2,876
Not Cold    8,535    8,585    8,656   25,776
Σ           9,505    9,596    9,551   28,652
To protect the privacy of the speakers in these files, you will not be given the speech files. Instead, you will be supplied with features already extracted from each speech sample. The feature representation is an 88-dimensional representation of each file, known as the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). Note, each speech file has been converted into one 88-dimensional vector, so while speech is a time-series, the conversion into an eGeMAPS feature vector removes time dependencies. Therefore, there is no benefit in using Recurrent Neural Networks in the assignment.
Your task for the assignment will be to create generalisable deep learning models using the supplied data and TensorFlow/Keras. If you do not have this software, or need help installing it on your computer, please contact the module organiser. You will be supplied with the “Cold” and “Not-Cold” labels for the training and development partitions. Using these, you can develop different systems for performing the 2-class classification task.
Then, from your most suitable models, you will generate predictions for the test set. The accuracy of the test set predictions will be independently verified by your lecturer. You are required to verify the results of 5 different models.
Additionally, you will implement the Local Interpretable Model-agnostic Explanations (LIME) framework to help compare the performance of your developed models on exemplar data instances.
Task 1: Baseline System and Data Balancing (15% of marks)
As deep neural networks contain a considerably higher number of parameters and a wider choice of hyperparameters than other classification approaches, it is often necessary to first implement a simpler classifier to understand the difficulty of the associated task. Use the supplied training and development data partitions for this task.
1. Generate a suitable baseline classification model. For this, use either a Support Vector Machine or Random Forest Classifier. Both are implementable through SciKit-Learn.
2. Identify the most suitable metric for assessing system performance
3. Perform a suitable grid-search to find a reasonable set-up of your chosen classifier
4. Using your chosen baseline system, explore the benefits of random downsampling of the majority class, random upsampling of the minority class, and cost-sensitive training (i.e., the inclusion of class weights)
• Hint #1: both sampling methods are implementable using the imbalanced-learn toolkit
• Hint #2: you may need to find new hyperparameter values for your models to realise the benefits of these methods
5. Document your findings and observations
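The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed solution: the data is synthetic (generated with scikit-learn's make_classification as a stand-in for the eGeMAPS features), a Random Forest is used as the baseline, and unweighted average recall (scikit-learn's balanced accuracy) is assumed as one possible metric for imbalanced data. The helper random_resample is hypothetical and written in NumPy to show what the sampling does; the RandomUnderSampler and RandomOverSampler classes in imbalanced-learn provide the same operations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, PredefinedSplit


def random_resample(X, y, mode, rng):
    """Balance a binary dataset by random sampling (hypothetical helper).

    Assumes label 1 is the minority class and label 0 the majority class.
    mode="down" discards majority samples; mode="up" repeats minority samples.
    """
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    if mode == "down":
        idx_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    elif mode == "up":
        idx_min = rng.choice(idx_min, size=len(idx_maj), replace=True)
    else:
        raise ValueError("mode must be 'down' or 'up'")
    idx = rng.permutation(np.concatenate([idx_min, idx_maj]))
    return X[idx], y[idx]


# Synthetic stand-in for the 88-dimensional features, roughly 10% positives.
X, y = make_classification(n_samples=2000, n_features=88, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, y_train, X_dev, y_dev = X[:1000], y[:1000], X[1000:], y[1000:]

# Grid search scored on the development partition only:
# -1 marks rows that are always used for training, 0 marks the validation fold.
fold = np.r_[np.full(len(X_train), -1), np.zeros(len(X_dev), dtype=int)]
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [4, 8, None]},
    scoring="balanced_accuracy",
    cv=PredefinedSplit(fold),
    refit=False,  # avoid refitting on train + development combined
)
search.fit(X, y)
depth = search.best_params_["max_depth"]

# Compare the balancing strategies in one loop rather than repeated cells.
rng = np.random.default_rng(0)
setups = {
    "none": (X_train, y_train, None),
    "downsample": (*random_resample(X_train, y_train, "down", rng), None),
    "upsample": (*random_resample(X_train, y_train, "up", rng), None),
    "class_weight": (X_train, y_train, "balanced"),
}
results = {}
for name, (Xb, yb, cw) in setups.items():
    clf = RandomForestClassifier(n_estimators=50, max_depth=depth,
                                 class_weight=cw, random_state=0).fit(Xb, yb)
    results[name] = balanced_accuracy_score(y_dev, clf.predict(X_dev))
print(results)
```

In this sketch the hyperparameters found on the unbalanced data are reused for every strategy; as Hint #2 notes, repeating the grid search per strategy may be needed to see the full benefit.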
Task 2: Neural Network Classification (40% of marks)
The aim of this task is to identify suitable deep learning models using the supplied training and development data partitions and associated labels.
1. Explore different approaches for implementing both Feedforward (Dense) Neural Networks and Convolutional Neural Networks as realised through TensorFlow/Keras. This work should include observing the effect of changing
• The number and width of hidden layers used
• The use of different activation functions
• Different optimisation techniques
• Different regularisation strategies
• Combinations of the above
2. Once you have identified reasonable network architectures, observe the effect of different data balancing approaches with different networks.
3. Document your findings and observations
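One way to organise these experiments is to build each network from a small set of arguments so that depth, width, activation, and regularisation can be varied in a loop. The sketch below is illustrative only: build_dense, build_cnn, and class_weights are hypothetical helpers, the data is random noise standing in for the supplied features, and treating the 88 features as a length-88, single-channel sequence for Conv1D is an assumption rather than a requirement of the brief.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_dense(n_features=88, hidden=(64, 32), activation="relu",
                dropout=0.3, l2=1e-4):
    """Feedforward network; hidden sets the number and width of layers."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for width in hidden:
        model.add(layers.Dense(width, activation=activation,
                               kernel_regularizer=regularizers.l2(l2)))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model


def build_cnn(n_features=88, filters=(16, 32), kernel_size=3,
              activation="relu", dropout=0.3):
    """1-D convolutional network over the feature axis (one channel)."""
    model = keras.Sequential([keras.Input(shape=(n_features, 1))])
    for n_filters in filters:
        model.add(layers.Conv1D(n_filters, kernel_size, padding="same",
                                activation=activation))
        model.add(layers.MaxPooling1D(2))
    model.add(layers.GlobalAveragePooling1D())
    model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model


def class_weights(y):
    """Inverse-frequency weights for cost-sensitive training (labels 0/1)."""
    counts = np.bincount(y, minlength=2)
    return {c: float(len(y) / (2 * counts[c])) for c in (0, 1)}


# Random stand-in data with roughly 10% positives (not the real features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 88)).astype("float32")
y_train = (rng.random(400) < 0.1).astype("int32")
X_dev = rng.normal(size=(100, 88)).astype("float32")
y_dev = (rng.random(100) < 0.1).astype("int32")

# The CNN expects a trailing channel axis, so its inputs are reshaped.
configs = {
    "dense": (build_dense(), X_train, X_dev),
    "cnn": (build_cnn(), X_train[..., None], X_dev[..., None]),
}
histories = {}
for name, (model, x_tr, x_dv) in configs.items():
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.Recall(name="recall")])
    histories[name] = model.fit(
        x_tr, y_train,
        validation_data=(x_dv, y_dev),
        epochs=2, batch_size=32,  # kept tiny so the sketch runs quickly
        class_weight=class_weights(y_train),
        callbacks=[keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True)],
        verbose=0,
    )
```

Swapping the optimiser, the activation argument, or the dropout and l2 values inside the loop gives the comparisons listed in step 1; replacing class_weight with resampled training data covers step 2.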
Task 3: Generating Explanations (25% of marks)
The aim of this task is to compare the predictions made by your baseline and strongest deep learning based system using the LIME toolkit.
1. Identify your most robust baseline and deep learning models
2. Document your reasons for choosing these models.
3. From the development data, identify a data instance that returned true-positive results for both classifiers, and compare the resulting explanations
4. Do the same for a false-positive, true-negative, and false-negative instance.
5. Finally, compare the explanations for a data instance that returned a true-positive on the baseline system but not on the deep learning system, and for a data instance that returned a true-positive on your deep learning system but not on your baseline system
6. Document your findings for steps 3, 4 and 5. What similarities did you see in each pair of explanations? What were the main differences?
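Steps 3 to 5 can be sketched as below. All function names here are hypothetical helpers, and label 1 is assumed to mean "Cold". The LIME calls used are LimeTabularExplainer and its explain_instance method from the lime package; LIME's classification mode expects the prediction function to return one probability column per class, so a single sigmoid output needs wrapping.

```python
import numpy as np


def confusion_groups(y_true, y_pred):
    """Indices of TP, FP, TN and FN instances (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "TP": np.flatnonzero((y_true == 1) & (y_pred == 1)),
        "FP": np.flatnonzero((y_true == 0) & (y_pred == 1)),
        "TN": np.flatnonzero((y_true == 0) & (y_pred == 0)),
        "FN": np.flatnonzero((y_true == 1) & (y_pred == 0)),
    }


def shared_instances(y_true, pred_a, pred_b):
    """Indices where both models land in the same group (steps 3 and 4)."""
    ga, gb = confusion_groups(y_true, pred_a), confusion_groups(y_true, pred_b)
    return {key: np.intersect1d(ga[key], gb[key]) for key in ga}


def true_positive_disagreements(y_true, pred_a, pred_b):
    """Indices that are TP for only one of the two models (step 5)."""
    tp_a = confusion_groups(y_true, pred_a)["TP"]
    tp_b = confusion_groups(y_true, pred_b)["TP"]
    return np.setdiff1d(tp_a, tp_b), np.setdiff1d(tp_b, tp_a)


def two_column_proba(predict_positive):
    """Wrap a function returning P(Cold) into [P(Not Cold), P(Cold)]."""
    def predict_fn(X):
        p = np.asarray(predict_positive(X)).reshape(-1)
        return np.column_stack([1.0 - p, p])
    return predict_fn


def explain_with_each_model(X_train, x, predict_fns, feature_names,
                            num_features=10):
    """Explain one instance x with every model in predict_fns via LIME."""
    # Imported here so the helpers above work without LIME installed.
    from lime.lime_tabular import LimeTabularExplainer

    explainer = LimeTabularExplainer(
        X_train, mode="classification", feature_names=feature_names,
        class_names=["Not Cold", "Cold"], random_state=0)
    return {
        name: explainer.explain_instance(
            x, fn, num_features=num_features).as_list()
        for name, fn in predict_fns.items()
    }
```

As a usage note, a scikit-learn baseline can pass its predict_proba method directly, while a Keras model with a single sigmoid output could be passed as two_column_proba(lambda X: model.predict(X, verbose=0)). Each returned list holds (feature condition, weight) pairs, which can be compared side by side for the two models.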
Task 4: Generalisability Testing (20% of marks)
The aim of this task is to test the generalisability of your developed models on completely held-out test set data.
1. Identify your 5 most suitable models for further evaluation
• Choose two “baseline” approaches and three deep learning approaches, including at least one feedforward and one convolutional network
2. Document your reasons for choosing these models.
3. Combine the training and development features and labels to create a ‘new’, larger training set
4. Retrain your 5 chosen models using this ‘new’ training data. Keep all settings and hyperparameters the same as you had originally implemented them in the previous days
5. Using these models, generate predictions for the supplied test set features
• Convert the predictions into a .csv file
• The file should be named _Trial_.csv
• CSV files format:
• A Scikit-Learn classification report and a confusion matrix for each of the 5 submissions will be emailed back to you.
6. Analyse the returned performance of each model. Did the model generalise well? Document a potential reason why it did or did not.
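Steps 3 to 5 might look like the sketch below for a scikit-learn baseline. It is an illustration under assumptions: the data is random, retrain_on_all and write_predictions are hypothetical helpers, and the CSV layout (an index column and a prediction column) and the output file name are placeholders only, since the required format and naming are specified by the module rather than here. For a Keras model, the equivalent of clone is to rebuild the network with the same builder arguments before fitting.

```python
import csv
import os
import tempfile

import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier


def retrain_on_all(model, X_train, y_train, X_dev, y_dev):
    """Refit an unfitted copy of model on train + development data.

    clone() keeps every hyperparameter but discards any fitted state.
    """
    X_all = np.concatenate([X_train, X_dev])
    y_all = np.concatenate([y_train, y_dev])
    return clone(model).fit(X_all, y_all)


def write_predictions(path, predictions, header=("index", "prediction")):
    """Write one prediction per row; the column layout is a placeholder."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for i, label in enumerate(predictions):
            writer.writerow([i, int(label)])


# Random stand-in data with the same feature dimension as the brief.
rng = np.random.default_rng(0)
X_train, X_dev, X_test = (rng.normal(size=(n, 88)) for n in (200, 100, 50))
y_train, y_dev = rng.integers(0, 2, 200), rng.integers(0, 2, 100)

# Hyperparameters stay exactly as chosen during development.
model = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
final_model = retrain_on_all(model, X_train, y_train, X_dev, y_dev)

# Placeholder path; the real file name must follow the module's convention.
out_path = os.path.join(tempfile.mkdtemp(), "predictions_example.csv")
write_predictions(out_path, final_model.predict(X_test))
```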
Note, your marks in this section will not be affected by the performance of your models, rather your reasons for choosing the models and subsequent thoughts regarding their performance.
IMPORTANT: Do not leave the generation of test-set predictions to the last minute. Assume at least a 24-hour delay between emailing your CSV and the return of your results.
3pm (BST) 25th June via the module’s Keats page
You are required to submit the following files:
• A well-documented and fully executable Jupyter notebook containing the models and LIME pipeline used in Task 3, and the 5 models assessed in Task 4.
• A 1000-1500 word report detailing:
o The range of testing you did to develop your baseline systems and the make-up (core hyperparameters, any data balancing steps, etc.) of your final baseline model
o [...] the make-up of the best performing model you discover
o [...] via LIME, and your insights from this analysis
o Your analysis of the performance of these models
o Conclusions
The assignment will be marked out of 100, with the distribution of marks as given in each task’s section heading.
Each section will be marked against the following criteria:
1. Correct, executable, complete and well-documented code (Jupyter Notebook).
2. Minimalist-style coding (Jupyter Notebook). Points will be deducted for unnecessary repetition of steps caused by a lack of control structures.
3. Detailed description and justification of key design choices in all tasks (Report). Each step must be fully described in detail and justified. E.g., why, and how did you select the set of hyperparameters shown in the notebook? An analytical approach should be taken when choosing all key (hyper)parameters.
4. Overall quality (Report). It should be a fully specified document containing an Introduction and conclusion alongside the required information for each task.
5. Visualisation (Jupyter Notebook & Report). Any figures must be readable and understandable (suitably sized text and figure), with adequate captions and axis titles. You may duplicate figures in your report to aid readability.