Libraries used in this project

(library badges)

Overview


There are two ways to get a prediction from our model.

  • Preprocess -> Embedding Model -> Classifier.
  • Preprocess -> Embedding Model -> Label correction -> Classifier.

Method 1 achieves an average accuracy of 94%; adding the label-correction step (method 2) raises the average accuracy to 96%.

Reference: "Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction" by Ayaan Haque*, Viraaj Reddi*, and Tyler Giallanza from Saratoga High School and Princeton University. In ICANN, 2021.

Methodology

Step 1

Preprocess data

Keep only posts shorter than 100 words (~150,000 posts remain); remove URLs and emoji; unpack contractions; normalize case.
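
A minimal sketch of this preprocessing step in Python. The regexes and the small contraction map are illustrative placeholders, not the project's exact cleaning rules:

```python
import re
from typing import Optional

# Illustrative contraction map; a real pipeline would use a fuller list
# (or a package such as `contractions`).
CONTRACTIONS = {
    "can't": "cannot", "won't": "will not", "i'm": "i am",
    "don't": "do not", "it's": "it is", "isn't": "is not",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji ranges; sufficient for a sketch.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def preprocess(text: str, max_words: int = 100) -> Optional[str]:
    """Clean one post; return None if it fails the length filter."""
    text = URL_RE.sub(" ", text)            # remove URLs
    text = EMOJI_RE.sub(" ", text)          # remove emoji
    text = text.lower()                     # case normalization
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)    # unpack contractions
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text.split()) < max_words else None
```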

Step 2

Embedding Models

BERT (Bidirectional Encoder Representations from Transformers) is a popular Transformer-based word embedding model. Instead of processing text word by word sequentially like an RNN/LSTM, it avoids recurrence entirely by processing each sentence as a whole and learning the relationships between its words.

We feed the ~150,000 posts into DistilBERT and keep only the [CLS] token's 768-dimensional embedding as the feature vector for our classification task.
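
A sketch of this extraction with the Hugging Face transformers library. The distilbert-base-uncased checkpoint, the max length, and single-batch processing are assumptions, not confirmed project settings:

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed(posts):
    """Return the (n, 768) [CLS] embeddings for a list of post strings."""
    inputs = tokenizer(posts, padding=True, truncation=True,
                       max_length=128, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (n, seq_len, 768)
    return hidden[:, 0, :]   # position 0 holds the [CLS] token
```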

Step 3

Decompose and cluster

In this step, we use PCA (principal component analysis) to reduce the data from shape (n, 768) to (n, 2), where n is the number of posts and 768 is the number of features extracted from BERT.
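
With scikit-learn this reduction is a few lines; `embeddings` below stands for the (n, 768) matrix produced in Step 2:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
reduced = pca.fit_transform(embeddings)   # (n, 768) -> (n, 2)
print(pca.explained_variance_ratio_)      # variance kept by the two axes
```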

Step 4

GMM and Relabel

Cluster the data into two groups with a GMM (Gaussian Mixture Model) and relabel the points whose cluster assignment disagrees with their initial label.

  • Get the posterior probabilities from the GMM (e.g. 90% in group 0 and 10% in group 1).
  • Relabel a point only if its GMM cluster differs from its initial label AND the GMM's confidence is higher than 95% (see the sketch below).
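
A sketch of this correction rule using scikit-learn's GaussianMixture. Here `reduced` is the PCA output from Step 3, `labels` is the (n,) NumPy array of initial labels, and the GMM's cluster indices are assumed to have been aligned with the label indices beforehand (e.g. by majority vote):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=2, random_state=0).fit(reduced)
proba = gmm.predict_proba(reduced)        # (n, 2) posterior probabilities
cluster = np.argmax(proba, axis=1)        # hard cluster assignment
confidence = np.max(proba, axis=1)        # confidence in that assignment

# Flip a label only when the cluster disagrees with the initial label
# AND the model is more than 95% confident.
corrected = labels.copy()
flip = (cluster != labels) & (confidence > 0.95)
corrected[flip] = cluster[flip]
```
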
Step 5

DNN classifier

The network consists of an input layer of size 768, a hidden ReLU layer of size 128, and another hidden ReLU layer of size 64. The output layer uses a sigmoid activation. The Adam optimizer is used with binary cross-entropy loss.
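
A minimal PyTorch sketch of this architecture; training details such as epochs, batch size, and learning rate are not specified above and are left out:

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(768, 128), nn.ReLU(),   # hidden ReLU layer, size 128
    nn.Linear(128, 64), nn.ReLU(),    # hidden ReLU layer, size 64
    nn.Linear(64, 1), nn.Sigmoid(),   # sigmoid output for the binary label
)
loss_fn = nn.BCELoss()                                # binary cross-entropy
optimizer = torch.optim.Adam(classifier.parameters())
```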

Check out the results

Our fully dense network with unsupervised GMM-based label correction achieved 96% test accuracy in determining whether input sentences portray depressive or suicidal sentiment.