TY - GEN
T1 - That’s What She Said: Humor Identification with Word Embeddings and Recurrent Neural Networks
AU - Kayastha, Ashish
AU - Redei, Alexander
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Humor identification is an interesting problem in natural language that has previously been tackled with handcrafted features and traditional classifiers. We take on a famous double entendre identification problem—the “That’s What She Said” (TWSS) joke—using a powerful class of models for Natural Language Processing (NLP): Word Embeddings and Recurrent Neural Networks (RNNs). We investigated the benefits of these models by comparing their performance on three train/test sets, each with a different class balance. Our best model achieves a precision of 93.7% and a recall of 96.5% on a class-balanced train/test set (generated from web data) using GloVe embeddings paired with a Gated Recurrent Unit (GRU). More importantly, the model maintains strong performance, with a precision of 89.2% and a recall of 84.7%, on a class-imbalanced train/test set. These results are remarkable compared to the previous state-of-the-art approach, based on feature engineering, which achieves only a precision of 71.4% and a recall of less than 20%.
KW - Computational humor
KW - Computational linguistics
KW - Deep learning
KW - Natural language processing
KW - Recurrent Neural Networks
KW - That’s What She Said
UR - http://www.scopus.com/inward/record.url?scp=85126935340&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-98015-3_14
DO - 10.1007/978-3-030-98015-3_14
M3 - Conference contribution
AN - SCOPUS:85126935340
SN - 9783030980146
T3 - Lecture Notes in Networks and Systems
SP - 209
EP - 221
BT - Advances in Information and Communication - Proceedings of the 2022 Future of Information and Communication Conference, FICC
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
T2 - Future of Information and Communication Conference, FICC 2022
Y2 - 3 March 2022 through 4 March 2022
ER -