TY - JOUR
T1 - Optimizing non-decomposable measures with deep networks
AU - Sanyal, Amartya
AU - Kumar, Pawan
AU - Kar, Purushottam
AU - Chawla, Sanjay
AU - Sebastiani, Fabrizio
N1 - Funding Information:
A.S. did this work while he was a student at IIT Kanpur and acknowledges support from The Alan Turing Institute under the Turing Doctoral Studentship grant TU/C/000023. P. Kar is supported by the Deep Singh and Daljeet Kaur Faculty Fellowship and the Research-I foundation at IIT Kanpur, and thanks Microsoft Research India and Tower Research for research grants. Editors: Jesse Davis, Elisa Fromont, Derek Greene, and Björn Bringmann. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Publisher Copyright:
© 2018, The Author(s).
PY - 2018/9/1
Y1 - 2018/9/1
AB - We present a class of algorithms capable of directly training deep neural networks with respect to popular families of task-specific performance measures for binary classification that are structured and non-decomposable, such as the F-measure, QMean, and the Kullback–Leibler divergence. Our goal is to address tasks such as label-imbalanced learning and quantification. Our techniques depart from standard deep learning practice, which typically trains neural networks with squared or cross-entropy loss functions (both decomposable). We demonstrate that directly training with task-specific loss functions yields faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations offer several advantages, including (i) the use of fewer training samples to achieve a desired level of convergence, (ii) a substantial reduction in training time, (iii) seamless integration into existing symbolic gradient frameworks, and (iv) guaranteed convergence to first-order stationary points. Notably, the algorithms achieve this, especially point (iv), despite optimizing complex, non-decomposable objective functions. We implement our techniques on a variety of deep architectures, including multi-layer perceptrons and recurrent neural networks, and show that on a range of benchmark and real data sets our algorithms outperform traditional approaches to training deep networks, as well as popular techniques used to handle label imbalance.
KW - Deep learning
KW - F-measure
KW - Optimization
KW - Task-specific training
UR - http://www.scopus.com/inward/record.url?scp=85049596317&partnerID=8YFLogxK
U2 - 10.1007/s10994-018-5736-y
DO - 10.1007/s10994-018-5736-y
M3 - Article
AN - SCOPUS:85049596317
SN - 0885-6125
VL - 107
SP - 1597
EP - 1620
JO - Machine Learning
JF - Machine Learning
IS - 8-10
ER -