TY - JOUR
T1 - Modelling count response variables in informetric studies
T2 - Comparison among count, linear, and lognormal regression models
AU - Ajiferuke, Isola
AU - Famoye, Felix
N1 - Funding Information:
The authors would like to thank Mary Aderayo Bamimore for collecting the knowledge management dataset from the Web of Science database. The authors are grateful to the Department of Mathematics, CMU for the Collaborative Research Grant that partially supported the research work. The authors are also grateful to the Editor-in-Chief of the Journal of Informetrics and the two anonymous reviewers for their constructive comments that greatly improved the quality of the paper.
Publisher Copyright:
© 2015 Elsevier Ltd.
PY - 2015/7/1
Y1 - 2015/7/1
N2 - The purpose of the study is to compare the performance of count regression models with that of linear and lognormal regression models in modelling count response variables in informetric studies. Count response variables identified in informetric studies include the number of authors, the number of references, the number of views, the number of downloads, and the number of citations received by an article; the numbers of links from and to a website are also counts. Data were collected from the United States Patent and Trademark Office (www.uspto.gov), an open access journal (www.informationr.net/ir/), Web of Science, and Maclean's magazine. The datasets were then used to compare the performance of linear and lognormal regression models with that of Poisson, negative binomial, and generalized Poisson regression models. Because most of the response variables were over-dispersed, the negative binomial regression model often appeared more appropriate for informetric datasets than the Poisson and generalized Poisson regression models. The regression analyses also showed that the linear regression model predicted some negative values for five of the nine response variables modelled and, for all nine response variables, performed worse than both the negative binomial and lognormal regression models when either Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) was used as the goodness-of-fit measure. The negative binomial regression model performed significantly better than the lognormal regression model for four of the response variables, and the lognormal regression model performed significantly better than the negative binomial regression model for two. There was no significant difference between the two models for the remaining three.
KW - Count regression models
KW - Count response variable
KW - Informetric studies
KW - Linear regression model
KW - Lognormal regression model
KW - Negative binomial regression model
UR - http://www.scopus.com/inward/record.url?scp=84930640136&partnerID=8YFLogxK
U2 - 10.1016/j.joi.2015.05.001
DO - 10.1016/j.joi.2015.05.001
M3 - Article
AN - SCOPUS:84930640136
SN - 1751-1577
VL - 9
SP - 499
EP - 513
JO - Journal of Informetrics
JF - Journal of Informetrics
IS - 3
ER -