TY - JOUR
T1 - Sample size and informetric model goodness-of-fit outcomes
T2 - A search engine log case study
AU - Ajiferuke, Isola
AU - Wolfram, Dietmar
AU - Famoye, Felix
PY - 2006/6
Y1 - 2006/6
AB - The influence of sample size on informetric characteristics is examined to determine whether theoretical mathematical models can adequately fit large data sets. Two large data sets of queries submitted to the Excite search service were sampled for search characteristics (term frequencies, terms used per query, pages viewed per query, queries submitted per session) producing data sets of various sizes that were fitted to theoretical models to determine how the sample may influence a model's goodness-of-fit. Although theoretical models could adequately fit smaller data sets of up to 5000 observations in some cases, larger data sets could not be satisfactorily fitted using several goodness-of-fit techniques. Investigators must take into account that sample size does influence goodness-of-fit outcomes. The nature of the data and not the limitations of given goodness-of-fit tests results in significant outcomes. Such goodness-of-fit tests should be used for comparative purposes, rather than significance testing.
KW - Frequency distributions
KW - Goodness-of-fit tests
KW - Informetric modelling
KW - Internet usage patterns
UR - http://www.scopus.com/inward/record.url?scp=33745021907&partnerID=8YFLogxK
U2 - 10.1177/0165551506064361
DO - 10.1177/0165551506064361
M3 - Article
AN - SCOPUS:33745021907
SN - 0165-5515
VL - 32
SP - 212
EP - 222
JO - Journal of Information Science
JF - Journal of Information Science
IS - 3
ER -