ACCOUNTING VARIABLES, DECEPTION AND A BAG OF WORDS: ASSESSING THE TOOLS OF FRAUD DETECTION
Lynnette Purda
Queen’s School of Business
Queen’s University
Kingston, Ontario, Canada
(lpurda@business.queensu.ca)
David Skillicorn
School of Computing
Queen’s University
Kingston, Ontario, Canada
(skill@cs.queensu.ca)
Abstract
We develop a data-generated list of words most predictive of fraudulent financial reporting and compare
its success in correctly classifying truthful and fraudulent reports with predictions made by models based
on quantitative financial statement variables and four alternative fixed word lists previously shown to be
associated with fraud. We find that the data-generated list can be a useful complement to alternative
methods, greatly reducing the number of false positives produced by models based on accounting
variables and correctly identifying a higher proportion of frauds than alternative “bag of words” methods.
We study the time series of annual and interim reports of firms eventually accused of fraud and assign a
probability of truthfulness to each using the various word lists. On average, the data-generated word list
shows declines in truthfulness approximately three quarters prior to the actual instance of fraud while
other word lists measuring negative tone and the presence of litigation generate probabilities that decline
to a lesser extent and much closer to the event. We find that a word list designed to capture conscious
deception has no predictive power in this setting, consistent with financial reports being written by
multiple individuals, many of whom are likely unaware that misrepresentation is occurring. Our results
contribute not only to the development of a new detection tool but also to improving our understanding of
how textual analysis may ultimately contribute to the financial investigators’ toolkit.
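
As a rough illustration of the “bag of words” representation referred to above, the following minimal sketch converts a report into relative word frequencies over a fixed word list. The function name, the example word list, and the report excerpt are all hypothetical and chosen only for illustration; they are not the word lists or the classification procedure used in this study, and a real application would feed such frequencies into a trained classifier to obtain a probability of truthfulness.

```python
import re
from collections import Counter


def word_list_features(text, word_list):
    """Return the relative frequency of each listed word in `text`.

    Word order is ignored (the "bag of words" assumption); only
    how often each word occurs, relative to report length, is kept.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {word: counts[word] / total for word in word_list}


# Hypothetical word list and report excerpt, for illustration only.
example_words = ["litigation", "loss", "restated"]
report = (
    "The company restated earnings after a loss. "
    "Litigation is pending; the loss was material."
)
features = word_list_features(report, example_words)
```

Normalizing by report length keeps long and short filings comparable; other weightings are equally plausible and the choice here is an assumption of the sketch.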