The Immortal Life of the Enron E-mails; A decade after the Enron scandal, the company’s internal messages are still helping to advance data science and many other fields

By Jessica Leber on July 2, 2013

Corporate corpus: Volumes of e-mails that were sent and received in Enron’s headquarters in Houston, seen here in 2002, are still parsed and dissected by computer scientists and other researchers.

Former Enron executive Vincent Kaminski is a modest, semi-retired business school professor from Houston who recently wrote a 960-page book explaining the fundamentals of energy markets. His most lasting legacy, however, may involve thousands of e-mails he wrote more than a decade ago at the energy-services company.

Kaminski, a former managing director for research who warned repeatedly about concerning practices he saw at Enron, is among more than 150 senior executives whose e-mail boxes were dumped onto the Internet by the Federal Energy Regulatory Commission (FERC) on March 26, 2003. In the name of serving the public’s interest during its investigation of Enron, the federal agency made the controversial decision to post online more than 1.6 million e-mails that Enron executives sent and received from 2000 through 2002. FERC eventually culled the trove to remove the most sensitive and personal data, after receiving complaints (see PDF). Even so, the “Enron e-mail corpus,” as the cleaned-up version is now known, remains the largest public domain database of real e-mails in the world—by far.

This corpus, as it is known, is valuable to computer scientists and social-network theorists in ways that the e-mails’ authors and recipients never could have intended. Because it is a rich example of how real people in a real organization use e-mail—full of mundane lunch plans, boring meeting notes, embarrassing flirtations that revealed at least one extramarital affair, and the damning missives that spelled out corruption—it has become the foundation of hundreds of research studies in fields as diverse as machine learning and workplace gender studies.

This research has had widespread applications: computer scientists have used the corpus to train systems that automatically prioritize certain messages in an in-box and alert users that they may have forgotten about an important message. Other researchers use the Enron corpus to develop systems that automatically organize or summarize messages. Much of today’s software for fraud detection, counterterrorism operations, and mining workplace behavioral patterns over e-mail has been somehow touched by the data set.

“It’s like we are studying yeast,” says William Cohen, a Carnegie Mellon University computer scientist who helped put the corpus in a database that could be mined by researchers. “It’s studied and experimented on because it is a very well understood model organism. [The e-mail generated by] Enron is similar. People are going to keep using it for a long time.”

The Enron e-mails were given their extended life by scientists at MIT, Carnegie Mellon University, and the nonprofit research institute SRI International. Ten years ago, researchers at these institutions were collaborating on the DARPA-funded CALO project, which stands for “Cognitive Assistant that Learns and Organizes,” and whose biggest claim to fame is giving rise to Apple’s Siri software. For CALO, the researchers were cobbling together much smaller e-mail data sets to analyze.

When the Enron e-mails were posted in 2003, the researchers realized that they could be extremely useful for testing algorithms that could process written language and form the basis of intelligent workplace tools. Because FERC had posted the e-mails in an unusable format, MIT’s Leslie Kaelbling purchased the raw files from a government contractor for $10,000, and others spent time cleaning up the data—weeding out duplicates, organizing folders, taking out the remaining private attachments and e-mails, and mapping the senders and recipients to Enron’s organizational structure. The corpus, at first more than 517,431 e-mails, was whittled down to 200,000 by 2004.

A research ecosystem still blooms around the corpus because there is nothing else like it in the public domain. If it didn’t exist, research into business e-mails could be done only by people with access to big corporate or government servers. That probably would exclude social science, organizational, and linguistics researchers—many of whom have used the corpus to glean valuable insights into corporate culture, says Owen Rambow, a Columbia University professor involved in a research project that used the Enron corpus and received a $510,000 grant from the National Science Foundation.

Since 2010, about 30 papers a year have cited the original paper that presented the Enron corpus, Carnegie Mellon’s Cohen estimates. This year, for instance, researchers at HP Labs turned to the corpus to demonstrate an artificial intelligence program for automatically identifying the commitments people make over e-mail. Jafar Adibi, who worked on an early map of the Enron social network, says he still gets handfuls of inquiries every month, more and more from researchers outside of the United States. There is still an active list-serv devoted to discussing the corpus.

Researchers who have worked with the corpus know there won’t be another Enron. FERC released the e-mails back when the world still had a lot to learn about online privacy. The harms to people mentioned—most of whom were innocent of any wrongdoing at Enron—were quickly apparent. Social security numbers and even bank records were in there. Though much private data has been removed, browsing hundreds of e-mails in Kaminski’s “sent” folder, I found a home phone number, his wife’s name, and an unflattering opinion he held of a former colleague. I also got the sense that he had been long, long overdue for the promotion he received in 2000. At the time the e-mails were first released, Kaminski, the manager of about 50 employees at Enron, said he was most disturbed to see his back-and-forth communications about HR complaints and job candidate evaluations become public. A job candidate he once interviewed got upset after their release.

Today, many people who work in highly regulated industries like finance avoid putting sensitive information in their e-mails. Kaminski, who later served as a managing director at Citigroup, notes that the acronym “LTOL” became popular e-mail lingo in the years following Enron. It stands for “Let’s take this offline.”
