How the NSA Could Get So Smart So Fast; Modern Computing Is Helping Companies and Governments Accurately Parse Vast Amounts of Data in a Matter of Minutes

Updated June 12, 2013, 7:51 p.m. ET

By MICHAEL HICKINS

Five years ago it would have been unimaginable for a government agency such as the National Security Agency to efficiently parse millions of phone, text and online conversations for keywords that could have warned of an impending terrorist attack. Today, a set of new technologies makes it relatively affordable and manageable for it to do so. These technologies can store vastly different types of data in a single database and process that data rapidly on inexpensive hardware, without an analyst having to formulate a hypothesis first. “They’ve substantially reduced the cost and greatly increased the [government’s] ability to analyze this type of data,” says Tom Davenport, an expert on analytics and a visiting professor at Harvard Business School. The technology needed to outfit data centers to perform these tasks has become “orders of magnitude” less expensive than in the past, he said.

It is unclear exactly what type of computing the NSA is using in its data-center facilities around the U.S., or in a $1.2 billion facility in Utah that will open this fall.

But broadly speaking, the technology can be broken down into three categories:

Database systems

Traditional databases, usually queried in a language known as SQL (pronounced sequel), store data in tables, columns and rows but are limited when it comes to storing strings of words such as those found in an email or text message. They also can’t handle pictures or video.

New types of databases known collectively as NoSQL (for “not only SQL”), such as MongoDB, Cassandra and SimpleDB, began emerging in late 2009. They don’t have these limitations, and they allow analysts to run queries against all these types of data.
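To make the contrast with fixed tables concrete, here is a minimal sketch of the schema-less idea in plain Python. It is not the API of MongoDB, Cassandra or any other product; the records, field names and the `find` and `mentions` helpers are all invented for illustration. The point is only that an email, a text message and an image can sit in one collection and be searched with the same query.

```python
import re

# Toy in-memory "document store": each record is a free-form dict,
# so an email, a text message and an image can sit side by side.
# All field names and records here are invented for illustration.
records = [
    {"kind": "email", "sender": "a@example.com", "body": "Lunch at noon on Friday?"},
    {"kind": "text", "sender": "+1-555-0100", "body": "Running late, start without me"},
    {"kind": "image", "filename": "sunset.jpg", "tags": ["beach", "sunset"], "bytes": 48213},
]


def find(store, predicate):
    """Return every record for which predicate(record) is true."""
    return [record for record in store if predicate(record)]


def mentions(word):
    """Build a predicate matching records whose body text or tags contain word."""
    word = word.lower()

    def predicate(record):
        # Records without a body or tags simply contribute nothing to match.
        words = re.findall(r"[a-z0-9']+", record.get("body", "").lower())
        tags = [tag.lower() for tag in record.get("tags", [])]
        return word in words or word in tags

    return predicate


late = find(records, mentions("late"))       # matches the text message
sunsets = find(records, mentions("sunset"))  # matches the image record
```

A real NoSQL system adds indexing, persistence and distribution across machines. The sketch shows only why no table layout has to be fixed in advance.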

NoSQL databases can make a huge difference to companies analyzing very large data sets, even if they’re fairly conventional. For example, analysts at risk consultancy Verisk Analytics Inc. are “constantly running different models and analytics” against billions of customer records in order to help identify fraudulent insurance claims.

Perry Rotella, vice president and chief information officer at Verisk, says that on a traditional DB2 database from International Business Machines Corp., such an analysis “would be a six-hour job” that had to run overnight. Analysts would pore over the results and generate new queries that would again have to run overnight. He said it took weeks every time analysts needed to create a new statistical model. The company recently changed to a NoSQL database that allows analysts to run the same types of queries in 30 seconds.

“So all of a sudden your model-building becomes iterative in real-time instead of over days. [Using NoSQL], you can run analytics on your data multiple times a day, and it compresses your ability to get results from weeks into days. It’s extremely powerful,” he said.

For online businesses like photography marketplace Shutterstock Inc., which store a great variety of file types, it is difficult to imagine life without this technology. Shutterstock has a library of more than 24 million images and adds an additional 10,000 each day, each of which has associated data to help narrow search results. Its databases also record everything that users do on the site: not just decisive actions such as what images they license, but also minute details such as where they place their cursor and how long they hover there.

Machine learning

Traditional analysis requires analysts to have enough understanding of the data to form a hypothesis and then create complex queries to run against the database. Recently developed techniques known as machine learning and natural language processing rely on the computer programs themselves to find patterns and even work out the meaning of ambiguous words based on context. “You can turn a machine-learning program loose on a lot of data and you can see what they are able to be predictive of,” said Mr. Davenport. With natural language processing, “you could figure out whether a term like ‘bomb’ is being used to describe a Broadway play versus something a terrorist would use,” he said.
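The “bomb” example can be illustrated with a deliberately simple stand-in: a hand-built context-overlap heuristic, not a trained model and not a description of any agency’s system. The cue-word lists and the `guess_sense` function are assumptions made up for this sketch. A real natural language processing system would learn such associations from large amounts of text rather than from a short list.

```python
import re

# Hypothetical cue words for two senses of "bomb". A real system would
# learn these associations from data instead of using a fixed list.
SENSE_CUES = {
    "theater": {"broadway", "play", "show", "critics", "audience", "reviews", "opening"},
    "threat": {"explosive", "detonate", "device", "attack", "target", "blast"},
}


def guess_sense(sentence, cues=SENSE_CUES):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears, rather than guessing.
    """
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    scores = {sense: len(words & vocab) for sense, vocab in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


review = guess_sense("Critics say the new Broadway play is a bomb")
warning = guess_sense("The device was built to detonate near the target")
unclear = guess_sense("That was the bomb")
```

Here `review` comes out as `"theater"`, `warning` as `"threat"`, and `unclear` as `None` because the sentence gives no usable context.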

 

Machine learning, also known as cognitive analytics, allows queries to continually “tune themselves,” Gartner Inc. analyst Douglas Laney explains. For example, retailers use this technology to automatically update pricing algorithms in real time as new information, such as weather, time of day and even information gleaned from video of customers browsing in their stores, becomes available. “It used to take more than a day to update pricing, but these retailers can reprice every hour and use trending information to do real-time product pricing,” says Mr. Laney. “I’m not sure they could do that even a year ago,” he said.

Hadoop

Until recently, complex computer programs needed to run on expensive hardware, such as enormous mainframe computers. Today, an open-source software framework called Hadoop, developed at Yahoo Inc. with contributions from technology developed by Google Inc. and named after a child’s toy elephant, allows queries to be split up by the program. Different analytic tasks are distributed among scads of inexpensive servers, each of which solves a part of the puzzle, and the partial results are reassembled when the work is completed. “It’s really cheap and really fast,” said Mr. Davenport.
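The split, solve and reassemble pattern described above can be sketched in a few lines of plain Python. This is not Hadoop’s API: a small thread pool stands in for the inexpensive servers, and the sample messages and the function names `split_into_chunks`, `count_words` and `merge` are invented for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import re


def split_into_chunks(items, n_chunks):
    """Divide items into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(chunk):
    """Map step: count words in one slice, independently of the others."""
    counts = Counter()
    for message in chunk:
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return counts


def merge(partials):
    """Reduce step: add the partial counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


messages = [
    "The play opens Friday",
    "Tickets for the play sold out",
    "Friday traffic was heavy",
    "The show must go on",
]

chunks = split_into_chunks(messages, n_chunks=2)
with ThreadPoolExecutor(max_workers=2) as pool:  # threads stand in for servers
    partial_counts = list(pool.map(count_words, chunks))
totals = merge(partial_counts)
```

Because each slice is counted without reference to the others, adding more workers lets the same job cover more data. That independence is the property that makes large numbers of cheap machines useful.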

The ability to distribute complex queries to a large number of inexpensive computers helps people get very quick responses to complicated questions with a large number of variables. For example, online automotive market Edmunds.com Inc. can help auto dealers predict how long a given car will remain on their lots by comparing car makes, models and trim against the average number of days that cars at that price point have spent on lots in a given dealer’s region. The predictions help minimize the number of days a car remains unsold, “one of the most important sales metrics for dealers,” said Philip Potloff, Edmunds.com’s chief information officer.
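As a rough illustration of the kind of comparison described, the sketch below averages days on the lot for past sales that share a make, model, trim and region. It is a simple historical average under invented data, not Edmunds.com’s actual model. The `history` rows and the `average_days_on_lot` function are hypothetical, and a price band could be added as one more grouping key.

```python
from collections import defaultdict
from statistics import mean

# Invented sales history: (make, model, trim, region, days_on_lot).
history = [
    ("Honda", "Civic", "LX", "West", 20),
    ("Honda", "Civic", "LX", "West", 30),
    ("Honda", "Civic", "EX", "West", 45),
    ("Ford", "F-150", "XLT", "South", 12),
]


def average_days_on_lot(rows):
    """Group past sales by (make, model, trim, region) and average the days."""
    groups = defaultdict(list)
    for make, model, trim, region, days in rows:
        groups[(make, model, trim, region)].append(days)
    return {key: mean(days) for key, days in groups.items()}


averages = average_days_on_lot(history)
# Estimate for a newly arrived car with the same attributes.
estimate = averages[("Honda", "Civic", "LX", "West")]
```

With millions of sales and many more variables, a grouping like this is the sort of job that gets split across many machines.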

Video-streaming company Netflix Inc. uses Hadoop to graph traffic for every type of device people are using to access video across multiple markets, allowing the company to improve the reliability of video feeds on mobile devices, laptops and TVs, and plan for future growth of streaming movies and TV shows. It also helps Netflix to better analyze customer preferences so that it can make improved recommendations.
