Tuesday, March 17, 2009

Blog: Hadoop, a Free Software Program, Finds Uses Beyond Search

New York Times (03/17/09) P. B3; Vance, Ashlee

Hadoop software has quickly become widely used by the top search engines and other Web sites to analyze and access the unprecedented amounts of data created by the Internet. The free program spreads information across thousands of computers and offers a simpler method for writing analytical queries, letting users explore data by asking a question. "It's a breakthrough," says Lawrence Livermore National Laboratory's Mark Seager. "I think this type of technology will solve a whole new class of problems and open new services." Hadoop is based on MapReduce technology developed by Google. When paired with the file management technology Google uses to catalog the Web, MapReduce can be used to index the entire Internet on a regular basis and to analyze the vast amounts of information to determine the quality of search results and how people use the company's various services. MapReduce makes it possible to break large sets of data into small pieces, spread those pieces across thousands of computers, ask the computers questions, and then receive cohesive answers. Google has largely kept the MapReduce technology a secret, but the company published papers on some of the underlying techniques, which software consultant Doug Cutting used to create Hadoop. Hadoop can be used to track people's behavior, see what types of stories and content they view, and then match ads with that content. Microsoft uses Hadoop to improve its search system, and Facebook uses the program to determine how closely linked people are based on who appears in users' photographs.
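
The split, distribute, and combine idea described above can be illustrated with a small word-count sketch. This is a single-process toy in plain Python, not Hadoop's or Google's actual API: the function names (split_into_chunks, map_chunk, shuffle, reduce_counts, word_count) and the sample pages are made up for illustration, and each "computer" is simply one chunk of a list.

```python
from collections import defaultdict


def split_into_chunks(records, chunk_size):
    """Break the data set into small pieces (one piece per pretend machine)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: each machine emits a (word, 1) pair for every word it sees."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_chunks):
    """Group the values emitted by all machines under their shared key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine each key's values into one cohesive answer."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, chunk_size=2):
    """Run the whole pipeline; a real cluster would run map_chunk in parallel."""
    chunks = split_into_chunks(records, chunk_size)
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample data standing in for crawled Web pages.
    pages = ["hadoop maps data", "data finds uses", "hadoop finds data"]
    print(word_count(pages))
```

In a real deployment the map step would run on many machines at once and the grouping would happen over the network; the sketch only shows how the question ("how often does each word appear?") is answered piece by piece and then merged.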
