Friday, December 17, 2010

Blog: In 500 Billion Words, New Window on Culture [Steven Pinker involved]

In 500 Billion Words, New Window on Culture
New York Times (12/17/10) Patricia Cohen

Researchers from Google and Harvard University have developed an online database of 500 billion words drawn from 5.2 million digitized books published between 1500 and 2008 in English, French, Spanish, German, Chinese, and Russian. The database offers a year-by-year count of how often particular words and phrases appear, along with data representations and search tools. Users can submit a string of up to five words and see a graph of the phrase's use over time. "The goal is to give an eight-year-old the ability to browse cultural trends throughout history, as recorded in books," says Harvard's Erez Lieberman Aiden. The database opens research opportunities in a new field dubbed culturomics to liberal arts professors, who have historically avoided quantitative analysis. "We wanted to show what becomes possible when you apply very high-turbo data analysis to questions in the humanities," says Lieberman Aiden. The data set is downloadable, and users can develop their own search tools. The researchers estimate that the English language has grown by 70 percent in the last 50 years. The new system could be used to update dictionaries by highlighting newly popular and underused words. The database and others like it will soon become universal in humanities research, says Harvard's Steven Pinker.
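
To make the idea of a year-by-year phrase count concrete, here is a minimal sketch in Python. It is not the researchers' implementation and does not use the real data set's format. The function name `phrase_counts_by_year`, the `(year, text)` input shape, and the toy corpus are all assumptions made for illustration. Only the five-word query limit comes from the article.

```python
from collections import Counter

MAX_PHRASE_WORDS = 5  # the article says queries may be up to five words long


def phrase_counts_by_year(books, phrase):
    """Count occurrences of `phrase` per publication year.

    `books` is a hypothetical iterable of (year, text) pairs: a toy
    stand-in for the digitized corpus, not the real data format.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    if not 1 <= n <= MAX_PHRASE_WORDS:
        raise ValueError("phrase must contain one to five words")

    counts = Counter()
    for year, text in books:
        words = text.lower().split()
        # Slide a window of n words across the text and compare to the target.
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == target:
                counts[year] += 1
    return dict(sorted(counts.items()))


# Invented sample texts, purely for demonstration.
toy_books = [
    (1900, "the steam engine changed the world"),
    (1950, "the jet engine and the steam engine"),
    (2000, "the search engine changed the world"),
]

print(phrase_counts_by_year(toy_books, "steam engine"))
```

Plotting the returned counts against the years would give the kind of usage-over-time graph the article describes. A real tool would presumably also normalize by the total number of words published each year, which this sketch leaves out.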
