In 500 Billion Words, New Window on Culture
New York Times (12/17/10) Patricia Cohen
Researchers from Google and Harvard University have developed an online database of 500 billion words taken from 5.2 million digitized books published between 1500 and 2008 in English, French, Spanish, German, Chinese, and Russian. The database offers year-by-year counts of how often words and phrases appear, along with data representations and search tools. Users can submit a string of up to five words and see a graph of the phrase's use over time. "The goal is to give an eight-year-old the ability to browse cultural trends throughout history, as recorded in books," says Harvard's Erez Lieberman Aiden. The database opens research opportunities in a new field dubbed culturomics for liberal arts professors, who have historically avoided quantitative analysis. "We wanted to show what becomes possible when you apply very high-turbo data analysis to questions in the humanities," says Lieberman Aiden. The data set is downloadable, and users can develop their own search tools. The researchers estimate that the English language has grown by 70 percent in the last 50 years, and they say the new system could be used to update dictionaries by highlighting newly popular and underused words. The database and others like it will soon become universal in humanities research, says Harvard's Steven Pinker.
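Since the data set is downloadable and users can build their own search tools, the year-by-year phrase count described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `(year, text)` corpus layout, the function names, and the naive whitespace tokenization are hypothetical and do not reflect the actual data format or the researchers' method.

```python
from collections import Counter

MAX_WORDS = 5  # the summary says queries may be up to five words long


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams that match phrase.

    corpus is an iterable of (year, text) pairs: a hypothetical stand-in
    for the digitized books, not the real data format.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    if not 1 <= n <= MAX_WORDS:
        raise ValueError(f"phrase must contain 1 to {MAX_WORDS} words")

    hits = Counter()    # matches of the phrase per year
    totals = Counter()  # all n-grams of the same length per year
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)  # naive whitespace tokenizer
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)

    # Normalize so years with more published text are comparable.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Toy corpus invented for this example.
toy_corpus = [
    (1900, "the great war had not yet begun"),
    (1920, "after the great war the world changed"),
    (1920, "the great war was over"),
]
result = yearly_frequency(toy_corpus, "great war")
```

Plotting the returned values against the years would give the kind of usage-over-time graph the summary describes. Dividing by the yearly total matters because far more text exists for recent years than for early ones, so raw counts alone would mostly track publishing volume.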
Friday, December 17, 2010
Blog: In 500 Billion Words, New Window on Culture [Steven Pinker involved]
Labels: linguistics, research