Tuesday, August 24, 2010

Blog: Sizing Samples

MIT News (08/24/10) Hardesty, Larry

Many scientific fields use computers to infer patterns in data. Massachusetts Institute of Technology researchers led by graduate student Vincent Tan have taken an initial step toward determining how much data is enough to support reliable pattern inference, by treating data sets as graphs. In the researchers' work, the nodes of the graph represent data and the edges stand for correlations between them. From this point of view, a computer tasked with pattern recognition is given a set of nodes and asked to infer the weights of the edges between them. The researchers have shown that graphs shaped like chains and stars are, respectively, the best- and worst-case scenarios for computers charged with pattern recognition. Tan says that for tree-structured graphs with shapes other than stars or chains, the "strength of the connectivity between the variables matters." Carnegie Mellon University professor John Lafferty notes that a tree-structured approximation of data is more computationally efficient.
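The summary does not say which estimation procedure the researchers analyzed. As a rough illustration of what "inferring edge weights from nodes" can look like for tree-structured graphs, here is a minimal sketch that assumes discrete data and uses the classic Chow-Liu approach: score every pair of variables by empirical mutual information, then keep a maximum-weight spanning tree. The function names (`mutual_information`, `chow_liu_tree`, `sample_chain`) and the synthetic chain data are hypothetical, introduced only for this example.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree over the variables.

    Edge weights are empirical mutual informations; the tree is built
    with Kruskal's algorithm and a small union-find structure.
    """
    d = len(samples[0])
    columns = [[row[i] for row in samples] for i in range(d)]
    weighted = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,  # strongest pairwise dependence first
    )
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return sorted(edges)


def sample_chain(n, d=4, flip=0.1, seed=0):
    """Hypothetical test data: a binary chain X0 - X1 - ... - X(d-1).

    Each variable copies its predecessor and flips with probability `flip`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = [rng.randint(0, 1)]
        for _ in range(d - 1):
            row.append(row[-1] ^ (rng.random() < flip))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # With enough samples, the recovered edges should follow the chain.
    print(chow_liu_tree(sample_chain(5000)))
```

Rerunning the sketch with a much smaller `n` gives a feel for the question the article raises: with too few samples, the estimated mutual informations are noisy and the recovered tree can differ from the true one.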
