NIST taking on big data problem

Friday - 2/3/2012, 2:37pm EST

Ellen Voorhees, project manager, Text Retrieval Conference, NIST

By Sean McCalley
Federal News Radio

The National Institute of Standards and Technology is leading an effort to make better sense of the flood of data that many organizations in the public and private sectors face.

NIST is hosting a conference with other agencies and members of academia and the computer industry to find patterns among the billions of documents created by federal agencies.

At the annual Text Retrieval Conference (TREC), public and private-sector organizations are hoping to use the patterns to strengthen digital infrastructures and develop new research strategies.

TREC encourages friendly competition in creating algorithms to organize and interpret massive amounts of data, including everything from emails and tweets to medical records.

Ellen Voorhees is the project manager of TREC at the National Institute of Standards and Technology. She joined The Federal Drive with Tom Temin to discuss the conference's goals and methods.

"TREC has been on the leading edge of being able to handle these large amounts of unstructured data," Voorhees said. "We keep these different focus areas and try to keep ourselves at the front."

The effort is not just for the government; commercial companies are finding benefits as well. Industry participants include Microsoft and IBM. In fact, Big Blue drew the original research for its famous Watson computer from TREC.

NIST will hold the conference in November, but organizations that want to take part have until the end of February to submit their applications.
