August 19, 2008
INTERNET users who solve distorted-word puzzles to access websites may unknowingly be helping The New York Times digitise old print articles.
Companies such as New York Times Co are harnessing millions of web users around the world to help digitise books and articles that were written before computers existed.
The method, in use for a year, can process 160 books a day with almost perfect accuracy, according to a study by Carnegie Mellon University researchers.
Computers have been able to read old books and archived newspapers with optical character recognition software for years.
The new method takes distorted or faded words that the software did not recognise and displays them in website puzzles for humans to solve.
"The problem is that OCR is not perfect," says Luis von Ahn, an assistant professor with Carnegie Mellon's computer science department.
"For really old books, say before 1900, 20 to 30 per cent of words are going to be wrong."
Deciphering the words takes humans about 10 seconds and saves 150,000 hours of manual transcription, according to recaptcha.net, a site that chronicles the researchers' work.
About 4 million words are deciphered each day with more than 99 per cent accuracy, according to the study, published today in the Science Express journal.
"During those 10 seconds your brain is doing something that computers cannot do," von Ahn says.
In the first year, the method helped decipher 440 million words, or about 17,600 books.
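The figures quoted above can be checked against one another. The short sketch below uses only numbers from the article; the variable names are illustrative, and the words-per-book value is derived from the article's totals, not stated by the researchers.

```python
# Figures quoted in the article.
words_first_year = 440_000_000   # words deciphered in the first year
books_first_year = 17_600        # equivalent number of books
books_per_day = 160              # books the method can process per day

# Implied average book length (derived here, not stated in the article).
words_per_book = words_first_year // books_first_year

# Daily word count implied by 160 books a day at that average length.
words_per_day = books_per_day * words_per_book

print(words_per_book)  # 25000
print(words_per_day)   # 4000000
```

At an implied 25,000 words per book, 160 books a day works out to 4 million words a day, which matches the daily figure attributed to the study.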
Bloomberg