
Puzzles sort out OCR errors

August 19, 2008

INTERNET users who solve distorted-word puzzles to access websites may unknowingly be helping The New York Times digitise old print articles.

Companies such as New York Times Co are harnessing millions of web users around the world to help digitise books and articles that were written before computers existed.

The method, in use for a year, can process 160 books a day with almost perfect accuracy, according to a study by Carnegie Mellon University researchers.

Computers have been able to read old books and archived newspapers with optical character recognition (OCR) software for years.

The new method takes distorted or faded words that the software did not recognise and displays them in website puzzles for humans to solve.

"The problem is that OCR is not perfect," says Luis von Ahn, an assistant professor with Carnegie Mellon's computer science department.

"For really old books, say before 1900, 20 to 30 per cent of words are going to be wrong."

Deciphering the words takes humans about 10 seconds and saves 150,000 hours of manual transcription, according to recaptcha.net, a site that chronicles the researchers' work.

About 4 million words are deciphered each day with more than 99 per cent accuracy, according to the study, published today in the Science Express journal.

"During those 10 seconds your brain is doing something that computers cannot do," von Ahn says.

In the first year, the method helped decipher 440 million words, or about 17,600 books.
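The figures quoted in the article can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the first-year totals imply a single average book length, which the article does not state directly.

```python
# Figures quoted in the article.
WORDS_FIRST_YEAR = 440_000_000   # words deciphered in the first year
BOOKS_FIRST_YEAR = 17_600        # equivalent number of books
WORDS_PER_DAY = 4_000_000        # words deciphered each day


def words_per_book(total_words: int, total_books: int) -> float:
    """Average book length implied by the first-year totals (an assumption)."""
    return total_words / total_books


def books_per_day(daily_words: int, avg_book_words: float) -> float:
    """Books processed per day at the implied average book length."""
    return daily_words / avg_book_words


avg = words_per_book(WORDS_FIRST_YEAR, BOOKS_FIRST_YEAR)
print(avg)                                 # 25000.0
print(books_per_day(WORDS_PER_DAY, avg))   # 160.0
```

On that assumption, the implied average of 25,000 words a book is consistent with the 160 books a day cited in the study.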

Bloomberg

