General Question

kranthili2020's avatar

How is spell checking done in search engines?

Asked by kranthili2020 (10points) August 21st, 2009

Hi all,
I was wondering how the search engines provide “Did you know”/“Spelling Correction” to the queries we type at such a fast rate and such a good enough accuracy. They not only correct spelling at word level but also do the same based on Context. EG : “java fle” is given “Did you mean : java file” but when query is just “fle” you get results for this query. Does any one know the basic algorithms that they might be using ?

Observing members: 0 Composing members: 0

10 Answers

richardhenry's avatar

Rather than using a normal English dictionary to provide suggestions, a heuristic dictionary is used. The dictionary learns from new input rather than a predefined list.

The basis of the concept is simple enough; whenever someone types in a query, a record is saved. If anyone else types in the same query later, the weight of the record for that query increases. The heaviest (most popular) records (search queries) are the ones that are the most likely to be suggested.

Above that is spelling correction; the same type of thing used by applications like Microsoft Word. A spell checker is essentially an algorithm for comparing a word against a known list of correctly spelled words (ie., the dictionary).

Rather than comparing the phrase to the English dictionary, we can compare it to our custom dictionary of all the popular things people are search for and see if we have a probable match.

YARNLADY's avatar

@richardhenry Why does it insist that judgement is wrong?

markyy's avatar

Like @richardhenry says, you need to have something to compare to.

I always imagined they used algorithms and functions like the Levenshtein distance, phonetics and so on. But what do I know, It’s probably infinitely more complex. And atleast the same amount of data that is cached on google’s servers

markyy's avatar

O and of course, there is no clear cut answer to this. If it was a single technique, we would have dozens of google clones.

kranthili2020's avatar

For single word spellling corrections you could always use K-Grams in combination with levenshtein distance or jaro-winkler distance. But using these simple techinques will not solve the problem of Context Based Spell Checking. Suppose you don’t have query logs and have just a crawled corpus… Is there is decent solution which can give correct answers with 70 to 80% accuracy?

richardhenry's avatar

Don’t forget that we also have people clicking on results to use as a metric.

If you clicked a result, you very probably did mean to use that particular search term.

If I type “tomato soop”, don’t click any results, and search for “tomato soup” shortly after…

There’s so much that’s measurable here, it’s crazy.

rebbel's avatar

@richardhenry So, if we all go searching for Baanaanaas and don’t except Google’s suggestion to click Bananas (but click our own Baanaanaas), that would mean that, in time, people who look for Bananas will get the suggestion to look for Baanaanaas?

Hehheh, click our own baanaanaas…

markyy's avatar

@rebbel I think it only works the other way around. It’s the times you do click on the suggestion that are valuable to count. Why am I telling you this, you’re probably too busy clicking your baanaanaa.

@richardhenry Do search engines really compare older queries to the new one? I get dizzy just imaging it, or maybe they just check the new one versus the last one. I would never get any work done while working for a searchengine. I always feel like I get lost in the possibilities.

augustlan's avatar

@YARNLADY Because the correct spelling is ‘judgment’. I always spell it wrong!

richardhenry's avatar

@rebbel Yes. To make even more of an impact, you would search for “bananas”, don’t click any results, and then search for “baanaanaas”.

I’m imagining the same thing would happen with a search for “cursebird”, where it eventually stopped suggesting “curse bird” because enough people were searching for the concatenated version.

However, because there’s still people looking for “curse bird” too, it understands that the terms are exclusive and doesn’t suggest either to eachother.

Cursebird is something I made.

Answer this question

Login

or

Join

to answer.

This question is in the General Section. Responses must be helpful and on-topic.

Your answer will be saved while you login or join.

Have a question? Ask Fluther!

What do you know more about?
or
Knowledge Networking @ Fluther