Send to a Friend

richardhenry's avatar

How can I create an ordered list of the most common substrings inside of a MySQL varchar column?

Asked by richardhenry (12692points) October 29th, 2008

I have a MySQL database table with a couple thousand rows. The table is setup like so:

id | text

The id column is an auto-incrementing integer, and the text column is a 200-character varchar.

Say I have the following rows:

3 | I think I’ll have duck tonight

4 | Maybe the chicken will be alright

5 | I have a pet duck now, awesome!

6 | I love duck

Then the list I’m wanting to generate might be something like:
– 3 occurrences of ‘duck’
– 3 occurrences of ‘I’
– 2 occurrences of ‘have’
– 1 occurrences of ‘chicken’
– .etc .etc

Plus, I’ll probably want to maintain a list of substrings to ignore from the list, like ‘I’, ‘will’ and ‘have. It’s important to note that I do not know what people will post.

I do not have a list of words that I want to monitor, I just want to find the most common substrings. I’ll then filter out any erroneous substrings that are not interesting from the list manually by editing the query.

Can anyone suggest the best way to do this? Thanks everyone!

Using Fluther

or

Using Email

Separate multiple emails with commas.
We’ll only use these emails for this message.