General Question

What is the spammer's strategy behind listing dozens of random "benign" key words?

Asked by gorillapaws (27698points) February 5th, 2012

I just received a spam e-mail with a link to something spammy, followed by several paragraphs of “benign” keywords.

Is the strategy here to teach my junk mail filter to associate these “acceptable” words with spam, so that legit messages start getting flagged as junk and I'm encouraged to use less restrictive mail filtering in the future? Or is it trying to “hit” a possible “whitelisted” word and thereby slip through the junk mail filter? Is it something else?

Here’s an example of a couple paragraphs:
“tractor tortellini bath saw magic house art weasel curve adapter tank stop sign newsprint priest knickers freckle dock pin denim tachometer kite gate blade smash increase mitten restaurant face temper disease community fibre boy hamburger raft copy delete lycra africa banana workshop furniture sea goat doctor”

“sarah kangaroo skirt dashboard kenneth brace lunch windscreen priest payment apparatus pyramid polish beech approval nest case cocktail backbone locust weasel brazil age fired owner sudan block coke celery ketchup tree crawdad gander education umbrella cd law spot bangladesh handsaw”

6 Answers

XOIIO's avatar

Because if people search for something random, that result will show up. It’s the exact same strategy as tags on YouTube: totally unrelated stuff.

Tropical_Willie's avatar

Some of the “Spam” filter logic is looking for paragraphs of words.

No words = spam, or words = OK e-mail, which, as you can see, is not true.

gorillapaws's avatar

@Tropical_Willie that makes sense, but if that were the case, why go through the trouble of randomly generating dictionary words when they could cut-paste in text from a legit source, like a news article?

Tropical_Willie's avatar

I said “some”; not all filters use the words-and-paragraph logic. It may be a proprietary thing… Some filters have a “L O N G” list of suspect addresses and/or mail servers.

gambitking's avatar

There could be several reasons. What you’re talking about is commonly called “keyword stuffing”. It happens on blogs, websites, emails, pretty much anywhere there’s digital marketing.

It’s typically found on web pages, and on blogs these lists are commonly known as ‘tag clouds’ or just keyword tags. The primary purpose of those keywords on the web is SEO (search engine optimization). The spammers who do this think that they are telling search engines (Google) that their site should be ranked at the top of search results for those keywords because the keywords appear on their site so much. They would be wrong; Google is way smarter than that. Instead, they’re more likely to get their site penalized, or even de-indexed from Google.

So regarding the stuffing in an email message, that’s kind of odd, and I haven’t seen that too often. If it’s not for the purpose of ranking some content or a hyperlink well in the search engines, there could be something else going on here.

One possibility could indeed be a method to bypass spam filters, which is pretty likely.

Another could be that the spammer uses an email marketing tool by which he can use keyword tags to choose who gets what emails.

One more possibility is that they’re just trying to confuse the hell out of you. Which it appears they have accomplished!

blueiiznh's avatar

It is typically called Word Salad.

More technically called Bayesian poisoning. It is a technique used by e-mail spammers to attempt to degrade the effectiveness of spam filters that rely on Bayesian spam filtering. Bayesian filtering relies on Bayesian probability to determine whether an incoming mail is spam or is not spam. The spammer hopes that the addition of random (or even carefully selected) words that are unlikely to appear in a spam message will cause the spam filter to believe the message to be legitimate.
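To make the idea concrete, here is a minimal sketch of how a naive Bayes style score could be pulled down by word salad. The per-word probabilities, the `spam_score` function, and the sample messages are all made up for illustration; real filters learn their own tables and combine evidence in more careful ways.

```python
import math

# Hypothetical P(spam | word) values; not taken from any real filter.
WORD_SPAM_PROB = {
    "viagra": 0.99, "free": 0.90, "click": 0.85,
    "tractor": 0.10, "tortellini": 0.05, "kangaroo": 0.08, "umbrella": 0.10,
}
UNKNOWN = 0.5  # assumption: unseen words count as neutral evidence


def spam_score(message: str) -> float:
    """Combine per-word probabilities naive-Bayes style, in log-odds space."""
    log_odds = 0.0
    for word in message.lower().split():
        p = WORD_SPAM_PROB.get(word, UNKNOWN)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


plain = "free viagra click"
salad = plain + " tractor tortellini kangaroo umbrella"

print(f"plain spam:      {spam_score(plain):.4f}")
print(f"with word salad: {spam_score(salad):.4f}")
```

With these invented numbers, the three spammy words alone score close to 1, while the same message padded with four "innocent" words drops below 0.5, which is the effect the spammer is hoping for.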

Spammers also hope to raise the filter's false positive rate by turning previously innocent words into spammy words in the Bayesian database: a user who trains their spam filter on a poisoned message is telling the filter that the words added by the spammer are a good indication of spam.
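The training side of the poisoning can be sketched the same way. This is a toy model under stated assumptions: the `train` and `word_spam_prob` helpers, the add-one smoothing, the equal weighting of spam and legit mail, and the tiny mailboxes are all hypothetical, not how any particular filter is implemented.

```python
from collections import Counter


def train(messages):
    """Count how many messages each word appears in (once per message)."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.lower().split()))
    return counts, len(messages)


def word_spam_prob(word, spam_counts, n_spam, ham_counts, n_ham):
    """Estimate P(spam | word) with add-one smoothing and equal priors."""
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_given_spam / (p_word_given_spam + p_word_given_ham)


ham = ["lunch at the restaurant tomorrow",
       "the doctor sent the payment form",
       "lunch with sarah"]
spam = ["free pills click here", "click for free money"]

ham_counts, n_ham = train(ham)
spam_counts, n_spam = train(spam)
before = word_spam_prob("lunch", spam_counts, n_spam, ham_counts, n_ham)

# The user marks one word-salad message as spam, so its filler words
# are now counted as spam evidence too.
poisoned = spam + ["free pills click lunch restaurant doctor payment"]
spam_counts, n_spam = train(poisoned)
after = word_spam_prob("lunch", spam_counts, n_spam, ham_counts, n_ham)

print(f"P(spam | 'lunch') before: {before:.3f}, after: {after:.3f}")
```

In this toy setup a single poisoned message is enough to nudge an everyday word like "lunch" toward the spam side, so a later legit message that uses it starts out with a slightly worse score.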

Antivirus and anti-spam heuristic engines are refined continually, and this technique is normally caught by most finely tuned corporate configurations. ISPs, however, have a tougher time, because filtering that aggressively would flag too much legitimate mail and cause too many false positives.