Exactly how do workplace internet censors operate?

Asked by ibstubro (18804) June 20th, 2014

I’ve only worked one place that had computers, and they had a closed system with strictly limited access to certain content.

If a company allows an employee access to the internet, yet employs a censor or filter, how does that work?

Say you frequented a Q&A site. Is it realistic that the censor or filter would apply only to the questions, but not to the question details or answers? So you could ask a question about, say, “Yayas” then say “tits” in the details and discussion? That even social questions could contain the “7 words you can never say on television”?

Seriously, I’m curious about how this works.

17 Answers

I don’t know all of the specifics of computer filters. From what I know, the filters look for key words and phrases on every page you look at. Depending on how the filters are set up, having words get caught in the filter can lead to several things happening. Some send a message back to the employer that the employee accessed the content. Others will block the offending site completely from being allowed to be viewed. There may be other things that happen as well.

Funkdaddy provided a pretty good explanation here in case you haven’t seen it yet.

Seaofclouds (23108) “Great Answer” (1)

Internet filters look at every word on the page. If there are NSFW words in a description/answer, then that thread will be blocked. The problem is that if there are NSFW words in the title, then an entire section of the website will be blocked.

For example, if there is an NSFW word in a social question title and it gets blocked, then the entire http://www.fluther.com/browse/social/ section of the website will be blocked because question titles are visible there.

In order to avoid this, there is the requirement that question titles themselves be safe-for-work and have the [NSFW] tag if needed. This way, users are able to avoid those threads and still access the website.
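To make the listing-page problem concrete, here is a minimal Python sketch of a plain keyword filter. It is only an illustration, not how any particular product works: the blocklist, the function name `page_is_blocked`, and the sample pages are all made up for the example.

```python
import re

# Hypothetical blocklist; real products ship much larger, curated lists.
BLOCKED_WORDS = {"badword", "worseword"}


def page_is_blocked(page_text, blocked_words=BLOCKED_WORDS):
    """Return True if any blocked word appears as a whole word on the page."""
    words_on_page = set(re.findall(r"[a-z']+", page_text.lower()))
    return not words_on_page.isdisjoint(blocked_words)


# A listing page shows every question title, so one bad title taints the list.
listing_page = "How do filters work?\nWhy is my badword cat so loud?"
# A thread page only shows its own title, details, and answers.
clean_thread = "How do filters work?\nThey scan the words on each page."
flagged_thread = "Why is my badword cat so loud?\nNo idea."

for name, page in [("listing", listing_page),
                   ("clean thread", clean_thread),
                   ("flagged thread", flagged_thread)]:
    print(name, "blocked" if page_is_blocked(page) else "allowed")
```

The filter only sees the text of the page being requested, so the clean thread stays reachable while the listing page that shows the offending title is blocked along with the flagged thread.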

PhiNotPi (12681) “Great Answer” (6)

So… once again… what is the point of censoring question titles, if the comments on any question can be (and often are) profane? Any question can contain NSFW content, regardless of the type of question.

dappled_leaves (15893) “Great Answer” (1)

Should we remove this question? I was having a bad day.
http://www.fluther.com/169079/what-do-you-do-when-youre-so-fucking-pissed-you-just/

Adirondackwannabe (36713) “Great Answer” (1)

@dappled_leaves The founders specifically wanted titles to be safe for work. It was their decision to put that in the guidelines in order to keep the main pages safe for work. So the easiest answer to why is that it is what Ben and Andrew want for the site.

Seaofclouds (23108) “Great Answer” (1)

Thanks, I had missed Funkdaddy’s explanation, @Seaofclouds, and it was a good one. Perhaps he’ll re-post here.

I get that, @PhiNotPi. Great breakdown. The censor only blocks what it can see: if there is a list of questions with objectionable keywords, the list will be blocked; if a thread within a specific question contains a keyword, only the question will be blocked from the company computers.

I’m trying to keep this from being kicked to Meta, because it’s about how company censorship works in general, not specifically about this site.

I think I finally got my mind around it, @dappled_leaves, and it seems to make sense.

Ummm…probably, @Adirondackwannabe, since you asked.

ibstubro (18804) “Great Answer” (2)

I asked because I don’t want Fluther blocked in a bunch of places.
Okay, it’s flagged.

Adirondackwannabe (36713) “Great Answer” (1)

It’s gone. Thanks mods. You do a great job with so little reward. This jelly really appreciates your efforts.

Adirondackwannabe (36713) “Great Answer” (4)

The relevant wiki article is content-control software. Filtering software uses one of many document classification methods to determine what type of content is on the page. A “simple” form is Bayesian filtering, which is typically used to detect spam.

To create a Bayesian filter, a company takes a broad sample of websites and has humans classify them as safe or not safe. The software then looks at all of the words and finds patterns, such as which words typically appear in which kind of website.

Let’s say that you have a word XXX, and it appears in 5% of all sampled websites. Furthermore, it appears in 40% of NSFW websites sampled. Also, 10% of all sampled websites were NSFW.

Using Bayes’ theorem, we can estimate how likely a site containing XXX is to be NSFW:

P(NSFW | contains XXX) = ( P(contains XXX | NSFW) * P(NSFW) ) / P(contains XXX)

In this case, we end up with (0.4 * 0.1 / 0.05) = 0.8.

This means: Given a site containing the word XXX, there is an 80% chance that it is NSFW.
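For anyone who wants to check the arithmetic, the same calculation can be sketched in a few lines of Python. The function and variable names are made up for the example; the three input numbers are the ones given above.

```python
def posterior_nsfw(p_word_given_nsfw, p_nsfw, p_word):
    """Bayes' theorem: P(NSFW | contains word)."""
    return p_word_given_nsfw * p_nsfw / p_word


p_xxx_given_nsfw = 0.40  # XXX appears in 40% of sampled NSFW sites
p_nsfw = 0.10            # 10% of all sampled sites were NSFW
p_xxx = 0.05             # XXX appears in 5% of all sampled sites

p_nsfw_given_xxx = posterior_nsfw(p_xxx_given_nsfw, p_nsfw, p_xxx)
print(f"P(NSFW | contains XXX) = {p_nsfw_given_xxx:.2f}")
```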

An 80% chance will probably not be enough to block the site, but when it contains many such words, that triggers the filter.
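One common way to combine several words is a naive Bayes calculation, sketched below in Python. This is an assumption about how the evidence might be pooled, not a description of any specific filter: it treats the words as independent of each other, and the second word "YYY" is hypothetical, given the same statistics as XXX purely for illustration.

```python
def p_word_given_sfw(p_word_given_nsfw, p_nsfw, p_word):
    """Law of total probability, solved for P(word | SFW)."""
    return (p_word - p_word_given_nsfw * p_nsfw) / (1 - p_nsfw)


def combined_posterior(p_nsfw, word_likelihoods):
    """Naive Bayes: multiply the prior odds by each word's likelihood ratio.

    word_likelihoods is a list of (P(word | NSFW), P(word | SFW)) pairs
    for the words found on the page, assumed independent of each other.
    """
    odds = p_nsfw / (1 - p_nsfw)
    for given_nsfw, given_sfw in word_likelihoods:
        odds *= given_nsfw / given_sfw
    return odds / (1 + odds)


p_nsfw = 0.10
# XXX uses the numbers from the example; YYY is a made-up second word
# with identical statistics.
xxx = (0.40, p_word_given_sfw(0.40, p_nsfw, 0.05))
yyy = (0.40, p_word_given_sfw(0.40, p_nsfw, 0.05))

one_word = combined_posterior(p_nsfw, [xxx])
two_words = combined_posterior(p_nsfw, [xxx, yyy])
print(f"one word:  {one_word:.3f}")
print(f"two words: {two_words:.3f}")
```

With one word the result matches the 80% figure, and a second equally suspicious word pushes the estimate above 99%, which is why a page with many such words trips the filter even when no single word would.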

PhiNotPi (12681) “Great Answer” (6)

@Seaofclouds do you really believe that the site owners had censoring titles of questions about the bird called a booby or the late great Dick Clark in mind? I don’t care how internet filters work, I seriously doubt it. Honestly, modding for things like that is over the top.

I’m a teacher and the district where I work blocks certain specific websites. I have never seen extra sites blocked because of words that appear. My kids have school laptops from another district, and their filters work the same way – certain sites are always blocked.

@PhiNotPi do actual people look at the content that is leading to a page/site being blocked, or is it all done with the software? Because any actual person with half a brain would realize that a question about Dick Cavett is unlikely to be NSFW. I really can’t help but think that if a jelly worked somewhere with such strict filtering that clicking on a link about booby the bird is going to get the site blocked, they would have the common sense to be careful where they go and what they click on.

FlyingWolf (2830) “Great Answer” (1)

@FlyingWolf An initial, hand-sorted sample is used to create the filter, but then the filtering software is automated.

PhiNotPi (12681) “Great Answer” (3)

@FlyingWolf I’m sure the thought of such a thing never crossed their mind. Just as it hadn’t come up as an issue for us until now. That doesn’t change how internet filtering software works, and that is our concern. As I’ve mentioned, the mod team is actively discussing the situation and will let everyone know when we come up with a plan for how to handle such situations in the future and clarify what will and won’t be acceptable in question titles.

Seaofclouds (23108) “Great Answer” (1)

@PhiNotPi Thank you – that actually answers my question.

“An 80% chance will probably not be enough to block the site, but when it contains many such words, that triggers the filter.”

Right. And since there are very few questions on Fluther that have NSFW words in the titles, there should be more than enough room for the very occasional questions which use words like “tits” and “boobies” in their non-NSFW senses. Fluther is not going to be blocked by any filter because of that question. Nor by occasional questions about guys named Dick.

However you slice it, the mods have gone overboard on this one. And yet again, instead of simply admitting that, they are doubling down and tightening the rules. It gets harder and harder to breathe here.

dappled_leaves (15893) “Great Answer” (2)

@Seaofclouds I have to think it probably did cross their minds. Words that can mean two totally different things are super common, and not just limited to the ones we have discussed here. I really hope the mod team comes to the conclusion that words such as these are not going to break any internet filters.

FlyingWolf (2830) “Great Answer” (1)

I think the original developers intended to block any word whose primary dictionary definition was labeled “vulgar”.

There is no way Kardamom was the first person to ask a question about ‘tits and boobies’, and it can’t be an accident that a search for questions containing the words “tits” or “boobies” yields no results (and “Dick” has only been allowed as a male name). It has, obviously, come up before, and the policy seems to have been pretty consistent and straightforward to me.

I don’t believe claiming that issues ‘have never come up in the history of Fluther’ is very useful. Fluther used to be much more creative and vibrant than it is now. Either there is a valid reason that can be explained, or there has been a recent change in the rules. In this particular case, I’m just trying to help explain that the ‘tits and boobies’ exclusion has been pretty consistent, and as far as I know, you can still ask a question about Dick Van Patten.

ibstubro (18804) “Great Answer” (3)

@ibstubro Your response has caused this thread to be blocked by my school’s filter. (I often use a school-owned iPad).

PhiNotPi (12681) “Great Answer” (2)

Sorry, @PhiNotPi. No doubt it was the pairing of the T & B words in quotes. All I can say is better the thread than the section.

ibstubro (18804) “Great Answer” (0)