Is there any statistical research on people's use (count) of each character in the alphabet?

Asked by hced (17 points) December 3rd, 2010

I plan to program a creative word processor that I intend to deploy as a web application.

The central feature of this word processor will be based on statistical research (if I can find any) on people’s use of each letter of the alphabet (A–Z). That is, statistics on each character’s general frequency, or percentage of use (“A: 6%”, “B: 3.3%”, “C: 1.8%” and so on), in ‘common’ language.

It doesn’t matter which group of people the research focused on; I just need some numbers to work with right now.

I’m initially interested in Latin-based languages (English in particular), but other stats (Latin or not) would be great too.

So, is anyone aware of such linguistic research?

I’m very thankful for any help!

Have you not tried googling for this?

@flutherother tsk ;-)

Also, you ruined my tada!

flutherother: Oh lord. I almost feel a bit ashamed for not finding this. I actually had a Google session yesterday, but I must have been either too absent-minded or searching for the wrong things. English isn’t my mother tongue, so sometimes it’s hard to get it right.

Good luck with your program, sounds like it will be very painful. (mentally I mean)

I know R S T L N E and C D M A are used the most…

That’s some Wheel of Fortune hotness.

As a cryptogram solver, I know E and T are most frequent.
I also know Spiro Agnew used P and N far more than normal people.

The alphabet in order of its frequency of use in written English: E, T, A, O, I, N, S, H, R, D, L, U, C, M, F, G, Y, P, W, B, V, K, X, J, Q, Z

Taken from a Fact Book I have. Sorry, no percentages.
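If you want percentages rather than just an ordering, one option is to count letters in whatever sample text you have on hand. Here is a minimal Python sketch of that idea. The function name `letter_percentages` and the sample sentence are made up for illustration, and it only counts the 26 unaccented letters A to Z, so accented characters are ignored.

```python
from collections import Counter
import string

def letter_percentages(text):
    """Return {letter: percent of all A-Z letters in text}, most frequent first."""
    # Keep only the 26 ASCII letters and fold them to upper case.
    letters = [ch.upper() for ch in text if ch in string.ascii_letters]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # most_common() yields (letter, count) pairs from most to least frequent.
    return {letter: 100.0 * n / total for letter, n in counts.most_common()}

# Hypothetical sample; a real estimate needs a much larger body of text.
sample = "The quick brown fox jumps over the lazy dog"
for letter, pct in letter_percentages(sample).items():
    print(f"{letter}: {pct:.1f}%")
```

The numbers you get depend heavily on the sample, so a short text like the one above will not match published tables.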

@hced No problem. Sometimes it’s just a question of guessing the right search terms.

Does anyone else find it weird that P and B are used less frequently than F, W or G?

Note that the frequencies vary in different languages. You can also find lists of frequencies for two-letter and three-letter combinations (and perhaps more) in various languages. It is interesting to observe how quickly a language’s defining characteristics show up in those lists.
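For anyone who wants to build such combination lists from their own text, here is a rough Python sketch. The name `ngram_counts` and the sample sentence are hypothetical, and I am assuming that combinations should only be counted inside words, not across spaces or punctuation.

```python
from collections import Counter
import re

def ngram_counts(text, n):
    """Count n-letter sequences that occur inside words (never across gaps)."""
    counts = Counter()
    # Split into runs of ASCII letters so spaces and punctuation break sequences.
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.lower()
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

# Hypothetical sample text; swap in text from the language you care about.
sample = "the weather there is rather nice"
print(ngram_counts(sample, 2).most_common(3))  # two-letter combinations
print(ngram_counts(sample, 3).most_common(3))  # three-letter combinations
```

Running the same function over texts in different languages is one way to see the differences described above.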

@Provlear, not when you consider the most common words and forms—for instance, “if,” “of,” “for,” and “from,” the “wh” words, verb forms with “will” and “would,” and words that end in “ing.”

Knowing the letter frequency has always been important in solving cryptograms. Tricky cryptograms depart from the usual frequency in some significant way.
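As a rough illustration of how frequency helps with a simple substitution cryptogram, here is a Python sketch that pairs the most common cipher letters with the most common English letters. The function names are made up, the ordering is the one quoted earlier in this thread, and the result is only a first guess that a solver would then correct by hand.

```python
from collections import Counter
import string

# Frequency order quoted in this thread, most common letter first.
ENGLISH_ORDER = "ETAOINSHRDLUCMFGYPWBVKXJQZ"

def guess_substitution(ciphertext):
    """Map each cipher letter to an English letter of the same frequency rank."""
    counts = Counter(ch.upper() for ch in ciphertext if ch in string.ascii_letters)
    ranked = [letter for letter, _ in counts.most_common()]
    # zip stops at the shorter sequence, so unused English letters are dropped.
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_guess(ciphertext, mapping):
    """Replace mapped letters (output is upper case); leave everything else as-is."""
    return "".join(mapping.get(ch.upper(), ch) for ch in ciphertext)
```

On a short or deliberately tricky cryptogram this guess will be mostly wrong, which is exactly why puzzles that depart from the usual frequencies are harder.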

