# What is the probability of a random string of numbers having a consistent pattern?

I saw a video by Michael Shermer today, where he said people who believed in ESP etc. were more likely to see patterns in selected pictures than the average person, whether or not the picture held a pattern.

This got me thinking, what is the probability of a random data set having a pattern? Of course with pictures it would be astronomically high without a lot of approximation, but what about a simple string of numbers? Is there a rule that can describe this?

Here is a random string I have generated as an example:
8177322396

Within this, if I had only produced 3223, it could be an inflection, or a repeated pattern 2233… If I only had 396, it could be 3,9,6,12,9,15… If I only had 732, it could be the end of the pattern 32,16,7,3,2. As a whole though, the string has no pattern that I can see. The probability of a pattern therefore decreases with string length, but is there some sort of relationship that we can describe empirically?


I think if it’s truly random then it is stochastic, but true randomness may not even be possible, so who knows?

dpworkin (27035)

@dpworkin The point is the ability to read patterns that aren’t really there into a stochastic data set. For example, there is no pattern that can be used to predict the decimal places of e, but most people recognise the “18281828”, and wonder how long the pattern lasts. Later on someone might recognise the pattern “996996”, but of course the pattern is not intended.

Isn’t it the nature of randomness to be random? That would mean that there are times when you have distinct patterns, and other times not. I heard that when people try to make up a random line of numbers, it is usually obvious because it has so few patterns, whereas patterns do occur in genuinely random sequences in nature.

skfinkel (13478)

@skfinkel “That would mean that there are times when you have distinct patterns, and other times not.”
Of course, but in what proportions?

I thought that when we study math in school, it’s broken into discrete and statistical mathematics. When we observe random data, that’s when statistical maths is used to deduce any pattern in that randomness. I know some people have the ability to understand patterns more easily than others. There is a movie about John Nash, a great mathematician who deciphers code for the military in the war; it’s an interesting movie to watch.

mea05key (1802)

I understood the question, I just didn’t know how to predict the answer.

dpworkin (27035)

In a truly random sequence of digits, any given finite pattern is not only possible but inevitable. In other words, as the length of the sequence approaches infinity, the probability of any specified finite sequence approaches 1, which is certainty.
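As a rough check on that claim, here is a minimal Python sketch that computes the exact probability that a chosen digit pattern shows up at least once in n random digits. The function name `prob_contains` and the assumption that every digit is independent and uniformly distributed are mine, not something stated in the thread.

```python
def prob_contains(pattern: str, n: int, alphabet: str = "0123456789") -> float:
    """Probability that `pattern` occurs somewhere in n independent uniform symbols."""
    k = len(pattern)

    def step(matched: int, symbol: str) -> int:
        # Longest prefix of `pattern` that is a suffix of what we have seen so far.
        text = pattern[:matched] + symbol
        while text and not pattern.startswith(text):
            text = text[1:]
        return len(text)

    # transitions[s] lists the next state for each possible symbol from state s.
    transitions = [[step(s, c) for c in alphabet] for s in range(k)]
    p = 1.0 / len(alphabet)

    # dist[s] = probability of currently having matched s characters of the pattern.
    dist = [0.0] * (k + 1)
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (k + 1)
        new[k] = dist[k]  # once the pattern has appeared, it stays "found"
        for s in range(k):
            if dist[s]:
                for t in transitions[s]:
                    new[t] += dist[s] * p
        dist = new
    return dist[k]


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, prob_contains("999", n))
```

Running it for growing n shows the probability climbing towards 1, which is the behaviour described above for any fixed finite pattern.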

In your example of “8177322396”, which is ten digits long, its frequency of occurrence among all 10-digit strings should be, on average, one in 10^10 (once per ten billion). So if you looked at, say, 10^15 such 10-digit strings you should find yours many times over (about 10^5 times on average).

The same is true for any finite sequence no matter how large, say a trillion 3’s in a row. The “law of large numbers” can seem counter-intuitive.

By the way, the digits of irrational numbers such as pi, the square root of 2, etc., appear to exhibit the same statistical properties as random sequences, though this has not been proven. If so, 8177322396 ought to occur sooner or later, as would a trillion 3’s in a row, though we might not be able to actually calculate where in any reasonable amount of time such as the lifetime of the universe.

gasman (11296)

Part of the problem is determining what is meant by pattern. Suppose that I generated a random digit sequence using nuclear decay, and I modified it so that every time there were 7 1’s in a row, the seventh 1 is replaced by randomly choosing one of the other digits. This sequence is no longer random, and there are computer programs that test for randomness that would recognize that it is not random. But would you say that it has a pattern?

Clearly the 9th digit is nine… which happens in 10% of random digit sequences with 9 digits in them.

Patterns can be invented to fit a random sample, once the sample is available. The chance that that sample could have been guessed and would show up depends on the pattern definition and the random number generation algorithm. Sometimes the people generating the randomness unintentionally build in some patterns, too. And sometimes, subconscious figuring can look like something else. For example, horses that seem to be able to do math are really picking up on the way the asker asks the question, and learn how many times to stomp their hoof to get a biscuit, based on the asker’s expectation and clues the asker didn’t realize they were giving.

Zaku (22533)

It depends on how long the string is. If it is infinitely long, the chance is zero.

If you want relatively accurate proportions, your “patterns” need to be formally defined within the possibility domain, and thus you can get to ratios. Truly random assignment will result in a value close to those ratios. Testing to see if someone is psychic, you can compare their test results to the ratio metrics to see if they do significantly better than random.
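To make the idea of formally defined patterns concrete, here is a small Python sketch that enumerates every 5-digit string and counts how many satisfy a given rule. The two rules (`has_doubled_digit`, `is_palindrome`) and the length of 5 are hypothetical choices of mine, picked only because they are easy to check by hand; they are not definitions anyone in the thread proposed.

```python
from itertools import product


def has_doubled_digit(s: str) -> bool:
    """True if two adjacent digits are equal, e.g. the '77' in 8177322396."""
    return any(a == b for a, b in zip(s, s[1:]))


def is_palindrome(s: str) -> bool:
    """True if the string reads the same in both directions, e.g. 3223."""
    return s == s[::-1]


def pattern_ratio(predicate, length: int, alphabet: str = "0123456789") -> float:
    """Fraction of all strings of the given length that satisfy `predicate`."""
    total = 0
    hits = 0
    for digits in product(alphabet, repeat=length):
        total += 1
        hits += predicate("".join(digits))
    return hits / total


if __name__ == "__main__":
    print("doubled digit:", pattern_ratio(has_doubled_digit, 5))
    print("palindrome:   ", pattern_ratio(is_palindrome, 5))
```

For these two rules the counts can be confirmed by hand: a string with no doubled digit has 10 choices for the first digit and 9 for each of the other four, so the doubled-digit ratio is 1 - 0.9^4 = 0.3439, and a 5-digit palindrome is fixed by its first three digits, giving 1000 out of 100000 = 0.01. Brute force stops being practical well before 10 digits, so longer strings would need a formula or sampling instead.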

ipso (4476)

@ipso “Testing to see if someone is psychic, you can compare their test results to the ratio metrics to see if they do significantly better than random.” Yes, this is one of the basic tests that paranormal claims must pass to demonstrate scientific validity. Generally, statistical significance (a low p-value) is lacking in tests of psychic ability. There’s a vast skeptical literature on the long and sad history of junk science, fake science, and hoaxes on which evidence for psychic abilities rests.
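For anyone wondering what such a test looks like in practice, here is a minimal Python sketch of a one-sided binomial test. The setup is hypothetical and mine: a subject guesses random digits, so the chance of a hit per trial is assumed to be 0.1, and the trial and hit counts are made-up illustration values rather than data from any real experiment.

```python
from math import comb


def binomial_tail(hits: int, trials: int, p_chance: float) -> float:
    """P(X >= hits) when X ~ Binomial(trials, p_chance): a one-sided p-value."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


if __name__ == "__main__":
    trials = 100      # hypothetical number of guesses
    p_chance = 0.1    # assumed hit rate from pure guessing of a digit 0-9
    for hits in (10, 15, 20, 25):  # made-up scores, for illustration only
        print(hits, binomial_tail(hits, trials, p_chance))
```

A score near the chance expectation (10 hits out of 100 here) gives a large p-value, while the p-value shrinks quickly as the score moves further above chance. The threshold for "significant" has to be fixed before the test is run.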

@Zaku: Read about Clever Hans: “Though the experiment strongly indicated that the horse probably had no real grasp of math, it did uncover an extraordinary insight…There is evidence to indicate that horses may possess an enhanced sensitivity to inconspicuous body language, perhaps as a key part of their social interactions with other horses” (the point being that the horse trainer was subtly cuing the horse to stop tapping, probably without being aware of it himself). Which calls for a Feynman quote: “Science is a way of trying not to fool yourself. The first principle is that you must not fool yourself, and you are the easiest person to fool.” [1964 lecture]

Back to the original question: “What is the probability of a random data set having a pattern?” As noted by others above, pattern is a vague term. A small number of simple rules can generate non-trivial infinite patterns (think computer programs). The results may be pseudo-random or chaotic, but deterministic nonetheless. (True randomness occurs in nature, e.g. in the form of radioactive decay, though the probabilities are precisely measured and explained by theory.)

But then if an underlying pattern exists which is generating the sequence, then it isn’t really “random” after all, is it? At some level, being random and having a pattern are mutually exclusive properties. In fact randomness is even more slippery a concept than pattern, which makes the question a bit ambiguous.

In Carl Sagan’s sci-fi novel Contact the aliens send blocks of data whose length is the product of two prime numbers, suggesting rearrangement as a 2-dimensional raster-scan bit image (which indeed pans out in the story). Many seemingly random sequences can turn out to be highly patterned, organized, and intelligent.

But wait: Pattern recognition is what our Homo sapiens brains have been honed to do over millions of years. As Michael Shermer put it in Why People Believe Weird Things, “We evolved to be skilled, pattern-seeking, cause-finding creatures. Those who were best at finding patterns (e.g., standing upwind of game animals is bad for the hunt, cow manure is good for the crops, etc.) left behind the most offspring. We are their descendants.”

Again there’s a huge amount written about the human tendency to find signals in noise (seeing the Virgin Mary in fried tortillas, for instance). Eerie-seeming coincidences in numerical sequences are no different, and can give the illusion of “pattern” where none actually exists, such as the first few decimal digits of e, mentioned in somebody’s answer above.

Bottom line: You can always find patterns in random, or seemingly random, data. Whether such patterns are actually meaningful is usually in doubt. It’s easy to fool yourself.

gasman (11296)

Isn’t randomness in and of itself a pattern?

@thekoukoureport You’ve uncovered another ambiguity of the word ‘pattern’. Randomness has certain definite mathematical statistical properties. If you label that ‘pattern’ then the answer is definitely yes.

But randomness is inherently non-deterministic—following no rules to predict future behavior. If you label that ‘unpatterned’ then the answer is definitely no.

You can always find some kind of rule or pattern to account—after the fact—for some finite list of numbers. We do it all the time to devise a mnemonic rule for remembering phone numbers, etc. Some people use geometric patterns on a phone keypad. None of this has much bearing on underlying randomness.

gasman (11296)

I’d say the probability is 100% for real numbers like e or pi or sqrt(2). All you need is plenty of computing time to find plenty of patterns. For example

3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609

I found a pattern: 74

It occurs three times.
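A count like that is easy to reproduce, and it is worth comparing with what chance alone predicts. The Python sketch below is only an illustration: it uses just the first 100 decimals of pi (a shorter string than the one quoted above), and the helper names `count_overlapping` and `expected_count` are mine. The expectation formula assumes independent, uniformly distributed digits.

```python
# First 100 decimal digits of pi (after the "3.").
PI_DECIMALS_100 = (
    "1415926535" "8979323846" "2643383279" "5028841971" "6939937510"
    "5820974944" "5923078164" "0628620899" "8628034825" "3421170679"
)


def count_overlapping(digits: str, pattern: str) -> int:
    """Number of positions in `digits` where `pattern` starts."""
    last_start = len(digits) - len(pattern)
    return sum(digits.startswith(pattern, i) for i in range(last_start + 1))


def expected_count(n: int, pattern_len: int, base: int = 10) -> float:
    """Average number of occurrences of a fixed pattern in n uniform random digits."""
    return (n - pattern_len + 1) / base ** pattern_len


if __name__ == "__main__":
    print("observed '74':", count_overlapping(PI_DECIMALS_100, "74"))
    print("expected by chance:", expected_count(len(PI_DECIMALS_100), 2))
```

By the same formula, any particular two-digit pattern is expected to turn up about four times in 400 random digits, so seeing "74" a few times in a string of that length is about what chance predicts.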

mattbrowne (31595)

One definition of randomness, Kolmogorov complexity, looks at the length of the shortest program needed to reproduce a sequence. If the shortest program is no shorter than one that simply names each item individually, then the sequence is random. I am not that familiar with how this is used, but I would think that it would have to be modified, since a data compaction algorithm could do at least some compaction on random data. For example, random data will have sequences where a single character is repeated several times and the compaction algorithm can take advantage of this by using an escape sequence to indicate the repetition.
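One way to get a feel for this is to let an ordinary compressor stand in for "the shortest program". The Python sketch below compares how well `zlib` shrinks pseudo-random bytes versus a strictly repeating sequence. This is only a crude proxy that I am assuming for illustration: Kolmogorov complexity itself cannot be computed, and the seeded generator used here is in principle describable by a very short program even though zlib cannot discover that.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


N = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Pseudo-random bytes: no structure that zlib can exploit.
random_bytes = bytes(rng.randrange(256) for _ in range(N))

# The example string from the question, repeated over and over.
patterned_bytes = b"8177322396" * (N // 10)

if __name__ == "__main__":
    print("random:   ", compression_ratio(random_bytes))
    print("patterned:", compression_ratio(patterned_bytes))
```

The repeated string collapses to a tiny fraction of its size, while the pseudo-random bytes do not shrink at all; the occasional short run in random data is not enough to pay for the bookkeeping needed to encode it.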

@LostInParadise I can’t say I follow you completely, having no math, but I was still fascinated by your answer and found a lot in it to think about. Thanks.

dpworkin (27035)

@LostInParadise Thanks for your answer. As far as I understand it, compression algorithms only work well when there is a lot of homogeneous data. If you only have three repeated digits in a thread of three hundred, then the extra two bytes (or however many) to define how many digits and where in the look-up table to find what digit to repeat would take up more room than that saved. For example, saying “two sixes” is longer than 66, but “ten nines” is shorter than 9999999999.
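The "two sixes" versus "ten nines" point can be sketched with a toy run-length encoder in Python. The `count x digit` notation and the function name `run_length_encode` are made up for this illustration and are not how any real compressor stores runs.

```python
from itertools import groupby


def run_length_encode(digits: str) -> str:
    """Encode each run of equal digits as '<count>x<digit>', separated by commas."""
    return ",".join(f"{len(list(run))}x{digit}" for digit, run in groupby(digits))


if __name__ == "__main__":
    for sample in ("66", "9999999999", "8177322396"):
        encoded = run_length_encode(sample)
        print(sample, "->", encoded, f"({len(sample)} -> {len(encoded)} characters)")
```

Here "66" becomes "2x6" (longer), "9999999999" becomes "10x9" (shorter), and the example string from the question grows from 10 characters to 31 because almost all of its runs have length one.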

But I think the point @LostInParadise was making (and it’s an excellent one: G.A.) is that repetition is not the only hallmark of patterns that can be compressed. For example (as one of the related links points out) you could have a full-color, hi-res image of the Mandelbrot set containing millions or billions of pixels, yet it can be easily described by a simple equation consisting of only a handful of symbols & a few other bits of data (starting seed, zoom scale, etc) even though no explicit repeating groups of data are involved.

With truly random data, on the other hand, any computer algorithm you might devise to describe it would itself be longer or more complex than the data it’s describing. I was only dimly aware of Kolmogorov theory but I see now that it’s highly relevant to the original question.

gasman (11296)

@gasman I understand that, I was simply trying to illustrate that compression only works well for strong patterns, not for data sets that are a distant approximation to a pattern. ‘Homogeneous’ wasn’t the best choice of words though.

What I was trying to say is that in a long enough sequence of random digits there are going to be some long runs of a repeated digit and other repeating patterns, allowing for some compaction, although I imagine the space saving will not be very significant.

By definition there can be no meaningful pattern in randomness. It is true that a billion monkeys typing for eternity will produce the works of Shakespeare but any literary gems they produce will be hopelessly lost in a chaos of letters. If you tried to prevent the monkeys from typing out Macbeth the sequence would cease to be random.

flutherother (29399)

Depending on what you want from your pattern.

If you want “999” then the longer the string, the higher the probability.

But people will find patterns wherever they want. The whole problem skeptics have with ESP is that the “pattern” is not defined ahead of time.

roundsquare (5517)
