Is Too Much Data Too Much?

by Joe Schaefer, April 13, 2010

I’m not sure that Avinash Kaushik would agree, but it seems that an overwhelming amount of data does not necessarily equate to the most valuable data. Can too much data be a problem rather than a solution?

Let’s look at examples of data collection prior to website data collection:

  • The feds collect massive data to help ‘predict’ periods of inflation.
  • Some might say (and I’m not entirely sure) that large data sets are collected to help ‘predict’ natural disasters (such as earthquakes).
  • I’m quite sure mega-data-sets are created to help ‘predict’ the weather.

My argument here (and don’t bother looking for a solution in this post; rather, seek new questions to ask in your own Internet-marketing best interests) is that data, metrics and the like are collected in huge amounts in order to help make web design and development decisions, and to ground strong assumptions about visitor behavior.

So, let’s step back to those other data collectors (the feds, scientists, etc.) and note that recent improvements, discoveries and theories point to better ‘predictor’ models. These models are shedding light on more natural, real and accurate ‘predictions’ using much less data.

I’m speaking, of course, of Matrix Theory and what’s called the “Curse of Dimensionality.” I know, “Wait, what?”

First, I’m no expert, so bear with me theoretically. Or read the Matrix Theory article and become better educated on the subject if need be; otherwise, I’ll summarize below.

From the Matrix Theory article: “It’s all part of the recent explosion of work in an area of physics known as random matrix theory. Originally developed more than 50 years ago to describe the energy levels of atomic nuclei, the theory is turning up in everything from inflation rates to the behavior of solids. So much so that many researchers believe that it points to some kind of deep pattern in nature that we don’t yet understand.”

The Curse of Dimensionality means that “while a large amount of information makes it easy to study everything, it also makes it easy to find meaningless patterns. That’s where the random-matrix approach comes in, to separate what is meaningful from what is nonsense.”

Some teams of researchers are showing that these large data sets do not typically point to the most meaningful or accurate answers. Rather, with so many variables to study and compare, insignificant patterns can emerge that have less bearing on reality than we once thought.
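That “meaningless patterns” problem is easy to demonstrate for yourself. Here’s a minimal sketch (my own illustration, not from the article, and not random matrix theory itself): generate pure noise for many “metrics” across only a handful of “visitors,” scan every pair for correlation, and the strongest one can still look impressive, despite there being nothing real to find.

```python
import random

random.seed(42)  # make the demo repeatable

def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Few observations, many variables: the shape that invites spurious patterns.
n_samples, n_vars = 20, 200  # e.g. 20 "visitors", 200 "metrics" (hypothetical)
data = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_vars)]

# Scan all ~20,000 metric pairs for the strongest apparent relationship.
best = max(
    abs(pearson(data[i], data[j]))
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
)
print(f"strongest correlation found in pure noise: {best:.2f}")
```

The printed correlation is typically well above 0.5, yet every column is random noise. The more metrics you collect per visitor, the more of these phantom relationships a naive scan will surface.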

Perhaps it is smaller sets of data and a different kind of analysis (like those derived from Matrix Theory) that can help us discover more meaningful answers or indicators for proposed change, improvement and/or avoidance of failures.

Internet Marketing and this hoopla

Surely, many business owners who ask us to market their websites (whether via SEO, SMM, or PPC) want or need to know the most important metrics. There is, however, an argument to be made that, as practitioners of marketing in general and of marketing online, gathering huge data sets seems valuable, but are we looking too closely, to the point of making mistakes?

Is anyone out there building better data-selection and data-definition tools for Internet marketing? Yes? No? Maybe? Let me know!

Thoughts?
