Statistical Analysis & Reporting
We perform rigorous statistical testing so you can trust our findings.
To make reliable conclusions about a dataset, we must do more than simply calculate an average and compare it to another. Any number of issues or errors, including pure chance, could be skewing average values. We need to know what’s going on within our data, which is why we use advanced methods of statistical analysis:
Phase 1: Data that is reliable and valid
Our first priority is to ensure that a dataset, and the mean values we use for comparisons, are robust. We qualify this in terms of reliability (the data is consistent) and validity (it corresponds to the real world).
This process might involve running algorithms to identify and filter out data anomalies (abnormal readings), and running power calculations to ensure our dataset has the sample size needed to show genuine trends and patterns. Testing for normal distribution is an important part of this phase; the example at the end of this page explains what it is and why we do it.
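As a simple illustration of this phase, the sketch below uses Python's SciPy library to filter anomalies with a z-score threshold and then run a Shapiro-Wilk normality test. The data and the 3-standard-deviation cutoff are hypothetical choices for the example, not a description of our exact production pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical walking distances (metres) with two anomalous readings appended
distances = np.append(rng.normal(loc=450, scale=60, size=200), [2500, 3100])

# Flag anomalies: readings more than 3 standard deviations from the mean
z_scores = np.abs(stats.zscore(distances))
cleaned = distances[z_scores < 3]

# Shapiro-Wilk test: a p-value above 0.05 suggests the cleaned data
# is consistent with a normal distribution
stat, p_value = stats.shapiro(cleaned)
print(f"Removed {len(distances) - len(cleaned)} anomalies")
print(f"Shapiro-Wilk p-value: {p_value:.3f}")
```

In practice the anomaly threshold depends on the study; a stricter or more lenient cutoff changes which readings are treated as genuine.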
Phase 2: What is the data telling us?
The second phase of analysis is to examine the data for patterns, trends and correlations that are not due to chance. This involves testing for statistically significant differences or relationships between our means. The term statistical significance is important here – just because one normally distributed mean is greater than another doesn’t mean there is a true difference we can trust!
We therefore run parametric tests on our data that allow us to analyse means with an applied confidence interval. The results of these tests indicate whether the difference or relationship is genuine (at a 95% confidence level) or likely due to chance.
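A minimal sketch of one such parametric test, the independent-samples t-test, using SciPy. The two visitor groups and their dwell times are invented for illustration; the 0.05 threshold corresponds to the 95% confidence level mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical dwell times (minutes) for two visitor groups
group_a = rng.normal(loc=34, scale=8, size=120)
group_b = rng.normal(loc=28, scale=8, size=120)

# Independent-samples t-test: a parametric test comparing two means
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# At a 95% confidence level, p < 0.05 indicates a statistically
# significant difference rather than one plausibly due to chance
if p_value < 0.05:
    print(f"Significant difference (p = {p_value:.4f})")
else:
    print(f"No significant difference (p = {p_value:.4f})")
```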
Comparing two means is just the tip of the iceberg!
The best part of statistical analysis (in our opinion) is digging deeper into data to find solutions to your research objectives. There are literally hundreds of statistical tests you can run on a dataset to answer different types of questions. Here are a few of our favourites:
Correlation Analysis: Does one measure directly relate to another? (e.g. does the distance a visitor walks correlate with the time they spend in the restaurant?)
Regression Analysis: Does one measure predict another? (e.g. does the distance a visitor walks predict how long they spend in the restaurant?)
Mediation Analysis: Does one measure ‘mediate’ or influence the relationship between two others? (e.g. does the weather affect the relationship between the distance a visitor walks and the time they spend in the restaurant?)
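The first two of these can be sketched in a few lines of Python with SciPy. The walking-distance and dwell-time figures below are hypothetical, generated with a built-in relationship so the tests have something to find.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical data: distance walked (metres) and time in the restaurant (minutes),
# generated so that longer walks tend to mean longer visits
distance = rng.normal(loc=500, scale=100, size=150)
time_spent = 0.05 * distance + rng.normal(loc=10, scale=5, size=150)

# Correlation: do the two measures move together?
r, p_corr = stats.pearsonr(distance, time_spent)

# Regression: can distance predict time spent?
result = stats.linregress(distance, time_spent)
print(f"Correlation r = {r:.2f} (p = {p_corr:.4f})")
print(f"Predicted time for a 600 m walk: {result.slope * 600 + result.intercept:.1f} minutes")
```

Mediation analysis builds on regression by adding a third variable (such as the weather) to the model, which is beyond this short sketch.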
Phase 3: Our findings, interpretations and reports…
Our reports are where the magic happens! The principal aim of our reporting is to provide complete answers to the research objectives you set at the beginning of the project, and to draw conclusions about what the findings might mean for you and your organisation.
We use graphs, tables and text to break the data into consumable facts and figures, and explain each one in plain English, so you get a complete picture of your research findings.
And we don’t stop there. We present our reports to you, either in person or via video conference. This is important to us, as we want to make sure you have the opportunity to ask questions and fully understand the intricacies of our reports.
EXAMPLE: Why we check for Normal Distribution…
The distribution of data is a common issue in many surveys and research projects. Most of these projects compare measures of central tendency (mean, mode and median) taken from the data. Mean values of a sample (the sum of scores divided by the number of scores) are particularly important for nearly all statistical analysis, but many people use them without verifying their reliability.
The graph on the left demonstrates how the distribution of scores can throw off your mean, and thus your analysis and resulting findings. This graph shows the variance in the distances people walked around a site (represented as z-scores). The bar at 0.00 shows how many people walked the mean distance. The bars to the left of 0.00 show the number of people who walked distances less than the mean, and the bars to the right show how many people walked further than the mean.
This graph is positively skewed (skewed to the right), as the mean is not centrally located in the dataset. This is firstly because the people who walked shorter distances were still not too far off the mean value (bars to the left of 0.00), whereas the distances of people who walked further (the bars to the right) are much more spread out. The major problem, though, is that the mean distance is much higher than the most common distance walked (the mode – the highest bar), and it is probably also higher than the middle (median) distance.
Ideally the distribution will look like the right-hand graph, where the mean is centrally located and similar to the mode and median. The distances around the mean tail off evenly on either side. This is what we call a normal distribution.
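The contrast described above can be checked numerically as well as visually. This sketch uses SciPy's skewness statistic on two invented samples: a right-skewed one (a few visitors walk much further than most) and a roughly normal one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# A right-skewed sample: most visitors walk short distances, a few walk much further
skewed = rng.lognormal(mean=6, sigma=0.5, size=1000)
# A roughly normal sample for comparison
normal = rng.normal(loc=450, scale=60, size=1000)

for name, data in [("skewed", skewed), ("normal", normal)]:
    print(f"{name}: mean = {np.mean(data):.0f}, "
          f"median = {np.median(data):.0f}, "
          f"skewness = {stats.skew(data):.2f}")
```

In the skewed sample the mean sits well above the median, exactly the mismatch described above, while in the normal sample the two are close and the skewness is near zero.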
At TracAce Analytics, this is one of the first and most important statistical tests we run on a dataset. It tells us whether we can trust that our mean is a true average, or whether we need to check for anomalies or collect more data before treating it as a finding we can apply to an entire population of visitors.