8.4 Level of Significance and the p-Value


The concept of level of significance is used to adjudicate whether the probability (of our results if the null hypothesis is true) is too high to dismiss the null hypothesis or low enough to allow us to reject the null hypothesis. In other words, the level of significance is what we use to proclaim results as statistically significant (when we reject the null hypothesis) or not statistically significant (when we fail to reject the null hypothesis).

Think about it this way: recall that with confidence intervals we had selected 95% certainty and 99% certainty as meaningful levels of confidence. What is left is 5% and 1% “uncertainty”, as it were, which we agree to tolerate. These 5% or 1% are distributed equally between the two tails of the normal distribution (2.5% on each side or 0.5% on each side, respectively). They also correspond to z=1.96 and z=2.58. Following the logic of Example 8.2 (A) from the previous section, in order to reject a null hypothesis, we want the probability to be lower than these 5% or 1% (so that we can “feel confident enough”).

And this is exactly it: when we put it that way, saying that we want the probability (of our results if the null hypothesis is true) — called a p-value — to be less than 5%, we have essentially set the level of significance at 0.05. If we want the probability to be less than 1%, we have set the level of significance at 0.01. We can go even further: we might want to be extra cautious and to want a “confidence” of 99.9%, so that we want the probability to be less than 0.1% — then we have set the level of significance at 0.001.

These three numbers — 0.05, 0.01, and 0.001 — are the most commonly used levels of significance. The level of significance is denoted by the lower-case Greek letter alpha, i.e., α, thus we usually choose one of the following:

  • α = 0.05
  • α = 0.01
  • α = 0.001

You can think of the significance level as the acceptable probability of being wrong — and what is acceptable is left to the discretion of the researcher, subject to the purposes of the particular study.

Following the logic presented in Example 8.2(A) then, if the probability of the result under the null hypothesis — the p-value — is smaller than a pre-selected significance level α, the null hypothesis is rejected and the result is considered statistically significant[1]. This is denoted in one of the following ways:

p ≤ 0.05

p ≤ 0.01

p ≤ 0.001[2]

To summarize, when a hypothesis is tested, we end up with an associated p-value (again, the probability of the observed sample statistics if the null hypothesis is true). We compare the p-value to the pre-selected significance level α: if p ≤ α, the results are statistically significant and therefore generalizable to the population.
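To make the decision rule concrete, here is a minimal Python sketch; the p-value below is made up purely for illustration, and Python is simply my choice of tool here, not something this book requires.

```python
# A minimal sketch of the decision rule, with a hypothetical p-value.
alpha = 0.05      # pre-selected level of significance
p_value = 0.03    # p-value a test might produce (made up for illustration)

if p_value <= alpha:
    print("Reject H0: the result is statistically significant.")
else:
    print("Fail to reject H0: the result is not statistically significant.")
```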

So far so good? Good. However, unfortunately this isn’t all (sorry!). What I have presented above is the most conventional treatment of how to use and interpret p-values. It is attractively straightforward — but it’s also arbitrary, and its true interpretation is the subject of an ongoing debate. As an introduction to the topic, I will leave it at that, but you should be aware that there’s more to the p-value, and that its usage has been (rightfully) questioned and/or challenged in recent years[3].

Going back to our example from the previous section, let’s see how p-values can change due to particular features of the study, like the sample size. Example 8.2(B) illustrates.

Example 8.2(B) Employee Productivity (Finding Statistically Non-significant Results, N=25)

Imagine that we had the same information as in Example 8.2(A), however, 25 employees took the training course instead of 100 and their average score was 620. Then we have:

N = 25

x̄ = 620

μ = 600

σ = 100

We still want to know the probability of a score of 620 if the training course didn’t contribute to the gain, i.e., the probability of a score of 620 under the condition of the null hypothesis.

  • H0: The training course did not affect productivity (the 620 score was due to random chance); μ = 600.
  • Ha: The training course affected productivity (the 620 score was a true gain); μ ≠ 600.

The new standard error is:

σx̄ = σ/√N = 100/√25 = 100/5 = 20

Then the z-value of 620 is:

z = (x̄ − μ)/σx̄ = (620 − 600)/20 = 20/20 = 1

Given the properties of the normal curve, we know that 68% of all means in infinite sampling will fall between ±1 standard error (i.e., between 580 and 620), 95% will fall between ±1.96 standard errors (i.e., approximately between 560 and 640), and 99% will fall between ±2.58 standard errors (i.e., approximately between 548 and 652). The score of 620 has z = 1 — it falls quite close to the not-trained group’s mean of 600.

In terms of probabilities, consider the following: z = 1 has a p > 0.30. Assuming the null hypothesis is true, our calculations show that the 620 score will appear more than 30% of the time due to random chance, which is a lot more than the 5% (at α = 0.05) that we are willing to tolerate. As such, we cannot reject the null hypothesis: we do not have enough evidence to conclude that the gain in productivity of 20 points which the 25 employees demonstrated is statistically significant. In other words, we don’t have enough evidence that the training course was effective. (This doesn’t mean the course was ineffective beyond a shadow of a doubt, just that at this point, in this particular study, we don’t have enough evidence to say it was effective.)
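If you would like to check these numbers yourself, here is a short Python sketch of Example 8.2(B); the values μ = 600, σ = 100, N = 25, and x̄ = 620 are the ones used in the worked example above, and scipy is simply a convenient tool for the normal-curve probability.

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n, xbar = 600, 100, 25, 620   # quantities from Example 8.2(B)

se = sigma / sqrt(n)          # standard error of the mean: 100 / 5 = 20
z = (xbar - mu) / se          # (620 - 600) / 20 = 1.0
p = 2 * norm.sf(abs(z))       # two-tailed p-value for z = 1

print(se, z, round(p, 3))     # 20.0 1.0 0.317  -> p > 0.30, not significant at 0.05
```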

We can also see the correspondence with confidence intervals:

  • 95% CI: x̄ ± 1.96σx̄ = 620 ± 1.96(20) = 620 ± 39.2, i.e., (580.8, 659.2)

That is, we can be 95% certain that the average score for the population of employees who take the training course would be between roughly 581 points and 659 points. The average general score of 600 points is a plausible value for μ, which is consistent with our decision to not reject the null hypothesis.
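The same kind of sketch reproduces the confidence interval (again using the example's quantities):

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 620, 100, 25
se = sigma / sqrt(n)                 # 20
z_crit = norm.ppf(0.975)             # about 1.96 for a 95% confidence interval

lower, upper = xbar - z_crit * se, xbar + z_crit * se
print(round(lower), round(upper))    # roughly 581 and 659
```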

Again, Example 8.2 is a heuristic device, used only to explain the logic of hypotheses testing. Of course, normally we wouldn’t have information about population parameters and we would be using sample statistics (i.e., we would use not only the sample mean x̄ but also the sample standard deviation s, to calculate the estimated standard error of the sampling distribution, sx̄ = s/√N). (Not to mention that we would have two different standard deviations, one for the trained group and one for the not-trained group of employees.) As you learned in the previous chapter, this moves us from using the z-distribution to the t-distribution with given degrees of freedom. Recall that with a sample size of about 100 — i.e., with df=100 — the two distributions converge.
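To see the convergence described above, you can compare the two-tailed critical values of the t-distribution with the familiar z = 1.96; this is only an illustrative Python sketch, and the specific df values are my own choices.

```python
from scipy.stats import norm, t

print(round(norm.ppf(0.975), 3))           # 1.96, the z critical value at alpha = 0.05
for df in (10, 25, 100, 1000):
    print(df, round(t.ppf(0.975, df), 3))  # 2.228, 2.06, 1.984, 1.962
# As df grows, the t critical value approaches the z critical value of 1.96.
```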

Here then is a quick-and-dirty method you can use as a preliminary indication of whether something will be statistically significant. Since z=1.96 corresponds to 5% probability (2.5% in each tail), and z=2.58 corresponds to 1% probability (0.5% in each tail), even without knowing the exact p-value associated with a given z-value, you can guess that getting a z<1.96 will be non-significant while a z>1.96 will be significant at α=0.05; similarly, getting a z>2.58 will be statistically significant at α=0.01[4]. As samples used in sociological research are commonly of N>100, the same insight applies to the corresponding t-values with df≥100. Understand, however, that this is not an official way to test hypotheses or report findings: to do that, you always need to report the exact p-value associated with a z-value or a t-value with given df[5].

One-tailed tests. Finally, a note on one-tailed tests. While I advise you, as a beginner researcher, against using them yourself, it is not a bad idea to know that they exist and what they are. Briefly, the idea is that if we have a good reason to suspect not only a difference/effect but a difference/effect with a specific direction (i.e., positive or negative), we can specify the hypotheses accordingly. To use Example 8.2(A) again, say we think there is no possibility that the training course decreased productivity scores. Then we can state the hypotheses as:

  • H0: The training course either did not affect productivity or decreased it; μ ≤ 600.
  • Ha: The training course increased productivity; μ > 600.

This is a stronger claim (that’s why it needs to be well-justified) — we test not a difference (that can be either positive or negative) but an increase. Thus, we move the significance level to only one of the tails, as it were, the positive (right) tail, so instead of 2.5% being there, 5% are.

This change in probability essentially “moves” the z-value corresponding to significance closer to the mean; now a smaller z-value will have the p-value necessary to achieve statistical significance. To be precise, 5% (2.5% in each tail) corresponded to z=1.96; all 5% in the right tail corresponds to z=1.65[6]. This obviously “lowers the bar” of achieving statistical significance without changing the level of significance α itself, and makes rejecting the null hypothesis easier, hence my description of the two-tailed test as more conservative (and my insistence on using it instead of a one-tailed test).
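In code, the two critical values compare like this (a small Python sketch, assuming α = 0.05):

```python
from scipy.stats import norm

alpha = 0.05
z_two_tailed = norm.ppf(1 - alpha / 2)   # 2.5% in the right tail -> about 1.96
z_one_tailed = norm.ppf(1 - alpha)       # all 5% in the right tail -> about 1.65

print(round(z_two_tailed, 3), round(z_one_tailed, 3))   # 1.96 1.645
```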

Before we move on to the last section of this theoretical chapter, the promised warning about the meanings of the term significance.

Watch out!! #15… for Mistaking Statistical Significance for Magnitude or Importance

If you have been paying attention, you have learned by now that statistical significance has a very narrow meaning. To have a statistically significant result simply means that the probability of observing our sample statistics (or difference, or effect, etc.) as they are, given that the null hypothesis is true, is small enough to be (highly) unusual, to be so relatively rare as to indicate that what we have is not a result of random sampling variation but of an untrue null hypothesis.

None of this says anything about how big a difference/effect is — in fact, it can be quite small and still be statistically significant, given a large enough sample size and other study specifications[7].
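A quick illustration of this point, reusing the productivity example's assumed σ = 100: a gain of only 5 points (arguably too small to matter in practice, and a number I made up here) becomes "statistically significant" once the sample is large enough.

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, xbar = 600, 100, 605   # a trivially small 5-point gain (illustrative)
for n in (25, 400, 1600, 6400):
    z = (xbar - mu) / (sigma / sqrt(n))
    p = 2 * norm.sf(abs(z))
    print(n, round(z, 2), f"p = {p:.4f}")
# By n = 1600 the p-value already drops below 0.05, even though the 5-point
# gain itself has not changed at all.
```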

Similarly, many people unfamiliar with statistics take statistical significance to mean that the findings are of significant importance. Again, nothing about statistical significance confers great meaning to or implies importance of statistically significant findings. One can study an objectively trivial/unimportant issue and have statistically significant findings of no relevance to anyone whatsoever.

To conclude, keep these distinctions in mind — between the conventional usage of the word significant (meaning either important, or big) and statistical significance — both when interpreting and reporting results and when reading and evaluating existing research.

When testing hypotheses, I defined the significance level as a sort of probability of being wrong that we are willing to tolerate. This implies that a likelihood of making an erroneous decision about the null hypothesis (to reject it or not) exists. The next and final section deals with just that.

  1. Note the difference between α and the p-value. While α indicates what probability of being wrong we are willing to tolerate, the actual p-value we obtain is not the probability of being wrong. The p-value, again, is the probability of our result if the null hypothesis were true; in other words, if the null hypothesis is in fact true, and our p-value is, say, 0.03, we'd obtain our results 3% of the time simply due to random sampling error.
  2. In published research you will find results marked by one asterisk, two asterisks, and three asterisks. These correspond to their significance based on the level used: α=0.05, α=0.01, and α=0.001, respectively. The smaller the level of significance, the more strongly statistically significant the result is (i.e., most consider α=0.001 to indicate "highly statistically significant" results). (If you happen upon a dagger (†), it indicates significance at the α=0.1 level, or 10% probability of being wrong, which most researchers consider too high, but some still use it.)
  3. You can find plenty of information on the topic online; from journals banning the use of p-values and hypothesis testing in favour of effect size (the journal Basic and Applied Social Psychology, see Trafimow & Marks, 2015, https://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991), to calls to abandon statistical significance (e.g., McShane, Gal, Gelman, Robert & Tackett, 2019, https://www.tandfonline.com/doi/abs/10.1080/00031305.2018.1527253), to others calling for its and p-values' defense (e.g., Kuffner & Walker, 2016, https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1277161?src=recsys; Greenland, 2019, https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529625?src=recsys). One thing is clear: p-values and levels of significance have become increasingly controversial. Still, the American Statistical Association's position is that although caution against over-reliance on a single indicator is necessary, p-values can still be used, alongside other appropriate methods: "Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible" (Wasserstein & Lazar, 2016, https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108?src=recsys). Finally, if you really want not to overstate what the p-value actually shows, see Greenland et al. (2016) for a list of common misinterpretations and over-interpretations of the p-value, of confidence intervals, and of tests of significance (here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/). As the debate goes well beyond the scope of this book, the topic is still conventionally taught as I presented it above, at least at the introductory level.
  4. Obviously, for negative z-values we'll have all these in reverse: a z between −1.96 and 0 will be non-significant, while a z<−1.96 will be significant at α=0.05, etc.
  5. You can find a handy online p-value calculator of t-values here: https://goodcalculators.com/student-t-value-calculator/.
  6. You can check it here by selecting "up to Z": https://www.mathsisfun.com/data/standard-normal-distribution-table.html.
  7. This is actually one of the reasons some have called for abandoning p-values, statistical significance, and hypothesis testing altogether: statistical significance is not indicative of effect size and is frequently over-stated to mean more than it does; at the same time, over-reliance on p-values decreases attention to effect size, careful study design, context, etc.