Statistical Hypothesis Testing

For machine learning practicals

1: You are training a Linear Regression model (y = a + b · x). You want to test whether the feature “House_Size” has any impact on the target “Price”. What is the correct null hypothesis?

H0: b=0

H0: b≠0

H0: b>0

H0: b<0

2: You have a baseline model with 85% accuracy. You want to prove your new model is statistically better. Which set of hypotheses is correct?

H0: μ=0.85 vs. H1: μ≠0.85

H0: The model achieves consciousness.

H0: μ≤0.85 vs. H1: μ>0.85

H0: μ>0.85 vs. H1: μ≤0.85
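A one-sided comparison against a fixed baseline can be sketched as an exact binomial tail probability. This is only an illustration: the test-set size (1,000), the number of correct predictions (872), and the helper name `binom_upper_tail` are hypothetical and do not come from the slides.

```python
from math import comb

def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0).

    This is the one-sided p-value for H0: p <= p0 vs. H1: p > p0,
    evaluated at the boundary value p = p0.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: the new model gets 872 of 1,000 test examples right.
n_test, n_correct, baseline = 1000, 872, 0.85
p_value = binom_upper_tail(n_correct, n_test, baseline)

print(f"observed accuracy  = {n_correct / n_test:.3f}")
print(f"one-sided p-value  = {p_value:.4f}")
```

Only results above the baseline count as evidence here, which is why the sum runs over the upper tail alone.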

3: You are A/B testing a new “Buy” button color at significance level α = 0.05. The resulting p-value is 0.03. What do you decide?

Accept H0. The colors perform identically.

Reject H0. The color change made a significant difference.

Fail to reject H0. The difference is just noise.

No idea, just make the button invisible.
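The decision rule behind this question is a single comparison of the p-value against α. A minimal sketch, assuming the common convention of rejecting when p ≤ α (the function name `decide` is hypothetical):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given p-value and significance level."""
    if p_value <= alpha:
        return "reject H0"
    # A large p-value is not proof of H0; it only means the evidence is too weak.
    return "fail to reject H0"

print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))
```

The same p-value can lead to different decisions depending on the α fixed before the experiment, so α has to be chosen in advance.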

4: You are building a mission-critical AI for a nuclear reactor. You need to be extremely sure before you change any settings. Which α value do you choose?

α=1.00

α=0.10

α=0.05

α=0.01

5: A firewall fails to identify a virus and lets it in. What type of error is this?

Type I

Type II

6: Which of the following are examples of a Type I Error?

Medical test saying you are sick when you are healthy.

Spam filter marking a spam message as a real email.

FaceID unlocking your phone for a stranger.

Your code running without errors on the first try.

7: What does a p-value of 0.03 mean?

3% of the dataset contains errors.

The model is 3% accurate.

The result is 97% correct.

If the null hypothesis is true, there is a 3% chance of observing a result at least this extreme.
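One way to make "assuming the null is true" concrete is a permutation test: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below uses made-up measurements, and `permutation_p_value` is a hypothetical helper, not something from the slides.

```python
import random

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels simulates a world where H0 holds:
        # group membership has no effect on the values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_perm

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.6, 12.9, 12.4, 12.2, 12.7, 12.0]
group_b = [11.5, 11.9, 11.7, 12.0, 11.4, 11.8, 11.6, 12.1]
p_value = permutation_p_value(group_a, group_b)
print(f"share of shuffles at least as extreme as the data: {p_value:.4f}")
```

The returned fraction is a statement about the data under H0, not the probability that H0 or H1 is true.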

8: You are building an AI for autonomous braking. H0: no obstacle. False negative: the car does not stop for a real obstacle. False positive: the car brakes for a shadow. How do you adjust α to make the car safer?

Increase α, Type I error increases, Type II error decreases

Increase α, Type I error decreases, Type II error increases

Decrease α, Type I error decreases, Type II error increases

Decrease α, Type I error increases, Type II error decreases
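The trade-off between the two error types can be sketched for a one-sided z-test using the standard normal distribution. The effect size `shift=2.0` (the true effect measured in standard-error units) is an assumed value for illustration, and `type_ii_error` is a hypothetical helper.

```python
from statistics import NormalDist

def type_ii_error(alpha: float, shift: float) -> float:
    """Type II error rate (beta) of a one-sided z-test.

    `shift` is the true effect in standard-error units (assumed value).
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > critical
    return std_normal.cdf(critical - shift)   # P(fail to reject | effect is real)

for alpha in (0.01, 0.05, 0.10):
    beta = type_ii_error(alpha, shift=2.0)
    print(f"alpha (Type I) = {alpha:.2f}   beta (Type II) = {beta:.3f}")
```

With the effect size and sample size held fixed, a larger α lowers the rejection threshold, so missed detections become rarer while false alarms become more frequent.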

9: You get a p-value of 0.04. This means there is a 96% probability that the Alternative Hypothesis is true.

True

False

11: You trained a model to detect fake product reviews. You want to check if the categorical feature “Review Platform” (Amazon, eBay, Etsy) is independent of the model’s prediction (Fake, Genuine). Which statistical test do you use?

Z-test

Pearson Correlation Coefficient

Student’s T-test

Chi-Square Test of Independence
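For two categorical variables, the Pearson chi-square statistic compares observed counts with the counts expected under independence. The sketch below computes it by hand on a made-up 3×2 table; the counts and the helper name `chi_square_statistic` are hypothetical. It relies on the fact that a 3×2 table has 2 degrees of freedom, where the chi-square tail probability has the closed form exp(−x/2).

```python
from math import exp

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if platform and prediction were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = Amazon, eBay, Etsy; columns = Fake, Genuine.
counts = [[30, 170], [45, 155], [25, 175]]
stat = chi_square_statistic(counts)

# df = (3 - 1) * (2 - 1) = 2; for 2 degrees of freedom the chi-square
# survival function is exactly exp(-x / 2).
p_value = exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = 2, p-value = {p_value:.4f}")
```

For tables of other shapes the closed form no longer applies, and a statistics library would normally be used to get the p-value.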

12: You are comparing the average salary of Data Scientists vs. Software Engineers. You survey 100 random people from each job. Which test is appropriate?

Paired T-test

One-Sample Z-test

Independent Two-Sample T-test

A coding battle

13: You want to test whether data augmentation improves model accuracy. For each of 20 datasets, you train one model with augmentation and one without. Which test is appropriate?

Paired T-test

The Turing Test

Independent Two-Sample T-test

Chi-Square Test
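The difference between treating the measurements as pairs and as two independent groups can be sketched by computing both t statistics on the same data. The accuracies below are simulated under an assumption (datasets differ widely, augmentation adds a small consistent gain), and both helper names are hypothetical. With equal group sizes the pooled and Welch forms of the independent statistic coincide, so only one is shown.

```python
import random
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic of a paired t-test: per-pair differences tested against 0."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def independent_t(x, y):
    """t statistic of an independent two-sample t-test (unpooled variances)."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

# Simulated, hypothetical accuracies on 20 datasets.
rng = random.Random(42)
without_aug = [rng.uniform(0.60, 0.95) for _ in range(20)]
with_aug = [acc + rng.gauss(0.02, 0.01) for acc in without_aug]

t_paired = paired_t(with_aug, without_aug)
t_indep = independent_t(with_aug, without_aug)
print(f"paired t      = {t_paired:.2f}")
print(f"independent t = {t_indep:.2f}")
```

Pairing removes the dataset-to-dataset variation from the denominator, which is why the paired statistic is much larger on data of this shape.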

14: When do you use a T-test instead of a Z-test?

When sample size is small.

When sample size is huge.

When data is categorical.

When you lost your Z-table.

15: You test 1,000 random features against a target with α=0.05. How many features will likely appear significant purely by chance?

0
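The multiple-testing effect in this question can be sketched with a short simulation. It relies on the fact that, when the null hypothesis is true, a p-value from a continuous test statistic is uniformly distributed on [0, 1]. The function name and seed are hypothetical choices for illustration.

```python
import random

def count_false_positives(n_features=1000, alpha=0.05, seed=0):
    """Count 'significant' results when every null hypothesis is true."""
    rng = random.Random(seed)
    # Under H0 each p-value is uniform on [0, 1], so each test falls
    # below alpha with probability alpha purely by chance.
    p_values = [rng.random() for _ in range(n_features)]
    return sum(p < alpha for p in p_values)

expected = 1000 * 0.05  # n_features * alpha
simulated = count_false_positives()
print("expected false positives:", expected)
print("one simulated run:       ", simulated)
```

A single run will scatter around the expected count rather than hit it exactly, which is the reason corrections such as Bonferroni exist for large feature screens.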
