Evaluating Hypotheses
Li M. Fu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Introduction
- How to evaluate a hypothesis produced by a machine learning program?
- How to estimate the accuracy of a hypothesis learned from a limited
amount of data? What are the confidence level and the confidence interval?
- How to compare two hypotheses?
- How to best utilize the data for both learning and evaluation?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Topics
- Estimating hypothesis accuracy
- Sampling theory
- Confidence intervals
- Comparing hypotheses
- Comparing learning algorithms
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hypothesis Accuracy
- Sample error
- True error
- Confidence intervals
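
A minimal sketch of the sample error, assuming it is defined as the fraction of examples in a test sample that the hypothesis misclassifies. The true error is the same quantity taken over the whole (unknown) distribution of examples, so it can only be estimated from samples. The function name `sample_error` and the toy data are illustrative assumptions, not part of the original notes.

```python
def sample_error(hypothesis, examples):
    """Fraction of (x, label) pairs that the hypothesis misclassifies."""
    mistakes = sum(1 for x, label in examples if hypothesis(x) != label)
    return mistakes / len(examples)

# Hypothetical hypothesis: predict class 1 whenever x >= 5.
h = lambda x: int(x >= 5)

# Hypothetical labeled test sample of (x, label) pairs.
sample = [(1, 0), (3, 0), (4, 1), (6, 1), (7, 1), (9, 0), (2, 0), (8, 1)]

err = sample_error(h, sample)
print(f"sample error = {err:.2f}")
```

Here the examples (4, 1) and (9, 0) are misclassified, so the sample error is 2/8.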
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error Estimation and Binomial distribution
- A Binomial distribution gives the probability of observing r heads
in a sample of n independent coin tosses.
- In parallel, error estimation is concerned with the probability
of observing r misclassifications out of n predictions.
- In this analogy, the error rate corresponds to the probability
of heads on a single coin toss.
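
The analogy above can be sketched numerically. This example assumes the predictions are independent and that each is wrong with the same probability p; the values n = 40 and p = 0.3 are arbitrary illustrations.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r errors in n independent predictions,
    when each prediction is wrong with probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 40, 0.3  # illustrative sample size and true error rate

# Full distribution over the number of errors r = 0..n.
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance of the number of errors; these match the
# closed forms n*p and n*p*(1-p) for the Binomial distribution.
mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))

print(f"P(12 errors) = {binomial_pmf(12, n, p):.4f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```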
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Confidence Intervals
- Calculating confidence intervals for the Binomial distribution
is somewhat difficult. However, the Binomial distribution can be closely
approximated by the Normal distribution for sufficiently large
sample sizes (a common rule of thumb is n >= 30).
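- For a measured value y drawn from a Normal distribution with
standard deviation sigma, the true mean mu will fall into the interval
y ± z_N * sigma N% of the time; this is called an N% confidence interval.
Here z_N is the number of standard deviations that encloses N% of the
Normal probability mass (e.g., z_95 = 1.96).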
- An N% confidence interval for some parameter p is an interval
that is expected with probability N% to contain p.
- Two-sided versus one-sided bounds
- Central limit theorem
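
A hedged sketch of an approximate two-sided confidence interval for the true error, using the Normal approximation to the Binomial. It assumes the sample is large enough for the approximation to hold and estimates the standard deviation from the sample error itself. The function name and the counts (12 errors in 40 predictions) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def error_confidence_interval(errors, n, level=0.95):
    """Approximate two-sided confidence interval for the true error,
    given `errors` misclassifications out of `n` test examples."""
    e = errors / n
    # z such that the central `level` mass of a standard Normal lies in [-z, z].
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width

low, high = error_confidence_interval(12, 40)
print(f"95% interval: [{low:.3f}, {high:.3f}]")

# A one-sided bound puts all of the excluded mass in one tail,
# so it uses a smaller z for the same confidence level.
z_one_sided = NormalDist().inv_cdf(0.95)
```

For a fixed sample error, the half-width shrinks in proportion to 1/sqrt(n), so quadrupling the test set roughly halves the interval.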
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Comparing Two Different Hypotheses
- Difference in error
- Hypothesis testing
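
One way to sketch the comparison, assuming the two hypotheses are tested on independent samples that are large enough for the Normal approximation: estimate the difference in error d = e1 - e2, approximate its standard deviation from the two sample errors, and ask how probable it is that the true difference is positive. The function name and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def compare_errors(errors1, n1, errors2, n2):
    """Return (d, z, confidence) for hypotheses h1 and h2 tested on
    independent samples, where d = e1 - e2 is the difference in sample
    error and confidence is the approximate probability that h1 has
    the larger true error."""
    e1, e2 = errors1 / n1, errors2 / n2
    d = e1 - e2
    # Variances of independent estimates add.
    sigma = sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    z = d / sigma
    confidence = NormalDist().cdf(z)  # one-sided
    return d, z, confidence

d, z, confidence = compare_errors(30, 100, 20, 100)
print(f"d = {d:.2f}, z = {z:.2f}, confidence = {confidence:.2f}")
```

With these counts the observed difference is about 1.64 standard deviations from zero, which corresponds to roughly 95% one-sided confidence that the second hypothesis has the lower true error.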
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Comparing Two Different Learning Algorithms
- Paired t test
- (1) Formulate a null hypothesis
- (2) Calculate the t value
- (3) Look up the t distribution table
- (4) Accept or reject the null hypothesis
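
The four steps above can be sketched as follows. The example assumes both algorithms were trained and tested on the same k data splits, so that their errors can be paired; the per-split error rates are made-up illustrative numbers, and the critical value is the standard two-sided 95% entry of the t table for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(errors_a, errors_b):
    """t statistic and degrees of freedom for paired error measurements."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    # stdev() is the sample standard deviation (divides by k - 1).
    t = mean(diffs) / (stdev(diffs) / sqrt(k))
    return t, k - 1

# (1) Null hypothesis: the two algorithms have the same expected error.
# Hypothetical error rates of algorithms A and B on the same 5 splits.
errors_a = [0.20, 0.22, 0.19, 0.25, 0.21]
errors_b = [0.18, 0.21, 0.16, 0.22, 0.20]

# (2) Calculate the t value.
t, dof = paired_t(errors_a, errors_b)

# (3) Look up the t distribution table: two-sided 95%, 4 degrees of freedom.
T_CRIT = 2.776

# (4) Reject the null hypothesis if |t| exceeds the critical value.
reject_null = abs(t) > T_CRIT
print(f"t = {t:.3f}, dof = {dof}, reject null: {reject_null}")
```

When the splits come from resampling one data set, the paired measurements are not fully independent, so the result is best read as an approximation.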
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Summary
- The difference between training and test errors.
- How to calculate the confidence interval?
- What is the distribution of the error?
- How does the sample size affect the error measurement? Variance?
- When can we say a hypothesis is statistically better than another?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Other Supplementary Material
- Predictive Values