Econ 41 Review

Discrete Random Variables. Suppose that we are interested in the number of cups of coffee drunk per
day by a (randomly selected) student at UCLA. This quantity can be represented as a random variable Y
with probability mass function:
\[
p_Y(a) =
\begin{cases}
\frac{1}{4} & \text{if } a \in \{0, 1, 2\} \\
\frac{1}{8} & \text{if } a = 3 \\
\frac{3}{32} & \text{if } a = 4 \\
c & \text{if } a = 5 \\
0 & \text{otherwise}
\end{cases},
\]
where c is an unknown constant.
(a) Explain why the number of cups of coffee drunk in a day by a randomly selected student at UCLA
is a random variable.
(b) What is the relevant outcome space of the random variable Y ?
(c) Explain what the distribution of this random variable represents. In other words, the distribution
of Y assigns a probability to any subset of the outcome space. How do we interpret this probability?
(d) Solve for c. (Hint: Recall that \(P_Y(O_Y) = 1\), so \(\sum_{a \in O_Y} p_Y(a)\) must equal one.)
(e) What is the probability that a randomly selected student at UCLA drinks at least 3 cups of coffee
a day, PY (Y ≥ 3)?
(f) What is the expected number of cups of coffee drunk per day by a randomly selected student at
UCLA?
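Parts (d) through (f) come down to exact arithmetic on the pmf above. The sketch below is one possible way to check that arithmetic with Python's `fractions` module. The variable names (`known`, `pmf`, `p_at_least_3`, `expected_cups`) are illustrative and not part of the problem statement.

```python
from fractions import Fraction

# Known pmf values from the problem statement; the mass at a = 5 is the unknown c.
known = {
    0: Fraction(1, 4),
    1: Fraction(1, 4),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(3, 32),
}

# Probabilities over the outcome space must sum to one, so c is the remainder.
c = 1 - sum(known.values())
pmf = {**known, 5: c}

# P(Y >= 3): add the mass on every outcome that is at least 3.
p_at_least_3 = sum(p for a, p in pmf.items() if a >= 3)

# E[Y]: probability-weighted sum of the outcomes.
expected_cups = sum(a * p for a, p in pmf.items())

print("c =", c)
print("P(Y >= 3) =", p_at_least_3)
print("E[Y] =", expected_cups)
```

Working in exact fractions avoids rounding, so the results can be compared directly with a hand calculation.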
Continuous Random Variables. Suppose that we are interested in the income of a randomly selected
Angeleno. The distribution of incomes (in tens of thousands of dollars) for residents of Los Angeles
can be described as a random variable, X, with the following pdf.
\[
f_X(a) =
\begin{cases}
0.11 - ca & \text{if } 0 \le a \le 10 \\
0 & \text{otherwise}
\end{cases},
\]
where c is an unknown constant.
(a) What is the outcome space of X, OX?
(b) Using the relationship
\[
P_X(l \le X \le m) = \int_l^m f_X(a)\,da,
\]
explain why the pdf must always be weakly positive, \(f_X(a) \ge 0\), for any \(a \in \mathbb{R}\).
(c) Because \(P_X(O_X) = 1\), we must have \(\int_0^{10} f_X(a)\,da = 1\). Using this fact, solve for c.
(d) What is the expected value of X, E[X]?
(e) What is the variance of X, Var(X)?
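Because the pdf is linear on [0, 10], every integral in parts (c) through (e) is the integral of a polynomial and can be evaluated exactly. The following is a minimal sketch of that calculation. The helper names `moment_parts` and `moment` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

A = Fraction(11, 100)  # intercept of the pdf: f_X(a) = A - c*a
UPPER = 10             # the pdf is positive only on [0, UPPER]


def moment_parts(k):
    """Return (i0, i1) such that the integral of a^k * (A - c*a) over
    [0, UPPER] equals i0 - c * i1."""
    i0 = A * Fraction(UPPER ** (k + 1), k + 1)
    i1 = Fraction(UPPER ** (k + 2), k + 2)
    return i0, i1


# Total probability must be one: i0 - c * i1 = 1, so c = (i0 - 1) / i1.
i0, i1 = moment_parts(0)
c = (i0 - 1) / i1


def moment(k):
    """E[X^k] under the pdf with the solved value of c."""
    i0, i1 = moment_parts(k)
    return i0 - c * i1


mean = moment(1)
variance = moment(2) - mean ** 2  # Var(X) = E[X^2] - (E[X])^2

print("c =", c)
print("E[X] =", mean)
print("Var(X) =", variance)
```

Since X is measured in tens of thousands of dollars, the mean is in those units and the variance is in their square.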
Variance and Covariance. Let Y be a random variable representing income (in tens of thousands of
dollars) and X be a random variable representing years of education. Suppose that the marginal
distribution of X is described by its probability mass function
\[
p_X(x) =
\begin{cases}
0.05 & \text{if } x \in \{1, 2, \ldots, 12\} \\
0.09 & \text{if } x \in \{13, 14, 15, 16\} \\
0.04 & \text{if } x \in \{17\} \\
0 & \text{otherwise}
\end{cases}.
\]
The marginal distribution of Y is described by its probability density function
\[
f_Y(y) =
\begin{cases}
0.1 & \text{if } 0 \le y \le 10 \\
0 & \text{otherwise}
\end{cases}.
\]
(a) What is the expectation of Y , E[Y ]? What is its variance, Var(Y )?
(b) What is the expectation of X, E[X]? What is its variance, Var(X)?
(c) Using E[Y X] = 60 compute the covariance between Y and X, Cov(X, Y ).
(d) Calculate the correlation coefficient between X and Y,
\[
\rho_{YX} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.
\]
(e) What does this covariance tell us about the relationship between education levels and income? Is
there a positive or negative association?
(f) Should we interpret this result as a causal relationship between education and income? What are
some reasons we may want to refrain from this interpretation?
(g) (Challenge) A common inequality used in econometrics is the Cauchy-Schwarz inequality. It
states that, for any random variables X and Y, and any functions g(·) and h(·),
\[
\bigl| E[g(X)h(Y)] \bigr| \le \sqrt{E[g^2(X)]}\,\sqrt{E[h^2(Y)]}.
\]
Use this inequality to show why the correlation coefficient is bounded between negative one and
one, \(-1 \le \rho_{XY} \le 1\). (Hint: Try \(g(x) = x - \mu_X\) and \(h(y) = y - \mu_Y\).)
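The numerical parts (a) through (d) can be checked with a short script. This is a sketch under two stated assumptions: Y is treated as uniform on [0, 10] (which is what the constant density 0.1 implies), so the standard uniform mean and variance formulas apply, and the covariance uses the identity Cov(X, Y) = E[XY] − E[X]E[Y] with E[XY] = 60 as given. Variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Marginal pmf of X (years of education), as stated in the problem.
pmf_x = {x: Fraction(5, 100) for x in range(1, 13)}
pmf_x.update({x: Fraction(9, 100) for x in range(13, 17)})
pmf_x[17] = Fraction(4, 100)
assert sum(pmf_x.values()) == 1  # sanity check: a valid pmf

mean_x = sum(x * p for x, p in pmf_x.items())
var_x = sum(x * x * p for x, p in pmf_x.items()) - mean_x ** 2

# Y has constant density 0.1 on [0, 10], i.e. it is uniform on [0, 10]:
# mean (a + b) / 2 and variance (b - a)^2 / 12.
mean_y = Fraction(0 + 10, 2)
var_y = Fraction((10 - 0) ** 2, 12)

# Cov(X, Y) = E[XY] - E[X]E[Y], with E[XY] = 60 given in part (c).
cov_xy = 60 - mean_x * mean_y

# Correlation coefficient; converted to float because of the square root.
rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))

print("E[X] =", mean_x, " Var(X) =", var_x)
print("E[Y] =", mean_y, " Var(Y) =", var_y)
print("Cov(X, Y) =", cov_xy)
print("rho =", round(rho, 3))
```

The computed correlation lies inside [−1, 1], consistent with the bound that part (g) asks you to derive.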
Introduction to Single Linear Regression
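Useful Equalities. Recall that in deriving the form of \(\hat\beta_1\) we used the following equalities:
\[
\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X}) = \frac{1}{n}\sum_{i=1}^{n} Y_i X_i - \bar{Y}\bar{X}
\quad \text{and} \quad
\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - (\bar{X})^2.
\]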
Prove one of these equalities (you only need to show one of the two).
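Before attempting the algebra, it can help to confirm numerically that both equalities hold on a small data set. The sketch below does this with exact fractions. The sample values are hypothetical and chosen only for illustration, and a numerical check of this kind is not a substitute for the proof.

```python
from fractions import Fraction

# Hypothetical sample, chosen only for illustration.
xs = [Fraction(v) for v in (1, 4, 2, 7, 5)]
ys = [Fraction(v) for v in (3, 8, 2, 9, 6)]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# First equality: centered cross-product versus raw cross-product form.
lhs_cov = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / n
rhs_cov = sum(y * x for x, y in zip(xs, ys)) / n - y_bar * x_bar

# Second equality: centered sum of squares versus raw sum of squares form.
lhs_var = sum((x - x_bar) ** 2 for x in xs) / n
rhs_var = sum(x ** 2 for x in xs) / n - x_bar ** 2

print("covariance forms agree:", lhs_cov == rhs_cov)
print("variance forms agree:", lhs_var == rhs_var)
```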
Assumptions for Inference. Suppose we are interested in the relationship between the size of the average
American’s social circle, X, and whether or not they are unemployed, Y. To investigate this relationship
we want to estimate the following regression equation (see footnote 1):
\[
Y = \beta_0 + \beta_1 X + \epsilon, \qquad E[\epsilon] = E[\epsilon X] = 0.
\]
To estimate the regression coefficients we collect a sample of size n, \(\{Y_i, X_i\}_{i=1}^{n}\). Recall
that for valid asymptotic inference on our estimates \(\hat\beta_0\) and \(\hat\beta_1\) we require the
following assumptions: Random Sampling, Homoskedasticity, and the Rank Condition.
• Random Sampling: Assume that \(\{Y_i, X_i\}\) are independently and identically distributed draws from
the population of interest, \((Y_i, X_i) \overset{\text{i.i.d.}}{\sim} (Y, X)\).
• Homoskedasticity: Assume that \(\mathrm{Var}(\epsilon \mid X = x) = \sigma_\epsilon^2\) for all possible values of x.
• Rank Condition: There must be at least two distinct values of X that appear in the population.
(a) Suppose we collect our sample by only randomly surveying people on UCLA campus. Which
assumption would be violated?
(b) Suppose we collect our sample and find that everyone appears to have exactly one friend. Which
assumption would be violated? Why is this a problem when computing the line of best fit through
our sample?
(c) Suppose random sampling, homoskedasticity, and the rank condition are all satisfied, but n = 10.
Why might inferences based on the approximation
\[
\frac{\hat\beta_1 - \beta_1}{\hat\sigma_{\beta_1} / \sqrt{n}} \sim N(0, 1)
\]
not be valid?
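Part (b) can be made concrete by looking at what the line-of-best-fit formula does when every observed X is the same. The sketch below assumes the usual sample-analogue formulas, slope equal to the sample covariance over the sample variance of X, and intercept equal to Ȳ minus slope times X̄. The function name `ols_fit` and the sample values are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the sample line of best fit.

    Raises ValueError when X takes only one value in the sample, because
    the slope formula would then divide by a zero sample variance.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / n
    if s_xx == 0:
        raise ValueError("X has no variation in the sample; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# With variation in X the fit is well defined.
print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))

# Everyone reports exactly one friend: the sample variance of X is zero.
try:
    ols_fit([1, 1, 1, 1], [0, 1, 0, 1])
except ValueError as err:
    print("cannot fit:", err)
```

When X does not vary, any line through the point (X̄, Ȳ) fits the sample equally well, which is why no unique slope exists.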
Hypothesis Testing. Suppose now that we are interested in investigating the relationship between the
size of someone’s social circle, X, and their income (in tens of thousands of dollars), Y . We want to
estimate the following linear regression model
\[
Y = \beta_0 + \beta_1 X + \epsilon, \qquad E[\epsilon] = E[\epsilon X] = 0.
\]
Footnote 1: Recall that this regression specification corresponds to finding the line-of-best-fit
parameters \((\beta_0, \beta_1) = \arg\min_{b_0, b_1} E[(Y - b_0 - b_1 X)^2]\) and defining \(\epsilon = Y - \beta_0 - \beta_1 X\).
To do so we collect a random sample of size n = 64, \(\{Y_i, X_i\}_{i=1}^{64}\), and find that
\(\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = 100\), \(\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X}) = 225\),
\(\bar{Y} = 5.5\), and \(\bar{X} = 1.5\).
(a) Using this information, find and interpret \(\hat\beta_1\) and \(\hat\beta_0\).
(b) After finding \(\hat\beta_0\) and \(\hat\beta_1\), describe how you would construct the estimated residuals \(\hat\epsilon_i\).
(c) We find that \(\frac{1}{n}\sum_{i=1}^{n} \hat\epsilon_i^2 = 36\). Use this and the result that, for n large,
\[
\frac{\hat\beta_1 - \beta_1}{\hat\sigma_{\beta_1} / \sqrt{n}} \sim N(0, 1),
\]
to compute the (approximate) probability that, if the true value were \(\beta_1 = 0\), we would see
a value of \(|\hat\beta_1|\) equal to or larger than the one that we observed.
(d) Use this result to test, at level α = 0.1, the hypotheses
\[
H_0 : \beta_1 = 0 \quad \text{vs.} \quad H_1 : \beta_1 \neq 0.
\]
(e) Conduct this test in another fashion by constructing the test statistic \(t^*\) and comparing it to either
\(z_{0.95} = 1.64\) or \(z_{0.9} = 1.28\) (indicate which value you are comparing the test statistic to).
(f) Construct a 90% confidence interval for β1. How could we use this to conduct the hypothesis test
in part (d)?
(g) Suppose that we find we made an error in our calculation and actually
\(\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = 1\). If all other values stayed the same, how would this
change the result of the hypothesis test in part (d)?
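The calculations in this problem can be organized as a single function of the reported sample moments. The sketch below rests on an assumption that the problem does not state: that under homoskedasticity the estimate \(\hat\sigma_{\beta_1}\) is the square root of the mean squared residual divided by the sample variance of X. If the course defines \(\hat\sigma_{\beta_1}\) differently, that one line should be changed. The function name `slope_test` and its parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def slope_test(s_xy, s_xx, x_bar, y_bar, mean_sq_resid, n, alpha=0.10):
    """Two-sided large-sample test of H0: beta_1 = 0 from sample moments."""
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar

    # ASSUMPTION (not given in the problem): homoskedastic variance estimate
    # sigma_hat_beta1^2 = (mean squared residual) / (sample variance of X).
    sigma_beta1 = sqrt(mean_sq_resid / s_xx)
    std_err = sigma_beta1 / sqrt(n)

    t_star = (beta1_hat - 0) / std_err
    z = NormalDist()  # standard normal
    p_value = 2 * (1 - z.cdf(abs(t_star)))
    critical = z.inv_cdf(1 - alpha / 2)  # z_{0.95} when alpha = 0.10
    ci = (beta1_hat - critical * std_err, beta1_hat + critical * std_err)
    return {
        "beta0_hat": beta0_hat,
        "beta1_hat": beta1_hat,
        "t_star": t_star,
        "p_value": p_value,
        "critical": critical,
        "ci": ci,
        "reject": abs(t_star) > critical,
    }


# Values reported in the problem.
result = slope_test(s_xy=225, s_xx=100, x_bar=1.5, y_bar=5.5,
                    mean_sq_resid=36, n=64)
for name, value in result.items():
    print(name, "=", value)

# Part (g): same inputs except the sample variance of X.
result_g = slope_test(s_xy=225, s_xx=1, x_bar=1.5, y_bar=5.5,
                      mean_sq_resid=36, n=64)
print("part (g) t_star =", result_g["t_star"], "reject =", result_g["reject"])
```

Comparing `abs(t_star)` with `critical`, comparing `p_value` with `alpha`, and checking whether zero lies inside `ci` are three equivalent ways to carry out the same two-sided test.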