What parallels do you see between events in our times and the events and/or practices described in this reading?
What new things do we learn in this reading, and why are they important to the story that Orwell is telling?
- What parallels do you see between events in our times and the events and/or practices described in this reading?
- In what specific ways do you see this book coming true now, and in what ways did Orwell get it wrong?
- What questions do you still have about the reading, and what would you like to ask your peers about this material?
Find the expected value of the sum obtained when n fair coins are flipped independently.
Supervised Machine Learning and Learning Theory Fall 2022
Problem Set 1
Instructor: Hongyang Ryan Zhang
Due: September 30, 2022, 11:59pm
Instructions:
• You are expected to write up the solution on your own. Discussions and collaborations are
encouraged; remember to mention any fellow students you discussed with when you turn in
the solution.
• There are up to three late days for all the problem sets and project submissions. Use them
wisely. After that, the grade depreciates by 20% for every extra day. Late submissions are
considered case by case. Please reach out to the instructor if you cannot meet the deadline.
• Submit your written solutions to Gradescope and upload your code to Canvas. We
recommend writing up the solution in LaTeX.
Problem 1 (20 points)
(a) (1 point) Calculate Var(X) when X represents the outcome of a fair coin flip (i.e., E[X] =
1/2).
(b) (1 point) Find the expected value of the sum obtained when n fair coins are flipped
independently.
(c) (2 points) For three events A, B, and C, we know that: A and C are independent, B and C are
independent, A and B are disjoint, P(A ∪ C) = 2/3, P(B ∪ C) = 3/4, and P(A ∪ B ∪ C) = 11/12.
Find P(A), P(B), and P(C).
(d) (2 points) Consider a test to detect a disease (e.g., COVID-19), assuming that 0.6% of the
population has it. The test is 97% effective in detecting an infected person. However, the
test gives a false-positive result in 1% of cases (meaning that it shows a positive result if the
person is not infected). What is the probability that a person gets a negative test result?
(e) (2 points) If a person tests positive for the disease, what is the probability that they actually
have COVID?
(f) (2 points) If a person tests negative for the disease, what is the probability that they are
infected with COVID?
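The quantities in parts (d) to (f) all follow from the law of total probability and Bayes' rule. A minimal sketch, assuming that "97% effective" means the test returns a positive result for 97% of infected people (its sensitivity); the variable names are illustrative only:

```python
# Numbers stated in the problem.
prevalence = 0.006          # P(infected)
sensitivity = 0.97          # P(positive | infected), assumed reading of "97% effective"
false_positive_rate = 0.01  # P(positive | not infected)

# Law of total probability over the two groups (infected, not infected).
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_negative = 1 - p_positive

# Bayes' rule: P(infected | result) = P(result | infected) * P(infected) / P(result).
p_infected_given_positive = sensitivity * prevalence / p_positive
p_infected_given_negative = (1 - sensitivity) * prevalence / p_negative

print(f"P(negative)            = {p_negative:.5f}")
print(f"P(infected | positive) = {p_infected_given_positive:.5f}")
print(f"P(infected | negative) = {p_infected_given_negative:.6f}")
```

Because the disease is rare, most positive results come from the much larger uninfected group, which is why the posterior probability after a positive test stays well below the sensitivity.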
Along with the tests, data regarding the number of symptoms shown by the patients was also
recorded and is given below. The data was collected from 2 different sources.
Source 1:
No. of Symptoms    Patients
1                  20
2                  20
3                  20
4                  20

Source 2:
No. of Symptoms    Patients
1                  70
2                  15
3                  10
4                  5
(g) (2 points) Suppose you pick one patient from each of the above 2 sources independently.
What would be the expected number of symptoms detected in each of them?
(h) (2 points) Prove that Var(X) = E[X^2] − (E[X])^2. Explain the interpretation of this derivation.
(i) (2 points) Let Y1 and Y2 denote the number of symptoms detected in each of the above two
patients respectively, where Y1, Y2 ∈ {1, 2, 3, 4}. Then calculate the following quantities: (i)
E[Y1 Y2]; (ii) Var[Y1 − Y2].
(j) (2 points) Among a population of n people, let X be the number of people that test positive.
What is the expectation of X, E[X]? What is the variance of X, Var[X]? Make sure to
include all the steps in the calculation.
(k) (2 points) Define bias error and variance error. What do you understand by the bias-variance
trade-off?
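For parts (g) and (i), the two symptom tables can be turned into probability distributions by dividing each count by its source total. A minimal sketch, assuming the first table belongs to the first patient (Y1) and the second table to the second patient (Y2), and using the independence of the two draws stated in part (g); the helper name mean_and_variance is illustrative only:

```python
import numpy as np

symptoms = np.array([1, 2, 3, 4])
counts_source_1 = np.array([20, 20, 20, 20])  # first table
counts_source_2 = np.array([70, 15, 10, 5])   # second table

# Empirical probabilities: count / total number of patients in that source.
probs_1 = counts_source_1 / counts_source_1.sum()
probs_2 = counts_source_2 / counts_source_2.sum()

def mean_and_variance(values, probs):
    """Return E[Y] and Var(Y) = E[Y^2] - (E[Y])^2 for a discrete distribution."""
    mean = np.sum(values * probs)
    second_moment = np.sum(values**2 * probs)
    return mean, second_moment - mean**2

mean_1, var_1 = mean_and_variance(symptoms, probs_1)
mean_2, var_2 = mean_and_variance(symptoms, probs_2)

# Independence gives E[Y1 Y2] = E[Y1] E[Y2] and Var(Y1 - Y2) = Var(Y1) + Var(Y2).
expected_product = mean_1 * mean_2
variance_of_difference = var_1 + var_2

print(mean_1, mean_2, expected_product, variance_of_difference)
```

Note that the variance of a difference of independent variables is the sum, not the difference, of the individual variances.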
Problem 2 (20 points)
(a) (2 points) Show that for any matrix $X \in \mathbb{R}^{m \times n}$, the matrix $XX^\top$ is always positive
semi-definite.
(b) Recall that the SVD of a rank-r matrix M has the form
$$M = \sum_{i=1}^{r} \sigma_i u_i v_i^\top,$$
where $\{u_i\}_{i=1}^{r}$ denote the left singular vectors, $\{v_i\}_{i=1}^{r}$ denote the right singular vectors, and
$\{\sigma_i\}_{i=1}^{r}$ denote the singular values.
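i) (2 points) Let
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
Calculate the left and right singular vectors $\{u_i\}_{i=1}^{r}$ and $\{v_i\}_{i=1}^{r}$ of A. Then show that
$\{u_i\}_{i=1}^{r}$ and $\{v_i\}_{i=1}^{r}$ are the eigenvectors of $AA^\top$ and $A^\top A$, respectively.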
ii) (5 points) Let $M \in \mathbb{R}^{m \times n}$ be an arbitrary real-valued rank-r matrix. Show that the
eigenvectors of $MM^\top$ and $M^\top M$ are $\{u_i\}_{i=1}^{r}$ and $\{v_i\}_{i=1}^{r}$, respectively.
(c) Recall that the best rank-k approximation of M in Frobenius norm is attained by
$$B = \sum_{i=1}^{k} \sigma_i u_i v_i^\top.$$
i) (2 points) For the matrix A defined above, calculate the best rank-1 approximation B of
A in Frobenius norm. Then find the approximation error $\|A - B\|_F$.
ii) (5 points) Let $M \in \mathbb{R}^{m \times n}$ be an arbitrary real-valued rank-r matrix. Show that
$$\|M - B\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}.$$
(d) (4 points) Write a Python file to verify your calculation in (b-i) and (c-i). You may find the
functions numpy.linalg.svd and numpy.linalg.eig useful.
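The checks in parts (a) to (d) can be run numerically. A minimal sketch, assuming A is the 3 by 2 matrix with rows (1, 1), (1, 1), (1, −1); it uses numpy.linalg.eigh rather than numpy.linalg.eig because the matrices being decomposed are symmetric, which is a choice made here and not something the problem requires:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])

# Thin SVD: columns of U are left singular vectors, rows of Vt are right singular vectors.
U, singular_values, Vt = np.linalg.svd(A, full_matrices=False)

# Part (a): the eigenvalues of A A^T should be non-negative (up to round-off).
eigenvalues_AAt = np.linalg.eigh(A @ A.T)[0]
is_psd = bool(np.all(eigenvalues_AAt > -1e-10))

# Part (b): each singular vector should be an eigenvector with eigenvalue sigma_i^2.
left_ok = all(
    np.allclose(A @ A.T @ U[:, i], singular_values[i] ** 2 * U[:, i])
    for i in range(len(singular_values))
)
right_ok = all(
    np.allclose(A.T @ A @ Vt[i], singular_values[i] ** 2 * Vt[i])
    for i in range(len(singular_values))
)

# Part (c): best rank-1 approximation and its Frobenius error.
B = singular_values[0] * np.outer(U[:, 0], Vt[0])
rank1_error = np.linalg.norm(A - B, "fro")

print(singular_values, is_psd, left_ok, right_ok, rank1_error)
```

Singular vectors are only determined up to sign, so the printed vectors may differ in sign from a hand calculation while the eigenvector checks still pass.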
Problem 3 (15 points)
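(a) (6 points) For vectors $x \in \mathbb{R}^n$, $a \in \mathbb{R}^n$, $y \in \mathbb{R}^n$ and matrices $X \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{n \times n}$, show the following:
(i) $\frac{\partial\, a^\top x}{\partial x} = a$.
(ii) $\frac{\partial\, x^\top A x}{\partial x} = (A + A^\top)x$.
(iii) $\frac{\partial\, \|y - Ax\|_2^2}{\partial x} = 2A^\top(Ax - y)$.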
(b) (4 points) You are given a training set $\{(x_1, y_1), \dots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. Consider the regression problem
$$\min_{\theta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} (y_i - \theta^\top x_i)^2.$$
What is the minimizer of the above regression problem? Provide all steps of your derivation.
[Feel free to assume that the rank of $\{x_i\}_{i=1}^{n}$ is equal to d.]
(c) (5 points) Let the cost function to minimize be
$$J(w) = \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \sum_{j=1}^{d} w_j^2.$$
Prove that the vector $w^\star$ that minimizes $J(w)$ is
$$w^\star = (X^\top X + \lambda I)^{-1} X^\top y,$$
where X is the n by d design matrix whose i-th row is $x_i^\top$, and $y = (y_1, \dots, y_n)^\top$.
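The identities in part (a) and the closed form in part (c) can be sanity-checked numerically before attempting the proofs. A minimal sketch on random data, where numerical_grad is a hypothetical helper written for this illustration (central finite differences), not a library function:

```python
import numpy as np

rng = np.random.default_rng(0)

def numerical_grad(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grad[j] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

n = 4
a, x, y = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
A = rng.normal(size=(n, n))

# Part (a): compare finite-difference gradients with the stated formulas.
check_i = np.allclose(numerical_grad(lambda v: a @ v, x), a, atol=1e-5)
check_ii = np.allclose(numerical_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5)
check_iii = np.allclose(
    numerical_grad(lambda v: np.sum((y - A @ v) ** 2), x),
    2 * A.T @ (A @ x - y),
    atol=1e-5,
)

# Part (c): the ridge solution should make the gradient of J vanish.
n_samples, d, lam = 50, 3, 0.1
X = rng.normal(size=(n_samples, d))
targets = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n_samples)
w_star = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ targets)
grad_J = 2 * X.T @ (X @ w_star - targets) + 2 * lam * w_star

print(check_i, check_ii, check_iii, np.abs(grad_J).max())
```

Setting lam to 0 recovers the ordinary least-squares setting of part (b), provided X has full column rank; the 1/n factor in that objective does not change the minimizer.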
Problem 4 (15 points)
We consider a regression problem for predicting the demand of bike-sharing services in Washington
D.C. [1] The prediction task is to predict the demand for the bikes (column cnt) given the other
features; ignore the columns instant and dteday. Use the day.csv file from the data folder.
(a) (4 points) Write a Python file to load day.csv. [2] Compute the correlation coefficient of each
feature with the response (i.e., cnt). Include a table with the correlation coefficient of each
feature with the response. Which features are positively correlated (i.e., have a positive
correlation coefficient) with the response? Which feature has the highest positive correlation with
the response?
(b) (2 points) Were you able to find any features with a negative correlation coefficient with the
response? If not, can you think of a feature that is not provided in the dataset but may have
a negative correlation coefficient with the response?
(c) (5 points) Now, divide the data into training and test sets with the training set having about
70 percent of the data. Import train_test_split from sklearn to perform this operation.
Use an existing package to train a multiple linear regression model on the training set using all
the features (except the ones excluded above). Report the coefficients of the linear regression
model and the following metrics on the training data: (1) RMSE metric; (2) R^2 metric.
[Hint: You may find the class sklearn.linear_model.LinearRegression useful.]
(d) (2 points) Next, use the test set that was generated in the earlier step. Evaluate the trained
model in step (c) on the testing set. Report the RMSE and R^2 metrics on the testing set.
(e) (2 points) Interpret the results in your own words. Which features contribute mostly to the
linear regression model? Is the model fitting the data well? How large is the model error?
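The correlation coefficient, RMSE, and R^2 used in this problem can be written out directly. A minimal sketch in plain NumPy on synthetic data (not the bike-sharing data), so the array names, the three made-up features, and the manual 70/30 split are all assumptions for illustration; in the assignment itself the split and the model come from sklearn as described above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

# Synthetic stand-in for the data: three features and a noisy linear response.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 3))
response = 5 + features @ np.array([2.0, -1.0, 0.0]) + 0.5 * rng.normal(size=n)

# Pearson correlation of each feature with the response.
correlations = [np.corrcoef(features[:, j], response)[0, 1] for j in range(3)]

# Manual 70/30 split, then least squares with an intercept column.
order = rng.permutation(n)
train, test = order[:140], order[140:]
design = np.column_stack([np.ones(n), features])
coef, *_ = np.linalg.lstsq(design[train], response[train], rcond=None)

train_rmse = rmse(response[train], design[train] @ coef)
test_rmse = rmse(response[test], design[test] @ coef)
test_r2 = r2(response[test], design[test] @ coef)

print(correlations, train_rmse, test_rmse, test_r2)
```

Comparing the training and test values of the same metric is what parts (c) to (e) rely on: a large gap between them would suggest overfitting, while similar values suggest the linear model generalizes about as well as it fits.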
[1] https://www.kaggle.com/datasets/marklvl/bike-sharing-dataset?search=bike+demand+Washington&select=Readme.txt. You can also find a Readme.txt file that explains all the features in the dataset.
[2] Refer to https://docs.python.org/3/library/csv.html on how to load a csv file in Python.
Problem 5 (10 points)
This question should be answered using the Diabetes data set that is readily available in the
Scikit-learn library. [3] This data set has information about 442 patients; the response is a
quantitative measure of disease progression one year after baseline.
(a) (2 points) Fit a multiple regression model to predict Diabetes using Age, Sex, BMI, and BP.
(b) (4 points) Provide an interpretation of each coefficient in the model. Be careful—some of the
variables in the model are qualitative!
(c) (2 points) Write out the model in equation form, being careful to handle the qualitative
variables properly.
(d) (2 points) Using the model from (c), obtain 95% confidence intervals for the coefficient(s).
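Confidence intervals for regression coefficients come from the estimated covariance of the least-squares estimate, sigma^2 (X^T X)^{-1}. A minimal sketch on synthetic data, where ols_with_intervals is a hypothetical helper written for this illustration; it uses the normal quantile (about 1.96) instead of the exact t quantile, an approximation that is close when the sample is as large as 442:

```python
from statistics import NormalDist

import numpy as np

def ols_with_intervals(X, y, level=0.95):
    """Fit OLS with an intercept; return estimates and approximate confidence bounds."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - design.shape[1]                       # residual degrees of freedom
    sigma2 = residuals @ residuals / dof            # unbiased noise-variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    std_err = np.sqrt(np.diag(cov))
    z = NormalDist().inv_cdf(0.5 + level / 2)       # normal approximation to the t quantile
    return beta, beta - z * std_err, beta + z * std_err

# Synthetic stand-in: 442 rows and 4 predictors with known coefficients.
rng = np.random.default_rng(0)
true_beta = np.array([150.0, 10.0, -5.0, 30.0, 20.0])  # intercept first
X = rng.normal(size=(442, 4))
y = true_beta[0] + X @ true_beta[1:] + rng.normal(size=442)

beta, lower, upper = ols_with_intervals(X, y)
for name, b, lo, hi in zip(["intercept", "x1", "x2", "x3", "x4"], beta, lower, upper):
    print(f"{name:>9}: {b:8.3f}  [{lo:8.3f}, {hi:8.3f}]")
```

For the assignment, X would hold the age, sex, bmi, and bp columns of the data returned by sklearn.datasets.load_diabetes, and y its target. A coefficient whose interval excludes zero is distinguishable from zero at the chosen level.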
Problem 6 (20 points)
We will now perform cross-validation on a simulated data set.
(a) (2 points) Generate a simulated data set as follows:
import numpy
numpy.random.seed(12345)
x = numpy.random.normal(0, 1, (200))
y = x + 2 * x**2 - 2 * x**3 + numpy.random.normal(0, 1, (200))
In this data set, what is n and what is p? Write out the model used to generate the data in
equation form.
(b) (2 points) Create a scatterplot of X against Y. Comment on what you find. (Hint: You may
find matplotlib.pyplot.plot() helpful.)
(c) (9 points) Set a random seed 123, and then compute the leave-one-out cross validation errors
that result from fitting the following five models using least squares:
(i) $Y = \beta_0 + \beta_1 X + \varepsilon$
(ii) $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
(iii) $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$
(iv) $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 X^4 + \varepsilon$
(v) $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 X^4 + \beta_5 X^5 + \varepsilon$
[Hint: You may find LeaveOneOut() and cross_val_score() in sklearn.model_selection
helpful.]
[3] https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html. You can
find the description of this data set at https://scikit-learn.org/stable/datasets/toy_dataset.html.
(d) (2 points) Repeat (c) using another random seed 12345, and report your results. Are your
results the same as what you got in (c)? Why?
(e) (5 points) Which of the models in (c) had the smallest leave-one-out cross validation error?
Is this what you expected? Explain your answer.
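Leave-one-out cross-validation fits the model n times, each time holding out one observation and scoring the prediction on it. A minimal sketch using an explicit loop with numpy.polyfit instead of the sklearn helpers named in the hint; the function name loocv_mse is illustrative only:

```python
import numpy as np

# Data generated as in part (a).
np.random.seed(12345)
x = np.random.normal(0, 1, 200)
y = x + 2 * x**2 - 2 * x**3 + np.random.normal(0, 1, 200)

def loocv_mse(x, y, degree):
    """Mean squared leave-one-out prediction error of a degree-`degree` polynomial fit."""
    n = len(x)
    squared_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                             # drop observation i
        coeffs = np.polyfit(x[keep], y[keep], degree)        # least-squares polynomial fit
        squared_errors[i] = (y[i] - np.polyval(coeffs, x[i])) ** 2
    return squared_errors.mean()

errors = {degree: loocv_mse(x, y, degree) for degree in range(1, 6)}
for degree, err in errors.items():
    print(f"degree {degree}: LOOCV MSE = {err:.3f}")
```

Nothing in this loop draws random numbers: every observation is held out exactly once, so the result depends only on the data and not on any seed set before running the cross-validation.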
In your own words, how would you define that theory?
Chapter 5 of the textbook mentions the self-presentation theory. In your own words, how would you define that theory? Then, based on the various types of online communities outlined in the textbook, from the list below, what online community would you identify yourself with and why?
- Communities of practice
- Communities of interest
- Communities of transaction
- Communities of relationship
- Communities of fantasy
- Brand communities
- Social Networks
Does employment-at-will have better protections for employees or employers? Why or why not?
Primary Task Response: Within the Discussion Board area, write 400-600 words that respond to the following questions with your thoughts, ideas, and comments. This will be the foundation for future discussions by your classmates. Be substantive and clear, and use examples to reinforce your ideas.
Over the years, there has been much debate over the classification of employment-at-will employees. Employment-at-will is a term that refers to the protection that is applied to the employment relationship, such that the employer or the employee has the right to terminate the employment relationship at any time. There are different modifications to employment-at-will that vary at the state level. With your classmates, please discuss the following:
- Does employment-at-will have better protections for employees or employers? Why or why not?
- Choose a state and describe its modifications to employment-at-will. Do you agree with these modifications? Why or why not?
Based upon your experience or recent research, what are some examples of grievances?
Deliverable Length: 4–6 pages (not including cover page and resource page)
The National Labor Relations Act (NLRA) of 1935 was established to help protect the rights of employees and employers, encourage collective bargaining, and discourage harmful labor and management practices. It also created the National Labor Relations Board (NLRB). When employees work at a unionized company, the union negotiates for wages, hours, and other workplace factors. However, if an employee or group of employees finds that their company is not following parts of the collective bargaining agreement, they can file a grievance. In your assignment, please answer the following questions:
- Based upon your experience or recent research, what are some examples of grievances?
- If you had to report a grievance to the NLRB, what steps would you take?
- Using recent examples, what are the differences between mediation and arbitration?
- If you were a human resources (HR) professional, what strategies would you recommend the company take to create a work climate where disputes could be resolved and grievances are a last resort?
Describe 3 models of change.
Deliverable Length: 400-600 words
Now that you have given your team and management an overview of the reorganization and accompanying change impacts, you need to prepare your team to help manage the change. Using the Library, other materials, and credible Internet sources, complete the following:
- Describe 3 models of change.
- Discuss their advantages and disadvantages in relation to the situation and culture at Pegasus.
Cite your sources using APA guidelines.
Describes a step-by-step process of change for the reorganization.
You will be presenting your change design to the department heads of Pegasus. To prepare for this meeting, create a PowerPoint presentation of 5-7 slides (excluding the title and reference slide) with speaker notes of 200–250 words per slide that does the following:
- Describes a step-by-step process of change for the reorganization.
- From the 3 change models that you identified in your DB, determine which one is most appropriate for the Pegasus situation. Justify your answer.
Follow APA guidelines when citing sources.
Discuss the development of Pop Art in America.
Intro to Modern Art – Week 7 Assignment
Pop Art
Write 1000 words on Pop Art in America citing at least 3 sources.
Include the following aspects in the assignment:
· Discuss the development of Pop Art in America.
· Discuss at least three specific artists and titles of three artworks by three different Pop artists
· Discuss the media and imagery of Pop Art
Discuss an incident in which cultural considerations played a role in patient care OR in leadership.
- Do you work with a culturally diverse nursing staff?
- Is your patient population diverse?
- Discuss an incident in which cultural considerations played a role in patient care OR in leadership.
- How were decisions made regarding patient care?
- Was communication an issue in this case?
- How was the situation handled to ensure the best outcome for the patient, or if it was not handled well, what could have been done to increase awareness and sensitivity to the patient’s needs?
At least two references should be used to support the content of the initial post. At least one reference should be from outside the assigned textbook, should not be a website, and should have been written within the last 5 years.
Based on what you learned in this course, if you could give one piece of job advice to a new HR representative on how best to handle their new responsibilities, what would it be?
Part I: Reflect on your knowledge base of information pertaining to employment law prior to the start of this course. Identify one area in which you feel you have gained the most insight and how it will assist you in the future.
Part II: Based on what you learned in this course, if you could give one piece of job advice to a new HR representative on how best to handle their new responsibilities, what would it be?