Carry out an econometrics investigation of the determinants of new COVID-19 cases across the continent.
Econometric analysis applied to the investigation of the COVID-19 pandemic.
The “Our World in Data” website gives you access to a large set of time series and cross-country data related to the pandemic. The datasets are updated daily when new data from countries’ statistical agencies become available. A copy of the dataset collected on 10th December 2020 has been uploaded as an Excel file. The dataset also contains a description of all variables. You are asked to use this data to write an economic report based on a statistical investigation of the data outlined in the sections below.
Submission details
Alongside the project you should also submit: a) the do file containing all your commands and b) the data file you used for your analysis.
The Dataset
At the beginning of 2020 the organisation “Our World in Data” started the regular (daily) collection of data concerning the Covid-19 pandemic across countries. The data is available for public use and the dataset “Covid-19 Dataset.xlsx” was downloaded from the “Our World in Data” website on 4th December 2021. On the same site you can also find a detailed codebook with details of each variable in the dataset and its source (covid-19-data/public/data at master · owid/covid-19-data · GitHub). You are asked to use this dataset in order to complete the tasks below.
The Tasks
The World Health Organisation has asked you to carry out an analysis of the determinants of COVID-19 cases across a given continent and a given country.
AFRICA, KENYA
You are required to write a 1,800-word economic report in which you report on your findings. Your report is expected to be organised to include the following analysis.
1. For your allocated continent construct a panel dataset that, for each country, includes one observation per month, starting from March 2020, taken at the end of each month (i.e. the 30th of each month). Once you have set up your dataset you are asked to complete the following tasks:
a. Carry out an econometrics investigation of the determinants of new COVID-19 cases across the continent. Select the variables you plan to use in your econometric model, justify your selection, produce some summary statistics for each variable and carry out your analysis. (approx. 500 words, 10 marks)
b. What does the literature state on new COVID-19 cases in your chosen continent? Do your results in part 1a agree with those found in the literature? (approx. 200 words, 10 marks)
2. A “reproduction rate” (R) greater than one is regarded as the threshold beyond which the pandemic becomes ‘explosive’ with sharp increases in cases. The WHO is interested in investigating what factors are affecting the likelihood of the “reproduction rate” being greater than one.
a. By using a linear probability model and the observations from the countries in your continent on 25th November 2020, explain how you would go about evaluating such a likelihood and carefully present the findings of your analysis. (100 words; 10 marks)
b. Do you believe that modelling R in this way is appropriate? Can you comment on the minimum and maximum probabilities for the reproduction rate? What are they? (100 words; 5 marks)
c. If you believe that there is an alternative modelling strategy to modelling R, please proceed with it here. (approx. 200 words; 10 marks)
3. The WHO is also interested in supplying each individual member country with a detailed analysis of the pandemic in the country. You have been asked to provide an analysis for the country allocated to you over the period 1st April 2020 to 1st April 2021 and your analysis should contain:
a. A regression analysis aimed at estimating the determinants of new COVID-19 cases in the country over time. (approx. 200 words; 10 marks)
b. For your regression in part 3a, do you think that serial correlation might be a problem? (100 words; 10 marks)
c. Can you perform the same regression as in part 3a with a time trend included? (approx. 100 words; 10 marks)
d. Can you estimate a model to predict future new covid cases? (approx. 100 words; 10 marks)
4. In response to the pandemic, Covid-19 vaccines became available. For your allocated country and using the data available, do you think vaccinations helped make a difference? (approx. 200 words; 15 marks)
Project Guidelines
The aim of this project is to test your understanding of and ability to apply the statistical concepts and methodologies discussed throughout the module as well as your ability to analyse and evaluate the outcome of your analysis. The project is deliberately ‘open ended’ or, in other words, not very prescriptive in what and how you should conduct your analysis. You should refer to the material covered in the module and the activities carried out during the term to decide how to answer the questions and shape your investigation. To help your thinking, you may find the following guidelines helpful.
Task 1
In constructing your dataset and in commenting on the data try to think about questions such as: what type of data do you have in the original and in your adjusted dataset? How many variables do you have? What are the types of variables you have? How many countries and observations do you have? Are there variables containing missing observations? How do you handle the missing information? Overall, how would you regard the quality of your data? In investigating the COVID-19 cases and deaths across your continent can you see any pattern or trend? In addressing the regression analysis make sure to explain how you construct your econometric model by specifying its functional form and its estimated outcome. Make sure to interpret the estimated model, the significance of each individual estimate and the overall goodness of fit of the regression. Produce a clear account of your findings in such a way that WHO officials, who are not necessarily economists and/or statisticians, can understand the meaning of your analysis.
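As an illustration only, the monthly panel described for Task 1 could be set up in Stata along the following lines. This is a minimal sketch, not a prescribed solution. It assumes that the variable names match the Our World in Data codebook (`continent`, `iso_code`, `date`, `new_cases_smoothed_per_million`, `stringency_index`) and that `date` is imported as a numeric Stata daily date. It keeps the last available observation in each month rather than the 30th, because February has no 30th; if you make this choice, mention it in your report. The regressor shown is a placeholder for whatever variables you select and justify.

```stata
* Sketch under stated assumptions: variable names follow the OWID codebook
import excel using "Covid-19 Dataset.xlsx", firstrow clear
keep if continent == "Africa"

* Monthly identifier built from the daily date (assumes date is numeric, %td)
gen mdate = mofd(date)
format mdate %tm
keep if mdate >= tm(2020m3)

* Keep the last available daily observation within each country-month
bysort iso_code mdate (date): keep if _n == _N

* Declare the panel: country identifier and monthly time variable
encode iso_code, gen(country_id)
xtset country_id mdate

* Summary statistics, then a fixed-effects regression with clustered errors
summarize new_cases_smoothed_per_million stringency_index
xtreg new_cases_smoothed_per_million stringency_index, fe vce(cluster country_id)
```

Note that time-invariant country characteristics (for example median age) drop out of a fixed-effects regression, so you may want to compare it with a random-effects or pooled specification.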
Task 2
The “reproduction rate” is also commonly referred to as the R number. An R number greater than 1 leads to explosive growth in the number of new cases. The dataset contains estimates of the R number for all countries over time. You are asked to carry out an investigation on the ‘likelihood’ that the “reproduction rate” is greater than one. In other words, what factors are likely to influence the probability that the R number will be greater than one? This should be the focus of your analysis: identify those factors that are most likely to make the R number greater than one. Please note that for this task you are asked to use a cross-section of your database, i.e. one observation for each country in your continent at the specified date (25th November 2020).
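One possible way to approach this task in Stata is sketched below. It assumes OWID codebook names (`reproduction_rate`, `stringency_index`, `median_age`, `population_density`) and a numeric daily `date`; the choice of regressors is purely illustrative and you should justify your own. The sketch builds a binary indicator for R greater than one, estimates a linear probability model, inspects the range of the fitted probabilities, and then estimates a probit as one alternative whose predictions always lie between 0 and 1.

```stata
* Sketch under stated assumptions: variable names follow the OWID codebook
import excel using "Covid-19 Dataset.xlsx", firstrow clear
keep if continent == "Africa" & date == td(25nov2020)

* Binary outcome; the if-condition keeps missing R values missing,
* because Stata treats a missing value as larger than any number
gen byte r_above_one = reproduction_rate > 1 if !missing(reproduction_rate)

* Linear probability model with heteroskedasticity-robust standard errors
regress r_above_one stringency_index median_age population_density, vce(robust)
predict p_lpm, xb
summarize p_lpm          // check whether min < 0 or max > 1

* Alternative: probit, with average marginal effects for interpretation
probit r_above_one stringency_index median_age population_density
margins, dydx(*)
predict p_probit, pr
summarize p_probit
```

Comparing the minimum and maximum of `p_lpm` with those of `p_probit` gives a direct way to discuss whether the linear probability model is appropriate.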
Task 3
In addressing this question reflect on what type of data you have and what type of analysis you are asked to carry out. How does it differ from the analysis you carried out in the previous two parts? Make sure to provide a brief but informative analysis of the COVID-19 cases and deaths for the country assigned to you. You should set up your econometric model and estimate it. Are you, perhaps, considering more than one model because of data availability? As in the previous two parts make sure to comment on your findings both in terms of the estimated coefficients and the goodness of fit. Can you reassure the reader that your estimates are unbiased and efficient? Can you use your model for some forecasting of future COVID-19 cases? Make sure to check that your estimation is providing you with accurate and valid estimates.
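A possible starting point for the time-series analysis is sketched below. It assumes OWID codebook names (`location`, `new_cases`, `stringency_index`) and a numeric daily `date`; the single regressor is a placeholder. The sketch estimates a static regression, tests for serial correlation with the Breusch-Godfrey test, adds a linear time trend, and finally fits a simple model with a lagged dependent variable whose fitted values could serve as a basis for one-step-ahead prediction.

```stata
* Sketch under stated assumptions: variable names follow the OWID codebook
import excel using "Covid-19 Dataset.xlsx", firstrow clear
keep if location == "Kenya" & inrange(date, td(01apr2020), td(01apr2021))

* Declare the data as a daily time series
tsset date

* Static regression and Breusch-Godfrey test for serial correlation
regress new_cases stringency_index
estat bgodfrey, lags(1 7)

* Same regression with a linear time trend (days since 1st April 2020)
gen t = date - td(01apr2020)
regress new_cases stringency_index t

* Newey-West standard errors, robust to autocorrelation up to 7 lags
newey new_cases stringency_index t, lag(7)

* Simple dynamic model; fitted values as one-step-ahead predictions
regress new_cases L.new_cases stringency_index t
predict cases_hat, xb
```

The lag lengths (1 and 7) are illustrative choices reflecting daily data with a possible weekly reporting pattern; other choices may suit your data better.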
Task 4
This question is open ended. It is up to you how you want to analyse the impact of vaccinations on the pandemic for your allocated country. There are many variables you can use to study the impact of the vaccinations. Start by thinking of what variables you want to use and then think about what type of analysis you want to use. If you have limited data availability how is that going to impact your analysis? Make sure you come back to the actual question and answer it.
Report Style
The project gives you an indication of the number of words for each task. However, within the word limit of 1,800 words, you should feel free to arrange the number of words for each task in a way that best fits your approach. You should also feel free to organise the report in whatever way you think is most appropriate. You can divide the report into four parts (one for each task) but you should also consider writing one single report that starts with a short introduction, continues with the main body that contains the analysis of the four tasks and then concludes with a brief summary. Make sure that you carefully comment on all the evidence emerging from your statistical analysis. Always justify your choices. Make sure that your writing style is clear and accurate. Make sure that all tables and figures are labelled and numbered. If you are using some external sources of information, make sure to use appropriate citation and referencing rules. Also, please make sure to number your pages.
Missing Observations and Time Periods
The dataset is updated every day but not all countries are able to regularly report data for all observations. This means that your analysis is likely to be affected by missing observations that will reduce the power of your analysis. Of course, there is not much that can be done about this problem. However, it will be fine for you to slightly change the dates given in the three tasks if you think that by changing the dates you will have fewer missing observations and, hence, a better analysis. If you decide to change some of the time periods, please mention it in your report.
Managing the Dataset
The dataset is available in Excel format. While the regression and statistical analysis should be conducted in STATA, it will be fine for you to prepare the dataset, plot graphs and produce basic statistics in Excel if you wish to do so.
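If you prefer to prepare the data in STATA rather than Excel, the opening lines of a do file might look like the sketch below. It assumes that the first row of the workbook holds the variable names and that the columns `date` and `continent` exist under those names, as in the Our World in Data codebook. Because the date column may arrive either as text or as a numeric date depending on how the workbook is formatted, the sketch handles both cases and stores the result in a new variable, `ddate`, which is a hypothetical name chosen for illustration.

```stata
* Sketch under stated assumptions: first row of the workbook holds variable names
import excel using "Covid-19 Dataset.xlsx", firstrow clear

* Build a numeric daily date whether the column was imported as text or as a date
capture confirm string variable date
if _rc == 0 {
    gen double ddate = date(date, "YMD")
}
else {
    gen double ddate = date
}
format ddate %td

* Inspect variable types and the extent of missing observations
describe
misstable summarize

* Restrict the sample to the allocated continent
keep if continent == "Africa"
```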
