Exercise 1 (10 marks)
For this exercise you will be using Stata’s built-in auto dataset. Type sysuse auto at the command prompt to load the data (in this dataset, “domestic” means American).
a) Briefly describe the data. What is each observation? What is the sample size?
b) Give descriptive statistics for mileage, price, length and weight, both for the whole sample, as well as separately for foreign and domestic models.
c) We are interested in the differences between domestic and foreign models. A representative of American car manufacturers claims that domestic models are both cheaper and have better mileage (they travel more miles per gallon of gas consumed). Test both hypotheses at alpha = 0.05. Show and describe your output.
d) Categorize cars with prices higher than 5500 as “expensive”. Test the hypothesis that the proportion of expensive models is higher for foreign cars at alpha = 0.05.
e) Use graphs and statistics to explore the relationship between price and mileage. How do these variables seem to be related?
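As a sketch of how the claims in parts c) and d) can be written as one-sided hypotheses: the notation below (means \mu, proportions p, subscripts D for domestic and F for foreign, counts x of expensive models) is assumed here for illustration and is not part of the exercise. The z statistic shown is the usual pooled two-sample test of proportions.

```latex
\[
\begin{aligned}
\text{c) price:}   \quad & H_0: \mu^{\text{price}}_D = \mu^{\text{price}}_F
                     \quad \text{vs.} \quad H_1: \mu^{\text{price}}_D < \mu^{\text{price}}_F \\
\text{c) mileage:} \quad & H_0: \mu^{\text{mpg}}_D = \mu^{\text{mpg}}_F
                     \quad \text{vs.} \quad H_1: \mu^{\text{mpg}}_D > \mu^{\text{mpg}}_F \\
\text{d) expensive:} \quad & H_0: p_F = p_D
                     \quad \text{vs.} \quad H_1: p_F > p_D \\[4pt]
& z = \frac{\hat{p}_F - \hat{p}_D}
           {\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_F} + \dfrac{1}{n_D}\right)}},
  \qquad \hat{p} = \frac{x_F + x_D}{n_F + n_D}
\end{aligned}
\]
```

Because each alternative is one-sided, the relevant p-value is the one for the stated direction, not the two-sided value.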
Exercise 2 (20 marks)
Assume you want to analyse the causes of injuries in road traffic accidents for a group of OECD countries. Income, alcohol consumption and population structure are among the indicators that you suspect may influence road traffic injuries. The table below lists the variables that you can find in the dataset injuries.xls and their description.
Variable | Description |
injuries | Injuries in road traffic accidents (Injured per million population) |
GDP | Gross Domestic Product (per capita, US$ at purchasing power parity) |
alcohol | Alcohol Consumption (liters per capita) |
pop | Total population (in thousands) |
pop0_14 | Population aged 0 to 14 years old (in thousands) |
pop15_64 | Population aged 15 to 64 years old (in thousands) |
pop65 | Population aged 65 and over (in thousands) |
a) Plot injuries in road traffic accidents against GDP, alcohol consumption and the proportion of the population aged 15 to 64. Describe any patterns you observe and comment on them.
b) Use appropriate simple regression models to determine the relationship between injuries in road traffic accidents and each of the explanatory variables considered in part a). (Your answer should include: a description of the dependent and independent variables, the equation of each model considered, the number of observations included in each regression, the significance and interpretation of all coefficients, an interpretation of the ANOVA table, a comment on the goodness of fit of the model, etc.)
c) Suppose you now want to estimate a linear-log model that explains the relationship between injuries and the log of GDP. Run the regression and interpret the coefficient. Comment on the general results and compare them with the case in which you analyzed the linear relationship between injuries and GDP.
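For reference, the two specifications being compared can be written as below. This is a minimal sketch: y stands for the dependent variable, and the coefficient names (\alpha, \beta) and error terms (v, u) are notation assumed here, not taken from the exercise.

```latex
\[
\begin{aligned}
\text{linear:}     \quad & y_i = \alpha_0 + \alpha_1\,\mathrm{GDP}_i + v_i,
  && \Delta y \approx \alpha_1\,\Delta \mathrm{GDP} \\
\text{linear-log:} \quad & y_i = \beta_0 + \beta_1 \ln(\mathrm{GDP}_i) + u_i,
  && \Delta y \approx \frac{\beta_1}{100}\,\bigl(\%\Delta \mathrm{GDP}\bigr)
\end{aligned}
\]
```

In the linear-log form, a 1% increase in GDP is associated with a change of roughly \beta_1/100 units in y, whereas in the linear form \alpha_1 is the change in y per one-unit change in GDP.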
Exercise 3 (35 marks)
The table below shows cost-of-living indexes for different geographical areas. Assume you want to see which indexes affect the grocery cost-of-living index.
a) Plot grocery index against each of the other indexes. Can you observe any patterns?
b) Run individual regressions of grocery index on each of the other indexes. Give an interpretation of the coefficients. Discuss which variables seem to explain the grocery index and the goodness of fit of the models.
c) You now want to estimate the elasticity of the grocery index with respect to each of the other indexes. Estimate the individual log-log regression models and interpret the results.
d) Now you think that the grocery cost of living index should be explained by all other indexes and that this will give you improved results. Obtain estimates of this multiple linear model. Discuss the results. Discuss the reason why you may want to include all indexes as explanatory variables.
Grocery | Housing | Utilities | Transportation | Healthcare |
108.3 | 106.8 | 127.4 | 89.1 | 107.5 |
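The models in parts c) and d) can be sketched as follows, using the column names from the table. X stands for any one of the four non-grocery indexes, and the coefficient names are notation assumed here for illustration.

```latex
\[
\begin{aligned}
\text{c) log-log:} \quad & \ln(\mathrm{Grocery}_i) = \beta_0 + \beta_1 \ln(X_i) + u_i,
  \qquad \beta_1 = \frac{\partial \ln \mathrm{Grocery}}{\partial \ln X} \\
\text{d) multiple:} \quad & \mathrm{Grocery}_i = \gamma_0
  + \gamma_1\,\mathrm{Housing}_i
  + \gamma_2\,\mathrm{Utilities}_i
  + \gamma_3\,\mathrm{Transportation}_i
  + \gamma_4\,\mathrm{Healthcare}_i + \varepsilon_i
\end{aligned}
\]
```

In the log-log form the slope is the elasticity of the dependent variable with respect to the regressor: a 1% increase in X is associated with approximately a \beta_1% change in the grocery index. In the multiple model each \gamma_j is a partial effect, holding the other indexes fixed.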
Exercise 4 (35 marks)
For this exercise you will be using Stata’s built-in blood pressure dataset. Type sysuse bplong at the command prompt to load the data.
a) Briefly describe the data. What is each observation, and what kind of information is provided for the observations?
b) We want to know whether gender and age are associated with high blood pressure before treatment. Run the appropriate regression and explain the results (be careful about the age categories).
c) Now we want to know if the treatment has been successful in reducing patients’ blood pressure. Test this hypothesis at the 5% significance level. Hint: you need to figure out how to get the data in the right format for this.
d) Finally, use a regression to comment on whether the reduction in blood pressure was related to gender or age (i.e. did the treatment work better for women? For older people?).
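One possible specification for part d), offered as a sketch and not as the required answer, regresses each patient's change in blood pressure on gender and age-group indicators. The variable names (Female, A_2, A_3) and the choice of reference categories are assumptions made here for illustration.

```latex
\[
\begin{aligned}
\Delta bp_i &= bp_i^{\text{after}} - bp_i^{\text{before}} \\
\Delta bp_i &= \beta_0 + \beta_1\,\mathrm{Female}_i + \beta_2\,A_{2i} + \beta_3\,A_{3i} + u_i
\end{aligned}
\]
```

Here A_2 and A_3 are dummies for the two non-reference age categories, so \beta_0 is the average change for men in the reference age group. A negative \beta_1 would indicate a larger average reduction for women, holding age group fixed, and \beta_2 and \beta_3 compare each age category with the reference group.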