Dear all,

Thank you for reading my thread. Experts, please pardon me if this question seems silly, unusual, or too lengthy to answer. Your time is highly appreciated, as it would guide me in solving parts (A) and (B) of my problem.

Here is the description of the data set:

A real estate agent is trying to understand the nature of housing stock and home prices in and around a medium-sized town in upstate New York. She has collected data from a random sample of 1047 homes sold in the last 12 months. Data was collected on the following variables, and is available in the attached houseprices.csv file.

Price: the sale price of the house, in $
Living Area: living area, in sq. ft.
Bathrooms: number of bathrooms in the house (powder rooms with no tub or shower area are considered 0.5 baths)
Bedrooms: the number of bedrooms
Lot Size: size of the property on which the house sits, in acres
Age: age of the house, in years
Fireplace: whether or not the house has a fireplace (Yes = 1, No = 0)

===============

Part (A)

1. Prepare a brief report summarizing the home values (prices) in this area. Use both graphical and numerical summaries. Your report should briefly describe what those summaries tell you, and anything of particular note/interest.

2. Does the normal model provide a good description of the prices? Use a Normal Quantile plot to frame your response.

3. Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)^2). Given this:

   A. Calculate the following probabilities: P(Price > 92.8K) and P(Price < 255.5K). Do these numbers agree with what you see in the data?

   B. Once again, assuming the above normal distribution, what percentage of houses should have a value less than 232K? Does that agree with the data?

   C. Based on the theoretical model, what do you expect should be the price of a house that is exactly on the 3rd quartile (75th percentile)? How does that compare to the actual?

4. Create a histogram and boxplot for the Living Area variable. What does the histogram tell you that the boxplot does not, and vice-versa? Is the distribution symmetric? Check the skewness measure to see if it is consistent with your observation.

5. Create a new column in the dataset by taking the logarithm of the Living Area variable. Is the normal distribution a better fit for this variable or the original (Living Area) variable? Why do you think this is the case?
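
For question 3 of Part (A), the calculations under the assumed normal model can be sketched in Python as follows. This is only a minimal sketch using the standard library's `statistics.NormalDist`; the variable names and the helper `fraction_below` are hypothetical, and `prices` is assumed to be a list of sale prices in dollars already loaded from houseprices.csv.

```python
from statistics import NormalDist

# Assumed model from question 3: Price ~ N(164K, (68K)^2), in dollars.
price_model = NormalDist(mu=164_000, sigma=68_000)

# 3A: tail probabilities under the model.
p_above_92_8k = 1 - price_model.cdf(92_800)   # P(Price > 92.8K)
p_below_255_5k = price_model.cdf(255_500)     # P(Price < 255.5K)

# 3B: 232K is exactly one standard deviation above the mean.
p_below_232k = price_model.cdf(232_000)       # P(Price < 232K)

# 3C: theoretical 3rd quartile (75th percentile) of the model.
q3_price = price_model.inv_cdf(0.75)


def fraction_below(prices, threshold):
    """Empirical share of sampled prices strictly below `threshold`
    (hypothetical helper for comparing the model with the data)."""
    return sum(p < threshold for p in prices) / len(prices)
```

To check agreement with the data, each model probability would be compared with the matching empirical share, for example `fraction_below(prices, 232_000)` against `p_below_232k`.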

===========

Part (B)

1. Create the 90%, 95%, and 99% confidence intervals for the average home price and explain what these mean. How do the margins of error for these three confidence intervals compare? Does that make sense? Before creating the confidence intervals, be sure to check the conditions necessary to create confidence intervals (and briefly describe this in your submission).

2. Your friend has asked you to provide an estimate for the 95th percentile of home prices in this market. Which (if any) of the above confidence intervals can you use to give an answer? Describe briefly.

3. The sample data given to you all come from home sales within the past 12 months. Suppose you had sample data of the same size each year going back several years, and calculated the average sale price for each year. What kind of distribution do you expect to see for these averages and why? (Include the parameters of the distribution in your response, assuming that the house prices don't change, i.e. go up or down, over time. Clearly, this is not a great assumption, but make it anyway.)

4. The architecture changed significantly in this geographical area about 30 years ago. So any houses aged more than 30 years are considered old houses. What proportion of the houses in the sample is old? Provide the 95% and 99% confidence intervals for the proportion of old houses in this area, and interpret them. Once again, make sure that the necessary conditions are satisfied before creating confidence intervals.
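
For questions 1 and 4 of Part (B), the two kinds of confidence interval can be sketched in Python as below. This is a rough sketch under stated assumptions, not a definitive method: it uses normal (z) critical values rather than t values, which should make little difference with n = 1047, and it uses the simple large-sample (Wald) interval for the proportion. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_critical(level):
    """Two-sided standard normal critical value, e.g. level=0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def mean_ci(values, level):
    """Large-sample confidence interval for a population mean,
    using the sample standard deviation and a z critical value."""
    margin = z_critical(level) * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - margin, centre + margin


def proportion_ci(successes, n, level):
    """Large-sample (Wald) confidence interval for a proportion.
    Reasonable only when both successes and failures are numerous."""
    p_hat = successes / n
    margin = z_critical(level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin
```

Assuming `prices` and `ages` are lists read from houseprices.csv, usage might look like `mean_ci(prices, 0.90)` for the average price, and `proportion_ci(sum(a > 30 for a in ages), len(ages), 0.95)` for the share of old houses. The margin of error grows with the confidence level because `z_critical` increases.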

Warm regards

Ravin