An origin-destination survey in 10 travel-analysis zones provided the following data on zonal residential densities (households/acre) and average daily trip productions per household:
Density X: 42 5 25 10 4 15 20 12 14 22
Trip Rate Y: 1.5 4 2.1 2.6 4.8 2 2.5 3.3 1.9 2
Use the method of least squares to develop a regression model for predicting trip production rates as a function of density.
a. Plot the data and the model.
b. How good is the fit?


Answer:

a) [tex]y=-0.0715 x +3.878[/tex]

b) [tex]r=\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2 -(\sum x)^2][n\sum y^2 -(\sum y)^2]}}[/tex]  

For our case we have this:

n=10 [tex] \sum x = 169, \sum y = 26.7, \sum xy = 370.9, \sum x^2 =3979, \sum y^2 =81.21[/tex]  

[tex]r=\frac{10(370.9)-(169)(26.7)}{\sqrt{[10(3979) -(169)^2][10(81.21) -(26.7)^2]}}=-0.761[/tex]  

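The coefficient of determination is [tex] r^2= 0.579[/tex]

We can conclude that the linear model explains about 57.9% of the variation in trip rates, so the fit is moderate; the negative sign of r indicates that trip rates fall as density rises.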

Explanation:

a. Plot the data and the model.

Data given:

X: 42 5 25 10 4 15 20 12 14 22

Y: 1.5 4 2.1 2.6 4.8 2 2.5 3.3 1.9 2

For this case we need to calculate the slope with the following formula:

[tex]m=\frac{S_{xy}}{S_{xx}}[/tex]

Where:

[tex]S_{xy}=\sum_{i=1}^n x_i y_i -\frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}[/tex]

[tex]S_{xx}=\sum_{i=1}^n x^2_i -\frac{(\sum_{i=1}^n x_i)^2}{n}[/tex]

So we can find the sums like this:

[tex]\sum_{i=1}^n x_i =169[/tex]

[tex]\sum_{i=1}^n y_i =26.7[/tex]

[tex]\sum_{i=1}^n x^2_i =3979[/tex]

[tex]\sum_{i=1}^n y^2_i =81.21[/tex]

[tex]\sum_{i=1}^n x_i y_i =370.9[/tex]
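These five sums are easy to check by machine. The following is a minimal Python sketch, not part of the original answer; the variable names are illustrative only.

```python
# Survey data from the problem statement (10 zones).
density = [42, 5, 25, 10, 4, 15, 20, 12, 14, 22]          # households/acre
trip_rate = [1.5, 4, 2.1, 2.6, 4.8, 2, 2.5, 3.3, 1.9, 2]  # trips/household/day

n = len(density)
sum_x = sum(density)                                     # sum of x
sum_y = sum(trip_rate)                                   # sum of y
sum_xx = sum(x * x for x in density)                     # sum of x^2
sum_yy = sum(y * y for y in trip_rate)                   # sum of y^2
sum_xy = sum(x * y for x, y in zip(density, trip_rate))  # sum of x*y

# Round the float sums only for display.
print(n, sum_x, round(sum_y, 2), sum_xx, round(sum_yy, 2), round(sum_xy, 2))
# prints: 10 169 26.7 3979 81.21 370.9
```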

With these sums we can compute [tex]S_{xx}[/tex] and [tex]S_{xy}[/tex]:

[tex]S_{xx}=\sum_{i=1}^n x^2_i -\frac{(\sum_{i=1}^n x_i)^2}{n}=3979-\frac{169^2}{10}=1122.9[/tex]

[tex]S_{xy}=\sum_{i=1}^n x_i y_i -\frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}=370.9-\frac{(169)(26.7)}{10}=-80.33[/tex]

And the slope would be:

[tex]m=-\frac{80.33}{1122.9}=-0.0715[/tex]

Now we can find the means of x and y:

[tex]\bar x= \frac{\sum x_i}{n}=\frac{169}{10}=16.9[/tex]

[tex]\bar y= \frac{\sum y_i}{n}=\frac{26.7}{10}=2.67[/tex]

And we can find the intercept using this:

[tex]b=\bar y -m \bar x=2.67-(-0.0715)(16.9)=3.878[/tex]

So the line would be given by:

[tex]y=-0.0715 x +3.878[/tex]
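The slope and intercept can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original answer, and `fit_line` is a hypothetical helper name. It carries full precision through the calculation, so the intercept comes out near 3.879; the 3.878 above results from rounding the slope to -0.0715 before computing the intercept.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    m = s_xy / s_xx                    # slope m = S_xy / S_xx
    b = sum_y / n - m * (sum_x / n)    # intercept b = y_bar - m * x_bar
    return m, b


density = [42, 5, 25, 10, 4, 15, 20, 12, 14, 22]          # households/acre
trip_rate = [1.5, 4, 2.1, 2.6, 4.8, 2, 2.5, 3.3, 1.9, 2]  # trips/household/day

m, b = fit_line(density, trip_rate)
print(f"y = {m:.4f} x + {b:.3f}")
```

As a usage example, for a hypothetical zone with 30 households/acre, `m * 30 + b` gives a predicted rate of roughly 1.73 trips per household per day.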

The plot of the data and the fitted line is shown in the attached figure.
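To draw a comparable plot yourself, something like the following could work. It is only a sketch, not the code behind the attached figure: it assumes matplotlib is installed, and `fit_line`, `plot_fit`, and the output file name are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    m = s_xy / s_xx
    return m, sum_y / n - m * (sum_x / n)


def plot_fit(xs, ys, path="trip_rate_fit.png"):
    """Save a scatter plot of the data with the fitted line (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display required
    import matplotlib.pyplot as plt

    m, b = fit_line(xs, ys)
    x_line = [min(xs), max(xs)]           # two points are enough for a line
    y_line = [m * x + b for x in x_line]

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label="survey zones")
    ax.plot(x_line, y_line, color="red", label=f"y = {m:.4f} x + {b:.3f}")
    ax.set_xlabel("Residential density (households/acre)")
    ax.set_ylabel("Trip rate (trips/household/day)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path


density = [42, 5, 25, 10, 4, 15, 20, 12, 14, 22]
trip_rate = [1.5, 4, 2.1, 2.6, 4.8, 2, 2.5, 3.3, 1.9, 2]
```

Calling `plot_fit(density, trip_rate)` writes the image to the given path and returns that path.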

b. How good is the fit?

For this case we can use the correlation coefficient. It is a "statistical measure that calculates the strength of the relationship between the relative movements of two variables". It is denoted by r and is always between -1 and 1.

To calculate the correlation coefficient we can use this formula:

[tex]r=\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n\sum x^2 -(\sum x)^2][n\sum y^2 -(\sum y)^2]}}[/tex]  

For our case we have this:

n=10 [tex] \sum x = 169, \sum y = 26.7, \sum xy = 370.9, \sum x^2 =3979, \sum y^2 =81.21[/tex]  

[tex]r=\frac{10(370.9)-(169)(26.7)}{\sqrt{[10(3979) -(169)^2][10(81.21) -(26.7)^2]}}=-0.761[/tex]  
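
As a cross-check that is not part of the original answer, the same value follows from the quantities already computed for the slope, with [tex]S_{yy}[/tex] defined in the same way as [tex]S_{xx}[/tex]:

[tex]S_{yy}=\sum_{i=1}^n y^2_i -\frac{(\sum_{i=1}^n y_i)^2}{n}=81.21-\frac{26.7^2}{10}=9.921[/tex]

[tex]r=\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}=\frac{-80.33}{\sqrt{(1122.9)(9.921)}}=-0.761[/tex]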

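The coefficient of determination is [tex] r^2= 0.579[/tex]

We can conclude that the linear model explains about 57.9% of the variation in trip rates, so the fit is moderate; the negative sign of r indicates that trip rates fall as density rises.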
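
The correlation coefficient and its square can also be verified numerically. The sketch below applies the formula above directly; it is not part of the original answer, and `correlation` is a hypothetical helper name.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


density = [42, 5, 25, 10, 4, 15, 20, 12, 14, 22]          # households/acre
trip_rate = [1.5, 4, 2.1, 2.6, 4.8, 2, 2.5, 3.3, 1.9, 2]  # trips/household/day

r = correlation(density, trip_rate)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
# prints: r = -0.761, r^2 = 0.579
```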
