| Person | Age | Personal Income | Density (persons/mi²) |
|---|---|---|---|
| 1 | 20 | 30,000 | 140,115 |
| 2 | 46 | 62,000 | 43,469 |
| 3 | 44 | 110,000 | 13,970 |
| 4 | 25 | 51,000 | 3,317 |
| 5 | 23 | 30,000 | 23,429 |
\[ Var(x) = \frac{\sum\limits_{i=1}^{n}(x_i - \bar x)^2}{n - 1} \]
\[ Var(x) = \frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(x_i - \bar x)]}{n - 1} \]
\[ Cov(x, y) = \frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)]}{n - 1} \]
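The second form of the variance formula above is the covariance formula with y replaced by x, so variance is the covariance of a variable with itself. A minimal R sketch using the ages from the table above (the vector name `age` is chosen here for illustration):

```r
# Ages of the five people in the table above
age = c(20, 46, 44, 25, 23)

# Variance by hand: sum of squared deviations from the mean, over n - 1
var_by_hand = sum((age - mean(age))^2) / (length(age) - 1)

# R's built-in var() and cov() use the same n - 1 denominator,
# so all three lines print the same number
var_by_hand
var(age)
cov(age, age)
```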
| | When x is less than its mean | When x is more than its mean |
|---|---|---|
| And y is less than its mean | + | - |
| And y is more than its mean | - | + |
| Person | Age | Personal Income |
|---|---|---|
| 1 | −12 | −26,600 |
| 2 | 14 | 5,400 |
| 3 | 12 | 53,400 |
| 4 | −7 | −5,600 |
| 5 | −9 | −26,600 |
| Person | Age | Personal Income | Product |
|---|---|---|---|
| 1 | −12 | −26,600 | 308,560 |
| 2 | 14 | 5,400 | 77,760 |
| 3 | 12 | 53,400 | 662,160 |
| 4 | −7 | −5,600 | 36,960 |
| 5 | −9 | −26,600 | 228,760 |
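The deviation and product tables above can be reproduced in a few lines of R. This is a sketch with variable names chosen for illustration. The age deviations in the table are rounded to whole numbers, but the products appear to use the unrounded values (for example, −11.6 × −26,600 = 308,560).

```r
age = c(20, 46, 44, 25, 23)
income = c(30000, 62000, 110000, 51000, 30000)

# Deviations from the mean (mean age is 31.6, mean income is 56,600)
age_dev = age - mean(age)
income_dev = income - mean(income)

# Product of the deviations for each person
product = age_dev * income_dev
product

# Covariance: sum of the products divided by n - 1
covariance = sum(product) / (length(age) - 1)
covariance

# The built-in function gives the same answer
cov(age, income)
```

The products sum to 1,314,200, so the covariance is 1,314,200 / 4 = 328,550.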
\[ Cor(x, y) = \frac{\quad\frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)]}{n - 1}\quad}{s_x s_y} \]
\[ Cor(x, y) = \frac{\quad\frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)]}{n - 1}\quad}{ \sqrt{\frac{\sum\limits_{i=1}^{n}(x_i - \bar x)^2}{n - 1}} \quad \sqrt{\frac{\sum\limits_{i=1}^{n}(y_i - \bar y)^2}{n - 1}} } \]
\[ Cor(x, y) = \frac{\quad\frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)]}{n - 1}\quad}{ \frac{\sqrt{\sum\limits_{i=1}^{n}(x_i - \bar x)^2}}{\sqrt{n - 1}} \quad \frac{\sqrt{\sum\limits_{i=1}^{n}(y_i - \bar y)^2}}{\sqrt{n - 1}} } \]
\[ Cor(x, y) = \frac{\quad\frac{\sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)]}{n - 1}\quad}{ \frac{ \sqrt{\sum\limits_{i=1}^{n}(x_i - \bar x)^2} \sqrt{\sum\limits_{i=1}^{n}(y_i - \bar y)^2} }{n - 1} } \]
\[ Cor(x, y) = \frac{ \sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)] }{ \sqrt{\sum\limits_{i=1}^{n}(x_i - \bar x)^2} \sqrt{\sum\limits_{i=1}^{n}(y_i - \bar y)^2} } \]
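The simplified formula above can be checked against R's built-in `cor` function. A minimal sketch using the age and income data from the earlier table (variable names are illustrative):

```r
age = c(20, 46, 44, 25, 23)
income = c(30000, 62000, 110000, 51000, 30000)

age_dev = age - mean(age)
income_dev = income - mean(income)

# Simplified formula: the n - 1 terms cancel, leaving sums of deviations only
r_by_hand = sum(age_dev * income_dev) /
  (sqrt(sum(age_dev^2)) * sqrt(sum(income_dev^2)))
r_by_hand

# Built-in correlation for comparison
cor(age, income)
```

Both approaches give a correlation of roughly 0.81 for these five people.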
- Spreadsheet programs such as Excel have a `CORREL` function to compute the correlation between two variables
- For example, `=CORREL(A:A, B:B)` to calculate the correlation between income and age

\[ \frac{ \sum\limits_{i=1}^{n}[(x_i - \bar x)(y_i - \bar y)] }{ \sqrt{\sum\limits_{i=1}^{n}(x_i - \bar x)^2} \sqrt{\sum\limits_{i=1}^{n}(y_i - \bar y)^2} } \]
These both have the same correlation coefficient
\[ y = mx + b \]
\[ y = \alpha + \beta x \]

\[ y = \alpha + \beta x + \epsilon \]
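To see how the equation above relates to an estimated regression, one option is to simulate data from it and check that `lm` recovers values close to α and β. This is a sketch only: the values of `alpha`, `beta`, the sample size, and the error spread are all made up for illustration.

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible

n = 1000
alpha = 2      # true intercept (made up)
beta = 3       # true slope (made up)

x = runif(n, min = 0, max = 10)
epsilon = rnorm(n, mean = 0, sd = 1)   # random error term
y = alpha + beta * x + epsilon

# Estimate the regression; the coefficients should be close to 2 and 3
model = lm(y ~ x)
coef(model)
```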
| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 1,121*** | 71.7 | 981, 1,262 | <0.001 |
| UNITSIZE | 0.47*** | 0.059 | 0.36, 0.59 | <0.001 |
| R² | 0.031 | | | |
| Adjusted R² | 0.031 | | | |
| Statistic | 64.8 | | | |
| p-value | <0.001 | | | |
| No. Obs. | 2,000 | | | |
| Residual df | 1,998 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
- Download the `AHS_2021.csv` file from Canvas
- `AHS_2021.csv` file
- `read.csv`, `lm`, and `summary`
- Text after a `#` character is ignored; I encourage you to use this “comment” functionality to take notes
- View the data by typing `data` and pressing enter in the console
- Use `View(data)` to open up a spreadsheet-like viewer
- The `lm` function estimates a regression
- `model` variable
- The `summary` function will display regression results
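The steps above might be put together roughly as follows. This is a sketch, not the exact code from the exercise: the helper name `fit_rent_model` is hypothetical, the file is assumed to sit in the working directory, and the formula `RENT ~ BEDROOMS` is taken from the regression output that follows.

```r
# Hypothetical helper (not from the slides): read a CSV file and
# regress RENT on BEDROOMS
fit_rent_model = function(path) {
  data = read.csv(path)              # load the file into a data frame
  lm(RENT ~ BEDROOMS, data = data)   # estimate the regression
}

# Assumes AHS_2021.csv is in the working directory; skipped otherwise
if (file.exists("AHS_2021.csv")) {
  model = fit_rent_model("AHS_2021.csv")
  print(summary(model))              # display the regression results
}
```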
Call:
lm(formula = RENT ~ BEDROOMS, data = data)
Residuals:
Min 1Q Median 3Q Max
-2157.7 -774.7 -364.7 371.0 9671.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1080.40 73.71 14.658 <2e-16 ***
BEDROOMS 274.32 32.72 8.385 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1411 on 1998 degrees of freedom
Multiple R-squared: 0.03399, Adjusted R-squared: 0.03351
F-statistic: 70.3 on 1 and 1998 DF, p-value: < 2.2e-16
\[ y = 1080.4 + 274.32x + \epsilon \]
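The fitted equation can be used to predict rent for a given number of bedrooms. A minimal sketch, with the coefficients copied from the output above and a function name (`predict_rent`) chosen here for illustration:

```r
# Coefficients from the regression output above
intercept = 1080.4
slope = 274.32

# Predicted rent for a given number of bedrooms (error term omitted)
predict_rent = function(bedrooms) {
  intercept + slope * bedrooms
}

predict_rent(0:3)
```

For example, the predicted rent for a two-bedroom unit is 1080.4 + 274.32 × 2 = 1629.04.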
Call:
lm(formula = RENT ~ BEDROOMS, data = data[data$RENT <= 5000,
])
Residuals:
Min 1Q Median 3Q Max
-1812.8 -591.7 -236.4 365.4 3363.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1047.09 46.84 22.357 <2e-16 ***
BEDROOMS 196.43 20.95 9.378 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 878 on 1933 degrees of freedom
Multiple R-squared: 0.04351, Adjusted R-squared: 0.04302
F-statistic: 87.94 on 1 and 1933 DF, p-value: < 2.2e-16
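The second model above is estimated only on rows where `RENT` is at most 5,000, which drops 65 observations (residual degrees of freedom fall from 1,998 to 1,933). The row-filtering syntax can be tried on a small made-up data frame; the numbers below are invented for illustration and are not AHS data.

```r
# Made-up data: six rents that follow a straight line, plus one extreme value
data = data.frame(
  RENT = c(900, 1200, 1500, 1800, 2100, 2400, 9000),
  BEDROOMS = c(0, 1, 2, 3, 4, 5, 1)
)

# Keep only rows with rent of 5000 or less (note the trailing comma:
# the condition selects rows, and all columns are kept)
trimmed = data[data$RENT <= 5000, ]
nrow(data)
nrow(trimmed)

# Compare coefficients with and without the extreme value
full_model = lm(RENT ~ BEDROOMS, data)
trimmed_model = lm(RENT ~ BEDROOMS, trimmed)
coef(full_model)
coef(trimmed_model)
```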
| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 993*** | 78.1 | 840, 1,146 | <0.001 |
| UNITSIZE | 0.26*** | 0.078 | 0.11, 0.41 | <0.001 |
| BEDROOMS | 177*** | 43.9 | 91, 263 | <0.001 |
| R² | 0.039 | | | |
| Adjusted R² | 0.038 | | | |
| Statistic | 40.8 | | | |
| p-value | <0.001 | | | |
| No. Obs. | 2,000 | | | |
| Residual df | 1,997 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
- Separate multiple independent variables with a `+` sign
- Add year built (`YRBUILT`) to the model
Call:
lm(formula = RENT ~ UNITSIZE + YRBUILT, data = data)
Residuals:
Min 1Q Median 3Q Max
-2483.5 -736.4 -369.9 382.9 9886.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.116e+03 2.569e+03 -3.549 0.000395 ***
UNITSIZE 4.667e-01 5.835e-02 7.998 2.12e-15 ***
YRBUILT 5.190e+00 1.302e+00 3.987 6.93e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1407 on 1997 degrees of freedom
Multiple R-squared: 0.03905, Adjusted R-squared: 0.03809
F-statistic: 40.58 on 2 and 1997 DF, p-value: < 2.2e-16
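R prints these coefficients in scientific notation: `4.667e-01` means 4.667 × 10⁻¹ = 0.4667. The sketch below converts the estimates to ordinary notation and computes one prediction. The example home (a `UNITSIZE` value of 1,000 and a `YRBUILT` of 2000) is hypothetical.

```r
# Coefficients copied from the output above
intercept = -9.116e+03
b_unitsize = 4.667e-01
b_yrbuilt = 5.190e+00

# Show the same numbers without scientific notation
format(c(intercept, b_unitsize, b_yrbuilt), scientific = FALSE)

# Predicted rent for a hypothetical home: UNITSIZE 1000, built in 2000
predicted = intercept + b_unitsize * 1000 + b_yrbuilt * 2000
predicted
```

The prediction works out to −9,116 + 466.7 + 10,380 = 1,730.7.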
Call:
lm(formula = RENT ~ UNITSIZE + YRBUILT + BEDROOMS, data = data)
Residuals:
Min 1Q Median 3Q Max
-2142.8 -747.0 -376.3 408.8 9850.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.878e+03 2.564e+03 -3.853 0.00012 ***
UNITSIZE 2.420e-01 7.818e-02 3.096 0.00199 **
YRBUILT 5.508e+00 1.298e+00 4.243 2.31e-05 ***
BEDROOMS 1.879e+02 4.375e+01 4.294 1.84e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1401 on 1996 degrees of freedom
Multiple R-squared: 0.04785, Adjusted R-squared: 0.04642
F-statistic: 33.43 on 3 and 1996 DF, p-value: < 2.2e-16




- The data include a variable `METRONAME` that contains the metropolitan area the home is in

| City | Minneapolis | Richmond | Tampa | Oklahoma City |
|---|---|---|---|---|
| Minneapolis | 1 | 0 | 0 | 0 |
| Richmond | 0 | 1 | 0 | 0 |
| Minneapolis | 1 | 0 | 0 | 0 |
| Tampa | 0 | 0 | 1 | 0 |
| Richmond | 0 | 1 | 0 | 0 |
| Oklahoma City | 0 | 0 | 0 | 1 |
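The indicator (dummy) columns in the table above can be generated in R with `model.matrix`. This is a sketch: the data frame `homes` and column `city` are illustrative names, and R orders the columns alphabetically rather than in the order shown in the table.

```r
# The six example homes from the table above
homes = data.frame(
  city = c("Minneapolis", "Richmond", "Minneapolis",
           "Tampa", "Richmond", "Oklahoma City")
)

# One 0/1 column per city (the "0 +" removes the intercept)
dummies = model.matrix(~ 0 + city, homes)
dummies

# With an intercept, R leaves out the first city as the reference
# category; the remaining columns are measured relative to it
with_reference = model.matrix(~ city, homes)
colnames(with_reference)
```

In a regression with an intercept, the omitted category is the one shown with dashes in the results tables.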
\[ y = \alpha + \beta_1 \mathrm{Minneapolis} + \beta_2 \mathrm{Richmond} + \beta_3 \mathrm{Tampa} + \beta_4 \mathrm{OklahomaCity} + \beta_5 \mathrm{Bedrooms} + \epsilon \]
| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 767*** | 92.6 | 586, 949 | <0.001 |
| METRONAME | | | | |
| Minneapolis-St. Paul-Bloomington, MN-WI | — | — | — | |
| Oklahoma City, OK | 84 | 99.6 | -112, 279 | 0.4 |
| Richmond, VA | -107 | 103 | -309, 95 | 0.3 |
| San Jose-Sunnyvale-Santa Clara, CA | 1,050*** | 91.2 | 871, 1,229 | <0.001 |
| Tampa-St. Petersburg-Clearwater, FL | -148 | 101 | -346, 51 | 0.14 |
| BEDROOMS | 289*** | 30.8 | 229, 350 | <0.001 |
| R² | 0.157 | | | |
| Adjusted R² | 0.155 | | | |
| Statistic | 74.5 | | | |
| p-value | <0.001 | | | |
| No. Obs. | 2,000 | | | |
| Residual df | 1,994 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
- Include a categorical variable by using `factor(variable)` in the model
- Add `METRONAME` and `BLDTYPE` (building type) to your R model, and run it again
Call:
lm(formula = RENT ~ BLDTYPE + METRONAME + UNITSIZE + BEDROOMS,
data = data)
Residuals:
Min 1Q Median 3Q Max
-3074.3 -534.5 -136.6 217.2 9969.9
Coefficients:
Estimate Std. Error t value
(Intercept) 644.24922 99.53834 6.472
BLDTYPESingle family -94.37313 79.75814 -1.183
BLDTYPETrailer, mobile home, boat, RV, van, etc. -646.81135 236.34569 -2.737
METRONAMEOklahoma City, OK 105.45456 100.50465 1.049
METRONAMERichmond, VA -97.21605 102.68464 -0.947
METRONAMESan Jose-Sunnyvale-Santa Clara, CA 1057.07199 90.86762 11.633
METRONAMETampa-St. Petersburg-Clearwater, FL -111.81644 101.59910 -1.101
UNITSIZE 0.26395 0.07389 3.572
BEDROOMS 221.90817 45.94583 4.830
Pr(>|t|)
(Intercept) 1.21e-10 ***
BLDTYPESingle family 0.236855
BLDTYPETrailer, mobile home, boat, RV, van, etc. 0.006261 **
METRONAMEOklahoma City, OK 0.294190
METRONAMERichmond, VA 0.343884
METRONAMESan Jose-Sunnyvale-Santa Clara, CA < 2e-16 ***
METRONAMETampa-St. Petersburg-Clearwater, FL 0.271219
UNITSIZE 0.000363 ***
BEDROOMS 1.47e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1313 on 1991 degrees of freedom
Multiple R-squared: 0.1663, Adjusted R-squared: 0.163
F-statistic: 49.65 on 8 and 1991 DF, p-value: < 2.2e-16
| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 953*** | 152 | 654, 1,251 | <0.001 |
| METRONAME | | | | |
| Minneapolis-St. Paul-Bloomington, MN-WI | — | — | — | |
| Oklahoma City, OK | -229 | 227 | -675, 216 | 0.3 |
| Richmond, VA | 102 | 237 | -364, 567 | 0.7 |
| San Jose-Sunnyvale-Santa Clara, CA | 609** | 193 | 230, 988 | 0.002 |
| Tampa-St. Petersburg-Clearwater, FL | -245 | 232 | -700, 211 | 0.3 |
| BEDROOMS | 189** | 72.4 | 47, 331 | 0.009 |
| METRONAME * BEDROOMS | | | | |
| Oklahoma City, OK * BEDROOMS | 163 | 105 | -43, 368 | 0.12 |
| Richmond, VA * BEDROOMS | -86 | 107 | -296, 125 | 0.4 |
| San Jose-Sunnyvale-Santa Clara, CA * BEDROOMS | 228* | 89.7 | 52, 404 | 0.011 |
| Tampa-St. Petersburg-Clearwater, FL * BEDROOMS | 60 | 104 | -144, 263 | 0.6 |
| R² | 0.163 | | | |
| Adjusted R² | 0.160 | | | |
| Statistic | 43.2 | | | |
| p-value | <0.001 | | | |
| No. Obs. | 2,000 | | | |
| Residual df | 1,990 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
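In the interaction table above, the `BEDROOMS` coefficient is the per-bedroom rent difference in the reference metro (Minneapolis), and each interaction term is added to it for the other metros. A small sketch of that arithmetic for San Jose, plus a check of how R expands `*` in a formula (no data are needed for the `terms` call):

```r
# `*` in a formula expands to both main effects plus their interaction
attr(terms(RENT ~ METRONAME * BEDROOMS), "term.labels")

# Coefficients copied from the table above
bedrooms_base = 189          # per-bedroom slope in the reference metro
san_jose_interaction = 228   # additional per-bedroom slope in San Jose

# Per-bedroom slope implied for San Jose
san_jose_slope = bedrooms_base + san_jose_interaction
san_jose_slope
```

So the model implies about 189 + 228 = 417 in additional rent per bedroom in San Jose.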
- To include an interaction, use a `*` instead of a `+` between variables
- `model = lm(RENT ~ METRONAME * BEDROOMS, data)` includes an interaction between metro area and bedrooms

| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 679*** | 95.5 | 492, 867 | <0.001 |
| BEDROOMS | 190*** | 41.1 | 110, 271 | <0.001 |
| UNITSIZE | 0.26*** | 0.073 | 0.12, 0.41 | <0.001 |
| METRONAME | | | | |
| Minneapolis-St. Paul-Bloomington, MN-WI | — | — | — | |
| Oklahoma City, OK | 80 | 99.3 | -114, 275 | 0.4 |
| Richmond, VA | -111 | 103 | -312, 90 | 0.3 |
| San Jose-Sunnyvale-Santa Clara, CA | 1,049*** | 90.9 | 871, 1,227 | <0.001 |
| Tampa-St. Petersburg-Clearwater, FL | -149 | 101 | -347, 49 | 0.14 |
| R² | 0.163 | | | |
| Adjusted R² | 0.160 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
| Characteristic | Beta | SE | 95% CI | p-value |
|---|---|---|---|---|
| (Intercept) | 400** | 128 | 149, 651 | 0.002 |
| BEDROOMS | 13 | 68.1 | -121, 146 | 0.9 |
| UNITSIZE | 0.20** | 0.076 | 0.05, 0.35 | 0.008 |
| METRONAME | | | | |
| Minneapolis-St. Paul-Bloomington, MN-WI | — | — | — | |
| Oklahoma City, OK | 63 | 99.2 | -132, 257 | 0.5 |
| Richmond, VA | -108 | 102 | -309, 93 | 0.3 |
| San Jose-Sunnyvale-Santa Clara, CA | 1,067*** | 90.8 | 889, 1,245 | <0.001 |
| Tampa-St. Petersburg-Clearwater, FL | -135 | 101 | -332, 63 | 0.2 |
| TOTROOMS | 159** | 48.7 | 64, 254 | 0.001 |
| R² | 0.167 | | | |
| Adjusted R² | 0.164 | | | |

Abbreviations: CI = Confidence Interval, SE = Standard Error

\*p<0.05; \*\*p<0.01; \*\*\*p<0.001
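In the table above, adding `TOTROOMS` moves the `BEDROOMS` estimate to 13 with a standard error of 68.1, and it is no longer statistically significant. One plausible contributor is that bedrooms and total rooms are strongly correlated, which makes their separate effects harder to pin down. The simulation below is a sketch of that general pattern only; all variable names and numbers are made up and are not AHS data.

```r
set.seed(1)   # arbitrary seed for reproducibility

n = 500
bedrooms = sample(0:4, n, replace = TRUE)
# Total rooms track bedrooms closely (strong correlation by construction)
totrooms = bedrooms + 2 + rnorm(n, sd = 0.5)
rent = 1000 + 200 * bedrooms + rnorm(n, sd = 500)

# Helper to pull one coefficient's standard error out of a fitted model
se = function(model, term) {
  summary(model)$coefficients[term, "Std. Error"]
}

se_alone = se(lm(rent ~ bedrooms), "bedrooms")
se_with_totrooms = se(lm(rent ~ bedrooms + totrooms), "bedrooms")

cor(bedrooms, totrooms)
c(se_alone, se_with_totrooms)   # the second is noticeably larger
```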
| | | | | | | | |
|---|---|---|---|---|---|---|---|
| 0.1 | -0.09 | -0.17 | -0.03 | 0.07 | -0.12 | 0.07 | 0 |
| 0.09 | -0.03 | -0.06 | -0.02 | -0.06 | -0.15 | -0.13 | -0.08 |
| 0.08 | 0.16 | 0.22 | -0.17 | 0.01 | 0.05 | 0.23 | -0.2 |
| -0.04 | -0.11 | -0.19 | -0.2 | -0.16 | -0.05 | 0.01 | -0.27* |
| 0.02 | 0.09 | -0.06 | -0.01 | 0.22 | -0.01 | 0.12 | 0.19 |
(* = p < 0.05, ** = p < 0.01, *** = p < 0.001)
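The table above holds 40 coefficients, one of which is starred. Assuming they are correlations tested one at a time, a threshold of p < 0.05 would flag about 40 × 0.05 = 2 of them by chance even if no real relationships exist. The sketch below simulates that situation with pure random noise; the sample size and seed are arbitrary choices.

```r
set.seed(2021)   # arbitrary seed for reproducibility

n_obs = 50       # observations per variable (made up)
n_tests = 40     # same count as the 5 x 8 table above

# Correlate pairs of unrelated random variables and keep each p-value
p_values = replicate(
  n_tests,
  cor.test(rnorm(n_obs), rnorm(n_obs))$p.value
)

# Number of "significant" correlations found in pure noise
sum(p_values < 0.05)
```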
© xkcd

This work by Matthew Bhagat-Conway is licensed under a Creative Commons Attribution 4.0 International License.
The unit size variable is categorical in the AHS. Values have been randomly distributed within categories for visualization purposes.
