# Worksheet 9 - Regression Continued

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and assignment work, you will be able to:

* Recognize situations where a simple regression analysis would be appropriate for making predictions.
* Explain the $k$-nearest neighbour regression algorithm and describe how it differs from $k$-nn classification.
* Interpret the output of a $k$-nn regression.
* In a dataset with two variables, perform $k$-nearest neighbour regression in Python using `scikit-learn` to predict the values for a test dataset.
* Execute cross-validation in Python to choose the number of neighbours.
* Using Python, evaluate $k$-nn regression prediction accuracy using a test dataset and an appropriate metric (*e.g.*, root mean square prediction error).
* In a dataset with > 2 variables, perform $k$-nn regression in Python using `scikit-learn` to predict the values for a test dataset.
* In the context of $k$-nn regression, compare and contrast goodness of fit and prediction properties (namely RMSE vs RMSPE).
* Describe advantages and disadvantages of the $k$-nearest neighbour regression approach.
* Perform ordinary least squares regression in Python using `scikit-learn` to predict the values for a test dataset.
* Compare and contrast predictions obtained from $k$-nearest neighbour regression to those obtained using simple ordinary least squares regression from the same dataset.

This worksheet covers parts of [Chapter 8](https://python.datasciencebook.ca/regression2) of the online textbook. You should read this chapter before attempting this assignment. Anywhere you see `___`, you must fill in the function, variable, or data to complete the code. Replace `raise NotImplementedError` with your completed code and answers, then run the cell.

In [None]:
### Run this cell before continuing.
import altair as alt
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Simplify working with large datasets in Altair
alt.data_transformers.disable_max_rows()

# Output dataframes instead of arrays
set_config(transform_output="pandas")

### Warm-up Questions

Here are some warm-up questions on the topic of multiple regression to get you thinking before we jump into data analysis. The course readings should help you answer these.

**Question 1.0**
<br> {points: 1}

In multivariate $k$-nn regression with one outcome/target variable and two predictor variables, the predictions take which shape?

A. a flat plane

B. a wiggly/flexible plane

C. a straight line

D. a wiggly/flexible line

E. a 4D hyperplane

F. a 4D wiggly/flexible hyperplane

*Assign the letter of your answer to a variable named `answer1_0`. Make sure you put quotations around the letter and pay attention to case (e.g., `"F"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_0)).encode("utf-8")+b"890878964ff94ca4").hexdigest() == "04b4dcd601bac914024aee542227bb5961b2e6ae", "type of answer1_0 is not str. answer1_0 should be an str"
assert sha1(str(len(answer1_0)).encode("utf-8")+b"890878964ff94ca4").hexdigest() == "56d8f2b2b25c3d1fff1c4994c48900aacad00f5d", "length of answer1_0 is not correct"
assert sha1(str(answer1_0.lower()).encode("utf-8")+b"890878964ff94ca4").hexdigest() == "d3ba313e2881695779bf9f875b50066c897f1108", "value of answer1_0 is not correct"
assert sha1(str(answer1_0).encode("utf-8")+b"890878964ff94ca4").hexdigest() == "c1c27c8b83007bca3385e69529d83b66e738f642", "correct string value of answer1_0 but incorrect case of letters"

print('Success!')

**Question 1.1** 
<br> {points: 1}

In simple linear regression with one outcome/target variable and one predictor variable, the predictions take which shape?

A. a flat plane

B. a wiggly/flexible plane

C. a straight line

D. a wiggly/flexible line

E. a 4D hyperplane

F. a 4D wiggly/flexible hyperplane

*Assign the letter of your answer to a variable named `answer1_1`. Make sure you put quotations around the letter and pay attention to case (e.g., `"F"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_1)).encode("utf-8")+b"07a3c77247a04195").hexdigest() == "6cb7e96d54488463441b36c8ec1ab915897169b4", "type of answer1_1 is not str. answer1_1 should be an str"
assert sha1(str(len(answer1_1)).encode("utf-8")+b"07a3c77247a04195").hexdigest() == "c2b2fff9a9f72667176c4a22cae9534d39b127cf", "length of answer1_1 is not correct"
assert sha1(str(answer1_1.lower()).encode("utf-8")+b"07a3c77247a04195").hexdigest() == "0e8985730208c961f126aa6cd08824b82b9d1c3d", "value of answer1_1 is not correct"
assert sha1(str(answer1_1).encode("utf-8")+b"07a3c77247a04195").hexdigest() == "ce2c3daa594d43ee4ee3f8489895389a704c1822", "correct string value of answer1_1 but incorrect case of letters"

print('Success!')

**Question 1.2**
<br> {points: 1}

In multiple linear regression with one outcome/target variable and two predictor variables, the predictions take which shape?

A. a flat plane

B. a wiggly/flexible plane

C. a straight line

D. a wiggly/flexible line

E. a 4D hyperplane

F. a 4D wiggly/flexible hyperplane

*Assign the letter of your answer to a variable named `answer1_2`. Make sure you put quotations around the letter and pay attention to case (e.g., `"F"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_2)).encode("utf-8")+b"b004f451379057b6").hexdigest() == "d3d1c4bbdbc20cfde5db743727980b18e99fdefb", "type of answer1_2 is not str. answer1_2 should be an str"
assert sha1(str(len(answer1_2)).encode("utf-8")+b"b004f451379057b6").hexdigest() == "cb2f9c3344fd8433a82fe0f613adcdbaf9424025", "length of answer1_2 is not correct"
assert sha1(str(answer1_2.lower()).encode("utf-8")+b"b004f451379057b6").hexdigest() == "626bd23a78d772d9e3ce2df620ae485125a72bac", "value of answer1_2 is not correct"
assert sha1(str(answer1_2).encode("utf-8")+b"b004f451379057b6").hexdigest() == "351e783d88c000240c41764c8ba744497d0087c0", "correct string value of answer1_2 but incorrect case of letters"

print('Success!')

### Understanding Simple Linear Regression

Consider this small and simple dataset: 

In [None]:
points = pd.DataFrame(
    [[1, 1], [2, 1], [3, 3], [6, 5], [7, 7], [7, 6]],
    columns=["X", "y"]
)

base = alt.Chart(points).mark_point().encode(
    x='X',
    y='y'
)

base

Now consider these three **potential** lines we could fit for the same dataset:

In [None]:
lines = pd.DataFrame(
    [
        [0.93, 0.017562, 'Line A'],
        [7, 5.9868, 'Line A'],
        [0, 0.1022, 'Line B'],
        [7, 6.965, 'Line B'],
        [0.26, 0.003564, 'Line C'],
        [8, 7.0965, 'Line C']
    ],
    columns=["X", "y", 'Name']
)

base + alt.Chart(lines).mark_line().encode(
    x='X',
    y='y',
    color='Name'
)

**Question 2.0**
<br> {points: 1}

Use the graph below to roughly calculate the average squared vertical distance between the points and the blue line ("Line A" above). **Read values off the graph to a precision of 0.25** (e.g. 1, 1.25, 1.5, 1.75, 2). We reprint the plot for you with only a single line to make it easier to estimate the locations on the graph.

*Save your answer to a variable named `answer2_0`.*
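
As a reminder of the arithmetic, here is a minimal sketch on made-up numbers. The points `xs`, `ys` and the function `line` are hypothetical and are not the worksheet's data or any of Lines A, B or C. For each point, take the vertical gap between the observed value and the line, square it, and then average the squares.

```python
# Hypothetical observations (NOT the worksheet's points)
xs = [0, 2, 4]
ys = [1.5, 1.5, 3.5]


def line(x):
    """Hypothetical candidate line y = 0.5 * x + 1."""
    return 0.5 * x + 1


# Squared vertical distance between each observed y and the line at the same x
squared_distances = [(y - line(x)) ** 2 for x, y in zip(xs, ys)]

# Average of the squared distances
avg_squared_distance = sum(squared_distances) / len(squared_distances)
print(avg_squared_distance)
```

In the questions below, the same steps apply, except that the values are read off the plot rather than computed from a formula.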

In [None]:
base + alt.Chart(lines[lines['Name'] == 'Line A']).mark_line().encode(
    x='X',
    y='y',
)

In [None]:
# your code here
raise NotImplementedError
answer2_0

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_0)).encode("utf-8")+b"6c7a08a8d640e413").hexdigest() == "c0a06509e31952573f6919e9afbe0456f6b5f69e", "type of answer2_0 is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(answer2_0, 2)).encode("utf-8")+b"6c7a08a8d640e413").hexdigest() == "c645d3533aa9559c10770ae96b066615b7bc3313", "value of answer2_0 is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 2.1**
<br> {points: 1}

Use the graph below to roughly calculate the average squared vertical distance between the points and the orange line ("Line B" above). **Read values off the graph to a precision of 0.25** (e.g. 1, 1.25, 1.5, 1.75, 2). We reprint the plot for you with only a single line to make it easier to estimate the locations on the graph.

*Save your answer to a variable named `answer2_1`.*

In [None]:
base + alt.Chart(lines[lines['Name'] == 'Line B']).mark_line(color='#f58518').encode(
    x='X',
    y='y',
)

In [None]:
# your code here
raise NotImplementedError
answer2_1

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_1)).encode("utf-8")+b"175af5e766816de1").hexdigest() == "9784e0c29fc384f12488ae964ec19527df11fafd", "type of answer2_1 is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(answer2_1, 2)).encode("utf-8")+b"175af5e766816de1").hexdigest() == "a1cef548b2b99ee93170c21af643ad78699a8f49", "value of answer2_1 is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 2.2** 
<br> {points: 1}

Use the graph below to roughly calculate the average squared vertical distance between the points and the red line ("Line C" above). **Read values off the graph to a precision of 0.25** (e.g. 1, 1.25, 1.5, 1.75, 2). We reprint the plot for you with only a single line to make it easier to estimate the locations on the graph.

*Save your answer to a variable named `answer2_2`.*

In [None]:
base + alt.Chart(lines[lines['Name'] == 'Line C']).mark_line(color='#e45756').encode(
    x='X',
    y='y',
)

In [None]:
# your code here
raise NotImplementedError
answer2_2

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_2)).encode("utf-8")+b"ca794ab7ed821511").hexdigest() == "7c183ca2ce7b4f7d3d1f6c90f6d6db8515a4ad39", "type of answer2_2 is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(answer2_2, 2)).encode("utf-8")+b"ca794ab7ed821511").hexdigest() == "6b4f7cef2335ed3d57d98ca9cc996a21d4da56f5", "value of answer2_2 is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 2.3**
<br> {points: 1}

Based on your calculations above, which line would linear regression by ordinary least squares choose given our small and simple dataset? Line A, B or C? 

*Assign the letter of your answer to a variable named `answer2_3`. Make sure you put quotations around the letter and pay attention to case.*
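
The idea behind this question can be sketched on made-up data: among a set of candidate lines, ordinary least squares prefers the one with the smallest average squared vertical distance, and the actual least squares line does at least as well as any candidate. Everything here is hypothetical (the arrays `x` and `y`, and the candidate lines named P, Q and R); `np.polyfit` with `deg=1` is used only as a convenient way to obtain the least squares line.

```python
import numpy as np

# Hypothetical data (NOT the worksheet's points)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])


def mse(slope, intercept):
    """Average squared vertical distance between the points and a line."""
    return float(np.mean((y - (slope * x + intercept)) ** 2))


# Hypothetical candidate lines given as (slope, intercept)
candidates = {"P": (1.0, 1.0), "Q": (1.5, 0.5), "R": (0.5, 2.0)}
candidate_mse = {name: mse(*params) for name, params in candidates.items()}

# The candidate that least squares would prefer is the one with the smallest value
best_candidate = min(candidate_mse, key=candidate_mse.get)

# The least squares line itself; polyfit returns [slope, intercept] for deg=1
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)
ols_mse = mse(ols_slope, ols_intercept)

print(candidate_mse, best_candidate, ols_mse)
```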

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_3)).encode("utf-8")+b"3972f03d256ee33c").hexdigest() == "dad569c72043685343be147dba491f94e16572cc", "type of answer2_3 is not str. answer2_3 should be an str"
assert sha1(str(len(answer2_3)).encode("utf-8")+b"3972f03d256ee33c").hexdigest() == "c1773a90664c730da607a6155895fe198ec35fd2", "length of answer2_3 is not correct"
assert sha1(str(answer2_3.lower()).encode("utf-8")+b"3972f03d256ee33c").hexdigest() == "3c4d808a7b834e7e6bee0a5c2e9cd0fefeec9c87", "value of answer2_3 is not correct"
assert sha1(str(answer2_3).encode("utf-8")+b"3972f03d256ee33c").hexdigest() == "c0bcc15972ed0a4b45c0247abe3b9c00c6d4e941", "correct string value of answer2_3 but incorrect case of letters"

print('Success!')

## Marathon Training Revisited with Linear Regression!

<img src='https://media.giphy.com/media/BDagLpxFIm3SM/giphy.gif' width='400'>

Source: https://media.giphy.com/media/BDagLpxFIm3SM/giphy.gif

Remember our question from last week: what features predict whether athletes will perform better than others? Specifically, we are interested in marathon runners and in how the maximum distance run per week during training predicts the time it takes a runner to finish the race.

This time around, however, we will analyze the data using simple linear regression rather than $k$-nn regression. In the end, we will compare our results to what we found last week with $k$-nn regression.

**Question 3.0**
<br> {points: 1}

Load the `marathon` data from the `data/` folder and assign it to an object called `marathon`. 

In [None]:
# your code here
raise NotImplementedError
marathon

In [None]:
from hashlib import sha1
assert sha1(str(type(marathon is None)).encode("utf-8")+b"6ce0b12672fb24e2").hexdigest() == "71239a7e9c098951c0765e17519ccb9110195f1a", "type of marathon is None is not bool. marathon is None should be a bool"
assert sha1(str(marathon is None).encode("utf-8")+b"6ce0b12672fb24e2").hexdigest() == "1bf86a01bb8e6911f0fec93c8a123eed481405a2", "boolean value of marathon is None is not correct"

assert sha1(str(type(marathon)).encode("utf-8")+b"840c63c2544c88ed").hexdigest() == "7f74d41f1d5cd56aef365318ff7e1ef443bb83ea", "type of type(marathon) is not correct"

assert sha1(str(type(marathon.shape)).encode("utf-8")+b"f5295a1891d0693a").hexdigest() == "dba3e1e736f8213cd66d8ab41d987e4efc6c174e", "type of marathon.shape is not tuple. marathon.shape should be a tuple"
assert sha1(str(len(marathon.shape)).encode("utf-8")+b"f5295a1891d0693a").hexdigest() == "85ea30058aa48407f612ac283b06a85a97251fa5", "length of marathon.shape is not correct"
assert sha1(str(sorted(map(str, marathon.shape))).encode("utf-8")+b"f5295a1891d0693a").hexdigest() == "10dd37f78974e0265e82ae0959e4cce69c64b57d", "values of marathon.shape are not correct"
assert sha1(str(marathon.shape).encode("utf-8")+b"f5295a1891d0693a").hexdigest() == "037439261030ff3bcf7d0baa53e8387efd60fd21", "order of elements of marathon.shape is not correct"

assert sha1(str(type("time_hrs" in marathon.columns)).encode("utf-8")+b"0ccb2a1795fa4c83").hexdigest() == "a7aa0e58bf704c5e89b67fa18c01ac0212b168f9", "type of \"time_hrs\" in marathon.columns is not bool. \"time_hrs\" in marathon.columns should be a bool"
assert sha1(str("time_hrs" in marathon.columns).encode("utf-8")+b"0ccb2a1795fa4c83").hexdigest() == "c28734c36151e2c694db0184490c34865d71a6e2", "boolean value of \"time_hrs\" in marathon.columns is not correct"

assert sha1(str(type("max" in marathon.columns)).encode("utf-8")+b"8c4e0aaa380bd68c").hexdigest() == "502edc622c1f29fb3f29e167945148a693fc11c2", "type of \"max\" in marathon.columns is not bool. \"max\" in marathon.columns should be a bool"
assert sha1(str("max" in marathon.columns).encode("utf-8")+b"8c4e0aaa380bd68c").hexdigest() == "b7afdb28f348429c223ca3482930d7ea7dc713b0", "boolean value of \"max\" in marathon.columns is not correct"

assert sha1(str(type(round(sum(marathon['max']), 0))).encode("utf-8")+b"fa7be8243229f5ea").hexdigest() == "ca104317cf21c2f5653a7c1fcf33bc4cad00ce84", "type of round(sum(marathon['max']), 0) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(marathon['max']), 0), 2)).encode("utf-8")+b"fa7be8243229f5ea").hexdigest() == "d38d9ad3f59e9d1d920002f4e22e5ac1f5b1a808", "value of round(sum(marathon['max']), 0) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(round(sum(marathon['time_hrs']), 0))).encode("utf-8")+b"43c6d19edaecc115").hexdigest() == "3dcc74317b1ad2853cd077ed721d9f57589a0dbe", "type of round(sum(marathon['time_hrs']), 0) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(marathon['time_hrs']), 0), 2)).encode("utf-8")+b"43c6d19edaecc115").hexdigest() == "f6874f0baad9bff4146c966349a5b348c97f6a4e", "value of round(sum(marathon['time_hrs']), 0) is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 3.1**
<br> {points: 1}

Similar to what we have done for the last few weeks, we will first split the dataset into training and testing datasets, using 75% of the original data as the training data. Remember, we will be putting the test dataset away in a 'lock box' that we will come back to later, after we choose our final model. Assign your training dataset to an object named `marathon_training` and your testing dataset to an object named `marathon_testing`.

Next, set `time_hrs` as the target (y) and `max` as the feature (X). Store the features as `X_train` and `X_test` and the targets as `y_train` and `y_test`, taken from `marathon_training` and `marathon_testing` respectively.

*Assign the objects to `marathon_training`, `marathon_testing`, `X_train`, `y_train`, `X_test` and `y_test` respectively.*
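
The pattern can be illustrated on a toy data frame. This is only a sketch: the data frame `toy`, its column names `feature` and `target`, and the `random_state` value are assumptions made for illustration. Note the difference between double brackets, which give a single-column data frame, and single brackets, which give a series.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data with 8 rows (NOT the marathon data)
toy = pd.DataFrame({"feature": range(8), "target": [2 * v for v in range(8)]})

# Hold out 25% of the rows for testing, so 75% remain for training
toy_training, toy_testing = train_test_split(toy, test_size=0.25, random_state=0)

toy_X_train = toy_training[["feature"]]  # double brackets: single-column data frame
toy_y_train = toy_training["target"]     # single brackets: series

print(toy_training.shape, toy_testing.shape)
```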

In [None]:
# ___, ___ = train_test_split(
#     ___,
#     test_size=___,
#     random_state=2000,  # Do not change the random_state
# )

# X_train = ___[___]  # A single column data frame
# y_train = ___[___]  # A series

# X_test = ___[___]  # A single column data frame
# y_test = ___[___]  # A series

# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(marathon_training is None)).encode("utf-8")+b"bd46b74e620fa656").hexdigest() == "3c47613c7f72c3b82a30ceb088435e05378a21e6", "type of marathon_training is None is not bool. marathon_training is None should be a bool"
assert sha1(str(marathon_training is None).encode("utf-8")+b"bd46b74e620fa656").hexdigest() == "7266f3448b7579c9f06928fb3accd73385494927", "boolean value of marathon_training is None is not correct"

assert sha1(str(type(marathon_training.shape)).encode("utf-8")+b"68c03e09bec9b7fa").hexdigest() == "009190aa1a5f0aaea1c43c9b335d5bba9dc46bd9", "type of marathon_training.shape is not tuple. marathon_training.shape should be a tuple"
assert sha1(str(len(marathon_training.shape)).encode("utf-8")+b"68c03e09bec9b7fa").hexdigest() == "e726add8024c009a6fc6ad647feb6736d1699838", "length of marathon_training.shape is not correct"
assert sha1(str(sorted(map(str, marathon_training.shape))).encode("utf-8")+b"68c03e09bec9b7fa").hexdigest() == "8b7a2508cac192822aa5929e4ab923f5a01041f9", "values of marathon_training.shape are not correct"
assert sha1(str(marathon_training.shape).encode("utf-8")+b"68c03e09bec9b7fa").hexdigest() == "81d4e3cc0c70ec33c11e41acadb6f8b2809315f6", "order of elements of marathon_training.shape is not correct"

assert sha1(str(type(sum(marathon_training.age))).encode("utf-8")+b"44c36ddde1346394").hexdigest() == "6111b25c3cc87e87339754b1343cb4cb3658f537", "type of sum(marathon_training.age) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sum(marathon_training.age)).encode("utf-8")+b"44c36ddde1346394").hexdigest() == "8a5a1c953212a779f195cc27c2119493a49c7143", "value of sum(marathon_training.age) is not correct"

assert sha1(str(type(marathon_testing is None)).encode("utf-8")+b"950a29e104cd4ce4").hexdigest() == "51bedea40a1afa52cbcd0b84f0b1d0082ce0b9fb", "type of marathon_testing is None is not bool. marathon_testing is None should be a bool"
assert sha1(str(marathon_testing is None).encode("utf-8")+b"950a29e104cd4ce4").hexdigest() == "c5771b8cd44b6ddb363b232fa459c062ba9a3553", "boolean value of marathon_testing is None is not correct"

assert sha1(str(type(marathon_testing.shape)).encode("utf-8")+b"87da7ba2c8c054d6").hexdigest() == "097e7486677afb43477ab5164e7a28d809beb68e", "type of marathon_testing.shape is not tuple. marathon_testing.shape should be a tuple"
assert sha1(str(len(marathon_testing.shape)).encode("utf-8")+b"87da7ba2c8c054d6").hexdigest() == "00dce8955ce6c945184c408d2b04790cf14b3359", "length of marathon_testing.shape is not correct"
assert sha1(str(sorted(map(str, marathon_testing.shape))).encode("utf-8")+b"87da7ba2c8c054d6").hexdigest() == "a1a2f6847e41d2a28ea71fc1fb0baa69a27d238e", "values of marathon_testing.shape are not correct"
assert sha1(str(marathon_testing.shape).encode("utf-8")+b"87da7ba2c8c054d6").hexdigest() == "9226ccbe098059619cfa1c60c73cdaa8b6dc92a4", "order of elements of marathon_testing.shape is not correct"

assert sha1(str(type(sum(marathon_testing.age))).encode("utf-8")+b"70554129b764603a").hexdigest() == "30e96c0258dabcb67c24a9d4ac5290346008ffdd", "type of sum(marathon_testing.age) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sum(marathon_testing.age)).encode("utf-8")+b"70554129b764603a").hexdigest() == "3312ccabc385ece519ad7b45eba8d2c4b47e3ac7", "value of sum(marathon_testing.age) is not correct"

assert sha1(str(type(X_train.columns.values)).encode("utf-8")+b"2204601e36f33695").hexdigest() == "981d1123ee05d279759f181be7dc64e8be2be015", "type of X_train.columns.values is not correct"
assert sha1(str(X_train.columns.values).encode("utf-8")+b"2204601e36f33695").hexdigest() == "1ec78d16a465b8461d7e88042e1769a884ec0fb7", "value of X_train.columns.values is not correct"

assert sha1(str(type(X_train.shape)).encode("utf-8")+b"26d525c956786c39").hexdigest() == "165a7e1e8f1a65b9601b40bfc219dfa514187f45", "type of X_train.shape is not tuple. X_train.shape should be a tuple"
assert sha1(str(len(X_train.shape)).encode("utf-8")+b"26d525c956786c39").hexdigest() == "200cbcb97df1d747b1ed49222f52b031142f928c", "length of X_train.shape is not correct"
assert sha1(str(sorted(map(str, X_train.shape))).encode("utf-8")+b"26d525c956786c39").hexdigest() == "cfcda3455ca710805d00cd0128f451643fa1fd8f", "values of X_train.shape are not correct"
assert sha1(str(X_train.shape).encode("utf-8")+b"26d525c956786c39").hexdigest() == "f692e752947d06874559780c7df01a080e892e65", "order of elements of X_train.shape is not correct"

assert sha1(str(type(y_train.name)).encode("utf-8")+b"c8ac1d4c0103da06").hexdigest() == "63ec88dd4d1ea40348f4879a03cf09c617e58782", "type of y_train.name is not str. y_train.name should be an str"
assert sha1(str(len(y_train.name)).encode("utf-8")+b"c8ac1d4c0103da06").hexdigest() == "2626d10bc3356f369431c4535d8a118dbe65bcd2", "length of y_train.name is not correct"
assert sha1(str(y_train.name.lower()).encode("utf-8")+b"c8ac1d4c0103da06").hexdigest() == "34c2938524244d3ff4a8244619f999bf95d329b4", "value of y_train.name is not correct"
assert sha1(str(y_train.name).encode("utf-8")+b"c8ac1d4c0103da06").hexdigest() == "34c2938524244d3ff4a8244619f999bf95d329b4", "correct string value of y_train.name but incorrect case of letters"

assert sha1(str(type(y_train.shape)).encode("utf-8")+b"dde85b4237b3f4cf").hexdigest() == "76978f2283b3d80869f9cb3590022ad9f564cf20", "type of y_train.shape is not tuple. y_train.shape should be a tuple"
assert sha1(str(len(y_train.shape)).encode("utf-8")+b"dde85b4237b3f4cf").hexdigest() == "c54a6df5d7d8a634fa6232dbbda4dac0e14d06fc", "length of y_train.shape is not correct"
assert sha1(str(sorted(map(str, y_train.shape))).encode("utf-8")+b"dde85b4237b3f4cf").hexdigest() == "8bca2059b197a83849209b3e7772e2ad45786d82", "values of y_train.shape are not correct"
assert sha1(str(y_train.shape).encode("utf-8")+b"dde85b4237b3f4cf").hexdigest() == "91e9141afe650e2c6882c7fb927ec5db5e00efcf", "order of elements of y_train.shape is not correct"

assert sha1(str(type(X_test.columns.values)).encode("utf-8")+b"b4f3f4601328acfc").hexdigest() == "48370be31fa89366e7d88da1b80abf804197c964", "type of X_test.columns.values is not correct"
assert sha1(str(X_test.columns.values).encode("utf-8")+b"b4f3f4601328acfc").hexdigest() == "6eff97c22b4204dd6fd7ab594153a09a2a2acb8c", "value of X_test.columns.values is not correct"

assert sha1(str(type(X_test.shape)).encode("utf-8")+b"bb3b1f8a3ccb7b31").hexdigest() == "ac30eb38fc7871ed7aab1e0331d6d4d136f009da", "type of X_test.shape is not tuple. X_test.shape should be a tuple"
assert sha1(str(len(X_test.shape)).encode("utf-8")+b"bb3b1f8a3ccb7b31").hexdigest() == "b6f2dd5574b4da4b25e784cc615af86832cdd0b5", "length of X_test.shape is not correct"
assert sha1(str(sorted(map(str, X_test.shape))).encode("utf-8")+b"bb3b1f8a3ccb7b31").hexdigest() == "4e0c6052600892a08375216425580034eaefe84c", "values of X_test.shape are not correct"
assert sha1(str(X_test.shape).encode("utf-8")+b"bb3b1f8a3ccb7b31").hexdigest() == "90e984fe5e5df5d8e435ba9a3ab7e91755ea5f8d", "order of elements of X_test.shape is not correct"

assert sha1(str(type(y_test.name)).encode("utf-8")+b"7ab6b2c0c33d4080").hexdigest() == "1da038a4a28025229535a4bcf8ba1489dc5192e5", "type of y_test.name is not str. y_test.name should be an str"
assert sha1(str(len(y_test.name)).encode("utf-8")+b"7ab6b2c0c33d4080").hexdigest() == "a80ca7f9ca2ceb6fa7531f94324a880db0c62e22", "length of y_test.name is not correct"
assert sha1(str(y_test.name.lower()).encode("utf-8")+b"7ab6b2c0c33d4080").hexdigest() == "6bba73ab3a09eae2b3897c2037ebbf0993e6b9bc", "value of y_test.name is not correct"
assert sha1(str(y_test.name).encode("utf-8")+b"7ab6b2c0c33d4080").hexdigest() == "6bba73ab3a09eae2b3897c2037ebbf0993e6b9bc", "correct string value of y_test.name but incorrect case of letters"

assert sha1(str(type(y_test.shape)).encode("utf-8")+b"c2f61d2b0af0a4eb").hexdigest() == "f364532f75ff6b25bf546ebdce3b0118a0b7d232", "type of y_test.shape is not tuple. y_test.shape should be a tuple"
assert sha1(str(len(y_test.shape)).encode("utf-8")+b"c2f61d2b0af0a4eb").hexdigest() == "a1a74a2d2a67a8b258cd9421b647ebe2cbe3450f", "length of y_test.shape is not correct"
assert sha1(str(sorted(map(str, y_test.shape))).encode("utf-8")+b"c2f61d2b0af0a4eb").hexdigest() == "f9e360954106f047ac7bde7d925998f777501286", "values of y_test.shape are not correct"
assert sha1(str(y_test.shape).encode("utf-8")+b"c2f61d2b0af0a4eb").hexdigest() == "22e0cbd2f7b3751b50d250b8e5ab436abc2be9f6", "order of elements of y_test.shape is not correct"

print('Success!')

**Question 3.2**
<br> {points: 1}

Using only the observations in the training dataset, create a scatterplot to assess the relationship between race time (`time_hrs`) and maximum distance run per week during training (`max`). Put `time_hrs` on the y-axis and `max` on the x-axis. Use `mark_point` and remember to do whatever is necessary to make this an effective visualization, including addressing overplotting in a suitable manner.

*Assign this plot to an object called `marathon_scatter`.*

In [None]:
# your code here
raise NotImplementedError
marathon_scatter

In [None]:
from hashlib import sha1
assert sha1(str(type(marathon_scatter is None)).encode("utf-8")+b"1b3718a8aa17653b").hexdigest() == "8413f9f66314df6fc9303928552c58cfae5c3956", "type of marathon_scatter is None is not bool. marathon_scatter is None should be a bool"
assert sha1(str(marathon_scatter is None).encode("utf-8")+b"1b3718a8aa17653b").hexdigest() == "16eeb7bfefe311bd8e62fde95b10bbab7b1da72a", "boolean value of marathon_scatter is None is not correct"

assert sha1(str(type(marathon_scatter.encoding.x['shorthand'])).encode("utf-8")+b"529487924b6c22f2").hexdigest() == "b0f75aa7c61d4ccb8b5585b55ec8e27bd096e5f5", "type of marathon_scatter.encoding.x['shorthand'] is not str. marathon_scatter.encoding.x['shorthand'] should be an str"
assert sha1(str(len(marathon_scatter.encoding.x['shorthand'])).encode("utf-8")+b"529487924b6c22f2").hexdigest() == "4809bd871b33c3457159d7092ed43157102129e3", "length of marathon_scatter.encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_scatter.encoding.x['shorthand'].lower()).encode("utf-8")+b"529487924b6c22f2").hexdigest() == "07bd17efc6f23f577edbe247659c8374c3e6b5ec", "value of marathon_scatter.encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_scatter.encoding.x['shorthand']).encode("utf-8")+b"529487924b6c22f2").hexdigest() == "07bd17efc6f23f577edbe247659c8374c3e6b5ec", "correct string value of marathon_scatter.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_scatter.encoding.y['shorthand'])).encode("utf-8")+b"216e344631a0fb67").hexdigest() == "20f27984684d3ce4e92410c1ffbb79bf1fe3ae07", "type of marathon_scatter.encoding.y['shorthand'] is not str. marathon_scatter.encoding.y['shorthand'] should be an str"
assert sha1(str(len(marathon_scatter.encoding.y['shorthand'])).encode("utf-8")+b"216e344631a0fb67").hexdigest() == "3783a669bfbe5d18eaa37b5be5f879cf060f9b87", "length of marathon_scatter.encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_scatter.encoding.y['shorthand'].lower()).encode("utf-8")+b"216e344631a0fb67").hexdigest() == "e2046679b23f8f0edb13146e1c96dab292675011", "value of marathon_scatter.encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_scatter.encoding.y['shorthand']).encode("utf-8")+b"216e344631a0fb67").hexdigest() == "e2046679b23f8f0edb13146e1c96dab292675011", "correct string value of marathon_scatter.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_scatter.mark.type)).encode("utf-8")+b"5c1fbb95a3a19056").hexdigest() == "c5c0a4b9c7f3531e87d1c4e96bf9e309fc20b234", "type of marathon_scatter.mark.type is not str. marathon_scatter.mark.type should be an str"
assert sha1(str(len(marathon_scatter.mark.type)).encode("utf-8")+b"5c1fbb95a3a19056").hexdigest() == "cea811d7b68a9bd805ca41e977ba38aa429aa5d3", "length of marathon_scatter.mark.type is not correct"
assert sha1(str(marathon_scatter.mark.type.lower()).encode("utf-8")+b"5c1fbb95a3a19056").hexdigest() == "0328a5eb0b1d57e3b0f3be9905a91a1a2547cb82", "value of marathon_scatter.mark.type is not correct"
assert sha1(str(marathon_scatter.mark.type).encode("utf-8")+b"5c1fbb95a3a19056").hexdigest() == "0328a5eb0b1d57e3b0f3be9905a91a1a2547cb82", "correct string value of marathon_scatter.mark.type but incorrect case of letters"

assert sha1(str(type(marathon_scatter.data.shape[0])).encode("utf-8")+b"9dca22156e0ba137").hexdigest() == "0a812925620f4ebed2c8037f89f6b0226c89f84f", "type of marathon_scatter.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(marathon_scatter.data.shape[0]).encode("utf-8")+b"9dca22156e0ba137").hexdigest() == "75f42bcc1bbdec0107e924e366bb572b528af875", "value of marathon_scatter.data.shape[0] is not correct"

assert sha1(str(type('opacity' in marathon_scatter.mark.to_dict())).encode("utf-8")+b"7714232ebec61d2f").hexdigest() == "7e8cf1c5d208dde75e232ca06f5c717b613b6eec", "type of 'opacity' in marathon_scatter.mark.to_dict() is not bool. 'opacity' in marathon_scatter.mark.to_dict() should be a bool"
assert sha1(str('opacity' in marathon_scatter.mark.to_dict()).encode("utf-8")+b"7714232ebec61d2f").hexdigest() == "85b6414211291f50911248a49d878c1207f5eb04", "boolean value of 'opacity' in marathon_scatter.mark.to_dict() is not correct"

assert sha1(str(type(isinstance(marathon_scatter.encoding.x['title'], str))).encode("utf-8")+b"cd5b99b959316607").hexdigest() == "38a02f9bc4e8139b12c544e1abab022ff7d919a4", "type of isinstance(marathon_scatter.encoding.x['title'], str) is not bool. isinstance(marathon_scatter.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_scatter.encoding.x['title'], str)).encode("utf-8")+b"cd5b99b959316607").hexdigest() == "1c0e2a7e045f436b4249b61760b35aea994c8382", "boolean value of isinstance(marathon_scatter.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(marathon_scatter.encoding.y['title'], str))).encode("utf-8")+b"8646964f35dcb937").hexdigest() == "4c2c1c9136b225291e85bcd8fc7453ca3867d5af", "type of isinstance(marathon_scatter.encoding.y['title'], str) is not bool. isinstance(marathon_scatter.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_scatter.encoding.y['title'], str)).encode("utf-8")+b"8646964f35dcb937").hexdigest() == "5ee0a435d742014f6de20ecc0ff2c68239f6be4c", "boolean value of isinstance(marathon_scatter.encoding.y['title'], str) is not correct"

print('Success!')

**Question 3.3**
<br> {points: 1}

Now that we have looked at our training data, the next step is to build a linear regression model. 

Instead of the `KNeighborsRegressor` class, we will use the `LinearRegression` class to tell `scikit-learn` that we want to perform a linear regression.

*Assign your answer to an object named `lm`.*

In [None]:
# lm = _____()

# your code here
raise NotImplementedError
lm

In [None]:
from hashlib import sha1
assert sha1(str(type(lm is None)).encode("utf-8")+b"ffb500d04ff3fbbd").hexdigest() == "20e2348c089ea6a5b1c04bf566c22a76d5d7619a", "type of lm is None is not bool. lm is None should be a bool"
assert sha1(str(lm is None).encode("utf-8")+b"ffb500d04ff3fbbd").hexdigest() == "5e927b21fab3a5d05ed7270fb3851cfaf8e33590", "boolean value of lm is None is not correct"

assert sha1(str(type(type(lm))).encode("utf-8")+b"3427c49f32318e90").hexdigest() == "71c72d512fc50d0288ae9462c33068e5905a15b5", "type of type(lm) is not correct"
assert sha1(str(type(lm)).encode("utf-8")+b"3427c49f32318e90").hexdigest() == "9ac817c61d3f864be000d70ff70c2d76b966b1df", "value of type(lm) is not correct"

print('Success!')

**Question 3.3.1**
<br>{points: 1}

After we have created our linear regression model, the next step is to fit it to the training data. 
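
The fitting step can be sketched on a small, made-up data frame (the names `toy_train`, `toy_lm`, and `toy_fit` and the columns `x` and `y` are assumptions for illustration only):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follow y = 1 + 2x exactly.
toy_train = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

toy_lm = LinearRegression()
# Double brackets keep the predictor as a 2-D data frame; the response is a 1-D series.
toy_fit = toy_lm.fit(toy_train[["x"]], toy_train["y"])

print(toy_fit.coef_)       # one slope per predictor
print(toy_fit.intercept_)  # the intercept
```

Note that `fit` returns the model object itself, so `toy_fit` and `toy_lm` refer to the same fitted model.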

*Assign your answer to an object named `lm_fit`.*

In [None]:
# ___ = ___.fit(___, ___)

# your code here
raise NotImplementedError
lm_fit

In [None]:
from hashlib import sha1
assert sha1(str(type(lm_fit is None)).encode("utf-8")+b"97a37b2690148bce").hexdigest() == "4d1ea816938715cfd7a70da7cde92b7b68eff8ef", "type of lm_fit is None is not bool. lm_fit is None should be a bool"
assert sha1(str(lm_fit is None).encode("utf-8")+b"97a37b2690148bce").hexdigest() == "2ad3bef03dfd461f12b58d3205d0ee6d9cebe626", "boolean value of lm_fit is None is not correct"

assert sha1(str(type(type(lm_fit))).encode("utf-8")+b"688a058199fc114a").hexdigest() == "30fa6cfcb2b454b575b6c5c89fe096cb7b80cc71", "type of type(lm_fit) is not correct"
assert sha1(str(type(lm_fit)).encode("utf-8")+b"688a058199fc114a").hexdigest() == "b4fe223f502d32e9951afa75d75acb4270c30a20", "value of type(lm_fit) is not correct"

assert sha1(str(type(lm_fit.coef_)).encode("utf-8")+b"7b97c1dae8de6b8b").hexdigest() == "1b22bc96799804170c2c7d87e7b14d1f69149270", "type of lm_fit.coef_ is not correct"
assert sha1(str(lm_fit.coef_).encode("utf-8")+b"7b97c1dae8de6b8b").hexdigest() == "fd9ae2b896f9c8233d61511be285fb82cadf5b82", "value of lm_fit.coef_ is not correct"

assert sha1(str(type(lm_fit.intercept_)).encode("utf-8")+b"f108981739f1e6fd").hexdigest() == "c98c0affdf88872df9bfc627098e7a13312c2fae", "type of lm_fit.intercept_ is not correct"
assert sha1(str(lm_fit.intercept_).encode("utf-8")+b"f108981739f1e6fd").hexdigest() == "2d96404da7cba8d2ee54103ddee38866451fb16a", "value of lm_fit.intercept_ is not correct"

print('Success!')

**Question 3.4**
<br> {points: 1}

Now, let's visualize the model predictions as a straight line overlaid on the training data. Use the `predict` function of `lm` to create predictions for the `marathon_training` data. Then, add the column of predictions to the `marathon_training` data frame using the `assign` function. Name the resulting data frame `marathon_preds` and the new column `predictions`.
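
A minimal sketch of the predict-then-assign pattern, using made-up data (`toy_train`, `toy_fit`, and `toy_preds` are hypothetical names, not the worksheet's objects):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy training data.
toy_train = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.2, 2.9, 5.1, 6.8]})
toy_fit = LinearRegression().fit(toy_train[["x"]], toy_train["y"])

# assign returns a new data frame; toy_train itself is left unchanged.
toy_preds = toy_train.assign(predictions=toy_fit.predict(toy_train[["x"]]))
print(toy_preds)
```

Because `assign` does not modify the original data frame, the result has to be stored under a new name to keep the predictions.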

Next, create a scatterplot of the marathon time (y-axis) against the maximum distance run per week (x-axis) from `marathon_preds`. Use `mark_circle` with an opacity of 0.4 to avoid overplotting. **Plot the predictions as a black line over the data points.** Remember the fundamentals of effective visualizations, such as human-readable axis titles.

*Name your plot `marathon_plot`.*

In [None]:
# marathon_preds = ____.assign(
#     predictions= _____.predict(____)
# )
# scatterplot = ___
#
# marathon_plot = scatterplot + ___.mark_line(___).encode(___)

# your code here
raise NotImplementedError
marathon_plot

In [None]:
from hashlib import sha1
assert sha1(str(type(marathon_preds is None)).encode("utf-8")+b"8c016494f179e882").hexdigest() == "f981d015cd1cd9a296c671175e08a134c7dd5313", "type of marathon_preds is None is not bool. marathon_preds is None should be a bool"
assert sha1(str(marathon_preds is None).encode("utf-8")+b"8c016494f179e882").hexdigest() == "62fc58f39fa8994a14e3d36da1cb8cf754142b30", "boolean value of marathon_preds is None is not correct"

assert sha1(str(type(marathon_preds)).encode("utf-8")+b"1cd01a6b6290c18c").hexdigest() == "1d033bae323ff74f364fbb09d41b2b90525ee3e9", "type of type(marathon_preds) is not correct"

assert sha1(str(type(marathon_preds.shape)).encode("utf-8")+b"d5c01120527fd284").hexdigest() == "e4f36d38b5bc843cd760693ae702af7c32787495", "type of marathon_preds.shape is not tuple. marathon_preds.shape should be a tuple"
assert sha1(str(len(marathon_preds.shape)).encode("utf-8")+b"d5c01120527fd284").hexdigest() == "9bd596725da839efad9e3be565feec5493dd41d0", "length of marathon_preds.shape is not correct"
assert sha1(str(sorted(map(str, marathon_preds.shape))).encode("utf-8")+b"d5c01120527fd284").hexdigest() == "def905edf84ebd3d3439fef64b2e78e2d51817a6", "values of marathon_preds.shape are not correct"
assert sha1(str(marathon_preds.shape).encode("utf-8")+b"d5c01120527fd284").hexdigest() == "66dd383bbd63e288df5a6449605cb3f98040011c", "order of elements of marathon_preds.shape is not correct"

assert sha1(str(type("predictions" in marathon_preds.columns)).encode("utf-8")+b"5da3f9cd2ebbc396").hexdigest() == "c529a0050a977e17f3d24382eae32d6fbf86f14a", "type of \"predictions\" in marathon_preds.columns is not bool. \"predictions\" in marathon_preds.columns should be a bool"
assert sha1(str("predictions" in marathon_preds.columns).encode("utf-8")+b"5da3f9cd2ebbc396").hexdigest() == "405ff042d2c33e30fba68acf6f4f2377782e322a", "boolean value of \"predictions\" in marathon_preds.columns is not correct"

assert sha1(str(type(sum(marathon_preds.predictions))).encode("utf-8")+b"59cb154a3098c596").hexdigest() == "21bf0ef95bde189085d756d6b86c206820013525", "type of sum(marathon_preds.predictions) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(sum(marathon_preds.predictions), 2)).encode("utf-8")+b"59cb154a3098c596").hexdigest() == "ee95327750641dfa10b420a80b18f16e57662ef4", "value of sum(marathon_preds.predictions) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(sum(marathon_preds.time_hrs))).encode("utf-8")+b"2f887ce8243f0144").hexdigest() == "51b338dac1912b485bfaf1716fe3a42d3e4b4ffd", "type of sum(marathon_preds.time_hrs) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(sum(marathon_preds.time_hrs), 2)).encode("utf-8")+b"2f887ce8243f0144").hexdigest() == "4699ecdffc81394de4ea315eee053e08c7425f9c", "value of sum(marathon_preds.time_hrs) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(marathon_plot is None)).encode("utf-8")+b"c933dd6bef9e3e46").hexdigest() == "f6c9e67bae503a8f00a87920335a5fef4064e0e6", "type of marathon_plot is None is not bool. marathon_plot is None should be a bool"
assert sha1(str(marathon_plot is None).encode("utf-8")+b"c933dd6bef9e3e46").hexdigest() == "033cfd93416e1bda20b439f8e929e17b689e6941", "boolean value of marathon_plot is None is not correct"

assert sha1(str(type(len(marathon_plot.layer))).encode("utf-8")+b"122f0545088ad69f").hexdigest() == "6a29b7906b3aadb419a5aa2250d2986f8875d484", "type of len(marathon_plot.layer) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(len(marathon_plot.layer)).encode("utf-8")+b"122f0545088ad69f").hexdigest() == "d082935f15ec87c142f6ffd2a36d3d541a347962", "value of len(marathon_plot.layer) is not correct"

assert sha1(str(type(marathon_plot.layer[0].mark)).encode("utf-8")+b"9a5b0211ae456091").hexdigest() == "e995f966a72aacc6377dcc0ac2afee83702fa2df", "type of marathon_plot.layer[0].mark is not correct"
assert sha1(str(marathon_plot.layer[0].mark).encode("utf-8")+b"9a5b0211ae456091").hexdigest() == "82b20158d9dd76eda535abd9b7bd8022043559df", "value of marathon_plot.layer[0].mark is not correct"

assert sha1(str(type(marathon_plot.layer[1].mark)).encode("utf-8")+b"b832743c95dd47ad").hexdigest() == "c3a5e1762c7437b7cc8b8d82ed7fc21422b6b7bc", "type of marathon_plot.layer[1].mark is not correct"
assert sha1(str(marathon_plot.layer[1].mark).encode("utf-8")+b"b832743c95dd47ad").hexdigest() == "9c800164ae7dc998ee76c3b08e9e67989015032e", "value of marathon_plot.layer[1].mark is not correct"

assert sha1(str(type(marathon_plot.layer[0].encoding.x['shorthand'])).encode("utf-8")+b"bc87bc8fce4269e1").hexdigest() == "15c6158494b616c8c33beb325d72e1aecc181df3", "type of marathon_plot.layer[0].encoding.x['shorthand'] is not str. marathon_plot.layer[0].encoding.x['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[0].encoding.x['shorthand'])).encode("utf-8")+b"bc87bc8fce4269e1").hexdigest() == "e61449a1816f89758e40667631a6ff5390c0d5cf", "length of marathon_plot.layer[0].encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.x['shorthand'].lower()).encode("utf-8")+b"bc87bc8fce4269e1").hexdigest() == "361c21950afb52ee5b9c3fc8976ff50ca553675d", "value of marathon_plot.layer[0].encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.x['shorthand']).encode("utf-8")+b"bc87bc8fce4269e1").hexdigest() == "361c21950afb52ee5b9c3fc8976ff50ca553675d", "correct string value of marathon_plot.layer[0].encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_plot.layer[0].encoding.y['shorthand'])).encode("utf-8")+b"04cfd1e0485488c6").hexdigest() == "1d5859615b1d86ded34a49926531655dc86ba904", "type of marathon_plot.layer[0].encoding.y['shorthand'] is not str. marathon_plot.layer[0].encoding.y['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[0].encoding.y['shorthand'])).encode("utf-8")+b"04cfd1e0485488c6").hexdigest() == "b1a814d142aafb934d4bbdbf60e722329aac754a", "length of marathon_plot.layer[0].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.y['shorthand'].lower()).encode("utf-8")+b"04cfd1e0485488c6").hexdigest() == "36d0e5cf7afb46fc0a8ec475c2c59012a27896b1", "value of marathon_plot.layer[0].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.y['shorthand']).encode("utf-8")+b"04cfd1e0485488c6").hexdigest() == "36d0e5cf7afb46fc0a8ec475c2c59012a27896b1", "correct string value of marathon_plot.layer[0].encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_plot.layer[1].encoding.y['shorthand'])).encode("utf-8")+b"3dd1c29f155a6d3b").hexdigest() == "1d99cfe5e01ec77217226ee7efc938c6df1809af", "type of marathon_plot.layer[1].encoding.y['shorthand'] is not str. marathon_plot.layer[1].encoding.y['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[1].encoding.y['shorthand'])).encode("utf-8")+b"3dd1c29f155a6d3b").hexdigest() == "361344f7502449a63357051e80c321b277644d12", "length of marathon_plot.layer[1].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[1].encoding.y['shorthand'].lower()).encode("utf-8")+b"3dd1c29f155a6d3b").hexdigest() == "bce8d564745e60c4b36d16e6c2e3b3e9c052939c", "value of marathon_plot.layer[1].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[1].encoding.y['shorthand']).encode("utf-8")+b"3dd1c29f155a6d3b").hexdigest() == "bce8d564745e60c4b36d16e6c2e3b3e9c052939c", "correct string value of marathon_plot.layer[1].encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(marathon_plot.layer[0].encoding.x['title'], str))).encode("utf-8")+b"907e22588b011757").hexdigest() == "00dfd437276a762dcd76440caa4290f6cb7ced17", "type of isinstance(marathon_plot.layer[0].encoding.x['title'], str) is not bool. isinstance(marathon_plot.layer[0].encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_plot.layer[0].encoding.x['title'], str)).encode("utf-8")+b"907e22588b011757").hexdigest() == "5201d06b3285a78ff85dd52bd0ccea973891b3a1", "boolean value of isinstance(marathon_plot.layer[0].encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(marathon_plot.layer[0].encoding.y['title'], str))).encode("utf-8")+b"5d1bbad601b1dac4").hexdigest() == "f92f01d1e72ab1b7aa5e7792983c507d3bf2cf0b", "type of isinstance(marathon_plot.layer[0].encoding.y['title'], str) is not bool. isinstance(marathon_plot.layer[0].encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_plot.layer[0].encoding.y['title'], str)).encode("utf-8")+b"5d1bbad601b1dac4").hexdigest() == "877897f71db34e7441955010cd752e69c9938585", "boolean value of isinstance(marathon_plot.layer[0].encoding.y['title'], str) is not correct"

print('Success!')

**Question 3.5**
<br> {points: 1}

Great! We can now see the line of best fit on the graph. Now let's calculate the RMSPE using the **test data**. To get to this point, first, use the `lm` object to make predictions on the test data. Then, add the column of predictions to the `marathon_testing` data frame using the `assign` function. Name the resulting data frame `test_preds` and the new column `predictions`.

Afterwards, calculate the RMSPE by taking the square root of the value returned by the `mean_squared_error` function.
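
To make the calculation concrete, here is a small sketch with made-up numbers (`y_true` and `y_pred` are hypothetical). It assumes the usual definition $\text{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is an observed value and $\hat{y}_i$ is its prediction:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical observed values and predictions.
y_true = np.array([3.0, 4.0, 5.0])
y_pred = np.array([2.0, 4.0, 7.0])

# Squared errors are 1, 0, and 4, so the mean squared error is 5/3.
mse = mean_squared_error(y_true, y_pred)
rmspe = mse ** (1 / 2)

# The same quantity computed directly from the formula.
manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmspe, manual)
```

Taking the square root puts the error back on the same scale as the response variable, which makes it easier to interpret.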

*Assign the RMSPE score to an object called `lm_rmspe`.*

In [None]:
# ___ = ___.assign(
#     predictions=___.predict(___)
# )

# ___ = ___(___, ___)**(1/2)

# your code here
raise NotImplementedError
lm_rmspe

In [None]:
from hashlib import sha1
assert sha1(str(type(test_preds is None)).encode("utf-8")+b"970a6d0791dd0d2f").hexdigest() == "d082dc6bb3b6c351dbf8fd07013c1c1eb8125f75", "type of test_preds is None is not bool. test_preds is None should be a bool"
assert sha1(str(test_preds is None).encode("utf-8")+b"970a6d0791dd0d2f").hexdigest() == "9936351c1443c8708371d9a82a2772879b233c2b", "boolean value of test_preds is None is not correct"

assert sha1(str(type(test_preds)).encode("utf-8")+b"77c10c4bfab138fc").hexdigest() == "f09b02bf5c057cbe968ea48a65b4cedf6a86cd70", "type of type(test_preds) is not correct"

assert sha1(str(type(test_preds.shape)).encode("utf-8")+b"f886b8c00ca10408").hexdigest() == "d455471fb8e0ecfdebb4700009b52de7c2471150", "type of test_preds.shape is not tuple. test_preds.shape should be a tuple"
assert sha1(str(len(test_preds.shape)).encode("utf-8")+b"f886b8c00ca10408").hexdigest() == "31567839fc6b2a2e3de1732f4402534d2fd327f7", "length of test_preds.shape is not correct"
assert sha1(str(sorted(map(str, test_preds.shape))).encode("utf-8")+b"f886b8c00ca10408").hexdigest() == "c9ccdf7cd5d9647b983b78158f23955f5aaff40e", "values of test_preds.shape are not correct"
assert sha1(str(test_preds.shape).encode("utf-8")+b"f886b8c00ca10408").hexdigest() == "1905ff4843d4c4bd3c899bdbd68a99f3220e4adb", "order of elements of test_preds.shape is not correct"

assert sha1(str(type(sum(test_preds.predictions))).encode("utf-8")+b"1c03c711fa3889b9").hexdigest() == "0c7871d575b52695d4ddf25cb4f074d1dacaf611", "type of sum(test_preds.predictions) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(sum(test_preds.predictions), 2)).encode("utf-8")+b"1c03c711fa3889b9").hexdigest() == "f49b45a86b67e20607c3e2343595ded456e0d1a4", "value of sum(test_preds.predictions) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(lm_rmspe is None)).encode("utf-8")+b"a2e898cc1bce110c").hexdigest() == "7a93fc7eb168c869b0bafe99b55250eed27d3702", "type of lm_rmspe is None is not bool. lm_rmspe is None should be a bool"
assert sha1(str(lm_rmspe is None).encode("utf-8")+b"a2e898cc1bce110c").hexdigest() == "75c09ba80cd3cf0fa3094eb9058bd8bb723ba885", "boolean value of lm_rmspe is None is not correct"

assert sha1(str(type(lm_rmspe)).encode("utf-8")+b"7d725f74b8b2b5cf").hexdigest() == "5b4084d87c918bbe232011854998675a2ef56fdf", "type of type(lm_rmspe) is not correct"

assert sha1(str(type(round(lm_rmspe, 1))).encode("utf-8")+b"d5ab8bf9e1088dbb").hexdigest() == "ab2a6e52c5c1bebb1b7273e523781c44226c4bd1", "type of round(lm_rmspe, 1) is not correct"
assert sha1(str(round(lm_rmspe, 1)).encode("utf-8")+b"d5ab8bf9e1088dbb").hexdigest() == "5faaa722fa0a4ec49a197833a5880882a47ed43a", "value of round(lm_rmspe, 1) is not correct"

print('Success!')

**Question 3.5.1**
<br> {points: 1}

Now, let's visualize the model predictions as a straight line overlaid on the test data. First, create a scatterplot to assess the relationship between race time (`time_hrs`) and maximum distance run per week during training (`max`) on the **testing data.** Then add a line to the plot corresponding to the predictions (`predictions`) from the fitted linear regression model. Remember to do whatever is necessary to make this an effective visualization.

*Assign the plot to an object called `marathon_plot_test`.*

In [None]:
# marathon_plot_test = ___

# your code here
raise NotImplementedError
marathon_plot_test

In [None]:
from hashlib import sha1
assert sha1(str(type(marathon_plot is None)).encode("utf-8")+b"7b4fc87eb3be904c").hexdigest() == "2b7703caa3ab95b81333a41911035a784dc17943", "type of marathon_plot is None is not bool. marathon_plot is None should be a bool"
assert sha1(str(marathon_plot is None).encode("utf-8")+b"7b4fc87eb3be904c").hexdigest() == "af52b893d32620ff50ca992e6a581179dc9c700c", "boolean value of marathon_plot is None is not correct"

assert sha1(str(type(len(marathon_plot.layer))).encode("utf-8")+b"674889921a64ef3e").hexdigest() == "dfe617248cd03d64a7a657a7ae957e7c7b397282", "type of len(marathon_plot.layer) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(len(marathon_plot.layer)).encode("utf-8")+b"674889921a64ef3e").hexdigest() == "ec2bb5b0331bf7f946842e737308c03b87f570c9", "value of len(marathon_plot.layer) is not correct"

assert sha1(str(type(marathon_plot.layer[0].mark)).encode("utf-8")+b"79e3a1b47ddf5aac").hexdigest() == "638984548e5a0d6c35401cc7b372c1786083e28e", "type of marathon_plot.layer[0].mark is not correct"
assert sha1(str(marathon_plot.layer[0].mark).encode("utf-8")+b"79e3a1b47ddf5aac").hexdigest() == "20c921a35d00b60c23fe9857ce30af38e8dc6be2", "value of marathon_plot.layer[0].mark is not correct"

assert sha1(str(type(marathon_plot.layer[1].mark)).encode("utf-8")+b"b72b81237bd1c064").hexdigest() == "08883f8a033c59e3a9ea8193dc814a930c33fe46", "type of marathon_plot.layer[1].mark is not correct"
assert sha1(str(marathon_plot.layer[1].mark).encode("utf-8")+b"b72b81237bd1c064").hexdigest() == "29d58a65c0e7398c6d553959ae5904d36584caad", "value of marathon_plot.layer[1].mark is not correct"

assert sha1(str(type(marathon_plot.layer[0].encoding.x['shorthand'])).encode("utf-8")+b"a78a3aca125e53ff").hexdigest() == "d473627eb902b51249d7db28e9fa6f53e3f47281", "type of marathon_plot.layer[0].encoding.x['shorthand'] is not str. marathon_plot.layer[0].encoding.x['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[0].encoding.x['shorthand'])).encode("utf-8")+b"a78a3aca125e53ff").hexdigest() == "8b9aa301c3e86bc0fcd6411440ea70ccbcdca466", "length of marathon_plot.layer[0].encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.x['shorthand'].lower()).encode("utf-8")+b"a78a3aca125e53ff").hexdigest() == "9c7007e5bd0b57884479071c078b321973dd4d2d", "value of marathon_plot.layer[0].encoding.x['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.x['shorthand']).encode("utf-8")+b"a78a3aca125e53ff").hexdigest() == "9c7007e5bd0b57884479071c078b321973dd4d2d", "correct string value of marathon_plot.layer[0].encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_plot.layer[0].encoding.y['shorthand'])).encode("utf-8")+b"93f524138c5867fc").hexdigest() == "aac5c6394460458767533045c125acae58691899", "type of marathon_plot.layer[0].encoding.y['shorthand'] is not str. marathon_plot.layer[0].encoding.y['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[0].encoding.y['shorthand'])).encode("utf-8")+b"93f524138c5867fc").hexdigest() == "5b33d8ae6593eefe90042ff4614e0f729f632880", "length of marathon_plot.layer[0].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.y['shorthand'].lower()).encode("utf-8")+b"93f524138c5867fc").hexdigest() == "657b825db79f06bed5c71bd963c9b53fa47f0663", "value of marathon_plot.layer[0].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[0].encoding.y['shorthand']).encode("utf-8")+b"93f524138c5867fc").hexdigest() == "657b825db79f06bed5c71bd963c9b53fa47f0663", "correct string value of marathon_plot.layer[0].encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(marathon_plot.layer[1].encoding.y['shorthand'])).encode("utf-8")+b"b562352b67a5652e").hexdigest() == "82fce814664e0f8e827dd3eac312750f0231fbc9", "type of marathon_plot.layer[1].encoding.y['shorthand'] is not str. marathon_plot.layer[1].encoding.y['shorthand'] should be an str"
assert sha1(str(len(marathon_plot.layer[1].encoding.y['shorthand'])).encode("utf-8")+b"b562352b67a5652e").hexdigest() == "33adb0ca4b1182915b84ad9011d08ce14091fc81", "length of marathon_plot.layer[1].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[1].encoding.y['shorthand'].lower()).encode("utf-8")+b"b562352b67a5652e").hexdigest() == "4c3d21eee874a3f4d8f1addd4b4cd48fc62875d8", "value of marathon_plot.layer[1].encoding.y['shorthand'] is not correct"
assert sha1(str(marathon_plot.layer[1].encoding.y['shorthand']).encode("utf-8")+b"b562352b67a5652e").hexdigest() == "4c3d21eee874a3f4d8f1addd4b4cd48fc62875d8", "correct string value of marathon_plot.layer[1].encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(marathon_plot.layer[0].encoding.x['title'], str))).encode("utf-8")+b"a7472b56b8eebfd9").hexdigest() == "976f63cc86308dda5a48ccc9344cc195220fb922", "type of isinstance(marathon_plot.layer[0].encoding.x['title'], str) is not bool. isinstance(marathon_plot.layer[0].encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_plot.layer[0].encoding.x['title'], str)).encode("utf-8")+b"a7472b56b8eebfd9").hexdigest() == "94f9a9962985dd837e7baad36a68e50a426c6ac3", "boolean value of isinstance(marathon_plot.layer[0].encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(marathon_plot.layer[0].encoding.y['title'], str))).encode("utf-8")+b"dfee7d3f500c89d9").hexdigest() == "4dff929ab95da1e884afbeca9ddc3e30e1d3c231", "type of isinstance(marathon_plot.layer[0].encoding.y['title'], str) is not bool. isinstance(marathon_plot.layer[0].encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(marathon_plot.layer[0].encoding.y['title'], str)).encode("utf-8")+b"dfee7d3f500c89d9").hexdigest() == "2d2907d992d3d65d418a88e050f7fdc9247d9ddb", "boolean value of isinstance(marathon_plot.layer[0].encoding.y['title'], str) is not correct"

print('Success!')

**Question 3.6**
<br> {points: 1}

Compare the RMSPE of $k$-nn regression (`0.616` from last worksheet) to that of simple linear regression. Which is greater? 

A. Simple linear regression has a greater RMSPE

B. $k$-nn regression has a greater RMSPE

C. Neither, they are identical

*Save the letter of your answer to a variable named `answer3_6`. Make sure you put quotation marks around the letter and pay attention to case.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer3_6)).encode("utf-8")+b"81d992813dac03f2").hexdigest() == "c23359ee0b75c465988ee4b93a3c477c4d31b362", "type of answer3_6 is not str. answer3_6 should be an str"
assert sha1(str(len(answer3_6)).encode("utf-8")+b"81d992813dac03f2").hexdigest() == "48091e934236a4213861cbdd646382630ff770b6", "length of answer3_6 is not correct"
assert sha1(str(answer3_6.lower()).encode("utf-8")+b"81d992813dac03f2").hexdigest() == "cdc0039f3646cd9805f1ec2052851b0535865724", "value of answer3_6 is not correct"
assert sha1(str(answer3_6).encode("utf-8")+b"81d992813dac03f2").hexdigest() == "1dd232c97c261c1c05a36d0640685d4e8d6d0869", "correct string value of answer3_6 but incorrect case of letters"

print('Success!')

**Question 3.7**
<br> {points: 1}

Which model does a better job of predicting on the test dataset?

A. Simple linear regression 

B. $k$-nn regression 

C. Neither, they are identical

*Save the letter of your answer to a variable named `answer3_7`. Make sure you put quotation marks around the letter and pay attention to case.*
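
As a sketch of how two regression models can be compared on the same test set, the example below uses synthetic data rather than the marathon data. Every name in it (`toy`, `rmspe_on_test`, `lm_score`, `knn_score`) is hypothetical, and the choice of 5 neighbours is arbitrary:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic data: a noisy straight-line relationship.
rng = np.random.default_rng(2024)
toy = pd.DataFrame({"x": rng.uniform(0, 10, 200)})
toy["y"] = 5.0 - 0.3 * toy["x"] + rng.normal(0, 0.5, 200)

toy_train, toy_test = train_test_split(toy, test_size=0.25, random_state=1)

def rmspe_on_test(model):
    """Fit on the training split and return the RMSPE on the test split."""
    model.fit(toy_train[["x"]], toy_train["y"])
    preds = model.predict(toy_test[["x"]])
    return mean_squared_error(toy_test["y"], preds) ** (1 / 2)

lm_score = rmspe_on_test(LinearRegression())
knn_score = rmspe_on_test(KNeighborsRegressor(n_neighbors=5))

print(f"linear regression RMSPE: {lm_score:.3f}")
print(f"k-nn regression RMSPE:   {knn_score:.3f}")
```

Both scores are measured on the same held-out rows and in the units of the response, so they can be compared directly; the model with the smaller RMSPE made smaller prediction errors on average for that test set.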

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer3_7)).encode("utf-8")+b"d4f6e6c0e617bdbb").hexdigest() == "d77ea8d481027a3a9be489bc4168b57b9eba3aa0", "type of answer3_7 is not str. answer3_7 should be an str"
assert sha1(str(len(answer3_7)).encode("utf-8")+b"d4f6e6c0e617bdbb").hexdigest() == "af4938717b75e1669b1161e07d395a3b27cbe0a2", "length of answer3_7 is not correct"
assert sha1(str(answer3_7.lower()).encode("utf-8")+b"d4f6e6c0e617bdbb").hexdigest() == "1c81345a7245bd44e77393aaab1387affd6f99fe", "value of answer3_7 is not correct"
assert sha1(str(answer3_7).encode("utf-8")+b"d4f6e6c0e617bdbb").hexdigest() == "1af820514778ec156e0044e80bba6c578eb01ae2", "correct string value of answer3_7 but incorrect case of letters"

print('Success!')

Because the linear regression model is a straight line, we can write it as a mathematical equation. The two numbers we need, the slope and the intercept, are stored in the `coef_` and `intercept_` attributes of `lm_fit`. 
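
The link between these attributes and the model's predictions can be sketched with made-up data (`toy`, `toy_fit`, and `new_x` are hypothetical names). Under the assumption of a single predictor, a prediction is the intercept plus the slope times the predictor value:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follow y = 2 + 3x exactly.
toy = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [2.0, 5.0, 8.0, 11.0]})
toy_fit = LinearRegression().fit(toy[["x"]], toy["y"])

slope = toy_fit.coef_[0]        # coef_ holds one slope per predictor
intercept = toy_fit.intercept_
print(f"predicted y = {intercept:.3f} + {slope:.3f} * x")

# Computing a prediction by hand should agree with predict.
new_x = 4.0
by_hand = intercept + slope * new_x
by_model = toy_fit.predict(pd.DataFrame({"x": [new_x]}))[0]
print(by_hand, by_model)
```

The response variable sits alone on the left-hand side of the equation, and the slope multiplies the predictor.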

In [None]:
# run this cell
print(f"The coefficient for the linear regression is {lm_fit.coef_[0]:0.3f}.")
print(f"The intercept for the linear regression is {lm_fit.intercept_:0.3f}.")

**Question 3.8.1**
<br> {points: 1}

Which of the following mathematical equations represents the model based on the numbers output in the cell above? 

A. $\text{Predicted race time (in hours)} = 4.851 - 0.022 \times \text{max (in miles)}$

B. $\text{Predicted race time (in hours)} = -0.022 + 4.851 \times \text{max (in miles)}$

C. $\text{Predicted max (in miles)} = 4.851 - 0.022 \times \text{race time (in hours)}$

D. $\text{Predicted max (in miles)} = -0.022 + 4.851 \times \text{race time (in hours)}$

*Save the letter of your answer to a variable named `answer3_8_1`. Make sure you put quotation marks around the letter and pay attention to case.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer3_8_1)).encode("utf-8")+b"66fc3665b3692f1c").hexdigest() == "ed48d5c4373de01f4cec88b4c1e3dfadfce2ac5a", "type of answer3_8_1 is not str. answer3_8_1 should be an str"
assert sha1(str(len(answer3_8_1)).encode("utf-8")+b"66fc3665b3692f1c").hexdigest() == "7015114879a0a54e067934426c86a09bb37be3c5", "length of answer3_8_1 is not correct"
assert sha1(str(answer3_8_1.lower()).encode("utf-8")+b"66fc3665b3692f1c").hexdigest() == "206d800397494ca51ed78e47e540f346387458f7", "value of answer3_8_1 is not correct"
assert sha1(str(answer3_8_1).encode("utf-8")+b"66fc3665b3692f1c").hexdigest() == "de907d177b66e88539bd7515c6036f2a62703719", "correct string value of answer3_8_1 but incorrect case of letters"

print('Success!')