# Worksheet 5: Introduction to version control


<img src="img/logo.png" width=400>

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:

* Describe what version control is and why data analysis projects can benefit from it
* Create a remote version control repository on GitHub
* Move changes to files from GitHub to JupyterHub, and from JupyterHub to GitHub
* Give collaborators access to the repository
* Resolve conflicting edits made by multiple collaborators
* Communicate with collaborators using issues
* Use best practices when collaborating on a project with others

This worksheet covers parts of [Chapter 12](https://python.datasciencebook.ca/version-control.html) of the online textbook. You should read this chapter before attempting the worksheet.

## 1. What is version control? Why use it? 

**Question 1.1** Multiple Choice:
<br> {points: 1}

Which reason listed below **is not** a good reason to use version control:

A. Version control tools provide transparency on how a project evolved by tracking the history of documents, and who made what changes to those documents.

B. Version control tools usually include a remote/cloud repository hosting service that can act as a backup of your local files (i.e., the files on your computer).

C. In practice, most data science projects involve collaboration on documents that contain code (e.g., Jupyter notebooks), and version control tools facilitate collaboration on such documents.

D. Version control tools check the accuracy of your code. 

*Assign your answer to an object called `answer1_1`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_1)).encode("utf-8")+b"e6065b93df163922").hexdigest() == "e272db7f973c560f51ec5a67526812b46690dcc8", "type of answer1_1 is not str. answer1_1 should be an str"
assert sha1(str(len(answer1_1)).encode("utf-8")+b"e6065b93df163922").hexdigest() == "afab2550a4717bc1ace5a0d1861f318cb9e92935", "length of answer1_1 is not correct"
assert sha1(str(answer1_1.lower()).encode("utf-8")+b"e6065b93df163922").hexdigest() == "7b7775327500fb22d03e2d4cf700b25caeac8cc6", "value of answer1_1 is not correct"
assert sha1(str(answer1_1).encode("utf-8")+b"e6065b93df163922").hexdigest() == "33992e507e5bbbb27b750ce66cdac30c66dc4ea2", "correct string value of answer1_1 but incorrect case of letters"

print('Success!')

**Question 1.2** True or false: 
<br> {points: 1}

**Git** is a remote/cloud repository hosting service where you can backup and share your files with collaborators.

*Assign your answer to an object called `answer1_2`. Make sure your answer is a boolean (e.g. `True` or `False`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_2)).encode("utf-8")+b"10c91b0df0aed46d").hexdigest() == "6cb508ba87af3b0bc2acc8f3b4054711c2d30d70", "type of answer1_2 is not bool. answer1_2 should be a bool"
assert sha1(str(answer1_2).encode("utf-8")+b"10c91b0df0aed46d").hexdigest() == "0ea0089fe1a84ad17dfcde6d0372aea1456746dd", "boolean value of answer1_2 is not correct"

print('Success!')

## 2. Creating a space for your data science project online

For the rest of this worksheet, you will create a toy data science project on GitHub to practice using Git and GitHub. We will ask you questions about what you are doing along the way to test your understanding.

#### Sign up for a free GitHub.com account:
If you do not already have a free GitHub.com account, visit [GitHub.com](https://github.com/) and sign up for one. Store your username and password in a secure place (we recommend using a password manager, such as [LastPass](https://www.lastpass.com/) or [1Password](https://1password.com/)).

#### Create a GitHub repository:
On [GitHub.com](https://github.com/) create a new repository and name it `toy_ds_project`. You can decide whether to make it private or public. Ensure that you select “Add a README file.” **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#creating-a-remote-repository-on-github) in the textbook.**

**Question 2.1** Multiple Choice:

Which statement below is **not true** about GitHub repositories:
<br> {points: 1}

A. Immediately after a repository is created on GitHub.com using the website, the repository exists only on GitHub.com and does not exist on your computer (i.e., you need to do something to get a copy of it on your computer).

B. Only the creator of a GitHub repository, and people the creator specifies, can edit the files in the repository. This is true even when the repository is public.

C. If the repository is public, anyone on the web can view it.

D. If the repository is public, anyone on the web can edit it.

E. A GitHub repository is like a folder on Dropbox or Google Drive, but it is different in that it has special properties for version control.

*Assign your answer to an object called `answer2_1`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_1)).encode("utf-8")+b"a2d59828106c2056").hexdigest() == "f17b8efedde85463fd83cfd11e66156b8e7ecbfe", "type of answer2_1 is not str. answer2_1 should be an str"
assert sha1(str(len(answer2_1)).encode("utf-8")+b"a2d59828106c2056").hexdigest() == "ce443c826a8de2452f351422f583fd67ab4c407b", "length of answer2_1 is not correct"
assert sha1(str(answer2_1.lower()).encode("utf-8")+b"a2d59828106c2056").hexdigest() == "f96a8b748e3984c69b2df0d831bba5e25ce1db2f", "value of answer2_1 is not correct"
assert sha1(str(answer2_1).encode("utf-8")+b"a2d59828106c2056").hexdigest() == "5bb4b87729d72b32b6005f29ec2439cb5b7f1556", "correct string value of answer2_1 but incorrect case of letters"

print('Success!')

## 3. Creating and editing files on GitHub

1. Edit the `README.md` file in your `toy_ds_project` repository on GitHub.com using the pen tool. Write "project creation date:" and list today's date. 
2. Commit this change directly to the main branch and write the commit message "added creation date". **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#editing-files-on-github-with-the-pen-tool) in the textbook.**
3. Next, use the pen tool again to edit the `README.md` file. Write "author" and list your name as the author. Commit this change and use the commit message "added project author".
4. Explore the commit history of your project by clicking on the link that looks like this: 

<img src="img/commits_history_link.png" width=600>

> Note: you can visit the version of your repository at any stage in its history by clicking on the `<>` buttons! Give it a try!

**Question 3.1**  True or false:
<br> {points: 1}

Even though commit messages are required to edit a file using the pen tool on GitHub.com, it doesn't matter what message you write in practice.

*Assign your answer to an object called `answer3_1`. Make sure your answer is a boolean (e.g. `True` or `False`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer3_1)).encode("utf-8")+b"ba99e0ac6500f14c").hexdigest() == "89b13c836775b04b68c1680d043eceb96582b64f", "type of answer3_1 is not bool. answer3_1 should be a bool"
assert sha1(str(answer3_1).encode("utf-8")+b"ba99e0ac6500f14c").hexdigest() == "710c3e06d237a5c1899c5c98b9496b9f8b4cb3ec", "boolean value of answer3_1 is not correct"

print('Success!')

## 4. Cloning your repository on JupyterHub

For our data science project, we need to put a copy of our repository somewhere we can run and test the code we write (otherwise, we won't know that our code works!!!). We can use the course JupyterHub for this!

Clone a copy of this GitHub repository to the course JupyterHub using the Jupyter Git extension. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#cloning-a-repository-using-jupyter) in the textbook.**
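For reference, the clone performed by the Jupyter Git extension has a command-line equivalent, `git clone`. The sketch below builds that command in Python; it assumes Git is installed and that the repository is reachable over HTTPS. The helper name `clone_repository` and the username `your-username` are placeholders introduced here for illustration, and the command only runs if you pass `dry_run=False`.

```python
import subprocess


def clone_repository(username, repo_name="toy_ds_project", dry_run=True):
    """Build (and optionally run) the `git clone` command for a GitHub repository."""
    # Assumption: the repository is cloned over HTTPS from github.com.
    url = f"https://github.com/{username}/{repo_name}.git"
    cmd = ["git", "clone", url]
    if dry_run:
        # Only show the command; nothing is downloaded.
        print("would run:", " ".join(cmd))
    else:
        # Requires Git to be installed and access to the repository.
        subprocess.run(cmd, check=True)
    return cmd


# "your-username" is a placeholder, not a real account.
clone_cmd = clone_repository("your-username")
```

In this worksheet you should still use the Jupyter Git extension as described in the textbook; the sketch only shows what happens behind the scenes.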


**Question 4.1** True or false:
<br> {points: 1}

The definition of **cloning a repository** is to copy/download the entire contents (files, project history, and location of the remote repository) of a remote GitHub.com repository to a computer (e.g., your workspace on a JupyterHub, or your laptop).

*Assign your answer to an object called `answer4_1`. Make sure your answer is a boolean (e.g. `True` or `False`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer4_1)).encode("utf-8")+b"7786969e2866751e").hexdigest() == "1362e2445866d20e39263cbac9fdc989ed5a4845", "type of answer4_1 is not bool. answer4_1 should be a bool"
assert sha1(str(answer4_1).encode("utf-8")+b"7786969e2866751e").hexdigest() == "18c54a3763c25886370ee74fc23367753c3956a0", "boolean value of answer4_1 is not correct"

print('Success!')

## 5. Working in a cloned repository on JupyterHub

Now that your repository exists in your workspace on the course JupyterHub, you can create a new Jupyter notebook with a Python kernel and write some code! To help this project move along, we show you below how to create and save a new Jupyter notebook, and we give you some code to put in it.

### Creating a new Jupyter notebook with a Python kernel

To create a new Jupyter notebook with a Python kernel in your `toy_ds_project` repository, use the file navigation menu of Jupyter so that you are inside the `toy_ds_project` folder:

<img src="img/to-get-to-project-root.png" width=500>


Once there, click on new Python notebook.

<img src="img/new-notebook_py.png" width=500>

Next, right-click on the filename and click on "Rename" to rename the file to `marg_vs_divorce_viz.ipynb`.

<img src="img/rename.png" width=500>


### Add code to the notebook you created

Add the code below to the notebook and run it to display the data visualization. Feel free to add a narrative to the notebook if you like, commenting on the question being asked, the data visualization results, and whether correlation means causation. When you are done, save the notebook.

```python
import altair as alt
import pandas as pd


should_have_bought_butter = pd.DataFrame({
    "margarine_consumption": [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7],
    "maine_divorce_rate": [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1],
    "year": [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
})

marg_vs_time = alt.Chart(should_have_bought_butter).mark_line(color="steelblue").encode(
    x=alt.X("year").title("Year"),
    y=alt.Y("margarine_consumption")
        .title("Margarine consumption (lbs per capita)")
        .scale(zero=False)
).properties(
    height=250
)


divorce_rate_vs_time = marg_vs_time.mark_line(color="coral").encode(
    y=alt.Y("maine_divorce_rate")
        .title("Divorce rate in Maine (per 1000)")
        .scale(zero=False)
)

(marg_vs_time | divorce_rate_vs_time).properties(
    title="Divorce rate in Maine correlates with margarine consumption"
)
```
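If you add a narrative about correlation versus causation, you may want a number to go with the plots. The following is a minimal sketch that computes the Pearson correlation coefficient between the two series using only the Python standard library. The helper name `pearson_r` is introduced here for illustration and is not part of the worksheet or the textbook.

```python
from math import sqrt

# Same values as in the data frame above.
margarine_consumption = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]
maine_divorce_rate = [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r = pearson_r(margarine_consumption, maine_divorce_rate)
print(f"Pearson correlation: {r:.3f}")
```

The coefficient comes out very close to 1, which is a useful reminder that a strong correlation on its own says nothing about whether one variable causes the other.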

## 6. Specifying files to commit

Now we would like to start the process of putting `marg_vs_divorce_viz.ipynb` under version control and eventually push this file to our remote repository on GitHub.com. The first step to doing this is to **add** the changes to this file (creating it and the code) to the Git staging area. Go ahead and use the Jupyter Git extension to do this now. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#specifying-files-to-commit) in the textbook.**

**Question 6.1** Multiple Choice:
<br> {points: 1}

Git has a distinct step of **adding** files to the staging area because:

A. Not all changes we make (i.e., files we create or edit) are ones that we want to push to our remote GitHub repository.

B. It allows us to edit multiple files at once, but associate particular commit messages with particular files (so that the commit messages can more specifically reflect the changes that were made).

C. This is technically required of all version control software.

D. A and C.

E. A and B.

*Assign your answer to an object called `answer6_1`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer6_1)).encode("utf-8")+b"82d81074c0157707").hexdigest() == "4c2c9a3af9b3aa9c1b31351cd854e4a23c2ae6c6", "type of answer6_1 is not str. answer6_1 should be an str"
assert sha1(str(len(answer6_1)).encode("utf-8")+b"82d81074c0157707").hexdigest() == "1b55f48e31b3707c6399def901a31febdc677537", "length of answer6_1 is not correct"
assert sha1(str(answer6_1.lower()).encode("utf-8")+b"82d81074c0157707").hexdigest() == "5fc5b9d566c2b9b364d9accca5a63d1718ef5a4a", "value of answer6_1 is not correct"
assert sha1(str(answer6_1).encode("utf-8")+b"82d81074c0157707").hexdigest() == "dd3c28cdf794f9cdf34d432903eab16f788d5817", "correct string value of answer6_1 but incorrect case of letters"

print('Success!')

## 7. Making the commit 

The next step is to **commit** our changes to our local Git repository. You can use the Jupyter Git extension to do this now. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#making-the-commit) in the textbook.**

**Question 7.1** True or false:
<br> {points: 1}

When we **commit** our changes to Git, the snapshot of changes, the commit message, the time and date stamp, and the user who committed the changes are all saved to the Git history on GitHub.

*Assign your answer to an object called `answer7_1`. Make sure your answer is a boolean (e.g. `True` or `False`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer7_1)).encode("utf-8")+b"06ac43a27281dd4e").hexdigest() == "1a2e7c0d2d3f97a69ff02f95ed4987853970082a", "type of answer7_1 is not bool. answer7_1 should be a bool"
assert sha1(str(answer7_1).encode("utf-8")+b"06ac43a27281dd4e").hexdigest() == "d0e752e8cbd62466f66748587877162af627778d", "boolean value of answer7_1 is not correct"

print('Success!')

## 8. Pushing the commits to GitHub

Finally, we are ready to send our changes (creating and adding code to `marg_vs_divorce_viz.ipynb`) to our remote repository through a process we call "pushing". Go ahead and do this now. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#pushing-the-commits-to-github) in the textbook.**

After pushing your work to the remote repository on GitHub, visit your repository on GitHub.com and check out what your awesome toy project looks like!!!
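The add, commit, and push steps you carried out with the Jupyter Git extension correspond to three Git commands: `git add`, `git commit -m`, and `git push`. The sketch below lists them in order from Python. It assumes Git is installed and that the current working directory is the cloned `toy_ds_project` repository; the helper name `publish` and the commit message are made up for illustration. By default nothing is executed, the commands are only printed.

```python
import subprocess


def publish(filename, message, dry_run=True):
    """Build (and optionally run) the add, commit, and push commands in order."""
    commands = [
        ["git", "add", filename],          # stage the file
        ["git", "commit", "-m", message],  # record a snapshot in the local repository
        ["git", "push"],                   # send local commits to the remote repository
    ]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Stops with an error if any step fails.
            subprocess.run(cmd, check=True)
    return commands


# The commit message here is only an example.
steps = publish("marg_vs_divorce_viz.ipynb", "added margarine vs divorce visualization")
```

Passing `dry_run=False` would run the commands for real, so only do that inside the cloned repository and with changes you actually want to send.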

**Question 8.1** Multiple Choice:

Which statement below is **not** true?
<br> {points: 1}

A. Cloning and pulling a GitHub repository are the exact same thing.

B. Pushing with Git is the act of sending changes that were committed to Git to a remote repository, for example, on GitHub.com.

C. Pulling with Git is the act of collecting changes that exist in a remote repository, for example, on GitHub.com, that do not yet exist on the local computer you are working on (i.e., your workspace on the JupyterHub or your laptop).

D. You should push your work to GitHub anytime you want to share your work with others, or when you are done with a work session and want to back up your work.

*Assign your answer to an object called `answer8_1`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer8_1)).encode("utf-8")+b"c34bf20431a9f37a").hexdigest() == "cadeddac8b846295ba8efef86d614e3bdab484ea", "type of answer8_1 is not str. answer8_1 should be an str"
assert sha1(str(len(answer8_1)).encode("utf-8")+b"c34bf20431a9f37a").hexdigest() == "56b220c14d001ccbdd3a081b001916bbde2bae21", "length of answer8_1 is not correct"
assert sha1(str(answer8_1.lower()).encode("utf-8")+b"c34bf20431a9f37a").hexdigest() == "97ce4537033b9a8b10e7d076175c03378f96e45c", "value of answer8_1 is not correct"
assert sha1(str(answer8_1).encode("utf-8")+b"c34bf20431a9f37a").hexdigest() == "00f226ae3b5b37a6aeca79b47db55c72b6a3a733", "correct string value of answer8_1 but incorrect case of letters"

print('Success!')

## 9. Giving collaborators access to your project

One of the advantages of using version control tools, such as Git and GitHub, is how they let you collaborate. Let's get some practice starting down this path. Add one or more of your breakout group members to your GitHub repository as a collaborator. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#giving-collaborators-access-to-your-project) in the textbook.**

**Question 9.1** True or false:
<br> {points: 1}

You can clone or pull from any public remote repository on GitHub.com; however, you can only push to public remote repositories on GitHub.com that you own or are a collaborator on.

*Assign your answer to an object called `answer9_1`. Make sure your answer is a boolean (e.g. `True` or `False`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer9_1)).encode("utf-8")+b"9a882712e6d5127f").hexdigest() == "e5190871e42d741e6da7a0931d90d66f94ecc778", "type of answer9_1 is not bool. answer9_1 should be a bool"
assert sha1(str(answer9_1).encode("utf-8")+b"9a882712e6d5127f").hexdigest() == "e6c2f1d49aa0afed1b35609f4677b3bd743aebec", "boolean value of answer9_1 is not correct"

print('Success!')

#### (Optional) more collaboration practice!

If you want to practice more Git & GitHub skills for collaboration, ask someone in your team if you can collaborate and send an edit to their project. To do this, they will need to add you as a collaborator, and then you will need to clone their repository to your JupyterHub. After that, you can edit some files (or create a whole new one), save your work, and then use the Jupyter Git extension to add, commit, and push your changes to their remote GitHub repository.

## 10. Communicating using GitHub issues

It's easy for project communications to get lost in email or whatever messaging platform you use to communicate with your team. GitHub issues are an excellent tool explicitly designed for project collaboration as they are "attached" to the project's remote GitHub repository. Your task here is to go to the Issues tab for your project and create an issue about something you might want to improve about your project. **This task corresponds to [this step](https://python.datasciencebook.ca/version-control.html#communicating-using-github-issues) in the textbook.**

**Question 10.1** Multiple Choice:
<br> {points: 1}

Which statement below is **not** a reason why GitHub issues are an ideal medium for project-specific communications?

A. Issues are part of each GitHub repository, and thus "attached" to the project.

B. Issues only persist while they are open, and are immediately deleted when they are closed.

C. Issues are easily searchable using GitHub’s search tools.

D. All issues are accessible to all project collaborators, so no one is left out of the conversation.

E. Issues can be set up so that team members get email notifications when a new issue is created or a new post is made in an issue thread.

*Assign your answer to an object called `answer10_1`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).* 

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer10_1)).encode("utf-8")+b"1f8e6bc5630f5957").hexdigest() == "d86b622874c0fc8afcc1bcea262a17e49d32befe", "type of answer10_1 is not str. answer10_1 should be an str"
assert sha1(str(len(answer10_1)).encode("utf-8")+b"1f8e6bc5630f5957").hexdigest() == "5b161ce840819ec2fc718b784e1eceedb5ca9dd6", "length of answer10_1 is not correct"
assert sha1(str(answer10_1.lower()).encode("utf-8")+b"1f8e6bc5630f5957").hexdigest() == "12d725f71534bc8e7840215893254e23b3c4c0d8", "value of answer10_1 is not correct"
assert sha1(str(answer10_1).encode("utf-8")+b"1f8e6bc5630f5957").hexdigest() == "a38311cb4a43abb67a6742a94f6c3b79a69987d3", "correct string value of answer10_1 but incorrect case of letters"

print('Success!')

#### (Optional) Even more collaboration practice! 

Visit a team member's GitHub repository and leave a polite but constructive message on how they could improve their project. 


## Nice work! You're all done here!

<img src="https://media.giphy.com/media/lcYFNTaz4U9jy/giphy.gif">