First Time Coding in Python?#
This section will help you figure out how to run Python scripts. As you may be aware by now, Python is an interpreted language. This means that you don't need to compile code before running it, but it comes with a downside: you need an environment that meets all of the requirements of your script before you can run the code.
What do I mean by environment?#
In this context, environment refers to the base Python interpreter, which translates the code you write in Python into commands computers can understand and execute, plus any other required libraries and modules (i.e., code written by others that has conveniently been shared with you and which generally makes coding more straightforward). These libraries allow you to, e.g., efficiently carry out numerical operations (NumPy), handle data structures and data analysis tasks (pandas), and even implement state-of-the-art machine learning solutions (scikit-learn, TensorFlow, PyTorch).
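To make the idea of an environment a little more concrete, here is a minimal sketch that reports which interpreter is running and which of a few libraries it can find. The helper name `missing_packages` and the list of names are illustrative choices, not part of the original material.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the import names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter that is executing this script, and its version.
print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])

# These are import names, which are not always identical to the package
# names used by pip or conda (scikit-learn is imported as sklearn).
wanted = ["numpy", "pandas", "sklearn"]
print("Missing from this environment:", missing_packages(wanted))
```

An empty list means the environment already meets these particular requirements; any name that is printed would still need to be installed.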
Today, we'll be looking at three options for running Python code:
Google Colaboratory (often shortened to Colab)
Colab is a service run by Google, and gives you (limited) access to their CPUs, GPUs, and even TPUs! Colab is based on Jupyter Notebooks, and is meant to allow you to write and execute Python code from within your browser free of charge. It has many popular machine learning libraries installed, and has APIs to interact with Google services (e.g., Drive). Colab, however, requires you to log in with a Google account, which you understandably may not have or want to have.
Plusses: Free of Charge, Integrates with Google services, Integrates with GitHub, Requires no setup on your end for most tasks
Minuses: Your code will be stored and run on Google servers, Requires the use of a Google Account, Limited Resources, Limited to Interactive Scripts
Binder
Like Colab, Binder is an online platform that allows you to write and execute Python code from your browser. Unlike Colab, The Binder Project is an open-source initiative that allows the creation of custom computing environments for interactive and reproducible data science and scientific computing. With it, you can turn GitHub repositories containing Jupyter notebooks or other data science code into executable environments that others can launch and interact with online, without the need to install anything locally.
Plusses: Free of charge, Open Source, Integrates with GitHub, Requires no setup on your end for most tasks, Support for Complex Environments (including Docker)
Minuses: Limited Resources, Setup complexity for new users, Building time can be long, Your code will be stored and run on external servers, Changes made to files on the server cannot be saved there (you have to download the files)
Conda (using Miniconda)
Conda is a package manager that allows you to create and manage environments for Python and other languages. It is a very powerful tool, and is the way we recommend managing your Python environments in this tutorial. Conda comes bundled with two distributions: Anaconda and Miniconda. Anaconda is a full-fledged distribution of Python, and comes with many pre-installed packages. Miniconda, on the other hand, is a minimal version of Anaconda, and only comes with the bare minimum to get you started. We will be using Miniconda today, as it is much lighter and quicker to install. Additionally, we can use Conda with a variety of IDEs (Integrated Development Environments), which allow us to write and execute Python code on our own computers.
Plusses: Free of charge, Open Source and Cross-Platform, Allows you to create and manage your own environments, handles library dependency compatibility for your environments, Your code will be stored and run on the computer where Conda is installed (You can run code offline)
Minuses: Setup complexity for new users, Requires you to install software on your computer, Requires you to manage your environments
Let's go over how to use each of these options.
Google Colaboratory#
To use Colab, you need to have a Google account. If you don't have one, you can create one here. Once you have an account, you can access Colab here. You should see a screen like this:
From this page, you can open up a new notebook by clicking on "New Notebook" or by clicking on "File" and then "New Notebook". You should then see a screen like this:
You can now write and execute Python code in this notebook. To execute a cell, you can either click on the "play" button to the left of the cell, or you can press "Shift+Enter" on your keyboard. Try it out by copying the following text into a cell and executing the code:
print('Here is the list of all the packages installed in this environment:')
!pip list
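The `!pip list` line only works inside a notebook, because the leading `!` hands the command to a shell. As a rough sketch of the same idea in plain Python, the standard library's `importlib.metadata` can list installed packages. The helper name `installed_packages` is made up for this example, and the Colab check relies on the assumption that the `google.colab` module is already loaded in a Colab kernel.

```python
import sys
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for installed distributions."""
    pairs = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip entries with missing or broken metadata
            pairs.add((name, meta.get("Version", "unknown")))
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Assumption: in Colab, google.colab is already imported in the running kernel.
print("Running in Colab:", "google.colab" in sys.modules)

packages = installed_packages()
print(f"{len(packages)} packages found; the first few are:")
for name, version in packages[:5]:
    print(f"  {name}=={version}")
```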
This is nice, but what we're most interested in today is being able to run tutorials hosted on GitHub. There are a couple of ways to do this:
If the Jupyter notebook is hosted on GitHub and was uploaded via the Colab interface, you can simply click on the "Open in Colab" button at the top of the page.
This will open the notebook in Colab, and you can then execute the code as you would normally. You can find an example of this here. This notebook is part of a series of materials developed for a course on Machine Learning given at UNIL, and is hosted on GitHub.
If the notebook is hosted on GitHub, but was not uploaded via the Colab interface, you can rewrite its URL in order to open it in Colab.
As an example, we'll use the notebooks developed by Jesper Dramsch for the EuroSciPy 2022 tutorial Machine Learning for Science Reproducibility. Though the materials have been prepared for opening with Colab and Binder (as evidenced by the "open on" widgets in the main Readme.md), let's pretend they weren't and view the first notebook on GitHub. When viewing the hosted notebook in your browser, you should see that the URL is
https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial/blob/main/book/notebooks/0-basic-data-prep-and-model.ipynb
To open this notebook in Colab, you need to replace github.com with colab.research.google.com/github. The resulting URL should be
https://colab.research.google.com/github/JesperDramsch/ml-for-science-reproducibility-tutorial/blob/main/book/notebooks/0-basic-data-prep-and-model.ipynb
If you open this URL, you should see the notebook open in Colab, and you can execute the code as you would normally. Try it out by yourself with the following notebook: https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial/blob/main/book/notebooks/1-model-evaluation.ipynb
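The URL rewrite described above is mechanical enough to script. The sketch below is only an illustration of that replacement; the function name `github_to_colab` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def github_to_colab(url):
    """Turn a github.com notebook URL into the matching Colab URL."""
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {url}")
    # Swap the host and prefix the path with /github.
    return urlunsplit(
        (
            parts.scheme,
            "colab.research.google.com",
            "/github" + parts.path,
            parts.query,
            parts.fragment,
        )
    )


notebook = (
    "https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial"
    "/blob/main/book/notebooks/1-model-evaluation.ipynb"
)
print(github_to_colab(notebook))
```

Pasting the printed URL into your browser should open the second notebook in Colab.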
You can also use Colab's Open Notebook command from the File menu, which will allow you to search through GitHub repositories by user, as shown in the screenshot below.
Binder#
Binder is a bit more complicated to use than Colab, but it is also more powerful. We won't go into the details of how to create a Binder environment, but we will go over how to use one.
Let's start by opening the Binder environment for the EuroSciPy 2022 tutorial Machine Learning for Science Reproducibility. You should see a screen like this:
This screen indicates that Binder is building the environment for you.
This can take a while, so please be patient.
Once the environment is built, you should see a screen like this:
Unlike Colab, Binder launches a full instance of JupyterLab and not just the notebook interface. This allows you to open and edit multiple notebooks at the same time, and even to open other files (e.g., images, text files, etc.)! You can use the file explorer in the left pane to browse the contents of the repository, and you can open a notebook by double-clicking on it. Here is the first notebook in the repository, 0-basic-data-prep-and-model.ipynb:
Note that if you want to run this notebook, at the time of writing, you should change the line
DATA_FOLDER = Path("..") / "data"
to
DATA_FOLDER = Path("../..") / "data"
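The fix above is needed because a relative path is resolved against the directory the notebook runs from. One way to avoid hard-coding the number of `..` steps is to walk up the parent directories until a folder with the right name turns up. The sketch below is a hypothetical helper (`find_data_folder` is not part of the tutorial repository), demonstrated on a throwaway directory tree.

```python
import tempfile
from pathlib import Path


def find_data_folder(start=None, name="data"):
    """Walk up from `start` (default: current directory) to find a folder called `name`."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / name).is_dir():
            return candidate / name
    raise FileNotFoundError(f"No '{name}' folder found at or above {start}")


# Demonstration on a temporary tree that mimics <root>/data and <root>/book/notebooks.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "data").mkdir()
    nested = root / "book" / "notebooks"
    nested.mkdir(parents=True)
    print(find_data_folder(nested).relative_to(root))
```

In a notebook, `DATA_FOLDER = find_data_folder()` would then work regardless of how deep the notebook sits below the folder that contains the data.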
Conda (using Miniconda)#
Conda is a much more flexible solution for running Python code, but it requires you to install software on your computer. In this tutorial, we'll go over how to install Miniconda on Windows and Linux (apologies to those on macOS - I don't have access to a Mac, so I can't provide the same level of instructions for you - it should however be quite similar).
Before we get into the installation, let's go over some basic terminology:
Terminology#
Package: A package is a collection of code that can be installed and used in Python. Packages can come from a variety of sources, including the Python Package Index (PyPI), Anaconda's default channel, and conda-forge. Packages can be installed using the conda command, or using pip (the package installer for Python). We will be relying on conda, as it resolves the dependencies of all the packages in an environment together and checks for conflicts between them.
Virtual Environment: As mentioned before, an environment is a collection of packages installed in a specific location. A Python environment contains all of the software needed to interpret code and run Python scripts. Virtual environments are useful because they allow you to have different versions of packages installed in different environments, and they make it easy to share your environment with others. Environments can be created using the conda command, or with other tools such as venv or pipenv.
Channel: A channel is a source for conda packages. The default channel (defaults) is maintained by Anaconda. Another widely used channel is conda-forge, which is maintained by its community. PyPI is not a conda channel; packages hosted there are installed with pip instead. Channels can be specified when installing packages using the conda command.
Installing Miniconda#
To install Miniconda, you need to download the installer for your operating system from here. Once you have downloaded the installer, you can run it to install Miniconda. To jump to the instructions for Linux, click here. The instructions for Windows are right below.
Windows#
Once you have downloaded the installer, run it and follow the instructions with the default settings. You should now have two programs called Anaconda Prompt and Anaconda Powershell Prompt installed on your computer. These programs allow you to run commands in a terminal with the correct environment activated. Let's start by opening "Anaconda Powershell Prompt". You should see a screen like this:
Notice the (base) at the beginning of the line: this indicates that the base environment is currently active. Let's create a new environment called test_env by running the following command:
conda create -n test_env
You should be asked to confirm the creation of the new environment by typing y and pressing Enter. Once the environment is created, you should see a screen like this:
If you run the conda env list command, you should now see both the base and test_env environments listed. Let's go ahead and activate the test_env environment by running the following command:
conda activate test_env
If you run conda list, you'll get a list of the packages installed in the virtual environment. For the moment, that list is empty! (We did, after all, just create a blank virtual environment.)
Let's go ahead and install IPython (short for interactive Python), which will let us run Python code interactively. To do this, we'll run the following command:
conda install ipython
Conda will generate a list of packages that will be installed, and will ask you to confirm the installation by typing y and pressing Enter. Once the installation is complete, you should see a screen like this:
Now type in ipython and press Enter. You should see a screen like this:
Try running a small program that prints out the numbers from 0 to 9:
[print(number) for number in range(10)]
You should see the following:
Once you're done with ipython, simply type exit() and press Enter to exit the program.
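A side note on the one-liner above: a list comprehension always builds a list, and since print returns None, IPython will also echo a list of ten None values after the numbers. The short sketch below shows this, next to a plain for loop that prints the same numbers without creating that throwaway list.

```python
# The comprehension evaluates to a list holding print's return values.
result = [print(number) for number in range(10)]
print(result)  # a list of ten None values

# A for loop prints the same numbers without building a list.
for number in range(10):
    print(number)
```

The comprehension is handy as a one-line test in an interactive session; in scripts, the loop is usually the clearer choice.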
Let's now install a few commonly used packages. To do this, we'll use a different channel than the default one. Simply add -c conda-forge as an option to the install command, as shown below:
conda install -c conda-forge numpy pandas matplotlib
Congratulations, you have a working Python environment! We'll go over how to install IDEs in the next section; just skip over the Linux section for now.
Linux#
After you've downloaded the installer, open a terminal and navigate to the directory where you downloaded the installer. Assuming your browser saved it to the default Downloads folder, the command to get there is:
cd ~/Downloads
Once you're in the correct directory, run the following command to make the installer executable:
chmod +x Miniconda3-latest-Linux-x86_64.sh
Then, run the installer:
bash Miniconda3-latest-Linux-x86_64.sh
You will be prompted to accept the license agreement. Press Space to scroll through the license agreement, and then type yes and press Enter to accept it. You will then be asked to confirm the installation location. Press Enter to accept the default location. You will then be asked if you want to initialize Miniconda3 by running conda init. Type yes and press Enter to accept. Restart your terminal, and you should now see (base) next to the prompt. This indicates that the base environment is currently active:
Let's create a new environment called test_env by running the following command:
conda create -n test_env
You should be asked to confirm the creation of the new environment by typing y and pressing Enter. Once the environment is created, you should see a screen like this:
If you run the conda env list command, you should now see both the base and test_env environments listed. Let's go ahead and activate the test_env environment by running the following command:
conda activate test_env
If you run conda list, you'll get a list of the packages installed in the virtual environment. For the moment, that list is empty! (We did, after all, just create a blank virtual environment.)
Let's go ahead and install IPython (short for interactive Python), which will let us run Python code interactively. To do this, we'll run the following command:
conda install ipython
Conda will generate a list of packages that will be installed, and will ask you to confirm the installation by typing y and pressing Enter. Once the installation is complete, you should see a screen like this:
Now type in ipython and press Enter. You should see a screen like this:
Try running a small program that prints out the numbers from 0 to 9:
[print(number) for number in range(10)]
You should see the following:
Once you're done with ipython, simply type exit() and press Enter to exit the program.
Let's now install a few commonly used packages. To do this, we'll use a different channel than the default one. Simply add -c conda-forge as an option to the install command, as shown below:
conda install -c conda-forge numpy pandas matplotlib
Congratulations, you have a working Python environment! We'll go over how to install IDEs in the next section.
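If you would like to double-check from inside Python which environment you ended up in (on either operating system), the following sketch prints a few relevant facts. The helper name `describe_environment` is made up for this example; `CONDA_DEFAULT_ENV` is the environment variable that conda sets on activation.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python environment running this code."""
    return {
        "python_version": sys.version.split()[0],
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

With test_env active, conda_env should read test_env and prefix should point to a folder inside your Miniconda installation.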
Integrated Development Environments (IDEs)#
IDEs are software that provide you with tools for writing code. While it is true that you could write your programs in any text editor, you'll quickly run into issues - did you spell everything correctly? What arguments did my_function take again? What part of your code caused your script to fail? What is the current value of my_variable? IDEs can help answer these and many other questions.
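For comparison, here is roughly what answering two of those questions looks like without an IDE, using only the standard library. Both `my_function` and `my_variable` are hypothetical stand-ins, as in the text above.

```python
import inspect


def my_function(data, scale=1.0, *, verbose=False):
    """Hypothetical function standing in for one you wrote a while ago."""
    if verbose:
        print(f"Scaling {len(data)} values by {scale}")
    return [value * scale for value in data]


# What arguments did my_function take again?
print(inspect.signature(my_function))

# What is the current value of my_variable?
my_variable = my_function([1, 2, 3], scale=2.0)
print(my_variable)
```

An IDE surfaces the same information as you type (signature hints) and while the program runs (a variable explorer or debugger), without any extra code.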
Today, we'll be looking at three IDEs: Jupyter, Spyder, and Visual Studio Code (VScode for short). Jupyter is a web-based IDE that allows you to write and execute code in a web browser. Spyder is a desktop IDE that allows you to write and execute code in a single window. VScode is a desktop IDE that also lets you write and execute code in a single window, and which additionally provides a variety of tools for debugging and managing your code (including plugins for GitHub Copilot, an AI assistant that is free for students, at the time of writing at least).
Jupyter Notebook#
We'll be using Jupyter Notebook for this tutorial, but there is also a JupyterLab interface that is more similar to Spyder and VScode. To install Jupyter, simply run the following command:
conda install -c conda-forge jupyter
THAT'S IT!
You now have a working Python environment with Jupyter installed! To launch Jupyter, simply run the following command:
jupyter notebook
You should see a screen like this:
You can now browse to wherever you have your notebooks stored and open them. You can also create new notebooks by clicking on the New button in the top right corner of the screen.
For the documentation on Jupyter Notebook, please refer to the official documentation.
JupyterLab#
If you prefer the JupyterLab interface, you can install it by running the following command:
conda install -c conda-forge jupyterlab
You can then launch JupyterLab by running the following command:
jupyter lab
You should see a screen like this:
You can now browse to wherever you have your notebooks stored and open them. From the launcher, you can create a new Jupyter Notebook (and you can open additional launchers by clicking on the + button at the top left corner of the screen).
For the documentation on JupyterLab, please refer to the official documentation.
Spyder#
Spyder is a desktop IDE that allows you to write and execute code in a single window. (If you're curious, it stands for Scientific PYthon Development EnviRonment.) To install Spyder, simply run the following command:
conda install -c conda-forge spyder
However, Spyder isn't configured to run Jupyter Notebooks by default, so we'll need to install an additional package. We will, after all, mostly work with Jupyter Notebooks (though you can do everything that is done in the course with scripts instead!). To install the required package, run the following command:
conda install -c conda-forge spyder-notebook
Let's go ahead and start Spyder by running the following command:
spyder
You should see a screen like this:
Spyder will open a temporary script file upon launching (temp.py). You'll notice at the bottom of the screen that there are two tabs: Editor and Notebook. Click on the Notebook tab to open the notebook interface. You should see a screen like this:
From this screen, you can create Jupyter Notebooks and open existing ones. Congratulations, you now have a working Python environment with Spyder installed!
For the documentation on Spyder, please refer to the official documentation. For the documentation on Spyder-Notebook, please refer to the official documentation.
Visual Studio Code (VScode)#
VScode is a powerful desktop IDE that allows you to write and execute code in a single window, and which also provides you with a variety of tools for debugging and managing your code (including plugins for GitHub Copilot, an AI assistant that is free for students, at the time of writing at least). To install VScode, you'll have to download the installer from the VScode website and run it.
On Windows, simply launch the installer. On Debian-based Linux distributions (e.g., Ubuntu), navigate to the directory where you downloaded the installer and run the following command, adjusting the file name to match the version you downloaded:
sudo dpkg -i code_1.60.2-1632313585_amd64.deb
Once you have VScode installed, launch it (either by clicking on the appropriate icon on Windows, or by running code in a terminal on Linux). You should see a screen like this:
On the left, you'll see an icon that looks like a set of 3 blocks about to be joined by a fourth block. Click on it to open the extensions menu. Search for Python and install the first result. You should see a screen like this:
Go ahead and install the Jupyter extension on your own.
Now that you have both of these installed, click on the Explorer icon on the left - it looks like two files stacked on top of each other. You should see a screen like this:
Click on the Open Folder button, and navigate to the folder where you have your notebooks stored. Select the folder and click on Open. You should see a screen like this:
Let's go ahead and start a new notebook. Click on the New File button in the top left corner of the screen, and type in test.ipynb. You should see a screen like this:
You should see "select kernel" at the top right of the screen - click on it, choose Python Environment from the prompt, and select your virtual environment (e.g., test_env) from the list.
Try importing a library (you might be prompted by VScode to install some files required for running Jupyter notebooks interactively - go ahead and accept this prompt).
Congratulations - you can now run Jupyter Notebooks from VScode!
For the documentation on VScode, please refer to the official documentation.
Using Git and GitHub#
Git is a version control system that allows you to keep track of changes made to your code. GitHub is a service that allows you to store your code online, and which provides you with a variety of tools for collaborating with others. We won't go over how to use Git and GitHub in detail in this tutorial, but we will go over how to install Git and how to clone a repository from GitHub. Additionally, we'll go over how to use the integrated Git tools in VScode. Finally, we'll discuss how to set up GitHub Copilot in VScode.
Installing Git on Windows#
To install Git on Windows, you'll have to download the installer from the Git website and run it, leaving all of the default settings selected. Once you have Git installed, you can open a terminal (e.g., PowerShell) and run the following command to check that it is installed correctly:
git --version
Installing Git on Linux#
To install Git on a Debian-based Linux distribution (e.g., Ubuntu), simply run the following commands:
sudo apt-get update
sudo apt install git
Answer y when prompted to confirm the installation. Once you have Git installed, you can open a terminal and run the following command to check that it is installed correctly:
git --version
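The same check can be made from Python, which is handy if a script or notebook depends on Git being available. This is a minimal sketch using only the standard library; the helper name `git_version` is made up for this example.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if git cannot be run."""
    if shutil.which("git") is None:
        return None  # git is not on the PATH
    completed = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    if completed.returncode != 0:
        return None
    return completed.stdout.strip()


print(git_version() or "git was not found on the PATH")
```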
Setting up your GitHub account#
You don't need to create an account on GitHub to clone a public repository, but you will need one if you want to push changes to a repository. To create an account, go to the sign-up section of the GitHub website. You'll be asked to enter your email (REMEMBER TO USE YOUR ACADEMIC EMAIL IF YOU'RE PLANNING TO REQUEST STUDENT/TEACHER BENEFITS) in the following screen:
Follow the prompts to create your account, and then verify your email address. You will then be asked for your preferences - if you're a student or teacher, you can apply for benefits associated with your status in this process.
Once you've done that, let's go ahead and set up an SSH key. SSH keys are a way to identify trusted computers without involving passwords. You will need an SSH key (or a personal access token) in order to upload changes to GitHub, since password authentication is no longer supported for Git operations.
Setting up an SSH key on Linux#
To set up an SSH key on Linux, you'll need to:
Open a terminal and run the following command:
ssh-keygen -t ed25519
You'll be asked to enter a file in which to save the key. Press Enter to accept the default location. You'll then be asked to enter a passphrase. You can either enter a passphrase or leave it blank. If you leave it blank, you'll be asked to confirm that you want to leave it blank - in our case we'll leave it blank (NOTE THAT THIS IS LESS SECURE). Once you've done that, you should see a screen like this:
You can now view your SSH key by running the following command, after which you should copy the key to your clipboard:
cat ~/.ssh/id_ed25519.pub
Now go to your GitHub settings
Click on SSH and GPG keys on the left side of the screen
Click on New SSH key
Give your key a title (e.g., My SSH Key)
Paste your key into the Key field
Click on Add SSH key
Your SSH key should now appear in the list of SSH keys in your GitHub settings, similar to the screenshot below:
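As an optional sanity check before pasting a key into GitHub, the sketch below verifies that a line has the shape of an OpenSSH public key: a key type, base64 data that starts with that same type, and an optional comment. The function name `check_public_key` is made up for this example, and the key it is run on is an all-zero placeholder built on the spot, not a real key.

```python
import base64
import struct


def check_public_key(line):
    """Return the key type of an OpenSSH public key line, or raise ValueError."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64 data> [comment]'")
    key_type, encoded = fields[0], fields[1]
    blob = base64.b64decode(encoded, validate=True)
    if len(blob) < 4:
        raise ValueError("key data is too short")
    # The decoded data starts with a 4-byte length followed by the key type.
    (length,) = struct.unpack(">I", blob[:4])
    embedded = blob[4 : 4 + length].decode("ascii")
    if embedded != key_type:
        raise ValueError(f"key type mismatch: {key_type!r} vs {embedded!r}")
    return key_type


# A made-up, all-zero key used only to exercise the function; not a real key.
blob = struct.pack(">I", 11) + b"ssh-ed25519" + struct.pack(">I", 32) + bytes(32)
example = "ssh-ed25519 " + base64.b64encode(blob).decode("ascii") + " me@example"
print(check_public_key(example))
```

To check your own key, you could pass in the contents of the .pub file generated earlier instead of the placeholder. Never share the private key file (the one without the .pub extension).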
Setting up an SSH key on Windows#
To set up an SSH key on Windows, after you've installed Git you'll need to:
Open the start menu and search for Powershell. Right click on the icon and select Run as administrator.
Activate the ssh-agent by running the following command:
Get-Service ssh-agent | Set-Service -StartupType Automatic -PassThru | Start-Service
Run the following command to start the ssh-agent without restarting your computer:
start-ssh-agent.cmd
Run the following command to generate an SSH key:
ssh-keygen -t ed25519
You'll be asked to enter a file in which to save the key. Press Enter to accept the default location. You'll then be asked to enter a passphrase. You can either enter a passphrase or leave it blank. If you leave it blank, you'll be asked to confirm that you want to leave it blank - in our case we'll leave it blank (NOTE THAT THIS IS LESS SECURE). Once you've done that, you should see a screen like this:
Navigate to the location of your public SSH key (e.g., C:\Users\username\.ssh\id_ed25519.pub) and open the file in a text editor (e.g., Notepad). Copy the contents of the file to your clipboard. You should see something like this:
Go to your GitHub settings
Click on SSH and GPG keys on the left side of the screen
Click on New SSH key
Give your key a title (e.g., My SSH Windows Key)
Paste your key into the Key field
Click on Add SSH key
Your SSH key should now appear in the list of SSH keys in your GitHub settings, similar to the screenshot below:
Cloning a GitHub repository#
You should now be able to clone repositories from GitHub! Let's go ahead and clone the repository for the EuroSciPy 2022 tutorial Machine Learning for Science Reproducibility. Open the repository in your browser, and click on the Code button, then on the SSH tab. You should see a screen like this:
Open a terminal and navigate to the directory where you want to clone the repository. Then, run the following command:
git clone git@github.com:JesperDramsch/ml-for-science-reproducibility-tutorial.git
If this is the first time you're connecting to GitHub over SSH from this computer, you'll be asked to confirm that you want to connect to the host. Type yes and press Enter to confirm. You should see a screen like this:
And if you navigate to the directory where you cloned the repository, you should see a folder with the name of the repository, as shown below:
You can now open the repository in your IDE of choice and start working on it!
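The SSH address used in the clone command follows a fixed pattern, so it can be derived from the address of the repository page. The sketch below illustrates that pattern; the function name `ssh_clone_url` is made up for this example.

```python
from urllib.parse import urlsplit


def ssh_clone_url(https_url):
    """Build the git@github.com:<owner>/<repo>.git form from a repository page URL."""
    parts = urlsplit(https_url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {https_url}")
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) < 2:
        raise ValueError("expected https://github.com/<owner>/<repo>")
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"git@github.com:{owner}/{repo}.git"


print(ssh_clone_url("https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial"))
```

The printed address should match the one shown in the SSH tab of the Code button.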
Using Git in VScode#
VScode has a built-in Git interface that allows you to manage your repositories. Let's go ahead and open the repository we just cloned in VScode. Open VScode, and click on the Explorer icon on the left - it looks like two files stacked on top of each other. You should see a screen like this:
If you now click on the Source Control icon (the one that looks like a set of circles joined by wires), you should see a screen like this (note that you may have to click on the "Manage Unsafe Repositories" button and select "Trust the authors" to manage the repository - this is, after all, a repository that you cloned from the internet from a user that you don't know):
Whenever you make changes to your repository, you will be able to push and pull changes using the buttons at the top of the screen. You can also use the + and ... buttons to stage and commit changes.
WE WON'T BE GOING OVER HOW TO USE GIT IN THIS TUTORIAL, but you can find the documentation here.
Additionally, please don't try to push changes to the repository we cloned - it's not yours, and you don't have permission to do so. I'd also rather not burden the repository owner with a bunch of pull requests from people who are just trying to learn how to use Git.
Setting up GitHub Copilot in VScode#
First, open up your GitHub account settings - you should see a screen like this:
On the left column, you should see a link to Copilot under the Code, Planning, and Automation category. If you have a student account, you should be able to activate it for free so long as you subscribed to GitHub Global Campus when setting up your account. (Alternatively, you can subscribe to it now by following this link.) If you're not a student, you can still try it out for free with a trial, and at the time of writing the subscription is $10 a month.
Once your Copilot license is active, go ahead and open up VScode and head to the extensions tab. Search for Copilot and install the first result. You should see a screen like this:
On the left, you should see a user icon with a green dot next to it. Click on it and select the option to sign in Github to use Github Copilot. This will redirect you to an authorization screen that will look like this:
Accept any prompts that pop up, and you should be good to go!
Now, whenever you are typing in a document within VScode, you should see suggestions from Copilot pop up. You can accept these suggestions by pressing Tab.
Finally, you can sign up for the GitHub Copilot Chat Beta here (github-copilot/chat_waitlist_signup) if you're interested in trying it out. The chat interface gives you a convenient way of interacting with Copilot - for example, here is a quick conversation asking Copilot to make an LDA (linear discriminant analysis) model that fits 5 randomly generated features:
Note that Copilot can handle natural language requests, even with spelling mistakes!
The full text of the code is:
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Generate random dataset with 5 features
X, y = make_classification(n_samples=100, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42)
# Fit LDA model to dataset
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)