This post provides an overview of the topics that will be covered in each session.

Session 1: Version Control

Reshama Shaikh

This session will cover Git, the open source version control system for storing your programs. The following topics will be covered:

  • Introduction to Git and GitHub; explain the difference between the two
  • Introduce the GitHub website; review settings and options on a GitHub account
  • Create a repo on the GitHub website
  • Set up a repo on GitHub and invite collaborators
  • Fork and clone a user or organization repo
  • Mark changes and track them using Git
  • Use branches on GitHub (*time permitting)
  • Undo changes – revert commits (*time permitting)

Preparation

  1. Ensure Git is installed.
  2. Save your GitHub user ID and password somewhere that is easy to reference.
  3. Complete this brief pre-course preparation list.

Workshop Info

Twitter: @reshamas

Session 2: Data Management

April Clyburne-Sherin

This session will develop your skills in organization, documentation, automation and dissemination of your research. The following topics will be covered:

  • Data collection
  • Repository organization (separate code and data)
  • Configuring run environments
  • Documentation:
    • Specifying dependencies
    • Creating a README
    • Creating data dictionaries
  • Automation:
    • Creating a master script
    • Creating relative paths
  • Dissemination:
    • Specifying a license
    • Publishing your code
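
As a rough illustration of the automation topics above, the following Python sketch builds paths relative to the script's own location and runs analysis steps in order from one master script. The directory layout (`code/` and `data/` under a project root) and the names `project_root`, `data_file`, and `run_pipeline` are assumptions made for this example, not material from the session.

```python
from pathlib import Path


def project_root(script_path):
    """Return the project root, assuming the script lives in <root>/code/."""
    return Path(script_path).resolve().parent.parent


def data_file(script_path, name):
    """Build the path to <root>/data/<name> relative to the script location."""
    return project_root(script_path) / "data" / name


def run_pipeline(steps):
    """Run each (name, function) step in order and return the completed names."""
    completed = []
    for name, step in steps:
        step()  # an exception here stops the pipeline at the failing step
        completed.append(name)
    return completed
```

Inside a master script, `data_file(__file__, "raw.csv")` would resolve to the same file no matter which directory the script is launched from, which is the main reason to avoid hard-coded absolute paths.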

Session 3: Clean Code and Documentation

Daniel Smith

This session will cover code documentation and best practices in code style.

The following skills will be covered:

  • The PEP8 style guide for Python.
  • The YAPF formatter for automatic PEP8 implementation.
  • Beyond style guides, tips for writing readable software.
  • Getting started with Sphinx documentation.
  • Automatic function and class documentation.
  • Sphinx shortcuts and usage ideas.
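
To make the topics above concrete, here is a small sketch of a function written in PEP8 style with a docstring in the reStructuredText field format that Sphinx's autodoc extension can render. The function `mean_absolute_error` is a hypothetical example chosen for illustration, not code from the session.

```python
def mean_absolute_error(observed, predicted):
    """Return the mean absolute error between two equal-length sequences.

    :param observed: measured values
    :param predicted: model outputs, same length as ``observed``
    :returns: the mean of ``abs(observed[i] - predicted[i])``
    :raises ValueError: if the sequences are empty or differ in length
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one value is required")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)
```

With autodoc enabled, a directive such as `.. autofunction:: mymodule.mean_absolute_error` (where `mymodule` is a placeholder module name) would pull this docstring into the generated documentation.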

Session 4: Package Management and Environments

Christopher J. Wright

Software packages are the most common way to distribute and install software in the scientific/data/developer fields. Gone are the days of shipping CDs and floppy disks of software around the globe or downloading source code and hoping that everything is compatible. Package managers can have new software on your machine in seconds, and automatically keep everything up to date and compatible with very little human intervention. However, using packages and making packages can be a bit of an art.

In this session I will discuss software packaging from three perspectives: users, maintainers, and backend engineers.

The user’s perspective will focus on:

  • What are some of the common package managers and what do they do?
  • How do we use package managers to get software and keep it up to date?
  • What are some best practices when using a package manager to avoid headaches?
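
One small piece of the user's perspective can be sketched in Python: checking which version of a package is installed in the current environment before deciding whether to update it. This uses the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None
```

Calling `installed_version("numpy")`, for instance, returns a version string if NumPy is installed in the active environment and `None` otherwise.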

The maintainer’s perspective will focus on:

  • How do I know if my code is ready to be packaged?
  • How do I package my code?
  • How do I keep my packages up to date?

The backend engineer’s perspective (time permitting) will focus on:

  • How this all works under the hood
  • What are some of the frontiers for packaging, and how do they impact user and maintainer experience?

Finally, we’ll make packages of our own (if anyone has code ready to be packaged).

Session 5: Testing Your Code

Jane Adams

Scientists are always hearing that they should be testing their code, but rarely do they hear what that would actually look like. In this session, we will introduce the principles of unit testing, and outline the major assumptions and consequences of these principles. You will learn how to write unit tests, as well as how to determine what unit tests your code needs and what to do with hard-to-test code. You will get hands-on experience writing unit tests using Python’s unittest library, and learn about additional tools and best practices that you can adopt to efficiently incorporate unit testing into your everyday coding workflow. We will also discuss the limits of unit testing in the sciences specifically, along with alternative testing approaches and libraries that can handle unique scenarios like stochasticity.
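
As a minimal sketch of what such tests might look like, the example below uses Python's `unittest` to test a stochastic function in two ways: fixing the random seed for exact reproducibility, and checking a statistical property within a tolerance. The function `sample_mean`, the seeds, and the tolerance are assumptions chosen for illustration, not material from the session.

```python
import random
import unittest


def sample_mean(n, rng):
    """Return the mean of n draws from a uniform(0, 1) distribution."""
    return sum(rng.random() for _ in range(n)) / n


class SampleMeanTest(unittest.TestCase):
    def test_reproducible_with_fixed_seed(self):
        # Two generators with the same seed produce identical draws.
        first = sample_mean(100, random.Random(42))
        second = sample_mean(100, random.Random(42))
        self.assertEqual(first, second)

    def test_mean_within_tolerance(self):
        # The expected mean is 0.5; the tolerance is deliberately loose
        # compared with the sampling error for 10,000 draws.
        result = sample_mean(10_000, random.Random(0))
        self.assertAlmostEqual(result, 0.5, delta=0.05)
```

Saved in a file such as `test_sample_mean.py` (a placeholder name), these tests can be run with `python -m unittest`. Passing the random generator in as an argument, rather than relying on global state, is one way to make stochastic code easier to test.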