===== GitHub Contributor Best Practices =====
Contributors: Anthony Sena, Chris Knoll, Frank DeFalco, Peter Rijnbeek
==== Overview ====
For reference, we reviewed some common release/code management articles such as [[http://nvie.com/posts/a-successful-git-branching-model/|Gitflow]] and [[http://endoflineblog.com/gitflow-considered-harmful|alternatives]] to form the basis of this article. The goal with this approach was to keep the code management process as simple as possible while leveraging the functionality available in Git and GitHub. We've also formally adopted the semantic versioning scheme for all software which is already in practice in most of the OHDSI code repositories.
==== Assumptions and Prerequisites ====
This article assumes you have familiarity with [[https://git-scm.com/|Git]] and [[https://github.com/|GitHub]]. If you do not, here is a nice collection of guides to get you started:
https://help.github.com/articles/good-resources-for-learning-git-and-github/
For the purposes of this article, we assume that you have installed Git and have access to a Git repository. You can set up your own Git repository to follow along with the examples in this article. As a note, all of the code in this article is using the Git command line on Windows. There should be no material difference between running Git commands on Windows versus another operating system but if you do experience any problems please report them on the OHDSI forum:
http://forums.ohdsi.org/c/developers
==== Contributing code ====
If you decide that you'd like to contribute code to OHDSI, the following sections will provide you with approaches for making contributions. Throughout this section, we'll refer to two distinct groups: **internal collaborators** and **external collaborators**. The difference between these groups is centered around their access to the repository. **Internal collaborators** have rights to merge and push their changes directly into the repository while **external collaborators** do not. External collaborators must fork a repository in order to do their work and submit a pull request. This [[http://stackoverflow.com/questions/3611256/forking-vs-branching-in-github|Stack Overflow Article]] provides an explanation with supporting details.
Whether you are an internal or external collaborator, the process of contributing changes is the same: you clone the repository, create a branch, make your changes, and submit a pull request. The information presented here will be equally valid from either perspective because the steps of branching and pull requests are the same for both groups. The main difference is which repository the collaborator will be branching from: the OHDSI repository or a private fork of the repository.
==== Code Management ====
The following section describes how to work with source code in a Git repository. It provides a general overview of using Git and explains how to provide code contributions. Subsequently, the [[Release Management]] article will focus on best practices and conventions for controlling versions and releases of the source code in a repository.
=== Forking and Cloning ===
In order to begin working on OHDSI code, you will need to clone a repository (repo), either directly from OHDSI or from a fork of an OHDSI repo. If you do not have permissions to push changes into a repo (which is usually the case when you are an external collaborator), then you can go to the main OHDSI repository site (example: https://github.com/ohdsi/atlas) and click the Fork button in the top right area of the screen. This will create a copy of the repository under your own user in GitHub. For example, if you were to fork https://github.com/ohdsi/atlas, it would create the new repository under https://github.com/{your github user}/atlas. You have full permissions to commit changes to the personal repo you forked, and can submit enhancements to the OHDSI repo via pull requests (discussed later).
=== Working with a repository ===
Once you know the repository you want to work on (either a fork or directly in the OHDSI repo), you will then need to clone it locally. Create a directory that you'd like to use to store your Git code (e.g. C:\Git). Then, using the Git command line, navigate to the directory you created and clone the repository using the following commands:
$ cd C:\git
$ git clone "https://github.com/{owner}/{RepoName}.git"
This will create a directory {RepoName} under the directory in which you executed the git command (in this example, C:\Git). This will also copy down the repo contents, and set your active branch to 'master'. We will talk about branches in a later section.
An additional step for **forked repositories**: In order to remain in sync with the original repository (the one that was forked from), a special remote reference will be needed in order to grab the new changes in the 'upstream' repository into the forked repository. We will get into the details of this remote reference in the section for syncing upstream changes, but this is the command that should be executed to add the upstream reference to your forked repo (from [[https://help.github.com/articles/configuring-a-remote-for-a-fork/|Configuring a remote for a fork]] in GitHub help):
$ git remote add upstream https://github.com/ORIGINAL_OWNER/{RepoName}.git
We will describe this 'upstream' remote in the later section 'Syncing an upstream Fork'.
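As a minimal sketch of the fork setup described above, the following script uses two local bare repositories as stand-ins for the OHDSI repository and your fork on GitHub, so it can be run without network access. The names upstream.git, fork.git and Atlas are assumptions made for illustration only; in practice you would use the GitHub URLs shown above.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory so nothing on your machine is touched.
work=$(mktemp -d)
cd "$work"

# Stand-ins for the OHDSI repo and your fork (hypothetical local paths, not real URLs).
git init -q --bare upstream.git
git init -q --bare fork.git

# Clone the "fork"; Git registers it as the 'origin' remote.
# (Cloning an empty repository prints a warning, which is hidden here.)
git clone -q fork.git Atlas 2>/dev/null
cd Atlas

# Add the original repository as the 'upstream' remote.
git remote add upstream "$work/upstream.git"

# List the configured remotes; both origin and upstream should appear.
git remote -v
```

If upstream is missing from the list, the sync commands described later will not be able to find the original repository.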
=== Viewing the repository history ===
To see the history in the repo, you can use any of the following commands:
$ git log
$ git log --graph
$ gitk
gitk will launch a UI that lets you see the commits in a visual graph and is the recommended commit history viewer.
If everything looks good, then congratulations! You have cloned your repository and are ready to contribute!
=== Working in branches ===
When you want to add a new feature or contribute a fix, you will always start by creating a branch. To see the list of branches in the repository:
$ git branch
You will see something similar to the following:
$ git branch
Branch 1
Branch 2
(*) master
The asterisk indicates your current branch, which in this case is "master". The "master" branch is, by convention, the main branch where the most stable code will live. It is no different from any other branch except that it is never removed.
Let's start by creating a new branch for our work by issuing the command:
$ git checkout -b "new-feature-branch"
You'll then see the following:
$ git checkout -b "new-feature-branch"
Switched to a new branch 'new-feature-branch'
Git has created a new branch called "new-feature-branch" and made it your active branch. You can verify this by running git branch as we did above:
$ git branch
Branch 1
Branch 2
master
(*) new-feature-branch
Now you can begin working by adding new commits to the repository. The new commits will only be visible in the 'new-feature-branch'. These commits are merged into the 'master' branch via Pull Requests.
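To illustrate that commits made on a feature branch stay out of master until they are merged, here is a small sketch that runs entirely in a temporary local repository. The file names and the "Example Dev" identity are placeholders, not part of any OHDSI repository.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named 'master'
git config user.name "Example Dev"        # placeholder identity for the demo commits
git config user.email "dev@example.com"

# One commit on master.
echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Create the feature branch and commit there.
git checkout -q -b new-feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# Count commits reachable from the feature branch but not from master, and vice versa.
only_on_feature=$(git rev-list --count master..new-feature-branch)
only_on_master=$(git rev-list --count new-feature-branch..master)
echo "only on feature: $only_on_feature, only on master: $only_on_master"
```

The same range notation works with git log, for example git log master..new-feature-branch, to list the commits a pull request from the branch would contain.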
=== Committing changes ===
As you develop your code, you will be changing files and adding them to the repository by using the "commit" command. If you'd like to commit all of your changed files, you can do this by issuing the following command:
$ git commit -a -m "{commit message}"
This is a good option if you know all your changed files relate to a single feature or bug-fix. However, if you want to divide commits into separate logical units you can perform the following steps:
$ git add {file 1}
$ git add {file 2}
$ git add {file 3}
$ git commit -m "Fixes #125"
$ git add {file 4}
$ git add {file 5}
$ git add {file 6}
$ git commit -m "Fixes #128"
The above adds file 1, file 2 and file 3 into one commit, and file 4, file 5 and file 6 into a second commit, each commit having a different message. This is sometimes useful in order to organize your commits into logical pieces.
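The following sketch walks through the same idea in a temporary repository and adds one step that is not in the commands above: checking what is staged with git diff --cached before committing. The file names are hypothetical and only serve to make the two groups of changes visible.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"       # placeholder identity for the demo commits
git config user.email "dev@example.com"

# Four new files belonging to two unrelated pieces of work (hypothetical names).
echo a > fix125_a.txt
echo b > fix125_b.txt
echo c > fix128_a.txt
echo d > fix128_b.txt

# Stage only the files for the first change and check what will be committed.
git add fix125_a.txt fix125_b.txt
staged_first=$(git diff --cached --name-only | wc -l | tr -d ' ')
echo "files staged for first commit: $staged_first"
git commit -q -m "Fixes #125"

# Stage and commit the remaining files separately.
git add fix128_a.txt fix128_b.txt
git commit -q -m "Fixes #128"

# Show each commit with the files it contains.
commit_count=$(git rev-list --count HEAD)
git log --oneline --name-only
```

Note that git commit -a only picks up changes to files Git already tracks; brand-new files such as the ones in this sketch must be added with git add first.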
=== Pushing Changes and Pull Requests ===
Once you have completed your local changes, you will want to submit them for inclusion in the main repository.
First, you will need to "push" your changes from your local repository to your remote repository using the "git push" command:
$ git push -v origin --all
This will make your branch available on GitHub as a source for a pull request.
GitHub has an [[https://help.github.com/articles/about-pull-requests/|excellent]] article on initiating pull requests. Please refer to this article for detailed instructions. The important item to note is: when creating the pull request, the **base** will be set to the repository's branch that you wish to contribute to, and the **head** will be the branch in which your contributions were made.
If you wanted to contribute your commits from 'new-feature-branch' in your repository, you would navigate to your branch in GitHub, click New pull request, and then choose the target repository (OHDSI/Atlas for example) and the branch to apply your commits to (usually 'master'). The only difference between an internal collaborator's pull request and an external collaborator's pull request is that the internal collaborator will be making a pull request between two branches within the OHDSI repository, while the external collaborator will be making a pull request between the branch in their fork and the branch in the OHDSI repository. Otherwise, it is exactly the same process.
=== Fetching and Merging changes ===
At some point during your feature branch development, another pull request may have been applied to master, or some other commit may have been applied to master before you issued your pull request. This means that your branch is 'behind' master by a number of commits. Not only that, it means that there's new code in master that you should be using in your branch so that when it's time to commit to master, you don't have any conflicts or bugs that could arise from the code changes. To solve this, you "rebase" your branch onto the tip of master.
Before you rebase, however, you must ensure that you have the latest version of master from the remote repository.
$ git fetch origin
remote: Counting objects: 72, done.
remote: Total 72 (delta 24), reused 24 (delta 24), pack-reused 48
Unpacking objects: 100% (72/72), done.
From https://github.com/OHDSI/WebAPI
1327c1f..0f3297c shiro -> origin/shiro
In the above example, the 'shiro' branch was updated to a new commit based on changes in the remote shiro branch.
**Note:** This command returns no output if nothing has changed on the remote.
Once the remote branches (including master) have been fetched, you can then rebase your feature branch on top of the latest tip of master.
$ git rebase origin/master new-feature-branch
**Note:** the new-feature-branch parameter is not required if the active branch is 'new-feature-branch'. Additionally, after you execute this command the active branch will be set to 'new-feature-branch'.
Visually, the rebase looks like this:
{{:development:images:rebase.png}}
Now that new-feature-branch has been moved to the tip of origin/master, all changes that have been made to origin/master are reflected in new-feature-branch. Before creating a Pull Request from your feature branch, make sure that the branch shows 'zero commits behind master'. If your branch is behind master, it must be rebased before submitting a pull request.
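One way to check the 'commits behind master' figure from the command line is git rev-list --count. The sketch below is an illustration under assumptions: a local bare repository (remote.git) stands in for GitHub, a second clone (seed) plays the role of another contributor, and all file names are made up. It measures how far the feature branch is behind origin/master before and after the rebase.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for the GitHub remote (hypothetical path).
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Seed the remote with a first commit on master.
git clone -q remote.git seed 2>/dev/null
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q origin master
cd ..

# Your clone: start a feature branch and commit to it.
git clone -q remote.git mine
cd mine
git config user.name "Example Dev"
git config user.email "dev@example.com"
git checkout -q -b new-feature-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, another change lands on the remote master.
cd ../seed
echo other > other.txt
git add other.txt
git commit -q -m "Another change on master"
git push -q origin master
cd ../mine

# Fetch, measure how far behind the branch is, rebase, and measure again.
git fetch -q origin
behind_before=$(git rev-list --count new-feature-branch..origin/master)
git rebase -q origin/master new-feature-branch
behind_after=$(git rev-list --count new-feature-branch..origin/master)
echo "behind before rebase: $behind_before, after: $behind_after"
```

A count of zero after the rebase corresponds to the 'zero commits behind master' state that the branch should be in before a pull request is opened.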
=== Working inside a Shared Branch ===
For private, single-developer branches, rebasing is the cleanest method of incorporating upstream changes into your branch. However, once a branch is pushed to the remote repository and other developers are working together in the same branch, rebasing should be avoided. Instead, each developer pulls changes from the remote repository and pushes commits as needed:
Developer 1 creates the new feature branch:
$ git checkout -b shared-feature
Switched to a new branch 'shared-feature'
...hack hack hack....
$ git add .
$ git commit -m "my new feature"
From here, Developer 2 may want to get involved with this feature, so Developer 1 pushes the branch, and Developer 2 pulls and checks out the branch.
Developer 1 pushes using the -u flag so that the remote branch will be tracked:
$ git push -u origin shared-feature
Developer 2 fetches the new branch:
$ git fetch origin
* [new branch] shared-feature -> origin/shared-feature
$ git checkout -b shared-feature origin/shared-feature
Now Developer 2 is working inside a local branch "shared-feature" that is tracking the remote origin/shared-feature branch.
Developer 1 and Developer 2 are now sharing this branch, and will use git pull and git push to merge changes into the shared-feature branch. After development is completed, a new branch will be created from shared-feature and rebased onto master; this is the branch that the pull request will be made from:
Either Developer 1 or Developer 2 can perform this, but only one of them needs to do this to create the pull request branch:
$ git fetch origin
$ git checkout shared-feature
$ git checkout -b shared-feature-pr # creating a new branch at the tip of the shared-feature branch; active branch is now "shared-feature-pr"
$ git rebase origin/master
$ git push -u origin shared-feature-pr
The shared-feature-pr branch will contain a copy of the commits from the shared-feature branch, but applied on top of the master branch. After pushing, the branch will be available on GitHub as the **head** of the pull request. The pull request is created using master as the base and shared-feature-pr as the head. Note: development on the feature should now continue under the shared-feature-pr branch. The old "shared-feature" branch can be deleted.
=== Syncing an upstream Fork ===
In most cases, the repository you're contributing to is a fork of an OHDSI repository. In order to get any changes from the upstream repository into your forked repository (including any PRs that you created from your own repository that were applied to the upstream's master), you will need to perform the following steps. **Note:** this assumes that the 'upstream' remote was created. See the 'Working with a repository' section for details on setting this up. For details on the following commands, please see the [[https://help.github.com/articles/syncing-a-fork/|syncing a fork]] article from GitHub.
$ git fetch upstream # will fetch latest commits from the upstream repository
$ git checkout master
$ git merge upstream/master --ff-only # merges the commits from upstream/master into local master using fast-forward only
We force --ff-only because the only commits that should be applied to your forked repository's master branch are those coming in from the upstream repository. This means that the only way your commits can appear in the master branch is if they are accepted into the upstream's master via a Pull Request. This is intended and by design.
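The effect of --ff-only can be seen in the sketch below, which again uses local bare repositories (upstream.git and fork.git, both hypothetical) in place of GitHub. It first syncs a clean master, then shows that Git refuses the merge once a commit has been made directly on master, rather than quietly creating a merge commit.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Local stand-ins for the OHDSI repository and your fork (hypothetical paths).
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/master

# Seed the upstream repository with a first commit on master.
git clone -q upstream.git seed 2>/dev/null
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo one > one.txt
git add one.txt
git commit -q -m "First upstream commit"
git push -q origin master
cd ..

# "Fork" upstream, clone the fork, and add the upstream remote.
git clone -q --bare upstream.git fork.git
git clone -q fork.git mine
cd mine
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add upstream "$work/upstream.git"

# Upstream moves ahead by one commit.
cd ../seed
echo two > two.txt
git add two.txt
git commit -q -m "Second upstream commit"
git push -q origin master
cd ../mine

# Sync: master has no commits of its own, so it can be fast-forwarded.
git fetch -q upstream
git checkout -q master
git merge -q --ff-only upstream/master
if [ "$(git rev-parse master)" = "$(git rev-parse upstream/master)" ]; then
  synced=yes
else
  synced=no
fi
echo "master matches upstream/master: $synced"

# Counter-example: commit directly on master while upstream also moves ahead.
echo local > local.txt
git add local.txt
git commit -q -m "Commit made directly on master"
cd ../seed
echo three > three.txt
git add three.txt
git commit -q -m "Third upstream commit"
git push -q origin master
cd ../mine
git fetch -q upstream

# The histories have diverged, so --ff-only refuses instead of merging.
if git merge -q --ff-only upstream/master 2>/dev/null; then
  refused=no
else
  refused=yes
fi
echo "fast-forward refused after diverging: $refused"
```

After a successful sync, running git push origin master publishes the updated master to your fork on GitHub.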
==== Conclusion ====
In summary, the main mode of contributing to OHDSI repositories is branching from master and submitting Pull Requests. If you do not have rights to push directly to an OHDSI repository (which is usually the case), then you must first fork the repository and submit pull requests from branches created in your fork. Keeping your fork up to date involves defining an 'upstream' remote, and using git fetch and git merge to apply upstream commits to your fork.
If you have any concerns or questions about the above procedures, please contact the developers at [[http://forums.ohdsi.org/c/developers|the OHDSI developer forum]].