Check code changes with flake8 before committing
In this post, I will show you how to improve the quality of your Python code by checking it for issues with flake8 before committing it. There are two options:
- check all files of the project before every commit, which works best approach for new projects
- check only the code that was modified in a commit, which works better for
legacymature projects
I assume you are familiar with PEP8, the style guide for Python code, and the tool flake8. If you are not, I urge you to read up on both!
But why even bother? IDEs like PyCharm or Eclipse with PyDev will highlight PEP8 violations and other issues, and most text editors can be configured to do that as well. If you are a Sublime Text user, my friend Daniel Bader has written an extensive guide how to set up Sublime Text for Python development.
This helps a lot, but even the most diligent developer has a bad day. A pre-commit hook removes the human error and acts as a final frontier to prevent bad code from entering your repository.
Check the whole project with flake8 before every commit
When starting with a new Python project, you can configure a Git pre-commit hook that runs flake8 before every commit. That ensures that you never accidentally introduce any issues and that your codebase is always clean.
#!/bin/sh flake8
Once you created the file, you have to make it executable. On Linux or MacOS the command would be:
chmod u+x .git/hooks/pre-commit
Now every time you commit, your code is checked with flake8, and if it doesn’t pass, your commit is rejected.
$ echo "x=4" > foo.py $ git add foo.py $ git commit -m "Added file with PEP8 violations" ./foo.py:1:2: E225 missing whitespace around operator
Simple, isn’t it? Now, let’s fix that error.
$ echo "x = 4" > foo.py $ git commit -m "Added file without PEP8 violations"</code> [master 192fcc9] a 1 file changed, 8 insertions(+) $ git status On branch master Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git checkout -- <file>..." to discard changes in working directory) modified: b.py no changes added to commit (use "git add" and/or "git commit -a") $ git diff diff --git a/foo.py b/foo.py index bac8999..486a4f7 100644 --- a/b.py +++ b/b.py @@ -1 +1 @@ -x=4 +x = 4
Oops, we forget to add our fix and committed the broken version. Why didn’t the pre-commit hook catch that?
The pre-commit hook is running flake8 against the current working copy, which contains the fix. What we really want to do is run flake8 against what is actually committed, i.e. the changes that are currently staged. This can be done, but it makes the hook a bit more complicated.
#!/bin/sh # script to run tests on what is to be committed # Source: http://stackoverflow.com/a/20480591/211960 CHECK=flake8 # First, stash index and work dir, keeping only the # to-be-committed changes in the working directory. old_stash=$(git rev-parse -q --verify refs/stash) git stash save -q --keep-index --include-untracked new_stash=$(git rev-parse -q --verify refs/stash) # If there were no changes (e.g., `--amend` or `--allow-empty`) # then nothing was stashed, and we should skip everything, # including the tests themselves. (Presumably the tests passed # on the previous commit, so there is no need to re-run them.) if [ "$old_stash" = "$new_stash" ]; then echo "pre-commit script: no changes to test" sleep 1 # XXX hack, editor may erase message exit 0 fi # Run tests $CHECK status=$? # Restore changes git reset --hard -q && git stash apply --index -q && git stash drop -q # Exit with status from test-run: nonzero prevents commit exit $status
At some point, your project will probably grow so large that checking the whole project takes too long. If you (and everyone on your team) used this hook, most of your codebase doesn’t trigger any warnings anyway, so let’s change our pre-commit hook to only check modified files. Find the CHECK variable at the beginning and change it like this.
CHECK=flake8 `git diff --cached --name-only` || errors=1
Edit: the maintainer of flake8 chimed in on the Reddit discussion of this post to note that flake8 actually has the ability to install a pre-commit hook with more or less the same functionality. Go check out the flake8 documentation on using hooks and see if it fits your needs. One advantage of the approach presented here is that it also works with other tools.
Check only the code that was modified
When working on a legacy project that already has a ton of flake8 warnings, such a pre-commit hook that checks the whole file is to probably restrictive. It won’t let you push before your resolved all issues. While it is certainly a good idea to clean up the portions of the code that you are working on, you probably don’t want to fix up every PEP8 violation in every file you touch in a commit. If you need any convincing that this might be a bad idea, check out the talk Beyond PEP 8 — Best practices for beautiful intelligible code, given by Raymond Hettinger at PyCon 2015. It’s almost an hour long, but quiet entertaining and worth every minute.
Now, wouldn’t it be awesome to just check the lines you changed?
Today is your lucky day, because flake8 can actually do exactly that! Just pipe a diff to stdin, e.g. from git:
git diff -U0 | flake8 --diff
Working with the diff means we don’t have to stash any changes, so the pre-commit hook now looks like this:
#!/bin/sh git diff --cached -U0 | flake8 --diff
Limiting flake8 to check only lines that changed has one limitation: it doesn’t catch issues that your changes introduced in other parts of the code. The most common case is probably removing a line that uses an import, but forgetting to remove the corresponding import statement.
A solution to this could be hook that checks that your changes do not increase the total number of flake8 warnings in the project. This requires checking the whole project twice and is better suited for a pre-push hook. I will show you how to write such a pre-push hook in a future post. If you want to be notified of the post, follow @consideratecode on Twitter or subscribe to my mailing list.