Metrics based Refactoring for cleaner code

Refactoring is a key practice for improving code hygiene. Making refactoring part of your next project is one thing, but if you have just joined a team or project with a significant amount of debt, how do you work on making things better? Over the last few months I have been assessing a number of code-bases and speaking about technical debt management. While preparing for these engagements I realized that two code and project metrics could be combined to help focus efforts on the code that would deliver the most benefit. Toxicity is a combined measurement of static code analysis metrics. Volatility is a measure of the changes made to files within a code-base over time. By combining these two measures we can create a source file scatter chart correlating toxicity against volatility.

images/refactoring-toxicity.png

Toxicity charts have proved very useful in quantifying the amount of technical debt in a code-base. The magnitude of the debt is quantified by comparing a value against an arbitrary threshold. The toxicity is expressed as a score against the threshold.

Thresholds are not quite arbitrary. They are derived from peer code reviews and other observations on how readable a code-base is. A fuller description of these thresholds can be found in Erik Dörnenburg’s article here.
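
As a rough illustration of scoring a value against a threshold, here is a minimal sketch in Python. The metric names and threshold values are hypothetical and chosen only for the example, and the scoring rule (value divided by threshold, counted only when the threshold is exceeded) is an assumption about how such a score could be computed, not a description of any particular tool.

```python
from typing import Dict

# Hypothetical thresholds, for illustration only. Real values would come
# from code reviews and the team's own view of what is readable.
THRESHOLDS: Dict[str, float] = {
    "method_length": 30,
    "cyclomatic_complexity": 10,
    "parameter_count": 6,
}


def toxicity(metrics: Dict[str, float],
             thresholds: Dict[str, float] = THRESHOLDS) -> float:
    """Sum value/threshold for every metric that exceeds its threshold.

    Metrics at or below their threshold contribute nothing, so a clean
    file scores 0.0 (assumed scoring rule).
    """
    score = 0.0
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            score += value / limit
    return score
```

A file with a 60-line method and otherwise acceptable metrics would score 2.0 under these assumed thresholds, because the method is twice as long as the limit.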

images/refactoring-volatility.png

The volatility chart is derived from the version control system, typically from a log of activity on the trunk branch. If a team is working on multiple branches then the activity from all of the branches should be included. We are trying to count the number of changes made to each source file over a reasonable period.

Choosing the right time period is key. We are trying to identify files that require frequent changes. If the chosen period is too short then it is going to be skewed by the current work. Too long and it could be skewed by some historical instability. A period of 3-6 months should be reasonable.
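
The counting described above could be sketched as follows, assuming the repository uses git. The function names and the default period of six months are illustrative choices; the `--all` flag is used so that activity from every branch is included.

```python
import subprocess
from collections import Counter


def count_changes(log_output: str) -> Counter:
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def volatility(repo_dir: str = ".", since: str = "6 months ago") -> Counter:
    """Count changes per file across all branches for the chosen period."""
    result = subprocess.run(
        # An empty pretty format suppresses the commit headers, leaving
        # only the names of the files touched by each commit.
        ["git", "log", "--all", f"--since={since}",
         "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_changes(result.stdout)
```

Calling `volatility(".").most_common(10)` inside a repository would list the ten most frequently changed files for the period. Varying `since` is a quick way to see how sensitive the ranking is to the chosen window.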

images/toxicity-volatility-grid.png

Volatile and Toxic – Refactor Now!

Code that is both volatile and toxic should be our primary focus. The code is in flux. People are working with toxic code on a regular basis. Improvements here will deliver an immediate benefit to the team.


Stable but Toxic – Refactor Later

Toxic but stable code is not causing any immediate problems. If there are identified defects but we are not working on them, then they are not causing us any pain (other than knowing there is a big ball of mud waiting to cause problems). It also seems likely that code in this category will, over time, either be eroded while refactoring the volatile and toxic code or move into another category.


Volatile but Clean

Highly volatile clean code is likely to be caused by unstable requirements. The code is being maintained in a good state, but changes are being requested in a small area of the code-base, indicating that things have not settled down. It might also be that the sample period for volatility is too short.


Stable and Clean

Obviously this is the ideal state. Changes are spread through the system with no clear hot spots. The code is well factored with low toxicity, allowing changes to be made more easily. Over time this should be where the majority of your code lives.
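
The four categories above can be expressed as a small classification function. This is only a sketch: the cut-off values that separate toxic from clean and volatile from stable are hypothetical parameters that each team would need to choose for its own code-base.

```python
def classify(toxicity_score: float, change_count: int,
             toxic_above: float, volatile_above: int) -> str:
    """Place a file in one of the four toxicity/volatility categories.

    toxic_above and volatile_above are hypothetical cut-offs supplied
    by the caller; the article does not prescribe specific values.
    """
    toxic = toxicity_score > toxic_above
    volatile = change_count > volatile_above
    if toxic and volatile:
        return "Volatile and Toxic: Refactor Now"
    if toxic:
        return "Stable but Toxic: Refactor Later"
    if volatile:
        return "Volatile but Clean"
    return "Stable and Clean"
```

For example, with cut-offs of 1.0 for toxicity and 5 changes for volatility, a file with a score of 3.5 that changed 12 times in the period lands in the first category.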


7 thoughts on “Metrics based Refactoring for cleaner code”

  1. graham Post author

    Thanks for the link. You might be interested in this one:

    git whatchanged | grep '^:' | awk '{print $6}' | sort | uniq -c | sort -r | head

  2. Glyn Normington

    Interesting blog – a little more science to “if it ain’t broke, don’t fix it”.

    But I tried the “git whatchanged | etc.” shell command on the largest and most active git repo in our project (git://git.eclipse.org/gitroot/virgo/org.eclipse.virgo.kernel.git) and it produced no output. Is this what you would expect? I was hoping for something equivalent to the volatility chart above…

  3. graham Post author

    I thought I would double check that nothing changed in the copy/paste and that the script worked for one of my repos. I would suggest building up the command bit by bit and seeing where the data is getting lost. This is what I got:

    ➜ toxic git:(master) ✗ git whatchanged | grep '^:' | awk '{print $6}' | sort | uniq -c | sort -r | head
    13 Rakefile
    9 .idea/toxic.iml
    8 lib/toxic/commands/project_analysis_command.rb
    7 lib/toxic/dsl.rb
    7 lib/toxic/commands/list_command.rb
    6 lib/toxic/model/project.rb
    5 test/toxic/test_environment.rb
    5 test/toxic/reports/test_template_factory.rb
    5 lib/toxic/source_control/subversion_repository.rb
    5 lib/toxic/project_workspace.rb

  4. Leena N

    Thanks for sharing this. Got this at the right time, as we were looking out for guidelines on how to handle technical debt for one of our large projects.

    I was wondering whether a Ruby gem like churn, https://github.com/danmayer/churn, will serve the same purpose as the above? It would be good to hear any thoughts you have on this.

    Thanks again.

  5. graham Post author

    There are several tools that can help identify the files most often changed in a source repository. Thanks for suggesting another one to look at.

  6. robin

    A small simplification & bugfix for the one-liner above:

    git whatchanged | awk '/^:/ {print $6}' | sort | uniq -c | sort -nr | head
