Graham Brooks - Metrics-based Refactoring for Cleaner Code

Refactoring is a key practice for improving code hygiene. Making refactoring part of your next project is one thing, but if you have just joined a team or project with a significant amount of debt, how do you work on making things better? Over the last few months I have been assessing a number of code-bases and speaking about technical debt management. While preparing for these engagements I realized that two metrics, one taken from the code and one from the project, could be combined to help focus efforts on the code where refactoring would deliver the most benefit. Toxicity is a combined measurement of static code analysis metrics. Volatility is a measure of changes made to files within a code-base over time. By combining these two measures we can create a scatter chart of source files that plots toxicity against volatility.

Toxicity

Toxicity Charts have proved very useful in quantifying the amount of technical debt in a code-base. The magnitude of the debt is quantified by comparing a value against an arbitrary threshold. The toxicity is expressed as a score against the threshold.

Thresholds are not quite arbitrary. They are derived from peer code reviews and other observations on how readable a code-base is. A fuller description of these thresholds can be found in Erik Dörnenburg’s article.
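To make "a score against the threshold" concrete, here is a minimal sketch of one way such a score could be computed. It assumes that each measurement above its threshold contributes value divided by threshold, and that a file's toxicity is the sum of those contributions. The threshold values, metric names and the toxicity function are illustrative assumptions, not figures taken from this post.

```python
# Hypothetical thresholds, for illustration only. Real values should come
# from your own code reviews or from the article referenced above.
THRESHOLDS = {
    "method_length": 30,
    "cyclomatic_complexity": 10,
    "parameter_count": 6,
}


def toxicity(measurements, thresholds=THRESHOLDS):
    """Sum value / threshold for every measurement that exceeds its threshold.

    Assumed scoring scheme: measurements at or below the threshold add nothing.
    """
    score = 0.0
    for metric, value in measurements:
        limit = thresholds[metric]
        if value > limit:
            score += value / limit
    return score


# Measurements for one hypothetical source file.
order_service = [
    ("method_length", 45),         # over the threshold of 30
    ("cyclomatic_complexity", 8),  # under the threshold, contributes nothing
    ("parameter_count", 9),        # over the threshold of 6
]

print(toxicity(order_service))
```

Scoring violations as a ratio rather than a simple count means a 300-line method weighs more heavily than a 31-line one, which matches the idea of quantifying the magnitude of the debt.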

Volatility

The volatility chart is derived from the version control system, typically from a log of activity on the trunk branch. If a team is working on multiple branches then the activity from all of the branches should be included. We are trying to count the number of changes made to each source file over a reasonable period.

Choosing the right time period is key. We are trying to identify files that require frequent changes. If the chosen period is too short then it is going to be skewed by the current work. Too long and it could be skewed by some historical instability. A period of 3-6 months should be a reasonable time period.
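As a rough sketch of the counting described above, the following assumes the project uses Git; other version control systems would need a different log command. The function names and the sample file paths are made up for illustration. The --all flag pulls in activity from every branch, and --since limits the log to the chosen period.

```python
import subprocess
from collections import Counter


def count_changes(log_text):
    """Count how many commits touched each file.

    Expects the output of `git log --name-only --pretty=format:`, which
    lists one changed path per line, with blank lines between commits.
    """
    return Counter(
        line.strip() for line in log_text.splitlines() if line.strip()
    )


def git_volatility(repo_path, since="6 months ago"):
    """Run git in repo_path and count changes per file across all branches."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return count_changes(result.stdout)


# Hypothetical log output covering three commits.
sample_log = """src/billing/Invoice.java
src/billing/Tax.java

src/billing/Invoice.java

src/orders/Order.java
src/billing/Invoice.java
"""

for path, changes in count_changes(sample_log).most_common():
    print(changes, path)
```

Against a real repository, git_volatility(".", since="3 months ago") would return the same kind of per-file counts. Comparing the results for a 3 month and a 6 month window is a quick way to check whether the ranking is being skewed by current work.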

Toxicity Volatility Grid

Volatile and Toxic - Refactor Now!

So things that are both volatile and toxic should be our primary focus. The code is in flux. People are working with toxic code on a regular basis. Improvements here will deliver an immediate benefit to the team.

Stable but Toxic - Refactor Later

Toxic but stable code is not causing any immediate problems. If there are identified defects but we are not working on them then they are not causing us any pain (other than knowing there is a big ball of mud waiting to cause problems). It also seems likely that, over time, code in this category will either be eroded while the volatile and toxic code is refactored, or move into another category.

Volatile but Clean

Highly volatile clean code is likely to be caused by unstable requirements. The code is being maintained in a good state, but changes are being requested in a small area of the code-base, indicating that things have not settled down. It might also be that the sample period for volatility is too short.

Stable and Clean

Obviously this is the ideal state. Changes are spread through the system with no clear hot spots. The code is well factored with low toxicity, allowing changes to be made more easily. Over time this should be where the majority of your code lives.
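The four quadrants of the grid can be sketched as a small classification step. This is only an illustration: the post does not say where the dividing lines between quadrants sit, so the cut-off values, the function name and the sample files below are all assumptions that would need tuning for a real code-base.

```python
def classify(toxicity, volatility, toxic_above, volatile_above):
    """Place a file in one of the four grid quadrants.

    toxic_above and volatile_above are hypothetical cut-offs that split
    the scatter chart into quadrants.
    """
    toxic = toxicity > toxic_above
    volatile = volatility > volatile_above
    if toxic and volatile:
        return "Volatile and Toxic - Refactor Now!"
    if toxic:
        return "Stable but Toxic - Refactor Later"
    if volatile:
        return "Volatile but Clean"
    return "Stable and Clean"


# Hypothetical (toxicity score, change count) pairs per source file.
files = {
    "src/billing/Invoice.java": (7.5, 42),
    "src/legacy/Report.java": (9.0, 1),
    "src/orders/Order.java": (0.0, 35),
    "src/util/Dates.java": (0.0, 2),
}

for path, (tox, vol) in files.items():
    print(path, "->", classify(tox, vol, toxic_above=3.0, volatile_above=10))
```

Sorting the files in the first quadrant by either measure then gives a simple, ordered list of refactoring candidates.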