Metrics based Refactoring for cleaner code

Refactoring is a key practice for improving code hygiene. Making refactoring part of your next project is one thing, but if you have just joined a team or project with a significant amount of debt, how do you work on making things better? Over the last few months I have been assessing a number of code-bases and speaking about technical debt management. While preparing for these engagements I realized that two metrics, one from the code and one from the project, could be combined to help focus efforts on the code that would deliver the most benefit. Toxicity is a combined measurement of static code analysis metrics. Volatility is a measure of changes made to files within a code-base over time. By combining these two measures we can create a source file scatter chart correlating toxicity against volatility.

images/refactoring-toxicity.png

Toxicity charts have proved very useful in quantifying the amount of technical debt in a code-base. The magnitude of the debt is quantified by comparing a value against an arbitrary threshold. The toxicity is expressed as a score against the threshold.

Thresholds are not quite arbitrary. They are derived from peer code reviews and other observations on how readable a code-base is. A fuller description of these thresholds can be found in Erik Dörnenburg’s article here.
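
As a rough illustration of scoring against thresholds, the sketch below assumes that a metric contributes its value divided by its threshold once it exceeds that threshold, and nothing otherwise. The metric names, the threshold values and the `toxicity` function are hypothetical and chosen only for this example; the real thresholds are the ones described in the article mentioned above.

```python
# Hypothetical thresholds, for illustration only.
THRESHOLDS = {"method_length": 30, "cyclomatic_complexity": 10}


def toxicity(metrics: dict, thresholds: dict = THRESHOLDS) -> float:
    """Sum value/threshold for every metric that exceeds its threshold."""
    score = 0.0
    for name, value in metrics.items():
        limit = thresholds.get(name)
        # Metrics at or below their threshold add nothing to the score.
        if limit and value > limit:
            score += value / limit
    return score


# A 45 line method scores 45 / 30 = 1.5; the complexity of 5 is under its limit.
example_score = toxicity({"method_length": 45, "cyclomatic_complexity": 5})
```

Summing these scores per source file gives one toxicity value for each file, which is the value plotted on the chart.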

images/refactoring-volatility.png

The volatility chart is derived from the version control system, typically from a log of activity on the trunk branch. If a team is working on multiple branches then the activity from all of the branches should be included. We are trying to count the number of changes made to each source file over a reasonable period.

Choosing the right time period is key. We are trying to identify files that require frequent changes. If the chosen period is too short then it is going to be skewed by the current work. Too long and it could be skewed by some historical instability. A period of 3-6 months should be reasonable.
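
One way to collect these counts, assuming the code lives in a Git repository, is to parse the file names printed by `git log --name-only`. This is a minimal sketch rather than a finished tool: the `count_changes` and `volatility` helpers are names made up for this example, and the six month default is just one point in the range suggested above.

```python
import subprocess
from collections import Counter


def count_changes(log_text: str) -> Counter:
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def volatility(repo_dir: str, since: str = "6 months ago") -> Counter:
    """Run git in repo_dir and count changes per file across all branches."""
    result = subprocess.run(
        # --pretty=format: suppresses the commit headers, leaving only paths.
        ["git", "log", "--all", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_changes(result.stdout)


# Two commits: the first touched A.java and B.java, the second only A.java.
sample = count_changes("src/A.java\nsrc/B.java\n\nsrc/A.java\n")
```

Calling `volatility(".").most_common(10)` in a repository would list the ten most frequently changed files for the period.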

images/toxicity-volatility-grid.png

Volatile and Toxic – Refactor Now!

So things that are both volatile and toxic should be our primary focus. The code is in flux. People are working with toxic code on a regular basis. Improvements here will deliver an immediate benefit to the team.


Stable but Toxic – Refactor Later

Toxic but stable code is not causing any immediate problems. If there are identified defects but we are not working on them then they are not causing us any pain (other than knowing there is a big ball of mud waiting to cause problems). It also seems likely that code in this category will, over time, either be eroded while refactoring the volatile and toxic code or move into another category.


Volatile but Clean

Highly volatile clean code is likely to be caused by unstable requirements. The code is being maintained in a good state but changes are being requested in a small area of the code-base, indicating that things have not settled down. It might also be that the sample period for volatility is too short.


Stable and Clean

Obviously this is the ideal state. Changes are spread through the system with no clear hot spots. The code is well factored with low toxicity, allowing changes to be made more easily. Over time this should be where the majority of your code lives.
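
The four categories can be sketched as a simple classification. The cut-off values that separate toxic from clean and volatile from stable are assumptions for illustration; in practice they would be picked by looking at the scatter chart for a particular code-base.

```python
def classify(toxicity: float, changes: int,
             toxic_above: float = 1.0, volatile_above: int = 10) -> str:
    """Place a source file in one of the four quadrants of the grid."""
    toxic = toxicity > toxic_above        # hypothetical cut-off
    volatile = changes > volatile_above   # hypothetical cut-off
    if toxic and volatile:
        return "Volatile and Toxic: refactor now"
    if toxic:
        return "Stable but Toxic: refactor later"
    if volatile:
        return "Volatile but Clean"
    return "Stable and Clean"


# A file with a toxicity score of 3.2 that changed 25 times in the period.
label = classify(toxicity=3.2, changes=25)
```

Sorting the files in the first category by toxicity multiplied by change count would be one possible way to order the refactoring backlog.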


Oh My ZSH

oh-my-zsh is a framework for managing zsh configuration. The default configuration adds some interesting enhancements.

The following shows all the Java files in the current directory and its subdirectories.

ls **/*.java

Ryan Bates has a nice Railscasts episode covering oh-my-zsh

I have been using oh-my-zsh for the last couple of weeks. The Git enhancement that shows the current branch in the prompt is useful and elegant.


AgileDC – Introduction to Continuous Delivery

Yesterday I presented a talk entitled Introduction to Continuous Delivery at AgileDC. The audience was great and the room packed which is always a recipe for success. I really enjoyed talking about Continuous Delivery and there were some really interesting questions.

As promised I have uploaded the slides from the presentation to iWork.

As I mentioned yesterday, Jez Humble was kind enough to donate his slides, from which I borrowed shamelessly.

http://agiledc.org/


Encapsulating Databases

Small systems grow with success. As these systems grow they often take on more and more functionality, either directly in the main system component or in sub-systems. As the systems grow in complexity and responsibility, their database requirements grow at a similar rate and become more and more complex.

In these complex systems each component needs to not only have access to data but also to transitive data generated by other components. This need often leads to the database becoming the inter-component communication channel.

This design means that the components are coupled through one or more databases. In normal operation this design is simple and fairly effective. Upgrading areas of the system, however, becomes more and more difficult whenever database changes are required. Things become even more complex if application logic is embodied in the database in some way (stored procedures, triggers etc).

So let’s consider a simple scenario where two components A and B both require the same data from a table T.

Simple table dependency

Figure 1. Simple Table dependency

Now let's suppose a change to component A requires some change to the table T. Before we can deploy the update, B also needs to be updated to use the updated table. Quite often development is not isolated to a single component. Businesses need to move quickly to take advantage of a shifting market. These changes are unlikely to be isolated to a single component, but they are often constrained by priority and resources. This added logistical complexity has to be managed by the development team, either by extending time-lines or by complicating the way the team uses version control.

Upgrading a simple table dependency

Figure 2. Upgrades requiring database changes

By wrapping the database within an encapsulating layer of software that provides application level interfaces to the components, we essentially break the direct dependency between the components and the table implementation.

Single insulation layer

Figure 3. Encapsulating the database – single adaptor

Insulating component

Figure 4. Encapsulating database – multiple adaptor instances

In this configuration, if a change to component A requires a database change, the encapsulation layer can be updated to support the new component A functionality whilst still supporting the existing functionality of component B. We need to regression test both components with the new encapsulation layer, but in deployment we only need to deploy the new component A and the updated encapsulation layer.
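
To make this concrete, here is a minimal sketch using an in-memory SQLite database. Everything in it is hypothetical: the `NameStore` class, the column names and the scenario in which component A needed a single name column in table T split into two. The point is only that component B keeps calling the interface it always used while the table changes underneath it.

```python
import sqlite3


class NameStore:
    """Hypothetical encapsulation layer: the only code that knows T's schema."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_name_parts(self, row_id: int):
        """New interface, added for the updated component A."""
        return self.conn.execute(
            "SELECT first_name, last_name FROM t WHERE id = ?", (row_id,)
        ).fetchone()

    def get_full_name(self, row_id: int):
        """Existing interface used by component B, rebuilt on the new columns."""
        parts = self.get_name_parts(row_id)
        return None if parts is None else f"{parts[0]} {parts[1]}"


# Table T after the change requested by component A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'Ada', 'Lovelace')")
store = NameStore(conn)
```

Component B never sees the schema change, so it does not have to be redeployed alongside component A.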

As the system grows further we can break up the database into smaller more manageable chunks with logical groupings within one or more encapsulation layering components.

Over the years I have seen lots of systems develop their integration patterns around the database. This pattern, while initially successful, leads to greater and greater challenges for the development organisation. These challenges surface as elaborate procedures, complex deployment policies and large, complicated deployments.


Insulating against failure using Caching Reverse Proxies

Reverse proxies have been around for a very long time and, depending on your application, are either an interesting addition or a key element of your architecture. Despite their long history, I was recently reminded of some interesting applications of caching reverse proxies that are worth revisiting.

First let's look at a fairly typical configuration where a caching reverse proxy fronts some sort of web application. The web application exposes its functionality on one port (say 8080). The proxy accepts in-bound HTTP requests (say on port 80) and connects to the application's port to pass them on. The application responds with the data matching the request and, if the response is cacheable, the proxy stores a copy of that content in its cache. There are lots of other paths, typically around failure and different responses, but essentially the proxy acts as a pipe between the Internet and the application.

Caching reverse proxy configuration

Figure 1. Basic Caching Reverse Proxy configuration

Let's assume that the application makes use of some internal services on the corporate network. Typically these might be HTTP based web services. This sort of arrangement is fairly common and quite often the design stops here.

Caching reverse proxy with internal HTTP service

Figure 2. Basic Caching Reverse Proxy with internal service

Since we are using HTTP to communicate between the web application and some internal web service, we can add a reverse proxy between the application and the service. In doing so we have added a layer of insulation. Let's assume that the proxy is installed and configured on the web application side. It has been configured to honor cache headers and to serve stale content should something go wrong.

This insulating capability can then be leveraged to provide a degraded service during updates and system failures.

Caching reverse proxy used to insulate from back end service failures

Figure 3. Caching Reverse Proxy used to insulate from service failures.

Should the service be unavailable for any reason, content that is already in the cache can be used. Updates and other non-cached operations would fail as usual, but for a large group of web applications this may be acceptable: presenting the user with information about the problem, and that it is being worked on, may be all that is required.

Just like insulation the proxy does not block all errors propagating from the service, but in the right circumstances it can lessen the blow of something going wrong.
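
The serve-stale behaviour can be sketched in a few lines. This is not how any particular proxy implements it; `StaleOnErrorCache`, its fixed `ttl_seconds` and the `fetch` callable standing in for the internal service are all assumptions made for the example, and a real proxy would take freshness from the cache headers instead of a fixed lifetime.

```python
import time


class StaleOnErrorCache:
    """Cache responses by URL and fall back to stale copies when fetch fails."""

    def __init__(self, fetch, ttl_seconds: float, clock=time.monotonic):
        self.fetch = fetch      # callable(url) -> body, raises on failure
        self.ttl = ttl_seconds  # how long an entry counts as fresh
        self.clock = clock
        self.entries = {}       # url -> (stored_at, body)

    def get(self, url: str):
        entry = self.entries.get(url)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # fresh hit: the service is not called
        try:
            body = self.fetch(url)
        except Exception:
            if entry is not None:
                return entry[1]  # service failed: serve the stale copy
            raise                # nothing cached: the error propagates
        self.entries[url] = (now, body)
        return body
```

Requests for content that was never cached still fail, which matches the partial protection described above.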

Attributes of this design
These are just some of the attributes that I think apply to this design.

  • Multiple requests resulting in the same cacheable service request (by URL) result in a single service call.
  • Simpler application design. The application need not worry about caching service call results.
  • Resilient to intermittent back-end service failure.
  • Improved performance.
  • Reduced service call bandwidth requirements.
  • Lower load on the service.
  • More complex configuration and deployment process.
  • Slightly increased latency for in-bound and out-bound requests.

asciidoc experiments

I have always been interested in text processing systems. This is probably rooted in the time that I discovered computing and programming. At that time the state of the art was ROFF, (T)ROFF and a whole family of plain text processing engines that produced nicely formatted output.

With the advent of word processing applications the emphasis moved away from plain text processing systems to more sophisticated systems involving opaque binary formats. More recently the use of binary formats has fallen out of favor and many word processing systems now provide a level of interoperability with other systems through more open document formats.

A notable exception has been the growing popularity of wikis: online collaborative web sites where the content is entered in plain text with formatting instructions included in the text. Over time these formats have evolved into sophisticated mark-up languages in their own right.

Plain text mark-up makes it very easy to compare two versions or variants of a document. At GitHub the Gollum system is entirely based on controlling content using Git.

One of the things I find challenging about word processing systems is their lack of features that allow for collaborative writing. Many documents these days are produced by a team of people, and in these situations editing a document means someone driving the process, distributing updated versions of the document and incorporating changes into a master copy. Often these documents are distributed by email, with the filename used to distinguish versions. This approach means that a lot of time is spent updating and merging.

Git Scribe takes a different view. It uses asciidoc formatted text in one or more source files and produces documents in various formats using different back-end processors. In this approach the document source (plain) text can be managed using version control more easily. Changes can be merged together easily. What you lack in fine grained formatting control you gain in improved collaborative document development and more optimal work-flow.


Open Source project durations

During a recent discussion about open source development we wondered how long these projects lasted and, in particular, whether there was a rapid drop off in activity.

One of the great things about recent open source code repositories is that they often provide APIs allowing this sort of analysis. The chart below was generated from a sample of 2360 public GitHub repositories. I took the pushed-at and created-at dates to calculate how many days a repository had been worked on. I then grouped the repositories into 30 day increments and charted the count in each 30 day period.
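
The calculation can be sketched as follows. This is a reconstruction under assumptions, not the script used for the chart: it expects each repository as a dictionary carrying the `created_at` and `pushed_at` timestamps in the UTC format the GitHub API returns, and `active_days` and `bucket_counts` are names invented for the example.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2011-04-28T20:25:11Z


def active_days(repo: dict) -> int:
    """Whole days between a repository's creation and its last push."""
    created = datetime.strptime(repo["created_at"], TIMESTAMP_FORMAT)
    pushed = datetime.strptime(repo["pushed_at"], TIMESTAMP_FORMAT)
    return (pushed - created).days


def bucket_counts(repos, bucket_days: int = 30) -> Counter:
    """Count repositories per bucket, keyed by the first day of each bucket."""
    return Counter((active_days(r) // bucket_days) * bucket_days for r in repos)


# Made-up sample data: repositories active for 10, 45 and 59 days.
sample_repos = [
    {"created_at": "2011-01-01T00:00:00Z", "pushed_at": "2011-01-11T00:00:00Z"},
    {"created_at": "2011-01-01T00:00:00Z", "pushed_at": "2011-02-15T00:00:00Z"},
    {"created_at": "2011-01-01T00:00:00Z", "pushed_at": "2011-03-01T00:00:00Z"},
]
histogram = bucket_counts(sample_repos)
```

Here `histogram[0]` counts repositories active for under 30 days and `histogram[30]` those active for 30 to 59 days.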

This has probably been done before but I find the results interesting.

Screen shot 2011-04-28 at 20.25.11.png


Stabilizing Velocity

I came across this post by Michael Norton and thought I would reference it here: Stabilising Velocity. Michael makes some keen observations on both the causes and effects of unstable velocity.

Predictable velocity is a key agile planning metric. There is an implicit assumption in agile planning that this iteration's velocity will be similar to that of the last iteration. Without this predictability it is very difficult, if not impossible, to provide any sort of forecast.

Trying to identify the root causes of unstable velocity is difficult. Quite often there is layer upon layer of symptoms masking the real cause. In delving into root causes I find these techniques useful:

  • Value Stream Analysis – helps identify waste: time waiting for some dependency.
  • The 5 Why’s – for finding the root cause
  • Story dependency analysis
  • Cause and effect diagramming

Rake db:migrate MySQL gem dependency

If you have already added

gem 'mysql2'

to your Gemfile but get a message saying that it is missing when you try to migrate, make sure that you have installed libmysql-ruby:

sudo apt-get install libmysql-ruby

This one caught me out at the weekend and it took a while to work out what was going on. I am guessing that not having libmysql-ruby installed caused the gem to incorrectly report that it was not available. Strangely, I did not get any errors when installing the mysql2 gem.
