All truth

“All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as being self-evident.”

– Arthur Schopenhauer, German philosopher (1788 – 1860)


If

Quotes: Things that inspire and be remembered

“If you can dream – and not make dreams your master;
If you can think – and not make thoughts your aim;
If you can meet with Triumph and Disaster
And treat those two impostors just the same;”

– Rudyard Kipling


I can’t believe I missed this!

Just over 20 years ago Jack W. Reeves wrote an article in the C++ Journal entitled “What is Software Design?” and I missed it. Not only that, but no one thought to point out that I was missing a very important article: one that challenged, changed and clarified my mental model of software design and construction – 20 years after it was published. A copy of the original article can be found here: http://www.developerdotstar.com/mag/articles/reeves_design.html. Better late than never, I guess, and it demonstrates that some ideas stay relevant and important. Sometimes they also remain controversial.


Traditional Thinking

Traditional thinking places writing source code, or coding, firmly in the construction phase. Architecture and design are treated as a separate activity, quite often divorced from the actual code and the act of coding, and quite often done around whiteboards or with visual design tools. I remember long review sessions to make sure that the code actually matched the original design. This view has led to writing code being viewed as a commodity, which I think very often results in sub-optimal solutions (I am being kind here).

In his article Jack Reeves considers construction to be confined to compilation, and coding to be design. In this model, even with long-running builds, the act of construction is short compared to design activities. In my view it also more closely matches what building, engineering or crafting software is all about. It also highlights why many analogies to other human activities, like house building, break down pretty quickly. If building a house from its design were cheap and near instantaneous, I wonder what sort of world we might live in.

Dynamic interpreted languages

It is interesting to extrapolate this view to interpreted languages like Ruby and JavaScript. For interpreted languages there is no explicit construction/compilation step. Interpretation and compilation happen during execution, typically shortening the feedback cycle significantly, albeit with some impact on runtime performance.


Design is hard

Design is hard, iterative, incremental and demands feedback. Writing code is hard, iterative, incremental and demands feedback. Design is a creative process that is informed both by external forces and by the design being created. Writing code is an act of design, not an act of construction – that’s the compiler’s job.


DRYing out code

Removing duplicate code is a great way to improve the internal quality of your application code. Duplicates mean that you have more code than you need, and they are often the source of subtle bugs of the “I’ve already fixed that...” variety.

While working on a scriptable command line tool for Java (see Automated refactoring for library updates) I discovered a neat way of identifying duplications at the method and block level.

The tool is written in Scala (I am still learning, so suggestions for improvement are welcome) and uses ANTLR4 to generate an Abstract Syntax Tree (AST) of the Java source supplied to the parser. The AST and the Java grammar I am using make it really easy to traverse the tree after parsing and identify blocks ({ code }). The lexer ignores white space and comments, so the AST contains just the tokens of each block.

Buckets

Once the source code has been parsed and the blocks identified, they can be placed into buckets. The buckets are held in a hash-map of lists. The key for the hash-map must be unique to the significant elements of the block; for this I used a hash of the token text of all terminal elements within the block. Each list contains the blocks that share that hash, so any list with more than one element identifies duplicate blocks.

private def bucketBlocks(models: List[CodeModel]): BucketSet[BlockDeclaration] = {
  val blockBuckets = new BucketSet[BlockDeclaration]
  // File every block under its token signature so identical blocks share a bucket.
  models.foreach(model => model.blocks.foreach(block => blockBuckets.add(block.signature, block)))
  blockBuckets
}

The block.signature is a computed hash of all the tokens in the block. A BlockDeclaration holds a reference to the root node of the block in the AST.
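The post does not show how the signature itself is computed. A minimal sketch, assuming a SHA-256 digest over the token texts already pulled out of the block (the choice of SHA-256 and the BlockSignature name are assumptions, not the tool's actual code). Because the lexer has already dropped white space and comments, two blocks that differ only in formatting produce the same signature.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper: a block signature derived from its terminal token texts.
// In the tool the texts would come from the ANTLR tree; here they are plain strings.
object BlockSignature {
  def of(tokenTexts: Seq[String]): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    tokenTexts.foreach { text =>
      digest.update(text.getBytes(StandardCharsets.UTF_8))
      // Separator byte so that ("ab", "c") and ("a", "bc") hash differently.
      digest.update(0.toByte)
    }
    // Render the 32-byte digest as a 64-character lowercase hex string.
    digest.digest().map(b => "%02x".format(b & 0xff)).mkString
  }
}
```

Any stable hash would do; a cryptographic digest simply makes accidental collisions between different blocks vanishingly unlikely.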

Iterating through all the blocks and adding them to the buckets means that duplicate blocks end up in buckets containing more than one block:

import scala.collection.mutable

// Groups items by signature; a bucket holding more than one item marks a duplicate.
// Signature and Bucket[T] come from elsewhere in the tool and are not shown here.
class BucketSet[T] {

  val buckets = new mutable.HashMap[Signature, Bucket[T]]

  // Add the item to the bucket for its signature, creating the bucket on first use.
  def add(signature: Signature, item: T): Unit = {
    buckets.get(signature) match {
      case Some(bucket) => bucket.add(item)
      case None         => buckets(signature) = new Bucket(item)
    }
  }

  // Every bucket that contains more than one item.
  def duplicates(): List[Bucket[T]] = buckets.values.filter(_.hasDuplicates).toList

  def getDuplicateCount: Int = duplicates().size

  def getDuplicates: List[Bucket[T]] = duplicates()

  // Apply a function to each bucket that contains duplicates.
  def eachDuplicate(function: Bucket[T] => Any): Unit = duplicates().foreach(function)
}
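For comparison, the same bucketing can be expressed with the standard groupBy, trading the mutable HashMap for an immutable map built in one pass. This is a sketch only: Block is a hypothetical stand-in for BlockDeclaration, and the signature is assumed to be a plain string.

```scala
// Hypothetical stand-in for BlockDeclaration: where the block lives plus its signature.
final case class Block(file: String, line: Int, signature: String)

object DuplicateFinder {
  // Bucket blocks by signature and keep only the buckets holding more than one block.
  def duplicates(blocks: Seq[Block]): Map[String, Seq[Block]] =
    blocks.groupBy(_.signature).filter { case (_, bucket) => bucket.size > 1 }
}
```

The mutable version is handy when blocks arrive incrementally from many files; groupBy suits the case where all blocks are already in memory.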

What’s interesting about this approach is that the identified duplicates are naturally low-hanging fruit for extracting common methods.


Integrating GPUs in Application Development – From Concept to Deployment

This post is a little overdue :(

In June I presented at QCon NYC on using GPUs. It was a great chance to catch up on all the changes to C and C++ that I had missed out on in recent years. You can catch up on the presentation over on InfoQ.

The source code used in the presentation is on GitHub: https://github.com/grahambrooks/qcon-ctod


Automated refactoring for library updates


Vision: Automated refactoring for upgrades

After watching Clang MapReduce – Automatic C++ Refactoring at Google Scale, I was struck by the idea that this could help with the upgrade problem. Almost every application uses libraries. Those libraries need to be updated from time to time, and each time they are updated all the code using them also needs to be updated. For development teams, finding time to upgrade to the latest libraries while competing with functional changes is challenging. What if a set of refactoring commands or programs accompanied each library release? These refactoring scripts would automatically update the consuming application code to use the new library, saving time and money.

Google uses the Clang compiler to generate and store abstract syntax tree (AST) information about the build. Google builds all its applications from source every time, so the exact version of the source code is known and all the binary dependencies are up to date. This AST data is then processed via map-reduce to refactor the code-base.

Chandler Carruth talks about using semantic predicates over the AST data to identify the source to be updated. Much like the matchers in modern testing and mocking frameworks, the semantic predicates are used to match the source code elements to be updated. Refactoring functions are then applied using Clang’s source rewriting system. Chandler mentioned that Google is looking to open source this capability. At the time of writing I could find no reference to an open source version. Hopefully it will be released into the wild soon.
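To make the predicate-then-rewrite idea concrete, here is a toy sketch. It is not Clang's API or Google's tool: the AST, the Logger.warning to Logger.warn rename and the helper names are all hypothetical. The predicate checks the receiver type as well as the method name, which is what separates a semantic match from a textual search and replace.

```scala
// Hypothetical toy AST: just enough structure to show matching and rewriting.
sealed trait Node
final case class Call(receiverType: String, method: String, args: List[Node]) extends Node
final case class Literal(value: String) extends Node

object Refactor {
  // A predicate inspects a node (receiver type as well as name);
  // a rewrite produces the replacement node.
  type Predicate = Node => Boolean
  type Rewrite = Node => Node

  // Walk the tree bottom-up, applying the rewrite to every node the predicate matches.
  def rewriteAll(node: Node, matches: Predicate, rewrite: Rewrite): Node = {
    val withChildren = node match {
      case c: Call => c.copy(args = c.args.map(child => rewriteAll(child, matches, rewrite)))
      case other   => other
    }
    if (matches(withChildren)) rewrite(withChildren) else withChildren
  }
}
```

A call to warning on some other type (say a Printer) is left alone, because the predicate matches on the receiver type rather than the method name alone.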

So if we have a system that can programmatically refactor code, then the refactoring program could be shipped with a particular version of a library to upgrade the client code. Upgrading from version 1 to version 3 would roll up the changes from the intervening versions. Code can now be considered data and updated in much the same way we update databases with Active Record migrations or dbDeploy DDL scripts.
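The roll-up from version 1 to version 3 works much like a chain of database migrations. A minimal sketch of that composition, assuming each release ships one refactoring per version step; the Migration type and Upgrade.plan are hypothetical, and the string replacements stand in for real AST-based refactorings only to keep the example short.

```scala
// Hypothetical: each library release ships a source-to-source refactoring for its version step.
final case class Migration(from: Int, to: Int, transform: String => String)

object Upgrade {
  // Compose the migrations that take client code from `current` to `target`,
  // chaining each intermediate step in order (1 to 2, then 2 to 3, ...).
  def plan(migrations: Seq[Migration], current: Int, target: Int): Either[String, String => String] = {
    val byFrom = migrations.map(m => m.from -> m).toMap

    @annotation.tailrec
    def loop(version: Int, acc: String => String): Either[String, String => String] =
      if (version == target) Right(acc)
      else byFrom.get(version) match {
        case Some(m) if m.to > version && m.to <= target => loop(m.to, acc.andThen(m.transform))
        case _ => Left(s"no migration path from version $version to $target")
      }

    loop(current, (source: String) => source)
  }
}
```

As with database migrations, a gap in the chain is reported rather than silently skipped.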

This capability could be integrated into Continuous Integration systems, and in particular into Continuous Delivery pipelines. Large enterprise development teams could then keep up with colleagues changing the libraries they depend on. The reduction in technical debt in such environments could be huge. Of course it’s not just about the semantics: high levels of automated test coverage would also be required.


Metrics based Refactoring for cleaner code

Refactoring is a key practice for improving code hygiene. Making refactoring part of your next project is one thing, but if you have just joined a team or project with a significant amount of debt, how do you work on making things better? Over the last few months I have been assessing a number of code-bases and speaking about technical debt management. While preparing for these engagements I realized that combining two code and project metrics could help focus effort on the code where it would deliver the most benefit. Toxicity is a combined measurement of static code analysis metrics. Volatility is a measure of the changes made to files within a code-base over time. By combining these two measures we can create a scatter chart of source files, plotting toxicity against volatility.

[Figure: toxicity chart]

Toxicity charts have proved very useful in quantifying the amount of technical debt in a code-base. The magnitude of the debt is quantified by comparing each metric value against a threshold that can seem arbitrary. The toxicity is expressed as a score against the threshold.
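A minimal sketch of that scoring, assuming the usual toxicity-chart scheme in which a metric contributes value / threshold once it exceeds its threshold, and nothing otherwise. The metric names and threshold values are illustrative assumptions, and a real tool may aggregate per method or per class rather than scoring a file's metrics directly as done here.

```scala
// Hypothetical metric definition: a name plus the threshold above which it counts.
final case class Metric(name: String, threshold: Double)

object Toxicity {
  // Score one file: each metric over its threshold contributes value / threshold;
  // metrics at or under the threshold, or missing altogether, contribute nothing.
  def score(values: Map[String, Double], metrics: Seq[Metric]): Double =
    metrics.map { m =>
      val v = values.getOrElse(m.name, 0.0)
      if (v > m.threshold) v / m.threshold else 0.0
    }.sum
}
```

With a method-length threshold of 30, a 45-line method scores 1.5 for that metric, so the score reads directly as "how many times over the threshold".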

Thresholds are not quite arbitrary, though. They are derived from peer code reviews and other observations on how readable a code-base is. A fuller description of these thresholds can be found in Erik Dörnenburg’s article on toxicity charts.

[Figure: volatility chart]

The volatility chart is derived from the version control system, typically from a log of activity on the trunk branch. If a team is working on multiple branches, then the activity from all of the branches should be included. We are trying to count the number of changes made to each source file over a reasonable period.

Choosing the right time period is key. We are trying to identify files that require frequent changes. If the chosen period is too short, it is going to be skewed by the current work; too long, and it could be skewed by some historical instability. A period of 3–6 months should be reasonable.
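One way to gather these counts, assuming Git, is `git log --all --since="6 months ago" --name-only --pretty=format:`, which prints the files touched by each commit (one path per line, blank lines between commits); `--all` covers every branch. The sketch below only does the counting on that output, so running the command is left out to keep it self-contained; the Volatility name is hypothetical.

```scala
object Volatility {
  // Count how many commits touched each file, given the text printed by
  //   git log --all --since="6 months ago" --name-only --pretty=format:
  // Each non-blank line is one file changed by one commit.
  def changeCounts(gitLogOutput: String): Map[String, Int] =
    gitLogOutput
      .split("\n")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toSeq
      .groupBy(identity)
      .map { case (path, hits) => path -> hits.size }
}
```

Renamed files show up under their new paths, so history before a rename is not attributed to the same file in this simple version.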

[Figure: toxicity against volatility grid]

Volatile and Toxic – Refactor Now!

So files that are both volatile and toxic should be our primary focus. The code is in flux and people are working with toxic code on a regular basis, so improvements here will deliver an immediate benefit to the team.


Stable but Toxic – Refactor Later

Toxic but stable code is not causing any immediate problems. If there are identified defects that we are not working on, they are not causing us any pain (other than knowing there is a big ball of mud waiting to cause problems). It also seems likely that code in this category will either be eroded while refactoring the volatile and toxic code or move into another category over time.


Volatile but Clean

Highly volatile clean code is likely to be caused by unstable requirements. The code is being maintained in a good state, but changes are being requested in a small area of the code-base, indicating that things have not settled down. It might also be that the sample period for volatility is too short.


Stable and Clean

Obviously this is the ideal state. Changes are spread through the system with no clear hot spots. The code is well factored, with low toxicity, allowing changes to be made more easily. Over time this should be where the majority of your code lives.
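The grid can be turned into a worklist. A sketch, assuming per-file toxicity scores and change counts have already been computed; the FileStats type, the quadrant names and the cut-off values are hypothetical and would need tuning for each code-base. The ordering follows the sections above; the post does not rank the two clean quadrants against each other, so their relative order here is an assumption.

```scala
// The four quadrants described above, in the order they are tackled.
sealed abstract class Quadrant(val priority: Int)
case object RefactorNow   extends Quadrant(1) // volatile and toxic
case object RefactorLater extends Quadrant(2) // stable but toxic
case object VolatileClean extends Quadrant(3) // volatile but clean
case object StableClean   extends Quadrant(4) // stable and clean

final case class FileStats(path: String, toxicity: Double, changes: Int)

object Triage {
  // Place a file in a quadrant using (assumed) cut-offs for each axis.
  def quadrant(f: FileStats, toxicityCutoff: Double, changeCutoff: Int): Quadrant = {
    val toxic    = f.toxicity > toxicityCutoff
    val volatile = f.changes > changeCutoff
    (toxic, volatile) match {
      case (true, true)   => RefactorNow
      case (true, false)  => RefactorLater
      case (false, true)  => VolatileClean
      case (false, false) => StableClean
    }
  }

  // Order files by quadrant priority, then most toxic first within a quadrant.
  def worklist(files: Seq[FileStats], toxicityCutoff: Double, changeCutoff: Int): Seq[(FileStats, Quadrant)] =
    files
      .map(f => f -> quadrant(f, toxicityCutoff, changeCutoff))
      .sortWith { (a, b) =>
        val (fa, qa) = a
        val (fb, qb) = b
        if (qa.priority != qb.priority) qa.priority < qb.priority
        else fa.toxicity > fb.toxicity
      }
}
```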


Oh My ZSH

oh-my-zsh is a framework for managing zsh configuration. The default configuration adds some interesting enhancements.

For example, the following lists all the Java files in the current directory and its subdirectories:

ls **/*.java

Ryan Bates has a nice Railscasts episode covering oh-my-zsh

I have been using oh-my-zsh for the last couple of weeks. The Git enhancement that shows the current branch in the prompt is a useful and elegant touch.


AgileDC – Introduction to Continuous Delivery

Yesterday I presented a talk entitled Introduction to Continuous Delivery at AgileDC. The audience was great and the room packed, which is always a recipe for success. I really enjoyed talking about Continuous Delivery, and there were some really interesting questions.

As promised I have uploaded the slides from the presentation to iWork.

As I mentioned yesterday, Jez Humble was kind enough to donate his slides, from which I borrowed shamelessly.

AgileDC: http://agiledc.org/
