After watching Clang MapReduce – Automatic C++ Refactoring at Google Scale, I was struck by the idea that this technique could help with the upgrade problem. Almost every application uses libraries. Those libraries need to be updated from time to time, but each time they are updated, all the code using them also needs to be updated. For development teams, finding time to upgrade to the latest libraries against competing functional work is challenging. What if a set of refactoring commands or programs accompanied each library release? These refactoring scripts would automatically update the consuming application code to use the new libraries, saving time and money.

Google uses the Clang compiler to generate and store abstract syntax tree (AST) information about the build. Google builds all its applications from source every time, so the data about a particular version of the source code is known and all the binary dependencies are up to date. This AST data is then processed via MapReduce to refactor the code base.

Chandler Carruth talks about using semantic predicates to identify the source to be updated from the AST data. Much like the matchers in modern testing and mocking frameworks, the semantic predicates match the source code elements to be updated. Refactoring functions are then applied using Clang’s source rewriting system. Chandler mentioned that Google is looking to open source this capability. At the time of writing I could find no reference to an open source version. Hopefully it will be released into the wild soon.
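To make the predicate-plus-rewrite idea concrete, here is a minimal sketch. It is not Clang and not Google's tooling: it uses Python's standard `ast` module as a stand-in, and the function names `old_connect` and `connect` are hypothetical. The predicate matches a syntax tree node rather than a string, and a separate rewrite step is applied only where the predicate holds.

```python
import ast


def is_legacy_call(node: ast.AST) -> bool:
    """Predicate: true for a direct call to the hypothetical `old_connect`."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "old_connect"
    )


class RenameLegacyCall(ast.NodeTransformer):
    """Rewrite step: point every matched call at the hypothetical `connect`."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # handle calls nested in the arguments first
        if is_legacy_call(node):
            new_name = ast.Name(id="connect", ctx=ast.Load())
            node.func = ast.copy_location(new_name, node.func)
        return node


def refactor(source: str) -> str:
    """Parse the source, rewrite matching calls, and emit the new source."""
    tree = RenameLegacyCall().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9 or later
```

Because the match happens on the tree, a string literal that merely contains the text `old_connect` is left alone, which is the advantage over a plain search and replace. Note that `ast.unparse` normalises formatting and drops comments, so a real tool would need a rewriter that preserves the original layout.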

So if we have a system that can programmatically refactor code, then a refactoring program could be shipped with each version of a library to upgrade the client code. Upgrading from version 1 to version 3 would roll up the changes from the intervening versions. Code can now be considered data and updated in much the same way that we update databases with Active Record migrations or dbDeploy DDL.
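The roll-up could work much like a database migration runner. The sketch below is only an illustration under stated assumptions: the `MIGRATIONS` table, the `upgrade` function and the API names `open_conn`, `connect` and `connect_v3` are all hypothetical, and each step is a plain text replacement standing in for what would really be an AST-based refactoring.

```python
from typing import Callable, Dict

# A migration takes client source code and returns the upgraded source.
Migration = Callable[[str], str]

# Hypothetical migrations shipped with the library, keyed by the version
# they upgrade FROM. Each one moves the client code forward by one version.
# Text replacement is a stand-in for a real AST-based rewrite.
MIGRATIONS: Dict[int, Migration] = {
    1: lambda src: src.replace("open_conn(", "connect("),
    2: lambda src: src.replace("connect(", "connect_v3("),
}


def upgrade(source: str, current: int, target: int) -> str:
    """Apply every intervening migration in order, from current to target."""
    for version in range(current, target):
        if version not in MIGRATIONS:
            raise KeyError(f"no migration shipped for version {version}")
        source = MIGRATIONS[version](source)
    return source
```

Calling `upgrade(client_source, 1, 3)` applies the version 1 step and then the version 2 step, so a team that skipped a release still ends up on the version 3 API. Keying each migration by its starting version keeps the steps small and lets any pair of versions be bridged by composing them.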

This capability could be integrated into Continuous Integration systems, and in particular into Continuous Delivery pipelines. Large enterprise development teams could then keep up with the colleagues who change the libraries they depend on. The reduction in technical debt in such environments could be huge. Of course, it’s not just about the semantics. High levels of automated test coverage would also be required to verify that the refactored code still behaves correctly.