One of the software engineering topics that I find myself discussing quite often in various contexts, from work meetings to job interviews, is that of writing “high-quality” code. Everyone agrees (obviously) that this is an important practice in software development, but when it comes to precisely defining what it means, things start to get confusing.
Typically, software engineers – even very experienced ones – answer the question either with broad statements, such as defining high-quality code as code written according to specific guidelines, or following universally accepted practices or standards, etc., or by listing (often at random) some characteristics commonly found in well-written code bases, such as readability, maintainability, testability, etc.
I generally find these definitions unsatisfactory, as they tend to be incomplete or overly focused on particular elements of programming while ignoring other essential characteristics. Guidelines and patterns are fundamental to writing good code, but there is much more to effective software engineering than just following rules. In fact, blindly following them without proper consideration for other important aspects may foster bad practices and produce the opposite effect: poor-quality code.
So, what does writing “high-quality” code mean?
After some research and consideration, it appears that all the features generally associated with code regarded as of high quality contribute to addressing three fundamental concerns: correctness, clarity and cost-efficiency.
Writing high-quality code means developing software that satisfies all three of these requirements, which we can call the 3C of high-quality code. This abstraction provides a more holistic view of the quality of a code base than considering a few specific aspects in isolation. It represents all the features that a code base should have in order to meet the requirements of pretty much any industrial application by answering the following crucial questions:
- Is the code doing correctly what it’s required to do?
- Is the code easy to work with and to adapt to new situations?
- Is the code driving value and serving the business scope efficiently?
The characteristics that contribute to each of these requirements relate to both structural and behavioral aspects of the code, and some of them are instrumental to satisfying more than one requirement at the same time. The following sections briefly summarize the key characteristics without going into details as entire books have already been written for each one of them.
This requirement determines the behavioral characteristics of the code: its adherence to functional specifications, its ability to handle exceptional situations and its resistance to malicious manipulation. The following characteristics are key to classifying code as correct:
The code must properly implement the functional requirements, which in layman's terms simply means it must correctly do what it's supposed to do. While this sounds obvious, in reality writing valid (bug-free) code is no easy task for sufficiently complex software systems.
There are several reasons for this, which may be organic (insufficient domain knowledge, incorrect specifications, poor communication, etc.) or unpredictable, such as mistakes caused by the cognitive state of the developers (fatigue, stress, etc.). Writing valid code, in fact, does not depend entirely on the developers' skills but on the environment as a whole.
Handling expected conditions with valid behavior is only a partial achievement for correctness. Unexpected situations and edge cases may occur anytime and the code must be able to handle them without causing incorrect behavior or even severe system failures.
This aspect must be approached by design and supported at all stages of the Software Development Life Cycle (SDLC) through strict verification processes. It should never be tackled as an afterthought. Making code reliable a posteriori is quite difficult because defects may propagate virally, and if that happens they can only be fixed with workarounds and patches, potentially introducing new issues.
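A common way to build this in by design is to validate inputs at system boundaries so that bad data fails fast instead of propagating. The following is a minimal sketch, with a hypothetical `parse_age` function:

```python
def parse_age(raw: str) -> int:
    """Parse a user-supplied age, rejecting invalid input at the boundary.

    Validating here prevents bad data from propagating into the rest of
    the system, where defects would be far harder to trace and fix.
    """
    try:
        age = int(raw)
    except ValueError as exc:
        raise ValueError(f"age must be an integer, got {raw!r}") from exc
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age
```

The point is not the specific checks but the placement: the edge case is handled once, deliberately, where the data enters the system.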
Security has historically been one of the most overlooked aspects of software development, and the results still keep making headlines around the world. Secure code is code that can only be used in a predetermined way, without the possibility of exploiting it to perform unwanted and potentially harmful actions.
So even if the code appears to be valid (i.e. it’s doing what’s expected), it doesn’t meet the requirement of correctness if it’s possible to hijack it to carry out unintended activities. Vulnerable code is not correct code, especially when no effort at all has been made to make it secure.
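A classic illustration is SQL injection: code that returns the right rows for honest input but can be hijacked by crafted input. The sketch below uses Python's standard `sqlite3` module; the table and data are hypothetical:

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # Unsafe alternative: f"SELECT ... WHERE name = '{username}'" would let
    # an attacker inject SQL through the username string.
    # Safe version: the "?" placeholder makes the driver treat the input
    # strictly as data, never as executable SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(find_user(conn, "alice"))             # legitimate lookup succeeds
print(find_user(conn, "alice' OR '1'='1"))  # injection attempt matches nothing
```

Both versions "work" for normal users, which is exactly why vulnerable code can pass as valid code until someone probes it.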
This requirement determines the structural characteristics that code must have in order to make working with it easy and straightforward. Code is written for others to use, and therefore it should be easy to use correctly and hard to misuse. It should also be relatively simple to change and adapt to different operating conditions. Clear code is code that addresses the following main concerns.
Just as rules form the fundamental framework that keeps any system orderly, code should follow well-defined policies that set a standard for how it is written. Following a set of common rules gives a clear indication of how to address specific situations and eases the process of understanding intent. These policies generally deal with the following aspects of the code:
- Structure – How all of the parts fit and interact together
- Implementation – How each part accomplishes its goal
- Form – How to keep things in order
Design principles, language best practices, coding conventions and programming styles are typical examples of such policies. Following them makes the code more coherent and intuitive, increasing its readability and maintainability and lowering the chances of errors.
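To make the effect concrete, here is a small, hypothetical before/after sketch: the same logic written with cryptic names and then following common conventions (PEP 8-style naming and docstrings are assumed as the policy):

```python
# Non-conforming version: cryptic names and no documentation force
# readers to reverse-engineer the intent.
def tab(a):
    return sum(x["b"] for x in a if x["act"])

# Convention-following version: descriptive snake_case names and a
# docstring make the intent obvious at a glance.
def total_active_balance(accounts: list[dict]) -> float:
    """Sum the balances of all active accounts."""
    return sum(account["balance"] for account in accounts if account["active"])
```

Nothing about the logic changed; only the adherence to a shared policy did, and with it the readability.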
Programmers should not be forced to analyze huge portions of possibly unrelated code in order to understand, use and modify specific functionality. Regardless of the programming paradigm used, there should be a methodical organization so that it’s immediately clear which part of the code is providing which service.
A general programming concept to achieve this goal is that of modularity, which is a “systemic” approach to writing code that puts careful consideration on the roles of the different parts of a system and how such parts collaborate, while keeping these aspects clearly separated. This organization into self-contained cooperative entities with precise and highly focused functional boundaries avoids writing spaghetti code and is key to developing well-designed software systems.
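As a minimal sketch of this idea (all names hypothetical), the example below separates persistence from business rules into two self-contained units that collaborate only through a narrow interface:

```python
class OrderStore:
    """Owns storage concerns only."""

    def __init__(self):
        self._orders = {}

    def save(self, order_id: str, amount: float) -> None:
        self._orders[order_id] = amount

    def get(self, order_id: str) -> float:
        return self._orders[order_id]


class BillingService:
    """Owns billing rules only; talks to the store through its interface."""

    def __init__(self, store: OrderStore):
        self._store = store

    def total_with_tax(self, order_id: str, tax_rate: float) -> float:
        return self._store.get(order_id) * (1 + tax_rate)
```

Each entity has a precise, highly focused boundary: storage details can change without touching billing rules, and vice versa.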
Many developers – and especially the most experienced ones – have a tendency to show off their programming skills and abilities. Unfortunately, this is often done inopportunely, and when that happens it leads to what I define as “toxic complexity”.
Simple and expressive code should always be preferred to complex and obscure code whenever possible, because simple things are easy to understand, inexpensive to maintain and fast to execute. Following Occam's Razor, implementing simple (but not simpler) solutions in a few precise terms leads to optimal and clear code in many cases.
> Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.
>
> – Edsger Wybe Dijkstra
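A small illustration of the trade-off: both functions below are correct, but the terse bit-twiddling version forces every reader to stop and decode it, while the second one states the definition directly:

```python
# "Clever": relies on the bit-level fact that powers of two have a
# single set bit, so n & (n - 1) clears it to zero.
def is_power_of_two_clever(n: int) -> bool:
    return n > 0 and not (n & (n - 1))

# Simple and expressive: repeatedly halve until an odd number remains.
# Prefer this form unless measurement shows the clever one matters.
def is_power_of_two_simple(n: int) -> bool:
    if n <= 0:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1
```

Cleverness is not toxic per se; it becomes toxic when it is deployed where it buys nothing but obscurity.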
This requirement relates to characteristics of the code that determine its effectiveness in producing value. After all, code is not created to be just a beautiful piece of craftsmanship but as an effective tool to serve a business purpose. Cost-efficiency is all about how good the code is at delivering the expected value considering the running costs. A key factor in writing efficient code is the choice of a suitable and consistent development approach, which may be found among the several X-Driven-Development methodologies available in the field.
In industrial applications the highest cost to the business largely depends on the robustness of the system, because a failure at any given time will likely cause a big loss. Therefore, the code must be highly fault-tolerant and resilient before anything else. And while specific coding practices such as robust error handling, fault isolation and monitoring mechanisms for good observability (i.e. logging, self-diagnostics, etc.) are necessary conditions, they may not be sufficient.
Resilient code is the result not only of applying specific programming techniques but of a systematic approach to developing software that fosters robustness as a core philosophy. This is a very important aspect, as it puts developers in the right mindset by design, making sure that every functionality meets strict quality requirements. A typical example of such an approach is the popular TDD methodology.
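One of the concrete fault-isolation techniques mentioned above is retrying transient failures with backoff so that a momentary glitch never surfaces as a system failure. A minimal sketch, with a hypothetical helper and deliberately tiny delays:

```python
import time

def call_with_retries(operation, attempts: int = 3, base_delay: float = 0.01):
    """Invoke a flaky operation, retrying transient failures with backoff.

    The caller sees either a result or one final exception, never the
    intermediate transient errors: the fault is isolated here.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the failure
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

In a TDD setting, the test for this behavior (simulating two transient failures before success) would typically be written before the helper itself.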
For many systems the value for the business is primarily determined by the amount of processed information or used resources at any given time. In such scenarios the priority may be low-latency and high-throughput (e.g. trading systems), or low resources consumption (e.g. IoT devices). In these cases the code must be highly optimized to minimize its time/space complexity in order to achieve the maximum cost-efficiency.
Such optimizations, however, should never be done at the expense of resiliency. This may happen, for example, when doing “premature optimizations” without a clear understanding of their real impact on the system. The effects may range from the introduction of unnecessary complexity (with an increased likelihood of bugs) to the removal of code that plays an important role in keeping the system reliable, just because it is considered a bottleneck!
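A textbook example of legitimate time/space optimization is duplicate detection, where a set trades memory for speed. The sketch below contrasts the two complexities (hypothetical functions):

```python
def has_duplicates_quadratic(items) -> bool:
    # O(n^2) time, O(1) extra space: compares every pair.
    # Fine for tiny inputs, prohibitively costly at scale.
    return any(a == b for i, a in enumerate(items) for b in items[i + 1:])

def has_duplicates_linear(items) -> bool:
    # O(n) time, O(n) extra space: a set remembers what was seen.
    # A deliberate time/space trade-off driven by the system's priorities.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Note that the optimized version is still simple and obviously correct, which is what distinguishes sound optimization from the premature kind.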
The survival of modern businesses is highly dependent upon their ability to change quickly to take advantage of new circumstances. For this reason, code that is highly adaptable is very cost-effective, as it allows following new and more profitable business directions at minimal cost.
To achieve this, it is crucial to follow coding methodologies that give the right emphasis to architectural considerations and avoid being driven by implementation details. Adopting an approach that promotes a good abstract representation of the business domain without tight coupling to specific technologies (e.g. DDD) will foster a codebase that’s highly flexible and that can easily adapt to new business needs minimizing disruptions.
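A common shape for this decoupling is a repository-style abstraction in the spirit of DDD: the domain logic depends on an interface, not on any concrete storage technology. A minimal sketch (all names hypothetical):

```python
from abc import ABC, abstractmethod

class CustomerRepository(ABC):
    """Abstract boundary between the domain and any storage technology."""

    @abstractmethod
    def find_email(self, customer_id: str) -> str: ...


class InMemoryCustomerRepository(CustomerRepository):
    """One interchangeable implementation; a SQL- or cloud-backed one
    could replace it with no changes to the domain code below."""

    def __init__(self, data: dict[str, str]):
        self._data = data

    def find_email(self, customer_id: str) -> str:
        return self._data[customer_id]


def send_promotion(repo: CustomerRepository, customer_id: str) -> str:
    # Domain logic written purely against the abstraction.
    return f"Promotion sent to {repo.find_email(customer_id)}"
```

When the business pivots to a new storage vendor or deployment model, only a new repository implementation is needed; the domain code is untouched.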
Measuring the quality of the code
Determining the key characteristics that make for high-quality code is not of much use if there are no metrics in place to get a measure of such “quality”. Modern Software Engineering provides many means to quantitatively determine if a code base has been written in accordance with all the criteria.
Continuous Testing is the modern practice for measuring the correctness of a code base. Running the code through a variety of test stages in the CI/CD pipeline is the only way to verify its validity and security. While testing does not give a 100% guarantee that software is bug- and vulnerability-free, it does provide valuable metrics that serve as strong indicators of the code's alignment with correctness requirements.
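The smallest unit of such a pipeline is an ordinary unit test run on every commit. A hypothetical example in pytest style (the function and its rules are invented for illustration):

```python
def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount, rejecting nonsensical percentages."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # Happy path: the functional requirement is met.
    assert apply_discount(200.0, 25) == 150.0
    # Edge case: invalid input must fail loudly, not silently miscompute.
    try:
        apply_discount(100.0, 120)
        assert False, "expected ValueError"
    except ValueError:
        pass

test_apply_discount()
```

Each green run of such tests in CI is a data point; trends in failures, coverage and flakiness are the actual correctness metrics.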
The clarity of a code base is expressed mostly in its design and implementation stages and the skills of the engineers play a crucial role in this regard. Coding abilities are essential, but it is undeniable that software design skills greatly contribute to the overall quality, so knowledge of the fundamentals of good software design is paramount and should be assessed alongside coding skills (sadly this is often not the case).
Many modern software development platforms provide useful code analysis tools that can calculate metrics such as the degree of coupling between components (and, relatedly, their cohesion), cyclomatic complexity (a measure of conciseness), the maintainability index, LOC, etc. A very important role is played by code reviews, which are the last line of defense against poorly written code and should be a fundamental part of the SDLC.
Several techniques also exist to measure the performance of the code in order to determine its ability to drive value. For high-performance systems, efficiency is assessed through profiling, which should be an integral part of the software development pipeline. Several testing methods also exist to measure the resiliency of the code at different levels of depth, such as stress testing, load testing, chaos testing and more.
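At its simplest, this kind of measurement is a micro-benchmark comparing candidate implementations before any optimization decision is made. A sketch using the standard `timeit` module (functions are hypothetical; real profiling would use tools like `cProfile` on realistic workloads):

```python
import timeit

def sum_squares_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_squares_builtin(n: int) -> int:
    return sum(i * i for i in range(n))

# Measure both candidates instead of guessing which is faster.
loop_time = timeit.timeit(lambda: sum_squares_loop(1000), number=200)
builtin_time = timeit.timeit(lambda: sum_squares_builtin(1000), number=200)
print(f"loop: {loop_time:.4f}s, builtin: {builtin_time:.4f}s")
```

The discipline matters more than the tool: performance claims should come from measurements taken inside the development pipeline, not from intuition.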
Writing high-quality code is a hard endeavor and takes a substantial amount of time. Several characteristics determine the value of a code base, and all of them contribute to three main goals that must be achieved for the code to be considered of high quality.
Undeniably, correctness is the most important requirement as code that’s doing things wrongly or that’s unreliable and exploitable bears little to no utility, no matter how clean and efficient it is. It’s also probably the trickiest to get right as bugs can be introduced for many unpredictable reasons, and because security is too often not given much consideration.
On the other hand, code is developed to serve a business scope and to create value. This entails considering its economic aspect, that is, how it creates value given its running costs. In this context, writing clear code, regardless of the type of system, is certainly a way to achieve a business goal, as it reduces the cost of maintenance and of adjusting to changing business directions.
At the same time, performance, resiliency and adaptability are determining factors for the efficiency of most commercial systems in terms of added value, and are the driving aspects in measuring how “useful” the code is for the business. In the end, stakeholders won’t be impressed by how beautifully the code is written, rather by how good it is at generating profit.
Even though in real-world scenarios there are often trade-offs to be made between writing good code and meeting business goals, striving for a correct, clear and cost-efficient codebase will certainly pay off as it will minimize the risk of accumulating technical debt and will serve both commercial and engineering requirements. A win-win situation.