To abstract, or not to abstract…

Following several discussions with peers, and after being exposed through my work to a substantial body of literature on the role of abstraction in computing, I have come to the conclusion that many software engineers take too radical a stance on the matter. They either love it or hate it. I, for one, believe that the truth lies somewhere in between these drastic views, as is often the case. This article collects personal opinions and ideas that have developed over time from reading the points of view of a variety of professionals, from ordinary programmers to notable computer scientists and renowned personalities of the field such as Dijkstra, Parnas and Knuth.

Dealing with hardness and complexity

One of the misconceptions that I’ve witnessed when discussing this topic is that of considering abstraction as a way to make hard problems easier by “hiding complexity”. This belief is probably induced by some confusion between the concept of hard and that of complex, which are often used as synonyms in computer programming. But they are not the same thing, and one does not necessarily imply the other.

Put simply, hardness relates to the difficulty (measured by a cost) of implementing a solution to a problem, even if the problem is conceptually simple to understand. For example, it is not difficult to understand how to travel into space from Earth using the rocket equation and Newton’s laws of motion and gravitation. However, building a spaceship and sending it to Mars is a pretty hard task due to limitations in resources, technology, etc.

Complexity refers to the difficulty of understanding a problem because it is made up of many interdependent smaller problems, even when each of the smaller problems is simple to understand individually. Predicting the behavior of a single person in specific circumstances is not difficult given enough knowledge of their character. On the other hand, understanding how an entire population may act following specific events is rather challenging.

To deal with hardness we need suitable technology that allows us to simplify the implementation of a solution, or even make it feasible in the first place. To deal with complexity we need a holistic approach to problem solving that appropriately represents the dynamics of a phenomenon and lets us study them.

In software engineering, abstraction provides a method to implement such an approach. It allows dealing with complex entities by representing them in a way that makes them more tractable. This is achieved by focusing attention on particular aspects that are considered the “core” of the problem at a specific level of detail (or level of abstraction) while ignoring others that are not immediately relevant. Such a focused, hierarchical problem-solving approach is useful to observe how the different parts of a complex entity interact, and it usually reduces the effort required to find a solution. This does not mean, however, that hard problems can easily be solved through abstraction. As explained earlier, hardness relates to implementation costs and limitations, not to systemic complexity. If some part of a complex problem (a sub-problem) is hard, the overall solution will be hard to implement regardless of how well the problem is abstracted.

Abstraction does not make hard problems easy. It simply represents them in a way that lets us look into the solution space more selectively.

Creating abstraction layers, when done properly, is a way to reduce the complexity of computing systems and keep the intellectual effort of the developers within reasonable limits, but without losing accuracy in the representation of a process. An abstraction is, in fact, a transformation that must preserve semantic properties. As Dijkstra suggested, abstraction is not about making things less precise, but describing them using different semantic levels while still retaining the required precision. Ignoring implementation details in the abstraction does not mean losing representation power but simply switching to a more “systemic” view, which ultimately leads to the creation of more flexible and robust architectures.

The purpose of abstraction in software development is that of hiding information that’s not relevant to the developer’s current objective, but that’s still relevant to the process.

Modern software systems are very complex creatures. And from complexity theory we know that it is not possible to study the behavior of such systems by observing the dynamics of their individual parts taken alone. The behavior of a complex system as a whole is determined by the interactions between its parts. It seems to me that too many software developers focus on the parts and ignore the whole. But this “hacking mode” mentality does not play well with the nature of today’s systems, which require a more systemic approach and more effort in their design. Creating the perfect implementation is useless if the pieces, when put together, do not work as expected. And creating good designs requires “systems thinking” and abstraction skills. Abstraction is, in fact, essential to good design.

The leaky abstractions

It is often argued that an extreme reductionist approach to software development (i.e. a “hacking mentality”) is necessary because abstractions may hide unwanted behavior of the underlying implementation, a problem often referred to as abstractions being “leaky”, so that none of the details in the system can be safely ignored. This is, in my opinion, a flawed approach, and it will soon become an impractical way of developing software for some simple reasons.

Modern systems are growing in complexity well beyond what our intellect can grasp, and acquiring a full understanding of them is often impossible, so we are limited in the depth of detail we can effectively work at. Cognitive overload is the biggest enemy of modern software engineers, and its effects often show in the quality of the work they produce. Developers have always used lots of black-boxed functionality that abstracts complex mechanisms in the form of OS services, network interfaces, routine libraries, APIs, etc. to ease the development process. It’s the basis of modern computer programming.

But there is more. Such a reductionist approach goes against good software design principles. If all the development effort goes into creating concrete implementations, the whole architecture ends up being designed bottom-up. This means that it will very likely become tightly dependent on those implementations, with the end result that the system will exhibit rigidity, that is, it will become very intolerant of change. Complex architectures should instead be grounded on abstract entities (interfaces, concepts, contracts, etc.) because, by definition, they provide the generality required to isolate the system from implementation details and to make it more flexible and robust to change.
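As a minimal sketch of what grounding a design on an abstract entity can look like, consider the Python fragment below. All the names (MessageStore, InMemoryStore, AppendOnlyStore, Greeter) are hypothetical and invented for illustration only. The high-level Greeter knows nothing but the contract, so either implementation can be swapped in without touching it.

```python
from typing import Protocol


class MessageStore(Protocol):
    """Hypothetical contract: all the high-level code knows about storage."""

    def save(self, key: str, text: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStore:
    """One concrete implementation, backed by a dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, key: str, text: str) -> None:
        self._data[key] = text

    def load(self, key: str) -> str:
        return self._data[key]


class AppendOnlyStore:
    """Same contract, different internals: an append-only list of records."""

    def __init__(self) -> None:
        self._records = []

    def save(self, key: str, text: str) -> None:
        self._records.append((key, text))

    def load(self, key: str) -> str:
        # Scan from the newest record so the latest value wins.
        for stored_key, text in reversed(self._records):
            if stored_key == key:
                return text
        raise KeyError(key)


class Greeter:
    """High-level component that depends on the contract only."""

    def __init__(self, store: MessageStore) -> None:
        self._store = store

    def remember(self, user: str, greeting: str) -> None:
        self._store.save(user, greeting)

    def greet(self, user: str) -> str:
        return f"{self._store.load(user)}, {user}!"


if __name__ == "__main__":
    for store in (InMemoryStore(), AppendOnlyStore()):
        greeter = Greeter(store)
        greeter.remember("Ada", "Hello")
        print(greeter.greet("Ada"))
```

Had Greeter been written directly against the dictionary inside InMemoryStore, replacing the storage strategy would have required changing Greeter as well, which is the kind of rigidity described above.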

Sometimes it would be desirable, however, to have more control over the underlying process that’s being abstracted, for example to fine-tune the performance of the system or to customize certain aspects. To meet these requirements, the abstractions can be designed using a multi-level approach, as in the user access mechanism of operating systems. This will provide some flexibility and control over possible “leaks”, and will also prevent the developer from picking up toxic programming habits, such as coding custom implementations (the reinvent-the-wheel syndrome) or forcing the view of a “correct” implementation onto an existing one (the hack-&-patch syndrome).

To achieve this design goal, one possible approach is to provide multiple interfaces at different degrees of granularity so that “power users” may access finer-grained aspects of the implementation. For example, a component may expose a dual interface, one for “standard” functionality and one for “advanced” functionality, allowing more demanding users to have some control over the inner workings. In any case, manipulating the implementation should always be done in a controlled way. Granting full unconditional access to the internals of a system so that one can hack it at will is, in my opinion, a bad design decision, as it opens the door to all sorts of bugs.
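The dual-interface idea could be sketched roughly as follows. The names Compressor, TunableCompressor, ZlibCompressor, archive and archive_fast are hypothetical and made up for this illustration; only the zlib module from Python’s standard library is real. The “advanced” interface exposes a single, validated tuning knob rather than the internals themselves.

```python
import zlib
from typing import Protocol


class Compressor(Protocol):
    """Hypothetical 'standard' interface: enough for most callers."""

    def compress(self, data: bytes) -> bytes: ...

    def decompress(self, blob: bytes) -> bytes: ...


class TunableCompressor(Compressor, Protocol):
    """Hypothetical 'advanced' interface: adds one controlled tuning knob."""

    def set_level(self, level: int) -> None: ...


class ZlibCompressor:
    """A single implementation that satisfies both interfaces."""

    def __init__(self) -> None:
        self._level = 6  # default chosen by the component, not by the caller

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data, self._level)

    def decompress(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)

    def set_level(self, level: int) -> None:
        # Controlled access: the knob is validated, the internals stay private.
        if not 0 <= level <= 9:
            raise ValueError("level must be between 0 and 9")
        self._level = level


def archive(payload: bytes, codec: Compressor) -> bytes:
    """Standard user: sees only compress/decompress."""
    return codec.compress(payload)


def archive_fast(payload: bytes, codec: TunableCompressor) -> bytes:
    """Power user: trades compression ratio for speed through the advanced interface."""
    codec.set_level(1)
    return codec.compress(payload)


if __name__ == "__main__":
    codec = ZlibCompressor()
    data = b"abstraction " * 100
    assert codec.decompress(archive(data, codec)) == data
    assert codec.decompress(archive_fast(data, codec)) == data
```

Code written against Compressor never sees set_level, while code that asks for TunableCompressor can adjust the behavior only through a checked entry point. This is one way, among others, to keep access to the implementation controlled.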

What to abstract

A typical situation where abstraction is (ab)used arises in object-oriented programming, where there is often a tendency to create overly complicated abstract class hierarchies. This is usually caused by an overly strict application of programming principles such as DRY and separation of concerns. This is wrong and may lead to an excessive decomposition of the problem and to an “abstraction explosion” effect, that is, the introduction of too many levels of indirection, with the consequence that the code becomes unnecessarily bloated and, ultimately, inefficient.

Instead, one of the main points of introducing abstractions is that of providing stability to the system where areas of high uncertainty are identified. These areas usually correspond to parts of the system that are predicted to change often, or that would require the developer to be a subject matter expert in order for the functionality to be used correctly. By introducing an abstraction level, these critical parts of the system can be isolated and exposed through a more general interface so that the effects of changes to the implementation or its usability difficulties will not propagate across its boundaries.

The identifiability problem

Determining where to introduce abstraction layers is, however, only part of the problem. An even more difficult task arises when the abstraction needs to be defined: deciding how much and what information to include in the interfaces. The “how much” will determine the granularity of the abstraction, that is, the amount of underlying implementation exposed to the users. The “what” will determine which aspects are considered to characterize the process, and thus how well it is represented. This dilemma is also known as the identifiability problem.

There is actually no proven method to solve this problem, and much is left to the intuition and skills of the developer. Usually this decision is driven by the specifications or by architectural constraints. However, a general principle to keep in mind is that the design of a good abstraction (or model) is always a trade-off between sufficient information to correctly represent the underlying process and independence from implementation details that would hinder the stability and portability of the solution. As Parnas elegantly put it:

Finding the simplest model that is not a lie is the key to better software design. (D. L. Parnas)

Another factor to be considered when creating an abstraction is its adaptability. Unlike diamonds, abstractions are not forever. The conditions in the environment where a system operates may vary at any time, so the aspects that need to be modeled may change. For this reason, models should be dynamic in nature so that they can continuously represent the solution to a problem in a consistent manner, otherwise the risk is that of developing erratic architectures that are not aligned with the business scope.

Software architectures must be very flexible, designed according to the principle of modularity so that they can be highly reactive to changes under any representation. And the only way to achieve this is by shielding the modules at the boundaries with good abstractions. Or, to put it differently, by not coupling them tightly to implementation details. As a side note, it is worth mentioning that abstraction is a necessary condition, but not a sufficient one. A model, which is an abstract entity by definition, is not necessarily flexible merely by virtue of being abstract. It must be properly designed in order to achieve that goal.

Conclusion

Despite what the title of this article implies, the problem is not whether or not to use abstraction, but rather why, when and how to use it. Undeniably, abstraction methods are a necessity, especially in modern IT, where there is a constant need to deal with the ever-increasing complexity of software systems. Knowing the different aspects of abstraction and how it affects the design and development of such systems is key to building sound architectures. As with any other useful technology or methodology, the effectiveness of its application depends only on the skills of the developer. And abstract thinking must certainly be part of the mindset of every modern software developer.