Following several discussions with peers, and after being exposed through my work to a substantial body of literature about the role of abstraction in computing, I have come to the conclusion that many software engineers take too radical a stance on the matter. They either love it or hate it. I, for one, believe that the truth lies somewhere in between these drastic views, as is often the case. This article collects personal opinions and ideas that I have developed over time by reading the points of view of a variety of professionals, from ordinary programmers to notable computer scientists and renowned personalities of the field such as Dijkstra, Parnas and Knuth.
Dealing with hardness and complexity
Many seem to see abstraction as something that makes hard problems easier. This belief is probably induced by some confusion between the concepts of hard and complex, which are often used as synonyms in computer programming. But they are not the same thing, and one does not necessarily imply the other. Put simply, hardness relates to the difficulty (measured by a cost) of implementing a solution to a problem, even if the problem is conceptually simple to understand. Complexity refers to the difficulty of understanding a problem because it is made up of many interdependent smaller problems, even though each of these is simple to understand individually. To deal with hardness we need suitable technology that simplifies the implementation of a solution, or makes it feasible in the first place. To deal with complexity we need a holistic approach to problem solving that suitably represents the dynamics of a phenomenon so that they can be studied.
In software engineering, abstraction provides a method to implement such an approach. It allows us to deal with complex entities by representing them in a way that makes them more tractable. This is achieved by concentrating attention on the particular aspects that are considered the “core” of the problem at a specific level of decomposition (or level of abstraction) while ignoring others that are not immediately relevant. Such a focused, hierarchical problem-solving approach is useful for observing how the different parts of a complex entity interact, and it usually reduces the effort spent searching for a solution. It certainly does not make a hard problem easy. If some part of a complex problem (a sub-problem) is hard, it will remain hard and, therefore, the overall solution will be hard to implement.
Abstraction does not make hard problems easy. It simply represents them in a way that allows us to look more selectively into the solution space.
Creating abstraction layers, when done properly, is a way to reduce the complexity of computing systems and keep the intellectual effort of the developers within reasonable limits without losing accuracy in the representation of a process. An abstraction is, in fact, a transformation that must preserve semantic properties. As Dijkstra suggested, abstraction is not about making things less precise, but about describing them using different semantics while still retaining the required precision. Not exposing detailed information in the abstraction is therefore not a bad thing at all. It is actually a necessity in order to create more flexible and robust architectures.
The purpose of abstraction in software development is that of hiding information that’s not relevant to the developer’s current objective, but that’s still relevant to the process.
Modern software systems are very complex creatures, and from complexity theory we know that it is not possible to study the behavior of such systems by observing the dynamics of their individual parts in isolation. The behavior of a complex system as a whole is determined by the interactions between its parts. It seems to me that too many software developers focus excessively on the parts and ignore the whole. But this “hacking mode” mentality does not play well with the nature of today’s systems, which require a more systemic approach and more effort in their design. Creating the perfect implementation is useless if the pieces, when put together, do not work as expected. And creating good designs requires “systems thinking” and abstraction skills. Abstraction is, in fact, essential to good design.
Leaky abstractions
Some argue that an extreme reductionist approach to software development (i.e. a “hacking mentality”) is necessary because abstractions may hide unwanted behavior of the underlying implementation, a problem often referred to as abstractions being “leaky”, so one must be able to act on all parts of the system when necessary. While this may be true from the point of view of the creator of the abstractions, it is a flawed approach from that of their users and will soon become an impractical way of developing software.
Modern systems are growing in complexity beyond what our intellect can grasp, and acquiring a full understanding of them is often impossible, so we are limited in the depth of detail we can effectively work at. Cognitive overload is the biggest enemy of modern software engineers, and its effects often show in the quality of the code they produce. Developers have always used plenty of black-boxed functionality that abstracts complex mechanisms in the form of OS services, network interfaces, routine libraries, APIs, etc. to ease the development process. It’s the basis of modern computer programming.
But there is more. Such a reductionist approach goes against good software design principles. By focusing all the development effort on the creation of concrete implementations, the whole architecture will be designed with a bottom-up process. This means that it will very likely become tightly dependent on those implementations, with the end result that the system will exhibit rigidity: a single modification in a single module may trigger many other changes system-wide, like a chain reaction. Architectures should be grounded upon abstract concepts (interfaces, abstract classes, etc.) because, by definition, they provide generality with invariant semantics, thus making the system more flexible and robust to changes.
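As a minimal sketch of this idea, the Python fragment below lets client code depend only on an abstract class, so that implementations can be swapped without touching the caller. All names (`MessageStore`, `InMemoryStore`, `AppendOnlyStore`, `archive_greeting`) are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Abstract concept the rest of the system depends on (hypothetical)."""

    @abstractmethod
    def save(self, key: str, text: str) -> None:
        """Store text under key, replacing any previous value."""

    @abstractmethod
    def load(self, key: str) -> str:
        """Return the latest text saved under key, or raise KeyError."""


class InMemoryStore(MessageStore):
    """Keeps the latest value per key in a dictionary."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, key: str, text: str) -> None:
        self._items[key] = text

    def load(self, key: str) -> str:
        return self._items[key]


class AppendOnlyStore(MessageStore):
    """Different internals (a log of records), same observable semantics."""

    def __init__(self) -> None:
        self._log = []

    def save(self, key: str, text: str) -> None:
        self._log.append((key, text))

    def load(self, key: str) -> str:
        # Scan from the newest record so the latest value wins.
        for stored_key, text in reversed(self._log):
            if stored_key == key:
                return text
        raise KeyError(key)


def archive_greeting(store: MessageStore, user: str) -> str:
    """Client code: it knows the abstraction, not the implementation."""
    store.save(user, f"Hello, {user}!")
    return store.load(user)
```

Calling `archive_greeting(InMemoryStore(), "ada")` and `archive_greeting(AppendOnlyStore(), "ada")` gives the same result. Replacing one store with the other requires no change in the caller, which is the kind of robustness to change described above.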
Sometimes it would be desirable, however, to have more control over the underlying process that’s being abstracted, for example to fine-tune the performance of the system or to customize certain aspects. To meet these requirements, the abstractions can be designed using a multi-user approach, similar to the user access mechanism of operating systems. This will provide some flexibility and control over possible “leaks”, and will also prevent developers from picking up toxic habits that would make the software even more complex, such as coding their own implementations (the reinvent-the-wheel syndrome) or forcing their view of a “correct” implementation onto an existing one (the hack-&-patch syndrome).
To achieve this design goal, one possible approach is to provide multiple interfaces at different degrees of granularity so that “power users” may access finer-grained aspects of the implementation. For example, a component may expose a dual interface for “standard” functionality and “advanced” functionality, allowing more demanding users to have some control over the inner workings. In any case, manipulating the implementation should always be done in a controlled way. Granting full unconditional access to the implementation so that one can hack it at will is, in my opinion, a bad design decision, as it opens the door to all sorts of bugs.
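One possible shape for such a dual interface is sketched below in Python. The `Cache` and `TunableCache` interfaces and the `LruCache` component are hypothetical names. The least-recently-used eviction policy is only an assumption made to give the example some inner workings worth tuning.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class Cache(ABC):
    """'Standard' interface: everything most callers need."""

    @abstractmethod
    def get(self, key):
        """Return the cached value, or None if the key is absent."""

    @abstractmethod
    def put(self, key, value) -> None:
        """Store a value under key."""


class TunableCache(Cache):
    """'Advanced' interface: controlled access to selected internals."""

    @abstractmethod
    def set_capacity(self, capacity: int) -> None:
        """Change the maximum number of entries."""

    @abstractmethod
    def stats(self) -> dict:
        """Report hit, miss and size counters."""


class LruCache(TunableCache):
    """One component implementing both levels of granularity."""

    def __init__(self, capacity: int = 128) -> None:
        self._capacity = capacity
        self._data = OrderedDict()
        self._hits = 0
        self._misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self._hits += 1
            return self._data[key]
        self._misses += 1
        return None

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        self._evict()

    def set_capacity(self, capacity: int) -> None:
        # Tuning is validated: internal invariants stay protected.
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._evict()

    def stats(self) -> dict:
        return {"hits": self._hits, "misses": self._misses,
                "size": len(self._data)}

    def _evict(self) -> None:
        # Drop least recently used entries until within capacity.
        while len(self._data) > self._capacity:
            self._data.popitem(last=False)
```

Code written against `Cache` never sees the tuning knobs, while a “power user” can ask for a `TunableCache` and adjust capacity or read statistics. Neither of them touches the underlying `OrderedDict` directly, so access to the implementation remains controlled.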
What to abstract
A typical situation where abstraction is (ab)used arises in object-oriented programming, where there is often the tendency to create overly complicated abstract class hierarchies. This is typically caused by an overly strict application of the DRY principle, where the programmer believes that it is necessary to create an abstraction every time some code seems to occur more than once in the program. This is wrong and may lead to an “abstraction explosion” effect, that is, the introduction of too many levels of indirection, with the consequence that the code becomes unnecessarily bloated and, ultimately, inefficient.
Instead, one of the main points of introducing abstractions is to provide stability to the system where areas of high uncertainty are identified. These areas usually correspond to parts of the system that are predicted to change often, or that are complicated enough that they would require the developer to possess an unreasonably high level of competence in a specific field in order to use the functionality. By introducing an abstraction level, these critical parts of the system can be isolated and exposed through a more general interface so that the effects of changes to the implementation, or its usability difficulties, will not propagate across its boundaries.
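The contrast can be illustrated with a deliberately small Python sketch. The class names and the numeric limits are hypothetical. The first half shows a hierarchy introduced only because two length checks happened to look alike, while the second half does the same job with no extra level of indirection.

```python
from abc import ABC, abstractmethod


# Over-abstracted version: one class per occurrence of a similar check.
class LengthRule(ABC):
    @abstractmethod
    def limit(self) -> int:
        """Maximum number of characters allowed."""

    def check(self, text: str) -> bool:
        return len(text) <= self.limit()


class UsernameRule(LengthRule):
    def limit(self) -> int:
        return 20  # arbitrary illustrative value


class CommentRule(LengthRule):
    def limit(self) -> int:
        return 280  # arbitrary illustrative value


# Direct version: plain data plus one small function.
USERNAME_LIMIT = 20
COMMENT_LIMIT = 280


def fits(text: str, limit: int) -> bool:
    """Return True if text is no longer than limit characters."""
    return len(text) <= limit
```

Both versions answer the same question, for example `UsernameRule().check("ada")` and `fits("ada", USERNAME_LIMIT)`. The hierarchy, however, adds three classes and a virtual call without isolating anything that is likely to change, which is the kind of bloat described above.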
The identifiability problem
Once some part of a system has been determined to be critical for stability and the decision has been made to abstract it, the problem arises of how much and what information to include in the interface. The “how much” will determine the granularity of the abstraction, that is, the amount of underlying implementation exposed to the users. The “what” will determine which aspects are considered to characterize the process, and thus how well it is represented. This dilemma is also known as the identifiability problem.
There is actually no proven method to solve this problem, and everything is left to the intuition and skills of the developer. Usually this decision is driven by the specifications or architectural constraints. However, a general principle to keep in mind is that the design of a good abstraction (or model) is always a trade-off between sufficient information to correctly represent the underlying process and independence from implementation details that would hinder the stability and portability of the solution. As Parnas elegantly put it:
Finding the simplest model that is not a lie is the key to better software design. (D. L. Parnas)
Another factor to be considered when creating an abstraction is its adaptability. Unlike diamonds, abstractions are not forever. The conditions in the environment where a system operates may vary at any time, so the aspects that need to be modeled may change. For this reason, models should be dynamic in nature so that they can continuously represent the solution to a problem in a consistent manner. Otherwise, there is a risk of developing erratic models that are not aligned with the business scope.
This means that software architectures must be very flexible, designed according to the principle of modularity so that they can be highly reactive to changes under any representation. As a side note, it is worth mentioning that abstraction and flexibility are not the same thing. A model, which is an abstract entity by definition, is not necessarily flexible merely by virtue of being abstract. It must be properly designed in order to achieve that goal.
Despite what the title of this article implies, the problem is not whether or not to use abstraction, but rather why, when and how to use it. Undeniably, abstraction methods are a necessity, especially in modern IT, where there is a constant need to deal with the ever-increasing complexity of software systems. Knowing the different aspects of abstraction and how it affects the design and development of such systems is key to building sound architectures. As with any other useful technology or methodology, the effectiveness of its application depends only on the skills of the developer. And abstract thinking is something that certainly must be part of the mindset of every developer.