
Modern C++ • Move Semantics

Proper resource acquisition and management is core to writing safe and performant code. C++ has no garbage collector or other runtime that manages resources on the program's behalf, so the task of managing resource acquisition rests entirely on the developer’s shoulders. This is why understanding the concepts of “moving” vs “copying” is a must for any C++ developer.

Move semantics and copy semantics are two paradigms for acquiring resources from another object. Move semantics refers to programming techniques that transfer a resource from one object to another instead of duplicating it. Copy semantics, on the other hand, is based on making a copy of the needed resource every time it’s acquired. Making full copies of data can easily turn into a serious performance bottleneck, and in many cases isn’t really necessary.

Suppose that we have a class Object that needs to acquire a resource, represented here by a vector, although it could be anything else. The standard way to do this is to implement a copy constructor and a copy assignment operator, as in the following code, which follows the RAII principle for safe disposal of the resource (done automatically here by the STL vector)

// Old C++

#include <algorithm> // std::copy
#include <vector>
 
class Object {
 
    // The managed resource
    std::vector<int> resource;
 
public:

    Object() {}
 
    Object(const Object& o)
    {
        // The following code is equivalent to this->resource = o.resource.
        // It is written out to make explicit that the resource is acquired
        // by copy, with a (re)allocation if necessary.
 
        this->resource.resize(o.resource.size());
 
        std::copy(o.resource.begin(),
                  o.resource.end(),
                  this->resource.begin());
    }
 
    Object& operator=(const Object& o)
    {
        this->resource = o.resource;
        return *this;
    }
};
 
int main()
{
    Object o1;
 
    // o1 acquires its resource
    // ...
 
    // o2 acquires the resource from o1 by copy
    Object o2 = o1;
 
    return 0;
}

If we want to create an object o2 that acquires the resource from a source object o1, the resource is acquired by creating a full copy, possibly preceded by a (re)allocation. The source object can still be safely used to refer to the resource it owns. This approach follows “copy semantics” and is fine as long as creating multiple instances of the same resource does not pose performance issues.
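As a minimal sketch of what copy semantics guarantees, the following variant of Object adds a few accessors (data(), get(), set(), which are hypothetical and not part of the class above) so that the effect of a copy can be observed: the copy owns its own storage, and the source remains fully usable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of Object; the accessors exist only for observation.
class Object {
    std::vector<int> resource;

public:
    Object() {}
    explicit Object(const std::vector<int>& r) : resource(r) {}

    // The compiler-generated copy operations copy the vector's elements.
    const int* data() const { return resource.data(); }
    int get(std::size_t i) const { return resource[i]; }
    void set(std::size_t i, int v) { resource[i] = v; }
};

// Returns true if a copy is independent of the object it was copied from.
bool copy_is_independent()
{
    Object o1(std::vector<int>{1, 2, 3});
    Object o2 = o1;                     // copy: a second buffer is allocated

    assert(o1.data() != o2.data());     // two distinct buffers

    o2.set(0, 42);                      // modifying the copy ...
    return o1.get(0) == 1               // ... leaves the source untouched
        && o2.get(0) == 42;
}
```

The price of this independence is one allocation plus one element-by-element copy per acquisition, which is exactly the cost that becomes a problem for large or frequently copied resources.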

But consider now the following code

Object o1 = some_process();

where a new object is created as a result of running some process rather than from a pre-existing object. The some_process() function returns an Object instance, which is a temporary anonymous object used only to initialize o1 and then destroyed after that statement is executed. If the process creates objects with a big memory footprint then using this data just to make a copy and then discard it is not really efficient, especially if it’s done very frequently. Another source of issues may be the fact that the resource is scarce or limited (file handles, sockets, DB connections, etc.) or it’s a unique one (a singleton, an external device, a video/audio stream, etc.).
In such cases it would be more convenient to move ownership of the resource from the source object to the destination object if the source object no longer needs it. In code it would look something like the following

class Object {
 
    std::vector<int> resource;
 
public:
    Object() {}
 
    Object(const Object& o)
    {
           this->resource.resize( o.resource.size() );
 
           std::copy(o.resource.begin(),
                     o.resource.end(), 
                     this->resource.begin());
    }
 
    Object& operator=(const Object& o)
    {
           this->resource = o.resource;
           return *this;
    }
 
    // A convenience method to move resource data.
    // NOTE: the operations below do not exist on std::vector,
    // they're just for demonstration purposes.
    void move_from(Object& o)
    {
           // Dispose of the old resource data, if any
           this->resource.free();
           // Move the data from the source to this object
           this->resource.data( o.resource.data() );
           // Release the resource in the source object.
           o.resource.release();
     }
 
};

In the above code we have added a supplementary move_from(Object&) method that does the following:

1) deletes the current resource, if any (deallocates its memory)
2) moves the resource from the source object to this object (copies its memory location)
3) releases the resource in the source object (implementation-dependent)

After the resource is moved, the source object can no longer be used to access it and should be left in an “empty” but valid state so that it can be safely destroyed. In the above example, the release() method takes care of invalidating the resource in the source object, for example by setting the data to nullptr, but it may involve other operations needed to leave the source object in a valid empty state. With this method, every time we want to acquire ownership of a resource we could write something like this

o1.move_from(o2);
 
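// or, ideally (as written this is rejected by the compiler, because
// a temporary cannot bind to the non-const reference Object&)
 
o1.move_from( some_process() );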
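The three steps can be written out for real once the class manages the memory itself. The sketch below assumes a hypothetical Buffer class that owns a raw heap array (none of these names come from the Object class above); it is meant only to show what "moving" boils down to at the pointer level.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class owning a raw heap array.
class Buffer {
    int*        data_ = nullptr;
    std::size_t size_ = 0;

public:
    Buffer() = default;
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }       // deleting nullptr is a no-op

    // Copying is disabled to keep the sketch short.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    void move_from(Buffer& o)
    {
        if (this == &o) return;         // moving from itself: nothing to do
        delete[] data_;                 // 1) dispose of the old resource
        data_ = o.data_;                // 2) take over the source's memory
        size_ = o.size_;
        o.data_ = nullptr;              // 3) leave the source empty but valid
        o.size_ = 0;
    }

    const int*  data() const { return data_; }
    std::size_t size() const { return size_; }
};

// Returns true if ownership was transferred without copying any element.
bool move_from_transfers_ownership()
{
    Buffer src(4);
    const int* p = src.data();          // remember where the data lives
    Buffer dst(2);

    dst.move_from(src);

    assert(src.data() == nullptr && src.size() == 0);
    return dst.data() == p && dst.size() == 4;
}
```

No element is copied: only a pointer and a size change hands. Note that move_from() takes a non-const lvalue reference, so it can't be called directly on a temporary returned by a function; this is one of the gaps that a language-level mechanism has to fill.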

It would also be nice to have a mechanism that allows us to do so at object construction time as well, so that we can follow the RAII principle. We could do that in the copy constructor, but its purpose (as the name itself implies) is not to implement move semantics, so there should be a way to automatically distinguish the cases where we need copy semantics from those where we need move semantics, using familiar tools such as overloading. C++11 introduced such a mechanism natively: rvalue references.

Moving around with r-value references

To understand how this works we need the concepts of lvalues and rvalues. Their definitions have changed substantially over the years, and new C++ standards have kept refining them by introducing new value categories or modifying existing ones, so the definitions given here are deliberately informal. Historically, and in the broader sense, an lvalue is any expression that identifies an object, by its name or through a reference to it, and that can therefore appear on the left side of an assignment. An rvalue is any expression that does not identify a persistent object, such as temporary objects created and returned by function calls, numeric literals and the results of built-in arithmetic operators, among other things. Variables are the most typical example of lvalues, but the call f() in an expression such as f() = 3 is an lvalue too, provided that the function returns a reference to an existing object. For example, in the following code

Object make_object() {
   Object o;
   // Do something with 'o' ...
   return o;
}

Object o1;
Object o2 = make_object();

o1 is an lvalue since it can be identified (and assigned to) by its name and can be safely bound to a reference or pointed to. On the other hand, the Object produced by make_object() is an rvalue, a temporary with no name that can’t be bound to a non-const lvalue reference and whose address can’t be taken [1]. Another way to understand lvalues and rvalues is by their lifetime: lvalues, being named values, have a lifetime determined by the scope in which they’re defined, while temporaries normally live only until the end of the expression or statement that generates them. These are oversimplified (thus not quite exact) definitions that should be sufficient to proceed further, as a complete and precise treatment is beyond the scope of this article.
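A quick way to see the difference in practice is to let overload resolution decide. In the sketch below, category() is a hypothetical pair of probe functions (not part of any library): named objects select the const Object& overload, while temporaries select the Object&& overload, which uses the rvalue reference syntax introduced next.

```cpp
#include <cassert>
#include <string>

struct Object {};

Object make_object() { return Object{}; }

// Hypothetical probes: overload resolution picks one based on the
// value category of the argument.
std::string category(const Object&) { return "lvalue"; }
std::string category(Object&&)      { return "rvalue"; }

void check_binding_rules()
{
    Object o1;

    assert(category(o1) == "lvalue");              // a named object
    assert(category(make_object()) == "rvalue");   // an unnamed temporary

    // Object& r = make_object();      // error: a temporary cannot bind to
    //                                 // a non-const lvalue reference
    const Object& cr = make_object();  // allowed: const lvalue reference
    (void)cr;
}
```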

Modern C++ provides a mechanism to handle rvalues by introducing the rvalue reference T&& and the std::move() function. When using these two features in combination it is possible to capture rvalues (like temporary objects) or turn any lvalue into an rvalue in order to move the resources they own rather than copying them. Returning to our previous example, we can forget about the imaginary move_from() method and extend the Object class with the new C++ features as follows

// Modern C++

#include <utility> // std::move
#include <vector>
 
class Object {
 
     std::vector<int> resource;
 
public:
 
     Object() = default;
 
     Object(const Object& o)
     {
          // Copy
          this->resource = o.resource;
     }
 
     Object(Object&& o) noexcept
     {
          // Move
          // 'o' has a name, so inside this function it is an lvalue;
          // std::move() casts its member back to an rvalue
          this->resource = std::move( o.resource );
     }
 
     Object& operator=(const Object& o)
     {
          // Copy
          this->resource = o.resource;
          return *this;
     }
 
     Object& operator=(Object&& o) noexcept
     {
          // Move
          this->resource = std::move( o.resource );
          return *this;
     }
};

where Object(Object&& o) is called the move constructor and operator=(Object&& o) the move assignment operator. Their arguments are rvalue references to an Object and together with the std::move() function they allow the implementation of the move semantics. Specifically, if we write the following code

Object o1;
Object o2 = o1;
Object o3 = some_process(); // returns an rvalue Object

the initialization at line 2 will trigger the copy constructor, since o1 is an lvalue and binds to the constructor's const lvalue reference parameter, const Object&. On the other hand, the initialization at line 3 has an rvalue on the right side (it generates a temporary anonymous object), so it never goes through the copy constructor: the move constructor is the one selected, although compilers are allowed to elide the call altogether and, since C++17, are required to construct o3 directly in place. In the move constructor we can then move the resource from the source object by using std::move(), or implement a set of operations like in our imaginary move_from(Object&) method if our object directly manages the resource.
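Which constructor runs can be observed by counting calls. The sketch below uses a hypothetical Tracked class with static counters (an instrumented stand-in for Object, not something taken from the listing above).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical instrumented class: counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    std::vector<int> resource;

    Tracked() = default;
    Tracked(const Tracked& o) : resource(o.resource) { ++copies; }
    Tracked(Tracked&& o) noexcept : resource(std::move(o.resource)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves  = 0;

Tracked some_process() { return Tracked{}; }

struct Counts { int copies; int moves; };

Counts count_constructions()
{
    Tracked::copies = 0;
    Tracked::moves  = 0;

    Tracked o1;
    Tracked o2 = o1;              // lvalue source: copy constructor
    Tracked o3 = std::move(o1);   // explicit cast to rvalue: move constructor
    Tracked o4 = some_process();  // temporary source: never copied; the move
                                  // is usually (since C++17, always) elided
    (void)o2; (void)o3; (void)o4;

    assert(Tracked::copies == 1); // only 'o2 = o1' copies
    return Counts{Tracked::copies, Tracked::moves};
}
```

The number of moves recorded for o4 depends on the language standard and on compiler settings, which is why the sketch only checks the copy count; the important point is that the temporary is never copied.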

The std::move() function does not actually move anything but merely “signals” that the resource (a vector in this case) can be moved. It is up to the moved resource to implement the necessary logic to actually move the data. Under the hood, std::move() simply performs a (static) cast on the object to an rvalue reference so that it will trigger the appropriate overloaded move method. So, in the above example the statement std::move( o.resource ); simply returns an rvalue reference to the source vector and triggers the move assignment operator of std::vector, where a set of operations similar to the ones in our imaginary move_from() method are performed to move the data.
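That std::move() is only a cast can be checked directly. In this small sketch (the function name is made up for illustration), binding the result of std::move() to a reference leaves the vector untouched; the data changes hands only when std::vector's own move constructor is invoked.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// The result type of std::move() applied to a vector lvalue is simply
// an rvalue reference to that vector.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::vector<int>&>())),
                 std::vector<int>&&>::value,
    "std::move() yields an rvalue reference to its argument");

bool move_is_only_a_cast()
{
    std::vector<int> src{1, 2, 3};
    const int* p = src.data();

    // Only a cast: 'ref' refers to 'src' and nothing has been moved.
    std::vector<int>&& ref = std::move(src);
    assert(&ref == &src && src.size() == 3);

    // Here std::vector's move constructor does the actual work:
    // it takes over the buffer instead of copying the elements.
    std::vector<int> dst = std::move(src);

    return dst.size() == 3
        && dst.data() == p     // same buffer, now owned by 'dst'
        && src.empty();        // the moved-from vector is left empty
}
```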

Since lvalues can now be cast to rvalues by using std::move(), the old two-way distinction is no longer sufficient, and C++11 overhauled these concepts by redefining (once again) what they mean. Every expression now belongs to one of three primary categories: lvalues, which have an identity and cannot be moved from; prvalues (“pure” rvalues), such as temporaries and literals; and xvalues (“eXpiring” values), such as the result of std::move(), which have an identity but can be moved from. lvalues and xvalues together form a larger set, the glvalues, while xvalues and prvalues together form the rvalues. The term “rvalue” now indicates anything that can be moved from, whether temporary or not, while “lvalue” refers to anything that has an identity but cannot be moved from without an explicit cast. For more esoteric details see Value Categories.
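The three primary categories can be told apart at compile time, because decltype applied to a parenthesized expression yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue. The sketch below builds on that rule; Category, category_of and the CATEGORY_OF macro are hypothetical helpers written for this example only.

```cpp
#include <cassert>
#include <utility>

enum class Category { lvalue, xvalue, prvalue };

// Maps the type produced by decltype((expr)) to a value category.
template <typename T> struct category_of      { static constexpr Category value = Category::prvalue; };
template <typename T> struct category_of<T&>  { static constexpr Category value = Category::lvalue;  };
template <typename T> struct category_of<T&&> { static constexpr Category value = Category::xvalue;  };

// Hypothetical convenience macro; the double parentheses are essential.
#define CATEGORY_OF(expr) (category_of<decltype((expr))>::value)

struct Object {};
Object make_object() { return Object{}; }

void check_value_categories()
{
    Object named;

    assert(CATEGORY_OF(named)            == Category::lvalue);   // has a name
    assert(CATEGORY_OF(std::move(named)) == Category::xvalue);   // cast by std::move()
    assert(CATEGORY_OF(make_object())    == Category::prvalue);  // temporary
    assert(CATEGORY_OF(42)               == Category::prvalue);  // literal
}
```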

The STL containers implement move semantics to optimize resource acquisition. The following code shows, in the comments, which assignment operator overload is selected in each case

template<typename C>
C get_container() {
    return C();
}
 
std::vector<int> v1, v2;
std::list<int> l1, l2;
std::map<int, int> m1, m2;
std::deque<int> d1, d2;
 
v1 = v2;                                  // calls vector& operator=(const vector& rhs)
v1 = get_container<std::vector<int>>();   // calls vector& operator=(vector&& rhs)
 
l1 = l2;                                  // calls list& operator=(const list& rhs)
l1 = get_container<std::list<int>>();     // calls list& operator=(list&& rhs)
 
m1 = m2;                                  // calls map& operator=(const map& rhs)
m1 = get_container<std::map<int,int>>();  // calls map& operator=(map&& rhs)
 
d1 = d2;                                  // calls deque& operator=(const deque& rhs)
d1 = get_container<std::deque<int>>();    // calls deque& operator=(deque&& rhs)
 
// and others ...

Conclusion

C++ has finally introduced some tools that make it much cleaner to implement resource acquisition by moving. Whenever passing around objects that hold resources, always consider whether it’s more convenient to move the resource instead of copying it. Many STL classes are “move-aware” and provide this functionality out of the box, but in user-defined classes it’s a good idea to implement the move mechanism if the object is supposed to manage some kind of expensive resource.


[1] C++ allows temporaries (rvalues) to be bound to const lvalue references through a mechanism called “temporary lifetime extension”, but that’s a controversial feature that doesn’t find much justification for its use in modern C++.
