
Modern C++ • Type Inference

Type systems are a major component of any programming language. Broadly speaking, their purpose is to associate “types” with “values” by deriving a type ↔ value binding from annotations and/or flow paths in the source code, according to a set of rules. Here, type denotes one of the data types supported by the language, and value denotes any construct of the language that takes or produces a value, that is, a variable or an expression. Depending on how much information about the type of a value the code provides through annotations, a language is said to be explicitly typed or implicitly typed. The way a language implementation (compiler, interpreter) determines the type when it meets a value declaration (hereafter referred to simply as “typing”) generally falls into one (or a combination) of two categories:

  • Explicit typing – the type of value is determined from explicit annotations given by the programmer in the source code
  • Implicit typing – the type of value is deduced by analyzing the code because limited or no type annotations are given

Explicit typing finds a binding type ↔ value from code annotations that expressly indicate what the type for value must be and is typically performed at compile time since the type is known before the program is executed. In this case it is usually referred to as explicit static typing. Some languages support explicit typing at run time but it is rarely strictly enforced since in these languages types can change anytime during the execution of the program.

Implicit (or automatic) typing is performed from non-annotated or partially annotated code where only value is specified (or value with some “hints” about type) and can happen at both compile and run time, depending on the language. In this case, the language implementation has to reconstruct the missing information about type from the context by analyzing the code, thus performing what’s called type inference. If performed at compile time it’s said to be implicit static typing and the type inference is also static, that is once the type of value is determined and checked it cannot be changed when the program is executed. If performed at run time, it is usually associated with dynamic typing where the type of value can be inferred and changed multiple times during the execution of the program.

Directly related to typing is the concept of type-checking, that is the verification that a type ↔ value binding is valid and safe. Depending on how the typing is performed, this operation may be called static type checking (for static typing, whether explicit or implicit), dynamic type checking (for dynamic typing) or a combination of both (hybrid type checking or soft type checking). Also, depending on how strictly the binding is enforced, the type-safety of the language can be classified as being “strong” or “weak” [1].

The code below illustrates the difference between a statically typed language and a dynamically typed one:

// We want lasagne ... 3 portions of them!

// C++'s (explicit) static typing
int order = 3;                       // type: int
process(order);
order = "Lasagne";                   // ERROR: can't assign string to int
process(order);
std::cout << "Enjoy your meal!";     // Ouch... no lasagne for you

# Python's (implicit) dynamic typing
order = 3                            # type: <class 'int'>
process(order)
order = "Lasagne"                    # type: <class 'str'>
process(order)
print("Enjoy your meal!")            # Buon appetito!

In C++, the type of a variable is established once, at compile time, and constrains what the variable can hold at run time. If the constraint is violated, the compiler reports an error (or, where the type system is circumvented, undefined behavior may occur). In a dynamically typed language such as Python, on the other hand, the type of a variable may be inferred and changed multiple times at run time, allowing for more flexibility (at the expense of performance).

C++ is a statically typed language, both explicitly (through annotations) and implicitly (through type inference), and quite weakly typed, since the type system can be easily broken either indirectly (e.g. through implicit conversions) or by making use of certain language features [2], which can be avoided by following modern practices [3]. However, until C++11 it only supported explicit typing, requiring the programmer to expressly specify the type of each declared value in the program. This approach, while in line with the philosophy of the language, has some drawbacks. Consider the following code:

std::map<std::string, std::list<std::string>> records = get_records();

// Old C++

for (std::map<std::string, std::list<std::string>>::iterator it=records.begin();
     it!=records.end(); ++it)
{
	std::list<std::string> &fields = (*it).second;

	for (std::list<std::string>::iterator it2=fields.begin(); it2!=fields.end(); ++it2) {
		// Process fields
	}
}

// Modern C++

for (std::pair<const std::string, std::list<std::string>> &record : records)
{
	std::list<std::string> &fields = record.second;   // reference, as in the old-style loop (no copy)

	for (std::string &field : fields) {
		// Process fields
	}
}

The old-style C++ is very verbose, and even with the more compact range-based loop it is still not pleasant to read. If you are working on even a moderately complex project with many types that have long, baroque names, the readability of that code base is guaranteed to be a nightmare. But there is also another problem. Consider now the code below:

// Indirectly get a closure by wrapping it into a function object
std::function<int(int)> inc = [](int a) { return a + 1; };

The problem here is that lambda functions have unnamed types known only to the compiler, so without type inference the only way to store one in a variable is through an indirection, using a function object as a wrapper. In both of the above examples the compiler could very well deduce the types of the values by looking, for example, at the return type of get_records() and figuring out the rest from there; for the lambda it would be trivial, since the compiler itself creates the type. Instead it does not, forcing the programmer to write redundant code.
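As a side note, the compiler's ability to work out a closure's type is already visible in template argument deduction. The sketch below is only illustrative: call_with and closure_demo are hypothetical names made up for this example. It contrasts the std::function wrapper with a function template whose parameter type is deduced from the lambda itself.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: F is deduced as the closure's unnamed type,
// so the lambda is called directly, without any wrapper.
template <typename F>
int call_with(F f, int x)
{
	return f(x);
}

inline void closure_demo()
{
	// Indirection: the closure is hidden behind a type-erasing wrapper.
	std::function<int(int)> inc = [](int a) { return a + 1; };
	assert(inc(1) == 2);

	// Deduction: no type is ever written for the lambda.
	assert(call_with([](int a) { return a + 1; }, 1) == 2);
}
```

In the second call the programmer never spells out a type for the closure, which is exactly the kind of freedom one would like to have for ordinary variable declarations too.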

It would have been nice if C++ allowed some degree of freedom in declarations by relieving the programmer of having to specify the type each time a value is declared, even when it’s very clear what the type is.

 

The auto keyword

C++11 introduced implicit typing capabilities through type inference by (wisely) repurposing a keyword whose previous meaning was practically useless: auto. The auto keyword tells the compiler that the type of value must be inferred from the context, specifically from the initializer or from the return statements, depending on whether value is a variable or a function (return type deduction for ordinary functions arrived with C++14). The above examples can then be rewritten as follows:

// A relief for the eyes...
auto records = get_records();

for (auto &record : records)
{
	auto &fields = record.second;   // auto& avoids copying the list

	for (auto &field : fields) {
		// Process fields
	}
}

// ...and we get direct binds to lambdas
auto inc = [](int a) { return a + 1; };

with a clear improvement in readability and allowing direct binding to otherwise unspecified types. By default, auto infers types by value (references and top-level const are dropped), but it can be combined with modifiers such as const, &, && and *:

auto a = 1;                // inferred type is int
auto &b = a;               // inferred type is int&
auto *c = &a;              // inferred type is int*
auto &&d = 5;              // inferred type is int&&
auto &&e = 1.5;            // inferred type is double&&
const auto pi = 3.14;      // inferred type is const double
auto ages = {25,35,45};    // inferred type is std::initializer_list<int>
auto z {10};               // inferred type is int (std::initializer_list<int> under the original C++11 rules)
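A quick way to double-check deductions like the ones above is to let the compiler confirm them with static_assert. The sketch below is only illustrative: check_deductions and its local names are made up for this example. It also shows what "by value" means in practice, namely that a plain auto drops references and top-level const.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

// Hypothetical helper: returns the sum of the list elements so that the
// deduced std::initializer_list<int> is actually used at run time.
inline int check_deductions()
{
	int value = 1;
	const int &cref = value;      // reference to const int

	auto copy = cref;             // plain auto drops the reference and the const
	auto &ref = cref;             // auto& keeps the const of the referenced object
	auto &&fwd = value;           // auto&& bound to an lvalue becomes int&
	auto &&tmp = 5;               // auto&& bound to an rvalue stays int&&
	auto ages = {25, 35, 45};     // braces with '=': std::initializer_list<int>

	static_assert(std::is_same<decltype(copy), int>::value, "copy is int");
	static_assert(std::is_same<decltype(ref), const int &>::value, "ref is const int&");
	static_assert(std::is_same<decltype(fwd), int &>::value, "fwd is int&");
	static_assert(std::is_same<decltype(tmp), int &&>::value, "tmp is int&&");
	static_assert(std::is_same<decltype(ages), std::initializer_list<int>>::value,
	              "ages is std::initializer_list<int>");

	copy = 2;                     // modifies the copy only
	assert(value == 1);
	(void)ref; (void)fwd; (void)tmp;

	int total = 0;
	for (auto age : ages) total += age;
	return total;
}
```

If any of the comments were wrong, the corresponding static_assert would stop the build, which makes this a cheap way to test one's assumptions about what auto deduces.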

auto can also be used with functions, even though, in my opinion, this should be discouraged (see the section “Abusing type inference”):

// Return type inferred from the returned local variable (C++14)
auto make_contacts()
{
	std::map<std::string, int> contacts
	{
		{ "Albert", 209309 },
		{ "Mark", 398494 },
		{ "Paul", 109397 },
	};

	// ...
	return contacts;
}

// Return type inferred from the returned expression
auto add_num(int a, int b)
{
	return a + b;
}

// This is controversial and only standard since C++20
auto process(auto a, auto b)
{
	// ...
}

auto contacts = make_contacts();
auto sum = add_num(1,2);

Using auto in function parameters is not supported by all compilers. For example, Microsoft’s C/C++ compiler (v14) does not support it. The feature was proposed as part of the Concepts work and left out of C++17, a choice I personally agreed with, although it was later standardized in C++20 (as “abbreviated function templates”).
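For what it's worth, an auto parameter is essentially shorthand for a template parameter, so much the same effect can be had with syntax that any C++11 compiler accepts. A minimal sketch, where combine is a hypothetical name chosen for illustration:

```cpp
#include <type_traits>

// Spelled-out counterpart of `auto combine(auto a, auto b)`:
// one template parameter per `auto` parameter, and a trailing
// return type that names the type of the returned expression.
template <typename A, typename B>
constexpr auto combine(A a, B b) -> decltype(a + b)
{
	return a + b;
}

static_assert(combine(1, 2) == 3, "int + int");
static_assert(std::is_same<decltype(combine(1, 2.5)), double>::value,
              "int + double yields double");
```

The template version makes it explicit that a separate function is generated for each combination of argument types, which is easy to overlook when the parameters are simply written as auto.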

 

Abusing type inference

The introduction of type inference in C++ has undoubtedly improved the life of programmers by allowing them to write more readable and maintainable code when used judiciously, and it can even help performance by avoiding unintended type conversions. However, programmers should not be tempted to auto everything under the sun. Remember, this feature does not turn C++ into a dynamically typed language that you can use JavaScript-style.
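A well-known case of such an unintended conversion is iterating over a std::map with a slightly wrong element type. The sketch below is illustrative only; Contacts, the two checker functions and conversion_demo are names invented for this example. It compares addresses to detect whether the loop variable refers to the stored element or to a temporary copy.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Contacts = std::map<std::string, int>;

// True when every loop variable refers to the element stored in the map
// rather than to a temporary copy of it.
inline bool explicit_pair_aliases_elements(const Contacts &contacts)
{
	auto it = contacts.begin();
	// Subtle mismatch: the element type is std::pair<const std::string, int>,
	// so each element is converted to a temporary pair and `entry` binds to that.
	for (const std::pair<std::string, int> &entry : contacts) {
		if (&entry.second != &it->second) return false;
		++it;
	}
	return true;
}

inline bool auto_aliases_elements(const Contacts &contacts)
{
	auto it = contacts.begin();
	for (const auto &entry : contacts) {   // deduced: the exact element type
		if (&entry.second != &it->second) return false;
		++it;
	}
	return true;
}

inline void conversion_demo()
{
	const Contacts contacts{{"Albert", 209309}, {"Mark", 398494}};
	assert(!explicit_pair_aliases_elements(contacts));  // a copy per element
	assert(auto_aliases_elements(contacts));            // no copies
}
```

The hand-written type compiles without complaint, yet it silently copies every key and value; const auto & cannot get the element type wrong.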

Using auto in function signatures should be done with caution, especially with public classes. This is because methods and functions may be part of an API or interface, in which case the signatures should clearly and immediately specify the types involved and be self-documenting without unnecessarily forcing the reader to dive into the documentation or, even worse, the code.

Use auto locally, not globally. That is, only for types that have dependencies within a local scope. Using it all over the place can lead to the opposite effect of making the code too cryptic by obscuring intents and making it harder to maintain. Although this problem could be mitigated by using very descriptive identifiers, not all developers do a good job at that.

Also, keep in mind the possibility of getting a value that’s not really the expected one. While auto generally deduces the right type, sometimes it may not be a suitable one. So if the code has been mindlessly polluted with autos, beware of subtle bugs that can arise, like the one in the following code (find it 🙂 )

auto get_list_size()
{
	auto a = 145000;
	auto b = 35000;
	return a * b;
}

auto size = get_list_size();
auto list = make_list(size);  // create a list with 'size' elements
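Spoiler for the puzzle above: both literals fit in an int, so auto deduces int for a and b, the product is computed as an int as well, and 5,075,000,000 does not fit in the 32-bit int found on most current platforms (signed overflow is undefined behavior). A minimal sketch of one possible fix, assuming a 64-bit result is wide enough; get_list_size_fixed is a name made up here:

```cpp
#include <cstdint>
#include <type_traits>

// int * int stays int, however large the mathematical result is.
static_assert(std::is_same<decltype(int{} * int{}), int>::value,
              "int * int is int");

// Hypothetical fix: state the wide type explicitly for one operand, so the
// other operand is converted and the product is computed in 64 bits.
constexpr std::int64_t get_list_size_fixed()
{
	return std::int64_t{145000} * 35000;
}

static_assert(get_list_size_fixed() == 5075000000LL, "145000 * 35000");
```

Writing the return type explicitly also documents the intended range of the result, which ties in with the advice on function signatures given earlier.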

In short, use it judiciously: mainly when there is a clear readability issue and the types are evident or locally deducible, or when the type of a value cannot be named but you need to capture the value for later use (as in the lambda case). Do not assume that, just because your intelligent IDE can instantly track down the types, anyone else reading the code can do the same.

Use it, but don’t abuse it!

 

NOTES:

  1. There is a common misuse of terms (and often even a misconception) that associates static typing with a strongly typed language. This is generally incorrect, as binding a type to a value at compile time does not automatically make the type system strong if it can be easily broken at run time using features of the language itself.
  2. Foundations of C++ (B. Stroustrup) [Section 13]
  3. A brief introduction to C++’s model for type- and resource-safety (B. Stroustrup, H. Sutter, G. Dos Reis)

 

Published in Modern C++