Source: LearnCpp.com by Alex

Introduction to programming languages

Optional Reading: Difference between compiled and interpreted languages?

Interpreters and compilers have complementary strengths and weaknesses, so it’s becoming increasingly common for language runtimes to combine elements of both. Java’s JVM is a good example of this: Java source is compiled to bytecode, which the JVM then interprets or compiles to machine code at runtime (just-in-time compilation).

Another key point from this article is that languages themselves are not compiled or interpreted; that is a property of the implementation, not of the language. C code is usually compiled, but there are C interpreters available that make it easier to debug or visualize the code.

Introduction to C++

History of C++

  • 1979: Bjarne Stroustrup at Bell Labs started developing C++
  • 1998: First standardized by the ISO committee (C++98)
  • 2003: C++03
  • 2011: C++11 (a huge number of new capabilities)
  • 2014: C++14
  • 2017: C++17

C and C++’s philosophy

The underlying design philosophy of C and C++ can be summed up as "trust the programmer", which is both wonderful and dangerous. Although we have the freedom, it’s important to know the things you should not do with C++.

History of “bug”

The term bug was first used by Thomas Edison back in the 1870s! However, the term was popularized in the 1940s when engineers found an actual moth stuck in the hardware of an early computer, causing a short circuit. Both the log book in which the error was reported and the moth are now part of the Smithsonian Museum of American History.

Compilers, linkers, and the libraries

For complex projects, some development environments use a makefile, which is a file that describes how to build a program.

C++ Basics

This section covers a lot of material shared with C, so I only write down the important points worth noticing and recalling.

std::endl vs ‘\n’

Using std::endl can be a bit inefficient, as it actually does two jobs: it moves the cursor to the next line, and it “flushes” the output (makes sure that it shows up on the screen immediately; output the buffer).

When writing text to the console using std::cout, std::cout usually flushes output anyway (and if it doesn’t, it usually doesn’t matter), so having std::endl flush is rarely important.

The ‘\n’ character doesn’t do the redundant flush, so it performs better.

  • << - insertion operator
  • >> - extraction operator. The input must be stored in a variable to be used.

Expressions

Initialization can be used to give a variable a value at the point of creation. C++ supports 3 types of initialization:

  • copy initialization
  • direct initialization
  • uniform initialization.

An expression is a combination of literals, variables, operators, and explicit function calls that produces a single output value. When an expression is executed, each of the terms in the expression is evaluated until a single value remains (this process is called evaluation). That single value is the result of the expression.

Statements vs. Expressions:

  • Statements are used when we want the program to perform an action.
  • Expressions are used when we want the program to calculate a value.

Certain expressions (e.g. x = 5) are useful by themselves. However, an expression cannot stand on its own in a program; it must be part of a statement. So how can we use these expressions by themselves?

Fortunately, we can convert any expression into an equivalent statement (called an expression statement). An expression statement is a statement that consists of an expression followed by a semicolon. When the statement is executed, the expression will be evaluated (and the result of the expression will be discarded).

int x;  // this statement does not contain an expression (this is just a variable definition)
5 * 6;
5 < 6;

std::cout << x; // hint: operator << is a binary operator
// If operator << is a binary operator, then std::cout must be the left-hand operand,
// and x must be the right-hand operand. Since that's the entire statement,
// this must be an expression statement.

Functions and Files

C++ does not define whether function calls evaluate arguments left to right or vice-versa.

Function parameters and variables defined inside the function body are called local variables. The time in which a variable exists is called its lifetime. Variables are created and destroyed at runtime, which is when the program is running. A variable’s scope determines where it can be accessed. When a variable can be accessed, we say it is in scope. When it cannot be accessed, we say it is out of scope. Scope is a compile-time property, meaning it is enforced at compile time.

Refactoring is the process of restructuring code without changing its behavior, for example by breaking down a larger function into many smaller, simpler functions.

Whitespace refers to characters used for formatting purposes. In C++, this includes spaces, tabs, and newlines.

  • A definition actually implements (for functions and types) or instantiates (for variables) an identifier.
  • A declaration is a statement that tells the compiler about the existence of the identifier. In C++, all definitions serve as declarations. Pure declarations are declarations that are not definitions (such as function prototypes).

Namespace

In C++, a namespace is a grouping of identifiers that is used to reduce the possibility of naming collisions. It turns out that std::cout’s name isn’t really std::cout. It’s actually just cout, and std is the name of the namespace that identifier cout is part of. In modern C++, all of the functionality in the C++ standard library is now defined inside namespace std (short for standard).

When you use an identifier that is defined inside a namespace (such as the std namespace), you have to tell the compiler that the identifier lives inside the namespace.

  • Explicit namespace qualifier std::
    This is the safest way to use cout, since there’s no ambiguity.

  • Using directive (and why to avoid it!!!)
    A using directive tells the compiler to check a specified namespace when trying to resolve an identifier that has no namespace prefix. So in the example below, when the compiler goes to determine what identifier cout is, it will check both locally (where it is undefined) and in the std namespace (where it will match to std::cout).

    #include <iostream>
    using namespace std;
    int main() {
        cout << "Hello World!";
        return 0;
    }

Many texts, tutorials, and even some compilers recommend or use a using directive at the top of the program. However, this is a bad practice, and is highly discouraged.

#include <iostream> // imports the declaration of std::cout
using namespace std; // makes std::cout accessible as "cout"

int cout() { // declares our own "cout" function
    return 5;
}

int main() {
    cout << "Hello, world!";
    // Compile error! Which cout do we want here?
    // The one in the std namespace or the one we defined above?

    return 0;
}

bool & Literals

#include <iostream>

int main() {
    bool flag { true }; // remember to initialize it!
    std::cout << flag << "\n"; // prints 1 (a false value would print 0)
    return 0;
}

In C++, a bool is printed as 0 (false) or 1 (true) by default, and it converts to and from those integer values.

C++ has two kinds of constants: literal and symbolic. In this lesson, we’ll cover literals.

Just like variables have a type, all literals have a type too. The type of a literal is assumed from the value and format of the literal itself.

You can use literal suffixes to change the default type of a literal if its type is not what you want.

float f1 = 5.0f; // the f suffix makes the literal a float
float f2 = 4.1; // since the type of 4.1 is double, converting it to float results in a loss of precision

In C++14, we can write binary literals by using the 0b prefix:

int bin(0);
bin = 0b1; // assign binary 0000 0001 to the variable
bin = 0b11; // assign binary 0000 0011 to the variable
bin = 0b1010; // assign binary 0000 1010 to the variable
bin = 0b11110000; // assign binary 1111 0000 to the variable

Because long literals can be hard to read, C++14 also adds the ability to use a single quotation mark (') as a digit separator. In Java, we use _ instead.

int bin = 0b1011'0010;  // assign binary 1011 0010 to the variable
long value = 2'132'673'462; // much easier to read than 2132673462

const

To make a variable constant, simply put the const keyword either before or after the type identifier, like so:

const double gravity { 9.8 }; // preferred use of const before type
int const sidesInSquare { 4 }; // okay, but not preferred

Although C++ will accept const either before or after the type identifier, we recommend using it before the type because it better follows standard English language convention where modifiers come before the object being modified (e.g. a green ball, not a ball green).

Const variables must be initialized when you define them, and that value cannot then be changed via assignment. Leaving out the initializer causes a compile error:

const double gravity;  // compile error: a const variable must be initialized

Note that const variables can be initialized from non-const values:

std::cout << "Enter your age: ";
int age;
std::cin >> age;
const int usersAge (age); // usersAge cannot be changed

Const is used most often with function parameters:

void printInteger(const int myValue) {
    std::cout << myValue << "\n";
}

Making a function parameter const does two things:

  • It lets the person calling the function know that the function will not change the value of myValue.
  • It ensures that the function doesn’t change the value of myValue.

However, with parameters passed by value, like the above, we generally don’t care if the function changes the value of the parameter, since it’s just a copy that will be destroyed later. Thus, we usually don’t make parameters passed by value const.

Runtime constants are those whose initialization values can only be resolved at runtime (when your program is running). A variable such as myValue above is a runtime constant, because the compiler can’t determine its value at compile time. myValue depends on the value passed into the function (which is only known at runtime).

In most cases, it doesn’t matter whether a constant value is runtime or compile-time. However, there are a few odd cases where C++ requires a compile-time constant instead of a run-time constant (such as when defining the length of a fixed-size array — we’ll cover this later). Because a const value could be either runtime or compile-time, the compiler has to keep track of which kind of constant it is.

To help provide more specificity, C++11 introduced the keyword constexpr, which ensures that the constant must be a compile-time constant:

constexpr double gravity (9.8);
// ok, the value of 9.8 can be resolved at compile-time
constexpr int sum = 4 + 5;
// ok, the value of 4 + 5 can be resolved at compile-time

std::cout << "Enter your age: ";
int age;
std::cin >> age;
constexpr int myAge = age;
// not okay, age cannot be resolved at compile-time

  • Rule 1: Any variable that should not change values after initialization and whose initializer is known at compile-time should be declared as constexpr.
  • Rule 2: Any variable that should not change values after initialization and whose initializer is not known at compile-time should be declared as const.

Naming your const variables: Some programmers prefer to use all upper-case names for const variables. Others use normal variable names with a k prefix. However, we will use normal variable naming conventions, which is more common. Const variables act exactly like normal variables in every case except that they cannot be assigned to, so there’s no particular reason they need to be denoted as special.

Symbolic constants: A symbolic constant is a name given to a constant literal value. There are two ways to declare symbolic constants in C++. One of them is good, and the other one is not.

  • Bad: Using object-like macros with a substitution parameter as symbolic constants.
  • Better: Use const variables, or better, constexpr.

With symbolic constants, if you ever need to change a value, you only need to change it in one place.

Side effects in inc / dec

A function or expression is said to have a side effect if it modifies some state (e.g. any stored information in memory), does input or output, or calls other functions that have side effects.

Usually, they are useful:

x = 5; // the assignment operator modifies the state of x
++x; // operator++ modifies the state of x
std::cout << x; // operator<< modifies the state of the console

However, side effects can also lead to unexpected results:

#include <iostream>

int add(int x, int y) { return x + y; }

int main() {
    int x = 5;
    int value = add(x, ++x);
    // is this 5 + 6, or 6 + 6? It depends on the order in which your compiler evaluates the function arguments.

    std::cout << value;
    // value could be 11 or 12, depending on how the above line evaluates!
    return 0;
}

C++ does not define the order in which function arguments are evaluated. Note that this is only a problem because one of the arguments to function add() has a side effect.

Another popular example:

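#include <iostream>

int main() {
    int x = 1;
    x = x++;
    std::cout << x;

    return 0;
} // before C++17, the output is undefined

  • If the ++ is applied to x before the assignment, the answer will be 1 (postfix operator++ increments x from 1 to 2, but it evaluates to 1, so the expression becomes x = 1).

  • If the ++ is applied to x after the assignment, the answer will be 2 (this evaluates as x = x, then postfix operator++ is applied, incrementing x from 1 to 2).

Since C++17, the right-hand side of an assignment (including its side effects) is sequenced before the assignment itself, so this particular expression leaves x equal to 1. Under earlier standards the behavior is undefined, and the expression is best avoided either way.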

There are other cases where C++ does not specify the order in which certain things are evaluated, so different compilers may produce different results.

Please don’t ask why your programs that violate the above rules produce results that don’t seem to make sense. That’s what happens when you write programs that have "undefined behavior". :)

Bit flags & Bit masks (optional)

The smallest addressable unit of memory is a byte, so a bool occupies a whole byte even though it only needs a single bit. In the majority of cases, this is fine – we’re usually not so hard-up for memory that we need to care about 7 wasted bits. However, in some storage-intensive cases, it can be useful to “pack” 8 individual boolean values into a single byte for storage efficiency purposes. This is done by using the bitwise operators to set, clear, and query individual bits in a byte, treating each as a separate boolean value. These individual bits are called bit flags.

Defining bit flags in C++14

In order to work with individual bits, we need to have a way to identify the individual bits within a byte, so we can manipulate those bits (turn them on and off). This is typically done by defining a symbolic constant to give a meaningful name to each bit used. The symbolic constant is given a value that represents that bit.

Because C++14 supports binary literals, this is easiest in C++14:

// Define 8 separate bit flags (these can represent whatever you want)
const unsigned char option0 = 0b0000'0001; // represents bit 0
const unsigned char option1 = 0b0000'0010; // represents bit 1
const unsigned char option2 = 0b0000'0100; // represents bit 2
const unsigned char option3 = 0b0000'1000; // represents bit 3
const unsigned char option4 = 0b0001'0000; // represents bit 4
const unsigned char option5 = 0b0010'0000; // represents bit 5
const unsigned char option6 = 0b0100'0000; // represents bit 6
const unsigned char option7 = 0b1000'0000; // represents bit 7

Defining bit flags in C++11 or earlier

Because C++11 doesn’t support binary literals, we have to use other methods to set the symbolic constants. There are two good methods for doing this. Less comprehensible, but more common, is to use hexadecimal.

// Define 8 separate bit flags (these can represent whatever you want)
const unsigned char option0 = 0x1; // hex for 0000 0001
const unsigned char option1 = 0x2; // hex for 0000 0010
const unsigned char option2 = 0x4; // hex for 0000 0100
const unsigned char option3 = 0x8; // hex for 0000 1000
const unsigned char option4 = 0x10; // hex for 0001 0000
const unsigned char option5 = 0x20; // hex for 0010 0000
const unsigned char option6 = 0x40; // hex for 0100 0000
const unsigned char option7 = 0x80; // hex for 1000 0000

This can be a little hard to read. One way to make it easier is to use the left-shift operator to shift a bit into the proper location.

const unsigned char option0 = 1 << 0; // 0000 0001 
const unsigned char option1 = 1 << 1; // 0000 0010
const unsigned char option2 = 1 << 2; // 0000 0100
const unsigned char option3 = 1 << 3; // 0000 1000
const unsigned char option4 = 1 << 4; // 0001 0000
const unsigned char option5 = 1 << 5; // 0010 0000
const unsigned char option6 = 1 << 6; // 0100 0000
const unsigned char option7 = 1 << 7; // 1000 0000

Using bit flags to manipulate bits

The next thing we need is a variable that we want to manipulate. Typically, we use an unsigned integer of the appropriate size (8 bits, 16 bits, 32 bits, etc… depending on how many options we have).

unsigned char myflags = 0; // all bits turned off to start

To set a bit (turn on)

We use bitwise OR equals (operator |=):

myflags |= option4; // turn option 4 on
// myflags = 0000 0000 (we initialized this to 0)
// option4 = 0001 0000
// -------------------
// result = 0001 0000
myflags |= (option4 | option5); // turn on multiple bits

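Turn bits off

We use bitwise AND equals with the inverted flag (operator &= together with operator ~):

myflags &= ~option4; // turn option 4 off
// myflags = 0001 1100
// ~option4 = 1110 1111
// --------------------
// result = 0000 1100
myflags &= ~(option4 | option5); // turn off multiple bits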

Toggle a bit state

We use bitwise XOR:

myflags ^= option4; // flip option4 from on to off, or vice versa
myflags ^= (option4 | option5); // flip options 4 and 5 at the same time

Determining if a bit is on or off

if (myflags & option4)
    std::cout << "myflags has option 4 set";
if (!(myflags & option5))
    std::cout << "myflags does not have option 5 set";

Here is an actual example for a game we might write:

// Define a bunch of physical/emotional states
const unsigned char isHungry = 1 << 0; // 0000 0001
const unsigned char isSad = 1 << 1; // 0000 0010
const unsigned char isMad = 1 << 2; // 0000 0100
const unsigned char isHappy = 1 << 3; // 0000 1000
const unsigned char isLaughing = 1 << 4; // 0001 0000
const unsigned char isAsleep = 1 << 5; // 0010 0000
const unsigned char isDead = 1 << 6; // 0100 0000
const unsigned char isCrying = 1 << 7; // 1000 0000

unsigned char me = 0; // all flags/options turned off to start
me |= isHappy | isLaughing; // I am happy and laughing
me &= ~isLaughing; // I am no longer laughing

// Query a few states (we'll use static_cast<bool> to interpret the results as a boolean value rather than an integer)
std::cout << "I am happy? " << static_cast<bool>(me & isHappy) << '\n';
std::cout << "I am laughing? " << static_cast<bool>(me & isLaughing) << '\n';

Why are bit flags useful?

Astute readers will note that the above myflags example actually doesn’t save any memory. 8 booleans would normally take 8 bytes. But the above example uses 9 bytes (8 bytes to define the bit flag options, and 1 byte for the bit flag)! So why would you actually want to use bit flags?

Bit flags are typically used in two cases:

  • When you have many sets of identical bit flags. The flag definitions are shared, so each additional set of 8 flags costs 1 byte instead of 8.

  • When a function can take many boolean options. Imagine you had a function that could take any combination of 32 different options. One way to write that function would be to use 32 individual boolean parameters; with bit flags, a single unsigned integer parameter can carry all of the options:

    void someFunction(bool option1, bool option2, bool option3, bool option4, bool option5, bool option6, bool option7, bool option8, bool option9, bool option10, bool option11, bool option12, bool option13, bool option14, bool option15, bool option16, bool option17, bool option18, bool option19, bool option20, bool option21, bool option22, bool option23, bool option24, bool option25, bool option26, bool option27, bool option28, bool option29, bool option30, bool option31, bool option32);
    // with bit flags, this becomes:
    void someFunction(unsigned int options);

An introduction to std::bitset

All of this bit flipping is exhausting, isn’t it? Fortunately, the C++ standard library comes with functionality called std::bitset that helps us manage bit flags.

To create a std::bitset, you need to include the bitset header, and then define a std::bitset variable indicating how many bits are needed. The number of bits must be a compile time constant.

#include <bitset>

// Note that with std::bitset, our options correspond to bit indices, not bit patterns
const int option0 = 0;
const int option1 = 1;
const int option2 = 2;
const int option3 = 3;
const int option4 = 4;
const int option5 = 5;
const int option6 = 6;
const int option7 = 7;

std::bitset<8> bits; // we need 8 bits, all initially off
std::bitset<8> flags((1 << option1) | (1 << option2));
// start with option 1 and 2 turned on (each index is shifted into a bit pattern)
std::bitset<8> morebits(0x3);
// start with bit pattern 0000 0011

std::bitset provides 4 key functions:

  • test() allows us to query whether a bit is a 0 or 1
  • set() allows us to turn a bit on (this will do nothing if the bit is already on)
  • reset() allows us to turn a bit off (this will do nothing if the bit is already off)
  • flip() allows us to flip a bit from a 0 to a 1 or vice versa

Bit masks

const unsigned int redBits   = 0xFF'00'00'00;
const unsigned int greenBits = 0x00'FF'00'00;
const unsigned int blueBits = 0x00'00'FF'00;
const unsigned int alphaBits = 0x00'00'00'FF;

std::cout << "Enter a 32-bit RGBA color value in hexadecimal (e.g. FF7F3300): ";
unsigned int pixel;
std::cin >> std::hex >> pixel; // std::hex allows us to read in a hex value

// use bitwise AND to isolate the red bits, then right shift the value into the range 0-255
unsigned char red = (pixel & redBits) >> 24;
unsigned char green = (pixel & greenBits) >> 16;
unsigned char blue = (pixel & blueBits) >> 8;
unsigned char alpha = pixel & alphaBits;

std::cout << "Your color contains:\n";
std::cout << static_cast<int>(red) << " of 255 red\n";
std::cout << static_cast<int>(green) << " of 255 green\n";
std::cout << static_cast<int>(blue) << " of 255 blue\n";
std::cout << static_cast<int>(alpha) << " of 255 alpha\n";

Variable Scope and More Types

Local variables, scope, and duration

When discussing variables, it’s useful to separate out the concepts of scope and duration.

  • A variable’s scope determines where a variable is accessible.
  • A variable’s duration determines when it is created and destroyed.

The two concepts are often linked.

Variables defined inside a function are called local variables. Local variables have automatic duration, which means they are created (and initialized, if relevant) at the point of definition, and destroyed when the block they are defined in is exited. Local variables have block scope (also called local scope), which means they enter scope at the point of declaration and go out of scope at the end of the block that they are defined in.

int main() { // outer block
    int n(5); // n created and initialized here

    { // begin nested block
        double d(4.0); // d gets created and initialized here
    } // d goes out of scope and is destroyed here

    // d cannot be used here because it was already destroyed!

    return 0;
} // n goes out of scope and is destroyed here

Note that a variable inside a nested block can have the same name as a variable inside an outer block. When this happens, the nested variable “hides” the outer variable. This is called name hiding or shadowing.

Shadowing is something that should generally be avoided, as it is quite confusing!

Rule: Avoid using nested variables with the same names as variables in an outer block. Variables should be defined in the most limited scope possible.

Global variables and linkage

Variables declared outside of a function are called global variables. Global variables have static duration, which means they are created when the program starts and are destroyed when it ends. Global variables have file scope (also informally called global scope or global namespace scope), which means they are visible from the point of declaration until the end of the file in which they are declared.

Similar to how variables in an inner block with the same name as a variable in an outer block hide the variable in the outer block, local variables with the same name as a global variable hide the global variable inside the block that the local variable is declared in. However, the global scope operator :: can be used to tell the compiler you mean the global version instead of the local version.

int value(5);

int main() {
    int value = 7;
    value++; // local
    ::value++; // global
}

However, having local variables with the same name as global variables is usually a recipe for trouble, and should be avoided whenever possible. By convention, many developers prefix global variable names with g_ to indicate that they are global. This both helps identify global variables as well as avoid naming conflicts with local variables.

Internal and external linkage via the static and extern keywords:

In addition to scope and duration, variables have a third property: linkage. A variable’s linkage determines whether multiple declarations of an identifier refer to the same variable or not.

(Strong & weak symbols?)

  • A variable with no linkage can only be referred to from the limited scope it exists in. Regular local variables are an example of variables with no linkage. Two local variables with the same name but defined in different functions have no linkage – each will be considered an independent variable.

  • A variable with internal linkage is called an internal variable (or static variable). Variables with internal linkage can be used anywhere within the file they are defined in, but cannot be referenced outside the file they exist in.

  • A variable with external linkage is called an external variable. Variables with external linkage can be used both in the file they are defined in, as well as in other files.

If we want to make a global variable internal (able to be used only within a single file), we can use the static keyword to do so; similarly, if we want to make a global variable external (able to be used anywhere in our program), we can use the extern keyword to do so.

extern double g_y(9.8); 
// g_y is external, and can be used by other files

// Note: those other files will need to use a forward declaration to access this external variable (extern)

By default, non-const variables declared outside of a function are assumed to be external. However, const variables declared outside of a function are assumed to be internal.

Note that this means the extern keyword has different meanings in different contexts:

  • In some contexts, extern means “give this variable external linkage”.
  • In other contexts, extern means “this is a forward declaration for an external variable that is defined somewhere else”.

Function linkage:

Functions have the same linkage property that variables do. Functions always default to external linkage, but can be set to internal linkage via the static keyword.

Function forward declarations don’t need the extern keyword. The compiler is able to tell whether you’re defining a function or a function prototype by whether you supply a function body or not.

The one-definition rule and non-external linkage:

The one-definition rule says that an object or function can’t have more than one definition, either within a file or a program.

However, it’s worth noting that non-extern objects and functions in different files are considered to be different entities, even if their names and types are identical. This makes sense, since they can’t be seen outside of their respective files anyway.

Global symbolic constants:

#ifndef CONSTANTS_H
#define CONSTANTS_H

// define your own namespace to hold constants
namespace Constants {
    const double pi(3.14159);
    const double avogadro(6.0221413e23);
    const double my_gravity(9.2);
    // m/s^2 -- gravity is light on this planet
    // ... other related constants
}
#endif

Because const globals have internal linkage, every .cpp file that includes this header gets its own copy of these variables. This duplication of variables isn’t really that much of a problem (since constants aren’t likely to be huge), but changing a single constant value (not names) would require recompiling every file that includes the constants header, which can lead to lengthy rebuild times for larger projects.

We can avoid this problem by turning these constants into external const global variables, and changing the header file to hold only the variable forward declarations:

constants.cpp:

namespace Constants {
    // actual global variables
    extern const double pi(3.14159);
    extern const double avogadro(6.0221413e23);
    extern const double my_gravity(9.2);
    // m/s^2 -- gravity is light on this planet
}

constants.h:

#ifndef CONSTANTS_H
#define CONSTANTS_H

namespace Constants {
    // forward declarations only
    extern const double pi;
    extern const double avogadro;
    extern const double my_gravity;
} // use of g_ is not necessary
#endif

However, there are a couple of downsides to doing this. First, these constants are now considered compile-time constants only within the file they are actually defined in (constants.cpp), not anywhere else they are used. This means that outside of constants.cpp, they can’t be used anywhere that requires a compile-time constant (constexpr) (such as for the length of a fixed array, something we talk about in chapter 6). Second, the compiler may not be able to optimize these as much.

Given the above downsides, we recommend defining your constants in the header file. If you find that for some reason those constants are causing trouble, you can move them into a .cpp file as per the above as needed.

Link: When to use extern in C++

Why (non-const) global variables are evil?

If you were to ask a veteran programmer for one piece of advice on good programming practices, after some thought, the most likely answer would be, “Avoid global variables!”. And with good reason: global variables are one of the most abused concepts in the language. Although they may seem harmless in small academic programs, they are often hugely problematic in larger ones.

But before we go into why, we should make a clarification. When developers tell you that global variables are evil, they’re not talking about ALL global variables. They’re mostly talking about non-const global variables.

One of the reasons to declare local variables as close to where they are used as possible is because doing so minimizes the amount of code you need to look through to understand what the variable does. Global variables are at the opposite end of the spectrum — because they can be used anywhere, you might have to look through a significant amount of code to understand their usage.

Global variables also make your program less modular and less flexible. A function that utilizes nothing but its parameters and has no side effects is perfectly modular. Modularity helps both in understanding what a program does, as well as with reusability. Global variables reduce modularity significantly.

Rule: Use local variables instead of global variables whenever reasonable, and pass them to the functions that need them.

So what are very good reasons to use non-const global variables?

There aren’t many. In many cases, there are other ways to solve the problem that avoid the use of non-const global variables. But in some cases, judicious use of non-const global variables can actually reduce program complexity, and in these rare cases, their use may be better than the alternatives.

For example, if your program uses a database to read and write data, it may make sense to define the database globally, because it could be needed from anywhere. Similarly, if your program has an error log (or debug log) where you can dump error (or debug) information, it probably makes sense to define that globally, because you’re most likely to only have one log and it could be used anywhere. A sound library would be another good example: you probably don’t want to pass this to every function that needs it. Since you’ll probably only have one sound library managing all of your sounds, it may be better to declare it globally, initialize it at program launch, and then treat it as read-only thereafter.

If you do find a good use for a non-const global variable, a few useful bits of advice will minimize the amount of trouble you can get into.

  • First, prefix all your global variables with g_, and/or put them in a namespace, both to reduce the chance of naming collisions and raise awareness that a variable is global.

  • Second, instead of allowing direct access to the global variable, it’s a better practice to “encapsulate” the variable.

  • Third, when writing a standalone function that uses the global variable, don’t use the variable directly in your function body. Instead, pass it in as a parameter, and use the parameter. That way, if your function ever needs to use a different value for some circumstance, you can simply vary the parameter. This helps maintain modularity.

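    // Assumed for illustration: a non-const global defined elsewhere.
    double g_gravity = 9.8;

    // bad: depends directly on the global
    double instantVelocity(int time) {
        return g_gravity * time;
    }

    // good: the caller passes the value in
    double instantVelocity(int time, double gravity) {
        return gravity * time;
    }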
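The second piece of advice above (encapsulation) is given without an example. A minimal sketch of what it could look like, assuming a gravity setting like the one used above: the variable gets internal linkage inside a namespace, and the rest of the program reaches it only through accessor functions. The names Physics, getGravity, and setGravity are made up for illustration.

```cpp
namespace Physics {
    // Hypothetical global, hidden from other files by internal linkage (static).
    static double g_gravity = 9.8;

    // Read access goes through a function instead of touching the variable directly.
    double getGravity() {
        return g_gravity;
    }

    // Write access goes through one place, so invalid values can be rejected here.
    void setGravity(double gravity) {
        if (gravity > 0.0) {
            g_gravity = gravity;
        }
    }
}
```

If the storage or the validation rule ever changes, only these two functions need to be touched, not every caller.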

Static duration variables

The static keyword is one of the most confusing keywords in the C++ language (maybe with the exception of the keyword class). This is because it has different meanings depending on where it is used.

Just like we use “g_” to prefix global variables, it’s common to use “s_” to prefix static (static duration) variables. Note that internal linkage global variables (also declared using the static keyword) get a “g_” prefix, not an “s_” prefix.

Static variables offer some of the benefit of global variables (they don’t get destroyed until the end of the program) while limiting their visibility to block scope. This makes them much safer for use than global variables.
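To make this concrete, here is a small sketch of a static local variable. The function name generateID and the starting value 0 are assumptions chosen for illustration.

```cpp
// Returns a new ID on every call: 0, 1, 2, ...
int generateID() {
    static int s_nextID = 0; // initialized once; keeps its value between calls
    return s_nextID++;       // return the current value, then advance it
}
```

Nothing outside generateID() can read or modify s_nextID, which is the safety advantage over a global counter.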

Scope, duration, and linkage summary

A variable’s duration determines when it is created and destroyed.

  • automatic duration (stack)
  • static duration (static section)
  • dynamic duration (heap)

An identifier’s linkage determines whether multiple declarations of that identifier refer to the same entity.

  • no linkage (each declaration refers to its own entity, as with local variables)
  • internal linkage (usable anywhere within the file that defines it; given to a global by “static”)
  • external linkage (usable within its own file and, through a forward declaration, in other files; the default for non-const globals and functions, and requested explicitly with “extern”)
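The three kinds of linkage can be seen side by side in one file. This is only a sketch: all of the variable and function names are hypothetical, and the cross-file use is described in comments rather than shown.

```cpp
// Internal linkage: usable only inside this file.
static int g_fileOnly = 1;

// External linkage (the default for non-const globals): another file could
// use this after the forward declaration `extern int g_shared;`.
int g_shared = 2;

// const globals have internal linkage by default; extern makes this one external.
extern const int g_sharedConst = 3;

int sumOfAll() {
    int local = 4; // no linkage, automatic duration: exists only during this call
    return g_fileOnly + g_shared + g_sharedConst + local;
}
```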

Namespaces

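Problem (this is why namespaces were introduced): foo.h and goo.h below both define a function named doSomething with the same parameters, so a main.cpp that includes both headers fails to compile because the function is defined twice.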

foo.h

int doSomething(int x, int y) {
    return x + y;
}

goo.h

int doSomething(int x, int y) {
    return x - y;
}

main.cpp

#include "foo.h"
#include "goo.h"
#include <iostream>

int main() {
    std::cout << doSomething(4, 3) << "\n";
    return 0;
}

What is a namespace?

A namespace defines an area of code in which all identifiers are guaranteed to be unique. By default, global variables and normal functions are defined in the global namespace.

foo.h:

namespace Foo {
    int doSomething(int x, int y) {
        return x + y;
    }
}
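With each version in its own namespace, the caller picks one with the scope resolution operator. In the sketch below, the Goo namespace is an assumed counterpart for goo.h (the notes only show foo.h), and both namespaces are placed in one block so the example is self-contained.

```cpp
namespace Foo {
    // The version from foo.h: adds its arguments.
    int doSomething(int x, int y) {
        return x + y;
    }
}

namespace Goo {
    // Assumed counterpart for goo.h: subtracts its arguments.
    int doSomething(int x, int y) {
        return x - y;
    }
}
```

Now Foo::doSomething(4, 3) evaluates to 7 and Goo::doSomething(4, 3) evaluates to 1, and the two definitions no longer collide.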

Multiple namespace blocks with the same name are allowed. It’s legal to declare namespace blocks in multiple locations (either across multiple files, or multiple places within the same file). All declarations within the namespace block are considered part of the namespace.

add.h:

namespace BasicMath {
    int add(int x, int y) {
        return x + y;
    }
}

subtract.h:

namespace BasicMath {
    int subtract(int x, int y) {
        return x - y;
    }
}

The standard library makes extensive use of this feature, as all of the different header files included with the standard library have their functionality inside namespace std.

Nested namespaces and namespace aliases:

namespace Foo {
    namespace Goo {
        const int g_x = 5;
    }
}
// main
std::cout << Foo::Goo::g_x;

In C++17, nested namespaces can also be declared this way:

namespace Foo::Goo { // read left to right: outer namespace first, then inner
    const int g_x = 5;
}
// main
std::cout << Foo::Goo::g_x;

Because typing the fully qualified name of a variable or function inside a nested namespace can be painful, C++ allows you to create namespace aliases.

namespace Foo {
    namespace Goo {
        const int g_x = 5;
    }
}
namespace Boo = Foo::Goo; // Boo now refers to Foo::Goo
// main
std::cout << Boo::g_x; // This is really Foo::Goo::g_x

It’s worth noting that namespaces in C++ were not designed as a way to implement an information hierarchy – they were designed primarily as a mechanism for preventing naming collisions.

In general, you should avoid nesting namespaces if possible, and there are few good reasons to nest them more than 2 levels deep. However, in later lessons, we will see other related cases where the scope resolution operator needs to be used more than once.

Using statements

If you’re using the standard library a lot, typing “std::” before everything you use from the standard library can become repetitive. C++ provides some alternatives to simplify things, called using statements.

The using declaration:

using std::cout; // this using declaration tells the compiler that cout should resolve to std::cout
cout << "Hello world!"; // so no std:: prefix is needed here!

This doesn’t save much effort in this trivial example, but if you are using cout a lot inside of a function, a using declaration can make your code more readable. Note that you will need a separate using declaration for each name you use (e.g. one for std::cout, one for std::cin, and one for std::endl).

The using directive:

using namespace std; // this using directive tells the compiler that we're using everything in the std namespace!
cout << "Hello world!"; // so no std:: prefix is needed here!

For illustrative purposes, let’s take a look at an example where a using directive causes ambiguity:

Example 1:

#include <iostream>

namespace a {
    int x(10);
}

namespace b {
    int x(20);
}

int main() {
    using namespace a;
    using namespace b;
    std::cout << x << '\n'; // error: x is ambiguous (a::x or b::x?)
    return 0;
}

Example 2:

#include <iostream>

int cout() {
    return 5;
}

int main() {
    using namespace std; // makes std::cout accessible as "cout"
    cout << "Hello, world!"; // error: ambiguous. Is this std::cout or the cout() defined above?
    return 0;
}

// but we can use either of these instead (a second main, shown for comparison only):
int main() {
    // 1: qualify the name explicitly
    std::cout << "Hello, world!";
    // 2: a using declaration, which makes cout mean std::cout in this scope
    using std::cout;
    cout << "Hello, world!";
    return 0;
}

Many new programmers put using directives into the global scope. This pulls all of the names from the namespace directly into the global scope, greatly increasing the chance for naming collisions to occur. This is considered bad practice.

Rule: Avoid using statements outside of a function (in the global scope).

Suggestion: We recommend you avoid using directives entirely.

One reason is that there’s no way to cancel a “using namespace XXX” directive once it is in effect within a scope.

The best you can do is intentionally limit the scope of the using statement from the outset using the block scoping rules.

int main() {
    {
        using namespace Foo;
        // calls to Foo:: stuff here
    } // using namespace Foo expires

    {
        using namespace Goo;
        // calls to Goo:: stuff here
    } // using namespace Goo expires

    return 0;
}

Of course, all of this headache can be avoided by explicitly using the scope resolution operator :: in the first place.

Random number generation

Computers are generally incapable of generating random numbers. Instead, they must simulate randomness, which is most often done using pseudo-random number generators.

A pseudo-random number generator (PRNG) is a program that takes a starting number (called a seed), and performs mathematical operations on it to transform it into some other number that appears to be unrelated to the seed. It then takes that generated number and performs the same mathematical operation on it to transform it into a new number that appears unrelated to the number it was generated from. By continually applying the algorithm to the last generated number, it can generate a series of new numbers that will appear to be random if the algorithm is complex enough.

A short program that generates 100 pseudo-random numbers:

#include <iostream>

unsigned int PRNG() {
    static unsigned int seed = 5323;
    // Take the current seed and generate a new value from it
    // Due to our use of large constants and overflow, it would be
    // hard for someone to casually predict what the next number is
    // going to be from the previous one.
    seed = 8253729 * seed + 2396403;

    // Take the seed and return a value between 0 and 32767
    return seed % 32768;
}

int main() {
    // Print 100 random numbers
    for (int count = 1; count <= 100; ++count) {
        std::cout << PRNG() << "\t";
        // Start a new row after every fifth number
        if (count % 5 == 0) {
            std::cout << "\n";
        }
    }
}

What is a good PRNG?

  • The PRNG should generate each number with approximately the same probability.
  • The method by which the next number in the sequence is generated shouldn’t be obvious or predictable.
  • The PRNG should have a good dimensional distribution of numbers.
  • All PRNGs are periodic, which means that at some point the sequence of numbers generated will eventually begin to repeat itself.
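The periodicity point is easy to observe with a generator small enough to exhaust. The sketch below uses a deliberately tiny LCG whose constants (5, 3, and 16) were picked for illustration and are not taken from any real library; tinyLcgNext and periodLength are hypothetical names.

```cpp
// A deliberately tiny LCG: next = (5 * current + 3) mod 16.
unsigned int tinyLcgNext(unsigned int current) {
    return (5 * current + 3) % 16;
}

// Counts how many steps it takes for the sequence to return to its seed.
// The seed must be in the range 0..15, since every generated value is.
unsigned int periodLength(unsigned int seed) {
    unsigned int value = tinyLcgNext(seed);
    unsigned int steps = 1;
    while (value != seed) {
        value = tinyLcgNext(value);
        ++steps;
    }
    return steps;
}
```

Because the state is a single number below 16, the sequence cannot go more than 16 steps without repeating, and with these constants it uses all 16. A real PRNG works the same way, just with a far larger state and therefore a far longer period.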

std::rand() is a mediocre PRNG

The algorithm used to implement std::rand() can vary from compiler to compiler, leading to results that may not be consistent across compilers. Most implementations of rand() use a method called a Linear Congruential Generator (LCG). If you have a look at the first example in this lesson, you’ll note that it’s actually an LCG, though one with intentionally picked poor constants. LCGs tend to have shortcomings that make them poor choices for most kinds of problems.

For applications where a high-quality PRNG is useful, I would recommend Mersenne Twister (or one of its variants), which produces great results and is relatively easy to use. Mersenne Twister was adopted into C++11, and we’ll show how to use it later in this lesson.

Although you can create a static local std::mt19937 variable in each function that needs it (static so that it only gets seeded once), it’s a little overkill to have every function that needs a random number generator seed and maintain its own local generator. A better option in most cases is to create a global random number generator (inside a namespace!). Remember how we told you to avoid non-const global variables? This is an exception (also note: std::rand() and std::srand() access a global object, so there’s precedent for this).

#include <random> // for std::mt19937
#include <ctime> // for std::time
namespace MyRandom {
    // Initialize our mersenne twister with a seed based on the clock (once at program startup)
    std::mt19937 mersenne(static_cast<unsigned int>(std::time(nullptr)));
}

int getRandomNumber(int min, int max) {
    std::uniform_int_distribution<> die(min, max); // we can create a distribution in any function that needs it
    return die(MyRandom::mersenne); // and then generate a random number from our global generator
}

A perhaps better solution is to use a 3rd party library that handles all of this stuff for you, such as the header-only Effolkronium's random library. You simply add the header to your project, #include it, and then you can start generating random numbers via Random::get(min, max).

#include <iostream>
#include "random.hpp"
using Random = effolkronium::random_static;
int main() {
    std::cout << Random::get(1, 6) << '\n';
    std::cout << Random::get(1, 10) << '\n';
    std::cout << Random::get(1, 20) << '\n';
    return 0;
}

Input Extraction, and dealing with invalid text input

When the user enters input in response to an extraction operation, that data is placed in a buffer inside of std::cin. A buffer (also called a data buffer) is simply a piece of memory set aside for storing data temporarily while it’s moved from one place to another. In this case, the buffer is used to hold user input while it’s waiting to be extracted to variables.

When the extraction operator >> is used, the following procedure happens:

  • If there is data already in the input buffer, that data is used for extraction.
  • If the input buffer contains no data, the user is asked to input data for extraction (this is the case most of the time). When the user hits enter, a ‘\n’ character will be placed in the input buffer.
  • operator >> extracts as much data from the input buffer as it can into the variable (ignoring any leading whitespace characters, such as spaces, tabs, or ‘\n’).
  • Any data that cannot be extracted is left in the input buffer for the next extraction.

Extraction succeeds if at least one character is extracted from the input buffer. Any unextracted input is left in the input buffer for future extractions. For example:

int x;
std::cin >> x;

If the user enters “5a”, 5 will be extracted, converted to an integer, and assigned to variable x. “a\n” will be left in the input stream for the next extraction. Extraction fails if the input data does not match the type of the variable being extracted to. For example:

// current buffer: "a\n"
int x;
std::cin >> x;
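Both cases can be reproduced without typing anything by feeding the text to a std::istringstream, which uses the same operator >> as std::cin. This is only a sketch: ExtractionResult and extractInt are hypothetical helpers written for this note.

```cpp
#include <sstream>
#include <string>

struct ExtractionResult {
    int value;        // what ended up in the int
    bool failed;      // whether the stream entered failure mode
    std::string rest; // what was left on the line after the extraction
};

ExtractionResult extractInt(const std::string& input) {
    std::istringstream in(input);
    int x = -1;
    in >> x; // extracts as many leading digits as it can
    const bool failed = in.fail();

    in.clear(); // leave failure mode so the leftovers can be inspected
    std::string rest;
    std::getline(in, rest); // reads up to, and discards, the '\n'
    return { x, failed, rest };
}
```

extractInt("5a\n") reports value 5, no failure, and "a" left over. extractInt("a\n") reports a failure with "a" still left over, because nothing could be extracted.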

There are two basic ways to do input validation:

  • Inline (as the user types)
    • Prevent the user from typing invalid input in the first place.
    • Unfortunately, std::cin does not support this style of validation.
  • Post-entry (after the user types)
    • Let the user enter whatever they want into a string, then validate whether the string is correct, and if so, convert the string to the final variable format.
    • Let the user enter whatever they want, let std::cin and operator >> try to extract it, and handle the error cases.
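The first post-entry approach (read a string, validate it, then convert) could look like the sketch below. It assumes the line has already been read, for example with std::getline, and uses std::stoi for the conversion; parseWholeInt is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts text to an int only if the whole string is a valid number.
// Returns true and stores the result in `out` on success; returns false otherwise.
bool parseWholeInt(const std::string& text, int& out) {
    try {
        std::size_t used = 0; // number of characters std::stoi consumed
        const int value = std::stoi(text, &used);
        if (used != text.size()) {
            return false; // trailing characters, as in "5a"
        }
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false; // no digits at all, as in "abc" or ""
    } catch (const std::out_of_range&) {
        return false; // too large or too small for an int
    }
}
```

Unlike extraction with operator >>, this rejects “5a” outright instead of accepting the 5 and leaving the “a” behind.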

Types of invalid text input:

  • Error A: Input extraction succeeds but the input is meaningless to the program (e.g. entering k as your mathematical operator).

    // For A
    char getOperator() {
        while (true) {
            std::cout << "Enter:" << "\n";
            char op;
            std::cin >> op;

            if (op == '+' || op == '-' || op == '*' || op == '/')
                return op;
            else
                std::cout << "Oops! Try again!" << "\n";
        }
    }
  • Error B: Input extraction succeeds but the user enters additional input (e.g. entering *q hello as your mathematical operator).

    Enter a double value: 5*7

    Then,

    Enter one of the following: +, -, *, or /: Enter a double value: 5 * 7 is 35

    In the case above, the program never pauses for the operator or the second number, because “*7” is still in the buffer and gets used for the extractions that follow.

    Since the last character the user entered must be a ‘\n’, we can tell std::cin to ignore buffered characters until it finds a newline character (which is removed as well).

    std::cin.ignore(32767, '\n');  // clear (up to 32767) characters out of the buffer until a '\n' character is removed
  • Error C: Input extraction fails (e.g. trying to enter q into a numeric input).

    Now consider the following execution of the calculator program:

    Enter a double value: a

    You shouldn’t be surprised that the program doesn’t perform as expected, but how it fails is interesting:

    Enter a double value: a
    Enter one of the following: +, -, *, or /: Enter a double value:

    and the program suddenly ends. This looks pretty similar to the extraneous input case, but it’s a little different. Let’s take a closer look.

    When the user enters ‘a’, that character is placed in the buffer. Then operator >> tries to extract ‘a’ to variable x, which is of type double. Since ‘a’ can’t be converted to a double, operator >> can’t do the extraction.

    Two things happen at this point: ‘a’ is left in the buffer, and std::cin goes into failure mode.

    Once in failure mode, future requests for input extraction will silently fail. Thus in our calculator program, the output prompts still print, but any requests for further extraction are ignored. The program simply runs to the end and then terminates (without printing a result, because we never read in a valid mathematical operation).

    Fortunately, we can detect whether an extraction has failed and fix it:

    if (std::cin.fail()) {
        // yep, so let's handle the failure
        std::cin.clear(); // put us back in 'normal' operation mode
        std::cin.ignore(32767, '\n'); // and remove the bad input, which is still in the buffer
    }

    Combining the fixes for B and C in one function:

    double getDouble() {
        while (true) {
            std::cout << "Enter a double value: ";
            double x;
            std::cin >> x;

            if (std::cin.fail()) {
                // extraction failed: reset the stream and discard the bad input
                std::cin.clear();
                std::cin.ignore(32767, '\n');
            } else {
                // extraction succeeded: discard any extraneous input
                std::cin.ignore(32767, '\n');
                return x;
            }
        }
    }

    Note: Prior to C++11, a failed extraction would not modify the variable being extracted to. This means that if a variable was uninitialized, it would stay uninitialized in the failed extraction case. However, as of C++11, a failed extraction due to invalid input will cause the variable to be zero-initialized. Zero initialization means the variable is set to 0, 0.0, “”, or whatever value 0 converts to for that type.

  • Error D: Input extraction succeeds but the user overflows a numeric value.

    #include <cstdint> // for std::int16_t
    #include <iostream>

    int main() {
        std::int16_t x { 0 }; // x is 16 bits, holds from -32768 to 32767
        std::cout << "Enter a number between -32768 and 32767: ";
        std::cin >> x;

        std::int16_t y { 0 }; // y is 16 bits, holds from -32768 to 32767
        std::cout << "Enter another number between -32768 and 32767: ";
        std::cin >> y;

        std::cout << "The sum is: " << x + y << '\n';
        return 0;
    }

    Enter a number between -32768 and 32767: 40000
    Enter another number between -32768 and 32767: The sum is: 32767

    In the above case, std::cin goes immediately into “failure mode”, but also assigns the closest in-range value (>= C++11) to the variable. Consequently, x is left with the assigned value of 32767. Additional inputs are skipped, leaving y with the initialized value of 0. We can handle this kind of error in the same way as a failed extraction.
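The overflow behavior described for Error D can be checked without a keyboard by running the same two extractions against a std::istringstream. This is a sketch only: TwoReads and readTwoShorts are hypothetical names, and the clamping relies on the C++11 behavior stated above.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct TwoReads {
    std::int16_t first;
    std::int16_t second;
    bool failed; // whether the stream ended up in failure mode
};

TwoReads readTwoShorts(const std::string& input) {
    std::istringstream in(input);
    std::int16_t x { 0 };
    std::int16_t y { 0 };
    in >> x; // an out-of-range number is clamped and puts the stream in failure mode
    in >> y; // does nothing if the stream is already in failure mode
    return { x, y, in.fail() };
}
```

For the input "40000\n5\n" this reports 32767 for the first value, leaves the second at its initial 0, and shows the stream as failed, matching the run above. The 5 is never read.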

Conclusion

For each point of text input, consider:

  • Could extraction fail?
  • Could the user enter more input than expected?
  • Could the user enter meaningless input?
  • Could the user overflow an input?

Remember that the following code will test for and fix failed extractions or overflow:

if (std::cin.fail()) { // has a previous extraction failed or overflowed?
    // yep, so let's handle the failure
    std::cin.clear(); // put us back in 'normal' operation mode
    std::cin.ignore(32767,'\n'); // and remove the bad input
}

The following statement will also clear any extraneous input:

std::cin.ignore(32767,'\n'); // remove any extraneous input

Finally, use loops to ask the user to re-enter input if the original input was invalid (meaningless).