<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.quarterstar.tech/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.quarterstar.tech/" rel="alternate" type="text/html" /><updated>2026-04-30T01:54:18+02:00</updated><id>https://www.quarterstar.tech/feed.xml</id><title type="html">Quarterstar</title><subtitle>Discover modern software solutions. Enhance your privacy online. Find ethical AI projects. Find what Quarterstar is working on. Read blogs and articles on research.</subtitle><entry><title type="html">Rust to C++: Implementing the Question Mark Operator</title><link href="https://www.quarterstar.tech/articles/rust-to-cpp-implementing-the-question-mark-operator/" rel="alternate" type="text/html" title="Rust to C++: Implementing the Question Mark Operator" /><published>2026-03-30T00:00:00+02:00</published><updated>2026-03-30T00:00:00+02:00</updated><id>https://www.quarterstar.tech/articles/rust-to-cpp-implementing-the-question-mark-operator</id><content type="html" xml:base="https://www.quarterstar.tech/articles/rust-to-cpp-implementing-the-question-mark-operator/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>For me, one of the most important quality-of-life features of Rust that is missing from C++ is the <code class="language-plaintext highlighter-rouge">?</code> operator; it is critical if you handle errors via values instead of exceptions, as many modern languages do. This article will show you how I implemented it in C++.</p>

<p>This article may not be particularly suitable for C++ beginners, but I have tried to make it approachable with as much love as I’m legally allowed to give.</p>

<p>The feature discussed was proposed in <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2561r1.html">P2561</a> all the way back in 2022.</p>

<h3 id="why-does-it-exist-in-rust">Why does it exist in Rust?</h3>

<p>Instead of writing code like so:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">maybe_value</span><span class="p">{</span><span class="n">some_optional_or_expected</span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">maybe_value</span><span class="p">.</span><span class="n">has_value</span><span class="p">())</span> <span class="p">{</span>
  <span class="k">return</span> <span class="cm">/* .error() or nullopt or some other error */</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">maybe_value</span><span class="p">.</span><span class="n">value</span><span class="p">()};</span>
</code></pre></div></div>

<p>If we had the same operator from Rust, we would be able to write it like this instead:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">some_optional_or_expected</span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)</span><span class="o">?</span><span class="p">};</span>
</code></pre></div></div>

<p>(This is completely unrelated to the ternary operator, and <code class="language-plaintext highlighter-rouge">?</code> is used for illustration here.)</p>

<p>If the optional is empty, <code class="language-plaintext highlighter-rouge">std::nullopt</code> is returned, and if the <code class="language-plaintext highlighter-rouge">std::expected</code> holds an error, that error is returned; otherwise, control flow continues.</p>

<p>Now, the lack of this feature would be somewhat tolerable if C++ supported variable shadowing <em>within the same scope</em>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">some_optional_or_expected</span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">value</span><span class="p">.</span><span class="n">has_value</span><span class="p">())</span> <span class="p">{</span>
  <span class="cm">/* ... */</span>
<span class="p">}</span>
<span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">value</span><span class="p">.</span><span class="n">value</span><span class="p">()};</span> <span class="c1">// type changes!</span>
</code></pre></div></div>

<p>Of course, we can at least write it like this using <a href="https://skebanga.github.io/if-with-initializer/">C++17’s if statement with initializer</a>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span> <span class="o">=</span> <span class="n">some_optional_or_expected</span><span class="p">();</span> <span class="o">!</span><span class="n">value</span><span class="p">.</span><span class="n">has_value</span><span class="p">())</span> <span class="p">{</span>
  <span class="k">return</span> <span class="n">value</span><span class="p">.</span><span class="n">error</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
  <span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="whats-wrong-with-the-current-approaches-of-c">What’s wrong with the current approaches of C++?</h3>

<p>The problem this operator solves is that none of this common procedure can be done inline; that matters a great deal when your entire application is built on optionals and expecteds, and it’s why Rust introduced the operator.</p>

<p>In addition, something that Rust’s <code class="language-plaintext highlighter-rouge">?</code> doesn’t offer is returning a modified error type within the same operator. This is something I actively need: if the result is an error, you should be able to change the returned value to something else, possibly by looking at the error itself. (This is basically syntactic sugar for a chain of transformations, covering the simple cases.)</p>

<p>Unfortunately, the lack of this feature is one of the many annoyances of C++ we will have to live with until the heat death of the universe, just as we cannot extend STL types with the basic string methods we have been asking for over the last decade, because <a href="https://brevzin.github.io/c++/2019/04/13/ufcs-history/">UFCS won’t be supported until C++53, for various reasons</a>.</p>

<p>Fortunately, one tool comes to the rescue at the end of this insidiously obscene tunnel: the C preprocessor is more powerful than some people think. It allows us to fully emulate this feature in C++ and, on top of that, get a text-dump jumpscare for free whenever you make a mistake. (Due to the ongoing tariffs, error production has been reduced by 50% since last quarter.)</p>

<h2 id="the-implementation">The implementation</h2>

<p>Since it’s a convention in C++ to prefix the discussed practice with <code class="language-plaintext highlighter-rouge">maybe_</code>, I thought it would be fitting to name my macro <code class="language-plaintext highlighter-rouge">maybe</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="cm">/* ... */</span><span class="p">)};</span>
</code></pre></div></div>

<p>(IIRC, <code class="language-plaintext highlighter-rouge">maybe</code> was a keyword in the C++20 contracts proposal that was subsequently removed, so it should be safe to use; but if you want to be extra safe you can add your own prefix to it. For example, C++ named the coroutine keyword <code class="language-plaintext highlighter-rouge">co_yield</code> rather than <code class="language-plaintext highlighter-rouge">yield</code>.)</p>

<p>The reason I don’t use uppercase is because in my codebase I treat this like a primitive, just like <code class="language-plaintext highlighter-rouge">&lt;cassert&gt;</code>’s assertion macro is not uppercase. (Did you know that it’s a macro?)</p>

<p>The base case is when the user simply wants to return the bad value, which is just one argument:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="n">expected_or_optional</span><span class="p">)};</span>
</code></pre></div></div>

<p>Now, in my case, one thing I wanted is a second argument that lets me modify the returned value in the error case:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">value</span><span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="n">expected_or_optional</span><span class="p">,</span> <span class="cm">/* ??? */</span><span class="p">)};</span>
</code></pre></div></div>

<p>This could likely be replicated with <code class="language-plaintext highlighter-rouge">and_then</code> or <code class="language-plaintext highlighter-rouge">transform</code>, but the point of this macro is to make our syntax as compact as possible.</p>

<p>For example, in the tokenization phase of a parser tool I’m developing, I needed this to return a missing component if a regular expression did not find a match (for those interested, I used a great library called <a href="https://github.com/hanickadot/compile-time-regular-expressions"><code class="language-plaintext highlighter-rouge">ctre</code></a>):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">binary</span><span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="n">search</span><span class="o">&lt;</span><span class="n">binary_pattern</span><span class="o">&gt;</span><span class="p">(</span><span class="n">line</span><span class="p">),</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">missing_component</span><span class="p">(</span><span class="s">"binary"</span><span class="p">)))};</span>
</code></pre></div></div>

<p>I will be able to shrink the errors that are repeated throughout my code even further once C++26 compile-time reflection lands in compilers, but until then we have another layer to deal with.</p>

<p>In other cases, I want to take the actual wrapped error value and run some expression on it that modifies it in some way. To do this, I made my macro define the intermediate <code class="language-plaintext highlighter-rouge">_maybe_error</code> variable:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">binary</span><span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="n">expected_or_optional</span><span class="p">,</span> <span class="n">expression</span><span class="p">(</span><span class="n">_maybe_error</span><span class="p">))};</span>
</code></pre></div></div>

<p>Note that resolving the shadowing concerns around this variable comes packaged with the language construct that we shall use.</p>

<p>So our first question is: how do we even have macro overloading? The preprocessor does not let us simply define a macro with the same name twice. There is a pattern that implements it, which I personally call <em>macro routing</em>. It looks like so:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define GET_maybe_MACRO(_1, _2, NAME, ...) NAME
#define maybe(...) GET_maybe_MACRO(__VA_ARGS__, maybe_2, maybe_1)(__VA_ARGS__)
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">maybe_1</code> and <code class="language-plaintext highlighter-rouge">maybe_2</code> are other macros that will be explained in a moment. The <code class="language-plaintext highlighter-rouge">...</code> is the variadic parameter itself; it is what allows us to collect all the extra parameters we need. Since we cannot simply refer to <code class="language-plaintext highlighter-rouge">...</code> in the body, another preprocessor token was defined by the standard, <code class="language-plaintext highlighter-rouge">__VA_ARGS__</code>.</p>

<p>The routing you see works by pushing the macro names to the end of the argument list, so that the <code class="language-plaintext highlighter-rouge">NAME</code> picked is the one that is last available. So suppose you call:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">maybe</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>

<p>In that situation, the preprocessor would evaluate it (well, not literally <em>evaluate</em>, since it is just substituting literal text) to:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">GET_maybe_MACRO</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">maybe_2</span><span class="p">,</span> <span class="n">maybe_1</span><span class="p">)</span>
</code></pre></div></div>

<p>So the <code class="language-plaintext highlighter-rouge">NAME</code> will be <code class="language-plaintext highlighter-rouge">maybe_1</code>. If we added another argument:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">maybe</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</code></pre></div></div>

<p>This is also fine: even though we now pass more arguments than the named parameters expect, the extras are swallowed by the variadic parameter and subsequently ignored, so we get:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">GET_maybe_MACRO</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">maybe_2</span><span class="p">,</span> <span class="n">maybe_1</span><span class="p">)</span>
</code></pre></div></div>

<p>So our final evaluation of <code class="language-plaintext highlighter-rouge">NAME</code> is the third argument, <code class="language-plaintext highlighter-rouge">maybe_2</code>. So that’s how the macro routing aspect works.</p>

<p>Now, you may have many ideas for the implementation of <code class="language-plaintext highlighter-rouge">maybe_1</code> and <code class="language-plaintext highlighter-rouge">maybe_2</code>.</p>

<h3 id="can-we-use-iife">Can we use IIFE?</h3>

<p>You may think that an <a href="https://en.wikipedia.org/wiki/Immediately_invoked_function_expression">IIFE</a> is the appropriate approach to this, something along the lines of:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="k">auto</span> <span class="n">iife</span><span class="p">{[](</span><span class="k">auto</span><span class="o">&amp;&amp;</span> <span class="n">v</span><span class="p">){</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">v</span><span class="p">.</span><span class="n">has_value</span><span class="p">())</span> <span class="p">{</span>
    <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="k">requires</span> <span class="p">{</span> <span class="n">v</span><span class="p">.</span><span class="n">error</span><span class="p">();</span> <span class="p">})</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">error</span><span class="p">());</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">nullopt</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="n">v</span><span class="p">.</span><span class="n">value</span><span class="p">();</span>
<span class="p">}(</span><span class="n">some_optional_or_expected</span><span class="p">())};</span>
</code></pre></div></div>

<p>That would simply not work because you cannot return from the outer function from inside the body of a lambda called in that function; this would only return from the lambda itself.</p>

<h3 id="can-we-use-goto">Can we use goto?</h3>

<p>Alternatively, you may be naturally conditioned to think that we could do this by using a <code class="language-plaintext highlighter-rouge">goto</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">auto</span> <span class="nf">parent_function</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// this is for demonstration, not how you would actually write it</span>
  <span class="n">T</span> <span class="n">_value</span><span class="p">;</span>
<span class="nl">_failure:</span>
  <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">_value</span><span class="p">);</span>

  <span class="k">const</span> <span class="k">auto</span> <span class="n">iife</span><span class="p">{[</span><span class="o">&amp;</span><span class="p">](</span><span class="k">auto</span><span class="o">&amp;&amp;</span> <span class="n">v</span><span class="p">){</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">v</span><span class="p">.</span><span class="n">has_value</span><span class="p">())</span> <span class="p">{</span>
      <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="k">requires</span> <span class="p">{</span> <span class="n">v</span><span class="p">.</span><span class="n">error</span><span class="p">();</span> <span class="p">})</span> <span class="p">{</span>
        <span class="n">_value</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">error</span><span class="p">();</span>
        <span class="k">goto</span> <span class="n">_failure</span><span class="p">;</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">_value</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">nullopt</span><span class="p">;</span>
        <span class="k">goto</span> <span class="n">_failure</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">v</span><span class="p">.</span><span class="n">value</span><span class="p">();</span>
  <span class="p">}(</span><span class="n">some_optional_or_expected</span><span class="p">())};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This reminds me of my patching days since it is a similar concept to trampolines.</p>

<p>(If you are not familiar with some of these language constructs, they will be explained in a moment.)</p>

<p>Beyond being extremely dangerous, this is not even legal C++: a <code class="language-plaintext highlighter-rouge">goto</code> cannot jump to a label outside the function it appears in, and a lambda body is a separate function. It would also require setting up a hook at the top of the function (for the <code class="language-plaintext highlighter-rouge">_failure</code> label) with a separate macro, a side effect that would immediately negate any ergonomics we hope to gain. And we definitely don’t want to introduce an impure ABI by modifying our compiler’s code to hard-code it into the prologue of our functions, either.</p>

<h3 id="so-what-even-remains-for-us-to-consider">So what even remains for us to consider?</h3>

<p>Sometimes, to find the solution you seek, you must venture out into the wilderness. C and C++ are well known for having ISO standards backing their specifications, but just like any other language, extensions to them exist. GNU maintains a set of extensions, and the one we are interested in for this discussion is <em>statement expressions</em>.</p>

<p>The word extension might sound scary, but statement expressions are supported by every recent version of GCC and Clang; if you have concerns about MSVC and Visual Studio compatibility, you can use a compatibility layer or the <a href="https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170">clang-cl toolset</a>.</p>

<p>Statement expressions are written like this:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">({</span>
  <span class="n">statement1</span><span class="p">;</span>
  <span class="n">statement2</span><span class="p">;</span>
  <span class="c1">// ...</span>
  <span class="n">final_value</span><span class="p">;</span>
<span class="p">})</span>
</code></pre></div></div>

<p>The value of the final statement becomes the value of the whole expression, as if it were an implicit return. But the real benefit is that <em>actual</em> <code class="language-plaintext highlighter-rouge">return</code> statements inside the block (such as replacing <code class="language-plaintext highlighter-rouge">statement1</code> with a conditional return) return from the enclosing function! This is precisely what we are asking for.</p>

<h3 id="the-smaller-pillar">The smaller pillar</h3>

<p>So we begin by writing the second case of the macro, which is less technically challenging:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_2(expr, fallback)                                                                   \
  ({                                                                                              \
    auto&amp;&amp; _result{(expr)};                                                                       \
    if (!_result) {                                                                               \
      [[maybe_unused]] auto&amp;&amp; _maybe_error{::_get_error(_result)};                                \
      using _fallback_type = std::decay_t&lt;decltype(fallback)&gt;;                                    \
      return std::unexpected&lt;_fallback_type&gt;(fallback);                                           \
    }                                                                                             \
    *std::move(_result);                                                                          \
  })
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">fallback</code> expression provided by the user may use <code class="language-plaintext highlighter-rouge">_maybe_error</code> if it wants to. We use C++17’s <code class="language-plaintext highlighter-rouge">[[maybe_unused]]</code> attribute to communicate semantically that its use is optional, and also to suppress potential compiler warnings.</p>

<p>The <code class="language-plaintext highlighter-rouge">std::decay</code> is useful because it removes cv-qualifiers, references, and other things that would otherwise obstruct our view of the actual type.</p>

<p>For the <code class="language-plaintext highlighter-rouge">_get_error</code> function, we need to handle the possibility of both <code class="language-plaintext highlighter-rouge">std::optional</code> and <code class="language-plaintext highlighter-rouge">std::expected</code>. We can use a <code class="language-plaintext highlighter-rouge">requires</code> expression and compile-time branching to implement it.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">auto</span> <span class="nf">_get_error</span><span class="p">(</span><span class="n">T</span><span class="o">&amp;&amp;</span> <span class="n">container</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">decltype</span><span class="p">(</span><span class="k">auto</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="k">requires</span> <span class="p">{</span> <span class="n">container</span><span class="p">.</span><span class="n">error</span><span class="p">();</span> <span class="p">})</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">container</span><span class="p">).</span><span class="n">error</span><span class="p">();</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="explanation">Explanation</h4>

<p>The use of <a href="https://eel.is/c++draft/dcl.spec.auto"><code class="language-plaintext highlighter-rouge">decltype(auto)</code></a>, a feature dating back to C++14, essentially lets one use <code class="language-plaintext highlighter-rouge">auto</code> but with the rules of <code class="language-plaintext highlighter-rouge">decltype</code>. It exists because <code class="language-plaintext highlighter-rouge">auto</code> follows the rules of template argument deduction, so it drops cv-qualifiers and references, while <code class="language-plaintext highlighter-rouge">decltype</code> follows different rules that preserve them via <a href="https://stackoverflow.com/questions/13725747/what-are-the-reference-collapsing-rules-and-how-are-they-utilized-by-the-c-st">reference collapsing</a>. Don’t be confused by the syntax: we aren’t literally passing an argument to <code class="language-plaintext highlighter-rouge">decltype</code>; this exact token sequence is its own placeholder type specifier in the standard.</p>

<p>The <a href="https://en.cppreference.com/w/cpp/language/requires.html"><code class="language-plaintext highlighter-rouge">requires</code> expression</a> allows us to check whether a particular piece of code would compile, assuming the variables it uses exist. Specifically, when the checked expression depends on a template parameter, a failed requirement simply makes the expression evaluate to <code class="language-plaintext highlighter-rouge">false</code> where the program would otherwise be ill-formed; outside of such dependent contexts, an invalid requirement is a hard error, so it is generally only useful when a template parameter is in play.</p>

<p>Finally, <a href="https://en.cppreference.com/w/cpp/utility/forward.html"><code class="language-plaintext highlighter-rouge">std::forward</code></a> is similar in spirit to <code class="language-plaintext highlighter-rouge">decltype(auto)</code>, in that it was created to preserve the reference type that was passed in. Recall how <code class="language-plaintext highlighter-rouge">std::move</code> works: it doesn’t actually “move” anything; it merely casts its argument (usually an lvalue) to an rvalue reference, which can <em>later</em> be used to select a move constructor or move assignment. More specifically, while <code class="language-plaintext highlighter-rouge">std::move</code> unconditionally converts anything into an rvalue (specifically an xvalue), <code class="language-plaintext highlighter-rouge">std::forward</code> conditionally casts to an rvalue or lvalue depending on the deduced template type.</p>

<h3 id="the-larger-pillar">The larger pillar</h3>

<p>Now we need to implement the unary overload, i.e. the macro with a single argument. The reason this is harder is that we are now the ones who must return the <em>correct</em> error.</p>

<p>Suppose first that a function returns <code class="language-plaintext highlighter-rouge">std::expected&lt;T, E&gt;</code>. If you want to use this macro inside that function for a procedure that returns <code class="language-plaintext highlighter-rouge">std::expected&lt;U, E&gt;</code>, then you cannot simply return the whole object because there is a type mismatch between <code class="language-plaintext highlighter-rouge">T</code> and <code class="language-plaintext highlighter-rouge">U</code>; rather, you have to check that it is an <code class="language-plaintext highlighter-rouge">std::expected</code>, steal its wrapped error, and finally create an <code class="language-plaintext highlighter-rouge">std::expected</code> with the correct template type parameters.</p>

<p>My naive approach was to do something along the lines of:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_1(expr) \
  ({ \
    auto&amp;&amp; _result{(expr)}; \
    if (!_result) { \
      auto&amp;&amp; _maybe_error = ::_get_error(_result); \
      using _maybe_error_type = std::decay_t&lt;decltype(_maybe_error)&gt;; \
      if constexpr (std::is_same_v&lt;_maybe_error_type, std::nullptr_t&gt;) { \
        return std::nullopt; \
      } else { \
        return std::unexpected(_maybe_error); \
      } \
    } \
    *std::move(_result); \
  })
</span></code></pre></div></div>

<p>(If you were wondering why we use <code class="language-plaintext highlighter-rouge">nullptr</code> over <code class="language-plaintext highlighter-rouge">NULL</code>, this is why; it has its own type, <code class="language-plaintext highlighter-rouge">std::nullptr_t</code>, which opens the door for metaprogramming.)</p>

<p>But this approach is incorrect: the branch is chosen based on the <em>input’s</em> type, while the <code class="language-plaintext highlighter-rouge">return</code> must match the <em>enclosing function’s</em> return type. Use an optional-like input inside a function that returns <code class="language-plaintext highlighter-rouge">std::expected</code>, and we end up returning <code class="language-plaintext highlighter-rouge">std::nullopt</code> where an <code class="language-plaintext highlighter-rouge">std::expected</code> is required. You don’t see this happen often in C++ because we don’t have if expressions like in Rust, but with statement expressions it is something we actually need to be careful about.</p>

<p>This is the compiler error you would get when it is used in my application:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>../src/logic.cpp: In explicit object member function ‘std::expected&lt;void, colorlink::error::ColorLinkError&gt; colorlink::logic::ColorLinkLogic::process_lines(this colorlink::logic::ColorLinkLogic&amp;, std::string)’:
../include/colorlink/macros.hpp:89:21: error: could not convert ‘std::nullopt’ from ‘const std::nullopt_t’ to ‘std::expected&lt;void, colorlink::error::ColorLinkError&gt;’
   89 |         return std::nullopt;                                                                       \
      |                ~~~~~^~~~~~~
      |                     |
      |                     const std::nullopt_t
</code></pre></div></div>

<p>So if you really think about it, we need some sort of wrapper type with an implicit conversion operator, so that when it is returned from the enclosing function, it converts to the correct return type on its own.</p>

<p>We shall store the error in a member whose type is a template parameter:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Storage value;
</code></pre></div></div>

<p>It will be inferred with a similar technique:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">auto</span><span class="o">&amp;&amp;</span> <span class="n">_maybe_error</span><span class="p">{</span><span class="o">::</span><span class="n">_get_error</span><span class="p">(</span><span class="n">_result</span><span class="p">)};</span>
<span class="k">using</span> <span class="n">_maybe_error_type</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="k">decltype</span><span class="p">(</span><span class="n">_maybe_error</span><span class="p">)</span><span class="o">&gt;</span><span class="p">;</span>
</code></pre></div></div>

<p>C++ has long allowed templated conversion operators, including implicit ones (C++11 added the ability to mark conversion operators <code class="language-plaintext highlighter-rouge">explicit</code>), so we can make our operators generic:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">operator</span> <span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
  <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">nullopt</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">E</span><span class="p">&gt;</span>
<span class="k">operator</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">&gt;</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
  <span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s the meat of our solution. The rest is simple, with the only caveat being that we treat <code class="language-plaintext highlighter-rouge">nullptr</code> as a special input that default-constructs the error if one does not exist; this is used when converting an optional error to an expected.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Storage</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">_maybe_failure_proxy</span> <span class="p">{</span>
<span class="nl">private:</span>
  <span class="k">using</span> <span class="n">Self</span> <span class="o">=</span> <span class="n">_maybe_failure_proxy</span><span class="p">;</span>

<span class="nl">public:</span>
  <span class="n">Storage</span> <span class="n">value</span><span class="p">;</span>

  <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
  <span class="k">operator</span> <span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">([[</span><span class="n">maybe_unused</span><span class="p">]]</span> <span class="k">this</span> <span class="k">const</span> <span class="n">Self</span><span class="o">&amp;</span> <span class="n">_</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">nullopt</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">E</span><span class="p">&gt;</span>
  <span class="k">operator</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">&gt;</span><span class="p">(</span><span class="k">this</span> <span class="n">Self</span><span class="o">&amp;&amp;</span> <span class="n">self</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">using</span> <span class="n">value_type</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="n">Storage</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">is_same_v</span><span class="o">&lt;</span><span class="n">value_type</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">nullptr_t</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">static_assert</span><span class="p">(</span>
        <span class="n">std</span><span class="o">::</span><span class="n">default_initializable</span><span class="o">&lt;</span><span class="n">E</span><span class="o">&gt;</span><span class="p">,</span>
        <span class="s">"E must be default-initializable to convert from optional failure."</span>
      <span class="p">);</span>
      <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">E</span> <span class="p">{});</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="o">&lt;</span><span class="n">E</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">value_type</span><span class="o">&gt;</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">value</span><span class="p">));</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
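<p>To see the conversion trick in isolation, here is a simplified sketch of the same idea, restricted to <code class="language-plaintext highlighter-rouge">std::optional</code> and written without deducing <code class="language-plaintext highlighter-rouge">this</code> so it compiles under C++17; the names are illustrative, not the article’s exact library:</p>

```cpp
#include <optional>

// Illustrative stand-in for the proxy idea: converts implicitly to any
// std::optional<T>, always producing an empty optional. One return
// statement can therefore satisfy differently-typed callers.
struct empty_failure_proxy {
  template <typename T>
  operator std::optional<T>() const {
    return std::nullopt;
  }
};

// Two functions with different return types, both returning the same
// proxy on failure.
std::optional<int> find_int(bool ok) {
  if (!ok) { return empty_failure_proxy{}; }
  return 42;
}

std::optional<double> find_double(bool ok) {
  if (!ok) { return empty_failure_proxy{}; }
  return 3.14;
}
```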

<p>So our code compiles this time, using the new type:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_1(expr) \
  ({ \
    auto&amp;&amp; _result{(expr)}; \
    if (!_result) { \
      auto&amp;&amp; _maybe_error = ::_get_error(_result); \
      using _maybe_error_type = std::decay_t&lt;decltype(_maybe_error)&gt;; \
      return ::_maybe_failure_proxy&lt;_maybe_error_type&gt;(std::move(_maybe_error)); \
    } \
    *std::move(_result); \
  })
</span></code></pre></div></div>

<p>Also, I greatly simplified the updated code with the use of deduction guides, part of C++17’s CTAD:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">_maybe_failure_proxy</span><span class="p">(</span><span class="n">T</span><span class="o">&amp;&amp;</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">_maybe_failure_proxy</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">;</span>

<span class="cp">#define maybe_1(expr)                                          \
  ({                                                           \
    auto&amp;&amp; _result{(expr)};                                   \
    if (!_result) {                                           \
      return ::_maybe_failure_proxy{::_get_error(_result)}; \
    }                                                          \
    *std::move(_result);                                      \
  })
</span>
<span class="cp">#define maybe_2(expr, fallback)                                \
  ({                                                           \
    auto&amp;&amp; _result{(expr)};                                   \
    if (!_result) {                                           \
      return ::_maybe_failure_proxy{::_get_error(_result)}; \
    }                                                          \
    *std::move(_result);                                      \
  })
</span></code></pre></div></div>

<p>So now, in my application, I can write code of the following form:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">auto</span> <span class="n">ColorLinkLogic</span><span class="o">::</span><span class="n">get_errors</span><span class="p">(</span>
  <span class="p">[[</span><span class="n">maybe_unused</span><span class="p">]]</span> <span class="k">this</span> <span class="k">const</span> <span class="n">Self</span><span class="o">&amp;</span> <span class="n">self</span><span class="p">,</span>
  <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string_view</span> <span class="n">lines</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">error</span><span class="o">::</span><span class="n">ColorLinkError</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">auto</span> <span class="n">ColorLinkLogic</span><span class="o">::</span><span class="n">process_lines</span><span class="p">(</span>
  <span class="k">this</span> <span class="n">Self</span><span class="o">&amp;</span> <span class="n">self</span><span class="p">,</span>
  <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">lines</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="kt">void</span><span class="p">,</span> <span class="n">error</span><span class="o">::</span><span class="n">ColorLinkError</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="n">errors</span> <span class="p">{</span><span class="n">maybe</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">get_errors</span><span class="p">(</span><span class="n">lines</span><span class="p">))};</span>
  <span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I think you could possibly write it with overloaded functions instead of a proxy type, but then you would need to repeat the <code class="language-plaintext highlighter-rouge">Storage</code> template for many functions; it becomes especially annoying if you want to extend this utility to convert to your own error types.</p>

<h3 id="lambda-support">Lambda support</h3>

<p>Instead of relying on <code class="language-plaintext highlighter-rouge">_maybe_error</code> as a magic variable, some people may prefer a lambda.</p>

<p>I would argue that this isn’t strictly better here, because a lambda is much more verbose for such a commonly used primitive. We rarely need to worry about the magic variable’s name colliding, and if we keep it short (e.g. trimming <code class="language-plaintext highlighter-rouge">_maybe_error</code> to <code class="language-plaintext highlighter-rouge">_e</code>), we save quite a bit of syntax; some clang-format setups even dictate that every lambda have an explicit return type, even if that type is <code class="language-plaintext highlighter-rouge">auto</code>.</p>

<p>Regardless, it can be implemented relatively easily:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_2(expr, fallback)                                      \
  ({                                                                 \
    auto &amp;&amp;_result{(expr)};                                         \
    if (!_result) {                                                 \
      [[maybe_unused]] auto &amp;&amp;_e {::_get_error(_result)};         \
      return ::_maybe_failure_proxy{::_get_result(fallback, _e)}; \
    }                                                                \
    *std::move(_result);                                            \
  })
</span></code></pre></div></div>

<p>The new function <code class="language-plaintext highlighter-rouge">_get_result</code> simply checks whether the fallback is callable:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">F</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">auto</span> <span class="nf">_get_result</span><span class="p">(</span><span class="n">F</span> <span class="o">&amp;&amp;</span><span class="n">fallback</span><span class="p">,</span> <span class="n">T</span> <span class="o">&amp;&amp;</span><span class="n">error</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">decltype</span><span class="p">(</span><span class="k">auto</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">invocable</span><span class="o">&lt;</span><span class="n">F</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">F</span><span class="o">&gt;</span><span class="p">(</span><span class="n">fallback</span><span class="p">)(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">error</span><span class="p">));</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">F</span><span class="o">&gt;</span><span class="p">(</span><span class="n">fallback</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You can use the lambda variant in more complex cases, and you can define the lambda externally when the error handling needs to be reused.</p>
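<p>To make the dispatch concrete, here is a standalone sketch of the same callable-or-value check; I’ve renamed the helper <code class="language-plaintext highlighter-rouge">fallback_result</code> and used the C++17 spelling <code class="language-plaintext highlighter-rouge">std::is_invocable_v</code> instead of the <code class="language-plaintext highlighter-rouge">std::invocable</code> concept:</p>

```cpp
#include <type_traits>
#include <utility>

// If `fallback` can be invoked with the error, call it; otherwise treat
// it as a plain replacement value. Mirrors the shape of _get_result.
template <typename F, typename T>
auto fallback_result(F&& fallback, T&& error) -> decltype(auto) {
  if constexpr (std::is_invocable_v<F, T>) {
    return std::forward<F>(fallback)(std::forward<T>(error));
  } else {
    return std::forward<F>(fallback);
  }
}
```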

<h3 id="important-edge-cases">Important edge cases</h3>

<p>The more complex you make something, the more room there is for failure.</p>

<h4 id="adding-stdunexpected-if-missing">Adding std::unexpected if missing</h4>

<p>To support the last implicit example, we define the metafunction:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">_is_expected</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">false_type</span> <span class="p">{};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">E</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">_is_expected</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="o">&lt;</span><span class="n">E</span><span class="o">&gt;&gt;</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span> <span class="p">{};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">E</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">_is_expected</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">&gt;&gt;</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span> <span class="p">{};</span>
</code></pre></div></div>

<p>If you aren’t familiar with metafunctions, they are essentially small programs that run during compilation. They weren’t an explicit feature of the C++ standard initially (until type traits basically made them standardized) but a happy consequence of other features; the same goes for a popular but mostly obsolete trick you may have heard about named SFINAE. I wrote more about metafunctions <a href="http://127.0.0.1:4000/2025/11/21/template-specialization-and-concepts-in-cpp">here</a>.</p>
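<p>The same partial-specialization pattern works for any template. For instance, detecting <code class="language-plaintext highlighter-rouge">std::optional</code> (a standalone sketch, not part of the article’s library):</p>

```cpp
#include <optional>
#include <type_traits>

// Primary template: by default, a type is not an optional.
template <typename T>
struct is_optional : std::false_type {};

// Partial specialization: matches any std::optional<T>.
template <typename T>
struct is_optional<std::optional<T>> : std::true_type {};

// Evaluated entirely at compile time.
static_assert(!is_optional<int>::value);
static_assert(is_optional<std::optional<int>>::value);
```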

<p>We use these in another compile-time branch:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">E</span><span class="p">&gt;</span>
<span class="k">operator</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">E</span><span class="o">&gt;</span><span class="p">(</span><span class="k">this</span> <span class="n">Self</span><span class="o">&amp;&amp;</span> <span class="n">self</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">using</span> <span class="n">value_type</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="n">Storage</span><span class="o">&gt;</span><span class="p">;</span>

  <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">is_same_v</span><span class="o">&lt;</span><span class="n">value_type</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">nullptr_t</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">static_assert</span><span class="p">(</span>
      <span class="n">std</span><span class="o">::</span><span class="n">default_initializable</span><span class="o">&lt;</span><span class="n">E</span><span class="o">&gt;</span><span class="p">,</span>
      <span class="s">"E must be default-initializable to convert from optional failure."</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">E</span> <span class="p">{});</span>
  <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nf">constexpr</span> <span class="p">(</span><span class="n">_is_expected</span><span class="o">&lt;</span><span class="n">value_type</span><span class="o">&gt;::</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">value_type</span><span class="o">&gt;</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">value</span><span class="p">);</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="o">&lt;</span><span class="n">E</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">value_type</span><span class="o">&gt;</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">value</span><span class="p">));</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="avoiding-dereference-of-void">Avoiding dereference of void</h4>

<p>Notice how we are dereferencing <code class="language-plaintext highlighter-rouge">_result</code>, which falls apart when the value type <code class="language-plaintext highlighter-rouge">T</code> is <code class="language-plaintext highlighter-rouge">void</code>. (Recall that we expect <code class="language-plaintext highlighter-rouge">_result</code> to be an optional or an expected.) We fix this with another compile-time check before dereferencing, which means updating both macros.</p>

<p>The C++ standard library provides the unary type trait <code class="language-plaintext highlighter-rouge">std::is_void</code> (and its <code class="language-plaintext highlighter-rouge">std::is_void_v</code> shorthand) for this.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_1(expr)                                                                              \
  ({                                                                                               \
    auto&amp;&amp; _result {(expr)};                                                                      \
    if (!_result) {                                                                               \
      return ::_maybe_failure_proxy{::_get_error(_result)};                                     \
    }                                                                                              \
    using _container_type = std::decay_t&lt;decltype(_result)&gt;;                                     \
    if constexpr (std::is_void_v&lt;_container_type::value_type&gt;) {                                  \
      (void)0;                                                                                     \
    } else {                                                                                       \
      *std::move(_result);                                                                        \
    }                                                                                              \
  })
</span>
<span class="cp">#define maybe_2(expr, fallback)                                                                    \
  ({                                                                                               \
    auto&amp;&amp; _result {(expr)};                                                                      \
    if (!_result) {                                                                               \
      [[maybe_unused]] auto&amp;&amp; _e {::_get_error(_result)};                                       \
      return ::_maybe_failure_proxy{::_get_result(fallback, _e)};                               \
    }                                                                                              \
    using _container_type = std::decay_t&lt;decltype(_result)&gt;;                                     \
    if constexpr (std::is_void_v&lt;_container_type::value_type&gt;) {                                  \
      (void)0;                                                                                     \
    } else {                                                                                       \
      *std::move(_result);                                                                        \
    }                                                                                              \
  })
</span></code></pre></div></div>

<p>Haha, kidding!</p>

<p>This doesn’t work because in C++ <code class="language-plaintext highlighter-rouge">if</code> is a statement, not an expression, and more importantly, the statements inside the conditional branches are NOT the last statement of the statement expression, so they cannot provide its value.</p>

<p>Instead, we define another helper function that returns the proper deduced type:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">_maybe_failure_proxy</span><span class="p">(</span><span class="n">T</span> <span class="o">&amp;&amp;</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">_maybe_failure_proxy</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">;</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span> <span class="k">auto</span> <span class="nf">_deref_or_void</span><span class="p">(</span><span class="n">T</span> <span class="o">&amp;&amp;</span><span class="n">container</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">decltype</span><span class="p">(</span><span class="k">auto</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">using</span> <span class="n">value_type</span> <span class="o">=</span> <span class="k">typename</span> <span class="n">std</span><span class="o">::</span><span class="n">decay_t</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;::</span><span class="n">value_type</span><span class="p">;</span>
  <span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">is_void_v</span><span class="o">&lt;</span><span class="n">value_type</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span><span class="p">;</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">container</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>From there, it’s plug-and-play:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define maybe_1(expr)                                                          \
  ({                                                                           \
    auto &amp;&amp;_result{(expr)};                                                   \
    if (!_result) {                                                           \
      return ::_maybe_failure_proxy{::_get_error(_result)};                 \
    }                                                                          \
    ::_deref_or_void(std::move(_result));                                    \
  })
</span>
<span class="cp">#define maybe_2(expr, fallback)                                                \
  ({                                                                           \
    auto &amp;&amp;_result{(expr)};                                                   \
    if (!_result) {                                                           \
      [[maybe_unused]] auto &amp;&amp;_e {::_get_error(_result)};                   \
      return ::_maybe_failure_proxy{_get_result(fallback, _e)};             \
    }                                                                          \
    ::_deref_or_void(std::move(_result));                                    \
  })
</span></code></pre></div></div>

<p>By the way, I was doing some research and I found that there is an old GNU extension for C specifically called <code class="language-plaintext highlighter-rouge">__builtin_choose_expr</code>, which works like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">__builtin_choose_expr</span><span class="p">(</span>
  <span class="n">std</span><span class="o">::</span><span class="n">is_void_v</span><span class="o">&lt;</span><span class="n">_val_t</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="mi">0</span><span class="p">,</span>
  <span class="o">*</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">_result</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>This would achieve the same compile-time selection directly in an expression. It is not available in C++ (GCC only supports it in C), and C++ does not really need it given <code class="language-plaintext highlighter-rouge">if constexpr</code>, but it’s a cool historical artifact.</p>

<h4 id="nested-uses-of-maybe-expressions">Nested uses of maybe expressions</h4>

<p>This already works: each statement expression introduces its own scope, so the inner macro’s variables simply shadow the outer ones instead of colliding. You can do things like:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">maybe</span><span class="p">(</span><span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">()));</span>
</code></pre></div></div>
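<p>The shadowing at play can be seen with a plain statement expression (this is the same GNU extension the macros rely on, so it needs GCC or Clang):</p>

```cpp
// Each ({ ... }) introduces its own block scope, so an inner `value`
// shadows the outer one instead of colliding with it.
int nested_scopes() {
  return ({
    int value = 1;
    int inner = ({
      int value = 2;  // shadows the outer `value`
      value * 10;     // the inner statement expression yields 20
    });
    inner + value;    // 20 + 1
  });
}
```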

<h3 id="some-conveniences">Some conveniences</h3>

<p>In my actual library, I have <code class="language-plaintext highlighter-rouge">maybe_twice</code> and <code class="language-plaintext highlighter-rouge">maybe_thrice</code> to make the nesting cleaner:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">maybe_twice</span><span class="p">(</span><span class="n">foo</span><span class="p">());</span> <span class="c1">// equivalent to maybe(maybe(foo()));</span>
<span class="n">maybe_thrice</span><span class="p">(</span><span class="n">foo</span><span class="p">());</span> <span class="c1">// equivalent to maybe(maybe(maybe(foo())));</span>
</code></pre></div></div>

<p>Also, I made it detect nullary functors so that I don’t have to use parentheses:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">maybe_twice</span><span class="p">(</span><span class="n">foo</span><span class="p">);</span>
</code></pre></div></div>

<p>(These changes will be pushed to my repository soon.)</p>
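<p>The detection itself can presumably be done with a trait. Here is a minimal sketch of one way it might look; the name <code class="language-plaintext highlighter-rouge">eval_arg</code> and the exact mechanism are my guesses, not the library's:</p>

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// If the argument is a nullary callable, invoke it; otherwise pass the
// value through unchanged. A macro like maybe(foo) could route its
// argument through this before doing the usual unwrapping.
template <typename T>
auto eval_arg(T&& t) {
  if constexpr (std::is_invocable_v<T>) {
    return std::invoke(std::forward<T>(t));  // maybe(foo) case
  } else {
    return std::forward<T>(t);               // maybe(foo()) case
  }
}
```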

<p>I do kind of wish we were able to define our own operators in C++ so that I could have something like a postfix <code class="language-plaintext highlighter-rouge">!</code> operator to invoke the basic <code class="language-plaintext highlighter-rouge">maybe</code> overload (e.g. <code class="language-plaintext highlighter-rouge">foo()!!</code>), or at least be able to use non-identifier names for macros (e.g. <code class="language-plaintext highlighter-rouge">!(!(foo()))</code>), but it is what it is. You would get the added benefit of your code screaming at you, which could prove useful from time to time.</p>

<h2 id="the-turing-test">The Turing Test</h2>

<p>We now support a variety of things. Here are the primary examples:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">auto</span> <span class="n">foo</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="kt">void</span><span class="p">,</span> <span class="kt">int</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">return</span> <span class="p">{};</span> <span class="p">}</span>

<span class="k">auto</span> <span class="nf">bar</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="n">std</span><span class="o">::</span><span class="n">expected</span><span class="o">&lt;</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">());</span> <span class="c1">// minimal</span>
  <span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">(),</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">_e</span><span class="p">));</span> <span class="c1">// explicit version of above</span>
  <span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">(),</span> <span class="n">std</span><span class="o">::</span><span class="n">unexpected</span><span class="p">(</span><span class="n">_e</span> <span class="o">+</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">// our own</span>
  <span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">(),</span> <span class="n">_e</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span> <span class="c1">// implicit std::unexpected() wrapper</span>
  <span class="n">maybe</span><span class="p">(</span><span class="n">foo</span><span class="p">(),</span> <span class="p">[](</span><span class="k">auto</span><span class="o">&amp;&amp;</span> <span class="n">e</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">auto</span> <span class="p">{</span> <span class="k">return</span> <span class="n">e</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="p">});</span> <span class="c1">// lambda variant</span>
  <span class="k">return</span> <span class="p">{};</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="going-forward">Going forward</h2>

<p>A new language construct has been proposed called <em>do expressions</em> in <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2806r3.html">P2806</a>. Its purpose is to modernize and standardize statement expressions. Perhaps we will get something like this in C++29, which will make our implementation of the operator more idiomatic.</p>

<h2 id="handy-tips">Handy tips</h2>

<p>If you want to use this solution, here are my personal guidelines:</p>

<ul>
  <li>If you want to return a simple error, use the <code class="language-plaintext highlighter-rouge">_e</code> shortcut.</li>
  <li>If you observe that you repeatedly use the same error in <code class="language-plaintext highlighter-rouge">maybe</code>, define a local lambda and pass it via the second argument’s overload.</li>
  <li>If you need more complex error transformations, do them before the call to <code class="language-plaintext highlighter-rouge">maybe</code> and store the error for reuse; you can also define a namespace function if you have an error pattern repeating across several modules.</li>
</ul>
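<p>The second tip can be sketched like this. To keep the example self-contained without C++23's <code class="language-plaintext highlighter-rouge">std::expected</code>, I use a tiny stand-in result type; the macro shape mirrors the binary <code class="language-plaintext highlighter-rouge">maybe(expr, handler)</code> overload, but the details here are illustrative:</p>

```cpp
#include <optional>
#include <utility>

// Tiny stand-in for std::expected<int, int>.
struct Exp {
  std::optional<int> val;
  int err = 0;
  explicit operator bool() const { return val.has_value(); }
};

// Binary overload sketch: on failure, early-return with the error run
// through the caller-supplied handler.
#define maybe(expr, handler)                              \
  ({                                                      \
    auto _r = (expr);                                     \
    if (!_r) return Exp{std::nullopt, (handler)(_r.err)}; \
    *std::move(_r.val);                                   \
  })

Exp fail() { return Exp{std::nullopt, 40}; }

Exp bar() {
  // The same error transformation is needed in several places, so it is
  // hoisted into one local lambda and reused at each call site:
  auto bump = [](int e) { return e + 2; };
  int a = maybe(fail(), bump);
  int b = maybe(fail(), bump);
  return Exp{a + b, 0};
}
```

<p>If the same handler pattern shows up across several modules, the lambda can graduate into a namespace-scope function, per the third tip.</p>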

<h2 id="full-code">Full code</h2>

<p>Since my website currently formats macros poorly, pasting the entire thing here would be a horror, so you can find the code <a href="https://github.com/quarterstar/contourcpp">here</a>; feel free to star the GitHub repository to track its progress.</p>

<h2 id="additional-notes--honorable-mentions">Additional notes &amp; honorable mentions</h2>

<p>Following my implementation, I noticed that another project from last year, <a href="https://github.com/Rucadi/cpp-match">cppmatch</a>, implemented this using the same compiler extension, so I want to give props to them as well. Nevertheless, I wrote this blog post because mine works a bit differently and adds some features that I wanted to make a proper writeup about, like the binary overload.</p>

<p>In addition, I want to give an honorable mention to the effort that has gone into implementing it with coroutines instead, as seen in a blog post on <a href="https://cpp-rendering.io/c-error-handling-lets-abuse-the-co_await-operator/">cpp-rendering.io</a>. Note that there have been reports of unavoidable heap allocations and GCC issues for similar coroutine-based implementations, and there may be additional performance overhead that has not yet been documented.</p>

<aside class="callout callout--info" role="note" aria-label="Tip">
  <div class="callout__inner">
    <div class="callout-header">
      <span class="callout__icon" aria-hidden="true">ℹ️</span><strong class="callout__title">Tip</strong></div>

    <div class="callout__content">
<p>I am a freshman in university planning to continue my studies abroad in a few years. If you would like to support me, I have several donation methods on my website, and you can subscribe to my newsletter for free below.</p>
</div>
  </div>
</aside>]]></content><author><name>Quarterstar</name><email>quarterstar@proton.me</email></author><category term="cpp" /><category term="programming" /><summary type="html"><![CDATA[C++ may not be Rust, but perhaps we can learn a thing or two from how it does certain things?]]></summary></entry><entry><title type="html">The Essence Of Mathematics From Basic Counting To Fourier Transforms And Beyond</title><link href="https://www.quarterstar.tech/articles/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/" rel="alternate" type="text/html" title="The Essence Of Mathematics From Basic Counting To Fourier Transforms And Beyond" /><published>2025-12-22T00:00:00+01:00</published><updated>2025-12-22T00:00:00+01:00</updated><id>https://www.quarterstar.tech/articles/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond</id><content type="html" xml:base="https://www.quarterstar.tech/articles/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>This article takes a tour from the very beginning of mathematics to advanced topics, with a focus on developing intuition. It is aimed at students who do not fully understand how mathematics was discovered or the intuition behind it, which is why the wording is deliberately verbose in a lot of places.</p>

<p>Initially, I wanted to split the content of this article across several ones, but I realized that the topics are so interconnected that it would be a disservice to my readers to organize it that way.</p>

<p>The way I like to teach things, as you will see in this article, follows a concept called <a href="https://en.wikipedia.org/wiki/Just-in-time_learning">Just in Time Learning</a>. The term comes from programming, and it means learning things at the moment their use comes up in an application. For example, the use of division for ratios is demonstrated right before trigonometric functions are introduced. In my experience, this method helps you recall the details that matter for active learning.</p>

<p>For people already familiar with the basics, the bread and butter of the article is the section on complex numbers and the Fourier transform itself. In that case, I hope you enjoy reading this article as a historical overview of mathematics instead.</p>

<h2 id="numbers-numbers-and-more-numbers">Numbers, Numbers, and More Numbers</h2>

<p>What is a number? Maybe 2, 4, $3^2$, even $2.5$; or if you are really fancy, maybe even negative numbers. If you are a mathematician, you might think of it as an abstract element of a structure that satisfies certain rules. No matter your viewpoint, the fact remains: the need for numbers arises from real world use cases.</p>

<p>At the start of mathematics, we didn't have negative numbers. All we had were the natural numbers ($1, 2, \ldots$), used to count the number of apples or bananas we trade. As time passed, however, we realized that we could make useful statements with an indeterminate amount of them, say $x$ apples and $y$ bananas; that way, you can say that you need, for example, $x + y = 15$ fruits in total. It does not matter which specific combination, just that you satisfy what would later be known as the equation. Such equations have existed since ancient times, going back to civilizations like the Babylonians.</p>

<h2 id="metaphilosophical-interjection">Metaphilosophical Interjection</h2>

<p>Mathematics is really just a reflection of our perception of the world. Before continuing, you should realize that mathematics is not truly objective, in the sense that the rules we have defined are not absolute, but rather rules that fit our perception of reality. In that sense, mathematics can be viewed as nothing more than a game whose rules we have all decided to follow; it isn't any more objective than a game like Monopoly. The base rules we follow in math are called <em>axioms</em>. These axioms could in fact be anything, like our rules of counting. The “objective” part of mathematics is the set of results we derive if we follow these axioms. Our logic is fallible, and often the course of history has led to axioms being changed to better fit our empirical model.</p>

<p>During school, you are usually taught these axioms before the motivation that fueled their initial formulation, which is precisely why you may feel confused or “forced” to do things in a particular way. The point of this article is to take the converse direction and give you that motivation.</p>

<p>Now, imagine a visually impaired person is in front of you. The assumption is that the person has never been able to see in their life, as in, equivalent to being born with no eyes. Can you find a way to describe to that person what “seeing” is like? Can you describe to a person born with no ears what sound is like? Most likely you cannot find an accurate description, and if you can, it is not going to be understood by such a hypothetical person.</p>

<p>In a similar manner, there could be senses out there that humans do not know exist which, if we had access to them, would modify our axioms significantly, and maybe even crack some of our biggest mysteries, like consciousness. The key takeaway is that mathematics is really not modeling our reality, but our cognitive perception of it.</p>

<h2 id="equations">Equations</h2>

<p>Now, let's return to equations. For the equation $x + y = 15$, one reasonable solution is 10 apples ($x$) and 5 bananas ($y$). Clearly $10 + 5 = 15$, and claiming otherwise would lead to ambiguities and contradictions like $1 = 2$. But let's say that for your exchange, you don't have 5 bananas, but you do have 20 apples. It seems reasonable to give those 20 apples instead, since $20 - 5 = 15$. Wait, what did we just use? Subtraction? Oh yeah, that seems useful for anything related to a deficit. So we need some number to add to 20 so that the total equals 15. Since we have subtraction, what if we just wrapped the operation and the number into one, like $-5$, and then <em>added</em> that to 20? In that case, we have $20 + (-5) = 15$, and our trader is confused but satisfied nevertheless.</p>

<p>And so negative numbers were invented as solutions to those equations, which later on became more useful for things like displaying the remaining debt from student loans in your bank account. Since subtracting moves a number back in the number line, it makes sense to extend the positive number line by adding negative numbers in the reverse direction.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_2_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<h3 id="neat-detail-about-equations--notations">Neat Detail about Equations &amp; Notations</h3>

<p>Mathematical notation took a long time to be “standardized,” and to a large degree it still isn't. Equations were no exception. Before the notation for them was created, when you needed to express something like $x + y = 5$, people would literally write “a quantity x and a quantity y is equal to 5.” It really makes you think how, as more and more ideas pile up to form new concepts in mathematics, compact notation is our one and only salvation.</p>

<h2 id="arithmetic-operations">Arithmetic Operations</h2>

<p>As time passed, mathematics became more and more intricate. We saw that we often needed to add the same number to itself many times, so we invented multiplication to save us from the hassle. Division was invented with a similar argument, which you should try to reconstruct yourself.</p>

<p>The subtraction that we intuitively understand is really just defined as</p>

<p>\[
a - b = a + (-b)
\]</p>

<h2 id="functions">Functions</h2>

<p>Apples are cool. Numbers too, I guess. What else can numbers do?</p>

<p>Well, suppose that the trader from the previous chapter received your apples and bananas and wants to cut the apples into slices, at most 4 per apple, to make his very own apple slice collection. (Clearly, the number of slices per apple can be 1, 2, 3, or 4.)</p>

<p>Our goal is to make some sort of black box machine that takes our apples and spits out sliced apples. The quantity we care about in this case is the number of them that it produces. It will need to calculate the slices for each apple individually and then add them up. But how will the machine know when to create 1, 2, 3, or 4 slices, for each specific output?</p>

<p>We have not yet specified on what grounds it will pick one of those 4 values. For now, lets assume that it always picks 4 slices for each apple. If he received 5 apples from you, based on your intuitive understanding, this is an exact case for the use of multiplication. So we have $4 \times 5 = 20$ slices in total.</p>

<p>How could we represent the resultant equation $Y = 4X$ in a way that denotes that this specific equation serves this purpose? Mathematicians invented special notation for this called <em>functions</em>. We have a set of inputs, our apples, and a set of outputs, our slices. The function essentially maps each input to an output, as seen below.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_6_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>In mathematics, the way we denote a function for our case is the following:</p>

<p>\[
\operatorname{slices}(x) = 4x.
\]</p>

<p>The “slices” part is simply the name of the function and does not change its behavior. You can let $y = \operatorname{slices}(x)$, which is clearly the same as the $Y = 4X$ we had before, but now with a distinctive name.</p>

<p>Does the mapping of one input (in a function) have to be unique? In other words, does it make sense for this machine to give two distinct outputs for a single input, depending on how it feels, while its behavior stays constant like it is at the moment? Well, no: you usually expect a machine to give you a precise result for a particular input. In computing, we call this quality determinism. For instance, if such a scenario were allowed and we passed 0 apples, we wouldn't be able to make the definitive claim that the machine must produce 0 slices. In general, defining the behavior of the machine that way would cause far too many problems.</p>

<p>Well, you could argue that you want this quality if you need it to generate a random number of slices, but we can do that inside the machine's computation of the outputs instead. Indeed, we will use a form of randomness to determine the number of slices in this example. But hold up, what even is random? We know random as an event that we cannot predict. But similarly to our earlier metaphilosophical interjection, one needs to realize that randomness is just a very convenient abstraction for statistically predicting systems we do not fully understand.</p>

<p>In fact, the way one might generate a random number is to find a generally unpredictable source of output and somehow use it to transform our output to be within a certain bound, which in our case is 1 to 4. Such an action is called <em>normalization</em>, and we will see it in action:</p>

<p>All randomness needs a source. For this example, suppose our machine somehow (we don't care how) calculated the number of stars currently glowing in some galaxy that has only 100 stars; then the “random” output would range from 0 to 100. After our machine calculates that, it needs to algebraically combine that number with the number of apples it receives as input to determine the number of slices. The following equation demonstrates this behavior:</p>

<p>\[
y = 1 + s \times \frac{3}{100}.
\]</p>

<p>In this case, $s$ is the number of glowing stars. The “1” is added outside the fraction term to ensure that the output is at least 1; after all, we don't want our trader left with no slices. For the fraction part, consider this: if the number of glowing stars were 100, then $y = 1 + 100 \times \frac{3}{100} = 4$ slices for one apple, which is the exact upper bound we want. We can't have more than 100 glowing stars, so this function is well defined.</p>

<p>We can now create a function that takes input $x$, the number of apples, which gets multiplied by the expression equal to y.</p>

<p>\[
\operatorname{slices}(x) = \left(1 + s \times \frac{3}{100}\right) x.
\]</p>

<p>Notice that in this case, $s$ is not among the parameters of the function, which means that it is constant across all inputs $x$ we give. This is because the function <em>represents what the user interfaces with</em>. In reality, $s$ could be a function in and of itself, but in this case we don't know how to actually calculate the number of glowing stars, so we treat it as a hypothetical variable.</p>
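<p>As a quick sanity check of the formula, suppose 50 of the 100 stars are glowing and the trader hands over 2 apples. Then</p>

<p>\[
\operatorname{slices}(2) = \left(1 + 50 \times \frac{3}{100}\right) \times 2 = 2.5 \times 2 = 5,
\]</p>

<p>so the machine produces 5 slices, comfortably between the minimum of 2 (1 slice per apple when no stars glow) and the maximum of 8 (4 slices per apple when all 100 glow).</p>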

<h2 id="complex-numbers--circles">Complex Numbers &amp; Circles</h2>

<p>However, one thing that remained a mystery for a while is the square root operation. First of all, the reason we call it the square root is that it is the inverse of the square, and that name comes from the area of a square in geometry, which equals the product of two of its side lengths, which are equal by definition.</p>

<p>We know from school that, if you multiply a negative number by itself, you are always going to get a positive number. But have you considered why that’s the case? Think about this for a moment: if you wanted to rotate a number 180 degrees to the other side of the line—that is, draw a perpendicular line at zero and find the “mirror point” of your number (labeled x on the graph)—how would you do that?</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_8_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Consider the number 3. Just by looking at this graph, the mirror point should intuitively be -3. How could we define our number system to do that? Well, we could make it so that multiplying 3 by -1 yields -3. Multiplication is usually stated as repeated addition (for instance, $-1 \times 2$ is $-1$ repeated 2 times, so $-2$), but repeating something a negative number of times (in the case where both factors are negative, since you cannot swap them to form a positive count) does not make sense in that regard, so let's think of this operation as a <em>rotation</em> instead. (See Appendix A for a rigorous axiomatic explanation.) In a similar manner, if we started at -3, by the same logic, we should multiply by -1 to rotate it 180 degrees to the corresponding mirror point. (It's really a reflection about 0 until we introduce planes.) From this we can infer that, in our new system, multiplication of two negative numbers strictly gives positive numbers.</p>

<p>Now, about the square root: our initial definition for it is to give us back the original number after it has been multiplied by itself. For example, $3 \times 3 = 9$, so $\sqrt{9}$ should be 3. So for any number $1, 2, \ldots$, you should be able to get back the original value from such a multiplication. (Zero is a special case and a very peculiar number, which will get an entire article dedicated to it in the future.) What about the negative numbers we just defined, which have proven useful for rotations on a line? Well, let's try one. Maybe $-3 \times -3$, which gives 9.</p>

<p>Now imagine someone gives you the number 9 and, without any of the calculations you just did, asks you to find the square root. You may be inclined to tell them that it is -3, since that's the number you multiplied by itself to get it. But then they give you a counterexample: what about positive 3? The square root is, fundamentally speaking, a function, which as we have seen gives a unique output for each input, so it wouldn't make sense for the square root function to give two different results when plugging in 9. Also, the square root is especially useful in geometry, where negative side lengths do not exist, so it doesn't really make sense for it to give negative results. So let's restrict its output to positive numbers.</p>

<p>However, one issue still stands. Suppose that someone has multiplied a magic value by itself and gotten an output of -1. Let's try to figure out what could give that result. Maybe $1 \times 1$? Hmm, still 1. $1 \times (-1)$? Those aren't the same number, so that's not allowed. And how about $(-1) \times (-1)$? Well, by our definition it should also give 1, so this doesn't seem like the answer either. So what is it?</p>

<p>Well, to resolve that debate, let's go back to our lovely number line. We said that multiplying by -1 gives you the number rotated 180 degrees. But why just 180 degrees? Why not, for example, half of that? Try to imagine where a number like 1 would fall if you rotated it just 90 degrees, counterclockwise. Evidently, if we label the perpendicular line that we constructed with equidistant points, it should fall 1 unit above the intersection point, as shown here:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_10_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Geometrically, doing two 90 degree rotations in the same direction should be the same as doing a single 180 degree rotation. Since we previously multiplied by a particular number (-1) to get the desired rotation, maybe we have to do the same here. Whatever that special value we have to multiply it with to get a 90 degree rotation is, multiplying that result by the special value again should give us the equivalent of a 180 degree rotation:</p>

<p>The number it should land on after rotating 90 degrees has been labeled $x$, and after another 90 degrees, $y$.</p>

<p>For the sake of a thought experiment, let's label that special value $i$. Our rules for this value are that multiplying any number (say 1) by it once should give us a 90 degree rotation, and multiplying the result by $i$ again should give us 180 degrees of rotation in total. That means $i \times i \times 1$ must equal $1 \times (-1)$, which we know is -1. In other words, $i \times i$ (that is, $i^2$) is equal to -1. Even though such a number doesn't actually exist (it isn't something we can naturally see like $1, 2, 3, \ldots$), pretending that it does gives us a lot of insight into algebraically expressing such rotations.</p>

<p>Going back to our square root problem, let's think about what we had. We essentially wanted to know what the square root of a negative number should be. (Remember, it's just the number that we multiplied by itself to get a negative result.) Hmm, since we have $i^2 = -1$, what if we say that the square root of -1 equals $i$? Congratulations, you have just discovered complex numbers. Not that complex after all, right?</p>

<p>Well, we also might want the square root of -2, -3, and so on. It can be proven that the square root of a product of two numbers $a$ and $b$ can be split into two square roots, provided $a$ and $b$ are not both negative; understanding why isn't really important for the purpose of this article. So $\sqrt{ab} = \sqrt{a} \times \sqrt{b}$, which means $\sqrt{-2}$ can be split into $\sqrt{-1}$ and $\sqrt{2}$, so the result is $i \times \sqrt{2}$. By treating $i$ as the unit of rotation, we can get rotations for any other number too.</p>
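<p>For example, applying the same split to $-9$:</p>

<p>\[
\sqrt{-9} = \sqrt{-1} \times \sqrt{9} = i \times 3 = 3i.
\]</p>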

<p>We now have a complete system for representing intermediate rotations on our number line, and more precisely, rotations on a plane. A plane is the space of all points formed by two perpendicular lines. (Interestingly enough, there are systems for representing rotations in more dimensions too, like quaternions for 3D space, which has 3 perpendicular lines. They follow a very similar logic to the one we used here. This might be touched on in a later article.)</p>

<p>What does our rotation system remind us of? Circles! First, remember what a circle is: it is the set of all points that are equally distant from a center point. Take a look at a circle like the one shown here:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_14_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Notice how the points 1 and -1 on both axes are equally distant from the center. It indeed makes sense to define the unit circle as having radius 1, just as we used 1 as the basis of all of our other transformations thus far.</p>

<p>So how can we represent such a circle with an algebraic equation? Let's place a circle on a plane and see what it looks like.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_16_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>We can imagine the horizontal axis cutting the circle into two halves. First of all, how can we get any point of this circle? In other words, how do we calculate the <em>distance</em> from $(0, 0)$ to any point $(x, y)$ on the circle?</p>

<p>This problem can be modeled using <em>right triangles</em>, which are triangles with an angle equal to 90 degrees. The angle formed by the intersection of our two perpendicular lines of the plane fits that model.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_18_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>The triangle has a vertical and a horizontal leg, equal respectively to the $y$ and $x$ coordinates of our point on the circle. Notice what happens if we move the circle one unit to the right:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_20_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Now we have to account for a horizontal difference of 1, so instead of the horizontal leg being $x$, it will be $x - 1$ (the vertical leg is unchanged, since we only moved sideways). In general, such differences will be written as $\Delta x = x - x_0$ and $\Delta y = y - y_0$, where $(x_0, y_0)$ is the circle's center.</p>

<p>Notice how the legs of the triangle ($\Delta x$ and $\Delta y$) get stretched alongside the third side, which we will call the hypotenuse. (You can remember the hypotenuse as the side that is not perpendicular to any other.) You can see this as an illustration with the interactive slider below:</p>

<p><strong>[SLIDER OUT OF ORDER]</strong></p>

<p>A theorem known as the Pythagorean theorem indeed confirms our suspicion that the side $c$ can be expressed in terms of $a$ and $b$. In general, for any right triangle with legs $a$ and $b$ and hypotenuse $c$,</p>

<p>\[
a^2 + b^2 = c^2.
\]</p>

<p>In our case, that means the distance we are looking for, which is written $d$ instead of $c$, can simply be expressed as</p>

<p>\[
d^2 = (x - x_0)^2 + (y - y_0)^2.
\]</p>

<p>Solving for $d$, we get</p>

<p>\[
d = \pm \sqrt{(x - x_0)^2 + (y - y_0)^2}.
\]</p>

<p>Note that squaring, as we explained previously, ensures each term is nonnegative, which is geometrically sound. Algebraically we get two solutions for $d$, as expected, but since a distance cannot be negative, we keep the positive root.</p>

<p>Okay, so now we know how to get the distance from the center to any point on the circle. Since this is guaranteed to capture every point on the circle, we are done! All we need to do is plug in the respective values of $x_0$ and $y_0$ depending on what we want the offset of the circle to be. In the general case, however, we will assume that the circle is centered at the origin for simplicity of calculation, in other words $x_0 = 0$ and $y_0 = 0$, giving us</p>

<p>\[
d = \pm \sqrt{x^2 + y^2}, \quad
x^2 + y^2 = d^2.
\]</p>
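<p>As a concrete check, the point $(3, 4)$ lies on the centered circle of radius 5, since</p>

<p>\[
d = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5.
\]</p>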

<h2 id="ratios-trigonometry--parametric-coordinates">Ratios, Trigonometry &amp; Parametric Coordinates</h2>

<p>Hmm, our circle is cool and all, but it uses two variables. Is there an equation we can use to represent a circle with a single variable?</p>

<p>Previously, triangles seemed pretty good at modeling the inner part of the circle, so maybe we have to use something similar to find such a relation. Perhaps there is some sort of relationship between the sides of the triangle in the circle? When mathematicians were thinking about this problem way back, they realized something:</p>

<p>From algebra, we know that division can be used for resource allocation. For instance, if you have 4 kids and 8 cups of juice, $8 \div 4$ signals that 2 cups of juice should be allocated to each kid; more precisely, we say that the <em>ratio</em> of juice per kid is 2:1.</p>

<p>When you hear ratio, you should think not of splitting items fairly but of relative comparisons. For example, suppose Alice is 150 cm tall and Bob is 180 cm. The ratio of their heights, 180:150, can be simplified to 6:5. The benefit of relative comparisons is that you do not need to comprehend the true size of the unit system; you simply compare its usage across samples. In our example, the unit is centimeters. If we converted the heights to meters, their ratio would be exactly the same! (Try that yourself.) In fact, ratios are unit-independent, which is one of the reasons they are so powerful. But you need to be careful to use the same units for both quantities being compared, otherwise the comparison is nonsensical.</p>

<p>How does this relate to our triangle relationship problem? Well, we could try comparing the sides of a triangle and see if we get any meaningful metrics. For instance, for this triangle:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_24_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>We could first check the ratio of the two legs, 6 : 4. Observe that if the triangle is scaled up or down while keeping its angles the same, the ratio of the legs remains the same.</p>

<p>This is… very interesting. Note, though, that we cannot stretch the hypotenuse while keeping the legs the same, because then the triangle would no longer be a right triangle! Let’s investigate further with a ratio between the hypotenuse and one of the legs. When you play with the values, you find that this ratio remains the same under scaling as well. This seems very promising and is a worthwhile candidate in our search for a single-variable circle equation.</p>

<p>Mathematicians noticed these relationships that hold specifically for right triangles and assigned special values to them. Consider, for example, the ratio between the leg that the angle $\theta$ points to (the opposite side) and the hypotenuse:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_26_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>was called <em>sine</em>, later abbreviated <em>sin</em> (which is a sin in and of itself, if you ask me). And since we have demonstrated that the ratio stays the same for a given angle, it can be uniquely represented by a single value, the angle $\theta$ itself! So our research boils down to a function $\sin(\theta)$, where</p>

<p>\[
\sin(\theta) = \frac{\text{opposite side}}{\text{hypotenuse}}.
\]</p>

<p>(In mathematics, you will often see the notation abbreviated as $\sin \theta$, but it means the exact same thing.)</p>

<p>Similarly, the following functions were defined:</p>

<p>\[
\cos \theta = \frac{\text{adjacent side}}{\text{hypotenuse}}, \quad
\tan \theta = \frac{\text{opposite side}}{\text{adjacent side}}.
\]</p>

<p>These three functions are also called <em>trigonometric functions</em>. Since in a right triangle neither leg can be longer than the hypotenuse, the ratios defining sine and cosine never exceed 1 in magnitude: their maximum value is 1 and their minimum value is -1. (Tangent, being the ratio of the two legs, has no such bound.)</p>

<p>Remember our old friend, the distance formula? Well, you probably do because I told you about it 2 seconds ago. But anyway, before we consider how it connects to these trigonometric values, let’s restrict the radius to 1 for simplicity:</p>

<p>\[
x^2 + y^2 = 1.
\]</p>

<p>(The number 1 here plays the role of $d^2$; that is, the radius $d$ is 1.)</p>

<p>This gives us our unit circle once again. Now, let’s draw a triangle inside that circle.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_28_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Clearly, the opposite side of $\theta$ is $y$ and the adjacent side is $x$. Notice that since the hypotenuse is 1, $\sin \theta = \frac{y}{1} = y$ and $\cos \theta = \frac{x}{1} = x$. This fact saves us the hassle of dealing with ratios, but remember that it only holds for the unit circle!</p>

<p>But wait, we just said $x^2 + y^2 = 1$. It follows, at least in this case, that</p>

<p>\[
(\cos \theta)^2 + (\sin \theta)^2 = 1.
\]</p>

<p>By applying some more notational magic,</p>

<p>\[
\cos^2 \theta + \sin^2 \theta = 1.
\]</p>

<p>This is huge. Massive, even. The simple observation that ratios enable relative comparisons has led us to a remarkably useful identity.</p>
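<p>As a quick sanity check (a small Python sketch of my own, not part of the derivation), we can verify the identity numerically for a spread of angles:</p>

```python
import math
import random

# cos^2(theta) + sin^2(theta) should equal 1 for every angle theta.
random.seed(0)
for _ in range(1000):
    theta = random.uniform(-10.0, 10.0)
    assert abs(math.cos(theta) ** 2 + math.sin(theta) ** 2 - 1.0) < 1e-12

print("identity verified for 1000 random angles")
```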

<p>Multiplying both sides by a constant $k$ extends this to a general result:</p>

<p>\[
k \cos^2 \theta + k \sin^2 \theta = k.
\]</p>

<p>Okay, so now we have a single-variable representation of points on a circle. What if we wanted to simplify this problem further by not having any squares or any of that jazz? Yet another old friend comes to the rescue.</p>

<p>Recall that with imaginary numbers, multiplying a number (say, $b$) by $i$ gives you a rotation. Adding a real number to that, i.e. $a + bi$, can be used to represent a point, where $a$ is the point on the x-axis and $bi$ is the point on the y-axis:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_30_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>In other words, any point $(x,y)$ is represented on the complex plane by $a + bi$. If we combine that with the fact that any point on a unit circle is $(x,y) = (\cos \theta, \sin \theta)$, we find that</p>

<p>\[
z = \cos \theta + i \sin \theta
\]</p>

<p>is the corresponding point of the circle on the complex plane.</p>
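<p>We can confirm this numerically. The short Python sketch below (my own illustration) builds the point $z$ for an arbitrary angle and checks that its distance from the origin is 1, so it really sits on the unit circle:</p>

```python
import math

theta = 1.2  # an arbitrary angle
z = complex(math.cos(theta), math.sin(theta))  # z = cos(theta) + i*sin(theta)

# |z| = sqrt(cos^2(theta) + sin^2(theta)) = 1, by the identity above.
assert abs(abs(z) - 1.0) < 1e-12
```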

<h2 id="natural-logarithm-e--eulers-formula">Natural Logarithm, e &amp; Euler’s Formula</h2>

<p>In the late 1500s to early 1600s, astronomers and navigators had to compute stupidly large numbers like $(1.00023)^{7423}$ or multiply many large numbers repeatedly. This was very annoying to say the least, so the dream was to replace multiplication with addition.</p>

<p>John Napier was one of those mathematicians. To help progress this search, he researched a lot of sequences. What he found was that there is a connection between arithmetic progressions and geometric decay. For example, he placed the sequence of natural numbers in one table</p>

<p>\[
0, 1, 2, 3, \ldots
\]</p>

<p>and the geometric decay in another:</p>

<p>\[
1, 0.9999999, 0.9999998, 0.9999997, \ldots
\]</p>

<p>This is a pair of two evolving quantities. Napier defined the logarithm as the <em>index</em> that connects the two. (For example, in a sequence 2, 4, 6, 8, …, the number 6 is located at index 3.)</p>

<p>Suppose you have:</p>

<p>\[
x = (1 - \varepsilon)^n
\]</p>

<p>Then:</p>

<p>\[
\log(x) = n
\]</p>

<p>If $x_1 = (1 - \varepsilon)^{n_1}$ and $x_2 = (1 - \varepsilon)^{n_2}$, then multiplying the two numbers gives:</p>

<p>\[
x_1 x_2 = (1 - \varepsilon)^{n_1 + n_2}
\]</p>

<p>So</p>

<p>\[
\log(x_1 x_2) = \log(x_1) + \log(x_2).
\]</p>
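<p>This is exactly how logarithm tables turned multiplication into addition. A small Python sketch (using the modern natural logarithm rather than Napier’s original construction):</p>

```python
import math

x1, x2 = 123.456, 789.01  # two arbitrary positive numbers

# The log of a product is the sum of the logs...
assert math.isclose(math.log(x1 * x2), math.log(x1) + math.log(x2))

# ...so a multiplication can be carried out with two table look-ups,
# one addition, and one inversion of the logarithm at the end.
product_via_logs = math.exp(math.log(x1) + math.log(x2))
assert math.isclose(product_via_logs, x1 * x2)
```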

<p>This property was the entire point of the link between the two sequences. Now, instead of decay, if you imagine growth:</p>

<p>\[
(1 + \varepsilon)^n
\]</p>

<p>As $\varepsilon$ gets smaller and $n$ gets larger in just the right way, the expression stabilizes. If we let $\varepsilon = \frac{1}{n}$, then we get</p>

<p>\[
(1 + \frac{1}{n})^n.
\]</p>

<p>You can see the table of values of this function approach a value:</p>

<ul>
  <li>$n = 10$: 2.594</li>
  <li>$n = 100$: 2.705</li>
  <li>$n = 1000$: 2.717</li>
  <li>$n = 10000$: 2.718</li>
</ul>
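<p>You can reproduce this table in a couple of lines of Python and compare against the built-in constant <code class="language-plaintext highlighter-rouge">math.e</code> (a comparison that is, of course, hindsight we have and Napier didn’t):</p>

```python
import math

for n in (10, 100, 1000, 10000):
    print(n, (1 + 1 / n) ** n)

# The estimates creep toward e = 2.718281828...
estimate = (1 + 1 / 10**7) ** 10**7
assert abs(estimate - math.e) < 1e-4
```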

<p>This number kept appearing, regardless of the table’s construction. He tried many sequences besides these two and the results always matched. The number that it approached was later named $e$:</p>

<p>\[
e = \lim_{n \to \infty} (1 + \frac{1}{n})^n
\]</p>

<p>The “lim” is an operator that tells you what value a function gets close to as you push the input $n$ further and further. It’s going to be used a lot in the following chapters.</p>

<p>This number $e$ is commonly used for compound interest, calculating the growth of populations, the cooling laws of physics, and many more fields.</p>

<p><strong>This part is not finished.</strong></p>

<h2 id="division-by-zero--limits-in-more-detail">Division by Zero &amp; Limits in More Detail</h2>

<p>Your middle school teacher probably taught you that division by zero is a big no-no. But why is that the case? Let’s check what happens when we divide 10 by integers that get smaller and smaller.</p>

<ul>
  <li>$10 \div 10 = 1$</li>
  <li>$10 \div 9 = 1.111\ldots$</li>
  <li>$10 \div 8 = 1.25$</li>
  <li>…</li>
  <li>$10 \div 3 = 3.333\ldots$</li>
  <li>$10 \div 2 = 5$</li>
  <li>$10 \div 1 = 10$</li>
</ul>

<p>No matter what, as we go lower and lower, the value increases. The result is only going to start to decrease if we go even lower, to negative values:</p>

<ul>
  <li>$10 \div (-1) = -10$</li>
  <li>$10 \div (-2) = -5$</li>
  <li>$10 \div (-3) = -3.333…$</li>
</ul>

<p>If we focus on one of these particular divisions by positive or negative values, like $10 \div 2$ which equals 5, we notice that the function “wraps” around that value for very small differences, from both sides. For example, from the right side:</p>

<ul>
  <li>$10 \div 2.01 \approx 4.975.$</li>
  <li>$10 \div 2.001 \approx 4.998.$</li>
  <li>$10 \div 2.0001 \approx 4.999.$</li>
</ul>

<p>And from the left side:</p>

<ul>
  <li>$10 \div 1.99 \approx 5.025$</li>
  <li>$10 \div 1.999 \approx 5.0025$</li>
  <li>$10 \div 1.9999 \approx 5.00025$</li>
</ul>

<p>So in fact, if we had chosen any nonzero number to divide by instead of 2, we would see the division wrap around its result for arbitrarily small differences from either side.</p>

<p>Let’s analyze division by 0 and see if the same happens. From the right side:</p>

<ul>
  <li>$10 \div 0.01 = 1000$</li>
  <li>$10 \div 0.001 = 10000$</li>
  <li>$10 \div 0.0001 = 100000$</li>
</ul>

<p>And again, from the left side:</p>

<ul>
  <li>$10 \div (-0.01) = -1000$</li>
  <li>$10 \div (-0.001) = -10000$</li>
  <li>$10 \div (-0.0001) = -100000$</li>
</ul>

<p>So in fact, the division does <em>not</em> wrap around a particular value for division by very small differences from 0. This means we cannot be confident about what value the operation stabilizes to in order to be able to define division by zero.</p>
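<p>The two tables above can be reproduced in a few lines of Python (a sketch of my own, for illustration):</p>

```python
# Divide 10 by values closer and closer to 0, from each side.
right = [10 / x for x in (0.01, 0.001, 0.0001)]    # roughly 1000, 10000, 100000
left = [10 / x for x in (-0.01, -0.001, -0.0001)]  # roughly -1000, -10000, -100000

# The magnitudes grow without bound instead of settling on a single
# value, which is exactly why 10 / 0 is left undefined.
assert right[0] < right[1] < right[2]
assert left[0] > left[1] > left[2]
```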

<p>Why does this happen? We saw before that the results are negative for negative divisors and positive for positive divisors, and 0 is exactly the boundary point where the behavior of the division flips. So it makes sense that this point is treated as undefined for division.</p>

<p>To be precise, we conclude that if we approach 0 with very small differences from the positive side, the division goes to infinity, and if we approach 0 with very small differences from the negative side, the division goes to negative infinity. This is also why “infinity” isn’t actually a number but rather an idea: that something keeps on increasing or decreasing without bound. So the “negative” infinity that you see written with a minus sign isn’t literally infinity multiplied by $-1$.</p>

<p>This idea of limit points is formalized with the operator we saw before — the limit. There are right-side limits and left-side limits, denoted with + and - respectively above the value being approached. Based on our analysis, we can clearly see that</p>

<p>\[
\lim_{x \to 2^+} \frac{10}{x} = 5, \quad 
\lim_{x \to 2^-} \frac{10}{x} = 5.
\]</p>

<p>This is read as: “as the input $x$ approaches 2 from either the right or the left side, $10/x$ approaches 5.” Furthermore,</p>

<p>\[
\lim_{x \to 0^+} \frac{10}{x} = \infty, \quad 
\lim_{x \to 0^-} \frac{10}{x} = - \infty.
\]</p>

<p>However, don’t assume that the limit existing means you can evaluate the function at that point. For example, if you have</p>

<p>\[
f(x) = \frac{x^2 - 1}{x - 1}
\]</p>

<p>and you want to know the limit of it as $x$ approaches 1, then if you analyze the limit using the same approximation technique and check what value it wraps around, you find that</p>

<p>\[
\lim_{x \to 1} f(x) = 2.
\]</p>

<p>However, calling the function with $x = 1$ gives us division by zero, which is undefined:</p>

<p>\[
f(1) = \frac{1^2 - 1}{1 - 1} = \frac{0}{0}.
\]</p>

<p>Also, limits can sometimes simply be inferred by looking at the graph of the function:</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_32_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Therefore the limit at $x = 1$ exists even though $f(1)$ itself is undefined.</p>
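<p>Here is the same approximation technique in Python (a sketch of my own): we sneak up on $x = 1$ from both sides and watch the outputs hug 2, while evaluating at exactly 1 fails:</p>

```python
def f(x):
    return (x**2 - 1) / (x - 1)

# Approach x = 1 from both sides: the outputs cluster around 2.
for x in (1.01, 1.001, 0.999, 0.99):
    assert abs(f(x) - 2) < 0.02

# Evaluating at exactly x = 1, however, divides 0 by 0.
try:
    f(1)
except ZeroDivisionError:
    print("f(1) is undefined")
```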

<p>Limits can also be computed with algebraic tricks. For example, we see that</p>

<p>\[
\lim_{x \to 1} f(x) = \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \lim_{x \to 1} \frac{(x-1)(x+1)}{x-1} = \lim_{x \to 1} (x + 1) = 2.
\]</p>

<p>So to wrap things up (no pun intended): division by zero is undefined, and one informal way to study a point like this is to check how the operation you are interested in behaves for very small differences around it and whether it wraps around a particular value (it doesn’t have to be division; it could be anything). And if you want to be 100% sure that your approximation is not misleading or incorrect, you use algebra to rigorously calculate the limit.</p>

<h2 id="rate-of-change--derivatives">Rate of Change &amp; Derivatives</h2>

<p>In a previous chapter, we discussed functions. One thing that we might want is a reliable metric that tells us how fast a function changes in a particular range of inputs. For instance, let $f(x) = x$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_34_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Let us take any two positions, $a = 2$ and $b = 4$. We are interested in how quickly the function’s output changes over the range of inputs between $a$ and $b$. Looking at the function’s graph, we see that the function grows at exactly the same pace over any range, regardless of the particular $a$ and $b$ we choose. So when we define this “rate of change,” it should come out constant.</p>

<p>Really, we are interested in a relative comparison of the input and the output of the function during that range. We know from previous chapters that the way to calculate that difference is $b - a$ and $f(b) - f(a)$ for the inputs and outputs, respectively.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_36_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Now, recall that ratios are a great way to gain a relative measure for comparing two quantities, regardless of the metric system used and the size of other quantities in the same system. So we can find the relative difference with the formula</p>

<p>\[
\frac{f(b) - f(a)}{b - a}.
\]</p>

<p>Why did we put $f(b) - f(a)$ at the top and not at the bottom? Because when that value increases we want the rate of change of the function in the interval from $a$ to $b$ to increase, and conversely, decrease when $f(b) - f(a)$ decreases.</p>

<p>If we try this for our values, we find that</p>

<p>\[
\frac{f(b) - f(a)}{b - a} = \frac{4 - 2}{4 - 2} = 1.
\]</p>

<p>is our desired rate of change. And in fact, the rate of change “1” remains constant for any $a$ and $b$.</p>

<p>We can try this formula with other functions as well. Let $g(x) = x^2$ be a quadratic function and use the same $a$ and $b$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_38_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Then we find that</p>

<p>\[
\frac{g(b) - g(a)}{b - a} = \frac{4^2 - 2^2}{4 - 2} = \frac{16 - 4}{4 - 2} = 6,
\]</p>

<p>which does <em>not</em> remain constant if we make other choices of $a$ and $b$. This can be seen in the graph, since the function’s output explodes for large inputs.</p>
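<p>The two computations above can be packaged into a tiny helper (a Python sketch of my own; the name <code class="language-plaintext highlighter-rouge">avg_rate</code> is just an illustrative choice):</p>

```python
def avg_rate(f, a, b):
    """Ratio of the change in output to the change in input over [a, b]."""
    return (f(b) - f(a)) / (b - a)

# For f(x) = x the rate is 1 no matter which interval we pick...
assert avg_rate(lambda x: x, 2, 4) == 1
assert avg_rate(lambda x: x, -7, 13) == 1

# ...but for g(x) = x^2 it changes with the interval.
assert avg_rate(lambda x: x**2, 2, 4) == 6   # (16 - 4) / (4 - 2)
assert avg_rate(lambda x: x**2, 4, 6) == 10  # (36 - 16) / (6 - 4)
```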

<p>Why restrict the idea of “rate of change” to intervals? Say we want to know how the function behaves at a particular point $a$. Then in our formula, we could set $b$ to be a tiny step above $a$, like $b = a + 0.001$, to get a very localized measure. Let’s try that with $g$ and $a = 3$:</p>

<p>\[
\frac{g(b) - g(a)}{b - a} = \frac{g(3.001) - g(3)}{3.001 - 3} = 6.001.
\]</p>

<p>This doesn’t seem like much, but repeating the process for larger values reveals that the rate of change increases very fast. For example, $a = 100$:</p>

<p>\[
\frac{g(b) - g(a)}{b - a} = 200.001.
\]</p>

<p>We have a constant $0.001$ at the end because of our choice for $b$, so we could make our formula subtract it after the division to make the rate of change look a little bit nicer. Since so far our selection has been $b = a + 0.001$, let’s generalize this by setting $b = a + h$, where $h$ is a very small constant. Then our formula becomes</p>

<p>\[
\frac{f(b) - f(a)}{b - a} - h.
\]</p>

<p>It can be simplified further:</p>

<p>\[
\frac{f(b) - f(a)}{b - a} - h = \frac{f(a + h) - f(a)}{(a + h) - a} - h.
\]</p>

<p>So in the end we have:</p>

<p>\[
\frac{f(a + h) - f(a)}{h} - h.
\]</p>

<p>Now we have the rate of change of the function at a particular point with a very small difference, so we could call it the “pointwise” rate of change (or in standard texts, instantaneous rate of change). Lets denote this operation with a special name:</p>

<p>\[
f’(a) = \frac{f(a + h) - f(a)}{h} - h.
\]</p>

<p>There is one problem with the current variation of this formula, however. Consider $h(x) = -x^2$ (an unfortunate name clash with our step size $h$, but bear with it). Then for $a = 10$ and $b = a + 0.001$,</p>

<p>\[
\frac{h(10.001) - h(10)}{10.001 - 10} = -20.001,
\]</p>

<p>and subtracting 0.001 from it makes it -20.002 rather than a nice -20, so our initial oversimplification (that the fix is just subtracting $h$) is not correct. However, we will see that this does not change the notion of our formula at all.</p>

<p>In general, for the pointwise rate of change, we care about extremely small differences only; infinitesimally small. So instead of having $h$ be a chosen constant, we can take the limit as $h$ shrinks to 0:</p>

<p>\[
f’(a) = \lim_{h \to 0} (\frac{f(a + h) - f(a)}{h} - h).
\]</p>

<p>Notice that the quotient approaches its own value independently of the trailing $h$, so as $h$ approaches 0, that subtraction simply vanishes:</p>

<p>\[
f’(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}
\]</p>

<p>Let us call $f’(a)$ the <em>derivative</em> of $f$ at $a$. The property of the limit we just used can be generalized: for <em>any</em> functions $f(x)$ and $g(x)$ whose limits at $c$ exist,</p>

<p>\[
\lim_{x \to c} (f(x) + g(x)) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x).
\]</p>

<p>Finally, if you use a limit calculator, you find that</p>

<p>\[
h’(a) = h’(10) = -20,
\]</p>

<p>so this verifies that taking the limit fixed our problem without any special handling on our part.</p>

<p>As one final note, if you go back to the examples of the early version of the derivative for $g(x)$, our approximations were very close to the value $2a$. In fact, it can be proven that</p>

<p>\[
g’(x) = 2x,
\]</p>

<p>so you don’t need to deal with limits at all! You can verify this by manually computing it for any value:</p>

<p>\[
g’(2) = \lim_{h \to 0} \frac{g(2 + h) - g(2)}{h} = \lim_{h \to 0} \frac{(2 + h)^2 - 2^2}{h} = \lim_{h \to 0} \frac{4 + 4h + h^2 - 4}{h} = \lim_{h \to 0} \frac{h(4 + h)}{h}.
\]</p>

<p>So it simplifies to:</p>

<p>\[
g’(2) = \lim_{h \to 0} (4 + h) = 4.
\]</p>
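<p>You can watch the limit do its work numerically. The Python sketch below (my own illustration) shrinks $h$ and sees the difference quotient close in on $g’(2) = 4$:</p>

```python
def g(x):
    return x * x

def rate_estimate(f, a, h):
    """Difference quotient: rate of change of f near a with step h."""
    return (f(a + h) - f(a)) / h

# Shrinking h pushes the estimate toward the true derivative g'(2) = 4.
for h in (0.1, 0.01, 0.001):
    print(rate_estimate(g, 2, h))  # roughly 4.1, 4.01, 4.001

assert abs(rate_estimate(g, 2, 1e-6) - 4) < 1e-4
```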

<p>Our takeaway from this: besides the fact that we have a nice pointwise rate of change formula, minuscule differences inside limits usually abstract away and contribute nothing in the long run. That’s one of the reasons limits are very powerful.</p>

<h2 id="areas-of-functions--integrals">Areas of Functions &amp; Integrals</h2>

<p>What does area actually represent in mathematics? Imagine you have a chess board. The chess board has rows 1, 2, 3, all the way to 8 and similarly columns 1, 2, 3, all the way to 8.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_40_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>This means that, with multiplication, we find that the total number of slots for pieces is $8 \times 8 = 64$. This covers every single possible combination of row and column. So the value $8 \times 8$ could be considered the <em>area</em> of the board.</p>

<p>Notice that the value we can set for the row and column position is <em>discrete</em>. For example, we can’t say row 4.5 and column $\sqrt{2}$. In other words, we only accept integer values.</p>

<p>In mathematics, we extend this idea to real numbers as well. But notice that we cannot describe the area as the number of possible coordinates $(x, y)$ we can place inside the shape, because between any two real numbers there are infinitely many others, so we would have infinitely many such coordinates. (For example, right after the integer 4 there are 4.1, 4.01, 4.001, 4.0001, …) Instead, how you should think about it is that the area “fills” the entire interior of the geometric shape.</p>

<p>What is the area of a square? Take, for example, the square shape below.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_42_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>We can see that each of the side lengths is 4, so the area should be $4 \times 4 = 16$.</p>

<p>What if we cut the square diagonally?</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_44_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>This forms two right triangles. To calculate the area of each, thinking about the area of the shape as the total that fills it, it’s only natural to halve it. So the area of each triangle is $4 \times 4 \times \frac{1}{2} = 8$. In general, we can calculate the area of any right triangle with legs $a$ and $b$ with $a \times b \times \frac{1}{2}$.</p>

<p>Now pay close attention to this example. Let $f(x) = x - 1$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_46_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>We want a general formula for the area of $f(x)$ — specifically, the part from the triangle formed starting from $x = 1$ and some arbitrary point $x_0$ with $x_0 &gt; 1$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_48_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>Because of the offset of 1, the base of the triangle will be $x - 1$. The height of the triangle is just $f(x) = x - 1$, so our area is</p>

<p>\[
A(x) = \frac{1}{2} (x - 1)(x - 1) = \frac{1}{2} x^2 - x + \frac{1}{2}
\]</p>

<p>Nothing out of the ordinary. However, observe what happens if you compute the derivative:</p>

<p>\[
A’(x) = x - 1.
\]</p>

<p>It is simply the original function! Why is this happening? Our observation is that the derivative of the area of a function is equal to the function itself.</p>

<p>If we have a more complicated function like $x^2$, which is not linear, calculating its area for specific intervals is going to be difficult. Could we perhaps find a way to generalize this result to compute areas under curves?</p>

<p>From our example, we could define such an operator with the $\int$ symbol and name it <em>the integral from $1$ to $x$</em>. Then we have</p>

<p>\[
\int_1^{x} f(x) = A(x).
\]</p>

<p>So the first integration rule we have discovered is</p>

<p>\[
\int (x - 1) = \frac{1}{2} (x - 1)^2.
\]</p>

<p>If you follow the exact same steps with $f(x) = x$ instead, you find that</p>

<p>\[
\int x = \frac{1}{2} x^2.
\]</p>

<p>What other rules exist? Integration appears to be the opposite of differentiation, and inverse operations in mathematics are generally harder. Finding the exact formula for an integral without an approximation technique is generally difficult, so for this reason mathematicians have developed a set of integration rules that are applied systematically to compute the integral. Many of these rules can be found simply by differentiating candidate functions, guided by intuition, until the derivative matches the function whose area is desired.</p>

<p>Surprisingly, an approximation technique will let us formalize the definition of an integral. Consider $f(x)$ again, but partitioned into many blocks, where each block’s height is given by $f(x)$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_50_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>We might not be able to compute areas of complicated functions with standard area shape formulas, but by creating these blocks, we can calculate each individually and then sum them up to get an estimate of the area. This means the smaller the blocks, the more accurate the approximation.</p>

<p>To do so, we will start with a function $f(x) = x^2$.</p>

<p><img src="/assets/notebooks/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond/the-essence-of-mathematics-from-basic-counting-to-fourier-transforms-and-beyond_52_0.svg" alt="svg" class="jupyter" loading="lazy" /></p>

<p>To be able to sum them up smoothly, the width of each block is kept equal. Let $t$ be the width of the interval whose area we are interested in. If we were interested in a particular range $[a, b]$, it would be $t = b - a$. Then the width of each block is $\frac{t}{n}$, where $n$ is the number of blocks we want. Let’s call it $\Delta x$.</p>

<p>\[
\Delta x = \frac{t}{n}.
\]</p>

<p>This slices the interval evenly, one slice per block.</p>

<p>For a practical demonstration, let $n = 5$ and $t = 10$. Then</p>

<p>\[
\Delta x = \frac{10}{5} = 2.
\]</p>

<p>So the interval $[0, 10]$ is divided into the subintervals:</p>

<p>\[
[0, 2], [2, 4], [4, 6], [6, 8], [8, 10].
\]</p>

<p>Now there is a crucial decision for us to make. We need to choose which point we will use to calculate the height of each subinterval’s block. It doesn’t matter which point we choose as long as we keep the choice the same for all subintervals.</p>

<p>For example, if we pick the rightmost point of each subinterval, then for each one, our choices would be</p>

<p>\[
x^{\ast}_1 = 2, x^{\ast}_2 = 4, x^{\ast}_3 = 6, x^{\ast}_4 = 8, x^{\ast}_5 = 10.
\]</p>

<p>The asterisk is simply the conventional way to mark the sample point chosen in each subinterval; it carries no special mathematical meaning.</p>

<p>With these choices, the area of the first block, for instance, would be</p>

<p>\[
f(x^{\ast}_1) \cdot \Delta x.
\]</p>

<p>Remember that $\Delta x$ is the base and that $f(x^{\ast}_i)$ is the height of block $i$. So we can do the same for the rest of the blocks and get an approximation of the area under the curve.</p>

<p>\[
\int_0^{10} f(x) \approx \Delta x (f(x^{\ast}_1) + f(x^{\ast}_2) + f(x^{\ast}_3) + f(x^{\ast}_4) + f(x^{\ast}_5)).
\]</p>
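<p>Carrying out this five-block sum in Python (my own sketch; the exact value of the area, $1000/3$, comes from an integration rule we haven’t derived yet, so treat it as a spoiler):</p>

```python
def f(x):
    return x**2

dx = 2                      # block width: t / n = 10 / 5
samples = [2, 4, 6, 8, 10]  # rightmost point of each subinterval

estimate = dx * sum(f(x) for x in samples)
print(estimate)  # 440

# The true area turns out to be 1000/3, about 333.33. With only five
# blocks (and rightmost sample points on an increasing function) the
# estimate overshoots considerably; more blocks will shrink the error.
```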

<p>This result can be generalized. When we have a sum in mathematics like</p>

<p>\[
1 + 2 + 3 + \ldots + n
\]</p>

<p>we can denote it as</p>

<p>\[
\sum_{i=1}^n i.
\]</p>

<p>The $i = 1$ underneath gives the initial value of the index variable. The expression to the right of the Greek letter $\Sigma$ is what is being added each time. Terms keep getting added until $i$ reaches the $n$ on top.</p>

<p>So our previous expression can be simplified to</p>

<p>\[
\int_0^{10} f(x) \approx \Delta x \sum_{i=1}^5 f(x^{\ast}_i)
\]</p>

<p>We don’t need to restrict it to sums of 5 blocks:</p>

<p>\[
\int_0^{10} f(x) \approx \Delta x \sum_{i=1}^n f(x^{\ast}_i),
\]</p>

<p>where $n$ is the number of blocks that were chosen as before.</p>

<p>Then we apply the same idea that we used for the derivative, where we shrank $h$ to 0; here, we instead let the number of blocks grow without bound. That way, it is no longer an approximation but precisely equal to the integral itself. So we have</p>

<p>\[
\int_0^{10} f(x) = \lim_{n \to \infty} \Delta x \sum_{i=1}^n f(x^{\ast}_i),
\]</p>

<p>and more generally,</p>

<p>\[
\int_a^b f(x) = \lim_{n \to \infty} \Delta x \sum_{i=1}^n f(x^{\ast}_i),
\]</p>

<p>where $\Delta x = \frac{b - a}{n}$.</p>
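<p>This definition translates directly into code. Here is a Python sketch (the helper name <code class="language-plaintext highlighter-rouge">block_sum</code> is my own) that shows the estimates approaching the true area as $n$ grows:</p>

```python
def block_sum(f, a, b, n):
    """Sum of n equal-width blocks, sampling f at each rightmost point."""
    dx = (b - a) / n
    return dx * sum(f(a + i * dx) for i in range(1, n + 1))

# More blocks give a better approximation of the area under x^2 on
# [0, 10]; the estimates head toward 1000/3 = 333.333...
for n in (5, 50, 5000):
    print(n, block_sum(lambda x: x**2, 0, 10, n))

assert abs(block_sum(lambda x: x**2, 0, 10, 100_000) - 1000 / 3) < 0.01
```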

<p>Not only have we obtained a great approximation method, but we now have a sufficiently rigorous definition of an integral. Our updated view of an integral: the sum of blocks that partition a function $f(x)$ into equal subintervals, taken as the blocks become ever more numerous (and thinner).</p>

<p>Before we continue, ask yourself this question to see if you truly understand what the limit does here: If we made $n$ smaller and smaller instead of larger and larger, what would happen to our approximation?</p>

<p>Now, notice that the integral is a sum of blocks; more specifically, a sum of products of bases and heights. In fact, since $\Delta x$ is a variable defined in terms of $n$, when the base becomes really, really small (as in $n \to \infty$), we refer to it as $dx$ instead. So you can imagine it (not literally the same) as</p>

<p>\[
dx = \lim_{\Delta x \to 0} \Delta x.
\]</p>

<p>The reason this convention was created was to be used directly inside the notation of the integral. Our integral notation $\int f(x)$ at the moment does not hint at which symbol is the variable of integration and which are constants (for example, we could have $\int axyz$), so it is written as</p>

<p>\[
\int_a^b f(x)\, dx
\]</p>

<p>instead.</p>

<p>Unfortunately, going deeper into how one finds a solution for an integral without reverse-engineering it from a derivative would require more advanced knowledge from real analysis, to decompose the integral into its rigorous lower-sum and upper-sum definition. That is beyond the scope of this article, so our investigation into the rigor of it must stop here. The good news is that with our intuitive understanding of trigonometric functions, $e$, complex numbers, and differentiation/integration, we are ready to see much more advanced concepts.</p>

<h2 id="the-fourier-transform">The Fourier Transform</h2>

<p><strong>This section is under construction.</strong></p>

<p>The main question the Fourier transform wanted to answer is: If a signal is made of waves, what waves are they, and how strong is each one?</p>

<p>In other words, it decomposes the function and finds its recipe; it transforms a function from the time/space domain to the frequency domain.</p>

<p>As an analogy, consider what happens when you hear a chord. You only hear one sound, but it is actually made of different notes, each one having its own pitch (called the frequency), and its own “loudness” (amplitude). Your brain automatically decomposes that chord, but Fourier does it mathematically.</p>

<p>A signal is represented as a function of time, $f(t)$, and the Fourier transform turns it into a function of frequency, $F(\omega)$. So instead of asking what the value is at time $t$, you ask how much of frequency $\omega$ is present.</p>

<p>The reason sine/cosine waves are used is because they are perfectly smooth, repeat forever, and combine to approximate almost any signal. If you are familiar with the Taylor series, it uses the exact same idea but instead with a special kind of polynomial approximation. Sine/cosine waves will be more generally referred to as sinusoids.</p>

<p>Its formula is</p>

<p>\[
F(\omega) = \int_{- \infty}^\infty f(t) e^{-i \omega t}\, dt
\]</p>

<p>It’s really just multiplying by a complex wave $e^{-i \omega t}$ and then integrating to measure similarity. $e^{-i \omega t}$ traces a circle where the speed of rotation is the frequency $\omega$ and the radius is its amplitude. So if we draw it as $t$ increases, a high frequency means a fast-spinning circle, a low frequency means a slow-spinning circle, a negative frequency means rotation in the opposite direction, and zero frequency means a constant point. This is useful because a circle is the geometric representation of a sinusoid.</p>
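<p>We can see the “measure similarity” idea numerically with a block sum like the one from the integrals chapter. The Python sketch below is my own illustration: it restricts the integral to one period of a signal and normalizes by the period (a Fourier-series-style convention, not the full transform above):</p>

```python
import cmath
import math

def similarity(f, omega, T=1.0, n=20000):
    """Block-sum estimate of (1/T) * integral over [0, T] of f(t) * e^(-i*omega*t) dt."""
    dt = T / n
    return sum(f(i * dt) * cmath.exp(-1j * omega * i * dt) for i in range(n)) * dt / T

# A "signal" made of a single wave at 3 cycles per second.
signal = lambda t: math.cos(2 * math.pi * 3 * t)

present = similarity(signal, 2 * math.pi * 3)  # probe at the matching frequency
absent = similarity(signal, 2 * math.pi * 5)   # probe at a mismatched frequency

# The matching probe measures a strong similarity (magnitude 1/2 with
# this convention); the mismatched probe measures almost nothing.
assert abs(present) > 0.4
assert abs(absent) < 0.01
```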

<p>We can see this better by recalling Euler’s formula:</p>

<p>\[
e^{i \omega t} = \cos(\omega t) + i \sin(\omega t)
\]</p>
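<p>Adding this formula to its counterpart for $e^{-i \omega t}$ (and subtracting, for the sine) shows that every real sinusoid is a sum of two counter-rotating circles, which is also where negative frequencies come from:</p>

<p>\[
\cos(\omega t) = \frac{e^{i \omega t} + e^{-i \omega t}}{2}, \qquad
\sin(\omega t) = \frac{e^{i \omega t} - e^{-i \omega t}}{2i}
\]</p>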

<h2 id="appendix">Appendix</h2>

<h3 id="a--axioms-of-real-numbers">A — Axioms of Real Numbers</h3>

<p>In algebra, the real numbers (2, 4.5, 2.222 repeating) are rigorously defined with a set of axioms. Some of these are:</p>

<ul>
  <li>Distributivity: $a \cdot (b + c) = a \cdot b + a \cdot c$</li>
  <li>Multiplicative identity: $1 \cdot a = a$</li>
  <li>Additive inverse: For every $a$, there exists $-a$ such that $a + (-a) = 0$.</li>
</ul>

<p>These can be used to extend the real number line to include negative numbers. We define -1 as the additive inverse of 1:</p>

<p>\[
1 + (-1) = 0.
\]</p>

<p>Now, using distributivity:</p>

<p>\[
0 \cdot 1 = (1 + (-1)) \cdot 1 = 1 \cdot 1 + (-1) \cdot 1.
\]</p>

<p>We know $0 \cdot 1 = 0$ and $1 \cdot 1 = 1$, so</p>

<p>\[
0 = 1 + (-1) \cdot 1
\]</p>

<p>implies</p>

<p>\[
(-1) \cdot 1 = -1.
\]</p>

<p>For multiplying a negative by another negative, we want $(-1) \cdot (-1)$. Again, we start from distributivity:</p>

<p>\[
0 = (-1) \cdot 0 = (-1) \cdot (1 + (-1)) = (-1) \cdot 1 + (-1) \cdot (-1).
\]</p>

<p>We know $(-1) \cdot 1 = -1$, so</p>

<p>\[
0 = -1 + (-1) \cdot (-1)
\]</p>

<p>implies</p>

<p>\[
(-1) \cdot (-1) = 1. \blacksquare
\]</p>
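<p>For readers who enjoy machine-checked rigor, the same fact can be verified in a proof assistant. A one-line sketch in Lean 4 (assuming only core Lean’s <code class="language-plaintext highlighter-rouge">Int</code> type):</p>

```lean
-- (-1) * (-1) = 1 over the integers, checked by evaluation.
example : (-1 : Int) * (-1) = 1 := by decide
```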

<h2 id="credits">Credits</h2>

<ul>
  <li><a href="https://wallpapercave.com/the-universe-wallpaper">Universe Wallpaper</a></li>
</ul>]]></content><author><name>Quarterstar</name><email>quarterstar@proton.me</email></author><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Template Specialization &amp;amp; Concepts in C++</title><link href="https://www.quarterstar.tech/articles/template-specialization-and-concepts-in-cpp/" rel="alternate" type="text/html" title="Template Specialization &amp;amp; Concepts in C++" /><published>2025-11-21T00:00:00+01:00</published><updated>2025-11-21T00:00:00+01:00</updated><id>https://www.quarterstar.tech/articles/template-specialization-and-concepts-in-cpp</id><content type="html" xml:base="https://www.quarterstar.tech/articles/template-specialization-and-concepts-in-cpp/"><![CDATA[<p>If you come from Rust, you might miss the value of traits. Thankfully, C++ has something very similar to them. Suppose you are creating your own vector type (for the thousandth time). You probably do something like:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Vector3</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Vector4</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">w</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<h2 id="template-specialization">Template Specialization</h2>

<p>C++ has a feature called template specialization that allows you to specify behavior for particular instances of the template:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Foo</span> <span class="p">{</span>
  <span class="k">static</span> <span class="kt">void</span> <span class="n">display</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">println</span><span class="p">(</span><span class="s">"General template"</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Foo</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">static</span> <span class="kt">void</span> <span class="n">display</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">println</span><span class="p">(</span><span class="s">"Specialized for int"</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The C++ compiler knows that you are specifying a specialization when you use empty <code class="language-plaintext highlighter-rouge">&lt;&gt;</code>. If you want to be really fancy, you might combine templates with CRTP for your vector classes like so:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">Derived</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">IVector</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">x</span><span class="p">{};</span>
  <span class="n">T</span> <span class="n">y</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">IVector</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Derived</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Derived</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">Derived</span><span class="p">{</span><span class="n">self</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">y</span><span class="p">};</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector2</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="k">using</span> <span class="n">Base</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">;</span>
  <span class="k">using</span> <span class="n">Base</span><span class="o">::</span><span class="n">Base</span><span class="p">;</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector3</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector3</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">z</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">Vector3</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Vector3</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Vector3</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector3</span><span class="o">&gt;::</span><span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">other</span><span class="p">);</span>
    <span class="n">result</span><span class="p">.</span><span class="n">z</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">z</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">z</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector4</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector4</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">z</span><span class="p">{};</span>
  <span class="n">T</span> <span class="n">w</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">Vector4</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Vector4</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Vector4</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector4</span><span class="o">&gt;::</span><span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">other</span><span class="p">);</span>
    <span class="n">result</span><span class="p">.</span><span class="n">w</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">w</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>
</code></pre></div></div>

<p>Since inheritance (a runtime mechanism) and templates (a compile-time mechanism) don’t mix well, we have applied CRTP. But a problem still remains: how do we ensure that the user of our library supplies correct types? In our case, the ones that make sense are integers and floats.</p>

<p>This is where concepts come in. For the basic types like <code class="language-plaintext highlighter-rouge">int</code>, <code class="language-plaintext highlighter-rouge">unsigned int</code>, and so on, the C++ standard groups them under one general name called <em>integral types</em>.</p>

<h2 id="concepts">Concepts</h2>

<p>The exact implementation of the things we will discuss ultimately depends on the standard library shipped with your compiler, but we will use some basic patterns that are common across all of them. Conceptually, <code class="language-plaintext highlighter-rouge">std::is_integral</code>, a utility for determining whether a type is one of the integral types, is defined like so:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">is_integral</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">false_type</span> <span class="p">{</span> <span class="p">};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">is_integral</span><span class="o">&lt;</span><span class="kt">bool</span><span class="o">&gt;</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span> <span class="p">{</span> <span class="p">};</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">is_integral</span><span class="o">&lt;</span><span class="kt">char</span><span class="o">&gt;</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span> <span class="p">{</span> <span class="p">};</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">is_integral</span><span class="o">&lt;</span><span class="kt">signed</span> <span class="kt">char</span><span class="o">&gt;</span> <span class="o">:</span> <span class="n">std</span><span class="o">::</span><span class="n">true_type</span> <span class="p">{</span> <span class="p">};</span>
</code></pre></div></div>

<p>Here, <code class="language-plaintext highlighter-rouge">std::false_type</code> is what is known as a <em>type trait</em>. Type traits are compile-time tools that allow you to ask questions about a particular type. Most type traits are implemented with a static boolean member <code class="language-plaintext highlighter-rouge">value</code>, which is precisely what <code class="language-plaintext highlighter-rouge">std::false_type</code> does:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">false_type</span> <span class="p">{</span>
    <span class="k">static</span> <span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">value</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">using</span> <span class="n">type</span> <span class="o">=</span> <span class="n">false_type</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>You can probably guess the implementation of <code class="language-plaintext highlighter-rouge">std::true_type</code> based on this as well. The idea is that if we make specializations inherit <code class="language-plaintext highlighter-rouge">std::true_type</code> instead, the base <code class="language-plaintext highlighter-rouge">value</code> becomes true, which allows the compiler to do some type checking we will see just in a second.</p>

<p>Any type that inherits from a type trait is also a type trait. Consequently, <code class="language-plaintext highlighter-rouge">std::is_integral</code> is a type trait.</p>

<p>The value of a type trait is often exposed through a variable template with a <code class="language-plaintext highlighter-rouge">_v</code> suffix in the standard library:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="kr">inline</span> <span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">is_integral_v</span> <span class="o">=</span> <span class="n">is_integral</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;::</span><span class="n">value</span><span class="p">;</span>
</code></pre></div></div>

<p>Now, where do concepts come into play? Concepts allow constraining template parameters with compile-time boolean conditions built from type traits. That means if we define (using <code class="language-plaintext highlighter-rouge">std::is_floating_point_v</code>, the floating-point counterpart of <code class="language-plaintext highlighter-rouge">std::is_integral_v</code>):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">concept</span> <span class="n">int_or_float</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">is_integral_v</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="o">||</span> <span class="n">std</span><span class="o">::</span><span class="n">is_floating_point_v</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">;</span>
</code></pre></div></div>

<p>And use the concept in place of our <code class="language-plaintext highlighter-rouge">typename</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">Derived</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">IVector</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">x</span><span class="p">{};</span>
  <span class="n">T</span> <span class="n">y</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">IVector</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Derived</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Derived</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">Derived</span><span class="p">{</span><span class="n">self</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">y</span><span class="p">};</span>
  <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The compiler, fundamentally speaking, will evaluate the boolean condition at compile time. If it is false, you will get a compilation error. We can make the rest of our vector implementation use this concept:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">Derived</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">IVector</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">x</span><span class="p">{};</span>
  <span class="n">T</span> <span class="n">y</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">IVector</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Derived</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Derived</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">Derived</span><span class="p">{</span><span class="n">self</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">y</span><span class="p">};</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector2</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="k">using</span> <span class="n">Base</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">;</span>
  <span class="k">using</span> <span class="n">Base</span><span class="o">::</span><span class="n">Base</span><span class="p">;</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector3</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector3</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">z</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">Vector3</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Vector3</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Vector3</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector3</span><span class="o">&gt;::</span><span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">other</span><span class="p">);</span>
    <span class="n">result</span><span class="p">.</span><span class="n">z</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">z</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">z</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="n">int_or_float</span> <span class="n">T</span><span class="p">&gt;</span> <span class="k">struct</span> <span class="nc">Vector4</span> <span class="o">:</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector4</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
  <span class="n">T</span> <span class="n">z</span><span class="p">{};</span>
  <span class="n">T</span> <span class="n">w</span><span class="p">{};</span>

  <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="k">this</span> <span class="k">const</span> <span class="n">Vector4</span> <span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="k">const</span> <span class="n">Vector4</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span> <span class="k">noexcept</span>
      <span class="o">-&gt;</span> <span class="n">Vector4</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">IVector</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="n">Vector4</span><span class="o">&gt;::</span><span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">other</span><span class="p">);</span>
    <span class="n">result</span><span class="p">.</span><span class="n">w</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">w</span> <span class="o">+</span> <span class="n">other</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">};</span> <span class="c1">// AGGREGATE</span>
</code></pre></div></div>

<p>You will now notice the constraint being enforced in the following code:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdfloat&gt;</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">"vector.hpp"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// this is OK</span>
  <span class="k">auto</span> <span class="n">a</span><span class="p">{</span><span class="n">Vector4</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">float32_t</span><span class="o">&gt;</span><span class="p">{</span><span class="mf">0.5f</span><span class="mi">32</span><span class="p">,</span> <span class="mf">0.5f</span><span class="mi">32</span><span class="p">,</span> <span class="mf">1.0f</span><span class="mi">32</span><span class="p">,</span> <span class="mf">1.0f</span><span class="mi">32</span><span class="p">}};</span>

  <span class="c1">// this fails to compile because the constraints are not satisfied</span>
  <span class="k">auto</span> <span class="n">b</span><span class="p">{</span><span class="n">Vector4</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="p">{</span><span class="s">"A"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">,</span> <span class="s">"C"</span><span class="p">}};</span>
  
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So concepts give us compile-time guarantees that were much harder to express before C++20.</p>

<p>Now that you understand how they work and how to use them, enjoy applying them to your codebase (if you don’t have a 32-year-old monolith stuck in C++11). Below is a small list of useful standard-library concepts.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">std::copyable&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> is copy-constructible and copy-assignable.</li>
  <li><code class="language-plaintext highlighter-rouge">std::movable&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> is move-constructible and move-assignable.</li>
  <li><code class="language-plaintext highlighter-rouge">std::equality_comparable&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> supports <code class="language-plaintext highlighter-rouge">==</code> and <code class="language-plaintext highlighter-rouge">!=</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">std::totally_ordered&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> supports <code class="language-plaintext highlighter-rouge">&lt;</code>, <code class="language-plaintext highlighter-rouge">&lt;=</code>, <code class="language-plaintext highlighter-rouge">&gt;</code>, <code class="language-plaintext highlighter-rouge">&gt;=</code>, <code class="language-plaintext highlighter-rouge">==</code>, and <code class="language-plaintext highlighter-rouge">!=</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">std::default_initializable&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> can be default-constructed (<code class="language-plaintext highlighter-rouge">T t{}</code> or <code class="language-plaintext highlighter-rouge">T t;</code>).</li>
  <li><code class="language-plaintext highlighter-rouge">std::destructible&lt;T&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> can be destroyed with a destructor.</li>
  <li><code class="language-plaintext highlighter-rouge">std::assignable_from&lt;T, U&gt;</code>: Can assign a value of type <code class="language-plaintext highlighter-rouge">U</code> to a variable of type <code class="language-plaintext highlighter-rouge">T</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">std::same_as&lt;T, U&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> and <code class="language-plaintext highlighter-rouge">U</code> are exactly the same type.</li>
  <li><code class="language-plaintext highlighter-rouge">std::derived_from&lt;T, U&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> is derived from <code class="language-plaintext highlighter-rouge">U</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">std::convertible_to&lt;T, U&gt;</code>: <code class="language-plaintext highlighter-rouge">T</code> can be implicitly converted to <code class="language-plaintext highlighter-rouge">U</code>.</li>
</ul>]]></content><author><name>Quarterstar</name><email>quarterstar@proton.me</email></author><category term="cpp" /><category term="programming" /><summary type="html"><![CDATA[The C++20 feature known as concepts presents new compile-time opportunities]]></summary></entry><entry><title type="html">Understanding the Nix Store by Hunting Packages Down</title><link href="https://www.quarterstar.tech/articles/understanding-the-nix-store-by-hunting-packages-down/" rel="alternate" type="text/html" title="Understanding the Nix Store by Hunting Packages Down" /><published>2025-10-26T00:00:00+02:00</published><updated>2025-10-26T00:00:00+02:00</updated><id>https://www.quarterstar.tech/articles/understanding-the-nix-store-by-hunting-packages-down</id><content type="html" xml:base="https://www.quarterstar.tech/articles/understanding-the-nix-store-by-hunting-packages-down/"><![CDATA[<p>In NixOS, packages installed on the system do not follow the standard Linux filesystem hierarchy. I see this nonstandard approach perplex newcomers who are not familiar with it in a practical way. Thankfully, Nix(OS) provides a set of utilities to traverse its special quirks, in the form of commands. This article focuses on such commands, the motivation behind using them, and how they relate to the Nix store. It also aims to show (partially) why it is designed this way by using concrete examples.</p>

<p>By the end of this, you will know how to effectively navigate the Nix store and understand the underlying principles of its package management system; you will learn the Nix store’s structure, the relationship between derivations and store paths, and how symlinks and profiles work to manage packages.</p>

<p>This is my first blog post, so I will keep it concise to get feedback on the pacing, tone, and the content being covered. I hope it’s helpful 🙏</p>

<h2 id="prerequisites">Prerequisites</h2>

<p>Before I begin, a few infamous terms need to be gotten out of the way. A <em>derivation</em> is a build recipe for software. It consists of inputs and outputs. Outputs, as we will discover, correspond directly to paths in the Nix store (<em>store paths</em>). Store paths are typically directories, and they are one kind of object in the Nix store, or <em>store object</em>. There are other kinds of store objects; in fact, derivations are themselves store objects. Store paths are a byproduct of realising derivations, and they contain the programs that we use.</p>
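<p>To make the term concrete, here is a minimal, hypothetical build recipe written against the low-level <code class="language-plaintext highlighter-rouge">derivation</code> primitive (real packages in nixpkgs use wrappers such as <code class="language-plaintext highlighter-rouge">stdenv.mkDerivation</code>, but the shape is the same): the attributes are the inputs, and whatever the builder writes to <code class="language-plaintext highlighter-rouge">$out</code> becomes the output store path. The name and builder here are made up for illustration.</p>

<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A toy derivation; evaluating it produces a .drv,
# and building (realising) it produces the $out store path.
derivation {
  name = "hello-example";
  system = "x86_64-linux";
  builder = "/bin/sh";
  args = [ "-c" "echo hello &gt; $out" ];
}
</code></pre></div></div>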

<p>Normally, for a top-down approach like this, you would start with something everyone is familiar with, such as software applications. Instead, I will take the opposite approach, which gives further motivation for the concepts in this blog.</p>

<h2 id="scenario-1---finding-a-library">Scenario 1 - Finding a Library</h2>

<p>For the first scenario, suppose you are a programmer and have installed a library like <code class="language-plaintext highlighter-rouge">glm</code>, a mathematics library for C/C++, on your NixOS system. If you do not understand what that means, imagine code that exists on your system as a package that you cannot directly run. Since it is not an executable, you cannot simply cheat by reading a symlink on your <code class="language-plaintext highlighter-rouge">PATH</code> (though specialized environment variables exist). So unless we want to run <code class="language-plaintext highlighter-rouge">find</code> in <code class="language-plaintext highlighter-rouge">/nix/store</code>, wait until the heat death of the universe for it to finish, and then sift through duplicate installations of the library to find the one we are looking for, we need to learn some Nix commands.</p>

<p>First, we need to enter a Nix REPL environment to resolve the package directly. REPL stands for Read-Eval-Print Loop, which is a powerful environment for testing various Nix expressions. We can access it with the primary <code class="language-plaintext highlighter-rouge">nix</code> command.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nix repl
nix-repl&gt; pkgs <span class="o">=</span> import &lt;nixpkgs&gt; <span class="o">{}</span>
nix-repl&gt; pkgs.glm
«derivation /nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv»
</code></pre></div></div>

<p>This article assumes a basic familiarity with the Nix language, so I will not interleave a crash course here. As a side note, attribute values in Nix are evaluated lazily, so we need not worry about the import taking a long time: lazy evaluation guarantees that expressions are evaluated only when they are needed, not when they are defined.</p>
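<p>Laziness is easy to see in the REPL itself. In this small sketch, an attribute whose value would throw an error is never forced, because we only ever ask for its sibling:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nix-repl&gt; boom = throw "never evaluated"
nix-repl&gt; { a = boom; b = 1 + 1; }.b
2
</code></pre></div></div>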

<p>The last command yields a path because the interpreter recognizes the value as a special kind of attribute set: a derivation. Indeed, we can confirm that by running another command in the environment.</p>

<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">nix-repl</span><span class="o">&gt;</span> <span class="nv">lib</span> <span class="o">=</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">lib</span>
<span class="nv">nix-repl</span><span class="o">&gt;</span> <span class="nv">lib</span><span class="o">.</span><span class="nv">attrsets</span><span class="o">.</span><span class="nv">isDerivation</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">glm</span>
<span class="kc">true</span>
</code></pre></div></div>

<aside class="callout callout--tip" role="note" aria-label="Tip">
  <div class="callout__inner">
    <div class="callout-header">
      <span class="callout__icon" aria-hidden="true">💡</span><strong class="callout__title">Tip</strong></div>

    <div class="callout__content">
<p>Use <a href="https://noogle.dev/">noogle.dev</a> for looking through <code class="language-plaintext highlighter-rouge">lib</code> and <a href="https://mynixos.com/">mynixos.com</a> for looking through <code class="language-plaintext highlighter-rouge">pkgs</code>.</p>
</div>
  </div>
</aside>

<p>From the penultimate set of commands, we infer that the derivation is located at <code class="language-plaintext highlighter-rouge">/nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv</code>. (Derivations always use the <code class="language-plaintext highlighter-rouge">.drv</code> file extension.)</p>

<p>Every derivation (<code class="language-plaintext highlighter-rouge">.drv</code>) has a set of outputs. When a derivation is realised (i.e. built), these outputs become the <em>store paths</em> introduced earlier.</p>

<p>The next step is finding the actual store path where the library file (<code class="language-plaintext highlighter-rouge">.a</code> or <code class="language-plaintext highlighter-rouge">.so</code>) resides. We can use the <code class="language-plaintext highlighter-rouge">nix-store</code> command to achieve this.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nix-store <span class="nt">--query</span> <span class="nt">--outputs</span> /nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv
/nix/store/gkahdgly8x8z8b6cvgab4gij0niajx7x-glm-1.0.1-doc
/nix/store/nlrq8s273x3kbm2w4g90pymx1ab9ww3p-glm-1.0.1
</code></pre></div></div>

<p>The latter is the one we are looking for, since the former contains the documentation. Note also that nothing stops different versions of the same package from coexisting in the Nix store, each under its own store path. It is now trivial to list the <code class="language-plaintext highlighter-rouge">lib</code> directory, which reveals that the library we are searching for is there and is a static library.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> /nix/store/nlrq8s273x3kbm2w4g90pymx1ab9ww3p-glm-1.0.1/lib
libglm.a
pkgconfig
</code></pre></div></div>

<p>For programmers, it should be completely unsurprising that a static library is not on any search path: it does not need to be resolved automatically at run time. Later, we will see that this is not the case for the other type of library.</p>

<p>It is also possible to inspect the derivation itself in a human-readable JSON form with the <code class="language-plaintext highlighter-rouge">nix</code> command, which lists the mysterious outputs that were mentioned before:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nix derivation show /nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv
...lots of json output
</code></pre></div></div>

<p>Here is a snippet from the part of the JSON that specifically contains the outputs:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">"outputs"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nl">"doc"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/nix/store/gkahdgly8x8z8b6cvgab4gij0niajx7x-glm-1.0.1-doc"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"out"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/nix/store/nlrq8s273x3kbm2w4g90pymx1ab9ww3p-glm-1.0.1"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>It is not hard to deduce from this that outputs have a one-to-one correspondence with store paths. These store paths are what get created when you realise a derivation with:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nix-store <span class="nt">--realize</span> /nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv
</code></pre></div></div>

<p>Note that the <code class="language-plaintext highlighter-rouge">.drv</code> file that has produced the store path we are looking for is called the <em>deriver</em>. The inverse operation—finding a derivation from a store path—simply involves asking for the deriver.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nix-store <span class="nt">--query</span> <span class="nt">--deriver</span> /nix/store/nlrq8s273x3kbm2w4g90pymx1ab9ww3p-glm-1.0.1/
/nix/store/25rdysybbl5aangkgkdznc57xfihq2zk-glm-1.0.1.drv
</code></pre></div></div>

<p>Putting everything together, we can see the relationships we have traced in this concise graph:</p>

<p><img alt="Derivation Symlinks" class="graphviz" loading="lazy" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjwhRE9DVFlQRSBzdmcgUFVCTElDICItLy9XM0MvL0RURCBTVkcgMS4xLy9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9HcmFwaGljcy9TVkcvMS4xL0RURC9zdmcxMS5kdGQiPgo8IS0tIEdlbmVyYXRlZCBieSBncmFwaHZpeiB2ZXJzaW9uIDEyLjIuMSAoMCkKIC0tPgo8IS0tIFRpdGxlOiBHIFBhZ2VzOiAxIC0tPgo8c3ZnIHdpZHRoPSIxMDc2cHQiIGhlaWdodD0iMzUycHQiCiB2aWV3Qm94PSIwLjAwIDAuMDAgMTA3Ni4wMCAzNTIuMDYiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgeG1sbnM6eGxpbms9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGxpbmsiPgo8ZyBpZD0iZ3JhcGgwIiBjbGFzcz0iZ3JhcGgiIHRyYW5zZm9ybT0ic2NhbGUoMSAxKSByb3RhdGUoMCkgdHJhbnNsYXRlKDQgMzQ4LjA2KSI+Cjx0aXRsZT5HPC90aXRsZT4KPGcgaWQ9ImNsdXN0MSIgY2xhc3M9ImNsdXN0ZXIiPgo8dGl0bGU+Y2x1c3Rlcl9kZXJpdmF0aW9uPC90aXRsZT4KPHBvbHlnb24gZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSI4LC04IDgsLTMzNi4wNiAxMDYwLC0zMzYuMDYgMTA2MCwtOCA4LC04Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjUzNCIgeT0iLTMxOC43NiIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj5EZXJpdmF0aW9uIOKGkiBvdXRwdXRzIChsaWJyYXJ5IGV4YW1wbGUpPC90ZXh0Pgo8L2c+CjwhLS0gZHJ2IC0tPgo8ZyBpZD0ibm9kZTEiIGNsYXNzPSJub2RlIj4KPHRpdGxlPmRydjwvdGl0bGU+CjxlbGxpcHNlIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIGN4PSI1MzIiIGN5PSItMjcyLjc2IiByeD0iMzE3LjMxIiByeT0iMzAuMDUiLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNTMyIiB5PSItMjc2LjcxIiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPi9uaXgvc3RvcmUvMjVyZHlzeWJibDVhYW5na2drZHpuYzU3eGZpaHEyemsmIzQ1O2dsbSYjNDU7MS4wLjEuZHJ2PC90ZXh0Pgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSI1MzIiIHk9Ii0yNTkuNDYiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+KGRlcml2YXRpb24pPC90ZXh0Pgo8L2c+CjwhLS0gb3V0cHV0c19kb2MgLS0+CjxnIGlkPSJub2RlMiIgY2xhc3M9Im5vZGUiPgo8dGl0bGU+b3V0cHV0c19kb2M8L3RpdGxlPgo8ZWxsaXBzZSBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBjeD0iMjA0IiBjeT0iLTE1OS40MSIgcng9IjE4Ny45MSIgcnk9IjMwLjA1Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjIwNCIgeT0iLTE2My4zNiIgZm9udC1mYW1pbHk9IlRpbWV
zLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj4vbml4L3N0b3JlL2drYWhkZ2x5Li4uJiM0NTtnbG0mIzQ1OzEuMC4xJiM0NTtkb2M8L3RleHQ+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjIwNCIgeT0iLTE0Ni4xMSIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj4ob3V0cHV0OiBkb2MpPC90ZXh0Pgo8L2c+CjwhLS0gZHJ2JiM0NTsmZ3Q7b3V0cHV0c19kb2MgLS0+CjxnIGlkPSJlZGdlMSIgY2xhc3M9ImVkZ2UiPgo8dGl0bGU+ZHJ2JiM0NTsmZ3Q7b3V0cHV0c19kb2M8L3RpdGxlPgo8cGF0aCBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBkPSJNNDQ4LjM4LC0yNDMuMzdDNDAxLC0yMjcuMjkgMzQxLjc0LC0yMDcuMTcgMjkzLjQzLC0xOTAuNzciLz4KPHBvbHlnb24gZmlsbD0iYmxhY2siIHN0cm9rZT0iYmxhY2siIHBvaW50cz0iMjk0LjcsLTE4Ny41IDI4NC4xLC0xODcuNiAyOTIuNDUsLTE5NC4xMyAyOTQuNywtMTg3LjUiLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNDI4LjE0IiB5PSItMjExLjQxIiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPm91dHB1dDogZG9jPC90ZXh0Pgo8L2c+CjwhLS0gb3V0cHV0c19vdXQgLS0+CjxnIGlkPSJub2RlMyIgY2xhc3M9Im5vZGUiPgo8dGl0bGU+b3V0cHV0c19vdXQ8L3RpdGxlPgo8ZWxsaXBzZSBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBjeD0iNzMxIiBjeT0iLTE1OS40MSIgcng9IjMyMS4wMyIgcnk9IjMwLjA1Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjczMSIgeT0iLTE2My4zNiIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj4vbml4L3N0b3JlL25scnE4czI3M3gza2JtMnc0ZzkwcHlteDFhYjl3dzNwJiM0NTtnbG0mIzQ1OzEuMC4xPC90ZXh0Pgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSI3MzEiIHk9Ii0xNDYuMTEiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+KG91dHB1dDogb3V0KTwvdGV4dD4KPC9nPgo8IS0tIGRydiYjNDU7Jmd0O291dHB1dHNfb3V0IC0tPgo8ZyBpZD0iZWRnZTIiIGNsYXNzPSJlZGdlIj4KPHRpdGxlPmRydiYjNDU7Jmd0O291dHB1dHNfb3V0PC90aXRsZT4KPHBhdGggZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgZD0iTTU4NC4wMiwtMjQyLjY1QzYxMC4wOCwtMjI4LjA3IDY0MS44LC0yMTAuMzIgNjY5LjEsLTE5NS4wNSIvPgo8cG9seWdvbiBmaWxsPSJibGFjayIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSI2NzAuNiwtMTk4LjIyIDY3Ny42MSwtMTkwLjI4IDY2Ny4xOCwtMTkyLjExIDY3MC42LC0xOTguMjIiLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNjgzLjE2IiB5PSItMjExLjQxIiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPm91dHB1dDogb3V0PC90ZXh0Pgo8L2c+CjwhLS0gbGliX2ZpbGUgLS0+CjxnIGlkPSJ
ub2RlNCIgY2xhc3M9Im5vZGUiPgo8dGl0bGU+bGliX2ZpbGU8L3RpdGxlPgo8ZWxsaXBzZSBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBjeD0iNzMxIiBjeT0iLTQ2LjA1IiByeD0iMjM5Ljg5IiByeT0iMzAuMDUiLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNzMxIiB5PSItNTAiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+L25peC9zdG9yZS9ubHJxOHMyNzN4Li4uJiM0NTtnbG0mIzQ1OzEuMC4xL2xpYi9saWJnbG0uYTwvdGV4dD4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNzMxIiB5PSItMzIuNzUiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+KGFjdHVhbCBmaWxlKTwvdGV4dD4KPC9nPgo8IS0tIG91dHB1dHNfb3V0JiM0NTsmZ3Q7bGliX2ZpbGUgLS0+CjxnIGlkPSJlZGdlMyIgY2xhc3M9ImVkZ2UiPgo8dGl0bGU+b3V0cHV0c19vdXQmIzQ1OyZndDtsaWJfZmlsZTwvdGl0bGU+CjxwYXRoIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIGQ9Ik03MzEsLTEyOC44NkM3MzEsLTExNi4xNyA3MzEsLTEwMS4xNSA3MzEsLTg3LjQ2Ii8+Cjxwb2x5Z29uIGZpbGw9ImJsYWNrIiBzdHJva2U9ImJsYWNrIiBwb2ludHM9IjczNC41LC04Ny44NyA3MzEsLTc3Ljg3IDcyNy41LC04Ny44NyA3MzQuNSwtODcuODciLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iNzYxIiB5PSItOTguMDUiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+Y29udGFpbnM8L3RleHQ+CjwvZz4KPC9nPgo8L3N2Zz4K" /></p>

<p>But why is it stored this way? We have the initial motivation for researching it, so all that remains is reverse engineering the design choice.</p>

<h2 id="scenario-2---understanding-benefits-with-an-example">Scenario 2 - Understanding Benefits with an Example</h2>

<p>Now suppose you had a store path with an executable, say <code class="language-plaintext highlighter-rouge">git</code>. We can leverage the fact that it is on the <code class="language-plaintext highlighter-rouge">PATH</code> to track it down more easily and learn more about our system. First, we find where its reference on the <code class="language-plaintext highlighter-rouge">PATH</code> lives.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>which git
/home/user/.nix-profile/bin/git
</code></pre></div></div>

<p>This is part of the running user profile. Profiles in Nix are symlinks to specific generations. More precisely, a system can hold several profiles (we will meet both a user profile and a system profile), and each profile can have more than one generation. Generations are simply snapshots of a profile: each one points to a particular set of installed packages or a system configuration.</p>
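<p>You can inspect the generations of your user profile with <code class="language-plaintext highlighter-rouge">nix-env</code>; the dates below are illustrative, not from a real system:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nix-env --list-generations
   1   2025-09-02 10:14:33
   2   2025-09-20 18:02:10
   3   2025-10-12 09:41:27   (current)
</code></pre></div></div>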

<p><img alt="Derivation Symlinks" class="graphviz" loading="lazy" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjwhRE9DVFlQRSBzdmcgUFVCTElDICItLy9XM0MvL0RURCBTVkcgMS4xLy9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9HcmFwaGljcy9TVkcvMS4xL0RURC9zdmcxMS5kdGQiPgo8IS0tIEdlbmVyYXRlZCBieSBncmFwaHZpeiB2ZXJzaW9uIDEyLjIuMSAoMCkKIC0tPgo8IS0tIFRpdGxlOiBHIFBhZ2VzOiAxIC0tPgo8c3ZnIHdpZHRoPSIzNDlwdCIgaGVpZ2h0PSIxNThwdCIKIHZpZXdCb3g9IjAuMDAgMC4wMCAzNDkuMDAgMTU4LjI1IiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj4KPGcgaWQ9ImdyYXBoMCIgY2xhc3M9ImdyYXBoIiB0cmFuc2Zvcm09InNjYWxlKDEgMSkgcm90YXRlKDApIHRyYW5zbGF0ZSg0IDE1NC4yNSkiPgo8dGl0bGU+RzwvdGl0bGU+CjwhLS0gUHJvZmlsZSAtLT4KPGcgaWQ9Im5vZGUxIiBjbGFzcz0ibm9kZSI+Cjx0aXRsZT5Qcm9maWxlPC90aXRsZT4KPHBvbHlnb24gZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSIxMzMsLTk2LjI1IDAsLTk2LjI1IDAsLTYwLjI1IDEzMywtNjAuMjUgMTMzLC05Ni4yNSIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSI2Ni41IiB5PSItNzMuNTgiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+UHJvZmlsZSAoc3ltbGluayk8L3RleHQ+CjwvZz4KPCEtLSBHZW4xIC0tPgo8ZyBpZD0ibm9kZTIiIGNsYXNzPSJub2RlIj4KPHRpdGxlPkdlbjE8L3RpdGxlPgo8cG9seWdvbiBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBwb2ludHM9IjM0MSwtMTUwLjI1IDIzMiwtMTUwLjI1IDIzMiwtMTE0LjI1IDM0MSwtMTE0LjI1IDM0MSwtMTUwLjI1Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjI4Ni41IiB5PSItMTI3LjU4IiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPkdlbmVyYXRpb24gMTwvdGV4dD4KPC9nPgo8IS0tIFByb2ZpbGUmIzQ1OyZndDtHZW4xIC0tPgo8ZyBpZD0iZWRnZTIiIGNsYXNzPSJlZGdlIj4KPHRpdGxlPlByb2ZpbGUmIzQ1OyZndDtHZW4xPC90aXRsZT4KPHBhdGggZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgc3Ryb2tlLWRhc2hhcnJheT0iNSwyIiBkPSJNMTMzLjUsLTk0LjZDMTY0Ljk1LC0xMDIuMzkgMjAyLjEsLTExMS41OSAyMzEuNzksLTExOC45NSIvPgo8L2c+CjwhLS0gR2VuMiAtLT4KPGcgaWQ9Im5vZGUzIiBjbGFzcz0ibm9kZSI+Cjx0aXRsZT5HZW4yPC90aXRsZT4KPHBvbHlnb24gZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSIzNDEsLTk2LjI1IDIzMiwtOTYuMjUgMjMyLC02MC4yNSAzNDEsLTYwLjI1IDM0MSw
tOTYuMjUiLz4KPHRleHQgdGV4dC1hbmNob3I9Im1pZGRsZSIgeD0iMjg2LjUiIHk9Ii03My41OCIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj5HZW5lcmF0aW9uIDI8L3RleHQ+CjwvZz4KPCEtLSBQcm9maWxlJiM0NTsmZ3Q7R2VuMiAtLT4KPGcgaWQ9ImVkZ2UzIiBjbGFzcz0iZWRnZSI+Cjx0aXRsZT5Qcm9maWxlJiM0NTsmZ3Q7R2VuMjwvdGl0bGU+CjxwYXRoIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIHN0cm9rZS1kYXNoYXJyYXk9IjUsMiIgZD0iTTEzMy41LC03OC4yNUMxNjQuOTUsLTc4LjI1IDIwMi4xLC03OC4yNSAyMzEuNzksLTc4LjI1Ii8+CjwvZz4KPCEtLSBHZW4zIC0tPgo8ZyBpZD0ibm9kZTQiIGNsYXNzPSJub2RlIj4KPHRpdGxlPkdlbjM8L3RpdGxlPgo8cG9seWdvbiBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBwb2ludHM9IjM0MSwtNDIuNSAyMzIsLTQyLjUgMjMyLDAgMzQxLDAgMzQxLC00Mi41Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjI4Ni41IiB5PSItMjUuMiIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj5HZW5lcmF0aW9uIDM8L3RleHQ+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjI4Ni41IiB5PSItNy45NSIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj4oY3VycmVudCk8L3RleHQ+CjwvZz4KPCEtLSBQcm9maWxlJiM0NTsmZ3Q7R2VuMyAtLT4KPGcgaWQ9ImVkZ2UxIiBjbGFzcz0iZWRnZSI+Cjx0aXRsZT5Qcm9maWxlJiM0NTsmZ3Q7R2VuMzwvdGl0bGU+CjxwYXRoIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIGQ9Ik0xMzMuNSwtNjAuOTlDMTYxLjE0LC01My43NiAxOTMuMTksLTQ1LjM5IDIyMC43LC0zOC4xOSIvPgo8cG9seWdvbiBmaWxsPSJibGFjayIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSIyMjEuNTQsLTQxLjU5IDIzMC4zMywtMzUuNjggMjE5Ljc3LC0zNC44MiAyMjEuNTQsLTQxLjU5Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjE4Mi41IiB5PSItNTguODEiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+cG9pbnRzIHRvPC90ZXh0Pgo8L2c+CjwvZz4KPC9zdmc+Cg==" /></p>

<p>In our case, you can see the symlink here:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-l</span> /home/user/.nix-profile
lrwxrwxrwx 1 user <span class="nb">users </span>44 Apr  5  2025 .nix-profile -&gt; /home/user/.local/state/nix/profiles/profile
</code></pre></div></div>

<p>Programs in a profile’s <code class="language-plaintext highlighter-rouge">bin</code> directory are also symlinks, so using <code class="language-plaintext highlighter-rouge">ls</code> we can find the store path behind <code class="language-plaintext highlighter-rouge">git</code>:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-l</span> /home/user/.nix-profile/bin/git
lrwxrwxrwx 20 root root 62 Jan  1  1970 /home/user/.nix-profile/bin/git -&gt; /nix/store/v2rxk9xkcxsas64wl7ds31al15cm2wqd-git-2.50.1/bin/git
</code></pre></div></div>

<p>(The command <code class="language-plaintext highlighter-rouge">readlink -f &lt;symlink&gt;</code> is generally better for fetching the real file of a symlink, but <code class="language-plaintext highlighter-rouge">ls -l</code> will be used for demonstrative purposes.)</p>
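<p>The difference is easy to demonstrate with a throwaway symlink chain that mimics the profile → generation → store path indirection. This is only a sketch: all paths below are temporary files created for the demo, not real Nix locations.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build a two-hop chain: profile-git -&gt; generation-git -&gt; store-path/bin/git
dir=$(mktemp -d)
mkdir -p "$dir/store-path/bin"
touch "$dir/store-path/bin/git"
ln -s "$dir/store-path/bin/git" "$dir/generation-git"
ln -s "$dir/generation-git" "$dir/profile-git"

readlink "$dir/profile-git"      # prints only the first hop (the generation link)
readlink -f "$dir/profile-git"   # resolves the whole chain to the real file
</code></pre></div></div>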

<p>Similarly, NixOS has a system profile, which resides in <code class="language-plaintext highlighter-rouge">/run/current-system</code>; it resolves to the same store path as <code class="language-plaintext highlighter-rouge">/nix/var/nix/profiles/system</code>, whose content follows this structure:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> /run/current-system
activate
append-initrd-secrets
bin
boot.json
dry-activate
etc
extra-dependencies
firmware
init
init-interface-version
initrd
kernel
kernel-modules
kernel-params
nixos-version
specialisation
sw
system
systemd
</code></pre></div></div>

<p>The only difference one needs to understand at this stage is that packages you install with Home Manager end up in the user profile, while packages installed system-wide end up in the system profile.</p>

<p>Furthermore, <code class="language-plaintext highlighter-rouge">/run/current-system/sw</code> (where <code class="language-plaintext highlighter-rouge">sw</code> is a shorthand for “software”) is where all of our executables are symlinked to. And unsurprisingly, these symlinks point to the store paths we discussed!</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-l</span> /run/current-system/sw
lrwxrwxrwx 11 root root 55 Jan  1  1970 /run/current-system/sw -&gt; /nix/store/fbdm2v6r78w3n0a7f78pbnjdwpdwi12x-system-path
<span class="nv">$ </span><span class="nb">ls</span> /nix/store/fbdm2v6r78w3n0a7f78pbnjdwpdwi12x-system-path
bin
etc
lib
sbin
share
</code></pre></div></div>

<p>An important interjection here: since <code class="language-plaintext highlighter-rouge">glm</code> is a static library, it is not located in the <code class="language-plaintext highlighter-rouge">lib</code> subdirectory; only shared library files (<code class="language-plaintext highlighter-rouge">.so</code>) live there. Just as a search path must exist for regular programs, you can view this as an equivalent path that exists exclusively for shared libraries, which other programs load at run time.</p>

<p>Since Git is installed system-wide, you can probably guess that it is in the <code class="language-plaintext highlighter-rouge">bin</code> (binary) directory, whose entries are in turn symlinks into yet other store paths.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l /nix/store/fbdm2v6r78w3n0a7f78pbnjdwpdwi12x-system-path/bin/git
lrwxrwxrwx 62 root root 65 Jan  1  1970 /nix/store/fbdm2v6r78w3n0a7f78pbnjdwpdwi12x-system-path/bin/git -&gt; /nix/store/gz9a9vvx15cwznzw2h1gr4k7778bbgqk-firejail-wrap/bin/git
</code></pre></div></div>

<p>Wrapping this section up, you can see everything we have discovered from the system profile in this graph:</p>

<p><img alt="System Profile Symlinks" class="graphviz" loading="lazy" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjwhRE9DVFlQRSBzdmcgUFVCTElDICItLy9XM0MvL0RURCBTVkcgMS4xLy9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9HcmFwaGljcy9TVkcvMS4xL0RURC9zdmcxMS5kdGQiPgo8IS0tIEdlbmVyYXRlZCBieSBncmFwaHZpeiB2ZXJzaW9uIDEyLjIuMSAoMCkKIC0tPgo8IS0tIFRpdGxlOiBHIFBhZ2VzOiAxIC0tPgo8c3ZnIHdpZHRoPSI3NjJwdCIgaGVpZ2h0PSI0NjVwdCIKIHZpZXdCb3g9IjAuMDAgMC4wMCA3NjIuMDAgNDY1LjQyIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj4KPGcgaWQ9ImdyYXBoMCIgY2xhc3M9ImdyYXBoIiB0cmFuc2Zvcm09InNjYWxlKDEgMSkgcm90YXRlKDApIHRyYW5zbGF0ZSg0IDQ2MS40MikiPgo8dGl0bGU+RzwvdGl0bGU+CjxnIGlkPSJjbHVzdDEiIGNsYXNzPSJjbHVzdGVyIj4KPHRpdGxlPmNsdXN0ZXJfc3lzdGVtPC90aXRsZT4KPHBvbHlnb24gZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSI4LC04IDgsLTQ0OS40MiA3NDYsLTQ0OS40MiA3NDYsLTggOCwtOCIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii00MzIuMTIiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+U3lzdGVtIHByb2ZpbGUgY2hhaW48L3RleHQ+CjwvZz4KPCEtLSBydW5fY3VycmVudF9zeXN0ZW0gLS0+CjxnIGlkPSJub2RlMSIgY2xhc3M9Im5vZGUiPgo8dGl0bGU+cnVuX2N1cnJlbnRfc3lzdGVtPC90aXRsZT4KPGVsbGlwc2UgZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgY3g9IjM3NyIgY3k9Ii0zODYuMTEiIHJ4PSIxMjkuMDUiIHJ5PSIzMC4wNSIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii0zOTAuMDYiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+L3J1bi9jdXJyZW50JiM0NTtzeXN0ZW0vc3c8L3RleHQ+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjM3NyIgeT0iLTM3Mi44MSIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj4oc3lzdGVtIHN3KTwvdGV4dD4KPC9nPgo8IS0tIHN5c3RlbV9wYXRoIC0tPgo8ZyBpZD0ibm9kZTIiIGNsYXNzPSJub2RlIj4KPHRpdGxlPnN5c3RlbV9wYXRoPC90aXRsZT4KPGVsbGlwc2UgZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgY3g9IjM3NyIgY3k9Ii0yNzIuNzYiIHJ4PSIzMjUuMjciIHJ5PSIzMC4wNSIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii0yNzYuNzEiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+L25peC9zdG9
yZS9mYmRtMnY2cjc4dzNuMGE3Zjc4cGJuamR3cGR3aTEyeCYjNDU7c3lzdGVtJiM0NTtwYXRoPC90ZXh0Pgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii0yNTkuNDYiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+KHN5c3RlbSYjNDU7cGF0aCk8L3RleHQ+CjwvZz4KPCEtLSBydW5fY3VycmVudF9zeXN0ZW0mIzQ1OyZndDtzeXN0ZW1fcGF0aCAtLT4KPGcgaWQ9ImVkZ2UxIiBjbGFzcz0iZWRnZSI+Cjx0aXRsZT5ydW5fY3VycmVudF9zeXN0ZW0mIzQ1OyZndDtzeXN0ZW1fcGF0aDwvdGl0bGU+CjxwYXRoIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIGQ9Ik0zNzcsLTM1NS41N0MzNzcsLTM0Mi44OCAzNzcsLTMyNy44NiAzNzcsLTMxNC4xNyIvPgo8cG9seWdvbiBmaWxsPSJibGFjayIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSIzODAuNSwtMzE0LjU3IDM3NywtMzA0LjU3IDM3My41LC0zMTQuNTcgMzgwLjUsLTMxNC41NyIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSI0MDQuNzUiIHk9Ii0zMjQuNzYiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+c3ltbGluazwvdGV4dD4KPC9nPgo8IS0tIHN5c3RlbV9iaW5fZ2l0IC0tPgo8ZyBpZD0ibm9kZTMiIGNsYXNzPSJub2RlIj4KPHRpdGxlPnN5c3RlbV9iaW5fZ2l0PC90aXRsZT4KPGVsbGlwc2UgZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgY3g9IjM3NyIgY3k9Ii0xNTkuNDEiIHJ4PSIxNzAuOTQiIHJ5PSIzMC4wNSIvPgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii0xNjMuMzYiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+L25peC9zdG9yZS9mYmRtMi4uLi9iaW4vZ2l0PC90ZXh0Pgo8dGV4dCB0ZXh0LWFuY2hvcj0ibWlkZGxlIiB4PSIzNzciIHk9Ii0xNDYuMTEiIGZvbnQtZmFtaWx5PSJUaW1lcyxzZXJpZiIgZm9udC1zaXplPSIxNC4wMCI+4oaSIC9uaXgvc3RvcmUvZ3o5YTl2Li4uJiM0NTtnaXQvYmluL2dpdDwvdGV4dD4KPC9nPgo8IS0tIHN5c3RlbV9wYXRoJiM0NTsmZ3Q7c3lzdGVtX2Jpbl9naXQgLS0+CjxnIGlkPSJlZGdlMiIgY2xhc3M9ImVkZ2UiPgo8dGl0bGU+c3lzdGVtX3BhdGgmIzQ1OyZndDtzeXN0ZW1fYmluX2dpdDwvdGl0bGU+CjxwYXRoIGZpbGw9Im5vbmUiIHN0cm9rZT0iYmxhY2siIGQ9Ik0zNzcsLTI0Mi4yMUMzNzcsLTIyOS41MyAzNzcsLTIxNC41IDM3NywtMjAwLjgxIi8+Cjxwb2x5Z29uIGZpbGw9ImJsYWNrIiBzdHJva2U9ImJsYWNrIiBwb2ludHM9IjM4MC41LC0yMDEuMjIgMzc3LC0xOTEuMjIgMzczLjUsLTIwMS4yMiAzODAuNSwtMjAxLjIyIi8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjQzNyIgeT0iLTIxMS40MSIgZm9udC1mYW1pbHk9IlRpbWVzLHNlcmlmIiBmb250LXNpemU9IjE0LjAwIj5jb250YWlucyBzeW1saW5rPC90ZXh0Pgo8L2c+CjwhLS0gc3RvcmVfZ2l0MiA
tLT4KPGcgaWQ9Im5vZGU0IiBjbGFzcz0ibm9kZSI+Cjx0aXRsZT5zdG9yZV9naXQyPC90aXRsZT4KPGVsbGlwc2UgZmlsbD0ibm9uZSIgc3Ryb2tlPSJibGFjayIgY3g9IjM3NyIgY3k9Ii00Ni4wNSIgcng9IjM2MS4zMyIgcnk9IjMwLjA1Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjM3NyIgeT0iLTUwIiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPi9uaXgvc3RvcmUvZ3o5YTl2dngxNWN3em56dzJoMWdyNGs3Nzc4YmJncWsmIzQ1O2ZpcmVqYWlsJiM0NTt3cmFwL2Jpbi9naXQ8L3RleHQ+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjM3NyIgeT0iLTMyLjc1IiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPihzdG9yZSBwYXRoKTwvdGV4dD4KPC9nPgo8IS0tIHN5c3RlbV9iaW5fZ2l0JiM0NTsmZ3Q7c3RvcmVfZ2l0MiAtLT4KPGcgaWQ9ImVkZ2UzIiBjbGFzcz0iZWRnZSI+Cjx0aXRsZT5zeXN0ZW1fYmluX2dpdCYjNDU7Jmd0O3N0b3JlX2dpdDI8L3RpdGxlPgo8cGF0aCBmaWxsPSJub25lIiBzdHJva2U9ImJsYWNrIiBkPSJNMzc3LC0xMjguODZDMzc3LC0xMTYuMTcgMzc3LC0xMDEuMTUgMzc3LC04Ny40NiIvPgo8cG9seWdvbiBmaWxsPSJibGFjayIgc3Ryb2tlPSJibGFjayIgcG9pbnRzPSIzODAuNSwtODcuODcgMzc3LC03Ny44NyAzNzMuNSwtODcuODcgMzgwLjUsLTg3Ljg3Ii8+Cjx0ZXh0IHRleHQtYW5jaG9yPSJtaWRkbGUiIHg9IjQwNC43NSIgeT0iLTk4LjA1IiBmb250LWZhbWlseT0iVGltZXMsc2VyaWYiIGZvbnQtc2l6ZT0iMTQuMDAiPnN5bWxpbms8L3RleHQ+CjwvZz4KPC9nPgo8L3N2Zz4K" /></p>

<h2 id="the-why--conclusion">The Why &amp; Conclusion</h2>

<p>As we observed in the previous example, the profile is a chain of symlinks whose destinations are store paths in the Nix store. The fact that we can simply switch applications by modifying the symlinks means that, if we later want to revert to an older version of an arbitrary package, say because it broke, all we have to do is make the symlinks point to the old store paths! And that’s precisely what your NixOS system does—automatically manages profiles and the Nix “generations” that you see upon system initialization.</p>
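<p>This is exactly what the standard rollback commands do under the hood, shown here as a sketch without output:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Point the user profile back at its previous generation
$ nix-env --rollback

# Point the system profile back at its previous generation (NixOS)
$ sudo nixos-rebuild switch --rollback
</code></pre></div></div>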

<p>The store path contains the version of the package, which helps distinguish between new and old versions, and the random-looking string in front of its name (in this case, <code class="language-plaintext highlighter-rouge">fbdm2v6r78w3n0a7f78pbnjdwpdwi12x</code>) is called the <em>store path hash</em>. If two store paths in the Nix store were to have the same name and version, this hash, which Nix computes cryptographically, ensures that they do not conflict. The combination of these three attributes makes up the store path. For ordinary derivations the hash is derived from the derivation’s inputs, so such paths are <em>input-addressed</em>; Nix also supports <em>content-addressed</em> paths, where the hash is derived from the output’s contents instead.</p>

<p>Obviously, if we could modify an existing store path, that would break the link between the derivation and its output, and also make future rollbacks unreliable. This is why you are not allowed to edit it. The property is called <em>immutability</em>.</p>

<p>Lastly, derivations are structured the way they are because they orchestrate all of the store paths and provide each one with its necessary dependencies. We saw the inner details of how they achieve this in the first example.</p>

<p>And this concludes our witch hunt! We have seen how all of these little pieces are interconnected in the ecosystem, and you have learned the fundamentals of this complex monolith called the Nix store. I leave the traversal of the user profile located in <code class="language-plaintext highlighter-rouge">~/.nix-profile</code> as an exercise to the reader.</p>

<p>To summarize everything that I demonstrated, below is a cheatsheet of the commands and material introduced above.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">nix-store --query --outputs &lt;derivation&gt;</code>: fetch the store paths associated with a derivation</li>
  <li><code class="language-plaintext highlighter-rouge">nix-store --query --deriver &lt;store-path&gt;</code>: fetch the derivation that a store path originates from</li>
  <li><code class="language-plaintext highlighter-rouge">nix-store --realize &lt;derivation&gt;</code>: build (realise) a derivation; create store paths from outputs</li>
  <li><code class="language-plaintext highlighter-rouge">nix derivation show &lt;derivation&gt;</code>: inspect a derivation in human-readable JSON form</li>
  <li><code class="language-plaintext highlighter-rouge">/run/current-system</code>: the NixOS system profile; same as <code class="language-plaintext highlighter-rouge">/nix/var/nix/profiles/system</code></li>
  <li><code class="language-plaintext highlighter-rouge">~/.nix-profile</code>: the NixOS user profile</li>
</ul>

<p>Here are some bonus commands that were not needed here but are very useful in day-to-day store operations:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">nix-store --query --referrers &lt;store-path&gt;</code>: list the store paths that refer to a given store path</li>
  <li><code class="language-plaintext highlighter-rouge">nix-store --query --roots &lt;store-path&gt;</code>: show the GC roots that keep a store path alive</li>
  <li><code class="language-plaintext highlighter-rouge">nix-store --query --graph &lt;store-path&gt;</code>: produce a neat Graphviz DOT representation of a package’s dependency graph</li>
</ul>

<p>For further reading, I suggest looking into the official NixOS documentation.</p>

<h2 id="bonus-garbage-collection">Bonus: Garbage Collection</h2>

<p>Ever wondered how Nix knows which store paths can be deleted? In case you weren’t aware, garbage collection allows us to delete unreferenced store paths by running <code class="language-plaintext highlighter-rouge">nix-collect-garbage -d</code> (the <code class="language-plaintext highlighter-rouge">-d</code> flag additionally deletes old profile generations first).</p>

<p>So far, we know that the relation is essentially profile → store path → deriver. As explained before, store paths are immutable by design. Garbage collection simply identifies every store path that is not reachable from a set of <em>GC roots</em> and deletes them all.</p>
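<p>You can ask Nix for the current roots directly; the paths below are illustrative, not from a real system:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nix-store --gc --print-roots
/home/user/.local/state/nix/profiles/profile-3-link -&gt; /nix/store/...-user-environment
/nix/var/nix/profiles/system-12-link -&gt; /nix/store/...-nixos-system
</code></pre></div></div>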

<p>Since every profile generation is itself a GC root, the store paths we saw remain live for as long as that generation exists. If you build a new generation, the old generations still exist until you delete them or they get automatically deleted, so their store paths are still referenced. In other words, if you want old packages actually removed, you must remove or prune those old generations and then run the garbage collector.</p>]]></content><author><name>Quarterstar</name><email>quarterstar@proton.me</email></author><category term="nix" /><category term="programming" /><summary type="html"><![CDATA[Learn how to operate on the Nix Store commands like a professional user. See how the commands relate to its functionality.]]></summary></entry></feed>