The Catalytic Potential of Entropy
Weekly Article
Jan. 9, 2020
You may have casually used the word “entropy” to describe a world that looks on the verge of collapse—or at least one that is less predictable—where order is in decline. But if we really dive into the word, we might gain some insight into how entropy shapes the way we use language and technology to communicate with one another, and why a cooling cup of coffee in front of a movie is a good place to start.
Entropy forms the backbone of both natural and computer science—two disciplines responsible for many powerful technologies of the modern world.
In thermodynamics, entropy is the central concept of the discipline’s second law. The First Law of Thermodynamics states that energy can neither be created nor destroyed—only its state can change. The Second Law, also known as the Entropy Law, states that in a closed system, energy always seeks to be evenly distributed. In practice, this means that in a small energy “system” involving a room and a mug of coffee, the mug of coffee will eventually reach room temperature. At that point, the system achieves thermodynamic equilibrium—energy is evenly distributed.
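To make the coffee-mug picture concrete, here is a toy simulation, with entirely made-up numbers, of heat flowing from the mug into the room until the temperature difference all but disappears:

```python
# A toy model, not a real physics calculation: a hot mug of coffee exchanging
# heat with a much larger room. The numbers and the transfer rate are invented;
# the point is only that the temperature gap shrinks until the two match.
coffee_temp = 90.0   # degrees Celsius
room_temp = 20.0     # degrees Celsius
rate = 0.05          # made-up heat-transfer rate per minute

for minute in range(120):
    flow = rate * (coffee_temp - room_temp)  # heat flows from hot to cold
    coffee_temp -= flow
    room_temp += flow * 0.001                # the room is ~1,000x larger, so it barely warms

# After two simulated hours, the coffee sits within a fraction of a degree
# of room temperature: the system has reached thermodynamic equilibrium.
print(round(coffee_temp, 2), round(room_temp, 2))
```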
In information theory, on the other hand, entropy refers to the rate at which information is communicated in a message. That's pretty abstract, but it makes more sense when considering language. Say you’re searching for The Dark Knight Rises on Netflix. You want the desired result to pop up after entering just one or two words, so assuming the order doesn’t matter, you have to decide which words communicate the most information, or reduce uncertainty, to the greatest extent. You’d probably want to enter “Dark Knight” rather than “The Dark,” since there are fewer movie titles that feature Batman than there are about darkness. Different word combinations communicate the message at different rates—that's what information entropy seeks to define.
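As a rough sketch of that idea, suppose (the counts here are invented for illustration) that 120 titles in a 5,000-title catalog match “the dark,” while only 3 match “dark knight.” In Shannon’s terms, the rarer match carries more bits of information:

```python
import math

catalog_size = 5000                            # hypothetical number of titles
matches = {"the dark": 120, "dark knight": 3}  # hypothetical match counts

for query, count in matches.items():
    p = count / catalog_size   # chance a random title matches the query
    bits = -math.log2(p)       # surprisal: information gained, in bits
    print(f"{query!r}: {bits:.1f} bits")

# 'the dark'    -> ~5.4 bits
# 'dark knight' -> ~10.7 bits  (the rarer query narrows the search far more)
```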
So how are these two seemingly disparate ideas similar? Understanding a bit more about how they work both in practice and in theory will illuminate their relationship—and its importance for our evolution and technological advancement.
Thermodynamic energy in a concentrated, usable form is considered ordered; energy in a distributed, unavailable form is considered disordered. What does this mean? For fossil fuels, it means that a lump of coal is ordered and usable, but once you burn it and convert it to mechanical work (the desired end) or heat (a byproduct), it becomes disordered and unusable. The tricky part is that entropy only moves in one direction, from ordered to disordered; you can’t gather up the dissipated heat and turn it back into a lump of coal. So, in a way, entropy forms the basis of scarcity in natural resources.
Food, a form of ordered energy, sustains one of the most beautiful systems of all: life. As established, entropy moves in only one direction, toward equilibrium—unless acted upon by a force outside the system. Thus, plants and animals survive by gathering available, ordered energy from the environment and then emitting waste stripped of nutrients. But continuously passing energy through our bodies eventually degrades them, causing us to break down and die. And after death, bodies decompose and dissipate into the surrounding environment—like heat from a mug dissipating into a room—to reach thermodynamic equilibrium. In this way, entropy is responsible not only for material scarcity, but for scarcity of time.
Theoretically, entropy will only halt its steady march once it has brought about the heat-death of the universe—the end of time and the ultimate end state. Everything on earth and in space will eventually expand, explode, die, and distribute free energy evenly through what is really the biggest closed system of all: the universe. In this way, entropy is a universal law similar to gravity—it operates at both the smallest and largest scales of physics.
Before delving into how information theory uses entropy, it’s helpful to establish a crucial fact about information itself: The informative value of a communicated message depends on the degree to which its content is surprising. A more surprising message carries more information.
Information entropy helps a great deal in machine learning, where computer systems use algorithms and statistical models to perform tasks via patterns and inference rather than explicit instructions from a human. To continue with the language example, when laying out words letter by letter to compose a message, some letters, such as "E," will appear more frequently than others. But paradoxically, since "E" is so common, it communicates less information: More words have "E"s in them than "X"s or "Z"s. So, encountering a letter "Z" rather than an "E"—just like searching “dark knight” instead of “the dark”—reduces uncertainty, or entropy, at a higher rate, because there's more surprise.
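To attach rough numbers to that, here is a small sketch using approximate English letter frequencies (the figures are rounded and purely illustrative): the rarer the letter, the more bits of surprise it carries.

```python
import math

# Approximate English letter frequencies, rounded for illustration.
letter_freq = {"e": 0.127, "k": 0.008, "x": 0.0015, "z": 0.001}

for letter, p in letter_freq.items():
    bits = -math.log2(p)   # surprisal: rarer letters carry more information
    print(f"{letter}: {bits:.1f} bits")

# e -> ~3.0 bits, k -> ~7.0 bits, x -> ~9.4 bits, z -> ~10.0 bits
```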
How does machine learning use entropy? Decision trees, a common algorithm, select one of many different attributes—also known as features or independent variables—to repeatedly split samples into subsets. At each split, the algorithm selects one attribute to split the sample on, and continues to do so until all subsets are pure—or, in other words, until each individual sample in a subset shares the same classification.
If our sample were a group of words, and our features the letters in those words, the algorithm would split the group of words based on their inclusion or exclusion of each letter. So, if we used every letter in the alphabet, the branches at the very bottom of the tree would each contain a single word from our sample. In practice, we limit the purity of the tree to avoid overfitting, so the algorithm can generalize to words it hasn’t seen before.
We would expect the decision tree to come to the conclusion that the letter "Z" decreases entropy at the highest rate based on the distribution of the letters in the sample. When searching for a new word, it would split the sample on the letter "Z"—those words that include "Z," and those that do not. If we give the algorithm the task of searching for The Dark Knight Rises, it would prioritize words based on the rarity of the letters in the sample it was trained on. It would check for a "Z," "Q," and so on down the list of importance—until it finally found that "K" is pretty valuable. Then, it would recommend searching for “Dark Knight” rather than “The” or “Rises.”
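For the curious, here is a minimal sketch of that splitting procedure, using a made-up handful of words as the sample (a real library would add pruning, tie-breaking, and much more). Each word acts as its own class, each “does the word contain this letter?” question acts as a feature, and the split with the largest drop in entropy wins at every step, until each subset is pure. Which letter wins a given split depends entirely on the letter distribution of the sample.

```python
import math
from collections import Counter

# Toy sample: each word is its own class, and the candidate features are
# "does the word contain this letter?". The word list is made up.
words = ["the", "dark", "knight", "rises", "batman", "dawn", "zero"]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_split(sample):
    """Find the letter whose presence/absence split yields the largest information gain."""
    parent = entropy(sample)
    best = None
    for letter in sorted({ch for word in sample for ch in word}):
        with_letter = [w for w in sample if letter in w]
        without_letter = [w for w in sample if letter not in w]
        if not with_letter or not without_letter:
            continue  # letter appears in all (or none) of the words: no split
        children = (len(with_letter) * entropy(with_letter)
                    + len(without_letter) * entropy(without_letter)) / len(sample)
        gain = parent - children
        if best is None or gain > best[0]:
            best = (gain, letter, with_letter, without_letter)
    return best

def build_tree(sample, depth=0):
    """Greedily split until every subset is pure (holds a single word)."""
    indent = "  " * depth
    if len(sample) == 1:
        print(f"{indent}-> {sample[0]}")   # pure subset: one word remains
        return
    split = best_split(sample)
    if split is None:                      # no letter separates these words
        print(f"{indent}-> {sample}")
        return
    gain, letter, with_letter, without_letter = split
    print(f"{indent}contains '{letter}'? (gain {gain:.2f} bits)")
    build_tree(with_letter, depth + 1)
    build_tree(without_letter, depth + 1)

build_tree(words)
```

On this tiny sample the tree bottoms out with one word per leaf, which is exactly what it means for every subset to be pure.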
So, how do these two interpretations of entropy lead to the same purpose? In short, information theory identified another, essential way that humans reduce entropy—by communicating. When we communicate a message, we reduce uncertainty about the world—and this had an indelible impact on the evolution of language and social organization.
Our hunter-gatherer ancestors used language both to acquire ordered energy—via coordinated hunting for food—and to avoid being killed by rival groups or predators. When communicating under pressure, it was essential for them to reduce entropy as efficiently as possible: Shouting “Lion!” or “Run!” is much more effective than saying, “There is a lion sneaking up behind you—run away!” The shorter warning reduces entropy on two fronts: 1) it reduces the uncertainty about whether or not you’re in danger (informational entropy), and 2) it improves your odds of not being eaten (energy entropy). Efficient communication reduces the probability space of all possible events, allowing us to act more quickly and effectively.
Our goal is to find ordered sources of energy and resist the influence of entropy on our bodies. In communication, we minimize entropy by finding information and reducing uncertainty. We've invented technologies to help us with both—we use machines to expend energy and computers to communicate vast amounts of information. Maximizing the returns of technology requires an understanding of both the physical and digital domains—and of the powerful law that connects them.