Why intelligence is compression
“Data compression, also called compaction, is the process of reducing the amount of data needed for the storage or transmission of a given piece of information” (Encyclopedia Britannica). Lossless compression is equivalent to intelligence. Intelligence is that which understands. The ultimate intelligence would be an intellect capable of compressing all the world's information into a single intelligence and then recreating every piece of that data without loss, and doing so more efficiently than any other theoretical intelligence. But what exactly is intelligence, and why is it necessarily compression? It is clear to anyone who has spent time investigating epistemology that memory is not synonymous with wisdom, intelligence, or understanding. Although compression directly affects memory by making its saved data smaller and more efficient, compression is not memory. This is not to downplay the necessity of memory for intelligence, since one cannot save any amount of information, compressed or otherwise, unless there is a place to store it. An intelligent being without memory is not intelligent, since he will have neither reference nor storage for his thoughts. Likewise, a child with a photographic memory may be able to memorize “4+4+4+4=16”, but that is not equivalent to understanding addition. So how would compressing this simple addition problem help us understand it? The goal of compression is to shrink the data into as small a size as possible while still being able to recreate the data in full. Typically this entails some high-level mathematics, but thankfully our example is simple: the number four is used four times. So, instead of repeating each instance of four, a simple compressor may just rewrite the sum as four times four, or four squared.
Now a computer would most likely use other means of compressing the data but this will work as a very simple example of how human beings already use and understand compression intuitively.
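The four-fours example can be sketched in a few lines of Python. This is only an illustration under assumptions: run-length encoding is swapped in here as one very simple compression scheme, and the helper names rle_encode and rle_decode are hypothetical, not something the essay or any particular compressor prescribes.

```python
from itertools import groupby

def rle_encode(terms):
    """Collapse each run of repeated values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(terms)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

terms = [4, 4, 4, 4]                  # the addends of 4+4+4+4
packed = rle_encode(terms)            # one pair meaning "four, four times"
assert packed == [(4, 4)]
assert rle_decode(packed) == terms    # lossless: the original list comes back
assert sum(terms) == 4 * 4 == 4 ** 2 == 16
```

The encoded pair is shorter than the list it stands for, yet the full list can be rebuilt from it, which is the sense in which "four squared" compresses the sum without losing anything.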
Addition, and the laws of mathematics generally, are ways that intelligent beings are able to compress information. A mathematician does not memorize every possible equation; he is a mathematician because he understands the principles governing mathematics. A good teacher does not have her students work out specific equations hoping they will memorize the answers to every problem. Rather, a good teacher only uses specifics to make the abstraction or generalization more explicit. In other words, your teacher is attempting to make you a more efficient compressor. A student who has mastered addition has not necessarily performed a lot of addition problems; a very gifted child could theoretically understand and master addition without working through a single specific equation. Theoretically, a class full of first graders could learn math through arithmetic laws alone, although these abstract concepts are easily lost in the mind until they are first demonstrated on specific equations, and even then a child usually requires a physical representation in order to abstract the concepts out from the specific. This can be observed in the intellectual growth of students who go on to study mathematics in college. The child begins by learning that one marble added to a bag with one marble already in it makes a bag with two marbles, and ends (or nearly ends, depending on how far he continues his study) by creating a proof justifying the logic behind his original equation 1+1=2. In the beginning, the child must learn by having these numbers represented as physical objects, while the more developed collegiate mathematician deals almost exclusively with abstraction (like a computer would) and no longer sees the 1 as a stand-in for a specific object. 1 is never just one; it is an abstraction or compression of all ones. 1 can be one ounce, one box, one calorie, etc.
So mathematics itself is a form of compression, since we do not have to devise a new system for adding each different variety of 1. We can think of numbers as Platonic forms. When one adds an ounce to an ounce, what one is really doing is adding the forms of two 1’s. By abstracting a calorie or an ounce into a number, we can compress the relevant data and work with it inside a logical system that is comprehensible. It is not relevant to know that a calorie is a measure of energy when attempting to find the total number of calories consumed during lunch and dinner; instead we compress the relevant information, allowing us to intelligently (compressively) calculate how many calories we have consumed after breakfast. We can add calories because we are able to abstract the concept out from the physical, since only the conceptual is capable of being compressed without loss.
How is intelligence defined? It would be a cheap tautology to define compression as intelligence and intelligence as that which compresses, so we will attempt to articulate the meaning of intelligence in a more precise and less circular manner. Intelligence is that which understands. But what does it mean to understand, and why must it be distinct from memory? In other words, why is the most intelligent man not necessarily the one with the best memory? For something to be intelligent it must not only be able to arrive at a logical conclusion, it must also be able to justify its answer and make verifiable predictions. To understand is to know why; it is to know what will happen given certain conditions; it is necessity. If I know the rate at which all objects fall when acted upon by a gravitational force, I can make some intelligent predictions about what will happen if you drop a bowling ball and a feather simultaneously in a vacuum under gravity. Compression is prediction because a compressor does not simply save each piece of data in full; it assigns each piece a statistical probability and spends less storage on what is likely. Take the beginning of this sentence, which I began with a capital “T”: an “h” or an “a” is much more likely to appear next than a “q”. This ability to assign mathematical probabilities to its data is how information compression is able to store and categorize data so efficiently. It is also part of what makes compression such an appropriate definition for intelligence, since its very function is by definition prediction. But does an accurate prediction model understand why ‘x’ is more likely than ‘y’? Using the fraction 4/8 we will demonstrate how compression also implies understanding.
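The link between prediction and compression described above can be made concrete with a small sketch. It rests on several assumptions that are not in the essay: the sample sentence, the lowercase alphabet, the add-one smoothing, and the function names are all illustrative choices. The only established fact it relies on is that an ideal code spends about -log2(p) bits on a symbol of probability p.

```python
import math
from collections import Counter

# Hypothetical training text; any English sample would serve the same purpose.
SAMPLE = ("the goal of compression is to shrink the data into as small a size "
          "as possible while still being able to recreate the data in full")
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def next_char_probs(text, prev):
    """Estimate P(next char | prev char) from bigram counts, with add-one smoothing."""
    counts = Counter(b for a, b in zip(text, text[1:]) if a == prev)
    total = sum(counts.values()) + len(ALPHABET)
    return {c: (counts[c] + 1) / total for c in ALPHABET}

def code_length_bits(p):
    """Ideal code length in bits for an event of probability p."""
    return -math.log2(p)

probs = next_char_probs(SAMPLE, "t")
for c in "hq":
    print(c, round(probs[c], 3), round(code_length_bits(probs[c]), 2))

# The better-predicted letter is cheaper to store.
assert probs["h"] > probs["q"]
assert code_length_bits(probs["h"]) < code_length_bits(probs["q"])
```

In this toy model an "h" after a "t" costs fewer bits than a "q" because the model expects it, which is the sense in which a better predictor is a better compressor.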
A compressed version of the fraction 4/8 would be 2/4, ½, or maybe even .5, depending on whether the decimal is a more efficient way of storing the information within the theoretical compressor, either human or machine. If a one-year-old child were able to articulate all the ways we may compress or simplify this fraction, we may consider that child to be intelligent. Why? Because the child would be expressing a logical justification as to why ½ is not only equivalent to 4/8 but is a more efficient statement of it, since it cannot be simplified any further. By logical we mean not random, but showing a clear series of steps leading from X (the question) to Y (the answer), each of which follows from the last and is a necessary link in the chain. Therefore, the fraction or decimal is as compact as logically possible, at least for a one-year-old’s mind; there may be a compressor who could store the fraction more efficiently. It also indicates that the child is not just a memorization machine, since to a memorization machine 4/8 and ½ would not be equivalent unless there was a specific memory of the two being made equivalent. If this were the cause of the child’s answer, the child would only be able to appeal to memorized definitions and would have no justification other than pure definition or memory. A child who is pure memory would not understand, without being explicitly told, that two is just a set of two ones added together. Even then he would not understand; he would only have a repeatable definition utterly divorced from any insight, equivalent to a dictionary that could speak its own definitions. The child would not understand or be able to explain the fraction as a division, but only as a definition.
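The contrast between reducing a fraction by rule and recalling it from memory can be sketched as follows. The function name reduce_fraction is a hypothetical helper introduced for illustration; math.gcd and fractions.Fraction are part of Python's standard library.

```python
from fractions import Fraction
from math import gcd

def reduce_fraction(numerator, denominator):
    """Divide out the greatest common divisor to reach lowest terms."""
    divisor = gcd(numerator, denominator)
    return numerator // divisor, denominator // divisor

# One rule covers every case; no table of memorized equivalences is needed.
assert reduce_fraction(4, 8) == (1, 2)
assert reduce_fraction(2, 4) == (1, 2)

# The standard library applies the same reduction automatically.
assert Fraction(4, 8) == Fraction(1, 2)
assert float(Fraction(4, 8)) == 0.5
```

A lookup table would need a separate entry for 4/8, 2/4, 3/6 and so on, whereas the rule handles any pair of integers, which is the difference between memorizing equivalences and being able to justify them.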
There would be no awareness or understanding that the fraction is splitting or dividing one into two pieces, with the 2 itself being nothing more than the magnitude of two 1’s compressed together. This raises the question of whether compression is genuinely understanding or is still somehow just a function or tool of memory. Since compression can reduce memory size, one might suppose that compression just allows more information to be stored along with more definitions, letting the compressor “know” the answers to more questions while still lacking what we would define as “intelligence” or “understanding”. We have already established that compression impacts memory but is itself not memory, although some may accuse the mathematical equations and computer code that make up such a compressor of being a form of memorization, rendering compression nothing more than an efficient tool that frees up storage space for electronics. This argument could only be taken seriously if applied equally to all possibly intelligent beings, since humans too come equipped with their own a-priori functions that are written into our code and must necessarily be followed in order for experience to be experienced. This code is what allows human beings to intuitively understand logic; it is the a-priori understanding we are endowed with that allows us to intuitively grasp concepts such as equivalence. We understand that adding one to one equals two and that the second one is not suddenly equivalent to four. Equivalent objects retain equivalence; this is a fundamental component of logic, and it is not something learned but rather something intuited. I would argue this intuition is God-given (to use an expression, not an objective scientific or philosophic claim, although I do not wish to rule it out entirely, since it is never a good idea to throw the Kant out with the bathwater), or in other words a naturally endowed compressor.
So, are humans intelligent, or are we nothing more than an efficient memorizing device for biology? Let us answer this question by returning to our favorite theoretical mathematician. Is the mathematician considered “intelligent” because he rejects all standards of logic and creates his own, or because he uses logic more efficiently? IQ tests, which are the best current measurement of human intelligence, would seem to side with the latter, since the test is timed and thus attempts to measure in some capacity how efficiently one can reason. The test also has definite answers that rely on a-priori reason, so by our best current standards of intelligence we have already agreed that our intuitive logic and its use are what make us intelligent. This does not by itself mean that we are intelligent or that compression implies either understanding or intelligence. But if it takes you a hundred years to solve x = 10000**100000 because you handwrite ten thousand multiplied by itself one hundred thousand times, you are probably less intelligent than the mathematician who can perform the calculation in his head, or in a few minutes with pen and paper. But does this mean that the calculator is more intelligent than a person? When considering just pure calculation it is hard to argue otherwise, although arguing that a calculator understands what it is doing is a much more difficult claim. But doesn’t the calculator by definition “understand” calculation? A calculator isn’t a dictionary. Its answers are not memorized; it possesses its own internal system of logic (a compressor) that allows the device to solve problems given to it. If I were able to absorb the calculator's program into my brain, would I have a better understanding of mathematics?
I would certainly be more efficient at computation, and all my future predictions about equations would be much less prone to error, seeing as I would now be endowed with a dedicated mathematical compressor. Isn’t this what understanding is? Maybe the problem is that we have not yet devised a program that gives the calculator sufficient ability to explain its ignorance or understanding. But the mathematician is not using a calculator for his problem; he has to be his own calculator. And although the calculator’s program would not translate into the mathematician’s mind, since it is written in a different programming language, the same cannot be said about the human who created the calculator based entirely on his own human model of compressing mathematics. But how would the mathematician solve this problem? He would most likely opt out of writing all of this down and would instead write a more compressed version of the problem, for example rewriting ten thousand as ten to the fourth power so that the whole expression becomes ten to the four hundred thousandth power, which brings the size and scope of the numbers into something more easily computable by the natural human mind. This is not to say definitively that the mathematician is smarter than the man who spent one hundred years solving the equation or that he is dumber than the calculator that only took a second, but it certainly demonstrates that all three exhibit different levels of intelligence in at least that particular instance; and if we can grant this, then we have granted compression as at least an aspect of what we define as intelligence.
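The mathematician's shortcut can be sketched in Python. This is an illustration under assumptions, not a claim about how any mathematician or calculator actually works: the helper name power_rule is hypothetical, and the only mathematics used is the identity (b**m)**n == b**(m*n).

```python
def power_rule(base, inner_exp, outer_exp):
    """Apply (base**inner_exp)**outer_exp == base**(inner_exp * outer_exp)."""
    return base, inner_exp * outer_exp

# 10000 is 10**4, so 10000**100000 compresses to a short (base, exponent) pair.
base, exponent = power_rule(10, 4, 100_000)
assert (base, exponent) == (10, 400_000)

# Python's arbitrary-precision integers play the part of the calculator here.
x = 10000 ** 100000
assert x == base ** exponent

# The expanded value needs over a million bits; the pair needs two small integers.
print(x.bit_length())
```

The pair (10, 400000) and the million-bit integer denote the same number, but only the first is something a person could hold in mind, which is the kind of compression the paragraph above attributes to the mathematician.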
What is the difference between the map and the territory? Baudrillard asks this exact question at the beginning of his book Simulacra and Simulation, where he draws on a fable written by Jorge Luis Borges. Borges's famous fable describes a fictional world where cartographers have created a map so detailed and large that it lies on top of the empire it represents. Eliezer Yudkowsky reframes the same question in terms of compression when he says that, “Sometimes fallacies of compression result from confusing two known things under the same label—you know about acoustic vibrations, and you know about auditory processing in brains, but you call them both "sound" and so confuse yourself. But the more dangerous fallacy of compression arises from having no idea whatsoever that two distinct entities even exist. There is just one mental folder in the filing system, labeled "sound", and everything thought about "sound" drops into that one folder. It's not that there are two folders with the same label, there's just a single folder. By default, the map is compressed; why would the brain create two mental buckets where one would serve?” (Less Wrong, Yudkowsky). What Yudkowsky is essentially asking is what is lost in compression, or what is lost in the mapping of the territory. What is lost in a map is what is lost in any symbol, namely detail and specificity. Yudkowsky uses the example of “sound” and the difference between acoustic vibrations, which are the physical manifestation of “sound”, and auditory processing, which is the process through which the brain and ears communicate and translate the physical vibration into a phenomenological “sound” of experience. What Yudkowsky seems to misunderstand is that the two could only be compressed if they were related, and that relation is the basis for intelligent understanding.
The only reason Yudkowsky can criticize this hypothetical model is that he knows, as an intelligent system himself, that acoustic vibrations and auditory processing are each aspects of that which we classify as sound. Auditory processing is itself a vague category that can be subdivided into all the functions which the brain and ears perform when interacting with an acoustic vibration, not to mention the phenomenological processes that occur in consciousness to allow the subject to recognize that it is hearing, which would also fall into the bucket of “sound”. But have we not already demonstrated that intelligence is this very act of abstracting? The knowledge of mathematics does not reside within any one specific equation but within the laws of mathematics itself, just as the meaning of a language has no specific identity in any particular symbol. It is only when we mistake the map for the territory, or the compressed file for the file itself, that compression appears unintelligent. When looking at a map inside a mall, it is typical to see a black dot on the map that says “you are here”, although the interior of the mall features no such dot, excluding the one on the map that resides within the mall. This is important because you actually are here, and the dot does exist, if only in the symbol that makes up the map; nevertheless the map is still contained within the mall and is still a real part of its physical existence (territory). We would be fools to mistake a chair for Plato’s form of a chair, but it would be equally foolish to deny that there is a chair or that it has chair-like qualities. A map is only a tool of understanding if there is a territory to contrast it against. There would be no point in developing mathematics if the world as we observed it were pure mathematics; it would again be an attempt to make a map the size of the territory.
A map the size of the territory is neither symbol nor compression; it is its own territory. It has become a clone; it would be pure memory.
So what distinguishes intelligence from embodiment? Is it the ability to recreate intelligence? Isn’t compression an attempt to recreate what was stored without the loss of any data? What is meant by compression, if not the ability to recreate, clone, or reproduce the memory of what was stored? These objections or questions are valid, but they also indicate a lack of understanding of what is meant by compression. The purpose that compression serves humanity at the moment is fundamentally different from what compression is. A car is what gets us from point A to point B, but any car enthusiast, collector, or hobbyist will tell you a car is far more than a means of transportation. Ironically enough, these reductionist claims that define things as pure purpose are forms of compression, but they are not quite intelligent; they are lossy forms of compression. Human beings are masters of lossy compression: we are great at giving a synopsis of a movie but horrible at recreating the details. Many people can explain what happened during the TV show “Game of Thrones”, but I’m confident far fewer could remember each character's name, and even fewer would require such memorization as a necessary component of “getting” the show. What compression is, is fundamentally different from the purpose it serves. The storing or reproducing of information that was reduced in size is only a definition of compression, not compression itself; don’t mistake the definition of intelligence for intelligence. But what is the difference between a definition and intelligence, and why are human beings intelligent if we are not lossless compressors? Firstly, a definition is a compressed version of whatever it is attempting to represent, and secondly, the argument could be made that our compressor is adequately lossless and the problem is located in our recall mechanism.
If you watched Game of Thrones and “forgot” the main character’s name, then chances are that you have not actually forgotten the name, but rather misplaced the mental file somewhere in your brain's hardware. If someone reminds you of the name Jon Snow (the main character of the show), a moment of eureka usually occurs; you have remembered where the file is located, or someone has helped you find it. If the memory were truly lost, the name would ring no mental bell. Alzheimer’s is not a problem of compression; it is a degradation of internal memory and recall.
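The distinction drawn above between lossless and lossy compression can be sketched as follows. The lossless half uses Python's standard zlib module. The lossy half is a deliberately crude stand-in for a human synopsis: the synopsis function and the sample text are assumptions made only for illustration.

```python
import zlib

# Hypothetical sample text, repeated so the compressor has redundancy to exploit.
text = ("Many people can explain what happened during the show, "
        "but far fewer could remember each character's name. ") * 3
original = text.encode("utf-8")

# Lossless: every byte comes back exactly after decompression.
packed = zlib.compress(original)
assert zlib.decompress(packed) == original
assert len(packed) < len(original)

def synopsis(s):
    """Lossy stand-in: keep only the first sentence and discard the rest."""
    return s.split(". ")[0] + "."

summary = synopsis(text)
assert len(summary) < len(text)
assert summary != text   # the discarded detail cannot be rebuilt from the summary
```

Both outputs are smaller than the input, but only the first can be turned back into it. The argument above is that human recall may resemble a misplaced file more than a lossy summary; the sketch shows only the formal difference and settles nothing about the brain.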
Although this idea is not widely accepted, even within the computer science community, one philosopher in particular was ahead of his time in his articulation of intelligence as compression. Hegel’s Phenomenology of Spirit is the best, and maybe the only, philosophic justification of compression as intelligence. In his magnum opus, Hegel takes the reader through the stages of consciousness which have developed throughout human history, and by doing so articulates the dialectics of spirit (understanding). Hegel commonly refers to the image of the acorn, of which he believes his philosophy to be the articulated tree. Whitehead was close when he deemed all of philosophy to be but a series of footnotes to Plato, but in reality philosophy was the decompression of Plato, the specifying of what Plato generalized, and then the re-generalization (compression) of the act of conscious development (Spirit’s dialectic movement through history). If Plato is the acorn, Hegel is the tree. What Hegel attempted, and partially succeeded in doing, was compressing all historic knowledge and, by doing so, understanding the particular evolutions within knowledge (Spirit). For example, Hegel was particularly fond of the Roman Empire, whose place in conscious development is that of the citizen or individual. Now, it would be incomplete to say Rome was only a conscious development that introduced the individual citizen, but understanding Rome requires this dialectic revelation. Is a historian of Napoleon merely an encyclopedic dictionary of Napoleon’s deeds, or is the true Napoleon scholar the one who can tell the “story” of Napoleon through the use of his specific actions? In other words, the true historian is the one who studies the concepts of history by abstracting out from its specific details.
Kojève sums up Hegel’s thoughts on this elegantly and efficiently when he says, “to speak of this table without speaking of the rest is to abstract from this rest, which in fact is just as real and concrete as this table itself. To speak of this table without speaking of the whole Universe which implies it, or likewise to speak of this Universe without speaking of this table which is implied in it, is, therefore, to speak of an abstraction and not of a concrete reality” (Kojève, Introduction to the reading of Hegel). What Kojève makes explicit in Hegel is that the dialectic itself is knowledge. Each truth will only be partial, and therefore false, unless related to the whole. In this regard compression is the only way understanding is possible, since each piece of data must be connected to the others, or compressed, under a unifying dialectic truth of relation and negation. Compression is negation because it sacrifices what was for the abstraction or compressed version but is still able to recreate the specific from the abstraction. In a similar way, Hegel attempts to compress all of history into his phenomenology in order to articulate his compression program. Only by knowing the whole can we know the parts, and only by knowledge of the parts can we understand the whole, making compression the only way out of the cave of ignorance and into the light of knowledge.
Kojève, A., & Queneau, R. (1980). Introduction to the reading of Hegel. Ithaca, NY: Cornell University Press.
Hemmendinger, D. (2013, April 23). Data compression. Retrieved October 05, 2020, from https://www.britannica.com/technology/data-compression
Yudkowsky, E. (February 28). Fallacies of Compression. Retrieved October 01, 2020, from https://www.lesswrong.com/posts/y5MxoeacRKKM3KQth/fallacies-of-compression