In a research jackpot, I recently came across a deeply rich school of thought that’s had zero critical engagement. Inspired by the French structuralists who used formal models to analyze texts, a group of mathematicians in Romania in the 1970s and ’80s applied formal language theory (à la Chomsky) to the fine arts. Far from just a historical curiosity, these ideas have no parallel in digital humanities, and I think they could yield incredible insights if coded up with a modern programming language.
There should be a book about these ideas. No one else is going to do it, so I will. I thought hard about it, and drew up a proposal for what it would entail to write a decent monograph. You can read the proposal here, and I welcome feedback and suggestions for getting this off the ground.
For a time I worried that someone might try to scoop this idea — but seeing as the project requires learning both algebraic topology and how to read in Romanian, the barriers to entry seem high enough to deter even the most unscrupulous grad student. Here, all my weird scholarly interests come elegantly together, and I feel like no-one else could ever write this in quite the way I envision it.
As part of making this proposal, I watched Jane Friedman’s lovely course, How to Publish your Book. One thing I learned is that publishers like when you list similar books that have been successful, so they have an idea of its market segment. In this case, I see it as somewhere in between Moretti’s Graphs, Maps, Trees and Zalamea’s Synthetic Philosophy of Contemporary Mathematics. Moretti’s book outlines abstract data structures to codify literary texts, and is an early manifesto for digital humanities. Zalamea’s book is a philosophical sketch of higher mathematics — hardly easy reading, but still accessible due to focusing on specific research programmes and broad conceptual themes.
This course also drove home how publishers want you to have a platform: they’ll do basically nothing to market your book, so you need to bring your own audience. While it makes sense that you should already have a reputation for producing good stuff, the problem is that researchers tend to have a huge amount of disdain for any sort of marketing. We’d like to think that quality speaks for itself, but that’s not always the case, and many strong researchers have remained obscure because of this.
I was surprised to learn that the publisher decides a book’s title and cover art, as well as whether it’s soft- or hardcover. Still, authors retain a good deal of control over online and multimedia promotion. It becomes very clear after dipping your toe into this that dealing with publishers is a skill in itself.
I also found out just how expensive it is to publish open-access books at a major academic publisher. Routledge wants $13,000, Springer wants $15,000 for 400 pages, and Brill wants $12,200 for 350 pages. Yeesh. However, there are also grants specifically to help pay these costs, which is good to know.
Finally, I was struck by the close parallels between writing a book and starting a business. Far from scribbling in a garret, a writer today has to research their target audience, actively pitch their book to publishers, and schmooze in the literary community. I’ve often wondered about the lack of major philosophical works published in the last few decades, and this goes a long way in explaining why.
This project likely won’t happen for a while, since I need to catch up on the math. Also, some of the key papers are in German, and my German is still at the stage of memorizing verbs. As a final caveat, squeezing the timeline into a year is probably way too optimistic, but I feel like that’s all I can ask for.
A few things I’m still not clear about are the grant ecosystem, especially in Romania, as well as what kind of books scholarly publishers look for. I’d also like to make a software package so that researchers can use these ideas in practice, but it’s hard to see ahead of time what this will look like.
If anyone wants to help out, one way to do so is pointing out grants or other funding opportunities. I’m also looking for experts in related fields, who I can reach out to if I have questions, and who can recommend ways to simplify or generalize these older results. Last, many of these papers are buried in obscure journals that aren’t available in Canada, so a scan of these would make a huge difference.
Anyway, I tried to make this proposal genuinely snazzy, and I’d like to think it can give other young researchers a template to help focus their thoughts. Notably, the graph on page 6 is called a Gantt chart, and making it really helped crystallize my inchoate idea into a venture with manageable steps. I hope this also shows how the kind of work in an office job can strengthen your creative projects — and, for job-hunting humanities students, how good research habits can be valuable in industry.
Most of us get through life by mining our past for analogies, but at extremes this breaks down. From historians trying to reconstruct forms of life of the ancient Greeks, to anthropologists meeting tribes untouched by modern society, or psychologists reaching out to patients in the depths of psychosis, sometimes these emotional ‘theories of mind’ are all we have.
I remember my surprise to hear that such a thing as ‘computational psychoanalysis’ existed. Far from the soft couch on which Freud’s patients, often neurotic Victorian housewives, lay supine, the roots of these ideas came from desperately trying to decipher straitjacketed lunatics shrieking absolute nonsense. We should expect this theory to be completely fucked, and I assure you, it is.
Much clinical research already exists on Ignacio Matte Blanco’s The Unconscious as Infinite Sets (1975). Here I want to examine its computational side, brought to light only far more recently. First, I’ll give an overview of Matte Blanco’s model of the unconscious as increasing degrees of logical symmetry; next we’ll reframe this through his later emphasis on multi-dimensional cognitive spaces; and finally, all this will be formalized in terms of ultrametric spaces in mathematics, allowing topological insights into Freudian concepts that can be implemented as machine learning algorithms for text-mining.
So much of Freud has trickled into popular consciousness that it’s easy to forget just how weird his notion of the unconscious really is. Specifically, Freud notes the following defining characteristics: timelessness, absence of mutual contradiction and negation, condensation, displacement, and the replacement of external by psychical reality.
These come up again and again in psychoanalysis, but it’s not immediately clear how they’re related. In 1959, the Chilean analyst Ignacio Matte Blanco claimed that these can be reduced to two principles.
The principle of generalization claims that the mind classifies each thing (person, object, concept) into classes sharing some quality, which in turn belong to more general classes, and so on. This operates in both conscious and unconscious thought, and is commonsensical.
The principle of symmetry claims that the unconscious treats any relation as identical to its converse, i.e. \(xRy\!\iff\!yRx\). While conscious thought allows asymmetric relations where a part belongs to a whole, or one event occurs before another, in the unconscious the whole also belongs to the part, and any ordering relation disappears. As another example, I am my father’s son and he is my son.
This sounds crazy, but that’s the point. One of Matte Blanco’s examples is a schizophrenic woman who, after doctors took a blood sample, complained that they had taken her arm away (1975: 137).
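As a toy illustration of the principle of symmetry (my own sketch, not Matte Blanco’s), we can take the symmetric closure of a relation; the relation names below are hypothetical:

```python
# Toy sketch (mine, not Matte Blanco's): the principle of symmetry as the
# symmetric closure of a relation -- every pair (x, y) gains its converse.

def symmetric_closure(relation):
    """Enforce xRy <=> yRx, as the unconscious would."""
    return relation | {(y, x) for (x, y) in relation}

# Conscious, asymmetric facts: a part-whole relation and a temporal ordering.
conscious = {("arm", "body"), ("breakfast", "lunch")}
unconscious = symmetric_closure(conscious)

# In the unconscious, the whole also belongs to the part,
# and the order of events disappears.
assert ("body", "arm") in unconscious
assert ("lunch", "breakfast") in unconscious
```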
Note how these explain the characteristics of the unconscious (1976: 215). Time is an order-relation, so it disappears. With negation, \(p\) and \(\neg p\) are generalized into a wider class, and thus get treated as symmetric. Condensation is just generalization, where classes are treated as homogeneous due to symmetry. Displacement occurs when things in a class (e.g. bosses and fathers) are, by symmetry, taken as identical. Last, replacement of external by psychical reality is just a version of displacement.
Here it’s natural to ask whether there’s any mathematical object where a part can be equal to the whole. In fact, this is a defining property of infinite sets. Take for example the natural numbers \(\mathbb{N}\) and the set of even numbers. For each natural number \(n\), there is a unique even number \(2n\) corresponding to it. We say that there exists a bijective mapping between the two sets. Counterintuitively, such a mapping implies that both infinite sets are the same size.
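This pairing is easy to check mechanically; a minimal sketch:

```python
# The bijection n -> 2n pairs each natural number with a unique even number,
# so the evens -- a proper part of N -- are 'the same size' as the whole.

def to_even(n):
    return 2 * n

def from_even(m):
    assert m % 2 == 0
    return m // 2

# Invertible on any finite prefix, and the pattern continues without bound:
for n in range(1000):
    assert from_even(to_even(n)) == n
```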
Matte Blanco takes this in a surprisingly practical direction when he notes how emotional life is dominated by infinite quantities. It would be emotionally untruthful for a lover to say “I’ll love you for a finite amount of time.” Likewise, if you feel like a bad person, everything you’ve ever done or will do is not good enough. We could go on and on, especially for pathological thoughts. A huge part of psychoanalysis is to ‘finitize’ these emotions, putting your flaws and limitations in perspective.
Pathology, in short, is the invasion of unconscious symmetries into thought; e.g. prejudice is to view all members of a social class as the same. For Matte Blanco there is a continuum from conscious to unconscious, marked by increasing degrees of symmetry (1988: 43-5). In conscious thought, we can apprehend individual things in their singularity (first stratum); still consciously, we see how these things are interrelated, and have feelings about them (second stratum). In the third stratum, things are placed into equivalence classes, and this is where emotions of infinite degree can arise.
In the fourth stratum we reach the ‘unconscious’ proper, where equivalence classes themselves are placed into larger classes. This is where schizophrenics live: one patient said a man was very rich; when asked why, she said “He is very tall” — here the symmetric class in question is ‘things of high degree’ (1988: 44). Another noteworthy phase shift is that intense aggression involves a high degree of asymmetry (e.g. self vs. other), so is confined to the third stratum, while the fourth embodies the unconscious characteristics of lack of negation, and identification of internal and external reality.
The deepest part of the unconscious is so symmetric as to be inaccessible to asymmetric conscious thought. Everything equals everything else, and all relations between things are contained within any one thing. Hence we actually answer the question “why is the Freudian unconscious unconscious?” rather than just taking it for granted. Beyond this, however, psychoanalytical theory hits a wall.
This differs from orthodox Freudians, for whom unconscious ‘depth’ corresponds to developmental stages (oral, anal, etc.). Yet, instinct presumes asymmetric divides such as me versus ‘thing in mouth’, so for Matte Blanco it can’t be deepest (1975: 133). Further, in Freud’s later view of the unconscious as ego vs. id, the barrier between conscious and unconscious was explained by repression, whereas Matte Blanco returns to Freud’s original conception of the (mostly) ‘unrepressed unconscious’.
Matte Blanco’s work is filled with many more clinical examples, but this gives the gist of his theory. His use of mathematics is somewhat shallow, leaving many open questions, though apparently his later (untranslated) work probes the deepest layers of the unconscious by means of large cardinals. He also reframed his ideas geometrically through the notion of multi-dimensional spaces, which is worth exploring to get a more practical feel for his theory’s implications.
Matte Blanco called his theory ‘bi-logic’, to denote the disparate logics of conscious vs. unconscious thought. In fact, he was originally experimenting with multi-valued logics, and shifted to geometry as a better way to account for the facts, and this in turn gave rise to the principle of symmetry (1959: 4). Thereafter he still held on to his geometric ideas, but in an uneasy relation to his set-theoretic views.
Here it helps to start with the math. Suppose we have a triangle made from points A, B, and C. This can also be expressed as the line CABC. Notice how representing a 2D object as a 1D line entails repetition, namely of a zero-dimensional point C. A similar thing happens if we unfold a 3D cube into a 2D cross-shape. This time, not only points but also lines get repeated. In fact, this kind of repetition occurs for any projection of an \(n\)-dimensional shape onto a lower-dimensional surface.
A strikingly similar thing happens in displacement, where the same unconscious class gets manifested in many different objects. Likewise, condensation in dreams compresses disparate ideas all within one symbol, like a cubist painting on a flat canvas. Hence, a dream is a way to represent multi-dimensional mental objects for eyes made to see in a three-dimensional world (1975: 418).
In this view, the deepest level of the unconscious is an \(\infty\)-dimensional structure, and higher strata of the unconscious are lower-dimensional projections of this, bringing multiplicities along in turn.
We can also use this to think about introjection, or internalizing an object or idea (1988: 254). We’re essentially building a mental model of that thing, which has a certain dimensionality, and thus might fit well or poorly in a given part of the unconscious. If we put a model in an isodimensional stratum, then we’re good to go. It may also happen that we put our model in a certain stratum, but later find that it’s more complex, like a childhood memory that a person only appreciates when they’re mature.
In most cases our mental models are incomplete, having a lower dimension than the object itself. Matte Blanco notes that this gives a new way to interpret the concept of ‘partial object’ (1988: 255).
The psychoanalyst’s task thereby becomes a process of unfolding or translating symptomatic objects from a high-dimensional space into one that consciousness can grasp (Carvalho, 2010: 234). Sanchez-Cardenas (2011) uses this to justify pluralism: rather than an analyst holding back in fear of imposing their own meaning, any interpretation has value as material to bridge psychic voids.
Matte Blanco actually uses the term ‘topological psychology’, but is far more down-to-earth than Lacan. When I have a thought or emotion, it occupies the whole of my ego; we know it’s only a part, but at the same time it’s absurd to try to localize this to a specific part. Compared to claiming that the psyche has discrete parts (like id, ego, superego), the idea that these are projections of a more complex object is almost tame. Further, there is a ‘quantitative’ aspect, where higher-dimensional objects in the psyche are associated with more intense emotions (1988: 197).
We’ve seen two very different interpretations for the same basic model of the mind, with clear parallels such as dimensionality with degree of symmetry, but striking differences as well that can’t obviously be reconciled. Nevertheless, both approaches are based on clearly-delineated principles. Thus, Matte Blanco lets us think an unconscious that is unattached to any particular person.
As theories go, Matte Blanco’s is quite elegant, and it’s natural to ask why he’s not better known. The answer, I think, is terrible writing — most of The Unconscious as Infinite Sets is absolute twaddle, filled with pedantry, self-praise, and saying the same things over and over again. He seems to be much more popular in Latin America than in the Anglophone world, though he has had a positive reception among clinicians because he offers interpretive rules that can be reused across patients.
Reading Matte Blanco, one feels that he could have gone far deeper, if he only knew more math. In this vein, several attempts have been made to upgrade bi-logic into more advanced formalisms. This section is meant to show what else is out there, but I don’t find any of these approaches compelling. The reader is free to skip to the next section, where I outline a far more exciting formalization.
Rayner (1995) outlines his collaborative project to frame bi-logic through chaos theory. Matte Blanco (1988: 242) himself posited fractional dimensions of the unconscious between 2D and 3D, which we know today as fractals. Like multidimensional mental objects, chaotic structures often repeat at different scales (1995: 156), and Rayner brings up other parallels such as transitional states (‘bifurcations’ as patients gain insight), and internal objects as strange attractors (1995: 158).
The problem is that chaos is defined by sensitivity to initial conditions, making it infeasible to model empirical states, let alone something as hard to measure as the mind. Chaos was in vogue in the ’90s, but dried up once people found they couldn’t go any deeper beyond these initial metaphors.
Iurato (2018) approaches bi-logic through tools used in physics to study symmetry-breaking, or how systems transition from a symmetric to an asymmetric state. In mathematics, symmetries take the form of group-theoretic operations; for example, a square is invariant when rotated 90°, so it is rotation-symmetric. Conscious thought is dominated by order-relations (asymmetric), while unconscious thought is dominated by equivalence relations (symmetric), so Iurato claims that groupoids can help us formalize the ‘break’ as unconscious thoughts lose their symmetry and become conscious.
In fact, it’s Iurato who coined the term ‘computational psychoanalysis’, which hasn’t caught on. The dealbreaker is that his papers consist of long-winded descriptions of any formal tool he can think of, without actually using these to formalize Matte Blanco’s ideas. In short, there is no actual good-faith engagement, such as through sustained examples, but only tossing around undeveloped analogies.
In yet another approach, Tomic (2020) notes that the main ingredient of quantum logic is subspaces of a Hilbert space. Logical operators are interpreted as interactions between subspaces, such as conjunction (‘and’) as intersection, or negation as a subspace’s complement. Notably, disjunction (‘or’) is defined as the linear closure of two subspaces, or all possible linear combinations of vectors in each subspace. This has the odd corollary that \(p \lor q\) can be true even if both \(p\) and \(q\) are false.
This means that from \(x \lor y \lor z\) we can derive both \(p\) and \(\neg p\), giving the principle of symmetry. We can go on to interpret a ‘class’ as a subspace, and condensation as a linear combination of qualities, but unfortunately there’s no elegant way to express the lack of time and space in the unconscious.
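The corollary that \(p \lor q\) can hold while both disjuncts fail is easy to check in a toy two-dimensional model (my own sketch, not Tomic’s formalism), treating each proposition as a line through the origin:

```python
# Toy 2-D sketch (mine, not Tomic's formalism): propositions as 1-D subspaces
# of the plane, a 'state' as a vector. Quantum disjunction is the linear span
# of the two subspaces, so it can hold even when both disjuncts fail.

def in_line(state, direction):
    """True if `state` lies in the 1-D subspace spanned by `direction`."""
    (a, b), (c, d) = state, direction
    return a * d - b * c == 0  # zero cross product <=> proportional vectors

p_axis = (1, 0)   # proposition p: the state lies on the x-axis
q_axis = (0, 1)   # proposition q: the state lies on the y-axis
state  = (1, 1)   # a superposition of the two directions

# p and q are each false of this state...
assert not in_line(state, p_axis)
assert not in_line(state, q_axis)

# ...but 'p or q' is true: the span of both axes is the whole plane, and the
# state is a linear combination (1*p_axis + 1*q_axis) of the two directions.
assert state == (p_axis[0] + q_axis[0], p_axis[1] + q_axis[1])
```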
Battilotti (2013, 2014) likewise uses quantum logic, but starts from Matte Blanco’s thesis that 1) all sets are infinite, and 2) all relations are symmetric. The only type of set that satisfies the latter is a singleton (a set with one element), so the key is to find a way that infinite sets can act as singletons. She does this through the concept of ‘virtual singletons’ (\(V\)) translated into logical terms, where
\[(\forall x \in V)A(x) \equiv (\exists x \in V)A(x)\]
for any formula \(A\). Thus, the set’s single element applies to an infinite number of formulas. From there things get highly technical, but the gist is that this boutique logic has both an asymmetric mode with negation, and a symmetric mode where negation is meaningless, and with no temporal processes, interpreted as logical consequence having no direction (i.e. \(A\vdash\!B\!\iff\!B\vdash\!A\)).
From the viewpoint of building a formal basis for computational psychoanalysis, I feel that quantum logic brings in way too much extra baggage, such as spin operators. This can’t be justified unless we literally believe mental objects follow quantum rules — which is insane, and not in a productive way.
While Matte Blanco based his theory on a smattering of logic and real analysis, it’s interesting to see the wide range of tools invoked to take it to a deeper level. Still, none of these really lives up to the surreal aura evoked by the term ‘computational psychoanalysis’. What we really want is something that can be implemented on a computer, but which also plugs in to a broader philosophical edifice.
The roots of our next approach to computational psychoanalysis come from Lauro-Grotto (2008), originally a theoretical physicist who transitioned to cognitive neuroscience. Her research involved working with dementia patients, trying to measure loss of memory. To do this, she developed a test involving photos of famous people, which the patient would classify in terms of nationality and profession; if they didn’t recognize the person, they were asked to guess.
The right and wrong answers were used as an index of ‘metric content’, or degree of similarity relations that patients perceived. She found that dementia leads to a loss of information content (as we expect), but at the same time a rise in the metric content. That is, the mind generates clusters of concepts that are internally homogeneous, such as a superclass that makes no distinction between dogs and cats. Mathematically, this is called a transition from a metric space to an ultrametric space.
Later, Lauro-Grotto encountered Matte Blanco’s book by accident, and saw the same ultrametric structure in his two principles of the unconscious: the generalization principle implies a hierarchy of classes, while the symmetry principle means that all elements of a class are indistinguishable.
I’ve stressed Lauro-Grotto’s role because of her solid foundation in both math and psychology, which makes me trust the theory far more. She’s since moved on to study mirror neurons through the lens of bi-logic, while ‘ultrametric psychoanalysis’ has been passed on to mathematicians. It turns out that the further we get into the mathematics, the more striking the parallels become. So let’s go deeper.
Suppose we have a triangle whose base is a line \(z\), built from two more lines \(x\) and \(y\). In the worst-case scenario, \(x\) and \(y\) overlap with \(z\) to give a straight line, or a triangle with zero area. Here the lengths of \(x\) and \(y\) sum to that of \(z\): since \(z\) is just \(x\) and \(y\) laid end to end, we can write this as \(d(x + y) = d(x) + d(y)\). In any other case, the combined lengths of \(x\) and \(y\) will be strictly greater than the length of \(z\). Therefore we get the triangle inequality: \(d(x + y) \leq d(x) + d(y)\).
The usual way to measure distance between points is the Euclidean metric, where if \(x=(x_1,x_2)\) and \(y=(y_1,y_2)\), then the distance between \(x\) and \(y\) is \(d_e(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}\). Other metrics are possible, such as Manhattan distance, where we can only move along a grid rather than ‘as the crow flies’ like in the Euclidean metric. Here we need to measure the horizontal and vertical distances, and add the two together, or \(d_m(x,y) = |x_1 - y_1| + |x_2 - y_2|\). A still weirder form of distance is the infinity metric, which is the maximum of the vertical and horizontal distances, or \(d_\infty(x,y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}\).
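All three metrics translate directly into code; a minimal sketch:

```python
import math

# The three metrics from the text, for points in the plane.

def d_euclid(x, y):
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def d_manhattan(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_infinity(x, y):
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

x, y = (0, 0), (3, 4)
assert d_euclid(x, y) == 5.0    # 'as the crow flies'
assert d_manhattan(x, y) == 7   # along the grid
assert d_infinity(x, y) == 4    # the larger of the two coordinate gaps
```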
A metric is simply a way of measuring distance. Many metrics are valid, but each entails a different geometry. An example is the notion of a circle, the set of points at the same distance from a centre: under the Manhattan metric a ‘circle’ is a diamond, and under the infinity metric a square [via].
Most notions of distance we are familiar with are Archimedean, where distances add up in a way we find natural, e.g. we can keep adding a number to itself until it is greater than 1. This isn’t always so. For example, number systems with infinitesimals are considered non-Archimedean.
Another non-Archimedean object is an ultrametric space, whose distances obey a stronger version of the triangle inequality, namely: \(d(x,z) \leq \max\left(d(x,y),d(y,z)\right)\). To see how ultrametric spaces defy our normal Archimedean intuitions about space, it’s easiest to illustrate their odd properties.
Consider a triangle in ultrametric space, and suppose \(d(x,y) < d(x,z)\). Then we know: \[d(y,z) \leq \max\{d(x,y), d(x,z)\} = d(x,z)\] \[d(x,z) \leq \max\{d(x,y), d(y,z)\} = d(y,z)\] and therefore, \(d(x,z) = d(y,z)\). In other words, two sides of a triangle will always have the same length, so every triangle in an ultrametric space is isosceles [via].
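Both the strong triangle inequality and the isosceles property can be verified on a small hand-built ultrametric; the distances below are hypothetical, mimicking a hierarchy where two points merge at height 1, a third joins at height 2, and a fourth at height 3:

```python
from itertools import combinations

# Hand-built ultrametric distances on four points (hypothetical values):
D = {("a", "b"): 1,
     ("a", "c"): 2, ("b", "c"): 2,
     ("a", "d"): 3, ("b", "d"): 3, ("c", "d"): 3}

def d(x, y):
    return 0 if x == y else D.get((x, y), D.get((y, x)))

for x, y, z in combinations("abcd", 3):
    sides = sorted([d(x, y), d(y, z), d(x, z)])
    # strong triangle inequality: longest side never exceeds the others' max
    assert sides[2] <= max(sides[0], sides[1])
    # hence every triangle is isosceles, with the two longest sides equal
    assert sides[1] == sides[2]
```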
Another curious property is that every point of an ultrametric ball is its centre. In math, a ball is the set of points whose distance from the centre (\(a\)) is less than a given radius (\(r\)), which we can write as \(B(a,r) = \{y\:|\:d(y-a) < r\}\). Now let’s pick another point \(b\) in \(B(a,r)\) to be the new centre, and any \(x\) in the ball around this new centre, namely in \(B(b,r)\). Then \(d(b-a)<r\) and \(d(x-b)<r\), so that
\[d(x-a) = d(x - b + b - a) \leq \max\{d(x - b), d(b - a)\} < r.\]
That is, the distance of \(x\) from the old center \(a\) is still less than \(r\), so any \(x\) is in \(B(a,r)\), and hence \(B(b,r) \subseteq B(a,r)\). A similar argument lets us conclude \(B(a,r) \subseteq B(b,r)\), from which we get \(B(a,r) = B(b,r)\). So ultrametric spaces destroy our intuition that any shape has a unique centre.
One last odd property is that an ultrametric ball (or cluster) is both open and closed, or clopen, since these are not topologically exclusive: “the set is closed because objects on its boundary can be members; and it is open because the cluster extremity is defined by what is not a member relative to the external, complement set” (Murtagh, 2014a: 45). Thus a class can be defined by what it is not, which we can interpret as expressing absence of negation in the unconscious (Murtagh, 2012a: 200).
A major practical application of ultrametric spaces is phylogenetic trees, where evolutionary distance between nodes is measured by the height of their least common ancestor, which satisfies the strong triangle inequality [via]. A simple change in notation shows how this is closely related to clustering in machine learning, where clusters are ‘bags of symmetry’ subject to increasing levels of generality.
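A minimal sketch of this correspondence, using a hypothetical four-leaf hierarchy where distance is the height of the least common ancestor:

```python
from itertools import combinations

# Hypothetical hierarchy: {dog, wolf} merge at height 1, cat joins at
# height 2, and oak joins everything at height 3.
clusters_by_height = {
    1: [{"dog", "wolf"}, {"cat"}, {"oak"}],
    2: [{"dog", "wolf", "cat"}, {"oak"}],
    3: [{"dog", "wolf", "cat", "oak"}],
}

def lca_height(x, y):
    """Distance between leaves = height of their least common ancestor."""
    if x == y:
        return 0
    for h in sorted(clusters_by_height):
        if any(x in c and y in c for c in clusters_by_height[h]):
            return h

assert lca_height("dog", "wolf") == 1
assert lca_height("dog", "cat") == 2
assert lca_height("dog", "oak") == 3

# The LCA height satisfies the strong triangle inequality:
for x, y, z in combinations(["dog", "wolf", "cat", "oak"], 3):
    assert lca_height(x, z) <= max(lca_height(x, y), lca_height(y, z))
```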
Murtagh uses this kind of clustering to raise some provocative empirical evidence for the theory. One experiment took texts from the Brothers Grimm, Jane Austen, James Joyce, air accident reports, and a database of dream reports to measure their ultrametricity values (2012b). The Brothers Grimm and accidents were lowest, and did not noticeably differ. Jane Austen had a small but distinguishably higher value. The dream reports were still higher, especially when limited to a corpus by one single dreamer; Joyce was higher than the total dream corpus, but lower than the reports by one dreamer. Conversely, the most non-ultrametric forms of time series data are those exhibiting chaos (2014c: 4).
Murtagh notes that in general, as dimensionality of data increases, distances become increasingly ultrametric — each point more and more equidistant (2020: 7). As higher-dimensional symmetries emerge, these can be exploited in algorithms to gain computational efficiency. He speculates that similar cognitive processes account for the far greater efficiency of unconscious pattern-matching over conscious reasoning (2012a: 202). The main goal in his quantitative approach to psychoanalysis is to measure degrees of condensation, ranking objects by their emotional valence (2014b: 155).
It’s also possible to build more fine-grained models using \(p\)-adic numbers, where \(p\) is some prime. As opposed to our usual notion of ‘closeness’ on the real number line, two \(p\)-adics are ‘close’ if they can both be divided by a power of \(p\); the higher the power \(p^k\), the closer they are. The \(p\)-adics form an ultrametric space, and several papers in the computational psychoanalysis literature (notably by Khrennikov) attempt to build \(p\)-adic models of the mind. Yet, the sad fact is that no-one with a deep knowledge of psychoanalysis is familiar with this way of thinking, to enrich it with empirical details.
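A minimal sketch of the \(p\)-adic distance on integers, taking \(p = 2\) (the function name is my own):

```python
from itertools import combinations

# 2-adic distance between integers: |x - y|_2 = 2^(-k), where 2^k is the
# highest power of 2 dividing x - y.

def padic_dist(x, y, p=2):
    if x == y:
        return 0.0
    n, k = abs(x - y), 0
    while n % p == 0:
        n //= p
        k += 1
    return p ** -k

# 32 and 0 differ by 2^5, so they're very close; 3 and 0 are far apart.
assert padic_dist(32, 0) == 2 ** -5
assert padic_dist(3, 0) == 1.0

# The distances form an ultrametric: check the strong triangle inequality.
for x, y, z in combinations(range(30), 3):
    assert padic_dist(x, z) <= max(padic_dist(x, y), padic_dist(y, z))
```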
Thinking bigger, Rejaibi et al. (2020) use deep learning to diagnose depression based on patients’ speech, and one can imagine psychoanalytic AI identifying unconscious condensations based on transcripts from analysis sessions, and perhaps even creating synthetic dreams that materialize those same complexes through virtual reality, which the analyst and patient could navigate together.
Computational psychoanalysis as a discipline can offer philosophy and other fields a better understanding of abstract distances between ideas. It’s a shining example of just how surreal rationality can get, and how by means of weirdness we can get a conceptual grip on deep problems.
Matte Blanco’s basic model of the mind consists of a nested hierarchy of classes defined by common qualities, with increasingly symmetric relations as we progress to more general superclasses, down to the deepest strata of the unconscious. From a geometric view, we can conceive the core of the unconscious as an \(\infty\)-dimensional object, whose projections onto lower-dimensional strata closer to consciousness lead to repeated volumes, manifesting as condensations in pathologies and dreams. This bizarre theory can be elegantly expressed as an ultrametric space, which has close links to machine learning, potentially opening the way for a computational approach to psychoanalysis.
By typical standards, all this is deeply whacked. Yet, I’d sooner trust a theory that 1) real psychiatrists find useful and 2) is backed up by actual math, rather than the brazen p-hacking that psychologists produce today. From the other direction, if we hope to find new applications for abstract math, we should seek out perverse kinds of reasoning that don’t conform to typical logical rules. Formalized philosophy need not imply sterile repetition — rather, expressing a concept in math means plugging it into an infinite network of entailments, extending it far deeper than its creator ever imagined.
If anyone can remedy this, Joseph Goguen (1941-2006) is the guy for the job. His remarkably prolific research includes categorical fuzzy set theory, inventing the OBJ family of programming languages (e.g. CafeOBJ, Maude), and creating the theory of ‘institutions’ as invariant properties of all logical systems. He was also a practicing Buddhist, and editor-in-chief of the Journal of Consciousness Studies.
All these come together in Goguen’s algebraic semiotics, which uses category theory to formalize the notion of sign-system, serving as a principled approach to user interface design. The rabbit-hole goes quite deep, and the main ideas are strewn throughout numerous papers. This post gives a self-contained introduction to algebraic semiotics, outlining semiotic morphisms as mappings between sign-systems, conceptual blending as condensation of morphisms, hidden algebra as formalizing dynamic creation of meaning, and polymorphic poetics as computational semiology.
The key insight of semiotics is that instead of meaning being inherent to a sign, signs acquire meaning differentially, through a system of oppositions. Naturally, we don’t want to spell these out, which for \(n\) signs implies \(n(n-1)/2\) oppositions. If we used sets as a framework, we would be in for a rough time, not least because our understandings are always only partial — this wouldn’t leave room for signs we don’t know about, or may not want to include (1999: 250).
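The quadratic blow-up is easy to see; a minimal sketch:

```python
# The number of pairwise oppositions grows quadratically with the number of
# signs, which is why spelling them all out doesn't scale.

def oppositions(n):
    return n * (n - 1) // 2

assert oppositions(3) == 3          # {a-b, a-c, b-c}
assert oppositions(10) == 45
assert oppositions(1000) == 499500  # already half a million oppositions
```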
Instead, an approach using algebra gives the structure we need while also allowing open systems. Goguen gives a formal definition of a sign-system that I’ll explain in detail, so the reader may want to just skim until the next section. A sign-system is made of the following ingredients (2004: 4):
First, a ‘sort’ is the type of a sign. It can be mundane, like separating text from numbers, but can also be elaborate, such as components of a multimedia display, with nested subsorts and supersorts. Sorts have a part-whole hierarchy (‘level ordering’): for example a menu bar and scroll bar may be on the same level, but at a lower level than the window of which they are parts (Harrell, 2013: 133).
Operations include constructors that build new signs from old signs as parts, and selectors that pull out parts from compound signs (2004: 4). Constructors can have parameters, such as an image of a cat that takes parameters for its color, size, and location (Goguen & Harrell, 2005a: 86); each parameter of a constructor has a corresponding selector to extract its value (1999: 263). These more standard attributes of signs (e.g. colors, booleans, integers) are ‘data sorts’, and go in the subsignature.
Axioms are logical formulas made of constructors, functions and predicates, and constrain the set of possible signs (Goguen & Harrell, 2005a: 86). For example, we may want to stipulate that all windows on a screen are below a certain size, or that an integer has no leading zeros (Harrell, 2013: 132).
The first four items make an algebraic theory, which just means a declaration of symbols plus rules to restrict their use (1999: 244). This is what makes Goguen’s semiotics ‘algebraic’. Sometimes he refers to a sign-system as a ‘semiotic theory’, as opposed to a specific model (i.e. interpretation) that instantiates it. The class of models that satisfy a given theory is called its semiotic space (2003: 2).
Note that we add some extra structure through a priority ordering, which is assigned to constructors and their arguments to express the relative importance of the signs they build (2003: 2). Hence, “priorities indicate the relative significance of subsigns at a given level” (2004: 4). The level and priority ordering are the main ways that social context is integrated into a model.
In sum: “Sorts classify signs, operations construct signs, data sorts provide values for attributes of signs, and levels and priorities indicate saliency” (2001: 2).
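The ingredients above can be sketched in code. Goguen worked in OBJ, but a rough Python rendering (the class names and the window example are my own, loosely following Harrell’s menu-bar illustration) shows how sorts, levels, constructors, priorities, and axioms fit together:

```python
from dataclasses import dataclass

# Sorts classify signs, constructors build signs, data sorts provide
# attribute values, levels and priorities indicate saliency.

@dataclass(frozen=True)
class Sort:
    name: str
    level: int                  # part-whole hierarchy: parts sit at lower levels

@dataclass(frozen=True)
class Constructor:
    name: str
    arg_sorts: tuple            # each parameter implies a selector
    result_sort: Sort
    priority: int = 0           # relative salience of the signs it builds

window   = Sort("Window", level=2)
menu_bar = Sort("MenuBar", level=1)      # same level as the scroll bar...
scroll   = Sort("ScrollBar", level=1)    # ...both parts of the window
width    = Sort("Int", level=0)          # a data sort in the subsignature

mk_window = Constructor("mk_window", (menu_bar, scroll, width), window)

# An axiom as a predicate, e.g. "all windows are below a certain size".
MAX_WIDTH = 1000
def axiom_window_size(w: int) -> bool:
    return w <= MAX_WIDTH

assert menu_bar.level == scroll.level < window.level
assert axiom_window_size(800)
```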
Any representation is a mapping from signified to signifier. A semiotic morphism is precisely this: a structure-preserving map from one sign-system to another. Instead of signifieds we have a source space, while instead of signifiers we have a target space. Understanding takes place as a process from target to source, while design proceeds from source to target (Goguen & Harrell, 2005a: 88).
Algebraic semiotics begins from the idea that we can evaluate and compare morphisms by how well they preserve structure. A good explanation or design is all about making it easy to translate from one system (e.g. words) to another (how to do something). The kinds of structure to be preserved from one sign-system to another are just the components mentioned before (Harrell, 2013: 147):
Hence semiotic morphisms map sorts to sorts, subsorts to subsorts, constructors to constructors, and so on, from source to target (Goguen & Harrell, 2005a: 88). The sign-systems need not be exactly the same, but should have corresponding structure. That is, morphisms translate “from the language of one sign system to the language of another, instead of just translating the concrete signs in the models” (1999: 256). Likewise, these mappings are partial, since we can’t expect to keep every single element, and our level and priority orderings help us decide which losses matter.
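A minimal sketch of a semiotic morphism as a partial, structure-preserving map, again in Python rather than Goguen’s OBJ formalism; the two sign-systems and the level-preservation check are invented for illustration:

```python
# Sign-systems reduced to their sort level-orderings; the morphism is a
# partial dict on sorts ('Icon' has no counterpart and is simply dropped).
source_levels = {"Window": 2, "MenuBar": 1, "ScrollBar": 1, "Icon": 1}
target_levels = {"Page": 2, "NavBar": 1, "Slider": 1}
sort_map = {"Window": "Page", "MenuBar": "NavBar", "ScrollBar": "Slider"}

def preserves_levels(f, src, tgt):
    """True if the mapping never inverts the part-whole ordering."""
    return all(
        (src[a] < src[b]) <= (tgt[f[a]] < tgt[f[b]])   # bool <= bool: implication
        for a in f for b in f
    )

print(preserves_levels(sort_map, source_levels, target_levels))   # True
```

Whether dropping ‘Icon’ is an acceptable loss is exactly what the level and priority orderings are meant to decide.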
Since much of design involves choosing to preserve one thing or the other, Goguen identified several principles through detailed psychological and linguistic experiments (2001: 3):
These results are definitely non-obvious, and allow a principled approach to many design problems otherwise lacking solid guidelines. While most design can be done well without algebraic semiotics, the formalism really shines in resolving difficult decisions (2001: 4). One example of these principles in action was designing a proof assistant to be more pedagogically-friendly, which found that “early designs…were incorrect because the corresponding semiotic morphisms failed to preserve certain key constructors” (Goguen & Lin, 2001: 31). In a setting like teaching where students have vastly different intuitions, it could pay off to take a more abstract view to accommodate everybody.
The easiest way to understand something we don’t know is by analogy with something we do know. Once there were no words for ‘computer virus’ or ‘roadkill’, so we just blended existing concepts.
One way to think about analogy is ‘conceptual spaces’, formalized as sets of elements with relations among them. A blending is then just a mapping from one conceptual space to another. Metaphoric blends are slightly more interesting, in that they are asymmetric: in saying “the sun is a king”, we don’t evoke every quality of a king (crown, taxes), only the most salient ones. The literature even identifies optimality principles to judge whether a blend is good or not (Goguen & Harrell, 2005b: 5).
This all sounds familiar. In fact, we can see that this framework is quite impoverished compared to semiotic morphisms: elements and relations are not typed, nor are there functions or axioms (ibid.).
Types, for instance, give us information like if a metaphor is a personification, or how ‘far apart’ are the elements being compared. Likewise, functions and axioms help account for structure, such as how a poetic meter blends with a rhyme scheme (Goguen & Harrell, 2004: 51). Most interesting of all, this enriched ‘structural blending’ can be elegantly formalized as pushouts in category theory.
In the simplest case of conceptual blending we have a base space G (which stands for ‘generic space’) plus two input spaces 𝐼₁ and 𝐼₂. Here G, 𝐼₁, 𝐼₂ and their morphisms G→𝐼₁, G→𝐼₂ make up the ‘input diagram’. Likewise, a blendoid is a space B with morphisms 𝐼₁→B, 𝐼₂→B and G→B called injections. The main thing we want is consistency: each element of G gets mapped to the same element of B regardless of whether we choose G→𝐼₁→B, G→𝐼₂→B, or G→B, i.e. the diagram commutes.
In English: 𝐼₁ and 𝐼₂ are two things being compared, G is what they have in common, and a blend B should be consistent no matter what ‘side’ you come from. The intuition behind pushouts is that “nothing can be added to or subtracted from such an optimal blendoid without violating consistency or simplicity in some way” (2004: 13). This is mostly abstract nonsense, so let’s do an example.
Here we see a structural blend for the term ‘houseboat’, or a boat that is used as a house. The left circle is ‘house’ (𝐼₁), the right circle is ‘boat’ (𝐼₂). The bottom circle is their common elements (G), namely that they include a person, and are on a certain medium. If you check 𝐼₁ or 𝐼₂, each gives specifics for its input. And of course, at the top we have the blend ‘houseboat’ (B). Note that for the object, person, and their relation, B combines from both inputs; yet, for the medium we only have ‘water’ — which is fine, because we only need weak equality, where each element maps to another.
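The commuting condition for the houseboat blend can be checked mechanically. The following Python sketch reduces the diagram to three generic roles (my own simplification of Goguen & Harrell’s example), with spaces as dicts:

```python
# The generic space G and its images in the two input spaces.
G = ["person", "object", "medium"]
to_I1 = {"person": "resident", "object": "house", "medium": "land"}    # G -> house
to_I2 = {"person": "passenger", "object": "boat", "medium": "water"}   # G -> boat

# Injections into the blendoid B = 'houseboat'. Note the weak equality:
# 'land' maps to 'water', it just has to map to *something* consistently.
I1_to_B = {"resident": "resident", "house": "houseboat", "land": "water"}
I2_to_B = {"passenger": "resident", "boat": "houseboat", "water": "water"}

def commutes(g):
    """Does G -> I1 -> B agree with G -> I2 -> B for element g?"""
    return I1_to_B[to_I1[g]] == I2_to_B[to_I2[g]]

print(all(commutes(g) for g in G))   # True
```

A ‘boathouse’ blend would use different injections (the house becomes the medium, the boat the occupant), which is why the output of blending is not unique.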
Still, we can imagine other possible mappings, like ‘boathouse’ (a shelter for a boat). Goguen actually had to create a new categorical concept for this (‘3/2-pushouts’), since the output is not unique. Still, ideally we’d like to have rules so that a computer can tell which blends make sense and which don’t.
Hence, Goguen & Harrell (2005b) wrote a computer program to generate all possible blends for this example. To their surprise, there are actually a lot — namely 2^A · P, where A is the number of axioms and P is the number of primary blendoids. As far as I can tell, A = 4 and P = 3, giving 48 possible blendoids.
To narrow these down, they looked for optimality principles. The main challenge is that to automate these, they need to be fully formal, based on structure rather than meaning. Goguen & Harrell (2004: 52-3) ultimately arrived at degrees of commutativity (i.e. how much the arrows are ‘equal’), degree of axiom preservation (i.e. how well blends follow the rules), and amount of type casting for constants (i.e. whether a blendoid has an unnatural type). Overall, they’re satisfied that these principles match our intuition of how much a blend seems ‘boat-like’ or ‘house-like’ (Goguen & Harrell, 2005b: 16).
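As a toy illustration of ranking blends — not Goguen & Harrell’s actual measures — one can score each candidate by degree of commutativity and axiom preservation and compare lexicographically (the numbers below are hypothetical):

```python
# blend name: (generic elements that commute out of 3, axioms preserved out of 4)
candidates = {
    "houseboat": (3, 4),
    "boathouse": (3, 3),     # hypothetical scores, for illustration only
    "degenerate": (1, 0),
}

ranked = sorted(candidates, key=lambda b: candidates[b], reverse=True)
print(ranked)   # ['houseboat', 'boathouse', 'degenerate']
```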
This example was very simple, and it turns out we can further generalize from pushouts to colimits. Colimits “capture the notion of ‘putting together’ objects to form larger objects, in a way that takes account of shared substructures” (2005: 62, fn. 14). They give an optimal blend in that “they put some components together, identifying as little as possible, with nothing left over, and with nothing essentially new added” (1999: 279). As before, we weaken these to ‘3/2-colimits’. In short, they’re a powerful tool both to combine meanings and to study the effect of context on meaning (1997: 12).
On the conceptual end, this formalism lets us think of ‘style’ as a choice of blending principles, and gives us a new language for stylistics. Notably, many artistic works make use of disoptimization principles, creating original ideas by violating our expectations (Goguen & Harrell, 2004: 56). More whimsically, Goguen curated a ‘semiotic zoo’ of bad design choices illustrating semiotic principles. (Unfortunately they’re all extremely ’90s.) While these examples are evocative, we don’t quite have a general theory — at least not until we extend our formalism to encompass sign-dynamics.
The math of algebraic semiotics is closely related to formal verification, or proving that software is bug-free. One problem with this is that most code is written in a rush to meet deadlines, often with last-minute design changes, and just isn’t worth the trouble of verifying. Sometimes we only want to know a design works (e.g. a cryptographic protocol) and leave the code up to the user. This is called formal specification, where we prove properties of designs, as opposed to code (1997: 10).
In a static design, we want to know the different parts and what they do, which is like an algebra with elements and operations. However, in dynamic designs an operator often depends on a state that changes over time. A nice example is undo, which goes back to the state before the last command; the computer therefore needs to keep the last state on record, to access it if needed (1999: 272).
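The undo idea can be sketched in a few lines of Python (a toy example of my own): each command records the current state before changing it.

```python
# A minimal 'undo' sketch: state kept on record before each command.
class Editor:
    def __init__(self):
        self.text = ""
        self._history = []            # previous states, newest last

    def type(self, s):
        self._history.append(self.text)
        self.text += s

    def undo(self):
        if self._history:
            self.text = self._history.pop()

e = Editor()
e.type("hello")
e.type(" world")
e.undo()
print(e.text)   # hello
```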
The motivation for hidden algebra was to formalize object-oriented software (Goguen & Malcolm, 2000: 56), in which ‘objects’ have various attributes put together (e.g. a person’s name and height), visible to other objects. Each object also has a ‘hidden’ internal state that other objects can modify through methods (functions). Thus we have a division between attributes that stay the same, and states that can change. Hidden algebra is all about handling both of these at once.
While visible attributes are easy to handle with algebra, we can only find out a state by performing an experiment on it. This leads naturally to the idea that two designs are ‘equivalent’ if they behave the same in all relevant experiments (Goguen & Lin, 2000: 28). For example, practical software often doesn’t follow its specification exactly, so we may want to prove whether this makes any difference (1997: 10). Likewise, this can help if we want to simplify a UI design without losing functionality.
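Behavioural equivalence can be made concrete with a toy example of my own, in the spirit of hidden algebra: two counters with different hidden states are ‘the same’ if every experiment (a sequence of method calls ending in an observation) yields the same result.

```python
class CounterA:
    def __init__(self): self._n = 0          # hidden state: the count itself
    def inc(self): self._n += 1
    def read(self): return self._n           # visible observation

class CounterB:
    def __init__(self): self._log = []       # hidden state: a log of events
    def inc(self): self._log.append("inc")
    def read(self): return self._log.count("inc")

def run(counter_cls, experiment):
    """An experiment: a sequence of method names, observed via read()."""
    c = counter_cls()
    for op in experiment:
        getattr(c, op)()
    return c.read()

experiments = [[], ["inc"], ["inc", "inc", "inc"]]
print(all(run(CounterA, e) == run(CounterB, e) for e in experiments))   # True
```

The internals differ completely, yet no experiment can tell the two apart — which is the sense in which a simplified design can be proven to lose no functionality.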
Classical semiotics takes signs as given, but in UI design we need to think about signs that change in response to user input, or that move on their own (2004: 22). Likewise, real-world sign systems are dynamic: words change their meaning, new words are introduced, old words disappear, and even syntax changes (1999: 272). It’s common to bash structuralism for being static, but without a formalism to explicitly express changing states, ‘post-structuralist’ semiotics is no better.
This form of analysis for behaviours of hidden states is called coalgebra, a ‘dual’ to normal algebra. Even ignoring the technical details, Goguen makes a strong case that any dynamic semiotics must be coalgebraic, and hidden algebra’s strength comes precisely from combining algebra with coalgebra.
The most elaborate application so far has been Triantafyllou et al. (2014) using algebraic semiotics to measure quality of video annotations. Goguen envisioned a form of algebraic engineering for sign-systems (Goguen & Malcolm, 1999: 164), and made impressive progress in laying its foundations. Still, for this to actually catch on, it should bring not only new formalisms, but also radical new ideas.
Polypoems use algebraic semiotics in a generative way to create interactive artworks. Goguen was especially interested in computational narratology, so the term isn’t limited to poetry by any means. One more poetic example is “November Qualia”, which is essentially a poem built from randomized phrases, much like Queneau’s Hundred Thousand Billion Poems. Other proposed applications include computer games that generate a plot as they are played (Goguen & Harrell, 2004: 49), computer-generated hip-hop (Goguen & Harrell, 2005b: 23), and various elaborate multimedia projects that — probably for the better — never materialized (Mamakos & Stefaneas, 2013).
Conversely, polymorphic poetics is the use of algebraic semiotics as an analytical method, describing how choices of semiotic morphisms affect the expression of a work (Harrell, 2013: 150). In UI design, it “uses morphic semiotics to help describe how meaning ‘gets into’ computing systems” (ibid., 117). Proposed applications include designing new media such as VR whose rules are not well-known, increasing usability of hardware, and supporting non-standard users such as people with disabilities (2004: 1-2). For the semiotically-inclined reader, this is probably the most compelling idea so far, but it was little developed before Goguen’s unexpected death from illness in 2006.
There are some hints of what polymorphic poetics might have looked like if better theorized. Sadly, it’s clear that Goguen never engaged with semiotics beyond an introductory level. His general idea of Peirce was a triadic system of signifier, signified, and an interpretant linking the two. He sees Saussure as having a more dyadic system of signifier vs. signified, whose major insight is that signs occur in sign-systems (1999: 244-5). Goguen frames his approach as closer to Peirce, whose semiotic triangle is ‘relational’, as opposed to the ‘functional’ view of Saussure (2003: 7).
An evocative illustration of the potential for algebraic semiotics is the idea that art functions through non-preservation of structure and violation of axioms (Harrell, 2004: 148). With a large enough corpus, we can imagine establishing ‘meta-rules’ of when a violation is acceptable — and these, perhaps, get violated in turn. Note, however, that algebraic semiotics is less a school of thought on its own, and more a tool to formalize diverse readings, ensuring consistency and revealing structure.
In a fascinating paper, Goguen & Borgo (2005) model free-jazz performances as nonlinear dynamical systems, where improvisation enacts the complex dynamics of musical phase space. Chiriţă & Fiadeiro (2016) framed this through the lens of algebraic semiotics, creating a logic for free jazz that can be used to evaluate how it meets listeners’ expectations, find which music fragments are hardly reachable, and predict how an improvisation will evolve. This is by far the most impressive extension of algebraic semiotics thus far, and shows the deep richness that formal methods can bring.
A truly scientific theory of signs would have vast consequences for every field. For Goguen (2005), the ultimate scope of his project was a ‘unified concept theory’ using his theory of institutions to raise algebraic semiotics into a rigorous theory of knowledge representation, superseding formal concept analysis. These claims sound grandiose to the point of being crankish, but by now I hope the reader has seen that Goguen was perhaps the single person who could realistically deliver on this.
It’s clear that the tools are in place for a formal science of signs. Goguen’s algebraic semiotics was developed with working examples implemented in OBJ code. The main barrier has simply been that experts in semiotics have never even heard of ideas like colimits or universal algebra. Again, all of this is realizable right now — all that’s missing is someone willing to do the dirty work.
Radical ideas like ‘cognitive ergonomics’ are often tossed around for selling snake oil, but Goguen opens up the tantalizing thought that foundations for this could truly exist. We can speculate on an algebraic semiotics software added to design workflows like a debugger, optimizing user experience and potentially avoiding disastrous design flaws. We can imagine a semiotic branch of numerous sciences, such as computational biosemiotics giving us algebraic models of animal communication.
Overwhelmingly, semiotics is used as an academic acrolect to ensure that people can ‘talk the talk’, as well as dressing up insipid research to sound radical and profound. It’s time for semiotics to finally live up to its potential, as the kind of unified theory that gives post-structuralists nightmares.
In philosophy circles, topos theory is best known as the machinery behind Badiou’s Logics of Worlds. It is often brought up by Zalamea, who outlines how the closely-related sheaf theory opens up novel relations between local and global that philosophy should take up in turn. Yet, due to the obvious barriers to entry, most attempts at this have just been exercises in hand-waving.
This post aims to give a general overview of Sallach’s research project, from the preliminary concepts it started with, onward to the process of narrowing down among competing frameworks, and finally to the new ideas that topos theory can bring for social analysis. I assure you, I am in no position to talk about these things, but no-one else is going to do it, so what the heck.
Sallach comes from a background in computational social science, which at this point is essentially a hodgepodge of agent-based simulations and social network modelling. While it’s possible to make highly practical models, lack of underlying theory makes it very hard to draw general conclusions.
An obvious reason why category theory suggests itself is its potential to tie together many disparate formalisms currently in use. Culture is a network of networks, where social distinctions create ingroups and outgroups, with an associated core and periphery (2011: 7). The main unit here is discrete agents, but it’s also natural to think of macro-level trends as if they were continuous.
Curiously, a major part of Sallach’s initial metaphors come from topological quantum field theory. Rather than quantitative metrics, here we are interested in topology change, a far more ‘qualitative’ kind of reasoning, much like how we think of transforming social structures. We want to analyze degrees of freedom as capacities for variation, and how changes propagate through them (2011: 10).
Similarly, it seldom makes sense to say that two social tendencies are ‘equal’ in a mathematical way, and a major benefit of categories is to allow weaker forms of equality. Objects in a category are specified only ‘up to isomorphism’, where instead of spelling out how two things are the same, they are simply taken as indistinguishable. Further, if two things are equivalent up to isomorphism, these equivalences can likewise be equivalent from the point of view of a higher category, in a “recursive weakening of the notion of uniqueness” (Baez & Dolan, 1998), allowing various levels of granularity.
However, categorical objects tend to be viewed as invariants, of which there are few in the social world — death and taxes excepted. To get around this problem, for Sallach the main objects of categorical social theory are ideal types, never appearing in pure form empirically (2012a: 12).
Sallach extends this notion of ideal type by drawing from Karl Popper’s propensity interpretation of probability. The paradigmatic example here is rolling a die: we know the odds of getting a given side are 1 in 6, but frequentist probability must justify this through a complex rigmarole of supposing an infinite sequence of dice rolls that would solidify these probabilities. It’s far more cogent to say these probabilities are built into the structure of the die (6 equal sides, etc.). Thus: “Propensity measures the causal pressure exerted by certain conditions toward the realization of certain events” (2011: 11).
Especially in contexts like quantum theory, we want to talk about the chances of an atom decaying, even though it will decay only once — i.e. single-case statistics. Analogously, we also want to be able to talk about probability in the case of broad social tendencies whose conditions can never be replicated. Hence, Urbach (1980) extends the notion of propensity in a direction that Popper himself would have hated, to encompass the holistic forces dealt with in sociology. His key example is Durkheim’s study of suicide, which found a remarkable persistence in suicide rates irrespective of mortality rates, with equally-persistent discontinuities across state and demographic boundaries.
Because ‘propensity’ is such an underdetermined concept, it applies to vastly different social fields and forces, making it ideal for the level of abstraction in category theory (2012b: 7-8); in fact, a metaphorical inspiration here is quantum spin networks (2011: 11). The next step is to find morphisms g: A ⇒ B, i.e. processes carrying a system from state A to state B (2011: 10). The wealth of categorical abstractions lets us express dynamics subject to local entanglements, as actors’ endogenous behaviour gives rise to emergent structure.
In sum, the appeal of category theory in social science is to bridge continuous and discrete patterns, and allow ‘calibrated generalization’ of social transformations rather than strict equality (2012b: 2). Most of all it offers a form of recursive locality, giving a unified way to analyze bottom-up emergent effects of aggregated agents, plus top-down structural effects on micro-level decisions (2012b: 6).
In 2013, some of the greatest mathematicians in the world published a new foundation for mathematics: homotopy type theory (HoTT). Naturally, this is exciting stuff, and it led Sallach into a detour to see how HoTT might bear upon social science. I feel like not much actually came out of this project, but it’s still worth seeing how he went about it.
Type theory originated in response to a paradox in the foundations of mathematics. Russell’s paradox, in its popular form, runs: if a barber shaves everyone who doesn’t shave himself, who shaves the barber? If he does, he doesn’t; if he doesn’t, he does. The paradox carries over to set theory, at that time the main building block for all mathematics. Russell & Whitehead’s Principia Mathematica used type theory to circumvent it. Imagine a village with a system of castes 1, 2, 3, 4, where 4 is higher than 3, 3 than 2, and 2 than 1. A person may only be shaved by someone of a lower caste (e.g. a 3 by a 2 or a 1). Since there is no self-shaving, there is no paradox (Doxiadis & Papadimitriou, 2009: 174-5).
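The stratification trick can be rendered as a toy rule of my own devising: shaving is only defined when the shaver’s caste is strictly lower than the client’s, so ‘shaves himself’ is ill-typed rather than paradoxical.

```python
# Stratification blocks self-reference: a predicate can only apply
# 'downward', never to something at its own level.
def can_shave(shaver_caste: int, client_caste: int) -> bool:
    return shaver_caste < client_caste

print(can_shave(1, 3))   # True: a 1 may shave a 3
print(can_shave(2, 2))   # False: self-shaving is ruled out by the ordering
```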
In this sense, type theory refutes the claim that “everything is a set” (de Bruijn, 1995: 28). However, until quite recently it was unclear just how the two differ. One property peculiar to types is that “any definable property of objects is invariant” (Awodey, 2014: 7). The solution, it turns out, is to view types as spaces, and examine them with a branch of math that concentrates on spaces: topology.
Topology is the mathematics of squishing shapes into other shapes. A well-known result is that, squish as you might, a sphere can’t be made into a doughnut without tearing it. That is, there exist invariants to the squishability of shapes. If a shape is only a squish or two away from another shape — that is, there exists a homotopy between the two — the shapes are ‘homotopy equivalent’.
To put this more formally, if T is a topological space, then two elements a and b in T are identical if T has a continuous path from a to b (Leslie-Hurd & Haworth, 2013: 101). Given functions f and g mapping T onto another space U, a homotopy morphs one map into another. This means that a homotopy can “cleanly lift the notion of identity from elements a and b to functions f and g” (ibid.).
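Spelled out in symbols (standard topology, not specific to Leslie-Hurd & Haworth), a homotopy between continuous maps f, g : T → U is a continuous interpolation:

```latex
\[
  H : T \times [0,1] \to U,
  \qquad H(a,0) = f(a), \quad H(a,1) = g(a) \quad \text{for all } a \in T,
\]
```

so that identity lifts from elements to functions by sliding one map continuously into the other.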
HoTT closely corresponds to n-category theory, which uses layers of categories to leverage weak equivalences, thereby incorporating greater complexity (2012a: 14). The appeal of HoTT for social science is the high expressiveness afforded by its higher-order logic, which allows (2014: 2):
While the notion of isomorphism lets us ignore all the quantitative details of social transformation, HoTT can talk about this topologically. Identity in HoTT means two objects are of the same homotopy type, so their transformations through deformation will be isomorphic; social interaction can thereby be seen as progression along one such path (2014: 6). It also lets us attribute agency to these transformations by means of type constructors: “a social actor can be an elicitor, a role in which one seeks to alter the motive pattern of another, and thus change the other’s social type” (2014: 8).
One last interesting detail about HoTT is that it can be used as a proof language to formally verify theorems. Thus it lends itself to axiomatic approaches to social science, such as that attempted in Austrian economics. Given social scientists’ well-known inability to agree on even the simplest of terms, the odds seem stacked against such a project. Still, perhaps such a highly abstract language can identify common ground among seemingly incommensurate approaches, such as the following otherworldly axioms (2016a: 8):
Axiom 4a. The more directly that the path of a social actor or cultural trend approaches an idealized conceptual pole, the higher is the n-category required to characterize its structure.
Axiom 5. The more rapidly a social actor or cultural trend approaches an idealized conceptual pole, the more morphisms it manifests.
Overall, Sallach prefers to use HoTT metaphorically, mapping from qualitative social concepts to mathematical ones. Yet, I’d be far more impressed if he approached it from a quantitative angle, showing how one well-known model in the social sciences topologically morphs into another. Still, given its Rosetta stone-like translation into categories, HoTT may prove to be a powerful tool for implementing categorical social science, by means of tools such as cubical type theory that make HoTT into a computable language. As a language for theorizing, however, topos theory sounds far more promising, as we will see next.
To give an idea of their mathematical expressiveness, a topos can be defined in 13 different ways, via different mathematical languages (2015: 41, fn. 2). For our purposes, the simplest is that a topos is a category with two additional conditions: it is ‘cartesian closed’ and it has a subobject classifier (Bhattacharyya, 2012: 16). The first condition is easy to understand if you’ve run into group theory: a cartesian closed category admits a basic algebra whereby objects have products and exponents.
For the second condition, a subobject classifier provides a category with a logic of wholes and parts. Viewed through the lens of social science, a subobject classifier is a structured way to provide graduated, indexed and/or spectral distinctions within a stable set of values, and lets agents assess how incremental differences affect spaces or structures of interest (2015: 41-2).
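For reference, the standard defining property (textbook category theory, not specific to Sallach): a subobject classifier is an object Ω with a distinguished arrow true : 1 → Ω such that every subobject m : S ↪ X arises as a pullback of true along a unique characteristic map:

```latex
\[
  \chi_m : X \to \Omega,
  \qquad
  S \cong \chi_m^{-1}(\mathrm{true}).
\]
```

In the category of sets, Ω = {true, false} and χ_m is the ordinary characteristic function of the subset S ⊆ X — exactly a ‘logic of wholes and parts’, which richer topoi generalize to graduated truth values.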
Unlike the humanities where ‘topos’ is an incredibly pretentious word for a theme or topic, a topos in math should be thought of as an extended notion of place (Zalamea, 2018: 255). Topos theory makes space a more primary concept than points, unlike set-theoretic approaches that must spell out space as an aggregate of points (Plotnitsky, 2012: 355). This opens the way for ‘point-free’ topology, with objects defined only through flux processes (Zalamea, 2018: 255). In short, topoi allow an “algebraic concept of space, which applies to conventional spatial objects but extends beyond them” (Plotnitsky, 2012: 355).
The concept of sheaf becomes important here, where sheaves behave as “global comprehensions…where all local information…become glued together” (Zalamea, 2018: 261). In fact, a topos can be defined as a category of sheaves over an abstract topology (ibid., 254). The clearest definition of sheaf I’ve found is given by Plotnitsky (2012: 362-3):
A sheaf is a particular kind of arrow space, Y ⇒ X, over (projected onto) a given space, X, associating a space A, to each point of X, which is why it is called a ‘sheaf’, a sheaf of spaces over a given space, which can, again, be a single point. By making each topos a whole category of sheaves and thus spaces (plural) over and indeed defining a given space, the concept topos ‘multiplies’ this concept into an immensely rich architecture, again, even if X is a single point.
More concisely, a sheaf is “just two topological spaces related by a projection with a good local behaviour” (Zalamea, 2018: 253). These sheaves can be glued together or restricted to produce processes coupled across levels (Sallach, 2015: 42). Much like relativity theory, sheaves act like reference frames, where each sheaf in a topos has its own local logic (ibid., 44).
This is the sense in which Badiou draws upon topoi to express ‘logics of worlds’. A topos can encode a logic, such as fuzzy logic where truth values occur along the interval [0,1], or paraconsistent logic “where we can have local contradictions without forcing global contradictions, which would destroy the system” (Zalamea, 2018: 254). Curiously, Badiou leans heavily on this logical interpretation of topoi, while adopting a more spatial perspective allows us to talk about a plurality of possible ontologies, rather than only a plurality of logics (Plotnitsky, 2012: 361). Rather than a universe of sets, we can move to a multiverse of topoi, each with its own dialectic of wholes and parts (Bhattacharyya, 2012: 17).
In this light, Sallach (2015: 46-7) raises a compelling example of the kind of new insights that topos theory can bring to social science. Since a topos allows local definition of truth, each social actor can have their own private method of attributing truth, which can differ by power relations, ingroup/outgroup status, or hierarchical scale. Hence, topos theory should make it possible to simulate the aggregate effects of diverse truth-attribution practices among agents. Adding more quantitative machinery, it should even be possible to study statistical distributions of truth-attribution.
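A toy simulation of my own construction — not Sallach’s — gives the flavor: each agent maps the same claim strength to a truth value in [0, 1] through its own ‘local logic’, and we inspect the aggregate.

```python
# Three invented truth-attribution practices over a claim strength in [0, 1].
def ingroup_logic(s):        # inflates claims from its own group
    return min(1.0, s + 0.3)

def skeptic_logic(s):        # discounts everything
    return s * 0.5

def classical_logic(s):      # collapses to {0, 1}
    return 1.0 if s >= 0.5 else 0.0

# A small population with a mix of local logics.
agents = [ingroup_logic] * 5 + [skeptic_logic] * 3 + [classical_logic] * 2

claim = 0.6                              # one claim, many local valuations
valuations = [logic(claim) for logic in agents]
mean = sum(valuations) / len(agents)
print(round(mean, 2))   # 0.74
```

With a larger population and random claim strengths, one could study the statistical distribution of such aggregates, as the passage above suggests.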
All this sounds promising as a way to formalize social theories that are currently only expressed in a qualitative way. The baby examples that Sallach provides aren’t quite enough to sway a skeptic, but I hope to have shown that the overall idea is sound. Topos theory opens up new questions of elasticity versus rigidity of social transformations, or continuity versus separation of societal fields, and offers “a dynamical study of dynamics” (Zalamea, 2018: 254) with the immense power of abstraction that social theory needs.
Sallach has been radio-silent since about 2016 — which could mean he’s working on a treatise that will revolutionize social science, or that he’s retired. I’m always excited to hear about new types of math being applied in social science, and am confident that such a powerful formalism can generate radical new ideas. It just hasn’t yet. While I’m not in a position to give substantive criticism, Sallach’s project clearly suffers a lot from a lack of collaboration with mathematicians, as well as his desire to reconstruct social theory from scratch rather than work with what’s already there.
While topos-theoretic social science still lacks any killer applications, it could just be that the idea is too far ahead of its time. A recent book by Schultz & Spivak (2019) uses topos theory to develop a ‘temporal type theory’ that can analyze and compare systems with both continuous dynamics (e.g. differential equations) and discrete dynamics (e.g. jumps). One highly practical application has been to construct a provably correct air traffic control system, and I hope there is much more to come.
Of course, it’s deeply controversial whether this account of Qwerty is true, but this controversy is itself a core part of qwernomics — whether it is possible, within a path-dependent regime, to identify global criteria (such as ergonomics) that let us objectively determine if we are in a suboptimal state.
What if Qwerty isn’t the exception, but the rule? David (1985: 336) points to various other examples in economic history, and the metaphor ties in nicely with economic accounts of market failure, as well as extending to subject matter such as vestigial structures in evolutionary biology.
Nick Land’s more recent thoughts try to take the idea even further by reading the keyboard through Hjelmslev’s quadripartite semiotics, as a paradigmatic example that can take Deleuze & Guattari’s stratoanalysis from pure theory into a workable discipline. This search for nuance often amounts to a paranoiac cryptanalysis of the keyboard’s geography, culminating in the ‘extreme qwernomic thesis’ that a future alien civilization could reconstruct all human knowledge from Qwerty alone.
My own interest in qwernomics stems from the idea of computationally simulating a sign-system (cf. egregorics). In this view, qwernomics gives a self-contained semiosis we can understand in detail. Below I’ll elaborate on this using a program to translate from Qwerty to Dvorak, revealing overlaps and invariances. Then we can examine this translation process itself through iteration, to locate even further structure. Last, I’ll explore qwernomics more deeply, and suggest further avenues to explore.
A qwernomic puzzle by @cyborg_nomade: suppose we transpose the alphabet onto the Qwerty keyboard in order: ‘q’=‘a’, ‘w’=‘b’, and so on; which keystrokes will yield English words on both the Qwerty and alphabetic keyboard? More interesting: are any words the same on both keyboards?
The latter is actually quite easy to answer — all we have to do is write the letters in a list. Let’s also throw in the Dvorak keyboard to make things interesting.
alpha  = ['a'..'z']
qwerty = ['q','w','e','r','t','y','u','i','o','p',
          'a','s','d','f','g','h','j','k','l',
          'z','x','c','v','b','n','m']
dvorak = ['p','y','f','g','c','r','l',
          'a','o','e','u','i','d','h','t','n','s',
          'q','j','k','x','b','m','w','v','z']
If we put all the keys in a single line, we can see just by eyeballing that in Qwerty and Dvorak, the ‘x’, ‘o’, and ‘d’ keys are in the same position. Thus, words such as ‘do’, ‘ox’, ‘odd’, ‘dodo’ and ‘doxx’ are all fixpoints, using only letters that occupy the same ordinal position in both systems. (Note that these aren’t the same physical keys.)
We can confirm this by writing a Haskell function to do exactly what we would do in eyeballing, only more thoroughly. Reading from right to left, it takes two lists (e.g. xs = alpha and ys = qwerty), then zip puts them in pairs, like ('a','q'). We want to know whether the two elements are equal, which is what the lambda function does (\(x,y) -> x==y). Last, filter gives us only the pairs that satisfy that equality, and map fst gives us the first element of each such pair.
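Spelled out as code, that description gives something like the following (the name samePlace is my own choice; the keyboard lists from above are repeated here as strings):

```haskell
-- the three layouts from above, written as plain strings
alpha, qwerty, dvorak :: String
alpha  = ['a'..'z']
qwerty = "qwertyuiopasdfghjklzxcvbnm"
dvorak = "pyfgcrlaoeuidhtnsqjkxbmwvz"

-- pair up the nth letters, keep the pairs that agree, project out the letter
samePlace :: Eq a => [a] -> [a] -> [a]
samePlace xs ys = map fst (filter (\(x,y) -> x == y) (zip xs ys))
```

For instance, samePlace qwerty dvorak gives "odx", matching what we found by eyeballing.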
From this we find that the alphabetic and Dvorak keyboards only have ‘z’ in the same place. Last, alpha and qwerty have no pairs: no words are the same on both keyboards. We can visualize these correspondences as follows, going clockwise from 90°.
For finding ‘revolving pairs’ between alphabetic and Qwerty, @eccehetero has already done this in Java (code; results). Notably, he points out its similarity to cryptography, such as the Caesar cipher. Since alpha-to-Qwerty translation is taken care of, here I’d like to experiment with Qwerty-to-Dvorak.
To compare whole words, we’ll need a function to convert from one keyboard to another. Note that this requires functions from the Data.List and Data.Maybe libraries.
convert xs ys [] = []
convert xs ys (z:zs) = (toEnum (fromEnum z + shift) :: Char) : convert xs ys zs
  where kNums = zipWith (-) (map fromEnum ys) (map fromEnum xs)
        shift = kNums !! fromJust (elemIndex z xs)
Our function convert takes two lists — the one we’re starting from (xs), and the one we want to convert it to (ys). It also takes a string (z:zs), where z is the head or first letter of that string. The core idea is just to convert letters to numbers (fromEnum), add a ‘shift’ factor, and then convert back to letters (toEnum). We get this shift factor by pairing up letters, converting them to numbers, and subtracting. Then for a given letter z (the nth on the keyboard) we just find the corresponding shift.
To translate key-wise from Qwerty to Dvorak, we can just use map (convert qwerty dvorak) words, where words is a list of strings. One list of 58K words is available here.
First, however, convert isn’t super-efficient, since we’re reconstructing the list of shift factors over and over again for each letter. To go through an entire dictionary, it’s better to define this list separately and take it as an argument to the function. This is actually really easy, but less general:
qwertyToDvorak = zipWith (-) (map fromEnum dvorak) (map fromEnum qwerty)

convert' xs shifts [] = []
convert' xs shifts (z:zs) = (toEnum (fromEnum z + shift) :: Char) : convert' xs shifts zs
  where shift = shifts !! fromJust (elemIndex z xs)
With this done, the crucial function to use is the following, which converts the words and then filters out those that aren’t in the dictionary.
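In sketch form, that crucial function might look like the following (the name realWords is my own invention; convert' and the shift list are repeated from above to keep the snippet self-contained):

```haskell
import Data.List (elemIndex)
import Data.Maybe (fromJust)

qwerty, dvorak :: String
qwerty = "qwertyuiopasdfghjklzxcvbnm"
dvorak = "pyfgcrlaoeuidhtnsqjkxbmwvz"

-- shift factors from Qwerty to Dvorak, as defined above
qwertyToDvorak :: [Int]
qwertyToDvorak = zipWith (-) (map fromEnum dvorak) (map fromEnum qwerty)

-- key-wise conversion using a precomputed shift list, as defined above
convert' :: String -> [Int] -> String -> String
convert' _  _      []     = []
convert' xs shifts (z:zs) = toEnum (fromEnum z + shift) : convert' xs shifts zs
  where shift = shifts !! fromJust (elemIndex z xs)

-- convert every word key-wise, then keep only results that are themselves words
realWords :: [String] -> [String]
realWords dict = filter (`elem` dict) (map (convert' qwerty qwertyToDvorak) dict)
```

For instance, realWords ["bid","wad","cat"] gives ["wad"], since typing ‘bid’ on Qwerty yields ‘wad’ on Dvorak, while the conversions of ‘wad’ and ‘cat’ aren’t words in this toy dictionary. (For a real dictionary, a Data.Set membership test would be far faster than elem.)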
After back-translating the result to Qwerty, we get this list of words, in the format (qwerty,dvorak):
[("bid","wad"),("bids","wadi"),("bob","wow"),("boo","woo"),("bop","woe"),("cad","bud"),("car","bug"),("caw","buy"),("cob","bow"),("coo","boo"),("copy","boer"),("cor","bog"),("coup","bole"),("cow","boy"),("cox","box"),("dad","dud"),("did","dad"),("do","do"),("dodo","dodo"),("dog","dot"),("doh","don"),("dopy","doer"),("dor","dog"),("dot","doc"),("ear","fug"),("fag","hut"),("far","hug"),("fig","hat"),("fir","hag"),("fob","how"),("fog","hot"),("food","hood"),("fop","hoe"),("for","hog"),("gaga","tutu"),("gig","tat"),("go","to"),("goo","too"),("ha","nu"),("hag","nut"),("hic","nab"),("ho","no"),("hob","now"),("hod","nod"),("hog","not"),("if","ah"),("irish","again"),("jap","sue"),("job","sow"),("jog","sot"),("lag","jut"),("log","jot"),("low","joy"),("moo","zoo"),("odd","odd"),("of","oh"),("oh","on"),("ox","ox"),("pig","eat"),("pro","ego"),("rag","gut"),("raw","guy"),("rid","gad"),("rocs","gobi"),("rod","god"),("roo","goo"),("soh","ion"),("tag","cut"),("tap","cue"),("tic","cab"),("tog","cot"),("too","coo"),("tow","coy"),("toy","cor"),("yap","rue"),("yogi","rota")]
This gives 76 words out of 58,110, or 0.13%. Note the prevalence of words featuring ‘o’ and ‘d’, our semiotic fixpoints from before. There’s nothing too eldritch, I’m afraid — the longest is irish/again. If anyone feels brave, there’s also a dictionary of 466K words here, to root out more obscure pairings.
I’ve put my code in a GitHub repository, which includes both 58K-word and 466K-word dictionaries, which can be imported as a module. The code could be improved in a lot of ways — notably, it breaks if it encounters a non-alphabetic character, like an apostrophe. I’d like to come back to it once I’ve learned more about data structures and parallelism, to see if my poor laptop can handle 466K words. The repo also has TikZ code for the diagrams in this post, which may be fun to experiment with. The arcs below are inspired by a much prettier version by Amy Ireland.
Elsewhere, @8051Enthusiast pointed out that changing from Qwerty to Dvorak is a permutation operation that can be repeated to give a Dvorak² — far more powerful than mere Dvorak¹.
All we need for this is to iterate our convert function from before. In convert qwerty dvorak qwerty, the first two arguments create a permutation from the Qwerty keyboard to a Dvorak keyboard, while the third argument (qwerty again) uses the whole Qwerty keyboard as material. Thus, to get Dvorak-squared, we can just do convert qwerty dvorak (convert qwerty dvorak qwerty).
This gives the keys "erhtbgjuofladncvipsqxwzymk", and the following diagram shows clearly that these permutations are the same.
We can write this in a more sophisticated way using the iterate function, which repeats the convert operation as many times as we want.
forbiddenDvorak n
  | n <= 0    = iterate (convert dvorak qwerty) dvorak !! (abs n + 1)
  | otherwise = iterate (convert qwerty dvorak) dvorak !! (n-1)
Thus we can get Dvorak-cubed, or any power n that we want. Notice that I’ve included the case where n is less than zero. This acts as an inverse function, or the reverse permutation (here from Dvorak back to Qwerty).
Perhaps the most curious aspect is that if we keep iterating our function, we’ll eventually get back to where we started. We can check this by creating a list of Dvorakⁿ keyboards for increasingly higher powers, and seeing if any besides the first are the same as Dvorak. (The fmap (+2) is because indexing starts at zero, plus we’re excluding Dvorak¹ = Dvorak.)
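Reconstructing that check from the fmap (+2) remark, it plausibly looks like this (the name dvorakCycle is mine; convert is repeated from above):

```haskell
import Data.List (elemIndex)
import Data.Maybe (fromJust)

qwerty, dvorak :: String
qwerty = "qwertyuiopasdfghjklzxcvbnm"
dvorak = "pyfgcrlaoeuidhtnsqjkxbmwvz"

-- the convert function from earlier in the post
convert :: String -> String -> String -> String
convert _  _  []     = []
convert xs ys (z:zs) = toEnum (fromEnum z + shift) : convert xs ys zs
  where kNums = zipWith (-) (map fromEnum ys) (map fromEnum xs)
        shift = kNums !! fromJust (elemIndex z xs)

-- iterate gives [Dvorak¹, Dvorak², ...]; drop Dvorak¹, look for Dvorak again,
-- and add 2 to offset zero-based indexing plus the dropped element
dvorakCycle :: Maybe Int
dvorakCycle = fmap (+2) $ elemIndex dvorak (drop 1 (iterate (convert qwerty dvorak) dvorak))
```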
This outputs Just 211, and as we can plug in and confirm, forbiddenDvorak 211 gives the Dvorak keyboard. You can likewise confirm that forbiddenDvorak 210 gives Qwerty, which of course follows.
We can make an analogous function permuting alphabetical order into Qwerty:
forbiddenQwerty n
  | n <= 0    = iterate (convert qwerty alpha) qwerty !! (abs n + 1)
  | otherwise = iterate (convert alpha qwerty) qwerty !! (n-1)
This one cycles after only 42 iterations. If the obvious association just came to mind, that’s a taste of the mix between wild coincidence and transcendental conspiracy that defines CCRU humour.
As an exercise, make a forbiddenAlpha function that converts from alphabetic to Dvorak. This one also has a cycle — it’s quite a bit larger than 250, but it’s there, and you can find it on your laptop.
I know very little about abstract algebra, but this opens up numerous interesting questions. I didn’t include punctuation for each keyboard layout, but how would these cycles change if we added it? Why does forbiddenQwerty cycle so much sooner than the others, and is there a metric to measure how ‘close’ a permutation is? Last, I can’t help but wonder if this group-theoretic structure is an artifact of our simple example, or whether it can be found in more complex sign-systems.
For Le (2019: 100-1), qwernomics represents how “the demands of capital accumulation led the technology of the keyboard to develop down a de-anthropomorphising path from which we could not diverge.” That is, assuming Qwerty’s inferiority, companies’ myopic refusal to train their workers in a better typographic system is “exemplary of technocapitalism’s way of locking us into a trajectory that will ultimately dehumanise us.” (Though dehumanization isn’t necessarily a bad thing!) While this may be a fair reading of Land’s original essay, in light of his 2016 lectures it’s far too glib.
Consider Nash equilibrium, where no agent can benefit by unilaterally changing their strategy. It may be, as in the Prisoner’s dilemma, that if all players changed their strategy, all could benefit, but none can on their own. Conversely, qwernomics presents a situation where any one player can unilaterally benefit from changing their strategy (e.g. a firm switching from Qwerty to Dvorak), but the societal transaction costs are so large that — across different games — they nullify this advantage.
This parallels the notion of patchwork, where normally a larger country is more efficient due to economies of scale, but now the transaction costs associated with large political institutions are so immense as to outweigh all these benefits, so on net it’s more efficient just to have a smaller state.
Choosing a keyboard is a non-ergodic process, in that we cannot freely wander through the space of possibilities toward a global optimum in which Qwerty would have come out best from untrammeled market experimentation. An ergodic process tends toward a fixpoint or equilibrium; non-ergodic systems do not, but exhibit path dependency, where states far in the past can influence states far into the future. Land perceptively notes that this division of ergodic versus non-ergodic is similar to molar (tending toward an average) versus molecular (centrifugally tending away from an average).
If Qwerty represents a ‘market solution’, Dvorak represents ‘rational planning’ from top-down. Yet, we can also imagine a centrally-planned society unwilling to escape from Qwerty lock-in, where the benefits of Dvorak are judged as not worth the cost of retraining and retooling. Thus, qwernomics is not limited to this simple opposition of interventionism versus spontaneity. In either system, transaction costs have built up over time, so that Qwerty becomes a self-installing law.
Just as Keynesian economics claims that markets get stuck in a low-employment equilibrium, which we can identify from a superior frame, we want to know by what criteria we could evaluate our path as suboptimal from a rational, Archimedean point. Ergonomics seems like the best candidate for Qwerty, but is that all a keyboard really is? The deeper notion here is that a Schelling point — an arbitrary but handy standard, like driving on the right side of the road — creates the possibility space by which we can retroactively judge it. At a certain point, the standard consumes the horizon.
As another example, Land explicitly compares Qwerty’s hegemony to that of decimal numeracy, and some mathematicians claim octal numbers are superior. Conversely, vestigial tails are evocative, but key to qwernomics is plasticity of agents’ behaviour: they’re free to use a different keyboard if they want to, but don’t — rather than having a suboptimal state biologically baked into them.
Thus, the main flaw in Le’s account is workers’ lack of agency. Moreover, instead of emphasizing suboptimality as ‘dehumanizing’, Land’s main interest is in Qwerty as a causal loop, like the bootstrap paradox in time travel. Qwerty as a code represents a radically alien form of immanence: it is authoritative because it has made itself occur.
Land goes on to use the Qwerty keyboard as a hermeneutic device to understand Deleuze & Guattari’s “Geology of Morals”. There, stratoanalysis is introduced as a synonym for D&G’s project, alongside rhizomatics, nomadology, schizoanalysis, pragmatics, and so on. In general, stratoanalysis is very seldom invoked, but it’s actually quite rich as a framework.
The basic idea is that the world is made up of relatively autonomous codes, such as the genetic code versus body language. The system of oppositions that defines a code takes place on an independent layer or stratum. Strata interact horizontally via parastrata (codes presupposed by another code of the same order, such as the prison system and legal system) — or vertically via epistrata (codes presupposed by another code of a different order, such as the legal system relying on language, biochemistry, and so on). An example of parastrata in Qwerty is using keys in a game for movement.
A keyboard lends itself as a paradigmatic example because it combines so many forms of code, at so many levels. There are linguistic codes (letters in English words), physical codes (limitations of human hands), and mechanical codes (technical specifications). Further, a keyboard embodies various semiotic distinctions, such as code vs. territory or form vs. substance. Qwerty takes the alphabet out of its ‘territory’, as ordered in the ABCs song; likewise, the fingers of the typist are deterritorialized onto the keyboard. In this sense, Qwerty is an abstract diagram of stratification.
While typical semiotics deals with signifiers and signifieds, Deleuze & Guattari draw from Hjelmslev’s quadripartition. In a verbal language, expression-substance is the continuum of sounds producible by the human vocal apparatus; expression-form is its differentiation into phonemes (signifiers); content-substance is the sphere of concepts; and content-form is its differentiation into signifieds [via].
Analogously, if substance is the physical aspect, then a physical keyboard is content-substance, while Qwerty is an expression-substance. Likewise, the expression-form is the signifier being typed, while content-form is the signified. (Don’t quote me on this, I’m sure I mangled it.) The main point is that variations such as Azerty keyboards for Francophones can be localized onto one of these planes.
Therefore, more broadly, qwernomics is an analysis of stratification or capture, as Qwerty directs codes’ flows of intensities. Audaciously, we can say that qwerty is a cultural genome, with knowledge geologically deposited into it. By way of a quasi-diagrammatic analysis of the keyboard through stratograms, we can hack the cryptographic protocol that the qwerty apocalypse has delivered to us.
If all this sounds increasingly weird, then good. It gets weirder.
Above I stressed qwernomics’ skepticism towards global criteria for optimality, and we can radicalize this still further. One form has shown up in empirical studies: that even prior to Qwerty there was already a pool of typing competence that settled the arrangement of Qwerty. Hence, Qwerty has no localizable origin, and the entire geopolitical structure of world history is Qwerty-shaped.
Thus the ‘extreme qwernomic thesis’ can perhaps be most cogently stated as follows:
“the highly obscure historical stratodynamic process from which we have inherited the keyboard is the same process that has provided us with all the cognitive resources that we could conceivably gain access [to] in investigating our object” [2016, VII, around 41:00]
We can restate this in various other ways. (I’m mostly just plagiarizing Land here.)
The core idea is that of an immanent cognitive horizon: if qwerty is the machine talking about itself, there is no meta-discourse on the machine that we’re ever going to find. There is no position of purchase or overview or superior perspective, because the function of overviewing is itself a product of a stratic mechanism, a symptom of stratic embeddedness. Hence there is nothing possible that isn’t already within the processing system. The overview position itself is structured inside those strata.
Qwerty is the revelation of a transcendental cognitive engine, in that the cognitive machinery we are trying to put into play has to come out of our object. Faced with the immanence of all criteria, there is no way out, but only through — mining parts of the keyboard for parts of our theory, in the form of “delirialized, wild particles of conception” such as the Ctrl key as zone of transference between dimensional systems, or Qabbalistic resonances of the Esc key. The purpose of these schizoid parts of qwernomics is to explore the extent to which qwerty is a transcendental cognitive resource.
Most of these statements sound absurd when applied to the keyboard, but the same considerations come into play for concrete problems where we don’t have nearly as much of a conceptual foothold.
The ultimate stakes of qwernomics are to delineate the meaning of a ‘critique’ of capitalism, in the Kantian sense of a global tribunal of reason. Many people will be sympathetic to the notion that there is no such universal position; yet, even by calling something a local optimum at all, we’re already implying the existence of some position of transcendent criticism. At a bare minimum, qwernomics makes us aware of the ‘spectre of the universal’ by which we criticize standards.
While I still haven’t digested all this, it seems that simulation offers an interesting counterpoint. Instead of looking at our path from some external position and saying it was wrong, simulations immanently generate a multiplicity of paths to which we can compare our own. We don’t need an Archimedean point, but only to differentiate the various strata as manifested in parameters that can undergo variations. Instead of a global perspective, we generate a garden of forking paths to populate the search space with purely local theories. This seems entirely compatible with the points made above. In a word, we can radicalize qwernomics still further into computational stratoanalysis.
Enter Moments. Moments are like eavesdropping on hundreds of conversations at precisely the most interesting point. As opposed to obtrusive tweetstorms based on searching a keyword (as I did in my early days, riffing off @youtopos), Moments provide a handy cubbyhole for tweets you want to return to later, whether 10 years old or 10 minutes. It’s the closest you can get to meta-tweeting.
I’m obviously what programmers call an ‘edge case’ here — someone who uses a software feature in a way the programmer never anticipated. Yet, an interesting mathematical fact is that in an n-dimensional hypercube, most of the volume occurs at the edges, unlike with 2 or 3 dimensions (h/t). This is precisely the ‘pataphysical space that programmers occupy, where the exception is the rule.
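To sanity-check that hypercube fact, here’s a one-liner computing the fraction of a unit n-cube that lies within 0.1 of its boundary (the 0.1 cutoff is my arbitrary choice):

```haskell
-- fraction of the unit n-cube within 0.1 of the boundary: 1 - (1 - 2*0.1)^n
edgeFraction :: Int -> Double
edgeFraction n = 1 - 0.8 ^^ n
```

In 2 or 3 dimensions this fraction is a minority (0.36 and roughly 0.49), but by n = 20 it’s already over 98% — nearly all the volume hugs the edges.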
I’ve made 561 Moments, mostly on philosophy or math. My favorite aspect is how they encompass a mosaic of often contradictory perspectives on the same concept — heteroglossia. Scouring twitter is also a surprisingly effective way to get the core of an idea, shorn of all unneeded verbosity. Last, it’s a backstage pass to the lore of a (sub-sub-)discipline, like the deep grad school conversations that I never got to have, or the mysterious origins of zygohistomorphic prepromorphisms.
After Twitter’s last major upgrade, I used an app to keep the old layout, which had a simple button for adding tweets to a Moment. On June 1, Twitter removed support for this. In short, it’s now an order of magnitude more difficult to make them, so I have little choice but to move on. Still, I deserved any flak I got for curating other people’s ideas instead of my own, and now plan to blog more instead.
Some of my favorites came from conferences, compiling reams of hyper-erudite thought-crystals. (Though I’m told these authors optimized precisely for pithy tweets — rather than, say, substance.)
For the rest, even the titles alone give an evocative bestiary of avant-garde ideas. I certainly can’t claim to have my own ‘take’ on each of these, but I’ve at least tried to understand them from many angles, like a Cubist painting. It’s fun and informative just to go through the list and see how many you can rattle through, and for someone hoping to learn, these provide digestible sound-bytes spanning the spectrum from gadfly-bites to perspective-changing insights.
However, lots of people have encountered hitches installing EasyPlot (at least on Windows). Worse, most online resources only say how to install it on Linux, which is super-easy. Here, I just want to spell everything out, since even though it’s not super-fancy, you can still do some fun things with EasyPlot.
EasyPlot is based on gnuplot, a fairly retro program dating back to the ’80s. Interestingly, the open-source econometrics package Gretl uses gnuplot. You can install it by downloading a nice .exe file, which makes everything easy. The most recent version (currently 5.2.8) worked fine for me.
With this installed, the first hitch is that you need to update your %PATH% variable, so that your computer can find the file. In Windows 10, you can just put “environment variables” in the searchbar. The longer way is: Control Panel > System and Security > System > Advanced System Settings.
Click on the button that says ‘Environment Variables’. Under ‘System Variables’ there should be one called Path. Highlight it, click ‘Edit’, click ‘New’, and add the folder that gnuplot.exe is in. For me, it’s C:\Program Files\gnuplot\bin. Then click OK, OK, OK, and we’re done with that part.
Then we need to install it for Haskell. For GHC 8.10.2, putting cabal new-install easyplot in the command line worked for me. For 8.6.5, I needed to use the older cabal install easyplot.
There’s a very similar package gnuplot, which you can install too if you want. The hitch with this is that it keeps throwing an error message asking for pgnuplot, which is deprecated. To fix this, you just need to make a copy of gnuplot.exe and name it pgnuplot.exe. The syntax for gnuplot is very similar, but far more customizable. Let’s leave that aside for now.
If EasyPlot is installed correctly, you should be able to import Graphics.EasyPlot from GHCi or WinGHCi. If it’s still not working, try using stack install easyplot or cabal new-install easyplot.
The second hitch is actually using it. The documentation kindly offers some example plots, but they don’t work in Windows without some editing. Let’s take one example:
plot X11 $ Function2D [Title "Sine and Cosine"] [] (\x -> sin x * cos x)
First, X11 specifies that it’s meant for Linux, so for Windows we need to change this to Windows, or for Mac change it to Aqua. However, if we put this into the terminal, our chart pops up for a split second, then erases, leaving behind True to show that it ‘worked’.
The solution, thanks to the mailing list, is to enclose our plot in a monad, namely a do statement. This makes sense if you’re vaguely familiar with monads, since making a plot is an IO action.
Next, instead of plot we have to use plot' [Interactive]. Here’s a version that will actually work:
do plot' [Interactive] Windows $ Function2D [Title "Sine * Cosine"] [] (\x -> sin x * cos x)
Here, [Interactive] specifies that we want a window we can adjust. This is quite nice, actually: we just resize the window and the diagram stretches and shrinks along with it.
Alternatively, if we want to print out an image we use the following syntax:
do plot (PNG "test.png") $ Gnuplot2D [Color Blue] [] "2**cos(x)"
Note that we’re using plot with no apostrophe, and that (PNG "test.png") replaces [Interactive] Windows. It can also make PDFs (like so: (PDF "test.pdf")), which are vector images, so they won’t get blurry if you zoom in on them. Still, since EasyPlot doesn’t let you set the axes beforehand (and the printed PNG often looks coarser), most of the time I’d rather adjust the window by hand and take a screenshot.
One last hitch, and this is a weird one. Suppose we save one of our plots to a variable, like so:
test = do plot' [Interactive] Windows $ Gnuplot2D [Color Blue] [] "2**cos(x)"
Then when we put test into the terminal, it makes the plot we want, but the next line looks like this:
gnuplot >
This happens because Haskell is calling the program gnuplot, and hasn’t left that program yet. So you need to manually type quit in the terminal. Once you know you have to do this, it’s easy enough, but it also means you can’t make automated plots, which is a drag. Furthermore, if you’re doing this in WinGHCi it doesn’t even show the gnuplot > line, so it’s even more confusing.
All that was a huge pain to figure out, but from there everything else is easy. Wahey! Now let’s go on to see some examples of stuff we can make.
Here are some EasyPlot examples I found on various sites in French, Spanish, and Russian(!).
First, here’s one I made of a logarithmic spiral (where you can save the code as spiralplot.hs):
module SpiralPlot where

import Graphics.EasyPlot

spiral = [(x,y) | t <- [0,0.01..4],
                  let x = a * exp t * cos (b*t),
                  let y = a * exp t * sin (b*t)]
  where a = 0.1 ; b = 4

main = do plot' [Interactive] Windows $ Data2D [Title "Logarithmic Spiral", Style Lines] [] spiral
For the parameters we can also pattern-match like where (a,b) = (0.1,4), but as such it’s fine. Another example here (code) plots radioactive decay; just don’t forget to change the plot X11 part.
The main drawback is that functions need to be expressed explicitly in the form z = f(x,y) or y = f(x) (i.e. the dependent variable can’t be part of the equation), while apparently other math software lets you plot implicit functions like \(\sin(y^2 * x^3) = \cos(y^3 * x^2)\). A cool project might be to write a program that makes an implicit function into an explicit one and feeds it into EasyPlot.
Next, here’s a 3D plot from the documentation:
do plot' [Interactive] Windows $ Gnuplot3D [Color Magenta] [] "x ** 2 + y ** 3"
This is where EasyPlot shines relative to other software; in TikZ this would probably take hours to do. Judging from a quick search, it’s possible with gnuplot to color in the mesh, making it even snazzier.
One user on a French forum wanted to know if we can diagram Taylor polynomials without formally using calculus in Haskell. That is, we want to approximate an equation using its Taylor expansion:
\[T_{n,0}(x) = \frac{x^0}{0!}\cdot f^{(0)}(0) + \frac{x^1}{1!}\cdot f^{(1)}(0) + \frac{x^2}{2!}\cdot f^{(2)}(0) + \cdots\]
To do this, we’ll create two infinite lists: one for \(\left[ \frac{x^0}{0!}, \frac{x^1}{1!}, \frac{x^2}{2!}, \cdots \right]\) and one for \(\left[ f^{(0)}(0), f^{(1)}(0), \cdots \right]\).
The idea is to graph both the true function and its approximation, showing how the latter gets better as our Taylor polynomial increases in degree. This is how it’s supposed to look:
However, I can’t get it working. My best guess is that it’s because of either polyTaylor (snd tf) n or fst tf, or both. I played around with it, but no luck. Below I’ll put a translated version of the code, and if anyone is able to fix it, do let me know. I’ll make a StackExchange question about this problem later.
module TaylorPolynomials where

import Graphics.EasyPlot

-- Returns the vector [1, x, x^2/2!, x^3/3!, ..]
vecTaylor :: Double -> [Double]
vecTaylor x = scanl (\acc k -> acc * x / k) 1.0 [1 ..]

-- Basic functions
-- fonction :: Double -> Double
invPlus  = \x -> 1 / (1 + x)
invMinus = \x -> 1 / (1 - x)
logPlus  = \x -> log (1 + x)
rootPlus = \x -> sqrt (1 + x)

{- We return the tuple (function, list of its successive derivatives at 0)
   We usually look for a recurrence formula and use scanl
   Otherwise, we repeat patterns
   tTruc :: (Double -> Double, [Double]) -}
tExp      = (exp, repeat 1.0)
tInvMinus = (invMinus, scanl (\acc k -> acc * k) 1.0 [1 ..])
tInvPlus  = (invPlus, scanl (\acc k -> acc * (-k)) 1.0 [1 ..])
tRoot     = (rootPlus, scanl (\acc k -> acc * (-0.5) * (2*k - 1)) 1.0 [0 ..])
tLogPlus  = (logPlus, 0.0 : snd tInvPlus)
tSin      = (sin, cycle [0.0, 1.0, 0.0, -1.0])
tCos      = (cos, cycle [1.0, 0.0, -1.0, 0.0])

{- Returns the evaluation at x of the Taylor polynomial of degree n
   of the function whose derivatives at 0 are given -}
polyTaylor :: [Double] -> Int -> Double -> Double
polyTaylor listDerivatives n x =
  sum $ take (n + 1) $ zipWith (*) (vecTaylor x) listDerivatives

-- We draw on the same graph the function and its Taylor polynomial on [a,b] with a step of 1/2^8
compareDL :: Double -> Double -> ((Double -> Double), [Double]) -> Int -> IO Bool
compareDL a b tf n =
  do plot' [Interactive] Windows $
       [ Function2D [Title ("n = " ++ show n), Color Blue] [Range a b, Step (2**(-8))]
           (polyTaylor (snd tf) n)
       , Function2D [Title "True function", Color Red] [Range a b, Step (2**(-8))] (fst tf) ]

-- this is the part that I can't get to work
main = compareDL (-0.8) 1.2 tLogPlus 4
As another calculus example, one Finnish blog has an example of Euler’s method for approximating differential equations, but the results are underwhelming. (Spoiler: it’s just a diagonal line.)
My favorite example is linear regression by José A. Alonso, whose twitter you should totally follow. This one is in Gnuplot; someone in the comments did an EasyPlot version, but I can’t get it to work.
import Data.List (genericLength)
import Graphics.Gnuplot.Simple
varX, varY :: [Double]
varX = [5, 7, 10, 12, 16, 20, 23, 27, 19, 14]
varY = [9, 11, 15, 16, 20, 24, 27, 29, 22, 20]
linearRegression :: [Double] -> [Double] -> (Double,Double)
linearRegression xs ys = (a,b)
where n = genericLength xs
sumX = sum xs
sumY = sum ys
sumX2 = sum (zipWith (*) xs xs)
sumY2 = sum (zipWith (*) ys ys)
sumXY = sum (zipWith (*) xs ys)
b = (n*sumXY - sumX*sumY) / (n*sumX2 - sumX^2)
a = (sumY - b*sumX) / n
mygraph :: [Double] -> [Double] -> IO ()
mygraph xs ys =
plotPathsStyle
[YRange (0,10+mY)]
[(defaultStyle {plotType = Points,
lineSpec = CustomStyle [LineTitle "Data points",
PointType 2,
PointSize 2.5]},
zip xs ys),
(defaultStyle {plotType = Lines,
lineSpec = CustomStyle [LineTitle "Regression line",
LineWidth 2]},
[(x,a+b*x) | x <- [0..mX]])]
where (a,b) = linearRegression xs ys
mX = maximum xs
mY = maximum ys
main = mygraph varX varY
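If you, too, can't get the plotting to run, the coefficients at least are easy to cross-check. A quick sketch in Python of the same closed-form formulas, on the same data (the residuals of a least-squares fit should sum to essentially zero):

```python
# Cross-check of the closed-form least-squares coefficients above.
xs = [5, 7, 10, 12, 16, 20, 23, 27, 19, 14]
ys = [9, 11, 15, 16, 20, 24, 27, 29, 22, 20]

def linear_regression(xs, ys):
    """Intercept a and slope b for y ≈ a + b*x, via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

a, b = linear_regression(xs, ys)
# By construction, the fitted line's residuals sum to (floating-point) zero.
print(abs(sum(y - (a + b * x) for x, y in zip(xs, ys))) < 1e-9)  # True
```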
I’d be willing to switch to Haskell for time-series if it had a nice way of handling dates on the x-axis. Even in R, this is a huge pain in the ass and looks terrible. As of now, it seems not — the typical method (via here) seems to be having a list of values on the y-axis, and then using zip [1..] values to make a list of tuples. I’d much rather think in days and months than in “point #3000”.
There’s a really beautiful and elaborate example on Rosetta Code: a Pythagoras tree. However, not only does it fail to work, but every time I try to run it, it makes 1,023 .dat files in my current directory! If anyone else can get it working, I’ll be super-excited, but reader beware.
Since Gnuplot has been around for a while, there are lots of groovy examples that you can try reproducing. As you learn more possible features, you’ll likely ‘graduate’ to the gnuplot library. Moreover, it seems that you can take the .dat file generated by Haskell, and plug it right into gnuplot to configure it on the command line.
A nice exercise to start off with is this brief tutorial on studying functions. The documentation is quite friendly as well, and outlines various customizations such as adjusting the colors, range, or style. Last, ch. 4 of Church’s Learning Haskell Data Analysis uses EasyPlot for financial time-series.
In and of itself, EasyPlot isn’t too special, but once the installation is done it’s quite easy to integrate with anything else you’re doing in Haskell. I expect it can be very helpful in studying empirical data to look for outliers, or as an accessible learning aid for calculus (especially multivariable calculus).
Its main flaw is its limited features, but this is solved by the fuller gnuplot library, which I haven’t had a chance to try out yet in detail. I also came across Hatlab, likewise based on gnuplot but with very different syntax. Last, I’ve heard good things about Chart, and hope to experiment with it in a future post.
Overall, I’d take this over a fugly Excel chart any day.
A pithy definition I’ve found is that monads are a “type for output impurity”. Even from this we can see how much the concept is tied to the functional programming paradigm, and makes little sense outside of it. Within that paradigm, however, monads are a deeply elegant solution to a variety of problems, and continually give new ‘aha’ moments as one progresses deeper and deeper.
This post documents one such moment — a tiny one, but one that tripped me up for quite a while. To follow along, the reader should have read and understood the Haskell wikibook page on monads.
Associativity means it doesn’t matter how we group the terms of an expression, typically written as:
(a ∘ b) ∘ c = a ∘ (b ∘ c)
However, in Haskell the associativity law is written as follows:
(𝑚 >>= f) >>= g = 𝑚 >>= (\x -> f x >>= g)
Yeesh. It’s not at all obvious that these are the same. For this reason, many people prefer the Kleisli composition operator (or ‘fish’), which lets us write the above as:
(f >=> g) >=> h = f >=> (g >=> h)
This gives us the structure we want, but it just hides the whole problem within the definition of >=>, which is just as hairy, namely f >=> g = \x -> f x >>= g. Gag me with a spoon.
Here I want to explain my Eureka moment in finally getting the associativity law for bind.
First, note that the type signature for bind (>>=) is as follows:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
So when used as an infix, the left-hand argument must be a monadic object, while the right-hand argument must be a function that takes a non-monadic object, and makes it into a monadic object. (Note that 𝑚 is not the same as m.)
In the left-association (𝑚 >>= f) >>= g, the bracketed part gives the two arguments we need for bind: 𝑚 :: m a, and f :: (a -> m b). Evaluating these gives a new term 𝑥 :: m b. This is a monadic object, which is exactly what we expect on the left-hand side of the next bind. Therefore the next step is to apply g to 𝑥, giving a new monadic object, and then we’re done. Easy.
Right-association, on the other hand, is a pain in the ass. Again:
𝑚 >>= (\x -> f x >>= g)
To understand why we have to write it this monstrous way, let’s see how it would look if we didn’t:
a >>= (b >>= c)
Here, b is a monadic object, and c is a function. Applying bind to these gives a new monadic object 𝑦. Then the next step is to evaluate a >>= 𝑦.
Here we have a problem, because 𝑦 needs to be both a monadic object (as the output of b >>= c) and a function (as the input to a >>= 𝑦), which doesn’t really make sense.
To understand right-association (𝑚 >>= (\x -> f x >>= g)), let’s start from the first bind. We know the left-hand side 𝑚 is a monadic object. We also know that the right-hand side needs to be a function, which is satisfied due to the lambda-term (i.e. \x ->).
In evaluating 𝑚 :: m a, the first bind ‘unwraps’ the 𝑎 from its monadic wrapper, and feeds it to the function on the right-hand side. So this 𝑎 is an ordinary non-monadic object, and we’re using it as an input for our function, which has type (a -> m b).
But this 𝑎 is precisely the argument x that is called for in the lambda-function \x. Once it gets that argument, we can just ignore the function part (\x ->), and only need to deal with f x >>= g.
Well, f has type (a -> m b), so it takes this argument 𝑎 (or x) to produce a monadic object m b, which is what we expect on the left-hand side. Then we unwrap the b and pass it to the function g, which is likewise typed (a -> m b) in general, or in this case (b -> m c). Thus, the end result of this double bind is a monadic object m c. Hooray!
To recap: with left-associativity (𝑚 >>= f) >>= g, things are easy because we can simply evaluate the expression in brackets, and then feed the result as an argument to the remaining expression.
With right-associativity 𝑚 >>= (\x -> f x >>= g), we can’t quite do that, because of this pesky \x. Instead, the 𝑎 term from 𝑚 :: m a is used as the x argument, and then we can evaluate the bracketed part independently.
I’ve reread the monad laws many times, and this eluded me up until now. This is likely one of those things that will seem ‘obvious’ going forward, but even as I wrote this post I almost lost the main insight, due to a minor confusion about 𝑚. (My mistake was: I briefly thought the first bind passed all of 𝑚 to \x, not just the 𝑎 term.)
Given how garrulous this post was, it’s no wonder most treatments of the monad laws just gloss over associativity. The step I was missing was so profoundly simple, but easily lost among the mess of symbols. Therefore I hope this post helps not only ‘advanced beginners’ like myself, but also reminds experienced coders how to think of monads from the ground up.
There may be a day when formal proof is a gold standard, separating those who tell people what they want to hear from those who tell it like it is. That’s a ways away. Yet, just as we smirk at poor Milton Friedman taking hours to run a regression using punch cards, so shall future generations condescend to us for assuming that if an idea makes sense in our head, this is evidence for its correctness.
Here, I would like to examine a very specific project led by Casey B. Mulligan at the University of Chicago, on automated reasoning in economics using quantifier elimination — a tool from logic for systematically proving statements about polynomial inequalities of real variables, which can prove theorems, generate counterexamples, and automate counterfactual statements on economic policy.
Suppose we want to find the set of coefficients for parabolas with real roots — values of b and c that solve the equation for a u-shaped curve, where x is a real number: {(b,c) ∈ ℝ² | ∃x(x² + bx + c = 0)}.
This is easy if we remember the quadratic formula, namely the discriminant part √(b² − 4ac). If 4ac is bigger than b², then we’re taking the square root of a negative number, so the value of x is not real anymore. Therefore the answer to our question is {(b,c) ∈ ℝ² | b² ≥ 4ac}, where a = 1 by assumption.
Notice that our entire deduction takes the form of eliminating the quantifier ∃𝑥. If ∃x is a question, then x without ∃ is the answer. If we want to check if a parabola is in the set, with the quantified definition we would have to check all possible values of x, but with the quantifier-free definition we just need to see if b and c satisfy the inequality (2016: 2).
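To make the payoff concrete, here's a small sketch in Python (not Mulligan's code). An upward parabola x² + bx + c has a real root exactly when its vertex value c − b²/4 dips to or below zero, and that geometric test coincides with the quantifier-free condition b² ≥ 4c — no search over x required:

```python
# Quantified vs. quantifier-free: does x^2 + bx + c have a real root?
def has_real_root(b, c):
    # Minimum of x^2 + bx + c is at x = -b/2, with value c - b^2/4;
    # an upward parabola crosses zero iff its minimum is <= 0.
    return c - b * b / 4 <= 0

def quantifier_free(b, c):
    # The equivalent condition produced by eliminating the quantifier.
    return b * b >= 4 * c

# The two tests agree on a grid of (b, c) values.
samples = [(b / 2, c / 2) for b in range(-8, 9) for c in range(-8, 9)]
print(all(has_real_root(b, c) == quantifier_free(b, c) for b, c in samples))  # True
```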
We can also think of quantifier elimination by analogy with Boolean satisfiability (SAT), which takes a logic formula made up of variables that can be True or False, and finds values for the variables so the formula evaluates to True. This was the first problem proven NP-complete: there is no known algorithm that solves every instance efficiently, though modern solvers handle many practical instances well.
Satisfiability modulo theories (SMT) is a step up that allows inequalities (e.g. a ≥ b), which lets us tackle far richer questions about numbers. (This is similar to constraint solving.) Quantifier elimination is a form of SMT reasoning that involves polynomials like ax² + bx + c, or ones with even higher powers.
So the main ‘unit’ in quantifier elimination is the Tarski formula: a Boolean combination of polynomial equalities and inequalities. We input a Tarski formula, and quantifier elimination gives us True, False, or (if there are any unquantified variables, as above) an equivalent quantifier-free formula.
So much for what quantifier elimination is; now let’s look at how it works.
Broadly, we want to show that for all values of a set of variables, our assumptions under those variables imply our hypothesis: ∀v, A(v) → H(v). Possible results can be tabulated as follows.
              | ¬∃v(A ∧ ¬H)   | ∃v(A ∧ ¬H)
∃v(A ∧ H)     | True          | Mixed
¬∃v(A ∧ H)    | Contradiction | False
If all values support H and none support ¬H, then H is True; if the other way around, H is False. More likely, some values will support the hypothesis, while some will support its negation (Mixed). Last, if our assumptions are contradictory, any implication is vacuously true (Contradiction).
If we want to prove a hypothesis is True, we just show that (A ∧ ¬H) is false for all v. Once we have a True result, we can try to weaken it by removing assumptions, and finding any that are irrelevant. Likewise, for a Mixed result we can add assumptions until it becomes True or False.
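The four-way classification is easy to mimic in miniature. Here's a sketch in Python that brute-forces the two existence questions over a finite grid of values — a stand-in for genuine quantifier elimination, which reasons over all of ℝ rather than a sample:

```python
# Classify a hypothesis H under assumptions A by checking, over a grid of
# candidate values v, whether A ∧ H and A ∧ ¬H are each satisfiable.
def classify(assumption, hypothesis, values):
    a_and_h = any(assumption(v) and hypothesis(v) for v in values)
    a_and_not_h = any(assumption(v) and not hypothesis(v) for v in values)
    if not a_and_h and not a_and_not_h:
        return "Contradiction"  # assumptions never hold: vacuously true
    if a_and_h and a_and_not_h:
        return "Mixed"
    return "True" if a_and_h else "False"

vs = [x / 10 for x in range(-50, 51)]  # grid from -5.0 to 5.0
# Under the assumption x > 0, is x + 1 > 0? Always.
print(classify(lambda x: x > 0, lambda x: x + 1 > 0, vs))  # True
# Under the assumption x > 0, is x > 1? Only sometimes.
print(classify(lambda x: x > 0, lambda x: x > 1, vs))      # Mixed
```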
There is also a clever way to generate examples or counterexamples (Mulligan, 2018: 5, fn. 9):
Existentially quantify N−1 of the variables in the Tarski formula leaving free, say, x₁, and then eliminate quantifiers. The result is a formula in x₁ alone. Choose a real number for x₁ that satisfies the formula and substitute that value into the original N-variable Tarski formula, making it an (N−1)-variable Tarski formula. Repeat the process for x₂, etc., until real numbers are assigned to all N variables.
Now we can see how quantifier elimination lets us investigate theories, not just play with formulas.
The main algorithm behind quantifier elimination is cylindrical algebraic decomposition (CAD). The steps of a CAD are themselves a proof (2016: 35), and the fewer steps it takes, the shorter and more intelligible the proof. CAD actually has a nice geometric interpretation — in a sentence: “Removing existential quantifiers from the formula defining a set in ℝⁿ is the algebraic equivalent of projecting that set into the space of free variables”, or on the origin if there are no free variables (2018: 6).
For the gory details, the most cogent explanation I’ve found is from Caviness & Johnson (1998: 2):
The CAD method for QE can be briefly described as a method that extracts the polynomials occurring in the input formula (having first trivially reduced all atomic formulas to the forms A = 0 and A > 0) and then constructs a decomposition of real r-dimensional space, where r is the number of variables in the formula, into a finite number of connected regions, called cells, in which each polynomial is invariant in sign. Moreover these cells are arranged in a certain cylindrical manner. From this cylindrical decomposition it is then quite straightforward to apply the quantifiers by using a sample point from each cell to determine the (invariant) truth value of the input formula in that cell. This application of quantifiers reveals which cells in the subspace of the free variables are true. It then remains to construct an equivalent quantifier-free formula from this knowledge. In Collins’ original QE method this problem was solved by a method called augmented projection that provided a quantifier-free formula for each of the true cells.
Don’t worry if you didn’t get all that. The main takeaway is the rather beautiful idea that SMT solvers in computer science and projection in algebraic geometry are just different perspectives of the same automated reasoning problem (Mulligan, 2018: 8).
The main reason quantifier elimination is seldom used is that its complexity is double-exponential, \(\mathcal{O}[d^{2^{(2n+8)}} \!* m^{2^{(n+6)}}]\), where n is the number of variables, m the number of polynomials, and d the highest power (degree) appearing in them. The exponent itself has an exponent, so processing time increases really fast as these grow.
Yet, computational complexity measures worst-case behaviour, which can be much better in practice — especially when most economic problems have low degree (x³ at worst). The order of variables also matters for CAD’s time and proof length, and since the number of possible orderings is n!, we can only check a few. Interestingly, recent research uses machine learning to find an order that will work well.
The CAD algorithm can be improved by ignoring irrelevant cases (cells), constructing a ‘partial CAD’. Other tricks take advantage of repeating substructures. Virtual term substitution sounds especially promising for economic problems, whose degree tends to be low, and which are ‘sparse’, i.e. “most variables are absent from most of the polynomials in the Tarski formula” (Mulligan, 2018: 22).
We saw above that deductive reasoning can be thought of as a process of quantifier elimination; likewise, if–then statements are implicitly just eliminating ‘for all’ quantifiers from a True sentence ∀x[P(x) → Q(x)] (Mulligan, 2018: 1 & 28). Quantifier elimination goes well with economic theory because much of economics is counterfactuals about polynomial equations and inequalities.
Mulligan and his coauthors have assembled over 45 problems that can be solved in this way, ranging from the Laffer curve to arguments about the gender gap for wages (nonparametric Roy model). While above I focused on the implementation, their goal is to create a domain-specific language so economists can simply plug in theorems and test them. They use Mathematica because it has a nicer visual display for calculus, but also have code for other SMT solvers such as Z3, Redlog, and Maple.
Mulligan makes a provocative claim that “Published theoretical results should be coded and made available[, just] as empirical economists are already expected to do with data-processing code” (here, around 44:20). He also frequently compares theorem-provers to matrix inversion — many famous economists such as Paul Samuelson got RA jobs that were simply inverting matrices, which (happily!) is now entirely done by computers, and no-one would even think to verify them by hand.
In fact, Mulligan is writing a grad-level textbook that leans heavily on quantifier elimination. I think this is amazing, and may be the start of a really important change in how economics is done.
Curiously, revealed preference arguments — very tedious to do on paper — are far more amenable to quantifier elimination than the more standard pedagogical method of inspecting first-order conditions (‘local analysis’). Likewise, more ‘global’ forms of analysis are often easier to deal with, while specific functional forms are intractable. Statements about Cobb-Douglas production functions (of the form \(A N^{\alpha} K^{1-\alpha}\)) can be encoded as polynomial inequalities, but it’s often easier to treat them using abstract functional forms like f(n), since the CAD algorithm doesn’t work if α is a variable, and fractional exponents can take immense amounts of time (5 times longer with \(n^\frac{5}{8}\), 3000 times longer with \(n^\frac{23}{30}\)).
From the other direction, solving these problems led to some new tricks for encoding integrals, and vectors with an indeterminate number of elements (e.g. number of goods in an economy) via statements about their dot product. The pollination may well go both ways.
A paper as recent as 2014 claimed that five variables were a practical limit for quantifier elimination methods. By contrast, Mulligan’s problems have on average 19.2 polynomials and 17.2 variables (Mulligan et al., 2018: 10). That said, at least one problem with 18 variables couldn’t be solved even in 5 days of processing (2016: 29-30), illustrating just how bad double-exponential complexity can be.
Most compelling for me is the other avenues that quantifier elimination in economics could open up. By adding slack variables to our Tarski formula, quantifier elimination lends itself to polynomial optimization: maximizing a polynomial subject to polynomial inequality constraints (Caviness & Johnson, 1998: 1). Further, it can handle complex scheduling problems, where some machines can process several tasks in parallel, or some jobs require more than one machine in parallel. Even more curiously, it allows hierarchical scheduling, where in two steps “a second objective function is optimized under the assumption of an optimal solution wrt. a first objective function” (Dolzmann, Sturm, & Weispfenning, 1999: 237-8). Quantifier elimination may well turn out to be a gateway drug.
Mulligan’s favorite example is from Paul Krugman in the New York Times, to the effect that whenever taxes on labor supply are primarily responsible for a recession, then wages increase. With much schadenfreude, Mulligan shows by running this through a theorem prover that this is only so when “labor supply is at least as sensitive to wages as labor demand” (Mulligan et al., 2018: 3-4).
Cursory familiarity with economics culture drives it home that this will only catch on if it has an idiot-proof point-and-click interface. It’s hard enough to get economists to use LaTeX or anything more advanced than Stata, and even this is orders of magnitude better than elsewhere. Ideally, it should be just as easy for a theorist to verify a theorem as it is for an applied economist to run a regression.
A formal proof means the difference between reading a paper by an independent scholar and having faith in the results, versus relying on the brand names of academic old boys’ clubs. It means clarity: thinking from the bottom up. And my favorite, mechanized proof is an intellectual foundation for cognition-curdling fuckery that would otherwise be dismissed purely out of boorishness.
On @deepchimera’s recommendation, I came to discover the data visualizations of Giorgia Lupi. In her book Dear Data, she and a coauthor send postcards with elaborate graphs about some feature of their past week. My favorite part was following along and tracing out in my head how I might replicate each graph in code. I’ve gotten some flak for making diagrams in TikZ rather than using a staple like Tableau, but non-standard diagrammatics is where TikZ really shines.
For this graph, the author recorded her feelings of envy. Here is the original, minus the legend:
And here is my reproduction (pdf here):
My TikZ code is available here, so anyone who wants can make their own version. The biggest pain is that your topics of envy will differ from the author’s, so you’ll have to choose new colors and labels.
Beyond that, the main data points you will need are frequency and attainability scores for each envy event, on a continuous scale. What I mean by ‘continuous scale’ is that it can include decimals, like 5.25. So if it’s easier, scales from 1 to 10 are fine, and you can normalize afterwards (multiply by 2 or 1.2). Both 20 and 12 are multiples of 4; this way, with 4 levels for frequency and attainability, they fit on a grid (not shown).
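If normalizing by hand is a bore, the arithmetic is a one-liner. A tiny sketch (function names are mine) mapping a 1–10 rating onto the 20-point and 12-point axes by the multiplications mentioned above:

```python
# Normalize 1-10 ratings onto the chart's two axes by simple multiplication.
def to_frequency(score):
    """1-10 rating -> position on the 20-point frequency axis."""
    return score * 2

def to_attainability(score):
    """1-10 rating -> position on the 12-point attainability axis."""
    return score * 1.2

print(to_frequency(5.25))  # 10.5
```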
The spirals are mostly an artistic flourish (circles would be fine), but give it style. I like how the x-axis focuses on frequency rather than order. The most creative part is the y-axis (attainability), which makes the graph far more poignant. In the bottom-left quadrant, the author fleetingly covets some feature of a celebrity who she can never measure up to. In the top-left quadrant (note the leftmost whitespace), she sees a nice article of clothing that she could just buy herself. The bottom-right is largely dissatisfaction with her personality, and the top-right is mainly her English.
I thought of a few possible variations to make the chart even more elaborate, if desired — for instance, changing the spirals’ rotate=-90 to any angle you want.
I’m too shy to do my own version, but if anyone else is brave enough, give me a shout and I’ll post it!