Welcome to the Convivial Society, a newsletter about technology and culture. Many of you are receiving your first installment after finding your way here from my conversation with Sean Illing about attention. Welcome aboard. I was grateful for the invitation, and I thoroughly enjoyed the conversation. You can listen to it here or here. While you’re there, check out the recent interviews with Dr. Gabor Maté and Ian Bogost.
In this installment, I offer some thoughts on AI-generated images … finally. I think it was about a month ago that I first mentioned I was working on this post. It took a while to come together. As per usual, no hot takes contained within. This will be me thinking about what we’re looking at when we’re looking at AI-generated images and how this looking trains our imagination. Or something like that.
This past summer, the image above, titled “Théâtre D’opéra Spatial,” took first prize at the Colorado State Fair. It was created by Jason Allen with Midjourney, an impressive AI tool used to generate images from text prompts. The image won in the division for “digital art/digitally manipulated photography.” It also prompted a round of online debate about the nature of art and its future. Since then you’ve almost certainly seen a myriad of similar AI-generated images come across your feed as more and more people gain access to Midjourney and other similar tools such as DALL-E or Stable Diffusion.1 About a month or two ago, on my little corner of the internet, the proliferation of these images seemed to plateau as their novelty wore off. But this does not mean that such tools are merely a passing fad, only that they may already be settling into more mundane roles and functions: as generators of images for marketing campaigns, for example.
The debate about the nature and future of art might have happened anyway, but it was undoubtedly encouraged by Allen’s own provocative claims in interviews about his win at the State Fair. They are perhaps best summed up in this line: “Art is dead, dude. It’s over. AI won. Humans lost.”
I’m not sure we need to necessarily defend art from such claims. And if we were so inclined, I don’t think it would be of much use to perform the tired litany of rehearsing similar claims about earlier technologies, such as photography or film. Such litanies tend to imply, whether intentionally or not, that nothing changes. Or, better, that all change is merely additive. In other words, that we have simply added something to the complex assemblage of skills, practices, artifacts, tools, communities, techniques, values, and economic structures that constitute what we tend to call art. They fail to understand, as Neil Postman once put it, that technological change is ecological rather than additive. Powerful new tools can restructure the complex techno-social ecosystem we call art in sometimes striking and often unpredictable ways. Even if we don’t think a new tool “kills” art, we should be curious about how it might transform art, or at least some of the skills and practices we have called art.
Others might argue in reply to Allen’s rash declaration that this new form is art, or maybe that there is an art to the construction and refinement of prompts that yield the desired images. Alternatively, they may argue that this present form of the technology is only one possible application of the underlying capacities, which might be harnessed more cooperatively by human artists. For example, Ethan Zuckerman wrote, “Jason Allen is trolling us by declaring art is dead. Instead, a new way of making art, at the intersection of AI and human skill, is being born.” Some others might even insist, less convincingly in my view, that, in fact, humans win because there is more stuff to go around. If some images are good, then more images are better. If only certain people could develop the skills to draw, paint, or design with digital tools, better to empower everyone with the machine-aided capacity to produce similar work. I’m not sure about any of that. Maybe the proliferation of images will prove alienating. Maybe the alien or hybrid quality of this work will fail to yield the same subjective experience for those who encounter it. Maybe doodling anonymously in notebooks no one will ever see turns out to be more satisfying for some people.
Back in September, John Herrman noted that the “flood of machine-generated media” had at least raised the caliber of the discourse around AI:
In contrast with the glib intra-VC debate about avoiding human enslavement by a future superintelligence, discussions about image-generation technology have been driven by users and artists and focus on labor, intellectual property, AI bias, and the ethics of artistic borrowing and reproduction.
As Herrman approvingly observed, most of the debates about the ethics of AI-generated art have thus far focused on justice for the artists, both living and dead, on whose work these models are trained and those whose labor might be displaced because of their success. These are legitimate and significant areas of concern. You can follow some of the links in the Herrman block quote above to read more about such matters.
I find that my own questions, as they have gradually come to me, are a bit different. I’ve been thinking about matters of depth and also about how these images might train our imagination. Along these lines, I appreciated the reflections of another digital artist, Annie Dorsen.2
“When tinkerers and hobbyists, doodlers and scribblers—not to mention kids just starting to perceive and explore the world—have this kind of instant gratification at their disposal,” Dorsen argues, “their curiosity is hijacked and extracted.” “For all the surrealism of these tools’ outputs,” she adds, “there’s a banal uniformity to the results.” She went on to write that “when people’s imaginative energy is replaced by the drop-down menu ‘creativity’ of big tech platforms, on a mass scale, we are facing a particularly dire form of immiseration.”
What exactly does such immiseration entail? Allow me to quote Dorsen at length:
By immiseration, I’m thinking of the late philosopher Bernard Stiegler’s coinage, “symbolic misery”—the disaffection produced by a life that has been packaged for, and sold to, us by commercial superpowers. When industrial technology is applied to aesthetics, “conditioning,” as Stiegler writes, “substitutes for experience.” That’s bad not just because of the dulling sameness of a world of infinite but meaningless variety (in shades of teal and orange). It’s bad because a person who lives in the malaise of symbolic misery is, like political philosopher Hannah Arendt’s lonely subject who has forgotten how to think, incapable of forming an inner life. Loneliness, Arendt writes, feels like “not belonging to the world at all, which is among the most radical and desperate experiences of man.” Art should be a bulwark against that loneliness, nourishing and cultivating our connections to each other and to ourselves—both for those who experience it and those who make it.
Not surprisingly, I was struck by the reference to Arendt. The world for Arendt is not simply coterminous with the earth. It is rather the relatively stable realm of human things that welcome and outlive each generation. It mediates human relationships and anchors our sense of self. Through our participation in the plurality of the common world of things, we enjoy the consolations of community. To be alienated from the world is to find ourselves lonely and isolated—and it is to lose ourselves, too.
I’m not sure if this is exactly what Dorsen had in mind, but here’s how I would apply this strand of Arendt’s thinking. (Stay with me, it will seem as if I forgot about Arendt, but we’ll get back to her!) I’ll begin by noting that when I first glanced at Allen’s “Théâtre D’opéra Spatial,” I was taken in by the image, which struck me as evocative and intriguing. But as I came back to the image and sat with it for a while, I found that my efforts to engage it at depth were thwarted. This happened when I began to inspect the image more closely. As I did so, my experience of the image began to devolve rather than deepen. When taken whole and at a glance, the image invited closer consideration, but it did not ultimately sustain or reward such attention.
This is not only because the image appeared to fail in some technical sense—hands, for example, seem to give these models trouble—it is that these errors, aberrations, or incongruities are, in a literal sense, insignificant—they signify nothing. They may startle or surprise, which is something, but they do not then go on to capitalize on that initial surprise to lead me on to some deeper insight or aesthetic experience.
Rob Horning has made a similar observation in his recent comments about generative AI focused on ChatGPT. “AI models,” Horning observes,
presume that thought is entirely a matter of pattern recognition, and these patterns, already inscribed in the corpus of the internet, can [be] mapped once and for all, with human ‘thinkers’ always already trapped within them. The possibility that thought could consist of pattern breaking is eliminated.
This also hints at how, as I wrote last summer, we seem to be increasingly trapped in the past by what are essentially machines for the storage and manipulation of memory. The past has always fed our capacity to create what is new, of course, but the success of these tools depends on their ability to fit existing patterns as predictably as possible. The point is to smooth out the uncanny aberrations and to eliminate what surprises us.
“The best art isn’t about pleasing or meeting expectations,” as Dan Cohen has put it in a recent essay about generative AI. “Instead, it often confronts us with nuance, contradictions, and complexity. It has layers that reveal themselves over time. True art is resistant to easy consumption, and rewards repeated encounters.”
On the contrary, Cohen concluded, “The desire of AI tools to meet expectations, to align with genres and familiar usage as their machine-learning array informs pixels and characters, is in tension with the human ability to coax new perspectives and meaning from the unusual, unique lives we each live.”
Consider how, in The Rings of Saturn, W. G. Sebald interprets Rembrandt’s “The Anatomy Lesson of Dr Nicolaes Tulp.”
The dissected arm is all wrong, but this “error,” if we attend to it, leads us on to something vital. It invites a closer consideration of the significance of the scene being depicted, and it rewards such attention with critical insight and depth of meaning. Rather than straightforwardly depicting a step in the grand advance of scientific knowledge, Rembrandt appears to raise a series of questions about the moral standing of the body, the ethics of the procedure, and the nature of vision—the participants have lost sight of the body before them because they have become dependent on its representation in the medical textbook that commands their attention.3
As we are in the midst of what amounts to a series of digressions before we get back to Arendt and loneliness, let us take one more. In “The Idea of Perfection,” philosopher Iris Murdoch described “uses of words by persons grouped round a common object” as a “central and vital human activity.” What she is aiming at is the importance of developing a wide and diverse vocabulary to support sound moral judgment and showing how that vocabulary depends on a context of common objects of attention, but she gets us there by analogy to the art critic.
“The art critic,” she explains, “can help us if we are in the presence of the same object and if we know something about his scheme of concepts. Both contexts are relevant to our ability to move towards ‘seeing more’, towards ‘seeing what he sees’.” And so there is a place for the critic or historian who can, as we gather around “The Anatomy Lesson,” help us to see what is before us. Along with the formal aesthetic features of the painting, there are historical, legal, and social dynamics in play that we may not be able to perceive. While there is room for errors of judgment with regard to interpretation, it is meaningful to say that we can be moved by such conversations toward a deeper understanding of the meaning and significance of the painting. It would be difficult for me to imagine such a conversation taking shape around “Théâtre D’opéra Spatial.”
Now, I am prepared to grant that it is I who am missing something of consequence or that this conclusion merely reflects a failure of my imagination. If so, please correct me. It seems to me that one may discuss the technical aspects of the technologies that are yielding these images or how certain features of the image might have appeared, or that the artist may explain the process by which they arrived at the prompts that yielded the image. This would be not unlike talking exclusively about the shape of the brush or the chemical composition of the paint. It does not seem to me that we can talk about the image in the way that we could talk about “The Anatomy Lesson” and find that we are moving toward a deeper understanding of the image in the same way. In part, this is because we cannot properly speak about the intentions of the artist or seek to make sense of an embedded order of meanings without making what I think would be a category error.
I think I can begin to tie these threads together by reference to a few lines from Eva Brann, a long-time tutor at St. John’s College, who, in a talk to incoming freshmen introducing the program of great books, observed the following:
To my mind texts, like people, are serious when they have a surface that arouses the desire to know them and the depth to fulfill that desire. I think that for us human beings only depths and mysteries induce viable desire. Many a failure of love follows on the—usually false—opinion that we have exhausted the other person’s inside, that there is no further promise of depth.
I left the last sentence in because it’s worth thinking about independently of our present subject, but it’s this line that I’d like us to consider: “A surface that arouses the desire to know them and the depth to fulfill that desire.” That line has stuck with me over the years. And I thought about it again as AI-generated images proliferated on my screens, and especially as I thought about Allen’s work and his rash pronouncements about art.
By contrast, I recently found myself looking again at Pieter Bruegel’s “Harvesters” on Google’s Arts and Culture site. You will recognize the image as the one that I use as the header art for this newsletter. Bruegel is one of my favorite painters, chiefly for what I take to be his extraordinarily humane and earthy vision. The resolution of the image on Arts and Culture is extremely high, and you can zoom in to see minute details in the painting. When doing so, I was struck by this scene from the deep background of the image:
The detail is remarkable. This scene appears in the field slightly to the left of the painting’s center. You could look at this painting for a long time without noticing it. And, of course, that’s much easier to do when looking at a lower-resolution image appearing on your screen than it would be were we standing in front of the painting itself, although even these fine details would only begin to emerge over time.4
This is one way of thinking about what it means for a work of art to have depth. You can press in, and it won’t dissolve under a more attentive gaze. Naturally, what it means to “press in,” as I put it, varies depending on the medium under consideration. In this case, it means looking intently until our looking is transformed into seeing. But I can imagine analogous modes of pressing in that would apply for music and text, for example, or in the case of taste and texture. Whatever mode these engagements take, they involve attention—Iris Murdoch’s “just and loving gaze directed upon an individual reality.”
I suppose, then, that these are the sorts of questions I have for us just now as we navigate the flood of machine-generated media: How will AI-generated images train our vision? What habits of attention do they encourage? What modes of engagement do they sustain?
The most important thing about a technology is not necessarily what can be done with it in singular instances; it is rather what habits its use instills in us and how these habits shape us over time. I recently wrote about how the skimming sort of reading that characterizes so much of our engagement with digital texts (and which often gets transferred to our engagement with analog texts) arises as a coping mechanism for the overwhelming volume of text we typically encounter on any given day. So, likewise, might we settle for a scanning sort of looking, one that is content to bounce from point to point, searching but never delving and thus never quite seeing.
We are, it seems, offered an exchange. Brann wrote about works that have “a surface that arouses the desire to know them and the depth to fulfill that desire.” This suggests that there are surfaces that may arouse a desire to know more deeply but which do not have the depth to satisfy that desire. I think this is where we find ourselves with AI-generated art. And, at one level, this is fine, unless we find ourselves conditioned to never expect depth at all or unable to perceive it when we do encounter it. The problem, as I see it, is that we need these encounters with depth of meaning to sustain us, indeed, to do more than sustain us, to elevate our thinking, judgment, and imagination. So the exchange we are offered is this: in place of occasional experiences of depth that renew and satisfy us, we are simply given an infinite surface upon which to skim indefinitely.
But let us not forget Arendt! Dorsen, you’ll recall, argued that “when people’s imaginative energy is replaced by the drop-down menu ‘creativity’ of big tech platforms” they suffer a form of symbolic immiseration. Loneliness for Arendt, she noted, “feels like ‘not belonging to the world at all, which is among the most radical and desperate experiences of man.’ Art should be a bulwark against that loneliness, nourishing and cultivating our connections to each other and to ourselves—both for those who experience it and those who make it.”
The lack of depth, as I’ve called it following Brann, ultimately issues forth in a kind of loneliness. When I turn to Bruegel or Rembrandt, what I find, whether or not I am fully conscious of it, is not merely technical virtuosity; it is another mind. To encounter a painting or a piece of music or a poem is to encounter another person, although it is sometimes easy to lose sight of this fact. I can ask about the meaning of a work of art because it was composed by someone with whom I have shared a world and whose experience is at least partly intelligible to me. Without reducing the meaning of a work of art to the intention of its creator, I can nonetheless ask and think about such intentions. In putting a question to a painting, I am also putting a question to another person. It is for this reason, I think, that Dorsen argues that art can be a bulwark against the loneliness of finding that we do not belong to the world at all. “Friendship,” C. S. Lewis wrote, “is born at that moment when one person says to another, ‘What! You too? I thought I was the only one.’” That moment, I’d argue, can happen through the mediation of a work of art just as surely as it can in conversation with my neighbor. At least as long as the ratio of human to machine intentionality, perhaps difficult to ascertain in practice, is not such that the human is altogether obscured.
1. For what it’s worth, one of the better descriptions I’ve encountered of how these applications work was provided by Marco Donnarumma, who is himself a digital artist and a machine learning researcher. “Figuratively speaking,” Donnarumma explained, “AI image generators create a cartography of a dataset, where features of images and texts (in the form of mathematical abstractions) are distributed at particular locations according to probability calculations.” “The cartography,” he goes on to say, “is called a ‘manifold’ and it contains all the image combinations that are possible with the data at hand. When a user prompts a generator, this navigates the manifold in order to find the location where the relevant sampling features lie.”
2. Thanks to Neil Turkewitz for bringing this piece to my attention.
3. I don’t think he ever cites this painting, but this is a striking illustration of Ivan Illich’s argument about a regime of vision in thrall to what he called “the show.” He traced its origins back to the time when “the anatomists looked for new drawing methods to eliminate from their tables the perspectival ‘distortion’ that the natural gaze inevitably introduces into the view of reality.” You can read more on Illich and “the show” in this older installment.
4. I’ve cited her essay before, but I’ll mention it here again. In 2013, art historian Jennifer Roberts wrote about helping her students more wisely deploy their attention: “An awareness of time and patience as a productive medium of learning is something that I feel is urgent to model for—and expect of—my students.” And, as she observed, “in any work of art there are details and orders and relationships that take time to perceive.”
"So the exchange we are offered is this: in place of occasional experiences of depth that renew and satisfy us, we are simply given an infinite surface upon which to skim indefinitely."
This is such a brilliant encapsulation of the challenge of living in our technological age, and of the promise offered by technology in so many areas: dependable yet unsatisfying. So much of life-worth-living comes like a surprise during the 'waiting for', the boredom, the mundane.
I had a thought regarding the quality of these tools. We have certainly not seen the peak in their abilities, but I would argue the peak may not be too far away. The reason is fairly simple: in the future, what will the AIs have left to be trained on?
Today, all of these language and image models are trained from existing human art and communication. But as people begin and continue to integrate AI outputs into their work and daily lives, the content on the web will increasingly be reflective of the AIs themselves. Eventually they will be heavily influencing or directly creating nearly all online artifacts. As this process continues, the training data available with which to create and refine AIs will begin to form a feedback loop. The fundamental question is: is this feedback loop one of exponential decay or exponential growth? And, is there a limit?
In a game like chess or Go, where AIs don't need human signal but can instead compete against algorithms or themselves, exponential growth (with limits) is both possible and demonstrated. However, I believe this scenario is the opposite and I therefore fail to see how the quality of the models could do anything but decay. This is because the individuals who depend on the AI will become increasingly unable to be coherent without them, effectively removing themselves as relevant training data for how to improve human understanding.
Of course, the AIs will always be able to be trained on an impressive trove of archived data, but with no feedback loop I wonder how many of the technologists' dreams are even possible. Perhaps more realistically, the legal hurdles of appropriating others' works may actually _require_ new sources of corporate-owned data for many use-cases -- data which will become increasingly impossible to find once people are dependent on their AI tools.