Credits
Chenoe Hart is an architectural designer, researcher and writer whose work investigates how technology is transforming the physical environment.
A man is walking down a crowded street at night, when a machine unexpectedly turns to greet him. Inside a glowing store window is an Apple iMac, its LCD screen mounted on a user-adjustable arm above a whimsical dome containing its processing hardware.
In this 2002 Apple ad, the computer’s screen miraculously swings to follow the man’s movement as he walks past. He pauses, surprised. He starts to step away from the window, and the screen follows him as he turns. He shakes his head, and the computer shakes its screen; he jumps, the screen bobs up and down. He sticks out his tongue; the computer opens its CD-ROM drive.
The implication is clear: We’ve long desired technologies that intuitively respond to our interactions. As we look at our screens today, it’s easy to wonder, how soon before they look back at us?
We’re already making eye contact with our devices, to some degree. Our iPhones with Face ID process our gaze to unlock them. Newer cars may recognize your face and greet you as you sit behind the wheel. Meanwhile, AI assistants speak to us inside our homes. As our user interfaces grow more responsive, they seem to be becoming double-sided. AI agents can even act as autonomous participants in the world on behalf of their users. Soon, UX (user experience) design could become a two-way street, enabling computers to experience their users as well; XU design might join our list of tech acronyms.
AIs already act as though they experience us. As new AI advances emerge, chatbots are engaging us in involved, ongoing conversations. OpenAI described its Operator service as an agent that, like a human computer user, could “look at a webpage and interact with it by typing, clicking, and scrolling.” Rather than accessing online content through computer-native data transfer protocols, the agent, now absorbed into ChatGPT as a feature, can “use the same interfaces and tools that humans interact with on a daily basis,” a capability the company suggested “broadens the utility of AI.”
Reverse Skeuomorphic Perception
Interfaces designed only for humans are increasingly being used by both people and computers. But today’s interfaces generally assume that a user in analog physical space is operating them. Skeuomorphic design reinforces this assumption by representing a computer’s internal functions through physical metaphors: We access files via folders located on a desktop. Skeuomorphic graphics often proliferate during moments of technological change; embedding references to the past within the interfaces of new products can help with “easing the transition from the old to the new,” in the words of UX pioneer Don Norman. But skeuomorphism can also conceal a technology’s true nature: As the writer Clive Thompson argued in a 2012 Wired article critiquing the early iPhone’s imitation paper and leather-stitched graphics, lingering on outdated metaphors could mean that “we’ll fail to produce digital tools that harness what computers do best.”
As today’s computers gain the ability to perceive our world, they appear to be seeing it through their own reverse skeuomorphic analogies: AI systems comprehend physical phenomena via existing computational metaphors. When the sensors in a driverless car observe a street, for example, the car’s computer vision systems must process the raw data its cameras collect to make it legible to computation. The cars, pedestrians and bicycles that its cameras detect have their outlines enclosed by rectangular bounding boxes (or more detailed outlines in newer systems), and the content inside each region is assigned a label. The label is a digital stand-in for its offline source material, an informational tag written in a language more familiar to the computer. Just as we understand internal computer operations through physical analogies like our desktop icons, the computer understands offline phenomena through its own computationally efficient translational lens, allowing it to see physical objects as text that can fit into a database.
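To make the idea of objects becoming “text that can fit into a database” concrete, here is a minimal Python sketch, assuming a hypothetical detector that outputs labeled boxes. The `Detection` fields, the table schema and the sample coordinates are invented for illustration and do not describe any particular car’s software. The sketch also computes intersection over union (IoU), a standard measure of how much two boxes overlap, to show that overlapping boxes say nothing about which pixels belong to which object.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: an axis-aligned box plus a text label (hypothetical schema)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float  # detector confidence in [0, 1]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two boxes: 0 = disjoint, 1 = identical."""
    ix = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    iy = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = ix * iy
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    return inter / (area_a + area_b - inter)


# Hypothetical detector output for one street-camera frame (pixel coordinates).
frame = [
    Detection("car", 120, 340, 410, 520, 0.97),
    Detection("pedestrian", 600, 300, 660, 480, 0.88),
    Detection("bicycle", 580, 390, 700, 500, 0.74),
]

# The scene is reduced to rows in a table: a label, four numbers and a score.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE detections "
    "(label TEXT, x_min REAL, y_min REAL, x_max REAL, y_max REAL, score REAL)"
)
db.executemany(
    "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?)",
    [(d.label, d.x_min, d.y_min, d.x_max, d.y_max, d.score) for d in frame],
)
rows = db.execute("SELECT label FROM detections ORDER BY score DESC").fetchall()
print(rows)  # [('car',), ('pedestrian',), ('bicycle',)]

# A person walking a bicycle: the two boxes overlap, and the box format
# alone cannot say which of the shared pixels belong to which object.
print(round(iou(frame[1], frame[2]), 2))  # 0.29
```

Everything the camera saw inside each rectangle is discarded except a word, four coordinates and a confidence score.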
AIs exhibit corresponding limitations in how they see. When the artist and media scholar Elisa Giardina Papa signed up to manually classify images used to train an AI as part of her research, she found that the task rested on an underlying regime of binary decision-making. As she described in her 2022 book “Leaking Subjects and Bounding Boxes: On Training AI,” to label an image of a woman sitting on a couch, “I had to outline the contours of both the couch and the woman using polygonal circuits — bounding boxes — and then label them distinctly as ‘couch’ and ‘woman.’”
The system glitched when the pattern on the woman’s T-shirt resembled a pattern on the couch. The AI became confused. In her book, Giardina Papa describes what she imagined its perspective might be: “the algorithm for which I was cleaning data could not identify where the piece of furniture began and where the woman ended; the image of the woman was leaking into the couch, and that of the couch was leaking into the woman. From an algorithmic point of view, the image resided in an undivided queer category of couch/woman.”
The two subjects were not distinct enough to be separated. The training work of “segmenting, tracing, bounding-boxing, and labeling,” meant to separate “signal from noise, and orderly things from disorderly ones,” required, she concluded, artificial simplification at the root of its operation.
Similar detection challenges occur when AIs encounter other kinds of real-world content. Transparent objects have proved difficult for robot vision systems to distinguish from their surroundings, according to the authors of a 2023 survey article, because they lack “salient surface features such as colour and texture, making their appearance highly dependent on the image background.” A photograph of a familiar object in an unfamiliar orientation, such as a chair turned on its side, can also be challenging for an AI to perceive.
AI vision systems more generally perceive the world in terms of average and standard representations. The artist and AI researcher Eryk Salvaggio, writing in a 2022 blog post about his experience taking photographs during nature walks to train a generative adversarial network (GAN) for creating new images, found himself departing from a photographer’s normal artistic instincts to “look for breaks in patterns” and “unique subjects.” He was instead compelled to seek “the patterns surrounding eruptions of variation” to capture a consistent set of images for the AI to understand.
The eye-catching appearance of a mushroom, for example, had to be dismissed as an anomaly against the background continuity of the forest floor. Although Salvaggio’s project used an older method of AI image processing, he told me this would still hold “fundamentally true” for newer imaging systems, including for people collecting images to train the gaze of the diffusion-based models that generate images today.
New Forms Of Empathy
As we continue to use AIs that see our world through such unfamiliar methods, we might find ourselves compelled to engage with a concept interface designers already rely on heavily: empathy. One relevant historical precedent for empathizing with radically different nonhuman perspectives might be found in the writings of the biologist Jakob von Uexküll, whose pioneering 1934 text “A Stroll Through the Worlds of Animals and Men” described a distinctly empathetic framework for considering unfamiliar forms of intelligence.
In Uexküll’s well-known example, a tick perched on the tip of a blade of grass, perceiving the world through pressure, smell and temperature, would have a profoundly unfamiliar experience of the world compared to our own: a unique, all-encompassing experience that he called the animal’s Umwelt. He was ahead of his time in calling for his readers to empathize with an animal’s perspective, proposing an approach to studying nature by considering “the world as it appears to the animals themselves, not as it appears to us.”
Such an approach would align with some user experience discourses. An AI might not need genuine emotional experiences (an expectation Salvaggio, in a 2025 article, criticized as fallacious) for its perspective to fit within a sufficiently empathetic user experience methodology. We could especially accommodate an AI’s unfamiliar worldview by adopting a “cognitive” rather than emotion-based definition of empathy, as described by design researcher Andrea Alessandro Gasparini in a 2015 paper. Where “emotional empathy” involved an “instinctive, affective, shared and mirrored experience,” Gasparini wrote, we might read his cognition-based alternative as requiring only a more Uexküll-esque condition: that “one understands how others may experience the world from their point of view.”
Recent technical research has found ways of working with AIs that acknowledge their alternative perceptual capabilities. In a 2025 study, researchers improved an AI’s ability to understand low-light photographs by optimizing the images for a machine rather than a human. Where conventional image processing would remove noise from photographs, their method retained it; pixels that provided no legible information to a human observer could still contain information retrievable by an AI’s pattern recognition. The method left purple artifacts in the processed images that, the researchers noted, “may not be visually pleasing for human vision.” But these artifacts also significantly improved the computer’s ability to decipher the poses of photographed people, compared with its performance on photos brightened by models tailored for pleasant viewing by human eyes; the AIs performed nearly as well as humans tracing well-lit photos of the same poses.
In this situation, the goals of good UX and good XU design — of producing a good experience for people versus a productive one for an AI — are at odds. The same image could not be formatted to provide an optimal view for both humans and computers.
Human Compatibility
AI perceptual differences could become particularly relevant to us as we begin to share more of our physical environment with AIs. AI-powered humanoid robots are a booming area of technological investment, and even Apple is now said to be developing a robot. Physical embodiment could soon become a literal part of how we relate to technology. Once that happens, real-life objects could evoke the same kinds of illusory emotional affect that some users already encounter through chatbots: Salvaggio speculated in our interview that “we tend to project personality and self-awareness to objects that write, but also objects that move,” citing the emotional attachments some people already exhibit toward their cars.
Just as our physical environment now contains automotive infrastructure built for moving vehicles rather than pedestrians, in the future we will likely see new infrastructure to accommodate robots. Portions of our everyday world are already designed for machine perception, such as the QR code menus that proliferated during the Covid pandemic. These are somewhat inconvenient to use; you must deliberately position your phone to read them. In the future, a restaurant menu might automatically appear on your phone after an AI monitoring a security camera detects you sitting down.
Similar hopes of effortless convenience informed Amazon’s “Just Walk Out” retail technology. Customers at Amazon Go prototype stores could simply walk out with their purchases. As you shopped, a computer-vision-equipped system tracked the items you picked up and automatically charged you as you exited.
But the system didn’t work very well. As Theo Wayt reported for The Information in 2022, “Amazon had more than 1,000 people … whose jobs included manually reviewing transactions and labeling images from videos,” requiring “about 700 human reviews per 1,000 sales.” AI vision capabilities have since grown more advanced (robots in many warehouses, for instance, can now navigate without the aid of traditional QR code markings), yet in early 2026 Amazon announced it was shutting down its retail experiment.
When Amazon’s in-store computer vision system did work, it had additional limitations. Its performance was sensitive to the store layout, which had to be optimized for the AI’s perception. Wayt described how these tech-equipped stores required extra staff to ensure that “shelves are kept arranged so the scanning technology can work.” A customer placing a product back on the wrong shelf, for example, could throw off the visual perception system connected to the store’s cameras. The system failed when it had to comprehend objects found outside of their expected physical locations.
The potential need to change some of our activities to make them easier for AIs to comprehend becomes especially concerning when AIs misperceive aspects of the human social world. In too many cases, socially constructed biases in an AI’s data sets can lead to unequal treatment of its users. The robotics researcher Tom Williams described in “Degrees of Freedom” how voice assistants have been found to disproportionately misunderstand users speaking with accents or in dialects, whose speech falls outside a narrow set of expectations. Drawing on insights from the interaction researcher Christina Harrington’s investigations of voice assistant use, Williams described a scenario in which a user who preferred to interact with a robot in African American Vernacular English might have to “consciously Whiten their speech” to be understood, in a form of “technology-directed code switching.” Those types of interactions positioned “White speech” as a norm against which other speech deviated, potentially leading users to feel that a technology was “not made for them.”
Williams noted, however, that heavy-handed attempts to improve the diversity of AI training datasets could also lead to inappropriate surveillance. The New York Daily News reported on a 2019 incident in which employees of a Google subcontractor claimed they were told to invite people with “darker skin tones” to play a phone game while their video footage was recorded. They also said they were told to distract and rush participants through the survey’s permission forms to mask the fact that they were being recorded. A Google representative said the data collection was conducted to embed “fairness” into a new smartphone’s face unlock feature, noting it was “critical we have a diverse sample.” But Williams argued that if diversifying an AI dataset required exploitative collection methods, maybe computers shouldn’t be identifying people by demographic categories in the first place.
AI systems that try to categorize people today can inherit biases from the more oppressive agendas under which some forms of social categorization were once developed, Williams wrote. A Microsoft dataset of facial images used for computer vision training, identified by the AI ethics researcher Morgan Klaus Scheuerman, seemed to divide the world’s entire population into three racialized categories using terminology that Williams said evoked “19th-century scientific racism.”
Instead of expecting a computer vision system to guess and categorize sensitive characteristics such as a person’s ethnicity or gender, Williams proposed that people could use voluntary markers to help orient the AI, “such as wearable QR tags, RFID tags, or smartphone applications,” to avoid being tracked without permission or subjected to inaccurate assumptions. One might imagine an individual wearing a pin or a name tag that would voluntarily disclose their pronouns to the system, eliminating the risk of the AI accidentally misgendering someone. Preemptive human intervention could compensate for gaps in AI data on historically marginalized social groups; if people voluntarily disclosed their data, technology companies wouldn’t need to deploy their own initiatives to collect it. Data voluntarily shared might be less likely to be influenced by external biases or to contain content collected in unauthorized ways, albeit at the cost of new kinds of labor spent supplying AIs with information more directly.
Understanding AIs means recognizing both the social and technical limits of their capabilities and uses. Williams’s proposed opt-in QR codes might let us bypass our reliance on AI in certain social situations, much as human overrides help AIs navigate events that fall outside their limited data about our physical world. Waymo has long relied on remote operators to ensure safety in perceptually ambiguous situations; a human might step in to guide a driverless car through a weather condition it hasn’t encountered before.
The reductive demographic categorizations that Williams saw a need to override in computer vision datasets could, meanwhile, be compared to the oversimplified visual nature of a bounding box. Williams told me he referenced the bounding box as a “direct corollary for computer perception,” representing how humans were “typically visualized” by computers, which also symbolized how “computer vision technologies in robotics serve to bound people’s behaviors” and “sort people into boxes.” Sorting gives AI systems an artificial perspective on the world and on their users, even as we are starting to use AIs in more situations where their decisions have real-world impacts.
Perceptual Understanding
In March 2018, an automated Uber test vehicle with a safety driver on board struck and killed the pedestrian Elaine Herzberg as she was crossing a deserted road at night in Tempe, Arizona. According to a subsequent NTSB report on the collision, the vehicle’s systems detected Herzberg’s presence on the road 5.6 seconds before the vehicle struck her, but the car neither slowed down nor activated its brakes. The vehicle was being inadequately supervised by its single inattentive safety driver, who did not notice her in time either.
AI categorization failures played a significant role in the accident: the system detected Herzberg’s presence and tracked her movement, but it “never accurately classified her as a pedestrian or predicted her path,” preventing the system from recognizing a collision risk while there was time to brake or steer away. Uber’s software instead “initially classified the pedestrian as a vehicle, and subsequently also as an unknown object and a bicyclist.”
The shifting of the system’s classification labels suggests that it may have misperceived her in a manner similar to the errors described in Giardina Papa’s examination of AI image subjects: Because Herzberg was pushing a bicycle, her appearance may have differed from that of a standard pedestrian body, making her presence harder to categorize accurately. Her behavior also fell outside the system’s narrow set of expectations, since Uber’s software lacked specific provisions for tracking pedestrians who were jaywalking rather than using crosswalks.
As AIs become embedded in an increasing variety of systems that involve physical automation, we are using them in more situations that entail physical risk. When a robot misperceives a person or a command, Williams noted, its actions can have irreversible physical consequences, whereas an inaccurate software program command can often be undone. It’s especially important for us to get the challenge of AI-human perception right as AI systems become more embodied, in what the company NVIDIA anticipates could be an upcoming era of “physical AI.” We will need to take the time to ensure that future safety-critical physical AI systems are designed and trained to adequately comprehend potential nuances and uncommon events that can occur as they experience the world.
Today, technology companies developing automated vehicles use more detailed reference points to monitor their surrounding streetscapes. In a 2019 blog post, NVIDIA described how its automated vehicle vision systems went beyond the simple categorization of objects via bounding boxes into finer-grained image content segmentation because “in the real world … not everything fits in a box.” The now widely used technique it adopted, panoptic segmentation, could delineate regions with indistinct boundaries (such as road surfaces and tree canopies) while also more easily distinguishing individual cars and people. That technique enabled NVIDIA to better identify intersecting and overlapping objects, such as a car behind a tree, road debris or even a person unloading a van; the latter is potentially a visual situation similar to that of the pedestrian whom Uber’s earlier system had failed to identify.
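The core idea of panoptic segmentation can be sketched roughly in Python, under stated assumptions: every pixel receives both a class and an instance ID, so countable “things” (cars, people) get separate IDs while amorphous “stuff” (road, trees) has area but no instances. The 4-by-6 grid, the class names and the convention of using instance ID 0 for stuff are hypothetical simplifications for illustration, not NVIDIA’s data format.

```python
from collections import Counter

# Hypothetical class vocabulary: "things" are countable, "stuff" is amorphous.
THING_CLASSES = {"car", "person"}
STUFF_CLASSES = {"road", "tree"}

# Each pixel carries a (class, instance_id) pair. Stuff pixels share
# instance_id 0; each thing gets its own nonzero id.
R, T = ("road", 0), ("tree", 0)
CAR1, PERSON2, PERSON3 = ("car", 1), ("person", 2), ("person", 3)

# A tiny 4-by-6 "image": a car partly hidden behind tree canopy,
# with two people standing beside it on the road.
panoptic = [
    [T, T,       T,    CAR1, CAR1, T],
    [T, PERSON2, CAR1, CAR1, CAR1, PERSON3],
    [R, PERSON2, CAR1, CAR1, CAR1, PERSON3],
    [R, R,       R,    R,    R,    R],
]


def segment_areas(label_map):
    """Pixel count for every (class, instance_id) segment."""
    return Counter(pixel for row in label_map for pixel in row)


def count_instances(label_map, cls):
    """Number of distinct instances of a countable 'thing' class."""
    if cls in STUFF_CLASSES:
        raise ValueError(f"{cls!r} is stuff: it has area but no instances")
    if cls not in THING_CLASSES:
        raise ValueError(f"unknown class {cls!r}")
    return len({inst for c, inst in segment_areas(label_map) if c == cls})


print(count_instances(panoptic, "person"))  # 2
print(segment_areas(panoptic)[("road", 0)])  # 7
```

Unlike a list of boxes, this per-pixel map never assigns one pixel to two objects, which is what lets a system separate a car from the tree in front of it.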
Additional kinds of specialized systems can also enhance perception. Waymo described one such approach in a 2022 company blog entry:
“Historically, computer vision relies on rigid bounding boxes to locate and classify objects within a scene; however, one of the limiting factors in detection, tracking, and action recognition of vulnerable road users, such as pedestrians and cyclists, is the lack of precise human pose understanding. … For example, a bounding box won’t inherently tell you if a pedestrian is standing or sitting, or what their actions or gestures are.”
Waymo’s system instead digitized its subjects in human-specific terms: representing their bodies as point-and-line virtual armatures that evoked a high-tech version of a child’s stick-figure drawings. Those armatures offered a computationally efficient means of tracking a person’s moving limbs, enabling the system to decipher each person’s pose as they stood or walked. By monitoring pedestrians at that level of detail, a driverless car could “gain a deeper understanding of an individual’s actions and intentions, like whether they’re planning to cross the street,” the company stated. The armatures could also help Waymo navigate uncertain perceptual cases, by tracking the movement of “partially occluded objects, such as just a leg or arm of a person stepping out of a vehicle,” and by following the position of a person’s head, which “often indicates where they plan to go.”
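The difference between a box and an armature can be illustrated with a toy Python sketch. The joint names, coordinates and the 45-degree thigh-angle rule for guessing whether someone is seated are hypothetical assumptions chosen for illustration; this is not Waymo’s pose model. The point is only that a few connected keypoints carry posture information that a rectangle alone does not.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Joint:
    x: float
    y: float  # image coordinates: y grows downward


def bounding_box(pose):
    """The rectangle a box-based detector would report for these joints."""
    xs = [j.x for j in pose.values()]
    ys = [j.y for j in pose.values()]
    return (min(xs), min(ys), max(xs), max(ys))


def thigh_angle_deg(pose):
    """Angle of the hip-to-knee segment below the horizontal, in degrees."""
    hip, knee = pose["hip"], pose["knee"]
    return math.degrees(math.atan2(knee.y - hip.y, abs(knee.x - hip.x)))


def looks_seated(pose, threshold_deg=45.0):
    """Toy heuristic (an assumption): a near-horizontal thigh suggests sitting."""
    return thigh_angle_deg(pose) < threshold_deg


# Hypothetical stick-figure armatures: a few named joints per person.
standing = {"head": Joint(50, 0), "hip": Joint(50, 90),
            "knee": Joint(52, 135), "ankle": Joint(50, 180)}
seated = {"head": Joint(50, 40), "hip": Joint(50, 120),
          "knee": Joint(95, 125), "ankle": Joint(95, 180)}

print(bounding_box(seated))  # (50, 40, 95, 180)
print(looks_seated(standing), looks_seated(seated))  # False True
```

The box for the seated figure is just four numbers; nothing in it records that the knee is bent. Real pose systems track many more joints and use learned models rather than a single angle threshold.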
Where a bounding box abstracted away the details of the world as it converted its contents into a database entry, Waymo’s digital tracking preserved some of them with its reference points grounded in the physical domain of human anatomy, allowing for a more detailed level of offline information to become computationally understandable.
The idea of embedding an understanding of humans into a computer system can be traced back to the origins of consumer UX design: arguably, to Apple’s development of the Macintosh and its transformative, user-centered skeuomorphic interface. “Since computers are so smart,” an Apple advertisement at the time stated, “wouldn’t it make sense to teach computers about people, instead of teaching people about computers?”
This was only a metaphor at the time. But the statement described an early informal version of what we might now call a UX design process: the computer’s interfaces were designed with a new emphasis on usability. Today’s AIs, meanwhile, engage in literal machine learning to acquire information about humans, and AI chatbots are often deployed on business websites with the same goal as a UX designer, making an interface simpler and easier to use.
Apple’s motives for designing the Macintosh, at least as described in its ad, suggested an additional layer of complexity. Learning to use a conventional pre-Macintosh computer was depicted as a difficult process, one that might involve “falling asleep over computer manuals.” As a result, “not very many people wanted to learn” about computers, and within Apple’s rhetoric, responsibility for learning was metaphorically transferred onto the computer itself. The Macintosh’s design aimed to eliminate a human’s need to learn how to operate it; it was supposedly “so easy to use, most people already know how.” Whenever AIs today are set up to interact with oversimplified reverse skeuomorphic representations of the analog human world that reflect computation’s familiar norms of binary categorization, they are similarly missing out on more detailed lessons to learn about us humans.
Future Lessons
More mutual learning could benefit both parties. Recent generations of students raised on smartphones now struggle to understand the basics of computer operation, like navigating a file system. As AI-equipped computers begin to learn about us, we can meet them partway by maintaining an awareness of how they think.
From an XU perspective, AIs could also learn more effectively through richer human engagement. AIs should be able to access the human world through the equivalent of a Linux command line (with oversight to block harmful commands) in addition to a pinch-and-twist touchscreen. Examples of deeper interfaces AIs could use to experience the world might be found in Waymo’s anatomical models, in digital models documenting real-world infrastructure (known as digital twins) created to help computers manage the operations of a smart building or a dark factory, or in algorithms now being developed to enable robots to estimate the weights of objects as they handle them. Such representational affordances could make more information about the world available to an AI than it would otherwise see.
The knowledge an AI could gain from learning more about our world might be akin to what AI theorists call a world model: a coherent and consistent internal representation of the external world that they hope AIs will someday be able to retain in order to understand us more accurately. If we imagine how those world models might work with empathy for AIs, an AI’s world model will likely need to be translated to match its distinctive Umwelt and perceptual differences.
Whether by further researching how AI systems work or by optimizing AIs to better perceive us, our process of learning to live alongside AIs will, unfortunately, not be as sleek and simple as swiping through an iOS interface. Our future path requires us to get our hands dirty as we work to entangle both cultural and computational codes.