Your Copresence is Requested

A lot of XR companies are springing up around the notion of copresence: how important it is, and why everybody wants it. There are a lot of reasons. The current global health issues make us feel separate from each other. Remote work, or education in general, can feel isolating. Video communication tools have gotten very functional and have allowed many people to continue jobs remotely and see their relatives from afar. But those same tools have also stripped us of some of what it means to have people nearby, sharing their space with us as well as their ideas.

I believe copresence is actually NOT what you want, or at the very least not the ONLY thing. An analogy might be that while the foundation of your house is amongst the most important parts of the structure, nobody would suggest that a foundation is all you need.

For anybody needing a definition of copresence:

The feeling of being physically near other people. Specifically, the idea that physical proximity to other people will shape the thoughts and behavior of the group differently than if they are not present with each other.

The thing is, “physically near” is a vast understatement, and I feel that is the base reason copresence promises so much but hasn’t quite delivered yet. In the real world, proximity begets many other things: a sense of focus, lots of nonverbal communication, a unified task, the collective group “vibe,” the confidence of group safety, and feelings of belonging, community, or even family.

Seen from this view, “proximity” adds up to a much richer concept, and we should expand our verbiage. In a number of fields, there’s a term for complex systems strongly affecting each other when in proximity: entrainment.

In music, entrainment refers to oscillatory systems affecting each other (or nearby listeners) by aligning rhythmic elements. So people tap their toes to music almost unconsciously. Or they find that their walking gait falls into the rhythm of the music they’re listening to.

At the personal or social level, being entrained means having a level of synchronization of attention, emotion, and behavior between actors. Crowds will spontaneously stop and all watch a street performer. A conversation slowly lifts the spirits of all the people involved as they align with a particularly exuberant speaker. Some board games are practically nothing but human entrainment: they give people tasks to do, in a close physical space, with a specific emotional goal.

Entrainment is the level of interaction that people sharing a physical space can have, and that remote folks are really hungering for. An actor’s perceived mutual entrainment with another actor in the scenario gives a much fuller and more informative account of what people respond to in good “copresent” experiences than “I could see and hear them, and they reacted to seeing and hearing me.” Yes, that’s a part. But it’s not everything.

Also note that you can become entrained with real people, machines, or even random characters. You can have copresence with a very non-humanoid avatar (Sally shows up on a VR conference call “dressed” as a unicorn with a cowboy hat), a computer-controlled entity of some kind (like the pithy shopkeeper in the role-playing game you’re playing), or really any character that can synchronize with you in a humanistic fashion. Even a disembodied Narrator can be copresent in some ways, although that might be intimidating to some. Hence the word “perceived” above. Physical collocation is neither necessary nor sufficient for copresence. What’s important is the perceived alignment of behavior, emotion, and attention. So, as long as the disembodied voice speaks to similar areas of focus, emotional response, or desired actions…it will feel copresent with you, even without a body of any kind.

Many times you will hear companies touting copresence as some kind of magic elixir that makes everything better. Business meetings, classrooms, even phone calls will all be better with a sense of the other people being present in the room with you in some way. How true is this? To what degree does presence add to the experience, and how does our current presence fidelity measure up to real physical presence? What factors are getting lost? Let’s dive into these and other questions.

Presence, the State of Being There

While the list is not 100% agreed upon, there are many cues that the mind uses to determine what’s real and what’s not. Some are useful for digital scenarios and some aren’t, or aren’t yet.

There are a number of Physical cues:

  • Visual cues: depth, motion, color, contrast.
  • Auditory cues: interaural time and level, spectral (all deltas between the ears).
  • Haptic cues: texture, temperature, viscosity/density, conductance, proprioceptive, kinematic.
  • Obviously the other senses (smell and taste) have their own cues…but we aren’t close to them being useful for HCI work within a computing environment, unless you’re in a specific installation and have done some heavy lifting to provide those types of input.

There are also a couple of groups of Social Interaction cues that people use. The first group deals with how social context influences our presence behaviors; the second speaks to our behaviors when we determine that others see us as present:

Contextual influence

  • Schemas. Social rules and template scenarios around different types of interactions. First date, for instance, or Job Interview.
  • Status
  • Social identity

Behavioral responses to our presence being seen

  • Persuasion
  • Social judgments
  • Attachment

As you read over those two lists, take a quick stock of our current digital communication styles and how many of these cues are available, or not.

  • Email. Completely asynchronous, gives you the person’s words.
  • Text/chats. While explicitly asynchronous, can be used in a real-time back and forth manner, giving more info. There’s also a culture of emojis used such that some emotionality comes through.
  • Telephone. I can now hear your words in your voice, so I can get most of the emotionality and tone.
  • Video call. I now hear your voice but can also see your face, so your facial expressions help me read the sentiment behind your words, and I can pick up on a smattering of nonverbal cues. However, there are some “calibration issues”: people’s cameras are rarely in the right spot for conversation to feel like they’re making eye contact. Video calls can also feel very oppressive, in that you’re in a spotlight of sorts, so a lot of people turn their video off completely, negating the benefit.

When we make a new communication method, we should always ask ourselves: what channels of communication are we adding, and in some cases removing, with this new system? What channels are important for the specific scenarios our customers are using the system for? What is the cost of faulty information coming across? All of these factors are super important, for everything from making business deals to grandma wishing you a happy birthday.

One thing to notice about the social cues for copresence is that they tend to deal with the task at hand, and/or the relationships among the people performing the task. In fact, the more task-driven and specific a multi-user application is, the more copresent the people will feel, even if they’re invisible. This is because there are so many ways for entrainment to occur with a larger structured objective.

The ability to “get past the novelty and do real work” allows users to start responding to all the complex social cues that also signify people’s copresence. It’s a bit like the difference between a generic conference room (where people gather to be close or private) and NASA mission control (a very specific and equipment-laden room for doing a complex multi-person job). Even when people in a generic meeting are sitting right next to you, they might not feel present if they’re working on something else, or daydreaming. But you can be sure that every human helping control an ongoing space mission feels fully present with the others around them, even if their eyes are never unglued from the bank of displays they must monitor.

So, if you’re making an app just so people have a place to gather, they may get some of the physical cues that the other person is nearby. But if you make an app where people have superpowered digital tools that magnify their work abilities, they may not even notice those physical cues, and thus will mostly experience copresence due to alignment of behavior and attention around the collaborative work project. I’m sure you’ve had moments where a task is so all-encompassing that you stop noticing your workmates, and are just living in the collaborative effort.

XR’s “killer app” is not to make normal communication behaviors possible in a slightly new way. It’s to enable fully entrained communication, previously only doable in person, and not currently possible over phone or video chat. It’s also about enabling deeper communication by empowering users with new inputs, new ways of visualizing things, and computing superpowers at the ready.

Nowadays, meetings are what happens between chunks of work. Imagine if they could become productive, collaborative jam sessions, powered by always-on compute and pixels anywhere. I can’t wait for those meetings, personally.