
The Dangers of Gaze Data with Brendan David-John

Brendan David-John joined Virginia Tech’s “Curious Conversations” to talk about gaze data, exploring its applications in virtual and augmented realities and the associated privacy concerns.

He highlighted the potential for gaze data to reveal personal information and related security implications, especially in a military context, and shared the projects he’s currently working on to better mitigate this threat.

(music)

Travis

It's been said that a person's eyes are the windows to their soul. While I'm not sure exactly how true that is or what their soul would even look like if I saw it in their eyes, I do know there's a lot you can learn about an individual from their eyes. And we're also moving more and more into a world where virtual reality, augmented reality, and other types of devices are using our eyes and our eye movement as part of that interactive experience. So I'm curious what these new devices, this new technology, can actually learn about us from our eyes, and also how we can guard against somebody misusing that same information. And thankfully, Virginia Tech's Brendan David-John is an expert in this very subject.

Brendan is an assistant professor in the Department of Computer Science at Virginia Tech and also a member of the Private Eye Lab at Virginia Tech. His research interests include eye tracking, virtual reality, augmented reality, privacy, and computer graphics. Brendan and I talked a little bit about gaze data, what that actually entails, and what we can learn about individuals simply by looking at their eyes. We also talked a little bit about how nefarious actors could possibly misuse the same information, and he shared some of the projects he's working on to shore up security in the space, one of which involves helping create a more secure interface for our military members when it comes to using things like augmented reality. Brendan also shared some insights that you and I can use to help better protect ourselves as we venture into these spaces, and he shared how having all of this knowledge has impacted him when it comes to how he interacts with technology. As always, don't forget to follow, rate, and/or subscribe to the podcast. I'm Travis Williams, and this is Virginia Tech's Curious Conversations.

(music)

Travis

I am curious about this concept of gaze data. And so I guess that's a great place to start. What is gaze data?

Brendan

Yeah, yeah, I mean, I think the primer here is about human vision. And I've been working with gaze data since I was an undergrad researcher, I guess the summer of 2013, so I'm pretty in tune with what the eyes do. And I guess what you always hear is the eyes are the window to the soul, to some extent. And I think it's very true from a neuroscience perspective, right? The eyes are our main form of visual input and sensory input, assuming you have proper vision; folks with low vision or different conditions will have different priorities for their sensory sources of information. But vision is essentially dominating the bandwidth of what goes to our brain, what our brain processes, and what we act on and do next.

So I think the best example, or I guess the metaphor I want to bring here, is we can think about our computer screen as having a very high resolution. It could be 4K, with thousands and thousands of pixels, but they're all uniformly distributed across the display. It's all the same resolution in some sense. Our eye works very differently. Where we look actually has a spike in resolution, and we call this area the fovea. A lot of folks learn about this with rods and cones in high school and middle school. And that foveal region is really, really small. The example vision scientists use is: hold out your thumb at arm's length. The width of your thumb is pretty much all you can see in high detail at one instant of time. But your brain is what actually stitches all of this together. The world doesn't look blurry. It doesn't look like it loses detail. Our brain has just learned a process where, I don't want to say frames, because we don't have a frame rate, but that frame-by-frame input of these small high-detail regions is all stitched together into a vibrant, high-detail world around us. And it very quickly falls off, right?

The perfect kind of illusion that I like to bring up here, and it's a really fun example, is where they'll have you look at one side of a screen and they'll bring a face in slowly from the right-hand side. And everything just looks normal, right? As long as I'm keeping my gaze where I'm supposed to look on the left side of the screen, I can tell a face is coming in from the periphery, but once it overlaps with the region where I'm looking, I realize the nose and the mouth and the face are completely flipped upside down. And I couldn't tell that until it actually overlapped with my fovea and I had the detail to pull that information out. But our brain's still smart enough to know, well, this sparse, blurry thing in the corner of my eye, it's definitely a human face. I'm very good at seeing human faces, right? Babies do this during development, right? And until it overlaps, the shoe doesn't drop that, actually, there's something weird going on with the nose and mouth, because my brain has just filled in that part of the world and constructed that understanding of the world together.

So that's, at a high level, why gaze data is so important. It really tells us what the brain is processing, what it's thinking. And our brain is really optimized for how we survive, act, and move in this type of environment.
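To put a rough number on that falloff, here is a minimal Python sketch (an illustration, not something from the conversation) of a common hyperbolic acuity model, where detail drops quickly with angular distance from the gaze point; the half-resolution constant is an assumed, approximate value.

```python
def relative_acuity(eccentricity_deg: float, e2: float = 2.5) -> float:
    """Rough model of how visual detail falls off away from the fovea.

    Uses a hyperbolic falloff, acuity ~ e2 / (e2 + eccentricity), where e2
    (the "half-resolution" eccentricity, in degrees) is an assumed constant
    chosen only for illustration.
    """
    return e2 / (e2 + eccentricity_deg)

if __name__ == "__main__":
    for ecc in [0, 1, 2, 5, 10, 20, 40]:
        print(f"{ecc:>2} degrees from the gaze point: ~{relative_acuity(ecc):.0%} of peak detail")
```

At zero degrees you get full detail, and by about ten degrees this toy model is already down to roughly a fifth of it, which is why the thumb-at-arm's-length region is all that ever looks sharp at once.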

Travis

Yeah, and so I guess when we are now using a lot of these virtual reality and augmented reality goggles that are, I guess, super close to our faces, how is that interacting with gaze data and our eyes in ways that are, I don't know, capturing or using that data?

Brendan

Yeah, there's definitely a few ways. And I think the first goes back to that monitor example, right? We have this 4K monitor. I have a pretty big curved screen in front of me, or you go to a projector system in an IMAX. And those things are just a bunch of pixels. And essentially, if I put the screen right up in front of your eyes, I have some optics and some lenses that kind of amplify things and make it look big in a field of view or look immersive. But essentially, there are so many pixels that I have to pack in, especially with how close it is to the eyes, to make the image look crystal clear.

What's interesting about virtual reality and how we see things in 3D is we actually have a left eye and a right eye view, right? That's how we see the world, and that's what the brain uses to figure out depth and information about the world. So we have to replicate that if we're doing what we call rendering, or making a virtual world appear in front of us. We render this virtual imagery, and our brain kind of gets tricked into fusing these things together and thinks it's seeing a 3D world. And the whole point of this is that there are a bunch of pixels, and your computer graphics, your video games, your virtual simulations have to fill all these pixels with colors. And that's a lot of computation, right? Because that distribution is uniform, but our eye actually has a spike right around the point where it's focused, the fovea, which is what the gaze data tells us, we can actually optimize that rendering and save thousands and thousands of pixels of computation. Because my eye just can't sense that high resolution out in the periphery. So by giving these systems gaze data, I actually get really nice optimizations in terms of power usage and the amount of time it takes to render a frame and make something show up. You might be thinking of a video game example, but this is really true for simulations, right? If I want to train somebody in a virtual environment and have them go perform well, whether it's a firefighter or a surgeon, I need to actually be able to recreate photorealistic and high-fidelity environments for them. So that's one way that it gets leveraged there.

I think another interesting way it gets leveraged is with interaction. So folks are familiar with the Apple Vision Pro. One of their big selling points is that they have this really natural interaction called gaze and pinch. So instead of having to hold a physical controller, in fact, the Vision Pro doesn't even ship with a controller, right? It only tracks your hands and lets you point at things and type on virtual keyboards. It's a lot easier to just look naturally at an object and pinch my fingers. And if I can do this, it's kind of like magic, is the way they pitched it. And you need that gaze data, right? If I don't have the direction somebody's looking, it could be anything in front of me I'm trying to select.
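To make the foveated rendering optimization concrete, here is a minimal Python sketch of how a renderer might pick a coarser shading rate for screen tiles farther from the gaze point. The eccentricity thresholds and the pixels-per-degree constant are assumed, illustrative values, not numbers from any particular headset.

```python
import math

def shading_rate(tile_center, gaze_point, px_per_degree=40.0):
    """Pick a coarser shading rate for tiles farther from the gaze point.

    tile_center and gaze_point are (x, y) positions in pixels; px_per_degree
    is an assumed stand-in for the headset's angular pixel density.
    """
    dist_px = math.dist(tile_center, gaze_point)
    eccentricity_deg = dist_px / px_per_degree
    if eccentricity_deg < 5:       # foveal region: shade every pixel
        return "1x1"
    elif eccentricity_deg < 15:    # near periphery: one shade per 2x2 block
        return "2x2"
    else:                          # far periphery: one shade per 4x4 block
        return "4x4"

if __name__ == "__main__":
    gaze = (960, 540)  # hypothetical gaze sample near the center of a 1920x1080 eye buffer
    for tile in [(960, 540), (1300, 700), (1900, 1000)]:
        print(tile, "->", shading_rate(tile, gaze))
```

The periphery still gets drawn, it just gets far fewer shading computations, which is where the power and frame-time savings come from.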

But if I have an idea of where the eyes are pointing, I can detect this little pinch gesture with some other camera on the headset and figure out, yeah, they're trying to select that button or interact with that character, right? In a certain way. So it's this really naturalistic form of interaction. And we've had this for a little bit, like accessible interfaces, think Stephen Hawking's setup, right? Where there was some sort of blink detection to do interaction and text-to-speech types of systems. We've had this around for a while, but we didn't always have the technology to just drop it on somebody's head and have it work. There's a lot of calibration. There's a lot of time to build the systems. But we're really reaching the point where these VR systems can just track what you're looking at and naturally let you interact with them, which is really, really good, because I don't want to ship a controller. I don't want to force you to say, I forgot my Apple Vision controller, I can't even use it anymore. Right? We want that form of natural interaction.

And I think the last thing I'd mention here about why this is really important is just that same thing I mentioned earlier, right? Our brain is constantly sensing the world around us, constructing some version of reality, and we're using that to do actions, right? If I'm a surgeon, I need to do very specific things. I need to pay attention to specific areas. So some of my research is actually called gaze intent modeling. We take this data from the eye movements and what you're looking at and try to predict what you're going to do in the future. That way we could build some assistance, right? Maybe I need help selecting something quickly, even in that case where, you know, I look at something with my eyes and then I pinch.

Or if I need to determine that maybe I missed something and I might make a mistake, right? It's a lot easier to do that if I know exactly what you're paying attention to, instead of just getting, you know, a camera strapped to your head like a GoPro.
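As a toy illustration of gaze intent modeling, the sketch below simply accumulates recent dwell time per object and guesses the likely selection target before the pinch arrives. Real intent models are learned from data; the object names, window length, and 90 Hz sample rate here are all assumptions for the example.

```python
from collections import defaultdict

class GazeIntentPredictor:
    """Toy gaze-intent model: guess the next selection target from recent dwell."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.samples = []  # list of (timestamp_s, object_id)

    def add_sample(self, t, object_id):
        """Record which object the gaze ray hit at time t (seconds)."""
        self.samples.append((t, object_id))
        # Keep only samples inside the sliding time window.
        self.samples = [(ts, o) for ts, o in self.samples if t - ts <= self.window_s]

    def predict_target(self):
        """Return the object with the most gaze samples in the window, if any."""
        dwell = defaultdict(int)
        for _, obj in self.samples:
            dwell[obj] += 1
        return max(dwell, key=dwell.get) if dwell else None

if __name__ == "__main__":
    predictor = GazeIntentPredictor()
    # Simulated 90 Hz gaze samples drifting from one hypothetical button to another.
    for i in range(90):
        predictor.add_sample(i / 90, "send_button" if i > 30 else "cancel_button")
    print("Likely next selection:", predictor.predict_target())
```

An assistant built this way could pre-highlight the predicted button or pre-fetch its content, so the eventual pinch feels instantaneous.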

Travis

Yeah, it sounds like you are almost trying to create something like the predictive text my cell phone has, but for that system, with my eyes.

Brendan

Exactly, exactly. Action and perception, yes. Based on that, yes.

Travis

When it comes to gathering and collecting this gaze data, what types of things could a person, perhaps a nefarious person, learn about an individual simply from the gaze data?

Brendan

Yeah, and this gets into a lot of my research, which is: this is really cool, but it's also a little bit scary what could be learned about me from a privacy perspective. And there are a lot of privacy concerns, right? I think some of them are as simple as figuring out your age. There's an idea of anonymous browsing in a web browser, for example. I don't want targeted ads based on my age group or my gender or my ethnicity. But with how you respond to the world, you can kind of think of the eyes as both an input and an output, right? It's searching for more information, and it's kind of revealing what I'm looking for, but it's also telling me what my brain's receiving and how I respond to it. It's kind of this loop between inputs and outputs. And actually, if you pay attention to that over time, you can really figure out, hey, what ethnicity is a person based on the way they look at other folks' faces? There's a lot of experimental research that does this. We're still seeing how it expands out to the natural world with what we see every day. But I can figure out your age just based on the eye movements themselves. I don't even care what you're looking at. The muscles in your eye have pretty systematic tendencies to change over time. So if I just get this raw, unfiltered data, just the eyeball rotations, I can start to figure out, hey, roughly what age group does this person fall within? And then if I pair that with the content they're looking at, I can start to figure out, maybe they look at things in a certain way from a gender perspective, or they look at different ethnicities and have some sort of implicit biases in the ways that they think and process information about the world, and totally start to model this thing. Whether it's accurate or not, you might feed this to some AI model or some predictive algorithm to feed you ads or even just personalized content in general, right? You think about different dynamic storylines and things of this nature, and that can kind of push people to have different perspectives and thoughts and ways that they approach the world. All these things could in theory feed in from that perspective. So at a high level, those are a lot of the things that you can learn about somebody from the personal side of things.

A lot of my research also looked at identification. So it's not the same as a fingerprint or an iris pattern. It's not as strong. We call it a behavioral biometric. But if I have a sufficiently large set of people, let's say a few hundred or maybe even up to a thousand, if I have enough data, I could kind of pull you out of the hat if I have some of that eye movement data, because it's very unique how your muscles are designed. So there are both some general trends, like age, that I mentioned before, but also things that are very unique about you and the way that your eyes move and the way you respond to content. So identity and these other kinds of personal preferences, I would say, are things that I can capture about you.
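To show what a behavioral biometric can look like in the simplest possible terms, here is a Python sketch that turns a handful of eye movement statistics into a feature vector and matches it against enrolled users with nearest neighbor. The users, features, and numbers are hypothetical, and real identification pipelines use much richer models.

```python
import math

def gaze_features(saccade_velocities, fixation_durations):
    """Tiny feature vector: mean and spread of saccade velocity (deg/s)
    and fixation duration (ms). Real systems use far richer features."""
    def mean(xs):
        return sum(xs) / len(xs)
    def std(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return (mean(saccade_velocities), std(saccade_velocities),
            mean(fixation_durations), std(fixation_durations))

def identify(sample, enrolled):
    """Return the enrolled user whose template is closest to the sample."""
    return min(enrolled, key=lambda user: math.dist(sample, enrolled[user]))

if __name__ == "__main__":
    # Hypothetical enrolled templates (velocity mean/std, duration mean/std).
    enrolled = {
        "user_a": (310.0, 40.0, 220.0, 30.0),
        "user_b": (280.0, 55.0, 260.0, 45.0),
    }
    new_sample = gaze_features([300, 320, 305, 315], [215, 225, 230, 210])
    print("Best match:", identify(new_sample, enrolled))
```

With only two users this is trivial, but the point Brendan makes is that the same kind of matching still works surprisingly well when the pool grows to hundreds or a thousand people.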

Travis

And what might a nefarious actor do with that type of information?

Brendan

Yeah, I mean, at a high level, the ad space for VR, the immersive space, is really, really unknown. But it's definitely the case that if I'm making money off the number of clicks that I get from somebody, and let's say in the VR space a click is how long you look at an object, I can try to find the things that attract your attention most and make sure that I'm showing them to you. That could be more ad revenue for me, because then I could potentially follow up. Think about the way web ads work: they get money if you follow a sponsored click through an article or something.

 

If I know that I have more things that you're going to respond to, I can start to put those into the environment around you. And maybe I don't even notice the effect, I'm just giving up my data. But someone who's really nefarious could find what my interests are. Maybe I have some very, very specific view on very specific politics or very specific items, and as that nefarious actor, I want to push you to be more radical in that direction. I can start to control what you're seeing, see how you're responding, and maybe feed off of that with some sort of adaptive algorithm in a very nefarious, manipulative, or deceptive way. That's something that I'm thinking about a lot. There's this idea of dark and deceptive patterns, and I think vision and attention is a huge part of that, right? A web browser just knows what's on the screen at that point in time. The eye tracker gives them information about where you're allocating your attention.
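The underlying measurement is simple: treat dwell time as the VR analogue of a click and total up how long the gaze rested on each object in the scene. Here is a minimal sketch, with hypothetical object names and an assumed 90 Hz tracker:

```python
from collections import defaultdict

def dwell_per_object(gaze_hits, sample_period_s=1 / 90):
    """Sum how long the gaze rested on each object.

    gaze_hits is one object id per gaze sample (e.g. from raycasting each
    90 Hz gaze sample into the scene); the ids here are made up.
    """
    dwell = defaultdict(float)
    for obj in gaze_hits:
        dwell[obj] += sample_period_s
    return dict(dwell)

if __name__ == "__main__":
    hits = ["poster_sneakers"] * 180 + ["poster_soda"] * 45 + ["npc_face"] * 30
    dwell = dwell_per_object(hits)
    print({obj: round(seconds, 2) for obj, seconds in dwell.items()})
    # An attention-optimizing (or manipulative) system would then promote
    # whatever the user dwelt on longest.
    print("Most attended:", max(dwell, key=dwell.get))
```

Nothing in that loop is exotic; the privacy question is entirely about who gets to run it and what they do with the ranking.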

Travis

You're doing a lot of research related to how we can be more secure in this space. And one of the projects that I know that you're working on is this project that's supported by DARPA. And so what are you working on right now related to how we can help keep our military members more secure in this space as they venture into using more VR and AR technology?

Brendan

Yeah, definitely. And at a high level, I think there are a lot of critical spaces for VR and AR. I mentioned surgery, for example, right? If I'm trying to get past these physical monitors in the environment and give you information, that's the added benefit. And it's usually more mixed reality, the idea that I can see the real world and augment it with additional information. I mean, virtual reality is really good for simulations, but I'm not walking down the street wearing a VR headset and blocking out the real world. And especially in these critical applications, you still want the real world preserved.

But what we're seeing is, how can I display information in a way that enables you to perform more efficiently? And one way we think about that is maybe in a critical case, you know, a soldier's case, right? Instead of having information that's maybe on a tablet, some digital information about a map or the rest of the world or the content, I might want that on a heads-up display. But I don't want that heads-up display to actually distract me from, you know, the most critical task that I'm doing at that moment.

So part of the use of eye movements there is to understand the distribution of that content and make sure we're not distracting and actually having a net negative, right? We want to give them more information without them having to take their eyes off the task. But if I'm blocking relevant information, or if I'm just making them look around too much and keeping them off the task, that's actually a net negative. So that's one reason we apply eye tracking there.

My specific work within this DARPA project is looking a little bit more at what happens if somebody actually gets access to that information. So, right, I'm using that same information to make sure the display is optimized, I'm giving you information, and maybe even letting you select things or give commands or indications really, really quickly if you can't speak at a certain point in time. But if somebody were able to get access to that communication channel, or even just put, you know, a really good thermal camera or some camera in the environment that's looking at you, and figure out your eye movements, does that reveal anything about your cognitive status and maybe when you're vulnerable? And that very much links to what I said before, right? Our brain and that system is just taking information in, and your eyes relay that. If somebody can pay attention to that, they might be able to take advantage of a vulnerable moment or vulnerable status of the user, or even do some long-term tracking of, like, this person has been at their post for 12 hours, they're very tired. Right? If somebody can get any of that information, even at a very coarse level, right, they're not sneaking a camera into your headset or something, but if at a very coarse level they could get this idea of what you're thinking and doing, I think it could be used in very nefarious, malicious ways. And we want to understand, in these critical settings, what the trade-off is to adding this technology, and to think about this data stream and the applications we use it for.
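Purely as an illustration of the kind of coarse inference Brendan is worried about, here is a toy fatigue heuristic built from blink rate and saccade velocity. The weights, baselines, and thresholds are invented for the example and are not taken from the DARPA project.

```python
def fatigue_score(blink_rate_per_min, mean_saccade_velocity_deg_s):
    """Toy 0-to-1 fatigue estimate from two coarse eye metrics.

    Elevated blink rate and slowed saccades are commonly cited fatigue
    markers; the reference values below (15 blinks/min, 400 deg/s) and the
    equal weighting are assumptions made only for illustration.
    """
    blink_component = max(0.0, (blink_rate_per_min - 15.0) / 15.0)
    saccade_component = max(0.0, (400.0 - mean_saccade_velocity_deg_s) / 400.0)
    return min(1.0, 0.5 * blink_component + 0.5 * saccade_component)

if __name__ == "__main__":
    print("Rested observer:", round(fatigue_score(14, 420), 2))
    print("Twelve hours into a post:", round(fatigue_score(28, 280), 2))
```

Even a crude score like this, sampled over hours, is the sort of signal an adversary could use to pick a vulnerable moment, which is exactly why the project treats the gaze channel itself as something to protect.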

Travis

I know you're also working on an NSF project, National Science Foundation project, so what all does that entail?

Brendan

Yeah, so it's a really fun project funded by the SaTC, the Secure and Trustworthy Cyberspace program, within NSF. And this actually links back to that example I gave earlier about resolution and rendering, right? The idea that your display is uniform in pixels, lots of dense pixels, but I can only see a really sharp spike of those around my gaze direction at one point in time. So for folks to actually use this optimization, we need to give eye tracking data to the system. Some of my past work from my dissertation has basically said, hey, I can trust maybe some NVIDIA graphics card to get my gaze data, but I don't trust some random app developer or game developer. So the best practice is to kind of just let the platform have access to it, but never give developers access to it through their tools or code. And the Apple Vision Pro, for example, has followed this type of approach. They're actually very privacy-preserving in what they're doing. But what this NSF project is exploring is, well, if I'm a very crafty game developer, I might be able to instrument my virtual reality game in such a way that different things that you look at result in different things happening on the graphics card, or in the performance of the system.

This is called a side channel. And this actually isn't new, right? We can think about side channels in other systems, like GPS, or sorry, the cell tower signal strength, for example. It can actually tell me how close you are to a certain cell tower and could give me an idea of where you are within Blacksburg, or maybe where your office is, or what medical doctor you're visiting for certain periods of time. So this is kind of a common side channel. It's a way to get information that I never gave an app access to.

But some other signal on the system that it has access to reveals that information. So we've followed that analogy through to this virtual reality rendering case and said, hey, this foveated rendering thing is really cool. It saves power, right? It's really good and helps the games run faster. But what can I take advantage of in the fact that the graphics card responds differently to the environment that I render around the person? The idea is that different things react to your gaze differently, and I can actually find these signals in the system and still try to solve for and reconstruct that gaze data. So I'm not supposed to have access to it. Maybe I can track what you're doing in the environment, what general direction you're looking in, but not the specific object you're looking at. But if I take this attack, we call it a security side-channel attack, I can instrument what we call trapdoors. Those will capture some of the system performance, and that's something that I have access to as a developer, usually to optimize a game or make sure things are running smoothly. And I can pull and reconstruct that gaze data at a coarse level out of that. So that's what that project's exploring. We're still, as I mentioned earlier, seeing these things emerge. We don't know all the applications, but if they're using these eye-tracking-based optimizations or interaction systems, we can start to pull out some of that data maliciously and just step right around the best practices for protecting your own privacy.
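Here is a simplified, simulated sketch of that trapdoor idea: the app times its own frames (something developers can legitimately read for performance tuning) while placing an expensive object in each screen region, and the region whose trapdoor gets rendered at full foveal detail shows up as the slowest. Every timing number below is fabricated for illustration, and a real attack has to contend with far noisier signals.

```python
import random

REGIONS = ["left", "center", "right"]

def simulated_frame_time_ms(gazed_region, probed_region):
    """Pretend GPU: the trapdoor in the gazed-at region is shaded at full
    detail and costs more; elsewhere foveated rendering keeps it cheap."""
    base_ms = 8.0
    trapdoor_ms = 3.0 if probed_region == gazed_region else 0.5
    return base_ms + trapdoor_ms + random.gauss(0, 0.2)  # timing noise

def infer_gaze_region(gazed_region, frames_per_probe=30):
    """Attacker-side logic: probe each region and pick the slowest one."""
    average_cost = {}
    for region in REGIONS:
        times = [simulated_frame_time_ms(gazed_region, region)
                 for _ in range(frames_per_probe)]
        average_cost[region] = sum(times) / len(times)
    return max(average_cost, key=average_cost.get)

if __name__ == "__main__":
    random.seed(0)
    print("True gaze region: center | inferred:", infer_gaze_region("center"))
```

The app never received a single gaze sample, yet it recovers a coarse gaze estimate from a side effect of the optimization, which is the whole point of the project's warning.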

Travis

That is fascinating. It sounds like you really are in this emerging space, kind of leading the way, kind of exploring a new frontier of technology, which is really cool. Well, what should the average person be aware of in this space, and how can the average person really protect themselves against nefarious actors in this space?

Brendan

Yeah, I think the average user is actually at a disadvantage, because eye tracking is showing up in these fancy headsets, but it's not really in the consumer ones yet. Take the Meta Quest 3, it's maybe the most accessible. The usage of this device spiked at Christmastime in 2024, and it's used a lot for games. But we're starting to see these applications come out. What I think folks need to pay attention to is: what are the privacy policies for this data?

Right? This kind of goes back to this idea of gender, age, and all the profiling that's going on on that side of things. Like, we don't know what happens with the data in a lot of cases. And even if we were to try to read this really long policy and all the legalese that's in it, it's really hard to understand. So one of the things I'm doing on that side, and this is still kind of, you know, in development, right, we don't know the full ecosystem, is: what are better ways to give you visual indicators of what your information relays about you? We spent, you know, let's say 10-15 minutes talking about eye tracking data, and that might actually raise your eyebrows a little bit. But if somebody hasn't heard this before, they aren't quite thinking about what their eye movements reveal. And if I just tell you, hey, your game is going to run faster if you turn on this eye tracking sensor and let this app have it, you're probably going to say yes, right? You're just not informed about these issues. And even if you were, you would have to dive through the privacy policy to see how they're going to use that data.

So we think about more immersive visualizations, actually in your environment. If I can just draw a little cyclops type of gaze ray out of your eye, for example, and let you play with that for a few minutes, you might start to think, there are some unconscious things my eyes do that I'm not actually comfortable with this app knowing, and maybe I'm okay just not sharing that data. We had an example where we put people into a virtual reality art gallery, and it was just randomly sampled art that we put up on the walls, and some of it had nude imagery, kind of older historical art. And when people turned this visualization on, to try to inform them about gaze data, because they had never really used eye trackers or weren't very knowledgeable about them before, they noticed, oh, these are the things I'm paying attention to. And they changed their behavior. They stopped looking at the nude regions of some of these paintings, or, like, explicitly told themselves to try to avoid looking there. And that change in behavior really suggests that people aren't aware of what their eye movements reveal about them, whether it's just attention or these other risks that we talked about with age, gender, and profiling.

So I think that's my advice. You can't read all the privacy policies, but make informed decisions when people ask for sensor data, whether it's eye movements, heart rate, all these biosensors. You really should, you know, be a forensic investigator. Think about: do I need it for this use case? And who is this data going to? Is it just maybe Meta or Google, who makes the device, or is it some third-party application that maybe wants to make money or do something else with this data? I think that's the best advice I have for consumers. And we're starting to see more education tools. You know, obviously I can train medical doctors, but I can also immerse you in a language-learning environment and make it very easy to practice learning French, for example, or some natural language, right? It's a lot easier than, you know, talking back and forth with some sort of computer device. I can actually see a 3D person and practice language with them. If I was also giving them gaze data, maybe they can make a more adaptive interface, but what are they doing with that data?
So I went on a little bit there, but I think that's the advice I would have for consumers: really think about what data you're giving up when you're prompted for it. It's just so natural to say, yes, allow, yes, allow. I had to do it for this podcast recording: yes, allow my mic and camera, and it's perfectly fine. And I picked "while using this app only," right? Or the site only, for example. That's my advice from that perspective.
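For the "cyclops" gaze-ray visualization Brendan describes, the core math is just intersecting the gaze ray with a surface so the wearer can see exactly what their eyes are resting on. Here is a minimal, self-contained sketch with hypothetical scene coordinates:

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gaze_hit_on_plane(eye_pos, gaze_dir, plane_point, plane_normal):
    """Intersect a gaze ray with a planar surface (say, a gallery wall).

    Returns the 3D hit point, or None if the ray is parallel to the plane
    or points away from it. Coordinates are in meters.
    """
    denom = _dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-6:
        return None
    t = _dot(plane_normal, _sub(plane_point, eye_pos)) / denom
    if t <= 0:
        return None
    return tuple(p + t * d for p, d in zip(eye_pos, gaze_dir))

if __name__ == "__main__":
    eye = (0.0, 1.6, 0.0)            # head at standing height
    gaze = (0.2, 0.0, -1.0)          # looking slightly right, into the room
    wall_point = (0.0, 0.0, -3.0)    # gallery wall three meters ahead
    wall_normal = (0.0, 0.0, 1.0)
    print("Gaze ray hits the wall at:", gaze_hit_on_plane(eye, gaze, wall_point, wall_normal))
```

Rendering a small marker at that hit point, frame after frame, is enough to give people the "oh, that's what my eyes are doing" moment from the art gallery study.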

Travis

Yeah, I think that's great advice. It sounds like it really comes down to awareness. Be aware of what you're clicking yes to, and I know I'm definitely guilty of agreeing to things that I didn't fully read. Well, I'm curious, doing all the research that you do, having all this knowledge, how does it change how you personally use technology?

Brendan

Yeah, that's a really good question. It goes back to what my PhD advisor said. She kind of got me onto privacy and security a few years into my PhD, and she was kind of like, yeah, maybe once I retire, I'm just gonna live on an island and off the grid. But I still actively use the technology. And as I said, we're still not quite to the point where, like, everyone's using this in school, for example. I think there's a lot of challenges to fix first, like, what age should I be using this at? Does it affect the way my eyes develop? Right, because putting a screen very close to the eyes does affect the development of the eyes. So I'm still obviously the early adopter type of person. I'm doing research. I'm a computer scientist in this space. And I tend to try a lot of these things. Let me say it this way: I tend to lean in and just understand these things. I don't have, like, those Meta Ray-Ban glasses that are, you know, always recording, for example, well, not always recording, but always available to record a quick clip of something. I don't tend to put cameras on at home, let me say it that way. But in a research setting or a very controlled setting, I think it's good to explore and play around with these things just to see the trends that are coming in the future.

I definitely play console games, and a little bit of VR games when I can. Beat Saber is really popular. It's like Dance Dance Revolution, but for VR, right? And that has its own risks. I think a funny story there: there's this leaderboard for Beat Saber where I think 50,000 people or so basically said, hey, I want to be put on a leaderboard. They gave up their motion data, which isn't even eye tracking data. This is them moving their head and hands to dodge things and, like, swipe things with virtual swords. Some researchers at Berkeley actually made an agreement with the company to get that data set and were able to re-identify people out of those 50,000 with, like, 93 or 94% accuracy. So just as a gamer, even if I want to be anonymous in a different setting, if I gave up some of this VR data, I could be tracked. But that went down a little bit of a rabbit hole. Yes, I do game. I think there's a value proposition, but you can always say no, right? I think maybe single-player campaigns and long campaigns are dying a little bit, but that's what I would like to see more of. I mean, social interaction is nice, especially through a platform like video games, but it's not the only thing. And I don't want to see the trend where you always have to be online, connected with some data connection, to be able to play with this type of entertainment.

(music)

Travis

And thanks to Brendan for helping us better understand gaze data and how our eyes might actually be windows into our souls. If you or someone you know would make for a great curious conversation, email me at traviskw at vt.edu. I'm Travis Williams, and this has been Virginia Tech's Curious Conversations.

(music)

About David-John

David-John is an assistant professor in the Department of Computer Science and the Virginia Tech Private Eye Lab, as well as a researcher with the Commonwealth Cyber Initiative. His research interests include eye tracking, virtual reality, augmented reality, privacy, and computer graphics.
