DESCRIPTION: What happens when we can no longer trust what we see? In this episode of Your AI Injection, host Deep Dhillon sits down with Ilke Demir of Intel Corporation to explore the unsettling rise of deepfake technology and its impact on truth and trust in our digital world. Dr. Demir, a leading expert in deepfake detection, explains how GANs (Generative Adversarial Networks) fuel these hyper-realistic videos and shares cutting-edge methods, like tracking PPG (photoplethysmography) signals, to detect the faint clues that reveal synthetic content. Together, they examine the deep social implications of deepfakes, discussing whether digital provenance and new legislation could help restore credibility in media—or if we’re entering an era where deception reigns.
DOWNLOAD: https://podcast.xyonix.com/1785161/9794865?t=0
AUTOMATED TRANSCRIPT:
Deep: [00:00:00] Hi there. I'm Deep Dhillon. Welcome to Your AI Injection, the podcast where we discuss state-of-the-art techniques in artificial intelligence with a focus on how these capabilities are used to transform organizations, making them more efficient, impactful and successful. Welcome back to Your AI Injection. This week we're going to dig in on deepfakes. That's the tech we've all seen in videos of, say, Tom Cruise or President Obama saying things they most definitely did not say. Today, we're speaking with Dr. Ilke Demir. Ilke received her Ph.D. in CS from Purdue University and is currently a senior research scientist at Intel Corporation. Last year, Ilke released a publication about deepfake detection using a tool called FakeCatcher. I'm thrilled to have her on the podcast this week so we can learn more about deepfake technology. So thanks so much for being here. Please set the context for our discussion. What are deepfakes? Just at a very high level, how do they work from a deep learning / AI perspective? [00:01:05][64.4]
Dr. Demir: [00:01:07] So deepfakes are videos where the actor, or the action of the actor, is not real. So imagine me talking like this, looking completely plausible, but it's not real: it's someone else playing me, or my visuals are there but what I'm saying is synthetic, or it may even be entirely imaginary, basically fake. Any of those, at a very high level, like celebrity videos that are not real and that are made with deep learning approaches, are what we mostly call deepfakes. [00:01:36][28.6]
Deep: [00:01:37] Tell us a little bit about what changed in the machine learning world that allowed somebody to make, let's say, the Tom Cruise video everybody has seen at this point, where he looks like he's saying things that he clearly didn't say. What changed? What's making that happen now? [00:01:54][16.5]
Dr. Demir: [00:01:54] So as you know, we had Photoshop for many, many years, right? You were able to edit imagery. There are very talented artists out there that make us believe things that are not real. But the thing that changed, that enabled this rise of deepfakes, is actually GANs, generative adversarial networks. These are new generative approaches, fueled by deep learning, with many layers learning to create very realistic samples from the learned distribution. And the data in the image domain keeps increasing. For example, if I want to create a deepfake of you, I need many, many samples of you, and then the GAN learns how to recreate you in a way that we can control or modify, maybe not control exactly where your mouth is, but reenact things based on another image or a target video. So all of the advancements in deep learning, novel approaches like GANs, 3D reenactment, and learning-based 3D face models, are enabling these deepfakes. And on top of that, all of these approaches are really being democratized, right? These things are not just in the research labs anymore. There are many tools that are open source and free to the public. [00:03:14][80.0]
Deep: [00:03:15] So just to be clear, to make a deepfake you need an input image or set of images of the target. Let's say we want to get President Biden to say whatever we want. So we need images of the president. Then we're going to have our own video of us saying it with our own expressions, and then we're going to apply this transformation to that image. And suddenly we're going to be able to create this world where Biden is saying whatever we want. Is that correct, in essence? [00:03:44][29.2]
Dr. Demir: [00:03:45] In a sense, it's correct. Sometimes it doesn't need to be specific images of one person; that is more towards reenactment, or the 3D replaying of the actor. In general, GANs can be trained on imagery of many people so that we can actually capture the distribution of those images, especially for image-based deepfakes. For those you need many, many images to actually capture the distribution of that data. [00:04:07][22.5]
Deep: [00:04:08] For the person you're trying to fake, if you will, how many images of them do you need, and what are the characteristics of those images? [00:04:17][8.2]
Dr. Demir: [00:04:17] There are many approaches that you can use, but most of them are in the facial domain, so it's mostly a portrait video that is segmented, and the face is either created by those GANs or it is modified to fit a 3D model. And that 3D model may be created from many images of that person, or it may be a generic 3D model where you map the textures of that person onto it, so that you don't represent the detailed 3D geometry but represent the face by the textures rather than the geometry itself. So there are several approaches, but the main thing is that you need the source model, imagery, or video, and the target imagery or video. If I am being reenacted by someone else, I need that someone else's video as the source that drives my new self. [00:05:09][52.1]
Deep: [00:05:10] So let's take a specific case. Let's say you have one image of the target, and we've got a video of what we want the target to say. Walk us through it at a high level. What is the high-level learning that takes place? What is the adversarial part of the GAN, and how do we get this model to learn this? [00:05:29][19.4]
Dr. Demir: [00:05:31] So again, as I said, there are several approaches for that. Especially for the case that you mentioned, the GAN has different losses that you can manipulate or balance, right? One of them is a reconstruction loss: is what you create at the end similar to what you captured, when the inputs and outputs are known? For example, you are trying to mimic a person smiling and you already have a smiling version of that face in the training phase. In that case you can use a reconstruction loss, asking the network, is it creating the same thing as what I gave as input? You can also use a landmark transformation loss. For example, moving the ends of my lips from here to here is the smile action. If this is the target, then you want it to animate the source the same way, and you can compare the transformations between the landmarks: are they the same or not? [00:06:29][58.4]
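For readers who want to see what those two losses look like in practice, here is a minimal sketch, assuming PyTorch; the generator, landmark network, and weighting are hypothetical illustrations, not anything specified in the conversation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(generated, target):
    # Pixel-wise L1 distance: does the generated face match the known
    # target frame (e.g., a real smiling frame available at training time)?
    return F.l1_loss(generated, target)

def landmark_loss(pred_landmarks, target_landmarks):
    # Compare facial landmark positions (e.g., lip corners) so the
    # generated face reproduces the target expression's geometry.
    # Both tensors are (batch, num_landmarks, 2) pixel coordinates.
    return F.mse_loss(pred_landmarks, target_landmarks)

# Hypothetical use inside a training step:
# fake = generator(source_frame, driving_frame)
# loss = reconstruction_loss(fake, target_frame) \
#      + 0.1 * landmark_loss(landmark_net(fake), landmark_net(target_frame))
```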
Deep: [00:06:30] So you're learning how to take two images and figure out how to transform one into the other. And once you've learned that transformation for the target image, you're applying it to every frame in the video. Is that about right? [00:06:46][15.5]
Dr. Demir: [00:06:47] That is one of the ways. So you may learn the transformation, you may learn the morphing, and you may learn to create a completely novel image, where the network is not trying to transform the image that you already have but trying to create something that doesn't exist, from the whole distribution of human faces. Generally, those are the image GANs. For listeners who don't know, GANs have a generative part, the generator, which takes noise and tries to create realistic images, trying to trick the discriminator into saying, look, this is a real image. And the discriminator is trying to distinguish between real ones and fake ones, trying to catch the generator creating those fake images and saying no, no, no, this is not realistic enough, this is fake. So to the discriminator you give the output of the generator and the real images, and to the generator, in the very basic version, you give white noise that it learns to shift toward the actual distribution of pixels in the image. In the very basic version, yes, it is a game between the discriminator and the generator. Now there are many different variations, like CycleGANs, StyleGANs, bidirectional GANs, all types of GANs in 2D and 3D, and it all arises from the fact that we can do many different things with GANs. You can put in multiple generators or multiple discriminators, or you can condition the output of the generator on an initial distribution, etc. So all of these are enabling those very realistic, very high resolution deepfakes nowadays. [00:08:19][92.9]
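To make the generator-versus-discriminator game concrete, here is a minimal, illustrative training loop in the spirit of a basic GAN; the tiny fully connected networks, flattened 28x28 image size, and hyperparameters are all assumptions for the sketch, not anything discussed in the episode.

```python
import torch
import torch.nn as nn

# Generator: white noise -> flattened image; Discriminator: image -> real/fake logit.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    """real_images: (batch, 784) tensor of flattened real samples."""
    batch = real_images.size(0)
    noise = torch.randn(batch, 100)        # the "white noise" input she mentions
    fake_images = G(noise)

    # Discriminator step: label real images 1, generated images 0.
    d_loss = bce(D(real_images), torch.ones(batch, 1)) + \
             bce(D(fake_images.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator into outputting "real".
    g_loss = bce(D(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```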
Deep: [00:08:21] So what does it mean to catch these images? Set the context for us. You know, you've got Facebook, you've got Twitter, you've got all these social media platforms. I believe all of them have policies and they don't want deepfakes on their platforms, but correct me if I'm wrong and enlighten us. What are they doing to actually detect these fakes? And once they find them, what are they doing? Are they marking them up so everybody knows it's fake? [00:08:47][25.3]
Dr. Demir: [00:08:48] Yes, so as deepfakes are rising in quality and quantity, deepfake detection efforts are also rising, hopefully in quality and quantity. There are many different ways to look at whether something is a deepfake or not. If you look at the very basic GAN architecture that we just mentioned, the discriminator itself is actually a detector: it's trying to detect whether the thing that comes from the generator is actually fake or not. [00:09:17][28.5]
Deep: [00:09:17] Just to be clear, when you're training it and it's detecting whether it's fake or not, it's because on the one side you've created the image of Joe Biden projected onto this fake video, and on the other side you know that that particular frame of the video is real. [00:09:31][13.7]
Dr. Demir: [00:09:32] That's right, yes. [00:09:33][1.2]
Deep: [00:09:33] Just to be clear for the audience. OK. [00:09:36][2.3]
Dr. Demir [00:09:36] For detection, there are several approaches. One of them is to actually look at the output and try to find the residual noise that the GAN itself creates in the actual deepfakes, without being aware of it. You can think of that like compression artifacts, right? In the early days of image forgery, there were JPEG compression artifacts that let us say, oh yeah, this is not the original image, it is compressed, it was modified, et cetera. Similar to that, GANs also have those artifacts, and some deepfake detectors are classifying deepfakes, or finding the fakes, based on those residues. There are also more generalized deepfake detectors that are not specific to each GAN, because there are so many GANs, right? And those are looking at authenticity. And when I talk about authenticity, it is what is unique to humans. When I'm talking to you, I have a consistency by myself; think of all these biological or physiological signals. Those authenticity-based deepfake detectors are trying to find what is special to real videos, as opposed to what is special to fake videos. [00:10:43][67.0]
Deep: [00:10:43] Just to understand the difference between the detector in the GAN that built the deepfake and the one that you're describing: the former is built primarily and solely based on the video that it's trying to fake. Did I understand that right or not? [00:11:03][19.4]
Dr. Demir: [00:11:03] Both of them are trained on many images or many videos. The distinction is that the first discriminator is specific to its own GAN, so it doesn't work on deepfakes created by other GANs. The generalized version of deepfake detectors works on any given video. And in addition, it may be the case that those videos are post-processed. So maybe there's a deepfake of me, but you change the background or change the lighting, or do some post-processing, or compress the video, and those generalized detectors are still expected to find that it is fake. [00:11:41][38.2]
Deep: [00:11:43] Where does this wind up? I mean, is this ultimately a cat and mouse game between that level of compute resources and data you use to train the detector versus the level of compute resources you use to create a fake? And we're overall trying to just have the good guys on one side with a lot more GPU and CPU capability to train a more accurate detector. Or is it a different game than that [00:12:09][26.2]
Dr. Demir: [00:12:11] As I said, it's an arms race, right? The more high-quality GANs we have, the more high-quality detectors we hope we have. For the residues, the GANs are also making a lot of progress in that space: they are trying not to create those side-effect signals. But the authenticity-based signals are a little bit harder for GANs to replicate. Some of those signals, for example the eye and gaze signals that we use, are a little bit easier: if you want to design a GAN, you can actually build a loss function that mimics the eye and gaze features and learn them, so that you can create more realistic deepfakes. But there are other signals that are really not that easy to mimic, for example PPG signals. You may say, well, what is that? It is short for photoplethysmography, I know, big word. Photoplethysmography is detecting heart rate from video, remotely. When you look at me like this, there are color changes on my skin that are due to oxygenated blood moving in my veins, and we actually use that signal to detect deepfakes. And this signal is really hard for a GAN to replicate, because it needs really complex correlations between the many places on your face from which we extract the signals: their spatial, temporal, and spectral correlations should all be preserved. It is very hard for GANs to learn that. [00:13:45][93.7]
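Remote PPG itself can be sketched in a few lines: average the skin pixels of an already-cropped face region over time, keep only plausible heart-rate frequencies, and read off the dominant one. This is a simplified illustration with NumPy, not FakeCatcher's pipeline; real systems track the face and use multiple regions.

```python
import numpy as np

def estimate_heart_rate(face_frames, fps=30.0):
    """face_frames: array (num_frames, H, W, 3) of a pre-cropped face region."""
    # 1. Mean green-channel intensity per frame: the raw rPPG trace.
    trace = face_frames[:, :, :, 1].mean(axis=(1, 2))
    trace = trace - trace.mean()

    # 2. Keep only plausible pulse frequencies (~0.7-4 Hz, i.e. 42-240 bpm)
    #    by zeroing everything else in the Fourier domain.
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    spectrum = np.fft.rfft(trace)
    spectrum[(freqs < 0.7) | (freqs > 4.0)] = 0.0

    # 3. Dominant remaining frequency, converted to beats per minute.
    peak_hz = freqs[np.argmax(np.abs(spectrum))]
    return peak_hz * 60.0

# e.g. estimate_heart_rate(frames) -> roughly 70 for a resting subject
```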
Deep: [00:13:46] Are these signals that a regular human has a lot of intuition about? Like, does this fall into the uncanny valley, where we just know something is strange about this image but we don't know what? Or is this something that flies under the radar of a normal individual human's perception? [00:14:01][15.2]
Dr. Demir: [00:14:01] You may not see it; it is invisible to the eye, but computationally it is visible. If you remember, there was an MIT paper that was changing the color of a baby's face based on its heart rate. If you amplify those signals in the video, you can actually see the heart rate: it's like the skin going from white to red to white with my heart rate, once you amplify those signals. But without that amplification, it is not visible to the eye. [00:14:26][24.3]
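The amplification she is describing can be approximated per pixel by band-passing each pixel's intensity over time around the pulse band and adding it back, scaled up. This is a rough frequency-domain sketch under those assumptions, not the MIT method itself.

```python
import numpy as np

def amplify_color_changes(frames, fps=30.0, low=0.7, high=4.0, gain=50.0):
    """frames: float array (T, H, W, 3) in [0, 1]. Returns frames with the
    subtle heart-rate-band color variation exaggerated by `gain`."""
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    band = (freqs >= low) & (freqs <= high)           # plausible pulse frequencies

    spectrum = np.fft.rfft(frames, axis=0)            # temporal FFT per pixel/channel
    filtered = np.where(band[:, None, None, None], spectrum, 0.0)
    pulse = np.fft.irfft(filtered, n=frames.shape[0], axis=0)

    return np.clip(frames + gain * pulse, 0.0, 1.0)   # add the amplified signal back
```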
Deep: [00:14:28] You are listening to Your AI Injection, brought to you by xyonix.com. That's x-y-o-n-i-x dot com. Check out our website for more content, or if you need help injecting AI into your organization. So if you can leverage some kind of biological signal like that on the detection side, then why can't you leverage that sort of signal on the fake creation side? [00:14:55][27.7]
Dr. Demir: [00:14:56] Yeah, as I just said, the correlations of those signals from many places on your face are really hard to replicate. You need to keep their spectral correlation, you need to keep their temporal correlation, you need to keep their spatial correlation across the face, and the GAN is not able to learn all of that. And even the formulation of the signals is not very intuitive to put into a loss function that we can backpropagate, because extracting the signals is not a linear process. Even if you want to approximate PPG signals instead of extracting them with the actual function, you need very large datasets to learn that approximation from, and there are no such big datasets of actual PPG signals. There are some remote PPG datasets, some small datasets, but they are not really comprehensive enough to represent the PPG signal for whole populations, for all skin tones, for all ages, et cetera. So that's why it's very hard to replicate PPG signals in fakes. [00:16:01][65.2]
Deep: [00:16:02] So how effective are these, you know, deepfake detectors? [00:16:08][5.9]
Dr. Demir: [00:16:11] That's a really good question, and the answer depends on the dataset, and then we can also assess it in the in-the-wild case, right? On different datasets, for example, the PPG-based detector that I mentioned gets anywhere from 93 percent to 99 percent accuracy on academic datasets. And those datasets are relatively large, with many videos and millions of image frames, and we can really get up to 99 percent accuracy. [00:16:42][31.3]
Deep: [00:16:43] Is that accuracy at the video level or at the frame level? [00:16:46][2.4]
Dr. Demir: [00:16:47] At the video level; we actually work on segments of the video. We don't do image-based detection because we can't extract the signal from a single image, right? [00:16:55][8.2]
Deep: [00:16:55] Because you're kind of honing in on this heartbeat. [00:16:58][2.5]
Dr. Demir: [00:16:58] Yes, yes. So we are extracting that signal from many places, [00:17:02][4.0]
Deep: [00:17:03] from all the pixel changes on the face. [00:17:05][1.5]
Dr. Demir: [00:17:06] Yeah. Yeah. And then we are correlating them spatially and spectrally and temporally. So even if there are some image distortions, or some lighting changes, or in some part of the image you don't see the face, or there are very fast movements, you can still see that the correlations hold for enough of the time for a real video. And for fake ones they are all over the place; there is no structure, no periodicity, nothing. [00:17:32][26.2]
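A toy version of that consistency check: extract a trace from several face regions and measure how strongly the regions agree with each other, since a real face shares one underlying pulse while synthetic faces tend not to. The grid of regions and the thresholding idea below are illustrative assumptions, not the actual FakeCatcher features.

```python
import numpy as np

def region_traces(face_frames, grid=3):
    """Split the cropped face into grid x grid regions and return one mean
    green-channel time series per region: shape (grid*grid, T)."""
    t, h, w, _ = face_frames.shape
    traces = []
    for i in range(grid):
        for j in range(grid):
            patch = face_frames[:, i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid, 1]
            traces.append(patch.mean(axis=(1, 2)))
    return np.stack(traces)

def consistency_score(face_frames):
    """Mean pairwise correlation between region traces. Real videos tend to
    score high (one shared pulse); synthetic faces tend to score low."""
    traces = region_traces(face_frames)
    corr = np.corrcoef(traces)                               # (9, 9) correlation matrix
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return off_diag.mean()

# e.g. flag a segment as suspicious when consistency_score(frames) falls below
# a threshold tuned on labeled real/fake data.
```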
Deep: [00:17:33] Let me just see if I can summarize. In the case of a real video of a human face or body, a pixel at one point of the face is going to have this periodicity to it, and that's because it's basically tracking the heartbeat of the human in the face. And the deepfake generators are not really so concerned about that; they're optimizing authenticity in a different way, in just a general sort of perceptual way. And in that case, this is like a byproduct that neither the deepfake generators nor humans really hone in on. That's clever. So what was the original scenario for detecting heartbeats? How did that come about in the first place? I'm just curious. [00:18:12][40.0]
Dr. Demir: [00:18:13] Yeah, I think that was research done at MIT. I don't want to give misinformation, sorry about that, I don't remember the original paper anymore. But the origin of remote PPG signals is monitoring patients through video. So if a patient cannot go to the hospital but needs some care, you can just look at the video and at least see that their heart is beating as normal, or that they have, like, tachycardia, et cetera. Especially when babies are in those glass incubators where you cannot actually go and touch the baby or do measurements on the baby, you can actually look at the video and say, OK, the baby is still breathing, the heart rate is normal, are there any surprises. So the origin was more like patient monitoring. [00:19:03][49.9]
Deep: [00:19:05] So the inevitable question here, which I think most people who are non-technical overlook, is that there's usually an incredible amount of positive uses of a technology. So maybe walk us through some legitimate uses of these GANs and the quote-unquote deepfake capabilities, because I think there are just a lot of them that maybe, on the surface, you don't think about. [00:19:26][21.1]
Dr. Demir: [00:19:27] The first research approaches that enabled deepfakes were for positive uses, such as avatars and digital humans. In AR and VR, now that we are going into the metaverse, we want to be ourselves in that metaverse, right? And how do we do that? We connect with a headset, and obviously it cannot see all of us, so we need to create ourselves in 3D. And how do we do that? Facial reenactment, or 3D deepfakes, right? So avatars and digital humans were one of the first positive cases that this research supported. Another one is visual effects, and basically relieving us from physical dependencies. If we want an actor to be in our movie, but that actor cannot travel, or they have some limitations, they don't want to jump from somewhere, etc., you can actually make deepfakes of them, so that the stunt, or the physically impossible situation, is reanimated from a normal video of those people. And there was a series that we were shooting in the studio: we shot the first two episodes, and by the third episode there was COVID, so no one was able to go anywhere. They were not able to come to the studio so that we could 3D capture them. But for the main person, we had his earlier captures in 3D and 2D, so we just let him record a 2D video of himself at home, and then we applied facial reenactment to take that video and make the 3D capture that we had from previous episodes talk and move like whatever he is performing in the video. So that is a real-world case where we used deepfakes in our studios. [00:21:16][109.2]
Deep: [00:21:16] So, yeah, and I think, you know, everybody's been kind of making fun of this metaverse lately. But me personally, around twenty-five, thirty years ago I kind of got into this stuff and spent some time back in VRML, and everybody thought the 3D internet was going to happen right around the corner. But now it does seem like we're getting a lot closer to some sort of metaverse world, because a good chunk of the workforce in the world is working remote now. Right now we're talking over a Zoom call, we're interacting in a flat 2D way, but there's some pretty wild work coming out of Facebook Reality Labs. So tell me, what's the state of the art there? Have we gotten to the point where we can handle the uncanny valley problem? The reason most of the 3D movies we saw 15, 20 years ago, like Pixar's Toy Story, used cartoon characters was, in large part, that people got creeped out when we tried to make actual humanoid-looking characters. But how close are we getting to being able to take you and me and move this conversation from 2D Zoom into a 3D metaverse world, and have both of us genuinely able to move around in that space, and not have it look cartoonish or just creepy? [00:22:36][80.0]
Dr. Demir: [00:22:37] Yeah. Right. So there are tradeoffs in that world, and all of the approaches are trying to balance them or find a middle ground. The two ends are scale and detail. For the Facebook case, for example, the setup is small enough that the cameras that capture that multi-view imagery and the lighting sources are close enough to the face that you can maybe get pore-level details in your facial capture. And then, of course, avatars and other technologies that come from FRL can actually carry that conversation onto their platforms. In that case you may get a very realistic upper body, but below that we don't see, or the capture is not that good, or the resolution is not that good. The other end is scaling to a whole scene. In that case, I will give you an example again from our studios. There's a demo scene that we shot in the studio at a very early stage where there was a real horse running in the middle, and we could capture even the dust particles flying around the horse in 3D. So that is the realism you get, and it enables so many possibilities. For example, in the real world you may not be able to see the world from the horse's point of view, right? You cannot just put a camera on the horse's eyeball or something. But in a volumetric capture you can actually put your camera anywhere. It's a 3D world; you can watch from anywhere, and you don't even need to project it into 2D. You can let the audience watch from wherever they want. They may watch from their own point of view, they may watch from the person in the scene, or they may watch it from the horse. [00:24:20][102.7]
Deep: [00:24:21] How is that instrumented? Because these people who are in this dome world, are they all instrumented with cameras? [00:24:26][5.2]
Dr. Demir: [00:24:27] No, they are free. [00:24:28][0.8]
Deep: [00:24:29] I see, so does the dome itself have the cameras, and it's able to project into any 3D point? [00:24:34][5.3]
Dr. Demir: [00:24:35] Yes. So the dome has 100 cameras, all of them recording in 8K resolution. So we have 270 gigabytes per second of data coming from all of those cameras. Just imagine how much data we are dealing with. [00:24:50][14.9]
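As a rough sanity check on that figure, assuming roughly 33-megapixel 8K frames, 3 bytes per pixel, and 30 frames per second (none of which are stated in the conversation), the quoted rate is the right order of magnitude.

```python
cameras = 100
pixels_per_frame = 7680 * 4320     # "8K" UHD resolution, ~33.2 megapixels
bytes_per_pixel = 3                # assumed uncompressed 8-bit RGB
fps = 30                           # assumed frame rate

per_camera = pixels_per_frame * bytes_per_pixel * fps   # ~3.0 GB/s per camera
total_gb_per_s = cameras * per_camera / 1e9
print(round(total_gb_per_s))       # ~299 GB/s, same order as the ~270 GB/s quoted
```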
Deep: [00:24:50] Hmm, I see. So yeah, I can imagine the appeal to a director for making a film: A, I don't have to go on set; B, I can have much more fantastical sets; C, I don't have to reshoot from different angles and scenes, I just have them act it out, and then I can do that in post-production, where I'm moving the camera all throughout the scene. That seems pretty compelling. Map it back to the deepfakes. What's the technical linkage to the deepfake world here? [00:25:20][29.5]
Dr. Demir: [00:25:21] The first capture of the person, the actor that I mentioned, was done in the dome. But if we want to really make him do something else or say something else, we can do the 3D deepfake version of it. We can also, as I said, relieve the physical limitations of the world in the dome. I'm not a visual effects person, but typically 3D visual effects are done by compositing, by adding things layer by layer, or by object detection and applying the visual effects to the objects in focus. But when you capture everything in 3D, you actually have the world: you have all the spatial relations, all the empty volume and all the occupied volume, you have everything. So if you want to add an object, if you want to run a small simulation, if you want to add some fire, everything is spatial and very accurate. That is another step towards 3D movies and 3D visual effects, where we can do everything much more realistically because you have the real 3D information. And deepfakes also cover that kind of scene manipulation, the fake objects and information that you add to the video, not just the faces, right? [00:26:34][72.6]
Deep: [00:26:35] Let's jump ahead ten years, so ten years have passed, OK? Tell us, what does the world look like? And let's look at it from a few different angles. Let's first take the news, and then we'll look at maybe this metaverse business. In the news case, we don't have prolific deepfakes consuming the news yet, but we have an alternate reality among a good 30 to 40 percent of our population. People believe all kinds of bizarre, crazy things that are completely unlinked to reality, whether it's the Jan. 6 insurrection and claims about the election, or whether it's anti-vax conspiracy theories. How much worse is this going to get in 10 years, or is it going to get better? [00:27:22][46.9]
Dr. Demir: [00:27:23] OK. To answer that question, we need different shades of glasses. We either have the very dark glasses, where we are looking at that dystopian future, [00:27:31][7.9]
Deep: [00:27:31] For our audience's benefit, I have some orange-tinted glasses, because fall here in Seattle is very dark. And I thought if I can have Instagram filters on my photos, then why not on my reality? [00:27:44][12.8]
Dr. Demir: [00:27:44] That's a nice perspective. So yes, we have the very dark dystopian glasses versus the glasses where everything looks very nice, and I want to discuss both; there will also be some gray in the middle. But let me first look at the dark case. In the dark case, we are looking at provenance as the longer-term solution. Detection is kind of working right now; it may work for a while, then GANs may get better, etc. So detection may be an intermediate solution for flagging or warning everyone, et cetera. [00:28:23][39.0]
Deep: [00:28:32] Just to be clear, in the provenance case: the AP Newswire, I'm a credible organization, I have a cameraman or camerawoman sitting in wherever, Afghanistan, France, whatever. I know that their camera hardware produced this image, and then somehow, in that image or in that video, I know that I can track that provenance all the way back to the organization with credibility. Is that the whole concept? [00:28:57][25.0]
Dr. Demir: [00:28:57] All the way from the capture device to the consumer, you need that verification, which can be solved with secure features, so that there's a check mark or some authenticity marker saying, OK, we know the origin, we know it is not modified from the origin, and the origin was this credible source, or this non-credible source, but this was the origin. You know the provenance of it. [00:29:20][22.6]
Deep: [00:29:20] How do you encode that and how do you not spoof the origin? [00:29:23][2.2]
Dr. Demir: [00:29:23] There are several approaches we can take. If we can gather all the camera manufacturers in the world and give them some kind of hash or watermark that they can embed, which cannot be spoofed, or even with blockchain or something, then they can carry that token all the way through the pipeline. Or it may be done in software, a case where somehow Zoom knows that from the camera to the Zoom software there has not been any modification or interception, so that Zoom can confirm: OK, this is the mark, this is the watermark. And if there's any modification, any fake, any alteration to this video, the watermark will just vanish or something. So there are some approaches for this provenance idea; we are still exploring them, we are still trying to build a consortium around it, and we want all of those companies to be on board. For malicious uses, no one wants deepfakes to be around, right? And for positive use cases, we can also have provenance approaches inside those GANs, so that when GANs create something fake, we know the origin of the GAN, we know who created that GAN, which algorithm created it, or which dataset created it. So for positive use cases, we will still have a lot of authenticity that way; it's not a tool only for malicious uses. If you are, for example, using Adobe software and you are using neural filters, because you want your face to look, I don't know, different, right? Adobe can actually mark that, saying this is a neural filter output; it is not the original, but it is a verified GAN. So those are the provenance approaches we are talking about. [00:31:16][113.2]
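One way to picture the hash/watermark idea: the capture device signs a fingerprint of the exact bytes it produced, and anyone downstream can verify that nothing changed. The shared-secret HMAC below is only a minimal sketch of that flow under assumed key handling; real provenance systems use public-key signatures and signed metadata, and the key and byte strings here are hypothetical.

```python
import hashlib
import hmac

DEVICE_KEY = b"secret-key-provisioned-into-the-camera"   # hypothetical shared secret

def sign_capture(media_bytes: bytes) -> str:
    # Camera side: fingerprint the exact pixels/bytes it produced.
    return hmac.new(DEVICE_KEY, media_bytes, hashlib.sha256).hexdigest()

def verify_capture(media_bytes: bytes, tag: str) -> bool:
    # Consumer side: any edit to the bytes changes the digest, so the tag
    # no longer matches and the provenance claim fails.
    expected = hmac.new(DEVICE_KEY, media_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

original = b"...raw video bytes from the sensor..."
tag = sign_capture(original)
print(verify_capture(original, tag))                  # True
print(verify_capture(original + b" tampered", tag))   # False
```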
Deep: [00:31:17] Yeah. I mean, that makes a ton of sense for kind of rational, geeky people like you and I. It just seems really complex for people who are already susceptible to bizarre conspiracy theories. You and I can understand this, but can somebody who is, like, you know, invading the Capitol on Jan. 6? Are they going to buy into this, or just twist it into some new conspiracy theory? It feels like we're entering a world where negative actors are able to manipulate even the most basic of things and cause effects in the world in a way that you couldn't do 40 or 50 years ago, or even maybe five years ago. [00:31:55][37.7]
Dr. Demir: [00:31:56] Yeah, actually, in some of our publications we formulated that as the social erosion of trust. [00:32:01][5.0]
Deep: [00:32:02] That's probably the perfect term for it. Yes, that's exactly what's happening, right? I mean, I have very bright friends who are like, I don't know what to believe anymore, and I'm trying to explain my highly nuanced way of parsing reality, and it's not a simple thing anymore, because we have such a distortion of economic incentives for, I don't know, distorted reality. [00:32:25][23.1]
Dr. Demir: [00:32:27] One point of view that we haven't talked about is actually legal ways to stop deepfakes. And I am not an expert in that at all; this is just based on my very limited conversations with colleagues who specialize in that area. But we also hope that some legislation, some actual laws that stop deepfakes from being spread, being generated, being trained, et cetera, may slow the spread of the fakes a little bit, even if we cannot stop them totally. [00:33:00][32.9]
Deep: [00:33:00] I mean, you can imagine some just overt penalties for creating a deepfake without disclosing it. It's one thing to do it for entertainment purposes; it's another thing to do it with clear malicious intent. I don't know the state of the law, but I'm just going to guess it's so far behind on basic internet issues that it's probably, you know, virtually non-existent in this case. But I'm sure there's some precedent law that helps out. So we talked about the ten years ahead on the news front. What's the ten-years-ahead vision you have on the metaverse front? [00:33:35][34.9]
Dr. Demir: [00:33:36] I definitely believe that we will be in 3D. We will be having this conversation in 3D, maybe walking around each other, looking wherever we want. I don't mean only looking at you; I can look at whatever I want. But the question is, how much weight will I be carrying on my head? Or will it be glasses like yours, or contact lenses? Will my whole environment be projected there somehow, or is it just my avatar that will be there, with some backgrounds projected around it? So I think the answer is that we'll be doing this in 3D, obviously, but how, I don't know. [00:34:12][36.6]
Deep: [00:34:13] Let me tell you something. In nineteen ninety-five I worked at an R&D lab back in Massachusetts, and every day for a good three or four weeks of my life I would wake up, I would go to work, I would enter a 3D world. And in that 3D world I had spatial audio. I was a disembodied head. I could blink, I had facial gestures, and I would meet with an architect who was architecting a 3D world that we were building. I would meet with a scene designer in the world. We were all disembodied heads walking around, all in 3D. That experience, to this day, is better than the experience I have right now talking to you. And that was, you know, a long time ago. It's mind-boggling to me that we still don't meet that way. So I guess I'm way more skeptical than everybody else. People have been predicting that the 2D internet is going to go to a 3D internet since I was in school, like forever. And on top of that, we thought we were late even at that time, because there were environments like Habitat back in the 70s that were already doing this sort of thing. What's the theory as to why now? [00:35:22][69.1]
Dr. Demir: [00:35:24] There are several fronts to that, but I completely agree. For example, Ramesh Raskar from MIT, who is also my post-doc advisor, has a paper called Office of the Future, I guess from the nineties, maybe even before; I don't remember the exact time. It was saying that in the future our offices will all be in 3D, with holograms, a really futuristic thing. And even at that time, we knew that people were going to end up there. But I think the blockers were, first, the compute power. We now have powerful chips and, you know, Moore's law, right, the transistors, so we now have small enough compute to actually power the headsets. And the other one is adoption. For example, how many big companies are actually trying to make that 3D realm where you can have your own space? I recently heard the news about real estate agents trying to find parcels in the 3D metaverse. You know, this is adoption: all the sectors, all the parts of the economy, trying to find their portion of that 3D realm. That was not the case before, because it wasn't as realistic or as adopted. Now all those big companies are taking a stance on the metaverse. Everyone is like, yeah, this is happening, this is there, I want to be a part of it. And it's not only the researchers and scientific people; the general public is also moving towards adoption of 3D. And the transition to the mobile world was not that easy either, right? We didn't have these very sophisticated mobile phones immediately; we went all the way from actual home phones, to phones without cables, then smaller and smaller, then the touch screens, everything with our faces, et cetera, right? VR is also going there. If you remember the early VR headsets, you needed to be wired in, or the room-scale VR devices that look like a spaceship, et cetera, right? We are following the same transition from those bulky forms to the lighter forms we have now, and hopefully all the way to glasses. [00:37:55][151.6]
Deep: [00:37:57] All right. Well, thanks a ton. I think this has been a fascinating conversation. I know we covered a lot of terrain: we started off heavy in the deepfake arena and then we covered a lot of metaverse and fake news and all kinds of stuff. So it's been a lot of fun. Thanks so much for taking the time and for enlightening us with your perspective. [00:38:16][19.2]
Dr. Demir: [00:38:18] Thank you. Yeah, it was a very fun conversation. Maybe just as a last note: hopefully we will soon have that real-time deepfake detection platform based on the FakeCatcher algorithm that I talked about available online. So whenever it is online, everyone can go and upload whatever video they want and see whether it is fake or not. [00:38:36][18.1]
Deep: [00:38:37] All right. Thanks a ton. That's all for Your AI Injection. Many thanks to Ilke Demir for being such an enlightening guest. You can find out more about Ilke's work at IlkeDemir.Weebly.com. That's IlkeDemir.Weebly.com. As always, thanks so much for tuning in. We're a new podcast, so please tell your friends about us, give us a review, and any other social media love you can. You can also check out our past podcast episodes at podcast.xyonix.com or on your regular podcast platform. That's all for this episode. I'm Deep Dhillon, your host, saying check back soon for your next AI injection. In the meantime, if you need help injecting AI into your business, reach out to us at xyonix.com. That's x-y-o-n-i-x dot com. Whether it's text, audio, video or other business data, we help all kinds of organizations like yours automatically find and operationalize transformative insights. [00:38:37][0.0]