Are human developers becoming obsolete in the age of generative AI?
In this episode of Your AI Injection, Deep Dhillon chats with Matt Van Itallie, founder and CEO of Sema, to uncover how generative AI is reshaping the software industry. Matt explains how Sema’s tools, like their GenAI Code Monitor, are helping organizations responsibly integrate AI-generated code while safeguarding it with human oversight. Together, they tackle some high-value questions: Are developers shifting from creators to curators in the coding process? How do tools like GenAI Code Monitor identify risks like security vulnerabilities & intellectual property (IP) concerns? And what does the rise of AI coding assistants mean for transparency, collaboration, and innovation? Tune in to explore how to lead the charge in redefining modern software development.
Learn more about Matt here: https://www.linkedin.com/in/mvi/
and Sema here: https://www.linkedin.com/company/semasoftware/
[Automated Transcript]
Matt: Gen AI in general, I used to say it was at least as important as open source and probably as big as electricity. I now say it's somewhere between electricity and steam engines in terms of how it will transform the world. I like the spirit of exploration and experimentation that starts with, let's figure out what we do with it.
So let's actually use it. As an example, 97, 98 percent of developers are using some form of AI, at least on the side. What will come with it will be transparency requirements and risk management, risk reduction requirements.
And it is good that it works that way, at least in that system. People shouldn't be experimenting with gen AI on the nuclear arsenal. But for commercial-grade applications? I'm delighted.
Deep: Hello, I'm Deep Dhillon, your host, and today on Your AI Injection, we're joined by Matt Van Itallie, founder and CEO of Sema.
With a career dedicated to developing code assessment tools and pioneering AI generated code management, Matt's created some impactful tools like the GenAI Code Monitor. Matt comes to us with a bachelor's degree from Swarthmore College and a JD from Harvard Law. Matt, thanks so much for coming on the show.
I'm looking forward to hearing how some of your innovations have helped shape development practices for software developers, and maybe diligence applications and potential conflicts with software licenses. Thanks so much for coming on the show. Maybe start us off by telling us what inspired the creation of Sema and what major gap it fills in the software development life cycle?
Matt: Absolutely. And first, Deep, thank you so much for having me on. I'm really looking forward to the conversation. Sema got started more than seven years ago with the goal of making software engineering broadly understandable to non-tech audiences, while also being precise and accurate for technical audiences, bridging the gap between tech and non-tech.
As we say, the original use case was mine, sitting in executive team meetings and wishing I had a clearer understanding of what the engineering teams were up to and what strategic investments were necessary to help make the product as strong as it could be.
I kept looking at the, you know, we used Salesforce at the organizations I worked with, any CRM. It was so clear that anybody, whether or not they were a salesperson, could understand the broad direction of the state of sales and the health of the sales team, and I thought that code should have that too.
It's a hard problem, because there are so many different flavors that make it a lot harder to understand code versus sales, but a noble one, because if you get it right and make code understandable to non-technical audiences, you will unleash so much more innovation and so much more organizational success.
Deep: Huh. I mean, that's an interesting concept. Developers certainly understand the sort of limits of talking about the code. At some point you're just like, I don't want to have this conversation anymore. I just want to go look at the code and see what's going on.
And I think it's interesting to project that up and out of the team itself. I imagine it's quite challenging though, because a lot of times there are some really geeky, nerdy things that one has to do as a software engineer: refactoring, or, you know, you're in the nuggets of an algorithm or something.
And I've always seen that it's more like social resistance to trying to understand what's in the code than it is an inability to understand what's in the code by non-technical audiences. Are you seeing something similar to that?
Matt: Yeah, it's a great question. Before I did the work that I do now, I was a school district administrator looking at data to try to improve student learning and teaching.
And one of my favorite quotes from the education days comes from Kierkegaard, a philosopher in the 1800s: if you want to teach someone, you must first learn what they know and what they don't know. And so my broader takeaway from that, which I try pretty hard to live and which Sema's product strategy certainly tries to live, is that we should be bending over backwards to make something understandable to the audience, really go out of our way to put things in terms that the audience can understand. If you do that, you make it more likely that they get what they need out of the conversation and you get what you need out of the conversation.
A tiny example from one of our products. We have two products. The first one is a code scan, kind of like a home inspection: it looks at the roof, plumbing, and electricity all at once, typically around the buying and selling of a house. Our code scan product looks at different areas like code quality, security, cyber, and gen AI, which we'll talk about. So it's comprehensive, and it frequently happens during technical due diligence or prepping for diligence.
Deep: Like dependent libraries and the licenses that they...
Matt: Open source license risk, open source security risk, open source version risk, all of those factors.
Exactly. So let's just take security warnings for a second. When we look at security warnings, one of the documents we produce is a summary. In that summary, we give a color coding: green, yellow, orange, red. Everybody who's ever seen a traffic light knows that green is better and red is more risky.
And what we do is we take each code base and compare it to a relevant comparison set, such as software organizations with a similar amount of code, similar number of developers, similar age of the code base, and we just count the security warnings. We don't talk about any specific ones.
We're just doing a count, and if that code base is green, that means relative to the other cohorts, they're in the top 25%; yellow, second quartile; orange, third; red, fourth. From what we've learned from our customers, that's usually enough to start the conversation.
Because if you're going to a board of directors meeting, or talking with a non-technical CEO, they don't need to know how many cross-site scripting warnings there are. They need to know, roughly speaking, how are we doing on security? And by starting with that kind of empathy, we've found a very positive reaction, overcoming some of the fear and the hesitation for non-technical audiences to understand it.
And then of course they can ask follow up questions or they know that they can ask someone who can really understand the detail to get to the bottom of it. I see that we're red on the security list. Please come back and tell me, CTO for example, what's it going to cost to get to yellow?
What's it going to cost to get to green? So the CEO can keep it in terms she understands, of finance, trade-offs, budgets, et cetera, while still enabling the technologists to have the deeper conversations.
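To make the traffic-light bucketing concrete, here is a minimal sketch of how a count-versus-cohort comparison like the one Matt describes could be computed. The cohort data, quartile cut-offs, and function name are illustrative assumptions, not Sema's actual implementation.

```python
# Illustrative sketch: bucket a code base's security-warning count into a
# traffic-light color by comparing it against a cohort of similar code bases.
# The cohort numbers below are made up for demonstration.

def warning_color(own_count: int, cohort_counts: list[int]) -> str:
    """Return green/yellow/orange/red based on which quartile of the cohort
    this code base falls into (fewer warnings is better)."""
    ranked = sorted(cohort_counts + [own_count])
    position = ranked.index(own_count) / len(ranked)  # 0.0 = fewest warnings
    if position < 0.25:
        return "green"   # top quartile
    if position < 0.50:
        return "yellow"
    if position < 0.75:
        return "orange"
    return "red"         # bottom quartile

# Example: a code base with 75 warnings vs. a cohort that typically sees 300-800
cohort = [310, 420, 515, 640, 790, 350, 700, 460]
print(warning_color(75, cohort))  # -> "green"
```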
Deep: Interesting. So are you finding that your core users are non-technical, or do the development groups also embrace the tool? Who's the primary user? Is it the non-technical user that can come in and ask questions about what they're seeing? And are they driving that interaction?
Matt: The four primary users of the Sema code scans product are boards of directors, CEOs, heads of mergers and acquisitions or corporate development, and CTOs. CTOs can be explaining the results, if it's a scan performed on their code base, or they might be understanding the results of someone else's. It is that C-suite and board discussion about the health of the code base. It's an interaction between those four groups and a fifth.
Most of Sema's work happens through consultancies, so technologist advisors. That's a critical fifth group who uses these results. But notice who's not on that list: not in the primary set are developers and development managers. They sometimes look at the detail, but the primary audiences for the core report are those five, you know, C-suite-type roles.
Deep: So basically people doing diligence.
Matt: People doing diligence, reading board decks, understanding what it's going to take for the company to make it, for the software organization to take it to the next level; executives thinking about the health of the business. More so, for this product, than coders doing the software engineering themselves.
Deep: One of the things with a lot of these tools, coming at it from the development tool base, which can be even more kind of geeky and nerdy, is that they spew a lot of light and obvious things that don't really matter in the larger context. But usually buried underneath a lot of this straightforward, let's call it, stuff are the actual issues that are meaty, that can take down servers and cause real problems. And it requires a fair amount of contextualization to, like, understand what it is you're actually looking at.
Do you find that performing that contextualization task for non-technical audiences is a challenge? And what kinds of stuff do you do to help them understand the practical ramifications of what they're looking at?
Matt: I definitely accept the premise of this question. I wrote about this a few years ago.
I think it's probably the best thing I've written, or at least the most important: code is a craft. It's not a competition. Everyone can win. It benefits from people working at it and improving what they do. It is innately satisfying, not that it doesn't also come with external benefits, but doing it the right way is itself satisfying, and it's not about winners and losers.
And so much of helping software development work right, in enterprises that are competitive and do have numbers to hit, is respecting the craft of coding while also being realistic. In my view, it's absolutely incredible that millions of people get to carry out this really wonderful job of building and ideating and testing. Being a software engineer in the right place can be a magical job.
In exchange, we have to make sure it's helping achieve business goals or organizational goals. And so that's something we're really excited about. We think the right kinds of communication can go a long way. To be slightly more specific, we try pretty hard in our reports, so this is the Code Scans product, for example, but also in our dashboards, to provide that contextualization from the beginning.
Maybe I'll give a tiny example. We have a one-to-a-hundred score about code base health. That is provocative and opinionated. We did a lot of analysis, probably five years of analysis, to collect enough data before we could create a scoring system like that.
Because it's not hard mathematically to start assigning rankings. What's hard is to do it with judgment and integrity. And just to give two tiny examples of how that methodology works, which is public, I'm happy to tell anybody what it is, it's not a secret: of that hundred-point scale, and this is non-functional requirements, or what we call code base health, not what the product does and how well it does it, but things like security and quality and open source risk, etc.
Of those 100 points, approximately one point, it is five divided by four, so one and a quarter points, is how many line-level warnings the code base has. For a technical audience listening to this, that's linter results. For folks who are not technical, that's like Grammarly findings, summing up all the Grammarly findings.
We still do that segmentation of, are you in the top quartile or the bottom quartile or wherever of how many of those line-level warnings you have, but even if you were extremely off the charts in either direction, it's really not going to move the needle on the health of the code base and the ability to maintain and expand it.
By contrast, we have one individual metric that's worth 25 out of the 100 points. Deep, do you like improv? Do you want to take a guess at what we think is the most important metric about code base health? And I can give you a clue if you want it, or no clue.
Deep: is it just the code base you're looking at, or are you looking at the GitHub repo interactions too?
Matt: Great question. So we, it is not just the lines of code. That's why we call it the code base. So this is something that's not specific to the lines of code. That's the clue.
Deep: I imagine, maybe like the, the rate of reviews on code or something.
Matt: That is amazing. I'm a passionate fan of code reviews.
And so I would definitely agree that that is very important. The one that is most important for us, 25 out of 100 points, is key developer retention. Which is to say, product group by product group, what percentage of the developers who wrote a meaningful portion of that code are still at the organization?
Deep: Oh, not necessarily on the team.
Matt: Not necessarily on the team. So if you wrote the most code on product A...
Deep: Oh, that's funny.
Yes, because the core developer writes the thing, disappears, and then other folks don't want to get their hands dirty and, like, dance around the code, around the periphery.
And then they find, you know, a way to interact with the barnacles and make them bigger. Right. It just grows into a...
Matt: Exactly.
Deep: That makes sense.
Matt: But it makes sense, right? And the cute way of saying it is that having the code but not having the coders who understand it
is like having a half-written novel without the novelist, because code doesn't end. Of course, you have to upgrade it, you have to patch it, you have to be responsive to versions changing, etc., etc. So the code itself, without folks who have that deep subject matter expertise, is incredibly risky. And by contrast, having coders who know the code makes a huge difference.
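As a rough illustration of how a key-developer-retention metric like this could be computed from commit authorship, here is a minimal sketch. The 10% "meaningful portion" cut-off, the sample data, and the function name are assumptions for the example, not Sema's methodology.

```python
# Illustrative sketch: per product, what share of the key developers (those who
# wrote a meaningful portion of the code) are still at the organization?
# The 10% threshold and the sample data are invented for this example.
from collections import Counter

def key_developer_retention(commits_by_author: Counter, current_staff: set,
                            meaningful_share: float = 0.10) -> float:
    total = sum(commits_by_author.values())
    key_devs = [a for a, n in commits_by_author.items() if n / total >= meaningful_share]
    if not key_devs:
        return 1.0
    retained = [a for a in key_devs if a in current_staff]
    return len(retained) / len(key_devs)

product_a = Counter({"alice": 500, "bob": 300, "carol": 150, "dan": 50})
still_here = {"bob", "dan"}

retention = key_developer_retention(product_a, still_here)
print(f"key developer retention: {retention:.0%}")        # alice and carol left -> 33%
print(f"points contributed: {25 * retention:.1f} / 25")   # if weighted 25 of 100 points
```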
Deep: So what did you do to make it, like, how did you determine that that should be 25 percent of your score?
Matt: Yeah, we did this for five years, and we had conversation after conversation. Three parts to my response to your question, which I'm taking at length. One is, I totally agree: with code, the context really matters, really, really matters, in a way that's true for code arguably more than any other modern corporate function, certainly one of the most. Second, we did a lot of research and had a lot of data to apply context automatically, out of the box so to speak, things like saying developer retention counts 25 times more than line-level warnings.
The third is, even with our report, you still need that contextualization. You still need the conversations to really get at the ground truth. Quick example: we were looking at a code base where, in the summer of 2020, the development activity decreased by about 20%. And it was a big enough company.
I mean, if you have two developers and it declines 2%, you know, someone could go on vacation or whatever, right? There are lots of possible reasons. But at a company this big, 20% was meaningful, and it wasn't just random variation. And when we asked, well, what was actually causing it, it looks like there was a 20 percent decline, is that right?
There could be a false positive, blah, blah, blah. And the answer was, there was a 20 percent decline in development activity. They said, in summer 2020 the company was facing some pretty significant economic risk because of COVID, and they furloughed the engineering team, or everyone at the company, one day a week.
People just didn't work on Fridays for the six months leading up to it. It's usually not that pretty, but in all cases, as hard as we try to get the context automatically applied, it's still definitely necessary for an expert to understand the code and go deeper. And that's why, even though Sema is a software company, we work with advisory partners, because it's their job and expertise to get that context, to hear the real story, and understand it.
Deep: So let's talk a little bit about the AI role here. Before we dig into what I'm assuming is the use of LLMs to kind of assess code quality, can you walk us through the objective criterion for saying that X block of code deserves a score of a hundred versus a score of one, or 78?
And like, what's the difference between your scoring? So like, is this heuristics that you have evolved or is there something directly tied to maybe productivity on a code base over time? Or like, how, how is it that you're defining the objective truth of the score?
Matt: Yeah. And I don't know that I would say objective.
I'd say opinionated based on evidence, and transparent, so folks can see our methods, and if they think they would like to weight it differently, they are able to unpack the results, unpack the components, and apply their own weighting. It's a rubric, yeah. We worked pretty hard to get it right, but there are so many experts in software development and in understanding and assessing software engineering.
We welcome folks bringing their own point of view. So there's certainly not just one way; it just happens to be our way, and it seems to work pretty well. The 100 score is at the level of an organization, which could be one or more products, all of the software engineers involved, and it encompasses now eight different areas.
One of them is open source legal risk. A second is code security. A third is code quality. Like any proud parent, I would be happy to name all eight. But, you know, that gives you the idea.
Deep: If you can go into those, I think knowing the buckets that you're looking at is helpful.
And then understanding their temporal relationships is also helpful.
Matt: Give me an example, explain the temporal relationship.
Deep: Like, are you going back 10 years in time in a GitHub repo and looking at the code's evolution, as a function of how its security score changes over time, something like that?
Matt: Perfect. So the eight are: code quality, various forms of technical debt; that time series is only the present. Second is development process, that is, the development activity on the code and how it's evolved. We do time series of all time, but the particularly interesting time period is the last six months.
So we might see how it's changed, you know, over years or decades, but really, with respect to risk and consistency and the future, it's heavily weighted towards the last six months.
Deep: So some of those variables would be like frequency of check-ins, lines per check-in, number of developers...
Matt: The degree to which comments are being added and how robust those comments are. The degree to which testing is being added. We call that a process metric, whereas the absolute amount of testing is part of code quality; adding testing is part of a process metric.
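To make the "process metric" idea concrete, here is a small sketch that tallies, over roughly the last six months of commit history, how much commenting and testing is being added. The commit record structure, the fixed reference date, and the field names are invented for illustration and are not Sema's data model.

```python
# Illustrative sketch: a "process" metric looks at recent development activity,
# e.g. whether comments and tests are being added, weighted toward the last
# six months. The commit records below are invented for the example.
from datetime import date, timedelta

commits = [
    {"date": date(2024, 11, 2), "added_comment_lines": 14, "added_test_lines": 40},
    {"date": date(2024, 8, 15), "added_comment_lines": 2,  "added_test_lines": 0},
    {"date": date(2023, 5, 1),  "added_comment_lines": 30, "added_test_lines": 120},
]

# A fixed "as of" date keeps the example deterministic; a real tool would use today.
as_of = date(2024, 12, 1)
window_start = as_of - timedelta(days=182)
recent = [c for c in commits if c["date"] >= window_start]

comment_lines = sum(c["added_comment_lines"] for c in recent)
test_lines = sum(c["added_test_lines"] for c in recent)
print(f"last 6 months: {comment_lines} comment lines and {test_lines} test lines "
      f"added across {len(recent)} commits")
```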
Deep: And what about, like, are you looking at stuff like generative code insertions?
So somebody is using GPT or Claude or Copilot or whatever, and they're taking suggestions and inserting, you know, code from that, versus what you can guess to be handwritten code.
Matt: Yeah. So that is one of the eight modules, and it really is our second product. For the first six or so years, we only did the other modules. One of them is open source risk. As your listeners know, of course, open source is code the team didn't write; it came from the community. It's a good idea: it avoids rework, et cetera, and helps with developer satisfaction. But it comes with legal risk. It comes with security risk.
It comes with maintainability risk. All those things are also true about gen AI code, whether it comes from a code-specific gen AI tool or a general one. It's a good idea for team productivity. There's literally a new study on GitHub that just came out today, not by GitHub, that's really quite positive.
It's good for developers if used correctly, with the right scaffolding, and, you know, if it fits into their workflow. But it also comes with security risk, maintainability risk, IP risks, to a certain extent all of those. And so actually some of our customers said: if you detect open source composition and analyze its risks, please do the same for Gen AI code.
And so that became both a module of the report and also a standalone product. So our contribution and thinking about Gen AI is detecting and helping guide the use of Gen AI code, which we do in part through a Gen AI model. So Deep, you're exactly right: that is now one of the eight modules that we use.
Deep: I had an experience, maybe this is a while ago, but I was selling my company, and at one point I was sitting in a room with seven lawyers. You know, as a non-lawyer, it was a horrifically boring conversation to be in, but I was the only one who actually understood how the system worked.
And there were seven lawyers, and they were all arguing for what I think was almost two and a half hours on whether any software at all could be used using the Sun Java language. At some point I totally tuned out, and I was just doing the math in my head. I was like, okay, at that time this is, you know, four to six hundred bucks an hour per lawyer, times seven.
And I'm just calculating the loss in the sale of the company that's going to these people who are arguing about something that I can only assume is the most idiotic discussion I'd ever heard in my life. So I finally spoke up and I said, look, I don't really care about the legal nuance of this argument right now.
All I can tell you is there are 22 million developers using Java right now. So other lawyers who understand this in a lot more depth have already blessed it. So I think it's time to move on. And then they basically just sort of stopped and said, okay, yeah, let's move on. I bring this up because it seems like if you ask a bunch of lawyers to look at the risk of something like Gen AI-introduced code, they're inevitably, depending on the evolution of the new technology, going to take a very conservative stance in the beginning phases, a slightly less conservative stance 25 years later, but in general, it's going to be an extremely conservative stance, because they're lawyers.
But there's a risk on the other side of saying, hey, we're going to be the only dev shop that doesn't use Gen AI. Again, I keep going back to this contextualizing of the findings, but to me this is the challenge with what you're trying to do here: getting non-technical audiences to understand what this stuff means.
Stopping teams from using generative AI is something that many, many large companies have decided to do, and they will only kind of get dragged in over time as they start acquiring startups that ignore all of that advice and do it anyway. And the milieu of legal actions that will erupt will eventually settle into something like, I don't know, just do it, like, whatever.
That's where it's gonna wind up, because it's, A, impossible to really detect whether or not something is Gen AI-written. If you're doing it successfully today, I'd argue it's only because developers haven't figured out you're doing it and that it's getting in their way; but the second they do, there will be a long army of tools to obfuscate it, so your models will fail again.
So what say you to that critique? At the end of the day, this is actually not possible, is my argument.
Matt: That detection is not possible.
Deep: That plagiarism detection that's accurate is not really possible.
Matt: Do you think open source code detection is possible?
Deep: I think it's possible to detect when somebody puts a library in or somebody attributes the code effectively.
But I think it's not possible to detect when somebody jumps into an open source Apache project, for example, or GNU, let's take a GPL project, for example, cuts a bunch of code, pastes it into their IDE, relabels a bunch of variables, starts manipulating it, like exerting some editorial control, and now shoves it into their code base.
So, like, somewhere there exists a spectrum. On one end, it's like, yeah, proper attribution. On the other end of the spectrum, it's like, I expended effort to obfuscate the fact that I stole this code. And I think the same kind of spectrum exists in the Gen AI world. Hey, I've watermarked my code generation, and I've got, like, the statistical distributions of, you know, variable naming techniques and all that kind of stuff. You can catch that. But can you catch the other end of the spectrum? No. That's why I think it's so important to, like, contextualize this stuff.
Matt: Love it. Many, many, many points to react to on what you've said. I'll start at the biggest picture and then get into detail. First, we detect gen AI primarily to increase its usage, and to increase its responsible usage. In almost every situation where we've looked at code bases, code owners want to use gen AI.
They just want to make sure it's used correctly. And in almost all of those cases, that means they need to be increasing the amount of usage, not trying to prevent it. There are certainly some number of organizations who are worried enough about the risks involved that they're not quite ready to adopt it.
But those are few and far between. The vast number of professional developers are using Gen AI coding tools at work, and we are incredibly supportive of that. Tools like ours are there, frankly, to increase usage, assuming, you know, that's the organizational policy. So it's helping give guidance on how much more could be used and in what parts of the code.
And then also to make sure that Gen AI code is used safely. I can go through the detail, but the bottom line of using Gen AI code safely is making sure that a human is reviewing the code and putting their stamp on it. And almost always that means the code coming out of the LLM is going to get modified, or as we say, blended, by a coder.
At the extreme other end, if you were looking at a code base that was 100 percent pure Gen AI, i.e. the only thing that happened was someone prompted an LLM, it spit code out, and they didn't make any changes to it, you should be very concerned that there might be security issues, there might be quality issues, there might be IP risk issues, etc.
So for us, and for our users, GenAI code transparency is generally about detecting GenAI code, number one, and then, number two, making sure that it is blended GenAI code rather than pure.
Does that make sense?
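A minimal sketch of how the pure-versus-blended distinction could be expressed at the file level, assuming per-line provenance (which lines came from an AI suggestion and whether they were later edited) is already known. The labels, the 5% modification cut-off, and the function name are illustrative assumptions, not Sema's definitions.

```python
# Illustrative sketch: label a file as "pure GenAI", "blended GenAI", or
# "human-written" from per-line provenance counts. The 5% edit threshold
# is an invented cut-off for demonstration purposes.

def classify_genai_usage(total_lines: int, ai_origin_lines: int,
                         ai_lines_later_edited: int) -> tuple[str, float]:
    ai_share = ai_origin_lines / total_lines if total_lines else 0.0
    if ai_origin_lines == 0:
        return "human-written", ai_share
    edited_share = ai_lines_later_edited / ai_origin_lines
    label = "pure GenAI" if edited_share < 0.05 else "blended GenAI"
    return label, ai_share

label, share = classify_genai_usage(total_lines=200, ai_origin_lines=120,
                                     ai_lines_later_edited=35)
print(f"{label}: GenAI-originated code is {share:.0%} of the file")
# -> blended GenAI: GenAI-originated code is 60% of the file
```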
Deep: Yeah, yeah, that makes sense. So tell me a little bit about, like, what's the general state of copyright concern?
Because what I'm reading from you is that that's a fairly liberal take on Gen AI-generated code. So are the copyright concerns about whatever OpenAI or Anthropic or whoever did to, like, slurp up code just sort of accepted, that it's okay to use this stuff? Because the parallel in the natural language generation world is that it's not okay. High school teachers certainly will be bothered at the idea that their kids just said, write me an essay about X, and then maybe a bit further, upset at, hey, write me a paragraph on X. What's the reality of the copyright concerns around generated code?
Matt: Yeah, I'm going to give a little bit of structure before I answer that specific question of the possible risks of Gen AI as opposed to the benefits of increasing productivity.
One risk is that it comes with security vulnerabilities. Another is that there are code quality risks, such as lacking context, slash, not being understandable, or such as making up packages. And then there's intellectual property risk. Within intellectual property risk, there's copyright or training set data infringement.
Let's just call it training set data infringement, which could include copyright infringement. There's receiving copyright on the code, for companies that pursue copyright protection of software they've written; there's patent protection on the code they've written; there's trade secret protection; and then there is something that we call deal risk, a variety of intellectual property risk.
With me so far?
Deep: Yeah, I am. I'm thinking of a very specific scenario, right? Like, I'm not using a coding tool. I'm just going to ChatGPT or Claude or something, or Gemini...
Matt: A general LLM.
Deep: I'm going to a general LLM, and I have no idea what they trained on, because they're not public about it. So I don't know whether or not, for example, ChatGPT trained on GPL code.
And when it generates code, I have no ability to prove that it did not use GPL code in its understanding and its generation. Am I potentially infringing?
Matt: Okay. So we'll call that training set data infringement risk. We assess it as very unlikely that there is actual copyright infringement of the training set data, because the LLMs weren't trained on copyrighted code.
They were trained, some of them, on open source code. And so the infringement risk is less about copyright; we assess the risk from copyright infringement to be extremely low. But now let's talk about license infringement. And the specific way that this could matter is just as you said it, exactly right:
an LLM used open source code for its training set, and some of that open source code came with a restrictive license, such as a copyleft GPL or AGPL license, and that code makes it into your code base. So that's the topic. Couple of caveats. What we're about to say is not legal advice.
And if you're at a company large enough to be worried about this, then you absolutely should be bringing in your own lawyers and have them opine, because they're the ones that are going to need to back you up. That's part one. Part two: an extremely important thing to do, no questions asked, is to only use LLMs from specific companies, with a tier that provides indemnification against this risk. So several organizations have already thought about this and are providing indemnification clauses if you buy the right license tier that, to the best of our knowledge, have not been tested, but nonetheless are aimed at this scenario: a training set creator coming after an LLM user.
Now, if you're at an LLM company, of course, training set creators are coming after you. That is a known and real risk there; these companies are being sued as we speak. When it comes to LLM users, using a tier of an LLM with the right kinds of indemnifications and protections, and by the way, with some of those tools you can choose not to be trained on certain open source libraries that are riskier, and we would certainly recommend that as well, we assess the open source infringement risk to be relatively low. But again, we would most certainly want you to check with your lawyers to make sure that, you know, the company as a whole feels comfortable with its risk exposure.
Deep: So that part of it, I just want to translate a little bit. If I'm using Microsoft Copilot, for example, Microsoft puts an indemnification clause like this in their license?
Matt: At least at the Enterprise tier. If you're paying some bucks, you get the indemnification. Yeah, not the free version, but the Enterprise version.
Deep: And therefore, if somebody sues me or if I'm going under diligence, I can just say, hey, all this code was written using Microsoft Copilot or whatever, at least to my knowledge. Now, on the other end of the spectrum, if my developers are just jumping into ChatGPT and, like, cutting and pasting stuff out,
I don't know, do they have an indemnification clause like this? I'm guessing not.
Matt: They do as well, at the right tier. But absolutely, to anybody out there who's saying we shouldn't let our developers use it, I'd say, well, they are using it. So please give them the right tool with the right license.
So you're protecting yourself, not only from training set infringement, but also trade secret,
Deep: That's a really intriguing point, because that was what I was going to get at. If developers are going to use this stuff no matter what, they're going to take the path of least resistance, and they're going to do it.
So I don't think this message is out there very loudly spoken, the one that you're making here right now, which is: developers, unless they're living under a rock, are going to use these tools, so you're best off giving them tools that contain indemnification. Otherwise, you're assuming that liability risk.
Matt: For indemnification and also for trade secret, which basically means you get to protect your IP as long as you don't give it away and put it out on the street. Just like you shouldn't upload your code to Stack Overflow or publish private code in an open source repo, so too you should not use an entry-level license on an LLM and load your code into it, because then it becomes part of their training set.
And so with the more professional licenses of the major providers, the license comes with an agreement not to train on your data, as well as indemnification. I mean, we're advocates of Gen AI for coding, primarily because it helps developers and therefore helps organizations achieve greater things.
But my goodness, if you're on the fence, know that your developers are going to be finding a way anyway. Like, imagine trying to prevent open source usage. It would be impossible. It would be crazy, because developers are good at finding solutions. They're going to...
Deep: Tunnel out if they have to; they're going to tunnel out from home.
And I would go a step further: I would say that even if you do give them the tools, they're a picky lot; they use the tools they want to use. They'll tell you they're using your tools, but they're not necessarily going to. And they're still going to jump into, you know, OpenAI or whatever they want and generate their code and stick it in your code base anyway.
So I'm assuming there's some risk exposure that's reflected in your scoring, but maybe based on this kind of stuff too.
Matt: Yes, I second what you said. Worst is, quote, not letting them use any tools at all. That's not good. Forcing them to use one is certainly better than not letting them use any.
If feasible, let the developers pick the suite of tools, you know, within guidelines: it has indemnification and there's no retraining on your data. Let them pick. And ideally, if an organization has the budget, let them use more than one, because the technology is evolving and because individual developers have different preferences.
There's anecdotal evidence that certain of these tools are better for certain instances. For most organizations, the monthly cost of a license for a coder to use a Gen AI tool, whether it's specific to coding or not, is many, many orders of magnitude less than the cost of that developer. So if they think that one or two or three tools combined will help them do their job better, almost always the ROI pays for itself.
Sema has a free ROI tool if you want some help doing that math. But buying several licenses at 20 to 40 dollars per month per developer, given what many developers make, and making some basic conservative assumptions about the impact on their productivity, job satisfaction, retention, all of the above, there's usually a 10X return, a 1,000 percent ROI.
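Matt's back-of-envelope ROI math can be sketched as follows. Every number here, the license price, the fully loaded developer cost, and the productivity uplift, is a placeholder assumption for illustration, not a figure from Sema's tool.

```python
# Illustrative back-of-envelope ROI math with made-up numbers: compare the cost
# of a few GenAI tool licenses against even a modest productivity uplift.

monthly_license_cost = 3 * 30        # three tools at an assumed ~$30/month each
fully_loaded_dev_cost = 15_000       # assumed monthly fully loaded cost of a developer
assumed_productivity_uplift = 0.03   # a deliberately conservative 3% improvement

monthly_benefit = fully_loaded_dev_cost * assumed_productivity_uplift
roi = (monthly_benefit - monthly_license_cost) / monthly_license_cost

print(f"licenses: ${monthly_license_cost}/mo, benefit: ${monthly_benefit:.0f}/mo, ROI: {roi:.0%}")
# Even with these conservative placeholders the return is several hundred percent;
# more generous uplift assumptions get to the 10X figure mentioned above.
```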
Deep: Oh, interesting. Okay, let's maybe change directions a little bit. I want to spend a little bit of time, I'm going to sort of guess how your system works, and I want you to tell me where I'm wrong. I haven't actually spent much time with it, but I'm guessing that you point it at your GitHub repo or some other repo, and the system has privileges to go in and look at not just the current source but the historical rendering of the source.
Then you churn away, maybe on a per-release basis, maybe on a per-day basis. You probably use a lot of available tools that have been around for a bit that will grab the code and look at warning levels at the line level, like you mentioned.
Stuff like linters, Javadoc, all that kind of stuff. Then you probably take all of that information, plus the actual code, plus maybe something else that I'd have to think about, and then you go to the LLM and you ask for scoring with an explanation of why it's scoring that way.
Maybe you have fine-tuned models there to, like, fine-tune the scores based on a ground truth, if you will, where you sort of manually went through and rated different blocks and chunks of code, and then you wind up with your scores. Something like that. Am I off on Mars, or am I in the ballpark?
Matt: Very good. Let's just take the code scans product, producing an end-result analysis, an end-result report about a code base. You're absolutely right: a big chunk of it is mechanical or deterministic evaluations of the code, for things like line-level warnings or security warnings, et cetera.
Broadly speaking, wherever we can be mechanical, we prefer to be; we don't use AI for its own sake. One, because we don't use any tool for its own sake, it's all about outcomes. But also, a big portion of what we do in code assessments is for incredibly high-stakes situations, and it is way easier for everybody involved, including folks who've put the blood, sweat, and tears of their careers into getting this code base to a good enough spot to sell it.
It's so much easier to have a conversation about deterministic, mechanistic output: we counted 75 security warnings; companies of your size and stage typically have 300 to 800; so you're doing better than average, right? Or vice versa. So the rest of the product is primarily deterministic. For AI detection in these reports, though, we do use a probabilistic AI approach. We have our own tuned deep learning model that was indeed trained on definitely-Gen-AI code and definitely-not-Gen-AI code. How did we do that? Definitely-not-Gen-AI code: we used open source code that came before Gen AI coding tools were out in the wild. Definitely-Gen-AI code: how do we know that? We synthesized it. We gave an LLM instructions and it spit code out. The model then produces a prediction score of whether or not something is Gen AI, and we carefully manage the cut point to manage false positives and false negatives.
And then there's just been a ton of real-world experience. Code is so varied that you need lots and lots and lots of at-bats to make any interesting comment about code, because of the number of languages and the number of uses, et cetera. So the work after the original tuning of the model has been hundreds of tweaks based on new data, based on false positives and false negatives, constantly tuning the model itself and then optimizing the surrounding tooling to get the right balance, and as low as possible false positives and false negatives.
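A simplified sketch of that cut-point tuning: given a detector's prediction scores on labeled samples, sweep candidate thresholds and look at the false-positive versus false-negative trade-off. The scores and labels below are synthetic placeholders, not output from Sema's model.

```python
# Illustrative sketch of cut-point tuning for a GenAI-code detector: sweep
# candidate thresholds over labeled prediction scores and report the
# false-positive / false-negative trade-off. The data here is synthetic.

# (score, is_genai) pairs, e.g. from a held-out labeled evaluation set
samples = [(0.92, True), (0.81, True), (0.67, True), (0.55, False),
           (0.40, False), (0.33, True), (0.21, False), (0.08, False)]

for threshold in (0.3, 0.5, 0.7):
    false_pos = sum(1 for score, is_genai in samples if score >= threshold and not is_genai)
    false_neg = sum(1 for score, is_genai in samples if score < threshold and is_genai)
    print(f"threshold {threshold:.1f}: {false_pos} false positives, {false_neg} false negatives")
```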
Deep: I think that gets me in the headspace of what you're up to, and I think I can kind of wrap my head around it. So we've talked about what you do, we've talked about how you do it. Let's talk about the should. This seems pretty straightforward:
you've got a code base, you want to go in and assess whether you've got dangerous code. But let's get imaginative here, and a lot of times it might not have anything to do with Sema, it might be up a level or two. What are some of the ethical scenarios that might come up here?
Maybe we even have to go out 5 or 10 years. But what are the ethical scenarios here, the ethical questions that maybe you struggle with?
Matt: Sure. I should say that I've been incredibly lucky, from where I grew up and the education I've had and the parents I've had and the career opportunities, and that I try pretty hard to only work on things that are either morally neutral or morally positive. Doesn't mean I always get it right, but we certainly try.
I try pretty hard to have made those choices, and building this business with that in mind, the number one ethical issue about Gen AI code is making sure that humans are still involved. And we come back to this notion of pure versus blended. I do not foresee a time where human supervision of what that code is up to is no longer necessary, certainly not for high-stakes
software applications. And so for us, Gen AI code transparency, not just whether it's Gen AI-originated, but how much of it is pure versus blended, is, we hope, an extremely important tool to make sure that developers continue to keep touching the code, because of what could go wrong with code going in the wrong direction. We also think, you know, it happens to have some positive benefits for coders.
The rise of open source code didn't eliminate the need for coders. It changed the job: a big portion of the job is deciding which package to use and being thoughtful about making those trade-offs. You're a system designer, more than, heaven forbid, building your own front-end libraries, right?
And some people can do that, bless them, but the vast majority of people can stand on the shoulders of others. We think Gen AI should and could be the same way: it can avoid some of the manual work. I mean, there's no substitute for learning how to code by doing it by hand and understanding the basics, but once you have the judgment and wisdom, you can go faster with tools, just like open source, and likewise with Gen AI.
But we really want developers to stay involved, for the code base's sake.
Deep: Do you think that we run the risk of turning your average developer into a Homer Simpson who's just keeping an eye on the nuclear plant? I mean, nuclear plants, they're designed to run for centuries and maybe even millennia without human intervention.
When you have to write your own code, your brain's pretty actively engaged, and it's fun and interesting and quite engaging. When you're hitting a button and code comes out and it fails one out of seven times, you're still fairly engaged. And I'd say we're at maybe that one-out-of-three point, at least for coding in the small, like an encapsulated function, something that's a really kind of narrow task.
Given the rate at which this stuff is getting better, it's an obvious next step to, like, have a higher-stack reasoning model say what you would say when you look at this code. Five years down the road, it's hard for me to imagine where the models aren't almost always right. So then it's inevitable that you start jumping up a level in abstraction. Very, very few of us still write anything in assembly; similarly, if you go back with each era, few of us write much in C even anymore today.
And less so in C++, and even less so in higher-stack languages like C# and Java. What's the trajectory here? It feels to me like it's a concern in general with AI that we dumbify the people that have to use the stuff. You no longer need
this really exacting thinking, and the assist is getting so much better that at some point you're like, not Dilbert, but like the dumb boss, the pointy-haired boss, surrounded by incredibly capable people. You think you know what's going on, but you don't.
Matt: I definitely understand that concern, and I definitely think it is worth keeping track of. And of course the honest answer is, I don't know for sure the direction we're heading. A couple of observations. I accept the premise that, you know, over the history of coding, the work is getting more and more abstracted.
Folks need to know the details a little bit less relative to previous generations. I think that is a fair encapsulation. Of course, my observation, and I'm sure it's yours as well, is that the current generation still has many, many, many opportunities to really understand the code, and there are needs for coders to understand the code and go in deep. Just because folks aren't inserting the punch cards like my parents did
when they learned to code, there still are today great opportunities to stretch your brain and to have to be interactive with it to understand it. That's been true at every stage of software development. It's certainly possible that we will get away from that completely, but the history, so far at least, is on our side: there will at least be some developers that need to really understand.
I do think the rapid advancement of these Gen AI tools will change the nature of folks' jobs. There's no question.
Deep: Already has. Already has. Tremendously. Even in just the last couple months, it's changed so much over a year ago.
Matt: Exactly. I'm still at the point where, you know, we see it as a change in degree, not yet in kind. There have been low-code platforms where a very small percentage of people have to code the low-code programs, and then a vast number of people can take advantage of it.
You could argue the same for Excel, right: a relatively small number of people on Earth program Excel itself, and then hundreds of millions of people use Excel. They're certainly not Excel programmers. There still is some thought and effort involved, obviously a lot of thought and effort involved, but it's not traditional programming.
I do think we'll continue heading in that direction. Certainly, more and more people who aren't traditional coders will be able to create code as a result. We're obviously seeing that right now, at least in toy examples. That's only going to continue. I guess I'm an optimist that there will still be hard problems that humans need to design for, need to interact with the components on, to make it work.
And honestly, this is not puffery, I don't believe there is a meaningful difference between putting Gen AI coding tools to work and putting open source to work. I do not. They're both code you didn't write. It takes an incredible amount of judgment to decide how to use it, whether it's right and situationally appropriate, and to stitch it together with the context that you need to solve the problem.
I really do believe that Gen AI coding assistants, whether they're code-specific or general, are just an extension of what's happened with open source.
Deep: Yeah. I'm going to riff on that a little bit. I think probably the thing about open source that helped it really take off is that you can take your library and stick it in your application and just use it and never crack it open and look at the source.
But generally, if you do so, you have trust in somebody who has opened it up and looked at the source. The point is that with all open source, by definition, you can open it up and look at the source, and that source is human-intelligible by people who are skilled in the art, and they can go in and figure out what it's doing and how it works.
Now, today we're talking about generation in computer languages that programmers understand and know well. Tomorrow, I think we have a choice. I think we'll probably, with the increasing power of these models, go up a layer of abstraction and just talk to them in English, you know, or whatever language, and interact with them in some kind of hybrid, super-high-level language.
The ethical question I would raise is, do we have a right to dig down into the low level of the code that gets generated from that high-level interaction? Today, we take it for granted.
Everybody, you know, you go to GPT, you say, make me an algorithm that does blah, it spits out code you definitely understand, you put it in your IDE, you go. Tomorrow, I'm less convinced that we're going to even want that step in there. Even speaking for myself: I was working with a vector store, this tool OpenSearch.
And I didn't want to go out to Stack Overflow like I would have a couple of years ago to get my toy example together. I just needed to take this thing, make an index, stick a document in, stick five documents in, grab a toy document, search for it, give me the output.
So I go to, you know, 4o. I said, do this. It spits out something. I run it. It fails. Yeah, I can go look at the code, but it takes time and energy, and I'm lazy and I've got to go to lunch in 20 minutes. So I just copy-paste the stack trace, stick it back into 4o, and say, try again. I loop around six or seven times.
It keeps flailing. And then I'm like, all right, whatever, let's just use o1-preview. I do that once and it works. So now I can go look at the code. But fast forward a couple of, not even a couple of months, right? o1 churned on it for a good couple of minutes, because, you know, there was something going on in there that was significant.
Fast forward a couple of months: it's going to get faster, it's going to get more seamless, it's going to get it right. It was a fairly complicated little piece of code. It wasn't super complicated, but I'm lazy and I didn't bother. Now fast forward a couple of years: I don't think I'll ever even need to crack it open and look at the code. But whether or not it's even possible for me to crack it open and look at the code is an open question for humanity.
And I feel like that is something that we should mandate, maybe by law or something. If I take my Python, I guess I can, you know, jump all the way down into the executed assembly, but it's certainly not convenient for me to do so, and it's not something that I have to do. But it feels to me like core to your premise
is a future where humans actually can jump in and see what's going on. And the risk is, you know, the doomsayers are like, well, how do we know there aren't Easter eggs in this generated code that the AI decided to put in there on its own? Okay, maybe it decided to, maybe it just screwed up and put them in there, but either way, it's a legit concern.
Matt: I do believe that transparency about the use of Gen AI is important today and will only become more important. I'd say adoption is outpacing transparency requirements and concerns, and not just for Gen AI for code, but Gen AI in general, because people are, you know, experimenting with this new technology and finding applications for it.
I used to say it was at least as important as open source and probably as big as electricity. I now say it's somewhere between electricity and steam engines in terms of how it will transform the world. I like the spirit of exploration and experimentation that starts with, let's figure out what we do with it.
So let's actually use it. As an example, developers are finding their way to it: 97, 98 percent of developers are using some form of AI, at least on the side. What will come with it will be transparency requirements and risk management, risk reduction requirements.
And it is good that it works that way, at least in that system. People shouldn't be experimenting with gen AI on the nuclear arsenal. But, you know, commercial-grade applications? I'm delighted. I'm absolutely delighted. I do think, and I'm a broken record on this, that just like open source started with people using open source,
and then at some point people realized the security risk and the IP risk and the maintainability risk and added transparency requirements that are now mandated in procurement, in diligences, in insurance, we'll see that coming with Gen AI for code and otherwise. I think, ultimately, as long as you get the transparency right and the various forms of regulation right, it will be a net positive.
I guess I'll be slightly provocative: I don't think it's a necessity that folks can open up Microsoft Excel's code. I think 99.999 percent of interactions with Excel can be by users of Excel who will never be allowed to see its source. I have faith in regulatory and compliance organizations to make sure that Excel was set up the right way, without requiring individual users to have the right to see it.
Deep: Yeah, I didn't mean to imply that. I meant that developers, like Microsoft Excel's developers, have a right to go see what the heck's actually being executed.
Matt: And I would say I completely agree with you. I guess when that comes, we'll have to be ready for it.
But human developers must be in the loop, full stop. I cannot see a world where they can't access it. I completely agree with you: for risk, it is the right thing to do, and it is the right thing for safety, to make sure that the developers involved will be able to see the underlying code.
Deep: Yeah, I think it's easy today to say that; I think it's going to be a lot harder to say tomorrow. I think when the delta between a human developer and a machine developer is super significant, meaning that the machine developer is a lot better in most scenarios, and I think that time's coming,
then when it does, it might not be the low-friction, easy choice that it is today. But anyway, thanks so much for coming on the show. I think this has been a really fun discussion. What you guys are doing seems super practical and super, super helpful.
And I think your orientation around tapping into what can be done and seeing how we can get developers more efficient, it seems like a natural next step for us. So thanks so much for coming on.
Matt: Deep, this was so fun.
I really enjoyed the conversation. So thank you for making time for this.