AI capabilities, and particularly machine learning (ML), crossed a scaling threshold a decade or so ago, and have since become a significant source of capability in a wide range of systems and in diverse sectors. In this talk, building on the perspective granted by sixty years of DARPA investment in AI capabilities, we review the progress and prospects for AI-based systems, with emphasis on potential pathways to more trustworthy and reliable AI-based systems.
Enjoy a replay of The DEVOPS Conference sessions conveniently through the DevOps Sauna podcast. This talk is given by Dr. William Scherlis, Office Director at the Information Innovation Office (I2O), DARPA.
Lauri (00:05):
Hi there and welcome to the DevOps Sauna podcast. Eficode has organized The DEVOPS Conference, or its predecessors, a number of times. Last time, in April 2021, over 10,000 people registered for the event. The DEVOPS Conference is a community event where attendees do not have to pay to participate, and we are happy to organize it again on March 8th and 9th, 2022. You are all wholeheartedly welcome to join the event and register at thedevopsconference.com. If you are a DevOps, Agile, or cloud practitioner, or a decision maker, I also encourage you to review our call for papers. We are introducing a whole new community track to give more stage time to people with their personal experiences and insights. All of the speeches from the previous event are available online to watch, but to make it easier for people to enjoy these great speeches, we are also sharing them through our podcast. I hope you like them as much as I do.
Lauri (01:10):
Dr. William Scherlis is joining us; he is the office director of the Information Innovation Office at DARPA. And I actually had to go back and look up what the acronym DARPA stands for, and it is Defense Advanced Research Projects Agency. And if it doesn't ring a bell to some of you, their innovations and contributions definitely do: the internet, GPS, BigDog (the Boston Dynamics robot), cyborg insects, graphical user interfaces, the mouse, onion routing (which was new to me, so I had to go back and look it up a little more deeply), and the Human Universal Load Carrier, HULC, an exoskeleton that allows people to carry up to 200 pounds, or about 100 kilograms, of load at 10 miles, or about 15 kilometers, per hour on a continued basis, which is quite impressive. And Siri too, to fill in this list. Dr. Scherlis builds on the perspective granted by 60 years of these investments in AI capabilities. He'll review the progress and prospects of artificial intelligence with emphasis on potential pathways to more trustworthy and reliable artificial intelligence.
William (02:38):
Thank you very much for that introduction. And I'm very excited to speak with you today about trustworthy AI and systems. And I want to thank the organizers for the invitation. This topic is timely, and so I'm particularly pleased to have the opportunity to speak about it today. There's a surprisingly complex tangle of issues around the engineering of trustworthy AI. And there are three related questions that I'm going to take up in this regard. The first is, how can we do better with the current generation of machine learning systems? Can we make progress, any progress, towards trustworthiness with the current machine learning? This is the machine learning that burst onto the scene about 10 or 15 years ago. It's still very much the zeitgeist of computing, along with cloud and data and IoT. So we'll talk a little bit about that.
William (03:35):
And then, what can we do to advance towards the next generation of AI capabilities? Trustworthiness, resiliency, alignment with the context of the domains in which the AI is operating, a little more versatility, maybe, towards general intelligence, better able to interact with people. And then the third question is, what are some paths forward to the engineering of trustworthy and secure systems generally? How can we make systems in a way that we can come to confident judgements of their trustworthiness? So this is not the same as judgements about the system that we happen to have; it's about building and evolving systems in a way where we are designing to achieve trustworthiness more readily. And of course, keep in mind that trustworthiness is not the same as trust. And then, how can we do this at scale and affordably and with a DevOps tempo?
William (04:31):
And you'd say, well, why not just do DevOps or DevSecOps? The challenge for us in government, and also for large firms, is to do this with integration, with diverse sourcing, with rich supply chains. So, my talk has two main parts. First, a little more DARPA station identification; in that regard, I appreciate the introduction. Some words about AI and how we see it evolving. I would say we're at the earliest stages of a very long history of AI still to unfold, really just at the beginning. We're not at a plateau. And then the second part of the talk is a closer focus on the engineering of trustworthy systems, what we might be able to do. And I'm going to make an assertion: now is really the right time to be pushing on trustworthiness. We have strong motives to do so for many reasons, not least of which is to be able to assimilate AI into mainstream systems and to build those systems in a way that we have trustworthiness with regard to the critical attributes of quality and functionality that we care about.
William (05:46):
But I think also that we have the means and the opportunity to do so. So now a word about I2O. I2O is one of six offices at DARPA. We have a website with lots of details on the office and the many programs that we operate. DARPA does no research in house; it's entirely a funding agency. We sponsor academic groups, in fact around the world, and also firms. We solicit proposals, and you can look for our solicitations if you want to participate in our programs. In the office, we focus on four areas. The first is how to prevail in cyber operations. As you know, all of us in all sectors are experiencing cyber operations continually. The stuff that we create in these programs goes right into use. The picture here shows an exercise in restoring operations on a malware-infested power grid.
William (06:51):
A second area is how to develop and sustain secure systems that must rapidly evolve. We call this the pipeline model. It sounds a lot like DevOps and other SaaS methodologies. And it's also very much in line with practices at firms with a primarily organic software practice, that is, where the entire code base is managed in a uniform manner with full transparency. And in fact, I'll just mention as an aside, there's a really great explanation of the benefits of this approach in a video that Google put out five years ago, released on YouTube. If you search for "Google repo video", you'll find it. It's a really good explanation of their billion-line repository and the benefits of organic practice. Well, all this is in sharp contrast to the usual arm's-length, contract-based processes familiar in governments and large firms everywhere. It's quite hard to support the pipeline model in this context, with rich supply chains, contracted work, lawyers, and the like, and also because we have really intense security and safety requirements and a constant assault of adversaries. It pulls the technology quite aggressively.
William (08:04):
So the third area, in the lower right here, is what technologies can create confident success in the information domain. Confident really is the keyword. Despite lots of data, it's hard to draw inferences about what's going on in a particular influence operation, and lots of technologies contribute to this: natural language processing, data analysis, and also modeling of social features that show up in language, such as emotion, cultural affinity, and the like. Early natural language processing really tended to focus on just the facts: who, what, when, and where. But we need to add these other dimensions to make sense of these influence operations. And then the fourth area, of course, is how to develop and deliver AI that is proficient. Proficient is the keyword here. It means competent, robust, trustworthy. Modern machine learning is great on the upside when it's smart, but it's also fragile, opaque, and dynamic.
William (09:07):
So I'm going to say more about AI right now. At DARPA, we've been involved with AI for 60 years already. We build AI. We also use AI. It's important for cyber operations, for autonomy, for understanding information and influence operations, for dealing with large bodies of data. It also has a role in the engineering of systems; it's everywhere in the agency. But in I2O, we're the computer science people, and we have a primary mission to advance the core capabilities. The history of AI goes way back. Just for fun, here are some dates. 1943 was actually the key milestone in the origin of neural nets, with McCulloch and Pitts. Turing's test, 1950. The Dartmouth workshop that kind of kicked off symbolic AI, 1956: 20 people, they could all fit in one room with social distancing. And the first symbolic reasoning language, IPL, was developed in the same year.
William (10:08):
So now we fast forward to 2010. ML burst onto the scene, entered the zeitgeist. And it's what many people think AI means, but AI is actually much broader. There were all kinds of enablers for that sort of tipping point: not just 70 years of history of neural net thinking, but also, for example, gamers pushing the development of advanced graphics card capabilities and GPUs, and access to data. So now, of course, any high school kid can build AI tools using all those wonderful Python libraries. There's more to the story, of course. In the early years, the AI that we delivered was symbolic. This was the AI that I learned about as a student in the 1970s. Actually, I had really the great fortune to spend a year on a fellowship at the University of Edinburgh in Scotland, in the Department of AI, which they had back in the 1970s, studying knowledge representation, search, planning, robotics, formal methods, which was then part of AI, and the like.
William (11:17):
Symbolic AI was characterized dually as either empirical epistemology, experimenting with knowledge structures, or theoretical psychology, which is to say, trying to understand how cognitive processes must unfold through the building of computational models. One of my predecessors at DARPA in this role, John Launchbury, identified TurboTax as an exemplar of a symbolic reasoning system. It's a constrained domain. It embodies all the many tax rules of all the states and municipalities in the U.S. Clear rules, constrained domains. Symbolic AI is easy to reason about; we understand the reasoning behind the conclusions that are reached, so we have an ability to assess validity. But it has issues with scalability and brittleness, as you know, and it's also painstaking to construct and to evolve. DevOps and symbolic AI don't really harmonize too well.
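To make that contrast concrete, here is a tiny rule-based sketch in the TurboTax spirit. The rules and numbers are invented purely for illustration; the point is only that every conclusion carries the name of the rule that produced it, so the reasoning can be audited directly.

```python
# A minimal symbolic-reasoning sketch: explicit rules, a constrained domain,
# and a trace that explains every conclusion. All rules and figures here are
# hypothetical examples, not actual tax logic.
RULES = [
    ("standard_deduction", lambda f: f["filing_status"] == "single", {"deduction": 12000}),
    ("joint_deduction",    lambda f: f["filing_status"] == "joint",  {"deduction": 24000}),
]

def apply_rules(facts):
    """Fire every rule whose condition holds and record which rules fired."""
    trace = []
    for name, condition, consequence in RULES:
        if condition(facts):
            facts.update(consequence)
            trace.append(name)        # the explanation comes for free
    return facts, trace

facts, trace = apply_rules({"filing_status": "single", "income": 50000})
print(facts["deduction"], "derived via:", trace)  # 12000 derived via: ['standard_deduction']
```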
William (12:24):
And so this led to waves of hype and disappointment with symbolic AI. Now, statistical AI, machine learning, also has a long history. It's more scalable. It gave rise, as my introducer mentioned, to Siri, which spun out of a DARPA project, and also to all manner of machine learning, for example, face recognition, celebrity faces, and the like. But it isn't robust, and it is also super hard to reason about. Fragile here means that it's easily spoofed. In fact, that's the one thing that's reliable about machine learning: we can reliably spoof these networks. A few stickers on the stop sign, you've all seen that; maybe you've seen the eyeglasses case, where a few pixels around the rims of your eyeglasses can reliably misdirect a net. But it's also opaque. It's a sea of nodes and coefficients. It's hard to understand why and how it gets to conclusions. This creature here is fed to a net designed to identify breeds of dogs and canines generally. You know it's a wolf, right? But do you know why it's deemed a wolf? Well, it's not because of the morphology, it's because of the snow in the background.
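To make the fragility point concrete, here is a minimal sketch of the kind of small-perturbation spoofing described above, using the fast gradient sign method in PyTorch. The toy model, image, and epsilon value are illustrative assumptions, not anything from a DARPA program.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                 eps: float = 0.03) -> torch.Tensor:
    """Return a copy of `image` nudged by at most `eps` per pixel.

    The perturbation is visually tiny, yet it is chosen to increase the
    model's loss as fast as possible, which is often enough to flip the
    prediction (the stop-sign-sticker effect)."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + eps * image.grad.sign()   # step uphill on the loss
    return adversarial.clamp(0.0, 1.0).detach()

# Hypothetical usage with a toy linear classifier over 3x32x32 images.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)              # one "clean" image
y = torch.tensor([3])                     # its true class index
x_adv = fgsm_perturb(toy_model, x, y)
print((x_adv - x).abs().max())            # perturbation never exceeds eps
```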
William (13:57):
And the other problem is actually that it's transparent when you want it to be opaque. Just to flip it around, I'm referring to model inversion, where I can actually analyze the network and extract from it particular training cases, where I thought maybe I had protected the privacy or the security of those individual cases. But in fact, maybe not so much. And of course it's dynamic, because it continues to learn. And I also just want to mention these points about ethics and bias. Unlike orthodox experimental science, where data serves hypotheses, in AI we often start with data. We're not thinking about social science or empirical studies 101, where we think carefully about the selection of data and the choice of independent and dependent variables. And so we have abundant opportunity to create bias. Often our choice of variables is opportunistic, and so that pushes us in the wrong direction.
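The leakage described here can be illustrated with a related, simpler attack than full model inversion: membership inference by loss thresholding. The function name and threshold below are illustrative assumptions; the idea is only that an overfit model's unusually low loss on a record can betray that the record was in the training set.

```python
import torch
import torch.nn as nn

def likely_training_member(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                           threshold: float = 0.1) -> bool:
    """Guess whether the labelled example (x, y) was in the training set.

    Overfit models tend to show markedly lower loss on examples they were
    trained on, so even a crude loss threshold leaks membership information."""
    model.eval()
    with torch.no_grad():
        loss = nn.functional.cross_entropy(model(x), y)
    return loss.item() < threshold

# Hypothetical usage: probe a candidate record against a trained classifier.
# trained_model = ...  # a classifier trained elsewhere
# print(likely_training_member(trained_model, candidate_x, candidate_y))
```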
William (14:59):
I also want to say that we've had fears of automation and autonomy going way back; you may be familiar with Asimov's laws. This is from a comic book, Magnus, Robot Fighter, in the 1960s. But of course, before this, there was the Golem of Prague, made famous in the 16th century, but a concept that was around before then. We have good reasons to fear deployment outside of the envelope of advisory activity, which is, I would say, perhaps the main commercial use case: its benefits are on the upside, but it doesn't necessarily add risk. The cases where we accept risk are very narrow and constrained, for example, vehicles on a limited-access highway. And so the challenges of trustworthiness and accountability really loom large for us.
William (15:55):
So at DARPA, we've had a long history, as was noted. It goes back 60 years, and here's an illustration of that. And you can see that we've had waves of expectation and waves of disappointment. The Y axis on this chart is kind of notional funding. Britain also had AI winters; ask some of the old folks there, they have interesting stories. DARPA has used the terms wave one and wave two to refer to symbolic and statistical AI. Of course they both continue, but now we have waves three and four. Wave three refers to harmonizing symbolic and statistical. We call this reasoning in context. There are good reasons to believe that this is key to trustworthiness and explanation.
William (16:46):
And we also want to help AI systems do better at learning, by learning how to learn. Think about it: we reflect on our capacity to learn, to assimilate new ideas effectively, to manage ourselves. So I want to call this wave four, metacognition. Metacognition is also an idea with a long history; Allen Newell was exploring this back in the early eighties, even. And some people say that it's a precondition for wave three. It's also a precondition for self-moderation: knowing what you know, knowing what you don't know, assessing the confidence of the conclusions that you reach. And so here's an example of contextual reasoning, just to illustrate. We can look at this image, and what do we see? We see a person sitting on a sofa with a bowl of popcorn. But what can we infer from our domain knowledge? Well, probably watching TV, being surprised; maybe it's near the beginning of a movie. Why is that? Because the popcorn bowl is full. There's no beer in sight, so probably not the ball game, right? So we're inferring quite a bit from context.
William (18:06):
So this slide summarizes some of the challenges that we see facing us in AI. We call this focus proficient AI, as I noted. And so I just want to mention some of the challenges. First is how to gain trustworthiness. We care about trustworthiness because we care about autonomy. In the world of defense, autonomy has become very important, but autonomy requires trustworthiness. It's more than just "trust the supplier." It's not about "trust us, we ran a few test cases," or "we ran a lot of test cases." We really have to be very confident that the autonomous systems will operate, as close as possible to 100% of the time, in a way that's safe and secure and consistent with the rules of the road.
William (18:52):
The U.S. Defense Department is concerned about this enough that, nine years ago, they issued a directive called Autonomy in Weapon Systems. You can look it up online and read it. And so it highlights the point. Why do we want autonomy? Well, there's safety and there's scale, but it's also that the modern military engagement really needs to operate at speeds faster than thought. We learned about this, for example, in the Cyber Grand Challenge. Okay, let me jump down a little bit. We also want to learn how to support effective human-system partnering. This actually is a problem that gets harder as time goes by, not easier. This is not about what color the pixels should be. This is about how humans and systems can interact effectively, with the right kind of mixed initiative, as systems get smarter, which is why the problem gets harder. The systems should be unobtrusive and not annoying.
William (20:01):
And so, to make this work, we really need a lot of technical enablers: human language technology, but also epistemic models, which are models of knowledge and belief. What is the right framing of concepts for communication? What is the intentional state of the human? What's the emotional state of the human? That's all going to influence how we interact. Think about how you interact with your team members and partners, and also about task information and the roles of team members. Think about Commander Data or C-3PO: as the AI gets smarter, the problem gets harder.
William (20:39):
And then the last point I want to make on this slide is how to get by with less data: self-supervised learning, transfer learning, and so on. This is important in domains where data is scarce and expensive. You know, we can learn things without a million training cases. And so in our programs we are actually studying how the very, very best learners learn. The very, very best learners are human infants, so we're actually engaged with psychologists to assess those things. So I'm going to just rattle through some programs to give you an illustration. You can read about these on our website, so I'll go at lightning speed here. This is the program on what to do when the data is sparse and expensive: learning with less. This is a program about human-systems symbiosis. We have individuals with a certain fixed training time in their engagement, but an unbounded flow of new systems, new capabilities, new roles: field medics, repair people. Systems are always changing. And so the idea here is to build models and provide just-in-time guidance for the humans, communicate just when necessary, see what the human sees.
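As one concrete illustration of the "get by with less data" theme above, here is a minimal transfer-learning sketch: freeze a backbone that stands in for a network pretrained on plentiful data, and fit only a small head on a handful of new labels. All shapes, sizes, and the random "few-shot" data are hypothetical placeholders, not material from the program itself.

```python
import torch
import torch.nn as nn

# Stand-in for a backbone pretrained on a large dataset; frozen so that the
# scarce new labels only have to fit the small head below.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False           # reuse, don't retrain, the expensive part

head = nn.Linear(128, 5)              # only these few weights see the new task
model = nn.Sequential(backbone, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# A handful of labelled examples is often enough to fit the head.
few_x, few_y = torch.rand(20, 3, 32, 32), torch.randint(0, 5, (20,))
for _ in range(50):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(few_x), few_y)
    loss.backward()
    optimizer.step()
print("final loss on the small labelled set:", loss.item())
```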
William (22:06):
So there's a little bit of AR there; we're building on all those technologies. This one is AI versus AI versus AI, in the sense that you can have a GAN, and you detect the GAN result using more AI. This detects deepfakes in images and videos, and manipulation in images. That flag was actually added in, kind of photoshopped in, and this system detects that. And now we're moving a little further along, taking the next step, adding in speech, text, and narrative. So, a few takeaways that I'd like to highlight quickly. This illustration on the right is from our ACE air combat program. The AI beats the hotshot fighter pilots. The fighter pilots, do they hate it? Well, they actually don't hate it, because they've learned a lot from it. And it is actually potentially changing how these air combat engagements are unfolding. First is that AI and computing are not at a plateau, nor will they be anytime soon.
William (23:15):
This is really, really important. It's not just that the chips are getting faster; it's that complexity is increasing, and software uniquely really has no asymptote. The people curve is still flat. We're not any smarter than we were a hundred years ago, but we've been able to improve our capacity to engineer these complex symbolic structures. And that's why we're here at this meeting, pursuing the improvement of our software development capability. It's an interesting struggle. We start with creative activities that eventually get routinized, and then we take those routine activities, and what's the next thing we do? We automate them. Well, we don't hit an asymptote. Instead, what we do is move to the next level of complexity, the next level of abstraction, and we do this without limit. The laws of physics don't keep us down. So this is this funny conundrum: because of this, in software the abstraction level keeps increasing, and our managers want us to routinize, but there will always be that frustration, because the creativity keeps popping out.
William (24:22):
That's good. Okay. And there's a related point, which is that in computing, what matters is not what's new, but rather what has become timely. Many of the breakthroughs that we see popping onto the scene, AI and machine learning being one of them, but cloud, for example: cloud ideas were around back in the 1970s. It's just that if you look at all those exponential curves plotted on semi-log paper, you see that at a certain point cost effectiveness emerges. The algorithms are right, the use cases match, and then they suddenly pop onto the scene. The second point is that AI is not like other software. We don't know how to test it or inspect it. We don't know how to assess robustness. What's the specification for a machine learning net, other than its training cases? And how do I extrapolate from those training cases?
William (25:13):
Okay. A third point is, yeah, there's lots to do. You've just heard some thoughts on that. And then the fourth point is that there are many, many applications for AI; don't just focus on machine learning. So I've got five minutes left, and I'd just like to say a little bit about systems. What we're trying to do here is engineer systems to be resilient, to operate through when compromised, because systems are getting more interconnected and more complicated and larger. Security, by the way, here includes bolt-on security. Security people say, "Oh, don't do bolt-on security. You should build it in." Yes, of course. But often we have legacy, so we need to be good at bolting it on as well as building it in. And then finally, adaptable. We want to be like SaaS, we want to be like DevOps. But for integrated systems with rich and complex supply chains, that's really hard. That's really hard.
William (26:07):
We're still moving away from a model that I call the RDS model: spend a year or two writing requirements, spend three or four or five years developing things, test for a while, and then: Surprise! That's not what we want. That's the Requirements, Delay, Surprise model. We need adaptability. From the defense perspective, we have adversaries who are nimble and capable. The technologies are moving quickly, so the mission has to move quickly. AI is a big piece of this. So, a quick illustration of some of our programs. SafeDocs: systems exchange structured data with each other. This is a vector for infection as well as for sharing data. The natural process is that the data formats get enriched over time; we feature them up. Look at PDF: it's got incredible featuring, thousands of pages in the ISO standards. It's a natural process. The other thing that happens is that readers of these formats become more resilient and accepting, because the writers make small errors.
William (27:11):
PDF is a canonical story, but there are lots of other examples. CASE: how do we build in cyber resiliency? How do we assess it? Treat it as a first-class property. Move to the left in our timelines so we can evaluate it early and often, and not find out only after we build the system that, whoops, it's not resilient, it's a little more fragile than we thought. V-SPELLS is about software evolution. It's about recovering lost information from legacy components. I'm sure you've all experienced this phenomenon of dark shadows and superstitions regarding legacy components that nobody understands anymore. One of the reasons why we keep touching the code and refactoring it and fussing with it is that it keeps in our minds the key intellectual assets: the models and intentions behind that code. Well, in our world of legacy systems, that's really hard to do, so we have to be able to capture what we learn and then modify it appropriately.
William (28:09):
Okay. A short word on autonomy. Well, for certain machine learning nets, it's actually possible to extract mathematical models. And then we can feed those models to solvers and build them into the usual body of evidence, not so usual, but what we see as an emerging body of evidence, that's linked in with all of our tools and other engineering data, and connect those up. And this is an example here. These are underwater autonomous vehicles that snoop around, sniff around on the ocean floor, and then are meant to come home. They encounter unexpected phenomena, and so they need to be able to operate autonomously.
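One simple example of the kind of mathematical model a solver-style analysis can consume is an interval abstraction of a small feed-forward ReLU network. The sketch below propagates an input box through toy weights and returns bounds that hold for every input in the box; the weights are invented for illustration and are not an actual vehicle controller.

```python
import numpy as np

def propagate_interval(weights, biases, lo, hi):
    """Push an input box [lo, hi] through a feed-forward ReLU network.

    Each layer's interval is a sound over-approximation of its true output
    range; ReLU is applied to hidden layers only."""
    n_layers = len(weights)
    for i, (W, b) in enumerate(zip(weights, biases)):
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        new_center = W @ center + b
        new_radius = np.abs(W) @ radius          # worst-case spread per output
        lo, hi = new_center - new_radius, new_center + new_radius
        if i < n_layers - 1:                     # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy 2-layer network: bound the output for every input in the unit box.
weights = [np.array([[0.5, -0.2], [0.1, 0.3]]), np.array([[1.0, -1.0]])]
biases = [np.array([0.0, 0.1]), np.array([0.2])]
lo, hi = propagate_interval(weights, biases, np.zeros(2), np.ones(2))
print("output lies in", (lo, hi), "for all inputs in [0, 1]^2")
```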
William (28:54):
Okay. So I'm almost at the end of my time, and I want to close with three summary points. First is that cyber adversaries are getting smarter. They're always on the prowl. So that means we have to get better at making systems more secure. AI makes this harder. So do diverse supply chains: components, services, tools, operations, infrastructure, and often a lot of opacity in those supply chains.
William (29:23):
The second point: trustworthiness and assurance are therefore more important than ever. We should act on a bias for evidence and trustworthiness over reputation and trust. You know, signed code may be authentic, you know it's authentic, but it could be authentically vulnerable. So really, maybe software builders could contemplate what sort of claims they could offer to their clients and users, and what evidence could be created to support those claims. And clients and users should think about how to reward the making of claims and the evidence behind them, even if the evidence is attested only by a trusted third party so that intellectual property is protected. The great thing about amassing evidence concurrently with the engineering is that we can more easily and more often pass evaluation gates. We identify and resolve uncertainties more quickly. This is why the engineering V, which may be familiar to many of you, simply doesn't work for software.
William (30:39):
A key tenet of DevOps is that continuous evaluation is key to continuous evolution. Software dies as soon as it stops swimming. So we have to translate this, our challenge is to translate this, into the world of systems integration with diverse supply chains. And finally, I'm going to reiterate my point about no plateau, because I think this is the most important takeaway. You should always try to routinize, make your practice routine, and then, on that basis, automate more and more, and of course also stay open to evolution and advancement. As engineers, our creativity is our strength. And so this flow from routine to automated keeps us more creative more of the time. And I think it's not an arrogant statement, as a software person, to say that in software we are therefore having to be more creative more of the time than perhaps other engineers. But we'll just treat that as a hypothesis that we cannot really easily test.
William (31:46):
In any case, our systems are becoming more capable because we get better at dealing with complexity and with trustworthiness, even though the human Moore's law curve, as I pointed out earlier, has really stayed flat for a very long time. So on this note, I will close my talk and I'm happy to take questions. Thank you very much.
Lauri (32:09):
Thank you for listening. To register for The DEVOPS Conference, go to thedevopsconference.com. If you have not already, please subscribe to our podcast and give us a rating on your platform. It means the world to us. Also check out other episodes for interesting and exciting talks. I'll say now, take care of yourself, and see you at The DEVOPS Conference.
Dr. William Scherlis, Office Director at Information Innovation Office (I2O), DARPA
LinkedIn: linkedin.com/in/bill-scherlis
Watch all recordings from The DEVOPS Conference 2021 on the event's page: https://www.thedevopsconference.com/speakers