David Keane shares insight with Marc and Andy into the challenges of implementing DevOps at scale in HSBC. Learn about the positive impact of DevOps on cost control and security for gaining buy-in, along with the importance of breaking down silos, promoting collaboration, and fostering a culture of accountability. David also tells of the lever of change following a 48-hour system outage that prompted improved collaboration between IT and business teams. He'll be speaking at The DEVOPS Conference Live, scheduled for October 23rd and 25th, 2023, in Stockholm and Copenhagen.
David Keane (00:06): Some of it comes from fear, and some of it comes from a very-- a well-motivated place, if you like. People are trying to protect the organization and they think that this sounds risky or dangerous and they're against it for that reason.
Marc Dillon (00:21): This season, Andy and Marc are back with a fantastic group of guests.
Andy Allred (00:26): I've been to depths that remain classified and Marc keeps his head in the clouds. With our combined experience in the industry, we can go from the bare metal to the boardroom. Enjoy your time in the DevOps Sauna.
Marc Dillon (00:46): We are back in the sauna. This is the DevOps Sauna pre-game podcast for The DEVOPS Conference Scandinavia, which is coming to Stockholm on October 23rd and Copenhagen on October 25th. We are interviewing some of the guest speakers for the conference and we have one hell of a story and a very, very wonderful human here, David Keane. He's the co-founder and head of DevOps at Calitii. Hello, David.
David Keane (01:14): Thank you so much. It's great to be here. We're looking forward to the conferences. As you said, there's a story to tell, particularly from my days at HSBC, about large-scale DevOps transformation in the enterprise.
Marc Dillon (01:26): I can't wait to hear what happens next. As usual I have my cohort, Mr. Andy Allred.
Andy Allred (01:34): Hello, hello. These ones have been fun because we're just warming up and getting ready for the conference and it's a couple of weeks away as we're recording this and I just-- I can't wait.
Marc Dillon (01:43): David, let's start. I'm always interested in the human side of things and how did we get to where we are? Can you maybe go back to the beginning, like how did you get started in DevOps? Or what made you interested in this area, and how did you get here today?
David Keane (01:59): Yeah, sure. So my own personal journey: I guess I got exposed to agile ways of working when I was working at Deutsche Bank around the early 2000s in the UK. Actually, some of the Indian partners who we were working with introduced us to that way of working. I have to say it didn't really stick, unfortunately, at that time, at least for me, so I was curious but not converted perhaps. Then a number of years later, I was working for HSBC and I was looking after a large part of their operations IT for the investment bank. It was around the time the DevOps movement was really getting going, 2012, 2013. A colleague of mine, Peter, used to run development for a large part of the environment at the time. Our CIO at the time called us both because he knew we were both transformation-type people, or change orientated, and Peter was in a Dev role and I was in an Ops role. So his solution to his DevOps ambitions was, "Hey, you guys get together and solve this DevOps problem for me." It was a really interesting journey.
My career has taken me through development roles, operations-type roles and project management, all types of different things. But coming into that DevOps role with an Ops hat on was, I think, a bit unusual. Most of it came from the Dev space, and these were things that they wanted from operations. When you came from the Ops space you were more the guardian. You have to look after things, you have to make sure that things don't break. It gave me one advantage, that I was trusted by that part of the house, but I was not used to responding in the same way to what the Dev requirements were. So from around 2013, for 10 years, I've been involved in it. I ran the DevOps transformation for the investment bank first at HSBC, and then after that we created a role in the department that was engineering focused, so developer experience, IT for engineers basically. Then when we did the global, enterprise-wide transformation for the entire bank, I led that as well for the last two and a half, three years.
Marc Dillon (04:03): I understand the scale here a little bit because I read the syllabus of your talk, but would you like to give us an understanding of the massive scale?
David Keane (04:15): In most of my career, I've worked for fairly large organizations, but HSBC is right up there. There's about a quarter of a million people working in HSBC. As I think I said in my talk, it's larger than 50 countries if you compare staff numbers to citizens. In technology there's 50,000 plus; it moves around, obviously, but 50,000 or so. They're spread across 75, 76 countries, and obviously we're regulated everywhere we operate, with 40 million customers in HSBC. So the scale is huge and it's complex. It's an old organization. It's been around for 160 years, and it has had IT departments since IT existed. It has its own separate silos. People probably know the retail bank. There is a commercial bank, which is really significant for HSBC, and it has an investment bank. Then it had its old core central components. So you have three or four different significant silos as well as the sheer scale of 50,000 people.
Marc Dillon (05:16): It's quite astonishing. Just the number of countries that you're in, and the number of developers, compared with what some companies are operating with today. How does something like this get started? I understand that you were there for a while and running things. What initiates it and lights it? How do you snowball such a big transformation?
David Keane (05:35): There's a couple of different starting points, and I think that's part of what I try to talk about: they don't all work, right? I've seen some things that did work and I've seen some things that haven't worked or didn't work yet. They kind of combined. If I trace it back to the really good things that happened early, that led us on the path to where we got to while I was with the firm, the most successful started with a small group of developers that had come from a different shop. They had an experience of working in a different way and they really wanted to change things when they arrived. It was a small group; we're talking about half a dozen people.
When you are slotted into an organization of tens of thousands, it's hard to change things. There's a certain way of operating, and responsibilities are divided out in a very logical but old-school sort of way. So where I would start the success stories is with one particular application. Remember, there were eight and a half thousand across the bank. It was one particular application called FX Evolve, which that small group of people came into, and I got involved with them shortly after they arrived. We came together and tried to create that DevOps environment and help them out with some things that were blockers for them at the time.
Now, just to fully answer your question, the bank also went through a couple of transformation initiatives, and one kicked off a couple of years after that, around 2016, which was attempting to be agile at scale or DevOps at scale and all of these things: we will train people, we will change the world in a year. Some good things came from that. There was the education, people started understanding what was required, and people were learning new ways of working and were exposed to new technologies, but it didn't succeed in the way that it set out to. If I look at the small germ that started a few years earlier and how that spread, that's had a much more lasting impact. So when it came to doing the enterprise-wide transformation, we were much surer of what we wanted to do: transform bottom up. You do need that leadership buy-in, and I’ll talk about that later. But this was largely led by engineers and treated engineers as the customer, and that was incredibly important to getting scale.
Andy Allred (07:59): I just think it's interesting, as you're describing this, to compare it with some of the somewhat smaller cases I've worked on, where we take two or three zeros off your numbers. The challenges, what we need to do, the silos, the way of working and the buy-in that's needed: I don't think there's anything different yet that I've spotted other than the number of zeros. So it's just interesting that this is a huge organization but it still comes down to the same fundamentals that a small team would have.
David Keane (08:28): I mean, I don't know what the magic number is, but certainly it's a much smaller number than 50,000 at which working amongst people gets complex: how we organize ourselves and how we arrange ourselves. It was always done with good logic applied in the past, but the paradigm changes, we have the ability to work faster, quicker, better and in a different way, and then you have to go about changing that culture, that structure and the silos. I'm sure it's quite similar for anything beyond maybe the two-pizza pod type of thing that we talk about.
Marc Dillon (09:01): Never doubt the power of a small group in making change, because it's the only thing that ever does, is one of the ideas that comes to mind. I'm absolutely fascinated already that it came from the bottom up, and then you just put the culture word in there. So how do you plant the seed for such a large cultural transformation as well? Because there has to be a lot of resistance.
David Keane (09:25): I mean, there is. Some of it comes from fear. Some of it comes from a very well-motivated place, if you like. People are trying to protect the organization and they think that this sounds risky or dangerous and they're against it for that reason. It would be wrong of me to pretend that we had all the solutions at the start and all we needed to do was to plant the seeds and water them so that they would grow. Looking back, what were the really important things? It was bottom up and it was engineering led, but that wasn't going to succeed unless you got the C-suite with you as well. So you have to appeal to lots of different types of stakeholders. The good thing is that there's lots of good stuff for all those stakeholders to take out of it, but convincing them of that is a difficulty.
One of the early successes that we had was agreeing what we would measure and how, and that we would use those measures broadly. Now, it wasn't done quickly, but we landed on things very close to what the DORA metrics are. For us, I would say it boiled down to two: your release frequency and the total number of incidents you have. Most organizations are able to capture that kind of data. If they have an ITIL background, they'll be capturing that kind of information. We went for those for a couple of reasons, and you'll hear Nicole Forsgren talk about metrics in tension. We certainly had two metrics in tension: your release frequency and your total number of incidents. Coming from an operations background, if somebody had told me before 2013 that the way to greater stability was to do more releases, I'd have had a very hard time believing them, but that's what we've been able to prove: to send that correlation into reverse, essentially. Getting those two metrics agreed amongst the entire organization, that this is what we're going to use, was key. Now, they don't tell you everything that you need to know and they don't tell you everything that a team might be focusing on at any one particular time. But we had to go for simplicity with such a large audience and make it understood by lots of people. So we went for that: we want you to release more frequently but, at the same time, you have to reduce your incidents.
So in getting the C-suite and getting the masses on board, it helped to be able to boil things down, or distill things down, into a much simpler message that we could transparently and automatically produce for everybody with a little bit of work. With 50,000 engineers, we built a way, relatively easily, to visualize that set of numbers for you from your pod all the way up to the enterprise, and it aggregated up. So you were able to have that conversation. That helped tremendously with the question: what are we actually trying to do? I speak to lots of people who are doing different transformations and you ask them why. It's remarkable how often answering that question is a stumbling block.
Then you get lots of different reasons, and of course there are really good reasons: cost containment, your customer and getting stuff out there for your customer. Also improving quality, and including in that your cyber protection, your ability to roll out fixes and all that kind of stuff. That's all an outcome of DevOps done well. So we got that message out, our go faster, break less, and we were able to explain and show to people how it feeds each of those different outcomes eventually.
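The roll-up David describes, from pod up to enterprise, can be pictured with a small Python sketch. Everything here is an illustrative assumption rather than HSBC's actual tooling: the record layout, the organization names and the `roll_up` helper are hypothetical, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical records: (business_line, department, pod, releases, incidents).
# The names and numbers are invented for illustration only.
RECORDS = [
    ("investment_bank", "fx", "pod_a", 60, 12),
    ("investment_bank", "fx", "pod_b", 70, 8),
    ("retail", "cards", "pod_c", 12, 5),
]

def roll_up(records, depth):
    """Sum releases and incidents for each org path truncated to `depth` levels.

    depth=3 reports per pod, depth=1 per business line, depth=0 the enterprise.
    """
    totals = defaultdict(lambda: [0, 0])
    for *path, releases, incidents in records:
        key = tuple(path[:depth])
        totals[key][0] += releases
        totals[key][1] += incidents
    return {key: tuple(value) for key, value in totals.items()}

enterprise = roll_up(RECORDS, 0)      # one entry, keyed by the empty path
by_business_line = roll_up(RECORDS, 1)
```

Keeping only two numbers per pod is what makes the same view usable at every level: the enterprise figure is just the sum of the pods beneath it.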
Marc Dillon (12:50): So when I saw the go faster, break less, I thought that it was novel, and then yes, metrics in tension, frequency and incidents. I'm completely with you. Did you look at any adoption-level metric, or did you just say, for the largest group possible, let's measure the deployment frequency and the number of incidents over time, and then use those as the only metrics to guide the entire epic transformation?
David Keane (13:19): From a midpoint onwards, the answer to that question would be yes. Early on, there were measures to try and figure out adoption of tools, adoption of agile ways of working and lots of other metrics. Looking back on it now, they were somewhat helpful maybe at the time, but they didn't... They were working towards an outcome. What we got is that they became an end in themselves, and just adopting a different tool or adopting a discipline around that job wasn't what we wanted people to do. So we went through that. It was probably somewhat helpful for some parts of the organization at the time, but we didn't stick with it. Another reason would be that simplicity thing again. You're trying to speak to so many different people in different countries and different cultures. It pays an awful lot to reduce the complexity of your message, and our message started out as go faster, break less.
Then as we matured it a little bit, and this is in the investment bank where we were doing it, we wanted people to be ambitious as well. So actually that morphed into double and half. You would double your releases year on year and you would halve the number of incidents you have year on year. That was an OKR, not a target, and in the first year that we introduced it there was a lot of pushback. That's six years ago and it's been there every year since. They've never doubled in a single year. It's 40, 50, 60% year on year. I would say we were doing 60 releases, 70 releases for a team of 10 typically in the investment bank back in '16, '17; it's about 600 now. So you can work out the math. It's not doubling every year, but certainly it's been ambitious. That was an important learning point also, because to improve a little bit is challenging, but if somebody says you have to improve by 10% and you do one release a month, you can squeeze that out without changing how you operate. If that's what you need to do to satisfy people, that's fine. So it needed to be ambitious, and the other thing it needed to be was multi-year, because in any organization, particularly in large organizations, the engineering group are well used to people coming up with new ideas. This year the fad is one thing, next year the fad is another. So the cynical amongst them would say, well, you know what? It'll pass us by, next year we'll be fine. But we've been at it for so long, and every year it's double and half as an OKR. The consistency and the ambition of it have been hugely important in getting that message across. It's been better than having 15 or 20 different metrics and trying to measure people on that.
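To "work out the math" on those figures, a minimal Python sketch follows. It assumes a start of 60 releases per team of 10, roughly 600 today, and a six-year span (taken from "that's six years ago"); the function names are hypothetical.

```python
def compound_annual_growth(start, end, years):
    """Average year-on-year growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1

def project(start, factor, years):
    """Value after multiplying `start` by `factor` once per year for `years`."""
    return start * factor ** years

# Assumed inputs: 60 releases then, about 600 now, six years apart.
actual_rate = compound_annual_growth(60, 600, 6)  # a bit under 0.47
if_doubled = project(60, 2, 6)                    # what true doubling would give
```

Under these assumptions the average growth comes out at roughly 47% a year, inside the "40, 50, 60%" range David quotes, while doubling every year would have meant several thousand releases rather than about 600.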
Marc Dillon (16:15): Wow. Sorry, this is the first thing that comes to mind when I hear these. You're really putting it together beautifully. When we were talking about the C-suite buy-in, is it enough to talk about double and half, twice as many releases and half the number of incidents, and then let them figure out how they're going to incentivize each other or whatever based upon that? It's hard to argue with. Or is there any other kind of activity? Because what we find all the time with these types of transformations, and the reason those initiatives that you had before probably failed, is that everybody's always a special case. No, I can't talk to you right now, I have a release to make. No, I have a production bug that has to be fixed. Not this year, maybe next year, and all this kind of thing.
David Keane (17:01): I am-- I have a very long list of those excuses. Really, in an environment that size, one that covers these different business structures as well, there are some, a small number, who are enthusiastic from the start. There's a much larger cohort that needed to be convinced, and some came kicking and screaming. So we probably had all shades of that. If we're talking about the business for a second: sometimes when we talk about the C-suite, we focus on the C-suite within technology. But there are your business customers, your CEOs and your COOs, the people in the business who are making decisions. There's a few things that we learned with them. The data that the DORA research has kicked out tells you that organizations that have the ability to go faster will have better business outcomes too. So there's really good research there that backs it up. None of that feels real until you see it internally, though. It always feels like that's okay out there. So, trying to get that first breakthrough, and this is not something I'd recommend for everybody: how we got our big breakthrough was through a disaster in that area, that FX Evolve space. I mentioned in the talk I did in Amsterdam that there was an outage in that platform for 48 hours, which is kind of unheard of. This is a platform that some of the biggest names in the world did their FX trading on, billions and billions of dollars. For it to be down for a long period of time is not acceptable, and the business realized that they couldn't just go back to operation as normal.
They said, what is it that you've been talking about that you want to do differently? So for a few months, they agreed we should knock down some IT debt, and we said, this is how we want to operate. The product ownership role in that business area started to grow. This is 10 years ago now, probably, eight or nine years ago. If you were to go back into those environments now, you would really struggle to tell the difference between the IT and the business folks, because there's so much empathy across the table. That wasn't the case then, but it grew. So that was a seed that helped us to grow. It took a year, 18 months to start showing benefits, but those business folks became our best ambassadors to go and speak to other people in the business. So rather than somebody from technology coming and explaining that this is a really good thing for us to do, we could say: why don't you talk to your business colleagues over there? They had really good business outcomes, differences that working like this had made to them. When somebody else comes to you and says, this has made a difference to our PnL, you guys should think about it too, that really helps. With the IT or the security C-suite, your CSOs and everybody else, sometimes it's easy to understand a rejection of these kinds of things, because they've got their perimeter fence and they know how to manage things there. They don't really want to change those things and they're suspicious of change. Quite rightly so maybe, right? There were some bad actors within what we were doing. "Okay, so we're doing agile and DevOps, we don't need to do your stuff anymore" is completely untrue, but people would attempt to do that. Nothing on a large scale, thankfully. So all the baby steps that are important in order to go faster and break less include things like automation.
We can talk about controls in a little bit more depth, but there was a massive change we were able to make, both to the engineers' lives and to the security folks, through our cyber checks and other controls that we had in place around change management. We took what was typically a 30-day journey down to seconds, essentially. We didn't do that in one day and we had to prove the case. When the C-suite saw the outcome of that, what did they see? They saw they were now getting automated, almost real-time results and artifacts that backed up what the cyber checks were, the controls and the rest of it. What did they experience previously?
A paper-based system that came six months later. So they became huge buyers of this because it made their job more secure and more doable. Also, take out that wait time and the labor involved, the project management labor. There's lots of zeros in this example as well, but this is absolutely true. We took 50,000-- like 35,000 days of labor out a year associated with that one software delivery journey to production. Seven and a half centuries of wait time: that was the difference, because of the volume of change we're doing multiplied by the amount of work. So what does your C-suite see there? Well, they see a cost saving in the tens of millions of dollars, and that was real. They also see that they are now in a better position with their controls.
The visibility of their controls is better. The visibility of people's adherence to what they need to do for controls is better, and the time in which they get to see it is way better. It is near real time versus, best-case scenario, one month later. So those things fed into the C-suite. I'm talking about a 10-year journey from 2013 to 2023. Those were things that we had to build up and prove to people along that journey, but that change, that one software delivery journey, brought a lot of the C-suite with us, because it gave them things that they wanted. They wanted cost control, they wanted security control. They wanted to delight their customers with predictability and frequency of releasing software. It gave them all those things in spades.
Marc Dillon (22:55): Hi, it’s Marc again. The DEVOPS Conference is coming to Scandinavia on the 23rd of October in Stockholm and the 25th of October in Copenhagen. We can't wait to see you there. Now, back to the show. There's a really interesting thing. When you said seven and a half centuries of wait time and then when you said this was a 10 year journey. It's like, in an IT career, a 10 year journey is almost like seven and a half centuries. How did you personally sustain and how did the primary actors sustain and not burn out and remain motivated, focused and clear on this 10 year journey? That's astonishing.
David Keane (23:40): I think a certain degree of madness going in, maybe it is helpful. If I look back prior to that. I'd done a lot of different kinds of roles. The last role I had was more of an operations role and I was charged with doing some transformation in that space. I was doing things which I thought made sense at the time but they were traditional to that time. You tried to lower the cost points for your ticket, you tried to reduce the number of those tickets. You did all your problem management, all good IT type of stuff but it was only servicing. It was largely servicing the operations capability and it didn't do very much at all for how we had to change to react to new actors.
The world was changing and you had challenger banks coming in who didn't have such a legacy of technology. So I'd had some exposure to agile previously in my career, but from 2013 onwards I went through the real learning journey as well. I certainly did not come to that day with all of the answers. I learned a lot as I went along and I found that enormously exciting and enjoyable. We didn't try and do it all at once. To begin with it was more focused on the smaller area. Then we did the investment bank, which itself is pretty large, and we spent three years working on that. You're never done, but the program comes to an end then and you move on to different things.
I had a bit of a break in the middle where I wasn't running any transformation, but I was still working to try and improve the engineering community and developer experience. So I guess I had a few breaks, but I'd spent the best part of 10 years at the coalface. Personally, I came to the conclusion a few months ago that I wanted a different challenge, to try and use what I'd learned and take that elsewhere. Others came in and out. There were a few that were there as long as I was, but mostly people came in and out. We leveraged externals occasionally to come in and help us, to advise, to challenge us. It can be, and I guess any sort of transformation can be, because as we've all discussed in this call, there's lots of people who say no, or I can't, or I won't, or we shouldn't. To be able to go back into battle every day for that, you have to, I guess, enjoy celebrating your successes and use those moments. Either the successes or indeed the disasters, like the example I gave around the FX thing. You have to use those to change, to move the dial in big organizations. Never let a disaster go to waste. Never let an opportunity go to waste.
Marc Dillon (26:11): Never let a crisis go to waste.
David Keane (26:13): There you go. That's it.
Marc Dillon (26:15): Really nice. One thing occurred to me in the middle of this wonderfully inspirational story: I would guess you were coming off of COBOL and mainframes with this transformation. Is there anything about that level of legacy? Because we see this a lot. I first helped to webify a legacy COBOL system in the 1990s and people are still doing it now. Were there any aspects there, moving from mainframes or big iron or this kind of stuff?
David Keane (26:46): So that's a great question, and of course at a place the size of HSBC you're going to have legacy systems. I remember one I worked on myself, which was almost the same age as I am, and then you have much newer systems. For the brand new greenfield sites, this is an easier journey in terms of technology. It's kind of the same journey, though, in terms of the structures and the silos of the organization and what you're allowed to do and the processes. So it's not like it's completely different. When we talked about our go faster, break less measures, a couple of nuances in that are kind of important. If you get a big budget injection one month and you hire a bunch of new developers and support staff, you do more releases, and if your budget is reduced the next year, then it changes.
When an organization is constantly evolving, there is always this argument that it's different. So, since we did care about that number at the enterprise or department level, we came up with a way, and it's a bad acronym, but: production deployments for a notional team of 10 per year. When we talked about our release frequency number, it was for that notional team of 10 per year, and so whatever your system or whatever your budget, it was normalized, if you like, to a certain extent by doing that. The second thing was that we knew that everybody wasn't going to be the same. So you weren't being compared to your neighbor if they were completely different. You were being compared to yourself.
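The normalization to a notional team of 10 can be sketched in a few lines of Python. This is one plausible reading of the measure, assuming a simple linear scaling by headcount; the function name, parameters and example figures are hypothetical.

```python
def deployments_per_notional_team(deployments_per_year, headcount, notional_size=10):
    """Scale a team's yearly production deployments to a notional team of 10.

    Assumes a simple linear normalization by headcount (an illustrative choice).
    """
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    return deployments_per_year * notional_size / headcount

# Made-up example: the team grows after a budget injection and ships more,
# but the normalized figure stays the same, so the year-on-year comparison
# with its own past is not flattered by extra headcount.
before = deployments_per_notional_team(150, 25)
after = deployments_per_notional_team(240, 40)
```

Because the figure is per notional team, a department is compared with its own previous year rather than with a neighbor running a very different system.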
If you did 10 system changes, or let's say you did 12, one per month, then where we wanted you to get to, if you're a legacy system, in our Q rating system, was 18, maybe, the next year. That'd be great. Other systems were going from one to 400 or 500, but we didn't have that expectation for some of the older systems. We did have an expectation that they try and do what was possible in their environment. The big success story for the first few years was that system, FX Evolve, that I was talking about. But the best success story that kept us going in '15 and '16 was a much older system called Treats, which was a mainframe IER link system.
They used to release once a quarter. So there were four releases a year, maybe three sometimes. They weren't all successes. I remember, with my previous hat on, talking to the CIO on a Monday after we'd had a terrible day, and it was post a release, a quarterly release for a very large system. We looked at a few things, and one of them was the number of approvers. There were 110 approvers on the change. Who do you think was taking accountability with 110 approvers? Nobody. And so we worked with the team. We had all this kind of pushback: we can't, because we're legacy, and the rest of it. We looked at their change, and their business didn't want it either. That's something we can talk about too, because that happens sometimes. But we got them to separate out some of their technical stuff from their delivery for business. We said, let's just start with that, and they did. I haven't spoken to them in a while, but they were doing three releases a week last time I bumped into them. So they're not doing the hundred releases a day that some of the systems are, but they now have the ability to respond to threats. They have the ability to respond with fixes they might have to do as quickly as they like. Being practiced and fit for that is incredibly important to them.
Breaking up the business changes has taken longer, but it certainly happened. Particularly for anybody who's in a large legacy environment, those big systems tend to have an outsized effect on the others because they might be your book of records. They might constrain people's ability to change their own systems: "I can't, because those guys only change once every year, so I've got to put my changes around them." So you really have to tackle that dynamic in the organization. I guess that's a very long-winded way to say it's not the same for everybody, but we expect the same ambition from everybody.
Andy Allred (30:49): I think sometimes you really want to have a very huge and outrageous goal to drive that we have to think differently. Then as you were saying, other times, you need to have a much more constrained, well, let's be very careful with this case. How do you judge the difference? And how do you decide? Is it just every single team by team one at a time or do you have any guidelines that you would use?
David Keane (31:12): I would say largely those users would be made at a team level and in the role that I was in because it was enterprise wide and with eight and a half thousand systems. Each business line had a couple thousand systems each. So every department might have a hundred and so what I didn't say is it was you and you in charge of these systems, we didn't say every single one of your a hundred systems needs to double next year. What I would always said to you is, from a department point of view, I'm expecting you to hit your KRFW. 70% there, whatever looks good. You might look at your portfolio and say, that's legacy. We're going to shut it down. I'm not going to invested time and change, that one's high risk. So, maybe we've kicked that steady, that one over there, newer system, let's go for 10x on that one.
So we left the decision making around a lot of those things where we felt it should be, which is much closer to what the business impact is. You do that year on year, it changes the culture of the technology group as well as the businesses that they're working with. So, you might think in year one, fine, no one is going to argue with that. In year two you ramp it up. By year four or five, the department head and the business, they have an expectation that they can get releases in and they can do so quickly for everything. So I don't remember ever saying to somebody, you must, irrespective of risk. We left that decision locally but we had an expectation that, at a sufficiently high level in the organization, you would hit your numbers.
That had to be a bit real, because, as you mentioned earlier on, unless people have an ambition and it's difficult, they won't tackle the hard things. Lots of these things can't be solved in a year if you have to re-architect, if you have to shift to the cloud for some things. I think it's a bit of a nonsense to say to everybody, particularly in an environment like the one I was in, that complex, this all has to be done by Christmas. That just doesn't work. The funding often isn't there. The resources aren't there. There are business demands which have to be served and there are regulation demands which have to be served too. If you're consistent about it and you're changing the culture, those things become much easier in year six and seven than they were in year one and two.
Andy Allred (33:33): I guess I'm just looking for how do you balance a psychologically safe environment and saying, don't worry, Andy, it's okay. Just do the best you can and hey Andy, you really need to push yourself and work outside, think outside the box, come up with creative solutions and really push your teams hard. It's always a difficult balance to find there.
David Keane (33:56): Here’s one thing we did early on. We took one of our systems that was going through a service improvement program. So, it had been unreliable and there was lots of things creaking that shouldn't be creaking because they needed to be upgraded. It was a-- it depended largely on some sort of a vendor type of process. So we had to do from a four X to a seven X type upgrade. So it was a very significant upgrade. So there was a bunch of things going on. It was in intensive care and they had this SIP program that was gonna be funded. We were going to get to the end point. And at the same time, we had a system that didn't have those sorts of challenges, but it was quite similar.
If you were to look at its incident and problem numbers and all those service requests and everything else essentially against it. But we took a very different approach, in that we said you guys are double and a half. We want you to go at this and really challenge yourselves. I did an analysis of kind of all of their tickets. Let's say they started in January, finished in December, and if you look at the makeup of the tickets that were supporting the program, the system that was going through the SIP, I don't remember exactly but let's say 40% of tickets were incident tickets, 30% of tickets were service requests, 10% of tickets were problem tickets, 5% of tickets were change tickets.
Now I know that doesn't add up but it was that kind of order. It was a mixture, and change was a relatively small portion of it. Part of the target of that SIP was to reduce incidents by 25%. I think by the end of the year, when you get to December, they had done that: incidents reduced by 25%. So, they were down from 40% to 30% or whatever else it was but the makeup of all the other work was still there. So you had the same number of change requests, the same numbers proportionately of service requests, the rest of it. So the SIP was a success. If you look at what happened with the other application that went through more of a DevOps transformation, a pure thing, all of the numbers with the exception of the change number had gone away almost to zero.
On my log scale I was tracking that you had the same sort of five or 10% tickets were change tickets in January. In December they were 98.5%; your incident numbers were half a percent, your service requests-- Why did that happen? It happened for a few reasons. Service requests typically happen when the functionality doesn't exist in the system. So, you build that in a system and if you're delivering change multiple times during the week, you can hit the reasons for the service request. You kill those off basically over time. When you do all of the small things that you need to do for DevOps, version control your stuff, automate things, everything. How you structure your teams, getting a proper product owner in place. Those kinds of things will lend themselves to improving your stability. Then increasing your release frequency. So at the end of the year, essentially let's pretend you had a million dollars you were spending on one team and a million dollars you're spending on the other team. Well, the million dollars you're spending on the team that did the SIP.
You might be able to take off a couple of hundred bucks, a hundred bucks or something and say, you know what? We've less incidents, we don't need quite so much money. So we can reduce some costs there but you really haven't changed. If you looked at it a year later, it had drifted back to where it was. If you looked at the top team that went through the DevOps transformation, you were spending a million dollars in January largely servicing a system that was creaking. In December, you were spending a million dollars a month largely shifting new software into production. That's the trajectory you were on. It's an enormous difference. So when you were trying to demonstrate to people, and when we'd done this internally, it meant so much more than the academic research because they could believe it with all the headwinds that we had. So you were able to-- you were better prepared to convince people that from a safety and a cost point of view, this actually is a much more effective way of attacking the problem you're concerned about than the traditional service improvement program.
Andy Allred (37:52): Okay, I got my hands up here, Dave. I'm convinced I'm going to come to your talk. I'm going to be looking for you in the hallway afterwards.
David Keane (38:01): Okay. I look forward to that conversation.
Marc Dillon (38:04): Before I ask the two questions, I didn't capture all your numbers, David, what a glorious number fest that we had here. But I'm just gonna play back some somewhat incorrect numbers, completely out of context, just for our listeners. So, the company we were talking about, HSBC, had 250,000 people, which is larger than 50 countries; 50,000 engineers across over 70 countries; eight and a half thousand applications. A business line would have a couple of thousand systems, a business head a hundred. There were seven and a half centuries of wait time killed in this transformation on a 10 year journey supported by some madness, potentially eliminating the function of 110 approvers, getting from quarterly releases in some instances to a hundred releases a day, potentially, reducing incidents first by 25% and then by 98%. Then the bottom line here of if you go from a million on maintenance to investing a million upfront to being able to ship a million a month in new software as a kind of metric baseline. Did I get any of that right, David?
David Keane (39:13): Most of that was right. I was using the million dollars at the end as an example. I mean, I'm not sure what those systems' budgets were. The point that I was making was that if you're spending money on your technology department to deliver business, do you want them spending their time, and therefore having to pay for it, in order to support the system and how it creates its service requests and incidents? Or do you want to spend most of your money on your technologists to deliver new business functionality that will allow you to enhance your business offering and drive more profitability for the firm in a safer way? So, I think that that's a no-brainer for people. So being able to demonstrate that by numbers as we were in that particular case of just one-on-one. I wouldn't claim a 98% reduction on incidents.
We did year on year, about 20% are shown this kind of flatlining but that example is all about what do you want to spend your money on? I think in a newer way of working and we can talk about agile even with DevOps, digital transformations. I get exhausted with different ways of describing it but there is a much better way for us to operate than we did historically. That it's all about seizing those opportunities.
Marc Dillon (40:28): Excellent and beautifully put. Thank you for that close. We do have two questions that we ask everybody that comes on the podcast. At the moment, they are a thought experiment on leadership. So the thought experiment goes like this, that David, you're the leader. I am a trusted team member and we have the team around us. Probably including Andy, complaining about a problem. I say, David, boss, I'll take care of that. So the first question is, David, what do you say in this situation?
David Keane (40:58): I've been nervous about these two questions. Well, it's great that you are demonstrating that you want to take some ownership of it. I think one of the things in the teams that we try to encourage even before people become leaders is to take more responsibility for what they do. You build it, you run it, that part of the culture. So I think in so far as encouraging that, I think what you said is a good thing. I think the one thing that I wouldn't separate myself from is that I can't divorce myself or maybe even the team from their responsibility too. So, you know, I would want to understand a little bit more detail about it. I think if there's somebody coming to you and saying that they want to take on leadership and they want to do these things, I see that, first and foremost, as a positive, but you have to make sure there's, I suppose, a level of trust. I think you said they were a trusted person. So we know that it's not being done for reasons of hiding and these kinds of things. So I'm largely being encouraging but making sure that somebody isn't taking on more than they can handle. It's a journey.
Marc Dillon (42:01): Excellent. Well put. So the second and to me most important question is how do we get the rest of the team and Andy to stop complaining and to start being more like me in this case? If you can answer this, then we're-
Andy Allred (42:14): Good luck with that.
David Keane (42:16): I don't know how long you guys have been working together but that’s an impossible feat.
Marc Dillon (42:21): That was a new feature of this podcast, including Andy.
David Keane (42:24): Well I'll try and put it in the context of some of the things that we did, if you'd allow me. So we went from a very typical, traditionally siloed organization in a team where you'd have a bunch of people responsible for developing something, other people responsible for deployment, other people responsible for supporting it. Somebody else may be doing some analysis, and what have you, very separate. When you're separate, you point fingers at each other and you don't take responsibility. We went through an exercise and this started, it was a very powerful thing, with that team that I talked about in FX evolve. One thing we decided, first of all, to do was to break down those barriers and we did it in a really simple way.
So everybody started doing deployments, they had to learn. Everybody carried the pager, including the managers. Some of the developers felt it was beneath them to carry the pager, and they were thanked for the service and encouraged to find work elsewhere. In that way we changed, the culture of the organization changed. So I think what we're expecting of Andy is the same thing. It doesn't get you very far. It might win you one battle to blame somebody else and to hold something up and say no, it was his fault or her fault or whatever. That's short-termism. Really what we want is we want people to be collegiate and collaborative. It's easier said than done but by building in that culture where you're accountable and responsible. You're not handing over to other people. You're reducing the number of anchors. You build that culture. I think we'd encourage Andy to do that. We'd put it into his OKRs for the end of the year. We would watch him develop and blossom. No doubt.
Andy Allred (44:05): Fine. I'll carry the pager.
Marc Dillon (44:10): He actually fought everybody and took it away already. Thank you, David. It's been, this is absolutely my favorite podcast so far. I can't wait to have you on our stage at The DEVOPS Conference Scandinavia. Thanks, David.
David Keane (44:25): Thank you very much. Been a great experience. Been my best podcast too, my first. My first externally, so thank you. I'm really looking forward to coming to Stockholm on the 23rd and Copenhagen on the 25th.
Marc Dillon (44:41): Excellent. Thank you, Andy.
Andy Allred (44:42): Thank you. I will carry the pager. Don't worry. I'm fine with that.
Marc Dillon (44:46): This is The DEVOPS Conference pre-game podcast. We'll see you at the conference. Before we go, let's give our guest an opportunity to introduce themselves and tell you a little bit about who we are.
David Keane (45:00): Hi, my name's David Keane. I am the co-founder and head of DevOps at Calitii. Prior to this, I led what is perhaps the largest DevOps transformation in the world at HSBC, where we did an Agile and DevOps transformation across 50,000 engineers and eight and a half thousand systems. Look forward to speaking to you all in Copenhagen and Stockholm.
Marc Dillon (45:24): My name is Marc Dillon. I'm a lead consultant in the transformation business at Eficode.
Andy Allred (45:29): My name is Andy Allred and I'm doing platform engineering at Eficode.
Marc Dillon (45:33): Thank you for listening. If you enjoyed what you heard, please like and subscribe, it means the world to us. Also, check out our other interesting talks and tune in for our next episode. Take care of yourself. Remember, what really matters is everything we do with machines is to help humans.