Explore platform engineering in 2024 with Marc and Darren as they discuss starting from scratch by addressing past errors, standardized tooling, and securing management support, all while taking a balanced approach that focuses on investment, change resistance, and adaptability to new platforms. Join in the conversation at The DEVOPS Conference Global and experience a fantastic group of speakers alongside a worldwide live and online audience made up of practitioners and decision-makers.
Marc (00:05): But how do we have our stuff running in a production-like environment as quickly as possible so that we can understand not only if it works, but also does the customer behave in the way that we expect them to with the feature that somebody wants us to build?
Welcome to DevOps Sauna season four, the podcast where technology meets culture and security is the bridge that connects them.
Marc (00:36): Hello, Darren and Marc are back in the Sauna. Hello, Darren!
Darren (00:41): Hey, Marc.
Marc (00:42): We have a neat topic today. We're going to continue talking about platform engineering. And Darren you have a neat topic. What is it?
Darren (00:50): Yeah, I want to talk about platform engineering from the very start, from like Day Zero. How do you start? Where's your starting point in 2024? Because we have this interesting opportunity with the dawn of the name platform engineering. It's like a rebrand of the technical side of DevOps that allows us to maybe erase 10 years of history and poor tooling choices and change rules to ones that suit us better. Because if you are looking into starting to build an IDP now, every search you make is going to be polluted by 10 years of bad decisions. And now that platform engineering has become the overarching title for this, we can erase those, we can make new, hopefully less bad decisions and build a playbook out of those.
Marc (01:43): One of the cores that I always like to bring out at this point is that when the Agile Manifesto came out, it was people and interactions over process and tools, and when agile really got rolling, especially at the corporate level, where people had to tell people how to be agile, what ended up happening was every team did different process and tools, and people completely forgot about the people and interactions part. And then what we've ended up with is that at so many companies now, every team has their own Jenkins, or they have a GitLab under one desk as shadow IT that gets hit by the cleaning lady with a vacuum cleaner from time to time. And there's no standardization of anything. We're never really starting from scratch, what we're doing, right, is we have to push forward some kind of legacy.
Darren (02:30): Yeah, typically, I think it's very rare that anyone's starting from scratch. So maybe the title is a bit clickbait-y, and we're not talking about scratch, we're talking about starting fresh. But I do think the Agile Manifesto was interesting in that it did cause that huge fracturing of tool chains across the board, basically, the idea that everyone's managing their own tool chain, it just became the, in my opinion, the opposite of what it was supposed to inspire because obviously, it promoted a mindset, it promoted a work style, but it's like promoting a specific driving style, and then forgetting to build a car. So you just have people showing up with random spare parts on race day. That's how the ecosystem is, and that in my opinion, is what spawned platform engineering, this desire to take the tool sets and standardize them, push them all together without the baggage of the Agile Manifesto, maybe without as much of the culture, you can't completely separate them from the culture because then you just have the opposite of the Agile Manifesto, where you have the tooling is great, and the culture just completely ignores the tooling.
Marc (03:44): Okay, so let's get into the first steps. Investment, I think, is one of the most interesting ones, it's going to cost something. And I think one of the most interesting things is that people think about platform engineering, and they say, okay, well, do we need to hire some people? Or do we need to buy some licenses for some cloud-based tools? But I think one of the other interesting things to consider is that really what you're going to start doing is paying back the interest on your technical debt. Now, the good news is that one of the things that is a key advantage is that many surveys have gone out. And there's been a lot of research done. And many people believe that 40% of developers’ time has been spent on maintaining those tool chains that are different than every other team's and causing all of this fragmentation and confusion. It's not a 100% investment, but there is interest on that technical debt that's going to end up coming due because of the way that things have been done based upon each of those little silos that all of the teams have built themselves in.
Darren (04:48): Yeah, it's kind of curious with technical debt. You get the option with these platforms of paying it now or it's just going to end up crashing you, and I think you raised an interesting point about the 40% of time developers spend with these tools, which is 40% of the time they're not doing what they actually want to do, which is being developers. And it's been a discussion we've had previously, about maybe a short topic here, on psychological safety of people and happiness in the workplace. Having almost half your time being dropped into something you don't want to be doing is an excellent way to make [unintelligible] unhappy. It has this knock-on effect. It is all about investment, and not just monetary investment, but investment in the process, investment across the board, because if you don't have support for this change, if this is a small team trying to push this change outwards in all directions, then I don't know what your feelings are, Marc, but I think they're going to have a tough time, I think they're going to fail.
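To put the 40% figure from the discussion above into concrete terms, here is a minimal Python sketch of the arithmetic. Only the 40% share comes from the conversation; the team size and weekly hours are hypothetical numbers chosen purely for illustration, and the function name `toolchain_hours` is made up for this example.

```python
def toolchain_hours(team_size: int, hours_per_week: float, share: float = 0.40) -> float:
    """Weekly hours a team spends on tool-chain upkeep, given the share of time it takes."""
    return team_size * hours_per_week * share


# Hypothetical team: 10 developers working 40 hours a week (illustrative only).
weekly_total = 10 * 40
weekly_upkeep = toolchain_hours(team_size=10, hours_per_week=40)
print(f"{weekly_upkeep:.0f} of {weekly_total} weekly hours go to tool-chain upkeep")
```

Under those assumed numbers, the upkeep is the equivalent of four of the ten developers doing nothing but maintaining tool chains.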
Marc (05:51): I think this leads into one of the most important ideas here, which is that you've got to have management buy-in, and you need to have management buy-in not only from the technical side of the house, but also from the sales and business sides of the house. And there's a balance there. You reminded me of something: show me a company where the developers are just suffering under a mountain of technical debt, and I'll show you a company with short-term incentives based upon sales and revenue.
Darren (06:23): That's interesting. I don't get to deal with the sales side of things much. It's kind of interesting to see from that perspective.
Marc (06:30): Incentives being another part of this that is really, really important. And many times if companies, they may have a short term or a quarterly incentive, or maybe even they're a waterfall based company, which many-- every public company has to report quarterly. And most companies still have a yearly budgeting cycle and are not really agile in this department. So this putting down a bonus plan that at the end of the year, you're going to have sold X amount of product or made Y amount of EBITDA or something like that. And then having those motivations pushing against, wait a minute, you want to take a little bit of time every sprint or every two weeks or whatever in order to fix things that are already in the product that we're already selling? You want to go back? And what are you even talking about with this technical debt business? They push forward because they have incentives that say just sell, sell, sell, and never have the opportunity to look at how much of that debt is piling up.
Darren (07:28): But I have to say, if your company is operating like that, I think there are probably bigger changes that need to happen than platform engineering at this point. So in that case, it's going to be a struggle to push platform engineering anywhere because as we say, if you don't have the buy in from management, if you don't have the buy in from the developers, if you don't have the incentive for people to want to do this, it's not going to go anywhere because it does require consistent feedback from developers, it will require funding from management, it requires all these people to pull together. And if you're not ready to make that investment, that's going to be the most difficult hurdle to overcome. It's always about the matter of momentum, getting started is the most difficult part.
Marc (08:15): Absolutely. I'll add one last thing here, one of my favorite things about bigger changes in platform engineering, moving to new tool chains, taking one of the big platforms in to use, things like this is where it actually can be a positive lever for cultural change because now all of a sudden, developers have something like a 1.0 type of target that is very, very clear. Get your code over there and get it working. And that can provide an interesting amount of momentum towards making even more changes in the future, like developing new features, or spinning off a part of the product or breaking down monoliths, things like that. But one of the things here that is really fascinating to me is can you do this from the inside? Or do you need to find an external or a partner for that? What do you think, Darren?
Darren (09:08): I think it's going to depend. So I think in most cases, if you're attached to your legacy code, if you're like married to the code, if you don't want to make changes to this code, and you want a solution that fits around this code, then you're going to run into a lot of trouble. And one of the questions is going to be whether you can do it internally or not. So having an internal team pushing an IDP is great if you don't constrain them. And that's the thing. If you could do like an ideal IDP for your legacy code as it was, you would already have done it. You don't want to get held back by the idea that your code doesn't need to change. And I think we're probably going to go into that a bit more later because it's kind of an important topic, but generally speaking, I feel like if you are ready to make an investment on this side, then your team, most of the time, shouldn't be internal unless they are all on board, because it's very easy for them to get bogged down by old code, and by old practices, and by old processes.
Marc (10:12): One of the things that can help is if you make a really clean break. So we've built up these legacy tools. Jenkins is probably there, there's probably a lot of things that nobody would start a new project with today, but they were fine in their day. But if you're able to make a clean break into a new paradigm, then I think that that can be quite a powerful motivator. But it's difficult to take on something like this. GitHub or GitLab, for example, are the two biggest ones that we see most of our customers going towards now. If you take one of these, you want to try to find ways to facilitate getting everybody in there out of the box, as vanilla as possible, so that you have the opportunity to really understand what it is you need to do before you think that, okay, we need to have all of these different plugins and integrations and different types of actions and take care of every corner case, instead of just, let's try to get a standard tool chain up and running. And let's use that as a method not only for change, but also as a platform to enable future change.
Darren (11:21): Yes, so you get this consistent feedback loop of developers trying, developers discovering what their requirements are, developers then contributing to shape the tool. I think that's what you're leaning on. It's the idea that without the developer feedback your development platform is going to end up in an ideal and unusable state. And the ideal is very rarely in a functional system.
Marc (11:47): You prompted me in a really interesting direction, Darren. Let's go back to the working software over comprehensive documentation part. And let's think about that in an agile context in platform engineering. If we have a working platform, then we have something where the feedback loops are live and are real. If we instead are trying to architect or document the perfect platform before we go and build it, we're never going to get there. And we're forgetting one more of the base tenets of agile, which is let's focus on getting something working. And then we can understand really, what do we need to do in order to make it more ideal or less bad, or something like that, but let's focus on getting the first things working before we understand how right they are for our situation.
Darren (12:37): Yeah, it's all about the iterative loop. And it always seems to come back to that for most of these processes.
Marc (12:41): Yeah, and get it running before you optimize it.
Darren (12:45): But that also leads us into a topic that I think should be discussed is the focus of the process. Do you focus on tooling first? Or do you focus on the process? Because obviously, you need both in place. And we've talked now a couple of times about the big platforms about GitLab, GitHub. And can you take one of those platforms and put it on a pedestal and say this is the target we're shooting for? Or do you then end up vulnerable to tooling bias? What do you think about that?
Marc (13:14): I think that the thing that's neat here is you can remarkably find a lot of companies doing most things in a similar fashion, be they modern, or not quite as modern. And when you take those things, and then you just push them up a little bit in terms of capabilities, then you end up with a nearly standard reference model that almost anybody can use. Some companies will do shrink wrap software, or big box or something like that. Some companies are doing software as a service, some companies are doing embedded, and all of them will say that they have a special case because well, we want to deploy four times a day, or well, we can only deploy via printing a CD, or we only deploy via over the air updates on to micro-controllers all over the world. And when you look at all of those, the destination is software running on a device and the device doesn't necessarily have to be in the customer's hands in order to be validated. And if I think in terms of what the end goal is, the end goal is as a developer, I want to get fast feedback to know while I'm still in the context that my changes are okay or not, I want to get some medium term feedback because I understand that we're not going to be able to run all of that stuff in the 15 minutes that it takes for me to get up and get a cup of coffee and come back to my desk. And I want to run stuff on a target environment that is similar to what the customers are running on. So if I think in those terms, then most tool chains will support that. And all of the processes need to support that. So for me, I've been kind of funny in my life that I look at the tools as facilitating the process and the process is like an extension of the tools and I don't separate them as much as many people do because a good pipeline, I'm going to push a change, and I'm going to get feedback. 
And if I get negative feedback, then it's going to let me know that I didn't update the test case that I need to update in order to get positive feedback. And I think that once you think like this, you start to think in terms of tools and process as things that are intertwined and that are really trying to support, how do you amplify, to use Gene Kim's words, amplify those feedback loops as quickly as possible, so that while still in context, a developer can understand the quality of their work.
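The fast and medium-term feedback loops Marc describes can be sketched as a simple split of pipeline checks against a time budget. This is a minimal illustration, not a description of any particular CI system: the 15-minute budget comes from the coffee-break remark above, while the `Check` type, the `split_by_budget` function, the check names, and their durations are all hypothetical, and the sketch assumes the checks run one after another.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    minutes: float  # estimated run time (hypothetical figures below)


def split_by_budget(checks, fast_budget_minutes=15.0):
    """Fill the fast tier with the shortest checks until the budget is used up.

    Anything that does not fit goes to the medium-term tier.
    """
    fast, medium = [], []
    used = 0.0
    for check in sorted(checks, key=lambda c: c.minutes):
        if used + check.minutes <= fast_budget_minutes:
            fast.append(check)
            used += check.minutes
        else:
            medium.append(check)
    return fast, medium


# Hypothetical checks and durations, for illustration only.
checks = [
    Check("unit tests", 4),
    Check("lint", 1),
    Check("integration tests", 40),
    Check("target-environment smoke test", 8),
]
fast, medium = split_by_budget(checks)
print("fast:", [c.name for c in fast])
print("medium:", [c.name for c in medium])
```

With these made-up durations, everything except the integration tests fits inside the 15-minute window, so a developer would hear back on most checks while still in context.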
Darren (15:37): I like that. That's a good answer. It kind of leans into this idea that your special cases are not that special, your unique requirements are actually pretty common requirements across a certain percentage of other clients who are doing completely different things. And I think people get a bit lost in their own edge cases at times, they get lost in these ideas of, this is such a tricky requirement that no non-customized IDP could fit it. I think you're right. As the large platforms become more and more widespread, they are adopting processes that eliminate a lot of these outlying problems. And as that happens, I think maybe the tooling-centric approach is valid. You're talking about this merging of processes and tooling. And this is interesting to me because to me, what the idea of platform engineering has been is the separation of process from tooling. Platform engineering from my perspective is about making sure the tooling is there, and letting someone else worry about the process. But I feel like in that way, all that's going to happen is things are going to go wrong. So this merging of everything is quite interesting. I'm hoping people take that to heart because otherwise, if you end up trying to make this golden platform have the perfect tools, the perfect platforms, it ends up being unusable. I'm wondering, because you deal more with the consulting side, is that something you think is fair at the moment, that platform engineering teams are looking at making too-perfect tooling?
Marc (17:19): Absolutely true that people are unfortunately thinking about the ultimate solution rather than the solution that they need next, or that they need first. So this is one part, and this kind of comes back to that idea of let's get it running before we optimize it. And one of the lean principles is we need to run it long enough to understand it before we start changing it again, and things like this. But one thing that I see, not only in consulting, but also with our different types of customers, is oftentimes what will happen is, okay, so you guys are going to take care of the tooling. And we're not going to deal with the processes right now. Or you guys are going to deal with the processes, and somebody else is going to deal with the tooling. And not necessarily taking these things together. I started out as a developer like many people, and I understand abstraction. And I understand, let's have an independent tool chain that can support many different types of processes and things like this. But at the end of the day, we're still talking about how do we discover the assets in our code? How do we leverage those? How do we get fast feedback on that? And how do we have our code running in a production-like environment, if not production itself, in a safe manner, blue-green or canary or whatever, but how do we have our stuff running in a production-like environment as quickly as possible, so that we can understand not only if it works, but also does the customer behave in the way that we expect them to with the feature that somebody wants us to build. So when I think of those, it's like, every platform should be able to support that. And it should have enough variance that you can deal with the team that does the front end things that are more customer facing and customer touching, and the back end things that need to be stable over longer periods of time, and the things that change a lot, and the things that change less often. 
All platforms should be able to take care of these things. And to be frank, you can pull a set of tools off the shelf today, and be able to support any of those types of things, basically, from the same couple of dozen tools that you mix and match into a tool chain of, let’s say 10 tools. So I see it like this, that you're absolutely right, but let's solve these problems together to make sure that we're solving the right problems.
Darren (19:40): Yeah, I'm glad to hear people talking like that because in the literature I read, it sounds like a separation, and as a purely technical person, there is a part of me that is obviously interested in that, being unrestrained by process. Well, I think there's also some things to talk about here in that we get caught up in both the process and the tooling and we don't really talk about the software that's being sent through it. And there's this idea that if your software is there, the IDP and the process should mold themselves around it. Well, actually, when coming into one of these platforms, your software also needs to change, and your software needs to be ready and valid for this process, for this change. And it's not going to be that you're just putting a set of wheels under your current software that gets it where it's going faster. That dream is just not going to play out. Your software also has to be in a valid condition for being put through platform engineering tools, for being put through pipelines. And if you're running some legacy monolith that takes six hours to compile in your pipelines, then you're going to lose all the rapidity of feedback that you need to make this tool successful, to make this process successful. So on that subject, I think people need to not underestimate how much the software needs to be altered, or how much their software also needs to be in the right form.
Marc (21:12): And you drove a really interesting thing through there, which is release frequency. And if we think in terms of this legacy that you were talking about, that takes six hours to build, and you say, okay, we want to set a target. We're in Scandinavia, so everything has to happen before midsummer, or it's not going to happen this year. So one of the things is like, okay, before midsummer, let's bisect that build four times, right? Do the math for me, Darren, for that: three hours, an hour and a half, 45 minutes, and 22 and a half minutes.
Darren (21:51): Twenty-two and a half. Yeah.
Marc (21:22): All right. So let's get the software from six hours to 22 and a half minutes. And let's do that in modern tooling. And now, how do we do that? There's one thing in software that's been true since I was a child, essentially: there's nothing more permanent than temporary. So what we want to do is a proof of concept that we can build the software in 22 minutes instead of in six hours. And we're going to need to form a task force of some of our best guys. And we're going to tell them to do a couple of sprints' worth of spikes maybe, and do that in a modern tool chain. And let's not even say tool chain, let's do that in a modern single platform solution without the bells and whistles and say, okay, look at what we just did, we bisected our build time four times, we got from six hours to 22 and a half minutes in a month with a small group of people. And then you do the math on that. And you think about how big an improvement that would be for people waiting for stuff to compile, or people throwing a bunch of code at the nightly build, and then leaving and coming back the next day. And the build is always broken. And build boxes are always red or whatever it is that they're running. So those are the types of things where I like to throw the gauntlet out there and say, let's go for this. And let's use this tool change or this platform change in order to facilitate that. And then keep those types of targets on the ball where we have a clear thing that everybody needs to focus towards, and say, okay, these guys just proved that it's possible to build the software in 22 and a half minutes. Now, we need something from everybody on every team in order to make that a production reality type of thing, and then get that momentum going. And then we can add the scanning tools and whatever else, all the bells and whistles in order to have a really good secure compliant tool chain up and running for the next release.
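The build-time target discussed above is plain repeated halving, which a few lines of Python can confirm. The six-hour starting point and the four halvings come from the conversation; the function name `halve_repeatedly` is made up for this sketch.

```python
def halve_repeatedly(start_minutes: float, times: int) -> list[float]:
    """Return the build time in minutes after each successive halving."""
    durations = []
    current = start_minutes
    for _ in range(times):
        current /= 2
        durations.append(current)
    return durations


# Six-hour build, halved four times, as in the discussion.
steps = halve_repeatedly(6 * 60, 4)
print(steps)  # [180.0, 90.0, 45.0, 22.5]

# Overall speed-up from the original build to the final target.
speedup = (6 * 60) / steps[-1]
print(speedup)  # 16.0
```

Four halvings amount to a sixteen-fold reduction, which is why the target reads as a stretch goal for a small task force rather than a routine optimization.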
Darren (23:45): I just think it's interesting how no matter where we go with this topic, we always come back to that idea of amplifying feedback of getting a larger scale, or let's say getting larger amounts of feedback, getting it faster, getting it to people's hands. So it's quite interesting that we can just maybe talk about everything as a scale of how developers receive that feedback, and how developers can integrate that feedback into their work and erase that waiting period, erase the time between them because you're a coder, we've both done this work. And the thing that you get into in such a way that you don't want to be interrupted by massive delays. If you're actually interested in a project and working on that project, you want to stay inside that project, you want to stay focused on it. So getting this feedback loop, getting it as rapid as possible and as useful as possible. It has to be the primary goal in my opinion.
Marc (24:50): I completely agree. I think that's a really interesting place to park this week's episode. I think what we've talked about is investing.
Darren (25:02): Yes. So when you're investing, you basically have to be all in at all levels. You can't half do platform engineering.
Marc (25:09): All right. Is it an inside job?
Darren (25:13): Rarely, especially if you're trying to cling to your legacy code.
Marc (25:17): Tool-centric, process-centric?
Darren (25:19): It probably can be tool-centric because the large platforms are a lot more flexible than you think. And your unique cases are probably not as unique as you believe they are.
Marc (25:32): Excellent. And I'll add, please don't underestimate how much your software needs to change. And next week, I think we would like to talk about how we manage resistance in platform engineering, especially when doing these big changes. Would you like to try that next time, Darren?
Darren (25:49): Yeah, I think that warrants its own conversation because I think we're going to have a lot to say about managing resistance.
Marc (25:56): Absolutely. Hey, Darren, thank you for this week's discussion on platform engineering. This has been really cool.
Darren (26:03): Yes. It's been a pleasure as always.
Marc (26:05): All right. Thanks. And see you next time. We'll now tell you a little bit about who we are. Hi, I'm Marc Dillon, lead consultant at Eficode in the advisory and coaching team and I specialize in enterprise transformations.
Darren (26:21): Hey, I'm Darren Richardson, security architect at Eficode. And I work to ensure the security of our managed services offerings.
Marc (26:28): If you like what you hear, please like, rate, and subscribe on your favorite podcast platform. It means the world to us.