Why System Failures Make The Best Learning Opportunities! - Interview With Michael K.

In this talk with Michael K., he shares his perspective on why the high-priority P0 scenarios are some of the best to learn from... despite being high pressure and high stress! Everyone has something to bring to the table, and everyone has something they can learn -- regardless of your title or your seniority. Thanks for this awesome discussion, Michael!
P0 or Sev0 incidents. These are things we often want to avoid because something bad has happened. But when I got talking to my guest, Michael K., we looked at them as unique and very powerful learning experiences. And if you're a more junior engineer, these can be some of the best opportunities to learn about the whole system around you, because the other engineers you're with are also learning incredible details at the same time. So I think this is a really interesting conversation and a helpful perspective for people to have, and I think you're really going to enjoy it. So sit back, enjoy, and I'll see you next time. >> Michael, thanks for joining me. If you don't mind, kick us off with a little bit of your career journey. As I mentioned before the call started, as early as you'd like to go would be great, but you don't have to start at a particular point. >> Okay. So I did not study computer science or computer engineering or any of that in college. I think I started as a computer engineering major, but I liked partying too much to really have that rigor. So I switched to what they called imaginary engineering, which is really industrial engineering. >> And that's what my degrees are in. I found that I was doing a lot of programming in those courses, mostly because the professors I had were big on simulation. So, simulate this factory, simulate this machine. And I really enjoyed it: okay, let's see how we can make this simulation as efficient as possible. Let's figure out this factory floor. There was a factory in Skokie where we needed to move the machines around and change it. What's that going to look like? What's the WIP going to look like? All that kind of stuff. So I got out of college and got my first job at a consulting company.
Um interviewed at Microsoft on campus and they were like tell us what a linked list is. And I was like I've never had to use that and I don't know what you're talking about. So I learned what a linked list was in my interview with Microsoft after as I was graduating. >> Um >> so did consulting for a while. Uh started off as a Java developer. Um the that sort of came to a an abrupt halt uh at like shortly after 9/11 like I think the Friday after 911. So that's how bad this is going. Um spent a lot of time I mean they had a pretty nice severance package but like >> a couple months like spent a lot of time after that unemployed uh and unable to find a job. Uh so much because that the economy had just pretty much grown to a halt at that point. Um, I had like I was to the point where I was going to Target and Barnes & Noble uh and applying for like stocking jobs and like I need I need to be able to like pay my mortgage because I had already had because the days before 911 were pretty heavy. Like they were I mean it was also like pre.com bust, pre- all that. So like I had enough money to put a money down on a condo. Um, but like pretty like 9/11 happened, the whole economy grinds to halt. I'm like, I need to be able to like maybe make this stretch 6 months, 8 months. Um, you know, I went in and and filled out the the little floor worker thing at Target. Uh, and Barnes & Noble. Uh, Target never got back to me. Uh, Barnes & Noble got back to me, I think, six months after I had actually started my next job. So it was yeah it was it was fun. I was like Target doesn't even want me which isn't you know to disparrage Target but it was like I got a master's degree. I'm pretty sure I can like you know hang >> in terms of the qualifications it's pro it feels like there's a discrepancy. Yeah. >> Yeah. Um but uh so then an old manager from the consulting company had found a place and he called me up and I went to work there. Uh it was a startup. 
Most of the companies I've worked for have been pretty small. It was a startup. I think there were maybe 80 people there, basically doing marketing websites for law firms. So, you know, the AmLaw 100, the big giant law firms, they all have websites, and it's a very big ego thing, I would say. Everybody wants the nicest website so they can compare it at the conferences. Not a lot of traffic, I would say. A fair amount, but mostly from people looking themselves up. >> Okay. >> And then I stayed there for about eight years. >> Yeah, I think about eight years. And I had just gotten burned out on it. I didn't want to care about whether or not some lawyer's photo was showing up on the website correctly, sort of framed. And these were the things that were big deals. They would call up and say, "Why is my picture not showing? Why does it look skewed?" And it's like... >> Like, it's hard to be motivated by what the actual outcome of the work is. Okay. Yeah. >> Exactly. Exactly. So I had been working on a side project for a while that had started to get some traction. I started to get a couple of clients, and my wife had finished school at that point, so she had an income and insurance, and I had very good money from a bonus I had gotten in 2007. So in 2008 I said, "See you," and started my own company. That was right before the 2008 financial crisis, which was another sort of dismal economic event. So that was a lot of stress. Everything since running my own company has felt like a vacation in terms of a job. >> Before we move on, this is just a reminder that I do have courses available on Dometrain if you want to level up in your C# programming. If you head over to Dometrain, you can see that I have a course bundle that has my getting started and deep dive courses on C#.
Between the two of these, that's 11 hours of programming in the C# language, taking you from absolutely no programming experience to being able to build basic applications. You'll learn everything about variables, loops, a bit of async programming, and object-oriented programming as well. Make sure to check it out. >> Like, 100%. Nothing will compare to that stress at all. Once we started hiring people, I think we grew to about 12 people at the biggest. It was just, you know, how are we going to make payroll? Who's not paying their invoices? Who do we have to disappoint today in terms of the clients, because we only have so many people and they want all these things? So it was very stressful. And then my wife and I had our second child, and it was like, okay, I can't dedicate the amount of time I need to dedicate to this startup to keep it running. So I basically shut that down. There was no acquisition, there was no buyout, there was nothing. We gradually, over a period of time, let people go, and then I had one employee into, like, 2012, I think, who was still servicing one client, but they left, and then the client was just like, "Okay, we'll figure something out." So anyway... no, that wasn't 2012, that must have been like 2018. So in 2016 I shut the company down. >> And then, so that was eight years, and then I got my first foray into healthcare at that point. For the amount of money that gets poured into healthcare >> and healthcare technology, it still amazes me, even to this day, years later, how bad some of it actually is. And I don't mean to say your data is not secure, all that stuff, but the underpinnings of what the data look like, how bad it is, >> how it's miscollected and miscategorized.
It's just like we should be better at this. >> Like there's no consistency. It's like there's your data is flowing but it's like it's just a mess from all these different sources. Yeah. like the amount of like one of the things that we that that company was working on is like letting patients know before they go into the doctor's office exactly what's going to happen, right? So, let's look at their history. Let's look at, you know, their medical history, their charts, like the reason for the visit, all that kind of stuff, and come up with some sort of communication blurb that we can send to the patient to let them know, hey, when this when you show up, this this is what's going to happen. because the the hypothesis of the business was um people don't show up for appointments because they don't know what's going to happen and they don't their their default is it must not be important >> right I see u so true hypothesis you know I I think but it was just like how do we get how do we get a narrative created uh now with like chat GPT years later that would be very very easy uh we might you know but like back in the day like in in 2016. I don't know that they would be that something. So, I was there for a couple years. Uh had kind of a falling out with the founders. I was the founding engineer there. Um and then or I was the first engineer, let me put it that. >> Sure. Yeah. >> Uh and then, uh left there, went and worked for another startup in Chicago that did logistics. Um like actual fulfillment logistics. It was my I I I did a little like seven-year stint in logistics. Um great company. loved the company. Uh like would be there today if not for like one or two decisions that they had made that just did not did not jive. Um they uh they never had a CTO. So I reported directly to the VP. Uh and somebody convinced them they needed a CTO. Uh and since they were a logistics company, they went let's bring in a CTO from Amazon. 
Uh and you know, some of the horror stories about Amazon are not true. uh some of them are and just the way of working and the way of thinking and the level of you know top down sort of control that is had there um I just I you know again I was in a fortunate financial position where I could do this but I was just like either you're right about this or I am and either way I need to get out of your way right like I just I just >> like at some point it doesn't matter who's right it's like the it's too it's just too much friction to to operate so Yeah. And the, you know, the guy and eventually the the guy that hired the CTO and everybody else was like, you know, we gota disagree and comm or uh was it disagree and commit is their big phrase. >> Uh it's like no, no, I don't. Like I can go find another job. Like this isn't the army. I can leave whenever I want. >> Um so that's that's that's essentially what happened. There's parts of that that I like I definitely align with like the disagree and commit, but you can't just say that as a blanket statement for absolutely everything if you literally are not comfortable with committing to it. So >> like how do you like disagree and commit? Okay, like you can say the sky is blue and I can say it's Azure, right? Uh like there are there are there's a a limit there >> but how big that disagreement can be. Uh, and once you cross it like and you realize who these who the the person is that you're talking to, it's like, "Oh, you're that guy." >> Yep. >> Crap. >> In which case, time to move. >> Yeah, >> it's time to move. Um, went from there to another logistics company, stayed there for about three years, uh, trying to, you know, trying to get some work done there. Uh, they ended up, uh, it was like a lastditch effort to like save a particular business line. Um and uh that didn't that didn't turn out to be the case. Uh like it didn't it just didn't work. 
Uh the uh they had a few clients that were sort of like allowing the whole experiment to continue. Uh and when those clients said, "Okay, we're going to go our own way." Then like the whole thing just sort of >> I see shut down. Um which you know happens, right? I mean like you know we I I could probably talk about that for an hour or two but uh learned a lot of stuff there. Um you know that was I think that was the first place I was at where I had stopped coding right. So even previously even even previously in my last last role uh I was still pushing code to prod um you know I was still still part of the dev the dev team uh felt like I could get my hands dirty. Uh this place u you know based on the technology stack that they had I was just so inexperienced with it. Um, you know, I I had been, you know, from 2002, I think, when .NET the first version of .NET dropped, like I'd been a net person set until like and I still am, but that was a little window. That's right. >> It um they were a uh they were a Java shop on AWS and >> I'm so sorry. Yeah. >> Yeah. It wasn't bad. Like they just they they the technology was really fine. um you know curly braces on the wrong lines though. It's just and they don't have the same casing. It's not okay. >> And there's a factory factory every third line. It's just not good. Um but the uh I mean the technology was fine. It was more like you know how do we find product market fit for this business? It was essentially an acquired business like for this within our org. 
And we tried and we tried and we tried, and, you know, ran into political stuff inside the company and a whole bunch of stuff, and finally that one client left and they were just like, "You know what, let's kill it. There are other options." Which brought me to where I am today, which is Aspen Dental, or the TAG group. Aspen Dental is one of, like, five companies under... you've probably seen their ads on TV or YouTube or whatever, right? One of five companies underneath the Aspen Group umbrella. But really it's dentistry. Not dentistry as a service, but I think the original founding proposition was dentistry for... oh Christ, now I can't remember. What was the company where you get eyeglasses in about an hour? LensCrafters. They started off as the dentistry version of LensCrafters. >> The brand is so strong that I forgot the company name. But when you said that, I was like, I know exactly what you're talking about, I just can't remember the name. That's how strong the brand is. The message... >> Anybody who needed glasses knew LensCrafters, because in a pinch you're like, "I broke my glasses. What do I do?" You can get a new pair in an hour. So that's where I am now. I work mostly on the clinical side of the house. So, writing the software: when you go into the dentist and you see the image of your teeth with all the problems on it, that's actually called an odontogram. I didn't know that until about a year ago. And how the doctor records all that stuff, how they track the progress. We're using a little bit of AI now to help the doctor identify issues with the teeth.
Um, and then the process of like, you know, if you do need something implanted, we call it like an appliance like a crown or dentures or whatever, like the manufacturing process for that is also the I I my teams I should say uh manage all of like all those workflows. Uh, so I've been there for about a year. Uh, it's been it's been good. Uh, there have been a couple Pzer uh both from a leadership perspective leading into this. uh there's there's been both you know P 0 from a leadership perspective you know a month after I started the CTO was uh was asked to leave and the CIO who had just recently started I think sort of took over like the entire dev shop um which was a huge change for the people in the dev shop um like different personality entirely >> uh different cadence different you know >> different expectations for what >> and how long after you started story I know you just mentioned it but you said you had started in just after that they change or right before >> so just after. So I started in June of last year and I think it was like late June. Uh because one of the things I've learned frankly like the tip to everybody like if you're going to leave a job or negotiate an exit date have it towards like late in the month. And the reason I say that is your benefits, your health care specifically, but your benefits will carry until the end of the calendar month regardless of when you leave. Um, but if you can start your next job like in that same month, then usually there's a 30day grace period uh like a 30-day window before, uh, your insurance kicks in. Uh, so you only have like if you time it just right, you only have to pay for a month of Cobra. Uh, and yes, that's one of those things I've had to learn, uh, like the hard way of just like, oh, because Cobra's not cheap. Um, but in any event, sorry. Uh, so I started in, uh, late June of 2024, and that transition happened, I think August 2nd of 2024. So, >> I was there for like a month. >> Yeah. 
Like I was just getting used to like, okay, what's the vibe in the office? Like, how do people like to work? Uh and then that like we we came in I think it was on like a Tuesday you know we we came in that Wednesday and it was like everything's different. It was like okay let's figure out how this goes now. Uh and you know we had a number like speaking of Pzer right we had a number of people who just said you know what I don't want to be a part of this uh and and exited themselves or Wow. >> Uh yeah uh there was >> so that dramatic of a change then. Yeah. >> Yeah. Well, it it was I think it was both, you know, who the old guy was and who the new guy was. It was very sort of uncertain. Uh there were a lot of people that had followed the old guy to the company from a previous company. So, there was a lot of loyalty there, right? >> Um which is usually why I don't like doing that kind of thing because like you lose one, you lose five people. Um but, uh so yeah, it was it was a substantial change. I'd say it probably knocked us back on our feet as a as an org uh for at least two months of like okay like what are we doing? Who's in charge? Who's setting the priorities? uh you know which honestly like I there was a call I I think I saw one of your uh you you were speaking to a young woman about like code debt or like tech debt and like not going rogue and >> making sure everything's communicated like those little windows of like there's two months of chaos like nobody knows what's going on. Nobody knows what the alignment is. It was like strike while the iron's hot guys. Like what are the two months? What are the two thing? What are the things that you can get done in two months that like they wouldn't otherwise fund but like we can get done? Uh and we got some great stuff done. 
So, in any event, uh, >> on that note, that's super interesting, though, because that must imply that there is, and I'm not I guess you weren't suggesting otherwise, but to me, that seems to imply that there was actually like a lot of um I don't know what the right word is, like passion, I don't think, is maybe the right word, but that people were super motivated. There's like, we want to get some really awesome stuff done. Like, let's let's go do it. Like, we have this chance. Like, that's really cool. >> Yeah. Yeah. Yeah. No, I mean I I think the like I have I've never worked for a big tech company. Like I just haven't like I've worked for companies where tech is sort of like a uh almost always a cost center. Uh where >> and like your your hands just get tied a little bit, right? Like you can you can want to experiment with stuff, but unless you can find a way to like sort of shoehorn it into a project or like get it like get approval from somebody that you've somehow sweet talked like they care about like the company I work for right now cares about the dentists. Like they don't care about my resume. They don't care about you know whether or not I have to be up until midnight every night fixing a bug. I mean on some level they do. Uh but like they want like they want stability. they don't want the the shiny new toy. Uh and if anything, they don't want the shiny new toy, right? Uh so having that little window in there was like, "Oh, what if we experimented with this?" Like what is what is, you know, tell me the the new minimized APIs from Microsoft. Like they're not going to give us time to go through and refactor stuff to that, >> right? What's the benefit of that going to be? Like make a business case for it. If it's not going to save us money or make us money, like >> let's not do it. >> Why do it? Yeah. Exactly. 
Uh which is why like you know some places I mean the not not the place I'm at but some places that I've worked at are still running VBScript or like VBScript sites internally right like the oldasp files and it's like if it works it works you know if it's not if it's not the bottleneck in a business function no one's going to touch it. But that's that's a really important point though, right? And I think like to reiterate on this because I know that from conversations online, people get completely bent out of shape on this, but I I think until you see like more of the business side of things, like you you may not realize like why that's the case. And I'm not saying that we should never refactor things or never address tech or never experiment. I'm not saying that at all. But to to be of the mindset like we always want to go like perfecting code or we have to grab the latest thing like this can be a huge detriment to a company if you're in a position where you're creating costs or introducing risks instead of like we're a business and you said it earlier being in this situation where founders are like how do we get the next paycheck to people like >> what you're going to go refactor to minimal APIs because it's new and shiny like come on No, you're not. It >> It starts to seem ridiculous when you when you put those pieces together, but I always try to remind people that if you're working somewhere and it feels like as a developer and you're feeling like I we never get to do tech debt, we never get to do anything else like you need to start trying to pitch these things as like a business motivated. >> 100%. Yes. 100%. 
Like, one of the things... I don't want to get too much into the chain of whatever, but >> the company I was working at was spending... well, I can talk about this in general. >> Software engineers these days, I feel like they've started to pay less and less attention to the database, just because either the company they work for has a DBA or a couple of DBAs who are supposed to take care of that kind of thing, or they're using an ORM or something like that where they don't really have to understand what's going on. >> I mean, >> there's data going in and data coming out, and maybe some relationships, but not the... >> That's it. But usually what happens in those cases, and it's a false choice that gets made by either DevOps or old-school operations people, is they're like, "The engineers have done everything they can. We cannot ask them to go in and do more. So now we need to go to 64 cores on our database server, or 96 cores on our database server." And it's like, yeah, no. There's a part of that feedback loop there... >> Yeah. >> where you yourself can be like, "Hey, you need to do some index tuning," or whatever. And one of the big ways I've been able to impact a couple of different companies over the years is to >> drastically reduce their database costs. >> I'm not a database guy at all. The chief data people I've worked with would laugh at this idea. But I'm just like, "Hey, why are we joining to this?" "Well, the ORM's doing it." "Yeah, but why? >> It doesn't make sense." >> "Because it works, right?" >> "Yeah, it works. Yeah, but you're spending a lot of money here." >> Yeah.
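[Editor's sketch] The "index tuning instead of more cores" point can be illustrated with a small experiment. This is only a minimal sketch using Python's built-in sqlite3 module; the `orders` table, its columns, and the index name are hypothetical and do not come from the conversation. It asks the database how it plans to run a query before and after an index is added.

```python
import sqlite3

# Hypothetical schema: an `orders` table that is always looked up by customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Without an index the engine has to scan the whole table for every lookup.
plan_before = query_plan(conn, QUERY, (42,))

# With an index on the filtered column it can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies by SQLite version, but the first plan reports a table scan and the second reports a search that uses the new index. The same question ("why is the engine doing this much work?") applies to queries an ORM generates, whatever the database.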
So in any event, those are the windows that you have where you need to find a way to justify that as tech debt, right? And it's not as simple as refactoring a method to stop using AutoMapper because they're now a company, or whatever the new deal is. You need to reach across boundaries and be like, "Hey, how are we affecting you? Let's fix that." So that you can come out of that chaos, one, looking great to upper management because you've just saved them a couple million dollars, but two, your software is better, fundamentally better. >> Anyway. >> So yeah, you started talking about P0s, and I know that before we got recording here, you had mentioned people versus technology P0s. So this first one that you started introducing here >> was a people P0, where basically there was a huge change, and not only with a person in leadership changing, but then people leaving, and it's just very different. And maybe it's worth saying, from your perspective, how you define what a P0 is. For context, and I didn't say this before we started recording, when I say P0, I think what we're talking about is closer to a Sev0 in the language that I would use. The only problem with that is it's heavily anchored to an incident, specifically a live site incident for us. But I wanted to bring that up because I think how you were defining P0 was very interesting, in that it's not just a technology thing. So do you want to take a moment and walk through how you were framing that? >> So basically, how I frame a P0 is that it's a point at which a fundamental assumption >> about the nature of a relationship between two things is revealed to be untrue. Right?
So there's some assumption that was made by somebody somewhere, and maybe it was 15 years ago, maybe it was five minutes ago, that all of a sudden just blows up. It's like, oh, that's not true at all. >> And now some effort needs to be made to fix it, right? And for anybody who doesn't know, Sev0 and P0, I think, are synonymous. But there are multiple layers here, right? So for us, a P3 is: something's broken, there's a workaround, it's a small number of users. Get it out on your normal release cycle. Don't even interrupt the sprint. A Sev2, or a P2, or I guess P3, is: there's something that's broken and there's not an easy workaround. Release a hot fix, maybe, depending on how disruptive that would be. A P2 would be: there's something broken for a large number of people on a critical path through the software. So, the example where I am right now: if people couldn't order dentures anymore for whatever reason, right? Or we couldn't get estimates from the insurance companies for whatever reason. >> Like, keeping actual scenarios from happening. >> Yes. Yeah. And it's hundreds or thousands of people being impacted. >> Yeah. It's not like one user's scenario is not working. It's a primary scenario. >> Yeah. Exactly. And then a P1 is: the whole thing's down. Literally somebody threw a message in a Teams channel or a Slack channel and said, "Hey, what's wrong with the site?" And that's literally all you know. And you get there and you're like, "Oh, nothing works. >> Nothing's working. >> Like, at all. What has happened?" And it very much becomes a murder mystery at that point, right?
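[Editor's sketch] The tiers described above can be written down as a small triage function. This is a rough sketch, not the team's actual policy: the tier names, the `Incident` fields, and the 100-user threshold are all assumptions made for illustration, and the tiers are deliberately not mapped to P-numbers because the spoken labels overlap.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Illustrative names only; higher value means more urgent.
    ROUTINE = 1        # workaround exists, few users: ship in the normal release cycle
    HOTFIX = 2         # no easy workaround: consider a hot fix
    CRITICAL_PATH = 3  # many users blocked on a critical path
    OUTAGE = 4         # nothing works


@dataclass
class Incident:
    site_down: bool
    on_critical_path: bool
    users_affected: int
    has_workaround: bool


def classify(incident, large_user_threshold=100):
    """Map an incident to a tier, checking the most severe condition first."""
    if incident.site_down:
        return Severity.OUTAGE
    if incident.on_critical_path and incident.users_affected >= large_user_threshold:
        return Severity.CRITICAL_PATH
    if not incident.has_workaround:
        return Severity.HOTFIX
    return Severity.ROUTINE


# Example: ordering is broken for thousands of users, but the site is still up.
ordering_broken = Incident(
    site_down=False, on_critical_path=True, users_affected=2000, has_workaround=False
)
print(classify(ordering_broken).name)
```

Checking the most severe condition first means an incident that matches several descriptions always lands in the highest applicable tier.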
Like, what changed? And different groups break off sort of organically and try to investigate different things. The networking team will go off and look at the networking logs, or the engineers typically will open up Spre and see what the last 15 releases were and how that could have possibly changed something. There are planned outages, but then there are unplanned outages, right? And then, in a really juicy P1, you start to see how this person made this change and this person made this change, and either one of them independently is okay, right? But they made them at the same time, and that caused some sort of cache collision, or that changed how we're reading stuff on LaunchDarkly, or whatever it is, and then the world just falls apart, right? And, you know, we were talking about the Google thing from two months ago, or I think it was about that, where some engineer changed a YAML file and pushed it. I don't know if you knew the whole story on that. Some engineer >> all the details >> changed a YAML file. There was no validation in the pipeline, so by the time the YAML file landed where it was going, the code that was reading it had a null pointer, and the way the global, sort of, >> instant replication happened, it just took everything down. One field in one file just took everything down. >> And that's why we don't use YAML, right? But... >> But the reason it lasted so long, and I think I remember reading this, is that the system they would use to fix it was the system that was broken. So it became this circular loop. I think they had to go in and start manually fixing these files that were distributed on all these nodes, because the distribution control panel just didn't work anymore. Anyway.
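[Editor's sketch] The failure mode described here, a config change with no validation in the pipeline and a reader that trips over a missing value, suggests a simple guard: validate before you push. The sketch below works on an already-parsed config (a plain dict, as a YAML or JSON parser would produce). The field names, the schema, and the `push` callback are hypothetical and are not taken from any real incident report.

```python
# Hypothetical schema: each required field and the type it must have.
REQUIRED_FIELDS = {"service": str, "region": str, "quota": int}


def validate_config(config):
    """Return a list of problems; an empty list means the config may ship."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = config.get(field)
        if value is None:
            # Covers both an absent key and an explicit null in the file.
            problems.append(f"{field}: missing or null")
        elif not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def deploy(config, push):
    """Refuse to push a config that fails validation."""
    problems = validate_config(config)
    if problems:
        raise ValueError("config rejected: " + "; ".join(problems))
    push(config)


good = {"service": "auth", "region": "us-central", "quota": 10}
bad = {"service": "auth", "region": "us-central", "quota": None}

print(validate_config(good))
print(validate_config(bad))
```

Rejecting the file at the pipeline stage keeps the bad value from ever reaching the code that reads it, which matters most when the change replicates everywhere at once.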
Um so yeah, so that's that's a P 0 and like that's how they leak from one like entity to another. Like Google's P 0 became our P 0 became our physicians or our doctor's P 0. Um those are somewhat okay because it's like okay they're down but it's not us like >> you yeah it's interesting right because like it's a different type of pressure >> because you might go like okay like it's there's something relieving to know like hey we didn't go screw this up like okay >> deep breath now you go wait a second people are still being screwed over though so like >> like now it's like how can we it feels like more how can we help them versus less of a oh crap like it's my fault it's my fault what are we going to do like >> yeah which is where like another reason that I really and I I I don't want to like say I really enjoy P zeros because that like >> I don't >> I mean I do but like I I don't you know what I mean I think um like you like you can like software engineering and systems design becomes jazz at that point right like you're just like okay what it's not best practices but we need to have these people up and running in the next hour. What can we actually do >> to make this work, right? Can we reroute traffic? Can we jump off, you know, Cloudflare for years was having all these problems like can we get off Cloudflare? What is what risk does that give us? Like how do we route traffic through our network or through this other office so that these facilities become active again? And it and really, you know, I think the main thing is you just learn a lot about the people you're working with, right? 
We talked a little bit about this, like you really learn who knows their stuff enough to play that jazz, right? And who knows, you know, who knows things that they don't know, um, right, or they don't know that they know. Like, you know, somebody's like, "Oh, wait a minute, did you just say this word?" Do you think, oh, that's what, and then all of a sudden all the dots connect, right? Um, it's when they >> when they go well. I've been upset at P0s. I've been mad at P0s. I've never been upset that I was on a P1 or P0 call, right? So for all the new engineers out there, anytime something goes wrong, if there's an emergency Slack channel, emergency Teams channel, you see something go wrong, jump on that call, because you're going to learn how all of this stuff connects. Like your little for loop, uh, that was, you know, doing whatever, sits inside this giant ecosystem. Um, and if you don't see how those things relate, you pick it up during a P0. Like >> yeah, like it's >> you almost have no choice but to be forced into learning, right? Because you're put into this environment where you're seeing all these other people. Probably, if we're talking especially about junior developers, you're probably around other people that are significantly more senior, more tenured, more experienced, and they're also going through this learning process around you, where people, I like how you were talking about like different functions, like role functions, might be looking at different things, trying to investigate, trying to pull this information together to really understand what this scenario is, and those individuals are going to be learning from the other people as well. So you're in this think tank of people just being like, we have to connect our brains and pull in whatever data we can. You have to start, uh, when that information is flowing in, it's like, okay, well, what can we rule out, right?
So a lot of times these conversations happen and it's like, oh well, that's not the right answer. Oh no. But it's not really, that's okay. Like it's good to know things that aren't related. You rule that out. You can go park that, >> go focus on the next part, right? Like really interesting learning experiences for everyone. >> So, one of the things, uh, World War Z, have you seen World War Z? The movie? >> I have not. I've heard of it, though. Yeah. >> Oh, okay. There's a scene in the movie where, you know, um, I don't think it's spoilers at this point. Sure. Where Israel survives the zombie apocalypse at first better than everybody else. And they're trying to figure out why. Uh, and Brad Pitt's character is talking to the general or whatever and he's like, you know, how did you survive? And they were like, well, we heard all this stuff and everybody agreed it was nonsense, but we have a policy, and I don't know if this is actually a policy or just, you know, something in the book, but a policy where if nine people out of the 10 agree, then it's the duty of the 10th person to take it seriously as if it's a thing, right? So when I jump on those P0 calls and I hear "we are 100% sure that it is this network segment that's down" or "it's this microservice that's causing the problem" or whatever, if it's not my microservice or it's not my network, if I'm not directly involved, I force myself to be like, okay, assume they're wrong, >> right, assume that, like, everybody agrees that this is the problem. I'm going to assume that they've made an assumption and go off and see if I can find any evidence at all that what they're doing is wrong. Right? I'm going through Kafana. I'm going through Google Cloud logs. I'm going through all this stuff going, "Wait a minute, when did this start? When are you saying the time started?"
Like, time zones when it comes to these outages are the most irritating things in the world, because somebody could be like, "Oh, it started at 9:30." And then everybody localizes that in their head, and the guy who's helping you that's out on the West Coast is like, "I see this started at 7:30." It's like, yeah, it's the same time. >> Um, like forcing yourself to just disagree, or not even disagree, but just make the assumption that their assumptions are all flawed. >> Uh, you get extra validation from different angles. Like you might try to disprove it and you're like, everything from my perspective is also proving that this is true. >> Yes. >> That's great. Like that's really helpful. >> Yeah. Yeah. Yeah. Uh, and the other thing, and I'll say this too, the other thing that P0s have: they're very egalitarian. Like rank and position and titles and all that stuff go right out the window. Um, it's all about solving the problem. It's not about anybody, like when they go well, I'll say. Uh, it's not about anybody jockeying for position. It's not about anybody trying to make a name for themselves. It's like, solve the problem. You know, whether it's an SVP that hasn't touched this weird obscure system that nobody has looked at in like 20 years, and maybe that's where the problem is, versus a junior dev who just did their first deploy. Um, you know, both people have equal weight in those calls, because it could be the junior guy's issue or it could be the senior guy's issue, right? Like let's just solve the problem and we'll worry about RCAs and the five W's and all that nonsense some other time, but let's get everybody back online. Um, and there's a beauty to that, right? Um, and the thing that's most often frustrating is when people bring in their, like, "I'm the SVP, you're all going to take direction from me on this." It's like, no, you don't know.
Like you're being disruptive is what you're being. >> Yeah, it's actually not helpful. >> Yeah, it's not helpful at all. Um, so yeah, >> I think the >> really the subject matter experts are the ones who should be running those calls, hands down. I don't care if the CTO's on. I think where I've seen, like, it's someone stepping in to help organize I have found to be helpful, but not, um, yeah, not to bark orders. Because I've seen people that join those and one of the roles that they play is actually, like, they're observing chaos, or they're observing like, okay, we have, you know, team X, team Y and team Z. They're able to see, okay, I can see the subject matter experts kind of doing their thing, but what's happening is the communication isn't effective, or people aren't aligning. So some people step in to play that role of, like, let me help organize the information as it's flowing in, >> and that could be helpful, but what they're not doing is like >> "everyone stop, listen to me, and I will organize what you have to go do." Like not the order barking. >> Yeah. Like the company I'm at now, uh, I am mostly anti-process when it comes to things. I believe that process just begets other process begets other process, and pretty soon it's just a bureaucracy and nothing gets done. Uh, one of the things that they did, um, maybe four or five months ago, is they introduced this thing called a MIM team, a major incident management team, for P0s and P1s. Um, and their job solely is to communicate with the field. So every time something happens, the MIM team gets called on. I think most of the engineers are like, why are these people here?
like at first, and then pretty quickly realized the amount of value that they were bringing, because it was like, "Oh, they don't have to send out a five-paragraph email updating people on the status. These other people, these MIM people, will do it." Uh, and they'll take that off of our plates, because if you're sitting there and unable to do your job, the worst case scenario is you don't know why, right? Like you just don't know why. Your boss comes by and is like, "Hey, how come those numbers aren't crunched?" or whatever, and you're like, "I can't do anything," and they're going to be like, "Why?" And then, you know, uh, but the MIM team, like, I think, yeah, I was definitely kind of like >> okay, is this overhead? Like what is this? And then, you know, once one or two things happen, it's like, oh, okay, all right, I see it. And now, you know, I'll bring that to the next company I work at, where I'm like, who's on our MIM team? Um, so anyway. >> Yeah, that's super cool. I think one of the things you mentioned that I kind of wanted to reiterate, that I really like, I've talked about this in sort of different circumstances, and I don't know a good quote for it exactly, so I'm going to butcher it, but something along the lines of, um, you know, constraints breed creativity when it comes to engineering. And, um, I like thinking about it this way because >> you know, as software developers, as software engineers, there's many patterns for many things. You know, many problems have already been solved. That's why we had Stack Overflow that was doing so well for so long. Now we have all the LLMs to help us along with the same type of thing. But a lot of things have been solved. They might be packaged a different way. They might have a different color, different face, but a lot of problems are solved in similar ways.
And I think what's interesting is that we can take different challenges or different engineering problems, and when you start adding constraints onto them or changing the constraints, suddenly it can force you to be very creative. So I'm going to try to come up with an example off the top of my head that's probably not going to work. But if you took something that was very simple, and you said, okay, you wanted to set up, you know, a database for your web server, and you can go pick any database you want and any web technology, and then someone says, "But you need to make sure that it can deploy in like literally 5 seconds and it has to have a size constraint of like x number of bytes," and you're like, "Nothing's going to work for that." And they're like, "Those are your constraints." Like >> straight get to it. >> Get creative. Yeah. >> Yeah. Yeah, I think it's a double, like, best case scenario is that, right? Best case scenario is, uh, you know, let me put it this way. I think it depends on who's setting the constraints, right? Um, you know, I've been in situations where, you know, it's clear that people are doing what's called resume-driven development, right? Sure. Where they're just like, I want to put this down, so I'm going to use this technology stack. It's like, okay, really? Uh, go work on a side project. Like, I don't know what the business doesn't want for. >> Yeah. Um, if it's a business person putting the constraint on it, like, you know, the most they can have is 5 seconds of downtime, right? Because, uh, you know, it affects the call center, and every minute in the call center that isn't available constitutes, you know, $200,000 of lost value. >> Okay, sure, that makes sense. Um, when it's engineers putting constraints on themselves for no reason.
Um, I think that's what I mean by, like, process begets process. >> Yeah. Like we have these, and sometimes it's not so, like we have DevOps tools now that we never had. Like literally, you know, 20 years ago when I needed to deploy to production, it was, you know, one of two things. I would do a local build, I would zip up the bin folder, I would email it to myself, I would open a webmail client on the server, I'd unzip it, and I'd overwrite the bin folder. >> Uh >> you are the deployment system. Yes. Or, uh, you know, in some cases you could maybe get sign-off on like an SFTP, where you could version it a little better or drop it a little better. Then all these tools came out. Like, you know, I think Octopus Deploy was probably the first one. Uh, and then, you know, Spinnaker came out, and now Argo and all these great tools that let you push code out in like a matter of seconds, right? Um, but I mean, a lot of companies have sort of responded by saying, okay, is it good, though, that we can push code out in 5 seconds? >> Should we have a process now so that the deployment takes just as much time as it used to take when I was zipping folders up and emailing them to myself? Or should we put all of this stuff in front of that to stop it from happening? It's like, well, maybe a little of both, >> right? >> But like we shouldn't >> we shouldn't put unnecessary walls up, right? I mean, we just shouldn't. Um, anyway. >> Yeah. And so the thought around being in these Sev 0 or P0 situations, to me it's very interesting when it comes to those constraints, because I like what you said earlier around, like, >> okay, um, one of the primary things that you want to do in those situations is to mitigate, right?
Like >> sure, root causing would be lovely, but if people are not having availability, you probably want to take care of that right away, and >> depending on where things are at, you might go, wait, we have all sorts of constraints that we don't normally have. I'm just making this up, but if it took an hour to build and an hour to deploy, and you're like, we don't even have a code fix. So if you wanted to do something in less than those two hours, never mind the code fix, that's a constraint that you now want to try and work within. If you said we need to make sure that people are up in the next hour, okay, there's a constraint. What can we do >> within that constraint? And now you have to start having creative engineering solutions to go tackle those types of problems. So >> 100%. And that's when, you know, the scene from Apollo 11 or whatever, the Tom Hanks movie, they're like, not only do I want to see the plans, I want to talk to the guy who actually built the bolt in the factory, right? That's when you pull in the junior engineer, the senior engineer, the staff engineer, like, okay, tell me exactly how this works, >> right? Like tell me, is there some play here? Is there a configuration setting that we can turn off? Is there, you know, anything that we can do to get over this hump, right? Because right now we're dead in the water. Um, and I think that may be when I realized the power of these meetings, is I was pulled into a meeting like that. But, like, always bring junior engineers into these things. Always bring your staff in at time of providing or whatever. Um, because they're going to realize, like, oh wait, that actually does matter. Like is this a for loop or a while loop?
Like, I mean, it kind of matters sometimes, but, you know, there's really specific things where you go, oh, we can just turn this system off and then that won't process those events and then that'll free this system up over here. And then you talk to the business people and you're like, "Hey, how do you feel about not getting an accurate, uh, you know, accounts receivable report until the morning?" and they're like, "Oh, that's fine. I didn't need it this afternoon anyway." Like, "Oh, sweet. We can kill all of this traffic and all this chatter on the network, and suddenly we now have the headroom to actually bring other people back online, >> right?" And if you had gone into that being like, "But we can't touch this, because this system, we can't affect that." If you don't start challenging some of these things, like you have to start making these decisions in some situations where it's like, we may need to make a trade-off here, >> and that trade-off might be that this other system, this other part of the business, might be disrupted temporarily, but in fact it's significantly more beneficial to disrupt that temporarily and get 99% of everything else back to, like, spot. So, I'm wondering, hearing you say that, I'm wondering if the definition of a P0 is, um, isn't like a chain of assumptions. Uh, like >> an assumption is broken, right, of like, you know, Google's always going to be on or SQL Server will be performant or whatever. Um, that really forces you >> to break other assumptions. >> I think so. >> In order to restore >> Yeah. Interesting.
I think so, because even when you had started, uh, giving me a little bit of a heads up before we were recording on how you were thinking about defining that, when you started talking about it like an assumption, you know, like you assumed something to be true and now that's disrupted. When I think about, and yeah, you kind of said maybe it's too philosophical, but we're kind of going there with it. So I think that what triggered for me was this idea that when you are assuming a truth, it sort of becomes a law in your mind. In my mind it's generally based on other assumptions that have to hold true, and if you start unwinding that a little bit, for example, in this example we were just going through, you know, this other service for processing reports or something like that always has to be up, but, like, >> does it? Like because maybe we cannot do that temporarily. And now you start challenging other assumptions. Um, and I think that that's really a critical thing to be able to do as you go through these, because >> when you have those constraints and you can start kind of breaking the rules for other things, that's where you can start getting more flexibility to deliver your options. >> Yep. And we haven't touched on it too much, but, like, >> we mostly focused on, uh, you know, technology stuff, >> uh, like services going down, outages, whatever. Um, but when it comes to people, this holds too, right? Like, as a people manager, right, my guys, my folks have P0s all the time in their lives, right? Like something's going to go wrong, right? Uh, you know, I can't technically get into the details, right?
Um but like everything from you know family members getting sick and needing it like needing care to um and that's you know as I get up there in age like parents getting sick and like it's not it's not just kids or um you know your spouse or you know during COVID this was a huge like there was a lot of mental health stuff that was happening uh people you know people just isolating for you know weeks on end. Uh, you know, there there was one guy that that worked for me. I was like, I'm pretty sure he's been wearing that shirt for three days. Like, I need to have >> like how do I have this conversation with him? Um, >> but like the uh like coming at people and and really looking at them from like a P0 like okay, is this is this a fundamental change of our relationship where I have to change other assumptions? Um, and if so, what are those and how do I make them as minimal as possible, right? Like, do we need to do FMLA? Do you need to go on leave? Do you need additional support from, you know, whatever legal system that we have like the usually there's a some sort of legal advisor benefit at a company like do I need to forward this to them? um you know do I is this going to be and frankly like is this a constraint that I'm violating that HR is going to be okay with right um and and those are the constraint the same same kind of thing right like I am going to go tell HR that this is what we're doing does that violate a constraint that they have set right uh especially for the logistics company like there there are you know zero tolerance policies whenever just for insurance reasons right at uh at logistics companies, people are driving forklifts around, people are using heavy equipment, um you know, if if somebody comes into work a little, you know, hung over from the night before or whatever it is, like that's like zero tolerance is >> you might be like, "Ah, I know I know Bob and I'm like I I feel like he's he he's whatever. 
He's a little tired, but like that's fine." But like no, like >> right there is not okay. And then you need to unwind that a little bit and go, okay, is this the first time? What do I know about what's going on with Bob right now? Um, like is this the first time this has happened? Is this, you know, a black swan event, or is this becoming a problem? And then if it is becoming a problem, and this scenario actually did happen, what do I need to do as a manager to help Bob? Like, I guess first as a person, right? Because they're having issues. Um, or, you know, what do I need to do from a company perspective? Like usually the company perspective comes like three weeks later, uh, frankly. But what do I need to do to help Bob, and how is that going to line up against the constraints that have been set down? Um, you know, I've had people, you know, um, the other, yeah. So yeah, and sometimes the relationship ends, right? Sometimes the result of the P0 is, oh, I cannot accept whatever redefinition this is, like whatever assumptions are being dismissed here as a result of this P0, I can't accept them as a manager, or the employee can't accept them as the employee. Um, you know, that's very rare, uh, but, you know, it happens, right? Um, and so yeah. >> No, it's definitely interesting, because I agree, the same concepts apply. Um, I think one for me that has come up, I never considered this in my time before Microsoft, because I worked at a startup for eight years. Um, you know, we were at the point where we had a satellite office, but it was like 4 hours away. It's really just a different office. It's not like we had remote employees everywhere or something. When I came to Microsoft, it is a global company, and, you know, I was here on a visa.
So I'm originally from Canada and the US on a visa and I barely even understand my how my own visa works. I I don't I just don't know how these things go. And then having employees that are on other visa types um and then like basically not getting the lottery placement and then I'm being told like hey >> like your employee can't be here anymore. But what's really cool uh and you know I don't know all the details of how this works but at least in some situations where this has happened Microsoft has been very good at like doing all of the pre-work and letting me know like here's like you know here's all of these options like we do this all the time like don't worry but it's interesting because like that ends up becoming a decision for the employee they might go like no I don't want to relocate to another office to still be employed by Microsoft like that's way too disruptive in my life or and rightfully so like so it's this very in this situation a very interesting challenge to go navigate someone else's like visa status like I like I said I barely even understood how my own worked. So, yeah, just a really disruptive thing. >> I have had a number of uh H1B employees over the years, like folks that that work for me. I still don't fully understand the the system entirely, right? And I'm I'm the one like right like I'm the one who's responsible for providing the documentation for these folks to the the immigration lawyers and HR. And it's like, so what like what level of detail do you want? Because some like if you go too far in one direction >> on the the detail, it's like, "Oh, now you've gone too far. Now you've said something you shouldn't have said." It's like, "Oh, come on." >> But you didn't say enough this other like, >> right? Yeah. It's it's very annoying. And so one of the other It's interesting you you bring this up. 
So like, um, talking about assumptions, the number of assumptions that have to be made about, you know, hiring H-1Bs, or working with folks that are hiring people with H-1Bs, or hiring people that are in another country. Um, like the last company, or two companies ago, uh, we had a number of contractors that were working out of South America. Um, great people, hugely skilled people. Like I could not have done half the stuff we did without them. Um, but maybe not a full-blown civil war, but certainly a very disruptive event happened. I think the guy was from Peru, and it was just like, oh, how do we, okay, so this is a P0 for you for sure. >> Like what do we need to do to make sure that you're safe and healthy as a person, >> and then can we do anything to make sure that you stay employed here? Um, because like, I get it, I want to make sure that you're fine, but the company, like, we can't say you're on indefinite leave. Um >> like even if you wanted to, can you, like, leave? Like maybe they don't even have employment laws that allow it. Like there's so many things. >> Yeah. Yeah. Yeah. No, that, um, that bit us. Like, uh, so I don't know if most people know, so you can't actually directly contract, like we couldn't directly contract with individuals, right? So there's an agency that's actually a legal entity within the country, >> uh, and you pay the company and the company pays them, and of course they take their cut. Um, and, you know, so it's like, well, wait a minute, why can't we just go down there and start a company? And then you realize, oh wait, there's like 20-year residency requirements. And I was like, I just want that person to continue to make awesome improvements to my front end. Like, come on.
Um, >> yeah, >> but, you know, sometimes >> so many loopholes, or not loopholes, uh, bars to jump over, I don't know what the right phrase is, but yeah, >> if we could just get past that, if we can get past that >> and maybe some other stuff, we'd all be fine. >> Yeah. But, um, no, I think it's super interesting that, you know, this type of thing, like a P0 by your definition, comes up not only from a technology perspective, but with people. I think that for folks that are, you know, watching and listening, on the people side it probably seems less obvious because it probably comes up less often. But for someone in a position where you're managing people, or in a directorship position where there's even more people below you, you'll see these types of things bubble up more, because your umbrella and surface area of visibility to them is greater. Um, so, you know, a manager managing a team might periodically see these types of things. Someone in a directorship position who has managers reporting up to them then may have a lot of visibility across all their managers, where these, like, it seems like a micro thing at that level, but it's, you know, a P0 to the individuals. But you see more of this, and it is real. >> Well, I think, you know, I think you mentioned it at the beginning of the call, right? Like the natural, organic thing is you go from engineer to manager, right? Um, and most times, I would say 99% of the time, um, you know, we're like, oh, we need to hire a manager. Like who's got the best soft skills out of the engineers that we have? Um, who is respected, um, who, you know, will take ownership for the team if something goes wrong. But that doesn't mean they've dealt with a lot of stuff in their life, right? In fact, most times it means they haven't. Um, you know, I was thinking about this earlier about assumptions, right?
I mean, most of the assumptions, like every time you hire somebody, you're making a thousand assumptions, right? Uh, and every time you go to work for someone, you're making a thousand assumptions. Um, and some of those assumptions you can't verify, right? They're just impossible to check. Um, so like having a manager who, you know, comes to me and says, "Hey, I've got an employee whose kid is really not doing well." Or usually what happens, frankly, is the manager comes to me, if I haven't worked with them a lot, uh, and they're like, "Hey, so-and-so is struggling. Like, we need to talk to HR. They're not doing their job." And it's like, "Okay, why?" Like I don't care why they're not doing their job. Like okay. >> So what's up? >> First of all, let's back up. >> Like let's find out why they're not, because they were good last month, right? Like let's find out what has changed. >> Uh, it's almost like the whole P0 process for investigating. It's like, let's find out who deployed some code to this person. Um, like what happened? Like, well, their kid, uh, you know, and then it turns out their kid fell off a slide and broke their leg. It's like, okay, so their kid can't go out and play anymore, and they're having to spend more time entertaining their kid than they used to, and it's disrupting their >> and there's more appointments and they have to be away from their desk a lot more and >> 100%. Yeah. Uh, that was totally way too true during COVID, but like, yes. Yeah. Exactly.
So it's like, okay, so we've assumed that this person can work eight hours a day or nine hours a day or whatever it is we're expecting. Uh, you know, that assumption has turned out to be false, >> uh, or it's turned out to be false because inherent in our assumption is, like, those nine hours have to be contiguous, right? Or those nine hours have to be when everybody else is working those nine hours. But that goes back to saying we have these other assumptions that we have to go invalidate, potentially. Yeah. >> Yeah. Like, okay, can they work in the morning? Can they work after the kid goes to sleep at night? Like is this impacting other people? Like how far does the P0 spread? >> Yeah. Do they just need flexibility in their working? Like is that really what it's coming down to, that everything else would be manageable for them, except, uh, the core business hours are 9 to 5 and that's just not going to work? And if you could talk to them, they would be totally happy to go do, you know, some morning kind of shift and some evening shift, and they would be great with that. Like >> why are we even >> why are we even having this conversation at that point, right? Like, um, and I think some of that is, like, >> and I personally struggle with this a lot. Like I like to take a lot of the bureaucratic crap off of my team's plates, >> right? Just so that they can focus on what they're good at. Um, but what that means a lot of times is they don't realize what the bureaucratic crap is. Uh, so when they get exposed to it, they're like, "What is this nonsense?" Yeah. >> So, you know, if you think about that scenario we just played out, uh, you know, where the kid's got a broken leg, and if the reaction was, "Okay, let's fire him." Like let's put him on a PIP. Let's get him out the door.
like, okay, now you gotta open a req with HR, right, for a replacement, which is a nightmare, right? I mean, it's a nightmare for me. Uh, like I don't like paperwork. And it's nothing to do with the HR folks. It's just like, I got to fill out what form where? Um, then you got >> it's not just check the box on the website and the role's live and we just wait for the best person to show up. Yeah. >> Right. And then you got to interview, and, you know, 50% of the interviews don't go past stage one. So now I'm sitting there like, okay, I'm now spending 60 hours a month interviewing people that are never going to come through the door. So it's like, or we could just say they can work flex time for a little bit, because the leg's not going to be broken forever. Um, and then, yeah, but sometimes, I mean, you have to give that context to your people. Um, that the bigger picture here is not just, oh, this person who's messing up your burndown chart, uh, isn't going to be around anymore, so the stats are going to look good again. Like who cares? Like that's going to cost us more in the long run than anything else. >> Yeah. >> Anyway. >> No, I think it's super interesting. I, yeah, I wanted to say thanks for bringing up this topic, because I think when you had started discussing it, I was thinking about times where I've sat in incident bridges. I know the first time I was ever on call at Microsoft, like the very first weekend, I started as a backup, and with, uh, one of my employees who was significantly more junior than me, he was the primary, and we had a security incident, and I'm like, I don't know what to do, but this is going to be such a good learning experience. There's no way that you cannot learn from being in those situations.
So um yeah, I think what you said earlier around like hey if you are more junior if you have the opportunity to at least observe and like I'm not necessarily saying like hey like go get your hands dirty and get disruptive necessarily but at least to observe um and trying to follow along with what's happening to go understand I think is a tremendous opportunity >> because you will you will learn and you're going to be learning things at a significantly faster rate than like >> you uh I'm chipping away at a part of the codebase, building out some features like you'll see how parts of the system come together in ways that you that you didn't know because other people didn't even know. So >> know it right. Exactly. >> Yeah. So I wanted to say thanks again for that. Um Oh, sure, Michael. Um if people want to get in touch with you, what is the best way? Is it Twitter if they want to strike up a conversation? >> Probably Twitter. Yeah, it's probably Twitter. Um I left Facebook years ago. I'm not on any other social media platforms. Um, so it's just MDKill at uh on Twitter. >> Cool. Awesome. Well, thanks again. This is super cool. Um, any any parting words for a more junior audience that uh could help guide them. I didn't tell you about this ahead of time, so I don't mean to put you on the spot, but >> um so many um like it's 20 years a 20 almost 30 years ago now, like this college roommate that I had um said that like, you know, every single thing is a learning experience. Uh like every single thing. And I think at the time I I mean I certainly I think he was saying it because something had gone wrong in my life or whatever. But I was like no it's not. I understand the world. I'm 18. Um like but everything is every like literally everything is a learning experience. Uh and if there you know you're struggling it's because you're learning something like there's some fundamental assumption about the world that you're unwinding in your own head. 
Uh and that's I think what learning is. I'm not not entirely sure, not an education guy, but like, you know, everything you can learn something from. That's >> awesome. Yeah. Yeah. And it's a good way to reframe things that might seem like they suck in the moment, too. So, um, Awesome. Okay. Well, thanks again, Michael, and I definitely appreciate you being here.

Frequently Asked Questions

Why are system failures valuable learning opportunities for engineers?

System failures can be incredibly valuable learning experiences for engineers, especially for those who are more junior. When something goes wrong, it creates a unique environment where everyone involved is forced to collaborate and problem-solve together. This not only helps in understanding the system better but also allows junior engineers to learn from their more experienced colleagues in real-time.

How should junior engineers approach incidents or system failures?

If you are a junior engineer, I encourage you to jump on calls during incidents or system failures. This is a prime opportunity to learn how different parts of the system interact and how experienced engineers tackle problems. Even if you don't have a direct role in resolving the issue, observing the process can provide invaluable insights into system design and troubleshooting.

What advice do you have for engineers dealing with stressful situations like system outages?

In stressful situations like system outages, it's crucial to maintain a mindset focused on problem-solving rather than panic. I recommend approaching the situation with curiosity and a willingness to learn. Keep in mind that the assumptions you and others are making could be flawed, and seek to validate them. This not only helps in resolving the issue but also enhances your understanding of the system and your role within it.

These FAQs were generated by AI from the video transcript.