BrandGhost

How I BLEW Through My Azure Budget (And How To Avoid My Mistake)

You wake up early. Time to make some progress on your side project before work. Still a bit groggy from the night before... Time to check the logs from last night's change. No. It can't be. This can't be right. Your precious logs are overrun. They're filled with spam. But it gets worse. Your Azure bill. In this video, I'll walk you through how a very simple mistake cost me $150 just for being asleep for a few hours. Fortunately, the fix was quick. The goal here is to show you how these issues can translate to real-world impact.
This one simple mistake cost me $150 overnight just for going to sleep, and if this was in production for work, this would have cost significantly more money.

Hi, my name is Nick Cosentino and I'm a Principal Software Engineering Manager at Microsoft. In this video, I'm going to walk through a bug that I introduced into one of my projects, called BrandGhost, that basically caused a loop to spam a log repeatedly. Now, the fix for this ended up being as simple as swapping a couple of lines of code. It's very trivial, but that's exactly why I want to walk through this scenario with you. If that sounds interesting, remember to subscribe to the channel and check out that pinned comment for my courses on Dometrain. Now let's go walk through this embarrassing mistake, and that way you can see what I was up to.

Okay, so on my screen right now in Visual Studio I have a very simple web application. This is basically just the weather template, and I've removed the weather API because we don't really need it. But you'll notice that I've introduced a couple of dependencies in here, and we're going to be looking at a hosted background service. So I'm going to scroll down a little bit lower and let me get this on screen. This is probably too far zoomed in for you, but we have this service, it's called Nick's service, and you can see that it's taking in a repository as a dependency. That was the other dependency right up at the top here, right there on line four.

If we look at what this does, it's pretty simple, right? The mandatory method that we need to have to be a background service is ExecuteAsync. It takes in this cancellation token for stopping. In the body of this, I just have this while loop, and because this is very simple, I want to walk through essentially what I had to worry about in my own application. I needed a background service that could wake up periodically, do some heavy processing, and then go back to sleep. The problem is that I don't want this thing to crash and then not run for the rest of the time that my app is running, so I want to make sure there's some resiliency there. I wanted to make sure that this while loop couldn't break. We're going to check that out in just a second, but first I want to walk through the base case that we have here.

This one is going to be very simplified in comparison. I'm just going to be asking the repository to get some data, and then I'm going to write it out. This is the equivalent of the heavy processing that I was talking about in my own project. Truly, in my own project I had to do a bunch of database queries, crunch some numbers, and then send more information off to another service, so there's a lot more room for error. And truly, one of the things that I was nervous about was some of my SQL queries. I wanted to make sure that if I had the data wrong in some of the particular entries that I had, that wasn't going to break the processing. I would like it to be able to wake up and resume, and I could correct the data if I needed to, and then it would be okay for next time, right?

Before we move on, this is just a reminder that I do have courses available on Dometrain if you want to level up in your C# programming. If you head over to Dometrain, you can see that I have a course bundle that has my Getting Started and Deep Dive courses on C#. Between the two of these, that's 11 hours of programming in the C# language, taking you from absolutely no programming experience to being able to build basic applications. You'll learn everything about variables, loops, a bit of async programming, and object-oriented programming as well. Make sure to check it out.

Between the attempts, it needs to go wait a little bit. For me, the duration that it had to wait was a little bit longer. I can't remember if it was north of 10 minutes or so, but it had to go wait. I'm just going to do it for 3 seconds so we're not sitting here on this video and you're bored out of your mind.

If we look at the repository a little bit more below here, this one's very fake. I'm not going to go out to a database; it's not really important for this video. But this is the method we're calling. It's just going to grab back "hello world" and, like you can see up here, we're going to print that to the console. That's on line 29. When we go to run this, what should happen is that we have this background task start up, or this background hosted service, and that's because when we put it onto the container and start the application, it will automatically, thanks to ASP.NET Core, run this for us. So it will basically sit here in this loop, and every 3 seconds it's going to print out that it got some data and that it's waking up after it's waited those 3 seconds. Let's go run it and see what the console output looks like.

Okay, so we can see it waking up, and there's the hello world. Waking up, hello world. It'll do it again, and this will just keep running, right? So I want you to just conceptualize that we're doing some background work. This might be more applicable in your own situation, and this is just a contrived example, but it works, right?

The situation that I had was: what happens if this part here can throw an exception and we're in bad shape? What I decided to do was to be as safe as I could, and I wanted to make sure that I could wrap basically this entire body of the while loop in a try-catch. Just to show you, it looked something like this. I wanted to mention, as I'm typing this out, that I do have another video on exception handling and how you might want to consider failing fast or being more robust in your error handling, so different scenarios for that. If you haven't watched that, I'll put a link up here and you can go check it out. I should also mention that for all my Console.WriteLine calls, consider that something like telemetry and logging, right? For me, I'm not just writing to the console in my application. I have other logging that I'm leveraging, but for this example we're just going to do it this way.

So I decided, hey look, if we're cancelling, let's just make sure that we can break out, so we'll put a break here. That was Copilot cheating a little bit. If we cancel, we'll just break out of this loop. Great. Otherwise, I just want to be able to log that we have an error. That way I can make sure that I can observe it. I'm using Aspire, and I can check that dashboard. If I see errors, I can go update my data, make sure it's okay, and let this thing run.

If we were to go run this now, the behavior would be the exact same as we just saw, right? So let's go pull it back up. That's because there are no errors. This is just working as it was. I just put some error handling in place to make sure that this thing can't get brought down if it should have a period of time where it wakes up and does something bad. So let me close this off again. There we go.

Now, to simulate something going wrong, instead of just getting the data, I'm going to throw an exception. This would be essentially what was happening, because in the real situation that I had, I was throwing an exception because of some bad data. And what I had done was gone to sleep, right? This is a service that's running, and I knew that I might have bad data. That's why I wanted to put this protection in place. But I figured, hey look, I'll check in the morning if there are any logs that show that it had triggered and something bad happened. I'll correct the data, no harm, no foul.

Let's see what happens when we go to run this, though. You might notice something very bad happening already, and this is exactly what happened to me overnight in Azure. I woke up in the morning, went to go check the logs, and I said, "Oh no, why are the logs completely filled with this one error message over and over again? Why is this happening, and how can we fix it?" The other thing that came to mind, and I didn't check it right away, was that my Azure bill is not going to be okay, because I know that something's stuck in an infinite tight loop.

So that's exactly what we have here if we scroll back up. Some people who were watching as I was explaining this might have caught it right away, and it's truly not that difficult. I was just hoping that if I kept blabbing, it might distract you from it. The problem is that we have the sleeping part inside of our try-catch. When I built this type of thing originally, I said, hey look, I want to make sure that everything inside of here is not going to be at risk of causing this thing to break, right? That way we can stay inside the loop for next time. But that's not actually what we want. We do want to make sure that the delay is not within here, because if we move it outside, this does give us that extra protection. If we had an issue with the data processing, which is what lines 30 and 31 are supposed to be representing here, we could say, "That's okay, don't worry, we can go log it and we'll still wait until the next time we need to process data." I was essentially trying to wrap too much in the exception handling, trying to be safe, but the side effect was that it completely backfired on me.

So if we structure it something like this, again assuming this is kind of like sending up telemetry for us, this is some heavy data processing, and this is the wait period before the operations that we want to go run, let's go try this now and see how it behaves. "Error: something went wrong," right? You can see it happen again, and now it's actually doing what I would have expected. Is it great that it's going to add a bunch of these to the log? No, it's not great, but it's not sitting there in a tight loop filling the log. At least for me, because my period was something like 10 minutes or so, whatever it happened to be, not just a couple of seconds, seeing this a handful of times in the morning wouldn't be so bad. I'd be a little bit disappointed that my stuff wasn't working, but at least I'm not going to have to go sit there and look at an Azure bill. And that's how it ended up costing me $150 just to fall asleep, with my one single instance on an Azure VM.

My reminder to you is that when you're thinking about your error handling, have an extra look at where your try-catches are. If you have to sit inside of a loop, make sure you're looking at where you're taking a break between your polling or your other processing that you have to do, right? Just make sure that you're considering how this is structured. If you're interested in seeing how you can add a global exception handler into your ASP.NET Core applications, you can go ahead and check out this video next. Thanks, and I'll see you next time.

Frequently Asked Questions

What was the mistake that caused the $150 Azure bill?

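The mistake was that I placed the delay inside the try-catch block of my while loop. When the processing step threw an exception, the delay was skipped, the catch block logged the error, and the loop restarted immediately. The result was a tight infinite loop of error logging with no pause between attempts, leading to excessive resource consumption and a high bill.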
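
As a rough illustration of that failure mode, here is a minimal console sketch rather than the actual BrandGhost code. The names GetDataAsync and WaitAsync are hypothetical stand-ins, and the loop is capped at 1,000 iterations (an assumption made only so the demo terminates). It counts how often the loop logs an error versus how often it actually pauses.

```csharp
using System;
using System.Threading.Tasks;

int delaysAwaited = 0;
int errorsLogged = 0;

// Hypothetical wait helper that counts how often the loop actually pauses.
Task WaitAsync()
{
    delaysAwaited++;
    return Task.Delay(TimeSpan.FromMilliseconds(1));
}

// Hypothetical processing step that always fails, like the bad data in the video.
Task<string> GetDataAsync() => throw new InvalidOperationException("Something went wrong");

// Capped at 1,000 iterations so the demo terminates; the loop in the video had no cap.
for (int i = 0; i < 1000; i++)
{
    try
    {
        string data = await GetDataAsync(); // throws here...
        Console.WriteLine(data);
        await WaitAsync();                  // ...so the wait is skipped every time
    }
    catch (Exception)
    {
        errorsLogged++; // stand-in for writing an error log entry
    }
}

Console.WriteLine($"Errors logged: {errorsLogged}, delays awaited: {delaysAwaited}");
// prints "Errors logged: 1000, delays awaited: 0"
```

Every iteration fails before it reaches the wait, so nothing slows the loop down. Without the iteration cap, it would keep logging as fast as the machine allows.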

How can I avoid making the same mistake in my own projects?

To avoid this mistake, ensure that any sleep or delay logic is placed outside of your error handling code. This way, if an error occurs, your application can log the error and still wait before trying to process again, preventing infinite loops.
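
One way this structure might look is sketched below as a plain console program. The local function ExecuteAsync only mirrors the shape of the ExecuteAsync override on a hosted background service. GetDataAsync, the 100 ms period, and the 350 ms shutdown timer are all assumptions chosen to keep the demo short. Giving the delay its own small try-catch for cancellation is also a choice of this sketch, not something taken from the video.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int attempts = 0;

// Hypothetical processing step that always fails, simulating bad data.
Task<string> GetDataAsync()
{
    attempts++;
    throw new InvalidOperationException("Something went wrong");
}

// Same shape as a background service's ExecuteAsync, written as a local
// function so the sketch runs without any hosting packages.
async Task ExecuteAsync(TimeSpan period, CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        try
        {
            // Only the risky work lives inside this try block.
            string data = await GetDataAsync();
            Console.WriteLine(data);
        }
        catch (Exception ex)
        {
            // Stand-in for real logging or telemetry.
            Console.WriteLine($"Error: {ex.Message}");
        }

        try
        {
            // The wait sits outside the processing try-catch, so it runs
            // after failed attempts as well as successful ones.
            await Task.Delay(period, stoppingToken);
        }
        catch (OperationCanceledException)
        {
            break; // Shutdown was requested while waiting.
        }
    }
}

// Simulate the host asking the service to stop after 350 ms.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(350));
await ExecuteAsync(TimeSpan.FromMilliseconds(100), cts.Token);
Console.WriteLine($"Attempts before shutdown: {attempts}");
```

Each failure is still logged, but the loop pauses for the full period before trying again, so a persistent error produces a handful of log entries per window instead of a flood.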

What should I consider when implementing error handling in a loop?

When implementing error handling in a loop, carefully structure your try-catch blocks. Make sure that any delays or sleep periods are outside of the try-catch to avoid getting stuck in a tight loop during errors, which can lead to unexpected costs.

These FAQs were generated by AI from the video transcript.