Let's Debug Together! Fixing A Production Bug in ASP.NET

Uploaded: 2023-12-08T17:00:06.0000000+00:00
Duration: 11 min 15 s

December 8, 2023

• 1,045 views

In this video, I'm going to show you how to debug a production bug in ASP.NET Core. ASP.NET Core is a widely used web framework, so plenty of developers will eventually have to chase down production bugs in their ASP.NET Core apps. I'll walk through a real stack trace from production, track down the bug, fix it, and add tests to cover it!

Have you subscribed to my weekly newsletter yet? A 5-minute read every weekend, right to your inbox, so you can start your weekend learning off strong:
https://subscribe.devleader.ca

Check out all of my courses:
https://...

Transcript

But when I pushed to production, things didn't go exactly as planned. Look, the unfortunate reality is that as software engineers, there are going to be times when we're fixing bugs based on stack traces from production. It's not something we want to happen, but it's basically inevitable. Even if we test as much as we possibly can, and all of those tests are supposed to help us have confidence in our changes, we can't possibly prevent every bug that could occur. So in this video, I wanted to walk you through me problem-solving and fixing a bug in real production code that I have, based on a stack trace that I had from production. I'm going to show you how I use the information from the trace to find where the problem in the code was, diagnose why that problem happened, fix it, and then write some tests to help cover it. Stay right to the end so you can see how things finally went when I pushed to prod. A friendly reminder to check that pinned comment for my free weekly newsletter, and with that, let's check out this stack trace.

All right, on my screen I have the stack trace that I found in production when this error was occurring. For some context, I was having a customer do a password reset on their account because they couldn't log in; they just couldn't remember their password. The exception that we see in the stack trace here, an InvalidOperationException, is telling us that we have a nullable object that we were trying to get the value from. In my opinion, this is equally as crappy as coming across a null reference exception. It's technically almost the exact same thing, just a slightly different situation because we have a nullable object. If I look at the stack trace, I can see that it makes sense where the issue is coming from, because I have some type of service for authentication. In this case I'm using Cognito from AWS, and if we think about the password-resetting part, that's going to have to interact with Cognito at some point to manage where that password is. The last little tip that we get from the stack trace is that it's on line 239. So line 239 in the Cognito authentication service is the first spot that I think we want to go check out.

All right, in my project I have this notification client inside of the Cognito authentication service, and I've gone to line 239 like the stack trace said. You can see that when we write code across multiple lines like this, the stack trace is technically telling us that everything you're seeing here is still on line 239. When I'm debugging null reference exceptions like this and I have a bunch of code that all lines up against line 239, I have to start walking through which things can be null. Take parameters, just because I have it on my screen right here. If I scroll up a little bit higher, I can see parameters is coming from something that's passed in, so this could potentially be null. But let's scroll back down and see what else at line 239 could be null.

One that looks very interesting is the get-user-ID-for-email result, because we're getting the value off of it. You'll notice that I'm using this bang operator, which is supposed to say, "Hey, look, we know that everything coming back that we're using at this point cannot be null, therefore we're able to grab the value off." Without this bang operator, we get a warning from Visual Studio telling us, "Hey, look, this thing could potentially be null." There are some situations where we have more context than the compiler, and in those cases we can insert that bang operator to let it know that we can guarantee it's not going to be null. But what if it could be null in this case? What if I screwed that up? To me, this is the more likely culprit. I'm not saying that parameters could not be null, but I do want to go investigate this one, because it seems a little bit suspicious that I've claimed I can guarantee it's not null. I haven't looked at this code in a little while, and I can't obviously tell why I should be able to guarantee that.

So let's go back to where we're getting this variable. I can see that I'm calling a method to try to get the user ID for the email asynchronously, passing in the email and a cancellation token. If I jump into this method, I can see, oh wait a second, I have this return type that's a task because it's asynchronous, but I'm using this result type that I built. The goal of it is to be able to package up exceptions as well as a return value. This way I don't need to wrap a try/catch around everything; instead I can try executing things and check the exception if one occurs. But there's something really interesting here: the flavor of this result type that I chose allows nullable types. In fact, it's allowing null right here. The other variation I have of this is just TriedEx, not TriedNullEx. So it's really interesting that I claimed in the other method that the result of this could never be null, when the signature is literally telling us that you can expect null to come back. I think we're on to something. I think there's a bug in the other spot where I'm assuming that it can never be null.

So we'll scroll down a little bit and see what this method is doing. I'm not going to go through all the details of exactly what's happening under the hood, but we're essentially fetching users from a repository with a filter. If we have one user, great, we'll return it. If we have multiple, we're going to throw an exception, because this is an exceptional case: it should never happen in reality, and it tells us that part of our system is broken. But interestingly enough, on line 894 we can see that it is going to return null. Bingo, this has got to be the spot. We have this null return value that can come back, and the other part of our code that lines up on line 239 is telling us that we should never expect null. I think we have to go back and properly handle that null instead of just assuming that it will never be null.

Now, I've already fixed this, but I've gone back in my git history to show you what's on the screen here, and I'm about to check out the commit that has the fix so I can show you the additional code. What we're about to observe is that I have another if statement that comes in right after this one. With the fix checked out, you can see on line 221 that I've added this additional if. Before, the only condition I checked was whether the call was successful, so we were only looking for exceptional cases. What we're doing now is also saying: if the value is null, which is totally valid and not exceptional, we write out a debug log. This isn't something that I'm going to keep in production for long. I just want to make sure that while I'm working with this user to confirm everything's valid, I can still see what's up if something weird is going on.

Before I scroll down to where the other code originally was on line 239, I want to call out that I'm now pulling out the user ID right from here. What you might observe, and it might not be obvious, is that I don't have that bang operator right after the value anymore. There's no bang operator, and Visual Studio is also telling me that it knows I don't need it. It knows that the user ID is never going to be null, because I'm doing this null check explicitly. So this is a friendly reminder that when you're seeing warnings, you probably want to take them very seriously. There are a lot of times when we do know better than the compiler, but there are many times when the compiler truly knows better than us. So I'm pulling that user ID out without the bang operator now, and I'm able to use it back down below, right here in the notification client.

So this is great. I believe at this point I've fixed the bug, but we're not done, because we need tests. One of the reasons this bug escaped was not that I had bad tests in place, but that I didn't have any tests on this code at all. I'm not saying that if I had a bunch of tests I obviously would have caught this issue, but the reality is I don't have any tests on it. Within that class, many of the other methods are tested, but this particular one for resetting passwords just didn't have any test coverage. An important thing I want to call out about the tests we can write here is that, in my opinion, they are best served by being unit tests. What I mean by unit tests is having the external systems, in this case the Cognito service from AWS, totally abstracted away so I'm not interfacing with it, because I don't want my tests talking to a production service over the internet.

Now, I didn't scroll through all of the code in that class, but this is a situation where I have a class with over a thousand lines of code. It's way too big, and it makes it really difficult to test all of it in a clear way. It's probably a really good opportunity to go back and refactor it, clean it up, and split it into smaller pieces; I just haven't done that yet. If we go to the test code, you're going to see that I'm in the thousands of lines there as well. Again, to make it easier to navigate and understand, it's probably a great opportunity to split all of this up and have dedicated tests for the different parts too. But right at this moment, I don't want to couple all of that refactoring into the same commit as the fix and the tests; that can follow later. I want to focus on fixing this bug so I can help my customer, and I'm going to add the test coverage to make sure I have confidence in this change. Afterwards, if I have time, I can go back and clean this up so it's easier to navigate next time. Odds are, if I'm having a couple of bugs right now in some of my authentication code, I want to make that code really pleasant to navigate in the future so all the bugs I have to fix are way easier. Let's go check out the tests.

All right, on line 1179 (yes, over a thousand lines of code in this test file), I've added one new test that checks that when a single user exists for the email, we return true when trying to reset the password. I'm using Moq to set up the mocks in this case, and I have some parameters that I'm also setting up ahead of time. The details of setting up these various mocks aren't super important, but I wanted to show you that I'm covering the scenario where one user exists for the email. That's one happy-path scenario that I would have wanted to cover before.

Now, the reality is that to make sure this is working as expected, I need to cover the code path where the user doesn't exist. That's what the whole problem was in the first place, where we were getting that null coming back. To quickly show you: on the mock that I'm setting up, when I'm asked to get the users based on a filter, I return an empty array of users. You'll also see that I'm mocking out my logging. I want to make sure that my logging is configured properly and that if that scenario gets hit, I do get that debug log printed out. When I call this method with everything set up this way, with that empty array coming back, I should get false back as well. So at this point I have fixed the code and added two tests: the first is the happy path that I would have expected, and the second is the negative path, the path that was causing the breakage. I'm now proving that with that path set up, I get the expected behavior.

From here, I'm ready to push to production. I have my test coverage and I have the fix that I think is ready to go. Once that's up there, I can sit down with the customer again and say, "Hey, try resetting your password. I've got everything fixed up, and you should be able to log in after you get that password reset." But when I pushed to production, things didn't go exactly as planned. So when the next video is ready, you can go ahead and watch that right here to see what happened. Thanks, and I'll see you next time.
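The root cause in the walkthrough is a lookup whose result type allows null: one matching user returns its ID, several matches are treated as exceptional, and zero matches return null. The video's code is C# and is not shown in full, so this is only a TypeScript sketch of that shape under assumed names: `Result<T>`, `User`, and `tryGetUserIdForEmail` are hypothetical, and the `fetchUsers` callback stands in for the repository query with a filter.

```typescript
// Hypothetical result type: either a value or the error that occurred,
// so callers inspect the outcome instead of wrapping calls in try/catch.
type Result<T> =
  | { success: true; value: T }
  | { success: false; error: Error };

interface User {
  id: string;
  email: string;
}

// Hypothetical lookup. Returning Result<string | null> states in the
// signature that a *successful* call may still produce no user ID.
function tryGetUserIdForEmail(
  email: string,
  fetchUsers: (email: string) => readonly User[], // stands in for the repository query
): Result<string | null> {
  try {
    const matches = fetchUsers(email);
    if (matches.length > 1) {
      // Exceptional: two accounts for one email means something is broken.
      throw new Error(`Multiple users found for email '${email}'.`);
    }
    const [match] = matches;
    if (match === undefined) {
      // Not exceptional: nobody has this email. Success, but no value.
      return { success: true, value: null };
    }
    return { success: true, value: match.id };
  } catch (err) {
    // Package any failure (including the one thrown above) into the result.
    return { success: false, error: err instanceof Error ? err : new Error(String(err)) };
  }
}
```

A caller that checks only `success` and then writes `result.value!` compiles cleanly but still receives `null` for an unknown email, which is the same trap described in the video.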

Frequently Asked Questions

What should I do if I encounter a null reference exception in my code?

If you encounter a null reference exception, the first step is to check the stack trace to identify where the issue is occurring. From there, I recommend investigating the variables involved at that line of code to see which ones could potentially be null; if the statement spans several lines, check every expression in it. Be especially wary of places where the null-forgiving (bang) operator was used to silence a compiler warning, and don't assume values are non-null, particularly if the code has changed since you last reviewed it.
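A quick way to see why a bang operator deserves suspicion: in C# the null-forgiving operator only suppresses the compiler's nullability warning and has no effect at runtime, and TypeScript's non-null assertion `!` behaves the same way. This runnable TypeScript sketch uses hypothetical functions (`findUserId`, `describeUserUnchecked`, `describeUserChecked`) to contrast an asserted value with an explicitly checked one.

```typescript
// Hypothetical directory lookup that legitimately returns null for unknown emails.
function findUserId(email: string, directory: ReadonlyMap<string, string>): string | null {
  return directory.get(email) ?? null;
}

// Risky: `!` asserts "never null" to the compiler only. It is erased from
// the emitted JavaScript, so a null still flows through and fails later.
function describeUserUnchecked(email: string, directory: ReadonlyMap<string, string>): string {
  const userId = findUserId(email, directory)!;
  return `user:${userId.toUpperCase()}`; // TypeError at runtime if userId is null
}

// Safer: check explicitly. After the guard the compiler narrows userId to
// string, so no assertion is needed and the null case is handled on purpose.
function describeUserChecked(email: string, directory: ReadonlyMap<string, string>): string | null {
  const userId = findUserId(email, directory);
  if (userId === null) {
    return null;
  }
  return `user:${userId.toUpperCase()}`;
}
```

The unchecked version compiles without complaint yet throws a `TypeError` for an unknown email, much like the exception in the video; the checked version turns the missing-user case into an explicit, handled outcome.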

How can I improve my testing strategy to prevent bugs like this in the future?

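To improve my testing strategy, I plan to write unit tests that cover various scenarios, including edge cases such as no user existing for an email, with external services like AWS Cognito abstracted away so the tests never touch a production service. It's crucial to ensure that all paths through the code are tested, especially for critical functions like password resets. Additionally, I want to refactor my code, starting with the authentication class that has grown past a thousand lines, to make it more modular, which will make it easier to write and maintain tests.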
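The two tests described in the video can be sketched without touching a real identity provider. The video's tests are C# and use Moq; this TypeScript version swaps in hand-written fakes instead, and `resetPassword`, `UserDirectory`, `RecordingLogger`, `check`, and the test function names are all hypothetical. It assumes the service reaches Cognito only through an interface like `UserDirectory`, which is what makes a fake possible.

```typescript
interface User {
  id: string;
  email: string;
}

// Hypothetical seam for the external identity provider (Cognito in the video).
// Tests supply an in-memory fake, so nothing talks to AWS or the network.
interface UserDirectory {
  findUsersByEmail(email: string): readonly User[];
}

interface Logger {
  debug(message: string): void;
}

// Hypothetical, simplified service method with the missing-user case handled.
function resetPassword(
  email: string,
  directory: UserDirectory,
  logger: Logger,
  sendReset: (userId: string) => void,
): boolean {
  const matches = directory.findUsersByEmail(email);
  if (matches.length > 1) {
    throw new Error(`Multiple users found for email '${email}'.`);
  }
  const [user] = matches;
  if (user === undefined) {
    logger.debug(`No user found for email '${email}'.`);
    return false;
  }
  sendReset(user.id);
  return true;
}

// Test double that records debug messages so tests can assert on them.
class RecordingLogger implements Logger {
  readonly messages: string[] = [];
  debug(message: string): void {
    this.messages.push(message);
  }
}

function check(condition: boolean, description: string): void {
  if (!condition) {
    throw new Error(`Expectation failed: ${description}`);
  }
}

// Happy path: exactly one user exists for the email.
function resetPassword_SingleUserForEmail_ReturnsTrue(): void {
  const sent: string[] = [];
  const directory: UserDirectory = {
    findUsersByEmail: () => [{ id: "user-1", email: "a@example.com" }],
  };
  const logger = new RecordingLogger();
  const result = resetPassword("a@example.com", directory, logger, (id) => sent.push(id));
  check(result, "returns true");
  check(sent.length === 1 && sent[0] === "user-1", "sends one reset for user-1");
  check(logger.messages.length === 0, "logs nothing");
}

// Negative path: the directory returns an empty array, the case that used to crash.
function resetPassword_NoUserForEmail_ReturnsFalseAndLogsDebug(): void {
  const sent: string[] = [];
  const directory: UserDirectory = { findUsersByEmail: () => [] };
  const logger = new RecordingLogger();
  const result = resetPassword("missing@example.com", directory, logger, (id) => sent.push(id));
  check(!result, "returns false");
  check(sent.length === 0, "sends no reset");
  check(logger.messages.length === 1, "logs exactly one debug message");
  check(logger.messages[0]?.includes("No user found") === true, "debug message mentions the missing user");
}
```

Calling the two test functions runs them; in a real project they would be registered with a test runner. Asserting on the recorded debug message mirrors the video's check that the debug log is written for the missing-user case.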

What steps did you take to fix the bug in the production code?

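To fix the bug, I first analyzed the stack trace to pinpoint the failing line, where a nullable value was being read. After identifying the problematic code, I added an explicit check for the case where no user exists for the email, which writes a debug log instead of assuming the lookup never returns null; that check also made the null-forgiving (bang) operator unnecessary. I then wrote unit tests to cover both the happy path and the scenario that caused the bug, ensuring that the code behaves as expected before pushing the changes to production.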
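Applied to the caller, the fix amounts to two checks before the user ID is used: one for an exceptional failure, and a new one for a successful lookup that found nobody. This is a TypeScript sketch assuming a result type shaped like the hypothetical `Result<T>` union defined in it; `resetPassword`, `ResetClient`, and `Logger` are illustrative names, and logging then returning `false` on the exceptional-failure path is an assumption, since the video does not show that branch. Because TypeScript's `!` plays the same compile-time-only role as C#'s null-forgiving operator, the narrowing after the null check is the analogue of Visual Studio no longer asking for the bang operator.

```typescript
// Hypothetical result type carrying either a value or the error that occurred.
type Result<T> =
  | { success: true; value: T }
  | { success: false; error: Error };

interface Logger {
  debug(message: string): void;
}

// Hypothetical stand-in for the client that actually triggers the reset.
interface ResetClient {
  sendPasswordReset(userId: string): void;
}

function resetPassword(
  email: string,
  lookup: (email: string) => Result<string | null>,
  client: ResetClient,
  logger: Logger,
): boolean {
  const result = lookup(email);
  if (!result.success) {
    // The check that already existed: exceptional failures.
    // Assumed behaviour: log and report failure.
    logger.debug(`User lookup failed for '${email}': ${result.error.message}`);
    return false;
  }

  const userId = result.value;
  if (userId === null) {
    // The added check: "no such user" is valid and not exceptional.
    logger.debug(`No user found for email '${email}'.`);
    return false;
  }

  // userId is narrowed to string here, so no `!` assertion is needed.
  client.sendPasswordReset(userId);
  return true;
}
```

Keeping the null check separate from the success check reflects the distinction made in the video: a lookup that finds nobody is an expected outcome, not an error.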

These FAQs were generated by AI from the video transcript.