Did Copilot One-Shot This ENTIRE Feature? Or... Complete Flop?

Name: Did Copilot One-Shot This ENTIRE Feature? Or... Complete Flop?
Uploaded: 2025-12-11T20:59:13.0000000+00:00
Duration: 25 min 24 s
Description: GitHub put this ENTIRE pull request together for me based on an AI-assisted spec... but did it do a good job? Let's review the changes together!





All right, folks. In the previous video, we were getting Copilot to do some work for us by getting a GitHub issue put together. We just got it to the point where we could give it to Copilot to go execute. And then I said I would make this follow-up video so we can check out the pull request once it was put together. Disclaimer: I am not trying to role-play as a pirate. My right eye is just a little bit gross right now, so I'm wearing this to save you. Okay, let's jump over to GitHub. I'm just showing you on my screen right now. This is the prior conversation that we had. This is in the previous video. If you have not watched that yet, you can check it out right here. The final point that we got to was that we ended up getting this work item made. So, I'm just going to switch over to... well, let me pull it up here. I'll do it this way. Okay. So, we have that. It's probably behind my... oh, it wasn't behind my camera. Excellent. So, I have the work item open. Now, this is what we had put together with Copilot, and we walked through trying to put together a step-by-step plan of attack that this agent could go do for us. And what we were building was basically new behavior in Brand Ghost. So, the context, if you didn't watch the previous video, was that we had a request come in for some additional Tumblr support. This is a social media platform. I don't know if everyone knows Tumblr at this point. And basically, in Tumblr you can have one account with many blogs. However, initially when I put together the functionality in Brand Ghost, I didn't think about that. I only thought that you could have one blog. I've used Tumblr a bunch in the past, and I didn't know you could have multiple blogs. So, I built it one way and didn't even realize. In hindsight, when I look at the code, there was a sign, because there's a flag that says primary blog. I should have known.
And we wanted to build out this functionality for a user that said, "I want to be able to post to all of my Tumblr blogs." So, we can do that. The idea here was that we have similar functionality in our LinkedIn code paths that we want to have with Tumblr. When you connect to LinkedIn in Brand Ghost, whatever LinkedIn account you have, your personal account, if it can post to business pages that you're an admin of, then you can also use Brand Ghost to post to those. It's a very similar idea of having a 1-to-n number of profiles, and we wanted to build that out. So that's what all of this is on my screen. This is a step-by-step deliverable. And how did Copilot do? So this is the PR that it opened. And spoiler alert, it actually did quite well. I think that I'm going to show you maybe one more set of tests to go add, but I'm not going to force you to sit here and watch that happen. I just want to go through this code review and talk about it. I will call out that it left a bit of an update in the PR, kind of a high-level summary of what it attacked. Great. This is very much aligned with what we had in the GitHub issue. But honestly, for something of this scope, when I see 11 files changed, I was like, that's roughly what I expect. I almost don't want to read through this because the code will tell me more clearly. Sometimes if there's a whole lot [laughter] of files changed, then I like to start by reading through the pull request and being like, "Okay, man. Did you go touch way too many files? If so, what was the rationale behind it?" But for 11, that's kind of on track with what I expected. One more thing I want to call out is that before recording this, I ran my pre-build jobs. So, there's a build job and a test job. I just wanted to make sure those pass so that I'm not going too far off the rails missing something. But they do pass. So, Copilot did its job. Let's go see what this pull request looks like.
I'm going to go through pretty quickly because a lot of the details do not matter to you, but I'll be able to call out why this is beneficial. We needed to update our schema in our database. But I told Copilot specifically: do not write a new migration. Simply update it in the existing script. That's because I will migrate this data manually. That's my preference at this scale right now. I don't want to risk anything, and I know exactly what I'm doing in this case. Especially because it's being vibe coded, I would much rather take my time and do it by hand. We have the pair ID added. It defaults to null. And then we have a new index on it. So, I feel very good about that. We have this new pair ID that was added. This one is actually different because it's in the test file. One of the conventions I have, and I did not explicitly tell Copilot to do this, is that in our tests I want to ensure that we don't see this kind of code written everywhere. I would much rather that when someone's trying to write a test and they want a new pair ID or any strongly typed ID, we have this pattern where you can just say, create a random one for me. Right? I don't care that it's next int 64 from Random behind the scenes. I don't care if you're using a snowflake generator. I don't care what it is. Just give me a random one. That means in the future, if I want to move to having friendlier names or something in my IDs for tests, I can go change that behind the scenes here. So that's a convention we have. It did that. I did not tell it to. Great. A lot of the code that you're going to see as we go through this, just for full transparency, is mapping between data transfer objects, and there's a lot of mapping between strongly typed IDs. There's far too much mapping going on at this point, and this part of the code is actually something that needs to be cleaned up.
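The "create a random one for me" convention for strongly typed IDs can be sketched roughly as follows. This is a minimal illustration in Python, not the project's code (which appears to be .NET); the `PairId` class and its `create_random` method are hypothetical names chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PairId:
    """Hypothetical strongly typed ID wrapping a 64-bit integer."""

    value: int

    @classmethod
    def create_random(cls) -> "PairId":
        # Tests call this factory instead of generating a number inline,
        # so the generation strategy lives in exactly one place.
        return cls(random.getrandbits(63))


# In a test: ask for "some ID" without caring how it is produced.
pair_id = PairId.create_random()
same_value = PairId(pair_id.value)
assert pair_id == same_value  # value equality, like any other typed ID
```

Because callers never see how the value is produced, swapping the random integer for a snowflake generator or a friendlier test name later would only touch `create_random`.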
There was some refactoring that was done once upon a time, and sometimes we're mapping between objects and recreating IDs that are the exact same type. Just a heads up if your eyes are scanning and you're like, "What the heck's going on here?" But we can see that one of the things it had to do was add pair ID as null in a lot of different instances. So, it's doing that in the test. I don't know why Copilot sometimes likes to just delete comments. This is probably a comment that was vibe coded in at some point, and in my opinion I don't care if this is missing, but I wanted to call it out because I have seen this kind of thing happen, where there'll be a comment block and it's updating code nearby and it's just like, "I guess we don't want this comment now," even if it's unrelated. So, kind of interesting. Here's an example, just as a heads up, of what I was talking about with strongly typed IDs. I don't like having this pattern everywhere. So, this should be something like refresh token create random, just so you understand what I was getting at. Overall, this test is kind of interesting because this is one of the spots where I actually would leave a comment to say we need more. So, I'm going to do that as we're going through. What do I mean by that? Well, we touched the Tumblr auth client. Okay, there was a code change there. If I scroll down a little bit lower, you can see on the left-hand side the Tumblr auth client was modified. If the Tumblr auth client was modified and we have to take a pair ID through the whole thing, does it have any logic that also needs to be updated? Maybe. Let's just jump down to the Tumblr auth client right now. If it's all pass-through stuff, then maybe it actually doesn't make a difference. So maybe we don't need to worry about that. So, nice. It changed up some formatting. I'm okay with that. Just some white space. We can see pair ID is being passed down. Pair ID is being passed down.
It might even be worth having a new test added where we make sure the pair ID is not null for one of these functional tests. But I'm going to keep scrolling a little bit here. This is one of those examples where you can see we're mapping a refresh token to a new refresh token. Kind of silly. We got this added, but overall in here, it actually didn't use any logic with the pair ID. It was just passed through. To be honest, I might not even care to add more tests, but I might want to go back. Let's just do it for the sake of completion here. So, on the Tumblr auth client tests, let's just say @copilot, and I'm going to speak out the prompt: Ensure we have test coverage where the pair ID is not null. You can use the existing tests that are in place and update the pair ID to a non-null value using the pair ID "create random" method. That is absolutely not what I said. [laughter] That's amazing. We're going to keep that one in the video because it's funny. I'm going to have to do it old school. So, [laughter] the thing about this is I could be a little bit more specific and say I want a new test. I could say that I want it to assert on the pair ID, but honestly, I just would like to see something with pair ID not being null. Just make me feel a little bit better. I'm not super concerned, to be honest, because there is enough functional test coverage in this solution that we'll catch it if that's not doing the right thing. So, not totally nervous about that. Here's another test file. It fixed a typo. That's great. So, apparently once upon a time I was copying and pasting code. There's "Twitter" that was replaced with "Tumblr." Thank you, Copilot. One of the things I called out in the last video was that we had this try get primary blog. That's the method that we had in place, but we actually don't necessarily want to do that. I said in the previous video that I might expect that we get rid of this method and maybe we just have this one.
So, it looks like in actuality what we ended up getting was that we are using this method. I really don't think we're going to have a need for this one. We'll see if it deleted it. What I won't be able to tell just from looking at this code is whether there are any use cases for try get primary blog ID. So we'll see if it actually got rid of it. Another typo. Thank you. And then this is also hilarious, because "assert pected Tumblr." I just had typos all over my code, apparently. So this is a nice, embarrassing fun time for you to see all of my mess-ups. Okay. And then the expected pair ID that we're passing in is null. Okay. So that is the case for that test. But do we have any tests in here where that's not the case? I want to make sure that we have some where the expected pair ID is something. So, we'll keep going. More typos fixed. We're getting these mocks set up. By the way, I don't use a ton of mocks in Brand Ghost. I am not totally against mocks, but I have made a push for a lot more functional testing. The challenge is that with REST APIs, I really do want to mock them. I don't want to go calling the Tumblr service or any social media platform when I'm running my tests. So, keep going. So far so good. More typos fixed. Another expected pair ID is null. We want to make sure we have some coverage without that. Another test change. This one, sorry, is actually the assertion. So, I misspoke. It updated the assertion that's common across all of these scenarios to take in a pair ID that's optionally null. And we are going to check on that pair ID. All of the existing tests, as we went through them, every single one of them is using a null pair ID. That is the default. That's what things would have been before any of this code change. That's what I expect. By the way, before we jumped over to this part of the pull request, I showed you that my tests passed. So, it did go run these tests. They all passed. That's a good sign. Now, we have new tests. Okay.
So, try add or update auth async. This is the scenario where there are multiple blogs, and then all blogs are added with a shared pair ID. That means if we scroll down, right, we can see that the Tumblr REST client is being set up with two different blogs. One is primary, one is not. That's what we want to see. Okay, we'll keep going a little bit lower. We're getting the second auth request and then we're going to go validate it. And what do we see here? What I want to see is... one sec... this call, assert expected Tumblr auth in database. I want to see that. I don't see that in this method, this new one that it added. So that's kind of interesting. That might not be what I want to see. You can't tell this from the diff, but I think I might want to see that in the assertions. As I'm reading through this, I know this code; as a viewer, you will not know this code. That's totally okay. What this is doing is actually using two different methods that I have. So this is try getting the auth by the blog ID. It can only do this by getting data from the database. Just as a heads up, this method is reading the data from the database. This method is also reading data from the database, and then it's doing the comparison. In actuality, I was expecting that we would see this method called again. We're not. That's because those scenarios are actually covered. What it's exposing here is that it's going to go read this data back again. This comes from the database, and it's going to do a different comparison. It's going to check that we have different social account IDs. It's going to show that we have the same pair IDs because these accounts are grouped together, but they have different identifiers. So, this is actually a good test by the standards of what I have in my codebase. This is doing another one to show that when we're adding a single blog, we get a null pair ID coming back. So the setup on the REST client is just the single blog. So that checks out.
And then it's very much structured like the test we just looked at. Try get auth by blog async. And then the pair ID is null. So that covers the single scenario. This is on the social auth handler. I think this is one where I wanted to add another test. I'm going to try to explain it. I guess I can't use my mic to speak anymore because that seems like it's busted, for this video at least. The scenario that I want to see, I'm going to try to type it and say it at the same time: I would like a test scenario covered where we have existing auth for a user. They're called owner user IDs. Owner user ID. Then we refresh. How do I want to say this? I want to have existing auth. Someone goes to refresh and we get a different set of blogs. Then we refresh. The Tumblr REST API gives us a different set of blog names. Now, do I want to tell it what's expected? No. Usually I would, but I'm actually curious to see what the behavior is by default, because this is going to answer some other questions for me. If we're not totally consistent in our codebase right now, then I want to make that consistent. I'm going to add that review comment in. Like I said, I'm not going to record watching Copilot try to go fix all this stuff up. We're just scrolling through here. We got some test helpers. Totally fine. That's just to help save some typing when I'm writing more tests. One of the things I told it to do was add doc comments. So, this is in the actual GitHub issue. It had a note to leave some helpful doc comments. So, it's doing that. That's great. This is the strongly typed ID. So, it did listen to that. Looks good so far. "Created from the same OAuth session," right? It's nice that it actually put a helpful description here instead of something pretty useless. More white space fixups. This is really just DTO mapping. So, that's kind of boring.
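The two behaviors those new tests check (several blogs from one connection share a pair ID but get distinct social account IDs, while a single blog gets a null pair ID) can be sketched as below. This is an illustrative Python model under stated assumptions, not the code in the pull request: `Blog`, `AuthRecord`, and `build_auth_records` are hypothetical names, and the refresh scenario is deliberately left out because the transcript leaves its expected behavior open.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Blog:
    name: str
    is_primary: bool


@dataclass(frozen=True)
class AuthRecord:
    social_account_id: int
    blog_name: str
    pair_id: Optional[int]


# Hypothetical stand-in for whatever allocates social account IDs.
_account_ids = itertools.count(1)


def build_auth_records(
    blogs: List[Blog], new_pair_id: Callable[[], int]
) -> List[AuthRecord]:
    # One pair ID is created per connection, and only when there is more
    # than one blog; a lone blog keeps the pre-existing null default.
    pair_id = new_pair_id() if len(blogs) > 1 else None
    return [AuthRecord(next(_account_ids), b.name, pair_id) for b in blogs]


multi = build_auth_records([Blog("main", True), Blog("side", False)], lambda: 42)
single = build_auth_records([Blog("only", True)], lambda: 42)

assert multi[0].pair_id == multi[1].pair_id == 42              # grouped together
assert multi[0].social_account_id != multi[1].social_account_id  # distinct accounts
assert single[0].pair_id is None                                # single blog: null
```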
And then it added on... I didn't go tell it to go update all of these, [laughter] but because it was putting the pair ID onto them, I guess it was like, "I might as well. I'm touching this, so I should leave a doc comment on it like we talked about." So, I'm going to tell it to do one more thing here. Let's see if that's the end of the file. I'm just going to tell it to move the DTOs, so the positional records, into a new class. So, let's get that: @copilot, move the position... there we go. It's way harder to type with one eye than you might expect. So, move the positional [laughter] records to their own dedicated file. This is all stuff where, if I was working in Visual Studio, it would be super quick for me to just go do it by myself. By the way, I do stuff like this from my phone sometimes. You know, if I'm sitting on the couch, or if I'm driving somewhere with my wife and I'm not the one driving, this kind of thing works really nicely just to do quick reviews on our phones. Pair ID. So, this is the database stuff. I'm just scrolling through quickly. Like I said, some of the details may not necessarily matter to you, but it has updated every single one of these database calls to make sure the pair ID is there. This is a new one: try getting all of them by pair ID. So that works. I don't use Entity Framework, by choice. I like having my SQL queries in my code. You may not like that. That's cool. There's try bulk update tokens by pair ID. So when we refresh something, we need to make sure that we have tokens updated for the entire pair. So that actually worked. That looks pretty good. One thing I noticed is that the Tumblr auth repository was changed. There are no Tumblr auth repository tests. So this is a great opportunity. Why won't it let me click that? You can't click it if it's collapsed. Okay. @copilot, add test coverage for the methods in this class. Use... and here I want to be very clear with Copilot that I want it to use a certain pattern.
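The repository changes described here (a nullable pair ID column with an index, a lookup by pair ID, and a bulk token update across the whole pair) can be sketched with hand-written SQL. This is a rough illustration only: it uses Python's built-in `sqlite3` with an in-memory database rather than the project's MySQL setup, and the table, column, and function names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tumblr_auth (
        id           INTEGER PRIMARY KEY,
        blog_name    TEXT NOT NULL,
        access_token TEXT NOT NULL,
        pair_id      INTEGER DEFAULT NULL  -- null for single-blog connections
    );
    CREATE INDEX idx_tumblr_auth_pair_id ON tumblr_auth (pair_id);
    """
)
conn.executemany(
    "INSERT INTO tumblr_auth (blog_name, access_token, pair_id) VALUES (?, ?, ?)",
    [("main", "old", 7), ("side", "old", 7), ("solo", "old", None)],
)


def get_all_by_pair_id(db: sqlite3.Connection, pair_id: int) -> list:
    """Return every auth row that belongs to the given pair."""
    cursor = db.execute(
        "SELECT id, blog_name, access_token FROM tumblr_auth "
        "WHERE pair_id = ? ORDER BY id",
        (pair_id,),
    )
    return cursor.fetchall()


def bulk_update_tokens_by_pair_id(
    db: sqlite3.Connection, pair_id: int, access_token: str
) -> int:
    """Refresh the token on every row in the pair; returns rows touched."""
    cursor = db.execute(
        "UPDATE tumblr_auth SET access_token = ? WHERE pair_id = ?",
        (access_token, pair_id),
    )
    db.commit()
    return cursor.rowcount


updated = bulk_update_tokens_by_pair_id(conn, 7, "new")
```

With the three sample rows above, the update touches the two rows that share pair ID 7 and leaves the unpaired row alone, which is the property a functional repository test against a real database would want to assert.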
Use the existing patterns for testing repositories where we use a test fixture builder. By the way, this is in my Copilot instructions. This is in my custom agent that is running for this pull request. This is in the documentation. I still put this in the comments. I dream of a day where I don't have to do this. [laughter] That's part of the other series I'm making about putting analyzer rules in place to catch stuff like this. If Copilot or your favorite agent goes off the rails doing stuff with different patterns, you just enforce it with analyzers. So, using the existing patterns, use a test fixture builder to build a test fixture with MySQL. Cool. That should be enough based on my previous experience. Basically, I'm just telling it to use the functional testing patterns that I have so that it doesn't go, "Oh, cool. Let me make a repository," and then it mocks all the SQL. Please don't do that. Okay. And then the REST client changes. So, this is the new method. You might have missed it when I said it earlier, but I said that there is an existing method call that's like this. So, what I'm going to do is leave a comment. You might not have the context unless you watched the previous video, but this method calls the same REST API that we were using somewhere else already, and that method call was only getting the primary blog ID. We can do that exact same thing just by using this method. We don't need two with a bunch of duplicated code. I just wanted to expand that. So I think it's this method here. Yeah. So let me do this: @copilot, if there are no other callers to this except for tests, then we can remove this method. And then I might say... I just want to double check. V2 user info. V2 user info. Right? This is the same. It's making the same REST call. "Do this REST call in..." So, "it might be a better option to..." No, I'm not even going to say that. That's too wishy-washy. So, "we do this REST call..."
So, we don't need to repeat that logic in this method body. The reason I'm saying this is, and this is actually how I would treat this on a real review, I'm just thinking about this out loud for the first time. So, do I have an opinion about this? The answer is yes, but I would need to go actively looking through the code a lot more, and it comes down to how much I care about this relative to how much work I want to put into that. I don't care that much. So, I chose my words carefully here: we do this REST call in try get all blogs async, so we don't need to repeat the logic in this method body. I'm trying to tell it this is what I care about. If it decides that it's going to keep this method, try get primary blog ID async, and then basically gut the body of this method to call this one down here, great. I'm okay with that. Would I rather it do that versus delete this method entirely and change all the call sites to instead call this one and then filter for the primary blog? That could work too. I don't care. I feel like I'm giving it enough context to say this is what I want. I think it should make a decent decision, but I just want to also call out that I might be disappointed. Right? That's the side effect of not being very specific. I feel good about that. We have this method. It added a whole bunch of doc comments. Updated these for our error handling. Cool. It added a little bit of extra logging. Cool. And then this is where it gets a little bit gnarlier. This is the kind of code that, personally, I don't want to review in a pull request. So genuinely, before I go and push this to production, I want to walk through this in Visual Studio. I'm scrolling through this as I'm talking because I'm not going to review this effectively here. I know what this code is doing: it's trying to update auth records per blog that it was able to link up with this user. So, that's good.
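The deduplication requested in that review comment (keep a single REST call in the "get all blogs" method and have the primary-blog lookup filter its result) might look roughly like this. It is a sketch under stated assumptions: the class and method names are illustrative Python stand-ins for the methods named in the transcript, the injected `fetch_user_info` callable stands in for the real HTTP call, and the payload shape is simplified for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Blog:
    name: str
    is_primary: bool


class TumblrRestClient:
    """Illustrative client; the HTTP call is injected so it can be faked."""

    def __init__(self, fetch_user_info: Callable[[], dict]):
        self._fetch_user_info = fetch_user_info

    def get_all_blogs(self) -> List[Blog]:
        # The only place that performs the user-info REST call.
        info = self._fetch_user_info()
        return [
            Blog(b["name"], bool(b.get("primary", False)))
            for b in info.get("blogs", [])
        ]

    def get_primary_blog_name(self) -> Optional[str]:
        # No second REST call and no duplicated parsing: delegate, then filter.
        return next((b.name for b in self.get_all_blogs() if b.is_primary), None)


def fake_user_info() -> dict:
    # Simplified, assumed payload shape for the sketch.
    return {
        "blogs": [
            {"name": "side", "primary": False},
            {"name": "main", "primary": True},
        ]
    }


client = TumblrRestClient(fake_user_info)
primary = client.get_primary_blog_name()
```

Either outcome mentioned in the review fits this shape: the primary-blog method can stay as a thin wrapper like the one above, or it can be deleted and its callers can filter the result of `get_all_blogs` themselves.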
I can see that it's doing that even with one eye. But I'm not going to feel good until I can have it in Visual Studio, where I can press F12 and dive into the functions. I just don't feel comfortable about that yet. It does look like it's the right idea. It does look like we have tests that cover what I wanted. The tests pass. Despite not feeling like I totally understand exactly what this code is doing or whether there are edge cases missing, because I have the tests in place and they're passing, I would personally rather look at the tests and say, "Oh, well, what about this scenario? What about that scenario?" Make sure the tests are there, and then feel like I can revisit this in more detail. A lot of this looks tricky because there's a bunch of red on here. What it had done was take two methods, pull some common code out of both of them, and shuffle it around. So it just looks like there's a lot more changed. See how much red is on my screen right now, and then there's a ton of green down here. That makes me uneasy. Like, whoa, whoa, whoa. How much did you change? But a lot of it's doing kind of stupid stuff like this, where we're just mapping between different objects. So, it looks a lot worse than it is. But overall, I think Copilot did a pretty damn good job. I'm going to go submit this review. Request changes. Submit. And to wrap up this video, this is literally how I build a lot of smaller features into Brand Ghost, right? So, just to recap, I have this conversation that I showed in the previous video, where I worked back and forth very briefly with Copilot right inside of Spaces on GitHub. I gave it some information. We went back and forth a little bit. I had it put together this issue. So I didn't type this out myself. This is based on some information I gave it.
It structured this and then I assigned it to Copilot. One of the really important notes here is that having this to-do list kind of format really helps. I tried making sure that it was not putting guesswork in here. The guesswork is during our conversation. Let's get that sorted out, and when it's time to go make the GitHub issue, let's be very clear about what's going on. And the side effect, like I said, is that it made this code change. I think it's probably... yeah, it's probably lost now. But at the beginning of this video, or earlier on, I was showing you that it passed the tests. So, it did a really good job, but there are just a couple of extra things I'd like it to polish up. And I feel pretty good about this feature. So, could I have done all of this from my phone? Yeah, probably, which is pretty cool. So, anyway, thank you so much for watching. If you have thoughts or questions, or you want to see other things like this, let me know. But I thought it would be fun to walk through, across a couple of videos, real examples of how I... you know, this isn't quite vibe coding, I would say, but I'm using AI to go build out features, and it works pretty well. Oh, I lied. It's right on my screen. Again, limitations of a single eye when you're used to using two, but the build and the tests pass from before. I will run them again once it adds the extra tests and stuff. So, I feel pretty good about that. Thanks for watching. I will see you in the next video. Take care.

Frequently Asked Questions

What is the main purpose of this video?

In this video, I'm following up on a previous one where I worked with Copilot to write a GitHub issue for multi-blog Tumblr support in Brand Ghost and then assigned that issue to Copilot. I'm reviewing the pull request that Copilot generated and discussing how well it performed.

What challenges did you face while using Copilot for this feature?

While using Copilot, I encountered some challenges like it occasionally deleting comments in the code and not always following my conventions for test coverage. However, overall, I was impressed with how well it handled the task.

How do you feel about the quality of the code generated by Copilot?

I feel that Copilot did a pretty good job with the code. It made appropriate changes and passed all the tests, but there are still a couple of areas I want to polish up before finalizing the feature.

These FAQs were generated by AI from the video transcript.