Here's a walkthrough of how I go back and forth with ChatGPT to get a design spec put together. This is just an example, and of course, you should spend more time refining the output that it gives you.
In the follow-up, we walk through the code that gets built and see how we can improve it!
What I did was I kept my prompt very, very simple. A little spoiler, it did in fact work. In this video, I wanted to walk through how I'm getting a little bit more out of the agents that I'm working with. And this isn't like something that I invented, but I am finding that it's working a lot better for me when I go to use it. And that's getting specs built out by LLMs so that other LLMs can go build based on the spec. Let's jump over to ChatGPT and I'm going to walk through one of the examples that I had in a recent live stream. To kick things off, how I like to approach this personally is that I do a bit of exploration with the LLM. And you might have something in mind already and you want to get right into designing it.
But the part that I like is that I can use the LLM to explore a little bit. So if you notice in my first message, I'm not even telling it that I have this thing I want to go build. I'm more just asking it about different technology. In hindsight, this might have been a little bit restrictive. I was trying to ask about different vector database options and I gave three just as examples: Postgres, SQLite, and MySQL. Like I said, in hindsight, maybe I should have left that more open-ended, but I wanted to go build a RAG system, or retrieval augmented generation, so that I can essentially leverage text that I have and be able to search through it semantically. And I haven't had to do this yet locally. So I was trying to explore what options I have. Now, these three databases I've worked with before. I don't yet have a personal opinion, but I wanted to start exploring and see what ChatGPT had to say. So when I go through this, you will note that it talks about Postgres. It talks about SQLite and MySQL. The silly part is that, like I said, I kind of gave it three options and it only focused on those. I should have left it open-ended or followed up, but based on what I was reading here, I was actually okay with it. So again, I'm doing this to explore my options so that I can have some input on the decision-making. This is one of those things where if you don't really know at all, then maybe you just leave it up to the LLM. But I wanted to get some insight so that I could go pick and choose. Now, for the thing
that I wanted to go build, which we'll talk about in just a moment, I don't think that SQLite is going to be the best option. It's just going to be a local tool that I'm running, at least how I plan to. But MySQL is something that I work with all the time. I haven't really used Postgres recently, so in my mind, I was thinking that might be a good option. So I'm kind of sitting between MySQL and Postgres. But one of the tricky things is that if I'm using MySQL, I don't want to have to go use HeatWave. That is a bit of a limiting factor, so that does kind of push me more towards Postgres. It did mention, um, DuckDB. I don't know how you address the LLM. ChatGPT mentioned DuckDB. So, it was one thing that it kind of called out, but like I said, not in a lot of detail like the others. So, it kind of gives a little bit of a high level for how to go put a RAG system together. Again, kind of a nice intro, but I haven't told it what I want to build yet, which is where this comes in. And so the whole idea for the thing that I wanted to put together, little side project, is that I have this game that I've been building for over 20 years, and I'm not making that up. It's a side project that comes and goes, and I always come back to it after a few years to play around with it. I like building stuff on the side. I think it's a great way to practice building software. At one point a few years back, I said, "It's time
to actually put some of the game content together instead of just coding random stuff. There should be a game world." So, I started making a wiki for it. And I said, "This would be a really cool project to put together to have something that I can use retrieval augmented generation across. I could use the LLM to ask questions about the game world and it could answer that for me." That's what we're going to build with this spec that we're putting together. So, it says, "Awesome. This is a perfect RAG use case." Thank you so much, ChatGPT. I know it's always trying to inflate your ego, right? So, it gives a little bit of an architectural overview. To me, this makes sense. We want to ingest the files. We want to chunk up the files so that we can actually do the embedding across those chunks, which is, as it says, the next step here. And then we need to be able to store that, and once we have that stored, how do we go retrieve it and make use of it? So overall, high level, makes sense to me. What I find kind of interesting is when we get into some of this stuff here, like it's putting together schemas and things like that. There's a lot of detail in this, and I haven't actually given it any sort of specific details about what I want. Right? If you see what I wrote here, this is three sentences. It's no details at all. The fact that it's getting into some very specific things I think is kind of interesting. I look at this being like, I'm going to scan over this just to see that it's at least, you know, thinking through the
problem, but it's not really something that I'm going to leverage or that I have an opinion about yet necessarily. So, it's putting together the embedding part. It's putting together where we can actually go search for the data. And it's still doing it across these different database formats because I didn't really pick anything yet. But again, it brings up MySQL with some of the limitations, especially around having to use HeatWave and, you know, the community one right now. Apparently, I could dig into this more, but it seems like it doesn't actually have good support for a vector store. So, I'm thinking Postgres at this point. A quick reminder that if you're enjoying this video to give it a thumbs up and subscribe to the channel. Make sure you stay right to the end so you can check out the next video where we actually step through the code that this created and see what had to get fixed up for it to run. I don't know if I mentioned C#. I probably did, or it just knows based on how much I talk to ChatGPT. So, I'm going to be using C#. Again, it puts together some sample code based on the schema that it's done. It has a hybrid approach. All of this is, in my opinion, a little bit too much detail this early on, because I haven't given it a lot of guidance. But at this point, I'm also thinking, you know what, I don't actually think in this example that I want to give it a ton of detail. In fact, it might be kind of interesting to just let it ride this out. And that's actually what we did on the live stream. I put this PRD together, which is what
we're going to see at the end of this. I let it run during the live stream and then we had an output at the end of the live stream. It was kind of, check it at the end and see if it works. So, I did let it kind of come up with whatever it wanted to. I said to it, you know, if I have a Docker image, because I'm thinking now I want to use Postgres, do I have to get a certain version of it? These are things that I could go search on Google, but now I can just ask ChatGPT directly and hopefully it finds an answer for me. So, short answer: yes, the standard Postgres Docker image does not include pgvector, but there is an image for us already, which is great news. I don't really want to have to go fiddle with things to set them up if I don't have to. I'd like to just focus on getting the code and running this thing, especially for a little side project, because for me, I'm not super interested in going to learn about having to set up the database at this point. I'd like to see it working so I can play with it. At this point, we're going to continue on. Again, it's talking about some details where I'm going, I don't actually know if I care about this right now, but I'm going to start to pick some of the tech stack. I'm saying to ChatGPT, let's go with a system that has some of these assumptions for the tech stack that we're working with. So, we're going to use Docker and run pgvector, the latest image that we have for that. It listed out a couple of NuGet packages for us to work with. And then I also mentioned that I want to use Dapper. This is a personal preference. If you watch my other videos, I'm not a big fan of Entity Framework Core. No real good reason, just from a use case perspective. I like using Dapper. It's more familiar and friendly for me. I'm suggesting I want to use Dapper, but then I said, like, if it doesn't work with the vector columns, then we can use Dapper for some of the other stuff. I just don't know yet, because I haven't built one of these, if Dapper is going to work how I expect. Kind of leaving that up to ChatGPT to decide. I'm telling it to use Needler, which is a dependency scanning system that I have for automatically registering types. So, it's kind of like a hybrid between Autofac and Scrutor.
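For a rough idea of what Dapper with a vector column can look like, here's a minimal sketch. It assumes the `Pgvector` and `Pgvector.Dapper` NuGet packages, and the connection string, table name, and column names are just placeholders, not what ChatGPT actually generated in the chat:

```csharp
using Dapper;
using Npgsql;
using Pgvector;
using Pgvector.Dapper;

// One-time setup: teach Dapper how to map pgvector's vector column type.
SqlMapper.AddTypeHandler(new VectorTypeHandler());

// Npgsql also needs the vector type enabled on the data source.
var dataSourceBuilder = new NpgsqlDataSourceBuilder(
    "Host=localhost;Database=rag;Username=postgres;Password=postgres");
dataSourceBuilder.UseVector();
await using var dataSource = dataSourceBuilder.Build();
await using var connection = await dataSource.OpenConnectionAsync();

// Nearest-neighbor search: <=> is pgvector's cosine distance operator.
var queryEmbedding = new Vector(new float[1536]); // placeholder; use the real query embedding
var results = await connection.QueryAsync<string>(
    "SELECT content FROM chunks ORDER BY embedding <=> @embedding LIMIT 5",
    new { embedding = queryEmbedding });
```

So, at least in principle, Dapper can work with the vector columns directly via a type handler rather than needing Entity Framework Core.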
It's kind of the best of both worlds, in my opinion. So, I'm telling it to use that because it will automatically register types. We'll see in just a moment that it doesn't really have a handle on how to use this, because it doesn't actually know what it is. And then I want to use Semantic Kernel as well, because in follow-up videos I want to expand on this and leverage Semantic Kernel. So, you'll see more Semantic Kernel tutorials. Finally, I told it that I want to use Serilog for logging. Everything else that I'm omitting here, I'm kind of just saying I don't really care. Like, go pick something and use it if you need to. Of course, because it's an LLM, we can always go back and forth and challenge it or ask it to do different things, but let's keep going. It's putting together some steps for us to go work with, right? Again, it's giving us all these details, but we don't actually know, like, you know, do I need to go set up logging like this? Do I need the code for that right now? Not really. But what's a good thing to pause and look at, especially if this is something you're going through and you're trying to take building a system a little bit more seriously, is the code that it's trying to give you as examples, because you can catch things early. I did not, because I told it to use Needler, which, part of me was testing to see if it was going to go look at the repository and then infer the usage. No, not a chance. It did not do that, because everything that it said about working with Needler is exactly how Needler does not work.
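For a sense of scale on the logging question: a minimal Serilog bootstrap is only a few lines on its own. This is a rough sketch assuming the Serilog and Serilog.Sinks.Console packages, not the exact code from the chat:

```csharp
using Serilog;

// Minimal Serilog bootstrap: write structured logs to the console.
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Console()
    .CreateLogger();

// Placeholder message and property, just to show the structured-logging shape.
Log.Information("RAG ingestion starting for {Path}", "wiki/");
Log.CloseAndFlush();
```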
So kind of funny, but that's okay. It keeps spitting out more code for how to do things. You know, it's following the same schema that it had from earlier on. Some example queries. Again, this is going to be using Dapper. So in Dapper, we actually have the SQL queries in code. If you're using Entity Framework Core, this is usually more sort of abstracted in the ORM for you, but not my preference personally. And then we see some Semantic Kernel usage. So, I'm going to keep going and skip over some of this code, because this part is not super important. We will see how all of this comes together a little bit later. Go a little lower here. In my opinion, spitting out this much code this early is not super helpful, except for reviewing how it's thinking about things. I told it that's not how Needler works. So maybe don't do that. Again, I'm not correcting it, because I'm actually curious to see if it's going to pull stuff from the repository. And again, spoiler alert, it doesn't. But I'm trying to explore. If you are, like I said, trying to take this more seriously, I would take the opportunity to correct it with examples, because you can shortcut some of the pain in the butt that you'll get from this. And then I saw in one of the lore APIs that it's kind of formatting a big string, and I said, like, hey, look, we're in C# code right now. You can use objects. You can make classes for this kind of stuff. You don't have to format a string to go use in the rest of the C# code. So, don't do that. I told it
to use a positional record instead. And then, um, I said that for Needler we want to be able to configure the kernel builder for Semantic Kernel, and we need to be able to select different LLM providers. So I am going to be using the Azure OpenAI APIs. That's a mouthful to say. And then in a follow-up video we will look at changing this over so we can run a local model, because that will be fun for us to go explore. It goes and basically tries to update everything. I will spare you, because there's a lot of code that it's putting out again. And finally, I said for plugins, I would like to make sure that they're focusing on, like, the single responsibility principle. In Needler, you can jam a whole bunch of stuff in one plugin, but I would rather have it be more modular. So, go do that. And then I said, and this is sort of where the magic comes in, I would like you to create a software PRD that explains to an agent how we can go build this architecture. This should include to-do lists and exit criteria. This part right here I think is super important, because the result of this is that we can give LLMs, so agents, the ability to understand the system and then have a to-do list to go step through. If you've worked with agents before, you will notice that Claude Code and a lot more agent tools now are starting to make sure that the agents have to-do lists, because without them, they seem to go off the rails and just do stuff instead of following along and
staying on task. The other thing I called out, though, was exit criteria, because I have noticed personally that when I'm giving even to-do lists to things like Claude Code, what will happen is that it's like, I'm done, and then I check later and I'm like, this code doesn't even compile. So with some type of exit criteria along with the to-do list, I'm hoping that it will actually reflect on this and go, okay, I am meeting the exit criteria, I can move on to the next set of work. Then I also mentioned, because I am going to be using Claude Flow on top of Claude Code for this, that I want to see what can be done in parallel, because we might as well see if we can have agents run in parallel to build some of this stuff. Why not? I said I need it to be consumable by an LLM. In my opinion, and I haven't proven this, but in my opinion, I'm trying to get ChatGPT in this case to make sure that whatever it's outputting is going to be understandable by an LLM. I don't really need to read it as a person. I'm not the one who's going to code it, but as long as it's going to be followed by an LLM and it's going to make sense, then we're going to be set up for success. And that means we're going to need some markdown. What it does is it ignores the raw markdown part, of course. And then it spits out this huge PRD. Great. Um, I'm going to open it up in an editor. We're going to step through it pretty quick. I just wanted to mention that I did follow up right after to say, how
do we extend this for using Ollama? And like I said, we'll see that in another video. All right, so we have the PRD here. Let me get it a little bit larger. And we'll see that in the beginning of this PRD, there are some major sections that just have a lot of the information around design decisions. And what I would recommend is that you actually go back and forth with the LLM to improve this PRD. I did not do that before my live stream. It was like a one-shot. I had the PRD and I said, "We're going to go run this and see what happens." But again, if you're taking this more seriously, this is an opportunity for you to read through this and go, "Wait a second. That actually doesn't make sense." Or, "Oh, I missed talking about this thing." We have this opportunity where it's not actually building this yet. We should review it. We should make our edits to it. And then we either edit it manually or we give that back to the LLM, ChatGPT in this case, and we say, "Hey, for these sections, go change this stuff around so that it's more to our liking." But the whole meta point here is that there's a lot of structure that we're going to be providing to a set of agents to work with. I keep scrolling here. You can see there's a lot of stuff that was decided upon in the chat with ChatGPT. I scrolled through a lot of it very quickly, but you can see it's defining the plugins that it wants to use, right? A logging plugin for Serilog. We need to be able to configure Postgres. We need to configure Semantic Kernel with Azure OpenAI. And then I told it I want to support other LLM providers, so it has a selector plugin as well. It has some of the schema that it came up with. Again, if I'm looking at this now, I might say, hey, you know what? Like, I don't care about the headings or the title, or I don't care about following the anchor ID. But the information I gave it was that I had a wiki built in markdown files, so it was making some assumptions about that. And I'm okay with that, I think. So, we'll keep scrolling through. It has more sample code, has the APIs to go build against. Again, if you're reading this and you're like, I don't like these APIs, right? Maybe I don't like these, uh, positional records being named with DTO in the name for data transfer object,
you know, or I don't want my cancellation tokens to have default values, I want to force people to pass them. Anything like that that you're seeing here, this is such a good opportunity to make sure it's corrected early, because the agents are going to build against this spec and they will go do it. There's some information here which is kind of interesting, right? There is a hybrid search, and it talks about blending the weights for the results. I think that's super cool. I don't know enough about this, but the fact that it's going to be in the code means that, if it works, I should be able to play with and tune this. I think that's pretty interesting for me to learn about. And then we'll see the work streams. This is what I was saying a little bit earlier, that having the to-do list is going to be incredibly important for the agents to build against. And it came up with basically some infrastructure. Then it's going to build out some of the features, like we need to be able to ingest the wiki content. It kind of ends with, like, we need the web API that's kind of built on top, and then some things around observability. Like, I didn't really ask for anything in the observability and ops. If I wasn't happy with this, get rid of it, right? Um, or tell ChatGPT, hey, great idea, but, you know, health endpoints? Cool. What health endpoints? Make it so it's not optional: I want health endpoints for, you know, these five features or something like that. Up to you to decide, but this is why I'm calling it out, because doing it early at the beginning is going to mean
that it's at least designing stuff from the start to incorporate these things. It's not that it can't be done later; it's just a really good time to do it in the beginning. So, those are the work streams, and this is the actual to-do list. The difference between these two things, I think, is that this part is more around the high level and what things can be parallelized, and then if we scroll down, this is a little bit more specific, a little bit more detailed, so that it can follow the checklist. Overall, pretty solid in terms of the steps that it put together. It has some things to go design against, like some non-functional requirements. To be honest, I don't think that any of this got used. We didn't tell it how to go measure these things. I think it's just super high level. If you really cared about ensuring these things, I would make sure you put into this PRD how you expect to have that proven. Because without that, you're leaving it up to the agents to go say, "Sure." Like, I've seen agents basically say, "We've optimized, you know, this code flow and it's 100% faster." And I go, "Great. Show me the benchmark results." And it's like, "Oh, we couldn't compile the code." So, I'm like, "How did you actually make it faster if the code doesn't compile?" So, make sure you're trying to be specific, and again, I'm going to keep saying it, do it early so that this gets carried through with the rest of the design. There are, you know, a few more things on deliverables for a checklist, but what I did at this point was I opened up my terminal, and, like I said, I'm using Claude Flow. I have a video on setting that up if you're interested. You don't have to use Claude Flow, though. You can use Claude Code and then instruct the console to basically try to do these things in parallel using tasks. So you can have custom agents and stuff, but Claude Flow puts a lot of that together for us. What I did was I kept my prompt very, very simple. I basically just said, use the PRD file that I have and go build this software. And a little spoiler, it did in fact work, but it needed a handful of changes. So, if you're interested in seeing what it produced and what I had to tweak after it was all built, you can check out this video next. Thanks, and I'll see you next time.
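For reference, the pgvector Docker image that came up in the chat can be started with something like the following. The container name, credentials, and image tag here are placeholders, not from the video:

```
# Run Postgres with the pgvector extension pre-installed
docker run -d --name rag-postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=rag \
  -p 5432:5432 \
  pgvector/pgvector:pg16

# Then enable the extension once in your database, e.g. via psql:
#   CREATE EXTENSION IF NOT EXISTS vector;
```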