BrandGhost

Epic Fail or Promising Attempt - Auto Pipeline Config in C#

The pipeline design pattern is an awesome way for us as C# developers to process data. We can configure stages, wire them up, and pass data from one to the next. However, one of the challenges is setting this design pattern up! Nobody wants to configure it by hand... Will this thought experiment lead to an easier way for us to configure pipelines in dotnet? Have you subscribed to my weekly newsletter yet? A 5-minute read every weekend, right to your inbox, so you can start your weekend...
Transcript
So, this doesn't really feel any better. In fact, I would argue right now it's a little bit worse. The pipeline design pattern allows us to connect things in programs to process data from one stage to another, and in a previous video I made, which you can find right up here, I was explaining some of the basics of the pipeline design pattern in C#. Now, one of the things that we see with the pipeline design pattern is that we're responsible for connecting these pieces together, and that can look a little bit unwieldy depending on how many stages you have. When I made that video, I promised that I was going to follow up with a better way to connect these stages, so I wanted to put this video together to walk you through some ideas behind that. But I did run into a problem, and I still wanted to make this video for you because I think there are some interesting insights, so stay tuned, because we're going to see how complex this gets when you really want to start making your pipelines be configured more automatically. Before I jump into the code, just a quick reminder to check that pinned comment for a link to my newsletter and the courses that I'm working on.

All right, let's jump over to Visual Studio. If you haven't watched the last video, where we ended up was I was explaining that we had several different pipeline stages, and if we wanted to, we don't have to, but if we wanted to, we could use something like the task object syntax that we have to run a task and then do continuations with it. But we saw this example without the tasks as well; we're just responsible for taking the input, putting that into a stage, and then taking the output of that stage to feed into the next input. That's essentially what this is doing, just with tasks. So knowing that, the goal was that we wanted to come up with a cleaner or simpler way to build that. For me, that was going to mean moving towards dependency injection. I love using
dependency injection to simplify object creation, and we can use things like factory patterns as well. Sometimes combining factory patterns with dependency injection can really clean things up; it kind of moves all of the complexity into your dependency injection and factory class creation. So as we walk through this, at the start it's not going to look much cleaner, and I apologize for that, but I need you to stick with me on this so we can see how we can evolve it.

Okay, so the code that I have on screen to start with is using Autofac to configure dependency injection, and you can do this exact same type of thing with the built-in .NET dependency injection framework; I'm just using Autofac because I'm more comfortable with it. All of the code that we see highlighted right here, from line 3 to 10, is creating what's called a container builder in Autofac, much like a service collection. What we're doing from there is registering the different things that we want, and that registration happens in this module. You'll see that I have the different pipeline stages that we want configured, much like the example from the previous video. So we have a source for the pipeline, which is where we're going to get the data from; then we have a bunch of stages in the pipeline; and then a sink, which is going to be the end of the pipeline. And then I have this other class that is the pipeline itself. We're going to see what that looks like, but I just wanted to show you how the dependency injection is set up. Now, because these pipeline stages don't have any state, I am going with single instance on all of them. They're technically like a singleton, but that's because they don't have state. If you have state in your pipeline stages, you definitely don't want to do this kind of thing, because you might accidentally share a pipeline stage. So let me go ahead and collapse this module. What we end up with at the beginning of our program, after getting our
dependency injection set up, is this spot where we can ask for the pipeline from our dependency injection framework and simply call execute on it. Right now, based on what you're seeing, I'm hiding all of the complexity. Every time we want to use this pipeline, we can simply resolve an instance of it, and because there's no state, we can technically just reuse this pipeline instance whenever we want to run one. But as you can see, even with the dependency injection setup, there's a lot more code, and we're not even done yet, because we still have all of the stages. These are going to be very similar to what we saw before: the pipeline sink is really just where we write to the console, the source was asking for that console input, and then these different pipeline steps were really just the different things that we had where we were cleaning the text, counting things, and then writing a summary. These were just unique to the scenario that we were looking at in the previous video.

On top of that, I went a little bit further and I made the different pipeline stages have different unique return types, so I'm using records to have DTOs, data transfer objects, for each of those stages. Did I need to do that? No, but this was one step that I was thinking about in terms of trying to make each stage look a little bit more unique. Instead of just having a stage that can return a string or take in a string, I wanted to start seeing if I could shape the stages a little bit more. If they have their own unique input and their own unique output, maybe I can get a little bit more information about how to piece these things together. That is, if I have something that can only take in a specific input, that means the output from something else must be wired up to it, if they have a matching output-to-input type. So that's where my train of thought was going. However, when we go look at how we create this pipeline, that's going to be this class right here. Using dependency injection, I can take
in the source, the different steps, and then the sink. From there, what I'm able to do is just chain the pipeline stages together, and if you've watched the previous video, this is basically the exact same thing, but it's even more code. I was getting excited when I was putting this together because I started using dependency injection, and then when I started to piece it all together I went, wait a second, this isn't really helping at all; it's just making more work. So why would we ever want to do this? We still end up with the same problem of getting the output of one part, putting it into the next part, taking the output of that, putting it into the next part, and so on and so forth. So this doesn't really feel any better; in fact, I would argue right now it's a little bit worse.

Okay, so I didn't want to let everyone down, and when I started looking at this I said, okay, hold on, there's got to be something more I can do with this, because I didn't make it any better; I added overhead for dependency injection. What are we going to do? I think that the problem comes down to our pipeline class. I'm trying to package it all up into one spot, and I was hoping that it was going to fit together a little bit more nicely, but it turns out that even with dependency injection, I still have to figure out how to connect these steps. That's where the issue is: all of the manual work that we have is really the step connection. That's where it sucks, and dependency injection isn't really helping with that. So I figured, okay, maybe the example I picked is not a good candidate for this. If we come up with something else, maybe I can illustrate some of these steps being connected in a more obvious way. So I put together another example that we're going to look at. It's going to make the step connection a little bit cleaner and easier to work with, but we're going to see another catch that we can iterate on. So let's go back to the code and check that out. As I was saying, passing in all
of these manually for the different steps that we have, if you think about extending your pipeline and adding more things to it, it means that you've got to come back here and add a new dependency, it means that you're going to have to wire them up in here again, and you have to keep touching your pipeline to add another stage in. I really wanted to get rid of this, so let's scroll down a little bit lower in this code.

Now, this is where I started to go a little bit overkill. I figured maybe if I put some interfaces together and pull out some APIs for these different stages, it would start to clean things up, and it kind of did. So let me explain what I did here in a little bit more detail. I'm not going to go into extreme detail on each of these, but I started thinking about pipeline stages, at the sink, at the source, and at the individual stage level in between, as having some common interface and then also some unique API for each part. For example, the source of a pipeline doesn't have any input; it can only output things. The sink is the exact opposite, right? It can only take in an input, but there's no output. The intermediate stages that fit in between, if I scroll down to that here, you can see that those are going to have an input and an output. So when we execute a pipeline stage, it needs something in and something out, and you can see that I'm starting to think about using these type parameters in order to try and rationalize how I can connect one stage to another. The problem still existed, though, and it's going to come back to these type parameters when we start to reflect on where the challenges are. But let's scroll down a little bit lower and see what we've got in this new example. I'm using a source here and a sink, and what's interesting about these now is that they implement the interface. You can see that the start method on the source is just spitting something back out, and the sink is just going to take something in with no return value, just like
the interfaces suggest. Now, I wasn't able to get a fully automatic pipeline created, but this is a little bit closer, and if we think about the dependency injection that we have, this is where the interfaces are going to make things a little bit more streamlined. If I'm able to register this source and this sink on the dependency container, then when this pipeline is also registered there, it will try to automatically resolve the source that meets this interface, the sink that meets this interface, and then any intermediate stages in the middle. Now, what's cool about this is that if we look at the execute method, we have the source starting it off, right? When we run the pipeline, we're going to start with the source, of course. Then we have stages, and what's cool about the stages is that I don't have to manually connect them; we just take them in the order that they come in, and that way, when we go to run a stage, we can pass the output of that into the next one, and finally, we're able to put that into the sink. If we compare that to the code that we saw above this method, to build a pipeline we don't have to manually wire up the different stages that we want. We start with a source, we end with a sink, and then we connect all of the stages automatically, because we're just using a loop like this.

Okay, so this is a little bit better in terms of our pipeline creation, because we don't have to manually connect the steps, but there are a couple of gotchas with this pattern. The first issue is that if we have multiple pipelines that we're trying to configure, the way this works, especially if we're just dealing with really generic types as inputs and outputs, is going to make it really tricky to configure properly. If you had a pipeline source that produced a string, I mean, if you had more than one of those, how are you going to make sure that you're configuring the right one when you go to do dependency injection like this? The
same thing with the sink, and the same thing with all the steps in the middle, right? If they take in strings and they output strings, how are you going to know, if you have some stages for this pipeline and some stages for that pipeline? Suddenly the magic of the dependency injection helps in terms of connecting things, but it also complicates things, because it's doing it magically for you and you don't have a lot of control. Now, I've already raised a couple of problems with this approach, but one that I wanted to go back and solve in this video was the idea about the stages in particular for a given pipeline. What we're not going to address in this video is the case where you have multiple different pipelines and you want to use really generic inputs and outputs, like strings, for example. This approach is not going to take care of making sure those don't collide with each other. It assumes that you're going to have one pipeline, or that you have very specific inputs and outputs, like a dedicated input to your pipeline and a dedicated output. There's still one problem with this, though, that stands out to me, and that's the different stages that you have and the order that you need to work with those stages. For example, if I had a three-stage pipeline, so the source and the sink and then three stages in the middle, using dependency injection, how do I make sure that those stages are in the right order? Let's go check out one more revision of this code to try and solve this problem.

Okay, so one more change that I made here was creating yet another interface to extend what we already had. There's no reason that I had to extend it; I could have just modified the base one, but I wanted to show you this evolving. All that I've done is added a priority onto the different pipeline stages. What's interesting from here is that I can make a pipeline builder. So I have a new factory type of pattern that I'm using; I take in the source, the sink, and each of the stages, and
as long as the stages implement this new interface that has a priority, you'll notice that here, when I go to build that pipeline, right, this is not executing the pipeline, it's just building it. When I go to build it, I'm able to order the stages by the priority they have, and then I'm able to put that into an array. That way, I do this one time when I build the pipeline, and from there, that pipeline is configured to have its stages in the right order. And if you recall, if I jump back to that pipeline up here and look at the execute method, we can see that it still has no idea about the priority of these stages, but it's going to assume that they come in order now. Again, if we look at the final product of this, when we go to build it up and then when we go to run it, this is a little bit better, because if I wanted to add a new stage, all that I would need to do is go to the dependency injection registration portion, make sure that I'm adding in my new stage and assigning the right priority, and it will automatically get picked up by dependency injection. From there, the part that we saw below, when it goes to build the pipeline, will take that in on this parameter here, make sure that the stages are ordered properly, and then when we go back up to execute this pipeline, that step will be taken in automatically, in the right spot with respect to the order.

So this is starting to get a little bit better, but what you're probably noticing is that this is adding a ton of complexity just to be able to do this kind of thing. Now, in my opinion, when you're building stuff like this, adding some complexity into a framework portion of things might not be so bad, as long as you're building a framework that's easy for people to use. If every single time I wanted to build any type of pipeline I had to do this much work, that would really suck. But if instead I could build a pipeline framework once and make sure that I can build
different types of pipelines with it, it might be worth it to do a little bit of upfront work: build a bit of a complex pipeline framework at the start, and then using it is very simple. Now, unfortunately, at this point, everything that I've shown you is still not making it as simple as I want. The big challenge that I'm seeing so far is that when it comes to connecting the pipeline stages, there's a lot of dependency on having really simple pipeline steps, where we go from one to the next, and a lot of this comes from the fact that I'm using a loop to iterate over the stages and really just feeding the same output type into the next input type. It means that between stages, we can't change the type in the pipelines that I would like to build. If you think about using LINQ, for example, a lot of the time what we're able to do from pipeline stage to pipeline stage, right, if you're doing a Select and then another Select or some other type of transform with LINQ, is have an input to that stage, then you transform it, and that goes into the next step. We're not getting that at all here. So I think that one of the tradeoffs we need to start exploring is this: if you want to have your pipeline automatically configured, you're either going to have really complex logic to try and automatically figure out the right pathway, which should come with some risks, right, because how does it know to go from this stage to the next? Or, if you want automatic configuration, you have to assume that you're going to have really simple pipeline stages, kind of like I was showing in this example. The alternative, of course, is going back to having a manual setup for your pipeline, and perhaps that's really not so bad after all if you put it behind a factory. If you need to reuse the pipeline a lot, you can essentially configure your pipeline in one spot, and that means that if you need to reuse it, you can call the factory to build a new pipeline for you and not
necessarily have to worry about 10 different spots in your code to configure this thing properly. Now, based on how complex this is getting, I'm not totally sure if I'm going to follow up quite yet with some type of reflection system that can look at configuring pipeline steps properly, but that's where my head's at; I think that's probably something I would do. You give it a handful of stages for your pipeline, and then it will say, okay, can I figure out how to match the output of one to the input of another? That might be kind of fun to build, but I think there's going to be a lot of headache that follows with that.

So I want to wrap up this video, but I hope you found that insightful, just to see how you might navigate something like this. It's not a bulletproof solution, and it's probably not a solution that you really want to lean into, but it's an option, and I think that if you understand some of the decisions and tradeoffs that go into something like this, it'll help you form opinions about things you want to build, and where it might be good just to cut your losses. So thanks for watching, and I'll see you next time.
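The interface shapes and the connecting loop described in the transcript might look roughly like this in C#. The video doesn't show its exact code, so every name here (the interfaces, `Start`, `Execute`, `Finish`, the `Pipeline` class) is a guess for illustration. Note how the loop forces every intermediate stage down to `object` in and `object` out, which is exactly the type-erasure limitation discussed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface shapes: a source only outputs, a sink only takes
// input, and an intermediate stage has both.
public interface IPipelineSource<out TOut> { TOut Start(); }
public interface IPipelineStage<in TIn, out TOut> { TOut Execute(TIn input); }
public interface IPipelineSink<in TIn> { void Finish(TIn input); }

// Because the connecting loop treats every stage uniformly, the stages are
// collapsed to object-in/object-out -- the loop cannot track changing types.
public interface IPipelineStage : IPipelineStage<object, object> { }

public sealed class Pipeline
{
    private readonly IPipelineSource<object> _source;
    private readonly IReadOnlyList<IPipelineStage> _stages;
    private readonly IPipelineSink<object> _sink;

    // With DI, the container can supply the source, all registered stages,
    // and the sink automatically -- no manual wiring in the constructor.
    public Pipeline(
        IPipelineSource<object> source,
        IEnumerable<IPipelineStage> stages,
        IPipelineSink<object> sink)
    {
        _source = source;
        _stages = new List<IPipelineStage>(stages);
        _sink = sink;
    }

    public void Execute()
    {
        object current = _source.Start();
        foreach (var stage in _stages)
        {
            // Feed each stage's output into the next stage's input.
            current = stage.Execute(current);
        }
        _sink.Finish(current);
    }
}
```

The stages are consumed in whatever order the container hands them over, which is the ordering problem the priority interface later addresses.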
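For contrast, the LINQ-style, type-changing behavior the transcript says the loop can't provide could be sketched as a small fluent builder where each step may change the payload type. This is not from the video; it's one possible shape of the alternative being described, with invented names throughout.

```csharp
using System;

// A minimal fluent pipeline where each Then call can change the output type,
// much like chaining LINQ Select calls. All names are hypothetical.
public sealed class TypedPipeline<TIn, TOut>
{
    private readonly Func<TIn, TOut> _run;

    public TypedPipeline(Func<TIn, TOut> run) => _run = run;

    // Each Then composes a new pipeline whose output type may differ.
    public TypedPipeline<TIn, TNext> Then<TNext>(Func<TOut, TNext> next) =>
        new TypedPipeline<TIn, TNext>(input => next(_run(input)));

    public TOut Execute(TIn input) => _run(input);
}

public static class TypedPipeline
{
    // Entry point: an identity pipeline to start chaining from.
    public static TypedPipeline<TIn, TIn> Start<TIn>() =>
        new TypedPipeline<TIn, TIn>(x => x);
}
```

Usage would look like `TypedPipeline.Start<string>().Then(s => s.Trim()).Then(s => s.Length).Then(n => $"Length: {n}")`, where the type flows string → string → int → string. The tradeoff is that this wiring is back to being explicit at the call site, which is the manual-versus-automatic tension the video lands on.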

Frequently Asked Questions

What is the pipeline design pattern and how is it used in C#?

The pipeline design pattern allows us to connect different processing stages in a program to handle data flow from one stage to another. In C#, I explained how we can use this pattern to create a series of stages that process input and produce output, chaining them together to form a complete pipeline.
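The hand-wired chaining described above can be sketched with plain functions; the stage names below are made up for illustration (loosely echoing the clean/count/summarize stages mentioned in the transcript):

```csharp
using System;

// Each stage takes the previous stage's output as its input.
static string CleanText(string input) => input.Trim();

static int CountWords(string input) =>
    input.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

static string Summarize(int count) => $"Word count: {count}";

// Manual wiring: the output of one stage feeds the input of the next.
string result = Summarize(CountWords(CleanText("  the quick brown fox  ")));
Console.WriteLine(result); // prints "Word count: 4"
```

This is trivial with three stages, but the wiring code grows with every stage added, which is the motivation for trying to automate it.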

Why did you choose to use dependency injection with the pipeline design pattern?

I chose to use dependency injection because it simplifies object creation and helps manage dependencies between different components. By using a dependency injection framework like Autofac, I can register my pipeline stages and automatically resolve them, which can clean up the code and make it easier to manage.
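A registration module of the kind described might look like the sketch below. This requires the Autofac NuGet package, and the stage class and interface names are assumptions rather than the video's actual code, so treat it as a configuration outline, not a drop-in implementation.

```csharp
using Autofac;

// Hypothetical Autofac module: one source, several stages, one sink, and the
// pipeline itself. SingleInstance is only safe because the stages are
// stateless, as the video points out.
public sealed class PipelineModule : Module
{
    protected override void Load(ContainerBuilder builder)
    {
        builder.RegisterType<ConsoleSource>().As<IPipelineSource>().SingleInstance();
        builder.RegisterType<CleanTextStage>().As<IPipelineStage>().SingleInstance();
        builder.RegisterType<CountWordsStage>().As<IPipelineStage>().SingleInstance();
        builder.RegisterType<SummaryStage>().As<IPipelineStage>().SingleInstance();
        builder.RegisterType<ConsoleSink>().As<IPipelineSink>().SingleInstance();
        builder.RegisterType<Pipeline>().SingleInstance();
    }
}

// Usage sketch: build the container, resolve the pipeline, run it.
// var containerBuilder = new ContainerBuilder();
// containerBuilder.RegisterModule<PipelineModule>();
// using var container = containerBuilder.Build();
// container.Resolve<Pipeline>().Execute();
```

If the pipeline's constructor asks for `IEnumerable<IPipelineStage>`, Autofac will hand it every registered stage automatically, which is what makes the "stages get picked up without touching the pipeline" behavior possible.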

What challenges did you encounter when trying to automate the pipeline configuration?

One of the main challenges I faced was the complexity of connecting the stages together. Even with dependency injection, I found that I still had to manually wire up the stages, which added overhead. Additionally, ensuring that the stages were in the correct order and managing different input and output types made the automation process tricky.
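The ordering fix described in the video, a priority on each stage plus a builder that sorts once at build time, could be sketched like this. The interface and member names are guesses for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension of the stage contract that adds an ordering hint.
public interface IOrderedPipelineStage
{
    int Priority { get; }
    object Execute(object input);
}

// A builder in the spirit of the video: it sorts the injected stages by
// priority once, so the pipeline itself can assume they arrive in order.
public sealed class PipelineBuilder
{
    private readonly IEnumerable<IOrderedPipelineStage> _stages;

    public PipelineBuilder(IEnumerable<IOrderedPipelineStage> stages) =>
        _stages = stages;

    // Sorting happens here, at build time, not on every pipeline run.
    public IOrderedPipelineStage[] BuildOrderedStages() =>
        _stages.OrderBy(s => s.Priority).ToArray();
}
```

With this in place, adding a stage means registering it with the right priority; the container delivers it in any order, and the builder puts it in the right slot.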

These FAQs were generated by AI from the video transcript.