EASIEST Way To Approach Data Processing in C# - Pipelines

Name: EASIEST Way To Approach Data Processing in C# - Pipelines
Uploaded: 2024-01-12T16:00:21.0000000+00:00
Duration: 16 min 37 s

January 12, 2024

• 4,957 views

Are you wondering how to process data effectively in C#? There's a simple design pattern that lays the foundation for doing this!

The pipeline design pattern is one of many design patterns that we can leverage in C#. Using different pipeline stages, we can chain together a system that does data processing for us!

Have you subscribed to my weekly newsletter yet? A 5-minute read every weekend, right to your inbox, so you can start your weekend learning off strong:
https://subscribe.devlead...

Transcript

But wait, we've seen a pattern like this before. If we've been using C# and you're more familiar with the language...

So, a really popular thing that we do in software engineering is process data, and often when we're processing data, the way that we structure that code ends up looking like we're doing something in one stage and then moving on to another stage that uses the output from that, and we kind of chain these stages together. Fortunately for us, there's a design pattern that we can use to accomplish this, and that's called the pipeline design pattern. In this video, we're going to look at some C# code to be able to set up a pipeline for processing some data. We'll see how we can define our pipeline stages and how data is going to flow from one stage to the next. Before I bring up the code, just a quick reminder to check that pinned comment for my free weekly newsletter. All right, let's dive into Visual Studio.

All right, so the code that I have on this screen right now is just a quick little console application, and what we're going to be looking at is how we can get some text input and process that through some different pipeline stages. So you can see that I'm just writing to the console to ask for some input, and then we're going to get that input out of the console, and eventually we're going to output the result of the pipeline once we're done processing it.

But it's important for us to understand what it means to be a pipeline in code. Well, as I was mentioning in the intro to this video, the idea behind a pipeline is that we're going to have data that goes into one stage of the pipeline, and that's going to be a foundational building block of this design pattern. So information goes into a stage, it does some type of work, and then information comes out of that stage to go into the next. So in this particular case, you can see that I've just mocked up some pseudocode here to have some process input stage in our pipeline that we might have, and it's going to take some input, and I
should have put input right here, so we would get the input passed in to this stage. And then you can see that we get the output, and I'm just assigning it to a dummy variable for this pseudocode to show you the pattern, right? So once we have the output, we then go put that into the next stage of the pipeline, and we're calling that clean text in this case. So that stage of the pipeline would run, and then we get the output stored into a variable, and this pattern continues for the pipeline. That's the whole idea: we go from one stage to the next, and this is a really simple pipeline approach. So we get all the way until we're summarizing, that final step, from count words, and we get a summary output from there. We write that to the console, and in a pipeline design pattern, this part is what we would call the sink, so that's the end part of the pipeline where the data is written out to.

If you've done some work on the command line, you've probably seen the pipe character get used, and we're piping data from program to program, especially if you're working in Linux, right? That's more common that you'd see that. And then sometimes you might see that we're, you know, outputting stuff to null or something like that on the command line, like the null device, and that's because we're saying that's the sink of the pipeline that we're doing, because we don't care about the output, right? You would write the output to null. In this case, we do care about the output, so we're going to write that to the console.

But this is really just kind of like a dummy setup for, you know, some pseudocode for how we could make a really simple pipeline. But I figured for this video we can go implement this pipeline, we can see how it works, and then talk about this design pattern. So there are many different ways in C# that we could go implement a pipeline to do this, but I'm going to start by introducing some delegates. And delegates, if you're not familiar, they're just method signatures that we get to work with, so this
is just defining the signature for a method that we'd want to have that's called text cleaner. It returns a string and it takes a string input, right? It's very similar to if I would define the method, right, just like this here, and then I could return something in here, do the work, right? So this is basically saying the same type of thing as this delegate, it's the same signature. But the reason that we have a delegate instead of just defining the method is because we're saying that we want to have almost like an API that we're working with, and you can see that we're going to have three different methods we're working with. And you could, if you're interested, implement this with interfaces instead. In fact, I'm going to show a more complex variation of this after this video, where we can define a cohesive interface for our pipeline stages and automatically put the pipeline together. That's a little bit more complex, so we'll wait till after this video in order to do that.

So these method signatures, these delegates, are going to define the different stages that we want to have in our pipeline, right? So they do kind of match up with what we have up here. So you can see text cleaner would be this one, we have a word counter for count words, text summarizer for this summarize one. So these are going to map to the signatures that we want to use in our pipeline, but we have to go implement these things.

So the way that we would go implement these delegates is to define them in this type of syntax here. So to explain what I mean by that, this part that I have highlighted is in fact a method implementation. So just this part is the body of the method, from line 15 to 18. text is going to be the input parameter, and then we're taking this and saying that we want to assign this entire function into the variable that's called cleaner, and the type of that variable is that delegate type. So it's a little bit confusing. I'm going to say it one more time, because this is important to grasp in order to
kind of move forward on this. But this is the body of a method, from line 15 to 18. text is that input parameter. So again, I'm going to scroll down so you can see it, but remember, this text cleaner needs one string, that's the input, right? So scroll back up, that's this parameter here. And that means that this that I have highlighted, which includes the text parameter and the body from 15 to 18, this is an implementation of the function. All that we're doing is assigning that implementation of a function to a variable called cleaner, and then we're saying that it has the signature called text cleaner. And it does, because if we change the signature to be something like this, right, this is not going to be compatible with the text cleaner signature that's on that delegate. You can see that it's moved the squiggly line to indicate an error up to here, because it's like, hey, this isn't looking good. So let me backspace that. Now we have it nice and clean, but what we don't have is the implementation yet. But I just want to walk through these parts first before we go make the implementation.

So in this case, if we look at word counter, it's our next stage in the pipeline. It takes in the clean text, right? So that's going to be the input parameter into this function. We would have to go implement the function here in the body. And as you might imagine, if we go to the final stage in our pipeline, this text summarizer, it takes the word frequency. That's the input to this stage, but it's the output from the previous stage. So that means if we were to look at word frequency, if we look at the type of that, that's going to be a dictionary keyed by strings, and the values are integers. And I know that without seeing it written out here, which we could do explicitly, but I know that because that's what this is here: it's a dictionary keyed by strings with values that are integers. And if I look at text summarizer, that's the type of this delegate, so the compiler is able to infer what this
type needs to be. So these are the three different stages that we have, but let's go see if we can come up with an implementation for these. This part, the implementation, isn't so important, so I'm going to go kind of quickly through it, but I just want to have some type of implementation here so that you can see these pieces working together.

Let's start by making our text cleaner implementation. All that we're going to do is remove the punctuation and convert that text to lowercase. Why? I mean, no good reason. This part's not really important. I just want to make the stages, and then we can look at how this works when it's wired up. But quickly, as you can see, we're going to take all of the characters where it's not punctuation, so we're removing them, and then we take the output of that and make it ToLower. So that's going to be our cleaning stage, and that means when we return this text, that's what's going to be fed into this next pipeline stage. At this point we still haven't wired up the stages. We're just making the implementations of the stages first.

The counter stage is going to be a little bit more complicated, but it's still not so bad. So we're going to take in that clean text like we talked about, and what I'm going to be doing is counting the frequency of the words that we have inside of that input text that's been cleaned, right? It's not the original input text to this point. It's the part from the stage before, which is cleaned: it's all lowercase and it has the punctuation removed. So what we're going to be doing is declaring a dictionary up here, and then we're going to go through each of the words, and we can do that by splitting the text on spaces. We'll check if it's whitespace and just kind of skip over that. That's not really important for us. But we're going to try checking if we have this word already that we've come across with a count. We're just going to get the count out, pre-increment it, right, so this will increment it first, and then we're going to assign it
into the dictionary back at that word location. And otherwise, if we haven't seen it, we're going to initialize it to one, right? So the first time we come across it, we'll add that to the dictionary with a count of one. But we loop through all the words, and then we make sure that we're counting all of them. That's all that we're going to have for this stage, so let's move on to the next stage.

All right, and the implementation for our summarizer stage is a little bit arbitrary. We're just going to pick the top three most frequent words that we came across. So that means we're going to write some LINQ here. We'll do OrderByDescending so that we can get the highest counts first. We're going to take three, like I had mentioned, arbitrary, but that's what we're going to do here, and then we're just going to pick the key, because the key is the word. From there, we're just going to format a string that says the top words are, and then we're going to write them out. If we wanted to, we could also include the count, but in this particular case we're just going to take the top words.

So I've just gone ahead and collapsed all of the stages to our pipeline, but we still have the work of combining them together to make the pipeline. They're just individual stages right now. Nothing calls them, nothing's hooked them up, so let's go do that part next. The important thing to remember here is that these are not method bodies that we've called and defined outside of a method. This whole thing is a top-level program, so these are in fact variables, which is a little bit weird to think about, because I think a lot of the time we're used to seeing methods written out as separate methods. But to make this a little bit more obvious, these are my pipeline stages, and what I'm going to do is I'm going to let, apparently, Copilot go complete all this code for me. So if I press Tab, we'll get clean text, and then if I press Enter, what I hope to see is it's going to pass the
clean text into the counter. So word frequency, right? So we count, that's this counter pipeline stage. We'll pass in the clean text, we get the word frequency back out of that, and hopefully Copilot is on a roll here: we get the summarizer called with the word frequency. So you can see this kind of pattern where it's zigzagging, right? We take the input, pass it in, then we get clean text, that becomes the input to the next stage, the word frequency becomes the input to the next stage, so on and so forth until we want to go write it out. So I'm just going to take this Console.WriteLine that we had originally, and we can write out the summary.

This is a really simple pipeline that we've just built together in C#. It has three stages, and the sink is going to be this Console.WriteLine, and we're wiring it up manually, as in I just defined this explicitly by hand, how we pass data from one stage to the next. Now let's go try this out to prove that it works.

All right, so with the program running, we have to come up with a phrase that I can add in here that's going to have a couple of repeated words so that we can see something interesting in the analysis. So let me come up with something good. All right, so the text that I'm going to use is "Dev Leader is an awesome YouTube channel if you watch YouTube then you should check out Dev Leader". Now, I'm a little nervous because I'm going to press Enter here, but I'm trying to make sure that I've counted in my head the right number of things that I'm expecting to see. But let's just try it out. The top words are dev, leader, and youtube. So that is what I was expecting to see. That's good news. But what's interesting, and I forgot about this, is that of the three stages of our pipeline, the first one is going to take all of the punctuation out, right? That's not such a big deal, but it also makes it lowercase. So I was a little bit confused when I saw it all lowercase, but it totally makes sense, that's the very first stage of the pipeline. So if you're doing a quick double
check and looking up at the top here, I had YouTube twice, right? So YouTube, YouTube. Dev is here and leader's right after, but it's also right here on this part of the text as well. So it seems to check out. That's good news.

But wait, we've seen a pattern like this before. If we've been using C# and you're more familiar with the language, there's something built into C# that a lot of us use regularly, and it's structured almost the exact same way. Can you think about what it might be? Well, let me head back over to the code and let's modify this a little bit, and I'm going to show you something that's built into C# that does something just like this. Not the specific implementation of counting words and that kind of thing, but the way that we've chained our different stages together.

All right, I'm back in Visual Studio, and the code that I had at the bottom of the screen, that was a little surprise, is right here now. So the design pattern with pipelines is really common with tasks. A lot of the time we're used to writing async/await code in C#, right? We have this async method we've defined, and then we put await in front of it and call it, and we're getting things that come back. And you might be familiar that you can go run some tasks in parallel with each other, you can, you know, wait for multiple tasks at the same time. But something you might not have tried is using tasks this particular way. We can say that we want to run a task, and then when the task is done, we're going to continue with yet another task, and then when that's done, we'll do the same thing. And what's interesting about this structure is, if you're looking at the syntax here where we have task, task, and task, this is going to be the previous task. So what that means is we can take the result from the previous task and go leverage it in the next step. But this is just like a pipeline: the first task that we're running is the first stage, and then we chain them all together, and in fact the last stage of the way that I
have this here is going to be the sink of the entire pipeline. Let me go expand this and show you. The first part is the first stage that we had in our pipeline, right? We're doing the text cleaning. And the next part is going to be this word frequency part, we're counting the words. The next part, as you might guess, is the summarization, and we already know that the sink was going to be the Console.WriteLine. In this particular case, there's literally no benefit or any reason to go structure it this way with tasks. I just wanted to illustrate that tasks afford us this opportunity to effectively build a pipeline with the same type of syntax.

So far, what we've seen is that we're able to build up pipelines with individual stages, and pipelines are defined by these stages that we chain together. Each stage has some input and some output, and we take that output and pass it to the input of the next stage. Now, even though these were simple pipelines, you could in theory run pipelines that have stages that tee off, and you're running multiple different parts of a pipeline, and you could go do that in parallel. But something else that we looked at in this video was setting up the pipeline stages. We defined some delegates that we wanted to have for the different stages and then manually wired up the inputs to the outputs so that we could go run it. Well, if you're interested in seeing how you can structure a pipeline that has a bit of an automatic configuration, you can check out this video next. Thanks, and I'll see you next time.

Frequently Asked Questions

What is the pipeline design pattern in C#?

The pipeline design pattern is a way to structure data processing in stages. Each stage takes input, performs some work, and then passes the output to the next stage. This chaining of stages allows for organized and efficient data processing.
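
As a rough illustration of that definition, here is a minimal sketch of three stages and a sink. The stage names (`trim`, `split`, `count`) and the sample text are hypothetical and are not taken from the video; they only show the shape of "output of one stage becomes input of the next".

```csharp
using System;

// Hypothetical stages, for illustration only: each one takes an input,
// does some work, and returns an output.
Func<string, string> trim = text => text.Trim();
Func<string, string[]> split = text => text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
Func<string[], int> count = parts => parts.Length;

// Manual wiring: the output of each stage is the input to the next stage.
string trimmed = trim("  hello pipeline world  ");
string[] words = split(trimmed);
int wordCount = count(words);

// The sink: the end of the pipeline, where the final result is written out.
Console.WriteLine($"Word count: {wordCount}");
```

Using `Func<,>` here is just a shortcut for the sketch; the video itself declares named delegate types for its stages.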

How do I implement a simple pipeline in C#?

To implement a simple pipeline in C#, you can define delegates for each stage of the pipeline, create methods that correspond to these delegates, and then wire them up by passing the output of one stage as the input to the next. This way, you can process data through multiple stages seamlessly.
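
The steps described in this answer might look roughly like the sketch below. It is reconstructed from the spoken description in the transcript and is not the author's actual code: the delegate names `TextCleaner`, `WordCounter`, and `TextSummarizer`, the local variable names, the output format, and the sample sentence are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stage 1 (assumed implementation): strip punctuation, then lowercase.
TextCleaner cleaner = text =>
{
    var withoutPunctuation = new string(text.Where(c => !char.IsPunctuation(c)).ToArray());
    return withoutPunctuation.ToLowerInvariant();
};

// Stage 2: count how often each word appears in the cleaned text.
WordCounter counter = cleanText =>
{
    var wordFrequency = new Dictionary<string, int>();
    foreach (var word in cleanText.Split(' '))
    {
        if (string.IsNullOrWhiteSpace(word)) continue;

        if (wordFrequency.TryGetValue(word, out var count))
            wordFrequency[word] = ++count; // seen before: increment, then store
        else
            wordFrequency[word] = 1;       // first time we see this word
    }
    return wordFrequency;
};

// Stage 3: pick the three most frequent words (three is arbitrary).
TextSummarizer summarizer = wordFrequency =>
{
    var topWords = wordFrequency
        .OrderByDescending(pair => pair.Value)
        .Take(3)
        .Select(pair => pair.Key);
    return "The top words are: " + string.Join(", ", topWords);
};

// Manual wiring: each stage's output is passed to the next stage.
// The sample sentence is made up so that the top three counts are distinct.
string input = "Data in, data out: data flows, data moves, and flows go from stage to stage, flows included.";
string cleaned = cleaner(input);
Dictionary<string, int> frequencies = counter(cleaned);
string summary = summarizer(frequencies);

// The sink of the pipeline.
Console.WriteLine(summary); // prints "The top words are: data, flows, stage"

// In a top-level program, type declarations go after the statements.
delegate string TextCleaner(string input);
delegate Dictionary<string, int> WordCounter(string cleanText);
delegate string TextSummarizer(Dictionary<string, int> wordFrequency);
```

Because the stages are variables of delegate types, a lambda with the wrong parameter or return type will not compile, which is the "API-like" contract the transcript describes.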

Can I use tasks to create a pipeline in C#?

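Yes, you can use tasks to create a pipeline in C#. You start one task and chain continuations onto it; each continuation receives the previous task, so it can read that task's result and use it as its own input. The steps then run in sequence, just like pipeline stages, with the final continuation acting as the sink. The video notes that for this simple example there is no real benefit to structuring the code with tasks; it only shows that task chaining has the same shape as a pipeline.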
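
A possible sketch of that task-based chaining is shown below, assuming `Task.Run` and `Task.ContinueWith` are the calls being described (the transcript only says "run a task" and "continue with" another). The stage bodies and the sample sentence are assumptions for illustration, not the code shown in the video.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Made-up sample text, chosen so the top three word counts are distinct.
string input = "Data in, data out: data flows, data moves, and flows go from stage to stage, flows included.";

Task<string> summaryTask = Task
    // Stage 1: clean the text (strip punctuation, lowercase).
    .Run(() => new string(input.Where(c => !char.IsPunctuation(c)).ToArray()).ToLowerInvariant())
    // Stage 2: "previous" is the finished stage 1 task; count word frequencies.
    .ContinueWith(previous => previous.Result
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .GroupBy(word => word)
        .ToDictionary(wordGroup => wordGroup.Key, wordGroup => wordGroup.Count()))
    // Stage 3: "previous" is the finished stage 2 task; summarize the top three words.
    .ContinueWith(previous => "The top words are: " + string.Join(", ",
        previous.Result
            .OrderByDescending(pair => pair.Value)
            .Take(3)
            .Select(pair => pair.Key)));

// The sink is itself a continuation that writes the final result out.
Task sink = summaryTask.ContinueWith(previous => Console.WriteLine(previous.Result));

await sink; // prints "The top words are: data, flows, stage"
```

Each continuation only starts after the task before it has finished, so reading `previous.Result` inside a continuation does not block. In everyday code, `await` is usually the more readable way to sequence asynchronous steps; this form is only meant to make the pipeline shape visible.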

These FAQs were generated by AI from the video transcript.