How To Cache Filtered & Paged Database Results In C# (.NET)
July 12, 2023
In this video, I'll show you how to cache filtered and paged database results in C# by walking through my result set cache implementation. This will help you reduce the amount of traffic your application sends to the database, and make the application faster overall.
Transcript
Are you building applications that query data sets with filters and paging? My goal in this video is to show you some designs I have for caching those data sets, and how I'm incorporating them into my own ASP.NET Core application.

Before jumping over to Visual Studio and going through code, I want to talk about some high-level concepts that should align us on what we're trying to accomplish, and hopefully make the code easier to navigate. While scaling applications up, one thing we eventually have to be concerned about is that running a lot of queries against a database can be really expensive, and it's expensive because of the round-trip time from your application to your database and back. Depending on the kinds of data you have, this could still be pretty fast, or you might find that with more user load and more queries running, it becomes a bottleneck. Of course, you'll want to do some benchmarking before going overkill trying to cache and optimize things; but let's assume you've done some of that work and found that your filtering and paging queries are becoming a bottleneck. Then what we're going to talk about should make a lot more sense and be much more applicable.

In this video I'm going to talk about a result set cache. I touched on this a little in a previous video (which I will link above) about a tiered caching architecture, but there I didn't get into the design of the result set cache itself; that's exactly what we'll focus on here. So what do I mean by a result set? Consider a filter being run on a set of data. I'm trying to be generic, but to be more specific, picture a particular query against a MySQL database with an offset and a limit, so you're doing some paging as well. What we want is to snapshot the results that come back, so that the next time we go to run that query, instead of running it, we can check our cache and say: we have a cached result for a filter that matches this, and we can return that data from memory or some other cache tier (going back to that multi-tiered cache idea) instead of going to the database. Effectively, we can short-circuit the database round trip.

There are a couple of things to keep in mind with this kind of caching. First, we can't just store everything in memory. If your data sets are small, maybe you can; in my case, I have at least one data set with approximately two million records, and there's no way I can pull all of that into memory. So I have to support paging for my users, but I also have to consider how much I can store in memory for the filter results. Even if you're using Redis, it's still memory somewhere, so the first constraint is a memory limit.

The other constraint we have to navigate is that, at least in my application's design, pages can technically be requested out of order. Suppose the total set of results a filter can return is a hundred thousand. The first person to run that filter tells us the total is a hundred thousand, but what's actually happening is they're requesting the first 100 items. There's nothing stopping the application from then asking for records 300 to 400, and that process can keep going, like a bucket filling up as different segments come in. And going back to the first point, I don't even know if I could store a hundred thousand records in memory. Put together, these constraints mean we need some concept of pulling in pages potentially out of order, along with a limit on how many of those pages we can actually fit into memory.

One more point before the code: the result set cache itself is responsible for mapping a filter, together with its page constraints (an offset and a limit, or a page size), to the IDs of the matching records. There's a separate part of the caching system that maps IDs to the entities themselves. So we're separating the concern of caching entities from the concern of mapping filters-plus-paging to record IDs.

That's my preamble; let's jump over to the code. And if you stay right to the end, I'll point you to another video where I walk through why unit tests were super applicable for me here and how they made it much easier to ensure this class works as expected.

All right, I'm here in Visual Studio with my ResultSet class pulled up.
The first thing we'll notice is that a ResultSet is composed of ResultSetPages (which I'll show you next), so it can hold multiple pages, and we must know the total count of items for the filter being run. That total count is an important mechanism: knowing when a full result set has been pulled into memory lets us understand and optimize how we consume this data. The way to think about a full result set is as one contiguous page: the read-only list of pages would contain a single item whose number of IDs matches the total count for the filter, and that's what this property checks: one page, and that single page holds all of the IDs for the filter. In theory you could have multiple pages whose ID counts sum to the total, but this class is designed to merge the pages so that, by the time we have a full result set loaded into memory, there is only one.

I'll come back to that method in a moment; first let's jump to the ResultSetPage. It's extremely simple. If you recall what I said earlier, for these filters we're only tracking the IDs of the entities we're interested in, and this record is generic; it doesn't care at all what type the ID is. It just holds a list of IDs and an offset, plus a couple of convenience properties that give it the feel of a page: an EndIndex derived from the offset, and a Count that is simply the count of the IDs collection. Those conveniences make the algorithms that consume this type a little easier to read.
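To make the shape concrete, here is a rough sketch of the two types described above. The names, constructors, and property set are my assumptions from the description, not the exact code from the video:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a page of cached IDs with an offset, plus convenience
// members that make the page-walking algorithms easier to read.
public sealed record ResultSetPage<TId>(IReadOnlyList<TId> Ids, int Offset)
{
    public int Count => Ids.Count;
    public int EndIndex => Offset + Ids.Count; // exclusive end of the page's range
}

// Hypothetical sketch: a result set holds pages plus the total count of
// records matching the filter, even when not all of them are cached yet.
public sealed class ResultSet<TId>
{
    public ResultSet(IReadOnlyList<ResultSetPage<TId>> pages, int totalCountForFilter)
    {
        Pages = pages;
        TotalCountForFilter = totalCountForFilter;
    }

    public IReadOnlyList<ResultSetPage<TId>> Pages { get; }
    public int TotalCountForFilter { get; }

    // A "full" result set has been merged down to one contiguous page that
    // holds every ID for the filter.
    public bool IsFullResultSet =>
        Pages.Count == 1 &&
        Pages[0].Offset == 0 &&
        Pages[0].Count == TotalCountForFilter;
}
```

The key design point is that the generic `TId` keeps the cache agnostic to what kind of key your entities use, and `IsFullResultSet` is what later optimizations hinge on.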
Jumping back to the ResultSet record, the other piece I want to walk through is TryGetPage: given a result set, can we pull back a particular page defined by an offset and a page size? Something interesting I'd like to call out is that the logic basically interprets that offset and page size and figures out how they align with the data currently pulled in, based on the pages we have, and the method can only return true if the page could be produced from the constraints passed in. For example, recall from the beginning that we can technically have multiple pages of IDs pulled in, and they don't have to be contiguous. Say the full result set, before being merged into a single page, would be ten pages of a hundred records each, a thousand in total. If we only had three of those pages loaded and someone requested an offset and page size we don't have coverage for, this method would return false. It's trying to say: based on the cache we have, we don't have the page you're asking for, so you're going to have to go get it yourself.

With that said, let's expand the method and see what's happening inside. At the top there are a couple of contracts that enforce the right parameters, the right ranges of things. If the requested page size is less than one, we're basically saying: you're not asking for anything, so you get nothing back. It might seem like invalid input, but in my design I decided that asking for fewer than one item simply yields an empty result.

Next up is the full-set optimization. I've left a FIXME here because I'm not totally sure yet what optimization I want to make; I'll touch on that in just a moment.
If we have a full result set, which, recall, means one single page containing all of the IDs for the filter, look at this next condition: the caller is basically saying, start at zero and give me as much as possible. You might assume this should be a strict equality check, exactly the full set, and then we hand back the full set from the cache. However, there are a couple of places in my code where I'm not respecting the page size; I might ask for offset 0 with a page size of int.MaxValue. That might be a ridiculous page to ask for, but the caller's expectation is: just give me everything you possibly can. If "everything you possibly can" essentially maps to the full result set for the filter, then in my opinion that's a good match: instead of failing the request for going over the limits, we're telling the caller that this is truly everything that matches the record set for that filter.

The other thing to call out is the FIXME I left for when someone asks for a page and we have the full set. There's probably another optimization where I take the first (and only) page, since it's a full result set, then skip the offset and take the page size; I could use LINQ for that and return the result. I haven't actually tried it yet; when I get into the unit testing video that follows this one, that might be something we explore together. For now I've left it alone, because all of this is meant to be an optimization to avoid the database, and not having this particular shortcut in place didn't seem to be causing any issues.

Let's go a little lower. Some of what you're seeing on screen so far is really just setting up the algorithm.
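The Skip/Take idea from that FIXME could look roughly like this. This is a sketch under my own assumptions (a flat list standing in for the single merged page; the method name and signature are illustrative, not the video's actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the "full set" fast path in TryGetPage, assuming the cached
// result set has already been merged into one page holding every ID.
static bool TryGetPageFromFullSet<TId>(
    IReadOnlyList<TId> fullSetIds,
    int offset,
    int pageSize,
    out IReadOnlyList<TId> page)
{
    page = Array.Empty<TId>();

    // A page size below one is treated as asking for nothing.
    if (pageSize < 1 || offset < 0 || offset >= fullSetIds.Count)
        return false;

    // "Give me everything from the start" maps to the whole set, even when
    // the caller over-asks (e.g. pageSize == int.MaxValue).
    if (offset == 0 && pageSize >= fullSetIds.Count)
    {
        page = fullSetIds;
        return true;
    }

    // Otherwise slice the single merged page: skip the offset, take the size.
    page = fullSetIds.Skip(offset).Take(pageSize).ToList();
    return true;
}
```

Note how the over-asking caller still succeeds rather than being rejected for exceeding the limits, matching the behavior described above.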
Without getting into all of the details, because it's not super important here, this part of the algorithm is responsible for building up the list of IDs to pass back that matches the particular page being requested. The only reason this gets more complicated than a simple for or foreach loop is that we may have to walk over multiple pages to pull back the range the caller is asking for. That's what this piece does: it maps the existing pages onto the requested range, and you can see where we add those pieces. You can also see the skip-and-take concept in action; like I said, something like this line (line 68, if I scroll back up) probably applies to the single-page scenario for the full result set too, so we'll probably revisit that.

This last part checks whether we can actually return the page: did we end up in a situation that meets the caller's expectations? Then there's a Debug.Assert on line 84, on top of the unit tests. I've explained in other videos that when I write tests it's to build confidence, and this kind of logic, where I'm mapping bounds between different sets of records, is the type of thing that has historically messed me up even after about 20 years of programming. The assert is an extra layer of confidence, so that if I happened to miss a testing scenario, it would catch things at runtime while I'm debugging. And in fact, there was a weird situation: with this line in place, I actually hit the issue while exercising some more complicated filtering, so it was really helpful as a stopgap.

That's mostly it for the ResultSet class, but a result set on its own is not the full picture. As I mentioned at the beginning, the purpose of these result sets is to be mapped to filters.
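The multi-page walk described above can be sketched like this. Everything here is an assumption on my part: pages are modeled as simple `(Offset, Ids)` tuples, IDs are `int`, and the method only succeeds when the cached pages cover the whole requested range:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the general TryGetPage case: the requested range may span
// several cached pages, which may not cover it contiguously.
static bool TryCollectRange(
    IReadOnlyList<(int Offset, IReadOnlyList<int> Ids)> pages,
    int offset,
    int pageSize,
    out List<int> ids)
{
    ids = new List<int>();
    int cursor = offset;         // next absolute index we still need
    int end = offset + pageSize; // exclusive end of the requested range

    // Walk pages in offset order, copying each page's overlap with [cursor, end).
    foreach (var page in pages.OrderBy(p => p.Offset))
    {
        int pageEnd = page.Offset + page.Ids.Count;
        if (pageEnd <= cursor) continue; // page ends before the range we need
        if (page.Offset > cursor) break; // gap in coverage: range can't be filled

        int skip = cursor - page.Offset;
        int take = Math.Min(end, pageEnd) - cursor;
        ids.AddRange(page.Ids.Skip(skip).Take(take));
        cursor += take;
        if (cursor >= end) break;
    }

    // Only report success if the whole requested range was covered.
    return cursor >= end;
}
```

The gap check is what produces the "go get it yourself" false result when only some of the pages are loaded.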
We're going to have a filter that we can run against something like a MySQL database, or whatever your backend is, and we want to map that filter to its result set. That way the cache can answer: if I'm running this filter again, do I have results I can just look up? So the next thing to look at is the ResultSetCache class.

Starting from the top, the ResultSetCache is generic, again because we're only interested in caching the IDs, and it uses this ICacheLayer. I'll link another video at the top where I talk about my multi-tiered cache, but if you don't go watch it, that's okay: ICacheLayer is just a concept I have, a facade. It could be an in-memory cache, a distributed cache, or multiple layers of those caches; it's simply an interface for caching.

Collapsing the methods so we can see what's available: the ResultSetCache lets us set things in the cache for a particular key, given a page, which also requires the total count. That's a very important piece, like I mentioned, because it's what lets us know whether a full result set has been pulled into the cache, which is where a bunch of those optimizations came in. The other method really just pulls a result set back out given a cache key.

You'll notice that even though I've been talking about running filters, you might say, hey Nick, I don't see anything about filters on this class, and that's right. With this ICacheKey, I'm allowing the caller to decide how they'd like to map a particular filter to a cache key. For me, I take the hash code of the filters I'm running: my filters are basically just DTOs, data transfer objects describing the constraints, the conditions, ANDing them together, ORing them together, doing inverse operations. Getting the hash code of that gives me a unique enough cache key to work with. Someone else, however, might want to implement a cache key differently, which is why this is left a little generic.
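Here is one way those concepts could be shaped. The interface members and the `NameFilter` DTO are hypothetical; the video's actual `ICacheLayer` and `ICacheKey` will differ, and this only illustrates the hash-code-as-key idea:

```csharp
using System;
using System.Threading.Tasks;

// Sketch of a cache-key abstraction: the caller decides how a filter maps
// to a key; the cache never needs to know about filters.
public interface ICacheKey
{
    string Value { get; }
}

// Sketch of a caching facade: could be in-memory, distributed, or tiered.
public interface ICacheLayer
{
    Task SetAsync<T>(ICacheKey key, T value);
    Task<T?> GetAsync<T>(ICacheKey key); // default when missing
    Task RemoveAsync(ICacheKey key);
}

// A filter DTO doubling as its own cache key via its hash code. C# records
// get value-based GetHashCode for free, which makes this workable for
// simple filters (two equal filters produce the same key).
public sealed record NameFilter(string Contains, bool Negate) : ICacheKey
{
    public string Value => $"name-filter:{GetHashCode()}";
}
```

One caveat worth noting: hash codes can collide and are not stable across processes in .NET by default, so a production key scheme may want to serialize the filter's fields instead.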
Let's go look at the Set method. Again there's a little contract at the top to double check for errors. The first piece checks whether we have a full result set. If we do, we just remove whatever is in the cache, because, if you recall, we could already have disjoint pages loaded in. We're saying: whatever's there, forget it, and set this whole new result set as the total set of IDs. That's the situation where we have everything we need and don't have to worry about what's there now, because we can overwrite it.

Otherwise, we need to check whether there's already a result set for this cache key, and if the cache says no, then no worries, we just add in what we've got; that's a pretty simple case. The final part is the more complicated one, and I don't necessarily need to walk through the whole algorithm, but essentially it tries to insert the page into the cache when we need to modify an existing result set. This last part is a little messy, and it's kind of funny: if I spent more time walking through exactly what it does, either I or someone watching might say, Nick, this is not optimal, there's a way better way to merge these things. I'm not going to sit here and pretend I optimized this for the fewest allocations or fewest iterations; I'm certainly not claiming that. But going back to writing unit tests around this: it was working, and now that I have tests in place, if I need to tune it for performance, I can always come back and optimize.

Let me collapse it for a moment so you can see the overall algorithm. At the end, I insert a page into the full set, then double check: do I have any overlapping ranges in my result set? If so, start merging them together. Recalling what I said earlier, this is the part that ensures a full result set pulled into memory always ends up as one single page. It enforces that if a thousand IDs in total match the filter and each page holds a hundred, I won't end up with ten pages of a hundred records each; by the time this finishes running, I get one single page with all one thousand. Then at the end we just set that into the cache.
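The merge step could be sketched as follows. This is my own simplified take, with pages as `(Offset, Ids)` tuples and no claim about allocation efficiency, exactly the caveat made above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the merge step in Set: pages whose ranges touch or overlap are
// collapsed together, so a fully covered filter ends up as one page.
static List<(int Offset, List<int> Ids)> MergePages(
    IEnumerable<(int Offset, List<int> Ids)> pages)
{
    var merged = new List<(int Offset, List<int> Ids)>();
    foreach (var page in pages.OrderBy(p => p.Offset))
    {
        if (merged.Count > 0)
        {
            var last = merged[^1];
            int lastEnd = last.Offset + last.Ids.Count;
            if (page.Offset <= lastEnd) // adjacent or overlapping ranges
            {
                // Append only the part of the new page beyond what we hold.
                int skip = lastEnd - page.Offset;
                if (skip < page.Ids.Count)
                    last.Ids.AddRange(page.Ids.Skip(skip));
                continue;
            }
        }
        // Disjoint page: keep it separate (a copy, to avoid aliasing input).
        merged.Add((page.Offset, new List<int>(page.Ids)));
    }
    return merged;
}
```

Run over ten adjacent pages of a hundred IDs, this would collapse them into the single thousand-ID page described in the transcript, while genuinely disjoint pages stay separate.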
The next part just tries to get things out of the cache, and what's awesome about that is it simply asks the cache; there's no complicated algorithm. And that really summarizes the two operations we can perform on this cache.

All right, that's the implementation of my result set cache. I understand that walking through this code might be a little complicated, especially because I'm talking about code I've already written rather than designing it with you, so I tried to go at a pace where you can rewind and rewatch parts if you need to; hopefully it comes across clearly. To quickly summarize what was covered: the idea is that we have result set caches that take a filter. We didn't look at the filters being run; that's an entirely separate thing you can implement however you like, and it's agnostic to whatever data backend you're working with. Given some filter, which maps to a cache key (how is up to you), we have a set of IDs matching the results of that filter, and we also take the paging into account: filter plus paging. From there we can store the IDs that map to that filter and those pages, and the classes we looked at manage the paging, let you pull out sub-pages, and apply optimizations when a full result set has been pulled into memory.

One of those optimizations really requires that you have the full count for the filter. For example, if your filter matches 10 million records in your data set, I'm not suggesting you pull 10 million IDs into your result set cache; that's probably not a good idea. But knowing that the result set totals 10 million still enables optimizations; or, put the other way, you can't take advantage of those shortcuts if you don't know whether you have the full set of IDs pulled in. I want to double down on that point, because when I was designing this, a lot of the challenges, the gotcha scenarios, were cases where I did not have the count available. I had to change how my application runs queries: on a cache miss I now have to hit MySQL with two queries, one to pull back the page of IDs I need for the result set of the filter being run, and one to pull back the count of records matching that filter. So the cache miss becomes more expensive, but once this caching is in place, it ends up being so much faster. This is an approach I've used before in production, and I wanted to apply it here in my own projects.

I hope this was valuable. If you're going to try implementing something like paging and result sets for your filters and you have a different approach, I'd love to hear about it in the comments, because I think this is a really interesting space for optimizations. Hopefully that was useful, and thanks for staying right to the end: in the next video you can see how I unit tested this and why that was really valuable. Thanks for watching, and we'll see you next time.
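The overall consumption pattern, including the two-query cache miss, can be sketched end to end with in-memory stand-ins. Every name here (the fake database, the dictionary cache, `GetPage`) is illustrative; the video's actual API is different:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in "database": rows 0..999; our "filter" selects even numbers.
var database = Enumerable.Range(0, 1000).ToList();
int queryCount = 0; // counts simulated database round trips

List<int> QueryPageOfIds(int offset, int pageSize)
{
    queryCount++;
    return database.Where(x => x % 2 == 0).Skip(offset).Take(pageSize).ToList();
}

int QueryTotalCount()
{
    queryCount++;
    return database.Count(x => x % 2 == 0);
}

// Tiny stand-in cache: cache key -> (total count, pages keyed by offset).
var cache = new Dictionary<string, (int Total, Dictionary<int, List<int>> Pages)>();

List<int> GetPage(string cacheKey, int offset, int pageSize)
{
    // Cache hit: no database round trip at all.
    if (cache.TryGetValue(cacheKey, out var entry) &&
        entry.Pages.TryGetValue(offset, out var hit))
    {
        return hit;
    }

    // Cache miss is more expensive: two queries, the page of IDs + the count.
    var page = QueryPageOfIds(offset, pageSize);
    var total = QueryTotalCount();

    if (!cache.ContainsKey(cacheKey))
        cache[cacheKey] = (total, new Dictionary<int, List<int>>());
    cache[cacheKey].Pages[offset] = page;
    return page;
}
```

The trade described in the transcript falls out directly: the first request for a page costs two queries, while every repeat of the same filter-plus-page is served from memory with zero.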
Frequently Asked Questions
What is the purpose of caching filtered and paged database results?
The purpose of caching filtered and paged database results is to reduce the number of expensive database queries by storing the results of previous queries. This allows my application to quickly return results from the cache instead of hitting the database again, which can significantly improve performance, especially under heavy user load.
How does the result set cache handle pagination?
The result set cache handles pagination by mapping filters to specific pages of results. When a filter is applied, I store the IDs of the records that match the filter in the cache, along with their pagination details like offset and limit. This way, when a user requests a specific page, I can check the cache to see if I already have that data available, allowing for efficient retrieval without querying the database again.
What are some limitations I should consider when implementing caching for large datasets?
When implementing caching for large datasets, I need to consider memory limits, as I can't store everything in memory. For instance, if I have a dataset with millions of records, I have to be strategic about what I cache and how much of it. Additionally, I have to manage the complexity of potentially requesting pages out of order, which means my caching strategy must accommodate these scenarios without overwhelming the memory.
These FAQs were generated by AI from the video transcript.