How to get TikTok Followers in C# .NET using Selenium
February 4, 2023
Interested in building your very own social media assistant in C#? In this series, we work through building out an application using #dotnet where we interface with popular social media platforms to get analytics and eventually help with content creation!
In this episode, I work with Jamal to get follower information from TikTok. We use his suggested approach of running Selenium to be able to get web page data and then pull our follower count directly from the loaded HTML.
The code for this pr...
Transcript
so when we were last looking at this code here we were hitting a little bit of a roadblock because when we were trying to run our HTTP client it was basically hanging and when we started looking at what the TikTok website's actually doing it's kind of the you know the worst fear becoming a reality where the site was not actually uh showing all of the HTML content right away it actually ends up generating that after the site's loaded so um basically using the HTTP client this way is not going to work so we spent a little bit of time trying to dig around to look for some options I was really scratching my head because I didn't want to go back to the API calls started looking at a couple more advanced things but I personally hit a roadblock but Jamal you
had a really interesting idea and what was it that you were proposing we do yeah so basically in my limited programming experience a bit of web scraping uh I suggested we use Selenium uh so pull up a Chrome browser uh essentially automate the experience of opening a browser and effectively going in and scraping it as if we did it manually right and I think that in theory like that approach makes a lot of sense because the whole issue that we're having is that we don't have an opportunity using this current mechanism to be able to have the website uh like load scripts generate the content and see it as you would um as a user visiting the site but with Selenium that actually would allow us to do that so I think let's go try implementing this Jamal this is going to be
maybe a little bit more advanced than I was planning for us today because I think the HTTP client route is kind of like a pretty basic thing um but when you encounter roadblocks like this I think your solution is more creative and I think is going to bypass this issue we have so all this code that you spent uh you know all of your effort writing I think we're gonna backspace everything except the comments at the top those are still pretty useful for us and we're going to start by actually getting what's called a NuGet package so we're going to try to basically get a library that will let us use Selenium from your project so this is going to be a little bit new for you but the same concept if you're like using Python or something and you want to use someone's
library same idea but it's called a NuGet package in C# so you're going to right click on your project on the right side of your screen okay right up here yep on the project though and then so the one right below that what is this one this is that's called the solution which can contain multiple projects guys so that one there and then almost halfway down there's something that says Manage NuGet Packages you're going to want to click that NuGet packages so not like the nougat dessert that's right yeah it's not quite the same um almost as tasty when you see what we can get um you're going to want to go to Browse it usually it starts on the Installed tab but go to the Browse tab and you're going to want to look for Selenium.WebDriver okay
Boop cool and I don't know why it does this it does it on my computer too the top result is always scrolled out of view so just click somewhere like on any one of those items or just scroll up to the top there you go you want that one with almost 70 million downloads that's the okay that's great not the Chrome driver in this instance we're gonna I'm gonna get you to install that manually um okay to be totally honest I have not used the package right below that that has the Chrome driver but we do have to download that separately so go ahead and press the install button there okay and it's just gonna download that for you you can see in the console there it's installing you're going to want to press ok to that okay and it's done so you can go ahead
and close that little window that's there um and basically just go back to your Program.cs file perfect uh some stuff so now what we want to do is you're going to want to be able to make a Selenium WebDriver and we do that by basically if you this is where like it might be a little bit more confusing because we have to access a package that you just downloaded you're going to have to refer to the right namespace for this so it's actually the namespace is where their code is located so it's just organized kind of like a folder structure is the best way to think about it so if you type using this is like gonna allow you to view um information from other packages or sources so using space and their namespace starts with OpenQA so if you start
typing OpenQA you'll see it pop up there sorry to interrupt you I see using is faded is this an instance where I should be capitalizing it uh nope that's right but you're going to want to basically uh hit dot now so that it'll complete with the right capitalization of OpenQA then Selenium okay and so am I pressing tab yes perfect yep and then one more you're going to do is dot Chrome okay this is going to be automation for Chrome there you go great and then a semicolon so what you should be able to do now is um on a new line you're going to want to type IWebDriver so capital I yeah there you go um the first one they have at the top there so this is actually what's called an interface so it's just a small detail
it doesn't really matter but okay this is an interface what happened here when I oh when I uh accepted this IWebDriver all of a sudden we have two lines now yeah it's just it's probably that IWebDriver is in the other namespace or just OpenQA.Selenium okay but that's it's not a big deal so just like when we declared other variables I mentioned that IWebDriver is an interface but it's still on this line it's the type of the variable that you're making so you're going to want to say give it a name like webDriver like it's suggesting so you can press tab on that part or type it and then equals but we're going to want to make a new instance of a WebDriver not setting it to null so their suggestion there is actually not right you're going to want to type new ChromeDriver
so n e w space ChromeDriver there it is so much autofill magic it's really um powerful and I don't know all the behind the scenes but I honestly in the past year or so I feel like the autocomplete has become so much more intelligent um it's really cool and actually it picks up on some of the patterns that you're typing so if I'm going through a lot of files and changing stuff it will actually be like hey I see that you just did this same thing like 10 times like let me keep applying it for you it's really cool so one more quick note this is not something I want to dive into a lot of detail but I want you to go to the beginning of this line and just type using again and just to make it a little bit
more clear for us go ahead and put a new line between uh this one you're on and yeah exactly so perfect just to briefly explain it when we make a new Chrome web driver it's going to start up some other processes it's going to actually have Chrome running and by putting it's confusing because it's the same word but they do very different things so I'm sorry but on line seven when you have using it just says that when we're done using this it goes out of scope we're going to release the resources for it so it's it's called a disposable object it doesn't matter but we just want to say that we're cleaning up resources um it takes some getting used to knowing which things you should dispose of this is one of them okay so that's a good point and I think for viewers
hopefully it'll make more sense doesn't make that much sense to me right now but we appreciate it yeah it's one of those like keep it in mind because as you go eventually it will catch up and you'll understand a little bit better so this is a Selenium WebDriver that we have access to now but it's not going to do anything quite yet we still have to configure this a little bit more but what we want to do is on a new line say webDriver and I think you can just set the URL that you want to go to so webDriver.Url I'm going to grab this exactly and because what you're what are you going to have to do with it quotes maybe a bracket you need quotes just quotes and a semicolon oh but you're um it missed the dot Url part um webDriver.Url
I missed it I like what you said that's okay cool all right now now we have uh basically a chrome like a selenium Webdriver that should navigate to this URL we're still going to have to do some configuration but I want you to try running it just so we can see what happens let's do it I I think by default it should blow up but that's okay all right blowing up my computer now F5 and we see we see red the debug console Chrome has started successfully here it is this just started on its own it did not blow up and then it blew up no that's okay okay well it started and then it closed right do we know why do you know why it closed uh let's see um Chrome driver was started successfully listening exited with Code Zero I don't know so
Code Zero is generally a success code um usually a numbered code is an error but like zero is fine which sounds kind of weird but that's standard um it it exited because it didn't have anything to do it basically navigated to the web page you saw it come up in your Chrome browser but then when it finished it's like well I got the web like I navigated there I'm done so it finished so it did actually what we wanted I thought there was more configuration we had to do but it was able to find your Chrome install because you have Chrome installed on your computer so it just worked which is really cool um it's awesome there's a couple little tweaks we can do to the web driver itself so that you don't actually see it pop up on your screen like that so what
we can try doing is on the line above the webDriver if you want to basically so you can close that exactly so between line six and seven if you add a new line okay and we're gonna make what's called ChromeOptions so you can either use var or you can type ChromeOptions up to you uh let's just go with try typing ChromeOptions here we go that works yeah so ChromeOptions and then give it a name so that's the type right you want to give it a name so let's just give it this is this is tough you're right naming it is challenging uh especially when you're put on the spot in a video right but yeah I love this so much so I'm just going to make this browser super bad name because it's very general we don't know what it does
but we're going to roll with it for now no worries so then after that you're going to type equals there you go I see it finished for you love that cool just tap on going yep and it's semicolon I'm getting better with the semicolons right it does it for you it's great so now on the next line you're going to want to set an option on this so uh between lines seven and eight if you do the variable you called browser you can say browser.AddArguments that's right yep and then you can type so once you have that uh the string that you want to pass into there is just called headless so if you think about running a Chrome browser you could actually I'm pretty sure on the command line you could run Chrome and then pass in headless this is
the same idea so you're going to want to pass it in parentheses so you need an open parenthesis then your double quotes then a close parenthesis exactly you got it and we're gonna put headless just like this yep that's exactly it all right and semicolon cool Now quickly press F5 to run it again see what happens just to make sure we're not breaking stuff break anything for me so so far now the debug console comes up on the screen it's come up which I don't think we wanted yeah interesting so okay yeah I did I didn't expect that um maybe it actually is argument with a singular uh argument without an S anyway if that doesn't work not a big deal um this is just something that we would probably want to add when we're finalizing it because we don't want to have this browser
keep popping up okay still happens let's not worry about it for now but that's a you know a note that we want to come back and make sure we're not just like blasting out these Chrome browsers every time we run it fair okay okay headless to do yeah so now we basically we can navigate to this page but cool we want to be able to get the content from it so there's a tricky part here and the tricky part is on the web driver for Selenium great nice comment that's good um on the web driver for salemium Selenium not with an M my apologies there's something called a I think it's PageSource so on your very last line there like line 13 or 14 what you could do is type webDriver.PageSource okay there you go and that should have the HTML
in it so you don't want to assign to it you actually want to take that and put it into a variable so you do like string html equals webDriver.PageSource you lost me Nick so string other way not strong oh so you want to go to the beginning of the line say string got you okay oh and you pressed insert I think but you saw that yeah very very good observation let's zoom in actually speaking of which there you go yeah yeah insert is dangerous when you get like you type in really fast and it's on you just start deleting everything and you're like oh god what happened um but yeah so if you do like string and then give it a name like html for example I'll try to help on the naming on this one who knows so quick question uh yes see
I have no space between this equals is that just a you know stylistic doesn't really styling thing yeah generally styling would be to have a space but yeah it doesn't matter so no worries and then at the end of this line you need a semicolon not an equal sign but there you go so what's I'm we don't have to run this but I want to give you a heads up that this actually still won't work as we expect and I want to explain this for a brief moment it mostly can and should work but one of the tricky things about how selenium operates and how you access the page Source this will in fact get you HTML but depending when you ask for it it will give you different HTML so if you were to think about a timeline as soon as you start navigating
to the page if you said give me the HTML it'll give you what it currently has it's literally like your web browser right so if you were to wait you know 100 milliseconds 200 milliseconds do it again and ask it would give you different HTML because those scripts have to run and populate and actually build out the HTML that we see so this is where it gets a little bit tricky for us and why I mentioned that uh sort of completing this topic is maybe going to be a little bit more advanced than I kind of planned for but I can speed it up a little bit and I'm going to provide you some code on the side that you can drop in I'll walk us through it and the only reason I want to do this is because some of the syntax that we
have to use it's not obvious and it's not really fair for me to be like hey Jamal try typing this and you're like dude this doesn't make sense this took me like I had already been programming in C# for years and when I started using what's called get ready for this an anonymous delegate right when I started using anonymous delegates I was like this hurts my brain but once you get used to it it's like it's quite obvious so give me a sec I'm gonna send you some code Jamal and then you can drop this in let's do it basically what we've done is created a function that we can call and reuse and we need to use the web driver so that gets passed into here we want to provide it with what text we're looking for and we actually know
what that is based on some of the HTML we looked at earlier and then we're going to give it a timeout maybe something like we'll pass in like 10 seconds 15 seconds just so it has some time to load the page but what this will do is actually wait for that period of time up to 10 or 15 seconds whatever we specify and then when it's done it might be a little bit confusing but the the thing that it Returns on line 33 it's a a special um basically a special thing that I've created um it's called a record but it has the location at which it found the text and it actually gives us back the page Source at that point in time the reason why that's important is because if we go ask for the page Source again it might have changed which
is a little confusing but basically we want to say when you find this keep keep track of whatever page Source you had because in the next few milliseconds or whatever by the time we ask again it might look different okay so let's actually try calling this now so that we can use it will this comment that I put in here break our code nope absolutely not because it's doctor semicolon yep exactly open a parenthesis now we need to actually give it some arguments so you want to pass in Webdriver you just press tab yeah okay now the next thing we want to pass in is going to be some of the HTML we were looking at earlier and based on the HTML we were looking at if you scroll to the top of your file we had some text right so what we want to
do is just look for a piece of this so my thought process was like basically the part that says data-e2e follower count that seems unique enough we don't have to get super um picky but I think if you just grab that part we're just going to try and find where this is all right so you want to put that into a string you got it see you already knew oh we got double quotes that's right it that's right so it escaped the quotes that were in the HTML so that's good now the last parameter that we want um inside there still is a we need a timeout so you're going to want to do TimeSpan so if you type TimeSpan.FromSeconds there you go and then inside we're gonna so we're going to pass in a number into FromSeconds
so it's a method call so you need your parentheses exactly and I don't know 15 seconds is fine it should not take that long but cool and then a semicolon at the end okay and I added an extra bracket because this is two brackets inside looks good so one more thing that we want to do is we want to store the result of that into a variable so anytime you have an equal sign it's doing something on the right side right so for example line 11 is saying make a new ChromeDriver so it does the right hand side of the equal sign and then assigns it to what's on the left line 13 is make a string assign it to the thing on the left line 15 is go run this method assign it to the thing on the left so that's the
pattern and you know now that it's like type variable name equals and then the something that you want to assign into there so I think we actually might be able to run this but we're going to want to um know that it's doing something so one way that we could actually do that is with a breakpoint or you could just do Console.WriteLine oh wow look at this it's telling us it's pretty much handing it to us on a platter yeah there you go just go ahead and do that love it so okay run it F5 and we're off okay it started it's not headless we're not trying to do that anymore I've got one more follower yeah here we go and it wasn't me so it did it it worked this is the TikTok page all of the code from
all of the HTML it looks like yeah that forms Nick's TikTok page so we know that that worked because it would wait up to 15 seconds whatever you put in there and it returns something so we know that part worked so we can close that and move on to the next step and this is going to the other question that you had asked Jamal about um like you were hoping that when you type follower count there that was going to give you the number of followers so not quite it's just found where that HTML is which is cool but we have one more step and I'm going to give you one more bit of code so I used var here so uh just again to remind you var is just a short form to let the compiler figure out the
type for you um I like using it personally because it shortens the amount of text but um if you want to be verbose in your code and a lot of people do it can make it more readable then some people like putting the full type so just a heads up so you have some errors on your screen just because you and I named things differently so I called mine follower count result you called yours follower count so I you can I like the idea of maybe changing since we only have this in one spot and we have here's I'm gonna we're gonna adopt your no your new naming convention so for those following along we did change this to match Nick's new code that we've pasted in so let me let me walk you through it's only a couple lines which is great but um
we're going to be using something called a regular expression and actually Jamal based on some of your other experiences maybe not in C# but you might have actually encountered regular expressions is that a familiar thing for you or no no they're not at all I've always seen this and I always thought it was uh something to do with your uh registry on your computer so okay right over my head that's okay so a regular expression lets you define some syntax that matches patterns in strings and there's like a basically like a almost like a language or a syntax that you use to go look for stuff and if you take a look at the what's inside of that like equals new Regex part you'll notice that it looks very similar to the code that is on line 17 that string right that's
right but it's highlighting it for me right it's got a little bit of a different part though and you'll notice after because we're looking at an HTML tag right I'm talking with my hands here and using my angled brackets but inside the angle brackets I guess they should be the other way um it's going to hurt my arms you'll notice that it has um parentheses dot plus question mark and if you've never seen a regular expression before this is going to seem like it makes no sense all that I'm doing is basically saying I want to it's called a group I want to have a group for the value that's inside of this tag because that's where the number is that we want okay I'm going to say okay as if I understand it may become clearer later okay
so that's we basically define the regular expression that will find that thing for us that's line 21 line 22 actually performs the lookup for us and all that that line says is basically take the text from the page start at this index where we found that text originally go another 200 characters and somewhere in those 200 characters we should be able to match this pattern then the line after that is one more type conversion so followers should be an integer it should be a number when we're using regular expressions it's all string based so because we want it to actually be a number we have to parse that string and get a number out of it there's an extra parameter in there called InvariantCulture and technically we may it might actually be the wrong thing to use here in general but I like
putting this in place because I have experiences where you're running software on different cultured computers so say running this on a computer in France uh where people use different separators for numbers we're only dealing with integers but there's commas sometimes instead of periods right it will literally blow up your program so I just make a habit of putting in InvariantCulture everywhere um and I think we're good I'm going to dramatically press F5 boom let's see it and away we go ChromeDriver's been started notice Chrome is being controlled by automated test software at the top what do you see yeah followers 141 it worked Nick basically pulled the number of followers using Selenium from your TikTok profile the first step in our social media assistant and this is genuine excitement it worked my first C# program that Nick wrote oh man
you came up with the idea so we're gonna make a commit and in Git the commit will be a local change and then after we're gonna push it up and actually this time we can try doing it in one fell swoop so you can see the three files it wants to add so you can actually press the stage with three arrows I think it's three it's a little hard for me to see that one right there's definitely more than one stage all so Nick before I press this have we pushed this any of this uh any of the solution to your repo no so that's this is going to be the first time that happens okay so we're gonna press stage all so we've staged Program.cs socialmedia assistant dot console and the sln that's right the vs community file that's right so what
you'll want to do here in the right side um in the bottom right is give it a title leave an empty line and then give it a little mini description despite what it says and you can press the commit and push button this time all right let's try it out away we go three files changed 93 insertions a bunch of talk about those another bunch of stuff yeah we'll press ok to that so that did the commit locally now it's going to push it up to GitHub boom there you go if you press OK on that you'll notice in Git Extensions you can see master as green and then origin master they're both at the same commit and origin master is suggesting that the server that you just pushed to now has your commit so that is now publicly available on the internet and
congratulations there's your first C# commit insert confetti fireworks yeah that's an editing job for you to make that like yeah there's gonna be something good here so Jamal I think this is a really good exercise like I mentioned I might do a follow-up to do a little bit of code cleanup so that when we go to move on from this we have some pieces that we can work with a little bit in a more organized structure because going through this is a bit of a prototyping phase which is totally fine that you wrote the code the way you did I think that's a great way to do it we'll clean it up a little bit and then kind of work on the next phase where we maybe start to look at um basically wrapping this up in like an API that
we want to talk with that way when we consider the next plug-in for something like Facebook we can say hey look we also want it to have a number that comes back that we can print out so we can dive into that more the next time but did that all seem useful for you Jamal super useful uh a little bit intimidating there's a lot of good stuff in here uh I think dealing with Selenium in C# is a totally new adventure I think we're gonna have a lot of fun on the next step here yeah this is great so thanks for watching if this video was uh useful and insightful please like the video leave a comment below if you have any thoughts on this and of course subscribe if you want to see the rest of the journey here where
we get Jamal coding up the social media assistant so thanks again take care
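The helper described in the transcript (wait until some text shows up, then return where it was found together with the page source captured at that moment) is not reproduced on this page. The following is only a minimal sketch of that idea. The names `TextSearchResult`, `PageSourceWaiter`, and `WaitForText` are hypothetical, and a plain polling loop stands in for whatever waiting mechanism the original code used (the transcript mentions an anonymous delegate, so the real code likely differs). The page source is supplied through a `Func<string>` so the sketch carries no Selenium dependency.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical result type: the index where the text was found, plus the
// page source snapshot it was found in (a later read may look different).
public sealed record TextSearchResult(int Index, string PageSource);

public static class PageSourceWaiter
{
    // Polls getPageSource until text appears or the timeout elapses.
    public static TextSearchResult WaitForText(
        Func<string> getPageSource,
        string text,
        TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // Take one snapshot and search that same snapshot, so the
            // returned index always refers to the returned source.
            string snapshot = getPageSource() ?? string.Empty;
            int index = snapshot.IndexOf(text, StringComparison.Ordinal);
            if (index >= 0)
            {
                return new TextSearchResult(index, snapshot);
            }

            if (stopwatch.Elapsed >= timeout)
            {
                throw new TimeoutException(
                    $"'{text}' did not appear within {timeout.TotalSeconds} seconds.");
            }

            // Assumed polling interval; tune as needed.
            Thread.Sleep(200);
        }
    }
}
```

With a Selenium driver available, a call could look like `PageSourceWaiter.WaitForText(() => webDriver.PageSource, "data-e2e=\"followers-count\"", TimeSpan.FromSeconds(15))`. The exact marker text is an assumption about TikTok's markup and should be checked against the live page.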
Frequently Asked Questions
Why did we switch from using the HTTP client to Selenium for scraping TikTok?
We switched to Selenium because the TikTok website generates its HTML content dynamically after the page loads, which the HTTP client couldn't handle effectively. Selenium allows us to automate a browser and scrape the content as a user would, ensuring we get the fully rendered HTML.
What is a NuGet package and how do I install Selenium in my C# project?
A NuGet package is a library that you can add to your C# project to use external code. To install Selenium, you right-click on your project in Visual Studio, select 'Manage NuGet Packages', go to the 'Browse' tab, and search for 'Selenium.WebDriver'. Then, you can install it from there.
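Once the package is installed, the driver setup walked through in the video can be sketched roughly as follows. The profile URL is a placeholder, and passing the options object to the `ChromeDriver` constructor is an assumption: the transcript never shows that step, which may be why the headless setting appeared to have no effect during the recording.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Assumption: ask Chrome to run without a visible window.
var options = new ChromeOptions();
options.AddArgument("--headless");

// The using declaration disposes the driver, and with it the Chrome
// process it started, when the variable goes out of scope.
using IWebDriver webDriver = new ChromeDriver(options);

// Placeholder URL; substitute the profile you want to inspect.
webDriver.Url = "https://www.tiktok.com/@example-account";

// PageSource returns whatever HTML exists at this instant, which may
// still be incomplete while the page's scripts are running.
string html = webDriver.PageSource;
Console.WriteLine($"Fetched {html.Length} characters of HTML.");
```

Reading `PageSource` straight after navigating is the timing problem the transcript describes, so in practice the read would be repeated until the expected content shows up.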
How can I extract the number of followers from the TikTok page using Selenium?
To extract the number of followers, we use Selenium to navigate to the TikTok page and retrieve the page source. Then, we apply a regular expression to find the specific HTML element that contains the follower count. This allows us to parse the number from the HTML.
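As a rough illustration of that last step, the sketch below pulls an integer out of a snippet of HTML with a capture group. `FollowerCountParser`, the exact `data-e2e="followers-count"` attribute value, and the shape of the pattern are assumptions made for illustration; the regular expression used in the video is not shown on this page. The 200-character search window and the use of `InvariantCulture` follow the transcript.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FollowerCountParser
{
    // Assumed markup: <strong ... data-e2e="followers-count">141</strong>.
    // The group (.+?) captures the text between the tag's '>' and the next '<'.
    private static readonly Regex FollowerRegex = new Regex(
        "data-e2e=\"followers-count\"[^>]*>(.+?)<",
        RegexOptions.Compiled);

    // startIndex is where the marker text was found in pageSource.
    public static int Parse(string pageSource, int startIndex)
    {
        if (startIndex < 0 || startIndex > pageSource.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        }

        // Only search a short window after the marker (200 characters,
        // as mentioned in the transcript).
        int length = Math.Min(200, pageSource.Length - startIndex);
        Match match = FollowerRegex.Match(pageSource, startIndex, length);
        if (!match.Success)
        {
            throw new FormatException("Follower count not found near the marker.");
        }

        // InvariantCulture keeps parsing independent of the machine's
        // regional number settings.
        return int.Parse(
            match.Groups[1].Value,
            NumberStyles.Integer,
            CultureInfo.InvariantCulture);
    }
}
```

This only handles counts rendered as plain digits. If the page abbreviates large counts (for example "12.3K"), `int.Parse` would throw and extra handling would be needed.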
These FAQs were generated by AI from the video transcript.