Web scraping and parsing with Beautiful Soup & Python Introduction p.1
What is going on, everybody, and welcome to a quick mini-series on web scraping with Beautiful Soup 4. By no means are we going to cover everything you can do with Beautiful Soup, but we're going to cover some of the more common use cases. Before we go any further, I do encourage you to check out the documentation for Beautiful Soup if what we cover here isn't quite what you were hoping for. You can do a whole lot of stuff with this, so we're really just going to be scratching the surface here.

Now, moving this aside: to get Beautiful Soup, the first thing you'll need to do is pip install beautifulsoup4. I already have it, so that's fine. You may also need to pip install lxml, and I have that already as well. So make sure you have those two things; as long as you do, you're good to go.

To begin, I have set up a sample page on pythonprogramming.net for us to work with. I'll put the link in the description (if I forget, someone remind me), and the sample code is, as always, posted on pythonprogramming.net. This is just a quick sample page with some text, a list, a table, a picture, and the Zen of Python. Moving this over, that's the link that we're going to use. Again, just go to the description; it's also just parsememcparseface. You see what I did there?

OK, so anyway, now let's import bs4 as bs. This is basically so you can look at old code and, over time, this will still be useful, because in the past it was bs3. Now, we're also going to import urllib.request, just so we can actually make a request. So the source, or sauce, as the in-crowd calls it, is going to be urllib.request.urlopen, and we want to open that link, so for me that's pythonprogramming.net/parsememcparseface (gonna get sued for trademark), and then .read(). OK, so that's the source. Any time you view that page, basically it's just returning the source code, so all of this, and not necessarily as pretty
as this is. So that's the source code, and then we can turn it into a Beautiful Soup object by saying soup = bs.BeautifulSoup (capital B, capital S). What do we pass? The sauce. You can just pass that, but we'll also go ahead and be explicit about the parser that we want to use: we want to use lxml. There's also html5lib that you could use, but lxml is what we're going to use.

So first, let's just look at what we're dealing with right now. Save and run that. As you can see, it's just the HTML, basically as we would expect it, and it doesn't really look too much different than if we were to print the source, so sauce. OK, so this one's a little messier; we've got the tabs and the newlines, actually. But it looks very similar. The soup looks more like what we're seeing in the browser. Anyway, it turns it into a Beautiful Soup object, which then lets us actually interact with the soup; that's the whole point of doing that. So I'm going to stop printing the sauce.

One thing we can do is something really simple like print(soup.title), and this gives us the title of the document. In this case, the title was Python Programming Tutorials. The other thing we can do: with soup.title you can get .name from the tag, so that's the title tag, no doubt, but you can also do .string and get the string version there, so Python Programming Tutorials. .text should also work, and we'll talk about this in the next tutorial, I think. So those two will work, and we'll explain in the next tutorial what the difference is.

So there's that, and then we can also get specific values, like print(soup.p). Now this is just going to print the first paragraph element, but let's do it so you get that first. OK, it's a paragraph, class introduction: oh hello,
this is a wonderful page, OK. But we can also do something like this: we can say print(soup.find_all('p')), p for paragraph tags. So now we should find all of the paragraph tags. So we did, and we found a whole bunch of extra information. Really, this page doesn't have too many paragraph tags, since it's just a sample page, but if you did this on a news website or something like that, you'd get substantially more information. I'm trying to find my mouse. There it is.

OK, so there's that. Instead, you could do something like this; let's actually comment that out and space that out. We could say for paragraph in soup.find_all('p'), so for each paragraph we could do something like this: we could print the paragraph. We'll just do that for now. But in most cases you actually don't want the tag, right? You could use regular expressions or something to get rid of it, just like you could use regular expressions to parse all this stuff, but in this case we can do .string, for example. You'll see that we actually got just a couple of strings, but we're missing some things, and it appears that they're being returned as None. This is because .string returns a NavigableString, as opposed to .text, which is just going to return regular ol' Unicode. So .text will actually return everything that you wanted, but the NavigableString will basically only work if you don't have child tags inside of the tag. These first two had child tags; this one just didn't have any child tags.

By child tags, just in case anybody is not following that: if we go up to here, "Oh hello, this is a wonderful page", you can see "wonderful" looks slightly different. You might not be able to tell on video, but it is. I can just do a search real quick, and we can see that this paragraph tag has a child tag within it that is a span. It also has some strong tags. Anyway, in most cases you're
really just going to want not Texas, .text. Maybe Texas; nothing against Texas. OK, so yeah, that's your paragraph text.

Now there is one more thing that you can do when you have an entire document. Rather than going through and finding the paragraph tags and then getting the text in between them, you can do something like print(soup.get_text()), and while we're doing that, let's just comment these lines out. So here you get basically all of the text that's found on the page, and you'll see that we also picked up the Zen of Python text as well. Remember (well, you probably don't remember), on this page the Zen of Python is actually not in paragraph tags; it's encased in pre tags. Took a while to zoom. Anyway, it's encased in pre tags, not paragraph tags, so there are going to be times where you're not going to get the data you want from paragraph tags. I've even seen some websites that don't use paragraph tags at all; they use span tags. So don't think that you're always going to get away with just parsing for paragraphs.

OK, so there's that. Aside from finding all the paragraphs, another thing that you can do is find all of the links. For example, you might say for url in soup.find_all('a'), so we're going to look for all of the a tags, and then, to start with, print(url), and let's just run that to see what we get. OK, so we get the entire tag, as we've been seeing basically every time. So you might think, OK, here's what I'll do: I'll do .text, 'cause I'm a smart cookie. And then you find out, oh shoot, no, that doesn't work, because that's the text of the tag. So instead what you'll do is url.get, and we want to get the reference, so it's 'href' there. Boom, now you have your actual links.

OK, so those are just some quick basics of Beautiful Soup. That might actually be enough for you at this point, but if it's not, in the next tutorial we're going to talk a little bit more about some
more slightly advanced features of Beautiful Soup and navigating around, especially if paragraph data is not the only thing that you want. So stay tuned for that. If you have any questions, comments, concerns, whatever, leave them below; otherwise, see you in the next tutorial.
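
The steps from the video can be gathered into one short script. This is only a sketch, not the author's posted sample code. To keep it runnable offline, it parses a small inline page (SAMPLE_HTML, a hypothetical stand-in for the real sample page) and uses Python's built-in "html.parser" rather than the "lxml" parser named in the video. The fetch helper and the https:// scheme in the commented-out URL are likewise assumptions.

```python
import urllib.request

import bs4 as bs

# Hypothetical stand-in for the sample page, so the sketch runs offline.
SAMPLE_HTML = """
<html>
<head><title>Python Programming Tutorials</title></head>
<body>
<p class="introduction">Oh, hello! This is a <span>wonderful</span> page.</p>
<p>Plain paragraph with no child tags.</p>
<pre>Beautiful is better than ugly.</pre>
<a href="/sitemap.xml">Sitemap</a>
<a name="top">Anchor without a link</a>
</body>
</html>
"""


def fetch(url):
    """Download a page's raw bytes, as the video does with urlopen(...).read()."""
    return urllib.request.urlopen(url).read()


# To parse the live page instead, swap in (scheme assumed):
# sauce = fetch("https://pythonprogramming.net/parsememcparseface/")
sauce = SAMPLE_HTML

# "html.parser" ships with Python; the video passes "lxml" here instead.
soup = bs.BeautifulSoup(sauce, "html.parser")

title_tag_name = soup.title.name   # name of the tag itself
title_string = soup.title.string   # NavigableString inside <title>
first_paragraph = soup.p           # first <p> tag only

# .string is None when a tag has child tags; .text joins all nested text.
strings = [p.string for p in soup.find_all("p")]
texts = [p.text for p in soup.find_all("p")]

# get_text() collects text from the whole document, <pre> included.
all_text = soup.get_text()

# .get("href") returns the attribute value, or None if it is missing.
links = [a.get("href") for a in soup.find_all("a")]

print(title_string)
print(strings)
print(texts)
print(links)
```

In this sketch the first paragraph contains a span, so its .string is None while its .text still includes the word inside the span. That matches the behaviour described in the video for paragraphs with child tags.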
im running into problems opening the file
7:14 got nothing against Texas
Also guys if you're using Python3+, you'll probably need to use pip3 (which is designed for python3) 'pip3 install beautifulsoup4'
i keep getting the error NameError: name 'BeautifulSoup' is not defined
bs4 content returned different than page im viewing for example @t. any ideas
❤🍾
Dear sentdex do you think it is possible to scrape the homepage of a newspaper site from the past? In other words is there a html archive of every website which I could scrape with bs4?
thanks man keep up the good work!
We provide high quality data scraping service, If the data exists anywhere we can get it for you! All we need from you is the data source, fields to extract, and desired output format. We can deliver extensive data outputs in a short time frame.
Contact us at: https://aitomation.com/data-web-scraping/
I generally think that you are a good
@1:28 what does he say & mean? "Its also just parse me just parce face" — not tracking at all
Besides using Python for web scraping, you can also use automated tools to scrape data from websites, such as Octoparse http://octoparse.es/ ; you only need to click and select web data to set up web scraping rules
Great series! Very informational
sauce = … this guy must be an anime weeb
Every time I enter import bs4 as bs I get the error ModuleNotFoundError: No module named 'bs4'. I already installed "pip install beautifulsoup4" and "pip install beautifulsoup4".. did I miss a step here?
I can't believe that every time I need some additional learning videos, I would always end up in your channel. You are AMAZING! Thank you so much!
Thanks for great content… I'm trying to scrap names,prices,links,rating from Daraz.pk website but every thing properly scraped exept rating like 120 links scraped but rating missed in between or end as 85 or 80 scraped…any solution ?
Really great and informative video thank you!
Is there something on the error 403 forbidden? Did I do something wrong or can I just not use the site?
Traceback (most recent call last):
File "C:/Users/Richmond23/Desktop/bs4.py", line 1, in <module>
from bs4 import BeautifulSoup
File "C:/Users/Richmond23/Desktopbs4.py", line 1, in <module>
from bs4 import BeautifulSoup
ImportError: cannot import name 'BeautifulSoup' from partially initialized module 'bs4' (most likely due to a circular import) (C:/Users/Richmond23/Desktopbs4.py)
having a trouble with this
Wow, did not realize this is Snowden's channel! Thank you, I admire you! Anyways, I am total beginner. Just wondering what setting allow you to show the outputs in a different screen from the the screen you put commands? I use Mac terminal. Sorry, this must sound like an idiots question.
If anyone is getting an error related to there not being a parser library installed, run the program in cmd and it will work.
Thank you very much you helped me, i saw many tutorials about scraping but u are explaining the most important things that i really need and i think most of begginer scrapers too 😀 <3
You are friggin hilarous. Parsememcparseface, sauce "as the in kids say" 🙂
Didn't understand anything, but very interesting
How did you add ## in front of multiple lines of code at 7:42? Thanks in advance
I try to install lxml but got error
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Love this people
quick and straight to the point! ty
which program do you use to type "import bs4 as bs"? it doesnt' look like a python shell to me
Someone pay this young lad!!
AttributeError: module 'bs4' has no attribute 'BeautifulSoup'
Once again…YOU ARE da BOWSS! You are the Sal Khan of Python! Keep up the great work!
at 7:40 mins how did you comment on multiple lines at the same time? I mean , whats shortcut?
I think you would really love the song "The Sauce" by Eminem. Here i'll link you! https://www.youtube.com/watch?v=XW27N-Ks7W8
is it to be running python 2 or python 3
I copied his code exactly and I keep getting "invalid character in identifier", what could the problem be?
If you are having a problem parsing the data with lxml or if you have been using selenium to navigate to the part of the page you want to parse with bs this is what i did… fyi I left the selenium code out
from urllib import request
import bs4 as bs
#I had already navigated to the desired page with selenium. This command saves the current page and passes it to bs
pageLocation = driver.current_url
sauce = request.urlopen(pageLocation).read()
soup = bs.BeautifulSoup(sauce, "html.parser")
print(soup.title)
pip doesn't work….
While running the program it gives an error to install a parser library
UnicodeEncodeError: 'ascii' codec can't encode character '\u1d90' in position 6758: ordinal not in range(128)
does anybody know what the problem is?
Hi Harris. Hope you are doing great. Of course you will, cos you are making our lives better. 🙂 I've got a question. I can't make my mind on choosing between Selenium and Beatiful Soup. Which one should I go for? Do you have tutorials on Selenium Webdriver ? By the way. Yes your really ressemble Edward Snowden. 🙂 Here's my advanced Wishes to you. Merry Christmas and a Happy New Year !!
Thanks a lot! Really helpful video:)
I got error as no module named 'bs4'
Hi, thank you for the tutorial. It really helps.
Although I could not install PyQt4, I figured out how to make it work on PyQt5.
Hi,How we can scrap and pars google search?
How would I find a chunk of text based on its class? for example the tag `div class="meaning"`?
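
One possible way to do this, as a minimal sketch rather than anything shown in the video: Beautiful Soup's find and find_all accept a class_ keyword (with a trailing underscore, because class is a reserved word in Python), and select_one accepts a CSS selector. The HTML string below is made up for illustration, and "html.parser" is used so the example does not depend on lxml.

```python
import bs4 as bs

# Made-up markup for illustration only.
html = '<div class="meaning">First definition</div><div class="other">Skip</div>'
soup = bs.BeautifulSoup(html, "html.parser")

# First matching tag, or None if nothing matches.
first = soup.find("div", class_="meaning")

# Every matching tag, as a list.
all_matches = soup.find_all("div", class_="meaning")

# The same lookup written as a CSS selector.
via_css = soup.select_one("div.meaning")

print(first.text)
```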