StackOverflow
BeautifulSoup not returning full HTML from Airbnb search page
I am trying to use BeautifulSoup and Selenium to scrape data from Airbnb. I want to gather each listing from this search page.
This is what I have so far:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def scrape_page(page_url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(page_url)
    wait = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'itemprop')))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.close()
    return soup

def extract_listing(page_url):
    page_soup = scrape_page(page_url)
    listings = page_soup.find_element(By.CLASS_NAME, "itemprop")
    return listings

page_url = "https://www.airbnb.com/s/Kyoto-Prefecture--Japan/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJYRsf-SB0_18ROJWxOMJ7Clk&query=Kyoto%20Prefecture%2C%20Japan&date_picker_type=flexible_dates&search_type=unknown"

#items = extract_listing(page_url)
#process items to get all information you need, just an example
#[{'name':items.select_one('[itemprop="name"]')['content'],
#  'url':items.select_one('[itemprop="url"]')['content']}
# for i in items]

test = scrape_page(page_url)
test
scrape_page() returns HTML from the search page, but it does not seem to be the full content. It is missing the information I need, namely the part of the HTML that holds each listing's name and URL.
I did some research and saw that WebDriverWait might help, but when I use it I get a TimeoutException.
The end goal is to get each listing's name and URL. The first 3 items in the resulting list should look similar to this:
[{'name': '✿Kyoto✿/Near Station & Bus/Temple/Twin Room(^^♪✿✿',
'url': 'www.airbnb.com/rooms/50290730?adults=1&children=0&infants=0&check_in=2022-07-20&check_out=2022-07-27&previous_page_section_name=1000'},
{'name': 'Stay in Kyoto central island',
'url': 'www.airbnb.com/rooms/42780789?adults=1&children=0&infants=0&check_in=2022-06-21&check_out=2022-06-28&previous_page_section_name=1000'},
{'name': '和楽庵【Single】100 Year old Machiya Guest House (1pax)',
'url': 'www.airbnb.com/rooms/48645312?adults=1&children=0&infants=0&check_in=2022-07-27&check_out=2022-08-03&previous_page_section_name=1000'}]
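One possible reading of the TimeoutException: `By.ID, 'itemprop'` only matches an element whose id is literally `itemprop`, whereas `itemprop` is normally an attribute name, as the commented-out `[itemprop="name"]` selectors suggest. Below is a minimal sketch of waiting on the attribute instead and then building the list shown above. It assumes each listing is wrapped in an element with `itemprop="itemListElement"` that contains `<meta>` tags for `name` and `url`; that wrapper name is an assumption about the page, not something confirmed here. The parsing step uses the standard library's `html.parser` in place of BeautifulSoup so it can be tried without extra installs, and the `ListingParser`, `extract_listings`, and `driver_path` names are illustrative.

```python
from html.parser import HTMLParser

# Assumption: each search result is wrapped in an element carrying this
# microdata attribute value. Check the live page; the markup may differ.
LISTING_ITEMPROP = "itemListElement"


class ListingParser(HTMLParser):
    """Collect the name and url <meta> values found after each listing wrapper."""

    def __init__(self):
        super().__init__()
        self.listings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        itemprop = attributes.get("itemprop")
        if itemprop == LISTING_ITEMPROP:
            # A new listing starts; later name/url tags are attached to it.
            self.listings.append({})
        elif tag == "meta" and itemprop in ("name", "url") and self.listings:
            self.listings[-1][itemprop] = attributes.get("content")


def extract_listings(html):
    """Return a list of {'name': ..., 'url': ...} dicts from page HTML."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings


def scrape_page(page_url, driver_path, timeout=10):
    """Load the page in Chrome and return its HTML once listings are present."""
    # Selenium is imported here so extract_listings can be tried on saved
    # HTML on a machine without Selenium or a browser driver.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(page_url)
        # Wait on the itemprop attribute via a CSS selector, not on an id.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, f'[itemprop="{LISTING_ITEMPROP}"]')
            )
        )
        return driver.page_source
    finally:
        driver.quit()  # release the browser even if the wait times out
```

If the assumed markup holds, `extract_listings(scrape_page(page_url, driver_path))` would yield dicts of the shape listed above. With BeautifulSoup, the equivalent lookup would be `soup.select('[itemprop="itemListElement"]')` followed by `select_one('[itemprop="name"]')['content']` on each item.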
I apologize in advance if I did not include enough information in this question, as this is my first time posting here. I would appreciate any help. Thank you.