BeautifulSoup not returning full HTML from Airbnb search page

I am trying to use BeautifulSoup and Selenium to scrape data from Airbnb. I want to gather each listing from this search page.

This is what I have so far:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def scrape_page(page_url):
    
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    driver = webdriver.Chrome(service = Service(driver_path))
    driver.get(page_url)
    wait = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'itemprop')))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.close()
    
    return soup

def extract_listing(page_url):
    
    page_soup = scrape_page(page_url)
    listings = page_soup.find_element(By.CLASS_NAME, "itemprop")
    return listings

page_url = "https://www.airbnb.com/s/Kyoto-Prefecture--Japan/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJYRsf-SB0_18ROJWxOMJ7Clk&query=Kyoto%20Prefecture%2C%20Japan&date_picker_type=flexible_dates&search_type=unknown"
#items = extract_listing(page_url)

#process items to get all information you need, just an example
#[{'name':items.select_one('[itemprop="name"]')['content'],
#  'url':items.select_one('[itemprop="url"]')['content']} 
# for i in items]

test = scrape_page(page_url)
test

It seems that scrape_page() returns HTML from the search page, but not the full content. The part I need is missing, namely this section of the HTML:

Image of HTML Script

I did some research and saw that WebDriverWait might help, but when I use it I get a TimeoutException.

TimeoutException Error
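
One possible reason for the timeout: By.ID matches an element whose id attribute equals 'itemprop', whereas itemprop is normally an attribute name (HTML microdata), so such an element may never appear. Below is a minimal sketch of waiting on an attribute selector instead. The selector value itemListElement, the function name fetch_rendered_html, and its parameters are assumptions for illustration and have not been checked against Airbnb's current markup.

```python
# Assumed selector: listing containers carry itemprop="itemListElement".
LISTING_CSS = '[itemprop="itemListElement"]'


def fetch_rendered_html(page_url, driver_path=None, timeout=10, css=LISTING_CSS):
    """Load page_url in Chrome and return the HTML once `css` matches an element."""
    # Selenium is imported here so that this module can be imported
    # even on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    if driver_path:
        driver = webdriver.Chrome(service=Service(driver_path))
    else:
        driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        # Match on the itemprop attribute, not on an id named "itemprop".
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css))
        )
        return driver.page_source
    finally:
        # quit() ends the whole browser session, even if the wait times out.
        driver.quit()
```

Calling fetch_rendered_html(page_url) would stand in for the driver.get, wait, and page_source steps of scrape_page. If the wait still times out, the assumed selector probably does not match the page and is worth checking in the browser's inspector.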

The end goal is to get each listing's name and URL. The first 3 items in the resulting list should look similar to this:

[{'name': '✿Kyoto✿/Near Station & Bus/Temple/Twin Room(^^♪✿✿',
  'url': 'www.airbnb.com/rooms/50290730?adults=1&children=0&infants=0&check_in=2022-07-20&check_out=2022-07-27&previous_page_section_name=1000'},
 {'name': 'Stay in Kyoto central island',
  'url': 'www.airbnb.com/rooms/42780789?adults=1&children=0&infants=0&check_in=2022-06-21&check_out=2022-06-28&previous_page_section_name=1000'},
 {'name': '和楽庵【Single】100 Year old Machiya Guest House (1pax)',
  'url': 'www.airbnb.com/rooms/48645312?adults=1&children=0&infants=0&check_in=2022-07-27&check_out=2022-08-03&previous_page_section_name=1000'}]
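
For reference, a rough sketch of how a list like this could be built from the rendered page source (for example driver.page_source) with BeautifulSoup alone. It assumes each listing sits in an element with itemprop="itemListElement" that contains meta tags with itemprop="name" and itemprop="url". That structure, the helper name extract_listings, and the sample markup are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def extract_listings(html):
    """Return a list of {'name', 'url'} dicts found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    # Assumed structure: one container per listing, tagged with itemprop.
    for item in soup.select('[itemprop="itemListElement"]'):
        name = item.select_one('[itemprop="name"]')
        url = item.select_one('[itemprop="url"]')
        # Skip containers that lack either piece of metadata.
        if name is None or url is None:
            continue
        listings.append({"name": name.get("content"), "url": url.get("content")})
    return listings


# Invented markup that mirrors the assumed structure; not copied from Airbnb.
sample_html = """
<div itemprop="itemListElement">
  <meta itemprop="name" content="Stay in Kyoto central island">
  <meta itemprop="url" content="www.airbnb.com/rooms/42780789">
</div>
<div itemprop="itemListElement">
  <meta itemprop="name" content="Example listing without a URL">
</div>
"""

print(extract_listings(sample_html))
```

Note that BeautifulSoup objects are searched with select, select_one, find, or find_all. find_element and By belong to Selenium, so they cannot be called on page_soup.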

I apologize in advance if I did not include enough information in this question, as this is my first time posting here. I would appreciate any help. Thank you.
