Scrape a JavaScript-heavy website on a Raspberry Pi 3B+ using Python with Selenium

When I tried to scrape a JavaScript-heavy website with my Raspberry Pi using Python, I ran into some interesting issues that needed to be solved.

I found that modules like requests, requests_html and urllib did not deliver the complete content of JavaScript websites that contain a shadow DOM (#shadow-root). When searching for a solution I found a few, but they relied on PhantomJS or other discontinued modules.
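
To illustrate why a plain HTTP fetch falls short: content inside a shadow root is usually attached by JavaScript at runtime, so it is not part of the HTML the server sends. The sketch below is only an illustration under that assumption. It parses a made-up page (the my-widget tag and app.js are hypothetical names) with Python's standard html.parser and finds the host tag, but no text inside it.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML, as a static fetch might return it:
# the custom element is present but empty until JavaScript runs.
RAW_HTML = (
    "<html><body>"
    "<my-widget></my-widget>"
    "<script src='app.js'></script>"
    "</body></html>"
)


class TagAndTextCollector(HTMLParser):
    """Record every start tag and every non-blank piece of text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


collector = TagAndTextCollector()
collector.feed(RAW_HTML)

print("tags found:", collector.tags)
print("text found:", collector.text)
```

The host element shows up in the tag list, but there is nothing to read inside it. A real browser has to run the page's scripts first, which is what the headless browser is for.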

The solution I found was to use ChromeDriver in headless mode. But the version I got my hands on kept throwing errors about the version of the browser.

After extensive searching I arrived at the following steps:

1. Download the latest chromedriver from:

(get the armv7 version)

2. Install this using the instructions I found on:

  • mkdir /tmp
  • wget <url latest version armv7>
  • unzip <zip file>
  • sudo mv chromedriver /usr/local/bin
  • sudo chmod +x /usr/local/bin/chromedriver
  • sudo apt-get install libminizip1
  • sudo apt-get install libwebpmux2
  • sudo apt-get install libgtk-3-0
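
Before moving on, it can help to confirm that the driver is really on the PATH and that it starts. This is a minimal sketch using only the standard library. The helper name driver_version is my own, and it assumes the binary answers to --version.

```python
import shutil
import subprocess


def driver_version(binary="chromedriver"):
    """Return the version line printed by the binary, or None if it is not on the PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # check=False: a non-zero exit code is returned to the caller instead of raising.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


version = driver_version()
if version is None:
    print("chromedriver was not found on the PATH")
else:
    print("found:", version)
```

If this prints that the driver was not found, the mv or chmod step above most likely did not succeed.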

In your code, add these two arguments when you start the driver:

3. Update the Chromium browser

When trying to execute the script I still got the error about the Chromium version. I was able to solve that using:

  • sudo apt-get install -y chromium-browser
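
This kind of version error typically shows up when ChromeDriver and Chromium do not share the same major version. Below is a small sketch for comparing the two, given the text printed by chromium-browser --version and chromedriver --version. The helper names and the sample version strings are made up for illustration.

```python
import re


def major_version(version_output):
    """Extract the leading number of the first dotted version in the text."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output, driver_output):
    """True only if both outputs contain a version and the major numbers agree."""
    browser_major = major_version(browser_output)
    driver_major = major_version(driver_output)
    return browser_major is not None and browser_major == driver_major


# Illustrative strings only, not real output from my Raspberry Pi.
browser_text = "Chromium 92.0.4515.98"
driver_text = "ChromeDriver 90.0.4430.24"

print(major_version(browser_text), major_version(driver_text))
print("match:", versions_match(browser_text, driver_text))
```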


Now the script finally worked.

The Python Script to get the page content

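from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options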

# Define the site to be opened
site = "http://…."

# Set Chrome Options
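chrome_options = Options()
# Open Chrome Headless
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
# Load the page so its content is available in driver
driver.get(site)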
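
A JavaScript-heavy page may not be finished rendering right after it has been requested, so it can be worth waiting for a condition before reading anything. The sketch below is a hand-rolled polling loop, not Selenium's own WebDriverWait, which does the same job. The name wait_until and its default values are my own choices.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Example use with a Selenium driver (assumes a variable named driver exists):
# wait_until(lambda: driver.execute_script("return document.readyState") == "complete")
```

On a Raspberry Pi a generous timeout is probably a good idea, since rendering is slow compared to a desktop machine.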

4. Analyze the content of the page

With the content of the page in driver it is possible to further decompose the page.

content1 = driver.find_element(By.TAG_NAME, '…..')
shadow_content1 = expand_shadow_element(content1)

To get access to the shadow element the function below needs to be used:

# function to expand a shadow element to usable content
def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root
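
Shadow roots are often nested, so the same trick may have to be repeated several levels deep. The sketch below follows a list of CSS selectors and steps into a shadow root at each level. The function name query_shadow and the selectors in the usage comment are hypothetical. It only relies on the driver having an execute_script method.

```python
# JavaScript run in the browser: look inside the element's shadow root, if it has one.
_QUERY_SCRIPT = (
    "return arguments[0].shadowRoot "
    "? arguments[0].shadowRoot.querySelector(arguments[1]) "
    ": null"
)


def query_shadow(driver, host, selectors):
    """Follow selectors one by one, entering the shadow root of each element found.

    Returns the last element found, or None as soon as one level yields nothing.
    """
    element = host
    for selector in selectors:
        element = driver.execute_script(_QUERY_SCRIPT, element, selector)
        if element is None:
            return None
    return element


# Example use (hypothetical selectors, assumes driver and content1 exist):
# inner = query_shadow(driver, content1, ["my-panel", "span.title"])
```

Returning None instead of raising keeps the calling code simple when a part of the page has not been rendered yet.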