How to grab an output result from website using selenium

Problem Description:

I have some code that I want to try. If a website exists, the site outputs the available domain names. I used this website: www.eurodns.com/whois-search/app-domain-name

Website available

If the website does not exist, is currently parked, or is registered, it says this:

not valid

The code I have in mind uses Selenium and ChromeDriver to enter the text and run the search.

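from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# create the browser session before using it
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

cli = ['https://youtube.com', 'https://google.com', 'https://minecraft.net', 'https://something.odoo.com']
Exists = []

for i in cli:
    driver.get("https://www.eurodns.com/whois-search/app-domain-name")
    Name = driver.find_element(By.CSS_SELECTOR, "input[name='whoisDomainName']")
    Name.send_keys(i)  # send the current URL, not the whole list
    driver.find_element(By.XPATH, "/html/body/div/div[3]/div/div[2]/form/div/div/div/button").click()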

Is there a way to do something like this: if the website is available, append the URL to Exists; otherwise, if it is not valid, print('Not valid')? That way I could separate the websites that exist from those that do not. I was thinking of using BeautifulSoup to read the output, but I'm not sure how to use it properly.

Thank you!

Solution – 1

There is no need to use other libraries.

Avoid absolute XPaths like that one, because they break as soon as the structure of the page changes. Prefer locating an element by its ID if it has one (IDs should be unique on the page), or by its name attribute, or by class name if the class appears to be unique.
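To illustrate why a positional path is fragile, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of Selenium. The markup is a hypothetical, simplified stand-in for the page (not the real eurodns.com HTML), and find_button is a helper name made up for this example. The same idea should carry over to Selenium's By.XPATH versus By.ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup (an assumption, not the real site HTML).
ORIGINAL = """
<html><body>
  <div>
    <form>
      <input name="whoisDomainName"/>
      <button id="submitBasic">WHOIS LOOKUP</button>
    </form>
  </div>
</body></html>
"""

# The same page after a redesign wraps the form in one more <div>.
REDESIGNED = """
<html><body>
  <div>
    <div>
      <form>
        <input name="whoisDomainName"/>
        <button id="submitBasic">WHOIS LOOKUP</button>
      </form>
    </div>
  </div>
</body></html>
"""

POSITIONAL_PATH = "./body/div/form/button"       # depends on the exact nesting
ATTRIBUTE_PATH = ".//button[@id='submitBasic']"  # depends only on the id


def find_button(markup, path):
    """Return the button's text, or None if the path matches nothing."""
    root = ET.fromstring(markup.strip())
    element = root.find(path)
    return None if element is None else element.text


for label, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(label, "positional:", find_button(markup, POSITIONAL_PATH))
    print(label, "by id:", find_button(markup, ATTRIBUTE_PATH))
```

In this sketch the positional path stops matching once the extra wrapper is added, while the id-based lookup keeps working on both versions.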


Some notes on the algorithm:

  1. We can load the search page once and then submit each URL in turn. This saves execution time.

  2. Whenever we submit a URL, we only need to check whether the site reports it as not valid (or as valid).

  3. Give the variables more descriptive names.

  4. Be careful not to send too many requests to the site in quick succession, or it may block you. Perhaps this is not the right approach for your task. Is there an API that could be used for this kind of lookup?

Your code becomes:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

opts = Options()

# make web scraping 'invisible' if GUI is not required
opts.add_argument("--headless")
opts.add_argument('--no-sandbox')

user_agent = "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36"
opts.add_argument(user_agent)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=opts)

urls = ['https://youtube.com', 'https://google.com', 'https://minecraft.net', 'https://something.odoo.com']
exists = []

driver.get("https://www.eurodns.com/whois-search/app-domain-name")

for url in urls:

    # send url to textarea
    textarea = driver.find_element(By.NAME, "whoisDomainName")
    textarea.clear()  # make sure to clear textarea
    textarea.send_keys(url)

    # click 'WHOIS LOOKUP' button
    driver.find_element(By.ID, "submitBasic").click()

    # try to find error message (wait 3 sec)
    try:
        WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.CLASS_NAME, 'whoisSearchError')))
        print(f'URL {url} is not valid')
    except TimeoutException:
        print(f'URL {url} is valid')
        exists.append(url)

    time.sleep(30)  # wait 30 seconds to avoid '429 too many requests'


print(f"\nURLs that exist:\n", exists)

Output will be:

URL https://youtube.com is valid
URL https://google.com is valid
URL https://minecraft.net is valid
URL https://something.odoo.com is not valid

URLs that exist:
 ['https://youtube.com', 'https://google.com', 'https://minecraft.net']
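If you want to reuse the filtering and pacing logic outside of Selenium, one option is to pull it into a small helper. The following is only a sketch: filter_existing, is_valid, and the injectable sleep parameter are hypothetical names introduced here, and the real check would be whatever browser-driven lookup you use.

```python
import time
from typing import Callable, Iterable, List


def filter_existing(
    urls: Iterable[str],
    is_valid: Callable[[str], bool],
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return the URLs for which is_valid(url) is true, pausing between lookups."""
    url_list = list(urls)
    existing = []
    for position, url in enumerate(url_list):
        if is_valid(url):
            existing.append(url)
        # Pause between lookups to avoid hammering the site,
        # but do not wait after the last one.
        if position < len(url_list) - 1:
            sleep(delay_seconds)
    return existing
```

Passing the checker and the sleep function as arguments means the loop can be tried out with a stand-in checker and no real delay before pointing it at the website.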