How to read a webpage table using requests-html?

Problem Description:

I am new to Python and am trying to parse a table from a website into a pandas DataFrame.

I am using the requests-html, requests, and BeautifulSoup modules.

Here is the website I would like to gather the table from:
https://www.aamc.org/data-reports/workforce/interactive-data/active-physicians-largest-specialties-2019

MWE

import pandas as pd
from io import StringIO
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://www.aamc.org/data-reports/workforce/interactive-data/active-physicians-largest-specialties-2019'

req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()

soup = BeautifulSoup(page, 'html.parser')

# soup.find_all('table')
pages = soup.find('div', {'class': 'data-table-wrapper'})
# read_html needs HTML text rather than a bs4 Tag, and returns a list of DataFrames
df = pd.read_html(StringIO(str(pages)))[0]  # PROBLEM: somehow this table has no data
df.head()

Another attempt:

import requests_html

sess = requests_html.HTMLSession()
res = sess.get(url)
page_html = res.html

# read_html returns a list with one DataFrame per <table> found
df = pd.read_html(page_html.raw_html)[0]
df  # This gives a DataFrame, but it has no values

(Screenshot of the result not available.)

Solution – 1

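The data you see on the page is embedded inside a <script> tag as JavaScript, so the HTML table in the raw page source has no values; the browser fills it in. You can use Selenium to render the page, or parse the data manually from the page source. I’m using the js2py module to evaluate the JavaScript and decode the data: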

import re
import js2py
import requests
import pandas as pd


url = "https://www.aamc.org/data-reports/workforce/interactive-data/active-physicians-largest-specialties-2019"
html_doc = requests.get(url).text

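# `$` and `.` are regex metacharacters, so escape them to match the literal
# JavaScript assignment `$scope.schools = [...];` in the page source
data = re.search(r"(?s)\$scope\.schools = (.*?);", html_doc).group(1)
# Evaluate the JavaScript array literal and strip whitespace from every value
data = [{k: v.strip() for k, v in d.items()} for d in js2py.eval_js(data)]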

columns = {
    "specialty": "Specialty",
    "one": "Total Active Physicians",
    "two": "Patient Care",
    "three": "Teaching",
    "four": "Research",
    "five": "Other",
}

df = pd.DataFrame(data).rename(columns=columns)
# to_markdown() requires the optional `tabulate` package
print(df[list(columns.values())].to_markdown(index=False))

Prints:

| Specialty | Total Active Physicians | Patient Care | Teaching | Research | Other |
|:--|:--|:--|:--|:--|:--|
| All Specialties | 938,980 | 816,922 | 12,475 | 12,632 | 96,951 |
| Allergy and Immunology | 4,900 | 4,221 | 54 | 268 | 357 |
| Anatomic/Clinical Pathology | 12,643 | 8,711 | 385 | 520 | 3,027 |
| Anesthesiology | 42,267 | 39,377 | 540 | 180 | 2,170 |
| Cardiovascular Disease | 22,521 | 20,430 | 299 | 573 | 1,219 |
| Child and Adolescent Psychiatry | 9,787 | 8,670 | 134 | 109 | 874 |
| Critical Care Medicine | 13,093 | 11,146 | 178 | 111 | 1,658 |
| Dermatology | 12,516 | 11,747 | 100 | 98 | 571 |
| Emergency Medicine | 45,202 | 41,466 | 469 | 94 | 3,173 |
| Endocrinology, Diabetes, and Metabolism | 7,994 | 6,439 | 155 | 533 | 867 |
| Family Medicine/General Practice | 118,198 | 108,984 | 1,614 | 251 | 7,349 |
| Gastroenterology | 15,469 | 14,007 | 186 | 289 | 987 |
| General Surgery | 25,564 | 21,949 | 259 | 137 | 3,219 |
| Geriatric Medicine | 5,974 | 5,029 | 105 | 106 | 734 |
| Hematology and Oncology | 16,274 | 13,506 | 250 | 871 | 1,647 |
| Infectious Disease | 9,687 | 7,448 | 287 | 701 | 1,251 |
| Internal Medicine | 120,171 | 105,736 | 1,409 | 1,447 | 11,579 |
| Internal Medicine/Pediatrics | 5,509 | 4,924 | 74 | 28 | 483 |
Interventional Cardiology4,4073,956226423
| Neonatal-Perinatal Medicine | 5,919 | 5,008 | 135 | 175 | 601 |
| Nephrology | 11,407 | 9,964 | 140 | 316 | 987 |
| Neurological Surgery | 5,748 | 5,246 | 52 | 32 | 418 |
| Neurology | 14,146 | 11,896 | 245 | 629 | 1,376 |
| Neuroradiology | 4,089 | 3,496 | 63 | 7 | 523 |
| Obstetrics and Gynecology | 42,720 | 39,825 | 499 | 195 | 2,201 |
| Ophthalmology | 19,312 | 17,859 | 147 | 126 | 1,180 |
| Orthopedic Surgery | 19,069 | 18,097 | 120 | 57 | 795 |
| Otolaryngology | 9,777 | 9,140 | 90 | 23 | 524 |
| Pain Medicine and Pain Management | 5,871 | 5,459 | 38 | 9 | 365 |
| Pediatric Anesthesiology (Anesthesiology) | 2,571 | 2,127 | 47 | 4 | 393 |
| Pediatric Cardiology | 2,966 | 2,414 | 74 | 64 | 414 |
| Pediatric Critical Care Medicine | 2,639 | 2,118 | 78 | 20 | 423 |
| Pediatric Hematology/Oncology | 3,079 | 2,251 | 77 | 210 | 541 |
| Pediatrics | 60,618 | 54,764 | 844 | 663 | 4,347 |
| Physical Medicine and Rehabilitation | 9,767 | 8,920 | 69 | 38 | 740 |
| Plastic Surgery | 7,317 | 6,938 | 55 | 20 | 304 |
| Preventive Medicine | 6,675 | 4,218 | 146 | 457 | 1,854 |
| Psychiatry | 38,792 | 33,776 | 562 | 735 | 3,719 |
| Pulmonary Disease | 5,106 | 4,490 | 138 | 296 | 182 |
Radiation Oncology5,3064,8545633363
| Radiology and Diagnostic Radiology | 28,025 | 24,748 | 423 | 153 | 2,701 |
| Rheumatology | 6,265 | 5,333 | 108 | 255 | 569 |
| Sports Medicine | 2,897 | 2,624 | 20 | 4 | 249 |
Sports Medicine (Orthopedic Surgery)2,9032,7379157
| Thoracic Surgery | 4,479 | 4,105 | 45 | 40 | 289 |
| Urology | 10,201 | 9,593 | 76 | 39 | 493 |
| Vascular and Interventional Radiology | 3,877 | 3,425 | 27 | 3 | 422 |
| Vascular Surgery | 3,943 | 3,586 | 48 | 13 | 296 |
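The columns of the scraped `df` hold strings such as "938,980", so they need converting before any arithmetic. Below is a minimal sketch that strips the thousands separators and calls `pd.to_numeric`. The helper `to_numeric_columns` is hypothetical, and the three sample rows are copied from the printed output above. The sketch also checks that, for those rows, Total Active Physicians equals Patient Care + Teaching + Research + Other.

```python
import pandas as pd

# Sample data: three rows copied from the printed output above,
# still in their scraped string form (thousands separators included).
df = pd.DataFrame(
    {
        "Specialty": ["All Specialties", "Allergy and Immunology", "Internal Medicine"],
        "Total Active Physicians": ["938,980", "4,900", "120,171"],
        "Patient Care": ["816,922", "4,221", "105,736"],
        "Teaching": ["12,475", "54", "1,409"],
        "Research": ["12,632", "268", "1,447"],
        "Other": ["96,951", "357", "11,579"],
    }
)

NUMERIC_COLUMNS = ["Total Active Physicians", "Patient Care", "Teaching", "Research", "Other"]


def to_numeric_columns(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with `columns` made numeric.

    Thousands separators are removed first; empty or non-numeric cells
    become NaN (errors="coerce") instead of raising.
    """
    out = frame.copy()
    for col in columns:
        cleaned = out[col].str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


numeric = to_numeric_columns(df, NUMERIC_COLUMNS)
# Sum the four activity columns row by row and compare with the total
parts = numeric[["Patient Care", "Teaching", "Research", "Other"]].sum(axis=1)
print((parts == numeric["Total Active Physicians"]).all())
```

On the frame built by the js2py solution, `to_numeric_columns(df, list(columns.values())[1:])` would convert the five count columns. Any empty cell becomes NaN, which turns that column into floats.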
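Since the question asks about requests-html specifically: that library can also execute a page's JavaScript itself through `HTML.render()`. This method runs the page in headless Chromium, downloaded via pyppeteer on first use, and replaces the content of `res.html` with the rendered DOM. Below is a minimal sketch. It assumes that the AAMC table is fully populated after rendering; the js2py approach above does not depend on that assumption. The helper name `fetch_rendered_tables` and the 2-second `sleep` are hypothetical choices.

```python
from io import StringIO

import pandas as pd


def fetch_rendered_tables(url: str, sleep: float = 2.0) -> list[pd.DataFrame]:
    """Hypothetical helper: fetch `url`, run its JavaScript, parse every <table>.

    Assumes the table is filled in client-side and is present in the DOM
    once rendering has finished.
    """
    # Imported lazily so this module loads even without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        res = session.get(url, headers={"User-Agent": "Mozilla/5.0"})
        # render() executes the page's scripts in headless Chromium;
        # `sleep` waits after loading so the scripts can populate the table.
        res.html.render(sleep=sleep)
        # After render(), res.html.html holds the rendered markup.
        return pd.read_html(StringIO(res.html.html))
    finally:
        session.close()
```

Usage would be `dfs = fetch_rendered_tables(url)` followed by `dfs[0]`. Inside a Jupyter notebook, `render()` conflicts with the already-running event loop. For that case, the requests-html documentation points to `AsyncHTMLSession` with `await res.html.arender()`.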