How to Extract Data (Title, Value, Text) From Multiple div, ul, li, and a Tags From a Given URL: Python Code
To extract the text inside the <a> tags of the HTML content below, together with the corresponding age value, you can parse the markup with a Python parsing library. Here's how you can do it using the BeautifulSoup library (installed with pip install beautifulsoup4):
from bs4 import BeautifulSoup
# HTML content string (replace this with the actual HTML content you have)
html_content = """
<div class="div-col" style="column-width: 25em;">
<ul><li><a href="/wiki/Beverly_Aadland" title="Beverly Aadland">Beverly Aadland</a> 1942–2010<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup></li>
<li><a href="/wiki/Mariann_Aalda" title="Mariann Aalda">Mariann Aalda</a> born <span style="display:none"> (<span class="bday">1948-05-07</span>) </span>May 7, 1948<span class="noprint ForceAgeToShow"> (age 76)</span><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></li>
<li><a href="/wiki/Caroline_Aaron" title="Caroline Aaron">Caroline Aaron</a> born <span style="display:none"> (<span class="bday">1952-08-07</span>) </span>August 7, 1952<span class="noprint ForceAgeToShow"> (age 71)</span><sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></li>
<!-- more list items -->
</ul>
</div>
"""
# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Find the <a> tags that are direct children of <li> tags
# (this skips the citation links nested inside <sup>)
a_tags = soup.select('li > a')

# Iterate over each <a> tag and extract text
for a_tag in a_tags:
    # Extract the text within the <a> tag
    name = a_tag.text.strip()
    # Find the following sibling <span> that contains the age information
    age_sibling = a_tag.find_next_sibling('span', class_='noprint')
    if age_sibling:
        # Extract the age value, e.g. "(age 76)" becomes "76"
        age_value = age_sibling.text.strip().replace('(age ', '').replace(')', '')
        print(name, age_value)
This script parses the HTML content, extracts the text of each <a> tag (the person's name), and looks for the corresponding age value. It prints the name and age for every person who has an age span; entries without one, such as Beverly Aadland, are skipped. Replace the html_content variable with the actual HTML content you want to scrape.
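The title also mentions the title attribute and loading the page from a URL, so here is a minimal sketch of how both could be handled. The helper names fetch_html and extract_people, the User-Agent header, and the div.div-col selector are assumptions made for illustration; they are taken from the sample markup above rather than from any particular site, so adjust them to the page you are scraping.

```python
import re
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Assumption: the site accepts a plain browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_people(html):
    """Return one dict per <li> that has an <a> tag as a direct child."""
    soup = BeautifulSoup(html, "html.parser")
    people = []
    for item in soup.select("div.div-col li"):
        # Only look at direct children, so citation links in <sup> are ignored.
        link = item.find("a", recursive=False)
        if link is None:
            continue
        bday = item.find("span", class_="bday")
        age_span = item.find("span", class_="ForceAgeToShow")
        age_match = re.search(r"\d+", age_span.get_text()) if age_span else None
        people.append({
            "name": link.get_text(strip=True),
            "title": link.get("title"),
            "href": link.get("href"),
            "born": bday.get_text(strip=True) if bday else None,
            "age": int(age_match.group()) if age_match else None,
        })
    return people


# Small sample in the same shape as the HTML shown earlier.
sample = """
<div class="div-col"><ul>
<li><a href="/wiki/Beverly_Aadland" title="Beverly Aadland">Beverly Aadland</a> 1942–2010<sup class="reference"><a href="#cite_note-1">[1]</a></sup></li>
<li><a href="/wiki/Mariann_Aalda" title="Mariann Aalda">Mariann Aalda</a> born <span style="display:none"> (<span class="bday">1948-05-07</span>) </span>May 7, 1948<span class="noprint ForceAgeToShow"> (age 76)</span></li>
</ul></div>
"""

for person in extract_people(sample):
    print(person)
```

To run this against a live page, pass the downloaded text to the parser, for example extract_people(fetch_html(your_url)). Returning dictionaries instead of printing keeps people without an age in the result (with age set to None), which is usually easier to write to CSV or JSON afterwards.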