How to Extract Title and Text Values From <a> Tags Inside Nested Div, Ul, and Li Elements in Python

To extract the text of each <a> tag together with the corresponding age value from HTML like the sample below, you can parse the markup in Python with the BeautifulSoup library (installed as the beautifulsoup4 package):

from bs4 import BeautifulSoup

# HTML content string (replace this with the actual HTML content you have)
html_content = """
<div class="div-col" style="column-width: 25em;">
<ul><li><a href="/wiki/Beverly_Aadland" title="Beverly Aadland">Beverly Aadland</a> 1942–2010<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup></li>
<li><a href="/wiki/Mariann_Aalda" title="Mariann Aalda">Mariann Aalda</a> born <span style="display:none"> (<span class="bday">1948-05-07</span>) </span>May 7, 1948<span class="noprint ForceAgeToShow"> (age&nbsp;76)</span><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></li>
<li><a href="/wiki/Caroline_Aaron" title="Caroline Aaron">Caroline Aaron</a> born <span style="display:none"> (<span class="bday">1952-08-07</span>) </span>August 7, 1952<span class="noprint ForceAgeToShow"> (age&nbsp;71)</span><sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></li>
<!-- more list items -->
</ul>
</div>
"""


# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Take the first <a> tag in each <li>; this skips the citation links inside <sup>
a_tags = [li.find('a') for li in soup.find_all('li')]

# Iterate over each <a> tag and extract text
for a_tag in a_tags:
    if a_tag is None:
        continue
    # Extract the text within <a> tag
    name = a_tag.text.strip()
    # Find the following sibling <span> that contains the age information
    age_sibling = a_tag.find_next_sibling('span', class_='noprint')
    if age_sibling:
        # The parser decodes &nbsp; to '\xa0', so remove that rather than the literal entity
        age_value = age_sibling.text.strip().replace('(age\xa0', '').replace(')', '')
        print(name, age_value)


This script parses the HTML content, extracts the text of each <a> tag (the person's name), and looks for the sibling <span> that holds the age. It prints the name and age only for entries where that span exists, so entries listed with birth and death years alone are skipped. Replace the html_content variable with the actual HTML you want to scrape.
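The script above works on an HTML string, while the title asks about a given URL. A minimal sketch of that step follows, assuming the page is UTF-8 and uses the same div-col list markup as the sample. The helper names fetch_html and extract_people and the User-Agent value are illustrative assumptions, not part of the original script.

```python
import re
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    # Some sites reject requests without a User-Agent; this value is a placeholder.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_people(html):
    """Return one dict per <li>: link text, title attribute, and age (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    people = []
    for li in soup.select("div.div-col li"):
        a_tag = li.find("a")  # first link in the item; citation links come later
        if a_tag is None:
            continue
        # Assumes the age is in a <span> whose classes include ForceAgeToShow
        age_span = li.find("span", class_="ForceAgeToShow")
        age = None
        if age_span:
            match = re.search(r"\d+", age_span.get_text())
            age = int(match.group()) if match else None
        people.append({
            "name": a_tag.get_text(strip=True),
            "title": a_tag.get("title"),
            "age": age,
        })
    return people
```

To use it, pass the address of the page you want to scrape: people = extract_people(fetch_html(url)). Unlike the script above, this version keeps entries that have no age and records None for them, so you can decide later whether to filter them out.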

