
Python · Free download · Cross-platform

Code Preview

Python
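#!/usr/bin/env python3
"""
Email Extractor for HTML Files
Extract emails from HTML using BeautifulSoup4
"""
import re
import sys
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup

EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'


def extract_from_html(html_content):
    """Extract emails from HTML content"""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove script and style tags so their code is not scanned
    for tag in soup(['script', 'style']):
        tag.decompose()

    # Join text nodes with a space; with the default empty separator,
    # "<p>Call us</p><p>info@example.com</p>" would match "usinfo@example.com"
    text = soup.get_text(separator=' ')
    emails = set(re.findall(EMAIL_PATTERN, text))

    # Also check href attributes of mailto: links (the scheme is case-insensitive)
    for link in soup.find_all('a', href=True):
        href = link['href'].strip()
        if href.lower().startswith('mailto:'):
            # Drop any ?subject=... query; one link may list several recipients
            for email in href[len('mailto:'):].split('?')[0].split(','):
                email = unquote(email).strip()  # decode %40 and similar escapes
                if email:
                    emails.add(email)

    return list(emails)


def extract_from_url(url):
    """Extract emails from a web page"""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return extract_from_html(response.text)


def extract_from_file(filepath):
    """Extract emails from HTML file"""
    # errors='replace' keeps non-UTF-8 files from raising UnicodeDecodeError
    with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
        return extract_from_html(f.read())


if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit(f'Usage: {sys.argv[0]} <file.html | URL>')
    source = sys.argv[1]
    if source.startswith(('http://', 'https://')):
        emails = extract_from_url(source)
    else:
        emails = extract_from_file(source)
    for email in sorted(emails):
        print(email)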

About This Tool

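The Email Extractor for HTML Files uses BeautifulSoup4 to parse HTML content. It extracts email addresses from two places: the page text (with <script> and <style> contents excluded) and mailto: links. It is useful for collecting contact information from web pages or saved HTML files. Pages are fetched with requests, which does not execute JavaScript. Addresses inserted by scripts are therefore not found, and neither are obfuscated forms such as "name [at] example.com".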
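To see what EMAIL_PATTERN does and does not catch, the following standard-library snippet runs the same pattern on plain strings. The sample text is made up for illustration. The last two calls show why the text separator matters. BeautifulSoup's get_text() joins adjacent elements with an empty string by default, so `<p>Call us</p><p>info@example.com</p>` becomes "Call usinfo@example.com".

```python
import re

# Same pattern as EMAIL_PATTERN in the script.
EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

sample = (
    "Write to alice@example.com. "
    "Sales: bob.smith+sales@mail.example.co.uk, "
    "or carol [at] example [dot] com"
)
found = re.findall(EMAIL_PATTERN, sample)
print(found)  # ['alice@example.com', 'bob.smith+sales@mail.example.co.uk']

# Text of <p>Call us</p><p>info@example.com</p> joined with no separator:
merged = re.findall(EMAIL_PATTERN, "Call us" + "info@example.com")
print(merged)  # ['usinfo@example.com']

# The same text joined with a space, as get_text(separator=' ') would produce:
separated = re.findall(EMAIL_PATTERN, "Call us" + " " + "info@example.com")
print(separated)  # ['info@example.com']
```

The trailing period after "alice@example.com" is not captured. The regex backtracks until the domain ends in a dot followed by at least two letters.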

Key Features

  • BeautifulSoup4 Parser - Robust HTML parsing handles malformed markup
  • Mailto Link Detection - Extracts emails from href="mailto:" links
  • Script/Style Removal - Strips <script> and <style> contents so JavaScript and CSS are not scanned
  • URL Support - Fetch and extract from live web pages
  • Deduplication - Removes exact duplicates (the comparison is case-sensitive)
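The mailto: handling and deduplication can be pushed a little further. The following is a minimal sketch of two hypothetical helpers; parse_mailto and dedupe are not part of the script. The first accepts a mixed-case MAILTO: scheme, comma-separated recipients and percent-encoded characters, all of which RFC 6068 permits in mailto URIs. The second compares the domain case-insensitively. Keeping the local part exactly as written is an assumption on the conservative side, since RFC 5321 lets mail servers treat it as case-sensitive.

```python
from urllib.parse import unquote


def parse_mailto(href):
    """Return the addresses in a mailto: link, or [] for any other href.

    Accepts a mixed-case scheme (MAILTO:), drops the ?subject=... query,
    splits comma-separated recipients and decodes %-escapes such as %40.
    """
    href = href.strip()
    if not href.lower().startswith("mailto:"):
        return []
    recipients = href[len("mailto:"):].split("?", 1)[0]
    return [unquote(part).strip() for part in recipients.split(",") if part.strip()]


def dedupe(emails):
    """Remove duplicates, comparing only the domain case-insensitively.

    The local part is compared as written; the first spelling seen is kept.
    """
    first_seen = {}
    for email in emails:
        local, _, domain = email.rpartition("@")
        first_seen.setdefault(f"{local}@{domain.lower()}", email)
    return list(first_seen.values())


print(parse_mailto("MAILTO:alice@example.com?subject=Hello"))  # ['alice@example.com']
print(parse_mailto("mailto:a@example.org,b%40example.net"))     # ['a@example.org', 'b@example.net']
print(dedupe(["Bob@Example.com", "Bob@example.COM", "bob@example.com"]))
# ['Bob@Example.com', 'bob@example.com']
```

dedupe relies on dicts preserving insertion order (guaranteed since Python 3.7), so the output follows the order in which addresses were found.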

Supported Sources

  • Local HTML files (.html, .htm)
  • Live web page URLs
  • HTML strings/content
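These three sources map onto extract_from_url, extract_from_file and extract_from_html. A single entry point could route between them with a small helper like the hypothetical classify_source sketched here. Its rules are assumptions: anything with an http or https scheme is a URL, anything ending in .html or .htm (any case) is a local file, and everything else is treated as raw markup.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Guess which extractor applies: 'url', 'file', or 'html' (raw markup)."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if Path(source).suffix.lower() in (".html", ".htm"):
        return "file"
    return "html"


print(classify_source("https://example.com/contact"))  # url
print(classify_source("saved/Contact.HTM"))            # file
print(classify_source("<p>hi@example.com</p>"))        # html
```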

Requirements

  • Python 3.7+
  • BeautifulSoup4 (pip install beautifulsoup4)
  • Requests (pip install requests)
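The pip package name and the import name differ for BeautifulSoup4: it is installed as beautifulsoup4 but imported as bs4. The following is a minimal startup check that reports what is missing without importing the packages. It is a sketch; python_version_ok and missing_packages are hypothetical helpers, not part of the script.

```python
import importlib.util
import sys

# pip package name -> name used in `import` (they differ for BeautifulSoup4)
REQUIRED = {"beautifulsoup4": "bs4", "requests": "requests"}


def python_version_ok(minimum=(3, 7)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_packages(required=REQUIRED):
    """Return pip names whose top-level module cannot be located (nothing is imported)."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Calling missing_packages() at startup and printing `pip install` followed by the returned names gives a more actionable message than an ImportError traceback.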

Tip: Use this tool responsibly. Respect robots.txt and rate limits when scraping websites.
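One way to follow that advice with only the standard library is to combine urllib.robotparser with a minimum delay between requests. This sketch passes robots.txt lines straight to RobotFileParser.parse() so that it runs offline. Against a live site you would call set_url('https://example.com/robots.txt') and read() instead. The user-agent string, the sample robots.txt and the 1-second fallback delay are assumptions made for illustration.

```python
import time
import urllib.robotparser

# Hypothetical user-agent string; use one that identifies your crawler.
USER_AGENT = "email-extractor-example"


def check_robots(robots_lines, urls, user_agent=USER_AGENT):
    """Return (URLs robots.txt permits for user_agent, Crawl-delay in seconds or None)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # offline; use set_url() + read() for a live site
    permitted = [url for url in urls if parser.can_fetch(user_agent, url)]
    return permitted, parser.crawl_delay(user_agent)


class RateLimiter:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval):
        self.interval = interval
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


robots = ["User-agent: *", "Disallow: /private/", "Crawl-delay: 2"]
pages = ["https://example.com/contact", "https://example.com/private/staff"]
permitted, delay = check_robots(robots, pages)
print(permitted, delay)  # ['https://example.com/contact'] 2

limiter = RateLimiter(delay or 1.0)  # assumed 1-second fallback when no Crawl-delay
```

In a crawl loop, call limiter.wait() before each extract_from_url(url) for the URLs in permitted.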
