Code Preview

Python

#!/usr/bin/env python3
"""
Email Extractor for HTML Files
Extract emails from HTML using BeautifulSoup4
"""
from bs4 import BeautifulSoup
import re
import requests

EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def extract_from_html(html_content):
    """Extract emails from HTML content"""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove script and style tags
    for tag in soup(['script', 'style']):
        tag.decompose()

    # Join text nodes with a space so adjacent elements don't fuse into one match
    text = soup.get_text(separator=' ')
    emails = set(re.findall(EMAIL_PATTERN, text))

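    # Also check mailto: links (the scheme is case-insensitive)
    for link in soup.find_all('a', href=True):
        href = link['href'].strip()
        if href.lower().startswith('mailto:'):
            # Drop any ?subject=... query, then split comma-separated recipients
            for email in href[7:].split('?')[0].split(','):
                if email.strip():
                    emails.add(email.strip())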

    return list(emails)

def extract_from_url(url):
    """Extract emails from a web page"""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Fail on 4xx/5xx instead of parsing an error page
    return extract_from_html(response.text)

def extract_from_file(filepath):
    """Extract emails from HTML file"""
    with open(filepath, 'r', encoding='utf-8') as f:
        return extract_from_html(f.read())

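if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2:
        sys.exit(f'Usage: {sys.argv[0]} <file.html | URL>')
    source = sys.argv[1]
    # Treat http(s) arguments as URLs, anything else as a local file
    if source.startswith(('http://', 'https://')):
        emails = extract_from_url(source)
    else:
        emails = extract_from_file(source)
    for email in sorted(emails):
        print(email)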

About This Tool

The Email Extractor for HTML Files uses BeautifulSoup4 to parse HTML and collects email addresses from both the visible page text and mailto: links. It is suited to pulling contact information from web pages or saved HTML files.
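To see what the text-matching step does on its own, here is a minimal sketch that applies the script's EMAIL_PATTERN to a plain string. The find_emails helper and the sample text are hypothetical, and comparing addresses case-insensitively is an assumption of this sketch; the script itself deduplicates with a case-sensitive set.

```python
import re

# Same pattern as the script above
EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def find_emails(text):
    """Return unique addresses in first-seen order (hypothetical helper).

    Assumption: addresses that differ only by letter case are duplicates.
    """
    seen = set()
    unique = []
    for match in re.findall(EMAIL_PATTERN, text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique

sample = "Write to Sales@Example.com or sales@example.com. Support: help@example.org."
print(find_emails(sample))  # → ['Sales@Example.com', 'help@example.org']
```

The trailing full stops are not captured, because the pattern requires the address to end in at least two letters after the final dot.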

Key Features

BeautifulSoup4 Parser - Robust HTML parsing handles malformed markup
Mailto Link Detection - Extracts emails from href="mailto:" links
Script/Style Removal - Cleans content before extraction
URL Support - Fetch and extract from live web pages
Deduplication - Automatically removes duplicate emails
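Mailto links can carry more than one recipient, a query string, or percent-encoded characters. The sketch below shows one way to handle those cases with the standard library only; emails_from_mailto is a hypothetical helper, not part of the script, and it reads only the address part of the link (not to=, cc= or bcc= fields in the query).

```python
from urllib.parse import unquote, urlsplit

def emails_from_mailto(href):
    """Return the addresses in the path of a mailto: URL (hypothetical helper)."""
    parts = urlsplit(href.strip())
    # urlsplit lowercases the scheme, so MAILTO: and mailto: are treated alike
    if parts.scheme != 'mailto':
        return []
    # parts.path is everything before '?'; recipients are comma-separated
    return [unquote(addr).strip() for addr in parts.path.split(',') if addr.strip()]

print(emails_from_mailto('mailto:sales@example.com?subject=Hello'))
print(emails_from_mailto('MAILTO:a@example.com,b@example.com'))
print(emails_from_mailto('mailto:jane%2Bnews@example.com'))
```

Splitting on commas before decoding keeps a percent-encoded comma inside an address from being mistaken for a recipient separator.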

Supported Sources

Local HTML files (.html, .htm)
Live web page URLs
HTML strings/content

Requirements

Python 3.7+
BeautifulSoup4 (pip install beautifulsoup4)
Requests (pip install requests)

Tip: Use this tool responsibly. Respect robots.txt and rate limits when scraping websites.
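As a rough illustration of the tip above, the standard library's urllib.robotparser can check whether a URL may be fetched and whether the site requests a delay between requests. The robots.txt content, the user-agent name, and the build_parser helper are all assumptions made for this sketch; a real run would load the file from the target site instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would fetch it from the site
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
agent = "email-extractor-demo"  # assumed user-agent name

for url in ("https://example.com/contact", "https://example.com/private/staff"):
    print(url, parser.can_fetch(agent, url))

# Seconds to wait between requests, or None if the site sets no Crawl-delay
delay = parser.crawl_delay(agent)
print("crawl delay:", delay)
```

A scraper built on extract_from_url could call can_fetch before each request and sleep for the reported delay between requests.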
