Python
Free Download
Cross-platform
Code Preview
#!/usr/bin/env python3
"""
Email Extractor for HTML Files
Extract emails from HTML using BeautifulSoup4
"""
import re
import sys
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup

EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'


def extract_from_html(html_content):
    """Extract emails from HTML content"""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove script and style tags so their contents are not scanned
    for tag in soup(['script', 'style']):
        tag.decompose()

    text = soup.get_text()
    emails = set(re.findall(EMAIL_PATTERN, text))

    # Also check href attributes for mailto: links
    for link in soup.find_all('a', href=True):
        href = link['href'].strip()
        if href.lower().startswith('mailto:'):
            # Drop the query string (?subject=...) before reading recipients
            recipients = href[7:].split('?')[0]
            # A mailto: link may list several comma-separated addresses
            for part in recipients.split(','):
                email = unquote(part).strip()  # decode %-escapes such as %40
                if email:  # skip empty entries, e.g. a bare "mailto:"
                    emails.add(email)

    return list(emails)


def extract_from_url(url):
    """Extract emails from a web page"""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return extract_from_html(response.text)


def extract_from_file(filepath):
    """Extract emails from HTML file"""
    with open(filepath, 'r', encoding='utf-8') as f:
        return extract_from_html(f.read())


if __name__ == '__main__':
    if len(sys.argv) > 1:
        emails = extract_from_file(sys.argv[1])
        for email in sorted(emails):
            print(email)
    else:
        sys.exit(f'Usage: {sys.argv[0]} FILE.html')
About This Tool
The Email Extractor for HTML Files parses HTML with BeautifulSoup4 and collects the email addresses it finds in the page text and in mailto: links. It is suited to pulling contact information from web pages or from saved HTML files.
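To make the matching rule concrete, here is a minimal sketch that applies the script's EMAIL_PATTERN to a plain string. The sample text is made up for illustration. It suggests that the pattern picks up ordinary addresses and that a set removes repeats, but that obfuscated forms such as "info [at] example.net" are not recognized.

```python
import re

# Same pattern as in the script above
EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Illustrative sample text (not taken from a real page)
sample = (
    "Contact sales@example.com or Support@Example.org. "
    "Obfuscated: info [at] example.net. "
    "Duplicate: sales@example.com"
)

# A set removes the repeated address; sorted() gives a stable order
found = sorted(set(re.findall(EMAIL_PATTERN, sample)))
print(found)
```

Note that the comparison is case-sensitive, so "Sales@example.com" and "sales@example.com" would be kept as two separate entries.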
Key Features
- BeautifulSoup4 Parser - Robust HTML parsing handles malformed markup
- Mailto Link Detection - Extracts emails from href="mailto:" links
- Script/Style Removal - Drops script and style content before extraction
- URL Support - Fetch and extract from live web pages with extract_from_url()
- Deduplication - Automatically removes duplicate emails
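The mailto feature can be sketched in isolation, without any HTML parsing. The helper below, emails_from_mailto, is a hypothetical name introduced for illustration and built on urllib.parse from the standard library. It assumes a link may carry a query string, percent-encoded characters, and several comma-separated recipients.

```python
from urllib.parse import unquote


def emails_from_mailto(href):
    """Return the recipient addresses in a mailto: URL (hypothetical helper)."""
    href = href.strip()
    if not href.lower().startswith('mailto:'):
        return []
    # Keep only the part before the query string (?subject=..., ?cc=...)
    recipients = href[len('mailto:'):].split('?', 1)[0]
    addresses = []
    # Split on commas first, then decode %-escapes such as %40 for "@"
    for part in recipients.split(','):
        address = unquote(part).strip()
        if address:
            addresses.append(address)
    return addresses


print(emails_from_mailto('mailto:jane@example.com?subject=Hello'))
print(emails_from_mailto('mailto:a@example.com,%20b%40example.org'))
print(emails_from_mailto('mailto:'))
```

Splitting before decoding keeps an encoded comma inside an address from being treated as a separator.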
Supported Sources
- Local HTML files (.html, .htm)
- Live web page URLs
- HTML strings/content
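The three source types could be routed to the matching function with a small classifier. The helper below, classify_source, is a hypothetical addition and not part of the script. It assumes that URLs start with http:// or https:// and that anything ending in .html or .htm names a local file; everything else is treated as raw HTML.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Guess whether source is a URL, an HTML file path, or raw HTML.

    Hypothetical helper: the original script does not include it.
    """
    if urlparse(source).scheme in ('http', 'https'):
        return 'url'    # would be passed to extract_from_url
    if source.lower().endswith(('.html', '.htm')):
        return 'file'   # would be passed to extract_from_file
    return 'html'       # would be passed to extract_from_html


for s in ('https://example.com/contact', 'saved/page.HTM', '<p>hi@example.com</p>'):
    print(classify_source(s))
```

This is a heuristic: an HTML string that happens to end in ".html" would be misread as a file path, so calling the specific function directly is the safer choice when the source type is known.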
Requirements
- Python 3.7+
- BeautifulSoup4 (pip install beautifulsoup4)
- Requests (pip install requests)
Tip: Use this tool responsibly. Respect robots.txt and rate limits when scraping websites.
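One way to act on this tip is to check robots.txt before fetching and to pause between requests. The sketch below uses urllib.robotparser from the standard library. The robots.txt rules are supplied inline so the example runs offline, and the user agent name, the URLs, and the one-second delay are assumptions made for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = 'email-extractor-demo'   # assumed name, not defined by the script
DELAY_SECONDS = 1.0                   # assumed polite delay between requests

# Inline rules for an offline demo; for a real site you would instead call
# parser.set_url('https://example.com/robots.txt') followed by parser.read().
parser = RobotFileParser()
parser.parse([
    'User-agent: *',
    'Disallow: /private/',
])

urls = [
    'https://example.com/contact',
    'https://example.com/private/staff',
]

# Keep only the URLs that the rules allow this user agent to fetch
allowed = [u for u in urls if parser.can_fetch(USER_AGENT, u)]
print(allowed)

for i, url in enumerate(allowed):
    if i > 0:
        time.sleep(DELAY_SECONDS)   # wait before every request after the first
    # extract_from_url(url) would be called here
    print('would fetch', url)
```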