Python
Free Download
Cross-platform
Code Preview
#!/usr/bin/env python3
"""
Email Extractor for Local Folders
Recursively scans directories and extracts emails
"""
import os
import re
import sys
from pathlib import Path

# The TLD class is [A-Za-z]; a "|" inside brackets would match a literal pipe.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def scan_folder_recursive(root_path, file_types=None):
    """Recursively scan folder for emails"""
    emails = set()
    if file_types is None:
        file_types = {'.txt', '.html', '.csv', '.log', '.xml'}
    for root, dirs, files in os.walk(root_path):
        # Prune hidden directories in place so os.walk does not descend into them
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for filename in files:
            if Path(filename).suffix.lower() not in file_types:
                continue
            file_path = os.path.join(root, filename)
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                found = EMAIL_PATTERN.findall(content)
                # Lowercase before adding so the set deduplicates case variants
                emails.update(e.lower() for e in found)
            except (UnicodeDecodeError, OSError):
                # Skip files that are unreadable or not valid UTF-8
                continue
    return sorted(emails)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(f"Usage: {sys.argv[0]} <folder>")
    emails = scan_folder_recursive(sys.argv[1])
    print(f"Found {len(emails)} unique emails")
About This Tool
The Email Extractor for Folders recursively scans directory trees and extracts email addresses from every supported file type it finds. It is suited to legacy migrations, compliance audits, and data recovery projects.
Key Features
- Recursive Scanning - Unlimited folder depth traversal
- Multi-Format Support - TXT, HTML, CSV, LOG, PDF, XML files
- Include/Exclude Patterns - Glob patterns for filtering
- Real-Time Progress - Progress bar with ETA
- Duplicate Removal - Automatic deduplication
- Resume Support - Checkpoint for interrupted scans
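The include/exclude filtering listed above could be sketched roughly as follows. This is a minimal illustration using the standard-library fnmatch module, not the tool's actual implementation; the function name iter_matching_files and its include and exclude parameters are hypothetical.

```python
import fnmatch
import os


def iter_matching_files(root_path, include=None, exclude=None):
    """Yield file paths under root_path that pass the glob filters.

    include and exclude are hypothetical parameters: lists of glob
    patterns matched against each path relative to root_path.
    """
    include = include or ["*"]
    exclude = exclude or []
    for root, dirs, files in os.walk(root_path):
        # Skip hidden directories, as the preview script does.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for filename in files:
            full_path = os.path.join(root, filename)
            # Use forward slashes so patterns read the same on every platform.
            rel_path = os.path.relpath(full_path, root_path).replace(os.sep, "/")
            if not any(fnmatch.fnmatch(rel_path, p) for p in include):
                continue
            if any(fnmatch.fnmatch(rel_path, p) for p in exclude):
                continue
            yield full_path
```

With fnmatch, `*` also matches path separators, so a pattern such as `*.txt` matches text files at any depth, while `archive/*` excludes everything under a top-level archive folder.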
Supported File Types
- Text files - .txt, .log, .md, .readme
- Markup - .html, .htm, .xml
- Data - .csv, .tsv, .json
- Documents - .pdf (requires pdfplumber)
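As a rough sketch of how optional PDF support might sit alongside plain-text files, the helper below imports pdfplumber only when a .pdf file is encountered. The names read_text and extract_emails are hypothetical, and returning an empty string when pdfplumber is missing is an assumption rather than the tool's documented behavior.

```python
import re
from pathlib import Path

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def read_text(path):
    """Return the text of a file, using pdfplumber only for .pdf files."""
    path = Path(path)
    if path.suffix.lower() == ".pdf":
        try:
            import pdfplumber  # optional dependency
        except ImportError:
            # Assumption: skip PDFs quietly when pdfplumber is not installed.
            return ""
        with pdfplumber.open(str(path)) as pdf:
            # extract_text() can return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    # Replace undecodable bytes instead of failing on mixed encodings.
    return path.read_text(encoding="utf-8", errors="replace")


def extract_emails(path):
    """Return the sorted, lowercased, deduplicated addresses in one file."""
    return sorted({m.lower() for m in EMAIL_RE.findall(read_text(path))})
```

Importing pdfplumber inside the function keeps the basic text formats free of external dependencies, which matches the requirements listed for this tool.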
Performance
- Speed - 100-500 files per minute
- Memory - ~50MB constant usage
- Depth - Unlimited folder levels
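The preview script reads each file fully into memory, so one way to keep memory roughly flat on large files is to scan line by line. The sketch below assumes addresses never span a line break; iter_emails is a hypothetical helper name, not part of the downloadable script.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def iter_emails(file_path):
    """Yield lowercased addresses from one file, reading a line at a time."""
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:  # only the current line is held in memory
            for match in EMAIL_RE.finditer(line):
                yield match.group(0).lower()
```

Collecting the results with `set(iter_emails(path))` still grows with the number of unique addresses, so memory is bounded by the longest line plus the result set rather than by file size.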
Requirements
- Python 3.6+
- No external dependencies for basic use
- pdfplumber for PDF support (optional)
Pro Tip: For very large folders (100K+ files), use --checkpoint 5000 to save progress every 5000 files, so an interrupted scan can resume from the last checkpoint instead of starting over.
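To illustrate what checkpointing every N files might look like, here is a minimal sketch that stores progress in a JSON file. The JSON layout, the function names, and the extract callback are all assumptions made for illustration; the tool's actual checkpoint format is not described on this page.

```python
import json
import os


def load_checkpoint(checkpoint_path):
    """Return (files_processed, emails) from a previous run, if any."""
    if not os.path.exists(checkpoint_path):
        return 0, set()
    with open(checkpoint_path, "r", encoding="utf-8") as f:
        state = json.load(f)
    return state["processed"], set(state["emails"])


def save_checkpoint(checkpoint_path, processed, emails):
    """Write progress to a temp file, then swap it in with one rename."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"processed": processed, "emails": sorted(emails)}, f)
    os.replace(tmp_path, checkpoint_path)


def scan_with_checkpoint(paths, extract, checkpoint_path, every=5000):
    """Apply extract(path) to each path, saving progress every `every` files.

    Paths are sorted so that a resumed run sees them in the same order
    and can skip the ones already counted in the checkpoint.
    """
    paths = sorted(paths)
    processed, emails = load_checkpoint(checkpoint_path)
    for index in range(processed, len(paths)):
        emails.update(extract(paths[index]))
        processed = index + 1
        if processed % every == 0:
            save_checkpoint(checkpoint_path, processed, emails)
    save_checkpoint(checkpoint_path, processed, emails)
    return sorted(emails)
```

A smaller interval loses less work after an interruption but writes the checkpoint file more often, so the interval is a trade-off between the two.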