Website Reverse Engineering Tools
This collection contains tools to help you test how easily your website can be reverse engineered by extracting HTML, JavaScript, CSS, and other assets.
Tools Included
1. Simple Scraper (Shell Script) - simple_scraper.sh
A basic shell script using curl that works on any Unix-like system without additional dependencies.
Usage:
./simple_scraper.sh https://example.com [output_directory]
Features:
- Downloads main HTML
- Extracts and downloads external JavaScript files
- Extracts and downloads CSS files
- Attempts to extract inline JavaScript
- Generates basic report
2. Basic Python Scraper - website_scraper.py
A more robust Python script using requests and BeautifulSoup.
Setup:
pip install -r requirements.txt
Usage:
python website_scraper.py
Features:
- Interactive prompts for URL and output directory
- Better HTML parsing
- Extracts inline JavaScript blocks
- Handles relative URLs properly
- More robust error handling
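The asset-extraction step described above can be sketched with the standard library alone. This is an illustrative sketch, not the code in website_scraper.py (which uses requests and BeautifulSoup); the names `AssetCollector` and `collect_assets` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collects external script/stylesheet URLs and inline script bodies."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.scripts = []
        self.stylesheets = []
        self.inline_scripts = []
        self._in_inline_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            if attrs.get("src"):
                # Resolve relative paths against the page URL.
                self.scripts.append(urljoin(self.base_url, attrs["src"]))
            else:
                self._in_inline_script = True
                self.inline_scripts.append("")
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel and attrs.get("href"):
                self.stylesheets.append(urljoin(self.base_url, attrs["href"]))

    def handle_data(self, data):
        if self._in_inline_script:
            self.inline_scripts[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline_script = False


def collect_assets(html, base_url):
    """Return the asset URLs and inline scripts found in an HTML string."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return {
        "scripts": parser.scripts,
        "stylesheets": parser.stylesheets,
        "inline_scripts": [s.strip() for s in parser.inline_scripts if s.strip()],
    }
```

Resolving every `src` and `href` with `urljoin` is what lets relative paths such as `js/main.js` and root-relative paths such as `/css/styles.css` both end up as downloadable absolute URLs.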
3. Advanced Python Scraper - advanced_scraper.py
A comprehensive reverse engineering tool that analyzes the website in depth.
Setup:
pip install -r requirements.txt
Usage:
python advanced_scraper.py
Features:
- 🔍 Technology Detection: Identifies React, Vue, Angular, Next.js, etc.
- 📦 Bundle Analysis: Analyzes webpack bundles and modules
- 🗺️ Source Maps: Downloads and saves source maps for debugging
- 💄 Code Beautification: Beautifies minified JavaScript
- 🚨 Security Analysis: Detects potential API keys, secrets, tokens
- 🌐 API Discovery: Extracts API endpoints from JavaScript
- 📊 Comprehensive Reporting: JSON and human-readable reports
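Technology detection of this kind usually comes down to searching the downloaded HTML and JavaScript for well-known markers. The sketch below shows the idea under that assumption; the `SIGNATURES` table and `detect_technologies` function are hypothetical and are not taken from advanced_scraper.py, and the markers are heuristics that can miss or misidentify a stack.

```python
import re

# Heuristic markers only (assumed for illustration); real sites may
# strip or rename any of these.
SIGNATURES = {
    "Next.js": [r"__NEXT_DATA__", r"/_next/static/"],
    "React": [r"data-reactroot", r"react-dom"],
    "Vue": [r"data-v-[0-9a-f]{8}", r"data-v-app"],
    "Angular": [r"ng-version="],
    "webpack": [r"webpackChunk", r"webpackJsonp"],
}


def detect_technologies(text):
    """Return the sorted names of technologies whose markers appear in text."""
    found = []
    for name, patterns in SIGNATURES.items():
        if any(re.search(pattern, text) for pattern in patterns):
            found.append(name)
    return sorted(found)
```

Running it over the concatenation of a page's HTML and its script bodies gives a rough fingerprint; a match is a hint worth verifying by hand rather than proof.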
What These Tools Reveal
Assets Extracted
- HTML files - Complete page structure and content
- JavaScript files - All client-side logic and functionality
- CSS files - Styling and layout information
- Source maps - Original source code (if available)
- Inline scripts - JavaScript embedded in HTML
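Source maps are normally advertised by a `sourceMappingURL` comment at the end of a bundled file, so finding them is a matter of reading that comment and resolving it against the script's URL. A minimal sketch, assuming the comment form is used (the helper name `find_source_map_url` is hypothetical):

```python
import re
from urllib.parse import urljoin

# Matches "//# sourceMappingURL=..." and the CSS-style "/*# ... */" form,
# including the legacy "@" prefix.
SOURCE_MAP_RE = re.compile(
    r"^\s*/[/*][#@]\s*sourceMappingURL=(\S+?)\s*(?:\*/)?\s*$",
    re.MULTILINE,
)


def find_source_map_url(js_text, js_url):
    """Return the absolute source map URL referenced by js_text, or None."""
    matches = SOURCE_MAP_RE.findall(js_text)
    if not matches:
        return None
    reference = matches[-1]  # the last comment in the file wins
    if reference.startswith("data:"):
        return None  # map is embedded inline, nothing to download
    return urljoin(js_url, reference)
```

If this returns a URL that responds with a map file on your production site, the original source is effectively public.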
Security Information
- API endpoints - URLs your frontend calls
- Authentication tokens - Exposed keys or secrets
- Configuration data - Environment variables in client code
- Third-party dependencies - External libraries and versions
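Findings like these typically come from pattern matching over the downloaded JavaScript. The following sketch illustrates the approach with a few commonly cited key formats and a simple `/api/` path pattern; the patterns and the `scan_javascript` function are assumptions for illustration, not the rules used by advanced_scraper.py, and they will produce both false positives and false negatives.

```python
import re

# Illustrative patterns only; extend them for the services you use.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),
    "generic_assignment": re.compile(
        r"""(?i)(?:api[_-]?key|secret|token)\s*[:=]\s*["']([^"']{16,})["']"""
    ),
}

# Quoted strings that contain an /api/ path, with or without a host.
ENDPOINT_RE = re.compile(r"""["'`]((?:https?://[^"'`\s]+)?/api/[^"'`\s]*)["'`]""")


def scan_javascript(js_text):
    """Return secret-like strings and API-like paths found in js_text."""
    secrets = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_text):
            secrets.append((label, match.group(0)))
    endpoints = sorted(set(ENDPOINT_RE.findall(js_text)))
    return {"secrets": secrets, "endpoints": endpoints}
```

Treat every hit as a lead to review by hand: a long string assigned to a variable named `token` may be a harmless placeholder, while a real secret in an unusual format will not match at all.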
Technology Stack
- Frontend frameworks (React, Vue, Angular)
- Build tools (Webpack, Vite, Parcel)
- UI libraries (Bootstrap, Material-UI)
- Analytics and tracking codes
Example Output Structure
reverse_engineered/
├── html/
│   └── index.html
├── js/
│   ├── main.js
│   ├── vendor.js
│   ├── main_beautified.js
│   └── inline_script_0.js
├── css/
│   ├── styles.css
│   └── bootstrap.min.css
├── sourcemaps/
│   └── main.js.map
└── analysis/
    ├── report.json
    └── summary.txt
Security Implications
These tools demonstrate how much information a web application exposes to anyone who can load it:
Client-Side Vulnerabilities
- Exposed API Keys: Hardcoded secrets in frontend code
- Business Logic: Complete application flow visible
- API Endpoints: Every backend route the frontend references is discoverable
- Authentication Logic: Client-side auth mechanisms revealed
Mitigation Strategies
- Never expose secrets in frontend code
- Use environment variables properly (server-side only)
- Minimize client-side business logic
- Implement proper API authentication
- Move sensitive algorithms server-side where possible; code obfuscation only slows analysis down
- Run regular security audits of client-side code
Legal and Ethical Use
⚠️ Important: Use these tools only on your own websites or on sites you have explicit permission to test. Always follow responsible disclosure practices and respect website terms of service.
Requirements
- Shell script: Basic Unix tools (curl, grep, sed)
- Python scripts: Python 3.6+ with dependencies in requirements.txt
Installation
# Clone or download the scripts
git clone <repository-url>
# Install Python dependencies
pip install -r requirements.txt
# Make shell script executable
chmod +x simple_scraper.sh
Examples
Quick test with shell script:
./simple_scraper.sh https://your-website.com
Comprehensive analysis:
python advanced_scraper.py
# Enter URL when prompted: https://your-website.com
Analyze your local development site:
python advanced_scraper.py
# Enter URL: http://localhost:3000
Understanding the Results
After running these tools on your website, review:
- What sensitive information is exposed?
- Are there hardcoded API keys or secrets?
- How much business logic is client-side?
- What does the technology stack reveal?
- Are source maps accidentally deployed?
This analysis helps you understand your attack surface from a potential attacker's perspective.
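The source map question in particular can also be answered before deployment by checking the build output directly. A minimal sketch, assuming your build writes its files to a local directory (the function name `find_deployed_source_maps` and the `dist` directory in the usage note are hypothetical):

```python
from pathlib import Path


def find_deployed_source_maps(build_dir):
    """Return the relative paths of all .map files under build_dir."""
    root = Path(build_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.map")
        if path.is_file()
    )
```

Calling `find_deployed_source_maps("dist")` in a pre-deploy step and failing when the list is non-empty is one way to keep maps out of production.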