
Website Reverse Engineering Tools

This collection contains tools to help you test how easily your website can be reverse engineered by extracting HTML, JavaScript, CSS, and other assets.

Tools Included

1. Simple Scraper (Shell Script) - simple_scraper.sh

A basic shell script using curl that works on any Unix-like system without additional dependencies.

Usage:

./simple_scraper.sh https://example.com [output_directory]

Features:

  • Downloads the main HTML page
  • Extracts and downloads external JavaScript files
  • Extracts and downloads CSS files
  • Attempts to extract inline JavaScript
  • Generates a basic report

2. Basic Python Scraper - website_scraper.py

A more robust Python script using requests and BeautifulSoup.

Setup:

pip install -r requirements.txt

Usage:

python website_scraper.py

Features:

  • Interactive prompts for URL and output directory
  • More reliable HTML parsing via BeautifulSoup
  • Extracts inline JavaScript blocks
  • Resolves relative URLs correctly
  • More robust error handling
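The core extraction step can be sketched with the standard library alone (the real website_scraper.py uses requests and BeautifulSoup; the parser class and sample HTML below are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetExtractor(HTMLParser):
    """Collects external script and stylesheet URLs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.scripts = []
        self.styles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            # Resolve relative URLs against the page's base URL
            self.scripts.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.styles.append(urljoin(self.base_url, attrs["href"]))

parser = AssetExtractor("https://example.com/")
parser.feed('<script src="/static/app.js"></script>'
            '<link rel="stylesheet" href="style.css">')
# parser.scripts -> ['https://example.com/static/app.js']
# parser.styles  -> ['https://example.com/style.css']
```

Once the URLs are collected, each asset can be fetched and written to the output directory.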

3. Advanced Python Scraper - advanced_scraper.py

A comprehensive reverse engineering tool that analyzes the website in depth.

Setup:

pip install -r requirements.txt

Usage:

python advanced_scraper.py

Features:

  • 🔍 Technology Detection: Identifies React, Vue, Angular, Next.js, etc.
  • 📦 Bundle Analysis: Analyzes webpack bundles and modules
  • 🗺️ Source Maps: Downloads and saves source maps for debugging
  • 💄 Code Beautification: Beautifies minified JavaScript
  • 🚨 Security Analysis: Detects potential API keys, secrets, tokens
  • 🌐 API Discovery: Extracts API endpoints from JavaScript
  • 📊 Comprehensive Reporting: JSON and human-readable reports

What These Tools Reveal

Assets Extracted

  • HTML files - Complete page structure and content
  • JavaScript files - All client-side logic and functionality
  • CSS files - Styling and layout information
  • Source maps - Original source code (if available)
  • Inline scripts - JavaScript embedded in HTML

Security Information

  • API endpoints - URLs your frontend calls
  • Authentication tokens - Exposed keys or secrets
  • Configuration data - Environment variables in client code
  • Third-party dependencies - External libraries and versions
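API endpoints tend to appear as string literals in bundled JavaScript, so a simple heuristic recovers many of them. A sketch (the regex and sample are illustrative; real bundles need broader patterns):

```python
import re

# Matches quoted string literals that look like absolute API paths.
ENDPOINT_RE = re.compile(r'["\'](/api/[A-Za-z0-9/_\-{}]+)["\']')

def extract_endpoints(js_source):
    """Return a sorted, de-duplicated list of API-looking paths."""
    return sorted(set(ENDPOINT_RE.findall(js_source)))

js = 'fetch("/api/users"); axios.get("/api/orders/123");'
# extract_endpoints(js) -> ['/api/orders/123', '/api/users']
```

Combined with the downloaded bundles, this yields a rough map of the backend surface the frontend talks to.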

Technology Stack

  • Frontend frameworks (React, Vue, Angular)
  • Build tools (Webpack, Vite, Parcel)
  • UI libraries (Bootstrap, Material-UI)
  • Analytics and tracking codes

Example Output Structure

reverse_engineered/
├── html/
│   └── index.html
├── js/
│   ├── main.js
│   ├── vendor.js
│   ├── main_beautified.js
│   └── inline_script_0.js
├── css/
│   ├── styles.css
│   └── bootstrap.min.css
├── sourcemaps/
│   └── main.js.map
└── analysis/
    ├── report.json
    └── summary.txt

Security Implications

Running these tools demonstrates how much information is publicly accessible in a typical web application:

Client-Side Vulnerabilities

  • Exposed API Keys: Hardcoded secrets in frontend code
  • Business Logic: Complete application flow visible
  • API Endpoints: All backend routes discoverable
  • Authentication Logic: Client-side auth mechanisms revealed

Mitigation Strategies

  1. Never expose secrets in frontend code
  2. Use environment variables properly (server-side only)
  3. Minimize client-side business logic
  4. Implement proper API authentication
  5. Use code obfuscation for sensitive algorithms
  6. Audit client-side code regularly
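The first two mitigations amount to one pattern: the secret lives in a server-side environment variable, and the browser only ever calls your own backend route, which attaches the key before forwarding. A minimal sketch (the endpoint, variable name, and upstream URL are hypothetical):

```python
import os

def build_upstream_url(city):
    """Server-side only: builds the upstream request URL with the secret
    attached. The browser calls your /api/weather route instead and never
    sees WEATHER_API_KEY. Actual request forwarding is omitted here."""
    api_key = os.environ.get("WEATHER_API_KEY", "")
    if not api_key:
        raise RuntimeError("WEATHER_API_KEY must be set server-side")
    # Hypothetical upstream API
    return f"https://api.example-weather.com/v1?city={city}&key={api_key}"
```

Because the key is read from the server's environment at request time, it never appears in any asset these scrapers can download.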

⚠️ Important: These tools are for testing your own websites or with explicit permission. Always follow responsible disclosure practices and respect website terms of service.

Requirements

  • Shell script: Basic Unix tools (curl, grep, sed)
  • Python scripts: Python 3.6+ with dependencies in requirements.txt

Installation

# Clone or download the scripts
git clone <repository-url>

# Install Python dependencies
pip install -r requirements.txt

# Make shell script executable
chmod +x simple_scraper.sh

Examples

Quick test with shell script:

./simple_scraper.sh https://your-website.com

Comprehensive analysis:

python advanced_scraper.py
# Enter URL when prompted: https://your-website.com

Analyze your local development site:

python advanced_scraper.py
# Enter URL: http://localhost:3000

Understanding the Results

After running these tools on your website, review:

  1. What sensitive information is exposed?
  2. Are there hardcoded API keys or secrets?
  3. How much business logic is client-side?
  4. What does the technology stack reveal?
  5. Are source maps accidentally deployed?
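Question 5 is easy to automate: most bundlers publish source maps next to the bundle as `<name>.js.map`, so you can probe each bundle URL directly. A sketch (assumes the common naming convention; some setups use different map locations):

```python
import urllib.request

def sourcemap_url(js_url):
    """Derive the conventional source-map URL for a bundle."""
    return js_url + ".map"

def sourcemap_deployed(js_url):
    """Network check: a 200 response means the original sources are public."""
    try:
        with urllib.request.urlopen(sourcemap_url(js_url), timeout=10) as resp:
            return resp.status == 200
    except OSError:
        return False
```

If `sourcemap_deployed()` returns True for a production bundle, your original, unminified source is effectively published.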

This analysis helps you understand your attack surface from a potential attacker's perspective.