Career Summarizer

A small Python project that builds academic career reports from publication metadata and the text of each researcher's papers.

The workflow is:

  1. Read researcher names from names.txt.
  2. Fetch publication metadata using Google Scholar.
  3. Download available PDFs.
  4. Build a local vector store per researcher.
  5. Generate a markdown report in reports/.
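
As a rough illustration of the first and last steps, the sketch below reads names in the one-per-line format and derives a report path that matches the `*_career_report.md` pattern. The helper names `read_names` and `report_path` are hypothetical, and the rule for turning a name into a file name (spaces replaced by underscores) is an assumption; summarizer.py may do this differently.

```python
from pathlib import Path


def read_names(text: str) -> list[str]:
    """Return stripped, non-empty lines, dropping repeats but keeping order."""
    names: list[str] = []
    for line in text.splitlines():
        name = line.strip()
        if name and name not in names:
            names.append(name)
    return names


def report_path(name: str, reports_dir: Path = Path("reports")) -> Path:
    """Build a report path for one researcher.

    Assumption: whitespace in the name becomes underscores. Only the
    `_career_report.md` suffix is taken from this README.
    """
    slug = "_".join(name.split())
    return reports_dir / f"{slug}_career_report.md"
```

With a names.txt in place, `read_names(Path("names.txt").read_text(encoding="utf-8"))` would give the list of researchers to process.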

Requirements

  • Python 3.10+
  • Ollama running locally (default model: mistral)

Quick Start

  1. Clone the repository and move into it.
  2. Install dependencies.
  3. Ensure Ollama is running and the model (mistral by default) has been pulled.
  4. Add names to names.txt (one per line).
  5. Run the script.
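
Step 3 can be checked before a long run. The following is a minimal sketch, not part of this project: it asks a local Ollama server for its installed models through the `/api/tags` endpoint and looks for the default model. It assumes Ollama listens on its default address, `http://localhost:11434`; the function names `model_available` and `check_ollama` are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default local address


def model_available(tags: dict, model: str = "mistral") -> bool:
    """Return True if `model` is listed in an /api/tags response.

    Installed models are usually named with a tag, e.g. "mistral:latest",
    so the part before the colon is compared as well.
    """
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def check_ollama(model: str = "mistral", base_url: str = OLLAMA_URL) -> bool:
    """Return True only if the server answers and the model is installed."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not reachable, or the response was not valid JSON.
        return False
    return model_available(tags, model)
```

If `check_ollama()` returns False while the server is running, the model probably still needs to be downloaded, for example with `ollama pull mistral`.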

Using uv:

uv sync
uv run python main.py --max_papers 20

Using pip:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python main.py --max_papers 20

Output

  • Reports are written to reports/ as *_career_report.md.
  • Downloaded papers and vector stores are written under papers/.
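
To collect the generated reports after a run, something like the sketch below may be enough. It relies only on the `reports/` directory and the `*_career_report.md` naming stated above; the helper names `is_report` and `find_reports` are hypothetical and not part of the project.

```python
from fnmatch import fnmatch
from pathlib import Path

REPORT_PATTERN = "*_career_report.md"


def is_report(path: Path) -> bool:
    """Return True if the file name matches the report naming pattern."""
    return fnmatch(path.name, REPORT_PATTERN)


def find_reports(reports_dir: Path = Path("reports")) -> list[Path]:
    """Return matching report files in sorted order, or [] if the directory is missing."""
    if not reports_dir.is_dir():
        return []
    return sorted(p for p in reports_dir.iterdir() if is_report(p))
```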

Notes

  • Some papers do not expose direct PDF links and will be skipped.
  • The first run can be slow because of embedding and model setup.
  • This project depends on third-party data sources and model output; review generated reports before use.

Project Structure

  • main.py: CLI entry point.
  • summarizer.py: publication collection, PDF processing, vector store creation, and report generation.
  • names.txt: input researcher names.
  • reports/: generated reports.
  • papers/: downloaded papers and per-researcher vector stores.

License

This project is licensed under the MIT License. See LICENSE.