Career Summarizer
A small Python project that builds academic career reports from publication metadata and paper text.
The workflow is:
- Read researcher names from names.txt.
- Fetch publication metadata using Google Scholar.
- Download available PDFs.
- Build a local vector store per researcher.
- Generate a markdown report in reports/.
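The first step of the workflow can be sketched as follows. This is a minimal illustration, not the code in summarizer.py: the helper names `parse_names` and `read_names` are hypothetical, and skipping blank lines and trimming whitespace are assumptions about how names.txt is treated.

```python
from pathlib import Path


def parse_names(text: str) -> list[str]:
    """Return one researcher name per non-empty line, with whitespace trimmed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_names(path: str = "names.txt") -> list[str]:
    """Read researcher names from a UTF-8 text file (hypothetical helper)."""
    return parse_names(Path(path).read_text(encoding="utf-8"))
```

Keeping the parsing separate from the file access makes it easy to check the one-name-per-line rule without touching the disk.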
Requirements
- Python 3.10+
- Ollama running locally (default model: mistral)
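One way to confirm the Ollama requirement before a long run is to ask the local server which models it has. The sketch below assumes Ollama is listening on its default address, http://localhost:11434, and uses its /api/tags endpoint, which lists locally installed models. The function names are hypothetical and are not part of this project.

```python
import json
import urllib.request


def model_available(tags_payload: dict, model: str = "mistral") -> bool:
    """Check whether `model` appears in a payload returned by Ollama's /api/tags."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix such as "mistral:latest",
    # so compare against both the full name and the part before the colon.
    return any(name == model or name.split(":")[0] == model for name in names)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Query a locally running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

Calling `model_available(fetch_tags())` returns a boolean when the server is reachable; if Ollama is not running, `fetch_tags` raises `urllib.error.URLError`.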
Quick Start
- Clone the repository and move into it.
- Install dependencies.
- Ensure Ollama is running and the model exists.
- Add names to names.txt (one per line).
- Run the script.
Using uv:
uv sync
uv run python main.py --max_papers 20
Using pip:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python main.py --max_papers 20
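Both commands pass a `--max_papers` flag to main.py. A flag like this is commonly parsed with `argparse`; the sketch below shows one possible shape. The default of 20, the help text, and the `build_parser` name are assumptions for illustration and may differ from what main.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the --max_papers flag used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Build career reports for the names in names.txt."
    )
    parser.add_argument(
        "--max_papers",
        type=int,
        default=20,  # assumed default, chosen to match the example commands
        help="Maximum number of papers to process per researcher.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Processing up to {args.max_papers} papers per researcher")
```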
Output
- Reports are written to reports/ as *_career_report.md.
- Downloaded papers and vector stores are written under papers/.
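The report naming pattern can be illustrated with a small helper. This is only a sketch: the README states the `*_career_report.md` suffix and the reports/ directory, but how the researcher name is turned into the leading part of the file name is an assumption here (whitespace replaced by underscores), and `report_path` is a hypothetical function.

```python
from pathlib import Path


def report_path(name: str, reports_dir: str = "reports") -> Path:
    """Return the report location for a researcher (assumed naming scheme)."""
    # Assumption: runs of whitespace in the name become single underscores.
    slug = "_".join(name.split())
    return Path(reports_dir) / f"{slug}_career_report.md"
```

Under this assumed scheme, `report_path("Ada Lovelace")` points at a file named Ada_Lovelace_career_report.md inside reports/.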
Notes
- Some papers do not expose direct PDF links and will be skipped.
- First run can be slow due to embedding/model setup.
- This project depends on third-party data sources and model output; review generated reports before use.
Project Structure
- main.py: CLI entry point.
- summarizer.py: publication collection, PDF processing, vector store creation, and report generation.
- names.txt: input researcher names.
- reports/: generated reports.
- papers/: downloaded papers and per-researcher vector stores.
License
This project is licensed under the MIT License. See LICENSE.