mirror of https://github.com/Dadams2/liverag.git synced 2026-06-27 12:19:05 +00:00

Unimelb SIGIR Live Rag Competition


Andrew Pham 496ff87563 Merge pull request #12 from Dadams2/rag-response-formatting implemented batch querying		2025-05-03 13:19:00 +10:00
.vscode	feat: config pytest for vscode	2025-04-06 14:14:49 +10:00
Sample_Notebooks	feat: add RAG tutorial demo	2025-04-10 15:44:54 +10:00
scripts	chore: fix code styling	2025-05-03 07:43:38 +10:00
src	fix: fix code styling	2025-05-03 13:18:10 +10:00
tests	chore: fix code styling	2025-05-03 07:43:38 +10:00
.env.example	docs: hugging face setup	2025-04-10 17:12:40 +10:00
.gitignore	add test for datamorgana (test passed)	2025-04-13 15:06:59 +10:00
.python-version	finish setup	2025-03-28 01:07:36 +11:00
abby_test.ipynb	implemented batch querying	2025-05-03 12:55:51 +10:00
pyproject.toml	Merge pull request #4 from Dadams2/simple-rag-example	2025-04-11 09:53:02 +10:00
pytest.ini	finish setup	2025-03-28 01:07:36 +11:00
README.md	Merge branch 'main' into feat/QAGeneration	2025-04-13 15:09:19 +10:00
uv.lock	Merge pull request #4 from Dadams2/simple-rag-example	2025-04-11 09:53:02 +10:00

README.md

Unimelb SIGIR Live Rag Competition

This guide provides comprehensive instructions for setting up your development environment and using the resources provided for the LiveRAG Challenge.

Quickstart

Clone the repository:

git clone https://github.com/Dadams2/liverag
cd liverag

Create a virtual environment:
```
uv venv
source .venv/bin/activate
```
Install dependencies:
```
uv sync
```
Set up environment variables: Copy the .env.example to .env and fill in the required values.
Run the setup scripts:
- To set up AWS resources:
```
python scripts/setup_aws.py
```
- To set up AWS credentials:
```
python scripts/setup_credentials.py
```
- To set up Hugging Face access:
```
python scripts/setup_hf.py
```
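
Before running the setup scripts, it can help to confirm that every variable named in `.env.example` has a value in `.env`. The following is a minimal sketch using only the standard library; it assumes both files use plain `KEY=VALUE` lines, and the helper names `env_keys` and `missing_env_keys` are illustrative rather than part of this repository.

```python
from pathlib import Path


def env_keys(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines from a dotenv-style file, skipping blanks and comments."""
    pairs: dict[str, str] = {}
    if not path.exists():
        return pairs
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        pairs[key.strip()] = value.strip().strip("'\"")
    return pairs


def missing_env_keys(example: Path, actual: Path) -> list[str]:
    """Return keys named in the example file that are absent or empty in the actual file."""
    wanted = env_keys(example)
    have = env_keys(actual)
    return sorted(key for key in wanted if not have.get(key))
```

Calling `missing_env_keys(Path(".env.example"), Path(".env"))` from the repository root returns the names that still need a value.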

AWS Account Setup

Team Account

TBD

LiveRAG Account

Access to the pre-built indices is provided through a TII-managed AWS account.

The Access Key ID and Secret Access Key are in the email sent on Friday 21 Mar at 09:44.

Configure AWS CLI Profile:

aws configure --profile sigir-participant
# Use the following settings:
# AWS Access Key ID: [your access key]
# AWS Secret Access Key: [your secret key]
# Default region name: us-east-1
# Default output format: json

Verify Access:

# Should display your AWS account ID
aws sts get-caller-identity --profile sigir-participant

# Test access to configuration service
aws ssm get-parameter --name /pinecone/ro_token --profile sigir-participant
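
If the verification commands fail, a first thing to check is whether the profile was actually written. `aws configure --profile` stores keys in the shared credentials file, which is `~/.aws/credentials` by default. The sketch below reads that file with the standard library only; the function name `profile_is_configured` is an illustrative assumption, and the check does not confirm that the keys are valid.

```python
import configparser
from pathlib import Path


def profile_is_configured(profile: str, credentials_file: Path) -> bool:
    """True if the shared credentials file has both keys for the given profile."""
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )
```

For example, `profile_is_configured("sigir-participant", Path.home() / ".aws" / "credentials")` should return `True` after the configure step above.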

Development Environment

Setting Up Python

Install Python 3.12 (or a recent version) on your development machine
Choose a dependency management approach (see next section for uv)

Using uv for Dependency Management

uv is a fast Python package installer and resolver that can be used as an alternative to pip or conda.

Installing uv

# Using pip
pip install uv

# On macOS with Homebrew
brew install uv

# On Linux with curl
curl -LsSf https://astral.sh/uv/install.sh | sh

Creating a Virtual Environment with uv

# Create a new virtual environment
uv venv

# Activate the virtual environment
# On Unix/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

Installing Dependencies with uv

# Install packages directly
uv pip install torch transformers boto3 pinecone opensearch-py

# Install with specific versions 
uv pip install torch==2.5.1 transformers==4.45.2 boto3==1.35.88 pinecone==5.4.2 opensearch-py==2.8.0

# Install from requirements.txt
uv pip install -r requirements.txt

Advantages of uv

Much faster than pip (the project reports 10-100x speedups)
Better dependency resolution
Compatible with existing tools and workflows
Can generate lock files for reproducible environments

Hugging Face Setup (for Private Model Access)

The model used in this challenge, tiiuae/Falcon3-10B-Instruct, is a private model, so you will need to generate an access token and make it available to any code you run.

Generate a Hugging Face Access Token

First, create or log into your Hugging Face account. Then:

Visit https://huggingface.co/settings/tokens
Click "New token"
Choose Read access
Copy the token

Authenticate via CLI (Recommended)

Once your Python environment is set up, huggingface-cli should already be installed:

huggingface-cli login
# Paste your token when prompted

This stores your token in ~/.cache/huggingface/token (older huggingface_hub releases used ~/.huggingface/token), where transformers and other Hugging Face libraries pick it up automatically.

Use Environment Variables

If you’re running code on AWS (EC2 or containers), set the token as an environment variable:

export HUGGINGFACE_HUB_TOKEN=your_token_here

Or add this to your .env file:

HUGGINGFACE_HUB_TOKEN=your_token_here

Ensure your code or scripts load the environment variables, e.g., with dotenv.
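
A minimal sketch of reading the token at runtime is shown below. It relies on an assumption worth checking against your installed version: recent `huggingface_hub` releases document `HF_TOKEN` as the variable they read automatically, so a variable with the name used above may need to be forwarded explicitly. The helper name `resolve_hf_token` is illustrative.

```python
import os
from collections.abc import Mapping

# HF_TOKEN is the name huggingface_hub documents for automatic pickup; the second
# name is the one used in this README and is assumed to need explicit forwarding.
TOKEN_VARS = ("HF_TOKEN", "HUGGINGFACE_HUB_TOKEN")


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the first non-empty token found, or raise with a clear message."""
    for name in TOKEN_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(f"Set one of {', '.join(TOKEN_VARS)} before loading the model")
```

The returned value can then be passed as `login(token=resolve_hf_token())` or as the `token=` argument of `from_pretrained`, which keeps the secret out of source files.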

Pass Token Programmatically in Code

If needed, you can pass the token directly when loading models:

from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

login(token="your_token_here")  # optional if already logged in

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-10B-Instruct",
    token="your_token_here"
)

tokenizer = AutoTokenizer.from_pretrained(
    "tiiuae/Falcon3-10B-Instruct",
    token="your_token_here"
)

Using `huggingface_hub` Login in Notebooks

If you are working only in a notebook (not recommended), log in interactively:

from huggingface_hub import notebook_login

notebook_login()

Using Pre-Built Indices

We provide two pre-built indices for retrieval:

Pinecone (Dense) Index

import boto3
from pinecone import Pinecone
from transformers import AutoModel, AutoTokenizer

# Get Pinecone token from AWS SSM
session = boto3.Session(profile_name="sigir-participant", region_name="us-east-1")
ssm = session.client("ssm")
token = ssm.get_parameter(Name="/pinecone/ro_token", WithDecryption=True)["Parameter"]["Value"]

# Initialize Pinecone
pc = Pinecone(api_key=token)
index = pc.Index(name="fineweb10bt-512-0w-e5-base-v2")

# See the example notebook for full query implementation
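
As a rough sketch of what a query against the `index` object above might look like, the function below takes the index and an embedding function as arguments so it can be exercised without network access. Several details are assumptions rather than facts from this README: the `"default"` namespace, the `"query: "` prefix (inferred from the `e5-base-v2` suffix of the index name, since E5 models expect it), and the names `query_dense` and `embed`. The example notebook remains the reference implementation.

```python
from typing import Any, Callable, Sequence


def query_dense(
    index: Any,
    embed: Callable[[str], Sequence[float]],
    question: str,
    top_k: int = 10,
    namespace: str = "default",  # assumption: adjust to the namespace the index uses
) -> Any:
    """Embed the question and fetch the top_k nearest passages with metadata."""
    # E5-family encoders expect queries to carry a "query: " prefix.
    vector = list(embed(f"query: {question}"))
    return index.query(
        vector=vector,
        top_k=top_k,
        namespace=namespace,
        include_values=False,  # skip returning stored vectors to keep responses small
        include_metadata=True,  # passage text usually lives in the metadata
    )
```

Here `embed` stands for any function that turns a string into a vector of the dimensionality the index was built with, for example a wrapper around the encoder loaded with `AutoModel` and `AutoTokenizer`.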

OpenSearch (Sparse) Index

import boto3
from opensearchpy import OpenSearch, AWSV4SignerAuth, RequestsHttpConnection

# Get credentials and endpoint
session = boto3.Session(profile_name="sigir-participant", region_name="us-east-1")
credentials = session.get_credentials()
auth = AWSV4SignerAuth(credentials, region="us-east-1")

ssm = session.client("ssm")
host_name = ssm.get_parameter(Name="/opensearch/endpoint")["Parameter"]["Value"]

# Initialize OpenSearch client
aos_client = OpenSearch(
    hosts=[{"host": host_name, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# See the example notebook for full query implementation
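
A keyword query against `aos_client` could be sketched as follows. The client is passed in as an argument so the query body can be inspected on its own; the field name `"text"` and the helper names `build_match_query` and `query_sparse` are assumptions, and the index name must be taken from the example notebook.

```python
from typing import Any


def build_match_query(question: str, top_k: int = 10, field: str = "text") -> dict[str, Any]:
    """Build a full-text match query body; `field` is an assumed field name."""
    return {"size": top_k, "query": {"match": {field: question}}}


def query_sparse(
    client: Any, index_name: str, question: str, top_k: int = 10
) -> list[dict[str, Any]]:
    """Run the match query and return the raw hit dictionaries."""
    response = client.search(index=index_name, body=build_match_query(question, top_k))
    return response["hits"]["hits"]
```

Each returned hit carries `_id`, `_score`, and `_source` entries that can be merged with the dense results if you combine both retrievers.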

Cost Management

Efficient cost management is crucial to ensure AWS credits last throughout the competition:

Shut down unused resources – Turn off GPU instances when not in use
Monitor costs regularly – Use AWS Cost Explorer and set up CloudWatch billing alarms
Experiment on smaller datasets – Test on smaller data before scaling up
Use spot instances when appropriate for non-critical workloads
Set up AWS Budgets to receive notifications before exceeding planned spending
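
To make the points above concrete, a back-of-the-envelope estimate can show how quickly an idle GPU instance consumes credits. This is only an illustrative sketch: the function names are hypothetical, and any rates or budgets you plug in should come from the AWS pricing pages and your own credit allocation.

```python
def projected_spend(hourly_rate_usd: float, hours_per_day: float, days: int) -> float:
    """Projected cost of one instance running hours_per_day for the given number of days."""
    return hourly_rate_usd * hours_per_day * days


def crossed_thresholds(
    spent_usd: float, budget_usd: float, thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)
) -> list[float]:
    """Return the budget fractions that current spending has already reached."""
    return [t for t in thresholds if spent_usd >= t * budget_usd]
```

For instance, `projected_spend(1.0, 24, 10)` gives the cost of leaving a hypothetical 1 USD/hour instance running around the clock for ten days, and `crossed_thresholds` mirrors the kind of percentage alerts you would configure in AWS Budgets.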

Additional Resources

Indices Usage Examples Notebook - Sample code for using indices
AWS Accounts Information - Detailed AWS account guidance
Pinecone for LiveRAG - Instructions for building your own Pinecone index
AWS CLI Documentation - Official AWS CLI guide
uv Documentation - Official uv documentation

Note: If you exceed your AWS credits, we will be charged directly and will not be refunded!
