
#1 2025-03-10 15:08:01

nurber
Member
Registered: 2025-03-10
Posts: 2

python 'transformers' modules ?

Hi. I'm trying to run the following Python code, but the 'transformers' module is missing and I can't find it in any official repository or the AUR. I'm quite new to Arch, but I have always found the packages I needed before, so I'm guessing I'm missing something obvious? If not, what do others do (virtual environment, pipx install ...)? Thank you.

import onnxruntime as ort
import numpy as np
import librosa
from transformers import WhisperProcessor, WhisperTokenizer

# Load Whisper processor and tokenizer
model_name = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name)

# Load ONNX model
session = ort.InferenceSession("whisper_optimized.onnx", providers=["CPUExecutionProvider"])

# Step 1: Load and Preprocess Audio File
def load_audio(file_path):
    audio, _ = librosa.load(file_path, sr=16000)  # Resample to 16 kHz (Whisper requirement)
    return audio

# Step 2: Convert Audio to Mel Spectrogram
def audio_to_mel(audio):
    mel_spectrogram = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="np").input_features
    return mel_spectrogram

# Step 3: Perform Whisper ONNX Inference (Step-by-Step Decoding)
def whisper_transcribe(audio_file):
    audio = load_audio(audio_file)  # Load WAV audio
    mel_spectrogram = audio_to_mel(audio)  # Convert to mel spectrogram

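    # Set forced language prompt "<|en|>" for English. Look the special tokens up
    # by name: encoding a string with a space between them may tokenize the space too.
    forced_language_prompt = tokenizer.convert_tokens_to_ids(["<|startoftranscript|>", "<|en|>"])
    decoder_input_ids = np.array([forced_language_prompt], dtype=np.int64)  # (1, N)

    generated_ids = []
    max_length = 100  # Limit max decoding steps

    for step in range(max_length):
        # Run ONNX inference (input names must match those of the exported model)
        outputs = session.run(None, {"mel": mel_spectrogram, "decoder_input_ids": decoder_input_ids})

        # Get predicted token (argmax over logits)
        predicted_token_id = int(np.argmax(outputs[0][0, -1]))  # Select last step prediction

        # Stop decoding if end token (<|endoftext|>) is reached. Its ID is 50257 for
        # the multilingual models and 50256 for the ".en" ones, so ask the tokenizer.
        if predicted_token_id == tokenizer.eos_token_id:
            break

        # Collect the token ID; the text is decoded once after the loop
        generated_ids.append(predicted_token_id)

        # Append token to decoder input for next iteration
        decoder_input_ids = np.hstack([decoder_input_ids, np.array([[predicted_token_id]], dtype=np.int64)])

    # Decode all tokens together: tokens carry their own leading spaces, so
    # decoding one at a time and joining with " " splits words apart.
    return tokenizer.decode(generated_ids, skip_special_tokens=True).strip()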

# Step 4: Run Transcription
audio_file = "harvard.wav"  # Replace with your actual file path
transcribed_text = whisper_transcribe(audio_file)
print("Transcription:", transcribed_text)


#2 2025-03-10 15:41:42

lawmurray
Member
From: Bangkok
Registered: 2025-02-10
Posts: 2

Re: python 'transformers' modules ?

Personally, I would use a virtual environment. I find Python dependencies very difficult to manage across multiple projects otherwise, as transitive dependencies are often set to specific versions, which may conflict.
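For what it's worth, a per-project virtual environment can be created from Python itself with the standard-library `venv` module. This is only a rough sketch: the helper names `create_project_venv` and `venv_python` and the `.venv` directory name are my own choices for illustration, not anything Arch or `transformers` requires.

```python
import sys
import venv
from pathlib import Path


def create_project_venv(project_dir, with_pip=True):
    """Create <project_dir>/.venv and return its path (hypothetical helper)."""
    env_dir = Path(project_dir) / ".venv"
    # EnvBuilder is the standard-library API behind "python -m venv"
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return env_dir


def venv_python(env_dir):
    """Return the path of the interpreter inside the environment."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

After `env_dir = create_project_venv(".")`, packages would be installed with the environment's own interpreter, e.g. by running `.venv/bin/python -m pip install transformers`, so nothing touches the system-wide Python packages managed by pacman.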


#3 2025-03-10 20:36:42

ayekat
Member
Registered: 2011-01-17
Posts: 1,617

Re: python 'transformers' modules ?

https://aur.archlinux.org/packages/python-transformers?
More generally, most python modules are typically packaged as `python-…`.

Alternatively (for managing your project-specific development dependencies), probably a pyproject.toml with Poetry or uv.
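To illustrate the `python-…` naming convention mentioned above, here is a small sketch that reports which top-level modules are not importable and guesses a package name to search for. The function names are made up for this example, and the guess is only a heuristic: some packages are named differently, so the result is a search term for the official repositories or the AUR, not a guarantee that the package exists.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def arch_package_guess(module_name):
    """Guess an Arch package name using the usual python-<module> convention."""
    return "python-" + module_name.lower().replace("_", "-")


def report(names):
    """Print a search suggestion for every module that is missing."""
    for name in missing_modules(names):
        print(f"{name}: not found, try searching for '{arch_package_guess(name)}'")
```

For the script in the first post, `report(["onnxruntime", "numpy", "librosa", "transformers"])` would list whichever of those modules are not installed, each with a package name to search for.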



#4 2025-03-12 11:43:55

nurber
Member
Registered: 2025-03-10
Posts: 2

Re: python 'transformers' modules ?

Don't know how I missed the package. Thanks for the informative replies.
