Hi. I'm trying to run the following Python code, but I'm missing the 'transformers' module and can't locate it in any official or AUR repository. I'm quite new to Arch, but since I've always found the packages I needed so far, I'm guessing I'm missing something obvious? If not, what do others do (virtual environment, pipx install, ...)? Thank you
import onnxruntime as ort
import numpy as np
import librosa
import librosa.display
import torch
import matplotlib.pyplot as plt
from transformers import WhisperProcessor, WhisperTokenizer

# Load Whisper processor and tokenizer
model_name = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name)

# Load ONNX model
session = ort.InferenceSession("whisper_optimized.onnx", providers=["CPUExecutionProvider"])

# Step 1: Load and preprocess the audio file
def load_audio(file_path):
    audio, sr = librosa.load(file_path, sr=16000)  # Load at 16 kHz (Whisper requirement)
    return audio

# Step 2: Convert audio to a log-mel spectrogram
def audio_to_mel(audio):
    mel_spectrogram = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="np").input_features
    return mel_spectrogram

# Step 3: Perform Whisper ONNX inference (step-by-step greedy decoding)
def whisper_transcribe(audio_file):
    audio = load_audio(audio_file)         # Load WAV audio
    mel_spectrogram = audio_to_mel(audio)  # Convert to mel spectrogram

    # Force English transcription via the special prompt tokens
    forced_language_prompt = tokenizer.encode("<|startoftranscript|><|en|>", add_special_tokens=False)
    decoder_input_ids = np.array([forced_language_prompt], dtype=np.int64)  # shape (1, N)

    decoded_ids = []
    max_length = 100  # Limit max decoding steps

    for step in range(max_length):
        # Run ONNX inference
        outputs = session.run(None, {"mel": mel_spectrogram, "decoder_input_ids": decoder_input_ids})

        # Greedy decoding: argmax over the logits at the last position
        predicted_token_id = int(np.argmax(outputs[0][0, -1]))

        # Stop once the end token (<|endoftext|>) is reached; look its id up
        # from the tokenizer rather than hard-coding a vocabulary-specific value
        if predicted_token_id == tokenizer.eos_token_id:
            break

        # Collect the token and feed it back in for the next iteration
        decoded_ids.append(predicted_token_id)
        decoder_input_ids = np.hstack([decoder_input_ids, np.array([[predicted_token_id]], dtype=np.int64)])

    # Decode all collected ids in one call so word spacing comes out right
    return tokenizer.decode(decoded_ids, skip_special_tokens=True).strip()

# Step 4: Run transcription
audio_file = "harvard.wav"  # Replace with your actual file path
transcribed_text = whisper_transcribe(audio_file)
print("Transcription:", transcribed_text)
Personally, I would use a virtual environment. I find Python dependencies very difficult to manage across multiple projects otherwise, as transitive dependencies are often pinned to specific versions, which can conflict.
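For example, a minimal sketch using the stdlib venv module (the .venv path is just illustrative; `python -m venv .venv` from a shell does the same thing):

# Create an isolated environment with its own pip, next to the script
import venv

venv.create(".venv", with_pip=True)

After `source .venv/bin/activate`, a plain `pip install transformers onnxruntime librosa torch matplotlib` stays inside that directory and never touches the pacman-managed site-packages.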
https://aur.archlinux.org/packages/python-transformers?
More generally, Python modules are typically packaged as `python-…`, e.g. `python-numpy` or `python-pytorch` for the other imports in your script.
Alternatively (for managing your project-specific development dependencies), a pyproject.toml with Poetry or uv, as sketched below.
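With uv in particular, PEP 723 inline metadata is a handy variant for a single script: declare the dependencies at the top of the file itself and `uv run yourscript.py` builds a cached environment for it automatically. A minimal sketch (the dependency list just mirrors the imports above; the requires-python bound is illustrative):

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "transformers",
#     "onnxruntime",
#     "numpy",
#     "librosa",
#     "torch",
#     "matplotlib",
# ]
# ///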
Don't know how I missed the package. Thanks for the informative replies!