Multimodal Voice Assistant Pipeline
Developed a fully local, multimodal voice assistant during an intensive university sprint project in early 2026. The system removes any reliance on cloud services by integrating local Speech-to-Text (STT) algorithms directly with locally run Large Language Models (LLMs) to create a natural, interactive dialogue interface.
The physical build required synchronizing multi-channel microphone arrays with visual processing units. A major technical hurdle was managing the processing load so that the hardware did not thermally throttle while running the LLM locally alongside the audio capture algorithms.
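
As an illustration only, one way this kind of load management could be sketched is a small governor that shrinks the LLM's token budget as the device heats up, so audio capture keeps running while generation backs off. The ThermalGovernor name, the temperature thresholds, and the token-budget policy are all assumptions made for this sketch, not details of the actual build.

```python
from dataclasses import dataclass


@dataclass
class ThermalGovernor:
    """Maps a temperature reading to an LLM token budget (hypothetical policy)."""

    soft_limit_c: float = 70.0  # assumed: start reducing LLM work above this
    hard_limit_c: float = 85.0  # assumed: pause LLM work at or above this
    max_tokens: int = 256       # assumed: budget while the device is cool
    min_tokens: int = 32        # assumed: smallest budget before pausing

    def token_budget(self, temp_c: float) -> int:
        """Return how many tokens the LLM may generate for the next reply."""
        if temp_c >= self.hard_limit_c:
            return 0  # too hot: skip LLM work, leave audio capture untouched
        if temp_c <= self.soft_limit_c:
            return self.max_tokens
        # Between the limits, scale the budget down linearly with temperature.
        span = self.hard_limit_c - self.soft_limit_c
        fraction = (self.hard_limit_c - temp_c) / span
        return int(self.min_tokens + fraction * (self.max_tokens - self.min_tokens))


governor = ThermalGovernor()
for reading in (55.0, 77.5, 90.0):
    print(reading, governor.token_budget(reading))
```

In a real system the reading would come from a hardware temperature sensor, and the returned budget would be passed to whatever generation call the local LLM runtime exposes. Both are left out here because the source does not name the hardware or the runtime.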