Development Method and Example for Implementing Deep Noise Suppression on Ubuntu with a USB-Connected Embedded Microphone System
Objective
To implement a noise suppression solution for a USB-connected embedded microphone system on an Ubuntu-based platform using the Deep Noise Suppression (DNS) method. This involves real-time audio processing to enhance voice clarity by minimizing environmental noise.
Development Method
1. Setting Up the Development Environment
- Install Ubuntu:
- Use Ubuntu 20.04 or newer for compatibility with audio libraries and tools.
- Install Required Libraries and Tools:
- Audio Libraries:
- PortAudio: For cross-platform audio input/output (accessed from Python through PyAudio).
- PySoundFile (imported as soundfile): For reading and writing audio files.
- Librosa: For audio preprocessing.
- DNS Framework:
- Microsoft’s DNS Challenge pre-trained models or alternatives like RNNoise or NVIDIA Riva.
- Python Environment:
- Install Python 3.x and required dependencies using pip:
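```bash
sudo apt update
sudo apt install python3 python3-pip python3-venv portaudio19-dev
python3 -m venv ~/dns-env && source ~/dns-env/bin/activate
pip install numpy scipy soundfile librosa torch torchaudio pyaudio
```
PyAudio, used in the real-time example, needs the PortAudio headers from portaudio19-dev. The virtual environment avoids the "externally-managed-environment" error that pip raises for system-wide installs on Ubuntu 23.04 and newer.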
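To confirm that the packages installed above are visible to the Python interpreter that will run the pipeline, a quick check like the following can help. It is a minimal sketch; the module names (soundfile, pyaudio, and so on) are the import names assumed for the pip packages listed above, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names for the packages installed with pip above (assumed mapping)
REQUIRED_MODULES = ["numpy", "scipy", "soundfile", "librosa", "torch", "torchaudio", "pyaudio"]


def missing_modules(modules):
    """Return the modules that cannot be found, without actually importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All audio/DNS dependencies are installed.")
```

find_spec only locates the module, so the check is fast even for heavy packages such as torch.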
2. Connect and Configure the USB Microphone
- Verify USB Microphone Connection:
- Use lsusb to ensure the microphone is detected:
lsusb
- Check the audio input device with:
arecord -l
- Configure PulseAudio or ALSA:
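- Use PulseAudio to set the USB microphone as the default input device: list the sources with pactl list short sources, then select the USB one with pactl set-default-source <source-name>. pactl also works on Ubuntu releases that run PipeWire with its PulseAudio compatibility layer.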
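- Test recording (replace 1,0 with the card and device numbers that arecord -l reports for the USB microphone):
arecord -D plughw:1,0 -f cd test.wav
- The -f cd option records 16-bit, 44.1 kHz stereo until Ctrl+C is pressed. To record 5 seconds directly as 16 kHz mono, the format used in the following steps, use:
arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 test.wav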
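Rather than hard-coding plughw:1,0, the card and device numbers can be read from the arecord -l listing. The sketch below assumes the listing format shown in the comment and that the USB microphone's description contains "USB", which is common but not guaranteed; find_usb_capture_device is a hypothetical helper name.

```python
import re
import shutil
import subprocess

# Matches lines such as: "card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]"
CARD_LINE = re.compile(r"^card (\d+): .*, device (\d+):")


def find_usb_capture_device(arecord_listing, keyword="USB"):
    """Return an ALSA device string like 'plughw:1,0' for the first capture
    device whose description contains keyword, or None if nothing matches."""
    for line in arecord_listing.splitlines():
        match = CARD_LINE.match(line.strip())
        if match and keyword.lower() in line.lower():
            card, device = match.group(1), match.group(2)
            return f"plughw:{card},{device}"
    return None


if __name__ == "__main__":
    if shutil.which("arecord") is None:
        print("arecord not found; install the alsa-utils package")
    else:
        listing = subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout
        print(find_usb_capture_device(listing))
```

The returned string can be passed to arecord -D; plughw (rather than hw) lets ALSA convert sample rate and channel count when the hardware does not support the requested format natively.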
3. Preprocessing Audio Input
- Sample Rate Adjustment:
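- Use Librosa to resample the audio to the rate the model expects (e.g., 16 kHz for the Microsoft DNS Challenge models; RNNoise works at 48 kHz), and PySoundFile to write the result. PySoundFile only reads and writes files; it does not resample.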
- Example:
```python
import soundfile as sf
import librosa

# librosa.load resamples to 16 kHz and downmixes to mono (float32 in [-1, 1])
audio, sr = librosa.load('input.wav', sr=16000)
sf.write('resampled.wav', audio, sr)
```
- Normalize Audio:
- Normalize audio levels to maintain consistent input.
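Two common ways to normalize are peak normalization and RMS (loudness) normalization. The helpers below are a minimal NumPy sketch operating on float waveforms in [-1, 1], such as those returned by librosa.load; normalize_peak and normalize_rms are hypothetical names, and the target levels (0.9 peak, -20 dBFS RMS) are illustrative assumptions, not values required by any particular DNS model.

```python
import numpy as np


def normalize_peak(audio, target_peak=0.9, eps=1e-9):
    """Scale a float waveform so its largest absolute sample equals target_peak."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = np.max(np.abs(audio)) if audio.size else 0.0
    if peak < eps:  # silent or empty input: nothing to scale
        return audio
    return audio * (target_peak / peak)


def normalize_rms(audio, target_dbfs=-20.0, eps=1e-9):
    """Scale a float waveform to a target RMS level in dBFS, then clip to [-1, 1]."""
    audio = np.asarray(audio, dtype=np.float32)
    rms = np.sqrt(np.mean(audio ** 2)) if audio.size else 0.0
    if rms < eps:
        return audio
    gain = 10 ** (target_dbfs / 20) / rms  # linear gain that moves rms to the target
    return np.clip(audio * gain, -1.0, 1.0)
```

For real-time use, a fixed or slowly varying gain is usually preferable to normalizing each chunk independently, which would pump the noise floor up during pauses in speech.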
4. Integrate Deep Noise Suppression
- Load the Pre-Trained DNS Model:
- Use PyTorch or TensorFlow to load a DNS model. Example with PyTorch:
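```python
import torch
from dns_model import DNSModel  # placeholder: use the model class shipped with your chosen DNS implementation

model = DNSModel()
model.load_state_dict(torch.load("dns_model.pth", map_location="cpu"))
model.eval()  # inference mode: disables dropout and batch-norm updates
```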
- Process the Audio Input:
- Pass the recorded audio through the DNS model:
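```python
import numpy as np
import torch

def suppress_noise(model, audio):
    """Run a mono float32 waveform through the DNS model and return a NumPy array.

    Assumes the model maps a (batch, samples) waveform to a waveform of the same shape.
    """
    with torch.no_grad():
        input_tensor = torch.from_numpy(np.asarray(audio, dtype=np.float32)).unsqueeze(0)  # add batch dimension
        output = model(input_tensor)
    return output.squeeze(0).cpu().numpy()  # remove batch dimension
```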
- Save or Stream Processed Audio:
- Save the noise-suppressed audio:
```python
processed_audio = suppress_noise(model, audio)
sf.write('output.wav', processed_audio, 16000)
```
- Stream processed audio back to the system using PortAudio or ALSA.
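Frame-based models, and models whose memory use grows with input length, are often fed a long recording in fixed-size blocks. The sketch below assumes the model accepts one block of frame_size samples at a time (1024, matching CHUNK in the real-time example); process_in_frames is a hypothetical helper.

```python
import numpy as np


def process_in_frames(audio, process_frame, frame_size=1024):
    """Apply process_frame to consecutive fixed-size frames of a 1-D waveform.

    The last frame is zero-padded to frame_size and the output is trimmed back
    to the input length. process_frame must return frame_size samples.
    """
    audio = np.asarray(audio, dtype=np.float32)
    n_frames = int(np.ceil(len(audio) / frame_size))
    padded = np.zeros(n_frames * frame_size, dtype=np.float32)
    padded[: len(audio)] = audio
    out = np.empty_like(padded)
    for i in range(n_frames):
        start = i * frame_size
        out[start:start + frame_size] = process_frame(padded[start:start + frame_size])
    return out[: len(audio)]
```

Usage with the suppress_noise function defined above: processed_audio = process_in_frames(audio, lambda frame: suppress_noise(model, frame)). Processing blocks independently can leave audible discontinuities at block boundaries; models designed for streaming usually keep internal state or use overlap-add, so check what the chosen model expects.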
5. Real-Time Implementation
- Implement a real-time pipeline to capture, process, and output audio:
- Capture audio using PortAudio.
- Pass audio frames through the DNS model.
- Output processed audio to the speaker or save it.
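Before connecting the live streams, it is worth checking that the model keeps up: each CHUNK of 1024 samples at 16 kHz holds 64 ms of audio, so processing one chunk must take less than 64 ms on the target hardware. The sketch below measures this as a real-time factor (processing time divided by audio duration); real_time_factor is a hypothetical helper, and the random test frames and placeholder lambda are assumptions to be replaced with real recordings and the DNS model call.

```python
import time

import numpy as np


def real_time_factor(process_frame, frames, rate=16000):
    """Return (mean, worst) ratio of processing time to frame duration.

    Values below 1.0 mean a frame is processed faster than it is captured;
    the worst case must also stay below 1.0 to avoid input overflows.
    """
    ratios = []
    for frame in frames:
        frame_duration = len(frame) / rate  # seconds of audio in this frame
        start = time.perf_counter()
        process_frame(frame)
        elapsed = time.perf_counter() - start
        ratios.append(elapsed / frame_duration)
    return float(np.mean(ratios)), float(np.max(ratios))


if __name__ == "__main__":
    CHUNK, RATE = 1024, 16000
    print(f"Buffering latency per chunk: {CHUNK / RATE * 1000:.0f} ms")
    test_frames = [np.random.randn(CHUNK).astype(np.float32) * 0.1 for _ in range(50)]
    # Replace the lambda with a call into the DNS model to measure the real pipeline
    mean_rtf, worst_rtf = real_time_factor(lambda f: np.clip(f, -1, 1), test_frames, RATE)
    print(f"mean RTF {mean_rtf:.4f}, worst RTF {worst_rtf:.4f}")
```

The worst case matters more than the mean: a single slow chunk (for example, the first call, which often includes one-time setup costs) is enough to overflow the capture buffer.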
Example Implementation
Real-Time Noise Suppression Code Example
```python
import numpy as np
import pyaudio
import torch

from dns_model import DNSModel  # placeholder: use the model class of your DNS implementation

# Load DNS model
model = DNSModel()
model.load_state_dict(torch.load("dns_model.pth", map_location="cpu"))
model.eval()

# Audio configuration
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 1024  # 1024 samples at 16 kHz = 64 ms per block

# Initialize PyAudio
audio = pyaudio.PyAudio()

# Open input and output streams on the default devices
# (pass input_device_index=... to select the USB microphone explicitly)
input_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                          input=True, frames_per_buffer=CHUNK)
output_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                           output=True, frames_per_buffer=CHUNK)

print("Starting real-time noise suppression... (Ctrl+C to stop)")
try:
    while True:
        # Read one block; don't raise if the input buffer overflowed while the model was busy
        input_data = input_stream.read(CHUNK, exception_on_overflow=False)
        audio_array = np.frombuffer(input_data, dtype=np.int16)

        # Convert int16 PCM to float32 in [-1, 1)
        audio_array = audio_array.astype(np.float32) / 32768.0

        # Process with DNS model (add a batch dimension)
        input_tensor = torch.from_numpy(audio_array).unsqueeze(0)
        with torch.no_grad():
            output_tensor = model(input_tensor)

        # Clip to [-1, 1] before converting back to int16 so loud samples don't wrap around
        processed = np.clip(output_tensor.squeeze(0).numpy(), -1.0, 1.0)
        processed_audio = (processed * 32767).astype(np.int16)

        # Output processed audio
        output_stream.write(processed_audio.tobytes())
except KeyboardInterrupt:
    print("Stopping noise suppression...")
finally:
    # Close streams and release PortAudio even if an error occurred
    input_stream.stop_stream()
    input_stream.close()
    output_stream.stop_stream()
    output_stream.close()
    audio.terminate()
```
Testing and Optimization
- Test in Noisy Environment:
- Use recorded noise samples or real-world factory environments for testing.
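- Optimize for Low Latency:
- Reduce model size and inference time using quantization (e.g., PyTorch dynamic quantization to INT8).
- Use a smaller CHUNK where the model allows it: each 1024-sample chunk at 16 kHz adds 64 ms of buffering delay.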
- Evaluate Metrics:
- Measure signal-to-noise ratio (SNR) improvement and latency.
- Deploy on Embedded Systems:
- Deploy on hardware such as an NVIDIA Jetson Nano, a Jetson Xavier, or an Intel NUC for better inference performance.
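For the noisy-environment tests and the SNR metric above, one common approach is to mix a clean speech recording with recorded noise (for example, factory noise) at a known SNR, run the mix through the model, and compare SNR before and after. The NumPy sketch below assumes time-aligned float arrays at the same sample rate and a clean reference, which only exists for such synthetic mixtures; snr_db, mix_at_snr, and snr_improvement_db are hypothetical helper names.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """SNR in dB of estimate relative to the clean reference."""
    reference = np.asarray(reference, dtype=np.float64)
    noise = np.asarray(estimate, dtype=np.float64) - reference
    return 10 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale noise so that clean + noise has the requested SNR and return the mix."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so clean_power / (scale**2 * noise_power) == 10**(target_snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (target_snr_db / 10)))
    return clean + scale * noise


def snr_improvement_db(clean, noisy, processed):
    """SNR gain from noise suppression (positive means less residual noise)."""
    return snr_db(clean, processed) - snr_db(clean, noisy)
```

Because snr_db treats any deviation from the clean signal as noise, speech distortion introduced by the model lowers the score as well, so listening tests remain useful alongside the number.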
Expected Results
- Significant noise reduction in real-time.
- Improved voice clarity, suitable for factory or office environments.
- Efficient performance on embedded hardware.
This implementation provides a foundation for developing an effective noise suppression system on Ubuntu using Deep Noise Suppression methods.