Development Method and Example for Implementing Deep Noise Suppression on Ubuntu with an Embedded Microphone System (USB)


Objective

The goal is to implement a noise suppression solution for a USB-connected embedded microphone system on an Ubuntu-based platform, using a Deep Noise Suppression (DNS) model. This involves real-time audio processing that improves voice clarity by reducing environmental noise.


Development Method

1. Setting Up the Development Environment

  1. Install Ubuntu:
    • Use Ubuntu 20.04 or newer for compatibility with audio libraries and tools.
  2. Install Required Libraries and Tools:
    • Audio Libraries:
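      • PortAudio: For cross-platform audio input/output (accessed from Python through the PyAudio bindings).
      • SoundFile (the soundfile package, formerly PySoundFile): For reading and writing audio files.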
      • Librosa: For audio preprocessing.
    • DNS Framework:
      • Microsoft’s DNS Challenge pre-trained models or alternatives like RNNoise or NVIDIA Riva.
    • Python Environment:
      • Install Python 3.x and required dependencies using pip:
        sudo apt update
        sudo apt install python3 python3-pip portaudio19-dev
        pip3 install numpy scipy soundfile librosa torch torchaudio pyaudio
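
Before moving on, it can help to confirm that the packages installed above are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED list are illustrative and not part of any of the libraries.

```python
import importlib.util

# Module names taken from the pip command above (illustrative list).
REQUIRED = ["numpy", "scipy", "soundfile", "librosa", "torch", "torchaudio"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running the script prints the modules that still need to be installed, if any.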
        

2. Connect and Configure the USB Microphone

  1. Verify USB Microphone Connection:
    • Use lsusb to ensure the microphone is detected:
      lsusb
      
    • Check the audio input device with:
      arecord -l
      
  2. Configure PulseAudio or ALSA:
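    • Use PulseAudio to set the USB microphone as the default input device, for example with pactl set-default-source followed by a source name taken from pactl list short sources.
    • Test recording (replace 1,0 with the card and device numbers reported by arecord -l; stop with Ctrl+C):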
      arecord -D plughw:1,0 -f cd test.wav
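
The card and device numbers differ between machines, so it can be convenient to derive the plughw string from the arecord -l listing instead of hard-coding it. The sketch below assumes the usual "card N: ... [Name], device M: ..." line layout; the function names and the sample text in the usage note are illustrative only.

```python
import re
import subprocess

# Matches lines such as:
# card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
_CARD_LINE = re.compile(r"^card (\d+): .*?\[(.*?)\], device (\d+):", re.MULTILINE)


def parse_capture_devices(arecord_output):
    """Extract card names and ALSA device strings from `arecord -l` text."""
    devices = []
    for match in _CARD_LINE.finditer(arecord_output):
        card = int(match.group(1))
        device = int(match.group(3))
        devices.append({"name": match.group(2),
                        "alsa_device": f"plughw:{card},{device}"})
    return devices


def list_capture_devices():
    """Run `arecord -l` and parse its output (requires alsa-utils)."""
    result = subprocess.run(["arecord", "-l"], capture_output=True, text=True)
    return parse_capture_devices(result.stdout)
```

For example, passing the illustrative line "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]" to parse_capture_devices yields one entry whose alsa_device is plughw:1,0, which can then be used with arecord -D.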
      

3. Preprocessing Audio Input

  • Sample Rate Adjustment:
    • Use Librosa or PySoundFile to resample the audio to the desired rate (e.g., 16 kHz for DNS models).
    • Example:
      import soundfile as sf
      import librosa
      
      audio, sr = librosa.load('input.wav', sr=16000)  # Load as mono and resample to 16 kHz
      sf.write('resampled.wav', audio, sr)
      
  • Normalize Audio:
    • Scale the signal to a consistent peak or RMS level so that the input level does not vary from one recording to the next.
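
As one possible way to do this, the sketch below applies simple peak normalization with NumPy. The function name peak_normalize and the 0.9 target are assumptions chosen for illustration; a particular DNS model may expect a different level or RMS-based scaling.

```python
import numpy as np


def peak_normalize(audio, target_peak=0.9):
    """Scale audio so its largest absolute sample equals target_peak."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Silent (or empty) input: nothing to scale, avoid division by zero.
        return audio.copy()
    return audio * (target_peak / peak)
```

It could be called on the array returned by librosa.load before the audio is passed to the model.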

4. Integrate Deep Noise Suppression

  1. Load the Pre-Trained DNS Model:
    • Use PyTorch or TensorFlow to load a DNS model. Example with PyTorch:
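      import torch
      from dns_model import DNSModel  # Placeholder: substitute your own model class

      model = DNSModel()
      # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine
      model.load_state_dict(torch.load("dns_model.pth", map_location="cpu"))
      model.eval()  # Inference mode: disables dropout and batch-norm updates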
      
  2. Process the Audio Input:
    • Pass the recorded audio through the DNS model:
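      def suppress_noise(model, audio):
          # Cast to float32: NumPy audio is often float64, while model weights are float32
          input_tensor = torch.as_tensor(audio, dtype=torch.float32).unsqueeze(0)  # Add batch dimension
          with torch.no_grad():
              output = model(input_tensor)
          return output.squeeze(0).cpu().numpy()  # Remove batch dimension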
      
  3. Save or Stream Processed Audio:
    • Save the noise-suppressed audio:
      processed_audio = suppress_noise(model, audio)
      sf.write('output.wav', processed_audio, 16000)
      
    • Stream processed audio back to the system using PortAudio or ALSA.
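
When audio is processed in pieces instead of as one whole file, feeding the model isolated chunks can leave audible discontinuities at chunk boundaries. One common remedy, which is not part of the original text, is overlap-add: process half-overlapping frames and blend them with a Hann window. The sketch below assumes the denoiser returns a frame of the same length as its input; denoise_fn is a hypothetical stand-in for a call such as suppress_noise(model, frame).

```python
import numpy as np


def process_overlap_add(audio, denoise_fn, frame_len=1024):
    """Denoise audio in 50%-overlapping frames and cross-fade the results.

    frame_len must be even. A periodic Hann window sums to 1 at 50% overlap,
    so an identity denoise_fn reproduces the input.
    """
    audio = np.asarray(audio, dtype=np.float32)
    hop = frame_len // 2
    n = np.arange(frame_len)
    window = (0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)).astype(np.float32)

    # Pad one hop at the start and enough at the end that every sample
    # of the original signal is covered by exactly two frames.
    n_frames = (len(audio) + hop - 1) // hop + 1
    padded = np.zeros((n_frames + 1) * hop, dtype=np.float32)
    padded[hop:hop + len(audio)] = audio

    out = np.zeros_like(padded)
    for i in range(n_frames):
        start = i * hop
        frame = padded[start:start + frame_len]
        out[start:start + frame_len] += window * denoise_fn(frame)
    return out[hop:hop + len(audio)]
```

The trade-off is roughly twice the model calls per second of audio, in exchange for smoother transitions between frames.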

5. Real-Time Implementation

  • Implement a real-time pipeline to capture, process, and output audio:
    1. Capture audio using PortAudio.
    2. Pass audio frames through the DNS model.
    3. Output processed audio to the speaker or save it.
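
For this pipeline to run in real time, each chunk must be processed in less time than it takes to record. The sketch below, using only the standard library, computes the chunk duration and a real-time factor (processing time divided by chunk duration); the function names are illustrative, and process_fn stands in for whatever per-chunk processing is used.

```python
import time


def chunk_duration_ms(chunk_size, sample_rate):
    """Duration of one chunk in milliseconds (also its buffering delay)."""
    return 1000.0 * chunk_size / sample_rate


def real_time_factor(process_fn, chunk, chunk_size, sample_rate, repeats=20):
    """Average processing time per chunk divided by the chunk duration.

    A value below 1.0 means processing keeps up with the incoming audio.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        process_fn(chunk)
    elapsed_per_chunk = (time.perf_counter() - start) / repeats
    return elapsed_per_chunk / (chunk_size / sample_rate)
```

With a 1024-sample chunk at 16 kHz, chunk_duration_ms returns 64.0, so the model has at most 64 ms per chunk before the output stream starts to underrun.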

Example Implementation

Real-Time Noise Suppression Code Example

import pyaudio
import torch
import numpy as np
from dns_model import DNSModel

# Load DNS model (DNSModel and dns_model.pth are placeholders for your own model)
model = DNSModel()
model.load_state_dict(torch.load("dns_model.pth", map_location="cpu"))
model.eval()

# Audio configuration
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 1024

# Initialize PyAudio
audio = pyaudio.PyAudio()

# Open input and output streams
input_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
output_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, output=True, frames_per_buffer=CHUNK)

print("Starting real-time noise suppression...")

try:
    while True:
        # Read audio input; do not raise if a buffer overflow drops frames
        input_data = input_stream.read(CHUNK, exception_on_overflow=False)
        audio_array = np.frombuffer(input_data, dtype=np.int16)

        # Convert to float32 and normalize
        audio_array = audio_array.astype(np.float32) / 32768.0

        # Process with DNS model
        input_tensor = torch.tensor(audio_array).unsqueeze(0)
        with torch.no_grad():
            output_tensor = model(input_tensor)

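        # Convert back to int16, clipping first so out-of-range samples do not wrap around
        output_array = output_tensor.squeeze(0).numpy()
        processed_audio = (np.clip(output_array, -1.0, 1.0) * 32767).astype(np.int16)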

        # Output processed audio
        output_stream.write(processed_audio.tobytes())

except KeyboardInterrupt:
    print("Stopping noise suppression...")
finally:
    # Close streams and release PortAudio even if an error occurred
    input_stream.stop_stream()
    input_stream.close()
    output_stream.stop_stream()
    output_stream.close()
    audio.terminate()

Testing and Optimization

  1. Test in Noisy Environment:
    • Use recorded noise samples or real-world factory environments for testing.
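  2. Optimize for Low Latency:
    • Reduce inference time by shrinking the model with quantization (e.g., PyTorch quantization to INT8), and keep the chunk size small: at 16 kHz, a 1024-sample chunk already adds 64 ms of buffering delay.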
  3. Evaluate Metrics:
    • Measure signal-to-noise ratio (SNR) improvement and latency.
  4. Deploy on Embedded Systems:
    • Deploy on embedded or small-form-factor hardware such as NVIDIA Jetson Nano, Jetson Xavier, or an Intel NUC.
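
The SNR improvement mentioned above can be estimated when a clean reference recording is available, for example by mixing recorded noise into clean speech. The sketch below is one simple way to compute it with NumPy; it assumes the clean, noisy, and processed signals are time-aligned and of equal length, and the function names are illustrative.

```python
import numpy as np


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the residual noise."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean((estimate - clean) ** 2)
    # Floor the noise power to avoid division by zero for a perfect estimate.
    return 10.0 * np.log10(signal_power / max(noise_power, 1e-12))


def snr_improvement_db(clean, noisy, processed):
    """How many dB the processed signal gains over the noisy input."""
    return snr_db(clean, processed) - snr_db(clean, noisy)
```

As a sanity check, halving the noise amplitude while leaving the speech untouched gives an improvement of about 6.02 dB. Models that delay or rescale the signal would need alignment and gain matching before this measure is meaningful.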

Expected Results

  • Significant noise reduction in real-time.
  • Improved voice clarity, suitable for factory or office environments.
  • Efficient performance on embedded hardware.

This implementation provides a foundation for developing an effective noise suppression system on Ubuntu using Deep Noise Suppression methods.