
Spectrum & Spectrogram

2022-11-28 12:47 Author: 絕代膠片

Find the Spectrogram.

To understand a spectrogram, we first need to know what a spectrum is. The spectrum is the set of frequencies that make up a specific signal. The lowest frequency in a signal is called the fundamental frequency. Frequencies that are whole-number multiples of the fundamental frequency are known as harmonics. The spectrum of a signal (especially a non-periodic one) changes with time. Therefore, the common approach is to find the spectrum of one small, fixed-length section of the signal at a time. This is repeated until we have traversed the entire sampled signal. The spectra of all the individual sections are then stacked together, and that gives us the spectrogram.
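As a minimal sketch of what "the set of frequencies that make up a signal" means, the example below builds a test tone from a 100 Hz fundamental and two harmonics, then recovers those frequencies with NumPy's FFT. The helper name `dominant_frequencies`, the sample rate and the amplitudes are illustrative assumptions, not part of the original text.

```python
import numpy as np

def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest frequencies (in Hz), sorted ascending."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    strongest = np.argsort(magnitudes)[-count:]  # indices of the largest bins
    return np.sort(freqs[strongest])

sample_rate = 8000                        # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of sample times

# Fundamental at 100 Hz plus harmonics at 200 Hz and 300 Hz.
signal = (1.00 * np.sin(2 * np.pi * 100 * t)
          + 0.50 * np.sin(2 * np.pi * 200 * t)
          + 0.25 * np.sin(2 * np.pi * 300 * t))

peaks = dominant_frequencies(signal, sample_rate, count=3)
print(peaks)
```

Because the signal lasts exactly one second, the FFT bins are 1 Hz apart and each component falls exactly on a bin.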

Another key thing to remember is that the spectrum of a signal is found by taking the Fourier Transform of the signal in the time domain. The usual approach is to divide the sampled signal into equal parts (as mentioned above) and take the Fourier Transform of each part individually. This is called the Short-Time Fourier Transform (STFT). Therefore, when we want to take the STFT of a signal, we need to specify how many samples to consider at a time.
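The procedure described here can be sketched in a few lines of NumPy. This is a simplified illustration, not a reference implementation: the function name `stft`, the Hann window and the default of non-overlapping frames are assumptions made for the example (audio libraries usually use overlapping frames and padding).

```python
import numpy as np

def stft(signal, n_fft, hop_length=None):
    """Split `signal` into frames of n_fft samples and FFT each frame.

    Assumes len(signal) >= n_fft. Trailing samples that do not fill a
    whole frame are dropped (no padding).
    """
    if hop_length is None:
        hop_length = n_fft  # non-overlapping equal parts
    window = np.hanning(n_fft)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack(
        [signal[i * hop_length: i * hop_length + n_fft] * window
         for i in range(n_frames)],
        axis=1,
    )
    # One column per frame, one row per frequency bin.
    return np.fft.rfft(frames, axis=0)

sample_rate = 16000
t = np.arange(4096) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)  # 1000 Hz test tone

spec = stft(tone, n_fft=512)
magnitude = np.abs(spec)
print(magnitude.shape)
```

Each column of `magnitude` is the spectrum of one section of the signal, so the whole array is a (linear-scale) spectrogram.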

A spectrogram is represented as a matrix.

The size of the spectrogram is [(n_fft / 2) + 1, number of frames], where n_fft is the number of samples in each frame.
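The size formula can be checked with a little arithmetic. The sketch below assumes frames are taken every `hop_length` samples with no padding; `hop_length` and the function name are hypothetical here, and libraries that pad the signal will report a slightly different frame count.

```python
def spectrogram_shape(n_samples, n_fft, hop_length):
    """Return (frequency bins, frames) for an unpadded STFT."""
    n_freq_bins = n_fft // 2 + 1  # a real-input FFT keeps only non-negative frequencies
    n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_freq_bins, n_frames

# One second of audio at an assumed 16 kHz sample rate.
shape = spectrogram_shape(n_samples=16000, n_fft=512, hop_length=256)
print(shape)  # (257, 61)
```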

Transform the spectrogram output to a logarithmic scale

The change of scale is done so that we get a spectrogram from which we can visually infer meaningful information. A spectrogram on a linear scale is hard to read, as our hearing range is limited to a small set of frequencies and amplitudes.

How do humans perceive the frequencies in a sound? We perceive frequencies on a logarithmic scale rather than a linear scale. This means we can easily tell the difference between lower frequencies (such as 100 Hz and 200 Hz) but can hardly tell the difference between higher frequencies (such as 10000 Hz and 10100 Hz). In both cases the difference in frequency is 100 Hz, but to the human ear the 100 Hz-200 Hz pair sounds farther apart than the 10000 Hz-10100 Hz pair. Looking at it differently, in the 100 Hz-200 Hz pair the second frequency is double the first, whereas in the 10000 Hz-10100 Hz pair the second frequency is only 1% higher than the first. Therefore, we can say that we hear frequencies on a logarithmic scale rather than a linear scale.
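One way to put numbers on this is to measure each interval in octaves, that is, the base-2 logarithm of the frequency ratio. The sketch below compares the 100 Hz-200 Hz pair with a pair that is also 100 Hz apart but starts at 10000 Hz (10000 Hz and 10100 Hz). The function name is an illustrative assumption.

```python
import math

def interval_in_octaves(f_low, f_high):
    """Size of the interval between two frequencies, in octaves."""
    return math.log2(f_high / f_low)

low_pair = interval_in_octaves(100, 200)       # frequency doubles: one octave
high_pair = interval_in_octaves(10000, 10100)  # frequency rises by 1%

print(low_pair, high_pair)
```

The first interval is a full octave, while the second is only a small fraction of one, even though both pairs differ by the same 100 Hz.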

How do humans perceive the amplitude of a sound? We perceive the amplitude of a sound as its loudness, and we hear loudness logarithmically rather than linearly. This is accounted for with the decibel (dB) scale, which expresses a level relative to a reference: 0 dB is the reference level (for sound pressure level, roughly the threshold of hearing). 10 dB corresponds to 10 times the power of 0 dB, 20 dB to 100 times, 30 dB to 1000 times, and so on.
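The decibel values above follow from taking ten times the base-10 logarithm of a power ratio. A minimal sketch (the function name is an assumption for illustration):

```python
import math

def power_ratio_to_db(ratio):
    """Convert a power ratio (relative to the reference) to decibels."""
    return 10.0 * math.log10(ratio)

levels = [power_ratio_to_db(r) for r in (1, 10, 100, 1000)]
print(levels)
```

Each tenfold increase in power adds 10 dB, which is why a multiplicative change becomes an additive one on this scale.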

Mel Spectrogram

The mel scale is a non-linear transformation of the frequency scale based on the perception of pitch. It is constructed so that any two pairs of frequencies separated by the same delta on the mel scale are perceived by humans as being equally far apart.
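Several formulas for the mel scale are in use. The sketch below assumes one common variant, m = 2595 * log10(1 + f / 700); the text does not say which variant it has in mind, so treat the constants as an assumption. It shows that equal steps in mel correspond to increasingly wide steps in Hz.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz to mel, using the assumed 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Six points equally spaced on the mel scale between 0 Hz and 8000 Hz.
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 6)
hz_points = mel_to_hz(mel_points)
hz_steps = np.diff(hz_points)  # widths in Hz of the equal mel steps

print(np.round(hz_points, 1))
print(np.round(hz_steps, 1))
```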

In machine learning applications involving speech and audio, we typically want to represent the power spectrogram in the mel scale domain. We do that by applying a bank of overlapping triangular filters (called a mel filter bank), each of which computes the energy of the spectrum in one band.
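A rough sketch of such a filter bank is shown below. It assumes the 2595 * log10(1 + f / 700) mel formula, filters that span 0 Hz to half the sample rate, and no area normalization; real libraries differ in these details, and all names and parameter values here are illustrative.

```python
import numpy as np

def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(sample_rate, n_fft, n_mels):
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # n_mels + 2 edges, equally spaced in mel, give n_mels overlapping triangles.
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    )
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

bank = mel_filter_bank(sample_rate=16000, n_fft=512, n_mels=40)

# Stand-in power spectrogram: 257 frequency bins, 10 frames of random data.
rng = np.random.default_rng(0)
power_spec = rng.random((257, 10))

mel_spec = bank @ power_spec  # weighted energy per mel band and frame
print(bank.shape, mel_spec.shape)
```

The matrix product replaces the 257 linear frequency bins of each frame with 40 mel-band energies, leaving the number of frames unchanged.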

The shape of the mel spectrogram is [number of mel bands, number of frames], where each frame is computed from frame_size samples and frame_size is the number of FFT components (n_fft).

Log Mel Spectrogram

To move from the power (mel) spectrogram to the log mel spectrogram, we apply a logarithm so that amplitude is expressed on a log scale (decibels). While doing so, we also normalize the spectrogram so that its maximum represents the 0 dB point.
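This step can be sketched as follows. The function name and the small floor `amin` (used to avoid taking the logarithm of zero) are assumptions for the example; the input stands in for a tiny mel power spectrogram.

```python
import numpy as np

def power_to_db_max_ref(power, amin=1e-10):
    """Convert power values to dB, with the largest value mapped to 0 dB."""
    power = np.asarray(power, dtype=float)
    db = 10.0 * np.log10(np.maximum(power, amin))  # floor avoids log10(0)
    return db - db.max()                            # shift so the maximum is 0 dB

# Toy 2 x 2 "mel power spectrogram" with values a factor of 10 apart.
mel_power = np.array([[1.0, 10.0],
                      [100.0, 1000.0]])

log_mel = power_to_db_max_ref(mel_power)
print(log_mel)
```

After this normalization every value is at or below 0 dB, and each factor of 10 in power shows up as a 10 dB step.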
