
[Speech Recognition] An intelligent voice-recognition access control system: MATLAB source code with GUI

2022-04-04 08:42 Author: Matlab工程師

I. Introduction

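This article designs and implements a text-dependent voiceprint recognition system in Matlab that can determine a speaker's identity.
1 System principles
a. Voiceprint recognition
In recent years, as artificial intelligence has developed, many mobile apps have introduced a voiceprint lock feature, which relies mainly on voiceprint recognition. Voiceprint recognition, also called speaker recognition, is not the same as speech recognition: speech recognition determines what was said, whereas speaker recognition determines who said it.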

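b. Mel frequency cepstral coefficients (MFCC)
Mel frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) are among the most commonly used features in speech signal processing.
Experiments show that the human ear behaves like a filter bank and attends only to certain frequency components of the spectrum. Perceived pitch does not vary linearly with frequency in Hz, but it is approximately linear on the Mel scale.
MFCCs take these auditory characteristics into account: the linear spectrum is first mapped onto the perceptually motivated, nonlinear Mel spectrum and then converted to the cepstrum. The relation for converting ordinary frequency to Mel frequency is: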
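A widely used form of the Hz-to-Mel mapping, with $f$ in Hz, is given here. It is assumed to be the relation the text refers to; variants with slightly different constants also exist.

```latex
\mathrm{Mel}(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right)
```

As a quick check, $f = 1000$ Hz gives $2595 \log_{10}(1 + 1000/700) \approx 1000$ mel, so the two scales roughly coincide near 1 kHz and the Mel scale compresses higher frequencies.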

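c. Vector quantization (VQ)
The system uses vector quantization to compress the MFCC features extracted from the speech.
Vector quantization (VQ) is a lossy data compression method based on block coding. Quantization steps also appear in multimedia compression formats such as JPEG and MPEG-4, although baseline JPEG quantizes each DCT coefficient individually, while MPEG-4 Audio includes a vector-quantization tool (TwinVQ). The basic idea of VQ is to group several scalar values into a vector and quantize that vector as a whole in vector space, which compresses the data while losing little information.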
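To make the idea concrete, here is a minimal sketch of quantizing a few vectors against a fixed codebook. The data, the two-entry codebook, and all variable names are made up for illustration and are not taken from the article's system.

```matlab
% Toy vector quantization: map each 2-D vector to its nearest codeword.
% X and codebook are invented values for illustration only.
X = [0.9 1.1; 1.2 0.8; 4.8 5.1; 5.2 4.9];   % one feature vector per row
codebook = [1 1; 5 5];                        % K = 2 codewords

% Squared Euclidean distance from every vector to every codeword (N-by-K).
N = size(X, 1);
K = size(codebook, 1);
D = zeros(N, K);
for k = 1:K
    diffs = X - repmat(codebook(k, :), N, 1);
    D(:, k) = sum(diffs .^ 2, 2);
end

[minDist, idx] = min(D, [], 2);   % idx is the compressed representation
Xq = codebook(idx, :);            % reconstruction from the indices
distortion = mean(minDist);       % mean squared quantization error
```

Each row of `X` is stored as a single index into the codebook, and `distortion` measures how much information that replacement loses.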
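3 System structure
The overall structure of the system is shown in the figure below:
- Training process
The speech signal is first preprocessed, then MFCC feature parameters are extracted and compressed with vector quantization, which yields a codebook for the speaker's utterance. The same speaker says the same content several times, the training process is repeated each time, and the result is a codebook library.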
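The article does not say how the codebook is trained. One common choice is a plain k-means (Lloyd) iteration, sketched below under that assumption. The `features` matrix is synthetic stand-in data in place of real MFCC frames, and the codebook size `K` is arbitrary.

```matlab
% Sketch of codebook training by k-means (Lloyd iterations).
% 'features' stands in for an N-by-13 matrix of MFCC frames; it is
% synthetic here so that the script runs on its own.
rng(0);                                          % reproducible toy data
features = [randn(50, 13); randn(50, 13) + 6];   % two well-separated groups
K = 2;                                           % codebook size (assumed)
maxIter = 20;

% Initialise the codewords with K evenly spaced frames.
N = size(features, 1);
codebook = features(round(linspace(1, N, K)), :);

for iter = 1:maxIter
    % Assignment step: nearest codeword for every frame.
    D = zeros(N, K);
    for k = 1:K
        diffs = features - repmat(codebook(k, :), N, 1);
        D(:, k) = sum(diffs .^ 2, 2);
    end
    [~, idx] = min(D, [], 2);

    % Update step: each codeword becomes the mean of its assigned frames.
    newCodebook = codebook;
    for k = 1:K
        members = features(idx == k, :);
        if ~isempty(members)
            newCodebook(k, :) = mean(members, 1);
        end
    end

    if isequal(newCodebook, codebook)
        break;                                   % codebook stopped changing
    end
    codebook = newCodebook;
end
```

Repeating this for each enrollment utterance and storing the resulting `codebook` matrices would give the codebook library described above.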
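- Recognition process
During recognition, the speech signal is likewise preprocessed and its MFCC features are extracted, and the Euclidean distance between these features and the codebooks in the training library is computed. If the distance is below a threshold, the speaker and the spoken content are taken to match those in the trained codebook library, and the match succeeds.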
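The matching step can be sketched as follows, assuming the score is the average Euclidean distance from each test frame to its nearest codeword. The codebook, test features, and `threshold` are hypothetical values; a real threshold would have to be tuned on recorded data.

```matlab
% Sketch of the matching step: average distance from the test frames to
% their nearest codeword, compared with a threshold. All values are made up.
codebook  = [0 0; 4 4];                      % enrolled speaker's codebook
testFeats = [0.1 -0.1; 3.9 4.2; 0.2 0.1];    % frames of the test utterance
threshold = 0.5;                             % hypothetical; tune on real data

N = size(testFeats, 1);
K = size(codebook, 1);
D = zeros(N, K);
for k = 1:K
    diffs = testFeats - repmat(codebook(k, :), N, 1);
    D(:, k) = sqrt(sum(diffs .^ 2, 2));      % Euclidean distance
end
avgDist = mean(min(D, [], 2));               % average nearest-codeword distance

if avgDist < threshold
    result = 'Password correct';
else
    result = 'Password incorrect';
end
disp(result);
```

With several enrolled codebooks, the same score would be computed against each one and the smallest value compared with the threshold.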

4 Testing

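The tests show that "Password correct" is displayed only when both the speaker and the spoken content match the codebook library; otherwise "Password incorrect" is displayed. This implements the voiceprint lock function.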

II. Source code

function varargout = GUI(varargin)
% GUIDE-generated entry point for the interface.
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @GUI_OpeningFcn, ...
                   'gui_OutputFcn',  @GUI_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before GUI is made visible.
function GUI_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% varargin   command line arguments to GUI (see VARARGIN)

% Choose default command line output for GUI
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes GUI wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = GUI_OutputFcn(hObject, eventdata, handles)
% Get default command line output from handles structure
varargout{1} = handles.output;

% --- Executes on button press in trainrec.
function trainrec_Callback(hObject, eventdata, handles)
speaker_id = trainrec();
set(handles.train_current,'string','Hurraay,DONE!');
speaker_iden = sprintf('you re speaker number %d', speaker_id);
% set(handles.speaker,'string',speaker_iden);
set(handles.access,'BackgroundColor','blue');
set(handles.access,'string','YOU HAVE ACCESS, TRAIN COMMANDS NOW!');
% if access_ == 1
%     set(handles.access,'string','YOU HAVE ACCESS, TRAIN COMMANDS NOW!');
% else
%     set(handles.access,'string','YOU DONT HAVE ACCESS,SPEAKER NOT RECOGNIZED!');
% end

% --- Executes on button press in command.
function command_Callback(hObject, eventdata, handles)
trai_pairs=30;
out_neurons=5;
hid_neurons=6;
in_nodes=13;
eata=0.1; emax=0.001; q=1; e=0; lamda=.7; t=1;
load backp.mat W V;            % network weights
recObj = audiorecorder;        % default settings: 8000 Hz, 8-bit, mono
Fs=8000;
Nseconds = 1;
while(1)
    fprintf('say any word immediately after hitting enter');
    input('');
    recordblocking(recObj, Nseconds);
    x = getaudiodata(recObj);
    [kk,g] = lpc(x,12);        % 12th-order LPC gives 13 coefficients
    Z=(kk);
    Z=double(Z);
    p1=max(Z);
    Z=Z/p1;                    % normalise by the largest coefficient
    for p=1:trai_pairs
        z=transpose(Z(p,:));
        % calculate output
        y=(tansig(V*(z)));
        o=(tansig(W*(y)));
        break                  % only the first (and only) row is used
    end
    b=o(1);
    c=o(2);
    d=o(3);
    e=o(4);
    f=o(5);
    a= max(o);                 % the largest output selects the command
    if (b==a )
        display('AHEAD');
        set(handles.ahead,'BackgroundColor','green');
        set(handles.command,'string','Ahead');
        pause(2);
    elseif (c== a)
        display('STOP');
        set(handles.stop,'BackgroundColor','green');
        set(handles.command,'string','Stop');
        pause(2);
    elseif (d== a)
        display('BACK');
        set(handles.back,'BackgroundColor','green');
        set(handles.command,'string','Back');
        pause(2);
    elseif (e==a)
        display('LEFT');
        set(handles.left,'BackgroundColor','green');
        set(handles.command,'string','Left');
        pause(2);
    elseif (f==a)
        display('RIGHT');
        set(handles.right,'BackgroundColor','green');
        set(handles.command,'string','Right');
        pause(2);
    end
    % Reset all indicators before the next recording
    set(handles.ahead,'BackgroundColor','white');
    set(handles.left,'BackgroundColor','white');
    set(handles.right,'BackgroundColor','white');
    set(handles.stop,'BackgroundColor','white');
    set(handles.back,'BackgroundColor','white');
end

function traincommands()
Fs=8000;
Nseconds = 1;
samp=6;                        % recordings per word
words=5;
recObj = audiorecorder;
aheaddir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\ahead\';
backdir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\back\';
stopdir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\stop\';
rightdir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\right\';
leftdir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\left\';
s_right = numel(dir([rightdir '*.wav']));

% Rows 1..samp: "ahead"
for i= 1:1:samp
    filename = sprintf('%ss%d.wav', aheaddir, i);
    fprintf('Reading %s\n', filename);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
% Rows samp+1..2*samp: "stop"
for i= (samp+1):1:2*samp
    filename = sprintf('%ss%d.wav', stopdir, i- samp);
    fprintf('Reading %s\n', filename);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
    %plot(s(i,:));
end
% Rows 2*samp+1..3*samp: "back"
for i= (2*samp+1):1:3*samp
    filename = sprintf('%ss%d.wav', backdir, i-2*samp);
    fprintf('Reading %s\n', filename);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
% Rows 3*samp+1..4*samp: "left"
for i= (3*samp+1):1:4*samp
    filename = sprintf('%ss%d.wav', leftdir, i-3*samp);
    fprintf('Reading %s\n', filename);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
% Rows 4*samp+1..5*samp: "right"
for i= (4*samp+1):1:5*samp
    filename = sprintf('%ss%d.wav', rightdir, i- 4*samp);
    fprintf('Reading %s\n', filename);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end

% Interleave the classes: ahead, stop, back, left, right, ahead, ...
S=zeros(1,13);
for i=1:1:samp
    S=cat(1,S,s(i,:));
    S=cat(1,S,s(samp+i,:));
    S=cat(1,S,s(2*samp+i,:));
    S=cat(1,S,s(3*samp+i,:));
    S=cat(1,S,s(4*samp+i,:));
end
S(1,:)=[];                     % drop the placeholder first row
save speechp.mat S

trai_pairs=30;                 % 5 words x 6 samples
out_neurons=5;                 % no of words
hid_neurons=6;                 % hidden-layer size
in_nodes=13;                   % features are 13
eata=0.1; emax=0.001; q=1; e=0; lamda=.7; t=1;
load speechp.mat S
p1=max(max(S));
s=S/p1;
Z= double(s);
% Target outputs: +1 for the spoken word, -1 for the others
dummy=[ 1 -1 -1 -1 -1;
       -1  1 -1 -1 -1;
       -1 -1  1 -1 -1;
       -1 -1 -1  1 -1;
       -1 -1 -1 -1  1];
t=trai_pairs/out_neurons;
D=dummy;
for i= 1:1:5
    D=cat(1,D,dummy);          % one target block per sample round
end
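For readers without the toolbox that provides `tansig`, the forward pass in `command_Callback` can be sketched with `tanh`, since `tansig(n) = 2/(1+exp(-2n)) - 1` is mathematically the same function. In this sketch `V`, `W`, and `z` are random placeholders: the listing loads trained weights from `backp.mat` and derives `z` from LPC coefficients of the recording.

```matlab
% Toolbox-free sketch of the forward pass used in command_Callback.
% V, W and z are random stand-ins, not trained weights or real features.
rng(1);
in_nodes = 13; hid_neurons = 6; out_neurons = 5;
V = randn(hid_neurons, in_nodes);      % input-to-hidden weights (placeholder)
W = randn(out_neurons, hid_neurons);   % hidden-to-output weights (placeholder)

z = randn(in_nodes, 1);                % stands in for normalised LPC features
y = tanh(V * z);                       % hidden layer; tansig(n) equals tanh(n)
o = tanh(W * y);                       % output layer, one value per command

labels = {'Ahead', 'Stop', 'Back', 'Left', 'Right'};
[~, k] = max(o);                       % the most active output picks the command
fprintf('Recognised command: %s\n', labels{k});
```

The label order follows the if/elseif chain in the listing, where `o(1)` to `o(5)` correspond to ahead, stop, back, left, and right.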

III. Results

