[Speech Recognition] An intelligent voice-recognition access-control system in MATLAB, with GUI and source code
一、 Introduction
This article designs and implements a text-dependent voiceprint recognition system in MATLAB that can determine the identity of the speaker.
1 System Principles
a. Voiceprint recognition
In the last couple of years, with advances in artificial intelligence, many mobile apps have introduced a voiceprint-lock feature, built mainly on voiceprint recognition technology. Voiceprint recognition, also called speaker recognition, differs slightly from speech recognition: speech recognition determines what was said, while speaker recognition determines who said it.

b. Mel-Frequency Cepstral Coefficients (MFCC)
Mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) are among the most widely used features in speech signal processing.
Experiments show that the human ear behaves like a bank of filters, attending only to certain frequency bands. Its perception of frequency is not linear on the ordinary frequency axis, but is approximately linear on the Mel scale.
MFCC extraction exploits this auditory property: the linear spectrum is first mapped onto the perceptually motivated, non-linear Mel spectrum and then transformed into the cepstral domain. The mapping from ordinary frequency f (in Hz) to Mel frequency is:

Mel(f) = 2595 · log10(1 + f / 700)

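As a quick numerical check (a Python sketch, not part of the MATLAB source), the conversion and its inverse can be written as:

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping: Mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Below roughly 1000 Hz the mapping is close to linear;
# above that it grows logarithmically.
print(hz_to_mel(1000))            # close to 1000 Mel by construction
print(mel_to_hz(hz_to_mel(440)))  # round-trips back to 440 Hz
```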
c. Vector Quantization (VQ)
This system uses vector quantization to compress the extracted MFCC features.
Vector quantization (VQ) is a lossy data-compression method based on block coding. In fact, VQ appears as a step in multimedia compression formats such as JPEG and MPEG-4. Its basic idea is to group several scalar values into a vector and quantize it as a whole in vector space, compressing the data while losing little information.
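To make the idea concrete, here is a minimal Python sketch of codebook training and quantization. It uses plain k-means (Lloyd's algorithm) in place of the LBG splitting procedure typically used in VQ systems, and all names are illustrative, not taken from the MATLAB source:

```python
import numpy as np

def train_codebook(vectors, k, iters=20):
    """Learn a k-entry VQ codebook with simple k-means (Lloyd's algorithm)."""
    vectors = np.asarray(vectors, dtype=float)
    # initialize with the first k vectors (LBG splitting would be used in practice)
    codebook = vectors[:k].copy()
    for _ in range(iters):
        # assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest codeword (the compression step)."""
    vectors = np.asarray(vectors, dtype=float)
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Storing only the codeword indices plus the small codebook, instead of the full feature vectors, is what yields the compression.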
3 System Structure
The overall structure of the system is shown in the figure below:
– Training
The speech signal is first pre-processed, then MFCC features are extracted and compressed with vector quantization, yielding a codebook for the speaker's utterance. The same speaker repeats the same phrase several times, the training procedure runs each time, and a codebook library is built up.
– Recognition
At recognition time the speech signal is pre-processed in the same way and MFCC features are extracted; the Euclidean distance between these features and the codebooks in the training library is then computed. If the distance falls below a threshold, the speaker and the spoken content are judged to match an entry in the codebook library, and the match succeeds.
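The matching rule described above can be sketched as follows (a hypothetical Python illustration; `codebooks`, `features`, and the threshold value are assumed names, not part of the MATLAB source):

```python
import numpy as np

def vq_distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks, threshold):
    """Return the best-matching speaker, or None if no codebook is close enough.

    codebooks: dict mapping a speaker name to a trained codebook array.
    """
    best_name, best_d = None, float("inf")
    for name, cb in codebooks.items():
        dist = vq_distortion(features, cb)
        if dist < best_d:
            best_name, best_d = name, dist
    # the threshold rejects impostors whose best match is still too far away
    return best_name if best_d < threshold else None
```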

4 Test Experiments




As the screenshots show, "password correct" is displayed only when both the speaker and the spoken content match the codebook library exactly; otherwise "password wrong" is displayed, realizing the voiceprint-lock functionality.
二、 Source Code
function varargout = GUI(varargin)
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @GUI_OpeningFcn, ...
                   'gui_OutputFcn',  @GUI_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Executes just before GUI is made visible.
function GUI_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% varargin - command line arguments to GUI (see VARARGIN)
% Choose default command line output for GUI
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
% UIWAIT makes GUI wait for user response (see UIRESUME)
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
function varargout = GUI_OutputFcn(hObject, eventdata, handles)
% Get default command line output from handles structure
varargout{1} = handles.output;
% --- Executes on button press in trainrec.
function trainrec_Callback(hObject, eventdata, handles)
speaker_id = trainrec();
set(handles.train_current,'string','Hurray, done!');
speaker_iden = sprintf('you''re speaker number %d', speaker_id);
% set(handles.speaker,'string',speaker_iden);
set(handles.access,'BackgroundColor','blue');
set(handles.access,'string','YOU HAVE ACCESS, TRAIN COMMANDS NOW!');
% if access_ == 1
%     set(handles.access,'string','YOU HAVE ACCESS, TRAIN COMMANDS NOW!');
% else
%     set(handles.access,'string','YOU DON''T HAVE ACCESS, SPEAKER NOT RECOGNIZED!');
% end
% --- Executes on button press in command.
function command_Callback(hObject, eventdata, handles)
trai_pairs  = 30;  % number of training pairs
out_neurons = 5;   % output neurons (one per command word)
hid_neurons = 6;   % hidden-layer neurons
in_nodes    = 13;  % input nodes (13 LPC features)
eata = 0.1; emax = 0.001; q = 1; e = 0; lamda = 0.7; t = 1;  % training constants
load backp.mat W V;  % trained network weight matrices
recObj = audiorecorder;
Fs = 8000;
Nseconds = 1;
while(1)
    fprintf('say any word immediately after hitting enter');
    input('');
    recordblocking(recObj, 1);   % record one second of speech
    x = getaudiodata(recObj);
    [kk,g] = lpc(x,12);          % 12th-order LPC -> 13 coefficients
    Z = double(kk);
    p1 = max(Z);
    Z = Z/p1;                    % normalize the feature vector
    for p = 1:trai_pairs
        z = transpose(Z(p,:));
        % forward pass through the trained two-layer network
        y = tansig(V*z);
        o = tansig(W*y);
        break   % a single utterance needs only one pass
    end

    b = o(1); c = o(2); d = o(3); e = o(4); f = o(5);
    a = max(o);   % the largest output neuron picks the command
    if (b == a)
        display('AHEAD');
        set(handles.ahead,'BackgroundColor','green');
        set(handles.command,'string','Ahead');
        pause(2);
    elseif (c == a)
        display('STOP');
        set(handles.stop,'BackgroundColor','green');
        set(handles.command,'string','Stop');
        pause(2);
    elseif (d == a)
        display('BACK');
        set(handles.back,'BackgroundColor','green');
        set(handles.command,'string','Back');
        pause(2);
    elseif (e == a)
        display('LEFT');
        set(handles.left,'BackgroundColor','green');
        set(handles.command,'string','Left');
        pause(2);
    elseif (f == a)
        display('RIGHT');
        set(handles.right,'BackgroundColor','green');
        set(handles.command,'string','Right');
        pause(2);
    end
    % reset all indicator buttons before the next loop iteration
    set(handles.ahead,'BackgroundColor','white');
    set(handles.left,'BackgroundColor','white');
    set(handles.right,'BackgroundColor','white');
    set(handles.stop,'BackgroundColor','white');
    set(handles.back,'BackgroundColor','white');
end
function traincommands()
Fs = 8000;
Nseconds = 1;
samp = 6;    % recordings per command word
words = 5;   % number of command words
recObj = audiorecorder;
aheaddir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\ahead\';
backdir  = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\back\';
stopdir  = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\stop\';
rightdir = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\right\';
leftdir  = 'C:\Users\Rezetane\Desktop\HRI Proj\Speech-Recognition-master\data\train_commands\left\';
s_right = numel(dir([rightdir '*.wav']));
for i = 1:samp
    filename = sprintf('%ss%d.wav', aheaddir, i);
    fprintf('Reading %ss%d ', aheaddir, i);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
for i = (samp+1):2*samp
    filename = sprintf('%ss%d.wav', stopdir, i-samp);
    fprintf('Reading %ss%d ', stopdir, i);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
    %plot(s(i,:));
end
for i = (2*samp+1):3*samp
    filename = sprintf('%ss%d.wav', backdir, i-2*samp);
    fprintf('Reading %ss%d ', backdir, i);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
for i = (3*samp+1):4*samp
    filename = sprintf('%ss%d.wav', leftdir, i-3*samp);
    fprintf('Reading %ss%d ', leftdir, i);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
for i = (4*samp+1):5*samp
    filename = sprintf('%ss%d.wav', rightdir, i-4*samp);
    fprintf('Reading %ss%d ', rightdir, i);
    [x,Fs] = audioread(filename);
    [s(i,:),g] = lpc(x,12);
end
S = zeros(1,13);
for i = 1:samp
    % interleave the five words so the target pattern repeats every five rows
    S = cat(1,S,s(i,:));
    S = cat(1,S,s(samp+i,:));
    S = cat(1,S,s(2*samp+i,:));
    S = cat(1,S,s(3*samp+i,:));
    S = cat(1,S,s(4*samp+i,:));
end
S(1,:) = [];   % drop the all-zero seed row
save speechp.mat S
trai_pairs  = 30;  % 30 training samples (6 per word)
out_neurons = 5;   % one output neuron per word
hid_neurons = 6;   % hidden-layer size
in_nodes    = 13;  % 13 LPC features per sample
eata = 0.1; emax = 0.001; q = 1; e = 0; lamda = 0.7; t = 1;
load speechp.mat S
p1 = max(max(S));
s = S/p1;          % normalize the feature matrix
Z = double(s);
% bipolar (+1/-1) target rows, one per command word
dummy = [ 1 -1 -1 -1 -1;
         -1  1 -1 -1 -1;
         -1 -1  1 -1 -1;
         -1 -1 -1  1 -1;
         -1 -1 -1 -1  1];

t = trai_pairs/out_neurons;
D = dummy;
for i = 1:5
    D = cat(1,D,dummy);
end
三、 Results

