

READ

2022-12-13 20:30 Author: 小狗麥兜不會丟

1 Hello everyone, my name is Sun Xinyu. The paper I will present today is titled "DeepEar: Sound Localization with Binaural Microphones". Next, I will talk about what I learned from studying this paper, speaking in the tone of the author.

2 For sound localization, most methods are based on microphone arrays and need at least 3 microphones. Therefore, if we only have 2 microphones, these methods cannot work anymore.

3 For example, if we measure only the time difference of arrival between these two microphones, what will happen? Then how can we locate sources with only two microphones? Consider the many humanoid service robots: when a user talks to a robot, the robot should figure out the voice direction with its ears and turn its head. If the device can detect the sound location, then the microphone in the ear can amplify the sound from that specific direction, which will substantially improve the sound quality when the user talks to others.
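To make the two-microphone limitation concrete, here is a minimal sketch of the classic time-difference-of-arrival estimate via cross-correlation. The signal values and the 3-sample delay are made-up test data, not from the paper; note that a single TDoA value cannot distinguish mirror-symmetric directions, which is exactly the problem the talk raises.

```python
import math

def tdoa_samples(left, right, max_lag):
    """Estimate the time difference of arrival (in samples) between two
    microphone channels as the lag that maximizes their cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example: a 1 kHz tone at 16 kHz, delayed by 3 samples on the right.
left = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(64)]
right = [0.0] * 3 + left[:-3]
print(tdoa_samples(left, right, 8))
```

With only this one number per microphone pair, every direction on the cone of equal delay looks identical, motivating the ear-inspired design below.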

4 So here we may ask a question: in this case, how do we locate the sound source with only 2 microphones? We observe that we as humans have the natural ability to locate multiple sounds easily. For example, you can easily follow the people talking around you. But we also have only two ears, just like two microphones. Why can we locate multiple people talking at the same time? It is because humans have ears and a brain.

5 So next, I will briefly introduce the mechanism behind this and then propose our design. Let's have a look at the human auditory system. Don't worry about these complex biological terms; I will give an intuitive understanding of them. First, the sound travels to the ear, and the ear imposes unique reflection patterns on sounds from different directions. Next, the cochlea, which you can see here shaped like a snail, transforms the sound into electrical stimulus signals. Then, along the neural path, many brainstem nuclei encode the signal into perception. After that, the auditory cortex in the human brain interprets this perception as the cognition of sound direction.

6 The signal collection devices are binaural microphones: we first use general microphones to record the sound and then use filters to emulate the ear's function. This reflection pattern is direction-dependent, and human beings learn this pattern to perform localization, as shown by the curves in these four figures.

7 This special pattern is direction-dependent, so a human brain can learn it to perform localization. To validate this argument, we use a loudspeaker to transmit excitation signals and measure the frequency response in the ear. As shown in the left figure, the responses differ significantly with and without ears, for both the left ear and the right ear. As for the cochlea, along its spiral shape, different parts vibrate in response to different frequencies.
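The direction-dependent reflection pattern the speaker describes can be imitated by convolving a mono signal with a per-direction impulse response. This is a generic sketch of that idea; the two filter taps are hypothetical placeholders, not measured ear responses from the paper.

```python
def apply_hrir(signal, hrir):
    """Convolve a mono signal with a (head-related) impulse response so the
    recording carries a direction-dependent reflection pattern."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(hrir):
            out[i + j] += s * h
    return out

# A unit impulse filtered by a toy 2-tap response for one direction.
print(apply_hrir([1.0, 0.0, 0.0], [0.5, 0.25]))
```

Each candidate direction would use a different impulse response, which is what makes the recorded spectrum informative about direction.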

8 In the feature extraction part, we use a VAE to compress and encode the signal, simulating the brainstem nuclei.

9 It actually works like an encoder. We know that an encoder can compress and encode data without much information loss, so we use a variational autoencoder to encode the data into high-level representations, like the brainstem nuclei do. It can extract the most representative features from the input automatically. The final step is location inference. Conventional methods usually use single-label classification, which means the model outputs only one result and locates only one sound source, so they cannot support multiple-sound scenarios.
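The two pieces that distinguish a variational autoencoder from a plain encoder can be sketched in a few lines: the reparameterization trick for sampling the latent code, and the closed-form KL regularizer against a standard normal prior. This is a generic VAE sketch, not the paper's actual network.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow
    through mu and log_var while sampling stays stochastic."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) term of the VAE loss, in closed form."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

z = reparameterize([0.0, 1.0], [0.0, 0.0])
print(len(z), kl_divergence([0.0, 0.0], [0.0, 0.0]))
```

The latent vector z is the compact "brainstem-like" representation that feeds the inference network.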

10 For this reason, we formulate multiple-sound localization as a multi-label classification problem. Specifically, we divide the space into several small sectors, and each output node refers to one sector. Then we train each output node independently, so our model can output multiple results simultaneously to support concurrent active sound sources. We can also increase the number of sectors to adjust the spatial resolution as needed. This multi-label classification network is used for sound detection, that is, to detect whether there is an active sound source in each sector.
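The multi-label formulation amounts to encoding each training example as one binary node per sector, where several nodes may be active at once. A minimal sketch, assuming a 360° plane split into equal sectors (the angles and sector count here are illustrative, not the paper's configuration):

```python
def sector_labels(source_angles_deg, n_sectors):
    """Encode active source directions as a multi-label target vector:
    one binary node per sector; several nodes may be 1 simultaneously."""
    width = 360.0 / n_sectors
    labels = [0] * n_sectors
    for angle in source_angles_deg:
        labels[int((angle % 360.0) // width)] = 1
    return labels

# Three sources, two of which fall into the same 45-degree sector.
print(sector_labels([10.0, 95.0, 100.0], 8))
```

Because each node is trained independently (e.g. with a per-node sigmoid), the network naturally reports any number of simultaneous sources.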

11 In addition, we design a multitask learning structure to jointly minimize the losses of sound detection, direction prediction, and distance estimation. Here is an overview of the DeepEar structure. We can see that the sounds of the left and right ears are passed through a gammatone filter bank. Then a variational autoencoder is used to encode the data into high-level features. We also introduce the cross-correlation between the two channels as part of the features, because it contains the time difference between the two ears, which is very important for sound localization. We perform a subtraction between the two feature vectors
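The joint objective can be sketched as a weighted sum of a detection term and, for sectors that truly contain a source, angle and distance regression terms. The field names, loss forms, and weights below are illustrative assumptions, since the talk does not give the exact formulation.

```python
import math

def multitask_loss(pred, target, w_det=1.0, w_aoa=1.0, w_dist=1.0):
    """Jointly penalize detection (binary cross-entropy), angle-of-arrival
    (squared error), and distance (squared error) per sector. Regression
    terms only count in sectors that truly contain a source."""
    eps = 1e-9
    loss = 0.0
    for p, t in zip(pred, target):
        loss += w_det * -(t["active"] * math.log(p["det"] + eps)
                          + (1 - t["active"]) * math.log(1 - p["det"] + eps))
        if t["active"]:
            loss += w_aoa * (p["aoa"] - t["aoa"]) ** 2
            loss += w_dist * (p["dist"] - t["dist"]) ** 2
    return loss

perfect = multitask_loss([{"det": 1.0, "aoa": 30.0, "dist": 1.5}],
                         [{"active": 1, "aoa": 30.0, "dist": 1.5}])
print(perfect)
```

Training all three heads against one shared feature extractor is what makes the detection, direction, and distance outputs mutually consistent.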

12 and then concatenate them all together as the final feature vector. We use some simple dense layers to construct the inference network. In this paper, we set the number of spatial sectors to eight, and for each sector we have a sector subnet accordingly. Each sector subnet includes three small nets: one to detect the sound source, one to predict the angle of arrival, and one to estimate the sound distance. These sector subnets share a similar structure. Now let's move to the evaluation part. We use the TU Berlin spatial sound datasets, which contain three rooms, including a lecture room and a meeting room. Considering that there are not too many active sound sources in real life, we set a maximum number of sound sources.

13 Then we select eighty percent of the data for model training. The model is tested on the remaining anechoic data, the meeting room data, and the lecture room data. We choose a baseline named WaveLoc, which is a raw-waveform-based end-to-end CNN approach. In anechoic environments, the average sound detection accuracy is ninety-three percent; if we only focus on the positive class, the Hamming score is eighty-three percent. The angle-of-arrival estimation error is about 7 degrees, and we also report the distance classification accuracy.
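The "Hamming score" the speaker cites for the positive class is commonly computed as the per-sample intersection-over-union of the predicted and true positive labels, averaged over samples. A small sketch of that metric, under the assumption that this standard definition is the one used:

```python
def hamming_score(y_true, y_pred):
    """Per-sample intersection-over-union of the positive labels, averaged
    over samples (stricter than bitwise accuracy when most sectors are empty)."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        t = {i for i, v in enumerate(t_row) if v}
        p = {i for i, v in enumerate(p_row) if v}
        total += 1.0 if not (t | p) else len(t & p) / len(t | p)
    return total / len(y_true)

# One sample: two true sectors, only one of them predicted.
print(hamming_score([[1, 0, 1, 0]], [[1, 0, 0, 0]]))
```

Unlike plain detection accuracy, this score cannot be inflated by correctly predicting the many empty sectors.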

14 We can see that the performance in the one-source scenario is the highest, and it decreases as the number of sources increases. In addition, DeepEar always outperforms the baseline in all cases. We also test the model on the meeting room data. Unfortunately, the sound detection accuracy drops to sixty-five percent, and the angle-of-arrival error increases to 16 degrees. This is within our expectation, because we train the model on anechoic data but test it on reverberant data, which are two totally different environments. So, to enable DeepEar to adapt to new environments, we explore a transfer learning strategy.

15 Specifically, we have already trained this global model with massive anechoic data, so we can keep the previous layers frozen and then fine-tune the model with data from the new environment. As the figure shows, the dark bar refers to the performance before transfer learning, and the light bar denotes the performance after transfer learning. After applying transfer learning, the sound detection accuracy increases again to ninety-two percent, which is almost the same as before, and the angle-of-arrival estimation error also decreases by half. Similarly, directly testing DeepEar on the lecture room data also shows a performance decrease, and transfer learning can effectively recover the system performance. We also evaluate the performance of transfer learning with different sizes of data. We can see that in both the meeting room and the lecture room,
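"Keep the previous layers frozen, fine-tune the rest" reduces, at each gradient step, to updating only the non-frozen parameters. A minimal sketch with a toy two-parameter model; the parameter names, gradients, and learning rate are all illustrative.

```python
def fine_tune_step(params, grads, frozen, lr=0.01):
    """One gradient step in which frozen layers keep their pre-trained
    weights and only the remaining layers are updated."""
    return {name: w if name in frozen else w - lr * grads[name]
            for name, w in params.items()}

params = {"encoder.w": 1.0, "head.w": 2.0}   # pre-trained weights
grads = {"encoder.w": 0.5, "head.w": 0.5}    # gradients from new-room data
print(fine_tune_step(params, grads, frozen={"encoder.w"}))
```

Freezing the early layers preserves the features learned from the massive anechoic data, which is why only a little new data is needed.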

16 only 2% of the data is enough to boost DeepEar's performance. This is because we already use massive anechoic data to train a global model, so DeepEar can adjust the feature space with only a small amount of new data and quickly adapt to the new environment. We also conduct a real-world study to evaluate the importance of ears for sound localization. As the left figure shows, we use a loudspeaker as the sound source, placed at 8 different locations around the binaural microphones. Then we implement a simple one-layer LSTM network to perform the sound localization task. We can see the confusion matrices with and without ears. When we detach the ears, the localization accuracy is only fifty-eight percent; obviously, the model can hardly distinguish the directions on the left side from those on the right side. Also,
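The confusion matrices in this study are the standard kind: rows for true direction classes, columns for predicted ones, with the symmetric confusions showing up as off-diagonal mass. A small sketch with made-up labels (the class indices here are illustrative, not the study's data):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true direction classes, columns are predicted ones;
    left-right or front-back confusions appear off the diagonal."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Toy example: one sample of class 1 (a left direction) is predicted
# as class 3 (its mirrored right direction).
print(confusion_matrix([0, 1, 1], [0, 1, 3], 4))
```

Without ears, such mirror-class cells dominate; with ears mounted, the mass collapses back onto the diagonal.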

17 it confuses the directions at the front side and the back side, so it suffers from the symmetric sound confusions we mentioned at the beginning of the talk. By contrast, the overall classification accuracy increases to ninety-two percent after mounting the ears, which means the ears indeed help to improve the localization accuracy. So here I want to give a take-away message of this talk: ears indeed play a significant role in sound localization and distance estimation. To conclude, we proposed DeepEar, the first sound localization system for binaural microphones that needs no prior knowledge of the number of sounds.

18 DeepEar is a bionic machine hearing framework inspired by the human auditory system. Of course, we can replace the encoder or the inference network with other powerful backbones like Transformers. Finally, with the transfer learning strategy, DeepEar can quickly adapt to new environments with a small amount of extra training data. OK, that's all for my presentation, and I welcome any questions. Thank you.



