最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

歡迎光臨散文網(wǎng) 會員登陸 & 注冊

如何用Trace32分析內(nèi)核死機(干貨~)

2022-07-24 14:02 作者:補給站Linux內(nèi)核  | 我要投稿

dmesg 初步分析


[ ?423.400073] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ ?423.400075] [silead finger_interrupt_handler 505]:S IRQ 19 , GPIO 12 state is 0
[ ?423.400083] [silead finger_interrupt_handler 506]:state is 0
[ ?423.400096] pgd = ffffffc0017eb000
[ ?423.400103] [00000008] *pgd=000000008ea0a003, *pud=000000008ea0a003, *pmd=000000008ea0b003, *pte=006000000b000707
[ ?423.400124] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[ ?423.400132] Modules linked in: wlan(O)
[ ?423.400148] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G ? ? ? ?W ?O ? 3.18.31-perf #1
[ ?423.400155] Hardware name: Qualcomm Technologies, Inc. MSM8940-PMI8950 MTP (DT)
[ ?423.400164] task: ffffffc04eae4980 ti: ffffffc0b28bc000 task.ti: ffffffc0b28bc000
[ ?423.400182] PC is at run_timer_softirq+0x4ac/0x5ec
[ ?423.400192] LR is at run_timer_softirq+0x324/0x5ec
[ ?423.400199] pc : [<ffffffc0000feb98>] lr : [<ffffffc0000fea10>] pstate: 600001c5
[ ?423.400204] sp : ffffffc0b28bfb60
...
[ ?423.401490] Process swapper/4 (pid: 0, stack limit = 0xffffffc0b28bc058)
[ ?423.401496] Call trace:
[ ?423.401510] [<ffffffc0000feb98>] run_timer_softirq+0x4ac/0x5ec
[ ?423.401523] [<ffffffc0000a6864>] __do_softirq+0x178/0x350
[ ?423.401532] [<ffffffc0000a6c8c>] irq_exit+0x74/0xb0
[ ?423.401543] [<ffffffc0000edf18>] __handle_domain_irq+0xb4/0xec
[ ?423.401553] [<ffffffc00008254c>] gic_handle_irq+0x54/0x84
[ ?423.401560] Exception stack(0xffffffc0b28bfd40 to 0xffffffc0b28bfe60)
...
[ ?423.401671] [<ffffffc000085da8>] el1_irq+0x68/0xd4
[ ?423.401685] [<ffffffc000851480>] cpuidle_enter_state+0xd0/0x224
[ ?423.401695] [<ffffffc0008516ac>] cpuidle_enter+0x18/0x20
[ ?423.401706] [<ffffffc0000e1cc0>] cpu_startup_entry+0x288/0x384
[ ?423.401717] [<ffffffc000091d5c>] secondary_start_kernel+0x108/0x114
[ ?423.401728] Code: b90052a0 34000840 f9400321 f9400720 (f9000420) 
[ ?423.401736] ---[ end trace d0daa1892c14378b ]---
[ ?423.401753] Kernel panic - not syncing: Fatal exception in interrupt
[ ?423.401774] CPU7: stopping

連上trace32,load vmlinux后通過list source看下匯編源碼混合顯示,如下:



  • FFFFFFC0000FE968: 當(dāng)前匯編指令的虛擬地址

  • F9000760: 匯編機器碼,ARM/ARM64的指令機器碼都是32位固定長度

  • str x0,[x27, #0x8]: 匯編指令,;后的是注釋

根據(jù)AAPCS(ARM二進(jìn)制過程調(diào)用標(biāo)準(zhǔn))參數(shù)傳遞規(guī)則,ARM64的 v0 - v7 參數(shù)直接由 x0 - x7 傳遞,其他參數(shù)由壓棧傳遞,子程序返回結(jié)果保存到x0。

那么這里可推導(dǎo)如下:

x0 == prev, x1 == next

指令:str x0 ,[x1,#0x8] x1+0x8 其實就是next+8個字節(jié)的偏移,看下:


struct list_head {
 ? ?struct list_head *next, *prev;
};

ARM體系結(jié)構(gòu)中,ARM64一個指針占8個字節(jié)內(nèi)存,也就是: [x1+0x8] == prev

所以這個str指令就是對應(yīng)上面的next->prev = prev

我們根據(jù)異常棧的寄存器值來看下:

[ ?423.400182] PC is at run_timer_softirq+0x4ac/0x5ec
[ ?423.400192] LR is at run_timer_softirq+0x324/0x5ec
[ ?423.400199] pc : [<ffffffc0000feb98>] lr : [<ffffffc0000fea10>] pstate: 600001c5
[ ?423.400204] sp : ffffffc0b28bfb60
[ ?423.400210] x29: ffffffc0b28bfb60 x28: ffffffc0b2619038 
[ ?423.400219] x27: ffffffc000c9a000 x26: 0000000000000000 
[ ?423.400228] x25: ffffffc001741120 x24: ffffffc0006e277c 
[ ?423.400237] x23: ffffffc0b2619000 x22: ffffffc0b28bfbf8 
[ ?423.400246] x21: ffffffc0b28bc000 x20: ffffffc0013d2000 
[ ?423.400254] x19: ffffffc0b2618000 x18: 0000007f9588e080 
[ ?423.400263] x17: 0000007f9a36d4b4 x16: ffffffc0001e4f6c 
[ ?423.400272] x15: 003b9aca00000000 x14: 0000000000000001 
[ ?423.400280] x13: 0000000000000000 x12: 0000000000000001 
[ ?423.400289] x11: 000000000000000f x10: ffffffc000c9c3f4 
[ ?423.400297] x9 : 0000000000000000 x8 : 0000000000000005 
[ ?423.400305] x7 : 0000000000000000 x6 : 000000000001451c 
[ ?423.400314] x5 : ffffffc0b2ae8000 x4 : 00135f1aa15bb200 
[ ?423.400323] x3 : 0000000000000018 x2 : 0000000000000000 
[ ?423.400331] x1 : 0000000000000000 x0 : ffffffc0b28bfbf8

上面可以看到,x1+0x8 ==0x0000000000000000+0x8==0x0000000000000008,這個和出錯時候報的地址一致“Unable to handle kernel NULL pointer dereference at virtual address 00000008”

因為ARM64內(nèi)核的虛擬地址空間范圍是0xFFFF_0000_0000_0000 =>0xFFFF_FFFF_FFFF_FFFF,很明顯這個值0x0000000000000008不在范圍內(nèi),它屬于用戶空間的虛擬地址空間,肯定會被MMU攔截掉上報data abort異常,所以此題的真正原因是程序跑飛訪問非法地址所致。

目前看來從kernel log上的信息無法直接分析出導(dǎo)致問題的具體源代碼,從dmesg的這些信息我們已經(jīng)知道出問題的是這個prev指針,但是比較難直接抓到導(dǎo)致異常的真兇源碼位置。


【文章福利】小編推薦自己的Linux內(nèi)核技術(shù)交流群:【891587639】整理了一些個人覺得比較好的學(xué)習(xí)書籍、視頻資料共享在群文件里面,有需要的可以自行添加哦?。。。ê曨l教程、電子書、實戰(zhàn)項目及代碼)? ? ?


Trace32 分析

利用dmesg我們分析了產(chǎn)生問題時候的來龍去脈,但是想要徹底解決還需要trace32進(jìn)一步分析。

輸入v.f,調(diào)出當(dāng)前的調(diào)用棧關(guān)系:



可以看到,異常時候的各種參數(shù)都顯示出來了,這樣就非常有利于我們debug了,這也是單純從dmesg無法得到的重要信息!注意inline類型的函數(shù)會被編譯器默認(rèn)優(yōu)化掉,所以inline類型的函數(shù)的參數(shù)不可見,需要通過讀匯編代碼,分析寄存器傳參推導(dǎo)。

輸入d.list 查看PC停止的位置,如下高亮:



分析Call Stack:

為方便查看,把調(diào)用棧信息復(fù)制出來,如下:


 1. ...
-007|die(
 ? ?| ??,
 ? ?| ? ?regs = 0xFFFFFFC0B28BFA40 -> (
 ? ?| ? ? ?user_regs = (regs = (0xFFFFFFC0B28BFBF8, 0x0, 0x0, 0x18, 0x00135F1AA15BB200, 0xFFFFFFC0B2AE800
 ? ?| ? ? ?regs = (0xFFFFFFC0B28BFBF8, 0x0, 0x0, 0x18, 0x00135F1AA15BB200, 0xFFFFFFC0B2AE8000, 0x0001451C
 ? ?| ? ? ?sp = 0xFFFFFFC0B28BFB60,
 ? ?| ? ? ?pc = 0xFFFFFFC0000FEB98,
 ? ?| ? ? ?pstate = 0x600001C5,
 ? ?| ? ? ?orig_x0 = 0xFFFFFFC0B2618000,
 ? ?| ? ? ?syscallno = 0xFFFFFFC0000FE7D0),
 ? ?| ? ?err = 0x96000046)
 ? ?| ?flags = 0x01C0
 ? ?| ?ret = 0x1
 ? ?| ?tsk = 0xFFFFFFC04EAE4980
 ? ?| ?die_counter = 0x1
-008|__do_kernel_fault.part.5(
 ? ?| ? ?mm = 0x0,
 ? ?| ? ?addr = 0x8,
 ? ?| ? ?esr = 0x96000046,
 ? ?| ? ?regs = 0xFFFFFFC0B28BFA40)
-009|do_page_fault(
 ? ?| ? ?addr = 0x8,
 ? ?| ? ?esr = 0x96000046,
 ? ?| ? ?regs = 0xFFFFFFC0B28BFA40)
 ? ?| ?tsk = 0xFFFFFFC04EAE4980
 ? ?| ?mm = 0x0
 ? ?| ?vm_flags = 0xFFFFFFC000C9A000
 ? ?| ?vma = 0xFFFFFFC0B28BFA40
-010|do_translation_fault(
 ? ?| ? ?addr = 0x0A44,
 ? ?| ? ?esr = 0x0124F800,
 ? ?| ??)
-011|do_mem_abort(
 ? ?| ? ?addr = 0x8,
 ? ?| ? ?esr = 0x96000046,
 ? ?| ? ?regs = 0xFFFFFFC0B28BFA40)
 ? ?| ?inf = 0xFFFFFFC0013DC790 -> (
 ? ?| ? ?fn = 0xFFFFFFC000099A74,
 ? ?| ? ?sig = 0x0B,
 ? ?| ? ?code = 0x00030001,
 ? ?| ? ?name = 0xFFFFFFC0010DF250 -> 0x6C)
 ? ?| ?info = (
 ? ?| ? ?si_signo = 0x0032D110,
 ? ?| ? ?si_errno = 0xFFFFFFC0,
 ? ?| ? ?si_code = 0x01C0,
 ? ?| ? ?_sifields = (_pad = (0x7, 0x0, 0xB28BF9E0, 0xFFFFFFC0, 0x000A6D78, 0xFFFFFFC0, 0xB28BF9F0, 0xFFF
-012|el1_da(asm)
 -->|exception
-013|__list_del(inline)
-013|detach_timer(inline)
-013|detach_expired_timer(inline)
-013|__run_timers(inline)
-013|run_timer_softirq(
 ? ?| ??)
 ? ?| ?base = 0xFFFFFFC0B2618000 -> (
 ? ?| ? ?lock = (rlock = (raw_lock = (owner = 0x6FCD, next = 0x6FCE))),
 ? ?| ? ?running_timer = 0xFFFFFFC001741120 -> (
 ? ?| ? ? ?entry = (next = 0xFFFFFFC0B27BC9B8, prev = 0xFFFFFFC0B27BC9B8),
 ? ?| ? ? ?expires = 0x0000000100003098,
 ? ?| ? ? ?base = 0xFFFFFFC0B27BC000,
 ? ?| ? ? ?function = 0xFFFFFFC0006E277C -> ,
 ? ?| ? ? ?data = 0x0,
 ? ?| ? ? ?slack = 0xFFFFFFFF,
 ? ?| ? ? ?start_pid = 0xFFFFFFFF,
 ? ?| ? ? ?start_site = 0x0,
 ? ?| ? ? ?start_comm = (0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0))
 ? ?| ? ?timer_jiffies = 0x0000000100003035,
 ? ?| ? ?next_timer = 0x0000000100003034,
 ? ?| ? ?active_timers = 0x7,
 ? ?| ? ?all_timers = 0x7,
 ? ?| ? ?cpu_=_0x4,
 ? ?| ? ?tv1 = (vec = ((next = 0xFFFFFFC0B2618038, prev = 0xFFFFFFC0B2618038), (next = 0xFFFFFFC0B2618048
 ? ?| ? ?tv2 = (vec = ((next = 0xFFFFFFC0B2619038, prev = 0xFFFFFFC0B2619038), (next = 0xFFFFFFC0B2619048
 ? ?| ? ?tv3 = (vec = ((next = 0xFFFFFFC0B2619438, prev = 0xFFFFFFC0B2619438), (next = 0xFFFFFFC0B2619448
 ? ?| ? ?tv4 = (vec = ((next = 0xFFFFFFC0B2619838, prev = 0xFFFFFFC0B2619838), (next = 0xFFFFFFC0B2619848
 ? ?| ? ?tv5 = (vec = ((next = 0xFFFFFFC0B2619C38, prev = 0xFFFFFFC0B2619C38), (next = 0xFFFFFFC0B2619C48
 ? ?| ?fn = 0xFFFFFFC0006E277C ->
 ? ?| ?data = 0x0
 ? ?| ?it_func_ptr = 0x0
...

看到這里,我們可以猜想是不是run_timer_softirq的參數(shù)出現(xiàn)了問題導(dǎo)致后面發(fā)生的一系列異常?可以從這個方向開始思考,我們先來看下這個函數(shù)的實現(xiàn):

static void run_timer_softirq(struct softirq_action *h)
{
 ? ?struct tvec_base *base = __this_cpu_read(tvec_bases);
 ? ?hrtimer_run_pending();
 ? ?__run_deferrable_timers();
 ? ?if (time_after_eq(jiffies, base->timer_jiffies))
 ? ? ? ?__run_timers(base);
}

我們看到這個函數(shù)最重要的參數(shù)變量就是這個base,傳入的h沒有使用,繼續(xù)來看下base的結(jié)構(gòu)tvec_base :

struct tvec_base {
 ? ?spinlock_t lock;
 ? ?struct timer_list *running_timer;
 ? ?unsigned long timer_jiffies;
 ? ?unsigned long next_timer;
 ? ?unsigned long active_timers;
 ? ?unsigned long all_timers;
 ? ?int cpu; ?// 跟蹤所在的CPU是哪個核,這里是CPU 4
 ? ?struct tvec_root tv1;
 ? ?struct tvec tv2;
 ? ?struct tvec tv3;
 ? ?struct tvec tv4;
 ? ?struct tvec tv5;
} ____cacheline_aligned;

這里就看到 tvec_base 的結(jié)構(gòu)里面有個 struct timer_list * 的結(jié)構(gòu),我們繼續(xù)看它的結(jié)構(gòu)是怎么樣的:

struct timer_list {
 ? ?/*
 ? ? * All fields that change during normal runtime grouped to the
 ? ? * same cacheline
 ? ? */
 ? ?struct list_head entry;
 ? ?unsigned long expires;
 ? ?struct tvec_base *base;
 ? ?void (*function)(unsigned long);
 ? ?unsigned long data;
 ? ?int slack;
...

首先我們查看 running_timer 的數(shù)據(jù)內(nèi)容,

工具欄調(diào)出:view -> Watch,輸入:(struct timer_list *)0xffffffc001741120


這個就是發(fā)生異常的那個timer的數(shù)據(jù)結(jié)構(gòu)實例,我們最希望的就是希望可以通過這里的數(shù)據(jù)信息找到它在源碼的位置,然后進(jìn)一步分析它,使用Trace32的 dump 分析功能就可以做到這點。

菜單欄調(diào)出:view -> dump


輸入地址 0xffffffc001741120 然后點OK


右擊高亮,選擇view info:


同理,還可以看function的位置(0xFFFFFFC0006E277C):


上面所示,出異常的timer實例就是:fp_drv/m_timer, callback = timer_out_handle,源碼位置也給出來了,那么就可以著手修復(fù)問題了。



如何用Trace32分析內(nèi)核死機(干貨~)的評論 (共 條)

分享到微博請遵守國家法律
汶川县| 大厂| 南昌县| 旺苍县| 龙山县| 梁山县| 台安县| 兴安盟| 石渠县| 任丘市| 团风县| 灵武市| 龙里县| 神池县| 仙居县| 宣汉县| 许昌市| 凤冈县| 会泽县| 明水县| 德令哈市| 南和县| 长丰县| 鄂尔多斯市| 布尔津县| 福泉市| 汕尾市| 平塘县| 汝阳县| 普格县| 石城县| 沙雅县| 洛川县| 巩留县| 腾冲县| 中超| 南华县| 丹江口市| 顺义区| 广昌县| 万源市|