
Android Watchdog Mechanism

seo靠我 2023-09-23 17:31:50


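Early mobile phone platforms usually added a hardware watchdog (WatchDog) to the device. The software had to write a value to the watchdog hardware at regular intervals to show that it was still healthy (commonly called "feeding the dog"); otherwise, once the allowed time had elapsed, the watchdog restarted the device. The basic principle is this: once the system is running, the watchdog's counter is started and counts automatically. If the counter is not cleared within a certain time, it overflows and raises a watchdog interrupt, which resets the system.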
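The counter-and-overflow behaviour described above can be sketched as a small simulation. This is only a toy model in plain Java: the class name `HardwareWatchdogSim`, the tick-based counter, and the limit of 3 ticks are assumptions made for illustration, not part of any real hardware interface.

```java
/** Toy model of a hardware watchdog counter (all names here are illustrative). */
final class HardwareWatchdogSim {
    private final int limit;   // ticks allowed between two feeds
    private int counter;       // ticks since the last feed
    private boolean reset;     // set once the counter overflows

    HardwareWatchdogSim(int limit) {
        this.limit = limit;
    }

    /** "Feeding the dog": the software clears the counter. */
    void feed() {
        counter = 0;
    }

    /** One hardware clock tick; an overflow triggers a (simulated) reset. */
    void tick() {
        if (reset) {
            return;
        }
        counter++;
        if (counter > limit) {
            reset = true;
        }
    }

    boolean hasReset() {
        return reset;
    }

    public static void main(String[] args) {
        HardwareWatchdogSim dog = new HardwareWatchdogSim(3);
        for (int i = 0; i < 10; i++) {   // healthy software feeds after every tick
            dog.tick();
            dog.feed();
        }
        System.out.println("healthy: reset=" + dog.hasReset());
        for (int i = 0; i < 4; i++) {    // hung software stops feeding
            dog.tick();
        }
        System.out.println("hung: reset=" + dog.hasReset());
    }
}
```

As long as `feed()` is called often enough the counter never overflows; once feeding stops, the fourth unfed tick exceeds the limit and the simulated reset fires.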

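A phone is really an extremely powerful microcontroller: it runs many times faster than a microcontroller, has many times the storage, runs a large number of threads, and relies on all kinds of hardware and software working together. Android's SystemServer is a very complex process that runs more than fifty services, which makes it the process most likely to run into trouble. The threads running inside SystemServer therefore need to be monitored.

If this were done the way a hardware watchdog works, with every thread feeding the dog at regular intervals, it would waste a lot of CPU and complicate program design. Android therefore provides the Watchdog class as a software watchdog that monitors the threads in SystemServer. Once it detects a problem, Watchdog kills the SystemServer process.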

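What Watchdog does

Watchdog mainly detects two conditions:

Blocked in monitor: the monitor() implementation of a monitored object is blocked.
Blocked in handler: the message queue of a monitored thread is not processing messages.

How a stuck thread is detected

| Method | Purpose |
| --- | --- |
| MessageQueue.isPolling | Used by HandlerChecker to check whether a looper is blocked |
| Monitor.monitor | Used to check for deadlock |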

How Watchdog works

[Figure: how Watchdog works] https://img-blog.csdnimg.cn/img_convert/e5c8133c7f86583251c775de4ceae9c0.jpeg

Starting Watchdog

Watchdog is initialized and started inside the SystemServer process. SystemServer's run method registers and starts the Android services, and that includes initializing and starting Watchdog:

    final Watchdog watchdog = Watchdog.getInstance(); // line: 864
    watchdog.init(context, mActivityManagerService);

In the second half of startOtherServices() in SystemServer, Watchdog is started from the callback passed to the systemReady interface of AMS (ActivityManagerService):

    Watchdog.getInstance().start(); // line: 1852

The Watchdog constructor

    super("watchdog");
    // Initialize a handler checker for each thread we want to check.
    // The background thread is not checked here.

    // The shared foreground thread is the main checker. It is also where the
    // monitors that check other state are dispatched.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add a checker for the main thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add a checker for the shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // Add a checker for the shared IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // Add a checker for the shared display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize the monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

The Watchdog constructor creates several HandlerChecker objects and adds them to its own list of checkers.

Handlers monitored by Watchdog

| Thread name | Handler | Description | Timeout |
| --- | --- | --- | --- |
| foreground thread | FgThread.getHandler() | Foreground thread | 60s |
| main thread | new Handler(Looper.getMainLooper()) | Main thread | 60s |
| ui thread | UiThread.getHandler() | UI thread | 60s |
| i/o thread | IoThread.getHandler() | IO thread | 60s |
| display thread | DisplayThread.getHandler() | Display thread | 60s |
| PackageManager | addThread(mHandler, time) | Thread added by PackageManagerService itself | 10min |
| PackageManager | addThread(mHandler, time) | Thread added by PermissionManagerService itself | 60s |
| PowerManagerService | addThread(mHandler, time) | Thread added by PowerManagerService itself | 60s |
| ActivityManagerService | addThread(mHandler, time) | Thread added by ActivityManagerService itself | 60s |

Monitors registered with Watchdog

| Monitor name | Description | Timeout |
| --- | --- | --- |
| BinderThreadMonitor | Checks the Binder threads | 60s |
| OpenFdMonitor | Checks open file descriptors | 60s |

| Service | Registration | Lock or call checked in monitor() |
| --- | --- | --- |
| TvRemoteService | addMonitor(this) | mLock |
| ActivityManagerService | addMonitor(this) | this |
| MediaProjectionManagerService | addMonitor(this) | mLock |
| MediaRouterService | addMonitor(this) | mLock |
| MediaSessionService | addMonitor(this) | mLock |
| InputManagerService | addMonitor(this) | mInputFilterLock, nativeMonitor(mPtr); |
| PowerManagerService | addMonitor(this) | mLock |
| NetworkManagementService | addMonitor(this) | mConnector |
| StorageManagerService | addMonitor(this) | mVold |
| WindowManagerService | addMonitor(this) | mWindowMap |

HandlerChecker

public final class HandlerChecker implements Runnable

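HandlerChecker checks the state of a handler thread and schedules the monitor callbacks. It works by inspecting the MessageQueue of each Handler's looper to decide whether that thread is stuck. All of these threads run inside the SystemServer process.

Watchdog builds many HandlerCheckers, which fall into two categories:

Monitor Checker: checks for deadlocks that may occur on Monitor objects. Core system services such as AMS, PKMS and WMS are Monitor objects.
Looper Checker: checks whether a thread's message queue has been busy for a long time. Watchdog's own message queue and the global ui, io and display message queues are all checked. The message queues of some other important threads, such as those of AMS and PKMS, are added to the Looper Checker when the corresponding objects are initialized.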

The two kinds of HandlerChecker have different emphases:

Monitor Checker warns us not to hold the object lock of a core system service for a long time, otherwise many functions will be blocked.
Looper Checker warns us not to hog a message queue for a long time, otherwise other messages will not be processed.

The HandlerChecker constructor

    public final class HandlerChecker implements Runnable {
        private final Handler mHandler;
        private final String mName;
        private final long mWaitMax;
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
        private boolean mCompleted;
        private Monitor mCurrentMonitor;
        private long mStartTime;

        HandlerChecker(Handler handler, String name, long waitMaxMillis) {
            mHandler = handler;        // handler of the monitored thread
            mName = name;              // name
            mWaitMax = waitMaxMillis;  // timeout to wait for
            mCompleted = true;         // state of the thread
        }
    }

HandlerChecker::scheduleCheckLocked

This method is called from Watchdog's run method. It is the core method of HandlerChecker and schedules the check of whether the checker's thread is blocked or deadlocked.

    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            // If the target looper has recently been polling, then
            // there is no reason to enqueue our checker on it since that
            // is as good as it not being deadlocked.  This avoid having
            // to do a context switch to check the thread.  Note that we
            // only do this if mCheckReboot is false and we have no
            // monitors, since those would need to be executed at this point.
            mCompleted = true;
            return;
        }

        if (!mCompleted) {
            // we already have a check in flight, so no need
            return;
        }

        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

isPolling() is the key call for judging whether the thread's Looper is healthy. If it returns true and no monitors are registered, the looper is currently polling for events and running normally, so the checker marks itself completed and returns without posting anything.
If mCompleted is false, a check is already in flight and nothing more is scheduled.
Otherwise mHandler.postAtFrontOfQueue(this) posts the checker itself to the front of the queue, and its run method is executed afterwards.

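In practice scheduleCheckLocked mainly matters for mMonitorChecker. Other HandlerCheckers that have no monitors registered and whose looper is in the polling state are not checked at all. UiThread, for example, is normally in the polling state.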
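The "post yourself onto the monitored thread and see whether you ever run" idea can be tried outside Android. The sketch below is an approximation under stated assumptions: a single-thread `ExecutorService` stands in for the monitored Looper thread, plain `execute()` replaces `postAtFrontOfQueue()` (so the check is queued at the back, not the front), and the class name `LooperCheckSketch` and its `demo()` helper are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Simplified stand-in for a HandlerChecker without monitors (not the AOSP class). */
final class LooperCheckSketch implements Runnable {
    private final ExecutorService worker;  // plays the role of the monitored Looper thread
    private boolean completed = true;      // guarded by "this"

    LooperCheckSketch(ExecutorService worker) {
        this.worker = worker;
    }

    /** Enqueue ourselves on the worker unless a check is already in flight. */
    synchronized void scheduleCheck() {
        if (!completed) {
            return;
        }
        completed = false;
        worker.execute(this);
    }

    /** Runs on the worker thread; reaching this point proves the queue is moving. */
    @Override
    public synchronized void run() {
        completed = true;
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    /** Returns {completed while the worker is stuck, completed after it is released}. */
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        worker.execute(() -> {               // simulates a handler stuck in a callback
            try {
                release.await();
            } catch (InterruptedException ignored) {
            }
        });
        LooperCheckSketch checker = new LooperCheckSketch(worker);
        checker.scheduleCheck();
        boolean whileStuck = checker.isCompleted();
        release.countDown();                 // un-stick the worker
        worker.shutdown();
        boolean drained;
        try {
            drained = worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            drained = false;
        }
        return new boolean[] {whileStuck, drained && checker.isCompleted()};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("while stuck: " + r[0] + ", after release: " + r[1]);
    }
}
```

While the worker is stuck the check cannot run, so the flag stays false; how long it has stayed false is what the real Watchdog turns into a WAITING, WAITED_HALF or OVERDUE state.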

MessageQueue::isPolling

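mHandler.getLooper().getQueue().isPolling() can be used to judge whether the thread is stuck.

true: the looper is currently polling for events.

The method is implemented in MessageQueue. As its comment says, it returns whether the looper's thread is currently polling for more work to do, which is a good signal that the loop is still alive.

frameworks/base/core/java/android/os/MessageQueue.java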

    /**
     * Returns whether this looper's thread is currently polling for more work to do.
     * This is a good signal that the loop is still alive rather than being stuck
     * handling a callback.  Note that this method is intrinsically racy, since the
     * state of the loop can change before you get the result back.
     *
     * <p>This method is safe to call from any thread.
     *
     * @return True if the looper is currently polling for events.
     * @hide
     */
    public boolean isPolling() {
        synchronized (this) {
            return isPollingLocked();
        }
    }

HandlerChecker::run

    @Override
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }

        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }

run iterates over the checker's own monitors and calls monitor() on each. If any monitor blocks, mCompleted stays false.
The for loop detects whether anything in the monitor list is blocked. Only mMonitorChecker enters it; the other HandlerCheckers have an empty mMonitors list and skip the loop.

HandlerChecker::getCompletionStateLocked

    public int getCompletionStateLocked() {
        if (mCompleted) {
            return COMPLETED;
        } else {
            long latency = SystemClock.uptimeMillis() - mStartTime;
            if (latency < mWaitMax/2) {
                return WAITING;
            } else if (latency < mWaitMax) {
                return WAITED_HALF;
            }
        }
        return OVERDUE;
    }

This returns the completion state. mStartTime is set in scheduleCheckLocked.
When the check has not completed, the else branch computes the elapsed time and returns the corresponding state code.

Thread states (the times assume the default 60s timeout)

| State | Description |
| --- | --- |
| COMPLETED | The message has been processed; the thread is not blocked |
| WAITING | The message has been pending for 0~29 seconds; keep running |
| WAITED_HALF | The message has been pending for 30~59 seconds; the thread may be blocked, so the current AMS stack state is saved and monitoring continues |
| OVERDUE | The message has been pending for more than 60 seconds; prepare to kill the current process. Reaching this state means the 60-second timeout has expired, and everything that follows deals with the timeout |

HandlerThread inheritance

The Handlers passed to these HandlerCheckers all belong to HandlerThread threads.

    java.lang.Object
      ↳ Thread implements Runnable
        ↳ HandlerThread extends Thread
          ↳ ServiceThread extends HandlerThread
            ↳ FgThread extends ServiceThread

The threads behind the initial HandlerCheckers

    public ServiceThread(String name, int priority, boolean allowIo)

    private FgThread() {
        super("android.fg", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
    }

    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    private IoThread() {
        super("android.io", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
    }

    private DisplayThread() {
        // DisplayThread runs important stuff, but these are not as important as things
        // running in AnimationThread. Thus, set the priority to one lower.
        super("android.display", Process.THREAD_PRIORITY_DISPLAY + 1, false /*allowIo*/);
    }

Android thread priorities

frameworks/base/core/java/android/os/Process.java

    public static final int THREAD_PRIORITY_DEFAULT = 0;          // default thread priority
    public static final int THREAD_PRIORITY_LOWEST = 19;          // lowest thread priority
    public static final int THREAD_PRIORITY_BACKGROUND = 10;      // recommended for background threads
    public static final int THREAD_PRIORITY_FOREGROUND = -2;      // UI threads the user is interacting with; code cannot set this priority, the system adjusts to it as needed
    public static final int THREAD_PRIORITY_DISPLAY = -4;         // also related to UI interaction, but higher than THREAD_PRIORITY_FOREGROUND
    public static final int THREAD_PRIORITY_URGENT_DISPLAY = -8;  // highest display priority, used for compositing the screen and retrieving input events
    public static final int THREAD_PRIORITY_AUDIO = -16;          // standard priority of audio threads
    public static final int THREAD_PRIORITY_URGENT_AUDIO = -19;   // highest audio priority, above THREAD_PRIORITY_AUDIO
    public static final int THREAD_PRIORITY_MORE_FAVORABLE = -1;  // slightly more favorable than THREAD_PRIORITY_DEFAULT
    public static final int THREAD_PRIORITY_LESS_FAVORABLE = 1;   // slightly less favorable than THREAD_PRIORITY_DEFAULT

An application sets a thread's priority as shown below. Some levels cannot be set by applications and are assigned by the system.

    Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND +
            Process.THREAD_PRIORITY_LESS_FAVORABLE);

describeBlockedStateLocked

    public String describeBlockedStateLocked() {
        if (mCurrentMonitor == null) {
            return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
        } else {
            return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                    + " on " + mName + " (" + getThread().getName() + ")";
        }
    }

This builds the message that is printed to describe where the checker is blocked: in the handler itself or in a particular monitor.

Monitor

Monitor is an interface used to check whether a service is deadlocked:

    public interface Monitor {
        void monitor();
    }

Classes that implement Watchdog.Monitor

ActivityManagerService

WindowManagerService

PowerManagerService

InputManagerService

MediaSessionService

MediaRouterService

StorageManagerService

NetworkManagementService

NativeDaemonConnector

MediaProjectionManagerService

TvRemoteService

BinderThreadMonitor

OpenFdMonitor

Monitor is an interface with quite a few implementations. The list above is what a search of the Android 9.0 source turns up.


Using Watchdog

All of these classes implement the interface and register themselves with Watchdog. AMS, for example:

    public class ActivityManagerService extends IActivityManager.Stub
            implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {
        ......
        public ActivityManagerService(Context systemContext) {
            ......
            Watchdog.getInstance().addMonitor(this);
            Watchdog.getInstance().addThread(mHandler);
            ......
        }
        ......
        /** In this method we try to acquire our lock to make sure that we have not deadlocked */
        public void monitor() {
            synchronized (this) { }
        }
        ......
    }

Watchdog::addThread

    public void addThread(Handler thread) {
        addThread(thread, DEFAULT_TIMEOUT); // 60s
    }

    public void addThread(Handler thread, long timeoutMillis) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            final String name = thread.getLooper().getThread().getName();
            mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
        }
    }

addThread passes a thread's Handler to Watchdog, which creates a new HandlerChecker for it and adds that checker to the list of checkers.

Watchdog::addMonitor

    public void addMonitor(Monitor monitor) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Monitors can't be added once the Watchdog is running");
            }
            mMonitorChecker.addMonitor(monitor);
        }
    }

addMonitor passes in a Monitor. Watchdog later calls its monitor() method to decide whether it is blocked.
All monitors are added to mMonitorChecker, so mMonitorChecker is the only checker that holds any monitors.
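Why an empty `synchronized` block is enough to expose a stuck lock can be shown in plain Java. The sketch below is hypothetical: `MonitorSketch`, `Service`, `monitorReturns` and the 200 ms timeout are names and values chosen for the illustration, and a helper thread with a latch replaces the foreground thread and the 60-second bookkeeping of the real Watchdog.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Plain-Java illustration of lock probing via monitor(); not the AOSP classes. */
final class MonitorSketch {
    interface Monitor {
        void monitor();
    }

    /** A service whose monitor() only tries to take (and release) its own lock. */
    static final class Service implements Monitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            synchronized (lock) { }
        }
    }

    /** Runs monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean monitorReturns(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            m.monitor();
            done.countDown();
        }, "checker");
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {monitor returned while lock was free, monitor returned while lock was held}. */
    static boolean[] demo() {
        Service service = new Service();
        boolean freeOk = monitorReturns(service, 1000);

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service.lock) {      // simulates a thread that never lets go
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();

        boolean stuckOk = false;
        try {
            held.await();
            stuckOk = monitorReturns(service, 200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();
        }
        return new boolean[] {freeOk, stuckOk};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("lock free: " + r[0] + ", lock held: " + r[1]);
    }
}
```

While another thread holds the lock, `monitor()` cannot return, which is the condition the Watchdog log reports as "Blocked in monitor".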

Watchdog::run()

This is the core method of Watchdog. It checks for deadlocked threads and blocked loopers, collects information, and kills the system_server process so that the system restarts.

    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final List<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    // Call scheduleCheckLocked() on every HandlerChecker
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                boolean fdLimitTriggered = false;
                if (mOpenFdMonitor != null) {
                    fdLimitTriggered = mOpenFdMonitor.monitor();
                }

                if (!fdLimitTriggered) {
                    final int waitState = evaluateCheckerCompletionLocked();
                    if (waitState == COMPLETED) {
                        // Thread state is normal; poll again
                        // The monitors have returned; reset
                        waitedHalf = false;
                        continue;
                    } else if (waitState == WAITING) {
                        // Blocked, but for less than 30s; keep monitoring
                        // still waiting but within their configured intervals; back off and recheck
                        continue;
                    } else if (waitState == WAITED_HALF) {
                        // Blocked for more than 30s; dump some system information,
                        // then keep monitoring for another 30s
                        if (!waitedHalf) {
                            // We've waited half the deadlock-detection interval.  Pull a stack
                            // trace and wait another half.
                            ArrayList<Integer> pids = new ArrayList<Integer>();
                            pids.add(Process.myPid());
                            ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                    getInterestingNativePids());
                            waitedHalf = true;
                        }
                        continue;
                    }

                    // something is overdue!
                    blockedCheckers = getBlockedCheckersLocked();
                    subject = describeCheckersLocked(blockedCheckers);
                } else {
                    blockedCheckers = Collections.emptyList();
                    subject = "Open FD high water mark reached";
                }
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, getInterestingNativePids());

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");
                Process.killProcess(Process.myPid());
                System.exit(10);
            }

            waitedHalf = false;
        }
    }

run() is an endless loop: it keeps iterating over all HandlerCheckers, calls their check method, waits thirty seconds, and evaluates the state.

Iterate over all HandlerCheckers and call scheduleCheckLocked on each, which records the start time:

    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        hc.scheduleCheckLocked();
    }

Wait 30 seconds

    // Wait 30 seconds.
    // uptimeMillis() is used so that time spent asleep is not counted;
    // while the phone sleeps, the system services sleep as well.
    long start = SystemClock.uptimeMillis();
    while (timeout > 0) {
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        try {
            wait(timeout);
        } catch (InterruptedException e) {
            Log.wtf(TAG, e);
        }
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
    }
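The wait loop recomputes the remaining timeout after every wake-up because `wait(timeout)` may return early (a notify or a spurious wake-up). The following sketch shows the same pattern in plain Java. It is an approximation: `System.nanoTime()` stands in for `SystemClock.uptimeMillis()`, which is not available off-device, and the class and method names are made up for the example.

```java
/** Plain-Java sketch of a "wait the full interval" loop (names are illustrative). */
final class WaitLoopSketch {
    /**
     * Blocks on the given lock for at least intervalMs, re-waiting after early
     * wake-ups, and returns the elapsed time in milliseconds.
     */
    static long waitFullInterval(Object lock, long intervalMs) {
        long start = System.nanoTime();
        long timeout = intervalMs;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;                       // give up waiting if interrupted
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                timeout = intervalMs - elapsedMs;  // remaining part of the interval
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Waits 100 ms while another thread notifies the lock early, after about 20 ms. */
    static long demo() {
        final Object lock = new Object();
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException ignored) {
            }
            synchronized (lock) {
                lock.notifyAll();                // early wake-up
            }
        }, "notifier");
        notifier.setDaemon(true);
        notifier.start();
        return waitFullInterval(lock, 100);
    }

    public static void main(String[] args) {
        System.out.println("waited " + demo() + " ms");
    }
}
```

Even though the notifier wakes the waiter early, the loop goes back to waiting for the remainder, so the full interval always elapses before the checker states are evaluated.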

Evaluate the state of the checkers: iterate over all HandlerCheckers and take the largest return value.

The largest return value is one of four cases (the times assume the default 60s timeout):

COMPLETED: the message has been processed; the thread is not blocked.
WAITING: the message has been pending for 0~29 seconds; keep running.
WAITED_HALF: the message has been pending for 30~59 seconds; the thread may be blocked, so the current AMS stack state is saved and monitoring continues.
OVERDUE: the message has been pending for more than 60 seconds; prepare to kill the current process. Reaching this state means the 60-second timeout has expired, and everything that follows deals with the timeout.

    boolean fdLimitTriggered = false;
    if (mOpenFdMonitor != null) {
        fdLimitTriggered = mOpenFdMonitor.monitor();
    }

    if (!fdLimitTriggered) {
        final int waitState = evaluateCheckerCompletionLocked();
        if (waitState == COMPLETED) {
            // The monitors have returned; reset
            waitedHalf = false;
            continue;
        } else if (waitState == WAITING) {
            // still waiting but within their configured intervals; back off and recheck
            continue;
        } else if (waitState == WAITED_HALF) {
            if (!waitedHalf) {
                // We've waited half the deadlock-detection interval.  Pull a stack
                // trace and wait another half.
                ArrayList<Integer> pids = new ArrayList<Integer>();
                pids.add(Process.myPid());
                ActivityManagerService.dumpStackTraces(true, pids, null, null,
                        getInterestingNativePids());
                waitedHalf = true;
            }
            continue;
        }

        // something is overdue!
        blockedCheckers = getBlockedCheckersLocked();
        subject = describeCheckersLocked(blockedCheckers);
    } else {
        blockedCheckers = Collections.emptyList();
        subject = "Open FD high water mark reached";
    }
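The mapping from elapsed time to the four states can be written as a pure function, which makes the thresholds easy to check. This is a sketch, not the AOSP code: the class name and the numeric values of WAITING, WAITED_HALF and OVERDUE are assumptions (the text above only shows that COMPLETED is 0), chosen so that a larger number means a worse state.

```java
/** Pure-function sketch of the completion-state thresholds (values partly assumed). */
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;      // assumed value
    static final int WAITED_HALF = 2;  // assumed value
    static final int OVERDUE = 3;      // assumed value

    /**
     * @param completed whether the posted check has already run
     * @param latencyMs time since the check was scheduled
     * @param waitMaxMs timeout configured for this checker
     */
    static int classify(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (latencyMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    public static void main(String[] args) {
        long timeout = 60_000;  // the default 60s timeout
        System.out.println(classify(false, 29_999, timeout));  // WAITING
        System.out.println(classify(false, 30_000, timeout));  // WAITED_HALF
        System.out.println(classify(false, 60_000, timeout));  // OVERDUE
    }
}
```

With a 60-second timeout the boundaries fall at exactly 30 000 ms and 60 000 ms, which matches the 0~29, 30~59 and over-60-second ranges listed above.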

fdMonitor

    public boolean monitor() {
        if (mFdHighWaterMark.exists()) {
            dumpOpenDescriptors();
            return true;
        }
        return false;
    }

Collect information

Kill the system process

    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");
    Process.killProcess(Process.myPid());
    System.exit(10);

HandlerChecker::scheduleCheckLocked

HandlerChecker::run

Watchdog::evaluateCheckerCompletionLocked

This evaluates the state of the checkers: it iterates over all HandlerCheckers and takes the largest return value.

    private int evaluateCheckerCompletionLocked() {
        int state = COMPLETED; // COMPLETED = 0
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            state = Math.max(state, hc.getCompletionStateLocked());
        }
        return state;
    }

HandlerChecker::getCompletionStateLocked

Watchdog::getBlockedCheckersLocked

Watchdog::describeCheckersLocked

    private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
        ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            if (hc.isOverdueLocked()) {
                checkers.add(hc);
            }
        }
        return checkers;
    }

    private String describeCheckersLocked(List<HandlerChecker> checkers) {
        StringBuilder builder = new StringBuilder(128);
        for (int i=0; i<checkers.size(); i++) {
            if (builder.length() > 0) {
                builder.append(", ");
            }
            builder.append(checkers.get(i).describeBlockedStateLocked());
        }
        return builder.toString();
    }

These two methods produce the information that is printed about the blocked or deadlocked threads.
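The aggregation step (worst state wins, then describe every overdue checker) can be exercised with a few stand-in objects. The sketch below is hypothetical: `AggregateSketch.Checker` is a simplified value class rather than the real HandlerChecker, the state values other than COMPLETED are assumed, and the thread name in parentheses is omitted from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in for the aggregation step; not the AOSP classes. */
final class AggregateSketch {
    static final int COMPLETED = 0, WAITING = 1, WAITED_HALF = 2, OVERDUE = 3;

    static final class Checker {
        final String name;
        final int state;
        final String blockedMonitor;  // null when blocked in the handler itself

        Checker(String name, int state, String blockedMonitor) {
            this.name = name;
            this.state = state;
            this.blockedMonitor = blockedMonitor;
        }

        String describe() {
            return blockedMonitor == null
                    ? "Blocked in handler on " + name
                    : "Blocked in monitor " + blockedMonitor + " on " + name;
        }
    }

    /** The overall state is the worst (largest) state of any checker. */
    static int worstState(List<Checker> checkers) {
        int state = COMPLETED;
        for (Checker c : checkers) {
            state = Math.max(state, c.state);
        }
        return state;
    }

    /** Joins the descriptions of all overdue checkers into one subject line. */
    static String describeOverdue(List<Checker> checkers) {
        List<String> parts = new ArrayList<>();
        for (Checker c : checkers) {
            if (c.state == OVERDUE) {
                parts.add(c.describe());
            }
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        List<Checker> checkers = Arrays.asList(
                new Checker("main thread", OVERDUE, null),
                new Checker("ui thread", WAITING, null),
                new Checker("foreground thread", OVERDUE, "BinderThreadMonitor"));
        System.out.println(worstState(checkers));
        System.out.println(describeOverdue(checkers));
    }
}
```

Taking the maximum means a single overdue checker is enough to push the whole Watchdog into the timeout path, even if every other thread is healthy.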

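Note

The monitor() check targets deadlock between different threads, whereas checking whether a service's main thread is blocked concerns that thread alone. The sendMessage() approach can therefore only detect whether the main thread is blocked, not whether there is a deadlock: if the service's main thread and another thread are deadlocked (for example, the other thread holds a lock through the synchronized keyword for a long time without releasing it), a Message sent to the main thread can still be handled by the main thread's Handler.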

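How to trigger it

Blocked in Monitor

Hold the lock that a monitor() implementation acquires and never release it.

Blocked in handler

One way is to trigger a crash in a Service's onCreate; after long enough this causes SystemServer to restart.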

Resulting logs

Two kinds of log are common: one is "Blocked in handler", the other is "Blocked in monitor".

Blocked in handler

    11-15 06:56:39.696 24203 24902 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui)
    11-15 06:56:39.696 24203 24902 W Watchdog: main thread stack trace:
    11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.nativePollOnce(Native Method)
    11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.next(MessageQueue.java:323)
    11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.Looper.loop(Looper.java:142)
    11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.run(SystemServer.java:377)
    11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.main(SystemServer.java:239)
    11-15 06:56:39.696 24203 24902 W Watchdog: at java.lang.reflect.Method.invoke(Native Method)
    11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:901)
    11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:791)
    11-15 06:56:39.696 24203 24902 W Watchdog: ui thread stack trace:
    ......

Blocked in monitor

    10-26 00:07:00.884 1000 17132 17312 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)
    10-26 00:07:00.884 1000 17132 17312 W Watchdog: foreground thread stack trace:
    10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Binder.blockUntilThreadAvailable(Native Method)
    10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$BinderThreadMonitor.monitor(Watchdog.java:381)
    10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:353)
    10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Handler.handleCallback(Handler.java:873)
    10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
    10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Looper.loop(Looper.java:193)
    10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:65)
    10-26 00:07:00.886 1000 17132 17312 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
    10-26 00:07:00.886 1000 17132 17312 W Watchdog: *** GOODBYE!
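When triaging such logs it can help to pull the blocked checkers out of the "WATCHDOG KILLING SYSTEM PROCESS" line automatically. The parser below is a small sketch written against the two log formats shown above; the class name, the regular expression and the pipe-separated output format are all choices made for this example, and other Android versions may word the line differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts "Blocked in ..." entries from a Watchdog kill line (illustrative helper). */
final class WatchdogSubjectParser {
    // group 1: monitor class (absent for handler blocks)
    // group 2: checker name, group 3: thread name in parentheses
    private static final Pattern ENTRY = Pattern.compile(
            "Blocked in (?:handler|monitor (\\S+)) on (.+?) \\(([^)]*)\\)");

    /** Returns one "kind|checker|thread[|monitor class]" string per blocked checker. */
    static List<String> parse(String logLine) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(logLine);
        while (m.find()) {
            String monitor = m.group(1);
            out.add(monitor == null
                    ? "handler|" + m.group(2) + "|" + m.group(3)
                    : "monitor|" + m.group(2) + "|" + m.group(3) + "|" + monitor);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: "
                + "Blocked in handler on main thread (main), "
                + "Blocked in handler on ui thread (android.ui)";
        for (String entry : parse(line)) {
            System.out.println(entry);
        }
    }
}
```

A "handler" entry points at the thread whose stack trace follows in the log, while a "monitor" entry also names the Monitor class whose lock should be inspected first.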


Watchdog log analysis

After Watchdog detects that a SystemServer thread is deadlocked, it collects and prints information. The code is in the run function:

    while (true) {
        // Execution reaches this point when a deadlock or a blocked message queue has been detected

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, getInterestingNativePids());

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000);

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked.  (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    mActivity.addErrorToDropBox(
                            "watchdog", null, "system_server", null, null,
                            subject, null, stack, null);
                }
            };
        dropboxThread.start();
        try {
            dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
            Slog.w(TAG, "*** GOODBYE!");
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }

Write the event log

    EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

Dump stack traces

    ArrayList<Integer> pids = new ArrayList<>();
    pids.add(Process.myPid());
    if (mPhonePid > 0) pids.add(mPhonePid);
    // Pass !waitedHalf so that just in case we somehow wind up here without having
    // dumped the halfway stacks, we properly re-initialize the trace file.
    final File stack = ActivityManagerService.dumpStackTraces(
            !waitedHalf, pids, null, null, getInterestingNativePids());

    // Give some extra time to make sure the stack traces get written.
    // The system's been hanging for a minute, another second or two won't hurt much.
    SystemClock.sleep(2000);

Dump kernel info

    // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
    doSysRq('w');
    doSysRq('l');

Collect dropbox information

    // Try to add the error to the dropbox, but assuming that the ActivityManager
    // itself may be deadlocked.  (which has happened, causing this statement to
    // deadlock and the watchdog as a whole to be ineffective)
    Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                mActivity.addErrorToDropBox(
                        "watchdog", null, "system_server", null, null,
                        subject, null, stack, null);
            }
        };
    dropboxThread.start();
    try {
        dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
    } catch (InterruptedException ignored) {}
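The dropbox write runs on its own thread with a bounded `join` so that a deadlocked ActivityManager cannot stall the Watchdog itself. The same "report, but never wait forever" pattern looks like this in plain Java. It is a generic sketch: `BoundedReportSketch` and `runWithTimeout` are invented names, and the sleeping task merely imitates a call that hangs.

```java
/** Sketch of running a possibly-hanging report on a helper thread (names illustrative). */
final class BoundedReportSketch {
    /**
     * Runs the task on a daemon helper thread and waits at most timeoutMs
     * (which must be positive). Returns true if the task finished in time.
     */
    static boolean runWithTimeout(Runnable task, long timeoutMs) {
        Thread helper = new Thread(task, "report");
        helper.setDaemon(true);          // do not keep the JVM alive for a hung report
        helper.start();
        try {
            helper.join(timeoutMs);      // bounded wait, like dropboxThread.join(2000)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !helper.isAlive();
    }

    /** Returns {fast task finished, hanging task finished}. */
    static boolean[] demo() {
        boolean fast = runWithTimeout(() -> { }, 1000);
        boolean hanging = runWithTimeout(() -> {
            try {
                Thread.sleep(5000);      // imitates a call stuck on a deadlocked service
            } catch (InterruptedException ignored) {
            }
        }, 100);
        return new boolean[] {fast, hanging};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("fast finished: " + r[0] + ", hanging finished: " + r[1]);
    }
}
```

The caller continues after the timeout either way, which is the property Watchdog needs: the report is best effort, while the kill decision that follows must always be reached.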

Kill the system process: if no debugger is attached, Watchdog kills its own process

    // Only kill the process if the debugger is not attached.
    if (Debug.isDebuggerConnected()) {
        debuggerWasConnected = 2;
    }
    if (debuggerWasConnected >= 2) {
        Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
    } else if (debuggerWasConnected > 0) {
        Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
    } else if (!allowRestart) {
        Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
    } else {
        Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
        WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
        Slog.w(TAG, "*** GOODBYE!");
        Process.killProcess(Process.myPid());
        System.exit(10);
    }
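The order of the checks in this if/else chain decides which reason wins when several apply. Pulled out as a pure function it becomes easy to test. The enum and method names below are made up for the illustration; only the comparison order is taken from the snippet above.

```java
/** Pure-function view of the final kill decision (enum and names are illustrative). */
final class KillDecisionSketch {
    enum Decision {
        SKIP_DEBUGGER_CONNECTED,
        SKIP_DEBUGGER_WAS_CONNECTED,
        SKIP_RESTART_NOT_ALLOWED,
        KILL
    }

    /**
     * @param debuggerWasConnected 2 if a debugger is attached now, 1 if it was
     *                             attached recently, 0 otherwise
     * @param allowRestart         whether restarting the system process is allowed
     */
    static Decision decide(int debuggerWasConnected, boolean allowRestart) {
        if (debuggerWasConnected >= 2) {
            return Decision.SKIP_DEBUGGER_CONNECTED;
        } else if (debuggerWasConnected > 0) {
            return Decision.SKIP_DEBUGGER_WAS_CONNECTED;
        } else if (!allowRestart) {
            return Decision.SKIP_RESTART_NOT_ALLOWED;
        }
        return Decision.KILL;
    }

    public static void main(String[] args) {
        System.out.println(decide(2, true));
        System.out.println(decide(1, false));
        System.out.println(decide(0, false));
        System.out.println(decide(0, true));
    }
}
```

A debugger takes precedence over the restart flag, so a developer who is stepping through system_server does not have the process killed underneath them.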

prop dalvik.vm.stack-trace-dir

This refers to /data/anr.

    final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");

