Why is my event loop still running?
You can find the source to this demo in my repo.
It all started with tests
Well, not a failed test, but I ran into an issue with the Node built-in test runner where my unit tests completed, but the test runner wouldn't exit.
Frustrated, I switched to Jest, which detected that there might be open handles
keeping my code from exiting.
Running Jest with --detectOpenHandles, I was able to figure out that there was a Redis connection object in my code that was opened but never closed, which prevented the event loop from draining (and thus the process from exiting).
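To illustrate the kind of leak Jest was pointing at, here is a hypothetical sketch (using the redis npm package; the names and URL are mine, not the demo's): a client that is connected but never closed keeps an open socket, and that open socket keeps the event loop alive.

import { createClient } from 'redis'

// A Redis client holds an open TCP socket once connected.
const client = createClient({ url: 'redis://localhost:6379' })
await client.connect()

// ...run queries, tests, etc...

// Missing cleanup: without this, the socket stays open and the process never exits.
// await client.quit()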
I didn't really understand why this would keep my event loop from exiting, but digging through the Node API, it looks like certain objects (like Timers, Sockets, etc.) can keep the event loop alive. An unref API function is used to detach an object from the event loop's exit condition (and similarly, a hasRef() function tells us whether an object that isn't closed will keep the event loop alive or not).
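As a quick sketch of that built-in API (my own example, not from the demo): a pending timer is one such handle, and unref()/hasRef() let you inspect and change whether it holds the loop open.

// A pending timer is a handle that keeps the event loop alive.
const timer = setTimeout(() => console.log('tick'), 60_000)

console.log(timer.hasRef()) // true: this handle keeps the process running

timer.unref() // detach it from the event loop's exit condition

console.log(timer.hasRef()) // false: the process may now exit before the timer fires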
Asking ChatGPT how to detect objects that might keep the event loop alive, I stumbled across why-is-node-running, a library that uses Node's async hooks to track the lifecycle of async objects.
Using the init hook, we can hook into async objects (such as Promises, Timeouts, Immediates, etc.) when they are created. When running my init and destroy hooks, however, the only info I could access was the type of the async object and an id associated with it, but nothing else.
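Here is a minimal sketch of what those hooks look like with Node's async_hooks module (my own example, not the library's code):

import { createHook } from 'node:async_hooks'
import { writeSync } from 'node:fs'

// Note: console.log is itself asynchronous and would re-trigger the hooks,
// so we write synchronously to stderr instead.
const hook = createHook({
  init (asyncId, type, triggerAsyncId, resource) {
    writeSync(process.stderr.fd, `init: type=${type} id=${asyncId}\n`)
  },
  destroy (asyncId) {
    writeSync(process.stderr.fd, `destroy: id=${asyncId}\n`)
  }
})
hook.enable()

setTimeout(() => {}, 10) // prints something like "init: type=Timeout id=7"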
But how do we know which part of the code (our code, a library, or a Node.js built-in) created it?
Here is where why-is-node-running uses an interesting trick. Node.js uses V8 as its JavaScript engine (interpreter and JIT compiler); a lot of Node's speed can be attributed to just how wicked fast V8 is. V8 has an API that allows one to customize the Error object in JavaScript.
More specifically, V8 adds a .stack property to every Error object, containing a trace of the last 10 (customizable) function frames on the stack, leading up to the function where the Error object was created. The trace (stored in the .stack property) is a formatted string.
For example, here is the value of Error.stack in my Node REPL:
$ node
Welcome to Node.js v20.6.1.
Type ".help" for more information.
> function myFunc() {
... let err = new Error("abced");
... console.log(`${err.stack}`);
... }
undefined
> myFunc();
Error: abced
at myFunc (REPL14:2:13)
at REPL15:1:1
at Script.runInThisContext (node:vm:122:12)
at REPLServer.defaultEval (node:repl:593:29)
at bound (node:domain:433:15)
at REPLServer.runBound [as eval] (node:domain:444:12)
at REPLServer.onLine (node:repl:923:10)
at REPLServer.emit (node:events:526:35)
at REPLServer.emit (node:domain:489:12)
at [_onLine] [as _onLine] (node:internal/readline/interface:416:12)
The formatted string stored in .stack can be customized. You can do this by attaching a function to the prepareStackTrace property of the Error constructor. This function takes two parameters: the Error object being created and an array of CallSite objects, one per function frame on the stack above the point where the Error object was created. The return value of this prepareStackTrace function is what ends up in the .stack property of an Error created with new Error(...). You can find more information about this in the Stack Trace API documentation.
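For instance, here is a small sketch (my own example) that overrides prepareStackTrace so that .stack becomes an array of "function (file:line)" strings instead of the default formatted string:

Error.prepareStackTrace = (err, callSites) =>
  // Each CallSite exposes structured info about one stack frame.
  callSites.map((cs) => `${cs.getFunctionName() || '<anonymous>'} (${cs.getFileName()}:${cs.getLineNumber()})`)

function myFunc () {
  const err = new Error('demo')
  console.log(err.stack) // now an array of strings, one per frame, instead of one big string
}
myFunc()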
why-is-node-running uses this trick to store a trace of the code where each async object is created. When a new async object (Promise, Timeout, etc.) is created, the init hook runs. In the hook function, the library creates an Error object that is tracked along with the async object, thus pointing to where the object was initialized:
init (asyncId, type, triggerAsyncId, resource) {
  if (type === 'TIMERWRAP' || type === 'PROMISE') return
  if (type === 'PerformanceObserver' || type === 'RANDOMBYTESREQUEST') return
  var err = new Error('whatevs')
  var stacks = stackback(err)
  active.set(asyncId, {type, stacks, resource})
}
stackback is a library that provides some helpers for extracting info about the function frames leading up to the Error object. The active object is a global Map, mapping each asyncId to its type, the stack of frames from the Error, and the underlying resource.
When you need to know why your Node process is being kept alive, you call the whyIsNodeRunning function, which iterates through the list of all async objects created and identifies the ones that can keep the event loop alive (unresolved Promises, for example, do not keep the event loop alive). The objects that can keep the event loop alive have a hasRef function that returns true when called. An extract from the why-is-node-running library:
function whyIsNodeRunning (logger) {
  if (!logger) logger = console

  hook.disable()

  var activeResources = [...active.values()].filter(function (r) {
    if (
      typeof r.resource.hasRef === 'function'
      && !r.resource.hasRef()
    ) return false
    return true
  })

  logger.error('There are %d handle(s) keeping the process running', activeResources.length)

  for (const o of activeResources) printStacks(o)

  function printStacks (o) {
    var stacks = o.stacks.slice(1).filter(function (s) {
      var filename = s.getFileName()
      return filename && filename.indexOf(sep) > -1 && filename.indexOf('internal' + sep) !== 0
    })
Well, if my Node instance is stuck, how do I trigger this function?
Enter Unix Signals.
We can send a signal to a process, which triggers a signal handler to handle it. Some signals (like SIGKILL and SIGSEGV) cannot be handled, and the process terminates, but there are standard signals that can be customized and handled by our process. For example, SIGUSR1 and SIGUSR2 are left to the program to use for inter-process communication or custom signal handling (you can also use SIGPIPE, because Node ignores SIGPIPE by default).
In Node.js, we can set up a signal handler on our process to execute a function when a signal is received. In my demo, I set up a signal handler to trigger the function that iterates through the map of async objects created, to see what could possibly be keeping the program alive:
process.on('SIGUSR1', () => { console.log("captured SIGUSR1"); showMeTheCulprit(fd) });
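For reference, here is a minimal, self-contained sketch of how the pieces fit together. This is my own wiring (the TCP server is just a stand-in for any open handle), using the published why-is-node-running package rather than the demo's code:

// Recent versions of why-is-node-running are ESM-only, hence the import.
// Import it as early as possible so its hooks see every async object.
import whyIsNodeRunning from 'why-is-node-running'
import net from 'node:net'

// An open TCP server: a handle that keeps the event loop alive.
const server = net.createServer().listen(0)

process.on('SIGUSR1', () => {
  whyIsNodeRunning() // prints each active handle and the stack where it was created
})

console.log(`pid ${process.pid}: run "kill -USR1 ${process.pid}" from another shell`)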
Caveats
Async hooks have a performance impact. It might be prudent to keep this instrumentation as an alternate main function of sorts, used only when you run into issues and have to debug the program, rather than running all the time in production.
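One way to do that, as a sketch assuming a hypothetical DEBUG_HANDLES environment variable, is to gate the instrumentation at startup so the async-hook overhead is only paid when you explicitly opt in:

// Hypothetical opt-in flag: the hooks are only installed when debugging.
// This still runs at startup, so the hooks are in place before the
// interesting async objects get created.
if (process.env.DEBUG_HANDLES === '1') {
  const { default: whyIsNodeRunning } = await import('why-is-node-running')
  process.on('SIGUSR1', () => whyIsNodeRunning())
}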
Conclusion
Researching this topic was a fantastic way for me to learn a bit more about Node.js internals and how asynchronous objects work :) Here's to more such hacking and learning!