Gowind's site

Node JS internals - How async file system calls work

Note: The internal details of NodeJs (parts that are not exposed as an API) will always be subject to change.

This post was written based on the Node JS source code at commit f09a50c39d92efd5ed65a87fb07f64675baa8774. The info in this post might become obsolete if and when Node JS internals change (althought that might be rare, but nevertheless still possible)

As discussed in our lib_uv primer , we know how we can execute a fn (work) to be run in a worker on libuv’s threadpool and then have a callback fn execute on the main thread, once work is executed on a worker thread)

The fn uv_fs_open is used to open a file without blocking the main/event thread (as open in most UNIXes is a blocking call). It does so by submitting a work fn that is the open call with a uv__fs_done callback fn.

In the uv_fs_open implementation , we are expanding the INIT, PATH and POST macros, to show what happens under the hood.

Node JS open wraps uv_fs_open in a way that we can open the file in a worker thread and have the user provided JS callback fn execute on the event thread. We are basically looking at how this flow works.

int uv_fs_open(uv_loop_t* loop,
               uv_fs_t* req,
               const char* path,
               int flags,
               int mode,
               uv_fs_cb cb) {
  INIT(OPEN);
  PATH;
  req->flags = flags;
  req->mode = mode;
  POST;
}

INIT, PATH and POST are macros that expand to code like

do {                                                                        \
    if (req == NULL)                                                          \
      return UV_EINVAL;                                                       \
    UV_REQ_INIT(req, UV_FS);                                                  \
    req->fs_type = UV_FS_OPEN                                        \
    req->result = 0;                                                          \
    req->ptr = NULL;                                                          \
    req->loop = loop;                                                         \
    req->path = NULL;                                                         \
    req->new_path = NULL;                                                     \
    req->bufs = NULL;                                                         \
    req->cb = cb;                                                             \
  }                                                                           \
  while (0)


  do {                                                                        \
    assert(path != NULL);                                                     \
    if (cb == NULL) {                                                         \
      req->path = path;                                                       \
    } else {                                                                  \
      req->path = uv__strdup(path);                                           \
      if (req->path == NULL)                                                  \
        return UV_ENOMEM;                                                     \
    }                                                                         \
  }                                                                           \
  while (0)
 .req->flags = flags;
  req->mode = mode;

  do {                                                                        \
    if (cb != NULL) {                                                         \
      uv__req_register(loop, req);                                            \
      uv__work_submit(loop,                                                   \
                      &req->work_req,                                         \
                      UV__WORK_FAST_IO,                                       \
                      uv__fs_work,                                            \
                      uv__fs_done);                                           \
      return 0;                                                               \
    }                                                                         \
    else {                                                                    \
      uv__fs_work(&req->work_req);                                            \
      return req->result;                                                     \
    }                                                                         \
  }                                                                           \
  while (0)

If we pass a callback function (which is almost always), uv_fs_open submits a req of type FAST_IO using uv__work_submit:

void uv__work_submit(uv_loop_t* loop,
                     struct uv__work* w,
                     enum uv__work_kind kind,
                     void (*work)(struct uv__work* w),
                     void (*done)(struct uv__work* w, int status)) {
  uv_once(&once, init_once);
  w->loop = loop;
  w->work = work;
  w->done = done;
  post(&w->wq, kind);
}

w->work is the function that is executed by the worker thread. post adds this work item to the work queue. The worker fn picks up the work and executes the work function

The flow of code from JS -> C++ -> C is as follows

JS open : fs.open(path, options, callback) 1. create a req: FSReqCallback = Req(context(callback)) req has an oncomplete member, that will be called after req is done the oncomplete simply calls callback(), which is the User provided callback 2. call into Node C++ : binding.open(path, flags, mode:666, req)

binding.Open: 1.If a req argument is provided, the file is opened asynchronously 2. make req_wrap_async: FSReqBase = FSReqBase(req) 3. why do we wrap req ? req is a FSReqCallback which is an object that lives in JS land. We need to wrap it in a BaseObject,which is the abstraction used to tie JS objects to the C++ world. (I am not sure why we can’t just pass around a v8::object without wrapping it in a BaseObject, probably because we need to increment the ReferenceCount to this object so that it isn’t GC’ed while waiting for something to happen in CPP/C land) Some understanding of Class Hierarchy is needed here:
The order of sub-classes (super -> sub) : BaseObject -> ReqWrap -> FSReqBase (ReqWrap<uv_fs_t>) FSReqBase is a parametrized sub-class of ReqWrap<uv_fs_t> which => that it deals with fs requests (A request in lib_uv is a short action (such as opening a file, reading a file etc)) From Open, we call uv_fs_open, the libuv file that is used to open a file, asynchronously, via AsyncCall

AsyncCall This is a wrapper over AsyncDestCall with a extra nullptr argument

AsyncDestCall AsyncDestCall(env, req_wrap_async, args, "open",UTF8,nullptr,0,AfterInteger, uv_fs_open, *path, flags, mode) req_wrap_async: A C++ land wrapper over our user passed JS Callback (which is FSReqCallback(user_provided_callback)) Before calling uv_fs_open, we have to do a few things Init req_wrap_async with a name of the syscall we want to do (open in this case) call req_wrap_async->Dispatch(uv_fs_open, ..args, AfterInteger)

Dispatch: int ReqWrap<T>::Dispatch(LibuvFunction fn, Args... args) { Dispatch is where uv_fs_open is actually called The signature of uv_fs_open looks like: int uv_fs_open(uv_loop_t *loop, uv_fs_t *req, const char *path, int flags, int mode, uv_fs_cb  cb) What is the responsibility of each arg to uv_fs_open ? 1. loop -> The event loop to which we submit our work that is opening a file. 2. req -> denotes that we are operating on a file (uv_fs_t). req can contain a pointer to some arbitrary data that will be of use later 3. char* path : The path to the file that we want to open 4. flags and mode: The flags and mode with we want to open the file (lookup man 2 open on macOS) 5. cb: A callback function to be called once our file is opened. This callback function will be passed the req that we passed in the uv_fs_open. We have a C++ function: AfterInteger that is passed as the callback function. There is a lot of funkiness about how AfterInteger is called. It is first Wrapped in some sort of C++ template magic.This is done so that our JS function can be called by the C uv__fs_done function (that is run after our open call)

AfterInteger If you do not understand this section , feel free to ignore it. Basically, our original JS callback, provided by the user to fs.open(...) is called in AfterInteger, thus finishing our code cycle.

When AfterInteger is called, our open call is done and we have the result of the call. Assuming, we opened the file successfully, we need to fetch the callback passed by the user, so that we can execute it. Recall that our uv_fs_open calls the callback with a uv_fs_t data structure. So how do we get back the original FSReqCallback structure, from a uv_fs_t data structure ? We use the container_of magic to get the wrapping data structure of our uv_fs_t (which is FSReqCallback). AfterInteger calls Resolve on FSReqCallback, which finally executes our user provided callback fn

Reply to this post by email ↪