Dereferencing values

JavaScript has two properties that are particularly relevant to static analysis. Let's revisit them so that we can explain what dereferencing means and why we need it.

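One: objects are assigned by reference. Assigning an object to another variable copies the reference, not the object itself. This is evident when, for example, you assign an array to a second variable: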

javascript
var a = [1, 2]
var b = a;

b.pop();

console.log(a) // => [1]
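
To make the distinction concrete, the following sketch contrasts assigning an array (which shares one object between two names) with copying it using spread syntax (which creates a new object). The variable names are illustrative only.

```javascript
const original = [1, 2];

// Assignment copies the reference: both names point at the same array.
const alias = original;

// Spread syntax builds a new array holding the same elements.
const copy = [...original];

alias.pop();

console.log(original === alias); // => true
console.log(original === copy);  // => false
console.log(original);           // => [1]
console.log(copy);               // => [1, 2]
```

For a static analyzer, `alias` is the interesting case: it is a second name for the very same value.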

Two: expressions produce values. For example, as an argument to a function call, you could provide what would otherwise be a class declaration:

javascript
function instantiate(klass) { // klass is a Class *expression*
  return new klass()
}

instantiate(class X {}); // => X {} (an instance of X)
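
As a further illustration, class and function expressions can be stored anywhere a value can, such as in object properties, and used later under a different name. The `registry` object and its members below are hypothetical names chosen for this sketch.

```javascript
// Class and function expressions evaluate to values, so they can be
// stored in properties and passed around like any other value.
const registry = {
  Widget: class {
    describe() {
      return "widget";
    }
  },
  makeGreeting: function (name) {
    return `hello, ${name}`;
  },
};

function instantiate(klass) {
  return new klass();
}

const widget = instantiate(registry.Widget);

console.log(widget.describe());             // => "widget"
console.log(registry.makeGreeting("syng")); // => "hello, syng"
```

Note that the class here is never introduced by a class declaration; it only exists as the value of an expression.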

These traits of the language govern our ability to search for the use of a value as opposed to its definition. Because a value may be referenced by any number of expressions (variable declarations are only one kind!) and because expressions themselves produce values, the language is dynamic to a point that makes it difficult to search for when and how a value is used.

Consider a scenario where we want to find the calls to the member property then of Promise instances. You could search for this:

clojure
(call (mem then (of Promise)))

Which would capture a statement like this:

javascript
new Promise(() => {}).then(() => {})

But what if a reference to new Promise was created somewhere, and it is on that reference that the member then was called?

javascript
var x = new Promise(() => {});
x.then(() => {})

Sometimes, we do not care whether the member then was accessed through a reference or directly through the instance of Promise. Other times, we do not even know how a value gets used after it is defined, and that may be exactly what we're looking for. We do know how the value gets defined, so how can we locate its usage?

To deal with such scenarios, SYNG has limited support for "following" a value in a process it calls dereferencing.

Selecting references to a value

(:ref) selects references to a value:

clojure
(:ref [value])

Returning to our Promise example, suppose we change the query to this instead:

clojure
(call (mem then (:ref (of Promise))))

SYNG will now match all of the following usage patterns, and more:

javascript
{
  new Promise().then(...) // OK
}

{
  var x = new Promise()

  x.then(...) // OK
}

{
  function doSomethingLater(deferred) {
    deferred.then(...) // OK
  }

  doSomethingLater(new Promise())
}

By analyzing the script, SYNG can tell that the variable x in var x = new Promise() references the value we selected: (of Promise). Based on that knowledge, it will from that point on treat (id x) exactly as if it were (of Promise), until x gets redeclared or goes out of scope. To an extent, this is equivalent to writing the following query by hand:

clojure
(:or (of Promise) (id x))
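
One way to picture the bookkeeping described above is a single pass over statements that maintains a set of names currently bound to the selected value. The sketch below is a toy under stated assumptions: the statement objects are a hypothetical, flattened program representation (not SYNG's AST), the names `findMemberCalls` and `ofPromise` are invented for illustration, and scoping is ignored entirely. It is not how SYNG is implemented.

```javascript
// Toy model only: returns the indices of statements that call `member`
// on either a matching value or an identifier currently bound to one.
function findMemberCalls(statements, member, matchesValue) {
  const tracked = new Set(); // names currently referencing a matching value
  const hits = [];

  const isTarget = (node) =>
    matchesValue(node) || (node.type === "id" && tracked.has(node.name));

  statements.forEach((stmt, index) => {
    if (stmt.type === "declare") {
      // A (re)declaration either starts or stops tracking the name.
      if (isTarget(stmt.init)) tracked.add(stmt.name);
      else tracked.delete(stmt.name);
    } else if (
      stmt.type === "call" &&
      stmt.member === member &&
      isTarget(stmt.object)
    ) {
      hits.push(index);
    }
  });

  return hits;
}

// Hypothetical stand-in for the (of Promise) selector.
const ofPromise = (node) => node.type === "new" && node.callee === "Promise";

const program = [
  // new Promise().then(...)
  { type: "call", member: "then", object: { type: "new", callee: "Promise" } },
  // var x = new Promise()
  { type: "declare", name: "x", init: { type: "new", callee: "Promise" } },
  // x.then(...)
  { type: "call", member: "then", object: { type: "id", name: "x" } },
  // var x = 1 (redeclared: no longer a Promise)
  { type: "declare", name: "x", init: { type: "literal", value: 1 } },
  // x.then(...)
  { type: "call", member: "then", object: { type: "id", name: "x" } },
];

const hits = findMemberCalls(program, "then", ofPromise);
console.log(hits); // => [0, 2]
```

The last call is not reported because, by then, x has been redeclared with a value that no longer matches.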

Earlier, we mentioned that SYNG's support for this process is limited. That is because, without actually evaluating expressions, only certain patterns can be dereferenced. For example, here is a case where we cannot detect a reference statically:

javascript
function doThingsAfter(thingsBeingDone) {
  for (let i = 0; i < thingsBeingDone.length; ++i) {
    thingsBeingDone[i].then(...)
  }
}

doThingsAfter([ new Promise(), new Promise() ])

Although SYNG knows that the argument bound to thingsBeingDone has values matching (of Promise) at indices 0 and 1, it has no way of knowing that thingsBeingDone[i] in the loop body will actually access those indices unless it evaluates the loop. Evaluation is outside the scope of SYNG.
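
The underlying difficulty is that the indices a loop touches are decided by runtime values. In the sketch below (the function and variable names are hypothetical), the same loop body reads different elements depending on its arguments, so the source text alone cannot say which elements are reached.

```javascript
// Records which indices of `items` the loop actually visits.
function visitedIndices(items, start, step) {
  const visited = [];
  for (let i = start; i < items.length; i += step) {
    visited.push(i);
  }
  return visited;
}

const things = ["a", "b", "c", "d"];

console.log(visitedIndices(things, 0, 1)); // => [0, 1, 2, 3]
console.log(visitedIndices(things, 1, 2)); // => [1, 3]
```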

That said, the (:ref) operator still proves very useful in practice, even though it does not cover all patterns of referencing. At times, it points you much closer to where the answer lies, after which human analysis can take over. Other times, it is your only way of crossing the layers between the definition of a value and its use, best-effort as it may be.

Consult the reference page for the operator to see exactly which patterns are recognized.

When to dereference

Use (:ref) when the target of your search is disconnected from what you can select.

In other words, you're looking for the usage of a value when all you know is how it is defined. (:ref) tries to fill in the intermediary layers between its definition (what you can select) and its use (what you can't select).

Consider this real-world example from the Canvas LMS codebase. To translate strings into the user's desired language, we must first generate an object that knows how to translate by calling useScope and then call the t method on it for each individual string:

javascript
import { useScope as useI18nScope } from '@canvas/i18n'

const I18n = useI18nScope("foo")

I18n.t("banana")
I18n.t("fish")

We're looking for t calls, and we know they'll be made on whatever is referring to the return value of the call to useScope, but we don't know what that actually is. In this example, it is I18n, but it could be anything. With (:ref), we tell SYNG to follow what we know about and fill in what we don't:

clojure
(call (mem t ; I18n.t
           (:ref ; const I18n = useI18nScope()
              (call ; useI18nScope()
                (import "@canvas/i18n" useScope))))) ; useI18nScope
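
To see why the binding name cannot be relied on, consider the call sites below. Each invokes t on the return value of useScope, but under a different name. The local `useScope` function is a hypothetical stand-in for the real export so that the sketch runs on its own, and the names `translator` and `label` are invented. Which of these patterns SYNG recognizes is listed on the operator's reference page.

```javascript
// Hypothetical stand-in for the `useScope` export of '@canvas/i18n'.
function useScope(scope) {
  return { t: (key) => `${scope}.${key}` };
}

// 1. Bound to the conventional name.
const I18n = useScope("foo");
console.log(I18n.t("banana")); // => "foo.banana"

// 2. Bound to an arbitrary name.
const translator = useScope("bar");
console.log(translator.t("fish")); // => "bar.fish"

// 3. Passed to a function and used under the parameter's name.
function label(i18n) {
  return i18n.t("apple");
}
console.log(label(useScope("baz"))); // => "baz.apple"
```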

One could argue that you would still get actionable results by simply searching for calls to the member t, since there likely won't be that many similar interfaces in the codebase:

clojure
(call (mem t))

If you're lucky, yes! The decision is yours to make. By design, SYNG aims to do as much as it can to refine the results before you take over, but that doesn't mean you always have to provide the most specific query. It's there for when you need it.
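
As an illustration of the trade-off, a search for any call to a member named t cannot tell apart unrelated objects that happen to expose a method with that name. Both objects below are hypothetical stand-ins, and both calls are calls to a member t.

```javascript
// Hypothetical stand-in for an i18n scope object.
const I18n = { t: (key) => `translated:${key}` };

// An unrelated object that happens to expose a method with the same name.
const timer = { t: () => 42 };

console.log(I18n.t("banana")); // => "translated:banana"
console.log(timer.t());        // => 42
```

Whether such look-alikes exist in your codebase is what decides if the broader query is good enough.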

Copyright © 2022-present Semantic Works, Inc.