shxdow's notebook


Analysis of CVE-2023-32439: a Type Confusion Bug

Introduction

JavaScriptCore, the engine powering browsers such as Safari, employs layered optimization tiers to enhance JavaScript performance. From the baseline just-in-time (JIT) compiler to the heavier optimizing tiers, these layers analyze code and adapt to runtime conditions, using techniques such as speculative compilation and inline caching to minimize overhead and maximize responsiveness.

This complex design also makes the engine a prime target for attackers: every optimization tier, and the interactions between them, widens the attack surface and raises the odds that subtle security flaws go unnoticed.

This article presupposes a reasonable familiarity with JSC’s internal workings.

Bug analysis

On June 21, 2023, Apple rolled out a security update for Safari 16.5.1, attributing CVE-2023-32439 to a type confusion issue. Commit 52fe95e5805c735cc1fa4d6200fcaa1912efbfea cross-references the Bugzilla bug ID cited in the patch notes.

The git message reads as follows:

EnumeratorNextUpdateIndexAndMode and HasIndexedProperty are different DFG nodes. However, they might introduce the same heap location kind in DFGClobberize.h which might lead to hash collision. We should introduce a new location kind for EnumeratorNextUpdateIndexAndMode.

The test case added is:

//@ runDefault("--watchdog=300", "--watchdog-exception-ok")
const arr = [0];

function foo() {
    for (let _ in arr) {
        0 in arr;
        while(1);
    }
}


foo();

Program analysis and abstract interpretation

To grasp the inner workings of JavaScriptCore, having a foundational knowledge of static program analysis and abstract interpretation comes in handy.

From “Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints” (Cousot & Cousot, POPL 1977):

A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515*17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -1515*17 ==> -(+)*(+) ==> (-)*(+) ==> (-), proves that -1515*17 is a negative number. Abstract interpretation is concerned by a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515+17 ==> -(+)+(+) ==> (-)+(+) ==> (±)). Despite its fundamentally incomplete results abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried in the absence of certainty about their feasibility, …).
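The rule of signs lends itself to a compact illustration. The following is a minimal, hypothetical sketch in JavaScript (not JSC code): concrete numbers are abstracted to a sign, and arithmetic is re-interpreted over the abstract domain {"+", "-", "±"}.

```javascript
// Abstract domain: "+" (positive), "-" (negative), "±" (sign unknown).
// A concrete constant is abstracted by its sign (0 is conservatively "±" here).
const alpha = (n) => (n > 0 ? "+" : n < 0 ? "-" : "±");

// Rule of signs for multiplication: equal signs give "+", opposite signs give "-".
function absMul(a, b) {
  if (a === "±" || b === "±") return "±";
  return a === b ? "+" : "-";
}

// Addition only has a definite sign when both operands agree.
function absAdd(a, b) {
  if (a === b && a !== "±") return a;
  return "±";
}

// -1515 * 17 abstracts to (-)*(+) ==> (-): provably negative.
absMul(alpha(-1515), alpha(17)); // "-"
// -1515 + 17 abstracts to (-)+(+) ==> (±): the analysis loses precision.
absAdd(alpha(-1515), alpha(17)); // "±"
```

As in the paper's example, the abstraction is cheap but incomplete: it proves the product negative, yet can say nothing about the sum.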

JavaScriptCore models its computation along these lines; for more details, see https://webkit.org/blog/10308/speculation-in-javascriptcore/.

Baseline JIT Compiler

The initial layer, known as the “Baseline JIT Compiler”, swiftly translates JavaScript source code into native machine code, prioritizing speed over extensive optimization. This first layer acts as a quick start for applications, providing immediate execution while preparing for more refined optimizations in subsequent layers.

For instance, test.js compiles to:

// ... truncated for brevity

bb#1
Predecessors: [ ]
[   0] enter
[   1] new_func           dst:loc5, scope:loc4, functionDecl:0

// ... truncated for brevity

[  49] resolve_scope      dst:loc6, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[  56] get_from_scope     dst:loc5, scope:loc6, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[  64] call               dst:loc5, callee:loc5, argc:1, argv:12
[  70] end                value:loc5
Successors: [ ]

Basic Block #1 (bb#1) serves as the primary block where the function foo() is invoked via opcode [64]:

foo#C1tLOB:[0x10d44c220->0x10d42d800, NoneFunctionCall, 76]: 21 instructions (0 16-bit instructions, 0 32-bit instructions, 6 instructions with metadata); 196 bytes (120 metadata bytes); 1 parameter(s); 16 callee register(s); 5 variable(s); scope at loc4

bb#1
Predecessors: [ ]
[   0] enter
[   1] mov                dst:loc5, src:<JSValue()>(const0)
[   4] resolve_scope      dst:loc6, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[  11] get_from_scope     dst:loc6, scope:loc6, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[  19] mov                dst:loc8, src:Int32: 0(const1)
[  22] mov                dst:loc9, src:Int32: 0(const1)
[  25] get_property_enumerator dst:loc11, base:loc6
[  28] jeq_ptr            value:loc11, specialPointer:Cell: 0x101020e68 (0x3000042c0:[0x42c0/17088, JSPropertyNameEnumerator, (0/0, 0/0){}, NonArray, Leaf]), StructureID: 17088(const2), targetLabel:46(->74)
Successors: [ #6 #2 ]

bb#2
Predecessors: [ #1 #5 ]
[  32] loop_hint
[  33] check_traps
[  34] enumerator_next    propertyName:loc10, mode:loc8, index:loc9, base:loc6, enumerator:loc11
[  41] jeq_ptr            value:loc10, specialPointer:String (atomic),8Bit:(1),length:(1): $, StructureID: 16976(const3), targetLabel:33(->74)
Successors: [ #6 #3 ]

bb#3
Predecessors: [ #2 ]
[  45] mov                dst:loc5, src:loc10
[  48] resolve_scope      dst:loc12, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[  55] get_from_scope     dst:loc13, scope:loc12, var:0, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[  63] in_by_val          dst:loc14, base:loc13, property:Int32: 0(const4)
Successors: [ #4 ]

// ... truncated for brevity

The key aspects to focus on are opcodes 25, 34, and 63, which together form the skeleton of the for-in loop:

[  25] get_property_enumerator   ..., ...
...
[  34] enumerator_next           ..., ...
...
[  63] in_by_val                 ..., ...

Data Flow Graph (DFG)

JavaScriptCore (JSC) relies on the Data Flow Graph (DFG) tier to optimize JavaScript code. The DFG is a representation of the program’s control flow and data flow that enables sophisticated analysis and transformation, tailoring execution to the specific program’s observed behavior.

After the DFG’s bytecode parsing, instructions 25, 34, and 63 are converted into its internal Intermediate Representation (IR). For instance, bytecode #34 compiles to:

...

  5  1   :   D@44:<!0:->	GetLocal(JS|MustGen|UseAsOther, loc8(Q~/FlushedJSValue), R:Stack(loc8), bc#34, ExitValid)  predicting None
  6  1   :   D@45:< 1:->	EnumeratorNextUpdateIndexAndMode(Check:Untyped:D@41, Check:Untyped:D@42, Check:Untyped:D@44, Check:Untyped:D@43, VarArgs, Int32+OriginalCopyOnWriteArray+InBounds+AsIs+Read, enumeratorModes = 1, R:World, W:Heap, Exits, ClobbersExit, bc#34, ExitValid)

...

and bytecode #63 to

...

 12  2   :   D@69:< 1:->	JSConstant(JS|UseAsOther, Int32: 0, bc#63, ExitValid)
 13  2   :   D@70:<!0:->	InByVal(Check:Untyped:D@66, Check:Untyped:D@69, Boolean|MustGen|PureInt, Int32+OriginalCopyOnWriteArray+InBounds+AsIs+Read, R:World, W:Heap, Exits, ClobbersExit, bc#63, ExitValid)
 14  2   :   D@71:<!0:->	MovHint(Check:Untyped:D@70, MustGen, loc14, W:SideState, ClobbersExit, bc#63, ExitInvalid)

...

Even though the git message references HasIndexedProperty, there is no apparent mention of it in the unoptimized graph.

Prediction propagation phase

The prediction propagation phase propagates type predictions, gathered earlier by the value profiler at heap load sites as well as from slow-path executions, in order to attach a prediction to every node in the graph.
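As a rough sketch of the idea (purely illustrative, not JSC’s actual implementation), predictions can be modeled as bitsets that are merged along data-flow edges until a fixpoint is reached:

```javascript
// Speculated types as a bitset, loosely modeled on JSC's SpeculatedType.
const SpecInt32Only = 1 << 0;
const SpecBoolean   = 1 << 1;

// Each node merges the predictions of its inputs; iterate until nothing changes.
function propagate(nodes) {
  let changed = true;
  while (changed) {
    changed = false;
    for (const node of nodes) {
      let pred = node.prediction | 0;
      for (const input of node.inputs) pred |= input.prediction | 0;
      if (pred !== node.prediction) {
        node.prediction = pred;
        changed = true;
      }
    }
  }
}

// A profiled load predicted to yield an Int32 feeds an unannotated node.
const profiled = { inputs: [], prediction: SpecInt32Only };
const consumer = { inputs: [profiled], prediction: 0 };
propagate([profiled, consumer]);
consumer.prediction; // SpecInt32Only
```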

setPrediction() is used when we know that there is no way that we can change our minds about what the prediction is going to be.

For EnumeratorNextUpdateIndexAndMode and HasIndexedProperty:

case EnumeratorNextUpdateIndexAndMode: {
    setTuplePredictions(SpecInt32Only, SpecInt32Only);
    break;
}

...

case HasIndexedProperty: {
    setPrediction(SpecBoolean);
    break;
}

The two nodes are therefore predicted to produce values of different types, and those predictions are final. Treating the two as yielding identical values would be incorrect.

Fixup phase

The fixup phase rewrites inefficient parts of the graph based on the predictions gathered so far. It runs after prediction propagation but before Common Subexpression Elimination (CSE).

If, during the prediction propagation phase, the DFG anticipates that InByVal operates on an object and looks up an Int32 value, the node is replaced with HasIndexedProperty:

...

case InByVal: {
    if ((node->child1()->shouldSpeculateObject()
        || (node->child1()->shouldSpeculateObjectOrOther() && !m_graph.hasExitSite(node->origin.semantic, BadType)))
        && node->child2()->shouldSpeculateInt32()) {
        convertToHasIndexedProperty(node);
        break;
    }

    fixEdge<CellUse>(node->child1());
    break;
}

...
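The conversion amounts to a predicate-guarded rewrite: when the speculations on a node’s inputs are strong enough, the node is swapped for a cheaper specialized one. A toy sketch with hypothetical names (not JSC code):

```javascript
// Toy fixup: rewrite a node in place when its inputs' predictions allow a
// specialized node, mirroring the InByVal -> HasIndexedProperty case above.
function fixup(node) {
  if (node.op === "InByVal"
      && node.base.speculated === "Object"
      && node.key.speculated === "Int32") {
    node.op = "HasIndexedProperty"; // specialized indexed-property check
  }
  return node;
}

const node = {
  op: "InByVal",
  base: { speculated: "Object" }, // e.g. the array `arr` from the test case
  key: { speculated: "Int32" },   // e.g. the constant 0 in `0 in arr`
};
fixup(node).op; // "HasIndexedProperty"
```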

Common Subexpression Elimination

Common Subexpression Elimination (CSE) is an optimization technique used in compilers. It identifies repetitive expressions—those yielding the same value—and assesses the potential benefit of substituting them with a single variable storing the computed value.

Many past exploits have leveraged the mechanisms of this optimization pass to trigger type confusion vulnerabilities, and this case is no exception. When seeking to remove a redundant node, DFG treats both EnumeratorNextUpdateIndexAndMode and HasIndexedProperty as interchangeable. This is due to their LocationKind values being the same within their respective HeapLocation objects. The identical LocationKind implies that these two nodes read the same data type from the same location, making them mergeable into one.
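A toy value-numbering pass makes the hazard concrete (purely illustrative, not JSC code): redundant definitions are deduplicated by a key describing what they read, so two different operations that happen to build the same key get merged, regardless of the type each one produces.

```javascript
// Toy local CSE: nodes defining the same "heap location" are merged. The key
// plays the role of JSC's HeapLocation (location kind + heap + operands).
function cse(nodes) {
  const defs = new Map();
  for (const node of nodes) {
    const key = `${node.locationKind}:${node.operands.join(",")}`;
    if (defs.has(key)) {
      node.replacedBy = defs.get(key); // reuse the earlier definition
    } else {
      defs.set(key, node);
    }
  }
}

// Pre-patch, both nodes registered HasIndexedPropertyLoc as their location
// kind, even though one yields an Int32 and the other a Boolean.
const enumNext = { op: "EnumeratorNextUpdateIndexAndMode",
                   locationKind: "HasIndexedPropertyLoc", operands: ["arr", "0"] };
const hasIndexed = { op: "HasIndexedProperty",
                     locationKind: "HasIndexedPropertyLoc", operands: ["arr", "0"] };
cse([enumNext, hasIndexed]);
hasIndexed.replacedBy === enumNext; // true: a Boolean consumer now sees an Int32
```

The patched code avoids the collision by giving each op its own location kind, so the two keys no longer match.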

Before CSE:

...
 14  2 27:   D@84:<!0:->	CheckInBounds(Int32:D@29, Check:KnownInt32:D@90, JS|MustGen|PureInt, Int32, Exits, bc#63, ExitValid)
 15  2 27:   D@70:< 1:->	HasIndexedProperty(Object:D@26, Int32:D@29, Check:Untyped:D@95, Check:Untyped:D@84, Boolean|VarArgs|PureInt, Bool, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, R:Butterfly_publicLength,JSObject_butterfly,IndexedInt32Properties, Exits, bc#63, ExitValid)
 16  2 27:   D@123:<!0:->	KillStack(MustGen, loc14, W:Stack(loc14), ClobbersExit, bc#63, ExitInvalid)
...

After CSE, node D@70 is no longer HasIndexedProperty:

...
 14  2 28:   D@84:<!0:->	CheckInBounds(Int32:D@29, Check:KnownInt32:D@90, JS|MustGen|PureInt, Int32, Exits, bc#63, ExitValid)
 15  2 28:   D@70:<!0:->	CheckVarargs(MustGen|VarArgs, Bool, bc#63, ExitValid)
 16  2 28:   D@42:<!0:->	KillStack(MustGen, loc14, W:Stack(loc14), ClobbersExit, bc#63, ExitInvalid)
...

Patch

This behaviour can be verified by examining the patch, which shows how clobberize() previously failed to differentiate between the two nodes’ LocationKinds.

void clobberize( ... )
{
    ...
    switch (node->op()) {
    ...
    case EnumeratorNextUpdateIndexAndMode:
    case HasIndexedProperty: {
        ...
        read(JSObject_butterfly);
        ArrayMode mode = node->arrayMode();
+       LocationKind locationKind = node->op() == EnumeratorNextUpdateIndexAndMode ? EnumeratorNextUpdateIndexAndModeLoc : HasIndexedPropertyLoc;
        switch (mode.type()) {
        case Array::ForceExit: {
            write(SideState);
            return;
        }
        case Array::Int32: {
            if (mode.isInBounds()) {
                read(Butterfly_publicLength);
                read(IndexedInt32Properties);
-               def(HeapLocation(HasIndexedPropertyLoc, IndexedInt32Properties, graph.varArgChild(node, 0), graph.varArgChild(node, 1)), LazyNode(node));
+               def(HeapLocation(locationKind, IndexedInt32Properties, graph.varArgChild(node, 0), graph.varArgChild(node, 1)), LazyNode(node));
                return;
            }
            break;
        }
    ...
}

Exploitation

Exploiting the bug would likely involve analyzing the FTL’s IR to identify the node that consumes the alleged boolean value, which is actually an Int32.

Conclusion

EnumeratorNextUpdateIndexAndMode and HasIndexedProperty are distinct DFG nodes, but prior to the patch they could register the same heap location kind during clobberization. This shared heap location kind could lead to a type confusion after CSE, since the former produces 32-bit integers while the latter produces booleans.

References