前言

angr已经不在是从前的angr了。

前一阵子为了学习ollvm混淆,纠结了很久angr的版本,可以说之前的angr很乱,版本稍一改动api就变了,早在暑假就听闻angr会有很大的变动,最近偶然间看到最新版的angr,于是又萌生了学习一番的冲动,毕竟angr在CTF解题中有许多应用。

基本原理

什么是符号执行,参阅Wiki

angr官网

API文档

CTF应用

例题

官方给出了很多例题,这都是很好的学习资源呐!

搭好环境后我们可以先来测试一个官方给的例子是否有效,

就先测试耗时比较短的一题Whitehat CTF 2015 - Crypto 400

可以看到喜人的输出结果。

比较

改版之后的不同,我这刚好有早之前的angr的示例脚本,拿来比较着学习。

旧版(7.x.x.x)

#!/usr/bin/env python
# coding: utf-8
import angr
import time

def main():
    # Load the binary. This is a 64-bit C++ binary, pretty heavily obfuscated.
    p = angr.Project('wyvern')

    # This block constructs the initial program state for analysis.
    # Because we're going to have to step deep into the C++ standard libraries
    # for this to work, we need to run everyone's initializers. The full_init_state
    # will do that. In order to do this peformantly, we will use the unicorn engine!
    st = p.factory.full_init_state(args=['./wyvern'], add_options=angr.options.unicorn)

    # It's reasonably easy to tell from looking at the program in IDA that the key will
    # be 29 bytes long, and the last byte is a newline.

    # Constrain the first 28 bytes to be non-null and non-newline:
    for _ in xrange(28):
        k = st.posix.files[0].read_from(1)
        st.solver.add(k != 0)
        st.solver.add(k != 10)

    # Constrain the last byte to be a newline
    k = st.posix.files[0].read_from(1)
    st.solver.add(k == 10)

    # Reset the symbolic stdin's properties and set its length.
    st.posix.files[0].seek(0)
    st.posix.files[0].length = 29

    # Construct a SimulationManager to perform symbolic execution.
    # Step until there is nothing left to be stepped.
    sm = p.factory.simulation_manager(st)
    sm.run()

    # Get the stdout of every path that reached an exit syscall. The flag should be in one of these!
    out = ''
    for pp in sm.deadended:
        out = pp.posix.dumps(1)
        if 'flag{' in out:
            return filter(lambda s: 'flag{' in s, out.split())[0]

    # Runs in about 15 minutes!

def test():
    assert main() == 'flag{dr4g0n_or_p4tric1an_it5_LLVM}'

if __name__ == "__main__":
    before = time.time()
    print main()
    after = time.time()
    print "Time elapsed: {}".format(after - before)

最新版8.18.10.25

#!/usr/bin/env python
# coding: utf-8
import angr
import claripy
import time

def main():
    # Load the binary. This is a 64-bit C++ binary, pretty heavily obfuscated.
    # its correct emulation by angr depends heavily on the libraries it is loaded with,
    # so if this script fails, try copying to this dir the .so files from our binaries repo:
    # https://github.com/angr/binaries/tree/master/tests/x86_64
    p = angr.Project('wyvern')

    # It's reasonably easy to tell from looking at the program in IDA that the key will
    # be 29 bytes long, and the last byte is a newline. Let's construct a value of several
    # symbols that we can add constraints on once we have a state.

    flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(28)]
    flag = claripy.Concat(*flag_chars + [claripy.BVV(b'\n')])

    # This block constructs the initial program state for analysis.
    # Because we're going to have to step deep into the C++ standard libraries
    # for this to work, we need to run everyone's initializers. The full_init_state
    # will do that. In order to do this peformantly, we will use the unicorn engine!
    st = p.factory.full_init_state(
            args=['./wyvern'],
            add_options=angr.options.unicorn,
            stdin=flag,
    )

    # Constrain the first 28 bytes to be non-null and non-newline:
    for k in flag_chars:
        st.solver.add(k != 0)
        st.solver.add(k != 10)

    # Construct a SimulationManager to perform symbolic execution.
    # Step until there is nothing left to be stepped.
    sm = p.factory.simulation_manager(st)
    sm.run()

    # Get the stdout of every path that reached an exit syscall. The flag should be in one of these!
    out = b''
    for pp in sm.deadended:
        out = pp.posix.dumps(1)
        if b'flag{' in out:
            return next(filter(lambda s: b'flag{' in s, out.split()))

    # Runs in about 15 minutes!

def test():
    assert main() == b'flag{dr4g0n_or_p4tric1an_it5_LLVM}'

if __name__ == "__main__":
    before = time.time()
    print(main())
    after = time.time()
    print("Time elapsed: {}".format(after - before))

试着运行旧版的脚本,如下报错

可见st.posix.files中的files方法已经被移除了,之前了解过angr的同学应该都知道,这一段代码是用来做条件约束的,那么新版是如何进行约束的呢?

for _ in xrange(28):
    k = st.posix.files[0].read_from(1)
    st.solver.add(k != 0)
    st.solver.add(k != 10)

angr==8.18.10.25中使用了claripy模块进行输入以及条件约束。

flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(28)]
flag = claripy.Concat(*flag_chars + [claripy.BVV(b'\n')])

for k in flag_chars:
        st.solver.add(k != 0)
        st.solver.add(k != 10)

虽然形式发生了变化,但是理解起来是一样的。

只希望angr的api不要再经常变动了,这对于想学习angr的同学来说可真不友好

初涉angr

我们从比较简单的一题开始看起

defcamp_r100

载入IDA

逻辑很清楚了,sub_4006FD函数中只有一个简单的运算,对于angr这么点运算量实在是微不足道。

import angr

def main():
    p = angr.Project("r100",auto_load_libs=True)
    simgr = p.factory.simulation_manager(p.factory.full_init_state())
    simgr.explore(find=0x400844, avoid=0x400855)

    return simgr.found[0].posix.dumps(0).strip(b'\0\n')

def test():
    assert main().startswith(b'Code_Talkers')

if __name__ == '__main__':
    print(main())

使用p.factory.simulation_manager创建一个simulation_manager进行模拟执行,其中传入一个SimState.

SimState对象通常有三种

1. blank_state(**kwargs)
返回一个未初始化的state,此时需要主动设置入口地址,以及自己想要设置的参数。
2. entry_state(**kwargs)
返回程序入口地址的state,通常来说都会使用该状态
3. full_init_state(**kwargs)
同entry_state(**kwargs) 类似,但是调用在执行到达入口点之前应该调用每个初始化函数

除此之外还需要了解一下auto_load_libs参数,该参数用来设置是否自动载入依赖的库,如果设置为True会自动载入依赖的库,然后分析到库函数调用时也会进入库函数,这样会增加分析的工作量。如果为False,程序调用函数时,会直接返回一个不受约束的符号值。

simgr.explore(find=0x400844, avoid=0x400855)然后使用simgr.explore进行模拟执行find是想要执行分支,avoid是不希望执行的分支。

执行完之后找到一个符合条件的分支。

<SimulationManager with 12 avoid, 1 found, 2 active>,此时相关的状态已经保存在了simgr当中,我们可以通过simgr.found来访问所有符合条件的分支。

此时我们可以试想一下,我们该如何获取angr正确执行到0x400844分支所进行的输入呢?

在官方的文档上我们可以知道SimState都有哪些参数和方法。

不过我有时候更习惯的是help(simgr.found[0])

可以比较清楚的知道SimState中保存了当前状态的寄存器,内存,输入输出等信息,这里我们需要使用posix,接下来在看一下posix都由哪些方法可以供我们使用。

我比较关注的是dump方法,它可以返回相应文件描述符的内容,很明显为了获取输入,应该使用dump(0),因此完整的代码就应该如下simgr.found[0].posix.dumps(0)

到此,虽然题目解决了,但是相信大家也都有许多很多困惑的地方,<SimulationManager with 12 avoid, 1 found, 2 active>avoid 和 active又是什么呢?

我试着探究一番。逐个打印出simgr.avoid的输入,发现有趣的地方。

可以看到这12个avoid就是angr在模拟执行到0x400855分支的过程。

再来看看active

其中0x4007e4对应的是一个死循环。

至此我们已经掌握了angr的基本用法了,用来解决一般的题目没有任何问题,我们只需设置好find 和 avoid就可以了。

!但是要想真正发挥出angr的作用,可不仅仅如此,我们需要知道如何进行条件约束,以及如何hook,从而在有限的时间内,充分发挥出angr的用处。

下一篇了解一下angr如何进行命令行传参并求解

源链接

Hacking more

...