angr已经不在是从前的angr了。
前一阵子为了学习ollvm混淆,纠结了很久angr的版本,可以说之前的angr很乱,版本稍一改动api就变了,早在暑假就听闻angr会有很大的变动,最近偶然间看到最新版的angr,于是又萌生了学习一番的冲动,毕竟angr在CTF解题中有许多应用。
什么是符号执行,参阅Wiki
官方给出了很多例题,这都是很好的学习资源呐!
搭好环境后我们可以先来测试一个官方给的例子是否有效,
就先测试耗时比较短的一题Whitehat CTF 2015 - Crypto 400
可以看到喜人的输出结果。
改版之后的不同,我这刚好有早之前的angr的示例脚本,拿来比较着学习。
旧版(7.x.x.x)
#!/usr/bin/env python
# coding: utf-8
import angr
import time
def main():
# Load the binary. This is a 64-bit C++ binary, pretty heavily obfuscated.
p = angr.Project('wyvern')
# This block constructs the initial program state for analysis.
# Because we're going to have to step deep into the C++ standard libraries
# for this to work, we need to run everyone's initializers. The full_init_state
# will do that. In order to do this peformantly, we will use the unicorn engine!
st = p.factory.full_init_state(args=['./wyvern'], add_options=angr.options.unicorn)
# It's reasonably easy to tell from looking at the program in IDA that the key will
# be 29 bytes long, and the last byte is a newline.
# Constrain the first 28 bytes to be non-null and non-newline:
for _ in xrange(28):
k = st.posix.files[0].read_from(1)
st.solver.add(k != 0)
st.solver.add(k != 10)
# Constrain the last byte to be a newline
k = st.posix.files[0].read_from(1)
st.solver.add(k == 10)
# Reset the symbolic stdin's properties and set its length.
st.posix.files[0].seek(0)
st.posix.files[0].length = 29
# Construct a SimulationManager to perform symbolic execution.
# Step until there is nothing left to be stepped.
sm = p.factory.simulation_manager(st)
sm.run()
# Get the stdout of every path that reached an exit syscall. The flag should be in one of these!
out = ''
for pp in sm.deadended:
out = pp.posix.dumps(1)
if 'flag{' in out:
return filter(lambda s: 'flag{' in s, out.split())[0]
# Runs in about 15 minutes!
def test():
assert main() == 'flag{dr4g0n_or_p4tric1an_it5_LLVM}'
if __name__ == "__main__":
before = time.time()
print main()
after = time.time()
print "Time elapsed: {}".format(after - before)
最新版8.18.10.25
#!/usr/bin/env python
# coding: utf-8
import angr
import claripy
import time
def main():
# Load the binary. This is a 64-bit C++ binary, pretty heavily obfuscated.
# its correct emulation by angr depends heavily on the libraries it is loaded with,
# so if this script fails, try copying to this dir the .so files from our binaries repo:
# https://github.com/angr/binaries/tree/master/tests/x86_64
p = angr.Project('wyvern')
# It's reasonably easy to tell from looking at the program in IDA that the key will
# be 29 bytes long, and the last byte is a newline. Let's construct a value of several
# symbols that we can add constraints on once we have a state.
flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(28)]
flag = claripy.Concat(*flag_chars + [claripy.BVV(b'\n')])
# This block constructs the initial program state for analysis.
# Because we're going to have to step deep into the C++ standard libraries
# for this to work, we need to run everyone's initializers. The full_init_state
# will do that. In order to do this peformantly, we will use the unicorn engine!
st = p.factory.full_init_state(
args=['./wyvern'],
add_options=angr.options.unicorn,
stdin=flag,
)
# Constrain the first 28 bytes to be non-null and non-newline:
for k in flag_chars:
st.solver.add(k != 0)
st.solver.add(k != 10)
# Construct a SimulationManager to perform symbolic execution.
# Step until there is nothing left to be stepped.
sm = p.factory.simulation_manager(st)
sm.run()
# Get the stdout of every path that reached an exit syscall. The flag should be in one of these!
out = b''
for pp in sm.deadended:
out = pp.posix.dumps(1)
if b'flag{' in out:
return next(filter(lambda s: b'flag{' in s, out.split()))
# Runs in about 15 minutes!
def test():
assert main() == b'flag{dr4g0n_or_p4tric1an_it5_LLVM}'
if __name__ == "__main__":
before = time.time()
print(main())
after = time.time()
print("Time elapsed: {}".format(after - before))
试着运行旧版的脚本,如下报错
可见st.posix.files
中的files
方法已经被移除了,之前了解过angr
的同学应该都知道,这一段代码是用来做条件约束的,那么新版是如何进行约束的呢?
for _ in xrange(28):
k = st.posix.files[0].read_from(1)
st.solver.add(k != 0)
st.solver.add(k != 10)
在angr==8.18.10.25
中使用了claripy
模块进行输入以及条件约束。
flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(28)]
flag = claripy.Concat(*flag_chars + [claripy.BVV(b'\n')])
for k in flag_chars:
st.solver.add(k != 0)
st.solver.add(k != 10)
虽然形式发生了变化,但是理解起来是一样的。
只希望angr的api不要再经常变动了,这对于想学习angr的同学来说可真不友好
我们从比较简单的一题开始看起
载入IDA
逻辑很清楚了,sub_4006FD
函数中只有一个简单的运算,对于angr
这么点运算量实在是微不足道。
import angr
def main():
p = angr.Project("r100",auto_load_libs=True)
simgr = p.factory.simulation_manager(p.factory.full_init_state())
simgr.explore(find=0x400844, avoid=0x400855)
return simgr.found[0].posix.dumps(0).strip(b'\0\n')
def test():
assert main().startswith(b'Code_Talkers')
if __name__ == '__main__':
print(main())
使用p.factory.simulation_manager
创建一个simulation_manager
进行模拟执行,其中传入一个SimState
.
SimState
对象通常有三种
1. blank_state(**kwargs)
返回一个未初始化的state,此时需要主动设置入口地址,以及自己想要设置的参数。
2. entry_state(**kwargs)
返回程序入口地址的state,通常来说都会使用该状态
3. full_init_state(**kwargs)
同entry_state(**kwargs) 类似,但是调用在执行到达入口点之前应该调用每个初始化函数
除此之外还需要了解一下auto_load_libs
参数,该参数用来设置是否自动载入依赖的库,如果设置为True
会自动载入依赖的库,然后分析到库函数调用时也会进入库函数,这样会增加分析的工作量。如果为False
,程序调用函数时,会直接返回一个不受约束的符号值。
simgr.explore(find=0x400844, avoid=0x400855)
然后使用simgr.explore
进行模拟执行find
是想要执行分支,avoid
是不希望执行的分支。
执行完之后找到一个符合条件的分支。
<SimulationManager with 12 avoid, 1 found, 2 active>
,此时相关的状态已经保存在了simgr
当中,我们可以通过simgr.found
来访问所有符合条件的分支。
此时我们可以试想一下,我们该如何获取angr正确执行到0x400844
分支所进行的输入呢?
在官方的文档上我们可以知道SimState
都有哪些参数和方法。
不过我有时候更习惯的是help(simgr.found[0])
可以比较清楚的知道SimState
中保存了当前状态的寄存器,内存,输入输出等信息,这里我们需要使用posix
,接下来在看一下posix
都由哪些方法可以供我们使用。
我比较关注的是dump
方法,它可以返回相应文件描述符的内容,很明显为了获取输入,应该使用dump(0)
,因此完整的代码就应该如下simgr.found[0].posix.dumps(0)
到此,虽然题目解决了,但是相信大家也都有许多很多困惑的地方,<SimulationManager with 12 avoid, 1 found, 2 active>
中avoid 和 active
又是什么呢?
我试着探究一番。逐个打印出simgr.avoid
的输入,发现有趣的地方。
可以看到这12个avoid
就是angr在模拟执行到0x400855
分支的过程。
再来看看active
其中0x4007e4
对应的是一个死循环。
至此我们已经掌握了angr
的基本用法了,用来解决一般的题目没有任何问题,我们只需设置好find 和 avoid
就可以了。
!但是要想真正发挥出angr
的作用,可不仅仅如此,我们需要知道如何进行条件约束,以及如何hook
,从而在有限的时间内,充分发挥出angr
的用处。
下一篇了解一下angr如何进行命令行传参并求解