Introducing the RIPS analysis engine

In today’s post, we would like to share some insights into our static code analysis engine RIPS, which detected the security bugs described in the previous and upcoming calendar gifts. The engine has a long history and went through several generations before reaching its current performance. What does it actually do in the few seconds between your click on the scan button and the moment the first vulnerability report pops up? How can a security vulnerability be detected automatically in source code? Let’s have a look.

History

2007 - 2009

Almost 10 years ago, a simple PHP scanner was developed while Capture The Flag (CTF) hacking competitions between university teams were gaining popularity. The scanner was based on regular expressions and identified simple connections between user input that is first assigned to a variable and then used in a critical operation of the PHP code. It worked for the analysis of small applications in CTF events, but it quickly became evident that regular expressions are insufficient for parsing a programming language thoroughly.

// this example confuses even the syntax highlighter of our blog
$var = rips1("rips2(/* rips3())", rips4("*/"));
// the functions rips1() and rips4() are called, rips2() within the string is not
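
To make the problem concrete, the following minimal sketch compares a naive regular expression with PHP's built-in tokenizer on the line above. It is not the RIPS implementation: the helper names are hypothetical, and treating every T_STRING token followed by an opening parenthesis as a call is an assumed simplification (it would also match function declarations).

```php
<?php
// Sample line from above, wrapped in an opening tag so the tokenizer treats it as PHP.
$code = '<?php $var = rips1("rips2(/* rips3())", rips4("*/"));';

// Naive approach: any identifier followed by "(" is assumed to be a call.
function findCallsWithRegex(string $code): array
{
    preg_match_all('/(\w+)\s*\(/', $code, $matches);
    return $matches[1];
}

// Token-based approach: only T_STRING tokens followed by "(" count as calls.
function findCallsWithTokenizer(string $code): array
{
    $tokens = token_get_all($code);
    $calls = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_STRING) {
            continue;
        }
        // Skip whitespace between the name and a possible "(".
        $next = $i + 1;
        while (isset($tokens[$next]) && is_array($tokens[$next])
            && $tokens[$next][0] === T_WHITESPACE) {
            $next++;
        }
        if (isset($tokens[$next]) && $tokens[$next] === '(') {
            $calls[] = $token[1];
        }
    }
    return $calls;
}

print_r(findCallsWithRegex($code));     // rips1, rips2, rips3, rips4
print_r(findCallsWithTokenizer($code)); // rips1, rips4
```

The regular expression reports the two names inside the string literal as calls, whereas the tokenizer sees the whole literal as a single string token and reports only the real calls.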

2009 - 2012

Using a tokenizer was the first step in the right direction. A new tool was developed that first splits the PHP code into its individual tokens following the official PHP syntax, which led to much more precise analysis results. The tool was named RIPS and released during the Month of PHP Security (MOPS). Today, it is the most popular open-source PHP analysis tool and is used by many leading companies worldwide for security audits. The major drawbacks of the open-source version are, however, its large number of false positives and its missing support for analyzing object-oriented code, which is used in every modern PHP application.

class Text {
    public function __construct($data) {
        $this->data = $data;
    }
}
$t = new Text($_POST['data']);
echo $t->data; // XSS

2012 - 2016

To overcome these limitations, a new analysis engine was built from scratch that leverages the lessons learned during the past years of engineering. Challenges of the dynamic PHP language and its features were tackled, and the efficient analysis of large web applications with object-oriented PHP code was pioneered by refining state-of-the-art static code analysis techniques with novel approaches specifically designed for the PHP language. As of today, RIPS is the only SAST tool with a dedicated focus on PHP analysis from its start and, as a result, is able to detect even complex vulnerability types with high precision.

How it works

When the new engine is pointed to a code repository, it transforms all PHP code into a graph representation in an initial analysis phase. For this purpose, the code is split into its individual tokens, abstract syntax trees are built and divided into blocks, and these blocks are then connected to form an annotated control flow graph. Now the data flow can be analyzed on top of this abstract model. Taint analysis follows the data flow of each user input through the graph model and detects input that reaches a security-critical operation of the application unsanitized. The concept of a source tainting a sink can be applied to many different vulnerability types, such as cross-site scripting (XSS) and SQL injection.
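
The idea of a source tainting a sink can be sketched in a few lines. The example below is a toy illustration under heavy assumptions, not the engine itself: the statement list is a hand-written, hypothetical stand-in for a single path through the graph, taint is propagated forward, and sanitization is not modeled at all.

```php
<?php
// Hypothetical stand-in for one path through the graph model.
$statements = [
    ['op' => 'assign', 'target' => '$id',    'uses' => ['$_POST']],
    ['op' => 'assign', 'target' => '$title', 'uses' => []], // constant value
    ['op' => 'assign', 'target' => '$sql',   'uses' => ['$id']],
    ['op' => 'sink', 'name' => 'mysqli_query', 'uses' => ['$sql'],   'type' => 'SQL injection'],
    ['op' => 'sink', 'name' => 'echo',         'uses' => ['$title'], 'type' => 'XSS'],
];

// Propagate taint from the given sources and report every sink that uses tainted data.
function findTaintedSinks(array $statements, array $sources): array
{
    $tainted = array_fill_keys($sources, true);
    $findings = [];
    foreach ($statements as $stmt) {
        $usesTainted = false;
        foreach ($stmt['uses'] as $var) {
            if (isset($tainted[$var])) {
                $usesTainted = true;
            }
        }
        if ($stmt['op'] === 'assign') {
            if ($usesTainted) {
                $tainted[$stmt['target']] = true;  // taint flows into the target
            } else {
                unset($tainted[$stmt['target']]);  // overwritten with clean data
            }
        } elseif ($usesTainted) {
            $findings[] = $stmt['type'] . ' via ' . $stmt['name'];
        }
    }
    return $findings;
}

print_r(findTaintedSinks($statements, ['$_GET', '$_POST', '$_COOKIE']));
```

Only the mysqli_query sink is reported, because $title never receives user input. The same source and sink bookkeeping works for any vulnerability type; what differs is the list of sinks and what counts as sufficient sanitization.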

Example

In the following, we look at a simple code example and its analysis. We skip all obstacles that stem from inter-procedural (functions, methods), constraint-sensitive, or object-sensitive analysis. The example demonstrates why a dedicated focus on PHP and its features is necessary in order to detect and validate a security vulnerability.

$id = $_POST['id'];
if(...) {
    $id = (int)$id;
}
else {
    $id = htmlentities($id);
}
echo "<div id='$id'>"; // XSS

The code contains an XSS vulnerability in line 8 because the user input (the source $_POST['id']) in line 1 flows into the sensitive sink echo. In between, input sanitization is applied, which requires further analysis. In the initial analysis phase, the code is tokenized and parsed by the engine. It identifies the different branches (if/else) and separates the code into blocks accordingly. These blocks are then connected to a control flow graph with labeled edges.
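
A possible in-memory shape of such a graph is sketched below. The block names B1 to B4, the array layout, and the edge labels are assumptions made for illustration; the helper simply enumerates all paths and only works for graphs without loops.

```php
<?php
// Blocks of the example; every edge carries a label.
$cfg = [
    'B1' => ['code' => '$id = $_POST[\'id\'];',    'edges' => ['B2' => 'if', 'B3' => 'else']],
    'B2' => ['code' => '$id = (int)$id;',          'edges' => ['B4' => 'next']],
    'B3' => ['code' => '$id = htmlentities($id);', 'edges' => ['B4' => 'next']],
    'B4' => ['code' => 'echo "<div id=\'$id\'>";', 'edges' => []],
];

// Return every path from $block to a block without successors (acyclic graphs only).
function enumeratePaths(array $cfg, string $block, array $prefix = []): array
{
    $prefix[] = $block;
    if (empty($cfg[$block]['edges'])) {
        return [$prefix];
    }
    $paths = [];
    foreach (array_keys($cfg[$block]['edges']) as $successor) {
        $paths = array_merge($paths, enumeratePaths($cfg, $successor, $prefix));
    }
    return $paths;
}

foreach (enumeratePaths($cfg, 'B1') as $path) {
    echo implode(' -> ', $path), "\n";
}
// B1 -> B2 -> B4
// B1 -> B3 -> B4
```

Each of the two paths has to be checked separately, because the sanitization applied to $id differs between them.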

[Figure: control flow graph of the example code]

Each block of the graph is analyzed for sensitive sinks. In our example, the echo statement is detected in the last block (red border). At this point, the engine invokes a markup parser dedicated to the markup of the sink, which is HTML in our example. Our HTML markup parser is able to pinpoint the exact location of dynamic content within the HTML. It detects that the variable $id lies within a single-quoted attribute of an HTML element. This information is very important because we now know what an attacker needs in order to break out of the attribute and inject malicious HTML: a single quote (').
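
How a markup context can be derived is sketched below with a deliberately tiny state machine. This is an assumed simplification and not the RIPS markup parser: the function name and the {{ID}} placeholder are hypothetical, and comments, script blocks, and unquoted attribute values are ignored.

```php
<?php
// Determine the HTML context in which a placeholder for dynamic content appears.
function detectContext(string $html, string $marker): string
{
    $pos = strpos($html, $marker);
    if ($pos === false) {
        throw new InvalidArgumentException('marker not found');
    }
    $inTag = false; // between "<" and ">"
    $quote = null;  // currently open attribute quote, if any
    for ($i = 0; $i < $pos; $i++) {
        $c = $html[$i];
        if ($quote !== null) {
            if ($c === $quote) {
                $quote = null; // attribute value closed
            }
        } elseif (!$inTag) {
            if ($c === '<') {
                $inTag = true;
            }
        } elseif ($c === '>') {
            $inTag = false;
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;       // attribute value opened
        }
    }
    if ($quote === "'") {
        return 'single-quoted attribute';
    }
    if ($quote === '"') {
        return 'double-quoted attribute';
    }
    return $inTag ? 'inside tag' : 'element content';
}

echo detectContext("<div id='{{ID}}'>", '{{ID}}'), "\n"; // single-quoted attribute
echo detectContext('<div id="{{ID}}">', '{{ID}}'), "\n"; // double-quoted attribute
echo detectContext('<p>{{ID}}</p>', '{{ID}}'), "\n";     // element content
```

The returned context decides which characters matter later on: a single quote for the first case, a double quote for the second, and an opening angle bracket for the third.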

Next, the engine resolves the arguments of the echo sink from the previous blocks. The variable $id is first looked up in the left block (the if branch), where a typecast to integer prevents any exploitation and stops the trace. Then the variable is looked up in the right block (the else branch). Here, the PHP built-in function htmlentities() is used to sanitize $id. The engine performs a complete simulation of this built-in function and detects that, without an additional parameter, the characters ", <, >, and & are encoded to HTML entities but single quotes are left untouched (the default ENT_COMPAT behavior of PHP versions before 8.1). Without this PHP-specific precision, previous generations as well as other approaches would stop the trace at this point and often whitelist htmlentities() as an XSS sanitizer. Instead, our engine learns during simulation precisely which characters are affected and continues the trace from the right block to the first block in our graph. Here, the variable $id maps to a $_POST source.
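
The difference is easy to reproduce. The snippet below passes the flags explicitly so that it behaves the same on every PHP version: ENT_COMPAT was the default before PHP 8.1, while ENT_QUOTES also encodes single quotes. The payload is an illustrative assumption.

```php
<?php
// Illustrative attacker input that needs nothing but single quotes.
$payload = "' onmouseover='alert(1)";

// ENT_COMPAT (default before PHP 8.1): single quotes are left as they are.
$compat = htmlentities($payload, ENT_COMPAT);
// ENT_QUOTES: single quotes are encoded as well.
$quotes = htmlentities($payload, ENT_QUOTES);

echo "<div id='$compat'>\n"; // <div id='' onmouseover='alert(1)'>
echo "<div id='$quotes'>\n"; // <div id='&#039; onmouseover=&#039;alert(1)'>
```

With ENT_COMPAT the payload closes the id attribute and adds an event handler of its own, which is exactly the case the trace has to keep following.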

Finally, our engine can combine all gathered information and decide that the source $_POST['id'] is not sanitized against single-quotes and taints an HTML attribute id with single-quotes as delimiter. Because of insufficient sanitization, an attacker can perform cross-site scripting attacks and a vulnerability report is issued with the following facts. Further, the severity can be fine-tuned based on the vulnerability type, its markup context, the type of source, and any present security mechanisms.
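
The final decision can be approximated by matching the characters required to leave a markup context against what survives the sanitizer. The sketch below is an assumption-laden illustration: the table of breakout characters is simplified, all names are hypothetical, and instead of simulating the built-in function statically as described above, it simply calls the sanitizer on each character.

```php
<?php
// Characters an attacker needs to leave each markup context (simplified assumption).
const BREAKOUT_CHARS = [
    'single-quoted attribute' => ["'"],
    'double-quoted attribute' => ['"'],
    'element content'         => ['<'],
];

// A context is exploitable if every required character passes the sanitizer unchanged.
function isExploitable(string $context, callable $sanitizer): bool
{
    foreach (BREAKOUT_CHARS[$context] as $char) {
        if ($sanitizer($char) !== $char) {
            return false; // the required character is encoded or removed
        }
    }
    return true;
}

// The two branches of the example; ENT_COMPAT mirrors the default before PHP 8.1.
$ifBranch   = fn(string $s): string => (string)(int)$s;
$elseBranch = fn(string $s): string => htmlentities($s, ENT_COMPAT);

var_dump(isExploitable('single-quoted attribute', $ifBranch));   // bool(false)
var_dump(isExploitable('single-quoted attribute', $elseBranch)); // bool(true)
var_dump(isExploitable('double-quoted attribute', $elseBranch)); // bool(false)
```

The same htmlentities() call is sufficient in a double-quoted attribute and insufficient in a single-quoted one, which is why the sanitizer cannot be judged without its markup context.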

Cross-Site Scripting (single-quoted attribute)

Severity: Medium, CWE: 79, OWASP Top 10: A3, SANS 25 Rank: 4

$id = $_POST['id'];
⋮
$id = htmlentities($id);
⋮
echo "<div id='$id'>";

Reconstructed HTML Context

<div id='$_POST['id']'>

Summary

The PHP landscape has changed in recent years, and so have the requirements for SAST tools. Diverse language features and characteristics, more security-aware developers, and growing code sizes lead to more complex applications. Static code analysis has to advance in order to keep up with these challenges in the automated detection of security issues.

In this post, we had a glance at the inner workings of the RIPS analysis engine and at some key advances over previous generations. We hope that we provided some insights into the world of code analysis that will be helpful for understanding the background of our upcoming vulnerability posts. In case you would like to work together with leading experts in the field of static analysis, we are currently hiring and are looking forward to getting in contact with you.

Follow us on Twitter to be notified when the next gift of our advent calendar is opened!

APAV Time Table

Date	Author	Title
24 Dec 2016	Johannes Dahse	What we learned from our Advent Calendar
23 Dec 2016	Hendrik Buchwald	e107 2.1.2: SQL Injection through Object Injection
22 Dec 2016	Daniel Peeren	Security Compliance with Static Code Analysis
21 Dec 2016	Martin Bednorz	AbanteCart 1.2.8 - Multiple SQL Injections
20 Dec 2016	Martin Bednorz	Kliqqi 3.0.0.5: From Cross-Site Request Forgery to Code Execution
19 Dec 2016	Robin Peraglie	osClass 3.6.1: Remote Code Execution via Image File
18 Dec 2016	Daniel Peeren	Continuous Integration - Jenkins at your service
17 Dec 2016	Johannes Dahse	OpenConf 5.30 - Multi-Step Remote Command Execution
16 Dec 2016	Robin Peraglie	Redaxo 5.2.0: Remote Code Execution via CSRF
15 Dec 2016	Dennis Detering	Guest Post: Vtiger 6.5.0 - SQL Injection
14 Dec 2016	Hendrik Buchwald	The State of Wordpress Security
13 Dec 2016	Johannes Dahse	phpBB 2.0.23 - From Variable Tampering to SQL Injection
12 Dec 2016	Martin Bednorz	Teampass 2.1.26.8: Unauthenticated SQL Injection
11 Dec 2016	Daniel Peeren	Rescanning Applications with RIPS
10 Dec 2016	Hendrik Buchwald	Non-Exploitable Security Issues
9 Dec 2016	Hendrik Buchwald	Precurio 2.1: Remote Command Execution via Xinha Plugin
8 Dec 2016	Martin Bednorz	PHPKit 1.6.6: Code Execution for Privileged Users
7 Dec 2016	Hendrik Buchwald	Serendipity 2.0.3: From File Upload to Code Execution
6 Dec 2016	Robin Peraglie	Roundcube 1.2.2: Command Execution via Email
5 Dec 2016	Hendrik Buchwald	Expression Engine 3.4.2: Code Reuse Attack
4 Dec 2016	Johannes Dahse	Introducing the RIPS analysis engine
3 Dec 2016	Martin Bednorz	eFront 3.6.15: Steal your professors password
2 Dec 2016	Martin Bednorz	Coppermine 1.5.42: Second-Order Command Execution
1 Dec 2016	Hendrik Buchwald	FreePBX 13: From Cross-Site Scripting to Remote Command Execution
25 Nov 2016	Martin Bednorz	Announcing the Advent of PHP Application Vulnerabilities