Date: 2025-03-11

Users of Static Application Security Testing (SAST) tools frequently complain about false positives (FPs): issues reported by the tool that are, upon further investigation, incorrect. These are not real security issues requiring remedial action. These complaints are understandable, as reviewing false positives takes a lot of time. When they occur in large numbers, this can become unacceptable. Especially among developer audiences, seeing many false positives may trigger questions about whether it makes sense to use the tool at all.

Our Fortify SAST tool produces false positives too – like any SAST tool does. We offer many powerful ways of managing them that are more efficient than auditing issues individually. These include scan policies (where we changed the default one in 24.2 to “security” to provide a less noisy out-of-the-box experience), filtersets, custom rules, and the AI-powered Fortify Aviator. When used correctly, they make the operational false positives problem a thing of the past.

Nevertheless, we occasionally get the question: why not simply improve the tool to prevent these false positives from occurring in the first place? That’s a reasonable question, which we’ll try to answer in this blog post. And since we’ll be talking about false positives so much, we’re going to simply call them FPs from here on.

An everyday FP case: Smoke detectors

Before diving into the rather mathematical world of static analysis, let’s look at a simpler system prone to producing FPs, which many of us will know first-hand: smoke detectors. We’ll find that many conceptual facts about FPs in smoke detectors carry over one-to-one to the world of SAST, so it pays off to start by exploring this simpler domain.

Smoke detectors are devices designed to alert us to beginning, uncontrolled, in-house fires by detecting the smoke they cause. When they do, they produce an impossible-to-ignore sound allowing the inhabitants to suppress the fire or leave the house in time. There’s already one obvious SAST parallel: an FP rate above a certain threshold will make the system unusable. If your smoke detectors were to trigger every day, you’d remove or replace them.

False positive triggers

What are some triggers of FPs here? Burning food on the stove or in the oven is an obvious one. Excessive smoke escaping from the wood burner is another. I’d imagine a 1970s party with everybody smoking would trigger it too. Even though there’s real smoke in all these cases, these events are not what we wanted to detect. How could we prevent the FPs? You could raise the minimum smoke concentration at which the device triggers above the levels associated with these events. This would fix the problem, but there’s a price to pay: real fires would be detected later, or not at all. In other words, we have reduced the FP rate by increasing the false negative (FN) rate. This increases risk, possibly to the point where it defeats the purpose of having smoke detectors.
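The threshold trade-off can be made concrete with a small sketch. The events and smoke readings below are invented for illustration only; they are not measurements from any real detector.

```python
# Hypothetical peak smoke readings in arbitrary units; all values are invented.
# Each entry is (event, smoke_level, is_real_fire).
EVENTS = [
    ("burnt toast", 40, False),
    ("wood burner", 55, False),
    ("1970s party", 60, False),
    ("smouldering sofa", 50, True),
    ("curtain fire", 90, True),
]


def count_errors(threshold):
    """Return (false_positives, false_negatives) for a given trigger threshold."""
    false_positives = sum(
        1 for _, level, is_fire in EVENTS if level >= threshold and not is_fire
    )
    false_negatives = sum(
        1 for _, level, is_fire in EVENTS if level < threshold and is_fire
    )
    return false_positives, false_negatives


for threshold in (30, 65):
    fp, fn = count_errors(threshold)
    print(f"threshold={threshold}: {fp} FP, {fn} FN")
```

With these made-up numbers, the low threshold alarms on every harmless event, while the high threshold silences all of them but also misses the smouldering sofa.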

Could we engineer something better? Yes, of course. By using not one but many connected detector units per room, having differentiated detectors for many chemical compounds, using video cameras, AI, training for typical FP scenarios, etc., it is probably possible to differentiate a turkey left in the oven for too long from the curtains catching fire after a candle fell over. So, what’s the issue here? Price. Current advanced smoke detectors retail at USD 150 apiece, and you need one for each room. Multiply this by ten to implement the advanced features described to prevent the FPs, and we end up with a price that few will be willing to pay. In practical scenarios, our ability to reduce the FP rate is bounded by resource limitations.

A second FP example: The table grill

Before we switch to SAST, there’s one more scenario that’s interesting to consider. This is more of a thought experiment than anything else – it would be very unwise to try for real. Suppose that instead of using an indoor, smokeless table grill I wanted to do indoor flame-grilling of my burgers, and I created a small, controlled fire in my living room. Surely, my smoke detectors would trigger. You might consider it an FP – hey, I wanted to have this fire, it’s not the accidental, uncontrolled fire that the system is supposed to detect.

There are two important points here. First, the FP assessment is solely based on contextual, rather than physical, information. A system cannot measure intent. Such FPs are inevitable. Second, what if the system hadn’t triggered in this case? Would you still trust it to function correctly for “real” fires? More generally, the occasional FP from my smoke detectors may be annoying but also provides comfort that they are operating effectively.

5 facts to help you understand SAST FP management

By looking at FPs in the context of smoke detectors, we discovered five facts that will help us better understand SAST FP management as well:

1. An excessive FP rate makes the system unusable.
2. There’s an FP/FN tradeoff. The FP rate can easily be lowered by increasing the detection threshold, but this is at the expense of a higher FN rate.
3. Efforts to reduce FP rate are limited by resource constraints. Even when we can conceive methods to further reduce FP rates, they may not be practical.
4. Some FPs are caused by a lack of contextual understanding and therefore difficult to prevent.
5. Occasional FPs provide assurance that detection is working. A zero FP rate isn’t necessarily optimal.

What causes SAST FPs?

In the smoke detector case, FPs are generally triggered by the presence of smoke not related to the type of fire we are looking to detect. Smoke is an imperfect proxy for what we’re really interested in. What is the general cause of FPs in the world of SAST? Frequently, users perceive FPs in a SAST tool as a sign of bugginess; the tool should simply be fixed. But in a mature SAST tool like Fortify, this is rare. Most such bugs were fixed a long time ago. Remaining FPs have more fundamental causes, broadly speaking in three categories: algorithms, rules, and context. Let’s look at each.

Algorithms

Some low-end SAST tools (typically, software quality tools that also do a bit of security) exclusively rely on direct recognition of structural patterns in the code. This will inevitably lead to high FP and/or high FN rates. Serious SAST tools operate in a different way. They use control-flow analysis, data flow analysis, taint analysis, and others to understand the runtime behavior of the program.

The most important type of data flow analysis for SAST is taint analysis. Taint is the concept that some variable in the program may be dangerous in some way (e.g., coming from web input). This tainted input propagates through assignments, invocations, etc. If such dangerous data ends up in a place where it could do harm, e.g. a SQL statement, there is a security issue. Taint analysis works by detecting the entire data flow from the place where the taint enters the program (the source) to the place where it can do harm (the sink). Structural analysis by itself cannot detect such issues.
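As a rough illustration of the source-to-sink idea, here is a toy taint tracker for a straight-line program. The mini instruction format (`source`, `assign`, `sink`) and all variable names are assumptions made for this sketch; real engines, Fortify included, work on far richer program representations.

```python
def find_taint_flows(program):
    """Return the variable trace for every tainted value that reaches a sink."""
    tainted = {}  # variable name -> list of variables the taint travelled through
    findings = []
    for op, *args in program:
        if op == "source":
            (var,) = args
            tainted[var] = [var]
        elif op == "assign":
            dst, src = args
            if src in tainted:
                # Taint propagates through the assignment.
                tainted[dst] = tainted[src] + [dst]
            else:
                # Overwriting with untainted data clears any earlier taint.
                tainted.pop(dst, None)
        elif op == "sink":
            (var,) = args
            if var in tainted:
                findings.append(tainted[var])
    return findings


# Hypothetical program: web input flows into a query, a constant flows into a limit.
PROGRAM = [
    ("source", "request_param"),
    ("assign", "name", "request_param"),
    ("assign", "query", "name"),
    ("assign", "limit", "default_limit"),
    ("sink", "query"),
    ("sink", "limit"),
]

for trace in find_taint_flows(PROGRAM):
    print(" -> ".join(trace))
```

Only the flow that starts at the source is reported, together with the full trace from source to sink; the untainted `limit` reaches a sink without raising a finding.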

Implementing data-flow analysis

Data-flow analysis is implemented by considering the control-flow graph of the program, which consists of blocks of program instructions that are always executed sequentially, and junctions in which the program flow can continue in, or come from, multiple directions. For each block, facts about the program state before and after the block are clearly related; there is a certain equation. The control-flow graph therefore represents a (huge) system of equations. Data-flow analysis amounts to solving this system.

Calculating the solution isn’t computationally feasible without making certain approximations. This is similar to the resource constraints we saw in the smoke detector case. In the scientific data flow analysis literature, these approximations are commonly referred to as insensitivities. This is where false positives creep in.
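To give a feel for what solving such a system looks like, the sketch below iterates over a four-block control-flow graph until the facts stop changing. The graph, the gen/kill sets, and the block names are all hypothetical, and merging by set union at junctions is one common approximation, not necessarily the one any particular product uses.

```python
from collections import deque

# Hypothetical control-flow graph: block -> successor blocks.
SUCCESSORS = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# Per-block equation as (gen, kill): variables that become tainted / untainted.
TRANSFER = {
    "entry": ({"x"}, set()),  # x = read_input()
    "then": (set(), {"x"}),   # x = "constant"
    "else": (set(), set()),   # x left unchanged
    "exit": (set(), set()),   # run_query(x)
}


def solve(successors, transfer):
    """Return, per block, the set of possibly tainted variables at block exit."""
    predecessors = {block: [] for block in successors}
    for block, succs in successors.items():
        for succ in succs:
            predecessors[succ].append(block)

    out = {block: set() for block in successors}
    worklist = deque(successors)
    while worklist:
        block = worklist.popleft()
        # At a junction, merge the facts of all predecessors with a union.
        incoming = set().union(*(out[pred] for pred in predecessors[block]))
        gen, kill = transfer[block]
        new_out = (incoming - kill) | gen
        if new_out != out[block]:
            out[block] = new_out
            worklist.extend(successors[block])  # revisit affected blocks
    return out


result = solve(SUCCESSORS, TRANSFER)
print(sorted(result["exit"]))
```

The union at the junction records only that `x` may be tainted at `exit`; which branch was taken is no longer known. That loss of detail keeps the computation cheap, and it is the kind of approximation the paragraph above refers to.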

Common limitations

A detailed description of the impact of these limitations is beyond the scope of this blog. But to quickly name a few common ones: context-insensitivity (not distinguishing call sites of a function), as well as control- and path-insensitivity (not considering the consequences of conditions in control structures). Another limitation is that virtual functions cannot always be completely resolved. Practically, in an OO language, code may call an interface method. The SAST algorithm may not be able to determine which concrete implementation of the interface receives the call.
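Path-insensitivity can be illustrated with a tiny model. Assume a function that first assigns either a constant or user input to a value depending on a flag, and then sends the value to a query only when that same flag is set. The sketch below is a deliberately simplified model of that situation, not an actual analysis algorithm.

```python
from itertools import product

# Model of the following hypothetical code:
#     if trusted: value = "fixed"
#     else:       value = user_input
#     if trusted: run_query(value)


def tainted_reaches_sink(first_branch_taken, second_branch_taken):
    """True if, on this combination of branches, tainted data reaches the query."""
    value_is_tainted = not first_branch_taken
    sink_is_reached = second_branch_taken
    return value_is_tainted and sink_is_reached


def path_insensitive_reports():
    # Treats the two conditions as independent, so all four combinations count.
    return any(
        tainted_reaches_sink(first, second)
        for first, second in product([True, False], repeat=2)
    )


def path_sensitive_reports():
    # Knows that both conditions test the same flag, so only two paths exist.
    return any(tainted_reaches_sink(flag, flag) for flag in [True, False])


print("path-insensitive reports an issue:", path_insensitive_reports())
print("path-sensitive reports an issue:", path_sensitive_reports())
```

The path-insensitive variant reports an issue along a combination of branches that can never occur at runtime, which is an FP. Ruling it out requires tracking how conditions relate to each other, and that cost grows quickly with program size.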

Another limitation is the inability to detect custom validation or sanitization mechanisms. While taint analysis can understand that certain standard validation or sanitization mechanisms are being used, and prevent FPs in those cases, understanding that an algorithm you wrote from scratch does in fact execute validation or sanitization is beyond the state of the art. This can be frustrating when you implemented this algorithm in response to a SAST finding, only to see the finding still being reported.
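The effect can be sketched as follows. The list of known sanitizers, the hand-written `strip_angle_brackets` function, and the `is_reported` check are all hypothetical; they only mimic the idea that an analysis trusts the sanitizers it has been told about and nothing else.

```python
# Sanitizers the hypothetical analysis knows about, identified by name.
KNOWN_SANITIZERS = frozenset({"html.escape"})


def strip_angle_brackets(text):
    """Hand-written sanitizer that the analysis has never heard of."""
    return text.replace("<", "").replace(">", "")


def is_reported(call_chain, known_sanitizers=KNOWN_SANITIZERS):
    """Report unless a known sanitizer was applied on the way to the sink.

    call_chain lists the names of the functions applied to the tainted input.
    """
    return not any(name in known_sanitizers for name in call_chain)


print(strip_angle_brackets("<script>"))                     # the sanitizer does its job
print(is_reported(["strip_angle_brackets"]))                # still reported
print(is_reported(["html.escape"]))                         # recognized, not reported
print(is_reported(["strip_angle_brackets"],
                  KNOWN_SANITIZERS | {"strip_angle_brackets"}))  # declared, not reported
```

In this toy model the finding disappears only once the custom function is explicitly declared as a sanitizer, which is roughly the role a custom rule plays.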

Rules and their impact on FP and FN rates

Algorithms represent one side of a SAST system; they determine how it performs calculations. But what should it calculate? That’s determined by rules. What patterns define sources and sinks for taint analysis?

While this may seem straightforward, this is where we must continuously trade off FP and FN rates. Let’s look at a simple example. Suppose that a program has a variable called “password”. Now, suppose that we see that the program invoked a function to print this password. This would imply a password being printed to the console, which is a security violation that a SAST tool should report.

To model this, we need a source rule representing that variables called “password” represent a type of taint (which you label something like confidential/password/private), and a sink rule representing that if you call the print function on data with this taint, we have a security violation.

Doing so, we have introduced a risk of FPs. There are all kinds of reasons why a variable called “password” doesn’t contain one. It may be a prompt for a user (“Please enter your password here.”), or maybe the name of a key or property (“user.pwd”). Understanding these nuances is beyond the scope of taint analysis.

Can we prevent the FPs? Yes, by not writing the rule. We would then have to accept the FNs. This is the same FP/FN trade-off principle we saw for smoke detectors, now in action for SAST.
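The password rule and its trade-off can be sketched in a few lines. The name pattern, the variable names, and the list of print calls are invented for this illustration and do not reflect how any actual rule pack is written.

```python
import re

# Source rule (hypothetical): a variable whose name contains "password" is
# treated as carrying password taint.
PASSWORD_NAME = re.compile(r"password", re.IGNORECASE)


def report_printed_passwords(print_calls, rule_enabled=True):
    """Sink rule: report every print call whose argument carries password taint.

    print_calls is a list of (variable_name, actual_value) pairs.
    """
    if not rule_enabled:
        return []
    return [name for name, _ in print_calls if PASSWORD_NAME.search(name)]


PRINT_CALLS = [
    ("user_password", "hunter2"),                             # real password
    ("password_prompt", "Please enter your password here."),  # only a prompt
    ("password_property", "user.pwd"),                        # only a property name
    ("greeting", "Hello"),
]

print(report_printed_passwords(PRINT_CALLS))
print(report_printed_passwords(PRINT_CALLS, rule_enabled=False))
```

With the rule enabled, the real password is caught along with two FPs. With the rule disabled, the FPs are gone, and so is the one finding that mattered.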

Context and its role in determining FPs

There are multiple ways in which context (relevant information outside of the source code itself) plays a role in the occurrence of FPs. The simplest one is risk appetite. Not all organizations are the same. If time-to-market is key, you’re dealing with non-sensitive information, and executing in a non-critical environment, you can accept a lot more risk than when you’re writing software managing financial transactions or controlling an insulin pump. While a straightforward SQL Injection issue would be relevant in either case, something like Log Forging would be considered irrelevant in the more risk-accepting context. A SAST tool cannot know this unless you tell it that, but that’s very easy to manage.
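Telling a tool about risk appetite usually boils down to some form of filtering. The sketch below uses made-up policy names and a made-up findings format; it is not the syntax of Fortify scan policies or filtersets.

```python
# Hypothetical findings as they might come out of a scan.
FINDINGS = [
    {"category": "SQL Injection", "file": "orders.py"},
    {"category": "Log Forging", "file": "audit.py"},
]

# Hypothetical policies: the categories each organization chooses to ignore.
POLICIES = {
    "risk_averse": set(),
    "risk_accepting": {"Log Forging"},
}


def apply_policy(findings, policy_name):
    """Keep only the findings whose category the chosen policy cares about."""
    ignored = POLICIES[policy_name]
    return [finding for finding in findings if finding["category"] not in ignored]


for policy_name in POLICIES:
    kept = [finding["category"] for finding in apply_policy(FINDINGS, policy_name)]
    print(policy_name, kept)
```

The SQL Injection finding survives under both policies, while Log Forging is reported only to the organization that has decided it cares.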

Mitigating risks

A more subtle phenomenon is when the intended behavior of a program is something that a SAST tool should normally flag. Example: prior to modern web application servers, we had RFC 3875 “The Common Gateway Interface (CGI)”. CGI let otherwise static HTML web servers invoke an external program to add functionality. This comes with all kinds of risks that users need to carefully mitigate. If you scan a webserver implementing CGI with Fortify, it will report security issues like command injection. You might say these are FPs; in the context, it’s intended behavior. But in many ways, the CGI architecture is like flame-grilling in your living room. It’s a bad idea to begin with, you can’t really blame your alarm for triggering, and in fact, it would be concerning if it didn’t.

While this example may seem extreme, cases like this are common. Displaying a cryptographic secret key on a web page is generally bad, but in the context of showing it as a QR code on a page for 2FA enrolment, it’s perfectly acceptable. Failure to close a database connection after using it may result in a DoS vulnerability in a server process, but in the context of a command line utility, it may be harmless as the connection gets closed upon termination of the program. A hardcoded password is a bad thing, but in the context of a unit test of password verification, it is completely fine. There are many cases where contextual information determines whether something is an FP.

Managing SAST FPs

FP rate remains an important topic in the SAST community. Now that we understand the SAST FP landscape a bit better, what does that tell us about the way ahead?

As a tool vendor, we need to continue to improve our offering in this aspect. However, opportunities to really improve it (i.e. without accepting many more FNs) are rather limited. We could add some control sensitivity. On our next-gen SAST engine, we can write more precise rules. We may add a feature to better differentiate application types. These are all relatively small improvements though. The much bigger improvements come from the continued rollout of AI-driven add-on solutions that go beyond regular SAST algorithms.

3 things to consider when choosing a SAST solution

As a tool user, first of all, do not select a tool based on out-of-the-box FP rate alone. There isn’t a magical way to have a low FP rate without getting a high FN rate; remember the trade-off. Selecting a tool on FP rate alone would expose you to risk. Second, make a conscious decision regarding risk appetite – what kind of thing do you want to be detected and remediated, and what are the things that you don’t care about? Can you configure your tool to report issues accordingly? Third, have a look at AI/LLM systems like Fortify Aviator. Since FPs are inevitable on the level of the SAST tool alone, an add-on like this is the best bet if you want to drive FPs to near-zero.