I Ran Free SAST Tools on OpenEMR and Found a CVE

CVE-2025-30161 (GitHub Advisory) is a stored XSS vulnerability in the bronchitis form component of OpenEMR. It allows anyone with access to edit a bronchitis form to inject a malicious JavaScript payload into the form and have it run in the context of other users, including administrators.

Background

I am a big proponent of open source static analysis engines like Semgrep and tools like Snyk Code that provide a free tier. One of the biggest selling points of both Semgrep and Snyk is that they allow you to write your own rules. I have written many Semgrep rules, at my job and otherwise, and have found the rule language easy to learn and understand, while also providing very powerful features such as taint tracking and autofix. Snyk’s custom rule support is in preview at the time of writing. Both Semgrep and Snyk also ship a set of pre-built rules. In the Semgrep world they are called community rules and can be found in this repo.

Experimental Setup

While writing rules is fun, I am a lazy security researcher who wants quick ~~profits~~ CVEs. So I decided to perform an experiment. Here are the steps of my experiment:

Pick a few open source repositories.
Clone them to my laptop and run Semgrep with its community rules.
Fork them to my GitHub account and run Snyk (Snyk’s GitHub integration requires the user to have write access to the repos and the easiest way to do this was to fork it to my account).
Analyze the results.
Find ___ CVEs.

This seemed like an easy enough experiment. I anticipated the result analysis to take some time since the probability of having a lot of noise was quite high.

The first repository I picked was OpenEMR. From their homepage:

OpenEMR is the most popular open source electronic health records and medical practice management solution. OpenEMR is a community of passionate volunteers and contributors dedicated to guarding OpenEMR’s status as a free, open source software solution for medical practices with a commitment to openness, kindness and cooperation.

The motivation behind picking this was a talk I attended where the speaker spoke about security issues in EMR (Electronic Medical Record) systems. The speaker emphasized the need to ensure the security of these systems, as they contain a lot of sensitive information.

OpenEMR has been scrutinized many times in the past. Some references:

Report by Project Insecurity
RCE in OpenEMR
SQL injection in phpGACL (These CVEs are not in OpenEMR itself, but OpenEMR was used to demonstrate the PoC)

Due to these reports, the developers on OpenEMR have worked very hard to make the codebase quite secure, which is very respectable and great for users, but not good for me as my chances of finding vulnerabilities are greatly reduced 🙁.

Running Semgrep

OpenEMR is a large code base. Running cloc on only git tracked files shows 6116 text files and over 1.7 million lines of code.

cloc --list-file=<(git ls-files)

    6116 text files.

github.com/AlDanial/cloc v 2.02  T=22.94 s (239.0 files/s, 97010.4 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
PHP                            3717          86837         263709         637104
SVG                             274            106          33465         411322
SQL                             123           7386           7446         321356
HTML                            217           3294           7123         132878
JavaScript                      284          11920          15960          82732
CSS                              91           5006           1071          42703
... snipped ...
--------------------------------------------------------------------------------
SUM:                           5482         122999         346123        1755972
--------------------------------------------------------------------------------

The Semgrep scan result was also incredibly large:

Ran 1362 rules on 4588 files: 3021 findings.

3021 findings?! Boy, have I found a gold mine! Or did I? Spoiler alert: I did not. Going over 3000+ findings is an impossible feat. I read some of the findings and quickly determined that most of them were false positives.

I was interested in learning how many findings were reported per rule, so I asked ChatGPT to write a script to count the findings by rule from the SARIF output. These were the top rules:

php.lang.security.taint-unsafe-echo-tag.taint-unsafe-echo-tag: 2243
javascript.express.security.audit.xss.mustache.explicit-unescape.template-explicit-unescape: 123
php.lang.security.unlink-use.unlink-use: 99
php.lang.security.tainted-user-input-in-php-script.tainted-user-input-in-php-script: 97
php.lang.security.injection.tainted-sql-string.tainted-sql-string: 92
php.lang.security.injection.tainted-filename.tainted-filename: 72
php.lang.security.unserialize-use.unserialize-use: 61
php.lang.security.injection.printed-request.printed-request: 41
php.lang.security.exec-use.exec-use: 38
generic.secrets.security.detected-bcrypt-hash.detected-bcrypt-hash: 26
javascript.browser.security.insecure-document-method.insecure-document-method: 22
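The counting script itself is not reproduced in this post. A minimal Python sketch of the same idea, assuming the standard SARIF 2.1.0 layout (a top-level `runs` list, each with a `results` list whose entries carry a `ruleId`), might look like this. The function name and the tiny inline sample are made up for illustration.

```python
import json
from collections import Counter


def count_findings_by_rule(sarif: dict) -> Counter:
    """Count SARIF results per rule ID across all runs in the report."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # ruleId is optional in SARIF, so fall back to a placeholder.
            counts[result.get("ruleId", "<unknown>")] += 1
    return counts


# Tiny hand-written sample standing in for a real report.
sample = {
    "version": "2.1.0",
    "runs": [
        {"results": [{"ruleId": "rule.a"}, {"ruleId": "rule.b"}, {"ruleId": "rule.a"}]}
    ],
}

# For a real report, load it first (the file name here is hypothetical):
#   with open("semgrep-results.sarif") as f:
#       sarif = json.load(f)
for rule_id, count in count_findings_by_rule(sample).most_common():
    print(f"{rule_id}: {count}")
```

Sorting with `most_common()` puts the noisiest rules first, which is the view that makes it obvious where most of the triage time would go.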

Reading the OpenEMR code, I found that the developers have applied robust protections for common issues such as XSS, SQL injection, and code injection. For XSS mitigation, they added many helper functions and applied them wherever user input is output. For SQL injection, they have correctly used prepared statements and parameterized queries. For code injection, they have made sure not to pass any user input to system calls.

The 2243 findings for php.lang.security.taint-unsafe-echo-tag.taint-unsafe-echo-tag were mostly reported for code like

<span class="text"><?php echo xlt('Total active reminders before update') . ": " . text($update_rem_log['total_pre_active_reminders']); ?></span><br />

The xlt and text functions are among the aforementioned helper functions designed to prevent XSS. After glancing through many of these findings, I quickly got bored and decided that this rule was too noisy to be useful here. In the rule writer’s and Semgrep’s defense, it is very unlikely that a generic rule can correctly determine that a custom XSS protection mechanism is being used here.
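One way to cut through this kind of noise is a crude pre-filter that only surfaces echo statements calling none of the project's escaping helpers. The Python sketch below is purely illustrative and is not something used in this experiment. The helper list contains only the names mentioned in this post (a real allowlist would need to be longer), and the regexes are a heuristic, not a PHP parser.

```python
import re

# Helper names taken from this post; assumed to be an incomplete allowlist.
KNOWN_ESCAPERS = ("text", "attr", "xlt")

# Matches inline `<?php echo ... ?>` segments and captures the echoed expression.
ECHO_RE = re.compile(r"<\?php\s+echo\s+(.*?);?\s*\?>", re.DOTALL)
# Matches a call to any known helper, e.g. `text(` or `xlt (`.
HELPER_CALL_RE = re.compile(r"\b(?:%s)\s*\(" % "|".join(KNOWN_ESCAPERS))


def unescaped_echoes(source: str) -> list[str]:
    """Return echoed expressions that call none of the known escaping helpers."""
    return [
        match.group(1).strip()
        for match in ECHO_RE.finditer(source)
        if not HELPER_CALL_RE.search(match.group(1))
    ]


safe = "<span><?php echo xlt('Total') . \": \" . text($row['n']); ?></span>"
risky = '<input value="<?php echo\nstripslashes($obj["x"]);?>">'

print(unescaped_echoes(safe))
print(unescaped_echoes(risky))
```

The filter is deliberately blunt: a single helper call anywhere in the expression suppresses the line, even if another operand is left unescaped, so it can only narrow down what to read first.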

Similarly, I looked through the 92 SQL injection findings and found them to be noise too. So much for “easy CVEs”…

I ended up writing a script to ignore a bunch of rules and then re-ran Semgrep:

exclude_rules=(
  "php.lang.security.taint-unsafe-echo-tag.taint-unsafe-echo-tag"
  "php.lang.security.unlink-use.unlink-use"
  "javascript.express.security.audit.xss.mustache.explicit-unescape.template-explicit-unescape"
  "php.lang.security.curl-ssl-verifypeer-off.curl-ssl-verifypeer-off"
  "javascript.browser.security.insecure-document-method.insecure-document-method"
  "php.lang.security.tainted-path-traversal.tainted-path-traversal"
  "php.lang.security.tainted-user-input-in-php-script.tainted-user-input-in-php-script"
  "php.lang.security.injection.printed-request.printed-request"
  "php.lang.security.exec-use.exec-use"
  "php.lang.security.injection.tainted-sql-string.tainted-sql-string"
  "php.lang.security.injection.tainted-filename.tainted-filename"
  "php.laravel.security.laravel-path-traversal.laravel-path-traversal"
)

# Build the command as an array so each argument stays a separate word (no eval needed).
semgrep_command=(semgrep scan --config auto . -o semgrep-results.txt --severity ERROR --severity WARNING)

for rule in "${exclude_rules[@]}"; do
  semgrep_command+=(--exclude-rule "$rule")
done

"${semgrep_command[@]}"

This drastically reduced the findings to 165. This I could go over. Having a better understanding of the OpenEMR codebase and its various security protections, I was able to rule out (drumroll please) all 165 findings as false positives.

I spent almost 20 hours doing this and had nothing to show for it…

Enter Snyk

I had almost given up on finding an issue in OpenEMR (and with this experiment in general). I had other “easy CVE” ideas which seemed very enticing. However, I decided not to give in to my habit of jumping from one idea to another (… cough … all my incomplete coding projects … cough … from the past 8 years … cough …). On a whim, I decided to run Snyk.

Snyk reported considerably fewer findings: only about 640. (Side note: for some unknown reason, I deleted the original project, so I rescanned it today for this blog. The original finding is therefore understandably missing from this result.)

[Screenshot: Snyk output]

Many of these findings were also XSS and SQL injection reports, similar to Semgrep’s, and the noise can again be attributed to the custom helper functions that the tool does not recognize. At this point I was ready to walk away from this and go to my next “easy CVE” idea ✨ fuzzing ✨.

By now, it was well into the evening. Not wanting to start a new experiment, I told myself I would spend whatever time I had left before dinner prep quickly looking through the High findings.

CVE-2025-30161

A few minutes into reviewing the High findings, I found code that used echo without the helper functions. Could it be??? 😱. Well, dinner prep went for a toss as I started looking into this finding.

Not too long after, I confirmed that this code was indeed vulnerable to stored XSS. The finding was in the bronchitis form component of OpenEMR. There are 2 sources of XSS in the bronchitis form:

Bronchitis Ops Appearance
Bronchitis Oropharynx Appearance

The issue is caused by improper sanitization of user input. The vulnerable code looks like this:

<td><input type="text" name="bronchitis_oropharynx_appearance" value="<?php echo
stripslashes($obj["bronchitis_oropharynx_appearance"]);?>" size="15"></td>

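where $obj["bronchitis_oropharynx_appearance"] is user-controlled data. stripslashes only removes backslashes and does no HTML escaping, so a double quote in the input closes the value attribute early. When a payload like " onfocus="alert(1) is passed as the user input, the <td> is rendered like this, which leads to XSS: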

<td><input type="text" name="bronchitis_oropharynx_appearance" value="" onfocus="alert(1)" size="15"></td>

The issue is the same with Bronchitis Ops Appearance.

Fix

The OpenEMR developers fixed this issue very promptly by applying the attr() helper function to the vulnerable sources. The fix has been added to OpenEMR 7.0.3.
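To see why escaping for the attribute context closes the hole, here is a minimal Python sketch. Python's `html.escape` with `quote=True` stands in for attr() purely as an assumption for illustration. It is not OpenEMR's implementation, and the template string and function names are made up for the demo.

```python
import html

# Simplified stand-in for the vulnerable form field.
TEMPLATE = (
    '<td><input type="text" name="bronchitis_oropharynx_appearance" '
    'value="{value}" size="15"></td>'
)


def render_unsafe(user_input: str) -> str:
    """Interpolate the value as-is, like echoing it without an escaping helper."""
    return TEMPLATE.format(value=user_input)


def render_escaped(user_input: str) -> str:
    """Escape quotes and angle brackets before placing the value in the attribute."""
    return TEMPLATE.format(value=html.escape(user_input, quote=True))


payload = '" onfocus="alert(1)'

print(render_unsafe(payload))   # the quote closes value="" and onfocus becomes a real attribute
print(render_escaped(payload))  # the quotes become &quot; and the payload stays inside value="..."
```

In the escaped version the browser treats the whole payload as the text of the value attribute, so no event handler is ever created.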

Conclusion

There may be “easy CVEs”. Perhaps I chose the wrong repo. After this success, I have continued the experiment with a few more repos and I have nothing to report yet. As I have mentioned previously, I am lazy and therefore I do not spend too much time looking into a finding, which, along with my general lack of skill in crafting exploits from SAST findings, could have led to me missing vulnerabilities. If you are good at such sports, this experiment may produce better results for you.

I should note here that I have only tried somewhat popular repos with a large-ish number of stars and forks. It is possible that not-so-popular repos would lead to more results. Even though I keep saying “easy CVEs”, I find myself more interested in finding CVEs in software that has a decent user base. The same also applies to my target selection for my fuzzing escapades. I will write about that in some time (or won’t, depending on my learnings and success…). Or maybe I will have more success with this experiment. Stay tuned! Thank you for reading.

Timeline

January 23, 2025: Reported issue to OpenEMR security team.
January 28, 2025: Issue fixed by OpenEMR security team.
March 1, 2025: Notified by OpenEMR security team that the issue is fixed.
March 19, 2025: Requested a CVE for this issue through GitHub.
March 20, 2025: CVE-2025-30161 assigned.
March 30, 2025: GHSA published, blog published.