FIPS-ing the Un-FIPS-able: Apache Spark
We’re excited to announce that Chainguard is now building and delivering FIPS-validated images for the Apache Spark project, entirely from source. This is the industry’s first FIPS-validated image for Apache Spark and the latest milestone in Chainguard’s journey to FIPS the Un-FIPS-able. If you’ve been following along, you’ll know this is the second open source project where we’ve overcome the incompatibility between upstream code and FIPS-approved cryptographic libraries. Just last month, we announced FIPS-validated versions of Apache Cassandra.
With Chainguard delivering FIPS-validated images for both Spark and the Spark Operator, customers operating in highly regulated environments – public sector, finance, healthcare, and cybersecurity – can now deploy this powerful distributed data processing engine using only FIPS-approved cryptography. This is a major milestone for teams seeking to harden their data infrastructure while simplifying their compliance and regulatory posture by default.
In this post, we’ll walk through why we tackled Spark, how we did it, and what it means for our customers.
The Demand for Spark FIPS Images
Apache Spark plays a foundational role in the modern data stack: its distributed computing capabilities and in-memory processing let engineers handle massive-scale data workloads efficiently. Critical applications that demand high-performance, large-scale data processing – real-time data streaming, fraud detection, recommendation systems, and more – all run on Apache Spark. Naturally, customers started asking Chainguard for FIPS-compliant versions of Spark for their most sensitive environments.
It seemed impossible at first: the upstream Spark project doesn’t support FIPS-compatible cryptography, and adapting it would be a massive effort. But the requests kept piling up – from cloud-native data teams in financial services, cybersecurity companies building on Spark MLlib, and federal contractors who couldn’t remove Spark from their architecture but also couldn’t deploy it as-is in secure environments.
It became clear that we had to do what no one else had done before: build FIPS-validated Apache Spark container images.
How We FIPS-ified Apache Spark
Just like our work with Cassandra, FIPS-enabling Spark required a deep dive into Spark’s dependencies, architecture, and runtime behavior. We split the work into three major areas:
Source Code Forks for FIPS Compatibility: To FIPS-ify Spark, we had to fork not only the Apache Spark project but also its main dependencies, Hadoop and gRPC. For each of these open source projects, we made targeted, modular code changes so that Spark leverages FIPS-validated cryptographic libraries instead of its native cryptographic functions.
Extensive Testing: To test and validate that our Spark images were FIPS compliant, we had to enable Spark to use TLS. Getting TLS working across every Spark node is notoriously challenging: each node must be properly configured to use Chainguard’s keystore, and the Spark Operator’s properties file – which is eventually copied to every Spark node – had to be overwritten with Chainguard’s properties (sketched below, after this list). Finally, we wrote extensive tests in Scala, Python, and R to validate that Spark functioned as expected.
Building and Maintaining FIPS-Validated Images: After validating the code changes and test coverage, we created and released Spark container images using Chainguard’s hardened toolchain – with continuous maintenance and updates to ensure both security and FIPS compliance. These images follow the same update and lifecycle model as all other Chainguard Images, ensuring customers always have a secure, compliant base to build on.
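To make the TLS testing work above a bit more concrete, here is a minimal sketch of the kind of node-level configuration involved, using Spark’s standard `spark.ssl.*` settings. The keystore paths, passwords, and protocol choice are illustrative placeholders rather than the actual values shipped in Chainguard’s images, and in a real deployment these settings would typically live in spark-defaults.conf or the Spark Operator’s properties file rather than in application code.

```python
# Minimal sketch: enabling Spark's built-in TLS (spark.ssl.*) so that
# inter-node traffic uses a keystore baked into the image.
# Paths, passwords, and protocol choice are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-fips-tls-smoke-test")
    # Enable SSL for Spark's internal communication.
    .config("spark.ssl.enabled", "true")
    # Keystore/truststore each node should use; in a FIPS deployment these
    # would be generated and handled with FIPS-approved algorithms.
    .config("spark.ssl.keyStore", "/etc/spark/ssl/keystore.p12")        # placeholder path
    .config("spark.ssl.keyStoreType", "PKCS12")
    .config("spark.ssl.keyStorePassword", "changeit")                   # placeholder secret
    .config("spark.ssl.trustStore", "/etc/spark/ssl/truststore.p12")    # placeholder path
    .config("spark.ssl.trustStoreType", "PKCS12")
    .config("spark.ssl.trustStorePassword", "changeit")                 # placeholder secret
    # Pin the protocol; cipher suites would be restricted to
    # FIPS-approved algorithms in a real deployment.
    .config("spark.ssl.protocol", "TLSv1.2")
    .getOrCreate()
)

# Tiny end-to-end job to confirm the session works with TLS enabled.
print(spark.range(1000).selectExpr("sum(id)").collect())
spark.stop()
```

Running a trivial job on top of that configuration gives a quick end-to-end signal that encrypted inter-node communication and the crypto path are both functioning.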
We're also working toward contributing these modular changes back upstream – because we believe secure software should be available to the entire open source community.
Getting Started with Spark-FIPS
Chainguard’s FIPS-enabled images for Apache Spark represent a major leap forward in securing the modern data stack. Spark is no longer an exception to your FIPS compliance strategy – it’s now part of it.
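If you want a quick sanity check once you’re running on a Spark-FIPS image, one option is to inspect the JVM’s registered security providers from PySpark and confirm that a FIPS-validated provider is present. The snippet below is a hypothetical sketch: the provider name (“BCFIPS” for Bouncy Castle FIPS) and the use of PySpark’s internal py4j gateway are assumptions for illustration, not Chainguard’s documented verification procedure.

```python
# Hypothetical sanity check: list the JVM's registered security providers
# from PySpark and assert that a FIPS-validated provider is present.
# The provider name and the use of the internal _jvm gateway are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fips-provider-check").getOrCreate()

# Reach into the JVM via py4j and enumerate java.security providers.
providers = spark.sparkContext._jvm.java.security.Security.getProviders()
names = [p.getName() for p in providers]
print("JVM security providers:", names)

assert any("FIPS" in n.upper() for n in names), "No FIPS provider registered"
spark.stop()
```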
We’re still early in the journey. We’re actively exploring FIPS builds for other previously “un-FIPS-able” projects like Apache Kafka and Apache ZooKeeper, and we’d love to hear what else you want us to prioritize.
If you’re interested in Spark-FIPS or want to learn more about Chainguard’s custom FIPS image program, please reach out.
Ready to Lock Down Your Supply Chain?
Talk to our customer-obsessed, community-driven team.