Since 1922, technology and innovation have played an integral role in USAA’s ability to serve military members and their families. As membership has grown to over 13 million, the services required to meet members’ needs have also evolved. As a result, application teams across our banking and insurance businesses have turned to Google Cloud to develop the services of tomorrow.
At USAA, partners throughout the business rely heavily on the security team to get application teams onboarded to Google Cloud and productive in short order. This post details the automated processes our security team uses to satisfy those requirements. Specifically, we focus on the following principles:
- Ensure security best practices are built into the foundation – Whether it’s time-sensitive credentials or integration with security tools, building in security from the start reduces technical debt and ensures visibility across our GCP organization.
- Reduce the roadblocks to application team adoption of Terraform – Preconfiguring Terraform for application teams lets developers focus on what they do best and ensures a better overall experience. Teams can also take advantage of existing preventive guardrails, helping improve USAA’s security posture.
- Establish a repeatable onboarding process for our security team – As demand grows, any security team member can reuse established, standardized procedures, helping to quickly add approved baseline security controls and guardrails. This also ensures knowledge isn’t centralized amongst a few team members.
Getting Started – Organizing Your Organization
One of the most powerful tools in your security team’s tool belt is the ability to structure your GCP resources hierarchically. This allows permissions to be defined at any tier and inherited by child resources. While it may be simple to lift and shift an existing organizational structure to the cloud, this initial step often benefits from a more thoughtful design, given its impact on how an organization secures, deploys, and uses resources, both now and in the future.
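To make the inheritance idea concrete, here is a minimal sketch of a resource hierarchy in which a resource's effective permissions are the union of bindings granted on it and on every ancestor. The folder, project, and group names are hypothetical, and the model is deliberately simplified: it ignores deny policies, conditions, and organization policies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    """A node in a resource hierarchy: organization, folder, or project."""
    name: str
    parent: Optional["Node"] = None
    # role -> members granted that role directly on this node
    bindings: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)

    def effective_bindings(self) -> Dict[str, Set[str]]:
        """Union of the bindings on this node and on every ancestor."""
        merged: Dict[str, Set[str]] = {}
        node: Optional[Node] = self
        while node is not None:
            for role, members in node.bindings.items():
                merged.setdefault(role, set()).update(members)
            node = node.parent
        return merged


# Hypothetical hierarchy: organization -> folder -> team folder -> project.
org = Node("organization")
engineering = Node("engineering-folder", parent=org)
team = Node("team-a-folder", parent=engineering)
project = Node("team-a-dev-project", parent=team)

org.grant("roles/logging.viewer", "group:security@example.com")
team.grant("roles/compute.admin", "group:team-a@example.com")

effective = project.effective_bindings()
print(effective)
```

The project inherits both grants, while the Engineering folder sees only the organization-level one. This is why the tier at which a binding is placed matters so much in the initial design.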
Security Pipeline – Access Control Framework
Early in our cloud journey, the security team made the decision to own all CI/CD resources along with Federated Identity Group IAM access. This decision allows the security team to grant IAM access programmatically and enables the application team service accounts and remote Terraform state buckets to live in security-owned projects outside the purview of the application team using them. It also has significant security implications and proves essential to building application baselines and boundaries for each team.
One advantage of this approach is the centralized management of remote Terraform state buckets and their object lifecycle policy, a control implemented due to the risks of application secrets and other sensitive information being stored in the state file. This solution also centralizes all CI/CD Pipeline Service Accounts. Because Google Cloud API access for service accounts is controlled by enabling or disabling APIs in the project where the accounts were created, API access for all Pipeline Service Accounts is centrally managed from this security-owned project. For example, if the Google Cloud Billing API is not enabled in the IaC-app project, none of the application service accounts will be able to enable the Billing API in projects created under the Engineering folder. This method also allows the security team to prevent deletion, unauthorized service account key generation, or any other tampering with these privileged service accounts.
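As an illustration of what an object lifecycle policy on a state bucket might look like, the sketch below builds a Cloud Storage lifecycle configuration that prunes old, noncurrent versions of state files. The article does not describe the actual rules or retention values, so the numbers here are assumptions, and the sketch assumes object versioning is enabled on the bucket.

```python
import json

# Assumed retention values; the article does not state the real ones.
MAX_NONCURRENT_VERSIONS = 5
MAX_NONCURRENT_AGE_DAYS = 30


def state_bucket_lifecycle(max_versions: int, max_age_days: int) -> dict:
    """Build a lifecycle configuration that deletes old state versions.

    Both conditions match only noncurrent object versions, so the live
    state file is not removed by these rules.
    """
    return {
        "rule": [
            {
                # Delete a version once it has this many newer versions.
                "action": {"type": "Delete"},
                "condition": {"numNewerVersions": max_versions},
            },
            {
                # Delete a version this many days after it became noncurrent.
                "action": {"type": "Delete"},
                "condition": {"daysSinceNoncurrentTime": max_age_days},
            },
        ]
    }


config = state_bucket_lifecycle(MAX_NONCURRENT_VERSIONS, MAX_NONCURRENT_AGE_DAYS)
print(json.dumps(config, indent=2))
```

Limiting how long superseded state versions persist reduces the window in which a secret captured in an old state file remains readable.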
The security CI/CD pipeline is built using a simple set of tools:
- A Security Pipeline Service Account with a minimal set of permissions.
- A Terraform module that defines each application team baseline.
- A Terraform Remote State Bucket to manage each application team baseline deployment.
Additionally, to help manage our internal multi-tenant footprint, we’ve extended our onboarding process to populate a managed Firestore database with application-specific metadata. This data can be pulled for bill-back purposes or by other services, such as the project auto-labeler mentioned in the Event-Based Enrollment section of this article.
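The shape of that metadata is not described in this article, so the following is only a sketch of how such a record could be turned into project labels. The field names and sample values are hypothetical. The normalization reflects the label value rules this sketch relies on: lowercase letters, digits, underscores, and dashes, with a maximum length of 63 characters.

```python
import re
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class AppMetadata:
    """Hypothetical per-application record captured at onboarding time."""
    app_name: str
    cost_center: str
    owner_group: str


def to_label_value(raw: str) -> str:
    """Normalize free text into a valid label value.

    Lowercases the text, replaces any character outside [a-z0-9_-]
    with a dash, and truncates the result to 63 characters.
    """
    cleaned = re.sub(r"[^a-z0-9_-]", "-", raw.lower())
    return cleaned[:63]


def to_labels(meta: AppMetadata) -> Dict[str, str]:
    """Map each metadata field to a label key with a normalized value."""
    return {key: to_label_value(value) for key, value in asdict(meta).items()}


meta = AppMetadata("Claims Intake", "CC-1234", "claims-devs@example.com")
labels = to_labels(meta)
print(labels)
```

Keeping the normalization in one place means that bill-back reports and any other consumer of the labels see the same values.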
Application Baselining – Creating Permission Boundaries
The application-specific metadata enables our application baseline module to build an Application Team Folder, which teams employ as their GCP application workspace. In addition, the module creates a CI/CD Pipeline Service Account and a Google Cloud Storage bucket to hold remote Terraform state files in a security-owned project (IaC-app in the diagram below). Most importantly, it lays the foundational access controls for their Pipeline Service Account and Federated Identity Group. These IAM bindings are set on the Application Team Folder.
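One way to picture the folder-level access controls is as a function that produces IAM bindings for the Pipeline Service Account and the Federated Identity Group, rejecting any role outside an approved list. This is a sketch under stated assumptions: the allowlist contents, the account names, and the choice to grant both identities the same roles are hypothetical and not taken from the actual module.

```python
from typing import Dict, Iterable, List

# Hypothetical allowlist; the article does not list the approved roles.
ALLOWED_ROLES = frozenset({
    "roles/resourcemanager.projectCreator",
    "roles/compute.admin",
    "roles/storage.admin",
})


def folder_bindings(pipeline_sa: str, identity_group: str,
                    requested_roles: Iterable[str]) -> List[Dict[str, object]]:
    """Return folder-level IAM bindings for one application team.

    Raises ValueError if any requested role is outside the allowlist,
    so the permission boundary cannot be widened by accident.
    """
    roles = sorted(set(requested_roles))
    rejected = [role for role in roles if role not in ALLOWED_ROLES]
    if rejected:
        raise ValueError(f"roles outside the permission boundary: {rejected}")
    members = [f"serviceAccount:{pipeline_sa}", f"group:{identity_group}"]
    # Copy the member list so each binding can be edited independently.
    return [{"role": role, "members": list(members)} for role in roles]


bindings = folder_bindings(
    "pipeline-team-a@iac-app.iam.gserviceaccount.com",
    "team-a@example.com",
    ["roles/compute.admin", "roles/storage.admin"],
)
print(bindings)
```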
While this module works well in providing a base set of permissions, we quickly realized it did not provide an adequate level of agility for application teams. Because the identity that creates a Google Project is added as the owner without restrictions, project creation was initially handled manually by the security team. To solve this problem, we gave developers the ability to create and manage their own projects by implementing an event-based “de-escalation of ownership” service. This service uses an Organizational Log Sink to capture project creation events and trigger a Cloud Function, removing the developer as owner of the project. This ensures the permission boundary originally set via Whitelisted Roles cannot be modified by the application team.
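The core of such a de-escalation step can be sketched as a pure function over an IAM policy document: remove the project creator from the `roles/owner` binding and leave everything else untouched. This is an illustrative sketch, not the actual Cloud Function. A real implementation would read the project's policy, apply a change like this, and write the policy back together with its etag.

```python
import copy
from typing import Any, Dict


def remove_owner(policy: Dict[str, Any], member: str) -> Dict[str, Any]:
    """Return a copy of an IAM policy with `member` removed from roles/owner.

    The input policy is not modified. Bindings left with no members
    are dropped from the result.
    """
    updated = copy.deepcopy(policy)
    kept = []
    for binding in updated.get("bindings", []):
        if binding.get("role") == "roles/owner":
            binding["members"] = [
                m for m in binding.get("members", []) if m != member
            ]
        if binding.get("members"):
            kept.append(binding)
    updated["bindings"] = kept
    return updated


# Hypothetical policy on a freshly created project.
policy = {
    "etag": "abc",
    "bindings": [
        {"role": "roles/owner", "members": ["user:dev@example.com"]},
        {"role": "roles/viewer", "members": ["group:team@example.com"]},
    ],
}
result = remove_owner(policy, "user:dev@example.com")
print(result)
```

Because the function touches only the owner binding, access inherited from the Application Team Folder is unaffected, and the developer keeps the baseline permissions on the new project.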
This workflow provides several advantages. It allows application teams to get building in just minutes and provides teams a limited permission set to work with out-of-the-box. And while we afford limited console access in sandboxed environments, this initial baseline reduces friction associated with Terraform. No longer do teams need to worry about state file management, versioning, or file level encryption.
Authentication – Prebuilt Workflows for Application Teams
Authentication comes with tricky questions: what type of credential is used, where and how are these secrets stored, and how are they incorporated into a CI/CD pipeline? Left unanswered, these questions force application teams to invent their own strategy, which not only diverts focus from development but can also become a source of security concerns. Our security team has worked to mitigate the issue with a pre-built workflow used by all CI/CD pipelines deploying to Google Cloud.
To introduce additional centralized controls around authentication to GCP, we created a Cloud Function protected by VPC Service Controls. The sole purpose of this Cloud Function is to generate short-lived credentials for our CI/CD pipelines. When we run the application baseline module described in the previous section, we create a “caller” service account and a number of “working” service accounts per environment for the application team. A service account key is generated for the “caller” service account; this key is securely stored on-prem and made retrievable to the CI/CD pipelines. The Cloud Function determines who the “caller” is and which environment is being deployed to, then provides temporary credentials for the requested working service account. A word of caution here: Terraform is able to use these short-lived credentials without issue, but other tools may not. Please see the documentation for your deployment and test tools before implementing a similar strategy.
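The decision logic of such a broker can be sketched in two steps: map the caller and target environment to a working service account, then build a request for a short-lived access token. The mapping, account names, and default lifetime below are hypothetical. Using the IAM Service Account Credentials `generateAccessToken` method is one plausible way to mint the token, not a confirmed detail of our function. The sketch only constructs the request and does not send it.

```python
from typing import Dict, Tuple

# Hypothetical mapping created at baseline time:
# (caller service account, environment) -> working service account.
WORKING_ACCOUNTS: Dict[Tuple[str, str], str] = {
    ("caller-team-a@iac-app.iam.gserviceaccount.com", "dev"):
        "team-a-dev@iac-app.iam.gserviceaccount.com",
    ("caller-team-a@iac-app.iam.gserviceaccount.com", "prod"):
        "team-a-prod@iac-app.iam.gserviceaccount.com",
}

# Default upper bound for access token lifetime (one hour).
MAX_LIFETIME_SECONDS = 3600


def resolve_working_account(caller: str, environment: str) -> str:
    """Return the working account this caller may use in this environment."""
    try:
        return WORKING_ACCOUNTS[(caller, environment)]
    except KeyError:
        raise PermissionError(
            f"{caller} may not deploy to {environment}") from None


def build_token_request(working_sa: str,
                        lifetime_seconds: int = 900) -> Tuple[str, dict]:
    """Build the URL and JSON body for a generateAccessToken call."""
    if not 0 < lifetime_seconds <= MAX_LIFETIME_SECONDS:
        raise ValueError("lifetime must be between 1 and 3600 seconds")
    url = ("https://iamcredentials.googleapis.com/v1/projects/-/"
           f"serviceAccounts/{working_sa}:generateAccessToken")
    body = {
        "scope": ["https://www.googleapis.com/auth/cloud-platform"],
        "lifetime": f"{lifetime_seconds}s",
    }
    return url, body


sa = resolve_working_account(
    "caller-team-a@iac-app.iam.gserviceaccount.com", "dev")
url, body = build_token_request(sa, lifetime_seconds=600)
print(url)
print(body)
```

Keeping the mapping on the broker side means a pipeline can only ever ask for an environment. It never names the working account directly, so a leaked caller key cannot be pointed at another team's accounts.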
We’ve found this process works well for nearly all GCP deployments. Secrets are properly managed and safeguarded for application teams, and we follow best practices by using short-lived tokens and restricting ingress with VPC Service Controls. In addition, our team abstracts the usage of remote state buckets in the pipeline to eliminate the guesswork of mapping developer changes to the correct Terraform state file and bucket. This helps ensure developers on the same team don’t trample on teammates’ work when promoting code or creating feature branches. As with our baseline setup above, we’ve found this aids the adoption of Terraform as the deployment mechanism of choice across the organization.
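As a rough illustration of that abstraction, a pipeline step could derive the Terraform backend settings from the team, environment, and branch, so that developers never choose a bucket or state path by hand. The bucket and prefix naming scheme below is hypothetical. `bucket` and `prefix` are the standard settings of Terraform's `gcs` backend, passed here through `terraform init -backend-config`.

```python
import re
from typing import List


def backend_args(team: str, environment: str, branch: str) -> List[str]:
    """Build `terraform init` arguments for the GCS backend.

    The naming scheme is an assumption. Each branch gets its own prefix,
    so feature work never writes to another branch's state.
    """
    # Replace characters such as "/" so the branch is a single path segment.
    safe_branch = re.sub(r"[^a-zA-Z0-9_-]", "-", branch)
    bucket = f"tfstate-{team}-{environment}"
    prefix = f"{team}/{environment}/{safe_branch}"
    return [
        "init",
        f"-backend-config=bucket={bucket}",
        f"-backend-config=prefix={prefix}",
    ]


args = backend_args("team-a", "dev", "feature/add-cache")
print(["terraform", *args])
```

The pipeline would pass this list to its process runner. Because the values are computed from pipeline context rather than typed in, two developers on different branches cannot end up pointing at the same state by mistake.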
Event-Based Enrollment
Now that we’ve laid the foundation for application teams, the next challenge is ensuring our security team has visibility throughout our Google Cloud environment. This can encompass logging, monitoring, alerting, vulnerability management and more. To satisfy these requirements, we use a combination of native GCP services, external vendors and custom automation.
All automation starts when application teams create projects. We use aggregated log sinks at the organization level, filtering for “projectCreate” events. These events are exported to a Pub/Sub topic, which triggers a Cloud Function. Depending on the purpose, the Cloud Function might make API calls to a vendor’s SaaS platform or to a tool run out of USAA datacenters. We also use custom tooling, including an auto-labeler that ensures every project is labeled with the application-specific Firestore metadata supplied during the baselining process.
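The first thing a function triggered this way has to do is decode the Pub/Sub message and pull the relevant fields out of the exported log entry. The sketch below assumes a background-style event whose `data` field is a base64-encoded JSON log entry. The exact field paths used for the project ID and creator, and the match on a method name ending in `CreateProject`, are assumptions about the audit log shape that should be checked against real entries.

```python
import base64
import json
from typing import Any, Dict, Optional


def parse_project_event(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Decode a Pub/Sub message carrying an exported log entry.

    Returns the project ID and creator, or None if the entry does not
    describe a project creation. Field paths are assumptions.
    """
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = entry.get("protoPayload", {})
    if not payload.get("methodName", "").endswith("CreateProject"):
        return None
    labels = entry.get("resource", {}).get("labels", {})
    auth = payload.get("authenticationInfo", {})
    return {
        "project_id": labels.get("project_id", ""),
        "creator": auth.get("principalEmail", ""),
    }


# Synthetic event for illustration only.
sample_entry = {
    "protoPayload": {
        "methodName": "CreateProject",
        "authenticationInfo": {"principalEmail": "dev@example.com"},
    },
    "resource": {"type": "project", "labels": {"project_id": "team-a-dev-1"}},
}
sample_event = {
    "data": base64.b64encode(
        json.dumps(sample_entry).encode("utf-8")).decode("ascii")
}
print(parse_project_event(sample_event))
```

Each downstream consumer, such as a vendor enrollment call or the auto-labeler, can start from the same parsed result.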
As application teams build out their capabilities, maintaining “day 2” support is paramount. Event-based processes move our security team away from manual, error-prone work and allow members to focus on higher-priority tasks. As a result, our security team is able to scale as additional applications are onboarded.
As we continue to build on these foundational security layers, we remain focused on balancing developer usability and agility with security requirements, and we ground ourselves in the ethos of low-touch cloud infrastructure as the number of teams grows. If you are considering Google Cloud Platform, we hope this article helps your organization develop a more secure foundational layer.
By: Tyler Warren (USAA) and James DeLuna (USAA)
Source: Google Cloud Blog