Editor’s note: Today we’re hearing from Padmanabh Dabke, Senior Director of Data Analytics and Zander Lichstein, Technical Director and Architect for GCP Migration at Broadcom. They share how Google Cloud helped them modernize their data analytics infrastructure to simplify their operations, lower support and infrastructure costs and greatly improve the robustness of their data analytics ecosystem.
Broadcom Inc. is best known as a global technology leader that designs and manufactures semiconductor and infrastructure software solutions. With the acquisition of Symantec in 2019, Broadcom expanded their footprint of mission critical infrastructure software. Symantec, as a division of Broadcom, has security products that protect millions of customers around the world through software installed on desktops, mobile devices, email servers, network devices, and cloud workloads.
All of this activity generates billions and billions of interesting events per day. We have dozens of teams and hundreds of systems which, together, provide protection, detection, exoneration, and intelligence, all of which requires handling a massive amount of data in our data lake. Broadcom’s Security Technology and Response (STAR) team leverages this data lake to provide threat protection and analytics applications. The team needed a more flexible way to manage data systems while eliminating resource contention and enabling cost accountability between teams.
Our data lake has served us well but as our business has grown so have our technology requirements. We needed to modernize the legacy implementation of the data lake and analytics applications built on top. Its monolithic architecture made it difficult to operate and severely limited the choices available to individual application developers. We chose Google Cloud to speed up this transformation. In spite of the complexity and scale of our systems, the move to Google Cloud took less than a year and was completely seamless for our customers. Our architectural optimizations, coupled with Google Cloud’s platform capabilities simplified our operational model, lowered support and infrastructure costs, and greatly improved the robustness of our data analytics ecosystem. We’ve reduced the number of issues being reported on the data lake, translating to a reduction of 25% in monthly support calls from internal Symantec researchers related to resource allocation and noisy neighbor issues.
Where does our data come from and how do we use it?
Providing threat protection requires a giant feedback loop. As we detect and block cyber attacks in the field, those systems send us telemetry and samples: the type of threats, where they came from, and the damage they tried to cause. We sift through the telemetry to decide what’s bad, what’s good, what needs to be blocked, which websites are safe or unsafe, and convert those opinions into new protection which is then pushed back out to our customers. And the cycle repeats.
In the early days, this was all done by people—experts mailing floppy disks around. But these days the number of threats and the amount of data are so overwhelming that we must use machine learning (ML) and automation to handle the vast majority of the analysis. This allows our people to focus on handling the newest and most dangerous threats. These new technologies are then introduced into the field to continue the cycle.
Shortcomings of the legacy data platform
Our legacy data platform had evolved from an on-prem solution, and was built as a single, massive, relatively inflexible multi-tenant system. It worked well when there was a big infrastructure team that maintained it but failed to take advantage of many capabilities built into Google Cloud. The design also introduced a number of obvious limitations, and even encouraged bad habits from our application teams. Accountability was challenging, changes and upgrades were painful, and performance ultimately suffered. We’d built the ecosystem on top of a specific vendor’s Apache Hadoop stack. We were always limited by their point of view, and had to coordinate all of our upgrade cycles across our user base.
Our data platform needed a transformation. We wanted to move away from a centralized platform to a cloud-based data lake that was decentralized, easy to operate, and cost-effective. We also wanted to implement a number of architectural transformations like Infrastructure as Code (IaC) and containerization.
“Divide and Conquer” with ephemeral clusters
When we built our data platform on Google Cloud, we went from a big, centrally managed, multi-tenant Hadoop cluster to running most of our applications on smaller, ephemeral Dataproc clusters. We realized that most of our applications follow the same execution pattern. They wake up periodically, operate on the most recent telemetry for a certain time window, and they generate analytical results that are either consumed by other applications or pushed directly to our security engines in the field. The new design obviated the need to centrally plan the collective capacity of a common cluster by guessing individual application requirements. It also meant that the application developers were free to choose their compute, storage, and software stack within the platform as they seemed fit, clearly a win-win for both sides.
After the migration, we also switched to using Google Cloud and open-source solutions in our stack. The decentralized cloud-based architecture of our data lake provides users with access to shared data in Cloud Storage, metadata services via a shared Hive Metastore, job orchestration services via Cloud Composer, and authorization via IAM and Apache Ranger. We have a few use cases where we employ Cloud SQL and Bigtable. We had a few critical systems based on HBase which we were able to easily migrate to Bigtable. Performance has improved, and it’s obviously much easier to maintain. For containerized workloads we use Google Kubernetes Engine (GKE), and to store our secrets we use Secret Manager. Some of our team members also use Cloud Scheduler and Cloud Functions.
Teaming up for speed
STAR has a large footprint on Google Cloud with a diverse set of applications and over 200 data analytics team members. We needed a partner with in-depth understanding of the technology stack and our security requirements. Google Cloud’s support accelerated what would otherwise have been a slow migration. Right from the start of our migration project, their professional services organization (PSO) team worked like an extension of our core team, participating in our daily stand-ups and providing the necessary support. The Google Cloud PSO team also helped us quickly and securely set up IaC (infrastructure as code). Some of our bleeding edge requirements even made their way to Google Cloud’s own roadmap, so it was a true partnership.
Previously, it took almost an entire year to coordinate just a single major-version upgrade of our data lake. With this Google Cloud transformation we can do much more in the same time, it only took about a year to not only move and re-architect the data lake and its applications, but also to migrate and optimize dozens of other similarly complex and mission-critical backend systems. It was a massive effort, but overall, it went smoothly, and the Google Cloud team was there to work with us on any specific obstacles.
A cloud data lake for smoother sailing
Moving the data lake from this monolithic implementation to Google Cloud allowed the team to deliver a platform focused entirely on enabling app teams to do their jobs. This gives our engineers more flexibility in how they develop their systems while providing cost accountability, allowing app-specific performance optimization, and completely eliminating resource contention between teams.
Having distributed control allows teams to do more, make their own decisions, and has proven to be much more cost-effective. Because users run their own persistent or ephemeral clusters, their compute resources are decoupled from the core data platform compute resources and users can scale on their own. The same applies to user-specific storage needs.
We now also have portability across cloud providers to avoid vendor lock-in, and we like the flexibility and availability of Google Cloud-specific operators in Composer, which allow us to submit and run jobs across Dataproc or on an external GKE cluster.
We’re at a great place after our migration. Processes are stable and our data lake customers are happy. Application owners can self-manage their systems. All issues around scale have been removed. On top of these benefits, we’re now taking a post-migration pass at our processes to optimize some of our costs.
With our new data lake built on Google Cloud, we’re excited about the opportunities that have opened up for us. Now we don’t need to spend a lot of time on managing our data and can devote more of our resources to innovation.
By: Zander Lichstein (Technical Director and Architect for GCP Migration [Broadcom]) and Padmanabh Dabke (Senior Director of Data Analytics [Broadcom])
Source: Google Cloud Blog