Editor’s note: Today we hear from Kenny Kon, an SRE Director at Sabre. Kenny shares how Sabre has successfully adopted Google’s SRE framework by leveraging its partnership with Google Cloud.
As a leader in the global travel industry, Sabre Corporation drives innovation and develops solutions that help airlines, hotels, and travel agencies transform the traveler experience and satisfy the ever-evolving needs of their customers.
To build these solutions, we joined forces with Google Cloud as our preferred cloud provider to accelerate our digital transformation. We chose Google because they understand our industry: they also manage travel products such as Google Travel. Google also created SRE (Site Reliability Engineering) and operates with SRE principles at Google scale, which is what intrigued us the most.
We initially started with a multi-cloud model, but it didn’t help us move faster, so we consolidated on Google Cloud. To speed up our transformation, we adopted Google’s SRE practices, which enable us to balance reliability and speed. We made this transformation with direct help from Google Cloud’s Professional Services Organization (PSO), using Google Cloud tooling such as Cloud Monitoring and Cloud Logging, and running on Google Kubernetes Engine (GKE) and Cloud Spanner.
In adopting SRE at Sabre, we’d like to highlight three key takeaways from the journey:
1. Find colleagues who are also passionate about shifting culture and adopting SRE
Create a community within your organization that is dedicated to the SRE journey and motivated to make things happen. As we adopted SRE at Sabre, I saw more and more people rallying together to support the culture change. Once we had built some momentum, it was great to bring shared experiences to the team, because we were all speaking the same language: SLOs, SLIs, and how we measure things.
One way we built our community was by hosting monthly brown bag sessions. These are informal gatherings where teams share their experiences and challenges, or teach specific SRE topics such as SLOs or toil. We also created a public Google Developer Group (GDG) and have hosted several Google SRE subject matter experts to speak on SRE principles and best practices.
2. Get your mid-level leadership stakeholders on board
We know how important leadership buy-in is to creating a successful SRE movement within an organization. Top-level buy-in is essential for securing resources and driving transformation across the organization, but what is sometimes missed is making it a priority to get mid-level leadership on board as well. It’s difficult to enact change from the ground up, starting with practitioners, and it’s just as difficult with only top-level buy-in, because things may fall apart once the change reaches the middle. Mid-level leaders directly affect the culture and decisions of their teams, so it is imperative to have them on board. To avoid resistance, mid-level leadership, meaning the people managers in product, operations, and engineering, must understand the motivations behind the change. Without that understanding, they will struggle to communicate changes to practitioners, which can affect their teams’ goals and allocated bandwidth.
3. Don’t be afraid to get help from professionals
Adopting SRE at a large organization is no simple feat. Partnering with Google’s SRE consulting experts has brought about a huge shift at Sabre. The value PSO brings is not just training; it’s also listening. Experienced Googlers who understand our problems, and who have been at our stage in the SRE journey, listened, analyzed, and tailored the approach to our team’s goals. PSO helped us shift our engineering teams to be more customer-centric and align our product, operations, and development teams. Most importantly, they’ve helped make our teams happier, because they’re no longer spinning their wheels, waiting around on blocked requests.
When we partnered with PSO, we knew who the key stakeholders in our organization were: the mid-level leadership and people managers. We made sure to bring them into our PSO discussions and decision-making sessions. As a result, we gained more traction and closed the gap we had by enabling mid-level leaders and bringing them on board.
Some of the actions we have taken with help from our PSO SRE partners include adopting a tiers-of-service approach, improving incident management through Wheel of Misfortune (WoM) exercises, defining critical user journeys (CUJs), and implementing error budgets.
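For readers new to error budgets, the arithmetic behind them is simple: the budget is the share of requests an SLO allows to fail. Here is a minimal sketch in Python. The function name, the 99.9% target, and the request counts are illustrative assumptions, not Sabre’s actual SLOs or tooling.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget consumption for one SLO window.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    All names and numbers here are hypothetical, for illustration only.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of requests the SLO allows to fail.
    allowed_failures = (1 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures

    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,        # 1.0 means the budget is fully spent
        "budget_remaining": 1 - consumed,   # negative means the SLO was missed
    }


# Hypothetical 30-day window: 1,000,000 requests against a 99.9% SLO.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"Allowed failures: {report['allowed_failures']:.0f}")
print(f"Budget consumed:  {report['budget_consumed']:.0%}")
```

When the remaining budget approaches zero, a team would typically slow down releases and prioritize reliability work; while budget remains, it can ship faster.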
Since putting these SRE practices into place, our business is more aligned to customer experience. We now invest organizational resources according to the needs of our customers, and in doing so have reduced silos across our teams. Our Ops team is much happier since they can move faster and no longer have to block requests. SRE has taught us a common language and a common framework. Moreover, it gives this whole discipline a culture and meaning.
By: Kenny Kon (SRE Director at Sabre)
Source: Google Cloud Blog