Maksym Lushpenko 12/3/24

How We Reduced Our Google Cloud Bill by 65%

Learn how we reduced our Google Cloud costs by 65% using Kubernetes optimizations, workload consolidation, and smarter logging strategies. Perfect for startups aiming to extend their runway and save money.

Introduction

No matter if you are running a startup or working at a big corporation, keeping infrastructure costs under control is always a good practice. But it’s especially important for startups to extend their runway. This was our goal.

We just got a bill from Google Cloud for the month of November and are happy to see that we reduced our costs by ~65%, from $687/month to $247/month.

Most of our infrastructure is running on Google Kubernetes Engine (GKE), so most savings tips are related to that. This is a case of optimizing at a small scale, but most of these techniques can be applied to large-scale setups as well.

TLDR

Here’s what we did, sorted from the biggest savings to the smallest:

Almost got rid of stable on-demand instances by moving part of the setup to spot instances and reducing the amount of time stable nodes have to be running to the bare minimum.
Consolidated dev and prod environments
Optimized logging
Optimized workload scheduling

Some of these steps are interrelated, but each has its own distinct impact on your cloud bill. Let’s dive in.

Stable Instances

The biggest impact on our cloud costs was running stable servers. We needed them for several purposes:

some services didn’t have a highly available (HA) setup (multiple instances of the same service)
some of our skills assessments are running inside a single Kubernetes pod and we can’t allow pod restarts or the progress of the test will be lost
we weren’t sure if all of our backend services could handle a shutdown gracefully in case of a node restart

For services that didn’t have an HA setup, we had three options: set up HA where possible (this often requires installing additional infrastructure components, especially for stateful applications, which in turn drives infrastructure costs up); migrate the service to a managed solution (e.g. offload the Postgres setup to Google Cloud instead of managing it ourselves); or accept that the service may be down for 1-2 minutes a day if it’s not critical for the user experience.

For instance, we are running a small Postgres instance on Google Cloud and the load on this instance is very small. So, when some other backend component needs Postgres, we create a new database on the same instance instead of spinning up another instance on Google Cloud or running a Postgres pod on our Kubernetes cluster.

I know this approach is not for everyone, but it works for us because our Postgres databases all have a very light load. And remember, it’s not only about cost savings: this also means we don’t have to think about node restarts or basic database management.

At the same time, we are running a single instance of Grafana (monitoring tool). It’s not a big deal if it goes down during a node restart, as it is our internal tool and we can wait a few minutes for it to come back if we need to check some dashboards. We take a similar approach with the ArgoCD server that handles our deployments: it doesn’t have to be up all the time.

High Availability Setup

Here’s what we did to make our services on Kubernetes highly available so that we could get rid of stable nodes. This can be applied to the majority of services:

created multiple replicas of our services (at least 2), so if one pod goes down, another one can serve traffic
configured pod anti-affinity based on the node name, so our service replicas are always running on different nodes:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - pgbouncer
      topologyKey: kubernetes.io/hostname

added a PodDisruptionBudget with a minimum of 1 available pod (for services with 2 replicas). This doesn’t guarantee protection, but as we have automated node upgrades enabled, it can prevent GKE from killing our nodes when we don’t have a spare replica ready
reviewed terminationGracePeriodSeconds settings for each service to make sure applications have enough time to shut down properly
updated code in some apps to make sure they could be shut down unexpectedly. This is a separate topic, but you need to make sure no critical data is lost and you can recover from whatever happens during node shutdown
moved these services to spot instances (the main cost-savings step, the other steps were just needed for reliable service operations)
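
As a rough illustration of the PodDisruptionBudget and spot-scheduling steps above, here is a minimal sketch for a two-replica service. The `pgbouncer` name is borrowed from the anti-affinity example; the image, the 30-second grace period, and the use of the `cloud.google.com/gke-spot` node label to target Spot VMs are assumptions for illustration, not the exact manifests used in this setup.

```yaml
# Keep at least one replica available during voluntary disruptions
# such as node drains triggered by automated upgrades.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pgbouncer
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: pgbouncer
---
# Deployment fragment: two replicas, an explicit shutdown window,
# and scheduling onto GKE Spot nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: pgbouncer
  template:
    metadata:
      labels:
        app.kubernetes.io/name: pgbouncer
    spec:
      terminationGracePeriodSeconds: 30      # assumed value; tune per service
      nodeSelector:
        cloud.google.com/gke-spot: "true"    # label GKE sets on Spot nodes
      containers:
      - name: pgbouncer
        image: example.com/pgbouncer:latest  # placeholder image
```

Keep in mind that a PodDisruptionBudget only covers voluntary disruptions such as drains. A spot preemption is not blocked by it, which is why the extra replica and the anti-affinity rule still matter.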

Experienced Kubernetes engineers can suggest a few more improvements, but this is enough for us right now.

Temporary Stable Instances

Now we come to the part about our skills assessments that need stable nodes. We can’t easily circumvent this requirement (yet, we have some ideas for the future).

We decided to try node auto-provisioning on GKE. Instead of having always available stable servers, we would dynamically create node pools with specific characteristics to run our skills assessments.

This comes with certain drawbacks - candidates who start our skills assessments have to wait an extra minute while the server is being provisioned compared to the past setup where stable servers were just waiting for Kubernetes pods to start. It’s not ideal, but considering it saves us a lot of money, it’s acceptable.

As we want to make sure no other workloads are running on those stable nodes, we use node taints and tolerations for our tests. Here’s what we add to our deployment spec:

nodeSelector:
  type: stable
tolerations:
  - effect: NoSchedule
    key: type
    operator: Equal
    value: stable

We also add resource requests (and limits, where needed), so auto-provisioning can select the right-sized node pool for our workloads. So, when there is a pending pod, auto-provisioning creates a new node pool of a specific size with the correct labels and taints:

GKE node taints and labels from node auto-provisioning
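
To make the pairing concrete, the sketch below shows a pod template that carries the selector and toleration from above plus resource requests, followed by the label and taint a matching node would need. The container name, image, and request sizes are placeholders, not values from a real assessment.

```yaml
# Pod template fragment for a skills assessment deployment.
spec:
  template:
    spec:
      nodeSelector:
        type: stable
      tolerations:
      - effect: NoSchedule
        key: type
        operator: Equal
        value: stable
      containers:
      - name: assessment                        # placeholder name
        image: example.com/assessment:latest    # placeholder image
        resources:
          requests:                             # sizing input for node selection
            cpu: "1"                            # placeholder
            memory: 2Gi                         # placeholder
          limits:
            memory: 2Gi                         # placeholder
---
# What a matching node looks like (fields as shown by `kubectl get node -o yaml`).
apiVersion: v1
kind: Node
metadata:
  labels:
    type: stable
spec:
  taints:
  - key: type
    value: stable
    effect: NoSchedule
```

The node selector pulls the pod onto `type: stable` nodes, while the taint keeps every pod without the matching toleration off them.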

Our skills assessments run for a maximum of 3 hours at a time and are then automatically removed, which allows the Kubernetes autoscaler to scale down our nodes.

There are a few more important things to mention. You need to actively manage resources for your workloads, or pods may get evicted by Kubernetes (kicked out of the node because they are using more resources than they should).

In our case, we go through each skills assessment we develop and take note of its resource usage to define how much it needs. If this were an always-on type of workload, we could have deployed the Vertical Pod Autoscaler, which can provide automatic recommendations for how many resources you need based on resource usage metrics.
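
For reference, a recommendation-only Vertical Pod Autoscaler could look like the sketch below. It assumes the VPA components are available in the cluster, and `my-service` is a hypothetical Deployment name.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa        # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service          # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"         # only publish recommendations, never evict pods
```

With `updateMode: "Off"`, the recommendations appear in the object's status (for example via `kubectl describe vpa my-service-vpa`) and running pods are left alone.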

Another important point is that sometimes the autoscaler can kick in and remove a node if its usage is quite low, so we had to add the following annotation to our deployments to make sure we don’t get accidental pod restarts:

spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

All of this allows us to have temporary stable nodes for our workloads. We use a backend service to remove deployments after 3 hours maximum, but GKE auto-provisioning has its own mechanism where you can define how long these nodes can stay alive.

Optimizations

While testing this setup, we noticed that auto-provisioning was not perfect: it was choosing nodes that were a little too big for our liking.

Another problem, as expected, was that creating a new node pool for every new workload takes some extra time, e.g. a pending pod takes 1m53s to start on an existing node pool vs 2m11s on a new node pool.

So, here’s what we did to save a bit more money:

pre-created node pools of multiple sizes with 0 nodes by default and autoscaling enabled. All of these have the same labels and taints, so autoscaler chooses the most optimal one. This saves us a bit of money vs node auto-provisioning
chose older instance types, e.g. the N1 family instead of N2, which is newer but a bit more expensive. This saved some more money

Plus, we got faster test provisioning as node pools are already created, and we still have auto-provisioning as a backup option in case we forget to create a new node pool for future tests.

The last thing I wanted to mention here is that we were considering 1-node-per-test semantics for resource-hungry tests, e.g. ReactJS environments. This can be achieved with additional labels and pod anti-affinity as discussed previously. We might add this on a case-by-case basis.
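
A minimal sketch of that idea, assuming every assessment pod carries a shared label (the `workload-type: skills-assessment` label here is hypothetical): a resource-hungry test declares anti-affinity against that label, so it is never scheduled next to another assessment pod on the same node.

```yaml
spec:
  template:
    metadata:
      labels:
        workload-type: skills-assessment   # hypothetical shared label
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: workload-type
                operator: In
                values:
                - skills-assessment
            topologyKey: kubernetes.io/hostname
```

Because the scheduler also honors the required anti-affinity rules of pods already running on a node, other assessment pods with this label should not be placed on a node occupied by such a test.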

Consolidated Dev and Prod

We have a relatively simple setup for a small team: dev and prod. Each environment consists of a GKE cluster and a Postgres database (and some other things not related to cost savings).

I went to a Kubernetes meetup in San Francisco in September and discovered a cool tool called vcluster. It allows you to create virtual Kubernetes clusters within the same Kubernetes cluster, so developers can get access to fully isolated Kubernetes clusters and install whatever they want inside without messing up the main cluster.

They have nice documentation, so I will just share how it impacted our cost savings. We moved from a separate GKE cluster in another project for our dev environment to a virtual cluster inside our prod GKE cluster. What that means:

We got rid of a full GKE cluster. Even without counting the actual nodes, Google recently started charging a fee for cluster management.
We can share nodes between dev and prod clusters. Even empty nodes require around 0.5 CPU and 0.5 GB RAM to operate, so the fewer nodes, the better.
We save money on shared infrastructure, e.g. we don’t need two Grafana instances, Prometheus Operators, etc. because it is the same “physical” infrastructure and we can monitor it together. The isolation between virtual clusters happens on the namespace level and some smart renaming mechanics.
We save money by avoiding paying for extra load balancers. Vcluster allows you to share ingress controllers (and other resources you’d like to share) between clusters, a kind of parent-child relationship.
We don’t need another cloud database: we moved our dev database to the prod database instance. You don’t have to do this step, but our goal was aggressive cost savings.

We had some struggles with the Identity and Access Management (IAM) setup during this migration, as some functionality required a subscription to vcluster, but we found a workaround.

We understand that there are certain risks with such a setup, but we are small-scale for now and we can always improve isolation and availability concerns as we grow.

Cloud Logging

I was reviewing our billing last month and noticed something strange - daily charges for Cloud Logging even though I couldn’t remember enabling anything special like Managed Prometheus service.

I got worried as this would mean spending almost $100/month for I don’t know what. I was also baffled as to why it started in the middle of the month; I thought maybe one of the developers had enabled something and forgot.

After some investigation, I found what it was:

GKE Control Plane components were generating 100GB of logs every month. The reason I saw charges only from the middle of the month is that there is a free tier of 50GB, so for the first two weeks there wouldn’t be any charges, and once you cross the threshold, you start seeing it in billing.

We already had a somewhat optimized setup, having disabled logging for user workloads:

We want to have control plane logs in case there are some issues, but this was way too much. I started investigating deeper and found that the vast majority of logs are info-level logs from the API Server. Those are often very basic and don’t help much with troubleshooting.

To solve this, we added an exclusion rule to the _Default Log Router Sink to exclude info logs from the API server:
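
An exclusion of this kind might look like the sketch below, written as the `exclusions` field of the sink rendered in YAML. The exclusion name is hypothetical, and the filter is an assumption based on how GKE labels control plane logs (`k8s_control_plane_component` resources with an `apiserver` component name); verify it in Logs Explorer before applying it.

```yaml
# Sketch of the `exclusions` field on the _Default sink.
exclusions:
- name: exclude-apiserver-info          # hypothetical name
  description: Drop info-level logs from the GKE API server
  filter: >-
    resource.type="k8s_control_plane_component"
    AND resource.labels.component_name="apiserver"
    AND severity=INFO
```

Warnings and errors from the API server still pass through the sink, so the logs that are most useful for troubleshooting are kept.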

As you can see in one of the previous images, log generation flattened out after applying this filter and we now have GKE logging under control. I’ve also added a budget alert specifically for Cloud Logging to catch this earlier in the future.

Conclusion & Next Steps

I wanted to see how much we could achieve without relying on any committed-use discounts or reserved instances, as those approaches still cost money and carry extra risk, depending on whether you buy a 1-year or 3-year commitment. Now that we have reduced our costs a lot, we can consider applying committed-use discounts, as those will be pretty low risk at this level of costs.

I hope this will give you a few fresh ideas on how to optimize your own infrastructure as most of these decisions can be applied to all major cloud providers.

Maksym Lushpenko 2/16/24

We've Upgraded Our Recruitment Technology to Give You More Control + API Documentation and More

Our new solution makes hiring more efficient, simplifying the way you invite candidates to DevOps assessments

More Efficient Recruitment Technology

No one likes a tedious hiring process, but unfortunately, many technical recruiting tools don't provide sufficient automation, slowing down productivity for recruiters and hiring teams. For example, many tools make you invite each candidate to a skills assessment individually rather than letting you invite multiple candidates all at once.

This month, we've introduced a solution to make hiring more efficient, simplifying the way you invite candidates to showcase their skills in areas like DevOps, Cloud, and SRE.

Introducing: Single Invitation Links

With Brokee's latest update, hiring managers can now create a single invitation link for job-specific tests and share it with multiple candidates at once.

This feature cuts down the time spent on administrative tasks, allowing you to focus more on evaluating the talents that come your way. It's a straightforward, efficient approach to tech hiring, designed with both hiring teams and candidates in mind.

Benefits for Technical Recruiters & Hiring Teams

One of the biggest perks of this new feature is the sheer amount of time it saves. Instead of sending out invites one by one, you can now reach an entire pool of candidates with just one link. It's like opening a direct line between your company and potential talent, making the initial steps of recruitment smoother and faster for everyone involved.

It also creates a more candidate-friendly application process. When candidates have a hassle-free experience from the start, it sets a positive tone for the rest of the hiring process. Plus, we've found that most technical candidates care greatly about how technically advanced the hiring process is at the companies they are considering.

Control and Flexibility

In the recruitment process, having control is crucial, and our newest feature ensures that you're in the driver's seat. You can decide exactly how many candidates can take the test through the invitation link, allowing you to manage your recruitment budget effectively without compromising on the quality of your talent pool.

Whether you're looking to fill a single position or several, this feature gives you the flexibility to scale your search up or down, based on your current needs. This way, you'll have the right tools to find the right people, at the right time.

How to Use the New Feature

Using Brokee's new invitation link feature is very straightforward. All you need to do is create an invitation link, add the role, assessment type, and number of candidates, and send the generated link!

With just a few clicks, you can generate a unique link for your job-specific test and share it across any platform you use to communicate with potential candidates.

Once you've filled the position or wish to pause the recruitment process, you can easily disable the link, ensuring no additional candidates will access the test until you're ready to hire again.

What's Coming Next

We're looking to add even more functionality to this feature. You'll soon be able to set an expiration date for your invitation links, providing another layer of control over your recruitment timeline.

We'll also give you the ability to adjust the number of candidates that can use the link, so you can seamlessly add more candidates without needing to create a new link.

As experienced hiring managers, we know the importance of providing you with all the tools you need to tailor your hiring process to your exact requirements.

Also Introducing: Test Preview and Quick Access

We've updated our navigation to make Brokee even easier to use. Upon logging in, you are greeted with access to core features – including 'Invite New Candidate' and 'See Tests in Action'. The latter allows you to view the testing process from the eyes of the candidates you're inviting, giving you special insights into the quality of your hiring process.

Our new candidate dashboard

Plus: New Diagrams for Clearer Test Instructions

To improve our tests' accessibility and clarity, Brokee has introduced diagrams alongside traditional text descriptions.

This initiative is especially beneficial for candidates who prefer visual explanations or face challenges with text instructions, such as dyslexia. By incorporating diagrams, we're making it easier for all engineers to understand the available infrastructure, furthering our commitment to inclusive and effective tech hiring.

Our skills assessment diagram was built with accessibility in mind.

Get Early Access to Our API Documentation!

We're thrilled to announce the initial release of Brokee's API documentation. It's a starting point for developers looking to integrate our skills assessment platform into their systems.

The first version covers the essentials, but we're aware it has room for improvement. We welcome your feedback as we refine and enhance the documentation. Your input is crucial for us to make the API more robust and user-friendly. Dive in, explore its capabilities, and let us know your thoughts!

Conclusion

Brokee's latest features mark a significant step forward in tech hiring: one-click invitations, improved recruiter dashboards, visual test diagrams for enhanced clarity, and accessible API documentation.

These updates will streamline the recruitment process, making it faster, more inclusive, and adaptable to your needs. We're dedicated to improving your hiring experience and eagerly await your feedback on these new functionalities. Explore what's new on Brokee and discover how we're evolving to help you hire the best tech talent more effectively.

Thank you for trusting Brokee with your tech recruitment needs. Let's continue to transform tech hiring together.

Maksym Lushpenko 1/16/24

Introducing Pricing Plans and Free Trial for DevOps Tests

We've released pricing plans with a flexible and transparent credit system for tech hiring teams of all sizes. Plus, free trials!

We're starting 2024 off with a bang! We are happy to announce our biggest product update yet: the release of payment plans with a free trial.

To make this a reality, we:

Developed payment plans and integrated our platform with Stripe
Opened sign-ups to the public
Developed free tests specifically for the free trial
Added team management
Implemented a permissions system for various platform features based on payment plans

Let's start with the most exciting part - Free Tests!

Free Trial for (Easy) DevOps Tests

To showcase our platform without asking potential customers for credit card details, we designed several easy tests that can be taken free of charge.

Simply sign up on Brokee with a company name, and get quick access to a feature-limited free trial that allows new users to conduct 5 easy tests and manage 1 user, giving you a preview of our platform’s capabilities.

Once you get comfortable with the process, simply upgrade to any paid plan to send tests to more users and have access to more advanced tests.

What do we mean by advanced DevOps tests? We have environments where candidates can work with live systems deployed to major cloud providers and showcase their skills in real-time. We've found that live DevOps tests help avoid hiring risks more than relying on professional certifications or resumes.

Pricing Plans for DevOps Tests

Our payment plans are crafted with the understanding that every hiring team has its own set of unique needs. Whether you're a startup looking for flexibility or a large enterprise seeking comprehensive solutions, our plans are designed to suit a wide range of requirements.

Flexible System: Credits Roll Over Each Month

At the core of our payment system is the credit model. Users can choose between buying a certain amount of credits upfront or opting for a monthly subscription that comes with 2 credits each month.

Test credits indicate how many hiring candidates can take skills assessments. Every time a candidate starts a test, 1 credit is deducted from the company's balance.

Our Growth Plan has one amazing feature: credits accumulate month-to-month if not used. So, no need to stress about paying for a product and not using it. The hiring process can be sporadic, so you can use rollover credits from less active months to hire confidently, without seeing a spike in your billing.

This system offers unparalleled flexibility – if you're on a Growth Plan and go over your credit limit, candidates can still continue taking skills assessments - you will be charged for the extra usage at the end of the month based on a tiered pricing model (the more tests that are used in a month, the cheaper they get).

Similarly, if you’re on an On-Demand Plan and run out of credits, you may simply buy more as needed.

Send as many invites as you'd like to candidates. You'll only get charged if they take the test.

Unlimited Candidate Test Invites

We wanted to mirror the natural hiring process when designing our payment plans. We've seen that when you're hiring engineers for a specific role, you often invite many candidates for a technical interview, but only some of them will show up.

With this idea in mind, Brokee allows you to invite an unlimited number of candidates to take a skills assessment, even though not all of them will actually take a test. This way, we charge customers based on the skills assessments that were started by candidates, not based on the number of invitations you send.

Add Multiple Admins for Team Management

When you're a small startup, one admin user might be enough for a testing platform. However, for tech recruiting firms or large hiring teams, you'll want to be able to provide access to multiple teammates. This is why we added basic team management to our paid plans to support bigger teams.

Right now, the only role is an admin user, but in the future, we'll add more roles with different levels of access.

Team Management Dashboard

Ready to Try Our Free DevOps Testing?

We developed Brokee's payment plans with a deep understanding of the unique challenges faced by tech hiring teams. We invite you to sign up for our free trial and experience firsthand how our platform can revolutionize your tech hiring process.

Maksym Lushpenko 12/11/23

Now Providing DevOps Tests for all Major Cloud Providers

Brokee has achieved one important milestone - providing skills assessments for all major Cloud providers by adding skills assessments for Azure.

As 2023 is coming to an end, we are excited to share that the team at Brokee has achieved an important milestone: providing skills assessments for all major cloud providers - Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). We also improved our method of notifying customers about the tests submitted by their candidates.

Keep reading for a deeper dive into our new Azure assessments and other latest updates. Plus, jump to the bottom to read about what we have in store for next month (hint: free tests)!

Azure Skills Assessments

In November, we released two new tests for companies running their IT infrastructure on Azure cloud. These tests are to qualify Junior, Mid, and Senior-Level Cloud Engineers.

Considering how actively Azure is promoting their service and that they are offering free credits for startups, we expect many companies to take advantage of Azure’s cloud platform.

When creating cloud assessments – like our newest Azure tests – we have to keep the following in mind:

Ensuring each candidate has an isolated unit for their test, so no one can mess with the environment;
Automating the creation and deletion of testing environments;
Providing candidates the ability to easily log in to their test.

When it comes to setting up infrastructure to allow for automated creation and deletion of testing environments, it is never a simple task. In terms of the ease of cloud resource management, we learned that Azure falls somewhere between GCP and AWS.

I’ll explain what this means – On AWS, we have a granular set of permissions for each test, specifying in detail what candidates can and cannot do within the testing environment.

Similarly, for GCP, we provide a separate project for every test, with broad permission inside the project, but with a set of guardrails to ensure candidates will stay within the bounds of the environment.

However, Azure has an unusual structure for managing access to cloud resources. They have completely separate User Management and Resource Management (servers, networks, etc.).

For resource management, Azure has ‘Resource Groups’, which allow you to create and delete a set of resources in one go. To create our Azure skills assessments, we used ‘Resource Groups’ to automate resource management and to create isolated units for each test.

Candidate Login Process

Another challenge we faced when creating the Azure tests was simplifying the login process for candidates. For other tests like our GCP assessments, candidates could just log in to the testing environment with their Brokee credentials.

Although Azure has many different options for authentication to the Azure portal, unfortunately for us, most of them are geared toward collaboration with other businesses or for typical end-user applications. In our case, we needed an integration with our user database, and this wasn’t within our means.

So, to overcome this obstacle, we settled on a simple approach of creating a temporary user for each test. We made it easy for candidates by pre-populating the username, so the candidate only has to provide an automatically generated password from the test description to access the test.

We Simplified Candidate Login for Accessing the Azure Testing Environment

Improved Test Submission Notifications

At the beginning of Brokee's development, our product was geared toward engineers. Most engineers communicate via Slack, so that is exactly where we sent test submission notifications.

As we’ve grown, we realize that not everyone uses Slack, and configuring private Slack channels with test notifications is not the most scalable option. So, we decided it was due time to move to email notifications.

In November, we configured email notifications as a default notification method to inform clients when their candidates submit tests. Now when candidates finish a test, our clients will receive an email notification looking similar to the one below:

While we still may provide Slack notifications as an option, rather than creating private channels, we'll allow companies to provide a Webhook to send notifications to. We believe this automated approach will be a win-win for our clients and our team.

Other Updates at Brokee

This month, we worked on updating our backend services, improving our monitoring setup to resolve issues quicker, and most importantly - integrating payment infrastructure.

Brokee was given Congressional and State recognition this month for our economic impact in Nevada. We're happy to say that ever since we moved some operations to Reno, we've had an absolutely welcoming reception.

Finally, we've published new articles on the blog that we believe will be insightful and interesting to any professionals or teams that hire DevOps (or plan to hire DevOps in 2024):

Coming Up in 2024

Want to reduce DevOps hiring costs? In 2024, we will release a free trial plan allowing users to sign up with just their business email (no credit cards required!).

This way, you can enjoy Brokee without having to talk to a single human (or pay a dime!). What could be better, right? ;) Stay tuned for the details next month, and have a great holiday season!

Maksym Lushpenko 11/9/23

We've Added New Tests: Kubernetes and GCP Tests

We want to offer skills assessments for all the major Cloud providers. Lately, we've been working hard to do just that.

Since releasing our first test to assess cloud skills on Amazon Web Services, we knew we wanted to offer skills assessments for all the major Cloud providers. Lately, we've been working hard to do just that.

New DevOps Assessments Added to Our Testing Library

This month, we released two tests for Google Cloud: a short test for more junior engineers, and a more complex test tailored for mid to senior candidates.

We also decided to release an easier version of our most popular Kubernetes test that will provide quick validation of whether an engineer knows how to work with applications running on Kubernetes. This test was designed so it could be used by both software engineers and cloud engineers (DevOps, Cloud, SREs).

We added these junior-level assessments to our testing library because many of our clients expressed an interest in faster and easier skills assessments that would take only 15-20 min to complete, while still providing the same high-quality standard of assessing engineers' hands-on skills in certain technologies.

Read on to learn more about the new tests we are launching this month.

GCP Skills Assessments

Both Google Cloud Platform (GCP) tests provide candidates access to the Google Cloud console to work with cloud resources. Each candidate's test is created in a separate Google Cloud project, so their testing environment is completely isolated from others.

Introducing our new GCP Tests

Unlike AWS, Google Cloud doesn't allow the creation of usernames and passwords to access the GCP Console, so this means a candidate must have a Google account to log in to the GCP Console.

To overcome this limitation and to provide a seamless experience for candidates, we've implemented Workforce Identity Federation. This allows candidates to use their existing Brokee credentials to log in to Google Cloud to complete skill tests.

This experience proved to be even more convenient than creating users on AWS, so in the future, we will attempt to achieve the same workflow for our AWS tests.

Similar to AWS and other tests on Brokee, hiring teams can still expect to receive the same detailed reports for each candidate. We will provide a history of user activity and AI summaries of the candidate's performance.

Junior Kubernetes Assessment

Our most popular test for Kubernetes is often considered pretty difficult for mid-level - and even sometimes senior - applicants!

In addition to needing to know their way around Kubernetes (which is complex on its own), engineers need to understand how external-dns and cert-manager can be used together to automatically create DNS records and SSL certificates for the application running on Kubernetes.

We usually see one of three things when candidates take this test:

a) they solve the test completely in a reasonable time frame (1-2 hours),

b) they complete 70-80% but have a tough time,

c) they can't do it at all.

Not all engineers who work with Kubernetes necessarily have (or need!) a deep understanding of Kubernetes administration. Some developers just need to be able to deploy simple applications, while their colleagues do Kubernetes administration.

This made us realize that a simpler test would be sufficient to filter out the majority of unqualified candidates, and checking core Kubernetes functionality would be more applicable to a wider pool of companies.

Introducing our Kubernetes Junior Test!...

This test can be used for software developers or junior DevOps candidates (even though some would argue that there are no Juniors in DevOps 😈).

Candidates will fix configuration errors in a Kubernetes application to make it run correctly on the cluster. This should be a quick puzzle for those with at least some knowledge of Kubernetes.

Thank you for reading - we invite you to try out our new tests with your teammates or hiring candidates, and let us know what you think. What's coming up next month? We are developing skills assessments for Azure, stay tuned!

Maksym Lushpenko 10/9/23

We Offer an AI-Powered Technical Recruiting Tool

Brokee.io added two major features in September: Recording of user activity and AI-generated summaries of candidate performance.

Our team had a busy September. We expanded our team by hiring two engineers, an engineering intern, and a lead generation expert. Since it takes time to onboard new team members, we didn't plan too much for September. Despite that, we successfully delivered two features that we had been planning for a long time. Additionally, we moved our blog posts to a new platform called Ghost, which has improved the writing experience. Previously, we used Sanity, but I may write another blog post explaining why we made the switch. All in all, it was a productive month for us.

Long story short, let's jump to our big updates.

Recording Of User Activity

Brokee is a platform that offers technical skills assessments to potential hiring candidates. The candidates are given access to live IT infrastructure and are required to fix a broken application to showcase their technical skills with specific technical stacks. Previously, we provided a text-based history of the candidate's user activity that showed which commands they used to troubleshoot and fix various problems with the application's configuration, environment, or security.

This approach was helpful in providing a quick glimpse of how the candidate was attempting to fix IT infrastructure issues. It allowed us to see where candidates got stuck for 5-20 minutes and needed to spend time reading documentation or searching for potential fixes on the internet. However, it lacked some depth as we couldn't see what was happening when the candidate was editing a file. This made it difficult to judge if specific changes were relevant or if the candidate was just looking at a file without making any changes. We do provide automated evaluations of each test to ensure that the application is fixed at the end of the test, but it's still important to know exactly how the problem was solved.

So, we've integrated asciinema into our testing environment. Now, you can replay the whole user activity with a click of a button.

While this is great, nobody wants to spend 30-60 minutes rewatching the terminal activity of every hiring candidate, so we suggest the following approach for technical team members to spend the least amount of time analyzing user activity:

Take a 1-minute look at the history of user commands.
If you see something interesting or unusual, look at the time stamp, and jump to the specific time in the recording to understand what was happening in detail.

This way, you can quickly understand exactly how candidates solved specific problems.
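The two-step review above can be partly automated. Here is a minimal Python sketch, not Brokee's actual implementation: it assumes the command history can be read as (timestamp, command) pairs, and the `find_idle_gaps` helper, the sample commands, and the 5-minute threshold are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def find_idle_gaps(history, threshold=timedelta(minutes=5)):
    """Return (gap_start, gap_length, next_command) for every pause
    between two consecutive commands that lasts at least `threshold`."""
    gaps = []
    for (prev_time, _), (next_time, next_cmd) in zip(history, history[1:]):
        gap = next_time - prev_time
        if gap >= threshold:
            gaps.append((prev_time, gap, next_cmd))
    return gaps


# Hypothetical command history; the real export format may differ.
history = [
    (datetime(2023, 9, 1, 10, 0, 0), "kubectl get pods"),
    (datetime(2023, 9, 1, 10, 1, 30), "kubectl describe pod web"),
    (datetime(2023, 9, 1, 10, 13, 30), "vim deployment.yaml"),
    (datetime(2023, 9, 1, 10, 15, 0), "kubectl apply -f deployment.yaml"),
]

gaps = find_idle_gaps(history)
for start, length, command in gaps:
    # Each gap start is a timestamp worth jumping to in the recording.
    print(f"{start:%H:%M:%S}: idle for {length}, then ran '{command}'")
```

Each reported timestamp is a candidate point to jump to in the terminal recording, since a long pause followed by a file edit is often where the interesting problem solving happened.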

AI-Generated Summary of Candidate Performance

Another major update on Brokee was the integration of AI into automated evaluations of candidate performance. While the history of user commands and replaying of user activity can help technical team members understand how every candidate fixes problems, every engineer has different skill sets and experience levels. As a result, less technical members of the hiring team may not be able to understand user activity, making textual and video representations insufficient to make informed decisions about the best hire. To address this, we asked ourselves how we could make the results more accessible to everyone on the hiring team.

Meet AI summaries of candidate performance:

As we are gathering data on user activity, we can use it to generate a structured summary with easily understandable explanations of what occurred in the testing environment. This allows everyone to quickly review the report and have a common understanding of which tools the candidate used to troubleshoot problems, along with their general purpose (such as networking configuration, working with files, security configuration, etc.). Additionally, the report highlights any changes made to the environment, and provides details on specific configurations used. As there are multiple ways to fix the same problem, even experienced engineers can learn something new by reviewing the AI-generated summary, especially if the candidate used a new tool to resolve an issue.

This is great because our automated evaluations can serve as performance reports as well as educational materials for the entire team. If you have multiple candidates who have completed skills assessments, you can refer to this report to identify the most efficient and creative problem solvers for issues with IT infrastructure.

Reporting Recap

This is just the beginning. We plan to provide more advanced reports in the future, but just to recap, here's what Brokee offers right now in terms of reporting:

Test Completion Status: Passed/Failed
Time Taken To Complete Assessment
Number of Completed Subtasks
Automated Checks
AI-Generated Summary of User Activity
History of Commands
Recording of Terminal Activity

We strive to be your top choice in hiring DevOps and Cloud engineers. If you're planning to hire or need help assessing candidates, please contact us via our website, and don't forget to follow us on LinkedIn and Facebook.

Maksym Lushpenko 10/9/23

How Much Does Hiring DevOps Cost?: Hidden Costs of Your Hiring Pipeline

But how does a company track the amount of time and resources they put toward Hiring DevOps Engineers? Let’s take a look at our Savings Calculator.

Hiring is a Top Concern for CEOs

Despite recession expectations, hiring tops the list of CEOs’ internal concerns. According to the 2023 Conference Board Survey, talent ranks first among internal worries for CEOs worldwide.

Yet only about one-third of U.S. companies report that they monitor whether their hiring practices lead to good employees; few of them do so carefully, and only a minority even track cost per hire and time to hire.

Tracking Hiring Costs

Over our time in the hiring sphere, we’ve spoken to dozens of CTOs, CEOs, recruiters, tech hiring teams, and HR leaders, and we’ve found that most companies don’t have data on the amount of time and resources they spend on hiring.

When we calculate hiring costs for them, most C-suites and directors are surprised and left wondering why they haven’t tracked hiring costs better to begin with, especially when they track other spending, from sales and marketing expenses, to travel and equipment.

By tracking hiring spending, businesses are able to allocate their resources better, save on unnecessary costs, and prioritize initiatives – such as better retention and candidate qualification.

Hiring Sys Admins, DevOps, and Cloud Engineers

This is especially pertinent when it comes to high-value DevOps and Cloud system engineer hires - some of the most expensive hires in the IT world.

Great companies spend immense resources searching for talented DevOps engineers, and even more to bring them onto the team. These recruiting and sourcing costs add up quickly.

But how does a company track the amount of time and resources they put toward Hiring DevOps Engineers? Let’s take a look at our Savings Calculator.

How Much Does Hiring a DevOps Engineer Cost?

We’ve created an algorithm that calculates DevOps and Cloud hiring costs, taking into account many factors:

• The Cost per Technical Interview, based on the salary of Senior Engineers who conduct these interviews.

• The Average Cost Per Additional Interview, which is based on the salary of HR or Hiring Managers who conduct cultural fit or other additional interviews.

• The amount of time it takes to prepare for, conduct, and evaluate each interview.

• The number of candidates that are interviewed to fill each role.

The costs for the interviews are calculated based on the average salaries in the United States for the respective roles. These salaries are then converted into hourly rates to determine the interview costs.
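As a rough illustration of how these factors combine, here is a minimal Python sketch. It is not the calculator's actual code: the function names are hypothetical, the 2,080 working hours per year (40 hours a week for 52 weeks) is an assumption, and the salary and time figures in the example are placeholders rather than real salary data.

```python
WORK_HOURS_PER_YEAR = 2080  # assumption: 40 hours/week * 52 weeks


def hourly_rate(annual_salary, hours_per_year=WORK_HOURS_PER_YEAR):
    """Convert an annual salary into an hourly rate."""
    return annual_salary / hours_per_year


def cost_per_interview(annual_salary, prep_hours, interview_hours, evaluation_hours):
    """Cost of one interview, counting preparation and evaluation time."""
    total_hours = prep_hours + interview_hours + evaluation_hours
    return hourly_rate(annual_salary) * total_hours


def pipeline_cost(cost_per_candidate, candidates_interviewed):
    """Interview cost for filling one role."""
    return cost_per_candidate * candidates_interviewed


# Placeholder inputs, chosen only to keep the arithmetic easy to follow.
rate = hourly_rate(104_000)                         # 50.0 dollars per hour
per_interview = cost_per_interview(104_000, 0.5, 1.0, 0.5)  # 2 hours -> 100.0
total = pipeline_cost(per_interview, 30)            # 30 candidates -> 3000.0
print(rate, per_interview, total)
```

Swapping in your own salaries, time estimates, and candidate counts gives a first approximation of what interviews cost you per role.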

The Hidden Costs of Hiring

We will explain how this process works using our standard calculations based on the median US salaries from Levels. It's important to note that each company has the flexibility to adjust these numbers based on its own pay rates and specific hiring dynamics.

For our calculations, we consider the average hourly costs for a DevOps Engineer and a Software Engineering Manager, which are conservative rates.

Typically, a senior DevOps engineer spends at least one hour discussing candidates' technical backgrounds and asking AWS DevOps questions, while an additional hour is needed from the Software Engineering Manager, hiring manager, or team to assess culture fit and role expectations.

To hire a DevOps engineer, you would likely conduct a 1-hour technical interview with a senior member of the DevOps team. This costs one hour of their time, or $69 at the average hourly rate.

Keep in mind that this cost is per candidate, and typically, anywhere from 20 to 50 candidates are interviewed before selecting a hire.

DevOps teams often waste hours of time helping with hiring that could be automated

Let's say you had a pipeline of 100 candidates, 50 decided to take the test, and 20 passed the test.

In a typical hiring process, you'd have to interview all 100 candidates. By introducing an automated test, you save $6900 (100 candidates * $69).

Plus, your senior engineers will get an extra 100 productive work hours as they don't need to spend time interviewing those candidates.

Once the technical team qualifies a candidate, there is usually a second interview with a software engineering manager, costing approximately $139 for each hour of their time.

In our scenario, 80 candidates didn't pass to the next stage, so the hiring manager won't have to spend time conducting a second interview with those candidates.

This equates to an extra 80 productive work hours for your hiring manager and $11120 in savings (80 candidates * $139).

Based on the proposed rates, when hiring a single DevOps engineer, the hiring team would save an overall $18020 ($6900+$11120) as well as 180 hours of time.
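The arithmetic in this scenario can be reproduced with a short Python sketch. The $69 and $139 hourly rates, the 100-candidate pipeline, and the 20 candidates who pass come from the example above; the `screening_savings` function and the `ScreeningSavings` class are hypothetical names used only for illustration, and each interview is assumed to take one hour.

```python
from dataclasses import dataclass

TECHNICAL_RATE = 69   # $/hour, senior DevOps engineer (figure from the text)
MANAGER_RATE = 139    # $/hour, software engineering manager (figure from the text)


@dataclass
class ScreeningSavings:
    technical_savings: int
    manager_savings: int
    hours_saved: int

    @property
    def total_savings(self) -> int:
        return self.technical_savings + self.manager_savings


def screening_savings(pipeline_size, passed_test, hours_per_interview=1):
    """Estimate savings when an automated test replaces the first
    technical interview and filters candidates before the second one."""
    technical_skipped = pipeline_size               # no first-round interviews
    manager_skipped = pipeline_size - passed_test   # filtered-out candidates
    return ScreeningSavings(
        technical_savings=technical_skipped * hours_per_interview * TECHNICAL_RATE,
        manager_savings=manager_skipped * hours_per_interview * MANAGER_RATE,
        hours_saved=(technical_skipped + manager_skipped) * hours_per_interview,
    )


result = screening_savings(pipeline_size=100, passed_test=20)
print(result.technical_savings)  # 6900
print(result.manager_savings)    # 11120
print(result.total_savings)      # 18020
print(result.hours_saved)        # 180
```

Changing `pipeline_size`, `passed_test`, or the two rate constants shows how the estimate shifts for your own pipeline and pay rates.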

This estimation does not include expenses such as HR and recruiting fees, onboarding, and other hiring costs.

How to Save Money on DevOps Hiring

By utilizing Brokee's technical evaluations, you can eliminate the initial technical interview for every candidate, resulting in time and cost savings for your DevOps hiring.

Furthermore, Brokee's tests have a 20% passing rate, guaranteeing that all engineers who pass the test will be highly qualified for the final interview with a software engineering manager.

By filtering out 80% of candidates who fail the technical screening, you significantly reduce the chances of low-quality engineers slipping through the cracks.

This saves valuable time for software engineering managers, CTOs, and other team members who are no longer required to interview these candidates.

Interested in saving money on DevOps hiring?

Brokee has multiple DevOps, Cloud, and System engineering evaluations for popular IT systems, such as Linux, Kubernetes, and AWS.

Our tests are ‘ChatGPT-proof’ and ‘ungoogleable’, ensuring we’ll qualify only the best talent. Get started with a free demo of Brokee and start saving money while hiring with confidence.