The Database teams are responsible for the performance, integrity, and security of Bolt's databases. They're also involved in planning, developing, and troubleshooting them. The databases they manage are some of the largest in the northern hemisphere.
Everything in Bolt is big, and this goes for databases as well. MySQL is a fantastic technology, but in some cases we need something bigger. TitaniumDB is a next-generation database engine, and we operate, manage, and develop this new technology.
Running MySQL at Bolt's scale has its challenges. It requires deep expertise and nerves of steel, because operating at the heart of Bolt's backend services is nothing less than high-impact work. We ensure that everything works fast: queries have low latency, capacity is there when needed, and nothing piles up in the databases. We invest in good visibility into the technical platform so we can solve problems before they occur.
The Site Reliability and Database teams' mission is to keep Bolt services up and running. We're closest to the infrastructure and maintain all Cloud services: servers, Kubernetes clusters, AWS managed and unmanaged services, etc. Additionally, we manage Bolt's databases, which are some of the largest in the northern hemisphere.
We are an SRE team dedicated to managing Bolt's stateful services. We take care of several Elasticsearch, Redis, Kafka, and graph database setups. We also run systems that collect and visualise metrics and generate alerts. We have live systems that store hundreds of TB and handle millions of requests per second. We use tools like Prometheus, Thanos, Grafana, Kubernetes, AWS EC2, Terraform, and Ansible.
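Much of the alerting described above boils down to evaluating rate thresholds over counter metrics, in the spirit of a Prometheus rule like `rate(http_requests_total[1m]) > threshold`. Here is a minimal plain-Python sketch of that idea; the sample data and threshold are illustrative, not Bolt's actual rules:

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a monotonically increasing counter.

    `samples` is a list of (timestamp, value) pairs, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)

def should_alert(samples: list[tuple[float, float]], threshold_rps: float) -> bool:
    """Fire when the request rate exceeds the configured threshold."""
    return counter_rate(samples) > threshold_rps

# Counter grew by 120,000 requests over 60 seconds -> 2,000 req/s.
samples = [(0.0, 1_000_000.0), (60.0, 1_120_000.0)]
```

In a real deployment, the equivalent logic lives in the monitoring stack itself (e.g. a Prometheus recording/alerting rule evaluated by Thanos Ruler), not in application code.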
We govern any cloud infrastructure Bolt uses or plans to use in the future. Our responsibility is to understand its possibilities, limitations, and operational-excellence needs. It also means being a cloud engineer with a magnifying glass, ready to dive deep into specific services. We also maintain and develop internal IaC (infrastructure-as-code) practices and validate compliance with different requirements.
We’ve been tasked with guarding the stateless infrastructure, using technologies like Docker, Kubernetes, Terraform, Ansible, Envoy, and Nginx to maintain and ensure scalability for the thousands of microservices running on it. With all of our infrastructure in the cloud and defined as code, we can focus on automation and on projects that operate at petabyte scale across both our network and our storage. We have the power to steer the architecture, but also the responsibility to ensure our services keep working at all times.
The Backend Platform team supports Bolt's software engineers in their day-to-day routines across multiple areas, including CI/CD pipelines, quality assurance through test automation, and code quality through linters.
The UI Platform team is responsible for developing and maintaining solutions that enable engineering teams at Bolt to build complex user interfaces. Our work spans several platforms, such as the Web, React Native, and Node.js, as well as different parts of a diverse technical stack. We regularly interact with a few dozen engineering teams, and this number is constantly growing.
The Data Engineering teams develop the Data Platform, an internal product to collect, access, process, and store huge amounts of data from different sources. In addition, we build the Data Science Platform to optimise data storage and pre-processing and to improve the lifecycle of ML models. Our mission is to maintain the highest possible data quality while balancing data availability, lateness, completeness, and infrastructure costs.
Our team democratises the use of data and machine learning by building a platform that solves common challenges of production ML systems. We build infrastructure for training, testing, serving, and monitoring models at scale. We also provide the AB Testing Platform, a scalable and extensible way to run AB tests, so that other teams can define metrics computed for tests across all business verticals. The main programming languages are Python and TypeScript (Node.js), and the team uses AWS, Airflow, Docker, and Spark, among other technologies.
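At the heart of an AB test is a statistical comparison of a metric between variants. The sketch below shows one common approach, a two-proportion z-test on conversion rates; the function names and numbers are illustrative and not tied to the platform's actual metric definitions:

```python
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z statistic for the difference in conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def p_value(z: float) -> float:
    """Two-sided p-value from the standard normal CDF."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
```

A platform would compute such statistics for every metric a team defines, per experiment and per business vertical, typically in batch jobs rather than on the fly.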
We provide a platform for accessing data insights. We're responsible for the technical aspects of our data views: integrating Looker with the Data Warehouse, defining and maintaining incremental data views (using SQL and Python), and building data access management solutions. We manage integrations with third-party data providers and build internal tools for Data Analysts.
The Data Transformation team works with AWS services (Redshift, S3, Batch, Spark — all managed using Terraform) building the Data Pipelines framework based on Airflow (Python, SQL) and Spark (Java, Scala, Python running on Databricks). We're building a platform that empowers other teams to build and run data processing pipelines.
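The core idea behind an Airflow-style pipeline is that tasks form a DAG and each task runs only after its upstream dependencies complete. This is a toy plain-Python sketch of that scheduling concept (using the standard library's `graphlib`), not Airflow's actual API:

```python
from graphlib import TopologicalSorter

def run_pipeline(tasks: dict, deps: dict) -> list:
    """Execute callables in dependency order; return the execution order.

    `tasks` maps task name -> callable, `deps` maps name -> set of
    upstream task names that must run first.
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order

# A minimal extract -> transform -> load chain (names are illustrative).
executed = []
tasks = {
    "extract": lambda: executed.append("extract"),
    "transform": lambda: executed.append("transform"),
    "load": lambda: executed.append("load"),
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
```

A real framework adds scheduling, retries, backfills, and distributed execution on top of this ordering logic; the DAG-of-tasks model itself is the part other teams author.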
The Data Lake team develops the infrastructure for collecting, storing, and querying data from different sources. This mostly involves working with AWS services (Redshift and S3, managed using Terraform) and PrestoDB, and tuning our Kafka-based streaming system: writing stream processors with KSQL and Java. The main responsibilities include developing, optimising, and improving the Data Lake ingestion pipelines.
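A typical job for such a stream processor is a windowed aggregation, e.g. counting events per key in fixed (tumbling) time windows, which in production would be expressed in KSQL or Java on Kafka. Here is the underlying idea in plain Python, with an illustrative event shape of `(timestamp, key)`:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_sec: int) -> dict:
    """Count events per (key, window_start) over (timestamp, key) tuples.

    Each event falls into exactly one window of `window_sec` seconds,
    aligned to multiples of the window size.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_sec) * window_sec
        counts[(key, window_start)] += 1
    return dict(counts)

# Events at t=0, 30, 65 for "ride" and t=10 for "food", with 60s windows.
events = [(0, "ride"), (30, "ride"), (65, "ride"), (10, "food")]
```

Unlike this batch sketch, a real stream processor maintains this state incrementally as events arrive and must also handle out-of-order and late data, which is where much of the tuning effort goes.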