Introduction
Observability is the backbone of any modern infrastructure, enabling organizations to monitor system health, optimize performance, and ensure seamless operations. However, when legacy observability systems reach their limits—whether due to scalability challenges, high costs, or lack of vendor support—businesses must pivot to more future-proof solutions.
One such case was a leading communication company that approached us to revamp its aging observability stack. Their existing system was built on InfluxDB v1, the TICK stack, and Grafana 8, all of which had reached end-of-life, posing risks to reliability and long-term maintenance. They needed a modern, scalable, and cost-effective alternative, one that could support multi-tenancy and efficiently handle years of historical data. After evaluating several options, they identified Grafana Mimir as the ideal choice, a solution already proven at scale by organizations like CERN.
The migration presented a unique set of challenges: handling seven years of monitoring data (roughly 100TB uncompressed), the lack of any direct migration path from InfluxDB to Grafana Mimir, and rewriting hundreds of Grafana dashboards from InfluxQL to PromQL. In this blog, we'll walk through the entire migration process, the challenges faced, and the architectural choices that enabled a seamless transition to Grafana Mimir.
Open source leading the 2025 Observability Trends
In 2025, the observability landscape is undergoing significant transformations, driven by the complexities of modern, distributed systems and the need for cost-effective, scalable solutions. Traditional monitoring tools like Prometheus and proprietary systems such as InfluxDB have been instrumental in system monitoring. However, they present challenges: Prometheus can struggle with scalability and multi-tenancy, while proprietary solutions often lead to vendor lock-in and escalating costs.
In response to these challenges, organizations are increasingly adopting open-source solutions like Grafana Mimir. Mimir offers a scalable, multi-tenant metrics storage system that integrates seamlessly with existing monitoring setups, providing enhanced performance and flexibility. Complementing this trend is the widespread adoption of OpenTelemetry, which has become a standard for instrumenting and collecting telemetry data, enabling interoperability across diverse observability tools.
According to Grafana Labs' 2025 Observability Survey, there is a notable convergence of profiling and tracing, allowing for deeper insights into application performance. Additionally, the integration of AI/ML is enhancing root cause analysis and predictive analytics, augmenting engineers' capabilities rather than replacing them. These developments underscore a shift towards open-source, scalable, and intelligent observability solutions that address the limitations of traditional monitoring tools.
Grafana Mimir
Grafana Mimir is an open-source, horizontally scalable, and highly available time-series database designed to provide long-term storage for Prometheus metrics. Developed by Grafana Labs and introduced in 2022, Mimir addresses the scalability and high-availability challenges associated with large-scale metric monitoring. It enables organizations to scale their metrics infrastructure to over 1 billion active series, ensuring robust performance and reliability.
One of Mimir's key features is its seamless compatibility with Prometheus, including support for remote write, PromQL, and alerting functionalities. This ensures that organizations can integrate Mimir into their existing Prometheus-based monitoring setups without significant modifications. Additionally, Mimir offers multi-tenancy support, allowing multiple teams or business units to share the same infrastructure while maintaining data isolation. Its architecture leverages object storage solutions like AWS S3, Google Cloud Storage, and Azure Blob Storage for durable and cost-effective long-term data retention.
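In practice, this Prometheus compatibility means pointing an existing Prometheus at Mimir is typically a one-block configuration change. A minimal sketch, with a hypothetical endpoint and tenant name (the `X-Scope-OrgID` header is how Mimir identifies tenants when multi-tenancy is enabled):

```yaml
# prometheus.yml — remote_write to Mimir (endpoint and tenant are illustrative)
remote_write:
  - url: http://mimir-gateway/api/v1/push
    headers:
      # Mimir derives the tenant from this header when multi-tenancy is enabled
      X-Scope-OrgID: team-a
```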
Furthermore, Mimir's horizontally scalable, microservices-based architecture allows organizations to expand their monitoring capabilities by simply adding more instances to the cluster. This design not only simplifies deployment and maintenance but also ensures high availability through data replication, safeguarding against data loss in case of machine failures.
Mimir architecture
Grafana Mimir offers flexible deployment modes to accommodate various operational needs. In monolithic mode, all components run within a single process, simplifying initial setup and development environments. For production scenarios requiring scalability and fault isolation, the microservices mode allows each component to operate as an independent service, enabling granular scaling and resilience.

Central to Mimir's architecture is the clear separation of the write path and read path, enhancing performance and scalability. The write path involves components like the distributor and ingester, which handle incoming metrics and ensure efficient data storage. Conversely, the read path comprises components such as the query-frontend and querier, dedicated to processing and executing queries. This separation ensures that heavy read operations do not impact data ingestion, maintaining system efficiency.
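As a rough sketch of how the two modes are selected in practice, Mimir chooses its deployment mode through the `target` option (the values below are illustrative):

```yaml
# mimir.yaml — deployment mode is selected via the `target` option
# Monolithic mode: run every component inside one process
target: all

# Microservices mode instead runs one component per process, e.g. one
# instance with `target: distributor`, another with `target: ingester`,
# others with `target: querier`, `target: query-frontend`, and so on.
```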
Migration from InfluxDB to Grafana Mimir
Migrating from InfluxDB to Mimir is not straightforward because the on-disk data formats differ. The InfluxDB v1 storage engine stores data in the Time-Structured Merge tree (TSM) format, which achieves a very high compression ratio and supports up to nanosecond timestamp precision. Mimir stores data in the Prometheus TSDB block format, which uses millisecond precision, lower than Influx's.
Process
The migration involved processing approximately 100TB (uncompressed) of historical data. We utilized two virtual machines (VMs) with 64 vCPUs and 128GB RAM to handle this large-scale migration efficiently. These VMs ran custom scripts specifically designed to facilitate the migration process.
Since InfluxDB stores data in a compressed line format using gzip by default, direct exports were not feasible. To overcome this, we migrated the data in monthly chunks. Some months contained up to 250GB of compressed data, making the process resource-intensive.
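The monthly chunking can be sketched roughly as follows. This is an illustrative Python sketch, not the actual migration script: it builds per-month `influx_inspect export` invocations (InfluxDB v1's offline export tool, whose `-compress` flag gzips the output); the data/WAL paths and database name are assumptions.

```python
def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (start, end) RFC3339 timestamps covering each calendar month."""
    y, m = start_year, start_month
    while (y, m) <= (end_year, end_month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield (f"{y:04d}-{m:02d}-01T00:00:00Z", f"{ny:04d}-{nm:02d}-01T00:00:00Z")
        y, m = ny, nm

def export_command(database, start, end, out_path):
    """Build one influx_inspect invocation for a single monthly chunk."""
    return [
        "influx_inspect", "export",
        "-datadir", "/var/lib/influxdb/data",   # assumed default path
        "-waldir", "/var/lib/influxdb/wal",     # assumed default path
        "-database", database,
        "-start", start, "-end", end,
        "-compress",                            # gzip the exported line protocol
        "-out", out_path,
    ]

# Example: first monthly chunk of 2018 for the telegraf database
ranges = list(month_ranges(2018, 1, 2018, 3))
cmd = export_command("telegraf", *ranges[0], "telegraf-2018-01.gz")
# subprocess.run(cmd, check=True)  # then upload the resulting file to S3
```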
Additionally, due to the extensive retention policy, the disk usage increased linearly over time, leading to rising storage costs—one of the key reasons for initiating this migration.
To streamline the transition, we divided the migration into two phases:
- Migrating all InfluxDB data to Mimir
- Migrating existing InfluxDB Grafana dashboards (InfluxQL) to Prometheus-compatible (PromQL) dashboards
Let’s see these steps in detail.
Exporting data from InfluxDB
The InfluxDB instance contained seven years' worth of data collected from hundreds of Telegraf instances. Since this was a single InfluxDB instance, we developed a custom script to process the data. This script segmented the data by database and month, breaking it into manageable chunks before exporting it to an S3 object store. These chunks were later split into even smaller chunks by mimircli, the internal tool we built for the migration.
In this process, we divided the Telegraf database into multiple chunks, first by year and then by month. The exported data was systematically pushed to S3 in a structured format, ensuring easy retrieval. Given the sheer volume of data and the fact that only a single instance was handling the export, while also dealing with compressed files, this process was computationally intensive and required concurrent processing.
Once the data was exported to S3, it was processed using our internal tool, mimircli. Let’s take a closer look at how this tool works.
Mimircli
To accelerate the migration process, we experimented with different methods for writing InfluxDB data to Mimir. One approach we tested was Mimir’s backfill option, which uses remote write to ingest historical data. However, this method proved to be extremely slow, processing only 800MB in 45 minutes, and also resulted in some data being dropped by the ingester, making it an unsuitable option.
Since Mimir shares architectural similarities with Thanos and leverages the Thanos store-gateway, we considered directly generating TSDB blocks for ingestion. However, we encountered a major challenge: the exported data from InfluxDB was in its native format and stored in gzip-compressed form. In some cases, the actual uncompressed data size was 70-90 times larger than the compressed version, making direct processing impractical. To tackle this, we initiated an internal proof of concept (PoC) to validate our approach. We selected a small database with a compressed size of 1.1GB, which expanded to 98GB when exported. To manage this efficiently, we divided it into 9.8GB chunks for further processing.
Next, we converted these data chunks from InfluxDB's line protocol format to an OpenMetrics-compatible format, which aligns with Mimir's expected data structure. Below is an example of this conversion for those unfamiliar with these formats. InfluxDB uses a line-based format, while Mimir's ingestion path works with the OpenMetrics exposition format.
Line protocol example

```bash
myMeasurement,tag1=value1,tag2=value2 fieldKey="fieldValue" 1556813561098000000
```

OpenMetrics example

```bash
my_measurement{tag1="value1",tag2="value2"} 1 1556813561098
```
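A minimal, illustrative converter for this mapping might look like the following Python sketch. This is not mimircli itself: it assumes unescaped tag values, skips string fields (which have no Prometheus equivalent), and emits millisecond timestamps to match the example above (strict OpenMetrics backfill input uses seconds).

```python
import re

def line_to_openmetrics(line):
    """Convert one InfluxDB line-protocol point into OpenMetrics-style samples.

    Simplified sketch: no support for escaped commas/spaces in tag values,
    and string fields are dropped since they cannot become numeric samples.
    """
    body, ts_ns = line.rsplit(" ", 1)
    ident, fields = body.split(" ", 1)
    parts = ident.split(",")
    measurement, tags = parts[0], parts[1:]
    labels = ",".join(f'{k}="{v}"' for k, v in (t.split("=", 1) for t in tags))
    ts_ms = int(ts_ns) // 1_000_000  # ns -> ms, matching the example above
    samples = []
    for field in fields.split(","):
        key, value = field.split("=", 1)
        value = value.rstrip("i")       # strip Influx's integer suffix
        if value.startswith('"'):       # string field: skip, no numeric mapping
            continue
        # sanitize into a valid metric name: <measurement>_<field>
        name = re.sub(r"[^a-zA-Z0-9_:]", "_", f"{measurement}_{key}")
        samples.append(f"{name}{{{labels}}} {value} {ts_ms}")
    return samples

print(line_to_openmetrics(
    "cpu,host=web01,region=eu usage_idle=98.2,uptime=4123i 1556813561098000000"
))
```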
The first challenge we faced was converting metric types and then generating TSDB blocks from the data. Generating TSDB blocks from the entire dataset in a single pass would have consumed a significant amount of disk space, so we opted to create TSDB blocks covering a 168-hour (one-week) window to optimize disk usage. For this, we leveraged promtool, an open-source utility from Prometheus that facilitates TSDB block generation. However, the 168-hour block size introduced some issues of its own, which we will discuss later. This is the core of how mimircli functions internally. The initial approach was tested on a small dataset (98GB), but for a larger-scale migration we had to refine mimircli further.
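For reference, the promtool backfill step looks roughly like this. The wrapper function and file names are hypothetical; the relevant pieces are promtool's `tsdb create-blocks-from openmetrics` subcommand and its `--max-block-duration` flag, which caps block size (168h here, per the approach above).

```python
def promtool_backfill_command(input_file, output_dir, max_block="168h"):
    """Build a promtool invocation that turns an OpenMetrics file into
    TSDB blocks, capping each block at max_block of wall-clock time."""
    return [
        "promtool", "tsdb", "create-blocks-from", "openmetrics",
        "--max-block-duration", max_block,
        input_file, output_dir,
    ]

cmd = promtool_backfill_command("chunk_00001.openmetrics", "blocks/")
# subprocess.run(cmd, check=True)  # blocks/ can then be uploaded to S3
```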
Until now, mimircli was capable of generating 168-hour TSDB blocks but was doing so from uncompressed data. Since we couldn't store the entire uncompressed dataset on disk or process compressed files directly, we had to decompress the data in memory. From there, we created chunks of 100,000 metrics each and processed these chunks to generate TSDB blocks. The processed data was then recompressed using gzip to minimize storage requirements.
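The in-memory re-chunking step can be sketched as follows: a simplified Python illustration (file layout and naming are assumptions) that streams the gzip so the full uncompressed dataset never has to sit on disk at once.

```python
import gzip
import os

def rechunk_gzip(src_path, out_dir, lines_per_chunk=100_000):
    """Stream a gzipped export and rewrite it as gzipped chunks of N lines.

    Decompression happens line by line in memory; each chunk is recompressed
    immediately, keeping peak disk usage close to the compressed size.
    """
    os.makedirs(out_dir, exist_ok=True)
    written, buf, idx = [], [], 0
    with gzip.open(src_path, "rt") as src:
        for line in src:
            buf.append(line)
            if len(buf) == lines_per_chunk:
                written.append(_flush(buf, out_dir, idx))
                buf, idx = [], idx + 1
    if buf:  # final partial chunk
        written.append(_flush(buf, out_dir, idx))
    return written

def _flush(lines, out_dir, idx):
    path = os.path.join(out_dir, f"chunk_{idx:05d}.gz")
    with gzip.open(path, "wt") as out:
        out.writelines(lines)
    return path
```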
To put things into perspective, one month's worth of data produced 60,000 to 70,000 chunks, meaning a full year required processing approximately 720,000 chunks. This migration demanded significant compute resources, so we utilized two 32 vCPU, 64GB RAM virtual machines (VMs) to handle the workload efficiently; to speed up the process further, more compute can be added.
To recap before looking at how mimircli works: a custom script pulled data from Influx and pushed it to S3 in compressed format, and mimircli then consumed those files.
How mimircli helps in the migration
- mimircli downloads the compressed data from S3
- It converts the Influx data to OpenMetrics format in memory; this step is memory-intensive because it involves decompression and recompression
- It validates the migration:
  - If the migration is successful, the chunk is deleted from the disk
  - If an error occurs, the chunk is marked as failed and uploaded back to S3 under the path failed/<database-name>/2024-month/chunk_50284.gz
- It uploads completed TSDB blocks to S3 under the anonymous tenant path for Mimir
```bash
➜ mimircli openmetrics --help
Convert inline protocol to OpenMetrics format

Usage:
  mimircli openmetrics [flags]

Flags:
  -s, --chunk int            Chunk size for processing (default 1000000)
  -c, --compressed           Input files are gzip compressed
  -d, --dir string           Directory with input chunks (default "chunks")
  -a, --dirout string        Output directory for TSDB (default "chunks-dir")
  -h, --help                 help for openmetrics
  -m, --max-concurrent int   Maximum concurrent file processing
  -l, --memory-limit float   Maximum memory usage percentage (0.0-1.0) (default 0.7)
  -o, --out string           Output filename (default "openmetrics.txt")
  -t, --telegraf             Influx export of telegraf data is a bit different use this option when parsing telegraf data
```
These are the steps we followed each month. Up to this point, a separate process on the Influx server handled the export, while mimircli ran on two different hosts. We were using a single-tenant architecture, meaning all generated TSDB blocks were pushed under the anonymous user at the /anonymous path in S3. This caused no issues initially, but that changed later.
Grafana dashboard migration
Over time, the client had built hundreds of Grafana dashboards with thousands of panels, all relying on InfluxQL to query the InfluxDB backend. Migrating these dashboards to PromQL without disrupting operations was critical.
To ensure a seamless transition, we implemented a dual-write approach, sending data to both InfluxDB and Grafana Mimir during the migration. This allowed us to maintain two live environments for testing and validation. To automate the dashboard conversion, we initially tried open-source InfluxQL-to-PromQL tooling from Logz.io and Aiven, but it didn't meet our requirements, so we forked and enhanced the tool to suit our specific needs.
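To give a flavor of what such a converter does, here is a deliberately tiny, illustrative rewrite for a single query shape. The real tool needs a proper InfluxQL parser, and the `<measurement>_<field>` metric naming convention shown is an assumption carried over from the data migration.

```python
import re

# Handles exactly one pattern: SELECT mean("field") FROM "measurement"
# ... GROUP BY time(interval) — purely to illustrate the mapping.
PATTERN = re.compile(
    r'SELECT mean\("(?P<field>\w+)"\) FROM "(?P<measurement>\w+)"'
    r'.*GROUP BY time\((?P<interval>\w+)\)'
)

def influxql_to_promql(query):
    m = PATTERN.search(query)
    if not m:
        raise ValueError("unsupported query shape")
    metric = f"{m['measurement']}_{m['field']}"
    # mean(...) GROUP BY time(x) roughly maps to avg_over_time over x
    return f"avg_over_time({metric}[{m['interval']}])"

print(influxql_to_promql(
    'SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1h GROUP BY time(1m)'
))
```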
Conclusion
Migrating from a legacy observability stack like InfluxDB v1 and the TICK stack to Grafana Mimir was not just a technological upgrade; it was a strategic move toward scalability, performance, and future-proofing. The transition from a compression-based storage architecture to a compaction-based one came with its own set of challenges, especially around data compatibility, performance tuning, and dashboard migration.
By investing in custom tooling like mimircli, and by taking an incremental, chunked approach to data transformation, we were able to migrate over 100TB of historical metrics with precision and reliability. We tested multiple strategies, from remote write to native TSDB block generation, before settling on a hybrid pipeline that balanced resource usage, accuracy, and speed.
Beyond the migration itself, this journey gave us a deeper appreciation for the nuances of multi-tenancy, the importance of clean and maintainable observability data, and the benefits of aligning with open standards like OpenMetrics and PromQL.
We hope this blog not only demystifies the migration process but also acts as a guide for those considering a similar path. If you’re looking to move away from InfluxDB or other proprietary systems to a modern, scalable solution like Grafana Mimir, we’d love to share our tools, insights, and learnings to help you get there faster.
Need help or want to share your experience? Reach out to us. We’re always up for a good observability chat.