Migrate from microservices to read-write mode without downtime

Warning

Read-write mode, and migrating between modes, are experimental features.

At a high level, the steps involved are:

  1. Deploy the read-write components alongside the microservices; they all join the same rings.
  2. Switch the endpoints in your ingress.
  3. Decommission the microservices.

To migrate from read-write mode back to microservices, the steps can be applied in reverse.

Step 1: Prerequisites: configure zone-awareness

Read-write mode requires that you enable multi-zone ingesters and store-gateways:

jsonnet
{
  _config+:: {
    multi_zone_ingester_enabled: true,
    multi_zone_store_gateway_enabled: true,
  },
}

If you are currently using ruler remote evaluation with microservices, you will also need to disable it, but this happens at a later point in the migration.

Step 2: Deploy the read-write components with 0 replicas

With deployment_mode set to migration, jsonnet configures components for both read-write and microservices modes. To begin with, we want the read-write replicas to be 0:

jsonnet
{
  _config+:: {
    deployment_mode: 'migration',

    mimir_write_replicas: 0,
    mimir_read_replicas: 0,
    mimir_backend_replicas: 0,
    autoscaling_mimir_read_enabled: false,
  },
}

Optional: double-check the configuration

At this point, you can optionally compare the configuration of components between the microservices and read-write modes.

For example:

bash
# Export all of the Kubernetes objects to yaml:

kubectl get -o yaml deployment distributor > distributor.yaml; yq eval -i '.spec' distributor.yaml
kubectl get -o yaml deployment overrides-exporter > overrides-exporter.yaml; yq eval -i '.spec' overrides-exporter.yaml
kubectl get -o yaml deployment querier > querier.yaml; yq eval -i '.spec' querier.yaml
kubectl get -o yaml deployment query-frontend > query-frontend.yaml; yq eval -i '.spec' query-frontend.yaml
kubectl get -o yaml deployment query-scheduler > query-scheduler.yaml; yq eval -i '.spec' query-scheduler.yaml
kubectl get -o yaml deployment ruler > ruler.yaml; yq eval -i '.spec' ruler.yaml
kubectl get -o yaml deployment ruler-querier > ruler-querier.yaml; yq eval -i '.spec' ruler-querier.yaml
kubectl get -o yaml deployment ruler-query-frontend > ruler-query-frontend.yaml; yq eval -i '.spec' ruler-query-frontend.yaml
kubectl get -o yaml deployment ruler-query-scheduler > ruler-query-scheduler.yaml; yq eval -i '.spec' ruler-query-scheduler.yaml
kubectl get -o yaml deployment mimir-read > mimir-read.yaml; yq eval -i '.spec' mimir-read.yaml
kubectl get -o yaml statefulset compactor > compactor.yaml; yq eval -i '.spec' compactor.yaml
kubectl get -o yaml statefulset ingester-zone-a > ingester-zone-a.yaml; yq eval -i '.spec' ingester-zone-a.yaml
kubectl get -o yaml statefulset ingester-zone-b > ingester-zone-b.yaml; yq eval -i '.spec' ingester-zone-b.yaml
kubectl get -o yaml statefulset ingester-zone-c > ingester-zone-c.yaml; yq eval -i '.spec' ingester-zone-c.yaml
kubectl get -o yaml statefulset store-gateway-zone-a > store-gateway-zone-a.yaml; yq eval -i '.spec' store-gateway-zone-a.yaml
kubectl get -o yaml statefulset store-gateway-zone-b > store-gateway-zone-b.yaml; yq eval -i '.spec' store-gateway-zone-b.yaml
kubectl get -o yaml statefulset store-gateway-zone-c > store-gateway-zone-c.yaml; yq eval -i '.spec' store-gateway-zone-c.yaml
kubectl get -o yaml statefulset mimir-write-zone-a > mimir-write-zone-a.yaml; yq eval -i '.spec' mimir-write-zone-a.yaml
kubectl get -o yaml statefulset mimir-write-zone-b > mimir-write-zone-b.yaml; yq eval -i '.spec' mimir-write-zone-b.yaml
kubectl get -o yaml statefulset mimir-write-zone-c > mimir-write-zone-c.yaml; yq eval -i '.spec' mimir-write-zone-c.yaml
kubectl get -o yaml statefulset mimir-backend-zone-a > mimir-backend-zone-a.yaml; yq eval -i '.spec' mimir-backend-zone-a.yaml
kubectl get -o yaml statefulset mimir-backend-zone-b > mimir-backend-zone-b.yaml; yq eval -i '.spec' mimir-backend-zone-b.yaml
kubectl get -o yaml statefulset mimir-backend-zone-c > mimir-backend-zone-c.yaml; yq eval -i '.spec' mimir-backend-zone-c.yaml

# Diff deployments and statefulsets:

## Write
diff --color=always distributor.yaml mimir-write-zone-a.yaml
diff --color=always ingester-zone-a.yaml mimir-write-zone-a.yaml
diff --color=always ingester-zone-b.yaml mimir-write-zone-b.yaml
diff --color=always ingester-zone-c.yaml mimir-write-zone-c.yaml

## Read
diff --color=always query-frontend.yaml mimir-read.yaml
diff --color=always querier.yaml mimir-read.yaml
diff --color=always ruler-query-frontend.yaml mimir-read.yaml
diff --color=always ruler-querier.yaml mimir-read.yaml

## Backend
diff --color=always overrides-exporter.yaml mimir-backend-zone-a.yaml
diff --color=always query-scheduler.yaml mimir-backend-zone-a.yaml
diff --color=always ruler-query-scheduler.yaml mimir-backend-zone-a.yaml
diff --color=always ruler.yaml mimir-backend-zone-a.yaml
diff --color=always compactor.yaml mimir-backend-zone-a.yaml
diff --color=always store-gateway-zone-a.yaml mimir-backend-zone-a.yaml
diff --color=always store-gateway-zone-b.yaml mimir-backend-zone-b.yaml
diff --color=always store-gateway-zone-c.yaml mimir-backend-zone-c.yaml

Step 3: Migrate the read path to the read-write service

Step 3.1: Scale up the read component

Scale up the Mimir read component, either with autoscaling or by setting the replicas explicitly. (Keep the replicas or autoscaling levels of your current microservices unchanged.) To set the replicas explicitly:

jsonnet
{
  _config+:: {
    deployment_mode: 'migration',

    mimir_write_replicas: 0,
    mimir_read_replicas: 3,
    mimir_backend_replicas: 0,
    autoscaling_mimir_read_enabled: false,
  },
}
Or, with autoscaling enabled:

jsonnet
{
  _config+:: {
    deployment_mode: 'migration',

    mimir_write_replicas: 0,
    mimir_backend_replicas: 0,
    autoscaling_mimir_read_enabled: true,
    autoscaling_mimir_read_min_replicas: 3,
    autoscaling_mimir_read_max_replicas: 30,
  },
}

At this point, the read-write queriers will start running queries from the query-scheduler, since they share the same ring.
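
If you want to confirm this, the query-scheduler's ring page lists every registered instance. A minimal sketch, assuming the default jsonnet HTTP port of 8080; the service name and port may differ in your environment:

bash
# Forward the query-scheduler admin port and view its ring page.
kubectl port-forward svc/query-scheduler 8080:8080 &
# The page lists all instances registered in the query-scheduler ring.
curl -s http://localhost:8080/query-scheduler/ring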

Step 3.2: Check that mimir-read works

Perform test queries by port-forwarding to mimir-read.
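
For example, a minimal smoke test, assuming the default HTTP port of 8080 and a placeholder tenant ID of demo (adapt both to your environment):

bash
# Forward the mimir-read HTTP port locally.
kubectl port-forward svc/mimir-read 8080:8080 &
# Run an instant query through the Prometheus-compatible query API.
curl -s -H 'X-Scope-OrgID: demo' \
  'http://localhost:8080/prometheus/api/v1/query?query=up'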

Make sure that mimir-read is running queries:

sum by (pod) (rate(cortex_querier_request_duration_seconds_count{job=~".*mimir-read.*", route=~"(prometheus|api_prom)_api_v1_.+"}[1m]))

Step 3.3: Route traffic to mimir-read

Configure your load balancer to route read requests to mimir-read.
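
How you do this depends entirely on your ingress. As one hypothetical sketch, if reads were routed through a Kubernetes Ingress named mimir-gateway (a placeholder name) whose first rule and path pointed at query-frontend, a patch could switch the backend service:

bash
# Hypothetical: repoint the first rule/path of an Ingress named "mimir-gateway"
# from query-frontend to mimir-read. Adjust the JSON path to your Ingress spec.
kubectl patch ingress mimir-gateway --type=json -p='[
  {"op": "replace",
   "path": "/spec/rules/0/http/paths/0/backend/service/name",
   "value": "mimir-read"}
]'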

Make sure that the query-frontend microservice is no longer receiving requests:

sum by(pod) (rate(cortex_query_frontend_queries_total{pod!~"ruler-query-frontend.*"}[1m]))

Step 4: Migrate the backend components to the backend service

Step 4.1: Scale up the backend component

Scale up the Mimir backend component:

jsonnet
{
  _config+:: {
    mimir_backend_replicas: 3,
  },
}

Step 4.2: Check that mimir-backend works

Check the following rings:

  • The query-scheduler ring should contain both the microservices and the read-write components.
  • The store-gateway ring should contain both the microservices and the read-write components.
  • The compactor ring should contain both the microservices and the read-write components.
  • The ruler ring should contain both the microservices and the read-write components.

Run some test queries against long-term data.
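
One way to inspect the rings listed above is through the admin pages of a mimir-backend pod. A sketch, assuming the default HTTP port of 8080:

bash
# Forward the admin port of one backend pod.
kubectl port-forward pod/mimir-backend-zone-a-0 8080:8080 &
# Each component exposes its ring status on a standard page.
curl -s http://localhost:8080/query-scheduler/ring
curl -s http://localhost:8080/store-gateway/ring
curl -s http://localhost:8080/compactor/ring
curl -s http://localhost:8080/ruler/ring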

Step 4.3: Route traffic to mimir-backend

Configure your load balancer to route the compactor and ruler endpoints to mimir-backend.
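
After switching, you can verify that rule-management requests still succeed through your gateway. A hedged check, with a placeholder gateway URL and tenant ID:

bash
# Placeholder URL and tenant: adjust both to your environment.
# A 200 response with a rule-group listing means the ruler path works.
curl -s -H 'X-Scope-OrgID: demo' \
  'http://mimir-gateway.example.com/prometheus/api/v1/rules'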

Step 5: Scale down the microservices read and backend components

Now that mimir-read and mimir-backend are scaled up and receiving traffic, we can safely decommission the microservices on those paths.

First, configure the microservices store-gateways to leave the ring:

jsonnet
{
  // Configure microservices store-gateway to leave the ring.
  store_gateway_args+:: {
    'store-gateway.sharding-ring.unregister-on-shutdown': true,
  },

  mimir_backend_args+:: {
    'store-gateway.sharding-ring.unregister-on-shutdown': false,
  },
}

Then scale down the replicas of all of the components:

jsonnet
{
  _config+:: {
    multi_zone_store_gateway_replicas: 0,
    autoscaling_querier_enabled: false,
  },

  query_frontend_deployment+:
    deployment.mixin.spec.withReplicas(0),

  query_scheduler_deployment+:
    deployment.mixin.spec.withReplicas(0),

  querier_deployment+:
    deployment.mixin.spec.withReplicas(0),

  ruler_deployment+:
    deployment.mixin.spec.withReplicas(0),

  overrides_exporter_deployment+:
    deployment.mixin.spec.withReplicas(0),

  compactor_statefulset+:
    statefulSet.mixin.spec.withReplicas(0),
}

Make sure that the backend components (query-scheduler, compactor, ruler, and store-gateway) have correctly left their respective rings.
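
As a rough sketch of how to spot leftovers, reuse the port-forward to a mimir-backend pod from step 4.2 and grep each ring page for instance IDs of the old microservices; empty output for a ring means its members have unregistered cleanly (the exact page markup may vary across versions):

bash
# Look for microservice instance IDs that should no longer be registered.
for ring in query-scheduler store-gateway compactor ruler; do
  echo "== ${ring} ring =="
  curl -s "http://localhost:8080/${ring}/ring" |
    grep -oE '(query-scheduler|store-gateway-zone|compactor|ruler)-[a-z0-9-]+'
done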

It is now safe to disable ruler remote evaluation. (This needs to happen after the microservices rulers have been scaled down, otherwise rule evaluations could fail.)

jsonnet
{
  _config+:: {
    ruler_remote_evaluation_enabled: false,
    autoscaling_ruler_querier_enabled: false,
  },
}

Step 6: Migrate the write path to the read-write deployment

Step 6.1: Scale up the write component

Scale up mimir-write:

jsonnet
{
  _config+:: {
    mimir_write_replicas: 3,
  },
}

Step 6.2: Route traffic to mimir-write

Configure your load balancer to route write requests to mimir-write.

Make sure that the microservices distributors are no longer receiving write requests:

sum by (job) (rate(cortex_request_duration_seconds_count{job=~".*distributor.*", route=~"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics"}[1m]))
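
Conversely, the same metric filtered on mimir-write should now show a non-zero rate. As a sketch, here it is issued as an instant query with curl; the metrics-system URL and tenant ID are placeholders for whatever scrapes your Mimir pods:

bash
# Placeholder URL and tenant: adjust both to your environment.
curl -s -H 'X-Scope-OrgID: demo' --get \
  --data-urlencode 'query=sum by (job) (rate(cortex_request_duration_seconds_count{job=~".*mimir-write.*", route=~"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics"}[1m]))' \
  'http://metrics.example.com/prometheus/api/v1/query'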

Step 7: Scale down the write microservices

Step 7.1: Scale down the distributors

Set the distributor replicas to 0:

jsonnet
{
  distributor_deployment+:
    deployment.mixin.spec.withReplicas(0),
}

Wait for the next TSDB head compaction on the ingesters (2 hours).
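
To see whether a given ingester has cut its head block since writes stopped, you can inspect its own metrics. A sketch, assuming the default HTTP port of 8080; cortex_ingester_tsdb_compactions_total counts TSDB head compactions:

bash
# Forward one ingester and check its TSDB compaction counter; a value that has
# increased since the distributors were stopped indicates the head was compacted.
kubectl port-forward pod/ingester-zone-a-0 8080:8080 &
curl -s http://localhost:8080/metrics | grep cortex_ingester_tsdb_compactions_total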

Step 7.2: Scale down the ingesters

Warning

You must follow the procedure for shutting down ingesters to avoid data loss.

Follow the procedure in shutting down ingesters for ingester-zone-a.

Scale down the zone-a replicas (this can be done before the wait in step 4 of the shutdown procedure):

jsonnet
{
  ingester_zone_a_statefulset+:
    statefulSet.mixin.spec.withReplicas(0),
}

Wait the required time, as indicated in step 4 of the shutdown procedure.

Repeat the ingester shutdown and scale-down for zone-b and zone-c, waiting the required time between each zone.
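
Between zones, you can confirm that the scaled-down zone has disappeared from the ingester ring before touching the next one. A sketch, port-forwarding any still-running Mimir pod:

bash
# The ingester ring page lists all registered ingesters; a count of 0 for the
# scaled-down zone means it is safe to proceed to the next zone.
kubectl port-forward pod/mimir-write-zone-b-0 8080:8080 &
curl -s http://localhost:8080/ingester/ring | grep -c 'ingester-zone-a'  # expect 0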

Step 8: Final cleanup

With the migration complete, you can clean up the deployment and the jsonnet.

Changing the deployment_mode to its final state, read-write, removes all of the microservices Kubernetes objects:

jsonnet
{
  _config+:: {
    deployment_mode: 'read-write',
  },
}

Because objects such as query_frontend_deployment are no longer defined, you will also need to remove the scaling overrides we set for those components. Now is also a good time to remove any other leftover scaling or microservices configuration you may have set.

Finally, you can delete and free up any unused volumes left over from the microservices. For example, to get a list of unused PVCs:

bash
kubectl get pvc --no-headers | grep -E '(ingester|store-gateway|compactor)' | awk '{print $1}'
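
Once you have reviewed that list and are sure nothing on it is still needed, the same pipeline can feed the deletion:

bash
# Careful: this permanently deletes the listed PVCs (and, depending on the
# storage class reclaim policy, the underlying volumes).
kubectl get pvc --no-headers | grep -E '(ingester|store-gateway|compactor)' | awk '{print $1}' | xargs kubectl delete pvc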