Ceph Chinese Documentation

2022-10-07 16:17 Author: 提依拉

Introduction to Ceph

A Ceph storage cluster requires at least one Ceph Monitor, one Ceph Manager, and one Ceph OSD (Object Storage Daemon). A Ceph Metadata Server is also required when running Ceph File System clients.

Monitors: A Ceph Monitor (ceph-mon) maintains maps of the cluster state, including the monitor map, manager map, OSD map, MDS map, and CRUSH map. These maps are critical cluster state that Ceph daemons need in order to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
Managers: A Ceph Manager daemon (ceph-mgr) keeps track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host Python-based modules that manage and expose Ceph cluster information, including a web-based Ceph Dashboard and REST API. At least two managers are normally required for high availability.
Ceph OSDs: A Ceph OSD (object storage daemon, ceph-osd) stores data, handles data replication, recovery, and rebalancing, and provides some monitoring information to Ceph Monitors and Managers by checking other Ceph OSD daemons for a heartbeat. At least three Ceph OSDs are normally required for redundancy and high availability.
MDSs: A Ceph Metadata Server (MDS, ceph-mds) stores metadata on behalf of the Ceph File System (that is, Ceph Block Devices and Ceph Object Storage do not use MDS). Ceph Metadata Servers allow POSIX file system users to execute basic commands (such as ls and find) without placing an enormous burden on the Ceph storage cluster.

Ceph stores data as objects within logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group should contain the object, and further calculates which Ceph OSD daemon should store the placement group. The CRUSH algorithm enables the Ceph storage cluster to scale, rebalance, and recover dynamically.

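Hardware Recommendations

RAM

Generally, more RAM is better. Monitor/Manager nodes for a modest cluster can do well with 64GB; for a larger cluster with hundreds of OSDs, 128GB is a reasonable target. BlueStore OSDs have a memory target that defaults to 4GB. Allowing a prudent margin for the operating system and administrative tasks (such as monitoring and metrics), as well as for increased consumption during recovery, provisioning about 8GB per BlueStore OSD is advised.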
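The sizing guidance above can be turned into a quick estimate. The following is a minimal sketch, assuming the figure of roughly 8GB per BlueStore OSD given in this section; the function name osd_host_ram_gb and its optional per-OSD override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Rough RAM budget for an OSD host, using the ~8GB-per-BlueStore-OSD
# guideline from the text. osd_host_ram_gb is a hypothetical helper,
# not part of Ceph.
#   $1 = number of OSDs on the host
#   $2 = GB to budget per OSD (optional, defaults to 8)
osd_host_ram_gb() {
    osd_count=$1
    per_osd_gb=${2:-8}
    echo $((osd_count * per_osd_gb))
}

# Example: a host carrying 12 OSDs
echo "Provision about $(osd_host_ram_gb 12)GB of RAM for 12 OSDs"
```

The result covers the OSD daemons only; memory for any monitor, manager, or metadata daemons sharing the host would need to be added on top.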

Monitors and managers (ceph-mon and ceph-mgr)

Monitor and manager daemon memory usage generally scales with the size of the cluster. Note that at boot time and during topology changes and recovery, these daemons need more RAM than they do during steady-state operation, so plan for peak usage. For very small clusters, 32GB suffices. For clusters of up to 300 OSDs, 64GB is adequate.

Metadata servers (ceph-mds)

Metadata daemon memory utilization depends on how much memory its cache is configured to consume. We recommend 1GB as a minimum for most systems.

Memory

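BlueStore uses its own memory to cache data rather than relying on the operating system page cache. In BlueStore, you can adjust the amount of memory the OSD attempts to consume with the osd_memory_target configuration option.

Setting osd_memory_target below 2GB is typically not recommended (the OSD may fail to keep memory that low, and performance may also be extremely slow).
Setting the memory target between 2GB and 4GB typically works but may result in degraded performance: metadata may be read from disk during IO unless the active data set is relatively small.
4GB is the current default osd_memory_target size. This default was chosen for typical use cases and is intended to balance memory requirements against OSD performance.
Setting osd_memory_target higher than 4GB can improve performance when there are many (small) objects or when large (256GB/OSD or more) data sets are processed.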
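The ranges above can be expressed as a small check. This is a sketch under stated assumptions: classify_osd_target and print_osd_target_cmd are hypothetical helpers, the option value is assumed to be given in bytes with 1GB taken as 2^30 bytes, and the script only prints a ceph config set command rather than running it.

```shell
#!/bin/sh
# Classify a proposed osd_memory_target (in whole GB) against the ranges
# described in the text. Hypothetical helper, not part of Ceph.
classify_osd_target() {
    gb=$1
    if [ "$gb" -lt 2 ]; then
        echo "not recommended"
    elif [ "$gb" -lt 4 ]; then
        echo "works, may be slow"
    elif [ "$gb" -eq 4 ]; then
        echo "default"
    else
        echo "above default"
    fi
}

# Print (do not run) the command that would set the target.
# Assumption: the value is in bytes, and 1GB = 2^30 bytes.
print_osd_target_cmd() {
    bytes=$(( $1 * 1024 * 1024 * 1024 ))
    echo "ceph config set osd osd_memory_target $bytes"
}

classify_osd_target 6
print_osd_target_cmd 6
```

Printing the command first makes it easy to review the byte value before applying it to a live cluster.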

Minimum Hardware Recommendations

ceph-osd:

RAM: 4GB or more per daemon (more is better). 2-4GB often functions (may be slow). Less than 2GB is not recommended.

ceph-mon:

Processor: 2 cores

RAM: 24GB or more per daemon

ceph-mds:

Processor: 2 cores

RAM: 2GB or more per daemon

Installing Ceph

Other methods

ceph-ansible

ceph-deploy

Cephadm

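cephadm deploys and manages a Ceph cluster. It does this by connecting the manager daemon to hosts via SSH. The manager daemon is able to add, remove, and update Ceph containers. cephadm does not rely on external configuration tools such as Ansible, Rook, and Salt.

cephadm manages the full lifecycle of a Ceph cluster. This lifecycle starts with the bootstrapping process, when cephadm creates a tiny Ceph cluster on a single node. This cluster consists of one monitor and one manager. cephadm then uses the orchestration interface ("day 2" commands) to expand the cluster, adding all hosts and provisioning all Ceph daemons and services. Management of this lifecycle can be performed either via the Ceph command-line interface (CLI) or via the dashboard (GUI).

Deploying a New Ceph Cluster

Cephadm creates a new Ceph cluster by bootstrapping on a single host, expanding the cluster to encompass any additional hosts, and then deploying the needed services.

Installing cephadm

cephadm can: bootstrap a new cluster; launch a containerized shell with a working Ceph CLI; and aid in debugging containerized Ceph daemons.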

dnf install --assumeyes centos-release-ceph-pacific.noarch
dnf install --assumeyes cephadm

Bootstrapping a New Cluster

What to know before you bootstrap

The first step in creating a new Ceph cluster is running the cephadm bootstrap command on the Ceph cluster's first host. Running this command creates the cluster's first monitor daemon, and that monitor daemon needs an IP address. You must pass the IP address of the Ceph cluster's first host to the cephadm bootstrap command.

If there are multiple networks and interfaces, be sure to choose one that will be accessible by any host accessing the Ceph cluster.

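Running the bootstrap command

cephadm bootstrap --mon-ip <mon-ip>

This command will:

Create a monitor and manager daemon for the new cluster on the local host.
Generate a new SSH key for the Ceph cluster and add it to the root user's /root/.ssh/authorized_keys file.
Write a copy of the public key to /etc/ceph/ceph.pub.
Write a minimal configuration file to /etc/ceph/ceph.conf. This file is needed to communicate with the new cluster.
Write a copy of the client.admin administrative (privileged!) secret key to /etc/ceph/ceph.client.admin.keyring.
Add the _admin label to the bootstrap host. By default, any host with this label will (also) get a copy of /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring.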
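After a bootstrap, it can be useful to confirm that the files listed above are in place. The following is a minimal sketch: missing_bootstrap_files is a hypothetical helper that only tests for the three file names mentioned in this section under a given directory, and it does not inspect their contents or contact the cluster.

```shell
#!/bin/sh
# Report which of the files that bootstrap is described as writing are
# missing under a directory (default: /etc/ceph).
# missing_bootstrap_files is a hypothetical helper for illustration.
missing_bootstrap_files() {
    dir=${1:-/etc/ceph}
    for f in ceph.pub ceph.conf ceph.client.admin.keyring; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

missing=$(missing_bootstrap_files /etc/ceph)
if [ -z "$missing" ]; then
    echo "all bootstrap files present"
else
    echo "missing:" $missing
fi
```

Passing a different directory as the first argument lets the same check run against a copy of the configuration, for example on a host that was given the _admin label.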

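Enabling the Ceph CLI

Cephadm does not require any Ceph packages to be installed on the host. However, we recommend enabling easy access to the ceph command. There are several ways to do this:

The cephadm shell command launches a bash shell in a container with all of the Ceph packages installed. By default, if configuration and keyring files are found in /etc/ceph on the host, they are passed into the container environment so that the shell is fully functional. Note that when executed on a MON host, cephadm shell will infer the config from the MON container instead of using the default configuration.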

cephadm shell

To execute ceph commands, you can also run commands like this:

cephadm shell -- ceph -s

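You can install the ceph-common package, which contains all of the ceph commands, including ceph, rbd, mount.ceph (for mounting CephFS file systems), and others:

cephadm add-repo --release octopus
cephadm install ceph-common

Confirm that the ceph command is accessible with: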

ceph -v

Confirm that the ceph command can connect to the cluster and report its status with:

ceph status

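Adding Hosts

Next, add all hosts to the cluster by following the Adding Hosts steps under Host Management.

By default, a ceph.conf file and a copy of the client.admin keyring are maintained in /etc/ceph on all hosts with the _admin label, which is initially applied only to the bootstrap host. We usually recommend that one or more other hosts be given the _admin label so that the Ceph CLI (for example, via cephadm shell) is easily accessible on multiple hosts. To add the _admin label to additional hosts, run:

ceph orch host label add <host> _admin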

Adding Additional MONs

A typical Ceph cluster has three or five monitor daemons spread across different hosts. We recommend deploying five monitors if there are five or more nodes in the cluster.
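The three-or-five guidance can be written down as a small rule. This is only a sketch: recommended_mon_count is a hypothetical helper, returning the node count for clusters with fewer than three nodes is an assumption the text does not cover, and the count-based form of ceph orch apply mon that it prints should be checked against your release before use.

```shell
#!/bin/sh
# Suggest a monitor count from the number of cluster nodes, following the
# three-or-five guidance in the text. recommended_mon_count is a
# hypothetical helper; the result for fewer than three nodes is an
# assumption, not something the text specifies.
recommended_mon_count() {
    nodes=$1
    if [ "$nodes" -ge 5 ]; then
        echo 5
    elif [ "$nodes" -ge 3 ]; then
        echo 3
    else
        echo "$nodes"
    fi
}

# Print (do not run) a count-based placement command for a 7-node cluster.
echo "ceph orch apply mon $(recommended_mon_count 7)"
```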

Adding Storage

To add storage to the cluster, tell Ceph to consume any available and unused device:

ceph orch apply osd --all-available-devices

Enabling OSD Memory Autotuning

If the cluster hardware is not used exclusively by Ceph (a hyperconverged deployment), reduce Ceph's memory consumption as follows:

# hyperconverged only:
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2

Then enable memory autotuning:

ceph config set osd osd_memory_target_autotune true
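To get a feel for what a given ratio means per OSD, the arithmetic can be sketched as follows. This is a deliberately simplified model, not cephadm's actual calculation: it assumes the ratio is applied to total host RAM and the result is shared evenly among the OSDs, and it ignores memory consumed by any other daemons on the host. The helper name autotune_per_osd_mb is hypothetical.

```shell
#!/bin/sh
# Simplified estimate of the per-OSD memory target under autotuning:
# total host RAM times the ratio, shared evenly among the OSDs.
# Ignores memory reserved for non-OSD daemons on the same host.
# autotune_per_osd_mb is a hypothetical helper.
#   $1 = total host RAM in MB
#   $2 = autotune_memory_target_ratio as a percentage (0.2 -> 20)
#   $3 = number of OSDs on the host
autotune_per_osd_mb() {
    echo $(( $1 * $2 / 100 / $3 ))
}

# A 128GB (131072MB) hyperconverged host with 8 OSDs and a ratio of 0.2
echo "about $(autotune_per_osd_mb 131072 20 8)MB per OSD"
```

Integer arithmetic truncates the result, which is adequate for a rough check of whether a ratio leaves each OSD above the 2GB floor discussed earlier.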

Using Ceph

To use the Ceph Filesystem, follow Deploy CephFS.

To use the Ceph Object Gateway, follow Deploy RGWs.

To use NFS, follow NFS Service.

To use iSCSI, follow Deploying iSCSI.

Host Management

To list the hosts associated with the cluster, run:

ceph orch host ls [--format yaml] [--host-pattern <name>] [--label <label>] [--host-status <status>]

The optional arguments "host-pattern", "label", and "host-status" are used for filtering.

Adding Hosts

To add each new host to the cluster, perform two steps:

1. Install the cluster's public SSH key in the new host's root user's authorized_keys file:

ssh-copy-id -f -i /etc/ceph/ceph.pub root@<new-host>

2. Tell Ceph that the new node is part of the cluster:

ceph orch host add <newhost> [<ip>] [<label1> ...]
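When several hosts are added at once, the two steps above can be generated from a list. The following is a sketch that prints the commands instead of running them, so they can be reviewed first; print_add_host_cmds is a hypothetical helper, and the host names and IP addresses are placeholders.

```shell
#!/bin/sh
# Print (do not run) the two commands described in the text for one host.
# print_add_host_cmds is a hypothetical helper.
#   $1 = host name, $2 = host IP address
print_add_host_cmds() {
    echo "ssh-copy-id -f -i /etc/ceph/ceph.pub root@$1"
    echo "ceph orch host add $1 $2"
}

# One host per line: <hostname> <ip>. These values are placeholders.
while read -r name addr; do
    print_add_host_cmds "$name" "$addr"
done <<EOF
node2 10.0.0.2
node3 10.0.0.3
EOF
```

Once the printed commands look right, they can be run by hand or the output can be piped to sh on the bootstrap host.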

Host Labels

The orchestrator supports assigning labels to hosts. Labels are free-form and have no particular meaning by themselves, and each host can have multiple labels. They can be used to specify the placement of daemons.

Labels can be added when adding a host with the --labels flag:

ceph orch host add my_hostname --labels=my_label1
ceph orch host add my_hostname --labels=my_label1,my_label2

To add a label to an existing host, run:

ceph orch host label add my_hostname my_label

Service Management

MON Service
MGR Service
OSD Service
RGW Service
MDS Service
NFS Service
iSCSI Service
Custom Container Service
Monitoring Services
SNMP Gateway Service

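Service Status

To see the status of one of the services running in the Ceph cluster, do the following:

Use the command line to print a list of services.
Locate the service whose status you want to check.
Print the status of the service.

The following command prints a list of services known to the orchestrator. To limit the output to services on a specified host only, use the optional --host parameter. To limit the output to services of a particular type only, use the optional --type parameter (mon, osd, mgr, mds, rgw):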

ceph orch ls [--service_type type] [--service_name name] [--export] [--format f] [--refresh]

Discover the status of a particular service or daemon:

ceph orch ls --service_type type --service_name <name> [--refresh]

To export the service specifications known to the orchestrator, run the following command:

ceph orch ls --export

The service specifications are exported as YAML, and that YAML can be used with the ceph orch apply -i command.

Daemon Status

A daemon is a running systemd unit that is part of a service.

To see the status of a daemon, do the following:

Print a list of all daemons known to the orchestrator.
Query the status of the target daemon.

First, print a list of all daemons known to the orchestrator:

ceph orch ps [--hostname host] [--daemon_type type] [--service_name name] [--daemon_id id] [--format f] [--refresh]

Then query the status of a particular service instance (mon, osd, mds, rgw). For OSDs, the id is the numeric OSD ID. For MDS services, the id is the file system name:

ceph orch ps --daemon_type osd --daemon_id 0

Service Specification

A service specification is a data structure used to specify the deployment of a service. Here is an example of a service specification in YAML:

service_type: rgw
service_id: realm.zone
placement:
  hosts:
    - host1
    - host2
    - host3
unmanaged: false
networks:
- 192.169.142.0/24
spec:
  # Additional service specific attributes.
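A specification with this shape can be generated rather than typed by hand. The following is a minimal sketch covering only the service_type, service_id, and placement.hosts fields shown above; render_service_spec is a hypothetical helper that prints YAML and does not contact a cluster.

```shell
#!/bin/sh
# Emit a service specification for a given service type, service id, and
# list of hosts. render_service_spec is a hypothetical helper; it covers
# only the fields named here and just prints YAML.
#   $1 = service_type, $2 = service_id, remaining arguments = host names
render_service_spec() {
    stype=$1
    sid=$2
    shift 2
    echo "service_type: $stype"
    echo "service_id: $sid"
    echo "placement:"
    echo "  hosts:"
    for h in "$@"; do
        echo "    - $h"
    done
}

render_service_spec rgw realm.zone host1 host2 host3
```

Redirecting the output to a file gives something that can be passed to ceph orch apply -i with --dry-run for review before it is applied.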

service_id

The name of the service. Required for iscsi, mds, nfs, osd, rgw, container, ingress.

service_type

The type of the service. Needs to be either a Ceph service (mon, crash, mds, mgr, osd or rbd-mirror), a gateway (nfs or rgw), part of the monitoring stack (alertmanager, grafana, node-exporter or prometheus) or (container) for custom containers.

Service specifications of type mon, mgr, and the monitoring types do not require a service_id.

Retrieving the Running Service Specification

ceph orch ls --service-name rgw.<realm>.<zone> --export > rgw.<realm>.<zone>.yaml
ceph orch ls --service-type mgr --export > mgr.yaml
ceph orch ls --export > cluster.yaml

Updating Service Specifications

List the current ServiceSpec:
ceph orch ls --service_name=<service-name> --export > myservice.yaml
Update the yaml file:
vi myservice.yaml
Apply the new ServiceSpec:
ceph orch apply -i myservice.yaml [--dry-run]

Daemon Placement

Note
cephadm will not deploy daemons on hosts with the _no_schedule label; see Special host labels.

Note
The apply command can be confusing. For this reason, we recommend using YAML specifications.
Each ceph orch apply <service-name> command supersedes the one before it. If you do not use the proper syntax, you will clobber your work as you go.
For example:
ceph orch apply mon host1
ceph orch apply mon host2
ceph orch apply mon host3
This results in only one host having a monitor applied to it: host3.
(The first command creates a monitor on host1. Then the second command clobbers the monitor on host1 and creates a monitor on host2. Then the third command clobbers the monitor on host2 and creates a monitor on host3. In this scenario, at this point, there is a monitor ONLY on host3.)
To make certain that a monitor is applied to each of these three hosts, run a command like this:
ceph orch apply mon "host1,host2,host3"
There is another way to apply monitors to multiple hosts: a yaml file can be used. Instead of using the "ceph orch apply mon" commands, run a command of this form:
ceph orch apply -i file.yaml
Here is a sample file.yaml file:
service_type: mon
placement:
  hosts:
    - host1
    - host2
    - host3
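Because each apply supersedes the previous one, a script that loops over hosts and applies them one at a time would reproduce the problem described in this note. A safer pattern is to build the full host list first and emit a single command. The following is a sketch; print_apply_mon is a hypothetical helper that prints the command rather than running it.

```shell
#!/bin/sh
# Build one "ceph orch apply mon" command that names every host, so that
# no later apply supersedes an earlier one. print_apply_mon is a
# hypothetical helper; it prints the command instead of running it.
print_apply_mon() {
    hosts=""
    for h in "$@"; do
        # Append with a comma separator, except before the first host.
        hosts="${hosts:+$hosts,}$h"
    done
    echo "ceph orch apply mon \"$hosts\""
}

print_apply_mon host1 host2 host3
```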

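Cephadm Operations

Watching cephadm Log Messages

Cephadm writes logs to the cephadm cluster log channel. You can monitor Ceph's activity in real time by reading the logs as they are written. Run the following command to see the logs in real time: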

ceph -W cephadm

Ceph Daemon Logs

Ceph daemons traditionally write logs to /var/log/ceph. By default, Ceph daemons log to journald, and Ceph logs are captured by the container runtime environment. They are accessible via journalctl.
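Reading a daemon's log with journalctl requires the name of its systemd unit. The following sketch assumes the unit is named ceph-<fsid>@<daemon-name>, which is an assumption about cephadm's naming convention rather than something stated in this document; print_journal_cmd is a hypothetical helper, and the fsid and daemon name are placeholders.

```shell
#!/bin/sh
# Print (do not run) a journalctl command for one Ceph daemon.
# Assumption: the systemd unit is named ceph-<fsid>@<daemon-name>.
# print_journal_cmd is a hypothetical helper.
#   $1 = cluster fsid, $2 = daemon name (for example mon.foo)
print_journal_cmd() {
    echo "journalctl -u ceph-$1@$2"
}

# Placeholder fsid and daemon name
print_journal_cmd 5c5a50ae-272a-455d-99e9-32c6a013e694 mon.foo
```

If the unit name does not match on a given host, listing the units with systemctl and filtering for ceph should show the actual names.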

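Health Checks

The cephadm module provides additional health checks to supplement the default health checks provided by the cluster. These additional health checks fall into two categories:

cephadm operations: Health checks in this category are always executed when the cephadm module is active.

cluster configuration: These health checks are optional, and focus on the configuration of the hosts in the cluster.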
