Crush rulesets when importing cluster


rdowell
Question about how to handle CRUSH rulesets and storage groups during cluster import:

With the particular Ceph setup we're using, there are 2 CRUSH rules created during the Ceph installation, before we get to importing the cluster into VSM: one named 'replicated_ruleset', which I think is a Ceph default, and one that we define called 'if500-ruleset', as follows:

Crushmap:

cephuser@parkcity:~$ ceph osd crush dump
{
    "devices": [
        {
            "id": 0,
            "name": "osd.0"
        },
        {
            "id": 1,
            "name": "osd.1"
        },
        {
            "id": 2,
            "name": "osd.2"
        },
        {
            "id": 3,
            "name": "osd.3"
        },
        {
            "id": 4,
            "name": "osd.4"
        },
        {
            "id": 5,
            "name": "osd.5"
        },
        {
            "id": 6,
            "name": "osd.6"
        },
        {
            "id": 7,
            "name": "osd.7"
        },
        {
            "id": 8,
            "name": "osd.8"
        },
        {
            "id": 9,
            "name": "osd.9"
        },
        {
            "id": 10,
            "name": "osd.10"
        },
        {
            "id": 11,
            "name": "osd.11"
        },
        {
            "id": 12,
            "name": "osd.12"
        },
        {
            "id": 13,
            "name": "osd.13"
        },
        {
            "id": 14,
            "name": "osd.14"
        },
        {
            "id": 15,
            "name": "osd.15"
        }
    ],
    "types": [
        {
            "type_id": 0,
            "name": "osd"
        },
        {
            "type_id": 1,
            "name": "host"
        },
        {
            "type_id": 2,
            "name": "chassis"
        },
        {
            "type_id": 3,
            "name": "rack"
        },
        {
            "type_id": 4,
            "name": "row"
        },
        {
            "type_id": 5,
            "name": "pdu"
        },
        {
            "type_id": 6,
            "name": "pod"
        },
        {
            "type_id": 7,
            "name": "room"
        },
        {
            "type_id": 8,
            "name": "datacenter"
        },
        {
            "type_id": 9,
            "name": "region"
        },
        {
            "type_id": 10,
            "name": "root"
        }
    ],
    "buckets": [
        {
            "id": -1,
            "name": "default",
            "type_id": 10,
            "type_name": "root",
            "weight": 7313392,
            "alg": "straw",
            "hash": "rjenkins1",
            "items": [
                {
                    "id": -4,
                    "weight": 7313392,
                    "pos": 0
                }
            ]
        },
        {
            "id": -2,
            "name": "brighton",
            "type_id": 1,
            "type_name": "host",
            "weight": 3656696,
            "alg": "straw",
            "hash": "rjenkins1",
            "items": [
                {
                    "id": 0,
                    "weight": 457087,
                    "pos": 0
                },
                {
                    "id": 2,
                    "weight": 457087,
                    "pos": 1
                },
                {
                    "id": 4,
                    "weight": 457087,
                    "pos": 2
                },
                {
                    "id": 6,
                    "weight": 457087,
                    "pos": 3
                },
                {
                    "id": 7,
                    "weight": 457087,
                    "pos": 4
                },
                {
                    "id": 9,
                    "weight": 457087,
                    "pos": 5
                },
                {
                    "id": 12,
                    "weight": 457087,
                    "pos": 6
                },
                {
                    "id": 14,
                    "weight": 457087,
                    "pos": 7
                }
            ]
        },
        {
            "id": -3,
            "name": "alta",
            "type_id": 1,
            "type_name": "host",
            "weight": 3656696,
            "alg": "straw",
            "hash": "rjenkins1",
            "items": [
                {
                    "id": 1,
                    "weight": 457087,
                    "pos": 0
                },
                {
                    "id": 3,
                    "weight": 457087,
                    "pos": 1
                },
                {
                    "id": 5,
                    "weight": 457087,
                    "pos": 2
                },
                {
                    "id": 8,
                    "weight": 457087,
                    "pos": 3
                },
                {
                    "id": 10,
                    "weight": 457087,
                    "pos": 4
                },
                {
                    "id": 11,
                    "weight": 457087,
                    "pos": 5
                },
                {
                    "id": 13,
                    "weight": 457087,
                    "pos": 6
                },
                {
                    "id": 15,
                    "weight": 457087,
                    "pos": 7
                }
            ]
        },
        {
            "id": -4,
            "name": "IF100_1",
            "type_id": 2,
            "type_name": "chassis",
            "weight": 7313392,
            "alg": "straw",
            "hash": "rjenkins1",
            "items": [
                {
                    "id": -3,
                    "weight": 3656696,
                    "pos": 0
                },
                {
                    "id": -2,
                    "weight": 3656696,
                    "pos": 1
                }
            ]
        }
    ],
    "rules": [
        {
            "rule_id": 0,
            "rule_name": "replicated_ruleset",
            "ruleset": 0,
            "type": 1,
            "min_size": 1,
            "max_size": 10,
            "steps": [
                {
                    "op": "take",
                    "item": -1,
                    "item_name": "default"
                },
                {
                    "op": "chooseleaf_firstn",
                    "num": 0,
                    "type": "host"
                },
                {
                    "op": "emit"
                }
            ]
        },
        {
            "rule_id": 1,
            "rule_name": "IF500-ruleset",
            "ruleset": 1,
            "type": 1,
            "min_size": 1,
            "max_size": 10,
            "steps": [
                {
                    "op": "take",
                    "item": -1,
                    "item_name": "default"
                },
                {
                    "op": "chooseleaf_firstn",
                    "num": 0,
                    "type": "chassis"
                },
                {
                    "op": "emit"
                }
            ]
        }
    ],
    "tunables": {
        "choose_local_tries": 0,
        "choose_local_fallback_tries": 0,
        "choose_total_tries": 50,
        "chooseleaf_descend_once": 1,
        "chooseleaf_vary_r": 1,
        "chooseleaf_stable": 0,
        "straw_calc_version": 1,
        "allowed_bucket_algs": 54,
        "profile": "hammer",
        "optimal_tunables": 0,
        "legacy_tunables": 0,
        "minimum_required_version": "firefly",
        "require_feature_tunables": 1,
        "require_feature_tunables2": 1,
        "has_v2_rules": 0,
        "require_feature_tunables3": 1,
        "has_v3_rules": 0,
        "has_v4_buckets": 0,
        "require_feature_tunables5": 0,
        "has_v5_rules": 0
    }
}
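
For reference, a chassis-level rule like 'IF500-ruleset' can be created with the stock Ceph CLI; this is the generic form of the command rather than necessarily the exact one we ran:

cephuser@parkcity:~$ ceph osd crush rule create-simple IF500-ruleset default chassis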

In our setup, we're also (currently) using only a single category of storage hardware, so we created just one storage class, named 'ssd', in VSM's cluster manifest. We discovered a while back that when importing the cluster, the storage group names need to correspond to CRUSH rulesets; otherwise VSM can't correctly identify them when creating a replicated pool. So when importing the cluster, we create a 'storage group' entry corresponding to each CRUSH rule, and since we only have one storage class, we had to assign it to both storage groups as follows:

VSM cluster.manifest during install, prior to import:

[storage_class]
ssd

[storage_group]
replicated_ruleset replicated_ruleset ssd
if500-ruleset if500-ruleset ssd
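
As far as we can tell, the names have to match because VSM passes the storage group name through as the CRUSH ruleset when it creates a replicated pool, i.e. roughly the equivalent of the following (the pool name and PG counts here are just examples):

cephuser@parkcity:~$ ceph osd pool create testpool 128 128 replicated IF500-ruleset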

As a result, after importing the cluster, we see 2 storage groups in the GUI, with identical capacity values:

[Screenshot: storage groups after import]

So far this isn't too confusing; it's just that there are 2 groups on the same type of storage with different CRUSH rules backing them, and since the names are identical to the CRUSH rules it's not obvious that these objects are really storage groups and not the rulesets themselves. What gets a bit more confusing is that the import seems to treat these 2 groups as separate storage classes as well. If I go to create a new storage group, the 'SSD' storage class actually shows up twice, as though the dialog is really asking me which CRUSH ruleset to use and displaying the corresponding storage class, expecting that there will always be a 1:1 relation:

[Screenshot: create new storage group dialog, with the 'SSD' class listed twice]

So the question is: is there something we should be doing differently in our cluster setup or our import steps? Should we not be using different rulesets? Is there any way to set up the storage groups in the cluster manifest that will clearly indicate they're both associated with the same storage class but not interfere when creating new groups?



For reference, here's what happens when you instead create a single storage group with a name not associated with either CRUSH ruleset: the other two groups still get created, but instead of being associated with the 'SSD' storage class, their class names are the same as the group/ruleset names. This actually seems to make more sense when creating new storage groups, because at least the class options no longer share the same name:

cluster.manifest

[storage_class]
ssd

[storage_group]
high_performance high_performance ssd


This was the result in the GUI:
[Screenshots: 1 storage group created with the unrelated name; groups created from the rulesets with class names matching the ruleset names]

Given the previous result, I also tried leaving the [storage_group] section out of the cluster manifest entirely, and got the following error during installation (the IndexError suggests the parser simply assumes at least one storage group entry exists):

+ cluster_manifest
Traceback (most recent call last):
  File "/usr/local/bin/cluster_manifest", line 240, in <module>
    smp = ManifestChecker(fpath)
  File "/usr/local/bin/cluster_manifest", line 43, in __init__
    self._info = self._smp.format_to_json()
  File "/usr/local/lib/python2.7/dist-packages/vsm/manifest/parser.py", line 584, in format_to_json
    return self._format_cluster_manifest_to_json()
  File "/usr/local/lib/python2.7/dist-packages/vsm/manifest/parser.py", line 567, in _format_cluster_manifest_to_json
    self._dict_insert_storage_group_c()
  File "/usr/local/lib/python2.7/dist-packages/vsm/manifest/parser.py", line 493, in _dict_insert_storage_group_c
    "storage_class": name_dict['third'][idx]}
IndexError: list index out of range
+ cleanup

Re: Crush rulesets when importing cluster

rdowell
I tried another method today - simply appending '-group' to the CRUSH ruleset names prior to import:

cluster.manifest

[storage_class]
ssd

[storage_group]
replicated_ruleset-group replicated_ruleset-group ssd
if500-ruleset-group if500-ruleset-group ssd

This actually made the GUI even more confusing, because it now imported each ruleset as its own storage group, while also adding the groups I defined in the manifest:

[Screenshots: 'Monitor storage groups' and 'Manage storage groups' pages]

In this case, the groups that VSM created directly from the CRUSH rules are actually exactly as we want them to be, but the ones defined in the manifest are useless: despite being associated with the 'ssd' storage class, which all our OSDs are defined under in the manifests, they show no associated storage. The two that directly mirror the CRUSH rules show all the storage, and they also show the ruleset name as their 'storage class' value, so they appear as distinct values in the 'create storage group' dialog as well. This still implies to me that 'create storage group' in the GUI is actually driven by CRUSH rules and expects each storage class to have a distinct related CRUSH rule.

[Screenshot: create new storage group dialog]
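
One way to sanity-check what a pool created through VSM actually ended up with is to ask Ceph which ruleset it uses ('testpool' is just a placeholder name):

cephuser@parkcity:~$ ceph osd pool get testpool crush_ruleset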

Given all of this, it looks like the way import should really work is that it ought not to require any storage groups in the cluster manifest at all; instead, it should create a new group for each CRUSH ruleset it finds, while also flagging those rulesets as 'storage class' objects and allowing them to be used when creating new user-defined storage groups. The problem is that the manifest is parsed during install, before getting to the import step, so the manifest has to follow the same rules as it would for a standalone VSM installation.
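
To illustrate the discovery step I'm describing, here's a rough sketch (not VSM's actual import code) that enumerates the existing rule names straight from the cluster, one per line:

cephuser@parkcity:~$ ceph osd crush rule list -f json | python -c 'import json, sys; print("\n".join(json.load(sys.stdin)))'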

I also tried a different method: defining the 'storage class' names in the storage groups to match the CRUSH rulesets:

cluster.manifest

[storage_class]
ssd

[storage_group]
replicated_group replicated_group replicated_ruleset
if500-group if500-group if500-ruleset

As I expected, this resulted in an error during installation, before getting to importing the cluster:
+ cluster_manifest
[cluster]
{'name': 'ec_profiles', 'third': [], 'second': [], 'single': [], 'fourth': [], 'fifth': [], 'first': []}
{'name': 'ec_profiles', 'third': [], 'second': [], 'single': [], 'fourth': [], 'fifth': [], 'first': []}
Warning: There are some networks be the same.
management_network = 10.55.0.0/16
ceph_public_network = 10.55.0.0/16
cluster_network = 10.55.0.0/16
----------------ERROR---------------
Errors below is caused by your manifest file
Please check your manifest file!
------------------------------------
Can not find storage_class ssd in storage_group
------------------------------------
+ cleanup
+ test -f /etc/ssh/ssh_config.vsmsave
+ mv /etc/ssh/ssh_config.vsmsave /etc/ssh/ssh_config

Re: Crush rulesets when importing cluster

rdowell
Just tried another method to resolve this, and I'm still not getting the proper results in the GUI.

Create both a 'storage class' and 'storage group' entry in cluster.manifest for each existing CRUSH ruleset:

cluster.manifest

[storage_class]
replicated_ruleset
if500-ruleset

[storage_group]
replicated_group replicated_group replicated_ruleset
if500-group if500-group if500-ruleset

Define the storage devices under one of the classes arbitrarily; since the two CRUSH rulesets apply to all devices, it shouldn't matter which class they're defined under in the manifest:

server.manifest

[replicated_ruleset]
/dev/sdb1 /dev/sdb2
/dev/sdc1 /dev/sdc2
/dev/sdd1 /dev/sdd2
/dev/sde1 /dev/sde2
/dev/sdf1 /dev/sdf2
/dev/sdg1 /dev/sdg2
/dev/sdh1 /dev/sdh2
/dev/sdi1 /dev/sdi2

The result is the same as in the last post: the GUI shows 4 storage groups, the 2 that I created in the manifest (showing no attached storage) and the 2 it automatically created from the CRUSH rulesets (each showing the same attached storage).

I think the proper solution here may be to fix the problem inside the database during import. Since the groups defined in the cluster manifest seem to always be incorrect when importing a cluster, the VSM agent could first delete all existing storage group entries from the database (which would remove the ones defined in the manifest) before creating new ones from the discovered CRUSH rules. The only problem with this approach is that it won't address the initial confusion that caused our QA team to flag this issue in the first place: the user still sees storage groups with names identical to the CRUSH rulesets, so it isn't clear that the objects being displayed really are groups and not the rulesets themselves. Another possibility would be to set up the storage groups in the cluster manifest as described in this post, but update the agent code that runs during cluster import to recognize the rulesets it finds in the crushmap as related to the existing DB entries from the manifest, instead of creating unnecessary duplicates.

Re: Crush rulesets when importing cluster

rdowell
I may have finally found a solution to this problem:

Set the 'storage class' values to mirror the CRUSH rules.
Set the 'storage group' name values to mirror the CRUSH rules as well, but modify only the 'friendly' names (the three columns of a [storage_group] entry being: group name, friendly name, storage class):

cluster.manifest

[storage_class]
replicated_ruleset
if500-ruleset

[storage_group]
replicated_ruleset replicated_group replicated_ruleset
if500-ruleset if500-group if500-ruleset

Assign the storage devices to one class arbitrarily as before:

server.manifest

[replicated_ruleset]
/dev/sdb1 /dev/sdb2
/dev/sdc1 /dev/sdc2
/dev/sdd1 /dev/sdd2
/dev/sde1 /dev/sde2
/dev/sdf1 /dev/sdf2
/dev/sdg1 /dev/sdg2
/dev/sdh1 /dev/sdh2
/dev/sdi1 /dev/sdi2

Now I see only 2 groups in the GUI, and I see 2 distinct class options representing the crush rulesets when creating a new group:

[Screenshots: storage group list and create-group dialog after import]

It's still less than ideal that the group name is identical to the ruleset name, but as long as the friendly name is visible and distinct from the ruleset name, that's a bit less of an issue.
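
As a final check outside the GUI, 'ceph osd dump' lists each pool along with the crush_ruleset it uses, so after creating pools through the two storage groups it's possible to confirm they really land on rule 0 and rule 1:

cephuser@parkcity:~$ ceph osd dump | grep ^pool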