Manage Servers stuck in ERROR: add_osd error

Syahrul Sazli Shaharir
Hi,

I have a Ceph cluster running fine on VSM 2.1. One day I added another OSD node via "Manage Servers" --> "Add Servers", but I didn't realize that one of the node's disks (/dev/sde) was offline at the time. The add failed with "ERROR: add_osd error", and the new node's VSM agent log shows the following:

2016-06-01 06:09:44     INFO [vsm.agent.driver] >>>> _add_ceph_osd_to_config added
2016-06-01 06:09:44     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2016-06-01 06:09:44     INFO [vsm.agent.driver] >>> step4 start
2016-06-01 06:09:44     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2016-06-01 06:09:44     INFO [vsm.utils] ---------CMD-------------
2016-06-01 06:09:44     INFO [vsm.utils] Running cmd = sudo vsm-rootwrap /etc/vsm/rootwrap.conf ceph-osd -i 8 --mkfs --mkkey
2016-06-01 06:09:44     INFO [vsm.utils] stdout =
2016-06-01 06:09:44     INFO [vsm.utils] stderr = 2016-06-01 06:09:44.582822 7f28a3461880 -1 filestore(/var/lib/ceph/osd/osd8) mkjournal error creating journal on /dev/sde1: (22) Invalid argument
2016-06-01 06:09:44.582857 7f28a3461880 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22
2016-06-01 06:09:44.582925 7f28a3461880 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/osd8: (22) Invalid argument
2016-06-01 06:09:44     INFO [vsm.utils] ---------CMD-------------
2016-06-01 06:09:44     INFO [vsm.utils] u"Unexpected error while running command.\nCommand: sudo vsm-rootwrap /etc/vsm/rootwrap.conf ceph-osd -i 8 --mkfs --mkkey\nExit code: 1\nStdout: ''\nStderr: '2016-06-01 06:09:44.582822 7f28a3461880 -1 filestore(/var/lib/ceph/osd/osd8) mkjournal error creating journal on /dev/sde1: (22) Invalid argument\\n2016-06-01 06:09:44.582857 7f28a3461880 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22\\n2016-06-01 06:09:44.582925 7f28a3461880 -1 \\x1b[0;31m ** ERROR: error creating empty object store in /var/lib/ceph/osd/osd8: (22) Invalid argument\\x1b[0m\\n'" failed. Retrying.
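The log shows `ceph-osd --mkfs` failing to create the journal on /dev/sde1, which fits the disk having been offline. A minimal sketch of a pre-flight check you could run on a new node before "Add Servers" follows. It is not part of VSM: the function name `check_block_devices` is hypothetical, and the device paths are whatever data and journal disks you intend VSM to use (they are passed by hand, not read from VSM's configuration).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of VSM): verify that every disk
# you plan to give VSM on this node currently exists as a block device.
# Usage: ./check-disks.sh /dev/sdb /dev/sdc /dev/sde

# Print one status line per device; return 1 if any device is missing.
check_block_devices() {
  local dev missing=0
  for dev in "$@"; do
    if [ -b "$dev" ]; then
      echo "ok      $dev"
    else
      echo "MISSING $dev"
      missing=1
    fi
  done
  return "$missing"
}

# Only run the check when device paths are given on the command line.
if [ "$#" -gt 0 ]; then
  if ! check_block_devices "$@"; then
    echo "Bring the missing disks online before adding this server in VSM." >&2
    exit 1
  fi
fi
```

`lsblk` gives a similar overview by hand; the point is simply to confirm each disk is visible to the kernel before VSM tries to create the OSD and its journal on it.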

Now the "Manage Servers" UI is stuck at "ERROR: add_osd error", and I can't remove the faulty node through the UI, so I can't fix the disk and re-add the node.

Please advise - thanks.

Re: Manage Servers stuck in ERROR: add_osd error

bxzhu
Hi Syahrul Sazli Shaharir,
        At the moment the VSM UI won't let you operate on a server that is in an error state. As a workaround, you can reset the server's status directly in the VSM database:
        1. mysql -uroot -p`cat /etc/vsmdeploy/deployrc | grep MYSQL_ROOT_PASSWORD | awk -F '=' '{print $2}'`
        2. use vsm;
        3. select * from init_nodes;  --> note the id of the node in error
        4. update init_nodes set status="Active" where id=?;  --> replace '?' with the id you noted in step 3
        5. Now you can remove the server from the VSM UI and then try adding it again.
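Steps 1 to 4 can also be scripted, so the password is never typed and the id is checked before the UPDATE runs. This is a sketch, not a VSM tool. It assumes it runs on the host that holds /etc/vsmdeploy/deployrc and the VSM MySQL database, that deployrc contains a `MYSQL_ROOT_PASSWORD=...` line (as step 1 implies), and that `init_nodes` has `id` and `status` columns (as steps 3 and 4 imply). The function names and the `DEPLOYRC` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4 above (hypothetical helper, not shipped with VSM).
# Usage:
#   ./vsm-reset-node.sh list          # step 3: show init_nodes, note the id
#   ./vsm-reset-node.sh reset <id>    # step 4: set that node back to Active
set -euo pipefail

DEPLOYRC="${DEPLOYRC:-/etc/vsmdeploy/deployrc}"

# Step 1: same extraction as the original one-liner. awk prints the second
# '='-separated field, so a password containing '=' would be cut short.
read_mysql_root_password() {
  grep MYSQL_ROOT_PASSWORD "$1" | awk -F '=' '{print $2}'
}

# Step 4: build the UPDATE, rejecting anything that is not a plain integer id.
reset_status_sql() {
  case "$1" in
    '' | *[!0-9]*) echo "node id must be a number, got: '$1'" >&2; return 1 ;;
  esac
  printf "UPDATE init_nodes SET status='Active' WHERE id=%s;\n" "$1"
}

# Steps 1-2: run one statement against the vsm database as root.
vsm_sql() {
  local pw
  pw="$(read_mysql_root_password "$DEPLOYRC")"
  mysql -uroot -p"$pw" vsm -e "$1"
}

usage() { echo "usage: $0 list | reset <id>" >&2; }

case "${1:-}" in
  list)
    vsm_sql "SELECT * FROM init_nodes;"
    ;;
  reset)
    sql="$(reset_status_sql "${2:-}")"   # set -e exits here on a bad id
    vsm_sql "$sql"
    vsm_sql "SELECT * FROM init_nodes WHERE id=$2;"
    ;;
  '')
    usage                                # no arguments: print usage only
    ;;
  *)
    usage
    exit 2
    ;;
esac
```

Running `list` first and checking the row before `reset` keeps the spirit of step 3: make sure you flip the status of the failed server, not a healthy one. Like step 1, this passes the password on the `mysql` command line, where other local users may see it in the process list. Step 5 (removing and re-adding the server) still happens in the UI.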

Re: Manage Servers stuck in ERROR: add_osd error

Syahrul Sazli Shaharir
That certainly did it. Thanks.