Deploy appears successful, but no storage nodes visible in GUI.

18 messages
Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
Yaguang,

I've worked my way through many deployment issues and finally found a configuration that seems to deploy cleanly. I've attached my deployment notes as a PDF file (VSM-Deployment-Notes.pdf).

I've captured and analyzed the ~10,000-line log output from install.sh, and found no obvious errors. However, in the GUI I cannot see any storage nodes. Is there something I can check to determine what might be the problem here?

Also, I'm attaching a couple of patches:

* install.patch
* uninstall.patch

These are very simple changes. The install.sh script would not handle a host name for the controller node in the server.manifest file if it contained a dash character, and the uninstall.sh script still refers to hostrc rather than installrc.
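To illustrate the dash problem (with a hypothetical pattern; the actual install.sh code differs), a character class that omits the dash rejects perfectly legal host names:

```shell
# Hypothetical illustration, not the real install.sh code: a class like
# [a-zA-Z0-9] has no dash, so "vsm-controller" fails to match.
host="vsm-controller"

if ! echo "$host" | grep -qE '^[a-zA-Z0-9]+$'; then
    echo "rejected by old-style pattern"
fi

# Patched-style pattern: dash allowed, but not leading or trailing,
# per the usual host-name rules.
if echo "$host" | grep -qE '^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$'; then
    echo "accepted by patched pattern"
fi
```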

Thanks,
John

RE: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator

Hi John,

 

Great work! Could you share the raw format instead of PDF, so I can merge your notes into the installation guide?

 

For the two patches, I will review and merge them.

 

For your new issue: are there any differences in your setup? The following log files may help with troubleshooting:

i)   /var/log/apache2/*.log

ii)  /var/log/vsm/*.log on the controller node and all agent nodes

iii) /var/log/syslog on the controller node and all agent nodes
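A quick sweep over those logs can be sketched like this (a hypothetical helper, not part of VSM; run it on the controller and on each agent node):

```shell
# Scan the suggested log locations for errors and tracebacks.
# The globs may not expand on every node, hence the existence check.
for f in /var/log/apache2/*.log /var/log/vsm/*.log /var/log/syslog; do
    [ -f "$f" ] || continue
    echo "== $f =="
    grep -nE 'ERROR|CRITICAL|Traceback' "$f" | tail -n 20
done
```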

 

Regards,

-yaguang



RE: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator
In reply to this post by jcalcote

About the notes: there are binary packages provided at https://github.com/01org/virtual-storage-manager/releases ; the latest package is 2.0.0-alpha2, at https://github.com/01org/virtual-storage-manager/releases/download/v2.0.0-a2/2.0.0-alpha2.tar.gz .

 

About issues you mentioned in “Deployment” section:

1.   Messy directory tree diagram

a.    Fixed in https://github.com/01org/virtual-storage-manager/commit/88d6aeb2bc69debfe5004de72339bcbc5f9a643e

2.   About hostrc or installrc

a.    This is a change introduced after build 149: we expect to define more variables beyond just host information, so the file was renamed from hostrc to installrc.



RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
In reply to this post by ywang19
Yaguang,

Here's a Word doc containing the information. I can only export from our Confluence site in PDF or Word, but I assume you can translate from Word to ASCII or whatever you need.

* Setting_Up_Virtual_Storage_Manager.doc

I fixed the issues you mentioned before exporting - so it now references the release packages properly in the section on Download. I think I missed the github releases because I found the 01.org (Intel) project site "Download" section to be basically empty (just a readme, I think).

The links in the doc that refer to scripts to be downloaded are just simple bash script versions of the code blocks they reference:

* upgrademe.sh
* mkstorage.sh
* mkmesuper.sh
* getstorage.sh

I've uploaded them here if you want to make them available on the site.

I'll look into the log files you mentioned to see what might have gone wrong with my deployment.

Thanks,
John

RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
In reply to this post by ywang19
The /var/log/vsm-api.log has some possibly interesting information. I deployed yesterday about 15:45 (that's when the log starts). I immediately got several lines of data about not being able to import name 'storage' (top of log):
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Initializing extension manager.
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_tenant_attribute.Hardware_tenant_attribute: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-quota-sets
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_actions.Hardware_actions: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: appnodes
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-services
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_image_metadata.Hardware_image_metadata: cannot import name storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.admin_actions.Admin_actions: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: poolusages
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.types_extra_specs.Types_extra_specs: No module named storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.backups.Backups: cannot import name backup
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.types_manage.Types_manage: No module named storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_host_attribute.Hardware_host_attribute: cannot import name storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.extended_snapshot_attributes.Extended_snapshot_attributes: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-image-create
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hosts.Hosts: No module named storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-quota-class-sets
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find vsm_controller_ip
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find role
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find server_list
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find openstack_ip
...
Then there are several middleware tracebacks in this log that look like this:
...
2015-08-13 15:52:32    ERROR [vsm.api.middleware.fault] Caught error: 'NoneType' object has no attribute 'get'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/middleware/fault.py", line 71, in __call__
    return req.get_response(self.application)
  File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/keystoneclient/middleware/auth_token.py", line 663, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 785, in __call__
    content_type, body, accept)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 833, in _process_stack
    action_result = self.dispatch(meth, request, action_args)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 909, in dispatch
    return method(req=request, **action_args)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/v1/vsms.py", line 129, in summary
    return sum_views.ViewBuilder().basic(sum_data, 'vsm')
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/views/summary.py", line 134, in basic
    'uptime': sum_data.get('uptime'),
AttributeError: 'NoneType' object has no attribute 'get'
2015-08-13 15:52:32     INFO [vsm.api.middleware.fault] http://10.50.33.78:8778/v1/73e1079e4cf847b38e4ef58dffb593a5/vsms/summary returned with HTTP 500
...
There are also lots of exceptions related to "unsupported operand type(s) for -: 'NoneType' and 'int'".
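Those NoneType errors are consistent with the summary call handing the view builder nothing at all. A minimal Python reproduction (my illustration, not VSM code; the "RPC returned None because no agent ever registered" cause is an assumption):

```python
# What summary.py line 134 effectively received, under the assumption that
# the summary RPC returned None because no agent node ever registered:
sum_data = None

try:
    sum_data.get('uptime')
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'get'

# A defensive guard of the kind that would turn the HTTP 500 into empty data:
safe = sum_data or {}
print(safe.get('uptime'))  # None
```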

About 17:30 I stopped playing with the system and got no more log messages till 22:22, when I got this error:

...
2015-08-13 22:22:13    ERROR [vsm.openstack.common.rpc.common] Failed to consume message from queue: Socket closed
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 552, in ensure
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 630, in _consume
    queues_tail.consume(nowait=False)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 171, in consume
    self.queue.consume(*args, callback=_callback, **options)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 609, in consume
    nowait=nowait)
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 1787, in basic_consume
    (60, 21),  # Channel.basic_consume_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed
2015-08-13 22:22:18     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:20    ERROR [vsm.openstack.common.rpc.common] Failed to consume message from queue: Socket closed
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 552, in ensure
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 630, in _consume
    queues_tail.consume(nowait=False)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 171, in consume
    self.queue.consume(*args, callback=_callback, **options)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 609, in consume
    nowait=nowait)
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 1787, in basic_consume
    (60, 21),  # Channel.basic_consume_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed
...
Within a few seconds the stack traces stopped and the rest of the log (from last night till this morning) is filled with stuff like this:
...
2015-08-13 22:22:21     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
...
HTH, John

RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote

On the agent, the logs are a bit different, but perhaps more telling.

The /var/log/vsm/vsm-agent.log starts dumping stack traces immediately after deployment at 15:45:

2015-08-13 15:47:44     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:47:44     INFO [vsm.manifest.wsgi_client] Agent token = 49fe4c1cb3284af6ba5ada1ed907589e, access url = http://vsm-controller:8778/v1/2c15fb0482e84e6184af1835a3fe75d8
2015-08-13 15:47:44     INFO [vsm.manifest] Can not get keyring from DB.
2015-08-13 15:47:44     INFO [vsm.manifest.config] Maybe recv is not json.
2015-08-13 15:47:45     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:47:45     INFO [vsm.service] Starting 1 workers
2015-08-13 15:47:45     INFO [vsm.service] Started child 10170
2015-08-13 15:47:45    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 15:47:45     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 15:47:45     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 15:47:45     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2015-08-13 15:47:46     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673
2015-08-13 15:47:46     INFO [vsm.agent.manager] Get vlan list = [u'10.50.33.0/24', u'10.50.33.0/24', u'10.50.33.0/24']
2015-08-13 15:47:46    ERROR [vsm.service] Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 222, in _start_child
    self._child_process(wrap.server)
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 199, in _child_process
    launcher.run_server(server)
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 95, in run_server
    server.start()
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 385, in start
    self.manager.insert_node_info_into_db()
  File "/usr/local/lib/python2.7/dist-packages/vsm/agent/manager.py", line 277, in insert_node_info_into_db
    cluster_ref = self._get_cluster_ref()
  File "/usr/local/lib/python2.7/dist-packages/vsm/agent/manager.py", line 123, in _get_cluster_ref
    if utils.is_in_lan(controller_ip, lan):
  File "/usr/local/lib/python2.7/dist-packages/vsm/utils.py", line 1412, in is_in_lan
    if ip in ipcalc.Network(ip_mask):
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 621, in __contains__
    return self.check_collision(ip)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 608, in check_collision
    other = Network(other)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 157, in __init__
    self.ip = self._dqtoi(ip)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 310, in _dqtoi
    raise ValueError('Invalid address input')
ValueError: Invalid address input
2015-08-13 15:47:46     INFO [vsm.service] Child 10170 exited with status 2
2015-08-13 15:47:46     INFO [vsm.service] Started child 10173
2015-08-13 15:47:46    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 15:47:46     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 15:47:46     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 15:47:46     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
...
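The ValueError comes out of utils.is_in_lan(controller_ip, lan) via the bundled ipcalc module, which apparently only accepts address literals. A minimal sketch of the suspected failure mode, using the stdlib ipaddress module in place of VSM's ipcalc (the resolve-the-hostname-first remedy is my assumption, not necessarily what an upstream fix would do):

```python
# Sketch only: stdlib ipaddress stands in for VSM's bundled ipcalc module.
import ipaddress
import socket

def is_in_lan(addr, cidr):
    """True if addr (a dotted-quad IP, or a resolvable host name) lies in cidr."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # A host name such as "vsm-controller" is not a valid address literal,
        # mirroring the "Invalid address input" raised by ipcalc above.
        # Resolving it first avoids the crash (assumed remedy, for illustration).
        ip = ipaddress.ip_address(socket.gethostbyname(addr))
    return ip in ipaddress.ip_network(cidr)

print(is_in_lan("10.50.33.78", "10.50.33.0/24"))  # True
print(is_in_lan("10.51.0.5", "10.50.33.0/24"))    # False
```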

This sequence continues until 22:21, when we start seeing the rest of the log filled with this:

...
ValueError: Invalid address input
2015-08-13 22:21:59     INFO [vsm.service] Child 24132 exited with status 2
2015-08-13 22:21:59     INFO [vsm.service] Forking too fast, sleeping
2015-08-13 22:22:00     INFO [vsm.service] Started child 24135
2015-08-13 22:22:00    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 22:22:00     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 22:22:00     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 22:22:00     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2015-08-13 22:22:05    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:06     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:06    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:09     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:09    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:14     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:14    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 7 seconds.
2015-08-13 22:22:21     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 9 seconds.
2015-08-13 22:22:30     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:30    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 11 seconds.
2015-08-13 22:22:41     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
...

Looks like the agent is not functioning properly.

John


RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote

I don't know if this is related, but it seems like too much of a coincidence that the 'beam' process invoked the OOM killer on the controller node at 22:22 (/var/log/syslog):

...
Aug 13 22:17:02 vsm-controller CRON[1855]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 13 22:22:05 vsm-controller kernel: [28416.607906] beam invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Aug 13 22:22:05 vsm-controller kernel: [28416.607911] beam cpuset=/ mems_allowed=0
Aug 13 22:22:05 vsm-controller kernel: [28416.607921] CPU: 0 PID: 19609 Comm: beam Not tainted 3.16.0-45-generic #60~14.04.1-Ubuntu
Aug 13 22:22:05 vsm-controller kernel: [28416.607922] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013
Aug 13 22:22:05 vsm-controller kernel: [28416.607924]  0000000000000000 ffff88003bf07a20 ffffffff81765ca1 ffff88003dabdbb0
Aug 13 22:22:05 vsm-controller kernel: [28416.607927]  ffff88003bf07aa8 ffffffff8175f855 0000000000000000 0000000000000000
...
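For what it's worth, 'beam' is the Erlang VM that hosts RabbitMQ, which would tie this OOM kill to the AMQP ECONNREFUSED errors on port 5673 that start in the same minute. A few non-invasive checks (a sketch; paths assume Ubuntu 14.04 as in the kernel line above):

```shell
# How many OOM events does syslog record? (file may be absent elsewhere)
grep -icE 'oom-killer|killed process' /var/log/syslog 2>/dev/null || true

# Is the Erlang VM (and hence RabbitMQ) still running?
pgrep -a beam || echo "no beam (RabbitMQ) process running"

# How much memory headroom is left right now?
free -m
```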
John

RE: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator
In reply to this post by jcalcote

Hi John,

 

This should be the issue that keeps agent nodes from showing up in the server list. I made a commit to fix it: https://github.com/01org/virtual-storage-manager/commit/7969ad31367f12034587a18b0072886c1511930e ; it's working in my test. Feel free to give it a try.

 

 

-yaguang

 

From: jcalcote [via vsm-discuss] [mailto:ml-node+[hidden email]]
Sent: Saturday, August 15, 2015 2:57 AM
To: Wang, Yaguang
Subject: RE: Deploy appears successful, but no storage nodes visible in GUI.

 

The /var/log/vsm-api.log has some possibly interesting information. I deployed yesterday about 15:45 (that's when the log starts). I immediately got several lines of data about not being able to import name 'storage' (top of log):

 
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Initializing extension manager.
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_tenant_attribute.Hardware_tenant_attribute: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-quota-sets
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_actions.Hardware_actions: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: appnodes
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-services
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_image_metadata.Hardware_image_metadata: cannot import name storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.admin_actions.Admin_actions: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: poolusages
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.types_extra_specs.Types_extra_specs: No module named storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.backups.Backups: cannot import name backup
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.types_manage.Types_manage: No module named storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hardware_host_attribute.Hardware_host_attribute: cannot import name storage
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.extended_snapshot_attributes.Extended_snapshot_attributes: cannot import name storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-image-create
2015-08-13 15:45:46  WARNING [vsm.api.contrib] Failed to load extension vsm.api.contrib.hosts.Hosts: No module named storage
2015-08-13 15:45:46    AUDIT [vsm.api.extensions] Loaded extension: os-quota-class-sets
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find vsm_controller_ip
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find role
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find server_list
2015-08-13 15:45:46     INFO [vsm.manifest.parser] [warning] can not find openstack_ip
...

Then there are several middleware tracebacks in this log that look like this:

 
...
2015-08-13 15:52:32    ERROR [vsm.api.middleware.fault] Caught error: 'NoneType' object has no attribute 'get'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/middleware/fault.py", line 71, in __call__
    return req.get_response(self.application)
  File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/keystoneclient/middleware/auth_token.py", line 663, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/routes/middleware.py", line 131, in __call__
    response = self.app(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
    return resp(environ, start_response)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 785, in __call__
    content_type, body, accept)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 833, in _process_stack
    action_result = self.dispatch(meth, request, action_args)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/openstack/wsgi.py", line 909, in dispatch
    return method(req=request, **action_args)
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/v1/vsms.py", line 129, in summary
    return sum_views.ViewBuilder().basic(sum_data, 'vsm')
  File "/usr/local/lib/python2.7/dist-packages/vsm/api/views/summary.py", line 134, in basic
    'uptime': sum_data.get('uptime'),
AttributeError: 'NoneType' object has no attribute 'get'
2015-08-13 15:52:32     INFO [vsm.api.middleware.fault] http://10.50.33.78:8778/v1/73e1079e4cf847b38e4ef58dffb593a5/vsms/summary returned with HTTP 500
...

There are also lots of Exceptions related to "unsupported operand type(s) for -: 'NoneType' and 'int'.

About 17:30 I stopped playing with the system and got no more log messages till 22:22, when I got this error:

 
...
2015-08-13 22:22:13    ERROR [vsm.openstack.common.rpc.common] Failed to consume message from queue: Socket closed
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 552, in ensure
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 630, in _consume
    queues_tail.consume(nowait=False)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 171, in consume
    self.queue.consume(*args, callback=_callback, **options)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 609, in consume
    nowait=nowait)
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 1787, in basic_consume
    (60, 21),  # Channel.basic_consume_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed
2015-08-13 22:22:18     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:20    ERROR [vsm.openstack.common.rpc.common] Failed to consume message from queue: Socket closed
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 552, in ensure
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 630, in _consume
    queues_tail.consume(nowait=False)
  File "/usr/local/lib/python2.7/dist-packages/vsm/openstack/common/rpc/impl_kombu.py", line 171, in consume
    self.queue.consume(*args, callback=_callback, **options)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 609, in consume
    nowait=nowait)
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 1787, in basic_consume
    (60, 21),  # Channel.basic_consume_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed
...

Within a few seconds the stack traces stopped, and the rest of the log (from last night until this morning) is filled with entries like this:

 
...
2015-08-13 22:22:21     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:22    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:25     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
...

HTH, John


RE: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator
In reply to this post by jcalcote

Are you using a host name instead of an IP address in [vsm_controller_ip]? Although you can use a host name in installrc, an IP address is still required for [vsm_controller_ip] in server.manifest.
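The agent traceback (ValueError: Invalid address input raised from ipcalc inside is_in_lan) is consistent with this: an IP-network membership test cannot parse a bare hostname. A minimal sketch using Python 3's standard ipaddress module (an analogue of VSM's bundled ipcalc, not the same code):

```python
import ipaddress
import socket

def is_in_lan(addr, cidr):
    # Analogue of vsm.utils.is_in_lan(): test whether an address falls
    # inside a network. A hostname is not parseable as an address.
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr)

print(is_in_lan("10.50.33.78", "10.50.33.0/24"))      # a dotted-quad IP works

try:
    is_in_lan("vsm-controller", "10.50.33.0/24")      # a hostname does not
except ValueError as exc:
    print("ValueError:", exc)

# Resolving the name to an address first avoids the error:
resolved = socket.gethostbyname("localhost")
print(is_in_lan(resolved, "127.0.0.0/8"))
```

This matches the observed behavior: the agent reads [vsm_controller_ip] from server.manifest, hands it to is_in_lan(), and dies with ValueError when the value is a hostname.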

 

 

From: jcalcote [via vsm-discuss] [mailto:ml-node+[hidden email]]
Sent: Saturday, August 15, 2015 3:05 AM
To: Wang, Yaguang
Subject: RE: Deploy appears successful, but no storage nodes visible in GUI.

 

On the agent, the logs are a bit different, but perhaps more telling.

The /var/log/vsm/vsm-agent.log starts dumping stack traces immediately after deployment at 15:45:

 
2015-08-13 15:47:44     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:47:44     INFO [vsm.manifest.wsgi_client] Agent token = 49fe4c1cb3284af6ba5ada1ed907589e, access url = http://vsm-controller:8778/v1/2c15fb0482e84e6184af1835a3fe75d8
2015-08-13 15:47:44     INFO [vsm.manifest] Can not get keyring from DB.
2015-08-13 15:47:44     INFO [vsm.manifest.config] Maybe recv is not json.
2015-08-13 15:47:45     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-13 15:47:45     INFO [vsm.service] Starting 1 workers
2015-08-13 15:47:45     INFO [vsm.service] Started child 10170
2015-08-13 15:47:45    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 15:47:45     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 15:47:45     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 15:47:45     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2015-08-13 15:47:46     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673
2015-08-13 15:47:46     INFO [vsm.agent.manager] Get vlan list = [u'10.50.33.0/24', u'10.50.33.0/24', u'10.50.33.0/24']
2015-08-13 15:47:46    ERROR [vsm.service] Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 222, in _start_child
    self._child_process(wrap.server)
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 199, in _child_process
    launcher.run_server(server)
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 95, in run_server
    server.start()
  File "/usr/local/lib/python2.7/dist-packages/vsm/service.py", line 385, in start
    self.manager.insert_node_info_into_db()
  File "/usr/local/lib/python2.7/dist-packages/vsm/agent/manager.py", line 277, in insert_node_info_into_db
    cluster_ref = self._get_cluster_ref()
  File "/usr/local/lib/python2.7/dist-packages/vsm/agent/manager.py", line 123, in _get_cluster_ref
    if utils.is_in_lan(controller_ip, lan):
  File "/usr/local/lib/python2.7/dist-packages/vsm/utils.py", line 1412, in is_in_lan
    if ip in ipcalc.Network(ip_mask):
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 621, in __contains__
    return self.check_collision(ip)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 608, in check_collision
    other = Network(other)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 157, in __init__
    self.ip = self._dqtoi(ip)
  File "/usr/local/lib/python2.7/dist-packages/vsm/ipcalc.py", line 310, in _dqtoi
    raise ValueError('Invalid address input')
ValueError: Invalid address input
2015-08-13 15:47:46     INFO [vsm.service] Child 10170 exited with status 2
2015-08-13 15:47:46     INFO [vsm.service] Started child 10173
2015-08-13 15:47:46    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 15:47:46     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 15:47:46     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 15:47:46     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
...

This sequence continues until 22:21, when we start seeing the rest of the log filled with this:

 
...
ValueError: Invalid address input
2015-08-13 22:21:59     INFO [vsm.service] Child 24132 exited with status 2
2015-08-13 22:21:59     INFO [vsm.service] Forking too fast, sleeping
2015-08-13 22:22:00     INFO [vsm.service] Started child 24135
2015-08-13 22:22:00    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-13 22:22:00     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-13 22:22:00     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-13 22:22:00     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2015-08-13 22:22:05    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-08-13 22:22:06     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:06    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 3 seconds.
2015-08-13 22:22:09     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:09    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 5 seconds.
2015-08-13 22:22:14     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:14    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 7 seconds.
2015-08-13 22:22:21     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:21    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 9 seconds.
2015-08-13 22:22:30     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
2015-08-13 22:22:30    ERROR [vsm.openstack.common.rpc.common] AMQP server on 10.50.33.78:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 11 seconds.
2015-08-13 22:22:41     INFO [vsm.openstack.common.rpc.common] Reconnecting to AMQP server on 10.50.33.78:5673
...

Looks like the agent is not functioning properly. John


RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
Thanks Yaguang - I'll try the patch you committed and mentioned above. To your other question: yes, I was using the controller host name in installrc - you probably figured that out, since one of my patches was to allow host names containing dashes in that location. :)

I'll pull your latest, change the controller host name to an IP address, and let you know what happens.

Thanks,
John

RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
Yaguang,

Your changes, along with your recommendation to use controller IP address in server.manifest files, helped me get past the issue with no vsm agent connectivity. All appears to be working in VSM now.

The next problem I'm facing is getting ceph working. It appears that ceph is neither loaded nor configured properly on the storage nodes. At first I thought it was a problem with the way I had formatted my xfs volumes, but then I realized that ceph is not running on the storage nodes at all. An attempt to start ceph on one of them produced an error:

$ sudo /etc/init.d/ceph start
/etc/init.d/ceph: ceph conf /etc/ceph/ceph.conf not found; system is not configured.

Sure enough - there is no /etc/ceph/ceph.conf file on this node.

Thanks for all your help so far,

John


Re: Deploy appears successful, but no storage nodes visible in GUI.

dan.ferber@intel.com
Administrator

John,

I have heard of one other instance of this, so it will be interesting to see what Yaguang says.

Back in the days of VSM 1.0 the install process was more manual, so if ceph did not install on the agent/storage nodes, or the vsm agents did not start, you saw that very clearly.

I have not tried the new automated install yet (and I need to), but I would think errors like ceph not getting installed, or manifest files not passing sanity checks, would get flagged and logged by the vsm installer?

Relatedly, how does one check that the basics from the automated installer are all done, and done correctly?

Dan


Dan Ferber
Software  Defined Storage
+1 651-344-1846
[hidden email]

Dan Ferber
Intel Software Defined Storage

Re: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
Hi Dan,

Agreed on all points.

FYI - here's a screenshot of the Manage Devices screen immediately after a clean installation. I'm hoping Yaguang will see this and say, "Oh - I know what that means..." :)



John

Re: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator
Hi John,

Glad to hear the agent connectivity issue is resolved. As for the device status page: since your devices have a red mark, there are some issues recognizing those devices. Looking at the log files in the /var/log/vsm folder should give more information.


-yaguang

Re: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote

That's probably true, and it's likely related to my earlier comment that there are no ceph processes running on the OSD nodes. I've looked through the /var/log/vsm/*.log files and found nothing that appears to indicate VSM has a problem with the drives. Here's vsm-agent.log - as you can see, it's pretty short:

root@vsm-store1:/var/log/vsm# cat vsm-agent.log
2015-08-17 12:58:29     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-17 12:58:29     INFO [vsm.manifest.wsgi_client] Agent token = 9942e72a149d4f62972f301d12f45f26, access url = http://10.50.33.78:8778/v1/1f61d0ac33364663bdcd7d3231473ea8
2015-08-17 12:58:29     INFO [vsm.manifest] Can not get keyring from DB.
2015-08-17 12:58:29     INFO [vsm.manifest.config] Maybe recv is not json.
2015-08-17 12:58:30     INFO [vsm.manifest.parser] [warning] can not find zone
2015-08-17 12:58:30     INFO [vsm.service] Starting 1 workers
2015-08-17 12:58:30     INFO [vsm.service] Started child 7269
2015-08-17 12:58:30    AUDIT [vsm.service] Starting vsm-agent node (version 2.0.0)
2015-08-17 12:58:30     INFO [vsm.agent.manager] init_host in manager DEBUG
2015-08-17 12:58:30     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-17 12:58:30     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER
2015-08-17 12:58:30     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673
2015-08-17 12:58:30     INFO [vsm.agent.manager] Get vlan list = [u'10.50.33.0/24', u'10.50.33.0/24', u'10.50.33.0/24']
2015-08-17 12:58:30     INFO [vsm.agent.manager] storage_group_ref=['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__iter__', '__mapper__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__table__', '__table_args__', '__table_initialized__', '__tablename__', '__weakref__', '_decl_class_registry', '_sa_class_manager', '_sa_instance_state', 'created_at', 'delete', 'deleted', 'deleted_at', 'drive_extended_threshold', 'friendly_name', 'get', 'id', 'iteritems', 'metadata', 'name', 'next', 'osd_state', 'rule_id', 'save', 'status', 'storage_class', 'storage_pool', 'update', 'updated_at']==
2015-08-17 12:58:30     INFO [vsm.agent.manager] ip_dict = {'raw_ip': '10.50.33.82', 'service_id': 3L, 'secondary_public_ip': '10.50.33.82', 'cluster_ip': '10.50.33.82', 'primary_public_ip': '10.50.33.82'}
2015-08-17 12:58:30     INFO [vsm.agent.manager]  create init_node ref = {
    "ceph_ver": "0.80.10",
    "cluster_id": 1,
    "cluster_ip": "10.50.33.82",
    "data_drives_number": 1,
    "deleted": false,
    "host": "vsm-store1",
    "id_rsa_pub": "",
    "mds": "no",
    "primary_public_ip": "10.50.33.82",
    "raw_ip": "10.50.33.82",
    "secondary_public_ip": "10.50.33.82",
    "service_id": 3,
    "status": "available",
    "type": "monitor,storage,",
    "weight": "1.0",
    "zone_id": 1
}
2015-08-17 12:58:30     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673
2015-08-17 12:58:31     INFO [vsm.agent.manager] host = vsm-store1 ip = [u'10.50.33.82', u'10.50.33.82', u'10.50.33.82']
2015-08-17 12:58:31     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db.
2015-08-17 12:58:31     INFO [vsm.agent.manager] agent/manager.py update ceph.conf from db. OVER

The vsm-physical.log file is even shorter:

root@vsm-store1:/var/log/vsm# cat vsm-physical.log
2015-08-17 12:58:37     INFO [vsm.service] Starting 1 workers
2015-08-17 12:58:37     INFO [vsm.service] Started child 7407
2015-08-17 12:58:37    AUDIT [vsm.service] Starting vsm-physical node (version 2.0.0)
2015-08-17 12:58:37     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673
2015-08-17 12:58:49     INFO [vsm.openstack.common.rpc.common] Connected to AMQP server on 10.50.33.78:5673

And here's my OSD node process list:

root@vsm-store1:/var/log/vsm# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Aug17 ?        00:00:01 /sbin/init
root         2     0  0 Aug17 ?        00:00:00 [kthreadd]
root         3     2  0 Aug17 ?        00:00:02 [ksoftirqd/0]
root         5     2  0 Aug17 ?        00:00:00 [kworker/0:0H]
root         7     2  0 Aug17 ?        00:00:02 [rcu_sched]
root         8     2  0 Aug17 ?        00:00:03 [rcuos/0]
root         9     2  0 Aug17 ?        00:00:00 [rcu_bh]
root        10     2  0 Aug17 ?        00:00:00 [rcuob/0]
root        11     2  0 Aug17 ?        00:00:00 [migration/0]
root        12     2  0 Aug17 ?        00:00:02 [watchdog/0]
root        13     2  0 Aug17 ?        00:00:00 [khelper]
root        14     2  0 Aug17 ?        00:00:00 [kdevtmpfs]
root        15     2  0 Aug17 ?        00:00:00 [netns]
root        16     2  0 Aug17 ?        00:00:00 [khungtaskd]
root        17     2  0 Aug17 ?        00:00:00 [writeback]
root        18     2  0 Aug17 ?        00:00:00 [ksmd]
root        19     2  0 Aug17 ?        00:00:00 [khugepaged]
root        20     2  0 Aug17 ?        00:00:00 [crypto]
root        21     2  0 Aug17 ?        00:00:00 [kintegrityd]
root        22     2  0 Aug17 ?        00:00:00 [bioset]
root        23     2  0 Aug17 ?        00:00:00 [kblockd]
root        24     2  0 Aug17 ?        00:00:00 [ata_sff]
root        25     2  0 Aug17 ?        00:00:00 [khubd]
root        26     2  0 Aug17 ?        00:00:00 [md]
root        27     2  0 Aug17 ?        00:00:00 [devfreq_wq]
root        31     2  0 Aug17 ?        00:00:00 [kswapd0]
root        32     2  0 Aug17 ?        00:00:00 [fsnotify_mark]
root        33     2  0 Aug17 ?        00:00:00 [ecryptfs-kthrea]
root        45     2  0 Aug17 ?        00:00:00 [kthrotld]
root        46     2  0 Aug17 ?        00:00:00 [acpi_thermal_pm]
root        47     2  0 Aug17 ?        00:00:00 [scsi_eh_0]
root        48     2  0 Aug17 ?        00:00:00 [scsi_tmf_0]
root        49     2  0 Aug17 ?        00:00:00 [scsi_eh_1]
root        50     2  0 Aug17 ?        00:00:00 [scsi_tmf_1]
root        52     2  0 Aug17 ?        00:00:00 [ipv6_addrconf]
root        72     2  0 Aug17 ?        00:00:00 [deferwq]
root        73     2  0 Aug17 ?        00:00:00 [charger_manager]
root       119     2  0 Aug17 ?        00:00:00 [mpt_poll_0]
root       120     2  0 Aug17 ?        00:00:00 [mpt/0]
root       121     2  0 Aug17 ?        00:00:00 [kpsmoused]
root       124     2  0 Aug17 ?        00:00:00 [scsi_eh_2]
root       125     2  0 Aug17 ?        00:00:00 [scsi_tmf_2]
root       137     2  0 Aug17 ?        00:00:05 [jbd2/sda1-8]
root       138     2  0 Aug17 ?        00:00:00 [ext4-rsv-conver]
root       284     1  0 Aug17 ?        00:00:00 upstart-udev-bridge --daemon
root       292     1  0 Aug17 ?        00:00:00 /lib/systemd/systemd-udevd --daemon
root       375     1  0 Aug17 ?        00:00:00 upstart-file-bridge --daemon
root       413     2  0 Aug17 ?        00:00:00 [ttm_swap]
message+   440     1  0 Aug17 ?        00:00:00 dbus-daemon --system --fork
syslog     482     1  0 Aug17 ?        00:00:00 rsyslogd
root       494     1  0 Aug17 ?        00:00:00 /lib/systemd/systemd-logind
root       527     1  0 Aug17 ?        00:00:00 upstart-socket-bridge --daemon
root       612     1  0 Aug17 ?        00:00:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root       748     1  0 Aug17 tty4     00:00:00 /sbin/getty -8 38400 tty4
root       751     1  0 Aug17 tty5     00:00:00 /sbin/getty -8 38400 tty5
root       756     1  0 Aug17 tty2     00:00:00 /sbin/getty -8 38400 tty2
root       757     1  0 Aug17 tty3     00:00:00 /sbin/getty -8 38400 tty3
root       759     1  0 Aug17 tty6     00:00:00 /sbin/getty -8 38400 tty6
root       793     1  0 Aug17 ?        00:00:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
root       839     1  0 Aug17 ?        00:00:01 cron
daemon     840     1  0 Aug17 ?        00:00:00 atd
root       921     1  0 Aug17 tty1     00:00:00 /bin/login --
root       993     2  0 Aug17 ?        00:00:00 [kauditd]
cephuser  1040   921  0 Aug17 tty1     00:00:00 -bash
ntp       6418     1  0 Aug17 ?        00:00:05 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:112
root      7134     1  0 Aug17 ?        00:00:00 /usr/sbin/sshd -D
root      7213     1  1 Aug17 ?        00:13:17 python /usr/bin/vsm-agent --config-file /etc/vsm/vsm.conf --log-file /var/log/vsm/vsm-agent.lo
root      7269  7213  0 Aug17 ?        00:04:48 python /usr/bin/vsm-agent --config-file /etc/vsm/vsm.conf --log-file /var/log/vsm/vsm-agent.lo
root      7337     1  1 Aug17 ?        00:13:17 python /usr/bin/vsm-physical --config-file /etc/vsm/vsm.conf --log-file /var/log/vsm/vsm-physi
root      7407  7337  0 Aug17 ?        00:01:22 python /usr/bin/vsm-physical --config-file /etc/vsm/vsm.conf --log-file /var/log/vsm/vsm-physi
root      8307     2  0 Aug17 ?        00:00:44 [kworker/0:0]
root      9542     2  0 07:17 ?        00:00:00 [kworker/u2:0]
root      9681  7134  0 09:29 ?        00:00:00 sshd: cephuser [priv]
root      9685     2  0 09:29 ?        00:00:00 [kworker/0:2]
cephuser  9759  9681  0 09:29 ?        00:00:00 sshd: cephuser@pts/0
cephuser  9760  9759  0 09:29 pts/0    00:00:00 -bash
root      9782  9760  0 09:34 pts/0    00:00:00 su
root      9783  9782  0 09:34 pts/0    00:00:00 bash
root      9798     2  0 09:35 ?        00:00:00 [kworker/u2:2]
root      9866  9783  0 09:45 pts/0    00:00:00 ps -ef

The vsm-agent and vsm-physical Python processes are running, but no ceph OSD process is. That could easily explain why VSM has a problem with the data stores. Since I don't have a running system to compare against (and I'm not especially familiar with ceph in general), I don't even know which ceph processes are supposed to be running. I'll be digging into the ceph docs today to try to figure out what is supposed to run on an OSD node and why it might not be running. Any insight you have would be helpful.

[EDIT]: When I try to start /etc/init.d/ceph manually, it fails with an error about a missing /etc/ceph/ceph.conf. Based on the log file, it appears that vsm-agent is attempting to write out the ceph.conf file but failing to do so, making it impossible for ceph to start.

Thanks,

John


RE: Deploy appears successful, but no storage nodes visible in GUI.

ywang19
Administrator

Our testing shows that the line below, in install_agent() in install.sh, is likely the culprit:

    $SSH $USER@$1 "if [ -r /etc/init/ceph-all.conf ] && [ ! -e /etc/init/ceph.conf ]; then sudo ln -s /etc/init/ceph-all.conf /etc/init/ceph.conf; sudo initctl reload-configuration; fi"

 

So far there is no upstart conf for ceph, only ceph-all; this line tries to create a symbolic link from ceph-all.conf to ceph.conf, but it does not seem to be working as expected.
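As a sanity check, the conditional's logic can be exercised locally against a scratch directory (the ceph-all.conf/ceph.conf names come from install.sh; the temp directory and this harness are mine), which suggests the shell logic itself is fine and the failure is more likely in how the command is executed remotely via $SSH/sudo:

```shell
#!/bin/sh
# Reproduce the install_agent() symlink step against a scratch directory
# instead of /etc/init, so the conditional can be verified without root.
tmp=$(mktemp -d)
touch "$tmp/ceph-all.conf"        # stands in for /etc/init/ceph-all.conf

if [ -r "$tmp/ceph-all.conf" ] && [ ! -e "$tmp/ceph.conf" ]; then
    ln -s "$tmp/ceph-all.conf" "$tmp/ceph.conf"
fi

ls -l "$tmp/ceph.conf"            # expect a symlink pointing at ceph-all.conf
```

On a real agent node, the equivalent manual check would be `ls -l /etc/init/ceph.conf`; if the link is missing, creating it by hand and running `sudo initctl reload-configuration` may be a workable interim fix.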

 

 

root      9542     2  0 07:17 ?        00:00:00 [kworker/u2:0]
root      9681  7134  0 09:29 ?        00:00:00 sshd: cephuser [priv]
root      9685     2  0 09:29 ?        00:00:00 [kworker/0:2]
cephuser  9759  9681  0 09:29 ?        00:00:00 sshd: cephuser@pts/0
cephuser  9760  9759  0 09:29 pts/0    00:00:00 -bash
root      9782  9760  0 09:34 pts/0    00:00:00 su
root      9783  9782  0 09:34 pts/0    00:00:00 bash
root      9798     2  0 09:35 ?        00:00:00 [kworker/u2:2]
root      9866  9783  0 09:45 pts/0    00:00:00 ps -ef

The vsm-agent and vsm-physical python processes are running, but no ceph-osd process is running. That could easily explain why VSM has a problem with the data stores. Since I don't have a running system to compare with (and I'm not especially familiar with ceph in general), I don't even know which ceph processes are supposed to be running. I'll be digging into the ceph docs today to try to figure out what's supposed to run on an OSD node and why it might not be running. Any insight you have would be helpful.
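In the meantime, here's the quick check I've been running on each store node to see whether any Ceph daemons are up at all - just a sketch; the filter_ceph_procs helper name is mine, not anything from VSM or Ceph:

```shell
#!/bin/sh
# Filter a list of process names (one per line on stdin) down to the Ceph
# daemons, or report that none are running. On a healthy OSD node I'd
# expect at least one ceph-osd; monitor nodes would show ceph-mon.
filter_ceph_procs() {
    found=$(grep -E '^ceph-(osd|mon|mds)' || true)
    if [ -n "$found" ]; then
        echo "$found"
    else
        echo "no ceph daemons running"
    fi
}

# Feed it the live process list:
ps -eo comm= | filter_ceph_procs
```

On my store nodes this prints "no ceph daemons running", which matches what the full ps -ef listing above shows.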

Thanks,

John



Reply | Threaded
Open this post in threaded view
|

RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
This post was updated on .
Hi Yaguang -

I do not believe the line you refer to is the root of the problem. You see, my OSD nodes do have a symlink /etc/init/ceph.conf -> /etc/init/ceph-all.conf. What's missing, however, is /etc/ceph/ceph.conf, which is a different file with a different format. When you try to start ceph, you get this:

cephuser@vsm-store1:~$ sudo /etc/init.d/ceph start
/etc/init.d/ceph: ceph conf /etc/ceph/ceph.conf not found; system is not configured.

The line you refer to is working properly - the link is being created. The missing file, however, is the ceph configuration file, not the ceph system init file (sad that they're both named ceph.conf).
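To make the distinction concrete, this is roughly the check I ran on the store node - a sketch; the helper name and the optional root-prefix argument are mine, added so it can be exercised against a fixture tree:

```shell
#!/bin/sh
# Report on both files named ceph.conf. The first is the upstart job
# definition (the symlink install.sh creates); the second is the cluster
# configuration file that /etc/init.d/ceph actually needs. An optional
# prefix argument lets this run against a test tree instead of the live /etc.
check_ceph_confs() {
    root=${1:-}
    if [ -e "$root/etc/init/ceph.conf" ]; then
        echo "upstart job present: $root/etc/init/ceph.conf"
    else
        echo "upstart job missing"
    fi
    if [ -e "$root/etc/ceph/ceph.conf" ]; then
        echo "cluster config present: $root/etc/ceph/ceph.conf"
    else
        echo "cluster config missing: /etc/init.d/ceph will refuse to start"
    fi
}

check_ceph_confs
```

On my nodes the first file is present and the second is missing, which is exactly the "system is not configured" failure above.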

[Edit] There is no line in install.sh that refers to /etc/ceph/ceph.conf, and I ran dpkg-deb -c on each of the ceph packages in the vsm-dep-repo/vsm-dep-repo directory - the base ceph package owns the /etc/ceph directory, but there is no ceph.conf file laid down by any of those packages.
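The package scan itself was just a loop over the .deb files, something like the following - the grep helper is split out so the listing can come from dpkg-deb -c or a canned fixture; the repo path is from my own tree layout:

```shell
#!/bin/sh
# Given `dpkg-deb -c`-style output on stdin, print any line that installs
# a file at /etc/ceph/ceph.conf, or note that the package ships none.
find_ceph_conf_in_listing() {
    grep -E '\./etc/ceph/ceph\.conf$' || echo "no ceph.conf shipped"
}

# The actual scan I ran (paths per my setup, so commented out here):
# for pkg in vsm-dep-repo/vsm-dep-repo/*.deb; do
#     echo "== $pkg"
#     dpkg-deb -c "$pkg" | find_ceph_conf_in_listing
# done

# A directory entry alone (what the base ceph package ships) doesn't count:
printf 'drwxr-xr-x root/root 0 ./etc/ceph/\n' | find_ceph_conf_in_listing
```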

[Edit2] The manager code in /usr/local/lib/python2.7/dist-packages/vsm/agent/manager.py starts ceph using this command line: 'service ceph -c /etc/ceph/ceph.conf start mon.$1' - ceph's -c option specifies the config file, so I'm fairly certain you expect it to exist there. Additionally, there are other places in the agent python code base that update and modify the ceph.conf file. The vsm processes are all running as root, so I doubt this is a rights issue.

[Edit3] The responsibility for managing the /etc/ceph/ceph.conf file appears to belong to vsm-agent - specifically, to cephconfigparser.py, which writes the initial ceph.conf file and syncs it from the vsm database. It appears that the save_conf method in cephconfigparser.py is not getting called. The first thing save_conf does is recursively chown (-R) the contents of /etc/ceph to vsm:vsm, but that directory (it's empty) is still owned by root:root.
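That ownership observation is easy to re-check; here's the sketch I used (the helper name is mine, and it assumes GNU stat as shipped on Ubuntu):

```shell
#!/bin/sh
# Compare a directory's owner:group against what we expect. If vsm-agent's
# save_conf had run, /etc/ceph would be vsm:vsm; on my store nodes it is
# still root:root, which is what suggests save_conf never executed.
check_dir_owner() {
    dir=$1; want=$2
    got=$(stat -c '%U:%G' "$dir")   # GNU stat; BSD stat would need -f '%Su:%Sg'
    if [ "$got" = "$want" ]; then
        echo "$dir owned by $want, as expected"
    else
        echo "$dir owned by $got, expected $want"
    fi
}

check_dir_owner /etc/ceph vsm:vsm 2>/dev/null || true
```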

John

Reply | Threaded
Open this post in threaded view
|

RE: Deploy appears successful, but no storage nodes visible in GUI.

jcalcote
Hi Yaguang,

Just letting you know - I downloaded the 2.0.0-beta1 package and redeployed. All is well - my ceph services are running on the OSD hosts now.

Thanks for all your help!
John