Problems encountered when reinstalling the SQL Server 2008 R2 database under VMware vCenter 5.1
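The SQL Server database behind vCenter was full: it had reached 10 GB. The one-click vCenter installer package bundles SQL Server 2008 R2 Express, and Express editions are limited. For example, they can use only a single physical CPU (socket), and the size of the data files is capped (4 GB before 2008 R2; R2 raised it to 10 GB). The colleague who installed vCenter didn't notice this and put it straight into our production virtualization platform...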
The plan: back up the databases, reinstall SQL Server as Enterprise Edition, then restore the databases.
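As a minimal sketch of that plan, the Python helper below only generates the T-SQL for full backups on the old Express instance and restores on the new Enterprise instance. The database name VIM_VCDB and the folder D:\Backup are hypothetical placeholders (the article itself only names RSA, the SSO database). If the new instance keeps its data files in different paths, each RESTORE would also need WITH MOVE clauses, which this sketch does not generate.

```python
import ntpath

# Hypothetical values for illustration; substitute your own instance's names.
DATABASES = ["VIM_VCDB", "RSA"]
BACKUP_DIR = r"D:\Backup"


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def nstring(value: str) -> str:
    """Unicode T-SQL string literal with embedded single quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"


def backup_path(backup_dir: str, db: str) -> str:
    # ntpath builds Windows-style paths even when this script runs elsewhere.
    return ntpath.join(backup_dir, db + ".bak")


def backup_statements(databases, backup_dir):
    """Full, checksummed backups, to run on the old Express instance."""
    return [
        f"BACKUP DATABASE {quote_name(db)} TO DISK = {nstring(backup_path(backup_dir, db))} WITH INIT, CHECKSUM;"
        for db in databases
    ]


def restore_statements(databases, backup_dir):
    """Restores, to run on the new Enterprise instance (no MOVE clauses)."""
    return [
        f"RESTORE DATABASE {quote_name(db)} FROM DISK = {nstring(backup_path(backup_dir, db))} WITH RECOVERY;"
        for db in databases
    ]


if __name__ == "__main__":
    print("\n".join(backup_statements(DATABASES, BACKUP_DIR)))
    print("\n".join(restore_statements(DATABASES, BACKUP_DIR)))
```

Generating the statements rather than running them keeps the sketch free of any database driver; paste the output into SQL Server Management Studio or sqlcmd after reviewing it.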
Everything went smoothly up to the database restore, but then the vCenter service failed to start. The log C:\ProgramData\VMware\VMware VirtualCenter\Logs\vpxd-xx.log shows:
2014-06-09T17:51:26.208+08:00 [04252 error 'Default'] SSLStreamImpl::DoClientHandshake (000000000bb5d600) SSL_connect failed. Dumping SSL error queue:
2014-06-09T17:51:26.217+08:00 [04252 error 'Default'] [0] error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
2014-06-09T17:51:26.220+08:00 [04252 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to <cs p:000000000ac28180, TCP:win-h3s7ieeoe3h:7444>; cnx: (null), error: class Vmacore::Ssl::SSLVerifyException(SSL Exception: Verification parameters:
--> PeerThumbprint: E2:7B:BB:24:87:2A:0C:C1:03:BC:3E:75:B3:0B:79:A7:C4:2B:97:6A
--> ExpectedThumbprint:
--> ExpectedPeerName: win-h3s7ieeoe3h
--> The remote host certificate has these problems:
-->
--> * A certificate in the host's chain is based on an untrusted root.
-->
--> * Host name does not match the subject name(s) in certificate.)
2014-06-09T17:51:26.220+08:00 [00288 error '[SSO][SsoFactory_CreateFacade]'] Unable to create SSO facade: SSL Exception: Verification parameters:
--> PeerThumbprint: E2:7B:BB:24:87:2A:0C:C1:03:BC:3E:75:B3:0B:79:A7:C4:2B:97:6A
--> ExpectedThumbprint:
--> ExpectedPeerName: win-h3s7ieeoe3h
--> The remote host certificate has these problems:
-->
--> * A certificate in the host's chain is based on an untrusted root.
-->
--> * Host name does not match the subject name(s) in certificate..
2014-06-09T17:51:26.221+08:00 [00288 error 'vpxdvpxdMain'] [Vpxd::ServerApp::Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)
--> Backtrace:
--> backtrace[00] rip 000000018018a8ca
--> backtrace[01] rip 0000000180102f28
--> backtrace[02] rip 000000018010423e
--> backtrace[03] rip 000000018008e00b
--> backtrace[04] rip 00000000003d5c2c
--> backtrace[05] rip 00000000003f6512
--> backtrace[06] rip 00000001409e0701
--> backtrace[07] rip 00000001409da51c
--> backtrace[08] rip 0000000140bfc92b
--> backtrace[09] rip 000007fefeaca82d
--> backtrace[10] rip 0000000077a8f56d
--> backtrace[11] rip 0000000077cc3281
-->
2014-06-09T17:51:26.221+08:00 [00288 warning 'VpxProfiler'] ServerApp::Init [TotalTime] took 4105 ms
2014-06-09T17:51:26.221+08:00 [00288 error 'Default'] Failed to intialize VMware VirtualCenter. Shutting down...
2014-06-09T17:51:26.221+08:00 [00288 info 'vpxdvpxdSupportManager'] Wrote uptime information
2014-06-09T17:51:34.143+08:00 [01580 warning 'VpxProfiler' opID=SWI-701d9c61] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2014-06-09T17:51:46.209+08:00 [04924 warning 'VpxProfiler' opID=SWI-5e2d21d2] VpxUtil_InvokeWithOpId [TotalTime] took 12001 ms
2014-06-09T17:51:58.275+08:00 [04908 warning 'VpxProfiler' opID=SWI-202bab02] VpxUtil_InvokeWithOpId [TotalTime] took 12001 ms
2014-06-09T17:52:10.338+08:00 [04828 warning 'VpxProfiler' opID=SWI-d16eb4ef] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2014-06-09T17:52:22.312+08:00 [04908 warning 'VpxProfiler' opID=SWI-4675184] VpxUtil_InvokeWithOpId [TotalTime] took 12001 ms
2014-06-09T17:52:22.517+08:00 [00288 info 'Default'] Forcing shutdown of VMware VirtualCenter now
PS: Please just ignore the ugly hostname "win-h3s7ieeoe3h"; it's a landmine left behind by whoever installed vCenter...
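When a vpxd log repeats the same SSLVerifyException block several times, it can help to pull out just the target, the thumbprints, and the listed certificate problems. The sketch below is plain regex parsing of the line layout shown above (timestamped header, then continuation lines starting with "-->"); the function name is hypothetical and this is not a VMware tool.

```python
import re

FIELD_RE = re.compile(r"(PeerThumbprint|ExpectedThumbprint|ExpectedPeerName):\s*(.*)$")
TARGET_RE = re.compile(r"TCP:([^:>\s]+):(\d+)>")
# Continuation prefix: plain "-->" or the typographic "–>" some blogs produce.
CONT_RE = re.compile(r"^\s*(?:-->|–>)\s?")


def parse_ssl_verification(lines):
    """Summarise the first SSLVerifyException block found in vpxd log lines."""
    info = {"target": None, "PeerThumbprint": None, "ExpectedThumbprint": None,
            "ExpectedPeerName": None, "problems": []}
    for raw in lines:
        line = CONT_RE.sub("", raw).rstrip()
        target = TARGET_RE.search(line)
        if target and info["target"] is None:
            # e.g. "TCP:win-h3s7ieeoe3h:7444>" -> ("win-h3s7ieeoe3h", 7444)
            info["target"] = (target.group(1), int(target.group(2)))
            continue
        field = FIELD_RE.match(line)
        if field:
            key, value = field.groups()
            if info[key] is None:  # the same block is often logged twice
                info[key] = value
            continue
        if line.startswith("* "):
            # Trim the '.' and ')' characters that trail the last problem line.
            problem = re.sub(r"[.)]+$", "", line[2:]).strip()
            if problem not in info["problems"]:
                info["problems"].append(problem)
    return info
```

Pass it the lines of the vpxd-xx.log file (for example an open file object); in this case it would report the short name win-h3s7ieeoe3h on port 7444 plus the two problems, untrusted root and host name mismatch.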
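Judging from the errors, this looked like an SSL certificate verification problem. I had just edited vCenter's configuration file vpxd.cfg, so I quickly restored the FQDN in it and started vCenter again. It still failed: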
2014-06-09T18:56:30.897+08:00 [02400 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to <cs p:000000000a9635d0, TCP:win-h3s7ieeoe3h.core.lhjdomain:7444>; cnx: (null), error: class Vmacore::SystemException(No connection could be made because the target machine actively refused it.)
2014-06-09T18:56:30.898+08:00 [04816 error '[SSO][SsoCertificateManagerImpl]'] [CreateAdminSsoServiceContent] Failure while trying to connect to SSO Admin server: No connection could be made because the target machine actively refused it. . Will retry in 10 seconds.
2014-06-09T18:56:35.423+08:00 [02644 warning 'VpxProfiler' opID=SWI-631f293f] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2014-06-09T18:56:40.885+08:00 [04952 info 'Default'] Thread attached
2014-06-09T18:56:41.186+08:00 [04816 error 'Default'] Found dangling SSL error: [0] error:00000001:lib(0):func(0):reason(1)
2014-06-09T18:56:41.186+08:00 [04816 error 'Default'] Found dangling SSL error: [1] error:00000001:lib(0):func(0):reason(1)
2014-06-09T18:56:41.186+08:00 [04816 error '[SSO][SsoFactory_CreateFacade]'] Unable to create SSO facade: vmodl.fault.SystemError.
2014-06-09T18:56:41.187+08:00 [04816 warning 'VpxProfiler'] Vpxd::ServerApp::Init [Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)] took 11355 ms
2014-06-09T18:56:41.187+08:00 [04816 error 'vpxdvpxdMain'] [Vpxd::ServerApp::Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)
--> Backtrace:
……
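"Actively refused" is the Windows wording for a refused TCP connection: nothing accepted the connection on port 7444, the SSO port vpxd was trying to reach according to the log. A quick way to recheck that from the vCenter host is the minimal sketch below, which uses only Python's standard socket module; the function name is hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError ("actively refused"), timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the vpxd log above.
    print(tcp_port_open("win-h3s7ieeoe3h.core.lhjdomain", 7444))
```

A True result only shows that something is listening; it says nothing about whether the SSO web applications behind that port started correctly.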
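This error is clear enough: vCenter cannot connect to SSO. SSO (Single Sign-On) is a component introduced in vCenter 5.1 to provide single sign-on authentication. My guess was that SSO was misbehaving, but the SSO service status showed "Started". So I went through its logs in the folder C:\Program Files\VMware\Infrastructure\SSOServer\logs, where catalina.2014-06-09.log is the SSO service's startup log: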
09-Jun-2014 18:58:35.198 SEVERE [pool-3-thread-1] org.apache.catalina.core.StandardContext.startInternal Error listenerStart
09-Jun-2014 18:58:35.199 SEVERE [pool-3-thread-1] org.apache.catalina.core.StandardContext.startInternal Context [/ims] startup failed due to previous errors
09-Jun-2014 18:58:35.202 INFO [pool-3-thread-1] com.sun.xml.ws.transport.http.servlet.WSServletContextListener.contextDestroyed WSSERVLET13: JAX-WS context listener destroyed
09-Jun-2014 18:58:35.224 SEVERE [pool-3-thread-1] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/ims] appears to have started a thread named [Thread-3] but has failed to stop it. This is very likely to create a memory leak.
09-Jun-2014 18:58:35.225 SEVERE [pool-3-thread-1] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/ims] appears to have started a thread named [Thread-4] but has failed to stop it. This is very likely to create a memory leak.
09-Jun-2014 18:58:35.226 SEVERE [pool-3-thread-1] org.apache.catalina.loader.WebappClassLoader.checkThreadLocalMapForLeaks The web application [/ims] created a ThreadLocal with key of type [com.sun.xml.ws.api.streaming.XMLStreamReaderFactory$Default$1] (value [com.sun.xml.ws.api.streaming.XMLStreamReaderFactory$Default$1@287efdd8]) and a value of type [com.sun.xml.internal.stream.XMLInputFactoryImpl] (value [com.sun.xml.internal.stream.XMLInputFactoryImpl@294b84ad]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
First come two Java errors, then "memory leak" warnings; nothing that points clearly to a cause.
Next I checked another SSO log file, localhost_access_log.2014-06-09.txt:
210.36.17.46 - - [09/Jun/2014:17:55:12 +0800] "GET /sso-adminserver/sdk HTTP/1.1" 500 593
210.36.17.46 - - [09/Jun/2014:17:59:54 +0800] "POST /lookupservice/sdk HTTP/1.1" 404 1078
……
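The requests keep failing, mostly with 404 (not found): the SSO web applications did not actually start, so of course they return errors.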
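To see at a glance which SSO endpoints are failing without scrolling the whole file, here is a small sketch. It assumes the access log uses Tomcat's "common" pattern (%h %l %u %t "%r" %s %b), which the two lines above appear to follow; the regex and function name are hypothetical helpers, not part of SSO.

```python
import re
from collections import Counter

# Tomcat "common" access log pattern: %h %l %u %t "%r" %s %b
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def failing_requests(lines):
    """Count (path, status) pairs for requests answered with HTTP 4xx or 5xx."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.match(line.strip())
        if match and int(match["status"]) >= 400:
            counts[(match["path"], int(match["status"]))] += 1
    return counts
```

Feeding it the lines of localhost_access_log.2014-06-09.txt and printing counts.most_common() lists the worst endpoints first; lines that don't match the assumed pattern are silently skipped.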
Finally, the log file localhost.2014-06-09.log contained something specific:
Caused by: org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.rsa.ims.components.spring.SecurityAwareClassPathXmlApplicationContext]: Constructor threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'DatabaseMetadataBean' defined in class path resource [ims-components-common.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [com.rsa.ims.common.DatabaseMetadataBean]: Constructor threw exception; nested exception is org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Login failed for user 'RSA_USER'.)
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:141)
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:105)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:278)
... 27 more
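In a Spring trace like this, the message worth reading is the last "nested exception is ..." segment of the Caused by line. A minimal sketch that extracts it is below; the function names are hypothetical and the parsing is plain string handling, not a Spring API.

```python
MARKER = "nested exception is "


def innermost_cause(line: str) -> str:
    """Return the deepest 'nested exception is ...' part of a Spring error message."""
    text = line.strip()
    if text.startswith("Caused by: "):
        text = text[len("Caused by: "):]
    idx = text.rfind(MARKER)
    return text[idx + len(MARKER):] if idx != -1 else text


def root_causes(lines):
    """Distinct innermost causes of all 'Caused by:' lines, in order of first appearance."""
    seen = []
    for line in lines:
        if line.lstrip().startswith("Caused by: "):
            cause = innermost_cause(line)
            if cause not in seen:
                seen.append(cause)
    return seen
```

Applied to the line above, it reduces the whole chain to the SQLNestedException about the failed RSA_USER login, and root_causes collapses the many repeated traces in the file into that one entry.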
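There were many Java errors, all reporting a failed login for user RSA_USER. RSA_USER is the account SSO uses to connect to the SQL Server database, and connecting manually as this user did indeed fail.
That was the root cause: the user RSA_USER exists inside the RSA database, but not as a login on the SQL Server instance. In other words, restoring a database on SQL Server does not restore the server-level logins its users map to; they have to be recreated by hand: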
- Run the following SQL query against the SSO database to list all unmapped (orphaned) users of the database: sp_change_users_login report
- Now create a new SQL login (SQL Server Authentication) at the SQL Server level, not at the database level. Name this login RSA_USER and use the same password the database user RSA_USER had. Set its default database to RSA (the SSO database).
- Run the following SQL query against the SSO database to map the login RSA_USER (server level) to the user RSA_USER (database level): sp_change_users_login 'update_one', 'RSA_USER', 'RSA_USER'
- To check that it worked, rerun the query against the SSO database; RSA_USER should no longer show up: sp_change_users_login report
- At the SQL Server level, create the SQL login RSA_DBA, and be sure to use the same password you used before. (Well, you can always reset it later on.)
- After the RSA_DBA login has been created, make it the owner of the SSO database.
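The steps above can be collected into one T-SQL batch. The Python helper below only prints that batch; it is a sketch that assumes the SSO database is named RSA as in the article and that you know the original RSA_USER and RSA_DBA passwords. For the last step it uses ALTER AUTHORIZATION ON DATABASE in place of the GUI owner change. Review the output before running it, and don't leave a saved copy with real passwords lying around.

```python
def tsql_literal(value: str) -> str:
    """Unicode T-SQL string literal; embedded single quotes are doubled."""
    return "N'" + value.replace("'", "''") + "'"


def quote_name(name: str) -> str:
    """Bracket-quote an identifier, escaping ']'."""
    return "[" + name.replace("]", "]]") + "]"


def sso_login_repair_script(rsa_user_password: str, rsa_dba_password: str, db: str = "RSA") -> str:
    """Build the T-SQL that recreates the SSO logins after a restore (sketch)."""
    db_q = quote_name(db)
    statements = [
        f"USE {db_q};",
        # 1. List orphaned database users; RSA_USER is expected to appear.
        "EXEC sp_change_users_login 'Report';",
        # 2. Recreate the server-level login with the ORIGINAL password.
        f"CREATE LOGIN [RSA_USER] WITH PASSWORD = {tsql_literal(rsa_user_password)}, DEFAULT_DATABASE = {db_q};",
        # 3. Re-link the database user RSA_USER to the new login.
        "EXEC sp_change_users_login 'Update_One', 'RSA_USER', 'RSA_USER';",
        # 4. Re-check: RSA_USER should no longer be listed.
        "EXEC sp_change_users_login 'Report';",
        # 5. Recreate RSA_DBA with its original password.
        f"CREATE LOGIN [RSA_DBA] WITH PASSWORD = {tsql_literal(rsa_dba_password)};",
        # 6. Make RSA_DBA the owner of the SSO database.
        f"ALTER AUTHORIZATION ON DATABASE::{db_q} TO [RSA_DBA];",
    ]
    return "\n".join(statements)


if __name__ == "__main__":
    # Placeholder passwords: replace with the ones SSO was installed with.
    print(sso_login_repair_script("<RSA_USER password>", "<RSA_DBA password>"))
```

sp_change_users_login is deprecated in later SQL Server versions; ALTER USER [RSA_USER] WITH LOGIN = [RSA_USER] is the documented replacement for the remap in step 3.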
After following the steps above, SSO started normally, and of course so did vCenter.
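Why does the restored RSA_USER stay broken even once a login with the same name exists? SQL Server links a database user to a server login by SID, not by name, and a newly created SQL login gets a new SID. The toy model below (plain Python data, not SQL Server's catalog views; all names are illustrative) shows what the 'Report' and 'Update_One' calls check and change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    sid: bytes


def orphaned_users(db_users, server_logins):
    """Database users whose SID matches no server login (roughly what 'Report' lists)."""
    login_sids = {login.sid for login in server_logins}
    return [user.name for user in db_users if user.sid not in login_sids]


def update_one(db_users, user_name, login):
    """Model of 'Update_One': point the named database user at the login's SID."""
    return [Principal(u.name, login.sid) if u.name == user_name else u for u in db_users]


if __name__ == "__main__":
    restored = [Principal("RSA_USER", b"\x01")]      # SID from the old instance
    new_login = Principal("RSA_USER", b"\x02")       # CREATE LOGIN assigns a new SID
    print(orphaned_users(restored, [new_login]))     # still orphaned: names match, SIDs don't
    print(orphaned_users(update_one(restored, "RSA_USER", new_login), [new_login]))
```

This is why creating the login alone is not enough and the remap step is needed.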
References:
Updating the vCenter Single Sign-On server database configuration
How to move VMware Single Sign On (SSO) database (the most useful one)