
[vSAN] How to locate the physical position of a disk in a Lenovo server in a vSAN cluster

Posted: 2023-02-18, 11:54
Lexaul
https://blog.51cto.com/wuweijava/4224380

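Problem description
A customer's vSAN cluster raised an alarm today: a disk in one of the Lenovo servers had failed. vSAN does not show which slot of the server the disk sits in; it only shows the disk's NAA ID, for example naa.5000039908196f09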


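Locating the disk
Method 1: Use the vSphere Web Client to turn the locator LED on or off for devices in a vSAN disk group
Navigate to the vSAN cluster.
Click the Configure tab.
Under vSAN, click Disk Management.
At the bottom of the page, select one or more storage devices from the list.
Click the All Actions menu, then select Turn On Locator LED or Turn Off Locator LED.
This did not work in this case. Not every server is fully compatible with this feature, brand-name servers included.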
Method 2: Locate the disk with command-line tools
 https://datacentersupport.lenovo.com/us ... hinksystem

Run the following command to list all disks on the host:

Code:

localcli storage core device list

... (output truncated here)

naa.5000039908196f09:
   Display Name: Local LENOVO Disk (naa.5000039908196f09)
   Has Settable Display Name: true
   Size: 1144641
   Device Type: Direct-Access 
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.5000039908196f09
   Vendor: LENOVO  
   Model: AL15SEB120N     
   Revision: TB52
   SCSI Level: 6
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: true
   Is Removable: false
   Is SSD: false
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters: 
   VAAI Status: unsupported
   Other UIDs: vml.02000000005000039908196f09414c31355345
   Is Shared Clusterwide: false
   Is SAS: true
   Is USB: false
   Is Boot Device: false
   Device Max Queue Depth: 254
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
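
To pull the NAA IDs of local SAS disks out of a long listing like the one above, the text can be parsed into one dictionary per device. This is only a minimal sketch: it assumes the output has been saved as text, that each device block starts with an unindented line ending in a colon, and that fields are indented `Key: Value` lines. The function name `parse_device_list` and the shortened `SAMPLE` are made up for illustration.

```python
def parse_device_list(text):
    """Parse `localcli storage core device list` text into {device_id: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between device blocks
        if not line[0].isspace() and line.rstrip().endswith(":"):
            # An unindented line ending with a colon starts a new device block.
            current = line.strip()[:-1]
            devices[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


# Shortened copy of the listing above (hypothetical sample for the sketch).
SAMPLE = """\
naa.5000039908196f09:
   Display Name: Local LENOVO Disk (naa.5000039908196f09)
   Size: 1144641
   Vendor: LENOVO
   Model: AL15SEB120N
   Is Local: true
   Is SAS: true
"""

devices = parse_device_list(SAMPLE)
for dev_id, fields in devices.items():
    # Keep only local SAS disks, which are the ones behind the RAID controller here.
    if fields.get("Is Local") == "true" and fields.get("Is SAS") == "true":
        print(dev_id, fields["Model"], fields["Size"])
```

On a real host the text would come from the command's output instead of `SAMPLE`; the field names used for filtering are the ones shown in the listing above.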

Install the RAID controller command-line tool
 https://download.lenovo.com/servers/mig ... x86-64.tgz

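After downloading, extract the archive and upload the xml and vib files for the matching version to the host.
Run the following command. The digital signature may have expired, in which case the --no-sig-check parameter is needed.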

Code:

esxcli software vib install -v=file://path/vmware-storcli.vib --no-sig-check 
Create a symbolic link

Code:

ln -s /opt/lsi/storcli/storcli /sbin/storcli
Run the RAID controller management command to get disk information

Code:

storcli /call/eall/sall show all

... (output truncated here)

Drive /c0/e0/s11 :
================

-----------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
-----------------------------------------------------------------------
0:11      5 JBOD  -  1.090 TB SAS  HDD -   -  512B AL15SEB120N      -  
-----------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No|DID-Device ID|DG-DriveGroup
UGood-Unconfigured Good|UBad-Unconfigured Bad|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition


Drive /c0/e0/s11 - Detailed Information :
=======================================

Drive /c0/e0/s11 State :
======================
Shield Counter = N/A
Media Error Count = N/A
Other Error Count = N/A
Predictive Failure Count = N/A
S.M.A.R.T alert flagged by drive = N/A


Drive /c0/e0/s11 Device attributes :
==================================
Manufacturer Id = LENOVO  
Model Number = AL15SEB120N     
NAND Vendor = NA
SN = Y860A00BFHRF
WWN = 5000039908196F08
Firmware Revision = TB52
Raw size = 1.090 TB [0x8bba0caf Sectors]
Coerced size = 1.090 TB [0x8bba0caf Sectors]
Non Coerced size = 1.090 TB [0x8bba0caf Sectors]
Device Speed = 12.0Gb/s
Link Speed = 12.0Gb/s
Sector Size = 512B
Config ID = NA
Number of Blocks = 2344225967
Connector Name = C0   x1


Drive /c0/e0/s11 Policies/Settings :
==================================
Enclosure position = 0
Connected Port Number = 2(path0) 
Sequence Number = 0
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = N/A
Successful diagnostics completion on = N/A
SED Capable = N/A
SED Enabled = N/A
Secured = N/A
Needs EKM Attention = N/A
PI Eligible = N/A
Certified = N/A
Wide Port Capable = N/A
Multipath = No

Port Information :
================

-----------------------------------------
Port Status Linkspeed SAS address        
-----------------------------------------
   0 Active 12.0Gb/s  0x5000039908196f0a 
-----------------------------------------


Inquiry Data = 
00 00 06 12 9f 01 10 02 4c 45 4e 4f 56 4f 20 20 
41 4c 31 35 53 45 42 31 32 30 4e 20 20 20 20 20 
54 42 35 32 54 36 36 33 4c 56 53 41 54 42 35 32 
54 42 35 32 54 42 35 32 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 30 30 30 31 31 38 33 31 35 00 30 30 30 31 
32 32 30 30 59 4b 30 31 36 20 20 20 20 20 4e 33 
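
With many drives, the storcli output gets long, so it can help to reduce it to a table of drive path, WWN and serial number. The sketch below is an illustration rather than an official parser: it assumes the output has been saved as text and that every drive has a "Device attributes" section laid out like the one above. The name `parse_storcli_drives` is hypothetical, and the optional enclosure part in the pattern is an assumption for controllers that report drives without an enclosure ID.

```python
import re

# Matches e.g. "Drive /c0/e0/s11 Device attributes :" and captures the drive path.
DRIVE_HEADER = re.compile(r"^Drive (/c\d+(?:/e\d+)?/s\d+) Device attributes")
WANTED = ("WWN", "SN", "Model Number")


def parse_storcli_drives(text):
    """Return {drive_path: {"WWN": ..., "SN": ..., "Model Number": ...}}."""
    drives = {}
    current = None
    for line in text.splitlines():
        header = DRIVE_HEADER.match(line)
        if header:
            current = header.group(1)
            drives[current] = {}
            continue
        if line.startswith("Drive "):
            # Some other section (State, Policies/Settings, ...): stop collecting.
            current = None
            continue
        if current is not None and " = " in line:
            key, _, value = line.partition(" = ")
            if key.strip() in WANTED:
                drives[current][key.strip()] = value.strip()
    return drives


# Shortened copy of the output above (hypothetical sample for the sketch).
SAMPLE = """\
Drive /c0/e0/s11 Device attributes :
==================================
Manufacturer Id = LENOVO
Model Number = AL15SEB120N
SN = Y860A00BFHRF
WWN = 5000039908196F08

Drive /c0/e0/s11 Policies/Settings :
==================================
Enclosure position = 0
"""

for path, attrs in parse_storcli_drives(SAMPLE).items():
    print(path, attrs["WWN"], attrs["SN"])
```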
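Comparing the output of the two command-line tools

The output above shows that the WWN and the NAA ID are not exactly identical: the NAA ID is naa.5000039908196f09, while storcli reports WWN 5000039908196F08, which differs only in the last hex digit (and in letter case). So we use the NAA ID to find the corresponding WWN, and then read the physical slot from the drive path (here /c0/e0/s11, i.e. slot 11).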

Code:

Drive /c0/e0/s11 Device attributes :
==================================
Manufacturer Id = LENOVO
Model Number = AL15SEB120N
NAND Vendor = NA
SN = Y860A00BFHRF
WWN = 5000039908196F08
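
The NAA-to-WWN lookup can be sketched as a small comparison that ignores letter case and the last hex digit. Treat this as an assumption drawn from the single example in this post (naa ...f09 versus WWN ...F08), not as a documented rule. The name `find_slot` and the `wwn_by_slot` mapping are hypothetical.

```python
def find_slot(naa_id, wwn_by_slot):
    """Return drive paths whose WWN matches the NAA ID, ignoring the last hex digit."""
    naa_hex = naa_id.lower()
    if naa_hex.startswith("naa."):
        naa_hex = naa_hex[4:]  # drop the "naa." prefix
    prefix = naa_hex[:-1]      # compare everything except the last digit
    return [
        slot
        for slot, wwn in wwn_by_slot.items()
        if wwn.lower()[:-1] == prefix
    ]


# WWN per drive path, as read from the storcli output in this post.
wwn_by_slot = {"/c0/e0/s11": "5000039908196F08"}

matches = find_slot("naa.5000039908196f09", wwn_by_slot)
print(matches)  # ['/c0/e0/s11']
```

If more than one drive matches, the serial number (SN) is a safer tie-breaker than the WWN prefix. Where the controller and backplane support it, `storcli /c0/e0/s11 start locate` (and `stop locate`) can then blink the LED of that slot before the disk is pulled.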


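Afterword
Put the host into maintenance mode.
Remove the failed disk from the disk group.
Using the physical slot number identified above, pull the failed disk and insert the new one.
Add the new disk to the disk group.
However, when adding the new disk to the disk group, the failed disk was still listed among the available disks. Does anyone know why, and what to do about it? T_T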
-----------------------------------
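© Copyright belongs to the author. This is an original work by 51CTO blog author wuweijava; contact the author for permission to repost, otherwise legal liability will be pursued.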