rack/setup/install ms-be104[0-3].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking and setup of the 4 new ms-be hosts for eqiad, ms-be104[0-3].eqiad.wmnet.
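The bracket notation in the task title is shorthand for four hostnames. As a minimal sketch (the helper name `expand_host_range` is hypothetical and only handles a single one-digit `[a-b]` range, which is all this task uses), the expansion looks like this:

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one single-digit [a-b] range in a hostname pattern.

    Hypothetical helper for illustration; patterns without a range
    are returned unchanged as a one-element list.
    """
    match = re.search(r"\[(\d)-(\d)\]", pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [f"{prefix}{digit}{suffix}" for digit in range(start, end + 1)]


hosts = expand_host_range("ms-be104[0-3].eqiad.wmnet")
for host in hosts:
    print(host)
```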

Racking Proposal: These hosts have combined 1G and 10G network cards: eth0-1 are 10G, and eth3-4 are 1G. We also have a limited number of 10G racks. After the refresh, rows A/B/C will have 10G in racks 2/4/7; row D already has 10G in racks 2 and 7. The ms-be systems are currently deployed as follows:

Rack  # of Systems  Hostname(s)
A2    1             ms-be1019
A5    3             ms-be1028, ms-be1029, ms-be1030
B1    1             ms-be1022
B2    1             ms-be1020
B3    2             ms-be1023, ms-be1031
B5    5             ms-be1016, ms-be1017, ms-be1018, ms-be1032, ms-be1033
C8    5             ms-be1024, ms-be1025, ms-be1034, ms-be1035, ms-be1036
D1    3             ms-be1013, ms-be1014, ms-be1015
D7    5             ms-be1026, ms-be1027, ms-be1037, ms-be1038, ms-be1039
D8    1             ms-be1021
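Since the swift failure model is discussed per row, it can help to total the table above by row. The following is a rough sketch that assumes the rack label's first character is the row letter; the names `current_layout` and `systems_per_row` are made up for this example, and the data is copied from the table.

```python
from collections import Counter

# Current ms-be deployment, copied from the table above (rack -> hostnames).
current_layout = {
    "A2": ["ms-be1019"],
    "A5": ["ms-be1028", "ms-be1029", "ms-be1030"],
    "B1": ["ms-be1022"],
    "B2": ["ms-be1020"],
    "B3": ["ms-be1023", "ms-be1031"],
    "B5": ["ms-be1016", "ms-be1017", "ms-be1018", "ms-be1032", "ms-be1033"],
    "C8": ["ms-be1024", "ms-be1025", "ms-be1034", "ms-be1035", "ms-be1036"],
    "D1": ["ms-be1013", "ms-be1014", "ms-be1015"],
    "D7": ["ms-be1026", "ms-be1027", "ms-be1037", "ms-be1038", "ms-be1039"],
    "D8": ["ms-be1021"],
}


def systems_per_row(layout: dict[str, list[str]]) -> dict[str, int]:
    """Count systems per row, taking the row from the rack label's first letter."""
    counts: Counter[str] = Counter()
    for rack, hostnames in layout.items():
        counts[rack[0]] += len(hostnames)
    return dict(counts)


per_row = systems_per_row(current_layout)
for row in sorted(per_row):
    print(f"row {row}: {per_row[row]} systems")
```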

You can view all of the existing system layout data here.

It seems we want to put the new ms-be systems into racks that will be 10G after the refresh (rows A/B/C) or are already 10G (D2/D7). In terms of available space in the projected 10G racks, we have room for 2 in A7, several in B2 and B7, 1 in C4, and several in D2 and D7. Rows C & D already house the majority of the ms-be systems, so it is best to avoid those rows if we can. Row D already has 8 10G racked ms-be systems. Rows A and B are very light in ms-be systems. If we can acceptably run these at 1G until the refresh, I (@RobH) recommend putting more of the ms-be systems there. A2 is full and has no more power overhead.

Proposal is to rack the 4 new systems as follows:

hostname   rack
ms-be1040  A7
ms-be1041  A7
ms-be1042  C4
ms-be1043  C7
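As a quick sanity check of the proposal against the 10G plan described earlier (racks 2/4/7 in rows A/B/C after the refresh, racks 2 and 7 in row D today), a small sketch can confirm that every proposed rack is, or will be, a 10G rack. The names `ten_gig_racks`, `proposal`, and `is_10g_rack` are illustrative only, and the rack sets are taken from the description rather than from any inventory system.

```python
# 10G racks per row, as stated in the racking proposal:
# A/B/C get racks 2/4/7 after the refresh; D already has racks 2 and 7.
ten_gig_racks = {
    "A": {2, 4, 7},
    "B": {2, 4, 7},
    "C": {2, 4, 7},
    "D": {2, 7},
}

# Proposed placement of the four new hosts.
proposal = {
    "ms-be1040": "A7",
    "ms-be1041": "A7",
    "ms-be1042": "C4",
    "ms-be1043": "C7",
}


def is_10g_rack(rack: str) -> bool:
    """Return True if the rack (e.g. 'A7') is in the planned or existing 10G set."""
    row, number = rack[0], int(rack[1:])
    return number in ten_gig_racks.get(row, set())


on_10g = {host: is_10g_rack(rack) for host, rack in proposal.items()}
for host, ok in on_10g.items():
    print(f"{host} -> {proposal[host]}: {'10G' if ok else 'not 10G'}")
```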

Setup checklists:

ms-be1040:

  • - receive in system on procurement task T187383
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal subnet)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

ms-be1041:

  • - receive in system on procurement task T187383
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal subnet)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

ms-be1042:

  • - receive in system on procurement task T187383
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal subnet)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

ms-be1043:

  • - receive in system on procurement task T187383
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, internal vlan)
    • end on-site specific steps
  • - production dns entries added (internal subnet)
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Related Objects

Status    Subtype  Assigned    Task
Resolved           fgiunchedi

Event Timeline

RobH triaged this task as Medium priority. Mar 19 2018, 6:37 PM
RobH created this task.

I'd like to get @fgiunchedi's sign off on our racking proposal, since it affects when the new systems will use their 10G interfaces versus 1G interfaces.

Plan looks good to me, thanks for the great overview of racking situation!

I didn't realize we were up to five systems per rack in B/C/D. Our failure model for swift is per-row anyway, but it would be nice to diversify as much as possible per-rack too. How many 10G racks per row are we going to get after the refresh?

A note on the partitioning, since it changed from the last batch of Dell ms-be: we're standardizing on sda and sdb being the SSD disks and the first two in boot order, with the spinning disks beginning at sdc. This makes the partman recipe the same across HP and Dell in netboot.cfg.

Mentioned in SAL (#wikimedia-operations) [2018-04-25T08:23:50Z] <godog> eqiad-prod: add ms-be104[0-3] with minimal weight - T190081

Mentioned in SAL (#wikimedia-operations) [2018-04-26T09:50:30Z] <godog> eqiad-prod: more weight to ms-be104[0-3] for container/account - T190081

Mentioned in SAL (#wikimedia-operations) [2018-04-27T07:32:58Z] <godog> swift eqiad-prod more weight to ms-be104[0-3] - T190081

Mentioned in SAL (#wikimedia-operations) [2018-05-03T08:08:02Z] <godog> eqiad-prod: more weight to ms-be104[0-3] - T190081

Mentioned in SAL (#wikimedia-operations) [2018-05-07T08:30:04Z] <godog> eqiad-prod: more weight to ms-be104[0-3] - T190081

Rebalance has completed; resolving.