SRE (Group)
Active · Public

Members (31)

Watchers (13)

Recent Activity

Today

MatthewVernon created T378267: db1234 crashed.
Sat, Oct 26, 4:32 PM · SRE, DBA
Peachey88 added projects to T376400: Redesign wikitech-static: wikitech.wikimedia.org, SRE.
Sat, Oct 26, 10:52 AM · SRE, wikitech.wikimedia.org
ops-monitoring-bot created T378255: Degraded RAID on wikikube-worker2068.
Sat, Oct 26, 4:52 AM · SRE, DC-Ops, ops-codfw
phaultfinder updated the task description for T377942: PDU sensor over limit.
Sat, Oct 26, 12:54 AM · SRE, DC-Ops, ops-eqiad

Yesterday

Papaul moved T378201: PowerSupplyFailure from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Fri, Oct 25, 10:02 PM · SRE, DC-Ops, ops-codfw
Dzahn added a comment to T378069: Create a mail address for Russian Wikipedia oversighters.

If you don't mind that they are public, can you please paste email addresses here who should become list admin?

Fri, Oct 25, 9:27 PM · SRE, Wikimedia-Mailing-lists
Dwisehaupt added a comment to T377381: Frack eqiad network upgrade: design, installation and configuration.

@cmooney @Jclark-ctr There has been a request to push our maintenance week back one week if possible. Would you all be ok with doing the work on the following week (Nov 11-15)?

Fri, Oct 25, 7:49 PM · DC-Ops, ops-eqiad, fundraising-tech-ops, netops, Infrastructure-Foundations, SRE
KFrancis added a comment to T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.

Please provide Deepesha Burse's email address and I will process the NDA. Thanks!

Fri, Oct 25, 6:52 PM · SRE, LDAP-Access-Requests
KFrancis added a comment to T378082: Requesting access to 'deployment' for 'Joely Rooke WMDE'.

This request first requires signing an NDA with Legal - tagging @KFrancis as per the access request process. Thanks!

Fri, Oct 25, 6:33 PM · SRE, SRE-Access-Requests
Jhancock.wm closed T371984: Q1:rack/setup/install backup2012, a subtask of T376892: Expand media backup storage available space to 960 TB per datacenter, as Resolved.
Fri, Oct 25, 6:32 PM · Patch-For-Review, media-backups, Data-Persistence-Backup, SRE
Jhancock.wm closed T371984: Q1:rack/setup/install backup2012 as Resolved.

@jcrespo hey sorry about that. got this one conflated with another order. it's ready to go now.

Fri, Oct 25, 6:32 PM · SRE, Data-Persistence, Data-Persistence-Backup, ops-codfw, DC-Ops
Maintenance_bot added a project to T378201: PowerSupplyFailure: SRE.
Fri, Oct 25, 6:29 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm updated the task description for T371984: Q1:rack/setup/install backup2012.
Fri, Oct 25, 6:29 PM · SRE, Data-Persistence, Data-Persistence-Backup, ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T371984: Q1:rack/setup/install backup2012.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host backup2012.codfw.wmnet with OS bookworm completed:

  • backup2012 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410251754_jhancock_553835_backup2012.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Fri, Oct 25, 6:28 PM · SRE, Data-Persistence, Data-Persistence-Backup, ops-codfw, DC-Ops
Aklapper merged T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE into T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.
Fri, Oct 25, 4:59 PM · SRE, LDAP-Access-Requests
ops-monitoring-bot added a comment to T371984: Q1:rack/setup/install backup2012.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host backup2012.codfw.wmnet with OS bookworm

Fri, Oct 25, 4:42 PM · SRE, Data-Persistence, Data-Persistence-Backup, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T371984: Q1:rack/setup/install backup2012.
Fri, Oct 25, 4:40 PM · SRE, Data-Persistence, Data-Persistence-Backup, ops-codfw, DC-Ops
xcollazo changed the status of T377773: Give Dumps 1.0 access to gmodena from Stalled to Open.
Fri, Oct 25, 4:28 PM · SRE, SRE-Access-Requests
Ahoelzl added a comment to T377773: Give Dumps 1.0 access to gmodena.

Approved.

Fri, Oct 25, 4:27 PM · SRE, SRE-Access-Requests
hnowlan updated subscribers of T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.

This access requires signing an NDA, adding @KFrancis as per access request documentation. Thanks!

Fri, Oct 25, 3:50 PM · SRE, LDAP-Access-Requests
hnowlan moved T378082: Requesting access to 'deployment' for 'Joely Rooke WMDE' from Awaiting User Input to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Fri, Oct 25, 3:48 PM · SRE, SRE-Access-Requests
hnowlan moved T378182: Grant Access to ldap/nda for Deepesha Burse WMDE from Backlog to NDA Pending on the LDAP-Access-Requests board.
Fri, Oct 25, 3:48 PM · SRE, LDAP-Access-Requests
hnowlan closed T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE as Invalid.

closing as dupe, following up in T378181

Fri, Oct 25, 3:40 PM · SRE, LDAP-Access-Requests
hnowlan updated subscribers of T378082: Requesting access to 'deployment' for 'Joely Rooke WMDE'.

This request first requires signing an NDA with Legal - tagging @KFrancis as per the access request process. Thanks!

Fri, Oct 25, 3:37 PM · SRE, SRE-Access-Requests
hnowlan changed the status of T377773: Give Dumps 1.0 access to gmodena from Open to Stalled.
Fri, Oct 25, 3:36 PM · SRE, SRE-Access-Requests
hnowlan moved T378082: Requesting access to 'deployment' for 'Joely Rooke WMDE' from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Fri, Oct 25, 3:35 PM · SRE, SRE-Access-Requests
aborrero updated the task description for T378192: openstack: wmf sink: extend it to support IPv6.
Fri, Oct 25, 3:25 PM · User-aborrero, Cloud-VPS, cloud-services-team
aborrero created T378192: openstack: wmf sink: extend it to support IPv6.
Fri, Oct 25, 3:19 PM · User-aborrero, Cloud-VPS, cloud-services-team
Jhancock.wm closed T378171: decommission ganeti2011/ganeti2012 as Resolved.
Fri, Oct 25, 3:03 PM · ops-codfw, DC-Ops, SRE, decommission-hardware
thcipriani moved T378076: Parsercache issues in codfw causing large-scale outage from Untriaged to Oct 2024 on the Wikimedia-production-error board.
Fri, Oct 25, 2:52 PM · SRE, DBA, Wikimedia-production-error
Tsevener created T378187: Access issue for golson-wmf.
Fri, Oct 25, 2:37 PM · Gerrit
Jclark-ctr renamed T378185: Q2:rack/setup/install wikikube-worker12[43-58] from Q#:rack/setup/install X to Q2:rack/setup/install wikikube-worker12[43-58].
Fri, Oct 25, 2:35 PM · SRE, ops-eqiad, DC-Ops
Maintenance_bot added a project to T378185: Q2:rack/setup/install wikikube-worker12[43-58]: SRE.
Fri, Oct 25, 2:31 PM · SRE, ops-eqiad, DC-Ops
ssingh added a comment to T378184: Rate Limited for data science project.

Hi: For some more context, Matt reached out to us on IRC (#wikimedia-analytics) and I asked them to file a task here.

Fri, Oct 25, 2:09 PM · Research, Data-Engineering
ssingh triaged T378184: Rate Limited for data science project as Medium priority.
Fri, Oct 25, 2:07 PM · Research, Data-Engineering
ssingh updated Other Assignee for T378184: Rate Limited for data science project, removed: S-Gatekeeper.
Fri, Oct 25, 2:06 PM · Research, Data-Engineering
Papaul moved T378171: decommission ganeti2011/ganeti2012 from Backlog to Decommission on the ops-codfw board.
Fri, Oct 25, 1:49 PM · DC-Ops, ops-codfw, SRE, decommission-hardware
Ottomata moved T291645: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL from Backlog to Stream Data Products on the Event-Platform board.
Fri, Oct 25, 1:33 PM · Observability-Logging, Analytics, Data-Engineering, Event-Platform, Wikimedia-Logstash, SRE
Ottomata added a comment to T370424: Streamline Data Platform access approvals for WMF staff.

Mid-term the approval management will move to Bitu/idm.wikimedia.org

COOL!

Fri, Oct 25, 1:05 PM · Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, SRE
MBH updated subscribers of T378069: Create a mail address for Russian Wikipedia oversighters.

@Dzahn ruwiki's oversighters are @DR @Leloiandudu @Q-bit-array and maybe @Tatewaki (I'm not sure this is his account, but he has this username in wiki). You can appoint them as admins of the list.

Fri, Oct 25, 12:52 PM · SRE, Wikimedia-Mailing-lists
Maintenance_bot removed a project from T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018: Patch-For-Review.
Fri, Oct 25, 12:31 PM · Ganeti, Infrastructure-Foundations, SRE
gerritbot added a project to T309724: SSH host key verification failures in Ganeti intra node SSH calls after Bullseye update: Patch-For-Review.
Fri, Oct 25, 12:24 PM · Patch-For-Review, Ganeti, Infrastructure-Foundations, SRE
gerritbot added a comment to T309724: SSH host key verification failures in Ganeti intra node SSH calls after Bullseye update.

Change #1083165 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] ganeti-test: Enable puppet-managed /var/lib/ganeti/known_hosts for the role

https://gerrit.wikimedia.org/r/1083165

Fri, Oct 25, 12:24 PM · Patch-For-Review, Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Fri, Oct 25, 12:16 PM · Ganeti, Infrastructure-Foundations, SRE
darthmon_wmde merged task T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE into T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.
Fri, Oct 25, 12:07 PM · SRE, LDAP-Access-Requests
darthmon_wmde added a comment to T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE.

hi there, hereby I am backing up this request as @Deepesha_WMDE 's team lead for Wikibase Suite team at WMDE

Fri, Oct 25, 12:07 PM · SRE, LDAP-Access-Requests
darthmon_wmde reopened T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE as "Open".
Fri, Oct 25, 12:06 PM · SRE, LDAP-Access-Requests
darthmon_wmde merged T378181: Grant Access to ldap/wmde for Deepesha Burse WMDE into T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.
Fri, Oct 25, 12:06 PM · SRE, LDAP-Access-Requests
darthmon_wmde added a comment to T378182: Grant Access to ldap/nda for Deepesha Burse WMDE.

hi there, hereby I am backing up this request as @Deepesha_WMDE 's team lead for Wikibase Suite team at WMDE

Fri, Oct 25, 12:03 PM · SRE, LDAP-Access-Requests