Upgrade Archiva (meitnerium) to Debian Stretch
Closed, Resolved | Public | 13 Estimated Story Points

Description

Archiva runs on meitnerium (Ganeti) and currently runs Debian Jessie. A while ago, while trying to upgrade it to openjdk-8, I hit a limitation of the current version (2.0.0-1): it only supports openjdk-7.

Debian Stretch defaults to openjdk-8, so an Archiva version bump would be needed. We maintain the archiva package in:

https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/archiva

The latest archiva release seems to be 2.2.3: https://github.com/apache/archiva/releases

Ideally we could do the following:

  1. create another Ganeti VM for archiva 2.2.3 + Stretch and test it.
  2. when ready, flip archiva.wikimedia.org. to the new VM, keeping meitnerium as backup in case something unexpected doesn't work.
  3. decom meitnerium

Details

Repo                     Branch      Lines +/-
operations/puppet        production  +20 -15
operations/puppet        production  +1 -10
operations/puppet        production  +1 -1
operations/puppet        production  +3 -1
operations/puppet        production  +12 -11
operations/dns           master      +1 -3
operations/dns           master      +1 -1
operations/puppet        production  +0 -3
operations/puppet        production  +2 -2
operations/puppet        production  +16 -9
operations/puppet        production  +3 -11
operations/dns           master      +2 -0
operations/puppet        production  +9 -1
operations/puppet        production  +2 -0
operations/debs/archiva  debian      +8 -0
operations/puppet        production  +1 -1
operations/puppet        production  +21 -1
operations/puppet        production  +14 -6
operations/puppet        production  +9 -0
operations/puppet        production  +1 -1
operations/puppet        production  +20 -5
operations/debs/archiva  debian      +62 -1 K
operations/debs/archiva  master      +977 -1 K
operations/puppet        production  +6 -4
operations/puppet        production  +40 -19
operations/puppet        production  +2 -4
Event Timeline

Looking at meitnerium's resources, it does not require much, so yes, we can create a new Ganeti VM. Wanna file a task using the link in https://phabricator.wikimedia.org/project/profile/1234/ for the paper trail? In the meantime I can start doing the actual VM provisioning.

sudo gnt-instance list -o name,be/memory,be/vcpus,disk.count,disk.sizes,nic.count meitnerium
Instance                 ConfigMaxMem ConfigVCPUs Disks Disk_sizes   NICs
meitnerium.wikimedia.org         4.0G           4     2 51200,102400    1
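
For sizing the replacement VM, the listing above can be turned into numbers with a few lines of Python. This is only a sketch: it assumes the six-column layout shown here, and that the disk sizes are reported in MiB (an assumption, not stated in the output). The function name is made up for illustration.

```python
# Sample copied from the `gnt-instance list` output above.
SAMPLE = """\
Instance                 ConfigMaxMem ConfigVCPUs Disks Disk_sizes   NICs
meitnerium.wikimedia.org         4.0G           4     2 51200,102400    1
"""


def parse_gnt_instance_list(output):
    """Parse the six whitespace-separated columns requested with -o above."""
    lines = [line for line in output.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # the first non-empty line is the header
        name, memory, vcpus, disk_count, disk_sizes, nics = line.split()
        sizes = [int(size) for size in disk_sizes.split(",")]
        rows.append({
            "name": name,
            "memory": memory,  # kept as printed, e.g. "4.0G"
            "vcpus": int(vcpus),
            "disk_count": int(disk_count),
            "disk_sizes": sizes,
            "disk_total_mib": sum(sizes),  # assumes sizes are in MiB
            "nics": int(nics),
        })
    return rows


if __name__ == "__main__":
    for row in parse_gnt_instance_list(SAMPLE):
        print(row["name"], row["vcpus"], "vCPUs,", row["disk_total_mib"], "MiB of disk")
```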

Change 449755 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/archiva@master] Import upstream version 2.2.3

https://gerrit.wikimedia.org/r/449755

Change 449950 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva: depend on default-jdk

https://gerrit.wikimedia.org/r/449950

Change 449950 merged by Elukey:
[operations/puppet@production] profile::archiva: depend on default-jdk

https://gerrit.wikimedia.org/r/449950

Change 449974 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva: move proxy settings to a different profile

https://gerrit.wikimedia.org/r/449974

Change 449974 merged by Elukey:
[operations/puppet@production] profile::archiva: move proxy settings to a different profile

https://gerrit.wikimedia.org/r/449974

Change 449997 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: include acme config only when needed

https://gerrit.wikimedia.org/r/449997

Change 449997 merged by Elukey:
[operations/puppet@production] archiva::proxy: include acme config only when needed

https://gerrit.wikimedia.org/r/449997

Change 449755 merged by Elukey:
[operations/debs/archiva@master] Import upstream version 2.2.3

https://gerrit.wikimedia.org/r/449755

Change 450081 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/archiva@debian] Release 2.2.3-1

https://gerrit.wikimedia.org/r/450081

Change 450081 merged by Elukey:
[operations/debs/archiva@debian] Release 2.2.3-1

https://gerrit.wikimedia.org/r/450081

Summary of what has been done and the next steps:

  • a new VM called archiva1001.wikimedia.org is bootstrapped in T200895
  • the archiva debian package repository has been upgraded to 2.2.3 (latest upstream), and the code tested in labs on archiva.eqiad.wmflabs.
  • the debian package has not been uploaded to stretch-wikimedia yet.

When archiva1001 is ready, we'll apply role(archiva) to it and prepare the host with all the settings needed (some manual configuration work is required IIUC, especially for LDAP). Once we are ready to go, we'll flip the DNS A record for archiva.wikimedia.org to archiva1001.wikimedia.org's IP.

The rollback procedure (in case of issues) will simply be a rollback of the DNS A record.
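
Both the flip and the rollback can be verified the same way: check which backend the service name currently resolves to. The sketch below uses only the Python standard library. The helper names `resolve_ipv4` and `points_at` are hypothetical, and the resolver is injectable so the logic can be exercised without touching real DNS.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def points_at(service, backend, resolver=resolve_ipv4):
    """True when every address of `service` is also an address of `backend`."""
    service_ips = set(resolver(service))
    backend_ips = set(resolver(backend))
    return bool(service_ips) and service_ips <= backend_ips
```

After the flip, `points_at("archiva.wikimedia.org", "archiva1001.wikimedia.org")` would be expected to return True once caches have expired. After a rollback, the same call with meitnerium as the backend should hold instead.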

Eventually meitnerium will be decommed.

Change 454497 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva::proxy: protect first run with firewall rules

https://gerrit.wikimedia.org/r/454497

Change 454497 merged by Elukey:
[operations/puppet@production] profile::archiva::proxy: protect first run with firewall rules

https://gerrit.wikimedia.org/r/454497

Change 454502 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add component/archiva to stretch-wikimedia

https://gerrit.wikimedia.org/r/454502

A couple of important notes found this morning while checking puppet and archiva:

  • archiva::proxy uses letsencrypt::cert::integrated, which means that the host actively contacts the Let's Encrypt servers via the ACME protocol to prove that it controls the domain. This is currently done on meitnerium for archiva.wikimedia.org, for example. Deploying archiva on a new host, like archiva1001, must be done with do_acme: false in hiera, to force the class to use a self-signed certificate rather than contacting Let's Encrypt via ACME.
  • During the first run, archiva asks the user to configure an admin account, so something must be put in place to prevent untrusted users from messing with archiva1001's setup. https://gerrit.wikimedia.org/r/454497 is an attempt to restrict ports 80/443 via firewall rules, which should do the job.
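
The idea behind the firewall protection in the second point is a source-address allowlist. The snippet below is only an illustration of that check in Python. It is not the ferm rule from the patch, and the ranges are placeholders (a private range and the IPv6 documentation prefix), not the real trusted networks.

```python
import ipaddress

# Placeholder ranges, for illustration only; the real allowlist lives in puppet.
ALLOWED_SOURCES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(source_ip, allowed=ALLOWED_SOURCES):
    """True when source_ip falls inside at least one allowed network."""
    address = ipaddress.ip_address(source_ip)
    # Membership is False when the address and network IP versions differ.
    return any(address in network for network in allowed)
```

Anything outside the allowlist would never reach the first-run admin setup page on ports 80/443.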

Change 454508 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva: set apt component when running on Debian Stretch

https://gerrit.wikimedia.org/r/454508

Change 454508 abandoned by Elukey:
archiva: set apt component when running on Debian Stretch

Reason:
not needed!

https://gerrit.wikimedia.org/r/454508

Change 454502 abandoned by Elukey:
Add component/archiva to stretch-wikimedia

Reason:
not needed!

https://gerrit.wikimedia.org/r/454502

Mentioned in SAL (#wikimedia-operations) [2018-08-22T10:43:02Z] <elukey> upload archiva 2.2.3-1 to stretch-wikimedia/main - T192639

Change 454511 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign role::archiva to archiva1001

https://gerrit.wikimedia.org/r/454511

Change 454514 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva::proxy: add monitoring_enabled parameter

https://gerrit.wikimedia.org/r/454514

Change 454514 merged by Elukey:
[operations/puppet@production] profile::archiva::proxy: add monitoring_enabled parameter

https://gerrit.wikimedia.org/r/454514

Change 454511 merged by Elukey:
[operations/puppet@production] Assign role::archiva to archiva1001

https://gerrit.wikimedia.org/r/454511

Change 454548 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva::proxy: fix ferm srange

https://gerrit.wikimedia.org/r/454548

Change 454548 merged by Elukey:
[operations/puppet@production] profile::archiva::proxy: fix ferm srange

https://gerrit.wikimedia.org/r/454548

Change 454551 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/archiva@debian] Release 2.2.3-2

https://gerrit.wikimedia.org/r/454551

Change 454551 merged by Elukey:
[operations/debs/archiva@debian] Release 2.2.3-2

https://gerrit.wikimedia.org/r/454551

Change 454576 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] site.pp: add a note about archiva1001's status

https://gerrit.wikimedia.org/r/454576

Change 454576 merged by Elukey:
[operations/puppet@production] site.pp: add a note about archiva1001's status

https://gerrit.wikimedia.org/r/454576

After a long battle I was able to log in on archiva1001 (via ssh tunnel) using my username, getting the Archiva Admin rights since I am part of the 'ops' LDAP group (previously configured with these perms).

Next step is to figure out what is needed on the new host (rsync, config, etc.) and then prepare the switch.

I had a very interesting chat with @Gehel and @dcausse about Archiva, and some good points were raised:

  • it would be great to get rid of the archiva-deploy user; sharing the pass is a pain.
  • the currently configured repositories are not very granular. For example, mirrored is a proxy for various sources, like cloudera, central and spark, so it is difficult for somebody who needs a more fine-grained dependency (like only central) to use archiva for it. Tracking dependencies, in case something needs investigation, would also be more painful with the mirrored repository. A possible solution could be to define each artifact source as its own repository, and then use repository groups to do what "mirrored" does right now.

So one possible solution for the above points, to implement in the new archiva1001 host, would be:

  • LDAP authentication for users, who instead of using archiva-deploy would simply need to be in an LDAP group called archiva-deployers (already created).
  • configure archiva with more repositories rather than having a big mirrored one, and leverage repository groups to mimic what we have now.
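
To make the repository-group idea concrete, here is a toy model in Python. It is not Archiva's API or configuration format. The repository names follow the discussion above, the artifact coordinates are invented placeholders, and the in-order lookup across group members is an assumption about how a group serves requests.

```python
# Toy data: one repository per upstream source, with made-up artifact coordinates.
REPOSITORIES = {
    "central": {"org.example:from-central:1.0"},
    "cloudera": {"org.example:from-cloudera:1.0"},
    "spark-packages": {"org.example:from-spark-packages:1.0"},
}

# A group exposes several repositories under a single name, here "mirrored".
GROUPS = {"mirrored": ["central", "cloudera", "spark-packages"]}


def resolve(group, artifact):
    """Return the first member repository of `group` that holds `artifact`, else None."""
    for repository in GROUPS[group]:
        if artifact in REPOSITORIES[repository]:
            return repository
    return None
```

Clients that want everything keep using the "mirrored" name, while a client that only trusts central can point at that repository directly. Each artifact also stays traceable to the source it came from.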

Adding @EBernhardson, @Smalyshev and @thcipriani since IIRC they are archiva users; it would be great to know their thoughts as well.

Sounds good, though I have a concern about the authentication. In order to deploy to archiva (at least currently), I have to store username/password in Maven config files. Since it's a dedicated archiva password, I am not super-worried, but if it were my personal account password, I would be worried a bit more. Not sure whether it's a justified concern or not, but I wonder if there's a way to handle it better - maybe with SSH keys?

Change 455082 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: use certificate_name rather than only 'archiva'

https://gerrit.wikimedia.org/r/455082

Change 455082 merged by Elukey:
[operations/puppet@production] archiva::proxy: use certificate_name rather than only 'archiva'

https://gerrit.wikimedia.org/r/455082

Change 455086 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva::proxy: allow the use of a different domain/tls-cert

https://gerrit.wikimedia.org/r/455086

Change 455086 merged by Elukey:
[operations/puppet@production] profile::archiva::proxy: allow the use of a different domain/tls-cert

https://gerrit.wikimedia.org/r/455086

Change 455087 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Add CNAME archiva-new -> archiva1001

https://gerrit.wikimedia.org/r/455087

Change 455087 merged by Elukey:
[operations/dns@master] Add CNAME archiva-new -> archiva1001

https://gerrit.wikimedia.org/r/455087

Change 455091 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Deploy archiva-new.w.o's TLS LE certificate to archiva1001

https://gerrit.wikimedia.org/r/455091

Change 455091 merged by Elukey:
[operations/puppet@production] Deploy archiva-new.w.o's TLS LE certificate to archiva1001

https://gerrit.wikimedia.org/r/455091

Change 455137 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated

https://gerrit.wikimedia.org/r/455137

Change 455137 merged by Elukey:
[operations/puppet@production] archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated

https://gerrit.wikimedia.org/r/455137

Sounds good, though I have a concern about the authentication. In order to deploy to archiva (at least currently), I have to store username/password in Maven config files. Since it's a dedicated archiva password, I am not super-worried, but if it were my personal account password, I would be worried a bit more. Not sure whether it's a justified concern or not, but I wonder if there's a way to handle it better - maybe with SSH keys?

I get your point, but I'd argue that the archiva-deploy password should be as valuable as your own (since it allows uploading software to our repos). I am not familiar with Maven config files, but a solution like https://wikitech.wikimedia.org/wiki/ReleasingToMavenCentral/Settings.xml might be enough? (assuming good file permissions on ~/.m2/settings.xml)
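
The "good file permissions" assumption is easy to check mechanically. A minimal sketch for POSIX systems, using only the standard library (the function name is hypothetical): it reports whether a file is inaccessible to group and others, which is what one would want for a settings.xml that holds credentials.

```python
import os
import stat


def is_private(path):
    """True when neither group nor others have any permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    settings = os.path.expanduser("~/.m2/settings.xml")
    if os.path.exists(settings):
        print(settings, "is private" if is_private(settings) else "is readable by other users")
```

A file with mode 0600 passes the check; one with mode 0644 does not.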

Current status is:

  • archiva-new.wikimedia.org has been created (but it is empty for the moment). Tested that people in the ops LDAP group can see the system admin panel.
  • the archiva-deployers LDAP group will get the same permissions as the ones currently assigned to the archiva-deploy user on meitnerium.

Now it is a matter of configuring the repositories, but it is still not clear to me how it would be best to proceed.

Very interesting chat happened on IRC between @Gehel and @Ottomata about the current repository structure in Archiva (pros/cons/shortcomings/etc..):

16:09  <ottomata> elukey:  what are repository groups?
16:10  <ottomata> are you suggesting we make new repos that mirror the remote ones exactly?
16:10  <ottomata> like have a archiva hosted 'central', 'cloudera' and 'spark' repo?
16:11  <elukey> disclaimer: I learned today what those are, so not really an expert :)
16:12  <elukey> but yes that is the idea
16:12  <elukey> Gehel was proposing something similar since it might be less problematic for people that want only subset of deps rather than the whole big thing
16:13  <elukey> in theory if we have a repo group called "mirrored" it should do exactly what we want
16:13  <elukey> but not sure if worth it or not, this is the first time that I check archiva
16:13  <elukey> it is a bit obscure to me :)
16:14  <gehel> ottomata: not necessarily repo that mirror the full central (since we only use a very small subset), but proxy repo where we never upload manually.
16:15  <gehel> The current "mirror" repo is a bit of a mess, where it is impossible to know which jar comes from where
16:20  <ottomata> gehel:  right, but a remote repo central that mirrors to a local cached repo 'central' in archiva, right?
16:21  <ottomata> anytime someone dls a dep from central via archiva it would be cached in our archiva's central 'mirror'
16:21  <ottomata> like we do now, but instead of one 'mirrored' repo, separate 'central-mirrored' repos for each configured remote repo?
16:22  <gehel> ottomata: If I remember correctly, we don't have proxying always enabled so that a random person can't trigger a download from central (current situation). This could be kept
16:22  <ottomata> gehel:  i think we do have it enabled all the time
16:23  <ottomata> we didn't used to (the docs might say that), but it was so inconvenient that we left it proxying
16:23  <gehel> What I don't like in our current setup, is the manual uploads to the "mirrored" repo. It is too error proned (we've had a few cases)
16:23  <ottomata> manual uploads to mirrored?  ohhhhhh
16:23  <ottomata> right.
16:23  <ottomata> ok
16:24  <gehel> atm, we have remote repos for centran, cloudera and spark-packages
16:25  <elukey> what is the source of the jboss-related packages? Cloudera? (curious)
16:25  <gehel> as far as I can see, those are proxies and are specific enough. It is mostly the "mirrored" repo that I find troublesome. It is very unclear what is in there, where it comes from and why it was used.
16:26  <gehel> elukey: honestly, the jboss was an example (because it is a well known problematic repo). I think we have -jboss re-packaged deps in the mirrored repo, but I have not actually checked
16:26  <elukey> ack!
16:27  <gehel> The "wikimedia release" repo is also unclear as to what's in there. I would expect it to contain only packages maintained and build by WMF, but it does not seem to be the case
16:28  <gehel> basically, in all the "managed" repos that we have, only the "python" repo  has a clear meaning :)
16:29  <ottomata> gehel:  i also would expect releases to only contained packages by WMF
16:29  <ottomata> if it doesn't, that is weird
16:30  <ottomata> to me the 3 repos were always clear, although I get your argument about mirrored not being clear about where things come from
16:30  <ottomata> releases is for WMF versioned releases
16:30  <ottomata> snapshot is for WMF snapshot jars (which we don't really use that much)
16:30  <ottomata> and mirrored is for remote proxied dependencies
16:31  <gehel> so we agree about the expectations ! Except for mirrored, which does not seem to be a remote proxied repo, but a repo with manual uploads
16:31  <gehel> but the current reality seems to be far from our expectations!
16:32  <gehel> which probably means that this isn't clear enough to all users of those repos and that they have to think too much about how to use them
16:32  <ottomata> haha
16:32  <ottomata> i guess so!

I've rsynced /var/lib/archiva/repositories to archiva1001 and configured them, so archiva-new.wikimedia.org should be ready for the first round of tests. For the moment I decided to split the archiva upgrade (this task) from its repository refactoring/reshaping (which will be handled in another task).

Change 455579 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva1001: enable bacula backups for /var/lib/archiva

https://gerrit.wikimedia.org/r/455579

Result for mvn package set with archiva-new.wikimedia.org:

[INFO] Reactor Summary:
[INFO]
[INFO] Wikimedia Analytics Refinery 0.0.71-SNAPSHOT ....... SUCCESS [  0.527 s]
[INFO] Wikimedia Analytics Refinery Core .................. SUCCESS [01:09 min]
[INFO] Wikimedia Analytics Refinery Spark ................. SUCCESS [07:25 min]
[INFO] Wikimedia Analytics Refinery Tools ................. SUCCESS [ 30.765 s]
[INFO] Wikimedia Analytics Refinery Hive .................. SUCCESS [01:05 min]
[INFO] Wikimedia Analytics Refinery Jobs .................. SUCCESS [01:53 min]
[INFO] Wikimedia Analytics Refinery Camus ................. SUCCESS [01:27 min]
[INFO] Wikimedia Analytics Refinery Cassandra 0.0.71-SNAPSHOT SUCCESS [ 49.287 s]
  • Elasticsearch plugins have no direct dependency on archiva
  • logstash plugins only upload to archiva, @fgiunchedi has been notified, there is no reason this should break and I don't want to upload an intermediate version just to test
  • wdqs still needs to be validated by @Smalyshev

For the moment I decided to split the archiva upgrade (this task) from its repository refactoring/reshaping (which will be handled in another task).

It might be easier to do this now rather than later, since we have a test setup already. What really seems to need refactoring is the mirrored repo. I think we can/should keep releases and snapshots as they are, but refactor the mirrored repo into individual proxied repos for each remote, e.g. Cloudera, Central, etc.

Ideally, we'd be able to do this, and not rsync over the existing mirrored repo at all. Since all the artifacts in mirrored should only be proxied and cached, builds with the repos (like refinery) should cause archiva to re-download and re-cache the required dependencies in the proper places.

The main issue is that I already rsynced and configured the mirrored repo, I was under the impression that there was no clear quorum to change the status quo :)

Change 455579 merged by Elukey:
[operations/puppet@production] archiva1001: enable bacula backups for /var/lib/archiva

https://gerrit.wikimedia.org/r/455579

Change 455760 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Switch archiva.wikimedia.org to archiva1001

https://gerrit.wikimedia.org/r/455760

Change 455761 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Move archiva.wikimedia.org from meitnerium to archiva1001

https://gerrit.wikimedia.org/r/455761

The current status is:

  • archiva.wikimedia.org is controlled by meitnerium via letsencrypt::cert::integrated
  • archiva-new.wikimedia.org is controlled by archiva1001 via letsencrypt::cert::integrated

Target result:

  • archiva.wikimedia.org is controlled by archiva1001 via letsencrypt::cert::integrated
  • archiva-new.wikimedia.org is revoked (if needed?)

Plan:

The above plan assumes that moving archiva's certificate to a new host will be handled by letsencrypt::cert::integrated and doesn't need a manual copy. The alternative is to manually copy /etc/acme from meitnerium to archiva1001 before re-enabling puppet.

Hi @Smalyshev, any news about wdqs? :)

Seems to work OK for archiva-new for me. Though I see that some files are still going from archiva, so it might be that I failed to reconfigure to archiva-new... Will consult with @Gehel and verify again later today.

Change 455824 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Lower to 5M the TTL for archiva's CNAME record

https://gerrit.wikimedia.org/r/455824

Change 455824 merged by Elukey:
[operations/dns@master] Lower to 5M the TTL for archiva's CNAME record

https://gerrit.wikimedia.org/r/455824

Change 455760 merged by Elukey:
[operations/dns@master] Switch archiva.wikimedia.org to archiva1001

https://gerrit.wikimedia.org/r/455760

Change 455761 merged by Elukey:
[operations/puppet@production] Move archiva.wikimedia.org from meitnerium to archiva1001

https://gerrit.wikimedia.org/r/455761

Change 456090 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign role::spare::system to meitnerium

https://gerrit.wikimedia.org/r/456090

Change 456108 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: add support for ipv6 to nginx listen directives

https://gerrit.wikimedia.org/r/456108

Change 456108 merged by Elukey:
[operations/puppet@production] archiva::proxy: add support for ipv6 to nginx listen directives

https://gerrit.wikimedia.org/r/456108

Change 456109 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] archiva::proxy: missed a ';' in nginx's config

https://gerrit.wikimedia.org/r/456109

Change 456109 merged by Elukey:
[operations/puppet@production] archiva::proxy: missed a ';' in nginx's config

https://gerrit.wikimedia.org/r/456109

Change 456090 merged by Elukey:
[operations/puppet@production] Assign role::spare::system to meitnerium

https://gerrit.wikimedia.org/r/456090

Change 456156 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::archiva: allow rsync to bind to IPv6 interfaces

https://gerrit.wikimedia.org/r/456156

archiva.wikimedia.org works fine for me now.

archiva.wikimedia.org works fine for me now.

Thanks for the feedback!

Next steps to finish the upgrade:

  • Ask people to review and merge https://gerrit.wikimedia.org/r/456156 (rsync listening IPv6)
  • Decommission meitnerium (separate task already opened)
  • Verify that archiva1001 is correctly backed up via bacula
  • Update ops' pwstore with new admin passwords (plus remove archiva-deploy credentials)

Next steps to finish the upgrade:

  • Verify that archiva1001 is correctly backed up via bacula
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===================================================================
105926  Full    328,669    42.07 G  OK       28-Aug-18 04:59 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva
106048  Incr        536    963.0 M  OK       29-Aug-18 04:05 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva
106169  Incr     27,465    1.825 G  OK       30-Aug-18 04:08 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva

Looks good!
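
The same check can be scripted. The sketch below parses the job listing above and confirms that there is at least one Full job and that every listed job ended with status OK. It only handles the nine-field row layout shown here (other Bacula output, such as a byte count without a unit, would be skipped), and the function names are made up.

```python
# Sample copied from the "Terminated Jobs" listing above.
SAMPLE = """\
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===================================================================
105926  Full    328,669    42.07 G  OK       28-Aug-18 04:59 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva
106048  Incr        536    963.0 M  OK       29-Aug-18 04:05 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva
106169  Incr     27,465    1.825 G  OK       30-Aug-18 04:08 archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva
"""


def parse_terminated_jobs(text):
    """Extract job rows; lines that are not nine-field job rows are ignored."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or not fields[0].isdigit():
            continue  # title, header, separator, or an unexpected layout
        job_id, level, files, size, unit, status, day, clock, name = fields
        jobs.append({
            "job_id": int(job_id),
            "level": level,
            "files": int(files.replace(",", "")),
            "bytes": size + " " + unit,  # kept as printed, e.g. "42.07 G"
            "status": status,
            "finished": day + " " + clock,
            "name": name,
        })
    return jobs


def backups_look_healthy(jobs):
    """True when there is at least one Full job and every job ended OK."""
    has_full = any(job["level"] == "Full" for job in jobs)
    all_ok = all(job["status"] == "OK" for job in jobs)
    return has_full and all_ok
```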

Next steps to finish the upgrade:

  • Update ops' pwstore with new admin passwords (plus remove archiva-deploy credentials)

Done!

Change 456156 merged by Elukey:
[operations/puppet@production] profile::archiva: allow rsync to bind to IPv6 interfaces

https://gerrit.wikimedia.org/r/456156

elukey set the point value for this task to 13. Sep 5 2018, 9:39 AM