How a Nexus Repository Manager corruption led to a mini Odyssey


Last week (worst Friday of all ;-)) we had a very serious incident in our Nexus Repository Manager service that affected the release lifecycle of our products. Something went wrong on our NFS server, and our Nexus Docker container started reporting corruption errors such as:

2019-09-05 19:38:17,321+0000 ERROR [FelixStartLevel] *SYSTEM com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - $ANSI{green {db=config}} Error on creating record in cluster: plocal cluster: quartz_trigger
com.orientechnologies.orient.core.exception.OPaginatedClusterException: Error during record creation
DB name="config"
Component Name="quartz_trigger"
at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.createSinglePageRecord(OPaginatedCluster.java:687)
at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.createDataRecord(OPaginatedCluster.java:564)

When I searched for how others had fixed this, I found that most of them had dropped the affected OrientDB table: config.quartz_trigger. This table only holds the scheduled tasks, so it was not a big deal to drop and recreate it.
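To see which database and component an error like the one above points at, the log can be scanned for the `DB name` and `Component Name` lines. The following is only a sketch: the function name `corrupt_components` is made up for illustration, and it assumes the two lines appear in the same order and format as in the excerpt above. It tells you where to look, not what is safe to drop.

```shell
# Sketch: list distinct "<db>.<component>" pairs mentioned in OrientDB error output.
# corrupt_components is a hypothetical helper name, not part of Nexus or OrientDB.
corrupt_components() {
    # Reads log text on stdin. Splits each line on double quotes, so the
    # second field is the quoted value of DB name / Component Name.
    awk -F'"' '
        /^[ \t]*DB name="/        { db = $2 }
        /^[ \t]*Component Name="/ { if (db != "") print db "." $2 }
    ' | sort -u
}

# Example using the lines from the log excerpt in this post.
corrupt_components <<'EOF'
com.orientechnologies.orient.core.exception.OPaginatedClusterException: Error during record creation
DB name="config"
Component Name="quartz_trigger"
EOF
# prints: config.quartz_trigger
```

Against a real log you would feed the file on stdin instead of the here-document; the location of nexus.log depends on how the data directory is mounted.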

To do this, connect to the OrientDB console. Here are the commands to connect to the config database and drop this table:

java -jar /opt/sonatype/nexus/lib/support/nexus-orient-console.jar
connect plocal:/opt/sonatype/sonatype-work/nexus3/db/config/ admin admin
drop class quartz_trigger

Then repair the database, disconnect, and restart Nexus.

REPAIR DATABASE component
DISCONNECT

Nevertheless, after the restart I ran into other errors, and it was not obvious what had caused them.

Return code is: 500 , ReasonPhrase:javax.servlet.ServletException: com.orientechnologies.orient.core.exception.OCommandExecutionException: Error on execution of command: sql.select from asset where bucket = :bucket and name = :propValue??	DB name="component". -> [Help 1]

This time Google and Stack Overflow were no help, so I had to work out what had happened myself. After the first corruption I had upgraded Nexus, in case a bug in the older release was to blame. The upgrade, however, did not carry over the old nexus.vmoptions, so the following VM options were missing:

-Xms2703m -Xmx2703m -XX:MaxDirectMemorySize=2703m

I realized that OrientDB did not have enough direct memory: MaxDirectMemorySize sets a limit on the amount of memory that can be reserved for all direct byte buffers. As a result, no new blob stores, no new settings, and no uploads were possible. I updated the Ansible task that creates the Docker container:

    - name: Create nexus container
      docker_container:
        name: nexus
        image: sonatype/nexus3:3.18.1
        state: started
        restart_policy: unless-stopped
        env:
          INSTALL4J_ADD_VM_PARAMS: "-Xms2703m -Xmx2703m -XX:MaxDirectMemorySize=2703m"
        volumes:
          - /data/nexus/sonartype-work:/opt/sonatype/sonatype-work:rw
        ports:
          - "8081:8081"
          - "9000:9000"
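Since the missing options went unnoticed until things broke, a small check after each upgrade might have caught it earlier. Here is a minimal sketch, assuming the flags are space-separated in a single string; the function name `missing_vm_flags` is hypothetical, and the three flag prefixes are simply the ones from this post.

```shell
# Sketch: report which of the expected JVM flags are absent from a parameter string.
# missing_vm_flags is a hypothetical helper; adjust the flag list to your setup.
missing_vm_flags() {
    # $1: the JVM parameter string to inspect,
    #     e.g. the value of INSTALL4J_ADD_VM_PARAMS
    local params="$1"
    local flag
    for flag in -Xms -Xmx -XX:MaxDirectMemorySize=; do
        case " $params" in
            *" $flag"*) ;;               # flag present, nothing to report
            *) printf '%s\n' "$flag" ;;  # flag absent, print its prefix
        esac
    done
}

# Example: only the heap settings are given, so the direct-memory flag is reported.
missing_vm_flags "-Xms2703m -Xmx2703m"
# prints: -XX:MaxDirectMemorySize=
```

With the container from the task above, the string to check could come from something like `docker exec nexus printenv INSTALL4J_ADD_VM_PARAMS`; empty output from the function means all three flags are set.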

In the meantime, developers had run several Maven deploys, which led to the following error:

2019-09-06 11:21:20,573+0000 WARN [qtp1097449578-137] deployment org.sonatype.nexus.transaction.RetryController - Exceeded retry limit: 8/8 (com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record #31:309836: found duplicated key 'OCompositeKey{keys=[#22:6, null, gr/aaafx/backend/aaafx-dealer/maven-metadata.xml]}' in index 'asset_bucket_component_name_idx' previously assigned to the record #30:307921
DB name="component" INDEX=asset_bucket_component_name_idx RID=#30:307921)

Fortunately, the help command of the OrientDB console listed a command called truncate record. Voilà! That did the trick: after a Nexus restart, everything has worked as expected ever since.

Here are the commands I applied:

java -jar /opt/sonatype/nexus/lib/support/nexus-orient-console.jar
connect plocal:/opt/sonatype/sonatype-work/nexus3/db/component/ admin admin

load record #30:307921
truncate record #30:307921
rebuild index asset_bucket_component_name_idx
REPAIR DATABASE component
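The record passed to truncate record is the one the warning reports as "previously assigned", and the index to rebuild is named in the same line. If there are many such warnings, they can be pulled out of the log with a sketch like the one below. The function name `duplicate_key_info` is hypothetical, and the pattern assumes the message wording shown in the warning above.

```shell
# Sketch: extract "<index> <previously assigned RID>" from ORecordDuplicatedException lines.
# duplicate_key_info is a hypothetical helper name.
duplicate_key_info() {
    # Reads log text on stdin and prints one line per matching log line.
    sed -n "s/.*in index '\([^']*\)' previously assigned to the record \(#[0-9]*:[0-9]*\).*/\1 \2/p"
}

# Example using the message from this post.
duplicate_key_info <<'EOF'
Cannot index record #31:309836: found duplicated key 'OCompositeKey{keys=[#22:6, null, gr/aaafx/backend/aaafx-dealer/maven-metadata.xml]}' in index 'asset_bucket_component_name_idx' previously assigned to the record #30:307921
EOF
# prints: asset_bucket_component_name_idx #30:307921
```

The first field is what goes after rebuild index and the second after truncate record. Piping the output through `sort -u` collapses repeated warnings for the same record.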

Make sure you always have a backup of the database you are about to touch. Nexus has three databases: config, component and security. I’ve added a scheduled task that creates a backup daily.

Here are the commands for backup and restore through the OrientDB console:

export database component-export
drop database
create database plocal:/nexus-data/db/component admin admin
import database component-export.json.gz
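Because drop database is irreversible, it seems prudent to check the export file between the export and drop steps. The sketch below only verifies that the archive exists, is non-empty, and is a readable gzip stream. It does not prove the JSON inside is complete. The function name `export_is_usable` is hypothetical.

```shell
# Sketch: basic sanity check of an OrientDB export archive before dropping the database.
# export_is_usable is a hypothetical helper name.
export_is_usable() {
    # $1: path to the export archive, e.g. component-export.json.gz
    local file="$1"
    [ -s "$file" ] || return 1       # must exist and be non-empty
    gzip -t "$file" 2>/dev/null      # must decompress without errors
}

# Example: decide whether it is reasonable to continue with the drop step.
if export_is_usable component-export.json.gz; then
    echo "export archive looks intact"
else
    echo "export archive missing or damaged, do not drop the database"
fi
```

Copying the archive to a location outside the Nexus data directory before continuing would add another layer of safety.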

Enjoyed the weekend after all 😉
