RocksDB performance problems

Description

We have been experiencing performance issues since Pantheon 1.1.3 due to RocksDB read performance.

Our context is the following:

  • We use Eventeum as the event listener platform for Besu. We are core committers of the Eventeum project.

  • Eventeum internally stores a kind of index recording the latest block it has read, so that it can retrieve every event emitted on the chain.

  • When Eventeum stops for a while (planned maintenance, a configuration change, or a read problem at the node level where the node keeps importing new blocks but cannot serve Eventeum's queries), it falls behind the chain's head block.

  • When Eventeum starts, it creates a log filter for each event filter via eth_newFilter, catches up on the missed events via eth_getFilterLogs, and then, once in sync, polls for updates via eth_getFilterChanges.
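A minimal sketch of the JSON-RPC request bodies behind that start-up flow, assuming the standard Ethereum JSON-RPC 2.0 envelope. The helper names (`rpc`, `new_filter_request`, and so on), the contract address and the empty topic list are hypothetical; this is not Eventeum's code. It makes one point visible: `fromBlock` is the first unprocessed block and `toBlock` is `latest`, so the single `eth_getFilterLogs` catch-up call has to cover every block Eventeum missed.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids, incremented per call


def rpc(method, params):
    """Build a JSON-RPC 2.0 request body for an Ethereum node."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def new_filter_request(from_block, address, topics):
    # eth_newFilter takes a single filter object; block numbers are hex quantities.
    return rpc("eth_newFilter", [{
        "fromBlock": hex(from_block),
        "toBlock": "latest",
        "address": address,
        "topics": topics,
    }])


def get_filter_logs_request(filter_id):
    # Catch-up: returns all logs matching the filter over its whole block range.
    return rpc("eth_getFilterLogs", [filter_id])


def get_filter_changes_request(filter_id):
    # Steady state: returns only logs that appeared since the previous poll.
    return rpc("eth_getFilterChanges", [filter_id])


# Hypothetical values: last block Eventeum processed and the contract it watches.
last_processed = 1_000_000
body = new_filter_request(last_processed + 1, "0x" + "00" * 20, [])
print(json.dumps(body))
```

If the head is around block 1,100,000 at restart, the catch-up request spans roughly 100k blocks in one call, which is the size at which the hangs described in this issue appear.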

We are seeing that when, for whatever reason, the number of unsynced blocks exceeds 10,000, performance drops, and at 100k blocks Besu hangs at the query level, locking RocksDB, with traces like "Thread Thread[vert.x-worker-thread-18,5,main] has been blocked for 76419 ms, time limit is 60000 ms".

Based on this scenario, we ran some performance checks comparing Besu with GoQuorum:

  • 10k blocks: Besu = 5 s, Quorum = 0.9 s

  • 50k blocks: Besu = 10 s, Quorum = 4 s

  • 100k blocks: Besu = 20 s, Quorum = 10 s

  • More than 100k blocks: Besu's RocksDB threads block; Quorum takes some time but answers.

This is strange, given how LevelDB (used by GoQuorum) and RocksDB (used by Besu) usually compare.

Given this scenario, we urgently need a solution to boost performance.

Meanwhile, we have several ideas in mind:

  • Monitor RocksDB with the following metrics: "Latency for read from RocksDB.", "Latency of remove requests from RocksDB.", "Latency for write to RocksDB." and "Latency for commits to RocksDB.", raising alerts when it underperforms. The problem is that we cannot see these metrics on the metrics endpoint in either 1.2.2 or 1.3.1. Can you review the following bug? Do you have any suggestions for monitoring RocksDB?

  • At the Eventeum level, split the unsynced blocks into chunks based on a block window. Any suggestion on the maximum window size to use?
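One way to check whether any RocksDB metrics are exported at all is to scrape the metrics endpoint and filter the Prometheus text output. This sketch assumes a Prometheus-format endpoint at `http://127.0.0.1:9545/metrics` (adjust host and port to your node) and that the relevant series contain `rocksdb` in their metric name; the exact Besu metric names are not confirmed here. `slow_series` and its threshold are hypothetical helpers for alerting on summary quantiles.

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:9545/metrics", timeout=5.0):
    """Download the Prometheus text exposition from the node (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_sample(line):
    """Split one sample line 'name{labels} value [timestamp]' into (series, value)."""
    if "}" in line:
        end = line.rindex("}") + 1
        series, rest = line[:end], line[end:]
    else:
        series, _, rest = line.partition(" ")
    return series, float(rest.split()[0])


def rocksdb_samples(exposition_text):
    """Keep only samples whose metric name mentions rocksdb (case-insensitive)."""
    samples = {}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = parse_sample(line)
        name = series.split("{", 1)[0]
        if "rocksdb" in name.lower():
            samples[series] = value
    return samples


def slow_series(samples, threshold_seconds):
    """Hypothetical alert rule: summary quantiles above a latency threshold."""
    return [s for s, v in samples.items() if "quantile=" in s and v > threshold_seconds]
```

Usage would be `rocksdb_samples(fetch_metrics())` followed by `slow_series(...)` with a threshold of your choosing. An empty result from `rocksdb_samples` would mean the node exposes no RocksDB series at all, which matches the missing-metrics symptom rather than indicating a slow database.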
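A minimal sketch of the chunked catch-up, assuming a hypothetical `send(method, params)` callable that performs a JSON-RPC call and raises `TimeoutError` when the node does not answer in time. It swaps in `eth_getLogs`, which accepts an explicit `fromBlock`/`toBlock`, instead of the `eth_newFilter`/`eth_getFilterLogs` pair. Rather than fixing a maximum window up front, it halves the window when a request times out; the default of 5,000 blocks and the floor of 100 are placeholders, not measured recommendations.

```python
def catch_up(send, from_block, to_block, address, window_size=5_000, min_window=100):
    """Fetch logs for [from_block, to_block] in windows, halving the window on timeout.

    `send(method, params)` is a hypothetical JSON-RPC caller that returns the
    decoded result list and raises TimeoutError when the node is too slow.
    """
    if window_size < 1 or min_window < 1:
        raise ValueError("window sizes must be positive")
    logs = []
    start = from_block
    while start <= to_block:
        end = min(start + window_size - 1, to_block)  # inclusive upper bound
        params = [{"fromBlock": hex(start), "toBlock": hex(end), "address": address}]
        try:
            logs.extend(send("eth_getLogs", params))
        except TimeoutError:
            if window_size <= min_window:
                raise  # even the smallest window failed; surface the error
            window_size = max(min_window, window_size // 2)
            continue  # retry from the same start block with a smaller window
        start = end + 1  # window succeeded; move past it
    return logs
```

A window that succeeds advances `start`, and a failing one is retried from the same block with a smaller range, so on the client side no block range is skipped or returned twice.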

Kind regards

Environment

None

Status

Assignee

Danno Ferrin

Reporter

Fernando Paris

Labels

None

Scrum Team

Chupacabra

Refinement State

Not Started

Components

Sprint

Fix versions

Priority

P2