How indexing Postgres and flushing caches impacts performance
Dealing with performance problems in Atlassian tools is our bread and butter. When we start working with customers, they frequently tell us that their database is what is causing the slowdown. In our experience this has rarely been the root cause, but we still need to make sure we understand what knobs we can turn, and what the impact will be in order to help customers improve performance.
In this post we explore the impact of indexing PostgreSQL, warming caches in PostgreSQL, and filesystem caching on a large system.
- Wipe FS caches
root@infra-test:~# echo 1 > /proc/sys/vm/drop_caches
- Restart PostgreSQL (to wipe PostgreSQL caches)
- Run (several times)
select * from jiraissue where summary like '%test%';
Timing information comes from PostgreSQL \timing command.
- 21327.413 ms << Filesystem caches and PostgreSQL reset prior to test
- 1444.533 ms << PostgreSQL reset prior to test
- 1442.624 ms<< PostgreSQL reset prior to test
- 1415.440 ms<< PostgreSQL reset prior to test
- 21068.620 ms << Filesystem caches and PostgreSQL reset prior to test
- 1238.252 ms << Query against warm caches
- 1221.243 ms << Query against warm caches
- 1255.460 ms << Query against warm caches
- 1249.523 ms << Query against warm caches
- 20762.072 ms << Filesystem caches and PostgreSQL reset prior to test
- 1394.704 ms << PostgreSQL reset prior to test
- 1665.776 ms << PostgreSQL reset prior to test
- 1350.857 ms << PostgreSQL reset prior to test
- 1448.302 ms << PostgreSQL reset prior to test
- 20220.585 ms << Filesystem caches and PostgreSQL reset prior to test
- 1204.089 ms << Query against warm caches
- 1238.489 ms << Query against warm caches
- 1228.982 ms << Query against warm caches
- 1253.364 ms << Query against warm caches
Does not provide much value beyond reducing size on disk. The size of the DB on disk went from 47gb to 42gb in this test. This can have a significant impact if the reduction in size are large enough to load frequently used contents into FS caches. For example, if PostgreSQL is now able to hold the group membership entries in cache instead of on disk, this can really bring improvements.
PostgreSQL warm caches
Provides a very minor benefit.
Provides a huge benefit and should be monitored carefully. FS caches are going to be critical in an environment where the DB size is larger than the memory available to the DB.
First, we must remember that with a dataset this large, PostgreSQL is not able to load most of the content into memory. We still need to test a situation where the system has sufficient ram, and PostrgreSQL is configured to use more of it. Second, JIRA does not search issue data in the way we are testing except when performing a re-index. The goal here was not to simulate activity like what JIRA would do, simply to understand 1. if a REINDEX provides any meaningful performance gain, 2. if PostgreSQL caches are helping to reduce the query times for operations like this, and 3. if filesystem caches provide a significant improvement in performance when the DB size far outweighs the size of the available memory.
JIRA dataset used for testing:
- JIRA 7.3.4
- Comments 14000102
- Components 10000
- Custom Fields 507
- Groups 1003
- Issue Types 112
- Issues 2000000
- Priorities 5
- Projects 1000
- Resolutions 50
- Screen Schemes 751
- Screens 753
- Statuses 55
- Users 2002
- Versions 10000
- Workflows 100
This was running on Ubuntu 16.04 on a n1-standard-4 (4 vCPUs, 15 GB memory) virtual machine in Google Cloud Platform. PostgreSQL 9.4.11 was used for testing, and the following changes were made in postgresql.conf:
- shared_buffers = 258MB
- work_mem = 16MB
- checkpoint_segments = 8
- checkpoint_completion_target = 0.7
- effective_cache_size = 1GB
When we started, the DB on disk was ~ 45GB in size based on \l+. After the REINDEX it was ~ 42GB.
Reindex was performed using REINDEX DATABASE jira734; which took 2509264.949 ms.