[Bug] Concurrent updates cause incorrect SELECT results #1465

@yanboer

Apache Cloudberry version

gpadmin=# select version();
                                                                                                    version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.4 (Apache Cloudberry 2.0.0-incubating build 1) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), 64-bit compiled on Dec  2 2025 19:35:44 (with assert checking)
(1 row)

[gpadmin@cdw ~]$ pg_config
BINDIR = /usr/local/cloudberry-db/bin
DOCDIR = /usr/local/cloudberry-db/share/doc/postgresql
HTMLDIR = /usr/local/cloudberry-db/share/doc/postgresql
INCLUDEDIR = /usr/local/cloudberry-db/include
PKGINCLUDEDIR = /usr/local/cloudberry-db/include/postgresql
INCLUDEDIR-SERVER = /usr/local/cloudberry-db/include/postgresql/server
LIBDIR = /usr/local/cloudberry-db/lib
PKGLIBDIR = /usr/local/cloudberry-db/lib/postgresql
LOCALEDIR = /usr/local/cloudberry-db/share/locale
MANDIR = /usr/local/cloudberry-db/share/man
SHAREDIR = /usr/local/cloudberry-db/share/postgresql
SYSCONFDIR = /usr/local/cloudberry-db/etc/postgresql
PGXS = /usr/local/cloudberry-db/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE =  '--prefix=/usr/local/cloudberry-db' '--enable-cassert' '--enable-debug-extensions' '--enable-ic-proxy' '--enable-mapreduce' '--enable-orafce' '--enable-orca' '--enable-pxf' '--enable-tap-tests' '--with-gssapi' '--with-ldap' '--with-libxml' '--with-openssl' '--with-pam' '--with-perl' '--with-pgport=5432' '--with-python' '--with-pythonsrc-ext'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-unused-but-set-variable -Werror=implicit-fallthrough=3 -Wno-format-truncation -Wno-stringop-truncation -O3 -fPIC  -DUSE_INTERNAL_FTS=1  -Werror=uninitialized -Werror=implicit-function-declaration -Werror
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/cloudberry-db/lib',--enable-new-dtags -Werror
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxerces-c -lbz2 -lxml2 -lpam -lrt -lssl -lcrypto -lgssapi_krb5 -luv -lz -lreadline -lm  -lcurl -lzstd
VERSION = PostgreSQL 14.4

What happened

SELECT returns wrong data: while concurrent updates are running, a query for a single bid returns two rows.

## actual result (multiple rows)
gpadmin=*# select * from public.pgbench_branches where bid = 71;
 bid | bbalance | filler
-----+----------+--------
  71 |   868076 |
  71 |   785976 |
(2 rows)

gpadmin=*# \d public.pgbench_branches
               Table "public.pgbench_branches"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 bid      | integer       |           | not null |
 bbalance | integer       |           |          |
 filler   | character(88) |           |          |
Distributed by: (bid)
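
To narrow down where the two rows come from, it may help to look at the system columns of the rows returned for the affected bid. The sketch below only generates the query text; inspect_bid_sql is a hypothetical helper name, and it assumes pgbench_branches is a regular heap table so that gp_segment_id, ctid, xmin and xmax are available.

```shell
# Emit SQL that lists the rows visible for one bid together with the
# system columns showing which segment holds each row (gp_segment_id),
# its physical location (ctid), and the inserting/deleting transaction
# ids (xmin/xmax). inspect_bid_sql is a hypothetical helper name.
inspect_bid_sql() {
    bid="$1"
    # Accept only a plain non-negative integer so the value is safe to inline.
    case "$bid" in
        ''|*[!0-9]*)
            echo "bid must be a non-negative integer" >&2
            return 1
            ;;
    esac
    cat <<EOF
SELECT gp_segment_id, ctid, xmin, xmax, bid, bbalance
FROM public.pgbench_branches
WHERE bid = ${bid}
ORDER BY gp_segment_id, ctid;
EOF
}
```

For example, `inspect_bid_sql 71` prints a query that can be pasted into the session that shows the duplicate. Piping it into a fresh psql session would use a new snapshot and may no longer show both rows.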

What you think should happen instead

The SELECT should return only one row.

## expected result (only 1 row)
gpadmin=*# select * from public.pgbench_branches where bid = 71;
 bid | bbalance | filler
-----+----------+--------
  71 |      xxx |
(1 row)

How to reproduce

1. Set parameters

gpconfig -c gp_enable_global_deadlock_detector -v on
gpconfig -c max_connections -v 4096 -m 2048
gpconfig -c gp_vmem_protect_limit -v 16384
gpstop -r -M fast -a

2. Prepare the SQL scripts

  • error_snapshot.sql
begin;
SET TRANSACTION ISOLATION LEVEL repeatable read;
SELECT pg_catalog.pg_export_snapshot();
\copy public.pgbench_branches to '/home/gpadmin/pgbench_branches.csv';
select count(1) from public.pgbench_branches;

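DO $$
DECLARE
    user_count INTEGER;
    max_users INTEGER := 1000;
BEGIN
    select count(1) into user_count from public.pgbench_branches;

    -- pgbench -i -s 1000 loads 1000 rows into pgbench_branches, so a count
    -- above 1000 means this snapshot sees extra rows. Sleep to keep the
    -- transaction open for inspection.
    IF user_count > max_users THEN
        perform pg_sleep(10000000);
    END IF;
END
$$;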

COMMIT;
  • pgbench.sql
\set bid random(1, 100 * :scale)
\set delta random(-50000, 50000)
BEGIN;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
END;

3. Reproduce

## initialize data
pgbench -i -s 1000 -h localhost -p 5432

## run pgbench and error_snapshot.sql concurrently (in separate sessions)
pgbench -M prepared -h localhost -p 5432 -T 30000 --rate 100000 -j 32 -c 512 -P 1 -f pgbench.sql
for i in $(seq 1 100000); do psql -f error_snapshot.sql; done
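
Once a psql iteration hangs in pg_sleep, the file written by \copy in that same transaction can be checked for repeated bid values. The following is a minimal sketch, not part of the original reproduction: find_duplicate_bids is a hypothetical helper name, and it assumes the file is in the default \copy text format (tab-separated, bid in the first column), since error_snapshot.sql does not pass a CSV option despite the .csv file name.

```shell
# Print every bid that appears more than once in a \copy text-format dump,
# one "bid<TAB>count" line per duplicated bid, sorted numerically by bid.
# Assumes tab-separated columns with bid as the first column.
# find_duplicate_bids is a hypothetical helper name.
find_duplicate_bids() {
    awk -F '\t' '
        { seen[$1]++ }                      # count occurrences of each bid
        END {
            for (bid in seen)
                if (seen[bid] > 1)
                    printf "%s\t%d\n", bid, seen[bid]
        }
    ' "$1" | sort -n
}
```

Usage: `find_duplicate_bids /home/gpadmin/pgbench_branches.csv`. Empty output means every bid in the file is unique.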

Operating System

NAME="Rocky Linux"
VERSION="9.7 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.7"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.7 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
VENDOR_NAME="RESF"
VENDOR_URL="https://resf.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.7"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.7"

Anything else

If the problem cannot be reproduced, try increasing the concurrency.

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
