EDB Postgres Distributed support v2
You can use LiveCompare against EDB Postgres Distributed (PGD, formerly known as BDR) nodes as well as non-PGD nodes.
Setting logical_replication_mode = bdr
makes the tool assume that all databases being compared belong to the same PGD cluster. Then you can specify node names as connections and replication sets to filter tables.
For example, suppose you can connect to any node in the PGD cluster, which we'll refer to as the initial connection. By initially connecting to this node, LiveCompare can check PGD metadata and retrieve connection information from all other nodes.
Now suppose you want to compare three PGD nodes. As LiveCompare can connect to any node starting from the initial connection, you don't need to define dsn
or any connection information for the data connections. You only need to
define node_name
. LiveCompare searches in PGD metadata about the connection information for that node and then connects to the node.
For LiveCompare to connect to all other nodes by fetching PGD metadata, LiveCompare must be able to connect to
them using the same DSN from the PGD view bdr.node_summary
in the field interface_connstr
. In this case, we recommend running LiveCompare on the same machine as the initial connection as the postgres user. If that's not possible, then define the dsn
attribute in all data connections.
You can also specify replication sets as table filters. LiveCompare uses PGD metadata to build the table list, considering only tables that belong to the replication sets you defined in the replication_sets
setting.
For example, you can create an .ini
file to compare three PGD nodes:
You can also tell LiveCompare to compare all active nodes in the PGD cluster. To do so:
- Under
General Settings
, enableall_bdr_nodes = on
. - Under
Initial Connection
, specify an initial connection.
Additional data connections aren't required.
For example:
When all_bdr_nodes = on
, LiveCompare uses the Initial Connection
setting to fetch the list of all PGD nodes. While additional data connections aren't required, if set, they're appended to the list of data connections. For example, you can compare a whole PGD cluster against a single Postgres connection, which is useful in migration projects:
The settings node_name
and replication_sets
are supported for the following technologies:
- PGD 1, 2, 3, and 4
- pglogical 2 and 3
To enable pglogical metadata fetch instead of PGD, set logical_replication_mode = pglogical
instead of logical_replication_mode = bdr
.
PGD witness nodes
Using replication sets in PGD, you can configure specific tables to include in the PGD replication. You can also specify the nodes to receive data from these tables by configuring the node to subscribe to the replication set the table belongs to. This setting allows for different architectures such as PGD sharding and the use of PGD witness nodes.
A PGD witness is a regular PGD node that doesn't replicate any DML from other nodes. The purpose of the witness is to provide quorum in Raft Consensus voting. (For details on the PGD witness node, see Witness nodes in the PGD documentation.) Replication set configuration determines whether the witness replicates DDLs. This means that there are two types of PGD witnesses:
- A completely empty node, without any data nor tables
- A node that replicates DDL from other nodes, so it has empty tables
In the first case, even if the PGD witness is included in the comparison (either manually under [Connections]
or using all_bdr_nodes = on
), because the witness doesn't have any tables, the following message is logged:
In the second case, the table exists on the PGD witness. However, it's not correct to report data missing on the witness as divergences. So, for each table, LiveCompare checks the following information on each node included in the comparison:
- The replication sets that the node subscribes to
- The replication sets that the table is associated with
- The replication sets, if any, you defined in the filter
replication_sets
underTable Filter
If the intersection among all three lists of replication sets is empty, which is the case for the PGD witness, then LiveCompare logs this message:
In both cases, the comparison for that table proceeds on the nodes where the table exists, and the table is replicated according to the replication sets configuration.
Differences in a PGD cluster
LiveCompare makes changes only to the local node. It's important that corrective changes don't get replicated to other nodes.
When logical_replication_mode = bdr
, LiveCompare first checks if a replication origin called bdr_local_only_origin
already exists. (You can configure the name of the replication origin by adjusting the setting difference_fix_replication_origin
.) If a replication origin called bdr_local_only_origin
doesn't exist, then LiveCompare creates it on all PGD connections.
Important
PGD 3.6.18 introduced the new preexisting bdr_local_only_origin
replication origin to use for applying local-only transactions. If LiveCompare is connected to PGD 3.6.18, it doesn't create this replication origin.
LiveCompare generates apply scripts considering the following:
- Set the current transaction to use the replication origin
bdr_local_only_origin
, so any DML executed hasxmin
associated withbdr_local_only_origin
. - Set the current transaction datetime to be far in the past, so if there are any PGD conflicts with real DML being executed on the database, LiveCompare DML always loses the conflict.
After applying a LiveCompare fix script to a PGD node, you can get exactly the rows that were inserted or updated by LiveCompare using the following query. Replace mytable
with the name of any table.
Deleted rows are no longer visible.
LiveCompare requires at least a PostgreSQL user with bdr_superuser privileges to properly fetch metadata.
All of these steps involving replication origins applied only to the output script if the PostgreSQL user has bdr_superuser or PostgreSQL superuser privileges. Otherwise, LiveCompare generates fixes without associating any replication origin. (Transaction replication is still disabled using SET LOCAL bdr.xact_replication = off
.) However, we recommend using a replication origin when applying the DML scripts. Otherwise, LiveCompare has the same precedence as a regular user application regarding conflict resolution. Also, as there isn't any replication origin associated with the fix, you can't use the query to list all rows fixed by LiveCompare.
Between PGD 3.6.18 and PGD 3.7.0, the following functions are used:
bdr.difference_fix_origin_create()
: Executed by LiveCompare to create the replication origin specified indifference_fix_replication_origin
(by default, set tobdr_local_only_origin
), if this replication origin doesn't exist.bdr.difference_fix_session_setup()
: Included in the generated DML script so the transaction is associated with the replication origin specified indifference_fix_replication_origin
.bdr.difference_fix_xact_set_avoid_conflict()
: Included in the generated DML script so the transaction is set far in the past (2010-01-01
). The fix transaction applied by LiveCompare always loses any conflict.
These functions require a bdr_superuser rather than a PostgreSQL superuser. Starting with PGD 3.7.0, those functions are deprecated. In that case, if running as a PostgreSQL superuser, LiveCompare uses the following functions to perform the same actions:
pg_replication_origin_create(origin_name)
;pg_replication_origin_session_setup()
;pg_replication_origin_xact_setup()
.
If a PostgreSQL superuser isn't being used, then LiveCompare includes only the following in the generated DML transaction:
Conflicts in PGD
LiveCompare has an execution mode called conflicts
. This execution mode is specific for PGD clusters. It works only in PGD 3.6, PGD 3.7, PGD 4, and PGD 5 clusters.
While compare
mode is used to compare all content of tables as a whole, conflicts
mode focuses just in tuples/tables that are related to existing conflicts that are registered in bdr.apply_log
, in case of PGD 3.6, or in
bdr.conflict_history
, in case of PGD 3.7, PGD 4, and PGD 5.
conflicts
execution mode is expected to run much faster than compare
mode because it inspects only specific tuples from specific tables. However, it's not as complete as compare
mode for the same reason.
The main objective of this execution mode is to check that the automatic conflict resolution that's being done by PGD is consistent among nodes, that is, after PGD resolves conflicts, the cluster is in a consistent state.
Although, for the general use case, automatic conflict resolution ensures cluster consistency, there are a few known cases where automatic conflict resolution can result in divergent tuples among nodes. So the conflicts
execution mode from LiveCompare can help with checking and ensuring consistency, providing a good balance between time and result.
Conflict example
Suppose on node3
, you execute the following query:
You can see the following conflict in bdr.conflict_history
:
This conflict means that when the DELETE
arrived from node2
to node3
, there was no row with b = 3
in table tbl
. However, the INSERT
might have arrived from node1
to node3
later, which then added the row with b = 3
to node3
. So this is the current situation on node3
:
While on nodes node1
and node2
, you see this:
The PGD cluster is divergent.
To detect and fix such divergence, you can execute LiveCompare in compare
mode. However, depending on the size of the comparison set (suppose table tbl
is very large), that can take a long time, even hours.
This situation is one in in which conflicts
mode can be helpful. In this case, the delete_missing
conflict is visible only from node3
, but LiveCompare can extract the PK values from the conflict logged rows (key_tuple
, local_tuple
, remote_tuple
, and apply_tuple
) and perform an automatic cluster-wide comparison only on the affected table, already filtering by the PK values. The comparison then checks the current row version in all nodes in the cluster.
Create a check.ini
file to set all_bdr_nodes = on
, that is, to tell LiveCompare to compare all nodes in the cluster:
To run LiveCompare in conflicts
mode:
After the execution, in the console output, you see something like this:
Inside folder ./lc_session_X/
(X
is the number of the current comparison session), LiveCompare writes the file conflicts_DAY.out
(replacing DAY
in the name of the file with the current day). The file shows the main information about all divergent conflicts.
If you connect to database liveoutput
, you can see more details about the conflicts, for example, using this query:
The output is something like this:
The is_conflict_divergent = true
means that LiveCompare compared the conflict and found the nodes to be currently divergent in the tables and rows reported by the conflict. The view livecompare.vw_conflicts
shows information about all conflicts, including the non-divergent ones.
LiveCompare also generates the DML script ./lc_session_X/apply_on_the_node3_DAY.sql
(where DAY
in the name of the file with the current day):
LiveCompare is suggesting to DELETE
the row where b = 3
from node3
because the row doesn't exist on the other two rows. By default, LiveCompare suggests the DML to fix based on the majority of the nodes.
Running this DML script against node3
makes the PGD cluster consistent again:
As the --conflicts
mode comparison is much faster than a full --compare
, we strongly recommend scheduling a --conflicts
comparison session more often to ensure conflict resolution is providing cluster-wide consistency.
Note
To see the data in bdr.conflict_history
in PGD 3.7 or bdr.apply_log
in PGD 3.6, run LiveCompare with a user that's a bdr_superuser or a PostgreSQL superuser.
To be able to see the data in bdr.conflict_history
in PGD 3.7+ or bdr.apply_log
in PGD 3.6, run LiveCompare with a
user that's bdr_superuser or a PostgreSQL superuser.
Conflicts Filter
You can also tell LiveCompare to filter the conflicts by any of the columns in either bdr.conflicts_history
or bdr.apply_log
. For example:
Mixing technologies
Metadata for node_name
and replication_sets
are fetched in the initial connection. So it must be a pglogical- and/or PGD-enabled database.
The list of tables is built in the first data connection. So the replication_sets
condition must be valid in the first connection.
You can perform mixed-technology comparisons, for example:
- PGD 1 node versus PGD 3 node
- PGD 4 node versus vanilla Postgres instance
- Vanilla Postgres instance versus pglogical node