In PostgreSQL, object binding (or dependencies) encompasses the connections existing among various database elements. These interdependencies hold significant importance when it comes to the management and modification of objects within the database. They ensure that adjustments made to one object don’t inadvertently disrupt other dependent objects.
This series is divided into two posts. In the Managing object dependencies in PostgreSQL – Overview and helpful inspection queries (Part 1) post, we introduced object dependencies and discussed various types of dependencies with examples. We also discussed the catalog tables and queries that help you find the dependencies based on which you can take appropriate actions. In this post, we dive deep into the techniques for managing object dependencies in PostgreSQL. We explore how to use DROP and ALTER statements to handle dependencies and discuss best practices for avoiding potential problems.
Prerequisites
To follow along with this post, complete the following prerequisites:
Create an Amazon Aurora PostgreSQL-Compatible Edition cluster or Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance if you don’t already have one. For instructions, refer to Create an Amazon Aurora PostgreSQL-Compatible DB cluster or Create a PostgreSQL DB instance, respectively.
Create an Amazon Elastic Compute Cloud (Amazon EC2) instance to install the PostgreSQL client to access the Aurora PostgreSQL or RDS for PostgreSQL instance. For instructions, refer to Create your EC2 resources and launch your EC2 instance. Or you can set up connectivity between your RDS database and EC2 compute instance in one click.
On Amazon Linux 2023, you can use the following commands to download the psql command line tool to run the SQL statements in the blog:
The pg_depend table
To grasp PostgreSQL dependencies, we need to start with the pg_depend system catalog table. This table stores information about the relationships between various database objects, allowing you to understand which objects depend on others. The following table contains a breakdown of the columns in the pg_depend table.
Name
Type
References
Description
classid
OID
pg_class.oid
The object ID of the system catalog the dependent object is in.
objid
OID
any OID column
The OID of the specific dependent object.
objsubid
INT4
For a table column, this is the column number. For other object types, this is set to zero.
refclassid
OID
pg_class.oid
The OID of the system catalog the referenced object is in.
refobjid
OID
any OID column
The OID of the specific referenced object.
refobjsubid
INT4
Similar to objsubid; used for table columns.
deptype
CHAR
A code defining the specific semantics of this dependency relationship.
Let’s dive deep into these columns:
classid – The classid column represents the OID of the system catalog where the dependent object is listed. For instance, for a schema, classid will be set to pg_namespace, whereas for a table or a view, it will be pg_class. This column helps you identify the category to which the dependent object belongs.
objid – The objid column contains the OID of the specific dependent object. It’s stored in the relation mentioned in the classid column. For example, if you’re dealing with a table, objid will reference to that table’s OID.
objsubid – The objsubid column is used only when the dependency involves a table’s column. In such cases, it stores the column number. For other object types, it’s set to zero.
refclassid and refobjid – These columns, similar to their counterparts, classid and objid, indicate the system catalog and the specific referenced object, respectively. They help establish the dependencies between objects.
deptype – The deptype column holds a code defining the semantics of the dependency relationship. PostgreSQL uses various codes (as explained in “Dependency categories†section) to categorize dependencies, and understanding these codes is crucial. We provide examples in the next section.
Dependency categories
Here are some basic examples of the codes to categorize dependencies.
DEPENDENCY_NORMAL (n)
This indicates a normal relationship between separately created objects. The dependent object can be dropped without affecting the referenced object. The referenced object can only be dropped by specifying CASCADE, which will also drop the dependent object. An example of this is a table column’s normal dependency on its data type.
The following an example of a normal dependency:
In this example, the current_mood column in the person table has a normal dependency on the mood type. If you try to drop the mood type, PostgreSQL will prevent you from doing so because it’s being used by the person table. However, if you use the CASCADE option, it will drop both the mood type and the person table.
To demonstrate this, let’s attempt to drop the mood type without using CASCADE:
PostgreSQL will return an error, indicating that the mood type is still being used by the person table.
Now, let’s try to drop the mood type with the CASCADE option:
This time, PostgreSQL will successfully drop the mood type and also drop the person table because of the normal dependency between the person table and the mood type.
DEPENDENCY_AUTO (a)
In this case, the dependent object can be dropped separately from the referenced object, but it should be automatically dropped if the referenced object is dropped, regardless of RESTRICT or CASCADE mode. For instance, a named constraint on a table is made auto-dependent on the table, ensuring it’s removed if the table is dropped.
The following is an example of an automatic dependency.
First, create a table with a named constraint:
Check the pg_constraint system catalog for the constraint:
In this case, the age_check constraint is an automatic dependency on the students table. If we drop this table, the age_check constraint will be automatically dropped as well, regardless of whether we use RESTRICT or CASCADE mode:
Now the table and its constraint age_check are both dropped. If we try to query this constraint from the pg_constraint system catalog, we find that it no longer exists:
This query returns no rows, confirming that the age_check constraint was automatically dropped when the students table was dropped.
DEPENDENCY_INTERNAL (i)
This indicates that the dependent object was created as part of the referenced object’s creation and is an internal part of its implementation. Dropping the dependent object is disallowed, and dropping the referenced object will propagate to drop the dependent object, with or without CASCADE. An example is a trigger created to enforce a foreign key constraint.
The following code example demonstrates an internal dependency between a foreign key constraint and its trigger.
Create the teachers table:
Create the students table with a foreign key constraint referencing teachers:
In this example, the teacher_id column in the students table has a foreign key constraint referencing the teacher_id column in the teachers table. PostgreSQL automatically creates a trigger to enforce this foreign key constraint. This trigger is an example of an internal dependency (DEPENDENCY_INTERNAL or ‘i‘) because it was created as part of the constraint’s creation and is an integral part of its implementation.
You can’t directly drop these internal triggers:
However, if you drop the foreign key constraint, these triggers will be dropped automatically:
DEPENDENCY_EXTENSION (e)
In this scenario, the dependent object is a member of the extension that is the referenced object. The dependent object can only be dropped via DROP EXTENSION on the referenced object. Functionally, this type of dependency acts like an internal dependency, but it’s kept separate for clarity.
The following code is an example of creating an extension and checking its dependency type:
In this example, we first create an extension named hstore. The second command is a query that retrieves the dependency type (deptype) for the hstore extension from the pg_depend table. This will return ‘e’, indicating an extension dependency.
DEPENDENCY_PARTITION_PRI (P) and DEPENDENCY_PARTITION_SEC (S)
These represent primary and secondary partition dependencies in PostgreSQL. The dependent object is created as part of the referenced object and must not be dropped unless one of the referenced objects is dropped. Primary and secondary dependencies behave identically, but the primary is preferred for error messages. Partition dependencies are added to normal dependencies, simplifying ATTACH/DETACH PARTITION operations.
The following is an example to create a partitioned table with child tables and check the dependencies:
In this example, the orders_y2023 table is a partition of the orders table and an index created on the orders table created an index on orders_y2023 automatically.
Let’s check the dependency type:
The primary and secondary dependencies are the parent table (orders) and the partition child (orders_y2023).
Managing nested dependencies
Nested dependencies in PostgreSQL occur when one database object depends on another, which in turn depends on another, creating a chain of dependencies. This is commonly seen with views and foreign keys. When working with nested dependencies, it’s essential to understand the full chain of dependencies to avoid disrupting database functionality. In the following sections, we discuss nested views, nested foreign keys, and options to deal with foreign key dependencies.
Nested views
Let’s take an example of nested views. Nested views are views that are based on other views.
Create the view v_orders based on the orders table:
Create the second view v_orders_summary based on the first view:
Create the second view v_orders_summary_history based on the first view:
Now, if you try to drop or modify the v_orders view, it will throw an error that there are other dependent objects:
It clearly shows all the dependents related to the table, and it will be same with the view as well:
You can use the following query to find the order of dependency for these views to analyze:
Following is the output of the above query:
Nested foreign keys
Foreign keys represent a link between two tables. When a table has a foreign key to another table, it creates a dependency. If the referenced table has further dependencies, then those are indirectly dependencies of the first table. This can affect operations like deleting records or altering tables, because changes need to be cascaded through the chain of dependencies to maintain data integrity.
Let’s create an example to show the nested foreign key dependencies.
Create the primary key table pk_1:
Create table pk_fk_1 references to the primary key of pk_1:
Create table pk_fk_2 references to the primary key of the pk_fk_1 table:
Before altering the reference columns or deleting data from these columns or dropping keys you need to analyze the dependencies because you can’t directly perform the operations on these columns:
1. You can generate a dependent matrix for each table and list the tables in the order of dependency. Then, prepare the DELETE operations in the same order.
Generate a dependent matrix
You can generate a dependent matrix using the following query (we use the pk_1 table as an example):
It generates three levels of dependency matrix for the pk_1 table. The following is an example of the dependent order for the table:
2. You can implement ON DELETE CASCADE foreign keys. In this case, the database takes care of the delete from all the dependent tables.
Generate CREATE foreign keys
The following query generates SQLs for creating the foreign keys with ON DELETE CASCADE and the NOT VALID option. We use the NOT VALID option to skip validation because it’s already done for the current data and for new data, so validation will be enforced. It saves a lot of time for validation.
Generate DROP foreign keys
The following query generates SQLs to drop the current foreign keys:
Run the DROP and ALTER operations to modify the foreign keys:
Let’s delete some rows from the pk_1 table and check if it’s cascaded to the other tables, pk_fk_1 and pk_fk_2:
The DELETE operation from pk_1 cascaded to the pk_fk_1 and pk_fk_2 tables, and nine rows are deleted from all three tables.
Considerations
This option has the following advantages:
You don’t need to generate a dependent matrix
You don’t need to take care of dependent deletion because the database will make sure of the order of deletion
You don’t need to manually delete table by table
However, it has the following disadvantages:
You need to recreate foreign keys that aren’t ON DELETE CASCADE.
A single DELETE operation on the parent table would delete from all the dependent tables in a single transaction. Therefore, you need to test and monitor replicas for any delay during the cleanups.
There is also an overhead due to the internal triggers created for foreign keys.
User dependencies
All the objects in this post are created through the test_dependent user. The following code shows what happens if you try to drop the user:
You can’t drop the user unless you reassign the ownership of the objects to a different user or revoke if there are any other privileges.
You can use the following command to reassign the ownership of test_dependent to the postgres user:
Now try to drop the user:
As you see, there are still public schema privileges to test_dependent. You need to revoke these privileges to drop the user:
You can now successfully drop the user:
Cleanup
Create snapshots and Terminate EC2 and delete Amazon RDS instances after completing all testing to prevent unnecessary expenses.
Conclusion
In the initial segment of our series on object dependencies in PostgreSQL, we introduced the concept and explored various types of dependencies. In this second part of the blog, we delved into the PostgreSQL catalog pg_depend table, which manages dependency relationships among database objects. This information is crucial for DROP commands to identify objects that need to be dropped using DROP CASCADE or to prevent dropping in the DROP RESTRICT case.
Understanding PostgreSQL dependencies is essential for managing your database effectively. It helps you navigate complex relationships between objects and make informed decisions about object modification or deletion. Although tracking dependencies may seem intricate at first, it provides PostgreSQL with great flexibility and control.
If you have any questions or suggestions about this post, feel free to leave a comment. We hope the information we shared helps!
About the authors
Baji Shaik is a Sr. Database Consultant with AWS Professional Services, Global Competency Centre. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on premises Oracle, SQL Server to Amazon RDS and Amazon Aurora PostgreSQL/MySQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include “
Procedural Programming with PostgreSQL PL/pgSQL†“
PostgreSQL Configuration,†“
Beginning PostgreSQL on the Cloud,†and “
PostgreSQL Development Essentials.†Furthermore, he has delivered several conference and workshop sessions.
Wajid Ali Mir is a Database Consultant at AWS and works as a database migration specialist, helping and enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. Wajid works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.
Viswanatha Shastry Medipalli is a Consultant with the AWS ProServe team. His background spans a wide depth and breadth of expertise and experience in database migrations. He has architected and designed many successful database solutions addressing challenging business requirements. He has provided solutions using Oracle, SQL Server and PostgreSQL for reporting, BI, applications, and development support. He also has a good knowledge of automation, and orchestration. His focus area is homogeneous and heterogeneous migrations of on-premise databases to Amazon.
Yaser Raja is a Principal Strategic Tech Advisor with Professional Services team at Amazon Web Services. He works with customers to build scalable, highly available and secure solutions in AWS cloud. His focus area includes database migrations and modernization, and leveraging Generative AI capabilities to build innovative solutions for the customers.
Source: Read More