Another Way to De-Duplicate Table Rows: Quick Tip


To remove duplicate table rows, it is typically faster to create a new table that contains all of the distinct rows from the original, drop the original table, then rename the new table to the original table's name.

Example:

dbadmin=> SELECT * FROM dups;
 c1 | c2
----+----
  1 | A
  1 | A
  1 | A
  2 | B
  3 | C
  3 | C
  4 | D
(7 rows)


dbadmin=> CREATE TABLE dups_new LIKE dups INCLUDING PROJECTIONS;
CREATE TABLE

dbadmin=> INSERT /*+ DIRECT */ INTO dups_new SELECT DISTINCT * FROM dups;
 OUTPUT
--------
      4
(1 row)

dbadmin=> DROP TABLE dups;
DROP TABLE

dbadmin=> ALTER TABLE dups_new RENAME TO dups;
ALTER TABLE

dbadmin=> SELECT * FROM dups;
 c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
(4 rows)

The drawback of this approach is that the grants on the original table are dropped along with it, so any privileges that existed must be granted again on the renamed table.
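One way to make that restoration less error-prone is to record the existing grants before running DROP TABLE. The following is a minimal sketch, not a complete procedure: it assumes the table lives in the public schema, it reads the v_catalog.grants system table, and report_user is a hypothetical grantee used only for illustration.

```sql
-- Run BEFORE "DROP TABLE dups": list who holds which privileges on the table.
-- Assumption: the table is in the public schema.
SELECT grantee, privileges_description
FROM v_catalog.grants
WHERE object_schema = 'public'
  AND object_name = 'dups';

-- Run AFTER "ALTER TABLE dups_new RENAME TO dups": reapply each recorded grant.
-- report_user is a hypothetical grantee; substitute the names and privileges
-- returned by the query above.
GRANT SELECT ON dups TO report_user;
```

Saving the output of the first query somewhere outside the session is a sensible precaution, since the information is gone once the original table has been dropped.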

For smaller tables that have duplicate rows, here is another method to remove them that doesn’t involve creating a new table.

dbadmin=> SELECT * FROM dups2;
 c1 | c2
----+----
  1 | A
  1 | A
  1 | A
  2 | B
  3 | C
  3 | C
  4 | D
(7 rows)


dbadmin=> INSERT /*+ DIRECT */ INTO dups2 SELECT DISTINCT c1, c2 FROM dups2;

 OUTPUT
--------
      4
(1 row)

dbadmin=> DELETE /*+ DIRECT */ FROM dups2 WHERE epoch IS NOT NULL;
 OUTPUT
--------
      7
(1 row)

dbadmin=> SELECT * FROM dups2;
 c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
(4 rows)

dbadmin=> COMMIT;
COMMIT

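This method works because the table's hidden EPOCH column is NULL for each newly inserted row until you issue a COMMIT statement. The DELETE therefore removes only the seven previously committed rows and leaves the four uncommitted distinct rows in place, which the final COMMIT then makes permanent.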
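To see this behavior before deleting anything, the epoch value can be inspected directly. This is a hedged sketch: it assumes the hidden column can be listed in a SELECT the same way it is referenced in the WHERE clause of the DELETE, and that the session is not running with autocommit enabled.

```sql
-- Stage the distinct rows, but do not commit yet.
INSERT INTO dups2 SELECT DISTINCT c1, c2 FROM dups2;

-- Previously committed rows should show a numeric epoch;
-- the rows staged by the INSERT above should show NULL.
SELECT epoch, c1, c2 FROM dups2 ORDER BY c1, epoch;

-- Discard the staged rows; this was only an inspection.
ROLLBACK;
```

If any of the staged rows already show a non-NULL epoch, the insert was committed (for example by autocommit), and the DELETE shown earlier would remove every row in the table.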
