Quick Tip: Replacing non-UTF-8 Characters

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into…

OpenText profile picture
OpenText

July 16, 20212 minute read

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into Vertica that is non UTF-8, but you’ll want to clean it up.

I used to recommend the REGEXP_REPLACE function for that task, but now there is a better way!

Vertica 10.1.x introduces the MAKEUTF8 built-in function that coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

The old way of removing non-UTF-8 characters:


dbadmin=> \d nonutf_test;
                                       List of Fields by Tables
 Schema |    Table    | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------------+--------+--------------+------+---------+----------+-------------+-------------
 public | nonutf_test | c      | varchar(100) |  100 |         | f        | f           |
(1 row)

dbadmin=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT c, REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b') FROM nonutf_test ;
    c    | REGEXP_REPLACE
---------+----------------
 ▒Test!▒ | Test!
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
    c
---------
 ▒Test!▒
(1 row)

dbadmin=> UPDATE /*+ DIRECT */ nonutf_test SET c = REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b');
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Now the new and improved way:


verticademos=> TRUNCATE TABLE nonutf_test;
TRUNCATE TABLE

verticademos=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT c, makeutf8(c) FROM nonutf_test;
    c    | makeutf8
---------+----------
 ▒Test!▒ | Test!
(1 row)

verticademos=> SELECT c, makeutf8(c) "Removed non-UTF8 Junk!" FROM nonutf_test;
    c    | Removed non-UTF8 Junk!
---------+------------------------
 ▒Test!▒ | Test!
(1 row)

verticademos=> UPDATE nonutf_test SET c = makeutf8(c);
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Share this post

Share this post to x. Share to linkedin. Mail to
OpenText avatar image

OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, powered by OpenText Cloud Editions.

See all posts

More from the author

Manutan combines digital services with the human touch to delight customers

Manutan combines digital services with the human touch to delight customers

At Manutan, we equip businesses and communities with the products and services they require to succeed. Headquartered in France, our company has three divisions, serving…

4 minute read

Reaching new markets in Europe and beyond

Reaching new markets in Europe and beyond

How information management specialists at One Fox slashed time to market for innovative products with OpenText Cloud Platform Services At One Fox, we’ve driven some…

4 minute read

SoluSoft helps government agencies tackle fraud faster

SoluSoft helps government agencies tackle fraud faster

Fraud, in all its forms, is a pervasive problem, spanning industries and preying on vulnerabilities in federal and state government systems. Each year in the…

3 minute read

Stay in the loop!

Get our most popular content delivered monthly to your inbox.